Observability: logs, metrics, and health in Laravel monoliths and microservices
"Works on my machine" is not a monitoring strategy. Production breaks in specific ways: a disk fills, queue latency spikes, a dependency times out, or one replica serves stale reads. Good observability lets you answer what changed, for whom, and which hop failed, without SSH-ing into every box. The same ideas apply whether you run a Laravel monolith or a fleet of services; only the plumbing and cardinality get harder as you split boundaries.
Related: PHP database connection pooling · Databases under load · API gateway & messaging · Sail: troubleshooting
Contents
- Three pillars: logs, metrics, traces
- What differs by environment
- Laravel monolith: practical layers
- Microservices: correlation and tracing
- Tool spectrum: classic to modern
- Under load: sampling, cardinality, cost
- Alerting that humans will respect
- Checklist
Three pillars: logs, metrics, traces
| Pillar | Answers | Typical mistakes |
|---|---|---|
| Logs | What narrative happened? (errors, audit, debug context) | Unstructured one-liners you cannot query; logging secrets; INFO floods in prod |
| Metrics | How much, how fast, compared to yesterday? (rates, histograms, saturation) | High-cardinality labels (full URL, user id on every series); charts nobody looks at |
| Traces | Which span in the chain was slow? | Missing context propagation so microservice B has no idea it belongs to request A |
They reinforce each other: a spike in error rate (metric) sends you to representative logs and a trace of a slow checkout. No single pillar replaces the others.
What differs by environment
- Local / dev: maximize developer speed with `tail`, Telescope-style debug UIs, verbose logs, breakpoints. Do not ship that verbosity to prod unchanged.
- Staging / pre-prod: mirror prod-like logging sinks and dashboards where affordable; catch "works until we enable JSON to Loki" class mistakes.
- Production: optimize for signal, retention cost, and safe defaults: structured logs, redaction, sampling on debug paths, health checks that reflect real dependencies.
The goal is not identical tooling everywhere but compatible contracts (same correlation field names, same metric names), so incidents do not require relearning the system.
Laravel monolith: practical layers
Application logging
- Use `config/logging.php` channels: `stack`, `daily`, `syslog`, or a JSON formatter to stdout for container hosts to scrape.
- Prefer structured fields (`context` arrays) over parsing English sentences later.
- Attach request id, user id (if policy allows), queue job id: whatever helps you pivot from one log line to the rest of the story.
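As a language-neutral illustration of "structured fields over sentences", here is a minimal Python sketch that writes one JSON object per log line. The `context` attribute name, the `app.structured` logger name, and the field layout are assumptions made for this example; in Laravel the same shape would come from a JSON formatter on a logging channel.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # "context" is a hypothetical attribute supplied via `extra=`.
            "context": getattr(record, "context", {}),
        }
        # default=str keeps odd values (dates, objects) from breaking logging.
        return json.dumps(payload, sort_keys=True, default=str)


def build_logger(stream=sys.stdout) -> logging.Logger:
    """Return a logger that emits JSON lines to the given stream."""
    logger = logging.getLogger("app.structured")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


log = build_logger()
log.info("order paid", extra={"context": {"request_id": "req-1", "job_id": 42}})
```

Because every field is a key, a log backend can filter on `context.request_id` directly instead of pattern-matching free text.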
Request lifecycle
- Middleware for correlation id: accept incoming `X-Request-Id` or generate one; return it in responses; pass it to jobs and HTTP clients.
- Laravel's `Log::withContext()` (modern versions) helps keep context on the request without threading parameters everywhere.
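The accept-or-generate rule for the correlation id is small enough to sketch. The version below is a framework-free Python sketch, not Laravel code: `handle`, `resolve_request_id`, and the validation pattern are hypothetical names and choices for this example, and the inner handler stands in for the rest of the request pipeline.

```python
import re
import uuid

REQUEST_ID_HEADER = "X-Request-Id"
# Assumed policy: accept only short, log-safe ids from callers.
_SAFE_ID = re.compile(r"^[A-Za-z0-9._-]{1,128}$")


def resolve_request_id(headers: dict) -> str:
    """Reuse a well-formed incoming id, otherwise mint a new one."""
    for name, value in headers.items():
        if name.lower() == REQUEST_ID_HEADER.lower():
            candidate = value.strip()
            if _SAFE_ID.match(candidate):
                return candidate
    return str(uuid.uuid4())


def handle(request_headers: dict, next_handler):
    """Middleware-style wrapper: resolve the id, run the app, echo the id."""
    request_id = resolve_request_id(request_headers)
    # The context dict is what you would attach to logs, jobs, and
    # outbound HTTP calls made while serving this request.
    context = {"request_id": request_id}
    status, response_headers, body = next_handler(context)
    response_headers = dict(response_headers)
    response_headers[REQUEST_ID_HEADER] = request_id
    return status, response_headers, body
```

Validating the incoming value is a design choice rather than a requirement: it stops a caller from injecting newlines or oversized strings into every log line that carries the id.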
Queues and schedules
- Horizon (Redis) gives queue depth, throughput, failed jobs; treat it as first-class monitoring, not an optional UI.
- Scheduled tasks: log start/end/duration; alert on missed runs (cron monitoring or external heartbeat).
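The start/end/duration pattern for scheduled tasks, plus a heartbeat on success, might look like the following Python sketch. `timed_task` and the `on_success` callback are hypothetical names; in practice the callback would ping whatever external heartbeat monitor you use, so that a missed run shows up as a missing ping.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("app.schedule")


@contextmanager
def timed_task(name, on_success=None, clock=time.monotonic):
    """Log start, end, and duration; call on_success only if the task completes."""
    started = clock()
    logger.info("task started", extra={"task": name})
    try:
        yield
    except Exception:
        # Failed runs are logged with a traceback and never send a heartbeat.
        logger.exception(
            "task failed", extra={"task": name, "duration_s": clock() - started}
        )
        raise
    else:
        duration = clock() - started
        logger.info("task finished", extra={"task": name, "duration_s": duration})
        if on_success is not None:
            on_success(name, duration)
```

Usage is a plain `with timed_task("reports:daily", on_success=ping): ...` around the task body, where `ping` is your own function.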
Deep introspection (non-production)
- Telescope is invaluable for local/staging; keep it off in production unless you have hard gates (IP, auth, sampling) and accept the overhead.
- Laravel Pulse (when applicable) surfaces slow queries, exceptions, queues in a dashboard; still mind sampling and retention on busy apps.
Health and readiness
- `/up` (Health) in Laravel 11+ and similar patterns: distinguish liveness ("process runs") from readiness ("can talk to DB and cache"). Load balancers and Kubernetes probes care about the difference.
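The liveness/readiness split can be sketched without any framework. In this Python sketch the function names, status strings, and the idea of passing dependency checks as callables are all assumptions for illustration; each check would wrap a real round trip such as a trivial database query or a cache ping.

```python
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[int, dict]:
    """The process is up and can answer; no dependencies are touched."""
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Callable[[], None]]) -> Tuple[int, dict]:
    """Run every dependency check; any failure means 'do not route traffic here'."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            # Report the error type only, to avoid leaking connection details.
            results[name] = f"failed: {type(exc).__name__}"
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "ready" if healthy else "not_ready", "checks": results}
    return (200 if healthy else 503), body
```

Keeping liveness free of dependency checks matters: if an orchestrator restarts the process every time the database blips, a dependency outage turns into a restart storm.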
Errors as a product
- Sentry, Flare, Bugsnag: grouped stack traces, release tracking, breadcrumbs. They complement logs; they do not replace metrics for saturation.
Microservices: correlation and tracing
When one HTTP call becomes gateway → service A → service B → broker → worker, a plain access log per service is insufficient.
Correlation ID
- Propagate a stable id on every outbound call (`X-Request-Id` or W3C `traceparent` alongside your internal id).
- Log it in every service at entry; include it in async payloads (job `payload`, message headers).
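To make the propagation rule concrete, the Python sketch below builds outbound headers from an incoming request. It assumes version `00` of the W3C `traceparent` format (`00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`) and ignores any other version; the function names and the choice to mark fresh traces as sampled (`01`) are assumptions for this example. An OpenTelemetry SDK would normally do this for you.

```python
import re
import secrets

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value: str):
    """Return (trace_id, parent_id, flags), or None if the header is unusable."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are invalid in the W3C format.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, flags


def outbound_headers(incoming: dict, request_id: str) -> dict:
    """Continue the caller's trace if there is one, otherwise start a new trace."""
    parsed = parse_traceparent(incoming.get("traceparent", ""))
    if parsed is not None:
        trace_id, _, flags = parsed
    else:
        trace_id, flags = secrets.token_hex(16), "01"
    # Every hop gets its own span id; only the trace id stays stable.
    span_id = secrets.token_hex(8)
    return {
        "X-Request-Id": request_id,
        "traceparent": f"00-{trace_id}-{span_id}-{flags}",
    }
```

The same dictionary can be copied into message headers or a job payload so that a worker picks up the trace where the web request left off.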
Distributed tracing
- OpenTelemetry is the emerging vendor-neutral way to emit traces; collectors forward to Jaeger, Tempo, Zipkin, or SaaS backends.
- PHP ecosystems vary in maturity; verify instrumentation for your HTTP client, DB driver, and queue library. Partial tracing still beats none.
Service boundaries
- Standardize timeout, retry, and idempotency policies; observability will show cascading retries if each layer blindly retries.
See also the API gateway guide for edge auth, routing, and how failures surface to clients.
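One way to keep retries from cascading is to cap both the attempt count and the total time spent. The Python sketch below is one possible policy, not a standard: `call_with_retry`, its defaults, and the choice of jittered exponential backoff are assumptions for this example. It only makes sense for operations that are idempotent.

```python
import random
import time


def call_with_retry(operation, *, max_attempts=3, base_delay=0.1, deadline_s=2.0,
                    sleep=time.sleep, clock=time.monotonic, rng=random.random):
    """Retry on TimeoutError with jittered backoff, bounded by attempts and a deadline."""
    started = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            # The attempt number is passed in so callers can log or tag it.
            return operation(attempt)
        except TimeoutError:
            elapsed = clock() - started
            # Random delay between 0 and base_delay * 2^(attempt - 1).
            delay = rng() * base_delay * (2 ** (attempt - 1))
            if attempt >= max_attempts or elapsed + delay > deadline_s:
                raise
            sleep(delay)
```

If every layer in the chain applies a policy like this with its own full budget, the retries multiply; the overall deadline is best set at the edge and passed down so inner services can give up early.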
Tool spectrum: classic to modern
Roughly older → newer popularity (all still seen in the wild):
Host and network era
- syslog, rsyslog, logrotate: centralize plain files; still valid as a transport stage.
- Nagios, Icinga, Zabbix: host checks, ping, disk, simple service probes. Less about app traces, still common for infra baselines.
Log aggregation
- ELK / Elastic Stack (Elasticsearch, Logstash/Beats, Kibana): powerful search; operate or buy capacity consciously.
- Graylog, Splunk (enterprise): similar problem space.
Metrics and dashboards
- Prometheus scrape model + Grafana dashboards: the de facto standard for Kubernetes and many bare-metal shops; Alertmanager for routing.
- VictoriaMetrics, Mimir, Thanos: long-term or HA variants around Prometheus protocols.
Logs "like metrics"
- Grafana Loki: label-based log storage that pairs naturally with Grafana; often cheaper than indexing every field as search engines do.
Cloud-native
- AWS CloudWatch, Google Cloud Logging/Monitoring, Azure Monitor: tight integration if you already live on those bills.
SaaS all-in-one
- Datadog, New Relic, Honeycomb: logs, metrics, APM, RUM; fast to value, priced by volume, so watch cardinality.
Errors and APM for PHP
- Sentry (errors + performance), Scout, Tideways (PHP-focused profiling): strong Laravel community usage.
Standardization wave
- OpenTelemetry (OTel): unified SDKs/exporters; a collector can fan out to many backends. Adoption is growing precisely to avoid vendor lock-in per signal.
eBPF and auto-instrumentation (emerging)
- Tools that observe kernel-level traffic without code changes: powerful for infra teams, not a substitute for business-level logs you control in Laravel.
Under load: sampling, cardinality, cost
- Log volume grows linearly with traffic; serializing and shipping JSON per request at `debug` level can cost more than the app's own CPU work. Use levels and sampled debug for hot paths.
- Prometheus labels: never put unbounded values (raw URLs with ids, emails) into label names or label values; every distinct value creates a new series, and the series count explodes.
- Trace sampling: keep 100% for errors or slow requests; sample the rest. Backends and wallets will thank you.
- Retention: define hot (days) vs cold (object storage) vs delete; compliance may mandate longer audit retention separately from debug logs.
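Two of the rules above fit in a few lines each: collapsing raw paths into a bounded label, and keeping every error or slow trace while sampling the rest. This Python sketch uses hypothetical names (`route_label`, `keep_trace`) and assumed thresholds (1000 ms, 5%); the id pattern covers only integers and UUIDs. Note that `keep_trace` needs the outcome of the request, so it describes a decision taken after the request finishes, not at its start.

```python
import random
import re

# Path segments that look like identifiers: integers or UUIDs.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def route_label(path: str) -> str:
    """Collapse ids so the label set stays bounded: /orders/123 -> /orders/:id."""
    path = path.split("?", 1)[0]
    parts = [":id" if _ID_SEGMENT.match(part) else part for part in path.split("/")]
    return "/".join(parts) or "/"


def keep_trace(*, is_error: bool, duration_ms: float, slow_ms: float = 1000,
               sample_rate: float = 0.05, rng=random.random) -> bool:
    """Always keep errors and slow requests; keep a random fraction of the rest."""
    if is_error or duration_ms >= slow_ms:
        return True
    return rng() < sample_rate
```

Where the framework exposes the matched route pattern, using that as the label is more reliable than rewriting paths with a regex.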
Alerting that humans will respect
Alert on user-visible or imminent failure: SLO burn, error rate jump, queue wait p95, disk threshold, certificate expiry.
Avoid paging for known noisy conditions unless you attach runbooks. "CPU > 80%" for five minutes is often not an incident; "payment success rate dropped 10x" is.
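"SLO burn" can be made concrete with a small calculation: the burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The Python sketch below pages only when both a short and a long window burn fast, which filters brief blips. The function names, the two-window rule, and the default threshold of 10 are assumptions for this example, not recommended values.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    if total == 0:
        return 0.0
    budget = 1.0 - slo_target
    return (errors / total) / budget


def should_page(short_window, long_window, slo_target: float,
                threshold: float = 10.0) -> bool:
    """Each window is an (errors, total) pair; page only if both exceed the threshold."""
    return (
        burn_rate(*short_window, slo_target) >= threshold
        and burn_rate(*long_window, slo_target) >= threshold
    )
```

With a 99.9% target, 10 failures in 1000 requests is a 1% error ratio against a 0.1% budget, a burn rate of about 10.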
Checklist
- Structured logs to stdout or a shipper; one correlation id across sync and async work.
- Golden signals per service: latency, traffic, errors, saturation (plus queue depth for Laravel workers).
- Health endpoints that test real dependencies; separate liveness vs readiness where orchestrators need it.
- Error tracker in prod; Telescope-like tools gated to non-prod.
- Cost and cardinality review before enabling "log everything" or "label everything."
Observability is part of the product: the same Laravel codebase that serves users should tell you, with evidence, when it is about to fail and where to look first.