Distributed tracing and observability have become essential components of modern application development, especially in the world of microservices and distributed systems. When dealing with multiple services communicating with each other, tracking how requests flow through the system and identifying bottlenecks or failures can be challenging. Distributed tracing helps address these challenges by providing visibility into how individual requests traverse multiple services.
In this chapter, we will explore how to implement distributed tracing and observability in Node.js applications using tools like Jaeger and OpenTelemetry. We will cover everything from the basic concepts to advanced setups, complete with practical examples and code.
Distributed tracing is a method for tracking and monitoring requests as they flow through a distributed system. When a request is made to a system composed of multiple services (for example, in a microservice architecture), each service processes part of the request. Distributed tracing allows you to see the entire path of the request, including which services were involved and how much time each service took.
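To make the idea concrete, here is a minimal sketch of a trace as plain data: a set of spans that share a trace ID and point to their parent span. The service names, IDs, and timings are invented for illustration, and the helpers assume a simple chain in which each span has at most one child.

```javascript
// Hypothetical spans from one request; all IDs, names, and timings are invented.
const spans = [
  { traceId: 't1', spanId: 'a', parentId: null, service: 'api-gateway', startMs: 0, endMs: 120 },
  { traceId: 't1', spanId: 'b', parentId: 'a', service: 'order-service', startMs: 10, endMs: 100 },
  { traceId: 't1', spanId: 'c', parentId: 'b', service: 'payment-service', startMs: 20, endMs: 80 },
];

// Time spent in each service itself, excluding time spent in direct child spans.
// Assumes child spans do not overlap each other.
function selfTimeByService(allSpans) {
  const result = {};
  for (const span of allSpans) {
    const childTime = allSpans
      .filter((s) => s.parentId === span.spanId)
      .reduce((sum, s) => sum + (s.endMs - s.startMs), 0);
    const selfTime = span.endMs - span.startMs - childTime;
    result[span.service] = (result[span.service] || 0) + selfTime;
  }
  return result;
}

// Follow parent links from the root span to list the services in call order.
function requestPath(allSpans) {
  const path = [];
  let current = allSpans.find((s) => s.parentId === null);
  while (current) {
    path.push(current.service);
    const parentSpanId = current.spanId;
    current = allSpans.find((s) => s.parentId === parentSpanId);
  }
  return path.join(' -> ');
}

console.log(requestPath(spans));
console.log(selfTimeByService(spans));
```

A tracing backend performs the same kind of reconstruction at scale, which is why every span needs both the shared trace ID and a reference to its parent.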
Observability is the ability to understand the internal state of a system based on the external outputs (logs, metrics, and traces). Observability helps developers and operators answer questions about system behavior, identify problems, and improve performance.
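The three key pillars of observability are logs (timestamped records of discrete events), metrics (numeric measurements aggregated over time), and traces (the path of a single request through the system).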
OpenTelemetry is an open-source project that provides a unified set of APIs, libraries, and instrumentation to capture and export traces, metrics, and logs. It’s a standard way to instrument your code for observability.
To get started with OpenTelemetry in your Node.js application, install the necessary packages:
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/sdk-trace-node @opentelemetry/sdk-trace-base @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-jaeger
Let’s walk through a basic setup to enable distributed tracing in a Node.js application.
Configure OpenTelemetry: In your entry file (e.g., app.js or index.js), initialize the OpenTelemetry SDK.
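// NodeTracerProvider is exported by @opentelemetry/sdk-trace-node
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');

// Create a provider for managing traces
const provider = new NodeTracerProvider();

// Export traces to Jaeger
// Note: depending on the exporter version, the service name may have to be
// set through the provider's resource rather than through this option.
const exporter = new JaegerExporter({
  serviceName: 'nodejs-app',
});

// Add a span processor that forwards each finished span to the exporter
// Note: on SDK 2.x, pass spanProcessors: [...] to the provider constructor instead.
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));

// Register the provider globally
provider.register();

console.log('OpenTelemetry initialized and sending traces to Jaeger');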
In OpenTelemetry, spans represent individual operations or units of work. Let’s add custom spans to trace specific parts of the application:
const opentelemetry = require('@opentelemetry/api');
// Get a tracer instance
const tracer = opentelemetry.trace.getTracer('nodejs-app');
// Create a new span for a custom operation
const span = tracer.startSpan('process-order');
// Simulate some work
setTimeout(() => {
  span.end(); // End the span when the work is done
}, 1000);
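A span that is never ended is not exported, so it helps to end it in one place. The following is a minimal sketch of a wrapper that does this, assuming a tracer object with an OpenTelemetry-style startSpan method. The withSpan and createStubTracer names are hypothetical helpers, and the stub tracer exists only so the example runs without the SDK.

```javascript
// Runs fn inside a span and always ends the span, even if fn throws.
// `tracer` is any object with an OpenTelemetry-style startSpan(name) method.
async function withSpan(tracer, name, fn) {
  const span = tracer.startSpan(name);
  try {
    return await fn(span);
  } catch (err) {
    span.recordException(err);
    // 2 is the value of SpanStatusCode.ERROR in @opentelemetry/api
    span.setStatus({ code: 2, message: err.message });
    throw err;
  } finally {
    span.end();
  }
}

// Stub tracer for demonstration only; in the application you would pass
// the tracer returned by opentelemetry.trace.getTracer('nodejs-app').
function createStubTracer(log) {
  return {
    startSpan(name) {
      log.push(`start ${name}`);
      return {
        setAttribute: (key, value) => log.push(`attr ${key}=${value}`),
        recordException: (err) => log.push(`exception ${err.message}`),
        setStatus: (status) => log.push(`status ${status.code}`),
        end: () => log.push(`end ${name}`),
      };
    },
  };
}

const events = [];
const demo = withSpan(createStubTracer(events), 'process-order', async (span) => {
  span.setAttribute('order.id', 42); // hypothetical attribute name
  return 'done';
}).then((result) => {
  console.log(result, events);
  return result;
});
```

Putting span.end() in a finally block means the span is closed on both the success path and the error path, so failed operations still show up in the trace.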
For distributed tracing to work across services, the trace context (trace ID and span ID) must be propagated between services. OpenTelemetry handles this automatically for common protocols like HTTP once the corresponding instrumentation is registered, for example through @opentelemetry/auto-instrumentations-node. Here’s an example of instrumenting an HTTP request:
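const axios = require('axios');
const { trace, SpanStatusCode } = require('@opentelemetry/api');

async function makeRequest() {
  const tracer = trace.getTracer('nodejs-app');
  // startActiveSpan makes the span current, so the instrumented HTTP call
  // is recorded as its child and carries its trace context downstream
  return tracer.startActiveSpan('http-request', async (span) => {
    try {
      await axios.get('http://another-service/api/data');
    } catch (err) {
      // Mark the span as failed before rethrowing
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}

makeRequest().catch(console.error);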
OpenTelemetry automatically injects the trace context into the HTTP headers, so downstream services can link the spans together.
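To show what travels in those headers, here is a simplified sketch of the W3C Trace Context traceparent header, the format OpenTelemetry propagates by default. The formatTraceparent and parseTraceparent helpers are written by hand for illustration and are not the SDK's own propagator. They handle only the version 00 layout and ignore the companion tracestate header.

```javascript
const crypto = require('node:crypto');

// Build a W3C traceparent header value: version-traceId-spanId-flags.
function formatTraceparent({ traceId, spanId, sampled }) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// Parse a traceparent header; returns null when the value is malformed.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header || '');
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Version ff and all-zero IDs are invalid according to the specification.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Caller side: start a trace and attach its context to the outgoing headers.
const outgoing = {
  traceId: crypto.randomBytes(16).toString('hex'),
  spanId: crypto.randomBytes(8).toString('hex'),
  sampled: true,
};
const headers = { traceparent: formatTraceparent(outgoing) };

// Callee side: continue the same trace with a new span ID of its own.
const incoming = parseTraceparent(headers.traceparent);
const childSpan = {
  traceId: incoming.traceId,
  parentId: incoming.spanId,
  spanId: crypto.randomBytes(8).toString('hex'),
};

console.log(headers.traceparent, childSpan);
```

The downstream service keeps the trace ID and records the caller's span ID as its parent, which is what lets the backend stitch spans from different services into one trace.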
Jaeger is an open-source distributed tracing system originally developed at Uber. It helps with monitoring and troubleshooting requests as they move through a distributed system.
To visualize the traces generated by OpenTelemetry, we’ll use Jaeger as our backend tracing system. You can run Jaeger locally using Docker:
docker run -d --name jaeger \
-e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
-p 5775:5775/udp \
-p 6831:6831/udp \
-p 6832:6832/udp \
-p 5778:5778 \
-p 16686:16686 \
-p 14268:14268 \
-p 14250:14250 \
-p 9411:9411 \
jaegertracing/all-in-one:1.21
Jaeger will be available at http://localhost:16686. You can search for traces, visualize spans, and analyze performance directly in the Jaeger UI.
Once Jaeger is up and running, OpenTelemetry will send the trace data to Jaeger. Here’s how you can visualize a trace:
Open http://localhost:16686 in your browser.
Select the nodejs-app service in the Jaeger UI and search for its traces.
When working with a microservices architecture, distributed tracing becomes crucial to understand how requests flow between services. Each service in your architecture will have its own spans, and OpenTelemetry will ensure that the trace context is propagated between them.
OpenTelemetry not only captures traces but also supports logging and metrics, creating a full observability suite.
For example, you can send traces to Jaeger, logs to Elasticsearch, and metrics to Prometheus — all from the same application.
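Logs become much easier to connect to traces when every log line carries the IDs of the span that was active when it was written. Below is a minimal sketch of that idea, assuming a span context object with the shape returned by span.spanContext() in @opentelemetry/api. The formatLog helper and the trace_id and span_id field names are illustrative choices, not a required format.

```javascript
// Build a JSON log line that carries the IDs of the active span.
// `spanContext` has the shape returned by span.spanContext() in @opentelemetry/api.
function formatLog(level, message, spanContext, fields = {}) {
  return JSON.stringify({
    level,
    message,
    // JSON.stringify drops undefined values, so logs written outside a span stay clean.
    trace_id: spanContext ? spanContext.traceId : undefined,
    span_id: spanContext ? spanContext.spanId : undefined,
    ...fields,
  });
}

// Hypothetical span context; in an instrumented app it could come from
// trace.getSpan(context.active()) followed by spanContext().
const exampleContext = {
  traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
  spanId: '00f067aa0ba902b7',
  traceFlags: 1,
};

const logLine = formatLog('info', 'order processed', exampleContext, { orderId: 42 });
console.log(logLine);
```

With the trace ID in each log record, a log search backend can be filtered by the same ID that appears in the Jaeger UI.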
Here’s how to capture custom metrics:
// Older SDK releases published this package as @opentelemetry/sdk-metrics-base
const { MeterProvider } = require('@opentelemetry/sdk-metrics');

// Create a MeterProvider
const meterProvider = new MeterProvider();

// Create a meter to track custom metrics
const meter = meterProvider.getMeter('example-meter');

// Create a counter
const requestCount = meter.createCounter('requests', {
  description: 'Count of all incoming requests',
});

// Increment the counter when a request is made
requestCount.add(1);
To optimize performance and avoid overwhelming your tracing system, you can implement sampling strategies that control how many traces are sent. By default, OpenTelemetry captures all traces, but you can adjust this behavior by configuring a sampling rate.
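const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');
const provider = new NodeTracerProvider({
  sampler: new TraceIdRatioBasedSampler(0.5), // Capture roughly 50% of traces
});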
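The idea behind ratio-based sampling can be sketched in a few lines: the decision is derived from the trace ID itself, so every service that sees the same trace ID reaches the same decision. This is a simplified illustration and not the SDK's actual algorithm. The shouldSample function and the example IDs are hypothetical.

```javascript
// Decide from the trace ID alone whether a trace is kept.
// `traceId` is a 32-character hex string, `ratio` a number between 0 and 1.
function shouldSample(traceId, ratio) {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the first 8 hex characters as an unsigned 32-bit number
  // and keep the trace if it falls below the matching share of that range.
  const bucket = parseInt(traceId.slice(0, 8), 16);
  return bucket < ratio * 0x100000000;
}

// Invented trace IDs: one in the lower half of the range, one in the upper half.
const lowId = '1a2b3c4d' + '0'.repeat(24);
const highId = 'e5f60718' + '0'.repeat(24);

console.log(shouldSample(lowId, 0.5)); // true
console.log(shouldSample(highId, 0.5)); // false
```

Because trace IDs are random, roughly the configured share of traces is kept, while a given trace is either kept by every service that applies the same rule or dropped by all of them.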
Distributed tracing and observability are essential for building and maintaining complex distributed systems, especially in the microservices world. With tools like OpenTelemetry and Jaeger, Node.js developers can gain deep insights into how their applications perform, track request flows, and quickly identify bottlenecks or failures.
In this chapter, we’ve covered:
The basics of distributed tracing and observability.
How to set up OpenTelemetry in a Node.js application.
How to use Jaeger for visualizing traces.
Advanced usage, such as context propagation, metrics, and logging.
With this setup, you’ll be well-equipped to trace requests across your Node.js services. Happy coding! ❤️