1. Introduction
In today's world of service-oriented architecture (SOA), microservices and cloud solutions, a single user request can "fly" through dozens of different services before returning a result. Each node in that "route" writes its own logs, but how do you know that an error or latency in one place is actually related to that specific request?
Distributed tracing is like a tracking system for a package in logistics: you see which checkpoints your package went through, where it got delayed, and where something went wrong. In programming, it's the ability to follow the "journey" of one request across all services, understand how long it spent in each place, and where an error or slowdown occurred.
Why do you need tracing?
- Error localization: If something goes wrong, you immediately see exactly where: in which service, at which stage, and even which specific method was called.
- Finding bottlenecks: Tracing will show where your services "slow down" and which stage loses the most time.
- User scenario analytics: You can understand how clients actually use your services and where to prioritize optimization.
How does it work?
Each request gets a unique trace identifier (TraceId) that "travels" with it across the whole stack: from frontend to backend, from API to database and back. At each step, "spans" are created that record the operation duration and additional events. In the end you get a tree (graph) of operations with their durations and relationships.
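This parent-child structure can be seen even without any OpenTelemetry packages, because .NET's built-in System.Diagnostics.Activity is exactly the span type the SDK builds on. A minimal sketch (the operation names "checkout" and "charge-card" are illustrative):

```csharp
using System;
using System.Diagnostics;

// Build a tiny two-level trace by hand: a root span and one child span.
var root = new Activity("checkout").Start();      // root span of the trace
var child = new Activity("charge-card").Start();  // picks up Activity.Current as its parent

Console.WriteLine($"TraceId (shared):   {root.TraceId} / {child.TraceId}");
Console.WriteLine($"Root SpanId:        {root.SpanId}");
Console.WriteLine($"Child ParentSpanId: {child.ParentSpanId}");

child.Stop();
root.Stop();
```

Both activities share one TraceId, and the child's Parent SpanId equals the root's SpanId: that is the whole "tree of operations" in miniature.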
Key entities and terms of distributed tracing
| Term | Description |
|---|---|
| Trace | The logical path of a single request across many services. Unique by TraceId. |
| Span | An operation or sub-operation within a trace. Each span has its own unique identifier. |
| TraceId | The unique identifier of the whole trace (the entire request). |
| SpanId | The unique identifier of the current span (operation). |
| Parent SpanId | The id of the parent span (if this span is part of another operation). |
| Attributes | A set of custom metadata for a span: method name, parameters, status, etc. |
| Events/Logs | Important events related to a span (for example, an error, completion). |
| Context Propagation | Passing trace information between services and threads. |
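Context propagation between services is standardized by the W3C Trace Context specification: the `traceparent` HTTP header carries the TraceId and the caller's SpanId to the next service. A minimal sketch of what that header looks like, using only the BCL:

```csharp
using System;
using System.Diagnostics;

// W3C Trace Context header layout:
//   version "00" - 32-hex trace-id - 16-hex parent-id - 2-hex trace-flags
var activity = new Activity("outgoing-call").Start();
string traceparent = $"00-{activity.TraceId}-{activity.SpanId}-01";

Console.WriteLine($"traceparent: {traceparent}");
// The receiving service parses this header and continues the same trace.
activity.Stop();
```

In practice you never build this string yourself: HttpClient attaches it automatically when an Activity is current, and the OTel SDK reads it on the server side.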
2. OpenTelemetry: the open standard for observability
Briefly about OpenTelemetry
OpenTelemetry is an open standard and a set of tools for collecting telemetry (traces, metrics, logs) from various applications. It's supported by the Cloud Native Computing Foundation (CNCF) and has become the de-facto standard in the world of cloud and distributed applications.
OpenTelemetry (OTel for short) can:
- Automatically collect and send traces, metrics and logs;
- Work with major programming languages, including C#, Java, Python and others;
- Export data to observability backends such as Jaeger, Zipkin, Azure Monitor, and Grafana;
- Be extended via plugins and configured for any stack.
Why OpenTelemetry?
- Vendor independence: OTel is a standard, not a product of a specific company.
- Compatibility: Supports main tracing formats and easily integrates with APM systems.
- Automation: It can do most of the work automatically (for example, HTTP request tracing) with minimal setup.
- Scalability: OTel works well in both small and very large systems.
OpenTelemetry architecture: in plain terms
To avoid getting lost in "layers", let's visualize a typical tracing data flow in OTel.
flowchart LR
subgraph Application
A[OpenTelemetry SDK]
end
A -->|Traces, metrics, logs| B[OpenTelemetry Collector]
B -->|Export to| C[Jaeger/Zipkin/Grafana/Azure/Application Insights etc.]
- OpenTelemetry SDK: Embedded directly into your application (for example, via NuGet packages for .NET).
- OTel Collector: A separate service-hub (often deployed as a Docker container) that accepts trace data from all your applications and exports it where you configure.
- Observability frontend: The system where you see nice graphs, trees, filter and analyze traces.
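To make the Collector's role concrete, here is a minimal sketch of a Collector configuration, assuming the OTLP receiver and a Jaeger backend that accepts OTLP (the `jaeger:4317` endpoint and the `otlp/jaeger` exporter name are assumptions for this example; exact fields vary by Collector version):

```yaml
receivers:
  otlp:                 # accept OTLP from application SDKs
    protocols:
      grpc:
      http:

processors:
  batch:                # batch spans before export to reduce overhead

exporters:
  otlp/jaeger:          # forward traces to a Jaeger backend over OTLP
    endpoint: jaeger:4317
    tls:
      insecure: true    # fine for local experiments only

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```

The pipeline reads top to bottom: receive, process, export. Swapping the backend usually means changing only the `exporters` section.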
3. Quick start with tracing on C#/.NET
Let's add basic distributed tracing to our demo app. The main steps are pretty straightforward.
Install NuGet packages
dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Exporter.Console
- OpenTelemetry — the core SDK implementation;
- OpenTelemetry.Exporter.Console — an exporter that writes traces to the console (you can replace it with Jaeger, Zipkin, etc.).
Minimal tracing setup
In Program.cs (for a console app):
using OpenTelemetry;
using OpenTelemetry.Trace;
class Program
{
static void Main(string[] args)
{
// Configure tracing
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
.AddSource("DemoApp") // Specify the source name
.AddConsoleExporter() // Export traces to the console
.Build();
var source = new System.Diagnostics.ActivitySource("DemoApp");
// Start a new trace (root span)
using (var activity = source.StartActivity("Main operation"))
{
DoWork();
}
}
static void DoWork()
{
// Some application logic...
System.Threading.Thread.Sleep(300);
}
}
What's happening here?
- We create a tracer provider, register the source ("DemoApp") and add the console exporter AddConsoleExporter().
- We start a new "activity" (span) via ActivitySource at the root of the program.
- Inside we perform the useful work we want to trace.
Result:
Activity.TraceId:     0af7651916cd43dd8448eb211c80319c
Activity.SpanId:      b7ad6b7169203331
Activity.DisplayName: Main operation
Activity.Duration:    00:00:00.3001082
...
Automatic HTTP and database tracing
OpenTelemetry supports instrumentation libraries — automatic tracing for popular components. For example, calls through HttpClient and database queries (ADO.NET's SqlClient) can be captured without writing any spans yourself. Each instrumentation ships as a separate NuGet package (OpenTelemetry.Instrumentation.Http, OpenTelemetry.Instrumentation.SqlClient), so install the ones you need first.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
.AddHttpClientInstrumentation() // HTTP requests
.AddSqlClientInstrumentation() // SQL requests (if you use it)
.AddConsoleExporter()
.Build();
Now all your calls via HttpClient and SQL work will automatically appear in traces.
4. Tracing visualization
Console output is just the first step. For real benefit you want a frontend with visualization.
Jaeger
Jaeger is one of the most popular open-source systems for tracing visualization.
- You deploy Jaeger (for example, via Docker).
- Instead of AddConsoleExporter() you use the Jaeger exporter (the OpenTelemetry.Exporter.Jaeger NuGet package):

.AddJaegerExporter(options =>
{
    options.AgentHost = "localhost";
    options.AgentPort = 6831;
})

- Now all traces appear in the Jaeger UI: you can filter by TraceId and view detailed timing diagrams. (Note that newer OpenTelemetry releases deprecate this exporter in favor of OTLP; recent Jaeger versions accept OTLP directly, via AddOtlpExporter().)
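For the deployment step, a common way to run Jaeger locally is the all-in-one image. The ports below are the Jaeger defaults (16686 for the web UI, 6831/udp for the agent); verify them against the Jaeger documentation for your version:

```shell
# Run Jaeger all-in-one locally (for experiments, not production)
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 6831:6831/udp \
  jaegertracing/all-in-one:latest
```

After it starts, the UI is available at http://localhost:16686.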
Full guide: OpenTelemetry Jaeger Exporter for .NET
5. Useful nuances
Comparing approaches
Without OTel you manually copy the TraceId into logs, hope you don't forget to forward it and suffer during incident analysis. With OTel the whole chain is built automatically and visualized with one click.
| Approach | Effort | Reliability | Scalability | Compatibility |
|---|---|---|---|---|
| Manual TraceId logging | High | Low | Requires constant maintenance | Only your apps |
| OpenTelemetry | Minimal (after setup) | High (standardized context propagation) | Wide (any language, stack) | Compatible with most APM and logging systems |
Example architecture
[Web client] --> [API service] --> [Auth service] --> [Database service]
When a user clicks a button, the request goes through all these services. With OTel you can "follow the thread" (TraceId) through each of them, automatically joining the logic into a single trace.
- In each service the OTel SDK automatically recognizes the TraceId from HTTP headers and continues the chain.
- As a result you can open one trace — and see where milliseconds were spent, what happened and where an error occurred.
Use in real projects and in interviews
- Microservice systems and distributed applications (fintech, marketplaces, SaaS, etc.).
- DevOps/SRE: fast problem localization.
- Improving SLA and service responsiveness.
- Profiling and optimization.
- At interviews — demonstrating mature engineering practices.
6. Common mistakes and implementation specifics
Forgot to pass TraceContext. If the TraceId is not passed between services (for example, you forgot to forward the required HTTP headers), traces will be broken — they'll show up as isolated "points", and all the observability benefits will disappear.
Uninstrumented code. If your library or framework doesn't support instrumentation, some operations will drop out of the trace (but you can always create spans manually via ActivitySource and StartActivity()).
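A minimal sketch of such a manual span, using only the BCL types from System.Diagnostics (the "Payments" source name, the tag, and the event name are all illustrative):

```csharp
using System;
using System.Diagnostics;

// Manual span around an operation no instrumentation library covers.
var source = new ActivitySource("Payments");

// Without a registered ActivityListener, StartActivity returns null,
// so this self-contained sketch falls back to a plain Activity.
using var activity = source.StartActivity("charge-card")
                     ?? new Activity("charge-card").Start();

activity.SetTag("payment.amount.cents", 1999);          // attribute
activity.AddEvent(new ActivityEvent("card.validated")); // span event

Console.WriteLine($"Span {activity.SpanId} in trace {activity.TraceId}");
```

In a real service the OpenTelemetry SDK registers the listener for you, so StartActivity succeeds and the manual span is exported like any auto-instrumented one.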
Too much data. Collecting traces for every request is expensive, especially in large production. Usually tracing is enabled only for a portion of requests (sample rate) or on triggers (errors, slow requests).
Performance. With asynchronous (batched) export, OpenTelemetry's impact on performance is small; still, under very high traffic, keep an eye on the overhead and tune the sampling rate.
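The sampling decision can be sketched with only the BCL: an ActivityListener decides at span creation whether to record. (The OpenTelemetry SDK offers the same idea out of the box via SetSampler(new TraceIdRatioBasedSampler(0.1)); the listener below is a hand-rolled illustration, not the SDK mechanism.)

```csharp
using System;
using System.Diagnostics;

double sampleRatio = 0.1;       // keep roughly 10% of traces
var rng = new Random(1234);     // seeded for reproducibility

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "DemoApp",
    // Head sampling: decide per root span whether to record it at all.
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        rng.NextDouble() < sampleRatio
            ? ActivitySamplingResult.AllDataAndRecorded
            : ActivitySamplingResult.None
};
ActivitySource.AddActivityListener(listener);

var source = new ActivitySource("DemoApp");
int recorded = 0;
for (int i = 0; i < 1000; i++)
{
    using var a = source.StartActivity("Op");
    if (a != null) recorded++;  // unsampled spans come back null
}
Console.WriteLine($"Recorded {recorded} of 1000 requests");
```

With a 10% ratio, roughly 100 of the 1000 simulated requests produce spans; the other 900 cost almost nothing.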