Finding Tail Latency Bottlenecks - 2 : Collecting information using Start Stop Activity

 The first part of this post is at : Finding Tail Latency Bottlenecks.

From the first post, we found if we have right information, we can do more in-depth analysis of slow requests. In this post we will learn about how to collect the information.

In windows, ETW is the general tracing framework build into the system. Different languages like C#, C++ can use the same for tracing information about the applications. It is fast. ETW tracing system is used by profiling tools like Perfview to collect traces & metrics from your app and show you what is happening.

If we trace information during each stage of a request then we can use different tools from this raw data of all the requests. We can plot percentiles of each stage. If some stage seems to take more time than expected, we can put more traces in that stage and re run the test.

During optimizing a system that i didn't originally wrote, i did this most simple brute force to find real bottlenecks. This is binary search applied to finding bottlenecks. I added 3 traces - start, middle, end of a request life cycle. If time till middle took most of the time, then i focused on code base from start to middle, otherwise i focused on code base from middle to end. If both took equal time, i focused on any one of them. :P. This kept tracing overhead to minimum [ even though i didn't need it because ETW is fast and efficient. ]

So, you can use something similar to ETW for your operating system, language, framework. 

Here i will explain another wrapper written in C# from dotnet runtime itself that makes this work more simpler for tracing code path of async code. It is called Start/Stop activity. What Activity does is to have a unique Id for a particular task. Any subsequent tracing from EventSource from the sub tasks of that task gets the same Id, unless you call Stop event.

Today (Aug 16 2020) i could not find documentation online about Perfview's start/stop Activity / ThreadTime. Here is documentation from inside the perfview tool itself : [ which is a good source to know about profiling and perfview tool ].

 

    Thread Time (With StartStop Activities) Stacks - This is like Thread Time Stacks in that it shows what every thread is doing (consuming CPU, Disk, Network) at any instant of time, and it tracks the causality of System.Threading.Tasks.Tasks so that costs incurred by the task is 'charged' to the creator of the task. However in addition to all this, it looks for 'Start-Stop' EventSource events as well as HTTP, ASP.NET WCF events and creates 'activities' for each of these. These Activities are place 'at the top (near the process node) of the stack so it nicely separates all costs associated with a particular starts-stop activity (e.g. a web request). It is very valuable for doing server investigations. You will only get this view if you collected data with the Thread Time events. See Making Server Investigation Easy and Blocked time investigation for more details.

 

Here are steps to use activity feature in C#.

1. Create a simple eventsource with Start/Stop event functions with EventID of start as X and of stop as X+1. 

2. No GUID in event source declaration. 

3. Start top level event in a Task.Run() so that it gets its own task, otherwise, it is possible for perfview to believe these tasks are nested.

    using System.Diagnostics.Tracing;
    
    namespace MyReadTest
    {
        [EventSource(Name = "My-ReadLatency-Activities")]
        sealed internal class LatencyEvents : EventSource
        {
                    /// <summary>
            /// My one and only instance of LatencyEvents that can log events.
            /// </summary>
            public static LatencyEvents Log = new LatencyEvents();
          
            /// <summary>
            /// Useful for simple, ad-hoc print style logging.
            /// </summary>
            /// <param name="message"></param>
            public void DebugMessage(string message) { WriteEvent(1, message); }
    
            public void ReadStart() { WriteEvent(2); }
            public void ReadStop() { WriteEvent(3); }
    
            public void MiddleStageStart() { WriteEvent(4); }
            public void MiddleStageStop() { WriteEvent(5); }

            // More stage start/stop event here.
        }
    }


Perfview command:

PerfView.exe /BufferSizeMB:2024 /InMemoryCircularBuffer /zip:false /merge:false /circularmb:20048 /nogui /providers=*My-ReadLatency-Activities /ThreadTime  /MaxCollectSec:10 collect c:\perfview\1\readlatency.etl   

 

you can also track only your provider so etl file does not grow large.. 

/onlyProviders=*<Your_StartStopActivityEventSourceName>

 

After collecting profile,  go to the events view of perfview:

Select all your events And press enter. Sort by DURATION_MSEC ActivityID * in `Colums to Display` in right hand side. Use `Text Filter` to focus on only one activity. See below image for details

 


 

Open Thread Stacks : or cpu stacks to see what is running on the cpu during this time. In my case, GC was running on the CPU during this time.

 

Problem with start/stop Activities:

Perfview has to collect a lot of data to give you all the details for what happened during a particular request. This includes context switches, CPU samples and all kind of Activities running in your app. This makes profile size large, causing possible loss of events and Perfview tool becomes slow.

So, what can we do ?


Remember ETW is fast. We can just emit traces from each stage and process the raw traces ourselves. This is what i did. We will learn about this in the next blog.


Comments