linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf trace timestamps
@ 2012-11-16 13:07 Saumil Merchant
  2012-11-16 14:39 ` David Ahern
  0 siblings, 1 reply; 5+ messages in thread
From: Saumil Merchant @ 2012-11-16 13:07 UTC (permalink / raw)
  To: linux-perf-users

Hi,
I am trying to understand the origin of the timestamps in the trace output. 
Here is an example trace of a sample C code.

/> perf sched record -a ./a.out
/> perf script

[ ... ]
a.out 29311 [000] 1746678.549886: sched_stat_runtime: comm=a.out pid=29311
runtime=310028 [ns] vruntime=159246710291986646 [ns]
a.out 29311 [000] 1746678.550885: sched_stat_runtime: comm=a.out pid=29311
runtime=998470 [ns] vruntime=159246710292985116 [ns]
a.out 29311 [000] 1746678.550895: sched_stat_runtime: comm=a.out pid=29311
runtime=11039 [ns] vruntime=159246710292996155 [ns]
[ ... ]

My original impression was that this timestamp (eg, 1746678.550895) would align
with time reported clk_gettime(MONOTONIC,...) function, but it does not. My C
function above prints out timestamps from clk_gettime(MONOTONIC,..)  function
and there is a delta of >15 mins.

Here is the printout on console:
CLKTIME (MONOTONIC):  1748132.102228068

This delta has been varying from 5 mins to 15+ mins.  My objective is to
correlate events in a kernel trace with the execution of our user program.  My
understanding is that both perf and clk_gettime functions on newer x86 systems
use TSC as the clock source and hence I was confused why they do not align.

Can someone please help?

Regards,
Saumil

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: perf trace timestamps
  2012-11-16 13:07 perf trace timestamps Saumil Merchant
@ 2012-11-16 14:39 ` David Ahern
  2012-11-20 11:33   ` Saumil Merchant
  0 siblings, 1 reply; 5+ messages in thread
From: David Ahern @ 2012-11-16 14:39 UTC (permalink / raw)
  To: Saumil Merchant; +Cc: linux-perf-users

On 11/16/12 6:07 AM, Saumil Merchant wrote:
> Hi,
> I am trying to understand the origin of the timestamps in the trace output.
> Here is an example trace of a sample C code.
>
> /> perf sched record -a ./a.out
> /> perf script
>
> [ ... ]
> a.out 29311 [000] 1746678.549886: sched_stat_runtime: comm=a.out pid=29311
> runtime=310028 [ns] vruntime=159246710291986646 [ns]
> a.out 29311 [000] 1746678.550885: sched_stat_runtime: comm=a.out pid=29311
> runtime=998470 [ns] vruntime=159246710292985116 [ns]
> a.out 29311 [000] 1746678.550895: sched_stat_runtime: comm=a.out pid=29311
> runtime=11039 [ns] vruntime=159246710292996155 [ns]
> [ ... ]
>
> My original impression was that this timestamp (eg, 1746678.550895) would align
> with time reported clk_gettime(MONOTONIC,...) function, but it does not. My C
> function above prints out timestamps from clk_gettime(MONOTONIC,..)  function
> and there is a delta of >15 mins.
>
> Here is the printout on console:
> CLKTIME (MONOTONIC):  1748132.102228068

At one time many, many releases ago perf timestamps did align with the 
monotonic clock. I have tried (unsuccessfully) to get an option into 
perf to correlate perf events with time-of-day stamps.

>
> This delta has been varying from 5 mins to 15+ mins.  My objective is to
> correlate events in a kernel trace with the execution of our user program.  My
> understanding is that both perf and clk_gettime functions on newer x86 systems
> use TSC as the clock source and hence I was confused why they do not align.
>
> Can someone please help?

See John's explanations in this thread:
https://lkml.org/lkml/2012/10/16/173

David

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: perf trace timestamps
  2012-11-16 14:39 ` David Ahern
@ 2012-11-20 11:33   ` Saumil Merchant
  2012-11-21 15:54     ` David Ahern
  0 siblings, 1 reply; 5+ messages in thread
From: Saumil Merchant @ 2012-11-20 11:33 UTC (permalink / raw)
  To: linux-perf-users

David Ahern <dsahern <at> gmail.com> writes:

> 
> >
> > My original impression was that this timestamp (eg, 1746678.550895) would
> > with time reported clk_gettime(MONOTONIC,...) function, but it does not. My
> > function above prints out timestamps from clk_gettime(MONOTONIC,..)  
> > and there is a delta of >15 mins.
> >
> > Here is the printout on console:
> > CLKTIME (MONOTONIC):  1748132.102228068
> 
> At one time many, many releases ago perf timestamps did align with the 
> monotonic clock. I have tried (unsuccessfully) to get an option into 
> perf to correlate perf events with time-of-day stamps.
> 


Since clk_gettime and perf timestamps do not align, I tried synchronizing 
timestamps at start my code. I used a syscall that I can track in the perf
trace.  I can then normalize perf trace output timestamps using the delta 
between perf timestamp and clk_gettime timestamp for the syscall.  I chose to 
use nanosleep for this.  While doing this I realized that this normalizing 
delta between the perf and clk_gettime timestamps linearly increases with 
time.  Here is a simple C code that I used to check this behavior:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int ITERATIONS=600;
struct timespec sleepy, t1, t2;

int main(int argc, char **argv) {
    int i;

    sleepy.tv_sec=0;
    sleepy.tv_nsec=1000000;

    for(i=0; i<ITERATIONS; i++) {
        clock_gettime(CLOCK_MONOTONIC,&t1);
        nanosleep(&sleepy,NULL);
        clock_gettime(CLOCK_MONOTONIC,&t2);
        printf("%d.%09d, %d.%09d\n",
               t1.tv_sec,t1.tv_nsec,t2.tv_sec,t2.tv_nsec);
    }

    return(0);
}

I correlated entry (t1) clk_gettime timestamps with entry nanosleep timestamps
from perf.  Same with he exit timestamps.  
i.e. two arrays of "entry_clk_gettime - entry_perf" and 
"exit_clk_gettime - exit_perf". 

On plotting these two arrays against the iterations I got a plot which is
straight line linearly increasing with slope 0.00000095.  The difference 
between the first and the last sample of these arrays is almost 0.6 msecs over
just 600 iterations.  I was expecting variations when I plotted these two
arrays, but was expecting random variations and not a linearly increasing 
straight line.  This tells me that perf is inflating the timestamps over time 
with probably a constant value. 

Is this an expected behavior or a bug?

I didn't want to bombard the list with a big spreadsheet file detailing the
experiment outcomes, but I can email it separately to whoever wants to take a
look at it.



> >
> > This delta has been varying from 5 mins to 15+ mins.  My objective is to
> > correlate events in a kernel trace with the execution of our user program.  
> > understanding is that both perf and clk_gettime functions on newer x86 
> > use TSC as the clock source and hence I was confused why they do not align.
> >
> > Can someone please help?
> 
> See John's explanations in this thread:
> https://lkml.org/lkml/2012/10/16/173
> 
> David
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: perf trace timestamps
  2012-11-20 11:33   ` Saumil Merchant
@ 2012-11-21 15:54     ` David Ahern
  2012-11-27  1:37       ` John Stultz
  0 siblings, 1 reply; 5+ messages in thread
From: David Ahern @ 2012-11-21 15:54 UTC (permalink / raw)
  To: Saumil Merchant, john.stultz, Thomas Gleixner
  Cc: linux-perf-users, Stephane Eranian

On 11/20/12 4:33 AM, Saumil Merchant wrote:
> David Ahern <dsahern <at> gmail.com> writes:
>>>
>>> My original impression was that this timestamp (eg, 1746678.550895) would
>>> with time reported clk_gettime(MONOTONIC,...) function, but it does not. My
>>> function above prints out timestamps from clk_gettime(MONOTONIC,..)
>>> and there is a delta of >15 mins.
>>>
>>> Here is the printout on console:
>>> CLKTIME (MONOTONIC):  1748132.102228068
>>
>> At one time many, many releases ago perf timestamps did align with the
>> monotonic clock. I have tried (unsuccessfully) to get an option into
>> perf to correlate perf events with time-of-day stamps.
>>
>
>
> Since clk_gettime and perf timestamps do not align, I tried synchronizing
> timestamps at start my code. I used a syscall that I can track in the perf
> trace.  I can then normalize perf trace output timestamps using the delta
> between perf timestamp and clk_gettime timestamp for the syscall.  I chose to
> use nanosleep for this.  While doing this I realized that this normalizing
> delta between the perf and clk_gettime timestamps linearly increases with
> time.  Here is a simple C code that I used to check this behavior:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> int ITERATIONS=600;
> struct timespec sleepy, t1, t2;
>
> int main(int argc, char **argv) {
>      int i;
>
>      sleepy.tv_sec=0;
>      sleepy.tv_nsec=1000000;
>
>      for(i=0; i<ITERATIONS; i++) {
>          clock_gettime(CLOCK_MONOTONIC,&t1);
>          nanosleep(&sleepy,NULL);
>          clock_gettime(CLOCK_MONOTONIC,&t2);
>          printf("%d.%09d, %d.%09d\n",
>                 t1.tv_sec,t1.tv_nsec,t2.tv_sec,t2.tv_nsec);
>      }
>
>      return(0);
> }
>
> I correlated entry (t1) clk_gettime timestamps with entry nanosleep timestamps
> from perf.  Same with he exit timestamps.
> i.e. two arrays of "entry_clk_gettime - entry_perf" and
> "exit_clk_gettime - exit_perf".
>
> On plotting these two arrays against the iterations I got a plot which is
> straight line linearly increasing with slope 0.00000095.  The difference
> between the first and the last sample of these arrays is almost 0.6 msecs over
> just 600 iterations.  I was expecting variations when I plotted these two
> arrays, but was expecting random variations and not a linearly increasing
> straight line.  This tells me that perf is inflating the timestamps over time
> with probably a constant value.
>
> Is this an expected behavior or a bug?

Perhaps John or Thomas can explain what is going on. cc'ed. Added 
Stephane as well given his recent thread on the topic.

David

>
> I didn't want to bombard the list with a big spreadsheet file detailing the
> experiment outcomes, but I can email it separately to whoever wants to take a
> look at it.
>
>
>
>>>
>>> This delta has been varying from 5 mins to 15+ mins.  My objective is to
>>> correlate events in a kernel trace with the execution of our user program.
>>> understanding is that both perf and clk_gettime functions on newer x86
>>> use TSC as the clock source and hence I was confused why they do not align.
>>>
>>> Can someone please help?
>>
>> See John's explanations in this thread:
>> https://lkml.org/lkml/2012/10/16/173
>>
>> David
>>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: perf trace timestamps
  2012-11-21 15:54     ` David Ahern
@ 2012-11-27  1:37       ` John Stultz
  0 siblings, 0 replies; 5+ messages in thread
From: John Stultz @ 2012-11-27  1:37 UTC (permalink / raw)
  To: David Ahern
  Cc: Saumil Merchant, Thomas Gleixner, linux-perf-users,
	Stephane Eranian

On 11/21/2012 07:54 AM, David Ahern wrote:
> On 11/20/12 4:33 AM, Saumil Merchant wrote:
>> David Ahern <dsahern <at> gmail.com> writes:
>>>>
>>>> My original impression was that this timestamp (eg, 1746678.550895) 
>>>> would
>>>> with time reported clk_gettime(MONOTONIC,...) function, but it does 
>>>> not. My
>>>> function above prints out timestamps from clk_gettime(MONOTONIC,..)
>>>> and there is a delta of >15 mins.
>>>>
>>>> Here is the printout on console:
>>>> CLKTIME (MONOTONIC):  1748132.102228068
>>>
>>> At one time many, many releases ago perf timestamps did align with the
>>> monotonic clock. I have tried (unsuccessfully) to get an option into
>>> perf to correlate perf events with time-of-day stamps.
>>>
>>
>>
>> Since clk_gettime and perf timestamps do not align, I tried 
>> synchronizing
>> timestamps at start my code. I used a syscall that I can track in the 
>> perf
>> trace.  I can then normalize perf trace output timestamps using the 
>> delta
>> between perf timestamp and clk_gettime timestamp for the syscall.  I 
>> chose to
>> use nanosleep for this.  While doing this I realized that this 
>> normalizing
>> delta between the perf and clk_gettime timestamps linearly increases 
>> with
>> time.  Here is a simple C code that I used to check this behavior:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <time.h>
>>
>> int ITERATIONS=600;
>> struct timespec sleepy, t1, t2;
>>
>> int main(int argc, char **argv) {
>>      int i;
>>
>>      sleepy.tv_sec=0;
>>      sleepy.tv_nsec=1000000;
>>
>>      for(i=0; i<ITERATIONS; i++) {
>>          clock_gettime(CLOCK_MONOTONIC,&t1);
>>          nanosleep(&sleepy,NULL);
>>          clock_gettime(CLOCK_MONOTONIC,&t2);
>>          printf("%d.%09d, %d.%09d\n",
>>                 t1.tv_sec,t1.tv_nsec,t2.tv_sec,t2.tv_nsec);
>>      }
>>
>>      return(0);
>> }
>>
>> I correlated entry (t1) clk_gettime timestamps with entry nanosleep 
>> timestamps
>> from perf.  Same with he exit timestamps.
>> i.e. two arrays of "entry_clk_gettime - entry_perf" and
>> "exit_clk_gettime - exit_perf".
>>
>> On plotting these two arrays against the iterations I got a plot 
>> which is
>> straight line linearly increasing with slope 0.00000095.  The difference
>> between the first and the last sample of these arrays is almost 0.6 
>> msecs over
>> just 600 iterations.  I was expecting variations when I plotted these 
>> two
>> arrays, but was expecting random variations and not a linearly 
>> increasing
>> straight line.  This tells me that perf is inflating the timestamps 
>> over time
>> with probably a constant value.
>>
>> Is this an expected behavior or a bug?
>
> Perhaps John or Thomas can explain what is going on. cc'ed. Added 
> Stephane as well given his recent thread on the topic.

I suspect this is expected behavior, as to my understanding perf uses 
sched_clock, not CLOCK_MONOTONIC for its timestamps.

CLOCK_MONOTONIC is frequency corrected for accuracy by NTP (thus after 
NTP converges, one second is really one second long), sched_clock has no 
such correction.

So if you either disable NTP or compare with CLOCK_MONOTONIC_RAW (which 
also has no frequency correction, so one second may not be actually one 
second, given any initial calibration error or hardware drift), I 
suspect you'll not see the linear slope.

Another possible source for this slope/drift could be if the clocksource 
hardware being used for time (CLOCK_MONOTONIC) is different the the 
hardware used for sched_clock.  For instance, on x86 sched_clock is 
usually based on the TSC (which is calibrated at boot, which introduces 
accuracy errors). The TSC may not be stable on the cpu, so it could halt 
or may not be consistent between cores, so we may use something like the 
HPET or ACPI PM hardware (which we are given the freq by the hardware) 
for the timekeeping clocksource. Thus there may be either calibration or 
hardware drift (or both) between the different clocks.

There was some discussion recently on lkml about either exposing the 
perf clock to userland, or possibly reworking perf so the timestamps is 
exports are based on CLOCK_MONOTONIC instead, but that hasn't come to 
any resolution yet.

thanks
-john

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-11-27  1:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-16 13:07 perf trace timestamps Saumil Merchant
2012-11-16 14:39 ` David Ahern
2012-11-20 11:33   ` Saumil Merchant
2012-11-21 15:54     ` David Ahern
2012-11-27  1:37       ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).