public inbox for linux-kernel@vger.kernel.org
* Needed faster implementation of do_gettimeofday()
@ 2005-02-20 10:58 puneet_kaushik
  2005-02-20 15:48 ` Parag Warudkar
  0 siblings, 1 reply; 6+ messages in thread
From: puneet_kaushik @ 2005-02-20 10:58 UTC
  To: linux-kernel

Hello all,

I am running oprofile on a program. The oprofile output follows.

-----------------------------------------------------------------------

Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        app name                 symbol name
985913    8.6083  vmlinux                  mark_offset_tsc
584473    5.1032  libc-2.3.2.so            getc
295901    2.5836  vmlinux                  ide_outb
270823    2.3646  vmlinux                  _spin_lock
249791    2.1810  vmlinux                  _spin_unlock
236140    2.0618  vmlinux                  timer_interrupt
175249    1.5302  ld-2.3.2.so              do_lookup_versioned
140429    1.2261  sendmail                 putc
138739    1.2114  sendmail                 stabhash
134145    1.1713  sendmail                 getc

-----------------------------------------------------------------------


From this output, my analysis is that mark_offset_tsc (which is
called from do_gettimeofday) and some other timer functions are the
largest consumers of CPU time.

Is there a faster implementation of do_gettimeofday()? I am using kernel
2.6.10 on a dual P4.

The only thing I found through a Google search is
http://lwn.net/Articles/9266/, which is only for kernel 2.4.

Thanks for your help.


-Puneet





* Re: Needed faster implementation of do_gettimeofday()
  2005-02-20 10:58 Needed faster implementation of do_gettimeofday() puneet_kaushik
@ 2005-02-20 15:48 ` Parag Warudkar
  2005-02-22  3:06   ` George Anzinger
  0 siblings, 1 reply; 6+ messages in thread
From: Parag Warudkar @ 2005-02-20 15:48 UTC
  To: puneet_kaushik; +Cc: linux-kernel

On Sunday 20 February 2005 05:58 am, puneet_kaushik@persistent.co.in wrote:
> 985913    8.6083  vmlinux                  mark_offset_tsc
> 584473    5.1032  libc-2.3.2.so            getc

What makes you think mark_offset_tsc is slow? Do you have any comparative
numbers?  It might just be that the workload you are throwing at it justifies
it. (E.g., if your workload does a zillion system calls, system_call will
show up as a hot spot in oprofile - that doesn't necessarily mean it is slow,
it's just overused.) Can you post the relevant code?

Parag


* Re: Needed faster implementation of do_gettimeofday()
  2005-02-20 15:48 ` Parag Warudkar
@ 2005-02-22  3:06   ` George Anzinger
  2005-02-22 13:56     ` Puneet Kaushik
  0 siblings, 1 reply; 6+ messages in thread
From: George Anzinger @ 2005-02-22  3:06 UTC
  To: Parag Warudkar; +Cc: puneet_kaushik, linux-kernel

Parag Warudkar wrote:
> On Sunday 20 February 2005 05:58 am, puneet_kaushik@persistent.co.in wrote:
> 
>>985913    8.6083  vmlinux                  mark_offset_tsc
>>584473    5.1032  libc-2.3.2.so            getc
> 
> 
> What makes you think mark_offset_tsc is slow? Do you have any comparative 
> numbers?  It might just be that the workload you are throwing at it justifies 
> it. (For e.g. if your workload does a zillion system calls, system_call will 
> show up as a hot spot in oprofile - doesn't necessarily mean it is slow - 
> it's just overused.) Can you post the relevant code?

He (Puneet) really is right.  The mark offset code is reading the PIT counter,
and that is not only rather dumb but dog slow.

A suggestion, try the high res timers patch.  Even if you don't use the timers 
the mark offset there is MUCH faster.  It does not read the PIT.

The difference is where in time we assume the jiffy bump to be.  If we assume
it is at the point where the PIT interrupts, well, then the only way to get to
that is to read the PIT.  If, on the other hand, we assume it is at the time
after the interrupt where we mark offset, we can observe the "best" time for
this event based on the TSC and avoid reading the PIT.
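
The second assumption can be sketched roughly as follows. This is only an
illustration, not the kernel's or the HRT patch's code: the struct and the
function names (tsc_clock, sketch_mark_offset, sketch_get_time_usec) are
hypothetical, the TSC value is passed in as a plain number, and the cycle rate
is assumed to be constant.

```c
#include <stdint.h>

/* Hypothetical clock state; none of these names come from the kernel. */
struct tsc_clock {
    uint64_t tick_usec;       /* wall time (in usec) credited at the last tick */
    uint64_t tick_tsc;        /* TSC value sampled when that tick was marked */
    uint64_t cycles_per_usec; /* assumed constant CPU rate */
};

/*
 * Called from the timer interrupt: treat "now" (the TSC value at the time
 * the interrupt is handled) as the moment of the jiffy bump, so the PIT
 * counter never has to be read.
 */
static void sketch_mark_offset(struct tsc_clock *c, uint64_t tsc_now,
                               uint64_t usec_per_tick)
{
    c->tick_usec += usec_per_tick;
    c->tick_tsc = tsc_now;
}

/* Time of day = time at the last tick + TSC cycles elapsed since then. */
static uint64_t sketch_get_time_usec(const struct tsc_clock *c,
                                     uint64_t tsc_now)
{
    return c->tick_usec + (tsc_now - c->tick_tsc) / c->cycles_per_usec;
}
```

With this layout the read path is one subtraction and one division; the slow
I/O port access to the PIT is avoided entirely.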

Try the HRT patch (see signature below) and see if it doesn't do better.


-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/



* Re: Needed faster implementation of do_gettimeofday()
  2005-02-22  3:06   ` George Anzinger
@ 2005-02-22 13:56     ` Puneet Kaushik
  2005-02-22 15:46       ` Chris Friesen
  2005-02-22 16:44       ` George Anzinger
  0 siblings, 2 replies; 6+ messages in thread
From: Puneet Kaushik @ 2005-02-22 13:56 UTC
  To: george, kernel-stuff; +Cc: linux-kernel

Hello Parag and George,

Thanks for the immediate reply.
The main problem is that I am working on an SMP system. I have written a
small program that just calls gettimeofday() one billion times. I have
run it under the time utility, and it takes almost twice as long on SMP
as on UP.
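
A test program of this kind might look like the sketch below. It is not the
actual program used for the measurements; the function name
spin_gettimeofday is made up for illustration.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Call gettimeofday() `count` times in a tight loop and return how many
 * of the calls succeeded.  The result of each call is discarded; only
 * the cost of the call itself is of interest.
 */
static unsigned long spin_gettimeofday(unsigned long count)
{
    struct timeval tv;
    unsigned long ok = 0;

    for (unsigned long i = 0; i < count; i++) {
        if (gettimeofday(&tv, NULL) == 0)
            ok++;
    }
    return ok;
}
```

Calling spin_gettimeofday(1000000000UL) from main() and running the binary
under time(1) would give real/user/sys figures comparable in kind to the ones
below.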



With kernel 2.6.10 on UP:

real    4m5.495s
user    1m17.088s
sys     2m48.046s


With kernel 2.6.10 on SMP:

real    6m24.485s
user    1m43.723s
sys     4m30.749s


And the fact is that this SMP machine is faster and has more memory than
the UP one. On SMP systems, as far as I understand, the code takes a
spinlock every time it is called, synchronizes both processors, and then
unlocks them. That's all I know about it.

George, I am trying your suggestion now; let me know whether it will work
on SMP.

If there is a good implementation for SMP, please let me know.

Thanks,

- Puneet






* Re: Needed faster implementation of do_gettimeofday()
  2005-02-22 13:56     ` Puneet Kaushik
@ 2005-02-22 15:46       ` Chris Friesen
  2005-02-22 16:44       ` George Anzinger
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Friesen @ 2005-02-22 15:46 UTC
  To: Puneet Kaushik; +Cc: george, kernel-stuff, linux-kernel

Puneet Kaushik wrote:
> Hello Parag and George,
> 
> Thanks for immediate reply.
> The main problem is I am working on a SMP system. I have written a small
> program that just calls the gettimeofday(), one billion times. I have
> run it with time utility and it takes almost double time on SMP then a
> UP.

If the hardware is known in advance, can you use some arch-specific
mechanism (like rdtsc on Intel) to get a timestamp that is then
calibrated by calling gettimeofday() at a lower frequency?

There will be issues (you may have to use CPU affinity if the two CPUs
don't run at the same rate, and may need to disable any frequency
stepping), but it might be possible to work around them.
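
The calibration idea could be sketched along these lines. Everything here is
an assumption made for illustration: the names (tsc_calib, calib_start,
calib_update, calib_now_usec) are hypothetical, the counter values are passed
in as plain integers (on x86 they would come from rdtsc), and the reference
times would come from the occasional gettimeofday() call.

```c
#include <stdint.h>

/* Hypothetical calibration state for a free-running cycle counter. */
struct tsc_calib {
    uint64_t base_tsc;   /* counter value at the latest reference point */
    uint64_t base_usec;  /* gettimeofday() time (usec) at that point */
    uint64_t span_tsc;   /* cycles between the last two reference points */
    uint64_t span_usec;  /* microseconds between the same two points */
};

/* Record the first reference point; no rate is known yet. */
static void calib_start(struct tsc_calib *c, uint64_t tsc, uint64_t usec)
{
    c->base_tsc = tsc;
    c->base_usec = usec;
    c->span_tsc = 0;
    c->span_usec = 0;
}

/*
 * Record a later reference point (the infrequent gettimeofday() call) and
 * derive the rate from the interval since the previous one.  Returns -1 if
 * the counter or the clock did not advance, e.g. after moving to a CPU
 * whose counter is not in step.
 */
static int calib_update(struct tsc_calib *c, uint64_t tsc, uint64_t usec)
{
    if (tsc <= c->base_tsc || usec < c->base_usec)
        return -1;
    c->span_tsc = tsc - c->base_tsc;
    c->span_usec = usec - c->base_usec;
    c->base_tsc = tsc;
    c->base_usec = usec;
    return 0;
}

/*
 * Cheap timestamp: extrapolate from the latest reference point.  The
 * multiplication can overflow if reference points are very far apart;
 * a real implementation would have to bound the interval.
 */
static uint64_t calib_now_usec(const struct tsc_calib *c, uint64_t tsc)
{
    if (c->span_tsc == 0)
        return c->base_usec;
    return c->base_usec + (tsc - c->base_tsc) * c->span_usec / c->span_tsc;
}
```

The caveats above still apply: the extrapolation is only meaningful while the
thread stays on one CPU and the counter rate does not change.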

Chris


* Re: Needed faster implementation of do_gettimeofday()
  2005-02-22 13:56     ` Puneet Kaushik
  2005-02-22 15:46       ` Chris Friesen
@ 2005-02-22 16:44       ` George Anzinger
  1 sibling, 0 replies; 6+ messages in thread
From: George Anzinger @ 2005-02-22 16:44 UTC
  To: Puneet Kaushik; +Cc: kernel-stuff, linux-kernel

Puneet Kaushik wrote:
> Hello Parag and George,
> 
> Thanks for immediate reply.
> The main problem is I am working on a SMP system. I have written a small
> program that just calls the gettimeofday(), one billion times. I have
> run it with time utility and it takes almost double time on SMP then a
> UP.
> 
> 
> 
> with kernel 2.6.10 on UP
> 
> real    4m5.495s
> user    1m17.088s
> sys     2m48.046s
> 
> 
> With Kernel 2.6.10 on SMP
> 
> real    6m24.485s
> user    1m43.723s
> sys     4m30.749s
> 
> 
> And the fact is this SMP machine is faster and with more memory than the
> UP one. In SMP systems it make a spinlock every time it got called,
> synchronizes both the processors, and unlock them. Thats all I know
> about it.

On 2.6 the lock is a r/w sequence lock (seqlock).  The CPUs are not
synchronized or locked against each other, but some of the instructions in the
sequence lock code are "locked" (atomic).  I find it hard to believe that this
alone would double the time, however.
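
For readers who have not met a sequence lock, here is a rough user-space
sketch of the idea using C11 atomics. It is not the kernel's seqlock_t: the
names (seq_time, seq_time_write, seq_time_read) are invented, and it assumes
a single writer, whereas the kernel version also serializes writers with a
spinlock.

```c
#include <stdatomic.h>

/* Hypothetical time value protected by a sequence counter. */
struct seq_time {
    atomic_uint seq;   /* odd while a write is in progress */
    atomic_long sec;
    atomic_long usec;
};

/* Writer (single writer assumed): bump seq to odd, update, bump to even. */
static void seq_time_write(struct seq_time *t, long sec, long usec)
{
    unsigned s = atomic_load_explicit(&t->seq, memory_order_relaxed);

    atomic_store_explicit(&t->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&t->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&t->usec, usec, memory_order_relaxed);
    atomic_store_explicit(&t->seq, s + 2, memory_order_release);
}

/*
 * Reader: takes no lock and never blocks the writer.  It retries if a
 * write was in progress (odd count) or if the count changed while the
 * values were being read.
 */
static void seq_time_read(struct seq_time *t, long *sec, long *usec)
{
    unsigned s0, s1;

    do {
        s0 = atomic_load_explicit(&t->seq, memory_order_acquire);
        *sec = atomic_load_explicit(&t->sec, memory_order_relaxed);
        *usec = atomic_load_explicit(&t->usec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&t->seq, memory_order_relaxed);
    } while ((s0 & 1u) || s0 != s1);
}
```

The point for this thread is that readers only pay for a couple of counter
loads and barriers, which is why the lock by itself is an unlikely
explanation for a large slowdown.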

Ah..., now I remember.  On SMP x86 boxes, the accounting/run_timer interrupt
comes from the lapic timer.  It fires once per tick (HZ times a second) on each
CPU, which means that there is an additional timekeeping interrupt.  Over the
whole box you get N+1 interrupts per tick, i.e. (N+1)*HZ per second, where N is
the number of CPUs.  Assuming that the PIT and the lapic interrupt take about
the same amount of time and that the PIT interrupt is evenly distributed over
the CPUs, the interrupt contention per CPU should go from 1 to 1.5 on your dual
box.  This alone would take your 4.084 min UP time to 6.125 min on an SMP box
(that is amazingly close to what you are seeing, if you ask me).

Again, I recommend my HRT patch.  There the accounting interrupt is generated by
an "all-but-self" IPI.  This is sent by the PIT interrupt code, which also does
the accounting on the CPU handling the PIT interrupt.  Result: N timekeeping
interrupts per tick in total (N*HZ per second), where N is the number of CPUs.
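
The estimate above can be written out as a small calculation. This only
restates the model described here (equal cost for PIT and lapic interrupts,
PIT interrupts spread evenly over the CPUs, run time scaling with the per-CPU
interrupt count); the function names are made up for illustration.

```c
/*
 * Timekeeping interrupts handled per CPU per tick under the model above.
 *   UP:            1       (PIT only)
 *   stock SMP:     (N+1)/N (one lapic interrupt per CPU + a share of the PIT)
 *   HRT patch SMP: N/N = 1 (PIT on one CPU, IPI on all the others)
 */
static double timer_irqs_per_cpu_per_tick(int ncpus, int hrt_patch)
{
    if (ncpus <= 1 || hrt_patch)
        return 1.0;
    return (double)(ncpus + 1) / (double)ncpus;
}

/* Scale a measured UP run time (in seconds) by that factor. */
static double predicted_seconds(double up_seconds, int ncpus, int hrt_patch)
{
    return up_seconds * timer_irqs_per_cpu_per_tick(ncpus, hrt_patch);
}
```

Plugging in the measured UP real time of 4m5.495s (245.495 s) with two CPUs
gives a factor of 1.5 and about 368.24 s, roughly 6m8s, against the measured
6m24.485s on SMP.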


> 
> George I am just working on your suggestion, let me know if it will work
> for SMPs.

See above.  Should solve your problem.
> 
> If there is some good implementation for SMP, please let me know.
> 
> Thanks,
> 
> - Puneet
> 
> 
> 
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/

