* Needed faster implementation of do_gettimeofday()
@ 2005-02-20 10:58 puneet_kaushik
2005-02-20 15:48 ` Parag Warudkar
0 siblings, 1 reply; 6+ messages in thread
From: puneet_kaushik @ 2005-02-20 10:58 UTC (permalink / raw)
To: linux-kernel
Hello all,
I am running oprofile on some program. Following is the oprofile output.
-----------------------------------------------------------------------
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples   %        app name        symbol name
985913    8.6083   vmlinux         mark_offset_tsc
584473    5.1032   libc-2.3.2.so   getc
295901    2.5836   vmlinux         ide_outb
270823    2.3646   vmlinux         _spin_lock
249791    2.1810   vmlinux         _spin_unlock
236140    2.0618   vmlinux         timer_interrupt
175249    1.5302   ld-2.3.2.so     do_lookup_versioned
140429    1.2261   sendmail        putc
138739    1.2114   sendmail        stabhash
134145    1.1713   sendmail        getc
-----------------------------------------------------------------------
From this output, my analysis is that mark_offset_tsc (which is
called from do_gettimeofday) and some other timer functions are taking
the largest share of CPU time.
Is there a faster implementation of do_gettimeofday? I am using kernel
2.6.10 on a dual P4.
What I found from google search is: http://lwn.net/Articles/9266/ , which
is only for kernel 2.4
Thanks for help.
-Puneet
* Re: Needed faster implementation of do_gettimeofday()
2005-02-20 10:58 Needed faster implementation of do_gettimeofday() puneet_kaushik
@ 2005-02-20 15:48 ` Parag Warudkar
2005-02-22 3:06 ` George Anzinger
0 siblings, 1 reply; 6+ messages in thread
From: Parag Warudkar @ 2005-02-20 15:48 UTC (permalink / raw)
To: puneet_kaushik; +Cc: linux-kernel
On Sunday 20 February 2005 05:58 am, puneet_kaushik@persistent.co.in wrote:
> 985913 8.6083 vmlinux mark_offset_tsc
> 584473 5.1032 libc-2.3.2.so getc

What makes you think mark_offset_tsc is slow? Do you have any comparative
numbers? It might just be that the workload you are throwing at it justifies
it. (For e.g. if your workload does a zillion system calls, system_call will
show up as a hot spot in oprofile - doesn't necessarily mean it is slow -
it's just overused.) Can you post the relevant code?

Parag
* Re: Needed faster implementation of do_gettimeofday()
2005-02-20 15:48 ` Parag Warudkar
@ 2005-02-22 3:06 ` George Anzinger
2005-02-22 13:56 ` Puneet Kaushik
0 siblings, 1 reply; 6+ messages in thread
From: George Anzinger @ 2005-02-22 3:06 UTC (permalink / raw)
To: Parag Warudkar; +Cc: puneet_kaushik, linux-kernel
Parag Warudkar wrote:
> On Sunday 20 February 2005 05:58 am, puneet_kaushik@persistent.co.in wrote:
>
>>985913 8.6083 vmlinux mark_offset_tsc
>>584473 5.1032 libc-2.3.2.so getc
>
> What makes you think mark_offset_tsc is slow? Do you have any comparative
> numbers? It might just be that the workload you are throwing at it justifies
> it. (For e.g. if your workload does a zillion system calls, system_call will
> show up as a hot spot in oprofile - doesn't necessarily mean it is slow -
> it's just overused.) Can you post the relevant code?

He really is right. Mark offset is reading the PIT counter, and that is not
only rather dumb but dog slow.

A suggestion: try the high res timers patch. Even if you don't use the timers,
the mark offset there is MUCH faster. It does not read the PIT.

The difference is where in time we assume the jiffie bump happens. If we
assume it is at the point that the PIT interrupts, then the only way to get
to that point is to read the PIT. If, on the other hand, we assume it is at
the time after the interrupt where we mark offset, we can observe the "best"
time for this event based on the TSC and avoid reading the PIT.

Try the HRT patch (see signature below) and see if it doesn't do better.
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: Needed faster implementation of do_gettimeofday()
2005-02-22 3:06 ` George Anzinger
@ 2005-02-22 13:56 ` Puneet Kaushik
2005-02-22 15:46 ` Chris Friesen
2005-02-22 16:44 ` George Anzinger
0 siblings, 2 replies; 6+ messages in thread
From: Puneet Kaushik @ 2005-02-22 13:56 UTC (permalink / raw)
To: george, kernel-stuff; +Cc: linux-kernel
Hello Parag and George,

Thanks for the immediate reply.
The main problem is that I am working on an SMP system. I have written a small
program that just calls gettimeofday() one billion times. I have run it
under the time utility, and it takes almost double the time on SMP than on
UP.

With kernel 2.6.10 on UP

real 4m5.495s
user 1m17.088s
sys 2m48.046s

With kernel 2.6.10 on SMP

real 6m24.485s
user 1m43.723s
sys 4m30.749s

And the fact is that this SMP machine is faster and has more memory than the
UP one. On SMP systems it takes a spinlock every time it is called,
synchronizes both processors, and then unlocks. That's all I know
about it.

George, I am trying your suggestion now; let me know whether it will work
on SMP.

If there is some good implementation for SMP, please let me know.

Thanks,

- Puneet
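
The test program itself was not posted. A minimal sketch of such a benchmark, assuming nothing beyond a tight loop around gettimeofday(), could look like this; the function name `time_gettimeofday_calls` is made up for illustration and is not the poster's code.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Call gettimeofday() `iterations` times in a tight loop and return the
 * elapsed wall-clock time in microseconds. Hypothetical helper, written
 * only to illustrate the kind of test described in the message.
 */
static long long time_gettimeofday_calls(unsigned long iterations)
{
    struct timeval start, end, scratch;

    gettimeofday(&start, NULL);
    for (unsigned long i = 0; i < iterations; i++)
        gettimeofday(&scratch, NULL);   /* the call being measured */
    gettimeofday(&end, NULL);

    /* Combine the seconds and microseconds fields into one delta. */
    return (end.tv_sec - start.tv_sec) * 1000000LL
         + (end.tv_usec - start.tv_usec);
}
```

Calling `time_gettimeofday_calls(1000000000UL)` from `main()` and running the binary under time(1) would approximate the described test. On current x86-64 Linux, glibc usually serves gettimeofday() through the vDSO without entering the kernel, so results are unlikely to be comparable with the 2.6.10 figures.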
* Re: Needed faster implementation of do_gettimeofday()
2005-02-22 13:56 ` Puneet Kaushik
@ 2005-02-22 15:46 ` Chris Friesen
2005-02-22 16:44 ` George Anzinger
1 sibling, 0 replies; 6+ messages in thread
From: Chris Friesen @ 2005-02-22 15:46 UTC (permalink / raw)
To: Puneet Kaushik; +Cc: george, kernel-stuff, linux-kernel
Puneet Kaushik wrote:
> Hello Parag and George,
>
> Thanks for immediate reply.
> The main problem is I am working on a SMP system. I have written a small
> program that just calls the gettimeofday(), one billion times. I have
> run it with time utility and it takes almost double time on SMP then a
> UP.

If the hardware is known in advance, can you use some arch-specific thing
(like rdtsc on intel) to get a timestamp that can then be calibrated by
calling gettimeofday() at a lower frequency?

There will be issues (may have to use cpu affinity if the two don't run at
the same rate, may need to disable any frequency stepping), but it might be
possible to work around them.

Chris
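
Chris's suggestion, reading the TSC directly and only occasionally calibrating against gettimeofday(), might look roughly like the sketch below. It assumes GCC or Clang on x86 (for `__rdtsc()` from `<x86intrin.h>`), a TSC that ticks at a constant rate, and a thread that stays on one CPU; `struct tsc_clock` and the function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <x86intrin.h>  /* __rdtsc(); assumption: GCC/Clang on x86 */

/* Hypothetical calibration state: one (TSC, wall time) pair plus a rate. */
struct tsc_clock {
    uint64_t base_tsc;        /* TSC value at the end of calibration */
    uint64_t base_usec;       /* gettimeofday() time at that moment */
    double   cycles_per_usec; /* measured TSC rate */
};

/* Wall-clock time in microseconds, taken from gettimeofday(). */
static uint64_t wall_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}

/* Measure the TSC rate over at least calib_usec (must be > 0). */
static void tsc_clock_calibrate(struct tsc_clock *c, uint64_t calib_usec)
{
    uint64_t t0 = wall_usec();
    uint64_t c0 = __rdtsc();
    uint64_t t1;

    do {
        t1 = wall_usec();
    } while (t1 - t0 < calib_usec);   /* busy-wait for the interval */

    uint64_t c1 = __rdtsc();
    c->cycles_per_usec = (double)(c1 - c0) / (double)(t1 - t0);
    c->base_tsc = c1;
    c->base_usec = t1;
}

/* Current time in microseconds, derived from the TSC alone. */
static uint64_t tsc_clock_now_usec(const struct tsc_clock *c)
{
    uint64_t cycles = __rdtsc() - c->base_tsc;
    return c->base_usec + (uint64_t)((double)cycles / c->cycles_per_usec);
}
```

A caller would run `tsc_clock_calibrate()` once (and again periodically to bound drift), then use `tsc_clock_now_usec()` on the hot path. The caveats from the message still apply: CPU affinity and frequency stepping are not handled here.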
* Re: Needed faster implementation of do_gettimeofday()
2005-02-22 13:56 ` Puneet Kaushik
2005-02-22 15:46 ` Chris Friesen
@ 2005-02-22 16:44 ` George Anzinger
1 sibling, 0 replies; 6+ messages in thread
From: George Anzinger @ 2005-02-22 16:44 UTC (permalink / raw)
To: Puneet Kaushik; +Cc: kernel-stuff, linux-kernel
Puneet Kaushik wrote:
> Hello Parag and George,
>
> Thanks for immediate reply.
> The main problem is I am working on a SMP system. I have written a small
> program that just calls the gettimeofday(), one billion times. I have
> run it with time utility and it takes almost double time on SMP then a
> UP.
>
> with kernel 2.6.10 on UP
>
> real 4m5.495s
> user 1m17.088s
> sys 2m48.046s
>
> With Kernel 2.6.10 on SMP
>
> real 6m24.485s
> user 1m43.723s
> sys 4m30.749s
>
> And the fact is this SMP machine is faster and with more memory than the
> UP one. In SMP systems it make a spinlock every time it got called,
> synchronizes both the processors, and unlock them. Thats all I know
> about it.

On 2.6 the lock is a r/w sequence lock. The machines are not synchronized or
locked, but some of the instructions in the sequence lock code are "locked".
I find it hard to believe that this would double the time, however.

Ah..., now I remember. On SMP x86 boxen, the accounting/run_timer interrupt
comes from the lapic timer. This is triggered every 1/HZ seconds and means
that there is an additional time keeping interrupt. Actually, over the box,
you get (N+1)*HZ interrupts per second, where N is the number of CPUs.
Assuming that the PIT and the lapic interrupt take about the same amount of
time and that the PIT interrupt is evenly distributed over the CPUs, the
interrupt contention should go from 1 to 1.5. This alone would take your
4m5.495s (about 4.09 min) UP time to about 6.14 min on an SMP box (that is
amazingly close to the 6m24.485s you are seeing, if you ask me).

Again, I recommend my HRT patch. There the accounting interrupt is generated
by an "all-but-self" IPI. This is generated by the PIT interrupt code, which
also does the accounting on the CPU handling the PIT interrupt. Result: total
time keeping interrupts of N*HZ per second, where N is the number of CPUs.

> George I am just working on your suggestion, let me know if it will work
> for SMPs.

See above. Should solve your problem.

> If there is some good implementation for SMP, please let me know.
>
> Thanks,
>
> - Puneet
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
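
The interrupt arithmetic in this reply can be checked with a few lines of C. This is only a sketch of the simple model stated in the message (equal cost for PIT and lapic interrupts, PIT interrupts spread evenly over the CPUs); the function names are hypothetical.

```c
#include <stdio.h>

/*
 * Relative per-CPU time keeping interrupt load on stock 2.6 SMP, per the
 * model in the message: N lapic timer interrupts plus one PIT interrupt
 * per tick, spread over N CPUs. A UP box (PIT only) is the baseline 1.0.
 */
static double stock_smp_load(unsigned ncpus)
{
    return (ncpus + 1.0) / ncpus;
}

/* With the HRT patch as described: N interrupts per tick over N CPUs. */
static double hrt_smp_load(unsigned ncpus)
{
    return (double)ncpus / ncpus;
}

/* Convert a time(1) style "XmY.YYYs" reading to seconds. */
static double to_seconds(unsigned minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Print predicted and measured SMP/UP ratios for the posted timings. */
static void print_comparison(void)
{
    double up  = to_seconds(4, 5.495);   /* "real" on UP  */
    double smp = to_seconds(6, 24.485);  /* "real" on SMP */

    printf("predicted ratio (2 CPUs): %.3f\n", stock_smp_load(2));
    printf("measured ratio:           %.3f\n", smp / up);
    printf("predicted SMP time:       %.1f s\n", up * stock_smp_load(2));
}
```

For the posted figures the model predicts a factor of 1.5, while the measured wall-clock ratio is roughly 1.57, so under these assumptions the extra interrupt would account for most, though not necessarily all, of the gap.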