netdev.vger.kernel.org archive mirror
* gettimeofday scalability
@ 2004-10-05 16:36 P
  2004-10-05 18:35 ` David S. Miller
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: P @ 2004-10-05 16:36 UTC (permalink / raw)
  To: netdev; +Cc: Ingo Molnar, Andrea Arcangeli

I'm starting to look again at the performance of my packet sniffer.
Any performance tips are appreciated (I'm using irq affinity and
CONFIG_PACKET_MMAP on 2.4.20 on a dual P4 xeon at present).

In particular I was wondering about reducing the overhead of
calling do_gettimeofday.

I noticed in the following paper that the xeon is much less
efficient than the P3 for gettimeofday (for the syscall at least):
http://www.labs.fujitsu.com/en/techinfo/linux/lse-0211/lse-0211.pdf

I've seen various gettimeofday locking speedup patches floating
around for 2.4. There is a version from Stephen and Andrea
that uses frlock, claiming 18%, and one from ingo that uses brlock.
2.6.8.1 uses seqlock, which contains the comment
that it's not as cache friendly as brlock.

So can anyone summarise the relative merits of these locking
mechanisms, before I start benchmarking?

thanks,
Pádraig.


* Re: gettimeofday scalability
  2004-10-05 16:36 gettimeofday scalability P
@ 2004-10-05 18:35 ` David S. Miller
  2004-10-05 18:48 ` Stephen Hemminger
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: David S. Miller @ 2004-10-05 18:35 UTC (permalink / raw)
  To: P; +Cc: netdev, mingo, andrea

On Tue, 05 Oct 2004 17:36:06 +0100
P@draigBrady.com wrote:

> So can anyone summarise the relative merits of these locking
> mechanisms, before I start benchmarking?

Seq locks plus the timer interpolator layer found in 2.6.x kernels
is the most scalable gettimeofday() implementation currently.


* Re: gettimeofday scalability
  2004-10-05 16:36 gettimeofday scalability P
  2004-10-05 18:35 ` David S. Miller
@ 2004-10-05 18:48 ` Stephen Hemminger
  2004-10-05 18:55 ` Andrea Arcangeli
  2004-10-05 19:18 ` Ingo Molnar
  3 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2004-10-05 18:48 UTC (permalink / raw)
  To: P; +Cc: netdev, Ingo Molnar, Andrea Arcangeli

On Tue, 2004-10-05 at 17:36 +0100, P@draigBrady.com wrote:
> I'm starting to look again at the performance of my packet sniffer.
> Any performance tips are appreciated (I'm using irq affinity and
> CONFIG_PACKET_MMAP on 2.4.20 on a dual P4 xeon at present).
> 
> In particular I was wondering about reducing the overhead of
> calling do_gettimeofday.
> 
> I noticed in the following paper that the xeon is much less
> efficient than the P3 for gettimeofday (for the syscall at least):
> http://www.labs.fujitsu.com/en/techinfo/linux/lse-0211/lse-0211.pdf
> 
> I've seen various gettimeofday locking speedup patches floating
> around for 2.4. There is a version from Stephen and Andrea
> that uses frlock, claiming 18%, and one from ingo that uses brlock.
> 2.6.8.1 uses seqlock, which contains the comment
> that it's not as cache friendly as brlock.

Don't bother doing new work on 2.4. Look at 2.6.
You could read the TSC in user space, but you won't get absolute
times, and you run into all the portability and possible clock speed
change issues.
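
A minimal sketch of the user-space TSC idea, for relative times only.
It assumes an x86 CPU and a GCC/Clang toolchain that provides
`__rdtsc()` in `<x86intrin.h>`. The helper names `read_cycles` and
`cycles_to_usec` are hypothetical, and `tsc_hz` is an assumed,
separately calibrated tick rate, which is exactly the value that goes
stale when the clock speed changes.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Read the raw time stamp counter. The value is only meaningful
 * relative to another reading taken on the same CPU. */
static inline uint64_t read_cycles(void)
{
    return __rdtsc();
}
#endif

/* Convert a cycle delta to microseconds. tsc_hz is an assumption:
 * it has to be calibrated elsewhere and is wrong after a speed change.
 * The division is split so the multiplication cannot overflow for
 * realistic tick rates. Returns 0 if tsc_hz is 0. */
static inline uint64_t cycles_to_usec(uint64_t cycles, uint64_t tsc_hz)
{
    if (tsc_hz == 0)
        return 0;
    return (cycles / tsc_hz) * 1000000ULL
         + (cycles % tsc_hz) * 1000000ULL / tsc_hz;
}
```

Typical use would be to take `read_cycles()` before and after the
interval of interest and pass the difference to `cycles_to_usec()`.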

> So can anyone summarise the relative merits of these locking
> mechanisms, before I start benchmarking?
> 
> thanks,
> Pádraig.


* Re: gettimeofday scalability
  2004-10-05 16:36 gettimeofday scalability P
  2004-10-05 18:35 ` David S. Miller
  2004-10-05 18:48 ` Stephen Hemminger
@ 2004-10-05 18:55 ` Andrea Arcangeli
  2004-10-05 19:16   ` P
  2004-10-05 19:18 ` Ingo Molnar
  3 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2004-10-05 18:55 UTC (permalink / raw)
  To: P; +Cc: netdev, Ingo Molnar

On Tue, Oct 05, 2004 at 05:36:06PM +0100, P@draigBrady.com wrote:
> I've seen various gettimeofday locking speedup patches floating
> around for 2.4. There is a version from Stephen and Andrea
> that uses frlock, claiming 18%, and one from ingo that uses brlock.
> 2.6.8.1 uses seqlock, which contains the comment
> that it's not as cache friendly as brlock.

seqlock and frlock are the same thing. I don't see how the brlock can
work well, given that you have to take it in write mode at every
timer irq. Maybe it works on a 2-way, surely not on anything bigger.
brlock should be totally replaced by RCU anyway. brlock can also starve
the writer, which makes it a security DoS (at least on some
architectures; there were two implementations, maybe one is safe).

> So can anyone summarise the relative merits of these locking
> mechanisms, before I start benchmarking?

frlock/seqlock (2.4/2.6 respectively) is the way to go, no write
starvation, and zero cacheline bouncing. 
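
To illustrate why this scheme has no write starvation and no reader
cacheline bouncing, here is a user-space sketch of the idea using C11
atomics. It is not the kernel's seqlock.h: the names `time_seq`,
`time_seq_write` and `time_seq_read` are hypothetical, a single writer
is assumed, and sequentially consistent atomics are used for simplicity
where a real implementation would use cheaper barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sequence counter plus the protected time value.
 * Even seq: data is stable. Odd seq: a write is in progress. */
struct time_seq {
    atomic_uint seq;
    _Atomic int64_t sec;
    _Atomic int64_t usec;
};

/* Writer (assumed to be the only one, e.g. the timer tick).
 * It never waits for readers, so it cannot be starved. */
static void time_seq_write(struct time_seq *t, int64_t sec, int64_t usec)
{
    atomic_fetch_add(&t->seq, 1);   /* seq becomes odd */
    atomic_store(&t->sec, sec);
    atomic_store(&t->usec, usec);
    atomic_fetch_add(&t->seq, 1);   /* seq becomes even again */
}

/* Reader: only loads shared memory, so readers do not bounce the
 * cacheline between CPUs. It retries if a write overlapped the read. */
static void time_seq_read(struct time_seq *t, int64_t *sec, int64_t *usec)
{
    unsigned before, after;

    do {
        before = atomic_load(&t->seq);
        *sec = atomic_load(&t->sec);
        *usec = atomic_load(&t->usec);
        after = atomic_load(&t->seq);
    } while ((before & 1u) || before != after);
}
```

The cost of this design is on the reader side: a reader may have to
retry when a write lands in the middle of its read, which is cheap when
writes are rare and periodic, as with a timer tick.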

Upgrade to x86-64; there I implemented gettimeofday with vsyscalls,
which also avoids entering and exiting the kernel. That becomes by far
the biggest overhead once seqlock is in use (the speedup is tenfold or
so).

Hope this helps.


* Re: gettimeofday scalability
  2004-10-05 18:55 ` Andrea Arcangeli
@ 2004-10-05 19:16   ` P
  2004-10-05 19:35     ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: P @ 2004-10-05 19:16 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: netdev, Ingo Molnar

Andrea Arcangeli wrote:
> On Tue, Oct 05, 2004 at 05:36:06PM +0100, P@draigBrady.com wrote:
> 
>>So can anyone summarise the relative merits of these locking
>>mechanisms, before I start benchmarking?
> 
> 
> frlock/seqlock (2.4/2.6 respectively) is the way to go, no write
> starvation, and zero cacheline bouncing. 

Cheers.

Perhaps the confusing comment wrt brlock at the
top of seqlock.h can be changed so?

> upgrade to x86-64, there I implemented gettimeofday with vsyscalls which
> also avoids entering exiting userspace which becomes the by far biggest
> overhead after using seqlock. (speedup is tenfold or so)

This is all in kernel space.
However Stephen's suggestion of reading the tsc in user space
may be a runner, as I just care about relative times.

thanks guys!

Pádraig.


* Re: gettimeofday scalability
  2004-10-05 16:36 gettimeofday scalability P
                   ` (2 preceding siblings ...)
  2004-10-05 18:55 ` Andrea Arcangeli
@ 2004-10-05 19:18 ` Ingo Molnar
  3 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2004-10-05 19:18 UTC (permalink / raw)
  To: P; +Cc: netdev, Andrea Arcangeli


* P@draigBrady.com <P@draigBrady.com> wrote:

> In particular I was wondering about reducing the overhead of
> calling do_gettimeofday.

> 2.6.8.1 uses seqlock, which contains the comment that it's not as
> cache friendly as brlock.

that comment is way too modest! Seqlocks are very cache-friendly in the
read path. There is no reason to use brlocks anymore for fixed-frequency
writers like the timer seqlock. (writers can starve seqlock readers but
in the timer case the writers occur only once every 1 msec.)

so please benchmark 2.6, it should scale linearly in this area.
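
A rough way to get a first number is to time a tight loop of calls from
user space. This is only a sketch: it measures the gettimeofday() call
path in one thread, not do_gettimeofday() inside the kernel, and the
function name `gettimeofday_cost_ns` is hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* Average wall-clock cost of one gettimeofday() call, in nanoseconds,
 * measured over `calls` back-to-back calls. Returns 0 for calls <= 0. */
static double gettimeofday_cost_ns(long calls)
{
    struct timeval start, end, scratch;
    double elapsed_us;
    long i;

    if (calls <= 0)
        return 0.0;

    gettimeofday(&start, NULL);
    for (i = 0; i < calls; i++)
        gettimeofday(&scratch, NULL);
    gettimeofday(&end, NULL);

    elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6
               + (double)(end.tv_usec - start.tv_usec);
    return elapsed_us * 1000.0 / (double)calls;
}
```

Running one such loop per CPU at the same time and comparing the
per-call cost against the single-thread figure gives a rough picture of
how well the call scales.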

	Ingo


* Re: gettimeofday scalability
  2004-10-05 19:16   ` P
@ 2004-10-05 19:35     ` Andrea Arcangeli
  0 siblings, 0 replies; 7+ messages in thread
From: Andrea Arcangeli @ 2004-10-05 19:35 UTC (permalink / raw)
  To: P; +Cc: netdev, Ingo Molnar

On Tue, Oct 05, 2004 at 08:16:33PM +0100, P@draigBrady.com wrote:
> Andrea Arcangeli wrote:
> >On Tue, Oct 05, 2004 at 05:36:06PM +0100, P@draigBrady.com wrote:
> >
> >>So can anyone summarise the relative merits of these locking
> >>mechanisms, before I start benchmarking?
> >
> >
> >frlock/seqlock (2.4/2.6 respectively) is the way to go, no write
> >starvation, and zero cacheline bouncing. 
> 
> Cheers.
> 
> Perhaps the confusing comment wrt brlock at the
> top of seqlock.h can be changed so?

I guess so.

> This is all in kernel space.

vsyscalls are in userspace, but you will not notice the difference.

Or do you mean your code is in kernel space? vsyscalls would run from
kernel space too, but then you can use gettimeofday by hand with seqlock
and it won't be any different.

> However Stephen's suggestion of reading the tsc in user space
> may be a runner, as I just care about relative times.

Stephen's suggestion will break your app on SMP machines with
asymmetric TSCs if you're not careful. If you use the vsyscall on a
correct kernel (like x86-64) you won't take that risk (HPET avoids
that; it is slower than the TSC but the only safe one). Otherwise you
have to use process affinity + TSC; only with CPU binding are you safe.
Some big apps use the TSC, but only optionally, so you can turn it
on/off depending on the hardware (on x86-64 this is obsoleted by
vgettimeofday; the TSC trick is an x86-only hack).
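
The CPU binding mentioned above can be sketched on Linux with
sched_getaffinity() and sched_setaffinity(). The helper name
`pin_to_first_allowed_cpu` is hypothetical, and picking the first
allowed CPU is an arbitrary choice made for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Bind the calling thread to the first CPU it is currently allowed to
 * run on, so that successive TSC reads come from the same CPU.
 * Returns the CPU number on success, -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;
    int cpu;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

With the thread bound like this, TSC deltas are at least taken on one
CPU; frequency changes on that CPU remain a separate problem.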

hope this helps ;)

