Re: rps perfomance WAS(Re: rps: question

All of lore.kernel.org
 help / color / mirror / Atom feed

From: jamal <hadi@cyberus.ca>
To: Andi Kleen <andi@firstfloor.org>
Cc: Changli Gao <xiaosuo@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Rick Jones <rick.jones2@hp.com>,
	David Miller <davem@davemloft.net>,
	therbert@google.com, netdev@vger.kernel.org, robert@herjulf.net
Subject: Re: rps perfomance WAS(Re: rps: question
Date: Fri, 16 Apr 2010 09:58:46 -0400	[thread overview]
Message-ID: <1271426326.4606.83.camel@bigi> (raw)
In-Reply-To: <20100416133707.GZ18855@one.firstfloor.org>

On Fri, 2010-04-16 at 15:37 +0200, Andi Kleen wrote:
> On Fri, Apr 16, 2010 at 09:27:35AM -0400, jamal wrote:

> > So you are saying that the old implementation of IPI (likely what i
> > tried pre-napi and as recent as 2-3 years ago) was bad because of a
> > single lock?
> 
> Yes.

> The old implementation of smp_call_function. Also in the really old
> days there was no smp_call_function_single() so you tended to broadcast.
> 
> Jens did a lot of work on this for his block device work IPI implementation.

Nice - thanks for that info! So not only has h/ware improved, but
implementation as well..

> > On IPIs:
> > Is anyone familiar with what is going on with Nehalem? Why is it this
> > good? I expect things will get a lot nastier with other hardware like
> > xeon based or even Nehalem with rps going across QPI.
> 
> Nehalem is just fast. I don't know why it's fast in your specific
> case. It might be simply because it has lots of bandwidth everywhere.
> Atomic operations are also faster than on previous Intel CPUs.

Well, the cache architecture is nicer. The on-die MC is nice. No more
shared MC hub/FSB. The 3 MC channels are nice. Intel finally beating
AMD ;-> someone did a measurement of the memory timings (L1, L2, L3, MM
and the results were impressive; i have the numbers somewhere).

> 
> > Here's why i think IPIs are bad, please correct me if i am wrong:
> > - they are synchronous. i.e an IPI issuer has to wait for an ACK (which
> > is in the form of an IPI).
> 
> In the hardware there's no ack, but in the Linux implementation there
> is usually (because need to know when to free the stack state used
> to pass information)
>
> However there's also now support for queued IPI
> with a special API (I believe Tom is using that)
> 

Which is the non-queued-IPI call?

> > - data cache has to be synced to main memory
> > - the instruction pipeline is flushed
> 
> At least on Nehalem data transfer can be often through the cache.

I thought you have to go all the way to MM in case of IPIs.

> IPIs involve APIC accesses which are not very fast (so overall
> it's far more than a pipeline worth of work), but it's still
> not a incredible expensive operation.
> 
> There's also X2APIC now which should be slightly faster, but it's 
> likely not in your Nehalem (this is only in the highend Xeon versions)
> 

Ok, true - forgot about the APIC as well...

> > Do you know any specs i could read up which will tell me a little more?
> 
> If you're just interested in IPI and cache line transfer performance it's
> probably best to just measure it.

There are tools like benchit which would give me L1,2,3,MM measurements;
for IPI the ping + rps test i did maybe sufficient.

> Some general information is always in the Intel optimization guide.

Thanks Andi!

cheers,
jamal

next prev parent reply	other threads:[~2010-04-16 13:58 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-07 18:42 rps: question jamal
2010-02-08  5:58 ` Tom Herbert
2010-02-08 15:09   ` jamal
2010-04-14 11:53     ` rps perfomance WAS(Re: " jamal
2010-04-14 17:31       ` Tom Herbert
2010-04-14 18:04         ` Eric Dumazet
2010-04-14 18:53           ` jamal
2010-04-14 19:44             ` Stephen Hemminger
2010-04-14 19:58               ` Eric Dumazet
2010-04-15  8:51                 ` David Miller
2010-04-14 20:22               ` jamal
2010-04-14 20:27                 ` Eric Dumazet
2010-04-14 20:38                   ` jamal
2010-04-14 20:45                   ` Tom Herbert
2010-04-14 20:57                     ` Eric Dumazet
2010-04-14 22:51                       ` Changli Gao
2010-04-14 23:02                         ` Stephen Hemminger
2010-04-15  2:40                           ` Eric Dumazet
2010-04-15  2:50                             ` Changli Gao
2010-04-15  8:57                       ` David Miller
2010-04-15 12:10                       ` jamal
2010-04-15 12:32                         ` Changli Gao
2010-04-15 12:50                           ` jamal
2010-04-15 23:51                             ` Changli Gao
2010-04-15  8:51                 ` David Miller
2010-04-14 20:34               ` Andi Kleen
2010-04-15  8:50               ` David Miller
2010-04-15  8:48             ` David Miller
2010-04-15 11:55               ` jamal
2010-04-15 16:41                 ` Rick Jones
2010-04-15 20:16                   ` jamal
2010-04-15 20:25                     ` Rick Jones
2010-04-15 23:56                     ` Changli Gao
2010-04-16  5:18                       ` Eric Dumazet
2010-04-16  6:02                         ` Changli Gao
2010-04-16  6:28                           ` Tom Herbert
2010-04-16  6:32                           ` Eric Dumazet
2010-04-16 13:42                             ` jamal
2010-04-16  7:15                           ` Andi Kleen
2010-04-16 13:27                             ` jamal
2010-04-16 13:37                               ` Andi Kleen
2010-04-16 13:58                                 ` jamal [this message]
2010-04-16 13:21                         ` jamal
2010-04-16 13:34                           ` Changli Gao
2010-04-16 13:49                             ` jamal
2010-04-16 14:10                               ` Changli Gao
2010-04-16 14:43                                 ` jamal
2010-04-16 14:58                                   ` Changli Gao
2010-04-19 12:48                                     ` jamal
2010-04-17  7:35                           ` Eric Dumazet
2010-04-17  8:43                             ` Tom Herbert
2010-04-17  9:23                               ` Eric Dumazet
2010-04-17 14:27                                 ` Eric Dumazet
2010-04-17 17:26                                   ` Tom Herbert
2010-04-17 14:17                               ` [PATCH net-next-2.6] net: remove time limit in process_backlog() Eric Dumazet
2010-04-18  9:36                                 ` David Miller
2010-04-17 17:31                             ` rps perfomance WAS(Re: rps: question jamal
2010-04-18  9:39                               ` Eric Dumazet
2010-04-18 11:34                                 ` Eric Dumazet
2010-04-19  2:09                                   ` jamal
2010-04-19  9:37                                   ` [RFC] rps: shortcut net_rps_action() Eric Dumazet
2010-04-19  9:48                                     ` Changli Gao
2010-04-19 12:14                                       ` Eric Dumazet
2010-04-19 12:28                                         ` Changli Gao
2010-04-19 13:27                                           ` Eric Dumazet
2010-04-19 14:22                                             ` Eric Dumazet
2010-04-19 15:07                                               ` [PATCH net-next-2.6] " Eric Dumazet
2010-04-19 16:02                                                 ` Tom Herbert
2010-04-19 20:21                                                 ` David Miller
2010-04-20  7:17                                                   ` [PATCH net-next-2.6] rps: cleanups Eric Dumazet
2010-04-20  8:18                                                     ` David Miller
2010-04-19 23:56                                                 ` [PATCH net-next-2.6] rps: shortcut net_rps_action() Changli Gao
2010-04-20  0:32                                                   ` Changli Gao
2010-04-20  5:55                                                     ` Eric Dumazet
2010-04-20 12:02                                   ` rps perfomance WAS(Re: rps: question jamal
2010-04-20 13:13                                     ` Eric Dumazet
     [not found]                                       ` <1271853570.4032.21.camel@bigi>
2010-04-21 19:01                                         ` Eric Dumazet
2010-04-22  1:27                                           ` Changli Gao
2010-04-22 12:12                                           ` jamal
2010-04-25  2:31                                             ` Changli Gao
2010-04-26 11:35                                               ` jamal
2010-04-26 13:35                                                 ` Changli Gao
2010-04-21 21:53                                         ` Rick Jones
2010-04-16 15:57             ` Tom Herbert
2010-04-14 18:53       ` Stephen Hemminger
2010-04-15  8:42       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1271426326.4606.83.camel@bigi \
    --to=hadi@cyberus.ca \
    --cc=andi@firstfloor.org \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=rick.jones2@hp.com \
    --cc=robert@herjulf.net \
    --cc=therbert@google.com \
    --cc=xiaosuo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.