From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: netdev@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
herbert@gondor.apana.org.au, jesse.brandeburg@intel.com,
shemminger@vyatta.com, David Miller <davem@davemloft.net>
Subject: Re: [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer
Date: Fri, 13 Mar 2009 17:06:47 +0800
Message-ID: <1236935207.2567.559.camel@ymzhang>
In-Reply-To: <20090312143427.GJ11935@one.firstfloor.org>
On Thu, 2009-03-12 at 15:34 +0100, Andi Kleen wrote:
> On Thu, Mar 12, 2009 at 04:16:32PM +0800, Zhang, Yanmin wrote:
> >
> > > Seems very inconvenient to have to configure this by hand.
> > A little, but not too much, especially when we consider there is interrupt binding.
>
> Interrupt binding is something popular for benchmarks, but most users
> don't (and shouldn't need to) care. Having it work well out of the box
> without special configuration is very important.
Thanks, Andi. You're right. Now I understand why David Miller is working
on automatic TX queue selection.
One thing I want to clarify: with the default configuration, the processing path
still follows the current automatic selection. That means my method has little impact
on the default configuration, apart from a small cache miss.
The other exception is IXGBE, which prefers to get one packet and send it on
immediately instead of using the backlog.
Even when the new capability to separate packet receiving from packet processing is
turned on, TX selection still follows the current automatic selection. The only
difference is that a different cpu processes the packet. The driver can still record
the RX queue number in the skb, to be used when the packet is sent out.
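The claim that TX selection is unaffected by which cpu processes the packet can be
illustrated with a small model. This is a hypothetical Python sketch, not kernel code:
`SketchSkb`, `record_rx_queue` and `select_tx_queue` are illustrative names, and folding
the recorded RX queue number into the TX range with a modulo is an assumption about
how the recorded value might be used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchSkb:
    """Hypothetical stand-in for a socket buffer; only the fields this sketch needs."""
    flow_hash: int                  # fallback hash when no RX queue was recorded
    rx_queue: Optional[int] = None  # RX queue number recorded by the driver, if any


def record_rx_queue(skb: SketchSkb, rxq: int) -> None:
    """Driver side: remember which RX queue the packet arrived on."""
    skb.rx_queue = rxq


def select_tx_queue(skb: SketchSkb, num_tx_queues: int) -> int:
    """Pick a TX queue using only data carried in the skb.

    Nothing here depends on the current cpu, so the result is the same
    whether the collecting cpu or a separate processing cpu sends the packet.
    """
    if skb.rx_queue is not None:
        return skb.rx_queue % num_tx_queues  # fold the RX queue number into the TX range
    return skb.flow_hash % num_tx_queues


skb = SketchSkb(flow_hash=0x1234)
record_rx_queue(skb, 5)
print(select_tx_queue(skb, 4))  # prints 1
```

Because the queue choice is carried in the skb, moving the processing to another cpu
does not change which TX queue is used.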
>
> >
> > > How about
> > > auto selecting one that shares the same LLC or somesuch?
> > There are 2 kinds of LLC sharing here.
> > 1) RX/TX share the LLC;
> > 2) All RX share the LLC of some cpus and TX share the LLC of other cpus.
> >
> > Item 1) is important, but sometimes item 2) is also important when the sending speed is
> > very high and a huge amount of data is in flight, which flushes the cpu cache quickly.
> > It's hard to distinguish the 2 different scenarios automatically.
>
> Why is it hard if you know the CPUs?
RX binding depends entirely on interrupt binding. If the MSI-X interrupt is delivered to
cpu A, cpu A collects the packets on that RX queue. By default, the interrupt isn't bound.
Software does know which cpus share cpu A's LLC. But when cpu A receives the interrupt, it
can't simply hand packets to those cpus, because it doesn't know whether they are currently
collecting packets from other RX queues.
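To make the point concrete: on current Linux kernels the LLC sharing is visible in sysfs
under `/sys/devices/system/cpu/cpuN/cache/indexM/shared_cpu_list`, in the usual cpulist
format such as `0-3,8-11`. The sketch below assumes the LLC is cache `index3` (common on
x86, but not guaranteed; check that index's `level` file). The helper names are
hypothetical, and the last one shows the gap described above: the sibling set says nothing
about which of those cpus are already busy with other RX queues.

```python
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse the kernel's cpulist format (e.g. "0-3,8-11") into cpu numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def llc_siblings(cpu: int, llc_index: int = 3) -> set[int]:
    """Cpus sharing `cpu`'s cache at sysfs cache index `llc_index`.

    Assumption: index3 is the LLC; confirm via the index's `level` file.
    """
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache/index{llc_index}/shared_cpu_list")
    return parse_cpulist(path.read_text())


def handoff_candidates(cpu: int, siblings: set[int]) -> set[int]:
    """Other cpus sharing the LLC with `cpu`.

    Nothing here says whether those cpus are already collecting
    packets from other RX queues, which is the missing information.
    """
    return siblings - {cpu}
```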
>
> > > and just use the hash function on the
> > > NIC.
> > Sorry, I don't understand what the NIC's hash function is. Perhaps the NIC hardware has
> > something like a hash function that decides the RX queue number based on SRC/DST?
>
> There's a Microsoft spec for a standard hash function that does this
> on NICs and all the serious ones support it these days. The hash
> is normally used to select a MSI-X target based on the input header.
Thanks for the explanation. The capability defined by the spec is to choose an MSI-X
vector and to provide a hint when a cloned packet is sent out. Does the NIC know how
busy each cpu is? I assume not. So the hash tries to distribute packets evenly across
RX queues while also avoiding reordering.
One might argue that irqbalance can balance the workload, so we would expect the cpu
workload to be even. However, my testing shows that distributing packets evenly across
all cpus doesn't give good performance.
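The hash Andi describes is the Toeplitz hash used by RSS. A minimal sketch of it follows,
to show that only header fields feed the computation: every packet of a stream gets the
same hash and thus the same RX queue, and nothing about cpu load is involved. Assumptions:
the key is the sample 40-byte key widely used in RSS documentation and test suites (real
drivers program their own), and the 128-entry round-robin indirection table over 4 queues
is hypothetical.

```python
import ipaddress
import struct

# Sample RSS key commonly used in documentation and test suites; drivers use their own.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """For each set input bit, XOR in the 32-bit key window starting at that bit."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 31:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for bit_pos in range(len(data) * 8):
        if data[bit_pos // 8] & (0x80 >> (bit_pos % 8)):
            result ^= (key_int >> (key_bits - 32 - bit_pos)) & 0xFFFFFFFF
    return result


def ipv4_tcp_input(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Hash input in RSS order: src address, dst address, src port, dst port."""
    return (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!HH", sport, dport)
    )


def rx_queue_for(hash_value: int, indirection: list[int]) -> int:
    """With a power-of-two table, the modulo keeps the low-order hash bits,
    which index the indirection table that names the RX queue."""
    return indirection[hash_value % len(indirection)]


# Hypothetical 128-entry indirection table spreading flows over 4 RX queues.
INDIRECTION = [i % 4 for i in range(128)]

h = toeplitz_hash(RSS_KEY, ipv4_tcp_input("66.9.149.187", "161.142.100.80", 2794, 1766))
print(f"hash={h:#010x} rx_queue={rx_queue_for(h, INDIRECTION)}")
```

The inputs are header fields and a fixed key, so the hash can spread flows evenly and keep
each stream in order, but it cannot steer work away from a busy cpu.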
>
> I think if that works your manual target shouldn't be necessary.
My method has 2 targets: one is the packet-collecting cpu and the other is the
packet-processing cpu.
As the NIC doesn't know how busy each cpu is, why can't we separate the processing?
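The split between a packet-collecting cpu and a packet-processing cpu can be modelled
roughly as follows. This is a hypothetical Python sketch, not the patch itself:
`HandoffSketch`, the per-RX-queue target table and the batch hand-off are illustrative
assumptions. It shows that an unconfigured RX queue keeps today's behaviour (the
collecting cpu processes its own packets), while a configured one hands the whole
packet list to another cpu's backlog.

```python
from collections import defaultdict, deque
from typing import Optional


class HandoffSketch:
    """Hypothetical model of separating packet collection from packet processing."""

    def __init__(self) -> None:
        # RX queue -> processing cpu; a missing entry means "process locally",
        # so the default configuration keeps the existing path.
        self.process_cpu: dict[int, int] = {}
        # Per-cpu backlog of packet lists waiting to go to the upper layer.
        self.backlog: dict[int, deque] = defaultdict(deque)

    def configure(self, rx_queue: int, cpu: Optional[int]) -> None:
        """Bind an RX queue to a processing cpu, or pass None to restore the default."""
        if cpu is None:
            self.process_cpu.pop(rx_queue, None)
        else:
            self.process_cpu[rx_queue] = cpu

    def collect(self, collecting_cpu: int, rx_queue: int, packets: list) -> int:
        """Queue one collected batch and return the cpu that will process it."""
        target = self.process_cpu.get(rx_queue, collecting_cpu)
        # Hand off the whole list at once rather than packet by packet.
        self.backlog[target].append(list(packets))
        return target
```

Handing off a list per batch, rather than one packet at a time, keeps the per-packet
cost on the collecting cpu low, which matters when a 10G NIC is feeding it.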
>
> > > The trick here would
> > > be to try to avoid reordering inside streams as far as possible,
> > It's not meant to solve the reordering issue. The starting point is that a 10G NIC is very fast. We need some cpu
>
> Point was that any solution shouldn't add more reordering. But when a RSS
> hash is used there is no reordering on stream basis.
Yes.
Thanks again.
Yanmin