netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: netdev@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	herbert@gondor.apana.org.au, jesse.brandeburg@intel.com,
	shemminger@vyatta.com, David Miller <davem@davemloft.net>
Subject: Re: [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer
Date: Thu, 12 Mar 2009 16:16:32 +0800	[thread overview]
Message-ID: <1236845792.2567.484.camel@ymzhang> (raw)
In-Reply-To: <877i2wfh1l.fsf@basil.nowhere.org>

On Wed, 2009-03-11 at 12:13 +0100, Andi Kleen wrote:
> "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> writes:
> 
> > I got some comments. Special thanks to Stephen Hemminger for teaching me on
> > what reorder is and some other comments. Also thank other guys who raised comments.
> 
> 
> >
> > v2 has some improvements.
> > 1) Add new sysfs interface /sys/class/net/ethXXX/rx_queueXXX/processing_cpu. Admin
> > could use it to configure the binding between RX and cpu number. So it's convenient
> > for drivers to use the new capability.
> 

> Seems very inconvenient to have to configure this by hand.
A little, but not too much, especially when we consider there is interrupt binding.

>  How about
> auto selecting one that shares the same LLC or somesuch?
There are 2 kinds of LLC sharing here.
1) RX/TX share the LLC;
2) All RX share the LLC of some cpus and TX share the LLC of other cpus.

Item 1) is important, but sometimes item 2) is also important when the sending speed is
very high and huge data is on flight which flushes cpu cache quickly.
It's hard to distinguish the 2 different scenarioes automatically.

>  Passing
> data to anything with the same LLC should be cheap enough.
Yes, when the data isn't huge. My forwarding testing currently could reach at 270M bytes per
second on Nehalem and I wish higher if I could get the latest NICs.


> BTW the standard idea to balance processing over multiple CPUs was to
> use MSI-X to multiple CPUs.
Yes. My method still depends on MSI-X and multi-queue. One difference is I just need less than
CPU_NUM interrupt numbers as there are only some cpus working on packet receiving.

>  and just use the hash function on the
> NIC.
Sorry. I can't understand what the hash function of NIC is. Perhaps NIC hardware has something
like hash function to decide the RX queue number based on SRC/DST?

>  Have you considered this for forwarding too?
Yes. originally, I plan to add a tx_num under the same sysfs directory, so admin could
define that all packets received from a RX queue should be sent out from a specific TX queue.
So struct sk_buff->queue_mapping would be a union of 2 sub-members, rx_num and tx_num. But
sk_buff->queue_mapping is just a u16 which is a small type. We might use the most-significant
bit of sk_buff->queue_mapping as a flag as rx_num and tx_num wouldn't exist at the
same time.

>  The trick here would
> be to try to avoid reordering inside streams as far as possible,
It's not to solve reorder issue. The start point is 10G NIC is very fast. We need some cpu
work on packet receiving dedicately. If they work on other things, NIC might drop packets
quickly.

The sysfs interface is just to facilitate NIC drivers. If there is no the sysfs interface,
driver developers need implement it with parameters which are painful.

>  but
> since the NIC hash should work on flow basis that should be ok.
Yes, hardware is good at preventing reorder. My method doesn't change the order in software
layer.

Thanks Andi.



  reply	other threads:[~2009-03-12  8:17 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-11  8:53 [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer Zhang, Yanmin
2009-03-11 11:13 ` Andi Kleen
2009-03-12  8:16   ` Zhang, Yanmin [this message]
2009-03-12 14:08     ` Ben Hutchings
2009-03-13  6:43       ` Zhang, Yanmin
2009-03-13 17:06         ` Tom Herbert
2009-03-13 18:51           ` David Miller
2009-03-13 21:01             ` Tom Herbert
2009-03-13 22:10               ` Ben Hutchings
2009-03-13 22:15                 ` Stephen Hemminger
     [not found]             ` <65634d660903131358h765bef64y6a0f1b0db7400f6f@mail.gmail.com>
2009-03-13 21:02               ` David Miller
2009-03-13 21:59                 ` Tom Herbert
2009-03-13 22:19                   ` David Miller
2009-03-13 23:58                     ` Herbert Xu
2009-03-14  0:24                     ` Tom Herbert
2009-03-14  1:53                       ` Andi Kleen
2009-03-14  2:19                       ` David Miller
2009-03-14 13:19                         ` Herbert Xu
2009-03-14 18:15                         ` Tom Herbert
2009-03-14 18:45                           ` David Miller
2009-03-16 16:53                             ` Tom Herbert
2009-03-14  1:51               ` Andi Kleen
2009-03-16  3:20           ` Zhang, Yanmin
2009-03-12 14:34     ` Andi Kleen
2009-03-13  9:06       ` Zhang, Yanmin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1236845792.2567.484.camel@ymzhang \
    --to=yanmin_zhang@linux.intel.com \
    --cc=andi@firstfloor.org \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=jesse.brandeburg@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@vyatta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).