Re: [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer


From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Ben Hutchings <bhutchings@solarflare.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	netdev@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	herbert@gondor.apana.org.au, jesse.brandeburg@intel.com,
	shemminger@vyatta.com, David Miller <davem@davemloft.net>
Subject: Re: [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer
Date: Fri, 13 Mar 2009 14:43:22 +0800
Message-ID: <1236926602.2567.528.camel@ymzhang>
In-Reply-To: <1236866906.3221.11.camel@achroite>

On Thu, 2009-03-12 at 14:08 +0000, Ben Hutchings wrote:
> On Thu, 2009-03-12 at 16:16 +0800, Zhang, Yanmin wrote:
> > On Wed, 2009-03-11 at 12:13 +0100, Andi Kleen wrote:
> [...]
> > >  and just use the hash function on the
> > > NIC.
> > Sorry. I can't understand what the hash function of NIC is. Perhaps NIC hardware has something
> > like hash function to decide the RX queue number based on SRC/DST?
> 
> Yes, that's exactly what they do.  This feature is sometimes called
> Receive-Side Scaling (RSS) which is Microsoft's name for it.  Microsoft
> requires Windows drivers performing RSS to provide the hash value to the
> networking stack, so Linux drivers for the same hardware should be able
> to do so too.
Oh, I didn't know the background. I need to study networking more.
Thanks for explaining it.
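
As a rough illustration of the idea (a user-space sketch, not driver code): the hash
below is a toy FNV-1a over SRC/DST, not the Toeplitz hash that RSS hardware typically
uses, and the names flow_hash, rss_table_init, rss_pick_queue and the 128-entry table
size are assumptions made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define RSS_TABLE_SIZE 128	/* hypothetical indirection table size */

/* Toy FNV-1a hash over the address pair (stand-in for the NIC's hash). */
static uint32_t flow_hash(uint32_t src_ip, uint32_t dst_ip)
{
	uint32_t words[2] = { src_ip, dst_ip };
	const uint8_t *p = (const uint8_t *)words;
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < sizeof(words); i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Fill the indirection table round-robin; nr_queues must be non-zero. */
static void rss_table_init(uint8_t table[RSS_TABLE_SIZE], unsigned int nr_queues)
{
	size_t i;

	for (i = 0; i < RSS_TABLE_SIZE; i++)
		table[i] = (uint8_t)(i % nr_queues);
}

/* The low bits of the hash index the table; the entry is the RX queue. */
static unsigned int rss_pick_queue(const uint8_t table[RSS_TABLE_SIZE], uint32_t hash)
{
	return table[hash % RSS_TABLE_SIZE];
}
```

Because the same SRC/DST always gives the same hash, all packets of one flow land on
the same RX queue, and the driver could pass the hash value up with the packet.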

> 
> > >  Have you considered this for forwarding too?
> > Yes. originally, I plan to add a tx_num under the same sysfs directory, so admin could
> > define that all packets received from a RX queue should be sent out from a specific TX queue.
> 
> The choice of TX queue can be based on the RX hash so that configuration
> is usually unnecessary.
I agree. I double-checked the latest code in the net-next-2.6 tree, and the function
skb_tx_hash is enough.
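
A minimal sketch of how a TX queue could be derived from RX-side information without
any admin configuration. This is a user-space illustration under my own assumptions,
not the skb_tx_hash code; hash_to_tx_queue and rx_queue_to_tx_queue are hypothetical
names.

```c
#include <stdint.h>

/*
 * Scale a 32-bit hash into [0, nr_tx_queues) with a multiply and a shift,
 * which avoids a division and spreads hash values evenly over the queues.
 */
static uint16_t hash_to_tx_queue(uint32_t hash, uint16_t nr_tx_queues)
{
	return (uint16_t)(((uint64_t)hash * nr_tx_queues) >> 32);
}

/*
 * Alternative: reuse the RX queue index the packet arrived on and fold it
 * into the TX queue range. nr_tx_queues must be non-zero.
 */
static uint16_t rx_queue_to_tx_queue(uint16_t rx_queue, uint16_t nr_tx_queues)
{
	return (uint16_t)(rx_queue % nr_tx_queues);
}
```

Either way, packets of one flow keep going out of the same TX queue, so no per-queue
tx_num setting is needed in the usual case.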

> 
> > So struct sk_buff->queue_mapping would be a union of 2 sub-members, rx_num and tx_num. But
> > sk_buff->queue_mapping is just a u16 which is a small type. We might use the most-significant
> > bit of sk_buff->queue_mapping as a flag as rx_num and tx_num wouldn't exist at the
> > same time.
> > 
> > >  The trick here would
> > > be to try to avoid reordering inside streams as far as possible,
> > It's not to solve reorder issue. The start point is 10G NIC is very fast. We need some cpu
> > work on packet receiving dedicately. If they work on other things, NIC might drop packets
> > quickly.
> 

> Aggressive power-saving causes far greater latency than context-
> switching under Linux.
Yes, when the NIC is mostly idle. When the NIC is busy, it wouldn't enter power-saving mode.
Performance testing usually turns off all power-saving modes anyway. :)

>   I believe most 10G NICs have large RX FIFOs to
> mitigate against this.  Ethernet flow control also helps to prevent
> packet loss.
I guess the NIC might allocate resources evenly across all queues, at least by default. If a
burst of packets with the same SRC/DST arrives, a specific queue might fill up quickly. I
instrumented the driver and kernel to print out packet receiving and forwarding. As the latest IXGBE
driver gets a packet and forwards it immediately, I think most packets are dropped by hardware
because the cpu doesn't collect packets quickly enough when the specific receiving queue is full. By
comparing the sending speed and the forwarding speed, we can get the drop rate easily.
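
The comparison can be sketched as below. This is only an illustration; drop_rate is a
hypothetical helper, and the two counters are assumed to cover the same time interval.

```c
#include <stdint.h>

/*
 * Fraction of sent packets that were never forwarded, in [0.0, 1.0].
 * Returns 0.0 when nothing was sent or when the counters show no loss.
 */
static double drop_rate(uint64_t sent, uint64_t forwarded)
{
	if (sent == 0 || forwarded >= sent)
		return 0.0;
	return (double)(sent - forwarded) / (double)sent;
}
```

With made-up numbers, 1000 packets sent and 750 forwarded would give a drop rate of 0.25.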

My experiment shows the receiving cpu is more than 50% idle, and the cpu does often collect all
packets until the specific queue is empty. I think that's because pktgen switches to a new SRC/DST
to produce another burst that quickly fills other queues.

It's hard to say the cpu is slower than the NIC because they work on different parts of the full
receiving/processing procedure. But we need the cpu to collect packets ASAP.


> > The sysfs interface is just to facilitate NIC drivers. If there is no the sysfs interface,
> > driver developers need implement it with parameters which are painful.
> [...]
> 
> Or through the ethtool API, which already has some multiqueue control
> operations.
That's an alternative way to configure it. If you check the sample driver patch,
you can see the change is very small.

Thanks for your kind comments.

Yanmin


Thread overview: 25+ messages
2009-03-11  8:53 [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer Zhang, Yanmin
2009-03-11 11:13 ` Andi Kleen
2009-03-12  8:16   ` Zhang, Yanmin
2009-03-12 14:08     ` Ben Hutchings
2009-03-13  6:43       ` Zhang, Yanmin [this message]
2009-03-13 17:06         ` Tom Herbert
2009-03-13 18:51           ` David Miller
2009-03-13 21:01             ` Tom Herbert
2009-03-13 22:10               ` Ben Hutchings
2009-03-13 22:15                 ` Stephen Hemminger
     [not found]             ` <65634d660903131358h765bef64y6a0f1b0db7400f6f@mail.gmail.com>
2009-03-13 21:02               ` David Miller
2009-03-13 21:59                 ` Tom Herbert
2009-03-13 22:19                   ` David Miller
2009-03-13 23:58                     ` Herbert Xu
2009-03-14  0:24                     ` Tom Herbert
2009-03-14  1:53                       ` Andi Kleen
2009-03-14  2:19                       ` David Miller
2009-03-14 13:19                         ` Herbert Xu
2009-03-14 18:15                         ` Tom Herbert
2009-03-14 18:45                           ` David Miller
2009-03-16 16:53                             ` Tom Herbert
2009-03-14  1:51               ` Andi Kleen
2009-03-16  3:20           ` Zhang, Yanmin
2009-03-12 14:34     ` Andi Kleen
2009-03-13  9:06       ` Zhang, Yanmin
