public inbox for linux-kernel@vger.kernel.org
From: Jeff Garzik <jeff@garzik.org>
To: hadi@cyberus.ca
Cc: David Miller <davem@davemloft.net>,
	peter.p.waskiewicz.jr@intel.com, krkumar2@in.ibm.com,
	johnpol@2ka.mipt.ru, herbert@gondor.apana.org.au,
	kaber@trash.net, shemminger@linux-foundation.org,
	jagana@us.ibm.com, Robert.Olsson@data.slu.se, rick.jones2@hp.com,
	xma@us.ibm.com, gaagaan@gmail.com, netdev@vger.kernel.org,
	rdreier@cisco.com, Ingo Molnar <mingo@elte.hu>,
	mchan@broadcom.com, general@lists.openfabrics.org,
	kumarkr@linux.ibm.com, tgraf@suug.ch, randy.dunlap@oracle.com,
	sri@us.ibm.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: parallel networking (was Re: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock)
Date: Mon, 08 Oct 2007 10:22:28 -0400	[thread overview]
Message-ID: <470A3D24.3050803@garzik.org> (raw)
In-Reply-To: <1191850490.4352.41.camel@localhost>

jamal wrote:
> On Sun, 2007-07-10 at 21:51 -0700, David Miller wrote:
> 
>> For these high performance 10Gbit cards it's a load balancing
>> function, really, as all of the transmit queues go out to the same
>> physical port so you could:
>>
>> 1) Load balance on CPU number.
>> 2) Load balance on "flow"
>> 3) Load balance on destination MAC
>>
>> etc. etc. etc.
> 
> The brain-block I am having is the parallelization aspect of it.
> Whatever scheme it is, it needs to ensure the scheduler works as
> expected. For example, if it were a strict prio scheduler I would
> expect that whatever goes out is always high priority first, never
> ever allowing a low-prio packet out at any time there's something
> high-prio needing to go out. If I have the two priorities running on
> two CPUs, then I can't guarantee that effect.

Any chance the NIC hardware could provide that guarantee?

8139cp, for example, has two TX DMA rings with hardcoded 
characteristics: one is a high-prio queue and one a low-prio queue.  
The logic is pretty simple: empty the high-prio queue first 
(potentially starving the low-prio queue, in the worst case).


In terms of overall parallelization, both for TX and RX, my gut 
feeling is that we want to move towards an MSI-X, multi-core friendly 
model where packets are LIKELY to be sent and received on the same set 
of [cpus | cores | packages | nodes] as the [userland] processes 
dealing with the data.

There are already some primitive NUMA bits in skbuff allocation, but 
with modern MSI-X and RX/TX flow hashing we could do a whole lot more, 
along the lines of better CPU scheduling decisions, directing flows to 
clusters of cpus, and generally doing a better job of maximizing cache 
efficiency in a modern multi-thread environment.

IMO the current model where each NIC's TX completion and RX processes 
are both locked to the same CPU is outmoded in a multi-core world with 
modern NICs.  :)

But I readily admit general ignorance about the kernel process 
scheduling stuff, so my only idea for a starting point was to see how 
far to go with the concept of "skb affinity" -- a mask in sk_buff that 
hints which CPU(s) the NIC should attempt to send and receive packets 
on.  When going through bonding or netfilter, it is trivial to 'or' 
together affinity masks.  All the various layers of the net stack 
should attempt to honor the skb affinity where feasible (requires 
interaction with the CFS scheduler?).

Or maybe skb affinity is a dumb idea.  I wanted to get people thinking 
about the bigger picture.  Parallelization starts at the user process.

	Jeff



Thread overview: 8+ messages
     [not found] <1190674298.4264.24.camel@localhost>
     [not found] ` <D5C1322C3E673F459512FB59E0DDC32903A51462@orsmsx414.amr.corp.intel.com>
     [not found]   ` <1190677099.4264.37.camel@localhost>
     [not found]     ` <20071007.215124.85709188.davem@davemloft.net>
     [not found]       ` <1191850490.4352.41.camel@localhost>
2007-10-08 14:22         ` Jeff Garzik [this message]
2007-10-08 15:18           ` parallel networking (was Re: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock) jamal
2007-10-08 21:11           ` parallel networking David Miller
2007-10-08 22:30             ` jamal
2007-10-08 22:33               ` David Miller
2007-10-08 22:35                 ` Waskiewicz Jr, Peter P
2007-10-08 23:42                 ` jamal
2007-10-09  1:53             ` Jeff Garzik
