Re: parallel networking

From: Jeff Garzik <jeff@garzik.org>
To: David Miller <davem@davemloft.net>
Cc: hadi@cyberus.ca, peter.p.waskiewicz.jr@intel.com,
	krkumar2@in.ibm.com, johnpol@2ka.mipt.ru,
	herbert@gondor.apana.org.au, kaber@trash.net,
	shemminger@linux-foundation.org, jagana@us.ibm.com,
	Robert.Olsson@data.slu.se, rick.jones2@hp.com, xma@us.ibm.com,
	gaagaan@gmail.com, netdev@vger.kernel.org, rdreier@cisco.com,
	mingo@elte.hu, mchan@broadcom.com, general@lists.openfabrics.org,
	kumarkr@linux.ibm.com, tgraf@suug.ch, randy.dunlap@oracle.com,
	sri@us.ibm.com, linux-kernel@vger.kernel.org
Subject: Re: parallel networking
Date: Mon, 08 Oct 2007 21:53:25 -0400
Message-ID: <470ADF15.2090100@garzik.org>
In-Reply-To: <20071008.141154.107706003.davem@davemloft.net>

David Miller wrote:
> From: Jeff Garzik <jeff@garzik.org>
> Date: Mon, 08 Oct 2007 10:22:28 -0400
> 
>> In terms of overall parallelization, both for TX as well as RX, my gut 
>> feeling is that we want to move towards an MSI-X, multi-core friendly 
>> model where packets are LIKELY to be sent and received by the same set 
>> of [cpus | cores | packages | nodes] that the [userland] processes 
>> dealing with the data.
> 
> The problem is that the packet schedulers want global guarantees
> on packet ordering, not flow centric ones.
> 
> That is the issue Jamal is concerned about.

Oh, absolutely.

I think, fundamentally, any amount of cross-flow resource management 
done in software is an obstacle to concurrency.

That's not a value judgement, just a statement of fact.

"Traffic cops" are intentional bottlenecks we add to the process, to
enable features like priority flows, filtering, or even simple socket 
fairness guarantees.  Each of those bottlenecks serves a valid purpose, 
but at the end of the day, it's still a bottleneck.

So, improving concurrency may require turning off useful features that 
nonetheless hurt concurrency.

> The more I think about it, the more inevitable it seems that we really
> might need multiple qdiscs, one for each TX queue, to pull this full
> parallelization off.
> 
> But the semantics of that don't smell so nice either.  If the user
> attaches a new qdisc to "ethN", does it go to all the TX queues, or
> what?
> 
> All of the traffic shaping technology deals with the device as a unary
> object.  It doesn't fit to multi-queue at all.

Well, the easy solutions to networking concurrency are:

* use virtualization to carve up the machine into chunks

* use multiple net devices

Since new NIC hardware is actively trying to be friendly to 
multi-channel/virt scenarios, either of these is reasonably 
straightforward given the current state of the Linux net stack.  Using 
multiple net devices is especially attractive because it works very well 
with the existing packet scheduling.

Both unfortunately impose a burden on the developer and admin, to force 
their apps to distribute flows across multiple [VMs | net devs].

The third alternative is to use a single net device, with SMP-friendly 
packet scheduling.  Here you run into the problems you described ("device
as a unary object", etc.) with the current infrastructure.

With multiple TX rings, consider that we are pushing the packet 
scheduling from software to hardware...  which implies
* hardware-specific packet scheduling
* some TC/shaping features not available, because hardware doesn't
support them

	Jeff
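
The per-flow versus global ordering trade-off discussed above can be
illustrated with a minimal sketch.  It is not taken from this thread or
from any kernel code: the queue count, the CRC32 flow hash, and the names
select_tx_queue and enqueue are all assumptions made for illustration.
Each flow is hashed to one of several TX queues, so packets within a flow
stay in order while nothing orders packets across queues.

```python
import zlib
from collections import defaultdict

NUM_TX_QUEUES = 4  # hypothetical number of hardware TX rings


def select_tx_queue(flow, num_queues=NUM_TX_QUEUES):
    """Map a flow (src, sport, dst, dport) to a TX queue index.

    CRC32 is an arbitrary stand-in for whatever hash a real stack or
    NIC would use; the only property relied on here is that it is stable.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_queues


def enqueue(packets, num_queues=NUM_TX_QUEUES):
    """Distribute (seq, flow) packets onto per-queue FIFO lists."""
    queues = defaultdict(list)
    for seq, flow in packets:
        queues[select_tx_queue(flow, num_queues)].append((seq, flow))
    return queues


# Two interleaved flows, numbered in global arrival order.
flow_a = ("10.0.0.1", 1000, "10.0.0.9", 80)
flow_b = ("10.0.0.2", 2000, "10.0.0.9", 80)
packets = [(0, flow_a), (1, flow_b), (2, flow_a), (3, flow_b)]

queues = enqueue(packets)
for qid in sorted(queues):
    # Print each queue's contents as global sequence numbers.
    print(qid, [seq for seq, _ in queues[qid]])
```

Within any one queue the sequence numbers of a given flow are increasing,
but once the queues drain independently there is no single point left that
could enforce a cross-flow priority or rate limit, which is the "device as
a unary object" problem in miniature.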
