From: David Miller <davem@davemloft.net>
To: jeff@garzik.org
Cc: johnpol@2ka.mipt.ru, herbert@gondor.apana.org.au,
gaagaan@gmail.com, Robert.Olsson@data.slu.se,
netdev@vger.kernel.org, rdreier@cisco.com,
peter.p.waskiewicz.jr@intel.com, hadi@cyberus.ca,
mcarlson@broadcom.com, jagana@us.ibm.com,
general@lists.openfabrics.org, mchan@broadcom.com, tgraf@suug.ch,
randy.dunlap@oracle.com, sri@us.ibm.com,
shemminger@linux-foundation.org, kaber@trash.net
Subject: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
Date: Mon, 08 Oct 2007 18:41:26 -0700 (PDT) [thread overview]
Message-ID: <20071008.184126.124062865.davem@davemloft.net> (raw)
In-Reply-To: <470AD5D7.1070000@garzik.org>
From: Jeff Garzik <jeff@garzik.org>
Date: Mon, 08 Oct 2007 21:13:59 -0400
> If you assume a scheduler implementation where each prio band is mapped
> to a separate CPU, you can certainly see where some CPUs could be
> substantially idle while others are overloaded, largely depending on the
> data workload (and priority contained within).
Right, which is why Peter added the prio DRR scheduler stuff for TX
multiqueue (see net/sched/sch_prio.c:rr_qdisc_ops) because this is
what the chips do.
But this doesn't get us to where we want to be as Peter has been
explaining a bit these past few days.
Ok, we're talking a lot but not pouring much concrete, let's start
doing that. I propose:
1) A library for transmit load balancing functions, with an interface
that can be made visible to userspace. I can write this and test
it on real multiqueue hardware.
The whole purpose of this library is to set skb->queue_mapping
based upon the load balancing function.
Facilities will be added to handle virtualization port selection
based upon destination MAC address as one of the "load balancing"
methods.
2) Switch the default qdisc away from pfifo_fast to a new DRR fifo
with load balancing using the code in #1. I think this is kind
of in the territory of what Peter said he is working on.
I know this is controversial, but realistically I doubt users
benefit at all from the prioritization that pfifo provides. They
will, on the other hand, benefit from TX queue load balancing on
fast interfaces.
3) Work on discovering a way to make the locking on transmit as
localized to the current thread of execution as possible. Things
like RCU and statistic replication, techniques we use widely
elsewhere in the stack, begin to come to mind.
I also want to point out another issue. Any argument wrt. reordering
is specious at best because right now reordering from qdisc to device
happens anyways.
And that's because we drop the qdisc lock first, then we grab the
transmit lock on the device and submit the packet. So, after we
drop the qdisc lock, another cpu can get the qdisc lock, get the
next packet (perhaps a lower priority one) and then sneak in to
get the device transmit lock before the first thread can, and
thus the packets will be submitted out of order.
This, along with other things, makes me believe that ordering really
doesn't matter in practice. And therefore, in practice, we can treat
everything from the qdisc to the real hardware as a FIFO even if
something else is going on inside the black box which might reorder
packets on the wire.
next prev parent reply other threads:[~2007-10-09 1:41 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-08 18:26 [PATCH 2/3][NET_BATCH] net core use batching jamal
2007-10-08 19:46 ` Waskiewicz Jr, Peter P
2007-10-08 20:48 ` jamal
2007-10-08 21:26 ` [ofa-general] " David Miller
2007-10-08 22:34 ` jamal
2007-10-08 22:36 ` [ofa-general] " Waskiewicz Jr, Peter P
2007-10-08 22:33 ` Waskiewicz Jr, Peter P
2007-10-08 23:40 ` jamal
2007-10-09 1:13 ` Jeff Garzik
2007-10-09 1:41 ` David Miller [this message]
2007-10-09 2:01 ` [ofa-general] " Herbert Xu
2007-10-09 2:03 ` Herbert Xu
2007-10-09 2:04 ` Herbert Xu
2007-10-09 2:15 ` jamal
2007-10-09 2:16 ` Herbert Xu
2007-10-09 2:19 ` [ofa-general] " jamal
2007-10-09 2:20 ` Herbert Xu
2007-10-09 2:45 ` [ofa-general] " David Miller
2007-10-09 2:43 ` David Miller
2007-10-09 2:46 ` Herbert Xu
2007-10-09 2:12 ` [ofa-general] " Jeff Garzik
2007-10-09 2:46 ` David Miller
2007-10-09 18:48 ` [ofa-general] " Waskiewicz Jr, Peter P
2007-10-09 19:04 ` Jeff Garzik
2007-10-09 19:07 ` Waskiewicz Jr, Peter P
2007-10-09 2:14 ` [ofa-general] " jamal
2007-10-09 2:16 ` Herbert Xu
2007-10-09 2:47 ` [ofa-general] " David Miller
2007-10-09 16:51 ` Andi Kleen
2007-10-09 18:22 ` Stephen Hemminger
2007-10-09 18:30 ` Andi Kleen
2007-10-09 20:43 ` David Miller
2007-10-09 20:53 ` Stephen Hemminger
2007-10-09 21:22 ` David Miller
2007-10-09 21:56 ` jamal
2007-10-10 0:04 ` David Miller
2007-10-10 0:37 ` Andi Kleen
2007-10-10 0:50 ` David Miller
2007-10-10 9:16 ` Andi Kleen
2007-10-10 9:25 ` David Miller
2007-10-10 10:23 ` Andi Kleen
2007-10-10 10:44 ` David Miller
2007-10-10 13:08 ` jamal
2007-10-10 22:37 ` David Miller
2007-10-10 15:35 ` Waskiewicz Jr, Peter P
2007-10-10 16:02 ` Andi Kleen
2007-10-10 16:42 ` Waskiewicz Jr, Peter P
2007-10-10 9:53 ` Herbert Xu
2007-10-12 16:08 ` Brandeburg, Jesse
2007-10-12 17:05 ` Stephen Hemminger
2007-10-12 18:29 ` Andi Kleen
2007-10-12 18:27 ` Andi Kleen
2007-10-10 16:02 ` Bill Fink
2007-10-10 22:53 ` David Miller
2007-10-11 6:52 ` Krishna Kumar2
2007-10-09 1:31 ` Jeff Garzik
2007-10-09 10:58 ` [ofa-general] " Krishna Kumar2
2007-10-09 11:02 ` David Miller
2007-10-09 11:20 ` [ofa-general] " Krishna Kumar2
2007-10-09 11:21 ` Krishna Kumar2
2007-10-09 11:24 ` David Miller
2007-10-09 12:44 ` [ofa-general] " Jeff Garzik
2007-10-09 12:55 ` Herbert Xu
2007-10-09 13:00 ` Jeff Garzik
2007-10-09 20:14 ` David Miller
2007-10-09 20:20 ` [ofa-general] " Jeff Garzik
2007-10-09 21:25 ` David Miller
2007-10-09 20:22 ` [ofa-general] " Roland Dreier
2007-10-09 20:51 ` David Miller
2007-10-09 21:40 ` Roland Dreier
2007-10-09 22:44 ` [ofa-general] " Roland Dreier
2007-10-09 22:46 ` [ofa-general] [PATCH 1/4] IPoIB: Fix unused variable warning Roland Dreier
2007-10-09 22:47 ` [ofa-general] [PATCH 2/4] ibm_emac: Convert to use napi_struct independent of struct net_device Roland Dreier
2007-10-09 22:47 ` [PATCH 3/4] ibm_new_emac: Nuke SET_MODULE_OWNER() use Roland Dreier
2007-10-09 22:48 ` [PATCH 4/4] ibm_emac: Convert to use napi_struct independent of struct net_device Roland Dreier
2007-10-09 22:51 ` [ofa-general] " Roland Dreier
2007-10-09 23:17 ` [PATCH 1/4] IPoIB: Fix unused variable warning David Miller
2007-10-10 0:32 ` Jeff Garzik
2007-10-10 0:47 ` [ofa-general] " Jeff Garzik
-- strict thread matches above, loose matches on Subject: below --
2007-10-08 13:17 [PATCH 2/3][NET_BATCH] net core use batching jamal
2007-10-09 3:09 ` [ofa-general] " Krishna Kumar2
2007-10-09 13:10 ` jamal
2007-09-14 9:00 [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 Krishna Kumar
2007-09-16 23:17 ` [ofa-general] " David Miller
2007-09-17 0:29 ` jamal
2007-09-23 17:53 ` [PATCHES] TX batching jamal
2007-09-23 17:56 ` [ofa-general] [PATCH 1/4] [NET_SCHED] explict hold dev tx lock jamal
2007-09-23 17:58 ` [ofa-general] [PATCH 2/4] [NET_BATCH] Introduce batching interface jamal
2007-09-23 18:00 ` [PATCH 3/4][NET_BATCH] net core use batching jamal
2007-09-30 18:52 ` [ofa-general] [PATCH 2/3][NET_BATCH] " jamal
2007-10-01 4:11 ` Bill Fink
2007-10-01 13:30 ` jamal
2007-10-02 4:25 ` [ofa-general] " Bill Fink
2007-10-02 13:20 ` jamal
2007-10-03 5:29 ` [ofa-general] " Bill Fink
2007-10-01 10:42 ` Patrick McHardy
2007-10-01 13:21 ` jamal
2007-10-08 5:03 ` Krishna Kumar2
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071008.184126.124062865.davem@davemloft.net \
--to=davem@davemloft.net \
--cc=Robert.Olsson@data.slu.se \
--cc=gaagaan@gmail.com \
--cc=general@lists.openfabrics.org \
--cc=hadi@cyberus.ca \
--cc=herbert@gondor.apana.org.au \
--cc=jagana@us.ibm.com \
--cc=jeff@garzik.org \
--cc=johnpol@2ka.mipt.ru \
--cc=kaber@trash.net \
--cc=mcarlson@broadcom.com \
--cc=mchan@broadcom.com \
--cc=netdev@vger.kernel.org \
--cc=peter.p.waskiewicz.jr@intel.com \
--cc=randy.dunlap@oracle.com \
--cc=rdreier@cisco.com \
--cc=shemminger@linux-foundation.org \
--cc=sri@us.ibm.com \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).