netdev.vger.kernel.org archive mirror
From: jamal <hadi@cyberus.ca>
To: "Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@intel.com>
Cc: David Miller <davem@davemloft.net>,
	krkumar2@in.ibm.com, johnpol@2ka.mipt.ru,
	herbert@gondor.apana.org.au, kaber@trash.net,
	shemminger@linux-foundation.org, jagana@us.ibm.com,
	Robert.Olsson@data.slu.se, rick.jones2@hp.com, xma@us.ibm.com,
	gaagaan@gmail.com, netdev@vger.kernel.org, rdreier@cisco.com,
	mcarlson@broadcom.com, jeff@garzik.org, mchan@broadcom.com,
	general@lists.openfabrics.org, tgraf@suug.ch,
	randy.dunlap@oracle.com, sri@us.ibm.com
Subject: RE: [PATCH 2/3][NET_BATCH] net core use batching
Date: Mon, 08 Oct 2007 19:40:45 -0400	[thread overview]
Message-ID: <1191886845.4373.138.camel@localhost> (raw)
In-Reply-To: <D5C1322C3E673F459512FB59E0DDC32903BC14AC@orsmsx414.amr.corp.intel.com>

On Mon, 2007-10-08 at 15:33 -0700, Waskiewicz Jr, Peter P wrote:


> Addressing your note/issue with different rings being serviced
> concurrently: I'd like to remove the QDISC_RUNNING bit from the global

The challenge is that netdevices, filters, queues and the scheduler are
closely intertwined, so it is not just the scheduling region and
QDISC_RUNNING. For example, let's pick just the filters because they are
simple to see: you need to attach them to something, and whatever that
is, you then need to synchronize it against config changes and against
multiple CPUs trying to use it. You could:
a) replicate them across CPUs and only lock on config, but then you are
wasting RAM
b) attach them to rings instead of netdevices - but that makes me wonder
whether those subqueues are now going to become netdevices. It also
means changing all the user-space interfaces to know about subqueues;
if you recall, this was a major point of contention in our earlier
discussion.
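Option (a) can be sketched as a userspace toy model (all names here -
filter_table, filter_add, NCPUS - are illustrative, not the real kernel
API): each CPU owns a replica of the filter table, the fast path reads
its local replica without any lock, and the rare config path serializes
and rewrites every replica. The RAM cost of the replication is exactly
the objection above.

```c
#include <pthread.h>
#include <assert.h>

#define NCPUS 4
#define MAXF  8

struct filter_table {
    int nfilters;
    int match_mark[MAXF];   /* toy classifier: match on a packet mark */
};

/* One replica per CPU; the datapath touches only its own copy. */
static struct filter_table percpu_filters[NCPUS];
static pthread_mutex_t config_lock = PTHREAD_MUTEX_INITIALIZER;

/* Config path: rare, serialized, updates every replica. */
static void filter_add(int mark)
{
    pthread_mutex_lock(&config_lock);
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        struct filter_table *t = &percpu_filters[cpu];
        t->match_mark[t->nfilters++] = mark;
    }
    pthread_mutex_unlock(&config_lock);
}

/* Datapath: lock-free read of the local replica.
 * Returns the matching filter index, or -1 on no match. */
static int filter_classify(int cpu, int mark)
{
    struct filter_table *t = &percpu_filters[cpu];
    for (int i = 0; i < t->nfilters; i++)
        if (t->match_mark[i] == mark)
            return i;
    return -1;
}
```

Note the trade: the datapath scales with CPU count, but every filter is
stored NCPUS times.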

> device; with Tx multiqueue, this bit should be set on each queue (if at
> all), allowing multiple Tx rings to be loaded simultaneously. 

This is the issue I raised - refer to Dave's wording of it. If you
access the rings simultaneously, you may not be able to guarantee any
ordering or proper QoS when there is contention for wire resources
(think strict prio in hardware) - as long as you keep the qdisc area.
You may actually get away with it with something like DRR.
You could totally bypass the qdisc region, go to the driver directly,
and let it worry about the scheduling, but you'd have to make the qdisc
area a "passthrough" while providing the illusion to user space that all
is as before.
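The "passthrough" idea is, roughly, a qdisc whose enqueue hands the
packet straight to the driver's per-ring scheduler while still keeping
the counters user space expects. A toy sketch (struct names and the
prio-to-ring mapping are hypothetical stand-ins, not the real struct
Qdisc / hard_start_xmit interfaces):

```c
#include <assert.h>

/* Toy stand-ins for the real kernel structures. */
struct sk_buff { int prio; };
struct ring    { int ring_id; int tx_count; };

/* Driver-owned scheduling: pick a ring and transmit immediately. */
static int driver_xmit(struct ring *rings, struct sk_buff *skb)
{
    struct ring *r = &rings[skb->prio];   /* driver maps prio -> ring */
    r->tx_count++;                        /* stand-in for a DMA post  */
    return 0;
}

/* Passthrough "qdisc": no queueing on the fast path; it only keeps
 * counters so user space still sees sane statistics. */
struct passthrough { long packets; };

static int passthrough_enqueue(struct passthrough *q,
                               struct ring *rings, struct sk_buff *skb)
{
    q->packets++;                   /* the illusion for user space     */
    return driver_xmit(rings, skb); /* scheduling pushed to the driver */
}
```

The design choice is that all QoS decisions move below the qdisc layer,
which is exactly why the qdisc must fake its accounting.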

>  The
> biggest issue today with the multiqueue implementation is the global
> queue_lock.  I see it being a hot source of contention in my testing; my
> setup is a 8-core machine (dual quad-core procs) with a 10GbE NIC, using
> 8 Tx and 8 Rx queues.  On transmit, when loading all 8 queues, the
> enqueue/dequeue are hitting that lock quite a bit for the whole device.

Yes, the queue_lock is expensive; in your case, with all 8 hardware
threads contending for that one device, you will suffer. The tx_lock, on
the other hand, is not that expensive, since at most 2 CPUs contend for
it (the tx and rx softirqs).
I tried to exploit that fact in the batching work by moving things I
used to process under the queue_lock into the region covered by the
tx_lock. I'd be very interested in results on such a piece of hardware
with the 10G NIC, to see if these theories make any sense.
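The batching idea can be sketched like this: hold queue_lock only long
enough to pull a burst of packets off the qdisc, then do the per-packet
transmit work under the cheaper tx_lock. A rough userspace model, with
two pthread mutexes standing in for the two kernel locks (BURST,
xmit_batch and the globals are invented for illustration):

```c
#include <pthread.h>
#include <assert.h>

#define BURST 8

struct pkt { int id; };

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER; /* hot: all CPUs   */
static pthread_mutex_t tx_lock    = PTHREAD_MUTEX_INITIALIZER; /* max 2 contenders */

static struct pkt qdisc_q[64];
static int q_len;
static int wire_sent;

/* Dequeue a burst under queue_lock, transmit it under tx_lock.
 * Returns the number of packets moved to the wire. */
static int xmit_batch(void)
{
    struct pkt batch[BURST];
    int n = 0;

    pthread_mutex_lock(&queue_lock);      /* keep this section short */
    while (q_len > 0 && n < BURST)
        batch[n++] = qdisc_q[--q_len];
    pthread_mutex_unlock(&queue_lock);

    pthread_mutex_lock(&tx_lock);         /* per-packet work lives here */
    for (int i = 0; i < n; i++)
        wire_sent++;                      /* stand-in for hard_start_xmit */
    pthread_mutex_unlock(&tx_lock);

    return n;
}
```

The point is the shift in lock residency: the contended lock is held for
O(1) list surgery per burst, while the per-packet cost moves under the
lock with at most two contenders.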

> I really think that the queue_lock should join the queue_state, so the
> device no longer manages the top-level state (since we're operating
> per-queue instead of per-device).

Refer to above.


> 
> The multiqueue implementation today enforces the number of qdisc bands
> (RR or PRIO) to be equal to the number of Tx rings your hardware/driver
> is supporting.  Therefore, the queue_lock and queue_state in the kernel
> directly relate to the qdisc band management.  If the queue stops from
> the driver, then the qdisc won't try to dequeue from the band. 

Good start.

>  What I'm
> working on is to move the lock there too, so I can lock the queue when I
> enqueue (protect the band from multiple sources modifying the skb
> chain), and lock it when I dequeue.  This is purely for concurrency of
> adding/popping skb's from the qdisc queues.  

OK, so the "concurrency" aspect is what worries me. What I am saying is
that sooner or later you have to serialize (which is anti-concurrency).
For example, consider CPU0 running the high-prio queue and CPU1 running
the low-prio queue of the same netdevice.
Assume CPU0 is getting a lot of interrupts or other work while CPU1 is
not (so as to create a condition where CPU0 is slower). Then, as long as
there are packets and there is space on the driver's rings, CPU1 will
send more packets per unit time than CPU0.
This contradicts the strict prio scheduler, which says higher-priority
packets ALWAYS go out first regardless of the presence of low-prio
packets. I am not sure I made sense.
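The ordering violation can be made concrete with a tiny deterministic
replay of that scenario (pure toy model, nothing here is kernel code):
CPU0 owns the high-prio band but only gets to run on even ticks because
of interrupt load, while CPU1 drains the low-prio band every tick behind
its own lock. Low-prio packets then reach the wire while high-prio
packets are still backlogged, which strict prio forbids.

```c
#include <string.h>
#include <assert.h>

static char wire[16];   /* transmit order as seen on the wire: 'H' or 'L' */
static int  wire_len;

static void simulate(void)
{
    int high = 3, low = 3;      /* packets waiting in each band */

    for (int tick = 0; high + low > 0; tick++) {
        /* CPU0 (high-prio band) runs only on even ticks: it is the
         * CPU being slowed down by interrupts/other work. */
        if (high > 0 && (tick % 2 == 0)) {
            wire[wire_len++] = 'H';
            high--;
        }
        /* CPU1 (low-prio band) runs every tick on its own queue/lock. */
        if (low > 0) {
            wire[wire_len++] = 'L';
            low--;
        }
    }
    wire[wire_len] = '\0';
}
```

A strict-prio scheduler would have emitted "HHHLLL"; the per-queue
concurrent version interleaves the bands instead.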

cheers,
jamal


