netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: sven@thebigcorporation.com, ghaskins@novell.com,
	vernux@us.ibm.com, andi@firstfloor.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	pmullaney@novell.com
Subject: Re: High contention on the sk_buff_head.lock
Date: Thu, 19 Mar 2009 06:49:19 +0100	[thread overview]
Message-ID: <49C1DCDF.6050300@cosmosbay.com> (raw)
In-Reply-To: <20090318.185441.138157931.davem@davemloft.net>

David Miller a écrit :
> From: Sven-Thorsten Dietrich <sven@thebigcorporation.com>
> Date: Wed, 18 Mar 2009 18:43:27 -0700
> 
>> Do we have to rule-out per-CPU queues, that aggregate into a master
>> queue in a batch-wise manner? 
> 
> That would violate the properties and characteristics expected by
> the packet scheduler, wrt. to fair based fairness, rate limiting,
> etc.
> 
> The only legal situation where we can parallelize to single device
> is where only the most trivial packet scheduler is attached to
> the device and the device is multiqueue, and that is exactly what
> we do right now.

I agree with you David.

Still, there is room for improvements, since :

1) default qdisc is pfifo_fast. This beast uses three sk_buff_head (96 bytes)
  where it could use 3 smaller list_head (3 * 16 = 48 bytes on x86_64)

 (assuming sizeof(spinlock_t) is only 4 bytes, but it's more than that
 on various situations (LOCKDEP, ...)

2) struct Qdisc layout could be better, letting read mostly fields
   at beginning of structure. (ie move 'dev_queue', 'next_sched', reshape_fail,
   u32_node, __parent, ...)

  'struct gnet_stats_basic' has a 32 bits hole

   'gnet_stats_queue' could be split, at least in Qdisc, so that three
   seldom use fields (drops, requeues, overlimits) go in a different cache line.

   gnet_stats_rate_est might be also moved in a 'not very used' cache line, if
   I am not mistaken ?

3) In stress situation a CPU A queues a skb to a sk_buff_head, but a CPU B
   dequeues it to feed device, involving an expensive cache line miss
   on the skb.{next|prev} (to set them to NULL)

   We could:
      Use a special dequeue op that doesnt touch skb.{next|prev}
   Eventually set next/prev to NULL after q.lock is released



--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2009-03-19  5:49 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-18 17:24 High contention on the sk_buff_head.lock Vernon Mauery
2009-03-18 19:07 ` Eric Dumazet
2009-03-18 20:17   ` Vernon Mauery
2009-03-20 23:29     ` Jarek Poplawski
2009-03-23  8:32       ` Eric Dumazet
2009-03-23  8:37         ` David Miller
2009-03-23  8:50           ` Jarek Poplawski
2009-04-02 14:13           ` Herbert Xu
2009-04-02 14:15             ` Herbert Xu
2009-03-18 20:54 ` Andi Kleen
2009-03-18 21:03   ` David Miller
2009-03-18 21:10     ` Vernon Mauery
2009-03-18 21:38       ` David Miller
2009-03-18 21:49         ` Vernon Mauery
2009-03-19  1:02           ` David Miller
2009-03-18 21:54         ` Gregory Haskins
2009-03-19  1:03           ` David Miller
2009-03-19  1:13             ` Sven-Thorsten Dietrich
2009-03-19  1:17               ` David Miller
2009-03-19  1:43                 ` Sven-Thorsten Dietrich
2009-03-19  1:54                   ` David Miller
2009-03-19  5:49                     ` Eric Dumazet [this message]
2009-03-19  5:58                       ` David Miller
2009-03-19 14:04                         ` [PATCH] net: reorder struct Qdisc for better SMP performance Eric Dumazet
2009-03-20  8:33                           ` David Miller
2009-03-19 13:45                   ` High contention on the sk_buff_head.lock Andi Kleen
2009-03-19  3:48             ` Gregory Haskins
2009-03-19  5:38               ` David Miller
2009-03-19 12:42                 ` Gregory Haskins
2009-03-19 20:52                   ` David Miller
2009-03-19 12:50             ` Peter W. Morreale
2009-03-19  7:15           ` Evgeniy Polyakov
2009-03-18 21:07   ` Vernon Mauery
2009-03-18 21:45     ` Eilon Greenstein
2009-03-18 21:51       ` Vernon Mauery
2009-03-18 21:59         ` Andi Kleen
2009-03-18 22:19           ` Rick Jones
2009-03-19 12:59   ` Peter W. Morreale
2009-03-19 13:36     ` Peter W. Morreale
2009-03-19 13:46     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49C1DCDF.6050300@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=andi@firstfloor.org \
    --cc=davem@davemloft.net \
    --cc=ghaskins@novell.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pmullaney@novell.com \
    --cc=sven@thebigcorporation.com \
    --cc=vernux@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).