From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: sven@thebigcorporation.com, ghaskins@novell.com,
vernux@us.ibm.com, andi@firstfloor.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
pmullaney@novell.com
Subject: Re: High contention on the sk_buff_head.lock
Date: Thu, 19 Mar 2009 06:49:19 +0100 [thread overview]
Message-ID: <49C1DCDF.6050300@cosmosbay.com> (raw)
In-Reply-To: <20090318.185441.138157931.davem@davemloft.net>
David Miller a écrit :
> From: Sven-Thorsten Dietrich <sven@thebigcorporation.com>
> Date: Wed, 18 Mar 2009 18:43:27 -0700
>
>> Do we have to rule-out per-CPU queues, that aggregate into a master
>> queue in a batch-wise manner?
>
> That would violate the properties and characteristics expected by
> the packet scheduler, wrt. to fair based fairness, rate limiting,
> etc.
>
> The only legal situation where we can parallelize to single device
> is where only the most trivial packet scheduler is attached to
> the device and the device is multiqueue, and that is exactly what
> we do right now.
I agree with you David.
Still, there is room for improvements, since :
1) default qdisc is pfifo_fast. This beast uses three sk_buff_head (96 bytes)
where it could use 3 smaller list_head (3 * 16 = 48 bytes on x86_64)
(assuming sizeof(spinlock_t) is only 4 bytes, but it's more than that
on various situations (LOCKDEP, ...)
2) struct Qdisc layout could be better, letting read mostly fields
at beginning of structure. (ie move 'dev_queue', 'next_sched', reshape_fail,
u32_node, __parent, ...)
'struct gnet_stats_basic' has a 32 bits hole
'gnet_stats_queue' could be split, at least in Qdisc, so that three
seldom use fields (drops, requeues, overlimits) go in a different cache line.
gnet_stats_rate_est might be also moved in a 'not very used' cache line, if
I am not mistaken ?
3) In stress situation a CPU A queues a skb to a sk_buff_head, but a CPU B
dequeues it to feed device, involving an expensive cache line miss
on the skb.{next|prev} (to set them to NULL)
We could:
Use a special dequeue op that doesnt touch skb.{next|prev}
Eventually set next/prev to NULL after q.lock is released
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2009-03-19 5:49 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-18 17:24 High contention on the sk_buff_head.lock Vernon Mauery
2009-03-18 19:07 ` Eric Dumazet
2009-03-18 20:17 ` Vernon Mauery
2009-03-20 23:29 ` Jarek Poplawski
2009-03-23 8:32 ` Eric Dumazet
2009-03-23 8:37 ` David Miller
2009-03-23 8:50 ` Jarek Poplawski
2009-04-02 14:13 ` Herbert Xu
2009-04-02 14:15 ` Herbert Xu
2009-03-18 20:54 ` Andi Kleen
2009-03-18 21:03 ` David Miller
2009-03-18 21:10 ` Vernon Mauery
2009-03-18 21:38 ` David Miller
2009-03-18 21:49 ` Vernon Mauery
2009-03-19 1:02 ` David Miller
2009-03-18 21:54 ` Gregory Haskins
2009-03-19 1:03 ` David Miller
2009-03-19 1:13 ` Sven-Thorsten Dietrich
2009-03-19 1:17 ` David Miller
2009-03-19 1:43 ` Sven-Thorsten Dietrich
2009-03-19 1:54 ` David Miller
2009-03-19 5:49 ` Eric Dumazet [this message]
2009-03-19 5:58 ` David Miller
2009-03-19 14:04 ` [PATCH] net: reorder struct Qdisc for better SMP performance Eric Dumazet
2009-03-20 8:33 ` David Miller
2009-03-19 13:45 ` High contention on the sk_buff_head.lock Andi Kleen
2009-03-19 3:48 ` Gregory Haskins
2009-03-19 5:38 ` David Miller
2009-03-19 12:42 ` Gregory Haskins
2009-03-19 20:52 ` David Miller
2009-03-19 12:50 ` Peter W. Morreale
2009-03-19 7:15 ` Evgeniy Polyakov
2009-03-18 21:07 ` Vernon Mauery
2009-03-18 21:45 ` Eilon Greenstein
2009-03-18 21:51 ` Vernon Mauery
2009-03-18 21:59 ` Andi Kleen
2009-03-18 22:19 ` Rick Jones
2009-03-19 12:59 ` Peter W. Morreale
2009-03-19 13:36 ` Peter W. Morreale
2009-03-19 13:46 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49C1DCDF.6050300@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=andi@firstfloor.org \
--cc=davem@davemloft.net \
--cc=ghaskins@novell.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rt-users@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pmullaney@novell.com \
--cc=sven@thebigcorporation.com \
--cc=vernux@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).