From: Eric Dumazet <dada1@cosmosbay.com>
To: Vernon Mauery <vernux@us.ibm.com>
Cc: netdev <netdev@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
rt-users <linux-rt-users@vger.kernel.org>
Subject: Re: High contention on the sk_buff_head.lock
Date: Wed, 18 Mar 2009 20:07:19 +0100
Message-ID: <49C14667.2040806@cosmosbay.com>
In-Reply-To: <49C12E64.1000301@us.ibm.com>
Vernon Mauery wrote:
> I have been beating on network throughput in the -rt kernel for some time
> now. After digging down through the send path of UDP packets, I found
> that the sk_buff_head.lock is under some very high contention. This lock
> is acquired each time a packet is enqueued on a qdisc and then acquired
> again to dequeue the packet. Under high networking loads, the enqueueing
> processes are not only contending among each other for the lock, but also
> with the net-tx soft irq. This makes for some very high contention on this
> one lock. My testcase is running varying numbers of concurrent netperf
> instances pushing UDP traffic to another machine. As the count goes from
> 1 to 2, the network performance increases. But from 2 to 4 and from 4
> to 8, we see a big decline, with 8 instances pushing about half of what
> a single thread can do.
>
> Running 2.6.29-rc6-rt3 on an 8-way machine with a 10GbE card (I have tried
> both NetXen and Broadcom, with very similar results), I can only push about
> 1200 Mb/s. Whereas with the mainline 2.6.29-rc8 kernel, I can push nearly
> 6000 Mb/s. But still not as much as I think is possible. I was curious and
> decided to see if the mainline kernel was hitting the same lock, and using
> /proc/lock_stat, it is hitting the sk_buff_head.lock as well (it was the
> number one contended lock).
>
> So while this issue really hits -rt kernels hard, it has a real effect on
> mainline kernels as well. The contention of the spinlocks is amplified
> when they get turned into rt-mutexes, which causes a double context switch.
>
> Below is the top of the lock_stat for 2.6.29-rc8. This was captured from
> a 1 minute network stress test. The next high contender had 2 orders of
> magnitude fewer contentions. Think of the throughput increase if we could
> ease this contention a bit. We might even be able to saturate a 10GbE
> link.
>
> lock_stat version 0.3
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
> class name        con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> &list->lock#3:       24517307       24643791           0.71        1286.62    56516392.42       34834296       44904018           0.60         164.79    31314786.02
> -------------
> &list->lock#3        15596927  [<ffffffff812474da>] dev_queue_xmit+0x2ea/0x468
> &list->lock#3         9046864  [<ffffffff812546e9>] __qdisc_run+0x11b/0x1ef
> -------------
> &list->lock#3         6525300  [<ffffffff812546e9>] __qdisc_run+0x11b/0x1ef
> &list->lock#3        18118491  [<ffffffff812474da>] dev_queue_xmit+0x2ea/0x468
>
>
> The story is the same for -rt kernels, only the waittime and holdtime
> are both orders of magnitude greater.
>
> I am not exactly clear on the solution, but if I understand correctly,
> in the past there has been some discussion of batched enqueueing and
> dequeueing. Is anyone else working on this problem right now who has
> just not yet posted anything for review? Questions, comments, flames?
>
Yes, we have a known contention point here, but before adding more complex
code, could you try the following patch, please?

[PATCH] net: Reorder fields of struct Qdisc

dev_queue_xmit() needs to dirty the "state" and "q" fields.

On x86_64, they currently span two cache lines, which causes more
cache line ping-pongs than necessary.

Before patch:

offsetof(struct Qdisc, state)=0x38
offsetof(struct Qdisc, q)=0x48
offsetof(struct Qdisc, dev_queue)=0x60

After patch:

offsetof(struct Qdisc, dev_queue)=0x38
offsetof(struct Qdisc, state)=0x48
offsetof(struct Qdisc, q)=0x50

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f8c4742..e24feeb 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -51,10 +51,11 @@ struct Qdisc
 	u32			handle;
 	u32			parent;
 	atomic_t		refcnt;
-	unsigned long		state;
+	struct netdev_queue	*dev_queue;
+
 	struct sk_buff		*gso_skb;
+	unsigned long		state;
 	struct sk_buff_head	q;
-	struct netdev_queue	*dev_queue;
 	struct Qdisc		*next_sched;
 	struct list_head	list;