High contention on the sk_buff_head.lock

From: Vernon Mauery <vernux@us.ibm.com>
To: netdev <netdev@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	rt-users <linux-rt-users@vger.kernel.org>
Subject: High contention on the sk_buff_head.lock
Date: Wed, 18 Mar 2009 10:24:52 -0700
Message-ID: <49C12E64.1000301@us.ibm.com>

I have been beating on network throughput in the -rt kernel for some time
now.  After digging down through the send path of UDP packets, I found
that the sk_buff_head.lock is under some very high contention.  This lock
is acquired each time a packet is enqueued on a qdisc and then acquired
again to dequeue the packet.  Under high networking loads, the enqueueing
processes are not only contending among each other for the lock, but also
with the net-tx soft irq.  This makes for some very high contention on this
one lock.  My test case runs varying numbers of concurrent netperf
instances pushing UDP traffic to another machine.  As the count goes from
1 to 2, the network performance increases.  But from 2 to 4 and from 4 to 8,
we see a big decline, with 8 instances pushing about half of what a single
instance can do.

Running 2.6.29-rc6-rt3 on an 8-way machine with a 10GbE card (I have tried
both NetXen and Broadcom, with very similar results), I can only push about
1200 Mb/s, whereas with the mainline 2.6.29-rc8 kernel I can push nearly
6000 Mb/s, which is still less than I think is possible.  I was curious and
decided to see if the mainline kernel was hitting the same lock, and using
/proc/lock_stat, it is hitting the sk_buff_head.lock as well (it was the
number one contended lock).

So while this issue really hits -rt kernels hard, it has a real effect on
mainline kernels as well.  Contention on the spinlocks is amplified in -rt,
where they are turned into rt-mutexes, which causes a double context switch.

Below is the top of the lock_stat for 2.6.29-rc8.  This was captured from
a 1-minute network stress test.  The next-highest contender had 2 orders of
magnitude fewer contentions.  Think of the throughput increase if we could
ease this contention a bit.  We might even be able to saturate a 10GbE
link.

lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
       class name    con-bounces    contentions   waittime-min   waittime-max   waittime-total    acq-bounces   acquisitions   holdtime-min  holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

    &list->lock#3:      24517307       24643791           0.71        1286.62      56516392.42       34834296       44904018           0.60        164.79    31314786.02
     -------------
    &list->lock#3       15596927    [<ffffffff812474da>] dev_queue_xmit+0x2ea/0x468
    &list->lock#3        9046864    [<ffffffff812546e9>] __qdisc_run+0x11b/0x1ef
     -------------
    &list->lock#3        6525300    [<ffffffff812546e9>] __qdisc_run+0x11b/0x1ef
    &list->lock#3       18118491    [<ffffffff812474da>] dev_queue_xmit+0x2ea/0x468

The story is the same for -rt kernels, only the waittime and holdtime are both
orders of magnitude greater.
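
To put the mainline lock_stat row above in perspective, here is a small
sketch that derives a few ratios from it.  The numbers are copied from the
&list->lock#3 row and the first group of call sites beneath it; the helper
name summarize() is made up for illustration, and the times are assumed to
be in microseconds, as the kernel's lockstat documentation describes.

```python
# Figures copied from the &list->lock#3 row of the lock_stat output.
# Assumption: wait and hold times are in microseconds.
contentions = 24_643_791
acquisitions = 44_904_018
waittime_total_us = 56_516_392.42
holdtime_total_us = 31_314_786.02

# Contention counts per call site (first group under the summary row).
contention_sites = {
    "dev_queue_xmit": 15_596_927,
    "__qdisc_run": 9_046_864,
}


def summarize():
    """Return derived ratios for the contended lock (hypothetical helper)."""
    return {
        # Share of all acquisitions that had to wait for the lock.
        "contended_fraction": contentions / acquisitions,
        # Mean time spent waiting per contended acquisition.
        "avg_wait_us": waittime_total_us / contentions,
        # Mean time the lock was held per acquisition.
        "avg_hold_us": holdtime_total_us / acquisitions,
        # How the contentions split between the two call sites.
        "site_share": {
            name: count / contentions
            for name, count in contention_sites.items()
        },
    }


if __name__ == "__main__":
    stats = summarize()
    print(f"contended acquisitions: {stats['contended_fraction']:.1%}")
    print(f"average wait: {stats['avg_wait_us']:.2f} us")
    print(f"average hold: {stats['avg_hold_us']:.2f} us")
    for name, share in stats["site_share"].items():
        print(f"{name}: {share:.1%} of contentions")
```

By this arithmetic roughly 55% of acquisitions were contended, and the
average wait (about 2.3 us) was more than three times the average hold
(about 0.7 us).  The two call-site counts add up exactly to the contentions
column, with dev_queue_xmit accounting for about 63% of them.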

I am not exactly clear on the solution, but if I understand correctly, in the
past there has been some discussion of batched enqueueing and dequeueing.  Is
anyone else working on this problem right now who has just not yet posted
anything for review?  Questions, comments, flames?
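
As a rough illustration of why batched dequeueing could help, the sketch
below counts lock acquisitions for a toy queue drained one packet at a time
versus in batches.  This is a single-threaded user-space model, not kernel
code and not a proposed patch: CountingLock, PacketQueue, the batch limit
and count_acquisitions() are all hypothetical names chosen for the example,
and it measures only how often the lock is taken, not how long anyone waits.

```python
import threading
from collections import deque


class CountingLock:
    """Wraps threading.Lock and counts how many times it is acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc_info):
        self._lock.release()
        return False


class PacketQueue:
    """Toy stand-in for a queue guarded by one lock (hypothetical)."""

    def __init__(self):
        self.lock = CountingLock()
        self.packets = deque()

    def enqueue(self, pkt):
        with self.lock:
            self.packets.append(pkt)

    def dequeue_one(self):
        # One lock round trip per packet.
        with self.lock:
            return self.packets.popleft() if self.packets else None

    def dequeue_batch(self, limit):
        # One lock round trip for up to `limit` packets.
        with self.lock:
            n = min(limit, len(self.packets))
            return [self.packets.popleft() for _ in range(n)]


def count_acquisitions(n_packets, batch_limit=None):
    """Enqueue n_packets, drain the queue, return total lock acquisitions."""
    q = PacketQueue()
    for i in range(n_packets):
        q.enqueue(i)

    sent = []
    while True:
        if batch_limit is None:
            pkt = q.dequeue_one()
            batch = [] if pkt is None else [pkt]
        else:
            batch = q.dequeue_batch(batch_limit)
        if not batch:
            break
        sent.extend(batch)

    assert sent == list(range(n_packets))  # order is preserved either way
    return q.lock.acquisitions


if __name__ == "__main__":
    print("per-packet dequeue:", count_acquisitions(64))
    print("batched dequeue:   ", count_acquisitions(64, batch_limit=16))
```

In this model only the dequeue side is batched, so the enqueue cost of one
acquisition per packet stays the same.  The dequeuer holds the lock longer
per acquisition, which is the trade-off any real batching scheme would have
to weigh.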

--Vernon