From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: netdev <netdev@vger.kernel.org>
Subject: [PATCH V4 0/4] net: relax dst refcnt for net-next-2.6
Date: Mon, 10 May 2010 23:08:36 +0200
Message-ID: <1273525716.2590.313.camel@edumazet-laptop>
Here is V4 of a patch previously sent last year.
One serious point of contention in the network stack is the refcount of
IP route cache entries in the input path, on SMP setups.
Under stress, one CPU (say A) handles network softirq RX processing.
When a packet is received, we need to find a dst_entry, take
a reference on this dst_entry and associate the skb with it.
The skb is then queued on a socket receive queue.
When the application (running on another CPU, B) dequeues this packet,
it has to release the dst_entry, whose refcount is hot and dirty in
CPU A's cache, which causes an expensive cache line ping-pong.
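To make the cost concrete, here is a minimal userspace sketch of the refcounted receive path. struct model_dst, struct model_skb and the model_* functions are hypothetical stand-ins for struct dst_entry, struct sk_buff and the kernel's hold/release helpers, and C11 atomics replace the kernel's atomic_t. Each packet costs two atomic read-modify-write ops on the shared counter, one on CPU A and one on CPU B, and that pair is what moves the cache line between the two CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_dst {
	atomic_int refcnt;	/* shared counter: the "hot and dirty" cache line */
};

struct model_skb {
	struct model_dst *dst;
};

/* Counts atomic read-modify-write operations done on refcnt. */
static int rmw_ops;

/* CPU A, softirq RX: look up the route and pin it for this skb. */
static void model_rx_attach(struct model_skb *skb, struct model_dst *dst)
{
	atomic_fetch_add(&dst->refcnt, 1);	/* atomic op #1, runs on CPU A */
	rmw_ops++;
	skb->dst = dst;
}

/* CPU B, recvmsg(): the application dequeues the skb and drops the route. */
static void model_app_release(struct model_skb *skb)
{
	int old = atomic_fetch_sub(&skb->dst->refcnt, 1);	/* atomic op #2, runs on CPU B */
	rmw_ops++;
	assert(old > 0);
	skb->dst = NULL;
}
```

Starting from a route whose refcount is 1 (the cache's own reference), one attach plus one release brings it back to 1 after exactly two atomic ops.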
Back in November 2008, we tried to keep this cache line only
in CPU A's cache (commit 703556028792,
"net: release skb->dst in sock_queue_rcv_skb()"), but we had
to revert this commit because it broke IP_PKTINFO handling,
as noticed by Mark McLoughlin.
Then David suggested not taking the reference in the first place,
which this patch does when possible.
We prepared this work with commit adf30907 (net: skb->dst accessors),
which introduced accessors to work on skb->dst.
We can now use the low-order bit of skb->_skb_dst to tell
whether a reference was _not_ taken on the dst for this skb.
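A minimal sketch of the low-order-bit encoding follows. It assumes only that a dst_entry is at least 2-byte aligned, so bit 0 of its address is always zero. The model_* names and MODEL_DST_NOREF are hypothetical, not the kernel API added by 1/4. The point is that readers mask the bit off before dereferencing, and the release path drops a reference only when one was actually taken.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: bit 0 set means "no reference was taken for this skb".
 * Using uintptr_t keeps the mask as wide as a pointer on every ABI. */
#define MODEL_DST_NOREF   ((uintptr_t)1)
#define MODEL_DST_PTRMASK (~MODEL_DST_NOREF)

struct model_dst {
	int refcnt;	/* int alignment guarantees bit 0 of the address is 0 */
};

struct model_skb {
	uintptr_t _skb_dst;	/* pointer bits | optional NOREF flag */
};

/* Attach a dst for which the caller already holds a reference. */
static void model_skb_dst_set(struct model_skb *skb, struct model_dst *dst)
{
	skb->_skb_dst = (uintptr_t)dst;
}

/* Attach a dst without a reference: only valid inside the RCU section. */
static void model_skb_dst_set_noref(struct model_skb *skb, struct model_dst *dst)
{
	skb->_skb_dst = (uintptr_t)dst | MODEL_DST_NOREF;
}

static int model_skb_dst_is_noref(const struct model_skb *skb)
{
	return (skb->_skb_dst & MODEL_DST_NOREF) != 0;
}

/* Accessor: strip the flag before handing out the pointer. */
static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	return (struct model_dst *)(skb->_skb_dst & MODEL_DST_PTRMASK);
}

/* Release: decrement only if this skb really owns a reference. */
static void model_skb_dst_drop(struct model_skb *skb)
{
	if (skb->_skb_dst && !model_skb_dst_is_noref(skb)) {
		assert(model_skb_dst(skb)->refcnt > 0);
		model_skb_dst(skb)->refcnt--;
	}
	skb->_skb_dst = 0;
}
```

Because the flag lives in the pointer itself, no extra skb field is needed. That is presumably why the accessors had to be in place first: any code still reading skb->dst directly would otherwise see a tagged pointer.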
We make sure a dst leaving an RCU-protected region has a refcount.
This is done when enqueueing on any kind of queue (backlog, qdisc,
nf_queue, ...).
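The enqueue rule can be sketched as an upgrade step. If the noref bit is set, take a reference and clear the bit before the skb goes on a queue that may be drained after the RCU section ends. All names here are hypothetical, the queue is a toy fixed-size array, and a plain int stands in for the kernel's atomic refcount.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: bit 0 of the stored pointer means "no ref held". */
#define MODEL_DST_NOREF ((uintptr_t)1)

struct model_dst { int refcnt; };
struct model_skb { uintptr_t _skb_dst; };

static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	return (struct model_dst *)(skb->_skb_dst & ~MODEL_DST_NOREF);
}

/* Upgrade a noref dst to a refcounted one; a no-op if a ref is already held. */
static void model_skb_dst_force(struct model_skb *skb)
{
	if (skb->_skb_dst & MODEL_DST_NOREF) {
		model_skb_dst(skb)->refcnt++;		/* kernel code would use an atomic op */
		skb->_skb_dst &= ~MODEL_DST_NOREF;	/* the skb now owns a reference */
	}
}

/* Toy queue standing in for backlog, qdisc or nf_queue. */
struct model_queue {
	struct model_skb *slots[8];
	int len;
};

static void model_enqueue(struct model_queue *q, struct model_skb *skb)
{
	assert(q->len < 8);
	model_skb_dst_force(skb);	/* the dst must not leave RCU unrefcounted */
	q->slots[q->len++] = skb;
}
```

Packets that are consumed inside the RCU section never hit this path, so they keep the saving. Only packets that are actually queued pay the atomic op.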
The net effect of this patch is to avoid two atomic ops per
incoming packet, and two atomic ops per outgoing TCP packet.
The same holds for the outgoing path if the device has IFF_XMIT_DST_RELEASE,
or if the qdisc is work-conserving (or there is no queue).
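On transmit, the choice depends on whether the driver needs the dst. Below is a hedged sketch of that policy using hypothetical names, with a made-up flag bit standing in for IFF_XMIT_DST_RELEASE. If the device sets the flag, the dst is dropped right away, and for a noref dst this costs no atomic op at all. Otherwise a reference is forced, because the driver may keep the skb beyond the RCU section. The real logic lives in dev_queue_xmit() and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DST_NOREF ((uintptr_t)1)
/* Hypothetical flag bit standing in for IFF_XMIT_DST_RELEASE in priv_flags. */
#define MODEL_IFF_XMIT_DST_RELEASE 0x1u

struct model_dst { int refcnt; };
struct model_skb { uintptr_t _skb_dst; };
struct model_dev { unsigned int priv_flags; };

static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	return (struct model_dst *)(skb->_skb_dst & ~MODEL_DST_NOREF);
}

/* Drop the dst; decrement only if a reference is actually held. */
static void model_skb_dst_drop(struct model_skb *skb)
{
	if (skb->_skb_dst && !(skb->_skb_dst & MODEL_DST_NOREF)) {
		assert(model_skb_dst(skb)->refcnt > 0);
		model_skb_dst(skb)->refcnt--;
	}
	skb->_skb_dst = 0;
}

/* Turn a noref dst into a refcounted one. */
static void model_skb_dst_force(struct model_skb *skb)
{
	if (skb->_skb_dst & MODEL_DST_NOREF) {
		model_skb_dst(skb)->refcnt++;
		skb->_skb_dst &= ~MODEL_DST_NOREF;
	}
}

/* Transmit-side policy: release the dst while it is still hot in this CPU's
 * cache if the driver does not need it; otherwise pin it. */
static void model_dev_queue_xmit_dst(const struct model_dev *dev,
				     struct model_skb *skb)
{
	if (dev->priv_flags & MODEL_IFF_XMIT_DST_RELEASE)
		model_skb_dst_drop(skb);
	else
		model_skb_dst_force(skb);
}
```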
V2: Forwarding is taken into account by changes in dev_queue_xmit(),
forcing a dst refcount on !IFF_XMIT_DST_RELEASE devices.
V3: As pointed out by Patrick, we must force a dst refcount in
__nf_queue() before queueing a packet.
V4:
- The output path (ip_queue_xmit()) is handled as well.
- Commit f84af32cbca70 (net: ip_queue_rcv_skb() helper) is already in tree.
- Some interim checks make sure a dst does not escape unrefcounted
from an RCU section (thanks to lockdep).
- Better handling of queueing (backlog, qdisc).
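The interim checks mentioned in V4 can be pictured as an assertion in the accessor: dereferencing a noref dst outside an RCU read-side section is a bug. In the kernel, lockdep exposes rcu_read_lock_held() and rcu_read_lock_bh_held() for this kind of test. The sketch below is not the patch's code. It models the RCU state with a hypothetical nesting counter and counts a warning instead of calling WARN_ON().

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DST_NOREF ((uintptr_t)1)

struct model_dst { int refcnt; };
struct model_skb { uintptr_t _skb_dst; };

/* Hypothetical stand-ins for RCU read-side nesting and for WARN_ON() hits. */
static int model_rcu_depth;
static int model_warnings;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Accessor with a debug check: a noref dst may only be used under RCU.
 * Refcounted dsts are always safe to access. */
static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	if ((skb->_skb_dst & MODEL_DST_NOREF) && model_rcu_depth == 0)
		model_warnings++;	/* an unrefcounted dst escaped the RCU section */
	return (struct model_dst *)(skb->_skb_dst & ~MODEL_DST_NOREF);
}
```

A check like this costs nothing on production builds if it is compiled only under a debug option, which fits its role as an interim safety net.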
The patch is split into 4 parts:
1/4 : add a noref bit on skb dst (dstref infrastructure)
2/4 : ip_route_input_noref() introduction
3/4 : Use ip_route_input_noref() in three input paths
4/4 : norefcounting in ip_queue_xmit()
include/linux/skbuff.h | 58 ++++++++++++++++++++++++++++++++++---
include/net/dst.h | 48 ++++++++++++++++++++++++++++--
include/net/route.h | 17 ++++++++++
include/net/sock.h | 13 +++++---
net/core/dev.c | 3 +
net/core/skbuff.c | 2 -
net/core/sock.c | 6 +++
net/ipv4/arp.c | 2 -
net/ipv4/icmp.c | 6 +--
net/ipv4/ip_input.c | 4 +-
net/ipv4/ip_options.c | 9 +++--
net/ipv4/ip_output.c | 9 ++++-
net/ipv4/netfilter.c | 6 +--
net/ipv4/route.c | 17 +++++++---
net/ipv4/xfrm4_input.c | 4 +-
net/netfilter/nf_queue.c | 2 +
net/sched/sch_generic.c | 2 -
17 files changed, 170 insertions(+), 38 deletions(-)