All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: andi@firstfloor.org, netdev@vger.kernel.org
Subject: [PATCH] net: release skb->dst in sock_queue_rcv_skb()
Date: Wed, 26 Nov 2008 01:00:30 +0100	[thread overview]
Message-ID: <492C919E.3050108@cosmosbay.com> (raw)
In-Reply-To: <20081124.210038.90767194.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 2867 bytes --]

David Miller a écrit :
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Tue, 25 Nov 2008 05:43:32 +0100
> 
>> Very interesting. So we could try the following path :
>>
>> 1) First try to release dst when queueing skb to various queues
>> (UDP, TCP, ...) while its hot. Reader wont have to release it
>> while its cold.
>>
>> 2) Check if we can handle the input path without any refcount
>>    dirtying ?
>>
>> To make the transition easy, we could use a bit on skb to mark
>> dst being not refcounted (ie no dst_release() should be done on it)
> 
> It is possible to make this self-auditing.  For example, by
> using the usual trick where we encode a pointer in an
> unsigned long and use the low bits for states.
> 
> In the first step, make each skb->dst access go through some
> accessor inline function.
> 
> Next, audit the paths where skb->dst's can "escape" the pure
> packet input path.  Add annotations, in the form of a
> inline function call, for these locations.
> 
> Also, audit the other locations where we enqueue into a socket
> queue and no longer care about the skb->dst, and annotate
> those with another inline function.
> 
> Finally, the initial skb->dst assignment in the input path doesn't
> grab a reference, but sets the low bit ("refcount pending") in
> the encoded skb->dst pointer.  The skb->dst "escape" inline
> function performs the deferred refcount grab.  And kfree_skb()
> is taught to not dst_release() on skb->dst's which have the
> low bit set.
> 
> Anyways, something like that.

I looked this stuff and found it would be difficult to not grab a 
reference (and more important not writing to dst) in input path.

ip_rcv_finish() calls ip_route_input()
and ip_route_input() calls dst_use(&rth->u.dst, jiffies);

static inline void dst_use(struct dst_entry *dst, unsigned long time)
{
        dst_hold(dst);
        dst->__use++;
        dst->lastuse = time;
}

Even if we avoid the refcount increment, I guess we need the lastuse
assignement in order to keep dst in cache. Not sure about the role of
__use field. Hum... for a tcp connection, dst refcount should already
be pinned by a sk->sk_dst_cache. Maybe test refcount value, and if this
value is > 1, dont take a reference. (given rcu_read_lock() is done
before calling ip_rcv_finish())

In the meantime, what do you think of the following patch ?

[PATCH] net: release skb->dst in sock_queue_rcv_skb()

When queuing a skb to sk->sk_receive_queue, we can release its dst, not
anymore needed.
Since current cpu did the dst_hold(), refcount is probably still hot
int this cpu caches.

This avoids readers to access the original dst to decrement its refcount,
possibly a long time after packet reception. This should speedup UDP
and RAW receive path.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

[-- Attachment #2: sock_queue_rcv_skb.patch --]
[-- Type: text/plain, Size: 534 bytes --]

diff --git a/net/core/sock.c b/net/core/sock.c
index a4e840e..b287645 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -289,7 +289,11 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
 	skb->dev = NULL;
 	skb_set_owner_r(skb, sk);
-
+	/*
+	 * release dst right now while its hot
+	 */
+	dst_release(skb->dst);
+	skb->dst = NULL;
 	/* Cache the SKB length before we tack it onto the receive
 	 * queue.  Once it is added it no longer belongs to us and
 	 * may be freed by other threads of control pulling packets

  reply	other threads:[~2008-11-26  0:00 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-24  8:57 [RFC] Could we avoid touching dst->refcount in some cases ? Eric Dumazet
2008-11-24  9:42 ` Andi Kleen
2008-11-24 10:14   ` Eric Dumazet
2008-11-24 11:24     ` [PATCH] net: avoid a pair of dst_hold()/dst_release() in ip_append_data() Eric Dumazet
2008-11-24 13:59       ` [PATCH] net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames() Eric Dumazet
2008-11-25  0:07         ` David Miller
2008-11-24 23:55       ` [PATCH] net: avoid a pair of dst_hold()/dst_release() in ip_append_data() David Miller
2008-11-25  2:22         ` Andi Kleen
2008-11-24 11:27     ` [RFC] Could we avoid touching dst->refcount in some cases ? Andi Kleen
2008-11-24 23:36       ` David Miller
2008-11-24 23:39     ` David Miller
2008-11-25  4:43       ` Eric Dumazet
2008-11-25  5:00         ` David Miller
2008-11-26  0:00           ` Eric Dumazet [this message]
2008-11-26  0:23             ` [PATCH] net: release skb->dst in sock_queue_rcv_skb() David Miller
2008-11-26  2:04             ` David Miller
2008-11-26  7:39               ` Eric Dumazet
2008-11-26  9:08                 ` David Miller
2008-12-17 11:25             ` net-next: broken IP_PKTINFO [was Re: [PATCH] net: release skb->dst in sock_queue_rcv_skb()] Mark McLoughlin
2008-12-18  3:34               ` net-next: broken IP_PKTINFO David Miller
2008-12-18  5:59                 ` Eric Dumazet
2008-12-18  6:17                   ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=492C919E.3050108@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=andi@firstfloor.org \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.