From: Eric Dumazet <dada1@cosmosbay.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Linux Netdev List <netdev@vger.kernel.org>
Subject: Re: [RFC] Could we avoid touching dst->refcount in some cases ?
Date: Mon, 24 Nov 2008 11:14:29 +0100
Message-ID: <492A7E85.3060502@cosmosbay.com>
In-Reply-To: <87y6z9h33h.fsf@basil.nowhere.org>
Andi Kleen wrote:
> Eric Dumazet <dada1@cosmosbay.com> writes:
>
>> tbench has a hard time incrementing and decrementing the route cache refcount
>> shared by all communications on localhost.
>
> iirc there was a patch some time ago to use per CPU loopback devices to
> avoid this, but it was considered too much of a benchmark hack.
> As core counts increase it might stop being that though.
Well, you are probably thinking of Stephen's patch to avoid dirtying other
contended cache lines (one napi structure per cpu).

Having multiple loopback devices would really be a hack, I agree.
>
>> In the real world, we also have this problem on RTP servers sending many UDP
>> frames to media gateways, especially big ones handling thousands of streams.
>>
>> Given that route entries are using RCU, we probably can avoid incrementing
>> their refcount in case of connected sockets ?
>
> Normally they can be held over sleeps or queuing of skbs too, and RCU
> doesn't handle that. To make it handle that you would need to define a
> custom RCU period designed for this case, but this would probably be
> tricky and fragile: in particular, even if you had an "any
> packet queued" RCU method, I'm not sure it would be guaranteed to always
> finish, because there is no fixed upper lifetime of a packet.
>
> The other issue is that on preemptible kernels you would need to
> disable preemption for as long as such a routing entry is held, which
> could potentially be quite long.
>
Well, in the case of UDP, we call ip_push_pending_frames(), and that one
increments the refcount (again). I was not considering avoiding the
reference we take when queueing an skb in the transmit queue, only the
one held for a short period of time. Oh well, ip_append_data()
might sleep, so this cannot work...

I agree that avoiding one refcount increment/decrement is probably
not a huge gain, considering we *have* to do the increment,
but when many cpus are doing UDP send/receive in parallel, this might
show some gain.

So maybe we could make ip_append_data() (or its callers) a
little bit smarter, avoiding the increment/decrement when possible.
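
To make the idea concrete, here is a rough user-space sketch, not kernel
code. All names (dst_model, sock_model, send_with_temp_ref, send_borrowing)
are hypothetical stand-ins, and it assumes the connected socket's own
reference pins the route for the whole duration of the send call. Under that
assumption the temporary hold/release pair is redundant, and only the
reference carried by the queued packet remains.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's struct dst_entry. */
struct dst_model {
    atomic_int refcnt;          /* shared counter, contended by all senders */
};

struct sock_model {
    struct dst_model *dst;      /* reference owned by the connected socket */
};

/* Counts refcount updates, for illustration only (not thread-safe). */
static int refcnt_ops;

static void dst_model_hold(struct dst_model *dst)
{
    atomic_fetch_add(&dst->refcnt, 1);
    refcnt_ops++;
}

static void dst_model_release(struct dst_model *dst)
{
    int old = atomic_fetch_sub(&dst->refcnt, 1);

    assert(old > 0);            /* releasing more than was held is a bug */
    refcnt_ops++;
}

/* Baseline: a temporary reference is taken and dropped around the send. */
static struct dst_model *send_with_temp_ref(struct sock_model *sk)
{
    struct dst_model *dst = sk->dst;

    dst_model_hold(dst);        /* temporary reference while building */
    dst_model_hold(dst);        /* reference that travels with the packet */
    dst_model_release(dst);     /* drop the temporary reference */
    return dst;
}

/* Variant: borrow the socket's reference, keep only the packet's hold. */
static struct dst_model *send_borrowing(struct sock_model *sk)
{
    struct dst_model *dst = sk->dst;   /* borrowed, pinned by the socket */

    dst_model_hold(dst);        /* the increment we *have* to do */
    return dst;
}
```

In this model the baseline touches the shared counter three times per send
and the borrowing variant once. Whether the real ip_append_data() callers can
rely on the socket's reference in this way is exactly the open question.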