From: Eric Dumazet
Subject: Re: [RFC] Could we avoid touching dst->refcount in some cases ?
Date: Mon, 24 Nov 2008 11:14:29 +0100
Message-ID: <492A7E85.3060502@cosmosbay.com>
References: <492A6C94.7030308@cosmosbay.com> <87y6z9h33h.fsf@basil.nowhere.org>
In-Reply-To: <87y6z9h33h.fsf@basil.nowhere.org>
To: Andi Kleen
Cc: Linux Netdev List

Andi Kleen wrote:
> Eric Dumazet writes:
>
>> tbench has hard time incrementing decrementing the route cache refcount
>> shared by all communications on localhost.
>
> iirc there was a patch some time ago to use per CPU loopback devices to
> avoid this, but it was considered too much a benchmark hack.
> As core counts increase it might stop being that though.

Well, you are probably thinking of Stephen's patch to avoid dirtying other
contended cache lines (one napi structure per cpu).
Having multiple loopback devices would really be a hack, I agree.

>
>> On real world, we also have this problem on RTP servers sending many UDP
>> frames to mediagateways, especially big ones handling thousand of streams.
>>
>> Given that route entries are using RCU, we probably can avoid incrementing
>> their refcount in case of connected sockets ?
>
> Normally they can be hold over sleeps or queuing of skbs too, and RCU
> doesn't handle that.
> To make it handle that you would need to define a
> custom RCU period designed for this case, but this would be probably
> tricky and fragile: especially I'm not sure even if you had a "any
> packet queued" RCU method it be guaranteed to always finish
> because there is no fixed upper livetime of a packet.
>
> The other issue is that on preemptible kernels you would need to
> disable preemption all the time such a routing entry is hold, which
> could be potentially quite long.
>

Well, in the case of UDP, we call ip_push_pending_frames(), and this one
does the increment of the refcount (again). I was not considering
avoiding the refcount hold we do when queuing a skb in the transmit
queue, only the one we take for a short period of time.

Oh well, ip_append_data() might sleep, so this cannot work...

I agree that avoiding one refcount increment/decrement is probably not a
huge gain, considering we *have* to do the increment, but when many cpus
are using UDP send/receive in parallel, this might show a gain somehow.

So maybe we could make ip_append_data() (or its callers) a little bit
smarter, avoiding the increment/decrement if possible.
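
To make the idea concrete, here is a rough userspace sketch, not kernel code. Everything named toy_* is hypothetical and only models the counting: a connected socket already owns one reference on its cached route, the send path either takes a temporary extra reference or simply borrows the socket's one, and the queued packet always takes its own. It assumes the socket's reference cannot go away during the call, which is exactly the part that is doubtful if ip_append_data() sleeps.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy stand-in for a route cache entry: only the shared counter. */
struct toy_dst {
	atomic_int refcnt;
};

/* Toy connected socket: owns one reference on its cached route. */
struct toy_sock {
	struct toy_dst *dst_cache;
};

/* Toy packet: owns its own reference once it is queued. */
struct toy_skb {
	struct toy_dst *dst;
};

static void toy_dst_hold(struct toy_dst *dst)
{
	atomic_fetch_add(&dst->refcnt, 1);
}

static void toy_dst_release(struct toy_dst *dst)
{
	int old = atomic_fetch_sub(&dst->refcnt, 1);

	assert(old > 0);	/* catch an unbalanced release */
}

/*
 * Pattern with a temporary hold around the send path.
 * Returns the number of atomic operations done on the shared counter.
 */
static int toy_send_hold(struct toy_sock *sk, struct toy_skb *skb)
{
	struct toy_dst *rt = sk->dst_cache;

	toy_dst_hold(rt);	/* temporary reference for the send path */
	skb->dst = rt;
	toy_dst_hold(rt);	/* reference owned by the queued packet */
	toy_dst_release(rt);	/* drop the temporary reference */
	return 3;
}

/*
 * Borrowing variant: rely on the socket's own reference while building
 * the packet (ASSUMPTION: nothing drops it during the call).
 */
static int toy_send_borrow(struct toy_sock *sk, struct toy_skb *skb)
{
	struct toy_dst *rt = sk->dst_cache;	/* borrowed, no hold */

	skb->dst = rt;
	toy_dst_hold(rt);	/* the queued packet still needs its own */
	return 1;
}
```

In this model the borrowing variant touches the shared counter once per packet instead of three times; whether the real send path can guarantee the borrowed reference stays valid is the open question.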