From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Miller <davem@davemloft.net>
Subject: Re: [PATCH net-next] net: avoid unneeded atomic operation in ip*_append_data()
Date: Wed, 04 Apr 2018 11:54:18 -0400 (EDT)
Message-ID: <20180404.115418.1433035029476266484.davem@davemloft.net>
References: <7e55c93c2c7cddf4c077aa77aa1ab58396f502ff.1522844999.git.pabeni@redhat.com>
In-Reply-To: <7e55c93c2c7cddf4c077aa77aa1ab58396f502ff.1522844999.git.pabeni@redhat.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: pabeni@redhat.com
Cc: netdev@vger.kernel.org, edumazet@google.com

From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 4 Apr 2018 14:30:01 +0200

> After commit 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates
> done by __ip_append_data()") and commit 1f4c6eb24029 ("ipv6:
> factorize sk_wmem_alloc updates done by __ip6_append_data()"),
> when transmitting a sub-MTU datagram, an additional, unneeded
> atomic operation is performed in ip*_append_data() to update
> wmem_alloc: in the above condition the delta is 0.
>
> The above causes a small but measurable performance regression in
> UDP xmit throughput tests with packet sizes below the MTU.
>
> This change avoids such overhead by updating wmem_alloc only if
> wmem_alloc_delta is non-zero.
>
> The error path is left intentionally unmodified: it is a slow path,
> and simplicity is preferred there over performance.
>
> Fixes: 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
> Fixes: 1f4c6eb24029 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

 ...

> -	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
> +	if (wmem_alloc_delta)
> +		refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);

 ...

> -	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
> +	if (wmem_alloc_delta)
> +		refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);

This is simple enough, so applied.

But I wonder if atomic_{add,sub}() and refcount_{add,sub}() should just
check for zero inline, just like the {set,clear}_bit() implementations
avoid the atomic operation if the bit already has the desired value.
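
Something along these lines, as a rough sketch (the helper name is
hypothetical and this is not in-tree code; the current refcount_add()
performs the atomic update unconditionally):

#include <linux/refcount.h>

/*
 * Sketch of a refcount_add() variant that skips the atomic
 * read-modify-write when the delta is zero, so callers such as
 * __ip_append_data() would not need an explicit branch of their
 * own.  Hypothetical helper, for illustration only.
 */
static inline void refcount_add_skip_zero(int i, refcount_t *r)
{
	if (!i)
		return;		/* delta is 0: no accounting change */

	refcount_add(i, r);	/* the usual atomic update */
}

Pushing the zero check into the primitive would trade an extra
(well-predicted) branch at every call site for the saved atomic in the
zero-delta case, so whether it is an overall win likely depends on how
common zero deltas are across callers.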