From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paolo Abeni
Subject: Re: [RFC PATCH] reuseport: compute the ehash only if needed
Date: Thu, 14 Dec 2017 09:29:46 +0100
Message-ID: <1513240186.2604.10.camel@redhat.com>
References: <20171213.150855.2054919319089098824.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, kraig@google.com, edumazet@google.com
To: David Miller
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:44626 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750795AbdLNI3s (ORCPT ); Thu, 14 Dec 2017 03:29:48 -0500
In-Reply-To: <20171213.150855.2054919319089098824.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

Hi,

On Wed, 2017-12-13 at 15:08 -0500, David Miller wrote:
> From: Paolo Abeni
> Date: Tue, 12 Dec 2017 14:09:28 +0100
>
> > When a reuseport socket group is using a BPF filter to distribute
> > the packets among the sockets, we don't need to compute any hash
> > value, but the current reuseport_select_sock() requires the
> > caller to compute such a hash in advance.
> >
> > This patch reworks reuseport_select_sock() to compute the hash value
> > only if needed - when the BPF filter is missing or fails. Since
> > different hash functions have different argument types - ipv4
> > addresses vs ipv6 ones - to avoid over-complicating the interface,
> > reuseport_select_sock() is now a macro.
> >
> > Additionally, the sk_reuseport test is moved inside
> > reuseport_select_sock, to avoid some code duplication.
> >
> > Overall this gives a small but measurable performance improvement
> > under UDP flood while using SO_REUSEPORT + BPF.
> >
> > Signed-off-by: Paolo Abeni
>
> I don't doubt that this improves the case where the hash is elided, but
> I suspect it makes things slower otherwise.
>
> You're doing two function calls for an operation that used to require
> just one at the bottom of the call chain.
>
> You're also putting something onto the stack that the compiler can't
> possibly optimize into purely using cpu registers to hold.

Thank you for the feedback.

I was unable to measure any performance regression for the hash-based
demultiplexing, and I think that the number of function calls is
unchanged in that scenario: with the vanilla kernel we have ehash() and
reuseport_select_sock(), with the patched one __reuseport_get_info()
and ehash().

I agree you are right about the additional stack usage introduced by
this patch.

Overall I see we need something better than this.

Thanks,

Paolo