Re: [PATCH nf-next 3/5] nft_set_pipapo: Prepare for vectorised implementation: alignment

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>, netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf-next 3/5] nft_set_pipapo: Prepare for vectorised implementation: alignment
Date: Thu, 5 Mar 2020 21:35:13 +0100	[thread overview]
Message-ID: <20200305213513.504171de@elisabeth> (raw)
In-Reply-To: <20200223221435.GX19559@breakpoint.cc>

Hi Florian,

On Sun, 23 Feb 2020 23:14:35 +0100
Florian Westphal <fw@strlen.de> wrote:

> Stefano Brivio <sbrivio@redhat.com> wrote:
> >  struct nft_pipapo_field {
> > @@ -439,6 +456,9 @@ struct nft_pipapo_field {
> >  	unsigned long rules;
> >  	size_t bsize;
> >  	int bb;
> > +#ifdef NFT_PIPAPO_ALIGN
> > +	unsigned long *lt_aligned;
> > +#endif
> >  	unsigned long *lt;
> >  	union nft_pipapo_map_bucket *mt;
> >  };  
> 
> I wonder if these structs can be compressed.
> AFAICS bsize is in sizes of longs, so when this number is
> large then we also need to kvmalloc a large blob of memory.
> 
> I think u32 would be enough?
> 
> nft_pipapo_field is probably the most relevant one wrt. to size.

...so I tried rearranging that struct. The results (on both x86_64 and
aarch64) are rather disappointing: the hole (that we get on 64-bit
architectures) seems to actually be beneficial.

If I turn the 'unsigned long' and 'size_t' members to u32, matching
rates drop very slightly (1-3% depending on the case in the usual
kselftest).

I then tried to shrink it more aggressively ('bb' and 'groups' can be
u8, 'bsize' can probably even be u16), and there the performance hit is
much more apparent (> 10%) -- but this is something I can easily explain
with word masks and shifts.

I'm not sure exactly what happens with the pair of u32's. The assembly
looks clean. I would probably need some micro-benchmarking to
clearly relate this to execution pipeline "features" and to, perhaps,
find a way to shuffle accesses to fields to actually speed this up
while fitting two fields in the same word.

However, I'm not so sure it's worth it at this stage.

> >  struct nft_pipapo_match {
> >  	int field_count;
> > +#ifdef NFT_PIPAPO_ALIGN
> > +	unsigned long * __percpu *scratch_aligned;
> > +#endif
> >  	unsigned long * __percpu *scratch;
> >  	size_t bsize_max;  
> 
> Same here (bsize_max -- could fit with hole after field_count)?
> 
> Also, since you know the size of nft_pipapo_match (including the
> dynamically allocated array at the end), you could store the
> original memory (*scratch) and the rcu_head at the end, since
> they are not needed at lookup time and a little overhead to calculate
> their storage offset is fine.
> 
> Not sure its worth it, just an idea.

I actually bothered with this: even without the trick you explained,
struct field f[0] can safely become f[16] (NFT_REG32_COUNT), and I can
move it to the top, and then push the rcu_head down. This, again, hits
lookup rates quite badly. With f[4] lookup rates are the same as the
current case.

So, well, I wouldn't touch this either -- maybe some micro-benchmarking
might suggest a better way, but I doubt it's worth it right now.

-- 
Stefano

next prev parent reply	other threads:[~2020-03-05 20:35 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-23 21:23 [PATCH nf-next 0/5] nft_set_pipapo: Performance improvements: Season 1 Stefano Brivio
2020-02-23 21:23 ` [PATCH nf-next 1/5] nft_set_pipapo: Generalise group size for buckets Stefano Brivio
2020-02-23 21:23 ` [PATCH nf-next 2/5] nft_set_pipapo: Add support for 8-bit lookup groups and dynamic switch Stefano Brivio
2020-02-23 21:23 ` [PATCH nf-next 3/5] nft_set_pipapo: Prepare for vectorised implementation: alignment Stefano Brivio
2020-02-23 22:14   ` Florian Westphal
2020-02-23 23:04     ` Stefano Brivio
2020-02-23 23:15       ` Florian Westphal
2020-03-05 20:35     ` Stefano Brivio [this message]
2020-02-23 21:23 ` [PATCH nf-next 4/5] nft_set_pipapo: Prepare for vectorised implementation: helpers Stefano Brivio
2020-02-23 21:23 ` [PATCH nf-next 5/5] nft_set_pipapo: Introduce AVX2-based lookup implementation Stefano Brivio
2020-02-23 22:16   ` Florian Westphal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200305213513.504171de@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=fw@strlen.de \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.