netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps
Date: Wed, 7 Feb 2024 18:27:51 +0100	[thread overview]
Message-ID: <20240207182751.6ed0dd1d@elisabeth> (raw)
In-Reply-To: <20240207152328.GA11077@breakpoint.cc>

On Wed, 7 Feb 2024 16:23:28 +0100
Florian Westphal <fw@strlen.de> wrote:

> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > This isn't reliable.  
> > 
> > Uh oh. In hindsight, this sounds so obvious now...  
> 
> Thats a recurring theme with a lot of bugs.
> So, no, it was not obvious.
> 
> > > There can be multiple sets and we can't be sure that the upper
> > > and lower half of all set scratch map is always in sync (lookups
> > > can be conditional), so one set might have swapped, but other might
> > > not have been queried.
> > > 
> > > Thus we need to keep the index per-set-and-cpu, just like the
> > > scratchpad.
> > > 
> > > Note that this bug fix is incomplete, there is a related issue.
> > > 
> > > avx2 and normal implementation might use slightly different areas of the
> > > map array space due to the avx2 alignment requirements, so
> > > m->scratch (generic/fallback implementation) and ->scratch_aligned
> > > (avx) may partially overlap. scratch and scratch_aligned are not distinct
> > > objects, the latter is just the aligned address of the former.
> > > 
> > > After this change, write to scratch_align->map_index may write to
> > > scratch->map, so this issue becomes more prominent, we can set to 1
> > > a bit in the supposedly-all-zero area of scratch->map[].
> > > 
> > > A followup patch will remove the scratch_aligned and makes generic and
> > > avx code use the same (aligned) area.
> > > 
> > > Its done in a separate change to ease review.
> > > 
> > > Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
> > > Signed-off-by: Florian Westphal <fw@strlen.de>  
> > 
> > Minus one nit (not worth respinning) and one half-doubt below,
> > 
> > Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> > 
> > ...I'm still reviewing the rest.  
> 
> Thanks for reviewing.
> 
> > >  #ifdef NFT_PIPAPO_ALIGN
> > > -		scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
> > > +		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
> > > +		/* Need to align ->map start, not start of structure itself */
> > > +		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);  
> > 
> > This should be fine with the current version of
> > NFT_PIPAPO_ALIGN_HEADROOM, but it took me quite some pondering, reasoning
> > below if you feel like double checking.  
> 
> Good point.
> 
> > Let's say ARCH_KMALLOC_MINALIGN is 4, NFT_PIPAPO_LT_ALIGN is 32, we
> > need 100 bytes for the scratch map (existing implementation), and the
> > address, x, we get from kzalloc_node() is k + 28, with k aligned to 32
> > bytes.
> > Then, we'll ask to allocate 32 - 4 = 28 extra bytes
> > (NFT_PIPAPO_ALIGN_HEADROOM), that is, 128 bytes, and we'll start using
> > the area at x + 4 (aligned to 32 bytes), with 124 bytes in front of us.
> >
> > With this change, and the current NFT_PIPAPO_ALIGN_HEADROOM, we'll
> > allocate (usually) 4 bytes more, 132 bytes, and we'll start using the
> > area at x + 4 anyway, with 128 bytes in front of us, and we could have
> > asked to allocate less, which made me think for a moment that
> > NFT_PIPAPO_ALIGN_HEADROOM needed to be adjusted too.  
> 
> We'll allocate sizeof(long) more space (map_index), which is 4 bytes in
> your example.

Oh, right, it could be 8 bytes too. Let's stick to 4 for simplicity.

> > However, 'scratch' at k + 28 is not the worst case: k + 32 is. At that
> > point, we need anyway to ask for 28 extra bytes, because 'map_index'
> > will force us to start from x + 32.  
> 
> Wait.  k + 32 is really "k" for old code: slab gave us an aligned
> address.

Yes, I meant that for the *new* code: k + 32 mod 32 (that is, k) is the
worst case for the new code.

> In new code, k+4 is the perfect "already-aligned" address where we would
> 'no-op' the address on a ARCH_KMALLOC_MINALIGN == 4 system.

Isn't the already aligned-address k - 4, that is, k + 28? With k + 4,
we would have &scratch->map[0] at k + 8. But anyway:

> Lets assume we get address k, and that address is the worst
> possible aligned one (with minalign 4), so we align
> (k + 4) (&scratch->map[0]), then subtract the index/struct head,
> which means we store (align(k+4) - 4), which would be 28.
> 
> Worst case aligned value allocator can provide for new code
> is k or k + 32 (make no difference):
> 
> Lets say address we got from allocator is 0x4:
> 
> NFT_PIPAPO_LT_ALIGN(&scratch->map); -> aligned to 32, we store 28
> as start of struct, so ->map[0] is at address 32.
> 
> Lets say address we got from allocator is 0x20 (32):
> NFT_PIPAPO_LT_ALIGN(&scratch->map); -> aligned to 64, we store 60
> as start of struct, so ->map[0] at 64.
> 
> In both cases ALIGN() ate 28 bytes of the allocation, which we accounted
> for as NFT_PIPAPO_ALIGN_HEADROOM.
> 
> Maybe thats what you were saying.  I could try to add/expand the
> comments here for the alignment calculations.

...yes, the rest is exactly what I meant. I'm not really satisfied of
the paragraph below but maybe something on the lines of:

	/* Align &scratch->map (not the struct itself): the extra
	 * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node() above
	 * guarantee we can waste up to those bytes in order to align the map
	 * field regardless of its offset within the struct.
	 */

-- 
Stefano


  reply	other threads:[~2024-02-07 17:29 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-06 12:23 [PATCH nf 0/3] netfilter: nft_set_pipapo: map_index must be per set Florian Westphal
2024-02-06 12:23 ` [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps Florian Westphal
2024-02-07 14:15   ` Stefano Brivio
2024-02-07 15:23     ` Florian Westphal
2024-02-07 17:27       ` Stefano Brivio [this message]
2024-02-07 19:24         ` Florian Westphal
2024-02-06 12:23 ` [PATCH nf 2/3] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Florian Westphal
2024-02-07 17:29   ` Stefano Brivio
2024-02-06 12:23 ` [PATCH nf 3/3] netfilter: nft_set_pipapo: remove scratch_aligned pointer Florian Westphal
2024-02-07 17:29   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240207182751.6ed0dd1d@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=fw@strlen.de \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).