From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps
Date: Wed, 7 Feb 2024 18:27:51 +0100 [thread overview]
Message-ID: <20240207182751.6ed0dd1d@elisabeth> (raw)
In-Reply-To: <20240207152328.GA11077@breakpoint.cc>
On Wed, 7 Feb 2024 16:23:28 +0100
Florian Westphal <fw@strlen.de> wrote:
> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > This isn't reliable.
> >
> > Uh oh. In hindsight, this sounds so obvious now...
>
> Thats a recurring theme with a lot of bugs.
> So, no, it was not obvious.
>
> > > There can be multiple sets and we can't be sure that the upper
> > > and lower half of all set scratch map is always in sync (lookups
> > > can be conditional), so one set might have swapped, but other might
> > > not have been queried.
> > >
> > > Thus we need to keep the index per-set-and-cpu, just like the
> > > scratchpad.
> > >
> > > Note that this bug fix is incomplete, there is a related issue.
> > >
> > > avx2 and normal implementation might use slightly different areas of the
> > > map array space due to the avx2 alignment requirements, so
> > > m->scratch (generic/fallback implementation) and ->scratch_aligned
> > > (avx) may partially overlap. scratch and scratch_aligned are not distinct
> > > objects, the latter is just the aligned address of the former.
> > >
> > > After this change, write to scratch_align->map_index may write to
> > > scratch->map, so this issue becomes more prominent, we can set to 1
> > > a bit in the supposedly-all-zero area of scratch->map[].
> > >
> > > A followup patch will remove the scratch_aligned and makes generic and
> > > avx code use the same (aligned) area.
> > >
> > > Its done in a separate change to ease review.
> > >
> > > Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
> > > Signed-off-by: Florian Westphal <fw@strlen.de>
> >
> > Minus one nit (not worth respinning) and one half-doubt below,
> >
> > Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> >
> > ...I'm still reviewing the rest.
>
> Thanks for reviewing.
>
> > > #ifdef NFT_PIPAPO_ALIGN
> > > - scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
> > > + scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
> > > + /* Need to align ->map start, not start of structure itself */
> > > + scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
> >
> > This should be fine with the current version of
> > NFT_PIPAPO_ALIGN_HEADROOM, but it took me quite some pondering, reasoning
> > below if you feel like double checking.
>
> Good point.
>
> > Let's say ARCH_KMALLOC_MINALIGN is 4, NFT_PIPAPO_LT_ALIGN is 32, we
> > need 100 bytes for the scratch map (existing implementation), and the
> > address, x, we get from kzalloc_node() is k + 28, with k aligned to 32
> > bytes.
> > Then, we'll ask to allocate 32 - 4 = 28 extra bytes
> > (NFT_PIPAPO_ALIGN_HEADROOM), that is, 128 bytes, and we'll start using
> > the area at x + 4 (aligned to 32 bytes), with 124 bytes in front of us.
> >
> > With this change, and the current NFT_PIPAPO_ALIGN_HEADROOM, we'll
> > allocate (usually) 4 bytes more, 132 bytes, and we'll start using the
> > area at x + 4 anyway, with 128 bytes in front of us, and we could have
> > asked to allocate less, which made me think for a moment that
> > NFT_PIPAPO_ALIGN_HEADROOM needed to be adjusted too.
>
> We'll allocate sizeof(long) more space (map_index), which is 4 bytes in
> your example.
Oh, right, it could be 8 bytes too. Let's stick to 4 for simplicity.
> > However, 'scratch' at k + 28 is not the worst case: k + 32 is. At that
> > point, we need anyway to ask for 28 extra bytes, because 'map_index'
> > will force us to start from x + 32.
>
> Wait. k + 32 is really "k" for old code: slab gave us an aligned
> address.
Yes, I meant that for the *new* code: k + 32 mod 32 (that is, k) is the
worst case for the new code.
> In new code, k+4 is the perfect "already-aligned" address where we would
> 'no-op' the address on a ARCH_KMALLOC_MINALIGN == 4 system.
Isn't the already aligned-address k - 4, that is, k + 28? With k + 4,
we would have &scratch->map[0] at k + 8. But anyway:
> Lets assume we get address k, and that address is the worst
> possible aligned one (with minalign 4), so we align
> (k + 4) (&scratch->map[0]), then subtract the index/struct head,
> which means we store (align(k+4) - 4), which would be 28.
>
> Worst case aligned value allocator can provide for new code
> is k or k + 32 (make no difference):
>
> Lets say address we got from allocator is 0x4:
>
> NFT_PIPAPO_LT_ALIGN(&scratch->map); -> aligned to 32, we store 28
> as start of struct, so ->map[0] is at address 32.
>
> Lets say address we got from allocator is 0x20 (32):
> NFT_PIPAPO_LT_ALIGN(&scratch->map); -> aligned to 64, we store 60
> as start of struct, so ->map[0] at 64.
>
> In both cases ALIGN() ate 28 bytes of the allocation, which we accounted
> for as NFT_PIPAPO_ALIGN_HEADROOM.
>
> Maybe thats what you were saying. I could try to add/expand the
> comments here for the alignment calculations.
...yes, the rest is exactly what I meant. I'm not really satisfied of
the paragraph below but maybe something on the lines of:
/* Align &scratch->map (not the struct itself): the extra
* %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node() above
* guarantee we can waste up to those bytes in order to align the map
* field regardless of its offset within the struct.
*/
--
Stefano
next prev parent reply other threads:[~2024-02-07 17:29 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-06 12:23 [PATCH nf 0/3] netfilter: nft_set_pipapo: map_index must be per set Florian Westphal
2024-02-06 12:23 ` [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps Florian Westphal
2024-02-07 14:15 ` Stefano Brivio
2024-02-07 15:23 ` Florian Westphal
2024-02-07 17:27 ` Stefano Brivio [this message]
2024-02-07 19:24 ` Florian Westphal
2024-02-06 12:23 ` [PATCH nf 2/3] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Florian Westphal
2024-02-07 17:29 ` Stefano Brivio
2024-02-06 12:23 ` [PATCH nf 3/3] netfilter: nft_set_pipapo: remove scratch_aligned pointer Florian Westphal
2024-02-07 17:29 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240207182751.6ed0dd1d@elisabeth \
--to=sbrivio@redhat.com \
--cc=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.