From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Jakub Kicinski <kuba@kernel.org>,
stable@vger.kernel.org, Helge Deller <deller@gmx.de>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
linux-mm@kvack.org, netdev@vger.kernel.org
Subject: Re: [PATCH net v2] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
Date: Wed, 01 Oct 2025 09:31:06 +0200 [thread overview]
Message-ID: <878qhvm62t.fsf@toke.dk> (raw)
In-Reply-To: <CAHS8izPGxvdDu7JwEWK2=fk=qHoYgFzOs1FjOWjmNwqrU2r0kA@mail.gmail.com>
Mina Almasry <almasrymina@google.com> writes:
> On Tue, Sep 30, 2025 at 4:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Helge reported that the introduction of PP_MAGIC_MASK let to crashes on
>> boot on his 32-bit parisc machine. The cause of this is the mask is set
>> too wide, so the page_pool_page_is_pp() incurs false positives which
>> crashes the machine.
>>
>> Just disabling the check in page_pool_is_pp() will lead to the page_pool
>> code itself malfunctioning; so instead of doing this, this patch changes
>> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
>> pointers for page_pool-tagged pages.
>>
>> The fix relies on the kernel pointers that alias with the pp_magic field
>> always being above PAGE_OFFSET. With this assumption, we can use the
>> lowest bit of the value of PAGE_OFFSET as the upper bound of the
>> PP_DMA_INDEX_MASK, which should avoid the false positives.
>>
>> Because we cannot rely on PAGE_OFFSET always being a compile-time
>> constant, nor on it always being >0, we fall back to disabling the
>> dma_index storage when there are not enough bits available. This leaves
>> us in the situation we were in before the patch in the Fixes tag, but
>> only on a subset of architecture configurations. This seems to be the
>> best we can do until the transition to page types in complete for
>> page_pool pages.
>>
>> v2:
>> - Make sure there's at least 8 bits available and that the PAGE_OFFSET
>> bit calculation doesn't wrap
>>
>> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
>> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>> Cc: stable@vger.kernel.org # 6.15+
>> Tested-by: Helge Deller <deller@gmx.de>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>> include/linux/mm.h | 22 +++++++------
>> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
>> 2 files changed, 66 insertions(+), 32 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 1ae97a0b8ec7..0905eb6b55ec 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
>> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
>> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
>> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
>> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
>> - * VMSPLIT setting.
>> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
>> + * known at compile-time.
>> *
>> - * Altogether, this means that the number of bits available is constrained by
>> - * the size of an unsigned long (at the upper end, subtracting two bits per the
>> - * above), and the definition of PP_SIGNATURE (with or without
>> - * POISON_POINTER_DELTA).
>> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
>> + * small to leave at least 8 bits available above PP_SIGNATURE, we define the
>> + * number of bits to be 0, which turns off the DMA index tracking altogether
>> + * (see page_pool_register_dma_index()).
>> */
>> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
>> #if POISON_POINTER_DELTA > 0
>> @@ -4175,8 +4174,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> */
>> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>> #else
>> -/* Always leave out the topmost two; see above. */
>> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
>> +/* Use the lowest bit of PAGE_OFFSET if there's at least 8 bits available; see above */
>> +#define PP_DMA_INDEX_MIN_OFFSET (1 << (PP_DMA_INDEX_SHIFT + 8))
>> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && \
>> + PAGE_OFFSET >= PP_DMA_INDEX_MIN_OFFSET && \
>> + !(PAGE_OFFSET & (PP_DMA_INDEX_MIN_OFFSET - 1))) ? \
>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
>> +
>> #endif
>
> It took some staring at, but I think I understand this code and it is
> correct. This is the critical check, it's making sure that the bits
> used by PAGE_OFFSET are not shared with the bits used for the
> dma-index:
>
>> + !(PAGE_OFFSET & (PP_DMA_INDEX_MIN_OFFSET - 1))) ? \
>
> The following check confused me for a while, but I think I figured it
> out. It's checking that the bits used for PAGE_OFFSET are 'higher'
> than the bits used for PP_DMA_INDEX:
>
>> + PAGE_OFFSET >= PP_DMA_INDEX_MIN_OFFSET && \
>
> And finally this calculation should indeed be the bits we can use (the
> empty space between the lsb set by PAGE_OFFSET and the msb set by the
> pp magic:
>
>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
Yup, exactly! Thanks for walking through it and confirming that the
logic is sound :)
> AFAIU we should not need the MIN anymore, since that subtraction is
> guaranteed to be positive, but that's a nit.
The MIN was originally there to limit how many bits we use for 64-bit
systems that don't set POISON_POINTER_DELTA, since xarray uses a u32 for
the size of the limits. Not sure if such a combination exists in the
real world, but I figure that having it there doesn't hurt in any case.
> Reviewed-by: Mina Almasry <almasrymina@google.com>
Thanks!
-Toke
next prev parent reply other threads:[~2025-10-01 7:31 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-30 11:43 [PATCH net v2] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches Toke Høiland-Jørgensen
2025-09-30 23:36 ` Mina Almasry
2025-10-01 7:31 ` Toke Høiland-Jørgensen [this message]
2025-10-01 7:21 ` Helge Deller
2025-10-01 8:28 ` Toke Høiland-Jørgensen
2025-10-06 20:34 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=878qhvm62t.fsf@toke.dk \
--to=toke@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=davem@davemloft.net \
--cc=david@redhat.com \
--cc=deller@gmx.de \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kuba@kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rppt@kernel.org \
--cc=stable@vger.kernel.org \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.