From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Paolo Abeni <pabeni@redhat.com>, Mina Almasry <almasrymina@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Jakub Kicinski <kuba@kernel.org>,
Helge Deller <deller@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>,
linux-mm@kvack.org, netdev@vger.kernel.org
Subject: Re: [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
Date: Tue, 30 Sep 2025 13:30:42 +0200 [thread overview]
Message-ID: <87ms6cmb31.fsf@toke.dk> (raw)
In-Reply-To: <0dc3f975-c769-4a78-9211-80869509cfd2@redhat.com>
Paolo Abeni <pabeni@redhat.com> writes:
> On 9/30/25 9:45 AM, Toke Høiland-Jørgensen wrote:
>> Mina Almasry <almasrymina@google.com> writes:
>>> On Fri, Sep 26, 2025 at 4:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>
>>>> Helge reported that the introduction of PP_MAGIC_MASK let to crashes on
>>>> boot on his 32-bit parisc machine. The cause of this is the mask is set
>>>> too wide, so the page_pool_page_is_pp() incurs false positives which
>>>> crashes the machine.
>>>>
>>>> Just disabling the check in page_pool_is_pp() will lead to the page_pool
>>>> code itself malfunctioning; so instead of doing this, this patch changes
>>>> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
>>>> pointers for page_pool-tagged pages.
>>>>
>>>> The fix relies on the kernel pointers that alias with the pp_magic field
>>>> always being above PAGE_OFFSET. With this assumption, we can use the
>>>> lowest bit of the value of PAGE_OFFSET as the upper bound of the
>>>> PP_DMA_INDEX_MASK, which should avoid the false positives.
>>>>
>>>> Because we cannot rely on PAGE_OFFSET always being a compile-time
>>>> constant, nor on it always being >0, we fall back to disabling the
>>>> dma_index storage when there are no bits available. This leaves us in
>>>> the situation we were in before the patch in the Fixes tag, but only on
>>>> a subset of architecture configurations. This seems to be the best we
>>>> can do until the transition to page types in complete for page_pool
>>>> pages.
>>>>
>>>> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
>>>> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>> ---
>>>> Sorry for the delay on getting this out. I have only compile-tested it,
>>>> since I don't have any hardware that triggers the original bug. Helge, I'm
>>>> hoping you can take it for a spin?
>>>>
>>>> include/linux/mm.h | 18 +++++------
>>>> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
>>>> 2 files changed, 62 insertions(+), 32 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 1ae97a0b8ec7..28541cb40f69 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
>>>> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
>>>> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
>>>> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
>>>> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
>>>> - * VMSPLIT setting.
>>>> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
>>>> + * known at compile-time.
>>>> *
>>>> - * Altogether, this means that the number of bits available is constrained by
>>>> - * the size of an unsigned long (at the upper end, subtracting two bits per the
>>>> - * above), and the definition of PP_SIGNATURE (with or without
>>>> - * POISON_POINTER_DELTA).
>>>> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
>>>> + * small to leave some bits available above PP_SIGNATURE, we define the number
>>>> + * of bits to be 0, which turns off the DMA index tracking altogether (see
>>>> + * page_pool_register_dma_index()).
>>>> */
>>>> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
>>>> #if POISON_POINTER_DELTA > 0
>>>> @@ -4175,8 +4174,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>> */
>>>> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>>>> #else
>>>> -/* Always leave out the topmost two; see above. */
>>>> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
>>>> +/* Constrain to the lowest bit of PAGE_OFFSET if known; see above. */
>>>> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && PAGE_OFFSET > PP_SIGNATURE) ? \
>>>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
>>>
>>> Do you have to watch out for an underflow of __ffs(PAGE_OFFSET) -
>>> PP_DMA_INDEX_SHIFT (at which point we'll presumably use 32 here
>>> instead of the expected 0)? Or is that guaranteed to be positive for
>>> some reason I'm not immediately grasping.
>>
>> That's what the 'PAGE_OFFSET > PP_SIGNATURE' in the ternary operator is
>> for. I'm assuming that PAGE_OFFSET is always a "round" number (e.g.,
>> 0xc0000000), in which case that condition should be sufficient, no?
>
> IDK if such assumption is obviously true. I think it would be safer to
> somehow express it with a build time constraint.
Alright, I'll respin and constrain it further.
-Toke
prev parent reply other threads:[~2025-09-30 11:30 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-26 11:38 [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches Toke Høiland-Jørgensen
2025-09-26 15:25 ` Helge Deller
2025-09-29 10:07 ` Toke Høiland-Jørgensen
2025-09-30 0:04 ` Mina Almasry
2025-09-30 7:45 ` Toke Høiland-Jørgensen
2025-09-30 10:13 ` Paolo Abeni
2025-09-30 11:30 ` Toke Høiland-Jørgensen [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ms6cmb31.fsf@toke.dk \
--to=toke@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=davem@davemloft.net \
--cc=david@redhat.com \
--cc=deller@kernel.org \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kuba@kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.