linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
@ 2025-09-12 23:06 Helge Deller
  2025-09-14 17:06 ` Christoph Biedl
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Helge Deller @ 2025-09-12 23:06 UTC (permalink / raw)
  To: David Hildenbrand, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
32-bit platforms.

The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
pages, but the remaining bits are not sufficient to unambiguously identify
such pages any longer.

So page_pool_page_is_pp() now sometimes wrongly reports pages as page pool
pages and as such triggers a kernel BUG as it believes it found a page pool
leak.

There are patches upcoming where page_pool_page_is_pp() will not depend on
PP_MAGIC_MASK and instead use page flags to identify page pool pages. Until
those patches are merged, the easiest temporary fix is to disable the check
on 32-bit platforms.

Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: netdev@vger.kernel.org
Cc: Linux parisc List <linux-parisc@vger.kernel.org>
Cc: <stable@vger.kernel.org> # v6.15+
Signed-off-by: Helge Deller <deller@gmx.de>
Link: https://www.spinics.net/lists/kernel/msg5849623.html
Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec7..f3822ae70a81 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4190,7 +4190,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  */
 #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
 
-#ifdef CONFIG_PAGE_POOL
+#if defined(CONFIG_PAGE_POOL) && defined(CONFIG_64BIT)
 static inline bool page_pool_page_is_pp(const struct page *page)
 {
 	treturn (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;

----- End forwarded message -----


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-12 23:06 [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp() Helge Deller
@ 2025-09-14 17:06 ` Christoph Biedl
  2025-09-15  9:45 ` Jesper Dangaard Brouer
  2025-09-15 11:44 ` Toke Høiland-Jørgensen
  2 siblings, 0 replies; 13+ messages in thread
From: Christoph Biedl @ 2025-09-14 17:06 UTC (permalink / raw)
  To: Helge Deller
  Cc: David Hildenbrand, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1540 bytes --]

Helge Deller wrote...

> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
> 32-bit platforms.
> 
> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
> pages, but the remaining bits are not sufficient to unambiguously identify
> such pages any longer.
> 
> So page_pool_page_is_pp() now sometimes wrongly reports pages as page pool
> pages and as such triggers a kernel BUG as it believes it found a page pool
> leak.
> 
> There are patches upcoming where page_pool_page_is_pp() will not depend on
> PP_MAGIC_MASK and instead use page flags to identify page pool pages. Until
> those patches are merged, the easiest temporary fix is to disable the check
> on 32-bit platforms.
> 
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: David Hildenbrand <david@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Linux Memory Management List <linux-mm@kvack.org>
> Cc: netdev@vger.kernel.org
> Cc: Linux parisc List <linux-parisc@vger.kernel.org>
> Cc: <stable@vger.kernel.org> # v6.15+
> Signed-off-by: Helge Deller <deller@gmx.de>
> Link: https://www.spinics.net/lists/kernel/msg5849623.html
> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")

Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-12 23:06 [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp() Helge Deller
  2025-09-14 17:06 ` Christoph Biedl
@ 2025-09-15  9:45 ` Jesper Dangaard Brouer
  2025-09-15 11:44 ` Toke Høiland-Jørgensen
  2 siblings, 0 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2025-09-15  9:45 UTC (permalink / raw)
  To: Helge Deller, David Hildenbrand, Toke Høiland-Jørgensen,
	Ilias Apalodimas, David S. Miller, Linux Memory Management List,
	netdev, Linux parisc List, Andrew Morton
  Cc: Mina Almasry, Jakub Kicinski, Toke Høiland-Jørgensen



On 13/09/2025 01.06, Helge Deller wrote:
> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
> 32-bit platforms.
> 
> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
> pages, but the remaining bits are not sufficient to unambiguously identify
> such pages any longer.
> 

Function netmem_is_pp[1] uses same kind of check against PP_MAGIC_MASK.
The call-site skb_pp_recycle[2] is protected by skb->pp_recycle check.

BUT napi_pp_put_page[3] callers have been expanded, and I'm uncertain if
they will be affected by this. (Cc Mina)

  [1] 
https://elixir.bootlin.com/linux/v6.16.7/source/net/core/netmem_priv.h#L23-L26
  [2] 
https://elixir.bootlin.com/linux/v6.16.7/source/net/core/skbuff.c#L1009
  [3] 
https://elixir.bootlin.com/linux/v6.16.7/source/net/core/skbuff.c#L991-L995

> So page_pool_page_is_pp() now sometimes wrongly reports pages as page pool
> pages and as such triggers a kernel BUG as it believes it found a page pool
> leak.
> 

This sounds scary to me, as netstack (see above code examples) also uses
checks against PP_MAGIC_MASK (+ PP_SIGNATURE).  I hope these checks
isn't also subject to this issue(!?)

(To Toke:) Did we steal too many bits for PP_DMA_INDEX_MASK?


> There are patches upcoming where page_pool_page_is_pp() will not depend on
> PP_MAGIC_MASK and instead use page flags to identify page pool pages. Until
> those patches are merged, the easiest temporary fix is to disable the check
> on 32-bit platforms.
> 
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: David Hildenbrand <david@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Linux Memory Management List <linux-mm@kvack.org>
> Cc: netdev@vger.kernel.org
> Cc: Linux parisc List <linux-parisc@vger.kernel.org>
> Cc: <stable@vger.kernel.org> # v6.15+
> Signed-off-by: Helge Deller <deller@gmx.de>
> Link: https://www.spinics.net/lists/kernel/msg5849623.html
> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ae97a0b8ec7..f3822ae70a81 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4190,7 +4190,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>    */
>   #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>   
> -#ifdef CONFIG_PAGE_POOL
> +#if defined(CONFIG_PAGE_POOL) && defined(CONFIG_64BIT)
>   static inline bool page_pool_page_is_pp(const struct page *page)
>   {
>   	treturn (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> 
> ----- End forwarded message -----

Looks like your email got truncated.
--Jesper


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-12 23:06 [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp() Helge Deller
  2025-09-14 17:06 ` Christoph Biedl
  2025-09-15  9:45 ` Jesper Dangaard Brouer
@ 2025-09-15 11:44 ` Toke Høiland-Jørgensen
  2025-09-15 13:08   ` Helge Deller
  2 siblings, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-09-15 11:44 UTC (permalink / raw)
  To: Helge Deller, David Hildenbrand, Jesper Dangaard Brouer,
	Ilias Apalodimas, David S. Miller, Linux Memory Management List,
	netdev, Linux parisc List, Andrew Morton

Helge Deller <deller@kernel.org> writes:

> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
> 32-bit platforms.
>
> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
> pages, but the remaining bits are not sufficient to unambiguously identify
> such pages any longer.

Why not? What values end up in pp_magic that are mistaken for the
pp_signature?

As indicated by the comment above the definition of the PP_DMA_INDEX_*
definitions, I did put some care into ensuring that the values would not
get mistaken, see specifically this:

(...) On arches where POISON_POINTER_DELTA is
 * 0, we make sure that we leave the six topmost bits empty, as that guarantees
 * we won't mistake a valid kernel pointer for a value we set, regardless of the
 * VMSPLIT setting.

So if that does not hold, I'd like to understand why not?

> So page_pool_page_is_pp() now sometimes wrongly reports pages as page pool
> pages and as such triggers a kernel BUG as it believes it found a page pool
> leak.
>
> There are patches upcoming where page_pool_page_is_pp() will not depend on
> PP_MAGIC_MASK and instead use page flags to identify page pool pages. Until
> those patches are merged, the easiest temporary fix is to disable the check
> on 32-bit platforms.

As Jesper pointed out, we also use this check internally in the network
stack, and the patch as proposed will at least trigger the
DEBUG_NET_WARN_ON_ONCE() in include/net/netmem.h. I think a better
solution would be, as Jesper also alludes to, simply adding more bits to
the mask. For instance, the patch below reserves (somewhat arbitrarily)
six bits instead of two, changing the mask to 0xfc00007c; would that
work?

-Toke

diff --git i/include/linux/mm.h w/include/linux/mm.h
index 1ae97a0b8ec7..17cb8157ba08 100644
--- i/include/linux/mm.h
+++ w/include/linux/mm.h
@@ -4159,12 +4159,12 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  * since this value becomes part of PP_SIGNATURE; meaning we can just use the
  * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
  * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
- * 0, we make sure that we leave the two topmost bits empty, as that guarantees
+ * 0, we make sure that we leave the six topmost bits empty, as that guarantees
  * we won't mistake a valid kernel pointer for a value we set, regardless of the
  * VMSPLIT setting.
  *
  * Altogether, this means that the number of bits available is constrained by
- * the size of an unsigned long (at the upper end, subtracting two bits per the
+ * the size of an unsigned long (at the upper end, subtracting six bits per the
  * above), and the definition of PP_SIGNATURE (with or without
  * POISON_POINTER_DELTA).
  */
@@ -4175,8 +4175,8 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  */
 #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
 #else
-/* Always leave out the topmost two; see above. */
-#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
+/* Always leave out the topmost six; see above. */
+#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 6)
 #endif
 
 #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-15 11:44 ` Toke Høiland-Jørgensen
@ 2025-09-15 13:08   ` Helge Deller
  2025-09-15 20:41     ` Mina Almasry
  2025-09-15 23:51     ` Mina Almasry
  0 siblings, 2 replies; 13+ messages in thread
From: Helge Deller @ 2025-09-15 13:08 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Helge Deller, David Hildenbrand,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
> Helge Deller <deller@kernel.org> writes:
> 
>> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
>> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
>> 32-bit platforms.
>>
>> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
>> pages, but the remaining bits are not sufficient to unambiguously identify
>> such pages any longer.
> 
> Why not? What values end up in pp_magic that are mistaken for the
> pp_signature?

As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
That means, that before page_pool_page_is_pp() could clearly identify such pages,
as the (value & 0xFFFFFFFC) == 0x40.
So, basically only the 0x40 value indicated a PP page.

Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
just showing a few examples:
0x01111040
0x082330C0
0x03264040
0x0ad686c0 ....

For me it crashes immediately at bootup when memblocked pages are handed
over to become normal pages.

> As indicated by the comment above the definition of the PP_DMA_INDEX_*
> definitions, I did put some care into ensuring that the values would not
> get mistaken, see specifically this:
> 
> (...) On arches where POISON_POINTER_DELTA is
>   * 0, we make sure that we leave the six topmost bits empty, as that guarantees
>   * we won't mistake a valid kernel pointer for a value we set, regardless of the
>   * VMSPLIT setting.
> 
> So if that does not hold, I'd like to understand why not?

Because on 32-bit arches POISON_POINTER_DELTA is zero, and as such
you basically can't take away any of the remaining low 32 (30) bits.
  
>> So page_pool_page_is_pp() now sometimes wrongly reports pages as page pool
>> pages and as such triggers a kernel BUG as it believes it found a page pool
>> leak.
>>
>> There are patches upcoming where page_pool_page_is_pp() will not depend on
>> PP_MAGIC_MASK and instead use page flags to identify page pool pages. Until
>> those patches are merged, the easiest temporary fix is to disable the check
>> on 32-bit platforms.
> 
> As Jesper pointed out, we also use this check internally in the network
> stack, and the patch as proposed will at least trigger the
> DEBUG_NET_WARN_ON_ONCE() in include/net/netmem.h.

Interestingly it did not triggered this warning for me.
Need to look into this.

> I think a better
> solution would be, as Jesper also alludes to, simply adding more bits to
> the mask. For instance, the patch below reserves (somewhat arbitrarily)
> six bits instead of two, changing the mask to 0xfc00007c; would that
> work?

That just shifting the main problem and you may be lucky for short time.
page_pool_page_is_pp() need to *reliably* detect pp pages, all other is guessing.
I don't believe there is any way to really fix it for 32-bit other than
reverting your change, or to disable the check (on 32-bit).

Helge

> 
> -Toke
> 
> diff --git i/include/linux/mm.h w/include/linux/mm.h
> index 1ae97a0b8ec7..17cb8157ba08 100644
> --- i/include/linux/mm.h
> +++ w/include/linux/mm.h
> @@ -4159,12 +4159,12 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>    * since this value becomes part of PP_SIGNATURE; meaning we can just use the
>    * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
>    * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
> + * 0, we make sure that we leave the six topmost bits empty, as that guarantees
>    * we won't mistake a valid kernel pointer for a value we set, regardless of the
>    * VMSPLIT setting.
>    *
>    * Altogether, this means that the number of bits available is constrained by
> - * the size of an unsigned long (at the upper end, subtracting two bits per the
> + * the size of an unsigned long (at the upper end, subtracting six bits per the
>    * above), and the definition of PP_SIGNATURE (with or without
>    * POISON_POINTER_DELTA).
>    */
> @@ -4175,8 +4175,8 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>    */
>   #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>   #else
> -/* Always leave out the topmost two; see above. */
> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
> +/* Always leave out the topmost six; see above. */
> +#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 6)
>   #endif
>   
>   #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-15 13:08   ` Helge Deller
@ 2025-09-15 20:41     ` Mina Almasry
  2025-09-15 23:51     ` Mina Almasry
  1 sibling, 0 replies; 13+ messages in thread
From: Mina Almasry @ 2025-09-15 20:41 UTC (permalink / raw)
  To: Helge Deller
  Cc: Toke Høiland-Jørgensen, Helge Deller, David Hildenbrand,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@gmx.de> wrote:
>
> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
> > Helge Deller <deller@kernel.org> writes:
> >
> >> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
> >> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
> >> 32-bit platforms.
> >>
> >> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
> >> pages, but the remaining bits are not sufficient to unambiguously identify
> >> such pages any longer.
> >
> > Why not? What values end up in pp_magic that are mistaken for the
> > pp_signature?
>
> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
> That means, that before page_pool_page_is_pp() could clearly identify such pages,
> as the (value & 0xFFFFFFFC) == 0x40.
> So, basically only the 0x40 value indicated a PP page.
>
> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
> just showing a few examples:
> 0x01111040
> 0x082330C0
> 0x03264040
> 0x0ad686c0 ....
>
> For me it crashes immediately at bootup when memblocked pages are handed
> over to become normal pages.
>
> > As indicated by the comment above the definition of the PP_DMA_INDEX_*
> > definitions, I did put some care into ensuring that the values would not
> > get mistaken, see specifically this:
> >
> > (...) On arches where POISON_POINTER_DELTA is
> >   * 0, we make sure that we leave the six topmost bits empty, as that guarantees
> >   * we won't mistake a valid kernel pointer for a value we set, regardless of the
> >   * VMSPLIT setting.
> >
> > So if that does not hold, I'd like to understand why not?
>
> Because on 32-bit arches POISON_POINTER_DELTA is zero, and as such
> you basically can't take away any of the remaining low 32 (30) bits.
>
> >> So page_pool_page_is_pp() now sometimes wrongly reports pages as page pool
> >> pages and as such triggers a kernel BUG as it believes it found a page pool
> >> leak.
> >>
> >> There are patches upcoming where page_pool_page_is_pp() will not depend on
> >> PP_MAGIC_MASK and instead use page flags to identify page pool pages. Until
> >> those patches are merged, the easiest temporary fix is to disable the check
> >> on 32-bit platforms.
> >
> > As Jesper pointed out, we also use this check internally in the network
> > stack, and the patch as proposed will at least trigger the
> > DEBUG_NET_WARN_ON_ONCE() in include/net/netmem.h.
>
> Interestingly it did not triggered this warning for me.
> Need to look into this.
>

I think you're probably not running into this because you're not
testing on with a driver that supports pp. There are 2 broad buckets
of places where page_pool_page_is_pp and netmem_is_pp is used:

(1) in the networking stack, where all the checks are guarded by
skb->pp_recycle, which is not set unless the driver supports pp
(2) in the general mm code, where the checks are not guarded by skb->pp_recycle

You seem to be hitting 2 and not 1, which just indicates you don't
have a pp enabled driver, I think. If you do and you're not hitting
the warns then that's indeed a bit weird.

The proposed patch fixes callsites #2 but does not fix #1, so I think
it may not be good enough. 32-bit setups with pp drivers will still be
completely bonked with that patch I think, although it's obviously
better than crashing on boot. I think we need the *_is_pp() checks to
work reliably one way or another. Reverting will also introduce the
crashes that patch aimed to fix :( I'm taking a look to see if there
is any suggestion here I can make.


--
Thanks,
Mina


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-15 13:08   ` Helge Deller
  2025-09-15 20:41     ` Mina Almasry
@ 2025-09-15 23:51     ` Mina Almasry
  2025-09-16  9:27       ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 13+ messages in thread
From: Mina Almasry @ 2025-09-15 23:51 UTC (permalink / raw)
  To: Helge Deller
  Cc: Toke Høiland-Jørgensen, Helge Deller, David Hildenbrand,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@gmx.de> wrote:
>
> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
> > Helge Deller <deller@kernel.org> writes:
> >
> >> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
> >> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
> >> 32-bit platforms.
> >>
> >> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
> >> pages, but the remaining bits are not sufficient to unambiguously identify
> >> such pages any longer.
> >
> > Why not? What values end up in pp_magic that are mistaken for the
> > pp_signature?
>
> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
> That means, that before page_pool_page_is_pp() could clearly identify such pages,
> as the (value & 0xFFFFFFFC) == 0x40.
> So, basically only the 0x40 value indicated a PP page.
>
> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
> just showing a few examples:
> 0x01111040
> 0x082330C0
> 0x03264040
> 0x0ad686c0 ....
>
> For me it crashes immediately at bootup when memblocked pages are handed
> over to become normal pages.
>

I tried to take a look to double check here and AFAICT Helge is correct.

Before the breaking patch with PP_MAGIC_MASK==0xFFFFFFFC, basically
0x40 is the only pointer that may be mistaken as a valid pp_magic.
AFAICT each bit we 0 in the PP_MAGIC_MASK (aside from the 3 least
significant bits), doubles the number of pointers that can be mistaken
for pp_magic. So with 0xFFFFFFFC, only one value (0x40) can be
mistaken as a valid pp_magic, with  0xc000007c AFAICT 2^22 values can
be mistaken as pp_magic?

I don't know that there is any bits we can take away from
PP_MAGIC_MASK I think? As each bit doubles the probablity :(

I would usually say we can check the 3 least significant bits to tell
if pp_magic is a pointer or not, but pp_magic is unioned with
page->lru I believe which will use those bits.

AFAICT, only proper resolution I see is a revert of the breaking patch
+ reland after we can make pp a page-flag and deprecate using
pp_magic. Sorry about that. Thoughts Toke? Anything better you can
think of here?

-- 
Thanks,
Mina


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-15 23:51     ` Mina Almasry
@ 2025-09-16  9:27       ` Toke Høiland-Jørgensen
  2025-09-16 22:21         ` Mina Almasry
  0 siblings, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-09-16  9:27 UTC (permalink / raw)
  To: Mina Almasry, Helge Deller
  Cc: Helge Deller, David Hildenbrand, Jesper Dangaard Brouer,
	Ilias Apalodimas, David S. Miller, Linux Memory Management List,
	netdev, Linux parisc List, Andrew Morton

Mina Almasry <almasrymina@google.com> writes:

> On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@gmx.de> wrote:
>>
>> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
>> > Helge Deller <deller@kernel.org> writes:
>> >
>> >> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
>> >> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
>> >> 32-bit platforms.
>> >>
>> >> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
>> >> pages, but the remaining bits are not sufficient to unambiguously identify
>> >> such pages any longer.
>> >
>> > Why not? What values end up in pp_magic that are mistaken for the
>> > pp_signature?
>>
>> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
>> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
>> That means, that before page_pool_page_is_pp() could clearly identify such pages,
>> as the (value & 0xFFFFFFFC) == 0x40.
>> So, basically only the 0x40 value indicated a PP page.
>>
>> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
>> just showing a few examples:
>> 0x01111040
>> 0x082330C0
>> 0x03264040
>> 0x0ad686c0 ....
>>
>> For me it crashes immediately at bootup when memblocked pages are handed
>> over to become normal pages.
>>
>
> I tried to take a look to double check here and AFAICT Helge is correct.
>
> Before the breaking patch with PP_MAGIC_MASK==0xFFFFFFFC, basically
> 0x40 is the only pointer that may be mistaken as a valid pp_magic.
> AFAICT each bit we 0 in the PP_MAGIC_MASK (aside from the 3 least
> significant bits), doubles the number of pointers that can be mistaken
> for pp_magic. So with 0xFFFFFFFC, only one value (0x40) can be
> mistaken as a valid pp_magic, with  0xc000007c AFAICT 2^22 values can
> be mistaken as pp_magic?
>
> I don't know that there is any bits we can take away from
> PP_MAGIC_MASK I think? As each bit doubles the probablity :(
>
> I would usually say we can check the 3 least significant bits to tell
> if pp_magic is a pointer or not, but pp_magic is unioned with
> page->lru I believe which will use those bits.

So if the pointers stored in the same field can be any arbitrary value,
you are quite right, there is no safe value. The critical assumption in
the bit stuffing scheme is that the pointers stored in the field will
always be above PAGE_OFFSET, and that PAGE_OFFSET has one (or both) of
the two top-most bits set (that is what the VMSPLIT reference in the
comment above the PP_DMA_INDEX_SHIFT definition is alluding to).

The crash Helge reported obviously indicates that this assumption
doesn't hold. What I'd like to understand if whether this is because I
have completely misunderstood how things work, or whether it is only on
*some* 32-bit systems that this assumption on the range of kernel
pointers doesn't hold?

> AFAICT, only proper resolution I see is a revert of the breaking patch
> + reland after we can make pp a page-flag and deprecate using
> pp_magic. Sorry about that. Thoughts Toke? Anything better you can
> think of here?

We can just conditionally disable the tracking if we don't have enough
bits? Something like the below (which could maybe be narrowed down
further depending on the answer to my question above).

-Toke


diff --git i/include/linux/mm.h w/include/linux/mm.h
index 1ae97a0b8ec7..3e3b090104d9 100644
--- i/include/linux/mm.h
+++ w/include/linux/mm.h
@@ -4175,8 +4175,8 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  */
 #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
 #else
-/* Always leave out the topmost two; see above. */
-#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
+/* Can't store the DMA index if we don't have a poison offset */
+#define PP_DMA_INDEX_BITS 0
 #endif
 
 #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
diff --git i/net/core/netmem_priv.h w/net/core/netmem_priv.h
index cd95394399b4..afc5a56bba03 100644
--- i/net/core/netmem_priv.h
+++ w/net/core/netmem_priv.h
@@ -38,6 +38,7 @@ static inline void netmem_set_dma_addr(netmem_ref netmem,
 
 static inline unsigned long netmem_get_dma_index(netmem_ref netmem)
 {
+#if PP_DMA_INDEX_BITS > 0
 	unsigned long magic;
 
 	if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
@@ -46,11 +47,13 @@ static inline unsigned long netmem_get_dma_index(netmem_ref netmem)
 	magic = __netmem_clear_lsb(netmem)->pp_magic;
 
 	return (magic & PP_DMA_INDEX_MASK) >> PP_DMA_INDEX_SHIFT;
+#endif
 }
 
 static inline void netmem_set_dma_index(netmem_ref netmem,
 					unsigned long id)
 {
+#if PP_DMA_INDEX_BITS > 0
 	unsigned long magic;
 
 	if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
@@ -58,5 +61,6 @@ static inline void netmem_set_dma_index(netmem_ref netmem,
 
 	magic = netmem_get_pp_magic(netmem) | (id << PP_DMA_INDEX_SHIFT);
 	__netmem_clear_lsb(netmem)->pp_magic = magic;
+#endif
 }
 #endif
diff --git i/net/core/page_pool.c w/net/core/page_pool.c
index ba70569bd4b0..427fdf92b82c 100644
--- i/net/core/page_pool.c
+++ w/net/core/page_pool.c
@@ -495,6 +495,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t g
 		goto unmap_failed;
 	}
 
+#if PP_DMA_INDEX_BITS > 0
 	if (in_softirq())
 		err = xa_alloc(&pool->dma_mapped, &id, netmem_to_page(netmem),
 			       PP_DMA_INDEX_LIMIT, gfp);
@@ -507,6 +508,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t g
 	}
 
 	netmem_set_dma_index(netmem, id);
+#endif
 	page_pool_dma_sync_for_device(pool, netmem, pool->p.max_len);
 
 	return true;
@@ -688,6 +690,7 @@ static __always_inline void __page_pool_release_netmem_dma(struct page_pool *poo
 		 */
 		return;
 
+#if PP_DMA_INDEX_BITS > 0
 	id = netmem_get_dma_index(netmem);
 	if (!id)
 		return;
@@ -698,7 +701,7 @@ static __always_inline void __page_pool_release_netmem_dma(struct page_pool *poo
 		old = xa_cmpxchg_bh(&pool->dma_mapped, id, page, NULL, 0);
 	if (old != page)
 		return;
-
+#endif
 	dma = page_pool_get_dma_addr_netmem(netmem);
 
 	/* When page is unmapped, it cannot be returned to our pool */



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-16  9:27       ` Toke Høiland-Jørgensen
@ 2025-09-16 22:21         ` Mina Almasry
  2025-09-17  6:27           ` Helge Deller
  2025-09-17 10:08           ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 13+ messages in thread
From: Mina Almasry @ 2025-09-16 22:21 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Helge Deller, Helge Deller, David Hildenbrand,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

On Tue, Sep 16, 2025 at 2:27 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Mina Almasry <almasrymina@google.com> writes:
>
> > On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@gmx.de> wrote:
> >>
> >> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
> >> > Helge Deller <deller@kernel.org> writes:
> >> >
> >> >> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
> >> >> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
> >> >> 32-bit platforms.
> >> >>
> >> >> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
> >> >> pages, but the remaining bits are not sufficient to unambiguously identify
> >> >> such pages any longer.
> >> >
> >> > Why not? What values end up in pp_magic that are mistaken for the
> >> > pp_signature?
> >>
> >> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
> >> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
> >> That means, that before page_pool_page_is_pp() could clearly identify such pages,
> >> as the (value & 0xFFFFFFFC) == 0x40.
> >> So, basically only the 0x40 value indicated a PP page.
> >>
> >> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
> >> just showing a few examples:
> >> 0x01111040
> >> 0x082330C0
> >> 0x03264040
> >> 0x0ad686c0 ....
> >>
> >> For me it crashes immediately at bootup when memblocked pages are handed
> >> over to become normal pages.
> >>
> >
> > I tried to take a look to double check here and AFAICT Helge is correct.
> >
> > Before the breaking patch with PP_MAGIC_MASK==0xFFFFFFFC, basically
> > 0x40 is the only pointer that may be mistaken as a valid pp_magic.
> > AFAICT each bit we 0 in the PP_MAGIC_MASK (aside from the 3 least
> > significant bits), doubles the number of pointers that can be mistaken
> > for pp_magic. So with 0xFFFFFFFC, only one value (0x40) can be
> > mistaken as a valid pp_magic, with  0xc000007c AFAICT 2^22 values can
> > be mistaken as pp_magic?
> >
> > I don't know that there is any bits we can take away from
> > PP_MAGIC_MASK I think? As each bit doubles the probablity :(
> >
> > I would usually say we can check the 3 least significant bits to tell
> > if pp_magic is a pointer or not, but pp_magic is unioned with
> > page->lru I believe which will use those bits.
>
> So if the pointers stored in the same field can be any arbitrary value,
> you are quite right, there is no safe value. The critical assumption in
> the bit stuffing scheme is that the pointers stored in the field will
> always be above PAGE_OFFSET, and that PAGE_OFFSET has one (or both) of
> the two top-most bits set (that is what the VMSPLIT reference in the
> comment above the PP_DMA_INDEX_SHIFT definition is alluding to).
>

I see... but where does the 'PAGE_OFFSET has one (or both) of the two
top-most bits set)' assumption come from? Is it from this code?

/*
 * PAGE_OFFSET -- the first address of the first page of memory.
 * When not using MMU this corresponds to the first free page in
 * physical memory (aligned on a page boundary).
 */
#ifdef CONFIG_MMU
#ifdef CONFIG_64BIT
....
#else
#define PAGE_OFFSET _AC(0xc0000000, UL)
#endif /* CONFIG_64BIT */
#else
#define PAGE_OFFSET ((unsigned long)phys_ram_base)
#endif /* CONFIG_MMU */

It looks like with !CONFIG_MMU we use phys_ram_base and I'm unable to
confirm that all the values of this have the first 2 bits set. I
wonder if his setup is !CONFIG_MMU indeed.

It also looks like pp_magic is also union'd with __folio_index in
struct page, and it looks like the data there is sometimes used as a
pointer and sometimes not.

-- 
Thanks,
Mina


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-16 22:21         ` Mina Almasry
@ 2025-09-17  6:27           ` Helge Deller
  2025-09-17 10:08           ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 13+ messages in thread
From: Helge Deller @ 2025-09-17  6:27 UTC (permalink / raw)
  To: Mina Almasry, Toke Høiland-Jørgensen
  Cc: Helge Deller, David Hildenbrand, Jesper Dangaard Brouer,
	Ilias Apalodimas, David S. Miller, Linux Memory Management List,
	netdev, Linux parisc List, Andrew Morton

On 9/17/25 00:21, Mina Almasry wrote:
> On Tue, Sep 16, 2025 at 2:27 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Mina Almasry <almasrymina@google.com> writes:
>>
>>> On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@gmx.de> wrote:
>>>>
>>>> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
>>>>> Helge Deller <deller@kernel.org> writes:
>>>>>
>>>>>> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
>>>>>> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
>>>>>> 32-bit platforms.
>>>>>>
>>>>>> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
>>>>>> pages, but the remaining bits are not sufficient to unambiguously identify
>>>>>> such pages any longer.
>>>>>
>>>>> Why not? What values end up in pp_magic that are mistaken for the
>>>>> pp_signature?
>>>>
>>>> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
>>>> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
>>>> That means, that before page_pool_page_is_pp() could clearly identify such pages,
>>>> as the (value & 0xFFFFFFFC) == 0x40.
>>>> So, basically only the 0x40 value indicated a PP page.
>>>>
>>>> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
>>>> just showing a few examples:
>>>> 0x01111040
>>>> 0x082330C0
>>>> 0x03264040
>>>> 0x0ad686c0 ....
>>>>
>>>> For me it crashes immediately at bootup when memblocked pages are handed
>>>> over to become normal pages.
>>>>
>>>
>>> I tried to take a look to double check here and AFAICT Helge is correct.
>>>
>>> Before the breaking patch with PP_MAGIC_MASK==0xFFFFFFFC, basically
>>> 0x40 is the only pointer that may be mistaken as a valid pp_magic.
>>> AFAICT each bit we 0 in the PP_MAGIC_MASK (aside from the 3 least
>>> significant bits), doubles the number of pointers that can be mistaken
>>> for pp_magic. So with 0xFFFFFFFC, only one value (0x40) can be
>>> mistaken as a valid pp_magic, with  0xc000007c AFAICT 2^22 values can
>>> be mistaken as pp_magic?
>>>
>>> I don't know that there is any bits we can take away from
>>> PP_MAGIC_MASK I think? As each bit doubles the probablity :(
>>>
>>> I would usually say we can check the 3 least significant bits to tell
>>> if pp_magic is a pointer or not, but pp_magic is unioned with
>>> page->lru I believe which will use those bits.
>>
>> So if the pointers stored in the same field can be any arbitrary value,
>> you are quite right, there is no safe value. The critical assumption in
>> the bit stuffing scheme is that the pointers stored in the field will
>> always be above PAGE_OFFSET, and that PAGE_OFFSET has one (or both) of
>> the two top-most bits set (that is what the VMSPLIT reference in the
>> comment above the PP_DMA_INDEX_SHIFT definition is alluding to).
>>
> 
> I see... but where does the 'PAGE_OFFSET has one (or both) of the two
> top-most bits set)' assumption come from? Is it from this code?
> 
> /*
>   * PAGE_OFFSET -- the first address of the first page of memory.
>   * When not using MMU this corresponds to the first free page in
>   * physical memory (aligned on a page boundary).
>   */
> #ifdef CONFIG_MMU
> #ifdef CONFIG_64BIT
> ....
> #else
> #define PAGE_OFFSET _AC(0xc0000000, UL)
> #endif /* CONFIG_64BIT */
> #else
> #define PAGE_OFFSET ((unsigned long)phys_ram_base)
> #endif /* CONFIG_MMU */
> 
> It looks like with !CONFIG_MMU we use phys_ram_base and I'm unable to
> confirm that all the values of this have the first 2 bits set. I
> wonder if his setup is !CONFIG_MMU indeed.

Btw, on 32-bit parisc we have PAGE_OFFSET = 0x10000000.
Other architectures seem to have other values than 0xc0000000 too.

Helge


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-16 22:21         ` Mina Almasry
  2025-09-17  6:27           ` Helge Deller
@ 2025-09-17 10:08           ` Toke Høiland-Jørgensen
  2025-09-18  0:28             ` Mina Almasry
  1 sibling, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-09-17 10:08 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Helge Deller, Helge Deller, David Hildenbrand,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

Mina Almasry <almasrymina@google.com> writes:

> On Tue, Sep 16, 2025 at 2:27 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Mina Almasry <almasrymina@google.com> writes:
>>
>> > On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@gmx.de> wrote:
>> >>
>> >> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
>> >> > Helge Deller <deller@kernel.org> writes:
>> >> >
>> >> >> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
>> >> >> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
>> >> >> 32-bit platforms.
>> >> >>
>> >> >> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
>> >> >> pages, but the remaining bits are not sufficient to unambiguously identify
>> >> >> such pages any longer.
>> >> >
>> >> > Why not? What values end up in pp_magic that are mistaken for the
>> >> > pp_signature?
>> >>
>> >> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
>> >> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
>> >> That means, that before page_pool_page_is_pp() could clearly identify such pages,
>> >> as the (value & 0xFFFFFFFC) == 0x40.
>> >> So, basically only the 0x40 value indicated a PP page.
>> >>
>> >> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
>> >> just showing a few examples:
>> >> 0x01111040
>> >> 0x082330C0
>> >> 0x03264040
>> >> 0x0ad686c0 ....
>> >>
>> >> For me it crashes immediately at bootup when memblocked pages are handed
>> >> over to become normal pages.
>> >>
>> >
>> > I tried to take a look to double check here and AFAICT Helge is correct.
>> >
>> > Before the breaking patch with PP_MAGIC_MASK==0xFFFFFFFC, basically
>> > 0x40 is the only pointer that may be mistaken as a valid pp_magic.
>> > AFAICT each bit we 0 in the PP_MAGIC_MASK (aside from the 3 least
>> > significant bits), doubles the number of pointers that can be mistaken
>> > for pp_magic. So with 0xFFFFFFFC, only one value (0x40) can be
>> > mistaken as a valid pp_magic, with  0xc000007c AFAICT 2^22 values can
>> > be mistaken as pp_magic?
>> >
>> > I don't know that there is any bits we can take away from
>> > PP_MAGIC_MASK I think? As each bit doubles the probablity :(
>> >
>> > I would usually say we can check the 3 least significant bits to tell
>> > if pp_magic is a pointer or not, but pp_magic is unioned with
>> > page->lru I believe which will use those bits.
>>
>> So if the pointers stored in the same field can be any arbitrary value,
>> you are quite right, there is no safe value. The critical assumption in
>> the bit stuffing scheme is that the pointers stored in the field will
>> always be above PAGE_OFFSET, and that PAGE_OFFSET has one (or both) of
>> the two top-most bits set (that is what the VMSPLIT reference in the
>> comment above the PP_DMA_INDEX_SHIFT definition is alluding to).
>>
>
> I see... but where does the 'PAGE_OFFSET has one (or both) of the two
> top-most bits set)' assumption come from? Is it from this code?

Well, from me grepping through the code and trying to make sense of all
the different cases of the preprocessor and config directives across
architectures. Seems I did not quite succeed :/

> /*
>  * PAGE_OFFSET -- the first address of the first page of memory.
>  * When not using MMU this corresponds to the first free page in
>  * physical memory (aligned on a page boundary).
>  */
> #ifdef CONFIG_MMU
> #ifdef CONFIG_64BIT
> ....
> #else
> #define PAGE_OFFSET _AC(0xc0000000, UL)
> #endif /* CONFIG_64BIT */
> #else
> #define PAGE_OFFSET ((unsigned long)phys_ram_base)
> #endif /* CONFIG_MMU */
>
> It looks like with !CONFIG_MMU we use phys_ram_base and I'm unable to
> confirm that all the values of this have the first 2 bits set. I
> wonder if his setup is !CONFIG_MMU indeed.

Right, that's certainly one thing I missed. As was the parisc arch
thing, as Helge followed up with. Ugh :/

> It also looks like pp_magic is also union'd with __folio_index in
> struct page, and it looks like the data there is sometimes used as a
> pointer and sometimes not.

Not according to my pahole:

[...]
			union {
				long unsigned int __folio_index; /*    32     8 */
[...]
       	struct {
			long unsigned int pp_magic;      /*     8     8 */
	
So I think we're good with this, no?

So given the above, we could do something equivalent to this, I think?

diff --git i/include/linux/mm.h w/include/linux/mm.h
index 1ae97a0b8ec7..615aaa19c60c 100644
--- i/include/linux/mm.h
+++ w/include/linux/mm.h
@@ -4175,8 +4175,12 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  */
 #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
 #else
+#if PAGE_OFFSET > PP_SIGNATURE
 /* Always leave out the topmost two; see above. */
-#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
+#define PP_DMA_INDEX_BITS MIN(32, __fls(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT - 1)
+#else
+#define PP_DMA_INDEX_BITS 0
+#endif /* PAGE_OFFSET > PP_SIGNATURE */
 #endif
 
 #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS +  PP_DMA_INDEX_SHIFT - 1, \


Except that it won't work in this form as-is because PAGE_OFFSET is not
always a constant (see the #define PAGE_OFFSET ((unsigned
long)phys_ram_base) that your quoted above), so we'll have to turn it
into an inline function or something.

I'm not sure adding this extra complexity is really worth it, or if we
should just go with the '#define PP_DMA_INDEX_BITS 0' when
POISON_POINTER_DELTA is unset and leave it at that for the temporary
workaround. WDYT?

-Toke



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-17 10:08           ` Toke Høiland-Jørgensen
@ 2025-09-18  0:28             ` Mina Almasry
  2025-09-22 15:49               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Mina Almasry @ 2025-09-18  0:28 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Helge Deller, Helge Deller, David Hildenbrand,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

On Wed, Sep 17, 2025 at 3:09 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Mina Almasry <almasrymina@google.com> writes:
>
> > On Tue, Sep 16, 2025 at 2:27 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Mina Almasry <almasrymina@google.com> writes:
> >>
> >> > On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@gmx.de> wrote:
> >> >>
> >> >> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
> >> >> > Helge Deller <deller@kernel.org> writes:
> >> >> >
> >> >> >> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
> >> >> >> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
> >> >> >> 32-bit platforms.
> >> >> >>
> >> >> >> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
> >> >> >> pages, but the remaining bits are not sufficient to unambiguously identify
> >> >> >> such pages any longer.
> >> >> >
> >> >> > Why not? What values end up in pp_magic that are mistaken for the
> >> >> > pp_signature?
> >> >>
> >> >> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
> >> >> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
> >> >> That means, that before page_pool_page_is_pp() could clearly identify such pages,
> >> >> as the (value & 0xFFFFFFFC) == 0x40.
> >> >> So, basically only the 0x40 value indicated a PP page.
> >> >>
> >> >> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
> >> >> just showing a few examples:
> >> >> 0x01111040
> >> >> 0x082330C0
> >> >> 0x03264040
> >> >> 0x0ad686c0 ....
> >> >>
> >> >> For me it crashes immediately at bootup when memblocked pages are handed
> >> >> over to become normal pages.
> >> >>
> >> >
> >> > I tried to take a look to double check here and AFAICT Helge is correct.
> >> >
> >> > Before the breaking patch with PP_MAGIC_MASK==0xFFFFFFFC, basically
> >> > 0x40 is the only pointer that may be mistaken as a valid pp_magic.
> >> > AFAICT each bit we 0 in the PP_MAGIC_MASK (aside from the 3 least
> >> > significant bits), doubles the number of pointers that can be mistaken
> >> > for pp_magic. So with 0xFFFFFFFC, only one value (0x40) can be
> >> > mistaken as a valid pp_magic, with  0xc000007c AFAICT 2^22 values can
> >> > be mistaken as pp_magic?
> >> >
> >> > I don't know that there is any bits we can take away from
> >> > PP_MAGIC_MASK I think? As each bit doubles the probablity :(
> >> >
> >> > I would usually say we can check the 3 least significant bits to tell
> >> > if pp_magic is a pointer or not, but pp_magic is unioned with
> >> > page->lru I believe which will use those bits.
> >>
> >> So if the pointers stored in the same field can be any arbitrary value,
> >> you are quite right, there is no safe value. The critical assumption in
> >> the bit stuffing scheme is that the pointers stored in the field will
> >> always be above PAGE_OFFSET, and that PAGE_OFFSET has one (or both) of
> >> the two top-most bits set (that is what the VMSPLIT reference in the
> >> comment above the PP_DMA_INDEX_SHIFT definition is alluding to).
> >>
> >
> > I see... but where does the 'PAGE_OFFSET has one (or both) of the two
> > top-most bits set)' assumption come from? Is it from this code?
>
> Well, from me grepping through the code and trying to make sense of all
> the different cases of the preprocessor and config directives across
> architectures. Seems I did not quite succeed :/
>
> > /*
> >  * PAGE_OFFSET -- the first address of the first page of memory.
> >  * When not using MMU this corresponds to the first free page in
> >  * physical memory (aligned on a page boundary).
> >  */
> > #ifdef CONFIG_MMU
> > #ifdef CONFIG_64BIT
> > ....
> > #else
> > #define PAGE_OFFSET _AC(0xc0000000, UL)
> > #endif /* CONFIG_64BIT */
> > #else
> > #define PAGE_OFFSET ((unsigned long)phys_ram_base)
> > #endif /* CONFIG_MMU */
> >
> > It looks like with !CONFIG_MMU we use phys_ram_base and I'm unable to
> > confirm that all the values of this have the first 2 bits set. I
> > wonder if his setup is !CONFIG_MMU indeed.
>
> Right, that's certainly one thing I missed. As was the parisc arch
> thing, as Helge followed up with. Ugh :/
>
> > It also looks like pp_magic is also union'd with __folio_index in
> > struct page, and it looks like the data there is sometimes used as a
> > pointer and sometimes not.
>
> Not according to my pahole:
>
> [...]
>                         union {
>                                 long unsigned int __folio_index; /*    32     8 */
> [...]
>         struct {
>                         long unsigned int pp_magic;      /*     8     8 */
>
> So I think we're good with this, no?
>
> So given the above, we could do something equivalent to this, I think?
>
> diff --git i/include/linux/mm.h w/include/linux/mm.h
> index 1ae97a0b8ec7..615aaa19c60c 100644
> --- i/include/linux/mm.h
> +++ w/include/linux/mm.h
> @@ -4175,8 +4175,12 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>   */
>  #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>  #else
> +#if PAGE_OFFSET > PP_SIGNATURE
>  /* Always leave out the topmost two; see above. */
> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
> +#define PP_DMA_INDEX_BITS MIN(32, __fls(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT - 1)

Shouldn't have this been:

#define PP_DMA_INDEX_BITS MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT)

I.e. you're trying to use the space between the least significant bit
set in PAGE_OFFSET and the most significant bit set in PP_SIGNATURE.
Hmm. I'm not sure I understand this, I may be reading wrong.

> +#else
> +#define PP_DMA_INDEX_BITS 0
> +#endif /* PAGE_OFFSET > PP_SIGNATURE */
>  #endif
>
>  #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS +  PP_DMA_INDEX_SHIFT - 1, \
>
>
> Except that it won't work in this form as-is because PAGE_OFFSET is not
> always a constant (see the #define PAGE_OFFSET ((unsigned
> long)phys_ram_base) that your quoted above), so we'll have to turn it
> into an inline function or something.
>
> I'm not sure adding this extra complexity is really worth it, or if we
> should just go with the '#define PP_DMA_INDEX_BITS 0' when
> POISON_POINTER_DELTA is unset and leave it at that for the temporary
> workaround. WDYT?
>

I think this would work. It still wouldn't handle cases where the data
in pp_magic ends up used as a non-pointer at all or a pointer to some
static variable in the code like `.mp_ops = &dmabuf_devmem_ops,`
right? Because these were never allocated from memory so are unrelated
to PAGE_OFFSET.

But I guess things like that would have been a problem with the old
code anwyway, so should be of no concern?

-- 
Thanks,
Mina


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp()
  2025-09-18  0:28             ` Mina Almasry
@ 2025-09-22 15:49               ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-09-22 15:49 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Helge Deller, Helge Deller, David Hildenbrand,
	Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Linux Memory Management List, netdev, Linux parisc List,
	Andrew Morton

Mina Almasry <almasrymina@google.com> writes:

> On Wed, Sep 17, 2025 at 3:09 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Mina Almasry <almasrymina@google.com> writes:
>>
>> > On Tue, Sep 16, 2025 at 2:27 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Mina Almasry <almasrymina@google.com> writes:
>> >>
>> >> > On Mon, Sep 15, 2025 at 6:08 AM Helge Deller <deller@gmx.de> wrote:
>> >> >>
>> >> >> On 9/15/25 13:44, Toke Høiland-Jørgensen wrote:
>> >> >> > Helge Deller <deller@kernel.org> writes:
>> >> >> >
>> >> >> >> Commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when
>> >> >> >> destroying the pool") changed PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c on
>> >> >> >> 32-bit platforms.
>> >> >> >>
>> >> >> >> The function page_pool_page_is_pp() uses PP_MAGIC_MASK to identify page pool
>> >> >> >> pages, but the remaining bits are not sufficient to unambiguously identify
>> >> >> >> such pages any longer.
>> >> >> >
>> >> >> > Why not? What values end up in pp_magic that are mistaken for the
>> >> >> > pp_signature?
>> >> >>
>> >> >> As I wrote, PP_MAGIC_MASK changed from 0xFFFFFFFC to 0xc000007c.
>> >> >> And we have PP_SIGNATURE == 0x40  (since POISON_POINTER_DELTA is zero on 32-bit platforms).
>> >> >> That means, that before page_pool_page_is_pp() could clearly identify such pages,
>> >> >> as the (value & 0xFFFFFFFC) == 0x40.
>> >> >> So, basically only the 0x40 value indicated a PP page.
>> >> >>
>> >> >> Now with the mask a whole bunch of pointers suddenly qualify as being a pp page,
>> >> >> just showing a few examples:
>> >> >> 0x01111040
>> >> >> 0x082330C0
>> >> >> 0x03264040
>> >> >> 0x0ad686c0 ....
>> >> >>
>> >> >> For me it crashes immediately at bootup when memblocked pages are handed
>> >> >> over to become normal pages.
>> >> >>
>> >> >
>> >> > I tried to take a look to double check here and AFAICT Helge is correct.
>> >> >
>> >> > Before the breaking patch with PP_MAGIC_MASK==0xFFFFFFFC, basically
>> >> > 0x40 is the only pointer that may be mistaken as a valid pp_magic.
>> >> > AFAICT each bit we 0 in the PP_MAGIC_MASK (aside from the 3 least
>> >> > significant bits), doubles the number of pointers that can be mistaken
>> >> > for pp_magic. So with 0xFFFFFFFC, only one value (0x40) can be
>> >> > mistaken as a valid pp_magic, with  0xc000007c AFAICT 2^22 values can
>> >> > be mistaken as pp_magic?
>> >> >
>> >> > I don't know that there is any bits we can take away from
>> >> > PP_MAGIC_MASK I think? As each bit doubles the probablity :(
>> >> >
>> >> > I would usually say we can check the 3 least significant bits to tell
>> >> > if pp_magic is a pointer or not, but pp_magic is unioned with
>> >> > page->lru I believe which will use those bits.
>> >>
>> >> So if the pointers stored in the same field can be any arbitrary value,
>> >> you are quite right, there is no safe value. The critical assumption in
>> >> the bit stuffing scheme is that the pointers stored in the field will
>> >> always be above PAGE_OFFSET, and that PAGE_OFFSET has one (or both) of
>> >> the two top-most bits set (that is what the VMSPLIT reference in the
>> >> comment above the PP_DMA_INDEX_SHIFT definition is alluding to).
>> >>
>> >
>> > I see... but where does the 'PAGE_OFFSET has one (or both) of the two
>> > top-most bits set)' assumption come from? Is it from this code?
>>
>> Well, from me grepping through the code and trying to make sense of all
>> the different cases of the preprocessor and config directives across
>> architectures. Seems I did not quite succeed :/
>>
>> > /*
>> >  * PAGE_OFFSET -- the first address of the first page of memory.
>> >  * When not using MMU this corresponds to the first free page in
>> >  * physical memory (aligned on a page boundary).
>> >  */
>> > #ifdef CONFIG_MMU
>> > #ifdef CONFIG_64BIT
>> > ....
>> > #else
>> > #define PAGE_OFFSET _AC(0xc0000000, UL)
>> > #endif /* CONFIG_64BIT */
>> > #else
>> > #define PAGE_OFFSET ((unsigned long)phys_ram_base)
>> > #endif /* CONFIG_MMU */
>> >
>> > It looks like with !CONFIG_MMU we use phys_ram_base and I'm unable to
>> > confirm that all the values of this have the first 2 bits set. I
>> > wonder if his setup is !CONFIG_MMU indeed.
>>
>> Right, that's certainly one thing I missed. As was the parisc arch
>> thing, as Helge followed up with. Ugh :/
>>
>> > It also looks like pp_magic is also union'd with __folio_index in
>> > struct page, and it looks like the data there is sometimes used as a
>> > pointer and sometimes not.
>>
>> Not according to my pahole:
>>
>> [...]
>>                         union {
>>                                 long unsigned int __folio_index; /*    32     8 */
>> [...]
>>         struct {
>>                         long unsigned int pp_magic;      /*     8     8 */
>>
>> So I think we're good with this, no?
>>
>> So given the above, we could do something equivalent to this, I think?
>>
>> diff --git i/include/linux/mm.h w/include/linux/mm.h
>> index 1ae97a0b8ec7..615aaa19c60c 100644
>> --- i/include/linux/mm.h
>> +++ w/include/linux/mm.h
>> @@ -4175,8 +4175,12 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>   */
>>  #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>>  #else
>> +#if PAGE_OFFSET > PP_SIGNATURE
>>  /* Always leave out the topmost two; see above. */
>> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
>> +#define PP_DMA_INDEX_BITS MIN(32, __fls(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT - 1)
>
> Shouldn't have this been:
>
> #define PP_DMA_INDEX_BITS MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT)
>
> I.e. you're trying to use the space between the least significant bit
> set in PAGE_OFFSET and the most significant bit set in PP_SIGNATURE.
> Hmm. I'm not sure I understand this, I may be reading wrong.

No, you're right, that was me getting things mixed up; but looks like
you got the gist of it so that's good :)

>> +#else
>> +#define PP_DMA_INDEX_BITS 0
>> +#endif /* PAGE_OFFSET > PP_SIGNATURE */
>>  #endif
>>
>>  #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS +  PP_DMA_INDEX_SHIFT - 1, \
>>
>>
>> Except that it won't work in this form as-is because PAGE_OFFSET is not
>> always a constant (see the #define PAGE_OFFSET ((unsigned
>> long)phys_ram_base) that your quoted above), so we'll have to turn it
>> into an inline function or something.
>>
>> I'm not sure adding this extra complexity is really worth it, or if we
>> should just go with the '#define PP_DMA_INDEX_BITS 0' when
>> POISON_POINTER_DELTA is unset and leave it at that for the temporary
>> workaround. WDYT?
>>
>
> I think this would work. It still wouldn't handle cases where the data
> in pp_magic ends up used as a non-pointer at all or a pointer to some
> static variable in the code like `.mp_ops = &dmabuf_devmem_ops,`
> right? Because these were never allocated from memory so are unrelated
> to PAGE_OFFSET.
>
> But I guess things like that would have been a problem with the old
> code anwyway, so should be of no concern?

Yeah, this relies on the overlapping field only ever being used for
kernel-space pointers; which I believe is the case with page->lru (since
it's a list_head).

I'll see if I can find a way around the "PAGE_OFFSET may be a variable
reference" issue and post a proper patch, hopefully tomorrow.

-Toke



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-09-22 15:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-12 23:06 [PATCH][RESEND][RFC] Fix 32-bit boot failure due inaccurate page_pool_page_is_pp() Helge Deller
2025-09-14 17:06 ` Christoph Biedl
2025-09-15  9:45 ` Jesper Dangaard Brouer
2025-09-15 11:44 ` Toke Høiland-Jørgensen
2025-09-15 13:08   ` Helge Deller
2025-09-15 20:41     ` Mina Almasry
2025-09-15 23:51     ` Mina Almasry
2025-09-16  9:27       ` Toke Høiland-Jørgensen
2025-09-16 22:21         ` Mina Almasry
2025-09-17  6:27           ` Helge Deller
2025-09-17 10:08           ` Toke Høiland-Jørgensen
2025-09-18  0:28             ` Mina Almasry
2025-09-22 15:49               ` Toke Høiland-Jørgensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).