* Re: 6.16-pre-rc1: BUG: Bad page state in process swapper on parisc
From: Christoph Biedl @ 2025-08-27 21:31 UTC
To: Helge Deller
Cc: Toke Høiland-Jørgensen, Linux Kernel Development, Linux Memory Management List, linux-parisc

Sorry for being somewhat late to the party ...

Helge Deller wrote a few weeks ago ...

> I'm facing a kernel crash on the 32-bit parisc platform with git head.
>
> git bisecting leads to this patch which triggers the crash:
> commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>
> Syslog:...
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 131072
> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [ 0.000000] stackdepot: allocating hash table via alloc_large_system_hash
> [ 0.000000] stackdepot hash table entries: 32768 (order: 6, 262144 bytes, linear)
> ..
> [ 0.000000] MEMBLOCK configuration: (I added this output during debugging:)
> [ 0.000000] memory size = 0x20000000 reserved size = 0x01f0ed2a
> [ 0.000000] memory.cnt = 0x1
> [ 0.000000] memory[0x0] [0x00000000-0x1fffffff], 0x20000000 bytes flags: 0x0
> [ 0.000000] reserved.cnt = 0xa
> [ 0.000000] reserved[0x0] [0x00000000-0x0008a0b0], 0x0008a0b1 bytes flags: 0x0
> [ 0.000000] reserved[0x1] [0x0008a0c0-0x0008a130], 0x00000071 bytes flags: 0x0
> [ 0.000000] reserved[0x2] [0x0008a140-0x0008a143], 0x00000004 bytes flags: 0x0
> [ 0.000000] reserved[0x3] [0x0008a150-0x0008a153], 0x00000004 bytes flags: 0x0
> [ 0.000000] reserved[0x4] [0x0008a160-0x0008a2d3], 0x00000174 bytes flags: 0x0
> [ 0.000000] reserved[0x5] [0x0008a2e0-0x0008a5e3], 0x00000304 bytes flags: 0x0
> [ 0.000000] reserved[0x6] [0x0008a5f0-0x0008a6b3], 0x000000c4 bytes flags: 0x0
> [ 0.000000] reserved[0x7] [0x0008a6c0-0x0008acc3], 0x00000604 bytes flags: 0x0
> [ 0.000000] reserved[0x8] [0x0008acd0-0x000f6d8f], 0x0006c0c0 bytes flags: 0x0
> [ 0.000000] reserved[0x9] [0x00100000-0x01f17fff], 0x01e18000 bytes flags: 0x0
> [ 0.000000] BUG: Bad page state in process swapper pfn:000f7
> [ 0.000000] page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0xf7
> [ 0.000000] flags: 0x0(zone=0)
> [ 0.000000] raw: 00000000 118022c0 118022c0 00000000 00000000 00000000 ffffffff 00000000
> [ 0.000000] raw: 00000000
> [ 0.000000] page dumped because: page_pool leak
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-32bit+ #2730 NONE
> [ 0.000000] Hardware name: 9000/778/B160L
> [ 0.000000] Backtrace:
> [ 0.000000] [<1041d1f4>] show_stack+0x34/0x48
> [ 0.000000] [<10412dd8>] dump_stack_lvl+0x80/0xc8
> [ 0.000000] [<10412e3c>] dump_stack+0x1c/0x2c
> [ 0.000000] [<106ece88>] bad_page+0x14c/0x17c
> [ 0.000000] [<10406c50>] free_page_is_bad.part.0+0xd4/0xec
> [ 0.000000] [<106ed180>] free_page_is_bad+0x80/0x88
> [ 0.000000] [<106ef05c>] __free_pages_ok+0x374/0x508
> [ 0.000000] [<1011d34c>] __free_pages_core+0x1f0/0x218
> [ 0.000000] [<1011a2f0>] memblock_free_pages+0x68/0x94
> [ 0.000000] [<10120324>] memblock_free_all+0x26c/0x310
> [ 0.000000] [<1011a4d8>] mm_core_init+0x18c/0x208
> [ 0.000000] [<10100e88>] start_kernel+0x4ec/0x7a0
> [ 0.000000] [<101054d0>] start_parisc+0xb4/0xc4

The same occurred here, but due to time constraints and hardware issues I
couldn't dig into this earlier.

Bisecting in the 6.15.y stable series led to commit c30ae60f41f9, which was
cherry-picked from ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap
them when destroying the pool"). The problem still exists in 6.17-rc2.

| HP-UX model name: 9000/785/C3600, if that matters.

Christoph
* boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels
From: Helge Deller @ 2025-09-11 22:12 UTC
To: Toke Høiland-Jørgensen, David Hildenbrand, Linux Kernel Development, Linux Memory Management List, linux-parisc
Cc: Christoph Biedl, Helge Deller

As reported earlier in this mail thread, all 32-bit Linux kernels since v6.16
fail to boot on the parisc architecture like this:

BUG: Bad page state in process swapper pfn:000f7
page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0xf7
flags: 0x0(zone=0)
raw: 00000000 118022c0 118022c0 00000000 00000000 00000000 ffffffff 00000000
raw: 00000000
page dumped because: page_pool leak
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-32bit+ #2730 NONE
Hardware name: 9000/778/B160L
Backtrace:
[<106ece88>] bad_page+0x14c/0x17c
[<10406c50>] free_page_is_bad.part.0+0xd4/0xec
[<106ed180>] free_page_is_bad+0x80/0x88
[<106ef05c>] __free_pages_ok+0x374/0x508
[<1011d34c>] __free_pages_core+0x1f0/0x218
[<1011a2f0>] memblock_free_pages+0x68/0x94
[<10120324>] memblock_free_all+0x26c/0x310
[<1011a4d8>] mm_core_init+0x18c/0x208
[<10100e88>] start_kernel+0x4ec/0x7a0
[<101054d0>] start_parisc+0xb4/0xc4

git bisecting leads to this patch which triggers the crash:

commit ee62ce7a1d909ccba0399680a03c2dee83bcae95
Author: Toke Høiland-Jørgensen <toke@redhat.com>
Date:   Wed Apr 9 12:41:37 2025 +0200
    page_pool: Track DMA-mapped pages and unmap them when destroying the pool

It turns out that the patch itself isn't wrong.
But it is what triggers the kernel bug, because it changes PP_MAGIC_MASK for
32-bit kernels from:

-#define PP_MAGIC_MASK ~0x3UL
+#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)

page_pool_page_is_pp() needs to unambiguously identify page pool pages using
PP_MAGIC_MASK. The patch reduced the bits checked through PP_MAGIC_MASK from
0xFFFFFFFC to 0xc000007c (from 30 bits to 7), and the remaining bits are no
longer sufficient to identify such pages unambiguously.

As a result, page_pool_page_is_pp() sometimes wrongly reports ordinary pages
as page pool pages and triggers the kernel BUG, because it believes it has
found a page pool leak.

IMHO this is a generic 32-bit kernel issue, not one that affects only parisc.

Do you see any options other than:
a) revert the patch (ee62ce7a1d90), or
b) return false in page_pool_page_is_pp() when !defined(CONFIG_64BIT),
   which effectively disables the page pool page test on 32-bit machines?

Helge
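The mask arithmetic can be reproduced with a small userspace model. It assumes the 32-bit layout introduced by ee62ce7a1d90 as reconstructed here, not copied from the kernel headers: POISON_POINTER_DELTA is 0, so PP_SIGNATURE is 0x40; the DMA index starts one bit above the signature bit (bit 7) and is 32 - 7 - 2 = 23 bits wide, leaving the two topmost bits clear. It also assumes that the second `raw:` word in the dump, 118022c0, is the page's pp_magic field, which shares storage with page->lru.next. All `_32` macros and looks_like_pp() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values assumed for a 32-bit build with POISON_POINTER_DELTA == 0. */
#define PP_SIGNATURE_32       UINT32_C(0x40)
#define PP_DMA_INDEX_SHIFT_32 7u   /* assumed: one bit above the signature bit 6 */
#define PP_DMA_INDEX_BITS_32  23u  /* assumed: 32 - 7 - 2, top two bits left clear */

/* Equivalent of GENMASK(29, 7): the bits that hold the DMA index. */
#define PP_DMA_INDEX_MASK_32 \
	(((UINT32_C(1) << PP_DMA_INDEX_BITS_32) - 1u) << PP_DMA_INDEX_SHIFT_32)

/* Mask before ee62ce7a1d90: ignore only bit 0 and bit 1. */
#define PP_MAGIC_MASK_OLD_32 ((uint32_t)~UINT32_C(0x3))
/* Mask after ee62ce7a1d90: also ignore the DMA index bits. */
#define PP_MAGIC_MASK_NEW_32 ((uint32_t)~(PP_DMA_INDEX_MASK_32 | UINT32_C(0x3)))

/* Compile-time checks against the values quoted in the report. */
static_assert(PP_DMA_INDEX_MASK_32 == 0x3fffff80u, "DMA index in bits 7..29");
static_assert(PP_MAGIC_MASK_OLD_32 == 0xfffffffcu, "old mask from the report");
static_assert(PP_MAGIC_MASK_NEW_32 == 0xc000007cu, "new mask from the report");

/* Model of the page_pool_page_is_pp() test: masked word equals the signature. */
static bool looks_like_pp(uint32_t pp_magic, uint32_t mask)
{
	return (pp_magic & mask) == PP_SIGNATURE_32;
}
```

With the new mask only bits 2 to 6 and 30 to 31 are compared, so any word with bit 6 set and bits 2 to 5, 30 and 31 clear matches the signature. looks_like_pp(0x118022c0, PP_MAGIC_MASK_NEW_32) is therefore true, while the old mask compares the full value and rejects it.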
* Re: boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels
From: David Hildenbrand @ 2025-09-12 7:57 UTC
To: Helge Deller, Toke Høiland-Jørgensen, Linux Kernel Development, Linux Memory Management List, linux-parisc
Cc: Christoph Biedl, Helge Deller, Byungchul Park

On 12.09.25 00:12, Helge Deller wrote:
> As reported earlier in this mail thread, all 32-bit Linux kernels since v6.16
> fail to boot on the parisc architecture like this:
> [...]
> Do you see any options other than:
> a) revert the patch (ee62ce7a1d90), or
> b) return false in page_pool_page_is_pp() when !defined(CONFIG_64BIT),
>    which effectively disables the page pool page test on 32-bit machines?

We should have a change coming soon that would use a page type and, I think,
fix this as well:

https://lkml.kernel.org/r/20250728052742.81294-1-byungchul@sk.com

Until then, the easiest fix would indeed be to go with b).

But maybe the page type change could be backported?

-- 
Cheers

David / dhildenb
* Re: boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels
From: Helge Deller @ 2025-09-12 14:04 UTC
To: David Hildenbrand, Helge Deller, Toke Høiland-Jørgensen, Linux Kernel Development, Linux Memory Management List, linux-parisc
Cc: Christoph Biedl, Byungchul Park

On 9/12/25 09:57, David Hildenbrand wrote:
> On 12.09.25 00:12, Helge Deller wrote:
>> As reported earlier in this mail thread, all 32-bit Linux kernels since v6.16
>> fail to boot on the parisc architecture like this:
>> [...]
>> Do you see any options other than:
>> a) revert the patch (ee62ce7a1d90), or
>> b) return false in page_pool_page_is_pp() when !defined(CONFIG_64BIT),
>>    which effectively disables the page pool page test on 32-bit machines?
>
> We should have a change coming soon that would use a page type and, I think,
> fix this as well:
>
> https://lkml.kernel.org/r/20250728052742.81294-1-byungchul@sk.com
>
> Until then, the easiest fix would indeed be to go with b).

OK, I'll send a patch for b).

Thanks!
Helge
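Option b), which Helge says he will send a patch for, trades false positives for false negatives on 32-bit kernels. The userspace model below sketches that trade-off under stated assumptions: struct page_model, is_pp_current_32() and is_pp_option_b_32() are hypothetical stand-ins for struct page and page_pool_page_is_pp(), PP_SIGNATURE is assumed to be 0x40 on 32-bit, and the mask is the 0xc000007c value quoted in the thread. It is not the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 32-bit values: PP_SIGNATURE (0x40) and the new PP_MAGIC_MASK from the report. */
#define PP_SIGNATURE_32  UINT32_C(0x40)
#define PP_MAGIC_MASK_32 UINT32_C(0xc000007c)

/* Hypothetical stand-in for struct page: only the word that holds pp_magic. */
struct page_model {
	uint32_t pp_magic;
};

/* Behaviour since ee62ce7a1d90 on 32-bit: a masked signature comparison. */
static bool is_pp_current_32(const struct page_model *page)
{
	return (page->pp_magic & PP_MAGIC_MASK_32) == PP_SIGNATURE_32;
}

/*
 * Option b) on a 32-bit build. In the kernel this would presumably be an
 * #ifdef CONFIG_64BIT around the comparison; on 32-bit the check then
 * always answers "not a page pool page".
 */
static bool is_pp_option_b_32(const struct page_model *page)
{
	(void)page; /* pp_magic is no longer consulted */
	return false;
}
```

For the value 0x118022c0 from the dump, the current check reports a page pool page and option b) does not. For a genuine pp_magic of 0x40, option b) also answers no, so page pool leaks would no longer be caught by this particular check on 32-bit kernels; presumably that is part of why the page type change is mentioned as the more complete fix.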
end of thread
Thread overview: 4+ messages
[not found] <5a004aef-9df1-4126-b167-1aae27d4240d@gmx.de>
2025-08-27 21:31 ` 6.16-pre-rc1: BUG: Bad page state in process swapper on parisc Christoph Biedl
2025-09-11 22:12 ` boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels Helge Deller
2025-09-12 7:57 ` David Hildenbrand
2025-09-12 14:04 ` Helge Deller