From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Oliver Sang <oliver.sang@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
oe-lkp@lists.linux.dev, lkp@intel.com,
Mike Rapoport <rppt@linux.ibm.com>,
Christoph Lameter <cl@linux.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h
Date: Wed, 4 Jan 2023 18:04:20 +0900 [thread overview]
Message-ID: <Y7VBFLHY/PMbb4XS@hyeyoo> (raw)
In-Reply-To: <Y7Qxucg5le7WOzr7@xsang-OptiPlex-9020>
On Tue, Jan 03, 2023 at 09:46:33PM +0800, Oliver Sang wrote:
> On Tue, Jan 03, 2023 at 11:42:11AM +0100, Vlastimil Babka wrote:
> > So the events leading up to this could be something like:
> >
> > - 0x2daee is order-1 slab folio of the inode cache, sitting on the partial list
> > - despite being on partial list, it's freed ???
> > - somebody else allocates order-2 page 0x2daec and uses it for whatever,
> > then frees it
> > - 0x2daec is reallocated as order-1 slab from names_cache, then freed
> > - we try to allocate from the slab page 0x2daee and trip on the PageTail
> >
> > Except, the freeing of order-2 page would have reset the PageTail and
> > compound_head in 0x2daec, so this is even more complicated or involves some
> > extra race?
>
> FYI, we ran tests more up to 500 times, then saw different issues but rate is
> actually low
>
> 56d5a2b9ba85a390 0af8489b0216fa1dd83e264bef8
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :500 12% 61:500 dmesg.invalid_opcode:#[##]
> :500 3% 14:500 dmesg.kernel_BUG_at_include/linux/mm.h
> :500 3% 17:500 dmesg.kernel_BUG_at_include/linux/page-flags.h
> :500 5% 26:500 dmesg.kernel_BUG_at_lib/list_debug.c
> :500 0% 2:500 dmesg.kernel_BUG_at_mm/page_alloc.c
> :500 0% 2:500 dmesg.kernel_BUG_at_mm/usercopy.c
>
> >
> > In any case, this is something a debug_pagealloc kernel could have a chance
> > of catching earlier. Would it be possible to enable CONFIG_DEBUG_PAGEALLOC
> > and DEBUG_PAGEALLOC_ENABLE_DEFAULT additionally to the rest of the
> > configuration, and repeat the test?
>
> ok, we are starting to test by these 2 additional configs now.
BTW it seems to be totally unrelated to rcutorture tests.
Are there similar reports in boot tests with the same config?
> >
> > Separately we should also make the __dump_page() more resilient.
> >
> > Thanks,
> > Vlastimil
> >
> > > [ 25.804432][ T214] ------------[ cut here ]------------
> > > [ 25.804917][ T214] kernel BUG at include/linux/mm.h:825!
> > > [ 25.805402][ T214] invalid opcode: 0000 [#1] SMP
> > > [ 25.805820][ T214] CPU: 0 PID: 214 Comm: udevadm Tainted: G S 6.1.0-rc2-00014-g0af8489b0216 #2 1c4d7707ec0ce574ed62a77e82a8580202758048
> > > [ 25.806944][ T214] EIP: __dump_page.cold+0x282/0x369
> > > [ 25.807376][ T214] Code: ff ff 83 05 e8 5d bb c5 01 ba 4c c4 2f c4 89 f8 83 15 ec 5d bb c5 00 e8 f2 92 ed fd 83 05 f8 5d bb c5 01 83 15 fc 5d bb c5 00 <0f> 0b 83 05 00 5e bb c5 01 b8 ac 85 a3 c4 83 15 04 5e bb c5 00 e8
> > > [ 25.808960][ T214] EAX: 00000000 EBX: e764d530 ECX: 00000003 EDX: 4108888f
> > > [ 25.809578][ T214] ESI: e764d4e0 EDI: e764d4e0 EBP: ed89db3c ESP: ed89db00
> > > [ 25.810168][ T214] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210046
> > > [ 25.810803][ T214] CR0: 80050033 CR2: 00616abc CR3: 2d878000 CR4: 000406d0
> > > [ 25.811407][ T214] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > [ 25.811999][ T214] DR6: fffe0ff0 DR7: 00000400
> > > [ 25.812390][ T214] Call Trace:
> > > [ 25.812675][ T214] dump_page+0x2a/0xc0
> > > [ 25.813025][ T214] ? _raw_spin_lock_irqsave+0x16/0x30
> > > [ 25.813492][ T214] folio_flags+0x23/0x70
> > > [ 25.813945][ T214] get_partial_node+0x89/0x290
> > > [ 25.814357][ T214] __slab_alloc_node+0xbb/0x270
> > > [ 25.814860][ T214] kmem_cache_alloc_lru+0x8d/0x4e0
> > > [ 25.815289][ T214] ? __lock_release+0x3ec/0x410
> > > [ 25.815697][ T214] ? iget_locked+0x78/0x310
> > > [ 25.816096][ T214] alloc_inode+0x93/0x150
> > > [ 25.816469][ T214] iget_locked+0xdd/0x310
> > > [ 25.816829][ T214] ? lock_is_held_type+0x80/0xf0
> > > [ 25.817264][ T214] kernfs_get_inode+0x24/0xb0
> > > [ 25.817670][ T214] kernfs_iop_lookup+0xb5/0x1a0
> > > [ 25.818087][ T214] __lookup_slow+0xd9/0x2a0
> > > [ 25.818479][ T214] lookup_slow+0x50/0x90
> > > [ 25.818847][ T214] walk_component+0x19c/0x2c0
> > > [ 25.819244][ T214] path_lookupat+0xa3/0x270
> > > [ 25.819627][ T214] path_openat+0x307/0x3e0
> > > [ 25.820007][ T214] do_filp_open+0x7c/0x130
> > > [ 25.820409][ T214] do_sys_openat2+0x113/0x1f0
> > > [ 25.820807][ T214] do_sys_open+0x8e/0xe0
> > > [ 25.821211][ T214] __ia32_sys_openat+0x2b/0x40
> > > [ 25.821622][ T214] __do_fast_syscall_32+0x72/0xd0
> > > [ 25.822057][ T214] ? trace_hardirqs_on+0xa2/0x110
> > > [ 25.822480][ T214] ? __fput+0x19f/0x390
> > > [ 25.822842][ T214] ? lockdep_hardirqs_on_prepare+0x242/0x400
> > > [ 25.823346][ T214] ? syscall_exit_to_user_mode+0x5f/0x90
> > > [ 25.823823][ T214] ? __do_fast_syscall_32+0x7c/0xd0
> > > [ 25.824259][ T214] ? lockdep_hardirqs_on_prepare+0x242/0x400
> > > [ 25.824767][ T214] ? syscall_exit_to_user_mode+0x5f/0x90
> > > [ 25.825254][ T214] ? __do_fast_syscall_32+0x7c/0xd0
> > > [ 25.825696][ T214] ? __do_fast_syscall_32+0x7c/0xd0
> > > [ 25.826155][ T214] ? syscall_exit_to_user_mode+0x5f/0x90
> > > [ 25.826627][ T214] ? __do_fast_syscall_32+0x7c/0xd0
> > > [ 25.827056][ T214] ? __do_fast_syscall_32+0x7c/0xd0
> > > [ 25.827486][ T214] ? __do_fast_syscall_32+0x7c/0xd0
> > > [ 25.827929][ T214] ? irqentry_exit_to_user_mode+0x23/0x30
> > > [ 25.828423][ T214] ? irqentry_exit+0x7f/0xc0
> > > [ 25.828812][ T214] do_fast_syscall_32+0x32/0x70
> > > [ 25.829223][ T214] do_SYSENTER_32+0x15/0x20
> > > [ 25.829589][ T214] entry_SYSENTER_32+0xa2/0xfb
> > > [ 25.830003][ T214] EIP: 0xb7f8c549
> > > [ 25.830330][ T214] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> > > [ 25.831929][ T214] EAX: ffffffda EBX: 00000006 ECX: 006142a1 EDX: 002a8000
> > > [ 25.832522][ T214] ESI: 00000000 EDI: 00000001 EBP: 00614024 ESP: bff3c4a0
> > > [ 25.833123][ T214] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00200246
> > > [ 25.833738][ T214] Modules linked in:
> > > [ 25.834062][ T214] ---[ end trace 0000000000000000 ]---
> > > [ 25.834522][ T214] EIP: __dump_page.cold+0x282/0x369
> > > [ 25.834960][ T214] Code: ff ff 83 05 e8 5d bb c5 01 ba 4c c4 2f c4 89 f8 83 15 ec 5d bb c5 00 e8 f2 92 ed fd 83 05 f8 5d bb c5 01 83 15 fc 5d bb c5 00 <0f> 0b 83 05 00 5e bb c5 01 b8 ac 85 a3 c4 83 15 04 5e bb c5 00 e8
> > > [ 25.836574][ T214] EAX: 00000000 EBX: e764d530 ECX: 00000003 EDX: 4108888f
> > > [ 25.837183][ T214] ESI: e764d4e0 EDI: e764d4e0 EBP: ed89db3c ESP: ed89db00
> > > [ 25.837772][ T214] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210046
> > > [ 25.838414][ T214] CR0: 80050033 CR2: 00616abc CR3: 2d878000 CR4: 000406d0
> > > [ 25.839011][ T214] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > [ 25.839597][ T214] DR6: fffe0ff0 DR7: 00000400
> > > [ 25.839995][ T214] Kernel panic - not syncing: Fatal exception
> > > [ 25.840554][ T214] Kernel Offset: disabled
> >
> >
--
Thanks,
Hyeonggon
next prev parent reply other threads:[~2023-01-04 9:04 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-31 15:26 [linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h kernel test robot
2023-01-01 5:30 ` Hyeonggon Yoo
2023-01-01 6:50 ` Hyeonggon Yoo
2023-01-01 7:37 ` supervisor write access in kernel mode in __pv_queued_spin_unlock_slowpath Hyeonggon Yoo
2023-01-01 11:08 ` Maxim Levitsky
2023-01-02 11:17 ` Hyeonggon Yoo
2023-01-03 10:42 ` [linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h Vlastimil Babka
2023-01-03 13:46 ` Oliver Sang
2023-01-03 14:36 ` Vlastimil Babka
2023-01-04 9:04 ` Hyeonggon Yoo [this message]
2023-01-05 1:46 ` Oliver Sang
2023-01-05 13:59 ` Hyeonggon Yoo
2023-01-05 14:47 ` Hyeonggon Yoo
2023-01-09 14:16 ` Oliver Sang
2023-01-06 10:13 ` Vlastimil Babka
2023-01-09 14:01 ` Oliver Sang
2023-01-09 14:04 ` Oliver Sang
2023-01-10 13:53 ` Oliver Sang
2023-01-10 14:09 ` Vlastimil Babka
2023-01-11 2:26 ` Feng Tang
2023-01-11 10:52 ` Vlastimil Babka
2023-01-12 7:47 ` Oliver Sang
2023-01-12 7:56 ` Vlastimil Babka
2023-01-17 7:19 ` Oliver Sang
2023-01-12 8:49 ` Vlastimil Babka
2023-01-03 15:31 ` A better dump_page() Matthew Wilcox
2023-01-03 23:07 ` David Rientjes
2023-01-03 23:29 ` Matthew Wilcox
2023-01-05 15:19 ` Vlastimil Babka
2023-01-05 15:35 ` Matthew Wilcox
2023-01-06 17:28 ` [linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h Hyeonggon Yoo
2023-01-11 9:44 ` BUG: unable to handle page fault for address: f6ffe000 Hyeonggon Yoo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y7VBFLHY/PMbb4XS@hyeyoo \
--to=42.hyeyoo@gmail.com \
--cc=cl@linux.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lkp@intel.com \
--cc=oe-lkp@lists.linux.dev \
--cc=oliver.sang@intel.com \
--cc=rppt@linux.ibm.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.