* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
2021-11-02 9:27 [Bug 214913] New: [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40 bugzilla-daemon
@ 2021-11-02 9:29 ` bugzilla-daemon
2021-11-04 5:45 ` bugzilla-daemon
` (10 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2021-11-02 9:29 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
--- Comment #1 from Zorro Lang (zlang@redhat.com) ---
Created attachment 299403
--> https://bugzilla.kernel.org/attachment.cgi?id=299403&action=edit
.config file
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-04 5:45 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
Michael Ellerman (michael@ellerman.id.au) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |michael@ellerman.id.au
--- Comment #2 from Michael Ellerman (michael@ellerman.id.au) ---
Thanks for the report, I agree this looks like a powerpc bug not an XFS bug.
I won't have time to look at this until next week probably, unless someone
beats me to it.
* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-04 8:15 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
Michal Suchanek (hramrach@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hramrach@gmail.com
--- Comment #3 from Michal Suchanek (hramrach@gmail.com) ---
What CPU is this?
Does it go away if you boot with ppc_tm=off?
* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-05 11:53 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
--- Comment #4 from Zorro Lang (zlang@redhat.com) ---
(In reply to Michal Suchanek from comment #3)
> What CPU is this?
>
> Does it go away if you boot with ppc_tm=off
(In reply to Michael Ellerman from comment #2)
> Thanks for the report, I agree this looks like a powerpc bug not an XFS bug.
>
> I won't have time to look at this until next week probably, unless someone
> beats me to it.
Thanks for your reply. (Un)fortunately, because mainline keeps moving, I can
no longer reproduce this panic on the latest mainline linux master branch. The
HEAD commit is 7ddb58cb0eca. There are many changes between 8bb7eca972ad
(v5.15) and 7ddb58cb0eca (v5.15+), so I can't tell which commit fixed this bug,
or merely hid it. Do you know whether a known issue like this has been fixed?
Thanks,
Zorro
* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-12-09 11:43 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
Michael Ellerman (michael@ellerman.id.au) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
--- Comment #5 from Michael Ellerman (michael@ellerman.id.au) ---
Sorry I don't have any idea which commit could have fixed this.
The process that crashed was "fsstress", do you know if it uses io_uring?
* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-11 13:13 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
--- Comment #6 from Zorro Lang (zlang@redhat.com) ---
FYI, I still hit this issue on linux 6.1.0-rc8+, and it is nearly 100%
reproducible.
[ 1581.047788] run fstests generic/051 at 2022-12-10 11:28:27
[ 1582.574596] XFS (sda3): Mounting V5 Filesystem
[ 1582.638653] XFS (sda3): Ending clean mount
[ 1582.646329] XFS (sda3): User initiated shutdown received.
[ 1582.646397] XFS (sda3): Metadata I/O Error (0x4) detected at
xfs_fs_goingdown+0x68/0x160 [xfs] (fs/xfs/xfs_fsops.c:483). Shutting down
filesystem.
[ 1582.646506] XFS (sda3): Please unmount the filesystem and rectify the
problem(s)
[ 1582.692102] XFS (sda3): Unmounting Filesystem
[ 1584.011651] XFS (sda3): Mounting V5 Filesystem
[ 1584.123764] XFS (sda3): Ending clean mount
[ 1605.168286] restraintd[3598]: *** Current Time: Sat Dec 10 11:28:52 2022
Localwatchdog at: Mon Dec 12 11:03:52 2022
[ 1614.846132] XFS (sda3): Unmounting Filesystem
[ 1615.569693] XFS (sda3): Mounting V5 Filesystem
[ 1615.725272] XFS (sda3): Ending clean mount
[ 1650.793064] XFS (sda3): User initiated shutdown received.
[ 1650.793108] XFS (sda3): Log I/O Error (0x6) detected at
xfs_fs_goingdown+0xf8/0x160 [xfs] (fs/xfs/xfs_fsops.c:486). Shutting down
filesystem.
[ 1650.793200] XFS (sda3): Please unmount the filesystem and rectify the
problem(s)
[ 1650.801605] Kernel attempted to read user page (108) - exploit attempt?
(uid: 0)
[ 1650.801625] BUG: Kernel NULL pointer dereference on read at 0x00000108
[ 1650.801638] Faulting instruction address: 0xc000000000036154
[ 1650.801652] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1650.801660] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 1650.801671] Modules linked in: dm_flakey dm_mod bonding tls rfkill sunrpc
pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi
sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto
[ 1650.801727] CPU: 0 PID: 382724 Comm: fsstress Kdump: loaded Not tainted
6.1.0-rc8+ #1
[ 1650.801739] Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005
of:IBM,FW940.02 (VL940_041) hv:phyp pSeries
[ 1650.801743] Kernel attempted to read user page (108) - exploit attempt?
(uid: 0)
[ 1650.801748] NIP: c000000000036154 LR: c0000000006f67b4 CTR:
c000000000036140
[ 1650.801755] BUG: Kernel NULL pointer dereference on read at 0x00000108
[ 1650.801759] REGS: c00000004eb7b480 TRAP: 0300 Not tainted (6.1.0-rc8+)
[ 1650.801764] Faulting instruction address: 0xc000000000036154
[ 1650.801769] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
CR: 88004400 XER: 00000000
[ 1650.801809] CFAR: c00000000000c9d4 DAR: 0000000000000108 DSISR: 40000000
IRQMASK: 0
[ 1650.801809] GPR00: c0000000006f67b4 c00000004eb7b720 c0000000016c0600
0000000000000000
[ 1650.801809] GPR04: c000000001690ef8 0000000000000000 0000000000000000
c00000004b72a900
[ 1650.801809] GPR08: c000000001506ee8 0000000000000000 0000000000000009
0000000000000000
[ 1650.801809] GPR12: c000000000036140 c0000000051e0000 0000000000000000
00007fff96f879b0
[ 1650.801809] GPR16: 00007fff970941d0 ffffffffffffffff 0000000000000005
c00000004484a400
[ 1650.801809] GPR20: c00000004484aeb8 0000000000040100 0000000000000001
c000000001489d58
[ 1650.801809] GPR24: 00000000ffffffff c00000004eb7b8b0 0000000000000004
c0000000011531e8
[ 1650.801809] GPR28: 0000000000000108 c00000004be38400 0000000000000004
c000000001690ef8
[ 1650.801927] NIP [c000000000036154] tm_cgpr_active+0x14/0x40
[ 1650.801939] LR [c0000000006f67b4] fill_thread_core_info+0x1d4/0x290
[ 1650.801951] Call Trace:
[ 1650.801955] [c00000004eb7b720] [c0000000006f673c]
fill_thread_core_info+0x15c/0x290 (unreliable)
[ 1650.801971] [c00000004eb7b7a0] [c0000000006f6fd4] fill_note_info+0x1f4/0x390
[ 1650.801984] [c00000004eb7b810] [c0000000006f71fc] elf_core_dump+0x8c/0x580
[ 1650.801997] [c00000004eb7ba00] [c0000000006fcc10] do_coredump+0x330/0xca0
[ 1650.802012] [c00000004eb7bbd0] [c000000000174f94] get_signal+0x7f4/0x8f0
[ 1650.802024] [c00000004eb7bcb0] [c000000000020d2c] do_signal+0x7c/0x330
[ 1650.802036] [c00000004eb7bd50] [c000000000022010]
do_notify_resume+0xb0/0x140
[ 1650.802049] [c00000004eb7bd80] [c000000000030550]
interrupt_exit_user_prepare_main+0x1d0/0x290
[ 1650.802062] [c00000004eb7bde0] [c0000000000306f4]
syscall_exit_prepare+0xe4/0x1f0
[ 1650.802074] [c00000004eb7be10] [c00000000000bffc]
system_call_vectored_common+0xfc/0x280
[ 1650.802089] --- interrupt: 3000 at 0x7fff96de315c
[ 1650.802099] NIP: 00007fff96de315c LR: 0000000000000000 CTR:
0000000000000000
[ 1650.802107] REGS: c00000004eb7be80 TRAP: 3000 Not tainted (6.1.0-rc8+)
[ 1650.802115] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 42004404
XER: 00000000
[ 1650.802141] IRQMASK: 0
[ 1650.802141] GPR00: 00000000000000fa 00007fffc54a96a0 00007fff96f87200
0000000000000000
[ 1650.802141] GPR04: 000000000005d704 0000000000000006 0000000000000000
0000000000000000
[ 1650.802141] GPR08: 00007fff96f81f68 0000000000000000 0000000000000000
0000000000000000
[ 1650.802141] GPR12: 0000000000000000 00007fff9709b1c0 0000000000000000
00007fff96f879b0
[ 1650.802141] GPR16: 00007fff970941d0 ffffffffffffffff 0000000010030bec
00000000100152e8
[ 1650.802141] GPR20: 0000000000000000 0000000000000000 00007fffc54bdfee
0000000000000001
[ 1650.802141] GPR24: 0000000010009800 00000000100131a8 8f5c28f5c28f5c29
028f5c28f5c28f5c
[ 1650.802141] GPR28: 0000000000000006 ffffffffffffffff 00007fff97093980
000000000005d704
[ 1650.802249] NIP [00007fff96de315c] 0x7fff96de315c
[ 1650.802258] LR [0000000000000000] 0x0
[ 1650.802266] --- interrupt: 3000
[ 1650.802272] Instruction dump:
[ 1650.802279] 4bfe87d5 60000000 e8010040 38210030 ebe1fff8 7c0803a6 4e800020
7c0802a6
[ 1650.802305] 60000000 60000000 e9232aa0 38600000 <e9290108> 7929e844 79291f43
41820008
[ 1650.802330] ---[ end trace 0000000000000000 ]---
[ 1650.813469]
[ 1650.813475] Oops: Kernel access of bad area, sig: 11 [#2]
[ 1650.813480] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 1650.813488] Modules linked in: dm_flakey dm_mod bonding tls rfkill sunrpc
pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi
sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto
[ 1650.813524] CPU: 4 PID: 382723 Comm: fsstress Kdump: loaded Tainted: G
D 6.1.0-rc8+ #1
[ 1650.813532] Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005
of:IBM,FW940.02 (VL940_041) hv:phyp pSeries
[ 1650.813537] NIP: c000000000036154 LR: c0000000006f67b4 CTR:
c000000000036140
[ 1650.813541] REGS: c00000004eb4b480 TRAP: 0300 Tainted: G D
(6.1.0-rc8+)
[ 1650.813546] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
CR: 88004400 XER: 20040000
[ 1650.813562] CFAR: c00000000000c9d4 DAR: 0000000000000108 DSISR: 40000000
IRQMASK: 0
[ 1650.813562] GPR00: c0000000006f67b4 c00000004eb4b720 c0000000016c0600
0000000000000000
[ 1650.813562] GPR04: c000000001690ef8 0000000000000000 0000000000000000
c0000000437e4800
[ 1650.813562] GPR08: c000000001506ee8 0000000000000000 0000000000000009
0000000000000000
[ 1650.813562] GPR12: c000000000036140 c00000000ffcc480 0000000000000000
00007fff96f879b0
[ 1650.813562] GPR16: 00007fff970941d0 ffffffffffffffff 0000000000000005
c000000044810e00
[ 1650.813562] GPR20: c0000000448118b8 0000000000040100 0000000000000001
c000000001489d58
[ 1650.813562] GPR24: 00000000ffffffff c00000004eb4b8b0 0000000000000004
c0000000011531e8
[ 1650.813562] GPR28: 0000000000000108 c00000003235f000 0000000000000004
c000000001690ef8
[ 1650.813619] NIP [c000000000036154] tm_cgpr_active+0x14/0x40
[ 1650.813625] LR [c0000000006f67b4] fill_thread_core_info+0x1d4/0x290
[ 1650.813632] Call Trace:
[ 1650.813634] [c00000004eb4b720] [c0000000006f673c]
fill_thread_core_info+0x15c/0x290 (unreliable)
[ 1650.813643] [c00000004eb4b7a0] [c0000000006f6fd4] fill_note_info+0x1f4/0x390
[ 1650.813650] [c00000004eb4b810] [c0000000006f71fc] elf_core_dump+0x8c/0x580
[ 1650.813657] [c00000004eb4ba00] [c0000000006fcc10] do_coredump+0x330/0xca0
[ 1650.813662] [c00000004eb4bbd0] [c000000000174f94] get_signal+0x7f4/0x8f0
[ 1650.813668] [c00000004eb4bcb0] [c000000000020d2c] do_signal+0x7c/0x330
[ 1650.813674] [c00000004eb4bd50] [c000000000022010]
do_notify_resume+0xb0/0x140
[ 1650.813681] [c00000004eb4bd80] [c000000000030550]
interrupt_exit_user_prepare_main+0x1d0/0x290
[ 1650.813687] [c00000004eb4bde0] [c0000000000306f4]
syscall_exit_prepare+0xe4/0x1f0
[ 1650.813693] [c00000004eb4be10] [c00000000000bffc]
system_call_vectored_common+0xfc/0x280
[ 1650.813700] --- interrupt: 3000 at 0x7fff96de315c
[ 1650.813705] NIP: 00007fff96de315c LR: 0000000000000000 CTR:
0000000000000000
[ 1650.813709] REGS: c00000004eb4be80 TRAP: 3000 Tainted: G D
(6.1.0-rc8+)
[ 1650.813713] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 42004404
XER: 00000000
[ 1650.813725] IRQMASK: 0
[ 1650.813725] GPR00: 00000000000000fa 00007fffc54a9b90 00007fff96f87200
0000000000000000
[ 1650.813725] GPR04: 000000000005d703 0000000000000006 0000000000000000
0000000000000000
[ 1650.813725] GPR08: 00007fff96f81f68 0000000000000000 0000000000000000
0000000000000000
[ 1650.813725] GPR12: 0000000000000000 00007fff9709b1c0 0000000000000000
00007fff96f879b0
[ 1650.813725] GPR16: 00007fff970941d0 ffffffffffffffff 0000000010030bec
00000000100152e8
[ 1650.813725] GPR20: 0000000000000000 0000000000000000 00007fffc54bdfee
0000000000000001
[ 1650.813725] GPR24: 0000000010010460 00000000100131a8 8f5c28f5c28f5c29
028f5c28f5c28f5c
[ 1650.813725] GPR28: 0000000000000006 0000000000000005 00007fff97093980
000000000005d703
[ 1650.813778] NIP [00007fff96de315c] 0x7fff96de315c
[ 1650.813782] LR [0000000000000000] 0x0
[ 1650.813785] --- interrupt: 3000
[ 1650.813788] Instruction dump:
[ 1650.813791] 4bfe87d5 60000000 e8010040 38210030 ebe1fff8 7c0803a6 4e800020
7c0802a6
[ 1650.813801] 60000000 60000000 e9232aa0 38600000 <e9290108> 7929e844 79291f43
41820008
[ 1650.813811] ---[ end trace 0000000000000000 ]---
* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-11 13:19 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
--- Comment #7 from Zorro Lang (zlang@redhat.com) ---
(In reply to Michael Ellerman from comment #5)
> Sorry I don't have any idea which commit could have fixed this.
>
> The process that crashed was "fsstress", do you know if it uses io_uring?
Yes, fsstress has io_uring read/write operations, and the attached kernel
.config file has CONFIG_IO_URING=y.
* Re: [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: Nicholas Piggin @ 2022-12-12 3:52 UTC (permalink / raw)
To: bugzilla-daemon, linuxppc-dev
On Sun Dec 11, 2022 at 11:19 PM AEST, wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>
> --- Comment #7 from Zorro Lang (zlang@redhat.com) ---
> (In reply to Michael Ellerman from comment #5)
> > Sorry I don't have any idea which commit could have fixed this.
> >
> > The process that crashed was "fsstress", do you know if it uses io_uring?
>
> Yes, fsstress has io_uring read/write operations. And from the kernel .config
> file(as attachment), the CONFIG_IO_URING=y
The task being dumped seems like it's lost its task->thread.regs. The
NULL pointer is here:
int tm_cgpr_active(struct task_struct *target, const struct user_regset *regset)
{
	if (!cpu_has_feature(CPU_FTR_TM))
		return -ENODEV;

	if (!MSR_TM_ACTIVE(target->thread.regs->msr))
		return 0;

	return regset->n;
}
The fault is on that regs->msr dereference; r9 contains the regs pointer.
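As a rough cross-check of the fault address (a sketch, not kernel code): if
the 64-bit powerpc pt_regs starts with 32 GPRs followed by nip and msr, then
msr sits at offset 0x108, which matches the DAR in the oops. The marked word
in the instruction dump, e9290108, then decodes as a DS-form load through r9
with displacement 0x108. The struct and helper names below are made up for
illustration; only the field order is assumed from the kernel layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of the 64-bit powerpc pt_regs:
 * 32 general purpose registers, then nip, then msr (assumed field order). */
struct pt_regs_sketch {
	uint64_t gpr[32];
	uint64_t nip;
	uint64_t msr;
};

/* With regs == NULL, reading regs->msr touches address 0x108. */
static_assert(offsetof(struct pt_regs_sketch, msr) == 0x108,
	      "msr is expected 0x108 bytes into pt_regs");

/* Fields of a Power ISA DS-form instruction (ld, ldu, lwa, ...). */
struct ds_form {
	unsigned int opcode;	/* primary opcode, 58 for the ld family */
	unsigned int rt;	/* target register */
	unsigned int ra;	/* base register */
	unsigned int xo;	/* extended opcode, 0 selects ld */
	int32_t disp;		/* signed byte displacement */
};

/* Split a 32-bit instruction word into its DS-form fields. */
static struct ds_form decode_ds(uint32_t insn)
{
	struct ds_form d;

	d.opcode = insn >> 26;
	d.rt = (insn >> 21) & 0x1f;
	d.ra = (insn >> 16) & 0x1f;
	d.xo = insn & 0x3;
	/* The low two bits belong to xo; the rest is the displacement. */
	d.disp = (int16_t)(insn & 0xfffc);
	return d;
}
```

Feeding 0xe9290108 to decode_ds() gives opcode 58, rt 9, ra 9, xo 0 and a
displacement of 0x108, i.e. "ld r9,0x108(r9)" with r9 == 0 at the time of the
fault.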
I think the "Kernel attempted to read user page - exploit attempt?" message
is a red herring; it comes up because of the NULL deref (I thought we had
fixed that).
Anyway I'm not sure how we could lose regs, all user threads should
have them set to non-NULL. It doesn't look like we can collect threads
for dumping before we have called copy_thread(), which is where they
get thread.regs set. AFAIK it's not supposed to change after that.
Would you be able to try this patch? Hopefully it catches the problem
thread on the exit side and gives a clue as to why regs is NULL.
Thanks,
Nick
---
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6a11025e5850..ece63b3d2304 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1898,9 +1898,21 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
 	/*
 	 * Now fill in each thread's information.
 	 */
-	for (t = info->thread; t != NULL; t = t->next)
+	for (t = info->thread; t != NULL; t = t->next) {
+		if (!t->task) {
+			WARN_ON(1);
+			printk("core info lost task\n");
+			continue;
+		}
+		if (!t->task->thread.regs) {
+			WARN_ON(1);
+			printk("lost regs pid:%d (current->pid:%d)\n", t->task->pid, current->pid);
+			continue;
+		}
+
 		if (!fill_thread_core_info(t, view, cprm->siginfo->si_signo, info))
 			return 0;
+	}
 
 	/*
 	 * Fill in the two process-wide notes.
diff --git a/kernel/exit.c b/kernel/exit.c
index 35e0a31a0315..6820fe333081 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -366,6 +366,8 @@ static void coredump_task_exit(struct task_struct *tsk)
 	if (core_state) {
 		struct core_thread self;
 
+		WARN_ON(!current->thread.regs);
+
 		self.task = current;
 		if (self.task->flags & PF_SIGNALED)
 			self.next = xchg(&core_state->dumper.next, &self);
* Re: [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: Christophe Leroy @ 2022-12-12 7:30 UTC (permalink / raw)
To: Nicholas Piggin, bugzilla-daemon@kernel.org,
linuxppc-dev@lists.ozlabs.org
Le 12/12/2022 à 04:52, Nicholas Piggin a écrit :
> On Sun Dec 11, 2022 at 11:19 PM AEST, wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>>
>> --- Comment #7 from Zorro Lang (zlang@redhat.com) ---
>> (In reply to Michael Ellerman from comment #5)
>>> Sorry I don't have any idea which commit could have fixed this.
>>>
>>> The process that crashed was "fsstress", do you know if it uses io_uring?
>>
>> Yes, fsstress has io_uring read/write operations. And from the kernel .config
>> file(as attachment), the CONFIG_IO_URING=y
>
> The task being dumped seems like it's lost its task->thread.regs. The
> NULL pointer is here:
>
> int tm_cgpr_active(struct task_struct *target, const struct user_regset *regset)
> {
> if (!cpu_has_feature(CPU_FTR_TM))
> return -ENODEV;
>
> if (!MSR_TM_ACTIVE(target->thread.regs->msr))
> return 0;
>
> return regset->n;
> }
>
> On that regs->msr deref. r9 contains the regs pointer.
>
> The kernel attempt to read user page - exploit attempt? message is
> I think a red herring it's coming up because of the NULL deref I
> think (I thought we fixed that).
>
No, we didn't fix that. My patch was rejected, see:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/8b865b93d25c15c8e6d41e71c368bfc28da4489d.1606816701.git.christophe.leroy@csgroup.eu/
The reason for the rejection was:
The first page can be mapped if mmap_min_addr is 0.
Blocking all faults to the first page would potentially break any
program that does that.
Also, if there is something mapped at 0, there's a good chance it is an
exploit attempt :)
Christophe
* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-12 5:57 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
--- Comment #9 from Michael Ellerman (michael@ellerman.id.au) ---
I assume it's an io_uring IO worker.
They're created via create_io_worker() -> create_io_thread().
They pass a non-NULL `args->fn` to copy_process() -> copy_thread(), so we end
up in the "kernel thread" branch of the if, which sets p->thread.regs = NULL.
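The branch described above can be modelled in userspace roughly as follows.
This is a simplified sketch with hypothetical *_sketch types, not the real
copy_thread(); it only shows how a non-NULL fn leaves thread.regs == NULL
while an ordinary user thread gets a register frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct pt_regs_sketch {
	unsigned long msr;
};

struct thread_sketch {
	struct pt_regs_sketch *regs;
};

struct task_sketch {
	struct thread_sketch thread;
	struct pt_regs_sketch user_frame;	/* stands in for the user register frame */
};

struct clone_args_sketch {
	int (*fn)(void *);	/* non-NULL for kernel threads and IO workers */
	void *fn_arg;
};

/* Model of the decision: a thread that starts in a kernel function gets no
 * user register frame, every other thread does. */
static void copy_thread_sketch(struct task_sketch *p,
			       const struct clone_args_sketch *args)
{
	assert(p != NULL && args != NULL);

	if (args->fn) {
		/* "kernel thread" branch: no user registers to point at */
		p->thread.regs = NULL;
	} else {
		/* user thread: regs points at the saved user frame */
		p->thread.regs = &p->user_frame;
	}
}

/* Example entry point for a worker; the body is irrelevant here. */
static int worker_fn_sketch(void *arg)
{
	(void)arg;
	return 0;
}
```

A task set up with fn = worker_fn_sketch ends with thread.regs == NULL, which
is the state tm_cgpr_active() later trips over when the core dump walks every
thread of the process.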
* Re: [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: Nicholas Piggin @ 2022-12-12 7:19 UTC (permalink / raw)
To: bugzilla-daemon, linuxppc-dev; +Cc: Eric Biederman
On Mon Dec 12, 2022 at 3:57 PM AEST, wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>
> --- Comment #9 from Michael Ellerman (michael@ellerman.id.au) ---
> I assume it's an io_uring IO worker.
>
> They're created via create_io_worker() -> create_io_thread().
>
> They pass a non-NULL `args->fn` to copy_process() -> copy_thread(), so we end
> up in the "kernel thread" branch of the if, which sets p->thread.regs = NULL.
Hmm, you might be right. These things are created with the memory and
thread / signal context shared with the userspace process.
Still doesn't seem like they should be involved in core dumping though,
pt_regs would have no meaning even if we did set something there. How
best to catch these and filter them out of the core dump? Check for
PF_IO_WORKER in the coredump gathering?
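One way to picture the filtering asked about above is a walk over the
collected thread list that skips IO workers. The types and the helper are
hypothetical userspace stand-ins, and the flag value is assumed to mirror the
kernel's PF_IO_WORKER definition. This is only an illustration of the
question, not the fix that was eventually merged.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the kernel's PF_IO_WORKER bit; treat as illustrative. */
#define PF_IO_WORKER_SKETCH 0x00000010u

/* Hypothetical reduced task: just what the filter needs. */
struct task_sketch {
	int pid;
	unsigned int flags;
	void *regs;		/* NULL for threads without user registers */
};

/* Hypothetical node of the list of threads gathered for the dump. */
struct core_thread_sketch {
	struct task_sketch *task;
	struct core_thread_sketch *next;
};

/* Count the threads that would get register notes, skipping IO workers. */
static size_t count_dumpable_threads(const struct core_thread_sketch *head)
{
	const struct core_thread_sketch *t;
	size_t n = 0;

	for (t = head; t != NULL; t = t->next) {
		if (t->task == NULL)
			continue;
		if (t->task->flags & PF_IO_WORKER_SKETCH)
			continue;	/* no meaningful pt_regs to dump */
		/* Everything left is expected to be a real user thread. */
		assert(t->task->regs != NULL);
		n++;
	}
	return n;
}
```

In this model the filter keys on the task flag rather than on regs being
NULL, so a user thread that somehow lost its regs would still trip the
assertion instead of being silently skipped.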
Thanks,
Nick
* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2024-11-14 3:21 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=214913
Michael Ellerman (michael@ellerman.id.au) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |CLOSED
         Resolution|---                         |CODE_FIX
--- Comment #12 from Michael Ellerman (michael@ellerman.id.au) ---
I believe this was fixed by the series merged as:
https://git.kernel.org/powerpc/c/89fb39134ae3b1e1f207af44a037721d92b32f70
That series was merged into v6.4.