* [Syzkaller & bisect] There is BUG: unable to handle kernel NULL pointer dereference in xfs_filestream_select_ag in v6.3-rc3
@ 2023-03-20 6:50 Pengfei Xu
2023-03-21 20:46 ` Darrick J. Wong
0 siblings, 1 reply; 3+ messages in thread
From: Pengfei Xu @ 2023-03-20 6:50 UTC (permalink / raw)
To: dchinner; +Cc: bfoster, djwong, heng.su, linux-xfs, lkp
Hi Dave Chinner and XFS experts,
Greetings!
There is a "BUG: unable to handle kernel NULL pointer dereference" in
xfs_filestream_select_ag in v6.3-rc3:
All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230319_210525_xfs_filestream_select_ag
Reproducer code: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.c
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/kconfig_origin
v6.3-rc3 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/v6.3-rc3_issue_dmesg.log
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/bisect_info.log
Bisected between v6.3-rc2 and v5.11 and found the bad commit:
"
8ac5b996bf5199f15b7687ceae989f8b2a410dda
xfs: fix off-by-one-block in xfs_discard_folio()
"
After reverting the commit on top of the v6.3-rc2 kernel, the BUG no longer appeared in dmesg.
The issue can also be reproduced on the v6.3-rc3 kernel.
Is it possible that the above commit introduced a new issue?
"
[ 62.318653] loop0: detected capacity change from 0 to 65536
[ 62.320459] XFS (loop0): Mounting V5 Filesystem d6f69dbd-8c5d-46be-b88e-92c0ae88ceb2
[ 62.325152] XFS (loop0): Ending clean mount
[ 62.326049] XFS (loop0): Quotacheck needed: Please wait.
[ 62.328884] XFS (loop0): Quotacheck: Done.
[ 62.363656] XFS (loop0): Metadata CRC error detected at xfs_agf_read_verify+0x10e/0x140, xfs_agf block 0x8001
[ 62.364489] XFS (loop0): Unmount and run xfs_repair
[ 62.364881] XFS (loop0): First 128 bytes of corrupted metadata buffer:
[ 62.365398] 00000000: 58 41 47 46 00 00 00 01 00 00 00 01 00 00 40 00 XAGF..........@.
[ 62.366026] 00000010: 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 01 ................
[ 62.366657] 00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04 ................
[ 62.367285] 00000030: 00 00 00 04 00 00 3b 5f 00 00 3b 5c 00 00 00 00 ......;_..;\....
[ 62.367927] 00000040: d6 f6 9d bd 8c 5d 46 be b8 8e 92 c0 ae 88 ce b2 .....]F.........
[ 62.368554] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 62.369180] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 62.369806] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 62.370471] XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x200" at daddr 0x8001 len 1 error 74
[ 62.371312] XFS (loop0): page discard on page 00000000a6a1237b, inode 0x46, pos 0.
[ 62.385968] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 62.386541] #PF: supervisor write access in kernel mode
[ 62.386960] #PF: error_code(0x0002) - not-present page
[ 62.387370] PGD 0 P4D 0
[ 62.387588] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 62.387945] CPU: 1 PID: 74 Comm: kworker/u4:3 Not tainted 6.3.0-rc3-kvm-e8d018dd #1
[ 62.388545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 62.389426] Workqueue: writeback wb_workfn (flush-7:0)
[ 62.389845] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0
[ 62.390285] Code: 83 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 20 94 03 00 48 89 c3 48 85 c0 0f 84 57 04 00 00 e8 2f 30 83 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45
[ 62.391712] RSP: 0018:ffffc9000092f4c0 EFLAGS: 00010246
[ 62.392128] RAX: 0000000000000000 RBX: ffff88800b858940 RCX: 0000000000006cc0
[ 62.392688] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002
[ 62.393246] RBP: ffffc9000092f548 R08: ffffc9000092f400 R09: 0000000000000000
[ 62.393805] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 62.394363] R13: ffffc9000092f588 R14: 0000000000000001 R15: ffffc9000092f708
[ 62.394924] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 62.395553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.396008] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0
[ 62.396569] PKRU: 55555554
[ 62.396793] Call Trace:
[ 62.396996] <TASK>
[ 62.397179] xfs_bmap_btalloc+0x706/0xb90
[ 62.397512] xfs_bmapi_allocate+0x25b/0x5e0
[ 62.397850] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 62.398239] xfs_bmapi_convert_delalloc+0x335/0x6c0
[ 62.398649] xfs_map_blocks+0x2ff/0x740
[ 62.398971] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 62.399362] iomap_do_writepage+0x43f/0xf10
[ 62.399709] write_cache_pages+0x2b8/0x7e0
[ 62.400047] ? __pfx_iomap_do_writepage+0x10/0x10
[ 62.400438] iomap_writepages+0x3e/0x80
[ 62.400757] xfs_vm_writepages+0x97/0xe0
[ 62.401088] ? __pfx_xfs_vm_writepages+0x10/0x10
[ 62.401470] do_writepages+0x10f/0x240
[ 62.401783] ? write_comp_data+0x2f/0x90
[ 62.402112] __writeback_single_inode+0x9f/0x780
[ 62.402492] ? write_comp_data+0x2f/0x90
[ 62.402823] writeback_sb_inodes+0x301/0x800
[ 62.403184] wb_writeback+0x18b/0x580
[ 62.403495] wb_workfn+0xca/0x880
[ 62.403778] ? __this_cpu_preempt_check+0x20/0x30
[ 62.404171] ? lock_acquire+0xe6/0x2b0
[ 62.404484] ? __this_cpu_preempt_check+0x20/0x30
[ 62.404872] ? write_comp_data+0x2f/0x90
[ 62.405202] process_one_work+0x3b1/0x860
[ 62.405538] worker_thread+0x52/0x660
[ 62.405846] ? __pfx_worker_thread+0x10/0x10
[ 62.406202] kthread+0x161/0x1a0
[ 62.406475] ? __pfx_kthread+0x10/0x10
[ 62.406787] ret_from_fork+0x29/0x50
[ 62.407094] </TASK>
[ 62.407281] Modules linked in:
[ 62.407535] CR2: 0000000000000010
[ 62.407808] ---[ end trace 0000000000000000 ]---
[ 62.408178] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0
[ 62.408619] Code: 83 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 20 94 03 00 48 89 c3 48 85 c0 0f 84 57 04 00 00 e8 2f 30 83 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45
[ 62.410052] RSP: 0018:ffffc9000092f4c0 EFLAGS: 00010246
[ 62.410469] RAX: 0000000000000000 RBX: ffff88800b858940 RCX: 0000000000006cc0
[ 62.411032] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002
[ 62.411594] RBP: ffffc9000092f548 R08: ffffc9000092f400 R09: 0000000000000000
[ 62.412155] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 62.412716] R13: ffffc9000092f588 R14: 0000000000000001 R15: ffffc9000092f708
[ 62.413278] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 62.413909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.414368] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0
[ 62.414934] PKRU: 55555554
[ 62.415159] note: kworker/u4:3[74] exited with irqs disabled
[ 62.415642] ------------[ cut here ]------------
[ 62.416012] WARNING: CPU: 1 PID: 74 at kernel/exit.c:814 do_exit+0xe8a/0x12b0
[ 62.416580] Modules linked in:
[ 62.416833] CPU: 1 PID: 74 Comm: kworker/u4:3 Tainted: G D 6.3.0-rc3-kvm-e8d018dd #1
[ 62.417546] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 62.418432] Workqueue: writeback wb_workfn (flush-7:0)
[ 62.418861] RIP: 0010:do_exit+0xe8a/0x12b0
[ 62.419197] Code: 00 65 01 05 b4 ba f0 7e e9 f4 fd ff ff e8 be 1e 1b 00 48 8b bb 98 09 00 00 31 f6 e8 30 b0 ff ff e9 74 fb ff ff e8 a6 1e 1b 00 <0f> 0b e9 3e f2 ff ff e8 9a 1e 1b 00 4c 89 ee bf 05 06 00 00 e8 bd
[ 62.420652] RSP: 0018:ffffc9000092feb0 EFLAGS: 00010246
[ 62.421072] RAX: 0000000000000000 RBX: ffff88800a02a340 RCX: 0000000000000001
[ 62.421635] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002
[ 62.422195] RBP: ffffc9000092ff18 R08: 0000000000000000 R09: 0000000000000000
[ 62.422758] R10: 34752f72656b726f R11: 776b203a65746f6e R12: 0000000000000000
[ 62.423323] R13: 0000000000000009 R14: ffff88800a009900 R15: ffff8880093a1180
[ 62.423902] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 62.424539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.425000] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0
[ 62.425568] PKRU: 55555554
[ 62.425794] Call Trace:
[ 62.426000] <TASK>
[ 62.426183] ? write_comp_data+0x2f/0x90
[ 62.426513] make_task_dead+0x100/0x290
[ 62.426832] rewind_stack_and_make_dead+0x17/0x20
[ 62.427227] </TASK>
[ 62.427414] irq event stamp: 122544
[ 62.427715] hardirqs last enabled at (122543): [<ffffffff821395dd>] get_random_u32+0x1dd/0x360
[ 62.428409] hardirqs last disabled at (122544): [<ffffffff82f8d76e>] exc_page_fault+0x4e/0x3b0
[ 62.429094] softirqs last enabled at (114870): [<ffffffff82fb01a9>] __do_softirq+0x2d9/0x3c3
[ 62.429771] softirqs last disabled at (114849): [<ffffffff81126724>] irq_exit_rcu+0xc4/0x100
[ 62.430443] ---[ end trace 0000000000000000 ]---
"
I hope it's helpful.
Thanks!
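For readers triaging the log above, the first row of the "corrupted metadata buffer" dump can be decoded by hand. The following is only a minimal sketch: it assumes the AGF sector starts with four big-endian 32-bit fields (magic, version, sequence number, AG length in blocks), and it treats the 16 bytes at offset 0x40 as a UUID because they match the filesystem UUID printed on the mount line. The function name and dictionary keys are made up for illustration.

```python
import struct
import uuid

# Rows 0x00 and 0x40 of the hexdump printed by the AGF verifier in the log above.
ROW_00 = "58 41 47 46 00 00 00 01 00 00 00 01 00 00 40 00"
ROW_40 = "d6 f6 9d bd 8c 5d 46 be b8 8e 92 c0 ae 88 ce b2"


def decode_agf_prefix(row_hex):
    """Decode the first 16 bytes as a 4-byte magic plus three big-endian u32 values.

    The field names are assumptions for illustration, not taken from the report.
    """
    raw = bytes.fromhex(row_hex)
    magic, version, seqno, length = struct.unpack(">4sIII", raw)
    return {
        "magic": magic.decode("ascii"),
        "version": version,
        "seqno": seqno,
        "length_blocks": length,
    }


header = decode_agf_prefix(ROW_00)
agf_uuid = uuid.UUID(bytes=bytes.fromhex(ROW_40))

print(header)
print(agf_uuid)
```

The leading fields parse cleanly and the UUID bytes agree with the mount message. That only shows that the start of the buffer is well formed; it does not say which bytes the fuzzer changed to make the CRC check fail.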
---
If you don't need the following environment to reproduce the problem or if you
already have one, please ignore the following information.
How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // requires qemu-system-x86_64; I used v7.1.0
// start3.sh loads the bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 (v6.2-rc5) kernel
// You can change bzImage_xxx to whichever image you want
Log in with the command below; there is no password for root.
ssh -p 10023 root@localhost
After logging in to the VM (virtual machine), you can build the reproducer,
transfer the binary to the VM as follows, and reproduce the problem there:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
Get the bzImage for the target kernel:
Use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage // x should be less than or equal to the number of CPUs on your machine
Put the resulting bzImage file name into start3.sh above to load the target kernel in the VM.
Tips:
If you already have qemu-system-x86_64, please ignore the information below.
To install QEMU v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install
Thanks!
BR.
* Re: [Syzkaller & bisect] There is BUG: unable to handle kernel NULL pointer dereference in xfs_filestream_select_ag in v6.3-rc3
From: Darrick J. Wong @ 2023-03-21 20:46 UTC (permalink / raw)
To: Pengfei Xu; +Cc: dchinner, bfoster, heng.su, linux-xfs, lkp
On Mon, Mar 20, 2023 at 02:50:07PM +0800, Pengfei Xu wrote:
> Hi Dave Chinner and xfs experts,
>
> Greeting!
>
> There is BUG: unable to handle kernel NULL pointer dereference in
> xfs_filestream_select_ag in v6.3-rc3:
>
> All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230319_210525_xfs_filestream_select_ag
> Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.c
How the hell am I supposed to extract the fuzzed disk image for
analysis?
Current Google syzbot provides a lot more information for analysis. Why
don't you go triage some of their reports instead of spraying more crap
at the XFS list?
> Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/kconfig_origin
> v6.3-rc3 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/v6.3-rc3_issue_dmesg.log
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/bisect_info.log
>
> Bisected between v6.3-rc2 and v5.11 and found the bad commit:
> "
> 8ac5b996bf5199f15b7687ceae989f8b2a410dda
> xfs: fix off-by-one-block in xfs_discard_folio()
How does *fixing* an off by one error in the page cache produce a crash
in the filestreams allocator?
> Reverted the commit on top of v6.3-rc2 kernel, at least the BUG dmesg was gone.
>
> And this issue could be reproduced in v6.3-rc3 kernel also.
> Is it possible that the above commit involves a new issue?
>
> "
> [ 62.318653] loop0: detected capacity change from 0 to 65536
> [ 62.320459] XFS (loop0): Mounting V5 Filesystem d6f69dbd-8c5d-46be-b88e-92c0ae88ceb2
> [ 62.325152] XFS (loop0): Ending clean mount
> [ 62.326049] XFS (loop0): Quotacheck needed: Please wait.
> [ 62.328884] XFS (loop0): Quotacheck: Done.
> [ 62.363656] XFS (loop0): Metadata CRC error detected at xfs_agf_read_verify+0x10e/0x140, xfs_agf block 0x8001
> [ 62.364489] XFS (loop0): Unmount and run xfs_repair
> [ 62.364881] XFS (loop0): First 128 bytes of corrupted metadata buffer:
> [ 62.365398] 00000000: 58 41 47 46 00 00 00 01 00 00 00 01 00 00 40 00 XAGF..........@.
> [ 62.366026] 00000010: 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 01 ................
> [ 62.366657] 00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04 ................
> [ 62.367285] 00000030: 00 00 00 04 00 00 3b 5f 00 00 3b 5c 00 00 00 00 ......;_..;\....
> [ 62.367927] 00000040: d6 f6 9d bd 8c 5d 46 be b8 8e 92 c0 ae 88 ce b2 .....]F.........
> [ 62.368554] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 62.369180] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 62.369806] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 62.370471] XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x200" at daddr 0x8001 len 1 error 74
> [ 62.371312] XFS (loop0): page discard on page 00000000a6a1237b, inode 0x46, pos 0.
> [ 62.385968] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [ 62.386541] #PF: supervisor write access in kernel mode
> [ 62.386960] #PF: error_code(0x0002) - not-present page
> [ 62.387370] PGD 0 P4D 0
> [ 62.387588] Oops: 0002 [#1] PREEMPT SMP NOPTI
> [ 62.387945] CPU: 1 PID: 74 Comm: kworker/u4:3 Not tainted 6.3.0-rc3-kvm-e8d018dd #1
> [ 62.388545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [ 62.389426] Workqueue: writeback wb_workfn (flush-7:0)
> [ 62.389845] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0
What source line and/or instruction does %rip point to?
Considering that this is a null pointer dereference, you ought to be able
to identify which pointer access did this.
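As a rough illustration of how far the oops text alone can take this, the sketch below works from the "Code:" line, RAX and CR2 of the original v6.3-rc3 report. It is not a disassembler: the helper names are hypothetical, and the decoder recognizes exactly one encoding, `f0 ff 40 <disp8>` (a locked 32-bit increment at disp8 off %rax). For real triage the in-tree scripts/decodecode and scripts/faddr2line are the proper tools.

```python
# "Code:" bytes from the v6.3-rc3 oops; <..> marks the byte that RIP points at.
CODE_LINE = (
    "83 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 20 94 03 00 "
    "48 89 c3 48 85 c0 0f 84 57 04 00 00 e8 2f 30 83 ff 49 8b 45 18 "
    "<f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45"
)
RAX = 0x0000000000000000  # from the register dump
CR2 = 0x0000000000000010  # faulting address reported by the oops


def split_code_line(line):
    """Return (offset of the <..> marked byte, all dumped bytes)."""
    tokens = line.split()
    marked = [i for i, tok in enumerate(tokens) if tok.startswith("<")]
    if len(marked) != 1:
        raise ValueError("expected exactly one <..> marked byte")
    data = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return marked[0], data


def decode_lock_inc_rax_disp8(insn):
    """Handle only 'lock incl disp8(%rax)'; return the signed disp8, else None."""
    if len(insn) < 4 or insn[:3] != bytes([0xF0, 0xFF, 0x40]):
        return None
    disp = insn[3]
    return disp - 0x100 if disp >= 0x80 else disp


offset, data = split_code_line(CODE_LINE)
disp = decode_lock_inc_rax_disp8(data[offset:])
print(hex(offset), disp, hex(RAX + disp))
```

With RAX equal to zero, the computed address RAX + 0x10 equals CR2. That is consistent with an atomic increment of a counter at offset 0x10 of a structure reached through a NULL pointer. Which structure and field that is cannot be read from the bytes alone.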
If you are going to run some scripted tool to randomly corrupt the
filesystem to find failures, then you have an ethical and moral
responsibility to do some of the work to narrow down and identify the
cause of the failure, not just throw them at someone to do all the work.
--D
* Re: [Syzkaller & bisect] There is BUG: unable to handle kernel NULL pointer dereference in xfs_filestream_select_ag in v6.3-rc3
From: Pengfei Xu @ 2023-03-22 3:20 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: dchinner, bfoster, heng.su, linux-xfs, lkp
Hi Darrick J. Wong,
On 2023-03-21 at 13:46:38 -0700, Darrick J. Wong wrote:
> On Mon, Mar 20, 2023 at 02:50:07PM +0800, Pengfei Xu wrote:
> > Hi Dave Chinner and xfs experts,
> >
> > Greeting!
> >
> > There is BUG: unable to handle kernel NULL pointer dereference in
> > xfs_filestream_select_ag in v6.3-rc3:
> >
> > All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230319_210525_xfs_filestream_select_ag
> > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.c
>
> How the hell am I supposed to extract the fuzzed disk image for
> analysis?
>
> Current Google syzbot provides a lot more information for analysis. Why
> don't you go triage some of their reports instead of spraying more crap
> at the XFS list?
>
Ah, thanks a lot for your suggestion!
Next time I will include the additional syzkaller analysis below in all problem
reports.
Here is the additional information:
More detailed analysis from syzkaller, report0: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/report0
repro.stats: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.stats
VM machine info: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/machineInfo0
I have newly added repro.report: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.report
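The symbolized report quoted next prints several "RIP:" lines for a single faulting address, one per level of inlining, innermost first. A small sketch of how that chain could be pulled apart to find the first frame inside fs/xfs is shown here. The regular expression and function name are illustrative assumptions about this particular report layout, and the line numbers refer to the 6.3.0-rc2-intel-next build named in the report, not necessarily to v6.3-rc3.

```python
import re

# The four symbolized RIP lines from repro.report, innermost inline frame first.
RIP_LINES = """\
RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline]
RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline]
RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline]
RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372
"""

FRAME_RE = re.compile(
    r"^RIP: [0-9a-f]{4}:(?P<func>\w+)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)? "
    r"(?P<file>\S+):(?P<line>\d+)(?P<inline> \[inline\])?$"
)


def parse_rip_frames(text):
    """Return a list of (function, file, line, is_inline) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(
                (m["func"], m["file"], int(m["line"]), m["inline"] is not None)
            )
    return frames


frames = parse_rip_frames(RIP_LINES)
# First frame that lives in XFS code rather than in generic atomic headers.
first_xfs = next(f for f in frames if f[1].startswith("fs/xfs/"))
print(first_xfs)
```

Read this way, the faulting atomic_inc is the one inlined from xfs_filestream_create_association at fs/xfs/xfs_filestream.c:321, reached from xfs_filestream_select_ag at line 372 of the same file.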
"
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x2c0" at daddr 0x8001 len 1 error 74
XFS (loop0): page discard on page 00000000b8174cbd, inode 0x46, pos 0.
BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.3.0-rc2-intel-next-38f821ff82e9+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline]
RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline]
RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline]
RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372
Code: 80 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 80 f9 03 00 48 89 c3 48 85 c0 0f 84 3a 05 00 00 e8 9f 8a 80 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45
RSP: 0018:ffffc900001274c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800dbeae40 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002
RBP: ffffc90000127548 R08: ffffc90000127400 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc90000127588 R14: 0000000000000001 R15: ffffc90000127708
FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0
PKRU: 55555554
Call Trace:
<TASK>
xfs_bmap_btalloc_filestreams fs/xfs/libxfs/xfs_bmap.c:3558 [inline]
xfs_bmap_btalloc+0x706/0xb90 fs/xfs/libxfs/xfs_bmap.c:3672
xfs_bmap_alloc_userdata fs/xfs/libxfs/xfs_bmap.c:4046 [inline]
xfs_bmapi_allocate+0x25b/0x5e0 fs/xfs/libxfs/xfs_bmap.c:4089
xfs_bmapi_convert_delalloc+0x335/0x6c0 fs/xfs/libxfs/xfs_bmap.c:4554
xfs_convert_blocks fs/xfs/xfs_aops.c:266 [inline]
xfs_map_blocks+0x2ff/0x8a0 fs/xfs/xfs_aops.c:389
iomap_writepage_map fs/iomap/buffered-io.c:1641 [inline]
iomap_do_writepage+0x43f/0x1070 fs/iomap/buffered-io.c:1803
write_cache_pages+0x2b8/0x8a0 mm/page-writeback.c:2473
iomap_writepages+0x3e/0x80 fs/iomap/buffered-io.c:1820
xfs_vm_writepages+0x97/0xe0 fs/xfs/xfs_aops.c:513
do_writepages+0x10f/0x240 mm/page-writeback.c:2551
__writeback_single_inode+0x9f/0xb20 fs/fs-writeback.c:1600
writeback_sb_inodes+0x301/0x8b0 fs/fs-writeback.c:1891
wb_writeback+0x18b/0x7c0 fs/fs-writeback.c:2065
wb_do_writeback fs/fs-writeback.c:2208 [inline]
wb_workfn+0xc0/0xad0 fs/fs-writeback.c:2248
process_one_work+0x3b1/0x9e0 kernel/workqueue.c:2390
worker_thread+0x52/0x660 kernel/workqueue.c:2537
kthread+0x161/0x1a0 kernel/kthread.c:376
ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
</TASK>
Modules linked in:
CR2: 0000000000000010
---[ end trace 0000000000000000 ]---
RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline]
RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline]
RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline]
RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372
Code: 80 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 80 f9 03 00 48 89 c3 48 85 c0 0f 84 3a 05 00 00 e8 9f 8a 80 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45
RSP: 0018:ffffc900001274c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800dbeae40 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002
RBP: ffffc90000127548 R08: ffffc90000127400 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc90000127588 R14: 0000000000000001 R15: ffffc90000127708
FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0
PKRU: 55555554
note: kworker/u4:2[34] exited with irqs disabled
------------[ cut here ]------------
WARNING: CPU: 1 PID: 34 at kernel/exit.c:814 do_exit+0xf68/0x1360 kernel/exit.c:814
Modules linked in:
CPU: 1 PID: 34 Comm: kworker/u4:2 Tainted: G D 6.3.0-rc2-intel-next-38f821ff82e9+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:do_exit+0xf68/0x1360 kernel/exit.c:814
Code: ff ff e8 2b 7e 1b 00 4c 89 ee bf 05 06 00 00 e8 7e c1 01 00 e9 a7 f2 ff ff e8 14 7e 1b 00 0f 0b e9 f8 f0 ff ff e8 08 7e 1b 00 <0f> 0b e9 60 f1 ff ff e8 fc 7d 1b 00 48 89 df e8 54 ff 1a 00 e9 ec
RSP: 0018:ffffc90000127eb0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800791a340 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002
RBP: ffffc90000127f18 R08: 0000000000000000 R09: 0000000000000000
R10: 34752f72656b726f R11: 776b203a65746f6e R12: 0000000000000000
R13: 0000000000000009 R14: ffff8880079292c0 R15: ffff888007924600
FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0
PKRU: 55555554
Call Trace:
<TASK>
make_task_dead+0x100/0x290 kernel/exit.c:981
rewind_stack_and_make_dead+0x17/0x20 arch/x86/entry/entry_64.S:1541
</TASK>
irq event stamp: 46556
hardirqs last enabled at (46555): [<ffffffff8218402d>] get_random_u32+0x1dd/0x360 drivers/char/random.c:532
hardirqs last disabled at (46556): [<ffffffff8300582e>] exc_page_fault+0x4e/0x500 arch/x86/mm/fault.c:1551
softirqs last enabled at (37844): [<ffffffff83029bdc>] softirq_handle_end kernel/softirq.c:414 [inline]
softirqs last enabled at (37844): [<ffffffff83029bdc>] __do_softirq+0x31c/0x49c kernel/softirq.c:600
softirqs last disabled at (37835): [<ffffffff8112e774>] invoke_softirq kernel/softirq.c:445 [inline]
softirqs last disabled at (37835): [<ffffffff8112e774>] __irq_exit_rcu kernel/softirq.c:650 [inline]
softirqs last disabled at (37835): [<ffffffff8112e774>] irq_exit_rcu+0xc4/0x100 kernel/softirq.c:662
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
0: 80 ff 49 cmp $0x49,%bh
3: 89 5d 18 mov %ebx,0x18(%rbp)
6: be 08 00 00 00 mov $0x8,%esi
b: bf 20 00 00 00 mov $0x20,%edi
10: e8 80 f9 03 00 call 0x3f995
15: 48 89 c3 mov %rax,%rbx
18: 48 85 c0 test %rax,%rax
1b: 0f 84 3a 05 00 00 je 0x55b
21: e8 9f 8a 80 ff call 0xff808ac5
26: 49 8b 45 18 mov 0x18(%r13),%rax
* 2a: f0 ff 40 10 lock incl 0x10(%rax) <-- trapping instruction
2e: 49 8b 45 18 mov 0x18(%r13),%rax
32: 48 8b 75 b8 mov -0x48(%rbp),%rsi
36: 48 89 da mov %rbx,%rdx
39: 48 89 43 18 mov %rax,0x18(%rbx)
3d: 48 rex.W
3e: 8b .byte 0x8b
3f: 45 rex.RB
"
> > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/kconfig_origin
> > v6.3-rc3 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/v6.3-rc3_issue_dmesg.log
> > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/bisect_info.log
> >
> > Bisected between v6.3-rc2 and v5.11 and found the bad commit:
> > "
> > 8ac5b996bf5199f15b7687ceae989f8b2a410dda
> > xfs: fix off-by-one-block in xfs_discard_folio()
>
> How does *fixing* an off by one error in the page cache produce a crash
> in the filestreams allocator?
>
I'm also surprised that there is such a problem. I'm not sure of the reason,
as I know little about XFS.
> > Reverted the commit on top of v6.3-rc2 kernel, at least the BUG dmesg was gone.
> >
> > And this issue could be reproduced in v6.3-rc3 kernel also.
> > Is it possible that the above commit involves a new issue?
> >
> > "
> > [ 62.318653] loop0: detected capacity change from 0 to 65536
> > [ 62.320459] XFS (loop0): Mounting V5 Filesystem d6f69dbd-8c5d-46be-b88e-92c0ae88ceb2
> > [ 62.325152] XFS (loop0): Ending clean mount
> > [ 62.326049] XFS (loop0): Quotacheck needed: Please wait.
> > [ 62.328884] XFS (loop0): Quotacheck: Done.
> > [ 62.363656] XFS (loop0): Metadata CRC error detected at xfs_agf_read_verify+0x10e/0x140, xfs_agf block 0x8001
> > [ 62.364489] XFS (loop0): Unmount and run xfs_repair
> > [ 62.364881] XFS (loop0): First 128 bytes of corrupted metadata buffer:
> > [ 62.365398] 00000000: 58 41 47 46 00 00 00 01 00 00 00 01 00 00 40 00 XAGF..........@.
> > [ 62.366026] 00000010: 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 01 ................
> > [ 62.366657] 00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04 ................
> > [ 62.367285] 00000030: 00 00 00 04 00 00 3b 5f 00 00 3b 5c 00 00 00 00 ......;_..;\....
> > [ 62.367927] 00000040: d6 f6 9d bd 8c 5d 46 be b8 8e 92 c0 ae 88 ce b2 .....]F.........
> > [ 62.368554] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.369180] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.369806] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.370471] XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x200" at daddr 0x8001 len 1 error 74
> > [ 62.371312] XFS (loop0): page discard on page 00000000a6a1237b, inode 0x46, pos 0.
> > [ 62.385968] BUG: kernel NULL pointer dereference, address: 0000000000000010
> > [ 62.386541] #PF: supervisor write access in kernel mode
> > [ 62.386960] #PF: error_code(0x0002) - not-present page
> > [ 62.387370] PGD 0 P4D 0
> > [ 62.387588] Oops: 0002 [#1] PREEMPT SMP NOPTI
> > [ 62.387945] CPU: 1 PID: 74 Comm: kworker/u4:3 Not tainted 6.3.0-rc3-kvm-e8d018dd #1
> > [ 62.388545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > [ 62.389426] Workqueue: writeback wb_workfn (flush-7:0)
> > [ 62.389845] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0
>
> What source line and/or instruction does %rip point to?
> Considering that this is a null pointer dereference, you ought to be able
> to identify which pointer access did this.
>
> If you are going to run some scripted tool to randomly corrupt the
> filesystem to find failures, then you have an ethical and moral
> responsibility to do some of the work to narrow down and identify the
> cause of the failure, not just throw them at someone to do all the work.
>
You are right, sorry. Next time I will provide the RIP and all other
detailed info I have.
The info below is from the repro.report above:
"
BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.3.0-rc2-intel-next-38f821ff82e9+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline]
RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline]
RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline]
RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372
"
Thanks!
BR.
-Pengfei
> --D
>
end of thread, other threads:[~2023-03-22 3:19 UTC | newest]
Thread overview: 3+ messages
2023-03-20 6:50 [Syzkaller & bisect] There is BUG: unable to handle kernel NULL pointer dereference in xfs_filestream_select_ag in v6.3-rc3 Pengfei Xu
2023-03-21 20:46 ` Darrick J. Wong
2023-03-22 3:20 ` Pengfei Xu