* [Syzkaller & bisect] There is BUG: unable to handle kernel NULL pointer dereference in xfs_filestream_select_ag in v6.3-rc3
From: Pengfei Xu @ 2023-03-20 6:50 UTC
To: dchinner; +Cc: bfoster, djwong, heng.su, linux-xfs, lkp

Hi Dave Chinner and xfs experts,

Greeting!

There is BUG: unable to handle kernel NULL pointer dereference in
xfs_filestream_select_ag in v6.3-rc3:

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230319_210525_xfs_filestream_select_ag
Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.c
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/kconfig_origin
v6.3-rc3 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/v6.3-rc3_issue_dmesg.log
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/bisect_info.log

Bisected between v6.3-rc2 and v5.11 and found the bad commit:
"
8ac5b996bf5199f15b7687ceae989f8b2a410dda
xfs: fix off-by-one-block in xfs_discard_folio()
"
Reverted the commit on top of v6.3-rc2 kernel, at least the BUG dmesg was gone.

And this issue could be reproduced in v6.3-rc3 kernel also.
Is it possible that the above commit involves a new issue?

"
[ 62.318653] loop0: detected capacity change from 0 to 65536
[ 62.320459] XFS (loop0): Mounting V5 Filesystem d6f69dbd-8c5d-46be-b88e-92c0ae88ceb2
[ 62.325152] XFS (loop0): Ending clean mount
[ 62.326049] XFS (loop0): Quotacheck needed: Please wait.
[ 62.328884] XFS (loop0): Quotacheck: Done.
[ 62.363656] XFS (loop0): Metadata CRC error detected at xfs_agf_read_verify+0x10e/0x140, xfs_agf block 0x8001 [ 62.364489] XFS (loop0): Unmount and run xfs_repair [ 62.364881] XFS (loop0): First 128 bytes of corrupted metadata buffer: [ 62.365398] 00000000: 58 41 47 46 00 00 00 01 00 00 00 01 00 00 40 00 XAGF..........@. [ 62.366026] 00000010: 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 01 ................ [ 62.366657] 00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04 ................ [ 62.367285] 00000030: 00 00 00 04 00 00 3b 5f 00 00 3b 5c 00 00 00 00 ......;_..;\.... [ 62.367927] 00000040: d6 f6 9d bd 8c 5d 46 be b8 8e 92 c0 ae 88 ce b2 .....]F......... [ 62.368554] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 62.369180] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 62.369806] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 62.370471] XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x200" at daddr 0x8001 len 1 error 74 [ 62.371312] XFS (loop0): page discard on page 00000000a6a1237b, inode 0x46, pos 0. 
[ 62.385968] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 62.386541] #PF: supervisor write access in kernel mode [ 62.386960] #PF: error_code(0x0002) - not-present page [ 62.387370] PGD 0 P4D 0 [ 62.387588] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 62.387945] CPU: 1 PID: 74 Comm: kworker/u4:3 Not tainted 6.3.0-rc3-kvm-e8d018dd #1 [ 62.388545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 62.389426] Workqueue: writeback wb_workfn (flush-7:0) [ 62.389845] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0 [ 62.390285] Code: 83 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 20 94 03 00 48 89 c3 48 85 c0 0f 84 57 04 00 00 e8 2f 30 83 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45 [ 62.391712] RSP: 0018:ffffc9000092f4c0 EFLAGS: 00010246 [ 62.392128] RAX: 0000000000000000 RBX: ffff88800b858940 RCX: 0000000000006cc0 [ 62.392688] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002 [ 62.393246] RBP: ffffc9000092f548 R08: ffffc9000092f400 R09: 0000000000000000 [ 62.393805] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [ 62.394363] R13: ffffc9000092f588 R14: 0000000000000001 R15: ffffc9000092f708 [ 62.394924] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 [ 62.395553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 62.396008] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0 [ 62.396569] PKRU: 55555554 [ 62.396793] Call Trace: [ 62.396996] <TASK> [ 62.397179] xfs_bmap_btalloc+0x706/0xb90 [ 62.397512] xfs_bmapi_allocate+0x25b/0x5e0 [ 62.397850] ? __sanitizer_cov_trace_pc+0x25/0x60 [ 62.398239] xfs_bmapi_convert_delalloc+0x335/0x6c0 [ 62.398649] xfs_map_blocks+0x2ff/0x740 [ 62.398971] ? __sanitizer_cov_trace_pc+0x25/0x60 [ 62.399362] iomap_do_writepage+0x43f/0xf10 [ 62.399709] write_cache_pages+0x2b8/0x7e0 [ 62.400047] ? 
__pfx_iomap_do_writepage+0x10/0x10 [ 62.400438] iomap_writepages+0x3e/0x80 [ 62.400757] xfs_vm_writepages+0x97/0xe0 [ 62.401088] ? __pfx_xfs_vm_writepages+0x10/0x10 [ 62.401470] do_writepages+0x10f/0x240 [ 62.401783] ? write_comp_data+0x2f/0x90 [ 62.402112] __writeback_single_inode+0x9f/0x780 [ 62.402492] ? write_comp_data+0x2f/0x90 [ 62.402823] writeback_sb_inodes+0x301/0x800 [ 62.403184] wb_writeback+0x18b/0x580 [ 62.403495] wb_workfn+0xca/0x880 [ 62.403778] ? __this_cpu_preempt_check+0x20/0x30 [ 62.404171] ? lock_acquire+0xe6/0x2b0 [ 62.404484] ? __this_cpu_preempt_check+0x20/0x30 [ 62.404872] ? write_comp_data+0x2f/0x90 [ 62.405202] process_one_work+0x3b1/0x860 [ 62.405538] worker_thread+0x52/0x660 [ 62.405846] ? __pfx_worker_thread+0x10/0x10 [ 62.406202] kthread+0x161/0x1a0 [ 62.406475] ? __pfx_kthread+0x10/0x10 [ 62.406787] ret_from_fork+0x29/0x50 [ 62.407094] </TASK> [ 62.407281] Modules linked in: [ 62.407535] CR2: 0000000000000010 [ 62.407808] ---[ end trace 0000000000000000 ]--- [ 62.408178] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0 [ 62.408619] Code: 83 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 20 94 03 00 48 89 c3 48 85 c0 0f 84 57 04 00 00 e8 2f 30 83 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45 [ 62.410052] RSP: 0018:ffffc9000092f4c0 EFLAGS: 00010246 [ 62.410469] RAX: 0000000000000000 RBX: ffff88800b858940 RCX: 0000000000006cc0 [ 62.411032] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002 [ 62.411594] RBP: ffffc9000092f548 R08: ffffc9000092f400 R09: 0000000000000000 [ 62.412155] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [ 62.412716] R13: ffffc9000092f588 R14: 0000000000000001 R15: ffffc9000092f708 [ 62.413278] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 [ 62.413909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 62.414368] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0 [ 62.414934] PKRU: 55555554 [ 62.415159] 
note: kworker/u4:3[74] exited with irqs disabled [ 62.415642] ------------[ cut here ]------------ [ 62.416012] WARNING: CPU: 1 PID: 74 at kernel/exit.c:814 do_exit+0xe8a/0x12b0 [ 62.416580] Modules linked in: [ 62.416833] CPU: 1 PID: 74 Comm: kworker/u4:3 Tainted: G D 6.3.0-rc3-kvm-e8d018dd #1 [ 62.417546] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 62.418432] Workqueue: writeback wb_workfn (flush-7:0) [ 62.418861] RIP: 0010:do_exit+0xe8a/0x12b0 [ 62.419197] Code: 00 65 01 05 b4 ba f0 7e e9 f4 fd ff ff e8 be 1e 1b 00 48 8b bb 98 09 00 00 31 f6 e8 30 b0 ff ff e9 74 fb ff ff e8 a6 1e 1b 00 <0f> 0b e9 3e f2 ff ff e8 9a 1e 1b 00 4c 89 ee bf 05 06 00 00 e8 bd [ 62.420652] RSP: 0018:ffffc9000092feb0 EFLAGS: 00010246 [ 62.421072] RAX: 0000000000000000 RBX: ffff88800a02a340 RCX: 0000000000000001 [ 62.421635] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002 [ 62.422195] RBP: ffffc9000092ff18 R08: 0000000000000000 R09: 0000000000000000 [ 62.422758] R10: 34752f72656b726f R11: 776b203a65746f6e R12: 0000000000000000 [ 62.423323] R13: 0000000000000009 R14: ffff88800a009900 R15: ffff8880093a1180 [ 62.423902] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 [ 62.424539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 62.425000] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0 [ 62.425568] PKRU: 55555554 [ 62.425794] Call Trace: [ 62.426000] <TASK> [ 62.426183] ? 
write_comp_data+0x2f/0x90 [ 62.426513] make_task_dead+0x100/0x290 [ 62.426832] rewind_stack_and_make_dead+0x17/0x20 [ 62.427227] </TASK> [ 62.427414] irq event stamp: 122544 [ 62.427715] hardirqs last enabled at (122543): [<ffffffff821395dd>] get_random_u32+0x1dd/0x360 [ 62.428409] hardirqs last disabled at (122544): [<ffffffff82f8d76e>] exc_page_fault+0x4e/0x3b0 [ 62.429094] softirqs last enabled at (114870): [<ffffffff82fb01a9>] __do_softirq+0x2d9/0x3c3 [ 62.429771] softirqs last disabled at (114849): [<ffffffff81126724>] irq_exit_rcu+0xc4/0x100 [ 62.430443] ---[ end trace 0000000000000000 ]--- " I hope it's helpful. Thanks! --- If you don't need the following environment to reproduce the problem or if you already have one, please ignore the following information. How to reproduce: git clone https://gitlab.com/xupengfe/repro_vm_env.git cd repro_vm_env tar -xvf repro_vm_env.tar.gz cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0 // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel // You could change the bzImage_xxx as you want You could use below command to log in, there is no password for root. ssh -p 10023 root@localhost After login vm(virtual machine) successfully, you could transfer reproduced binary to the vm by below way, and reproduce the problem in vm: gcc -pthread -o repro repro.c scp -P 10023 repro root@localhost:/root/ Get the bzImage for target kernel: Please use target kconfig and copy it to kernel_src/.config make olddefconfig make -jx bzImage //x should equal or less than cpu num your pc has Fill the bzImage file into above start3.sh to load the target kernel in vm. Tips: If you already have qemu-system-x86_64, please ignore below info. 
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install

Thanks!
BR.

* Re: [Syzkaller & bisect] There is BUG: unable to handle kernel NULL pointer dereference in xfs_filestream_select_ag in v6.3-rc3
From: Darrick J. Wong @ 2023-03-21 20:46 UTC
To: Pengfei Xu; +Cc: dchinner, bfoster, heng.su, linux-xfs, lkp

On Mon, Mar 20, 2023 at 02:50:07PM +0800, Pengfei Xu wrote:
> Hi Dave Chinner and xfs experts,
>
> Greeting!
>
> There is BUG: unable to handle kernel NULL pointer dereference in
> xfs_filestream_select_ag in v6.3-rc3:
>
> All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230319_210525_xfs_filestream_select_ag
> Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.c

How the hell am I supposed to extract the fuzzed disk image for
analysis?

Current Google syzbot provides a lot more information for analysis. Why
don't you go triage some of their reports instead of spraying more crap
at the XFS list?

> Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/kconfig_origin
> v6.3-rc3 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/v6.3-rc3_issue_dmesg.log
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/bisect_info.log
>
> Bisected between v6.3-rc2 and v5.11 and found the bad commit:
> "
> 8ac5b996bf5199f15b7687ceae989f8b2a410dda
> xfs: fix off-by-one-block in xfs_discard_folio()

How does *fixing* an off by one error in the page cache produce a crash
in the filestreams allocator?

> Reverted the commit on top of v6.3-rc2 kernel, at least the BUG dmesg was gone.
>
> And this issue could be reproduced in v6.3-rc3 kernel also.
> Is it possible that the above commit involves a new issue? > > " > [ 62.318653] loop0: detected capacity change from 0 to 65536 > [ 62.320459] XFS (loop0): Mounting V5 Filesystem d6f69dbd-8c5d-46be-b88e-92c0ae88ceb2 > [ 62.325152] XFS (loop0): Ending clean mount > [ 62.326049] XFS (loop0): Quotacheck needed: Please wait. > [ 62.328884] XFS (loop0): Quotacheck: Done. > [ 62.363656] XFS (loop0): Metadata CRC error detected at xfs_agf_read_verify+0x10e/0x140, xfs_agf block 0x8001 > [ 62.364489] XFS (loop0): Unmount and run xfs_repair > [ 62.364881] XFS (loop0): First 128 bytes of corrupted metadata buffer: > [ 62.365398] 00000000: 58 41 47 46 00 00 00 01 00 00 00 01 00 00 40 00 XAGF..........@. > [ 62.366026] 00000010: 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 01 ................ > [ 62.366657] 00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04 ................ > [ 62.367285] 00000030: 00 00 00 04 00 00 3b 5f 00 00 3b 5c 00 00 00 00 ......;_..;\.... > [ 62.367927] 00000040: d6 f6 9d bd 8c 5d 46 be b8 8e 92 c0 ae 88 ce b2 .....]F......... > [ 62.368554] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > [ 62.369180] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > [ 62.369806] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > [ 62.370471] XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x200" at daddr 0x8001 len 1 error 74 > [ 62.371312] XFS (loop0): page discard on page 00000000a6a1237b, inode 0x46, pos 0. 
> [ 62.385968] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [ 62.386541] #PF: supervisor write access in kernel mode
> [ 62.386960] #PF: error_code(0x0002) - not-present page
> [ 62.387370] PGD 0 P4D 0
> [ 62.387588] Oops: 0002 [#1] PREEMPT SMP NOPTI
> [ 62.387945] CPU: 1 PID: 74 Comm: kworker/u4:3 Not tainted 6.3.0-rc3-kvm-e8d018dd #1
> [ 62.388545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [ 62.389426] Workqueue: writeback wb_workfn (flush-7:0)
> [ 62.389845] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0

What source line and/or instruction does %rip point to?
Considering that this is a null pointer dereference, you ought to be able
to identify which pointer access did this.

If you are going to run some scripted tool to randomly corrupt the
filesystem to find failures, then you have an ethical and moral
responsibility to do some of the work to narrow down and identify the
cause of the failure, not just throw them at someone to do all the work.

--D > [ 62.390285] Code: 83 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 20 94 03 00 48 89 c3 48 85 c0 0f 84 57 04 00 00 e8 2f 30 83 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45 > [ 62.391712] RSP: 0018:ffffc9000092f4c0 EFLAGS: 00010246 > [ 62.392128] RAX: 0000000000000000 RBX: ffff88800b858940 RCX: 0000000000006cc0 > [ 62.392688] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002 > [ 62.393246] RBP: ffffc9000092f548 R08: ffffc9000092f400 R09: 0000000000000000 > [ 62.393805] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > [ 62.394363] R13: ffffc9000092f588 R14: 0000000000000001 R15: ffffc9000092f708 > [ 62.394924] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 > [ 62.395553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 62.396008] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0 > [ 62.396569] PKRU: 55555554 > [ 62.396793] Call Trace: > [ 62.396996] <TASK> > [ 62.397179] xfs_bmap_btalloc+0x706/0xb90 > [ 62.397512] xfs_bmapi_allocate+0x25b/0x5e0 > [ 62.397850] ? __sanitizer_cov_trace_pc+0x25/0x60 > [ 62.398239] xfs_bmapi_convert_delalloc+0x335/0x6c0 > [ 62.398649] xfs_map_blocks+0x2ff/0x740 > [ 62.398971] ? __sanitizer_cov_trace_pc+0x25/0x60 > [ 62.399362] iomap_do_writepage+0x43f/0xf10 > [ 62.399709] write_cache_pages+0x2b8/0x7e0 > [ 62.400047] ? __pfx_iomap_do_writepage+0x10/0x10 > [ 62.400438] iomap_writepages+0x3e/0x80 > [ 62.400757] xfs_vm_writepages+0x97/0xe0 > [ 62.401088] ? __pfx_xfs_vm_writepages+0x10/0x10 > [ 62.401470] do_writepages+0x10f/0x240 > [ 62.401783] ? write_comp_data+0x2f/0x90 > [ 62.402112] __writeback_single_inode+0x9f/0x780 > [ 62.402492] ? write_comp_data+0x2f/0x90 > [ 62.402823] writeback_sb_inodes+0x301/0x800 > [ 62.403184] wb_writeback+0x18b/0x580 > [ 62.403495] wb_workfn+0xca/0x880 > [ 62.403778] ? __this_cpu_preempt_check+0x20/0x30 > [ 62.404171] ? lock_acquire+0xe6/0x2b0 > [ 62.404484] ? 
__this_cpu_preempt_check+0x20/0x30 > [ 62.404872] ? write_comp_data+0x2f/0x90 > [ 62.405202] process_one_work+0x3b1/0x860 > [ 62.405538] worker_thread+0x52/0x660 > [ 62.405846] ? __pfx_worker_thread+0x10/0x10 > [ 62.406202] kthread+0x161/0x1a0 > [ 62.406475] ? __pfx_kthread+0x10/0x10 > [ 62.406787] ret_from_fork+0x29/0x50 > [ 62.407094] </TASK> > [ 62.407281] Modules linked in: > [ 62.407535] CR2: 0000000000000010 > [ 62.407808] ---[ end trace 0000000000000000 ]--- > [ 62.408178] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0 > [ 62.408619] Code: 83 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 20 94 03 00 48 89 c3 48 85 c0 0f 84 57 04 00 00 e8 2f 30 83 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45 > [ 62.410052] RSP: 0018:ffffc9000092f4c0 EFLAGS: 00010246 > [ 62.410469] RAX: 0000000000000000 RBX: ffff88800b858940 RCX: 0000000000006cc0 > [ 62.411032] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002 > [ 62.411594] RBP: ffffc9000092f548 R08: ffffc9000092f400 R09: 0000000000000000 > [ 62.412155] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > [ 62.412716] R13: ffffc9000092f588 R14: 0000000000000001 R15: ffffc9000092f708 > [ 62.413278] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 > [ 62.413909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 62.414368] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0 > [ 62.414934] PKRU: 55555554 > [ 62.415159] note: kworker/u4:3[74] exited with irqs disabled > [ 62.415642] ------------[ cut here ]------------ > [ 62.416012] WARNING: CPU: 1 PID: 74 at kernel/exit.c:814 do_exit+0xe8a/0x12b0 > [ 62.416580] Modules linked in: > [ 62.416833] CPU: 1 PID: 74 Comm: kworker/u4:3 Tainted: G D 6.3.0-rc3-kvm-e8d018dd #1 > [ 62.417546] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > [ 62.418432] Workqueue: writeback wb_workfn (flush-7:0) > [ 62.418861] 
RIP: 0010:do_exit+0xe8a/0x12b0 > [ 62.419197] Code: 00 65 01 05 b4 ba f0 7e e9 f4 fd ff ff e8 be 1e 1b 00 48 8b bb 98 09 00 00 31 f6 e8 30 b0 ff ff e9 74 fb ff ff e8 a6 1e 1b 00 <0f> 0b e9 3e f2 ff ff e8 9a 1e 1b 00 4c 89 ee bf 05 06 00 00 e8 bd > [ 62.420652] RSP: 0018:ffffc9000092feb0 EFLAGS: 00010246 > [ 62.421072] RAX: 0000000000000000 RBX: ffff88800a02a340 RCX: 0000000000000001 > [ 62.421635] RDX: 0000000000000000 RSI: ffff88800a02a340 RDI: 0000000000000002 > [ 62.422195] RBP: ffffc9000092ff18 R08: 0000000000000000 R09: 0000000000000000 > [ 62.422758] R10: 34752f72656b726f R11: 776b203a65746f6e R12: 0000000000000000 > [ 62.423323] R13: 0000000000000009 R14: ffff88800a009900 R15: ffff8880093a1180 > [ 62.423902] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 > [ 62.424539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 62.425000] CR2: 0000000000000010 CR3: 000000000ad7c003 CR4: 0000000000770ee0 > [ 62.425568] PKRU: 55555554 > [ 62.425794] Call Trace: > [ 62.426000] <TASK> > [ 62.426183] ? write_comp_data+0x2f/0x90 > [ 62.426513] make_task_dead+0x100/0x290 > [ 62.426832] rewind_stack_and_make_dead+0x17/0x20 > [ 62.427227] </TASK> > [ 62.427414] irq event stamp: 122544 > [ 62.427715] hardirqs last enabled at (122543): [<ffffffff821395dd>] get_random_u32+0x1dd/0x360 > [ 62.428409] hardirqs last disabled at (122544): [<ffffffff82f8d76e>] exc_page_fault+0x4e/0x3b0 > [ 62.429094] softirqs last enabled at (114870): [<ffffffff82fb01a9>] __do_softirq+0x2d9/0x3c3 > [ 62.429771] softirqs last disabled at (114849): [<ffffffff81126724>] irq_exit_rcu+0xc4/0x100 > [ 62.430443] ---[ end trace 0000000000000000 ]--- > " > > I hope it's helpful. > > Thanks! > > --- > > If you don't need the following environment to reproduce the problem or if you > already have one, please ignore the following information. 
>
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
> // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
> // You could change the bzImage_xxx as you want
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@localhost
>
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
>
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage //x should equal or less than cpu num your pc has
>
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>
>
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
> make
> make install
>
> Thanks!
> BR.

* Re: [Syzkaller & bisect] There is BUG: unable to handle kernel NULL pointer dereference in xfs_filestream_select_ag in v6.3-rc3
From: Pengfei Xu @ 2023-03-22 3:20 UTC
To: Darrick J. Wong; +Cc: dchinner, bfoster, heng.su, linux-xfs, lkp

Hi Darrick J. Wong,

On 2023-03-21 at 13:46:38 -0700, Darrick J. Wong wrote:
> On Mon, Mar 20, 2023 at 02:50:07PM +0800, Pengfei Xu wrote:
> > Hi Dave Chinner and xfs experts,
> >
> > Greeting!
> >
> > There is BUG: unable to handle kernel NULL pointer dereference in
> > xfs_filestream_select_ag in v6.3-rc3:
> >
> > All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230319_210525_xfs_filestream_select_ag
> > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.c
>
> How the hell am I supposed to extract the fuzzed disk image for
> analysis?
>
> Current Google syzbot provides a lot more information for analysis. Why
> don't you go triage some of their reports instead of spraying more crap
> at the XFS list?
>
Ah, thanks a lot for your suggestion!
Next time I will add the following additional analysis from syzkaller to all
problem reports.

More info is added below.
More detailed analysis from syzkaller report0:
https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/report0
repro.stats:
https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.stats
vm machine info:
https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/machineInfo0
I newly added repro.report:
https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.report
"
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x2c0" at daddr 0x8001 len 1 error 74 XFS (loop0): page discard on page 00000000b8174cbd, inode 0x46, pos 0. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.3.0-rc2-intel-next-38f821ff82e9+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline] RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline] RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372 Code: 80 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 80 f9 03 00 48 89 c3 48 85 c0 0f 84 3a 05 00 00 e8 9f 8a 80 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45 RSP: 0018:ffffc900001274c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88800dbeae40 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002 RBP: ffffc90000127548 R08: ffffc90000127400 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc90000127588 R14: 0000000000000001 R15: ffffc90000127708 FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0 PKRU: 55555554 Call Trace: <TASK> xfs_bmap_btalloc_filestreams fs/xfs/libxfs/xfs_bmap.c:3558 [inline] xfs_bmap_btalloc+0x706/0xb90 fs/xfs/libxfs/xfs_bmap.c:3672 xfs_bmap_alloc_userdata fs/xfs/libxfs/xfs_bmap.c:4046 [inline] xfs_bmapi_allocate+0x25b/0x5e0 fs/xfs/libxfs/xfs_bmap.c:4089 xfs_bmapi_convert_delalloc+0x335/0x6c0 
fs/xfs/libxfs/xfs_bmap.c:4554 xfs_convert_blocks fs/xfs/xfs_aops.c:266 [inline] xfs_map_blocks+0x2ff/0x8a0 fs/xfs/xfs_aops.c:389 iomap_writepage_map fs/iomap/buffered-io.c:1641 [inline] iomap_do_writepage+0x43f/0x1070 fs/iomap/buffered-io.c:1803 write_cache_pages+0x2b8/0x8a0 mm/page-writeback.c:2473 iomap_writepages+0x3e/0x80 fs/iomap/buffered-io.c:1820 xfs_vm_writepages+0x97/0xe0 fs/xfs/xfs_aops.c:513 do_writepages+0x10f/0x240 mm/page-writeback.c:2551 __writeback_single_inode+0x9f/0xb20 fs/fs-writeback.c:1600 writeback_sb_inodes+0x301/0x8b0 fs/fs-writeback.c:1891 wb_writeback+0x18b/0x7c0 fs/fs-writeback.c:2065 wb_do_writeback fs/fs-writeback.c:2208 [inline] wb_workfn+0xc0/0xad0 fs/fs-writeback.c:2248 process_one_work+0x3b1/0x9e0 kernel/workqueue.c:2390 worker_thread+0x52/0x660 kernel/workqueue.c:2537 kthread+0x161/0x1a0 kernel/kthread.c:376 ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308 </TASK> Modules linked in: CR2: 0000000000000010 ---[ end trace 0000000000000000 ]--- RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline] RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline] RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372 Code: 80 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 80 f9 03 00 48 89 c3 48 85 c0 0f 84 3a 05 00 00 e8 9f 8a 80 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45 RSP: 0018:ffffc900001274c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88800dbeae40 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002 RBP: ffffc90000127548 R08: ffffc90000127400 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc90000127588 R14: 0000000000000001 R15: ffffc90000127708 FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 
0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0 PKRU: 55555554 note: kworker/u4:2[34] exited with irqs disabled ------------[ cut here ]------------ WARNING: CPU: 1 PID: 34 at kernel/exit.c:814 do_exit+0xf68/0x1360 kernel/exit.c:814 Modules linked in: CPU: 1 PID: 34 Comm: kworker/u4:2 Tainted: G D 6.3.0-rc2-intel-next-38f821ff82e9+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:do_exit+0xf68/0x1360 kernel/exit.c:814 Code: ff ff e8 2b 7e 1b 00 4c 89 ee bf 05 06 00 00 e8 7e c1 01 00 e9 a7 f2 ff ff e8 14 7e 1b 00 0f 0b e9 f8 f0 ff ff e8 08 7e 1b 00 <0f> 0b e9 60 f1 ff ff e8 fc 7d 1b 00 48 89 df e8 54 ff 1a 00 e9 ec RSP: 0018:ffffc90000127eb0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88800791a340 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002 RBP: ffffc90000127f18 R08: 0000000000000000 R09: 0000000000000000 R10: 34752f72656b726f R11: 776b203a65746f6e R12: 0000000000000000 R13: 0000000000000009 R14: ffff8880079292c0 R15: ffff888007924600 FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0 PKRU: 55555554 Call Trace: <TASK> make_task_dead+0x100/0x290 kernel/exit.c:981 rewind_stack_and_make_dead+0x17/0x20 arch/x86/entry/entry_64.S:1541 </TASK> irq event stamp: 46556 hardirqs last enabled at (46555): [<ffffffff8218402d>] get_random_u32+0x1dd/0x360 drivers/char/random.c:532 hardirqs last disabled at (46556): [<ffffffff8300582e>] exc_page_fault+0x4e/0x500 arch/x86/mm/fault.c:1551 softirqs last enabled at (37844): [<ffffffff83029bdc>] softirq_handle_end kernel/softirq.c:414 [inline] softirqs last enabled at (37844): [<ffffffff83029bdc>] __do_softirq+0x31c/0x49c kernel/softirq.c:600 softirqs last disabled at (37835): [<ffffffff8112e774>] invoke_softirq 
kernel/softirq.c:445 [inline]
softirqs last disabled at (37835): [<ffffffff8112e774>] __irq_exit_rcu kernel/softirq.c:650 [inline]
softirqs last disabled at (37835): [<ffffffff8112e774>] irq_exit_rcu+0xc4/0x100 kernel/softirq.c:662
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
   0:   80 ff 49                cmp    $0x49,%bh
   3:   89 5d 18                mov    %ebx,0x18(%rbp)
   6:   be 08 00 00 00          mov    $0x8,%esi
   b:   bf 20 00 00 00          mov    $0x20,%edi
  10:   e8 80 f9 03 00          call   0x3f995
  15:   48 89 c3                mov    %rax,%rbx
  18:   48 85 c0                test   %rax,%rax
  1b:   0f 84 3a 05 00 00       je     0x55b
  21:   e8 9f 8a 80 ff          call   0xff808ac5
  26:   49 8b 45 18             mov    0x18(%r13),%rax
* 2a:   f0 ff 40 10             lock incl 0x10(%rax) <-- trapping instruction
  2e:   49 8b 45 18             mov    0x18(%r13),%rax
  32:   48 8b 75 b8             mov    -0x48(%rbp),%rsi
  36:   48 89 da                mov    %rbx,%rdx
  39:   48 89 43 18             mov    %rax,0x18(%rbx)
  3d:   48                      rex.W
  3e:   8b                      .byte 0x8b
  3f:   45                      rex.RB
"

> > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/kconfig_origin
> > v6.3-rc3 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/v6.3-rc3_issue_dmesg.log
> > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/bisect_info.log
> >
> > Bisected between v6.3-rc2 and v5.11 and found the bad commit:
> > "
> > 8ac5b996bf5199f15b7687ceae989f8b2a410dda
> > xfs: fix off-by-one-block in xfs_discard_folio()
>
> How does *fixing* an off by one error in the page cache produce a crash
> in the filestreams allocator?
>
I'm also surprised that there is such a problem. I'm not sure of the reason,
as I don't know much about xfs.

> > Reverted the commit on top of v6.3-rc2 kernel, at least the BUG dmesg was gone.
> >
> > And this issue could be reproduced in v6.3-rc3 kernel also.
> > Is it possible that the above commit involves a new issue?
> >
> > "
> > [ 62.318653] loop0: detected capacity change from 0 to 65536
> > [ 62.320459] XFS (loop0): Mounting V5 Filesystem d6f69dbd-8c5d-46be-b88e-92c0ae88ceb2
> > [ 62.325152] XFS (loop0): Ending clean mount
> > [ 62.326049] XFS (loop0): Quotacheck needed: Please wait.
> > [ 62.328884] XFS (loop0): Quotacheck: Done.
> > [ 62.363656] XFS (loop0): Metadata CRC error detected at xfs_agf_read_verify+0x10e/0x140, xfs_agf block 0x8001
> > [ 62.364489] XFS (loop0): Unmount and run xfs_repair
> > [ 62.364881] XFS (loop0): First 128 bytes of corrupted metadata buffer:
> > [ 62.365398] 00000000: 58 41 47 46 00 00 00 01 00 00 00 01 00 00 40 00 XAGF..........@.
> > [ 62.366026] 00000010: 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 01 ................
> > [ 62.366657] 00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04 ................
> > [ 62.367285] 00000030: 00 00 00 04 00 00 3b 5f 00 00 3b 5c 00 00 00 00 ......;_..;\....
> > [ 62.367927] 00000040: d6 f6 9d bd 8c 5d 46 be b8 8e 92 c0 ae 88 ce b2 .....]F.........
> > [ 62.368554] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.369180] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.369806] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.370471] XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x200" at daddr 0x8001 len 1 error 74
> > [ 62.371312] XFS (loop0): page discard on page 00000000a6a1237b, inode 0x46, pos 0.
> > [ 62.385968] BUG: kernel NULL pointer dereference, address: 0000000000000010
> > [ 62.386541] #PF: supervisor write access in kernel mode
> > [ 62.386960] #PF: error_code(0x0002) - not-present page
> > [ 62.387370] PGD 0 P4D 0
> > [ 62.387588] Oops: 0002 [#1] PREEMPT SMP NOPTI
> > [ 62.387945] CPU: 1 PID: 74 Comm: kworker/u4:3 Not tainted 6.3.0-rc3-kvm-e8d018dd #1
> > [ 62.388545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > [ 62.389426] Workqueue: writeback wb_workfn (flush-7:0)
> > [ 62.389845] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0
>
> What source line and/or instruction does %rip point to?
> Considering that this is a null pointer dereference, you ought to be able
> to identify which pointer access did this.
>
> If you are going to run some scripted tool to randomly corrupt the
> filesystem to find failures, then you have an ethical and moral
> responsibility to do some of the work to narrow down and identify the
> cause of the failure, not just throw them at someone to do all the work.
>

You are right, sorry. Next time I will provide the RIP and all the other detailed info I have.
The info below is from the above repro.report:
"
BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.3.0-rc2-intel-next-38f821ff82e9+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline]
RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline]
RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline]
RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372
"

Thanks!
BR.
-Pengfei

> --D
>
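As a side note on the question of which pointer access faulted: the sketch below is an illustration only and is not part of the original report. It decodes the page-fault error code printed in the oops and back-computes the base register of the trapping `lock incl 0x10(%rax)` instruction from the reported fault address. The bit meanings are the architectural x86 page-fault error-code bits; the helper names `decode_pf_error_code` and `base_register_value` are hypothetical.

```python
# x86 page-fault error code bits (architectural definitions).
PF_PRESENT = 1 << 0  # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1    # 0: read access,      1: write access
PF_USER = 1 << 2     # 0: supervisor mode,  1: user mode


def decode_pf_error_code(code):
    """Split a page-fault error code into its three low flag bits."""
    return {
        "present": bool(code & PF_PRESENT),
        "write": bool(code & PF_WRITE),
        "user": bool(code & PF_USER),
    }


def base_register_value(fault_address, displacement):
    """Base register value for a disp(%reg) operand that faulted at fault_address."""
    return (fault_address - displacement) & 0xFFFFFFFFFFFFFFFF


# Values taken from the oops text: error_code(0x0002), address 0x10,
# and the trapping instruction "lock incl 0x10(%rax)".
oops = decode_pf_error_code(0x0002)
rax = base_register_value(0x10, 0x10)
print(oops, hex(rax))
```

With error code 0x0002 and fault address 0x10 this gives a supervisor-mode write to a not-present page with %rax equal to 0. That suggests the pointer loaded by the preceding `mov 0x18(%r13),%rax` was NULL, and that the atomic_inc() target sits at offset 0x10 within the structure it was expected to point to.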
Thread overview: 3+ messages
2023-03-20 6:50 [Syzkaller & bisect] There is BUG: unable to handle kernel NULL pointer dereference in xfs_filestream_select_ag in v6.3-rc3 Pengfei Xu
2023-03-21 20:46 ` Darrick J. Wong
2023-03-22 3:20 ` Pengfei Xu