* [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] @ 2023-08-23 4:06 Ming Lei 2023-08-23 8:47 ` Christian Brauner 0 siblings, 1 reply; 14+ messages in thread From: Ming Lei @ 2023-08-23 4:06 UTC (permalink / raw) To: linux-fsdevel, Christian Brauner, Alexander Viro, linux-kernel Cc: ming.lei, linux-scsi, Changhui Zhong Looks the issue is more related with vfs, so forward to vfs list. ----- Forwarded message from Changhui Zhong <czhong@redhat.com> ----- Date: Wed, 23 Aug 2023 11:17:55 +0800 From: Changhui Zhong <czhong@redhat.com> To: linux-scsi@vger.kernel.org Cc: Ming Lei <ming.lei@redhat.com> Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278 Hello, triggered below warning issue with branch " Tree: mainline.kernel.org-clang Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git @ master Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d Commit Name: v6.5-rc7-18-g89bf6209cad6 Kernel information: Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux " for more detail,please check https://datawarehouse.cki-project.org/kcidb/tests/9232643 #modprobe scsi_debug virtual_gb=128 #echo none > /sys/block/sdb/queue/scheduler #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G --runtime=60 [ 3056.092761] Device: sdb Engine: libaio Sched: none Pattern: randrw Direct: 1 Depth: 1 Block size: 4K Size: 60G [ 3117.055168] ------------[ cut here ]------------ [ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278 [ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop 
fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug] [ 3117.103572] CPU: 121 PID: 93233 Comm: bash Not tainted 6.5.0-rc7 #1 [ 3117.109827] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 [ 3117.119119] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3117.126068] pc : __dentry_kill+0x214/0x278 [ 3117.130152] lr : __dentry_kill+0x194/0x278 [ 3117.134236] sp : ffff800084993870 [ 3117.137537] x29: ffff800084993870 x28: ffff0800830e2200 x27: 0000000000080400 [ 3117.144661] x26: 00000000fff7fbff x25: 0000000000000001 x24: ffff07ff88473198 [ 3117.151783] x23: ffff07ff884731c0 x22: ffff07ff9d033c80 x21: ffff07ff884731d0 [ 3117.158906] x20: ffff07ff88473198 x19: ffff07ff88473140 x18: ffffbc0d0739ceb4 [ 3117.166028] x17: 0000000000000000 x16: ffffbc0d073528c0 x15: ffff07ff89c761f8 [ 3117.173151] x14: 0000000000000002 x13: 0000000000000000 x12: 0000000000000001 [ 3117.180273] x11: ffff080f33fe0850 x10: ffffffffffffffff x9 : 0000000100000000 [ 3117.187395] x8 : ffffbc0d08aa1e98 x7 : 0000000000000000 x6 : 000000000000003f [ 3117.194518] x5 : ffff800084993a30 x4 : ffff800084993908 x3 : ffff07ff9cb7e210 [ 3117.201640] x2 : ffff07ff884731d0 x1 : 0000000000000000 x0 : ffff07ff884731f0 [ 3117.208763] Call trace: [ 3117.211197] __dentry_kill+0x214/0x278 [ 3117.214934] shrink_dentry_list+0x134/0x2b0 [ 3117.219105] prune_dcache_sb+0x64/0xa0 [ 3117.222842] super_cache_scan+0x144/0x198 [ 3117.226841] do_shrink_slab+0x1dc/0x420 [ 3117.230666] shrink_slab+0x114/0x388 [ 3117.234229] drop_slab+0xb0/0x118 [ 3117.237532] drop_caches_sysctl_handler+0xac/0x170 [ 3117.242313] proc_sys_call_handler+0x184/0x2d0 [ 3117.246745] proc_sys_write+0x20/0x38 [ 3117.250396] vfs_write+0x24c/0x368 [ 3117.253786] ksys_write+0x84/0xf8 [ 3117.257089] __arm64_sys_write+0x28/0x40 [ 3117.261000] invoke_syscall+0x78/0x110 [ 3117.264739] el0_svc_common+0xc0/0xf8 [ 
3117.268390] do_el0_svc+0x3c/0xb8 [ 3117.271693] el0_svc+0x34/0x110 [ 3117.274825] el0t_64_sync_handler+0x84/0x100 [ 3117.279083] el0t_64_sync+0x194/0x198 [ 3117.282733] ---[ end trace 0000000000000000 ]--- [ 3119.372909] Device: sdb Engine: libaio Sched: none Pattern: randrw Direct: 1 Depth: 1 Block size: 16K Size: 60G [ 3144.068984] watchdog: BUG: soft lockup - CPU#97 stuck for 26s! [fio:93777] [ 3144.069984] watchdog: BUG: soft lockup - CPU#99 stuck for 26s! [systemd-udevd:1680] [ 3144.075849] Modules linked in: scsi_debug nvme [ 3144.083493] Modules linked in: [ 3144.087924] nvme_core nvme_common null_blk pktcdvd [ 3144.090967] scsi_debug [ 3144.090968] ipmi_watchdog ipmi_poweroff rfkill [ 3144.098267] nvme [ 3144.098267] sunrpc vfat fat [ 3144.102785] nvme_core [ 3144.104698] acpi_ipmi [ 3144.107566] nvme_common [ 3144.109912] ipmi_ssif [ 3144.112259] null_blk [ 3144.114779] arm_spe_pmu [ 3144.117126] pktcdvd [ 3144.119385] igb ipmi_devintf [ 3144.121905] ipmi_watchdog [ 3144.124078] ipmi_msghandler [ 3144.127033] ipmi_poweroff [ 3144.129728] arm_cmn [ 3144.132596] rfkill [ 3144.135289] arm_dmc620_pmu [ 3144.137462] sunrpc [ 3144.139548] cppc_cpufreq [ 3144.142329] vfat [ 3144.144415] arm_dsu_pmu [ 3144.147022] fat [ 3144.148934] acpi_tad loop [ 3144.151455] acpi_ipmi [ 3144.153280] fuse [ 3144.155974] ipmi_ssif [ 3144.158320] zram [ 3144.160233] arm_spe_pmu [ 3144.162579] xfs [ 3144.164493] igb [ 3144.167012] crct10dif_ce [ 3144.168838] ipmi_devintf [ 3144.170664] polyval_ce [ 3144.173271] ipmi_msghandler [ 3144.175877] polyval_generic [ 3144.178312] arm_cmn [ 3144.181178] ghash_ce [ 3144.184047] arm_dmc620_pmu [ 3144.186219] sbsa_gwdt [ 3144.188479] cppc_cpufreq [ 3144.191259] ast [ 3144.193606] arm_dsu_pmu [ 3144.196213] onboard_usb_hub [ 3144.198039] acpi_tad [ 3144.200559] i2c_algo_bit [ 3144.203427] loop [ 3144.205686] xgene_hwmon [ 3144.208294] fuse [ 3144.210206] [last unloaded: scsi_debug] [ 3144.212726] zram [ 3144.214638] [ 3144.214640] CPU: 97 PID: 
93777 Comm: fio Tainted: G W 6.5.0-rc7 #1 [ 3144.218548] xfs [ 3144.220459] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 [ 3144.221939] crct10dif_ce [ 3144.229493] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3144.231320] polyval_ce [ 3144.240610] pc : d_alloc_parallel+0x204/0x520 [ 3144.243218] polyval_generic [ 3144.250164] lr : d_alloc_parallel+0x128/0x520 [ 3144.252598] ghash_ce [ 3144.256940] sp : ffff800083ecb860 [ 3144.259809] sbsa_gwdt [ 3144.264151] x29: ffff800083ecb8b0 [ 3144.266411] ast [ 3144.269712] x28: 00000000e28e44d4 [ 3144.272059] onboard_usb_hub [ 3144.275360] x27: ffff0800c7e6c400 [ 3144.277186] i2c_algo_bit [ 3144.280574] [ 3144.280575] x26: ffff07ff9dfb48d0 [ 3144.283442] xgene_hwmon [ 3144.286830] x25: 0000000000011b30 [ 3144.289437] [last unloaded: scsi_debug] [ 3144.290915] x24: ffff07ffe89be298 [ 3144.294217] [ 3144.296737] [ 3144.296738] x23: ffff07ffe89be240 [ 3144.300126] CPU: 99 PID: 1680 Comm: systemd-udevd Tainted: G W 6.5.0-rc7 #1 [ 3144.304034] x22: ffff800083ecba20 x21: ffffbc0d084e9000 [ 3144.307424] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 [ 3144.308902] [ 3144.308903] x20: ffffbc0d08aa1e98 [ 3144.310381] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3144.313682] x19: ffff07ff9eeac918 x18: 00000000fffffffb [ 3144.322019] pc : d_alloc_parallel+0x17c/0x520 [ 3144.327316] [ 3144.327317] x17: ffff07ff93f1c021 [ 3144.336608] lr : d_alloc_parallel+0x128/0x520 [ 3144.338086] x16: 636f6c622f000000 x15: fefefefeff727372 [ 3144.341388] sp : ffff8000844ab9b0 [ 3144.348335] [ 3144.348336] x14: fefefefeff7b7b7b [ 3144.353634] x29: ffff8000844aba00 [ 3144.357976] x13: 0000000000000000 x12: 00000000e28e44d4 [ 3144.359456] x28: 00000000e777e234 [ 3144.362757] [ 3144.362757] x11: ffff080f32b00000 [ 3144.367100] x27: ffff07ff96888000 [ 3144.372398] x10: 0000000000000000 x9 : ffff07ff93f1c037 [ 3144.375700] [ 
3144.377178] [ 3144.377179] x8 : 0000000000000000 [ 3144.380480] x26: ffff07ff9de47890 [ 3144.383781] x7 : 7374736575716572 [ 3144.389080] x25: 0000000000011b30 [ 3144.392467] x6 : 65757165725f726e [ 3144.393947] x24: ffff07ff88faf618 [ 3144.397248] [ 3144.397249] x5 : ffff07ff9dfb4883 [ 3144.400637] [ 3144.405934] x4 : ffff07ff93f1c042 [ 3144.407414] x23: ffff07ff88faf5c0 [ 3144.408892] x3 : ffff07ff9dfb4bd0 [ 3144.412193] x22: ffff8000844abb30 [ 3144.415494] [ 3144.415495] x2 : ffff800083ecb87c [ 3144.418883] x21: ffffbc0d084e9000 [ 3144.422271] x1 : ffff800083ecba20 x0 : 0000000000000000 [ 3144.425660] [ 3144.429048] [ 3144.429049] Call trace: [ 3144.430527] x20: ffffbc0d08aa1e98 [ 3144.433828] d_alloc_parallel+0x204/0x520 [ 3144.435307] x19: 00000000000000e8 [ 3144.438695] __lookup_slow+0x6c/0x158 [ 3144.441997] x18: 00000000fffffffb [ 3144.445385] lookup_slow+0x4c/0x78 [ 3144.448774] [ 3144.450252] walk_component+0x10c/0x128 [ 3144.453553] x17: ffff07ff9400a021 [ 3144.456941] path_lookupat+0x60/0x140 [ 3144.462240] x16: 697665642f000000 [ 3144.463718] filename_lookup+0xd8/0x1d0 [ 3144.465197] x15: 722e6a626e6b612e [ 3144.467629] vfs_statx+0x90/0x220 [ 3144.470932] [ 3144.474927] __arm64_sys_newfstatat+0xa0/0x100 [ 3144.478316] x14: 7aff6b6b7f6b6bff [ 3144.481964] invoke_syscall+0x78/0x110 [ 3144.485353] x13: 0000000000000000 [ 3144.488741] el0_svc_common+0xc0/0xf8 [ 3144.490219] x12: 00000000e777e234 [ 3144.494041] do_el0_svc+0x3c/0xb8 [ 3144.497343] [ 3144.500991] el0_svc+0x34/0x110 [ 3144.504380] x11: ffff080f32b00000 [ 3144.508201] el0t_64_sync_handler+0x84/0x100 [ 3144.511590] x10: 0000000000000000 [ 3144.514891] el0t_64_sync+0x194/0x198 [ 3144.516370] x9 : ffff07ff9400a06b [ 3144.564153] x8 : ffff07ff884731f1 x7 : 25732500716d016b x6 : 0000000032757063 [ 3144.571280] x5 : ffff07ff9de4783d x4 : ffff07ff9400a070 x3 : ffff07ff9de46c90 [ 3144.578407] x2 : ffff8000844ab9cc x1 : ffff8000844abb30 x0 : 0000000000000000 [ 3144.585534] Call trace: [ 3144.587969] 
d_alloc_parallel+0x17c/0x520 [ 3144.591971] path_openat+0x238/0xc70 [ 3144.595539] do_filp_open+0xc4/0x178 [ 3144.599107] do_sys_openat2+0x90/0x100 [ 3144.602847] __arm64_sys_openat+0x7c/0xb0 [ 3144.606848] invoke_syscall+0x78/0x110 [ 3144.610591] el0_svc_common+0x94/0xf8 [ 3144.614247] do_el0_svc+0x3c/0xb8 [ 3144.617555] el0_svc+0x34/0x110 [ 3144.620690] el0t_64_sync_handler+0x84/0x100 [ 3144.624954] el0t_64_sync+0x194/0x198 [ 3168.068867] watchdog: BUG: soft lockup - CPU#97 stuck for 48s! [fio:93777] [ 3168.069867] watchdog: BUG: soft lockup - CPU#99 stuck for 48s! [systemd-udevd:1680] Thanks, ----- End forwarded message ----- -- Ming ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-08-23 4:06 [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] Ming Lei @ 2023-08-23 8:47 ` Christian Brauner 2023-08-28 10:43 ` Ming Lei 0 siblings, 1 reply; 14+ messages in thread From: Christian Brauner @ 2023-08-23 8:47 UTC (permalink / raw) To: Ming Lei Cc: linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong On Wed, Aug 23, 2023 at 12:06:14PM +0800, Ming Lei wrote: > > Looks the issue is more related with vfs, so forward to vfs list. > > ----- Forwarded message from Changhui Zhong <czhong@redhat.com> ----- > > Date: Wed, 23 Aug 2023 11:17:55 +0800 > From: Changhui Zhong <czhong@redhat.com> > To: linux-scsi@vger.kernel.org > Cc: Ming Lei <ming.lei@redhat.com> > Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278 > > Hello, > > triggered below warning issue with branch > " > Tree: mainline.kernel.org-clang > Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > @ master > Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d > Commit Name: v6.5-rc7-18-g89bf6209cad6 > Kernel information: > Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of > git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux > " > for more detail,please check > https://datawarehouse.cki-project.org/kcidb/tests/9232643 > > #modprobe scsi_debug virtual_gb=128 > #echo none > /sys/block/sdb/queue/scheduler > #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw > --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G > --runtime=60 Looking at this issue it seems unlikely that this is a vfs bug. We should see this all over the place and specifically not just on arm64. 
The sequence here seems to be:

echo 3 > /proc/sys/vm/drop_caches
rmmod scsi_debug > /dev/null 2>&1

[ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
[ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]

So my money is on some device that gets removed still having an
increased refcount and pinning the dentry. Immediate suspects would be:

7882541ca06d ("of/platform: increase refcount of fwnode")

but that part is complete speculation on my part.

^ permalink raw reply [flat|nested] 14+ messages in thread
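The reporter's steps and the suspected sequence above can be put together as a rough sketch. This is an illustration under assumptions, not the actual CKI test script: the `step` and `repro_sequence` helpers and the `DEV`/`RUN` variables are made up for this example, the device name `sdb` is taken from the report, and the value written to `drop_caches` is assumed to be 3 because the trace goes through `drop_slab()`, which is only reached when the slab bit (value 2 or 3) is set. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the suspected sequence. Prints each command; executes it only
# when RUN=1 is set (which needs root and a disposable test machine).
DEV="${DEV:-sdb}"   # device name from the report; adjust for the local setup
RUN="${RUN:-0}"

step() {
    # Show the command, and run it only in RUN=1 mode.
    echo "+ $1"
    if [ "$RUN" = "1" ]; then
        sh -c "$1"
    fi
}

repro_sequence() {
    step "modprobe scsi_debug virtual_gb=128"
    step "echo none > /sys/block/$DEV/queue/scheduler"
    step "fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw --name=$DEV-libaio-randrw-4k --filename=/dev/$DEV --direct=1 --size=60G --runtime=60"
    # 3 = page cache + slab objects (dentries and inodes).
    step "echo 3 > /proc/sys/vm/drop_caches"
    step "rmmod scsi_debug > /dev/null 2>&1"
}

repro_sequence
```

Keeping a dry-run default makes it easy to review the exact order of operations before running anything that loads or unloads modules.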
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-08-23 8:47 ` Christian Brauner @ 2023-08-28 10:43 ` Ming Lei 2023-09-13 8:59 ` Yi Zhang 0 siblings, 1 reply; 14+ messages in thread From: Ming Lei @ 2023-08-28 10:43 UTC (permalink / raw) To: Christian Brauner Cc: linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, Yi Zhang On Wed, Aug 23, 2023 at 4:47 PM Christian Brauner <brauner@kernel.org> wrote: > > On Wed, Aug 23, 2023 at 12:06:14PM +0800, Ming Lei wrote: > > > > Looks the issue is more related with vfs, so forward to vfs list. > > > > ----- Forwarded message from Changhui Zhong <czhong@redhat.com> ----- > > > > Date: Wed, 23 Aug 2023 11:17:55 +0800 > > From: Changhui Zhong <czhong@redhat.com> > > To: linux-scsi@vger.kernel.org > > Cc: Ming Lei <ming.lei@redhat.com> > > Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278 > > > > Hello, > > > > triggered below warning issue with branch > > " > > Tree: mainline.kernel.org-clang > > Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > > @ master > > Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d > > Commit Name: v6.5-rc7-18-g89bf6209cad6 > > Kernel information: > > Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of > > git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux > > " > > for more detail,please check > > https://datawarehouse.cki-project.org/kcidb/tests/9232643 > > > > #modprobe scsi_debug virtual_gb=128 > > #echo none > /sys/block/sdb/queue/scheduler > > #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw > > --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G > > --runtime=60 > > Looking at this issue it seems unlikely that this is a vfs bug. > We should see this all over the place and specifically not just on arm64. 
>
> The sequence here seems to be:
>
> echo 3 > /proc/sys/vm/drop_caches
> rmmod scsi_debug > /dev/null 2>&1
>
> [ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
> [ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]
>
> So my money is on some device that gets removed still having an
> increased refcount and pinning the dentry. Immediate suspects would be:
>
> 7882541ca06d ("of/platform: increase refcount of fwnode")
>
> but that part is complete speculation on my part.

BTW, I just saw another panic on 6.5-rc7, again in the scsi_debug test on arm64:

[ 959.371726] sr 50:0:0:0: Attached scsi generic sg1 type 5
[ 959.603145] scsi 48:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 959.603155] scsi 50:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 959.603950] scsi 49:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 959.604052] scsi 49:0:0:0: Power-on or device reset occurred
[ 959.609336] sr 49:0:0:0: [sr1] scsi-1 drive
[ 959.611360] scsi 48:0:0:0: Power-on or device reset occurred
[ 959.614540] Unable to handle kernel paging request at virtual address 65888c2e6fe694d5
[ 959.614544] Mem abort info:
[ 959.614545] ESR = 0x0000000096000004
[ 959.614547] EC = 0x25: DABT (current EL), IL = 32 bits
[ 959.614550] SET = 0, FnV = 0
[ 959.614552] EA = 0, S1PTW = 0
[ 959.614553] FSC = 0x04: level 0 translation fault
[ 959.614555] Data abort info:
[ 959.614556] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 959.614559] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 959.614561] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 959.614563] [65888c2e6fe694d5] address between user and kernel address
ranges [ 959.614566] Internal error: Oops: 0000000096000004 [#1] SMP [ 959.614570] Modules linked in: pktcdvd scsi_debug ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf arm_cmn ipmi_msghandler arm_dmc620_pmu arm_dsu_pmu cppc_cpufreq acpi_tad loop fuse zram xfs nvme crct10dif_ce polyval_ce nvme_core polyval_generic ghash_ce sbsa_gwdt nvme_common ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug] [ 959.614620] CPU: 108 PID: 19529 Comm: check Not tainted 6.5.0-rc7 #1 [ 959.614625] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 [ 959.614627] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 959.614632] pc : d_alloc_parallel+0x140/0x440 [ 959.614641] lr : d_alloc_parallel+0xcc/0x440 [ 959.614646] sp : ffff80008a7d3290 [ 959.614647] x29: ffff80008a7d3290 x28: ffff07ff8230c530 x27: 65888c2e6fe69565 [ 959.614654] x26: ffffcb72eac9e1d0 x25: ffff80008a7d33d8 x24: ffff07ffa0fd3800 [ 959.614659] x23: 00000000000003c0 x22: 000000007a701548 x21: ffffcb72eac9ffd0 [ 959.614664] x20: ffff07ffa0fd35c0 x19: ffffcb72ea6e9600 x18: ffffffffffffffff [ 959.614670] x17: 00000000440fd8e0 x16: 00000000b6431329 x15: ffff80008a7d3360 [ 959.614675] x14: ffff80008a7d3508 x13: ffffcb72e94da6d0 x12: ffff80008a7d334c [ 959.614680] x11: 0000000c7a701548 x10: ffff3c8cb54590b8 x9 : ffffcb72e80832f4 [ 959.614685] x8 : ffff07ffa0fd35c0 x7 : 7473696c5f71725f x6 : 0000000000200008 [ 959.614690] x5 : ffffcb72ea6f4000 x4 : 00000000003d380a x3 : 0000000000000004 [ 959.614696] x2 : ffff80008a7d331c x1 : ffff07ffe1760000 x0 : 0000000000005000 [ 959.614701] Call trace: [ 959.614703] d_alloc_parallel+0x140/0x440 [ 959.614708] __lookup_slow+0x64/0x158 [ 959.614714] lookup_one_len+0xac/0xc8 [ 959.614719] start_creating.part.0+0x88/0x198 [ 959.614725] __debugfs_create_file+0x70/0x230 [ 959.614730] debugfs_create_file+0x34/0x48 [ 959.614734] blk_mq_debugfs_register_hctx+0x154/0x1d0 [ 
959.614740] blk_mq_debugfs_register+0xfc/0x1e0 [ 959.614745] blk_register_queue+0xc0/0x1f0 [ 959.614750] device_add_disk+0x1dc/0x3e0 [ 959.614754] sr_probe+0x2c0/0x368 [ 959.614760] really_probe+0x190/0x3d8 [ 959.614766] __driver_probe_device+0x84/0x180 [ 959.614771] driver_probe_device+0x44/0x120 [ 959.614776] __device_attach_driver+0xc4/0x168 [ 959.614781] bus_for_each_drv+0x8c/0xf0 [ 959.614785] __device_attach+0xa4/0x1c0 [ 959.614790] device_initial_probe+0x1c/0x30 [ 959.614795] bus_probe_device+0xb4/0xc0 [ 959.614799] device_add+0x508/0x6f8 [ 959.614803] scsi_sysfs_add_sdev+0x8c/0x250 [ 959.614809] scsi_add_lun+0x424/0x558 [ 959.614813] scsi_probe_and_add_lun+0x11c/0x430 [ 959.614817] __scsi_scan_target+0xb8/0x258 [ 959.614821] scsi_scan_channel+0xa0/0xb8 [ 959.614825] scsi_scan_host_selected+0x170/0x188 [ 959.614830] store_scan+0x194/0x1a8 [ 959.614835] dev_attr_store+0x20/0x40 [ 959.614840] sysfs_kf_write+0x4c/0x68 [ 959.614845] kernfs_fop_write_iter+0x13c/0x1d8 [ 959.614849] vfs_write+0x1c0/0x310 [ 959.614855] ksys_write+0x78/0x118 [ 959.614859] __arm64_sys_write+0x24/0x38 [ 959.614864] invoke_syscall+0x78/0x100 [ 959.614868] el0_svc_common.constprop.0+0x4c/0xf8 [ 959.614871] do_el0_svc+0x34/0x50 [ 959.614874] el0_svc+0x34/0x108 [ 959.614879] el0t_64_sync_handler+0x120/0x130 [ 959.614884] el0t_64_sync+0x194/0x198 [ 959.614889] Code: 54000088 14000067 f940037b b4000cbb (b8570360) [ 959.614892] ---[ end trace 0000000000000000 ]--- [ 959.614895] Kernel panic - not syncing: Oops: Fatal exception [ 959.614897] SMP: stopping secondary CPUs [ 959.619492] Kernel Offset: 0x4b7267c40000 from 0xffff800080000000 [ 959.619494] PHYS_OFFSET: 0x80000000 [ 959.619496] CPU features: 0x00000010,b80140a1,8841720b [ 959.619498] Memory Limit: none [ 960.040819] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- ^ permalink raw reply [flat|nested] 14+ messages in thread
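For readers comparing the oopses in this thread, the "Mem abort info" fields can be recomputed from the raw ESR value. The following is a minimal sketch, assuming the standard arm64 ESR_ELx layout (EC in bits [31:26], IL in bit 25, and, for data and instruction aborts, the fault status code in bits [5:0]); the `decode_esr` helper name is made up for this example.

```shell
#!/bin/sh
# Split an arm64 ESR_ELx value into the fields printed in the oops.
# Assumed layout: EC = bits [31:26], IL = bit 25, FSC = bits [5:0]
# (the FSC position applies to data and instruction aborts).
decode_esr() {
    esr=$(( $1 ))
    ec=$(( (esr >> 26) & 0x3f ))   # exception class
    il=$(( (esr >> 25) & 0x1 ))    # 1 means a 32-bit instruction
    fsc=$(( esr & 0x3f ))          # fault status code
    printf 'EC=0x%02x IL=%d FSC=0x%02x\n' "$ec" "$il" "$fsc"
}

# ESR from the data abort in d_alloc_parallel above.
decode_esr 0x0000000096000004
# ESR from the instruction abort reported later in this thread.
decode_esr 0x000000008600000e
```

The first call prints `EC=0x25 IL=1 FSC=0x04`, matching "EC = 0x25: DABT (current EL)" and "FSC = 0x04: level 0 translation fault" in the log; the second prints `EC=0x21 IL=1 FSC=0x0e`, matching the IABT with a level 2 permission fault.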
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-08-28 10:43 ` Ming Lei @ 2023-09-13 8:59 ` Yi Zhang 2023-09-16 6:55 ` Baokun Li 2023-09-17 0:35 ` Bagas Sanjaya 0 siblings, 2 replies; 14+ messages in thread From: Yi Zhang @ 2023-09-13 8:59 UTC (permalink / raw) To: Ming Lei Cc: Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, mark.rutland

The issue can still be reproduced on the latest Linux tree [2].
Reproducing it takes about 1000 runs of blktests block/001. Bisection
points to commit [1], but since the issue does not reproduce 100% of the
time, I'm not sure whether it is really the culprit.

[1]
9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery

[2]
[ 2304.536339] scsi 48:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 2304.540805] sr 50:0:0:0: Attached scsi CD-ROM sr3
[ 2304.544574] scsi 48:0:0:0: Power-on or device reset occurred
[ 2304.600645] sr 48:0:0:0: [sr1] scsi-1 drive
[ 2304.616364] scsi 51:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 2304.624639] scsi 51:0:0:0: Power-on or device reset occurred
[ 2304.626634] sr 48:0:0:0: Attached scsi CD-ROM sr1
[ 2304.680537] sr 51:0:0:0: [sr2] scsi-1 drive
[ 2304.706394] sr 51:0:0:0: Attached scsi CD-ROM sr2
[ 2304.746329] scsi 49:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 2304.754569] scsi 49:0:0:0: Power-on or device reset occurred
[ 2304.756302] scsi 50:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 2304.768483] scsi 50:0:0:0: Power-on or device reset occurred
[ 2304.806321] scsi 48:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 2304.810587] sr 49:0:0:0: [sr0] scsi-1 drive
[ 2304.814561] scsi 48:0:0:0: Power-on or device reset occurred
[ 2304.824475] sr 50:0:0:0: [sr3] scsi-1 drive
[ 2304.836384] scsi 51:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 2304.840364] sr 49:0:0:0: Attached scsi CD-ROM sr0
[ 2304.844619] scsi 51:0:0:0: Power-on or device reset occurred
[
2304.850444] sr 50:0:0:0: Attached scsi CD-ROM sr3 [ 2304.874563] sr 48:0:0:0: [sr1] scsi-1 drive [ 2304.900660] sr 51:0:0:0: [sr2] scsi-1 drive [ 2304.901506] sr 48:0:0:0: Attached scsi CD-ROM sr1 [ 2304.926306] sr 51:0:0:0: Attached scsi CD-ROM sr2 [ 2305.056432] scsi 50:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7 [ 2305.056572] scsi 49:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7 [ 2305.064635] scsi 50:0:0:0: Power-on or device reset occurred [ 2305.072821] scsi 49:0:0:0: Power-on or device reset occurred [ 2305.086286] scsi 51:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7 [ 2305.086357] scsi 48:0:0:0: CD-ROM Linux scsi_debug 0191 PQ: 0 ANSI: 7 [ 2305.094521] scsi 51:0:0:0: Power-on or device reset occurred [ 2305.102693] scsi 48:0:0:0: Power-on or device reset occurred [ 2305.128785] sr 50:0:0:0: [sr0] scsi-1 drive [ 2305.134445] sr 49:0:0:0: [sr1] scsi-1 drive [ 2305.154728] sr 50:0:0:0: Attached scsi CD-ROM sr0 [ 2305.158607] sr 51:0:0:0: [sr2] scsi-1 drive [ 2305.160392] sr 49:0:0:0: Attached scsi CD-ROM sr1 [ 2305.164254] sr 48:0:0:0: [sr3] scsi-1 drive [ 2305.184185] sr 51:0:0:0: Attached scsi CD-ROM sr2 [ 2305.190086] sr 48:0:0:0: Attached scsi CD-ROM sr3 [ 2305.555658] Unable to handle kernel execute from non-executable memory at virtual address ffffc61b656052e8 [ 2305.565301] Mem abort info: [ 2305.568086] ESR = 0x000000008600000e [ 2305.571822] EC = 0x21: IABT (current EL), IL = 32 bits [ 2305.577123] SET = 0, FnV = 0 [ 2305.580164] EA = 0, S1PTW = 0 [ 2305.583292] FSC = 0x0e: level 2 permission fault [ 2305.588074] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080731fa0000 [ 2305.594761] [ffffc61b656052e8] pgd=1000080ffffff003, p4d=1000080ffffff003, pud=1000080fffffe003, pmd=0068080732e00f01 [ 2305.605362] Internal error: Oops: 000000008600000e [#1] SMP [ 2305.610922] Modules linked in: scsi_debug sr_mod pktcdvd cdrom rfkill sunrpc vfat fat acpi_ipmi arm_spe_pmu ipmi_ssif ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu arm_dsu_pmu 
cppc_cpufreq loop fuse zram xfs crct10dif_ce ghash_ce nvme sha2_ce nvme_core sha256_arm64 igb sha1_ce ast sbsa_gwdt nvme_common i2c_designware_platform i2c_algo_bit i2c_designware_core xgene_hwmon dm_mod [last unloaded: scsi_debug] [ 2305.647236] CPU: 85 PID: 1 Comm: systemd Kdump: loaded Not tainted 6.6.0-rc1+ #13 [ 2305.654706] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 [ 2305.663997] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2305.670946] pc : in_lookup_hashtable+0x1138/0x2000 [ 2305.675728] lr : rcu_do_batch+0x194/0x488 [ 2305.679727] sp : ffff8000802abe60 [ 2305.683029] x29: ffff8000802abe60 x28: ffffc61b6524c7c0 x27: ffffc61b63452f40 [ 2305.690152] x26: ffff080f37ab6438 x25: 000000000000000a x24: 0000000000000000 [ 2305.697274] x23: 0000000000000002 x22: ffff8000802abec0 x21: ffff080f37ab63c0 [ 2305.704396] x20: ffff07ff8136a580 x19: 0000000000000003 x18: 0000000000000000 [ 2305.711519] x17: ffff41f3d3161000 x16: ffff8000802a8000 x15: 0000000000000000 [ 2305.718641] x14: 0000000000000000 x13: ffff07ffa131802d x12: ffff80008041bb94 [ 2305.725764] x11: 0000000000000040 x10: ffff07ff802622e8 x9 : ffffc61b63452e30 [ 2305.732887] x8 : 000002189dce1780 x7 : ffff07ff8d5c1000 x6 : ffff41f3d3161000 [ 2305.740009] x5 : ffff07ff8136a580 x4 : ffff080f37aba960 x3 : 000000001550a055 [ 2305.747131] x2 : 0000000000000000 x1 : ffffc61b656052e8 x0 : ffff080184c565f0 [ 2305.754254] Call trace: [ 2305.756687] in_lookup_hashtable+0x1138/0x2000 [ 2305.761119] rcu_core+0x268/0x350 [ 2305.764422] rcu_core_si+0x18/0x30 [ 2305.767812] __do_softirq+0x120/0x3d4 [ 2305.771462] ____do_softirq+0x18/0x30 [ 2305.775112] call_on_irq_stack+0x24/0x30 [ 2305.779022] do_softirq_own_stack+0x24/0x38 [ 2305.783192] __irq_exit_rcu+0xfc/0x130 [ 2305.786929] irq_exit_rcu+0x18/0x30 [ 2305.790404] el1_interrupt+0x4c/0xe8 [ 2305.793969] el1h_64_irq_handler+0x18/0x28 [ 2305.798052] el1h_64_irq+0x78/0x80 [ 2305.801441] 
d_same_name+0x50/0xd0 [ 2305.804832] __lookup_slow+0x64/0x158 [ 2305.808482] walk_component+0xe0/0x1a0 [ 2305.812219] path_lookupat+0x7c/0x1b8 [ 2305.815869] filename_lookup+0xb4/0x1b8 [ 2305.819692] vfs_statx+0x94/0x1a8 [ 2305.822995] vfs_fstatat+0xd4/0x110 [ 2305.826471] __do_sys_newfstatat+0x58/0xa8 [ 2305.830556] __arm64_sys_newfstatat+0x28/0x40 [ 2305.834901] invoke_syscall.constprop.0+0x80/0xd8 [ 2305.839592] do_el0_svc+0x48/0xd0 [ 2305.842894] el0_svc+0x4c/0x1c0 [ 2305.846023] el0t_64_sync_handler+0x120/0x130 [ 2305.850367] el0t_64_sync+0x1a4/0x1a8 [ 2305.854017] Code: 00000000 00000000 00000000 00000000 (84c565f1) [ 2305.860098] SMP: stopping secondary CPUs [ 2305.865048] Starting crashdump kernel... [ 2305.868958] Bye! On Mon, Aug 28, 2023 at 6:43 PM Ming Lei <ming.lei@redhat.com> wrote: > > On Wed, Aug 23, 2023 at 4:47 PM Christian Brauner <brauner@kernel.org> wrote: > > > > On Wed, Aug 23, 2023 at 12:06:14PM +0800, Ming Lei wrote: > > > > > > Looks the issue is more related with vfs, so forward to vfs list. 
> > > > > > ----- Forwarded message from Changhui Zhong <czhong@redhat.com> ----- > > > > > > Date: Wed, 23 Aug 2023 11:17:55 +0800 > > > From: Changhui Zhong <czhong@redhat.com> > > > To: linux-scsi@vger.kernel.org > > > Cc: Ming Lei <ming.lei@redhat.com> > > > Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278 > > > > > > Hello, > > > > > > triggered below warning issue with branch > > > " > > > Tree: mainline.kernel.org-clang > > > Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > > > @ master > > > Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d > > > Commit Name: v6.5-rc7-18-g89bf6209cad6 > > > Kernel information: > > > Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of > > > git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux > > > " > > > for more detail,please check > > > https://datawarehouse.cki-project.org/kcidb/tests/9232643 > > > > > > #modprobe scsi_debug virtual_gb=128 > > > #echo none > /sys/block/sdb/queue/scheduler > > > #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw > > > --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G > > > --runtime=60 > > > > Looking at this issue it seems unlikely that this is a vfs bug. > > We should see this all over the place and specifically not just on arm64. 
> > > > The sequence here seems to be: > > > > echo 4 > /proc/sys/vm/drop_caches > > rmmod scsi_debug > /dev/null 3>&1 > > > > [ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278 > > [ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug] > > > > So my money is on some device that gets removed still having an > > increased refcount and pinning the dentry. Immediate suspects would be: > > > > 7882541ca06d ("of/platform: increase refcount of fwnode") > > > > but that part is complete speculation on my part. > > BTW, just saw another panic on 6.5-rc7, still scsi_debug test on arm64: > > [ 959.371726] sr 50:0:0:0: Attached scsi generic sg1 type 5 > [ 959.603145] scsi 48:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 959.603155] scsi 50:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 959.603950] scsi 49:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 959.604052] scsi 49:0:0:0: Power-on or device reset occurred > [ 959.609336] sr 49:0:0:0: [sr1] scsi-1 drive > [ 959.611360] scsi 48:0:0:0: Power-on or device reset occurred > [ 959.614540] Unable to handle kernel paging request at virtual > address 65888c2e6fe694d5 > [ 959.614544] Mem abort info: > [ 959.614545] ESR = 0x0000000096000004 > [ 959.614547] EC = 0x25: DABT (current EL), IL = 32 bits > [ 959.614550] SET = 0, FnV = 0 > [ 959.614552] EA = 0, S1PTW = 0 > [ 959.614553] FSC = 0x04: level 0 translation fault > [ 959.614555] Data abort info: > [ 959.614556] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 > [ 959.614559] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > [ 959.614561] GCS = 0, Overlay = 0, DirtyBit = 0, 
Xs = 0 > [ 959.614563] [65888c2e6fe694d5] address between user and kernel address ranges > [ 959.614566] Internal error: Oops: 0000000096000004 [#1] SMP > [ 959.614570] Modules linked in: pktcdvd scsi_debug ipmi_watchdog > ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu > igb ipmi_devintf arm_cmn ipmi_msghandler arm_dmc620_pmu arm_dsu_pmu > cppc_cpufreq acpi_tad loop fuse zram xfs nvme crct10dif_ce polyval_ce > nvme_core polyval_generic ghash_ce sbsa_gwdt nvme_common ast > onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug] > [ 959.614620] CPU: 108 PID: 19529 Comm: check Not tainted 6.5.0-rc7 #1 > [ 959.614625] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS > F31n (SCP: 2.10.20220810) 09/30/2022 > [ 959.614627] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 959.614632] pc : d_alloc_parallel+0x140/0x440 > [ 959.614641] lr : d_alloc_parallel+0xcc/0x440 > [ 959.614646] sp : ffff80008a7d3290 > [ 959.614647] x29: ffff80008a7d3290 x28: ffff07ff8230c530 x27: 65888c2e6fe69565 > [ 959.614654] x26: ffffcb72eac9e1d0 x25: ffff80008a7d33d8 x24: ffff07ffa0fd3800 > [ 959.614659] x23: 00000000000003c0 x22: 000000007a701548 x21: ffffcb72eac9ffd0 > [ 959.614664] x20: ffff07ffa0fd35c0 x19: ffffcb72ea6e9600 x18: ffffffffffffffff > [ 959.614670] x17: 00000000440fd8e0 x16: 00000000b6431329 x15: ffff80008a7d3360 > [ 959.614675] x14: ffff80008a7d3508 x13: ffffcb72e94da6d0 x12: ffff80008a7d334c > [ 959.614680] x11: 0000000c7a701548 x10: ffff3c8cb54590b8 x9 : ffffcb72e80832f4 > [ 959.614685] x8 : ffff07ffa0fd35c0 x7 : 7473696c5f71725f x6 : 0000000000200008 > [ 959.614690] x5 : ffffcb72ea6f4000 x4 : 00000000003d380a x3 : 0000000000000004 > [ 959.614696] x2 : ffff80008a7d331c x1 : ffff07ffe1760000 x0 : 0000000000005000 > [ 959.614701] Call trace: > [ 959.614703] d_alloc_parallel+0x140/0x440 > [ 959.614708] __lookup_slow+0x64/0x158 > [ 959.614714] lookup_one_len+0xac/0xc8 > [ 959.614719] start_creating.part.0+0x88/0x198 > [ 
959.614725] __debugfs_create_file+0x70/0x230 > [ 959.614730] debugfs_create_file+0x34/0x48 > [ 959.614734] blk_mq_debugfs_register_hctx+0x154/0x1d0 > [ 959.614740] blk_mq_debugfs_register+0xfc/0x1e0 > [ 959.614745] blk_register_queue+0xc0/0x1f0 > [ 959.614750] device_add_disk+0x1dc/0x3e0 > [ 959.614754] sr_probe+0x2c0/0x368 > [ 959.614760] really_probe+0x190/0x3d8 > [ 959.614766] __driver_probe_device+0x84/0x180 > [ 959.614771] driver_probe_device+0x44/0x120 > [ 959.614776] __device_attach_driver+0xc4/0x168 > [ 959.614781] bus_for_each_drv+0x8c/0xf0 > [ 959.614785] __device_attach+0xa4/0x1c0 > [ 959.614790] device_initial_probe+0x1c/0x30 > [ 959.614795] bus_probe_device+0xb4/0xc0 > [ 959.614799] device_add+0x508/0x6f8 > [ 959.614803] scsi_sysfs_add_sdev+0x8c/0x250 > [ 959.614809] scsi_add_lun+0x424/0x558 > [ 959.614813] scsi_probe_and_add_lun+0x11c/0x430 > [ 959.614817] __scsi_scan_target+0xb8/0x258 > [ 959.614821] scsi_scan_channel+0xa0/0xb8 > [ 959.614825] scsi_scan_host_selected+0x170/0x188 > [ 959.614830] store_scan+0x194/0x1a8 > [ 959.614835] dev_attr_store+0x20/0x40 > [ 959.614840] sysfs_kf_write+0x4c/0x68 > [ 959.614845] kernfs_fop_write_iter+0x13c/0x1d8 > [ 959.614849] vfs_write+0x1c0/0x310 > [ 959.614855] ksys_write+0x78/0x118 > [ 959.614859] __arm64_sys_write+0x24/0x38 > [ 959.614864] invoke_syscall+0x78/0x100 > [ 959.614868] el0_svc_common.constprop.0+0x4c/0xf8 > [ 959.614871] do_el0_svc+0x34/0x50 > [ 959.614874] el0_svc+0x34/0x108 > [ 959.614879] el0t_64_sync_handler+0x120/0x130 > [ 959.614884] el0t_64_sync+0x194/0x198 > [ 959.614889] Code: 54000088 14000067 f940037b b4000cbb (b8570360) > [ 959.614892] ---[ end trace 0000000000000000 ]--- > [ 959.614895] Kernel panic - not syncing: Oops: Fatal exception > [ 959.614897] SMP: stopping secondary CPUs > [ 959.619492] Kernel Offset: 0x4b7267c40000 from 0xffff800080000000 > [ 959.619494] PHYS_OFFSET: 0x80000000 > [ 959.619496] CPU features: 0x00000010,b80140a1,8841720b > [ 959.619498] Memory Limit: none > [ 
960.040819] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- > -- Best Regards, Yi Zhang
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-13  8:59 ` Yi Zhang
@ 2023-09-16  6:55   ` Baokun Li
  2023-09-17  9:10     ` Peter Zijlstra
  2023-09-19 15:10     ` Mark Rutland
  2023-09-17  0:35   ` Bagas Sanjaya
  1 sibling, 2 replies; 14+ messages in thread
From: Baokun Li @ 2023-09-16  6:55 UTC
  To: Yi Zhang, Ming Lei, mark.rutland
  Cc: Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel,
      linux-scsi, Changhui Zhong, yangerkun, zhangyi (F), Baokun Li,
      peterz, Kees Cook, chengzhihao

On 2023/9/13 16:59, Yi Zhang wrote:
> The issue still can be reproduced on the latest linux tree[2].
> To reproduce I need to run about 1000 times blktests block/001, and
> bisect shows it was introduced with commit[1], as it was not 100%
> reproduced, not sure if it's the culprit?
>
>
> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery

Hello, everyone!

We have confirmed that merging this patch causes hlist_bl_lock
(aka bit_spin_lock) to fail, which in turn triggers the issue above.

The VFS issue arises as follows:

1. Initial state of the list: bl_head >>> first==dentry2 >>> dentry1
    dentry2->next = dentry1
    dentry2->pprev = head
    dentry1->next = NULL
    dentry1->pprev = dentry2

2. Both dentries are deleted concurrently because the hlist_bl_lock
   protection fails:
```
__hlist_bl_del(dentry2)          __hlist_bl_del(dentry1)
                                 dentry2->next = NULL;
                                 dentry1->next = NULL;
                                 dentry1->pprev = NULL;
head->first = dentry1
dentry1->pprev = head
dentry2->next = NULL;
dentry2->pprev = NULL;
```

3. WARN_ON/BUG_ON is triggered because dentry1 is still on the hlist after being deleted.
    dentry1->next = NULL
    dentry1->pprev = head

The following module verifies that hlist_bl_lock is not working:

mymod.c
```
#include <linux/kallsyms.h>
#include <linux/module.h>
#include <linux/moduleloader.h>
#include <linux/kernel.h>
#include <linux/jiffies.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <linux/cpu.h>
#include <linux/delay.h>
#include <linux/percpu.h>
#include <linux/threads.h>
#include <linux/kthread.h>
#include <linux/kernel_stat.h>
#include <linux/version.h>
#include <linux/slab.h>
#include <linux/smpboot.h>
#include <linux/pagemap.h>
#include <linux/notifier.h>
#include <linux/syscalls.h>
#include <linux/namei.h>
#include <asm/atomic.h>
#include <asm/bitops.h>

/* a and b are only modified under hlist_bl_lock, so they must always match */
static unsigned long long a = 0, b = 0;
static struct hlist_bl_head bl_head;

struct task_struct *Thread1;
struct task_struct *Thread2;
struct task_struct *Thread3;
struct task_struct *Thread4;
struct task_struct *Thread5;
struct task_struct *Thread6;

int increase_ab(void *arg);

int increase_ab(void *arg)
{
	/* exit once mymod_exit() calls kthread_stop(), so rmmod does not hang */
	while (!kthread_should_stop()) {
		hlist_bl_lock(&bl_head);
		if (a != b) {
			/* another thread is inside the critical section: the lock failed */
			pr_err(">>> a = %llu, b = %llu \n", a, b);
			BUG();
			return -1;
		}
		if (a > (ULLONG_MAX - 4096)) {
			a = 0;
			b = 0;
		}
		a++;
		b++;
		hlist_bl_unlock(&bl_head);
		schedule();
	}
	return 0;
}

static int mymod_init(void)
{
	INIT_HLIST_BL_HEAD(&bl_head);

	/* six threads contend for the same bit spinlock */
	Thread1 = kthread_create(increase_ab, NULL, "bl_lock_thread1");
	wake_up_process(Thread1);
	Thread2 = kthread_create(increase_ab, NULL, "bl_lock_thread2");
	wake_up_process(Thread2);
	Thread3 = kthread_create(increase_ab, NULL, "bl_lock_thread3");
	wake_up_process(Thread3);
	Thread4 = kthread_create(increase_ab, NULL, "bl_lock_thread4");
	wake_up_process(Thread4);
	Thread5 = kthread_create(increase_ab, NULL, "bl_lock_thread5");
	wake_up_process(Thread5);
	Thread6 = kthread_create(increase_ab, NULL, "bl_lock_thread6");
	wake_up_process(Thread6);
	return 0;
}

static void mymod_exit(void)
{
	if (Thread1)
		kthread_stop(Thread1);
	if (Thread2)
		kthread_stop(Thread2);
	if (Thread3)
		kthread_stop(Thread3);
	if (Thread4)
		kthread_stop(Thread4);
	if (Thread5)
		kthread_stop(Thread5);
	if (Thread6)
		kthread_stop(Thread6);
}

module_init(mymod_init);
module_exit(mymod_exit);
MODULE_LICENSE("Dual BSD/GPL");
```

After 9257959a6e5b ("locking/atomic: scripts: restructure fallback
ifdeffery") is merged in, the problem shows up when inserting the ko:
```
[root@localhost ~]# insmod mymod.ko
[ 37.994787][ T621] >>> a = 725, b = 724
[ 37.995313][ T621] ------------[ cut here ]------------
[ 37.995951][ T621] kernel BUG at fs/mymod/mymod.c:42!
[ 37.996461][ T621] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 37.997420][ T621] Modules linked in: mymod(E)
[ 37.997891][ T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted: G E 6.4.0-rc2-00034-g9257959a6e5b-dirty #117
[ 37.999038][ T621] Hardware name: linux,dummy-virt (DT)
[ 37.999571][ T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 38.000344][ T621] pc : increase_ab+0xcc/0xe70 [mymod]
[ 38.000882][ T621] lr : increase_ab+0xcc/0xe70 [mymod]
[ 38.001416][ T621] sp : ffff800008b4be40
[ 38.001822][ T621] x29: ffff800008b4be40 x28: 0000000000000000 x27: 0000000000000000
[ 38.002605][ T621] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 38.003385][ T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21: 0000000000000001
[ 38.004174][ T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18: 0000000000000000
[ 38.004955][ T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15: ffffffffffffffff
[ 38.005754][ T621] x14: 0000000000000000 x13: 205d313236542020 x12: ffffd99332175b80
[ 38.006538][ T621] x11: 0000000000000003 x10: 0000000000000001 x9 : ffffd9933022a9d8
[ 38.007325][ T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : ffffd993320b5b40
[ 38.008124][ T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 : 0000000000000000
[ 38.008912][ T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000015
[ 38.009709][ T621] Call trace:
[ 38.010035][ T621] increase_ab+0xcc/0xe70 [mymod]
[ 38.010539][ T621] kthread+0xdc/0xf0
[ 38.010927][ T621] ret_from_fork+0x10/0x20
[ 38.011370][ T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
[ 38.012067][ T621] ---[ end trace 0000000000000000 ]---
[ 38.012603][ T621] Kernel panic - not syncing: Oops - BUG: Fatal exception
[ 38.013311][ T621] SMP: stopping secondary CPUs
[ 38.013818][ T621] Kernel Offset: 0x599328000000 from 0xffff800008000000
[ 38.014508][ T621] PHYS_OFFSET: 0x40000000
[ 38.014933][ T621] CPU features: 0x000000,0220080c,44016203
[ 38.015510][ T621] Memory Limit: none
[ 38.015950][ T621] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
```
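The interleaving described in steps 1 to 3 of the message above can be replayed deterministically in user space. The following is only a minimal sketch under stated assumptions, not kernel code: `struct node`, `struct head`, `del_read()`, `del_write()` and `replay()` are hypothetical names, the lock bit kept in the low bit of the real `hlist_bl_head` pointer is left out, and a deleted node is reset to NULL/NULL as in the description above. Each deletion is split into its loads and its stores so that both deleters can read the list before either one writes, which is what a failed lock would allow.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of an hlist_bl node and head (lock bit omitted). */
struct node {
	struct node *next;
	struct node **pprev;
};

struct head {
	struct node *first;
};

/* What a deleter loads from the node before it starts writing. */
struct snapshot {
	struct node *next;
	struct node **pprev;
};

static struct snapshot del_read(const struct node *n)
{
	struct snapshot s = { n->next, n->pprev };
	return s;
}

/* Unlink using the (possibly stale) snapshot, then reset the node. */
static void del_write(struct node *n, struct snapshot s)
{
	assert(s.pprev != NULL);
	*s.pprev = s.next;
	if (s.next)
		s.next->pprev = s.pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Returns 1 if dentry1 is still linked to the head after both deletions. */
static int replay(int racy)
{
	struct head h;
	struct node d1, d2;
	struct snapshot s1, s2;

	/* Step 1: head -> dentry2 -> dentry1 */
	h.first = &d2;
	d2.pprev = &h.first;
	d2.next = &d1;
	d1.pprev = &d2.next;
	d1.next = NULL;

	if (racy) {
		/* Step 2: both deleters read before either one writes. */
		s2 = del_read(&d2);
		s1 = del_read(&d1);
		del_write(&d1, s1);
		del_write(&d2, s2);	/* re-links dentry1 from stale data */
	} else {
		/* With a working lock the two deletions are serialized. */
		s1 = del_read(&d1);
		del_write(&d1, s1);
		s2 = del_read(&d2);
		del_write(&d2, s2);
	}

	/* Step 3: is dentry1 back on the list? */
	return h.first == &d1 && d1.pprev == &h.first;
}
```

Calling `replay(1)` ends in the state from step 3 (dentry1->next = NULL, dentry1->pprev = head), while `replay(0)` leaves the list empty.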
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-09-16 6:55 ` Baokun Li @ 2023-09-17 9:10 ` Peter Zijlstra 2023-09-17 9:26 ` Peter Zijlstra 2023-09-18 1:10 ` Baokun Li 2023-09-19 15:10 ` Mark Rutland 1 sibling, 2 replies; 14+ messages in thread From: Peter Zijlstra @ 2023-09-17 9:10 UTC (permalink / raw) To: Baokun Li Cc: Yi Zhang, Ming Lei, mark.rutland, Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, yangerkun, zhangyi (F), Kees Cook, chengzhihao On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote: > On 2023/9/13 16:59, Yi Zhang wrote: > > The issue still can be reproduced on the latest linux tree[2]. > > To reproduce I need to run about 1000 times blktests block/001, and > > bisect shows it was introduced with commit[1], as it was not 100% > > reproduced, not sure if it's the culprit? > > > > > > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery > Hello, everyone! > > We have confirmed that the merge-in of this patch caused hlist_bl_lock > (aka, bit_spin_lock) to fail, which in turn triggered the issue above. > [root@localhost ~]# insmod mymod.ko > [ 37.994787][ T621] >>> a = 725, b = 724 > [ 37.995313][ T621] ------------[ cut here ]------------ > [ 37.995951][ T621] kernel BUG at fs/mymod/mymod.c:42! 
> [ 37.996461][ T621] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> [...]

Is this arm64 or something? You seem to have forgotten to mention what
platform you're using.
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-09-17 9:10 ` Peter Zijlstra @ 2023-09-17 9:26 ` Peter Zijlstra 2023-09-18 1:52 ` Baokun Li 2023-09-18 1:10 ` Baokun Li 1 sibling, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2023-09-17 9:26 UTC (permalink / raw) To: Baokun Li Cc: Yi Zhang, Ming Lei, mark.rutland, Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, yangerkun, zhangyi (F), Kees Cook, chengzhihao On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote: > On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote: > > On 2023/9/13 16:59, Yi Zhang wrote: > > > The issue still can be reproduced on the latest linux tree[2]. > > > To reproduce I need to run about 1000 times blktests block/001, and > > > bisect shows it was introduced with commit[1], as it was not 100% > > > reproduced, not sure if it's the culprit? > > > > > > > > > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery > > Hello, everyone! > > > > We have confirmed that the merge-in of this patch caused hlist_bl_lock > > (aka, bit_spin_lock) to fail, which in turn triggered the issue above. > > > [root@localhost ~]# insmod mymod.ko > > [ 37.994787][ T621] >>> a = 725, b = 724 > > [ 37.995313][ T621] ------------[ cut here ]------------ > > [ 37.995951][ T621] kernel BUG at fs/mymod/mymod.c:42! 
> > [ 37.996461][ T621] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> > [...]
>
> Is this arm64 or something? You seem to have forgotten to mention what
> platform you're using.

Is that an LSE or LLSC arm64 ?
Anyway, it seems that ARM64 shouldn't be using the fallback as it does
everything itself.

Mark, can you have a look please? At first glance the
atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h
seems in order..
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-09-17 9:26 ` Peter Zijlstra @ 2023-09-18 1:52 ` Baokun Li 2023-09-18 18:42 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Baokun Li @ 2023-09-18 1:52 UTC (permalink / raw) To: Peter Zijlstra Cc: Yi Zhang, Ming Lei, mark.rutland, Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, yangerkun, zhangyi (F), Kees Cook, chengzhihao, Baokun Li On 2023/9/17 17:26, Peter Zijlstra wrote: > On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote: >> On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote: >>> On 2023/9/13 16:59, Yi Zhang wrote: >>>> The issue still can be reproduced on the latest linux tree[2]. >>>> To reproduce I need to run about 1000 times blktests block/001, and >>>> bisect shows it was introduced with commit[1], as it was not 100% >>>> reproduced, not sure if it's the culprit? >>>> >>>> >>>> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery >>> Hello, everyone! >>> >>> We have confirmed that the merge-in of this patch caused hlist_bl_lock >>> (aka, bit_spin_lock) to fail, which in turn triggered the issue above. >>> [root@localhost ~]# insmod mymod.ko >>> [ 37.994787][ T621] >>> a = 725, b = 724 >>> [ 37.995313][ T621] ------------[ cut here ]------------ >>> [ 37.995951][ T621] kernel BUG at fs/mymod/mymod.c:42! 
>>> [ 37.996461][ T621] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>> [...]
>> Is this arm64 or something? You seem to have forgotten to mention what
>> platform you're using.
> Is that an LSE or LLSC arm64 ?
I'm not sure how to distinguish whether it's LSE or LLSC; here's some info
on the CPU:

$ cat /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
0x00000000481fd010

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        4
Vendor ID:           HiSilicon
BIOS Vendor ID:      HiSilicon
Model:               0
Model name:          Kunpeng-920
BIOS Model name:     Kunpeng 920-4826
Stepping:            0x1
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            49152K
NUMA node0 CPU(s):   0-23
NUMA node1 CPU(s):   24-47
NUMA node2 CPU(s):   48-71
NUMA node3 CPU(s):   72-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

> Anyway, it seems that ARM64 shouldn't be using the fallback as it does
> everything itself.
>
> Mark, can you have a look please? At first glance the
> atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h
> seems in order..
>
We also suspect some implicit mechanism change in
raw_atomic64_fetch_or_acquire. You can use the module above to reproduce
the problem, which should make it easier to locate. I can help reproduce it
and grab some information if you can't reproduce it on your end.

--
With Best Regards,
Baokun Li
.
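On the LSE vs. LL/SC question: as far as I know, the `atomics` entry in the Flags line of the lscpu output above is how Linux reports the ARMv8.1 LSE atomic instructions to user space, so its presence suggests this CPU implements LSE. Whether the running kernel actually uses the LSE variants also depends on how it was built, which the flag alone does not show. The helper below is a small sketch with a hypothetical name (`flags_have_lse()`) that checks a flags string for that token; it matches whole words only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the space-separated feature list contains the whole
 * word "atomics", 0 otherwise. flags_have_lse() is a hypothetical
 * helper, not an existing API.
 */
static int flags_have_lse(const char *flags)
{
	static const char token[] = "atomics";
	const size_t len = sizeof(token) - 1;
	const char *p = flags;

	assert(flags != NULL);

	while ((p = strstr(p, token)) != NULL) {
		/* Accept the match only on word boundaries. */
		int starts = (p == flags) || (p[-1] == ' ');
		int ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends)
			return 1;
		p += len;
	}
	return 0;
}
```

Passing the Flags string from the Kunpeng-920 output above returns 1, since `atomics` is listed there.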
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-09-18 1:52 ` Baokun Li @ 2023-09-18 18:42 ` Darrick J. Wong 0 siblings, 0 replies; 14+ messages in thread From: Darrick J. Wong @ 2023-09-18 18:42 UTC (permalink / raw) To: Baokun Li Cc: Peter Zijlstra, Yi Zhang, Ming Lei, mark.rutland, Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, yangerkun, zhangyi (F), Kees Cook, chengzhihao On Mon, Sep 18, 2023 at 09:52:28AM +0800, Baokun Li wrote: > On 2023/9/17 17:26, Peter Zijlstra wrote: > > On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote: > > > On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote: > > > > On 2023/9/13 16:59, Yi Zhang wrote: > > > > > The issue still can be reproduced on the latest linux tree[2]. > > > > > To reproduce I need to run about 1000 times blktests block/001, and > > > > > bisect shows it was introduced with commit[1], as it was not 100% > > > > > reproduced, not sure if it's the culprit? > > > > > > > > > > > > > > > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery > > > > Hello, everyone! > > > > > > > > We have confirmed that the merge-in of this patch caused hlist_bl_lock > > > > (aka, bit_spin_lock) to fail, which in turn triggered the issue above. > > > > [root@localhost ~]# insmod mymod.ko > > > > [ 37.994787][ T621] >>> a = 725, b = 724 > > > > [ 37.995313][ T621] ------------[ cut here ]------------ > > > > [ 37.995951][ T621] kernel BUG at fs/mymod/mymod.c:42! 
> > > > [ 37.996461][ T621] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> > > > [...]
> > > > [ 38.012067][ T621] ---[ end trace 0000000000000000
]--- > > > Is this arm64 or something? You seem to have forgotten to mention what > > > platform you're using. > > Is that an LSE or LLSC arm64 ? > > I'm not sure how to distinguish if it's LSE or LLSC, here's some info on the > cpu: > > $ cat /sys/devices/system/cpu/cpu0/regs/identification/midr_el1 > 0x00000000481fd010 > > $ lscpu > Architecture: aarch64 > Byte Order: Little Endian > CPU(s): 96 > On-line CPU(s) list: 0-95 > Thread(s) per core: 1 > Core(s) per socket: 48 > Socket(s): 2 > NUMA node(s): 4 > Vendor ID: HiSilicon > BIOS Vendor ID: HiSilicon > Model: 0 > Model name: Kunpeng-920 > BIOS Model name: Kunpeng 920-4826 > Stepping: 0x1 > BogoMIPS: 200.00 > L1d cache: 64K > L1i cache: 64K > L2 cache: 512K > L3 cache: 49152K > NUMA node0 CPU(s): 0-23 > NUMA node1 CPU(s): 24-47 > NUMA node2 CPU(s): 48-71 > NUMA node3 CPU(s): 72-95 > Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp > asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm > > > Anyway, it seems that ARM64 shouldn't be using the fallback as it does > > everything itself. > > > > Mark, can you have a look please? At first glance the > > atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h > > seems in order.. > > > We also suspect some implicit mechanism change in > raw_atomic64_fetch_or_acquire. You can reproduce the problem with the > above mod that can reproduce the problem to make it easier to locate. > I can help reproduce it and grab some information if you can't reproduce > it on your end. FWIW this looks a lot like the crash I reported last week: https://lore.kernel.org/linux-fsdevel/ZQep0OR0uMmR%2Fwg3@dread.disaster.area/T/#t Also arm64, but virtualized. I /think/ the host is some Ampere box, though I have no idea what kind since it's just some Oracle Cloud A1 instance. The internet claims "Ampere Altra" processors[1]. 
# lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 2
  On-line CPU(s) list:  0,1
Vendor ID:              ARM
  Model name:           Neoverse-N1
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           r3p1
    BogoMIPS:           50.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
NUMA:
  NUMA node(s):         1
  NUMA node0 CPU(s):    0,1
Vulnerabilities:
  Gather data sampling: Not affected
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec rstack overflow: Not affected
  Spec store bypass:    Vulnerable
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; CSV2, but not BHB
  Srbds:                Not affected
  Tsx async abort:      Not affected

[1] https://www.oracle.com/cloud/compute/arm/

--D

> --
> With Best Regards,
> Baokun Li
> .
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-09-17 9:10 ` Peter Zijlstra 2023-09-17 9:26 ` Peter Zijlstra @ 2023-09-18 1:10 ` Baokun Li 2023-09-18 10:20 ` Yi Zhang 1 sibling, 1 reply; 14+ messages in thread From: Baokun Li @ 2023-09-18 1:10 UTC (permalink / raw) To: Peter Zijlstra Cc: Yi Zhang, Ming Lei, mark.rutland, Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, yangerkun, zhangyi (F), Kees Cook, chengzhihao, Baokun Li On 2023/9/17 17:10, Peter Zijlstra wrote: > On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote: >> On 2023/9/13 16:59, Yi Zhang wrote: >>> The issue still can be reproduced on the latest linux tree[2]. >>> To reproduce I need to run about 1000 times blktests block/001, and >>> bisect shows it was introduced with commit[1], as it was not 100% >>> reproduced, not sure if it's the culprit? >>> >>> >>> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery >> Hello, everyone! >> >> We have confirmed that the merge-in of this patch caused hlist_bl_lock >> (aka, bit_spin_lock) to fail, which in turn triggered the issue above. >> [root@localhost ~]# insmod mymod.ko >> [ 37.994787][ T621] >>> a = 725, b = 724 >> [ 37.995313][ T621] ------------[ cut here ]------------ >> [ 37.995951][ T621] kernel BUG at fs/mymod/mymod.c:42! 
>> [ 37.996461][ T621] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>> [...]
> Is this arm64 or something? You seem to have forgotten to mention what
> platform you're using.
>
Sorry for the late reply.
We tested both x86 and arm64, and the problem is only encountered under arm64.

--
With Best Regards,
Baokun Li
.
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-09-18 1:10 ` Baokun Li @ 2023-09-18 10:20 ` Yi Zhang 0 siblings, 0 replies; 14+ messages in thread From: Yi Zhang @ 2023-09-18 10:20 UTC (permalink / raw) To: Baokun Li Cc: Peter Zijlstra, Ming Lei, mark.rutland, Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, yangerkun, zhangyi (F), Kees Cook, chengzhihao On Mon, Sep 18, 2023 at 9:10 AM Baokun Li <libaokun1@huawei.com> wrote: > > On 2023/9/17 17:10, Peter Zijlstra wrote: > > On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote: > >> On 2023/9/13 16:59, Yi Zhang wrote: > >>> The issue still can be reproduced on the latest linux tree[2]. > >>> To reproduce I need to run about 1000 times blktests block/001, and > >>> bisect shows it was introduced with commit[1], as it was not 100% > >>> reproduced, not sure if it's the culprit? > >>> > >>> > >>> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery > >> Hello, everyone! > >> > >> We have confirmed that the merge-in of this patch caused hlist_bl_lock > >> (aka, bit_spin_lock) to fail, which in turn triggered the issue above. > >> [root@localhost ~]# insmod mymod.ko > >> [ 37.994787][ T621] >>> a = 725, b = 724 > >> [ 37.995313][ T621] ------------[ cut here ]------------ > >> [ 37.995951][ T621] kernel BUG at fs/mymod/mymod.c:42! 
> >> [ 37.996461][ T621] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >> [...]
> > Is this arm64 or something? You seem to have forgotten to mention what
> > platform you're using.
> > > Sorry for the late reply. > We tested both x86 and arm64, and the problem is only encountered under > arm64. Yeah, my reproduced environment is also aarch64. > > -- > With Best Regards, > Baokun Li > . > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-16  6:55 ` Baokun Li
  2023-09-17  9:10 ` Peter Zijlstra
@ 2023-09-19 15:10 ` Mark Rutland
  1 sibling, 0 replies; 14+ messages in thread
From: Mark Rutland @ 2023-09-19 15:10 UTC (permalink / raw)
  To: Baokun Li
  Cc: Yi Zhang, Ming Lei, Christian Brauner, linux-fsdevel,
	Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong,
	yangerkun, zhangyi (F), peterz, Kees Cook, chengzhihao,
	Will Deacon

On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> On 2023/9/13 16:59, Yi Zhang wrote:
> > The issue still can be reproduced on the latest linux tree[2].
> > To reproduce I need to run about 1000 times blktests block/001, and
> > bisect shows it was introduced with commit[1], as it was not 100%
> > reproduced, not sure if it's the culprit?
> >
> >
> > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> Hello, everyone!
>
> We have confirmed that the merge-in of this patch caused hlist_bl_lock
> (aka, bit_spin_lock) to fail, which in turn triggered the issue above.

Thanks for this!

I believe I know what the issue is.

I took a look at the generated assembly for hlist_bl_lock() and
hlist_bl_unlock(), and for the latter I see a plain store rather than a
store-release as was intended.

I believe that in 9257959a6e5b, I messed up the fallback logic for
atomic*_set_release():

| static __always_inline void
| raw_atomic64_set_release(atomic64_t *v, s64 i)
| {
| #if defined(arch_atomic64_set_release)
|         arch_atomic64_set_release(v, i);
| #elif defined(arch_atomic64_set)
|         arch_atomic64_set(v, i);
| #else
|         if (__native_word(atomic64_t)) {
|                 smp_store_release(&(v)->counter, i);
|         } else {
|                 __atomic_release_fence();
|                 raw_atomic64_set(v, i);
|         }
| #endif
| }

On arm64 we want to use smp_store_release(), and don't provide
arch_atomic64_set_release(). Unfortunately we *do* provide
arch_atomic64_set(), and the ifdeffery above will choose that in preference.

Prior to that commit, the ifdeffery would do what we want:

| #ifndef arch_atomic64_set_release
| static __always_inline void
| arch_atomic64_set_release(atomic64_t *v, s64 i)
| {
|         if (__native_word(atomic64_t)) {
|                 smp_store_release(&(v)->counter, i);
|         } else {
|                 __atomic_release_fence();
|                 arch_atomic64_set(v, i);
|         }
| }
| #define arch_atomic64_set_release arch_atomic64_set_release
| #endif

That explains the lock going wrong -- we lose the RELEASE semantic on
hlist_bl_unlock(), and so loads and stores within the critical section aren't
guaranteed to be visible to the next hlist_bl_lock(). On x86 this happens to
work because of TSO.

I'm working on fixing that now; I'll try to have a patch shortly.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] 2023-09-13 8:59 ` Yi Zhang 2023-09-16 6:55 ` Baokun Li @ 2023-09-17 0:35 ` Bagas Sanjaya 2023-09-29 13:24 ` Linux regression tracking #update (Thorsten Leemhuis) 1 sibling, 1 reply; 14+ messages in thread From: Bagas Sanjaya @ 2023-09-17 0:35 UTC (permalink / raw) To: Yi Zhang, Ming Lei Cc: Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong, Mark Rutland, Peter Zijlstra, Kees Cook, Linux Regressions [-- Attachment #1: Type: text/plain, Size: 7266 bytes --] On Wed, Sep 13, 2023 at 04:59:31PM +0800, Yi Zhang wrote: > The issue still can be reproduced on the latest linux tree[2]. > To reproduce I need to run about 1000 times blktests block/001, and > bisect shows it was introduced with commit[1], as it was not 100% > reproduced, not sure if it's the culprit? > > > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery > [2] > [ 2304.536339] scsi 48:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2304.540805] sr 50:0:0:0: Attached scsi CD-ROM sr3 > [ 2304.544574] scsi 48:0:0:0: Power-on or device reset occurred > [ 2304.600645] sr 48:0:0:0: [sr1] scsi-1 drive > [ 2304.616364] scsi 51:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2304.624639] scsi 51:0:0:0: Power-on or device reset occurred > [ 2304.626634] sr 48:0:0:0: Attached scsi CD-ROM sr1 > [ 2304.680537] sr 51:0:0:0: [sr2] scsi-1 drive > [ 2304.706394] sr 51:0:0:0: Attached scsi CD-ROM sr2 > [ 2304.746329] scsi 49:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2304.754569] scsi 49:0:0:0: Power-on or device reset occurred > [ 2304.756302] scsi 50:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2304.768483] scsi 50:0:0:0: Power-on or device reset occurred > [ 2304.806321] scsi 48:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2304.810587] sr 49:0:0:0: [sr0] scsi-1 drive > [ 2304.814561] scsi 48:0:0:0: 
Power-on or device reset occurred > [ 2304.824475] sr 50:0:0:0: [sr3] scsi-1 drive > [ 2304.836384] scsi 51:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2304.840364] sr 49:0:0:0: Attached scsi CD-ROM sr0 > [ 2304.844619] scsi 51:0:0:0: Power-on or device reset occurred > [ 2304.850444] sr 50:0:0:0: Attached scsi CD-ROM sr3 > [ 2304.874563] sr 48:0:0:0: [sr1] scsi-1 drive > [ 2304.900660] sr 51:0:0:0: [sr2] scsi-1 drive > [ 2304.901506] sr 48:0:0:0: Attached scsi CD-ROM sr1 > [ 2304.926306] sr 51:0:0:0: Attached scsi CD-ROM sr2 > [ 2305.056432] scsi 50:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2305.056572] scsi 49:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2305.064635] scsi 50:0:0:0: Power-on or device reset occurred > [ 2305.072821] scsi 49:0:0:0: Power-on or device reset occurred > [ 2305.086286] scsi 51:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2305.086357] scsi 48:0:0:0: CD-ROM Linux scsi_debug > 0191 PQ: 0 ANSI: 7 > [ 2305.094521] scsi 51:0:0:0: Power-on or device reset occurred > [ 2305.102693] scsi 48:0:0:0: Power-on or device reset occurred > [ 2305.128785] sr 50:0:0:0: [sr0] scsi-1 drive > [ 2305.134445] sr 49:0:0:0: [sr1] scsi-1 drive > [ 2305.154728] sr 50:0:0:0: Attached scsi CD-ROM sr0 > [ 2305.158607] sr 51:0:0:0: [sr2] scsi-1 drive > [ 2305.160392] sr 49:0:0:0: Attached scsi CD-ROM sr1 > [ 2305.164254] sr 48:0:0:0: [sr3] scsi-1 drive > [ 2305.184185] sr 51:0:0:0: Attached scsi CD-ROM sr2 > [ 2305.190086] sr 48:0:0:0: Attached scsi CD-ROM sr3 > [ 2305.555658] Unable to handle kernel execute from non-executable > memory at virtual address ffffc61b656052e8 > [ 2305.565301] Mem abort info: > [ 2305.568086] ESR = 0x000000008600000e > [ 2305.571822] EC = 0x21: IABT (current EL), IL = 32 bits > [ 2305.577123] SET = 0, FnV = 0 > [ 2305.580164] EA = 0, S1PTW = 0 > [ 2305.583292] FSC = 0x0e: level 2 permission fault > [ 2305.588074] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080731fa0000 > [ 2305.594761] 
[ffffc61b656052e8] pgd=1000080ffffff003, > p4d=1000080ffffff003, pud=1000080fffffe003, pmd=0068080732e00f01 > [ 2305.605362] Internal error: Oops: 000000008600000e [#1] SMP > [ 2305.610922] Modules linked in: scsi_debug sr_mod pktcdvd cdrom > rfkill sunrpc vfat fat acpi_ipmi arm_spe_pmu ipmi_ssif ipmi_devintf > ipmi_msghandler arm_cmn arm_dmc620_pmu arm_dsu_pmu cppc_cpufreq loop > fuse zram xfs crct10dif_ce ghash_ce nvme sha2_ce nvme_core > sha256_arm64 igb sha1_ce ast sbsa_gwdt nvme_common > i2c_designware_platform i2c_algo_bit i2c_designware_core xgene_hwmon > dm_mod [last unloaded: scsi_debug] > [ 2305.647236] CPU: 85 PID: 1 Comm: systemd Kdump: loaded Not tainted > 6.6.0-rc1+ #13 > [ 2305.654706] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS > F31n (SCP: 2.10.20220810) 09/30/2022 > [ 2305.663997] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 2305.670946] pc : in_lookup_hashtable+0x1138/0x2000 > [ 2305.675728] lr : rcu_do_batch+0x194/0x488 > [ 2305.679727] sp : ffff8000802abe60 > [ 2305.683029] x29: ffff8000802abe60 x28: ffffc61b6524c7c0 x27: ffffc61b63452f40 > [ 2305.690152] x26: ffff080f37ab6438 x25: 000000000000000a x24: 0000000000000000 > [ 2305.697274] x23: 0000000000000002 x22: ffff8000802abec0 x21: ffff080f37ab63c0 > [ 2305.704396] x20: ffff07ff8136a580 x19: 0000000000000003 x18: 0000000000000000 > [ 2305.711519] x17: ffff41f3d3161000 x16: ffff8000802a8000 x15: 0000000000000000 > [ 2305.718641] x14: 0000000000000000 x13: ffff07ffa131802d x12: ffff80008041bb94 > [ 2305.725764] x11: 0000000000000040 x10: ffff07ff802622e8 x9 : ffffc61b63452e30 > [ 2305.732887] x8 : 000002189dce1780 x7 : ffff07ff8d5c1000 x6 : ffff41f3d3161000 > [ 2305.740009] x5 : ffff07ff8136a580 x4 : ffff080f37aba960 x3 : 000000001550a055 > [ 2305.747131] x2 : 0000000000000000 x1 : ffffc61b656052e8 x0 : ffff080184c565f0 > [ 2305.754254] Call trace: > [ 2305.756687] in_lookup_hashtable+0x1138/0x2000 > [ 2305.761119] rcu_core+0x268/0x350 > [ 2305.764422] 
rcu_core_si+0x18/0x30 > [ 2305.767812] __do_softirq+0x120/0x3d4 > [ 2305.771462] ____do_softirq+0x18/0x30 > [ 2305.775112] call_on_irq_stack+0x24/0x30 > [ 2305.779022] do_softirq_own_stack+0x24/0x38 > [ 2305.783192] __irq_exit_rcu+0xfc/0x130 > [ 2305.786929] irq_exit_rcu+0x18/0x30 > [ 2305.790404] el1_interrupt+0x4c/0xe8 > [ 2305.793969] el1h_64_irq_handler+0x18/0x28 > [ 2305.798052] el1h_64_irq+0x78/0x80 > [ 2305.801441] d_same_name+0x50/0xd0 > [ 2305.804832] __lookup_slow+0x64/0x158 > [ 2305.808482] walk_component+0xe0/0x1a0 > [ 2305.812219] path_lookupat+0x7c/0x1b8 > [ 2305.815869] filename_lookup+0xb4/0x1b8 > [ 2305.819692] vfs_statx+0x94/0x1a8 > [ 2305.822995] vfs_fstatat+0xd4/0x110 > [ 2305.826471] __do_sys_newfstatat+0x58/0xa8 > [ 2305.830556] __arm64_sys_newfstatat+0x28/0x40 > [ 2305.834901] invoke_syscall.constprop.0+0x80/0xd8 > [ 2305.839592] do_el0_svc+0x48/0xd0 > [ 2305.842894] el0_svc+0x4c/0x1c0 > [ 2305.846023] el0t_64_sync_handler+0x120/0x130 > [ 2305.850367] el0t_64_sync+0x1a4/0x1a8 > [ 2305.854017] Code: 00000000 00000000 00000000 00000000 (84c565f1) > [ 2305.860098] SMP: stopping secondary CPUs > [ 2305.865048] Starting crashdump kernel... > [ 2305.868958] Bye! > > Please don't top-post; reply inline with appropriate context instead. Anyway, thanks for bisecting this regression. I'm adding it to regzbot: #regzbot ^introduced: 9257959a6e5b4f #regzbot title: restructuring atomic locking conditionals causes vfs dentry lock protection failure -- An old man doll... just what I always wanted! - Clara [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-17  0:35 ` Bagas Sanjaya
@ 2023-09-29 13:24 ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 14+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-09-29 13:24 UTC (permalink / raw)
  To: Linux Regressions; +Cc: linux-fsdevel, linux-kernel, linux-scsi

On 17.09.23 02:35, Bagas Sanjaya wrote:
> On Wed, Sep 13, 2023 at 04:59:31PM +0800, Yi Zhang wrote:
>> The issue still can be reproduced on the latest linux tree[2].
>> To reproduce I need to run about 1000 times blktests block/001, and
>> bisect shows it was introduced with commit[1], as it was not 100%
>> reproduced, not sure if it's the culprit?
>
> Anyway, thanks for bisecting this regression. I'm adding it to regzbot:
>
> #regzbot ^introduced: 9257959a6e5b4f
> #regzbot title: restructuring atomic locking conditionals causes vfs dentry lock protection failure

#regzbot fix: 6d2779ecaeb56f
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-09-29 13:24 UTC | newest]

Thread overview: 14+ messages
2023-08-23  4:06 [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] Ming Lei
2023-08-23  8:47 ` Christian Brauner
2023-08-28 10:43 ` Ming Lei
2023-09-13  8:59 ` Yi Zhang
2023-09-16  6:55 ` Baokun Li
2023-09-17  9:10 ` Peter Zijlstra
2023-09-17  9:26 ` Peter Zijlstra
2023-09-18  1:52 ` Baokun Li
2023-09-18 18:42 ` Darrick J. Wong
2023-09-18  1:10 ` Baokun Li
2023-09-18 10:20 ` Yi Zhang
2023-09-19 15:10 ` Mark Rutland
2023-09-17  0:35 ` Bagas Sanjaya
2023-09-29 13:24 ` Linux regression tracking #update (Thorsten Leemhuis)