* blktests failures with v6.14-rc1 kernel
From: Shinichiro Kawasaki @ 2025-02-07 1:24 UTC
To: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
Hi all,
I ran the latest blktests (git hash: 67aff550bd52) with the v6.14-rc1 kernel.
I observed 5 failures, listed below. Compared with the previous report for
the v6.13 kernel [1], one new failure was observed: zbd/009.
[1] https://lore.kernel.org/linux-nvme/rv3w2zcno7n3bgdy2ghxmedsqf23ptmakvjerbhopgxjsvgzmo@ioece7dyg2og/
List of failures
================
#1: block/002
#2: nvme/037 (fc transport)
#3: nvme/041 (fc transport)
#4: nvme/058 (loop transport)
#5: zbd/009 (new)
Two failures observed with the v6.13 kernel are no longer observed with v6.14-rc1.
Failures no longer observed
===========================
#1: block/001
It looks resolved by fixes in the v6.14-rc1 kernel.
#2: throtl/001 (CKI project, s390 arch)
I was not able to find blktests runs by the CKI project with the v6.14-rc1
kernel.
Failure description
===================
#1: block/002
This test case fails with a lockdep WARN "possible circular locking
dependency detected". The lockdep splat shows q->q_usage_counter as one
of the involved locks. It was observed with the v6.13-rc2 kernel [2] and is
still observed with the v6.14-rc1 kernel. It needs further debugging.
[2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/
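For reference, the pattern behind this class of warning is two (or more)
code paths that acquire the same set of locks in inconsistent orders, so
that each path can end up waiting for a lock the other holds. Below is a
minimal userspace sketch of the two-lock case -- purely illustrative; the
splats in this report involve longer chains, and none of these names come
from the kernel code paths involved:

#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Path 1: lock_a, then lock_b. */
static void *path1(void *arg)
{
	pthread_mutex_lock(&lock_a);
	pthread_mutex_lock(&lock_b);	/* may wait while path2 holds lock_b */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

/* Path 2: lock_b, then lock_a -- the inverted order closes the cycle. */
static void *path2(void *arg)
{
	pthread_mutex_lock(&lock_b);
	pthread_mutex_lock(&lock_a);	/* may wait while path1 holds lock_a */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, path1, NULL);
	pthread_create(&t2, NULL, path2, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;	/* may never get here: the two paths can deadlock */
}

lockdep reports such inversions from the recorded lock-acquisition order
even when a run happens not to deadlock, which is why the test case fails
without an actual hang.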
#2: nvme/037 (fc transport)
#3: nvme/041 (fc transport)
These two test cases fail with the fc transport. Refer to the report for the
v6.12 kernel [3].
[3] https://lore.kernel.org/linux-nvme/6crydkodszx5vq4ieox3jjpwkxtu7mhbohypy24awlo5w7f4k6@to3dcng24rd4/
#4: nvme/058 (loop transport)
This test case occasionally hangs with an Oops and a KASAN null-ptr-deref.
It was first reported with the v6.13 kernel [1]. A candidate fix patch was
posted [4] (thanks!), but it needs further work.
[4] https://lore.kernel.org/linux-nvme/20250124082505.140258-1-hare@kernel.org/
#5: zbd/009 (new)
This test case fails with a lockdep WARN "possible circular locking
dependency detected" [5]. The lockdep splat shows q->q_usage_counter as one
of the involved locks, which it has in common with the block/002 failure.
It needs further debugging.
[5] kernel message during zbd/009 run
[ 204.099296] [ T1004] run blktests zbd/009 at 2025-02-07 10:01:36
[ 204.155021] [ T1040] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
[ 204.553613] [ T1041] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[ 204.554438] [ T1041] scsi host9: scsi_debug: version 0191 [20210520]
dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
[ 204.558331] [ T1041] scsi 9:0:0:0: Direct-Access-ZBC Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 204.560269] [ C2] scsi 9:0:0:0: Power-on or device reset occurred
[ 204.562871] [ T1041] sd 9:0:0:0: Attached scsi generic sg3 type 20
[ 204.563013] [ T100] sd 9:0:0:0: [sdd] Host-managed zoned block device
[ 204.564518] [ T100] sd 9:0:0:0: [sdd] 262144 4096-byte logical blocks: (1.07 GB/1.00 GiB)
[ 204.565477] [ T100] sd 9:0:0:0: [sdd] Write Protect is off
[ 204.565948] [ T100] sd 9:0:0:0: [sdd] Mode Sense: 5b 00 10 08
[ 204.566245] [ T100] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 204.567453] [ T100] sd 9:0:0:0: [sdd] permanent stream count = 5
[ 204.568276] [ T100] sd 9:0:0:0: [sdd] Preferred minimum I/O size 4096 bytes
[ 204.569067] [ T100] sd 9:0:0:0: [sdd] Optimal transfer size 4194304 bytes
[ 204.571080] [ T100] sd 9:0:0:0: [sdd] 256 zones of 1024 logical blocks
[ 204.593822] [ T100] sd 9:0:0:0: [sdd] Attached SCSI disk
[ 204.901514] [ T1067] BTRFS: device fsid 15196e63-e303-48ed-9dcb-9ec397479c42 devid 1 transid 8 /dev/sdd (8:48) scanned by mount (1067)
[ 204.910330] [ T1067] BTRFS info (device sdd): first mount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
[ 204.913129] [ T1067] BTRFS info (device sdd): using crc32c (crc32c-generic) checksum algorithm
[ 204.914856] [ T1067] BTRFS info (device sdd): using free-space-tree
[ 204.925816] [ T1067] BTRFS info (device sdd): host-managed zoned block device /dev/sdd, 256 zones of 4194304 bytes
[ 204.929320] [ T1067] BTRFS info (device sdd): zoned mode enabled with zone size 4194304
[ 204.935403] [ T1067] BTRFS info (device sdd): checking UUID tree
[ 215.637712] [ T1103] BTRFS info (device sdd): last unmount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
[ 215.762293] [ T1110] ======================================================
[ 215.763636] [ T1110] WARNING: possible circular locking dependency detected
[ 215.765092] [ T1110] 6.14.0-rc1 #252 Not tainted
[ 215.766271] [ T1110] ------------------------------------------------------
[ 215.767615] [ T1110] modprobe/1110 is trying to acquire lock:
[ 215.768999] [ T1110] ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60
[ 215.770700] [ T1110]
but task is already holding lock:
[ 215.773077] [ T1110] ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
[ 215.774685] [ T1110]
which lock already depends on the new lock.
[ 215.778184] [ T1110]
the existing dependency chain (in reverse order) is:
[ 215.780532] [ T1110]
-> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}:
[ 215.782937] [ T1110] blk_queue_enter+0x3d9/0x500
[ 215.784175] [ T1110] blk_mq_alloc_request+0x47d/0x8e0
[ 215.785434] [ T1110] scsi_execute_cmd+0x14f/0xb80
[ 215.786662] [ T1110] sd_zbc_do_report_zones+0x1c1/0x470
[ 215.787989] [ T1110] sd_zbc_report_zones+0x362/0xd60
[ 215.789222] [ T1110] blkdev_report_zones+0x1b1/0x2e0
[ 215.790448] [ T1110] btrfs_get_dev_zones+0x215/0x7e0 [btrfs]
[ 215.791887] [ T1110] btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs]
[ 215.793342] [ T1110] btrfs_make_block_group+0x36b/0x870 [btrfs]
[ 215.794752] [ T1110] btrfs_create_chunk+0x147d/0x2320 [btrfs]
[ 215.796150] [ T1110] btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs]
[ 215.797474] [ T1110] start_transaction+0xce6/0x1620 [btrfs]
[ 215.798858] [ T1110] btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs]
[ 215.800334] [ T1110] kthread+0x39d/0x750
[ 215.801479] [ T1110] ret_from_fork+0x30/0x70
[ 215.802662] [ T1110] ret_from_fork_asm+0x1a/0x30
[ 215.803902] [ T1110]
-> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}:
[ 215.805993] [ T1110] down_read+0x9b/0x470
[ 215.807088] [ T1110] btrfs_map_block+0x2ce/0x2ce0 [btrfs]
[ 215.808366] [ T1110] btrfs_submit_chunk+0x2d4/0x16c0 [btrfs]
[ 215.809687] [ T1110] btrfs_submit_bbio+0x16/0x30 [btrfs]
[ 215.810983] [ T1110] btree_write_cache_pages+0xb5a/0xf90 [btrfs]
[ 215.812295] [ T1110] do_writepages+0x17f/0x7b0
[ 215.813416] [ T1110] __writeback_single_inode+0x114/0xb00
[ 215.814575] [ T1110] writeback_sb_inodes+0x52b/0xe00
[ 215.815717] [ T1110] wb_writeback+0x1a7/0x800
[ 215.816924] [ T1110] wb_workfn+0x12a/0xbd0
[ 215.817951] [ T1110] process_one_work+0x85a/0x1460
[ 215.818985] [ T1110] worker_thread+0x5e2/0xfc0
[ 215.820013] [ T1110] kthread+0x39d/0x750
[ 215.821000] [ T1110] ret_from_fork+0x30/0x70
[ 215.822010] [ T1110] ret_from_fork_asm+0x1a/0x30
[ 215.822988] [ T1110]
-> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}:
[ 215.824855] [ T1110] __mutex_lock+0x1aa/0x1360
[ 215.825856] [ T1110] btree_write_cache_pages+0x252/0xf90 [btrfs]
[ 215.827089] [ T1110] do_writepages+0x17f/0x7b0
[ 215.828027] [ T1110] __writeback_single_inode+0x114/0xb00
[ 215.829141] [ T1110] writeback_sb_inodes+0x52b/0xe00
[ 215.830129] [ T1110] wb_writeback+0x1a7/0x800
[ 215.831084] [ T1110] wb_workfn+0x12a/0xbd0
[ 215.831950] [ T1110] process_one_work+0x85a/0x1460
[ 215.832862] [ T1110] worker_thread+0x5e2/0xfc0
[ 215.833826] [ T1110] kthread+0x39d/0x750
[ 215.834715] [ T1110] ret_from_fork+0x30/0x70
[ 215.835669] [ T1110] ret_from_fork_asm+0x1a/0x30
[ 215.836594] [ T1110]
-> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
[ 215.838347] [ T1110] __lock_acquire+0x2f52/0x5ea0
[ 215.839258] [ T1110] lock_acquire+0x1b1/0x540
[ 215.840156] [ T1110] __flush_work+0x3ac/0xb60
[ 215.841041] [ T1110] wb_shutdown+0x15b/0x1f0
[ 215.841915] [ T1110] bdi_unregister+0x172/0x5b0
[ 215.842793] [ T1110] del_gendisk+0x841/0xa20
[ 215.843724] [ T1110] sd_remove+0x85/0x130
[ 215.844660] [ T1110] device_release_driver_internal+0x368/0x520
[ 215.845757] [ T1110] bus_remove_device+0x1f1/0x3f0
[ 215.846755] [ T1110] device_del+0x3bd/0x9c0
[ 215.847712] [ T1110] __scsi_remove_device+0x272/0x340
[ 215.848727] [ T1110] scsi_forget_host+0xf7/0x170
[ 215.849710] [ T1110] scsi_remove_host+0xd2/0x2a0
[ 215.850682] [ T1110] sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
[ 215.851788] [ T1110] device_release_driver_internal+0x368/0x520
[ 215.852853] [ T1110] bus_remove_device+0x1f1/0x3f0
[ 215.853885] [ T1110] device_del+0x3bd/0x9c0
[ 215.854840] [ T1110] device_unregister+0x13/0xa0
[ 215.855850] [ T1110] sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
[ 215.856947] [ T1110] scsi_debug_exit+0x17/0x70 [scsi_debug]
[ 215.857968] [ T1110] __do_sys_delete_module.isra.0+0x321/0x520
[ 215.858999] [ T1110] do_syscall_64+0x93/0x180
[ 215.859930] [ T1110] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 215.860974] [ T1110]
other info that might help us debug this:
[ 215.863317] [ T1110] Chain exists of:
(work_completion)(&(&wb->dwork)->work) --> &fs_info->dev_replace.rwsem --> &q->q_usage_counter(queue)#16
[ 215.866277] [ T1110] Possible unsafe locking scenario:
[ 215.867927] [ T1110] CPU0 CPU1
[ 215.868904] [ T1110] ---- ----
[ 215.869880] [ T1110] lock(&q->q_usage_counter(queue)#16);
[ 215.870878] [ T1110] lock(&fs_info->dev_replace.rwsem);
[ 215.872075] [ T1110] lock(&q->q_usage_counter(queue)#16);
[ 215.873274] [ T1110] lock((work_completion)(&(&wb->dwork)->work));
[ 215.874332] [ T1110]
*** DEADLOCK ***
[ 215.876625] [ T1110] 5 locks held by modprobe/1110:
[ 215.877579] [ T1110] #0: ffff88811f7bc108 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520
[ 215.879029] [ T1110] #1: ffff8881022ee0e0 (&shost->scan_mutex){+.+.}-{4:4}, at: scsi_remove_host+0x20/0x2a0
[ 215.880402] [ T1110] #2: ffff88811b4c4378 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520
[ 215.881861] [ T1110] #3: ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
[ 215.883302] [ T1110] #4: ffffffffa3284360 (rcu_read_lock){....}-{1:3}, at: __flush_work+0xda/0xb60
[ 215.884667] [ T1110]
stack backtrace:
[ 215.886418] [ T1110] CPU: 0 UID: 0 PID: 1110 Comm: modprobe Not tainted 6.14.0-rc1 #252
[ 215.886422] [ T1110] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[ 215.886425] [ T1110] Call Trace:
[ 215.886430] [ T1110] <TASK>
[ 215.886432] [ T1110] dump_stack_lvl+0x6a/0x90
[ 215.886440] [ T1110] print_circular_bug.cold+0x1e0/0x274
[ 215.886445] [ T1110] check_noncircular+0x306/0x3f0
[ 215.886449] [ T1110] ? __pfx_check_noncircular+0x10/0x10
[ 215.886452] [ T1110] ? mark_lock+0xf5/0x1650
[ 215.886454] [ T1110] ? __pfx_check_irq_usage+0x10/0x10
[ 215.886458] [ T1110] ? lockdep_lock+0xca/0x1c0
[ 215.886460] [ T1110] ? __pfx_lockdep_lock+0x10/0x10
[ 215.886464] [ T1110] __lock_acquire+0x2f52/0x5ea0
[ 215.886469] [ T1110] ? __pfx___lock_acquire+0x10/0x10
[ 215.886473] [ T1110] ? __pfx_mark_lock+0x10/0x10
[ 215.886476] [ T1110] lock_acquire+0x1b1/0x540
[ 215.886479] [ T1110] ? __flush_work+0x38f/0xb60
[ 215.886482] [ T1110] ? __pfx_lock_acquire+0x10/0x10
[ 215.886485] [ T1110] ? __pfx_lock_release+0x10/0x10
[ 215.886488] [ T1110] ? mark_held_locks+0x94/0xe0
[ 215.886492] [ T1110] ? __flush_work+0x38f/0xb60
[ 215.886494] [ T1110] __flush_work+0x3ac/0xb60
[ 215.886498] [ T1110] ? __flush_work+0x38f/0xb60
[ 215.886501] [ T1110] ? __pfx_mark_lock+0x10/0x10
[ 215.886503] [ T1110] ? __pfx___flush_work+0x10/0x10
[ 215.886506] [ T1110] ? __pfx_wq_barrier_func+0x10/0x10
[ 215.886515] [ T1110] ? __pfx___might_resched+0x10/0x10
[ 215.886520] [ T1110] ? mark_held_locks+0x94/0xe0
[ 215.886524] [ T1110] wb_shutdown+0x15b/0x1f0
[ 215.886527] [ T1110] bdi_unregister+0x172/0x5b0
[ 215.886530] [ T1110] ? __pfx_bdi_unregister+0x10/0x10
[ 215.886535] [ T1110] ? up_write+0x1ba/0x510
[ 215.886539] [ T1110] del_gendisk+0x841/0xa20
[ 215.886543] [ T1110] ? __pfx_del_gendisk+0x10/0x10
[ 215.886546] [ T1110] ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 215.886550] [ T1110] ? __pm_runtime_resume+0x79/0x110
[ 215.886556] [ T1110] sd_remove+0x85/0x130
[ 215.886558] [ T1110] device_release_driver_internal+0x368/0x520
[ 215.886563] [ T1110] ? kobject_put+0x5d/0x4a0
[ 215.886567] [ T1110] bus_remove_device+0x1f1/0x3f0
[ 215.886570] [ T1110] device_del+0x3bd/0x9c0
[ 215.886574] [ T1110] ? __pfx_device_del+0x10/0x10
[ 215.886578] [ T1110] __scsi_remove_device+0x272/0x340
[ 215.886581] [ T1110] scsi_forget_host+0xf7/0x170
[ 215.886585] [ T1110] scsi_remove_host+0xd2/0x2a0
[ 215.886587] [ T1110] sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
[ 215.886600] [ T1110] ? kernfs_remove_by_name_ns+0xc0/0xf0
[ 215.886607] [ T1110] device_release_driver_internal+0x368/0x520
[ 215.886610] [ T1110] ? kobject_put+0x5d/0x4a0
[ 215.886613] [ T1110] bus_remove_device+0x1f1/0x3f0
[ 215.886616] [ T1110] device_del+0x3bd/0x9c0
[ 215.886619] [ T1110] ? __pfx_device_del+0x10/0x10
[ 215.886621] [ T1110] ? __pfx___mutex_unlock_slowpath+0x10/0x10
[ 215.886626] [ T1110] device_unregister+0x13/0xa0
[ 215.886628] [ T1110] sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
[ 215.886640] [ T1110] scsi_debug_exit+0x17/0x70 [scsi_debug]
[ 215.886652] [ T1110] __do_sys_delete_module.isra.0+0x321/0x520
[ 215.886655] [ T1110] ? __pfx___do_sys_delete_module.isra.0+0x10/0x10
[ 215.886657] [ T1110] ? __pfx_slab_free_after_rcu_debug+0x10/0x10
[ 215.886665] [ T1110] ? kasan_save_stack+0x2c/0x50
[ 215.886670] [ T1110] ? kasan_record_aux_stack+0xa3/0xb0
[ 215.886673] [ T1110] ? __call_rcu_common.constprop.0+0xc4/0xfb0
[ 215.886677] [ T1110] ? kmem_cache_free+0x3a0/0x590
[ 215.886679] [ T1110] ? __x64_sys_close+0x78/0xd0
[ 215.886687] [ T1110] do_syscall_64+0x93/0x180
[ 215.886694] [ T1110] ? lock_is_held_type+0xd5/0x130
[ 215.886697] [ T1110] ? __call_rcu_common.constprop.0+0x3c0/0xfb0
[ 215.886699] [ T1110] ? lockdep_hardirqs_on+0x78/0x100
[ 215.886701] [ T1110] ? __call_rcu_common.constprop.0+0x3c0/0xfb0
[ 215.886705] [ T1110] ? __pfx___call_rcu_common.constprop.0+0x10/0x10
[ 215.886710] [ T1110] ? kmem_cache_free+0x3a0/0x590
[ 215.886713] [ T1110] ? lockdep_hardirqs_on_prepare+0x16d/0x400
[ 215.886715] [ T1110] ? do_syscall_64+0x9f/0x180
[ 215.886717] [ T1110] ? lockdep_hardirqs_on+0x78/0x100
[ 215.886719] [ T1110] ? do_syscall_64+0x9f/0x180
[ 215.886721] [ T1110] ? __pfx___x64_sys_openat+0x10/0x10
[ 215.886725] [ T1110] ? lockdep_hardirqs_on_prepare+0x16d/0x400
[ 215.886727] [ T1110] ? do_syscall_64+0x9f/0x180
[ 215.886729] [ T1110] ? lockdep_hardirqs_on+0x78/0x100
[ 215.886731] [ T1110] ? do_syscall_64+0x9f/0x180
[ 215.886734] [ T1110] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 215.886737] [ T1110] RIP: 0033:0x7f436712b68b
[ 215.886741] [ T1110] Code: 73 01 c3 48 8b 0d 8d a7 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d a7 0c 00 f7 d8 64 89 01 48
[ 215.886743] [ T1110] RSP: 002b:00007ffe9f1a8658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 215.886750] [ T1110] RAX: ffffffffffffffda RBX: 00005559b367fd80 RCX: 00007f436712b68b
[ 215.886753] [ T1110] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005559b367fde8
[ 215.886754] [ T1110] RBP: 00007ffe9f1a8680 R08: 1999999999999999 R09: 0000000000000000
[ 215.886756] [ T1110] R10: 00007f43671a5fe0 R11: 0000000000000206 R12: 0000000000000000
[ 215.886757] [ T1110] R13: 00007ffe9f1a86b0 R14: 0000000000000000 R15: 0000000000000000
[ 215.886761] [ T1110] </TASK>
[ 215.989918] [ T1110] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
* Re: blktests failures with v6.14-rc1 kernel
From: Ming Lei @ 2025-02-07 8:52 UTC
To: Shinichiro Kawasaki
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
Hi Shinichiro,
On Fri, Feb 07, 2025 at 01:24:09AM +0000, Shinichiro Kawasaki wrote:
> Hi all,
>
> I ran the latest blktests (git hash: 67aff550bd52) with the v6.14-rc1 kernel.
> I observed 5 failures, listed below. Compared with the previous report for
> the v6.13 kernel [1], one new failure was observed: zbd/009.
>
> [1] https://lore.kernel.org/linux-nvme/rv3w2zcno7n3bgdy2ghxmedsqf23ptmakvjerbhopgxjsvgzmo@ioece7dyg2og/
>
> List of failures
> ================
> #1: block/002
> #2: nvme/037 (fc transport)
> #3: nvme/041 (fc transport)
> #4: nvme/058 (loop transport)
> #5: zbd/009 (new)
>
>
> Two failures observed with the v6.13 kernel are no longer observed with v6.14-rc1.
>
> Failures no longer observed
> ===========================
> #1: block/001
> It looks resolved by fixes in the v6.14-rc1 kernel.
>
> #2: throtl/001 (CKI project, s390 arch)
> I was not able to find blktests runs by the CKI project with the v6.14-rc1
> kernel.
>
>
> Failure description
> ===================
>
> #1: block/002
>
> This test case fails with a lockdep WARN "possible circular locking
> dependency detected". The lockdep splat shows q->q_usage_counter as one
> of the involved locks. It was observed with the v6.13-rc2 kernel [2] and is
> still observed with the v6.14-rc1 kernel. It needs further debugging.
>
> [2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/
[ 342.568086][ T1023] -> #0 (&mm->mmap_lock){++++}-{4:4}:
[ 342.569658][ T1023] __lock_acquire+0x2e8b/0x6010
[ 342.570577][ T1023] lock_acquire+0x1b1/0x540
[ 342.571463][ T1023] __might_fault+0xb9/0x120
[ 342.572338][ T1023] _copy_from_user+0x34/0xa0
[ 342.573231][ T1023] __blk_trace_setup+0xa0/0x140
[ 342.574129][ T1023] blk_trace_ioctl+0x14e/0x270
[ 342.575033][ T1023] blkdev_ioctl+0x38f/0x5c0
[ 342.575919][ T1023] __x64_sys_ioctl+0x130/0x190
[ 342.576824][ T1023] do_syscall_64+0x93/0x180
[ 342.577714][ T1023] entry_SYSCALL_64_after_hwframe+0x76/0x7e
The above dependency between ->mmap_lock and ->debugfs_mutex has been cut by
commit b769a2f409e7 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock"),
so I'd suggest double-checking this one.

You mentioned it can be reproduced on v6.14-rc1, but I do not see such a
warning from v6.14-rc1.
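The shape of that fix is roughly as follows -- an illustrative sketch, not
the literal blktrace code; debugfs_lock and do_setup() are stand-ins here.
copy_from_user() can fault and take mm->mmap_lock, so performing the copy
while holding a mutex records mutex -> mmap_lock in lockdep's dependency
graph; copying first removes that edge:

	/* Before: the user copy happens under the lock, so lockdep
	 * records debugfs_lock -> mmap_lock. */
	mutex_lock(&debugfs_lock);
	if (copy_from_user(&buts, arg, sizeof(buts)))
		ret = -EFAULT;
	else
		ret = do_setup(&buts);
	mutex_unlock(&debugfs_lock);

	/* After: copy first with no lock held, then take the lock;
	 * the debugfs_lock -> mmap_lock edge disappears. */
	if (copy_from_user(&buts, arg, sizeof(buts)))
		return -EFAULT;
	mutex_lock(&debugfs_lock);
	ret = do_setup(&buts);
	mutex_unlock(&debugfs_lock);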
> #5: zbd/009 (new)
>
> This test case fails with a lockdep WARN "possible circular locking
> dependency detected" [5]. The lockdep splat shows q->q_usage_counter as one
> of the involved locks, which it has in common with the block/002 failure.
> It needs further debugging.
>
> [5] kernel message during zbd/009 run
>
> [ 204.099296] [ T1004] run blktests zbd/009 at 2025-02-07 10:01:36
> [ 204.155021] [ T1040] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
> [ 204.553613] [ T1041] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
> [ 204.554438] [ T1041] scsi host9: scsi_debug: version 0191 [20210520]
> dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
> [ 204.558331] [ T1041] scsi 9:0:0:0: Direct-Access-ZBC Linux scsi_debug 0191 PQ: 0 ANSI: 7
> [ 204.560269] [ C2] scsi 9:0:0:0: Power-on or device reset occurred
> [ 204.562871] [ T1041] sd 9:0:0:0: Attached scsi generic sg3 type 20
> [ 204.563013] [ T100] sd 9:0:0:0: [sdd] Host-managed zoned block device
> [ 204.564518] [ T100] sd 9:0:0:0: [sdd] 262144 4096-byte logical blocks: (1.07 GB/1.00 GiB)
> [ 204.565477] [ T100] sd 9:0:0:0: [sdd] Write Protect is off
> [ 204.565948] [ T100] sd 9:0:0:0: [sdd] Mode Sense: 5b 00 10 08
> [ 204.566245] [ T100] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [ 204.567453] [ T100] sd 9:0:0:0: [sdd] permanent stream count = 5
> [ 204.568276] [ T100] sd 9:0:0:0: [sdd] Preferred minimum I/O size 4096 bytes
> [ 204.569067] [ T100] sd 9:0:0:0: [sdd] Optimal transfer size 4194304 bytes
> [ 204.571080] [ T100] sd 9:0:0:0: [sdd] 256 zones of 1024 logical blocks
> [ 204.593822] [ T100] sd 9:0:0:0: [sdd] Attached SCSI disk
> [ 204.901514] [ T1067] BTRFS: device fsid 15196e63-e303-48ed-9dcb-9ec397479c42 devid 1 transid 8 /dev/sdd (8:48) scanned by mount (1067)
> [ 204.910330] [ T1067] BTRFS info (device sdd): first mount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
> [ 204.913129] [ T1067] BTRFS info (device sdd): using crc32c (crc32c-generic) checksum algorithm
> [ 204.914856] [ T1067] BTRFS info (device sdd): using free-space-tree
> [ 204.925816] [ T1067] BTRFS info (device sdd): host-managed zoned block device /dev/sdd, 256 zones of 4194304 bytes
> [ 204.929320] [ T1067] BTRFS info (device sdd): zoned mode enabled with zone size 4194304
> [ 204.935403] [ T1067] BTRFS info (device sdd): checking UUID tree
> [ 215.637712] [ T1103] BTRFS info (device sdd): last unmount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
>
> [ 215.762293] [ T1110] ======================================================
> [ 215.763636] [ T1110] WARNING: possible circular locking dependency detected
> [ 215.765092] [ T1110] 6.14.0-rc1 #252 Not tainted
> [ 215.766271] [ T1110] ------------------------------------------------------
> [ 215.767615] [ T1110] modprobe/1110 is trying to acquire lock:
> [ 215.768999] [ T1110] ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60
> [ 215.770700] [ T1110]
> but task is already holding lock:
> [ 215.773077] [ T1110] ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
> [ 215.774685] [ T1110]
> which lock already depends on the new lock.
>
> [ 215.778184] [ T1110]
> the existing dependency chain (in reverse order) is:
> [ 215.780532] [ T1110]
> -> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}:
> [ 215.782937] [ T1110] blk_queue_enter+0x3d9/0x500
> [ 215.784175] [ T1110] blk_mq_alloc_request+0x47d/0x8e0
> [ 215.785434] [ T1110] scsi_execute_cmd+0x14f/0xb80
> [ 215.786662] [ T1110] sd_zbc_do_report_zones+0x1c1/0x470
> [ 215.787989] [ T1110] sd_zbc_report_zones+0x362/0xd60
> [ 215.789222] [ T1110] blkdev_report_zones+0x1b1/0x2e0
> [ 215.790448] [ T1110] btrfs_get_dev_zones+0x215/0x7e0 [btrfs]
> [ 215.791887] [ T1110] btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs]
> [ 215.793342] [ T1110] btrfs_make_block_group+0x36b/0x870 [btrfs]
> [ 215.794752] [ T1110] btrfs_create_chunk+0x147d/0x2320 [btrfs]
> [ 215.796150] [ T1110] btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs]
> [ 215.797474] [ T1110] start_transaction+0xce6/0x1620 [btrfs]
> [ 215.798858] [ T1110] btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs]
> [ 215.800334] [ T1110] kthread+0x39d/0x750
> [ 215.801479] [ T1110] ret_from_fork+0x30/0x70
> [ 215.802662] [ T1110] ret_from_fork_asm+0x1a/0x30
Not sure why fs_info->dev_replace is required for reporting zones.
> [ 215.803902] [ T1110]
> -> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}:
> [ 215.805993] [ T1110] down_read+0x9b/0x470
> [ 215.807088] [ T1110] btrfs_map_block+0x2ce/0x2ce0 [btrfs]
> [ 215.808366] [ T1110] btrfs_submit_chunk+0x2d4/0x16c0 [btrfs]
> [ 215.809687] [ T1110] btrfs_submit_bbio+0x16/0x30 [btrfs]
> [ 215.810983] [ T1110] btree_write_cache_pages+0xb5a/0xf90 [btrfs]
> [ 215.812295] [ T1110] do_writepages+0x17f/0x7b0
> [ 215.813416] [ T1110] __writeback_single_inode+0x114/0xb00
> [ 215.814575] [ T1110] writeback_sb_inodes+0x52b/0xe00
> [ 215.815717] [ T1110] wb_writeback+0x1a7/0x800
> [ 215.816924] [ T1110] wb_workfn+0x12a/0xbd0
> [ 215.817951] [ T1110] process_one_work+0x85a/0x1460
> [ 215.818985] [ T1110] worker_thread+0x5e2/0xfc0
> [ 215.820013] [ T1110] kthread+0x39d/0x750
> [ 215.821000] [ T1110] ret_from_fork+0x30/0x70
> [ 215.822010] [ T1110] ret_from_fork_asm+0x1a/0x30
> [ 215.822988] [ T1110]
> -> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}:
> [ 215.824855] [ T1110] __mutex_lock+0x1aa/0x1360
> [ 215.825856] [ T1110] btree_write_cache_pages+0x252/0xf90 [btrfs]
> [ 215.827089] [ T1110] do_writepages+0x17f/0x7b0
> [ 215.828027] [ T1110] __writeback_single_inode+0x114/0xb00
> [ 215.829141] [ T1110] writeback_sb_inodes+0x52b/0xe00
> [ 215.830129] [ T1110] wb_writeback+0x1a7/0x800
> [ 215.831084] [ T1110] wb_workfn+0x12a/0xbd0
> [ 215.831950] [ T1110] process_one_work+0x85a/0x1460
> [ 215.832862] [ T1110] worker_thread+0x5e2/0xfc0
> [ 215.833826] [ T1110] kthread+0x39d/0x750
> [ 215.834715] [ T1110] ret_from_fork+0x30/0x70
> [ 215.835669] [ T1110] ret_from_fork_asm+0x1a/0x30
> [ 215.836594] [ T1110]
> -> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
> [ 215.838347] [ T1110] __lock_acquire+0x2f52/0x5ea0
> [ 215.839258] [ T1110] lock_acquire+0x1b1/0x540
> [ 215.840156] [ T1110] __flush_work+0x3ac/0xb60
> [ 215.841041] [ T1110] wb_shutdown+0x15b/0x1f0
> [ 215.841915] [ T1110] bdi_unregister+0x172/0x5b0
> [ 215.842793] [ T1110] del_gendisk+0x841/0xa20
> [ 215.843724] [ T1110] sd_remove+0x85/0x130
> [ 215.844660] [ T1110] device_release_driver_internal+0x368/0x520
> [ 215.845757] [ T1110] bus_remove_device+0x1f1/0x3f0
> [ 215.846755] [ T1110] device_del+0x3bd/0x9c0
> [ 215.847712] [ T1110] __scsi_remove_device+0x272/0x340
> [ 215.848727] [ T1110] scsi_forget_host+0xf7/0x170
> [ 215.849710] [ T1110] scsi_remove_host+0xd2/0x2a0
> [ 215.850682] [ T1110] sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
> [ 215.851788] [ T1110] device_release_driver_internal+0x368/0x520
> [ 215.852853] [ T1110] bus_remove_device+0x1f1/0x3f0
> [ 215.853885] [ T1110] device_del+0x3bd/0x9c0
> [ 215.854840] [ T1110] device_unregister+0x13/0xa0
> [ 215.855850] [ T1110] sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
> [ 215.856947] [ T1110] scsi_debug_exit+0x17/0x70 [scsi_debug]
> [ 215.857968] [ T1110] __do_sys_delete_module.isra.0+0x321/0x520
> [ 215.858999] [ T1110] do_syscall_64+0x93/0x180
> [ 215.859930] [ T1110] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 215.860974] [ T1110]
> other info that might help us debug this:
This warning looks like a real issue.
Thanks,
Ming
* Re: blktests failures with v6.14-rc1 kernel
From: Shinichiro Kawasaki @ 2025-02-07 12:22 UTC
To: Ming Lei
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
On Feb 07, 2025 / 16:52, Ming Lei wrote:
> Hi Shinichiro,
Hi Ming, thanks for the comments. Let me comment on the block/002 failure.
> > Failure description
> > ===================
> >
> > #1: block/002
> >
> > This test case fails with a lockdep WARN "possible circular locking
> > dependency detected". The lockdep splat shows q->q_usage_counter as one
> > of the involved locks. It was observed with the v6.13-rc2 kernel [2] and is
> > still observed with the v6.14-rc1 kernel. It needs further debugging.
> >
> > [2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/
>
> [ 342.568086][ T1023] -> #0 (&mm->mmap_lock){++++}-{4:4}:
> [ 342.569658][ T1023] __lock_acquire+0x2e8b/0x6010
> [ 342.570577][ T1023] lock_acquire+0x1b1/0x540
> [ 342.571463][ T1023] __might_fault+0xb9/0x120
> [ 342.572338][ T1023] _copy_from_user+0x34/0xa0
> [ 342.573231][ T1023] __blk_trace_setup+0xa0/0x140
> [ 342.574129][ T1023] blk_trace_ioctl+0x14e/0x270
> [ 342.575033][ T1023] blkdev_ioctl+0x38f/0x5c0
> [ 342.575919][ T1023] __x64_sys_ioctl+0x130/0x190
> [ 342.576824][ T1023] do_syscall_64+0x93/0x180
> [ 342.577714][ T1023] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> The above dependency between ->mmap_lock and ->debugfs_mutex has been cut by
> commit b769a2f409e7 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock"),
> so I'd suggest double-checking this one.
Thanks. I missed the fix. That said, I still see the lockdep WARN "possible
circular locking dependency detected" with the v6.14-rc1 kernel. I now guess
there are two problems and I conflated them: one was fixed by commit
b769a2f409e7, and the other I still observe.

Please take a look at the kernel message below, which was captured at the
block/002 failure I have just recreated on my test node. The splat indicates
a dependency different from the one observed with the v6.13-rc2 kernel.
[ 165.526908] [ T1103] run blktests block/002 at 2025-02-07 21:02:22
[ 165.814157] [ T1134] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
[ 166.031013] [ T1135] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[ 166.031986] [ T1135] scsi host9: scsi_debug: version 0191 [20210520]
dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
[ 166.035727] [ T1135] scsi 9:0:0:0: Direct-Access Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 166.038449] [ C1] scsi 9:0:0:0: Power-on or device reset occurred
[ 166.045105] [ T1135] sd 9:0:0:0: Attached scsi generic sg3 type 0
[ 166.046426] [ T94] sd 9:0:0:0: [sdd] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
[ 166.048275] [ T94] sd 9:0:0:0: [sdd] Write Protect is off
[ 166.048854] [ T94] sd 9:0:0:0: [sdd] Mode Sense: 73 00 10 08
[ 166.051019] [ T94] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 166.059601] [ T94] sd 9:0:0:0: [sdd] permanent stream count = 5
[ 166.063623] [ T94] sd 9:0:0:0: [sdd] Preferred minimum I/O size 512 bytes
[ 166.064329] [ T94] sd 9:0:0:0: [sdd] Optimal transfer size 524288 bytes
[ 166.094781] [ T94] sd 9:0:0:0: [sdd] Attached SCSI disk
[ 166.855819] [ T1161] ======================================================
[ 166.856339] [ T1161] WARNING: possible circular locking dependency detected
[ 166.856945] [ T1161] 6.14.0-rc1 #252 Not tainted
[ 166.857292] [ T1161] ------------------------------------------------------
[ 166.857874] [ T1161] blktrace/1161 is trying to acquire lock:
[ 166.858310] [ T1161] ffff88811dbfe5e0 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x99/0x120
[ 166.859053] [ T1161]
but task is already holding lock:
[ 166.859593] [ T1161] ffff8881082a1078 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: relay_file_read+0xa3/0x8a0
[ 166.860410] [ T1161]
which lock already depends on the new lock.
[ 166.861269] [ T1161]
the existing dependency chain (in reverse order) is:
[ 166.863693] [ T1161]
-> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[ 166.866064] [ T1161] down_write+0x8d/0x200
[ 166.867266] [ T1161] start_creating.part.0+0x82/0x230
[ 166.868544] [ T1161] debugfs_create_dir+0x3a/0x4c0
[ 166.869797] [ T1161] blk_register_queue+0x12d/0x430
[ 166.870986] [ T1161] add_disk_fwnode+0x6b1/0x1010
[ 166.872144] [ T1161] sd_probe+0x94e/0xf30
[ 166.873262] [ T1161] really_probe+0x1e3/0x8a0
[ 166.874372] [ T1161] __driver_probe_device+0x18c/0x370
[ 166.875544] [ T1161] driver_probe_device+0x4a/0x120
[ 166.876715] [ T1161] __device_attach_driver+0x15e/0x270
[ 166.877890] [ T1161] bus_for_each_drv+0x114/0x1a0
[ 166.878999] [ T1161] __device_attach_async_helper+0x19c/0x240
[ 166.880180] [ T1161] async_run_entry_fn+0x96/0x4f0
[ 166.881312] [ T1161] process_one_work+0x85a/0x1460
[ 166.882411] [ T1161] worker_thread+0x5e2/0xfc0
[ 166.883483] [ T1161] kthread+0x39d/0x750
[ 166.884548] [ T1161] ret_from_fork+0x30/0x70
[ 166.885629] [ T1161] ret_from_fork_asm+0x1a/0x30
[ 166.886728] [ T1161]
-> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
[ 166.888799] [ T1161] __mutex_lock+0x1aa/0x1360
[ 166.889863] [ T1161] blk_mq_init_sched+0x3b5/0x5e0
[ 166.890907] [ T1161] elevator_switch+0x149/0x4b0
[ 166.891928] [ T1161] elv_iosched_store+0x29f/0x380
[ 166.892966] [ T1161] queue_attr_store+0x313/0x480
[ 166.893976] [ T1161] kernfs_fop_write_iter+0x39e/0x5a0
[ 166.895012] [ T1161] vfs_write+0x5f9/0xe90
[ 166.895970] [ T1161] ksys_write+0xf6/0x1c0
[ 166.896931] [ T1161] do_syscall_64+0x93/0x180
[ 166.897886] [ T1161] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 166.898945] [ T1161]
-> #3 (&q->q_usage_counter(io)#2){++++}-{0:0}:
[ 166.900757] [ T1161] blk_mq_submit_bio+0x19b8/0x2070
[ 166.901784] [ T1161] __submit_bio+0x36b/0x6d0
[ 166.902748] [ T1161] submit_bio_noacct_nocheck+0x542/0xca0
[ 166.903796] [ T1161] iomap_readahead+0x4c4/0x7e0
[ 166.904778] [ T1161] read_pages+0x198/0xaf0
[ 166.905718] [ T1161] page_cache_ra_order+0x4d3/0xb50
[ 166.906709] [ T1161] filemap_get_pages+0x2c7/0x1850
[ 166.907684] [ T1161] filemap_read+0x31d/0xc30
[ 166.908617] [ T1161] xfs_file_buffered_read+0x1e9/0x2a0 [xfs]
[ 166.909910] [ T1161] xfs_file_read_iter+0x25f/0x4a0 [xfs]
[ 166.911131] [ T1161] vfs_read+0x6bc/0xa20
[ 166.911994] [ T1161] ksys_read+0xf6/0x1c0
[ 166.912862] [ T1161] do_syscall_64+0x93/0x180
[ 166.913736] [ T1161] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 166.914751] [ T1161]
-> #2 (mapping.invalidate_lock#2){++++}-{4:4}:
[ 166.916401] [ T1161] down_read+0x9b/0x470
[ 166.917285] [ T1161] filemap_fault+0x231/0x2ac0
[ 166.918179] [ T1161] __do_fault+0xf4/0x5d0
[ 166.919039] [ T1161] do_fault+0x965/0x11d0
[ 166.919895] [ T1161] __handle_mm_fault+0x1060/0x1f40
[ 166.920829] [ T1161] handle_mm_fault+0x21a/0x6b0
[ 166.921722] [ T1161] do_user_addr_fault+0x322/0xaa0
[ 166.922648] [ T1161] exc_page_fault+0x7a/0x110
[ 166.923540] [ T1161] asm_exc_page_fault+0x22/0x30
[ 166.924466] [ T1161]
-> #1 (&vma->vm_lock->lock){++++}-{4:4}:
[ 166.926085] [ T1161] down_write+0x8d/0x200
[ 166.926943] [ T1161] vma_link+0x1ff/0x590
[ 166.927794] [ T1161] insert_vm_struct+0x15a/0x340
[ 166.928713] [ T1161] alloc_bprm+0x626/0xbf0
[ 166.929578] [ T1161] kernel_execve+0x85/0x2f0
[ 166.930476] [ T1161] call_usermodehelper_exec_async+0x21b/0x430
[ 166.931481] [ T1161] ret_from_fork+0x30/0x70
[ 166.932371] [ T1161] ret_from_fork_asm+0x1a/0x30
[ 166.933291] [ T1161]
-> #0 (&mm->mmap_lock){++++}-{4:4}:
[ 166.934868] [ T1161] __lock_acquire+0x2f52/0x5ea0
[ 166.935768] [ T1161] lock_acquire+0x1b1/0x540
[ 166.936678] [ T1161] __might_fault+0xb9/0x120
[ 166.937563] [ T1161] _copy_to_user+0x1e/0x80
[ 166.938438] [ T1161] relay_file_read+0x149/0x8a0
[ 166.939343] [ T1161] full_proxy_read+0x110/0x1d0
[ 166.940224] [ T1161] vfs_read+0x1bb/0xa20
[ 166.941082] [ T1161] ksys_read+0xf6/0x1c0
[ 166.941920] [ T1161] do_syscall_64+0x93/0x180
[ 166.942780] [ T1161] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 166.943755] [ T1161]
other info that might help us debug this:
[ 166.945985] [ T1161] Chain exists of:
&mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3
[ 166.948504] [ T1161] Possible unsafe locking scenario:
[ 166.950078] [ T1161] CPU0 CPU1
[ 166.950965] [ T1161] ---- ----
[ 166.951849] [ T1161] lock(&sb->s_type->i_mutex_key#3);
[ 166.952751] [ T1161] lock(&q->debugfs_mutex);
[ 166.953789] [ T1161] lock(&sb->s_type->i_mutex_key#3);
[ 166.954860] [ T1161] rlock(&mm->mmap_lock);
[ 166.955689] [ T1161]
*** DEADLOCK ***
[ 166.957776] [ T1161] 2 locks held by blktrace/1161:
[ 166.958623] [ T1161] #0: ffff88813c9ebcf8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0x233/0x2f0
[ 166.959808] [ T1161] #1: ffff8881082a1078 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: relay_file_read+0xa3/0x8a0
[ 166.961118] [ T1161]
stack backtrace:
[ 166.962578] [ T1161] CPU: 2 UID: 0 PID: 1161 Comm: blktrace Not tainted 6.14.0-rc1 #252
[ 166.962582] [ T1161] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[ 166.962585] [ T1161] Call Trace:
[ 166.962586] [ T1161] <TASK>
[ 166.962588] [ T1161] dump_stack_lvl+0x6a/0x90
[ 166.962593] [ T1161] print_circular_bug.cold+0x1e0/0x274
[ 166.962597] [ T1161] check_noncircular+0x306/0x3f0
[ 166.962600] [ T1161] ? __pfx_check_noncircular+0x10/0x10
[ 166.962605] [ T1161] ? lockdep_lock+0xca/0x1c0
[ 166.962607] [ T1161] ? __pfx_lockdep_lock+0x10/0x10
[ 166.962611] [ T1161] __lock_acquire+0x2f52/0x5ea0
[ 166.962616] [ T1161] ? __pfx___lock_acquire+0x10/0x10
[ 166.962619] [ T1161] ? lock_acquire+0x1b1/0x540
[ 166.962623] [ T1161] lock_acquire+0x1b1/0x540
[ 166.962625] [ T1161] ? __might_fault+0x99/0x120
[ 166.962628] [ T1161] ? __pfx_lock_acquire+0x10/0x10
[ 166.962631] [ T1161] ? lock_is_held_type+0xd5/0x130
[ 166.962635] [ T1161] ? __pfx___might_resched+0x10/0x10
[ 166.962637] [ T1161] ? _raw_spin_unlock+0x29/0x50
[ 166.962640] [ T1161] ? __might_fault+0x99/0x120
[ 166.962642] [ T1161] __might_fault+0xb9/0x120
[ 166.962649] [ T1161] ? __might_fault+0x99/0x120
[ 166.962651] [ T1161] ? __check_object_size+0x3f3/0x540
[ 166.962654] [ T1161] _copy_to_user+0x1e/0x80
[ 166.962656] [ T1161] relay_file_read+0x149/0x8a0
[ 166.962661] [ T1161] ? selinux_file_permission+0x36d/0x420
[ 166.962665] [ T1161] full_proxy_read+0x110/0x1d0
[ 166.962667] [ T1161] ? rw_verify_area+0x2f7/0x520
[ 166.962670] [ T1161] vfs_read+0x1bb/0xa20
[ 166.962672] [ T1161] ? __pfx___mutex_lock+0x10/0x10
[ 166.962674] [ T1161] ? __pfx_lock_release+0x10/0x10
[ 166.962677] [ T1161] ? __pfx_vfs_read+0x10/0x10
[ 166.962683] [ T1161] ? __fget_files+0x1ae/0x2e0
[ 166.962687] [ T1161] ksys_read+0xf6/0x1c0
[ 166.962689] [ T1161] ? __pfx_ksys_read+0x10/0x10
[ 166.962693] [ T1161] do_syscall_64+0x93/0x180
[ 166.962697] [ T1161] ? lockdep_hardirqs_on_prepare+0x16d/0x400
[ 166.962700] [ T1161] ? do_syscall_64+0x9f/0x180
[ 166.962702] [ T1161] ? lockdep_hardirqs_on+0x78/0x100
[ 166.962704] [ T1161] ? do_syscall_64+0x9f/0x180
[ 166.962708] [ T1161] ? lockdep_hardirqs_on_prepare+0x16d/0x400
[ 166.962711] [ T1161] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 166.962714] [ T1161] RIP: 0033:0x7fdddfde0e4a
[ 166.962719] [ T1161] Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 28 58 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 82 58 f8 ff 48 8b
[ 166.962721] [ T1161] RSP: 002b:00007fdddecccd80 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 166.962726] [ T1161] RAX: ffffffffffffffda RBX: 00007fddd4000bb0 RCX: 00007fdddfde0e4a
[ 166.962727] [ T1161] RDX: 0000000000080000 RSI: 00007fdddc400000 RDI: 0000000000000006
[ 166.962729] [ T1161] RBP: 00007fdddecccda0 R08: 0000000000000000 R09: 0000000000000000
[ 166.962730] [ T1161] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[ 166.962731] [ T1161] R13: 000056530426caa0 R14: 0000000000000000 R15: 00007fddd4002c90
[ 166.962735] [ T1161] </TASK>
[ 167.066759] [ T1103] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
* Re: blktests failures with v6.14-rc1 kernel
From: Ming Lei @ 2025-02-07 12:47 UTC
To: Shinichiro Kawasaki
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
On Fri, Feb 07, 2025 at 12:22:32PM +0000, Shinichiro Kawasaki wrote:
> On Feb 07, 2025 / 16:52, Ming Lei wrote:
> > Hi Shinichiro,
>
> Hi Ming, thanks for the comments. Let me comment on the block/002 failure.
>
> > > Failure description
> > > ===================
> > >
> > > #1: block/002
> > >
> > > This test case fails with a lockdep WARN "possible circular locking
> > > dependency detected". The lockdep splat shows q->q_usage_counter as one
> > > of the involved locks. It was observed with the v6.13-rc2 kernel [2] and is
> > > still observed with the v6.14-rc1 kernel. It needs further debugging.
> > >
> > > [2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/
> >
> > [ 342.568086][ T1023] -> #0 (&mm->mmap_lock){++++}-{4:4}:
> > [ 342.569658][ T1023] __lock_acquire+0x2e8b/0x6010
> > [ 342.570577][ T1023] lock_acquire+0x1b1/0x540
> > [ 342.571463][ T1023] __might_fault+0xb9/0x120
> > [ 342.572338][ T1023] _copy_from_user+0x34/0xa0
> > [ 342.573231][ T1023] __blk_trace_setup+0xa0/0x140
> > [ 342.574129][ T1023] blk_trace_ioctl+0x14e/0x270
> > [ 342.575033][ T1023] blkdev_ioctl+0x38f/0x5c0
> > [ 342.575919][ T1023] __x64_sys_ioctl+0x130/0x190
> > [ 342.576824][ T1023] do_syscall_64+0x93/0x180
> > [ 342.577714][ T1023] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > The above dependency between ->mmap_lock and ->debugfs_mutex has been cut by
> > commit b769a2f409e7 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock"),
> > so I'd suggest double-checking this one.
>
> Thanks. I missed the fix. That said, I still see the lockdep WARN "possible
> circular locking dependency detected" with the v6.14-rc1 kernel. I now guess
> there are two problems and I conflated them: one was fixed by commit
> b769a2f409e7, and the other I still observe.
>
> Please take a look at the kernel message below, which was captured at the
> block/002 failure I have just recreated on my test node. The splat indicates
> a dependency different from the one observed with the v6.13-rc2 kernel.
Yeah, indeed, thanks for sharing the log.
>
>
> [ 165.526908] [ T1103] run blktests block/002 at 2025-02-07 21:02:22
> [ 165.814157] [ T1134] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
> [ 166.031013] [ T1135] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
> [ 166.031986] [ T1135] scsi host9: scsi_debug: version 0191 [20210520]
> dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
> [ 166.035727] [ T1135] scsi 9:0:0:0: Direct-Access Linux scsi_debug 0191 PQ: 0 ANSI: 7
> [ 166.038449] [ C1] scsi 9:0:0:0: Power-on or device reset occurred
> [ 166.045105] [ T1135] sd 9:0:0:0: Attached scsi generic sg3 type 0
> [ 166.046426] [ T94] sd 9:0:0:0: [sdd] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
> [ 166.048275] [ T94] sd 9:0:0:0: [sdd] Write Protect is off
> [ 166.048854] [ T94] sd 9:0:0:0: [sdd] Mode Sense: 73 00 10 08
> [ 166.051019] [ T94] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [ 166.059601] [ T94] sd 9:0:0:0: [sdd] permanent stream count = 5
> [ 166.063623] [ T94] sd 9:0:0:0: [sdd] Preferred minimum I/O size 512 bytes
> [ 166.064329] [ T94] sd 9:0:0:0: [sdd] Optimal transfer size 524288 bytes
> [ 166.094781] [ T94] sd 9:0:0:0: [sdd] Attached SCSI disk
>
> [ 166.855819] [ T1161] ======================================================
> [ 166.856339] [ T1161] WARNING: possible circular locking dependency detected
> [ 166.856945] [ T1161] 6.14.0-rc1 #252 Not tainted
> [ 166.857292] [ T1161] ------------------------------------------------------
> [ 166.857874] [ T1161] blktrace/1161 is trying to acquire lock:
> [ 166.858310] [ T1161] ffff88811dbfe5e0 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x99/0x120
> [ 166.859053] [ T1161]
> but task is already holding lock:
> [ 166.859593] [ T1161] ffff8881082a1078 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: relay_file_read+0xa3/0x8a0
> [ 166.860410] [ T1161]
> which lock already depends on the new lock.
>
> [ 166.861269] [ T1161]
> the existing dependency chain (in reverse order) is:
> [ 166.863693] [ T1161]
> -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
> [ 166.866064] [ T1161] down_write+0x8d/0x200
> [ 166.867266] [ T1161] start_creating.part.0+0x82/0x230
> [ 166.868544] [ T1161] debugfs_create_dir+0x3a/0x4c0
> [ 166.869797] [ T1161] blk_register_queue+0x12d/0x430
> [ 166.870986] [ T1161] add_disk_fwnode+0x6b1/0x1010
> [ 166.872144] [ T1161] sd_probe+0x94e/0xf30
> [ 166.873262] [ T1161] really_probe+0x1e3/0x8a0
> [ 166.874372] [ T1161] __driver_probe_device+0x18c/0x370
> [ 166.875544] [ T1161] driver_probe_device+0x4a/0x120
> [ 166.876715] [ T1161] __device_attach_driver+0x15e/0x270
> [ 166.877890] [ T1161] bus_for_each_drv+0x114/0x1a0
> [ 166.878999] [ T1161] __device_attach_async_helper+0x19c/0x240
> [ 166.880180] [ T1161] async_run_entry_fn+0x96/0x4f0
> [ 166.881312] [ T1161] process_one_work+0x85a/0x1460
> [ 166.882411] [ T1161] worker_thread+0x5e2/0xfc0
> [ 166.883483] [ T1161] kthread+0x39d/0x750
> [ 166.884548] [ T1161] ret_from_fork+0x30/0x70
> [ 166.885629] [ T1161] ret_from_fork_asm+0x1a/0x30
> [ 166.886728] [ T1161]
> -> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
> [ 166.888799] [ T1161] __mutex_lock+0x1aa/0x1360
> [ 166.889863] [ T1161] blk_mq_init_sched+0x3b5/0x5e0
> [ 166.890907] [ T1161] elevator_switch+0x149/0x4b0
> [ 166.891928] [ T1161] elv_iosched_store+0x29f/0x380
> [ 166.892966] [ T1161] queue_attr_store+0x313/0x480
> [ 166.893976] [ T1161] kernfs_fop_write_iter+0x39e/0x5a0
> [ 166.895012] [ T1161] vfs_write+0x5f9/0xe90
> [ 166.895970] [ T1161] ksys_write+0xf6/0x1c0
> [ 166.896931] [ T1161] do_syscall_64+0x93/0x180
> [ 166.897886] [ T1161] entry_SYSCALL_64_after_hwframe+0x76/0x7e
I think we can kill the above dependency: the debugfs API calls needn't be
protected by q->debugfs_mutex, which is supposed to cover block layer
internal data structure updates and queries.
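A rough sketch of that direction (illustrative only, not an actual patch):
debugfs_create_dir() serializes on the debugfs inode lock internally, so it
does not itself need q->debugfs_mutex; the mutex only needs to cover
publishing the dentry into the queue:

	/* Current shape (link #5 in the splat above): the debugfs inode
	 * lock is taken while q->debugfs_mutex is held, recording
	 * debugfs_mutex -> i_mutex_key. */
	mutex_lock(&q->debugfs_mutex);
	q->debugfs_dir = debugfs_create_dir(name, blk_debugfs_root);
	mutex_unlock(&q->debugfs_mutex);

	/* Possible shape: let debugfs do its own locking, and use
	 * q->debugfs_mutex only for the queue's own state. */
	dir = debugfs_create_dir(name, blk_debugfs_root);
	mutex_lock(&q->debugfs_mutex);
	q->debugfs_dir = dir;
	mutex_unlock(&q->debugfs_mutex);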
I will try to cook a patch fixing this one.
Thanks,
Ming