linux-block.vger.kernel.org archive mirror
* blktests failures with v6.14-rc1 kernel
@ 2025-02-07  1:24 Shinichiro Kawasaki
  2025-02-07  8:52 ` Ming Lei
  0 siblings, 1 reply; 4+ messages in thread
From: Shinichiro Kawasaki @ 2025-02-07  1:24 UTC (permalink / raw)
  To: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

Hi all,

I ran the latest blktests (git hash: 67aff550bd52) with the v6.14-rc1 kernel.
I observed the 5 failures listed below. Compared with the previous report for
the v6.13 kernel [1], one new failure was observed at zbd/009.

[1] https://lore.kernel.org/linux-nvme/rv3w2zcno7n3bgdy2ghxmedsqf23ptmakvjerbhopgxjsvgzmo@ioece7dyg2og/

List of failures
================
#1: block/002
#2: nvme/037 (fc transport)
#3: nvme/041 (fc transport)
#4: nvme/058 (loop transport)
#5: zbd/009 (new)


Two failures observed with the v6.13 kernel are no longer observed with v6.14-rc1.

Failures no longer observed
===========================
#1: block/001
    It looks resolved by fixes in the v6.14-rc1 kernel.

#2: throtl/001 (CKI project, s390 arch)
    I was not able to find blktests runs by the CKI project with the v6.14-rc1
    kernel, so I could not confirm whether this failure persists.


Failure description
===================

#1: block/002

    This test case fails with a lockdep WARN "possible circular locking
    dependency detected". The lockdep splat shows q->q_usage_counter as one
    of the involved locks. It was observed with the v6.13-rc2 kernel [2], and
    is still observed with the v6.14-rc1 kernel. It needs further debugging.

    [2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/
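
    As a side note for readers less familiar with this class of splat: as I
    understand it, recent kernels model queue freeze/enter (q->q_usage_counter)
    as a lockdep "lock", which is why a refcount appears among the involved
    locks. What lockdep flags is the classic ABBA lock-order inversion; a
    minimal userspace illustration of that shape (not kernel code, names made
    up) is:

        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

        static void *ab_thread(void *arg)      /* takes A, then B */
        {
                pthread_mutex_lock(&lock_a);
                pthread_mutex_lock(&lock_b);
                pthread_mutex_unlock(&lock_b);
                pthread_mutex_unlock(&lock_a);
                return NULL;
        }

        static void *ba_thread(void *arg)      /* takes B, then A: inverted */
        {
                pthread_mutex_lock(&lock_b);
                pthread_mutex_lock(&lock_a);
                pthread_mutex_unlock(&lock_a);
                pthread_mutex_unlock(&lock_b);
                return NULL;
        }

        int main(void)
        {
                pthread_t t1, t2;

                pthread_create(&t1, NULL, ab_thread, NULL);
                pthread_create(&t2, NULL, ba_thread, NULL);
                pthread_join(t1, NULL);
                pthread_join(t2, NULL);
                printf("done (may never print if the ABBA deadlock hits)\n");
                return 0;
        }

    Lockdep reports the cycle as soon as the inverted ordering has been
    recorded, even when the timing never actually deadlocks.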

#2: nvme/037 (fc transport)
#3: nvme/041 (fc transport)

    These two test cases fail with the fc transport. Refer to the report for
    the v6.12 kernel [3].

    [3] https://lore.kernel.org/linux-nvme/6crydkodszx5vq4ieox3jjpwkxtu7mhbohypy24awlo5w7f4k6@to3dcng24rd4/

#4: nvme/058 (loop transport)

    This test case occasionally hangs with an Oops and a KASAN null-ptr-deref.
    It was first reported with the v6.13 kernel [1]. A candidate fix patch was
    posted [4] (thanks!), but it needs further work.

    [4] https://lore.kernel.org/linux-nvme/20250124082505.140258-1-hare@kernel.org/

#5: zbd/009 (new)

    This test case fails with a lockdep WARN "possible circular locking
    dependency detected" [5]. The lockdep splat shows q->q_usage_counter as one
    of the involved locks, the same symptom as the block/002 failure. It needs
    further debugging.

[5] kernel message during zbd/009 run

[  204.099296] [   T1004] run blktests zbd/009 at 2025-02-07 10:01:36
[  204.155021] [   T1040] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
[  204.553613] [   T1041] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[  204.554438] [   T1041] scsi host9: scsi_debug: version 0191 [20210520]
                            dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
[  204.558331] [   T1041] scsi 9:0:0:0: Direct-Access-ZBC Linux    scsi_debug       0191 PQ: 0 ANSI: 7
[  204.560269] [      C2] scsi 9:0:0:0: Power-on or device reset occurred
[  204.562871] [   T1041] sd 9:0:0:0: Attached scsi generic sg3 type 20
[  204.563013] [    T100] sd 9:0:0:0: [sdd] Host-managed zoned block device
[  204.564518] [    T100] sd 9:0:0:0: [sdd] 262144 4096-byte logical blocks: (1.07 GB/1.00 GiB)
[  204.565477] [    T100] sd 9:0:0:0: [sdd] Write Protect is off
[  204.565948] [    T100] sd 9:0:0:0: [sdd] Mode Sense: 5b 00 10 08
[  204.566245] [    T100] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  204.567453] [    T100] sd 9:0:0:0: [sdd] permanent stream count = 5
[  204.568276] [    T100] sd 9:0:0:0: [sdd] Preferred minimum I/O size 4096 bytes
[  204.569067] [    T100] sd 9:0:0:0: [sdd] Optimal transfer size 4194304 bytes
[  204.571080] [    T100] sd 9:0:0:0: [sdd] 256 zones of 1024 logical blocks
[  204.593822] [    T100] sd 9:0:0:0: [sdd] Attached SCSI disk
[  204.901514] [   T1067] BTRFS: device fsid 15196e63-e303-48ed-9dcb-9ec397479c42 devid 1 transid 8 /dev/sdd (8:48) scanned by mount (1067)
[  204.910330] [   T1067] BTRFS info (device sdd): first mount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
[  204.913129] [   T1067] BTRFS info (device sdd): using crc32c (crc32c-generic) checksum algorithm
[  204.914856] [   T1067] BTRFS info (device sdd): using free-space-tree
[  204.925816] [   T1067] BTRFS info (device sdd): host-managed zoned block device /dev/sdd, 256 zones of 4194304 bytes
[  204.929320] [   T1067] BTRFS info (device sdd): zoned mode enabled with zone size 4194304
[  204.935403] [   T1067] BTRFS info (device sdd): checking UUID tree
[  215.637712] [   T1103] BTRFS info (device sdd): last unmount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42

[  215.762293] [   T1110] ======================================================
[  215.763636] [   T1110] WARNING: possible circular locking dependency detected
[  215.765092] [   T1110] 6.14.0-rc1 #252 Not tainted
[  215.766271] [   T1110] ------------------------------------------------------
[  215.767615] [   T1110] modprobe/1110 is trying to acquire lock:
[  215.768999] [   T1110] ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60
[  215.770700] [   T1110] 
                          but task is already holding lock:
[  215.773077] [   T1110] ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
[  215.774685] [   T1110] 
                          which lock already depends on the new lock.

[  215.778184] [   T1110] 
                          the existing dependency chain (in reverse order) is:
[  215.780532] [   T1110] 
                          -> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}:
[  215.782937] [   T1110]        blk_queue_enter+0x3d9/0x500
[  215.784175] [   T1110]        blk_mq_alloc_request+0x47d/0x8e0
[  215.785434] [   T1110]        scsi_execute_cmd+0x14f/0xb80
[  215.786662] [   T1110]        sd_zbc_do_report_zones+0x1c1/0x470
[  215.787989] [   T1110]        sd_zbc_report_zones+0x362/0xd60
[  215.789222] [   T1110]        blkdev_report_zones+0x1b1/0x2e0
[  215.790448] [   T1110]        btrfs_get_dev_zones+0x215/0x7e0 [btrfs]
[  215.791887] [   T1110]        btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs]
[  215.793342] [   T1110]        btrfs_make_block_group+0x36b/0x870 [btrfs]
[  215.794752] [   T1110]        btrfs_create_chunk+0x147d/0x2320 [btrfs]
[  215.796150] [   T1110]        btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs]
[  215.797474] [   T1110]        start_transaction+0xce6/0x1620 [btrfs]
[  215.798858] [   T1110]        btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs]
[  215.800334] [   T1110]        kthread+0x39d/0x750
[  215.801479] [   T1110]        ret_from_fork+0x30/0x70
[  215.802662] [   T1110]        ret_from_fork_asm+0x1a/0x30
[  215.803902] [   T1110] 
                          -> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}:
[  215.805993] [   T1110]        down_read+0x9b/0x470
[  215.807088] [   T1110]        btrfs_map_block+0x2ce/0x2ce0 [btrfs]
[  215.808366] [   T1110]        btrfs_submit_chunk+0x2d4/0x16c0 [btrfs]
[  215.809687] [   T1110]        btrfs_submit_bbio+0x16/0x30 [btrfs]
[  215.810983] [   T1110]        btree_write_cache_pages+0xb5a/0xf90 [btrfs]
[  215.812295] [   T1110]        do_writepages+0x17f/0x7b0
[  215.813416] [   T1110]        __writeback_single_inode+0x114/0xb00
[  215.814575] [   T1110]        writeback_sb_inodes+0x52b/0xe00
[  215.815717] [   T1110]        wb_writeback+0x1a7/0x800
[  215.816924] [   T1110]        wb_workfn+0x12a/0xbd0
[  215.817951] [   T1110]        process_one_work+0x85a/0x1460
[  215.818985] [   T1110]        worker_thread+0x5e2/0xfc0
[  215.820013] [   T1110]        kthread+0x39d/0x750
[  215.821000] [   T1110]        ret_from_fork+0x30/0x70
[  215.822010] [   T1110]        ret_from_fork_asm+0x1a/0x30
[  215.822988] [   T1110] 
                          -> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}:
[  215.824855] [   T1110]        __mutex_lock+0x1aa/0x1360
[  215.825856] [   T1110]        btree_write_cache_pages+0x252/0xf90 [btrfs]
[  215.827089] [   T1110]        do_writepages+0x17f/0x7b0
[  215.828027] [   T1110]        __writeback_single_inode+0x114/0xb00
[  215.829141] [   T1110]        writeback_sb_inodes+0x52b/0xe00
[  215.830129] [   T1110]        wb_writeback+0x1a7/0x800
[  215.831084] [   T1110]        wb_workfn+0x12a/0xbd0
[  215.831950] [   T1110]        process_one_work+0x85a/0x1460
[  215.832862] [   T1110]        worker_thread+0x5e2/0xfc0
[  215.833826] [   T1110]        kthread+0x39d/0x750
[  215.834715] [   T1110]        ret_from_fork+0x30/0x70
[  215.835669] [   T1110]        ret_from_fork_asm+0x1a/0x30
[  215.836594] [   T1110] 
                          -> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
[  215.838347] [   T1110]        __lock_acquire+0x2f52/0x5ea0
[  215.839258] [   T1110]        lock_acquire+0x1b1/0x540
[  215.840156] [   T1110]        __flush_work+0x3ac/0xb60
[  215.841041] [   T1110]        wb_shutdown+0x15b/0x1f0
[  215.841915] [   T1110]        bdi_unregister+0x172/0x5b0
[  215.842793] [   T1110]        del_gendisk+0x841/0xa20
[  215.843724] [   T1110]        sd_remove+0x85/0x130
[  215.844660] [   T1110]        device_release_driver_internal+0x368/0x520
[  215.845757] [   T1110]        bus_remove_device+0x1f1/0x3f0
[  215.846755] [   T1110]        device_del+0x3bd/0x9c0
[  215.847712] [   T1110]        __scsi_remove_device+0x272/0x340
[  215.848727] [   T1110]        scsi_forget_host+0xf7/0x170
[  215.849710] [   T1110]        scsi_remove_host+0xd2/0x2a0
[  215.850682] [   T1110]        sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
[  215.851788] [   T1110]        device_release_driver_internal+0x368/0x520
[  215.852853] [   T1110]        bus_remove_device+0x1f1/0x3f0
[  215.853885] [   T1110]        device_del+0x3bd/0x9c0
[  215.854840] [   T1110]        device_unregister+0x13/0xa0
[  215.855850] [   T1110]        sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
[  215.856947] [   T1110]        scsi_debug_exit+0x17/0x70 [scsi_debug]
[  215.857968] [   T1110]        __do_sys_delete_module.isra.0+0x321/0x520
[  215.858999] [   T1110]        do_syscall_64+0x93/0x180
[  215.859930] [   T1110]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  215.860974] [   T1110] 
                          other info that might help us debug this:

[  215.863317] [   T1110] Chain exists of:
                            (work_completion)(&(&wb->dwork)->work) --> &fs_info->dev_replace.rwsem --> &q->q_usage_counter(queue)#16

[  215.866277] [   T1110]  Possible unsafe locking scenario:

[  215.867927] [   T1110]        CPU0                    CPU1
[  215.868904] [   T1110]        ----                    ----
[  215.869880] [   T1110]   lock(&q->q_usage_counter(queue)#16);
[  215.870878] [   T1110]                                lock(&fs_info->dev_replace.rwsem);
[  215.872075] [   T1110]                                lock(&q->q_usage_counter(queue)#16);
[  215.873274] [   T1110]   lock((work_completion)(&(&wb->dwork)->work));
[  215.874332] [   T1110] 
                           *** DEADLOCK ***

[  215.876625] [   T1110] 5 locks held by modprobe/1110:
[  215.877579] [   T1110]  #0: ffff88811f7bc108 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520
[  215.879029] [   T1110]  #1: ffff8881022ee0e0 (&shost->scan_mutex){+.+.}-{4:4}, at: scsi_remove_host+0x20/0x2a0
[  215.880402] [   T1110]  #2: ffff88811b4c4378 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520
[  215.881861] [   T1110]  #3: ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
[  215.883302] [   T1110]  #4: ffffffffa3284360 (rcu_read_lock){....}-{1:3}, at: __flush_work+0xda/0xb60
[  215.884667] [   T1110] 
                          stack backtrace:
[  215.886418] [   T1110] CPU: 0 UID: 0 PID: 1110 Comm: modprobe Not tainted 6.14.0-rc1 #252
[  215.886422] [   T1110] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[  215.886425] [   T1110] Call Trace:
[  215.886430] [   T1110]  <TASK>
[  215.886432] [   T1110]  dump_stack_lvl+0x6a/0x90
[  215.886440] [   T1110]  print_circular_bug.cold+0x1e0/0x274
[  215.886445] [   T1110]  check_noncircular+0x306/0x3f0
[  215.886449] [   T1110]  ? __pfx_check_noncircular+0x10/0x10
[  215.886452] [   T1110]  ? mark_lock+0xf5/0x1650
[  215.886454] [   T1110]  ? __pfx_check_irq_usage+0x10/0x10
[  215.886458] [   T1110]  ? lockdep_lock+0xca/0x1c0
[  215.886460] [   T1110]  ? __pfx_lockdep_lock+0x10/0x10
[  215.886464] [   T1110]  __lock_acquire+0x2f52/0x5ea0
[  215.886469] [   T1110]  ? __pfx___lock_acquire+0x10/0x10
[  215.886473] [   T1110]  ? __pfx_mark_lock+0x10/0x10
[  215.886476] [   T1110]  lock_acquire+0x1b1/0x540
[  215.886479] [   T1110]  ? __flush_work+0x38f/0xb60
[  215.886482] [   T1110]  ? __pfx_lock_acquire+0x10/0x10
[  215.886485] [   T1110]  ? __pfx_lock_release+0x10/0x10
[  215.886488] [   T1110]  ? mark_held_locks+0x94/0xe0
[  215.886492] [   T1110]  ? __flush_work+0x38f/0xb60
[  215.886494] [   T1110]  __flush_work+0x3ac/0xb60
[  215.886498] [   T1110]  ? __flush_work+0x38f/0xb60
[  215.886501] [   T1110]  ? __pfx_mark_lock+0x10/0x10
[  215.886503] [   T1110]  ? __pfx___flush_work+0x10/0x10
[  215.886506] [   T1110]  ? __pfx_wq_barrier_func+0x10/0x10
[  215.886515] [   T1110]  ? __pfx___might_resched+0x10/0x10
[  215.886520] [   T1110]  ? mark_held_locks+0x94/0xe0
[  215.886524] [   T1110]  wb_shutdown+0x15b/0x1f0
[  215.886527] [   T1110]  bdi_unregister+0x172/0x5b0
[  215.886530] [   T1110]  ? __pfx_bdi_unregister+0x10/0x10
[  215.886535] [   T1110]  ? up_write+0x1ba/0x510
[  215.886539] [   T1110]  del_gendisk+0x841/0xa20
[  215.886543] [   T1110]  ? __pfx_del_gendisk+0x10/0x10
[  215.886546] [   T1110]  ? _raw_spin_unlock_irqrestore+0x35/0x60
[  215.886550] [   T1110]  ? __pm_runtime_resume+0x79/0x110
[  215.886556] [   T1110]  sd_remove+0x85/0x130
[  215.886558] [   T1110]  device_release_driver_internal+0x368/0x520
[  215.886563] [   T1110]  ? kobject_put+0x5d/0x4a0
[  215.886567] [   T1110]  bus_remove_device+0x1f1/0x3f0
[  215.886570] [   T1110]  device_del+0x3bd/0x9c0
[  215.886574] [   T1110]  ? __pfx_device_del+0x10/0x10
[  215.886578] [   T1110]  __scsi_remove_device+0x272/0x340
[  215.886581] [   T1110]  scsi_forget_host+0xf7/0x170
[  215.886585] [   T1110]  scsi_remove_host+0xd2/0x2a0
[  215.886587] [   T1110]  sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
[  215.886600] [   T1110]  ? kernfs_remove_by_name_ns+0xc0/0xf0
[  215.886607] [   T1110]  device_release_driver_internal+0x368/0x520
[  215.886610] [   T1110]  ? kobject_put+0x5d/0x4a0
[  215.886613] [   T1110]  bus_remove_device+0x1f1/0x3f0
[  215.886616] [   T1110]  device_del+0x3bd/0x9c0
[  215.886619] [   T1110]  ? __pfx_device_del+0x10/0x10
[  215.886621] [   T1110]  ? __pfx___mutex_unlock_slowpath+0x10/0x10
[  215.886626] [   T1110]  device_unregister+0x13/0xa0
[  215.886628] [   T1110]  sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
[  215.886640] [   T1110]  scsi_debug_exit+0x17/0x70 [scsi_debug]
[  215.886652] [   T1110]  __do_sys_delete_module.isra.0+0x321/0x520
[  215.886655] [   T1110]  ? __pfx___do_sys_delete_module.isra.0+0x10/0x10
[  215.886657] [   T1110]  ? __pfx_slab_free_after_rcu_debug+0x10/0x10
[  215.886665] [   T1110]  ? kasan_save_stack+0x2c/0x50
[  215.886670] [   T1110]  ? kasan_record_aux_stack+0xa3/0xb0
[  215.886673] [   T1110]  ? __call_rcu_common.constprop.0+0xc4/0xfb0
[  215.886677] [   T1110]  ? kmem_cache_free+0x3a0/0x590
[  215.886679] [   T1110]  ? __x64_sys_close+0x78/0xd0
[  215.886687] [   T1110]  do_syscall_64+0x93/0x180
[  215.886694] [   T1110]  ? lock_is_held_type+0xd5/0x130
[  215.886697] [   T1110]  ? __call_rcu_common.constprop.0+0x3c0/0xfb0
[  215.886699] [   T1110]  ? lockdep_hardirqs_on+0x78/0x100
[  215.886701] [   T1110]  ? __call_rcu_common.constprop.0+0x3c0/0xfb0
[  215.886705] [   T1110]  ? __pfx___call_rcu_common.constprop.0+0x10/0x10
[  215.886710] [   T1110]  ? kmem_cache_free+0x3a0/0x590
[  215.886713] [   T1110]  ? lockdep_hardirqs_on_prepare+0x16d/0x400
[  215.886715] [   T1110]  ? do_syscall_64+0x9f/0x180
[  215.886717] [   T1110]  ? lockdep_hardirqs_on+0x78/0x100
[  215.886719] [   T1110]  ? do_syscall_64+0x9f/0x180
[  215.886721] [   T1110]  ? __pfx___x64_sys_openat+0x10/0x10
[  215.886725] [   T1110]  ? lockdep_hardirqs_on_prepare+0x16d/0x400
[  215.886727] [   T1110]  ? do_syscall_64+0x9f/0x180
[  215.886729] [   T1110]  ? lockdep_hardirqs_on+0x78/0x100
[  215.886731] [   T1110]  ? do_syscall_64+0x9f/0x180
[  215.886734] [   T1110]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  215.886737] [   T1110] RIP: 0033:0x7f436712b68b
[  215.886741] [   T1110] Code: 73 01 c3 48 8b 0d 8d a7 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d a7 0c 00 f7 d8 64 89 01 48
[  215.886743] [   T1110] RSP: 002b:00007ffe9f1a8658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  215.886750] [   T1110] RAX: ffffffffffffffda RBX: 00005559b367fd80 RCX: 00007f436712b68b
[  215.886753] [   T1110] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005559b367fde8
[  215.886754] [   T1110] RBP: 00007ffe9f1a8680 R08: 1999999999999999 R09: 0000000000000000
[  215.886756] [   T1110] R10: 00007f43671a5fe0 R11: 0000000000000206 R12: 0000000000000000
[  215.886757] [   T1110] R13: 00007ffe9f1a86b0 R14: 0000000000000000 R15: 0000000000000000
[  215.886761] [   T1110]  </TASK>
[  215.989918] [   T1110] sd 9:0:0:0: [sdd] Synchronizing SCSI cache


* Re: blktests failures with v6.14-rc1 kernel
  2025-02-07  1:24 blktests failures with v6.14-rc1 kernel Shinichiro Kawasaki
@ 2025-02-07  8:52 ` Ming Lei
  2025-02-07 12:22   ` Shinichiro Kawasaki
  0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2025-02-07  8:52 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

Hi Shinichiro,

On Fri, Feb 07, 2025 at 01:24:09AM +0000, Shinichiro Kawasaki wrote:
> Hi all,
> 
> I ran the latest blktests (git hash: 67aff550bd52) with the v6.14-rc1 kernel.
> I observed the 5 failures listed below. Compared with the previous report for
> the v6.13 kernel [1], one new failure was observed at zbd/009.
> 
> [1] https://lore.kernel.org/linux-nvme/rv3w2zcno7n3bgdy2ghxmedsqf23ptmakvjerbhopgxjsvgzmo@ioece7dyg2og/
> 
> List of failures
> ================
> #1: block/002
> #2: nvme/037 (fc transport)
> #3: nvme/041 (fc transport)
> #4: nvme/058 (loop transport)
> #5: zbd/009 (new)
> 
> 
> Two failures observed with the v6.13 kernel are no longer observed with v6.14-rc1.
> 
> Failures no longer observed
> ===========================
> #1: block/001
>     It looks resolved by fixes in the v6.14-rc1 kernel.
> 
> #2: throtl/001 (CKI project, s390 arch)
>     I was not able to find blktests runs by the CKI project with the v6.14-rc1
>     kernel, so I could not confirm whether this failure persists.
> 
> 
> Failure description
> ===================
> 
> #1: block/002
> 
>     This test case fails with a lockdep WARN "possible circular locking
>     dependency detected". The lockdep splat shows q->q_usage_counter as one
>     of the involved locks. It was observed with the v6.13-rc2 kernel [2], and
>     is still observed with the v6.14-rc1 kernel. It needs further debugging.
> 
>     [2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/

[  342.568086][ T1023] -> #0 (&mm->mmap_lock){++++}-{4:4}:
[  342.569658][ T1023]        __lock_acquire+0x2e8b/0x6010
[  342.570577][ T1023]        lock_acquire+0x1b1/0x540
[  342.571463][ T1023]        __might_fault+0xb9/0x120
[  342.572338][ T1023]        _copy_from_user+0x34/0xa0
[  342.573231][ T1023]        __blk_trace_setup+0xa0/0x140
[  342.574129][ T1023]        blk_trace_ioctl+0x14e/0x270
[  342.575033][ T1023]        blkdev_ioctl+0x38f/0x5c0
[  342.575919][ T1023]        __x64_sys_ioctl+0x130/0x190
[  342.576824][ T1023]        do_syscall_64+0x93/0x180
[  342.577714][ T1023]        entry_SYSCALL_64_after_hwframe+0x76/0x7e

The above dependency between ->mmap_lock and ->debugfs_mutex has been cut by
commit b769a2f409e7 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock"),
so I'd suggest double-checking this one.
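
For context, the shape of that fix, as I read the commit subject, is to do the
fault-prone user copy into a local buffer before taking the lock, so that
mmap_lock (taken on a page fault inside copy_from_user()) can never nest
inside ->debugfs_lock. A hedged userspace sketch of the pattern, with made-up
names rather than the actual blktrace code:

        #include <pthread.h>
        #include <string.h>

        struct trace_setup_args {
                char name[32];
                unsigned int act_mask;
        };

        static pthread_mutex_t debugfs_lock_analogue = PTHREAD_MUTEX_INITIALIZER;

        /* Stand-in for copy_from_user(): in the kernel this may fault and
         * take mmap_lock, which is exactly why it must not run under the
         * lock. */
        static int copy_in(struct trace_setup_args *dst, const void *src)
        {
                memcpy(dst, src, sizeof(*dst));
                return 0;
        }

        static int trace_setup_analogue(const void *user_arg)
        {
                struct trace_setup_args local;
                int ret;

                ret = copy_in(&local, user_arg);   /* copy first, unlocked */
                if (ret)
                        return ret;

                pthread_mutex_lock(&debugfs_lock_analogue);
                /* update internal state from the local copy only; no user
                 * access happens while the lock is held */
                pthread_mutex_unlock(&debugfs_lock_analogue);
                return 0;
        }

        int main(void)
        {
                struct trace_setup_args args = { "example", 0xffff };

                return trace_setup_analogue(&args);
        }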

You mentioned it can be reproduced with v6.14-rc1, but I have not seen such a
warning from a v6.14-rc1 run.

> #5: zbd/009 (new)
> 
>     This test case fails with a lockdep WARN "possible circular locking
>     dependency detected" [5]. The lockdep splat shows q->q_usage_counter as one
>     of the involved locks, the same symptom as the block/002 failure. It needs
>     further debugging.
> 
> [5] kernel message during zbd/009 run
> 
> [  204.099296] [   T1004] run blktests zbd/009 at 2025-02-07 10:01:36
> [  204.155021] [   T1040] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
> [  204.553613] [   T1041] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
> [  204.554438] [   T1041] scsi host9: scsi_debug: version 0191 [20210520]
>                             dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
> [  204.558331] [   T1041] scsi 9:0:0:0: Direct-Access-ZBC Linux    scsi_debug       0191 PQ: 0 ANSI: 7
> [  204.560269] [      C2] scsi 9:0:0:0: Power-on or device reset occurred
> [  204.562871] [   T1041] sd 9:0:0:0: Attached scsi generic sg3 type 20
> [  204.563013] [    T100] sd 9:0:0:0: [sdd] Host-managed zoned block device
> [  204.564518] [    T100] sd 9:0:0:0: [sdd] 262144 4096-byte logical blocks: (1.07 GB/1.00 GiB)
> [  204.565477] [    T100] sd 9:0:0:0: [sdd] Write Protect is off
> [  204.565948] [    T100] sd 9:0:0:0: [sdd] Mode Sense: 5b 00 10 08
> [  204.566245] [    T100] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [  204.567453] [    T100] sd 9:0:0:0: [sdd] permanent stream count = 5
> [  204.568276] [    T100] sd 9:0:0:0: [sdd] Preferred minimum I/O size 4096 bytes
> [  204.569067] [    T100] sd 9:0:0:0: [sdd] Optimal transfer size 4194304 bytes
> [  204.571080] [    T100] sd 9:0:0:0: [sdd] 256 zones of 1024 logical blocks
> [  204.593822] [    T100] sd 9:0:0:0: [sdd] Attached SCSI disk
> [  204.901514] [   T1067] BTRFS: device fsid 15196e63-e303-48ed-9dcb-9ec397479c42 devid 1 transid 8 /dev/sdd (8:48) scanned by mount (1067)
> [  204.910330] [   T1067] BTRFS info (device sdd): first mount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
> [  204.913129] [   T1067] BTRFS info (device sdd): using crc32c (crc32c-generic) checksum algorithm
> [  204.914856] [   T1067] BTRFS info (device sdd): using free-space-tree
> [  204.925816] [   T1067] BTRFS info (device sdd): host-managed zoned block device /dev/sdd, 256 zones of 4194304 bytes
> [  204.929320] [   T1067] BTRFS info (device sdd): zoned mode enabled with zone size 4194304
> [  204.935403] [   T1067] BTRFS info (device sdd): checking UUID tree
> [  215.637712] [   T1103] BTRFS info (device sdd): last unmount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
> 
> [  215.762293] [   T1110] ======================================================
> [  215.763636] [   T1110] WARNING: possible circular locking dependency detected
> [  215.765092] [   T1110] 6.14.0-rc1 #252 Not tainted
> [  215.766271] [   T1110] ------------------------------------------------------
> [  215.767615] [   T1110] modprobe/1110 is trying to acquire lock:
> [  215.768999] [   T1110] ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60
> [  215.770700] [   T1110] 
>                           but task is already holding lock:
> [  215.773077] [   T1110] ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
> [  215.774685] [   T1110] 
>                           which lock already depends on the new lock.
> 
> [  215.778184] [   T1110] 
>                           the existing dependency chain (in reverse order) is:
> [  215.780532] [   T1110] 
>                           -> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}:
> [  215.782937] [   T1110]        blk_queue_enter+0x3d9/0x500
> [  215.784175] [   T1110]        blk_mq_alloc_request+0x47d/0x8e0
> [  215.785434] [   T1110]        scsi_execute_cmd+0x14f/0xb80
> [  215.786662] [   T1110]        sd_zbc_do_report_zones+0x1c1/0x470
> [  215.787989] [   T1110]        sd_zbc_report_zones+0x362/0xd60
> [  215.789222] [   T1110]        blkdev_report_zones+0x1b1/0x2e0
> [  215.790448] [   T1110]        btrfs_get_dev_zones+0x215/0x7e0 [btrfs]
> [  215.791887] [   T1110]        btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs]
> [  215.793342] [   T1110]        btrfs_make_block_group+0x36b/0x870 [btrfs]
> [  215.794752] [   T1110]        btrfs_create_chunk+0x147d/0x2320 [btrfs]
> [  215.796150] [   T1110]        btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs]
> [  215.797474] [   T1110]        start_transaction+0xce6/0x1620 [btrfs]
> [  215.798858] [   T1110]        btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs]
> [  215.800334] [   T1110]        kthread+0x39d/0x750
> [  215.801479] [   T1110]        ret_from_fork+0x30/0x70
> [  215.802662] [   T1110]        ret_from_fork_asm+0x1a/0x30

I am not sure why fs_info->dev_replace is required for reporting zones.

> [  215.803902] [   T1110] 
>                           -> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}:
> [  215.805993] [   T1110]        down_read+0x9b/0x470
> [  215.807088] [   T1110]        btrfs_map_block+0x2ce/0x2ce0 [btrfs]
> [  215.808366] [   T1110]        btrfs_submit_chunk+0x2d4/0x16c0 [btrfs]
> [  215.809687] [   T1110]        btrfs_submit_bbio+0x16/0x30 [btrfs]
> [  215.810983] [   T1110]        btree_write_cache_pages+0xb5a/0xf90 [btrfs]
> [  215.812295] [   T1110]        do_writepages+0x17f/0x7b0
> [  215.813416] [   T1110]        __writeback_single_inode+0x114/0xb00
> [  215.814575] [   T1110]        writeback_sb_inodes+0x52b/0xe00
> [  215.815717] [   T1110]        wb_writeback+0x1a7/0x800
> [  215.816924] [   T1110]        wb_workfn+0x12a/0xbd0
> [  215.817951] [   T1110]        process_one_work+0x85a/0x1460
> [  215.818985] [   T1110]        worker_thread+0x5e2/0xfc0
> [  215.820013] [   T1110]        kthread+0x39d/0x750
> [  215.821000] [   T1110]        ret_from_fork+0x30/0x70
> [  215.822010] [   T1110]        ret_from_fork_asm+0x1a/0x30
> [  215.822988] [   T1110] 
>                           -> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}:
> [  215.824855] [   T1110]        __mutex_lock+0x1aa/0x1360
> [  215.825856] [   T1110]        btree_write_cache_pages+0x252/0xf90 [btrfs]
> [  215.827089] [   T1110]        do_writepages+0x17f/0x7b0
> [  215.828027] [   T1110]        __writeback_single_inode+0x114/0xb00
> [  215.829141] [   T1110]        writeback_sb_inodes+0x52b/0xe00
> [  215.830129] [   T1110]        wb_writeback+0x1a7/0x800
> [  215.831084] [   T1110]        wb_workfn+0x12a/0xbd0
> [  215.831950] [   T1110]        process_one_work+0x85a/0x1460
> [  215.832862] [   T1110]        worker_thread+0x5e2/0xfc0
> [  215.833826] [   T1110]        kthread+0x39d/0x750
> [  215.834715] [   T1110]        ret_from_fork+0x30/0x70
> [  215.835669] [   T1110]        ret_from_fork_asm+0x1a/0x30
> [  215.836594] [   T1110] 
>                           -> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
> [  215.838347] [   T1110]        __lock_acquire+0x2f52/0x5ea0
> [  215.839258] [   T1110]        lock_acquire+0x1b1/0x540
> [  215.840156] [   T1110]        __flush_work+0x3ac/0xb60
> [  215.841041] [   T1110]        wb_shutdown+0x15b/0x1f0
> [  215.841915] [   T1110]        bdi_unregister+0x172/0x5b0
> [  215.842793] [   T1110]        del_gendisk+0x841/0xa20
> [  215.843724] [   T1110]        sd_remove+0x85/0x130
> [  215.844660] [   T1110]        device_release_driver_internal+0x368/0x520
> [  215.845757] [   T1110]        bus_remove_device+0x1f1/0x3f0
> [  215.846755] [   T1110]        device_del+0x3bd/0x9c0
> [  215.847712] [   T1110]        __scsi_remove_device+0x272/0x340
> [  215.848727] [   T1110]        scsi_forget_host+0xf7/0x170
> [  215.849710] [   T1110]        scsi_remove_host+0xd2/0x2a0
> [  215.850682] [   T1110]        sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
> [  215.851788] [   T1110]        device_release_driver_internal+0x368/0x520
> [  215.852853] [   T1110]        bus_remove_device+0x1f1/0x3f0
> [  215.853885] [   T1110]        device_del+0x3bd/0x9c0
> [  215.854840] [   T1110]        device_unregister+0x13/0xa0
> [  215.855850] [   T1110]        sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
> [  215.856947] [   T1110]        scsi_debug_exit+0x17/0x70 [scsi_debug]
> [  215.857968] [   T1110]        __do_sys_delete_module.isra.0+0x321/0x520
> [  215.858999] [   T1110]        do_syscall_64+0x93/0x180
> [  215.859930] [   T1110]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  215.860974] [   T1110] 
>                           other info that might help us debug this:

This warning looks like a real issue.


Thanks,
Ming



* Re: blktests failures with v6.14-rc1 kernel
  2025-02-07  8:52 ` Ming Lei
@ 2025-02-07 12:22   ` Shinichiro Kawasaki
  2025-02-07 12:47     ` Ming Lei
  0 siblings, 1 reply; 4+ messages in thread
From: Shinichiro Kawasaki @ 2025-02-07 12:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

On Feb 07, 2025 / 16:52, Ming Lei wrote:
> Hi Shinichiro,

Hi Ming, thanks for the comments. Let me comment on the block/002 failure.

> > Failure description
> > ===================
> > 
> > #1: block/002
> > 
> >     This test case fails with a lockdep WARN "possible circular locking
> >     dependency detected". The lockdep splat shows q->q_usage_counter as one
> >     of the involved locks. It was observed with the v6.13-rc2 kernel [2], and
> >     is still observed with the v6.14-rc1 kernel. It needs further debugging.
> > 
> >     [2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/
> 
> [  342.568086][ T1023] -> #0 (&mm->mmap_lock){++++}-{4:4}:
> [  342.569658][ T1023]        __lock_acquire+0x2e8b/0x6010
> [  342.570577][ T1023]        lock_acquire+0x1b1/0x540
> [  342.571463][ T1023]        __might_fault+0xb9/0x120
> [  342.572338][ T1023]        _copy_from_user+0x34/0xa0
> [  342.573231][ T1023]        __blk_trace_setup+0xa0/0x140
> [  342.574129][ T1023]        blk_trace_ioctl+0x14e/0x270
> [  342.575033][ T1023]        blkdev_ioctl+0x38f/0x5c0
> [  342.575919][ T1023]        __x64_sys_ioctl+0x130/0x190
> [  342.576824][ T1023]        do_syscall_64+0x93/0x180
> [  342.577714][ T1023]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> The above dependency between ->mmap_lock and ->debugfs_mutex has been cut by
> commit b769a2f409e7 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock"),
> so I'd suggest double-checking this one.

Thanks. I missed the fix. That said, I do still see the lockdep WARN "possible
circular locking dependency detected" with the v6.14-rc1 kernel. I guess there
are two problems and I conflated them: one was fixed by commit b769a2f409e7,
and the other is the one I still observe.

Please take a look at the kernel message below, which I observed at the
block/002 failure I have just recreated on my test node. The splat indicates a
dependency chain different from the one observed with the v6.13-rc2 kernel:
this time the cycle runs through relay_file_read() copying to the user buffer
while holding the relay file's inode mutex.


[  165.526908] [   T1103] run blktests block/002 at 2025-02-07 21:02:22
[  165.814157] [   T1134] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
[  166.031013] [   T1135] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[  166.031986] [   T1135] scsi host9: scsi_debug: version 0191 [20210520]
                            dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
[  166.035727] [   T1135] scsi 9:0:0:0: Direct-Access     Linux    scsi_debug       0191 PQ: 0 ANSI: 7
[  166.038449] [      C1] scsi 9:0:0:0: Power-on or device reset occurred
[  166.045105] [   T1135] sd 9:0:0:0: Attached scsi generic sg3 type 0
[  166.046426] [     T94] sd 9:0:0:0: [sdd] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
[  166.048275] [     T94] sd 9:0:0:0: [sdd] Write Protect is off
[  166.048854] [     T94] sd 9:0:0:0: [sdd] Mode Sense: 73 00 10 08
[  166.051019] [     T94] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  166.059601] [     T94] sd 9:0:0:0: [sdd] permanent stream count = 5
[  166.063623] [     T94] sd 9:0:0:0: [sdd] Preferred minimum I/O size 512 bytes
[  166.064329] [     T94] sd 9:0:0:0: [sdd] Optimal transfer size 524288 bytes
[  166.094781] [     T94] sd 9:0:0:0: [sdd] Attached SCSI disk

[  166.855819] [   T1161] ======================================================
[  166.856339] [   T1161] WARNING: possible circular locking dependency detected
[  166.856945] [   T1161] 6.14.0-rc1 #252 Not tainted
[  166.857292] [   T1161] ------------------------------------------------------
[  166.857874] [   T1161] blktrace/1161 is trying to acquire lock:
[  166.858310] [   T1161] ffff88811dbfe5e0 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x99/0x120
[  166.859053] [   T1161] 
                          but task is already holding lock:
[  166.859593] [   T1161] ffff8881082a1078 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: relay_file_read+0xa3/0x8a0
[  166.860410] [   T1161] 
                          which lock already depends on the new lock.

[  166.861269] [   T1161] 
                          the existing dependency chain (in reverse order) is:
[  166.863693] [   T1161] 
                          -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[  166.866064] [   T1161]        down_write+0x8d/0x200
[  166.867266] [   T1161]        start_creating.part.0+0x82/0x230
[  166.868544] [   T1161]        debugfs_create_dir+0x3a/0x4c0
[  166.869797] [   T1161]        blk_register_queue+0x12d/0x430
[  166.870986] [   T1161]        add_disk_fwnode+0x6b1/0x1010
[  166.872144] [   T1161]        sd_probe+0x94e/0xf30
[  166.873262] [   T1161]        really_probe+0x1e3/0x8a0
[  166.874372] [   T1161]        __driver_probe_device+0x18c/0x370
[  166.875544] [   T1161]        driver_probe_device+0x4a/0x120
[  166.876715] [   T1161]        __device_attach_driver+0x15e/0x270
[  166.877890] [   T1161]        bus_for_each_drv+0x114/0x1a0
[  166.878999] [   T1161]        __device_attach_async_helper+0x19c/0x240
[  166.880180] [   T1161]        async_run_entry_fn+0x96/0x4f0
[  166.881312] [   T1161]        process_one_work+0x85a/0x1460
[  166.882411] [   T1161]        worker_thread+0x5e2/0xfc0
[  166.883483] [   T1161]        kthread+0x39d/0x750
[  166.884548] [   T1161]        ret_from_fork+0x30/0x70
[  166.885629] [   T1161]        ret_from_fork_asm+0x1a/0x30
[  166.886728] [   T1161] 
                          -> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
[  166.888799] [   T1161]        __mutex_lock+0x1aa/0x1360
[  166.889863] [   T1161]        blk_mq_init_sched+0x3b5/0x5e0
[  166.890907] [   T1161]        elevator_switch+0x149/0x4b0
[  166.891928] [   T1161]        elv_iosched_store+0x29f/0x380
[  166.892966] [   T1161]        queue_attr_store+0x313/0x480
[  166.893976] [   T1161]        kernfs_fop_write_iter+0x39e/0x5a0
[  166.895012] [   T1161]        vfs_write+0x5f9/0xe90
[  166.895970] [   T1161]        ksys_write+0xf6/0x1c0
[  166.896931] [   T1161]        do_syscall_64+0x93/0x180
[  166.897886] [   T1161]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  166.898945] [   T1161] 
                          -> #3 (&q->q_usage_counter(io)#2){++++}-{0:0}:
[  166.900757] [   T1161]        blk_mq_submit_bio+0x19b8/0x2070
[  166.901784] [   T1161]        __submit_bio+0x36b/0x6d0
[  166.902748] [   T1161]        submit_bio_noacct_nocheck+0x542/0xca0
[  166.903796] [   T1161]        iomap_readahead+0x4c4/0x7e0
[  166.904778] [   T1161]        read_pages+0x198/0xaf0
[  166.905718] [   T1161]        page_cache_ra_order+0x4d3/0xb50
[  166.906709] [   T1161]        filemap_get_pages+0x2c7/0x1850
[  166.907684] [   T1161]        filemap_read+0x31d/0xc30
[  166.908617] [   T1161]        xfs_file_buffered_read+0x1e9/0x2a0 [xfs]
[  166.909910] [   T1161]        xfs_file_read_iter+0x25f/0x4a0 [xfs]
[  166.911131] [   T1161]        vfs_read+0x6bc/0xa20
[  166.911994] [   T1161]        ksys_read+0xf6/0x1c0
[  166.912862] [   T1161]        do_syscall_64+0x93/0x180
[  166.913736] [   T1161]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  166.914751] [   T1161] 
                          -> #2 (mapping.invalidate_lock#2){++++}-{4:4}:
[  166.916401] [   T1161]        down_read+0x9b/0x470
[  166.917285] [   T1161]        filemap_fault+0x231/0x2ac0
[  166.918179] [   T1161]        __do_fault+0xf4/0x5d0
[  166.919039] [   T1161]        do_fault+0x965/0x11d0
[  166.919895] [   T1161]        __handle_mm_fault+0x1060/0x1f40
[  166.920829] [   T1161]        handle_mm_fault+0x21a/0x6b0
[  166.921722] [   T1161]        do_user_addr_fault+0x322/0xaa0
[  166.922648] [   T1161]        exc_page_fault+0x7a/0x110
[  166.923540] [   T1161]        asm_exc_page_fault+0x22/0x30
[  166.924466] [   T1161] 
                          -> #1 (&vma->vm_lock->lock){++++}-{4:4}:
[  166.926085] [   T1161]        down_write+0x8d/0x200
[  166.926943] [   T1161]        vma_link+0x1ff/0x590
[  166.927794] [   T1161]        insert_vm_struct+0x15a/0x340
[  166.928713] [   T1161]        alloc_bprm+0x626/0xbf0
[  166.929578] [   T1161]        kernel_execve+0x85/0x2f0
[  166.930476] [   T1161]        call_usermodehelper_exec_async+0x21b/0x430
[  166.931481] [   T1161]        ret_from_fork+0x30/0x70
[  166.932371] [   T1161]        ret_from_fork_asm+0x1a/0x30
[  166.933291] [   T1161] 
                          -> #0 (&mm->mmap_lock){++++}-{4:4}:
[  166.934868] [   T1161]        __lock_acquire+0x2f52/0x5ea0
[  166.935768] [   T1161]        lock_acquire+0x1b1/0x540
[  166.936678] [   T1161]        __might_fault+0xb9/0x120
[  166.937563] [   T1161]        _copy_to_user+0x1e/0x80
[  166.938438] [   T1161]        relay_file_read+0x149/0x8a0
[  166.939343] [   T1161]        full_proxy_read+0x110/0x1d0
[  166.940224] [   T1161]        vfs_read+0x1bb/0xa20
[  166.941082] [   T1161]        ksys_read+0xf6/0x1c0
[  166.941920] [   T1161]        do_syscall_64+0x93/0x180
[  166.942780] [   T1161]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  166.943755] [   T1161] 
                          other info that might help us debug this:

[  166.945985] [   T1161] Chain exists of:
                            &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3

[  166.948504] [   T1161]  Possible unsafe locking scenario:

[  166.950078] [   T1161]        CPU0                    CPU1
[  166.950965] [   T1161]        ----                    ----
[  166.951849] [   T1161]   lock(&sb->s_type->i_mutex_key#3);
[  166.952751] [   T1161]                                lock(&q->debugfs_mutex);
[  166.953789] [   T1161]                                lock(&sb->s_type->i_mutex_key#3);
[  166.954860] [   T1161]   rlock(&mm->mmap_lock);
[  166.955689] [   T1161] 
                           *** DEADLOCK ***

[  166.957776] [   T1161] 2 locks held by blktrace/1161:
[  166.958623] [   T1161]  #0: ffff88813c9ebcf8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0x233/0x2f0
[  166.959808] [   T1161]  #1: ffff8881082a1078 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: relay_file_read+0xa3/0x8a0
[  166.961118] [   T1161] 
                          stack backtrace:
[  166.962578] [   T1161] CPU: 2 UID: 0 PID: 1161 Comm: blktrace Not tainted 6.14.0-rc1 #252
[  166.962582] [   T1161] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[  166.962585] [   T1161] Call Trace:
[  166.962586] [   T1161]  <TASK>
[  166.962588] [   T1161]  dump_stack_lvl+0x6a/0x90
[  166.962593] [   T1161]  print_circular_bug.cold+0x1e0/0x274
[  166.962597] [   T1161]  check_noncircular+0x306/0x3f0
[  166.962600] [   T1161]  ? __pfx_check_noncircular+0x10/0x10
[  166.962605] [   T1161]  ? lockdep_lock+0xca/0x1c0
[  166.962607] [   T1161]  ? __pfx_lockdep_lock+0x10/0x10
[  166.962611] [   T1161]  __lock_acquire+0x2f52/0x5ea0
[  166.962616] [   T1161]  ? __pfx___lock_acquire+0x10/0x10
[  166.962619] [   T1161]  ? lock_acquire+0x1b1/0x540
[  166.962623] [   T1161]  lock_acquire+0x1b1/0x540
[  166.962625] [   T1161]  ? __might_fault+0x99/0x120
[  166.962628] [   T1161]  ? __pfx_lock_acquire+0x10/0x10
[  166.962631] [   T1161]  ? lock_is_held_type+0xd5/0x130
[  166.962635] [   T1161]  ? __pfx___might_resched+0x10/0x10
[  166.962637] [   T1161]  ? _raw_spin_unlock+0x29/0x50
[  166.962640] [   T1161]  ? __might_fault+0x99/0x120
[  166.962642] [   T1161]  __might_fault+0xb9/0x120
[  166.962649] [   T1161]  ? __might_fault+0x99/0x120
[  166.962651] [   T1161]  ? __check_object_size+0x3f3/0x540
[  166.962654] [   T1161]  _copy_to_user+0x1e/0x80
[  166.962656] [   T1161]  relay_file_read+0x149/0x8a0
[  166.962661] [   T1161]  ? selinux_file_permission+0x36d/0x420
[  166.962665] [   T1161]  full_proxy_read+0x110/0x1d0
[  166.962667] [   T1161]  ? rw_verify_area+0x2f7/0x520
[  166.962670] [   T1161]  vfs_read+0x1bb/0xa20
[  166.962672] [   T1161]  ? __pfx___mutex_lock+0x10/0x10
[  166.962674] [   T1161]  ? __pfx_lock_release+0x10/0x10
[  166.962677] [   T1161]  ? __pfx_vfs_read+0x10/0x10
[  166.962683] [   T1161]  ? __fget_files+0x1ae/0x2e0
[  166.962687] [   T1161]  ksys_read+0xf6/0x1c0
[  166.962689] [   T1161]  ? __pfx_ksys_read+0x10/0x10
[  166.962693] [   T1161]  do_syscall_64+0x93/0x180
[  166.962697] [   T1161]  ? lockdep_hardirqs_on_prepare+0x16d/0x400
[  166.962700] [   T1161]  ? do_syscall_64+0x9f/0x180
[  166.962702] [   T1161]  ? lockdep_hardirqs_on+0x78/0x100
[  166.962704] [   T1161]  ? do_syscall_64+0x9f/0x180
[  166.962708] [   T1161]  ? lockdep_hardirqs_on_prepare+0x16d/0x400
[  166.962711] [   T1161]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  166.962714] [   T1161] RIP: 0033:0x7fdddfde0e4a
[  166.962719] [   T1161] Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 28 58 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 82 58 f8 ff 48 8b
[  166.962721] [   T1161] RSP: 002b:00007fdddecccd80 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  166.962726] [   T1161] RAX: ffffffffffffffda RBX: 00007fddd4000bb0 RCX: 00007fdddfde0e4a
[  166.962727] [   T1161] RDX: 0000000000080000 RSI: 00007fdddc400000 RDI: 0000000000000006
[  166.962729] [   T1161] RBP: 00007fdddecccda0 R08: 0000000000000000 R09: 0000000000000000
[  166.962730] [   T1161] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  166.962731] [   T1161] R13: 000056530426caa0 R14: 0000000000000000 R15: 00007fddd4002c90
[  166.962735] [   T1161]  </TASK>
[  167.066759] [   T1103] sd 9:0:0:0: [sdd] Synchronizing SCSI cache


* Re: blktests failures with v6.14-rc1 kernel
  2025-02-07 12:22   ` Shinichiro Kawasaki
@ 2025-02-07 12:47     ` Ming Lei
  0 siblings, 0 replies; 4+ messages in thread
From: Ming Lei @ 2025-02-07 12:47 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

On Fri, Feb 07, 2025 at 12:22:32PM +0000, Shinichiro Kawasaki wrote:
> On Feb 07, 2025 / 16:52, Ming Lei wrote:
> > Hi Shinichiro,
> 
> Hi Ming, thanks for the comments. Let me comment on the block/002 failure.
> 
> > > Failure description
> > > ===================
> > > 
> > > #1: block/002
> > > 
> > >     This test case fails with a lockdep WARN "possible circular locking
> > >     dependency detected". The lockdep splat shows q->q_usage_counter as one
> > >     of the involved locks. It was observed with the v6.13-rc2 kernel [2], and
> > >     is still observed with the v6.14-rc1 kernel. It needs further debugging.
> > > 
> > >     [2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/
> > 
> > [  342.568086][ T1023] -> #0 (&mm->mmap_lock){++++}-{4:4}:
> > [  342.569658][ T1023]        __lock_acquire+0x2e8b/0x6010
> > [  342.570577][ T1023]        lock_acquire+0x1b1/0x540
> > [  342.571463][ T1023]        __might_fault+0xb9/0x120
> > [  342.572338][ T1023]        _copy_from_user+0x34/0xa0
> > [  342.573231][ T1023]        __blk_trace_setup+0xa0/0x140
> > [  342.574129][ T1023]        blk_trace_ioctl+0x14e/0x270
> > [  342.575033][ T1023]        blkdev_ioctl+0x38f/0x5c0
> > [  342.575919][ T1023]        __x64_sys_ioctl+0x130/0x190
> > [  342.576824][ T1023]        do_syscall_64+0x93/0x180
> > [  342.577714][ T1023]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > 
> > The above dependency between ->mmap_lock and ->debugfs_mutex has been cut by
> > commit b769a2f409e7 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock"),
> > so I'd suggest double-checking this one.
> 
> Thanks. I missed the fix. That said, I do still see the lockdep WARN "possible
> circular locking dependency detected" with the v6.14-rc1 kernel. I guess there
> are two problems and I conflated them: one was fixed by commit b769a2f409e7,
> and the other is the one I still observe.
> 
> Please take a look at the kernel message below, which I observed at the
> block/002 failure I have just recreated on my test node. The splat indicates a
> dependency chain different from the one observed with the v6.13-rc2 kernel:
> this time the cycle runs through relay_file_read() copying to the user buffer
> while holding the relay file's inode mutex.

Yeah, indeed, thanks for sharing the log.

> 
> 
> [  165.526908] [   T1103] run blktests block/002 at 2025-02-07 21:02:22
> [  165.814157] [   T1134] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
> [  166.031013] [   T1135] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
> [  166.031986] [   T1135] scsi host9: scsi_debug: version 0191 [20210520]
>                             dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
> [  166.035727] [   T1135] scsi 9:0:0:0: Direct-Access     Linux    scsi_debug       0191 PQ: 0 ANSI: 7
> [  166.038449] [      C1] scsi 9:0:0:0: Power-on or device reset occurred
> [  166.045105] [   T1135] sd 9:0:0:0: Attached scsi generic sg3 type 0
> [  166.046426] [     T94] sd 9:0:0:0: [sdd] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
> [  166.048275] [     T94] sd 9:0:0:0: [sdd] Write Protect is off
> [  166.048854] [     T94] sd 9:0:0:0: [sdd] Mode Sense: 73 00 10 08
> [  166.051019] [     T94] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [  166.059601] [     T94] sd 9:0:0:0: [sdd] permanent stream count = 5
> [  166.063623] [     T94] sd 9:0:0:0: [sdd] Preferred minimum I/O size 512 bytes
> [  166.064329] [     T94] sd 9:0:0:0: [sdd] Optimal transfer size 524288 bytes
> [  166.094781] [     T94] sd 9:0:0:0: [sdd] Attached SCSI disk
> 
> [  166.855819] [   T1161] ======================================================
> [  166.856339] [   T1161] WARNING: possible circular locking dependency detected
> [  166.856945] [   T1161] 6.14.0-rc1 #252 Not tainted
> [  166.857292] [   T1161] ------------------------------------------------------
> [  166.857874] [   T1161] blktrace/1161 is trying to acquire lock:
> [  166.858310] [   T1161] ffff88811dbfe5e0 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x99/0x120
> [  166.859053] [   T1161] 
>                           but task is already holding lock:
> [  166.859593] [   T1161] ffff8881082a1078 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: relay_file_read+0xa3/0x8a0
> [  166.860410] [   T1161] 
>                           which lock already depends on the new lock.
> 
> [  166.861269] [   T1161] 
>                           the existing dependency chain (in reverse order) is:
> [  166.863693] [   T1161] 
>                           -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
> [  166.866064] [   T1161]        down_write+0x8d/0x200
> [  166.867266] [   T1161]        start_creating.part.0+0x82/0x230
> [  166.868544] [   T1161]        debugfs_create_dir+0x3a/0x4c0
> [  166.869797] [   T1161]        blk_register_queue+0x12d/0x430
> [  166.870986] [   T1161]        add_disk_fwnode+0x6b1/0x1010
> [  166.872144] [   T1161]        sd_probe+0x94e/0xf30
> [  166.873262] [   T1161]        really_probe+0x1e3/0x8a0
> [  166.874372] [   T1161]        __driver_probe_device+0x18c/0x370
> [  166.875544] [   T1161]        driver_probe_device+0x4a/0x120
> [  166.876715] [   T1161]        __device_attach_driver+0x15e/0x270
> [  166.877890] [   T1161]        bus_for_each_drv+0x114/0x1a0
> [  166.878999] [   T1161]        __device_attach_async_helper+0x19c/0x240
> [  166.880180] [   T1161]        async_run_entry_fn+0x96/0x4f0
> [  166.881312] [   T1161]        process_one_work+0x85a/0x1460
> [  166.882411] [   T1161]        worker_thread+0x5e2/0xfc0
> [  166.883483] [   T1161]        kthread+0x39d/0x750
> [  166.884548] [   T1161]        ret_from_fork+0x30/0x70
> [  166.885629] [   T1161]        ret_from_fork_asm+0x1a/0x30
> [  166.886728] [   T1161] 
>                           -> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
> [  166.888799] [   T1161]        __mutex_lock+0x1aa/0x1360
> [  166.889863] [   T1161]        blk_mq_init_sched+0x3b5/0x5e0
> [  166.890907] [   T1161]        elevator_switch+0x149/0x4b0
> [  166.891928] [   T1161]        elv_iosched_store+0x29f/0x380
> [  166.892966] [   T1161]        queue_attr_store+0x313/0x480
> [  166.893976] [   T1161]        kernfs_fop_write_iter+0x39e/0x5a0
> [  166.895012] [   T1161]        vfs_write+0x5f9/0xe90
> [  166.895970] [   T1161]        ksys_write+0xf6/0x1c0
> [  166.896931] [   T1161]        do_syscall_64+0x93/0x180
> [  166.897886] [   T1161]        entry_SYSCALL_64_after_hwframe+0x76/0x7e

I think we can kill the above dependency: the debugfs API calls need not be
protected by q->debugfs_mutex, which is supposed to cover updates and queries
of block layer internal data structures.

I will try to cook a patch to fix this one.
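
If it helps, here is a rough sketch of that direction; the field names follow
the existing queue structure, but the code below is hypothetical, not a tested
patch:

        /*
         * Illustrative reshuffle only -- not the actual
         * blk_register_queue()/blk_mq_init_sched() code. The debugfs
         * call, which takes debugfs inode locks internally, moves out
         * of q->debugfs_mutex; the mutex then only covers publishing
         * the dentry into the queue's internal state.
         */
        struct dentry *dir;

        dir = debugfs_create_dir("sched", q->debugfs_dir);  /* unlocked */

        mutex_lock(&q->debugfs_mutex);
        q->sched_debugfs_dir = dir;        /* publish under the mutex */
        mutex_unlock(&q->debugfs_mutex);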


Thanks,
Ming


