Re: blktests failures with v6.14-rc1 kernel

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ming Lei <ming.lei@redhat.com>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"nbd@other.debian.org" <nbd@other.debian.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: blktests failures with v6.14-rc1 kernel
Date: Fri, 7 Feb 2025 16:52:08 +0800	[thread overview]
Message-ID: <Z6XJuIz012XATr7h@fedora> (raw)
In-Reply-To: <uyijd3ufbrfbiyyaajvhyhdyytssubekvymzgyiqjqmkh33cmi@ksqjpewsqlvw>

Hi Shinichiro,

On Fri, Feb 07, 2025 at 01:24:09AM +0000, Shinichiro Kawasaki wrote:
> Hi all,
> 
> I ran the latest blktests (git hash: 67aff550bd52) with the v6.14-rc1 kernel.
> I observed 5 failures listed below. Comparing with the previous report with
> the v6.13 kernel [1], one new failure was observed at zbd/009.
> 
> [1] https://lore.kernel.org/linux-nvme/rv3w2zcno7n3bgdy2ghxmedsqf23ptmakvjerbhopgxjsvgzmo@ioece7dyg2og/
> 
> List of failures
> ================
> #1: block/002
> #2: nvme/037 (fc transport)
> #3: nvme/041 (fc transport)
> #4: nvme/058 (loop transport)
> #5: zbd/009 (new)
> 
> 
> Two failures observed with the v6.13 kernel are not observed with the v6.14-rc1.
> 
> Failures no longer observed
> ===========================
> #1: block/001:
>     It looks resolved by fixes in v6.14-rc1 kernel.
> 
> #2: throtl/001 (CKI project, s390 arch)
>     I was not able to find blktests runs by CKI project with the v6.14-rc1
>     kernel.
> 
> 
> Failure description
> ===================
> 
> #1: block/002
> 
>     This test case fails with a lockdep WARN "possible circular locking
>     dependency detected". The lockdep splats shows q->q_usage_counter as one
>     of the involved locks. It was observed with the v6.13-rc2 kernel [2], and
>     still observed with v6.14-rc1 kernel. It needs further debug.
> 
>     [2] https://lore.kernel.org/linux-block/qskveo3it6rqag4xyleobe5azpxu6tekihao4qpdopvk44una2@y4lkoe6y3d6z/

[  342.568086][ T1023] -> #0 (&mm->mmap_lock){++++}-{4:4}:
[  342.569658][ T1023]        __lock_acquire+0x2e8b/0x6010
[  342.570577][ T1023]        lock_acquire+0x1b1/0x540
[  342.571463][ T1023]        __might_fault+0xb9/0x120
[  342.572338][ T1023]        _copy_from_user+0x34/0xa0
[  342.573231][ T1023]        __blk_trace_setup+0xa0/0x140
[  342.574129][ T1023]        blk_trace_ioctl+0x14e/0x270
[  342.575033][ T1023]        blkdev_ioctl+0x38f/0x5c0
[  342.575919][ T1023]        __x64_sys_ioctl+0x130/0x190
[  342.576824][ T1023]        do_syscall_64+0x93/0x180
[  342.577714][ T1023]        entry_SYSCALL_64_after_hwframe+0x76/0x7e

The above dependency between ->mmap_lock and ->debugfs_mutex has been cut by
commit b769a2f409e7 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock"),
so I'd suggest to double check this one.

You mentioned it can be reproduced in v6.14-rc1, but not see such warning
from v6.14-rc1.

> #5: zbd/009 (new)
> 
>     This test case fails with a lockdep WARN "possible circular locking
>     dependency detected" [5]. The lockdep splats shows q->q_usage_counter as one
>     of the involved locks. This is common as the block/002 failure. It needs
>     further debug.
> 
> [5] kernel message during zbd/009 run
> 
> [  204.099296] [   T1004] run blktests zbd/009 at 2025-02-07 10:01:36
> [  204.155021] [   T1040] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
> [  204.553613] [   T1041] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
> [  204.554438] [   T1041] scsi host9: scsi_debug: version 0191 [20210520]
>                             dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
> [  204.558331] [   T1041] scsi 9:0:0:0: Direct-Access-ZBC Linux    scsi_debug       0191 PQ: 0 ANSI: 7
> [  204.560269] [      C2] scsi 9:0:0:0: Power-on or device reset occurred
> [  204.562871] [   T1041] sd 9:0:0:0: Attached scsi generic sg3 type 20
> [  204.563013] [    T100] sd 9:0:0:0: [sdd] Host-managed zoned block device
> [  204.564518] [    T100] sd 9:0:0:0: [sdd] 262144 4096-byte logical blocks: (1.07 GB/1.00 GiB)
> [  204.565477] [    T100] sd 9:0:0:0: [sdd] Write Protect is off
> [  204.565948] [    T100] sd 9:0:0:0: [sdd] Mode Sense: 5b 00 10 08
> [  204.566245] [    T100] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [  204.567453] [    T100] sd 9:0:0:0: [sdd] permanent stream count = 5
> [  204.568276] [    T100] sd 9:0:0:0: [sdd] Preferred minimum I/O size 4096 bytes
> [  204.569067] [    T100] sd 9:0:0:0: [sdd] Optimal transfer size 4194304 bytes
> [  204.571080] [    T100] sd 9:0:0:0: [sdd] 256 zones of 1024 logical blocks
> [  204.593822] [    T100] sd 9:0:0:0: [sdd] Attached SCSI disk
> [  204.901514] [   T1067] BTRFS: device fsid 15196e63-e303-48ed-9dcb-9ec397479c42 devid 1 transid 8 /dev/sdd (8:48) scanned by mount (1067)
> [  204.910330] [   T1067] BTRFS info (device sdd): first mount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
> [  204.913129] [   T1067] BTRFS info (device sdd): using crc32c (crc32c-generic) checksum algorithm
> [  204.914856] [   T1067] BTRFS info (device sdd): using free-space-tree
> [  204.925816] [   T1067] BTRFS info (device sdd): host-managed zoned block device /dev/sdd, 256 zones of 4194304 bytes
> [  204.929320] [   T1067] BTRFS info (device sdd): zoned mode enabled with zone size 4194304
> [  204.935403] [   T1067] BTRFS info (device sdd): checking UUID tree
> [  215.637712] [   T1103] BTRFS info (device sdd): last unmount of filesystem 15196e63-e303-48ed-9dcb-9ec397479c42
> 
> [  215.762293] [   T1110] ======================================================
> [  215.763636] [   T1110] WARNING: possible circular locking dependency detected
> [  215.765092] [   T1110] 6.14.0-rc1 #252 Not tainted
> [  215.766271] [   T1110] ------------------------------------------------------
> [  215.767615] [   T1110] modprobe/1110 is trying to acquire lock:
> [  215.768999] [   T1110] ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60
> [  215.770700] [   T1110] 
>                           but task is already holding lock:
> [  215.773077] [   T1110] ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
> [  215.774685] [   T1110] 
>                           which lock already depends on the new lock.
> 
> [  215.778184] [   T1110] 
>                           the existing dependency chain (in reverse order) is:
> [  215.780532] [   T1110] 
>                           -> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}:
> [  215.782937] [   T1110]        blk_queue_enter+0x3d9/0x500
> [  215.784175] [   T1110]        blk_mq_alloc_request+0x47d/0x8e0
> [  215.785434] [   T1110]        scsi_execute_cmd+0x14f/0xb80
> [  215.786662] [   T1110]        sd_zbc_do_report_zones+0x1c1/0x470
> [  215.787989] [   T1110]        sd_zbc_report_zones+0x362/0xd60
> [  215.789222] [   T1110]        blkdev_report_zones+0x1b1/0x2e0
> [  215.790448] [   T1110]        btrfs_get_dev_zones+0x215/0x7e0 [btrfs]
> [  215.791887] [   T1110]        btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs]
> [  215.793342] [   T1110]        btrfs_make_block_group+0x36b/0x870 [btrfs]
> [  215.794752] [   T1110]        btrfs_create_chunk+0x147d/0x2320 [btrfs]
> [  215.796150] [   T1110]        btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs]
> [  215.797474] [   T1110]        start_transaction+0xce6/0x1620 [btrfs]
> [  215.798858] [   T1110]        btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs]
> [  215.800334] [   T1110]        kthread+0x39d/0x750
> [  215.801479] [   T1110]        ret_from_fork+0x30/0x70
> [  215.802662] [   T1110]        ret_from_fork_asm+0x1a/0x30

Not sure why fs_info->dev_replace is required for reporting zones.

> [  215.803902] [   T1110] 
>                           -> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}:
> [  215.805993] [   T1110]        down_read+0x9b/0x470
> [  215.807088] [   T1110]        btrfs_map_block+0x2ce/0x2ce0 [btrfs]
> [  215.808366] [   T1110]        btrfs_submit_chunk+0x2d4/0x16c0 [btrfs]
> [  215.809687] [   T1110]        btrfs_submit_bbio+0x16/0x30 [btrfs]
> [  215.810983] [   T1110]        btree_write_cache_pages+0xb5a/0xf90 [btrfs]
> [  215.812295] [   T1110]        do_writepages+0x17f/0x7b0
> [  215.813416] [   T1110]        __writeback_single_inode+0x114/0xb00
> [  215.814575] [   T1110]        writeback_sb_inodes+0x52b/0xe00
> [  215.815717] [   T1110]        wb_writeback+0x1a7/0x800
> [  215.816924] [   T1110]        wb_workfn+0x12a/0xbd0
> [  215.817951] [   T1110]        process_one_work+0x85a/0x1460
> [  215.818985] [   T1110]        worker_thread+0x5e2/0xfc0
> [  215.820013] [   T1110]        kthread+0x39d/0x750
> [  215.821000] [   T1110]        ret_from_fork+0x30/0x70
> [  215.822010] [   T1110]        ret_from_fork_asm+0x1a/0x30
> [  215.822988] [   T1110] 
>                           -> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}:
> [  215.824855] [   T1110]        __mutex_lock+0x1aa/0x1360
> [  215.825856] [   T1110]        btree_write_cache_pages+0x252/0xf90 [btrfs]
> [  215.827089] [   T1110]        do_writepages+0x17f/0x7b0
> [  215.828027] [   T1110]        __writeback_single_inode+0x114/0xb00
> [  215.829141] [   T1110]        writeback_sb_inodes+0x52b/0xe00
> [  215.830129] [   T1110]        wb_writeback+0x1a7/0x800
> [  215.831084] [   T1110]        wb_workfn+0x12a/0xbd0
> [  215.831950] [   T1110]        process_one_work+0x85a/0x1460
> [  215.832862] [   T1110]        worker_thread+0x5e2/0xfc0
> [  215.833826] [   T1110]        kthread+0x39d/0x750
> [  215.834715] [   T1110]        ret_from_fork+0x30/0x70
> [  215.835669] [   T1110]        ret_from_fork_asm+0x1a/0x30
> [  215.836594] [   T1110] 
>                           -> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
> [  215.838347] [   T1110]        __lock_acquire+0x2f52/0x5ea0
> [  215.839258] [   T1110]        lock_acquire+0x1b1/0x540
> [  215.840156] [   T1110]        __flush_work+0x3ac/0xb60
> [  215.841041] [   T1110]        wb_shutdown+0x15b/0x1f0
> [  215.841915] [   T1110]        bdi_unregister+0x172/0x5b0
> [  215.842793] [   T1110]        del_gendisk+0x841/0xa20
> [  215.843724] [   T1110]        sd_remove+0x85/0x130
> [  215.844660] [   T1110]        device_release_driver_internal+0x368/0x520
> [  215.845757] [   T1110]        bus_remove_device+0x1f1/0x3f0
> [  215.846755] [   T1110]        device_del+0x3bd/0x9c0
> [  215.847712] [   T1110]        __scsi_remove_device+0x272/0x340
> [  215.848727] [   T1110]        scsi_forget_host+0xf7/0x170
> [  215.849710] [   T1110]        scsi_remove_host+0xd2/0x2a0
> [  215.850682] [   T1110]        sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
> [  215.851788] [   T1110]        device_release_driver_internal+0x368/0x520
> [  215.852853] [   T1110]        bus_remove_device+0x1f1/0x3f0
> [  215.853885] [   T1110]        device_del+0x3bd/0x9c0
> [  215.854840] [   T1110]        device_unregister+0x13/0xa0
> [  215.855850] [   T1110]        sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
> [  215.856947] [   T1110]        scsi_debug_exit+0x17/0x70 [scsi_debug]
> [  215.857968] [   T1110]        __do_sys_delete_module.isra.0+0x321/0x520
> [  215.858999] [   T1110]        do_syscall_64+0x93/0x180
> [  215.859930] [   T1110]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  215.860974] [   T1110] 
>                           other info that might help us debug this:

This warning looks one real issue.


Thanks,
Ming

next prev parent reply	other threads:[~2025-02-07  8:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-07  1:24 blktests failures with v6.14-rc1 kernel Shinichiro Kawasaki
2025-02-07  8:52 ` Ming Lei [this message]
2025-02-07 12:22   ` Shinichiro Kawasaki
2025-02-07 12:47     ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6XJuIz012XATr7h@fedora \
    --to=ming.lei@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=nbd@other.debian.org \
    --cc=shinichiro.kawasaki@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.