public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Stefan Haberland <sth@linux.ibm.com>
To: Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org
Cc: "Nilay Shroff" <nilay@linux.ibm.com>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH V3 00/20] block: unify elevator changing and fix lockdep warning
Date: Tue, 29 Apr 2025 14:00:27 +0200	[thread overview]
Message-ID: <0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com> (raw)
In-Reply-To: <20250424152148.1066220-1-ming.lei@redhat.com>

Am 24.04.25 um 17:21 schrieb Ming Lei:

> Hello Jens,
>
> This patchset cleans up elevator change code, and unifying it via single
> helper, meantime moves kobject_add/del & debugfs register/unregister out of
> queue freezing & elevator_lock. This way fixes many lockdep warnings
> reported recently, especially since fs_reclaim is connected with freeze lock
> manually by commit ffa1e7ada456 ("block: Make request_queue lockdep splats
> show up earlier").
>
>
> Thanks,
> Ming
>
> V3:
> 	- replace srcu with rw_sem for avoiding race between add/del disk &
> 	  elevator switch and updating nr_hw_queues (Nilay Shoff)
>
> 	- add elv_update_nr_hw_queues() for elevator reattachment in case of
> 	updating nr_hw_queues, meantime keep elv_change_ctx as local structure
> 	(Christoph)
>
> 	- replace ->elevator_lock with disk->rqos_state_mutex for covering wbt
> 	state change
>
> 	- add new patch "block: use q->elevator with ->elevator_lock held in elv_iosched_show()"
>
> 	- small cleanup & commit log improvement
>
> V2:
> 	- retry add/del disk when blk_mq_update_nr_hw_queues() is in-progress
>
> 	- swap blk_mq_add_queue_tag_set() with blk_mq_map_swqueue() in
> 	blk_mq_init_allocated_queue() (Nilay Shroff)
>
> 	- move ELEVATOR_FLAG_DISABLE_WBT to request queue's flags (Nilay Shoff) 
>
> 	- fix race because of delaying elevator unregister
>
> 	- define flags of `elv_change_ctx` as `bool` (Christoph)
>
> 	- improve comment and commit log (Christoph)
>
> Ming Lei (20):
>   block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue()
>   block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag
>   block: don't call freeze queue in elevator_switch() and
>     elevator_disable()
>   block: use q->elevator with ->elevator_lock held in elv_iosched_show()
>   block: add two helpers for registering/un-registering sched debugfs
>   block: move sched debugfs register into elvevator_register_queue
>   block: prevent adding/deleting disk during updating nr_hw_queues
>   block: don't allow to switch elevator if updating nr_hw_queues is
>     in-progress
>   block: simplify elevator reattachment for updating nr_hw_queues
>   block: move blk_unregister_queue() & device_del() after freeze wait
>   block: move queue freezing & elevator_lock into elevator_change()
>   block: add `struct elv_change_ctx` for unifying elevator change
>   block: unifying elevator change
>   block: pass elevator_queue to elv_register_queue & unregister_queue
>   block: fail to show/store elevator sysfs attribute if elevator is
>     dying
>   block: move elv_register[unregister]_queue out of elevator_lock
>   block: move debugfs/sysfs register out of freezing queue
>   block: remove several ->elevator_lock
>   block: move hctx cpuhp add/del out of queue freezing
>   block: move wbt_enable_default() out of queue freezing from sched
>     ->exit()
>
>  block/bfq-iosched.c    |   6 +-
>  block/blk-mq-debugfs.c |  12 +-
>  block/blk-mq-sched.c   |  41 +++---
>  block/blk-mq.c         | 132 +++---------------
>  block/blk-sysfs.c      |  24 ++--
>  block/blk-wbt.c        |  13 +-
>  block/blk.h            |   8 +-
>  block/elevator.c       | 302 ++++++++++++++++++++++++++++-------------
>  block/elevator.h       |   6 +-
>  block/genhd.c          | 129 +++++++++++-------
>  include/linux/blk-mq.h |   3 +
>  include/linux/blkdev.h |   5 +
>  12 files changed, 365 insertions(+), 316 deletions(-)

Hi,
while testing the patchset on s390 I still get the following lockdep splat on each boot:

======================================================
 WARNING: possible circular locking dependency detected
 6.15.0-rc4-gc2b4d8dcb3d2 #3 Not tainted
 ------------------------------------------------------
 (udev-worker)/1810 is trying to acquire lock:
 0000005fb84de3a8 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x54/0x130

 but task is already holding lock:
 0000005fb84dde18 (&q->q_usage_counter(io)#34){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x26/0x40

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #3 (&q->q_usage_counter(io)#34){++++}-{0:0}:
        __lock_acquire+0x6da/0xcc0
        lock_acquire.part.0+0x10c/0x290
        lock_acquire+0xb0/0x1a0
        blk_alloc_queue+0x306/0x340
        blk_mq_alloc_queue+0x60/0xd0
        scsi_alloc_sdev+0x27c/0x3b0
        scsi_probe_and_add_lun+0x31a/0x480
        scsi_report_lun_scan+0x382/0x430
        __scsi_scan_target+0x11a/0x240
        scsi_scan_target+0xdc/0x100
        fc_scsi_scan_rport+0xc2/0xd0
        process_one_work+0x2a6/0x5d0
        worker_thread+0x220/0x410
        kthread+0x164/0x2d0
        __ret_from_fork+0x3c/0x60
        ret_from_fork+0xa/0x38

 -> #2 (fs_reclaim){+.+.}-{0:0}:
        __lock_acquire+0x6da/0xcc0
        lock_acquire.part.0+0x10c/0x290
        lock_acquire+0xb0/0x1a0
        __fs_reclaim_acquire+0x44/0x50
        fs_reclaim_acquire+0xba/0x100
        __kmalloc_noprof+0xae/0x5e0
        pcpu_alloc_chunk+0x30/0x170
        pcpu_create_chunk+0x22/0x130
        pcpu_alloc_noprof+0x842/0x970
        do_kmem_cache_create+0x1e0/0x4b0
        __kmem_cache_create_args+0x238/0x340
        register_ftrace_graph+0x438/0x460
        trace_selftest_startup_function_graph+0x62/0x260
        run_tracer_selftest+0x116/0x1b0
        register_tracer+0x192/0x260
        do_one_initcall+0x4a/0x180
        do_initcalls+0x146/0x170
        kernel_init_freeable+0x230/0x270
        kernel_init+0x2e/0x188
        __ret_from_fork+0x3c/0x60
        ret_from_fork+0xa/0x38

 -> #1 (pcpu_alloc_mutex){+.+.}-{4:4}:
        __lock_acquire+0x6da/0xcc0
        lock_acquire.part.0+0x10c/0x290
        lock_acquire+0xb0/0x1a0
        __mutex_lock+0xae/0xa20
        mutex_lock_killable_nested+0x32/0x40
        pcpu_alloc_noprof+0x6ea/0x970
        sbitmap_init_node+0x11a/0x230
        sbitmap_queue_init_node+0x3e/0x1c0
        blk_mq_init_tags+0xac/0x140
        blk_mq_alloc_map_and_rqs+0xb0/0x180
        blk_mq_init_sched+0xfc/0x200
        elevator_switch+0x74/0x260
        __elevator_change+0xde/0x150
        elevator_change+0xe4/0x130
        elevator_set_default+0x6c/0xc0
        blk_register_queue+0x188/0x210
        __add_disk_fwnode+0x202/0x420
        add_disk_fwnode+0x7a/0xb0
        sd_probe+0x22e/0x3f0
        really_probe+0x2ac/0x430
        driver_probe_device+0x3c/0xc0
        __device_attach_driver+0xc0/0x140
        bus_for_each_drv+0x90/0xe0
        __device_attach_async_helper+0x94/0xf0
        async_run_entry_fn+0x4a/0x180
        process_one_work+0x2a6/0x5d0
        worker_thread+0x220/0x410
        kthread+0x164/0x2d0
        __ret_from_fork+0x3c/0x60
        ret_from_fork+0xa/0x38

 -> #0 (&q->elevator_lock){+.+.}-{4:4}:
        check_prev_add+0x16c/0xff0
        validate_chain+0x734/0x9f0
        __lock_acquire+0x6da/0xcc0
        lock_acquire.part.0+0x10c/0x290
        lock_acquire+0xb0/0x1a0
        __mutex_lock+0xae/0xa20
        mutex_lock_nested+0x32/0x40
        elevator_change+0x54/0x130
        elv_iosched_store+0xec/0x140
        kernfs_fop_write_iter+0x166/0x230
        vfs_write+0x1ac/0x480
        ksys_write+0x7c/0x100
        __do_syscall+0x156/0x290
        system_call+0x74/0x98

        other info that might help us debug this:

 Chain exists of:
                 &q->elevator_lock --> fs_reclaim --> &q->q_usage_counter(io)#34

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&q->q_usage_counter(io)#34);
                                lock(fs_reclaim);
                                lock(&q->q_usage_counter(io)#34);
   lock(&q->elevator_lock);

                *** DEADLOCK ***

 6 locks held by (udev-worker)/1810:
  #0: 0000005f9d141450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0x100
  #1: 0000005f8ee55090 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x12a/0x230
  #2: 0000005fd0be69a0 (kn->active#68){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x136/0x230
  #3: 00000066feee93c0 (&set->update_nr_hwq_sema/1){.+.+}-{4:4}, at: elv_iosched_store+0xde/0x140
  #4: 0000005fb84dde18 (&q->q_usage_counter(io)#34){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x26/0x40
  #5: 0000005fb84dde58 (&q->q_usage_counter(queue)#11){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x26/0x40

               stack backtrace:
 CPU: 18 UID: 0 PID: 1810 Comm: (udev-worker) Not tainted 6.15.0-rc4-gc2b4d8dcb3d2 #3 PREEMPT
 Hardware name: IBM 9175 ME1 701 (LPAR)
 Call Trace:
  [<00000162f415409a>] dump_stack_lvl+0xa2/0xe8
  [<00000162f424cf9c>] print_circular_bug+0x18c/0x210
  [<00000162f424d198>] check_noncircular+0x178/0x190
  [<00000162f424e57c>] check_prev_add+0x16c/0xff0
  [<00000162f424fb34>] validate_chain+0x734/0x9f0
  [<00000162f4251f8a>] __lock_acquire+0x6da/0xcc0
  [<00000162f425267c>] lock_acquire.part.0+0x10c/0x290
  [<00000162f42528b0>] lock_acquire+0xb0/0x1a0
  [<00000162f525695e>] __mutex_lock+0xae/0xa20
  [<00000162f5257302>] mutex_lock_nested+0x32/0x40
  [<00000162f4c00a54>] elevator_change+0x54/0x130
  [<00000162f4c0136c>] elv_iosched_store+0xec/0x140
  [<00000162f46b6f66>] kernfs_fop_write_iter+0x166/0x230
  [<00000162f45cbe3c>] vfs_write+0x1ac/0x480
  [<00000162f45cc2ac>] ksys_write+0x7c/0x100
  [<00000162f524e956>] __do_syscall+0x156/0x290
  [<00000162f525ff94>] system_call+0x74/0x98
 INFO: lockdep is turned off.


The trigger is a udev rule changing the scheduler for the device.

thanks,
Stefan

  parent reply	other threads:[~2025-04-29 12:00 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-24 15:21 [PATCH V3 00/20] block: unify elevator changing and fix lockdep warning Ming Lei
2025-04-24 15:21 ` [PATCH V3 01/20] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue() Ming Lei
2025-04-24 15:21 ` [PATCH V3 02/20] block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag Ming Lei
2025-04-25 14:29   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 03/20] block: don't call freeze queue in elevator_switch() and elevator_disable() Ming Lei
2025-04-25 14:29   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 04/20] block: use q->elevator with ->elevator_lock held in elv_iosched_show() Ming Lei
2025-04-25  6:08   ` Hannes Reinecke
2025-04-25 10:53   ` Nilay Shroff
2025-04-25 14:29   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 05/20] block: add two helpers for registering/un-registering sched debugfs Ming Lei
2025-04-24 15:21 ` [PATCH V3 06/20] block: move sched debugfs register into elvevator_register_queue Ming Lei
2025-04-24 15:21 ` [PATCH V3 07/20] block: prevent adding/deleting disk during updating nr_hw_queues Ming Lei
2025-04-25  6:33   ` Hannes Reinecke
2025-04-25 11:37   ` Nilay Shroff
2025-04-25 14:33   ` Christoph Hellwig
2025-04-25 16:46   ` kernel test robot
2025-04-24 15:21 ` [PATCH V3 08/20] block: don't allow to switch elevator if updating nr_hw_queues is in-progress Ming Lei
2025-04-25  6:33   ` Hannes Reinecke
2025-04-25 12:48   ` Nilay Shroff
2025-04-27  2:27     ` Ming Lei
2025-04-28 16:17       ` Nilay Shroff
2025-04-29  2:43         ` Ming Lei
2025-04-29 10:22           ` Nilay Shroff
2025-04-30  0:54             ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 09/20] block: simplify elevator reattachment for updating nr_hw_queues Ming Lei
2025-04-25  6:34   ` Hannes Reinecke
2025-04-25 18:12   ` Christoph Hellwig
2025-04-29  9:51     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 10/20] block: move blk_unregister_queue() & device_del() after freeze wait Ming Lei
2025-04-25  6:35   ` Hannes Reinecke
2025-04-25 12:50   ` Nilay Shroff
2025-04-25 14:34   ` Christoph Hellwig
2025-04-28 11:51     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 11/20] block: move queue freezing & elevator_lock into elevator_change() Ming Lei
2025-04-25  6:36   ` Hannes Reinecke
2025-04-25 12:54   ` Nilay Shroff
2025-04-25 14:35   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 12/20] block: add `struct elv_change_ctx` for unifying elevator change Ming Lei
2025-04-25  6:38   ` Hannes Reinecke
2025-04-25 18:23   ` Christoph Hellwig
2025-04-29 15:45     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 13/20] block: " Ming Lei
2025-04-25  6:39   ` Hannes Reinecke
2025-04-25 13:03   ` Nilay Shroff
2025-04-30  0:46     ` Ming Lei
2025-04-25 18:30   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 14/20] block: pass elevator_queue to elv_register_queue & unregister_queue Ming Lei
2025-04-25  6:40   ` Hannes Reinecke
2025-04-24 15:21 ` [PATCH V3 15/20] block: fail to show/store elevator sysfs attribute if elevator is dying Ming Lei
2025-04-25  6:45   ` Hannes Reinecke
2025-04-25 18:36   ` Christoph Hellwig
2025-04-30  1:07     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 16/20] block: move elv_register[unregister]_queue out of elevator_lock Ming Lei
2025-04-25  6:46   ` Hannes Reinecke
2025-04-25 13:05   ` Nilay Shroff
2025-04-25 18:37   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 17/20] block: move debugfs/sysfs register out of freezing queue Ming Lei
2025-04-25  6:47   ` Hannes Reinecke
2025-04-25 18:38   ` Christoph Hellwig
2025-04-28 11:28     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 18/20] block: remove several ->elevator_lock Ming Lei
2025-04-25  6:48   ` Hannes Reinecke
2025-04-25 18:38   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 19/20] block: move hctx cpuhp add/del out of queue freezing Ming Lei
2025-04-25  6:49   ` Hannes Reinecke
2025-04-24 15:21 ` [PATCH V3 20/20] block: move wbt_enable_default() out of queue freezing from sched ->exit() Ming Lei
2025-04-25 13:10   ` Nilay Shroff
2025-04-29 10:59   ` Nilay Shroff
2025-04-29 12:00 ` Stefan Haberland [this message]
2025-04-29 12:11   ` [PATCH V3 00/20] block: unify elevator changing and fix lockdep warning Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com \
    --to=sth@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=nilay@linux.ibm.com \
    --cc=shinichiro.kawasaki@wdc.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox