public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Stefan Haberland <sth@linux.ibm.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	linux-block@vger.kernel.org, "Nilay Shroff" <nilay@linux.ibm.com>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH V3 00/20] block: unify elevator changing and fix lockdep warning
Date: Tue, 29 Apr 2025 20:11:22 +0800	[thread overview]
Message-ID: <aBDB6sJS8dVxwe6x@fedora> (raw)
In-Reply-To: <0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com>

On Tue, Apr 29, 2025 at 02:00:27PM +0200, Stefan Haberland wrote:
> Am 24.04.25 um 17:21 schrieb Ming Lei:
> 
> > Hello Jens,
> >
> > This patchset cleans up elevator change code, and unifying it via single
> > helper, meantime moves kobject_add/del & debugfs register/unregister out of
> > queue freezing & elevator_lock. This way fixes many lockdep warnings
> > reported recently, especially since fs_reclaim is connected with freeze lock
> > manually by commit ffa1e7ada456 ("block: Make request_queue lockdep splats
> > show up earlier").
> >
> >
> > Thanks,
> > Ming
> >
> > V3:
> > 	- replace srcu with rw_sem for avoiding race between add/del disk &
> > 	  elevator switch and updating nr_hw_queues (Nilay Shoff)
> >
> > 	- add elv_update_nr_hw_queues() for elevator reattachment in case of
> > 	updating nr_hw_queues, meantime keep elv_change_ctx as local structure
> > 	(Christoph)
> >
> > 	- replace ->elevator_lock with disk->rqos_state_mutex for covering wbt
> > 	state change
> >
> > 	- add new patch "block: use q->elevator with ->elevator_lock held in elv_iosched_show()"
> >
> > 	- small cleanup & commit log improvement
> >
> > V2:
> > 	- retry add/del disk when blk_mq_update_nr_hw_queues() is in-progress
> >
> > 	- swap blk_mq_add_queue_tag_set() with blk_mq_map_swqueue() in
> > 	blk_mq_init_allocated_queue() (Nilay Shroff)
> >
> > 	- move ELEVATOR_FLAG_DISABLE_WBT to request queue's flags (Nilay Shoff) 
> >
> > 	- fix race because of delaying elevator unregister
> >
> > 	- define flags of `elv_change_ctx` as `bool` (Christoph)
> >
> > 	- improve comment and commit log (Christoph)
> >
> > Ming Lei (20):
> >   block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue()
> >   block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag
> >   block: don't call freeze queue in elevator_switch() and
> >     elevator_disable()
> >   block: use q->elevator with ->elevator_lock held in elv_iosched_show()
> >   block: add two helpers for registering/un-registering sched debugfs
> >   block: move sched debugfs register into elvevator_register_queue
> >   block: prevent adding/deleting disk during updating nr_hw_queues
> >   block: don't allow to switch elevator if updating nr_hw_queues is
> >     in-progress
> >   block: simplify elevator reattachment for updating nr_hw_queues
> >   block: move blk_unregister_queue() & device_del() after freeze wait
> >   block: move queue freezing & elevator_lock into elevator_change()
> >   block: add `struct elv_change_ctx` for unifying elevator change
> >   block: unifying elevator change
> >   block: pass elevator_queue to elv_register_queue & unregister_queue
> >   block: fail to show/store elevator sysfs attribute if elevator is
> >     dying
> >   block: move elv_register[unregister]_queue out of elevator_lock
> >   block: move debugfs/sysfs register out of freezing queue
> >   block: remove several ->elevator_lock
> >   block: move hctx cpuhp add/del out of queue freezing
> >   block: move wbt_enable_default() out of queue freezing from sched
> >     ->exit()
> >
> >  block/bfq-iosched.c    |   6 +-
> >  block/blk-mq-debugfs.c |  12 +-
> >  block/blk-mq-sched.c   |  41 +++---
> >  block/blk-mq.c         | 132 +++---------------
> >  block/blk-sysfs.c      |  24 ++--
> >  block/blk-wbt.c        |  13 +-
> >  block/blk.h            |   8 +-
> >  block/elevator.c       | 302 ++++++++++++++++++++++++++++-------------
> >  block/elevator.h       |   6 +-
> >  block/genhd.c          | 129 +++++++++++-------
> >  include/linux/blk-mq.h |   3 +
> >  include/linux/blkdev.h |   5 +
> >  12 files changed, 365 insertions(+), 316 deletions(-)
> 
> Hi,
> while testing the patchset on s390 I still get the following lockdep splat on each boot:
> 
> ======================================================
>  WARNING: possible circular locking dependency detected
>  6.15.0-rc4-gc2b4d8dcb3d2 #3 Not tainted
>  ------------------------------------------------------
>  (udev-worker)/1810 is trying to acquire lock:
>  0000005fb84de3a8 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x54/0x130
> 
>  but task is already holding lock:
>  0000005fb84dde18 (&q->q_usage_counter(io)#34){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x26/0x40
> 
>  which lock already depends on the new lock.
> 
>  the existing dependency chain (in reverse order) is:
> 
>  -> #3 (&q->q_usage_counter(io)#34){++++}-{0:0}:
>         __lock_acquire+0x6da/0xcc0
>         lock_acquire.part.0+0x10c/0x290
>         lock_acquire+0xb0/0x1a0
>         blk_alloc_queue+0x306/0x340
>         blk_mq_alloc_queue+0x60/0xd0
>         scsi_alloc_sdev+0x27c/0x3b0
>         scsi_probe_and_add_lun+0x31a/0x480
>         scsi_report_lun_scan+0x382/0x430
>         __scsi_scan_target+0x11a/0x240
>         scsi_scan_target+0xdc/0x100
>         fc_scsi_scan_rport+0xc2/0xd0
>         process_one_work+0x2a6/0x5d0
>         worker_thread+0x220/0x410
>         kthread+0x164/0x2d0
>         __ret_from_fork+0x3c/0x60
>         ret_from_fork+0xa/0x38
> 
>  -> #2 (fs_reclaim){+.+.}-{0:0}:
>         __lock_acquire+0x6da/0xcc0
>         lock_acquire.part.0+0x10c/0x290
>         lock_acquire+0xb0/0x1a0
>         __fs_reclaim_acquire+0x44/0x50
>         fs_reclaim_acquire+0xba/0x100
>         __kmalloc_noprof+0xae/0x5e0
>         pcpu_alloc_chunk+0x30/0x170
>         pcpu_create_chunk+0x22/0x130
>         pcpu_alloc_noprof+0x842/0x970
>         do_kmem_cache_create+0x1e0/0x4b0
>         __kmem_cache_create_args+0x238/0x340
>         register_ftrace_graph+0x438/0x460
>         trace_selftest_startup_function_graph+0x62/0x260
>         run_tracer_selftest+0x116/0x1b0
>         register_tracer+0x192/0x260
>         do_one_initcall+0x4a/0x180
>         do_initcalls+0x146/0x170
>         kernel_init_freeable+0x230/0x270
>         kernel_init+0x2e/0x188
>         __ret_from_fork+0x3c/0x60
>         ret_from_fork+0xa/0x38
> 
>  -> #1 (pcpu_alloc_mutex){+.+.}-{4:4}:
>         __lock_acquire+0x6da/0xcc0
>         lock_acquire.part.0+0x10c/0x290
>         lock_acquire+0xb0/0x1a0
>         __mutex_lock+0xae/0xa20
>         mutex_lock_killable_nested+0x32/0x40
>         pcpu_alloc_noprof+0x6ea/0x970

It is one known dependency on percpu alloc lock, and we can't cover every
one in single patchset, especially this patchset is becoming bigger.

And this percpu lock splat becomes much easier to address after the elevator
patchset is merged.

We can move the elevator data allocation out of queue freeze, then the
dependency can be cut.


Thanks
Ming


      reply	other threads:[~2025-04-29 12:11 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-24 15:21 [PATCH V3 00/20] block: unify elevator changing and fix lockdep warning Ming Lei
2025-04-24 15:21 ` [PATCH V3 01/20] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue() Ming Lei
2025-04-24 15:21 ` [PATCH V3 02/20] block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag Ming Lei
2025-04-25 14:29   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 03/20] block: don't call freeze queue in elevator_switch() and elevator_disable() Ming Lei
2025-04-25 14:29   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 04/20] block: use q->elevator with ->elevator_lock held in elv_iosched_show() Ming Lei
2025-04-25  6:08   ` Hannes Reinecke
2025-04-25 10:53   ` Nilay Shroff
2025-04-25 14:29   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 05/20] block: add two helpers for registering/un-registering sched debugfs Ming Lei
2025-04-24 15:21 ` [PATCH V3 06/20] block: move sched debugfs register into elvevator_register_queue Ming Lei
2025-04-24 15:21 ` [PATCH V3 07/20] block: prevent adding/deleting disk during updating nr_hw_queues Ming Lei
2025-04-25  6:33   ` Hannes Reinecke
2025-04-25 11:37   ` Nilay Shroff
2025-04-25 14:33   ` Christoph Hellwig
2025-04-25 16:46   ` kernel test robot
2025-04-24 15:21 ` [PATCH V3 08/20] block: don't allow to switch elevator if updating nr_hw_queues is in-progress Ming Lei
2025-04-25  6:33   ` Hannes Reinecke
2025-04-25 12:48   ` Nilay Shroff
2025-04-27  2:27     ` Ming Lei
2025-04-28 16:17       ` Nilay Shroff
2025-04-29  2:43         ` Ming Lei
2025-04-29 10:22           ` Nilay Shroff
2025-04-30  0:54             ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 09/20] block: simplify elevator reattachment for updating nr_hw_queues Ming Lei
2025-04-25  6:34   ` Hannes Reinecke
2025-04-25 18:12   ` Christoph Hellwig
2025-04-29  9:51     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 10/20] block: move blk_unregister_queue() & device_del() after freeze wait Ming Lei
2025-04-25  6:35   ` Hannes Reinecke
2025-04-25 12:50   ` Nilay Shroff
2025-04-25 14:34   ` Christoph Hellwig
2025-04-28 11:51     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 11/20] block: move queue freezing & elevator_lock into elevator_change() Ming Lei
2025-04-25  6:36   ` Hannes Reinecke
2025-04-25 12:54   ` Nilay Shroff
2025-04-25 14:35   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 12/20] block: add `struct elv_change_ctx` for unifying elevator change Ming Lei
2025-04-25  6:38   ` Hannes Reinecke
2025-04-25 18:23   ` Christoph Hellwig
2025-04-29 15:45     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 13/20] block: " Ming Lei
2025-04-25  6:39   ` Hannes Reinecke
2025-04-25 13:03   ` Nilay Shroff
2025-04-30  0:46     ` Ming Lei
2025-04-25 18:30   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 14/20] block: pass elevator_queue to elv_register_queue & unregister_queue Ming Lei
2025-04-25  6:40   ` Hannes Reinecke
2025-04-24 15:21 ` [PATCH V3 15/20] block: fail to show/store elevator sysfs attribute if elevator is dying Ming Lei
2025-04-25  6:45   ` Hannes Reinecke
2025-04-25 18:36   ` Christoph Hellwig
2025-04-30  1:07     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 16/20] block: move elv_register[unregister]_queue out of elevator_lock Ming Lei
2025-04-25  6:46   ` Hannes Reinecke
2025-04-25 13:05   ` Nilay Shroff
2025-04-25 18:37   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 17/20] block: move debugfs/sysfs register out of freezing queue Ming Lei
2025-04-25  6:47   ` Hannes Reinecke
2025-04-25 18:38   ` Christoph Hellwig
2025-04-28 11:28     ` Ming Lei
2025-04-24 15:21 ` [PATCH V3 18/20] block: remove several ->elevator_lock Ming Lei
2025-04-25  6:48   ` Hannes Reinecke
2025-04-25 18:38   ` Christoph Hellwig
2025-04-24 15:21 ` [PATCH V3 19/20] block: move hctx cpuhp add/del out of queue freezing Ming Lei
2025-04-25  6:49   ` Hannes Reinecke
2025-04-24 15:21 ` [PATCH V3 20/20] block: move wbt_enable_default() out of queue freezing from sched ->exit() Ming Lei
2025-04-25 13:10   ` Nilay Shroff
2025-04-29 10:59   ` Nilay Shroff
2025-04-29 12:00 ` [PATCH V3 00/20] block: unify elevator changing and fix lockdep warning Stefan Haberland
2025-04-29 12:11   ` Ming Lei [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBDB6sJS8dVxwe6x@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=nilay@linux.ibm.com \
    --cc=shinichiro.kawasaki@wdc.com \
    --cc=sth@linux.ibm.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox