From: Ming Lei <ming.lei@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	linux-block@vger.kernel.org, "Nilay Shroff" <nilay@linux.ibm.com>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH 04/15] block: prevent elevator switch during updating nr_hw_queues
Date: Mon, 14 Apr 2025 08:54:36 +0800
Message-ID: <Z_xczMuX5_yDKdAs@fedora>
In-Reply-To: <20250410143622.GC10701@lst.de>

On Thu, Apr 10, 2025 at 04:36:22PM +0200, Christoph Hellwig wrote:
> On Thu, Apr 10, 2025 at 09:30:16PM +0800, Ming Lei wrote:
> > updating nr_hw_queues is usually used for error handling code, when it
> 
> Capitalize the first word of each sentence, please.
> 
> > doesn't make sense to allow blk-mq elevator switching, since nr_hw_queues
> > may change, and elevator tags depends on nr_hw_queues.
> 
> I don't think it's really updated from error handling

NVMe does use it in error handling. I can drop the error-handling wording,
but the problem stays the same.

> 
>  - nbd does it when starting a device
>  - nullb can do it through debugfs
>  - xen-blkfront does it when resuming from a suspend
>  - nvme does it when resetting a controller.  While error handling
>    can escalate to it, it's basically probing and re-probing code

Reset is part of error handling.

> 
> > Prevent elevator switch during updating nr_hw_queues by setting flag of
> > BLK_MQ_F_UPDATE_HW_QUEUES, and use srcu to fail elevator switch during
> > the period. Here elevator switch code is srcu reader of nr_hw_queues,
> > and blk_mq_update_nr_hw_queues() is the writer.
> 
> That being said as we generally are in a setup path I think the general
> idea is fine.  No devices should be live yet at this point and thus
> no udev rules changing the scheduler should run yet.
> 
> > This way avoids lot of trouble.
> 
> Can you spell that out a bit?

Closes: https://lore.kernel.org/linux-block/mz4t4tlwiqjijw3zvqnjb7ovvvaegkqganegmmlc567tt5xj67@xal5ro544cnc/

> 
> > Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > Closes: https://lore.kernel.org/linux-block/mz4t4tlwiqjijw3zvqnjb7ovvvaegkqganegmmlc567tt5xj67@xal5ro544cnc/
> 
> Are we using Closes for bug reports now?  I haven't really seen that
> anywhere.

The blktests case block/039 isn't merged yet; its patch was posted recently.

This test triggers a kernel panic and a KASAN report.

> 
> >  out_cleanup_srcu:
> >  	if (set->flags & BLK_MQ_F_BLOCKING)
> >  		cleanup_srcu_struct(set->srcu);
> > @@ -5081,7 +5087,18 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
> >  void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
> >  {
> >  	mutex_lock(&set->tag_list_lock);
> > +	/*
> > +	 * Mark us in updating nr_hw_queues for preventing switching
> > +	 * elevator
> >
> > +	 *
> > +	 * Elevator switch code can _not_ acquire ->tag_list_lock
> 
> Please add a . at the end of a sentence.  Also this should probably
> be something like "Mark us as in.." but I'll leave more nitpicking
> to the native speakers.

OK.

> 
> >  	struct request_queue *q = disk->queue;
> > +	struct blk_mq_tag_set *set = q->tag_set;
> >  
> >  	/*
> >  	 * If the attribute needs to load a module, do it before freezing the
> > @@ -732,6 +733,13 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
> >  
> >  	elv_iosched_load_module(name);
> >  
> > +	idx = srcu_read_lock(&set->update_nr_hwq_srcu);
> > +
> > +	if (set->flags & BLK_MQ_F_UPDATE_HW_QUEUES) {
> 
> What provides atomicity for field modifications vs reading of set->flags?
> i.e. does this need to switch using test/set_bit?

Writes are serialized by the tag set lock, together with synchronize_srcu().

Reads are covered by the SRCU read lock.

It is typical RCU usage: one writer vs. multiple readers.

> 
> > +	struct srcu_struct	update_nr_hwq_srcu;
> >  };
> >  
> >  /**
> > @@ -681,7 +682,14 @@ enum {
> >  	 */
> >  	BLK_MQ_F_NO_SCHED_BY_DEFAULT	= 1 << 6,
> >  
> > -	BLK_MQ_F_MAX = 1 << 7,
> > +	/*
> > +	 * True when updating nr_hw_queues is in-progress
> > +	 *
> > +	 * tag_set only flag, not usable for hctx
> > +	 */
> > +	BLK_MQ_F_UPDATE_HW_QUEUES	= 1 << 7,
> > +
> > +	BLK_MQ_F_MAX = 1 << 8,
> 
> Also mixing internal state with driver provided flags is always
> a bad idea.  So this should probably be a new state field in the
> tag_set and not reuse flags.
 
That is fine, but BLK_MQ_F_TAG_QUEUE_SHARED is used in this way too.

thanks, 
Ming


