linux-block.vger.kernel.org archive mirror
From: Ming Lei <ming.lei@redhat.com>
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH 09/15] block: unifying elevator change
Date: Wed, 16 Apr 2025 09:49:06 +0800
Message-ID: <Z_8Mkkja_ujKPYLd@fedora>
In-Reply-To: <a5896cdb-a59a-4a37-9f99-20522f5d2987@linux.ibm.com>

On Tue, Apr 15, 2025 at 06:00:47PM +0530, Nilay Shroff wrote:
> 
> 
> On 4/14/25 6:52 AM, Ming Lei wrote:
> > On Fri, Apr 11, 2025 at 12:07:34AM +0530, Nilay Shroff wrote:
> >>
> >>
> >> On 4/10/25 7:00 PM, Ming Lei wrote:
> >>>  /*
> >>>   * Use the default elevator settings. If the chosen elevator initialization
> >>>   * fails, fall back to the "none" elevator (no elevator).
> >>>   */
> >>> -void elevator_init_mq(struct request_queue *q)
> >>> +void elevator_set_default(struct request_queue *q)
> >>>  {
> >>> -	struct elevator_type *e;
> >>> -	unsigned int memflags;
> >>> +	struct elev_change_ctx ctx = { };
> >>>  	int err;
> >>>  
> >>> -	WARN_ON_ONCE(blk_queue_registered(q));
> >>> -
> >>> -	if (unlikely(q->elevator))
> >>> +	if (!queue_is_mq(q))
> >>>  		return;
> >>>  
> >>> -	e = elevator_get_default(q);
> >>> -	if (!e)
> >>> +	ctx.name = use_default_elevator(q) ? "mq-deadline" : "none";
> >>> +	if (!q->elevator && !strcmp(ctx.name, "none"))
> >>>  		return;
> >>> +	err = elevator_change(q, &ctx);
> >>> +	if (err < 0)
> >>> +		pr_warn("\"%s\" set elevator failed %d, "
> >>> +			"falling back to \"none\"\n", ctx.name, err);
> >>> +}
> >>>  
> >> If we fail to set the elevator to the default (mq-deadline) while registering the queue,
> >> because an nr_hw_queue update is running simultaneously, then we may end up setting
> >> the queue elevator to none, and that's not correct, is it?
> > 
> > It still works with none.
> > 
> > I think it isn't a big deal. And if it really becomes an issue in the future, we can
> > set a flag in elevator_set_default(), and let blk_mq_update_nr_hw_queues set the
> > default sched for us.
> > 
> >>
> >>> +void elevator_set_none(struct request_queue *q)
> >>> +{
> >>> +	struct elev_change_ctx ctx = {
> >>> +		.name	= "none",
> >>> +		.uevent = 1,
> >>> +	};
> >>> +	int err;
> >>>  
> >>> -	blk_mq_unfreeze_queue(q, memflags);
> >>> +	if (!queue_is_mq(q))
> >>> +		return;
> >>>  
> >>> -	if (err) {
> >>> -		pr_warn("\"%s\" elevator initialization failed, "
> >>> -			"falling back to \"none\"\n", e->elevator_name);
> >>> -	}
> >>> +	if (!q->elevator)
> >>> +		return;
> >>>  
> >>> -	elevator_put(e);
> >>> +	err = elevator_change(q, &ctx);
> >>> +	if (err < 0)
> >>> +		pr_warn("%s: set none elevator failed %d\n", __func__, err);
> >>>  }
> >>>  
> >> Here as well, if we fail to disable/exit the elevator while deleting the disk
> >> because an nr_hw_queue update is running simultaneously, then we may
> >> leak the elevator resources?
> > 
> > When blk_mq_update_nr_hw_queues() observes that the queue is dying, it
> > forces the elevator to change to none, so there is no elevator leak.
> > 
> Yes, if we get into blk_mq_update_nr_hw_queues after the dying flag is set.
> But what if blk_mq_update_nr_hw_queues doesn't see the dying flag and starts
> running __elevator_change, and later we set the dying flag from del_gendisk
> and start running elevator_set_none simultaneously on another CPU?
> In this case elevator_set_none would fail to set the elevator to "none" because
> blk_mq_update_nr_hw_queues is running on another CPU, wouldn't it?
> 
> >>
> >>> @@ -565,11 +559,7 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
> >>>  	if (disk->major == BLOCK_EXT_MAJOR)
> >>>  		blk_free_ext_minor(disk->first_minor);
> >>>  out_exit_elevator:
> >>> -	if (disk->queue->elevator) {
> >>> -		mutex_lock(&disk->queue->elevator_lock);
> >>> -		elevator_exit(disk->queue);
> >>> -		mutex_unlock(&disk->queue->elevator_lock);
> >>> -	}
> >>> +	elevator_set_none(disk->queue);
> >> Same comment as above here as well but this is in add_disk code path.
> > 
> > We can avoid it by forcing to change to none in blk_mq_update_nr_hw_queues() for
> > !blk_queue_registered()
> > 
> Here as well there's a thin race window, assuming add_disk fails
> after we have registered the queue. Suppose the nr_hw_queue update starts running
> and sees that the queue is registered, but on another CPU add_disk fails
> just after registering the queue. In this case it is still possible
> that elevator_set_none fails to set the elevator to "none" just because
> the nr_hw_queue update is running on another CPU. What do you think?

Yeah.

It isn't hard to solve, but I just don't want to make the whole
implementation too complicated.

Another way is to prevent add_disk() and del_gendisk() from running while
nr_hw_queues is being updated. That is reasonable too, because both blk-mq
debugfs and sysfs registration depend on nr_hw_queues.

Meanwhile, add_disk()/del_gendisk() can retry until the nr_hw_queues update
has finished. A waitqueue can be added for this, so the wait could look like:


add_disk():
	while (true) {
		srcu_read_lock();
		if (set->is_updating_nr_hw_queues) {
			/* an update is in flight: leave the read section, then wait */
			srcu_read_unlock();
			goto wait;
		}
		__add_disk();
		srcu_read_unlock();
		break;
	wait:
		wait_event(set->wq, !set->is_updating_nr_hw_queues);
	}

Thanks,
Ming


