From: Ming Lei <ming.lei@redhat.com>
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: linux-block@vger.kernel.org, yi.zhang@redhat.com, hch@lst.de,
yukuai1@huaweicloud.com, axboe@kernel.dk,
shinichiro.kawasaki@wdc.com, yukuai3@huawei.com, gjoyce@ibm.com
Subject: Re: [PATCH] block: restore two stage elevator switch while running nr_hw_queue update
Date: Fri, 18 Jul 2025 10:01:35 +0800 [thread overview]
Message-ID: <aHmq_8uMsdq49iZT@fedora> (raw)
In-Reply-To: <2d4dc1c3-2911-469a-95b6-6b482377a375@linux.ibm.com>
On Fri, Jul 18, 2025 at 12:13:25AM +0530, Nilay Shroff wrote:
>
>
> On 7/17/25 8:14 PM, Ming Lei wrote:
>
> >> +static int blk_mq_elv_switch_none(struct request_queue *q,
> >> + struct xarray *elv_tbl)
> >> +{
> >> + int ret = 0;
> >> +
> >> + lockdep_assert_held_write(&q->tag_set->update_nr_hwq_lock);
> >> +
> >> + /*
> >> + * Accessing q->elevator without holding q->elevator_lock is safe here
> >> + * because we're called from nr_hw_queue update which is protected by
> >> + * set->update_nr_hwq_lock in the writer context. So, scheduler update/
> >> + * switch code (which acquires the same lock in the reader context)
> >> + * can't run concurrently.
> >> + */
> >> + if (q->elevator) {
> >> + char *elevator_name = (char *)q->elevator->type->elevator_name;
> >> +
> >> + ret = xa_insert(elv_tbl, q->id, elevator_name, GFP_KERNEL);
> >
> > This way isn't memory safe, because elevator module can be reloaded
> > during updating nr_hw_queues. One solution is to build a string<->int mapping
> > table, and store the (int) index of elevator type to xarray.
> >
> Yes good point! How about avoiding this issue by using __elevator_get() and
> elevator_put()? We can take module reference while switching elevator to 'none'
> and then put the module reference while we switch back elevator.
Yeah, that is another way.
>
> >> -void elv_update_nr_hw_queues(struct request_queue *q)
> >> -{
> >> - struct elv_change_ctx ctx = {};
> >> - int ret = -ENODEV;
> >> -
> >> - WARN_ON_ONCE(q->mq_freeze_depth == 0);
> >> -
> >> - mutex_lock(&q->elevator_lock);
> >> - if (q->elevator && !blk_queue_dying(q) && blk_queue_registered(q)) {
> >> - ctx.name = q->elevator->type->elevator_name;
> >> -
> >> - /* force to reattach elevator after nr_hw_queue is updated */
> >> - ret = elevator_switch(q, &ctx);
> >> - }
> >> - mutex_unlock(&q->elevator_lock);
> >> - blk_mq_unfreeze_queue_nomemrestore(q);
> >> - if (!ret)
> >> - WARN_ON_ONCE(elevator_change_done(q, &ctx));
> >> -}
> >> -
> >> /*
> >> * Use the default elevator settings. If the chosen elevator initialization
> >> * fails, fall back to the "none" elevator (no elevator).
> >> diff --git a/block/elevator.h b/block/elevator.h
> >> index a07ce773a38f..440b6e766848 100644
> >> --- a/block/elevator.h
> >> +++ b/block/elevator.h
> >> @@ -85,6 +85,17 @@ struct elevator_type
> >> struct list_head list;
> >> };
> >>
> >> +/* Holding context data for changing elevator */
> >> +struct elv_change_ctx {
> >> + const char *name;
> >> + bool no_uevent;
> >> +
> >> + /* for unregistering old elevator */
> >> + struct elevator_queue *old;
> >> + /* for registering new elevator */
> >> + struct elevator_queue *new;
> >> +};
> >> +
> >
> > You may avoid the big chunk of code move for both `elv_change_ctx` and
> > `elv_update_nr_hw_queues()`, not sure why you did that in a bug fix patch.
> >
> This is because I’ve reintroduced the blk_mq_elv_switch_none and
> blk_mq_elv_switch_back functions. I replaced elv_update_nr_hw_queues with
> blk_mq_elv_switch_back(), which was the approach used prior to commit
> 596dce110b7d ("block: simplify elevator reattachment for updating nr_hw_queues").
>
> Both blk_mq_elv_switch_none and blk_mq_elv_switch_back are defined in blk-mq.c
> and use the elevator APIs for switching elevators. These APIs — namely
> elevator_switch and elevator_change_done — rely on the elv_change_ctx structure.
> As a result, I had to move some code around to ensure elv_change_ctx is accessible
> from within blk-mq.c.
>
> While it would be possible to avoid this code movement by continuing to use
> elv_update_nr_hw_queues, I felt it was cleaner to restore the two-stage elevator
> switch more fully by reintroducing the blk_mq_elv_switch_none and blk_mq_elv_switch_back
> APIs, which were used prior to the simplification in the aforementioned commit.
>
> Do you see any concerns with this code movement? Or is such restructuring generally
> avoided in a bug fix patch?
You still can add blk_mq_elv_switch_none() and blk_mq_elv_switch_back(),
meantime call elv_update_nr_hw_queues() from blk_mq_elv_switch_back() by
passing elevator type name.
If one bug fix is simpler, it may become easier to review & merge in -rc or
backport.
Thanks,
Ming
next prev parent reply other threads:[~2025-07-18 2:01 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-17 14:11 [PATCH] block: restore two stage elevator switch while running nr_hw_queue update Nilay Shroff
2025-07-17 14:44 ` Ming Lei
2025-07-17 18:43 ` Nilay Shroff
2025-07-18 2:01 ` Ming Lei [this message]
2025-07-18 6:06 ` Nilay Shroff
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aHmq_8uMsdq49iZT@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=gjoyce@ibm.com \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=nilay@linux.ibm.com \
--cc=shinichiro.kawasaki@wdc.com \
--cc=yi.zhang@redhat.com \
--cc=yukuai1@huaweicloud.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.