From: Nilay Shroff <nilay@linux.ibm.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: linux-block@vger.kernel.org, hch@lst.de, axboe@kernel.dk,
sth@linux.ibm.com, gjoyce@ibm.com
Subject: Re: [PATCHv3 2/2] block: fix lock dependency between percpu alloc lock and elevator lock
Date: Wed, 18 Jun 2025 12:22:50 +0530 [thread overview]
Message-ID: <085835ca-a2fb-4e10-82c2-3a7778431345@linux.ibm.com> (raw)
In-Reply-To: <aFItNzzr-4iQbjyY@fedora>
On 6/18/25 8:36 AM, Ming Lei wrote:
>> +int blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set,
>> + unsigned int nr_hw_queues,
>> + struct elevator_tags ***elv_tags)
>
> We seldom see triple indirect pointers in linux kernel, :-)
Yeah lets see if we could avoid it by using xarray as you suggested.
>
>> +{
>> + unsigned long idx = 0, count = 0;
>> + struct request_queue *q;
>> + struct elevator_tags **tags;
>> + gfp_t gfp = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY;
>> +
>> + lockdep_assert_held_write(&set->update_nr_hwq_lock);
>> + /*
>> + * Accessing q->elevator without holding q->elevator_lock is safe
>> + * because we're holding here set->update_nr_hwq_lock in the writer
>> + * context. So, scheduler update/switch code (which acquires the same
>> + * lock but in the reader context) can't run concurrently.
>> + */
>> + list_for_each_entry(q, &set->tag_list, tag_set_list) {
>> + if (q->elevator)
>> + count++;
>> + }
>> +
>> + if (!count)
>> + return 0;
>> +
>> + tags = kcalloc(count, sizeof(struct elevator_tags *), gfp);
>> + if (!tags)
>> + return -ENOMEM;
>> +
>> + list_for_each_entry(q, &set->tag_list, tag_set_list) {
>> + if (q->elevator) {
>> + tags[idx] = __blk_mq_alloc_sched_tags(set, nr_hw_queues,
>> + q->id);
>> + if (!tags[idx])
>> + goto out;
>> +
>> + idx++;
>> + }
>> + }
>> +
>> + *elv_tags = tags;
>> + return count;
>> +out:
>> + blk_mq_free_sched_tags(set, tags);
>> + return -ENOMEM;
>> +}
>
> I believe lots of code can be saved if you take xarray to store the
> allocated 'struct elevator_tags', and use q->id as the key.
>
I think using xarray is a nice idea! I will rewrite this code
using xarray in the next revision.
> Also the handling for updating_nr_hw_queues has new batch alloc & free logic
> and should be done in standalone patch. Per my experience, bug is easier to
> hide in the big patch, which is more hard to review.
>
Sure, I'd further split the patch in the next revision.
>> @@ -676,11 +690,16 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx)
>> blk_mq_cancel_work_sync(q);
>> mutex_lock(&q->elevator_lock);
>> if (!(q->elevator && elevator_match(q->elevator->type, ctx->name)))
>> - ret = elevator_switch(q, ctx);
>> + ret = elevator_switch(q, ctx, tags);
>
> tags may be passed via 'ctx'.
Hmm ok we can do that. I will add @tags in @ctx.
>
>> mutex_unlock(&q->elevator_lock);
>> blk_mq_unfreeze_queue(q, memflags);
>> if (!ret)
>> ret = elevator_change_done(q, ctx);
>> + /*
>> + * Free sched tags if it's allocated but we couldn't switch elevator.
>> + */
>> + if (tags && !ctx->new)
>> + __blk_mq_free_sched_tags(set, tags);
>
> Maybe you can let elevator_change_done() cover failure handling, and
> move the above two line into elevator_change_done().
Yes precisely I thought about it, but then later realized that
elevator_change_done() may not be called always. For instance, in
case elevator_switch() fails and hence returns non-zero value
then in that case elevator_change_done() would be skipped. But
still in the elvator_switch() failure case, we'd have allocated
sched_tags and so we have to free it.
>
>> diff --git a/block/elevator.h b/block/elevator.h
>> index a4de5f9ad790..0b92121005cf 100644
>> --- a/block/elevator.h
>> +++ b/block/elevator.h
>> @@ -23,6 +23,17 @@ enum elv_merge {
>> struct blk_mq_alloc_data;
>> struct blk_mq_hw_ctx;
>>
>> +/* Holding context data for changing elevator */
>> +struct elv_change_ctx {
>> + const char *name;
>> + bool no_uevent;
>> +
>> + /* for unregistering old elevator */
>> + struct elevator_queue *old;
>> + /* for registering new elevator */
>> + struct elevator_queue *new;
>> +};
>> +
>
> 'elv_change_ctx' is only used in block/elevator.c, not sure why you
> move it to the header.
Oh yes, in the previous version of this patch it was used outside of
the elevator.c and so I moved it into a common header but later in
this patch I missed to move it back into the elevator.h file as now
the only user of 'elv_change_ctx' exists in elevator.c file. I will
fix this in the next revision.
>> struct elevator_mq_ops {
>> int (*init_sched)(struct request_queue *, struct elevator_queue *);
>> void (*exit_sched)(struct elevator_queue *);
>> @@ -107,12 +118,27 @@ void elv_rqhash_add(struct request_queue *q, struct request *rq);
>> void elv_rqhash_reposition(struct request_queue *q, struct request *rq);
>> struct request *elv_rqhash_find(struct request_queue *q, sector_t offset);
>>
>> +struct elevator_tags {
>> + /* num. of hardware queues for which tags are allocated */
>> + unsigned int nr_hw_queues;
>> + /* depth used while allocating tags */
>> + unsigned int nr_requests;
>> + /* request queue id (q->id) used during elevator_tags_lookup() */
>> + int id;
>> + union {
>> + /* tags shared across all queues */
>> + struct blk_mq_tags *shared_tags;
>> + /* array of blk_mq_tags pointers, one per hardware queue. */
>> + struct blk_mq_tags **tags;
>> + } u;
>> +};
>
> I feel it is simpler to just save shared tags in the 1st item
> of the array, then you can store 'struct blk_mq_tags' in flex array of
> 'struct elevator_tags', save one extra allocation & many failure handling
> code.
>
Yes sounds good! I will send out another patchset with above
suggested changes.
Thanks,
--Nilay
next prev parent reply other threads:[~2025-06-18 6:53 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-16 17:32 [PATCHv3 0/2] block: move sched_tags allocation/de-allocation outside of locking context Nilay Shroff
2025-06-16 17:32 ` [PATCHv3 1/2] block: move elevator queue allocation logic into blk_mq_init_sched Nilay Shroff
2025-06-17 15:07 ` Ming Lei
2025-06-20 14:39 ` Nilay Shroff
2025-06-20 15:17 ` Ming Lei
2025-06-20 16:13 ` Nilay Shroff
2025-06-23 5:56 ` Christoph Hellwig
2025-06-23 9:14 ` Nilay Shroff
2025-06-16 17:32 ` [PATCHv3 2/2] block: fix lock dependency between percpu alloc lock and elevator lock Nilay Shroff
2025-06-18 3:06 ` Ming Lei
2025-06-18 6:52 ` Nilay Shroff [this message]
2025-06-23 6:10 ` Christoph Hellwig
2025-06-23 9:33 ` Nilay Shroff
2025-06-23 13:36 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=085835ca-a2fb-4e10-82c2-3a7778431345@linux.ibm.com \
--to=nilay@linux.ibm.com \
--cc=axboe@kernel.dk \
--cc=gjoyce@ibm.com \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=sth@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox