From: Nilay Shroff <nilay@linux.ibm.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-block@vger.kernel.org, ming.lei@redhat.com,
axboe@kernel.dk, sth@linux.ibm.com, gjoyce@ibm.com
Subject: Re: [PATCHv3 2/2] block: fix lock dependency between percpu alloc lock and elevator lock
Date: Mon, 23 Jun 2025 15:03:27 +0530 [thread overview]
Message-ID: <f306f309-7cc2-4426-98bf-fb6684db5b7c@linux.ibm.com> (raw)
In-Reply-To: <20250623061015.GA30266@lst.de>
On 6/23/25 11:40 AM, Christoph Hellwig wrote:
> On Mon, Jun 16, 2025 at 11:02:26PM +0530, Nilay Shroff wrote:
>> +static void blk_mq_init_sched_tags(struct request_queue *q,
>> + struct blk_mq_hw_ctx *hctx,
>> + unsigned int hctx_idx,
>> + struct elevator_queue *eq)
>> {
>> if (blk_mq_is_shared_tags(q->tag_set->flags)) {
>> hctx->sched_tags = q->sched_shared_tags;
>> + return;
>> }
>>
>> + hctx->sched_tags = eq->tags->u.tags[hctx_idx];
>> }
>
> Given how trivial this function now is, please inline it in the only
> caller. That should also allow moving the blk_mq_is_shared_tags
> shared out of the loop over all hw contexts, and merge it with the
> check right next to it.
>
okay, will do it the next patchset.
>> +static void blk_mq_init_sched_shared_tags(struct request_queue *queue,
>> + struct elevator_queue *eq)
>> {
>> + queue->sched_shared_tags = eq->tags->u.shared_tags;
>> blk_mq_tag_update_sched_shared_tags(queue);
>> }
>
> This helper can also just be open coded in the caller now.
Yes agreed...
>
>> + if (blk_mq_is_shared_tags(set->flags)) {
>> + if (tags->u.shared_tags) {
>> + blk_mq_free_rqs(set, tags->u.shared_tags,
>> + BLK_MQ_NO_HCTX_IDX);
>> + blk_mq_free_rq_map(tags->u.shared_tags);
>> + }
>> + goto out;
>> + }
>> +
>> + if (!tags->u.tags)
>> + goto out;
>> +
>> + for (i = 0; i < tags->nr_hw_queues; i++) {
>> + if (tags->u.tags[i]) {
>> + blk_mq_free_rqs(set, tags->u.tags[i], i);
>> + blk_mq_free_rq_map(tags->u.tags[i]);
>> + }
>> + }
>
> Maybe restructucture this a bit:
>
> if (blk_mq_is_shared_tags(set->flags)) {
> ..
> } else if (tags->u.tags) {
> }
>
> kfree(tags);
>
> to have a simpler flow and remove the need for the "goto out"?
>
Okay but as you know I am rewriting this logic using Xarray as
Ming pointed in last email. So this code will be restructured and
I will ensure that your above comment will be addressed in new
logic.
>> + tags = kcalloc(1, sizeof(struct elevator_tags), gfp);
>
> This can use plain kzalloc.
>
>> + if (blk_mq_is_shared_tags(set->flags)) {
>> +
>> + tags->u.shared_tags = blk_mq_alloc_map_and_rqs(set,
>
> The empty line above is a bit odd.
Yes I would address this in next patchset.
>
>> + BLK_MQ_NO_HCTX_IDX,
>> + MAX_SCHED_RQ);
>> + if (!tags->u.shared_tags)
>> + goto out;
>> +
>> + return tags;
>> + }
>> +
>> + tags->u.tags = kcalloc(nr_hw_queues, sizeof(struct blk_mq_tags *), gfp);
>> + if (!tags->u.tags)
>> + goto out;
>> +
>> + tags->nr_hw_queues = nr_hw_queues;
>> + for (i = 0; i < nr_hw_queues; i++) {
>> + tags->u.tags[i] = blk_mq_alloc_map_and_rqs(set, i,
>> + tags->nr_requests);
>> + if (!tags->u.tags[i])
>> + goto out;
>> + }
>> +
>> + return tags;
>> +
>> +out:
>> + __blk_mq_free_sched_tags(set, tags);
>
> Is __blk_mq_free_sched_tags really the right thing here vs just unwinding
> what this function did?
>
Hmm, __blk_mq_free_sched_tags actually releases everything which is allocated
in this function (so there won't be any leak), however, it really _dose_not_
free resources in the reverse order. It just loops over starting from the first
nr_hw_queue index and releases whatsoever allocated. But I think you're right,
we could probably have more structured way of unwinding stuff here instead
of calling __blk_mq_free_sched_tags. I will handle this in the next patch.
>> + /*
>> + * Accessing q->elevator without holding q->elevator_lock is safe
>> + * because we're holding here set->update_nr_hwq_lock in the writer
>> + * context. So, scheduler update/switch code (which acquires the same
>> + * lock but in the reader context) can't run concurrently.
>> + */
>> + list_for_each_entry(q, &set->tag_list, tag_set_list) {
>> + if (q->elevator)
>> + count++;
>> + }
>
> Maybe add a helper for this code and the comment?
Yeah but as I mentioned above with Xarray code being added, we may
not need the above loop. So lets see, if needed I will add a macro
and add proper comment in next patch.
>
>> - lockdep_assert_held(&q->tag_set->update_nr_hwq_lock);
>> + lockdep_assert_held(&set->update_nr_hwq_lock);
>> +
>> + if (strncmp(ctx->name, "none", 4)) {
>
> This is a check for not having an elevator so far, right? Wouldn't
> !q->elevator be the more obvious check for that? Or am I missing
> something why that's not safe here?
>
This code runs in the context of an elevator switch, not as part of an
nr_hw_queues update. Hence, at this point, q->elevator has not yet been
updated to the new elevator we’re switching to, so accessing q->elevator
here would be incorrect. Since we've already stored the name of the target
elevator in ctx->name, we use that instead of referencing q->elevator here.
>> diff --git a/block/elevator.h b/block/elevator.h
>> index a4de5f9ad790..0b92121005cf 100644
>> --- a/block/elevator.h
>> +++ b/block/elevator.h
>> @@ -23,6 +23,17 @@ enum elv_merge {
>> struct blk_mq_alloc_data;
>> struct blk_mq_hw_ctx;
>>
>> +/* Holding context data for changing elevator */
>> +struct elv_change_ctx {
>> + const char *name;
>> + bool no_uevent;
>> +
>> + /* for unregistering old elevator */
>> + struct elevator_queue *old;
>> + /* for registering new elevator */
>> + struct elevator_queue *new;
>> +};
>
> No need to move this, it is still only used in elevator.c.
>
Yes agreed. I'm moved it back to its original place in
elevator.c
>> extern struct elevator_queue *elevator_alloc(struct request_queue *,
>> - struct elevator_type *);
>> + struct elevator_type *, struct elevator_tags *);
>
> Drop the extern while you're at it.
>
Yeah sure.
Thanks,
--Nilay
next prev parent reply other threads:[~2025-06-23 9:33 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-16 17:32 [PATCHv3 0/2] block: move sched_tags allocation/de-allocation outside of locking context Nilay Shroff
2025-06-16 17:32 ` [PATCHv3 1/2] block: move elevator queue allocation logic into blk_mq_init_sched Nilay Shroff
2025-06-17 15:07 ` Ming Lei
2025-06-20 14:39 ` Nilay Shroff
2025-06-20 15:17 ` Ming Lei
2025-06-20 16:13 ` Nilay Shroff
2025-06-23 5:56 ` Christoph Hellwig
2025-06-23 9:14 ` Nilay Shroff
2025-06-16 17:32 ` [PATCHv3 2/2] block: fix lock dependency between percpu alloc lock and elevator lock Nilay Shroff
2025-06-18 3:06 ` Ming Lei
2025-06-18 6:52 ` Nilay Shroff
2025-06-23 6:10 ` Christoph Hellwig
2025-06-23 9:33 ` Nilay Shroff [this message]
2025-06-23 13:36 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f306f309-7cc2-4426-98bf-fb6684db5b7c@linux.ibm.com \
--to=nilay@linux.ibm.com \
--cc=axboe@kernel.dk \
--cc=gjoyce@ibm.com \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=sth@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox