Re: [PATCHv3 2/2] block: fix lock dependency between percpu alloc lock and elevator lock

public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed

From: Nilay Shroff <nilay@linux.ibm.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-block@vger.kernel.org, ming.lei@redhat.com,
	axboe@kernel.dk, sth@linux.ibm.com, gjoyce@ibm.com
Subject: Re: [PATCHv3 2/2] block: fix lock dependency between percpu alloc lock and elevator lock
Date: Mon, 23 Jun 2025 15:03:27 +0530	[thread overview]
Message-ID: <f306f309-7cc2-4426-98bf-fb6684db5b7c@linux.ibm.com> (raw)
In-Reply-To: <20250623061015.GA30266@lst.de>



On 6/23/25 11:40 AM, Christoph Hellwig wrote:
> On Mon, Jun 16, 2025 at 11:02:26PM +0530, Nilay Shroff wrote:
>> +static void blk_mq_init_sched_tags(struct request_queue *q,
>> +				   struct blk_mq_hw_ctx *hctx,
>> +				   unsigned int hctx_idx,
>> +				   struct elevator_queue *eq)
>>  {
>>  	if (blk_mq_is_shared_tags(q->tag_set->flags)) {
>>  		hctx->sched_tags = q->sched_shared_tags;
>> +		return;
>>  	}
>>  
>> +	hctx->sched_tags = eq->tags->u.tags[hctx_idx];
>>  }
> 
> Given how trivial this function now is, please inline it in the only
> caller.  That should also allow moving the blk_mq_is_shared_tags
> shared out of the loop over all hw contexts, and merge it with the
> check right next to it.
> 
okay, will do it the next patchset.

>> +static void blk_mq_init_sched_shared_tags(struct request_queue *queue,
>> +		struct elevator_queue *eq)
>>  {
>> +	queue->sched_shared_tags = eq->tags->u.shared_tags;
>>  	blk_mq_tag_update_sched_shared_tags(queue);
>>  }
> 
> This helper can also just be open coded in the caller now.
Yes agreed...

> 
>> +	if (blk_mq_is_shared_tags(set->flags)) {
>> +		if (tags->u.shared_tags) {
>> +			blk_mq_free_rqs(set, tags->u.shared_tags,
>> +					BLK_MQ_NO_HCTX_IDX);
>> +			blk_mq_free_rq_map(tags->u.shared_tags);
>> +		}
>> +		goto out;
>> +	}
>> +
>> +	if (!tags->u.tags)
>> +		goto out;
>> +
>> +	for (i = 0; i < tags->nr_hw_queues; i++) {
>> +		if (tags->u.tags[i]) {
>> +			blk_mq_free_rqs(set, tags->u.tags[i], i);
>> +			blk_mq_free_rq_map(tags->u.tags[i]);
>> +		}
>> +	}
> 
> Maybe restructucture this a bit:
> 
> 	if (blk_mq_is_shared_tags(set->flags)) {
> 		..
> 	} else if (tags->u.tags) {
> 	}
> 
> 	kfree(tags);
> 
> to have a simpler flow and remove the need for the "goto out"?
> 
Okay but as you know I am rewriting this logic using Xarray as
Ming pointed in last email. So this code will be restructured and
I will ensure that your above comment will be addressed in new 
logic.

>> +	tags = kcalloc(1, sizeof(struct elevator_tags), gfp);
> 
> This can use plain kzalloc.
> 
>> +	if (blk_mq_is_shared_tags(set->flags)) {
>> +
>> +		tags->u.shared_tags = blk_mq_alloc_map_and_rqs(set,
> 
> The empty line above is a bit odd.
Yes I would address this in next patchset.

> 
>> +					BLK_MQ_NO_HCTX_IDX,
>> +					MAX_SCHED_RQ);
>> +		if (!tags->u.shared_tags)
>> +			goto out;
>> +
>> +		return tags;
>> +	}
>> +
>> +	tags->u.tags = kcalloc(nr_hw_queues, sizeof(struct blk_mq_tags *), gfp);
>> +	if (!tags->u.tags)
>> +		goto out;
>> +
>> +	tags->nr_hw_queues = nr_hw_queues;
>> +	for (i = 0; i < nr_hw_queues; i++) {
>> +		tags->u.tags[i] = blk_mq_alloc_map_and_rqs(set, i,
>> +				tags->nr_requests);
>> +		if (!tags->u.tags[i])
>> +			goto out;
>> +	}
>> +
>> +	return tags;
>> +
>> +out:
>> +	__blk_mq_free_sched_tags(set, tags);
> 
> Is __blk_mq_free_sched_tags really the right thing here vs just unwinding
> what this function did?
> 
Hmm, __blk_mq_free_sched_tags actually releases everything which is allocated
in this function (so there won't be any leak), however, it really _dose_not_ 
free resources in the reverse order. It just loops over starting from the first 
nr_hw_queue index and releases whatsoever allocated. But I think you're right, 
we could probably have more structured way of unwinding stuff here instead 
of calling __blk_mq_free_sched_tags. I will handle this in the next patch.

>> +	/*
>> +	 * Accessing q->elevator without holding q->elevator_lock is safe
>> +	 * because we're holding here set->update_nr_hwq_lock in the writer
>> +	 * context. So, scheduler update/switch code (which acquires the same
>> +	 * lock but in the reader context) can't run concurrently.
>> +	 */
>> +	list_for_each_entry(q, &set->tag_list, tag_set_list) {
>> +		if (q->elevator)
>> +			count++;
>> +	}
> 
> Maybe add a helper for this code and the comment?
Yeah but as I mentioned above with Xarray code being added, we may 
not need the above loop. So lets see, if needed I will add a macro
and add proper comment in next patch.

> 
>> -	lockdep_assert_held(&q->tag_set->update_nr_hwq_lock);
>> +	lockdep_assert_held(&set->update_nr_hwq_lock);
>> +
>> +	if (strncmp(ctx->name, "none", 4)) {
> 
> This is a check for not having an elevator so far, right?  Wouldn't
> !q->elevator be the more obvious check for that?  Or am I missing
> something why that's not safe here?
> 
This code runs in the context of an elevator switch, not as part of an
nr_hw_queues update. Hence, at this point, q->elevator has not yet been
updated to the new elevator we’re switching to, so accessing q->elevator
here would be incorrect. Since we've already stored the name of the target
elevator in ctx->name, we use that instead of referencing q->elevator here.

>> diff --git a/block/elevator.h b/block/elevator.h
>> index a4de5f9ad790..0b92121005cf 100644
>> --- a/block/elevator.h
>> +++ b/block/elevator.h
>> @@ -23,6 +23,17 @@ enum elv_merge {
>>  struct blk_mq_alloc_data;
>>  struct blk_mq_hw_ctx;
>>  
>> +/* Holding context data for changing elevator */
>> +struct elv_change_ctx {
>> +	const char *name;
>> +	bool no_uevent;
>> +
>> +	/* for unregistering old elevator */
>> +	struct elevator_queue *old;
>> +	/* for registering new elevator */
>> +	struct elevator_queue *new;
>> +};
> 
> No need to move this, it is still only used in elevator.c.
> 
Yes agreed. I'm moved it back to its original place in 
elevator.c

>>  extern struct elevator_queue *elevator_alloc(struct request_queue *,
>> -					struct elevator_type *);
>> +		struct elevator_type *, struct elevator_tags *);
> 
> Drop the extern while you're at it.
> 
Yeah sure.

Thanks,
--Nilay

next prev parent reply	other threads:[~2025-06-23  9:33 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-16 17:32 [PATCHv3 0/2] block: move sched_tags allocation/de-allocation outside of locking context Nilay Shroff
2025-06-16 17:32 ` [PATCHv3 1/2] block: move elevator queue allocation logic into blk_mq_init_sched Nilay Shroff
2025-06-17 15:07   ` Ming Lei
2025-06-20 14:39     ` Nilay Shroff
2025-06-20 15:17       ` Ming Lei
2025-06-20 16:13         ` Nilay Shroff
2025-06-23  5:56   ` Christoph Hellwig
2025-06-23  9:14     ` Nilay Shroff
2025-06-16 17:32 ` [PATCHv3 2/2] block: fix lock dependency between percpu alloc lock and elevator lock Nilay Shroff
2025-06-18  3:06   ` Ming Lei
2025-06-18  6:52     ` Nilay Shroff
2025-06-23  6:10   ` Christoph Hellwig
2025-06-23  9:33     ` Nilay Shroff [this message]
2025-06-23 13:36       ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f306f309-7cc2-4426-98bf-fb6684db5b7c@linux.ibm.com \
    --to=nilay@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=gjoyce@ibm.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=sth@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox