* [RFC PATCHv2 0/2] block: remove q->sysfs_dir_lock and fix race updating nr_hw_queue
@ 2025-01-23 17:40 Nilay Shroff
2025-01-23 17:40 ` [RFC PATCHv2 1/2] block: get rid of request queue ->sysfs_dir_lock Nilay Shroff
2025-01-23 17:40 ` [RFC PATCHv2 2/2] block: fix nr_hw_queue update racing with disk addition/removal Nilay Shroff
0 siblings, 2 replies; 6+ messages in thread
From: Nilay Shroff @ 2025-01-23 17:40 UTC (permalink / raw)
To: linux-block; +Cc: hch, ming.lei, dlemoal, axboe, gjoyce
Hi,
There are two patches in this patchset.
The first patch removes the redundant q->sysfs_dir_lock.
The second patch fixes nr_hw_queues updates racing with disk
addition/removal.
In the current implementation we use q->sysfs_dir_lock to protect
kobject addition/deletion while we register/unregister blk-mq with
sysfs. However, the sysfs/kernfs internal implementation already
protects against simultaneous addition/deletion of kobjects, so the
use of q->sysfs_dir_lock appears redundant.
Furthermore, there are a few other call sites in the current code where
we hold q->sysfs_dir_lock along with q->sysfs_lock while adding/deleting
the independent access ranges of a disk under sysfs; please refer to
disk_register_independent_access_ranges() and
disk_unregister_independent_access_ranges(). Here as well we could
easily remove the use of q->sysfs_dir_lock.
The only thing which q->sysfs_dir_lock appears to protect is the
variable q->mq_sysfs_init_done. However, this could be solved by
converting q->mq_sysfs_init_done into an atomic variable.
In the past few days we have seen many lockdep splats in the block
layer, and getting rid of this lock might help reduce some contention;
we would also need to worry less about lock ordering with respect to
q->sysfs_dir_lock. The first patch helps fix this.
The second patch addresses a potential race between nr_hw_queues
updates and disk addition/removal. __blk_mq_update_nr_hw_queues()
removes and then re-adds the hctx sysfs files. Similarly, the disk
addition/removal code also adds or removes the hctx sysfs files. So
it's quite possible for the disk addition/removal code to race with
__blk_mq_update_nr_hw_queues() while the latter adds/deletes hctx
sysfs files. Since __blk_mq_update_nr_hw_queues() holds
q->tag_list_lock while it runs, to avoid racing with it the disk
addition/removal code should hold the same q->tag_list_lock while
adding/deleting hctx sysfs files during disk queue
registration/unregistration. The second patch in the series fixes this
race condition, which may manifest while we add/remove hctx sysfs files.
Nilay Shroff (2):
block: get rid of request queue ->sysfs_dir_lock
block: fix nr_hw_queue update racing with disk addition/removal
Changes from v1:
- remove q->sysfs_init_done and replace it with registered queue
flag (hch)
- fix nr_hw_queue update racing with disk addition/removal (hch)
- Link to v1: https://lore.kernel.org/all/20250120130413.789737-1-nilay@linux.ibm.com/
block/blk-core.c | 1 -
block/blk-ia-ranges.c | 4 ----
block/blk-mq-sysfs.c | 37 ++++++++++++++++++-------------------
block/blk-sysfs.c | 5 -----
include/linux/blkdev.h | 3 ---
5 files changed, 18 insertions(+), 32 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC PATCHv2 1/2] block: get rid of request queue ->sysfs_dir_lock
2025-01-23 17:40 [RFC PATCHv2 0/2] block: remove q->sysfs_dir_lock and fix race updating nr_hw_queue Nilay Shroff
@ 2025-01-23 17:40 ` Nilay Shroff
2025-01-28 5:46 ` Christoph Hellwig
2025-01-23 17:40 ` [RFC PATCHv2 2/2] block: fix nr_hw_queue update racing with disk addition/removal Nilay Shroff
1 sibling, 1 reply; 6+ messages in thread
From: Nilay Shroff @ 2025-01-23 17:40 UTC (permalink / raw)
To: linux-block; +Cc: hch, ming.lei, dlemoal, axboe, gjoyce
The request queue uses ->sysfs_dir_lock to protect the addition/
deletion of kobject entries under sysfs while we register/unregister
blk-mq. However, kobject addition/deletion is already protected by the
kernfs/sysfs internal synchronization primitives, so the use of
q->sysfs_dir_lock seems redundant.
Moreover, q->sysfs_dir_lock is also used at a few other call sites
along with q->sysfs_lock to protect the addition/deletion of kobjects.
One such example is when we register a set of independent access ranges
for a disk with sysfs. Here as well we could get rid of
q->sysfs_dir_lock and only use q->sysfs_lock.
The only variable which q->sysfs_dir_lock appears to protect is
q->mq_sysfs_init_done, which is set/cleared while registering/
unregistering blk-mq with sysfs. But the use of q->mq_sysfs_init_done
can easily be replaced with the queue registered bit,
QUEUE_FLAG_REGISTERED.
So with this patch we remove q->sysfs_dir_lock from each call site and
replace q->mq_sysfs_init_done with QUEUE_FLAG_REGISTERED.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-core.c | 1 -
block/blk-ia-ranges.c | 4 ----
block/blk-mq-sysfs.c | 23 +++++------------------
block/blk-sysfs.c | 5 -----
include/linux/blkdev.h | 3 ---
5 files changed, 5 insertions(+), 31 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 32fb28a6372c..d6c4fa3943b5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -430,7 +430,6 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
refcount_set(&q->refs, 1);
mutex_init(&q->debugfs_mutex);
mutex_init(&q->sysfs_lock);
- mutex_init(&q->sysfs_dir_lock);
mutex_init(&q->limits_lock);
mutex_init(&q->rq_qos_mutex);
spin_lock_init(&q->queue_lock);
diff --git a/block/blk-ia-ranges.c b/block/blk-ia-ranges.c
index c9eb4241e048..d479f5481b66 100644
--- a/block/blk-ia-ranges.c
+++ b/block/blk-ia-ranges.c
@@ -111,7 +111,6 @@ int disk_register_independent_access_ranges(struct gendisk *disk)
struct request_queue *q = disk->queue;
int i, ret;
- lockdep_assert_held(&q->sysfs_dir_lock);
lockdep_assert_held(&q->sysfs_lock);
if (!iars)
@@ -155,7 +154,6 @@ void disk_unregister_independent_access_ranges(struct gendisk *disk)
struct blk_independent_access_ranges *iars = disk->ia_ranges;
int i;
- lockdep_assert_held(&q->sysfs_dir_lock);
lockdep_assert_held(&q->sysfs_lock);
if (!iars)
@@ -289,7 +287,6 @@ void disk_set_independent_access_ranges(struct gendisk *disk,
{
struct request_queue *q = disk->queue;
- mutex_lock(&q->sysfs_dir_lock);
mutex_lock(&q->sysfs_lock);
if (iars && !disk_check_ia_ranges(disk, iars)) {
kfree(iars);
@@ -313,6 +310,5 @@ void disk_set_independent_access_ranges(struct gendisk *disk,
disk_register_independent_access_ranges(disk);
unlock:
mutex_unlock(&q->sysfs_lock);
- mutex_unlock(&q->sysfs_dir_lock);
}
EXPORT_SYMBOL_GPL(disk_set_independent_access_ranges);
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 156e9bb07abf..6113328abd70 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -223,8 +223,6 @@ int blk_mq_sysfs_register(struct gendisk *disk)
unsigned long i, j;
int ret;
- lockdep_assert_held(&q->sysfs_dir_lock);
-
ret = kobject_add(q->mq_kobj, &disk_to_dev(disk)->kobj, "mq");
if (ret < 0)
goto out;
@@ -237,7 +235,6 @@ int blk_mq_sysfs_register(struct gendisk *disk)
goto unreg;
}
- q->mq_sysfs_init_done = true;
out:
return ret;
@@ -259,15 +256,12 @@ void blk_mq_sysfs_unregister(struct gendisk *disk)
struct blk_mq_hw_ctx *hctx;
unsigned long i;
- lockdep_assert_held(&q->sysfs_dir_lock);
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_unregister_hctx(hctx);
kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
kobject_del(q->mq_kobj);
-
- q->mq_sysfs_init_done = false;
}
void blk_mq_sysfs_unregister_hctxs(struct request_queue *q)
@@ -275,15 +269,11 @@ void blk_mq_sysfs_unregister_hctxs(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
unsigned long i;
- mutex_lock(&q->sysfs_dir_lock);
- if (!q->mq_sysfs_init_done)
- goto unlock;
+ if (!blk_queue_registered(q))
+ return;
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_unregister_hctx(hctx);
-
-unlock:
- mutex_unlock(&q->sysfs_dir_lock);
}
int blk_mq_sysfs_register_hctxs(struct request_queue *q)
@@ -292,9 +282,8 @@ int blk_mq_sysfs_register_hctxs(struct request_queue *q)
unsigned long i;
int ret = 0;
- mutex_lock(&q->sysfs_dir_lock);
- if (!q->mq_sysfs_init_done)
- goto unlock;
+ if (!blk_queue_registered(q))
+ goto out;
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_register_hctx(hctx);
@@ -302,8 +291,6 @@ int blk_mq_sysfs_register_hctxs(struct request_queue *q)
break;
}
-unlock:
- mutex_unlock(&q->sysfs_dir_lock);
-
+out:
return ret;
}
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e09b455874bf..7b970e6765e7 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -764,7 +764,6 @@ int blk_register_queue(struct gendisk *disk)
struct request_queue *q = disk->queue;
int ret;
- mutex_lock(&q->sysfs_dir_lock);
kobject_init(&disk->queue_kobj, &blk_queue_ktype);
ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
if (ret < 0)
@@ -805,7 +804,6 @@ int blk_register_queue(struct gendisk *disk)
if (q->elevator)
kobject_uevent(&q->elevator->kobj, KOBJ_ADD);
mutex_unlock(&q->sysfs_lock);
- mutex_unlock(&q->sysfs_dir_lock);
/*
* SCSI probing may synchronously create and destroy a lot of
@@ -830,7 +828,6 @@ int blk_register_queue(struct gendisk *disk)
mutex_unlock(&q->sysfs_lock);
out_put_queue_kobj:
kobject_put(&disk->queue_kobj);
- mutex_unlock(&q->sysfs_dir_lock);
return ret;
}
@@ -861,7 +858,6 @@ void blk_unregister_queue(struct gendisk *disk)
blk_queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
mutex_unlock(&q->sysfs_lock);
- mutex_lock(&q->sysfs_dir_lock);
/*
* Remove the sysfs attributes before unregistering the queue data
* structures that can be modified through sysfs.
@@ -878,7 +874,6 @@ void blk_unregister_queue(struct gendisk *disk)
/* Now that we've deleted all child objects, we can delete the queue. */
kobject_uevent(&disk->queue_kobj, KOBJ_REMOVE);
kobject_del(&disk->queue_kobj);
- mutex_unlock(&q->sysfs_dir_lock);
blk_debugfs_remove(disk);
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 76f0a4e7c2e5..248416ecd01c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -561,7 +561,6 @@ struct request_queue {
struct list_head flush_list;
struct mutex sysfs_lock;
- struct mutex sysfs_dir_lock;
struct mutex limits_lock;
/*
@@ -605,8 +604,6 @@ struct request_queue {
* Serializes all debugfs metadata operations using the above dentries.
*/
struct mutex debugfs_mutex;
-
- bool mq_sysfs_init_done;
};
/* Keep blk_queue_flag_name[] in sync with the definitions below */
--
2.47.1
* [RFC PATCHv2 2/2] block: fix nr_hw_queue update racing with disk addition/removal
2025-01-23 17:40 [RFC PATCHv2 0/2] block: remove q->sysfs_dir_lock and fix race updating nr_hw_queue Nilay Shroff
2025-01-23 17:40 ` [RFC PATCHv2 1/2] block: get rid of request queue ->sysfs_dir_lock Nilay Shroff
@ 2025-01-23 17:40 ` Nilay Shroff
2025-01-28 5:49 ` Christoph Hellwig
1 sibling, 1 reply; 6+ messages in thread
From: Nilay Shroff @ 2025-01-23 17:40 UTC (permalink / raw)
To: linux-block; +Cc: hch, ming.lei, dlemoal, axboe, gjoyce
The nr_hw_queues update could potentially race with disk addition/
removal while the hctx sysfs files are registered/unregistered.
__blk_mq_update_nr_hw_queues() runs with q->tag_list_lock held, so to
avoid it racing with disk addition/removal we should acquire the same
q->tag_list_lock while registering/unregistering the hctx sysfs files.
With this patch, blk_mq_sysfs_register() (called during disk addition)
and blk_mq_sysfs_unregister() (called during disk removal) now run with
q->tag_list_lock held so that they avoid racing with
__blk_mq_update_nr_hw_queues().
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-mq-sysfs.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 6113328abd70..2b9df2b73bf6 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -229,22 +229,31 @@ int blk_mq_sysfs_register(struct gendisk *disk)
kobject_uevent(q->mq_kobj, KOBJ_ADD);
+ mutex_lock(&q->tag_set->tag_list_lock);
+
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_register_hctx(hctx);
- if (ret)
+ if (ret) {
+ mutex_unlock(&q->tag_set->tag_list_lock);
goto unreg;
+ }
}
+ mutex_unlock(&q->tag_set->tag_list_lock);
out:
return ret;
unreg:
+ mutex_lock(&q->tag_set->tag_list_lock);
+
queue_for_each_hw_ctx(q, hctx, j) {
if (j < i)
blk_mq_unregister_hctx(hctx);
}
+ mutex_unlock(&q->tag_set->tag_list_lock);
+
kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
kobject_del(q->mq_kobj);
return ret;
@@ -256,10 +265,13 @@ void blk_mq_sysfs_unregister(struct gendisk *disk)
struct blk_mq_hw_ctx *hctx;
unsigned long i;
+ mutex_lock(&q->tag_set->tag_list_lock);
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_unregister_hctx(hctx);
+ mutex_unlock(&q->tag_set->tag_list_lock);
+
kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
kobject_del(q->mq_kobj);
}
--
2.47.1
* Re: [RFC PATCHv2 1/2] block: get rid of request queue ->sysfs_dir_lock
2025-01-23 17:40 ` [RFC PATCHv2 1/2] block: get rid of request queue ->sysfs_dir_lock Nilay Shroff
@ 2025-01-28 5:46 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2025-01-28 5:46 UTC (permalink / raw)
To: Nilay Shroff; +Cc: linux-block, hch, ming.lei, dlemoal, axboe, gjoyce
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [RFC PATCHv2 2/2] block: fix nr_hw_queue update racing with disk addition/removal
2025-01-23 17:40 ` [RFC PATCHv2 2/2] block: fix nr_hw_queue update racing with disk addition/removal Nilay Shroff
@ 2025-01-28 5:49 ` Christoph Hellwig
2025-01-28 7:11 ` Nilay Shroff
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2025-01-28 5:49 UTC (permalink / raw)
To: Nilay Shroff; +Cc: linux-block, hch, ming.lei, dlemoal, axboe, gjoyce
On Thu, Jan 23, 2025 at 11:10:24PM +0530, Nilay Shroff wrote:
>
> + mutex_lock(&q->tag_set->tag_list_lock);
> +
> queue_for_each_hw_ctx(q, hctx, i) {
> ret = blk_mq_register_hctx(hctx);
> - if (ret)
> + if (ret) {
> + mutex_unlock(&q->tag_set->tag_list_lock);
> goto unreg;
> + }
> }
>
> + mutex_unlock(&q->tag_set->tag_list_lock);
>
> out:
> return ret;
>
> unreg:
> + mutex_lock(&q->tag_set->tag_list_lock);
> +
No real need for an unlock/lock cycle here I think. Also, as something
that is really just a nitpick, I'd love to keep the locks for the
critical sections close to the code they're protecting, e.g. format
this as:
if (ret < 0)
return ret;
mutex_lock(&q->tag_set->tag_list_lock);
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_register_hctx(hctx);
if (ret)
goto out_unregister;
}
mutex_unlock(&q->tag_set->tag_list_lock);
	return 0;
out_unregister:
queue_for_each_hw_ctx(q, hctx, j) {
if (j < i)
blk_mq_unregister_hctx(hctx);
}
mutex_unlock(&q->tag_set->tag_list_lock);
...
(and similar for blk_mq_sysfs_unregister).
I assume you did run this through blktests and xfstests with lockdep
enabled to catch if we created some new lock ordering problems?
I can't think of any right now, but it's good to validate that.
* Re: [RFC PATCHv2 2/2] block: fix nr_hw_queue update racing with disk addition/removal
2025-01-28 5:49 ` Christoph Hellwig
@ 2025-01-28 7:11 ` Nilay Shroff
0 siblings, 0 replies; 6+ messages in thread
From: Nilay Shroff @ 2025-01-28 7:11 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block, ming.lei, dlemoal, axboe, gjoyce
On 1/28/25 11:19 AM, Christoph Hellwig wrote:
> On Thu, Jan 23, 2025 at 11:10:24PM +0530, Nilay Shroff wrote:
>>
>> + mutex_lock(&q->tag_set->tag_list_lock);
>> +
>> queue_for_each_hw_ctx(q, hctx, i) {
>> ret = blk_mq_register_hctx(hctx);
>> - if (ret)
>> + if (ret) {
>> + mutex_unlock(&q->tag_set->tag_list_lock);
>> goto unreg;
>> + }
>> }
>>
>> + mutex_unlock(&q->tag_set->tag_list_lock);
>>
>> out:
>> return ret;
>>
>> unreg:
>> + mutex_lock(&q->tag_set->tag_list_lock);
>> +
>
> No real need for a unlock/lock cycle here I think. Also as something
> that is really just a nitpick, I love to keep the locks for the critical
> sections close to the code they're protecting. e.g. format this as:
>
> if (ret < 0)
> return ret;
>
> mutex_lock(&q->tag_set->tag_list_lock);
> queue_for_each_hw_ctx(q, hctx, i) {
> ret = blk_mq_register_hctx(hctx);
> if (ret)
> goto out_unregister;
> }
> mutex_unlock(&q->tag_set->tag_list_lock);
> return 0
>
> out_unregister:
> queue_for_each_hw_ctx(q, hctx, j) {
> if (j < i)
> blk_mq_unregister_hctx(hctx);
> }
> mutex_unlock(&q->tag_set->tag_list_lock);
>
> ...
>
> (and similar for blk_mq_sysfs_unregister).
Yes, looks good. I will update the code and send the next patch.
> I assume you did run this through blktests and xfstests with lockdep
> enabled to catch if we created some new lock ordering problems?
> I can't think of any right now, but it's good to validate that.
>
Yeah, I ran blktests with lockdep enabled before submitting the patch
to ensure that lockdep doesn't generate any warning with the changes.
However, I didn't run xfstests. Anyway, I will now run both blktests
and xfstests before submitting the next patch.
Thanks,
--Nilay