From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org
Cc: "Nilay Shroff" <nilay@linux.ibm.com>,
"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Christoph Hellwig" <hch@lst.de>,
"Ming Lei" <ming.lei@redhat.com>,
"Hannes Reinecke" <hare@suse.de>
Subject: [PATCH V5 08/25] block: prevent adding/deleting disk during updating nr_hw_queues
Date: Mon, 5 May 2025 22:17:46 +0800 [thread overview]
Message-ID: <20250505141805.2751237-9-ming.lei@redhat.com> (raw)
In-Reply-To: <20250505141805.2751237-1-ming.lei@redhat.com>
Both adding/deleting disk code are reader of `nr_hw_queues`, so we can't
allow them in-progress when updating nr_hw_queues, kernel panic and
kasan has been reported in [1].
Prevent adding/deleting disk during updating nr_hw_queues by adding
rw_semaphore to tagset, write lock is grabbed in blk_mq_update_nr_hw_queues(),
and read lock is acquired when adding/deleting disk.
Also mark GFP_NOIO allocation scope for adding/deleting disk because
blk_mq_update_nr_hw_queues() is part of some driver's error handler.
This way avoids lot of trouble.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Suggested-by: Nilay Shroff <nilay@linux.ibm.com>
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Closes: https://lore.kernel.org/linux-block/a5896cdb-a59a-4a37-9f99-20522f5d2987@linux.ibm.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 4 ++
block/genhd.c | 113 ++++++++++++++++++++++++++++-------------
include/linux/blk-mq.h | 3 ++
3 files changed, 86 insertions(+), 34 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3292f2c4147a..240afa4c6ab3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4848,6 +4848,8 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
goto out_free_srcu;
}
+ init_rwsem(&set->update_nr_hwq_lock);
+
ret = -ENOMEM;
set->tags = kcalloc_node(set->nr_hw_queues,
sizeof(struct blk_mq_tags *), GFP_KERNEL,
@@ -5143,9 +5145,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
{
+ down_write(&set->update_nr_hwq_lock);
mutex_lock(&set->tag_list_lock);
__blk_mq_update_nr_hw_queues(set, nr_hw_queues);
mutex_unlock(&set->tag_list_lock);
+ up_write(&set->update_nr_hwq_lock);
}
EXPORT_SYMBOL_GPL(blk_mq_update_nr_hw_queues);
diff --git a/block/genhd.c b/block/genhd.c
index 50f3deeec5e3..e0dd8ecc925f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -415,19 +415,9 @@ static void add_disk_final(struct gendisk *disk)
set_bit(GD_ADDED, &disk->state);
}
-/**
- * add_disk_fwnode - add disk information to kernel list with fwnode
- * @parent: parent device for the disk
- * @disk: per-device partitioning information
- * @groups: Additional per-device sysfs groups
- * @fwnode: attached disk fwnode
- *
- * This function registers the partitioning information in @disk
- * with the kernel. Also attach a fwnode to the disk device.
- */
-int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
- const struct attribute_group **groups,
- struct fwnode_handle *fwnode)
+static int __add_disk(struct device *parent, struct gendisk *disk,
+ const struct attribute_group **groups,
+ struct fwnode_handle *fwnode)
{
struct device *ddev = disk_to_dev(disk);
@@ -550,7 +540,6 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
*/
disk->part0->bd_dev = MKDEV(disk->major, disk->first_minor);
}
- add_disk_final(disk);
return 0;
out_unregister_bdi:
@@ -580,6 +569,45 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
}
return ret;
}
+
+/**
+ * add_disk_fwnode - add disk information to kernel list with fwnode
+ * @parent: parent device for the disk
+ * @disk: per-device partitioning information
+ * @groups: Additional per-device sysfs groups
+ * @fwnode: attached disk fwnode
+ *
+ * This function registers the partitioning information in @disk
+ * with the kernel. Also attach a fwnode to the disk device.
+ */
+int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
+ const struct attribute_group **groups,
+ struct fwnode_handle *fwnode)
+{
+ struct blk_mq_tag_set *set;
+ unsigned int memflags;
+ int ret;
+
+ if (queue_is_mq(disk->queue)) {
+ set = disk->queue->tag_set;
+ memflags = memalloc_noio_save();
+ down_read(&set->update_nr_hwq_lock);
+ ret = __add_disk(parent, disk, groups, fwnode);
+ up_read(&set->update_nr_hwq_lock);
+ memalloc_noio_restore(memflags);
+ } else {
+ ret = __add_disk(parent, disk, groups, fwnode);
+ }
+
+ /*
+ * add_disk_final() needn't to read `nr_hw_queues`, so move it out
+ * of read lock `set->update_nr_hwq_lock` for avoiding unnecessary
+ * lock dependency on `disk->open_mutex` from scanning partition.
+ */
+ if (!ret)
+ add_disk_final(disk);
+ return ret;
+}
EXPORT_SYMBOL_GPL(add_disk_fwnode);
/**
@@ -660,26 +688,7 @@ void blk_mark_disk_dead(struct gendisk *disk)
}
EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
-/**
- * del_gendisk - remove the gendisk
- * @disk: the struct gendisk to remove
- *
- * Removes the gendisk and all its associated resources. This deletes the
- * partitions associated with the gendisk, and unregisters the associated
- * request_queue.
- *
- * This is the counter to the respective __device_add_disk() call.
- *
- * The final removal of the struct gendisk happens when its refcount reaches 0
- * with put_disk(), which should be called after del_gendisk(), if
- * __device_add_disk() was used.
- *
- * Drivers exist which depend on the release of the gendisk to be synchronous,
- * it should not be deferred.
- *
- * Context: can sleep
- */
-void del_gendisk(struct gendisk *disk)
+static void __del_gendisk(struct gendisk *disk)
{
struct request_queue *q = disk->queue;
struct block_device *part;
@@ -772,6 +781,42 @@ void del_gendisk(struct gendisk *disk)
if (start_drain)
blk_unfreeze_release_lock(q);
}
+
+/**
+ * del_gendisk - remove the gendisk
+ * @disk: the struct gendisk to remove
+ *
+ * Removes the gendisk and all its associated resources. This deletes the
+ * partitions associated with the gendisk, and unregisters the associated
+ * request_queue.
+ *
+ * This is the counter to the respective __device_add_disk() call.
+ *
+ * The final removal of the struct gendisk happens when its refcount reaches 0
+ * with put_disk(), which should be called after del_gendisk(), if
+ * __device_add_disk() was used.
+ *
+ * Drivers exist which depend on the release of the gendisk to be synchronous,
+ * it should not be deferred.
+ *
+ * Context: can sleep
+ */
+void del_gendisk(struct gendisk *disk)
+{
+ struct blk_mq_tag_set *set;
+ unsigned int memflags;
+
+ if (!queue_is_mq(disk->queue)) {
+ __del_gendisk(disk);
+ } else {
+ set = disk->queue->tag_set;
+ memflags = memalloc_noio_save();
+ down_read(&set->update_nr_hwq_lock);
+ __del_gendisk(disk);
+ up_read(&set->update_nr_hwq_lock);
+ memalloc_noio_restore(memflags);
+ }
+}
EXPORT_SYMBOL(del_gendisk);
/**
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8eb9b3310167..ef84d53095a6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -9,6 +9,7 @@
#include <linux/prefetch.h>
#include <linux/srcu.h>
#include <linux/rw_hint.h>
+#include <linux/rwsem.h>
struct blk_mq_tags;
struct blk_flush_queue;
@@ -527,6 +528,8 @@ struct blk_mq_tag_set {
struct mutex tag_list_lock;
struct list_head tag_list;
struct srcu_struct *srcu;
+
+ struct rw_semaphore update_nr_hwq_lock;
};
/**
--
2.47.0
next prev parent reply other threads:[~2025-05-05 14:18 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
2025-05-05 14:17 ` [PATCH V5 01/25] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue() Ming Lei
2025-05-05 14:17 ` [PATCH V5 02/25] block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag Ming Lei
2025-05-05 14:17 ` [PATCH V5 03/25] block: don't call freeze queue in elevator_switch() and elevator_disable() Ming Lei
2025-05-05 14:17 ` [PATCH V5 04/25] block: use q->elevator with ->elevator_lock held in elv_iosched_show() Ming Lei
2025-05-05 14:17 ` [PATCH V5 05/25] block: add two helpers for registering/un-registering sched debugfs Ming Lei
2025-05-05 14:17 ` [PATCH V5 06/25] block: move sched debugfs register into elvevator_register_queue Ming Lei
2025-05-05 14:17 ` [PATCH V5 07/25] block: add helper add_disk_final() Ming Lei
2025-05-06 4:40 ` Christoph Hellwig
2025-05-06 7:43 ` Nilay Shroff
2025-05-06 11:02 ` Hannes Reinecke
2025-05-05 14:17 ` Ming Lei [this message]
2025-05-05 14:17 ` [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress Ming Lei
2025-05-06 4:41 ` Christoph Hellwig
2025-05-06 6:26 ` Nilay Shroff
2025-05-05 14:17 ` [PATCH V5 10/25] block: look up the elevator type in elevator_switch Ming Lei
2025-05-05 14:17 ` [PATCH V5 11/25] block: fold elevator_disable into elevator_switch Ming Lei
2025-05-05 14:17 ` [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store() Ming Lei
2025-05-06 4:41 ` Christoph Hellwig
2025-05-06 7:47 ` Nilay Shroff
2025-05-05 14:17 ` [PATCH V5 13/25] block: simplify elevator reattachment for updating nr_hw_queues Ming Lei
2025-05-05 14:17 ` [PATCH V5 14/25] block: move queue freezing & elevator_lock into elevator_change() Ming Lei
2025-05-05 14:17 ` [PATCH V5 15/25] block: add `struct elv_change_ctx` for unifying elevator change Ming Lei
2025-05-06 4:42 ` Christoph Hellwig
2025-05-05 14:17 ` [PATCH V5 16/25] block: " Ming Lei
2025-05-05 14:17 ` [PATCH V5 17/25] block: pass elevator_queue to elv_register_queue & unregister_queue Ming Lei
2025-05-05 14:17 ` [PATCH V5 18/25] block: remove elevator queue's type check in elv_attr_show/store() Ming Lei
2025-05-05 14:17 ` [PATCH V5 19/25] block: fail to show/store elevator sysfs attribute if elevator is dying Ming Lei
2025-05-06 11:09 ` Hannes Reinecke
2025-05-05 14:17 ` [PATCH V5 20/25] block: add new helper for disabling elevator switch when deleting disk Ming Lei
2025-05-06 6:32 ` Nilay Shroff
2025-05-05 14:17 ` [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock Ming Lei
2025-05-06 4:43 ` Christoph Hellwig
2025-05-06 6:36 ` Nilay Shroff
2025-05-05 14:18 ` [PATCH V5 22/25] block: move hctx debugfs/sysfs registering out of freezing queue Ming Lei
2025-05-05 14:18 ` [PATCH V5 23/25] block: don't acquire ->elevator_lock in blk_mq_map_swqueue and blk_mq_realloc_hw_ctxs Ming Lei
2025-05-06 4:44 ` Christoph Hellwig
2025-05-05 14:18 ` [PATCH V5 24/25] block: move hctx cpuhp add/del out of queue freezing Ming Lei
2025-05-06 4:44 ` Christoph Hellwig
2025-05-05 14:18 ` [PATCH V5 25/25] block: move wbt_enable_default() out of queue freezing from sched ->exit() Ming Lei
2025-05-06 4:44 ` Christoph Hellwig
2025-05-06 13:48 ` [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250505141805.2751237-9-ming.lei@redhat.com \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=nilay@linux.ibm.com \
--cc=shinichiro.kawasaki@wdc.com \
--cc=thomas.hellstrom@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.