* [PATCH 0/3] Fix a deadlock related to modifying queue attributes
@ 2025-07-02 18:24 Bart Van Assche
2025-07-02 18:24 ` [PATCH 1/3] block: Remove queue freezing from several sysfs store callbacks Bart Van Assche
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Bart Van Assche @ 2025-07-02 18:24 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche
Hi Jens,
In kernel v6.11 code was introduced in the sysfs store callbacks for freezing
and unfreezing the request queue. This may cause a deadlock in combination with
dm-multipath devices and the queue_if_no_path option. This patch series modifies
the block layer sysfs store callbacks as follows:
- Remove the request queue freeze / unfreeze calls from the block layer sysfs
store callbacks if it is safe to do so.
- For the sysfs attributes for which it is not safe to remove the queue
freeze / unfreeze calls, fail the sysfs write operation if freezing the
request queue takes longer than expected.
Please consider this patch series for the next merge window.
Thanks,
Bart.
Bart Van Assche (3):
block: Remove queue freezing from several sysfs store callbacks
block: Restrict the duration of sysfs attribute changes
block: Move a misplaced comment in queue_wb_lat_store()
block/blk-mq.c | 14 +++++++++++++
block/blk-settings.c | 32 ++++++++++++++++++++++++++++++
block/blk-sysfs.c | 45 ++++++++++++++++--------------------------
include/linux/blk-mq.h | 2 ++
include/linux/blkdev.h | 2 ++
5 files changed, 67 insertions(+), 28 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] block: Remove queue freezing from several sysfs store callbacks
2025-07-02 18:24 [PATCH 0/3] Fix a deadlock related to modifying queue attributes Bart Van Assche
@ 2025-07-02 18:24 ` Bart Van Assche
2025-07-03 2:08 ` Ming Lei
2025-07-02 18:24 ` [PATCH 2/3] block: Restrict the duration of sysfs attribute changes Bart Van Assche
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2025-07-02 18:24 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Bart Van Assche, Nilay Shroff,
stable
Freezing the request queue from inside sysfs store callbacks may cause a
deadlock in combination with the dm-multipath driver and the
queue_if_no_path option. Additionally, freezing the request queue slows
down system boot on systems where sysfs attributes are set synchronously.
Fix this by removing the blk_mq_freeze_queue() / blk_mq_unfreeze_queue()
calls from the store callbacks that do not strictly need these callbacks.
This patch may cause a small delay in applying the new settings.
This patch affects the following sysfs attributes:
* io_poll_delay
* io_timeout
* nomerges
* read_ahead_kb
* rq_affinity
Here is an example of a deadlock triggered by running test srp/002:
task:multipathd
Call Trace:
<TASK>
__schedule+0x8c1/0x1bf0
schedule+0xdd/0x270
schedule_preempt_disabled+0x1c/0x30
__mutex_lock+0xb89/0x1650
mutex_lock_nested+0x1f/0x30
dm_table_set_restrictions+0x823/0xdf0
__bind+0x166/0x590
dm_swap_table+0x2a7/0x490
do_resume+0x1b1/0x610
dev_suspend+0x55/0x1a0
ctl_ioctl+0x3a5/0x7e0
dm_ctl_ioctl+0x12/0x20
__x64_sys_ioctl+0x127/0x1a0
x64_sys_call+0xe2b/0x17d0
do_syscall_64+0x96/0x3a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
task:(udev-worker)
Call Trace:
<TASK>
__schedule+0x8c1/0x1bf0
schedule+0xdd/0x270
blk_mq_freeze_queue_wait+0xf2/0x140
blk_mq_freeze_queue_nomemsave+0x23/0x30
queue_ra_store+0x14e/0x290
queue_attr_store+0x23e/0x2c0
sysfs_kf_write+0xde/0x140
kernfs_fop_write_iter+0x3b2/0x630
vfs_write+0x4fd/0x1390
ksys_write+0xfd/0x230
__x64_sys_write+0x76/0xc0
x64_sys_call+0x276/0x17d0
do_syscall_64+0x96/0x3a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nilay Shroff <nilay@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: af2814149883 ("block: freeze the queue in queue_attr_store")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-sysfs.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b2b9b89d6967..ab34fe62f4da 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -105,7 +105,6 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
{
unsigned long ra_kb;
ssize_t ret;
- unsigned int memflags;
struct request_queue *q = disk->queue;
ret = queue_var_store(&ra_kb, page, count);
@@ -116,10 +115,8 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
* calculated from the queue limits by queue_limits_commit_update.
*/
mutex_lock(&q->limits_lock);
- memflags = blk_mq_freeze_queue(q);
disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
mutex_unlock(&q->limits_lock);
- blk_mq_unfreeze_queue(q, memflags);
return ret;
}
@@ -317,21 +314,18 @@ static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page,
size_t count)
{
unsigned long nm;
- unsigned int memflags;
struct request_queue *q = disk->queue;
ssize_t ret = queue_var_store(&nm, page, count);
if (ret < 0)
return ret;
- memflags = blk_mq_freeze_queue(q);
blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, q);
blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, q);
if (nm == 2)
blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);
else if (nm)
blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
- blk_mq_unfreeze_queue(q, memflags);
return ret;
}
@@ -351,7 +345,6 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
#ifdef CONFIG_SMP
struct request_queue *q = disk->queue;
unsigned long val;
- unsigned int memflags;
ret = queue_var_store(&val, page, count);
if (ret < 0)
@@ -363,7 +356,6 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
* are accessed individually using atomic test_bit operation. So we
* don't grab any lock while updating these flags.
*/
- memflags = blk_mq_freeze_queue(q);
if (val == 2) {
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
@@ -374,7 +366,6 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
}
- blk_mq_unfreeze_queue(q, memflags);
#endif
return ret;
}
@@ -388,11 +379,9 @@ static ssize_t queue_poll_delay_store(struct gendisk *disk, const char *page,
static ssize_t queue_poll_store(struct gendisk *disk, const char *page,
size_t count)
{
- unsigned int memflags;
ssize_t ret = count;
struct request_queue *q = disk->queue;
- memflags = blk_mq_freeze_queue(q);
if (!(q->limits.features & BLK_FEAT_POLL)) {
ret = -EINVAL;
goto out;
@@ -401,7 +390,6 @@ static ssize_t queue_poll_store(struct gendisk *disk, const char *page,
pr_info_ratelimited("writes to the poll attribute are ignored.\n");
pr_info_ratelimited("please use driver specific parameters instead.\n");
out:
- blk_mq_unfreeze_queue(q, memflags);
return ret;
}
@@ -414,7 +402,7 @@ static ssize_t queue_io_timeout_show(struct gendisk *disk, char *page)
static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page,
size_t count)
{
- unsigned int val, memflags;
+ unsigned int val;
int err;
struct request_queue *q = disk->queue;
@@ -422,9 +410,7 @@ static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page,
if (err || val == 0)
return -EINVAL;
- memflags = blk_mq_freeze_queue(q);
blk_queue_rq_timeout(q, msecs_to_jiffies(val));
- blk_mq_unfreeze_queue(q, memflags);
return count;
}
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] block: Restrict the duration of sysfs attribute changes
2025-07-02 18:24 [PATCH 0/3] Fix a deadlock related to modifying queue attributes Bart Van Assche
2025-07-02 18:24 ` [PATCH 1/3] block: Remove queue freezing from several sysfs store callbacks Bart Van Assche
@ 2025-07-02 18:24 ` Bart Van Assche
2025-07-03 9:07 ` Christoph Hellwig
2025-07-03 14:54 ` Nilay Shroff
2025-07-02 18:24 ` [PATCH 3/3] block: Move a misplaced comment in queue_wb_lat_store() Bart Van Assche
2025-07-03 9:09 ` [PATCH 0/3] Fix a deadlock related to modifying queue attributes Christoph Hellwig
3 siblings, 2 replies; 15+ messages in thread
From: Bart Van Assche @ 2025-07-02 18:24 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche, Nilay Shroff
Freezing the request queue from inside sysfs store callbacks may cause a
deadlock in combination with the dm-multipath driver and the
queue_if_no_path option. Fix this by restricting how long to wait until
the queue is frozen inside the remaining sysfs callback methods that
freeze the request queue.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nilay Shroff <nilay@linux.ibm.com>
Fixes: af2814149883 ("block: freeze the queue in queue_attr_store")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-mq.c | 14 ++++++++++++++
block/blk-settings.c | 32 ++++++++++++++++++++++++++++++++
block/blk-sysfs.c | 19 +++++++++++--------
include/linux/blk-mq.h | 2 ++
include/linux/blkdev.h | 2 ++
5 files changed, 61 insertions(+), 8 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 532acdbe9e16..23fd8db663d9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -191,6 +191,7 @@ void blk_mq_freeze_queue_wait(struct request_queue *q)
}
EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
+/* Returns > 0 upon success; 0 upon failure. */
int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
unsigned long timeout)
{
@@ -207,6 +208,19 @@ void blk_mq_freeze_queue_nomemsave(struct request_queue *q)
}
EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_nomemsave);
+/*
+ * Try to freeze @q. Returns 0 if successful or a negative value if freezing @q
+ * did not succeed before the timeout expired.
+ */
+int blk_mq_freeze_queue_nomemsave_timeout(struct request_queue *q,
+ unsigned long timeout)
+{
+ blk_freeze_queue_start(q);
+ if (blk_mq_freeze_queue_wait_timeout(q, timeout) <= 0)
+ return -EAGAIN;
+ return 0;
+}
+
bool __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
{
bool unfreeze;
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a000daafbfb4..93225762a40d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -487,6 +487,38 @@ int queue_limits_commit_update_frozen(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(queue_limits_commit_update_frozen);
+/**
+ * queue_limits_commit_update_frozen_timeout - commit an atomic update of queue
+ * limits
+ * @q: queue to update
+ * @lim: limits to apply
+ * @timeout: maximum time to wait until @q is frozen
+ *
+ * Apply the limits in @lim that were obtained from queue_limits_start_update()
+ * and updated with the new values by the caller to @q. Freezes the queue
+ * before the update and unfreezes it after.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_commit_update_frozen_timeout(struct request_queue *q,
+ struct queue_limits *lim, unsigned long timeout)
+{
+ unsigned int memflags;
+ int ret;
+
+ memflags = memalloc_noio_save();
+ ret = blk_mq_freeze_queue_nomemsave_timeout(q, timeout);
+ if (ret < 0) {
+ memalloc_flags_restore(memflags);
+ queue_limits_cancel_update(q);
+ return ret;
+ }
+ ret = queue_limits_commit_update(q, lim);
+ blk_mq_unfreeze_queue(q, memflags);
+
+ return ret;
+}
+
/**
* queue_limits_set - apply queue limits to queue
* @q: queue to update
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ab34fe62f4da..c75901a55497 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -65,7 +65,7 @@ static ssize_t
queue_requests_store(struct gendisk *disk, const char *page, size_t count)
{
unsigned long nr;
- int ret, err;
+ int ret;
unsigned int memflags;
struct request_queue *q = disk->queue;
@@ -76,17 +76,18 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
if (ret < 0)
return ret;
- memflags = blk_mq_freeze_queue(q);
+ memflags = memalloc_noio_save();
+ ret = blk_mq_freeze_queue_nomemsave_timeout(q, q->rq_timeout);
+ if (ret < 0)
+ return ret;
mutex_lock(&q->elevator_lock);
if (nr < BLKDEV_MIN_RQ)
nr = BLKDEV_MIN_RQ;
- err = blk_mq_update_nr_requests(disk->queue, nr);
- if (err)
- ret = err;
+ ret = blk_mq_update_nr_requests(disk->queue, nr);
mutex_unlock(&q->elevator_lock);
blk_mq_unfreeze_queue(q, memflags);
- return ret;
+ return ret < 0 ? ret : count;
}
static ssize_t queue_ra_show(struct gendisk *disk, char *page)
@@ -582,7 +583,8 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
if (val < -1)
return -EINVAL;
- memflags = blk_mq_freeze_queue(q);
+ memflags = memalloc_noio_save();
+ ret = blk_mq_freeze_queue_nomemsave_timeout(q, q->rq_timeout);
rqos = wbt_rq_qos(q);
if (!rqos) {
@@ -782,7 +784,8 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
return res;
}
- res = queue_limits_commit_update_frozen(q, &lim);
+ res = queue_limits_commit_update_frozen_timeout(q, &lim,
+ q->rq_timeout);
if (res)
return res;
return length;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2a5a828f19a0..6665de1cd75e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -925,6 +925,8 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv);
void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);
void blk_mq_freeze_queue_nomemsave(struct request_queue *q);
+int blk_mq_freeze_queue_nomemsave_timeout(struct request_queue *q,
+ unsigned long timeout);
void blk_mq_unfreeze_queue_nomemrestore(struct request_queue *q);
static inline unsigned int __must_check
blk_mq_freeze_queue(struct request_queue *q)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a51f92b6c340..dc00a0f4bd28 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1059,6 +1059,8 @@ queue_limits_start_update(struct request_queue *q)
}
int queue_limits_commit_update_frozen(struct request_queue *q,
struct queue_limits *lim);
+int queue_limits_commit_update_frozen_timeout(struct request_queue *q,
+ struct queue_limits *lim, unsigned long timeout);
int queue_limits_commit_update(struct request_queue *q,
struct queue_limits *lim);
int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] block: Move a misplaced comment in queue_wb_lat_store()
2025-07-02 18:24 [PATCH 0/3] Fix a deadlock related to modifying queue attributes Bart Van Assche
2025-07-02 18:24 ` [PATCH 1/3] block: Remove queue freezing from several sysfs store callbacks Bart Van Assche
2025-07-02 18:24 ` [PATCH 2/3] block: Restrict the duration of sysfs attribute changes Bart Van Assche
@ 2025-07-02 18:24 ` Bart Van Assche
2025-07-03 9:09 ` [PATCH 0/3] Fix a deadlock related to modifying queue attributes Christoph Hellwig
3 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2025-07-02 18:24 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche, Nilay Shroff
blk_mq_quiesce_queue() does not wait for pending I/O to finish. Freezing
a queue waits for pending I/O to finish. Hence move the comment that
refers to waiting for pending I/O above the call that freezes the
request queue. This patch moves this comment back to the position where
it was when this comment was introduced. See also commit c125311d96b1
("blk-wbt: don't maintain inflight counts if disabled").
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
block/blk-sysfs.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index c75901a55497..52766bd629f3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -584,6 +584,11 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
return -EINVAL;
memflags = memalloc_noio_save();
+ /*
+ * Ensure that the queue is idled, in case the latency update
+ * ends up either enabling or disabling wbt completely. We can't
+ * have IO inflight if that happens.
+ */
ret = blk_mq_freeze_queue_nomemsave_timeout(q, q->rq_timeout);
rqos = wbt_rq_qos(q);
@@ -602,11 +607,6 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
if (wbt_get_min_lat(q) == val)
goto out;
- /*
- * Ensure that the queue is idled, in case the latency update
- * ends up either enabling or disabling wbt completely. We can't
- * have IO inflight if that happens.
- */
blk_mq_quiesce_queue(q);
mutex_lock(&disk->rqos_state_mutex);
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] block: Remove queue freezing from several sysfs store callbacks
2025-07-02 18:24 ` [PATCH 1/3] block: Remove queue freezing from several sysfs store callbacks Bart Van Assche
@ 2025-07-03 2:08 ` Ming Lei
2025-07-03 9:06 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2025-07-03 2:08 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Nilay Shroff, stable
On Wed, Jul 02, 2025 at 11:24:28AM -0700, Bart Van Assche wrote:
> Freezing the request queue from inside sysfs store callbacks may cause a
> deadlock in combination with the dm-multipath driver and the
> queue_if_no_path option. Additionally, freezing the request queue slows
> down system boot on systems where sysfs attributes are set synchronously.
>
> Fix this by removing the blk_mq_freeze_queue() / blk_mq_unfreeze_queue()
> calls from the store callbacks that do not strictly need these callbacks.
Please add commit log why freeze isn't needed for these sysfs attributes
callbacks.
Thanks,
Ming
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] block: Remove queue freezing from several sysfs store callbacks
2025-07-03 2:08 ` Ming Lei
@ 2025-07-03 9:06 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-07-03 9:06 UTC (permalink / raw)
To: Ming Lei
Cc: Bart Van Assche, Jens Axboe, linux-block, Christoph Hellwig,
Nilay Shroff, stable
On Thu, Jul 03, 2025 at 10:08:38AM +0800, Ming Lei wrote:
> On Wed, Jul 02, 2025 at 11:24:28AM -0700, Bart Van Assche wrote:
> > Freezing the request queue from inside sysfs store callbacks may cause a
> > deadlock in combination with the dm-multipath driver and the
> > queue_if_no_path option. Additionally, freezing the request queue slows
> > down system boot on systems where sysfs attributes are set synchronously.
> >
> > Fix this by removing the blk_mq_freeze_queue() / blk_mq_unfreeze_queue()
> > calls from the store callbacks that do not strictly need these callbacks.
>
> Please add commit log why freeze isn't needed for these sysfs attributes
> callbacks.
Yes. I'm rather doubtful about some of them, but waiting for a full
explanation. The explanation might be easier to deliver by doing one
patch per attribute.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] block: Restrict the duration of sysfs attribute changes
2025-07-02 18:24 ` [PATCH 2/3] block: Restrict the duration of sysfs attribute changes Bart Van Assche
@ 2025-07-03 9:07 ` Christoph Hellwig
2025-07-03 14:54 ` Nilay Shroff
1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-07-03 9:07 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig, Nilay Shroff
You are not limiting the duration of attribute change, your are
letting the freeze time out.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Fix a deadlock related to modifying queue attributes
2025-07-02 18:24 [PATCH 0/3] Fix a deadlock related to modifying queue attributes Bart Van Assche
` (2 preceding siblings ...)
2025-07-02 18:24 ` [PATCH 3/3] block: Move a misplaced comment in queue_wb_lat_store() Bart Van Assche
@ 2025-07-03 9:09 ` Christoph Hellwig
2025-07-07 18:35 ` Bart Van Assche
3 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2025-07-03 9:09 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig
I'm still very doubtful on this whole approach, and I think you are
ignoring the root cause, which is dm-multipath keeping a q_usage_count
reference for an unbounded time. It is only supposed to be held over
I/O, and I/O is expected to time out.
You'll probably get much farther by changing dm-multipath to not hold
a q_usage_count reference for something not actually under I/O.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] block: Restrict the duration of sysfs attribute changes
2025-07-02 18:24 ` [PATCH 2/3] block: Restrict the duration of sysfs attribute changes Bart Van Assche
2025-07-03 9:07 ` Christoph Hellwig
@ 2025-07-03 14:54 ` Nilay Shroff
1 sibling, 0 replies; 15+ messages in thread
From: Nilay Shroff @ 2025-07-03 14:54 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe; +Cc: linux-block, Christoph Hellwig
On 7/2/25 11:54 PM, Bart Van Assche wrote:
> Freezing the request queue from inside sysfs store callbacks may cause a
> deadlock in combination with the dm-multipath driver and the
> queue_if_no_path option. Fix this by restricting how long to wait until
> the queue is frozen inside the remaining sysfs callback methods that
> freeze the request queue.
I think we should explicitly list the attribute names that are affected by
this change. Also, it would help clarify which sysfs store callbacks are
impacted.
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Fix a deadlock related to modifying queue attributes
2025-07-03 9:09 ` [PATCH 0/3] Fix a deadlock related to modifying queue attributes Christoph Hellwig
@ 2025-07-07 18:35 ` Bart Van Assche
2025-07-08 9:57 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2025-07-07 18:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block
On 7/3/25 2:09 AM, Christoph Hellwig wrote:
> I'm still very doubtful on this whole approach, and I think you are
> ignoring the root cause, which is dm-multipath keeping a q_usage_count
> reference for an unbounded time. It is only supposed to be held over
> I/O, and I/O is expected to time out.
No. Since queue_if_no_path was introduced, it can queue I/O
indefinitely. The oldest reference I could find to the queue_if_no_path
implementation is from 20 years ago:
Author: Alasdair G. Kergon <agk@redhat.com>
Date: Wed Mar 9 17:19:15 2005 -0800
[PATCH] device-mapper: multipath
The core device-mapper multipath and path-selector code.
[ ... ]
As a last resort there is an option to 'queue_if_no_path' which
queues I/O if
all paths have failed e.g. temporarily during a firmware update or
if the
userspace daemon is slow reinstating paths.
> You'll probably get much farther by changing dm-multipath to not hold
> a q_usage_count reference for something not actually under I/O.
Please make yourself familiar with how the dm-mpath driver works. The
default behavior for the dm-mpath driver is to operate in request based
mode (DM_TYPE_REQUEST_BASED). Hence, a request is allocated before the
dm-mpath driver sees any I/O. Hence, q_usage_counter is increased before
the dm-mpath driver sees any I/O. Hence, what has been suggested cannot
be implemented.
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Fix a deadlock related to modifying queue attributes
2025-07-07 18:35 ` Bart Van Assche
@ 2025-07-08 9:57 ` Christoph Hellwig
2025-07-08 16:11 ` Bart Van Assche
2025-07-08 17:16 ` Bart Van Assche
0 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-07-08 9:57 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Christoph Hellwig, Jens Axboe, linux-block
On Mon, Jul 07, 2025 at 11:35:32AM -0700, Bart Van Assche wrote:
> On 7/3/25 2:09 AM, Christoph Hellwig wrote:
>> I'm still very doubtful on this whole approach, and I think you are
>> ignoring the root cause, which is dm-multipath keeping a q_usage_count
>> reference for an unbounded time. It is only supposed to be held over
>> I/O, and I/O is expected to time out.
>
> No. Since queue_if_no_path was introduced, it can queue I/O
> indefinitely. The oldest reference I could find to the queue_if_no_path
> implementation is from 20 years ago:
Thank you ѕo much Bart, without your help I would not be able to
git-blame.
That still doesn't make it sensible to keep the q usage counter
elevator for unlimited time. See nvme multipath for how we can keep
bios around forever without elevating the usage counter which is
supposed to be transient. Note that dm-multipath should in fact
already be doing the right thing in bio based mode as well.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Fix a deadlock related to modifying queue attributes
2025-07-08 9:57 ` Christoph Hellwig
@ 2025-07-08 16:11 ` Bart Van Assche
2025-07-10 8:03 ` Christoph Hellwig
2025-07-08 17:16 ` Bart Van Assche
1 sibling, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2025-07-08 16:11 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block
On 7/8/25 2:57 AM, Christoph Hellwig wrote:
> That still doesn't make it sensible to keep the q usage counter
> elevator for unlimited time. See nvme multipath for how we can keep
> bios around forever without elevating the usage counter which is
> supposed to be transient. Note that dm-multipath should in fact
> already be doing the right thing in bio based mode as well.
Hi Christoph,
I will look into modifying the SRP tests in the blktests repository such
that these use bio-based mode instead of request-based mode.
However, that won't make the regressions disappear that I have reported.
Many users prefer the request-based dm-multipath mode because of better
support for I/O scheduling and merging. For more information, see also
this cover letter from 2008: [PATCH 00/13] request-based dm-multipath
(https://lwn.net/Articles/298882/).
So the question remains what to do about these two regressions:
* The deadlock triggered by modifying a sysfs attribute of a
dm-multipath device configured with "queue_if_no_path" and no paths
(temporarily).
* Slower booting of Linux devices that modify sysfs attributes
synchronously during boot.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Fix a deadlock related to modifying queue attributes
2025-07-08 9:57 ` Christoph Hellwig
2025-07-08 16:11 ` Bart Van Assche
@ 2025-07-08 17:16 ` Bart Van Assche
1 sibling, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2025-07-08 17:16 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block
On 7/8/25 2:57 AM, Christoph Hellwig wrote:
> That still doesn't make it sensible to keep the q usage counter
> elevator for unlimited time. See nvme multipath for how we can keep
> bios around forever without elevating the usage counter which is
> supposed to be transient. Note that dm-multipath should in fact
> already be doing the right thing in bio based mode as well.
This blktests patch should be sufficient to switch from rq-based to
bio-based dm-multipathing:
diff --git a/common/multipath-over-rdma b/common/multipath-over-rdma
index e157e0a19560..167428b1f53e 100644
--- a/common/multipath-over-rdma
+++ b/common/multipath-over-rdma
@@ -267,7 +267,7 @@ mpath_has_stale_dev() {
# Check whether multipath definition $1 includes the queue_if_no_path
keyword.
is_qinp_def() {
case "$1" in
- *" 3 queue_if_no_path queue_mode mq "*)
+ *" 3 queue_if_no_path queue_mode "*)
return 0;;
*" 1 queue_if_no_path "*)
return 0;;
diff --git a/tests/srp/multipath.conf b/tests/srp/multipath.conf
index e0da32e29917..dffea925466f 100644
--- a/tests/srp/multipath.conf
+++ b/tests/srp/multipath.conf
@@ -8,6 +8,7 @@ devices {
vendor "LIO-ORG|SCST_BIO|FUSIONIO"
product ".*"
no_path_retry queue
+ queue_mode bio
path_checker tur
}
}
With the above patch applied, the following deadlock is triggered by the
SRP tests:
Call Trace:
<TASK>
__schedule+0x8c1/0x1be0
schedule+0xdd/0x270
schedule_preempt_disabled+0x1c/0x30
__mutex_lock+0xb89/0x1650
mutex_lock_nested+0x1f/0x30 <- queue_limits_start_update()
dm_table_set_restrictions+0x823/0xdf0
__bind+0x166/0x5a0
dm_swap_table+0x2a7/0x490
do_resume+0x1b1/0x610
dev_suspend+0x55/0x1a0
ctl_ioctl+0x3a5/0x810
dm_ctl_ioctl+0x12/0x20
__x64_sys_ioctl+0x127/0x1a0
x64_sys_call+0xe2b/0x17d0
do_syscall_64+0x96/0x3a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
task:(udev-worker) state:D stack:27360 pid:8369 tgid:8369 ppid:1758
task_flags:0x480140 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x8c1/0x1be0
schedule+0xdd/0x270
blk_mq_freeze_queue_wait+0xf2/0x140
blk_mq_freeze_queue_nomemsave+0x23/0x30
queue_ra_store+0x14e/0x290
queue_attr_store+0x23e/0x2c0
sysfs_kf_write+0xde/0x140
kernfs_fop_write_iter+0x3b2/0x630
vfs_write+0x4fd/0x1390
ksys_write+0xfd/0x230
__x64_sys_write+0x76/0xc0
x64_sys_call+0x276/0x17d0
do_syscall_64+0x96/0x3a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Thanks,
Bart.
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Fix a deadlock related to modifying queue attributes
2025-07-08 16:11 ` Bart Van Assche
@ 2025-07-10 8:03 ` Christoph Hellwig
2025-07-10 19:29 ` Bart Van Assche
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2025-07-10 8:03 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Christoph Hellwig, Jens Axboe, linux-block
On Tue, Jul 08, 2025 at 09:11:41AM -0700, Bart Van Assche wrote:
> I will look into modifying the SRP tests in the blktests repository such
> that these use bio-based mode instead of request-based mode.
Note that this just fixes the test case. The fact that request
based dm-multipath keeps active requests and thus an elevated
q_usage_counter around still exists then, with effects both to sysfs
and other users of queue freezing.
> So the question remains what to do about these two regressions:
> * The deadlock triggered by modifying a sysfs attribute of a
> dm-multipath device configured with "queue_if_no_path" and no paths
> (temporarily).
That's not a deadlock by the classic definition, but yes, it is hang
that should be addressed.
> * Slower booting of Linux devices that modify sysfs attributes
> synchronously during boot.
So which attributes are regularly modified? Note that for read-ahead
it should be safe to drop the freeze as unlike the others it is not
used for splitting I/O to the limits accepted by the hardware. So
we could probably drop the freeze IFF the patch documents that it is
safe. But it still won't fix the root cause.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] Fix a deadlock related to modifying queue attributes
2025-07-10 8:03 ` Christoph Hellwig
@ 2025-07-10 19:29 ` Bart Van Assche
0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2025-07-10 19:29 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block
On 7/10/25 1:03 AM, Christoph Hellwig wrote:
> On Tue, Jul 08, 2025 at 09:11:41AM -0700, Bart Van Assche wrote:
>> * Slower booting of Linux devices that modify sysfs attributes
>> synchronously during boot.
>
> So which attributes are regularly modified? Note that for read-ahead
> it should be safe to drop the freeze as unlike the others it is not
> used for splitting I/O to the limits accepted by the hardware. So
> we could probably drop the freeze IFF the patch documents that it is
> safe. But it still won't fix the root cause.
I can't answer the above question for all Linux distros. But I can
answer this question for Android. Here is what I found in the AOSP
platform code, which includes startup (.rc) scripts for all Android
vendors:
$ repo-grep -nHE 'write (/sys/block|/sys/class/block)' '**rc' | \
sed 's,.*/,,;s/ .*//' | sort | uniq -c | sort -rn | head -n5
46 read_ahead_kb # Implemented in block/blk-sysfs.c
27 scheduler # Implemented in block/blk-sysfs.c
20 slice_idle # BFQ I/O scheduler
17 comp_algorithm # zram kernel driver
4 discard_max_bytes # Implemented in block/blk-sysfs.c
Thanks,
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-07-10 19:29 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-02 18:24 [PATCH 0/3] Fix a deadlock related to modifying queue attributes Bart Van Assche
2025-07-02 18:24 ` [PATCH 1/3] block: Remove queue freezing from several sysfs store callbacks Bart Van Assche
2025-07-03 2:08 ` Ming Lei
2025-07-03 9:06 ` Christoph Hellwig
2025-07-02 18:24 ` [PATCH 2/3] block: Restrict the duration of sysfs attribute changes Bart Van Assche
2025-07-03 9:07 ` Christoph Hellwig
2025-07-03 14:54 ` Nilay Shroff
2025-07-02 18:24 ` [PATCH 3/3] block: Move a misplaced comment in queue_wb_lat_store() Bart Van Assche
2025-07-03 9:09 ` [PATCH 0/3] Fix a deadlock related to modifying queue attributes Christoph Hellwig
2025-07-07 18:35 ` Bart Van Assche
2025-07-08 9:57 ` Christoph Hellwig
2025-07-08 16:11 ` Bart Van Assche
2025-07-10 8:03 ` Christoph Hellwig
2025-07-10 19:29 ` Bart Van Assche
2025-07-08 17:16 ` Bart Van Assche
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).