* [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
@ 2025-06-10 17:05 Justin Sanders
2025-06-10 17:06 ` [PATCH 2/2] aoe: defer rexmit timer downdev work to workqueue Justin Sanders
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Justin Sanders @ 2025-06-10 17:05 UTC
To: axboe, ed.cashin, linux-block, linux-kernel; +Cc: Justin Sanders
An aoe device's rq_list contains accepted block requests that are
waiting to be transmitted to the aoe target. This queue was added as
part of the conversion to blk_mq. However, the queue was not cleaned out
when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
indefinitely waiting for those requests to complete, causing a hang. This
fix cleans out the queue before calling blk_mq_freeze_queue().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
---
drivers/block/aoe/aoedev.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index bba05f0c5bbd..edd4bae3b5a9 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -198,6 +198,7 @@ aoedev_downdev(struct aoedev *d)
 {
 	struct aoetgt *t, **tt, **te;
 	struct list_head *head, *pos, *nx;
+	struct request *rq, *rqnext;
 	int i;
 	d->flags &= ~DEVFL_UP;
@@ -223,6 +224,13 @@ aoedev_downdev(struct aoedev *d)
 	/* clean out the in-process request (if any) */
 	aoe_failip(d);
+	/* clean out any queued block requests */
+	list_for_each_entry_safe(rq, rqnext, &d->rq_list, queuelist) {
+		list_del_init(&rq->queuelist);
+		blk_mq_start_request(rq);
+		blk_mq_end_request(rq, BLK_STS_IOERR);
+	}
+
 	/* fast fail all pending I/O */
 	if (d->blkq) {
 		/* UP is cleared, freeze+quiesce to insure all are errored */
--
2.49.0
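
For readers who want to see the drain pattern outside the kernel, here is a
minimal userspace sketch of what the hunk above does: walk the queue with the
next pointer saved before unlinking, then complete each request with an I/O
error so nothing is left outstanding. Everything in it is hypothetical and
exists only for illustration (struct fake_dev, struct fake_request, the
inflight counter, fake_freeze_would_block()); it is not the aoe driver or the
blk-mq API, and the counter is only a stand-in for the accounting that
blk_mq_freeze_queue() waits on.

```c
#include <assert.h>
#include <stddef.h>

/* Tiny intrusive list, loosely modeled on the kernel's struct list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }
static int list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

enum req_status { REQ_PENDING, REQ_OK, REQ_IOERR };

/* Hypothetical stand-ins for struct request and struct aoedev. */
struct fake_request { struct list_node queuelist; enum req_status status; };
struct fake_dev { struct list_node rq_list; int inflight; };

static void fake_dev_init(struct fake_dev *d)
{
	list_init(&d->rq_list);
	d->inflight = 0;
}

/* Accept a request: it is now outstanding until someone ends it. */
static void fake_dev_accept(struct fake_dev *d, struct fake_request *rq)
{
	rq->status = REQ_PENDING;
	list_add_tail_node(&rq->queuelist, &d->rq_list);
	d->inflight++;
}

static void fake_end_request(struct fake_dev *d, struct fake_request *rq,
			     enum req_status st)
{
	assert(rq->status == REQ_PENDING);	/* a request ends only once */
	rq->status = st;
	d->inflight--;
}

/* Drain the queue; returns how many requests were failed. */
static int fake_dev_drain(struct fake_dev *d)
{
	struct list_node *pos = d->rq_list.next, *nx;
	int n = 0;

	while (pos != &d->rq_list) {
		struct fake_request *rq = (struct fake_request *)
			((char *)pos - offsetof(struct fake_request, queuelist));

		nx = pos->next;		/* save next before unlinking pos */
		list_del_init_node(pos);
		fake_end_request(d, rq, REQ_IOERR);
		n++;
		pos = nx;
	}
	return n;
}

/* Stand-in for "a freeze would wait forever": requests still outstanding. */
static int fake_freeze_would_block(const struct fake_dev *d)
{
	return d->inflight != 0;
}
```

In this model, accepting three requests and then calling fake_dev_drain()
leaves the list empty and the outstanding count at zero, which is the
condition the patch establishes before the freeze.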
* [PATCH 2/2] aoe: defer rexmit timer downdev work to workqueue
From: Justin Sanders @ 2025-06-10 17:06 UTC
To: axboe, ed.cashin, linux-block, linux-kernel; +Cc: Justin Sanders
When aoe's rexmit_timer() notices that an aoe target fails to respond to
commands for more than aoe_deadsecs, it calls aoedev_downdev() which
cleans the outstanding aoe and block queues. This can involve sleeping,
such as in blk_mq_freeze_queue(), which should not occur in irq context.
This patch defers that aoedev_downdev() call to the aoe device's
workqueue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
---
drivers/block/aoe/aoe.h | 1 +
drivers/block/aoe/aoecmd.c | 8 ++++++--
drivers/block/aoe/aoedev.c | 5 ++++-
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 749ae1246f4c..d35caa3c69e1 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -80,6 +80,7 @@ enum {
 	DEVFL_NEWSIZE = (1<<6),	/* need to update dev size in block layer */
 	DEVFL_FREEING = (1<<7),	/* set when device is being cleaned up */
 	DEVFL_FREED = (1<<8),	/* device has been cleaned up */
+	DEVFL_DEAD = (1<<9),	/* device has timed out of aoe_deadsecs */
 };
 enum {
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 50cc90f6ab35..6298f8e271e3 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -754,7 +754,7 @@ rexmit_timer(struct timer_list *timer)
 	utgts = count_targets(d, NULL);
-	if (d->flags & DEVFL_TKILL) {
+	if (d->flags & (DEVFL_TKILL | DEVFL_DEAD)) {
 		spin_unlock_irqrestore(&d->lock, flags);
 		return;
 	}
@@ -786,7 +786,8 @@ rexmit_timer(struct timer_list *timer)
 			 * to clean up.
 			 */
 			list_splice(&flist, &d->factive[0]);
-			aoedev_downdev(d);
+			d->flags |= DEVFL_DEAD;
+			queue_work(aoe_wq, &d->work);
 			goto out;
 		}
@@ -898,6 +899,9 @@ aoecmd_sleepwork(struct work_struct *work)
 {
 	struct aoedev *d = container_of(work, struct aoedev, work);
+	if (d->flags & DEVFL_DEAD)
+		aoedev_downdev(d);
+
 	if (d->flags & DEVFL_GDALLOC)
 		aoeblk_gdalloc(d);
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index edd4bae3b5a9..3a240755045b 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -200,8 +200,11 @@ aoedev_downdev(struct aoedev *d)
 	struct list_head *head, *pos, *nx;
 	struct request *rq, *rqnext;
 	int i;
+	unsigned long flags;
-	d->flags &= ~DEVFL_UP;
+	spin_lock_irqsave(&d->lock, flags);
+	d->flags &= ~(DEVFL_UP | DEVFL_DEAD);
+	spin_unlock_irqrestore(&d->lock, flags);
 	/* clean out active and to-be-retransmitted buffers */
 	for (i = 0; i < NFACTIVE; i++) {
--
2.49.0
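
The control flow of this patch can be sketched in plain userspace C: the
timer callback only marks the device dead and queues work, and the work
handler, which is allowed to sleep, performs the actual down. This is a
single-threaded model under stated assumptions, not driver code. The names
sim_dev, sim_rexmit_timer(), sim_run_work() and the in_timer marker are
hypothetical, the FL_* bit values are arbitrary, and locking is left out
because nothing here runs concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; only the names echo the DEVFL_* flags. */
enum { FL_UP = 1u << 0, FL_TKILL = 1u << 1, FL_DEAD = 1u << 2 };

struct sim_dev {
	unsigned int flags;
	bool work_pending;	/* models a work item queued on a workqueue */
	bool in_timer;		/* true while the simulated timer callback runs */
	int downdev_calls;
};

/* Models the teardown that may sleep, so it must not run from the timer. */
static void sim_downdev(struct sim_dev *d)
{
	assert(!d->in_timer);
	d->flags &= ~(FL_UP | FL_DEAD);
	d->downdev_calls++;
}

static void sim_queue_work(struct sim_dev *d)
{
	d->work_pending = true;
}

/* Timer callback: never tears down directly, only flags and defers. */
static void sim_rexmit_timer(struct sim_dev *d, bool target_timed_out)
{
	d->in_timer = true;
	if (d->flags & (FL_TKILL | FL_DEAD)) {
		/* already being killed or already marked dead: do nothing */
		d->in_timer = false;
		return;
	}
	if (target_timed_out) {
		d->flags |= FL_DEAD;
		sim_queue_work(d);
	}
	d->in_timer = false;
}

/* Work handler: runs in a context where sleeping is allowed. */
static void sim_run_work(struct sim_dev *d)
{
	if (!d->work_pending)
		return;
	d->work_pending = false;
	if (d->flags & FL_DEAD)
		sim_downdev(d);
}
```

In this model a second timer expiry that arrives before the work has run
returns early on FL_DEAD, so the teardown is performed once, and clearing
FL_DEAD together with FL_UP lets a later timeout be reported again.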
* Re: [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
From: Valentin Kleibel @ 2025-06-17 11:55 UTC
To: Justin Sanders; +Cc: linux-kernel, axboe, ed.cashin, linux-block
On 10/06/2025 19.05, Justin Sanders wrote:
> An aoe device's rq_list contains accepted block requests that are
> waiting to be transmitted to the aoe target. This queue was added as
> part of the conversion to blk_mq. However, the queue was not cleaned out
> when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
> indefinitely waiting for those requests to complete, causing a hang. This
> fix cleans out the queue before calling blk_mq_freeze_queue().
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
> Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
> Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
Thank you very much for the patches to fix this issue.
We have tested them in our environment and can confirm that they work as
expected.
Tested-by: Valentin Kleibel <valentin@vrvis.at>
Best Regards,
Valentin
* Re: [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
From: Jens Axboe @ 2025-06-17 12:12 UTC
To: ed.cashin, linux-block, linux-kernel, Justin Sanders
On Tue, 10 Jun 2025 17:05:59 +0000, Justin Sanders wrote:
> An aoe device's rq_list contains accepted block requests that are
> waiting to be transmitted to the aoe target. This queue was added as
> part of the conversion to blk_mq. However, the queue was not cleaned out
> when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
> indefinitely waiting for those requests to complete, causing a hang. This
> fix cleans out the queue before calling blk_mq_freeze_queue().
>
> [...]
Applied, thanks!
[1/2] aoe: clean device rq_list in aoedev_downdev()
commit: a847c4a41630b38136e069aad82dd619c03e95b6
[2/2] aoe: defer rexmit timer downdev work to workqueue
commit: 71437cf6208c63af6ba99cb42074d13d7b56b669
Best regards,
--
Jens Axboe