* [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
@ 2025-06-10 17:05 Justin Sanders
From: Justin Sanders @ 2025-06-10 17:05 UTC
To: axboe, ed.cashin, linux-block, linux-kernel; +Cc: Justin Sanders

An aoe device's rq_list contains accepted block requests that are
waiting to be transmitted to the aoe target. This queue was added as
part of the conversion to blk_mq. However, the queue was not cleaned out
when an aoe device was downed, which caused blk_mq_freeze_queue() to sleep
indefinitely waiting for those requests to complete, causing a hang. This
fix cleans out the queue before calling blk_mq_freeze_queue().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
---
 drivers/block/aoe/aoedev.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index bba05f0c5bbd..edd4bae3b5a9 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -198,6 +198,7 @@ aoedev_downdev(struct aoedev *d)
 {
 	struct aoetgt *t, **tt, **te;
 	struct list_head *head, *pos, *nx;
+	struct request *rq, *rqnext;
 	int i;
 
 	d->flags &= ~DEVFL_UP;
@@ -223,6 +224,13 @@ aoedev_downdev(struct aoedev *d)
 	/* clean out the in-process request (if any) */
 	aoe_failip(d);
 
+	/* clean out any queued block requests */
+	list_for_each_entry_safe(rq, rqnext, &d->rq_list, queuelist) {
+		list_del_init(&rq->queuelist);
+		blk_mq_start_request(rq);
+		blk_mq_end_request(rq, BLK_STS_IOERR);
+	}
+
 	/* fast fail all pending I/O */
 	if (d->blkq) {
 		/* UP is cleared, freeze+quiesce to insure all are errored */
--
2.49.0

* [PATCH 2/2] aoe: defer rexmit timer downdev work to workqueue
From: Justin Sanders @ 2025-06-10 17:06 UTC
To: axboe, ed.cashin, linux-block, linux-kernel; +Cc: Justin Sanders

When aoe's rexmit_timer() notices that an aoe target fails to respond to
commands for more than aoe_deadsecs, it calls aoedev_downdev() which
cleans the outstanding aoe and block queues. This can involve sleeping,
such as in blk_mq_freeze_queue(), which should not occur in irq context.

This patch defers that aoedev_downdev() call to the aoe device's
workqueue.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
---
 drivers/block/aoe/aoe.h    | 1 +
 drivers/block/aoe/aoecmd.c | 8 ++++++--
 drivers/block/aoe/aoedev.c | 5 ++++-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 749ae1246f4c..d35caa3c69e1 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -80,6 +80,7 @@ enum {
 	DEVFL_NEWSIZE = (1<<6), /* need to update dev size in block layer */
 	DEVFL_FREEING = (1<<7), /* set when device is being cleaned up */
 	DEVFL_FREED = (1<<8), /* device has been cleaned up */
+	DEVFL_DEAD = (1<<9), /* device has timed out of aoe_deadsecs */
 };
 
 enum {
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 50cc90f6ab35..6298f8e271e3 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -754,7 +754,7 @@ rexmit_timer(struct timer_list *timer)
 
 	utgts = count_targets(d, NULL);
 
-	if (d->flags & DEVFL_TKILL) {
+	if (d->flags & (DEVFL_TKILL | DEVFL_DEAD)) {
 		spin_unlock_irqrestore(&d->lock, flags);
 		return;
 	}
@@ -786,7 +786,8 @@ rexmit_timer(struct timer_list *timer)
 			 * to clean up.
 			 */
 			list_splice(&flist, &d->factive[0]);
-			aoedev_downdev(d);
+			d->flags |= DEVFL_DEAD;
+			queue_work(aoe_wq, &d->work);
 			goto out;
 		}
 
@@ -898,6 +899,9 @@ aoecmd_sleepwork(struct work_struct *work)
 {
 	struct aoedev *d = container_of(work, struct aoedev, work);
 
+	if (d->flags & DEVFL_DEAD)
+		aoedev_downdev(d);
+
 	if (d->flags & DEVFL_GDALLOC)
 		aoeblk_gdalloc(d);
 
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index edd4bae3b5a9..3a240755045b 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -200,8 +200,11 @@ aoedev_downdev(struct aoedev *d)
 	struct list_head *head, *pos, *nx;
 	struct request *rq, *rqnext;
 	int i;
+	unsigned long flags;
 
-	d->flags &= ~DEVFL_UP;
+	spin_lock_irqsave(&d->lock, flags);
+	d->flags &= ~(DEVFL_UP | DEVFL_DEAD);
+	spin_unlock_irqrestore(&d->lock, flags);
 
 	/* clean out active and to-be-retransmitted buffers */
 	for (i = 0; i < NFACTIVE; i++) {
-- 
2.49.0

* Re: [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
From: Valentin Kleibel @ 2025-06-17 11:55 UTC
To: Justin Sanders; +Cc: linux-kernel, axboe, ed.cashin, linux-block

On 10/06/2025 19.05, Justin Sanders wrote:
> An aoe device's rq_list contains accepted block requests that are
> waiting to be transmitted to the aoe target. This queue was added as
> part of the conversion to blk_mq. However, the queue was not cleaned out
> when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
> indefinitely waiting for those requests to complete, causing a hang. This
> fix cleans out the queue before calling blk_mq_freeze_queue().
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
> Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
> Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>

Thank you very much for the patches to fix this issue.
We have tested them in our environment and can confirm that they work as
expected.

Tested-By: Valentin Kleibel <valentin@vrvis.at>

Best Regards,
Valentin

* Re: [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
From: Jens Axboe @ 2025-06-17 12:12 UTC
To: ed.cashin, linux-block, linux-kernel, Justin Sanders

On Tue, 10 Jun 2025 17:05:59 +0000, Justin Sanders wrote:
> An aoe device's rq_list contains accepted block requests that are
> waiting to be transmitted to the aoe target. This queue was added as
> part of the conversion to blk_mq. However, the queue was not cleaned out
> when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
> indefinitely waiting for those requests to complete, causing a hang. This
> fix cleans out the queue before calling blk_mq_freeze_queue().
>
> [...]

Applied, thanks!

[1/2] aoe: clean device rq_list in aoedev_downdev()
      commit: a847c4a41630b38136e069aad82dd619c03e95b6
[2/2] aoe: defer rexmit timer downdev work to workqueue
      commit: 71437cf6208c63af6ba99cb42074d13d7b56b669

Best regards,
-- 
Jens Axboe