* [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
@ 2025-06-10 17:05 Justin Sanders
  2025-06-10 17:06 ` [PATCH 2/2] aoe: defer rexmit timer downdev work to workqueue Justin Sanders
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Justin Sanders @ 2025-06-10 17:05 UTC (permalink / raw)
  To: axboe, ed.cashin, linux-block, linux-kernel; +Cc: Justin Sanders

An aoe device's rq_list contains accepted block requests that are
waiting to be transmitted to the aoe target. This queue was added as
part of the conversion to blk_mq. However, the queue was not cleaned out
when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
indefinitely waiting for those requests to complete, causing a hang. This
fix cleans out the queue before calling blk_mq_freeze_queue().

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
---
 drivers/block/aoe/aoedev.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index bba05f0c5bbd..edd4bae3b5a9 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -198,6 +198,7 @@ aoedev_downdev(struct aoedev *d)
 {
 	struct aoetgt *t, **tt, **te;
 	struct list_head *head, *pos, *nx;
+	struct request *rq, *rqnext;
 	int i;
 
 	d->flags &= ~DEVFL_UP;
@@ -223,6 +224,13 @@ aoedev_downdev(struct aoedev *d)
 	/* clean out the in-process request (if any) */
 	aoe_failip(d);
 
+	/* clean out any queued block requests */
+	list_for_each_entry_safe(rq, rqnext, &d->rq_list, queuelist) {
+		list_del_init(&rq->queuelist);
+		blk_mq_start_request(rq);
+		blk_mq_end_request(rq, BLK_STS_IOERR);
+	}
+
 	/* fast fail all pending I/O */
 	if (d->blkq) {
 		/* UP is cleared, freeze+quiesce to insure all are errored */
-- 
2.49.0



* [PATCH 2/2] aoe: defer rexmit timer downdev work to workqueue
  2025-06-10 17:05 [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev() Justin Sanders
@ 2025-06-10 17:06 ` Justin Sanders
  2025-06-17 11:55 ` [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev() Valentin Kleibel
  2025-06-17 12:12 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Justin Sanders @ 2025-06-10 17:06 UTC (permalink / raw)
  To: axboe, ed.cashin, linux-block, linux-kernel; +Cc: Justin Sanders

When aoe's rexmit_timer() notices that an aoe target fails to respond to
commands for more than aoe_deadsecs, it calls aoedev_downdev() which
cleans the outstanding aoe and block queues. This can involve sleeping,
such as in blk_mq_freeze_queue(), which should not occur in irq context.

This patch defers that aoedev_downdev() call to the aoe device's
workqueue.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
---
 drivers/block/aoe/aoe.h    | 1 +
 drivers/block/aoe/aoecmd.c | 8 ++++++--
 drivers/block/aoe/aoedev.c | 5 ++++-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 749ae1246f4c..d35caa3c69e1 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -80,6 +80,7 @@ enum {
 	DEVFL_NEWSIZE = (1<<6),	/* need to update dev size in block layer */
 	DEVFL_FREEING = (1<<7),	/* set when device is being cleaned up */
 	DEVFL_FREED = (1<<8),	/* device has been cleaned up */
+	DEVFL_DEAD = (1<<9),	/* device has timed out of aoe_deadsecs */
 };
 
 enum {
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 50cc90f6ab35..6298f8e271e3 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -754,7 +754,7 @@ rexmit_timer(struct timer_list *timer)
 
 	utgts = count_targets(d, NULL);
 
-	if (d->flags & DEVFL_TKILL) {
+	if (d->flags & (DEVFL_TKILL | DEVFL_DEAD)) {
 		spin_unlock_irqrestore(&d->lock, flags);
 		return;
 	}
@@ -786,7 +786,8 @@ rexmit_timer(struct timer_list *timer)
 			 * to clean up.
 			 */
 			list_splice(&flist, &d->factive[0]);
-			aoedev_downdev(d);
+			d->flags |= DEVFL_DEAD;
+			queue_work(aoe_wq, &d->work);
 			goto out;
 		}
 
@@ -898,6 +899,9 @@ aoecmd_sleepwork(struct work_struct *work)
 {
 	struct aoedev *d = container_of(work, struct aoedev, work);
 
+	if (d->flags & DEVFL_DEAD)
+		aoedev_downdev(d);
+
 	if (d->flags & DEVFL_GDALLOC)
 		aoeblk_gdalloc(d);
 
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index edd4bae3b5a9..3a240755045b 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -200,8 +200,11 @@ aoedev_downdev(struct aoedev *d)
 	struct list_head *head, *pos, *nx;
 	struct request *rq, *rqnext;
 	int i;
+	unsigned long flags;
 
-	d->flags &= ~DEVFL_UP;
+	spin_lock_irqsave(&d->lock, flags);
+	d->flags &= ~(DEVFL_UP | DEVFL_DEAD);
+	spin_unlock_irqrestore(&d->lock, flags);
 
 	/* clean out active and to-be-retransmitted buffers */
 	for (i = 0; i < NFACTIVE; i++) {
-- 
2.49.0



* Re: [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
  2025-06-10 17:05 [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev() Justin Sanders
  2025-06-10 17:06 ` [PATCH 2/2] aoe: defer rexmit timer downdev work to workqueue Justin Sanders
@ 2025-06-17 11:55 ` Valentin Kleibel
  2025-06-17 12:12 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Valentin Kleibel @ 2025-06-17 11:55 UTC (permalink / raw)
  To: Justin Sanders; +Cc: linux-kernel, axboe, ed.cashin, linux-block

On 10/06/2025 19.05, Justin Sanders wrote:
> An aoe device's rq_list contains accepted block requests that are
> waiting to be transmitted to the aoe target. This queue was added as
> part of the conversion to blk_mq. However, the queue was not cleaned out
> when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
> indefinitely waiting for those requests to complete, causing a hang. This
> fix cleans out the queue before calling blk_mq_freeze_queue().
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
> Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
> Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>

Thank you very much for the patches to fix this issue.
We have tested them in our environment and can confirm that they work as 
expected.

Tested-By: Valentin Kleibel <valentin@vrvis.at>

Best Regards,
	Valentin


* Re: [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev()
  2025-06-10 17:05 [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev() Justin Sanders
  2025-06-10 17:06 ` [PATCH 2/2] aoe: defer rexmit timer downdev work to workqueue Justin Sanders
  2025-06-17 11:55 ` [PATCH 1/2] aoe: clean device rq_list in aoedev_downdev() Valentin Kleibel
@ 2025-06-17 12:12 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2025-06-17 12:12 UTC (permalink / raw)
  To: ed.cashin, linux-block, linux-kernel, Justin Sanders


On Tue, 10 Jun 2025 17:05:59 +0000, Justin Sanders wrote:
> An aoe device's rq_list contains accepted block requests that are
> waiting to be transmitted to the aoe target. This queue was added as
> part of the conversion to blk_mq. However, the queue was not cleaned out
> when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
> indefinitely waiting for those requests to complete, causing a hang. This
> fix cleans out the queue before calling blk_mq_freeze_queue().
> 
> [...]

Applied, thanks!

[1/2] aoe: clean device rq_list in aoedev_downdev()
      commit: a847c4a41630b38136e069aad82dd619c03e95b6
[2/2] aoe: defer rexmit timer downdev work to workqueue
      commit: 71437cf6208c63af6ba99cb42074d13d7b56b669

Best regards,
-- 
Jens Axboe

