* [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
@ 2025-10-06 10:02 John Paul Adrian Glaubitz
2025-10-06 12:48 ` Jens Axboe
0 siblings, 1 reply; 13+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-10-06 10:02 UTC (permalink / raw)
To: Jens Axboe, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux, John Paul Adrian Glaubitz
In a11f6ca9aef9 ("sunvdc: Do not spin in an infinite loop when vio_ldc_send()
returns EAGAIN"), a maximum retry count was added to __vdc_tx_trigger().
After this change, several users reported disk I/O errors when running Linux
inside a logical domain on Solaris 11.4:
[19095.192532] sunvdc: vdc_tx_trigger() failure, err=-11
[19095.192605] I/O error, dev vdiskc, sector 368208928 op 0x1:(WRITE) flags 0x1000 phys_seg 2 prio class 2
[19095.205681] XFS (vdiskc1): metadata I/O error in "xfs_buf_ioend+0x28c/0x600 [xfs]" at daddr 0x15f26420 len 32 error 5
[19432.043471] sunvdc: vdc_tx_trigger() failure, err=-11
[19432.043529] I/O error, dev vdiskc, sector 3732568 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 2
[19432.058821] sunvdc: vdc_tx_trigger() failure, err=-11
[19432.058843] I/O error, dev vdiskc, sector 3736256 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 2
[19432.074109] sunvdc: vdc_tx_trigger() failure, err=-11
[19432.074128] I/O error, dev vdiskc, sector 3736512 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 2
[19432.089425] sunvdc: vdc_tx_trigger() failure, err=-11
[19432.089443] I/O error, dev vdiskc, sector 3737024 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 2
[19432.100964] XFS (vdiskc1): metadata I/O error in "xfs_buf_ioend+0x28c/0x600 [xfs]" at daddr 0x38ec58 len 8 error 5
Since this change seems to have been justified only by reading the code, which
becomes evident from the reference to adddc32d6fde ("sunvnet: Do not spin in an
infinite loop when vio_ldc_send() returns EAGAIN") in its commit message, it
can be safely assumed that the change was neither properly tested nor motivated
by any actual bug reports.
Thus, let's revert this change to address the disk I/O errors above.
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
---
Changes since v1:
- Rephrase commit message
---
drivers/block/sunvdc.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 282f81616a78..f56023c2b033 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -45,8 +45,6 @@ MODULE_VERSION(DRV_MODULE_VERSION);
#define WAITING_FOR_GEN_CMD 0x04
#define WAITING_FOR_ANY -1
-#define VDC_MAX_RETRIES 10
-
static struct workqueue_struct *sunvdc_wq;
struct vdc_req_entry {
@@ -437,7 +435,6 @@ static int __vdc_tx_trigger(struct vdc_port *port)
.end_idx = dr->prod,
};
int err, delay;
- int retries = 0;
hdr.seq = dr->snd_nxt;
delay = 1;
@@ -450,8 +447,6 @@ static int __vdc_tx_trigger(struct vdc_port *port)
udelay(delay);
if ((delay <<= 1) > 128)
delay = 128;
- if (retries++ > VDC_MAX_RETRIES)
- break;
} while (err == -EAGAIN);
if (err == -ENOTCONN)
--
2.47.3
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 10:02 [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN" John Paul Adrian Glaubitz
@ 2025-10-06 12:48 ` Jens Axboe
2025-10-06 13:00 ` John Paul Adrian Glaubitz
0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-06 12:48 UTC (permalink / raw)
To: John Paul Adrian Glaubitz, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On 10/6/25 4:02 AM, John Paul Adrian Glaubitz wrote:
> In a11f6ca9aef9 ("sunvdc: Do not spin in an infinite loop when vio_ldc_send()
> returns EAGAIN"), a maximum retry count was added to __vdc_tx_trigger().
>
> After this change, several users reported disk I/O errors when running Linux
> inside a logical domain on Solaris 11.4:
>
> [19095.192532] sunvdc: vdc_tx_trigger() failure, err=-11
> [19095.192605] I/O error, dev vdiskc, sector 368208928 op 0x1:(WRITE) flags 0x1000 phys_seg 2 prio class 2
> [19095.205681] XFS (vdiskc1): metadata I/O error in "xfs_buf_ioend+0x28c/0x600 [xfs]" at daddr 0x15f26420 len 32 error 5
> [19432.043471] sunvdc: vdc_tx_trigger() failure, err=-11
> [19432.043529] I/O error, dev vdiskc, sector 3732568 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 2
> [19432.058821] sunvdc: vdc_tx_trigger() failure, err=-11
> [19432.058843] I/O error, dev vdiskc, sector 3736256 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 2
> [19432.074109] sunvdc: vdc_tx_trigger() failure, err=-11
> [19432.074128] I/O error, dev vdiskc, sector 3736512 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 2
> [19432.089425] sunvdc: vdc_tx_trigger() failure, err=-11
> [19432.089443] I/O error, dev vdiskc, sector 3737024 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 2
> [19432.100964] XFS (vdiskc1): metadata I/O error in "xfs_buf_ioend+0x28c/0x600 [xfs]" at daddr 0x38ec58 len 8 error 5
>
> Since this change seems to have been justified only by reading the code, which
> becomes evident from the reference to adddc32d6fde ("sunvnet: Do not spin in an
> infinite loop when vio_ldc_send() returns EAGAIN") in its commit message, it
> can be safely assumed that the change was neither properly tested nor motivated
> by any actual bug reports.
>
> Thus, let's revert this change to address the disk I/O errors above.
>
> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> ---
> Changes since v1:
> - Rephrase commit message
> ---
> drivers/block/sunvdc.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
> index 282f81616a78..f56023c2b033 100644
> --- a/drivers/block/sunvdc.c
> +++ b/drivers/block/sunvdc.c
> @@ -45,8 +45,6 @@ MODULE_VERSION(DRV_MODULE_VERSION);
> #define WAITING_FOR_GEN_CMD 0x04
> #define WAITING_FOR_ANY -1
>
> -#define VDC_MAX_RETRIES 10
> -
> static struct workqueue_struct *sunvdc_wq;
>
> struct vdc_req_entry {
> @@ -437,7 +435,6 @@ static int __vdc_tx_trigger(struct vdc_port *port)
> .end_idx = dr->prod,
> };
> int err, delay;
> - int retries = 0;
>
> hdr.seq = dr->snd_nxt;
> delay = 1;
> @@ -450,8 +447,6 @@ static int __vdc_tx_trigger(struct vdc_port *port)
> udelay(delay);
> if ((delay <<= 1) > 128)
> delay = 128;
> - if (retries++ > VDC_MAX_RETRIES)
> - break;
> } while (err == -EAGAIN);
>
> if (err == -ENOTCONN)
When you apply this patch and things work, how many times does it
generally spin where it would've failed before? It's a bit unnerving to
have a never ending spin loop for this, with udelay spinning in between
as well. Looking at vio_ldc_send() as well, that spins for potentially
1000 loops of 1usec each, which would be 1ms. With the current limit of
10 retries, the driver would end up doing udelays of:
1
2
4
8
16
32
64
128
128
128
which is 511 usec on top, for 10.5ms in total spinning time before
giving up. That is kind of mind boggling, that's an eternity.
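A minimal sketch of the loop in question, reconstructed from the hunks quoted
above (not verbatim drivers/block/sunvdc.c; the dring header setup and snd_nxt
bookkeeping are omitted):

	/* capped retry/backoff around vio_ldc_send(), as in the code being
	 * reverted; each attempt may additionally poll up to ~1000 x 1 usec
	 * inside vio_ldc_send() itself */
	int err, delay = 1, retries = 0;

	do {
		err = vio_ldc_send(&port->vio, &hdr, sizeof(hdr));
		if (err > 0)
			break;			/* descriptor handed off */
		udelay(delay);			/* 1, 2, 4, ... capped at 128 usec */
		if ((delay <<= 1) > 128)
			delay = 128;
		if (retries++ > VDC_MAX_RETRIES)
			break;			/* gives up with err == -EAGAIN */
	} while (err == -EAGAIN);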
Not that it's _really_ that important as this is a pretty niche driver,
but still pretty ugly... Does it work reliably with a limit of 100
spins? If things get truly stuck, spinning forever in that loop is not
going to help. In any case, your patch should have
Cc: stable@vger.kernel.org
Fixes: a11f6ca9aef9 ("sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN")
tags added.
--
Jens Axboe
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 12:48 ` Jens Axboe
@ 2025-10-06 13:00 ` John Paul Adrian Glaubitz
2025-10-06 13:03 ` John Paul Adrian Glaubitz
2025-10-06 13:19 ` Jens Axboe
0 siblings, 2 replies; 13+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-10-06 13:00 UTC (permalink / raw)
To: Jens Axboe, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
Hi Jens,
On Mon, 2025-10-06 at 06:48 -0600, Jens Axboe wrote:
> When you apply this patch and things work, how many times does it
> generally spin where it would've failed before? It's a bit unnerving to
> have a never ending spin loop for this, with udelay spinning in between
> as well. Looking at vio_ldc_send() as well, that spins for potentially
> 1000 loops of 1usec each, which would be 1ms. With the current limit of
> 10 retries, the driver would end up doing udelays of:
>
> 1
> 2
> 4
> 8
> 16
> 32
> 64
> 128
> 128
> 128
>
> which is 511 usec on top, for 10.5ms in total spinning time before
> giving up. That is kind of mind boggling, that's an eternity.
The problem is that giving up can lead to filesystem corruption, which
is a problem that was never observed before the change, as far as I know.
We have deployed a kernel with the change reverted on several LDOMs that
are seeing heavy use such as cfarm202.cfarm.net and we have seen any system
lock ups or similar.
> Not that it's _really_ that important as this is a pretty niche driver,
> but still pretty ugly... Does it work reliably with a limit of 100
> spins? If things get truly stuck, spinning forever in that loop is not
> going to help. In any case, your patch should have
Isn't it possible that the call to vio_ldc_send() will eventually succeed
which is why there is no need to abort in __vdc_tx_trigger()?
And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
loop when vio_ldc_send() returns EAGAIN"), we can't just drop data as this
driver concerns a block device while the other driver concerns a network
device. Dropping network packets is expected, dropping bytes from a block
device driver is not.
> Cc: stable@vger.kernel.org
> Fixes: a11f6ca9aef9 ("sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN")
>
> tags added.
Will do.
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 13:00 ` John Paul Adrian Glaubitz
@ 2025-10-06 13:03 ` John Paul Adrian Glaubitz
2025-10-06 13:19 ` Jens Axboe
1 sibling, 0 replies; 13+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-10-06 13:03 UTC (permalink / raw)
To: Jens Axboe, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On Mon, 2025-10-06 at 15:00 +0200, John Paul Adrian Glaubitz wrote:
> The problem is that giving up can lead to filesystem corruption, which
> is a problem that was never observed before the change, as far as I know.
>
> We have deployed a kernel with the change reverted on several LDOMs that
> are seeing heavy use such as cfarm202.cfarm.net and we have seen any system
> lock ups or similar.
We have not* seen any lock ups, so far with the patch reverted. Sorry.
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 13:00 ` John Paul Adrian Glaubitz
2025-10-06 13:03 ` John Paul Adrian Glaubitz
@ 2025-10-06 13:19 ` Jens Axboe
2025-10-06 13:34 ` John Paul Adrian Glaubitz
1 sibling, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-06 13:19 UTC (permalink / raw)
To: John Paul Adrian Glaubitz, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On 10/6/25 7:00 AM, John Paul Adrian Glaubitz wrote:
> Hi Jens,
>
> On Mon, 2025-10-06 at 06:48 -0600, Jens Axboe wrote:
>> When you apply this patch and things work, how many times does it
>> generally spin where it would've failed before? It's a bit unnerving to
>> have a never ending spin loop for this, with udelay spinning in between
>> as well. Looking at vio_ldc_send() as well, that spins for potentially
>> 1000 loops of 1usec each, which would be 1ms. With the current limit of
>> 10 retries, the driver would end up doing udelays of:
>>
>> 1
>> 2
>> 4
>> 8
>> 16
>> 32
>> 64
>> 128
>> 128
>> 128
>>
>> which is 511 usec on top, for 10.5ms in total spinning time before
>> giving up. That is kind of mind boggling, that's an eternity.
>
> The problem is that giving up can lead to filesystem corruption, which
> is a problem that was never observed before the change, as far as I know.
Yes, I get that.
> We have deployed a kernel with the change reverted on several LDOMs that
> are seeing heavy use such as cfarm202.cfarm.net and we have seen any system
> lock ups or similar.
I believe you. I'm just wondering how long you generally need to spin,
as per the question above: how many times does it generally spin where
it would've failed before?
>> Not that it's _really_ that important as this is a pretty niche driver,
>> but still pretty ugly... Does it work reliably with a limit of 100
>> spins? If things get truly stuck, spinning forever in that loop is not
>> going to help. In any case, your patch should have
>
> Isn't it possible that the call to vio_ldc_send() will eventually succeed
> which is why there is no need to abort in __vdc_tx_trigger()?
Of course. Is it also possible that some conditions will lead it to
never succeed?
The nicer approach would be to have sunvdc punt retries back up to the
block stack, which would at least avoid a busy spin for the condition.
Rather than return BLK_STS_IOERR which terminates the request and
bubbles back the -EIO as per your log. IOW, if we've already spent
10.5ms in that busy loop as per the above rough calculation, perhaps
we'd be better off restarting the queue and hence this operation after a
small sleeping delay instead. That would seem a lot saner than hammering
on it.
> And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
> loop when vio_ldc_send() returns EAGAIN"), we can't just drop data as this
> driver concerns a block device while the other driver concerns a network
> device. Dropping network packets is expected, dropping bytes from a block
> device driver is not.
Right, but we can sanely retry it rather than sit in a tight loop.
Something like the entirely untested below diff.
diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index db1fe9772a4d..aa49dffb1b53 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
struct vdc_port *port = hctx->queue->queuedata;
struct vio_dring_state *dr;
unsigned long flags;
+ int ret;
dr = &port->vio.drings[VIO_DRIVER_TX_RING];
@@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_STS_DEV_RESOURCE;
}
- if (__send_request(bd->rq) < 0) {
+ ret = __send_request(bd->rq);
+ if (ret == -EAGAIN) {
+ spin_unlock_irqrestore(&port->vio.lock, flags);
+ /* already spun for 10msec, defer 10msec and retry */
+ blk_mq_delay_kick_requeue_list(hctx->queue, 10);
+ return BLK_STS_DEV_RESOURCE;
+ } else if (ret < 0) {
spin_unlock_irqrestore(&port->vio.lock, flags);
return BLK_STS_IOERR;
}
--
Jens Axboe
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 13:19 ` Jens Axboe
@ 2025-10-06 13:34 ` John Paul Adrian Glaubitz
2025-10-06 13:46 ` Jens Axboe
0 siblings, 1 reply; 13+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-10-06 13:34 UTC (permalink / raw)
To: Jens Axboe, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
Hi Jens,
On Mon, 2025-10-06 at 07:19 -0600, Jens Axboe wrote:
> > The problem is that giving up can lead to filesystem corruption, which
> > is a problem that was never observed before the change, as far as I know.
>
> Yes, I get that.
>
> > We have deployed a kernel with the change reverted on several LDOMs that
> > are seeing heavy use such as cfarm202.cfarm.net and we have seen any system
> > lock ups or similar.
>
> I believe you. I'm just wondering how long you generally need to spin,
> as per the question above: how many times does it generally spin where
> it would've failed before?
I don't have any data on that, unfortunately. All I can say is that we have seen
filesystem corruption when installing Linux inside an LDOM and this particular
issue was eventually tracked down to this commit.
> > > Not that it's _really_ that important as this is a pretty niche driver,
> > > but still pretty ugly... Does it work reliably with a limit of 100
> > > spins? If things get truly stuck, spinning forever in that loop is not
> > > going to help. In any case, your patch should have
> >
> > Isn't it possible that the call to vio_ldc_send() will eventually succeed
> > which is why there is no need to abort in __vdc_tx_trigger()?
>
> Of course. Is it also possible that some conditions will lead it to
> never succeed?
I would assume that this would require the virtual disk server to not
respond, which should never happen since it's a virtualized environment.
If hardware issues would cause vio_ldc_send() to never succeed, these would
have to be handled by the virtualization environment, I guess.
> The nicer approach would be to have sunvdc punt retries back up to the
> block stack, which would at least avoid a busy spin for the condition.
> Rather than return BLK_STS_IOERR which terminates the request and
> bubbles back the -EIO as per your log. IOW, if we've already spent
> 10.5ms in that busy loop as per the above rough calculation, perhaps
> we'd be better off restarting the queue and hence this operation after a
> small sleeping delay instead. That would seem a lot saner than hammering
> on it.
I generally agree with this remark. I just wonder whether this behavior should
apply for a logical domain as well. I guess if a request doesn't succeed immediately,
it's an urgent problem if the logical domain locks up, is it?
I have to admit though that I'm absolutely not an expert on the block layer.
> > And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
> > loop when vio_ldc_send() returns EAGAIN"), we can't just drop data as this
> > driver concerns a block device while the other driver concerns a network
> > device. Dropping network packets is expected, dropping bytes from a block
> > device driver is not.
>
> Right, but we can sanely retry it rather than sit in a tight loop.
> Something like the entirely untested below diff.
>
> diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
> index db1fe9772a4d..aa49dffb1b53 100644
> --- a/drivers/block/sunvdc.c
> +++ b/drivers/block/sunvdc.c
> @@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
> struct vdc_port *port = hctx->queue->queuedata;
> struct vio_dring_state *dr;
> unsigned long flags;
> + int ret;
>
> dr = &port->vio.drings[VIO_DRIVER_TX_RING];
>
> @@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
> return BLK_STS_DEV_RESOURCE;
> }
>
> - if (__send_request(bd->rq) < 0) {
> + ret = __send_request(bd->rq);
> + if (ret == -EAGAIN) {
> + spin_unlock_irqrestore(&port->vio.lock, flags);
> + /* already spun for 10msec, defer 10msec and retry */
> + blk_mq_delay_kick_requeue_list(hctx->queue, 10);
> + return BLK_STS_DEV_RESOURCE;
> + } else if (ret < 0) {
> spin_unlock_irqrestore(&port->vio.lock, flags);
> return BLK_STS_IOERR;
> }
We could add this particular change on top of mine after we have extensively tested it.
For now, I would propose to pick up my patch to revert the previous change. I can then
pick up your proposed change and deploy it for extensive testing and see if it has any
side effects.
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 13:34 ` John Paul Adrian Glaubitz
@ 2025-10-06 13:46 ` Jens Axboe
2025-10-06 13:56 ` John Paul Adrian Glaubitz
0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-06 13:46 UTC (permalink / raw)
To: John Paul Adrian Glaubitz, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On 10/6/25 7:34 AM, John Paul Adrian Glaubitz wrote:
> Hi Jens,
>
> On Mon, 2025-10-06 at 07:19 -0600, Jens Axboe wrote:
>>> The problem is that giving up can lead to filesystem corruption, which
>>> is a problem that was never observed before the change, as far as I know.
>>
>> Yes, I get that.
>>
>>> We have deployed a kernel with the change reverted on several LDOMs that
>>> are seeing heavy use such as cfarm202.cfarm.net and we have seen any system
>>> lock ups or similar.
>>
>> I believe you. I'm just wondering how long you generally need to spin,
>> as per the question above: how many times does it generally spin where
>> it would've failed before?
>
> I don't have any data on that, unfortunately. All I can say is that we
> have seen filesystem corruption when installing Linux inside an LDOM
> and this particular issue was eventually tracked down to this commit.
Right, I would imagine that would need to be collected separately. But
if we can get away from the busy looping, then I don't think it's too
interesting.
>>>> Not that it's _really_ that important as this is a pretty niche driver,
>>>> but still pretty ugly... Does it work reliably with a limit of 100
>>>> spins? If things get truly stuck, spinning forever in that loop is not
>>>> going to help. In any case, your patch should have
>>>
>>> Isn't it possible that the call to vio_ldc_send() will eventually succeed
>>> which is why there is no need to abort in __vdc_tx_trigger()?
>>
>> Of course. Is it also possible that some conditions will lead it to
>> never succeed?
>
> I would assume that this would require the virtual disk server to
> not respond, which should never happen since it's a virtualized
> environment.
Most likely, yes.
> If hardware issues would cause vio_ldc_send() to never succeed, these
> would have to be handled by the virtualization environment, I guess.
Yep, it'd be a bug somewhere else.
>> The nicer approach would be to have sunvdc punt retries back up to the
>> block stack, which would at least avoid a busy spin for the condition.
>> Rather than return BLK_STS_IOERR which terminates the request and
>> bubbles back the -EIO as per your log. IOW, if we've already spent
>> 10.5ms in that busy loop as per the above rough calculation, perhaps
>> we'd be better off restarting the queue and hence this operation after a
>> small sleeping delay instead. That would seem a lot saner than hammering
>> on it.
>
> I generally agree with this remark. I just wonder whether this
> behavior should apply for a logical domain as well. I guess if a
> request doesn't succeed immediately, it's an urgent problem if the
> logical domain locks up, is it?
It's just bad behavior. Honestly most of this just looks like either a
bad implementation of the protocol as it's all based on busy looping, or
a badly designed protocol. And then the sunvdc usage of it just
proliferates that problem, rather than utilizing the tools that exist in
the block stack to take a breather rather than repeatedly hammering on
the hardware for conditions like this.
>>> And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
>>> loop when vio_ldc_send() returns EAGAIN"), we can't just drop data as this
>>> driver concerns a block device while the other driver concerns a network
>>> device. Dropping network packets is expected, dropping bytes from a block
>>> device driver is not.
>>
>> Right, but we can sanely retry it rather than sit in a tight loop.
>> Something like the entirely untested below diff.
>>
>> diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
>> index db1fe9772a4d..aa49dffb1b53 100644
>> --- a/drivers/block/sunvdc.c
>> +++ b/drivers/block/sunvdc.c
>> @@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>> struct vdc_port *port = hctx->queue->queuedata;
>> struct vio_dring_state *dr;
>> unsigned long flags;
>> + int ret;
>>
>> dr = &port->vio.drings[VIO_DRIVER_TX_RING];
>>
>> @@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>> return BLK_STS_DEV_RESOURCE;
>> }
>>
>> - if (__send_request(bd->rq) < 0) {
>> + ret = __send_request(bd->rq);
>> + if (ret == -EAGAIN) {
>> + spin_unlock_irqrestore(&port->vio.lock, flags);
>> + /* already spun for 10msec, defer 10msec and retry */
>> + blk_mq_delay_kick_requeue_list(hctx->queue, 10);
>> + return BLK_STS_DEV_RESOURCE;
>> + } else if (ret < 0) {
>> spin_unlock_irqrestore(&port->vio.lock, flags);
>> return BLK_STS_IOERR;
>> }
>
> We could add this particular change on top of mine after we have
> extensively tested it.
It doesn't really make sense on top of yours, as that removes the
limited looping that sunvdc would do...
> For now, I would propose to pick up my patch to revert the previous
> change. I can then pick up your proposed change and deploy it for
> extensive testing and see if it has any side effects.
Why not just test this one and see if it works? As far as I can tell,
it's been 6.5 years since this change went in, I can't imagine there's a
huge sense of urgency to fix it up that can't wait for testing a more
proper patch rather than a work-around?
--
Jens Axboe
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 13:46 ` Jens Axboe
@ 2025-10-06 13:56 ` John Paul Adrian Glaubitz
2025-10-06 14:03 ` Jens Axboe
0 siblings, 1 reply; 13+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-10-06 13:56 UTC (permalink / raw)
To: Jens Axboe, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On Mon, 2025-10-06 at 07:46 -0600, Jens Axboe wrote:
>
> > > The nicer approach would be to have sunvdc punt retries back up to the
> > > block stack, which would at least avoid a busy spin for the condition.
> > > Rather than return BLK_STS_IOERR which terminates the request and
> > > bubbles back the -EIO as per your log. IOW, if we've already spent
> > > 10.5ms in that busy loop as per the above rough calculation, perhaps
> > > we'd be better off restarting the queue and hence this operation after a
> > > small sleeping delay instead. That would seem a lot saner than hammering
> > > on it.
> >
> > I generally agree with this remark. I just wonder whether this
> > behavior should apply for a logical domain as well. I guess if a
> > request doesn't succeed immediately, it's an urgent problem if the
> > logical domain locks up, is it?
>
> It's just bad behavior. Honestly most of this just looks like either a
> bad implementation of the protocol as it's all based on busy looping, or
> a badly designed protocol. And then the sunvdc usage of it just
> proliferates that problem, rather than utilizing the tools that exist in
> the block stack to take a breather rather than repeatedly hammering on
> the hardware for conditions like this.
To be fair, the sunvdc driver is fairly old and I'm not sure whether these
tools already existed back then. FWIW, Oracle engineers did actually work
on the Linux for SPARC code for a while and it might be possible that their
UEK kernel tree [1] contains some improvements in this regard.
> > > > And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
> > > > loop when vio_ldc_send() returns EAGAIN"), we can't just drop data as this
> > > > driver concerns a block device while the other driver concerns a network
> > > > device. Dropping network packets is expected, dropping bytes from a block
> > > > device driver is not.
> > >
> > > Right, but we can sanely retry it rather than sit in a tight loop.
> > > Something like the entirely untested below diff.
> > >
> > > diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
> > > index db1fe9772a4d..aa49dffb1b53 100644
> > > --- a/drivers/block/sunvdc.c
> > > +++ b/drivers/block/sunvdc.c
> > > @@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > struct vdc_port *port = hctx->queue->queuedata;
> > > struct vio_dring_state *dr;
> > > unsigned long flags;
> > > + int ret;
> > >
> > > dr = &port->vio.drings[VIO_DRIVER_TX_RING];
> > >
> > > @@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > return BLK_STS_DEV_RESOURCE;
> > > }
> > >
> > > - if (__send_request(bd->rq) < 0) {
> > > + ret = __send_request(bd->rq);
> > > + if (ret == -EAGAIN) {
> > > + spin_unlock_irqrestore(&port->vio.lock, flags);
> > > + /* already spun for 10msec, defer 10msec and retry */
> > > + blk_mq_delay_kick_requeue_list(hctx->queue, 10);
> > > + return BLK_STS_DEV_RESOURCE;
> > > + } else if (ret < 0) {
> > > spin_unlock_irqrestore(&port->vio.lock, flags);
> > > return BLK_STS_IOERR;
> > > }
> >
> > We could add this particular change on top of mine after we have
> > extensively tested it.
>
> It doesn't really make sense on top of yours, as that removes the
> limited looping that sunvdc would do...
Why not? From what I understood, you're moving the limited looping to a
different part of the driver where it can delegate the request back up
the stack meaning that the current place to implement the limitation is
not correct anyway, is it?
> > For now, I would propose to pick up my patch to revert the previous
> > change. I can then pick up your proposed change and deploy it for
> > extensive testing and see if it has any side effects.
>
> Why not just test this one and see if it works? As far as I can tell,
> it's been 6.5 years since this change went in, I can't imagine there's a
> huge sense of urgency to fix it up that can't wait for testing a more
> proper patch rather than a work-around?
Well, the thing is that a lot of people have been running older kernels
on SPARC because of issues like these and I have started working on trying
to track down all of these issues now [2] for users to be able to run a
current kernel. So, the 6.5 years existence of this change shouldn't
be an argument I think.
Adrian
[1] https://github.com/oracle/linux-uek/tree/uek4/qu7
[2] https://github.com/sparclinux/issues/issues
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 13:56 ` John Paul Adrian Glaubitz
@ 2025-10-06 14:03 ` Jens Axboe
2025-10-06 14:12 ` John Paul Adrian Glaubitz
0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-06 14:03 UTC (permalink / raw)
To: John Paul Adrian Glaubitz, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On 10/6/25 7:56 AM, John Paul Adrian Glaubitz wrote:
> On Mon, 2025-10-06 at 07:46 -0600, Jens Axboe wrote:
>>
>>>> The nicer approach would be to have sunvdc punt retries back up to the
>>>> block stack, which would at least avoid a busy spin for the condition.
>>>> Rather than return BLK_STS_IOERR which terminates the request and
>>>> bubbles back the -EIO as per your log. IOW, if we've already spent
>>>> 10.5ms in that busy loop as per the above rough calculation, perhaps
>>>> we'd be better off restarting the queue and hence this operation after a
>>>> small sleeping delay instead. That would seem a lot saner than hammering
>>>> on it.
>>>
>>> I generally agree with this remark. I just wonder whether this
>>> behavior should apply for a logical domain as well. I guess if a
>>> request doesn't succeed immediately, it's an urgent problem if the
>>> logical domain locks up, is it?
>>
>> It's just bad behavior. Honestly most of this just looks like either a
>> bad implementation of the protocol as it's all based on busy looping, or
>> a badly designed protocol. And then the sunvdc usage of it just
>> proliferates that problem, rather than utilizing the tools that exist in
>> the block stack to take a breather rather than repeatedly hammering on
>> the hardware for conditions like this.
>
> To be fair, the sunvdc driver is fairly old and I'm not sure whether these
> tools already existed back then. FWIW, Oracle engineers did actually work
> on the Linux for SPARC code for a while and it might be possible that their
> UEK kernel tree [1] contains some improvements in this regard.
Requeueing and retry has always been available on the block side. It's
not an uncommon thing for a driver to need, in case of resource
starvation. And sometimes those resources can be unrelated to the IO, eg
iommu shortages. Or this busy condition.
But that's fine, it's not uncommon for drivers to miss things like that,
and then we fix them up when noticed. It was probably written by someone
not super familiar with the IO stack.
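A minimal sketch of that pattern in a generic blk-mq ->queue_rq() handler,
independent of sunvdc (example_submit() is a hypothetical stand-in for a
driver's submit path, not an existing function):

	static blk_status_t example_queue_rq(struct blk_mq_hw_ctx *hctx,
					     const struct blk_mq_queue_data *bd)
	{
		int err = example_submit(bd->rq);	/* hypothetical submit helper */

		if (err == -EAGAIN) {
			/* transient resource shortage: let blk-mq re-run the
			 * queue a little later instead of spinning or failing */
			blk_mq_delay_kick_requeue_list(hctx->queue, 10);
			return BLK_STS_DEV_RESOURCE;
		}
		if (err < 0)
			return BLK_STS_IOERR;	/* hard failure, completes as -EIO */

		return BLK_STS_OK;
	}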
>>>>> And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
>>>>> loop when vio_ldc_send() returns EAGAIN"), we can't just drop data as this
>>>>> driver concerns a block device while the other driver concerns a network
>>>>> device. Dropping network packets is expected, dropping bytes from a block
>>>>> device driver is not.
>>>>
>>>> Right, but we can sanely retry it rather than sit in a tight loop.
>>>> Something like the entirely untested below diff.
>>>>
>>>> diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
>>>> index db1fe9772a4d..aa49dffb1b53 100644
>>>> --- a/drivers/block/sunvdc.c
>>>> +++ b/drivers/block/sunvdc.c
>>>> @@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>> struct vdc_port *port = hctx->queue->queuedata;
>>>> struct vio_dring_state *dr;
>>>> unsigned long flags;
>>>> + int ret;
>>>>
>>>> dr = &port->vio.drings[VIO_DRIVER_TX_RING];
>>>>
>>>> @@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>> return BLK_STS_DEV_RESOURCE;
>>>> }
>>>>
>>>> - if (__send_request(bd->rq) < 0) {
>>>> + ret = __send_request(bd->rq);
>>>> + if (ret == -EAGAIN) {
>>>> + spin_unlock_irqrestore(&port->vio.lock, flags);
>>>> + /* already spun for 10msec, defer 10msec and retry */
>>>> + blk_mq_delay_kick_requeue_list(hctx->queue, 10);
>>>> + return BLK_STS_DEV_RESOURCE;
>>>> + } else if (ret < 0) {
>>>> spin_unlock_irqrestore(&port->vio.lock, flags);
>>>> return BLK_STS_IOERR;
>>>> }
>>>
>>> We could add this particular change on top of mine after we have
>>> extensively tested it.
>>
>> It doesn't really make sense on top of yours, as that removes the
>> limited looping that sunvdc would do...
>
> Why not? From what I understood, you're moving the limited looping to a
> different part of the driver where it can delegate the request back up
> the stack meaning that the current place to implement the limitation is
> not correct anyway, is it?
Because your change never gives up, hence it'd never trigger the softer
retry condition. It'll just keep busy looping.
>>> For now, I would propose to pick up my patch to revert the previous
>>> change. I can then pick up your proposed change and deploy it for
>>> extensive testing and see if it has any side effects.
>>
>> Why not just test this one and see if it works? As far as I can tell,
>> it's been 6.5 years since this change went in, I can't imagine there's a
>> huge sense of urgency to fix it up that can't wait for testing a more
>> proper patch rather than a work-around?
>
> Well, the thing is that a lot of people have been running older kernels
> on SPARC because of issues like these and I have started working on trying
> to track down all of these issues now [2] for users to be able to run a
> current kernel. So, the 6.5 years existence of this change shouldn't
> be an argument I think.
While I agree that the bug is unfortunate, it's also a chance to
properly fix it rather than just go back to busy looping. How difficult
is it to test an iteration of the patch? It'd be annoying to queue a
bandaid only to have to revert that again for a real fix. If this was a
regression from the last release or two then that'd be a different
story, but the fact that this has persisted for 6.5 years and is only
bubbling back up to mainstream now would seem to indicate that we should
spend a bit of extra time to just get it right the first time.
--
Jens Axboe
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 14:03 ` Jens Axboe
@ 2025-10-06 14:12 ` John Paul Adrian Glaubitz
2025-10-06 14:27 ` Jens Axboe
0 siblings, 1 reply; 13+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-10-06 14:12 UTC (permalink / raw)
To: Jens Axboe, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On Mon, 2025-10-06 at 08:03 -0600, Jens Axboe wrote:
> > To be fair, the sunvdc driver is fairly old and I'm not sure whether these
> > tools already existed back then. FWIW, Oracle engineers did actually work
> > on the Linux for SPARC code for a while and it might be possible that their
> > UEK kernel tree [1] contains some improvements in this regard.
>
> Requeueing and retry has always been available on the block side. It's
> not an uncommon thing for a driver to need, in case of resource
> starvation. And sometimes those resources can be unrelated to the IO, eg
> iommu shortages. Or this busy condition.
I see. Makes sense.
> But that's fine, it's not uncommon for drivers to miss things like that,
> and then we fix them up when noticed. It was probably written by someone
> not super familiar with the IO stack.
FWIW, Oracle engineers actually made some significant changes to the driver
that they never upstreamed, see:
https://github.com/oracle/linux-uek/commits/uek4/qu7/drivers/block/sunvdc.c
In particular, they added support for out-of-order execution:
https://github.com/oracle/linux-uek/commit/68f7c9c17fb80d29cbc1e5110f6c021f8da8d610
and they also changed the driver to use the BIO-based interface for VDC I/O requests:
https://github.com/oracle/linux-uek/commit/4b725eb64cc10a4877f2af75ff3a776586f68eb7
Could you review these two changes and tell me whether these would actually implement
the changes you would want to see? I think the BIO layer is a generic interface of
the block layer in the kernel, isn't it?
> > > > > > And unlike the change in adddc32d6fde ("sunvnet: Do not spin in an infinite
> > > > > > loop when vio_ldc_send() returns EAGAIN"), we can't just drop data as this
> > > > > > driver concerns a block device while the other driver concerns a network
> > > > > > device. Dropping network packets is expected, dropping bytes from a block
> > > > > > device driver is not.
> > > > >
> > > > > Right, but we can sanely retry it rather than sit in a tight loop.
> > > > > Something like the entirely untested below diff.
> > > > >
> > > > > diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
> > > > > index db1fe9772a4d..aa49dffb1b53 100644
> > > > > --- a/drivers/block/sunvdc.c
> > > > > +++ b/drivers/block/sunvdc.c
> > > > > @@ -539,6 +539,7 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > > > struct vdc_port *port = hctx->queue->queuedata;
> > > > > struct vio_dring_state *dr;
> > > > > unsigned long flags;
> > > > > + int ret;
> > > > >
> > > > > dr = &port->vio.drings[VIO_DRIVER_TX_RING];
> > > > >
> > > > > @@ -560,7 +561,13 @@ static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > > > return BLK_STS_DEV_RESOURCE;
> > > > > }
> > > > >
> > > > > - if (__send_request(bd->rq) < 0) {
> > > > > + ret = __send_request(bd->rq);
> > > > > + if (ret == -EAGAIN) {
> > > > > + spin_unlock_irqrestore(&port->vio.lock, flags);
> > > > > + /* already spun for 10msec, defer 10msec and retry */
> > > > > + blk_mq_delay_kick_requeue_list(hctx->queue, 10);
> > > > > + return BLK_STS_DEV_RESOURCE;
> > > > > + } else if (ret < 0) {
> > > > > spin_unlock_irqrestore(&port->vio.lock, flags);
> > > > > return BLK_STS_IOERR;
> > > > > }
> > > >
> > > > We could add this particular change on top of mine after we have
> > > > extensively tested it.
> > >
> > > It doesn't really make sense on top of yours, as that removes the
> > > limited looping that sunvdc would do...
> >
> > Why not? From what I understood, you're moving the limited looping to a
> > different part of the driver where it can delegate the request back up
> > the stack meaning that the current place to implement the limitation is
> > not correct anyway, is it?
>
> Because your change never gives up, hence it'd never trigger the softer
> retry condition. It'll just keep busy looping.
Ah, that makes sense.
> > > > For now, I would propose to pick up my patch to revert the previous
> > > > change. I can then pick up your proposed change and deploy it for
> > > > extensive testing and see if it has any side effects.
> > >
> > > Why not just test this one and see if it works? As far as I can tell,
> > > it's been 6.5 years since this change went in, I can't imagine there's a
> > > huge sense of urgency to fix it up that can't wait for testing a more
> > > proper patch rather than a work-around?
> >
> > Well, the thing is that a lot of people have been running older kernels
> > on SPARC because of issues like these and I have started working on trying
> > to track down all of these issues now [2] for users to be able to run a
> > current kernel. So, the 6.5 years existence of this change shouldn't
> > be an argument I think.
>
> While I agree that the bug is unfortunate, it's also a chance to
> properly fix it rather than just go back to busy looping. How difficult
> is it to test an iteration of the patch? It'd be annoying to queue a
> bandaid only to have to revert that again for a real fix. If this was a
> regression from the last release or two then that'd be a different
> story, but the fact that this has persisted for 6.5 years and is only
> bubbling back up to mainstream now would seem to indicate that we should
> spend a bit of extra time to just get it right the first time.
We could do that for sure. But I would like to hear your opinion on the changes
contributed by Oracle engineers first. Maybe their improvements are much better
so that it might make sense to try to upstream them.
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 14:12 ` John Paul Adrian Glaubitz
@ 2025-10-06 14:27 ` Jens Axboe
2025-10-06 14:48 ` John Paul Adrian Glaubitz
0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-06 14:27 UTC (permalink / raw)
To: John Paul Adrian Glaubitz, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On 10/6/25 8:12 AM, John Paul Adrian Glaubitz wrote:
> On Mon, 2025-10-06 at 08:03 -0600, Jens Axboe wrote:
>>> To be fair, the sunvdc driver is fairly old and I'm not sure whether these
>>> tools already existed back then. FWIW, Oracle engineers did actually work
>>> on the Linux for SPARC code for a while and it might be possible that their
>>> UEK kernel tree [1] contains some improvements in this regard.
>>
>> Requeueing and retry has always been available on the block side. It's
>> not an uncommon thing for a driver to need, in case of resource
>> starvation. And sometimes those resources can be unrelated to the IO, eg
>> iommu shortages. Or this busy condition.
>
> I see. Makes sense.
>
>> But that's fine, it's not uncommon for drivers to miss things like that,
>> and then we fix them up when noticed. It was probably written by someone
>> not super familiar with the IO stack.
>
> FWIW, Oracle engineers actually made some significant changes to the
> driver that they never upstreamed, see:
>
> https://github.com/oracle/linux-uek/commits/uek4/qu7/drivers/block/sunvdc.c
>
> In particular, they added support for out-of-order execution:
>
> https://github.com/oracle/linux-uek/commit/68f7c9c17fb80d29cbc1e5110f6c021f8da8d610
>
> and they also changed the driver to use the BIO-based interface for
> VDC I/O requests:
>
> https://github.com/oracle/linux-uek/commit/4b725eb64cc10a4877f2af75ff3a776586f68eb7
>
> Could you review these two changes and tell me whether these would
> actually implement the changes you would want to see? I think the BIO
> layer is a generic interface of the block layer in the kernel, isn't
> it?
Moving lower down the stack to use a bio directly is not a good idea,
it's in fact going the opposite direction of what we'd like to see in
the storage stack. And it would then mean you'd need to implement your
own internal requeueing and retrying.
These are the kind of changes that happen when development is done and
changes aren't submitted upstream. It's unfortunate drift...
>>>>> For now, I would propose to pick up my patch to revert the previous
>>>>> change. I can then pick up your proposed change and deploy it for
>>>>> extensive testing and see if it has any side effects.
>>>>
>>>> Why not just test this one and see if it works? As far as I can tell,
>>>> it's been 6.5 years since this change went in, I can't imagine there's a
>>>> huge sense of urgency to fix it up that can't wait for testing a more
>>>> proper patch rather than a work-around?
>>>
>>> Well, the thing is that a lot of people have been running older kernels
>>> on SPARC because of issues like these and I have started working on trying
>>> to track down all of these issues now [2] for users to be able to run a
>>> current kernel. So, the 6.5 years existence of this change shouldn't
>>> be an argument I think.
>>
>> While I agree that the bug is unfortunate, it's also a chance to
>> properly fix it rather than just go back to busy looping. How difficult
>> is it to test an iteration of the patch? It'd be annoying to queue a
>> bandaid only to have to revert that again for a real fix. If this was a
>> regression from the last release or two then that'd be a different
>> story, but the fact that this has persisted for 6.5 years and is only
>> bubbling back up to mainstream now would seem to indicate that we should
>> spend a bit of extra time to just get it right the first time.
>
> We could do that for sure. But I would like to hear your opinion on
> the changes contributed by Oracle engineers first. Maybe their
> improvements are much better so that it might make sense to try to
> upstream them.
Won't help this case, and it's actively going the wrong direction
imho...
--
Jens Axboe
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 14:27 ` Jens Axboe
@ 2025-10-06 14:48 ` John Paul Adrian Glaubitz
2025-10-06 14:52 ` Jens Axboe
0 siblings, 1 reply; 13+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-10-06 14:48 UTC (permalink / raw)
To: Jens Axboe, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On Mon, 2025-10-06 at 08:27 -0600, Jens Axboe wrote:
> > > But that's fine, it's not uncommon for drivers to miss things like that,
> > > and then we fix them up when noticed. It was probably written by someone
> > > not super familiar with the IO stack.
> >
> > FWIW, Oracle engineers actually made some significant changes to the
> > driver that they never upstreamed, see:
> >
> > https://github.com/oracle/linux-uek/commits/uek4/qu7/drivers/block/sunvdc.c
> >
> > In particular, they added support for out-of-order execution:
> >
> > https://github.com/oracle/linux-uek/commit/68f7c9c17fb80d29cbc1e5110f6c021f8da8d610
> >
> > and they also changed the driver to use the BIO-based interface for
> > VDC I/O requests:
> >
> > https://github.com/oracle/linux-uek/commit/4b725eb64cc10a4877f2af75ff3a776586f68eb7
> >
> > Could you review these two changes and tell me whether these would
> > actually implement the changes you would want to see? I think the BIO
> > layer is a generic interface of the block layer in the kernel, isn't
> > it?
>
> Moving lower down the stack to use a bio directly is not a good idea,
> it's in fact going the opposite direction of what we'd like to see in
> the storage stack. And it would then mean you'd need to implement your
> own internal requeueing and retrying.
I looked at the virtio_blk driver and that seems to confirm it. There is no
use of the bio interface either, so I guess we should not pick up this
patch.
What do you think about the out-of-order execution? Would that make sense
to upstream it? Does it look reasonable?
> These are the kind of changes that happen when development is done and
> changes aren't submitted upstream. It's unfortunate drift...
Well, the problem here is that Oracle stopped working on Linux for SPARC
abruptly, so many of their improvements were never sent upstream and did
not see any reviews which would have caught this.
> > > > > > For now, I would propose to pick up my patch to revert the previous
> > > > > > change. I can then pick up your proposed change and deploy it for
> > > > > > extensive testing and see if it has any side effects.
> > > > >
> > > > > Why not just test this one and see if it works? As far as I can tell,
> > > > > it's been 6.5 years since this change went in, I can't imagine there's a
> > > > > huge sense of urgency to fix it up that can't wait for testing a more
> > > > > proper patch rather than a work-around?
> > > >
> > > > Well, the thing is that a lot of people have been running older kernels
> > > > on SPARC because of issues like these and I have started working on trying
> > > > to track down all of these issues now [2] for users to be able to run a
> > > > current kernel. So, the 6.5 years existence of this change shouldn't
> > > > be an argument I think.
> > >
> > > While I agree that the bug is unfortunate, it's also a chance to
> > > properly fix it rather than just go back to busy looping. How difficult
> > > is it to test an iteration of the patch? It'd be annoying to queue a
> > > bandaid only to have to revert that again for a real fix. If this was a
> > > regression from the last release or two then that'd be a different
> > > story, but the fact that this has persisted for 6.5 years and is only
> > > bubbling back up to mainstream now would seem to indicate that we should
> > > spend a bit of extra time to just get it right the first time.
> >
> > We could do that for sure. But I would like to hear your opinion on
> > the changes contributed by Oracle engineers first. Maybe their
> > improvements are much better so that it might make sense to try to
> > upstream them.
>
> Won't help this case, and it's actively going the wrong direction
> imho...
OK, so your opinion is then to add the patch that you proposed on top of what's
currently there in Linus' tree, meaning adding some code that will requeue requests
once the retry limit has been reached?
Can you maybe post a proper patch then which I (and others) could test and then
hopefully add their "Tested-by"?
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [PATCH v2] Revert "sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN"
2025-10-06 14:48 ` John Paul Adrian Glaubitz
@ 2025-10-06 14:52 ` Jens Axboe
0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2025-10-06 14:52 UTC (permalink / raw)
To: John Paul Adrian Glaubitz, linux-block, linux-kernel
Cc: Andreas Larsson, Anthony Yznaga, Sam James, David S . Miller,
Michael Karcher, sparclinux
On 10/6/25 8:48 AM, John Paul Adrian Glaubitz wrote:
> On Mon, 2025-10-06 at 08:27 -0600, Jens Axboe wrote:
>>>> But that's fine, it's not uncommon for drivers to miss things like that,
>>>> and then we fix them up when noticed. It was probably written by someone
>>>> not super familiar with the IO stack.
>>>
>>> FWIW, Oracle engineers actually made some significant changes to the
>>> driver that they never upstreamed, see:
>>>
>>> https://github.com/oracle/linux-uek/commits/uek4/qu7/drivers/block/sunvdc.c
>>>
>>> In particular, they added support for out-of-order execution:
>>>
>>> https://github.com/oracle/linux-uek/commit/68f7c9c17fb80d29cbc1e5110f6c021f8da8d610
>>>
>>> and they also changed the driver to use the BIO-based interface for
>>> VDC I/O requests:
>>>
>>> https://github.com/oracle/linux-uek/commit/4b725eb64cc10a4877f2af75ff3a776586f68eb7
>>>
>>> Could you review these two changes and tell me whether these would
>>> actually implement the changes you would want to see? I think the BIO
>>> layer is a generic interface of the block layer in the kernel, isn't
>>> it?
>>
>> Moving lower down the stack to use a bio directly is not a good idea,
>> it's in fact going the opposite direction of what we'd like to see in
>> the storage stack. And it would then mean you'd need to implement your
>> own internal requeueing and retrying.
>
> I looked at the virtio_blk driver and that seems to confirm it. There is no
> use of the bio interface either, so I guess we should not pick up this
> patch.
I'd be very hesitant to pick anything up that hasn't been posted and
included upstream...
> What do you think about the out-of-order execution? Would that make sense
> to upstream it? Does it look reasonable?
I have no opinion on that, there's not even a description of why that
change makes any sense. Sorry but I'm not going to waste my time
reviewing out-of-tree code, it's just not a very useful thing to do. If
the changes get submitted upstream for review in a suitable fashion,
then they will get reviewed.
>> These are the kind of changes that happen when development is done and
>> changes aren't submitted upstream. It's unfortunate drift...
>
> Well, the problem here is that Oracle stopped working on Linux for SPARC
> abruptly, so many of their improvements were never sent upstream and did
> not see any reviews which would have caught this.
And to be frank, the changes you referenced also look pretty incomplete
and would not pass upstream review. I guess they are dead in the water
at this point, unless someone else picks them up and polishes them into
something that can be sent upstream for review.
>>>>>>> For now, I would propose to pick up my patch to revert the previous
>>>>>>> change. I can then pick up your proposed change and deploy it for
>>>>>>> extensive testing and see if it has any side effects.
>>>>>>
>>>>>> Why not just test this one and see if it works? As far as I can tell,
>>>>>> it's been 6.5 years since this change went in, I can't imagine there's a
>>>>>> huge sense of urgency to fix it up that can't wait for testing a more
>>>>>> proper patch rather than a work-around?
>>>>>
>>>>> Well, the thing is that a lot of people have been running older kernels
>>>>> on SPARC because of issues like these and I have started working on trying
>>>>> to track down all of these issues now [2] for users to be able to run a
>>>>> current kernel. So, the 6.5 years existence of this change shouldn't
>>>>> be an argument I think.
>>>>
>>>> While I agree that the bug is unfortunate, it's also a chance to
>>>> properly fix it rather than just go back to busy looping. How difficult
>>>> is it to test an iteration of the patch? It'd be annoying to queue a
>>>> bandaid only to have to revert that again for a real fix. If this was a
>>>> regression from the last release or two then that'd be a different
>>>> story, but the fact that this has persisted for 6.5 years and is only
>>>> bubbling back up to mainstream now would seem to indicate that we should
>>>> spend a bit of extra time to just get it right the first time.
>>>
>>> We could do that for sure. But I would like to hear your opinion on
>>> the changes contributed by Oracle engineers first. Maybe their
>>> improvements are much better so that it might make sense to try to
>>> upstream them.
>>
>> Won't help this case, and it's actively going the wrong direction
>> imho...
>
> OK, so your opinion is then to add the patch that you proposed on top
> of what's currently there in Linus' tree, meaning adding some code
> that will requeue requests once the retry limit has been reached?
Right, the patch I sent is against the normal upstream tree.
> Can you maybe post a proper patch then which I (and others) could test
> and then hopefully add their "Tested-by"?
Sure.
--
Jens Axboe