[PATCH for-10.0] scsi-disk: Apply error policy for host

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
@ 2025-04-07 15:59 Kevin Wolf
  2025-04-07 18:19 ` Stefan Hajnoczi
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Kevin Wolf @ 2025-04-07 15:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

Originally, all failed SG_IO requests called scsi_handle_rw_error() to
apply the configured error policy. However, commit f3126d65, which was
supposed to be a mere refactoring for scsi-disk.c, broke this and
accidentally completed the SCSI request without considering the error
policy any more if the error was signalled in the host_status field.

Apart from the commit message not describing the chance as intended,
errors indicated in host_status are also obviously backend errors and
not something the guest must deal with indepdently of the error policy.

This behaviour means that some recoverable errors (such as a path error
in multipath configurations) were reported to the guest anyway, which
might not expect it and might consider its disk broken.

Make sure that we apply the error policy again for host_status errors,
too. This addresses an existing FIXME comment and allows us to remove
some comments warning that callbacks weren't always called. With this
fix, they are called in all cases again.

The return value passed to the request callback doesn't have more free
values that could be used to indicate host_status errors as well as SAM
status codes and negative errno. Store the value in the host_status
field of the SCSIRequest instead and use -ENODEV as the return value (if
a path hasn't been reachable for a while, blk_aio_ioctl() will return
-ENODEV instead of just setting host_status, so just reuse it here -
it's not necessarily entirely accurate, but it's as good as any errno).

Cc: qemu-stable@nongnu.org
Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/scsi/scsi-disk.c | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 8da1d5a77c..e59632e9b1 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -68,10 +68,9 @@ struct SCSIDiskClass {
     SCSIDeviceClass parent_class;
     /*
      * Callbacks receive ret == 0 for success. Errors are represented either as
-     * negative errno values, or as positive SAM status codes.
-     *
-     * Beware: For errors returned in host_status, the function may directly
-     * complete the request and never call the callback.
+     * negative errno values, or as positive SAM status codes. For host_status
+     * errors, the function passes ret == -ENODEV and sets the host_status field
+     * of the SCSIRequest.
      */
     DMAIOFunc       *dma_readv;
     DMAIOFunc       *dma_writev;
@@ -225,11 +224,26 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
     SCSISense sense = SENSE_CODE(NO_SENSE);
+    int16_t host_status;
     int error;
     bool req_has_sense = false;
     BlockErrorAction action;
     int status;
 
+    /*
+     * host_status should only be set for SG_IO requests that came back with a
+     * host_status error in scsi_block_sgio_complete(). This error path passes
+     * -ENODEV as the return value.
+     *
+     * Reset host_status in the request because we may still want to complete
+     * the request successfully with the 'stop' or 'ignore' error policy.
+     */
+    host_status = r->req.host_status;
+    if (host_status != -1) {
+        assert(ret == -ENODEV);
+        r->req.host_status = -1;
+    }
+
     if (ret < 0) {
         status = scsi_sense_from_errno(-ret, &sense);
         error = -ret;
@@ -289,6 +303,10 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
         if (acct_failed) {
             block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
         }
+        if (host_status != -1) {
+            scsi_req_complete_failed(&r->req, host_status);
+            return true;
+        }
         if (req_has_sense) {
             sdc->update_sense(&r->req);
         } else if (status == CHECK_CONDITION) {
@@ -409,7 +427,6 @@ done:
     scsi_req_unref(&r->req);
 }
 
-/* May not be called in all error cases, don't rely on cleanup here */
 static void scsi_dma_complete(void *opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -448,7 +465,6 @@ done:
     scsi_req_unref(&r->req);
 }
 
-/* May not be called in all error cases, don't rely on cleanup here */
 static void scsi_read_complete(void *opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -585,7 +601,6 @@ done:
     scsi_req_unref(&r->req);
 }
 
-/* May not be called in all error cases, don't rely on cleanup here */
 static void scsi_write_complete(void * opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -2846,14 +2861,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
     sg_io_hdr_t *io_hdr = &req->io_header;
 
     if (ret == 0) {
-        /* FIXME This skips calling req->cb() and any cleanup in it */
         if (io_hdr->host_status != SCSI_HOST_OK) {
-            scsi_req_complete_failed(&r->req, io_hdr->host_status);
-            scsi_req_unref(&r->req);
-            return;
-        }
-
-        if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
+            r->req.host_status = io_hdr->host_status;
+            ret = -ENODEV;
+        } else if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
             ret = BUSY;
         } else {
             ret = io_hdr->status;
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
@ 2025-04-07 18:19 ` Stefan Hajnoczi
  2025-04-08 12:21 ` Hanna Czenczek
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2025-04-07 18:19 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

On Mon, Apr 7, 2025 at 12:00 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
>
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
>
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
>
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
>
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
>
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 39 +++++++++++++++++++++++++--------------
>  1 file changed, 25 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

>
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 8da1d5a77c..e59632e9b1 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -68,10 +68,9 @@ struct SCSIDiskClass {
>      SCSIDeviceClass parent_class;
>      /*
>       * Callbacks receive ret == 0 for success. Errors are represented either as
> -     * negative errno values, or as positive SAM status codes.
> -     *
> -     * Beware: For errors returned in host_status, the function may directly
> -     * complete the request and never call the callback.
> +     * negative errno values, or as positive SAM status codes. For host_status
> +     * errors, the function passes ret == -ENODEV and sets the host_status field
> +     * of the SCSIRequest.
>       */
>      DMAIOFunc       *dma_readv;
>      DMAIOFunc       *dma_writev;
> @@ -225,11 +224,26 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
>      SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
>      SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
>      SCSISense sense = SENSE_CODE(NO_SENSE);
> +    int16_t host_status;
>      int error;
>      bool req_has_sense = false;
>      BlockErrorAction action;
>      int status;
>
> +    /*
> +     * host_status should only be set for SG_IO requests that came back with a
> +     * host_status error in scsi_block_sgio_complete(). This error path passes
> +     * -ENODEV as the return value.
> +     *
> +     * Reset host_status in the request because we may still want to complete
> +     * the request successfully with the 'stop' or 'ignore' error policy.
> +     */
> +    host_status = r->req.host_status;
> +    if (host_status != -1) {
> +        assert(ret == -ENODEV);
> +        r->req.host_status = -1;
> +    }
> +
>      if (ret < 0) {
>          status = scsi_sense_from_errno(-ret, &sense);
>          error = -ret;
> @@ -289,6 +303,10 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
>          if (acct_failed) {
>              block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
>          }
> +        if (host_status != -1) {
> +            scsi_req_complete_failed(&r->req, host_status);
> +            return true;
> +        }
>          if (req_has_sense) {
>              sdc->update_sense(&r->req);
>          } else if (status == CHECK_CONDITION) {
> @@ -409,7 +427,6 @@ done:
>      scsi_req_unref(&r->req);
>  }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
>  static void scsi_dma_complete(void *opaque, int ret)
>  {
>      SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -448,7 +465,6 @@ done:
>      scsi_req_unref(&r->req);
>  }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
>  static void scsi_read_complete(void *opaque, int ret)
>  {
>      SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -585,7 +601,6 @@ done:
>      scsi_req_unref(&r->req);
>  }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
>  static void scsi_write_complete(void * opaque, int ret)
>  {
>      SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -2846,14 +2861,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
>      sg_io_hdr_t *io_hdr = &req->io_header;
>
>      if (ret == 0) {
> -        /* FIXME This skips calling req->cb() and any cleanup in it */
>          if (io_hdr->host_status != SCSI_HOST_OK) {
> -            scsi_req_complete_failed(&r->req, io_hdr->host_status);
> -            scsi_req_unref(&r->req);
> -            return;
> -        }
> -
> -        if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> +            r->req.host_status = io_hdr->host_status;
> +            ret = -ENODEV;
> +        } else if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
>              ret = BUSY;
>          } else {
>              ret = io_hdr->status;
> --
> 2.48.1
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
  2025-04-07 18:19 ` Stefan Hajnoczi
@ 2025-04-08 12:21 ` Hanna Czenczek
  2025-04-10 12:37 ` Michael Tokarev
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Hanna Czenczek @ 2025-04-08 12:21 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: pbonzini, stefanha, qemu-devel, qemu-stable

On 07.04.25 17:59, Kevin Wolf wrote:
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
>
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.

*independently

> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
>
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
>
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
>
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   hw/scsi/scsi-disk.c | 39 +++++++++++++++++++++++++--------------
>   1 file changed, 25 insertions(+), 14 deletions(-)

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
  2025-04-07 18:19 ` Stefan Hajnoczi
  2025-04-08 12:21 ` Hanna Czenczek
@ 2025-04-10 12:37 ` Michael Tokarev
  2025-04-10 13:14   ` Kevin Wolf
  2025-04-10 14:25 ` Paolo Bonzini
  2025-05-12  9:23 ` Michael Tokarev
  4 siblings, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2025-04-10 12:37 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block
  Cc: hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

07.04.2025 18:59, Kevin Wolf пишет:
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
> 
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
> 
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
> 
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
> 
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
> 
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')

Hi!

Does it make sense to apply this one for older stable qemu series?
In particular, in 8.2, we lack cfe0880835cd3
"scsi-disk: Use positive return value for status in dma_readv/writev",
which seems to be relevant here.  Or should I pick up cfe0880835cd3 too,
maybe together with 8a0495624f (a no-op, just to make this patch to apply
cleanly) and probably 9da6bd39f924?

Thanks,

/mjt




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-10 12:37 ` Michael Tokarev
@ 2025-04-10 13:14   ` Kevin Wolf
  2025-04-10 13:33     ` Michael Tokarev
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2025-04-10 13:14 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

Am 10.04.2025 um 14:37 hat Michael Tokarev geschrieben:
> 07.04.2025 18:59, Kevin Wolf пишет:
> > Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> > apply the configured error policy. However, commit f3126d65, which was
> > supposed to be a mere refactoring for scsi-disk.c, broke this and
> > accidentally completed the SCSI request without considering the error
> > policy any more if the error was signalled in the host_status field.
> > 
> > Apart from the commit message not describing the chance as intended,
> > errors indicated in host_status are also obviously backend errors and
> > not something the guest must deal with indepdently of the error policy.
> > 
> > This behaviour means that some recoverable errors (such as a path error
> > in multipath configurations) were reported to the guest anyway, which
> > might not expect it and might consider its disk broken.
> > 
> > Make sure that we apply the error policy again for host_status errors,
> > too. This addresses an existing FIXME comment and allows us to remove
> > some comments warning that callbacks weren't always called. With this
> > fix, they are called in all cases again.
> > 
> > The return value passed to the request callback doesn't have more free
> > values that could be used to indicate host_status errors as well as SAM
> > status codes and negative errno. Store the value in the host_status
> > field of the SCSIRequest instead and use -ENODEV as the return value (if
> > a path hasn't been reachable for a while, blk_aio_ioctl() will return
> > -ENODEV instead of just setting host_status, so just reuse it here -
> > it's not necessarily entirely accurate, but it's as good as any errno).
> > 
> > Cc: qemu-stable@nongnu.org
> > Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
> 
> Does it make sense to apply this one for older stable qemu series?
> In particular, in 8.2, we lack cfe0880835cd3
> "scsi-disk: Use positive return value for status in dma_readv/writev",
> which seems to be relevant here.  Or should I pick up cfe0880835cd3 too,
> maybe together with 8a0495624f (a no-op, just to make this patch to apply
> cleanly) and probably 9da6bd39f924?

Yes, I think it makes sense to pick all of them up (and 622a7016 in the
middle, too), they were part of one series:

https://patchew.org/QEMU/20240731123207.27636-1-kwolf@redhat.com/

And this patch builds on top of that series, so rebasing it correctly
might not be trivial without the previous series.

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-10 13:14   ` Kevin Wolf
@ 2025-04-10 13:33     ` Michael Tokarev
  2025-05-13 11:53       ` Kevin Wolf
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2025-04-10 13:33 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

10.04.2025 16:14, Kevin Wolf wrote:
> Am 10.04.2025 um 14:37 hat Michael Tokarev geschrieben:
...>> Does it make sense to apply this one for older stable qemu series?
>> In particular, in 8.2, we lack cfe0880835cd3
>> "scsi-disk: Use positive return value for status in dma_readv/writev",
>> which seems to be relevant here.  Or should I pick up cfe0880835cd3 too,
>> maybe together with 8a0495624f (a no-op, just to make this patch to apply
>> cleanly) and probably 9da6bd39f924?
> 
> Yes, I think it makes sense to pick all of them up (and 622a7016 in the
> middle, too), they were part of one series:
> 
> https://patchew.org/QEMU/20240731123207.27636-1-kwolf@redhat.com/
> 
> And this patch builds on top of that series, so rebasing it correctly
> might not be trivial without the previous series.

A (most likely small) issue here: 622a70161a "scsi-block: Don't skip
callback for sgio error status/driver_status" is on top of an earlier
commit, 1404226804 "scsi: don't lock AioContext in I/O code path",
but does not actually *require* it, since it removes whole code block
where a locking has been removed earlier by 1404226804.

Also the next comment commit, 8a0495624f "scsi-disk: Add warning comments
that host_status errors take a shortcut", clashes with e7fc3c4a8cc "scsi:
remove outdated AioContext lock comment".

This seems a bit too fragile for 8.2, don't you think?  And I haven't even
tried to check 7.2 yet :)

https://gitlab.com/mjt0k/qemu/-/tree/staging-8.2 for the current result
(not yet tested) - I dislike my comment handling at scsi_dma_complete().

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-10 13:33     ` Michael Tokarev
@ 2025-05-13 11:53       ` Kevin Wolf
  0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2025-05-13 11:53 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

Am 10.04.2025 um 15:33 hat Michael Tokarev geschrieben:
> 10.04.2025 16:14, Kevin Wolf wrote:
> > Am 10.04.2025 um 14:37 hat Michael Tokarev geschrieben:
> ...>> Does it make sense to apply this one for older stable qemu series?
> > > In particular, in 8.2, we lack cfe0880835cd3
> > > "scsi-disk: Use positive return value for status in dma_readv/writev",
> > > which seems to be relevant here.  Or should I pick up cfe0880835cd3 too,
> > > maybe together with 8a0495624f (a no-op, just to make this patch to apply
> > > cleanly) and probably 9da6bd39f924?
> > 
> > Yes, I think it makes sense to pick all of them up (and 622a7016 in the
> > middle, too), they were part of one series:
> > 
> > https://patchew.org/QEMU/20240731123207.27636-1-kwolf@redhat.com/
> > 
> > And this patch builds on top of that series, so rebasing it correctly
> > might not be trivial without the previous series.
> 
> A (most likely small) issue here: 622a70161a "scsi-block: Don't skip
> callback for sgio error status/driver_status" is on top of an earlier
> commit, 1404226804 "scsi: don't lock AioContext in I/O code path",
> but does not actually *require* it, since it removes whole code block
> where a locking has been removed earlier by 1404226804.

Right, a backport should just remove the locking with it. That seems to
be a trivial conflict resolution.

> Also the next comment commit, 8a0495624f "scsi-disk: Add warning comments
> that host_status errors take a shortcut", clashes with e7fc3c4a8cc "scsi:
> remove outdated AioContext lock comment".

It's only comments anyway, but the correct conflict resolution here is
to keep both comments.

> This seems a bit too fragile for 8.2, don't you think?  And I haven't even
> tried to check 7.2 yet :)

Both of the merge conflicts you mentioned above don't really seem like
problems to me. Did you find any semantic merge conflicts? That's when I
would start to be concerned about fragility.

> https://gitlab.com/mjt0k/qemu/-/tree/staging-8.2 for the current result
> (not yet tested) - I dislike my comment handling at scsi_dma_complete().

Seems you already resolved the conflicts the same was as I suggested
above.

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
                   ` (2 preceding siblings ...)
  2025-04-10 12:37 ` Michael Tokarev
@ 2025-04-10 14:25 ` Paolo Bonzini
  2025-04-10 15:28   ` Paolo Bonzini
  2025-05-12  9:23 ` Michael Tokarev
  4 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2025-04-10 14:25 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: hreitz, stefanha, qemu-devel, qemu-stable

On 4/7/25 17:59, Kevin Wolf wrote:
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
> 
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
> 
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
> 
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
> 
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
> 
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   hw/scsi/scsi-disk.c | 39 +++++++++++++++++++++++++--------------
>   1 file changed, 25 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 8da1d5a77c..e59632e9b1 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -68,10 +68,9 @@ struct SCSIDiskClass {
>       SCSIDeviceClass parent_class;
>       /*
>        * Callbacks receive ret == 0 for success. Errors are represented either as
> -     * negative errno values, or as positive SAM status codes.
> -     *
> -     * Beware: For errors returned in host_status, the function may directly
> -     * complete the request and never call the callback.
> +     * negative errno values, or as positive SAM status codes. For host_status
> +     * errors, the function passes ret == -ENODEV and sets the host_status field
> +     * of the SCSIRequest.
>        */
>       DMAIOFunc       *dma_readv;
>       DMAIOFunc       *dma_writev;
> @@ -225,11 +224,26 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
>       SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
>       SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
>       SCSISense sense = SENSE_CODE(NO_SENSE);
> +    int16_t host_status;
>       int error;
>       bool req_has_sense = false;
>       BlockErrorAction action;
>       int status;
>   
> +    /*
> +     * host_status should only be set for SG_IO requests that came back with a
> +     * host_status error in scsi_block_sgio_complete(). This error path passes
> +     * -ENODEV as the return value.
> +     *
> +     * Reset host_status in the request because we may still want to complete
> +     * the request successfully with the 'stop' or 'ignore' error policy.
> +     */
> +    host_status = r->req.host_status;
> +    if (host_status != -1) {
> +        assert(ret == -ENODEV);
> +        r->req.host_status = -1;

You should set ret = 0 here to avoid going down the 
scsi_sense_from_errno() path.

Otherwise,

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

> +    }
> +
>       if (ret < 0) {
>           status = scsi_sense_from_errno(-ret, &sense);
>           error = -ret;
> @@ -289,6 +303,10 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
>           if (acct_failed) {
>               block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
>           }
> +        if (host_status != -1) {
> +            scsi_req_complete_failed(&r->req, host_status);
> +            return true;
> +        }
>           if (req_has_sense) {
>               sdc->update_sense(&r->req);
>           } else if (status == CHECK_CONDITION) {
> @@ -409,7 +427,6 @@ done:
>       scsi_req_unref(&r->req);
>   }
>   
> -/* May not be called in all error cases, don't rely on cleanup here */
>   static void scsi_dma_complete(void *opaque, int ret)
>   {
>       SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -448,7 +465,6 @@ done:
>       scsi_req_unref(&r->req);
>   }
>   
> -/* May not be called in all error cases, don't rely on cleanup here */
>   static void scsi_read_complete(void *opaque, int ret)
>   {
>       SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -585,7 +601,6 @@ done:
>       scsi_req_unref(&r->req);
>   }
>   
> -/* May not be called in all error cases, don't rely on cleanup here */
>   static void scsi_write_complete(void * opaque, int ret)
>   {
>       SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -2846,14 +2861,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
>       sg_io_hdr_t *io_hdr = &req->io_header;
>   
>       if (ret == 0) {
> -        /* FIXME This skips calling req->cb() and any cleanup in it */
>           if (io_hdr->host_status != SCSI_HOST_OK) {
> -            scsi_req_complete_failed(&r->req, io_hdr->host_status);
> -            scsi_req_unref(&r->req);
> -            return;
> -        }
> -
> -        if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> +            r->req.host_status = io_hdr->host_status;
> +            ret = -ENODEV;
> +        } else if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
>               ret = BUSY;
>           } else {
>               ret = io_hdr->status;



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-10 14:25 ` Paolo Bonzini
@ 2025-04-10 15:28   ` Paolo Bonzini
  2025-04-11 10:18     ` Kevin Wolf
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2025-04-10 15:28 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: hreitz, stefanha, qemu-devel, qemu-stable

On Thu, Apr 10, 2025 at 4:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> You should set ret = 0 here to avoid going down the
> scsi_sense_from_errno() path.
>
> Otherwise,
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Okay, going down the scsi_sense_from_errno() path is more or less
harmless because status and sense end up unused; even though ENODEV is
not something that the function handles, that can be added as a
cleanup in 10.1.

Paolo

> > +    }
> > +
> >       if (ret < 0) {
> >           status = scsi_sense_from_errno(-ret, &sense);
> >           error = -ret;
> > @@ -289,6 +303,10 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
> >           if (acct_failed) {
> >               block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
> >           }
> > +        if (host_status != -1) {
> > +            scsi_req_complete_failed(&r->req, host_status);
> > +            return true;
> > +        }
> >           if (req_has_sense) {
> >               sdc->update_sense(&r->req);
> >           } else if (status == CHECK_CONDITION) {
> > @@ -409,7 +427,6 @@ done:
> >       scsi_req_unref(&r->req);
> >   }
> >
> > -/* May not be called in all error cases, don't rely on cleanup here */
> >   static void scsi_dma_complete(void *opaque, int ret)
> >   {
> >       SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> > @@ -448,7 +465,6 @@ done:
> >       scsi_req_unref(&r->req);
> >   }
> >
> > -/* May not be called in all error cases, don't rely on cleanup here */
> >   static void scsi_read_complete(void *opaque, int ret)
> >   {
> >       SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> > @@ -585,7 +601,6 @@ done:
> >       scsi_req_unref(&r->req);
> >   }
> >
> > -/* May not be called in all error cases, don't rely on cleanup here */
> >   static void scsi_write_complete(void * opaque, int ret)
> >   {
> >       SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> > @@ -2846,14 +2861,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
> >       sg_io_hdr_t *io_hdr = &req->io_header;
> >
> >       if (ret == 0) {
> > -        /* FIXME This skips calling req->cb() and any cleanup in it */
> >           if (io_hdr->host_status != SCSI_HOST_OK) {
> > -            scsi_req_complete_failed(&r->req, io_hdr->host_status);
> > -            scsi_req_unref(&r->req);
> > -            return;
> > -        }
> > -
> > -        if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> > +            r->req.host_status = io_hdr->host_status;
> > +            ret = -ENODEV;
> > +        } else if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> >               ret = BUSY;
> >           } else {
> >               ret = io_hdr->status;
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-10 15:28   ` Paolo Bonzini
@ 2025-04-11 10:18     ` Kevin Wolf
  2025-04-11 10:20       ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2025-04-11 10:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, hreitz, stefanha, qemu-devel, qemu-stable

Am 10.04.2025 um 17:28 hat Paolo Bonzini geschrieben:
> On Thu, Apr 10, 2025 at 4:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > You should set ret = 0 here to avoid going down the
> > scsi_sense_from_errno() path.
> >
> > Otherwise,
> >
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> Okay, going down the scsi_sense_from_errno() path is more or less
> harmless because status and sense end up unused; even though ENODEV is
> not something that the function handles, that can be added as a
> cleanup in 10.1.

Yes, it could be handled more explicitly. I considered adding a special
if branch in scsi_handle_rw_error() for host_status != -1 before
checking ret < 0, but didn't do it in the end because the existing code
already handles it fine. If you prefer it to be there for readability, I
can send a cleanup patch.

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-11 10:18     ` Kevin Wolf
@ 2025-04-11 10:20       ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2025-04-11 10:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, hreitz, stefanha, qemu-devel, qemu-stable

On Fri, Apr 11, 2025 at 12:18 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > Okay, going down the scsi_sense_from_errno() path is more or less
> > harmless because status and sense end up unused; even though ENODEV is
> > not something that the function handles, that can be added as a
> > cleanup in 10.1.
>
> Yes, it could be handled more explicitly. I considered adding a special
> if branch in scsi_handle_rw_error() for host_status != -1 before
> checking ret < 0, but didn't do it in the end because the existing code
> already handles it fine. If you prefer it to be there for readability, I
> can send a cleanup patch.

Don't worry, I tried when I thought it was a bug but came to the same
conclusion.  I have sent a patch to handle ENODEV, which makes the
code a bit less mysterious, but that's it.

Paolo



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
                   ` (3 preceding siblings ...)
  2025-04-10 14:25 ` Paolo Bonzini
@ 2025-05-12  9:23 ` Michael Tokarev
  2025-05-13 11:42   ` Kevin Wolf
  4 siblings, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2025-05-12  9:23 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block
  Cc: hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

On 07.04.2025 18:59, Kevin Wolf wrote:
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
> 
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
> 
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
> 
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
> 
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
> 
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')

Ping? Is this change still needed for 10.0 (or for master)?

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-05-12  9:23 ` Michael Tokarev
@ 2025-05-13 11:42   ` Kevin Wolf
  2025-05-13 11:46     ` Michael Tokarev
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2025-05-13 11:42 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

Am 12.05.2025 um 11:23 hat Michael Tokarev geschrieben:
> On 07.04.2025 18:59, Kevin Wolf wrote:
> > Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> > apply the configured error policy. However, commit f3126d65, which was
> > supposed to be a mere refactoring for scsi-disk.c, broke this and
> > accidentally completed the SCSI request without considering the error
> > policy any more if the error was signalled in the host_status field.
> > 
> > Apart from the commit message not describing the chance as intended,
> > errors indicated in host_status are also obviously backend errors and
> > not something the guest must deal with indepdently of the error policy.
> > 
> > This behaviour means that some recoverable errors (such as a path error
> > in multipath configurations) were reported to the guest anyway, which
> > might not expect it and might consider its disk broken.
> > 
> > Make sure that we apply the error policy again for host_status errors,
> > too. This addresses an existing FIXME comment and allows us to remove
> > some comments warning that callbacks weren't always called. With this
> > fix, they are called in all cases again.
> > 
> > The return value passed to the request callback doesn't have more free
> > values that could be used to indicate host_status errors as well as SAM
> > status codes and negative errno. Store the value in the host_status
> > field of the SCSIRequest instead and use -ENODEV as the return value (if
> > a path hasn't been reachable for a while, blk_aio_ioctl() will return
> > -ENODEV instead of just setting host_status, so just reuse it here -
> > it's not necessarily entirely accurate, but it's as good as any errno).
> > 
> > Cc: qemu-stable@nongnu.org
> > Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
> 
> Ping? Is this change still needed for 10.0 (or for master)?

This was already merged for 10.0 as commit 61b6d9b7.

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
  2025-05-13 11:42   ` Kevin Wolf
@ 2025-05-13 11:46     ` Michael Tokarev
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Tokarev @ 2025-05-13 11:46 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable

On 13.05.2025 14:42, Kevin Wolf wrote:
> Am 12.05.2025 um 11:23 hat Michael Tokarev geschrieben:
[..]
> This was already merged for 10.0 as commit 61b6d9b7.

Heck. Yes, it is, and I already queued it for 9.2, too.

Sorry for the noise.

/mjt


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-05-13 11:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
2025-04-07 18:19 ` Stefan Hajnoczi
2025-04-08 12:21 ` Hanna Czenczek
2025-04-10 12:37 ` Michael Tokarev
2025-04-10 13:14   ` Kevin Wolf
2025-04-10 13:33     ` Michael Tokarev
2025-05-13 11:53       ` Kevin Wolf
2025-04-10 14:25 ` Paolo Bonzini
2025-04-10 15:28   ` Paolo Bonzini
2025-04-11 10:18     ` Kevin Wolf
2025-04-11 10:20       ` Paolo Bonzini
2025-05-12  9:23 ` Michael Tokarev
2025-05-13 11:42   ` Kevin Wolf
2025-05-13 11:46     ` Michael Tokarev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.