* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
@ 2025-04-07 18:19 ` Stefan Hajnoczi
2025-04-08 12:21 ` Hanna Czenczek
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2025-04-07 18:19 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable
On Mon, Apr 7, 2025 at 12:00 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
>
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
>
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
>
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
>
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
>
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> hw/scsi/scsi-disk.c | 39 +++++++++++++++++++++++++--------------
> 1 file changed, 25 insertions(+), 14 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 8da1d5a77c..e59632e9b1 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -68,10 +68,9 @@ struct SCSIDiskClass {
> SCSIDeviceClass parent_class;
> /*
> * Callbacks receive ret == 0 for success. Errors are represented either as
> - * negative errno values, or as positive SAM status codes.
> - *
> - * Beware: For errors returned in host_status, the function may directly
> - * complete the request and never call the callback.
> + * negative errno values, or as positive SAM status codes. For host_status
> + * errors, the function passes ret == -ENODEV and sets the host_status field
> + * of the SCSIRequest.
> */
> DMAIOFunc *dma_readv;
> DMAIOFunc *dma_writev;
> @@ -225,11 +224,26 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
> SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
> SCSISense sense = SENSE_CODE(NO_SENSE);
> + int16_t host_status;
> int error;
> bool req_has_sense = false;
> BlockErrorAction action;
> int status;
>
> + /*
> + * host_status should only be set for SG_IO requests that came back with a
> + * host_status error in scsi_block_sgio_complete(). This error path passes
> + * -ENODEV as the return value.
> + *
> + * Reset host_status in the request because we may still want to complete
> + * the request successfully with the 'stop' or 'ignore' error policy.
> + */
> + host_status = r->req.host_status;
> + if (host_status != -1) {
> + assert(ret == -ENODEV);
> + r->req.host_status = -1;
> + }
> +
> if (ret < 0) {
> status = scsi_sense_from_errno(-ret, &sense);
> error = -ret;
> @@ -289,6 +303,10 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
> if (acct_failed) {
> block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
> }
> + if (host_status != -1) {
> + scsi_req_complete_failed(&r->req, host_status);
> + return true;
> + }
> if (req_has_sense) {
> sdc->update_sense(&r->req);
> } else if (status == CHECK_CONDITION) {
> @@ -409,7 +427,6 @@ done:
> scsi_req_unref(&r->req);
> }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
> static void scsi_dma_complete(void *opaque, int ret)
> {
> SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -448,7 +465,6 @@ done:
> scsi_req_unref(&r->req);
> }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
> static void scsi_read_complete(void *opaque, int ret)
> {
> SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -585,7 +601,6 @@ done:
> scsi_req_unref(&r->req);
> }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
> static void scsi_write_complete(void * opaque, int ret)
> {
> SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -2846,14 +2861,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
> sg_io_hdr_t *io_hdr = &req->io_header;
>
> if (ret == 0) {
> - /* FIXME This skips calling req->cb() and any cleanup in it */
> if (io_hdr->host_status != SCSI_HOST_OK) {
> - scsi_req_complete_failed(&r->req, io_hdr->host_status);
> - scsi_req_unref(&r->req);
> - return;
> - }
> -
> - if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> + r->req.host_status = io_hdr->host_status;
> + ret = -ENODEV;
> + } else if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> ret = BUSY;
> } else {
> ret = io_hdr->status;
> --
> 2.48.1
>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
2025-04-07 18:19 ` Stefan Hajnoczi
@ 2025-04-08 12:21 ` Hanna Czenczek
2025-04-10 12:37 ` Michael Tokarev
` (2 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Hanna Czenczek @ 2025-04-08 12:21 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: pbonzini, stefanha, qemu-devel, qemu-stable
On 07.04.25 17:59, Kevin Wolf wrote:
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
>
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
*independently
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
>
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
>
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
>
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> hw/scsi/scsi-disk.c | 39 +++++++++++++++++++++++++--------------
> 1 file changed, 25 insertions(+), 14 deletions(-)
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
2025-04-07 18:19 ` Stefan Hajnoczi
2025-04-08 12:21 ` Hanna Czenczek
@ 2025-04-10 12:37 ` Michael Tokarev
2025-04-10 13:14 ` Kevin Wolf
2025-04-10 14:25 ` Paolo Bonzini
2025-05-12 9:23 ` Michael Tokarev
4 siblings, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2025-04-10 12:37 UTC (permalink / raw)
To: Kevin Wolf, qemu-block
Cc: hreitz, pbonzini, stefanha, qemu-devel, qemu-stable
07.04.2025 18:59, Kevin Wolf пишет:
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
>
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
>
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
>
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
>
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
>
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
Hi!
Does it make sense to apply this one for older stable qemu series?
In particular, in 8.2, we lack cfe0880835cd3
"scsi-disk: Use positive return value for status in dma_readv/writev",
which seems to be relevant here. Or should I pick up cfe0880835cd3 too,
maybe together with 8a0495624f (a no-op, just to make this patch to apply
cleanly) and probably 9da6bd39f924?
Thanks,
/mjt
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-10 12:37 ` Michael Tokarev
@ 2025-04-10 13:14 ` Kevin Wolf
2025-04-10 13:33 ` Michael Tokarev
0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2025-04-10 13:14 UTC (permalink / raw)
To: Michael Tokarev
Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable
Am 10.04.2025 um 14:37 hat Michael Tokarev geschrieben:
> 07.04.2025 18:59, Kevin Wolf пишет:
> > Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> > apply the configured error policy. However, commit f3126d65, which was
> > supposed to be a mere refactoring for scsi-disk.c, broke this and
> > accidentally completed the SCSI request without considering the error
> > policy any more if the error was signalled in the host_status field.
> >
> > Apart from the commit message not describing the chance as intended,
> > errors indicated in host_status are also obviously backend errors and
> > not something the guest must deal with indepdently of the error policy.
> >
> > This behaviour means that some recoverable errors (such as a path error
> > in multipath configurations) were reported to the guest anyway, which
> > might not expect it and might consider its disk broken.
> >
> > Make sure that we apply the error policy again for host_status errors,
> > too. This addresses an existing FIXME comment and allows us to remove
> > some comments warning that callbacks weren't always called. With this
> > fix, they are called in all cases again.
> >
> > The return value passed to the request callback doesn't have more free
> > values that could be used to indicate host_status errors as well as SAM
> > status codes and negative errno. Store the value in the host_status
> > field of the SCSIRequest instead and use -ENODEV as the return value (if
> > a path hasn't been reachable for a while, blk_aio_ioctl() will return
> > -ENODEV instead of just setting host_status, so just reuse it here -
> > it's not necessarily entirely accurate, but it's as good as any errno).
> >
> > Cc: qemu-stable@nongnu.org
> > Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
>
> Does it make sense to apply this one for older stable qemu series?
> In particular, in 8.2, we lack cfe0880835cd3
> "scsi-disk: Use positive return value for status in dma_readv/writev",
> which seems to be relevant here. Or should I pick up cfe0880835cd3 too,
> maybe together with 8a0495624f (a no-op, just to make this patch to apply
> cleanly) and probably 9da6bd39f924?
Yes, I think it makes sense to pick all of them up (and 622a7016 in the
middle, too), they were part of one series:
https://patchew.org/QEMU/20240731123207.27636-1-kwolf@redhat.com/
And this patch builds on top of that series, so rebasing it correctly
might not be trivial without the previous series.
Kevin
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-10 13:14 ` Kevin Wolf
@ 2025-04-10 13:33 ` Michael Tokarev
2025-05-13 11:53 ` Kevin Wolf
0 siblings, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2025-04-10 13:33 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable
10.04.2025 16:14, Kevin Wolf wrote:
> Am 10.04.2025 um 14:37 hat Michael Tokarev geschrieben:
...>> Does it make sense to apply this one for older stable qemu series?
>> In particular, in 8.2, we lack cfe0880835cd3
>> "scsi-disk: Use positive return value for status in dma_readv/writev",
>> which seems to be relevant here. Or should I pick up cfe0880835cd3 too,
>> maybe together with 8a0495624f (a no-op, just to make this patch to apply
>> cleanly) and probably 9da6bd39f924?
>
> Yes, I think it makes sense to pick all of them up (and 622a7016 in the
> middle, too), they were part of one series:
>
> https://patchew.org/QEMU/20240731123207.27636-1-kwolf@redhat.com/
>
> And this patch builds on top of that series, so rebasing it correctly
> might not be trivial without the previous series.
A (most likely small) issue here: 622a70161a "scsi-block: Don't skip
callback for sgio error status/driver_status" is on top of an earlier
commit, 1404226804 "scsi: don't lock AioContext in I/O code path",
but does not actually *require* it, since it removes whole code block
where a locking has been removed earlier by 1404226804.
Also the next comment commit, 8a0495624f "scsi-disk: Add warning comments
that host_status errors take a shortcut", clashes with e7fc3c4a8cc "scsi:
remove outdated AioContext lock comment".
This seems a bit too fragile for 8.2, don't you think? And I haven't even
tried to check 7.2 yet :)
https://gitlab.com/mjt0k/qemu/-/tree/staging-8.2 for the current result
(not yet tested) - I dislike my comment handling at scsi_dma_complete().
Thanks,
/mjt
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-10 13:33 ` Michael Tokarev
@ 2025-05-13 11:53 ` Kevin Wolf
0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2025-05-13 11:53 UTC (permalink / raw)
To: Michael Tokarev
Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable
Am 10.04.2025 um 15:33 hat Michael Tokarev geschrieben:
> 10.04.2025 16:14, Kevin Wolf wrote:
> > Am 10.04.2025 um 14:37 hat Michael Tokarev geschrieben:
> ...>> Does it make sense to apply this one for older stable qemu series?
> > > In particular, in 8.2, we lack cfe0880835cd3
> > > "scsi-disk: Use positive return value for status in dma_readv/writev",
> > > which seems to be relevant here. Or should I pick up cfe0880835cd3 too,
> > > maybe together with 8a0495624f (a no-op, just to make this patch to apply
> > > cleanly) and probably 9da6bd39f924?
> >
> > Yes, I think it makes sense to pick all of them up (and 622a7016 in the
> > middle, too), they were part of one series:
> >
> > https://patchew.org/QEMU/20240731123207.27636-1-kwolf@redhat.com/
> >
> > And this patch builds on top of that series, so rebasing it correctly
> > might not be trivial without the previous series.
>
> A (most likely small) issue here: 622a70161a "scsi-block: Don't skip
> callback for sgio error status/driver_status" is on top of an earlier
> commit, 1404226804 "scsi: don't lock AioContext in I/O code path",
> but does not actually *require* it, since it removes whole code block
> where a locking has been removed earlier by 1404226804.
Right, a backport should just remove the locking with it. That seems to
be a trivial conflict resolution.
> Also the next comment commit, 8a0495624f "scsi-disk: Add warning comments
> that host_status errors take a shortcut", clashes with e7fc3c4a8cc "scsi:
> remove outdated AioContext lock comment".
It's only comments anyway, but the correct conflict resolution here is
to keep both comments.
> This seems a bit too fragile for 8.2, don't you think? And I haven't even
> tried to check 7.2 yet :)
Both of the merge conflicts you mentioned above don't really seem like
problems to me. Did you find any semantic merge conflicts? That's when I
would start to be concerned about fragility.
> https://gitlab.com/mjt0k/qemu/-/tree/staging-8.2 for the current result
> (not yet tested) - I dislike my comment handling at scsi_dma_complete().
Seems you already resolved the conflicts the same was as I suggested
above.
Kevin
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
` (2 preceding siblings ...)
2025-04-10 12:37 ` Michael Tokarev
@ 2025-04-10 14:25 ` Paolo Bonzini
2025-04-10 15:28 ` Paolo Bonzini
2025-05-12 9:23 ` Michael Tokarev
4 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2025-04-10 14:25 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: hreitz, stefanha, qemu-devel, qemu-stable
On 4/7/25 17:59, Kevin Wolf wrote:
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
>
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
>
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
>
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
>
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
>
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> hw/scsi/scsi-disk.c | 39 +++++++++++++++++++++++++--------------
> 1 file changed, 25 insertions(+), 14 deletions(-)
>
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 8da1d5a77c..e59632e9b1 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -68,10 +68,9 @@ struct SCSIDiskClass {
> SCSIDeviceClass parent_class;
> /*
> * Callbacks receive ret == 0 for success. Errors are represented either as
> - * negative errno values, or as positive SAM status codes.
> - *
> - * Beware: For errors returned in host_status, the function may directly
> - * complete the request and never call the callback.
> + * negative errno values, or as positive SAM status codes. For host_status
> + * errors, the function passes ret == -ENODEV and sets the host_status field
> + * of the SCSIRequest.
> */
> DMAIOFunc *dma_readv;
> DMAIOFunc *dma_writev;
> @@ -225,11 +224,26 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
> SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
> SCSISense sense = SENSE_CODE(NO_SENSE);
> + int16_t host_status;
> int error;
> bool req_has_sense = false;
> BlockErrorAction action;
> int status;
>
> + /*
> + * host_status should only be set for SG_IO requests that came back with a
> + * host_status error in scsi_block_sgio_complete(). This error path passes
> + * -ENODEV as the return value.
> + *
> + * Reset host_status in the request because we may still want to complete
> + * the request successfully with the 'stop' or 'ignore' error policy.
> + */
> + host_status = r->req.host_status;
> + if (host_status != -1) {
> + assert(ret == -ENODEV);
> + r->req.host_status = -1;
You should set ret = 0 here to avoid going down the
scsi_sense_from_errno() path.
Otherwise,
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> + }
> +
> if (ret < 0) {
> status = scsi_sense_from_errno(-ret, &sense);
> error = -ret;
> @@ -289,6 +303,10 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
> if (acct_failed) {
> block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
> }
> + if (host_status != -1) {
> + scsi_req_complete_failed(&r->req, host_status);
> + return true;
> + }
> if (req_has_sense) {
> sdc->update_sense(&r->req);
> } else if (status == CHECK_CONDITION) {
> @@ -409,7 +427,6 @@ done:
> scsi_req_unref(&r->req);
> }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
> static void scsi_dma_complete(void *opaque, int ret)
> {
> SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -448,7 +465,6 @@ done:
> scsi_req_unref(&r->req);
> }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
> static void scsi_read_complete(void *opaque, int ret)
> {
> SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -585,7 +601,6 @@ done:
> scsi_req_unref(&r->req);
> }
>
> -/* May not be called in all error cases, don't rely on cleanup here */
> static void scsi_write_complete(void * opaque, int ret)
> {
> SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> @@ -2846,14 +2861,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
> sg_io_hdr_t *io_hdr = &req->io_header;
>
> if (ret == 0) {
> - /* FIXME This skips calling req->cb() and any cleanup in it */
> if (io_hdr->host_status != SCSI_HOST_OK) {
> - scsi_req_complete_failed(&r->req, io_hdr->host_status);
> - scsi_req_unref(&r->req);
> - return;
> - }
> -
> - if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> + r->req.host_status = io_hdr->host_status;
> + ret = -ENODEV;
> + } else if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> ret = BUSY;
> } else {
> ret = io_hdr->status;
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-10 14:25 ` Paolo Bonzini
@ 2025-04-10 15:28 ` Paolo Bonzini
2025-04-11 10:18 ` Kevin Wolf
0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2025-04-10 15:28 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: hreitz, stefanha, qemu-devel, qemu-stable
On Thu, Apr 10, 2025 at 4:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> You should set ret = 0 here to avoid going down the
> scsi_sense_from_errno() path.
>
> Otherwise,
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Okay, going down the scsi_sense_from_errno() path is more or less
harmless because status and sense end up unused; even though ENODEV is
not something that the function handles, that can be added as a
cleanup in 10.1.
Paolo
> > + }
> > +
> > if (ret < 0) {
> > status = scsi_sense_from_errno(-ret, &sense);
> > error = -ret;
> > @@ -289,6 +303,10 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed)
> > if (acct_failed) {
> > block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
> > }
> > + if (host_status != -1) {
> > + scsi_req_complete_failed(&r->req, host_status);
> > + return true;
> > + }
> > if (req_has_sense) {
> > sdc->update_sense(&r->req);
> > } else if (status == CHECK_CONDITION) {
> > @@ -409,7 +427,6 @@ done:
> > scsi_req_unref(&r->req);
> > }
> >
> > -/* May not be called in all error cases, don't rely on cleanup here */
> > static void scsi_dma_complete(void *opaque, int ret)
> > {
> > SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> > @@ -448,7 +465,6 @@ done:
> > scsi_req_unref(&r->req);
> > }
> >
> > -/* May not be called in all error cases, don't rely on cleanup here */
> > static void scsi_read_complete(void *opaque, int ret)
> > {
> > SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> > @@ -585,7 +601,6 @@ done:
> > scsi_req_unref(&r->req);
> > }
> >
> > -/* May not be called in all error cases, don't rely on cleanup here */
> > static void scsi_write_complete(void * opaque, int ret)
> > {
> > SCSIDiskReq *r = (SCSIDiskReq *)opaque;
> > @@ -2846,14 +2861,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
> > sg_io_hdr_t *io_hdr = &req->io_header;
> >
> > if (ret == 0) {
> > - /* FIXME This skips calling req->cb() and any cleanup in it */
> > if (io_hdr->host_status != SCSI_HOST_OK) {
> > - scsi_req_complete_failed(&r->req, io_hdr->host_status);
> > - scsi_req_unref(&r->req);
> > - return;
> > - }
> > -
> > - if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> > + r->req.host_status = io_hdr->host_status;
> > + ret = -ENODEV;
> > + } else if (io_hdr->driver_status & SG_ERR_DRIVER_TIMEOUT) {
> > ret = BUSY;
> > } else {
> > ret = io_hdr->status;
>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-10 15:28 ` Paolo Bonzini
@ 2025-04-11 10:18 ` Kevin Wolf
2025-04-11 10:20 ` Paolo Bonzini
0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2025-04-11 10:18 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-block, hreitz, stefanha, qemu-devel, qemu-stable
Am 10.04.2025 um 17:28 hat Paolo Bonzini geschrieben:
> On Thu, Apr 10, 2025 at 4:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > You should set ret = 0 here to avoid going down the
> > scsi_sense_from_errno() path.
> >
> > Otherwise,
> >
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Okay, going down the scsi_sense_from_errno() path is more or less
> harmless because status and sense end up unused; even though ENODEV is
> not something that the function handles, that can be added as a
> cleanup in 10.1.
Yes, it could be handled more explicitly. I considered adding a special
if branch in scsi_handle_rw_error() for host_status != -1 before
checking ret < 0, but didn't do it in the end because the existing code
already handles it fine. If you prefer it to be there for readability, I
can send a cleanup patch.
Kevin
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-11 10:18 ` Kevin Wolf
@ 2025-04-11 10:20 ` Paolo Bonzini
0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2025-04-11 10:20 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, hreitz, stefanha, qemu-devel, qemu-stable
On Fri, Apr 11, 2025 at 12:18 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > Okay, going down the scsi_sense_from_errno() path is more or less
> > harmless because status and sense end up unused; even though ENODEV is
> > not something that the function handles, that can be added as a
> > cleanup in 10.1.
>
> Yes, it could be handled more explicitly. I considered adding a special
> if branch in scsi_handle_rw_error() for host_status != -1 before
> checking ret < 0, but didn't do it in the end because the existing code
> already handles it fine. If you prefer it to be there for readability, I
> can send a cleanup patch.
Don't worry, I tried when I thought it was a bug but came to the same
conclusion. I have sent a patch to handle ENODEV, which makes the
code a bit less mysterious, but that's it.
Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-04-07 15:59 [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again Kevin Wolf
` (3 preceding siblings ...)
2025-04-10 14:25 ` Paolo Bonzini
@ 2025-05-12 9:23 ` Michael Tokarev
2025-05-13 11:42 ` Kevin Wolf
4 siblings, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2025-05-12 9:23 UTC (permalink / raw)
To: Kevin Wolf, qemu-block
Cc: hreitz, pbonzini, stefanha, qemu-devel, qemu-stable
On 07.04.2025 18:59, Kevin Wolf wrote:
> Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> apply the configured error policy. However, commit f3126d65, which was
> supposed to be a mere refactoring for scsi-disk.c, broke this and
> accidentally completed the SCSI request without considering the error
> policy any more if the error was signalled in the host_status field.
>
> Apart from the commit message not describing the chance as intended,
> errors indicated in host_status are also obviously backend errors and
> not something the guest must deal with indepdently of the error policy.
>
> This behaviour means that some recoverable errors (such as a path error
> in multipath configurations) were reported to the guest anyway, which
> might not expect it and might consider its disk broken.
>
> Make sure that we apply the error policy again for host_status errors,
> too. This addresses an existing FIXME comment and allows us to remove
> some comments warning that callbacks weren't always called. With this
> fix, they are called in all cases again.
>
> The return value passed to the request callback doesn't have more free
> values that could be used to indicate host_status errors as well as SAM
> status codes and negative errno. Store the value in the host_status
> field of the SCSIRequest instead and use -ENODEV as the return value (if
> a path hasn't been reachable for a while, blk_aio_ioctl() will return
> -ENODEV instead of just setting host_status, so just reuse it here -
> it's not necessarily entirely accurate, but it's as good as any errno).
>
> Cc: qemu-stable@nongnu.org
> Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
Ping? Is this change still needed for 10.0 (or for master)?
Thanks,
/mjt
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-05-12 9:23 ` Michael Tokarev
@ 2025-05-13 11:42 ` Kevin Wolf
2025-05-13 11:46 ` Michael Tokarev
0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2025-05-13 11:42 UTC (permalink / raw)
To: Michael Tokarev
Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable
Am 12.05.2025 um 11:23 hat Michael Tokarev geschrieben:
> On 07.04.2025 18:59, Kevin Wolf wrote:
> > Originally, all failed SG_IO requests called scsi_handle_rw_error() to
> > apply the configured error policy. However, commit f3126d65, which was
> > supposed to be a mere refactoring for scsi-disk.c, broke this and
> > accidentally completed the SCSI request without considering the error
> > policy any more if the error was signalled in the host_status field.
> >
> > Apart from the commit message not describing the chance as intended,
> > errors indicated in host_status are also obviously backend errors and
> > not something the guest must deal with indepdently of the error policy.
> >
> > This behaviour means that some recoverable errors (such as a path error
> > in multipath configurations) were reported to the guest anyway, which
> > might not expect it and might consider its disk broken.
> >
> > Make sure that we apply the error policy again for host_status errors,
> > too. This addresses an existing FIXME comment and allows us to remove
> > some comments warning that callbacks weren't always called. With this
> > fix, they are called in all cases again.
> >
> > The return value passed to the request callback doesn't have more free
> > values that could be used to indicate host_status errors as well as SAM
> > status codes and negative errno. Store the value in the host_status
> > field of the SCSIRequest instead and use -ENODEV as the return value (if
> > a path hasn't been reachable for a while, blk_aio_ioctl() will return
> > -ENODEV instead of just setting host_status, so just reuse it here -
> > it's not necessarily entirely accurate, but it's as good as any errno).
> >
> > Cc: qemu-stable@nongnu.org
> > Fixes: f3126d65b393 ('scsi: move host_status handling into SCSI drivers')
>
> Ping? Is this change still needed for 10.0 (or for master)?
This was already merged for 10.0 as commit 61b6d9b7.
Kevin
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH for-10.0] scsi-disk: Apply error policy for host_status errors again
2025-05-13 11:42 ` Kevin Wolf
@ 2025-05-13 11:46 ` Michael Tokarev
0 siblings, 0 replies; 14+ messages in thread
From: Michael Tokarev @ 2025-05-13 11:46 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, hreitz, pbonzini, stefanha, qemu-devel, qemu-stable
On 13.05.2025 14:42, Kevin Wolf wrote:
> Am 12.05.2025 um 11:23 hat Michael Tokarev geschrieben:
[..]
> This was already merged for 10.0 as commit 61b6d9b7.
Heck. Yes, it is, and I already queued it for 9.2, too.
Sorry for the noise.
/mjt
^ permalink raw reply [flat|nested] 14+ messages in thread