From: Eric Farman <farman@linux.ibm.com>
To: Cornelia Huck <cohuck@redhat.com>, Halil Pasic <pasic@linux.ibm.com>
Cc: Jason Herne <jjherne@linux.ibm.com>,
qemu-s390x@nongnu.org, Jared Rossi <jrossi@linux.ibm.com>,
qemu-devel@nongnu.org
Subject: Re: [RFC PATCH v2 1/7] vfio-ccw: Return IOINST_CC_NOT_OPERATIONAL for EIO
Date: Mon, 6 Apr 2020 14:21:17 -0400 [thread overview]
Message-ID: <06a9fc5c-be2d-f152-de61-e815d7454692@linux.ibm.com> (raw)
In-Reply-To: <20200401105258.3e885efb.cohuck@redhat.com>
On 4/1/20 4:52 AM, Cornelia Huck wrote:
> On Wed, 25 Mar 2020 03:24:28 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
>
>> On Tue, 24 Mar 2020 18:04:30 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>> On Thu, 6 Feb 2020 22:45:03 +0100
>>> Eric Farman <farman@linux.ibm.com> wrote:
>>>
>>>> From: Farhan Ali <alifm@linux.ibm.com>
>>>>
>>>> EIO is returned by vfio-ccw mediated device when the backing
>>>> host subchannel is not operational anymore. So return cc=3
>>>> back to the guest, rather than returning a unit check.
>>>> This way the guest can take appropriate action such as
>>>> issue an 'stsch'.
>>
>> I believe this is not the only situation when vfio-ccw returns
>> EIO, or?
>>
>>>>
>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>> ---
>>>>
>>>> Notes:
>>>> v1->v2: [EF]
>>>> - Add s-o-b
>>>> - [Seems the discussion on v1 centered on the return code
>>>> set in the kernel, rather than anything that needs to
>>>> change here, unless I've missed something.]
>>
>> Does this need to change here? If the kernel is supposed to return ENODEV
>> then this does not need to change.
>>
>>>
>>> I've stared at this and at the kernel code for some time again; and I'm
>>> not sure if "return -EIO == not operational" is even true. That said,
>>> I'm not sure a unit check is the right response, either. The only thing
>>> I'm sure about is that the kernel code needs some review of return
>>> codes and some documentation...
>>
>> I could not agree more, this is semantically uapi and needs to be
>> properly documented.
>>
>> With regards to "linux error codes: vs "ionist cc's" an where
>> the mapping is different example:
>>
>> """
>> /**
>> * cio_cancel_halt_clear - Cancel running I/O by performing cancel, halt
>> * and clear ordinally if subchannel is valid.
>> * @sch: subchannel on which to perform the cancel_halt_clear operation
>> * @iretry: the number of the times remained to retry the next operation
>> *
>> * This should be called repeatedly since halt/clear are asynchronous
>> * operations. We do one try with cio_cancel, three tries with cio_halt,
>> * 255 tries with cio_clear. The caller should initialize @iretry with
>> * the value 255 for its first call to this, and keep using the same
>> * @iretry in the subsequent calls until it gets a non -EBUSY return.
>> *
>> * Returns 0 if device now idle, -ENODEV for device not operational,
>> * -EBUSY if an interrupt is expected (either from halt/clear or from a
>> * status pending), and -EIO if out of retries.
>> */
>> int cio_cancel_halt_clear(struct subchannel *sch, int *iretry)
>>
>> """
>> Here -ENODEV is not operational.
>
> Ok, I went through the kernel code and checked which error may be
> returned in which case (hope I caught all of them). Here's what I
> currently have:
Thanks for doing all this mapping!
>
> diff --git a/Documentation/s390/vfio-ccw.rst b/Documentation/s390/vfio-ccw.rst
> index fca9c4f5bd9c..43f375a03cce 100644
> --- a/Documentation/s390/vfio-ccw.rst
> +++ b/Documentation/s390/vfio-ccw.rst
> @@ -210,7 +210,36 @@ Subchannel.
>
> irb_area stores the I/O result.
>
> -ret_code stores a return code for each access of the region.
> +ret_code stores a return code for each access of the region. The following
> +values may occur:
> +
> +0
> + The operation was successful.
> +
> +-EOPNOTSUPP
> + The orb specified transport mode or an unidentified IDAW format, did not
> + specify prefetch mode, or the scsw specified a function other than the
> + start function.
> +
> +-EIO
> + A request was issued while the device was not in a state ready to accept
> + requests, or an internal error occurred.
> +
> +-EBUSY
> + The subchannel was status pending or busy, or a request is already active.
> +
> +-EAGAIN
> + A request was being processed, and the caller should retry.
> +
> +-EACCES
> + The channel path(s) used for the I/O were found to be not operational.
> +
> +-ENODEV
> + The device was found to be not operational.
> +
> +-EINVAL
> + The orb specified a chain longer than 255 ccws, or an internal error
> + occurred.
>
> This region is always available.
>
> @@ -231,6 +260,29 @@ This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.
>
> Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region.
>
> +command specifies the command to be issued; ret_code stores a return code
> +for each access of the region. The following values may occur:
> +
> +0
> + The operation was successful.
> +
> +-ENODEV
> + The device was found to be not operational.
> +
> +-EINVAL
> + A command other than halt or clear was specified.
> +
> +-EIO
> + A request was issued while the device was not in a state ready to accept
> + requests.
> +
> +-EAGAIN
> + A request was being processed, and the caller should retry.
> +
> +-EBUSY
> + The subchannel was status pending or busy while processing a halt request.
> +
> +
> vfio-ccw operation details
> --------------------------
>
> Unless we interpret "device in wrong state" as "not operational",
> mapping -EIO to cc 3 is probably not the right thing to do; but
> generating a unit exception probably isn't either. Honestly, I'm unsure
> what the right thing to do here would be...
Me neither. I grepped my qemu logs for the past, ugh, too long for the
error report that would precede these failures ("vfio-ccw: [wirte/write]
I/O region failed with errno=%d"). Excluding the ones that were
obviously the result of half-baked code, all instances were either
-EBUSY or -ENODEV. Could I trigger one? Maybe. Is it likely? Doesn't
seem to be.
It seems that getting -EIO would indicate we got ourselves out of sync,
and started buttoning up the device again (or never having opened it in
the first place), so a unit exception might be valid as a "hey
something's screwy here" ?
>
> Another problem is that -EIO might signal "something went wrong in the
> kernel code" - should not happen, but would certainly not be the
> caller's fault, and they can't do anything about it. That "internal
> error" thing might also be signaled by -EINVAL (which is odd), but
> -EINVAL could also mean "the ccw chain is too long", for which
> -EOPNOTSUPP would probably be a better return code, as it's a
> limitation in the code, not the architecture, IIRC. Not sure if we can
> still change that, though (and QEMU handles both in the same way,
> anyway.)
>
> The other return codes look sane, and the return codes for the async
> region also seem sane (but have the same issue with -EIO == "device in
> wrong state").
>
> Looking at the QEMU handling, I think the -EIO handling is a bit
> questionable (without an obvious solution), while mapping -EBUSY to cc
> 2 is the only reasonable thing to do given that the interface does not
> differentiate between "busy" and "status pending". The rest seems sane.
>
So maybe with all this data, and absent a better solution, it's best to
just drop this patch from v3?
next prev parent reply other threads:[~2020-04-06 18:34 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-06 21:45 [RFC PATCH v2 0/7] s390x/vfio_ccw: Channel Path Handling [QEMU] Eric Farman
2020-02-06 21:45 ` [RFC PATCH v2 1/7] vfio-ccw: Return IOINST_CC_NOT_OPERATIONAL for EIO Eric Farman
2020-03-24 17:04 ` Cornelia Huck
2020-03-25 2:24 ` Halil Pasic
2020-04-01 8:52 ` Cornelia Huck
2020-04-06 18:21 ` Eric Farman [this message]
2020-04-07 6:28 ` Cornelia Huck
2020-04-07 10:18 ` Eric Farman
2020-02-06 21:45 ` [RFC PATCH v2 2/7] linux-headers: update Eric Farman
2020-02-06 21:45 ` [RFC PATCH v2 3/7] vfio-ccw: Refactor cleanup of regions Eric Farman
2020-02-06 21:45 ` [RFC PATCH v2 4/7] vfio-ccw: Add support for the schib region Eric Farman
2020-02-06 21:45 ` [RFC PATCH v2 5/7] vfio-ccw: Add support for the crw region Eric Farman
2020-02-06 21:45 ` [RFC PATCH v2 6/7] vfio-ccw: Refactor ccw irq handler Eric Farman
2020-03-24 17:32 ` Cornelia Huck
2020-02-06 21:45 ` [RFC PATCH v2 7/7] vfio-ccw: Add support for the CRW irq Eric Farman
2020-04-06 16:22 ` Cornelia Huck
2020-04-06 21:37 ` Eric Farman
2020-04-07 6:35 ` Cornelia Huck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=06a9fc5c-be2d-f152-de61-e815d7454692@linux.ibm.com \
--to=farman@linux.ibm.com \
--cc=cohuck@redhat.com \
--cc=jjherne@linux.ibm.com \
--cc=jrossi@linux.ibm.com \
--cc=pasic@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-s390x@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).