From: "Cédric Le Goater" <clg@redhat.com>
To: Markus Armbruster <armbru@redhat.com>, Farhan Ali <alifm@linux.ibm.com>
Cc: qemu-devel@nongnu.org, qemu-s390x@nongnu.org,
mjrosato@linux.ibm.com, thuth@redhat.com,
alex.williamson@redhat.com
Subject: Re: [PATCH v3 2/5] vfio/pci: Add an error handler callback
Date: Sat, 27 Sep 2025 09:05:12 +0200 [thread overview]
Message-ID: <4207529b-a0a5-4360-8449-f4c20661e9e8@redhat.com> (raw)
In-Reply-To: <87ms6gmo4j.fsf@pond.sub.org>
On 9/27/25 07:59, Markus Armbruster wrote:
> Farhan Ali <alifm@linux.ibm.com> writes:
>
>> On 9/25/2025 9:57 PM, Markus Armbruster wrote:
>>> Farhan Ali <alifm@linux.ibm.com> writes:
>>>
>>>> Provide a vfio error handling callback, that can be used by devices to
>>>> handle PCI errors for passthrough devices.
>>>>
>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>> ---
>>>> hw/vfio/pci.c | 8 ++++++++
>>>> hw/vfio/pci.h | 1 +
>>>> 2 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index bc0b4c4d56..b02a974954 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -3063,11 +3063,19 @@ void vfio_pci_put_device(VFIOPCIDevice *vdev)
>>>> static void vfio_err_notifier_handler(void *opaque)
>>>> {
>>>> VFIOPCIDevice *vdev = opaque;
>>>> + Error *err = NULL;
>>>>
>>>> if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
>>>> return;
>>>> }
>>>>
>>>> + if (vdev->err_handler) {
>>>> + if (vdev->err_handler(vdev, &err)) {
>>>> + return;
>>>> + }
>>>> + error_report_err(err);
>>>> + }
>>>
>>> This is unusual.
>>>
>>> Functions taking an Error ** argument usually do so to report errors.
>>> The rules spelled out in qapi/error.h apply. In particular:
>>>
>>> * - On success, the function should not touch *errp. On failure, it
>>> * should set a new error, e.g. with error_setg(errp, ...), or
>>> * propagate an existing one, e.g. with error_propagate(errp, ...).
>>> *
>>> * - Whenever practical, also return a value that indicates success /
>>> * failure. This can make the error checking more concise, and can
>>> * avoid useless error object creation and destruction. Note that
>>> * we still have many functions returning void. We recommend
>>> * • bool-valued functions return true on success / false on failure,
>>>
>>> If ->err_handler() behaved that way, it @err would be null after it
>>> returns false. We'd call error_report_err(NULL), and crash.
>>>
>>> Functions with unusual behavior need a contract: a comment spelling out
>>> their behavior.
>>>
>>> What is the intended behavior of the err_handler() callback?
>>
>> Hi Markus,
>>
>> Thanks for reviewing! The intended behavior for err_handler() is to set errp and report the error on false/failure. With the above code, I also intended fall through to vm_stop() when err_handler() fails.
>>
>> I think I misunderstood the errp error handling, it seems like the correct way to do what I intended would be
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index b02a974954..630de46c90 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3070,10 +3070,11 @@ static void vfio_err_notifier_handler(void *opaque)
>> }
>>
>> if (vdev->err_handler) {
>> - if (vdev->err_handler(vdev, &err)) {
>> + if (!vdev->err_handler(vdev, &err)) {
>> + error_report_err(err);
>> + } else {
>> return;
>> }
>> - error_report_err(err);
>> }
>>
>> Please correct me if I missed anything.
>
> Resulting function:
>
> static void vfio_err_notifier_handler(void *opaque)
> {
> VFIOPCIDevice *vdev = opaque;
> Error *err = NULL;
>
> if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
> return;
> }
>
> if (vdev->err_handler) {
> if (!vdev->err_handler(vdev, &err)) {
> error_report_err(err);
> } else {
> return;
> }
> }
>
> /*
> * TBD. Retrieve the error details and decide what action
> * needs to be taken. One of the actions could be to pass
> * the error to the guest and have the guest driver recover
> * from the error. This requires that PCIe capabilities be
> * exposed to the guest. For now, we just terminate the
> * guest to contain the error.
> */
>
> error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
>
> vm_stop(RUN_STATE_INTERNAL_ERROR);
> }
>
> Slighly rearranged for clearer control flow:
>
> static void vfio_err_notifier_handler(void *opaque)
> {
> VFIOPCIDevice *vdev = opaque;
> Error *err = NULL;
>
> if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
> return;
> }
>
> if (vdev->err_handler) {
> if (vdev->err_handler(vdev, &err)) {
> /* Error successfully handled */
> return;
> }
> error_report_err(err);
> }
>
> /*
> * TBD. Retrieve the error details and decide what action
> * needs to be taken. One of the actions could be to pass
> * the error to the guest and have the guest driver recover
> * from the error. This requires that PCIe capabilities be
> * exposed to the guest. For now, we just terminate the
> * guest to contain the error.
> */
>
> error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
>
> vm_stop(RUN_STATE_INTERNAL_ERROR);
> }
>
> Questions / issues:
>
> * Is the comment still accurate?
>
> * When ->err_handler() fails, we report the error twice. Would it make
> sense to combine the two error reports into one?
Yes. It was my request too.
Thanks,
C.
> * Preexisting: the second error message is ugly.
>
> Error messages should be short and to the point: single phrase, with
> no newline or trailing punctuation. The "please collect ..." part
> does not belong to the error message proper, it's advice on what to
> do. Better: report the error, then print advice:
>
> error_report("%s(%s) Unrecoverable error detected",
> __func__, vdev->vbasedev.name);
> error_printf("Please collect any data possible and then kill the guest.");
>
> Including __func__ in an error message is an anti-pattern. Look at
>
> vfio_err_notifier_handler(fred) Unrecoverable error detected
>
> with a user's eyes: "vfio_err_notifier_handler" is programmer
> gobbledygook, the device name "fred" is useful once you realize what
> it is, "Unrecoverable error detected" lacks detail.
>
> [...]
>
next prev parent reply other threads:[~2025-09-27 7:07 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-25 17:48 [PATCH v3 0/5] Error recovery for zPCI passthrough devices Farhan Ali
2025-09-25 17:48 ` [PATCH v3 1/5] [NOTFORMERGE] linux-headers: Update for zpci vfio device Farhan Ali
2025-09-25 17:48 ` [PATCH v3 2/5] vfio/pci: Add an error handler callback Farhan Ali
2025-09-26 4:57 ` Markus Armbruster
2025-09-26 7:40 ` Cédric Le Goater
2025-09-26 18:44 ` Farhan Ali
2025-09-26 17:53 ` Farhan Ali
2025-09-27 5:59 ` Markus Armbruster
2025-09-27 7:05 ` Cédric Le Goater [this message]
2025-09-29 17:20 ` Farhan Ali
2025-09-30 9:20 ` Markus Armbruster
2025-09-30 17:15 ` Farhan Ali
2025-10-01 4:52 ` Markus Armbruster
2025-10-01 18:21 ` Farhan Ali
2025-10-06 6:06 ` Markus Armbruster
2025-09-25 17:48 ` [PATCH v3 3/5] vfio: Introduce vfio_device_feature helper function Farhan Ali
2025-09-25 17:48 ` [PATCH v3 4/5] s390x/pci: Add PCI error handling for vfio pci devices Farhan Ali
2025-09-25 17:48 ` [PATCH v3 5/5] s390x/pci: Reset a device in error state Farhan Ali
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4207529b-a0a5-4360-8449-f4c20661e9e8@redhat.com \
--to=clg@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=alifm@linux.ibm.com \
--cc=armbru@redhat.com \
--cc=mjrosato@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-s390x@nongnu.org \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).