qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Markus Armbruster" <armbru@redhat.com>,
	小田喜陽彦 <akihiko.odaki@daynix.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	qemu-arm@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"John Snow" <jsnow@redhat.com>,
	"Dmitry Fleytman" <dmitry.fleytman@gmail.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Stefan Weil" <sw@weilnetz.de>, "Keith Busch" <kbusch@kernel.org>,
	"Klaus Jensen" <its@irrelevant.dk>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Andrey Smirnov" <andrew.smirnov@gmail.com>,
	"Paul Burton" <paulburton@kernel.org>,
	"Aleksandar Rikalo" <aleksandar.rikalo@syrmia.com>
Subject: Re: [PATCH] pci: Abort if pci_add_capability fails
Date: Wed, 31 Aug 2022 10:18:22 +0200	[thread overview]
Message-ID: <87y1v4ddwx.fsf@pond.sub.org> (raw)
In-Reply-To: <20220830120014.55f55b24.alex.williamson@redhat.com> (Alex Williamson's message of "Tue, 30 Aug 2022 12:00:14 -0600")

Alex Williamson <alex.williamson@redhat.com> writes:

> On Tue, 30 Aug 2022 13:37:35 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
>>        if (!offset) {
>>            offset = pci_find_space(pdev, size);
>>            /* out of PCI config space is programming error */
>>            assert(offset);
>>        } else {
>>            /* Verify that capabilities don't overlap.  Note: device assignment
>>             * depends on this check to verify that the device is not broken.
>>             * Should never trigger for emulated devices, but it's helpful
>>             * for debugging these. */
>> 
>> The comment makes me suspect that device assignment of a broken device
>> could trigger the error.  It goes back to
>> 
>> commit c9abe111209abca1b910e35c6ca9888aced5f183
>> Author: Jan Kiszka <jan.kiszka@siemens.com>
>> Date:   Wed Aug 24 14:29:30 2011 +0200
>> 
>>     pci: Error on PCI capability collisions
>>     
>>     Nothing good can happen when we overlap capabilities. This may happen
>>     when plugging in assigned devices or when devices models contain bugs.
>>     Detect the overlap and report it.
>>     
>>     Based on qemu-kvm commit by Alex Williamson.
>>     
>>     Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>     Acked-by: Don Dutile <ddutile@redhat.com>
>>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> 
>> If this is still correct, then your patch is a regression: QEMU is no
>> longer able to gracefully handle assignment of a broken device.  Does
>> this matter?  Alex, maybe?
>
> Ok, that was a long time ago.  I have some vague memories of hitting

Indeed!

> something like this with a Broadcom NIC, but a google search for the
> error string doesn't turn up anything recently.  So there's a fair
> chance this wouldn't break anyone initially.

I like your careful phrasing.

> Even back when the above patch was proposed, there were some
> suggestions to turn the error path into an abort, which I pushed back
> on since clearly enumerating capabilities of a device can occur due to
> a hot-plug and we don't necessarily have control of the device being
> added.  This is only more true with the possibility of soft-devices out
> of tree, through things like vfio-user.

Valid point.

When an compiled-in device model asks for overlapping PCI capabilities,
it's a programming error.

When an assigned device (be it physical or virtual) has overlapping PCI
capabilities, it's bad input.

Abort on programming error is okay.  Abort on bad input isn't.

I think callers of the former sort should pass &error_abort to
pci_add_capability(), and callers of the latter sort should handle the
error.

> Personally I think the right approach is to support an error path such
> that we can abort when triggered by a cold-plug device, while simply
> rejecting a broken hot-plug device, but that seems to be the minority
> opinion among QEMU developers afaict.  Thanks,

I'm in the "aborting on programming errors is okay" camp.

Recovery from certain programming errors is possible.  However,
different programming errors can manifest themselves the same way, and
not all need permit safe recovery.

Say we detect overlapping PCI capabilities when hot-plugging a virtual
device model.

If this is because the programmer passed an incorrect literal offset, we
can recover safely by failing the hot plug.

But perhaps the offset comes from a table, and some other bug scribbled
over it.  Continuing the process is *unsafe*, and may increase the
damage and/or obstruct the root cause.

The former kind of bug is unlikely to survive even cursory testing.



  reply	other threads:[~2022-08-31  8:20 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-29  8:44 [PATCH] pci: Abort if pci_add_capability fails 小田喜陽彦
2022-08-30 11:37 ` Markus Armbruster
2022-08-30 18:00   ` Alex Williamson
2022-08-31  8:18     ` Markus Armbruster [this message]
2022-08-31 12:32       ` Akihiko Odaki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y1v4ddwx.fsf@pond.sub.org \
    --to=armbru@redhat.com \
    --cc=akihiko.odaki@daynix.com \
    --cc=aleksandar.rikalo@syrmia.com \
    --cc=alex.williamson@redhat.com \
    --cc=andrew.smirnov@gmail.com \
    --cc=dmitry.fleytman@gmail.com \
    --cc=eduardo@habkost.net \
    --cc=its@irrelevant.dk \
    --cc=jasowang@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kbusch@kernel.org \
    --cc=kraxel@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=paulburton@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=sw@weilnetz.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).