From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jan Kiszka <jan.kiszka@web.de>
Cc: Avi Kivity <avi@redhat.com>,
Alex Williamson <alex.williamson@redhat.com>,
kvm@vger.kernel.org, jbaron@redhat.com
Subject: Re: [PATCH v2] kvm: Disable MSI/MSI-X in assigned device reset path
Date: Sun, 8 Apr 2012 23:35:48 +0300
Message-ID: <20120408203547.GB10916@redhat.com>
In-Reply-To: <4F81DB67.700@web.de>

On Sun, Apr 08, 2012 at 08:39:35PM +0200, Jan Kiszka wrote:
> On 2012-04-08 20:18, Michael S. Tsirkin wrote:
> > On Sun, Apr 08, 2012 at 07:37:57PM +0200, Jan Kiszka wrote:
> >> On 2012-04-08 18:08, Avi Kivity wrote:
> >>> On 04/08/2012 07:04 PM, Michael S. Tsirkin wrote:
> >>>> On Sun, Apr 08, 2012 at 06:50:27PM +0300, Avi Kivity wrote:
> >>>>> On 04/08/2012 06:46 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>
> >>>>>>>>> I'm thinking about this flow:
> >>>>>>>>>
> >>>>>>>>> FLR the device
> >>>>>>>>> for each emulated register
> >>>>>>>>>     read it from the hardware
> >>>>>>>>>     if different from emulated register:
> >>>>>>>>>         update the internal model (for example, disabling MSI in kvm if
> >>>>>>>>>         needed)
> >>>>>>>>
> >>>>>>>> If we do it this way we get back the problem this patch
> >>>>>>>> is trying to solve: MSIX assigned while device
> >>>>>>>> memory is disabled would cause unsupported request errors.
> >>>>>>>
> >>>>>>> Why is that? FLR would presumably disable MSI in the device, and this
> >>>>>>> line would disable it in kvm as well.
> >>>>>>
> >>>>>> The bug is that device memory is disabled (FLR would do that)
> >>>>>> while MSI is enabled in kvm. The fix is to
> >>>>>> disable MSI in kvm first.
> >>>>>
> >>>>> Yes, no need to repeat. My question is whether my pseudo-code does the
> >>>>> same
> >>>>
> >>>> It doesn't seem to: FLR (disabling memory) is followed
> >>>> by MSI disable in kvm instead of the reverse.
> >>>
> >>> Ah, so the problem is the ordering? I see.
> >>>
> >>>>> and whether or not it is better (when applied to all emulated
> >>>>> config space).
> >>>>
> >>>> I'm not sure.
> >>>> I would like to see an example of a register that you have
> >>>> in mind.
> >>>
> >>> I went over the PCI registers and saw none that would be affected.
> >>>
> >>>>>>
> >>>>>> Yes. I'm talking about things like enabling memory, setting up irq register,
> >>>>>> etc though. Most of this setup is done by bios.
> >>>>>
> >>>>> I see. So should we have a pci_reset_function() variant that limits
> >>>>> itself to restoring just those bits?
> >>>>
> >>>> We only need kernel to restore whatever qemu emulates, but
> >>>> kernel doesn't know what that is.
> >>>> What kind of interface do you have in mind?
> >>>>
> >>>
> >>> The same as pci_reset_function(), but leaves MSI clear.
> >>>
> >>> I guess it's not worth it if the ordering problem is there.
> >>
> >> The core problem is not the ordering. The problem is that the kernel is
> >> susceptible to ordering mistakes of userspace. And that is because the
> >> kernel panics on PCI errors of devices that are in user hands - a
> >> critical kernel bug IMHO.
> >
> > I'm not sure. The pci sysfs interface
> > is by design not secured against malicious users,
> > isn't it?
>
> That's surely true for devices outside of IOMMU protection. But do we
> really have to give up when we encapsulate and isolate them that way?
> Provided we moderate access to the sysfs resources via libvirt or some
> other management service.

We don't have to give up, but we'd have to build such an
interface: the /config attribute is not it.

> >
> >> Proper reset of MSI or even the whole PCI
> >> config space is another issue, but one the kernel should not worry about
> >> - still, it should be fixed (therefore this patch).
> >> But even if we disallowed userland to disable MMIO and PIO access to the
> >> device, we would not be able to exclude that there are secret channels
> >> in the device's interface having the same effect.
> >
> > I'm not sure I agree here. If there are secret channels to the device
> > that let it violate the PCI express spec, it can probably break the SRIOV
> > security model. And then you can do much more than just crash the host.
>
> Maybe, but there are also other devices. And if a guest reprograms it
> (firmware update...) and makes it stop reacting to requests, we may get
> the same effect. That would also be some kind of a "secret channel".

Right. So it looks like an SRIOV VF is the only type of device that
is safe to assign to a guest:
Presumably, SRIOV VFs don't let the driver program the firmware.
And I think SRIOV VFs don't have MMIO/PIO enable bits either,
and the BAR isn't programmable through the VF...
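
For devices that do have these enable bits, one way to keep them out of
the guest's hands would be to filter guest writes to the command
register. What follows is only a rough sketch of that idea, not what
qemu or kvm actually does: filter_command_write() and PROTECTED are
made-up names for illustration. Only the two bit values are the
standard PCI command register definitions.

```python
# Standard PCI command register bits.
PCI_COMMAND_IO = 0x1      # I/O space enable (bit 0)
PCI_COMMAND_MEMORY = 0x2  # memory space enable (bit 1)

# Hypothetical policy: the guest may not change these two bits.
PROTECTED = PCI_COMMAND_IO | PCI_COMMAND_MEMORY


def filter_command_write(hw_value, guest_value):
    """Return the 16-bit value to write to the real command register.

    Protected bits are taken from the current hardware value; every
    other bit is taken from what the guest wrote.
    """
    guest_bits = guest_value & ~PROTECTED & 0xFFFF
    kept_bits = hw_value & PROTECTED
    return guest_bits | kept_bits
```

With this filter, a guest write that clears the memory enable bit
leaves it set in hardware, while the emulated register can still show
the guest whatever it wrote.
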
> >
> >> So we likely need to
> >> enhance PCI error handling to catch and handle faults for certain
> >> devices differently - those we cannot trust to behave properly while
> >> they are under userland/guest control.
> >>
> >> Jan
> >>
> >
> >
> > I agree - forwarding errors to the guest would actually be very useful - but
> > I think we also need to analyse the problem carefully,
> > and prevent as many ways as we can for guest to cause trouble.
>
> If possible, the protection should target userspace which would
> automatically include guests. Only if that is not feasible with
> reasonable effort, we have to rely on QEMU to save the host.

Defence in depth is best, right?

> >
> > And there is another issue here: unsupported request errors
> > should not cause kernel panics IMO.
> >
> > There's also the issue that qemu lets the guest control the MMIO/PIO
> > bits in the command register.
> >
> > So there are multiple bugs.
> >
>
> Yep, that's true.
>
> Jan
>
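
To make the ordering point from earlier in the thread concrete, here is
a toy model. It is purely illustrative and assumes nothing about the
real kvm or qemu code: ToyAssignedDevice, reset_flr_first() and
reset_msi_first() are made-up names. It only counts how often MSI is
still enabled in kvm while device memory is already disabled, which is
the state that leads to unsupported request errors.

```python
class ToyAssignedDevice:
    """Toy model of an assigned device (hypothetical, for illustration)."""

    def __init__(self):
        self.memory_enabled = True   # hardware: memory decoding enabled
        self.kvm_msi_enabled = True  # kvm: MSI/MSI-X still set up
        self.violations = 0          # MSI live in kvm while memory is off

    def _check(self):
        # The bad state: kvm may still touch MSI-X with memory disabled.
        if self.kvm_msi_enabled and not self.memory_enabled:
            self.violations += 1

    def flr(self):
        # FLR disables device memory (among other things).
        self.memory_enabled = False
        self._check()

    def disable_msi_in_kvm(self):
        self.kvm_msi_enabled = False
        self._check()


def reset_flr_first(dev):
    """FLR, then sync the model: passes through the bad state."""
    dev.flr()
    dev.disable_msi_in_kvm()
    return dev.violations


def reset_msi_first(dev):
    """Disable MSI in kvm first, then FLR: never hits the bad state."""
    dev.disable_msi_in_kvm()
    dev.flr()
    return dev.violations
```

In this model reset_flr_first() passes through the bad state once and
reset_msi_first() never does, which is the whole difference between the
two flows.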