From: Gavin Shan <shangw@linux.vnet.ibm.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: aik@ozlabs.ru, linuxppc-dev@lists.ozlabs.org,
Gavin Shan <shangw@linux.vnet.ibm.com>,
kvm@vger.kernel.org
Subject: Re: [PATCH 2/3] VFIO: VFIO_DEVICE_SET_ADDR_MAPPING command
Date: Tue, 19 Mar 2013 11:24:10 +0800
Message-ID: <20130319032410.GA22591@shangw.(null)>
In-Reply-To: <1363640474.24132.433.camel@bling.home>
On Mon, Mar 18, 2013 at 03:01:14PM -0600, Alex Williamson wrote:
>On Sat, 2013-03-16 at 06:37 +0100, Benjamin Herrenschmidt wrote:
>> On Sat, 2013-03-16 at 09:34 +0800, Gavin Shan wrote:
>> > >Could you explain further how this will be used? How the device is
>> > >exposed to a guest is entirely a userspace construct, so why does vfio
>> > >need to know or care about this? I had assumed for AER that QEMU would
>> > >do the translation from host to guest address space.
>> > >
>> >
>> > The weak IOCTL function (vfio_pci_arch_ioctl) was introduced by the previous
>> > patch. The PowerNV platform is going to override it to figure out the
>> > information for the EEH core to use. On the other hand, QEMU will run into
>> > the IOCTL command while opening (creating) a VFIO device.
>> >
>> > I'm not very familiar with AER, but AER is quite different from
>> > EEH. The EEH functionality is implemented in the PHB instead of in the PCI
>> > device core, so we don't care about AER stuff in EEH directly :-)
>>
>> To give Alex a bit more background...
>>
>> EEH is our IBM specific error handling facility which is a superset of AER.
>>
>> IE. In addition to AER's error detection and logging, it adds a layer of
>> error detection at the host bridge level (such as iommu violations etc...)
>> and a mechanism for handling and recovering from errors. This is tied to
>> our iommu domain stuff (our PE's) and our device "freezing" capability
>> among others.
>>
>> With VFIO + KVM, we want to implement most of the EEH support for guests in
>> the host kernel. The reason is multipart and we can discuss this separately
>> as some of it might well be debatable (mostly it's more convenient that way
>> because we hook into the underlying HW/FW EEH which isn't directly userspace
>> accessible so we don't have to add a new layer of kernel -> user API in
>> addition to the VFIO stuff), but there's at least one aspect of it that drives
>> this requirement more strongly which is performance:
>>
>> When EEH is enabled, whenever any MMIO returns all 1's, the kernel will do
>> a firmware call to query the EEH state of the device and check whether it
>> has been frozen. On some devices, that can be a performance issue, and
>> going all the way to qemu for that would be horribly expensive.
>>
>> So we want at least a way to handle that call in the kernel and for that we
>> need at least some way of mapping things there.
>
>There's no notification mechanism when a PHB is frozen? I suppose
>notification would be asynchronous so you risk data for every read that
>happens in the interim. So the choices are a) tell the host kernel the
>mapping, b) tell the guest kernel the mapping, c) identity mapping, or
>d) qemu intercept?
>
We do have dedicated interrupts for detecting a frozen PHB on the host side.
However, the guest has to poll/check the frozen state (frozen PE) during
access to config or MMIO space. Of the suggested methods, (a) is what
we want to do with this patchset. (b) seems infeasible: the guest
shouldn't be aware of the hypervisor (e.g. KVM or PowerVM) it's running on
top of, so it's hard to change the guest to do it. (d) sounds workable
since QEMU should know the addresses (BDF) of the host and guest devices.
However, we would still need to let the host EEH core know which PCI device
has been passed to the guest, and the best place to do that would be when
opening the corresponding VFIO PCI device. In turn, that would still need a
weak function for the ppc platform to override. Why not take (a) directly
and finish everything in one VFIO IOCTL command?

Sorry, Alex. I didn't understand (c) well :-)

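To make option (a) more concrete, here is a rough userspace sketch of the kind of host/guest address table the host side would keep once the mapping has been registered. It is not the code from the patch: the names (struct addr_mapping, mapping_set, mapping_guest_to_host, make_bdf) and the fixed-size table are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16   /* arbitrary size chosen for this sketch */

/* Hypothetical record: one passed-through device, host BDF <-> guest BDF. */
struct addr_mapping {
	uint16_t host_bdf;    /* bus[15:8], device[7:3], function[2:0] */
	uint16_t guest_bdf;
	int in_use;
};

struct mapping_table {
	struct addr_mapping slot[MAX_MAPPINGS];
};

/* Pack bus/device/function into the conventional 16-bit BDF. */
uint16_t make_bdf(unsigned bus, unsigned dev, unsigned fn)
{
	return (uint16_t)(((bus & 0xffu) << 8) | ((dev & 0x1fu) << 3) | (fn & 0x7u));
}

/*
 * Record (or update) the guest address of a host device, as would happen
 * when the VFIO device is opened. Returns 0, or -1 if the table is full.
 */
int mapping_set(struct mapping_table *t, uint16_t host_bdf, uint16_t guest_bdf)
{
	struct addr_mapping *free_slot = NULL;
	size_t i;

	assert(t != NULL);
	for (i = 0; i < MAX_MAPPINGS; i++) {
		struct addr_mapping *m = &t->slot[i];

		if (m->in_use && m->host_bdf == host_bdf) {
			m->guest_bdf = guest_bdf;   /* update existing entry */
			return 0;
		}
		if (!m->in_use && free_slot == NULL)
			free_slot = m;              /* remember first free slot */
	}
	if (free_slot == NULL)
		return -1;

	free_slot->host_bdf = host_bdf;
	free_slot->guest_bdf = guest_bdf;
	free_slot->in_use = 1;
	return 0;
}

/*
 * Translate a guest BDF back to the host BDF, as the host would need to do
 * when the guest asks about the state of "its" device.
 * Returns 0 and fills *host_bdf, or -1 if the guest address is unknown.
 */
int mapping_guest_to_host(const struct mapping_table *t, uint16_t guest_bdf,
			  uint16_t *host_bdf)
{
	size_t i;

	assert(t != NULL && host_bdf != NULL);
	for (i = 0; i < MAX_MAPPINGS; i++) {
		if (t->slot[i].in_use && t->slot[i].guest_bdf == guest_bdf) {
			*host_bdf = t->slot[i].host_bdf;
			return 0;
		}
	}
	return -1;
}
```

With a table like this filled in at device-open time, a guest-side address can be resolved on the host without a round trip to QEMU, which is the point of doing everything in one IOCTL.
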
>Presumably your firmware call to query the EEH is not going through
>VFIO, so is VFIO the appropriate place to setup this mapping? As you
>say, this seems like just a convenient place to put it even though it
>really has nothing to do with the VFIO kernel component. QEMU has this
>information and could register it with the host kernel through other
>means if available. Maybe the mapping should be registered with KVM if
>that's how the EEH data is accessed. I'm not yet sold on why this
>mapping is registered here. Thanks,
>
Yes, the EEH firmware call needn't go through VFIO. However, EEH has a
very close relationship with PCI, and so does VFIO-PCI. So in the end, EEH
has a close relationship with VFIO-PCI :-)
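
For reference, the check that triggers the EEH firmware call, as described earlier in the thread (an MMIO read returning all 1's leads to a query of the frozen state), can be sketched as below. This is a userspace simulation only: struct sim_pe, pe_query_frozen and checked_read32 are hypothetical names, and the "firmware call" is a stub that just counts how often it is invoked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PE whose state the firmware can report. */
struct sim_pe {
	int frozen;          /* 1 if the PE has been frozen */
	unsigned queries;    /* number of simulated firmware calls */
};

/* Stub for the (expensive) firmware query of the PE state. */
int pe_query_frozen(struct sim_pe *pe)
{
	assert(pe != NULL);
	pe->queries++;
	return pe->frozen;
}

/*
 * Read a 32-bit register and only query the PE state when the value is
 * all 1's. Returns 0 if the value can be trusted, -1 if the PE is frozen.
 */
int checked_read32(const volatile uint32_t *reg, struct sim_pe *pe,
		   uint32_t *val)
{
	assert(reg != NULL && pe != NULL && val != NULL);

	*val = *reg;
	if (*val != UINT32_MAX)
		return 0;    /* common case: no firmware call at all */

	/* All 1's may be real data or a frozen PE; only the query can tell. */
	return pe_query_frozen(pe) ? -1 : 0;
}
```

The query is only on the slow path, but a device that legitimately returns all 1's often would hit it on every such read, which is why where that call is handled matters for performance.
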
Thanks,
Gavin
Thread overview: 20+ messages
2013-03-15 7:26 [PATCH 0/3] VFIO change for EEH support Gavin Shan
2013-03-15 7:26 ` [PATCH 1/3] VFIO: Architecture dependent VFIO device operations Gavin Shan
2013-03-15 7:26 ` [PATCH 2/3] VFIO: VFIO_DEVICE_SET_ADDR_MAPPING command Gavin Shan
2013-03-15 19:29 ` Alex Williamson
2013-03-16 1:34 ` Gavin Shan
2013-03-16 5:37 ` Benjamin Herrenschmidt
2013-03-18 21:01 ` Alex Williamson
2013-03-19 3:24 ` Gavin Shan [this message]
2013-03-19 4:18 ` Alex Williamson
2013-03-19 4:45 ` Benjamin Herrenschmidt
2013-03-20 18:48 ` Alex Williamson
2013-03-20 19:31 ` Benjamin Herrenschmidt
2013-03-20 19:46 ` Alex Williamson
2013-03-21 2:09 ` Gavin Shan
2013-03-15 7:26 ` [PATCH 3/3] VFIO: Direct access config reg without capability Gavin Shan
2013-03-15 19:41 ` Alex Williamson
2013-03-16 3:34 ` Gavin Shan
2013-03-16 5:30 ` Benjamin Herrenschmidt
2013-03-18 21:15 ` Alex Williamson
2013-03-21 0:58 ` Alex Williamson