From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>,
Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
Gavin Shan <gwshan@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
Date: Mon, 14 Sep 2015 18:56:35 +1000 [thread overview]
Message-ID: <55F68BC3.3030108@ozlabs.ru> (raw)
In-Reply-To: <20150914040407.GG2547@voom.fritz.box>
On 09/14/2015 02:04 PM, David Gibson wrote:
> On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
>> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
>>> On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
>>>> So far there were 2 limitations enforced on an emulated PHB
>>>> regarding VFIO:
>>>> 1) only one IOMMU group per IOMMU container was allowed and
>>>> the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
>>>> 2) only one IOMMU container per PHB was allowed as a group
>>>> can only be attached to one container.
>>>>
>>>> However these are not really necessary so we are removing them here.
>>>>
>>>> This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
>>>> not to receive it. Instead of getting a container from a group ID,
>>>> this calls ioctl() for every container attached to the PHB address space.
>>>> This allows multiple containers on the same PHB which means multiple
>>>> groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
>>>> to be passed to every container separately which will affect
>>>> the performance. For 32bit devices it is still recommended to put
>>>> every group to a separate PHB.
>>>>
>>>> Since the existing VFIO code is already trying to share a container for
>>>> multiple groups, just removing a group id from
>>>> the vfio_container_do_ioctl() allows the existing code to share
>>>> a container if it is supported by the host kernel.
>>>>
>>>> This replaces the check for a group id to be set correctly with
>>>> the check that it is not set.
>>>>
>>>> This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
>>>> members is accessed here.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>> hw/ppc/spapr_pci.c | 10 +++++-----
>>>> hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
>>>> hw/vfio/common.c | 22 ++++++----------------
>>>> include/hw/vfio/vfio.h | 3 +--
>>>> 4 files changed, 18 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index 4e289cb..1b7559d 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -810,11 +810,6 @@ static int spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
>>>> pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, sphb);
>>>>
>>>> if (sphb->vfio_num) {
>>>> - if (sphb->iommugroupid == -1) {
>>>> - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
>>>> - return -1;
>>>> - }
>>>> -
>>>> ret = spapr_phb_vfio_dma_capabilities_update(sphb);
>>>> if (ret) {
>>>> error_report("Unable to get DMA32 info from VFIO");
>>>> @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>>> PCIBus *bus;
>>>> uint64_t msi_window_size = 4096;
>>>>
>>>> + if ((sphb->iommugroupid != -1) &&
>>>> + object_dynamic_cast(OBJECT(sphb), TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
>>>> + error_report("Warning: iommugroupid is deprecated and will be ignored");
>>>> + }
>>>> +
>>>> if (sphb->index != (uint32_t)-1) {
>>>> hwaddr windows_base;
>>>>
>>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>>>> index f94d8a4..48137d5 100644
>>>> --- a/hw/ppc/spapr_pci_vfio.c
>>>> +++ b/hw/ppc/spapr_pci_vfio.c
>>>> @@ -35,7 +35,7 @@ int spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
>>>> struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
>>>> int ret;
>>>>
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as,
>>>> VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
>>>> if (ret) {
>>>> return ret;
>>>> @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
>>>> .op = VFIO_EEH_PE_ENABLE
>>>> };
>>>>
>>>> - vfio_container_ioctl(&sphb->iommu_as,
>>>> - sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
>>>> + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> }
>>>>
>>>> int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>>>> @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
>>>> return RTAS_OUT_PARAM_ERROR;
>>>> }
>>>>
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> - VFIO_EEH_PE_OP, &op);
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> if (ret < 0) {
>>>> return RTAS_OUT_HW_ERROR;
>>>> }
>>>> @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, int *state)
>>>> int ret;
>>>>
>>>> op.op = VFIO_EEH_PE_GET_STATE;
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> - VFIO_EEH_PE_OP, &op);
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> if (ret < 0) {
>>>> return RTAS_OUT_PARAM_ERROR;
>>>> }
>>>> @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>>>> return RTAS_OUT_PARAM_ERROR;
>>>> }
>>>>
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> - VFIO_EEH_PE_OP, &op);
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> if (ret < 0) {
>>>> return RTAS_OUT_HW_ERROR;
>>>> }
>>>> @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
>>>> int ret;
>>>>
>>>> op.op = VFIO_EEH_PE_CONFIGURE;
>>>> - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
>>>> - VFIO_EEH_PE_OP, &op);
>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
>>>> if (ret < 0) {
>>>> return RTAS_OUT_PARAM_ERROR;
>>>> }
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 85ee9b0..a00644e 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
>>>> close(vbasedev->fd);
>>>> }
>>>>
>>>> -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
>>>> - int req, void *param)
>>>> +static int vfio_container_do_ioctl(AddressSpace *as, int req, void *param)
>>>> {
>>>> - VFIOGroup *group;
>>>> VFIOContainer *container;
>>>> int ret = -1;
>>>> + VFIOAddressSpace *space = vfio_get_address_space(as);
>>>>
>>>> - group = vfio_get_group(groupid, as);
>>>> - if (!group) {
>>>> - error_report("vfio: group %d not registered", groupid);
>>>> - return ret;
>>>> - }
>>>> -
>>>> - container = group->container;
>>>> - if (group->container) {
>>>> + QLIST_FOREACH(container, &space->containers, next) {
>>>> ret = ioctl(container->fd, req, param);
>>>> if (ret < 0) {
>>>> error_report("vfio: failed to ioctl %d to container: ret=%d, %s",
>>>> _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
>>>> + return -errno;
>>>> }
>>>> }
>>>
>>> This highlights how terrible this ioctl passthrough interface is; what's
>>> a caller supposed to do on error? Here we don't have visibility into
>>> the ioctl being called in order to do any unwind on error. The caller
>>> doesn't have the context to unwind only the failed containers. Is
>>> returning errno here really sufficient for anyone to figure out how to
>>> proceed or should we just cut our loses and abort()? What's the least
>>> horrible way we can realistically handle a failure here? Thanks,
>>
>> It seemed like I asked this before and it turns out that I did:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
>>
>> "Gavin says yes" is not really a supportable solution when I stumbled on
>> it again and David doesn't know why it's safe either (nor does it answer
>> my question of how does this work). At a minimum, the reasoning why
>> this is safe for the ioctls we currently allow here needs to be spelled
>> out in a comment. We could easily make the mistake of adding another
>> ioctl to the passthrough for which those assumptions are not valid. We
>> may even want to abort if it's not one of the ioctls we've evaluated so
>> we don't overlook this problem later. Thanks,
>
> Yeah, you're right. This is a mess.
Right. I have been testing/studying this lately a lot, will fix this too.
> What we need to do here is to make vfio_container_ioctl() take a
> VFIOContainer object as the name suggests. The the callers will need
> to either use it on a specific container, or iterate across all the
> containers in the VFIOAddressSpace as appropriate for the specific
> operation.
The callers - spapr code - do not deal with containers, these are a VFIO
internal thing managed by hw/vfio/common.c and attached to VFIOAddressSpace.
So s/vfio_container_ioctl/vfio_address_space_ioctl/ (or vfio_as_ioctl()?)
makes more sense here. Which I can make a patch for. Or change
vfio_container_ioctl() as you suggest (+make is static) and add public
vfio_as_ioctl(). Which one to pick?
--
Alexey
next prev parent reply other threads:[~2015-09-14 8:56 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-03 4:40 [Qemu-devel] [PATCH qemu v2 0/2] spapr_pci: Merge spapr-vfio-phb into spapr-phb Alexey Kardashevskiy
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-09-03 4:40 ` [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices Alexey Kardashevskiy
2015-09-10 2:43 ` Alex Williamson
2015-09-11 20:03 ` Alex Williamson
2015-09-14 4:04 ` David Gibson
2015-09-14 8:56 ` Alexey Kardashevskiy [this message]
2015-09-14 11:42 ` David Gibson
2015-09-14 14:37 ` Alexey Kardashevskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55F68BC3.3030108@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=gwshan@linux.vnet.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).