From: Paolo Bonzini <pbonzini@redhat.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
qemu-devel@nongnu.org,
Alex Williamson <alex.williamson@redhat.com>,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space
Date: Fri, 09 Aug 2013 13:07:44 +0200 [thread overview]
Message-ID: <5204CD80.7000409@redhat.com> (raw)
In-Reply-To: <5204CB5A.8060000@ozlabs.ru>
Il 09/08/2013 12:58, Alexey Kardashevskiy ha scritto:
> On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
>> Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
>>> On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
>>>> Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
>>>>> On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
>>>>>> Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
>>>>>>> A PCI device's DMA address space (possibly an IOMMU) is returned by a
>>>>>>> method on the PCIBus. At the moment that only has one caller, so the
>>>>>>> method is simply open coded. We'll need another caller for VFIO, so
>>>>>>> this patch introduces a helper/wrapper function.
>>>>>>>
>>>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>>>> [aik: added inheritance from parent if iommu is not set for the current bus]
>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>
>>>>>>> ---
>>>>>>> Changes:
>>>>>>> v2:
>>>>>>> * added inheritance, needed for a pci-bridge on spapr-ppc64
>>>>>>> * pci_iommu_as renamed to pci_device_iommu_address_space
>>>>>>> ---
>>>>>>> hw/pci/pci.c | 22 ++++++++++++++++------
>>>>>>> include/hw/pci/pci.h | 1 +
>>>>>>> 2 files changed, 17 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>> index 4c004f5..0072b54 100644
>>>>>>> --- a/hw/pci/pci.c
>>>>>>> +++ b/hw/pci/pci.c
>>>>>>> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>>>> }
>>>>>>>
>>>>>>> pci_dev->bus = bus;
>>>>>>> - if (bus->iommu_fn) {
>>>>>>> - dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
>>>>>>> - } else {
>>>>>>> - /* FIXME: inherit memory region from bus creator */
>>>>>>> - dma_as = &address_space_memory;
>>>>>>> - }
>>>>>>> + dma_as = pci_device_iommu_address_space(pci_dev);
>>>>>>>
>>>>>>> memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>>>>>> OBJECT(pci_dev), "bus master",
>>>>>>> @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>>>> k->props = pci_props;
>>>>>>> }
>>>>>>>
>>>>>>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>>> +{
>>>>>>> + PCIBus *bus = PCI_BUS(dev->bus);
>>>>>>> +
>>>>>>> + if (bus->iommu_fn) {
>>>>>>> + return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>>>>> + }
>>>>>>> +
>>>>>>> + if (bus->parent_dev) {
>>>>>>> + return pci_device_iommu_address_space(bus->parent_dev);
>>>>>>> + }
>>>>>>
>>>>>> No, this would fail if bus->parent_dev is not NULL but not a PCI device
>>>>>> either.
>>>>>
>>>>> parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/
>>>>
>>>> Doh, I misread the code, I thought it was the "parent" field in
>>>> BusState. Why do we have parent_dev at all?
>>>
>>> The code is too old? Don't know.
>>>
>>>
>>>>>> You can use object_dynamic_cast to convert the parent_dev to
>>>>>> PCIDevice, and if the cast succeeds you call the new function.
>>>>>>
>>>>>> Perhaps you could make the new function take a PCIBus instead.
>>>>>> Accessing the PCIDevice's IOMMU address space (as opposed to the
>>>>>> bus-master address space) doesn't make much sense, VFIO is really a
>>>>>> special case. Putting the new API on the bus side instead looks better.
>>>>>>
>>>>>> (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
>>>>>> devices sitting on the secondary bus?)
>>>>>
>>>>> It happens naturally I guess when linux enables devices.
>>>>
>>>> Yes, but then using the IOMMU address space would be wrong; you would
>>>> have to use the bus-master address space as a base for the child's
>>>> bus-master address space.
>>>
>>>
>>> Like this? Works too.
>>>
>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>> index 8bdcedc..a4c70e6 100644
>>> --- a/hw/pci/pci.c
>>> +++ b/hw/pci/pci.c
>>> @@ -2247,23 +2247,23 @@ AddressSpace
>>> *pci_device_iommu_address_space(PCIDevice *dev)
>>> {
>>> PCIBus *bus = PCI_BUS(dev->bus);
>>>
>>> if (bus->iommu_fn) {
>>> return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>> }
>>>
>>> if (bus->parent_dev) {
>>> return pci_device_iommu_address_space(bus->parent_dev);
>>> }
>>>
>>> - return &address_space_memory;
>>> + return &dev->bus_master_as;
>>> }
>>
>> I was thinking more like this:
>>
>> if (bus->parent_dev) {
>> - return pci_device_iommu_address_space(bus->parent_dev);
>> + /* Take parent device's bus-master enable bit into account. */
>> + return pci_get_address_space(bus->parent_dev);
>> }
>>
>> + /* Not a secondary bus and no IOMMU. Use system memory. */
>> return &address_space_memory;
>
> Ok.
>
> Oh. BTW. This "bus master" thing now breaks VFIO's check for all devices
> being in the same address space as every single device has its own "bus
> master" AddressSpace.
Yes, fixing that check is another good use of the new API you introduced.
But after Ben's answer, I guess the above change is not really needed.
It would add complication for VFIO, too. Proper emulation would require
QEMU to trap writes to the device's bus-master bit. QEMU would have to
take of the value that the guest writes, AND it with the bus-master
bit(s) of all bridges between the host bridge and the VFIO device, and
write the result to the passed-through device. This is because the
bridges are emulated, and do not exist in real hardware.
>>>> Also, we would have to fix the x86 firmware.
>
> btw what would it mean? :)
Firmware would have to look for bridges and enable bus master DMA on
them. Otherwise, for example a SCSI controller below a bridge (not
virtio-scsi) would fail to boot.
Paolo
next prev parent reply other threads:[~2013-08-09 11:08 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-09 8:49 [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space Alexey Kardashevskiy
2013-08-09 9:40 ` Paolo Bonzini
2013-08-09 9:48 ` Alexey Kardashevskiy
2013-08-09 9:53 ` Paolo Bonzini
2013-08-09 10:13 ` Alexey Kardashevskiy
2013-08-09 10:19 ` Paolo Bonzini
2013-08-09 10:58 ` Alexey Kardashevskiy
2013-08-09 11:07 ` Paolo Bonzini [this message]
2013-08-09 11:21 ` Alexey Kardashevskiy
2013-08-09 11:30 ` Paolo Bonzini
2013-08-09 10:44 ` David Gibson
2013-08-09 10:20 ` Benjamin Herrenschmidt
2013-08-09 10:29 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5204CD80.7000409@redhat.com \
--to=pbonzini@redhat.com \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=aliguori@us.ibm.com \
--cc=david@gibson.dropbear.id.au \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).