From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: kvm@vger.kernel.org, joro@8bytes.org, qemu-devel@nongnu.org,
blauwirbel@gmail.com, yamahata@valinux.co.jp,
paul@codesourcery.com, avi@redhat.com
Subject: [Qemu-devel] Re: [PATCH 2/7] pci: memory access API and IOMMU support
Date: Thu, 2 Sep 2010 08:28:26 +0300 [thread overview]
Message-ID: <20100902052826.GB4273@redhat.com> (raw)
In-Reply-To: <1283007298-10942-3-git-send-email-eduard.munteanu@linux360.ro>
On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> PCI devices should access memory through pci_memory_*() instead of
> cpu_physical_memory_*(). This also provides support for translation and
> access checking in case an IOMMU is emulated.
>
> Memory maps are treated as remote IOTLBs (that is, translation caches
> belonging to the IOMMU-aware device itself). Clients (devices) must
> provide callbacks for map invalidation in case these maps are
> persistent beyond the current I/O context, e.g. AIO DMA transfers.
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
I am concerned about adding more pointer chaising on data path.
Could we have
1. an iommu pointer in a device, inherited by secondary buses
when they are created and by devices from buses when they are attached.
2. translation pointer in the iommu instead of the bus
3. pci_memory_XX functions inline, doing fast path for non-iommu case:
if (__builtin_expect(!dev->iommu, 1)
return cpu_memory_rw
> ---
> hw/pci.c | 185 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> hw/pci.h | 74 +++++++++++++++++++++
> hw/pci_internals.h | 12 ++++
> qemu-common.h | 1 +
> 4 files changed, 271 insertions(+), 1 deletions(-)
Almost nothing here is PCI specific.
Can this code go into dma.c/dma.h?
We would have struct DMADevice, APIs like device_dma_write etc.
This would help us get rid of the void * stuff as well?
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 2dc1577..b460905 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
> pci_update_mappings(dev);
> }
>
> +static int pci_no_translate(PCIDevice *iommu,
> + PCIDevice *dev,
> + pcibus_t addr,
> + target_phys_addr_t *paddr,
> + target_phys_addr_t *len,
> + unsigned perms)
> +{
> + *paddr = addr;
> + *len = -1;
> +
> + return 0;
> +}
> +
> static void pci_bus_reset(void *opaque)
> {
> PCIBus *bus = opaque;
> @@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
> {
> qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
> assert(PCI_FUNC(devfn_min) == 0);
> - bus->devfn_min = devfn_min;
> +
> + bus->devfn_min = devfn_min;
> + bus->iommu = NULL;
> + bus->translate = pci_no_translate;
>
> /* host bridge */
> QLIST_INIT(&bus->child);
> @@ -1789,3 +1805,170 @@ static char *pcibus_get_dev_path(DeviceState *dev)
> return strdup(path);
> }
>
> +void pci_register_iommu(PCIDevice *iommu,
> + PCITranslateFunc *translate)
> +{
> + iommu->bus->iommu = iommu;
> + iommu->bus->translate = translate;
> +}
> +
The above seems broken for secondary buses, right? Also, can we use
qdev for initialization in some way, instead of adding more APIs? E.g.
I think it would be nice if we could just use qdev command line flags to
control which bus is behind iommu and which isn't.
> +void pci_memory_rw(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len,
> + int is_write)
> +{
> + int err;
> + unsigned perms;
> + PCIDevice *iommu = dev->bus->iommu;
> + target_phys_addr_t paddr, plen;
> +
> + perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> + while (len) {
> + err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> + if (err)
> + return;
> +
> + /* The translation might be valid for larger regions. */
> + if (plen > len)
> + plen = len;
> +
> + cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> + len -= plen;
> + addr += plen;
> + buf += plen;
> + }
> +}
> +
> +static void pci_memory_register_map(PCIDevice *dev,
> + pcibus_t addr,
> + pcibus_t len,
> + target_phys_addr_t paddr,
> + PCIInvalidateMapFunc *invalidate,
> + void *invalidate_opaque)
> +{
> + PCIMemoryMap *map;
> +
> + map = qemu_malloc(sizeof(PCIMemoryMap));
> + map->addr = addr;
> + map->len = len;
> + map->paddr = paddr;
> + map->invalidate = invalidate;
> + map->invalidate_opaque = invalidate_opaque;
> +
> + QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
> +}
> +
> +static void pci_memory_unregister_map(PCIDevice *dev,
> + target_phys_addr_t paddr,
> + target_phys_addr_t len)
> +{
> + PCIMemoryMap *map;
> +
> + QLIST_FOREACH(map, &dev->memory_maps, list) {
> + if (map->paddr == paddr && map->len == len) {
> + QLIST_REMOVE(map, list);
> + free(map);
> + }
> + }
> +}
> +
> +void pci_memory_invalidate_range(PCIDevice *dev,
> + pcibus_t addr,
> + pcibus_t len)
> +{
> + PCIMemoryMap *map;
> +
> + QLIST_FOREACH(map, &dev->memory_maps, list) {
> + if (ranges_overlap(addr, len, map->addr, map->len)) {
> + map->invalidate(map->invalidate_opaque);
> + QLIST_REMOVE(map, list);
> + free(map);
> + }
> + }
> +}
> +
> +void *pci_memory_map(PCIDevice *dev,
> + PCIInvalidateMapFunc *cb,
> + void *opaque,
> + pcibus_t addr,
> + target_phys_addr_t *len,
> + int is_write)
> +{
> + int err;
> + unsigned perms;
> + PCIDevice *iommu = dev->bus->iommu;
> + target_phys_addr_t paddr, plen;
> +
> + perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> + plen = *len;
> + err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> + if (err)
> + return NULL;
> +
> + /*
> + * If this is true, the virtual region is contiguous,
> + * but the translated physical region isn't. We just
> + * clamp *len, much like cpu_physical_memory_map() does.
> + */
> + if (plen < *len)
> + *len = plen;
> +
> + /* We treat maps as remote TLBs to cope with stuff like AIO. */
> + if (cb)
> + pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
> +
> + return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +
All the above is really only useful for when there is an iommu,
right? So maybe we should shortcut all this if there's no iommu?
> +void pci_memory_unmap(PCIDevice *dev,
> + void *buffer,
> + target_phys_addr_t len,
> + int is_write,
> + target_phys_addr_t access_len)
> +{
> + cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> + pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
> +}
> +
> +#define DEFINE_PCI_LD(suffix, size) \
> +uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr) \
> +{ \
> + int err; \
> + target_phys_addr_t paddr, plen; \
> + \
> + err = dev->bus->translate(dev->bus->iommu, dev, \
> + addr, &paddr, &plen, IOMMU_PERM_READ); \
> + if (err || (plen < size / 8)) \
> + return 0; \
> + \
> + return ld##suffix##_phys(paddr); \
> +}
> +
> +#define DEFINE_PCI_ST(suffix, size) \
> +void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val) \
> +{ \
> + int err; \
> + target_phys_addr_t paddr, plen; \
> + \
> + err = dev->bus->translate(dev->bus->iommu, dev, \
> + addr, &paddr, &plen, IOMMU_PERM_WRITE); \
> + if (err || (plen < size / 8)) \
> + return; \
> + \
> + st##suffix##_phys(paddr, val); \
> +}
> +
> +DEFINE_PCI_LD(ub, 8)
> +DEFINE_PCI_LD(uw, 16)
> +DEFINE_PCI_LD(l, 32)
> +DEFINE_PCI_LD(q, 64)
> +
> +DEFINE_PCI_ST(b, 8)
> +DEFINE_PCI_ST(w, 16)
> +DEFINE_PCI_ST(l, 32)
> +DEFINE_PCI_ST(q, 64)
> +
> diff --git a/hw/pci.h b/hw/pci.h
> index c551f96..3131016 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -172,6 +172,8 @@ struct PCIDevice {
> char *romfile;
> ram_addr_t rom_offset;
> uint32_t rom_bar;
> +
> + QLIST_HEAD(, PCIMemoryMap) memory_maps;
> };
>
> PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> @@ -391,4 +393,76 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
> return !(last2 < first1 || last1 < first2);
> }
>
> +/*
> + * Memory I/O and PCI IOMMU definitions.
> + */
> +
> +#define IOMMU_PERM_READ (1 << 0)
> +#define IOMMU_PERM_WRITE (1 << 1)
> +#define IOMMU_PERM_RW (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
> +
> +typedef int PCIInvalidateMapFunc(void *opaque);
> +typedef int PCITranslateFunc(PCIDevice *iommu,
> + PCIDevice *dev,
> + pcibus_t addr,
> + target_phys_addr_t *paddr,
> + target_phys_addr_t *len,
> + unsigned perms);
> +
> +extern void pci_memory_rw(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len,
> + int is_write);
> +extern void *pci_memory_map(PCIDevice *dev,
> + PCIInvalidateMapFunc *cb,
> + void *opaque,
> + pcibus_t addr,
> + target_phys_addr_t *len,
> + int is_write);
> +extern void pci_memory_unmap(PCIDevice *dev,
> + void *buffer,
> + target_phys_addr_t len,
> + int is_write,
> + target_phys_addr_t access_len);
> +extern void pci_register_iommu(PCIDevice *dev,
> + PCITranslateFunc *translate);
> +extern void pci_memory_invalidate_range(PCIDevice *dev,
> + pcibus_t addr,
> + pcibus_t len);
> +
> +#define DECLARE_PCI_LD(suffix, size) \
> +extern uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
> +
> +#define DECLARE_PCI_ST(suffix, size) \
> +extern void pci_st##suffix(PCIDevice *dev, \
> + pcibus_t addr, \
> + uint##size##_t val);
> +
> +DECLARE_PCI_LD(ub, 8)
> +DECLARE_PCI_LD(uw, 16)
> +DECLARE_PCI_LD(l, 32)
> +DECLARE_PCI_LD(q, 64)
> +
> +DECLARE_PCI_ST(b, 8)
> +DECLARE_PCI_ST(w, 16)
> +DECLARE_PCI_ST(l, 32)
> +DECLARE_PCI_ST(q, 64)
> +
> +static inline void pci_memory_read(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len)
> +{
> + pci_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void pci_memory_write(PCIDevice *dev,
> + pcibus_t addr,
> + const uint8_t *buf,
> + pcibus_t len)
> +{
> + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> +}
> +
> #endif
> diff --git a/hw/pci_internals.h b/hw/pci_internals.h
> index e3c93a3..fb134b9 100644
> --- a/hw/pci_internals.h
> +++ b/hw/pci_internals.h
> @@ -33,6 +33,9 @@ struct PCIBus {
> Keep a count of the number of devices with raised IRQs. */
> int nirq;
> int *irq_count;
> +
> + PCIDevice *iommu;
> + PCITranslateFunc *translate;
> };
Why is translate pointer in a bus? I think it's a work of an iommu?
> struct PCIBridge {
> @@ -44,4 +47,13 @@ struct PCIBridge {
> const char *bus_name;
> };
>
> +struct PCIMemoryMap {
> + pcibus_t addr;
> + pcibus_t len;
> + target_phys_addr_t paddr;
> + PCIInvalidateMapFunc *invalidate;
> + void *invalidate_opaque;
Can we have a structure that encapsulates the mapping
data instead of a void *?
> + QLIST_ENTRY(PCIMemoryMap) list;
> +};
> +
> #endif /* QEMU_PCI_INTERNALS_H */
> diff --git a/qemu-common.h b/qemu-common.h
> index d735235..8b060e8 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
> typedef struct PCIHostState PCIHostState;
> typedef struct PCIExpressHost PCIExpressHost;
> typedef struct PCIBus PCIBus;
> +typedef struct PCIMemoryMap PCIMemoryMap;
> typedef struct PCIDevice PCIDevice;
> typedef struct PCIBridge PCIBridge;
> typedef struct SerialState SerialState;
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2010-09-02 5:34 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-28 14:54 [Qemu-devel] [PATCH 0/7] AMD IOMMU emulation patchset v4 Eduard - Gabriel Munteanu
2010-08-28 14:54 ` [Qemu-devel] [PATCH 1/7] pci: expand tabs to spaces in pci_regs.h Eduard - Gabriel Munteanu
2010-08-31 20:29 ` [Qemu-devel] " Michael S. Tsirkin
2010-08-31 22:58 ` Eduard - Gabriel Munteanu
2010-09-01 10:39 ` Michael S. Tsirkin
2010-08-28 14:54 ` [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support Eduard - Gabriel Munteanu
2010-09-02 5:28 ` Michael S. Tsirkin [this message]
2010-09-02 8:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-09-02 9:49 ` Michael S. Tsirkin
2010-09-04 9:01 ` Blue Swirl
2010-09-05 7:10 ` Michael S. Tsirkin
2010-08-28 14:54 ` [Qemu-devel] [PATCH 3/7] AMD IOMMU emulation Eduard - Gabriel Munteanu
2010-08-28 15:58 ` [Qemu-devel] " Blue Swirl
2010-08-28 21:53 ` Eduard - Gabriel Munteanu
2010-08-29 20:37 ` Blue Swirl
2010-08-30 3:07 ` [Qemu-devel] " Isaku Yamahata
2010-08-30 5:54 ` Eduard - Gabriel Munteanu
2010-08-28 14:54 ` [Qemu-devel] [PATCH 4/7] ide: use the PCI memory access interface Eduard - Gabriel Munteanu
2010-09-02 5:19 ` [Qemu-devel] " Michael S. Tsirkin
2010-09-02 9:12 ` Eduard - Gabriel Munteanu
2010-09-02 9:58 ` Michael S. Tsirkin
2010-09-02 15:01 ` Eduard - Gabriel Munteanu
2010-09-02 15:24 ` Avi Kivity
2010-09-02 15:39 ` Michael S. Tsirkin
2010-09-02 16:07 ` Avi Kivity
2010-09-02 15:31 ` Michael S. Tsirkin
2010-08-28 14:54 ` [Qemu-devel] [PATCH 5/7] rtl8139: " Eduard - Gabriel Munteanu
2010-08-28 14:54 ` [Qemu-devel] [PATCH 6/7] eepro100: " Eduard - Gabriel Munteanu
2010-08-28 14:54 ` [Qemu-devel] [PATCH 7/7] ac97: " Eduard - Gabriel Munteanu
2010-08-28 16:00 ` [Qemu-devel] Re: [PATCH 0/7] AMD IOMMU emulation patchset v4 Blue Swirl
2010-08-29 9:55 ` Joerg Roedel
2010-08-29 20:44 ` Blue Swirl
2010-08-29 22:08 ` [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support Eduard - Gabriel Munteanu
2010-08-29 22:11 ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-09-01 20:10 ` [Qemu-devel] " Stefan Weil
2010-09-02 6:00 ` Michael S. Tsirkin
2010-09-02 9:08 ` Eduard - Gabriel Munteanu
2010-09-02 13:24 ` Anthony Liguori
2010-09-02 8:51 ` Eduard - Gabriel Munteanu
2010-09-02 16:05 ` Stefan Weil
2010-09-02 16:14 ` Eduard - Gabriel Munteanu
2010-09-13 20:01 ` [Qemu-devel] [PATCH RFC] dma_rw.h (was Re: [PATCH 0/7] AMD IOMMU emulation patchset v4) Michael S. Tsirkin
2010-09-13 20:45 ` Anthony Liguori
2010-09-16 7:12 ` Eduard - Gabriel Munteanu
2010-09-16 9:35 ` Michael S. Tsirkin
2010-09-16 7:06 ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-09-16 9:20 ` Michael S. Tsirkin
2010-09-16 11:15 ` Eduard - Gabriel Munteanu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100902052826.GB4273@redhat.com \
--to=mst@redhat.com \
--cc=avi@redhat.com \
--cc=blauwirbel@gmail.com \
--cc=eduard.munteanu@linux360.ro \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=paul@codesourcery.com \
--cc=qemu-devel@nongnu.org \
--cc=yamahata@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).