qemu-devel.nongnu.org archive mirror
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: qemu-devel@nongnu.org
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>,
	Alexander Graf <agraf@suse.de>,
	Alex Williamson <alex.williamson@redhat.com>,
	qemu-ppc@nongnu.org, Gavin Shan <gwshan@linux.vnet.ibm.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH qemu v9 09/13] spapr_pci: Enable vfio-pci hotplug
Date: Fri, 3 Jul 2015 21:39:24 +1000
Message-ID: <5596746C.5040909@ozlabs.ru>
In-Reply-To: <1435922923-10731-10-git-send-email-aik@ozlabs.ru>

Oops, I forgot to cc: Mike, who should see this patch in particular.


On 07/03/2015 09:28 PM, Alexey Kardashevskiy wrote:
> sPAPR IOMMU manages two copies of a TCE table:
> 1) a guest view of the table - this is what emulated devices use and
> this is where H_GET_TCE reads from;
> 2) a hardware TCE table - only present if there is at least one vfio-pci
> device on a PHB; it is updated via a memory listener on the PHB address
> space which forwards map/unmap requests to the vfio-pci IOMMU host driver.
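To make the two-copy arrangement concrete, here is a minimal toy model, assuming a fixed 8-entry window. All names (toy_tce_table, toy_put_tce, toy_get_tce, toy_forward_to_hw) are hypothetical and are not QEMU's sPAPRTCETable API; toy_forward_to_hw only stands in for the memory listener that forwards requests to VFIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy, not QEMU's sPAPRTCETable: one guest view plus an
 * optional hardware mirror that only exists while vfio-pci is present. */
#define NB_TCE 8

typedef struct {
    uint64_t guest[NB_TCE];   /* 1) guest view: H_GET_TCE reads from here */
    uint64_t hw[NB_TCE];      /* 2) stand-in for the host IOMMU table */
    bool has_hw;              /* true once a vfio-pci device is on the PHB */
    unsigned hw_updates;      /* number of map/unmap requests forwarded */
} toy_tce_table;

/* Stand-in for the memory listener forwarding a map/unmap to VFIO. */
static void toy_forward_to_hw(toy_tce_table *t, size_t idx, uint64_t tce)
{
    t->hw[idx] = tce;
    t->hw_updates++;
}

/* H_PUT_TCE handled in userspace: update the guest view, then mirror it. */
static int toy_put_tce(toy_tce_table *t, size_t idx, uint64_t tce)
{
    if (idx >= NB_TCE) {
        return -1;            /* IOBA outside the DMA window */
    }
    t->guest[idx] = tce;
    if (t->has_hw) {
        toy_forward_to_hw(t, idx, tce);
    }
    return 0;
}

/* H_GET_TCE always answers from the guest view. */
static uint64_t toy_get_tce(const toy_tce_table *t, size_t idx)
{
    assert(idx < NB_TCE);
    return t->guest[idx];
}
```

The point of the toy is that the hardware copy is only kept in sync if every H_PUT_TCE passes through the userspace path that does the forwarding.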
>
> At the moment, the presence of vfio-pci devices on a bus affects the way
> the guest view table is allocated. If there is no vfio-pci device on a PHB
> and the host kernel supports KVM acceleration of H_PUT_TCE, the table
> is allocated in KVM. However, if there is a vfio-pci device, the table
> has to be allocated in userspace because we do not yet support KVM
> acceleration for such devices.
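The allocation rule above can be sketched as a single decision function. This is a hypothetical helper (toy_pick_alloc and its parameters are illustrative names), assuming KVM support is known up front; in the patch the KVM path is simply attempted via kvmppc_create_spapr_tce() when `kvm_enabled() && !force_userspace && !(window_size >> 32)`, and userspace is the fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: where should the guest view table live? */
typedef enum { TCE_ALLOC_KVM, TCE_ALLOC_USERSPACE } tce_alloc_where;

static tce_alloc_where toy_pick_alloc(bool kvm_on, bool kvm_has_tce_accel,
                                      bool phb_has_vfio, uint32_t nb_table,
                                      uint32_t page_shift)
{
    assert(page_shift < 64);  /* avoid an undefined shift below */
    uint64_t window_size = (uint64_t)nb_table << page_shift;

    /* Mirrors the patch's !(window_size >> 32) check: KVM path only
     * for windows smaller than 4GiB, and only without vfio-pci. */
    if (kvm_on && kvm_has_tce_accel && !phb_has_vfio && !(window_size >> 32)) {
        return TCE_ALLOC_KVM;
    }
    return TCE_ALLOC_USERSPACE;
}
```

For example, 0x10000 entries of 4KiB pages (a 256MiB window) on a PHB without vfio-pci would go to KVM, while the same window on a PHB with vfio-pci would be allocated in userspace.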
>
> When a vfio-pci device is hotplugged and there were no vfio-pci devices
> before, the guest view table may have been allocated by KVM, which
> means that H_PUT_TCE is handled by the host kernel; since we
> do not support vfio-pci in KVM, the hardware table will not be updated.
>
> This reallocates the guest view table in QEMU if the first vfio-pci
> device has just been plugged. spapr_tce_realloc_userspace() handles this.
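A minimal sketch of the "first vfio-pci device" transition and the move to userspace, assuming a heap-backed table. toy_tcet, toy_needs_sync and toy_realloc_userspace are hypothetical; in_kernel stands in for `tcet->fd >= 0`. Unlike the patch, which replays entries through put_tce_emu() so the IOMMU listeners are notified, this sketch only copies them, and it keeps the old table if allocation fails (the patch does not check this).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a guest view table that may be KVM-owned. */
typedef struct {
    uint64_t *table;
    uint32_t nb_table;
    bool in_kernel;           /* stands in for tcet->fd >= 0 */
} toy_tcet;

/* Only the "no VFIO before, VFIO now" transition needs any work. */
static bool toy_needs_sync(bool had_vfio, bool has_vfio)
{
    return !had_vfio && has_vfio;
}

/* Move the table into userspace, copying the existing entries across. */
static int toy_realloc_userspace(toy_tcet *t)
{
    assert(t != NULL && t->nb_table > 0);
    uint64_t *fresh = calloc(t->nb_table, sizeof(*fresh));

    if (!fresh) {
        return -1;            /* keep the old table on failure */
    }
    if (t->table) {
        memcpy(fresh, t->table, t->nb_table * sizeof(*fresh));
    }
    free(t->table);           /* a real KVM table would be unmapped/closed */
    t->table = fresh;
    t->in_kernel = false;
    return 0;
}
```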
>
> This also replays all the mappings to make sure that the tables are in sync.
> This will not have a visible effect though, as for a new device
> the guest kernel allocates and maps new addresses, and therefore
> existing mappings from emulated devices will not be used by vfio-pci
> devices.
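The replay walks the whole table and re-issues every entry at its IOBA, advancing by one IOMMU page per entry, as spapr_tce_do_replay() does with put_tce_emu(). The sketch below is a standalone version with a hypothetical callback type (toy_put_fn) in place of put_tce_emu(), plus an example sink that records calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for put_tce_emu(); returns 0 on success. */
typedef long (*toy_put_fn)(void *opaque, uint64_t ioba, uint64_t tce);

/* Re-issue every entry, advancing the IOBA by one IOMMU page each time. */
static long toy_replay(const uint64_t *table, uint32_t nb_table,
                       uint64_t bus_offset, uint32_t page_shift,
                       toy_put_fn put, void *opaque)
{
    assert(page_shift < 64);
    uint64_t ioba = bus_offset, pgsz = 1ULL << page_shift;
    long ret = 0;

    for (uint32_t i = 0; i < nb_table; ++i, ioba += pgsz) {
        ret = put(opaque, ioba, table[i]);
        if (ret) {
            break;            /* stop at the first failing entry */
        }
    }
    return ret;
}

/* Example sink: remembers the last IOBA seen and counts calls. */
typedef struct { uint64_t last_ioba; unsigned calls; } toy_sink;

static long toy_record(void *opaque, uint64_t ioba, uint64_t tce)
{
    toy_sink *s = opaque;
    (void)tce;
    s->last_ioba = ioba;
    s->calls++;
    return 0;
}
```

With a 4-entry table at bus offset 0x80000000 and 4KiB pages, the last entry is replayed at 0x80003000.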
>
> This adds calls to spapr_phb_dma_capabilities_update() (via
> spapr_phb_hotplug_dma_sync()) to the PCI hotplug hooks.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v9:
> * spapr_phb_hotplug_dma_sync() enumerates TCE tables explicitly rather than
> via object_child_foreach()
> * spapr_phb_hotplug_dma_sync() does memory_region_del_subregion() +
> memory_region_add_subregion(), as otherwise vfio_listener_region_del() is not
> called and we end up with vfio_iommu_map_notify registered twice
> if we do hotplug+hotunplug+hotplug of the same device (comments welcome!).
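A toy illustration of the double-registration problem, assuming a listener that registers one notifier per region_add and drops it per region_del. toy_listener, toy_region_add, toy_region_del and toy_sync are hypothetical and only model the counting, not QEMU's memory API or VFIO's listener.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical listener: one notifier per region_add, dropped on region_del. */
typedef struct {
    int notifiers;            /* registered map/unmap notifiers */
    bool mapped;              /* is the IOMMU region currently added? */
} toy_listener;

static void toy_region_add(toy_listener *l)
{
    l->notifiers++;
    l->mapped = true;
}

static void toy_region_del(toy_listener *l)
{
    assert(l->notifiers > 0);
    l->notifiers--;
    l->mapped = false;
}

/* Sync step: optionally remove the subregion before re-adding it. */
static void toy_sync(toy_listener *l, bool del_first)
{
    if (del_first && l->mapped) {
        toy_region_del(l);
    }
    toy_region_add(l);
}
```

Running the sync twice without the delete leaves two notifiers registered; with delete-then-add there is only ever one.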
> * moved spapr_phb_hotplug_dma_sync() on the unplug event to RCU: before calling
> spapr_phb_hotplug_dma_sync(), we need VFIO to release the container, otherwise
> spapr_phb_dma_capabilities_update() will decide that the PHB still has a VFIO device.
> The actual VFIO PCI device release happens from RCU, and since our callback is
> queued later, it is executed later and we are good.
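The ordering argument relies on deferred callbacks running in the order they were queued. Below is a single-threaded toy of such a FIFO queue, assuming FIFO semantics as the changelog does; toy_queue, toy_defer and toy_drain are hypothetical and not QEMU's call_rcu1() implementation. Note that in this toy a head must not be queued again before its callback has run (doing so resets its next pointer and drops later entries), which may be worth double-checking for the single rcu head this patch embeds in sPAPRPHBState if two unplugs happen back to back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded stand-in for a deferred-callback queue. */
struct toy_head {
    struct toy_head *next;
    void (*func)(struct toy_head *head);
};

typedef struct {
    struct toy_head *first;
    struct toy_head **tail;
} toy_queue;

static void toy_queue_init(toy_queue *q)
{
    q->first = NULL;
    q->tail = &q->first;
}

/* Enqueue at the tail, like deferring work until after a grace period. */
static void toy_defer(toy_queue *q, struct toy_head *h,
                      void (*func)(struct toy_head *))
{
    h->next = NULL;
    h->func = func;
    *q->tail = h;
    q->tail = &h->next;
}

/* Run callbacks strictly in enqueue order. */
static void toy_drain(toy_queue *q)
{
    while (q->first) {
        struct toy_head *h = q->first;
        q->first = h->next;
        if (!q->first) {
            q->tail = &q->first;
        }
        assert(h->func != NULL);
        h->func(h);
    }
}

/* Example callbacks that log their order of execution. */
static int toy_log[4];
static int toy_log_len;

static void toy_vfio_release(struct toy_head *h) { (void)h; toy_log[toy_log_len++] = 1; }
static void toy_dma_sync(struct toy_head *h) { (void)h; toy_log[toy_log_len++] = 2; }
```

Queuing the VFIO release first and the DMA sync second makes the sync observe the PHB after the container is gone.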
> ---
>   hw/ppc/spapr_iommu.c        | 51 ++++++++++++++++++++++++++++++++++++++++++---
>   hw/ppc/spapr_pci.c          | 49 +++++++++++++++++++++++++++++++++++++++++++
>   include/hw/pci-host/spapr.h |  1 +
>   include/hw/ppc/spapr.h      |  2 ++
>   trace-events                |  2 ++
>   5 files changed, 102 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 45c00d8..2d99c3b 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -78,12 +78,13 @@ static uint64_t *spapr_tce_alloc_table(uint32_t liobn,
>                                          uint32_t nb_table,
>                                          uint32_t page_shift,
>                                          int *fd,
> -                                       bool vfio_accel)
> +                                       bool vfio_accel,
> +                                       bool force_userspace)
>   {
>       uint64_t *table = NULL;
>       uint64_t window_size = (uint64_t)nb_table << page_shift;
>
> -    if (kvm_enabled() && !(window_size >> 32)) {
> +    if (kvm_enabled() && !force_userspace && !(window_size >> 32)) {
>           table = kvmppc_create_spapr_tce(liobn, window_size, fd, vfio_accel);
>       }
>
> @@ -222,7 +223,8 @@ static void spapr_tce_table_do_enable(sPAPRTCETable *tcet, bool vfio_accel)
>                                           tcet->nb_table,
>                                           tcet->page_shift,
>                                           &tcet->fd,
> -                                        vfio_accel);
> +                                        vfio_accel,
> +                                        false);
>
>       memory_region_set_size(&tcet->iommu,
>                              (uint64_t)tcet->nb_table << tcet->page_shift);
> @@ -495,6 +497,49 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>       return 0;
>   }
>
> +static int spapr_tce_do_replay(sPAPRTCETable *tcet, uint64_t *table)
> +{
> +    target_ulong ioba = tcet->bus_offset, pgsz = (1ULL << tcet->page_shift);
> +    long i, ret = 0;
> +
> +    for (i = 0; i < tcet->nb_table; ++i, ioba += pgsz) {
> +        ret = put_tce_emu(tcet, ioba, table[i]);
> +        if (ret) {
> +            break;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +int spapr_tce_replay(sPAPRTCETable *tcet)
> +{
> +    return spapr_tce_do_replay(tcet, tcet->table);
> +}
> +
> +int spapr_tce_realloc_userspace(sPAPRTCETable *tcet, bool replay)
> +{
> +    int ret = 0, oldfd;
> +    uint64_t *oldtable;
> +
> +    oldtable = tcet->table;
> +    oldfd = tcet->fd;
> +    tcet->table = spapr_tce_alloc_table(tcet->liobn,
> +                                        tcet->nb_table,
> +                                        tcet->page_shift,
> +                                        &tcet->fd,
> +                                        false,
> +                                        true); /* force_userspace */
> +
> +    if (replay) {
> +        ret = spapr_tce_do_replay(tcet, oldtable);
> +    }
> +
> +    spapr_tce_free_table(oldtable, oldfd, tcet->nb_table);
> +
> +    return ret;
> +}
> +
>   int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>                         sPAPRTCETable *tcet)
>   {
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 76c988f..dca747f 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -827,6 +827,45 @@ int spapr_phb_dma_reset(sPAPRPHBState *sphb)
>       return 0;
>   }
>
> +static int spapr_phb_hotplug_dma_sync(sPAPRPHBState *sphb)
> +{
> +    int ret = 0, i;
> +    bool had_vfio = sphb->has_vfio;
> +    sPAPRTCETable *tcet;
> +
> +    spapr_phb_dma_capabilities_update(sphb);
> +
> +    /* We only update DMA config if there was no VFIO and now we got one */
> +    if (had_vfio || !sphb->has_vfio) {
> +        return 0;
> +    }
> +
> +    for (i = 0; i < SPAPR_PCI_DMA_MAX_WINDOWS; ++i) {
> +        tcet = spapr_tce_find_by_liobn(SPAPR_PCI_LIOBN(sphb->index, i));
> +        if (!tcet || !tcet->enabled) {
> +            continue;
> +        }
> +        if (tcet->fd >= 0) {
> +            /*
> +             * We got first vfio-pci device on accelerated table.
> +             * VFIO acceleration is not possible.
> +             * Reallocate table in userspace and replay mappings.
> +             */
> +            ret = spapr_tce_realloc_userspace(tcet, true);
> +            trace_spapr_pci_dma_realloc_update(tcet->liobn, ret);
> +        } else {
> +            /* There was no acceleration, so just replay mappings. */
> +            ret = spapr_tce_replay(tcet);
> +            trace_spapr_pci_dma_update(tcet->liobn, ret);
> +        }
> +        if (ret) {
> +            break;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
>   /* Macros to operate with address in OF binding to PCI */
>   #define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
>   #define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> @@ -1106,6 +1145,7 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
>               error_setg(errp, "Failed to create pci child device tree node");
>               goto out;
>           }
> +        spapr_phb_hotplug_dma_sync(phb);
>       }
>
>       drck->attach(drc, DEVICE(pdev),
> @@ -1116,6 +1156,12 @@ out:
>       }
>   }
>
> +static void spapr_phb_remove_sync_dma(struct rcu_head *head)
> +{
> +    sPAPRPHBState *sphb = container_of(head, sPAPRPHBState, rcu);
> +    spapr_phb_hotplug_dma_sync(sphb);
> +}
> +
>   static void spapr_phb_remove_pci_device_cb(DeviceState *dev, void *opaque)
>   {
>       /* some version guests do not wait for completion of a device
> @@ -1130,6 +1176,9 @@ static void spapr_phb_remove_pci_device_cb(DeviceState *dev, void *opaque)
>        */
>       pci_device_reset(PCI_DEVICE(dev));
>       object_unparent(OBJECT(dev));
> +
> +    /* Actual VFIO device release happens from RCU so postpone DMA update */
> +    call_rcu1(&((sPAPRPHBState *)opaque)->rcu, spapr_phb_remove_sync_dma);
>   }
>
>   static void spapr_phb_remove_pci_device(sPAPRDRConnector *drc,
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index bf66315..8b007aa 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -61,6 +61,7 @@ typedef struct spapr_pci_msi_mig {
>
>   struct sPAPRPHBState {
>       PCIHostState parent_obj;
> +    struct rcu_head rcu;
>
>       uint32_t index;
>       uint64_t buid;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index e32e787..4645f16 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -588,6 +588,8 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>                    uint32_t liobn, uint64_t window, uint32_t size);
>   int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>                         sPAPRTCETable *tcet);
> +int spapr_tce_replay(sPAPRTCETable *tcet);
> +int spapr_tce_realloc_userspace(sPAPRTCETable *tcet, bool replay);
>   void spapr_pci_switch_vga(bool big_endian);
>   void spapr_hotplug_req_add_event(sPAPRDRConnector *drc);
>   void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc);
> diff --git a/trace-events b/trace-events
> index a93af9a..a994019 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1300,6 +1300,8 @@ spapr_pci_rtas_ibm_query_interrupt_source_number(unsigned ioa, unsigned intr) "q
>   spapr_pci_msi_write(uint64_t addr, uint64_t data, uint32_t dt_irq) "@%"PRIx64"<=%"PRIx64" IRQ %u"
>   spapr_pci_lsi_set(const char *busname, int pin, uint32_t irq) "%s PIN%d IRQ %u"
>   spapr_pci_msi_retry(unsigned config_addr, unsigned req_num, unsigned max_irqs) "Guest device at %x asked %u, have only %u"
> +spapr_pci_dma_update(uint64_t liobn, long ret) "liobn=%"PRIx64" ret=%ld"
> +spapr_pci_dma_realloc_update(uint64_t liobn, long ret) "liobn=%"PRIx64" ret=%ld"
>
>   # hw/pci/pci.c
>   pci_update_mappings_del(void *d, uint32_t bus, uint32_t func, uint32_t slot, int bar, uint64_t addr, uint64_t size) "d=%p %02x:%02x.%x %d,%#"PRIx64"+%#"PRIx64
>


-- 
Alexey

Thread overview: 15+ messages
2015-07-03 11:28 [Qemu-devel] [PATCH qemu v9 00/13] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 01/13] vmstate: Define VARRAY with VMS_ALLOC Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 02/13] spapr_pci: Convert finish_realize() to dma_capabilities_update()+dma_init_window() Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 03/13] spapr_iommu: Move table allocation to helpers Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 04/13] spapr_iommu: Introduce "enabled" state for TCE table Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 05/13] spapr_iommu: Remove vfio_accel flag from sPAPRTCETable Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 06/13] spapr_iommu: Add root memory region Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 07/13] spapr_pci: Do complete reset of DMA config when resetting PHB Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 08/13] spapr_vfio_pci: Remove redundant spapr-pci-vfio-host-bridge Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 09/13] spapr_pci: Enable vfio-pci hotplug Alexey Kardashevskiy
2015-07-03 11:39   ` Alexey Kardashevskiy [this message]
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 10/13] spapr_pci_vfio: Enable multiple groups per container Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 11/13] linux headers update for DDW on SPAPR Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 12/13] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) Alexey Kardashevskiy
2015-07-03 11:28 ` [Qemu-devel] [PATCH qemu v9 13/13] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
