From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Alex Williamson <alex.williamson@redhat.com>,
qemu-ppc@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH qemu v14 17/18] vfio/spapr: Use VFIO_SPAPR_TCE_v2_IOMMU
Date: Tue, 22 Mar 2016 16:54:07 +1100 [thread overview]
Message-ID: <56F0DDFF.7050308@ozlabs.ru> (raw)
In-Reply-To: <20160322051449.GG23586@voom.redhat.com>
On 03/22/2016 04:14 PM, David Gibson wrote:
> On Mon, Mar 21, 2016 at 06:47:05PM +1100, Alexey Kardashevskiy wrote:
>> New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
>> This adds ability to VFIO common code to dynamically allocate/remove
>> DMA windows in the host kernel when new VFIO container is added/removed.
>>
>> This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to vfio_listener_region_add
>> and adds just created IOMMU into the host IOMMU list; the opposite
>> action is taken in vfio_listener_region_del.
>>
>> When creating a new window, this uses euristic to decide on the TCE table
>> levels number.
>>
>> This should cause no guest visible change in behavior.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v14:
>> * new to the series
>>
>> ---
>> TODO:
>> * export levels to PHB
>> ---
>> hw/vfio/common.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>> trace-events | 2 ++
>> 2 files changed, 105 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 4e873b7..421d6eb 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -279,6 +279,14 @@ static int vfio_host_iommu_add(VFIOContainer *container,
>> return 0;
>> }
>>
>> +static void vfio_host_iommu_del(VFIOContainer *container, hwaddr min_iova)
>> +{
>> + VFIOHostIOMMU *hiommu = vfio_host_iommu_lookup(container, min_iova, 0x1000);
>
> The hard-coded 0x1000 looks dubious..
Well, that's the minimal page size...
>
>> + g_assert(hiommu);
>> + QLIST_REMOVE(hiommu, hiommu_next);
>> +}
>> +
>> static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>> {
>> return (!memory_region_is_ram(section->mr) &&
>> @@ -392,6 +400,61 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> }
>> end = int128_get64(llend);
>>
>> + if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
>
> I think this would be clearer split out into a helper function,
> vfio_create_host_window() or something.
It is rather vfio_spapr_create_host_window() and we were avoiding
xxx_spapr_xxx so far. I'd cut-n-paste the SPAPR PCI AS listener to a
separate file but this usually triggers more discussion and never ends well.
>> + unsigned entries, pages;
>> + struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
>> +
>> + g_assert(section->mr->iommu_ops);
>> + g_assert(memory_region_is_iommu(section->mr));
>
> I don't think you need these asserts. AFAICT the same logic should
> work if a RAM MR was added directly to PCI address space - this would
> create the new host window, then the existing code for adding a RAM MR
> would map that block of RAM statically into the new window.
In what configuration/machine can we do that on SPAPR?
>> + trace_vfio_listener_region_add_iommu(iova, end - 1);
>> + /*
>> + * FIXME: For VFIO iommu types which have KVM acceleration to
>> + * avoid bouncing all map/unmaps through qemu this way, this
>> + * would be the right place to wire that up (tell the KVM
>> + * device emulation the VFIO iommu handles to use).
>> + */
>> + create.window_size = memory_region_size(section->mr);
>> + create.page_shift =
>> + ctz64(section->mr->iommu_ops->get_page_sizes(section->mr));
>
> Ah.. except that I guess you'd need to fall back to host page size
> here to handle a RAM MR.
Can you give an example of such RAM MR being added to PCI AS on SPAPR?
>> + /*
>> + * SPAPR host supports multilevel TCE tables, there is some
>> + * euristic to decide how many levels we want for our table:
>> + * 0..64 = 1; 65..4096 = 2; 4097..262144 = 3; 262145.. = 4
>> + */
>> + entries = create.window_size >> create.page_shift;
>> + pages = (entries * sizeof(uint64_t)) / getpagesize();
>> + create.levels = ctz64(pow2ceil(pages) - 1) / 6 + 1;
>> +
>> + ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
>> + if (ret) {
>> + error_report("Failed to create a window, ret = %d (%m)", ret);
>> + goto fail;
>> + }
>> +
>> + if (create.start_addr != section->offset_within_address_space ||
>> + vfio_host_iommu_lookup(container, create.start_addr,
>> + create.start_addr + create.window_size - 1)) {
>
> Under what circumstances can this trigger? Is the kernel ioctl
> allowed to return a different window start address than the one
> requested?
You already asked this some time ago :) The userspace cannot request
address, the host kernel returns one.
> The second check looks very strange - if it returns true doesn't that
> mean you *do* have host window which can accomodate this guest region,
> which is what you want?
This should not happen, this is what this check is for. Can make it
assert() or something like this.
>
>> + struct vfio_iommu_spapr_tce_remove remove = {
>> + .argsz = sizeof(remove),
>> + .start_addr = create.start_addr
>> + };
>> + error_report("Host doesn't support DMA window at %"HWADDR_PRIx", must be %"PRIx64,
>> + section->offset_within_address_space,
>> + create.start_addr);
>> + ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
>> + ret = -EINVAL;
>> + goto fail;
>> + }
>> + trace_vfio_spapr_create_window(create.page_shift,
>> + create.window_size,
>> + create.start_addr);
>> +
>> + vfio_host_iommu_add(container, create.start_addr,
>> + create.start_addr + create.window_size - 1,
>> + 1ULL << create.page_shift);
>> + }
>> +
>> if (!vfio_host_iommu_lookup(container, iova, end - 1)) {
>> error_report("vfio: IOMMU container %p can't map guest IOVA region"
>> " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
>> @@ -525,6 +588,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
>> container, iova, end - iova, ret);
>> }
>>
>> + if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
>> + struct vfio_iommu_spapr_tce_remove remove = {
>> + .argsz = sizeof(remove),
>> + .start_addr = section->offset_within_address_space,
>> + };
>> + ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
>> + if (ret) {
>> + error_report("Failed to remove window at %"PRIx64,
>> + remove.start_addr);
>> + }
>> +
>> + vfio_host_iommu_del(container, section->offset_within_address_space);
>> +
>> + trace_vfio_spapr_remove_window(remove.start_addr);
>> + }
>> +
>> if (iommu && iommu->iommu_ops && iommu->iommu_ops->vfio_stop) {
>> iommu->iommu_ops->vfio_stop(section->mr);
>> }
>> @@ -928,11 +1007,30 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>> goto listener_release_exit;
>> }
>>
>> - /* The default table uses 4K pages */
>> - vfio_host_iommu_add(container, info.dma32_window_start,
>> - info.dma32_window_start +
>> - info.dma32_window_size - 1,
>> - 0x1000);
>> + if (v2) {
>> + /*
>> + * There is a default window in just created container.
>> + * To make region_add/del simpler, we better remove this
>> + * window now and let those iommu_listener callbacks
>> + * create/remove them when needed.
>> + */
>> + struct vfio_iommu_spapr_tce_remove remove = {
>> + .argsz = sizeof(remove),
>> + .start_addr = info.dma32_window_start,
>> + };
>> + ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
>> + if (ret) {
>> + error_report("vfio: VFIO_IOMMU_SPAPR_TCE_REMOVE failed: %m");
>> + ret = -errno;
>> + goto free_container_exit;
>> + }
>> + } else {
>> + /* The default table uses 4K pages */
>> + vfio_host_iommu_add(container, info.dma32_window_start,
>> + info.dma32_window_start +
>> + info.dma32_window_size - 1,
>> + 0x1000);
>> + }
>> } else {
>> error_report("vfio: No available IOMMU models");
>> ret = -EINVAL;
>> diff --git a/trace-events b/trace-events
>> index cc619e1..f2b75a3 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1736,6 +1736,8 @@ vfio_region_finalize(const char *name, int index) "Device %s, region %d"
>> vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps enabled: %d"
>> vfio_ram_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
>> vfio_ram_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
>> +vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
>> +vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
>>
>> # hw/vfio/platform.c
>> vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
>
--
Alexey
next prev parent reply other threads:[~2016-03-22 5:54 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-21 7:46 [Qemu-devel] [PATCH qemu v14 00/18] spapr: vfio: Enable Dynamic DMA windows (DDW) Alexey Kardashevskiy
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 01/18] memory: Fix IOMMU replay base address Alexey Kardashevskiy
2016-03-22 0:49 ` David Gibson
2016-03-22 3:12 ` Alexey Kardashevskiy
2016-03-22 3:26 ` David Gibson
2016-03-22 4:28 ` Alexey Kardashevskiy
2016-03-22 4:59 ` David Gibson
2016-03-22 7:19 ` Alexey Kardashevskiy
2016-03-22 23:07 ` David Gibson
2016-03-23 10:58 ` Paolo Bonzini
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 02/18] vmstate: Define VARRAY with VMS_ALLOC Alexey Kardashevskiy
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 03/18] spapr_pci: Move DMA window enablement to a helper Alexey Kardashevskiy
2016-03-22 1:02 ` David Gibson
2016-03-22 3:17 ` Alexey Kardashevskiy
2016-03-22 3:28 ` David Gibson
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 04/18] spapr_iommu: Move table allocation to helpers Alexey Kardashevskiy
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 05/18] spapr_iommu: Introduce "enabled" state for TCE table Alexey Kardashevskiy
2016-03-22 1:11 ` David Gibson
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 06/18] spapr_iommu: Finish renaming vfio_accel to need_vfio Alexey Kardashevskiy
2016-03-22 1:18 ` David Gibson
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 07/18] spapr_iommu: Realloc table during migration Alexey Kardashevskiy
2016-03-22 1:23 ` David Gibson
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 08/18] spapr_iommu: Migrate full state Alexey Kardashevskiy
2016-03-22 1:31 ` David Gibson
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 09/18] spapr_iommu: Add root memory region Alexey Kardashevskiy
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 10/18] spapr_pci: Reset DMA config on PHB reset Alexey Kardashevskiy
2016-03-21 7:46 ` [Qemu-devel] [PATCH qemu v14 11/18] memory: Add reporting of supported page sizes Alexey Kardashevskiy
2016-03-22 3:02 ` David Gibson
2016-03-21 7:47 ` [Qemu-devel] [PATCH qemu v14 12/18] vfio: Check that IOMMU MR translates to system address space Alexey Kardashevskiy
2016-03-22 3:05 ` David Gibson
2016-03-22 15:47 ` Alex Williamson
2016-03-23 0:43 ` David Gibson
2016-03-23 0:44 ` Alexey Kardashevskiy
2016-03-21 7:47 ` [Qemu-devel] [PATCH qemu v14 13/18] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) Alexey Kardashevskiy
2016-03-22 4:04 ` David Gibson
2016-03-21 7:47 ` [Qemu-devel] [PATCH qemu v14 14/18] spapr_pci: Add and export DMA resetting helper Alexey Kardashevskiy
2016-03-21 7:47 ` [Qemu-devel] [PATCH qemu v14 15/18] vfio: Add host side IOMMU capabilities Alexey Kardashevskiy
2016-03-22 4:20 ` David Gibson
2016-03-22 6:47 ` Alexey Kardashevskiy
2016-03-21 7:47 ` [Qemu-devel] [PATCH qemu v14 16/18] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping being used by VFIO Alexey Kardashevskiy
2016-03-22 4:45 ` David Gibson
2016-03-22 6:24 ` Alexey Kardashevskiy
2016-03-22 10:22 ` David Gibson
2016-03-21 7:47 ` [Qemu-devel] [PATCH qemu v14 17/18] vfio/spapr: Use VFIO_SPAPR_TCE_v2_IOMMU Alexey Kardashevskiy
2016-03-22 5:14 ` David Gibson
2016-03-22 5:54 ` Alexey Kardashevskiy [this message]
2016-03-23 1:08 ` David Gibson
2016-03-23 2:12 ` Alexey Kardashevskiy
2016-03-23 2:53 ` David Gibson
2016-03-23 3:06 ` Alexey Kardashevskiy
2016-03-23 6:03 ` David Gibson
2016-03-24 0:03 ` Alexey Kardashevskiy
2016-03-24 9:10 ` Alexey Kardashevskiy
2016-03-29 5:30 ` David Gibson
2016-03-29 5:44 ` Alexey Kardashevskiy
2016-03-29 6:44 ` David Gibson
2016-03-21 7:47 ` [Qemu-devel] [PATCH qemu v14 18/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
2016-03-23 2:13 ` David Gibson
2016-03-23 3:28 ` Alexey Kardashevskiy
2016-03-23 6:11 ` David Gibson
2016-03-24 2:32 ` Alexey Kardashevskiy
2016-03-29 5:22 ` David Gibson
2016-03-29 6:23 ` Alexey Kardashevskiy
2016-03-31 3:19 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56F0DDFF.7050308@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.