All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
	Alexander Graf <agraf@suse.de>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [Qemu-devel] [PATCH qemu v17 10/12] vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)
Date: Wed, 8 Jun 2016 15:56:02 +1000	[thread overview]
Message-ID: <20160608055602.GW9226@voom.fritz.box> (raw)
In-Reply-To: <bd93c771-34d7-0481-8c74-a8b5bd7f4e6b@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 12998 bytes --]

On Mon, Jun 06, 2016 at 04:45:50PM +1000, Alexey Kardashevskiy wrote:
> On 03/06/16 17:37, David Gibson wrote:
> > On Wed, Jun 01, 2016 at 06:57:41PM +1000, Alexey Kardashevskiy wrote:
> >> New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
> >> This adds ability to VFIO common code to dynamically allocate/remove
> >> DMA windows in the host kernel when new VFIO container is added/removed.
> >>
> >> This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to vfio_listener_region_add
> >> and adds just created IOMMU into the host IOMMU list; the opposite
> >> action is taken in vfio_listener_region_del.
> >>
> >> When creating a new window, this uses heuristic to decide on the TCE table
> >> levels number.
> >>
> >> This should cause no guest visible change in behavior.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >> Changes:
> >> v17:
> >> * moved spapr window create/remove helpers to separate file
> >> * added hw_error() if vfio_host_win_del() failed
> >>
> >> v16:
> >> * used memory_region_iommu_get_page_sizes() in vfio_listener_region_add()
> >> * enforced no intersections between windows
> >>
> >> v14:
> >> * new to the series
> >> ---
> >>  hw/vfio/common.c              | 76 +++++++++++++++++++++++++++++++++++++------
> >>  hw/vfio/spapr.c               | 70 +++++++++++++++++++++++++++++++++++++++
> >>  include/hw/vfio/vfio-common.h |  6 ++++
> >>  trace-events                  |  2 ++
> >>  4 files changed, 144 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 52b08fd..7f55c26 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -275,6 +275,18 @@ static void vfio_host_win_add(VFIOContainer *container,
> >>      QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
> >>  }
> >>  
> >> +static int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova)
> >> +{
> >> +    VFIOHostDMAWindow *hostwin = vfio_host_win_lookup(container, min_iova, 1);
> > 
> > Hrm.. and for this case I think you want exact match, rather than
> > looking for range inclusion.
> 
> I suppose so, I'll change this.
> 
> 
> >> +
> >> +    if (!hostwin) {
> >> +        return -1;
> >> +    }
> >> +    QLIST_REMOVE(hostwin, hostwin_next);
> >> +
> >> +    return 0;
> >> +}
> >> +
> >>  static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> >>  {
> >>      return (!memory_region_is_ram(section->mr) &&
> >> @@ -388,6 +400,30 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>      }
> >>      end = int128_get64(int128_sub(llend, int128_one()));
> >>  
> >> +    if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
> >> +        VFIOHostDMAWindow *hostwin;
> >> +        hwaddr pgsize = 0;
> >> +
> >> +        /* For now intersections are not allowed, we may relax this later */
> >> +        QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> >> +            if (ranges_overlap(hostwin->min_iova,
> >> +                               hostwin->max_iova - hostwin->min_iova + 1,
> >> +                               section->offset_within_address_space,
> >> +                               int128_get64(section->size))) {
> >> +                goto fail;
> >> +            }
> >> +        }
> >> +
> >> +        ret = vfio_spapr_create_window(container, section, &pgsize);
> >> +        if (ret) {
> >> +            goto fail;
> >> +        }
> >> +
> >> +        vfio_host_win_add(container, section->offset_within_address_space,
> >> +                          section->offset_within_address_space +
> >> +                          int128_get64(section->size) - 1, pgsize);
> >> +    }
> >> +
> >>      if (!vfio_host_win_lookup(container, iova, end)) {
> >>          error_report("vfio: IOMMU container %p can't map guest IOVA region"
> >>                       " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> >> @@ -523,6 +559,18 @@ static void vfio_listener_region_del(MemoryListener *listener,
> >>                       "0x%"HWADDR_PRIx") = %d (%m)",
> >>                       container, iova, int128_get64(llsize), ret);
> >>      }
> >> +
> >> +    if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
> >> +        vfio_spapr_remove_window(container,
> >> +                                 section->offset_within_address_space);
> > 
> > Should check for error here.
> 
> 
> And do what here? vfio_spapr_remove_window() calls error_report() already
> and I still want to remove the host window here.

Hmm.. yes, I guess so.  Probably best to have a comment here saying
why it's safe to ignore an error you should usually test for.

> > 
> >> +        if (vfio_host_win_del(container,
> >> +                              section->offset_within_address_space) < 0) {
> >> +            hw_error("%s: Cannot delete missing window at %"HWADDR_PRIx,
> >> +                     __func__, section->offset_within_address_space);
> > 
> > Personally I think assert() would be better here, but Alex doesn't
> > like them so I'm ok with this.
> > 
> >> +        }
> >> +
> >> +        trace_vfio_spapr_remove_window(section->offset_within_address_space);
> >> +    }
> >>  }
> >>  
> >>  static const MemoryListener vfio_memory_listener = {
> >> @@ -960,11 +1008,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >>              }
> >>          }
> >>  
> >> -        /*
> >> -         * This only considers the host IOMMU's 32-bit window.  At
> >> -         * some point we need to add support for the optional 64-bit
> >> -         * window and dynamic windows
> >> -         */
> >>          info.argsz = sizeof(info);
> >>          ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> >>          if (ret) {
> >> @@ -973,11 +1016,24 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >>              goto listener_release_exit;
> >>          }
> >>  
> >> -        /* The default table uses 4K pages */
> >> -        vfio_host_win_add(container, info.dma32_window_start,
> >> -                          info.dma32_window_start +
> >> -                          info.dma32_window_size - 1,
> >> -                          0x1000);
> >> +        if (v2) {
> >> +            /*
> >> +             * There is a default window in just created container.
> >> +             * To make region_add/del simpler, we better remove this
> >> +             * window now and let those iommu_listener callbacks
> >> +             * create/remove them when needed.
> >> +             */
> >> +            ret = vfio_spapr_remove_window(container, info.dma32_window_start);
> >> +            if (ret) {
> >> +                goto free_container_exit;
> >> +            }
> >> +        } else {
> >> +            /* The default table uses 4K pages */
> >> +            vfio_host_win_add(container, info.dma32_window_start,
> >> +                              info.dma32_window_start +
> >> +                              info.dma32_window_size - 1,
> >> +                              0x1000);
> >> +        }
> >>      } else {
> >>          error_report("vfio: No available IOMMU models");
> >>          ret = -EINVAL;
> >> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> >> index f339472..0c784c4 100644
> >> --- a/hw/vfio/spapr.c
> >> +++ b/hw/vfio/spapr.c
> >> @@ -135,3 +135,73 @@ const MemoryListener vfio_prereg_listener = {
> >>      .region_add = vfio_prereg_listener_region_add,
> >>      .region_del = vfio_prereg_listener_region_del,
> >>  };
> >> +
> >> +int vfio_spapr_create_window(VFIOContainer *container,
> >> +                             MemoryRegionSection *section,
> >> +                             hwaddr *pgsize)
> >> +{
> >> +    int ret;
> >> +    unsigned pagesizes = memory_region_iommu_get_page_sizes(section->mr);
> >> +    unsigned pagesize = (hwaddr)1 << ctz64(pagesizes);
> >> +    unsigned entries, pages;
> >> +    struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
> >> +
> >> +    /*
> >> +     * FIXME: For VFIO iommu types which have KVM acceleration to
> >> +     * avoid bouncing all map/unmaps through qemu this way, this
> >> +     * would be the right place to wire that up (tell the KVM
> >> +     * device emulation the VFIO iommu handles to use).
> >> +     */
> >> +    create.window_size = int128_get64(section->size);
> >> +    create.page_shift = ctz64(pagesize);
> > 
> > Doing a ctz on a value which is defined as 1 << n seems a bit
> > perverse.
> 
> Well, this way it felt more obvious that pagesize is a single page size,
> not a mask. Not sure if memory_region_iommu_get_page_sizes() returning a
> mask (rather than a page size) is a good idea after all...
> 
> I'll make it:
> 
>  create.page_shift = ctz64(pagesizes);
> 
> and (below):
> 
>  *pgsize = 1ULL << create.page_shift;

Yes, I think that's better.

> and remove pagesize.
> 
> >> +    /*
> >> +     * SPAPR host supports multilevel TCE tables, there is some
> >> +     * heuristic to decide how many levels we want for our table:
> >> +     * 0..64 = 1; 65..4096 = 2; 4097..262144 = 3; 262145.. = 4
> >> +     */
> >> +    entries = create.window_size >> create.page_shift;
> >> +    pages = MAX((entries * sizeof(uint64_t)) / getpagesize(), 1);
> >> +    pages = MAX(pow2ceil(pages) - 1, 1); /* Round up */
> >> +    create.levels = ctz64(pages) / 6 + 1;
> >> +
> >> +    ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
> >> +    if (ret) {
> >> +        error_report("Failed to create a window, ret = %d (%m)", ret);
> >> +        return -errno;
> >> +    }
> >> +
> >> +    if (create.start_addr != section->offset_within_address_space) {
> >> +        vfio_spapr_remove_window(container, create.start_addr);
> >> +
> >> +        error_report("Host doesn't support DMA window at %"HWADDR_PRIx", must be %"PRIx64,
> >> +                     section->offset_within_address_space,
> >> +                     create.start_addr);
> >> +        ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
> >> +        return -EINVAL;
> >> +    }
> >> +    trace_vfio_spapr_create_window(create.page_shift,
> >> +                                   create.window_size,
> >> +                                   create.start_addr);
> >> +    *pgsize = pagesize;
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +int vfio_spapr_remove_window(VFIOContainer *container,
> >> +                             hwaddr offset_within_address_space)
> >> +{
> >> +    struct vfio_iommu_spapr_tce_remove remove = {
> >> +        .argsz = sizeof(remove),
> >> +        .start_addr = offset_within_address_space,
> >> +    };
> >> +    int ret;
> >> +
> >> +    ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
> >> +    if (ret) {
> >> +        error_report("Failed to remove window at %"PRIx64,
> >> +                     remove.start_addr);
> >> +        return -errno;
> >> +    }
> >> +
> >> +    return 0;
> >> +}
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index c76ddc4..7e80382 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -167,4 +167,10 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
> >>  #endif
> >>  extern const MemoryListener vfio_prereg_listener;
> >>  
> >> +int vfio_spapr_create_window(VFIOContainer *container,
> >> +                             MemoryRegionSection *section,
> >> +                             hwaddr *pgsize);
> >> +int vfio_spapr_remove_window(VFIOContainer *container,
> >> +                             hwaddr offset_within_address_space);
> >> +
> >>  #endif /* !HW_VFIO_VFIO_COMMON_H */
> >> diff --git a/trace-events b/trace-events
> >> index ddb8676..ec32c20 100644
> >> --- a/trace-events
> >> +++ b/trace-events
> >> @@ -1768,6 +1768,8 @@ vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sp
> >>  vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
> >>  vfio_ram_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
> >>  vfio_ram_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
> >> +vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
> >> +vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
> >>  
> >>  # hw/vfio/platform.c
> >>  vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
> > 
> 
> 




-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2016-06-08  6:02 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1464771463-37214-1-git-send-email-aik@ozlabs.ru>
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 01/12] vmstate: Define VARRAY with VMS_ALLOC Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 02/12] spapr_iommu: Introduce "enabled" state for TCE table Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 03/12] spapr_iommu: Migrate full state Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 04/12] spapr_iommu: Add root memory region Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 05/12] spapr_pci: Reset DMA config on PHB reset Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 06/12] memory: Add reporting of supported page sizes Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 07/12] vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 08/12] spapr_pci: Add and export DMA resetting helper Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 09/12] vfio: Add host side DMA window capabilities Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 10/12] vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 11/12] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) Alexey Kardashevskiy
2016-06-01  8:57 ` [Qemu-devel] [PATCH qemu v17 12/12] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping listening Alexey Kardashevskiy
     [not found] ` <201606010902.u518wwmb029353@mx0a-001b2d01.pphosted.com>
2016-06-02  3:35   ` [Qemu-devel] [PATCH qemu v17 06/12] memory: Add reporting of supported page sizes David Gibson
2016-06-06 13:31     ` Paolo Bonzini
2016-06-07  3:42       ` Alexey Kardashevskiy
2016-06-08  6:00       ` David Gibson
2016-06-08  6:05         ` Alexey Kardashevskiy
2016-06-14 21:41           ` Alexey Kardashevskiy
2016-06-15  6:15           ` David Gibson
     [not found] ` <201606010900.u518wvH7046287@mx0a-001b2d01.pphosted.com>
2016-06-02  4:18   ` [Qemu-devel] [PATCH qemu v17 07/12] vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) David Gibson
     [not found] ` <201606010902.u518x15j023604@mx0a-001b2d01.pphosted.com>
2016-06-02  4:19   ` [Qemu-devel] [PATCH qemu v17 08/12] spapr_pci: Add and export DMA resetting helper David Gibson
     [not found] ` <201606010901.u518wwEL029369@mx0a-001b2d01.pphosted.com>
2016-06-03  7:23   ` [Qemu-devel] [PATCH qemu v17 09/12] vfio: Add host side DMA window capabilities David Gibson
     [not found] ` <201606011012.u51A9A6i023070@mx0a-001b2d01.pphosted.com>
2016-06-03  7:37   ` [Qemu-devel] [PATCH qemu v17 10/12] vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) David Gibson
2016-06-06  6:45     ` Alexey Kardashevskiy
2016-06-08  5:56       ` David Gibson [this message]
     [not found] ` <201606010902.u51902Zl007699@mx0a-001b2d01.pphosted.com>
2016-06-03 15:37   ` [Qemu-devel] [PATCH qemu v17 06/12] memory: Add reporting of supported page sizes Alex Williamson
     [not found] ` <201606010900.u51900Om007391@mx0a-001b2d01.pphosted.com>
2016-06-03 16:13   ` [Qemu-devel] [PATCH qemu v17 07/12] vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) Alex Williamson
2016-06-06  6:04     ` Alexey Kardashevskiy
2016-06-06 17:20       ` Alex Williamson
2016-06-07  3:10         ` Alexey Kardashevskiy
     [not found] ` <201606010901.u518x843001647@mx0a-001b2d01.pphosted.com>
2016-06-03 16:32   ` [Qemu-devel] [PATCH qemu v17 09/12] vfio: Add host side DMA window capabilities Alex Williamson
     [not found] ` <201606010901.u518x7AQ001537@mx0a-001b2d01.pphosted.com>
2016-06-03 16:50   ` [Qemu-devel] [PATCH qemu v17 10/12] vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) Alex Williamson
     [not found] ` <201606011013.u51A9ALx023064@mx0a-001b2d01.pphosted.com>
2016-06-03 16:59   ` [Qemu-devel] [PATCH qemu v17 12/12] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping listening Alex Williamson
     [not found] ` <201606010901.u518x1wF023568@mx0a-001b2d01.pphosted.com>
2016-06-06  5:57   ` [Qemu-devel] [PATCH qemu v17 11/12] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) David Gibson
2016-06-06  8:12     ` Alexey Kardashevskiy
2016-06-08  5:57       ` David Gibson
2016-06-08  6:09         ` Alexey Kardashevskiy
2016-06-09  4:28           ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160608055602.GW9226@voom.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.