Re: [PATCH v3 05/10] vfio: Support for RamDiscardMgr in the !vIOMMU case

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Alex Williamson <alex.williamson@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
	Wei Yang <richard.weiyang@linux.alibaba.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Auger Eric <eric.auger@redhat.com>,
	teawater <teawaterz@linux.alibaba.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Marek Kedzierski <mkedzier@redhat.com>
Subject: Re: [PATCH v3 05/10] vfio: Support for RamDiscardMgr in the !vIOMMU case
Date: Thu, 17 Dec 2020 12:59:05 -0700	[thread overview]
Message-ID: <20201217125905.7cff1c12@omen.home> (raw)
In-Reply-To: <ff63c12d-9d96-0650-de19-9331091aaaf7@redhat.com>

On Thu, 17 Dec 2020 19:55:55 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 17.12.20 19:36, Alex Williamson wrote:
> > On Wed, 16 Dec 2020 15:11:55 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> Implement support for RamDiscardMgr, to prepare for virtio-mem
> >> support. Instead of mapping the whole memory section, we only map
> >> "populated" parts and update the mapping when notified about
> >> discarding/population of memory via the RamDiscardListener. Similarly, when
> >> syncing the dirty bitmaps, sync only the actually mapped (populated) parts
> >> by replaying via the notifier.
> >>
> >> Small mapping granularity is problematic for vfio, because we might run out
> >> of mappings. Indicate virito-mem as one of the problematic parts when
> >> warning in vfio_container_dma_reserve() to at least make users aware that
> >> there is such a limitation.
> >>
> >> Using virtio-mem with vfio is still blocked via
> >> ram_block_discard_disable()/ram_block_discard_require() after this patch.
> >>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> Cc: Igor Mammedov <imammedo@redhat.com>
> >> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Cc: Auger Eric <eric.auger@redhat.com>
> >> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> >> Cc: teawater <teawaterz@linux.alibaba.com>
> >> Cc: Marek Kedzierski <mkedzier@redhat.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  hw/vfio/common.c              | 213 +++++++++++++++++++++++++++++++++-
> >>  include/hw/vfio/vfio-common.h |  13 +++
> >>  2 files changed, 225 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 5ad88d476f..b1582be1e8 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -296,7 +296,8 @@ static void vfio_container_dma_reserve(VFIOContainer *container,
> >>      container->dma_reserved += dma_mappings;
> >>      if (!warned && container->dma_max &&
> >>          container->dma_reserved > container->dma_max) {
> >> -        warn_report("%s: possibly running out of DMA mappings. "
> >> +        warn_report("%s: possibly running out of DMA mappings. E.g., try"
> >> +                    " increasing the 'block-size' of virtio-mem devies."
> >>                      " Maximum number of DMA mappings: %d", __func__,
> >>                      container->dma_max);
> >>      }
> >> @@ -674,6 +675,146 @@ out:
> >>      rcu_read_unlock();
> >>  }
> >>  
> >> +static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
> >> +                                            const MemoryRegion *mr,
> >> +                                            ram_addr_t offset, ram_addr_t size)
> >> +{
> >> +    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
> >> +                                                listener);
> >> +    const hwaddr mr_start = MAX(offset, vrdl->offset_within_region);
> >> +    const hwaddr mr_end = MIN(offset + size,
> >> +                              vrdl->offset_within_region + vrdl->size);
> >> +    const hwaddr iova = mr_start - vrdl->offset_within_region +
> >> +                        vrdl->offset_within_address_space;
> >> +    int ret;
> >> +
> >> +    if (mr_start >= mr_end) {
> >> +        return;
> >> +    }
> >> +
> >> +    /* Unmap with a single call. */
> >> +    ret = vfio_dma_unmap(vrdl->container, iova, mr_end - mr_start, NULL);
> >> +    if (ret) {
> >> +        error_report("%s: vfio_dma_unmap() failed: %s", __func__,
> >> +                     strerror(-ret));
> >> +    }
> >> +}
> >> +
> >> +static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
> >> +                                            const MemoryRegion *mr,
> >> +                                            ram_addr_t offset, ram_addr_t size)
> >> +{
> >> +    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
> >> +                                                listener);
> >> +    const hwaddr mr_end = MIN(offset + size,
> >> +                              vrdl->offset_within_region + vrdl->size);
> >> +    hwaddr mr_start = MAX(offset, vrdl->offset_within_region);
> >> +    hwaddr mr_next, iova;
> >> +    void *vaddr;
> >> +    int ret;
> >> +
> >> +    /*
> >> +     * Map in (aligned within memory region) minimum granularity, so we can
> >> +     * unmap in minimum granularity later.
> >> +     */
> >> +    for (; mr_start < mr_end; mr_start = mr_next) {
> >> +        mr_next = QEMU_ALIGN_UP(mr_start + 1, vrdl->granularity);
> >> +        mr_next = MIN(mr_next, mr_end);
> >> +
> >> +        iova = mr_start - vrdl->offset_within_region +
> >> +               vrdl->offset_within_address_space;
> >> +        vaddr = memory_region_get_ram_ptr(vrdl->mr) + mr_start;
> >> +
> >> +        ret = vfio_dma_map(vrdl->container, iova, mr_next - mr_start,
> >> +                           vaddr, mr->readonly);
> >> +        if (ret) {
> >> +            /* Rollback */
> >> +            vfio_ram_discard_notify_discard(rdl, mr, offset, size);
> >> +            return ret;
> >> +        }
> >> +    }
> >> +    return 0;
> >> +}
> >> +
> >> +static void vfio_ram_discard_notify_discard_all(RamDiscardListener *rdl,
> >> +                                                const MemoryRegion *mr)
> >> +{
> >> +    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
> >> +                                                listener);
> >> +    int ret;
> >> +
> >> +    /* Unmap with a single call. */
> >> +    ret = vfio_dma_unmap(vrdl->container, vrdl->offset_within_address_space,
> >> +                         vrdl->size, NULL);
> >> +    if (ret) {
> >> +        error_report("%s: vfio_dma_unmap() failed: %s", __func__,
> >> +                     strerror(-ret));
> >> +    }
> >> +}
> >> +
> >> +static void vfio_register_ram_discard_notifier(VFIOContainer *container,
> >> +                                               MemoryRegionSection *section)
> >> +{
> >> +    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(section->mr);
> >> +    RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
> >> +    VFIORamDiscardListener *vrdl;
> >> +
> >> +    vrdl = g_new0(VFIORamDiscardListener, 1);
> >> +    vrdl->container = container;
> >> +    vrdl->mr = section->mr;
> >> +    vrdl->offset_within_region = section->offset_within_region;
> >> +    vrdl->offset_within_address_space = section->offset_within_address_space;
> >> +    vrdl->size = int128_get64(section->size);
> >> +    vrdl->granularity = rdmc->get_min_granularity(rdm, section->mr);
> >> +    vrdl->dma_max = vrdl->size / vrdl->granularity;
> >> +    if (!QEMU_IS_ALIGNED(vrdl->size, vrdl->granularity) ||
> >> +        !QEMU_IS_ALIGNED(vrdl->offset_within_region, vrdl->granularity)) {
> >> +        vrdl->dma_max++;
> >> +    }
> >> +
> >> +    /* Ignore some corner cases not relevant in practice. */
> >> +    g_assert(QEMU_IS_ALIGNED(vrdl->offset_within_region, TARGET_PAGE_SIZE));
> >> +    g_assert(QEMU_IS_ALIGNED(vrdl->offset_within_address_space,
> >> +                             TARGET_PAGE_SIZE));
> >> +    g_assert(QEMU_IS_ALIGNED(vrdl->size, TARGET_PAGE_SIZE));
> >> +
> >> +    /* We could consume quite some mappings later. */
> >> +    vfio_container_dma_reserve(container, vrdl->dma_max);  
> > 
> > 
> > Aha, I guess this is where the "reservation" aspect begins to appear.
> > Should this be its own counter though, perhaps
> > dma_discard_max_mappings?  The populate and discard callbacks could
> > further divide this into used and outstanding counters.  However, TBH
> > I'm not sure I understand the counters since this is probably the most
> > robust mapping path where we can actually safely nak a populate  
> 
> I'd like to be able to warn early on fundamental setup issues, not only
> when accidentally running into these limits later.
> 
> > callback.  Maybe rather than any of these runtime counters we should
> > just walk the vrdl_list, calculate max mappings, and if that exceeds
> > some large fraction of available mappings, issue a warning (not that
> > they wouldn't be useful for tracing).  Thanks,  
> 
> Sure, we can calculate max mappings from the vrdl_list. But which
> fraction to chose? The reservation approach simply considers any
> mappings (well, except IOMMU because they are kind of special)

Right, but we're looking at the address space of a device, which should
be exclusively system memory or an IOMMU range, right?  There are IOMMUs
that don't restrict the device to the IOVA window, but I'm not sure if
we care about those.  If that's true, I'm not sure we need to worry
about the complicated intersection of RamDiscardMgr and vIOMMU both
creating mappings.

> Guidance on the fraction / #mappings to assume we can use appreciated.

Can we use the number of KVM memory slots as a guide?  This is
essentially a mechanism for sub-dividing things that exist in a KVM
memory slot, so it seems like (dma_avail - KVM-memory-slots) should be
greater than the # of possible granules we'd map across all the
RamDiscardMgr regions.  Maybe a good starting point?  Thanks,

Alex

next prev parent reply	other threads:[~2020-12-17 20:00 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-16 14:11 [PATCH v3 00/10] virtio-mem: vfio support David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 01/10] memory: Introduce RamDiscardMgr for RAM memory regions David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 02/10] virtio-mem: Factor out traversing unplugged ranges David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 03/10] virtio-mem: Implement RamDiscardMgr interface David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 04/10] vfio: Query and store the maximum number of DMA mappings David Hildenbrand
2020-12-17  7:43   ` Pankaj Gupta
2020-12-17 17:55   ` Alex Williamson
2020-12-17 19:04     ` David Hildenbrand
2020-12-17 19:37       ` David Hildenbrand
2020-12-17 19:47       ` Alex Williamson
2021-01-07 12:56         ` David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 05/10] vfio: Support for RamDiscardMgr in the !vIOMMU case David Hildenbrand
2020-12-17 18:36   ` Alex Williamson
2020-12-17 18:55     ` David Hildenbrand
2020-12-17 19:59       ` Alex Williamson [this message]
2020-12-18  9:11         ` David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 06/10] vfio: Support for RamDiscardMgr in the vIOMMU case David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 07/10] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require) David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 08/10] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types David Hildenbrand
2020-12-16 14:11 ` [PATCH v3 09/10] virtio-mem: Require only coordinated discards David Hildenbrand
2020-12-16 14:12 ` [PATCH v3 10/10] vfio: Disable only uncoordinated discards David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201217125905.7cff1c12@omen.home \
    --to=alex.williamson@redhat.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mkedzier@redhat.com \
    --cc=mst@redhat.com \
    --cc=pankaj.gupta.linux@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.weiyang@linux.alibaba.com \
    --cc=teawaterz@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).