From: Chao Gao <chao.gao@intel.com>
To: mst@redhat.com, pbonzini@redhat.com, peterx@redhat.com,
jasowang@redhat.com
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH] vhost: use large iotlb entry if no IOMMU translation is needed
Date: Tue, 3 Aug 2021 12:29:29 +0800 [thread overview]
Message-ID: <20210803042927.GA30466@gao-cwp> (raw)
In-Reply-To: <20210721075402.203711-1-chao.gao@intel.com>
Ping. Could someone help to review this patch?
Thanks
Chao
On Wed, Jul 21, 2021 at 03:54:02PM +0800, Chao Gao wrote:
>If guest enables IOMMU_PLATFORM for virtio-net, severe network
>performance drop is observed even if there is no IOMMU. And disabling
>vhost can mitigate the perf issue. Finally, we found the culprit is
>frequent iotlb misses: kernel vhost-net has 2048 entries and each
>entry is 4K (qemu uses 4K for i386 if no IOMMU); vhost-net can cache
>translations for up to 8M (i.e. 4K*2048) IOVAs. If guest uses >8M
>memory for DMA, there are some iotlb misses.
>
>If there is no IOMMU or IOMMU is disabled or IOMMU works in pass-thru
>mode, we can optimistically use large, unaligned iotlb entries instead
>of 4K-aligned entries to reduce iotlb pressure. Actually, vhost-net
>in kernel supports unaligned iotlb entry. The alignment requirement is
>imposed by address_space_get_iotlb_entry() and flatview_do_translate().
>
>Introduce IOMMUTLBEntryUnaligned which has a @len field to specify the
>iotlb size to abstract a generic iotlb entry: aligned (original
>IOMMUTLBEntry) and unaligned entry. flatview_do_translate() now
>returns a magic value in @page_mask_out if no IOMMU translation is
>needed. Then, address_space_get_iotbl_entry() can accordingly return a
>page-aligned iotlb entry or the whole memory region section where the
>iova resides as a large iotlb entry.
>
>Signed-off-by: Chao Gao <chao.gao@intel.com>
>---
> hw/virtio/vhost.c | 6 +++---
> include/exec/memory.h | 16 ++++++++++++++--
> softmmu/physmem.c | 37 +++++++++++++++++++++++++++++--------
> 3 files changed, 46 insertions(+), 13 deletions(-)
>
>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>index e8f85a5d2d..6745caa129 100644
>--- a/hw/virtio/vhost.c
>+++ b/hw/virtio/vhost.c
>@@ -1010,7 +1010,7 @@ static int vhost_memory_region_lookup(struct vhost_dev *hdev,
>
> int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
> {
>- IOMMUTLBEntry iotlb;
>+ IOMMUTLBEntryUnaligned iotlb;
> uint64_t uaddr, len;
> int ret = -EFAULT;
>
>@@ -1031,8 +1031,8 @@ int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
> goto out;
> }
>
>- len = MIN(iotlb.addr_mask + 1, len);
>- iova = iova & ~iotlb.addr_mask;
>+ len = MIN(iotlb.len, len);
>+ iova = iotlb.iova;
>
> ret = vhost_backend_update_device_iotlb(dev, iova, uaddr,
> len, iotlb.perm);
>diff --git a/include/exec/memory.h b/include/exec/memory.h
>index c3d417d317..3f04e8fe88 100644
>--- a/include/exec/memory.h
>+++ b/include/exec/memory.h
>@@ -94,6 +94,7 @@ struct MemoryRegionSection {
> };
>
> typedef struct IOMMUTLBEntry IOMMUTLBEntry;
>+typedef struct IOMMUTLBEntryUnaligned IOMMUTLBEntryUnaligned;
>
> /* See address_space_translate: bit 0 is read, bit 1 is write. */
> typedef enum {
>@@ -113,6 +114,15 @@ struct IOMMUTLBEntry {
> IOMMUAccessFlags perm;
> };
>
>+/* IOMMUTLBEntryUnaligned may be not page-aligned */
>+struct IOMMUTLBEntryUnaligned {
>+ AddressSpace *target_as;
>+ hwaddr iova;
>+ hwaddr translated_addr;
>+ hwaddr len;
>+ IOMMUAccessFlags perm;
>+};
>+
> /*
> * Bitmap for different IOMMUNotifier capabilities. Each notifier can
> * register with one or multiple IOMMU Notifier capability bit(s).
>@@ -2653,8 +2663,10 @@ void address_space_cache_destroy(MemoryRegionCache *cache);
> /* address_space_get_iotlb_entry: translate an address into an IOTLB
> * entry. Should be called from an RCU critical section.
> */
>-IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
>- bool is_write, MemTxAttrs attrs);
>+IOMMUTLBEntryUnaligned address_space_get_iotlb_entry(AddressSpace *as,
>+ hwaddr addr,
>+ bool is_write,
>+ MemTxAttrs attrs);
>
> /* address_space_translate: translate an address range into an address space
> * into a MemoryRegion and an address range into that section. Should be
>diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>index 3c1912a1a0..469963f754 100644
>--- a/softmmu/physmem.c
>+++ b/softmmu/physmem.c
>@@ -143,6 +143,8 @@ typedef struct subpage_t {
>
> #define PHYS_SECTION_UNASSIGNED 0
>
>+#define PAGE_MASK_NOT_BEHIND_IOMMU ((hwaddr)-1)
>+
> static void io_mem_init(void);
> static void memory_map_init(void);
> static void tcg_log_global_after_sync(MemoryListener *listener);
>@@ -470,7 +472,9 @@ unassigned:
> * @page_mask_out: page mask for the translated address. This
> * should only be meaningful for IOMMU translated
> * addresses, since there may be huge pages that this bit
>- * would tell. It can be @NULL if we don't care about it.
>+ * would tell. If the returned memory region section isn't
>+ * behind an IOMMU, PAGE_MASK_NOT_BEHIND_IOMMU is return.
>+ * It can be @NULL if we don't care about it.
> * @is_write: whether the translation operation is for write
> * @is_mmio: whether this can be MMIO, set true if it can
> * @target_as: the address space targeted by the IOMMU
>@@ -508,16 +512,18 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
> target_as, attrs);
> }
> if (page_mask_out) {
>- /* Not behind an IOMMU, use default page size. */
>- *page_mask_out = ~TARGET_PAGE_MASK;
>+ /* return a magic value if not behind an IOMMU */
>+ *page_mask_out = PAGE_MASK_NOT_BEHIND_IOMMU;
> }
>
> return *section;
> }
>
> /* Called from RCU critical section */
>-IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
>- bool is_write, MemTxAttrs attrs)
>+IOMMUTLBEntryUnaligned address_space_get_iotlb_entry(AddressSpace *as,
>+ hwaddr addr,
>+ bool is_write,
>+ MemTxAttrs attrs)
> {
> MemoryRegionSection section;
> hwaddr xlat, page_mask;
>@@ -535,21 +541,36 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
> goto iotlb_fail;
> }
>
>+ /*
>+ * If the section isn't behind an IOMMU, return the whole section as an
>+ * IOMMU TLB entry.
>+ */
>+ if (page_mask == PAGE_MASK_NOT_BEHIND_IOMMU) {
>+ return (IOMMUTLBEntryUnaligned) {
>+ .target_as = as,
>+ .iova = section.offset_within_address_space,
>+ .translated_addr = section.offset_within_address_space,
>+ .len = section.size,
>+ /* IOTLBs are for DMAs, and DMA only allows on RAMs. */
>+ .perm = IOMMU_RW,
>+ };
>+ }
>+
> /* Convert memory region offset into address space offset */
> xlat += section.offset_within_address_space -
> section.offset_within_region;
>
>- return (IOMMUTLBEntry) {
>+ return (IOMMUTLBEntryUnaligned) {
> .target_as = as,
> .iova = addr & ~page_mask,
> .translated_addr = xlat & ~page_mask,
>- .addr_mask = page_mask,
>+ .len = page_mask + 1,
> /* IOTLBs are for DMAs, and DMA only allows on RAMs. */
> .perm = IOMMU_RW,
> };
>
> iotlb_fail:
>- return (IOMMUTLBEntry) {0};
>+ return (IOMMUTLBEntryUnaligned) {0};
> }
>
> /* Called from RCU critical section */
>--
>2.25.1
>
next prev parent reply other threads:[~2021-08-03 4:42 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-21 7:54 [PATCH] vhost: use large iotlb entry if no IOMMU translation is needed Chao Gao
2021-08-03 4:29 ` Chao Gao [this message]
2021-08-03 4:43 ` Jason Wang
2021-08-03 5:51 ` Chao Gao
2021-08-03 8:14 ` Jason Wang
2021-08-03 21:11 ` Peter Xu
2021-08-04 3:19 ` Jason Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210803042927.GA30466@gao-cwp \
--to=chao.gao@intel.com \
--cc=jasowang@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).