From: Paul Mackerras <paulus@ozlabs.org>
To: linuxppc-dev@ozlabs.org, kvm-ppc@vger.kernel.org
Cc: Michael Ellerman <michael@ellerman.id.au>,
Alexey Kardashevskiy <aik@ozlabs.ru>,
David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH] KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages
Date: Thu, 23 Aug 2018 10:08:58 +1000 [thread overview]
Message-ID: <20180823000858.GA17719@fergus> (raw)
Commit 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the
pinned physical page", 2018-07-17) added some checks to ensure that guest
DMA mappings don't attempt to map more than the guest is entitled to
access. However, errors in the logic mean that legitimate guest requests
to map pages for DMA are being denied in some situations. Specifically,
if the first page of the range passed to mm_iommu_get() is mapped with
a normal page, and subsequent pages are mapped with transparent huge
pages, we end up with mem->pageshift == 0. That means that the page
size checks in mm_iommu_ua_to_hpa() and mm_iommu_up_to_hpa_rm() will
always fail for every page in that region, and thus the guest can never
map any memory in that region for DMA, typically leading to a flood of
error messages like this:
qemu-system-ppc64: VFIO_MAP_DMA: -22
qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument)
The logic errors in mm_iommu_get() are:
(a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte() call
(meaning that find_linux_pte() returns the pte for the first address
in the range, not the address we are currently up to);
(b) use of 'pageshift' as the variable to receive the hugepage shift
returned by find_linux_pte() - for a normal page this gets set to
0, leading to us setting mem->pageshift to 0 when we conclude that
the pte returned by find_linux_pte didn't match the page we were
looking at;
(c) comparing 'compshift', which is a page order, i.e. log base 2 of the
number of pages, with 'pageshift', which is a log base 2 of the number
of bytes.
To fix these problems, this patch introduces 'cur_ua' to hold the current
user address and uses that in the find_linux_pte call; introduces
'pteshift' to hold the hugepage shift found by find_linux_pte(); and
compares 'pteshift' with 'compshift + PAGE_SHIFT' rather than 'compshift'.
The patch also moves the local_irq_restore to the point after the pte
pointer returned by find_linux_pte has been dereferenced because that
seems safer, and adds a check to avoid doing the find_linux_pte() call
once mem->pageshift has been reduced to PAGE_SHIFT, as an optimization.
Cc: stable@vger.kernel.org # v4.12+
Fixes: 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the
pinned physical page")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/mm/mmu_context_iommu.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index a4ca57612558..c9ee9e23845f 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -129,6 +129,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
long i, j, ret = 0, locked_entries = 0;
unsigned int pageshift;
unsigned long flags;
+ unsigned long cur_ua;
struct page *page = NULL;
mutex_lock(&mem_list_mutex);
@@ -177,7 +178,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
}
for (i = 0; i < entries; ++i) {
- if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+ cur_ua = ua + (i << PAGE_SHIFT);
+ if (1 != get_user_pages_fast(cur_ua,
1/* pages */, 1/* iswrite */, &page)) {
ret = -EFAULT;
for (j = 0; j < i; ++j)
@@ -196,7 +198,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
if (is_migrate_cma_page(page)) {
if (mm_iommu_move_page_from_cma(page))
goto populate;
- if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+ if (1 != get_user_pages_fast(cur_ua,
1/* pages */, 1/* iswrite */,
&page)) {
ret = -EFAULT;
@@ -210,20 +212,21 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
}
populate:
pageshift = PAGE_SHIFT;
- if (PageCompound(page)) {
+ if (mem->pageshift > PAGE_SHIFT && PageCompound(page)) {
pte_t *pte;
struct page *head = compound_head(page);
unsigned int compshift = compound_order(head);
+ unsigned int pteshift;
local_irq_save(flags); /* disables as well */
- pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
- local_irq_restore(flags);
+ pte = find_linux_pte(mm->pgd, cur_ua, NULL, &pteshift);
/* Double check it is still the same pinned page */
if (pte && pte_page(*pte) == head &&
- pageshift == compshift)
- pageshift = max_t(unsigned int, pageshift,
+ pteshift == compshift + PAGE_SHIFT)
+ pageshift = max_t(unsigned int, pteshift,
PAGE_SHIFT);
+ local_irq_restore(flags);
}
mem->pageshift = min(mem->pageshift, pageshift);
mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
--
2.11.0
next reply other threads:[~2018-08-23 0:09 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-23 0:08 Paul Mackerras [this message]
2018-08-23 14:18 ` KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180823000858.GA17719@fergus \
--to=paulus@ozlabs.org \
--cc=aik@ozlabs.ru \
--cc=david@gibson.dropbear.id.au \
--cc=kvm-ppc@vger.kernel.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=michael@ellerman.id.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).