* [PATCH kernel v7 0/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
@ 2018-07-17 7:19 Alexey Kardashevskiy
  2018-07-17 7:19 ` [PATCH kernel v7 1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize Alexey Kardashevskiy
                  ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Alexey Kardashevskiy @ 2018-07-17 7:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Alex Williamson, Michael Ellerman, Nicholas Piggin,
	Paul Mackerras

This improves page boundary checking and should probably be cc:stable.
I came across this while debugging nvlink2 passthrough, but the lack of
checking might be exploited by existing userspace.

The get_user_pages() comment says it should be "phased out" but the only
alternative seems to be get_user_pages_longterm(). Should that be used
instead (this is a longterm reference elevation, however it is not DAX,
whatever this implies)? get_user_pages_remote() seems unnecessarily
complicated because of @locked.


Changes:
v7:
* 2/2: do not fail if pte is not found, fall back to the default case instead

v6:
* 2/2: read pageshift from pte

v5:
* 2/2: changed compound pages handling

v4:
* 2/2: implemented less strict but still safe max pageshift as David suggested

v3:
* enforced huge pages not to cross preregistered chunk boundaries

v2:
* 2/2: explicitly check for compound pages before calling compound_order()


This is based on sha1
9d3cce1 Linus Torvalds "Linux 4.18-rc5".

Please comment. Thanks.

Alexey Kardashevskiy (2):
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page

 arch/powerpc/include/asm/mmu_context.h |  4 ++--
 arch/powerpc/kvm/book3s_64_vio.c       |  2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c    |  6 ++++--
 arch/powerpc/mm/mmu_context_iommu.c    | 37 ++++++++++++++++++++++++++++++++--
 drivers/vfio/vfio_iommu_spapr_tce.c    | 10 ++++-----
 5 files changed, 47 insertions(+), 12 deletions(-)

-- 
2.11.0

^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH kernel v7 1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize
  2018-07-17 7:19 [PATCH kernel v7 0/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page Alexey Kardashevskiy
@ 2018-07-17 7:19 ` Alexey Kardashevskiy
  2018-07-19 6:06   ` [kernel, v7, " Michael Ellerman
  2018-07-17 7:19 ` [PATCH kernel v7 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page Alexey Kardashevskiy
  2018-07-18 6:28 ` [PATCH kernel v7 0/2] " Paul Mackerras
  2 siblings, 1 reply; 8+ messages in thread
From: Alexey Kardashevskiy @ 2018-07-17 7:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Alex Williamson, Michael Ellerman, Nicholas Piggin,
	Paul Mackerras

The size is always equal to 1 page so let's use this. Later on this will
be used for other checks which use page shifts to check the granularity
of access.

This should cause no behavioral change.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

As Alex suggested, this should go via the ppc tree which the next patch
is going to (which is ppc-kvm).

---
 drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 759a5bd..2da5f05 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -457,13 +457,13 @@ static void tce_iommu_unuse_page(struct tce_container *container,
 }
 
 static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
-		unsigned long tce, unsigned long size,
+		unsigned long tce, unsigned long shift,
 		unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)
 {
 	long ret = 0;
 	struct mm_iommu_table_group_mem_t *mem;
 
-	mem = mm_iommu_lookup(container->mm, tce, size);
+	mem = mm_iommu_lookup(container->mm, tce, 1ULL << shift);
 	if (!mem)
 		return -EINVAL;
 
@@ -487,7 +487,7 @@ static void tce_iommu_unuse_page_v2(struct tce_container *container,
 	if (!pua)
 		return;
 
-	ret = tce_iommu_prereg_ua_to_hpa(container, *pua, IOMMU_PAGE_SIZE(tbl),
+	ret = tce_iommu_prereg_ua_to_hpa(container, *pua, tbl->it_page_shift,
 			&hpa, &mem);
 	if (ret)
 		pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
@@ -611,7 +611,7 @@ static long tce_iommu_build_v2(struct tce_container *container,
 				entry + i);
 
 		ret = tce_iommu_prereg_ua_to_hpa(container,
-				tce, IOMMU_PAGE_SIZE(tbl), &hpa, &mem);
+				tce, tbl->it_page_shift, &hpa, &mem);
 		if (ret)
 			break;
 
-- 
2.11.0

^ permalink raw reply related [flat|nested] 8+ messages in thread
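[Editorial illustration, not part of the posted patch.] Patch 1/2 swaps a
byte size argument for a page shift and rebuilds the size as
1ULL << shift where a size is still needed. A minimal userspace sketch of
that relationship follows. It assumes, as the "no behavioral change"
claim suggests, that the IOMMU page size is exactly 2^shift bytes. The
helper names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): size in bytes of one
 * IOMMU page described by @shift, i.e. the value the patch rebuilds
 * with "1ULL << shift" before calling mm_iommu_lookup().
 */
static inline uint64_t iommu_page_size_from_shift(unsigned int shift)
{
	assert(shift < 64);	/* shifting a 64-bit value by 64+ is undefined */
	return 1ULL << shift;
}

/*
 * Hypothetical helper: returns 1 if @size is exactly one page of
 * 2^@shift bytes, which is the property that lets the callers pass
 * the shift instead of the size without losing information.
 */
static inline int size_matches_shift(uint64_t size, unsigned int shift)
{
	return size == iommu_page_size_from_shift(shift);
}
```

With these helpers, shift 12 corresponds to 4K pages, shift 16 to 64K
pages and shift 24 to 16MB pages, so a caller holding only the shift can
always recover the size, while the reverse needs a log2.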
* Re: [kernel, v7, 1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize
  2018-07-17 7:19 ` [PATCH kernel v7 1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize Alexey Kardashevskiy
@ 2018-07-19 6:06   ` Michael Ellerman
  2018-07-20 3:06     ` [kernel,v7,1/2] " Paul Mackerras
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Ellerman @ 2018-07-19 6:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Alexey Kardashevskiy, Nicholas Piggin, kvm-ppc, Alex Williamson,
	Aneesh Kumar K.V, David Gibson

On Tue, 2018-07-17 at 07:19:12 UTC, Alexey Kardashevskiy wrote:
> The size is always equal to 1 page so let's use this. Later on this will
> be used for other checks which use page shifts to check the granularity
> of access.
>
> This should cause no behavioral change.
>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Acked-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/1463edca6734d42ab4406fa2896e20

cheers

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [kernel,v7,1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize
  2018-07-19 6:06 ` [kernel, v7, " Michael Ellerman
@ 2018-07-20 3:06   ` Paul Mackerras
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Mackerras @ 2018-07-20 3:06 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Alexey Kardashevskiy, linuxppc-dev, Nicholas Piggin, kvm-ppc,
	Alex Williamson, Aneesh Kumar K.V, David Gibson

On Thu, Jul 19, 2018 at 04:06:10PM +1000, Michael Ellerman wrote:
> On Tue, 2018-07-17 at 07:19:12 UTC, Alexey Kardashevskiy wrote:
> > The size is always equal to 1 page so let's use this. Later on this will
> > be used for other checks which use page shifts to check the granularity
> > of access.
> >
> > This should cause no behavioral change.
> >
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > Acked-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> Applied to powerpc fixes, thanks.
>
> https://git.kernel.org/powerpc/c/1463edca6734d42ab4406fa2896e20

Ah. I have put these two patches in my kvm-ppc-next branch and I was
about to send a pull request to Paolo.

Paul.

^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH kernel v7 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  2018-07-17 7:19 [PATCH kernel v7 0/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page Alexey Kardashevskiy
  2018-07-17 7:19 ` [PATCH kernel v7 1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize Alexey Kardashevskiy
@ 2018-07-17 7:19 ` Alexey Kardashevskiy
  2018-07-18 2:11   ` David Gibson
  2018-07-19 6:06   ` [kernel, v7, " Michael Ellerman
  2018-07-18 6:28 ` [PATCH kernel v7 0/2] " Paul Mackerras
  2 siblings, 2 replies; 8+ messages in thread
From: Alexey Kardashevskiy @ 2018-07-17 7:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Alex Williamson, Michael Ellerman, Nicholas Piggin,
	Paul Mackerras

A VM which has:
 - a DMA capable device passed through to it (e.g. a network card);
 - running a malicious kernel that ignores H_PUT_TCE failure;
 - capability of using IOMMU pages bigger than physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.

The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.

The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.

We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). So far we have been lucky
and have not hit this: the very first time the mapping happens,
tbl::it_userspace is not allocated yet, so we fall back to userspace,
which in turn calls the VFIO IOMMU driver; this fails and the guest
does not retry.

This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.

This calculates the maximum page size as the minimum of the natural region
alignment and the compound page size. For the page shift this uses the shift
returned by find_linux_pte(), which indicates how the page is mapped into
the current userspace: if the page is huge and the shift is not zero, then
it is a leaf pte and the page is mapped within the range.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

v6 got a couple of rb's but since the patch has changed again, I am not
putting them here yet.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

---
Changes:
v7:
* do not fail if pte is not found, fall back to the default case instead

v6:
* replaced hugetlbfs with pageshift from find_linux_pte()

v5:
* only consider compound pages from hugetlbfs

v4:
* reimplemented max pageshift calculation

v3:
* fixed upper limit for the page size
* added checks that we don't register parts of a huge page

v2:
* explicitly check for compound pages before calling compound_order()

---
The bug is: run QEMU _without_ hugepages (no -mem-path) and tell it to
advertise 16MB pages to the guest; a typical pseries guest will use 16MB
for IOMMU pages without checking the mmu pagesize and this will fail
at https://git.qemu.org/?p=qemu.git;a=blob;f=hw/vfio/common.c;h=fb396cf00ac40eb35967a04c9cc798ca896eed57;hb=refs/heads/master#l256

With the change, mapping will fail in KVM and the guest will print:

mlx5_core 0000:00:00.0: ibm,create-pe-dma-window(2027) 0 8000000 20000000 18 1f returned 0 (liobn = 0x80000001 starting addr = 8000000 0)
mlx5_core 0000:00:00.0: created tce table LIOBN 0x80000001 for /pci@800000020000000/ethernet@0
mlx5_core 0000:00:00.0: failed to map direct window for /pci@800000020000000/ethernet@0: -1
---
 arch/powerpc/include/asm/mmu_context.h |  4 ++--
 arch/powerpc/kvm/book3s_64_vio.c       |  2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c    |  6 ++++--
 arch/powerpc/mm/mmu_context_iommu.c    | 37 ++++++++++++++++++++++++++++++++--
 drivers/vfio/vfio_iommu_spapr_tce.c    |  2 +-
 5 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 896efa5..79d570c 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -35,9 +35,9 @@ extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(
 extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
 		unsigned long ua, unsigned long entries);
 extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
-		unsigned long ua, unsigned long *hpa);
+		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
 extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
-		unsigned long ua, unsigned long *hpa);
+		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
 extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
 #endif
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index d066e37..8c456fa 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -449,7 +449,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 		/* This only handles v2 IOMMU type, v1 is handled via ioctl() */
 		return H_TOO_HARD;
 
-	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, &hpa)))
+	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
 		return H_HARDWARE;
 
 	if (mm_iommu_mapped_inc(mem))
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 925fc31..5b298f5 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -279,7 +279,8 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 	if (!mem)
 		return H_TOO_HARD;
 
-	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa)))
+	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
+			&hpa)))
 		return H_HARDWARE;
 
 	pua = (void *) vmalloc_to_phys(pua);
@@ -469,7 +470,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 
 		mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
 		if (mem)
-			prereg = mm_iommu_ua_to_hpa_rm(mem, ua, &tces) == 0;
+			prereg = mm_iommu_ua_to_hpa_rm(mem, ua,
+					IOMMU_PAGE_SHIFT_4K, &tces) == 0;
 	}
 
 	if (!prereg) {
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index abb4364..a4ca576 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -19,6 +19,7 @@
 #include <linux/hugetlb.h>
 #include <linux/swap.h>
 #include <asm/mmu_context.h>
+#include <asm/pte-walk.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
 
@@ -27,6 +28,7 @@ struct mm_iommu_table_group_mem_t {
 	struct rcu_head rcu;
 	unsigned long used;
 	atomic64_t mapped;
+	unsigned int pageshift;
 	u64 ua;			/* userspace address */
 	u64 entries;		/* number of entries in hpas[] */
 	u64 *hpas;		/* vmalloc'ed */
@@ -125,6 +127,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 {
 	struct mm_iommu_table_group_mem_t *mem;
 	long i, j, ret = 0, locked_entries = 0;
+	unsigned int pageshift;
+	unsigned long flags;
 	struct page *page = NULL;
 
 	mutex_lock(&mem_list_mutex);
@@ -159,6 +163,12 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		goto unlock_exit;
 	}
 
+	/*
+	 * For a starting point for a maximum page size calculation
+	 * we use @ua and @entries natural alignment to allow IOMMU pages
+	 * smaller than huge pages but still bigger than PAGE_SIZE.
+	 */
+	mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT));
 	mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0])));
 	if (!mem->hpas) {
 		kfree(mem);
@@ -199,6 +209,23 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 			}
 		}
 populate:
+		pageshift = PAGE_SHIFT;
+		if (PageCompound(page)) {
+			pte_t *pte;
+			struct page *head = compound_head(page);
+			unsigned int compshift = compound_order(head);
+
+			local_irq_save(flags); /* disables as well */
+			pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
+			local_irq_restore(flags);
+
+			/* Double check it is still the same pinned page */
+			if (pte && pte_page(*pte) == head &&
+					pageshift == compshift)
+				pageshift = max_t(unsigned int, pageshift,
+						PAGE_SHIFT);
+		}
+		mem->pageshift = min(mem->pageshift, pageshift);
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
@@ -349,7 +376,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
 EXPORT_SYMBOL_GPL(mm_iommu_find);
 
 long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
-		unsigned long ua, unsigned long *hpa)
+		unsigned long ua, unsigned int pageshift, unsigned long *hpa)
 {
 	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
 	u64 *va = &mem->hpas[entry];
@@ -357,6 +384,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
 	if (entry >= mem->entries)
 		return -EFAULT;
 
+	if (pageshift > mem->pageshift)
+		return -EFAULT;
+
 	*hpa = *va | (ua & ~PAGE_MASK);
 
 	return 0;
@@ -364,7 +394,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
 EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
 
 long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
-		unsigned long ua, unsigned long *hpa)
+		unsigned long ua, unsigned int pageshift, unsigned long *hpa)
 {
 	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
 	void *va = &mem->hpas[entry];
@@ -373,6 +403,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
 	if (entry >= mem->entries)
 		return -EFAULT;
 
+	if (pageshift > mem->pageshift)
+		return -EFAULT;
+
 	pa = (void *) vmalloc_to_phys(va);
 	if (!pa)
 		return -EFAULT;
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 2da5f05..7cd63b0 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
 	if (!mem)
 		return -EINVAL;
 
-	ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
+	ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
 	if (ret)
 		return -EINVAL;
 
-- 
2.11.0

^ permalink raw reply related [flat|nested] 8+ messages in thread
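[Editorial illustration, not part of the posted patch.] The commit
message for 2/2 describes the maximum page shift of a preregistered
region as the minimum of the region's natural alignment and the shift of
the pages backing it, and the new check rejects any IOMMU page shift
above that. The userspace sketch below models only that arithmetic, under
stated assumptions: 64K system pages (SKETCH_PAGE_SHIFT is a made-up
constant), the GCC/Clang builtin __builtin_ctzll() standing in for the
kernel's __ffs(), and hypothetical function names throughout. It does not
model pinning, pte lookup or compound pages.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: 64K system pages. */
#define SKETCH_PAGE_SHIFT 16u

/* Index of the lowest set bit; like the kernel's __ffs(), undefined for 0. */
static inline unsigned int lowest_set_bit(uint64_t x)
{
	assert(x != 0);
	return (unsigned int)__builtin_ctzll(x);	/* GCC/Clang builtin */
}

/*
 * Natural alignment of a region starting at @ua and spanning @entries
 * system pages: the largest shift for which both the start address and
 * the length in bytes are multiples of 2^shift.
 */
static inline unsigned int region_max_pageshift(uint64_t ua, uint64_t entries)
{
	return lowest_set_bit(ua | (entries << SKETCH_PAGE_SHIFT));
}

/* The region limit can only shrink as smaller backing pages are found. */
static inline unsigned int min_shift(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Models the new "pageshift > mem->pageshift" test: returns 0 if an
 * IOMMU page of 2^@iommu_shift bytes fits the region limit, -1 if not.
 */
static inline int check_iommu_pageshift(unsigned int iommu_shift,
					unsigned int region_shift)
{
	return iommu_shift > region_shift ? -1 : 0;
}
```

Taking the example from the commit message: a region at 0x10000000 that
spans 256 pages of 64K is naturally aligned to 16MB (shift 24), but if it
is backed by ordinary 64K pages the limit drops to shift 16, so a 16MB
IOMMU page (shift 24) is refused while a 64K one is accepted.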
* Re: [PATCH kernel v7 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page 2018-07-17 7:19 ` [PATCH kernel v7 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page Alexey Kardashevskiy @ 2018-07-18 2:11 ` David Gibson 2018-07-19 6:06 ` [kernel, v7, " Michael Ellerman 1 sibling, 0 replies; 8+ messages in thread From: David Gibson @ 2018-07-18 2:11 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: linuxppc-dev, kvm-ppc, Aneesh Kumar K.V, Alex Williamson, Michael Ellerman, Nicholas Piggin, Paul Mackerras [-- Attachment #1: Type: text/plain, Size: 10772 bytes --] On Tue, Jul 17, 2018 at 05:19:13PM +1000, Alexey Kardashevskiy wrote: > A VM which has: > - a DMA capable device passed through to it (eg. network card); > - running a malicious kernel that ignores H_PUT_TCE failure; > - capability of using IOMMU pages bigger that physical pages > can create an IOMMU mapping that exposes (for example) 16MB of > the host physical memory to the device when only 64K was allocated to the VM. > > The remaining 16MB - 64K will be some other content of host memory, possibly > including pages of the VM, but also pages of host kernel memory, host > programs or other VMs. > > The attacking VM does not control the location of the page it can map, > and is only allowed to map as many pages as it has pages of RAM. > > We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that > an IOMMU page is contained in the physical page so the PCI hardware won't > get access to unassigned host memory; however this check is missing in > the KVM fastpath (H_PUT_TCE accelerated code). 
We were lucky so far and > did not hit this yet as the very first time when the mapping happens > we do not have tbl::it_userspace allocated yet and fall back to > the userspace which in turn calls VFIO IOMMU driver, this fails and > the guest does not retry, > > This stores the smallest preregistered page size in the preregistered > region descriptor and changes the mm_iommu_xxx API to check this against > the IOMMU page size. > > This calculates maximum page size as a minimum of the natural region > alignment and compound page size. For the page shift this uses the shift > returned by find_linux_pte() which indicates how the page is mapped to > the current userspace - if the page is huge and this is not a zero, then > it is a leaf pte and the page is mapped within the range. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > --- > > v6 got a couple of rb's but since the patch has changed again, I am not > putting them here yet. 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > Reviewed-by: Nicholas Piggin <npiggin@gmail.com> > > --- > Changes: > v7: > * do not fail if pte is not found, fall back to the default case instead > > v6: > * replaced hugetlbfs with pageshift from find_linux_pte() > > v5: > * only consider compound pages from hugetlbfs > > v4: > * reimplemented max pageshift calculation > > v3: > * fixed upper limit for the page size > * added checks that we don't register parts of a huge page > > v2: > * explicitely check for compound pages before calling compound_order() > > --- > The bug is: run QEMU _without_ hugepages (no -mempath) and tell it to > advertise 16MB pages to the guest; a typical pseries guest will use 16MB > for IOMMU pages without checking the mmu pagesize and this will fail > at https://git.qemu.org/?p=qemu.git;a=blob;f=hw/vfio/common.c;h=fb396cf00ac40eb35967a04c9cc798ca896eed57;hb=refs/heads/master#l256 > > With the change, mapping will fail in KVM and the guest will print: > > mlx5_core 0000:00:00.0: ibm,create-pe-dma-window(2027) 0 8000000 20000000 18 1f returned 0 (liobn = 0x80000001 starting addr = 8000000 0) > mlx5_core 0000:00:00.0: created tce table LIOBN 0x80000001 for /pci@800000020000000/ethernet@0 > mlx5_core 0000:00:00.0: failed to map direct window for /pci@800000020000000/ethernet@0: -1 > --- > arch/powerpc/include/asm/mmu_context.h | 4 ++-- > arch/powerpc/kvm/book3s_64_vio.c | 2 +- > arch/powerpc/kvm/book3s_64_vio_hv.c | 6 ++++-- > arch/powerpc/mm/mmu_context_iommu.c | 37 ++++++++++++++++++++++++++++++++-- > drivers/vfio/vfio_iommu_spapr_tce.c | 2 +- > 5 files changed, 43 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h > index 896efa5..79d570c 100644 > --- a/arch/powerpc/include/asm/mmu_context.h > +++ b/arch/powerpc/include/asm/mmu_context.h > @@ -35,9 +35,9 @@ extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm( > extern struct 
mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm, > unsigned long ua, unsigned long entries); > extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem, > - unsigned long ua, unsigned long *hpa); > + unsigned long ua, unsigned int pageshift, unsigned long *hpa); > extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem, > - unsigned long ua, unsigned long *hpa); > + unsigned long ua, unsigned int pageshift, unsigned long *hpa); > extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem); > extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem); > #endif > diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c > index d066e37..8c456fa 100644 > --- a/arch/powerpc/kvm/book3s_64_vio.c > +++ b/arch/powerpc/kvm/book3s_64_vio.c > @@ -449,7 +449,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl, > /* This only handles v2 IOMMU type, v1 is handled via ioctl() */ > return H_TOO_HARD; > > - if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, &hpa))) > + if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa))) > return H_HARDWARE; > > if (mm_iommu_mapped_inc(mem)) > diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c > index 925fc31..5b298f5 100644 > --- a/arch/powerpc/kvm/book3s_64_vio_hv.c > +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c > @@ -279,7 +279,8 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl, > if (!mem) > return H_TOO_HARD; > > - if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa))) > + if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift, > + &hpa))) > return H_HARDWARE; > > pua = (void *) vmalloc_to_phys(pua); > @@ -469,7 +470,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu, > > mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K); > if (mem) > - prereg = mm_iommu_ua_to_hpa_rm(mem, ua, &tces) == 0; > + prereg = 
mm_iommu_ua_to_hpa_rm(mem, ua, > + IOMMU_PAGE_SHIFT_4K, &tces) == 0; > } > > if (!prereg) { > diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c > index abb4364..a4ca576 100644 > --- a/arch/powerpc/mm/mmu_context_iommu.c > +++ b/arch/powerpc/mm/mmu_context_iommu.c > @@ -19,6 +19,7 @@ > #include <linux/hugetlb.h> > #include <linux/swap.h> > #include <asm/mmu_context.h> > +#include <asm/pte-walk.h> > > static DEFINE_MUTEX(mem_list_mutex); > > @@ -27,6 +28,7 @@ struct mm_iommu_table_group_mem_t { > struct rcu_head rcu; > unsigned long used; > atomic64_t mapped; > + unsigned int pageshift; > u64 ua; /* userspace address */ > u64 entries; /* number of entries in hpas[] */ > u64 *hpas; /* vmalloc'ed */ > @@ -125,6 +127,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries, > { > struct mm_iommu_table_group_mem_t *mem; > long i, j, ret = 0, locked_entries = 0; > + unsigned int pageshift; > + unsigned long flags; > struct page *page = NULL; > > mutex_lock(&mem_list_mutex); > @@ -159,6 +163,12 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries, > goto unlock_exit; > } > > + /* > + * For a starting point for a maximum page size calculation > + * we use @ua and @entries natural alignment to allow IOMMU pages > + * smaller than huge pages but still bigger than PAGE_SIZE. 
> + */ > + mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT)); > mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0]))); > if (!mem->hpas) { > kfree(mem); > @@ -199,6 +209,23 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries, > } > } > populate: > + pageshift = PAGE_SHIFT; > + if (PageCompound(page)) { > + pte_t *pte; > + struct page *head = compound_head(page); > + unsigned int compshift = compound_order(head); > + > + local_irq_save(flags); /* disables as well */ > + pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift); > + local_irq_restore(flags); > + > + /* Double check it is still the same pinned page */ > + if (pte && pte_page(*pte) == head && > + pageshift == compshift) > + pageshift = max_t(unsigned int, pageshift, > + PAGE_SHIFT); > + } > + mem->pageshift = min(mem->pageshift, pageshift); > mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT; > } > > @@ -349,7 +376,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm, > EXPORT_SYMBOL_GPL(mm_iommu_find); > > long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem, > - unsigned long ua, unsigned long *hpa) > + unsigned long ua, unsigned int pageshift, unsigned long *hpa) > { > const long entry = (ua - mem->ua) >> PAGE_SHIFT; > u64 *va = &mem->hpas[entry]; > @@ -357,6 +384,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem, > if (entry >= mem->entries) > return -EFAULT; > > + if (pageshift > mem->pageshift) > + return -EFAULT; > + > *hpa = *va | (ua & ~PAGE_MASK); > > return 0; > @@ -364,7 +394,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem, > EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa); > > long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem, > - unsigned long ua, unsigned long *hpa) > + unsigned long ua, unsigned int pageshift, unsigned long *hpa) > { > const long entry = (ua - mem->ua) >> PAGE_SHIFT; > void *va = &mem->hpas[entry]; > @@ -373,6 +403,9 @@ long mm_iommu_ua_to_hpa_rm(struct 
mm_iommu_table_group_mem_t *mem, > if (entry >= mem->entries) > return -EFAULT; > > + if (pageshift > mem->pageshift) > + return -EFAULT; > + > pa = (void *) vmalloc_to_phys(va); > if (!pa) > return -EFAULT; > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c > index 2da5f05..7cd63b0 100644 > --- a/drivers/vfio/vfio_iommu_spapr_tce.c > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > @@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container, > if (!mem) > return -EINVAL; > > - ret = mm_iommu_ua_to_hpa(mem, tce, phpa); > + ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa); > if (ret) > return -EINVAL; > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [kernel, v7, 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page 2018-07-17 7:19 ` [PATCH kernel v7 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page Alexey Kardashevskiy 2018-07-18 2:11 ` David Gibson @ 2018-07-19 6:06 ` Michael Ellerman 1 sibling, 0 replies; 8+ messages in thread From: Michael Ellerman @ 2018-07-19 6:06 UTC (permalink / raw) To: Alexey Kardashevskiy, linuxppc-dev Cc: Alexey Kardashevskiy, Nicholas Piggin, kvm-ppc, Alex Williamson, Aneesh Kumar K.V, David Gibson On Tue, 2018-07-17 at 07:19:13 UTC, Alexey Kardashevskiy wrote: > A VM which has: > - a DMA capable device passed through to it (eg. network card); > - running a malicious kernel that ignores H_PUT_TCE failure; > - capability of using IOMMU pages bigger that physical pages > can create an IOMMU mapping that exposes (for example) 16MB of > the host physical memory to the device when only 64K was allocated to the VM. > > The remaining 16MB - 64K will be some other content of host memory, possibly > including pages of the VM, but also pages of host kernel memory, host > programs or other VMs. > > The attacking VM does not control the location of the page it can map, > and is only allowed to map as many pages as it has pages of RAM. > > We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that > an IOMMU page is contained in the physical page so the PCI hardware won't > get access to unassigned host memory; however this check is missing in > the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and > did not hit this yet as the very first time when the mapping happens > we do not have tbl::it_userspace allocated yet and fall back to > the userspace which in turn calls VFIO IOMMU driver, this fails and > the guest does not retry, > > This stores the smallest preregistered page size in the preregistered > region descriptor and changes the mm_iommu_xxx API to check this against > the IOMMU page size. 
> > This calculates maximum page size as a minimum of the natural region > alignment and compound page size. For the page shift this uses the shift > returned by find_linux_pte() which indicates how the page is mapped to > the current userspace - if the page is huge and this is not a zero, then > it is a leaf pte and the page is mapped within the range. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/76fa4975f3ed12d15762bc979ca440 cheers ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH kernel v7 0/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page 2018-07-17 7:19 [PATCH kernel v7 0/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page Alexey Kardashevskiy 2018-07-17 7:19 ` [PATCH kernel v7 1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize Alexey Kardashevskiy 2018-07-17 7:19 ` [PATCH kernel v7 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page Alexey Kardashevskiy @ 2018-07-18 6:28 ` Paul Mackerras 2 siblings, 0 replies; 8+ messages in thread From: Paul Mackerras @ 2018-07-18 6:28 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: linuxppc-dev, David Gibson, kvm-ppc, Aneesh Kumar K.V, Alex Williamson, Michael Ellerman, Nicholas Piggin On Tue, Jul 17, 2018 at 05:19:11PM +1000, Alexey Kardashevskiy wrote: > This is to improve page boundaries checking and should probably > be cc:stable. I came accross this while debugging nvlink2 passthrough > but the lack of checking might be exploited by the existing userspace. > > The get_user_pages() comment says it should be "phased out" but the only > alternative seems to be get_user_pages_longterm(), should that be used > instead (this is longterm reference elevation, however it is not DAX, > whatever this implies)? get_user_pages_remote() seems unnecessarily > complicated because of @locked. > > > Changes: > v7: > * 2/2: do not fail if pte is not found, fall back to the default case instead > > v6: > * 2/2: read pageshift from pte > > v5: > * 2/2: changed compound pages handling > > v4: > * 2/2: implemented less strict but still safe max pageshift as David suggested > > v3: > * enforced huge pages not to cross preregistered chunk boundaries > > v2: > * 2/2: explicitly check for compound pages before calling compound_order() > > > This is based on sha1 > 9d3cce1 Linus Torvalds "Linux 4.18-rc5". > > Please comment. Thanks. 
> > > > Alexey Kardashevskiy (2): > vfio/spapr: Use IOMMU pageshift rather than pagesize > KVM: PPC: Check if IOMMU page is contained in the pinned physical page > > arch/powerpc/include/asm/mmu_context.h | 4 ++-- > arch/powerpc/kvm/book3s_64_vio.c | 2 +- > arch/powerpc/kvm/book3s_64_vio_hv.c | 6 ++++-- > arch/powerpc/mm/mmu_context_iommu.c | 37 ++++++++++++++++++++++++++++++++-- > drivers/vfio/vfio_iommu_spapr_tce.c | 10 ++++----- > 5 files changed, 47 insertions(+), 12 deletions(-) Thanks, series applied to my kvm-ppc-next branch. Paul. ^ permalink raw reply [flat|nested] 8+ messages in thread