From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org, kvm-ppc@vger.kernel.org,
Alex Williamson <alex.williamson@redhat.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Paul Mackerras <paulus@ozlabs.org>
Subject: Re: [PATCH kernel v3 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
Date: Thu, 5 Jul 2018 12:42:20 +1000
Message-ID: <20180705024220.GF3450@umbus.fritz.box>
In-Reply-To: <20180704050052.20045-3-aik@ozlabs.ru>
On Wed, Jul 04, 2018 at 03:00:52PM +1000, Alexey Kardashevskiy wrote:
> A VM which has:
> - a DMA capable device passed through to it (eg. network card);
> - running a malicious kernel that ignores H_PUT_TCE failure;
> - capability of using IOMMU pages bigger than physical pages
> can create an IOMMU mapping that exposes (for example) 16MB of
> the host physical memory to the device when only 64K was allocated to the VM.
>
> The remaining 16MB - 64K will be some other content of host memory, possibly
> including pages of the VM, but also pages of host kernel memory, host
> programs or other VMs.
>
> The attacking VM does not control the location of the page it can map,
> and is only allowed to map as many pages as it has pages of RAM.
>
> We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
> an IOMMU page is contained in the physical page so the PCI hardware won't
> get access to unassigned host memory; however this check is missing in
> the KVM fastpath (H_PUT_TCE accelerated code). We have been lucky so far
> and have not hit this yet: the very first time a mapping happens,
> tbl::it_userspace is not yet allocated, so we fall back to userspace,
> which in turn calls the VFIO IOMMU driver; that fails and the guest
> does not retry.
>
> This stores the smallest preregistered page size in the preregistered
> region descriptor and changes the mm_iommu_xxx API to check this against
> the IOMMU page size. This only allows huge page use if the entire
> preregistered block is backed with huge pages which are completely
> contained within the preregistered chunk; otherwise this defaults to PAGE_SIZE.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
On the grounds that I think this version is safe, which the old one
wasn't. However, it still has some flaws:
[snip]
> @@ -125,7 +126,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> {
> struct mm_iommu_table_group_mem_t *mem;
> long i, j, ret = 0, locked_entries = 0;
> - struct page *page = NULL;
> + unsigned int pageshift;
> + struct page *page = NULL, *head = NULL;
>
> mutex_lock(&mem_list_mutex);
>
> @@ -159,6 +161,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> goto unlock_exit;
> }
>
> + mem->pageshift = 64;
> mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0])));
> if (!mem->hpas) {
> kfree(mem);
> @@ -199,9 +202,35 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> }
> }
> populate:
> + pageshift = PAGE_SHIFT;
> + if (PageCompound(page)) {
> + /* Make sure huge page is contained completely */
> + struct page *tmphead = compound_head(page);
> + unsigned int n = compound_order(tmphead);
> +
> + if (!head) {
> + /* Is it a head of a huge page? */
> + if (page == tmphead) {
> + head = tmphead;
> + pageshift += n;
> + }
> + } else if (head == tmphead) {
> + /* Still same huge page, good */
> + pageshift += n;
> +
> + /* End of the huge page */
> + if (page - head == (1UL << n) - 1)
> + head = NULL;
> + }
> + }
> + mem->pageshift = min(mem->pageshift, pageshift);
> mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
> }
>
> + /* We have an incomplete huge page, default to PAGE_SHIFT */
> + if (head)
> + mem->pageshift = PAGE_SHIFT;
> +
So, if the user attempts to prereg a region which starts or ends in
the middle of a hugepage, this logic will clamp the region's max page
shift down to PAGE_SHIFT. That's safe, but not optimal.
Suppose userspace had an area backed with 16MiB hugepages, and wanted
to pre-reg a window that was 2MiB aligned, but not 16MiB aligned. It
would still be safe to allow 2MiB TCEs, but the code above would clamp
it down to 64kiB (or 4kiB).
The code to do it is also pretty convoluted.
I think you'd be better off initializing mem->pageshift to the largest
possible natural alignment of the region:
mem->pageshift = ctz64(ua | (entries << PAGE_SHIFT));
Then it should just be sufficient to clamp pageshift down to
compound_order() + PAGE_SHIFT for each entry.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson