public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Janosch Frank <frankja@linux.ibm.com>
To: Claudio Imbrenda <imbrenda@linux.ibm.com>, kvm@vger.kernel.org
Cc: linux-s390@vger.kernel.org, borntraeger@de.ibm.com,
	schlameuss@linux.ibm.com, david@redhat.com, willy@infradead.org,
	hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com,
	gor@linux.ibm.com, nrb@linux.ibm.com, nsg@linux.ibm.com,
	seanjc@google.com, seiden@linux.ibm.com
Subject: Re: [PATCH v2 03/15] KVM: s390: move pv gmap functions into kvm
Date: Fri, 17 Jan 2025 14:38:08 +0100	[thread overview]
Message-ID: <3f0a1778-0617-4c8d-bc8f-40eae47570fb@linux.ibm.com> (raw)
In-Reply-To: <20250116113355.32184-4-imbrenda@linux.ibm.com>

On 1/16/25 12:33 PM, Claudio Imbrenda wrote:
> Move gmap related functions from kernel/uv into kvm.
> 
> Create a new file to collect gmap-related functions.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---

Thanks git, this is close to unreadable

>   /**
> - * should_export_before_import - Determine whether an export is needed
> - * before an import-like operation
> - * @uvcb: the Ultravisor control block of the UVC to be performed
> - * @mm: the mm of the process
> - *
> - * Returns whether an export is needed before every import-like operation.
> - * This is needed for shared pages, which don't trigger a secure storage
> - * exception when accessed from a different guest.
> - *
> - * Although considered as one, the Unpin Page UVC is not an actual import,
> - * so it is not affected.
> + * uv_wiggle_folio() - try to drain extra references to a folio
> + * @folio: the folio
> + * @split: whether to split a large folio
>    *
> - * No export is needed also when there is only one protected VM, because the
> - * page cannot belong to the wrong VM in that case (there is no "other VM"
> - * it can belong to).
> - *
> - * Return: true if an export is needed before every import, otherwise false.
> + * Context: Must be called while holding an extra reference to the folio;
> + *          the mm lock should not be held.
>    */
> -static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)
> +int uv_wiggle_folio(struct folio *folio, bool split)

Should I expect a drop of references to also split THPs?
Seems a bit odd to me but I do not know a lot about folios.

>   {
> -	/*
> -	 * The misc feature indicates, among other things, that importing a
> -	 * shared page from a different protected VM will automatically also
> -	 * transfer its ownership.
> -	 */
> -	if (uv_has_feature(BIT_UV_FEAT_MISC))
> -		return false;
> -	if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)
> -		return false;
> -	return atomic_read(&mm->context.protected_count) > 1;
> -}
> -

[...]

> -/*
> - * Requests the Ultravisor to make a page accessible to a guest.
> - * If it's brought in the first time, it will be cleared. If
> - * it has been exported before, it will be decrypted and integrity
> - * checked.
> - */
> -int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> -{
> -	struct vm_area_struct *vma;
> -	bool drain_lru_called = false;
> -	spinlock_t *ptelock;
> -	unsigned long uaddr;
> -	struct folio *folio;
> -	pte_t *ptep;
>   	int rc;
>   
> -again:
> -	rc = -EFAULT;
> -	mmap_read_lock(gmap->mm);
> -
> -	uaddr = __gmap_translate(gmap, gaddr);
> -	if (IS_ERR_VALUE(uaddr))
> -		goto out;
> -	vma = vma_lookup(gmap->mm, uaddr);
> -	if (!vma)
> -		goto out;
> -	/*
> -	 * Secure pages cannot be huge and userspace should not combine both.
> -	 * In case userspace does it anyway this will result in an -EFAULT for
> -	 * the unpack. The guest is thus never reaching secure mode. If
> -	 * userspace is playing dirty tricky with mapping huge pages later
> -	 * on this will result in a segmentation fault.
> -	 */
> -	if (is_vm_hugetlb_page(vma))
> -		goto out;
> -
> -	rc = -ENXIO;
> -	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
> -	if (!ptep)
> -		goto out;
> -	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
> -		folio = page_folio(pte_page(*ptep));
> -		rc = -EAGAIN;
> -		if (folio_test_large(folio)) {
> -			rc = -E2BIG;
> -		} else if (folio_trylock(folio)) {
> -			if (should_export_before_import(uvcb, gmap->mm))
> -				uv_convert_from_secure(PFN_PHYS(folio_pfn(folio)));
> -			rc = make_folio_secure(folio, uvcb);
> -			folio_unlock(folio);
> -		}
> -
> -		/*
> -		 * Once we drop the PTL, the folio may get unmapped and
> -		 * freed immediately. We need a temporary reference.
> -		 */
> -		if (rc == -EAGAIN || rc == -E2BIG)
> -			folio_get(folio);
> -	}
> -	pte_unmap_unlock(ptep, ptelock);
> -out:
> -	mmap_read_unlock(gmap->mm);
> -
> -	switch (rc) {
> -	case -E2BIG:
> +	folio_wait_writeback(folio);

Add an assert_not_held for the mm mutex above 
"folio_wait_writeback(folio);"?

> +	if (split) {
>   		folio_lock(folio);
>   		rc = split_folio(folio);
>   		folio_unlock(folio);
> -		folio_put(folio);
> -
> -		switch (rc) {
> -		case 0:
> -			/* Splitting succeeded, try again immediately. */
> -			goto again;
> -		case -EAGAIN:
> -			/* Additional folio references. */
> -			if (drain_lru(&drain_lru_called))
> -				goto again;
> -			return -EAGAIN;
> -		case -EBUSY:
> -			/* Unexpected race. */
> +
> +		if (rc == -EBUSY)
>   			return -EAGAIN;
> -		}
> -		WARN_ON_ONCE(1);
> -		return -ENXIO;
> -	case -EAGAIN:
> -		/*
> -		 * If we are here because the UVC returned busy or partial
> -		 * completion, this is just a useless check, but it is safe.
> -		 */
> -		folio_wait_writeback(folio);
> -		folio_put(folio);
> -		return -EAGAIN;
> -	case -EBUSY:
> -		/* Additional folio references. */
> -		if (drain_lru(&drain_lru_called))
> -			goto again;
> -		return -EAGAIN;
> -	case -ENXIO:
> -		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
> -			return -EFAULT;
> -		return -EAGAIN;
> +		if (rc != -EAGAIN)
> +			return rc;
>   	}
> -	return rc;
> -}
> -EXPORT_SYMBOL_GPL(gmap_make_secure);
> -
> -int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
> -{
> -	struct uv_cb_cts uvcb = {
> -		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> -		.header.len = sizeof(uvcb),
> -		.guest_handle = gmap->guest_handle,
> -		.gaddr = gaddr,
> -	};
> -
> -	return gmap_make_secure(gmap, gaddr, &uvcb);
> -}
> -EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
> -
> -/**
> - * gmap_destroy_page - Destroy a guest page.
> - * @gmap: the gmap of the guest
> - * @gaddr: the guest address to destroy
> - *
> - * An attempt will be made to destroy the given guest page. If the attempt
> - * fails, an attempt is made to export the page. If both attempts fail, an
> - * appropriate error is returned.
> - */
> -int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
> -{
> -	struct vm_area_struct *vma;
> -	struct folio_walk fw;
> -	unsigned long uaddr;
> -	struct folio *folio;
> -	int rc;
> -
> -	rc = -EFAULT;
> -	mmap_read_lock(gmap->mm);
> -
> -	uaddr = __gmap_translate(gmap, gaddr);
> -	if (IS_ERR_VALUE(uaddr))
> -		goto out;
> -	vma = vma_lookup(gmap->mm, uaddr);
> -	if (!vma)
> -		goto out;
> -	/*
> -	 * Huge pages should not be able to become secure
> -	 */
> -	if (is_vm_hugetlb_page(vma))
> -		goto out;
> -
> -	rc = 0;
> -	folio = folio_walk_start(&fw, vma, uaddr, 0);
> -	if (!folio)
> -		goto out;
> -	/*
> -	 * See gmap_make_secure(): large folios cannot be secure. Small
> -	 * folio implies FW_LEVEL_PTE.
> -	 */
> -	if (folio_test_large(folio) || !pte_write(fw.pte))
> -		goto out_walk_end;
> -	rc = uv_destroy_folio(folio);
> -	/*
> -	 * Fault handlers can race; it is possible that two CPUs will fault
> -	 * on the same secure page. One CPU can destroy the page, reboot,
> -	 * re-enter secure mode and import it, while the second CPU was
> -	 * stuck at the beginning of the handler. At some point the second
> -	 * CPU will be able to progress, and it will not be able to destroy
> -	 * the page. In that case we do not want to terminate the process,
> -	 * we instead try to export the page.
> -	 */
> -	if (rc)
> -		rc = uv_convert_from_secure_folio(folio);
> -out_walk_end:
> -	folio_walk_end(&fw, vma);
> -out:
> -	mmap_read_unlock(gmap->mm);
> -	return rc;
> +	lru_add_drain_all();
> +	return -EAGAIN;
>   }
> -EXPORT_SYMBOL_GPL(gmap_destroy_page);
> +EXPORT_SYMBOL_GPL(uv_wiggle_folio);
>   
>   /*
>    * To be called with the folio locked or with an extra reference! This will
> diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
> index 02217fb4ae10..d972dea657fd 100644
> --- a/arch/s390/kvm/Makefile
> +++ b/arch/s390/kvm/Makefile
> @@ -8,7 +8,7 @@ include $(srctree)/virt/kvm/Makefile.kvm
>   ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
>   
>   kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
> -kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o
> +kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap.o
>   
>   kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
>   obj-$(CONFIG_KVM) += kvm.o
> diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
> new file mode 100644
> index 000000000000..c0911a863902
> --- /dev/null
> +++ b/arch/s390/kvm/gmap.c
> @@ -0,0 +1,196 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Guest memory management for KVM/s390
> + *
> + * Copyright IBM Corp. 2008, 2020, 2024
> + *
> + *    Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>
> + *               Martin Schwidefsky <schwidefsky@de.ibm.com>
> + *               David Hildenbrand <david@redhat.com>
> + *               Janosch Frank <frankja@linux.vnet.ibm.com>
> + */
> +
> +#include <linux/compiler.h>
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +#include <linux/pgtable.h>
> +#include <linux/pagemap.h>
> +
> +#include <asm/lowcore.h>
> +#include <asm/gmap.h>
> +#include <asm/uv.h>
> +
> +#include "gmap.h"
> +
> +/**
> + * should_export_before_import - Determine whether an export is needed
> + * before an import-like operation
> + * @uvcb: the Ultravisor control block of the UVC to be performed
> + * @mm: the mm of the process
> + *
> + * Returns whether an export is needed before every import-like operation.
> + * This is needed for shared pages, which don't trigger a secure storage
> + * exception when accessed from a different guest.
> + *
> + * Although considered as one, the Unpin Page UVC is not an actual import,
> + * so it is not affected.
> + *
> + * No export is needed also when there is only one protected VM, because the
> + * page cannot belong to the wrong VM in that case (there is no "other VM"
> + * it can belong to).
> + *
> + * Return: true if an export is needed before every import, otherwise false.
> + */
> +static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)
> +{
> +	/*
> +	 * The misc feature indicates, among other things, that importing a
> +	 * shared page from a different protected VM will automatically also
> +	 * transfer its ownership.
> +	 */
> +	if (uv_has_feature(BIT_UV_FEAT_MISC))
> +		return false;
> +	if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)
> +		return false;
> +	return atomic_read(&mm->context.protected_count) > 1;
> +}
> +
> +static int __gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)
> +{
> +	struct folio *folio = page_folio(page);
> +	int rc;
> +
> +	/*
> +	 * Secure pages cannot be huge and userspace should not combine both.
> +	 * In case userspace does it anyway this will result in an -EFAULT for
> +	 * the unpack. The guest is thus never reaching secure mode.
> +	 * If userspace plays dirty tricks and decides to map huge pages at a
> +	 * later point in time, it will receive a segmentation fault or
> +	 * KVM_RUN will return -EFAULT.
> +	 */
> +	if (folio_test_hugetlb(folio))
> +		return -EFAULT;
> +	if (folio_test_large(folio)) {
> +		mmap_read_unlock(gmap->mm);
> +		rc = uv_wiggle_folio(folio, true);
> +		mmap_read_lock(gmap->mm);
> +		if (rc)
> +			return rc;
> +		folio = page_folio(page);
> +	}
> +
> +	rc = -EAGAIN;
> +	if (folio_trylock(folio)) {

If you test for !folio_trylock() you could do an early return, no?

> +		if (should_export_before_import(uvcb, gmap->mm))
> +			uv_convert_from_secure(folio_to_phys(folio));
> +		rc = make_folio_secure(folio, uvcb);
> +		folio_unlock(folio);
> +	}
> +
> +	/*
> +	 * Unlikely case: the page is not mapped anymore. Return success
> +	 * and let the proper fault handler fault in the page again.
> +	 */
> +	if (rc == -ENXIO)
> +		return 0;
> +	/* The folio has too many references, try to shake some off */
> +	if (rc == -EBUSY) {
> +		mmap_read_unlock(gmap->mm);
> +		uv_wiggle_folio(folio, false);
> +		mmap_read_lock(gmap->mm);
> +		return -EAGAIN;
> +	}
> +
> +	return rc;
> +}
>

  reply	other threads:[~2025-01-17 13:38 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-16 11:33 [PATCH v2 00/15] KVM: s390: Stop using page->index and other things Claudio Imbrenda
2025-01-16 11:33 ` [PATCH v2 01/15] KVM: Do not restrict the size of KVM-internal memory regions Claudio Imbrenda
2025-01-16 12:17   ` Christoph Schlameuss
2025-01-16 11:33 ` [PATCH v2 02/15] KVM: s390: wrapper for KVM_BUG Claudio Imbrenda
2025-01-17 16:36   ` Steffen Eiden
2025-01-16 11:33 ` [PATCH v2 03/15] KVM: s390: move pv gmap functions into kvm Claudio Imbrenda
2025-01-17 13:38   ` Janosch Frank [this message]
2025-01-17 13:49     ` Claudio Imbrenda
2025-01-17 16:04   ` Janosch Frank
2025-01-16 11:33 ` [PATCH v2 04/15] KVM: s390: fake memslot for ucontrol VMs Claudio Imbrenda
2025-01-16 12:42   ` Janosch Frank
2025-01-16 12:46     ` Claudio Imbrenda
2025-01-16 11:33 ` [PATCH v2 05/15] KVM: s390: selftests: fix ucontrol memory region test Claudio Imbrenda
2025-01-16 11:33 ` [PATCH v2 06/15] KVM: s390: use __kvm_faultin_pfn() Claudio Imbrenda
2025-01-17 16:20   ` Janosch Frank
2025-01-16 11:33 ` [PATCH v2 07/15] KVM: s390: get rid of gmap_fault() Claudio Imbrenda
2025-01-17 16:22   ` Janosch Frank
2025-01-16 11:33 ` [PATCH v2 08/15] KVM: s390: get rid of gmap_translate() Claudio Imbrenda
2025-01-17 16:29   ` Janosch Frank
2025-01-16 11:33 ` [PATCH v2 09/15] KVM: s390: move some gmap shadowing functions away from mm/gmap.c Claudio Imbrenda
2025-01-17 16:41   ` Janosch Frank
2025-01-16 11:33 ` [PATCH v2 10/15] KVM: s390: stop using page->index for non-shadow gmaps Claudio Imbrenda
2025-01-17 16:52   ` Janosch Frank
2025-01-16 11:33 ` [PATCH v2 11/15] KVM: s390: stop using lists to keep track of used dat tables Claudio Imbrenda
2025-01-20 12:21   ` Janosch Frank
2025-01-16 11:33 ` [PATCH v2 12/15] KVM: s390: move gmap_shadow_pgt_lookup() into kvm Claudio Imbrenda
2025-01-17 12:58   ` Steffen Eiden
2025-01-20 13:47   ` Janosch Frank
2025-01-20 13:54     ` Claudio Imbrenda
2025-01-16 11:33 ` [PATCH v2 13/15] KVM: s390: remove useless page->index usage Claudio Imbrenda
2025-01-20 12:25   ` Janosch Frank
2025-01-16 11:33 ` [PATCH v2 14/15] KVM: s390: move PGSTE softbits Claudio Imbrenda
2025-01-17 16:11   ` Steffen Eiden
2025-01-16 11:33 ` [PATCH v2 15/15] KVM: s390: remove the last user of page->index Claudio Imbrenda
2025-01-17 16:34   ` Steffen Eiden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3f0a1778-0617-4c8d-bc8f-40eae47570fb@linux.ibm.com \
    --to=frankja@linux.ibm.com \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@de.ibm.com \
    --cc=david@redhat.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=nrb@linux.ibm.com \
    --cc=nsg@linux.ibm.com \
    --cc=schlameuss@linux.ibm.com \
    --cc=seanjc@google.com \
    --cc=seiden@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox