public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-s390@vger.kernel.org, frankja@linux.ibm.com,
	borntraeger@de.ibm.com
Subject: Re: [GIT PULL v2 09/20] KVM: s390: move pv gmap functions into kvm
Date: Wed, 12 Feb 2025 18:45:38 +0100	[thread overview]
Message-ID: <20250212184538.3c79d608@p-imbrenda> (raw)
In-Reply-To: <d5ef124a-d353-4074-925e-a2721be3ce5d@redhat.com>

On Wed, 12 Feb 2025 17:55:18 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 31.01.25 12:24, Claudio Imbrenda wrote:
> > Move gmap related functions from kernel/uv into kvm.
> > 
> > Create a new file to collect gmap-related functions.
> > 
> > Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
> > Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
> > [fixed unpack_one(), thanks mhartmay@linux.ibm.com]
> > Link: https://lore.kernel.org/r/20250123144627.312456-6-imbrenda@linux.ibm.com
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > Message-ID: <20250123144627.312456-6-imbrenda@linux.ibm.com>
> > ---  
> 
> This patch breaks large folio splitting because you end up un-refing
> the wrong folios after a split; I tried to make it work, but either
> because of other changes in this patch (or in others), I
> cannot get it to work and have to give up for today.

yes, I had also noticed that and I already have a fix ready. In fact my
fix was exactly like yours, except that I did not pass the struct folio
anymore to kvm_s390_wiggle_split_folio(), but instead I only pass a
page and use page_folio() at the beginning, and I use
split_huge_page_to_list_to_order() directly instead of split_folio()
 
unfortunately the fix does not fix the issue I'm seeing....

but putting printks everywhere seems to solve the issue, so it seems to
be a race somewhere

> 
> Running a simple VM backed by memory-backend-memfd
> (which now uses large folios) no longer works (random refcount underflows /
> freeing of wrong pages).
> 
> 
> 
> The following should be required (but according to my testing insufficient):
> 
>  From 71fafff5183c637f20830f6f346e8c9f3eafeb59 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Wed, 12 Feb 2025 16:00:32 +0100
> Subject: [PATCH] KVM: s390: fix splitting of large folios
> 
> If we end up splitting the large folio, doing a put_page() will drop the
> wrong reference (unless it was the head page), because we are holding a
> reference to the old (large) folio. Similarly, doing another
> page_folio() after the split is wrong.
> 
> The result is that we end up freeing a page that is still mapped+used.
> 
> To fix it, let's pass the page and call split_huge_page() instead.
> 
> As an alternative, we could convert all code to use folios, and to
> look up the page again from the page table after our split; however, in
> context of non-uniform folio splits [1], it make sense to pass the page
> where we really want to split.
> 
> [1] https://lkml.kernel.org/r/20250211155034.268962-1-ziy@nvidia.com
> 
> Fixes: 5cbe24350b7d ("KVM: s390: move pv gmap functions into kvm")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   arch/s390/include/asm/gmap.h |  3 ++-
>   arch/s390/kvm/gmap.c         |  4 ++--
>   arch/s390/mm/gmap.c          | 13 +++++++++++--
>   3 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 4e73ef46d4b2a..0efa087778135 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -139,7 +139,8 @@ int s390_replace_asce(struct gmap *gmap);
>   void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
>   int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
>   			    unsigned long end, bool interruptible);
> -int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split);
> +int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio,
> +		struct page *page, bool split);
>   unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level);
>   
>   /**
> diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
> index 02adf151d4de4..c2523c63afea3 100644
> --- a/arch/s390/kvm/gmap.c
> +++ b/arch/s390/kvm/gmap.c
> @@ -72,7 +72,7 @@ static int __gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)
>   		return -EFAULT;
>   	if (folio_test_large(folio)) {
>   		mmap_read_unlock(gmap->mm);
> -		rc = kvm_s390_wiggle_split_folio(gmap->mm, folio, true);
> +		rc = kvm_s390_wiggle_split_folio(gmap->mm, folio, page, true);
>   		mmap_read_lock(gmap->mm);
>   		if (rc)
>   			return rc;
> @@ -100,7 +100,7 @@ static int __gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)
>   	/* The folio has too many references, try to shake some off */
>   	if (rc == -EBUSY) {
>   		mmap_read_unlock(gmap->mm);
> -		kvm_s390_wiggle_split_folio(gmap->mm, folio, false);
> +		kvm_s390_wiggle_split_folio(gmap->mm, folio, page, false);
>   		mmap_read_lock(gmap->mm);
>   		return -EAGAIN;
>   	}
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 94d9277858009..3180ad90a255a 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2631,12 +2631,18 @@ EXPORT_SYMBOL_GPL(s390_replace_asce);
>    * kvm_s390_wiggle_split_folio() - try to drain extra references to a folio and optionally split
>    * @mm:    the mm containing the folio to work on
>    * @folio: the folio
> + * @page:  the folio page where to split the folio
>    * @split: whether to split a large folio
>    *
> + * If a split of a large folio was requested, the original provided folio must
> + * no longer be used if this function returns 0. The new folio must be looked
> + * up using page_folio(), to which we will then hold a reference.
> + *
>    * Context: Must be called while holding an extra reference to the folio;
>    *          the mm lock should not be held.
>    */
> -int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split)
> +int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio,
> +		struct page *page, bool split)
>   {
>   	int rc;
>   
> @@ -2645,7 +2651,10 @@ int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool
>   	lru_add_drain_all();
>   	if (split) {
>   		folio_lock(folio);
> -		rc = split_folio(folio);
> +		/* Careful: split_folio() would be wrong. */
> +		rc = split_huge_page(page);
> +		if (!rc)
> +			folio = page_folio(page);
>   		folio_unlock(folio);
>   
>   		if (rc != -EBUSY)


  reply	other threads:[~2025-02-12 17:45 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-31 11:24 [GIT PULL v2 00/20] KVM: s390: some non-trivial fixes and cleanups for 6.14 Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 01/20] KVM: s390: vsie: fix some corner-cases when grabbing vsie pages Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 02/20] KVM: s390: vsie: stop using page->index Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 03/20] KVM: s390: vsie: stop messing with page refcount Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 04/20] KVM: s390: vsie: stop using "struct page" for vsie page Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 05/20] KVM: Do not restrict the size of KVM-internal memory regions Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 06/20] KVM: s390: wrapper for KVM_BUG Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 07/20] KVM: s390: fake memslot for ucontrol VMs Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 08/20] KVM: s390: selftests: fix ucontrol memory region test Claudio Imbrenda
2025-01-31 11:24 ` [GIT PULL v2 09/20] KVM: s390: move pv gmap functions into kvm Claudio Imbrenda
2025-02-12 16:55   ` David Hildenbrand
2025-02-12 17:45     ` Claudio Imbrenda [this message]
2025-02-12 18:14       ` David Hildenbrand
2025-02-13 10:02         ` David Hildenbrand
2025-01-31 11:25 ` [GIT PULL v2 10/20] KVM: s390: use __kvm_faultin_pfn() Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 11/20] KVM: s390: get rid of gmap_fault() Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 12/20] KVM: s390: get rid of gmap_translate() Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 13/20] KVM: s390: move some gmap shadowing functions away from mm/gmap.c Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 14/20] KVM: s390: stop using page->index for non-shadow gmaps Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 15/20] KVM: s390: stop using lists to keep track of used dat tables Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 16/20] KVM: s390: move gmap_shadow_pgt_lookup() into kvm Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 17/20] KVM: s390: remove useless page->index usage Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 18/20] KVM: s390: move PGSTE softbits Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 19/20] KVM: s390: remove the last user of page->index Claudio Imbrenda
2025-01-31 11:25 ` [GIT PULL v2 20/20] KVM: s390: selftests: Streamline uc_skey test to issue iske after sske Claudio Imbrenda
2025-02-06  9:39 ` [GIT PULL v2 00/20] KVM: s390: some non-trivial fixes and cleanups for 6.14 Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250212184538.3c79d608@p-imbrenda \
    --to=imbrenda@linux.ibm.com \
    --cc=borntraeger@de.ibm.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox