public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
To: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: kvm@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nrb@linux.ibm.com,
	seiden@linux.ibm.com, schlameuss@linux.ibm.com,
	hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com,
	david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: Re: [PATCH v2 03/20] KVM: s390: Add gmap_helper_set_unused()
Date: Mon, 27 Oct 2025 19:00:15 +0100	[thread overview]
Message-ID: <af9c7f4aedf71896dd2a9dd80837aae8f3428f93.camel@linux.ibm.com> (raw)
In-Reply-To: <20250915133340.5b7c7d55@p-imbrenda>

On Mon, 2025-09-15 at 13:33 +0200, Claudio Imbrenda wrote:
> On Fri, 12 Sep 2025 11:17:02 +0200
> Nina Schoetterl-Glausch <nsg@linux.ibm.com> wrote:
> 
> > On Wed, 2025-09-10 at 20:07 +0200, Claudio Imbrenda wrote:
> > > Add gmap_helper_set_unused() to mark userspace ptes as unused.
> > > 
> > > Core mm code will use that information to discard unused pages instead
> > > of attempting to swap them.
> > > 
> > > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>  
> > 
> > LGTM
> > > ---
> > >  arch/s390/include/asm/gmap_helpers.h |  1 +
> > >  arch/s390/mm/gmap_helpers.c          | 64 ++++++++++++++++++++++++++++
> > >  2 files changed, 65 insertions(+)
> > > 
> > > diff --git a/arch/s390/include/asm/gmap_helpers.h b/arch/s390/include/asm/gmap_helpers.h
> > > index 5356446a61c4..459bd39d0887 100644
> > > --- a/arch/s390/include/asm/gmap_helpers.h
> > > +++ b/arch/s390/include/asm/gmap_helpers.h
> > > @@ -11,5 +11,6 @@
> > >  void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr);
> > >  void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned long end);
> > >  int gmap_helper_disable_cow_sharing(void);
> > > +void gmap_helper_set_unused(struct mm_struct *mm, unsigned long vmaddr);
> > >  
> > >  #endif /* _ASM_S390_GMAP_HELPERS_H */
> > > diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
> > > index a45d417ad951..69ffc0c6b654 100644
> > > --- a/arch/s390/mm/gmap_helpers.c
> > > +++ b/arch/s390/mm/gmap_helpers.c
> > > @@ -91,6 +91,70 @@ void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned lo
> > >  }
> > >  EXPORT_SYMBOL_GPL(gmap_helper_discard);
> > >  
> > > +/**
> > > + * gmap_helper_set_unused() - mark a pte entry as unused
> > > + * @mm: the mm
> > > + * @vmaddr: the userspace address whose pte is to be marked
> > > + *
> > > + * Mark the pte corresponding the given address as unused. This will cause
> > > + * core mm code to just drop this page instead of swapping it.
> > > + *
> > > + * This function needs to be called with interrupts disabled (for example
> > > + * while holding a spinlock), or while holding the mmap lock. Normally this
> > > + * function is called as a result of an unmap operation, and thus KVM common
> > > + * code will already hold kvm->mmu_lock in write mode.
> > > + *
> > > + * Context: Needs to be called while holding the mmap lock or with interrupts
> > > + *          disabled.
> > > + */
> > > +void gmap_helper_set_unused(struct mm_struct *mm, unsigned long vmaddr)  
> > 
> > Can you give this a better name? E.g. gmap_helper_try_set_pte_unused
> 
> yes
> 
> > 
> > > +{
> > > +	pmd_t *pmdp, pmd, pmdval;
> > > +	pud_t *pudp, pud;
> > > +	p4d_t *p4dp, p4d;
> > > +	pgd_t *pgdp, pgd;
> > > +	spinlock_t *ptl;
> > > +	pte_t *ptep;
> > > +
> > > +	pgdp = pgd_offset(mm, vmaddr);
> > > +	pgd = pgdp_get(pgdp);
> > > +	if (pgd_none(pgd) || !pgd_present(pgd))
> > > +		return;
> > > +
> > > +	p4dp = p4d_offset(pgdp, vmaddr);
> > > +	p4d = p4dp_get(p4dp);
> > > +	if (p4d_none(p4d) || !p4d_present(p4d))
> > > +		return;
> > > +
> > > +	pudp = pud_offset(p4dp, vmaddr);
> > > +	pud = pudp_get(pudp);
> > > +	if (pud_none(pud) || pud_leaf(pud) || !pud_present(pud))
> > > +		return;
> > > +
> > > +	pmdp = pmd_offset(pudp, vmaddr);
> > > +	pmd = pmdp_get_lockless(pmdp);
> > > +	if (pmd_none(pmd) || pmd_leaf(pmd) || !pmd_present(pmd))
> > > +		return;
> > > +
> > > +	ptep = pte_offset_map_rw_nolock(mm, pmdp, vmaddr, &pmdval, &ptl);
> > > +	if (!ptep)
> > > +		return;
> > > +
> > > +	if (spin_trylock(ptl)) {  
> > 
> > Missing the comment you promised :) about deadlock prevention.
> 
> Ooops! will fix
> 
> > 
> > > +		/*
> > > +		 * Make sure the pte we are touching is still the correct
> > > +		 * one. In theory this check should not be needed, but  
> > 
> > Why should it not be needed? I.e. why should we be protected against modification?
> 
> I will add this in a comment:
> 
> a pmd pointing to a page table can change in only very few cases, and
> all cases will take the mm->mmap_lock in write mode and require IPC
> synchronization, which means that as long as interrupts are disabled or
> we are holding the mmap_lock, the pmd cannot change under our feet.
> 
> by keeping interrupts disabled, we are basically stalling any remote
> CPUs that might want to change the pmd, as the IPC will not complete
> until we re-enable them.
> 
> in our case, we call this function after calling pgste_get_lock(),
> which will disable interrupts until pgste_set_unlock() is called.
> 
> > > +		 * better safe than sorry.
> > > +		 */
> > > +		if (likely(pmd_same(pmdval, pmdp_get_lockless(pmdp))))
> > > +			__atomic64_or(_PAGE_UNUSED, (long *)ptep);

Should we do a WARN_ONCE? Would we be notified if it shows up in CI?
Is the "if" even meaningful? That is, if the pmd is not the same for what
ever reason, are we guaranteed to see it here.
Does taking the page table lock also lock the pmd table? (I assume not)
What are the consequences of the or if the pmd is not the same, arbitrary memory corruption?

> > > +		spin_unlock(ptl);
> > > +	}
> > > +
> > > +	pte_unmap(ptep);
> > > +}
> > > +EXPORT_SYMBOL_GPL(gmap_helper_set_unused);
> > > +
> > >  static int find_zeropage_pte_entry(pte_t *pte, unsigned long addr,
> > >  				   unsigned long end, struct mm_walk *walk)
> > >  {  
> > 

-- 
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

  reply	other threads:[~2025-10-27 18:01 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10 18:07 [PATCH v2 00/20] KVM: s390: gmap rewrite, the real deal Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 01/20] KVM: s390: add P bit in table entry bitfields, move union vaddress Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 02/20] s390: Move sske_frame() to a header Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 03/20] KVM: s390: Add gmap_helper_set_unused() Claudio Imbrenda
2025-09-11  8:38   ` Nico Boehr
2025-09-12  9:17   ` Nina Schoetterl-Glausch
2025-09-15 11:33     ` Claudio Imbrenda
2025-10-27 18:00       ` Nina Schoetterl-Glausch [this message]
2025-09-10 18:07 ` [PATCH v2 04/20] KVM: s390: Enable KVM_GENERIC_MMU_NOTIFIER Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 05/20] KVM: s390: Add helper functions for fault handling Claudio Imbrenda
2025-09-12 17:56   ` Nina Schoetterl-Glausch
2025-09-15 11:49     ` Claudio Imbrenda
2025-09-18 14:19   ` Alexander Gordeev
2025-09-18 14:46     ` Claudio Imbrenda
2025-09-18 14:41   ` Alexander Gordeev
2025-09-18 15:10     ` Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 06/20] KVM: s390: Rename some functions in gaccess.c Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 07/20] KVM: s390: KVM-specific bitfields and helper functions Claudio Imbrenda
2025-09-17 12:18   ` Heiko Carstens
2025-09-17 12:51     ` Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 08/20] KVM: s390: KVM page table management functions: allocation Claudio Imbrenda
2025-09-11  8:22   ` Janosch Frank
2025-09-11  8:43     ` Claudio Imbrenda
2025-09-16 16:26   ` Heiko Carstens
2025-09-16 16:47     ` Claudio Imbrenda
2025-09-16 17:01       ` Christian Borntraeger
2025-09-16 17:05         ` Claudio Imbrenda
2025-09-16 17:06           ` Christian Borntraeger
2025-09-16 17:36             ` Heiko Carstens
2025-09-17  7:27               ` Heiko Carstens
2025-09-17 11:25                 ` Claudio Imbrenda
2025-09-17 12:30                   ` Heiko Carstens
2025-09-17 13:11                     ` Claudio Imbrenda
2025-09-17 13:26                       ` Christian Borntraeger
2025-09-17 14:00                         ` Claudio Imbrenda
2025-09-17 14:05                           ` Christian Borntraeger
2025-09-17 14:11                             ` Claudio Imbrenda
2025-09-17 17:08                             ` Claudio Imbrenda
2025-09-17 13:31                       ` Heiko Carstens
2025-09-17 14:00                         ` Claudio Imbrenda
2025-09-17 12:12               ` Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 09/20] KVM: s390: KVM page table management functions: clear and replace Claudio Imbrenda
2025-09-11 12:57   ` Janosch Frank
2025-09-11 13:19     ` Claudio Imbrenda
2025-09-11 13:27       ` Janosch Frank
2025-09-16 15:56         ` Heiko Carstens
2025-09-16 16:47   ` Heiko Carstens
2025-09-16 17:04     ` Claudio Imbrenda
2025-09-16 17:27       ` Heiko Carstens
2025-09-10 18:07 ` [PATCH v2 10/20] KVM: s390: KVM page table management functions: walks Claudio Imbrenda
2025-09-11 12:56   ` Janosch Frank
2025-09-11 13:14     ` Claudio Imbrenda
2025-09-12  5:47       ` Gerd Bayer
2025-09-16 16:22   ` Heiko Carstens
2025-09-16 16:48     ` Claudio Imbrenda
2025-09-16 17:24       ` Heiko Carstens
2025-09-17 11:14         ` Claudio Imbrenda
2025-09-17 12:55   ` Heiko Carstens
2025-09-17 13:13     ` Claudio Imbrenda
2025-09-17 13:24       ` Heiko Carstens
2025-09-17 14:01         ` Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 11/20] KVM: s390: KVM page table management functions: storage keys Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 12/20] KVM: s390: KVM page table management functions: lifecycle management Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 13/20] KVM: s390: KVM page table management functions: CMMA Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 14/20] KVM: s390: New gmap code Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 15/20] KVM: s390: Stop using CONFIG_PGSTE Claudio Imbrenda
2025-09-16  7:45   ` Steffen Eiden
2025-09-10 18:07 ` [PATCH v2 16/20] KVM: s390: Switch to new gmap Claudio Imbrenda
2025-09-17 13:20   ` Heiko Carstens
2025-09-10 18:07 ` [PATCH v2 17/20] KVM: s390: Remove gmap from s390/mm Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 18/20] KVM: S390: Remove PGSTE code from linux/s390 mm Claudio Imbrenda
2025-09-16  7:30   ` Steffen Eiden
2025-09-16  9:24     ` Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 19/20] KVM: s390: Enable 1M pages for gmap Claudio Imbrenda
2025-09-10 18:07 ` [PATCH v2 20/20] KVM: s390: Storage key manipulation IOCTL Claudio Imbrenda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af9c7f4aedf71896dd2a9dd80837aae8f3428f93.camel@linux.ibm.com \
    --to=nsg@linux.ibm.com \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@de.ibm.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=nrb@linux.ibm.com \
    --cc=schlameuss@linux.ibm.com \
    --cc=seiden@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox