The Linux Kernel Mailing List
* [PATCH v2 0/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit
@ 2026-06-15  9:17 Claudio Imbrenda
  2026-06-15  9:17 ` [PATCH v2 1/1] " Claudio Imbrenda
  0 siblings, 1 reply; 8+ messages in thread
From: Claudio Imbrenda @ 2026-06-15  9:17 UTC
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

Fix handling of the _PAGE_UNUSED pte bit. The bit lingered longer than it
should have, so pages that were back in use could be discarded as if they
were unused, corrupting the guest.

This patch replaces "s390/pgtable: Unconditionally clear _PAGE_UNUSED"
which also solved the issue in practice, but in the wrong way.

v1->v2:
* Completely different approach, entirely new patch

Claudio Imbrenda (1):
  s390/mm: Fix handling of _PAGE_UNUSED pte bit

 arch/s390/mm/gmap_helpers.c | 4 ++--
 arch/s390/mm/pgtable.c      | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.54.0



* [PATCH v2 1/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit
  2026-06-15  9:17 [PATCH v2 0/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
@ 2026-06-15  9:17 ` Claudio Imbrenda
  2026-06-15  9:43   ` Heiko Carstens
  2026-06-15 16:03   ` Alexander Gordeev
  0 siblings, 2 replies; 8+ messages in thread
From: Claudio Imbrenda @ 2026-06-15  9:17 UTC
  To: linux-kernel
  Cc: kvm, linux-s390, borntraeger, frankja, david, seiden, nrb,
	schlameuss, gra, hca, gerald.schaefer, gor, agordeev, svens

The _PAGE_UNUSED softbit should not really be lying around. Its sole
purpose is to signal to try_to_unmap_one() and try_to_migrate_one()
that the page can be discarded instead of being moved / swapped.

KVM has no way to know why a page is being unmapped, so it sets the bit
on userspace ptes corresponding to unused guest pages every time they
get unmapped. KVM has no reasonable way to clear the bit once the page
is in use again.

Without appropriate cleanup, the _PAGE_UNUSED bit will linger around
and cause guest corruption when a used page is instead thrown out.

While set_ptes() checks and clears the bit, ptep_xchg_direct(),
ptep_xchg_lazy(), and ptep_modify_prot_commit() did not. This led to
used pages being thrown out as if they were unused, causing guest
corruption.

This patch fixes the issue by introducing the missing checks in the
above functions.

Also fix gmap_helper_try_set_pte_unused() to only set the bit if the
pte is present; the _PAGE_UNUSED bit is only defined for present ptes
and thus should not be set for non-present ptes.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: c98175b7917f ("KVM: s390: Add gmap_helper_set_unused()")
---
 arch/s390/mm/gmap_helpers.c | 4 ++--
 arch/s390/mm/pgtable.c      | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index 1cfe4724fbe2..5a7d6b9790e2 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -180,8 +180,8 @@ void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr)
 	ptep = try_get_locked_pte(mm, vmaddr, &ptl);
 	if (IS_ERR_OR_NULL(ptep))
 		return;
-
-	__atomic64_or(_PAGE_UNUSED, (long *)ptep);
+	if (pte_present(*ptep))
+		__atomic64_or(_PAGE_UNUSED, (long *)ptep);
 	pte_unmap_unlock(ptep, ptl);
 }
 EXPORT_SYMBOL_GPL(gmap_helper_try_set_pte_unused);
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 4acd8b140c4b..2acc79383e7d 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -122,6 +122,8 @@ pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr,
 
 	preempt_disable();
 	old = ptep_flush_direct(mm, addr, ptep, 1);
+	if (pte_present(new))
+		new = clear_pte_bit(new, __pgprot(_PAGE_UNUSED));
 	set_pte(ptep, new);
 	preempt_enable();
 	return old;
@@ -160,6 +162,8 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
 
 	preempt_disable();
 	old = ptep_flush_lazy(mm, addr, ptep, 1);
+	if (pte_present(new))
+		new = clear_pte_bit(new, __pgprot(_PAGE_UNUSED));
 	set_pte(ptep, new);
 	preempt_enable();
 	return old;
@@ -175,6 +179,8 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
 void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t *ptep, pte_t old_pte, pte_t pte)
 {
+	if (pte_present(pte))
+		pte = clear_pte_bit(pte, __pgprot(_PAGE_UNUSED));
 	set_pte(ptep, pte);
 }
 
-- 
2.54.0
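
The failure mode described in the commit message can be modelled in plain
userspace C. What follows is only a sketch under stated assumptions: every
`model_` / `MODEL_` name is hypothetical, the bit values are illustrative
rather than taken from the s390 headers, and a bare integer stands in for
pte_t. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; not the s390 definitions. */
#define MODEL_PAGE_PRESENT 0x001ULL
#define MODEL_PAGE_UNUSED  0x080ULL

typedef uint64_t model_pte_t;

static bool model_pte_present(model_pte_t pte)
{
	return (pte & MODEL_PAGE_PRESENT) != 0;
}

/* Mirrors the patched helper: the hint is only set on present entries. */
void model_try_set_unused(model_pte_t *ptep)
{
	if (model_pte_present(*ptep))
		*ptep |= MODEL_PAGE_UNUSED;
}

/* Exchange path without the fix: the new value is stored unchanged. */
model_pte_t model_xchg_unfixed(model_pte_t *ptep, model_pte_t new)
{
	model_pte_t old = *ptep;

	*ptep = new;
	return old;
}

/* Exchange path with the fix: a present entry never carries the hint. */
model_pte_t model_xchg_fixed(model_pte_t *ptep, model_pte_t new)
{
	model_pte_t old = *ptep;

	if (model_pte_present(new))
		new &= ~MODEL_PAGE_UNUSED;
	*ptep = new;
	return old;
}

/* Stand-in for the reclaim decision: discard rather than swap or move. */
bool model_would_discard(model_pte_t pte)
{
	return model_pte_present(pte) && (pte & MODEL_PAGE_UNUSED) != 0;
}
```

In this model, a read-modify-write sequence that derives the new value from
the old one carries the stale hint forward through model_xchg_unfixed(), so
model_would_discard() keeps returning true for a page that is in use again.
The same sequence through model_xchg_fixed() drops the hint.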



* Re: [PATCH v2 1/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit
  2026-06-15  9:17 ` [PATCH v2 1/1] " Claudio Imbrenda
@ 2026-06-15  9:43   ` Heiko Carstens
  2026-06-15 10:31     ` Claudio Imbrenda
  2026-06-15 16:03   ` Alexander Gordeev
  1 sibling, 1 reply; 8+ messages in thread
From: Heiko Carstens @ 2026-06-15  9:43 UTC
  To: Claudio Imbrenda
  Cc: linux-kernel, kvm, linux-s390, borntraeger, frankja, david,
	seiden, nrb, schlameuss, gra, gerald.schaefer, gor, agordeev,
	svens

On Mon, Jun 15, 2026 at 11:17:41AM +0200, Claudio Imbrenda wrote:
> The _PAGE_UNUSED softbit should not really be lying around. Its sole
> purpose is to signal to try_to_unmap_one() and try_to_migrate_one()
> that the page can be discarded instead of being moved / swapped.
> 
> KVM has no way to know why a page is being unmapped, so it sets the bit
> on userspace ptes corresponding to unused guest pages every time they
> get unmapped. KVM has no reasonable way to clear the bit once the page
> is in use again.
> 
> Without appropriate cleanup, the _PAGE_UNUSED bit will linger around
> and cause guest corruption when a used page is instead thrown out.
> 
> While set_ptes() checks and clears the bit, ptep_xchg_direct(),
> ptep_xchg_lazy(), and ptep_modify_prot_commit() did not. This led to
> used pages being thrown out as if they were unused, causing guest
> corruption.
> 
> This patch fixes the issue by introducing the missing checks in the
> above functions.
> 
> Also fix gmap_helper_try_set_pte_unused() to only set the bit if the
> pte is present; the _PAGE_UNUSED bit is only defined for present ptes
> and thus should not be set for non-present ptes.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Fixes: c98175b7917f ("KVM: s390: Add gmap_helper_set_unused()")
> ---
>  arch/s390/mm/gmap_helpers.c | 4 ++--
>  arch/s390/mm/pgtable.c      | 6 ++++++
>  2 files changed, 8 insertions(+), 2 deletions(-)

...

> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 4acd8b140c4b..2acc79383e7d 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -122,6 +122,8 @@ pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr,
>  
>  	preempt_disable();
>  	old = ptep_flush_direct(mm, addr, ptep, 1);
> +	if (pte_present(new))
> +		new = clear_pte_bit(new, __pgprot(_PAGE_UNUSED));
>  	set_pte(ptep, new);
>  	preempt_enable();
>  	return old;
> @@ -160,6 +162,8 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
>  
>  	preempt_disable();
>  	old = ptep_flush_lazy(mm, addr, ptep, 1);
> +	if (pte_present(new))
> +		new = clear_pte_bit(new, __pgprot(_PAGE_UNUSED));
>  	set_pte(ptep, new);
>  	preempt_enable();
>  	return old;
> @@ -175,6 +179,8 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
>  void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
>  			     pte_t *ptep, pte_t old_pte, pte_t pte)
>  {
> +	if (pte_present(pte))
> +		pte = clear_pte_bit(pte, __pgprot(_PAGE_UNUSED));
>  	set_pte(ptep, pte);

Can't we move the logic from set_ptes() to set_pte() instead? The above
approach reminds me of the open-coded removal of the no-exec bit that we had
in many places, which became a maintenance mess until it was rewritten.

The compiler _might_ even be clever enough to move the removal of the bit
outside the loop within set_ptes().
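
To illustrate the shape of this suggestion, here is a minimal userspace
sketch in which the cleanup lives in one low-level setter and every other
path goes through it. All `sketch_` / `SKETCH_` names and the bit values are
hypothetical; this is a model of the idea, not the s390 code, and it ignores
flushing, locking and preemption entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; not the s390 definitions. */
#define SKETCH_PAGE_PRESENT 0x001ULL
#define SKETCH_PAGE_UNUSED  0x080ULL
#define SKETCH_PAGE_SIZE    0x1000ULL

typedef uint64_t sketch_pte_t;

/* The single place where the stale hint is stripped from present entries. */
void sketch_set_pte(sketch_pte_t *ptep, sketch_pte_t pte)
{
	if (pte & SKETCH_PAGE_PRESENT)
		pte &= ~SKETCH_PAGE_UNUSED;
	*ptep = pte;
}

/* Batch setter: no cleanup of its own, it relies on sketch_set_pte(). */
void sketch_set_ptes(sketch_pte_t *ptep, sketch_pte_t pte, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		sketch_set_pte(&ptep[i], pte);
		pte += SKETCH_PAGE_SIZE;
	}
}

/* Exchange path: likewise needs no open-coded clearing. */
sketch_pte_t sketch_xchg(sketch_pte_t *ptep, sketch_pte_t new)
{
	sketch_pte_t old = *ptep;

	sketch_set_pte(ptep, new);
	return old;
}
```

With this layout a newly added caller cannot forget the check, which is the
maintenance argument made above. Whether a compiler hoists the masking out
of the loop in sketch_set_ptes() is not something this sketch demonstrates.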


* Re: [PATCH v2 1/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit
  2026-06-15  9:43   ` Heiko Carstens
@ 2026-06-15 10:31     ` Claudio Imbrenda
  2026-06-15 11:50       ` Heiko Carstens
  0 siblings, 1 reply; 8+ messages in thread
From: Claudio Imbrenda @ 2026-06-15 10:31 UTC
  To: Heiko Carstens
  Cc: linux-kernel, kvm, linux-s390, borntraeger, frankja, david,
	seiden, nrb, schlameuss, gra, gerald.schaefer, gor, agordeev,
	svens

On Mon, 15 Jun 2026 11:43:00 +0200
Heiko Carstens <hca@linux.ibm.com> wrote:

[...]

> > @@ -175,6 +179,8 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
> >  void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
> >  			     pte_t *ptep, pte_t old_pte, pte_t pte)
> >  {
> > +	if (pte_present(pte))
> > +		pte = clear_pte_bit(pte, __pgprot(_PAGE_UNUSED));
> >  	set_pte(ptep, pte);  
> 
> Can't we move the logic from set_ptes() to set_pte() instead? The above

set_pte() is also used for things that are not ptes, and in those cases
we probably don't want to touch that bit, although technically it is
currently unused for present large pmds and puds.

> approach reminds me of the open-coded removal of the no-exec bit that we had
> in many places, which became a maintenance mess until it was rewritten.
> 
> The compiler _might_ even be clever enough to move the removal of the bit
> outside the loop within set_ptes().



* Re: [PATCH v2 1/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit
  2026-06-15 10:31     ` Claudio Imbrenda
@ 2026-06-15 11:50       ` Heiko Carstens
  2026-06-15 12:09         ` Gerald Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Heiko Carstens @ 2026-06-15 11:50 UTC
  To: Claudio Imbrenda
  Cc: linux-kernel, kvm, linux-s390, borntraeger, frankja, david,
	seiden, nrb, schlameuss, gra, gerald.schaefer, gor, agordeev,
	svens

On Mon, Jun 15, 2026 at 12:31:03PM +0200, Claudio Imbrenda wrote:
> On Mon, 15 Jun 2026 11:43:00 +0200
> Heiko Carstens <hca@linux.ibm.com> wrote:
> 
> [...]
> 
> > > @@ -175,6 +179,8 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
> > >  void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
> > >  			     pte_t *ptep, pte_t old_pte, pte_t pte)
> > >  {
> > > +	if (pte_present(pte))
> > > +		pte = clear_pte_bit(pte, __pgprot(_PAGE_UNUSED));
> > >  	set_pte(ptep, pte);  
> > 
> > Can't we move the logic from set_ptes() to set_pte() instead? The above
> 
> set_pte() is also used for things that are not ptes, and in those cases
> we probably don't want to touch that bit, although technically it is
> currently unused for present large pmds and puds.

I can only see huge_pte_clear() for this.

If that's the only user I'd rather add a BUG_ON() there instead of starting to
sprinkle the logic around. This _will_ break sooner or later.


* Re: [PATCH v2 1/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit
  2026-06-15 11:50       ` Heiko Carstens
@ 2026-06-15 12:09         ` Gerald Schaefer
  2026-06-16 11:06           ` Heiko Carstens
  0 siblings, 1 reply; 8+ messages in thread
From: Gerald Schaefer @ 2026-06-15 12:09 UTC
  To: Heiko Carstens
  Cc: Claudio Imbrenda, linux-kernel, kvm, linux-s390, borntraeger,
	frankja, david, seiden, nrb, schlameuss, gra, gor, agordeev,
	svens

On Mon, 15 Jun 2026 13:50:00 +0200
Heiko Carstens <hca@linux.ibm.com> wrote:

> On Mon, Jun 15, 2026 at 12:31:03PM +0200, Claudio Imbrenda wrote:
> > On Mon, 15 Jun 2026 11:43:00 +0200
> > Heiko Carstens <hca@linux.ibm.com> wrote:
> > 
> > [...]
> > 
> > > > @@ -175,6 +179,8 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
> > > >  void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
> > > >  			     pte_t *ptep, pte_t old_pte, pte_t pte)
> > > >  {
> > > > +	if (pte_present(pte))
> > > > +		pte = clear_pte_bit(pte, __pgprot(_PAGE_UNUSED));
> > > >  	set_pte(ptep, pte);  
> > > 
> > > Can't we move the logic from set_ptes() to set_pte() instead? The above
> > 
> > set_pte() is also used for things that are not ptes, and in those cases
> > we probably don't want to touch that bit, although technically it is
> > currently unused for present large pmds and puds.
> 
> I can only see huge_pte_clear() for this.
> 
> If that's the only user I'd rather add a BUG_ON() there instead of starting to
> sprinkle the logic around. This _will_ break sooner or later.

There is also __set_huge_pte_at(), and there it could also be called for
swap PMDs/PUDs, where bit 56 is used. But they would not be present, and
ATM we use the same present bit for PTEs and PMDs/PUDs, so it should work.

Still feels a bit shaky, but in general I agree that it would be better
to have this logic in a single place, like set_pte().

Also wonder now why we use set_pte() and not set_pmd() / set_pud() for the
hugetlbfs "fake" PTEs. I think at least in s390 code we could switch to
the pmd/pud variants, and then be safe against modifications from set_pte().

BTW, wrt Sashiko report that just dropped in, I also wondered first about
ptep_reset_dat_prot() using set_pte(). Not 100% sure about the exact scenario
where/how the _PAGE_UNUSED bit gets mixed in, where it shouldn't. Maybe
the answer to that question might even show another fix option. But when
changing set_pte(), it should also be fine for ptep_reset_dat_prot(),
which requires that the PROTECT bit is the only HW bit getting changed,
and _PAGE_UNUSED is a SW bit.
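
The point about non-present entries can be shown with a small userspace
sketch. It assumes, purely for illustration, that the bit position used as
the "unused" hint in present entries is ordinary payload in non-present
ones. The `demo_` / `DEMO_` names and the values are hypothetical and are
not taken from the s390 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only. */
#define DEMO_PRESENT 0x001ULL
#define DEMO_UNUSED  0x080ULL	/* payload bit when the entry is not present */

typedef uint64_t demo_entry_t;

/*
 * Strip the hint only from present entries. A non-present entry is
 * returned untouched, so whatever it encodes at that position survives.
 */
demo_entry_t demo_sanitize(demo_entry_t entry)
{
	if (entry & DEMO_PRESENT)
		entry &= ~DEMO_UNUSED;
	return entry;
}
```

The guard on the present bit is what keeps a centralized check from
corrupting non-present encodings. It only holds as long as all entry types
routed through the setter agree on where the present bit lives, which is
the part described above as feeling a bit shaky.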


* Re: [PATCH v2 1/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit
  2026-06-15  9:17 ` [PATCH v2 1/1] " Claudio Imbrenda
  2026-06-15  9:43   ` Heiko Carstens
@ 2026-06-15 16:03   ` Alexander Gordeev
  1 sibling, 0 replies; 8+ messages in thread
From: Alexander Gordeev @ 2026-06-15 16:03 UTC
  To: Claudio Imbrenda
  Cc: linux-kernel, kvm, linux-s390, borntraeger, frankja, david,
	seiden, nrb, schlameuss, gra, hca, gerald.schaefer, gor, svens

On Mon, Jun 15, 2026 at 11:17:41AM +0200, Claudio Imbrenda wrote:
> @@ -122,6 +122,8 @@ pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr,
>  
>  	preempt_disable();
>  	old = ptep_flush_direct(mm, addr, ptep, 1);
> +	if (pte_present(new))
> +		new = clear_pte_bit(new, __pgprot(_PAGE_UNUSED));

Why not before preempt_disable()?

>  	set_pte(ptep, new);
>  	preempt_enable();
>  	return old;
> @@ -160,6 +162,8 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
>  
>  	preempt_disable();
>  	old = ptep_flush_lazy(mm, addr, ptep, 1);
> +	if (pte_present(new))
> +		new = clear_pte_bit(new, __pgprot(_PAGE_UNUSED));

Same here.

>  	set_pte(ptep, new);
>  	preempt_enable();
>  	return old;


* Re: [PATCH v2 1/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit
  2026-06-15 12:09         ` Gerald Schaefer
@ 2026-06-16 11:06           ` Heiko Carstens
  0 siblings, 0 replies; 8+ messages in thread
From: Heiko Carstens @ 2026-06-16 11:06 UTC
  To: Gerald Schaefer
  Cc: Claudio Imbrenda, linux-kernel, kvm, linux-s390, borntraeger,
	frankja, david, seiden, nrb, schlameuss, gra, gor, agordeev,
	svens

On Mon, Jun 15, 2026 at 02:09:39PM +0200, Gerald Schaefer wrote:
> On Mon, 15 Jun 2026 13:50:00 +0200
> Heiko Carstens <hca@linux.ibm.com> wrote:
> > > set_pte() is also used for things that are not ptes, and in those cases
> > > we probably don't want to touch that bit, although technically it is
> > > currently unused for present large pmds and puds.
> > 
> > I can only see huge_pte_clear() for this.
> > 
> > If that's the only user I'd rather add a BUG_ON() there instead of starting to
> > sprinkle the logic around. This _will_ break sooner or later.
> 
> There is also __set_huge_pte_at(), and there it could also be called for
> swap PMDs/PUDs, where bit 56 is used. But they would not be present, and
> ATM we use the same present bit for PTEs and PMDs/PUDs, so it should work.
> 
> Still feels a bit shaky, but in general I agree that it would be better
> to have this logic in a single place, like set_pte().

Yes, let's do that please.

> Also wonder now why we use set_pte() and not set_pmd() / set_pud() for the
> hugetlbfs "fake" PTEs. I think at least in s390 code we could switch to
> the pmd/pud variants, and then be safe against modifications from set_pte().

I guess that would be nice cleanup. Could you provide a patch for
that, please?

