Linux s390 Architecture development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Alexander Gordeev" <agordeev@linux.ibm.com>
Cc: "Vasily Gorbik" <gor@linux.ibm.com>,
	"Heiko Carstens" <hca@linux.ibm.com>,
	linux-s390@vger.kernel.org
Subject: Re: [PATCH -next v4 2/4] s390/mm: Batch PTE updates in lazy MMU mode
Date: Thu, 18 Jun 2026 15:10:47 +0000	[thread overview]
Message-ID: <20260618151048.063661F000E9@smtp.kernel.org> (raw)
In-Reply-To: <e50680152fc7d99215d53758fab4b87280d25c20.1781789772.git.agordeev@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Deferred PTE updates to a per-CPU cache break synchronization with lockless page table walkers like GUP-fast, allowing them to pin pages being concurrently write-protected for COW.
- [High] The lazy MMU cache bypasses the `full=1` TLB flush optimization during process teardown, causing an O(N) IPTE storm.
--

commit 38614c455538c6b6144011b070bbbf41513506e6
Author: Alexander Gordeev <agordeev@linux.ibm.com>

s390/mm: Batch PTE updates in lazy MMU mode

This commit introduces a per-CPU cache to batch PTE updates while in lazy
MMU mode. By caching PTE changes and applying them simultaneously with the
IPTE instruction's "Additional Entries" feature, it reduces overall IPTE
traffic.

> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h

[ ... ]

> @@ -1259,11 +1373,13 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
>  {
>  	pte_t res;
>  
> -	if (full) {
> -		res = ptep_get(ptep);
> -		set_pte(ptep, __pte(_PAGE_INVALID));
> -	} else {
> -		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
> +	if (!lazy_mmu_ptep_get_and_clear(addr, ptep, &res)) {

[Severity: High]
Does checking lazy_mmu_ptep_get_and_clear() before the 'full' flag bypass
the teardown optimization and cause an IPTE storm?

During process teardown, exit_mmap() and zap_pte_range() use
ptep_get_and_clear_full(..., full=1) to clear PTEs without issuing individual
TLB invalidations, deferring the flush to a global teardown at the end.
If lazy_mmu_mode_enable_with_ptes() is active, this intercepts the call,
writes _PAGE_INVALID to the cache, and bypasses the if (full) path.

Later, when lazy_mmu_mode_disable() flushes the cache, it calls
__invalidate_pte_range() in arch/s390/mm/lazy_mmu.c:

static void __invalidate_pte_range(...)
{
	...
	if (cpu_has_tlb_lc() && ...)
		__ptep_ipte_range(addr, nr_ptes - 1, ptep, IPTE_LOCAL);
	else
		__ptep_ipte_range(addr, nr_ptes - 1, ptep, IPTE_GLOBAL);
	...
}

This unconditionally executes batched IPTE instructions for all cached PTEs.
For multi-threaded processes, does this turn an O(1) TLB flush operation
into an O(N) global IPTE broadcast, causing a regression in system
performance during process termination?

> +		if (full) {
> +			res = __ptep_get(ptep);
> +			__set_pte(ptep, __pte(_PAGE_INVALID));
> +		} else {
> +			res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
> +		}
>  	}
>  	page_table_check_pte_clear(mm, addr, res);

[ ... ]

> @@ -1289,10 +1405,15 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
>  static inline void ptep_set_wrprotect(struct mm_struct *mm,
>  				      unsigned long addr, pte_t *ptep)
>  {
> -	pte_t pte = ptep_get(ptep);
> +	pte_t pte;
>  
> -	if (pte_write(pte))
> -		ptep_xchg_lazy(mm, addr, ptep, pte_wrprotect(pte));
> +	if (!lazy_mmu_ptep_set_wrprotect(addr, ptep)) {

[Severity: Critical]
Can intercepting the write-protect operation here break synchronization with
lockless page table walkers like GUP-fast?

During fork(), copy_pte_range() calls lazy_mmu_mode_enable_with_ptes().
When copy_present_pte() attempts to write-protect the parent's PTE by
calling ptep_set_wrprotect(), the lazy MMU mode intercepts the call via
__lazy_mmu_ptep_set_wrprotect() in arch/s390/mm/lazy_mmu.c:

bool __lazy_mmu_ptep_set_wrprotect(...)
{
	...
	pte = __ipte_range_ptep_get(range, ptep);
	if (pte_write(pte)) {
		pte = pte_wrprotect(pte);
		__ipte_range_set_pte(range, ptep, pte);
	}
	return true;
}

This updates the per-CPU cache but leaves the physical PTE completely valid
and writable until the cache is flushed at the end of the PMD range. Because
GUP-fast operates locklessly, a concurrent thread in the parent process can
read the physical PTE, observe that it is still writable, and successfully
take a FOLL_WRITE pin on the page.

Meanwhile, fork() proceeds, sees no pins, and shares the page with the
child. Does this create a regression where the parent retains a writable
GUP pin to a page shared with the child, allowing writes to corrupt the
child's memory and break COW isolation?

> +		pte = __ptep_get(ptep);
> +		if (pte_write(pte)) {
> +			pte = pte_wrprotect(pte);
> +			ptep_xchg_lazy(mm, addr, ptep, pte);
> +		}
> +	}
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1781789772.git.agordeev@linux.ibm.com?part=2

  reply	other threads:[~2026-06-18 15:10 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-18 14:47 [PATCH -next v4 0/4] s390/mm: Batch PTE updates in lazy MMU mode Alexander Gordeev
2026-06-18 14:47 ` [PATCH -next v4 1/4] mm: Make lazy MMU mode context-aware Alexander Gordeev
2026-06-18 14:47 ` [PATCH -next v4 2/4] s390/mm: Batch PTE updates in lazy MMU mode Alexander Gordeev
2026-06-18 15:10   ` sashiko-bot [this message]
2026-06-18 14:47 ` [PATCH -next v4 3/4] mm/kasan: Introduce helpers for lazy MMU mode sanitizer Alexander Gordeev
2026-06-18 15:00   ` sashiko-bot
2026-06-18 14:47 ` [PATCH -next v4 4/4] s390/mm: Lazy " Alexander Gordeev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260618151048.063661F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=linux-s390@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox