From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Chengfeng Lin <23020251154299@stu.xmu.edu.cn>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
	linux-kernel@vger.kernel.org, regressions@lists.linux.dev,
	Pedro Falcato <pfalcato@suse.de>
Subject: Re: [REGRESSION] mm: MADV_PAGEOUT THP/no-swap refault takes ~1.7x longer on v6.19 than v6.12
Date: Mon, 18 May 2026 17:36:27 +0200
Message-ID: <dfdef925-4127-4e7c-bbe1-e7b2542355da@kernel.org>
In-Reply-To: <662955ba.f499.19e3b2cf478.Coremail.23020251154299@stu.xmu.edu.cn>

On 5/18/26 15:01, Chengfeng Lin wrote:
> Hi,
> 
> I would like to report a userspace-visible mprotect() performance
> regression in a shared dirty PTE workload.
> 
> The workload is intentionally narrow:
> 
>   - anonymous shared 64 MiB mapping
>   - prefault before protection changes
>   - repeatedly toggle the whole range with mprotect(PROT_READ)
>   - restore with mprotect(PROT_READ | PROT_WRITE)
>   - write-touch after the protection cycle
> 
> This is not meant as a generic mprotect() regression report. In
> particular, I am not claiming that the anon/THP mprotect paths regress.
> The current signal is scoped to the shared-dirty full-range PTE toggle
> path above.
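
A minimal sketch of the workload shape listed above, for readers who want
to see the access pattern in code. It is not the reporter's
mprotect_paths_storm.c. The helper names now_ns() and
toggle_cycle_ns_per_page() are hypothetical, and timing only the two
mprotect() calls (excluding the write-touch) is an assumption about what
cycle_ns_per_page covers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Toggle a shared anonymous mapping of `len` bytes between PROT_READ and
 * PROT_READ | PROT_WRITE `cycles` times, write-touching every page after
 * each cycle. Returns the average time of one protect/restore pair in
 * nanoseconds per page, or -1.0 on error.
 * Assumption: only the two mprotect() calls are inside the timed window.
 */
static double toggle_cycle_ns_per_page(size_t len, unsigned int cycles)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t npages, i;
	unsigned int c;
	uint64_t total = 0;
	unsigned char *base;
	volatile unsigned char *p;

	if (page <= 0 || cycles == 0 || len < (size_t)page)
		return -1.0;
	npages = len / (size_t)page;
	len = npages * (size_t)page;

	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1.0;
	p = base;

	/* Prefault: write one byte per page before any protection change. */
	for (i = 0; i < npages; i++)
		p[i * (size_t)page] = 1;

	for (c = 0; c < cycles; c++) {
		uint64_t t0 = now_ns();

		if (mprotect(base, len, PROT_READ) != 0 ||
		    mprotect(base, len, PROT_READ | PROT_WRITE) != 0) {
			munmap(base, len);
			return -1.0;
		}
		total += now_ns() - t0;

		/* Write-touch every page after the protection cycle. */
		for (i = 0; i < npages; i++)
			p[i * (size_t)page] = (unsigned char)(c + 2);
	}

	munmap(base, len);
	return (double)total / ((double)cycles * (double)npages);
}
```

Calling toggle_cycle_ns_per_page(64u << 20, 100) would exercise the
64 MiB case from the list; the cycle count of 100 is an arbitrary
placeholder, not a value taken from the report.
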
> 
> The current public evidence bundle is here:
> 
>   https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/tree/e13469b/mprotect-shared-dirty-toggle
> 
> The generated workload source used for auditing the workload semantics is
> here:
> 
>   https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/blob/e13469b/mprotect-shared-dirty-toggle/workload/mprotect_paths_storm.c
> 
> The formal experiment profile is here:
> 
>   https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/tree/e13469b/mprotect-shared-dirty-toggle/experiments
> 
> The formal timing runs compare v6.12.77 and v6.19.9 with similar kernel
> configuration, using QEMU direct boot. The formal performance runs were
> clean timing runs with coverage disabled. Coverage was collected
> separately and is not used for the timing numbers below.
> 
> Lab environment:
> 
>   host label: lcf
>   host kernel: Linux 6.14.0-37-generic x86_64
>   QEMU: qemu-system-x86_64 8.2.2
>   container/cgroup CPU set: 0,2,4,6,8,10,12,14
>   container/cgroup memory limit: 16106127360 bytes
>   guest memory: QEMU_MEM_MB=14336
>   guest CPUs: QEMU_SMP=1/2/4
>   repetitions: 9
>   version order: interleaved
>   performance coverage_enabled: false
> 
> Primary result, cycle_ns_per_page, lower is better:
> 
>   CPU   v6.12.77   v6.19.9   old-lower-vs-new   v6.19/v6.12   reliability
>     1      346.8     578.1        40.0%             1.67x      reliable
>     2      394.7     641.7        38.5%             1.63x      robust-only
>     4      381.1     624.8        39.0%             1.64x      partial, same direction
> 
> The strongest current result is the 1CPU lab formal result. The 2CPU case
> is same-direction but robust-only in the framework classification. The
> 4CPU case is same-direction but partial because one QEMU run failed; the
> summary still has 8 successful runs for that CPU count.
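
The two derived columns in the table can be reproduced from the raw
cycle_ns_per_page values. The formulas below are inferred from the
numbers, not stated by the reporter: v6.19/v6.12 appears to be new/old,
and old-lower-vs-new appears to be (new - old) / new as a percentage.
The function names are hypothetical.

```c
#include <stdio.h>

/* Slowdown factor: how many times longer the new kernel takes. */
static double slowdown_ratio(double old_ns, double new_ns)
{
	return new_ns / old_ns;
}

/* How much lower the old value is, as a percentage of the new value. */
static double old_lower_pct(double old_ns, double new_ns)
{
	return (new_ns - old_ns) / new_ns * 100.0;
}

/* Print one table row from the two raw measurements. */
static void print_row(int cpus, double old_ns, double new_ns)
{
	printf("%5d %10.1f %9.1f %11.1f%% %12.2fx\n", cpus, old_ns, new_ns,
	       old_lower_pct(old_ns, new_ns), slowdown_ratio(old_ns, new_ns));
}
```

For the 1CPU row, print_row(1, 346.8, 578.1) yields 40.0% and 1.67x,
matching the table.
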
> 
> The current mechanism hypothesis is local to the shared-dirty PTE path.
> In v6.19, the measured hot path goes through the change_pte_range()
> batching machinery:
> 
>   change_pte_range()
>     -> mprotect_folio_pte_batch()
>     -> modify_prot_start_ptes()
>     -> set_write_prot_commit_flush_ptes()
>     -> prot_commit_flush_ptes()
> 
> For this shared-dirty workload, follow-up batch-probe attribution showed
> nr_ptes=1 in the measured path. The hypothesis is that the extra folio
> lookup, batch-size query, helper dispatch, and commit machinery are paid
> per 4 KiB PTE without effective batch-size amortization in this workload.
> This is mechanism interpretation, not a completed culprit-commit bisect.
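
The amortization argument in the hypothesis above can be written as a toy
cost model. This is only an illustration of the reasoning, not kernel
code and not a measurement: the function name and both cost parameters
are hypothetical, and the model assumes a fixed per-batch overhead plus a
constant per-PTE cost.

```c
/*
 * Toy model: each batch pays a fixed overhead (folio lookup, batch-size
 * query, helper dispatch, commit) once, plus a per-PTE cost for each of
 * its nr_ptes entries. Returns the modeled cost per PTE, or -1.0 if
 * nr_ptes is zero.
 */
static double modeled_ns_per_pte(double fixed_ns_per_batch,
				 double ns_per_pte, unsigned int nr_ptes)
{
	if (nr_ptes == 0)
		return -1.0;
	return ns_per_pte + fixed_ns_per_batch / (double)nr_ptes;
}
```

With nr_ptes=1, as the batch probe reported for this workload, the whole
fixed overhead lands on every 4 KiB PTE; the per-PTE share of that
overhead shrinks only when batches cover more than one PTE.
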
> 
> I have not bisected the exact culprit commit yet. Separate release-level
> sanity checks showed v6.18.19 already in the slow range, so the current
> best reporting range is:
> 
> #regzbot introduced: v6.12..v6.18
> 
> Please let me know if a standalone reproducer, a narrower bisect, or
> additional raw logs would be more useful.

Pedro recently optimized this:

https://lore.kernel.org/all/20260402141628.3367596-1-pfalcato@suse.de/

Maybe that fixes most of the regression for you?

-- 
Cheers,

David
