Re: [PATCH v4] mm/page_alloc: replace kernel_init_pages() with batch page clearing
From: David Hildenbrand (Arm) @ 2026-05-12 8:58 UTC
To: Hrushikesh Salunke, akpm, ljs, Liam.Howlett, vbabka, rppt, surenb,
mhocko, jackmanb, hannes, ziy
Cc: linux-mm, linux-kernel, rkodsara, bharata, ankur.a.arora,
shivankg
On 5/4/26 08:39, Hrushikesh Salunke wrote:
> When init_on_alloc is enabled, kernel_init_pages() clears every page
> one at a time via clear_highpage_kasan_tagged(), which incurs per-page
> kmap_local_page()/kunmap_local() overhead and prevents the architecture
> clearing primitive from operating on contiguous ranges.
>
> Introduce clear_highpages_kasan_tagged() as a static batch clearing
> helper in page_alloc.c that calls clear_pages() for the full contiguous
> range on !HIGHMEM systems, bypassing the per-page kmap overhead and
> allowing a single invocation of the arch clearing primitive across the
> entire allocation. The HIGHMEM path falls back to per-page clearing
> since those pages require kmap.
>
> Replace kernel_init_pages() with direct calls to the new helper, as it
> becomes a trivial wrapper.
>
> Allocating 8192 x 2MB HugeTLB pages (16GB) with init_on_alloc=1:
>
> Before: 0.445s
> After: 0.166s (-62.7%, 2.68x faster)
>
> Kernel time (sys) reduction per workload with init_on_alloc=1:
>
> Workload              Before      After       Change
> Graph500 64C128T      30m 41.8s   15m 14.8s   -50.3%
> Graph500 16C32T       15m 56.7s   9m 43.7s    -39.0%
> Pagerank 32T          1m 58.5s    1m 12.8s    -38.5%
> Pagerank 128T         2m 36.3s    1m 40.4s    -35.7%
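The HugeTLB figures and the Change column express the same before/after measurements in two ways. Percent change is (after - before) / before * 100. Speedup is before / after. The small C sketch below shows that arithmetic. The helper names pct_change(), speedup() and mins_secs() are illustrative only.

```c
/* Percent change from "before" to "after"; negative means less time. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Speedup factor: how many times faster "after" is than "before". */
static double speedup(double before, double after)
{
	return before / after;
}

/* Convert an "Xm Y.Ys" duration into seconds. */
static double mins_secs(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}
```

For example, pct_change(0.445, 0.166) is about -62.7 and speedup(0.445, 0.166) is about 2.68, matching the HugeTLB line. For Graph500 64C128T, pct_change(mins_secs(30, 41.8), mins_secs(15, 14.8)) is about -50.3.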
>
> Signed-off-by: Hrushikesh Salunke <hsalunke@amd.com>
> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Pankaj Gupta <pankaj.gupta@amd.com>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David