Date: Thu, 23 Apr 2026 04:12:49 -0700
From: Andrew Morton
To: Hrushikesh Salunke
Subject: Re: [PATCH v3] mm/page_alloc: replace kernel_init_pages() with batch page clearing
Message-Id: <20260423041249.156eb95889696ccfaf23dca1@linux-foundation.org>
In-Reply-To: <20260422102729.166599-1-hsalunke@amd.com>
References: <20260422102729.166599-1-hsalunke@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 22 Apr 2026 10:26:58 +0000 Hrushikesh Salunke wrote:

> When init_on_alloc is enabled, kernel_init_pages() clears every page
> one at a time via clear_highpage_kasan_tagged(), which incurs per-page
> kmap_local_page()/kunmap_local() overhead and prevents the architecture
> clearing primitive from operating on contiguous ranges.
>
> Introduce clear_highpages_kasan_tagged() in highmem.h, a batch
> clearing helper that calls clear_pages() for the full contiguous range
> on !HIGHMEM systems, bypassing the per-page kmap overhead and allowing
> a single invocation of the arch clearing primitive across the entire
> allocation. The HIGHMEM path falls back to per-page clearing since
> those pages require kmap.
>
> Replace kernel_init_pages() with direct calls to the new helper, as it
> becomes a trivial wrapper.
>
> Allocating 8192 x 2MB HugeTLB pages (16GB) with init_on_alloc=1:
>
>   Before: 0.445s
>   After:  0.166s  (-62.7%, 2.68x faster)

Nice.

> Kernel time (sys) reduction per workload with init_on_alloc=1:
>
>   Workload            Before       After        Change
>   Graph500 64C128T    30m 41.8s    15m 14.8s    -50.3%
>   Graph500 16C32T     15m 56.7s     9m 43.7s    -39.0%
>   Pagerank 32T         1m 58.5s     1m 12.8s    -38.5%
>   Pagerank 128T        2m 36.3s     1m 40.4s    -35.7%
>
> ...
>
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -345,6 +345,21 @@ static inline void clear_highpage_kasan_tagged(struct page *page)
>  	kunmap_local(kaddr);
>  }
>
> +static inline void clear_highpages_kasan_tagged(struct page *page, int numpages)
> +{
> +	/* s390's use of memset() could override KASAN redzones. */
> +	kasan_disable_current();
> +	if (!IS_ENABLED(CONFIG_HIGHMEM)) {
> +		clear_pages(kasan_reset_tag(page_address(page)), numpages);
> +	} else {
> +		int i;
> +
> +		for (i = 0; i < numpages; i++)
> +			clear_highpage_kasan_tagged(page + i);
> +	}
> +	kasan_enable_current();
> +}

Why was it globally published and inlined?  Is there any expectation
that this will be used outside of page_alloc.c?  Both of the callsites
are themselves inlined.

The patch adds 330 bytes to my arm allmodconfig page_alloc.o - did we
gain anything from that?
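
For anyone following the thread without a kernel tree at hand, the idea in the quoted changelog (one clearing call over a contiguous range instead of one call per page) can be modeled in userspace. This is only a sketch under stated assumptions: memset() stands in for the architecture clearing primitive, SKETCH_PAGE_SIZE and every sketch_* name are hypothetical, and nothing here models kmap, KASAN, or HIGHMEM. It shows only that the two paths leave the same memory contents, not the performance difference reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical page size for this userspace model; not taken from the patch. */
#define SKETCH_PAGE_SIZE 4096u

/* Per-page path: one call to the clearing primitive for each page. */
static void sketch_clear_per_page(unsigned char *base, size_t numpages)
{
	for (size_t i = 0; i < numpages; i++)
		memset(base + i * SKETCH_PAGE_SIZE, 0, SKETCH_PAGE_SIZE);
}

/* Batched path: a single call covering the whole contiguous range. */
static void sketch_clear_batched(unsigned char *base, size_t numpages)
{
	memset(base, 0, numpages * SKETCH_PAGE_SIZE);
}

/*
 * Dirty two buffers with different patterns, clear one with each path,
 * and compare. Returns 1 if both end up all-zero and identical, 0 if
 * they differ, -1 on allocation failure. numpages must be non-zero.
 */
static int sketch_paths_agree(size_t numpages)
{
	size_t len = numpages * SKETCH_PAGE_SIZE;
	unsigned char *a = malloc(len);
	unsigned char *b = malloc(len);
	int same = 1;

	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}

	memset(a, 0xAA, len);
	memset(b, 0x55, len);

	sketch_clear_per_page(a, numpages);
	sketch_clear_batched(b, numpages);

	/* Every byte of the per-page buffer must be zero ... */
	for (size_t i = 0; i < len; i++) {
		if (a[i] != 0) {
			same = 0;
			break;
		}
	}
	/* ... and the batched buffer must match it exactly. */
	if (same && memcmp(a, b, len) != 0)
		same = 0;

	free(a);
	free(b);
	return same;
}
```

Calling sketch_paths_agree(512) models a 2MB range of 4KB pages. Any speedup in the real kernel comes from avoiding the per-page kmap_local_page()/kunmap_local() pair and from letting the arch primitive see the full range, neither of which this model captures.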