Re: [PATCH] mm/page_alloc: use batch page clearing in kernel_init_pages()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Salunke, Hrushikesh" <hsalunke@amd.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>, <surenb@google.com>,
	<mhocko@suse.com>, <jackmanb@google.com>, <hannes@cmpxchg.org>,
	<ziy@nvidia.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <rkodsara@amd.com>,
	<bharata@amd.com>, <ankur.a.arora@oracle.com>, <shivankg@amd.com>,
	<hsalunke@amd.com>
Subject: Re: [PATCH] mm/page_alloc: use batch page clearing in kernel_init_pages()
Date: Thu, 9 Apr 2026 14:58:29 +0530	[thread overview]
Message-ID: <5c01d4ba-4453-47ac-9904-6ad6dbd69c2c@amd.com> (raw)
In-Reply-To: <4dd26573-85cc-446a-b2b7-2aeab8aa2417@kernel.org>


On 09-04-2026 14:30, David Hildenbrand (Arm) wrote:

> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> On 4/9/26 10:55, Salunke, Hrushikesh wrote:
>> On 08-04-2026 21:02, Andrew Morton wrote:
>>
>>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>>>
>>>
>>> On Wed, 8 Apr 2026 16:14:03 +0530 "Salunke, Hrushikesh" <hsalunke@amd.com> wrote:
>>>
>>>> kernel_init_pages() runs inside the allocator (post_alloc_hook and
>>>> __free_pages_prepare), so it inherits whatever context the caller is in.
>>>> Testing with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_PROVE_LOCKING=y, I
>>>> hit this during exit_group() -> exit_mmap() -> __zap_vma_range, where a
>>>> page allocation happens while the PTE lock and RCU read lock are held,
>>>> making the cond_resched() in the clearing loop illegal:
>>>>
>>>> [ 1997.353228] BUG: sleeping function called from invalid context at mm/page_alloc.c:1235
>>>> [ 1997.353433] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 19725, name: bash
>>>> [ 1997.353572] preempt_count: 1, expected: 0
>>>> [ 1997.353706] RCU nest depth: 1, expected: 0
>>>> [ 1997.353837] 3 locks held by bash/19725:
>>>> [ 1997.353839]  #0: ff38cd415971e540 (&mm->mmap_lock){++++}-{4:4}, at: exit_mmap+0x6e/0x430
>>>> [ 1997.353850]  #1: ffffffffb03d6f60 (rcu_read_lock){....}-{1:3}, at: __pte_offset_map+0x2c/0x220
>>>> [ 1997.353855]  #2: ff38cd410deb4618 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: pte_offset_map_lock+0x92/0x170
>>>> [ 1997.353868] Call Trace:
>>>> [ 1997.353870]  <TASK>
>>>> [ 1997.353873]  dump_stack_lvl+0x91/0xb0
>>>> [ 1997.353877]  __might_resched+0x15f/0x290
>>>> [ 1997.353882]  kernel_init_pages+0x4b/0xa0
>>>> [ 1997.353886]  get_page_from_freelist+0x406/0x1e60
>>>> [ 1997.353895]  __alloc_frozen_pages_noprof+0x1d8/0x1730
>>>> [ 1997.353912]  alloc_pages_mpol+0xa4/0x190
>>>> [ 1997.353917]  alloc_pages_noprof+0x59/0xd0
>>>> [ 1997.353919]  get_free_pages_noprof+0x11/0x40
>>>> [ 1997.353921]  __tlb_remove_folio_pages_size.isra.0+0x7f/0xe0
>>>> [ 1997.353923]  __zap_vma_range+0x1bbd/0x1f40
>>>> [ 1997.353931]  unmap_vmas+0xd9/0x1d0
>>>> [ 1997.353934]  exit_mmap+0x10a/0x430
>>>> [ 1997.353943]  __mmput+0x3d/0x130
>>>> [ 1997.353947]  do_exit+0x2a7/0xae0
>>> tlb_next_batch() is (fortunately) using GFP_NOWAIT.  Perhaps you can
>>> alter your patch to not call the cond_resched() if caller is attempting
>>> an atomic allocation.
>>
>> Thanks Vlastimil, David, Andrew, and Raghu for the reviews.
>>
>> After looking into this more, I think adding cond_resched() here was
>> overkill. I agree that dropping cond_resched() and
>> PROCESS_PAGES_NON_PREEMPT_BATCH entirely and just calling clear_pages()
>> is the right approach. There's no case where cond_resched() in
>> kernel_init_pages() is both necessary and safe:
>>
>> - It's unsafe in atomic context, as the BUG shows (tlb_next_batch()
>>   allocates under PTE lock + RCU read lock via GFP_NOWAIT).
>> - It's unnecessary for common allocations (order-0, mTHP, 2MB) which
>>   clear in well under 1ms.
>> - For 1 GiB hugepages, kernel_init_pages() only runs during the
>>   initial admin-triggered allocation. When processes later fault on
>>   those pages, clearing goes through folio_zero_user() ->
>>   clear_contig_highpages(), not kernel_init_pages().
>>
>> So rather than guarding cond_resched() with GFP flags (as Andrew
>> suggested), I'll remove it entirely in v2 to keep things simple and
>> same scheduling characteristics as the original code, just with the
>> batch clearing performance benefit.
>>
>> Regarding the 512 MiB arm64 case that David mentioned the stall from
>> clearing that without cond_resched() under PREEMPT_NONE is acceptable,
>> or should it be handled differently?

Thanks David.

> I mean, it would already happen today, because there is no
> cond_resched(). So nothing to worry about I guess.

Right makes sense.

>
>> I can introduce clear_highpages_kasan_tagged() / clear_highpages()
>> helpers, or keep v2 minimal with the logic inline in
>> kernel_init_pages(). Any preference?
> I'd prefer not sprinkling IS_ENABLED(CONFIG_HIGHMEM) around and simply
> calling a clear_highpages_kasan_tagged() from kernel_init_pages().
>
Sounds good. I'll add a clear_highpages_kasan_tagged() helper in
highmem.h and have kernel_init_pages() call it directly. Will send
v2 shortly.

Regards,
Hrushikesh

next prev parent reply	other threads:[~2026-04-09  9:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08  9:24 [PATCH] mm/page_alloc: use batch page clearing in kernel_init_pages() Hrushikesh Salunke
2026-04-08  9:47 ` Vlastimil Babka (SUSE)
2026-04-08 10:44   ` Salunke, Hrushikesh
2026-04-08 10:53     ` David Hildenbrand (Arm)
2026-04-08 11:16     ` Raghavendra K T
2026-04-08 16:24       ` Raghavendra K T
2026-04-08 15:32     ` Andrew Morton
2026-04-09  8:55       ` Salunke, Hrushikesh
2026-04-09  9:00         ` David Hildenbrand (Arm)
2026-04-09  9:28           ` Salunke, Hrushikesh [this message]
2026-04-09 12:02           ` Michal Hocko
2026-04-08 11:32 ` [syzbot ci] " syzbot ci

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5c01d4ba-4453-47ac-9904-6ad6dbd69c2c@amd.com \
    --to=hsalunke@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=ankur.a.arora@oracle.com \
    --cc=bharata@amd.com \
    --cc=david@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=rkodsara@amd.com \
    --cc=shivankg@amd.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.