[PATCH 0/4] mm: Fix apply_to_pte_range() vs lazy MMU mode

From: Alexander Gordeev <agordeev@linux.ibm.com>
To: Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kasan-dev@googlegroups.com, sparclinux@vger.kernel.org,
	xen-devel@lists.xenproject.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Guenter Roeck <linux@roeck-us.net>,
	Juergen Gross <jgross@suse.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>
Subject: [PATCH 0/4] mm: Fix apply_to_pte_range() vs lazy MMU mode
Date: Fri, 28 Mar 2025 10:13:38 +0100
Message-ID: <cover.1743079720.git.agordeev@linux.ibm.com>

Hi All!

On s390, if I make arch_enter_lazy_mmu_mode() call preempt_disable() and
arch_leave_lazy_mmu_mode() call preempt_enable(), I get this:

    [  553.332108] preempt_count: 1, expected: 0
    [  553.332117] no locks held by multipathd/2116.
    [  553.332128] CPU: 24 PID: 2116 Comm: multipathd Kdump: loaded Tainted:
    [  553.332139] Hardware name: IBM 3931 A01 701 (LPAR)
    [  553.332146] Call Trace:
    [  553.332152]  [<00000000158de23a>] dump_stack_lvl+0xfa/0x150 
    [  553.332167]  [<0000000013e10d12>] __might_resched+0x57a/0x5e8 
    [  553.332178]  [<00000000144eb6c2>] __alloc_pages+0x2ba/0x7c0 
    [  553.332189]  [<00000000144d5cdc>] __get_free_pages+0x2c/0x88 
    [  553.332198]  [<00000000145663f6>] kasan_populate_vmalloc_pte+0x4e/0x110 
    [  553.332207]  [<000000001447625c>] apply_to_pte_range+0x164/0x3c8 
    [  553.332218]  [<000000001448125a>] apply_to_pmd_range+0xda/0x318 
    [  553.332226]  [<000000001448181c>] __apply_to_page_range+0x384/0x768 
    [  553.332233]  [<0000000014481c28>] apply_to_page_range+0x28/0x38 
    [  553.332241]  [<00000000145665da>] kasan_populate_vmalloc+0x82/0x98 
    [  553.332249]  [<00000000144c88d0>] alloc_vmap_area+0x590/0x1c90 
    [  553.332257]  [<00000000144ca108>] __get_vm_area_node.constprop.0+0x138/0x260 
    [  553.332265]  [<00000000144d17fc>] __vmalloc_node_range+0x134/0x360 
    [  553.332274]  [<0000000013d5dbf2>] alloc_thread_stack_node+0x112/0x378 
    [  553.332284]  [<0000000013d62726>] dup_task_struct+0x66/0x430 
    [  553.332293]  [<0000000013d63962>] copy_process+0x432/0x4b80 
    [  553.332302]  [<0000000013d68300>] kernel_clone+0xf0/0x7d0 
    [  553.332311]  [<0000000013d68bd6>] __do_sys_clone+0xae/0xc8 
    [  553.332400]  [<0000000013d68dee>] __s390x_sys_clone+0xd6/0x118 
    [  553.332410]  [<0000000013c9d34c>] do_syscall+0x22c/0x328 
    [  553.332419]  [<00000000158e7366>] __do_syscall+0xce/0xf0 
    [  553.332428]  [<0000000015913260>] system_call+0x70/0x98 
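
To make the failure mode easier to follow, here is a toy userspace model of
the trace above. It is only a sketch, not kernel code: every splat_* name is
made up for illustration, and the counter merely stands in for the real
preempt_count. It shows a sleepable allocation being reached from a PTE
callback while the (debug) lazy MMU hooks keep preemption disabled.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model only; none of these are the kernel's real definitions. */
static int splat_preempt_count;	/* stands in for preempt_count() */
static int splat_reports;	/* "sleeping in atomic context" reports */

static void splat_preempt_disable(void) { splat_preempt_count++; }

static void splat_preempt_enable(void)
{
	assert(splat_preempt_count > 0);
	splat_preempt_count--;
}

/* Debug variant of the lazy MMU hooks, assumed to pin the CPU. */
static void splat_enter_lazy_mmu(void) { splat_preempt_disable(); }
static void splat_leave_lazy_mmu(void) { splat_preempt_enable(); }

/* Stand-in for __might_resched(): complain when called in atomic context. */
static void splat_might_sleep(void)
{
	if (splat_preempt_count != 0) {
		splat_reports++;
		printf("preempt_count: %d, expected: 0\n", splat_preempt_count);
	}
}

/* Stand-in for a page allocation that is allowed to sleep. */
static void splat_alloc_page_sleepable(void) { splat_might_sleep(); }

/* Stand-in for a PTE callback that allocates a page per entry. */
static void splat_pte_callback(void) { splat_alloc_page_sleepable(); }

/* Stand-in for apply_to_pte_range(): callbacks run in lazy MMU mode. */
static int splat_apply_to_pte_range(int nr_ptes)
{
	splat_reports = 0;
	splat_enter_lazy_mmu();
	for (int i = 0; i < nr_ptes; i++)
		splat_pte_callback();
	splat_leave_lazy_mmu();
	return splat_reports;
}
```

In this model every callback invocation produces one report, because the
allocation always happens between the enter and leave hooks.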

I guess commit b9ef323ea168 ("powerpc/64s: Disable preemption in hash
lazy mmu mode") fixed a similar issue on PPC, albeit not completely:

    apply_to_page_range on kernel pages does not disable preemption, which         
    is a requirement for hash's lazy mmu mode, which keeps track of the            
    TLBs to flush with a per-cpu array.                                            

This series is an attempt to fix the violation of the lazy MMU mode
context requirement, as documented for arch_enter_lazy_mmu_mode():

    This mode can only be entered and left under the protection of  
    the page table locks for all page tables which may be modified.

If I am not mistaken, Xen and sparc are also prone to the described
problem, as they use this_cpu_ptr() rather than get_cpu_var(), and only
the latter disables preemption.
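
The per-CPU hazard can be sketched with another toy userspace model. This is
an illustration under loud assumptions, not how any architecture implements
its batching: the batch_* names, the two-CPU setup and the explicit
"preemption point" are all hypothetical. The pin_cpu flag loosely plays the
role of get_cpu_var() (preemption disabled), while pin_cpu == false loosely
plays the role of this_cpu_ptr() in preemptible context.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_NR_CPUS 2

/* Toy per-CPU batch of deferred page table updates. */
struct batch_state {
	bool active;
	int pending;
};

static struct batch_state batch_percpu[BATCH_NR_CPUS];
static int batch_cpu;		/* CPU the task currently runs on */
static int batch_preempt_count;	/* non-zero means the task cannot migrate */

static struct batch_state *batch_this_cpu(void)
{
	assert(batch_cpu >= 0 && batch_cpu < BATCH_NR_CPUS);
	return &batch_percpu[batch_cpu];
}

/* The task may be moved to another CPU only while it is preemptible. */
static void batch_preemption_point(void)
{
	if (batch_preempt_count == 0)
		batch_cpu = (batch_cpu + 1) % BATCH_NR_CPUS;
}

static void batch_enter_lazy_mmu(bool pin_cpu)
{
	if (pin_cpu)
		batch_preempt_count++;
	batch_this_cpu()->active = true;
}

/* Queue an update in the current CPU's batch if that batch is active. */
static void batch_queue_update(void)
{
	struct batch_state *b = batch_this_cpu();

	if (b->active)
		b->pending++;
}

/* Flush the current CPU's batch; returns how many updates were flushed. */
static int batch_leave_lazy_mmu(bool pin_cpu)
{
	struct batch_state *b = batch_this_cpu();
	int flushed = b->pending;

	b->pending = 0;
	b->active = false;
	if (pin_cpu)
		batch_preempt_count--;
	return flushed;
}

/* Updates still sitting unflushed in some CPU's batch. */
static int batch_stale_updates(void)
{
	int stale = 0;

	for (int cpu = 0; cpu < BATCH_NR_CPUS; cpu++)
		stale += batch_percpu[cpu].pending;
	return stale;
}

/* One lazy MMU section with a preemption point in the middle. */
static int batch_run(bool pin_cpu)
{
	for (int cpu = 0; cpu < BATCH_NR_CPUS; cpu++)
		batch_percpu[cpu] = (struct batch_state){ false, 0 };
	batch_cpu = 0;
	batch_preempt_count = 0;

	batch_enter_lazy_mmu(pin_cpu);
	batch_queue_update();
	batch_preemption_point();
	batch_queue_update();
	return batch_leave_lazy_mmu(pin_cpu);
}
```

With the CPU pinned, both updates land in one batch and are flushed on leave.
Without pinning, the model leaves an update stranded in the batch of the CPU
where the section started.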

This series takes init_mm.page_table_lock for kernel page tables to avoid
all of that.
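
The locking rule can be pictured with one more toy userspace model. It is a
sketch under assumptions and does not reproduce the actual patches: the
lock_* names are hypothetical, the depth counter stands in for a page table
lock, and the lock_kernel_tables flag stands in for the behaviour this series
proposes for kernel page tables.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy address space; not the kernel's struct mm_struct. */
struct lock_mm {
	bool is_kernel;	/* true for the init_mm-like address space */
	int lock_depth;	/* stands in for the page table lock */
};

static int lock_violations;	/* lazy MMU mode entered without the lock */

static void lock_take(struct lock_mm *mm) { mm->lock_depth++; }

static void lock_release(struct lock_mm *mm)
{
	assert(mm->lock_depth > 0);
	mm->lock_depth--;
}

/* Lazy MMU mode may only be entered with the page table lock held. */
static void lock_enter_lazy_mmu(const struct lock_mm *mm)
{
	if (mm->lock_depth == 0)
		lock_violations++;
}

/*
 * Stand-in for apply_to_pte_range(). User page tables are always locked
 * in this model; kernel page tables are locked only when
 * lock_kernel_tables is set.
 */
static int lock_apply_to_pte_range(struct lock_mm *mm, bool lock_kernel_tables)
{
	bool locked = !mm->is_kernel || lock_kernel_tables;

	lock_violations = 0;
	if (locked)
		lock_take(mm);
	lock_enter_lazy_mmu(mm);
	/* PTE callbacks would run here. */
	if (locked)
		lock_release(mm);
	return lock_violations;
}
```

In the model, only the kernel case without the extra lock enters lazy MMU
mode in the wrong context.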

Thanks!

Alexander Gordeev (4):
  kasan: Avoid sleepable page allocation from atomic context
  mm: Allow detection of wrong arch_enter_lazy_mmu_mode() context
  mm: Cleanup apply_to_pte_range() routine
  mm: Protect kernel pgtables in apply_to_pte_range()

 include/linux/pgtable.h | 15 ++++++++++++---
 mm/kasan/shadow.c       |  9 +++------
 mm/memory.c             | 33 +++++++++++++++++++++------------
 3 files changed, 36 insertions(+), 21 deletions(-)

-- 
2.45.2
