All of lore.kernel.org
 help / color / mirror / Atom feed
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	David Vernet <void@manifault.com>,
	Andrea Righi <arighi@nvidia.com>,
	Changwoo Min <changwoo@igalia.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Thomas Gleixner <tglx@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	sched-ext@lists.linux.dev, bpf <bpf@vger.kernel.org>,
	X86 ML <x86@kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/8] mm: Add ptep_try_install() for lockless empty-slot installs
Date: Wed, 20 May 2026 23:29:07 +0200	[thread overview]
Message-ID: <210d140d-c78a-47be-8046-caae359db6db@kernel.org> (raw)
In-Reply-To: <ag3_qpKObFOLKgbR@slm.duckdns.org>

On 5/20/26 20:38, Tejun Heo wrote:
> On Wed, May 20, 2026 at 10:41:15AM +0200, David Hildenbrand (Arm) wrote:
>> And that should be carefully documented.
> 
> Will do.
> 
>>> The fault handler has to install _some_ page and let kernel continue.
>>> Scratch page or arena page doesn't matter. Potentially different CPUs
>>> will see different page. It's not a concern at all.
>>> bpf prog is buggy, but the kernel will continue to work without a glitch.
>>> bpf runtime will disable and unload misbehaving prog.
>>
>> Having one page table walker overwrite a scratch page on race is just rather ...
>> questionable locking design, that just screams for problems long-term, really.
>>
>> At least in apply_range_clear_cb() one could similarly switch to
>> ptep_try_install() to at least have both these paths handle races in a
>> reasonable way. (having to handle when ptep_try_install() is not really implemented)
> 
> Hmm... wouldn't that be more confusing on apply_range_clear_cb() side?
> Whether it maintains the current behavior (if collide with scratch, try
> again with scratch as the original value) or flip the behavior (fail if
> scratch), that extra logic is spurious, and those tend to confuse people as
> they force asking why it has to be that specific way. If the goal is
> documenting the subtlety, wouldn't a detailed comment serve the purpose
> better?

I would let all operations that can happen concurrently work with atomics, not
just a single one.

Alexei correctly concluded that ptep_get_and_clear() might be one option that
should do on x86 and arm64 from a quick glimpse.

> 
>> Anyhow, the documentation of ptep_try_install() must clearly spell out that this
>> must be used very carefully, and only in special kernel page tables, never user
>> page tables. There are likely other scenarios we should document (caller must
>> prevent concurrent page table teardown somehow, and must be prepared to handle
>> races if other code is not using atomics).
>>
>> To highlight that, we should likely consider adding a "kernel" in the name, like
>> "ptep_try_install_kernel()".
>>
>> I am also not sure if "install" is the right terminology and whether it should
>> instead be "ptep_try_set()". (set_pte_at is the non-atomic interface right now)
> 
> So, ptep_try_set_kernel()?

I guess we could do that, but we don't have a lot of existing functions that are
specific to kernel page tables ... only pte_offset_kernel() and
pte_alloc_one_kernel()/pte_free_kernel().

The most important part is getting the documentation right.

We should probably document that code concurrently clearing PTEs should be using
ptep_get_and_clear().

-- 
Cheers,

David


  reply	other threads:[~2026-05-20 21:29 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-17 21:12 [PATCHSET v2 sched_ext/for-7.2] bpf/arena: Direct kernel-side access Tejun Heo
2026-05-17 21:12 ` [PATCH 1/8] mm: Add ptep_try_install() for lockless empty-slot installs Tejun Heo
2026-05-18  8:06   ` David Hildenbrand (Arm)
2026-05-18 19:53     ` Tejun Heo
2026-05-19  8:00       ` David Hildenbrand (Arm)
2026-05-19  8:58         ` Tejun Heo
2026-05-19  9:05           ` David Hildenbrand (Arm)
2026-05-19  9:40             ` David Hildenbrand (Arm)
2026-05-19 17:11               ` Tejun Heo
2026-05-19 19:04                 ` Alexei Starovoitov
2026-05-20  8:41                   ` David Hildenbrand (Arm)
2026-05-20 14:31                     ` Alexei Starovoitov
2026-05-20 21:22                       ` David Hildenbrand (Arm)
2026-05-20 18:38                     ` Tejun Heo
2026-05-20 21:29                       ` David Hildenbrand (Arm) [this message]
2026-05-25 15:50   ` patchwork-bot+netdevbpf
2026-05-17 21:12 ` [PATCH 2/8] bpf: Recover arena kernel faults with scratch page Tejun Heo
2026-05-17 21:12 ` [PATCH 3/8] bpf: Add sleepable variant of bpf_arena_alloc_pages for kernel callers Tejun Heo
2026-05-17 21:12 ` [PATCH 4/8] bpf: Add bpf_struct_ops_for_each_prog() Tejun Heo
2026-05-17 21:12 ` [PATCH 5/8] bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena() Tejun Heo
2026-05-17 21:12 ` [PATCH 6/8] sched_ext: Require an arena for cid-form schedulers Tejun Heo
2026-05-17 21:12 ` [PATCH 7/8] sched_ext: Sub-allocator over kernel-claimed BPF arena pages Tejun Heo
2026-05-18  7:20   ` Peter Zijlstra
2026-05-18 19:51     ` Tejun Heo
2026-05-18 23:26       ` Alexei Starovoitov
2026-05-20 23:47         ` Tejun Heo
2026-05-21  7:27           ` Peter Zijlstra
2026-05-17 21:12 ` [PATCH 8/8] sched_ext: Convert ops.set_cmask() to arena-resident cmask Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=210d140d-c78a-47be-8046-caae359db6db@kernel.org \
    --to=david@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=arighi@nvidia.com \
    --cc=ast@kernel.org \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=changwoo@igalia.com \
    --cc=daniel@iogearbox.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=emil@etsalapatis.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=mingo@redhat.com \
    --cc=rppt@kernel.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=tglx@kernel.org \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.