Linux-ARM-Kernel Archive on lore.kernel.org
From: Tejun Heo <tj@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Andrea Righi <arighi@nvidia.com>,
	Changwoo Min <changwoo@igalia.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Thomas Gleixner <tglx@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	sched-ext@lists.linux.dev, bpf@vger.kernel.org, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/8] mm: Add ptep_try_install() for lockless empty-slot installs
Date: Mon, 18 May 2026 22:58:11 -1000	[thread overview]
Message-ID: <297658c4ae2d6e7103f5968efc936224@kernel.org> (raw)
In-Reply-To: <2f02d90d-cdc9-48ef-abe3-99e00f22595f@kernel.org>

Hello, David.

On Tue, May 19, 2026 at 10:00:39AM +0200, David Hildenbrand (Arm) wrote:
> Is that really possible? I'd much rather prefer to trylock and retry, unless
> that can really result in deadlocks. But I have the feeling that such deadlocks
> should be impossible here.

I'm not well versed in either mm or BPF, so the BPF folks will have a
better take. But here's a scenario that seemed plausible to me:

1. A bpf prog calls bpf_arena_alloc_pages() on its arena. The kernel
   takes arena->spinlock via raw_res_spin_lock_irqsave().
2. Under the lock, the alloc path goes through bpf_map_alloc_pages()
   -> alloc_pages_node(), which fires trace_mm_page_alloc().
3. A BPF tracepoint program on mm_page_alloc that shares the arena
   starts running with the lock still held.
4. The tracepoint program calls a kfunc, passing an arena pointer
   one entry past the array it meant to touch.
5. The kfunc dereferences. The kernel-side address is unbacked, so
   the CPU faults.

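A trylock + retry in the fault handler at step 5 would A-A deadlock: the
lock taken in step 1 is held by the same context the fault is nested in,
so it can't be released until the handler returns.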
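
To make the nesting concrete, here is a minimal userspace sketch of steps 1-5, assuming a toy test-and-set lock in place of arena->spinlock. The names toy_lock, fault_handler_trylock_retry() and nested_fault_makes_progress() are hypothetical and are not from the patchset. The retry loop is bounded only so the example terminates; an unbounded loop would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for arena->spinlock; not the kernel's lock type. */
struct toy_lock {
	atomic_flag held;
};

static bool toy_trylock(struct toy_lock *l)
{
	/* test_and_set returns the previous value: false means we took it. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Models step 5: a fault handler running on top of the context that took
 * the lock in step 1. Returns true if any of max_tries attempts succeeds.
 */
static bool fault_handler_trylock_retry(struct toy_lock *l, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (toy_trylock(l)) {
			toy_unlock(l);
			return true;
		}
	}
	return false;
}

/* Steps 1-5 collapsed into one call chain on one "CPU". */
bool nested_fault_makes_progress(int max_tries)
{
	struct toy_lock lock = { ATOMIC_FLAG_INIT };
	bool ok;

	ok = toy_trylock(&lock);	/* step 1: alloc path takes the lock */
	assert(ok);

	/* steps 2-5: tracepoint prog -> kfunc -> fault, lock still held */
	ok = fault_handler_trylock_retry(&lock, max_tries);

	toy_unlock(&lock);	/* only reachable after the handler returns */
	return ok;
}
```

Calling nested_fault_makes_progress() with any retry count returns false, because the only unlock sits below the handler on the same call chain.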

> For example, staring at apply_range_set_cb(), what prevents:
>
> (1) apply_range_set_cb() finding pte_none(ptep_get(pte))
> (2) apply_range_set_scratch_cb() succeeding ptep_try_install()
> (3) apply_range_set_cb() overwriting the pte with set_pte_at()
>
> Between (2) and (3) CPUs could access the scratch PTE.

Scratch only gets installed when BPF passes an unallocated arena
address to the kernel side, which is itself the violation, reported
through the program's BPF stream. Behavior at that address is then
undefined. For scx, the scheduler should be aborted and torn down.

The only requirements are that the kernel doesn't oops and the
violation gets caught. Beyond that, behavior at the address is
unspecified, and which installer wins the race doesn't matter as
long as kernel integrity holds.
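
The quoted (1)-(3) interleaving can be sketched in userspace, assuming a PTE slot modeled as a 64-bit atomic and an install-if-empty helper built on compare-and-exchange. This is only an illustration of the ordering, not the implementation of ptep_try_install(); toy_pte_t, toy_try_install(), toy_race_final_pte() and the TOY_PTE_* values are all hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up encodings; real PTE layouts are architecture specific. */
#define TOY_PTE_NONE    UINT64_C(0)
#define TOY_PTE_SCRATCH UINT64_C(0x1001)
#define TOY_PTE_PAGE    UINT64_C(0x2001)

typedef _Atomic uint64_t toy_pte_t;

/* Install new_pte only if the slot is still empty. */
bool toy_try_install(toy_pte_t *slot, uint64_t new_pte)
{
	uint64_t expected = TOY_PTE_NONE;

	return atomic_compare_exchange_strong(slot, &expected, new_pte);
}

/*
 * Replays the interleaving serially and returns the final slot value.
 * *scratch_was_visible reports whether the scratch entry could be
 * observed between (2) and (3).
 */
uint64_t toy_race_final_pte(bool *scratch_was_visible)
{
	toy_pte_t slot = TOY_PTE_NONE;

	/* (1) the allocating side checks the slot and finds it empty */
	bool saw_none = atomic_load(&slot) == TOY_PTE_NONE;

	/* (2) the fault side wins the empty-slot install */
	bool installed = toy_try_install(&slot, TOY_PTE_SCRATCH);

	*scratch_was_visible = installed &&
			       atomic_load(&slot) == TOY_PTE_SCRATCH;

	/* (3) the allocating side stores unconditionally on its stale check */
	if (saw_none)
		atomic_store(&slot, TOY_PTE_PAGE);

	return atomic_load(&slot);
}
```

In this replay the slot ends up holding TOY_PTE_PAGE and the scratch entry is observable in between. At every point after (2) the slot holds some valid entry, which is the property the argument above relies on.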

Thanks.

--
tejun


