public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Puranjay Mohan <puranjay@kernel.org>
To: bpf@vger.kernel.org
Cc: Puranjay Mohan <puranjay@kernel.org>,
	Puranjay Mohan <puranjay12@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>,
	kernel-team@meta.com
Subject: [PATCH bpf v6 0/3] bpf: fix and improve open-coded task_vma iterator
Date: Wed,  8 Apr 2026 08:45:34 -0700	[thread overview]
Message-ID: <20260408154539.3832150-1-puranjay@kernel.org> (raw)

Changelog:
v5: https://lore.kernel.org/all/20260326151111.4002475-1-puranjay@kernel.org/
Changes in v6:
- Replace local_irq_disable() + get_task_mm() with spin_trylock() on
  alloc_lock to avoid a softirq deadlock: if the target task holds its
  alloc_lock and gets interrupted, a softirq BPF program iterating
  that task would deadlock on task_lock() (Gemini)
- Gate on CONFIG_MMU in patch 1 so that the mmput() fallback in
  bpf_iter_mmput_async() cannot sleep in non-sleepable BPF context
  on NOMMU; patch 2 tightens this to CONFIG_PER_VMA_LOCK (Gemini)
- Merge the split if (irq_work_busy) / if (!mmap_read_trylock())
  back into a single if statement in patch 1 (Andrii)
- Flip comparison direction in bpf_iter_task_vma_find_next() so both
  the locked and unlocked VMA failure cases read consistently:
  end <= next_addr → PAGE_SIZE, else - use end (Andrii)
- Add Acked-by from Andrii on patch 3

v4: https://lore.kernel.org/all/20260316185736.649940-1-puranjay@kernel.org/
Changes in v5:
- Use get_task_mm() instead of a lockless task->mm read followed by
  mmget_not_zero() to fix a use-after-free: mm_struct is not
  SLAB_TYPESAFE_BY_RCU, so the lockless pointer can go stale (AI)
- Add a local bpf_iter_mmput_async() wrapper with #ifdef CONFIG_MMU
  to avoid modifying fork.c and sched/mm.h outside the BPF tree
- Drop the fork.c and sched/mm.h changes that widened the
  mmput_async() #if guard
- Disable IRQs around get_task_mm() to prevent raw tracepoint
  re-entrancy from deadlocking on task_lock()

v3: https://lore.kernel.org/all/20260311225726.808332-1-puranjay@kernel.org/
Changes in v4:
- Disable task_vma iterator in irq_disabled() contexts to mitigate deadlocks (Alexei)
- Use a helper function to reset the snapshot (Andrii)
- Remove the redundant snap->vm_mm = kit->data->mm; (Andrii)
- Remove all irq_work deferral as the iterator will not work in
  irq_disabled() sections anymore and _new() will return -EBUSY early.

v2: https://lore.kernel.org/all/20260309155506.23490-1-puranjay@kernel.org/
Changes in v3:
- Remove the rename patch 1 (Andrii)
- Put the irq_work in the iter data, per-cpu slot is not needed (Andrii)
- Remove the unnecessary !in_hardirq() in the deferral path (Alexei)
- Use PAGE_SIZE advancement in case vma shrinks back to maintain the
  forward progress guarantee (AI)

v1: https://lore.kernel.org/all/20260304142026.1443666-1-puranjay@kernel.org/
Changes in v2:
- Add a preparatory patch to rename mmap_unlock_irq_work to
  bpf_iter_mm_irq_work (Mykyta)
- Fix bpf_iter_mmput() to also defer for IRQ disabled regions (Alexei)
- Fix a build issue where mmpu_async() is not available without
  CONFIG_MMU (kernel test robot)
- Reuse mmap_unlock_irq_work (after rename) for mmput (Mykyta)
- Move vma lookup (retry block) to a separate function (Mykyta)

This series fixes the mm lifecycle handling in the open-coded task_vma
BPF iterator and switches it from mmap_lock to per-VMA locking to reduce
contention. It then fixes a deadlock that is caused by holding locks
accross the body of the iterator where faulting is allowed.

Patch 1 fixes a use-after-free where task->mm was read locklessly and
could be freed before the iterator used it. It uses a trylock on
alloc_lock to safely read task->mm and acquire an mm reference, and
disables the iterator in irq_disabled() contexts by returning -EBUSY
from _new().

Patch 2 switches from holding mmap_lock for the entire iteration to
per-VMA locking via lock_vma_under_rcu(). This still doesn't fix the
deadlock problem because holding the per-vma lock for the whole
iteration can still cause lock ordering issues when a faultable helper
is called in the body of the iterator.

Patch 3 resolves the lock ordering problems caused by holding the
per-VMA lock or the mmap_lock (not applicable after patch 2) across BPF
program execution. It snapshots VMA fields under the lock, then drops
the lock before returning to the BPF program. File references are
managed via get_file()/fput() across iterations.

Puranjay Mohan (3):
  bpf: fix mm lifecycle in open-coded task_vma iterator
  bpf: switch task_vma iterator from mmap_lock to per-VMA locks
  bpf: return VMA snapshot from task_vma iterator

 kernel/bpf/task_iter.c | 151 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 136 insertions(+), 15 deletions(-)


base-commit: d8a9a4b11a137909e306e50346148fc5c3b63f9d
-- 
2.52.0


             reply	other threads:[~2026-04-08 15:45 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08 15:45 Puranjay Mohan [this message]
2026-04-08 15:45 ` [PATCH bpf v6 1/3] bpf: fix mm lifecycle in open-coded task_vma iterator Puranjay Mohan
2026-04-08 15:45 ` [PATCH bpf v6 2/3] bpf: switch task_vma iterator from mmap_lock to per-VMA locks Puranjay Mohan
2026-04-08 15:45 ` [PATCH bpf v6 3/3] bpf: return VMA snapshot from task_vma iterator Puranjay Mohan
2026-04-09 13:15   ` Mykyta Yatsenko
2026-04-10 19:10 ` [PATCH bpf v6 0/3] bpf: fix and improve open-coded " patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260408154539.3832150-1-puranjay@kernel.org \
    --to=puranjay@kernel.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    --cc=mykyta.yatsenko5@gmail.com \
    --cc=puranjay12@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox