From: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
To: Puranjay Mohan <puranjay@kernel.org>, bpf@vger.kernel.org
Cc: Puranjay Mohan <puranjay12@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
kernel-team@meta.com
Subject: Re: [PATCH bpf v6 3/3] bpf: return VMA snapshot from task_vma iterator
Date: Thu, 9 Apr 2026 14:15:10 +0100
Message-ID: <debb9f27-443a-4277-b252-de8efc09697a@gmail.com>
In-Reply-To: <20260408154539.3832150-4-puranjay@kernel.org>
On 4/8/26 4:45 PM, Puranjay Mohan wrote:
> Holding the per-VMA lock across the BPF program body creates a lock
> ordering problem when helpers acquire locks that depend on mmap_lock:
>
> vm_lock -> i_rwsem -> mmap_lock -> vm_lock
>
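
For readers following the chain above, here is a minimal userspace sketch
of why it is a problem. It models "lock B was taken while lock A was held"
as a directed edge and reports a cycle with a depth-first search. This is
only a lockdep-like illustration, not lockdep's implementation; the enum
values and the names lock_graph, record_order() and has_lock_cycle() are
hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the three locks named in the chain above. */
enum { VM_LOCK, I_RWSEM, MMAP_LOCK, NR_LOCKS };

/* edge[a][b] is true when lock b was taken while lock a was held. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

static void lock_graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void record_order(struct lock_graph *g, int held, int taken)
{
	assert(held >= 0 && held < NR_LOCKS);
	assert(taken >= 0 && taken < NR_LOCKS);
	g->edge[held][taken] = true;
}

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored */
static bool dfs(const struct lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: ordering cycle */
		if (state[next] == 0 && dfs(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

/* Returns true when the recorded orderings contain a cycle. */
static bool has_lock_cycle(const struct lock_graph *g)
{
	int state[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (state[n] == 0 && dfs(g, n, state))
			return true;
	return false;
}
```

Recording only i_rwsem -> mmap_lock and mmap_lock -> vm_lock leaves the
graph acyclic; adding vm_lock -> i_rwsem closes the loop. Assuming that
last edge is the one contributed by holding the per-VMA lock across the
program body, dropping the lock before returning removes it.
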
> Snapshot the VMA under the per-VMA lock in _next() via memcpy(), then
> drop the lock before returning. The BPF program accesses only the
> snapshot.
>
> The verifier only trusts vm_mm and vm_file pointers (see
> BTF_TYPE_SAFE_TRUSTED_OR_NULL in verifier.c). vm_file is reference-
> counted with get_file() under the lock and released via fput() on the
> next iteration or in _destroy(). vm_mm is already correct because
> lock_vma_under_rcu() verifies vma->vm_mm == mm. All other pointers
> are left as-is by memcpy() since the verifier treats them as untrusted.
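
The snapshot-plus-reference pattern described above can be sketched in
userspace as follows. This is a simplified model under stated assumptions,
not the kernel code: file_model, area_data, area_model and area_iter are
hypothetical stand-ins, a pthread mutex stands in for the per-VMA lock, and
a plain int stands in for the file refcount (so the model is not safe for
concurrent get/put). Unlike the patch, the lock is kept outside the copied
struct so the memcpy() never duplicates a mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct file_model {
	int refcount;			/* plain int: model only, not atomic */
};

struct area_data {
	unsigned long start, end;
	struct file_model *file;	/* NULL for an anonymous mapping */
};

struct area_model {
	struct area_data data;
	pthread_mutex_t lock;		/* stands in for the per-VMA lock */
};

struct area_iter {
	struct area_model *areas;
	size_t nr_areas;
	size_t pos;
	struct area_data snapshot;	/* the only thing callers ever see */
};

/* Drop the reference held by the previous snapshot, if any (models fput()). */
static void snapshot_reset(struct area_data *snap)
{
	if (snap->file) {
		snap->file->refcount--;
		snap->file = NULL;
	}
}

static void iter_init(struct area_iter *it, struct area_model *areas, size_t nr)
{
	assert(areas || nr == 0);
	it->areas = areas;
	it->nr_areas = nr;
	it->pos = 0;
	memset(&it->snapshot, 0, sizeof(it->snapshot));
}

static struct area_data *iter_next(struct area_iter *it)
{
	struct area_data *snap = &it->snapshot;
	struct area_model *area;

	snapshot_reset(snap);
	if (it->pos >= it->nr_areas)
		return NULL;
	area = &it->areas[it->pos++];

	pthread_mutex_lock(&area->lock);
	memcpy(snap, &area->data, sizeof(*snap));
	if (snap->file)
		snap->file->refcount++;	/* models get_file() under the lock */
	pthread_mutex_unlock(&area->lock);

	return snap;			/* lock is no longer held here */
}

static void iter_destroy(struct area_iter *it)
{
	snapshot_reset(&it->snapshot);
}
```

The point of the model is the ordering inside iter_next(): the reference
is taken while the lock is still held, and the caller only ever touches
the copy, so nothing the caller does afterwards runs under the lock.
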
>
> Fixes: 4ac454682158 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> ---
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
> kernel/bpf/task_iter.c | 42 ++++++++++++++++++++++++++++++------------
> 1 file changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 87e87f18913d..e791ae065c39 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -808,7 +808,7 @@ static inline void bpf_iter_mmput_async(struct mm_struct *mm)
> struct bpf_iter_task_vma_kern_data {
> struct task_struct *task;
> struct mm_struct *mm;
> - struct vm_area_struct *locked_vma;
> + struct vm_area_struct snapshot;
> u64 next_addr;
> };
>
> @@ -842,7 +842,7 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
>
> /*
> * Reject irqs-disabled contexts including NMI. Operations used
> - * by _next() and _destroy() (vma_end_read, bpf_iter_mmput_async)
> + * by _next() and _destroy() (vma_end_read, fput, bpf_iter_mmput_async)
> * can take spinlocks with IRQs disabled (pi_lock, pool->lock).
> * Running from NMI or from a tracepoint that fires with those
> * locks held could deadlock.
> @@ -885,7 +885,7 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
> goto err_cleanup_iter;
> }
>
> - kit->data->locked_vma = NULL;
> + kit->data->snapshot.vm_file = NULL;
> kit->data->next_addr = addr;
> return 0;
>
> @@ -947,26 +947,45 @@ bpf_iter_task_vma_find_next(struct bpf_iter_task_vma_kern_data *data)
> return vma;
> }
>
> +static void bpf_iter_task_vma_snapshot_reset(struct vm_area_struct *snap)
> +{
> + if (snap->vm_file) {
> + fput(snap->vm_file);
> + snap->vm_file = NULL;
> + }
> +}
> +
> __bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it)
> {
> struct bpf_iter_task_vma_kern *kit = (void *)it;
> - struct vm_area_struct *vma;
> + struct vm_area_struct *snap, *vma;
>
> if (!kit->data) /* bpf_iter_task_vma_new failed */
> return NULL;
>
> - if (kit->data->locked_vma) {
> - vma_end_read(kit->data->locked_vma);
> - kit->data->locked_vma = NULL;
> - }
> + snap = &kit->data->snapshot;
> +
> + bpf_iter_task_vma_snapshot_reset(snap);
>
> vma = bpf_iter_task_vma_find_next(kit->data);
> if (!vma)
> return NULL;
>
> - kit->data->locked_vma = vma;
> + memcpy(snap, vma, sizeof(*snap));
> +
> + /*
> + * The verifier only trusts vm_mm and vm_file (see
> + * BTF_TYPE_SAFE_TRUSTED_OR_NULL in verifier.c). Take a reference
> + * on vm_file; vm_mm is already correct because lock_vma_under_rcu()
> + * verifies vma->vm_mm == mm. All other pointers are untrusted by
> + * the verifier and left as-is.
> + */
> + if (snap->vm_file)
> + get_file(snap->vm_file);
> +
> kit->data->next_addr = vma->vm_end;
> - return vma;
> + vma_end_read(vma);
> + return snap;
> }
>
> __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
> @@ -974,8 +993,7 @@ __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
> struct bpf_iter_task_vma_kern *kit = (void *)it;
>
> if (kit->data) {
> - if (kit->data->locked_vma)
> - vma_end_read(kit->data->locked_vma);
> + bpf_iter_task_vma_snapshot_reset(&kit->data->snapshot);
> put_task_struct(kit->data->task);
> bpf_iter_mmput_async(kit->data->mm);
> bpf_mem_free(&bpf_global_ma, kit->data);
Thread overview: 6+ messages
2026-04-08 15:45 [PATCH bpf v6 0/3] bpf: fix and improve open-coded task_vma iterator Puranjay Mohan
2026-04-08 15:45 ` [PATCH bpf v6 1/3] bpf: fix mm lifecycle in " Puranjay Mohan
2026-04-08 15:45 ` [PATCH bpf v6 2/3] bpf: switch task_vma iterator from mmap_lock to per-VMA locks Puranjay Mohan
2026-04-08 15:45 ` [PATCH bpf v6 3/3] bpf: return VMA snapshot from task_vma iterator Puranjay Mohan
2026-04-09 13:15 ` Mykyta Yatsenko [this message]
2026-04-10 19:10 ` [PATCH bpf v6 0/3] bpf: fix and improve open-coded " patchwork-bot+netdevbpf