BPF List
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Vernon Yang" <vernon2gm@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH 3/4] mm: introduce bpf_mthp_ops struct ops
Date: Sun, 03 May 2026 17:41:24 +0000	[thread overview]
Message-ID: <20260503174125.2C949C2BCB4@smtp.kernel.org> (raw)
In-Reply-To: <20260503165024.1526680-4-vernon2gm@gmail.com>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [Critical] Use-After-Free vulnerability in `bpf_mthp_choose()` due to improper SRCU locking.
- [High] NULL pointer dereference of `link` before validation in `bpf_mthp_ops_reg()` and `bpf_mthp_ops_unreg()`.
- [High] Newly created cgroups silently bypass the hierarchical BPF mTHP policy.
- [High] Silent registration failure leaves the cgroup hierarchy in an inconsistent state.
- [Medium] The BPF program can only restrict allowable THP orders, not expand or select new ones.
- [Low] Compilation failure due to dereferencing an incomplete type and a non-existent struct member.
--

commit e5c9c74b11f76b4de7ba4ef8ff94c10bae978e16
Author: Vernon Yang <yanglincheng@kylinos.cn>

mm: introduce bpf_mthp_ops struct ops

Introducing bpf_mthp_ops enables eBPF programs to register the
mthp_choose callback function via cgroup-ebpf.

> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2949e5acff35..80ec622213df 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
[ ... ]
> @@ -291,6 +292,11 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>  				       enum tva_type type,
>  				       unsigned long orders)
>  {
> +	/* The eBPF-specified orders overrides which order is selected. */
> +	orders &= bpf_mthp_choose(vma->vm_mm, orders);

Does this inadvertently prevent the BPF program from enabling or selecting new
orders? Since the return value is applied using a bitwise AND, it appears
restricted to only disabling existing orders rather than customizing them as
described in the commit message.

> +	if (!orders)
> +		return 0;
> +
>  	/*
>  	 * Optimization to check if required orders are enabled early. Only
>  	 * forced collapse ignores sysfs configs.
> diff --git a/mm/bpf_huge_memory.c b/mm/bpf_huge_memory.c
> new file mode 100644
> index 000000000000..e34e0a35edac
> --- /dev/null
> +++ b/mm/bpf_huge_memory.c
[ ... ]
> +unsigned long bpf_mthp_choose(struct mm_struct *mm, unsigned long orders)
> +{
> +	struct cgroup *cgrp;
> +	struct mem_cgroup *memcg;
> +	struct bpf_mthp_ops *ops;
> +	int idx;
> +
> +	memcg = get_mem_cgroup_from_mm(mm);
> +	if (!memcg)
> +		return orders;
> +
> +	cgrp = memcg->css.cgroup;
> +	ops = READ_ONCE(cgrp->mthp_ops);
> +	if (unlikely(ops)) {
> +		idx = srcu_read_lock(&mthp_bpf_srcu);
> +		if (ops->mthp_choose)
> +			orders = ops->mthp_choose(cgrp, orders);
> +		srcu_read_unlock(&mthp_bpf_srcu, idx);
> +	}

Could reading cgrp->mthp_ops outside the SRCU critical section lead to a
use-after-free regression?

If a thread is preempted immediately after ops is read:

    ops = READ_ONCE(cgrp->mthp_ops);
    <preempted>

Another thread could execute bpf_mthp_ops_unreg(), set the pointer to NULL,
and complete synchronize_srcu(). The original thread would then resume,
acquire the lock, and dereference the freed ops pointer.

> +
> +	mem_cgroup_put(memcg);
> +
> +	return orders;
> +}
[ ... ]
> +static int bpf_mthp_ops_reg(void *kdata, struct bpf_link *link)
> +{
> +	struct bpf_struct_ops_link *st_link = (struct bpf_struct_ops_link *)link;
> +	struct bpf_mthp_ops *ops = kdata;
> +	struct cgroup *cgrp = st_link->cgroup;

Can this cause a regression if link is passed as NULL? st_link is cast from
link, and st_link->cgroup implicitly dereferences it before the validation
check below.

Additionally, struct bpf_struct_ops_link is defined privately in
kernel/bpf/bpf_struct_ops.c and does not contain a cgroup field. Will this
cause a compilation failure due to dereferencing an incomplete type and a
non-existent struct member?

> +	struct cgroup_subsys_state *pos;
> +
> +	/* The link is not yet fully initialized, but cgroup should be set */
> +	if (!link)
> +		return -EOPNOTSUPP;
> +
> +	cgroup_lock();
> +	css_for_each_descendant_pre(pos, &cgrp->self) {
> +		struct cgroup *child = pos->cgroup;
> +
> +		if (READ_ONCE(child->mthp_ops)) {
> +			/* TODO
> +			 * Do not destroy the cgroup hierarchy property.
> +			 * If an eBPF program already exists in the sub-cgroup,
> +			 * trigger an error and clear the already set
> +			 * bpf_mthp_ops data.
> +			 */
> +			continue;

Is it intentional to return 0 at the end of this function when we skip over
descendants that already have mthp_ops configured? This appears to leave the
hierarchy in an inconsistent state while userspace assumes the registration
was fully successful.

Furthermore, is there a missing hook in the cgroup creation path? Since this
loop only applies to existing descendants, any newly created cgroups will
have their mthp_ops field initialized to NULL, allowing tasks moved there to
silently bypass the BPF program attached to their ancestors.

> +		}
> +		WRITE_ONCE(child->mthp_ops, ops);
> +	}
> +	cgroup_unlock();
> +
> +	return 0;
> +}
> +
> +static void bpf_mthp_ops_unreg(void *kdata, struct bpf_link *link)
> +{
> +	struct bpf_struct_ops_link *st_link = (struct bpf_struct_ops_link *)link;
> +	struct bpf_mthp_ops *ops = kdata;
> +	struct cgroup *cgrp = st_link->cgroup;

Does this code similarly dereference link before it can be validated?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503165024.1526680-1-vernon2gm@gmail.com?part=3

  parent reply	other threads:[~2026-05-03 17:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-03 16:50 [PATCH 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent Vernon Yang
2026-05-03 16:50 ` [PATCH 1/4] psi: add psi_group_flush_stats() function Vernon Yang
2026-05-03 16:50 ` [PATCH 2/4] bpf: add bpf_cgroup_{flush_stats,stall} function Vernon Yang
2026-05-03 17:23   ` bot+bpf-ci
2026-05-06 12:38     ` Vernon Yang
2026-05-03 17:25   ` sashiko-bot
2026-05-06 12:55     ` Vernon Yang
2026-05-03 16:50 ` [PATCH 3/4] mm: introduce bpf_mthp_ops struct ops Vernon Yang
2026-05-03 17:35   ` bot+bpf-ci
2026-05-06 13:06     ` Vernon Yang
2026-05-03 17:41   ` sashiko-bot [this message]
2026-05-06 13:26     ` Vernon Yang
2026-05-03 16:50 ` [PATCH 4/4] samples: bpf: add mthp_ext Vernon Yang
2026-05-03 17:35   ` bot+bpf-ci
2026-05-06 13:30     ` Vernon Yang
2026-05-03 17:57   ` sashiko-bot
2026-05-06 13:50     ` Vernon Yang
2026-05-07  3:34 ` [PATCH 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent Yafang Shao
2026-05-07 12:50   ` Vernon Yang
2026-05-07 13:18     ` Yafang Shao
2026-05-07 15:19       ` Vernon Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260503174125.2C949C2BCB4@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=sashiko@lists.linux.dev \
    --cc=vernon2gm@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox