Re: [PATCH 3/4] mm: introduce bpf_mthp_ops struct ops

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Vernon Yang" <vernon2gm@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH 3/4] mm: introduce bpf_mthp_ops struct ops
Date: Sun, 03 May 2026 17:41:24 +0000	[thread overview]
Message-ID: <20260503174125.2C949C2BCB4@smtp.kernel.org> (raw)
In-Reply-To: <20260503165024.1526680-4-vernon2gm@gmail.com>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [Critical] Use-After-Free vulnerability in `bpf_mthp_choose()` due to improper SRCU locking.
- [High] NULL pointer dereference of `link` before validation in `bpf_mthp_ops_reg()` and `bpf_mthp_ops_unreg()`.
- [High] Newly created cgroups silently bypass the hierarchical BPF mTHP policy.
- [High] Silent registration failure leaves the cgroup hierarchy in an inconsistent state.
- [Medium] The BPF program can only restrict allowable THP orders, not expand or select new ones.
- [Low] Compilation failure due to dereferencing an incomplete type and a non-existent struct member.
--

commit e5c9c74b11f76b4de7ba4ef8ff94c10bae978e16
Author: Vernon Yang <yanglincheng@kylinos.cn>

mm: introduce bpf_mthp_ops struct ops

Introducing bpf_mthp_ops enables eBPF programs to register the
mthp_choose callback function via cgroup-ebpf.

> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2949e5acff35..80ec622213df 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
[ ... ]
> @@ -291,6 +292,11 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>  				       enum tva_type type,
>  				       unsigned long orders)
>  {
> +	/* The eBPF-specified orders overrides which order is selected. */
> +	orders &= bpf_mthp_choose(vma->vm_mm, orders);

Does this inadvertently prevent the BPF program from enabling or selecting new
orders? Since the return value is applied using a bitwise AND, it appears
restricted to only disabling existing orders rather than customizing them as
described in the commit message.

> +	if (!orders)
> +		return 0;
> +
>  	/*
>  	 * Optimization to check if required orders are enabled early. Only
>  	 * forced collapse ignores sysfs configs.
> diff --git a/mm/bpf_huge_memory.c b/mm/bpf_huge_memory.c
> new file mode 100644
> index 000000000000..e34e0a35edac
> --- /dev/null
> +++ b/mm/bpf_huge_memory.c
[ ... ]
> +unsigned long bpf_mthp_choose(struct mm_struct *mm, unsigned long orders)
> +{
> +	struct cgroup *cgrp;
> +	struct mem_cgroup *memcg;
> +	struct bpf_mthp_ops *ops;
> +	int idx;
> +
> +	memcg = get_mem_cgroup_from_mm(mm);
> +	if (!memcg)
> +		return orders;
> +
> +	cgrp = memcg->css.cgroup;
> +	ops = READ_ONCE(cgrp->mthp_ops);
> +	if (unlikely(ops)) {
> +		idx = srcu_read_lock(&mthp_bpf_srcu);
> +		if (ops->mthp_choose)
> +			orders = ops->mthp_choose(cgrp, orders);
> +		srcu_read_unlock(&mthp_bpf_srcu, idx);
> +	}

Could reading cgrp->mthp_ops outside the SRCU critical section lead to a
use-after-free regression?

If a thread is preempted immediately after ops is read:

    ops = READ_ONCE(cgrp->mthp_ops);
    <preempted>

Another thread could execute bpf_mthp_ops_unreg(), set the pointer to NULL,
and complete synchronize_srcu(). The original thread would then resume,
acquire the lock, and dereference the freed ops pointer.

> +
> +	mem_cgroup_put(memcg);
> +
> +	return orders;
> +}
[ ... ]
> +static int bpf_mthp_ops_reg(void *kdata, struct bpf_link *link)
> +{
> +	struct bpf_struct_ops_link *st_link = (struct bpf_struct_ops_link *)link;
> +	struct bpf_mthp_ops *ops = kdata;
> +	struct cgroup *cgrp = st_link->cgroup;

Can this cause a regression if link is passed as NULL? st_link is cast from
link, and st_link->cgroup implicitly dereferences it before the validation
check below.

Additionally, struct bpf_struct_ops_link is defined privately in
kernel/bpf/bpf_struct_ops.c and does not contain a cgroup field. Will this
cause a compilation failure due to dereferencing an incomplete type and a
non-existent struct member?

> +	struct cgroup_subsys_state *pos;
> +
> +	/* The link is not yet fully initialized, but cgroup should be set */
> +	if (!link)
> +		return -EOPNOTSUPP;
> +
> +	cgroup_lock();
> +	css_for_each_descendant_pre(pos, &cgrp->self) {
> +		struct cgroup *child = pos->cgroup;
> +
> +		if (READ_ONCE(child->mthp_ops)) {
> +			/* TODO
> +			 * Do not destroy the cgroup hierarchy property.
> +			 * If an eBPF program already exists in the sub-cgroup,
> +			 * trigger an error and clear the already set
> +			 * bpf_mthp_ops data.
> +			 */
> +			continue;

Is it intentional to return 0 at the end of this function when we skip over
descendants that already have mthp_ops configured? This appears to leave the
hierarchy in an inconsistent state while userspace assumes the registration
was fully successful.

Furthermore, is there a missing hook in the cgroup creation path? Since this
loop only applies to existing descendants, any newly created cgroups will
have their mthp_ops field initialized to NULL, allowing tasks moved there to
silently bypass the BPF program attached to their ancestors.

> +		}
> +		WRITE_ONCE(child->mthp_ops, ops);
> +	}
> +	cgroup_unlock();
> +
> +	return 0;
> +}
> +
> +static void bpf_mthp_ops_unreg(void *kdata, struct bpf_link *link)
> +{
> +	struct bpf_struct_ops_link *st_link = (struct bpf_struct_ops_link *)link;
> +	struct bpf_mthp_ops *ops = kdata;
> +	struct cgroup *cgrp = st_link->cgroup;

Does this code similarly dereference link before it can be validated?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503165024.1526680-1-vernon2gm@gmail.com?part=3

next prev parent reply	other threads:[~2026-05-03 17:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-03 16:50 [PATCH 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent Vernon Yang
2026-05-03 16:50 ` [PATCH 1/4] psi: add psi_group_flush_stats() function Vernon Yang
2026-05-03 16:50 ` [PATCH 2/4] bpf: add bpf_cgroup_{flush_stats,stall} function Vernon Yang
2026-05-03 17:23   ` bot+bpf-ci
2026-05-06 12:38     ` Vernon Yang
2026-05-03 17:25   ` sashiko-bot
2026-05-06 12:55     ` Vernon Yang
2026-05-03 16:50 ` [PATCH 3/4] mm: introduce bpf_mthp_ops struct ops Vernon Yang
2026-05-03 17:35   ` bot+bpf-ci
2026-05-06 13:06     ` Vernon Yang
2026-05-03 17:41   ` sashiko-bot [this message]
2026-05-06 13:26     ` Vernon Yang
2026-05-03 16:50 ` [PATCH 4/4] samples: bpf: add mthp_ext Vernon Yang
2026-05-03 17:35   ` bot+bpf-ci
2026-05-06 13:30     ` Vernon Yang
2026-05-03 17:57   ` sashiko-bot
2026-05-06 13:50     ` Vernon Yang
2026-05-07  3:34 ` [PATCH 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent Yafang Shao
2026-05-07 12:50   ` Vernon Yang
2026-05-07 13:18     ` Yafang Shao
2026-05-07 15:19       ` Vernon Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260503174125.2C949C2BCB4@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=sashiko@lists.linux.dev \
    --cc=vernon2gm@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.