From: Leon Hwang <leon.hwang@linux.dev>
To: Yonghong Song <yonghong.song@linux.dev>, bpf@vger.kernel.org
Cc: ast@kernel.org, andrii@kernel.org, daniel@iogearbox.net,
martin.lau@linux.dev, eddyz87@gmail.com, song@kernel.org,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, memxor@gmail.com,
ameryhung@gmail.com, linux-kernel@vger.kernel.org,
kernel-patches-bot@fb.com
Subject: Re: [PATCH bpf-next v6 2/2] selftests/bpf: Add test to verify freeing the special fields when update [lru_,]percpu_hash maps
Date: Thu, 13 Nov 2025 21:16:32 +0800 [thread overview]
Message-ID: <17a9444f-1151-44b4-b2ec-5cafd12bf2bd@linux.dev> (raw)
In-Reply-To: <6a2a6f41-f24a-4e87-94d0-8cb147725279@linux.dev>
On 2025/11/12 05:58, Yonghong Song wrote:
>
>
> On 11/11/25 5:52 AM, Leon Hwang wrote:
>>
>> On 2025/11/11 21:38, Leon Hwang wrote:
>>>
>>> On 2025/11/7 10:00, Yonghong Song wrote:
>>>>
>>>> On 11/5/25 7:14 AM, Leon Hwang wrote:
[...]
>>>>> +
>>>>> +static int __insert_in_list(struct bpf_list_head *head, struct
>>>>> bpf_spin_lock *lock,
>>>>> + struct node_data __kptr **node)
>>>>> +{
>>>>> + struct node_data *node_new, *node_ref, *node_old;
>>>>> +
>>>>> + node_new = bpf_obj_new(typeof(*node_new));
>>>>> + if (!node_new)
>>>>> + return -1;
>>>>> +
>>>>> + node_ref = bpf_refcount_acquire(node_new);
>>>>> + node_old = bpf_kptr_xchg(node, node_new);
>>>> Change the above to node_old = bpf_kptr_xchg(node, node_node_ref);
>>>> might
>>>> be better for reasoning although node_ref/node_new are the same.
>>>>
>>> Nope — node_ref and node_new are different for the verifier.
>> They are the same in theory.
>>
>> The verifier failure was likely caused by something else, but I'm not
>> sure of the exact reason.
>
> I did some analysis and your code works as expected:
>
> node_ref = bpf_refcount_acquire(node_new);
> node_old = bpf_kptr_xchg(node, node_new);
> if (node_old) {
> bpf_obj_drop(node_old);
> bpf_obj_drop(node_ref);
> return -2;
> }
>
> bpf_spin_lock(lock);
> bpf_list_push_front(head, &node_ref->l);
> ref = (u64)(void *) &node_ref->ref;
> bpf_spin_unlock(lock);
>
> In the above, after the following insn:
> node_old = bpf_kptr_xchg(node, node_new);
> the second argument 'node_new' will become a scalar since it
> may be changed by another bpf program accessing the same map.
>
> So your code is okay as node_ref still valid ptr_node_data
> and can be used in following codes.
>
>
> My suggestion to replace
> node_old = bpf_kptr_xchg(node, node_new);
> with
> node_old = bpf_kptr_xchg(node, node_ref);
> will not work since node_ref will be a scalar
> so subsequent bpf_obj_drop(node_ref) and bpf_list_push_front(...)
> won't work.
>
> In summary, your change look okay to me. Sorry for noise.
>
Hi Yonghong,
Thanks for your detailed analysis.
Indeed, in the case of
node_old = bpf_kptr_xchg(node, node_ref);,
the verifier marks node_ref as SCALAR_VALUE.
Here's the relevant part in check_helper_call():
static int check_helper_call(struct bpf_verifier_env *env, struct
bpf_insn *insn,
int *insn_idx_p)
{
//...
if (meta.release_regno) {
err = -EINVAL;
if (arg_type_is_dynptr(fn->arg_type[meta.release_regno -
BPF_REG_1])) {
err = unmark_stack_slots_dynptr(env,
®s[meta.release_regno]);
} else if (func_id == BPF_FUNC_kptr_xchg &&
meta.ref_obj_id) {
u32 ref_obj_id = meta.ref_obj_id;
bool in_rcu = in_rcu_cs(env);
struct bpf_func_state *state;
struct bpf_reg_state *reg;
err = release_reference_nomark(env->cur_state,
ref_obj_id);
if (!err) {
bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
if (reg->ref_obj_id == ref_obj_id) {
if (in_rcu && (reg->type
& MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
reg->ref_obj_id = 0;
reg->type &=
~MEM_ALLOC;
reg->type |=
MEM_RCU;
} else {
mark_reg_invalid(env, reg);
}
}
}));
}
} else if (meta.ref_obj_id) {
//...
}
//...
}
This logic matches your explanation — when the argument passed to
bpf_kptr_xchg() holds a reference (meta.ref_obj_id), that reference is
not a KPTR_PERCPU inside an RCU critical section.
As a result, node_ref is invalidated and becomes a scalar after the call.
So yes, your reasoning is correct, and this behavior explains why using
node_ref as the second argument doesn't work.
Thanks,
Leon
[...]
next prev parent reply other threads:[~2025-11-13 13:16 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-05 15:14 [PATCH bpf-next v6 0/2] bpf: Free special fields when update [lru_,]percpu_hash maps Leon Hwang
2025-11-05 15:14 ` [PATCH bpf-next v6 1/2] " Leon Hwang
2025-11-07 1:56 ` Yonghong Song
2025-11-05 15:14 ` [PATCH bpf-next v6 2/2] selftests/bpf: Add test to verify freeing the " Leon Hwang
2025-11-07 2:00 ` Yonghong Song
2025-11-11 13:38 ` Leon Hwang
2025-11-11 13:52 ` Leon Hwang
2025-11-11 21:58 ` Yonghong Song
2025-11-13 13:16 ` Leon Hwang [this message]
2025-11-13 17:30 ` [PATCH bpf-next v6 0/2] bpf: Free " patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=17a9444f-1151-44b4-b2ec-5cafd12bf2bd@linux.dev \
--to=leon.hwang@linux.dev \
--cc=ameryhung@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kernel-patches-bot@fb.com \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).