From: Martin KaFai Lau <martin.lau@linux.dev>
To: Yonghong Song <yonghong.song@linux.dev>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>,
	Hou Tao <houtao@huaweicloud.com>,
	bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v4] bpf: Fix a race condition between btf_put() and map_free()
Date: Fri, 8 Dec 2023 10:26:34 -0800
Message-ID: <80439854-29ed-41f1-855b-d0cf91c07b8d@linux.dev>
In-Reply-To: <ba220781-3be6-4788-8765-f2868e97e126@linux.dev>

On 12/8/23 8:45 AM, Yonghong Song wrote:
> 
> On 12/8/23 12:16 AM, Martin KaFai Lau wrote:
>> On 12/7/23 7:59 PM, Yonghong Song wrote:
>>>>
>>>> I am trying to avoid making a special case for "bool has_btf_ref;" and "bool
>>>> from_map_check". It seems a bit too much to deal with the error path for
>>>> btf_parse().
>>>>
>>>> Would doing the refcount_set(&btf->refcnt, 1) earlier in btf_parse help?
>>>
>>> No, it does not. The core reason is what Hou mentioned in
>>> https://lore.kernel.org/bpf/47ee3265-23f7-2130-ff28-27bfaf3f7877@huaweicloud.com/
>>> We simply cannot take a btf reference if called from btf_parse().
>>> Let us say we move refcount_set(&btf->refcnt, 1) earlier in btf_parse()
>>> so that we take a ref on btf during btf_parse_fields(). Then we have
>>>       btf_put <=== expects refcount == 0 to start the destruction process
>>>         ...
>>>           btf_record_free <=== in which, if graph_root, a btf reference will be held
>>> so btf_put will never be able to actually free the btf data.
>>
>> ah. There is a loop like btf->struct_meta_tab->...btf.
>>
>>> Yes, the kasan problem will be resolved but we leak memory.
>>>
>>>>
>>>>> It is also unnecessary to take a reference since the value_rec is
>>>>> referring to a record in struct_meta_tab.
>>>>
>>>> If we optimize for not taking a refcnt, how about not taking a refcnt in
>>>> any case and postponing the btf_put(), instead of taking a refcnt in one
>>>> case but not in another? Like your fix in v1. The failed selftest can be
>>>> changed or even removed if it does not make sense anymore.
>>>
>>> After a couple of iterations, I think the approach of taking the necessary
>>> reference sounds better, and it will be consistent with how kptr is
>>> handled. For kptr, btf_parse will ignore it.
>>
>> Got it. It is why kptr.btf got away with the loop.
>>
>> On the other hand, am I reading it correctly that kptr.btf only needs to take 
>> the refcnt for btf that is btf_is_kernel()?
> 
> No. Besides vmlinux and module btf, it also takes a reference for prog btf, see
> 
> static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
>                            struct btf_field_info *info)
> {
> ...
>          if (id == -ENOENT) {
>                  /* btf_parse_kptr should only be called w/ btf = program BTF */
>                  WARN_ON_ONCE(btf_is_kernel(btf));
>                  /* Type exists only in program BTF. Assume that it's a MEM_ALLOC
>                   * kptr allocated via bpf_obj_new
>                   */
>                  field->kptr.dtor = NULL;
>                  id = info->kptr.type_id;
>                  kptr_btf = (struct btf *)btf;
>                  btf_get(kptr_btf);

I meant that only kernel/module btf needs the refcnt taken, so there is no need
to take the refcnt here for the self btf. Sorry that I was not clear in my
earlier comment.

The record captures something either in the self btf or in the kernel btf.
field->kptr.btf is the only pointer that may refer to either a kernel btf or
the self btf, so it should be the only case that needs the following check in
btf_record_free():

	if (btf_is_kernel(rec->fields[i].kptr.btf))
		btf_put(rec->fields[i].kptr.btf);

In all other cases the record refers to the self btf (including
field->graph_root.btf). The owner (the map here) needs to ensure that the self
btf is freed after the record is freed.
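
To make the idea a bit more concrete, here is a rough userspace sketch of the
rule I have in mind. It is not kernel code: the struct layouts, the "kernel"
flag standing in for btf_is_kernel(), and the helpers record_add_kptr(),
owner_free() and demo() are all made up for illustration, and the refcount is a
plain int rather than refcount_t. It only shows the pairing: take a reference
only on kernel/module btf, put only those in btf_record_free(), and let the
owner drop its self btf after the record is gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel types; hypothetical and much simplified. */
struct btf {
	int refcnt;
	bool kernel;	/* stands in for btf_is_kernel() */
	bool freed;	/* set instead of freeing memory so it can be inspected */
};

struct btf_field {
	struct btf *kptr_btf;	/* stands in for field->kptr.btf */
};

struct btf_record {
	int cnt;
	struct btf_field fields[4];
};

static void btf_get(struct btf *btf)
{
	btf->refcnt++;
}

static void btf_put(struct btf *btf)
{
	if (--btf->refcnt == 0)
		btf->freed = true;
}

/* Parse side: take a reference only when the target is kernel/module btf. */
static void record_add_kptr(struct btf_record *rec, struct btf *target)
{
	if (target->kernel)
		btf_get(target);
	rec->fields[rec->cnt++].kptr_btf = target;
}

/* Free side: mirror the parse side, so a self btf pointer is never put. */
static void btf_record_free(struct btf_record *rec)
{
	for (int i = 0; i < rec->cnt; i++)
		if (rec->fields[i].kptr_btf->kernel)
			btf_put(rec->fields[i].kptr_btf);
	rec->cnt = 0;
}

/* Owner (a map in this thread): free the record first, then the self btf. */
static void owner_free(struct btf_record *rec, struct btf *self_btf)
{
	btf_record_free(rec);
	btf_put(self_btf);
}

/* Walk through one record that points at both a kernel btf and the self btf. */
static int demo(void)
{
	struct btf vmlinux = { .refcnt = 1, .kernel = true };
	struct btf prog = { .refcnt = 1, .kernel = false };	/* owner's ref */
	struct btf_record rec = { 0 };

	record_add_kptr(&rec, &vmlinux);	/* kernel btf: refcnt 1 -> 2 */
	record_add_kptr(&rec, &prog);		/* self btf: refcnt stays 1 */
	if (vmlinux.refcnt != 2 || prog.refcnt != 1)
		return -1;

	owner_free(&rec, &prog);
	/* The record's kernel ref is dropped and the self btf can be freed. */
	if (vmlinux.refcnt != 1 || vmlinux.freed || !prog.freed)
		return -1;
	return 0;
}
```

Because the record never holds a reference on the self btf in this model, there
is no btf -> record -> btf loop, and the final btf_put() from the owner is what
frees it.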

I was wondering whether we could avoid doing different things depending on
where btf_parse_fields() is called from, by instead distinguishing which type
of btf always needs a refcnt and which does not. I agree the approach in this
patch also fixes the issue, and I have acked v5. Thanks for the fix.

>                  goto found_dtor;
>          }
> ...
> }
> 
>>
>>> Unfortunately, for graph_root (list_head, rb_root), btf_parse and map_check
>>> will both process it, and that adds a little bit of complexity.
>>> Alexei also suggested the same taking reference approach:
>>> https://lore.kernel.org/bpf/CAADnVQL+uc6VV65_Ezgzw3WH=ME9z1Fdy8Pd6xd0oOq8rgwh7g@mail.gmail.com/
>>

