BPF List
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Hou Tao <houtao@huaweicloud.com>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v4] bpf: Fix a race condition between btf_put() and map_free()
Date: Wed, 13 Dec 2023 22:30:53 -0800	[thread overview]
Message-ID: <50044158-c732-4ca5-8a3c-d06d693f1c85@linux.dev> (raw)
In-Reply-To: <CAADnVQJ3FiXUhZJwX_81sjZvSYYKCFB3BT6P8D59RS2Gu+0Z7g@mail.gmail.com>


On 12/13/23 8:17 PM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 9:07 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 12/8/23 12:30 AM, Hou Tao wrote:
>>> Hi,
>>>
>>> On 12/8/2023 12:02 PM, Yonghong Song wrote:
>>>> On 12/7/23 7:59 PM, Yonghong Song wrote:
>>>>> On 12/7/23 5:23 PM, Martin KaFai Lau wrote:
>>>>>> On 12/6/23 1:09 PM, Yonghong Song wrote:
>>>>>>> When running `./test_progs -j` in my local vm with latest kernel,
>>>>>>> I once hit a kasan error like below:
>>>>>>>
>>>>>>>
>>> SNIP
>>>>>>> Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with
>>>>>>> following code:
>>>>>>>
>>>>>>>      meta = btf_find_struct_meta(btf, btf_id);
>>>>>>>      if (!meta)
>>>>>>>        return -EFAULT;
>>>>>>>      rec->fields[i].graph_root.value_rec = meta->record;
>>>>>>>
>>>>>>> So basically, 'value_rec' is a pointer to the record in
>>>>>>> struct_metas_tab.
>>>>>>> And it is possible that that particular record has been freed by
>>>>>>> btf_struct_metas_free() and hence we have a kasan error here.
>>>>>>>
>>>>>>> Actually it is very hard to reproduce the failure with current
>>>>>>> bpf/bpf-next
>>>>>>> code, I only got the above error once. To increase reproducibility,
>>>>>>> I added
>>>>>>> a delay in bpf_map_free_deferred() to delay map->ops->map_free(),
>>>>>>> which
>>>>>>> significantly increased reproducibility.
>>>>>>>
>>>>>>>      diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>>>>>>      index 5e43ddd1b83f..aae5b5213e93 100644
>>>>>>>      --- a/kernel/bpf/syscall.c
>>>>>>>      +++ b/kernel/bpf/syscall.c
>>>>>>>      @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct
>>>>>>> work_struct *work)
>>>>>>>            struct bpf_map *map = container_of(work, struct bpf_map,
>>>>>>> work);
>>>>>>>            struct btf_record *rec = map->record;
>>>>>>>
>>>>>>>      +     mdelay(100);
>>>>>>>            security_bpf_map_free(map);
>>>>>>>            bpf_map_release_memcg(map);
>>>>>>>            /* implementation dependent freeing */
>>>>>>>
>>>>>>> To fix the problem, we need to have a reference on btf in order to
>>>>>>> safeguard accessing field->graph_root.value_rec in
>>>>>>> map->ops->map_free().
>>>>>>> The function btf_parse_graph_root() is the place to get a btf
>>>>>>> reference.
>>>>>>> The following are rough call stacks reaching bpf_parse_graph_root():
>>>>>>>
>>>>>>>       btf_parse
>>>>>>>         ...
>>>>>>>           btf_parse_fields
>>>>>>>             ...
>>>>>>>               btf_parse_graph_root
>>>>>>>
>>>>>>>       map_check_btf
>>>>>>>         btf_parse_fields
>>>>>>>           ...
>>>>>>>             btf_parse_graph_root
>>>>>>>
>>>>>>> Looking at the above call stack, the btf_parse_graph_root() is
>>>>>>> indirectly
>>>>>>> called from btf_parse() or map_check_btf().
>>>>>>>
>>>>>>> We cannot take a reference in btf_parse() case since at that moment,
>>>>>>> btf is still in the middle to self-validation and initial reference
>>>>>>> (refcount_set(&btf->refcnt, 1)) has not been triggered yet.
>>>>>> Thanks for the details analysis and clear explanation. It helps a lot.
>>>>>>
>>>>>> Sorry for jumping in late.
>>>>>>
>>>>>> I am trying to avoid making a special case for "bool has_btf_ref;"
>>>>>> and "bool from_map_check". It seems to a bit too much to deal with
>>>>>> the error path for btf_parse().
>>> Maybe we could move the common btf used by kptr and graph_root into
>>> bpf_record and let the callers of btf_parse_fields()  and
>>> btf_record_free() to decide the life cycle of btf in btf_record, so
>>> there will be less intrusive and less special case. The following is the
>> I didn't fully check the code but looks like we took a
>> btf reference at map_check_btf() and free it at the end
>> of bpf_map_free_deferred(). This is similar to my v1 patch,
>> not exactly the same but similar since they all do
>> btf_put() at the end of bpf_map_free_deferred().
>>
>> Through discussion, doing on-demand btf_get()/btf_put()
>> approach, similar to kptr approach, seems more favored.
>> This also has advantage to free btf at its earlist possible
>> point.
> Sorry. Looks like I recommended the wrong path.
>
> The approach of btf_parse_fields(... false | true)
> depending on where it's called and whether returned struct btf_record *
> will be kept within a type or within a map
> is pushing complexity too far.
> A year from now we'll forget these subtle details.
> There is an advantage to do btf_put() earli in bpf_map_put(),
> but in the common case it would be delayed just after queue_work.
> Which is a minor time delay.
> And for free_after_mult_rcu_gp much longer,
> but saving from freeing btf are minor compared to the map itself.
>
> I think it's cleaner to go back to v1 and simply move btf_put
> to bpf_map_free_deferred().
> A lot less things to worry about.
> Especially considering that BPF_RB_ROOT may not be the last such special
> record keeping type and every new type would need to think
> hard whether it's BPF_RB_ROOT-like or BPF_LIST_NODE-like.
> v1 avoids this future complexity.

No problem. Will send v6 tomorrow with v1.


  reply	other threads:[~2023-12-14  6:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-06 21:09 [PATCH bpf-next v4] bpf: Fix a race condition between btf_put() and map_free() Yonghong Song
2023-12-07 13:46 ` Hou Tao
2023-12-08  1:23 ` Martin KaFai Lau
2023-12-08  3:59   ` Yonghong Song
2023-12-08  4:02     ` Yonghong Song
2023-12-08  8:30       ` Hou Tao
2023-12-08 17:07         ` Yonghong Song
2023-12-14  4:17           ` Alexei Starovoitov
2023-12-14  6:30             ` Yonghong Song [this message]
2023-12-08  8:16     ` Martin KaFai Lau
2023-12-08 16:45       ` Yonghong Song
2023-12-08 18:26         ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50044158-c732-4ca5-8a3c-d06d693f1c85@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=houtao@huaweicloud.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=martin.lau@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox