From: Martin KaFai Lau <martin.lau@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Yonghong Song <yhs@meta.com>, bpf <bpf@vger.kernel.org>,
Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
Hao Luo <haoluo@google.com>, Yonghong Song <yhs@fb.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
rcu@vger.kernel.org, Hou Tao <houtao1@huawei.com>,
Martin KaFai Lau <martin.lau@kernel.org>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subject: Re: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc
Date: Tue, 14 Feb 2023 23:22:56 -0800 [thread overview]
Message-ID: <2b1ddc4c-9905-899a-a903-e66a6e8b4d58@linux.dev> (raw)
In-Reply-To: <7fef4ece-0982-cb43-ed39-e73791436355@huaweicloud.com>
On 2/14/23 8:02 PM, Hou Tao wrote:
>> For local storage, when its owner (sk/task/inode/cgrp) is going away, the
>> memory can be reused immediately. No rcu gp is needed.
> Now it seems it will wait for RCU GP and i think it is still necessary, because
> when the process exits, other processes may still access the local storage
> through pidfd or task_struct of the exited process.
When its owner (sk/task/cgrp...) is going away, its owner has reached refcnt 0
and will be kfree immediately next. eg. bpf_sk_storage_free is called just
before the sk is about to be kfree. No bpf prog should have a hold on this sk.
The same should go for the task.
The current rcu gp waiting during bpf_{sk,task,cgrp...}_storage_free is because
the racing with the map destruction bpf_local_storage_map_free().
>>
>> The local storage delete case (eg. bpf_sk_storage_delete) is the only one that
>> needs to be freed by tasks_trace gp because another bpf prog (reader) may be
>> under the rcu_read_lock_trace(). I think the idea (BPF_REUSE_AFTER_RCU_GP) on
>> allowing reuse after vanilla rcu gp and free (if needed) after tasks_trace gp
>> can be extended to the local storage delete case. I think we can extend the
>> assumption that "sleepable progs (reader) can use explicit bpf_rcu_read_lock()
>> when they want to avoid uaf" to bpf_{sk,task,inode,cgrp}_storage_get() also.
>>
> It seems bpf_rcu_read_lock() & bpf_rcu_read_unlock() will be used to protect not
> only bpf_task_storage_get(), but also the dereferences of the returned local
> storage ptr, right ? I think qp-trie may also need this.
I think bpf_rcu_read_lock() is primarily for bpf prog.
The bpf_{sk,task,...}_storage_get() internal is easier to handle and probably
will need to do its own rcu_read_lock() instead of depending on the bpf prog
doing the bpf_rcu_read_lock() because the bpf prog may decide uaf is fine.
>> I also need the GFP_ZERO in bpf_mem_alloc, so will work on the GFP_ZERO and
>> the BPF_REUSE_AFTER_RCU_GP idea. Probably will get the GFP_ZERO out first.
> I will continue work on this patchset for GFP_ZERO and reuse flag. Do you mean
> that you want to work together to implement BPF_REUSE_AFTER_RCU_GP ? How do we
> cooperate together to accomplish that ?
Please submit the GFP_ZERO patch first. Kumar and I can use it immediately.
I have been hacking to make bpf's memalloc safe for the
bpf_{sk,task,cgrp..}_storage_delete() and this safe-on-reuse piece still need
works. The whole thing is getting pretty long, so my current plan is to put the
safe-on-reuse piece aside for now, focus back on the immediate goal and make the
common case deadlock free first. Meaning the
bpf_*_storage_get(BPF_*_STORAGE_GET_F_CREATE) and the bpf_*_storage_free() will
use the bpf_mem_cache_{alloc,free}. The bpf_*_storage_delete() will stay as-is
to go through the call_rcu_tasks_trace() for now since delete is not the common
use case.
In parallel, if you can post the BPF_REUSE_AFTER_RCU_GP, we can discuss based on
your work. That should speed up the progress. If I finished the immediate goal
for local storage and this piece is still pending, I will ping you first. Thoughts?
next prev parent reply other threads:[~2023-02-15 7:23 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-30 4:11 [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 1/6] bpf: Support ctor in bpf memory allocator Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 2/6] bpf: Factor out a common helper free_llist() Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 3/6] bpf: Pass bitwise flags to bpf_mem_alloc_init() Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 4/6] bpf: Introduce BPF_MA_NO_REUSE for bpf memory allocator Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 5/6] bpf: Use BPF_MA_NO_REUSE in htab map Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 6/6] selftests/bpf: Add test case for element reuse " Hou Tao
2023-01-01 1:26 ` [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc Alexei Starovoitov
2023-01-01 18:48 ` Yonghong Song
2023-01-03 13:47 ` Hou Tao
2023-01-04 6:10 ` Yonghong Song
2023-01-04 6:30 ` Hou Tao
2023-01-04 7:14 ` Yonghong Song
2023-01-04 18:26 ` Alexei Starovoitov
2023-02-10 16:32 ` Kumar Kartikeya Dwivedi
2023-02-10 21:06 ` Alexei Starovoitov
2023-02-11 1:09 ` Hou Tao
2023-02-11 16:33 ` Alexei Starovoitov
2023-02-11 16:34 ` Alexei Starovoitov
2023-02-15 1:54 ` Martin KaFai Lau
2023-02-15 4:02 ` Hou Tao
2023-02-15 7:22 ` Martin KaFai Lau [this message]
2023-02-16 2:11 ` Hou Tao
2023-02-16 7:47 ` Martin KaFai Lau
2023-02-16 8:18 ` Hou Tao
2023-02-16 13:55 ` Hou Tao
2023-02-16 16:35 ` Alexei Starovoitov
2023-02-17 1:19 ` Hou Tao
2023-02-22 19:30 ` Alexei Starovoitov
2023-02-15 2:35 ` Hou Tao
2023-02-15 2:42 ` Alexei Starovoitov
2023-02-15 3:00 ` Hou Tao
2023-01-03 13:40 ` Hou Tao
2023-01-03 19:38 ` Alexei Starovoitov
2023-01-10 6:26 ` Martin KaFai Lau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2b1ddc4c-9905-899a-a903-e66a6e8b4d58@linux.dev \
--to=martin.lau@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=houtao1@huawei.com \
--cc=houtao@huaweicloud.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@kernel.org \
--cc=memxor@gmail.com \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=yhs@fb.com \
--cc=yhs@meta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox