All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Yonghong Song <yhs@meta.com>, bpf <bpf@vger.kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
	Hao Luo <haoluo@google.com>, Yonghong Song <yhs@fb.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	rcu@vger.kernel.org, Hou Tao <houtao1@huawei.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subject: Re: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc
Date: Tue, 14 Feb 2023 23:22:56 -0800	[thread overview]
Message-ID: <2b1ddc4c-9905-899a-a903-e66a6e8b4d58@linux.dev> (raw)
In-Reply-To: <7fef4ece-0982-cb43-ed39-e73791436355@huaweicloud.com>

On 2/14/23 8:02 PM, Hou Tao wrote:
>> For local storage, when its owner (sk/task/inode/cgrp) is going away, the
>> memory can be reused immediately. No rcu gp is needed.
> Now it seems it will wait for RCU GP and i think it is still necessary, because
> when the process exits, other processes may still access the local storage
> through pidfd or task_struct of the exited process.

When its owner (sk/task/cgrp...) is going away, its owner has reached refcnt 0 
and will be kfree immediately next. eg. bpf_sk_storage_free is called just 
before the sk is about to be kfree. No bpf prog should have a hold on this sk. 
The same should go for the task.

The current rcu gp waiting during bpf_{sk,task,cgrp...}_storage_free is because 
the racing with the map destruction bpf_local_storage_map_free().

>>
>> The local storage delete case (eg. bpf_sk_storage_delete) is the only one that
>> needs to be freed by tasks_trace gp because another bpf prog (reader) may be
>> under the rcu_read_lock_trace(). I think the idea (BPF_REUSE_AFTER_RCU_GP) on
>> allowing reuse after vanilla rcu gp and free (if needed) after tasks_trace gp
>> can be extended to the local storage delete case. I think we can extend the
>> assumption that "sleepable progs (reader) can use explicit bpf_rcu_read_lock()
>> when they want to avoid uaf" to bpf_{sk,task,inode,cgrp}_storage_get() also.
>>
> It seems bpf_rcu_read_lock() & bpf_rcu_read_unlock() will be used to protect not
> only bpf_task_storage_get(), but also the dereferences of the returned local
> storage ptr, right ? I think qp-trie may also need this.

I think bpf_rcu_read_lock() is primarily for bpf prog.

The bpf_{sk,task,...}_storage_get() internal is easier to handle and probably 
will need to do its own rcu_read_lock() instead of depending on the bpf prog 
doing the bpf_rcu_read_lock() because the bpf prog may decide uaf is fine.

>> I also need the GFP_ZERO in bpf_mem_alloc, so will work on the GFP_ZERO and
>> the BPF_REUSE_AFTER_RCU_GP idea.  Probably will get the GFP_ZERO out first.
> I will continue work on this patchset for GFP_ZERO and reuse flag. Do you mean
> that you want to work together to implement BPF_REUSE_AFTER_RCU_GP ? How do we
> cooperate together to accomplish that ?
Please submit the GFP_ZERO patch first. Kumar and I can use it immediately.

I have been hacking to make bpf's memalloc safe for the 
bpf_{sk,task,cgrp..}_storage_delete() and this safe-on-reuse piece still need 
works. The whole thing is getting pretty long, so my current plan is to put the 
safe-on-reuse piece aside for now, focus back on the immediate goal and make the 
common case deadlock free first. Meaning the 
bpf_*_storage_get(BPF_*_STORAGE_GET_F_CREATE) and the bpf_*_storage_free() will 
use the bpf_mem_cache_{alloc,free}. The bpf_*_storage_delete() will stay as-is 
to go through the call_rcu_tasks_trace() for now since delete is not the common 
use case.

In parallel, if you can post the BPF_REUSE_AFTER_RCU_GP, we can discuss based on 
your work. That should speed up the progress. If I finished the immediate goal 
for local storage and this piece is still pending, I will ping you first.  Thoughts?


  reply	other threads:[~2023-02-15  7:23 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-30  4:11 [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 1/6] bpf: Support ctor in bpf memory allocator Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 2/6] bpf: Factor out a common helper free_llist() Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 3/6] bpf: Pass bitwise flags to bpf_mem_alloc_init() Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 4/6] bpf: Introduce BPF_MA_NO_REUSE for bpf memory allocator Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 5/6] bpf: Use BPF_MA_NO_REUSE in htab map Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 6/6] selftests/bpf: Add test case for element reuse " Hou Tao
2023-01-01  1:26 ` [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc Alexei Starovoitov
2023-01-01 18:48   ` Yonghong Song
2023-01-03 13:47     ` Hou Tao
2023-01-04  6:10       ` Yonghong Song
2023-01-04  6:30         ` Hou Tao
2023-01-04  7:14           ` Yonghong Song
2023-01-04 18:26             ` Alexei Starovoitov
2023-02-10 16:32               ` Kumar Kartikeya Dwivedi
2023-02-10 21:06                 ` Alexei Starovoitov
2023-02-11  1:09                   ` Hou Tao
2023-02-11 16:33                     ` Alexei Starovoitov
2023-02-11 16:34                       ` Alexei Starovoitov
2023-02-15  1:54                         ` Martin KaFai Lau
2023-02-15  4:02                           ` Hou Tao
2023-02-15  7:22                             ` Martin KaFai Lau [this message]
2023-02-16  2:11                               ` Hou Tao
2023-02-16  7:47                                 ` Martin KaFai Lau
2023-02-16  8:18                                   ` Hou Tao
2023-02-16 13:55                         ` Hou Tao
2023-02-16 16:35                           ` Alexei Starovoitov
2023-02-17  1:19                             ` Hou Tao
2023-02-22 19:30                               ` Alexei Starovoitov
2023-02-15  2:35                       ` Hou Tao
2023-02-15  2:42                         ` Alexei Starovoitov
2023-02-15  3:00                           ` Hou Tao
2023-01-03 13:40   ` Hou Tao
2023-01-03 19:38     ` Alexei Starovoitov
2023-01-10  6:26       ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2b1ddc4c-9905-899a-a903-e66a6e8b4d58@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=houtao1@huawei.com \
    --cc=houtao@huaweicloud.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    --cc=yhs@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.