From: Yonghong Song <yhs@fb.com>
To: Hao Luo <haoluo@google.com>, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>
Cc: KP Singh <kpsingh@kernel.org>, Martin KaFai Lau <kafai@fb.com>,
Song Liu <songliubraving@fb.com>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC bpf-next 0/2] Mmapable task local storage.
Date: Fri, 25 Mar 2022 12:16:27 -0700
Message-ID: <9cdf860d-8370-95b5-1688-af03265cc874@fb.com>
In-Reply-To: <20220324234123.1608337-1-haoluo@google.com>
On 3/24/22 4:41 PM, Hao Luo wrote:
> Some map types support mmap operation, which allows userspace to
> communicate with BPF programs directly. Currently only arraymap
> and ringbuf have mmap implemented.
>
> However, in some use cases, when multiple program instances can
> run concurrently, global mmapable memory can cause races. In that
> case, userspace needs to provide the necessary synchronization to
> coordinate use of the mapped global data. This can become a
> bottleneck.
I can see your use case here. Each calling process can get the
corresponding bpf program's task local storage data through the
mmap interface. As you mentioned, there is a tradeoff between
using more memory and avoiding global synchronization.

I am wondering whether a bpf_iter approach could retrieve a
similar result. We could implement a bpf_iter for the task local
storage map; optionally, it could take a tid to retrieve the data
for that particular tid. This way, user space needs an explicit
syscall, but does not need to allocate more memory than
necessary. WDYT?
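For concreteness, a BPF-side sketch of this idea, in libbpf style, using
the existing iter/task iterator. This is hypothetical: it assumes
bpf_task_storage_get() is usable from an iterator program, and the map
layout (long value keyed per task) is purely illustrative, not part of
the RFC.

```c
/* Hypothetical sketch: dump each task's local storage through an
 * iterator instead of per-task mmap. Requires vmlinux.h generated
 * by bpftool; the map definition is illustrative only. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} counts SEC(".maps");

SEC("iter/task")
int dump_task_storage(struct bpf_iter__task *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct task_struct *task = ctx->task;
	long *val;

	if (!task)
		return 0;

	val = bpf_task_storage_get(&counts, task, 0, 0);
	if (val)
		/* user space reads these lines via read() on the iter
		 * fd, instead of mmap()ing per-task pages */
		BPF_SEQ_PRINTF(seq, "%d %ld\n", task->tgid, *val);
	return 0;
}
```

User space would then attach the iterator and read its fd on demand,
paying one syscall per snapshot but allocating no extra pages.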
>
> It would be great to have a mmapable local storage in that case.
> This patch adds that.
>
> Mmap isn't a BPF syscall, so unprivileged users can also use it
> to interact with maps.
>
> Currently the only way of allocating a mmapable map area is
> vmalloc(), and it is only used at map allocation time. vmalloc()
> may sleep, so it is not suitable for maps that may allocate
> memory in an atomic context, such as local storage. Local storage
> uses kmalloc() with GFP_ATOMIC, which doesn't sleep. This patch
> uses kmalloc() with GFP_ATOMIC for the mmapable map area as well.
>
> Allocating mmapable memory has a page-alignment requirement, so
> we have to deliberately allocate more memory than necessary to
> obtain an address where sdata->data is aligned on a page
> boundary. The calculation of the mmapable allocation size and the
> actual allocation/deallocation are packaged in three functions:
>
> - bpf_map_mmapable_alloc_size()
> - bpf_map_mmapable_kzalloc()
> - bpf_map_mmapable_kfree()
>
> BPF local storage uses them to provide generic mmap API:
>
> - bpf_local_storage_mmap()
>
> And task local storage adds the mmap callback:
>
> - task_storage_map_mmap()
>
> When application calls mmap on a task local storage, it gets its
> own local storage.
>
> Overall, mmapable local storage trades memory for flexibility
> and efficiency. It introduces memory fragmentation but can make
> programs stateless, which is useful in some cases.
>
> Hao Luo (2):
> bpf: Mmapable local storage.
> selftests/bpf: Test mmapable task local storage.
>
> include/linux/bpf.h | 4 +
> include/linux/bpf_local_storage.h | 5 +-
> kernel/bpf/bpf_local_storage.c | 73 +++++++++++++++++--
> kernel/bpf/bpf_task_storage.c | 40 ++++++++++
> kernel/bpf/syscall.c | 67 +++++++++++++++++
> .../bpf/prog_tests/task_local_storage.c | 38 ++++++++++
> .../bpf/progs/task_local_storage_mmapable.c | 38 ++++++++++
> 7 files changed, 257 insertions(+), 8 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage_mmapable.c
>
Thread overview: 17+ messages
2022-03-24 23:41 [PATCH RFC bpf-next 0/2] Mmapable task local storage Hao Luo
2022-03-24 23:41 ` [PATCH RFC bpf-next 1/2] bpf: Mmapable " Hao Luo
2022-03-24 23:41 ` [PATCH RFC bpf-next 2/2] selftests/bpf: Test mmapable task " Hao Luo
2022-03-25 19:16 ` Yonghong Song [this message]
2022-03-28 17:39 ` [PATCH RFC bpf-next 0/2] Mmapable " Hao Luo
2022-03-28 17:46 ` Hao Luo
2022-03-29 9:37 ` Kumar Kartikeya Dwivedi
2022-03-29 17:43 ` Hao Luo
2022-03-29 21:45 ` Martin KaFai Lau
2022-03-30 18:05 ` Hao Luo
2022-03-29 23:29 ` Alexei Starovoitov
2022-03-30 18:06 ` Hao Luo
2022-03-30 18:16 ` Alexei Starovoitov
2022-03-30 18:26 ` Hao Luo
2022-03-31 22:32 ` KP Singh
2022-03-31 23:06 ` Alexei Starovoitov
2022-04-02 0:48 ` KP Singh