From: Dave Marchevsky <davemarchevsky@fb.com>
To: <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Kernel Team <kernel-team@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Dave Marchevsky <davemarchevsky@fb.com>
Subject: [PATCH v1 bpf-next 0/2] bpf: Add mmapable task_local storage
Date: Mon, 20 Nov 2023 09:59:23 -0800
Message-ID: <20231120175925.733167-1-davemarchevsky@fb.com>

This series adds support for mmap()ing individual task_local storage
map_values into userspace. Two motivating use cases:

  * sched_ext ([0]) schedulers might want to act on 'scheduling hints'
    provided by userspace tasks. For example, a task can tag itself as
    latency-sensitive but not particularly computationally intensive,
    and a BPF scheduler can use this information to make better
    scheduling decisions. Similarly, a database task about to start a
    transaction can tag itself as doing so, with little overhead, by
    writing to the mmap'd map_value. In both cases the information is
    task-specific, and in the latter case it'd be preferable to avoid
    incurring syscall overhead since the hint changes often.

  * The strobemeta ([1]) technique for reading thread_local storage is
    used by tracing programs at Meta to annotate tracing data with
    task-specific metadata. For example, a multithreaded webserver with
    a pool of worker threads preparing responses and other threads
    handling request connections might want to tag threads by type, and
    further tag worker threads with the feature flags enabled during
    request processing.
      * The strobemeta technique predates the existence of the
        task_local storage map type, instead relying on
        reverse-engineering the specifics of the thread_local storage
        implementation. The approach enabled here avoids much of that
        complexity.

The general thrust of this series' implementation is "simplest thing
that works". A userspace thread can mmap() a task_local storage map fd
and receive the map_value corresponding to its task. In the future we
could support mmap()ing other threads' map_values via an offset
parameter or some other mechanism. Similarly, this series makes no
attempt to pack multiple map_values into a userspace-mappable page:
each map_value for a BPF_F_MMAPABLE task_local storage map is given its
own page. Neither of those potential improvements is necessary for the
motivating use cases above. Patch 1's summary digs deeper into
implementation details.
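The no-packing choice has a simple sizing consequence, sketched below;
the helper name is illustrative and not an interface from the patch:

```python
import mmap

# Sketch of the sizing implied by "each map_value is given its own
# page": the per-value allocation is the value size rounded up to a
# whole page, so even a small value consumes a full page.
def mmapable_alloc_size(value_size: int, page_size: int = mmap.PAGESIZE) -> int:
    return -(-value_size // page_size) * page_size  # ceiling division

print(mmapable_alloc_size(16, page_size=4096))    # 4096
print(mmapable_alloc_size(5000, page_size=4096))  # 8192
```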

This series' changes to the generic local_storage implementation shared
by cgroup_local storage and others will make it straightforward to
extend this support to those local storage types in the future.

Summary of patches:
  * Patch 1 adds support for mmapable map_vals in generic
    bpf_local_storage infrastructure and uses the new feature in
    task_local storage
  * Patch 2 adds tests

  [0]: https://lore.kernel.org/bpf/20231111024835.2164816-1-tj@kernel.org/
  [1]: tools/testing/selftests/bpf/progs/strobemeta*

Dave Marchevsky (2):
  bpf: Support BPF_F_MMAPABLE task_local storage
  selftests/bpf: Add test exercising mmapable task_local_storage

 include/linux/bpf_local_storage.h             |  14 +-
 kernel/bpf/bpf_local_storage.c                | 145 +++++++++++---
 kernel/bpf/bpf_task_storage.c                 |  35 +++-
 kernel/bpf/syscall.c                          |   2 +-
 .../bpf/prog_tests/task_local_storage.c       | 177 ++++++++++++++++++
 .../bpf/progs/task_local_storage__mmap.c      |  59 ++++++
 .../bpf/progs/task_local_storage__mmap.h      |   7 +
 .../bpf/progs/task_local_storage__mmap_fail.c |  39 ++++
 8 files changed, 445 insertions(+), 33 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage__mmap.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage__mmap.h
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage__mmap_fail.c

-- 
2.34.1


Thread overview: 17+ messages
2023-11-20 17:59 Dave Marchevsky [this message]
2023-11-20 17:59 ` [PATCH v1 bpf-next 1/2] bpf: Support BPF_F_MMAPABLE task_local storage Dave Marchevsky
2023-11-20 21:41   ` Johannes Weiner
2023-11-21  0:42   ` Martin KaFai Lau
2023-11-21  6:11     ` David Marchevsky
2023-11-21 19:27       ` Martin KaFai Lau
2023-11-21 19:49         ` Alexei Starovoitov
2023-12-11 17:31           ` David Marchevsky
2023-11-21  2:32   ` kernel test robot
2023-11-21  5:06   ` kernel test robot
2023-11-21  5:20   ` kernel test robot
2023-11-21  5:44   ` Alexei Starovoitov
2023-11-21  6:41   ` Yonghong Song
2023-11-21 15:34   ` Yonghong Song
2023-11-21 19:30   ` Andrii Nakryiko
2023-11-20 17:59 ` [PATCH v1 bpf-next 2/2] selftests/bpf: Add test exercising mmapable task_local_storage Dave Marchevsky
2023-11-21 19:34   ` Andrii Nakryiko
