From: Yonghong Song <yonghong.song@linux.dev>
To: Dave Marchevsky <davemarchevsky@fb.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>,
Kernel Team <kernel-team@fb.com>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v1 bpf-next 1/2] bpf: Support BPF_F_MMAPABLE task_local storage
Date: Tue, 21 Nov 2023 07:34:05 -0800 [thread overview]
Message-ID: <20669b09-36be-493c-9cf7-bf34e906832c@linux.dev> (raw)
In-Reply-To: <20231120175925.733167-2-davemarchevsky@fb.com>
On 11/20/23 12:59 PM, Dave Marchevsky wrote:
> This patch modifies the generic bpf_local_storage infrastructure to
> support mmapable map values and adds mmap() handling to task_local
> storage leveraging this new functionality. A userspace task which
> mmap's a task_local storage map will receive a pointer to the map_value
> corresponding to that task's key; mmap'ing other tasks' mapvals is
> not supported in this patch.
>
> Currently, struct bpf_local_storage_elem contains both bookkeeping
> information as well as a struct bpf_local_storage_data with additional
> bookkeeping information and the actual mapval data. We can't simply map
> the page containing this struct into userspace. Instead, mmapable
> local_storage uses bpf_local_storage_data's data field to point to the
> actual mapval, which is allocated separately such that it can be
> mmapped. Only the mapval lives on the page(s) allocated for it.
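To check my understanding of the layout: a userspace-only sketch of the two
cases (the mock struct and helper mirror the names in the patch, but this is
my toy code, not the kernel side):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BPF_F_MMAPABLE (1U << 10) /* flag value from uapi/linux/bpf.h */

/* toy stand-in for bpf_local_storage_data; illustration only */
struct mock_sdata {
	uint64_t smap_map_flags;
	union {
		uint8_t data[8];   /* !BPF_F_MMAPABLE: mapval stored inline */
		void *actual_data; /* BPF_F_MMAPABLE: separately allocated pages */
	};
};

/* mirrors what the patch's sdata_mapval() helper has to do */
static void *mock_sdata_mapval(struct mock_sdata *sdata)
{
	if (sdata->smap_map_flags & BPF_F_MMAPABLE)
		return sdata->actual_data;
	return sdata->data;
}
```

i.e. callers always go through the helper and never care which side of the
union is live.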
>
> The lifetime of the actual_data mmapable region is tied to the
> bpf_local_storage_elem which points to it. This doesn't necessarily mean
> that the pages go away when the bpf_local_storage_elem is free'd - if
> they're mapped into some userspace process they will remain until
> unmapped, but are no longer the task_local storage's mapval.
>
> Implementation details:
>
> * A few small helpers are added to deal with bpf_local_storage_data's
> 'data' field having different semantics when the local_storage map
> is mmapable. With their help, many of the changes to existing code
> are purely mechanical (e.g. sdata->data becomes sdata_mapval(sdata),
> selem->elem_size becomes selem_bytes_used(selem)).
>
> * The map flags are copied into bpf_local_storage_data when its
> containing bpf_local_storage_elem is alloc'd, since the
> bpf_local_storage_map associated with them may be gone when
> bpf_local_storage_data is free'd, and testing flags for
> BPF_F_MMAPABLE is necessary when free'ing to ensure that the
> mmapable region is free'd.
>
> * The extra field doesn't change bpf_local_storage_elem's size.
> There were 48 bytes of padding after the bpf_local_storage_data
> field, now there are 40.
>
> * Currently, bpf_local_storage_update always creates a new
> bpf_local_storage_elem for the 'updated' value - the only exception
> being if the map_value has a bpf_spin_lock field, in which case the
> spin lock is grabbed instead of the less granular bpf_local_storage
> lock, and the value updated in place. This inplace update behavior
> is desired for mmapable local_storage map_values as well, since
> creating a new selem would result in new mmapable pages.
>
> * The size of the mmapable pages is accounted for when calling
> mem_{charge,uncharge}. If the pages are mmap'd into a userspace task,
> mem_uncharge may be called before they actually go away.
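On the accounting point: I assume selem_bytes_used() has to charge whole
pages in the mmapable case, something along these lines (my guess at the
math for illustration, not the patch's exact helper):

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096UL
#define MOCK_PAGE_ALIGN(x) (((x) + MOCK_PAGE_SIZE - 1) & ~(MOCK_PAGE_SIZE - 1))

/* guess at the charged size per elem: the selem bookkeeping stays, but the
 * inline mapval bytes are replaced by full, separately allocated pages */
static size_t mock_selem_bytes_used(size_t elem_size, size_t value_size,
				    int mmapable)
{
	if (!mmapable)
		return elem_size;
	return elem_size - value_size + MOCK_PAGE_ALIGN(value_size);
}
```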
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
> include/linux/bpf_local_storage.h | 14 ++-
> kernel/bpf/bpf_local_storage.c | 145 ++++++++++++++++++++++++------
> kernel/bpf/bpf_task_storage.c | 35 ++++++--
> kernel/bpf/syscall.c | 2 +-
> 4 files changed, 163 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
> index 173ec7f43ed1..114973f925ea 100644
> --- a/include/linux/bpf_local_storage.h
> +++ b/include/linux/bpf_local_storage.h
> @@ -69,7 +69,17 @@ struct bpf_local_storage_data {
> * the number of cachelines accessed during the cache hit case.
> */
> struct bpf_local_storage_map __rcu *smap;
> - u8 data[] __aligned(8);
> + /* Need to duplicate smap's map_flags as smap may be gone when
> + * it's time to free bpf_local_storage_data
> + */
> + u64 smap_map_flags;
> + /* If BPF_F_MMAPABLE, this is a void * to separately-alloc'd data
> + * Otherwise the actual mapval data lives here
> + */
> + union {
> + DECLARE_FLEX_ARRAY(u8, data) __aligned(8);
> + void *actual_data __aligned(8);
> + };
> };
>
> /* Linked to bpf_local_storage and bpf_local_storage_map */
> @@ -124,6 +134,8 @@ static struct bpf_local_storage_cache name = { \
> /* Helper functions for bpf_local_storage */
> int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
>
> +void *sdata_mapval(struct bpf_local_storage_data *data);
> +
> struct bpf_map *
> bpf_local_storage_map_alloc(union bpf_attr *attr,
> struct bpf_local_storage_cache *cache,
> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index 146824cc9689..9b3becbcc1a3 100644
> --- a/kernel/bpf/bpf_local_storage.c
[...]
> @@ -583,14 +665,14 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
> if (err) {
> bpf_selem_free(selem, smap, true);
> - mem_uncharge(smap, owner, smap->elem_size);
> + mem_uncharge(smap, owner, selem_bytes_used(smap));
> return ERR_PTR(err);
> }
>
> return SDATA(selem);
> }
>
> - if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
> + if (can_update_existing_selem(smap, map_flags) && !(map_flags & BPF_NOEXIST)) {
> /* Hoping to find an old_sdata to do inline update
> * such that it can avoid taking the local_storage->lock
> * and changing the lists.
> @@ -601,8 +683,13 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> if (err)
> return ERR_PTR(err);
> if (old_sdata && selem_linked_to_storage_lockless(SELEM(old_sdata))) {
> - copy_map_value_locked(&smap->map, old_sdata->data,
> - value, false);
> + if (map_flags & BPF_F_LOCK)
> + copy_map_value_locked(&smap->map,
> + sdata_mapval(old_sdata),
> + value, false);
> + else
> + copy_map_value(&smap->map, sdata_mapval(old_sdata),
> + value);
IIUC, if there are two 'storage_update' calls to the same map/key,
the two updates will be serialized by the spin_lock.
But what about concurrent updates to the mmap'ed sdata from
userspace - do we need any protection here?
> return old_sdata;
> }
> }
> @@ -633,8 +720,8 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> goto unlock;
>
> if (old_sdata && (map_flags & BPF_F_LOCK)) {
> - copy_map_value_locked(&smap->map, old_sdata->data, value,
> - false);
> + copy_map_value_locked(&smap->map, sdata_mapval(old_sdata),
> + value, false);
> selem = SELEM(old_sdata);
> goto unlock;
> }
> @@ -656,7 +743,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> unlock:
> raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> if (alloc_selem) {
> - mem_uncharge(smap, owner, smap->elem_size);
> + mem_uncharge(smap, owner, selem_bytes_used(smap));
> bpf_selem_free(alloc_selem, smap, true);
> }
> return err ? ERR_PTR(err) : SDATA(selem);
> @@ -707,6 +794,10 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
> if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
> return -E2BIG;
>
> + if ((attr->map_flags & BPF_F_MMAPABLE) &&
> + attr->map_type != BPF_MAP_TYPE_TASK_STORAGE)
> + return -EINVAL;
> +
> return 0;
> }
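The new alloc_check restriction, as I read it, just rejects the flag for
anything but task storage. Sketching the check standalone (enum values
hardcoded here for the mock, taken from uapi/linux/bpf.h):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BPF_F_MMAPABLE (1U << 10)
#define MOCK_MAP_TYPE_TASK_STORAGE 29 /* BPF_MAP_TYPE_TASK_STORAGE */
#define MOCK_MAP_TYPE_SK_STORAGE   24 /* BPF_MAP_TYPE_SK_STORAGE */

/* mirrors the check added to bpf_local_storage_map_alloc_check() */
static int mock_mmapable_alloc_check(uint32_t map_type, uint32_t map_flags)
{
	if ((map_flags & BPF_F_MMAPABLE) &&
	    map_type != MOCK_MAP_TYPE_TASK_STORAGE)
		return -EINVAL;
	return 0;
}
```

so sk/inode/cgrp storage with BPF_F_MMAPABLE fails map creation outright.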
>
[...]
Thread overview: 17+ messages
2023-11-20 17:59 [PATCH v1 bpf-next 0/2] bpf: Add mmapable task_local storage Dave Marchevsky
2023-11-20 17:59 ` [PATCH v1 bpf-next 1/2] bpf: Support BPF_F_MMAPABLE " Dave Marchevsky
2023-11-20 21:41 ` Johannes Weiner
2023-11-21 0:42 ` Martin KaFai Lau
2023-11-21 6:11 ` David Marchevsky
2023-11-21 19:27 ` Martin KaFai Lau
2023-11-21 19:49 ` Alexei Starovoitov
2023-12-11 17:31 ` David Marchevsky
2023-11-21 2:32 ` kernel test robot
2023-11-21 5:06 ` kernel test robot
2023-11-21 5:20 ` kernel test robot
2023-11-21 5:44 ` Alexei Starovoitov
2023-11-21 6:41 ` Yonghong Song
2023-11-21 15:34 ` Yonghong Song [this message]
2023-11-21 19:30 ` Andrii Nakryiko
2023-11-20 17:59 ` [PATCH v1 bpf-next 2/2] selftests/bpf: Add test exercising mmapable task_local_storage Dave Marchevsky
2023-11-21 19:34 ` Andrii Nakryiko