From: Yonghong Song <yhs@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>, <davem@davemloft.net>
Cc: <daniel@iogearbox.net>, <andrii@kernel.org>,
<netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
<kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 2/8] bpf: Add map side support for bpf timers.
Date: Fri, 25 Jun 2021 12:46:03 -0700 [thread overview]
Message-ID: <2c523078-cb24-e1ca-2439-27be37cfe90b@fb.com> (raw)
In-Reply-To: <20210624022518.57875-3-alexei.starovoitov@gmail.com>
On 6/23/21 7:25 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> Restrict bpf timers to array, hash (both preallocated and kmalloced), and
> lru map types. The per-cpu maps with timers don't make sense, since 'struct
> bpf_timer' is a part of map value. bpf timers in per-cpu maps would mean that
> the number of timers depends on number of possible cpus and timers would not be
> accessible from all cpus. lpm map support can be added in the future.
> The timers in inner maps are supported.
>
> The bpf_map_update/delete_elem() helpers and sys_bpf commands cancel and free
> bpf_timer in a given map element.
>
> Similar to 'struct bpf_spin_lock' BTF is required and it is used to validate
> that map element indeed contains 'struct bpf_timer'.
>
> Make check_and_init_map_value() init both bpf_spin_lock and bpf_timer when
> map element data is reused in preallocated htab and lru maps.
>
> Teach copy_map_value() to support both bpf_spin_lock and bpf_timer in a single
> map element. There could be one of each, but not more than one. Due to 'one
> bpf_timer in one element' restriction do not support timers in global data,
> since global data is a map of single element, but from bpf program side it's
> seen as many global variables and restriction of single global timer would be
> odd. The sys_bpf map_freeze and sys_mmap syscalls are not allowed on maps with
> timers, since user space could have corrupted mmap element and crashed the
> kernel. The maps with timers cannot be readonly. Due to these restrictions
> search for bpf_timer in datasec BTF in case it was placed in the global data to
> report clear error.
>
> The previous patch allowed 'struct bpf_timer' as a first field in a map
> element only. Relax this restriction.
>
> Refactor lru map to s/bpf_lru_push_free/htab_lru_push_free/ to cancel and free
> the timer when lru map deletes an element as a part of it eviction algorithm.
>
> Make sure that bpf program cannot access 'struct bpf_timer' via direct load/store.
> The timer operation are done through helpers only.
> This is similar to 'struct bpf_spin_lock'.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
I didn't find major issues. Only one minor comment below. I do a race
during map_update where updated map elements will have timer removed
but at the same time the timer might still be used after update. But
irq spinlock should handle this properly.
Acked-by: Yonghong Song <yhs@fb.com>
> ---
> include/linux/bpf.h | 44 ++++++++++++-----
> include/linux/btf.h | 1 +
> kernel/bpf/arraymap.c | 22 +++++++++
> kernel/bpf/btf.c | 77 ++++++++++++++++++++++++------
> kernel/bpf/hashtab.c | 96 +++++++++++++++++++++++++++++++++-----
> kernel/bpf/local_storage.c | 4 +-
> kernel/bpf/map_in_map.c | 1 +
> kernel/bpf/syscall.c | 21 +++++++--
> kernel/bpf/verifier.c | 30 ++++++++++--
> 9 files changed, 251 insertions(+), 45 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 72da9d4d070c..90e6c6d1404c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -198,24 +198,46 @@ static inline bool map_value_has_spin_lock(const struct bpf_map *map)
> return map->spin_lock_off >= 0;
> }
>
> -static inline void check_and_init_map_lock(struct bpf_map *map, void *dst)
> +static inline bool map_value_has_timer(const struct bpf_map *map)
> {
> - if (likely(!map_value_has_spin_lock(map)))
> - return;
> - *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> - (struct bpf_spin_lock){};
> + return map->timer_off >= 0;
> }
>
> -/* copy everything but bpf_spin_lock */
> +static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> +{
> + if (unlikely(map_value_has_spin_lock(map)))
> + *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> + (struct bpf_spin_lock){};
> + if (unlikely(map_value_has_timer(map)))
> + *(struct bpf_timer *)(dst + map->timer_off) =
> + (struct bpf_timer){};
> +}
> +
[...]
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 6f6681b07364..e85a5839ffe8 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -228,6 +228,28 @@ static struct htab_elem *get_htab_elem(struct bpf_htab *htab, int i)
> return (struct htab_elem *) (htab->elems + i * (u64)htab->elem_size);
> }
>
> +static void htab_free_prealloced_timers(struct bpf_htab *htab)
> +{
> + u32 num_entries = htab->map.max_entries;
> + int i;
> +
> + if (likely(!map_value_has_timer(&htab->map)))
> + return;
> + if (!htab_is_percpu(htab) && !htab_is_lru(htab))
Is !htab_is_percpu(htab) needed? I think we already checked
if map_value has timer it can only be hash/lruhash/array?
> + /* htab has extra_elems */
> + num_entries += num_possible_cpus();
> +
> + for (i = 0; i < num_entries; i++) {
> + struct htab_elem *elem;
> +
> + elem = get_htab_elem(htab, i);
> + bpf_timer_cancel_and_free(elem->key +
> + round_up(htab->map.key_size, 8) +
> + htab->map.timer_off);
> + cond_resched();
> + }
> +}
> +
[...]
next prev parent reply other threads:[~2021-06-25 19:46 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-24 2:25 [PATCH v3 bpf-next 0/8] bpf: Introduce BPF timers Alexei Starovoitov
2021-06-24 2:25 ` [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers Alexei Starovoitov
2021-06-25 6:25 ` Yonghong Song
2021-06-25 14:57 ` Alexei Starovoitov
2021-06-25 15:54 ` Yonghong Song
2021-06-29 1:39 ` Alexei Starovoitov
2021-06-25 16:54 ` Yonghong Song
2021-06-29 1:46 ` Alexei Starovoitov
2021-06-29 2:24 ` Yonghong Song
2021-06-29 3:32 ` Alexei Starovoitov
2021-06-29 6:34 ` Andrii Nakryiko
2021-06-29 13:28 ` Alexei Starovoitov
2021-06-30 10:08 ` Andrii Nakryiko
2021-06-30 17:38 ` Alexei Starovoitov
2021-07-01 5:40 ` Alexei Starovoitov
2021-07-01 11:51 ` Toke Høiland-Jørgensen
2021-07-01 15:34 ` Alexei Starovoitov
2021-06-24 2:25 ` [PATCH v3 bpf-next 2/8] bpf: Add map side support for " Alexei Starovoitov
2021-06-25 19:46 ` Yonghong Song [this message]
2021-06-29 1:49 ` Alexei Starovoitov
2021-06-24 2:25 ` [PATCH v3 bpf-next 3/8] bpf: Remember BTF of inner maps Alexei Starovoitov
2021-06-29 1:45 ` Yonghong Song
2021-06-24 2:25 ` [PATCH v3 bpf-next 4/8] bpf: Relax verifier recursion check Alexei Starovoitov
2021-06-24 2:25 ` [PATCH v3 bpf-next 5/8] bpf: Implement verifier support for validation of async callbacks Alexei Starovoitov
2021-06-24 2:25 ` [PATCH v3 bpf-next 6/8] bpf: Teach stack depth check about " Alexei Starovoitov
2021-06-24 2:25 ` [PATCH v3 bpf-next 7/8] selftests/bpf: Add bpf_timer test Alexei Starovoitov
2021-06-24 2:25 ` [PATCH v3 bpf-next 8/8] selftests/bpf: Add a test with bpf_timer in inner map Alexei Starovoitov
2021-06-24 11:27 ` [PATCH v3 bpf-next 0/8] bpf: Introduce BPF timers Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2c523078-cb24-e1ca-2439-27be37cfe90b@fb.com \
--to=yhs@fb.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=kernel-team@fb.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.