Re: [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers.

From: Yonghong Song <yhs@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>, <davem@davemloft.net>
Cc: <daniel@iogearbox.net>, <andrii@kernel.org>,
	<netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
	<kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers.
Date: Fri, 25 Jun 2021 09:54:11 -0700
Message-ID: <fd30895e-475f-c78a-d367-2abdf835c9ef@fb.com>
In-Reply-To: <20210624022518.57875-2-alexei.starovoitov@gmail.com>



On 6/23/21 7:25 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded
> in hash/array/lru maps as a regular field and helpers to operate on it:
> 
> // Initialize the timer.
> // First 4 bits of 'flags' specify clockid.
> // Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed.
> long bpf_timer_init(struct bpf_timer *timer, int flags);
> 
> // Arm the timer to call callback_fn static function and set its
> // expiration 'nsec' nanoseconds from the current time.
> long bpf_timer_start(struct bpf_timer *timer, void *callback_fn, u64 nsec);
> 
> // Cancel the timer and wait for callback_fn to finish if it was running.
> long bpf_timer_cancel(struct bpf_timer *timer);
> 
> Here is how a BPF program might look:
> struct map_elem {
>      int counter;
>      struct bpf_timer timer;
> };
> 
> struct {
>      __uint(type, BPF_MAP_TYPE_HASH);
>      __uint(max_entries, 1000);
>      __type(key, int);
>      __type(value, struct map_elem);
> } hmap SEC(".maps");
> 
> static int timer_cb(void *map, int *key, struct map_elem *val);
> /* val points to particular map element that contains bpf_timer. */
> 
> SEC("fentry/bpf_fentry_test1")
> int BPF_PROG(test1, int a)
> {
>      struct map_elem *val;
>      int key = 0;
> 
>      val = bpf_map_lookup_elem(&hmap, &key);
>      if (val) {
>          bpf_timer_init(&val->timer, CLOCK_REALTIME);
>          bpf_timer_start(&val->timer, timer_cb, 1000 /* call timer_cb in 1 usec */);
>      }
>      return 0;
> }
> 
> This patch adds helper implementations that rely on hrtimers
> to call bpf functions as timers expire.
> The following patches add necessary safety checks.
> 
> Only programs with CAP_BPF are allowed to use bpf_timer.
> 
> The number of timers used by the program is constrained by
> the memcg recorded at map creation time.
> 
> The bpf_timer_init() helper receives a hidden 'map' argument and
> bpf_timer_start() receives a hidden 'prog' argument supplied by the verifier.
> The prog pointer is needed to do refcnting of the bpf program to make sure that
> the program doesn't get freed while the timer is armed. This approach relies on
> the "user refcnt" scheme used in prog_array that stores bpf programs for
> bpf_tail_call. bpf_timer_start() will increment the prog refcnt, which is
> paired with bpf_timer_cancel() that will drop the prog refcnt. The
> ops->map_release_uref is responsible for cancelling the timers and dropping
> the prog refcnt when the user space reference to a map reaches zero.
> This uref approach is done to make sure that Ctrl-C of a user space process will
> not leave timers running forever unless user space explicitly pinned a map
> that contained timers in bpffs.
> 
> The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel
> and free the timer if the given map element had one allocated.
> "bpftool map update" command can be used to cancel timers.
> 
> The 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because
> '__u64 :64' is 8 bytes of padding with only 1 byte alignment.
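
To make the layout point concrete, here is a minimal user-space sketch.
It is not the UAPI header: the names bpf_timer_sketch, map_elem_sketch
and timer_offset() are made up for illustration, uint64_t stands in for
__u64, and it assumes a GCC or Clang style compiler that accepts a
struct made only of unnamed bitfields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space copy of the layout; uint64_t stands in for __u64. */
struct bpf_timer_sketch {
	uint64_t :64;
	uint64_t :64;
} __attribute__((aligned(8)));

/* With the attribute, the opaque struct is 16 bytes and 8-byte aligned. */
_Static_assert(sizeof(struct bpf_timer_sketch) == 16,
	       "two 64-bit padding fields");
_Static_assert(_Alignof(struct bpf_timer_sketch) == 8,
	       "alignment forced by the attribute");

/* A map value embedding the timer, mirroring 'struct map_elem' above. */
struct map_elem_sketch {
	int counter;
	struct bpf_timer_sketch timer;
};

/* Offset of the timer inside the value; the attribute keeps it a multiple of 8. */
size_t timer_offset(void)
{
	return offsetof(struct map_elem_sketch, timer);
}
```

With the attribute in place the timer lands on an 8-byte boundary inside
the map value, after the int counter and its padding.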
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>   include/linux/bpf.h            |   3 +
>   include/uapi/linux/bpf.h       |  55 +++++++
>   kernel/bpf/helpers.c           | 281 +++++++++++++++++++++++++++++++++
>   kernel/bpf/verifier.c          | 138 ++++++++++++++++
>   kernel/trace/bpf_trace.c       |   2 +-
>   scripts/bpf_doc.py             |   2 +
>   tools/include/uapi/linux/bpf.h |  55 +++++++
>   7 files changed, 535 insertions(+), 1 deletion(-)
> 
[...]
> @@ -12533,6 +12607,70 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>   			continue;
>   		}
>   
> +		if (insn->imm == BPF_FUNC_timer_init) {
> +			aux = &env->insn_aux_data[i + delta];
> +			if (bpf_map_ptr_poisoned(aux)) {
> +				verbose(env, "bpf_timer_init abusing map_ptr\n");
> +				return -EINVAL;
> +			}
> +			map_ptr = BPF_MAP_PTR(aux->map_ptr_state);
> +			{
> +				struct bpf_insn ld_addrs[2] = {
> +					BPF_LD_IMM64(BPF_REG_3, (long)map_ptr),
> +				};
> +
> +				insn_buf[0] = ld_addrs[0];
> +				insn_buf[1] = ld_addrs[1];
> +			}
> +			insn_buf[2] = *insn;
> +			cnt = 3;
> +
> +			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> +			if (!new_prog)
> +				return -ENOMEM;
> +
> +			delta    += cnt - 1;
> +			env->prog = prog = new_prog;
> +			insn      = new_prog->insnsi + i + delta;
> +			goto patch_call_imm;
> +		}
> +
> +		if (insn->imm == BPF_FUNC_timer_start) {
> +			/* There is no need to do:
> +			 *     aux = &env->insn_aux_data[i + delta];
> +			 *     if (bpf_map_ptr_poisoned(aux)) return -EINVAL;
> +			 * for bpf_timer_start(). If the same callback_fn is shared
> +			 * by different timers in different maps the poisoned check
> +			 * will return a false positive.
> +			 *
> +			 * The verifier will process callback_fn as many times as necessary
> +			 * with different maps and the register states prepared by
> +			 * set_timer_start_callback_state will be accurate.
> +			 *
> +			 * There is no need for bpf_timer_start() to check at
> +			 * run time that bpf_hrtimer->map stored during bpf_timer_init()
> +			 * is the same map as in bpf_timer_start()
> +			 * because it's the same map element value.

I am puzzled by the comments above. Could you explain more?
bpf_timer_start() checks whether the timer is initialized via the
timer->timer NULL check, and proceeds only if a valid timer has been
initialized. I think the following scenario is also supported:
   1. map1 is shared by prog1 and prog2
   2. prog1 calls bpf_timer_init for all map1 elements
   3. prog2 calls bpf_timer_start for some or all map1 elements.
So during prog2 verification, no bpf_timer_init() call is ever seen.
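
To spell out the scenario, here is a rough user-space model. It is only
a sketch and not the kernel code: every model_* name, prog1_init_all()
and prog2_start() are hypothetical, and the error codes are my
assumption about what the helpers would return. The only thing taken
from the patch is the idea that start proceeds solely when the
timer->timer pointer is non-NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NELEMS 4

/* Hypothetical stand-ins; none of these are kernel definitions. */
struct model_hrtimer {
	void *map;	/* map recorded at init time */
	int armed;	/* set once the timer has been started */
};

struct model_timer {
	struct model_hrtimer *timer;	/* NULL until initialized */
};

struct model_map {
	struct model_timer elems[MODEL_NELEMS];
};

/* Models bpf_timer_init(): allocate state and remember the owning map. */
int model_timer_init(struct model_timer *t, struct model_map *map)
{
	if (t->timer)
		return -EBUSY;	/* assumed: double init is rejected */
	t->timer = calloc(1, sizeof(*t->timer));
	if (!t->timer)
		return -ENOMEM;
	t->timer->map = map;
	return 0;
}

/* Models bpf_timer_start(): proceed only if the element was initialized. */
int model_timer_start(struct model_timer *t)
{
	if (!t->timer)
		return -EINVAL;	/* the timer->timer NULL check */
	t->timer->armed = 1;
	return 0;
}

/* "prog1": initializes the timer in every element of the shared map. */
int prog1_init_all(struct model_map *map)
{
	for (size_t i = 0; i < MODEL_NELEMS; i++) {
		int err = model_timer_init(&map->elems[i], map);

		if (err)
			return err;
	}
	return 0;
}

/* "prog2": never calls init, it only starts a timer in the shared map. */
int prog2_start(struct model_map *map, size_t idx)
{
	return model_timer_start(&map->elems[idx]);
}
```

In this model prog2_start() fails with -EINVAL until prog1_init_all()
has run on the same map, and succeeds afterwards even though "prog2"
itself contains no init call.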

> +			 */
> +			struct bpf_insn ld_addrs[2] = {
> +				BPF_LD_IMM64(BPF_REG_4, (long)prog),
> +			};
> +
> +			insn_buf[0] = ld_addrs[0];
> +			insn_buf[1] = ld_addrs[1];
> +			insn_buf[2] = *insn;
> +			cnt = 3;
> +
> +			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> +			if (!new_prog)
> +				return -ENOMEM;
> +
> +			delta    += cnt - 1;
> +			env->prog = prog = new_prog;
> +			insn      = new_prog->insnsi + i + delta;
> +			goto patch_call_imm;
> +		}
> +
>   		/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
>   		 * and other inlining handlers are currently limited to 64 bit
>   		 * only.
[...]
