Re: [PATCH RFC bpf-next 0/3] Execution context callbacks

BPF List
 help / color / mirror / Atom feed

From: Delyan Kratunov <delyank@fb.com>
To: "alexei.starovoitov@gmail.com" <alexei.starovoitov@gmail.com>
Cc: "daniel@iogearbox.net" <daniel@iogearbox.net>,
	"sdf@google.com" <sdf@google.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH RFC bpf-next 0/3] Execution context callbacks
Date: Fri, 15 Jul 2022 18:28:20 +0000	[thread overview]
Message-ID: <8ee9f9d1a5218ab23655d3f0d754aa5634a71d89.camel@fb.com> (raw)
In-Reply-To: <20220715015100.p7fwr7dbjyfbjjad@MacBook-Pro-3.local>

On Thu, 2022-07-14 at 18:51 -0700, Alexei Starovoitov wrote:
> On Tue, Jul 12, 2022 at 06:42:52PM +0000, Delyan Kratunov wrote:
> > 
> > > but have you though of maybe initially supporting something like:
> > > 
> > > bpf_timer_init(&timer, map, SOME_NEW_DEFERRED_NMI_ONLY_FLAG);
> > > bpf_timer_set_callback(&timer, cg);
> > > bpf_timer_start(&timer, 0, 0);
> > > 
> > > If you init a timer with that special flag, I'm assuming you can have
> > > special cases in the existing helpers to simulate the delayed work?
> > 
> > Potentially but I have some reservations about drawing this equivalence.
> 
> hrtimer api has various: flags. soft vs hard irq, pinned and not.
> So the suggestion to treat irq_work callback as special timer flag
> actually fits well.
> 
> bpf_timer_init + set_callback + start can be a static inline function
> named bpf_work_submit() in bpf_helpers.h
> (or some new file that will mark the beginning libc-bpf library).
> Reusing struct bpf_timer and adding zero-delay callback could probably be
> easier for users to learn and consume.

To clarify, we're talking about 1) making bpf_timer nmi-safe for _some_ but not all
combinations of parameters and 2) adding new flags to specify an execution context?
It's achievable but it's hard to see how it's the superior solution here.

> 
> Separately:
> +struct bpf_delayed_work {
> +       __u64 :64;
> +       __u64 :64;
> +       __u64 :64;
> +       __u64 :64;
> +       __u64 :64;
> +} __attribute__((aligned(8)));
> is not extensible.
> It would be better to add indirection to allow kernel side to grow
> independently from amount of space consumed in a map value.

Fair point, I was wondering what to do with it - storing just a pointer sounds
reasonable.

> Can you think of a way to make irq_work/sleepable callback independent of maps?
> Assume bpf_mem_alloc is already available and NMI prog can allocate a typed object.
> The usage could be:
> struct my_work {
>   int a;
>   struct task_struct __kptr_ref *t;
> };
> void my_cb(struct my_work *w);
> 
> struct my_work *w = bpf_mem_alloc(allocator, bpf_core_type_id_local(*w));
> w->t = ..;
> bpf_submit_work(w, my_cb, SLEEPABLE | IRQ_WORK);
> 
> Am I day dreaming? :)

Nothing wrong with dreaming of a better future :) 

(I'm assuming you're thinking of bpf_mem_alloc being fronted by the allocator you
recently sent to the list.)

On a first pass, here are my concerns:

A program and its maps can guarantee a certain amount of storage for work items.
Sizing that storage is difficult but it is yours alone to use. The freelist allocator
can be transiently drained by other programs and starve you of this utility. This is
a new failure mode, so it's worth talking about.

With a generic allocator mechanism, we'll have a hard time enforcing the can't-load-
or-store-into-special-fields logic. I like that guardrail and I'm not sure how we'd
achieve the same guarantees. (In your snippet, we don't have the llist_node on the
work item - do we wrap my_work into something else internally? That would hide the
fields that need protecting at the expense of an extra bpf_mem_alloc allocation.)

Managing the storage returned from bpf_mem_alloc is of course also a concern. We'd
need to treat bpf_submit_work as "releasing" it (really, taking ownership). This path
means more lifecycle analysis in the verifier and explicit and implicit free()s.

I'm not opposed to it overall - the developer experience is very familiar - but I am
primarily worried that allocator failures will be in the same category of issues as
the hash map collisions for stacks. If you want reliability, you just don't use that
type of map - what's the alternative in this hypothetical bpf_mem_alloc future?

-- Delyan

next prev parent reply	other threads:[~2022-07-15 18:28 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-11 21:48 [PATCH RFC bpf-next 0/3] Execution context callbacks Delyan Kratunov
2022-07-11 21:48 ` [PATCH RFC bpf-next 2/3] bpf: add delayed_work mechanism Delyan Kratunov
2022-07-11 21:48 ` [PATCH RFC bpf-next 1/3] bpf: allow maps to hold bpf_delayed_work fields Delyan Kratunov
2022-07-14  4:23   ` Andrii Nakryiko
2022-07-11 21:48 ` [PATCH RFC bpf-next 3/3] selftests: delayed_work tests Delyan Kratunov
2022-07-12 18:07 ` [PATCH RFC bpf-next 0/3] Execution context callbacks sdf
2022-07-12 18:42   ` Delyan Kratunov
2022-07-12 22:51     ` sdf
2022-07-15  1:51     ` Alexei Starovoitov
2022-07-15 18:28       ` Delyan Kratunov [this message]
2022-07-19 19:02         ` Alexei Starovoitov
2022-07-19 22:12           ` Delyan Kratunov
2022-07-20  0:54             ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8ee9f9d1a5218ab23655d3f0d754aa5634a71d89.camel@fb.com \
    --to=delyank@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox