Re: [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API for multi-progs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stanislav Fomichev <sdf@google.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: ast@kernel.org, andrii@kernel.org, martin.lau@linux.dev,
	 razor@blackwall.org, john.fastabend@gmail.com, kuba@kernel.org,
	dxu@dxuuu.xyz,  joe@cilium.io, toke@kernel.org,
	davem@davemloft.net, bpf@vger.kernel.org,
	 netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API for multi-progs
Date: Mon, 10 Jul 2023 11:18:11 -0700	[thread overview]
Message-ID: <ZKxLY3onuOHepOxt@google.com> (raw)
In-Reply-To: <d67ca0f4-4753-e86f-f8ca-dd515f941ea5@iogearbox.net>

On 07/10, Daniel Borkmann wrote:
> On 7/7/23 11:27 PM, Stanislav Fomichev wrote:
> > On 07/07, Daniel Borkmann wrote:
> [...]
> > > +static inline struct bpf_mprog_entry *
> > > +bpf_mprog_create(const size_t size, const off_t off)
> > > +{
> > > +	struct bpf_mprog_bundle *bundle;
> > > +	void *ptr;
> > > +
> > > +	BUILD_BUG_ON(size < sizeof(*bundle) + off);
> > > +	BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64));
> > > +	BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) !=
> > > +		     ARRAY_SIZE(bundle->cp_items));
> > > +
> > > +	ptr = kzalloc(size, GFP_KERNEL);
> > > +	if (ptr) {
> > > +		bundle = ptr + off;
> > > +		atomic64_set(&bundle->revision, 1);
> > > +		bundle->off = off;
> > > +		bundle->a.parent = bundle;
> > > +		bundle->b.parent = bundle;
> > > +		return &bundle->a;
> > > +	}
> > > +	return NULL;
> > > +}
> > > +
> > > +void bpf_mprog_free_rcu(struct rcu_head *rcu);
> > > +
> > > +static inline void bpf_mprog_free(struct bpf_mprog_entry *entry)
> > > +{
> > > +	struct bpf_mprog_bundle *bundle = entry->parent;
> > > +
> > > +	call_rcu(&bundle->rcu, bpf_mprog_free_rcu);
> > > +}
> > 
> > Any reason we're doing allocation here? Why not do
> > bpf_mprog_init(struct bpf_mprog_bundle *) instead that simply initializes
> > the fields? Then we can move allocation/free part to the caller (tcx) along
> > with rcu_head.
> > Feels like it would be a bit more conventional/readable? bpf_mprog_free{,_rcu}
> > will also become tcx_free{,_rcu}..
> > 
> > I guess current approach works, but it took me awhile to figure it out..
> > (maybe it's just me)
> 
> I found this approach quite useful for tcx case since we only fetch the
> bpf_mprog_entry for tcx_link_prog_attach et al, but I can take a look to
> see if this looks better and if it does I'll include it.
> 
> > > +static inline void bpf_mprog_mark_ref(struct bpf_mprog_entry *entry,
> > > +				      struct bpf_tuple *tuple)
> > > +{
> > > +	WARN_ON_ONCE(entry->parent->ref);
> > > +	if (!tuple->link)
> > > +		entry->parent->ref = tuple->prog;
> > > +}
> > > +
> > > +static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
> > > +{
> > > +	entry->parent->count++;
> > > +}
> > > +
> > > +static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
> > > +{
> > > +	entry->parent->count--;
> > > +}
> > > +
> > > +static inline int bpf_mprog_max(void)
> > > +{
> > > +	return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
> > > +}
> > > +
> > > +static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
> > > +{
> > > +	int total = entry->parent->count;
> > > +
> > > +	WARN_ON_ONCE(total > bpf_mprog_max());
> > > +	return total;
> > > +}
> > > +
> > > +static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry,
> > > +				    struct bpf_prog *prog)
> > > +{
> > > +	const struct bpf_mprog_fp *fp;
> > > +	const struct bpf_prog *tmp;
> > > +
> > > +	bpf_mprog_foreach_prog(entry, fp, tmp) {
> > > +		if (tmp == prog)
> > > +			return true;
> > > +	}
> > > +	return false;
> > > +}
> > > +
> > > +static inline bool bpf_mprog_swap_entries(const int code)
> > > +{
> > > +	return code == BPF_MPROG_SWAP ||
> > > +	       code == BPF_MPROG_FREE;
> > > +}
> > > +
> > > +static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry)
> > > +{
> > > +	atomic64_inc(&entry->parent->revision);
> > > +	synchronize_rcu();
> > 
> > Maybe add a comment on why we need to synchronize_rcu here? In general,
> > I don't think I have a good grasp of that ->ref member.
> 
> Yeap, will add a comment. For the case where we delete the prog, we mark
> it in bpf_mprog_detach, but we can only drop the reference once the user
> swapped the bpf_mprog_entry and ensured that there are no in-flight users
> hence both in bpf_mprog_commit.
> 
> [...]
> > > +static int bpf_mprog_prog(struct bpf_tuple *tuple,
> > > +			  u32 object, u32 flags,
> > > +			  enum bpf_prog_type type)
> > > +{
> > > +	bool id = flags & BPF_F_ID;
> > > +	struct bpf_prog *prog;
> > > +
> > > +	if (id)
> > > +		prog = bpf_prog_by_id(object);
> > > +	else
> > > +		prog = bpf_prog_get(object);
> > > +	if (IS_ERR(prog)) {
> > 
> > [..]
> > 
> > > +		if (!object && !id)
> > > +			return 0;
> > 
> > What's the reason behind this?
> 
> If an fd was passed which is 0 and this was not a program fd, then we don't error
> out and treat it as if no fd was passed.
 
Is this new api an opportunity to fix that fd==0? And always treat it as
valid. Or we have some other constrains elsewhere?

> > > +		return PTR_ERR(prog);
> > > +	}
> > > +	if (type && prog->type != type) {
> > > +		bpf_prog_put(prog);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	tuple->link = NULL;
> > > +	tuple->prog = prog;
> > > +	return 0;
> > > +}
> [...]
> > > +static int bpf_mprog_pos_before(struct bpf_mprog_entry *entry,
> > > +				struct bpf_tuple *tuple)
> > > +{
> > > +	struct bpf_mprog_fp *fp;
> > > +	struct bpf_mprog_cp *cp;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < bpf_mprog_total(entry); i++) {
> > > +		bpf_mprog_read(entry, i, &fp, &cp);
> > > +		if (tuple->prog == READ_ONCE(fp->prog) &&
> > 
> > Both attach/detach happen under rtnl, why do need READ_ONCE? I'm assuming
> > even going forwrad, attach/detach from non-tcx places will happen
> > under lock?
> > 
> > (same for bpf_mprog_pos_before/bpf_mprog_pos_after)
> > 
> > Feels like the only place where we need WRITE_ONCE is the replace (in-place)
> > and READ_ONCE during fast-path. Why do we need the rest?
> 
> Yes, the replace case is via WRITE_ONCE, hence the READ_ONCE annotations. You
> are saying that for the cases where we are under lock we should just drop the
> READ_ONCE annotations? I can do that ofc, I thought the general convention was
> to do the {READ,WRITE}_ONCE consistently for the purpose of documenting fp->prog
> access.

I see, then maybe let's keep them. I was a bit confused because those
READ_ONCE are within a locked section so I wasn't sure whether I'm
missing something or it's working as intended :-)

> > > +		    (!tuple->link || tuple->link == cp->link))
> > > +			return i - 1;
> > > +	}
> > > +	return tuple->prog ? -ENOENT : -1;
> > > +}
> > > +
> > > +static int bpf_mprog_pos_after(struct bpf_mprog_entry *entry,
> > > +			       struct bpf_tuple *tuple)
> > > +{
> > > +	struct bpf_mprog_fp *fp;
> > > +	struct bpf_mprog_cp *cp;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < bpf_mprog_total(entry); i++) {
> > > +		bpf_mprog_read(entry, i, &fp, &cp);
> > > +		if (tuple->prog == READ_ONCE(fp->prog) &&
> > > +		    (!tuple->link || tuple->link == cp->link))
> > > +			return i + 1;
> > > +	}
> > > +	return tuple->prog ? -ENOENT : bpf_mprog_total(entry);
> > > +}
> > > +
> > > +int bpf_mprog_attach(struct bpf_mprog_entry *entry, struct bpf_prog *prog_new,
> > > +		     struct bpf_link *link, struct bpf_prog *prog_old,
> > > +		     u32 flags, u32 object, u64 revision)
> > > +{
> > > +	struct bpf_tuple rtuple, ntuple = {
> > > +		.prog = prog_new,
> > > +		.link = link,
> > > +	}, otuple = {
> > > +		.prog = prog_old,
> > > +		.link = link,
> > > +	};
> > > +	int ret, idx = -2, tidx;
> > > +
> > > +	if (revision && revision != bpf_mprog_revision(entry))
> > > +		return -ESTALE;
> > > +	if (bpf_mprog_exists(entry, prog_new))
> > > +		return -EEXIST;
> > > +	ret = bpf_mprog_tuple_relative(&rtuple, object,
> > > +				       flags & ~BPF_F_REPLACE,
> > > +				       prog_new->type);
> > > +	if (ret)
> > > +		return ret;
> > > +	if (flags & BPF_F_REPLACE) {
> > > +		tidx = bpf_mprog_pos_exact(entry, &otuple);
> > > +		if (tidx < 0) {
> > > +			ret = tidx;
> > > +			goto out;
> > > +		}
> > > +		idx = tidx;
> > > +	}
> > 
> > [..]
> > 
> > > +	if (flags & BPF_F_BEFORE) {
> > > +		tidx = bpf_mprog_pos_before(entry, &rtuple);
> > > +		if (tidx < -1 || (idx >= -1 && tidx != idx)) {
> > > +			ret = tidx < -1 ? tidx : -EDOM;
> > > +			goto out;
> > > +		}
> > > +		idx = tidx;
> > > +	}
> > > +	if (flags & BPF_F_AFTER) {
> > > +		tidx = bpf_mprog_pos_after(entry, &rtuple);
> > > +		if (tidx < 0 || (idx >= -1 && tidx != idx)) {
> > > +			ret = tidx < 0 ? tidx : -EDOM;
> > > +			goto out;
> > > +		}
> > > +		idx = tidx;
> > > +	}
> > 
> > There still seems to be some inter-dependency between F_BEFORE and F_AFTER?
> > IOW, looks like I can pass F_BEFORE|F_AFTER|F_REPLACE. Do we need that?
> > Why not exclusive cases?
> 
> I reworked this as per Andrii's suggestion/preference from v2 [0], iow, to calculate
> target index and bail out if the request cannot be resolved into a common index.
> 
>   [0] https://lore.kernel.org/bpf/CAEf4BzbsUMnP7WMm3OmJznvD2b03B1qASFRNiDoVAU6XvvTZNA@mail.gmail.com/

SG! Let's maybe put a summary in the header of what the expectation is when
combining them?

next prev parent reply	other threads:[~2023-07-10 18:18 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-07 17:24 [PATCH bpf-next v3 0/8] BPF link support for tc BPF programs Daniel Borkmann
2023-07-07 17:24 ` [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
2023-07-07 21:27   ` Stanislav Fomichev
2023-07-10  7:42     ` Daniel Borkmann
2023-07-10 14:26       ` Daniel Borkmann
2023-07-10 18:18       ` Stanislav Fomichev [this message]
2023-07-10 18:26         ` Alexei Starovoitov
2023-07-10 19:00           ` Stanislav Fomichev
2023-07-10 20:16             ` Alexei Starovoitov
2023-07-10 21:13               ` Stanislav Fomichev
2023-07-10 22:38                 ` Alexei Starovoitov
2023-07-10 22:46                   ` Stanislav Fomichev
2023-07-10 18:29         ` Daniel Borkmann
2023-07-09 17:17   ` Alexei Starovoitov
2023-07-10  7:10     ` Daniel Borkmann
2023-07-10 13:15       ` Daniel Borkmann
2023-07-07 17:24 ` [PATCH bpf-next v3 2/8] bpf: Add fd-based tcx multi-prog infra with link support Daniel Borkmann
2023-07-09 17:19   ` Alexei Starovoitov
2023-07-10  6:57     ` Daniel Borkmann
2023-07-07 17:24 ` [PATCH bpf-next v3 3/8] libbpf: Add opts-based attach/detach/query API for tcx Daniel Borkmann
2023-07-07 17:24 ` [PATCH bpf-next v3 4/8] libbpf: Add link-based " Daniel Borkmann
2023-07-07 17:24 ` [PATCH bpf-next v3 5/8] libbpf: Add helper macro to clear opts structs Daniel Borkmann
2023-07-07 17:24 ` [PATCH bpf-next v3 6/8] bpftool: Extend net dump with tcx progs Daniel Borkmann
2023-07-07 21:31   ` Stanislav Fomichev
2023-07-10  6:53     ` Daniel Borkmann
2023-07-07 17:24 ` [PATCH bpf-next v3 7/8] selftests/bpf: Add mprog API tests for BPF tcx opts Daniel Borkmann
2023-07-07 17:24 ` [PATCH bpf-next v3 8/8] selftests/bpf: Add mprog API tests for BPF tcx links Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZKxLY3onuOHepOxt@google.com \
    --to=sdf@google.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dxu@dxuuu.xyz \
    --cc=joe@cilium.io \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=razor@blackwall.org \
    --cc=toke@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.