From: Andrey Ignatov <rdna@fb.com>
To: Martin Lau <kafai@fb.com>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"ast@kernel.org" <ast@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 3/5] bpf: Support replacing cgroup-bpf program in MULTI mode
Date: Thu, 12 Dec 2019 18:46:31 +0000 [thread overview]
Message-ID: <20191212184630.GA65217@rdna-mbp> (raw)
In-Reply-To: <20191212181756.ny22ca756bmzrlmm@kafai-mbp>
Martin Lau <kafai@fb.com> [Thu, 2019-12-12 10:18 -0800]:
> On Tue, Dec 10, 2019 at 06:33:29PM -0800, Andrey Ignatov wrote:
> > The common use-case in production is to have multiple cgroup-bpf
> > programs per attach type that cover multiple use-cases. Such programs
> > are attached with BPF_F_ALLOW_MULTI and can be maintained by different
> > people.
> >
> > Order of programs usually matters, for example imagine two egress
> > programs: the first one drops packets and the second one counts packets.
> > If they're swapped the result of counting program will be different.
> >
> > It brings operational challenges with updating cgroup-bpf program(s)
> > attached with BPF_F_ALLOW_MULTI since there is no way to replace a
> > program:
> >
> > * One way to update is to detach all programs first and then attach the
> > new version(s) again in the right order. This introduces an
> > interruption in the work a program is doing and may not be acceptable
> > (e.g. if it's egress firewall);
> >
> > * Another way is attach the new version of a program first and only then
> > detach the old version. This introduces the time interval when two
> > versions of same program are working, what may not be acceptable if a
> > program is not idempotent. It also imposes additional burden on
> > program developers to make sure that two versions of their program can
> > co-exist.
> >
> > Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH
> > command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI
> > flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag
> > and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to
> > replace
> >
> > That way user can replace any program among those attached with
> > BPF_F_ALLOW_MULTI flag without the problems described above.
> >
> > Details of the new API:
> >
> > * If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid
> > descriptor of BPF program, BPF_PROG_ATTACH will return corresponding
> > error (EINVAL or EBADF).
> >
> > * If replace_bpf_fd has valid descriptor of BPF program but such a
> > program is not attached to specified cgroup, BPF_PROG_ATTACH will
> > return ENOENT.
> >
> > BPF_F_REPLACE is introduced to make the user intend clear, since
> > replace_bpf_fd alone can't be used for this (its default value, 0, is a
> > valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
> > future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).
> Thanks for the details explanation.
>
> [ ... ]
>
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index 283efe3ce052..45346c79613a 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -282,14 +282,17 @@ static int update_effective_progs(struct cgroup *cgrp,
> > * propagate the change to descendants
> > * @cgrp: The cgroup which descendants to traverse
> > * @prog: A program to attach
> > + * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
> > * @type: Type of attach operation
> > * @flags: Option flags
> > *
> > * Must be called with cgroup_mutex held.
> > */
> > int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> > + struct bpf_prog *replace_prog,
> > enum bpf_attach_type type, u32 flags)
> > {
> > + u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
> > struct list_head *progs = &cgrp->bpf.progs[type];
> > struct bpf_prog *old_prog = NULL;
> > struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
> > @@ -298,14 +301,15 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> > enum bpf_cgroup_storage_type stype;
> > int err;
> >
> > - if ((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI))
> > + if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
> > + ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
> > /* invalid combination */
> > return -EINVAL;
> >
> > if (!hierarchy_allows_attach(cgrp, type))
> > return -EPERM;
> >
> > - if (!list_empty(progs) && cgrp->bpf.flags[type] != flags)
> > + if (!list_empty(progs) && cgrp->bpf.flags[type] != saved_flags)
> > /* Disallow attaching non-overridable on top
> > * of existing overridable in this cgroup.
> > * Disallow attaching multi-prog if overridable or none
> > @@ -320,7 +324,12 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> > if (pl->prog == prog)
> > /* disallow attaching the same prog twice */
> > return -EINVAL;
> > + if (pl->prog == replace_prog)
> > + replace_pl = pl;
> > }
> > + if ((flags & BPF_F_REPLACE) && !replace_pl)
> > + /* prog to replace not found for cgroup */
> > + return -ENOENT;
> > } else if (!list_empty(progs)) {
> > replace_pl = list_first_entry(progs, typeof(*pl), node);
> > }
> > @@ -356,7 +365,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> > for_each_cgroup_storage_type(stype)
> > pl->storage[stype] = storage[stype];
> >
> > - cgrp->bpf.flags[type] = flags;
> > + cgrp->bpf.flags[type] = saved_flags;
> >
> > err = update_effective_progs(cgrp, type);
> > if (err)
> > @@ -522,6 +531,7 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
> > int cgroup_bpf_prog_attach(const union bpf_attr *attr,
> > enum bpf_prog_type ptype, struct bpf_prog *prog)
> > {
> > + struct bpf_prog *replace_prog = NULL;
> > struct cgroup *cgrp;
> > int ret;
> >
> > @@ -529,8 +539,20 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
> > if (IS_ERR(cgrp))
> > return PTR_ERR(cgrp);
> >
> > - ret = cgroup_bpf_attach(cgrp, prog, attr->attach_type,
> > + if ((attr->attach_flags & BPF_F_ALLOW_MULTI) &&
> > + (attr->attach_flags & BPF_F_REPLACE)) {
> The patch looks good. One optional nit for consideration,
>
> Since it is testing BPF_F_REPLACE here already,
> how about moving the
> "((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI))"
> test from __cgroup_bpf_attach() to this function here?
> Clear the BPF_F_REPLACE bit before passing to cgroup_bpf_attach().
>
> Then the "saved_flags" logic in cgroup_bpf_attach() can go away.
> cgroup_bpf_attach() can work on the "replace_prog" alone.
>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
Thank you for review Martin!
I considered doing exactly this and a few other options to split the
logic between __cgroup_bpf_attach() and cgroup_bpf_prog_attach() since
it's not super clear what belongs where, but decided to go with the
current approach.
A couple of reasons I split it this way:
1)
To keep the whole logic and decisions in __cgroup_bpf_attach() and use
cgroup_bpf_prog_attach() only to acquire cgroup-bpf specific resources
that correspond to the user input, such as cgroup and program to
replace. Unfortunately to acquire replace_prog I still need to check
flags to avoid unnecessary work for the most common case when
BPF_F_REPLACE is not set, but IMO it's better to keep the logic to
verify flags combinations in one place, __cgroup_bpf_attach().
2)
Also I think saved_flags would be introduced sooner or later anyway if
new flags are added since as it can be seen there is a clear separation
between flags that control programs arrangement, like OVERRIDE and
MULTI, and should be remembered for the whole life time of the program,
and one-time-needed flags such as REPLACE that are needed only once to
attach program and don't make sense in its future life time.
> > + replace_prog = bpf_prog_get_type(attr->replace_bpf_fd, ptype);
> > + if (IS_ERR(replace_prog)) {
> > + cgroup_put(cgrp);
> > + return PTR_ERR(replace_prog);
> > + }
> > + }
> > +
> > + ret = cgroup_bpf_attach(cgrp, prog, replace_prog, attr->attach_type,
> > attr->attach_flags);
> > +
> > + if (replace_prog)
> > + bpf_prog_put(replace_prog);
> > cgroup_put(cgrp);
> > return ret;
> > }
--
Andrey Ignatov
next prev parent reply other threads:[~2019-12-12 18:46 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-11 2:33 [PATCH bpf-next 0/5] bpf: Support replacing cgroup-bpf program in MULTI mode Andrey Ignatov
2019-12-11 2:33 ` [PATCH bpf-next 1/5] bpf: Simplify __cgroup_bpf_attach Andrey Ignatov
2019-12-12 17:57 ` Martin Lau
2019-12-11 2:33 ` [PATCH bpf-next 2/5] bpf: Remove unused new_flags in hierarchy_allows_attach() Andrey Ignatov
2019-12-12 17:57 ` [Potential Spoof] " Martin Lau
2019-12-11 2:33 ` [PATCH bpf-next 3/5] bpf: Support replacing cgroup-bpf program in MULTI mode Andrey Ignatov
2019-12-12 18:18 ` [Potential Spoof] " Martin Lau
2019-12-12 18:46 ` Andrey Ignatov [this message]
2019-12-11 2:33 ` [PATCH bpf-next 4/5] libbpf: Introduce bpf_prog_attach_xattr Andrey Ignatov
2019-12-12 8:08 ` Andrii Nakryiko
2019-12-11 2:33 ` [PATCH bpf-next 5/5] selftests/bpf: Cover BPF_F_REPLACE in test_cgroup_attach Andrey Ignatov
2019-12-12 18:26 ` Martin Lau
2019-12-12 18:51 ` Andrey Ignatov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191212184630.GA65217@rdna-mbp \
--to=rdna@fb.com \
--cc=Kernel-team@fb.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kafai@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox