From: David Vernet <void@manifault.com>
To: Yonghong Song <yhs@meta.com>
Cc: Yonghong Song <yhs@fb.com>,
bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, KP Singh <kpsingh@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Date: Sun, 23 Oct 2022 16:14:53 -0500
Message-ID: <Y1WuzQxNExrOX8Xv@maniforge.dhcp.thefacebook.com>
In-Reply-To: <95ff1fa3-124b-6886-64e0-adcf40085e55@meta.com>

On Sun, Oct 23, 2022 at 09:45:35AM -0700, Yonghong Song wrote:
> > > > > > > + * could be modifying the local_storage->list now.
> > > > > > > + * Thus, no elem can be added-to or deleted-from the
> > > > > > > + * local_storage->list by the bpf_prog or by the bpf-map's syscall.
> > > > > > > + *
> > > > > > > + * It is racing with bpf_local_storage_map_free() alone
> > > > > > > + * when unlinking elem from the local_storage->list and
> > > > > > > + * the map's bucket->list.
> > > > > > > + */
> > > > > > > +	bpf_cgrp_storage_lock();
> > > > > > > +	raw_spin_lock_irqsave(&local_storage->lock, flags);
> > > > > > > +	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
> > > > > > > +		bpf_selem_unlink_map(selem);
> > > > > > > +		free_cgroup_storage =
> > > > > > > +			bpf_selem_unlink_storage_nolock(local_storage, selem, false, false);
> > > > > >
> > > > > > This still requires a comment explaining why it's OK to overwrite
> > > > > > free_cgroup_storage with a previous value from calling
> > > > > > bpf_selem_unlink_storage_nolock(). Even if that is safe, this looks like
> > > > > > a pretty weird programming pattern, and IMO doing this feels more
> > > > > > intentional and future-proof:
> > > > > >
> > > > > > > if (bpf_selem_unlink_storage_nolock(local_storage, selem, false, false))
> > > > > > > 	free_cgroup_storage = true;
> > > > >
> > > > > We have a comment a few lines below.
> > > > > > /* free_cgroup_storage should always be true as long as
> > > > > >  * local_storage->list was non-empty.
> > > > > >  */
> > > > > > if (free_cgroup_storage)
> > > > > > 	kfree_rcu(local_storage, rcu);
> > > >
> > > > IMO that comment doesn't provide much useful information -- it states an
> > > > assumption, but doesn't give a reason for it.
> > > >
> > > > > I will add more explanation in the above code like
> > > > >
> > > > > > bpf_selem_unlink_map(selem);
> > > > > > /* If local_storage list only have one element, the
> > > > > >  * bpf_selem_unlink_storage_nolock() will return true.
> > > > > >  * Otherwise, it will return false. The current loop iteration
> > > > > >  * intends to remove all local storage. So the last iteration
> > > > > >  * of the loop will set the free_cgroup_storage to true.
> > > > > >  */
> > > > > > free_cgroup_storage =
> > > > > > 	bpf_selem_unlink_storage_nolock(local_storage, selem, false, false);
> > > >
> > > > Thanks, this is the type of comment I was looking for.
> > > >
> > > > Also, I realize this was copy-pasted from a number of other possible
> > > > locations in the codebase which are doing the same thing, but I still
> > > > think this pattern is an odd and brittle way to do this. We're relying
> > > > on an abstracted implementation detail of
> > > > bpf_selem_unlink_storage_nolock() for correctness, which IMO is a signal
> > > > that bpf_selem_unlink_storage_nolock() should probably be the one
> > > > invoking kfree_rcu() on behalf of callers in the first place. It looks
> > > > like all of the callers end up calling kfree_rcu() on the struct
> > > > bpf_local_storage * if bpf_selem_unlink_storage_nolock() returns true,
> > > > so can we just move the responsibility of freeing the local storage
> > > > object down into bpf_selem_unlink_storage_nolock() where it's unlinked?
> > >
> > > We probably cannot do this. bpf_selem_unlink_storage_nolock()
> > > is inside the rcu_read_lock() region. We do kfree_rcu() outside
> > > the rcu_read_lock() region.
> >
> > kfree_rcu() is non-blocking and is safe to invoke from within an RCU
> > read region. If you invoke it within an RCU read region, the object will
> > not be kfree'd until (at least) you exit the current read region, so I
> > believe that the net effect here should be the same whether it's done in
> > bpf_selem_unlink_storage_nolock(), or in the caller after the RCU read
> > region is exited.
>
> Okay. we probably still want to do kfree_rcu outside
> bpf_selem_unlink_storage_nolock() as the function is to unlink storage
> for a particular selem.

Meaning, it's for unlinking a specific element rather than the whole
list, so it's not the right place to free the larger struct
bpf_local_storage * container? If that's your point (and please clarify
if it's not and I'm misunderstanding) then I agree that's true, but
unfortunately whether the API likes it or not, it's tied itself to the
lifetime of the larger struct bpf_local_storage * by returning a bool
that says whether the caller needs to free that local storage pointer.
AFAICT, with the current API / implementation, if the caller drops this
value on the floor, the struct bpf_local_storage * is leaked, which
means that it's a leaky API.

That being said, I think I agree with you that just moving kfree_rcu()
into bpf_selem_unlink_storage_nolock() may not be appropriate, but
overall it feels like this pattern / API has room for improvement.
The fact that the (now) only three callers of this function have
copy-pasted code that's doing the exact same thing to free the local
storage object is in my opinion a testament to that.
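
To make the concern a bit more concrete, here's a rough userspace sketch
of the contract as I understand it. This is not kernel code: every toy_*
name below is made up for illustration, and the struct only counts
elements instead of holding a real list. It models an unlink function
whose bool return is the only signal that the container must be freed,
plus a hypothetical wrapper that consumes that bool so callers can't
drop it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bpf_local_storage; it only counts
 * how many elems are still linked.
 */
struct toy_storage {
	int nr_elems;
};

/* Number of toy_storage objects that have been allocated but not freed. */
static int toy_live_storages;

static struct toy_storage *toy_storage_alloc(int nr_elems)
{
	struct toy_storage *storage = malloc(sizeof(*storage));

	assert(storage);
	storage->nr_elems = nr_elems;
	toy_live_storages++;
	return storage;
}

/* Models the contract under discussion: unlink one elem and return true
 * only if it was the last one, meaning the caller now owns the job of
 * freeing the storage. Ignoring the return value leaks the storage.
 */
static bool toy_unlink_nolock(struct toy_storage *storage)
{
	if (storage->nr_elems == 0)
		return false;
	storage->nr_elems--;
	return storage->nr_elems == 0;
}

/* Hypothetical wrapper: drains every elem and frees the storage itself,
 * so the "please free me" bool never escapes to the caller.
 */
static void toy_storage_destroy(struct toy_storage *storage)
{
	bool free_storage = false;

	while (storage->nr_elems > 0) {
		/* Sticky form: a later false can never clear an earlier true. */
		if (toy_unlink_nolock(storage))
			free_storage = true;
	}

	/* Mirrors the existing callers: only free if the last unlink said so. */
	if (free_storage) {
		free(storage);
		toy_live_storages--;
	}
}
```

With something along these lines, a caller that unlinks the last elem
via toy_unlink_nolock() and discards the result leaves toy_live_storages
at 1, whereas going through toy_storage_destroy() brings it back to 0.
Whether a wrapper like this is the right shape for the real
bpf_local_storage code is a separate question; in the kernel,
annotating the function with __must_check might be another way to at
least make a dropped return value visible at build time.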

Anyways, none of that needs to block this patch set. I acked this in
your latest version, but I think this should be cleaned up by someone in
the near future; certainly before we add another local storage variant.

> We could move
> if (free_cgroup_storage)
> 	kfree_rcu(local_storage, rcu);
> immediately after hlist_for_each_entry_safe() loop.
> But I think putting that 'if' statement after rcu_read_unlock() is
> slightly better as it will not increase the code inside the lock region.

Yeah, if it's not abstracted by the bpf_local_storage APIs, it might as
well just be freed outside of the critical section.
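
For the earlier point about kfree_rcu() being fine inside a read region,
here's a deliberately simplified userspace model of "the free is
deferred until the reader is gone". It is only a sketch: the toy_* names
are invented, and real RCU tracks grace periods in a completely
different (and much more involved) way. In particular, the real
kfree_rcu() waits for a grace period even when no reader is active,
while this toy frees right away in that case.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_PENDING 8

/* Toy state: active readers, objects queued for a deferred free, and a
 * counter of how many objects have actually been freed so far.
 */
static int toy_readers;
static void *toy_pending[TOY_MAX_PENDING];
static int toy_nr_pending;
static int toy_nr_freed;

/* Free everything queued, but only if no reader is inside a read region. */
static void toy_flush(void)
{
	if (toy_readers > 0)
		return;

	while (toy_nr_pending > 0) {
		free(toy_pending[--toy_nr_pending]);
		toy_nr_freed++;
	}
}

static void toy_read_lock(void)
{
	toy_readers++;
}

static void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	toy_readers--;
	toy_flush();
}

/* Hypothetical analogue of kfree_rcu(): never blocks, and never frees
 * while a reader is active; it queues the pointer instead.
 */
static void toy_kfree_deferred(void *ptr)
{
	assert(toy_nr_pending < TOY_MAX_PENDING);
	toy_pending[toy_nr_pending++] = ptr;
	toy_flush();
}
```

In this model, calling toy_kfree_deferred() between toy_read_lock() and
toy_read_unlock() leaves toy_nr_freed unchanged until the unlock, so the
object ends up freed at the same point as if the call had been made
right after the unlock.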
Thanks,
David