All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: sdf@google.com
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, kernel-team@cloudflare.com
Subject: Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
Date: Thu, 28 May 2020 12:34:56 +0200	[thread overview]
Message-ID: <87sgfk2vqn.fsf@cloudflare.com> (raw)
In-Reply-To: <20200527203823.GB57268@google.com>

On Wed, May 27, 2020 at 10:38 PM CEST, sdf@google.com wrote:
> On 05/27, Jakub Sitnicki wrote:
>> On Wed, May 27, 2020 at 07:48 PM CEST, sdf@google.com wrote:
>> > On 05/27, Jakub Sitnicki wrote:
>> >> Add support for bpf() syscall subcommands that operate on
>> >> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
>> >> network namespaces (that is flow dissector at the moment).
>> >
>> >> Link-based and prog-based attachment can be used interchangeably, but only
>> >> one can be in use at a time. Attempts to attach a link when a prog is
>> >> already attached directly, and the other way around, will be met with
>> >> -EBUSY.
>> >
>> >> Attachment of multiple links of same attach type to one netns is not
>> >> supported, with the intention to lift it when a use-case presents
>> >> itself. Because of that attempts to create a netns link, when one already
>> >> exists result in -E2BIG error, signifying that there is no space left for
>> >> another attachment.
>> >
>> >> Link-based attachments to netns don't keep a netns alive by holding a ref
>> >> to it. Instead links get auto-detached from netns when the latter is being
>> >> destroyed by a pernet pre_exit callback.
>> >
>> >> When auto-detached, link lives in defunct state as long there are open FDs
>> >> for it. -ENOLINK is returned if a user tries to update a defunct link.
>> >
>> >> Because bpf_link to netns doesn't hold a ref to struct net, special care is
>> >> taken when releasing the link. The netns might be getting torn down when
>> >> the release function tries to access it to detach the link.
>> >
>> >> To ensure the struct net object is alive when release function accesses it
>> >> we rely on the fact that cleanup_net(), struct net destructor, calls
>> >> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
>> >> pre_exit happens first, link release will not attempt to access struct net.
>> >
>> >> Same applies the other way around, network namespace doesn't keep an
>> >> attached link alive because by not holding a ref to it. Instead bpf_links
>> >> to netns are RCU-freed, so that pernet pre_exit callback can safely access
>> >> and auto-detach the link when racing with link release/free.
>> >
>> > [..]
>> >> +	rcu_read_lock();
>> >>   	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
>> >> -		if (rcu_access_pointer(net->bpf.progs[type]))
>> >> +		if (rcu_access_pointer(net->bpf.links[type]))
>> >> +			bpf_netns_link_auto_detach(net, type);
>> >> +		else if (rcu_access_pointer(net->bpf.progs[type]))
>> >>   			__netns_bpf_prog_detach(net, type);
>> >>   	}
>> >> +	rcu_read_unlock();
>> > Aren't you doing RCU_INIT_POINTER in __netns_bpf_prog_detach?
>> > Is it allowed under rcu_read_load?
>
>> Yes, that's true. __netns_bpf_prog_detach does
>
>> 	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
>
>> RCU read lock is here for the rcu_dereference() that happens in
>> bpf_netns_link_auto_detach (netns doesn't hold a ref to bpf_link):
>
>> /* Called with RCU read lock. */
>> static void __net_exit
>> bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
>> {
>> 	struct bpf_netns_link *net_link;
>> 	struct bpf_link *link;
>
>> 	link = rcu_dereference(net->bpf.links[type]);
>> 	if (!link)
>> 		return;
>> 	net_link = to_bpf_netns_link(link);
>> 	RCU_INIT_POINTER(net_link->net, NULL);
>> }
>
>> I've pulled it up, out of the loop, perhaps too eagerly and just made it
>> confusing, considering we're iterating over a 1-item array :-)
>
>> Now, I'm also doing RCU_INIT_POINTER on the *contents of bpf_link* in
>> bpf_netns_link_auto_detach. Is that allowed? I'm not sure, that bit is
>> hazy to me.
>
>> There are no concurrent writers to net_link->net, just readers, i.e.
>> bpf_netns_link_release(). And I know bpf_link won't be freed until the
>> grace period elapses.
>
>> sparse and CONFIG_PROVE_RCU are not shouting at me, but please do if it
>> doesn't hold up or make sense.
>
>> I certainly can push the rcu_read_lock() down into
>> bpf_netns_link_auto_detach().
> I think it would be much nicer if you push them down to preserve the
> assumption that nothing is modified under read lock and you flip
> the pointers only when holding the mutexes.

I certainly see how that would save some head-scratching. Might be
doable with grabbing a temporary reference to struct net/struct
bpf_link. Please see the code snippet below.

>
> I'll do another pass on this patch, I think I don't understand a bunch
> of bits where you do:
>
> mutex_lock
> rcu_read_lock -> why? you're already in the update section, can use
>                  rcu_dereference_protected
> ...
> rcu_read_unlock
> mutex_unlock

The rcu_read_lock is to get the grace-period guarantee for the value
(struct net) we access by dereferencing RCU protected pointer
(bpf_netns_link.net).

While mutex_lock is to serialize updates to values within struct
net. That is net->bpf.progs or net->bpf.links.

The locking is done in reverse order because I cannot grab the mutex
while in RCU read-side critical section.

If I was holding a reference to struct net, I would not need to be
inside an RCU read-side critical section to access it. (This is how
bpf_cgroup_link does it when accessing cgroup->bpf.)

One thought I had is that I could rearrange sychnronization so that we
try to grab a reference to struct net when we need to modify it:

	rcu_read_lock();
	net = rcu_dereference(to_bpf_netns_link(link)->net);
	if (net)
		net = maybe_get_net(net);
	rcu_read_unlock();

	if (!net)
		return; /* netns is dead */

	mutex_lock(&netns_bpf_mutex);
	rcu_assign_pointer(net->bpf.progs[type], link->prog);
	rcu_assign_pointer(net->bpf.links[type], link);
	mutex_unlock(&netns_bpf_mutex);
	put_net(net);

That seems be easier to reason about, no?

> But I'll post those comments inline shortly.

Thanks. Will be better to discuss in context.

  reply	other threads:[~2020-05-28 10:35 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
2020-05-27 17:08 ` [PATCH bpf-next 1/8] flow_dissector: Don't grab update-side lock on prog detach from pre_exit Jakub Sitnicki
2020-05-27 17:35   ` sdf
2020-05-27 17:08 ` [PATCH bpf-next 2/8] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
2020-05-27 17:35   ` sdf
2020-05-27 17:08 ` [PATCH bpf-next 3/8] net: Introduce netns_bpf for BPF programs attached to netns Jakub Sitnicki
2020-05-27 17:40   ` sdf
2020-05-27 19:31     ` Jakub Sitnicki
2020-05-27 20:36       ` sdf
2020-05-27 17:08 ` [PATCH bpf-next 4/8] flow_dissector: Move out netns_bpf prog callbacks Jakub Sitnicki
2020-05-27 17:08 ` [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
2020-05-27 17:48   ` sdf
2020-05-27 19:54     ` Jakub Sitnicki
2020-05-27 20:38       ` sdf
2020-05-28 10:34         ` Jakub Sitnicki [this message]
2020-05-27 20:53   ` sdf
2020-05-28 11:03     ` Jakub Sitnicki
2020-05-28 16:09       ` sdf
2020-05-28  2:54   ` kbuild test robot
2020-05-28  2:54     ` kbuild test robot
2020-06-04 23:38     ` Nick Desaulniers
2020-06-04 23:38       ` Nick Desaulniers
2020-06-05 14:41       ` Jakub Sitnicki
2020-05-28  5:56   ` Andrii Nakryiko
2020-05-28 12:26     ` Jakub Sitnicki
2020-05-28 18:09       ` Andrii Nakryiko
2020-05-28 18:48         ` Alexei Starovoitov
2020-05-28 13:30   ` Jakub Sitnicki
2020-05-27 17:08 ` [PATCH bpf-next 6/8] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
2020-05-28  5:59   ` Andrii Nakryiko
2020-05-28 13:05     ` Jakub Sitnicki
2020-05-27 17:08 ` [PATCH bpf-next 7/8] bpftool: Support link show for netns-attached links Jakub Sitnicki
2020-05-28  6:02   ` Andrii Nakryiko
2020-05-28 13:10     ` Jakub Sitnicki
2020-05-27 17:08 ` [PATCH bpf-next 8/8] selftests/bpf: Add tests for attaching bpf_link to netns Jakub Sitnicki
2020-05-28  6:08   ` Andrii Nakryiko
2020-05-28 13:29     ` Jakub Sitnicki
2020-05-28 18:31       ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sgfk2vqn.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.