netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	kernel-team@cloudflare.com
Subject: Re: [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace
Date: Tue, 02 Jun 2020 11:30:53 +0200	[thread overview]
Message-ID: <87ftbd3jci.fsf@cloudflare.com> (raw)
In-Reply-To: <CAEf4BzbxPrEJgWyeh_XzQcQ6VwfhC9NzyDNX4JCu86Jj4cCMtA@mail.gmail.com>

On Tue, Jun 02, 2020 at 12:30 AM CEST, Andrii Nakryiko wrote:
> On Sun, May 31, 2020 at 1:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Extend bpf() syscall subcommands that operate on bpf_link, that is
>> LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
>> network namespaces (only flow dissector at the moment).
>>
>> Link-based and prog-based attachment can be used interchangeably, but only
>> one can exist at a time. Attempts to attach a link when a prog is already
>> attached directly, and the other way around, will be met with -EEXIST.
>> Attempts to detach a program when link exists result in -EINVAL.
>>
>> Attachment of multiple links of same attach type to one netns is not
>> supported with the intention to lift the restriction when a use-case
>> presents itself. Because of that link create returns -E2BIG when trying to
>> create another netns link, when one already exists.
>>
>> Link-based attachments to netns don't keep a netns alive by holding a ref
>> to it. Instead links get auto-detached from netns when the latter is being
>> destroyed, using a pernet pre_exit callback.
>>
>> When auto-detached, link lives in defunct state as long there are open FDs
>> for it. -ENOLINK is returned if a user tries to update a defunct link.
>>
>> Because bpf_link to netns doesn't hold a ref to struct net, special care is
>> taken when releasing, updating, or filling link info. The netns might be
>> getting torn down when any of these link operations are in progress. That
>> is why auto-detach and update/release/fill_info are synchronized by the
>> same mutex. Also, link ops have to always check if auto-detach has not
>> happened yet and if netns is still alive (refcnt > 0).
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  include/linux/bpf-netns.h      |   8 ++
>>  include/linux/bpf_types.h      |   3 +
>>  include/net/netns/bpf.h        |   1 +
>>  include/uapi/linux/bpf.h       |   5 +
>>  kernel/bpf/net_namespace.c     | 244 ++++++++++++++++++++++++++++++++-
>>  kernel/bpf/syscall.c           |   3 +
>>  tools/include/uapi/linux/bpf.h |   5 +
>>  7 files changed, 267 insertions(+), 2 deletions(-)
>>
>
> [...]
>
>> +
>> +static int bpf_netns_link_update_prog(struct bpf_link *link,
>> +                                     struct bpf_prog *new_prog,
>> +                                     struct bpf_prog *old_prog)
>> +{
>> +       struct bpf_netns_link *net_link =
>> +               container_of(link, struct bpf_netns_link, link);
>> +       enum netns_bpf_attach_type type = net_link->netns_type;
>> +       struct net *net;
>> +       int ret = 0;
>> +
>> +       if (old_prog && old_prog != link->prog)
>> +               return -EPERM;
>> +       if (new_prog->type != link->prog->type)
>> +               return -EINVAL;
>> +
>> +       mutex_lock(&netns_bpf_mutex);
>> +
>> +       net = net_link->net;
>> +       if (!net || !check_net(net)) {
>
> As is, this check_net() check looks very racy. Because if we do worry
> about net refcnt dropping to zero, then between check_net() and
> accessing net fields that can happen. So if that's a possiblity, you
> should probably instead do maybe_get_net() instead.
>
> But on the other hand, if we established that auto-detach taking a
> mutex protects us from net going away, then maybe we shouldn't worry
> at all about that, and thus check_net() is unnecessary and just
> unnecessarily confusing everything.
>
> I don't know enough overall net lifecycle, so I'm not sure which one
> it is. But the way it is right now still looks suspicious to me.

The story behind the additional "!check_net(net)" test (in update_prog
and in fill_info) is that without it, user-space will see the link as
defunct only after some delay from the moment last ref to netns is gone
(for instance, last process left the netns).

That is because netns is being destroyed from a workqueue [0].

This unpredictable delay makes testing the uAPI harder, and I expect
using it as well. Since we can do better and check the refcnt to detect
"early" that netns is no good any more, that's what we do.

At least that was my thinking here. I was hoping the comment below the
check would be enough. Didn't mean to cause confusion.

So there is no race, the locking scheme you suggested holds.

[0] https://elixir.bootlin.com/linux/latest/source/net/core/net_namespace.c#L644

>
>> +               /* Link auto-detached or netns dying */
>> +               ret = -ENOLINK;
>> +               goto out_unlock;
>> +       }
>> +
>> +       old_prog = xchg(&link->prog, new_prog);
>> +       rcu_assign_pointer(net->bpf.progs[type], new_prog);
>> +       bpf_prog_put(old_prog);
>> +
>> +out_unlock:
>> +       mutex_unlock(&netns_bpf_mutex);
>> +       return ret;
>> +}
>> +
>> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
>> +                                   struct bpf_link_info *info)
>> +{
>> +       const struct bpf_netns_link *net_link =
>> +               container_of(link, struct bpf_netns_link, link);
>> +       unsigned int inum = 0;
>> +       struct net *net;
>> +
>> +       mutex_lock(&netns_bpf_mutex);
>> +       net = net_link->net;
>> +       if (net && check_net(net))
>> +               inum = net->ns.inum;
>> +       mutex_unlock(&netns_bpf_mutex);
>> +
>> +       info->netns.netns_ino = inum;
>> +       info->netns.attach_type = net_link->type;
>> +       return 0;
>> +}
>> +
>> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
>> +                                      struct seq_file *seq)
>> +{
>> +       struct bpf_link_info info = {};
>
> initialization here is probably not necessary, as long as you access
> only fields that fill_info initializes.

True. I figured that better safe than leaking stack contents, if
bpf_netns_link_fill_info gets broken.

>
>> +
>> +       bpf_netns_link_fill_info(link, &info);
>> +       seq_printf(seq,
>> +                  "netns_ino:\t%u\n"
>> +                  "attach_type:\t%u\n",
>> +                  info.netns.netns_ino,
>> +                  info.netns.attach_type);
>> +}
>> +
>
> [...]

  reply	other threads:[~2020-06-02  9:30 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 01/12] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 02/12] net: Introduce netns_bpf for BPF programs attached to netns Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 03/12] flow_dissector: Move out netns_bpf prog callbacks Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
2020-06-01 22:30   ` Andrii Nakryiko
2020-06-02  9:30     ` Jakub Sitnicki [this message]
2020-06-02 17:38       ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 05/12] bpf, cgroup: Return ENOLINK for auto-detached links on update Jakub Sitnicki
2020-06-01 22:31   ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 06/12] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
2020-06-01 22:32   ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type Jakub Sitnicki
2020-06-01 22:35   ` Andrii Nakryiko
2020-06-02  9:37     ` Jakub Sitnicki
2020-06-02 17:39       ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 08/12] bpftool: Support link show for netns-attached links Jakub Sitnicki
2020-06-01 22:38   ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 09/12] selftests/bpf: Add tests for attaching bpf_link to netns Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 10/12] selftests/bpf, flow_dissector: Close TAP device FD after the test Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 11/12] selftests/bpf: Convert test_flow_dissector to use BPF skeleton Jakub Sitnicki
2020-06-01 22:42   ` Andrii Nakryiko
2020-06-02  9:46     ` Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 12/12] selftests/bpf: Extend test_flow_dissector to cover link creation Jakub Sitnicki
2020-06-01 22:26 ` [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ftbd3jci.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).