From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@fb.com>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
Networking <netdev@vger.kernel.org>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction
Date: Fri, 06 Mar 2020 09:31:23 +0100 [thread overview]
Message-ID: <878skd3mw4.fsf@toke.dk> (raw)
In-Reply-To: <7b674384-1131-59d4-ee2f-5a17824c1eca@iogearbox.net>
Daniel Borkmann <daniel@iogearbox.net> writes:
> On 3/5/20 11:50 PM, Alexei Starovoitov wrote:
>> On Thu, Mar 05, 2020 at 11:34:18PM +0100, Daniel Borkmann wrote:
>>> On 3/5/20 5:34 PM, Alexei Starovoitov wrote:
>>>> On Thu, Mar 05, 2020 at 11:37:11AM +0100, Toke Høiland-Jørgensen wrote:
>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>>>>> On Wed, Mar 04, 2020 at 08:47:44AM +0100, Toke Høiland-Jørgensen wrote:
>>> [...]
>>>>> Anyway, what I was trying to express:
>>>>>
>>>>>> Still that doesn't mean that pinned link is 'immutable'.
>>>>>
>>>>> I don't mean 'immutable' in the sense that it cannot be removed ever.
>>>>> Just that we may end up in a situation where an application can see a
>>>>> netdev with an XDP program attached, has the right privileges to modify
>>>>> it, but can't because it can't find the pinned bpf_link. Right? Or am I
>>>>> misunderstanding your proposal?
>>>>>
>>>>> Amending my example from before, this could happen by:
>>>>>
>>>>> 1. Someone attaches a program to eth0, and pins the bpf_link to
>>>>> /sys/fs/bpf/myprog
>>>>>
>>>>> 2. eth0 is moved to a different namespace which mounts a new sysfs at
>>>>> /sys
>>>>>
>>>>> 3. Inside that namespace, /sys/fs/bpf/myprog is no longer accessible, so
>>>>> xdp-loader can't get access to the original bpf_link; but the XDP
>>>>> program is still attached to eth0.
>>>>
>>>> The key to decide is whether moving netdev across netns should be allowed
>>>> when xdp attached. I think it should be denied. Even when legacy xdp
>>>> program is attached, since it will confuse user space managing part.
>>>
>>> There are perfectly valid use cases where this is done already today (minus
>>> bpf_link), for example, consider an orchestrator that is setting up the BPF
>>> program on the device, moving to the newly created application pod during
>>> the CNI call in k8s, such that the new pod does not have the /sys/fs/bpf/
>>> mount instance and if unprivileged cannot remove the BPF prog from the dev
>>> either. We do something like this in case of ipvlan, meaning, we attach a
>>> rootlet prog that calls into single slot of a tail call map, move it to the
>>> application pod, and only out of Cilium's own pod and it's pod-local bpf fs
>>> instance we manage the pinned tail call map to update the main programs in
>>> that single slot w/o having to switch any netns later on.
>>
>> Right. You mentioned this use case before, but I managed to forget about it.
>> Totally makes sense for prog to stay attached to netdev when it's moved.
>> I think pod manager would also prefer that pod is not able to replace
>> xdp prog from inside the container. It sounds to me that steps 1,2,3 above
>> is exactly the desired behavior. Otherwise what stops some application
>> that started in a pod to override it?
>
> Generally, yes, and it shouldn't need to care nor see what is happening in
> /sys/fs/bpf/ from the orchestrator at least (or could potentially have its
> own private mount under /sys/fs/bpf/ or elsewhere). Ideally, the behavior
> should be that orchestrator does all the setup out of its own namespace,
> then moves everything over to the newly created target namespace and e.g.
> only if the pod has the capable(cap_sys_admin) permissions, it could mess
> around with anything attached there, or via similar model as done in [0]
> when there is a master device.
Yup, I can see how this can be a reasonable use case where you *would*
want the locking. However, my concern is that there should be a way for
an admin to recover from this (say, if it happens by mistake, or a
misbehaving application). Otherwise, I fear we'll end up with support
cases where the only answer is "try rebooting", which is obviously not
ideal.
> Last time I looked, there is a down/up cycle on the dev upon netns
> migration and it flushes e.g. attached qdiscs afaik, so there are
> limitations that still need to be addressed. Not sure atm if same is
> happening to XDP right now.
XDP programs will stay attached. devmaps (for redirect) have a notifier
that will remove devices when they move out of a namespace. Not sure if
there are any other issues with netns moves somewhere.
> In this regards veth devices are a bit nicer to work with since
> everything can be attached on hostns ingress w/o needing to worry on
> the peer dev in the pod's netns.
Presumably the XDP EGRESS hook that David Ahern is working on will make
this doable for XDP on veth as well?
-Toke
next prev parent reply other threads:[~2020-03-06 8:31 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-28 22:39 [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 1/3] bpf: introduce pinnable bpf_link abstraction Andrii Nakryiko
2020-03-02 10:13 ` Toke Høiland-Jørgensen
2020-03-02 18:06 ` Andrii Nakryiko
2020-03-02 21:40 ` Toke Høiland-Jørgensen
2020-03-02 23:37 ` Andrii Nakryiko
2020-03-03 2:50 ` Alexei Starovoitov
2020-03-03 4:18 ` Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 2/3] libbpf: add bpf_link pinning/unpinning Andrii Nakryiko
2020-03-02 10:16 ` Toke Høiland-Jørgensen
2020-03-02 18:09 ` Andrii Nakryiko
2020-03-02 21:45 ` Toke Høiland-Jørgensen
2020-02-28 22:39 ` [PATCH bpf-next 3/3] selftests/bpf: add link pinning selftests Andrii Nakryiko
2020-03-02 10:11 ` [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Toke Høiland-Jørgensen
2020-03-02 18:05 ` Andrii Nakryiko
2020-03-02 22:24 ` Toke Høiland-Jørgensen
2020-03-02 23:35 ` Andrii Nakryiko
2020-03-03 8:12 ` Toke Høiland-Jørgensen
2020-03-03 8:12 ` Daniel Borkmann
2020-03-03 15:46 ` Alexei Starovoitov
2020-03-03 19:23 ` Daniel Borkmann
2020-03-03 19:46 ` Andrii Nakryiko
2020-03-03 20:24 ` Toke Høiland-Jørgensen
2020-03-03 20:53 ` Daniel Borkmann
2020-03-03 22:01 ` Alexei Starovoitov
2020-03-03 22:27 ` Toke Høiland-Jørgensen
2020-03-04 4:36 ` Alexei Starovoitov
2020-03-04 7:47 ` Toke Høiland-Jørgensen
2020-03-04 15:47 ` Alexei Starovoitov
2020-03-05 10:37 ` Toke Høiland-Jørgensen
2020-03-05 16:34 ` Alexei Starovoitov
2020-03-05 22:34 ` Daniel Borkmann
2020-03-05 22:50 ` Alexei Starovoitov
2020-03-05 23:42 ` Daniel Borkmann
2020-03-06 8:31 ` Toke Høiland-Jørgensen [this message]
2020-03-06 10:25 ` Daniel Borkmann
2020-03-06 10:42 ` Toke Høiland-Jørgensen
2020-03-06 18:09 ` David Ahern
2020-03-04 19:41 ` Jakub Kicinski
2020-03-04 20:45 ` Alexei Starovoitov
2020-03-04 21:24 ` Jakub Kicinski
2020-03-05 1:07 ` Alexei Starovoitov
2020-03-05 8:16 ` Jakub Kicinski
2020-03-05 11:05 ` Toke Høiland-Jørgensen
2020-03-05 18:13 ` Jakub Kicinski
2020-03-09 11:41 ` Toke Høiland-Jørgensen
2020-03-09 18:50 ` Jakub Kicinski
2020-03-10 12:22 ` Toke Høiland-Jørgensen
2020-03-05 16:39 ` Alexei Starovoitov
2020-03-03 22:40 ` Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=878skd3mw4.fsf@toke.dk \
--to=toke@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andriin@fb.com \
--cc=ast@fb.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).