Re: [PATCH net] bpf: expose netns inode to bpf programs

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: ebiederm@xmission.com (Eric W. Biederman)
To: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@fb.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"David S . Miller" <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Ahern <dsa@cumulusnetworks.com>, Tejun Heo <tj@kernel.org>,
	Thomas Graf <tgraf@suug.ch>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH net] bpf: expose netns inode to bpf programs
Date: Sat, 04 Feb 2017 10:06:41 +1300	[thread overview]
Message-ID: <8737fvt25a.fsf@xmission.com> (raw)
In-Reply-To: <CALCETrWDxVHc4_hpjLQRj-XbvGX0+tbGZ8o388-xn6rNcQTySQ@mail.gmail.com> (Andy Lutomirski's message of "Fri, 3 Feb 2017 13:00:47 -0800")

Andy Lutomirski <luto@amacapital.net> writes:

> On Thu, Feb 2, 2017 at 8:33 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Alexei Starovoitov <ast@fb.com> writes:
>>
>>> On 1/26/17 11:07 AM, Andy Lutomirski wrote:
>>>> On Thu, Jan 26, 2017 at 10:32 AM, Alexei Starovoitov <ast@fb.com> wrote:
>>>>> On 1/26/17 10:12 AM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Thu, Jan 26, 2017 at 9:46 AM, Alexei Starovoitov <ast@fb.com> wrote:
>>>>>>>
>>>>>>> On 1/26/17 8:37 AM, Andy Lutomirski wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Think of bpf programs as safe kernel modules. They don't have
>>>>>>>>> confined boundaries and program authors, if not careful, can shoot
>>>>>>>>> themselves in the foot. We're not trying to prevent that because
>>>>>>>>> it's impossible to check that the program is sane. Just like
>>>>>>>>> it's impossible to check that kernel module is sane.
>>>>>>>>> But in case of bpf we check that bpf program is _safe_ from the kernel
>>>>>>>>> point of view. If it's doing some garbage, it's program's business.
>>>>>>>>> Does it make more sense now?
>>>>>>>>>
>>>>>>>>
>>>>>>>> With all due respect, I think this is not an acceptable way to think
>>>>>>>> about BPF at all.  If you think of BPF this way, I think there needs
>>>>>>>> to be a real discussion at KS or similar as to whether this is okay.
>>>>>>>> The reason is simple: the kernel promises a stable ABI to userspace
>>>>>>>> but not to kernel modules.  By thinking of BPF as more like a module,
>>>>>>>> you're taking a big shortcut that will either result in ABI breakage
>>>>>>>> down the road or in committing to a problematic stable ABI.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> you misunderstood the analogy.
>>>>>>> bpf abi is certainly stable. that's why we were careful of not
>>>>>>> exposing anything to it that is not already stable.
>>>>>>>
>>>>>>
>>>>>> In that case I don't understand what you're trying to say.  Eric
>>>>>> thinks your patch exposes a bad interface.  A bad interface for
>>>>>> userspace is a very different thing from a bad interface available to
>>>>>> kernel modules.  Are you saying that BPF is kernel-module-like in that
>>>>>> the ABI exposed to BPF programs doesn't need to meet the same quality
>>>>>> standards as userspace ABIs?
>>>>>
>>>>>
>>>>> of course not.
>>>>> ns.inum is already exposed to user space as a value.
>>>>> This patch exposes it to bpf program in a convenient and stable way,
>>>>
>>>> Here's what I'm imaging Eric is thinking:
>>>>
>>>> ns.inum is currently exposed to userspace via procfs.  In principle,
>>>> the value could be local to a namespace, though, which would enable
>>>> CRIU to be able to preserve namespace inode numbers across a
>>>> checkpoint+restore operation.  If this happened, the contained and
>>>> restored procfs would see a different inode number than the outermost
>>>> procfs.
>>>
>>> sure. there are many different ways for the program to see inode
>>> that either was already reused or disappeared.
>>> What I'm saying that it is expected. We cannot prevent that from
>>> bpf side. Just like ifindex value read by the program can be bogus
>>> as in the example I just provided.
>>
>> The point is that we can make the inode number stable across migration
>> and the user space API for namespaces has been designed with that
>> possibility in mind.
>
> How does it help if BPF starts exposing both inode number and device
> number?

Adding the device number comparison helps in that it is explicit what is
being compared against.  That gives me at least a bit of a namespace
for the namespaces, and a program from a sufficiently wrong context will
have it's comparisons fail rather than having a match.

I think the operation that is exported in the BPF should be a full
comparison operation of device and inode number so that it could be
optimized/compiled to something else depending upon the context.

AKA the compilation of the bpf program would have the opportunity to
remove the namespace dependency and make the program work in a global
context.  So we don't have to carry namespace information around at run
time.

> ISTM any ability to migrate namespaces and to migrate eBPF programs
> that know about namespaces needs to have the eBPF program firmly
> rooted in some namespace (or perhaps cgroup in this case) so that it
> can see a namespaced view of the world.  For this to work, presumably
> we need to make sure that eBPF programs that are installed by programs
> that are in a container don't see traffic that isn't in that
> container.  This is part of why I think that we should consider
> preventing programs that aren't in the root namespace (perhaps *all*
> the root namespaces) from installing bpf+cgroup programs in the first
> place until there's a clearer understanding of how this all fits
> together.

Andy I agree.  At least to the point those programs are
reading attributes that are in a namespace.  Something that should be
straight forward to verify in the bpf checker when installing the
program.

Eric

next prev parent reply	other threads:[~2017-02-03 21:11 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-26  3:27 [PATCH net] bpf: expose netns inode to bpf programs Alexei Starovoitov
2017-01-26  5:46 ` Eric W. Biederman
2017-01-26  6:00   ` Ying Xue
2017-01-26  6:23   ` Alexei Starovoitov
2017-01-26 16:37     ` Andy Lutomirski
2017-01-26 17:46       ` Alexei Starovoitov
2017-01-26 18:12         ` Andy Lutomirski
2017-01-26 18:32           ` Alexei Starovoitov
2017-01-26 19:07             ` Andy Lutomirski
2017-01-26 19:25               ` Alexei Starovoitov
2017-02-03  4:33                 ` Eric W. Biederman
2017-02-03  6:05                   ` Alexei Starovoitov
2017-02-03 10:30                     ` Eric W. Biederman
2017-02-03 21:00                   ` Andy Lutomirski
2017-02-03 21:06                     ` Eric W. Biederman [this message]
2017-02-03 23:08                     ` Alexei Starovoitov
2017-02-04 17:07                       ` Andy Lutomirski
2017-02-05  3:10                         ` Alexei Starovoitov
2017-02-05  3:27                           ` Andy Lutomirski
2017-02-05  3:48                             ` Alexei Starovoitov
2017-02-05  3:54                               ` Andy Lutomirski
2017-02-05  4:37                                 ` Alexei Starovoitov
2017-02-05  5:05                                   ` Andy Lutomirski
2017-02-07  1:43                                     ` Alexei Starovoitov
2017-01-31 18:02 ` David Miller
2017-01-31 22:11 ` David Ahern
2017-02-03 21:56 ` Daniel Borkmann
2017-02-03 23:06   ` Alexei Starovoitov
2017-02-03 23:42     ` Daniel Borkmann
2017-02-04  1:25       ` Alexei Starovoitov
2017-02-04 17:08       ` Andy Lutomirski
2017-02-05  3:18         ` Alexei Starovoitov
2017-02-05  3:22           ` Andy Lutomirski
2017-02-05  3:35             ` Alexei Starovoitov
2017-02-05  3:49               ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8737fvt25a.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=ast@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsa@cumulusnetworks.com \
    --cc=luto@amacapital.net \
    --cc=netdev@vger.kernel.org \
    --cc=tgraf@suug.ch \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).