From: Alexei Starovoitov <ast@fb.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	David Ahern <dsa@cumulusnetworks.com>,
	"Tejun Heo" <tj@kernel.org>, Thomas Graf <tgraf@suug.ch>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH net] bpf: expose netns inode to bpf programs
Date: Thu, 26 Jan 2017 11:25:48 -0800
Message-ID: <588A4D3C.9070601@fb.com>
In-Reply-To: <CALCETrXgdY_Kt8wn4uiATUnNJ3YXttCUREgEeQReG7u29Lc44g@mail.gmail.com>

On 1/26/17 11:07 AM, Andy Lutomirski wrote:
> On Thu, Jan 26, 2017 at 10:32 AM, Alexei Starovoitov <ast@fb.com> wrote:
>> On 1/26/17 10:12 AM, Andy Lutomirski wrote:
>>>
>>> On Thu, Jan 26, 2017 at 9:46 AM, Alexei Starovoitov <ast@fb.com> wrote:
>>>>
>>>> On 1/26/17 8:37 AM, Andy Lutomirski wrote:
>>>>>>
>>>>>>
>>>>>> Think of bpf programs as safe kernel modules. They don't have
>>>>>> confined boundaries and program authors, if not careful, can shoot
>>>>>> themselves in the foot. We're not trying to prevent that because
>>>>>> it's impossible to check that the program is sane. Just like
>>>>>> it's impossible to check that kernel module is sane.
>>>>>> But in case of bpf we check that bpf program is _safe_ from the kernel
>>>>>> point of view. If it's doing some garbage, it's program's business.
>>>>>> Does it make more sense now?
>>>>>>
>>>>>
>>>>> With all due respect, I think this is not an acceptable way to think
>>>>> about BPF at all.  If you think of BPF this way, I think there needs
>>>>> to be a real discussion at KS or similar as to whether this is okay.
>>>>> The reason is simple: the kernel promises a stable ABI to userspace
>>>>> but not to kernel modules.  By thinking of BPF as more like a module,
>>>>> you're taking a big shortcut that will either result in ABI breakage
>>>>> down the road or in committing to a problematic stable ABI.
>>>>
>>>>
>>>>
>>>> you misunderstood the analogy.
>>>> bpf abi is certainly stable. that's why we were careful of not
>>>> exposing anything to it that is not already stable.
>>>>
>>>
>>> In that case I don't understand what you're trying to say.  Eric
>>> thinks your patch exposes a bad interface.  A bad interface for
>>> userspace is a very different thing from a bad interface available to
>>> kernel modules.  Are you saying that BPF is kernel-module-like in that
>>> the ABI exposed to BPF programs doesn't need to meet the same quality
>>> standards as userspace ABIs?
>>
>>
>> of course not.
>> ns.inum is already exposed to user space as a value.
>> This patch exposes it to bpf program in a convenient and stable way,
>
> Here's what I'm imagining Eric is thinking:
>
> ns.inum is currently exposed to userspace via procfs.  In principle,
> the value could be local to a namespace, though, which would enable
> CRIU to be able to preserve namespace inode numbers across a
> checkpoint+restore operation.  If this happened, the contained and
> restored procfs would see a different inode number than the outermost
> procfs.

sure. there are many different ways for the program to see an inode
that was either already reused or has disappeared.
What I'm saying is that this is expected. We cannot prevent that from
the bpf side. Just like the ifindex value read by the program can be
bogus, as in the example I just provided.
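
For reference, the existing procfs exposure of ns.inum that this thread refers to can be observed from userspace with a plain stat() call. The following is a minimal sketch, not part of the patch under discussion: the helper names `ns_inode` and `same_namespace` are made up for illustration, and it assumes a Linux system with /proc mounted. Comparing both the device and the inode number follows the convention described in namespaces(7).

```python
import os


def ns_inode(path):
    """Return (st_dev, st_ino) for a path such as /proc/<pid>/ns/net.

    os.stat() follows the symlink under /proc/<pid>/ns/, so st_ino is the
    namespace inode number that procfs already shows to userspace.
    """
    st = os.stat(path)
    return st.st_dev, st.st_ino


def same_namespace(path_a, path_b):
    """Two ns paths name the same namespace if device and inode both match."""
    return ns_inode(path_a) == ns_inode(path_b)


if __name__ == "__main__":
    path = "/proc/self/ns/net"  # assumes Linux with procfs mounted
    if os.path.exists(path):
        dev, ino = ns_inode(path)
        print(f"net namespace: dev={dev} ino={ino}")
        # The symlink text has the form "net:[<inode number>]".
        print(os.readlink(path))
```

The value printed here is only meaningful while the namespace is alive; once it is gone, the number can be reused, which is the reuse hazard described above.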

> If you start exposing the raw ns.inum field to BPF programs and those
> programs are not themselves scoped to a namespace, then this could
> create a problem for CRIU.

criu doesn't support ebpf because maps are not snapshot-able and
programs are detached from the control plane. I cannot see how one can
criu an xdp or cls program. The ssh connection to the box might die in
the middle while criu is messing with the unknown. Hence the analogy to
kernel modules. Imagine a set of mini kernel modules and a set
of apps that depend on them. What kind of criu can we even talk about?

> But you told Eric that his nack doesn't matter, and maybe it would be
> nice to ask him to clarify instead.

Fair enough. Eric, thoughts?
