netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Graf <tgraf@suug.ch>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	davem@davemloft.net, viro@ZenIV.linux.org.uk,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
Date: Wed, 21 Oct 2015 20:34:47 +0200	[thread overview]
Message-ID: <20151021183447.GC23554@pox.localdomain> (raw)
In-Reply-To: <5627AC79.5000704@iogearbox.net>

On 10/21/15 at 05:17pm, Daniel Borkmann wrote:
> On 10/20/2015 08:56 PM, Eric W. Biederman wrote:
> ...
> >Just FYI:  Using a device for this kind of interface is pretty
> >much a non-starter as that quickly gets you into situations where
> >things do not work in containers.  If someone gets a version of device
> >namespaces past GregKH it might be up for discussion to use character
> >devices.
> 
> Okay, you are referring to this discussion here:
> 
>   http://thread.gmane.org/gmane.linux.kernel.containers/26760
> 
> What had been mentioned earlier in this thread was to have a namespace
> pass-through facility enforced by device cgroups we have in the kernel,
> which is one out of various means used to enforce policy today by
> deployment systems such as docker, for example. But more below.
> 
> I think this all depends on the kind of expectations we have, where all
> this is going. In the original proposal, it was agreed to have the
> operation that creates a node as 'capable(CAP_SYS_ADMIN)'-only (in the
> way like most of the rest of eBPF is restricted), and based on the use
> case we distribute such objects to unprivileged applications. But I
> understand that it seems the trend lately to lift eBPF restrictions at
> some point anyway, and thus the CAP_SYS_ADMIN is suddenly irrelevant
> again. Fair enough.
> 
> Don't get me wrong, I really don't mind if it will be some version of
> this fs patch or whatever architecture else we find consensus on, I
> think this discussion is merely trying to evaluate/discuss on what seems
> to be a good fit, also in terms of future requirements and integration.
> 
> So far, during this discussion, it was proposed to modify the file system
> to a single-mount one and to stick this under /sys/kernel/bpf/. This
> will not have "real" namespace support either, but it was proposed to
> have a following structure:
> 
>   /sys/kernel/bpf/username/<optional_dirs_mkdir_by_user>/progX

This would probably work as you would typically map the ebpf map
using -v like this to give a stable path:

        docker run -v /sys/kernel/bpf/foo/maps/progX:/map proX
 
> So, the file system will have kind of a user home-directory for each user
> to isolate through permissions, if I understood correctly.
> 
> If we really want to go this route, then I think there are no big stones
> in the way for the other model either. It should look roughly drafted like
> the below.
> 
> Together with device cgroups for containers, it would allow scenarios where
> you can have:
> 
>   * eBPF (map/prog) device pass-through so a map/prog could even be shared out
>     from the initial namespace into individual ones/all (one could possibly
>     extend such maps as read-only for these consumers).
>   * eBPF device creation for unprivileged users with permissions being set
>     accordingly (as in fs case).
>   * Since cgroup controller can also do wildcards on major/minors, we could
>     make that further fine-grained.
>   * eBPF device creation can also be enforced by the cgroup controller to be
>     entirely disallowed for a specific container.
> 
> (An admin can determine the dynamically created major f.e. under /proc/devices.)

I've read the discussion passively and my take away is that, frankly,
I think the differences are somewhat minor. Both architectures can
scale to what we need. Both will do the job. I'm slightly worried about
exposing uAPI as a FS, I think that didn't work too well for sysfs. It's
pretty much a define the format once and never touch it again kind of
deal.

  reply	other threads:[~2015-10-21 18:34 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-16  1:09 [PATCH net-next 0/4] BPF updates Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 1/4] bpf: abstract anon_inode_getfd invocations Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 2/4] bpf: align and clean bpf_{map,prog}_get helpers Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 3/4] bpf: add support for persistent maps/progs Daniel Borkmann
2015-10-16 10:25   ` Hannes Frederic Sowa
2015-10-16 13:36     ` Daniel Borkmann
2015-10-16 16:36       ` Hannes Frederic Sowa
2015-10-16 17:27         ` Daniel Borkmann
2015-10-16 17:37           ` Alexei Starovoitov
2015-10-16 16:18     ` Alexei Starovoitov
2015-10-16 16:43       ` Hannes Frederic Sowa
2015-10-16 17:32         ` Alexei Starovoitov
2015-10-16 17:37           ` Thomas Graf
2015-10-16 17:21   ` Hannes Frederic Sowa
2015-10-16 17:42     ` Alexei Starovoitov
2015-10-16 17:56       ` Daniel Borkmann
2015-10-16 18:41         ` Eric W. Biederman
2015-10-16 19:27           ` Alexei Starovoitov
2015-10-16 19:53             ` Eric W. Biederman
2015-10-16 20:56               ` Alexei Starovoitov
2015-10-16 23:44                 ` Eric W. Biederman
2015-10-17  2:43                   ` Alexei Starovoitov
2015-10-17 12:28                     ` Daniel Borkmann
2015-10-18  2:20                       ` Alexei Starovoitov
2015-10-18 15:03                         ` Daniel Borkmann
2015-10-18 16:49                           ` Daniel Borkmann
2015-10-18 20:59                             ` Alexei Starovoitov
2015-10-19  7:36                               ` Hannes Frederic Sowa
2015-10-19  9:51                                 ` Daniel Borkmann
2015-10-19 14:23                                   ` Daniel Borkmann
2015-10-19 16:22                                     ` Alexei Starovoitov
2015-10-19 17:37                                       ` Daniel Borkmann
2015-10-19 18:15                                         ` Alexei Starovoitov
2015-10-19 18:46                                           ` Hannes Frederic Sowa
2015-10-19 19:34                                             ` Alexei Starovoitov
2015-10-19 20:03                                               ` Hannes Frederic Sowa
2015-10-19 20:48                                                 ` Alexei Starovoitov
2015-10-19 22:17                                                   ` Daniel Borkmann
2015-10-20  0:30                                                     ` Alexei Starovoitov
2015-10-20  8:46                                                       ` Daniel Borkmann
2015-10-20 17:53                                                         ` Alexei Starovoitov
2015-10-20 18:56                                                           ` Eric W. Biederman
2015-10-21 15:17                                                             ` Daniel Borkmann
2015-10-21 18:34                                                               ` Thomas Graf [this message]
2015-10-21 22:44                                                                 ` Alexei Starovoitov
2015-10-22 13:22                                                                   ` Daniel Borkmann
2015-10-22 19:35                                                               ` Eric W. Biederman
2015-10-23 13:47                                                                 ` Daniel Borkmann
2015-10-20  9:43                                                       ` Hannes Frederic Sowa
2015-10-19 23:02                                                   ` Hannes Frederic Sowa
2015-10-20  1:09                                                     ` Alexei Starovoitov
2015-10-20 10:07                                                       ` Hannes Frederic Sowa
2015-10-20 18:44                                                         ` Alexei Starovoitov
2015-10-16 19:54             ` Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 4/4] bpf: add sample usages " Daniel Borkmann
2015-10-19  2:53 ` [PATCH net-next 0/4] BPF updates David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151021183447.GC23554@pox.localdomain \
    --to=tgraf@suug.ch \
    --cc=ast@kernel.org \
    --cc=ast@plumgrid.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=hannes@stressinduktion.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).