From: Thomas Graf <tgraf@suug.ch>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	davem@davemloft.net, viro@ZenIV.linux.org.uk,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
Date: Wed, 21 Oct 2015 20:34:47 +0200
Message-ID: <20151021183447.GC23554@pox.localdomain>
In-Reply-To: <5627AC79.5000704@iogearbox.net>

On 10/21/15 at 05:17pm, Daniel Borkmann wrote:
> On 10/20/2015 08:56 PM, Eric W. Biederman wrote:
> ...
> >Just FYI:  Using a device for this kind of interface is pretty
> >much a non-starter as that quickly gets you into situations where
> >things do not work in containers.  If someone gets a version of device
> >namespaces past GregKH it might be up for discussion to use character
> >devices.
> 
> Okay, you are referring to this discussion here:
> 
>   http://thread.gmane.org/gmane.linux.kernel.containers/26760
> 
> What had been mentioned earlier in this thread was to have a namespace
> pass-through facility enforced by device cgroups we have in the kernel,
> which is one out of various means used to enforce policy today by
> deployment systems such as docker, for example. But more below.
> 
> I think this all depends on the kind of expectations we have, where all
> this is going. In the original proposal, it was agreed to have the
> operation that creates a node as 'capable(CAP_SYS_ADMIN)'-only (in the
> way like most of the rest of eBPF is restricted), and based on the use
> case we distribute such objects to unprivileged applications. But I
> understand that it seems the trend lately to lift eBPF restrictions at
> some point anyway, and thus the CAP_SYS_ADMIN is suddenly irrelevant
> again. Fair enough.
> 
> Don't get me wrong, I really don't mind if it will be some version of
> this fs patch or whatever architecture else we find consensus on, I
> think this discussion is merely trying to evaluate/discuss on what seems
> to be a good fit, also in terms of future requirements and integration.
> 
> So far, during this discussion, it was proposed to modify the file system
> to a single-mount one and to stick this under /sys/kernel/bpf/. This
> will not have "real" namespace support either, but it was proposed to
> have a following structure:
> 
>   /sys/kernel/bpf/username/<optional_dirs_mkdir_by_user>/progX

This would probably work, as you would typically bind-mount the eBPF
map into the container with -v, like this, to give it a stable path:

        docker run -v /sys/kernel/bpf/foo/maps/progX:/map proX
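
To make the path mapping concrete, here is a minimal sketch that builds
the proposed per-user path and the matching -v argument. The helper
names (pinned_path, docker_volume_arg) and the /map mount point are
illustrative assumptions only; nothing here is an existing API, and the
directory layout is just the one proposed in this thread.

```python
from pathlib import PurePosixPath

# Root directory proposed in this thread; not a settled interface.
BPF_ROOT = PurePosixPath("/sys/kernel/bpf")


def pinned_path(user, name, *subdirs):
    """Return <root>/<user>/<optional subdirs>/<name> (hypothetical helper)."""
    parts = (user, *subdirs, name)
    for part in parts:
        # Reject components that would escape the per-user directory.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return BPF_ROOT.joinpath(*parts)


def docker_volume_arg(host_path, container_path="/map"):
    """Return the argv fragment for 'docker run -v host:container'."""
    return ["-v", f"{host_path}:{container_path}"]
```

The container then always sees the object at the same path, regardless
of which user directory it lives in on the host.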
 
> So, the file system will have kind of a user home-directory for each user
> to isolate through permissions, if I understood correctly.
> 
> If we really want to go this route, then I think there are no big stones
> in the way for the other model either. It should look roughly drafted like
> the below.
> 
> Together with device cgroups for containers, it would allow scenarios where
> you can have:
> 
>   * eBPF (map/prog) device pass-through so a map/prog could even be shared out
>     from the initial namespace into individual ones/all (one could possibly
>     extend such maps as read-only for these consumers).
>   * eBPF device creation for unprivileged users with permissions being set
>     accordingly (as in fs case).
>   * Since cgroup controller can also do wildcards on major/minors, we could
>     make that further fine-grained.
>   * eBPF device creation can also be enforced by the cgroup controller to be
>     entirely disallowed for a specific container.
> 
> (An admin can determine the dynamically created major f.e. under /proc/devices.)
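
For illustration, the /proc/devices lookup mentioned above could be
sketched roughly as follows. The device name "bpf" is a placeholder
assumption (no such name is fixed anywhere in this thread), and the
rule string follows the cgroup v1 devices.allow format of
"type major:minor access".

```python
def find_char_major(devices_text, name):
    """Return the major of character device `name`, or None if absent.

    `devices_text` is the content of /proc/devices.
    """
    in_char = False
    for line in devices_text.splitlines():
        stripped = line.strip()
        if stripped == "Character devices:":
            in_char = True
            continue
        if stripped == "Block devices:":
            in_char = False
            continue
        if in_char and stripped:
            fields = stripped.split(None, 1)
            if len(fields) == 2 and fields[1] == name:
                return int(fields[0])
    return None


def cgroup_allow_rule(major, access="rw"):
    """Build a devices.allow rule matching all minors of a char major."""
    return f"c {major}:* {access}"


if __name__ == "__main__":
    with open("/proc/devices") as f:
        major = find_char_major(f.read(), "bpf")  # "bpf" is hypothetical
    print(cgroup_allow_rule(major) if major is not None else "not found")
```

The wildcard on the minor is what would let a single rule cover every
map/prog device under that major.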

I've read the discussion passively and my takeaway is that, frankly,
the differences are somewhat minor. Both architectures can scale to
what we need. Both will do the job. I'm slightly worried about
exposing uAPI as a FS; I think that didn't work too well for sysfs.
It's pretty much a define-the-format-once-and-never-touch-it-again
kind of deal.


