netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <ast@plumgrid.com>
To: Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Cc: davem@davemloft.net, viro@ZenIV.linux.org.uk, tgraf@suug.ch,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
Date: Mon, 19 Oct 2015 12:34:02 -0700	[thread overview]
Message-ID: <562545AA.2080207@plumgrid.com> (raw)
In-Reply-To: <1445280385.602530.414418777.63627F89@webmail.messagingengine.com>

On 10/19/15 11:46 AM, Hannes Frederic Sowa wrote:
> Hi,
>
> On Mon, Oct 19, 2015, at 20:15, Alexei Starovoitov wrote:
>> On 10/19/15 10:37 AM, Daniel Borkmann wrote:
>>> An eBPF program or map loading/destruction is *not* by any means to be
>>> considered fast-path. We currently hold a global mutex during loading.
>>> So, how can that be considered fast-path? Similarly, socket creation/
>>> destruction is also not fast-path, etc. Do you expect that applications
>>> would create/destroy these devices within milliseconds? I'd argue that
>>> something would be seriously wrong with that application, then. Such
>>> persistent maps are to be considered rather mid-long living objects in
>>> the system. The fast-path surely is the data-path of them.
>>
>> consider seccomp that loads several programs for every tab, then
>> container use case where we're loading two for each as well.
>> Who knows what use cases will come up in the future.
>> It's obviously not a fast path that is being hit million times a second,
>> but why add overhead when it's unnecessary?
>
> But the browser using seccomp does not need persistent maps at all. It

today it doesn't, but if it was a light weight feature, it could
have been used much more broadly.
Currently we're using user agent for networking and it's a pain.

>>>> completely unnecessary here. The kernel will consume more memory for
>>>> no real reason other than cdev are used to keep prog/maps around.
>>>
>>> I don't consider this a big issue, and well worth the trade-off. You'll
>>> have an infrastructure that integrates *nicely* into the *existing* kernel
>>> model *and* tooling with the proposed patch. This is a HUGE plus. The
>>> UAPI of this is simple and minimal. And to me, these are in-fact special
>>> files, not regular ones.
>>
>> Seriously? Syscall to create/destory cdevs is a nice api?
>
> It is pretty normal to do. They don't use syscall but ioctl on a special
> pseudo device node (lvm2/device-mapper, tun, etc.). Same thing, very
> overloaded syscall. Also I don't see a reason to not make the creation
> possible via sysfs, which would be even nicer but not orthogonal to the
> current creation of maps, which is nowadays set in stone by uapi.

Isn't it your point going against cdev? ioctls on devs are overloaded,
whereas here we have clean bpf syscall with no extra baggage.
Going old-school to device model goes against that clean syscall
approach.

> sysfs already has infrastructure to ensure supportability and
> debugability. Dependency graphs can be created to see which programs
> depend on which bpf_progs. kobjects are singeltons in sysfs
> representation. If you have multiple ebpf filesystems, even maybe
> referencing the same hashtable with gigabytes of data multiple times,
> there needs to be some way to help administrators to check resource
> usage, statistics, tweak and tune the rhashtable. All this needs to be
> handled as well in the future. It doesn't really fit the filesystem
> model, but representing a kobject seems to be a good fit to me.

quite the opposite. The way cdev patch is done now, there is no way
to extend it to create a hierarchy without breaking users,
whereas fs style with initial Daniel's patch can be extended.
Doing 'resource stats' via sysfs requires bpf to add to sysfs, which
is not this cdev approach.

> Policy can be done by user space with help of udev. Selinux policies can
> easily being extended to allow specific domains access. Namespace
> "passthrough" is defined for devices already.

that's an interesting point, but isn't it better done with fs?
you can go much more fine-grained with directory permissions in fs.
With different users/namespaces having their own hierarchies and mounts
whereas cdev dumps everything into one spot.

>> where do you see 'over-design' in fs? It's a straightforward code and
>> most of it is boilerplate like all other fs.
>> Kill rmdir/mkdir ops, term bit and options and the fs diff will shrink
>> by another 200+ lines.
>
> I don't think that only a filesystem will do it in the foreseeable
> future. You want to have tools like lsof reporting which map and which
> program has references to each other. Those file nodes will need more
> metadata attached in future anyway, so currently just comparing lines of
> codes seems not to be a good way for arguing.

my point was that both have roughly the same number, but the lines of
code will grow in either approach and when cdev starts as a hack I can
only see more hacks added in the future, whereas fs gives us
full flexibility to do any type of file access and user visible
representation.
Also I don't buy the point of reinventing sysfs. bpffs is not doing
sysfs. I don't want to see _every_ bpf object in sysfs. It's way too
much overhead. Classic doesn't have sysfs and everyone have been
using it just fine.
bpffs is solving the need to 'pin_fd' when user process exits.
That's it. Let's solve that and not go into any of this sysfs stuff.

  reply	other threads:[~2015-10-19 19:34 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-16  1:09 [PATCH net-next 0/4] BPF updates Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 1/4] bpf: abstract anon_inode_getfd invocations Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 2/4] bpf: align and clean bpf_{map,prog}_get helpers Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 3/4] bpf: add support for persistent maps/progs Daniel Borkmann
2015-10-16 10:25   ` Hannes Frederic Sowa
2015-10-16 13:36     ` Daniel Borkmann
2015-10-16 16:36       ` Hannes Frederic Sowa
2015-10-16 17:27         ` Daniel Borkmann
2015-10-16 17:37           ` Alexei Starovoitov
2015-10-16 16:18     ` Alexei Starovoitov
2015-10-16 16:43       ` Hannes Frederic Sowa
2015-10-16 17:32         ` Alexei Starovoitov
2015-10-16 17:37           ` Thomas Graf
2015-10-16 17:21   ` Hannes Frederic Sowa
2015-10-16 17:42     ` Alexei Starovoitov
2015-10-16 17:56       ` Daniel Borkmann
2015-10-16 18:41         ` Eric W. Biederman
2015-10-16 19:27           ` Alexei Starovoitov
2015-10-16 19:53             ` Eric W. Biederman
2015-10-16 20:56               ` Alexei Starovoitov
2015-10-16 23:44                 ` Eric W. Biederman
2015-10-17  2:43                   ` Alexei Starovoitov
2015-10-17 12:28                     ` Daniel Borkmann
2015-10-18  2:20                       ` Alexei Starovoitov
2015-10-18 15:03                         ` Daniel Borkmann
2015-10-18 16:49                           ` Daniel Borkmann
2015-10-18 20:59                             ` Alexei Starovoitov
2015-10-19  7:36                               ` Hannes Frederic Sowa
2015-10-19  9:51                                 ` Daniel Borkmann
2015-10-19 14:23                                   ` Daniel Borkmann
2015-10-19 16:22                                     ` Alexei Starovoitov
2015-10-19 17:37                                       ` Daniel Borkmann
2015-10-19 18:15                                         ` Alexei Starovoitov
2015-10-19 18:46                                           ` Hannes Frederic Sowa
2015-10-19 19:34                                             ` Alexei Starovoitov [this message]
2015-10-19 20:03                                               ` Hannes Frederic Sowa
2015-10-19 20:48                                                 ` Alexei Starovoitov
2015-10-19 22:17                                                   ` Daniel Borkmann
2015-10-20  0:30                                                     ` Alexei Starovoitov
2015-10-20  8:46                                                       ` Daniel Borkmann
2015-10-20 17:53                                                         ` Alexei Starovoitov
2015-10-20 18:56                                                           ` Eric W. Biederman
2015-10-21 15:17                                                             ` Daniel Borkmann
2015-10-21 18:34                                                               ` Thomas Graf
2015-10-21 22:44                                                                 ` Alexei Starovoitov
2015-10-22 13:22                                                                   ` Daniel Borkmann
2015-10-22 19:35                                                               ` Eric W. Biederman
2015-10-23 13:47                                                                 ` Daniel Borkmann
2015-10-20  9:43                                                       ` Hannes Frederic Sowa
2015-10-19 23:02                                                   ` Hannes Frederic Sowa
2015-10-20  1:09                                                     ` Alexei Starovoitov
2015-10-20 10:07                                                       ` Hannes Frederic Sowa
2015-10-20 18:44                                                         ` Alexei Starovoitov
2015-10-16 19:54             ` Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 4/4] bpf: add sample usages " Daniel Borkmann
2015-10-19  2:53 ` [PATCH net-next 0/4] BPF updates David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=562545AA.2080207@plumgrid.com \
    --to=ast@plumgrid.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=hannes@stressinduktion.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tgraf@suug.ch \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).