From: Daniel Borkmann <daniel@iogearbox.net>
To: Alexei Starovoitov <ast@plumgrid.com>,
Hannes Frederic Sowa <hannes@stressinduktion.org>,
"Eric W. Biederman" <ebiederm@xmission.com>
Cc: davem@davemloft.net, viro@ZenIV.linux.org.uk, tgraf@suug.ch,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
Date: Mon, 19 Oct 2015 19:37:07 +0200
Message-ID: <56252A43.3000706@iogearbox.net>
In-Reply-To: <562518B8.2070401@plumgrid.com>
On 10/19/2015 06:22 PM, Alexei Starovoitov wrote:
> On 10/19/15 7:23 AM, Daniel Borkmann wrote:
>>>> The mknod is not the holder but rather the kobject which should be
>>>> represented in sysfs will be. So you can still get the map major:minor
>>>> by looking up the /dev file in the corresponding sysfs directory or I
>>>> think we should provide an 'unbind' file, which will drop the kobject if
>>>> the user writes a '1' to it.
>>>
>>> I agree, this could still be done.
>
> imo doing 'rm' is way cleaner than dealing with 'unbind' file.
Hmm, not sure; maybe this was misunderstood. It's not about files, but
rather about devices, and the devices are decoupled from the files.
This unbind file is optional and could live under /sys/class/bpf/bpf_{map,prog}<X>/unbind
to release a device. It's not strictly necessary for this to work,
though; as explained, management happens via the bpf() syscall.
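The optional unbind attribute amounts to a one-byte write from userspace. A minimal C sketch, assuming a sysfs-style attribute such as /sys/class/bpf/bpf_map<X>/unbind (a path from this proposal, not an existing kernel interface) that accepts "1"; the helper name bpf_dev_unbind() is hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write "1" to a per-device unbind attribute,
 * e.g. /sys/class/bpf/bpf_map<X>/unbind as proposed in this thread.
 * Returns 0 on success or a negative errno value on failure. */
int bpf_dev_unbind(const char *unbind_path)
{
	int fd, ret = 0;
	ssize_t n;

	fd = open(unbind_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, "1", 1);
	if (n < 0)
		ret = -errno;      /* capture errno before close() can clobber it */
	else if (n != 1)
		ret = -EIO;        /* short write: treat as an I/O error */

	close(fd);
	return ret;
}
```

Creation and deletion would still go through bpf(); the attribute would only be a convenience for an admin browsing /sys/class/bpf/.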
>> As Hannes said, under /sys/class/bpf/ an admin can see all held nodes, so
>> visibility is there for free at all times. The device management (creation/
>> deletion) itself and the mknod's pointing to it are simply decoupled.
>>
>> This whole approach looks sound to me, also integrates nicely into the
>> existing Linux facilities, and works on top of every fs supporting special
>> files. Much cleaner than an extra file-system that would be required by a
>> syscall in order to make the syscall work.
>
> thanks for the explanations. I think I got a complete picture now on
> how such cdev will be used and I don't like it.
> There is nothing in linux or any unix that creates thousands of cdevs
> on the fly, but here user apps will create/destroy them back and forth
> and they would need to do it quickly. Whole sysfs/kobj baggage is
Well, you are talking about thousands of maps, yet even root can create only
about 5 maps before getting -EPERM. ;) That is, until an admin figures out,
after a few detours, that ulimit -l needs to be adjusted ... ;)
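The limit in question is RLIMIT_MEMLOCK, which is what `ulimit -l` adjusts; the -EPERM above comes from hitting it. A minimal C sketch, assuming a loader that only needs to raise its soft limit up to the existing hard limit (raising the hard limit itself requires CAP_SYS_RESOURCE); the function name is hypothetical:

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_MEMLOCK (the value shown by
 * `ulimit -l`) to the current hard limit, which any process may do.
 * Returns 0 on success or a negative errno value on failure. */
int raise_memlock_to_hard_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -errno;

	rl.rlim_cur = rl.rlim_max;   /* soft limit may go up to, not past, the hard limit */
	if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -errno;

	return 0;
}
```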
But more seriously, can you elaborate on what you mean?
Loading or destroying an eBPF program or map is by no means to be
considered fast-path. We currently hold a global mutex during loading.
So, how can that be considered fast-path? Similarly, socket creation/
destruction is not fast-path either, etc. Do you expect that applications
would create/destroy these devices within milliseconds? I'd argue that
something would be seriously wrong with such an application. Persistent
maps are rather medium- to long-lived objects in the system. The
fast-path is surely their data path.
> completely unnecessary here. The kernel will consume more memory for
> no real reason other than cdev are used to keep prog/maps around.
I don't consider this a big issue; it is well worth the trade-off. With
the proposed patch, you get an infrastructure that integrates *nicely*
into the *existing* kernel model *and* tooling. This is a HUGE plus. The
UAPI is simple and minimal. And to me, these are in fact special files,
not regular ones.
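Because the pinned objects would be character device special files, userspace can tell them apart from regular files with stat(2) and read their major:minor, which is what ties a node back to its entry under /sys/class/bpf/. A minimal C sketch; the helper name is hypothetical, and /dev/null (1:3) merely stands in for a BPF device node:

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Hypothetical helper: report whether path is a character device node
 * and, if so, store its major:minor numbers.
 * Returns 1 for a char device, 0 for anything else, -1 if stat() fails. */
int char_dev_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	if (!S_ISCHR(st.st_mode))
		return 0;             /* regular file, directory, etc. */

	*maj = major(st.st_rdev);     /* device numbers live in st_rdev */
	*min = minor(st.st_rdev);
	return 1;
}
```

For example, char_dev_numbers("/dev/null", &ma, &mi) returns 1 with ma == 1 and mi == 3 on Linux.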
> imo fs is cleaner and we can tailor it to be similar to cdev style.
Really, IMHO this is over-designed, and much, much more hacky. We would
design a whole new file system that works *exactly* like cdevs and likely
takes more than twice the code and complexity to realize, just to save a
few bytes ...? I don't understand that.
Cheers,
Daniel