Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Daniel Borkmann <daniel@iogearbox.net>
To: Alexei Starovoitov <ast@plumgrid.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>,
	davem@davemloft.net, viro@ZenIV.linux.org.uk, tgraf@suug.ch,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
Date: Sat, 17 Oct 2015 14:28:48 +0200	[thread overview]
Message-ID: <56223F00.5030203@iogearbox.net> (raw)
In-Reply-To: <5621B5BC.8020204@plumgrid.com>

On 10/17/2015 04:43 AM, Alexei Starovoitov wrote:
> On 10/16/15 4:44 PM, Eric W. Biederman wrote:
>> Alexei Starovoitov <ast@plumgrid.com> writes:
>>
>>> We can argue about api for 2nd, whether it's mount with fd=1234 string
>>> or else, but for the first mount style doesn't make sense.
>>
>> Why does mount not make sense?  It is exactly what you are looking for
>> so why does it not make sense?
>
> hmm, how do you get a new fd back after mounting it?
> Note, open cannot be overloaded, so we end up with BPF_NEW_FD anyway,
> but now it's more convoluted and empty mounts are everywhere.

That would be my understanding as well, but as Alexei already said,
these are two different issues, it would be step 2 (let me get back
to that further below). But in any case, I don't really like dumping
key/value somewhere as files. You have binary blobs as both, and
lets say your application has a lookup-key (for whatever reason) of
several cachelines it all ends up getting rather messy than making
it really useful for non-bpf(2) aware cmdline tools to deal with.

Anyway, another idea I've been brainstorming with Hannes today a
bit is about the following:

We register two major numbers, one for eBPF maps (X), one for eBPF
progs (Y). A user can either via cmdline call something like ...
mknod /dev/bpf/maps/map_pkts c X Z to create a special character
device, or alternatively out of an application through mknod(2)
syscall (f.e. tc when setting up maps/progs internally from the obj
file for a classifer).

Then, we still have 2 eBPF commands for bpf(2) syscall to add, say
(for example) BPF_BIND_DEV and BPF_FETCH_DEV. The application that
created a map (or prog) already has the map fd and after mknod(2) it
can open(2) the special file to get the special file fd. Then it can
call something like bpf(BPF_BIND_DEV, &attr, sizeof(attr))) where
attr looks like:

   union bpf_attr attr = {
     .bpf_fd    = bpf_fd,
     .dev_fd    = dev_fd,
   };

The bpf(2) syscall can check whether dev_fd belongs to an eBPF special
file and it can then copy over file->private_data from the bpf_fd
to the dev_fd's underlying file, where the private_data, as we know,
from the bpf_fd already points to a proper bpf_map/bpf_prog structure.
The map/prog would then get ref'ed and lives onwards in the char device's
lifetime. No special hashtable, gc, etc needed. The char device has fops
that we can define by ourself, and unlinking would drop the ref from
its private_data.

Now to the other part: BPF_FETCH_DEV would work similar. The application
opens the device, and fills bpf_attr as follows again:

   union bpf_attr attr = {
     .bpf_fd    = 0,
     .dev_fd    = dev_fd,
   };

This would allow us to look up the map/prog from the dev_fd's file->
private_data, and installs a new fd via bpf_{map,prog}_new_fd() that
is returned from bpf(2) for bpf-related access. The remaining fops
from the char device could still be reserved for possibilities like
debugging in future.

Now in future (2nd step), could either be to use Eric's idea and then do
something like mount -t bpffs ... -o /dev/bpf/maps/map_pkts to dump
attributes or other properties to some location for inspection from such
a special file, or we could use kobjects for that attached to the device
if the fops from the cdev should not be sufficient.

So closing the loop to the special files where there were concerns:

This won't forbid to have a future shell-style access possibility, and
it would also not end up as a nightmare on what you mentioned with the
S_ISSOCK-like bit in the other email.

The pinning mechanism would not require an extra file system to be mounted
somewhere, and yet the user can define himself an arbitrary hierarchy
where he puts the special files as this facility already exists. An
approach like this looks overall cleaner to me, and most likely be
realizable in fewer lines of code as well.

Thoughts?

Cheers,
Daniel

next prev parent reply	other threads:[~2015-10-17 12:29 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-16  1:09 [PATCH net-next 0/4] BPF updates Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 1/4] bpf: abstract anon_inode_getfd invocations Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 2/4] bpf: align and clean bpf_{map,prog}_get helpers Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 3/4] bpf: add support for persistent maps/progs Daniel Borkmann
2015-10-16 10:25   ` Hannes Frederic Sowa
2015-10-16 13:36     ` Daniel Borkmann
2015-10-16 16:36       ` Hannes Frederic Sowa
2015-10-16 17:27         ` Daniel Borkmann
2015-10-16 17:37           ` Alexei Starovoitov
2015-10-16 16:18     ` Alexei Starovoitov
2015-10-16 16:43       ` Hannes Frederic Sowa
2015-10-16 17:32         ` Alexei Starovoitov
2015-10-16 17:37           ` Thomas Graf
2015-10-16 17:21   ` Hannes Frederic Sowa
2015-10-16 17:42     ` Alexei Starovoitov
2015-10-16 17:56       ` Daniel Borkmann
2015-10-16 18:41         ` Eric W. Biederman
2015-10-16 19:27           ` Alexei Starovoitov
2015-10-16 19:53             ` Eric W. Biederman
2015-10-16 20:56               ` Alexei Starovoitov
2015-10-16 23:44                 ` Eric W. Biederman
2015-10-17  2:43                   ` Alexei Starovoitov
2015-10-17 12:28                     ` Daniel Borkmann [this message]
2015-10-18  2:20                       ` Alexei Starovoitov
2015-10-18 15:03                         ` Daniel Borkmann
2015-10-18 16:49                           ` Daniel Borkmann
2015-10-18 20:59                             ` Alexei Starovoitov
2015-10-19  7:36                               ` Hannes Frederic Sowa
2015-10-19  9:51                                 ` Daniel Borkmann
2015-10-19 14:23                                   ` Daniel Borkmann
2015-10-19 16:22                                     ` Alexei Starovoitov
2015-10-19 17:37                                       ` Daniel Borkmann
2015-10-19 18:15                                         ` Alexei Starovoitov
2015-10-19 18:46                                           ` Hannes Frederic Sowa
2015-10-19 19:34                                             ` Alexei Starovoitov
2015-10-19 20:03                                               ` Hannes Frederic Sowa
2015-10-19 20:48                                                 ` Alexei Starovoitov
2015-10-19 22:17                                                   ` Daniel Borkmann
2015-10-20  0:30                                                     ` Alexei Starovoitov
2015-10-20  8:46                                                       ` Daniel Borkmann
2015-10-20 17:53                                                         ` Alexei Starovoitov
2015-10-20 18:56                                                           ` Eric W. Biederman
2015-10-21 15:17                                                             ` Daniel Borkmann
2015-10-21 18:34                                                               ` Thomas Graf
2015-10-21 22:44                                                                 ` Alexei Starovoitov
2015-10-22 13:22                                                                   ` Daniel Borkmann
2015-10-22 19:35                                                               ` Eric W. Biederman
2015-10-23 13:47                                                                 ` Daniel Borkmann
2015-10-20  9:43                                                       ` Hannes Frederic Sowa
2015-10-19 23:02                                                   ` Hannes Frederic Sowa
2015-10-20  1:09                                                     ` Alexei Starovoitov
2015-10-20 10:07                                                       ` Hannes Frederic Sowa
2015-10-20 18:44                                                         ` Alexei Starovoitov
2015-10-16 19:54             ` Daniel Borkmann
2015-10-16  1:09 ` [PATCH net-next 4/4] bpf: add sample usages " Daniel Borkmann
2015-10-19  2:53 ` [PATCH net-next 0/4] BPF updates David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56223F00.5030203@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=ast@kernel.org \
    --cc=ast@plumgrid.com \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=hannes@stressinduktion.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tgraf@suug.ch \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).