From: Alexei Starovoitov <ast@plumgrid.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
Hannes Frederic Sowa <hannes@stressinduktion.org>,
davem@davemloft.net, viro@ZenIV.linux.org.uk, tgraf@suug.ch,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
Date: Fri, 16 Oct 2015 13:56:58 -0700
Message-ID: <5621649A.80403@plumgrid.com>
In-Reply-To: <87y4f2io9l.fsf@x220.int.ebiederm.org>
On 10/16/15 12:53 PM, Eric W. Biederman wrote:
> Alexei Starovoitov <ast@plumgrid.com> writes:
>
>> On 10/16/15 11:41 AM, Eric W. Biederman wrote:
> [...]
>>> I am missing something.
>>>
>>> When I suggested using a filesystem it was my thought there would be
>>> exactly one superblock per map, and the map would be specified at mount
>>> time. You clearly are not implementing that.
>>
>> I don't think it's practical to have sb per map, since that would mean
>> sb per prog and that won't scale.
>
> What do you mean won't scale? You want to have a name per map/prog so the
> basic complexity appears the same. Is there some crucial interaction
> between the persistent dodads you are placing on a filesystem that I am
> missing?
>
> Given the fact you don't normally need any persistence without a program
> I am puzzled why "scaling" is an issue of any kind. This is for a
> comparatively rare case, if I am not mistaken.
Representing a map as a directory tree with files as keys is indeed 'rare',
since it's mainly for debugging and other slow-path accesses,
but the 'pin_fd' functionality is now popping up everywhere.
That is mainly because in environments like OpenStack there are tons of
disjoint libraries written in different languages, and the only thing they
have in common is the kernel. So pin_fd/new_fd is a mandatory feature.
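For illustration, here is a minimal C sketch of what pin_fd/new_fd look like from an application's side. It assumes the command names that later landed in mainline (BPF_OBJ_PIN and BPF_OBJ_GET in <linux/bpf.h>), which may differ from this patch revision. The wrapper names bpf_pin() and bpf_get() are hypothetical, and the pin path has to live on a mounted bpf filesystem (conventionally /sys/fs/bpf).

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the bpf(2) syscall; glibc provides no wrapper. */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* Hypothetical helper: pin an existing map/prog fd at 'path' so the
 * object outlives the process that created it.
 * Uses the mainline BPF_OBJ_PIN command (assumed, not this patch's).
 * Returns 0 on success, -1 with errno set on failure. */
int bpf_pin(int fd, const char *path)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr)); /* kernel rejects unused nonzero bytes */
	attr.pathname = (__u64)(unsigned long)path;
	attr.bpf_fd = fd;
	return (int)sys_bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
}

/* Hypothetical helper: get a fresh fd for an object pinned at 'path',
 * e.g. from another process written in another language.
 * Returns the new fd, or -1 with errno set on failure. */
int bpf_get(const char *path)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (__u64)(unsigned long)path;
	return (int)sys_bpf(BPF_OBJ_GET, &attr, sizeof(attr));
}
```

A loader would call bpf_pin(map_fd, "/sys/fs/bpf/my_map") before exiting, and an unrelated tool could later call bpf_get() on the same path; the only thing the two sides share is the kernel and a path string.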
>> Also, a map today is an fd that belongs to a process. I cannot see
>> an api for a C program to do a 'mount of an FD' that wouldn't look like
>> an ugly hack.
>
> mount -t bpffs ... -o fd=1234
>
> That is not at all convoluted or hacky. Especially compared to some of the
> alternatives I am seeing.
>
> It is no problem at all to wrap something like that in a nice function
> call that has the exact same complexity of use as any of the other
> options that are being explored to give something that starts out
> as a file descriptor a name.
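To make the mount-based suggestion concrete, here is a sketch of such a wrapper. It assumes a hypothetical bpffs that accepts an "fd=" mount option; the bpf filesystem that was eventually merged has no such option. Both function names are made up for illustration. The only new work compared with a syscall command is formatting the option string.

```c
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical: format the "fd=<n>" option string for bpffs.
 * Returns 0 on success, -1 if fd is invalid or buf is too small. */
int bpffs_fd_option(char *buf, size_t len, int fd)
{
	int n;

	if (fd < 0)
		return -1;
	n = snprintf(buf, len, "fd=%d", fd);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical wrapper, equivalent to
 * "mount -t bpffs none <target> -o fd=<fd>".
 * Only works with a bpffs that understands "fd=" (an assumption). */
int bpf_mount_fd(const char *target, int fd)
{
	char opts[32];

	if (bpffs_fd_option(opts, sizeof(opts), fd) < 0)
		return -1;
	return mount("none", target, "bpffs", 0, opts);
}
```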
Frankly, I don't think parsing an 'fd=1234' string is a clean api, but
before we argue about the fs philosophy of passing options, let's
get on the same page with the requirements.
The first goal this patch solves is providing the ability
to 'pin' an FD, so that a map/prog won't disappear when the user app exits.
The second goal, for future patches, is to expose map internals as a
directory structure.
These two goals are independent.
We can argue about the api for the 2nd, whether it's a mount with an
fd=1234 string or something else, but for the first, a mount-style api
doesn't make sense.
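To show what parsing 'fd=1234' involves on the receiving side, here is a userspace sketch of extracting the fd from a comma-separated option string such as "mode=0700,fd=1234". The name parse_fd_option() is hypothetical; a real filesystem would use the kernel's own mount-option helpers. Compare this with a syscall command, where the fd arrives as a typed integer field in union bpf_attr.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: find "fd=<decimal>" in a comma-separated
 * option string and store the value in *fd.
 * Returns 0 on success, -EINVAL if missing or malformed. */
int parse_fd_option(const char *opts, int *fd)
{
	const char *p = opts;

	while (p && *p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		if (len > 3 && strncmp(p, "fd=", 3) == 0) {
			char *stop;
			long v;

			/* Require a leading digit: no sign or whitespace. */
			if (p[3] < '0' || p[3] > '9')
				return -EINVAL;
			errno = 0;
			v = strtol(p + 3, &stop, 10);
			/* Reject trailing junk and values that overflow an int. */
			if (errno || stop != p + len || v > INT_MAX)
				return -EINVAL;
			*fd = (int)v;
			return 0;
		}
		p = end ? end + 1 : NULL; /* advance to the next option */
	}
	return -EINVAL;
}
```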
>>> A filesystem per map makes sense as you have a key-value store with one
>>> file per key.
>>>
>>> The idea is that something resembling your bpf_pin_fd function would be
>>> the mount system call for the filesystem.
>>>
>>> Then the keys in the map could be read by "ls /mountpoint/".
>>> Key values could be inspected with "cat /mountpoint/key".
>>
>> Yes, that is still the goal for follow-up patches, but contained
>> within a given bpffs. A bpf_pin_fd-like command for the bpf syscall
>> would create files for the keys in a map and allow 'cat' via open/read.
>> Such an api would be much cleaner from a C app's point of view.
>> Potentially we can allow mounting a file created via BPF_PIN_FD
>> that would expand into keys/values.
>> All of that is part of our future plans.
>> There, actually, the main point of contention is 'how to represent keys
>> and values': whether a key is a hex representation, or whether we need
>> pretty-printers via a format string or a schema, etc.
>> We tried a few ideas for representing keys in our fuse implementations,
>> but don't have an agreement yet.
>
> My gut feel would be to keep it simple and use the same representation
> you use in your existing system calls. Certainly ordinary filenames are
> keys of arbitrary binary data that can include everything except
> a '\0' byte. That they are human readable is a nice convention, but not
> at all fundamental to what they are.
That doesn't work. Map keys are never human readable; they're arbitrary
binary data. That's why representing them as file names is not trivial.
Some pretty-printer is needed.
Again, that is the 2nd goal of bpffs in general. We cannot really solve it
now, because we cannot say 'let's represent keys like X and work
from there', since that will become kernel ABI and we won't be able to
change it.
It's also not clear that thousands of keys can even work as files.
So there is quite a bit of brainstorming still to do for this 2nd goal.
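One candidate mentioned above is a hex representation. Here is a minimal sketch, assuming keys are encoded as lowercase hex so that every byte, including '\0' and '/' (neither is legal in a filename component), becomes a legal filename character. The helper names are hypothetical. This also shows why the choice is an ABI decision: the name is twice the key size, so with Linux's usual 255-byte NAME_MAX only keys of up to 127 bytes fit, and the name says nothing about the key's structure.

```c
#include <stddef.h>

/* Hypothetical: encode 'len' key bytes as lowercase hex into 'name'.
 * 'name' must hold 2 * len + 1 bytes. Returns 0, or -1 if too small. */
int key_to_name(const unsigned char *key, size_t len, char *name, size_t cap)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	if (cap < 2 * len + 1)
		return -1;
	for (i = 0; i < len; i++) {
		name[2 * i] = hex[key[i] >> 4];
		name[2 * i + 1] = hex[key[i] & 0xf];
	}
	name[2 * len] = '\0';
	return 0;
}

/* Map one lowercase hex digit to its value, or -1 if invalid. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Hypothetical inverse: decode a hex filename back into key bytes.
 * Returns the number of bytes written, or -1 on malformed input. */
long name_to_key(const char *name, unsigned char *key, size_t cap)
{
	size_t n = 0;

	while (name[0] && name[1]) {
		int hi = hex_val(name[0]), lo = hex_val(name[1]);

		if (hi < 0 || lo < 0 || n >= cap)
			return -1;
		key[n++] = (unsigned char)(hi << 4 | lo);
		name += 2;
	}
	return name[0] ? -1 : (long)n; /* odd length is malformed */
}
```

Hex round-trips exactly, but for a struct key holding, say, an address and a port, it is no more readable than a hexdump, which is the gap a pretty-printer or schema would have to fill.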