From: Hannes Frederic Sowa <hannes@stressinduktion.org>
To: Alexei Starovoitov <ast@plumgrid.com>,
Daniel Borkmann <daniel@iogearbox.net>,
"Eric W. Biederman" <ebiederm@xmission.com>
Cc: davem@davemloft.net, viro@ZenIV.linux.org.uk, tgraf@suug.ch,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
Date: Tue, 20 Oct 2015 12:07:51 +0200
Message-ID: <1445335671.822349.415077537.0691ED4C@webmail.messagingengine.com>
In-Reply-To: <56259438.3080308@plumgrid.com>
Hello Alexei,
On Tue, Oct 20, 2015, at 03:09, Alexei Starovoitov wrote:
> On 10/19/15 4:02 PM, Hannes Frederic Sowa wrote:
> > I bet commercial software will make use of this ebpf framework, too. And
> > the kernel always helped me and gave me a way to see what is going on,
> > debug which part of my operating system universe interacts with which
> > other part. Merely dropping file descriptors with data attached to them
> > in a filesystem seems not to fulfill my need at all. I would love to
> > see where resources are referenced and why, like I can nowadays.
>
> agree. common fs with hierarchy will give this visibility in
> one place.
>
> >> >It feels you're pushing for cdev only because of that potential
> >> >debugging need. Did you actually face that need? I didn't and
> >> >don't like to add 'nice to have' feature until real need comes.
> > Given that we want to monitor the load of a hashmap for graphing
> > purposes. Or liberate some hashmaps from their restriction on number of
> > keys and make upper bounds configurable by admins who know the
> > dimensions of their systems and not some software deep down buried in
> > the bpf syscall where I might not have access to source code. In tc
> > force e.g. hashmaps to do garbage collection because we cannot be sure
> > that under DoS attacks user space clean up gets scheduled early enough
> > if ebpf adds flows to hashtables. I do see need to expand and implement
> > some kind of policy in the future.
>
> disagree here. admin should not interfere with map parameters.
> What you are proposing above sounds very, very dangerous.
> Admins to configure GC of maps? What do you think the programs will do
> with such sophisticated maps? What kind of networking app do you have
> in mind? Anyway that's a bit off-topic. I'm very curious though.
<off-topic>
One fairly obvious idea is accurate sampling of flows.
</off-topic>
> >> >single task in seccomp can have a chain of bpf progs, so hierarchy
> >> >is already there.
> > And it would be great to inspect them.
>
> again let's not mix criu and lsof-like requirements with 'pin fd'.
> For visibility of normal maps we can add fdinfo and lsof
> can pick it up without any fs or any cdevs.
fdinfo tells me my current position in a file and which locks the file
holds. Nothing like that is meaningful for bpf file descriptors,
because they are kind of special. A new hierarchy would have to be
installed alongside fdinfo/.
> > I am fine with creating maps only by bpf syscall. But to hide
> > configuration details or at least not be really able to query them
> > easily seems odd to me. If we go with the ebpffs how could those
> > attributes be added?
>
> I'm not advocating to hide details. Most of the time maps will not be
> pinned, so fdinfo seems the easiest way to show things like key_size,
> value_size, max_entries, type.
This is an argument in favor of the "fdinfo-like" approach.
So far, if someone wanted to delve into the details of a map, my approach
would have been to take the file descriptor and make it persistent. I have
to think about that some more.
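To make the fdinfo-style option concrete, here is a minimal sketch of how a
tool could consume such output. It assumes the plain "name: value" text
layout that /proc/<pid>/fdinfo/<fd> uses; the field names key_size,
value_size and max_entries come from the proposal quoted above and are an
assumption about what a bpf fdinfo would print, and parse_fdinfo and
read_fdinfo are hypothetical helper names.

```python
from collections import defaultdict


def parse_fdinfo(text):
    """Parse 'name: value' lines into {name: [value, ...]}.

    Values are kept in lists because a name may repeat
    (for example several 'using:' lines for one program).
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line, skip it
        fields[name.strip()].append(value.strip())
    return dict(fields)


def read_fdinfo(pid, fd):
    """Read and parse /proc/<pid>/fdinfo/<fd> (Linux only)."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        return parse_fdinfo(f.read())


# Assumed sample text; a real kernel may print different fields.
sample = "key_size:\t4\nvalue_size:\t8\nmax_entries:\t1024\n"
info = parse_fdinfo(sample)
```

A monitoring tool would call read_fdinfo("self", fd) or pass another pid; the
point is only that everything arrives as one text blob that every consumer
has to parse again.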
> Even if we decide to do it some other way, it's not related to 'pin fd'
> discussion, since debugging/visibility is nice to have for all bpf
> objects. Note that walking of key/value without pretty-printers
> provided by the app is meaningless for admin, so only things
> like 'how much memory this map is using' are useful.
Yes, absolutely, and I am strongly against pretty-printing keys and values
in the kernel.
> Maybe we should try to draft the hierarchy of this common fs.
> How about:
> /sys/kernel/bpf/username/optional_dirs_mkdir_by_user/progX
> and 'cat' of it will print the same as fdinfo for normal maps,
> so admin can see what maps were pinned by user and its cost.
So cat-ing them will produce text output with some details about the
map? This is what I wanted to avoid. The concept with symlinks and small
files seems much cleaner and nicer to me. Also, you cannot add writable
attributes to this filesystem without heavily overloading the files, can
you?
> Inside 'fdinfo' output we can provide pointers to which progs
> are using which maps as
> # cat /sys/kernel/bpf/.../mapX
> key_size: 4
> used_by: /proc/xxx/fd/5
> # cat /sys/kernel/bpf/.../progY
> type: socket
> using: /proc/xxx/fd/6
> using: /sys/kernel/bpf/.../mapZ
> and similar for cat /proc/xxx/fdinfo/6
> but showing hierarchy as directories is a non-starter, since
> it's not a tree.
It is not a tree but a graph, sure; that's why sysfs lets you break
cyclic dependencies by creating symlinks (see the holders/ directories). ;)
And if you implement the same set of features, IMHO you basically
re-implement sysfs. In the beginning we would just expose the basic maps
and there wouldn't be any features in sysfs, but it will be cheap to add
read/write flags on maps etc. (I don't know what people will come up with
yet.) In my opinion those are clearly attributes of a map and should be
defined and managed alongside their holders.
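A minimal sketch of the "small files plus symlinks" layout, built in a
scratch directory so it can be tried without any kernel support. The
directory names (maps/, progs/, using/), the attribute files and all helper
names are assumptions made up for illustration; only the holders/ naming is
borrowed from sysfs.

```python
import os
import tempfile


def write_attr(directory, name, value):
    """Store one attribute as one small file, sysfs style."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, name), "w") as f:
        f.write(f"{value}\n")


def read_attr(directory, name):
    with open(os.path.join(directory, name)) as f:
        return f.read().strip()


def link(target_dir, link_path):
    """Create a relative symlink at link_path pointing to target_dir."""
    parent = os.path.dirname(link_path)
    os.makedirs(parent, exist_ok=True)
    os.symlink(os.path.relpath(target_dir, parent), link_path)


def build_demo_tree(root):
    """Hypothetical layout: a prog uses a map, the map lists its holder."""
    map_dir = os.path.join(root, "maps", "mapZ")
    prog_dir = os.path.join(root, "progs", "progY")
    write_attr(map_dir, "key_size", 4)
    write_attr(prog_dir, "type", "socket")
    # The prog -> map and map -> prog edges form a cycle; symlinks
    # express it without nesting one directory inside the other.
    link(map_dir, os.path.join(prog_dir, "using", "mapZ"))
    link(prog_dir, os.path.join(map_dir, "holders", "progY"))
    return map_dir, prog_dir


root = tempfile.mkdtemp()
map_dir, prog_dir = build_demo_tree(root)
```

Each attribute can then be read, and later possibly written, on its own,
and following using/ or holders/ answers "who references what" with nothing
more than ls and readlink.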
> All of these would be nice, but doesn't have to be implemented
> along with 'pin fd' feature.
The pinfd feature will provide the infrastructure that makes this usable
in the future, so I think it is worth spending time thinking about it
now.
Thanks,
Hannes