From: Ingo Molnar <mingo@kernel.org>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: David Miller <davem@davemloft.net>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andy Lutomirski <luto@amacapital.net>,
Steven Rostedt <rostedt@goodmis.org>,
Daniel Borkmann <dborkman@redhat.com>,
Chema Gonzalez <chema@google.com>,
Eric Dumazet <edumazet@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Brendan Gregg <brendan.d.gregg@gmail.com>,
Namhyung Kim <namhyung@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Andrew Morton <akpm@linux-foundation.org>,
Kees Cook <keescook@chromium.org>,
Linux API <linux-api@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 net-next 4/6] bpf: enable bpf syscall on x64 and i386
Date: Tue, 26 Aug 2014 09:45:34 +0200
Message-ID: <20140826074534.GA19799@gmail.com>
In-Reply-To: <CAMEtUuy1DXFMABAg2Uup5HtqmiJHw0WR=or-z9CfpVMscrVcVg@mail.gmail.com>
* Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Mon, Aug 25, 2014 at 6:07 PM, David Miller <davem@davemloft.net> wrote:
> > From: Alexei Starovoitov <ast@plumgrid.com>
> > Date: Mon, 25 Aug 2014 18:00:56 -0700
> >
> >> -
> >> +asmlinkage long sys_bpf(int cmd, unsigned long arg2, unsigned long arg3,
> >> + unsigned long arg4, unsigned long arg5);
> >
> > Please do not add interfaces with opaque types as arguments.
> >
> > It is impossible for the compiler to type check the args at
> > compile time when userspace tries to use this stuff.
>
> I share this concern. I went with single BPF syscall, because
> alternative is 6 syscalls for every command and more
> syscalls in the future when we'd need to add another command.
We had a similar problem growing the perf syscall - and we were
able to hold to a single syscall, which I think has served us
well. Had we gone with a per-functionality syscall we'd have
something like a dozen syscalls today, scattered
non-contiguously around the syscall space on most platforms.
But note that 'opaque or non-opaque' is a false dichotomy, as
there are 3 options in reality: what we used instead of an opaque
type was an extensible data type, an extensible C structure,
with the expected structure size stored in the structure itself.
See 'struct perf_event_attr':
	SYSCALL_DEFINE5(perf_event_open,
			struct perf_event_attr __user *, attr_uptr,
			pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
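To make the idea concrete, here is a minimal sketch of such a
structure. It is not the real 'struct perf_event_attr' (which does
carry a 'size' field): 'struct demo_attr', its fields, the
DEMO_ATTR_SIZE_* constants and demo_attr_init() are all
hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical extensible attribute structure; not an actual kernel ABI. */
struct demo_attr {
	uint32_t size;	/* sizeof(struct demo_attr) as the caller was built */
	uint32_t cmd;
	uint64_t flags;	/* the first version of the ABI (16 bytes) ends here */
	uint64_t extra;	/* field appended by a later version (24 bytes) */
};

/* Each published layout is identified by its size alone. */
#define DEMO_ATTR_SIZE_VER0	16
#define DEMO_ATTR_SIZE_VER1	24

/*
 * User-space side: zero the whole structure so that unused and future
 * fields read as "not requested", then record the size we were built with.
 */
static void demo_attr_init(struct demo_attr *attr, uint32_t cmd)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->cmd = cmd;
}
```

New fields are only ever appended, so every older layout remains a
prefix of the newer one.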
That way new versions of the data type are immediately obvious to
the kernel, and compatibility can be handled well. Smaller,
previous versions received from old user-space are padded out
transparently to the kernel's version of the structure, with
zeroes filled in.
See perf_copy_attr() in kernel/events/core.c. Instead of
versioning the structure, we in essence use its size as a
fine-grained and robust version indicator.
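The padding step can be sketched in plain user-space C as below.
This is not perf_copy_attr() itself: 'struct demo_kattr',
DEMO_SIZE_VER0 and demo_copy_attr_padded() are hypothetical, and
memcpy() stands in for the kernel's user-copy helpers. The sketch
only covers the 'smaller structure from old user-space' case and
simply rejects anything larger than what it knows.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel-side view of the structure (current version). */
struct demo_kattr {
	uint32_t size;
	uint32_t cmd;
	uint64_t flags;
	uint64_t extra;	/* unknown to old user-space */
};

#define DEMO_SIZE_VER0	16	/* smallest layout ever published */

/*
 * Copy a caller-supplied structure whose first field is its own size.
 * Older, smaller layouts are zero-padded up to the current layout.
 * 'src' must point to at least as many bytes as its size field claims.
 */
static int demo_copy_attr_padded(struct demo_kattr *dst, const void *src)
{
	uint32_t size;

	memcpy(&size, src, sizeof(size));
	if (size < DEMO_SIZE_VER0)
		return -EINVAL;		/* smaller than any known version */
	if (size > sizeof(*dst))
		return -E2BIG;		/* larger layouts: out of scope here */

	memset(dst, 0, sizeof(*dst));	/* fields the caller lacks read as 0 */
	memcpy(dst, src, size);		/* overwrite the prefix the caller has */
	return 0;
}
```

After the copy the rest of the code only ever sees the current
layout, with a zero in every field the caller did not know about.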
That way it's both forwards and backwards compatible, as much as
technically possible: an old kernel can run new user-space, and
new user-space will be able to take advantage of as much of an old
kernel's capabilities as possible, and in the typical case of a
version match there's no extra overhead worth speaking of.
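The other direction, new user-space on an old kernel, can be
sketched the same way: a larger structure is accepted as long as
the part the old kernel does not know about is all zeroes, i.e.
no unknown feature was requested. This is a user-space sketch of
that rule, not the perf code; 'struct old_kernel_attr' and
old_kernel_copy_attr() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Everything a hypothetical *old* kernel knows about (16 bytes). */
struct old_kernel_attr {
	uint32_t size;
	uint32_t cmd;
	uint64_t flags;
};

/*
 * Accept a possibly newer, larger structure from user-space.
 * 'src_len' is the number of readable bytes at 'src'.
 * Returns 0 on success, -EINVAL for a bad size and -E2BIG if the
 * caller set fields that this kernel does not understand.
 */
static int old_kernel_copy_attr(struct old_kernel_attr *dst,
				const unsigned char *src, size_t src_len)
{
	uint32_t size;
	size_t i;

	if (src_len < sizeof(size))
		return -EINVAL;
	memcpy(&size, src, sizeof(size));
	if (size < sizeof(*dst) || size > src_len)
		return -EINVAL;

	/* Any non-zero byte in the unknown tail is an unsupported request. */
	for (i = sizeof(*dst); i < size; i++)
		if (src[i] != 0)
			return -E2BIG;

	memcpy(dst, src, sizeof(*dst));
	return 0;
}
```

When both sides were built against the same layout the loop body
never runs, which matches the 'no extra overhead' case above.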
This way we were able to gradually grow to the sophisticated ABI
you can find in include/uapi/linux/perf_event.h, without having
to touch the syscall interface. (It's not the only method: we
also have a handful of ioctls, where that's the most natural
interface for a perf event fd.)
Thanks,
Ingo