From: Ingo Molnar <mingo@kernel.org>
To: Fam Zheng <famz@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Kees Cook <keescook@chromium.org>,
Andy Lutomirski <luto@amacapital.net>,
David Herrmann <dh.herrmann@gmail.com>,
Alexei Starovoitov <ast@plumgrid.com>,
Miklos Szeredi <mszeredi@suse.cz>,
David Drysdale <drysdale@google.com>,
Oleg Nesterov <oleg@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
Vivek Goyal <vgoyal@redhat.com>,
Mike Frysinger <vapier@gentoo.org>, Theodore Ts'o <tytso@mit.edu>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Rashika Kheria <rashika.kheria@gmail.com>,
Hugh Dickins <hughd@google.com>,
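Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
Josh Triplett <josh@joshtriplett.org>,
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Omar Sandoval <osandov@osandov.com>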
Subject: Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
Date: Wed, 18 Feb 2015 19:49:34 +0100
Message-ID: <20150218184934.GA7493@gmail.com>
In-Reply-To: <20150216010224.GA32421@ad.nay.redhat.com>

* Fam Zheng <famz@redhat.com> wrote:
> On Sun, 02/15 15:00, Jonathan Corbet wrote:
> > On Fri, 13 Feb 2015 17:03:56 +0800
> > Fam Zheng <famz@redhat.com> wrote:
> >
> > > SYNOPSIS
> > >
> > > #include <sys/epoll.h>
> > >
> > > int epoll_pwait1(int epfd, int flags,
> > >                  struct epoll_event *events,
> > >                  int maxevents,
> > >                  struct epoll_wait_params *params);
> >
> > Quick, possibly dumb question: might it make sense to also pass in
> > sizeof(struct epoll_wait_params)? That way, when somebody wants to add
> > another parameter in the future, the kernel can tell which version is in
> > use and they won't have to do an epoll_pwait2()?
> >
>
> Flags can be used for that, if the change is not
> radically different.

Passing in size is generally better than flags, because
that way an extension of the ABI (new field[s])
automatically signals to the kernel what to do with
old binaries - while extending the functionality of new
binaries, without sacrificing functionality.

With flags you are either limited to the same structure
size - or have to decode a 'size' value from the flags
value - which is fragile (and in which case a real 'size'
parameter is better).

In the perf ABI we use something like that: there's a
perf_event_attr.size field that iterates the ABI forward,
while still being binary compatible with older software.

If old binaries pass in a smaller structure to a newer
kernel then the kernel pads the new fields with zero by
default - that way the kernel internals are never burdened
with compatibility details and data format versions.
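
The zero-padding rule can be sketched in plain C. This is only a
user-space illustration, not kernel code: struct wait_params_v1,
struct wait_params_v2, their fields and copy_params() are hypothetical
names, and memcpy() stands in for the kernel's copy from user memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical v1 layout that an old binary was compiled against. */
struct wait_params_v1 {
    int  clockid;
    long timeout_ns;
};

/* Hypothetical v2 layout known to a newer kernel: v1 plus one new field. */
struct wait_params_v2 {
    int           clockid;
    long          timeout_ns;
    unsigned long new_flags;   /* added later; 0 means "old behaviour" */
};

/*
 * Copy a caller structure of usize bytes into the newest layout.
 * Fields the caller did not know about are left as zero.
 * Returns 0 on success, -1 if usize is larger than this sketch handles.
 */
static int copy_params(struct wait_params_v2 *dst, const void *src, size_t usize)
{
    if (usize > sizeof(*dst))
        return -1;

    memset(dst, 0, sizeof(*dst));   /* unknown fields default to zero */
    memcpy(dst, src, usize);        /* overlay whatever the caller supplied */
    return 0;
}
```

After the copy, the rest of the code only ever sees the newest layout,
so a zero in new_flags has to mean "behave as before the field existed".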

If new user-space passes in a larger structure than the
kernel can handle then the kernel returns an error - this
way user-space can transparently support conditional
features and fallback logic.
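
The user-space side of that fallback might look roughly like this. It
is a sketch under assumptions: struct params_v1, struct params_v2,
mock_old_kernel_call() and call_with_fallback() are made-up names, the
mock stands in for a real syscall on an older kernel, and E2BIG is used
because that is the error perf_event_open(2) documents for an
unsupported size. The actual perf implementation may differ in detail.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical structure versions; v2 extends v1 with one trailing field. */
struct params_v1 { long timeout_ns; };
struct params_v2 { long timeout_ns; unsigned long extra_flags; };

/* Stand-in for an older kernel that only understands the v1 layout. */
static int mock_old_kernel_call(const void *params, size_t size)
{
    (void)params;
    if (size > sizeof(struct params_v1)) {
        errno = E2BIG;   /* structure is larger than this kernel handles */
        return -1;
    }
    return 0;
}

/*
 * Try the newest layout first, then retry with the older one.
 * Returns the structure version that was accepted, or -1 on failure.
 */
static int call_with_fallback(long timeout_ns, unsigned long extra_flags)
{
    struct params_v2 p2 = { .timeout_ns = timeout_ns, .extra_flags = extra_flags };
    struct params_v1 p1 = { .timeout_ns = timeout_ns };

    if (mock_old_kernel_call(&p2, sizeof(p2)) == 0)
        return 2;
    if (errno != E2BIG)
        return -1;       /* unrelated failure: do not retry */

    /* Feature unavailable: fall back to what the old layout offers. */
    if (mock_old_kernel_call(&p1, sizeof(p1)) == 0)
        return 1;
    return -1;
}
```

The caller learns from the return value whether the newer feature is in
effect, without needing a separate version probe.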

It works really well, we've done literally a hundred perf
ABI extensions this way in the last 4+ years, in a pretty
natural fashion, without littering the kernel (or
user-space) with version legacies and without breaking
existing perf tooling.

Other syscall ABIs already get painful when trying to
handle 2-3 data structure versions, so people either give
up, or add flags kludges or go to new syscall entries:
which is painful in its own fashion and adds unnecessary
latency to feature introduction as well.

Thanks,

	Ingo