From: Ingo Molnar <mingo@kernel.org>
To: Fam Zheng <famz@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>,
	Andy Lutomirski <luto@amacapital.net>,
	David Herrmann <dh.herrmann@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Miklos Szeredi <mszeredi@suse.cz>,
	David Drysdale <drysdale@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Vivek Goyal <vgoyal@redhat.com>,
	Mike Frysinger <vapier@gentoo.org>, Theodore Ts'o <tytso@mit.edu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rashika Kheria <rashika.kheria@gmail.com>,
	Hugh Dickins <hughd@google.com>,
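	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Omar Sandoval <osandov@osandov.com>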
Subject: Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
Date: Wed, 18 Feb 2015 19:49:34 +0100
Message-ID: <20150218184934.GA7493@gmail.com>
In-Reply-To: <20150216010224.GA32421@ad.nay.redhat.com>


* Fam Zheng <famz@redhat.com> wrote:

> On Sun, 02/15 15:00, Jonathan Corbet wrote:
> > On Fri, 13 Feb 2015 17:03:56 +0800
> > Fam Zheng <famz@redhat.com> wrote:
> > 
> > > SYNOPSIS
> > > 
> > >        #include <sys/epoll.h>
> > > 
> > >        int epoll_pwait1(int epfd, int flags,
> > >                         struct epoll_event *events,
> > >                         int maxevents,
> > >                         struct epoll_wait_params *params);
> > 
> > Quick, possibly dumb question: might it make sense to also pass in 
> > sizeof(struct epoll_wait_params)?  That way, when somebody wants to add
> > another parameter in the future, the kernel can tell which version is in
> > use and they won't have to do an epoll_pwait2()?
> > 
> 
> Flags can be used for that, if the change is not 
> radically different.

Passing in a size is generally better than flags, because 
that way an extension of the ABI (new field[s]) 
automatically tells the kernel how to treat old binaries, 
while new binaries get the extended functionality - 
without sacrificing any existing functionality.

With flags you are either limited to the same structure 
size - or have to decode a 'size' value from the flags 
value - which is fragile (and in which case a real 'size' 
parameter is better).
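The following is a minimal sketch of how a size-versioned parameter block can be laid out. The structure `struct wait_params`, its fields and the `WAIT_PARAMS_SIZE_VER*` constants are hypothetical and do not reproduce the `epoll_wait_params` layout from the patch set. The `_SIZE_VER0` naming only borrows the convention of perf's `PERF_ATTR_SIZE_VER*` constants. Because new fields are only ever appended, each older layout is a prefix of the newer one, and the size alone identifies which version the caller was built against. No flag bits are needed for this.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical extensible parameter block (illustration only).
 * Rule: new fields are only ever appended at the end, so every older
 * layout is a byte-for-byte prefix of the newer one.
 */
struct wait_params {
	uint32_t size;        /* caller sets this to sizeof(struct wait_params) */
	uint32_t clockid;     /* present since version 0 */
	int64_t  timeout_ns;  /* present since version 0 */
	/* --- hypothetically added in version 1 --- */
	uint64_t sigmask;     /* 0 means "no signal mask" */
};

/* Size of each published version = offset of the first field added after it. */
#define WAIT_PARAMS_SIZE_VER0 offsetof(struct wait_params, sigmask)
#define WAIT_PARAMS_SIZE_VER1 sizeof(struct wait_params)

/*
 * Map a caller-supplied size to the newest version it fully contains.
 * Sizes beyond the newest known version still map to 1 here; whether
 * such a larger structure is acceptable is a separate copy-in check.
 */
int wait_params_version(uint32_t size)
{
	if (size >= WAIT_PARAMS_SIZE_VER1)
		return 1;
	if (size >= WAIT_PARAMS_SIZE_VER0)
		return 0;
	return -1; /* too small to be any known version */
}
```

With this layout, version 0 is 16 bytes and version 1 is 24 bytes. A binary compiled against the old header passes 16 and needs no extra flag to say so.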

In the perf ABI we use something like that: there's a 
perf_event_attr.size field that moves the ABI forward, 
while staying binary compatible with older software.
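On the user-space side, the perf convention amounts to zeroing the whole `struct perf_event_attr` and setting `size` before calling perf_event_open(2). The sketch below assumes a Linux system with `<linux/perf_event.h>` installed. glibc provides no wrapper for this syscall, so the call goes through syscall(2). The helper names `cycles_attr_init` and `open_cycles_counter` are only illustrative.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill in an attr for a user-space CPU-cycles counter. */
void cycles_attr_init(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));  /* fields we don't use must be zero */
	attr->size = sizeof(*attr);      /* the struct size we were compiled against */
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->disabled = 1;              /* start stopped; enable via ioctl later */
	attr->exclude_kernel = 1;        /* count user-space cycles only */
}

/* Open the counter for the calling thread on any CPU; returns an fd or -1. */
int open_cycles_counter(void)
{
	struct perf_event_attr attr;

	cycles_attr_init(&attr);
	return (int)syscall(SYS_perf_event_open, &attr,
			    0 /* pid: calling thread */, -1 /* cpu: any */,
			    -1 /* group_fd: none */, 0UL /* flags */);
}
```

The caller never states a version explicitly: `attr.size` carries it, and it is at least `PERF_ATTR_SIZE_VER0`, the size of the first published structure.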

If old binaries pass in a smaller structure to a newer 
kernel, then the kernel zero-fills the new fields by 
default - that way the kernel internals are never burdened 
with compatibility details and data format versions.

If new user-space passes in a larger structure than the 
kernel can handle, then the kernel returns an error - this 
way user-space can transparently support conditional 
features and fallback logic.
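The two rules above (zero-fill short structures, reject oversized ones) fit into a single copy-in helper. The sketch below is a user-space simulation with hypothetical names. `struct wait_params_k` and `params_copy_in()` stand in for a syscall's copy_from_user() path and are not kernel API. It also applies a refinement used by perf, which later kernels generalised as copy_struct_from_user(): an oversized structure is still accepted if every byte beyond what the kernel understands is zero.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel-side view of the parameter block (the "new kernel"). */
struct wait_params_k {
	uint32_t size;
	uint32_t clockid;
	int64_t  timeout_ns;
	uint64_t sigmask;   /* newest field; old binaries never set it */
};

/* The first published version ended just before 'sigmask'. */
#define WAIT_PARAMS_K_SIZE_VER0 offsetof(struct wait_params_k, sigmask)

/*
 * Copy a caller-supplied block of 'usize' bytes into 'dst'.
 *  - usize < VER0:  not any known version, -EINVAL.
 *  - usize < ksize: copy what was given and zero-fill the rest, so old
 *                   binaries get default (zero) behaviour for new fields.
 *  - usize > ksize: accept only if the bytes this kernel doesn't know
 *                   are all zero; otherwise -E2BIG, so that user space
 *                   can detect the missing feature and fall back.
 */
int params_copy_in(struct wait_params_k *dst, const void *src, size_t usize)
{
	const size_t ksize = sizeof(*dst);
	const unsigned char *p = src;

	if (usize < WAIT_PARAMS_K_SIZE_VER0)
		return -EINVAL;

	for (size_t i = ksize; i < usize; i++)
		if (p[i] != 0)
			return -E2BIG;

	memset(dst, 0, ksize);
	memcpy(dst, src, usize < ksize ? usize : ksize);
	return 0;
}
```

A newer binary that gets -E2BIG can clear the fields for the unsupported feature and retry. That is the conditional-feature and fallback path described above, and neither side needs version numbers.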

It works really well: we've done literally a hundred perf 
ABI extensions this way in the last 4+ years, in a pretty 
natural fashion, without littering the kernel (or 
user-space) with version legacies and without breaking 
existing perf tooling.

Other syscall ABIs already become painful when they have to 
handle 2-3 data structure versions, so people either give 
up, add flag kludges, or add new syscall entries - which 
is painful in its own way and also adds unnecessary 
latency to the introduction of new features.

Thanks,

	Ingo
