From: Fam Zheng <famz@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>,
	Andy Lutomirski <luto@amacapital.net>,
	David Herrmann <dh.herrmann@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Miklos Szeredi <mszeredi@suse.cz>,
	David Drysdale <drysdale@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Vivek Goyal <vgoyal@redhat.com>,
	Mike Frysinger <vapier@gentoo.org>, Theodore Ts'o <tytso@mit.edu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rashika Kheria <rashika.kheria@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Mathieu Desnoyers <mathieu.desnoyers@ef>
Subject: Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
Date: Wed, 25 Feb 2015 11:30:09 +0800
Message-ID: <20150225033009.GA20485@ad.nay.redhat.com>
In-Reply-To: <20150218184934.GA7493@gmail.com>

On Wed, 02/18 19:49, Ingo Molnar wrote:
> 
> * Fam Zheng <famz@redhat.com> wrote:
> 
> > On Sun, 02/15 15:00, Jonathan Corbet wrote:
> > > On Fri, 13 Feb 2015 17:03:56 +0800
> > > Fam Zheng <famz@redhat.com> wrote:
> > > 
> > > > SYNOPSIS
> > > > 
> > > >        #include <sys/epoll.h>
> > > > 
> > > >        int epoll_pwait1(int epfd, int flags,
> > > >                         struct epoll_event *events,
> > > >                         int maxevents,
> > > >                         struct epoll_wait_params *params);
> > > 
> > > Quick, possibly dumb question: might it make sense to also pass in 
> > > sizeof(struct epoll_wait_params)?  That way, when somebody wants to add
> > > another parameter in the future, the kernel can tell which version is in
> > > use and they won't have to do an epoll_pwait2()?
> > > 
> > 
> > Flags can be used for that, if the change is not 
> > radically different.
> 
> Passing in size is generally better than flags, because 
> that way an extension of the ABI (new field[s]) 
> automatically signals towards the kernel what to do with 
> old binaries - while extending the functionality of new 
> binaries, without sacrificing functionality.
> 
> With flags you are either limited to the same structure 
> size - or have to decode a 'size' value from the flags 
> value - which is fragile (and in which case a real 'size' 
> parameter is better).
> 
> In the perf ABI we use something like that: there's a 
> perf_attr.size parameter that iterates the ABI forward, 
> while still being binary compatible with older software.
> 
> If old binaries pass in a smaller structure to a newer 
> kernel then the kernel pads the new fields with zero by 
> default - that way the kernel internals are never burdened 
> with compatibility details and data format versions.
> 
> If new user-space passes in a larger structure than the 
> kernel can handle then the kernel returns an error - this 
> way user-space can transparently support conditional 
> features and fallback logic.
> 
> It works really well, we've done literally a hundred perf 
> ABI extensions this way in the last 4+ years, in a pretty 
> natural fashion, without littering the kernel (or 
> user-space) with version legacies and without breaking 
> existing perf tooling.
> 
> Other syscall ABIs already get painful when trying to 
> handle 2-3 data structure versions, so people either give 
> up, or add flags kludges or go to new syscall entries: 
> which is painful in its own fashion and adds unnecessary 
> latency to feature introduction as well.
> 

Excellent. This now makes a lot of sense to me, thanks to your explanations,
Ingo.

I'll add the "size" field in the next revision.

Thanks,
Fam
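
The size-based scheme Ingo describes can be sketched in user-space C as
follows. This is only an illustration, not code from the patch series: the
struct layout, the field names (clockid, timeout_ns, new_field) and the helper
load_wait_params() are all hypothetical, and a plain memcpy() stands in for
the copy from user memory that a real kernel would do. A smaller (older)
struct is zero-padded, and a struct larger than this "kernel" knows about is
rejected with an error, as in the quoted explanation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block. "v1" ended after timeout_ns;
 * "v2" appended new_field. The size field always comes first. */
struct wait_params {
    uint32_t size;       /* sizeof() of the struct the caller was built with */
    int32_t  clockid;
    int64_t  timeout_ns;
    uint64_t new_field;  /* hypothetical v2 extension */
};

/* Smallest size ever shipped (the v1 layout). */
#define WAIT_PARAMS_SIZE_V1 offsetof(struct wait_params, new_field)

static_assert(offsetof(struct wait_params, new_field) == 16,
              "the v1 part of the layout must never change");

/* Simulates the kernel side: returns 0 on success or a negative errno. */
static int load_wait_params(struct wait_params *dst, const void *src)
{
    uint32_t usize;

    /* Read the caller's idea of the struct size first. */
    memcpy(&usize, src, sizeof(usize));

    if (usize < WAIT_PARAMS_SIZE_V1)
        return -EINVAL;          /* too small to be any known version */
    if (usize > sizeof(*dst))
        return -E2BIG;           /* newer than this "kernel" understands */

    /* Old binary: fields it does not know about default to zero. */
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, usize);
    return 0;
}
```

With this shape, a caller built against the 16-byte v1 layout keeps working
on a newer kernel (new_field reads as 0), while a newer caller can detect
-E2BIG on an older kernel and fall back to the smaller struct.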
