From: Fam Zheng <famz@redhat.com>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Kees Cook <keescook@chromium.org>,
Andy Lutomirski <luto@amacapital.net>,
David Herrmann <dh.herrmann@gmail.com>,
Alexei Starovoitov <ast@plumgrid.com>,
Miklos Szeredi <mszeredi@suse.cz>,
David Drysdale <drysdale@google.com>,
Oleg Nesterov <oleg@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
Vivek Goyal <vgoyal@redhat.com>,
Mike Frysinger <vapier@gentoo.org>,
"Theodore Ts'o" <tytso@mit.edu>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Rashika Kheria <rashika.kheria@gmail.com>,
Hugh Dickins <hughd@google.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
Josh Triplett <josh@joshtriplett.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Omar Sandoval <osandov@osandov.com>
Subject: Re: [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
Date: Thu, 5 Feb 2015 17:01:46 +0800 [thread overview]
Message-ID: <20150205090146.GC5048@ad.nay.redhat.com> (raw)
In-Reply-To: <54D31F6F.4030301@gmail.com>
On Thu, 02/05 08:44, Michael Kerrisk (man-pages) wrote:
> Hello Fam Zheng,
>
> On 02/05/2015 02:52 AM, Fam Zheng wrote:
> > On Wed, 02/04 13:44, Michael Kerrisk (man-pages) wrote:
> >> Hello Fam Zheng,
> >>
> >> On 02/04/2015 11:36 AM, Fam Zheng wrote:
> >>> Changes from v1 (https://lkml.org/lkml/2015/1/20/189):
> >>>
> >>> - As discussed in previous thread [1], split the call to epoll_ctl_batch and
> >>> epoll_pwait. [Michael]
> >>>
> >>> - Fix memory leaks. [Omar]
> >>>
> >>> - Add a short comment about the ignored copy_to_user failure. [Omar]
> >>>
> >>> - Cover letter rewritten.
> >>>
> >>> This adds two new system calls as described below. I mentioned glibc wrapping
> >>> of sigarg in epoll_pwait1 description, so don't read it as end user man page
> >>> yet.
> >>
> >> Fair enough. But I think it would be helpful to say a little more already.
> >> See my comment below.
> >>
> >>> One caveat is the possible failure of the last copy_to_user in epoll_ctl_batch,
> >>> which returns per command error code. Ideas to improve that are welcome!
> >>>
> >>> 1) epoll_ctl_batch
> >>> ------------------
> >>>
> >>> NAME
> >>> epoll_ctl_batch - modify an epoll descriptor in batch
> >>>
> >>> SYNOPSIS
> >>>
> >>> #include <sys/epoll.h>
> >>>
> >>> int epoll_ctl_batch(int epfd, int flags,
> >>> int ncmds, struct epoll_ctl_cmd *cmds);
> >>>
> >>> DESCRIPTION
> >>>
> >>> The system call performs a batch of epoll_ctl operations. It allows
> >>> efficient update of events on this epoll descriptor.
> >>>
> >>> Flags is reserved and must be 0.
> >>>
> >>> Each operation is specified as an element in the cmds array, defined as:
> >>>
> >>> struct epoll_ctl_cmd {
> >>>
> >>> /* Reserved flags for future extension, must be 0. */
> >>> int flags;
> >>>
> >>> /* The same as epoll_ctl() op parameter. */
> >>> int op;
> >>>
> >>> /* The same as epoll_ctl() fd parameter. */
> >>> int fd;
> >>>
> >>> /* The same as the "events" field in struct epoll_event. */
> >>> uint32_t events;
> >>>
> >>> /* The same as the "data" field in struct epoll_event. */
> >>> uint64_t data;
> >>>
> >>> /* Output field, will be set to the return code after this
> >>> * command is executed by kernel */
> >>> int error_hint;
> >>
> >> Why 'error_hint', rather than just stay 'status' or 'result'? I mean
> >> is it really a "hint"? Also, it can be 0 meaning "success" (no error).
> >
> > Because the convention is weak here: the copy_to_user, to assign values to this
> > field in user provided array, could (very unlikely) fail, at which point the
> > commands are already executed so there is no return. This corner case
> > inconsistency makes it hard to be a contract.
>
> I still think it's a poor name. Yes, it could unlikely fail.
> Still, 'status' or 'result_status' or something like that is better.
>
> >>> };
> >>>
> >>> Commands are executed in their order in cmds, and if one of them failed,
> >>> the rest after it are not tried.
> >>>
> >>> Not that this call isn't atomic in terms of updating the epoll
> >>> descriptor, which means a second epoll_ctl or epoll_ctl_batch call
> >>> during the first epoll_ctl_batch may make the operation sequence
> >>> interleaved. However, each single epoll_ctl_cmd operation has the same
> >>> semantics as a epoll_ctl call.
> >>
> >> (Good to include that warning!)
> >>
> >>> RETURN VALUE
> >>>
> >>> If one or more of the parameters are incorrect, -1 is returned and errno
> >>> is set appropriately. Otherwise, the number of succeeded commands is
> >>> returned.
> >>>
> >>> Each error_hint field may be set to the error code or 0, depending on
> >>> the result of the command. If there is some error in returning the error
> >>> to user, it may also be unchanged, even though the command may actually
> >>> be executed. In this case, it's still ensured that the i-th (i = ret)
> >>> command is the failed command.
> >>
> >> Sorry -- I'm not clear here. Can you say some more here? What sort
> >> of error might there be when "returning the error to the user"?
> >
> > As explained above. See also the last comment of Omar:
> >
> > https://lkml.org/lkml/2015/1/21/103
>
> Okay -- but could you please actually explain this in more detail in the
> man page. Then others do not need to ask the same question.
OK, I'll name it 'status' and add more details in next revision.
>
> > <...>
> >
> >>> DESCRIPTION
> >>>
> >>> The epoll_pwait1 system call differs from epoll_pwait only in parameter
> >>> types. The first difference is timeout, a pointer to timespec structure
> >>> which allows nanosecond presicion; the second difference, which should
> >>> probably be wrapper by glibc and only expose a sigset_t pointer as in
> >>> pselect6.
> >>
> >> Here I think it still helps to explain that 'struct sigags' is a sigset_t* +
> >> the size of the pointed-to set.
> >
> > Yes, will add.
> >
> >>
> >>> If timeout is NULL, it's treated as if 0 is specified in epoll_pwait
> >>
> >> The convention I would expect here is that NULL means infinite timeout,
> >> like select(), and timeout == {0,0} would get the "return immediately"
> >> behavior. Why did you choose your convention? And, how does one otherwise
> >> request an infinite timeout?
> >
> > I'm wrong. I should follow the conventions of pselect and ppoll. Thanks for
> > catching that.
>
> Good.
>
> >>
> >>> (return immediately). Otherwise it's converted to nanosecond scalar,
> >>> again, with the same convention as epoll_pwait's timeout.
> >>>
> >>> RETURN VALUE
> >>>
> >>> The same as said in epoll_pwait(2).
> >>>
> >>> ERRORS
> >>>
> >>> The same as said in man epoll_pwait(2), plus:
> >>>
> >>> EINVAL flags is not zero.
> >>>
> >>>
> >>> CONFORMING TO
> >>>
> >>> epoll_pwait1() is Linux-specific.
> >>>
> >>> SEE ALSO
> >>>
> >>> epoll_create(2), epoll_ctl(2), epoll_wait(2), epoll_pwait(2), epoll(7)
> >>
> >> In your earlier patch set, there was the ability to select the clock
> >> used for timeouts. Why did this go away? I'm not sure whether we need
> >> that functionality or not, but it would be good to know why you
> >> dropped it this time.
> >>
> >
> > No room for another "int clockid". Options:
>
> Ahh yes. I overlooked that. But again, if your commit message or
> the man page said something about why you dropped this argument,
> then we would not run around in circles having the same
> conversations repeatedly.
>
> I keep loving this recent quote from akpm:
> http://lwn.net/Articles/616226/
Good point!
>
> > 0) Don't have it.
> >
> > 1) "Or" into lower bits of flags.
> >
> > 2) Encode into flags bits.
> >
> > 3) Squash "int clockid" in the last argument (currently "sig" but need to name
> > it "spec", again).
>
> Well, it's up to you. In https://lkml.org/lkml/2015/1/20/189
> you initially proposed to have the clockid. Do you need it,
> or not? I am agnostic about it. If you do decide you want it,
> then I think (3) is the best option.
Then let's do it this way. clockid would be useful because different
applications care about different clocks.
>
> I'm very pleased that you expand this man page as you go.
> But, for the next iteration, I think it would be better to make
> it even more complete--it really is helpful to have documentation
> as a starting point to discuss the API, and the better the doc,
> the better the discussion.
>
Sure, I'll take your points. Thanks for the advice!
Fam
next prev parent reply other threads:[~2015-02-05 9:03 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-04 10:36 [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
2015-02-04 10:36 ` [PATCH RFC v2 1/7] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
2015-02-04 10:36 ` [PATCH RFC v2 2/7] epoll: Specify clockid explicitly Fam Zheng
2015-02-04 10:36 ` [PATCH RFC v2 3/7] epoll: Extract ep_ctl_do Fam Zheng
2015-02-04 10:36 ` [PATCH RFC v2 4/7] epoll: Add implementation for epoll_ctl_batch Fam Zheng
2015-02-04 10:36 ` [PATCH RFC v2 5/7] x86: Hook up epoll_ctl_batch syscall Fam Zheng
2015-02-04 10:36 ` [PATCH RFC v2 6/7] epoll: Add implementation for epoll_pwait1 Fam Zheng
2015-02-04 10:36 ` [PATCH RFC v2 7/7] x86: Hook up epoll_pwait1 syscall Fam Zheng
2015-02-04 10:50 ` [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1 Fam Zheng
2015-02-04 12:44 ` Michael Kerrisk (man-pages)
2015-02-05 1:52 ` Fam Zheng
2015-02-05 7:44 ` Michael Kerrisk (man-pages)
2015-02-05 9:01 ` Fam Zheng [this message]
2015-02-04 21:38 ` Andy Lutomirski
2015-02-05 1:51 ` Fam Zheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150205090146.GC5048@ad.nay.redhat.com \
--to=famz@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=ast@plumgrid.com \
--cc=davem@davemloft.net \
--cc=dh.herrmann@gmail.com \
--cc=drysdale@google.com \
--cc=heiko.carstens@de.ibm.com \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=josh@joshtriplett.org \
--cc=keescook@chromium.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@rasmusvillemoes.dk \
--cc=luto@amacapital.net \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@redhat.com \
--cc=mszeredi@suse.cz \
--cc=mtk.manpages@gmail.com \
--cc=oleg@redhat.com \
--cc=osandov@osandov.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rashika.kheria@gmail.com \
--cc=tglx@linutronix.de \
--cc=tytso@mit.edu \
--cc=vapier@gentoo.org \
--cc=vgoyal@redhat.com \
--cc=viro@zeniv.linux.org.uk \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox