linux-fsdevel.vger.kernel.org archive mirror

* [RESEND PATCH 0/3] epoll: Add epoll_pwait1 syscall
@ 2015-01-08  9:16 Fam Zheng
  2015-01-08  9:16 ` [RESEND PATCH 1/3] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Fam Zheng @ 2015-01-08  9:16 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86-DgEjT+Ai2ygdnm+yROfE0A, Alexander Viro, Andrew Morton,
	Miklos Szeredi, Juri Lelli, Zach Brown, David Drysdale, Fam Zheng,
	Kees Cook, Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh

[Resending because my script messed up the recipient format; sorry for the noise.]

Applications can use the epoll interface when they need to poll a large number
of files in their main loops, to achieve better performance than ppoll(2).
There is one concern, however: epoll only takes timeout parameters in
milliseconds, rather than nanoseconds.

That is a drawback we should address. As a real case, in QEMU we run into a
scalability issue with ppoll(2) when many devices are attached to a guest, in
which case many host fds, such as virtual disk images and sockets, need to be
polled by the main loop. As a result we are looking at switching to epoll, but
the coarse timeout precision is a problem, as explained below.

We're already using prctl(PR_SET_TIMERSLACK, 1), which is necessary to
implement timers in the main loop; and we call ppoll(2) with the next firing
timer as the timeout, so when ppoll(2) returns, we know that we have more work
to do (either handling IO events, or firing a timer callback). This is natural
and efficient, except that ppoll(2) itself is slow.

Now we want to switch to epoll to speed up the polling. However, the timer
slack setting will effectively be undone, because we will have to round the
timeout up to milliseconds in order to honor the timer contract. Consequently,
this hurts general responsiveness.
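
To make the rounding concrete, here is a minimal userspace sketch of the
conversion an event loop would have to do before calling epoll_wait(2). It is
not QEMU's actual code; the helper name ns_to_ms_round_up is hypothetical.
Rounding up keeps the timer contract (never wake before the deadline), but it
turns a 1 ns timeout into a full millisecond.

```c
#include <limits.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/*
 * Hypothetical helper: convert a relative timeout in nanoseconds to the
 * int millisecond value taken by epoll_wait(2).  A negative input means
 * "block forever" and maps to -1.  Positive values are rounded up so the
 * wait never ends before the requested deadline.
 */
static int ns_to_ms_round_up(int64_t ns)
{
	int64_t ms;

	if (ns < 0)
		return -1;
	ms = ns / NSEC_PER_MSEC + (ns % NSEC_PER_MSEC != 0);
	return ms > INT_MAX ? INT_MAX : (int)ms;
}
```

With this helper, any timeout between 1 ns and 1 ms is passed to epoll_wait(2)
as 1, which is the loss of precision described above.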

Note: there are two alternatives that do not require changing the kernel:

1) Lead with a ppoll(2) on the epollfd only, with a nanosecond timeout. It
won't be slow, as only one fd is polled, so there is no scalability issue. If
there are events, we know from ppoll(2)'s return value, and we then do the
epoll_wait(2) with timeout=0; otherwise, there can't be events for the epoll,
so we skip the following epoll_wait and just continue with other work.

2) Set up a timerfd and add it to the epoll, then call
epoll_wait(..., timeout=-1). The timerfd should force epoll_wait to return
when it expires, even if no other events have arrived. This inherently gives
us the timerfd's precision. Note that for each poll the desired timeout is
different, because the next timer is different, so before each epoll_wait(2)
there will be a timerfd_settime syscall to set it to the proper value.
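
For illustration, the two alternatives could be sketched in userspace roughly
as follows. This is only a sketch under stated assumptions: the function names
wait_via_ppoll and wait_via_timerfd are hypothetical, and in the second
function tfd is assumed to be a timerfd that has already been added to epfd
with EPOLLIN.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>

/*
 * Alternative 1: ppoll(2) on the epoll fd alone with a nanosecond timeout,
 * then a non-blocking epoll_wait(2) only if the epoll fd is readable.
 */
static int wait_via_ppoll(int epfd, struct epoll_event *evs, int maxevents,
			  const struct timespec *timeout)
{
	struct pollfd pfd = { .fd = epfd, .events = POLLIN };
	int ret = ppoll(&pfd, 1, timeout, NULL);

	if (ret <= 0)
		return ret;	/* 0: timed out, -1: error (see errno) */
	return epoll_wait(epfd, evs, maxevents, 0);
}

/*
 * Alternative 2: re-arm a timerfd that is already registered in epfd, then
 * block in epoll_wait(2) without a timeout of its own.  When the timer
 * fires, tfd shows up in the returned events and the caller still has to
 * read(2) it to clear the expiration count.
 */
static int wait_via_timerfd(int epfd, int tfd, struct epoll_event *evs,
			    int maxevents, const struct timespec *timeout)
{
	struct itimerspec its = { .it_value = *timeout };

	/* An all-zero it_value would disarm the timer, so just poll. */
	if (timeout->tv_sec == 0 && timeout->tv_nsec == 0)
		return epoll_wait(epfd, evs, maxevents, 0);
	if (timerfd_settime(tfd, 0, &its, NULL) < 0)
		return -1;
	return epoll_wait(epfd, evs, maxevents, -1);
}
```

Either way, each loop iteration makes two syscalls on the path where it used to
make one.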

Unfortunately, both approaches require one more syscall per iteration compared
to the original single ppoll(2), the cost of which is not negligible when we
talk about nanosecond granularity.

Fam

Fam Zheng (3):
  epoll: Extract epoll_wait_do and epoll_pwait_do
  epoll: Add implementation for epoll_pwait1
  x86: hook up epoll_pwait1 syscall

 arch/x86/syscalls/syscall_32.tbl |   1 +
 arch/x86/syscalls/syscall_64.tbl |   1 +
 fs/eventpoll.c                   | 160 +++++++++++++++++++++++----------------
 include/linux/syscalls.h         |   4 +
 kernel/sys_ni.c                  |   3 +
 5 files changed, 103 insertions(+), 66 deletions(-)

-- 
1.9.3

* [RESEND PATCH 1/3] epoll: Extract epoll_wait_do and epoll_pwait_do
  2015-01-08  9:16 [RESEND PATCH 0/3] epoll: Add epoll_pwait1 syscall Fam Zheng
@ 2015-01-08  9:16 ` Fam Zheng
  2015-01-08  9:16 ` [RESEND PATCH 2/3] epoll: Add implementation for epoll_pwait1 Fam Zheng
  2015-01-08  9:16 ` [RESEND PATCH 3/3] x86: hook up epoll_pwait1 syscall Fam Zheng
  2 siblings, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2015-01-08  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Alexander Viro,
	Andrew Morton, Miklos Szeredi, Juri Lelli, Zach Brown,
	David Drysdale, Fam Zheng, Kees Cook, Alexei Starovoitov,
	David Herrmann, Dario Faggioli, Theodore Ts'o, Peter Zijlstra,
	Vivek Goyal, Mike Frysinger, Heiko Carstens, Rasmus Villemoes,
	Oleg Nesterov, Mathieu Desnoyers, Fabian Frederick, Josh

In preparation for epoll_pwait1, this allows sharing code with the upcoming
new syscall. The new functions use a timespec for the timeout.
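
As a rough userspace illustration of the timeout semantics after this change
(the helper names ms_to_timespec and classify_timeout and the enum are
hypothetical and exist only in this sketch), the millisecond conversion done
in the syscall entry points and the branch order at the top of ep_poll() can
be mirrored like this:

```c
#include <stddef.h>
#include <time.h>

#define MSEC_PER_SEC  1000L
#define NSEC_PER_MSEC 1000000L

enum wait_mode { WAIT_NONBLOCK, WAIT_FOREVER, WAIT_TIMED };

/* Same arithmetic as the converted epoll_wait()/epoll_pwait() entry points. */
static struct timespec ms_to_timespec(int timeout)
{
	struct timespec ts = {
		.tv_sec = timeout / MSEC_PER_SEC,
		.tv_nsec = (timeout % MSEC_PER_SEC) * NSEC_PER_MSEC,
	};
	return ts;
}

/* Mirrors the order of the checks at the top of the patched ep_poll(). */
static enum wait_mode classify_timeout(const struct timespec *ts)
{
	if (!ts || (ts->tv_sec == 0 && ts->tv_nsec == 0))
		return WAIT_NONBLOCK;	/* NULL or zero: do not block */
	if (ts->tv_sec >= 0 && ts->tv_nsec >= 0)
		return WAIT_TIMED;	/* finite relative timeout */
	return WAIT_FOREVER;		/* any negative field: block */
}
```

Because C division truncates toward zero, a timeout of -1 converts to
tv_sec = 0 and a negative tv_nsec, so the existing "negative means block
forever" behaviour of epoll_wait(2) is preserved.
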

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c | 136 +++++++++++++++++++++++++++++----------------------------
 1 file changed, 70 insertions(+), 66 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index d77f944..117ba72 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1554,15 +1554,12 @@ static int ep_send_events(struct eventpoll *ep,
 	return ep_scan_ready_list(ep, ep_send_events_proc, &esed, 0, false);
 }
 
-static inline struct timespec ep_set_mstimeout(long ms)
+static inline struct timespec ep_set_mstimeout(const struct timespec *ts)
 {
-	struct timespec now, ts = {
-		.tv_sec = ms / MSEC_PER_SEC,
-		.tv_nsec = NSEC_PER_MSEC * (ms % MSEC_PER_SEC),
-	};
+	struct timespec now;
 
 	ktime_get_ts(&now);
-	return timespec_add_safe(now, ts);
+	return timespec_add_safe(now, *ts);
 }
 
 /**
@@ -1573,17 +1570,16 @@ static inline struct timespec ep_set_mstimeout(long ms)
  * @events: Pointer to the userspace buffer where the ready events should be
  *          stored.
  * @maxevents: Size (in terms of number of events) of the caller event buffer.
- * @timeout: Maximum timeout for the ready events fetch operation, in
- *           milliseconds. If the @timeout is zero, the function will not block,
- *           while if the @timeout is less than zero, the function will block
- *           until at least one event has been retrieved (or an error
- *           occurred).
+ * @timeout: Maximum timeout for the ready events fetch operation.  If NULL, or
+ *           if both tv_sec and tv_nsec are zero, the function will not block.
+ *           If either one is less than zero, the function will block until at
+ *           least one event has been retrieved (or an error occurred).
  *
  * Returns: Returns the number of ready events which have been fetched, or an
  *          error code, in case of error.
  */
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
-		   int maxevents, long timeout)
+		   int maxevents, const struct timespec *timeout)
 {
 	int res = 0, eavail, timed_out = 0;
 	unsigned long flags;
@@ -1591,13 +1587,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	wait_queue_t wait;
 	ktime_t expires, *to = NULL;
 
-	if (timeout > 0) {
-		struct timespec end_time = ep_set_mstimeout(timeout);
-
-		slack = select_estimate_accuracy(&end_time);
-		to = &expires;
-		*to = timespec_to_ktime(end_time);
-	} else if (timeout == 0) {
+	if (!timeout || (timeout->tv_nsec == 0 && timeout->tv_sec == 0)) {
 		/*
 		 * Avoid the unnecessary trip to the wait queue loop, if the
 		 * caller specified a non blocking operation.
@@ -1605,6 +1595,12 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		timed_out = 1;
 		spin_lock_irqsave(&ep->lock, flags);
 		goto check_events;
+	} else if (timeout->tv_nsec >= 0 && timeout->tv_sec >= 0) {
+		struct timespec end_time = ep_set_mstimeout(timeout);
+
+		slack = select_estimate_accuracy(&end_time);
+		to = &expires;
+		*to = timespec_to_ktime(end_time);
 	}
 
 fetch_events:
@@ -1954,12 +1950,8 @@ error_return:
 	return error;
 }
 
-/*
- * Implement the event wait interface for the eventpoll file. It is the kernel
- * part of the user space epoll_wait(2).
- */
-SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
-		int, maxevents, int, timeout)
+static inline int epoll_wait_do(int epfd, struct epoll_event __user *events,
+				int maxevents, const struct timespec *timeout)
 {
 	int error;
 	struct fd f;
@@ -2002,29 +1994,35 @@ error_fput:
 
 /*
  * Implement the event wait interface for the eventpoll file. It is the kernel
- * part of the user space epoll_pwait(2).
+ * part of the user space epoll_wait(2).
  */
-SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
-		int, maxevents, int, timeout, const sigset_t __user *, sigmask,
-		size_t, sigsetsize)
+SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
+		int, maxevents, int, timeout)
+{
+	struct timespec ts = (struct timespec) {
+	    .tv_sec = timeout / MSEC_PER_SEC,
+	    .tv_nsec = (timeout % MSEC_PER_SEC) * NSEC_PER_MSEC,
+	};
+	return epoll_wait_do(epfd, events, maxevents, &ts);
+}
+
+static inline int epoll_pwait_do(int epfd, struct epoll_event __user *events,
+				 int maxevents, struct timespec *timeout,
+				 sigset_t *sigmask, size_t sigsetsize)
 {
 	int error;
-	sigset_t ksigmask, sigsaved;
+	sigset_t sigsaved;
 
 	/*
 	 * If the caller wants a certain signal mask to be set during the wait,
 	 * we apply it here.
 	 */
 	if (sigmask) {
-		if (sigsetsize != sizeof(sigset_t))
-			return -EINVAL;
-		if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask)))
-			return -EFAULT;
 		sigsaved = current->blocked;
-		set_current_blocked(&ksigmask);
+		set_current_blocked(sigmask);
 	}
 
-	error = sys_epoll_wait(epfd, events, maxevents, timeout);
+	error = epoll_wait_do(epfd, events, maxevents, timeout);
 
 	/*
 	 * If we changed the signal mask, we need to restore the original one.
@@ -2044,49 +2042,55 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 	return error;
 }
 
+/*
+ * Implement the event wait interface for the eventpoll file. It is the kernel
+ * part of the user space epoll_pwait(2).
+ */
+SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
+		int, maxevents, int, timeout, const sigset_t __user *, sigmask,
+		size_t, sigsetsize)
+{
+	struct timespec ts = (struct timespec) {
+	    .tv_sec = timeout / MSEC_PER_SEC,
+	    .tv_nsec = (timeout % MSEC_PER_SEC) * NSEC_PER_MSEC,
+	};
+	sigset_t ksigmask;
+
+	if (sigmask) {
+		if (sigsetsize != sizeof(sigset_t))
+			return -EINVAL;
+		if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask)))
+			return -EFAULT;
+	}
+	return epoll_pwait_do(epfd, events, maxevents, &ts,
+			      sigmask ? &ksigmask : NULL, sigsetsize);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
-			struct epoll_event __user *, events,
-			int, maxevents, int, timeout,
-			const compat_sigset_t __user *, sigmask,
-			compat_size_t, sigsetsize)
+		       struct epoll_event __user *, events,
+		       int, maxevents, int, timeout,
+		       const compat_sigset_t __user *, sigmask,
+		       compat_size_t, sigsetsize)
 {
-	long err;
 	compat_sigset_t csigmask;
-	sigset_t ksigmask, sigsaved;
+	sigset_t ksigmask;
+
+	struct timespec ts = (struct timespec) {
+	    .tv_sec = timeout / MSEC_PER_SEC,
+	    .tv_nsec = (timeout % MSEC_PER_SEC) * NSEC_PER_MSEC,
+	};
 
-	/*
-	 * If the caller wants a certain signal mask to be set during the wait,
-	 * we apply it here.
-	 */
 	if (sigmask) {
 		if (sigsetsize != sizeof(compat_sigset_t))
 			return -EINVAL;
 		if (copy_from_user(&csigmask, sigmask, sizeof(csigmask)))
 			return -EFAULT;
 		sigset_from_compat(&ksigmask, &csigmask);
-		sigsaved = current->blocked;
-		set_current_blocked(&ksigmask);
-	}
-
-	err = sys_epoll_wait(epfd, events, maxevents, timeout);
-
-	/*
-	 * If we changed the signal mask, we need to restore the original one.
-	 * In case we've got a signal while waiting, we do not restore the
-	 * signal mask yet, and we allow do_signal() to deliver the signal on
-	 * the way back to userspace, before the signal mask is restored.
-	 */
-	if (sigmask) {
-		if (err == -EINTR) {
-			memcpy(&current->saved_sigmask, &sigsaved,
-			       sizeof(sigsaved));
-			set_restore_sigmask();
-		} else
-			set_current_blocked(&sigsaved);
 	}
 
-	return err;
+	return epoll_pwait_do(epfd, events, maxevents, &ts,
+			      sigmask ? &ksigmask : NULL, sigsetsize);
 }
 #endif
 
-- 
1.9.3

* [RESEND PATCH 2/3] epoll: Add implementation for epoll_pwait1
  2015-01-08  9:16 [RESEND PATCH 0/3] epoll: Add epoll_pwait1 syscall Fam Zheng
  2015-01-08  9:16 ` [RESEND PATCH 1/3] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
@ 2015-01-08  9:16 ` Fam Zheng
  2015-01-08 11:10   ` Paolo Bonzini
  2015-01-08  9:16 ` [RESEND PATCH 3/3] x86: hook up epoll_pwait1 syscall Fam Zheng
  2 siblings, 1 reply; 6+ messages in thread
From: Fam Zheng @ 2015-01-08  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Alexander Viro,
	Andrew Morton, Miklos Szeredi, Juri Lelli, Zach Brown,
	David Drysdale, Fam Zheng, Kees Cook, Alexei Starovoitov,
	David Herrmann, Dario Faggioli, Theodore Ts'o, Peter Zijlstra,
	Vivek Goyal, Mike Frysinger, Heiko Carstens, Rasmus Villemoes,
	Oleg Nesterov, Mathieu Desnoyers, Fabian Frederick, Josh

Unlike ppoll(2), which accepts a timespec argument "timeout_ts" to
specify the timeout, epoll_wait(2) and epoll_pwait(2) expect a
millisecond timeout of type int.

This is an obstacle for applications switching from ppoll to epoll
if they want nanosecond resolution in their event loops.

Therefore, adding this variation of the epoll wait interface, giving
users an option with *both* advantages, is a reasonable move: it offers
consistently scalable performance when polling many fds, together with
nanosecond timeout precision (assuming the application has properly set
up timer slack with prctl(2)).

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c           | 24 ++++++++++++++++++++++++
 include/linux/syscalls.h |  4 ++++
 kernel/sys_ni.c          |  3 +++
 3 files changed, 31 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 117ba72..ee69fd4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2066,6 +2066,30 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 			      sigmask ? &ksigmask : NULL, sigsetsize);
 }
 
+SYSCALL_DEFINE6(epoll_pwait1, int, epfd, struct epoll_event __user *, events,
+		int, maxevents,
+		struct timespec __user *, timeout,
+		const sigset_t __user *, sigmask,
+		size_t, sigsetsize)
+{
+	struct timespec ts;
+	sigset_t ksigmask;
+
+	if (timeout && copy_from_user(&ts, timeout, sizeof(ts)))
+		return -EFAULT;
+
+	if (sigmask) {
+		if (sigsetsize != sizeof(sigset_t))
+			return -EINVAL;
+		if (copy_from_user(&ksigmask, sigmask, sizeof(ksigmask)))
+			return -EFAULT;
+	}
+	return epoll_pwait_do(epfd, events, maxevents,
+			      timeout ? &ts : NULL,
+			      sigmask ? &ksigmask : NULL,
+			      sigsetsize);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 		       struct epoll_event __user *, events,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 85893d7..3e0ed0b 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -630,6 +630,10 @@ asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
 				int maxevents, int timeout,
 				const sigset_t __user *sigmask,
 				size_t sigsetsize);
+asmlinkage long sys_epoll_pwait1(int epfd, struct epoll_event __user *events,
+				 int maxevents, struct timespec __user *ts,
+				 const sigset_t __user *sigmask,
+				 size_t sigsetsize);
 asmlinkage long sys_gethostname(char __user *name, int len);
 asmlinkage long sys_sethostname(char __user *name, int len);
 asmlinkage long sys_setdomainname(char __user *name, int len);
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 5adcb0a..1044158 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -229,3 +229,6 @@ cond_syscall(sys_bpf);
 
 /* execveat */
 cond_syscall(sys_execveat);
+
+/* epoll_pwait1 */
+cond_syscall(sys_epoll_pwait1);
-- 
1.9.3


* [RESEND PATCH 3/3] x86: hook up epoll_pwait1 syscall
  2015-01-08  9:16 [RESEND PATCH 0/3] epoll: Add epoll_pwait1 syscall Fam Zheng
  2015-01-08  9:16 ` [RESEND PATCH 1/3] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
  2015-01-08  9:16 ` [RESEND PATCH 2/3] epoll: Add implementation for epoll_pwait1 Fam Zheng
@ 2015-01-08  9:16 ` Fam Zheng
  2 siblings, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2015-01-08  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Alexander Viro,
	Andrew Morton, Miklos Szeredi, Juri Lelli, Zach Brown,
	David Drysdale, Fam Zheng, Kees Cook, Alexei Starovoitov,
	David Herrmann, Dario Faggioli, Theodore Ts'o, Peter Zijlstra,
	Vivek Goyal, Mike Frysinger, Heiko Carstens, Rasmus Villemoes,
	Oleg Nesterov, Mathieu Desnoyers, Fabian Frederick, Josh

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 arch/x86/syscalls/syscall_32.tbl | 1 +
 arch/x86/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index b3560ec..1c863f6 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
+359	i386	epoll_pwait1		sys_epoll_pwait1
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 8d656fb..644a90f 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
 322	64	execveat		stub_execveat
+323	common	epoll_pwait1		sys_epoll_pwait1
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
1.9.3

* Re: [RESEND PATCH 2/3] epoll: Add implementation for epoll_pwait1
  2015-01-08  9:16 ` [RESEND PATCH 2/3] epoll: Add implementation for epoll_pwait1 Fam Zheng
@ 2015-01-08 11:10   ` Paolo Bonzini
       [not found]     ` <54AE65BB.1020707-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2015-01-08 11:10 UTC (permalink / raw)
  To: Fam Zheng, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Alexander Viro,
	Andrew Morton, Miklos Szeredi, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers, Fabian Frederick, Josh Triplett



On 08/01/2015 10:16, Fam Zheng wrote:
> Unlike ppoll(2), which accepts a timespec argument "timeout_ts" to
> specify the timeout, epoll_wait(2) and epoll_pwait(2) expect a
> millisecond timeout in int type.
> 
> This is an obstacle for applications in switching from ppoll to epoll,
> if they want nanosecond resolution in their event loops.
> 
> Therefore, adding this variation of epoll wait interface, giving user an
> option with *both* advantages, is a reasonable move: there could be
> constantly scalable performance polling many fds, while having a
> nanosecond timeout precision (assuming it has properly set up timer
> slack with prctl(2)).
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  fs/eventpoll.c           | 24 ++++++++++++++++++++++++
>  include/linux/syscalls.h |  4 ++++
>  kernel/sys_ni.c          |  3 +++
>  3 files changed, 31 insertions(+)

As mentioned by Miklos in the non-resent version, please add a flags
argument.  Invalid flags should return -EINVAL.

In fact, we could already use the flags argument to specify an absolute
timeout, which is a nice thing to have for QEMU too.
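
Something along these lines would do for the check. This is only a sketch: the
flag name EPOLL_PWAIT1_ABSTIME, its value, and the helper check_pwait1_flags
are made up here for illustration and are not part of the posted patch.

```c
#include <errno.h>

/* Hypothetical flag: treat the timespec as an absolute deadline. */
#define EPOLL_PWAIT1_ABSTIME      0x1u
/* Every flag bit the (hypothetical) syscall currently understands. */
#define EPOLL_PWAIT1_VALID_FLAGS  EPOLL_PWAIT1_ABSTIME

/* Reject unknown bits so they stay available for future extensions. */
static int check_pwait1_flags(unsigned int flags)
{
	if (flags & ~EPOLL_PWAIT1_VALID_FLAGS)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown bits from day one is what lets new flags be added later
without breaking callers that passed garbage.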

Paolo

* Re: [RESEND PATCH 2/3] epoll: Add implementation for epoll_pwait1
       [not found]     ` <54AE65BB.1020707-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-01-08 11:48       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-08 11:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, lkml, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, Alexander Viro,
	Andrew Morton, Miklos Szeredi, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers <mathieu.desnoyers

On 8 January 2015 at 12:10, Paolo Bonzini <pbonzini-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
>
> On 08/01/2015 10:16, Fam Zheng wrote:
>> Unlike ppoll(2), which accepts a timespec argument "timeout_ts" to
>> specify the timeout, epoll_wait(2) and epoll_pwait(2) expect a
>> millisecond timeout in int type.
>>
>> This is an obstacle for applications in switching from ppoll to epoll,
>> if they want nanosecond resolution in their event loops.
>>
>> Therefore, adding this variation of epoll wait interface, giving user an
>> option with *both* advantages, is a reasonable move: there could be
>> constantly scalable performance polling many fds, while having a
>> nanosecond timeout precision (assuming it has properly set up timer
>> slack with prctl(2)).
>>
>> Signed-off-by: Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> ---
>>  fs/eventpoll.c           | 24 ++++++++++++++++++++++++
>>  include/linux/syscalls.h |  4 ++++
>>  kernel/sys_ni.c          |  3 +++
>>  3 files changed, 31 insertions(+)
>
> As mentioned by Miklos in the non-resent version, please add a flags
> argument.  Invalid flags should return -EINVAL.
>
> In fact, we could already use the flags argument to specify an absolute
> timeout, which is a nice thing to have for QEMU too.

Nice! It looks like we found this iteration of "failure to include a
flags argument is a mistake" already!

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
