public inbox for linux-kernel@vger.kernel.org
* [PATCH] signalfd: retrieve multiple signals with one read() call
@ 2007-05-20  0:07 Davi Arnaut
  2007-05-20 19:17 ` Davide Libenzi
  2007-05-21  4:02 ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Davi Arnaut @ 2007-05-20  0:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Davide Libenzi, Linus Torvalds, Linux Kernel Mailing List

Hi,

Gathering signals in bulk enables server applications to drain a signal
queue (consisting mostly of realtime signals) more efficiently by reducing
syscall and file look-up overhead.

This is very similar to the sigtimedwait4() call described by Niels Provos,
Chuck Lever, and Stephen Tweedie in their paper "Analyzing the
Overload Behavior of a Simple Web Server". The paper lists further details
and advantages.

Signed-off-by: Davi E. M. Arnaut <davi@haxent.com.br>

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 7cfeab4..f6303c5 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -11,6 +11,8 @@
  *      Now using anonymous inode source.
  *      Thanks to Oleg Nesterov for useful code review and suggestions.
  *      More comments and suggestions from Arnd Bergmann.
+ * Sat May 19, 2007: Davi E. M. Arnaut <davi@haxent.com.br>
+ *      Retrieve multiple signals with one read() call
  */
 
 #include <linux/file.h>
@@ -206,6 +208,64 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
 	return err ? -EFAULT: sizeof(*uinfo);
 }
 
+static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
+				int nonblock)
+{
+	int locked;
+	ssize_t ret;
+	struct signalfd_lockctx lk;
+	DECLARE_WAITQUEUE(wait, current);
+
+	locked = signalfd_lock(ctx, &lk);
+	if (!locked)
+		return 0;
+
+	ret = dequeue_signal(lk.tsk, &ctx->sigmask, info);
+	switch (ret) {
+	case 0:
+		if (!nonblock)
+			break;
+		ret = -EAGAIN;
+	default:
+		signalfd_unlock(&lk);
+		return ret;
+	}
+
+	add_wait_queue(&ctx->wqh, &wait);
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		ret = dequeue_signal(lk.tsk, &ctx->sigmask, info);
+		if (ret != 0)
+			break;
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+		signalfd_unlock(&lk);
+		schedule();
+		locked = signalfd_lock(ctx, &lk);
+		if (unlikely(!locked)) {
+			/*
+			 * Let the caller read zero byte, ala socket
+			 * recv() when the peer disconnect. This test
+			 * must be done before doing a dequeue_signal(),
+			 * because if the sighand has been orphaned,
+			 * the dequeue_signal() call is going to crash.
+			 */
+			 ret = 0;
+			 break;
+		}
+	}
+
+	remove_wait_queue(&ctx->wqh, &wait);
+	__set_current_state(TASK_RUNNING);
+
+	if (likely(locked))
+		signalfd_unlock(&lk);
+
+	return ret;
+}
+
 /*
  * Returns either the size of a "struct signalfd_siginfo", or zero if the
  * sighand we are attached to, has been orphaned. The "count" parameter
@@ -215,55 +275,30 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 			     loff_t *ppos)
 {
 	struct signalfd_ctx *ctx = file->private_data;
-	ssize_t res = 0;
-	int locked, signo;
+	struct signalfd_siginfo __user *siginfo;
+	int nonblock = file->f_flags & O_NONBLOCK;
+	ssize_t ret, total = 0;
 	siginfo_t info;
-	struct signalfd_lockctx lk;
-	DECLARE_WAITQUEUE(wait, current);
 
-	if (count < sizeof(struct signalfd_siginfo))
+	count /= sizeof(struct signalfd_siginfo);
+	if (!count)
 		return -EINVAL;
-	locked = signalfd_lock(ctx, &lk);
-	if (!locked)
-		return 0;
-	res = -EAGAIN;
-	signo = dequeue_signal(lk.tsk, &ctx->sigmask, &info);
-	if (signo == 0 && !(file->f_flags & O_NONBLOCK)) {
-		add_wait_queue(&ctx->wqh, &wait);
-		for (;;) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			signo = dequeue_signal(lk.tsk, &ctx->sigmask, &info);
-			if (signo != 0)
-				break;
-			if (signal_pending(current)) {
-				res = -ERESTARTSYS;
-				break;
-			}
-			signalfd_unlock(&lk);
-			schedule();
-			locked = signalfd_lock(ctx, &lk);
-			if (unlikely(!locked)) {
-				/*
-				 * Let the caller read zero byte, ala socket
-				 * recv() when the peer disconnect. This test
-				 * must be done before doing a dequeue_signal(),
-				 * because if the sighand has been orphaned,
-				 * the dequeue_signal() call is going to crash.
-				 */
-				res = 0;
-				break;
-			}
-		}
-		remove_wait_queue(&ctx->wqh, &wait);
-		__set_current_state(TASK_RUNNING);
-	}
-	if (likely(locked))
-		signalfd_unlock(&lk);
-	if (likely(signo))
-		res = signalfd_copyinfo((struct signalfd_siginfo __user *) buf,
-					&info);
 
-	return res;
+	siginfo = (struct signalfd_siginfo __user *) buf;
+
+	do {
+		ret = signalfd_dequeue(ctx, &info, nonblock);
+		if (unlikely(ret <= 0))
+			break;
+		ret = signalfd_copyinfo(siginfo, &info);
+		if (ret < 0)
+			break;
+		siginfo++;
+		total += ret;
+		nonblock = 1;
+	} while (--count);
+
+	return total ? total : ret;
 }
 
 static const struct file_operations signalfd_fops = {


* Re: [PATCH] signalfd: retrieve multiple signals with one read() call
From: Davide Libenzi @ 2007-05-20 19:17 UTC (permalink / raw)
  To: Davi Arnaut; +Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List

On Sat, 19 May 2007, Davi Arnaut wrote:

> Hi,
> 
> Gathering signals in bulk enables server applications to drain a signal
> queue (almost full of realtime signals) more efficiently by reducing the
> syscall and file look-up overhead.
> 
> Very similar to the sigtimedwait4() call described by Niels Provos,
> Chuck Lever, and Stephen Tweedie in a paper entitled "Analyzing the
> Overload Behavior of a Simple Web Server". The paper lists more details
> and advantages.

That paper is a pre-epoll thing. Nowadays I see no value in using RT 
signals as the main event-gathering mechanism ;)


> Signed-off-by: Davi E. M. Arnaut <davi@haxent.com.br>

The patch looks otherwise good to me, and I see no reason why we 
shouldn't be fetching more than one signal if available.

Acked-by: Davide Libenzi <davidel@xmailserver.org>



- Davide




* Re: [PATCH] signalfd: retrieve multiple signals with one read() call
From: Andrew Morton @ 2007-05-21  4:02 UTC (permalink / raw)
  To: Davi Arnaut; +Cc: Davide Libenzi, Linus Torvalds, Linux Kernel Mailing List

On Sat, 19 May 2007 21:07:11 -0300 Davi Arnaut <davi@haxent.com.br> wrote:

> Hi,
> 
> Gathering signals in bulk enables server applications to drain a signal
> queue (almost full of realtime signals) more efficiently by reducing the
> syscall and file look-up overhead.
> 
> Very similar to the sigtimedwait4() call described by Niels Provos,
> Chuck Lever, and Stephen Tweedie in a paper entitled "Analyzing the
> Overload Behavior of a Simple Web Server". The paper lists more details
> and advantages.
> 

static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
			     loff_t *ppos)
{
	struct signalfd_ctx *ctx = file->private_data;
	struct signalfd_siginfo __user *siginfo;
	int nonblock = file->f_flags & O_NONBLOCK;
	ssize_t ret, total = 0;
	siginfo_t info;

	count /= sizeof(struct signalfd_siginfo);
	if (!count)
		return -EINVAL;

	siginfo = (struct signalfd_siginfo __user *) buf;

	do {
		ret = signalfd_dequeue(ctx, &info, nonblock);
		if (unlikely(ret <= 0))
			break;
		ret = signalfd_copyinfo(siginfo, &info);
		if (ret < 0)
			break;
		siginfo++;
		total += ret;
		nonblock = 1;
	} while (--count);

	return total ? total : ret;
}

If `count' is not a multiple of sizeof(struct signalfd_siginfo), the read()
will return the largest multiple of sizeof(struct signalfd_siginfo) that
fits in `count'.

That is, unless `count' happens to be less than sizeof(struct
signalfd_siginfo), in which case we return -EINVAL.

This seems inconsistent.


Also, I'm desperately hunting for the place where we zero out that local
siginfo_t, and I ain't finding it.  Someone please convince me that we're
not leaking bits of kernel memory out to userspace in that thing.




* Re: [PATCH] signalfd: retrieve multiple signals with one read() call
From: Davide Libenzi @ 2007-05-21  4:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Davi Arnaut, Linus Torvalds, Linux Kernel Mailing List

On Sun, 20 May 2007, Andrew Morton wrote:

> If `count' is not a multiple of sizeof(struct signalfd_siginfo), the read()
> will return the largest multiple of sizeof(struct signalfd_siginfo) that
> fits in `count'.
> 
> That is, unless `count' happens to be less than sizeof(struct
> signalfd_siginfo), in which case we return -EINVAL.
> 
> This seems inconsistent.

I think it fits the rule "buffer must be big enough for at least one sigingo".
We use the special return of 0 as an indicator that the process whose 
signals we were monitoring has detached the sighand.



> Also, I'm desperately hunting for the place where we zero out that local
> siginfo_t, and I ain't finding it.  Someone please convince me that we're
> not leaking bits of kernel memory out to userspace in that thing.

Hmm, __clear_user()?


- Davide




* Re: [PATCH] signalfd: retrieve multiple signals with one read() call
From: Davide Libenzi @ 2007-05-21  4:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Davi Arnaut, Linus Torvalds, Linux Kernel Mailing List

On Sun, 20 May 2007, Davide Libenzi wrote:

> I think it fits the rule "buffer must be big enough for at least one sigingo".
                                                                       ^^^^^^^

/me wonders what sizeof(sigingo) is  :)


- Davide




* Re: [PATCH] signalfd: retrieve multiple signals with one read() call
From: Andrew Morton @ 2007-05-21  4:26 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Davi Arnaut, Linus Torvalds, Linux Kernel Mailing List

On Sun, 20 May 2007 21:14:38 -0700 (PDT) Davide Libenzi <davidel@xmailserver.org> wrote:

> On Sun, 20 May 2007, Andrew Morton wrote:
> 
> > If `count' is not a multiple of sizeof(struct signalfd_siginfo), the read()
> > will return the largest multiple of sizeof(struct signalfd_siginfo) that
> > fits in `count'.
> > 
> > That is, unless `count' happens to be less than sizeof(struct
> > signalfd_siginfo), in which case we return -EINVAL.
> > 
> > This seems inconsistent.
> 
> I think it fits the rule "buffer must be big enough for at least one sigingo".
> We use the special return 0; as indicator that the process we were 
> monitoring signals, detached the sighand.
> 

hm.  The kernel violates proper read() semantics in many places.  Looks like
we just did it again.

> 
> > Also, I'm desperately hunting for the place where we zero out that local
> > siginfo_t, and I ain't finding it.  Someone please convince me that we're
> > not leaking bits of kernel memory out to userspace in that thing.
> 
> Hmm, __clear_user()?

oic, yes, that thing.  Usually we'd zero out the on-stack struct, assemble
it then copy out the whole thing.  I guess doing it the way you have saves
a few instructions.  But it's the cache hit against *uinfo which will have
most of the cost, and we can't do anything about that.

Unless we just remove the __clear_user() altogether.  Who said that "Unused
members should be zero"?



* Re: [PATCH] signalfd: retrieve multiple signals with one read() call
From: Davide Libenzi @ 2007-05-21  5:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Davi Arnaut, Linus Torvalds, Linux Kernel Mailing List

On Sun, 20 May 2007, Andrew Morton wrote:

> > I think it fits the rule "buffer must be big enough for at least one sigingo".
> > We use the special return 0; as indicator that the process we were 
> > monitoring signals, detached the sighand.
> > 
> 
> hm.  Kernel violates proper read() semantics in many places.  Looks like we
> just did it again.

I think we can add a check: "if size == 0, return 0".  The above-cited 
return-0-on-detach would still apply for sufficiently sized buffers. So:

1) size == 0, return 0 (POSIX wants this)

2) size < sizeof(signalfd_siginfo), return -EINVAL

3) size >= sizeof(signalfd_siginfo) && DETACH, return 0

The signalfd falls into what POSIX defines as a "special file", and can 
return a result smaller than the requested size.


> Unless we just remove the __clear_user() altogether.  Who said that "Unused
> > members should be zero"?

Because zero is the typical value for still-unused/reserved members? 
Better than random values, I think ;)
Member validity is driven by si_code & SI_MASK anyway.


- Davide




* Re: [PATCH] signalfd: retrieve multiple signals with one read() call
From: Andrew Morton @ 2007-05-21  5:12 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Davi Arnaut, Linus Torvalds, Linux Kernel Mailing List

On Sun, 20 May 2007 22:05:00 -0700 (PDT) Davide Libenzi <davidel@xmailserver.org> wrote:

> On Sun, 20 May 2007, Andrew Morton wrote:
> 
> > > I think it fits the rule "buffer must be big enough for at least one sigingo".
> > > We use the special return 0; as indicator that the process we were 
> > > monitoring signals, detached the sighand.
> > > 
> > 
> > hm.  Kernel violates proper read() semantics in many places.  Looks like we
> > just did it again.
> 
> I think we can add a check: "if size == 0, return 0".  The above-cited 
> return-0-on-detach would still apply for sufficiently sized buffers. So:
> 
> 1) size == 0, return 0 (POSIX wants this)
> 
> 2) size < sizeof(signalfd_siginfo), return -EINVAL
> 
> 3) size >= sizeof(signalfd_siginfo) && DETACH, return 0
> 
> The signalfd falls into what POSIX defines as a "special file", and can 
> return a result smaller than the requested size.
> 

hm, well.  I'd suggest that we do what makes most sense, rather than
warping things to try to obey the letter of POSIX.

> 
> > Unless we just remove the __clear_user() altogether.  Who said that "Unused
> > members should be zero"?
> 
> Because it is a typically used value for still-unused/reserved members? 
> Better than random values I think ;)
> Members validity is driven by si_code & SI_MASK anyway.

Sure.  And it'd be a bit rude to return 128 from the read() but to have
written only a few bytes of the user's memory.

OTOH, writing only a few bytes will be usefully quicker than zapping the
whole 128 bytes, particularly on small-cacheline CPUs.


