linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2)
@ 2012-12-28 10:22 Andrey Vagin
  2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andrey Vagin @ 2012-12-28 10:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Serge Hallyn,
	Oleg Nesterov, Andrew Morton, Eric W. Biederman, Al Viro,
	Pavel Emelyanov, Cyrill Gorcunov, Michael Kerrisk

The idea is simple. We need to get the siginfo for each signal on dump,
and then return it back on restore.

The first problem is that the kernel doesn’t report complete siginfo-s
in user-space. In a signal handler the kernel strips SI_CODE from
siginfo. When a siginfo is received from signalfd, it has a different
format with fixed sizes of fields. The interface of signalfd was
extended. If a signalfd is created with the flag SFD_RAW, it returns
siginfo in a raw format.

rt_sigqueueinfo looks suitable for restoring signals, but it can’t send
siginfo with a positive si_code, because these codes are reserved for
the kernel. In the real world each person has right to do anything with
himself, so I think a process should able to send any siginfo to itself.

v2: add ability to dump signals without dequeuing them.
    pread with non-zero offset is used for this. offset encodes
    a queue (private of shared) and a sequence number of a signal
    in the queue. Thanks to Oleg for this idea.

Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>

-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] signal: allow to send any siginfo to itself
  2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin
@ 2012-12-28 10:22 ` Andrey Vagin
  2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin
  2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin
  2 siblings, 0 replies; 11+ messages in thread
From: Andrey Vagin @ 2012-12-28 10:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov,
	Serge Hallyn, Andrew Morton, Eric W. Biederman, Al Viro,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

A kernel prevents of sending siginfo with positive si_code, because
these codes is reserved for kernel. I think we can allow to send any
siginfo to itself. This operation should not be dangerous.

This functionality is required for restoring signals.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 kernel/signal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7aaa51d..ac5f5e7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2944,7 +2944,8 @@ SYSCALL_DEFINE3(rt_sigqueueinfo, pid_t, pid, int, sig,
 	/* Not even root can pretend to send signals from the kernel.
 	 * Nor can they impersonate a kill()/tgkill(), which adds source info.
 	 */
-	if (info.si_code >= 0 || info.si_code == SI_TKILL) {
+	if (((info.si_code >= 0 || info.si_code == SI_TKILL)) &&
+	    (task_pid_vnr(current) != pid)) {
 		/* We used to allow any < 0 si_code */
 		WARN_ON_ONCE(info.si_code < 0);
 		return -EPERM;
@@ -2964,7 +2965,8 @@ long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig, siginfo_t *info)
 	/* Not even root can pretend to send signals from the kernel.
 	 * Nor can they impersonate a kill()/tgkill(), which adds source info.
 	 */
-	if (info->si_code >= 0 || info->si_code == SI_TKILL) {
+	if ((info->si_code >= 0 || info->si_code == SI_TKILL) &&
+	    (task_pid_vnr(current) != pid)) {
 		/* We used to allow any < 0 si_code */
 		WARN_ON_ONCE(info->si_code < 0);
 		return -EPERM;
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
  2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin
  2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin
@ 2012-12-28 10:23 ` Andrey Vagin
       [not found]   ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin
  2 siblings, 1 reply; 11+ messages in thread
From: Andrey Vagin @ 2012-12-28 10:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

signalfd should be called with the flag SFD_RAW for that.

signalfd_siginfo is not full for siginfo with a negative si_code.
copy_siginfo_to_user() is copied a full siginfo to user-space, if
si_code is negative.  signalfd_copyinfo() doesn't do that and can't be
expanded, because it has not compatible format with siginfo_t.

Another problem is that a constant __SI_* is removed from si_code.
It's not a problem for usual applications, because they expect
a defined type of siginfo (internal logic).
When we want to dump pending signals, we can't predict a type of
siginfo, so we should get it from kernel.

The main idea of the raw format is that it should be enough for
restoring exactly the same siginfo for the current process.

This functionality is required for checkpointing pending signals.

v2: fix a race condition during setting file flags
    copy_siginfo_to_user32() if is_compat_task

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/signalfd.c                 | 64 +++++++++++++++++++++++++++++++++++++++----
 include/uapi/linux/signalfd.h |  1 +
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index b534869..4439a81 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -30,6 +30,7 @@
 #include <linux/signalfd.h>
 #include <linux/syscalls.h>
 #include <linux/proc_fs.h>
+#include <linux/compat.h>
 
 void signalfd_cleanup(struct sighand_struct *sighand)
 {
@@ -74,6 +75,38 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait)
 }
 
 /*
+ * Copy a whole siginfo into users spaces.
+ * The main idea of this format is that it should be enough
+ * for restoring siginfo back into the kernel.
+ */
+static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo,
+					siginfo_t *kinfo)
+{
+	siginfo_t *uinfo = (siginfo_t *) siginfo;
+	int err;
+
+	BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo));
+
+	err = __clear_user(uinfo, sizeof(*uinfo));
+
+#ifdef CONFIG_COMPAT
+	if (unlikely(is_compat_task())) {
+		compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo;
+
+		err |= copy_siginfo_to_user32(compat_uinfo, kinfo);
+		err |= put_user(kinfo->si_code, &compat_uinfo->si_code);
+
+		return err ? -EFAULT: sizeof(*compat_uinfo);
+	}
+#endif
+
+	err |= copy_siginfo_to_user(uinfo, kinfo);
+	err |= put_user(kinfo->si_code, &uinfo->si_code);
+
+	return err ? -EFAULT: sizeof(*uinfo);
+}
+
+/*
  * Copied from copy_siginfo_to_user() in kernel/signal.c
  */
 static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
@@ -205,6 +238,7 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 	struct signalfd_ctx *ctx = file->private_data;
 	struct signalfd_siginfo __user *siginfo;
 	int nonblock = file->f_flags & O_NONBLOCK;
+	bool raw = file->f_flags & SFD_RAW;
 	ssize_t ret, total = 0;
 	siginfo_t info;
 
@@ -217,7 +251,12 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 		ret = signalfd_dequeue(ctx, &info, nonblock);
 		if (unlikely(ret <= 0))
 			break;
-		ret = signalfd_copyinfo(siginfo, &info);
+
+		if (raw)
+			ret = signalfd_copy_raw_info(siginfo, &info);
+		else
+			ret = signalfd_copyinfo(siginfo, &info);
+
 		if (ret < 0)
 			break;
 		siginfo++;
@@ -262,7 +301,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 	BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK))
+	if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW))
 		return -EINVAL;
 
 	if (sizemask != sizeof(sigset_t) ||
@@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 	signotset(&sigmask);
 
 	if (ufd == -1) {
+		struct file *file;
 		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 		if (!ctx)
 			return -ENOMEM;
 
 		ctx->sigmask = sigmask;
 
+		ufd = get_unused_fd_flags(flags);
+		if (ufd < 0) {
+			kfree(ctx);
+			goto out;
+		}
+
 		/*
 		 * When we call this, the initialization must be complete, since
 		 * anon_inode_getfd() will install the fd.
 		 */
-		ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
+		file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
 				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
-		if (ufd < 0)
+		if (IS_ERR(file)) {
+			put_unused_fd(ufd);
+			ufd = PTR_ERR(file);
 			kfree(ctx);
+			goto out;
+		}
+
+		file->f_flags |= flags & SFD_RAW;
+
+		fd_install(ufd, file);
 	} else {
 		struct fd f = fdget(ufd);
 		if (!f.file)
@@ -302,7 +356,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 		wake_up(&current->sighand->signalfd_wqh);
 		fdput(f);
 	}
-
+out:
 	return ufd;
 }
 
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index 492c6de..bc31849 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -15,6 +15,7 @@
 /* Flags for signalfd4.  */
 #define SFD_CLOEXEC O_CLOEXEC
 #define SFD_NONBLOCK O_NONBLOCK
+#define SFD_RAW O_DIRECT
 
 struct signalfd_siginfo {
 	__u32 ssi_signo;
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3)
  2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin
  2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin
  2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin
@ 2012-12-28 10:23 ` Andrey Vagin
       [not found]   ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  2 siblings, 1 reply; 11+ messages in thread
From: Andrey Vagin @ 2012-12-28 10:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov,
	Alexander Viro, Paul E. McKenney, David Howells, Dave Jones,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

pread(fd, buf, size, pos) with non-zero pos returns siginfo-s
without dequeuing signals.

A sequence number and a queue are encoded in pos.

pos = seq + SFD_*_OFFSET

seq is a sequence number of a signal in a queue.

SFD_PER_THREAD_QUEUE_OFFSET - read signals from a per-thread queue.
SFD_SHARED_QUEUE_OFFSET - read signals from a shared (process wide) queue.

This functionality is required for checkpointing pending signals.

v2: llseek() can't be used here, because peek_offset/f_pos/whatever
has to be shared with all processes which have this file opened.

Suppose that the task forks after sys_signalfd(). Now if parent or child
do llseek this affects them both. This is insane because signalfd is
"strange" to say at least, fork/dup/etc inherits signalfd_ctx but not
the" source" of the data. // Oleg Nesterov

v3: minor cleanup

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/signalfd.c                 | 44 ++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/signalfd.h |  5 +++++
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 4439a81..86eb130 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -51,6 +51,43 @@ struct signalfd_ctx {
 	sigset_t sigmask;
 };
 
+static ssize_t signalfd_peek(struct signalfd_ctx *ctx,
+				siginfo_t *info, loff_t *ppos)
+{
+	struct sigpending *pending;
+	struct sigqueue *q;
+	loff_t seq;
+	int ret = 0;
+
+	spin_lock_irq(&current->sighand->siglock);
+
+	if (*ppos >= SFD_SHARED_QUEUE_OFFSET) {
+		pending = &current->signal->shared_pending;
+		seq = *ppos - SFD_SHARED_QUEUE_OFFSET;
+	} else {
+		pending = &current->pending;
+		seq = *ppos - SFD_PER_THREAD_QUEUE_OFFSET;
+	}
+
+	list_for_each_entry(q, &pending->list, list) {
+		if (sigismember(&ctx->sigmask, q->info.si_signo))
+			continue;
+
+		if (seq-- == 0) {
+			copy_siginfo(info, &q->info);
+			ret = info->si_signo;
+			break;
+		}
+	}
+
+	spin_unlock_irq(&current->sighand->siglock);
+
+	if (ret)
+		(*ppos)++;
+
+	return ret;
+}
+
 static int signalfd_release(struct inode *inode, struct file *file)
 {
 	kfree(file->private_data);
@@ -248,7 +285,11 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 
 	siginfo = (struct signalfd_siginfo __user *) buf;
 	do {
-		ret = signalfd_dequeue(ctx, &info, nonblock);
+		if (*ppos == 0)
+			ret = signalfd_dequeue(ctx, &info, nonblock);
+		else
+			ret = signalfd_peek(ctx, &info, ppos);
+
 		if (unlikely(ret <= 0))
 			break;
 
@@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 		}
 
 		file->f_flags |= flags & SFD_RAW;
+		file->f_mode |= FMODE_PREAD;
 
 		fd_install(ufd, file);
 	} else {
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index bc31849..0953785 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -17,6 +17,11 @@
 #define SFD_NONBLOCK O_NONBLOCK
 #define SFD_RAW O_DIRECT
 
+/* Read signals from a shared (process wide) queue */
+#define SFD_SHARED_QUEUE_OFFSET (1LL << 62)
+/* Read signals from a per-thread queue */
+#define SFD_PER_THREAD_QUEUE_OFFSET 1
+
 struct signalfd_siginfo {
 	__u32 ssi_signo;
 	__s32 ssi_errno;
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3)
       [not found]   ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2012-12-28 14:32     ` Oleg Nesterov
       [not found]       ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2012-12-28 14:32 UTC (permalink / raw)
  To: Andrey Vagin, Linus Torvalds
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
	Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk,
	Pavel Emelyanov, Cyrill Gorcunov

On 12/28, Andrey Vagin wrote:
>
> pread(fd, buf, size, pos) with non-zero pos returns siginfo-s
> without dequeuing signals.
>
> A sequence number and a queue are encoded in pos.
>
> pos = seq + SFD_*_OFFSET
>
> seq is a sequence number of a signal in a queue.
>
> SFD_PER_THREAD_QUEUE_OFFSET - read signals from a per-thread queue.
> SFD_SHARED_QUEUE_OFFSET - read signals from a shared (process wide) queue.
>
> This functionality is required for checkpointing pending signals.
>
> v2: llseek() can't be used here, because peek_offset/f_pos/whatever
> has to be shared with all processes which have this file opened.
>
> Suppose that the task forks after sys_signalfd(). Now if parent or child
> do llseek this affects them both. This is insane because signalfd is
> "strange" to say at least, fork/dup/etc inherits signalfd_ctx but not
> the" source" of the data. // Oleg Nesterov

I think we should cc Linus.

This patch adds the hack and it makes signalfd even more strange.

Yes, this hack was suggested by me because I can't suggest something
better. But if Linus dislikes this user-visible API it would be better
to get his nack right now.

> +static ssize_t signalfd_peek(struct signalfd_ctx *ctx,
> +				siginfo_t *info, loff_t *ppos)
> +{
> +	struct sigpending *pending;
> +	struct sigqueue *q;
> +	loff_t seq;
> +	int ret = 0;
> +
> +	spin_lock_irq(&current->sighand->siglock);
> +
> +	if (*ppos >= SFD_SHARED_QUEUE_OFFSET) {
> +		pending = &current->signal->shared_pending;
> +		seq = *ppos - SFD_SHARED_QUEUE_OFFSET;
> +	} else {
> +		pending = &current->pending;
> +		seq = *ppos - SFD_PER_THREAD_QUEUE_OFFSET;
> +	}

You can do this outside of spin_lock_irq().

And I think it would be better to check SFD_PRIVATE_QUEUE_OFFSET too
although this is not strictly necessary. Otherwise this code assumes
that sys_pread() cheks pos >= 0 and SFD_PRIVATE_QUEUE_OFFSET == 1.

> +	list_for_each_entry(q, &pending->list, list) {
> +		if (sigismember(&ctx->sigmask, q->info.si_signo))
> +			continue;
> +
> +		if (seq-- == 0) {
> +			copy_siginfo(info, &q->info);
> +			ret = info->si_signo;
> +			break;
> +		}
> +	}
> +
> +	spin_unlock_irq(&current->sighand->siglock);
> +
> +	if (ret)
> +		(*ppos)++;

We can change it unconditionally but I won't argue.

> @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>  		}
>
>  		file->f_flags |= flags & SFD_RAW;
> +		file->f_mode |= FMODE_PREAD;

Again, this is not needed or the code was broken by the previous patch.

Given that 2/3 passes O_RDWR to anon_inode_getfile() I think FMODE_PREAD
should be already set. Note OPEN_FMODE(flags) in anon_inode_getfile().

Oleg.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3)
       [not found]       ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-12-28 15:40         ` Oleg Nesterov
  0 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2012-12-28 15:40 UTC (permalink / raw)
  To: Andrey Vagin, Linus Torvalds
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
	Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk,
	Pavel Emelyanov, Cyrill Gorcunov

On 12/28, Oleg Nesterov wrote:
>
> On 12/28, Andrey Vagin wrote:
> >
> > @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
> >  		}
> >
> >  		file->f_flags |= flags & SFD_RAW;
> > +		file->f_mode |= FMODE_PREAD;
>
> Again, this is not needed or the code was broken by the previous patch.
>
> Given that 2/3 passes O_RDWR to anon_inode_getfile() I think FMODE_PREAD
> should be already set. Note OPEN_FMODE(flags) in anon_inode_getfile().

As you explained in another thread I was wrong, I confused FMODE_PREAD
and FMODE_READ.

Oleg.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
       [not found]   ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2012-12-28 16:14     ` Oleg Nesterov
  2013-01-10  9:47       ` Andrey Wagin
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2012-12-28 16:14 UTC (permalink / raw)
  To: Andrey Vagin, Linus Torvalds
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner, Michael Kerrisk,
	Pavel Emelyanov, Cyrill Gorcunov

On 12/28, Andrey Vagin wrote:
>
> signalfd should be called with the flag SFD_RAW for that.
>
> signalfd_siginfo is not full for siginfo with a negative si_code.
> copy_siginfo_to_user() is copied a full siginfo to user-space, if
> si_code is negative.  signalfd_copyinfo() doesn't do that and can't be
> expanded, because it has not compatible format with siginfo_t.
>
> Another problem is that a constant __SI_* is removed from si_code.
> It's not a problem for usual applications, because they expect
> a defined type of siginfo (internal logic).
> When we want to dump pending signals, we can't predict a type of
> siginfo, so we should get it from kernel.
>
> The main idea of the raw format is that it should be enough for
> restoring exactly the same siginfo for the current process.
>
> This functionality is required for checkpointing pending signals.

And this should be used along with pread().

I won't argue, but perhaps we should simply postulate that pread()
always dumps siginfo in raw format and avoid the new flag?

> @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>  	signotset(&sigmask);
>  
>  	if (ufd == -1) {
> +		struct file *file;
>  		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>  		if (!ctx)
>  			return -ENOMEM;
>  
>  		ctx->sigmask = sigmask;
>  
> +		ufd = get_unused_fd_flags(flags);
> +		if (ufd < 0) {
> +			kfree(ctx);
> +			goto out;
> +		}
> +
>  		/*
>  		 * When we call this, the initialization must be complete, since
>  		 * anon_inode_getfd() will install the fd.
>  		 */
> -		ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
> +		file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
>  				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
> -		if (ufd < 0)
> +		if (IS_ERR(file)) {
> +			put_unused_fd(ufd);
> +			ufd = PTR_ERR(file);
>  			kfree(ctx);
> +			goto out;
> +		}
> +
> +		file->f_flags |= flags & SFD_RAW;
> +
> +		fd_install(ufd, file);

And this is needed because we want to set more bits in f_flags/f_mode.

I am wondering if we can change anon_inode_getfd/getfile somehow so
that the user can pass more flags...

Or. Currently fops->open() is never used if this file_operations is used
by anon_inode_get*. So perhaps we can change anon_inode_getfile() to call
fops->open() if it is not zero before return?

OK, this is almost off-topic, please ignore.

Oleg.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
  2012-12-28 16:14     ` Oleg Nesterov
@ 2013-01-10  9:47       ` Andrey Wagin
  2013-01-10 22:45         ` Michael Kerrisk (man-pages)
  2013-01-12 18:55         ` Oleg Nesterov
  0 siblings, 2 replies; 11+ messages in thread
From: Andrey Wagin @ 2013-01-10  9:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api,
	Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> On 12/28, Andrey Vagin wrote:
>>
>> signalfd should be called with the flag SFD_RAW for that.
>>
>> signalfd_siginfo is not full for siginfo with a negative si_code.
>> copy_siginfo_to_user() is copied a full siginfo to user-space, if
>> si_code is negative.  signalfd_copyinfo() doesn't do that and can't be
>> expanded, because it has not compatible format with siginfo_t.
>>
>> Another problem is that a constant __SI_* is removed from si_code.
>> It's not a problem for usual applications, because they expect
>> a defined type of siginfo (internal logic).
>> When we want to dump pending signals, we can't predict a type of
>> siginfo, so we should get it from kernel.
>>
>> The main idea of the raw format is that it should be enough for
>> restoring exactly the same siginfo for the current process.
>>
>> This functionality is required for checkpointing pending signals.
>
> And this should be used along with pread().
>
> I won't argue, but perhaps we should simply postulate that pread()
> always dumps siginfo in raw format and avoid the new flag?

I won't argue too:), but I think the flag is more flexiable. The flag
allows to get raw siginfo from read(), it may be useful not only for
checkpoint/restore. pread with offset = 0 is equal to read(), so the
interface with the flag is more clear.

>
>> @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>>       signotset(&sigmask);
>>
>>       if (ufd == -1) {
>> +             struct file *file;
>>               ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>>               if (!ctx)
>>                       return -ENOMEM;
>>
>>               ctx->sigmask = sigmask;
>>
>> +             ufd = get_unused_fd_flags(flags);
>> +             if (ufd < 0) {
>> +                     kfree(ctx);
>> +                     goto out;
>> +             }
>> +
>>               /*
>>                * When we call this, the initialization must be complete, since
>>                * anon_inode_getfd() will install the fd.
>>                */
>> -             ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
>> +             file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
>>                                      O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
>> -             if (ufd < 0)
>> +             if (IS_ERR(file)) {
>> +                     put_unused_fd(ufd);
>> +                     ufd = PTR_ERR(file);
>>                       kfree(ctx);
>> +                     goto out;
>> +             }
>> +
>> +             file->f_flags |= flags & SFD_RAW;
>> +
>> +             fd_install(ufd, file);
>
> And this is needed because we want to set more bits in f_flags/f_mode.
>
> I am wondering if we can change anon_inode_getfd/getfile somehow so
> that the user can pass more flags...
>
> Or. Currently fops->open() is never used if this file_operations is used
> by anon_inode_get*. So perhaps we can change anon_inode_getfile() to call
> fops->open() if it is not zero before return?
>
> OK, this is almost off-topic, please ignore.
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
  2013-01-10  9:47       ` Andrey Wagin
@ 2013-01-10 22:45         ` Michael Kerrisk (man-pages)
  2013-01-12 18:55         ` Oleg Nesterov
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-10 22:45 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: Oleg Nesterov, Linus Torvalds, linux-kernel, criu, linux-fsdevel,
	linux-api, Alexander Viro, Paul E. McKenney, David Howells,
	Thomas Gleixner, Pavel Emelyanov, Cyrill Gorcunov

On Thu, Jan 10, 2013 at 10:47 AM, Andrey Wagin <avagin@gmail.com> wrote:
> 2012/12/28 Oleg Nesterov <oleg@redhat.com>:
>> On 12/28, Andrey Vagin wrote:
>>>
>>> signalfd should be called with the flag SFD_RAW for that.
>>>
>>> signalfd_siginfo is not full for siginfo with a negative si_code.
>>> copy_siginfo_to_user() is copied a full siginfo to user-space, if
>>> si_code is negative.  signalfd_copyinfo() doesn't do that and can't be
>>> expanded, because it has not compatible format with siginfo_t.
>>>
>>> Another problem is that a constant __SI_* is removed from si_code.
>>> It's not a problem for usual applications, because they expect
>>> a defined type of siginfo (internal logic).
>>> When we want to dump pending signals, we can't predict a type of
>>> siginfo, so we should get it from kernel.
>>>
>>> The main idea of the raw format is that it should be enough for
>>> restoring exactly the same siginfo for the current process.
>>>
>>> This functionality is required for checkpointing pending signals.
>>
>> And this should be used along with pread().
>>
>> I won't argue, but perhaps we should simply postulate that pread()
>> always dumps siginfo in raw format and avoid the new flag?
>
> I won't argue too:), but I think the flag is more flexiable. The flag
> allows to get raw siginfo from read(), it may be useful not only for
> checkpoint/restore. pread with offset = 0 is equal to read(), so the
> interface with the flag is more clear.

Special casing pread() behavior for signalfd() so that SFD_RAW is not
required seems only to make the ugliness worse...

While I appreciate the need for this API, multiplexing a queue
selector and a positional value into what is normally a byte offset in
pread() is really bizarre. This is ugly and I wonder if something
better is possible, though so far I have no suggestions beyond adding
a new system call.


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
  2013-01-10  9:47       ` Andrey Wagin
  2013-01-10 22:45         ` Michael Kerrisk (man-pages)
@ 2013-01-12 18:55         ` Oleg Nesterov
  1 sibling, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2013-01-12 18:55 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api,
	Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

On 01/10, Andrey Wagin wrote:
>
> 2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> >>
> >> This functionality is required for checkpointing pending signals.
> >
> > And this should be used along with pread().
> >
> > I won't argue, but perhaps we should simply postulate that pread()
> > always dumps siginfo in raw format and avoid the new flag?
>
> I won't argue too:), but I think the flag is more flexiable. The flag
> allows to get raw siginfo from read(), it may be useful not only for
> checkpoint/restore. pread with offset = 0 is equal to read(), so the
> interface with the flag is more clear.

OK, I agree.

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] signal: allow to send any siginfo to itself
  2013-01-14 16:53 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending " Andrey Vagin
@ 2013-01-14 16:53 ` Andrey Vagin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Vagin @ 2013-01-14 16:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov,
	Serge Hallyn, Andrew Morton, Eric W. Biederman, Al Viro,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

A kernel prevents of sending siginfo with positive si_code, because
these codes is reserved for kernel. I think we can allow to send any
siginfo to itself. This operation should not be dangerous.

This functionality is required for restoring signals.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 kernel/signal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7aaa51d..ac5f5e7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2944,7 +2944,8 @@ SYSCALL_DEFINE3(rt_sigqueueinfo, pid_t, pid, int, sig,
 	/* Not even root can pretend to send signals from the kernel.
 	 * Nor can they impersonate a kill()/tgkill(), which adds source info.
 	 */
-	if (info.si_code >= 0 || info.si_code == SI_TKILL) {
+	if (((info.si_code >= 0 || info.si_code == SI_TKILL)) &&
+	    (task_pid_vnr(current) != pid)) {
 		/* We used to allow any < 0 si_code */
 		WARN_ON_ONCE(info.si_code < 0);
 		return -EPERM;
@@ -2964,7 +2965,8 @@ long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig, siginfo_t *info)
 	/* Not even root can pretend to send signals from the kernel.
 	 * Nor can they impersonate a kill()/tgkill(), which adds source info.
 	 */
-	if (info->si_code >= 0 || info->si_code == SI_TKILL) {
+	if ((info->si_code >= 0 || info->si_code == SI_TKILL) &&
+	    (task_pid_vnr(current) != pid)) {
 		/* We used to allow any < 0 si_code */
 		WARN_ON_ONCE(info->si_code < 0);
 		return -EPERM;
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-01-14 16:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin
2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin
2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin
     [not found]   ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2012-12-28 16:14     ` Oleg Nesterov
2013-01-10  9:47       ` Andrey Wagin
2013-01-10 22:45         ` Michael Kerrisk (man-pages)
2013-01-12 18:55         ` Oleg Nesterov
2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin
     [not found]   ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2012-12-28 14:32     ` Oleg Nesterov
     [not found]       ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-12-28 15:40         ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2013-01-14 16:53 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending " Andrey Vagin
2013-01-14 16:53 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).