* [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2)
@ 2012-12-28 10:22 Andrey Vagin
2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Andrey Vagin @ 2012-12-28 10:22 UTC (permalink / raw)
To: linux-kernel
Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Serge Hallyn,
Oleg Nesterov, Andrew Morton, Eric W. Biederman, Al Viro,
Pavel Emelyanov, Cyrill Gorcunov, Michael Kerrisk
The idea is simple. We need to get the siginfo for each signal on dump,
and then return it back on restore.
The first problem is that the kernel doesn’t report complete siginfo-s
in user-space. In a signal handler the kernel strips SI_CODE from
siginfo. When a siginfo is received from signalfd, it has a different
format with fixed sizes of fields. The interface of signalfd was
extended. If a signalfd is created with the flag SFD_RAW, it returns
siginfo in a raw format.
rt_sigqueueinfo looks suitable for restoring signals, but it can’t send
siginfo with a positive si_code, because these codes are reserved for
the kernel. In the real world each person has right to do anything with
himself, so I think a process should able to send any siginfo to itself.
v2: add ability to dump signals without dequeuing them.
pread with non-zero offset is used for this. offset encodes
a queue (private of shared) and a sequence number of a signal
in the queue. Thanks to Oleg for this idea.
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
--
1.7.11.7
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] signal: allow to send any siginfo to itself
2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin
@ 2012-12-28 10:22 ` Andrey Vagin
2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin
2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin
2 siblings, 0 replies; 16+ messages in thread
From: Andrey Vagin @ 2012-12-28 10:22 UTC (permalink / raw)
To: linux-kernel
Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov,
Serge Hallyn, Andrew Morton, Eric W. Biederman, Al Viro,
Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov
A kernel prevents of sending siginfo with positive si_code, because
these codes is reserved for kernel. I think we can allow to send any
siginfo to itself. This operation should not be dangerous.
This functionality is required for restoring signals.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
kernel/signal.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 7aaa51d..ac5f5e7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2944,7 +2944,8 @@ SYSCALL_DEFINE3(rt_sigqueueinfo, pid_t, pid, int, sig,
/* Not even root can pretend to send signals from the kernel.
* Nor can they impersonate a kill()/tgkill(), which adds source info.
*/
- if (info.si_code >= 0 || info.si_code == SI_TKILL) {
+ if (((info.si_code >= 0 || info.si_code == SI_TKILL)) &&
+ (task_pid_vnr(current) != pid)) {
/* We used to allow any < 0 si_code */
WARN_ON_ONCE(info.si_code < 0);
return -EPERM;
@@ -2964,7 +2965,8 @@ long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig, siginfo_t *info)
/* Not even root can pretend to send signals from the kernel.
* Nor can they impersonate a kill()/tgkill(), which adds source info.
*/
- if (info->si_code >= 0 || info->si_code == SI_TKILL) {
+ if ((info->si_code >= 0 || info->si_code == SI_TKILL) &&
+ (task_pid_vnr(current) != pid)) {
/* We used to allow any < 0 si_code */
WARN_ON_ONCE(info->si_code < 0);
return -EPERM;
--
1.7.11.7
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin
2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin
@ 2012-12-28 10:23 ` Andrey Vagin
[not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin
2 siblings, 1 reply; 16+ messages in thread
From: Andrey Vagin @ 2012-12-28 10:23 UTC (permalink / raw)
To: linux-kernel
Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Alexander Viro,
Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov
signalfd should be called with the flag SFD_RAW for that.
signalfd_siginfo is not full for siginfo with a negative si_code.
copy_siginfo_to_user() is copied a full siginfo to user-space, if
si_code is negative. signalfd_copyinfo() doesn't do that and can't be
expanded, because it has not compatible format with siginfo_t.
Another problem is that a constant __SI_* is removed from si_code.
It's not a problem for usual applications, because they expect
a defined type of siginfo (internal logic).
When we want to dump pending signals, we can't predict a type of
siginfo, so we should get it from kernel.
The main idea of the raw format is that it should be enough for
restoring exactly the same siginfo for the current process.
This functionality is required for checkpointing pending signals.
v2: fix a race condition during setting file flags
copy_siginfo_to_user32() if is_compat_task
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
fs/signalfd.c | 64 +++++++++++++++++++++++++++++++++++++++----
include/uapi/linux/signalfd.h | 1 +
2 files changed, 60 insertions(+), 5 deletions(-)
diff --git a/fs/signalfd.c b/fs/signalfd.c
index b534869..4439a81 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -30,6 +30,7 @@
#include <linux/signalfd.h>
#include <linux/syscalls.h>
#include <linux/proc_fs.h>
+#include <linux/compat.h>
void signalfd_cleanup(struct sighand_struct *sighand)
{
@@ -74,6 +75,38 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait)
}
/*
+ * Copy a whole siginfo into users spaces.
+ * The main idea of this format is that it should be enough
+ * for restoring siginfo back into the kernel.
+ */
+static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo,
+ siginfo_t *kinfo)
+{
+ siginfo_t *uinfo = (siginfo_t *) siginfo;
+ int err;
+
+ BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo));
+
+ err = __clear_user(uinfo, sizeof(*uinfo));
+
+#ifdef CONFIG_COMPAT
+ if (unlikely(is_compat_task())) {
+ compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo;
+
+ err |= copy_siginfo_to_user32(compat_uinfo, kinfo);
+ err |= put_user(kinfo->si_code, &compat_uinfo->si_code);
+
+ return err ? -EFAULT: sizeof(*compat_uinfo);
+ }
+#endif
+
+ err |= copy_siginfo_to_user(uinfo, kinfo);
+ err |= put_user(kinfo->si_code, &uinfo->si_code);
+
+ return err ? -EFAULT: sizeof(*uinfo);
+}
+
+/*
* Copied from copy_siginfo_to_user() in kernel/signal.c
*/
static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
@@ -205,6 +238,7 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
struct signalfd_ctx *ctx = file->private_data;
struct signalfd_siginfo __user *siginfo;
int nonblock = file->f_flags & O_NONBLOCK;
+ bool raw = file->f_flags & SFD_RAW;
ssize_t ret, total = 0;
siginfo_t info;
@@ -217,7 +251,12 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
ret = signalfd_dequeue(ctx, &info, nonblock);
if (unlikely(ret <= 0))
break;
- ret = signalfd_copyinfo(siginfo, &info);
+
+ if (raw)
+ ret = signalfd_copy_raw_info(siginfo, &info);
+ else
+ ret = signalfd_copyinfo(siginfo, &info);
+
if (ret < 0)
break;
siginfo++;
@@ -262,7 +301,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK);
- if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK))
+ if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW))
return -EINVAL;
if (sizemask != sizeof(sigset_t) ||
@@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
signotset(&sigmask);
if (ufd == -1) {
+ struct file *file;
ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
return -ENOMEM;
ctx->sigmask = sigmask;
+ ufd = get_unused_fd_flags(flags);
+ if (ufd < 0) {
+ kfree(ctx);
+ goto out;
+ }
+
/*
* When we call this, the initialization must be complete, since
* anon_inode_getfd() will install the fd.
*/
- ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
+ file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
- if (ufd < 0)
+ if (IS_ERR(file)) {
+ put_unused_fd(ufd);
+ ufd = PTR_ERR(file);
kfree(ctx);
+ goto out;
+ }
+
+ file->f_flags |= flags & SFD_RAW;
+
+ fd_install(ufd, file);
} else {
struct fd f = fdget(ufd);
if (!f.file)
@@ -302,7 +356,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
wake_up(¤t->sighand->signalfd_wqh);
fdput(f);
}
-
+out:
return ufd;
}
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index 492c6de..bc31849 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -15,6 +15,7 @@
/* Flags for signalfd4. */
#define SFD_CLOEXEC O_CLOEXEC
#define SFD_NONBLOCK O_NONBLOCK
+#define SFD_RAW O_DIRECT
struct signalfd_siginfo {
__u32 ssi_signo;
--
1.7.11.7
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3)
2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin
2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin
2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin
@ 2012-12-28 10:23 ` Andrey Vagin
[not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2 siblings, 1 reply; 16+ messages in thread
From: Andrey Vagin @ 2012-12-28 10:23 UTC (permalink / raw)
To: linux-kernel
Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov,
Alexander Viro, Paul E. McKenney, David Howells, Dave Jones,
Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov
pread(fd, buf, size, pos) with non-zero pos returns siginfo-s
without dequeuing signals.
A sequence number and a queue are encoded in pos.
pos = seq + SFD_*_OFFSET
seq is a sequence number of a signal in a queue.
SFD_PER_THREAD_QUEUE_OFFSET - read signals from a per-thread queue.
SFD_SHARED_QUEUE_OFFSET - read signals from a shared (process wide) queue.
This functionality is required for checkpointing pending signals.
v2: llseek() can't be used here, because peek_offset/f_pos/whatever
has to be shared with all processes which have this file opened.
Suppose that the task forks after sys_signalfd(). Now if parent or child
do llseek this affects them both. This is insane because signalfd is
"strange" to say at least, fork/dup/etc inherits signalfd_ctx but not
the" source" of the data. // Oleg Nesterov
v3: minor cleanup
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
fs/signalfd.c | 44 ++++++++++++++++++++++++++++++++++++++++++-
include/uapi/linux/signalfd.h | 5 +++++
2 files changed, 48 insertions(+), 1 deletion(-)
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 4439a81..86eb130 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -51,6 +51,43 @@ struct signalfd_ctx {
sigset_t sigmask;
};
+static ssize_t signalfd_peek(struct signalfd_ctx *ctx,
+ siginfo_t *info, loff_t *ppos)
+{
+ struct sigpending *pending;
+ struct sigqueue *q;
+ loff_t seq;
+ int ret = 0;
+
+ spin_lock_irq(¤t->sighand->siglock);
+
+ if (*ppos >= SFD_SHARED_QUEUE_OFFSET) {
+ pending = ¤t->signal->shared_pending;
+ seq = *ppos - SFD_SHARED_QUEUE_OFFSET;
+ } else {
+ pending = ¤t->pending;
+ seq = *ppos - SFD_PER_THREAD_QUEUE_OFFSET;
+ }
+
+ list_for_each_entry(q, &pending->list, list) {
+ if (sigismember(&ctx->sigmask, q->info.si_signo))
+ continue;
+
+ if (seq-- == 0) {
+ copy_siginfo(info, &q->info);
+ ret = info->si_signo;
+ break;
+ }
+ }
+
+ spin_unlock_irq(¤t->sighand->siglock);
+
+ if (ret)
+ (*ppos)++;
+
+ return ret;
+}
+
static int signalfd_release(struct inode *inode, struct file *file)
{
kfree(file->private_data);
@@ -248,7 +285,11 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
siginfo = (struct signalfd_siginfo __user *) buf;
do {
- ret = signalfd_dequeue(ctx, &info, nonblock);
+ if (*ppos == 0)
+ ret = signalfd_dequeue(ctx, &info, nonblock);
+ else
+ ret = signalfd_peek(ctx, &info, ppos);
+
if (unlikely(ret <= 0))
break;
@@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
}
file->f_flags |= flags & SFD_RAW;
+ file->f_mode |= FMODE_PREAD;
fd_install(ufd, file);
} else {
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index bc31849..0953785 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -17,6 +17,11 @@
#define SFD_NONBLOCK O_NONBLOCK
#define SFD_RAW O_DIRECT
+/* Read signals from a shared (process wide) queue */
+#define SFD_SHARED_QUEUE_OFFSET (1LL << 62)
+/* Read signals from a per-thread queue */
+#define SFD_PER_THREAD_QUEUE_OFFSET 1
+
struct signalfd_siginfo {
__u32 ssi_signo;
__s32 ssi_errno;
--
1.7.11.7
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3)
[not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2012-12-28 14:32 ` Oleg Nesterov
[not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2012-12-28 14:32 UTC (permalink / raw)
To: Andrey Vagin, Linus Torvalds
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk,
Pavel Emelyanov, Cyrill Gorcunov
On 12/28, Andrey Vagin wrote:
>
> pread(fd, buf, size, pos) with non-zero pos returns siginfo-s
> without dequeuing signals.
>
> A sequence number and a queue are encoded in pos.
>
> pos = seq + SFD_*_OFFSET
>
> seq is a sequence number of a signal in a queue.
>
> SFD_PER_THREAD_QUEUE_OFFSET - read signals from a per-thread queue.
> SFD_SHARED_QUEUE_OFFSET - read signals from a shared (process wide) queue.
>
> This functionality is required for checkpointing pending signals.
>
> v2: llseek() can't be used here, because peek_offset/f_pos/whatever
> has to be shared with all processes which have this file opened.
>
> Suppose that the task forks after sys_signalfd(). Now if parent or child
> do llseek this affects them both. This is insane because signalfd is
> "strange" to say at least, fork/dup/etc inherits signalfd_ctx but not
> the" source" of the data. // Oleg Nesterov
I think we should cc Linus.
This patch adds the hack and it makes signalfd even more strange.
Yes, this hack was suggested by me because I can't suggest something
better. But if Linus dislikes this user-visible API it would be better
to get his nack right now.
> +static ssize_t signalfd_peek(struct signalfd_ctx *ctx,
> + siginfo_t *info, loff_t *ppos)
> +{
> + struct sigpending *pending;
> + struct sigqueue *q;
> + loff_t seq;
> + int ret = 0;
> +
> + spin_lock_irq(¤t->sighand->siglock);
> +
> + if (*ppos >= SFD_SHARED_QUEUE_OFFSET) {
> + pending = ¤t->signal->shared_pending;
> + seq = *ppos - SFD_SHARED_QUEUE_OFFSET;
> + } else {
> + pending = ¤t->pending;
> + seq = *ppos - SFD_PER_THREAD_QUEUE_OFFSET;
> + }
You can do this outside of spin_lock_irq().
And I think it would be better to check SFD_PRIVATE_QUEUE_OFFSET too
although this is not strictly necessary. Otherwise this code assumes
that sys_pread() cheks pos >= 0 and SFD_PRIVATE_QUEUE_OFFSET == 1.
> + list_for_each_entry(q, &pending->list, list) {
> + if (sigismember(&ctx->sigmask, q->info.si_signo))
> + continue;
> +
> + if (seq-- == 0) {
> + copy_siginfo(info, &q->info);
> + ret = info->si_signo;
> + break;
> + }
> + }
> +
> + spin_unlock_irq(¤t->sighand->siglock);
> +
> + if (ret)
> + (*ppos)++;
We can change it unconditionally but I won't argue.
> @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
> }
>
> file->f_flags |= flags & SFD_RAW;
> + file->f_mode |= FMODE_PREAD;
Again, this is not needed or the code was broken by the previous patch.
Given that 2/3 passes O_RDWR to anon_inode_getfile() I think FMODE_PREAD
should be already set. Note OPEN_FMODE(flags) in anon_inode_getfile().
Oleg.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3)
[not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-12-28 15:40 ` Oleg Nesterov
0 siblings, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2012-12-28 15:40 UTC (permalink / raw)
To: Andrey Vagin, Linus Torvalds
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk,
Pavel Emelyanov, Cyrill Gorcunov
On 12/28, Oleg Nesterov wrote:
>
> On 12/28, Andrey Vagin wrote:
> >
> > @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
> > }
> >
> > file->f_flags |= flags & SFD_RAW;
> > + file->f_mode |= FMODE_PREAD;
>
> Again, this is not needed or the code was broken by the previous patch.
>
> Given that 2/3 passes O_RDWR to anon_inode_getfile() I think FMODE_PREAD
> should be already set. Note OPEN_FMODE(flags) in anon_inode_getfile().
As you explained in another thread I was wrong, I confused FMODE_PREAD
and FMODE_READ.
Oleg.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
[not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2012-12-28 16:14 ` Oleg Nesterov
2013-01-10 9:47 ` Andrey Wagin
0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2012-12-28 16:14 UTC (permalink / raw)
To: Andrey Vagin, Linus Torvalds
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
Paul E. McKenney, David Howells, Thomas Gleixner, Michael Kerrisk,
Pavel Emelyanov, Cyrill Gorcunov
On 12/28, Andrey Vagin wrote:
>
> signalfd should be called with the flag SFD_RAW for that.
>
> signalfd_siginfo is not full for siginfo with a negative si_code.
> copy_siginfo_to_user() is copied a full siginfo to user-space, if
> si_code is negative. signalfd_copyinfo() doesn't do that and can't be
> expanded, because it has not compatible format with siginfo_t.
>
> Another problem is that a constant __SI_* is removed from si_code.
> It's not a problem for usual applications, because they expect
> a defined type of siginfo (internal logic).
> When we want to dump pending signals, we can't predict a type of
> siginfo, so we should get it from kernel.
>
> The main idea of the raw format is that it should be enough for
> restoring exactly the same siginfo for the current process.
>
> This functionality is required for checkpointing pending signals.
And this should be used along with pread().
I won't argue, but perhaps we should simply postulate that pread()
always dumps siginfo in raw format and avoid the new flag?
> @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
> signotset(&sigmask);
>
> if (ufd == -1) {
> + struct file *file;
> ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> if (!ctx)
> return -ENOMEM;
>
> ctx->sigmask = sigmask;
>
> + ufd = get_unused_fd_flags(flags);
> + if (ufd < 0) {
> + kfree(ctx);
> + goto out;
> + }
> +
> /*
> * When we call this, the initialization must be complete, since
> * anon_inode_getfd() will install the fd.
> */
> - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
> + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
> O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
> - if (ufd < 0)
> + if (IS_ERR(file)) {
> + put_unused_fd(ufd);
> + ufd = PTR_ERR(file);
> kfree(ctx);
> + goto out;
> + }
> +
> + file->f_flags |= flags & SFD_RAW;
> +
> + fd_install(ufd, file);
And this is needed because we want to set more bits in f_flags/f_mode.
I am wondering if we can change anon_inode_getfd/getfile somehow so
that the user can pass more flags...
Or. Currently fops->open() is never used if this file_operations is used
by anon_inode_get*. So perhaps we can change anon_inode_getfile() to call
fops->open() if it is not zero before return?
OK, this is almost off-topic, please ignore.
Oleg.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
2012-12-28 16:14 ` Oleg Nesterov
@ 2013-01-10 9:47 ` Andrey Wagin
2013-01-10 22:45 ` Michael Kerrisk (man-pages)
2013-01-12 18:55 ` Oleg Nesterov
0 siblings, 2 replies; 16+ messages in thread
From: Andrey Wagin @ 2013-01-10 9:47 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api,
Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner,
Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov
2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> On 12/28, Andrey Vagin wrote:
>>
>> signalfd should be called with the flag SFD_RAW for that.
>>
>> signalfd_siginfo is not full for siginfo with a negative si_code.
>> copy_siginfo_to_user() is copied a full siginfo to user-space, if
>> si_code is negative. signalfd_copyinfo() doesn't do that and can't be
>> expanded, because it has not compatible format with siginfo_t.
>>
>> Another problem is that a constant __SI_* is removed from si_code.
>> It's not a problem for usual applications, because they expect
>> a defined type of siginfo (internal logic).
>> When we want to dump pending signals, we can't predict a type of
>> siginfo, so we should get it from kernel.
>>
>> The main idea of the raw format is that it should be enough for
>> restoring exactly the same siginfo for the current process.
>>
>> This functionality is required for checkpointing pending signals.
>
> And this should be used along with pread().
>
> I won't argue, but perhaps we should simply postulate that pread()
> always dumps siginfo in raw format and avoid the new flag?
I won't argue too:), but I think the flag is more flexiable. The flag
allows to get raw siginfo from read(), it may be useful not only for
checkpoint/restore. pread with offset = 0 is equal to read(), so the
interface with the flag is more clear.
>
>> @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>> signotset(&sigmask);
>>
>> if (ufd == -1) {
>> + struct file *file;
>> ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>> if (!ctx)
>> return -ENOMEM;
>>
>> ctx->sigmask = sigmask;
>>
>> + ufd = get_unused_fd_flags(flags);
>> + if (ufd < 0) {
>> + kfree(ctx);
>> + goto out;
>> + }
>> +
>> /*
>> * When we call this, the initialization must be complete, since
>> * anon_inode_getfd() will install the fd.
>> */
>> - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
>> + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
>> O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
>> - if (ufd < 0)
>> + if (IS_ERR(file)) {
>> + put_unused_fd(ufd);
>> + ufd = PTR_ERR(file);
>> kfree(ctx);
>> + goto out;
>> + }
>> +
>> + file->f_flags |= flags & SFD_RAW;
>> +
>> + fd_install(ufd, file);
>
> And this is needed because we want to set more bits in f_flags/f_mode.
>
> I am wondering if we can change anon_inode_getfd/getfile somehow so
> that the user can pass more flags...
>
> Or. Currently fops->open() is never used if this file_operations is used
> by anon_inode_get*. So perhaps we can change anon_inode_getfile() to call
> fops->open() if it is not zero before return?
>
> OK, this is almost off-topic, please ignore.
>
> Oleg.
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
2013-01-10 9:47 ` Andrey Wagin
@ 2013-01-10 22:45 ` Michael Kerrisk (man-pages)
2013-01-12 18:55 ` Oleg Nesterov
1 sibling, 0 replies; 16+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-10 22:45 UTC (permalink / raw)
To: Andrey Wagin
Cc: Oleg Nesterov, Linus Torvalds, linux-kernel, criu, linux-fsdevel,
linux-api, Alexander Viro, Paul E. McKenney, David Howells,
Thomas Gleixner, Pavel Emelyanov, Cyrill Gorcunov
On Thu, Jan 10, 2013 at 10:47 AM, Andrey Wagin <avagin@gmail.com> wrote:
> 2012/12/28 Oleg Nesterov <oleg@redhat.com>:
>> On 12/28, Andrey Vagin wrote:
>>>
>>> signalfd should be called with the flag SFD_RAW for that.
>>>
>>> signalfd_siginfo is not full for siginfo with a negative si_code.
>>> copy_siginfo_to_user() is copied a full siginfo to user-space, if
>>> si_code is negative. signalfd_copyinfo() doesn't do that and can't be
>>> expanded, because it has not compatible format with siginfo_t.
>>>
>>> Another problem is that a constant __SI_* is removed from si_code.
>>> It's not a problem for usual applications, because they expect
>>> a defined type of siginfo (internal logic).
>>> When we want to dump pending signals, we can't predict a type of
>>> siginfo, so we should get it from kernel.
>>>
>>> The main idea of the raw format is that it should be enough for
>>> restoring exactly the same siginfo for the current process.
>>>
>>> This functionality is required for checkpointing pending signals.
>>
>> And this should be used along with pread().
>>
>> I won't argue, but perhaps we should simply postulate that pread()
>> always dumps siginfo in raw format and avoid the new flag?
>
> I won't argue too:), but I think the flag is more flexiable. The flag
> allows to get raw siginfo from read(), it may be useful not only for
> checkpoint/restore. pread with offset = 0 is equal to read(), so the
> interface with the flag is more clear.
Special casing pread() behavior for signalfd() so that SFD_RAW is not
required seems only to make the ugliness worse...
While I appreciate the need for this API, multiplexing a queue
selector and a positional value into what is normally a byte offset in
pread() is really bizarre. This is ugly and I wonder if something
better is possible, though so far I have no suggestions beyond adding
a new system call.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
2013-01-10 9:47 ` Andrey Wagin
2013-01-10 22:45 ` Michael Kerrisk (man-pages)
@ 2013-01-12 18:55 ` Oleg Nesterov
1 sibling, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2013-01-12 18:55 UTC (permalink / raw)
To: Andrey Wagin
Cc: Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api,
Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner,
Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov
On 01/10, Andrey Wagin wrote:
>
> 2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> >>
> >> This functionality is required for checkpointing pending signals.
> >
> > And this should be used along with pread().
> >
> > I won't argue, but perhaps we should simply postulate that pread()
> > always dumps siginfo in raw format and avoid the new flag?
>
> I won't argue too:), but I think the flag is more flexiable. The flag
> allows to get raw siginfo from read(), it may be useful not only for
> checkpoint/restore. pread with offset = 0 is equal to read(), so the
> interface with the flag is more clear.
OK, I agree.
Oleg.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
2013-01-14 16:53 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending " Andrey Vagin
@ 2013-01-14 16:53 ` Andrey Vagin
2013-01-16 20:35 ` Andrew Morton
0 siblings, 1 reply; 16+ messages in thread
From: Andrey Vagin @ 2013-01-14 16:53 UTC (permalink / raw)
To: linux-kernel
Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Alexander Viro,
Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov
signalfd should be called with the flag SFD_RAW for that.
signalfd_siginfo is not full for siginfo with a negative si_code.
copy_siginfo_to_user() is copied a full siginfo to user-space, if
si_code is negative. signalfd_copyinfo() doesn't do that and can't be
expanded, because it has not compatible format with siginfo_t.
Another problem is that a constant __SI_* is removed from si_code.
It's not a problem for usual applications, because they expect
a defined type of siginfo (internal logic).
When we want to dump pending signals, we can't predict a type of
siginfo, so we should get it from kernel.
The main idea of the raw format is that it should be enough for
restoring exactly the same siginfo for the current process.
This functionality is required for checkpointing pending signals.
v2: fix a race condition during setting file flags
copy_siginfo_to_user32() if is_compat_task
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
fs/signalfd.c | 64 +++++++++++++++++++++++++++++++++++++++----
include/uapi/linux/signalfd.h | 1 +
2 files changed, 60 insertions(+), 5 deletions(-)
diff --git a/fs/signalfd.c b/fs/signalfd.c
index b534869..4439a81 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -30,6 +30,7 @@
#include <linux/signalfd.h>
#include <linux/syscalls.h>
#include <linux/proc_fs.h>
+#include <linux/compat.h>
void signalfd_cleanup(struct sighand_struct *sighand)
{
@@ -74,6 +75,38 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait)
}
/*
+ * Copy a whole siginfo into users spaces.
+ * The main idea of this format is that it should be enough
+ * for restoring siginfo back into the kernel.
+ */
+static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo,
+ siginfo_t *kinfo)
+{
+ siginfo_t *uinfo = (siginfo_t *) siginfo;
+ int err;
+
+ BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo));
+
+ err = __clear_user(uinfo, sizeof(*uinfo));
+
+#ifdef CONFIG_COMPAT
+ if (unlikely(is_compat_task())) {
+ compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo;
+
+ err |= copy_siginfo_to_user32(compat_uinfo, kinfo);
+ err |= put_user(kinfo->si_code, &compat_uinfo->si_code);
+
+ return err ? -EFAULT: sizeof(*compat_uinfo);
+ }
+#endif
+
+ err |= copy_siginfo_to_user(uinfo, kinfo);
+ err |= put_user(kinfo->si_code, &uinfo->si_code);
+
+ return err ? -EFAULT: sizeof(*uinfo);
+}
+
+/*
* Copied from copy_siginfo_to_user() in kernel/signal.c
*/
static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
@@ -205,6 +238,7 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
struct signalfd_ctx *ctx = file->private_data;
struct signalfd_siginfo __user *siginfo;
int nonblock = file->f_flags & O_NONBLOCK;
+ bool raw = file->f_flags & SFD_RAW;
ssize_t ret, total = 0;
siginfo_t info;
@@ -217,7 +251,12 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
ret = signalfd_dequeue(ctx, &info, nonblock);
if (unlikely(ret <= 0))
break;
- ret = signalfd_copyinfo(siginfo, &info);
+
+ if (raw)
+ ret = signalfd_copy_raw_info(siginfo, &info);
+ else
+ ret = signalfd_copyinfo(siginfo, &info);
+
if (ret < 0)
break;
siginfo++;
@@ -262,7 +301,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK);
- if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK))
+ if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW))
return -EINVAL;
if (sizemask != sizeof(sigset_t) ||
@@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
signotset(&sigmask);
if (ufd == -1) {
+ struct file *file;
ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
return -ENOMEM;
ctx->sigmask = sigmask;
+ ufd = get_unused_fd_flags(flags);
+ if (ufd < 0) {
+ kfree(ctx);
+ goto out;
+ }
+
/*
* When we call this, the initialization must be complete, since
* anon_inode_getfd() will install the fd.
*/
- ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
+ file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx,
O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
- if (ufd < 0)
+ if (IS_ERR(file)) {
+ put_unused_fd(ufd);
+ ufd = PTR_ERR(file);
kfree(ctx);
+ goto out;
+ }
+
+ file->f_flags |= flags & SFD_RAW;
+
+ fd_install(ufd, file);
} else {
struct fd f = fdget(ufd);
if (!f.file)
@@ -302,7 +356,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
wake_up(¤t->sighand->signalfd_wqh);
fdput(f);
}
-
+out:
return ufd;
}
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index 492c6de..bc31849 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -15,6 +15,7 @@
/* Flags for signalfd4. */
#define SFD_CLOEXEC O_CLOEXEC
#define SFD_NONBLOCK O_NONBLOCK
+#define SFD_RAW O_DIRECT
struct signalfd_siginfo {
__u32 ssi_signo;
--
1.7.11.7
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
2013-01-14 16:53 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin
@ 2013-01-16 20:35 ` Andrew Morton
2013-01-17 15:28 ` Andrew Vagin
[not found] ` <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
0 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2013-01-16 20:35 UTC (permalink / raw)
To: Andrey Vagin
Cc: linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro,
Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov
On Mon, 14 Jan 2013 20:53:54 +0400
Andrey Vagin <avagin@openvz.org> wrote:
> signalfd should be called with the flag SFD_RAW for that.
>
> signalfd_siginfo is not full for siginfo with a negative si_code.
> copy_siginfo_to_user() is copied a full siginfo to user-space, if
> si_code is negative. signalfd_copyinfo() doesn't do that and can't be
> expanded, because it has not compatible format with siginfo_t.
>
> Another problem is that a constant __SI_* is removed from si_code.
> It's not a problem for usual applications, because they expect
> a defined type of siginfo (internal logic).
> When we want to dump pending signals, we can't predict a type of
> siginfo, so we should get it from kernel.
>
> The main idea of the raw format is that it should be enough for
> restoring exactly the same siginfo for the current process.
>
> This functionality is required for checkpointing pending signals.
>
> ...
>
> --- a/fs/signalfd.c
> +++ b/fs/signalfd.c
> @@ -30,6 +30,7 @@
> #include <linux/signalfd.h>
> #include <linux/syscalls.h>
> #include <linux/proc_fs.h>
> +#include <linux/compat.h>
>
> void signalfd_cleanup(struct sighand_struct *sighand)
> {
> @@ -74,6 +75,38 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait)
> }
>
> /*
> + * Copy a whole siginfo into users spaces.
"userspace"
> + * The main idea of this format is that it should be enough
> + * for restoring siginfo back into the kernel.
> + */
> +static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo,
> + siginfo_t *kinfo)
> +{
> + siginfo_t *uinfo = (siginfo_t *) siginfo;
Should be
siginfo_t __user *uinfo = (siginfo_t __user *)siginfo;
> + int err;
> +
> + BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo));
> +
> + err = __clear_user(uinfo, sizeof(*uinfo));
>
> +#ifdef CONFIG_COMPAT
> + if (unlikely(is_compat_task())) {
> + compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo;
> +
> + err |= copy_siginfo_to_user32(compat_uinfo, kinfo);
> + err |= put_user(kinfo->si_code, &compat_uinfo->si_code);
> +
> + return err ? -EFAULT: sizeof(*compat_uinfo);
> + }
> +#endif
> +
> + err |= copy_siginfo_to_user(uinfo, kinfo);
> + err |= put_user(kinfo->si_code, &uinfo->si_code);
> +
> + return err ? -EFAULT: sizeof(*uinfo);
> +}
> +
> +/*
> * Copied from copy_siginfo_to_user() in kernel/signal.c
> */
> static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
>
> ...
>
> --- a/include/uapi/linux/signalfd.h
> +++ b/include/uapi/linux/signalfd.h
> @@ -15,6 +15,7 @@
> /* Flags for signalfd4. */
> #define SFD_CLOEXEC O_CLOEXEC
> #define SFD_NONBLOCK O_NONBLOCK
> +#define SFD_RAW O_DIRECT
>
> struct signalfd_siginfo {
> __u32 ssi_signo;
As SFD_RAW is being added to the kernel API we should document it.
Please keep Michael cc'ed and work with him on getting the manpages
updated.
I usually ask that checkpoint-restart specific code be wrapped in
#ifdef CONFIG_CHECKPOINT_RESTORE, mainly so we can identify it all
if/when your project fails and we decide to remove the feature ;) But
as this patch extends the user API I think it simplifies life if we
make the extension permanent. Perhaps this is a bad idea, as
permanently adding this extension to the API makes it harder to ever
remove the c/r feature.
Proposed fixups. Please review and test this and check that sparse is
happy with it.
From: Andrew Morton <akpm@linux-foundation.org>
Subject: signalfd-add-ability-to-return-siginfo-in-a-raw-format-v2-fix
fix __user annotations, tidy comments and code layout
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/signalfd.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff -puN fs/signalfd.c~signalfd-add-ability-to-return-siginfo-in-a-raw-format-v2-fix fs/signalfd.c
--- a/fs/signalfd.c~signalfd-add-ability-to-return-siginfo-in-a-raw-format-v2-fix
+++ a/fs/signalfd.c
@@ -75,14 +75,14 @@ static unsigned int signalfd_poll(struct
}
/*
- * Copy a whole siginfo into users spaces.
+ * Copy a whole siginfo into userspace.
* The main idea of this format is that it should be enough
* for restoring siginfo back into the kernel.
*/
static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo,
siginfo_t *kinfo)
{
- siginfo_t *uinfo = (siginfo_t *) siginfo;
+ siginfo_t __user *uinfo = (siginfo_t __user *)siginfo;
int err;
BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo));
@@ -91,19 +91,20 @@ static int signalfd_copy_raw_info(struct
#ifdef CONFIG_COMPAT
if (unlikely(is_compat_task())) {
- compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo;
+ compat_siginfo_t __user *compat_uinfo;
+ compat_uinfo = (compat_siginfo_t __user *)siginfo;
err |= copy_siginfo_to_user32(compat_uinfo, kinfo);
err |= put_user(kinfo->si_code, &compat_uinfo->si_code);
- return err ? -EFAULT: sizeof(*compat_uinfo);
+ return err ? -EFAULT : sizeof(*compat_uinfo);
}
#endif
err |= copy_siginfo_to_user(uinfo, kinfo);
err |= put_user(kinfo->si_code, &uinfo->si_code);
- return err ? -EFAULT: sizeof(*uinfo);
+ return err ? -EFAULT : sizeof(*uinfo);
}
/*
_
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
2013-01-16 20:35 ` Andrew Morton
@ 2013-01-17 15:28 ` Andrew Vagin
[not found] ` <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
1 sibling, 0 replies; 16+ messages in thread
From: Andrew Vagin @ 2013-01-17 15:28 UTC (permalink / raw)
To: Andrew Morton
Cc: Andrey Vagin, linux-kernel, criu, linux-fsdevel, linux-api,
Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner,
Oleg Nesterov, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov
On Wed, Jan 16, 2013 at 12:35:02PM -0800, Andrew Morton wrote:
> > --- a/include/uapi/linux/signalfd.h
> > +++ b/include/uapi/linux/signalfd.h
> > @@ -15,6 +15,7 @@
> > /* Flags for signalfd4. */
> > #define SFD_CLOEXEC O_CLOEXEC
> > #define SFD_NONBLOCK O_NONBLOCK
> > +#define SFD_RAW O_DIRECT
> >
> > struct signalfd_siginfo {
> > __u32 ssi_signo;
>
> As SFD_RAW is being added to the kernel API we should document it.
> Please keep Michael cc'ed and work with him on getting the manpages
> updated.
Ok
>
> I usually ask that checkpoint-restart specific code be wrapped in
> #ifdef CONFIG_CHECKPOINT_RESTORE, mainly so we can identify it all
> if/when your project fails and we decide to remove the feature ;) But
> as this patch extends the user API I think it simplifies life if we
> make the extension permanent. Perhaps this is a bad idea, as
> permanently adding this extension to the API makes it harder to ever
> remove the c/r feature.
I think CRIU is already enough functional to be successful ;).
It already can dump/restore a Linux Container with Oracle Data Base,
Apache and a few other deamons.
>
>
> Proposed fixups. Please review and test this and check that sparse is
> happy with it.
>
I have tested this patch. All is ok and sparce are happy with it.
Thank you.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
[not found] ` <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2013-01-18 23:27 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkgHVB3=k_XOevobcMWuEqy2r75tdTc85ZYiD8rkn5OZKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-18 23:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Andrey Vagin, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
Pavel Emelyanov, Cyrill Gorcunov
> As SFD_RAW is being added to the kernel API we should document it.
> Please keep Michael cc'ed and work with him on getting the manpages
> updated.
The API is certainly no thing of beauty, as I already remarked here:
http://thread.gmane.org/gmane.linux.file-systems/70420/focus=1420228
A description of the API can be found here:
https://lwn.net/Articles/531939/
The main problem is the ugliness of changing the meaning of pread()'s
'offset' argument to be a multiplexed [signal queue selector + queue
position].
I do wonder if we can do better, though I have no particular solution
to offer beyond the sledgehammer of adding a new system call.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
[not found] ` <CAKgNAkgHVB3=k_XOevobcMWuEqy2r75tdTc85ZYiD8rkn5OZKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-19 10:50 ` Andrey Wagin
2013-01-19 23:27 ` Michael Kerrisk (man-pages)
0 siblings, 1 reply; 16+ messages in thread
From: Andrey Wagin @ 2013-01-19 10:50 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Andrew Morton, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
Pavel Emelyanov, Cyrill Gorcunov
2013/1/19 Michael Kerrisk (man-pages) <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>> As SFD_RAW is being added to the kernel API we should document it.
>> Please keep Michael cc'ed and work with him on getting the manpages
>> updated.
>
> The API is certainly no thing of beauty, as I already remarked here:
> http://thread.gmane.org/gmane.linux.file-systems/70420/focus=1420228
All of us have similar thoughts, but we don't know a better solution.
Here is an example of usage signalfd and pread:
http://lists.openvz.org/pipermail/criu/2013-January/007040.html
>
> A description of the API can be found here:
> https://lwn.net/Articles/531939/
Thanks you for the attempt to involve more people to this discussion.
>
> The main problem is the ugliness of changing the meaning of pread()'s
> 'offset' argument to be a multiplexed [signal queue selector + queue
> position].
>
> I do wonder if we can do better, though I have no particular solution
> to offer beyond the sledgehammer of adding a new system call.
If we are choosing between this interface or a new syscall, I would
prefer this interface. signalfd is a special descriptor, so I think it
is not a big deal, that it works a bit strange. If all other would
decides, that a new syscall is better, I will not ague.
Michael, thank you for the comments.
>
> Cheers,
>
> Michael
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Author of "The Linux Programming Interface"; http://man7.org/tlpi/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2)
2013-01-19 10:50 ` Andrey Wagin
@ 2013-01-19 23:27 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 16+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-19 23:27 UTC (permalink / raw)
To: Andrey Wagin
Cc: Andrew Morton, linux-kernel, criu, linux-fsdevel, linux-api,
Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner,
Oleg Nesterov, Pavel Emelyanov, Cyrill Gorcunov
Hello Andrey,
On Sat, Jan 19, 2013 at 11:50 AM, Andrey Wagin <avagin@gmail.com> wrote:
> 2013/1/19 Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>:
>>> As SFD_RAW is being added to the kernel API we should document it.
>>> Please keep Michael cc'ed and work with him on getting the manpages
>>> updated.
>>
>> The API is certainly no thing of beauty, as I already remarked here:
>> http://thread.gmane.org/gmane.linux.file-systems/70420/focus=1420228
>
> All of us have similar thoughts, but we don't know a better solution.
>
> Here is an example of usage signalfd and pread:
> http://lists.openvz.org/pipermail/criu/2013-January/007040.html
>
>>
>> A description of the API can be found here:
>> https://lwn.net/Articles/531939/
>
> Thanks you for the attempt to involve more people to this discussion.
>
>>
>> The main problem is the ugliness of changing the meaning of pread()'s
>> 'offset' argument to be a multiplexed [signal queue selector + queue
>> position].
>>
>> I do wonder if we can do better, though I have no particular solution
>> to offer beyond the sledgehammer of adding a new system call.
>
> If we are choosing between this interface or a new syscall, I would
> prefer this interface.
A agree. That's why I called using a new syscall a sledgehammer.
Still, I just wanted to remind folk who might think of something
better.
> signalfd is a special descriptor, so I think it
> is not a big deal, that it works a bit strange.
Sure, but the more we special case things, the uglier the ABI as a
whole becomes. So special casing should be avoided as far as we can.
> If all other would
> decides, that a new syscall is better, I will not ague.
And that's more or less how I see it too. I'm not going to argue for a
new syscall, based on what I know so far.
Here is one idea to think about though, while more or less maintaining
your proposed interface.
At the moment, you select signal queues in the pread() call. An
alternative would be to do it in the signalfd() call. In other words,
you could have the following flags used with signalfd()
SFD_RAW
SFD_SHARED_QUEUE -- reads will be from process-wide shared signal queue
SFD_PER_THREAD_QUEUE --reads will be from per-thread signal queue
Specifying both SFD_SHARED_QUEUE and SFD_PER_THREAD_QUEUE would be
the same as omitting them both, providing the default behavior of
slecting from both queues.
My point here is that you can then separate the RAW functionality from
the queue selector functionality. Now, it might be that at the moment
you always require that if the caller specifies SFD_SHARED_QUEUE or
SFD_PER_THREAD_QUEUE, then they must also specify SFD_RAW. But later,
that constraint might be relaxed, so that users could use signalfd()
to select from a particular queue when reading traditional (non-RAW)
signalfd_siginfo structures from a signalfd. This does seem like a
very sensible design optimization to make now (and an easy one, I
would suppose). What do you think?
Cheers,
Michael
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2013-01-19 23:27 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin
2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin
2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin
[not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2012-12-28 16:14 ` Oleg Nesterov
2013-01-10 9:47 ` Andrey Wagin
2013-01-10 22:45 ` Michael Kerrisk (man-pages)
2013-01-12 18:55 ` Oleg Nesterov
2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin
[not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2012-12-28 14:32 ` Oleg Nesterov
[not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-12-28 15:40 ` Oleg Nesterov
-- strict thread matches above, loose matches on Subject: below --
2013-01-14 16:53 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending " Andrey Vagin
2013-01-14 16:53 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin
2013-01-16 20:35 ` Andrew Morton
2013-01-17 15:28 ` Andrew Vagin
[not found] ` <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2013-01-18 23:27 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkgHVB3=k_XOevobcMWuEqy2r75tdTc85ZYiD8rkn5OZKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-01-19 10:50 ` Andrey Wagin
2013-01-19 23:27 ` Michael Kerrisk (man-pages)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).