* [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) @ 2012-12-28 10:22 Andrey Vagin 2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Andrey Vagin @ 2012-12-28 10:22 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Serge Hallyn, Oleg Nesterov, Andrew Morton, Eric W. Biederman, Al Viro, Pavel Emelyanov, Cyrill Gorcunov, Michael Kerrisk The idea is simple. We need to get the siginfo for each signal on dump, and then return it back on restore. The first problem is that the kernel doesn’t report complete siginfo-s in user-space. In a signal handler the kernel strips SI_CODE from siginfo. When a siginfo is received from signalfd, it has a different format with fixed sizes of fields. The interface of signalfd was extended. If a signalfd is created with the flag SFD_RAW, it returns siginfo in a raw format. rt_sigqueueinfo looks suitable for restoring signals, but it can’t send siginfo with a positive si_code, because these codes are reserved for the kernel. In the real world each person has right to do anything with himself, so I think a process should able to send any siginfo to itself. v2: add ability to dump signals without dequeuing them. pread with non-zero offset is used for this. offset encodes a queue (private of shared) and a sequence number of a signal in the queue. Thanks to Oleg for this idea. Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] signal: allow to send any siginfo to itself 2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin @ 2012-12-28 10:22 ` Andrey Vagin 2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin 2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin 2 siblings, 0 replies; 16+ messages in thread From: Andrey Vagin @ 2012-12-28 10:22 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov, Serge Hallyn, Andrew Morton, Eric W. Biederman, Al Viro, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov A kernel prevents of sending siginfo with positive si_code, because these codes is reserved for kernel. I think we can allow to send any siginfo to itself. This operation should not be dangerous. This functionality is required for restoring signals. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> --- kernel/signal.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 7aaa51d..ac5f5e7 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2944,7 +2944,8 @@ SYSCALL_DEFINE3(rt_sigqueueinfo, pid_t, pid, int, sig, /* Not even root can pretend to send signals from the kernel. * Nor can they impersonate a kill()/tgkill(), which adds source info. */ - if (info.si_code >= 0 || info.si_code == SI_TKILL) { + if (((info.si_code >= 0 || info.si_code == SI_TKILL)) && + (task_pid_vnr(current) != pid)) { /* We used to allow any < 0 si_code */ WARN_ON_ONCE(info.si_code < 0); return -EPERM; @@ -2964,7 +2965,8 @@ long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig, siginfo_t *info) /* Not even root can pretend to send signals from the kernel. * Nor can they impersonate a kill()/tgkill(), which adds source info. */ - if (info->si_code >= 0 || info->si_code == SI_TKILL) { + if ((info->si_code >= 0 || info->si_code == SI_TKILL) && + (task_pid_vnr(current) != pid)) { /* We used to allow any < 0 si_code */ WARN_ON_ONCE(info->si_code < 0); return -EPERM; -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin 2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin @ 2012-12-28 10:23 ` Andrey Vagin [not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin 2 siblings, 1 reply; 16+ messages in thread From: Andrey Vagin @ 2012-12-28 10:23 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov signalfd should be called with the flag SFD_RAW for that. signalfd_siginfo is not full for siginfo with a negative si_code. copy_siginfo_to_user() is copied a full siginfo to user-space, if si_code is negative. signalfd_copyinfo() doesn't do that and can't be expanded, because it has not compatible format with siginfo_t. Another problem is that a constant __SI_* is removed from si_code. It's not a problem for usual applications, because they expect a defined type of siginfo (internal logic). When we want to dump pending signals, we can't predict a type of siginfo, so we should get it from kernel. The main idea of the raw format is that it should be enough for restoring exactly the same siginfo for the current process. This functionality is required for checkpointing pending signals. v2: fix a race condition during setting file flags copy_siginfo_to_user32() if is_compat_task Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> --- fs/signalfd.c | 64 +++++++++++++++++++++++++++++++++++++++---- include/uapi/linux/signalfd.h | 1 + 2 files changed, 60 insertions(+), 5 deletions(-) diff --git a/fs/signalfd.c b/fs/signalfd.c index b534869..4439a81 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -30,6 +30,7 @@ #include <linux/signalfd.h> #include <linux/syscalls.h> #include <linux/proc_fs.h> +#include <linux/compat.h> void signalfd_cleanup(struct sighand_struct *sighand) { @@ -74,6 +75,38 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait) } /* + * Copy a whole siginfo into users spaces. + * The main idea of this format is that it should be enough + * for restoring siginfo back into the kernel. + */ +static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo, + siginfo_t *kinfo) +{ + siginfo_t *uinfo = (siginfo_t *) siginfo; + int err; + + BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo)); + + err = __clear_user(uinfo, sizeof(*uinfo)); + +#ifdef CONFIG_COMPAT + if (unlikely(is_compat_task())) { + compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo; + + err |= copy_siginfo_to_user32(compat_uinfo, kinfo); + err |= put_user(kinfo->si_code, &compat_uinfo->si_code); + + return err ? -EFAULT: sizeof(*compat_uinfo); + } +#endif + + err |= copy_siginfo_to_user(uinfo, kinfo); + err |= put_user(kinfo->si_code, &uinfo->si_code); + + return err ? -EFAULT: sizeof(*uinfo); +} + +/* * Copied from copy_siginfo_to_user() in kernel/signal.c */ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo, @@ -205,6 +238,7 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, struct signalfd_ctx *ctx = file->private_data; struct signalfd_siginfo __user *siginfo; int nonblock = file->f_flags & O_NONBLOCK; + bool raw = file->f_flags & SFD_RAW; ssize_t ret, total = 0; siginfo_t info; @@ -217,7 +251,12 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, ret = signalfd_dequeue(ctx, &info, nonblock); if (unlikely(ret <= 0)) break; - ret = signalfd_copyinfo(siginfo, &info); + + if (raw) + ret = signalfd_copy_raw_info(siginfo, &info); + else + ret = signalfd_copyinfo(siginfo, &info); + if (ret < 0) break; siginfo++; @@ -262,7 +301,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC); BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK); - if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK)) + if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW)) return -EINVAL; if (sizemask != sizeof(sigset_t) || @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, signotset(&sigmask); if (ufd == -1) { + struct file *file; ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->sigmask = sigmask; + ufd = get_unused_fd_flags(flags); + if (ufd < 0) { + kfree(ctx); + goto out; + } + /* * When we call this, the initialization must be complete, since * anon_inode_getfd() will install the fd. */ - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx, + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx, O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK))); - if (ufd < 0) + if (IS_ERR(file)) { + put_unused_fd(ufd); + ufd = PTR_ERR(file); kfree(ctx); + goto out; + } + + file->f_flags |= flags & SFD_RAW; + + fd_install(ufd, file); } else { struct fd f = fdget(ufd); if (!f.file) @@ -302,7 +356,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, wake_up(¤t->sighand->signalfd_wqh); fdput(f); } - +out: return ufd; } diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h index 492c6de..bc31849 100644 --- a/include/uapi/linux/signalfd.h +++ b/include/uapi/linux/signalfd.h @@ -15,6 +15,7 @@ /* Flags for signalfd4. */ #define SFD_CLOEXEC O_CLOEXEC #define SFD_NONBLOCK O_NONBLOCK +#define SFD_RAW O_DIRECT struct signalfd_siginfo { __u32 ssi_signo; -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 16+ messages in thread
[parent not found: <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) [not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> @ 2012-12-28 16:14 ` Oleg Nesterov 2013-01-10 9:47 ` Andrey Wagin 0 siblings, 1 reply; 16+ messages in thread From: Oleg Nesterov @ 2012-12-28 16:14 UTC (permalink / raw) To: Andrey Vagin, Linus Torvalds Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On 12/28, Andrey Vagin wrote: > > signalfd should be called with the flag SFD_RAW for that. > > signalfd_siginfo is not full for siginfo with a negative si_code. > copy_siginfo_to_user() is copied a full siginfo to user-space, if > si_code is negative. signalfd_copyinfo() doesn't do that and can't be > expanded, because it has not compatible format with siginfo_t. > > Another problem is that a constant __SI_* is removed from si_code. > It's not a problem for usual applications, because they expect > a defined type of siginfo (internal logic). > When we want to dump pending signals, we can't predict a type of > siginfo, so we should get it from kernel. > > The main idea of the raw format is that it should be enough for > restoring exactly the same siginfo for the current process. > > This functionality is required for checkpointing pending signals. And this should be used along with pread(). I won't argue, but perhaps we should simply postulate that pread() always dumps siginfo in raw format and avoid the new flag? > @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, > signotset(&sigmask); > > if (ufd == -1) { > + struct file *file; > ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); > if (!ctx) > return -ENOMEM; > > ctx->sigmask = sigmask; > > + ufd = get_unused_fd_flags(flags); > + if (ufd < 0) { > + kfree(ctx); > + goto out; > + } > + > /* > * When we call this, the initialization must be complete, since > * anon_inode_getfd() will install the fd. > */ > - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx, > + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx, > O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK))); > - if (ufd < 0) > + if (IS_ERR(file)) { > + put_unused_fd(ufd); > + ufd = PTR_ERR(file); > kfree(ctx); > + goto out; > + } > + > + file->f_flags |= flags & SFD_RAW; > + > + fd_install(ufd, file); And this is needed because we want to set more bits in f_flags/f_mode. I am wondering if we can change anon_inode_getfd/getfile somehow so that the user can pass more flags... Or. Currently fops->open() is never used if this file_operations is used by anon_inode_get*. So perhaps we can change anon_inode_getfile() to call fops->open() if it is not zero before return? OK, this is almost off-topic, please ignore. Oleg. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2012-12-28 16:14 ` Oleg Nesterov @ 2013-01-10 9:47 ` Andrey Wagin 2013-01-10 22:45 ` Michael Kerrisk (man-pages) 2013-01-12 18:55 ` Oleg Nesterov 0 siblings, 2 replies; 16+ messages in thread From: Andrey Wagin @ 2013-01-10 9:47 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov 2012/12/28 Oleg Nesterov <oleg@redhat.com>: > On 12/28, Andrey Vagin wrote: >> >> signalfd should be called with the flag SFD_RAW for that. >> >> signalfd_siginfo is not full for siginfo with a negative si_code. >> copy_siginfo_to_user() is copied a full siginfo to user-space, if >> si_code is negative. signalfd_copyinfo() doesn't do that and can't be >> expanded, because it has not compatible format with siginfo_t. >> >> Another problem is that a constant __SI_* is removed from si_code. >> It's not a problem for usual applications, because they expect >> a defined type of siginfo (internal logic). >> When we want to dump pending signals, we can't predict a type of >> siginfo, so we should get it from kernel. >> >> The main idea of the raw format is that it should be enough for >> restoring exactly the same siginfo for the current process. >> >> This functionality is required for checkpointing pending signals. > > And this should be used along with pread(). > > I won't argue, but perhaps we should simply postulate that pread() > always dumps siginfo in raw format and avoid the new flag? I won't argue too:), but I think the flag is more flexiable. The flag allows to get raw siginfo from read(), it may be useful not only for checkpoint/restore. pread with offset = 0 is equal to read(), so the interface with the flag is more clear. > >> @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, >> signotset(&sigmask); >> >> if (ufd == -1) { >> + struct file *file; >> ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); >> if (!ctx) >> return -ENOMEM; >> >> ctx->sigmask = sigmask; >> >> + ufd = get_unused_fd_flags(flags); >> + if (ufd < 0) { >> + kfree(ctx); >> + goto out; >> + } >> + >> /* >> * When we call this, the initialization must be complete, since >> * anon_inode_getfd() will install the fd. >> */ >> - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx, >> + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx, >> O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK))); >> - if (ufd < 0) >> + if (IS_ERR(file)) { >> + put_unused_fd(ufd); >> + ufd = PTR_ERR(file); >> kfree(ctx); >> + goto out; >> + } >> + >> + file->f_flags |= flags & SFD_RAW; >> + >> + fd_install(ufd, file); > > And this is needed because we want to set more bits in f_flags/f_mode. > > I am wondering if we can change anon_inode_getfd/getfile somehow so > that the user can pass more flags... > > Or. Currently fops->open() is never used if this file_operations is used > by anon_inode_get*. So perhaps we can change anon_inode_getfile() to call > fops->open() if it is not zero before return? > > OK, this is almost off-topic, please ignore. > > Oleg. > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2013-01-10 9:47 ` Andrey Wagin @ 2013-01-10 22:45 ` Michael Kerrisk (man-pages) 2013-01-12 18:55 ` Oleg Nesterov 1 sibling, 0 replies; 16+ messages in thread From: Michael Kerrisk (man-pages) @ 2013-01-10 22:45 UTC (permalink / raw) To: Andrey Wagin Cc: Oleg Nesterov, Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Pavel Emelyanov, Cyrill Gorcunov On Thu, Jan 10, 2013 at 10:47 AM, Andrey Wagin <avagin@gmail.com> wrote: > 2012/12/28 Oleg Nesterov <oleg@redhat.com>: >> On 12/28, Andrey Vagin wrote: >>> >>> signalfd should be called with the flag SFD_RAW for that. >>> >>> signalfd_siginfo is not full for siginfo with a negative si_code. >>> copy_siginfo_to_user() is copied a full siginfo to user-space, if >>> si_code is negative. signalfd_copyinfo() doesn't do that and can't be >>> expanded, because it has not compatible format with siginfo_t. >>> >>> Another problem is that a constant __SI_* is removed from si_code. >>> It's not a problem for usual applications, because they expect >>> a defined type of siginfo (internal logic). >>> When we want to dump pending signals, we can't predict a type of >>> siginfo, so we should get it from kernel. >>> >>> The main idea of the raw format is that it should be enough for >>> restoring exactly the same siginfo for the current process. >>> >>> This functionality is required for checkpointing pending signals. >> >> And this should be used along with pread(). >> >> I won't argue, but perhaps we should simply postulate that pread() >> always dumps siginfo in raw format and avoid the new flag? > > I won't argue too:), but I think the flag is more flexiable. The flag > allows to get raw siginfo from read(), it may be useful not only for > checkpoint/restore. pread with offset = 0 is equal to read(), so the > interface with the flag is more clear. Special casing pread() behavior for signalfd() so that SFD_RAW is not required seems only to make the ugliness worse... While I appreciate the need for this API, multiplexing a queue selector and a positional value into what is normally a byte offset in pread() is really bizarre. This is ugly and I wonder if something better is possible, though so far I have no suggestions beyond adding a new system call. -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Author of "The Linux Programming Interface"; http://man7.org/tlpi/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2013-01-10 9:47 ` Andrey Wagin 2013-01-10 22:45 ` Michael Kerrisk (man-pages) @ 2013-01-12 18:55 ` Oleg Nesterov 1 sibling, 0 replies; 16+ messages in thread From: Oleg Nesterov @ 2013-01-12 18:55 UTC (permalink / raw) To: Andrey Wagin Cc: Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On 01/10, Andrey Wagin wrote: > > 2012/12/28 Oleg Nesterov <oleg@redhat.com>: > >> > >> This functionality is required for checkpointing pending signals. > > > > And this should be used along with pread(). > > > > I won't argue, but perhaps we should simply postulate that pread() > > always dumps siginfo in raw format and avoid the new flag? > > I won't argue too:), but I think the flag is more flexiable. The flag > allows to get raw siginfo from read(), it may be useful not only for > checkpoint/restore. pread with offset = 0 is equal to read(), so the > interface with the flag is more clear. OK, I agree. Oleg. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) 2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin 2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin 2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin @ 2012-12-28 10:23 ` Andrey Vagin [not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 2 siblings, 1 reply; 16+ messages in thread From: Andrey Vagin @ 2012-12-28 10:23 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov, Alexander Viro, Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov pread(fd, buf, size, pos) with non-zero pos returns siginfo-s without dequeuing signals. A sequence number and a queue are encoded in pos. pos = seq + SFD_*_OFFSET seq is a sequence number of a signal in a queue. SFD_PER_THREAD_QUEUE_OFFSET - read signals from a per-thread queue. SFD_SHARED_QUEUE_OFFSET - read signals from a shared (process wide) queue. This functionality is required for checkpointing pending signals. v2: llseek() can't be used here, because peek_offset/f_pos/whatever has to be shared with all processes which have this file opened. Suppose that the task forks after sys_signalfd(). Now if parent or child do llseek this affects them both. This is insane because signalfd is "strange" to say at least, fork/dup/etc inherits signalfd_ctx but not the" source" of the data. // Oleg Nesterov v3: minor cleanup Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> --- fs/signalfd.c | 44 ++++++++++++++++++++++++++++++++++++++++++- include/uapi/linux/signalfd.h | 5 +++++ 2 files changed, 48 insertions(+), 1 deletion(-) diff --git a/fs/signalfd.c b/fs/signalfd.c index 4439a81..86eb130 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -51,6 +51,43 @@ struct signalfd_ctx { sigset_t sigmask; }; +static ssize_t signalfd_peek(struct signalfd_ctx *ctx, + siginfo_t *info, loff_t *ppos) +{ + struct sigpending *pending; + struct sigqueue *q; + loff_t seq; + int ret = 0; + + spin_lock_irq(¤t->sighand->siglock); + + if (*ppos >= SFD_SHARED_QUEUE_OFFSET) { + pending = ¤t->signal->shared_pending; + seq = *ppos - SFD_SHARED_QUEUE_OFFSET; + } else { + pending = ¤t->pending; + seq = *ppos - SFD_PER_THREAD_QUEUE_OFFSET; + } + + list_for_each_entry(q, &pending->list, list) { + if (sigismember(&ctx->sigmask, q->info.si_signo)) + continue; + + if (seq-- == 0) { + copy_siginfo(info, &q->info); + ret = info->si_signo; + break; + } + } + + spin_unlock_irq(¤t->sighand->siglock); + + if (ret) + (*ppos)++; + + return ret; +} + static int signalfd_release(struct inode *inode, struct file *file) { kfree(file->private_data); @@ -248,7 +285,11 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, siginfo = (struct signalfd_siginfo __user *) buf; do { - ret = signalfd_dequeue(ctx, &info, nonblock); + if (*ppos == 0) + ret = signalfd_dequeue(ctx, &info, nonblock); + else + ret = signalfd_peek(ctx, &info, ppos); + if (unlikely(ret <= 0)) break; @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, } file->f_flags |= flags & SFD_RAW; + file->f_mode |= FMODE_PREAD; fd_install(ufd, file); } else { diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h index bc31849..0953785 100644 --- a/include/uapi/linux/signalfd.h +++ b/include/uapi/linux/signalfd.h @@ -17,6 +17,11 @@ #define SFD_NONBLOCK O_NONBLOCK #define SFD_RAW O_DIRECT +/* Read signals from a shared (process wide) queue */ +#define SFD_SHARED_QUEUE_OFFSET (1LL << 62) +/* Read signals from a per-thread queue */ +#define SFD_PER_THREAD_QUEUE_OFFSET 1 + struct signalfd_siginfo { __u32 ssi_signo; __s32 ssi_errno; -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 16+ messages in thread
[parent not found: <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) [not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> @ 2012-12-28 14:32 ` Oleg Nesterov [not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Oleg Nesterov @ 2012-12-28 14:32 UTC (permalink / raw) To: Andrey Vagin, Linus Torvalds Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro, Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On 12/28, Andrey Vagin wrote: > > pread(fd, buf, size, pos) with non-zero pos returns siginfo-s > without dequeuing signals. > > A sequence number and a queue are encoded in pos. > > pos = seq + SFD_*_OFFSET > > seq is a sequence number of a signal in a queue. > > SFD_PER_THREAD_QUEUE_OFFSET - read signals from a per-thread queue. > SFD_SHARED_QUEUE_OFFSET - read signals from a shared (process wide) queue. > > This functionality is required for checkpointing pending signals. > > v2: llseek() can't be used here, because peek_offset/f_pos/whatever > has to be shared with all processes which have this file opened. > > Suppose that the task forks after sys_signalfd(). Now if parent or child > do llseek this affects them both. This is insane because signalfd is > "strange" to say at least, fork/dup/etc inherits signalfd_ctx but not > the" source" of the data. // Oleg Nesterov I think we should cc Linus. This patch adds the hack and it makes signalfd even more strange. Yes, this hack was suggested by me because I can't suggest something better. But if Linus dislikes this user-visible API it would be better to get his nack right now. > +static ssize_t signalfd_peek(struct signalfd_ctx *ctx, > + siginfo_t *info, loff_t *ppos) > +{ > + struct sigpending *pending; > + struct sigqueue *q; > + loff_t seq; > + int ret = 0; > + > + spin_lock_irq(¤t->sighand->siglock); > + > + if (*ppos >= SFD_SHARED_QUEUE_OFFSET) { > + pending = ¤t->signal->shared_pending; > + seq = *ppos - SFD_SHARED_QUEUE_OFFSET; > + } else { > + pending = ¤t->pending; > + seq = *ppos - SFD_PER_THREAD_QUEUE_OFFSET; > + } You can do this outside of spin_lock_irq(). And I think it would be better to check SFD_PRIVATE_QUEUE_OFFSET too although this is not strictly necessary. Otherwise this code assumes that sys_pread() cheks pos >= 0 and SFD_PRIVATE_QUEUE_OFFSET == 1. > + list_for_each_entry(q, &pending->list, list) { > + if (sigismember(&ctx->sigmask, q->info.si_signo)) > + continue; > + > + if (seq-- == 0) { > + copy_siginfo(info, &q->info); > + ret = info->si_signo; > + break; > + } > + } > + > + spin_unlock_irq(¤t->sighand->siglock); > + > + if (ret) > + (*ppos)++; We can change it unconditionally but I won't argue. > @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, > } > > file->f_flags |= flags & SFD_RAW; > + file->f_mode |= FMODE_PREAD; Again, this is not needed or the code was broken by the previous patch. Given that 2/3 passes O_RDWR to anon_inode_getfile() I think FMODE_PREAD should be already set. Note OPEN_FMODE(flags) in anon_inode_getfile(). Oleg. ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) [not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2012-12-28 15:40 ` Oleg Nesterov 0 siblings, 0 replies; 16+ messages in thread From: Oleg Nesterov @ 2012-12-28 15:40 UTC (permalink / raw) To: Andrey Vagin, Linus Torvalds Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro, Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On 12/28, Oleg Nesterov wrote: > > On 12/28, Andrey Vagin wrote: > > > > @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, > > } > > > > file->f_flags |= flags & SFD_RAW; > > + file->f_mode |= FMODE_PREAD; > > Again, this is not needed or the code was broken by the previous patch. > > Given that 2/3 passes O_RDWR to anon_inode_getfile() I think FMODE_PREAD > should be already set. Note OPEN_FMODE(flags) in anon_inode_getfile(). As you explained in another thread I was wrong, I confused FMODE_PREAD and FMODE_READ. Oleg. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v3) @ 2013-01-14 16:53 Andrey Vagin 2013-01-14 16:53 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin 0 siblings, 1 reply; 16+ messages in thread From: Andrey Vagin @ 2013-01-14 16:53 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Serge Hallyn, Oleg Nesterov, Andrew Morton, Eric W. Biederman, Al Viro, Pavel Emelyanov, Cyrill Gorcunov, Michael Kerrisk The idea is simple. We need to get the siginfo for each signal on dump, and then return it back on restore. The first problem is that the kernel doesn’t report complete siginfo-s in user-space. In a signal handler the kernel strips SI_CODE from siginfo. When a siginfo is received from signalfd, it has a different format with fixed sizes of fields. The interface of signalfd was extended. If a signalfd is created with the flag SFD_RAW, it returns siginfo in a raw format. rt_sigqueueinfo looks suitable for restoring signals, but it can’t send siginfo with a positive si_code, because these codes are reserved for the kernel. In the real world each person has right to do anything with himself, so I think a process should able to send any siginfo to itself. v2: add ability to dump signals without dequeuing them. pread with non-zero offset is used for this. offset encodes a queue (private of shared) and a sequence number of a signal in the queue. Thanks to Oleg for this idea. v3: minor cleanups Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Andrey Vagin (3): signal: allow to send any siginfo to itself signalfd: add ability to return siginfo in a raw format (v2) signalfd: add ability to read siginfo-s without dequeuing signals (v4) fs/signalfd.c | 109 +++++++++++++++++++++++++++++++++++++++--- include/uapi/linux/signalfd.h | 6 +++ kernel/signal.c | 6 ++- 3 files changed, 113 insertions(+), 8 deletions(-) -- 1.7.11.7 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2013-01-14 16:53 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending " Andrey Vagin @ 2013-01-14 16:53 ` Andrey Vagin 2013-01-16 20:35 ` Andrew Morton 0 siblings, 1 reply; 16+ messages in thread From: Andrey Vagin @ 2013-01-14 16:53 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov signalfd should be called with the flag SFD_RAW for that. signalfd_siginfo is not full for siginfo with a negative si_code. copy_siginfo_to_user() is copied a full siginfo to user-space, if si_code is negative. signalfd_copyinfo() doesn't do that and can't be expanded, because it has not compatible format with siginfo_t. Another problem is that a constant __SI_* is removed from si_code. It's not a problem for usual applications, because they expect a defined type of siginfo (internal logic). When we want to dump pending signals, we can't predict a type of siginfo, so we should get it from kernel. The main idea of the raw format is that it should be enough for restoring exactly the same siginfo for the current process. This functionality is required for checkpointing pending signals. v2: fix a race condition during setting file flags copy_siginfo_to_user32() if is_compat_task Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> --- fs/signalfd.c | 64 +++++++++++++++++++++++++++++++++++++++---- include/uapi/linux/signalfd.h | 1 + 2 files changed, 60 insertions(+), 5 deletions(-) diff --git a/fs/signalfd.c b/fs/signalfd.c index b534869..4439a81 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -30,6 +30,7 @@ #include <linux/signalfd.h> #include <linux/syscalls.h> #include <linux/proc_fs.h> +#include <linux/compat.h> void signalfd_cleanup(struct sighand_struct *sighand) { @@ -74,6 +75,38 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait) } /* + * Copy a whole siginfo into users spaces. + * The main idea of this format is that it should be enough + * for restoring siginfo back into the kernel. + */ +static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo, + siginfo_t *kinfo) +{ + siginfo_t *uinfo = (siginfo_t *) siginfo; + int err; + + BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo)); + + err = __clear_user(uinfo, sizeof(*uinfo)); + +#ifdef CONFIG_COMPAT + if (unlikely(is_compat_task())) { + compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo; + + err |= copy_siginfo_to_user32(compat_uinfo, kinfo); + err |= put_user(kinfo->si_code, &compat_uinfo->si_code); + + return err ? -EFAULT: sizeof(*compat_uinfo); + } +#endif + + err |= copy_siginfo_to_user(uinfo, kinfo); + err |= put_user(kinfo->si_code, &uinfo->si_code); + + return err ? -EFAULT: sizeof(*uinfo); +} + +/* * Copied from copy_siginfo_to_user() in kernel/signal.c */ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo, @@ -205,6 +238,7 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, struct signalfd_ctx *ctx = file->private_data; struct signalfd_siginfo __user *siginfo; int nonblock = file->f_flags & O_NONBLOCK; + bool raw = file->f_flags & SFD_RAW; ssize_t ret, total = 0; siginfo_t info; @@ -217,7 +251,12 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, ret = signalfd_dequeue(ctx, &info, nonblock); if (unlikely(ret <= 0)) break; - ret = signalfd_copyinfo(siginfo, &info); + + if (raw) + ret = signalfd_copy_raw_info(siginfo, &info); + else + ret = signalfd_copyinfo(siginfo, &info); + if (ret < 0) break; siginfo++; @@ -262,7 +301,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC); BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK); - if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK)) + if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW)) return -EINVAL; if (sizemask != sizeof(sigset_t) || @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, signotset(&sigmask); if (ufd == -1) { + struct file *file; ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->sigmask = sigmask; + ufd = get_unused_fd_flags(flags); + if (ufd < 0) { + kfree(ctx); + goto out; + } + /* * When we call this, the initialization must be complete, since * anon_inode_getfd() will install the fd. */ - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx, + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx, O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK))); - if (ufd < 0) + if (IS_ERR(file)) { + put_unused_fd(ufd); + ufd = PTR_ERR(file); kfree(ctx); + goto out; + } + + file->f_flags |= flags & SFD_RAW; + + fd_install(ufd, file); } else { struct fd f = fdget(ufd); if (!f.file) @@ -302,7 +356,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, wake_up(¤t->sighand->signalfd_wqh); fdput(f); } - +out: return ufd; } diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h index 492c6de..bc31849 100644 --- a/include/uapi/linux/signalfd.h +++ b/include/uapi/linux/signalfd.h @@ -15,6 +15,7 @@ /* Flags for signalfd4. */ #define SFD_CLOEXEC O_CLOEXEC #define SFD_NONBLOCK O_NONBLOCK +#define SFD_RAW O_DIRECT struct signalfd_siginfo { __u32 ssi_signo; -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2013-01-14 16:53 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin @ 2013-01-16 20:35 ` Andrew Morton 2013-01-17 15:28 ` Andrew Vagin [not found] ` <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> 0 siblings, 2 replies; 16+ messages in thread From: Andrew Morton @ 2013-01-16 20:35 UTC (permalink / raw) To: Andrey Vagin Cc: linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov, Michael Kerrisk On Mon, 14 Jan 2013 20:53:54 +0400 Andrey Vagin <avagin@openvz.org> wrote: > signalfd should be called with the flag SFD_RAW for that. > > signalfd_siginfo is not full for siginfo with a negative si_code. > copy_siginfo_to_user() is copied a full siginfo to user-space, if > si_code is negative. signalfd_copyinfo() doesn't do that and can't be > expanded, because it has not compatible format with siginfo_t. > > Another problem is that a constant __SI_* is removed from si_code. > It's not a problem for usual applications, because they expect > a defined type of siginfo (internal logic). > When we want to dump pending signals, we can't predict a type of > siginfo, so we should get it from kernel. > > The main idea of the raw format is that it should be enough for > restoring exactly the same siginfo for the current process. > > This functionality is required for checkpointing pending signals. > > ... > > --- a/fs/signalfd.c > +++ b/fs/signalfd.c > @@ -30,6 +30,7 @@ > #include <linux/signalfd.h> > #include <linux/syscalls.h> > #include <linux/proc_fs.h> > +#include <linux/compat.h> > > void signalfd_cleanup(struct sighand_struct *sighand) > { > @@ -74,6 +75,38 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait) > } > > /* > + * Copy a whole siginfo into users spaces. "userspace" > + * The main idea of this format is that it should be enough > + * for restoring siginfo back into the kernel. > + */ > +static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo, > + siginfo_t *kinfo) > +{ > + siginfo_t *uinfo = (siginfo_t *) siginfo; Should be siginfo_t __user *uinfo = (siginfo_t __user *)siginfo; > + int err; > + > + BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo)); > + > + err = __clear_user(uinfo, sizeof(*uinfo)); > > +#ifdef CONFIG_COMPAT > + if (unlikely(is_compat_task())) { > + compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo; > + > + err |= copy_siginfo_to_user32(compat_uinfo, kinfo); > + err |= put_user(kinfo->si_code, &compat_uinfo->si_code); > + > + return err ? -EFAULT: sizeof(*compat_uinfo); > + } > +#endif > + > + err |= copy_siginfo_to_user(uinfo, kinfo); > + err |= put_user(kinfo->si_code, &uinfo->si_code); > + > + return err ? -EFAULT: sizeof(*uinfo); > +} > + > +/* > * Copied from copy_siginfo_to_user() in kernel/signal.c > */ > static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo, > > ... > > --- a/include/uapi/linux/signalfd.h > +++ b/include/uapi/linux/signalfd.h > @@ -15,6 +15,7 @@ > /* Flags for signalfd4. */ > #define SFD_CLOEXEC O_CLOEXEC > #define SFD_NONBLOCK O_NONBLOCK > +#define SFD_RAW O_DIRECT > > struct signalfd_siginfo { > __u32 ssi_signo; As SFD_RAW is being added to the kernel API we should document it. Please keep Michael cc'ed and work with him on getting the manpages updated. I usually ask that checkpoint-restart specific code be wrapped in #ifdef CONFIG_CHECKPOINT_RESTORE, mainly so we can identify it all if/when your project fails and we decide to remove the feature ;) But as this patch extends the user API I think it simplifies life if we make the extension permanent. Perhaps this is a bad idea, as permanently adding this extension to the API makes it harder to ever remove the c/r feature. Proposed fixups. Please review and test this and check that sparse is happy with it. From: Andrew Morton <akpm@linux-foundation.org> Subject: signalfd-add-ability-to-return-siginfo-in-a-raw-format-v2-fix fix __user annotations, tidy comments and code layout Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrey Vagin <avagin@openvz.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: David Howells <dhowells@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/signalfd.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff -puN fs/signalfd.c~signalfd-add-ability-to-return-siginfo-in-a-raw-format-v2-fix fs/signalfd.c --- a/fs/signalfd.c~signalfd-add-ability-to-return-siginfo-in-a-raw-format-v2-fix +++ a/fs/signalfd.c @@ -75,14 +75,14 @@ static unsigned int signalfd_poll(struct } /* - * Copy a whole siginfo into users spaces. + * Copy a whole siginfo into userspace. * The main idea of this format is that it should be enough * for restoring siginfo back into the kernel. */ static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo, siginfo_t *kinfo) { - siginfo_t *uinfo = (siginfo_t *) siginfo; + siginfo_t __user *uinfo = (siginfo_t __user *)siginfo; int err; BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo)); @@ -91,19 +91,20 @@ static int signalfd_copy_raw_info(struct #ifdef CONFIG_COMPAT if (unlikely(is_compat_task())) { - compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo; + compat_siginfo_t __user *compat_uinfo; + compat_uinfo = (compat_siginfo_t __user *)siginfo; err |= copy_siginfo_to_user32(compat_uinfo, kinfo); err |= put_user(kinfo->si_code, &compat_uinfo->si_code); - return err ? -EFAULT: sizeof(*compat_uinfo); + return err ? -EFAULT : sizeof(*compat_uinfo); } #endif err |= copy_siginfo_to_user(uinfo, kinfo); err |= put_user(kinfo->si_code, &uinfo->si_code); - return err ? -EFAULT: sizeof(*uinfo); + return err ? -EFAULT : sizeof(*uinfo); } /* _ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2013-01-16 20:35 ` Andrew Morton @ 2013-01-17 15:28 ` Andrew Vagin [not found] ` <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> 1 sibling, 0 replies; 16+ messages in thread From: Andrew Vagin @ 2013-01-17 15:28 UTC (permalink / raw) To: Andrew Morton Cc: Andrey Vagin, linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On Wed, Jan 16, 2013 at 12:35:02PM -0800, Andrew Morton wrote: > > --- a/include/uapi/linux/signalfd.h > > +++ b/include/uapi/linux/signalfd.h > > @@ -15,6 +15,7 @@ > > /* Flags for signalfd4. */ > > #define SFD_CLOEXEC O_CLOEXEC > > #define SFD_NONBLOCK O_NONBLOCK > > +#define SFD_RAW O_DIRECT > > > > struct signalfd_siginfo { > > __u32 ssi_signo; > > As SFD_RAW is being added to the kernel API we should document it. > Please keep Michael cc'ed and work with him on getting the manpages > updated. Ok > > I usually ask that checkpoint-restart specific code be wrapped in > #ifdef CONFIG_CHECKPOINT_RESTORE, mainly so we can identify it all > if/when your project fails and we decide to remove the feature ;) But > as this patch extends the user API I think it simplifies life if we > make the extension permanent. Perhaps this is a bad idea, as > permanently adding this extension to the API makes it harder to ever > remove the c/r feature. I think CRIU is already enough functional to be successful ;). It already can dump/restore a Linux Container with Oracle Data Base, Apache and a few other deamons. > > > Proposed fixups. Please review and test this and check that sparse is > happy with it. > I have tested this patch. All is ok and sparce are happy with it. Thank you. ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>]
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) [not found] ` <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> @ 2013-01-18 23:27 ` Michael Kerrisk (man-pages) [not found] ` <CAKgNAkgHVB3=k_XOevobcMWuEqy2r75tdTc85ZYiD8rkn5OZKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Michael Kerrisk (man-pages) @ 2013-01-18 23:27 UTC (permalink / raw) To: Andrew Morton Cc: Andrey Vagin, linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov, Pavel Emelyanov, Cyrill Gorcunov > As SFD_RAW is being added to the kernel API we should document it. > Please keep Michael cc'ed and work with him on getting the manpages > updated. The API is certainly no thing of beauty, as I already remarked here: http://thread.gmane.org/gmane.linux.file-systems/70420/focus=1420228 A description of the API can be found here: https://lwn.net/Articles/531939/ The main problem is the ugliness of changing the meaning of pread()'s 'offset' argument to be a multiplexed [signal queue selector + queue position]. I do wonder if we can do better, though I have no particular solution to offer beyond the sledgehammer of adding a new system call. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Author of "The Linux Programming Interface"; http://man7.org/tlpi/ ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CAKgNAkgHVB3=k_XOevobcMWuEqy2r75tdTc85ZYiD8rkn5OZKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) [not found] ` <CAKgNAkgHVB3=k_XOevobcMWuEqy2r75tdTc85ZYiD8rkn5OZKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2013-01-19 10:50 ` Andrey Wagin 2013-01-19 23:27 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 16+ messages in thread From: Andrey Wagin @ 2013-01-19 10:50 UTC (permalink / raw) To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w Cc: Andrew Morton, linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov, Pavel Emelyanov, Cyrill Gorcunov 2013/1/19 Michael Kerrisk (man-pages) <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>: >> As SFD_RAW is being added to the kernel API we should document it. >> Please keep Michael cc'ed and work with him on getting the manpages >> updated. > > The API is certainly no thing of beauty, as I already remarked here: > http://thread.gmane.org/gmane.linux.file-systems/70420/focus=1420228 All of us have similar thoughts, but we don't know a better solution. Here is an example of usage signalfd and pread: http://lists.openvz.org/pipermail/criu/2013-January/007040.html > > A description of the API can be found here: > https://lwn.net/Articles/531939/ Thanks you for the attempt to involve more people to this discussion. > > The main problem is the ugliness of changing the meaning of pread()'s > 'offset' argument to be a multiplexed [signal queue selector + queue > position]. > > I do wonder if we can do better, though I have no particular solution > to offer beyond the sledgehammer of adding a new system call. If we are choosing between this interface or a new syscall, I would prefer this interface. signalfd is a special descriptor, so I think it is not a big deal, that it works a bit strange. If all other would decides, that a new syscall is better, I will not ague. Michael, thank you for the comments. > > Cheers, > > Michael > > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Author of "The Linux Programming Interface"; http://man7.org/tlpi/ > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2013-01-19 10:50 ` Andrey Wagin @ 2013-01-19 23:27 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 16+ messages in thread From: Michael Kerrisk (man-pages) @ 2013-01-19 23:27 UTC (permalink / raw) To: Andrey Wagin Cc: Andrew Morton, linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov, Pavel Emelyanov, Cyrill Gorcunov Hello Andrey, On Sat, Jan 19, 2013 at 11:50 AM, Andrey Wagin <avagin@gmail.com> wrote: > 2013/1/19 Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>: >>> As SFD_RAW is being added to the kernel API we should document it. >>> Please keep Michael cc'ed and work with him on getting the manpages >>> updated. >> >> The API is certainly no thing of beauty, as I already remarked here: >> http://thread.gmane.org/gmane.linux.file-systems/70420/focus=1420228 > > All of us have similar thoughts, but we don't know a better solution. > > Here is an example of usage signalfd and pread: > http://lists.openvz.org/pipermail/criu/2013-January/007040.html > >> >> A description of the API can be found here: >> https://lwn.net/Articles/531939/ > > Thanks you for the attempt to involve more people to this discussion. > >> >> The main problem is the ugliness of changing the meaning of pread()'s >> 'offset' argument to be a multiplexed [signal queue selector + queue >> position]. >> >> I do wonder if we can do better, though I have no particular solution >> to offer beyond the sledgehammer of adding a new system call. > > If we are choosing between this interface or a new syscall, I would > prefer this interface. A agree. That's why I called using a new syscall a sledgehammer. Still, I just wanted to remind folk who might think of something better. > signalfd is a special descriptor, so I think it > is not a big deal, that it works a bit strange. Sure, but the more we special case things, the uglier the ABI as a whole becomes. So special casing should be avoided as far as we can. > If all other would > decides, that a new syscall is better, I will not ague. And that's more or less how I see it too. I'm not going to argue for a new syscall, based on what I know so far. Here is one idea to think about though, while more or less maintaining your proposed interface. At the moment, you select signal queues in the pread() call. An alternative would be to do it in the signalfd() call. In other words, you could have the following flags used with signalfd() SFD_RAW SFD_SHARED_QUEUE -- reads will be from process-wide shared signal queue SFD_PER_THREAD_QUEUE --reads will be from per-thread signal queue Specifying both SFD_SHARED_QUEUE and SFD_PER_THREAD_QUEUE would be the same as omitting them both, providing the default behavior of slecting from both queues. My point here is that you can then separate the RAW functionality from the queue selector functionality. Now, it might be that at the moment you always require that if the caller specifies SFD_SHARED_QUEUE or SFD_PER_THREAD_QUEUE, then they must also specify SFD_RAW. But later, that constraint might be relaxed, so that users could use signalfd() to select from a particular queue when reading traditional (non-RAW) signalfd_siginfo structures from a signalfd. This does seem like a very sensible design optimization to make now (and an easy one, I would suppose). What do you think? Cheers, Michael ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2013-01-19 23:27 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin 2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin 2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin [not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 2012-12-28 16:14 ` Oleg Nesterov 2013-01-10 9:47 ` Andrey Wagin 2013-01-10 22:45 ` Michael Kerrisk (man-pages) 2013-01-12 18:55 ` Oleg Nesterov 2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin [not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 2012-12-28 14:32 ` Oleg Nesterov [not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2012-12-28 15:40 ` Oleg Nesterov -- strict thread matches above, loose matches on Subject: below -- 2013-01-14 16:53 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending " Andrey Vagin 2013-01-14 16:53 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin 2013-01-16 20:35 ` Andrew Morton 2013-01-17 15:28 ` Andrew Vagin [not found] ` <20130116123502.70af6b85.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> 2013-01-18 23:27 ` Michael Kerrisk (man-pages) [not found] ` <CAKgNAkgHVB3=k_XOevobcMWuEqy2r75tdTc85ZYiD8rkn5OZKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2013-01-19 10:50 ` Andrey Wagin 2013-01-19 23:27 ` Michael Kerrisk (man-pages)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).