* [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) @ 2012-12-28 10:22 Andrey Vagin 2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Andrey Vagin @ 2012-12-28 10:22 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Serge Hallyn, Oleg Nesterov, Andrew Morton, Eric W. Biederman, Al Viro, Pavel Emelyanov, Cyrill Gorcunov, Michael Kerrisk The idea is simple. We need to get the siginfo for each signal on dump, and then return it back on restore. The first problem is that the kernel doesn’t report complete siginfo-s in user-space. In a signal handler the kernel strips SI_CODE from siginfo. When a siginfo is received from signalfd, it has a different format with fixed sizes of fields. The interface of signalfd was extended. If a signalfd is created with the flag SFD_RAW, it returns siginfo in a raw format. rt_sigqueueinfo looks suitable for restoring signals, but it can’t send siginfo with a positive si_code, because these codes are reserved for the kernel. In the real world each person has right to do anything with himself, so I think a process should able to send any siginfo to itself. v2: add ability to dump signals without dequeuing them. pread with non-zero offset is used for this. offset encodes a queue (private of shared) and a sequence number of a signal in the queue. Thanks to Oleg for this idea. Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] signal: allow to send any siginfo to itself 2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin @ 2012-12-28 10:22 ` Andrey Vagin 2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin 2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin 2 siblings, 0 replies; 10+ messages in thread From: Andrey Vagin @ 2012-12-28 10:22 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov, Serge Hallyn, Andrew Morton, Eric W. Biederman, Al Viro, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov A kernel prevents of sending siginfo with positive si_code, because these codes is reserved for kernel. I think we can allow to send any siginfo to itself. This operation should not be dangerous. This functionality is required for restoring signals. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> --- kernel/signal.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 7aaa51d..ac5f5e7 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2944,7 +2944,8 @@ SYSCALL_DEFINE3(rt_sigqueueinfo, pid_t, pid, int, sig, /* Not even root can pretend to send signals from the kernel. * Nor can they impersonate a kill()/tgkill(), which adds source info. */ - if (info.si_code >= 0 || info.si_code == SI_TKILL) { + if (((info.si_code >= 0 || info.si_code == SI_TKILL)) && + (task_pid_vnr(current) != pid)) { /* We used to allow any < 0 si_code */ WARN_ON_ONCE(info.si_code < 0); return -EPERM; @@ -2964,7 +2965,8 @@ long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig, siginfo_t *info) /* Not even root can pretend to send signals from the kernel. * Nor can they impersonate a kill()/tgkill(), which adds source info. */ - if (info->si_code >= 0 || info->si_code == SI_TKILL) { + if ((info->si_code >= 0 || info->si_code == SI_TKILL) && + (task_pid_vnr(current) != pid)) { /* We used to allow any < 0 si_code */ WARN_ON_ONCE(info->si_code < 0); return -EPERM; -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin 2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin @ 2012-12-28 10:23 ` Andrey Vagin [not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin 2 siblings, 1 reply; 10+ messages in thread From: Andrey Vagin @ 2012-12-28 10:23 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov signalfd should be called with the flag SFD_RAW for that. signalfd_siginfo is not full for siginfo with a negative si_code. copy_siginfo_to_user() is copied a full siginfo to user-space, if si_code is negative. signalfd_copyinfo() doesn't do that and can't be expanded, because it has not compatible format with siginfo_t. Another problem is that a constant __SI_* is removed from si_code. It's not a problem for usual applications, because they expect a defined type of siginfo (internal logic). When we want to dump pending signals, we can't predict a type of siginfo, so we should get it from kernel. The main idea of the raw format is that it should be enough for restoring exactly the same siginfo for the current process. This functionality is required for checkpointing pending signals. v2: fix a race condition during setting file flags copy_siginfo_to_user32() if is_compat_task Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> --- fs/signalfd.c | 64 +++++++++++++++++++++++++++++++++++++++---- include/uapi/linux/signalfd.h | 1 + 2 files changed, 60 insertions(+), 5 deletions(-) diff --git a/fs/signalfd.c b/fs/signalfd.c index b534869..4439a81 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -30,6 +30,7 @@ #include <linux/signalfd.h> #include <linux/syscalls.h> #include <linux/proc_fs.h> +#include <linux/compat.h> void signalfd_cleanup(struct sighand_struct *sighand) { @@ -74,6 +75,38 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait) } /* + * Copy a whole siginfo into users spaces. + * The main idea of this format is that it should be enough + * for restoring siginfo back into the kernel. + */ +static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo, + siginfo_t *kinfo) +{ + siginfo_t *uinfo = (siginfo_t *) siginfo; + int err; + + BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo)); + + err = __clear_user(uinfo, sizeof(*uinfo)); + +#ifdef CONFIG_COMPAT + if (unlikely(is_compat_task())) { + compat_siginfo_t *compat_uinfo = (compat_siginfo_t *) siginfo; + + err |= copy_siginfo_to_user32(compat_uinfo, kinfo); + err |= put_user(kinfo->si_code, &compat_uinfo->si_code); + + return err ? -EFAULT: sizeof(*compat_uinfo); + } +#endif + + err |= copy_siginfo_to_user(uinfo, kinfo); + err |= put_user(kinfo->si_code, &uinfo->si_code); + + return err ? -EFAULT: sizeof(*uinfo); +} + +/* * Copied from copy_siginfo_to_user() in kernel/signal.c */ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo, @@ -205,6 +238,7 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, struct signalfd_ctx *ctx = file->private_data; struct signalfd_siginfo __user *siginfo; int nonblock = file->f_flags & O_NONBLOCK; + bool raw = file->f_flags & SFD_RAW; ssize_t ret, total = 0; siginfo_t info; @@ -217,7 +251,12 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, ret = signalfd_dequeue(ctx, &info, nonblock); if (unlikely(ret <= 0)) break; - ret = signalfd_copyinfo(siginfo, &info); + + if (raw) + ret = signalfd_copy_raw_info(siginfo, &info); + else + ret = signalfd_copyinfo(siginfo, &info); + if (ret < 0) break; siginfo++; @@ -262,7 +301,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC); BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK); - if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK)) + if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW)) return -EINVAL; if (sizemask != sizeof(sigset_t) || @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, signotset(&sigmask); if (ufd == -1) { + struct file *file; ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->sigmask = sigmask; + ufd = get_unused_fd_flags(flags); + if (ufd < 0) { + kfree(ctx); + goto out; + } + /* * When we call this, the initialization must be complete, since * anon_inode_getfd() will install the fd. */ - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx, + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx, O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK))); - if (ufd < 0) + if (IS_ERR(file)) { + put_unused_fd(ufd); + ufd = PTR_ERR(file); kfree(ctx); + goto out; + } + + file->f_flags |= flags & SFD_RAW; + + fd_install(ufd, file); } else { struct fd f = fdget(ufd); if (!f.file) @@ -302,7 +356,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, wake_up(¤t->sighand->signalfd_wqh); fdput(f); } - +out: return ufd; } diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h index 492c6de..bc31849 100644 --- a/include/uapi/linux/signalfd.h +++ b/include/uapi/linux/signalfd.h @@ -15,6 +15,7 @@ /* Flags for signalfd4. */ #define SFD_CLOEXEC O_CLOEXEC #define SFD_NONBLOCK O_NONBLOCK +#define SFD_RAW O_DIRECT struct signalfd_siginfo { __u32 ssi_signo; -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 10+ messages in thread
[parent not found: <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) [not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> @ 2012-12-28 16:14 ` Oleg Nesterov 2013-01-10 9:47 ` Andrey Wagin 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2012-12-28 16:14 UTC (permalink / raw) To: Andrey Vagin, Linus Torvalds Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On 12/28, Andrey Vagin wrote: > > signalfd should be called with the flag SFD_RAW for that. > > signalfd_siginfo is not full for siginfo with a negative si_code. > copy_siginfo_to_user() is copied a full siginfo to user-space, if > si_code is negative. signalfd_copyinfo() doesn't do that and can't be > expanded, because it has not compatible format with siginfo_t. > > Another problem is that a constant __SI_* is removed from si_code. > It's not a problem for usual applications, because they expect > a defined type of siginfo (internal logic). > When we want to dump pending signals, we can't predict a type of > siginfo, so we should get it from kernel. > > The main idea of the raw format is that it should be enough for > restoring exactly the same siginfo for the current process. > > This functionality is required for checkpointing pending signals. And this should be used along with pread(). I won't argue, but perhaps we should simply postulate that pread() always dumps siginfo in raw format and avoid the new flag? > @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, > signotset(&sigmask); > > if (ufd == -1) { > + struct file *file; > ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); > if (!ctx) > return -ENOMEM; > > ctx->sigmask = sigmask; > > + ufd = get_unused_fd_flags(flags); > + if (ufd < 0) { > + kfree(ctx); > + goto out; > + } > + > /* > * When we call this, the initialization must be complete, since > * anon_inode_getfd() will install the fd. > */ > - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx, > + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx, > O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK))); > - if (ufd < 0) > + if (IS_ERR(file)) { > + put_unused_fd(ufd); > + ufd = PTR_ERR(file); > kfree(ctx); > + goto out; > + } > + > + file->f_flags |= flags & SFD_RAW; > + > + fd_install(ufd, file); And this is needed because we want to set more bits in f_flags/f_mode. I am wondering if we can change anon_inode_getfd/getfile somehow so that the user can pass more flags... Or. Currently fops->open() is never used if this file_operations is used by anon_inode_get*. So perhaps we can change anon_inode_getfile() to call fops->open() if it is not zero before return? OK, this is almost off-topic, please ignore. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2012-12-28 16:14 ` Oleg Nesterov @ 2013-01-10 9:47 ` Andrey Wagin 2013-01-10 22:45 ` Michael Kerrisk (man-pages) 2013-01-12 18:55 ` Oleg Nesterov 0 siblings, 2 replies; 10+ messages in thread From: Andrey Wagin @ 2013-01-10 9:47 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov 2012/12/28 Oleg Nesterov <oleg@redhat.com>: > On 12/28, Andrey Vagin wrote: >> >> signalfd should be called with the flag SFD_RAW for that. >> >> signalfd_siginfo is not full for siginfo with a negative si_code. >> copy_siginfo_to_user() is copied a full siginfo to user-space, if >> si_code is negative. signalfd_copyinfo() doesn't do that and can't be >> expanded, because it has not compatible format with siginfo_t. >> >> Another problem is that a constant __SI_* is removed from si_code. >> It's not a problem for usual applications, because they expect >> a defined type of siginfo (internal logic). >> When we want to dump pending signals, we can't predict a type of >> siginfo, so we should get it from kernel. >> >> The main idea of the raw format is that it should be enough for >> restoring exactly the same siginfo for the current process. >> >> This functionality is required for checkpointing pending signals. > > And this should be used along with pread(). > > I won't argue, but perhaps we should simply postulate that pread() > always dumps siginfo in raw format and avoid the new flag? I won't argue too:), but I think the flag is more flexiable. The flag allows to get raw siginfo from read(), it may be useful not only for checkpoint/restore. pread with offset = 0 is equal to read(), so the interface with the flag is more clear. > >> @@ -272,20 +311,35 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, >> signotset(&sigmask); >> >> if (ufd == -1) { >> + struct file *file; >> ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); >> if (!ctx) >> return -ENOMEM; >> >> ctx->sigmask = sigmask; >> >> + ufd = get_unused_fd_flags(flags); >> + if (ufd < 0) { >> + kfree(ctx); >> + goto out; >> + } >> + >> /* >> * When we call this, the initialization must be complete, since >> * anon_inode_getfd() will install the fd. >> */ >> - ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx, >> + file = anon_inode_getfile("[signalfd]", &signalfd_fops, ctx, >> O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK))); >> - if (ufd < 0) >> + if (IS_ERR(file)) { >> + put_unused_fd(ufd); >> + ufd = PTR_ERR(file); >> kfree(ctx); >> + goto out; >> + } >> + >> + file->f_flags |= flags & SFD_RAW; >> + >> + fd_install(ufd, file); > > And this is needed because we want to set more bits in f_flags/f_mode. > > I am wondering if we can change anon_inode_getfd/getfile somehow so > that the user can pass more flags... > > Or. Currently fops->open() is never used if this file_operations is used > by anon_inode_get*. So perhaps we can change anon_inode_getfile() to call > fops->open() if it is not zero before return? > > OK, this is almost off-topic, please ignore. > > Oleg. > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2013-01-10 9:47 ` Andrey Wagin @ 2013-01-10 22:45 ` Michael Kerrisk (man-pages) 2013-01-12 18:55 ` Oleg Nesterov 1 sibling, 0 replies; 10+ messages in thread From: Michael Kerrisk (man-pages) @ 2013-01-10 22:45 UTC (permalink / raw) To: Andrey Wagin Cc: Oleg Nesterov, Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Pavel Emelyanov, Cyrill Gorcunov On Thu, Jan 10, 2013 at 10:47 AM, Andrey Wagin <avagin@gmail.com> wrote: > 2012/12/28 Oleg Nesterov <oleg@redhat.com>: >> On 12/28, Andrey Vagin wrote: >>> >>> signalfd should be called with the flag SFD_RAW for that. >>> >>> signalfd_siginfo is not full for siginfo with a negative si_code. >>> copy_siginfo_to_user() is copied a full siginfo to user-space, if >>> si_code is negative. signalfd_copyinfo() doesn't do that and can't be >>> expanded, because it has not compatible format with siginfo_t. >>> >>> Another problem is that a constant __SI_* is removed from si_code. >>> It's not a problem for usual applications, because they expect >>> a defined type of siginfo (internal logic). >>> When we want to dump pending signals, we can't predict a type of >>> siginfo, so we should get it from kernel. >>> >>> The main idea of the raw format is that it should be enough for >>> restoring exactly the same siginfo for the current process. >>> >>> This functionality is required for checkpointing pending signals. >> >> And this should be used along with pread(). >> >> I won't argue, but perhaps we should simply postulate that pread() >> always dumps siginfo in raw format and avoid the new flag? > > I won't argue too:), but I think the flag is more flexiable. The flag > allows to get raw siginfo from read(), it may be useful not only for > checkpoint/restore. pread with offset = 0 is equal to read(), so the > interface with the flag is more clear. Special casing pread() behavior for signalfd() so that SFD_RAW is not required seems only to make the ugliness worse... While I appreciate the need for this API, multiplexing a queue selector and a positional value into what is normally a byte offset in pread() is really bizarre. This is ugly and I wonder if something better is possible, though so far I have no suggestions beyond adding a new system call. -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Author of "The Linux Programming Interface"; http://man7.org/tlpi/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) 2013-01-10 9:47 ` Andrey Wagin 2013-01-10 22:45 ` Michael Kerrisk (man-pages) @ 2013-01-12 18:55 ` Oleg Nesterov 1 sibling, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2013-01-12 18:55 UTC (permalink / raw) To: Andrey Wagin Cc: Linus Torvalds, linux-kernel, criu, linux-fsdevel, linux-api, Alexander Viro, Paul E. McKenney, David Howells, Thomas Gleixner, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On 01/10, Andrey Wagin wrote: > > 2012/12/28 Oleg Nesterov <oleg@redhat.com>: > >> > >> This functionality is required for checkpointing pending signals. > > > > And this should be used along with pread(). > > > > I won't argue, but perhaps we should simply postulate that pread() > > always dumps siginfo in raw format and avoid the new flag? > > I won't argue too:), but I think the flag is more flexiable. The flag > allows to get raw siginfo from read(), it may be useful not only for > checkpoint/restore. pread with offset = 0 is equal to read(), so the > interface with the flag is more clear. OK, I agree. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) 2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin 2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin 2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin @ 2012-12-28 10:23 ` Andrey Vagin [not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 2 siblings, 1 reply; 10+ messages in thread From: Andrey Vagin @ 2012-12-28 10:23 UTC (permalink / raw) To: linux-kernel Cc: criu, linux-fsdevel, linux-api, Andrey Vagin, Oleg Nesterov, Alexander Viro, Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov pread(fd, buf, size, pos) with non-zero pos returns siginfo-s without dequeuing signals. A sequence number and a queue are encoded in pos. pos = seq + SFD_*_OFFSET seq is a sequence number of a signal in a queue. SFD_PER_THREAD_QUEUE_OFFSET - read signals from a per-thread queue. SFD_SHARED_QUEUE_OFFSET - read signals from a shared (process wide) queue. This functionality is required for checkpointing pending signals. v2: llseek() can't be used here, because peek_offset/f_pos/whatever has to be shared with all processes which have this file opened. Suppose that the task forks after sys_signalfd(). Now if parent or child do llseek this affects them both. This is insane because signalfd is "strange" to say at least, fork/dup/etc inherits signalfd_ctx but not the" source" of the data. // Oleg Nesterov v3: minor cleanup Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> --- fs/signalfd.c | 44 ++++++++++++++++++++++++++++++++++++++++++- include/uapi/linux/signalfd.h | 5 +++++ 2 files changed, 48 insertions(+), 1 deletion(-) diff --git a/fs/signalfd.c b/fs/signalfd.c index 4439a81..86eb130 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -51,6 +51,43 @@ struct signalfd_ctx { sigset_t sigmask; }; +static ssize_t signalfd_peek(struct signalfd_ctx *ctx, + siginfo_t *info, loff_t *ppos) +{ + struct sigpending *pending; + struct sigqueue *q; + loff_t seq; + int ret = 0; + + spin_lock_irq(¤t->sighand->siglock); + + if (*ppos >= SFD_SHARED_QUEUE_OFFSET) { + pending = ¤t->signal->shared_pending; + seq = *ppos - SFD_SHARED_QUEUE_OFFSET; + } else { + pending = ¤t->pending; + seq = *ppos - SFD_PER_THREAD_QUEUE_OFFSET; + } + + list_for_each_entry(q, &pending->list, list) { + if (sigismember(&ctx->sigmask, q->info.si_signo)) + continue; + + if (seq-- == 0) { + copy_siginfo(info, &q->info); + ret = info->si_signo; + break; + } + } + + spin_unlock_irq(¤t->sighand->siglock); + + if (ret) + (*ppos)++; + + return ret; +} + static int signalfd_release(struct inode *inode, struct file *file) { kfree(file->private_data); @@ -248,7 +285,11 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count, siginfo = (struct signalfd_siginfo __user *) buf; do { - ret = signalfd_dequeue(ctx, &info, nonblock); + if (*ppos == 0) + ret = signalfd_dequeue(ctx, &info, nonblock); + else + ret = signalfd_peek(ctx, &info, ppos); + if (unlikely(ret <= 0)) break; @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, } file->f_flags |= flags & SFD_RAW; + file->f_mode |= FMODE_PREAD; fd_install(ufd, file); } else { diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h index bc31849..0953785 100644 --- a/include/uapi/linux/signalfd.h +++ b/include/uapi/linux/signalfd.h @@ -17,6 +17,11 @@ #define SFD_NONBLOCK O_NONBLOCK #define SFD_RAW O_DIRECT +/* Read signals from a shared (process wide) queue */ +#define SFD_SHARED_QUEUE_OFFSET (1LL << 62) +/* Read signals from a per-thread queue */ +#define SFD_PER_THREAD_QUEUE_OFFSET 1 + struct signalfd_siginfo { __u32 ssi_signo; __s32 ssi_errno; -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 10+ messages in thread
[parent not found: <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) [not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> @ 2012-12-28 14:32 ` Oleg Nesterov [not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2012-12-28 14:32 UTC (permalink / raw) To: Andrey Vagin, Linus Torvalds Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro, Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On 12/28, Andrey Vagin wrote: > > pread(fd, buf, size, pos) with non-zero pos returns siginfo-s > without dequeuing signals. > > A sequence number and a queue are encoded in pos. > > pos = seq + SFD_*_OFFSET > > seq is a sequence number of a signal in a queue. > > SFD_PER_THREAD_QUEUE_OFFSET - read signals from a per-thread queue. > SFD_SHARED_QUEUE_OFFSET - read signals from a shared (process wide) queue. > > This functionality is required for checkpointing pending signals. > > v2: llseek() can't be used here, because peek_offset/f_pos/whatever > has to be shared with all processes which have this file opened. > > Suppose that the task forks after sys_signalfd(). Now if parent or child > do llseek this affects them both. This is insane because signalfd is > "strange" to say at least, fork/dup/etc inherits signalfd_ctx but not > the" source" of the data. // Oleg Nesterov I think we should cc Linus. This patch adds the hack and it makes signalfd even more strange. Yes, this hack was suggested by me because I can't suggest something better. But if Linus dislikes this user-visible API it would be better to get his nack right now. > +static ssize_t signalfd_peek(struct signalfd_ctx *ctx, > + siginfo_t *info, loff_t *ppos) > +{ > + struct sigpending *pending; > + struct sigqueue *q; > + loff_t seq; > + int ret = 0; > + > + spin_lock_irq(¤t->sighand->siglock); > + > + if (*ppos >= SFD_SHARED_QUEUE_OFFSET) { > + pending = ¤t->signal->shared_pending; > + seq = *ppos - SFD_SHARED_QUEUE_OFFSET; > + } else { > + pending = ¤t->pending; > + seq = *ppos - SFD_PER_THREAD_QUEUE_OFFSET; > + } You can do this outside of spin_lock_irq(). And I think it would be better to check SFD_PRIVATE_QUEUE_OFFSET too although this is not strictly necessary. Otherwise this code assumes that sys_pread() cheks pos >= 0 and SFD_PRIVATE_QUEUE_OFFSET == 1. > + list_for_each_entry(q, &pending->list, list) { > + if (sigismember(&ctx->sigmask, q->info.si_signo)) > + continue; > + > + if (seq-- == 0) { > + copy_siginfo(info, &q->info); > + ret = info->si_signo; > + break; > + } > + } > + > + spin_unlock_irq(¤t->sighand->siglock); > + > + if (ret) > + (*ppos)++; We can change it unconditionally but I won't argue. > @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, > } > > file->f_flags |= flags & SFD_RAW; > + file->f_mode |= FMODE_PREAD; Again, this is not needed or the code was broken by the previous patch. Given that 2/3 passes O_RDWR to anon_inode_getfile() I think FMODE_PREAD should be already set. Note OPEN_FMODE(flags) in anon_inode_getfile(). Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) [not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2012-12-28 15:40 ` Oleg Nesterov 0 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2012-12-28 15:40 UTC (permalink / raw) To: Andrey Vagin, Linus Torvalds Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, Alexander Viro, Paul E. McKenney, David Howells, Dave Jones, Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov On 12/28, Oleg Nesterov wrote: > > On 12/28, Andrey Vagin wrote: > > > > @@ -338,6 +379,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, > > } > > > > file->f_flags |= flags & SFD_RAW; > > + file->f_mode |= FMODE_PREAD; > > Again, this is not needed or the code was broken by the previous patch. > > Given that 2/3 passes O_RDWR to anon_inode_getfile() I think FMODE_PREAD > should be already set. Note OPEN_FMODE(flags) in anon_inode_getfile(). As you explained in another thread I was wrong, I confused FMODE_PREAD and FMODE_READ. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-01-12 18:55 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-12-28 10:22 [PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending signals (v2) Andrey Vagin 2012-12-28 10:22 ` [PATCH 1/3] signal: allow to send any siginfo to itself Andrey Vagin 2012-12-28 10:23 ` [PATCH 2/3] signalfd: add ability to return siginfo in a raw format (v2) Andrey Vagin [not found] ` <1356690181-1796-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 2012-12-28 16:14 ` Oleg Nesterov 2013-01-10 9:47 ` Andrey Wagin 2013-01-10 22:45 ` Michael Kerrisk (man-pages) 2013-01-12 18:55 ` Oleg Nesterov 2012-12-28 10:23 ` [PATCH 3/3] signalfd: add ability to read siginfo-s without dequeuing signals (v3) Andrey Vagin [not found] ` <1356690181-1796-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 2012-12-28 14:32 ` Oleg Nesterov [not found] ` <20121228143200.GB24229-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2012-12-28 15:40 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).