From: Andrew Morton <akpm@linux-foundation.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
"H. Peter Anvin" <hpa@zytor.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-kernel@vger.kernel.org,
Pavel Emelyanov <xemul@parallels.com>,
Serge Hallyn <serge.hallyn@canonical.com>,
Kees Cook <keescook@chromium.org>, Tejun Heo <tj@kernel.org>,
Andrew Vagin <avagin@openvz.org>,
Alexey Dobriyan <adobriyan@gmail.com>,
Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
Glauber Costa <glommer@parallels.com>,
Andi Kleen <andi@firstfloor.org>,
Matt Helsley <matthltc@us.ibm.com>,
Pekka Enberg <penberg@kernel.org>,
Eric Dumazet <eric.dumazet@gmail.com>,
Vasiliy Kulikov <segoon@openwall.com>,
Valdis.Kletnieks@vt.edu
Subject: Re: [patch 2/4] [RFC] syscalls, x86: Add __NR_kcmp syscall v4
Date: Tue, 24 Jan 2012 13:22:22 -0800
Message-ID: <20120124132222.d78bc0d4.akpm@linux-foundation.org>
In-Reply-To: <20120124205039.GB2278@moon>

On Wed, 25 Jan 2012 00:50:39 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> This one should fit all requirements I guess.

[wakes up]

> While doing checkpoint-restore in userspace one needs to determine
> whether various kernel objects (like mm_struct-s or file_struct-s) are shared
> between tasks and restore this state.
>
> The 2nd step can be solved by using appropriate CLONE_ flags and the unshare
> syscall, while there is currently no way of solving the 1st one.
>
> One of the ways of checking whether two tasks share e.g. mm_struct is to
> provide some mm_struct ID of a task in its proc file, but showing such
> info is considered to be not that good for security reasons.
>
> Thus after some debate we came to the conclusion that a so-called
> 'comparison' syscall might be the best candidate. So here it is --
> __NR_kcmp.
>
> It takes up to 5 arguments - the pids of the two tasks (whose
> characteristics should be compared), the comparison type and
> (in case of comparison of files) two file descriptors.

PIDs are not unique. One wonders what happens in this syscall if the
same pid appears in two namespaces.

<reads the code>

Seems that it performs lookups only in the caller's PID namespace.
Maybe this is appropriate but it should be described and justified in
the changelog and in code comments, please. And in the forthcoming
manpage ;)

> At the moment only x86 is supported.

Presumably you have a test app. Please let's include that app in
tools/testing/selftests/ for arch maintainers and others to use and
maintain.

>
> ...
>
> --- /dev/null
> +++ linux-2.6.git/kernel/kcmp.c
> @@ -0,0 +1,163 @@
> +#include <linux/kernel.h>
> +#include <linux/syscalls.h>
> +#include <linux/fdtable.h>
> +#include <linux/string.h>
> +#include <linux/random.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/cache.h>
> +#include <linux/bug.h>
> +#include <linux/err.h>
> +#include <linux/kcmp.h>
> +
> +#include <asm/unistd.h>
> +
> +static unsigned long cookies[KCMP_TYPES][2] __read_mostly;

This reader of this code doesn't understand why all this cookie stuff
is in here. Please include code comments which explain the reason for
the existence of this code.

> +static long kptr_obfuscate(long v, int type)
> +{
> +	return (v ^ cookies[type][0]) * cookies[type][1];
> +}
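
For readers asking the same question: judging by the function name and by the security concern in the changelog above, the cookies look intended to hide raw kernel addresses while keeping comparisons meaningful. The userspace sketch below illustrates that reading and is not the patch's code. The demo_* names and the fixed cookie values are hypothetical, and 64-bit words stand in for the kernel's long. It shows that XOR followed by multiplication with an odd constant is reversible modulo 2^64, so two obfuscated values are equal exactly when the original values are.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical fixed cookies, for illustration only. The patch fills
 * cookies[type][0..1] from get_random_bytes() at boot instead.
 */
static const uint64_t demo_xor_cookie = 0x9e3779b97f4a7c15ULL;
static const uint64_t demo_mul_cookie = 0xc2b2ae3d27d4eb4fULL | 1; /* odd */

/* Same shape as kptr_obfuscate(): XOR with one cookie, multiply by the other. */
uint64_t demo_obfuscate(uint64_t v)
{
	return (v ^ demo_xor_cookie) * demo_mul_cookie;
}

/*
 * Inverse of an odd multiplier modulo 2^64 by Newton iteration.
 * The starting guess is correct to 3 bits (m * m == 1 mod 8 for odd m)
 * and every step doubles the number of correct low bits.
 */
uint64_t demo_mul_inverse(uint64_t m)
{
	uint64_t inv = m;
	int i;

	for (i = 0; i < 5; i++)
		inv *= 2 - m * inv;
	return inv;
}

/* Undo demo_obfuscate(); only possible for someone who knows both cookies. */
uint64_t demo_deobfuscate(uint64_t v)
{
	return (v * demo_mul_inverse(demo_mul_cookie)) ^ demo_xor_cookie;
}

void demo_self_check(void)
{
	uint64_t a = 0xffff880012345678ULL;	/* made-up kernel-looking address */
	uint64_t b = a + 64;

	assert(demo_obfuscate(a) == demo_obfuscate(a));	/* equality survives */
	assert(demo_obfuscate(a) != demo_obfuscate(b));	/* no collisions */
	assert(demo_deobfuscate(demo_obfuscate(a)) == a);	/* reversible */
}
```

Calling demo_self_check() exercises the round trip. The flip side is that the mapping hides nothing once the cookies are known, which is presumably why the patch draws them at random.
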
> +
> +/*
> + * 0 - equal
> + * 1 - less than
> + * 2 - greater than
> + * 3 - not equal but ordering unavailable

what the heck does case 3 mean? Why is it here?

> + */
> +static int kcmp_ptr(long v1, long v2, int type)
> +{
> +	long ret;
> +
> +	ret = kptr_obfuscate(v1, type) - kptr_obfuscate(v2, type);
> +
> +	return (ret < 0) | ((ret > 0) << 1);
> +}
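
A small userspace sketch of the return-value encoding may help here. demo_encode() copies the expression from kcmp_ptr(). demo_compare() is a hypothetical variant, not something the patch does, that compares the two values directly instead of subtracting them, so it does not depend on the difference fitting in a signed long. Neither function can return 3, since a value cannot be both less than and greater than another, which is consistent with the question about case 3 above.

```c
#include <assert.h>
#include <limits.h>

/* Same expression as the return statement of the patch's kcmp_ptr(). */
int demo_encode(long diff)
{
	return (diff < 0) | ((diff > 0) << 1);
}

/*
 * Hypothetical variant: encode the ordering of a and b directly.
 * Bit 0 is set for "less than", bit 1 for "greater than".
 */
int demo_compare(long a, long b)
{
	return (a < b) | ((a > b) << 1);
}
```

For example, demo_compare(LONG_MIN, LONG_MAX) yields 1 ("less than") without ever computing LONG_MIN - LONG_MAX, a difference that does not fit in a long.
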
> +
> +#define KCMP_TASK_PTR(task1, task2, member, type)	\
> +	kcmp_ptr((long)(task1)->member,			\
> +		 (long)(task2)->member,			\
> +		 type)
> +
> +#define KCMP_PTR(ptr1, ptr2, type)	\
> +	kcmp_ptr((long)ptr1, (long)ptr2, type)

ugh. This:

static long kptr_obfuscate(void *p, enum you_forgot_to_name_the_enum type)
{
	return ((long)p ^ cookies[type][0]) * cookies[type][1];
}

static int kcmp_task_pointers(void *task1, void *task2, size_t field_offset,
			      enum you_forgot_to_name_the_enum type)
{
	void **field1 = task1 + field_offset;	/* points to a pointer in the task_struct */
	void **field2 = task2 + field_offset;
	long diff;

	diff = kptr_obfuscate(*field1, type) - kptr_obfuscate(*field2, type);
	return (diff < 0) | ((diff > 0) << 1);
}

	...
	ret = kcmp_task_pointers(task1, task2, offsetof(struct task_struct, mm),
				 KCMP_VM);
	...

see? No nasty macros, it's type-correct and it uses only a single
explicit typecast.
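
To make the offsetof() idea concrete outside the kernel, here is a minimal userspace sketch. struct demo_task and the demo_* helpers are hypothetical stand-ins for task_struct and the functions suggested above. The obfuscation step is left out to keep the example short, and the byte offset is applied through a char pointer because arithmetic on void * is a GNU extension that a plain C compiler may reject.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct with two pointer members. */
struct demo_task {
	int pid;
	void *mm;
	void *files;
};

/* Read the pointer-sized member that lives field_offset bytes into the task. */
static uintptr_t demo_field(const struct demo_task *task, size_t field_offset)
{
	void *const *field = (void *const *)((const char *)task + field_offset);

	return (uintptr_t)*field;
}

/* 0 if both tasks hold the same pointer in that member, 1 or 2 otherwise. */
int demo_cmp_task_pointers(const struct demo_task *task1,
			   const struct demo_task *task2,
			   size_t field_offset)
{
	uintptr_t v1 = demo_field(task1, field_offset);
	uintptr_t v2 = demo_field(task2, field_offset);

	return (v1 < v2) | ((v1 > v2) << 1);
}

void demo_offsetof_check(void)
{
	static int shared_mm, files_a, files_b;
	struct demo_task t1 = { 1, &shared_mm, &files_a };
	struct demo_task t2 = { 2, &shared_mm, &files_b };

	/* Both tasks point at the same "mm", but at different "files". */
	assert(demo_cmp_task_pointers(&t1, &t2, offsetof(struct demo_task, mm)) == 0);
	assert(demo_cmp_task_pointers(&t1, &t2, offsetof(struct demo_task, files)) != 0);
}
```

One function covers every member this way: the caller only changes the offsetof() argument.
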
> +/* A caller must be sure the task is presented in memory */

"The caller must have pinned the task"

> +static struct file *
> +get_file_raw_ptr(struct task_struct *task, unsigned int idx)
> +{
> +	struct fdtable *fdt;
> +	struct file *file;
> +
> +	spin_lock(&task->files->file_lock);
> +	fdt = files_fdtable(task->files);
> +	if (idx < fdt->max_fds)
> +		file = fdt->fd[idx];
> +	else
> +		file = NULL;
> +	spin_unlock(&task->files->file_lock);
> +
> +	return file;
> +}
> +
> +SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
> +		unsigned long, idx1, unsigned long, idx2)
> +{
> +	struct task_struct *task1;
> +	struct task_struct *task2;
> +	int ret = 0;
> +
> +	rcu_read_lock();
> +
> +	task1 = find_task_by_vpid(pid1);
> +	if (!task1) {
> +		rcu_read_unlock();
> +		return -ESRCH;
> +	}
> +
> +	task2 = find_task_by_vpid(pid2);
> +	if (!task2) {
> +		put_task_struct(task1);
> +		rcu_read_unlock();
> +		return -ESRCH;
> +	}
> +
> +	get_task_struct(task1);
> +	get_task_struct(task2);
> +
> +	rcu_read_unlock();
> +
> +	if (!ptrace_may_access(task1, PTRACE_MODE_READ) ||
> +	    !ptrace_may_access(task2, PTRACE_MODE_READ)) {

Add a comment explaining this decision.

> +		ret = -EACCES;
> +		goto err;
> +	}
> +
> +	/*
> +	 * Note for all cases but the KCMP_FILE we
> +	 * don't take any locks in a sake of speed.
> +	 */
> +
> +	switch (type) {
> +	case KCMP_FILE: {
> +		struct file *filp1, *filp2;
> +
> +		filp1 = get_file_raw_ptr(task1, idx1);
> +		filp2 = get_file_raw_ptr(task2, idx2);
> +
> +		if (filp1 && filp2)
> +			ret = KCMP_PTR(filp1, filp2, KCMP_FILE);
> +		else
> +			ret = -ENOENT;
> +		break;
> +	}
> +	case KCMP_VM:
> +		ret = KCMP_TASK_PTR(task1, task2, mm, KCMP_VM);
> +		break;
> +	case KCMP_FILES:
> +		ret = KCMP_TASK_PTR(task1, task2, files, KCMP_FILES);
> +		break;
> +	case KCMP_FS:
> +		ret = KCMP_TASK_PTR(task1, task2, fs, KCMP_FS);
> +		break;
> +	case KCMP_SIGHAND:
> +		ret = KCMP_TASK_PTR(task1, task2, sighand, KCMP_SIGHAND);
> +		break;
> +	case KCMP_IO:
> +		ret = KCMP_TASK_PTR(task1, task2, io_context, KCMP_IO);
> +		break;
> +	case KCMP_SYSVSEM:
> +#ifdef CONFIG_SYSVIPC
> +		ret = KCMP_TASK_PTR(task1, task2, sysvsem.undo_list, KCMP_SYSVSEM);
> +#else
> +		ret = -ENOENT;

ENOENT seems inappropriate here.

> +		goto err;
> +#endif
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +err:
> +	put_task_struct(task1);
> +	put_task_struct(task2);
> +
> +	return ret;
> +}
> +
> +static __init int kcmp_cookie_init(void)
> +{
> +	int i, j;
> +
> +	for (i = 0; i < KCMP_TYPES; i++) {
> +		for (j = 0; j < 2; j++) {
> +			get_random_bytes(&cookies[i][j],
> +					 sizeof(cookies[i][j]));
> +		}
> +		cookies[i][1] |= (~(~0UL >> 1) | 1);

hm, what's the point in writing a random number to cookies[i][1] and
then immediately overwriting that with a constant?

> +	}
> +
> +	return 0;
> +}
> +late_initcall(kcmp_cookie_init);
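
One detail when reading the cookies[i][1] line: it uses |=, not =, so as written the random value is kept and only two bits are forced on. The sketch below is a userspace illustration with hypothetical demo_* names. It computes the same constant and checks that it is the top bit plus the bottom bit of an unsigned long. A set bottom bit makes the multiplier odd, and an odd multiplier is invertible modulo 2^N, so distinct inputs cannot collide after the multiplication. The patch does not say why the top bit is set; keeping the multiplier large is a guess.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* The constant OR-ed into cookies[i][1] by the patch. */
unsigned long demo_cookie_mask(void)
{
	return ~(~0UL >> 1) | 1;
}

/* What the |= does to an arbitrary random value. */
unsigned long demo_fix_multiplier(unsigned long random_value)
{
	return random_value | demo_cookie_mask();
}

void demo_mask_check(void)
{
	unsigned long top = 1UL << (DEMO_LONG_BITS - 1);

	/* Only the highest and the lowest bit are set in the mask. */
	assert(demo_cookie_mask() == (top | 1UL));
	/* Bits in between keep whatever the random source produced. */
	assert((demo_fix_multiplier(0x1234UL) & 0xfffeUL) == 0x1234UL);
	/* The result is always odd. */
	assert(demo_fix_multiplier(0x1234UL) & 1UL);
}
```

demo_mask_check() passes on both 32-bit and 64-bit builds, since the mask is derived from the width of unsigned long and not hard-coded.
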