* [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks
@ 2011-11-15 11:35 Pavel Emelyanov
2011-11-15 11:36 ` [PATCH 1/4] Routine for generating an safe ID for kernel pointer Pavel Emelyanov
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Pavel Emelyanov @ 2011-11-15 11:35 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Cyrill Gorcunov, Glauber Costa, Andi Kleen, Tejun Heo,
Andrew Morton, Matt Helsley
While doing checkpoint-restore in userspace, one needs to determine
whether various kernel objects (like mm_struct-s or file_struct-s) are shared
between tasks, and then restore this state.
The 2nd step can for now be solved by using the respective CLONE_XXX flags and
the unshare syscall, while there is currently no way to solve the 1st one.
One of the ways for checking whether two tasks share e.g. an mm_struct is to
provide some mm_struct ID of a task to its proc file. The best from the
performance point of view ID is the object address in the kernel, but showing
them to the userspace is not good for performance reasons.
The previous attempt to solve this was to generate an ID for slab/slub and then
mix it with the object index on the slab page. This attempt wasn't met
warmly by the slab maintainers, so here's the 2nd approach.
The object address is XOR-ed with a "random" value of the same size and then
shown in proc. Provided this poison is not leaked to userspace, the
IDs seem to be safe.
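A minimal userspace sketch of the XOR scheme described above, not the kernel code itself. The names make_id() and same_object() and the explicit poison parameter are assumptions made for illustration; in the patch the poison is a kernel-private random value. It shows that XOR with a fixed poison keeps IDs comparable for equality, and that anyone who learns the poison can recover the address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the ID generator: `poison` stands in for the
 * kernel's secret random value and is passed explicitly here.
 */
static uintptr_t make_id(const void *ptr, uintptr_t poison)
{
	/* NULL maps to 0, as in the patch. */
	if (!ptr)
		return 0;
	return (uintptr_t)ptr ^ poison;
}

/* Two tasks share an object exactly when the IDs they report are equal. */
static bool same_object(uintptr_t id_a, uintptr_t id_b)
{
	return id_a == id_b;
}
```

Because XOR with a constant is a bijection, distinct addresses always give distinct IDs, so userspace only ever needs to compare IDs for equality.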
The other change from the previous set: it now includes patches for /proc
that show the generated IDs. The objects for which the IDs are shown are:
* all namespaces living in /proc/pid/ns/
* open files (shown in /proc/pid/fdinfo/)
* objects that can be shared with CLONE_XXX flags (except for namespaces)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
* [PATCH 1/4] Routine for generating an safe ID for kernel pointer
2011-11-15 11:35 [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks Pavel Emelyanov
@ 2011-11-15 11:36 ` Pavel Emelyanov
2011-11-15 11:38 ` Pekka Enberg
` (2 more replies)
2011-11-15 11:36 ` [PATCH 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Pavel Emelyanov
` (2 subsequent siblings)
3 siblings, 3 replies; 20+ messages in thread
From: Pavel Emelyanov @ 2011-11-15 11:36 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Cyrill Gorcunov, Glauber Costa, Andi Kleen, Tejun Heo,
Andrew Morton, Matt Helsley
The routine XORs the given pointer with a random value, thus producing
an ID (32 or 64 bit, depending on the arch) which can be shown even to
unprivileged user space processes without the risk of leaking kernel
information.
It assumes that it is first called only when the random pool is ready to provide
a random long value.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
include/linux/gen_object_ids.h | 12 ++++++++++++
mm/Kconfig | 7 +++++++
mm/Makefile | 1 +
mm/gen_object_ids.c | 21 +++++++++++++++++++++
4 files changed, 41 insertions(+), 0 deletions(-)
create mode 100644 include/linux/gen_object_ids.h
create mode 100644 mm/gen_object_ids.c
diff --git a/include/linux/gen_object_ids.h b/include/linux/gen_object_ids.h
new file mode 100644
index 0000000..17981ae
--- /dev/null
+++ b/include/linux/gen_object_ids.h
@@ -0,0 +1,12 @@
+#ifndef __GEN_OBJECT_IDS_H__
+#define __GEN_OBJECT_IDS_H__
+
+#ifdef CONFIG_GENERIC_OBJECT_IDS
+unsigned long gen_object_id(void *ptr);
+#else
+static inline unsigned long gen_object_id(void *ptr)
+{
+	return 0;
+}
+#endif
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index f2f1ca1..1480cbf 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -370,3 +370,10 @@ config CLEANCACHE
in a negligible performance hit.
If unsure, say Y to enable cleancache
+
+config GENERIC_OBJECT_IDS
+	bool "Enable generic object ids infrastructure"
+	default n
+	help
+	  Turn on the (quite simple) functionality that can generate IDs for
+	  kernel objects which are safe to export to the userspace.
diff --git a/mm/Makefile b/mm/Makefile
index 836e416..155797a 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -50,3 +50,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
+obj-$(CONFIG_GENERIC_OBJECT_IDS) += gen_object_ids.o
diff --git a/mm/gen_object_ids.c b/mm/gen_object_ids.c
new file mode 100644
index 0000000..a75119b
--- /dev/null
+++ b/mm/gen_object_ids.c
@@ -0,0 +1,21 @@
+#include <linux/gen_object_ids.h>
+#include <linux/spinlock.h>
+#include <linux/random.h>
+
+static unsigned long ptr_poison __read_mostly;
+static DEFINE_SPINLOCK(ptr_poison_lock);
+
+unsigned long gen_object_id(void *ptr)
+{
+	if (!ptr)
+		return 0;
+
+	if (unlikely(!ptr_poison)) {
+		spin_lock(&ptr_poison_lock);
+		if (!ptr_poison)
+			get_random_bytes(&ptr_poison, sizeof(ptr_poison));
+		spin_unlock(&ptr_poison_lock);
+	}
+
+	return ((unsigned long)ptr) ^ ptr_poison;
+}
--
1.5.5.6
* Re: [PATCH 1/4] Routine for generating an safe ID for kernel pointer
2011-11-15 11:36 ` [PATCH 1/4] Routine for generating an safe ID for kernel pointer Pavel Emelyanov
@ 2011-11-15 11:38 ` Pekka Enberg
2011-11-15 11:44 ` Pavel Emelyanov
2011-11-15 15:13 ` Eric Dumazet
2011-11-15 15:20 ` Tejun Heo
2 siblings, 1 reply; 20+ messages in thread
From: Pekka Enberg @ 2011-11-15 11:38 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: Linux Kernel Mailing List, Cyrill Gorcunov, Glauber Costa,
Andi Kleen, Tejun Heo, Andrew Morton, Matt Helsley
On Tue, Nov 15, 2011 at 1:36 PM, Pavel Emelyanov <xemul@parallels.com> wrote:
> The routine XORs the given pointer with a random value thus producing
> an ID (32 or 64 bit, depending on the arch) which can be shown even to
> unprivileged user space processes without risking of leaking kernel
> information.
>
> It implies that it gets called when the random pool is ready for providing
> a random long value.
>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>
> ---
> include/linux/gen_object_ids.h | 12 ++++++++++++
> mm/Kconfig | 7 +++++++
> mm/Makefile | 1 +
> mm/gen_object_ids.c | 21 +++++++++++++++++++++
> 4 files changed, 41 insertions(+), 0 deletions(-)
> create mode 100644 include/linux/gen_object_ids.h
> create mode 100644 mm/gen_object_ids.c
>
> diff --git a/include/linux/gen_object_ids.h b/include/linux/gen_object_ids.h
> new file mode 100644
> index 0000000..17981ae
> --- /dev/null
> +++ b/include/linux/gen_object_ids.h
> @@ -0,0 +1,12 @@
> +#ifndef __GEN_OBJECT_IDS_H__
> +#define __GEN_OBJECT_IDS_H__
> +
> +#ifdef CONFIG_GENERIC_OBJECT_IDS
> +unsigned long gen_object_id(void *ptr);
> +#else
> +static inline unsigned long gen_object_id(void *ptr)
> +{
> + return 0;
> +}
> +#endif
> +#endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index f2f1ca1..1480cbf 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -370,3 +370,10 @@ config CLEANCACHE
> in a negligible performance hit.
>
> If unsure, say Y to enable cleancache
> +
> +config GENERIC_OBJECT_IDS
> + bool "Enable generic object ids infrastructure"
> + default n
> + help
> + Turn on the (quite simple) funtionality that can generate IDs for
> + kernel objects which is safe to export to the userspace.
> diff --git a/mm/Makefile b/mm/Makefile
> index 836e416..155797a 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -50,3 +50,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
> obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> obj-$(CONFIG_CLEANCACHE) += cleancache.o
> +obj-$(CONFIG_GENERIC_OBJECT_IDS) += gen_object_ids.o
> diff --git a/mm/gen_object_ids.c b/mm/gen_object_ids.c
> new file mode 100644
> index 0000000..a75119b
> --- /dev/null
> +++ b/mm/gen_object_ids.c
> @@ -0,0 +1,21 @@
> +#include <linux/gen_object_ids.h>
> +#include <linux/spinlock.h>
> +#include <linux/random.h>
> +
> +static unsigned long ptr_poison __read_mostly;
> +static DEFINE_SPINLOCK(ptr_poison_lock);
> +
> +unsigned long gen_object_id(void *ptr)
> +{
> + if (!ptr)
> + return 0;
> +
> + if (unlikely(!ptr_poison)) {
> + spin_lock(&ptr_poison_lock);
> + if (!ptr_poison)
> + get_random_bytes(&ptr_poison, sizeof(ptr_poison));
> + spin_unlock(&ptr_poison_lock);
> + }
> +
> + return ((unsigned long)ptr) ^ ptr_poison;
> +}
You could put this in mm/util.c. Wouldn't it make sense to separate
the initialization and use late_initcall() to call it?
Pekka
* Re: [PATCH 1/4] Routine for generating an safe ID for kernel pointer
2011-11-15 11:38 ` Pekka Enberg
@ 2011-11-15 11:44 ` Pavel Emelyanov
2011-11-15 11:51 ` Pekka Enberg
0 siblings, 1 reply; 20+ messages in thread
From: Pavel Emelyanov @ 2011-11-15 11:44 UTC (permalink / raw)
To: Pekka Enberg
Cc: Linux Kernel Mailing List, Cyrill Gorcunov, Glauber Costa,
Andi Kleen, Tejun Heo, Andrew Morton, Matt Helsley
>> +unsigned long gen_object_id(void *ptr)
>> +{
>> + if (!ptr)
>> + return 0;
>> +
>> + if (unlikely(!ptr_poison)) {
>> + spin_lock(&ptr_poison_lock);
>> + if (!ptr_poison)
>> + get_random_bytes(&ptr_poison, sizeof(ptr_poison));
>> + spin_unlock(&ptr_poison_lock);
>> + }
>> +
>> + return ((unsigned long)ptr) ^ ptr_poison;
>> +}
>
> You could put this in mm/util.c. Wouldn't it make sense to separate
> the initialization and use late_initcall() to call it?
OK, will put it in util.c
About the initialization: I will keep the sanity check that the poison is not 0 in
gen_object_id() anyway, so what's the point of a separate initialization?
> Pekka
> .
>
* Re: [PATCH 1/4] Routine for generating an safe ID for kernel pointer
2011-11-15 11:44 ` Pavel Emelyanov
@ 2011-11-15 11:51 ` Pekka Enberg
0 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-11-15 11:51 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: Linux Kernel Mailing List, Cyrill Gorcunov, Glauber Costa,
Andi Kleen, Tejun Heo, Andrew Morton, Matt Helsley
On Tue, Nov 15, 2011 at 1:44 PM, Pavel Emelyanov <xemul@parallels.com> wrote:
>>> +unsigned long gen_object_id(void *ptr)
>>> +{
>>> + if (!ptr)
>>> + return 0;
>>> +
>>> + if (unlikely(!ptr_poison)) {
>>> + spin_lock(&ptr_poison_lock);
>>> + if (!ptr_poison)
>>> + get_random_bytes(&ptr_poison, sizeof(ptr_poison));
>>> + spin_unlock(&ptr_poison_lock);
>>> + }
>>> +
>>> + return ((unsigned long)ptr) ^ ptr_poison;
>>> +}
>>
>> You could put this in mm/util.c. Wouldn't it make sense to separate
>> the initialization and use late_initcall() to call it?
>
> OK, will put to util.c
>
> About the initialization - I will put the sanity check about poison being not 0 on
> get_object_id() anyway, so what's the point in separate initialization?
You no longer need the spinlock, and you get rid of the potential
double initialization problem, because you're not holding the spinlock
when you check for ptr_poison being zero.
Pekka
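A minimal userspace sketch of the separation suggested above, assuming a one-time init step that runs before any ID is requested (the role late_initcall() would play in the kernel). The function ptr_poison_init() and its rnd parameter are hypothetical; in the kernel the value would come from get_random_bytes(). With the poison written exactly once up front, the lookup path needs neither a lock nor a zero check.

```c
#include <stddef.h>
#include <stdint.h>

/* Written once by the init step, only read afterwards. */
static uintptr_t ptr_poison;

/*
 * Stand-in for the late_initcall() callback: must run once, before the
 * first call to gen_object_id(). `rnd` is supplied by the caller here.
 */
static int ptr_poison_init(uintptr_t rnd)
{
	ptr_poison = rnd;
	return 0;
}

/* Lock-free fast path: a plain read of a value that never changes again. */
static uintptr_t gen_object_id(const void *ptr)
{
	if (!ptr)
		return 0;
	return (uintptr_t)ptr ^ ptr_poison;
}
```

The trade-off is an ordering requirement: any caller that runs before the init step would see a zero poison and get the raw address back.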
* Re: [PATCH 1/4] Routine for generating an safe ID for kernel pointer
2011-11-15 11:36 ` [PATCH 1/4] Routine for generating an safe ID for kernel pointer Pavel Emelyanov
2011-11-15 11:38 ` Pekka Enberg
@ 2011-11-15 15:13 ` Eric Dumazet
2011-11-15 15:20 ` Tejun Heo
2 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2011-11-15 15:13 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: Linux Kernel Mailing List, Cyrill Gorcunov, Glauber Costa,
Andi Kleen, Tejun Heo, Andrew Morton, Matt Helsley
On Tuesday 15 November 2011 at 15:36 +0400, Pavel Emelyanov wrote:
> The routine XORs the given pointer with a random value thus producing
> an ID (32 or 64 bit, depending on the arch) which can be shown even to
> unprivileged user space processes without risking of leaking kernel
> information.
>
> It implies that it gets called when the random pool is ready for providing
> a random long value.
>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>
> ---
> include/linux/gen_object_ids.h | 12 ++++++++++++
> mm/Kconfig | 7 +++++++
> mm/Makefile | 1 +
> mm/gen_object_ids.c | 21 +++++++++++++++++++++
> 4 files changed, 41 insertions(+), 0 deletions(-)
> create mode 100644 include/linux/gen_object_ids.h
> create mode 100644 mm/gen_object_ids.c
>
> diff --git a/include/linux/gen_object_ids.h b/include/linux/gen_object_ids.h
> new file mode 100644
> index 0000000..17981ae
> --- /dev/null
> +++ b/include/linux/gen_object_ids.h
> @@ -0,0 +1,12 @@
> +#ifndef __GEN_OBJECT_IDS_H__
> +#define __GEN_OBJECT_IDS_H__
> +
> +#ifdef CONFIG_GENERIC_OBJECT_IDS
> +unsigned long gen_object_id(void *ptr);
> +#else
> +static inline unsigned long gen_object_id(void *ptr)
> +{
> + return 0;
> +}
> +#endif
> +#endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index f2f1ca1..1480cbf 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -370,3 +370,10 @@ config CLEANCACHE
> in a negligible performance hit.
>
> If unsure, say Y to enable cleancache
> +
> +config GENERIC_OBJECT_IDS
> + bool "Enable generic object ids infrastructure"
> + default n
> + help
> + Turn on the (quite simple) funtionality that can generate IDs for
> + kernel objects which is safe to export to the userspace.
> diff --git a/mm/Makefile b/mm/Makefile
> index 836e416..155797a 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -50,3 +50,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
> obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> obj-$(CONFIG_CLEANCACHE) += cleancache.o
> +obj-$(CONFIG_GENERIC_OBJECT_IDS) += gen_object_ids.o
> diff --git a/mm/gen_object_ids.c b/mm/gen_object_ids.c
> new file mode 100644
> index 0000000..a75119b
> --- /dev/null
> +++ b/mm/gen_object_ids.c
> @@ -0,0 +1,21 @@
> +#include <linux/gen_object_ids.h>
> +#include <linux/spinlock.h>
> +#include <linux/random.h>
> +
> +static unsigned long ptr_poison __read_mostly;
> +static DEFINE_SPINLOCK(ptr_poison_lock);
> +
> +unsigned long gen_object_id(void *ptr)
> +{
> + if (!ptr)
> + return 0;
> +
> + if (unlikely(!ptr_poison)) {
> + spin_lock(&ptr_poison_lock);
> + if (!ptr_poison)
> + get_random_bytes(&ptr_poison, sizeof(ptr_poison));
> + spin_unlock(&ptr_poison_lock);
> + }
> +
> + return ((unsigned long)ptr) ^ ptr_poison;
> +}
It cannot be called from irq context then...
I suggest using the following code instead (no lock needed):
	if (!ptr_poison) {
		unsigned long rnd;
		do {
			get_random_bytes(&rnd, sizeof(rnd));
		} while (rnd == 0);
		cmpxchg(&ptr_poison, 0, rnd);
	}
	return ((unsigned long)ptr) ^ ptr_poison;
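The lock-free initialization above can be modelled in userspace with C11 atomics. This is a sketch under assumptions: atomic_compare_exchange_strong() stands in for the kernel's cmpxchg(), and toy_random() is a hypothetical placeholder for get_random_bytes() that is NOT cryptographically secure.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

static _Atomic uintptr_t ptr_poison;	/* 0 means "not initialized yet" */
static _Atomic uint64_t rng_state = 1;

/* Placeholder entropy source (assumption): a splitmix64-style mixer. */
static uintptr_t toy_random(void)
{
	uint64_t z = atomic_fetch_add(&rng_state, 0x9E3779B97F4A7C15ULL);

	z ^= z >> 30;
	z *= 0xBF58476D1CE4E5B9ULL;
	z ^= z >> 27;
	z *= 0x94D049BB133111EBULL;
	z ^= z >> 31;
	return (uintptr_t)z;
}

static uintptr_t gen_object_id(const void *ptr)
{
	if (!ptr)
		return 0;

	if (atomic_load(&ptr_poison) == 0) {
		uintptr_t expected = 0;
		uintptr_t rnd;

		/* Zero is reserved as the "unset" marker, so retry on it. */
		do {
			rnd = toy_random();
		} while (rnd == 0);

		/*
		 * Only the first caller installs its value. A caller that
		 * loses the race discards rnd and uses the winner's poison.
		 */
		atomic_compare_exchange_strong(&ptr_poison, &expected, rnd);
	}

	return (uintptr_t)ptr ^ atomic_load(&ptr_poison);
}
```

Since no lock is taken, the same pattern is usable from contexts where sleeping or spinning on a non-irq-safe lock is not allowed.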
* Re: [PATCH 1/4] Routine for generating an safe ID for kernel pointer
2011-11-15 11:36 ` [PATCH 1/4] Routine for generating an safe ID for kernel pointer Pavel Emelyanov
2011-11-15 11:38 ` Pekka Enberg
2011-11-15 15:13 ` Eric Dumazet
@ 2011-11-15 15:20 ` Tejun Heo
2 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2011-11-15 15:20 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: Linux Kernel Mailing List, Cyrill Gorcunov, Glauber Costa,
Andi Kleen, Andrew Morton, Matt Helsley
On Tue, Nov 15, 2011 at 03:36:33PM +0400, Pavel Emelyanov wrote:
> +unsigned long gen_object_id(void *ptr)
> +{
> + if (!ptr)
> + return 0;
> +
> + if (unlikely(!ptr_poison)) {
> + spin_lock(&ptr_poison_lock);
> + if (!ptr_poison)
> + get_random_bytes(&ptr_poison, sizeof(ptr_poison));
> + spin_unlock(&ptr_poison_lock);
> + }
One thing that worries me about this is that there's one ptr_poison
for all id's and any single leak of a pointer value will make all ids
vulnerable. If we're going to do this, let's segregate different id
spaces and use different poison values for each.
Thank you.
--
tejun
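A minimal userspace sketch of segregated ID spaces as proposed above, with one poison per object type. The enum names, poisons_init() and the caller-supplied random words are all hypothetical and chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID spaces; the names are illustrative only. */
enum obj_id_type {
	OBJ_ID_NS,
	OBJ_ID_FILE,
	OBJ_ID_MM,
	OBJ_ID_TYPES,	/* number of ID spaces */
};

static uintptr_t poisons[OBJ_ID_TYPES];

/* Fill every slot from random words assumed to come from a secure source. */
static void poisons_init(const uintptr_t rnd[OBJ_ID_TYPES])
{
	for (size_t i = 0; i < OBJ_ID_TYPES; i++)
		poisons[i] = rnd[i];
}

/* Each ID space is obfuscated with its own poison. */
static uintptr_t gen_object_id(const void *ptr, enum obj_id_type type)
{
	if (!ptr || type >= OBJ_ID_TYPES)
		return 0;
	return (uintptr_t)ptr ^ poisons[type];
}
```

With this layout, a leaked pointer of one type exposes only that type's poison; IDs from the other spaces stay obfuscated as long as their poisons differ.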
* [PATCH 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2011-11-15 11:35 [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks Pavel Emelyanov
2011-11-15 11:36 ` [PATCH 1/4] Routine for generating an safe ID for kernel pointer Pavel Emelyanov
@ 2011-11-15 11:36 ` Pavel Emelyanov
2011-11-15 11:37 ` [PATCH 3/4] proc: Show open file ID in /proc/pid/fdinfo/* Pavel Emelyanov
2011-11-16 5:44 ` [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks Matt Helsley
3 siblings, 0 replies; 20+ messages in thread
From: Pavel Emelyanov @ 2011-11-15 11:36 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Cyrill Gorcunov, Glauber Costa, Andi Kleen, Tejun Heo,
Andrew Morton, Matt Helsley
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
fs/proc/namespaces.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index be177f7..06baabb 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -12,6 +12,7 @@
#include <linux/mnt_namespace.h>
#include <linux/ipc_namespace.h>
#include <linux/pid_namespace.h>
+#include <linux/gen_object_ids.h>
#include "internal.h"
@@ -27,8 +28,19 @@ static const struct proc_ns_operations *ns_entries[] = {
#endif
};
+static ssize_t proc_ns_read(struct file *file, char __user *buf,
+		size_t len, loff_t *ppos)
+{
+	char tmp[32];
+	struct proc_inode *ei = PROC_I(file->f_dentry->d_inode);
+
+	snprintf(tmp, sizeof(tmp), "id:\t%lu\n", gen_object_id(ei->ns));
+	return simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
+}
+
static const struct file_operations ns_file_operations = {
.llseek = no_llseek,
+ .read = proc_ns_read,
};
static struct dentry *proc_ns_instantiate(struct inode *dir,
--
1.5.5.6
* [PATCH 3/4] proc: Show open file ID in /proc/pid/fdinfo/*
2011-11-15 11:35 [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks Pavel Emelyanov
2011-11-15 11:36 ` [PATCH 1/4] Routine for generating an safe ID for kernel pointer Pavel Emelyanov
2011-11-15 11:36 ` [PATCH 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Pavel Emelyanov
@ 2011-11-15 11:37 ` Pavel Emelyanov
2011-11-16 5:44 ` [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks Matt Helsley
3 siblings, 0 replies; 20+ messages in thread
From: Pavel Emelyanov @ 2011-11-15 11:37 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Cyrill Gorcunov, Glauber Costa, Andi Kleen, Tejun Heo,
Andrew Morton, Matt Helsley
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
fs/proc/base.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5eb0206..7a9e36b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -83,6 +83,7 @@
#include <linux/pid_namespace.h>
#include <linux/fs_struct.h>
#include <linux/slab.h>
+#include <linux/gen_object_ids.h>
#ifdef CONFIG_HARDWALL
#include <asm/hardwall.h>
#endif
@@ -1934,9 +1935,10 @@ static int proc_fd_info(struct inode *inode, struct path *path, char *info)
if (info)
snprintf(info, PROC_FDINFO_MAX,
"pos:\t%lli\n"
- "flags:\t0%o\n",
+ "flags:\t0%o\n"
+ "id:\t%lu\n",
(long long) file->f_pos,
- f_flags);
+ f_flags, gen_object_id(file));
spin_unlock(&files->file_lock);
put_files_struct(files);
return 0;
--
1.5.5.6
* Re: [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks
2011-11-15 11:35 [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks Pavel Emelyanov
` (2 preceding siblings ...)
2011-11-15 11:37 ` [PATCH 3/4] proc: Show open file ID in /proc/pid/fdinfo/* Pavel Emelyanov
@ 2011-11-16 5:44 ` Matt Helsley
2011-11-16 6:19 ` Cyrill Gorcunov
2011-11-16 8:25 ` Pavel Emelyanov
3 siblings, 2 replies; 20+ messages in thread
From: Matt Helsley @ 2011-11-16 5:44 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: Linux Kernel Mailing List, Cyrill Gorcunov, Glauber Costa,
Andi Kleen, Tejun Heo, Andrew Morton, Matt Helsley
On Tue, Nov 15, 2011 at 03:35:58PM +0400, Pavel Emelyanov wrote:
> While doing the checkpoint-restore in the userspace one need to determine
> whether various kernel objects (like mm_struct-s of file_struct-s) are shared
> between tasks and restore this state.
>
> The 2nd step can for now be solved by using respective CLONE_XXX flags and
> the unshare syscall, while there's currently no ways for solving the 1st one.
>
> One of the ways for checking whether two tasks share e.g. an mm_struct is to
> provide some mm_struct ID of a task to its proc file. The best from the
> performance point of view ID is the object address in the kernel, but showing
> them to the userspace is not good for performance reasons.
(I think you meant "not good for security reasons."...)
> The previous attempt to solve this was to generate an ID for slab/slub and then
> mix it up with the object index on the slab page. This attempt wasn't met
> warmly by slab maintainers, so here's the 2nd approach.
>
> The object address is XOR-ed with a "random" value of the same size and then
> shown in proc. Providing this poison is not leaked into the userspace then
> ID seem to be safe.
Really? There's no way to quickly derive the random number from known
allocation patterns and thereby break the obfuscation scheme?
To start we can note that the low N bits are directly exposed in the ID
of anything that requires 2^N-byte alignment.
I think it's really a question of whether the high order bits can be derived.
And of course the random number only needs to be derived once per boot
before it reveals the address of everything with an ID.
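The alignment observation above can be checked with a small userspace sketch. The helper names and the sample address and poison values are made up for illustration. For an object aligned to 2^N bytes, the low N address bits are zero, so the low N bits of the ID are the low N bits of the poison itself.

```c
#include <stdint.h>

/* Model of the proposed scheme: ID = address XOR poison. */
static uintptr_t make_id(uintptr_t addr, uintptr_t poison)
{
	return addr ^ poison;
}

/*
 * Given the ID of an object known to be aligned to `align` bytes
 * (a power of two), return the low poison bits that the ID reveals.
 */
static uintptr_t leaked_poison_bits(uintptr_t id, uintptr_t align)
{
	return id & (align - 1);
}
```

For example, with the hypothetical poison 0xdeadbeef, a page-aligned object at 0x10002000 yields an ID whose low 12 bits are 0xeef, which are exactly the low 12 bits of the poison.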
Some wild speculation:
I bet you could use some cpu affinity, mem policy, slab info, mmap
tricks, etc. to derive more low bits of the random number. You can probably
get even more when you consider objects that don't fit evenly in slabs.
Speaking of slabs, is there some way to use the fact that nearby slab objects
will share their high ID bits? If any of the ID-bearing objects are allocated via
kmalloc, then inducing memory pressure and/or watching for buddy allocator
merges/splits might reveal more low bits...
Cheers,
-Matt Helsley
* Re: [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks
2011-11-16 5:44 ` [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks Matt Helsley
@ 2011-11-16 6:19 ` Cyrill Gorcunov
2011-11-16 8:25 ` Pavel Emelyanov
1 sibling, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2011-11-16 6:19 UTC (permalink / raw)
To: Matt Helsley
Cc: Pavel Emelyanov, Linux Kernel Mailing List, Glauber Costa,
Andi Kleen, Tejun Heo, Andrew Morton
On Tue, Nov 15, 2011 at 09:44:27PM -0800, Matt Helsley wrote:
...
> >
> > The object address is XOR-ed with a "random" value of the same size and then
> > shown in proc. Providing this poison is not leaked into the userspace then
> > ID seem to be safe.
>
> Really? There's no way to quickly derive the random number from known
> allocation patterns and thereby break the obfuscation scheme?
> To start we can note that the low N bits are directly exposed in the ID
> of anything that requires 2^N-byte alignment.
>
> I think it's really a question of whether the high order bits can be derived.
>
Good point. I suppose we might use 2 random numbers here, one for xor and
second to shuffle bits.
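One possible reading of the two-random-number idea above, as a userspace sketch: XOR with the first value, then rotate the result by an amount derived from the second. A rotation is only one very simple bit permutation and is an assumption made here; the thread does not specify the shuffle, and nothing in this sketch shows that it resists the attacks discussed. The names rotl() and gen_shuffled_id() are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS (sizeof(uintptr_t) * CHAR_BIT)

/* Rotate left by n bits; n is reduced modulo the word size. */
static uintptr_t rotl(uintptr_t v, unsigned int n)
{
	n %= WORD_BITS;
	if (n == 0)
		return v;
	return (v << n) | (v >> (WORD_BITS - n));
}

/* XOR with `poison`, then move the bits around by rotating `rot` places. */
static uintptr_t gen_shuffled_id(const void *ptr, uintptr_t poison,
				 unsigned int rot)
{
	if (!ptr)
		return 0;
	return rotl((uintptr_t)ptr ^ poison, rot);
}
```

Both steps are bijections, so equal IDs still mean equal objects. The rotation moves the exposed low poison bits to another position, but with only WORD_BITS possible rotations it adds little secrecy on its own.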
> And of course the random number only needs to be derived once per boot
> before it reveals the address of everything with an ID.
Cyrill
* Re: [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks
2011-11-16 5:44 ` [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks Matt Helsley
2011-11-16 6:19 ` Cyrill Gorcunov
@ 2011-11-16 8:25 ` Pavel Emelyanov
2011-11-18 23:25 ` Matt Helsley
1 sibling, 1 reply; 20+ messages in thread
From: Pavel Emelyanov @ 2011-11-16 8:25 UTC (permalink / raw)
To: Matt Helsley
Cc: Linux Kernel Mailing List, Cyrill Gorcunov, Glauber Costa,
Andi Kleen, Tejun Heo, Andrew Morton
On 11/16/2011 09:44 AM, Matt Helsley wrote:
> On Tue, Nov 15, 2011 at 03:35:58PM +0400, Pavel Emelyanov wrote:
>> While doing the checkpoint-restore in the userspace one need to determine
>> whether various kernel objects (like mm_struct-s of file_struct-s) are shared
>> between tasks and restore this state.
>>
>> The 2nd step can for now be solved by using respective CLONE_XXX flags and
>> the unshare syscall, while there's currently no ways for solving the 1st one.
>>
>> One of the ways for checking whether two tasks share e.g. an mm_struct is to
>> provide some mm_struct ID of a task to its proc file. The best from the
>> performance point of view ID is the object address in the kernel, but showing
>> them to the userspace is not good for performance reasons.
>
> (I think you meant "not good for security reasons."...)
>
>> The previous attempt to solve this was to generate an ID for slab/slub and then
>> mix it up with the object index on the slab page. This attempt wasn't met
>> warmly by slab maintainers, so here's the 2nd approach.
>>
>> The object address is XOR-ed with a "random" value of the same size and then
>> shown in proc. Providing this poison is not leaked into the userspace then
>> ID seem to be safe.
>
> Really? There's no way to quickly derive the random number from known
> allocation patterns and thereby break the obfuscation scheme?
> To start we can note that the low N bits are directly exposed in the ID
> of anything that requires 2^N-byte alignment.
>
> I think it's really a question of whether the high order bits can be derived.
>
> And of course the random number only needs to be derived once per boot
> before it reveals the address of everything with an ID.
Tejun already proposed splitting the ID space and using a different poison for each part.
> Some wild speculation:
>
> I bet you could use some cpu affinity, mem policy, slab info, mmap
> tricks, etc. to derive more low bits of the random number. You can probably
> get even more when you consider objects that don't fit evenly in slabs.
> Speaking of slabs, is there some way to use the fact that nearby slab objects
> will share their high ID bits?
OK, let's assume we found out that two mm_struct IDs have higher bits equal, what
can we do next to split address bits from the poison ones?
> If any of the ID-bearing objects allocated via
> kmalloc then inducing memory pressure and/or watching for buddy allocator
> merge/splits might reveal more low bits...
>
> Cheers,
> -Matt Helsley
>
> .
>
* Re: [PATCH 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks
2011-11-16 8:25 ` Pavel Emelyanov
@ 2011-11-18 23:25 ` Matt Helsley
0 siblings, 0 replies; 20+ messages in thread
From: Matt Helsley @ 2011-11-18 23:25 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: Matt Helsley, Linux Kernel Mailing List, Cyrill Gorcunov,
Glauber Costa, Andi Kleen, Tejun Heo, Andrew Morton
On Wed, Nov 16, 2011 at 12:25:16PM +0400, Pavel Emelyanov wrote:
> On 11/16/2011 09:44 AM, Matt Helsley wrote:
> > On Tue, Nov 15, 2011 at 03:35:58PM +0400, Pavel Emelyanov wrote:
> >> While doing the checkpoint-restore in the userspace one need to determine
> >> whether various kernel objects (like mm_struct-s of file_struct-s) are shared
> >> between tasks and restore this state.
> >>
> >> The 2nd step can for now be solved by using respective CLONE_XXX flags and
> >> the unshare syscall, while there's currently no ways for solving the 1st one.
> >>
> >> One of the ways for checking whether two tasks share e.g. an mm_struct is to
> >> provide some mm_struct ID of a task to its proc file. The best from the
> >> performance point of view ID is the object address in the kernel, but showing
> >> them to the userspace is not good for performance reasons.
> >
> > (I think you meant "not good for security reasons."...)
> >
> >> The previous attempt to solve this was to generate an ID for slab/slub and then
> >> mix it up with the object index on the slab page. This attempt wasn't met
> >> warmly by slab maintainers, so here's the 2nd approach.
> >>
> >> The object address is XOR-ed with a "random" value of the same size and then
> >> shown in proc. Providing this poison is not leaked into the userspace then
> >> ID seem to be safe.
> >
> > Really? There's no way to quickly derive the random number from known
> > allocation patterns and thereby break the obfuscation scheme?
> > To start we can note that the low N bits are directly exposed in the ID
> > of anything that requires 2^N-byte alignment.
> >
> > I think it's really a question of whether the high order bits can be derived.
> >
> > And of course the random number only needs to be derived once per boot
> > before it reveals the address of everything with an ID.
>
> Tejun already proposed to split ID space and use different poisons for them.
>
> > Some wild speculation:
> >
> > I bet you could use some cpu affinity, mem policy, slab info, mmap
> > tricks, etc. to derive more low bits of the random number. You can probably
> > get even more when you consider objects that don't fit evenly in slabs.
> > Speaking of slabs, is there some way to use the fact that nearby slab objects
> > will share their high ID bits?
>
> OK, let's assume we found out that two mm_struct IDs have higher bits equal, what
> can we do next to split address bits from the poison ones?
Perhaps we can figure out where things are likely to be allocated by
looking at the booted kernel in /boot.
Do we really need to spend time discussing precisely how this ID scheme
can be attacked? I think we're better off just switching to the sha* hash
scheme or 64-bit counter instead.
It would also be good to specify that the IDs presented to userspace
are identifying strings (names) -- not identifying numbers. This way the ABI
will be forward and backward compatible if the kernel ever needs to change
the way it generates them.
Cheers,
-Matt
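A minimal userspace sketch of the 64-bit counter alternative mentioned above, with the ID rendered as an opaque string rather than a number. struct tracked_object, tracked_object_init() and format_object_id() are hypothetical names; the thread does not specify where such a counter would live or how the string would be formatted.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical object carrying its own ID, assigned at creation time. */
struct tracked_object {
	uint64_t id;
};

static _Atomic uint64_t next_object_id = 1;

/* Each new object takes the next counter value; no address is involved. */
static void tracked_object_init(struct tracked_object *obj)
{
	obj->id = atomic_fetch_add(&next_object_id, 1);
}

/*
 * Render the ID as a fixed-width hex string. Userspace is expected to
 * compare these strings for equality only, never to parse them.
 */
static int format_object_id(const struct tracked_object *obj,
			    char *buf, size_t len)
{
	return snprintf(buf, len, "%016" PRIx64, obj->id);
}
```

The cost of this approach is an extra 8 bytes in every ID-bearing object, and the counter values reveal creation order, which may or may not matter.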
* [PATCH v2 0/4] Checkpoint/Restore: Show in proc IDs of objects that can be shared between tasks
@ 2011-11-17 9:55 Pavel Emelyanov
2011-11-17 9:56 ` [PATCH 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Pavel Emelyanov
0 siblings, 1 reply; 20+ messages in thread
From: Pavel Emelyanov @ 2011-11-17 9:55 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Cyrill Gorcunov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Andrew Morton
While doing checkpoint-restore in userspace, one needs to determine
whether various kernel objects (like mm_struct-s or file_struct-s) are shared
between tasks, and then restore this state.
The 2nd step can for now be solved by using the respective CLONE_XXX flags and
the unshare syscall, while there is currently no way to solve the 1st one.
One way to check whether two tasks share e.g. an mm_struct is to
provide some mm_struct ID of a task in its proc file. From the performance
point of view the best ID is the object's address in the kernel, but showing
addresses to userspace is not good for security reasons.
Thus the object address is XOR-ed with a "random" value of the same size and
then shown in proc. Provided this poison is not leaked to userspace, the
IDs seem to be safe. The objects for which the IDs are shown are:
* all namespaces living in /proc/pid/ns/
* open files (shown in /proc/pid/fdinfo/)
* objects that can be shared with CLONE_XXX flags (except for namespaces)
Changes since
v1: * Tejun worried that the single poison value was a weak point - leaking it
makes all the IDs vulnerable. To address this, several poison values - one
per object type - are introduced. They are stored in a plain array. Tejun,
is this enough from your POV, or would you like to see them widely scattered
over the memory?
* Pekka proposed to initialize the poison values in a late_initcall callback
* ... and to move the code to mm/util.c
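For readers who want to see the encoding in isolation, here is a minimal
userspace sketch of the scheme described above: one cookie per object type,
XOR-ed with the pointer value. It is a model only, not the code from patch 1/4.
The names model_object_id, obj_id_poison and the OBJ_ID_* values are made up
for illustration, and the cookies are fixed constants so that the example is
deterministic (the cover letter says the real values are "random").

```c
#include <assert.h>
#include <stdint.h>

/* Object types; mirrors the shape of the GEN_OBJ_ID_* enum in the patches. */
enum obj_id_type {
    OBJ_ID_NS,
    OBJ_ID_FILE,
    OBJ_ID_TYPES,   /* number of types, keep last */
};

/*
 * One secret cookie per type, stored in a plain array. These constants are
 * hypothetical placeholders; a real implementation would fill the array
 * with random bytes once, early in the life of the system.
 */
static const uintptr_t obj_id_poison[OBJ_ID_TYPES] = {
    [OBJ_ID_NS]   = (uintptr_t)0x5bd1e995u,
    [OBJ_ID_FILE] = (uintptr_t)0x9e3779b9u,
};

/* Encode a pointer as an opaque ID: XOR it with the cookie of its type. */
static uintptr_t model_object_id(const void *ptr, enum obj_id_type type)
{
    assert(type < OBJ_ID_TYPES);
    return (uintptr_t)ptr ^ obj_id_poison[type];
}
```

Since XOR with a fixed value is a bijection, two IDs of the same type are equal
exactly when the pointers are equal, which is all the "sameness" test needs.
With per-type cookies, a leaked cookie only exposes addresses of that one type.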
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2011-11-17 9:55 [PATCH v2 " Pavel Emelyanov
@ 2011-11-17 9:56 ` Pavel Emelyanov
0 siblings, 0 replies; 20+ messages in thread
From: Pavel Emelyanov @ 2011-11-17 9:56 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Cyrill Gorcunov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Andrew Morton
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
fs/proc/namespaces.c | 12 ++++++++++++
include/linux/mm.h | 1 +
2 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index be177f7..48c64ab 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -27,8 +27,20 @@ static const struct proc_ns_operations *ns_entries[] = {
#endif
};
+static ssize_t proc_ns_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ char tmp[32];
+ struct proc_inode *ei = PROC_I(file->f_dentry->d_inode);
+
+ snprintf(tmp, sizeof(tmp), "id:\t%lu\n",
+ gen_object_id(ei->ns, GEN_OBJ_ID_NS));
+ return simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
+}
+
static const struct file_operations ns_file_operations = {
.llseek = no_llseek,
+ .read = proc_ns_read,
};
static struct dentry *proc_ns_instantiate(struct inode *dir,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80ea327..cd4d727 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1628,6 +1628,7 @@ extern void copy_user_huge_page(struct page *dst, struct page *src,
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
enum {
+ GEN_OBJ_ID_NS,
GEN_OBJ_ID_TYPES,
};
--
1.5.5.6
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [patch 0/4] kernel generic object IDs series
@ 2011-12-22 12:56 Cyrill Gorcunov
2011-12-22 12:56 ` [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Cyrill Gorcunov
0 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2011-12-22 12:56 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton
Hi,
when we do checkpoint-restore we need to find out whether various kernel objects
(like mm_struct-s or file_struct-s) are shared between tasks (and restore them
afterwards).
While at restore time we can use CLONE_XXX flags and the unshare syscall, there is
no way to find out which structures are shared at checkpoint time. Thus, to cut the
knot, we introduce generic-object-ids helpers which basically encode
kernel pointers into some form (at the moment it's a simple XOR operation with
a random cookie value) and provide them back to userspace. So one can test
whether two resources are shared between different tasks.
Since such information is pretty valuable, access is allowed for CAP_SYS_ADMIN
only: the XOR encoding has nothing to do with security, it is used only
to dispel the impression that the ID means something other than a random "number"
which should be used for a "sameness" test only and nothing else.
The following objects are shown at the moment
- all namespaces living in /proc/pid/ns/
- open files (shown in /proc/pid/fdinfo/)
- objects, that can be shared with CLONE_XXX flags (except for namespaces)
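As an illustration of the "sameness" test on the userspace side, the sketch
below compares two lines of the form "id:<whitespace><value>" as opaque
strings, without ever converting the value to an integer. The helper names
parse_obj_id and obj_ids_equal are hypothetical, and the 64-byte buffer is an
arbitrary assumption about the maximum ID length.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the value that follows "id:" in a line such as "id:\t12345\n" into
 * out (at most outlen - 1 characters). Returns 0 on success, -1 if the
 * line has no "id:" prefix, the value is empty, or it does not fit.
 */
static int parse_obj_id(const char *line, char *out, size_t outlen)
{
    static const char prefix[] = "id:";
    size_t n;

    assert(line && out && outlen > 0);
    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    line += sizeof(prefix) - 1;
    line += strspn(line, " \t");    /* skip the separator whitespace */
    n = strcspn(line, "\r\n");      /* the value runs to the end of line */
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, line, n);
    out[n] = '\0';
    return 0;
}

/* 1 if both lines carry the same ID, 0 if they differ, -1 on parse error. */
static int obj_ids_equal(const char *line_a, const char *line_b)
{
    char a[64], b[64];

    if (parse_obj_id(line_a, a, sizeof(a)) != 0 ||
        parse_obj_id(line_b, b, sizeof(b)) != 0)
        return -1;
    return strcmp(a, b) == 0;
}
```

Treating the value as a string keeps such a tool working even if the ID later
stops being a plain decimal number.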
Any kind of comments and especially complaints (!) are very much appreciated!
Cyrill
^ permalink raw reply [flat|nested] 20+ messages in thread
* [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2011-12-22 12:56 [patch 0/4] kernel generic object IDs series Cyrill Gorcunov
@ 2011-12-22 12:56 ` Cyrill Gorcunov
0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2011-12-22 12:56 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton, Cyrill Gorcunov
[-- Attachment #1: 2-objids-namespaces --]
[-- Type: text/plain, Size: 3835 bytes --]
This patch adds proc_ns_read method which provides
IDs for /proc/pid/ns/* files.
Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Glauber Costa <glommer@parallels.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/filesystems/proc.txt | 24 ++++++++++++++++++++++++
fs/proc/namespaces.c | 21 +++++++++++++++++++++
include/linux/mm.h | 1 +
3 files changed, 46 insertions(+)
Index: linux-2.6.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.git.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.git/Documentation/filesystems/proc.txt
@@ -40,6 +40,7 @@ Table of Contents
3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
3.5 /proc/<pid>/mountinfo - Information about mounts
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
+ 3.7 /proc/<pid>/ns - Information about namespaces
------------------------------------------------------------------------------
@@ -1545,3 +1546,26 @@ a task to set its own or one of its thre
is limited in size compared to the cmdline value, so writing anything longer
then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
comm value.
+
+3.7 /proc/<pid>/ns - Information about namespaces
+-----------------------------------------------------
+
+This directory consists of the following files "net", "uts", "ipc",
+and depend if appropriate CONFIG_ entry is set, i.e. it's possible
+to have only one, two or all three files here.
+
+Currently the file contents provide a so-called "object id" number, which
+is useful for one purpose only -- to test whether two different
+<pid> share the namespace.
+
+A typical format is
+
+id: 445332486300860161
+
+i.e. "id" followed by a number. One should never assume the number
+means something, it is only useful for "sameness" test with another number
+obtained from another <pid>.
+
+Moreover, a safe approach is to remember it as a string, since format may
+change in future and id would be not a long integer value, but something
+else, say SHA1/2 or even uuid encoded stream.
Index: linux-2.6.git/fs/proc/namespaces.c
===================================================================
--- linux-2.6.git.orig/fs/proc/namespaces.c
+++ linux-2.6.git/fs/proc/namespaces.c
@@ -27,10 +27,31 @@ static const struct proc_ns_operations *
#endif
};
+#ifdef CONFIG_GENERIC_OBJECT_IDS
+static ssize_t proc_ns_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct proc_inode *ei = PROC_I(file->f_dentry->d_inode);
+ char tmp[32];
+
+ snprintf(tmp, sizeof(tmp), "id:\t%lu\n",
+ gen_obj_id(ei->ns, GEN_OBJ_ID_NS));
+ return simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
+}
+
+static const struct file_operations ns_file_operations = {
+ .llseek = no_llseek,
+ .read = proc_ns_read,
+};
+
+#else
+
static const struct file_operations ns_file_operations = {
.llseek = no_llseek,
};
+#endif /* CONFIG_GENERIC_OBJECT_IDS */
+
static struct dentry *proc_ns_instantiate(struct inode *dir,
struct dentry *dentry, struct task_struct *task, const void *ptr)
{
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -1641,6 +1641,7 @@ extern void copy_user_huge_page(struct p
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
enum {
+ GEN_OBJ_ID_NS,
GEN_OBJ_ID_TYPES,
};
^ permalink raw reply [flat|nested] 20+ messages in thread
* [patch 0/4] generic object ids, v2
@ 2011-12-23 12:47 Cyrill Gorcunov
2011-12-23 12:47 ` [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Cyrill Gorcunov
0 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2011-12-23 12:47 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton, Alexey Dobriyan
I've been strongly advised not to put things into mm.h,
so what about this series, which introduces gen_obj_id.c/h?
Does it look better?
Cyrill
^ permalink raw reply [flat|nested] 20+ messages in thread
* [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2011-12-23 12:47 [patch 0/4] generic object ids, v2 Cyrill Gorcunov
@ 2011-12-23 12:47 ` Cyrill Gorcunov
2012-01-04 6:02 ` Eric W. Biederman
0 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2011-12-23 12:47 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton, Alexey Dobriyan, Cyrill Gorcunov
[-- Attachment #1: 2-objids-namespaces --]
[-- Type: text/plain, Size: 3988 bytes --]
This patch adds proc_ns_read method which provides
IDs for /proc/pid/ns/* files.
Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Glauber Costa <glommer@parallels.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
---
Documentation/filesystems/proc.txt | 24 ++++++++++++++++++++++++
fs/proc/namespaces.c | 22 ++++++++++++++++++++++
include/linux/gen_obj_id.h | 1 +
3 files changed, 47 insertions(+)
Index: linux-2.6.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.git.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.git/Documentation/filesystems/proc.txt
@@ -40,6 +40,7 @@ Table of Contents
3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
3.5 /proc/<pid>/mountinfo - Information about mounts
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
+ 3.7 /proc/<pid>/ns - Information about namespaces
------------------------------------------------------------------------------
@@ -1545,3 +1546,26 @@ a task to set its own or one of its thre
is limited in size compared to the cmdline value, so writing anything longer
then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
comm value.
+
+3.7 /proc/<pid>/ns - Information about namespaces
+-----------------------------------------------------
+
+This directory consists of the following files "net", "uts", "ipc",
+and depend if appropriate CONFIG_ entry is set, i.e. it's possible
+to have only one, two or all three files here.
+
+Currently the file contents provide a so-called "object id" number, which
+is useful for one purpose only -- to test whether two different
+<pid> share the namespace.
+
+A typical format is
+
+id: 445332486300860161
+
+i.e. "id" followed by a number. One should never assume the number
+means something, it is only useful for "sameness" test with another number
+obtained from another <pid>.
+
+Moreover, a safe approach is to remember it as a string, since format may
+change in future and id would be not a long integer value, but something
+else, say SHA1/2 or even uuid encoded stream.
Index: linux-2.6.git/fs/proc/namespaces.c
===================================================================
--- linux-2.6.git.orig/fs/proc/namespaces.c
+++ linux-2.6.git/fs/proc/namespaces.c
@@ -12,6 +12,7 @@
#include <linux/mnt_namespace.h>
#include <linux/ipc_namespace.h>
#include <linux/pid_namespace.h>
+#include <linux/gen_obj_id.h>
#include "internal.h"
@@ -27,10 +28,31 @@ static const struct proc_ns_operations *
#endif
};
+#ifdef CONFIG_GENERIC_OBJECT_ID
+static ssize_t proc_ns_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct proc_inode *ei = PROC_I(file->f_dentry->d_inode);
+ char tmp[32];
+
+ snprintf(tmp, sizeof(tmp), "id:\t%lu\n",
+ gen_obj_id(ei->ns, GEN_OBJ_ID_NS));
+ return simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
+}
+
+static const struct file_operations ns_file_operations = {
+ .llseek = no_llseek,
+ .read = proc_ns_read,
+};
+
+#else
+
static const struct file_operations ns_file_operations = {
.llseek = no_llseek,
};
+#endif /* CONFIG_GENERIC_OBJECT_ID */
+
static struct dentry *proc_ns_instantiate(struct inode *dir,
struct dentry *dentry, struct task_struct *task, const void *ptr)
{
Index: linux-2.6.git/include/linux/gen_obj_id.h
===================================================================
--- linux-2.6.git.orig/include/linux/gen_obj_id.h
+++ linux-2.6.git/include/linux/gen_obj_id.h
@@ -4,6 +4,7 @@
#ifdef __KERNEL__
enum {
+ GEN_OBJ_ID_NS,
GEN_OBJ_ID_TYPES,
};
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2011-12-23 12:47 ` [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Cyrill Gorcunov
@ 2012-01-04 6:02 ` Eric W. Biederman
2012-01-04 11:26 ` Cyrill Gorcunov
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2012-01-04 6:02 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton, Alexey Dobriyan
Cyrill Gorcunov <gorcunov@openvz.org> writes:
> This patch adds proc_ns_read method which provides
> IDs for /proc/pid/ns/* files.
Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
This is a poorly thought out user interface. If we are going to return
this kind of information, and I believe we should, we should
return the id in the inode field with stat.
Comparing device+inode for equality is the traditional way to see if
two objects are the same in unix and there is no reason to make up
a new interface to get this functionality.
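For reference, the traditional device+inode comparison looks roughly like the
sketch below. It uses only stat(2); the helper names stat_same_object and
paths_same_object are made up for this example. Whether the inode number of a
/proc/<pid>/ns/* file actually identifies the namespace is the proposal made
here, not something this sketch can confirm.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* 1 if both stat results name the same object (same device and inode). */
static int stat_same_object(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* 1 if same, 0 if different, -1 if either path cannot be stat()ed. */
static int paths_same_object(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
        return -1;
    return stat_same_object(&a, &b);
}
```

Under that proposal a checkpoint tool could call something like
paths_same_object("/proc/<pid1>/ns/net", "/proc/<pid2>/ns/net") with the two
pids filled in, and no new file format would be needed.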
Furthermore we should always return the information, as it is valuable
even outside of the checkpoint/restart context.
I am also concerned that you appear to be building an interface
for use by checkpoint/restart that makes it impossible to
checkpoint/restart the programs using that interface. The reason
is that you appear to be putting this nebulous id into a global
namespace and as such even if we wanted to I don't see how we could
build a version where we could restore the id during a restart. And
the thing is if you start building interfaces with identifiers you can
not possibly restore I expect you will find you have painted yourself
into a corner.
Using inode from stat avoids painting yourself into a corner because
you have the possibility of different mounts with different device
numbers having different inode numbers.
For the short term I don't see value in being able to restore the
object identifiers, but I do see a lot of value in allowing for a future
where nested checkpoint/restart is an option.
Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2012-01-04 6:02 ` Eric W. Biederman
@ 2012-01-04 11:26 ` Cyrill Gorcunov
2012-01-04 17:56 ` Eric W. Biederman
0 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2012-01-04 11:26 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton, Alexey Dobriyan
On Tue, Jan 03, 2012 at 10:02:32PM -0800, Eric W. Biederman wrote:
> Cyrill Gorcunov <gorcunov@openvz.org> writes:
>
> > This patch adds proc_ns_read method which provides
> > IDs for /proc/pid/ns/* files.
>
> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> This is a poorly thought out user interface. If we are going to return
> this kind of information, and I believe we should, we should
> return the id in the inode field with stat.
>
> Comparing device+inode for equality is the traditional way to see if
> two objects are the same in unix and there is no reason to make up
> a new interface to get this functionality.
>
> Furthermore we should always return the information, as it is valuable
> even outside of the checkpoint/restart context.
>
> I am also concerned that you appear to be building an interface
> for use by checkpoint/restart that makes it impossible to
> checkpoint/restart the programs using that interface. The reason
> is that you appear to be putting this nebulous id into a global
> namespace and as such even if we wanted to I don't see how we could
> build a version where we could restore the id during a restart. And
> the thing is if you start building interfaces with identifiers you can
> not possibly restore I expect you will find you have painted yourself
> into a corner.
>
> Using inode from stat avoids painting yourself into a corner because
> you have the possibility of different mounts with different device
> numbers having different inode numbers.
>
> For the short term I don't see value in being able to restore the
> object identifiers, but I do see a lot of value in allowing for a future
> where nested checkpoint/restart is an option.
>
> Eric
>
Hi Eric, thanks a lot for the comments! I must admit I never thought about
nested checkpoint/restore simply because even plain and direct CR still
has a number of problems which are not yet addressed.
As to returning such an ID in the ino field (if I understand you right -- you
propose to return such an ID as the inode of the kstat structure) -- I don't think
it would be right either. Instead of one interface applied to all objects
we export, there will be a few different approaches -- for net-ns
it would be dev+ino, for tasks and other members of the task structure
it'll be IDs from /proc (as implemented in the other patches). I like
Kyle's idea of an object_id() call more, which would simply return the
encrypted ID to user-space, and it'll be up to user-space to do anything
it wants with such pieces of information.
Yes, there will be no way to restore such IDs later but the interface
is not supposed to work this way. All this mess exists only because there is
no way to figure out which task resources are shared and which are not.
Maybe if we could carry the CLONE_ flags from copy_process()/unshare()/setns()
(and what else modifies task resources?) inside task_struct and provide
these flags back to user-space, we might not need the ID helpers at all.
But I think such an approach might end up as a pretty big patch bloating
the kernel. In turn I wanted to bring in as little new functionality as
possible *with* a way to completely turn it off if the user doesn't need it.
Cyrill
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2012-01-04 11:26 ` Cyrill Gorcunov
@ 2012-01-04 17:56 ` Eric W. Biederman
2012-01-04 18:19 ` Cyrill Gorcunov
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2012-01-04 17:56 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton, Alexey Dobriyan
Cyrill Gorcunov <gorcunov@gmail.com> writes:
> On Tue, Jan 03, 2012 at 10:02:32PM -0800, Eric W. Biederman wrote:
>> Cyrill Gorcunov <gorcunov@openvz.org> writes:
>>
>> > This patch adds proc_ns_read method which provides
>> > IDs for /proc/pid/ns/* files.
>>
>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>> This is a poorly thought out user interface. If we are going to return
>> this kind of information, and I believe we should, we should
>> return the id in the inode field with stat.
>>
>> Comparing device+inode for equality is the traditional way to see if
>> two objects are the same in unix and there is no reason to make up
>> a new interface to get this functionality.
>>
>> Furthermore we should always return the information, as it is valuable
>> even outside of the checkpoint/restart context.
>>
>> I am also concerned that you appear to be building an interface
>> for use by checkpoint/restart that makes it impossible to
>> checkpoint/restart the programs using that interface. The reason
>> is that you appear to be putting this nebulous id into a global
>> namespace and as such even if we wanted to I don't see how we could
>> build a version where we could restore the id during a restart. And
>> the thing is if you start building interfaces with identifiers you can
>> not possibly restore I expect you will find you have painted yourself
>> into a corner.
>>
>> Using inode from stat avoids painting yourself into a corner because
>> you have the possibility of different mounts with different device
>> numbers having different inode numbers.
>>
>> For the short term I don't see value in being able to restore the
>> object identifiers, but I do see a lot of value in allowing for a future
>> where nested checkpoint/restart is an option.
>>
>> Eric
>>
>
> Hi Eric, thanks a lot for the comments! I must admit I never thought about
> nested checkpoint/restore simply because even plain and direct CR still
> has a number of problems which are not yet addressed.
>
> As to returning such an ID in the ino field (if I understand you right -- you
> propose to return such an ID as the inode of the kstat structure) -- I don't think
> it would be right either. Instead of one interface applied to all objects
> we export, there will be a few different approaches -- for net-ns
> it would be dev+ino, for tasks and other members of the task structure
> it'll be IDs from /proc (as implemented in the other patches). I like
> Kyle's idea of an object_id() call more, which would simply return the
> encrypted ID to user-space, and it'll be up to user-space to do anything
> it wants with such pieces of information.
Right now everything that is exported is dev+ino. My objection
is that you are adding yet another interface to get that information.
I already have patches that implement dev+ino for the namespaces,
so I fully expect that to happen independently of your patches. My
priority is to get the rest of the namespaces exported, which requires
a bit more review.
> Yes, there will be no way to restore such IDs later but the interface
> is not supposed to work this way.
It sounds like it won't be possible to retrofit the ability to restore
the IDs later. If the path to what will be needed to support nested
checkpoint/restore is not clear the user space interface is broken
by design. And since it is broken by design I say the design needs
to bake more before we think of baking it.
> All this mess exists only because there is
> no way to figure out which task resources are shared and which are not.
> Maybe if we could carry the CLONE_ flags from copy_process()/unshare()/setns()
> (and what else modifies task resources?) inside task_struct and provide
> these flags back to user-space, we might not need the ID helpers at all.
> But I think such an approach might end up as a pretty big patch bloating
> the kernel. In turn I wanted to bring in as little new functionality as
> possible *with* a way to completely turn it off if the user doesn't need it.
The tricky case is file descriptors and file descriptors can be passed
over unix domain sockets in arbitrary ways.
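To make the point about descriptor passing concrete, the sketch below sends
the read end of a pipe through a socketpair with SCM_RIGHTS and checks, via
fstat(2), that the received descriptor is a different number referring to the
same object. Everything happens inside one process to keep the example short;
in practice the two ends would usually live in unrelated tasks, which is why
CLONE_ flags alone cannot describe this kind of sharing. The helper names
send_fd, recv_fd and passed_fd_is_same_object are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer for exactly one descriptor, aligned for struct cmsghdr. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the connected AF_UNIX socket sock. 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                  /* one byte of ordinary data travels too */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/*
 * Pass a pipe's read end through a socketpair. Returns 1 if the received
 * descriptor has a different number but the same st_dev and st_ino,
 * 0 if not, -1 on error.
 */
static int passed_fd_is_same_object(void)
{
    int sv[2], p[2], got = -1, ret = -1;
    struct stat a, b;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (pipe(p) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (send_fd(sv[0], p[0]) == 0 && (got = recv_fd(sv[1])) >= 0) {
        if (fstat(p[0], &a) == 0 && fstat(got, &b) == 0)
            ret = got != p[0] && a.st_dev == b.st_dev && a.st_ino == b.st_ino;
        close(got);
    }
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Note that the fstat() check only shows that both descriptors reach the same
underlying object; it does not by itself tell a checkpoint tool whether they
share a single open file, which is the harder question in this thread.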
If you can find a way to do this without id helpers that sounds like
a good design.
I have a nasty feeling that by trying to do this piecemeal instead of
in one big system call you are slowly painting yourself into a corner
from which you can not get out.
Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2012-01-04 17:56 ` Eric W. Biederman
@ 2012-01-04 18:19 ` Cyrill Gorcunov
0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2012-01-04 18:19 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton, Alexey Dobriyan
On Wed, Jan 04, 2012 at 09:56:24AM -0800, Eric W. Biederman wrote:
...
> >
> > Hi Eric, thanks a lot for the comments! I must admit I never thought about
> > nested checkpoint/restore simply because even plain and direct CR still
> > has a number of problems which are not yet addressed.
> >
> > As to returning such an ID in the ino field (if I understand you right -- you
> > propose to return such an ID as the inode of the kstat structure) -- I don't think
> > it would be right either. Instead of one interface applied to all objects
> > we export, there will be a few different approaches -- for net-ns
> > it would be dev+ino, for tasks and other members of the task structure
> > it'll be IDs from /proc (as implemented in the other patches). I like
> > Kyle's idea of an object_id() call more, which would simply return the
> > encrypted ID to user-space, and it'll be up to user-space to do anything
> > it wants with such pieces of information.
>
> Right now everything that is exported is dev+ino. My objection
> is that you are adding yet another interface to get that information.
>
> I already have patches that implement dev+ino for the namespaces,
> so I fully expect that to happen independently of your patches. My
> priority is to get the rest of the namespaces exported, which requires
> a bit more review.
>
Ah, good to know. Could you please point me to where I can get them, so I can
try at least the dev+ino part out?
> > Yes, there will be no way to restore such IDs later but the interface
> > is not supposed to work this way.
>
> It sounds like it won't be possible to retrofit the ability to restore
> the IDs later. If the path to what will be needed to support nested
> checkpoint/restore is not clear the user space interface is broken
> by design. And since it is broken by design I say the design needs
> to bake more before we think of baking it.
>
I'm not against changing/improving the design at all. If there are some other
ways to retrieve this kind of information, I'll gladly drop the patches
piece by piece.
> > All this mess exists only because there is
> > no way to figure out which task resources are shared and which are not.
> > Maybe if we could carry the CLONE_ flags from copy_process()/unshare()/setns()
> > (and what else modifies task resources?) inside task_struct and provide
> > these flags back to user-space, we might not need the ID helpers at all.
> > But I think such an approach might end up as a pretty big patch bloating
> > the kernel. In turn I wanted to bring in as little new functionality as
> > possible *with* a way to completely turn it off if the user doesn't need it.
>
> The tricky case is file descriptors and file descriptors can be passed
> over unix domain sockets in arbitrary ways.
>
Not really, what about other members of the task structure, such as mm, files
and others? If I export these bits I have to export them in a safe
way which would not reveal too much of the kernel internals.
> If you can find a way to do this without id helpers that sounds like
> a good design.
>
Yes, I'm trying to find some other way, but without much luck at the moment.
Once I have something to show, I will of course send it to lkml immediately.
> I have a nasty feeling that by trying to do this piecemeal instead of
> in one big system call you are slowly painting yourself into a corner
> from which you can not get out.
>
Yes again, that was the reason the patches flew to LKML -- just
to obtain as many comments as possible and find some sane way.
Cyrill
^ permalink raw reply [flat|nested] 20+ messages in thread