* [patch 1/4] Add routine for generating an ID for kernel pointer
2011-12-22 12:56 [patch 0/4] kernel generic object IDs series Cyrill Gorcunov
@ 2011-12-22 12:56 ` Cyrill Gorcunov
2011-12-28 16:51 ` Alan Cox
2011-12-22 12:56 ` [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Cyrill Gorcunov
` (3 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-22 12:56 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton, Cyrill Gorcunov
[-- Attachment #1: 1-introduce-gen_obj_id --]
[-- Type: text/plain, Size: 4078 bytes --]
The routine XORs the given pointer with a random value
producing an ID (32 or 64 bit, depending on the arch).
Since it's a valuable information -- only CAP_SYS_ADMIN
is allowed to obtain it.
Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Glauber Costa <glommer@parallels.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mm.h | 10 ++++++++++
mm/Kconfig | 16 ++++++++++++++++
mm/util.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 75 insertions(+)
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -1640,5 +1640,15 @@ extern void copy_user_huge_page(struct p
unsigned int pages_per_huge_page);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
+enum {
+ GEN_OBJ_ID_TYPES,
+};
+
+#ifdef CONFIG_GENERIC_OBJECT_IDS
+unsigned long gen_obj_id(void *ptr, int type);
+#else
+static inline unsigned long gen_obj_id(void *ptr, int type) { return 0; }
+#endif
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
Index: linux-2.6.git/mm/Kconfig
===================================================================
--- linux-2.6.git.orig/mm/Kconfig
+++ linux-2.6.git/mm/Kconfig
@@ -373,3 +373,19 @@ config CLEANCACHE
in a negligible performance hit.
If unsure, say Y to enable cleancache
+
+config GENERIC_OBJECT_IDS
+ bool "Enable generic object ids infrastructure"
+ depends on CHECKPOINT_RESTORE
+ default n
+ help
+ Turn on the functionality that can generate IDs for kernel
+ objects, which are exported to userspace via /proc filesystem.
+
+ It is useful if you need to examinate kernel objects and test
+ if they are shared between several tasks. These IDs should never
+ be used for anything but the "sameness" test. Besides, the IDs are
+ dynamic and valid only while object is alive, once it get freed or
+ kernel is rebooted -- the IDs will be changed.
+
+ If unsure, say N here.
Index: linux-2.6.git/mm/util.c
===================================================================
--- linux-2.6.git.orig/mm/util.c
+++ linux-2.6.git/mm/util.c
@@ -4,6 +4,8 @@
#include <linux/export.h>
#include <linux/err.h>
#include <linux/sched.h>
+#include <linux/random.h>
+#include <linux/capability.h>
#include <asm/uaccess.h>
#include "internal.h"
@@ -307,3 +309,50 @@ EXPORT_TRACEPOINT_SYMBOL(kmalloc_node);
EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc_node);
EXPORT_TRACEPOINT_SYMBOL(kfree);
EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
+
+#ifdef CONFIG_GENERIC_OBJECT_IDS
+static unsigned long gen_obj_cookie[GEN_OBJ_ID_TYPES] __read_mostly;
+
+unsigned long gen_obj_id(void *ptr, int type)
+{
+ if (!capable(CAP_SYS_ADMIN) || !ptr)
+ return 0;
+
+ BUG_ON(type >= GEN_OBJ_ID_TYPES);
+
+ /*
+ * Note the simple XOR is used here not in a sake
+ * of security by any means, but rather to break
+ * an "impression" that such IDs means something
+ * other than a number which can be used for comparison
+ * with another number generated by this helper only.
+ */
+ return ((unsigned long)ptr) ^ gen_obj_cookie[type];
+}
+
+static __init int gen_obj_cookie_init(void)
+{
+#if BITS_PER_LONG == 64
+ const unsigned long emergency_cookie = 0xefcdab8967452301;
+#else
+ const unsigned long emergency_cookie = 0x98badcf9;
+#endif
+ int i;
+
+ for (i = 0; i < GEN_OBJ_ID_TYPES; i++) {
+ get_random_bytes(&gen_obj_cookie[i],
+ sizeof(unsigned long));
+ /*
+ * In 'impossible' case of random-bytes = 0
+ * we still would have non-zero value.
+ */
+ gen_obj_cookie[i] =
+ (gen_obj_cookie[i] & __PAGE_OFFSET) +
+ (emergency_cookie & ~__PAGE_OFFSET);
+ }
+
+ return 0;
+}
+
+late_initcall(gen_obj_cookie_init);
+#endif
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [patch 1/4] Add routine for generating an ID for kernel pointer
2011-12-22 12:56 ` [patch 1/4] Add routine for generating an ID for kernel pointer Cyrill Gorcunov
@ 2011-12-28 16:51 ` Alan Cox
2011-12-28 17:05 ` Cyrill Gorcunov
0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2011-12-28 16:51 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton
> + be used for anything but the "sameness" test. Besides, the IDs are
> + dynamic and valid only while object is alive, once it get freed or
Please explain your locking model which ensures a false sameness cannot
occur due to object reuse as you do the comparison and scan.
Alan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 1/4] Add routine for generating an ID for kernel pointer
2011-12-28 16:51 ` Alan Cox
@ 2011-12-28 17:05 ` Cyrill Gorcunov
2011-12-28 17:21 ` Alan Cox
0 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-28 17:05 UTC (permalink / raw)
To: Alan Cox
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton
On Wed, Dec 28, 2011 at 04:51:14PM +0000, Alan Cox wrote:
> > + be used for anything but the "sameness" test. Besides, the IDs are
> > + dynamic and valid only while object is alive, once it get freed or
>
> Please explain your locking model which ensures a false sameness cannot
> occur due to object reuse as you do the comparison and scan.
>
It can happen in case of object re-allocated from slab. But in case
of two living pids it's impossible to get same pointers for different
objects. Or I misunderstood the question, Alan? It's up to application
to not compare objects from dead tasks.
Cyrill
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 1/4] Add routine for generating an ID for kernel pointer
2011-12-28 17:05 ` Cyrill Gorcunov
@ 2011-12-28 17:21 ` Alan Cox
2011-12-28 17:35 ` Cyrill Gorcunov
0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2011-12-28 17:21 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton
> It can happen in case of object re-allocated from slab. But in case
> of two living pids it's impossible to get same pointers for different
> objects. Or I misunderstood the question, Alan? It's up to application
> to not compare objects from dead tasks.
How will it know the task has not died and been reallocated, or its
resources not been freed and reallocated during the comparison ?
Your comparison appears to have a zero time validity - you can ask "is A
the same as B" but by the time you get an answer your answer may no
longer be true. It also externalises a current implementation
detail in a very ugly way.
Would it also not be better to do the job right and simply have an
interface to ask "who shares with A" or even something a bit more high
level, it seems you are creating something nastier by trying to push all
this in userspace than if you did the job or part of it kernel side where
you had access to the right locking infrastructure and where the public
API doesn't need to expose innards of the kernel ?
Alan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 1/4] Add routine for generating an ID for kernel pointer
2011-12-28 17:21 ` Alan Cox
@ 2011-12-28 17:35 ` Cyrill Gorcunov
2011-12-28 19:48 ` Cyrill Gorcunov
0 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-28 17:35 UTC (permalink / raw)
To: Alan Cox
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton
On Wed, Dec 28, 2011 at 05:21:51PM +0000, Alan Cox wrote:
> > It can happen in case of object re-allocated from slab. But in case
> > of two living pids it's impossible to get same pointers for different
> > objects. Or I misunderstood the question, Alan? It's up to application
> > to not compare objects from dead tasks.
>
> How will it know the task has not died and been reallocated, or its
> resources not been freed and reallocated during the comparison ?
>
> Your comparison appears to have a zero time validity - you can ask "is A
> the same as B" but by the time you get an answer your answer may no
> longer be true. It also externalises a current implementation
> detail in a very ugly way.
>
It's not differ from reading other data from /proc. How make you be sure
the PPid read from a task status is the same once you've finished reading
it? The same applies to say ps output, it's valid for almost zero time,
then one need to restart ps again to get new process tree snapshot. The
same here -- if I need a precise results I have to either stop tasks or
froze them and compare IDs. That's what I've had in mind.
> Would it also not be better to do the job right and simply have an
> interface to ask "who shares with A" or even something a bit more high
> level, it seems you are creating something nastier by trying to push all
> this in userspace than if you did the job or part of it kernel side where
> you had access to the right locking infrastructure and where the public
> API doesn't need to expose innards of the kernel ?
>
Need to think, thanks a lot for idea, Alan!
Cyrill
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 1/4] Add routine for generating an ID for kernel pointer
2011-12-28 17:35 ` Cyrill Gorcunov
@ 2011-12-28 19:48 ` Cyrill Gorcunov
0 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-28 19:48 UTC (permalink / raw)
To: Alan Cox
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton
On Wed, Dec 28, 2011 at 09:35:02PM +0400, Cyrill Gorcunov wrote:
> On Wed, Dec 28, 2011 at 05:21:51PM +0000, Alan Cox wrote:
> > > It can happen in case of object re-allocated from slab. But in case
> > > of two living pids it's impossible to get same pointers for different
> > > objects. Or I misunderstood the question, Alan? It's up to application
> > > to not compare objects from dead tasks.
> >
> > How will it know the task has not died and been reallocated, or its
> > resources not been freed and reallocated during the comparison ?
> >
> > Your comparison appears to have a zero time validity - you can ask "is A
> > the same as B" but by the time you get an answer your answer may no
> > longer be true. It also externalises a current implementation
> > detail in a very ugly way.
> >
>
> It's not differ from reading other data from /proc. How make you be sure
> the PPid read from a task status is the same once you've finished reading
> it? The same applies to say ps output, it's valid for almost zero time,
> then one need to restart ps again to get new process tree snapshot. The
> same here -- if I need a precise results I have to either stop tasks or
> froze them and compare IDs. That's what I've had in mind.
>
> > Would it also not be better to do the job right and simply have an
> > interface to ask "who shares with A" or even something a bit more high
> > level, it seems you are creating something nastier by trying to push all
> > this in userspace than if you did the job or part of it kernel side where
> > you had access to the right locking infrastructure and where the public
> > API doesn't need to expose innards of the kernel ?
> >
>
> Need to think, thanks a lot for idea, Alan!
>
You know, Alan, I thought about different ways... but eventually I think
having _this_ interface is better than anything else (of course along with
addressing problems Tejun mentioned). "Who shares with A" doesn't make
situation better, once I obtain the result it's not valid anymore and
I have to either do the same requests again and again, or stop/freeze
tasks before asking which resources they do share. The interface provided
now gives the *minimun* information user-space needed to draw the shared
resources affinity scheme, and I would like to escape putting more work
on kernel side if possible. Sounds reasonable?
Cyrill
^ permalink raw reply [flat|nested] 18+ messages in thread
* [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2011-12-22 12:56 [patch 0/4] kernel generic object IDs series Cyrill Gorcunov
2011-12-22 12:56 ` [patch 1/4] Add routine for generating an ID for kernel pointer Cyrill Gorcunov
@ 2011-12-22 12:56 ` Cyrill Gorcunov
2011-12-22 12:56 ` [patch 3/4] proc: Show open file ID in /proc/pid/fdinfo/* Cyrill Gorcunov
` (2 subsequent siblings)
4 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-22 12:56 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton, Cyrill Gorcunov
[-- Attachment #1: 2-objids-namespaces --]
[-- Type: text/plain, Size: 3835 bytes --]
This patch adds proc_ns_read method which provides
IDs for /proc/pid/ns/* files.
Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Glauber Costa <glommer@parallels.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/filesystems/proc.txt | 24 ++++++++++++++++++++++++
fs/proc/namespaces.c | 21 +++++++++++++++++++++
include/linux/mm.h | 1 +
3 files changed, 46 insertions(+)
Index: linux-2.6.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.git.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.git/Documentation/filesystems/proc.txt
@@ -40,6 +40,7 @@ Table of Contents
3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
3.5 /proc/<pid>/mountinfo - Information about mounts
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
+ 3.7 /proc/<pid>/ns - Information about namespaces
------------------------------------------------------------------------------
@@ -1545,3 +1546,26 @@ a task to set its own or one of its thre
is limited in size compared to the cmdline value, so writing anything longer
then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
comm value.
+
+3.7 /proc/<pid>/ns - Information about namespaces
+-----------------------------------------------------
+
+This directory consists of the following files "net", "uts", "ipc",
+and depend if appropriate CONFIG_ entry is set, i.e. it's possible
+to have only one, two or all three files here.
+
+Currently file contents provides that named "object id" number, which
+is a number useful for the one purpose only -- to test if two differen
+<pid> share the namespace.
+
+A typical format is
+
+id: 445332486300860161
+
+i.e. "id" followed by a number. One should never assume the number
+means something, it is only useful for "sameness" test with another number
+obtained from another <pid>.
+
+Moreover, a safe approach is to remember it as a string, since format may
+change in future and id would be not a long integer value, but something
+else, say SHA1/2 or even uuid encoded stream.
Index: linux-2.6.git/fs/proc/namespaces.c
===================================================================
--- linux-2.6.git.orig/fs/proc/namespaces.c
+++ linux-2.6.git/fs/proc/namespaces.c
@@ -27,10 +27,31 @@ static const struct proc_ns_operations *
#endif
};
+#ifdef CONFIG_GENERIC_OBJECT_IDS
+static ssize_t proc_ns_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct proc_inode *ei = PROC_I(file->f_dentry->d_inode);
+ char tmp[32];
+
+ snprintf(tmp, sizeof(tmp), "id:\t%lu\n",
+ gen_obj_id(ei->ns, GEN_OBJ_ID_NS));
+ return simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
+}
+
+static const struct file_operations ns_file_operations = {
+ .llseek = no_llseek,
+ .read = proc_ns_read,
+};
+
+#else
+
static const struct file_operations ns_file_operations = {
.llseek = no_llseek,
};
+#endif /* CONFIG_GENERIC_OBJECT_IDS */
+
static struct dentry *proc_ns_instantiate(struct inode *dir,
struct dentry *dentry, struct task_struct *task, const void *ptr)
{
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -1641,6 +1641,7 @@ extern void copy_user_huge_page(struct p
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
enum {
+ GEN_OBJ_ID_NS,
GEN_OBJ_ID_TYPES,
};
^ permalink raw reply [flat|nested] 18+ messages in thread* [patch 3/4] proc: Show open file ID in /proc/pid/fdinfo/*
2011-12-22 12:56 [patch 0/4] kernel generic object IDs series Cyrill Gorcunov
2011-12-22 12:56 ` [patch 1/4] Add routine for generating an ID for kernel pointer Cyrill Gorcunov
2011-12-22 12:56 ` [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Cyrill Gorcunov
@ 2011-12-22 12:56 ` Cyrill Gorcunov
2011-12-22 13:24 ` [patch 0/4] kernel generic object IDs series Cyrill Gorcunov
2011-12-27 11:15 ` Cyrill Gorcunov
4 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-22 12:56 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton, Cyrill Gorcunov
[-- Attachment #1: 3-objid-fdinfo-2 --]
[-- Type: text/plain, Size: 2360 bytes --]
This patch adds "id" field in /proc/pid/fdinfo/<file>
output which represent generic ID for 'struct file'
being used by a task.
Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Glauber Costa <glommer@parallels.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
fs/proc/base.c | 34 +++++++++++++++++++++++++++++-----
include/linux/mm.h | 1 +
2 files changed, 30 insertions(+), 5 deletions(-)
Index: linux-2.6.git/fs/proc/base.c
===================================================================
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -1888,8 +1888,36 @@ out:
return ~0U;
}
+#ifdef CONFIG_GENERIC_OBJECT_IDS
+
+#define PROC_FDINFO_MAX 128
+
+static void proc_fd_info_fill(char *info, struct file *file, unsigned int f_flags)
+{
+ snprintf(info, PROC_FDINFO_MAX,
+ "pos:\t%lli\n"
+ "flags:\t0%o\n"
+ "id:\t%lu\n",
+ (long long)file->f_pos,
+ f_flags,
+ gen_obj_id(file, GEN_OBJ_ID_FILE));
+}
+
+#else /* CONFIG_GENERIC_OBJECT_IDS */
+
#define PROC_FDINFO_MAX 64
+static void proc_fd_info_fill(char *info, struct file *file, unsigned int f_flags)
+{
+ snprintf(info, PROC_FDINFO_MAX,
+ "pos:\t%lli\n"
+ "flags:\t0%o\n",
+ (long long)file->f_pos,
+ f_flags);
+}
+
+#endif
+
static int proc_fd_info(struct inode *inode, struct path *path, char *info)
{
struct task_struct *task = get_proc_task(inode);
@@ -1922,11 +1950,7 @@ static int proc_fd_info(struct inode *in
path_get(&file->f_path);
}
if (info)
- snprintf(info, PROC_FDINFO_MAX,
- "pos:\t%lli\n"
- "flags:\t0%o\n",
- (long long) file->f_pos,
- f_flags);
+ proc_fd_info_fill(info, file, f_flags);
spin_unlock(&files->file_lock);
put_files_struct(files);
return 0;
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -1642,6 +1642,7 @@ extern void copy_user_huge_page(struct p
enum {
GEN_OBJ_ID_NS,
+ GEN_OBJ_ID_FILE,
GEN_OBJ_ID_TYPES,
};
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [patch 0/4] kernel generic object IDs series
2011-12-22 12:56 [patch 0/4] kernel generic object IDs series Cyrill Gorcunov
` (2 preceding siblings ...)
2011-12-22 12:56 ` [patch 3/4] proc: Show open file ID in /proc/pid/fdinfo/* Cyrill Gorcunov
@ 2011-12-22 13:24 ` Cyrill Gorcunov
2011-12-27 11:15 ` Cyrill Gorcunov
4 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-22 13:24 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton
On Thu, Dec 22, 2011 at 04:56:39PM +0400, Cyrill Gorcunov wrote:
...
>
> Any kind of comments and especially complains (!) are very appreciated!
>
> Cyrill
>
Because of "CLONE_XXX" in subject, LKML dropped the fourth patch, so I
renamed it and put it here. Sorry for inconvenince.
Cyrill
---
proc: Show IDs of objects cloned with CLONE_ in proc
An example of output is
# cat /proc/2332/objects
VM :445332486314238977
FILES :16574420397857109505
FS :7421276367695228033
SIGHAND :9517130248188834433
IO :9605099974489260945
SYSVSEM :0
each one represents an ID of appropriate resource used
by a task -- memory, signals, fs and etc.
Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Glauber Costa <glommer@parallels.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/filesystems/proc.txt | 18 ++++++++++++++++++
fs/proc/base.c | 29 +++++++++++++++++++++++++++++
include/linux/mm.h | 6 ++++++
3 files changed, 53 insertions(+)
Index: linux-2.6.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.git.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.git/Documentation/filesystems/proc.txt
@@ -41,6 +41,7 @@ Table of Contents
3.5 /proc/<pid>/mountinfo - Information about mounts
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
3.7 /proc/<pid>/ns - Information about namespaces
+ 3.8 /proc/<pid>/objects - Information about generic objects
------------------------------------------------------------------------------
@@ -1569,3 +1570,20 @@ obtained from another <pid>.
Moreover, a safe approach is to remember it as a string, since format may
change in future and id would be not a long integer value, but something
else, say SHA1/2 or even uuid encoded stream.
+
+3.8 /proc/<pid>/objects - Information about generic objects
+---------------------------------------------------------------
+
+Similar to /proc/<pid>/ns this file provide generic object ids by
+the following format (better to illustrate with example)
+
+VM :445332486300860161
+FILES :16574420397866995457
+FS :7421276367693613825
+SIGHAND :9517130248196786369
+IO :0
+SYSVSEM :0
+
+The first column is an object name, the second -- object ID number.
+As being mentioned in 3.7 one should carry them as a string to be
+on a safe side.
Index: linux-2.6.git/fs/proc/base.c
===================================================================
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -312,6 +312,32 @@ out:
return res;
}
+#ifdef CONFIG_GENERIC_OBJECT_IDS
+static int proc_pid_objs(struct task_struct *task, char * buffer)
+{
+ int ret = 0;
+
+#define SHOW_PID_OBJ(__type, __field) \
+ do { \
+ ret += sprintf(buffer + ret, "%-10s:%lu\n", \
+ #__type, \
+ gen_obj_id(task->__field, \
+ GEN_OBJ_ID_##__type)); \
+ } while (0)
+
+ SHOW_PID_OBJ(VM, mm);
+ SHOW_PID_OBJ(FILES, files);
+ SHOW_PID_OBJ(FS, fs);
+ SHOW_PID_OBJ(SIGHAND, sighand);
+ SHOW_PID_OBJ(IO, io_context);
+ SHOW_PID_OBJ(SYSVSEM, sysvsem.undo_list);
+
+#undef SHOW_PID_OBJ
+
+ return ret;
+}
+#endif
+
static int proc_pid_auxv(struct task_struct *task, char *buffer)
{
struct mm_struct *mm = mm_for_maps(task);
@@ -3172,6 +3198,9 @@ static const struct pid_entry tgid_base_
INF("syscall", S_IRUGO, proc_pid_syscall),
#endif
INF("cmdline", S_IRUGO, proc_pid_cmdline),
+#ifdef CONFIG_GENERIC_OBJECT_IDS
+ INF("objects", S_IRUGO, proc_pid_objs),
+#endif
ONE("stat", S_IRUGO, proc_tgid_stat),
ONE("statm", S_IRUGO, proc_pid_statm),
REG("maps", S_IRUGO, proc_maps_operations),
Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -1643,6 +1643,12 @@ extern void copy_user_huge_page(struct p
enum {
GEN_OBJ_ID_NS,
GEN_OBJ_ID_FILE,
+ GEN_OBJ_ID_VM,
+ GEN_OBJ_ID_FILES,
+ GEN_OBJ_ID_FS,
+ GEN_OBJ_ID_SIGHAND,
+ GEN_OBJ_ID_IO,
+ GEN_OBJ_ID_SYSVSEM,
GEN_OBJ_ID_TYPES,
};
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [patch 0/4] kernel generic object IDs series
2011-12-22 12:56 [patch 0/4] kernel generic object IDs series Cyrill Gorcunov
` (3 preceding siblings ...)
2011-12-22 13:24 ` [patch 0/4] kernel generic object IDs series Cyrill Gorcunov
@ 2011-12-27 11:15 ` Cyrill Gorcunov
4 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-27 11:15 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton
On Thu, Dec 22, 2011 at 4:56 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> Hi,
>
> when we do the checkpoint-restore we need to find out if various kernel objects
> (like mm_struct-s of file_struct-s) are shared between tasks (and restore them
> after).
>
...
Ping, any opinions on the series?
^ permalink raw reply [flat|nested] 18+ messages in thread
* [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2011-12-23 12:47 [patch 0/4] generic object ids, v2 Cyrill Gorcunov
@ 2011-12-23 12:47 ` Cyrill Gorcunov
2012-01-04 6:02 ` Eric W. Biederman
0 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2011-12-23 12:47 UTC (permalink / raw)
To: linux-kernel
Cc: Pavel Emelyanov, Glauber Costa, Andi Kleen, Tejun Heo,
Matt Helsley, Pekka Enberg, Eric Dumazet, Vasiliy Kulikov,
Andrew Morton, Alexey Dobriyan, Cyrill Gorcunov
[-- Attachment #1: 2-objids-namespaces --]
[-- Type: text/plain, Size: 3988 bytes --]
This patch adds proc_ns_read method which provides
IDs for /proc/pid/ns/* files.
Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Glauber Costa <glommer@parallels.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
---
Documentation/filesystems/proc.txt | 24 ++++++++++++++++++++++++
fs/proc/namespaces.c | 22 ++++++++++++++++++++++
include/linux/gen_obj_id.h | 1 +
3 files changed, 47 insertions(+)
Index: linux-2.6.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.git.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.git/Documentation/filesystems/proc.txt
@@ -40,6 +40,7 @@ Table of Contents
3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
3.5 /proc/<pid>/mountinfo - Information about mounts
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
+ 3.7 /proc/<pid>/ns - Information about namespaces
------------------------------------------------------------------------------
@@ -1545,3 +1546,26 @@ a task to set its own or one of its thre
is limited in size compared to the cmdline value, so writing anything longer
then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
comm value.
+
+3.7 /proc/<pid>/ns - Information about namespaces
+-----------------------------------------------------
+
+This directory consists of the following files "net", "uts", "ipc",
+and depend if appropriate CONFIG_ entry is set, i.e. it's possible
+to have only one, two or all three files here.
+
+Currently file contents provides that named "object id" number, which
+is a number useful for the one purpose only -- to test if two differen
+<pid> share the namespace.
+
+A typical format is
+
+id: 445332486300860161
+
+i.e. "id" followed by a number. One should never assume the number
+means something, it is only useful for "sameness" test with another number
+obtained from another <pid>.
+
+Moreover, a safe approach is to remember it as a string, since format may
+change in future and id would be not a long integer value, but something
+else, say SHA1/2 or even uuid encoded stream.
Index: linux-2.6.git/fs/proc/namespaces.c
===================================================================
--- linux-2.6.git.orig/fs/proc/namespaces.c
+++ linux-2.6.git/fs/proc/namespaces.c
@@ -12,6 +12,7 @@
#include <linux/mnt_namespace.h>
#include <linux/ipc_namespace.h>
#include <linux/pid_namespace.h>
+#include <linux/gen_obj_id.h>
#include "internal.h"
@@ -27,10 +28,31 @@ static const struct proc_ns_operations *
#endif
};
+#ifdef CONFIG_GENERIC_OBJECT_ID
+static ssize_t proc_ns_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct proc_inode *ei = PROC_I(file->f_dentry->d_inode);
+ char tmp[32];
+
+ snprintf(tmp, sizeof(tmp), "id:\t%lu\n",
+ gen_obj_id(ei->ns, GEN_OBJ_ID_NS));
+ return simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
+}
+
+static const struct file_operations ns_file_operations = {
+ .llseek = no_llseek,
+ .read = proc_ns_read,
+};
+
+#else
+
static const struct file_operations ns_file_operations = {
.llseek = no_llseek,
};
+#endif /* CONFIG_GENERIC_OBJECT_ID */
+
static struct dentry *proc_ns_instantiate(struct inode *dir,
struct dentry *dentry, struct task_struct *task, const void *ptr)
{
Index: linux-2.6.git/include/linux/gen_obj_id.h
===================================================================
--- linux-2.6.git.orig/include/linux/gen_obj_id.h
+++ linux-2.6.git/include/linux/gen_obj_id.h
@@ -4,6 +4,7 @@
#ifdef __KERNEL__
enum {
+ GEN_OBJ_ID_NS,
GEN_OBJ_ID_TYPES,
};
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2011-12-23 12:47 ` [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files Cyrill Gorcunov
@ 2012-01-04 6:02 ` Eric W. Biederman
2012-01-04 11:26 ` Cyrill Gorcunov
0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2012-01-04 6:02 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton, Alexey Dobriyan
Cyrill Gorcunov <gorcunov@openvz.org> writes:
> This patch adds proc_ns_read method which provides
> IDs for /proc/pid/ns/* files.
Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
This is a poorly thought out user interface. If we are going to return
this kind of information and I believe we should we should, we should
return the id in the inode field with stat.
Comparing device+inode for equality is the traditional way to see if
two objects are the same in unix and there is no reason to make up
a new interface to get this functionality.
Furthermore we should always return the information, as it is valuable
even outside of the checkpoint/restart context.
I am also concerned that you appear to be building an interface
for use by checkpoint/restart that makes it impossible
checkpoint/restart the programs using that interface. The reason
is that you appear to be putting this nebulous id into a global
namespace and as such even if we wanted to I don't see how we could
build a version where we could restore the id during a restart. And
the thing is if you start building interfaces with identifiers you can
not possibly restore I expect you will find you have painted yourself
into a corner.
Using inode from stat avoids painting yourself into a corner because
you have the possibility of different mounts with different device
numbers having different inode numbers.
For the short term I don't see value in being able to restore the
object identifiers, but I do see a lot of value in allowing for a future
where nested checkpoint/restart is an option.
Eric
> Based-on-patch-from: Pavel Emelyanov <xemul@parallels.com>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Glauber Costa <glommer@parallels.com>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Tejun Heo <tj@kernel.org>
> CC: Matt Helsley <matthltc@us.ibm.com>
> CC: Pekka Enberg <penberg@kernel.org>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Vasiliy Kulikov <segoon@openwall.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Alexey Dobriyan <adobriyan@gmail.com>
> ---
> Documentation/filesystems/proc.txt | 24 ++++++++++++++++++++++++
> fs/proc/namespaces.c | 22 ++++++++++++++++++++++
> include/linux/gen_obj_id.h | 1 +
> 3 files changed, 47 insertions(+)
>
> Index: linux-2.6.git/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux-2.6.git.orig/Documentation/filesystems/proc.txt
> +++ linux-2.6.git/Documentation/filesystems/proc.txt
> @@ -40,6 +40,7 @@ Table of Contents
> 3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
> 3.5 /proc/<pid>/mountinfo - Information about mounts
> 3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
> + 3.7 /proc/<pid>/ns - Information about namespaces
>
>
> ------------------------------------------------------------------------------
> @@ -1545,3 +1546,26 @@ a task to set its own or one of its thre
> is limited in size compared to the cmdline value, so writing anything longer
> then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
> comm value.
> +
> +3.7 /proc/<pid>/ns - Information about namespaces
> +-----------------------------------------------------
> +
> +This directory consists of the following files "net", "uts", "ipc",
> +and depend if appropriate CONFIG_ entry is set, i.e. it's possible
> +to have only one, two or all three files here.
> +
> +Currently file contents provides that named "object id" number, which
> +is a number useful for the one purpose only -- to test if two differen
> +<pid> share the namespace.
> +
> +A typical format is
> +
> +id: 445332486300860161
> +
> +i.e. "id" followed by a number. One should never assume the number
> +means something, it is only useful for "sameness" test with another number
> +obtained from another <pid>.
> +
> +Moreover, a safe approach is to remember it as a string, since format may
> +change in future and id would be not a long integer value, but something
> +else, say SHA1/2 or even uuid encoded stream.
> Index: linux-2.6.git/fs/proc/namespaces.c
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/namespaces.c
> +++ linux-2.6.git/fs/proc/namespaces.c
> @@ -12,6 +12,7 @@
> #include <linux/mnt_namespace.h>
> #include <linux/ipc_namespace.h>
> #include <linux/pid_namespace.h>
> +#include <linux/gen_obj_id.h>
> #include "internal.h"
>
>
> @@ -27,10 +28,31 @@ static const struct proc_ns_operations *
> #endif
> };
>
> +#ifdef CONFIG_GENERIC_OBJECT_ID
> +static ssize_t proc_ns_read(struct file *file, char __user *buf,
> + size_t len, loff_t *ppos)
> +{
> + struct proc_inode *ei = PROC_I(file->f_dentry->d_inode);
> + char tmp[32];
> +
> + snprintf(tmp, sizeof(tmp), "id:\t%lu\n",
> + gen_obj_id(ei->ns, GEN_OBJ_ID_NS));
> + return simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
> +}
> +
> +static const struct file_operations ns_file_operations = {
> + .llseek = no_llseek,
> + .read = proc_ns_read,
> +};
> +
> +#else
> +
> static const struct file_operations ns_file_operations = {
> .llseek = no_llseek,
> };
>
> +#endif /* CONFIG_GENERIC_OBJECT_ID */
> +
> static struct dentry *proc_ns_instantiate(struct inode *dir,
> struct dentry *dentry, struct task_struct *task, const void *ptr)
> {
> Index: linux-2.6.git/include/linux/gen_obj_id.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/gen_obj_id.h
> +++ linux-2.6.git/include/linux/gen_obj_id.h
> @@ -4,6 +4,7 @@
> #ifdef __KERNEL__
>
> enum {
> + GEN_OBJ_ID_NS,
> GEN_OBJ_ID_TYPES,
> };
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2012-01-04 6:02 ` Eric W. Biederman
@ 2012-01-04 11:26 ` Cyrill Gorcunov
2012-01-04 17:56 ` Eric W. Biederman
0 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2012-01-04 11:26 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton, Alexey Dobriyan
On Tue, Jan 03, 2012 at 10:02:32PM -0800, Eric W. Biederman wrote:
> Cyrill Gorcunov <gorcunov@openvz.org> writes:
>
> > This patch adds proc_ns_read method which provides
> > IDs for /proc/pid/ns/* files.
>
> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> This is a poorly thought out user interface. If we are going to return
> this kind of information and I believe we should we should, we should
> return the id in the inode field with stat.
>
> Comparing device+inode for equality is the traditional way to see if
> two objects are the same in unix and there is no reason to make up
> a new interface to get this functionality.
>
> Furthermore we should always return the information, as it is valuable
> even outside of the checkpoint/restart context.
>
> I am also concerned that you appear to be building an interface
> for use by checkpoint/restart that makes it impossible
> checkpoint/restart the programs using that interface. The reason
> is that you appear to be putting this nebulous id into a global
> namespace and as such even if we wanted to I don't see how we could
> build a version where we could restore the id during a restart. And
> the thing is if you start building interfaces with identifiers you can
> not possibly restore I expect you will find you have painted yourself
> into a corner.
>
> Using inode from stat avoids painting yourself into a corner because
> you have the possibility of different mounts with different device
> numbers having different inode numbers.
>
> For the short term I don't see value in being able to restore the
> object identifiers, but I do see a lot of value in allowing for a future
> where nested checkpoint/restart is an option.
>
> Eric
>
Hi Eric, thanks a lot for comments! I must admit I never though about
nested checkpoint/restore simply because even plain and direct CR still
has a number of problems which are not yet addressed.
As to return such ID in ino field (if I understand you right -- you
propose to return such ID as inode of kstat structure) -- I don't think
it would be right either. Instead of one iteface applied to all objects
we export there will be a few different approaches instead -- for net-ns
it would be dev+ino, for tasks and other members of task-structure
it'll be IDs from /proc (as implemented in another patches). I like
more Kyle's idea about object_id() call which would simply return the
entrypted ID to user-space and it'll be up to user-space to do anything
it wants with such pieces of information.
Yes, there will be no way to restore such IDs later but the interface
is not supposed to work this way. All this mess only because of lack
of way to figure out which task resources are shared and which are not.
Maybe if we can carry CLONE_ flags from copy_process()/unshare()/setns()
(and which else modify task resources?) inside task_struct and provide
these flags back to user-space we might not need the IDs helpers at all.
But I think such approach might end up in a pretty big patch bloating
the kernel. In turn I wanted to bring as minimum new functionality as
possible *with* a way to completely turn it off if user don't need it.
Cyrill
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2012-01-04 11:26 ` Cyrill Gorcunov
@ 2012-01-04 17:56 ` Eric W. Biederman
2012-01-04 18:19 ` Cyrill Gorcunov
0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2012-01-04 17:56 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton, Alexey Dobriyan
Cyrill Gorcunov <gorcunov@gmail.com> writes:
> On Tue, Jan 03, 2012 at 10:02:32PM -0800, Eric W. Biederman wrote:
>> Cyrill Gorcunov <gorcunov@openvz.org> writes:
>>
>> > This patch adds proc_ns_read method which provides
>> > IDs for /proc/pid/ns/* files.
>>
>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>> This is a poorly thought out user interface. If we are going to return
>> this kind of information and I believe we should we should, we should
>> return the id in the inode field with stat.
>>
>> Comparing device+inode for equality is the traditional way to see if
>> two objects are the same in unix and there is no reason to make up
>> a new interface to get this functionality.
>>
>> Furthermore we should always return the information, as it is valuable
>> even outside of the checkpoint/restart context.
>>
>> I am also concerned that you appear to be building an interface
>> for use by checkpoint/restart that makes it impossible
>> checkpoint/restart the programs using that interface. The reason
>> is that you appear to be putting this nebulous id into a global
>> namespace and as such even if we wanted to I don't see how we could
>> build a version where we could restore the id during a restart. And
>> the thing is if you start building interfaces with identifiers you can
>> not possibly restore I expect you will find you have painted yourself
>> into a corner.
>>
>> Using inode from stat avoids painting yourself into a corner because
>> you have the possibility of different mounts with different device
>> numbers having different inode numbers.
>>
>> For the short term I don't see value in being able to restore the
>> object identifiers, but I do see a lot of value in allowing for a future
>> where nested checkpoint/restart is an option.
>>
>> Eric
>>
>
> Hi Eric, thanks a lot for comments! I must admit I never though about
> nested checkpoint/restore simply because even plain and direct CR still
> has a number of problems which are not yet addressed.
>
> As to return such ID in ino field (if I understand you right -- you
> propose to return such ID as inode of kstat structure) -- I don't think
> it would be right either. Instead of one iteface applied to all objects
> we export there will be a few different approaches instead -- for net-ns
> it would be dev+ino, for tasks and other members of task-structure
> it'll be IDs from /proc (as implemented in another patches). I like
> more Kyle's idea about object_id() call which would simply return the
> entrypted ID to user-space and it'll be up to user-space to do anything
> it wants with such pieces of information.
Right now everything thing that is exported is dev+ino. My objection
is that you are adding yet another interface to get that information.
I already have patches that already implement dev+ino for the namespaces
so I fully expect that to happen independently of your patches. My
priority is to get the rest of the namespaces exported which requires
a bit more review.
> Yes, there will be no way to restore such IDs later but the interface
> is not supposed to work this way.
It sounds like it won't be possible to retrofit the ability to restore
the IDs later. If the path to what will be needed to support nested
checkpoint/restore is not clear the user space interface is broken
by design. And since it is broken by design I say the design needs
to bake more before we think of baking it.
> All this mess only because of lack
> of way to figure out which task resources are shared and which are not.
> Maybe if we can carry CLONE_ flags from copy_process()/unshare()/setns()
> (and which else modify task resources?) inside task_struct and provide
> these flags back to user-space we might not need the IDs helpers at all.
> But I think such approach might end up in a pretty big patch bloating
> the kernel. In turn I wanted to bring as minimum new functionality as
> possible *with* a way to completely turn it off if user don't need it.
The tricky case is file descriptors and file descriptors can be passed
over unix domain sockets in arbitrary ways.
If you can find a way to do this without id helpers that sounds like
a good design.
I have a nasty feeling that by trying to do this piecemeal instead of
in one big system call you are slowly painting yourself into a corner
from which you can not get out.
Eric
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 2/4] proc: Show namespaces IDs in /proc/pid/ns/* files
2012-01-04 17:56 ` Eric W. Biederman
@ 2012-01-04 18:19 ` Cyrill Gorcunov
0 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2012-01-04 18:19 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-kernel, Pavel Emelyanov, Glauber Costa, Andi Kleen,
Tejun Heo, Matt Helsley, Pekka Enberg, Eric Dumazet,
Vasiliy Kulikov, Andrew Morton, Alexey Dobriyan
On Wed, Jan 04, 2012 at 09:56:24AM -0800, Eric W. Biederman wrote:
...
> >
> > Hi Eric, thanks a lot for comments! I must admit I never though about
> > nested checkpoint/restore simply because even plain and direct CR still
> > has a number of problems which are not yet addressed.
> >
> > As to return such ID in ino field (if I understand you right -- you
> > propose to return such ID as inode of kstat structure) -- I don't think
> > it would be right either. Instead of one iteface applied to all objects
> > we export there will be a few different approaches instead -- for net-ns
> > it would be dev+ino, for tasks and other members of task-structure
> > it'll be IDs from /proc (as implemented in another patches). I like
> > more Kyle's idea about object_id() call which would simply return the
> > entrypted ID to user-space and it'll be up to user-space to do anything
> > it wants with such pieces of information.
>
> Right now everything thing that is exported is dev+ino. My objection
> is that you are adding yet another interface to get that information.
>
> I already have patches that already implement dev+ino for the namespaces
> so I fully expect that to happen independently of your patches. My
> priority is to get the rest of the namespaces exported which requires
> a bit more review.
>
Ah, good to know, could you please point me where I can get them and try
at least dev+ino part out?
> > Yes, there will be no way to restore such IDs later but the interface
> > is not supposed to work this way.
>
> It sounds like it won't be possible to retrofit the ability to restore
> the IDs later. If the path to what will be needed to support nested
> checkpoint/restore is not clear the user space interface is broken
> by design. And since it is broken by design I say the design needs
> to bake more before we think of baking it.
>
I'm not against of chaging/improving design at all. If there some other
ways to retrieve this kind of information I'm gladly dropping patches
piece-by-piece.
> > All this mess only because of lack
> > of way to figure out which task resources are shared and which are not.
> > Maybe if we can carry CLONE_ flags from copy_process()/unshare()/setns()
> > (and which else modify task resources?) inside task_struct and provide
> > these flags back to user-space we might not need the IDs helpers at all.
> > But I think such approach might end up in a pretty big patch bloating
> > the kernel. In turn I wanted to bring as minimum new functionality as
> > possible *with* a way to completely turn it off if user don't need it.
>
> The tricky case is file descriptors and file descriptors can be passed
> over unix domain sockets in arbitrary ways.
>
Not really, what about other members of task-structure, such as mm, files
and others? If I export this bits I have to export them somehow in a safe
way which would not reveal too much of kernel internals.
> If you can find a way to do this without id helpers that sounds like
> a good design.
>
Yes, I'm trying to find some other way but without much luck at moment.
Once I have something to show -- of course I send it to lkml immediately.
> I have a nasty feeling that by trying to do this piecemeal instead of
> in one big system call you are slowly painting yourself into a corner
> from which you can not get out.
>
Yes again, that was the reason the patches flew to LKML -- just
to obtain as much comments as possible and find some sane way.
Cyrill
^ permalink raw reply [flat|nested] 18+ messages in thread