From: Oleg Nesterov <oleg@redhat.com>
To: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Serge Hallyn <serge.hallyn@canonical.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Tejun Heo <tj@kernel.org>, Andrew Vagin <avagin@openvz.org>,
	Vasiliy Kulikov <segoon@openwall.com>
Subject: Re: [PATCH] fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v7
Date: Fri, 20 Jan 2012 16:01:08 +0100
Message-ID: <20120120150108.GA9081@redhat.com>
In-Reply-To: <20120119181329.GE32568@moon>

On 01/19, Cyrill Gorcunov wrote:
>
> Here is a refreshed one, for sure.


Reviewed-by: Oleg Nesterov <oleg@redhat.com>


> From: Cyrill Gorcunov <gorcunov@openvz.org>
> Subject: fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v8
> 
> When we checkpoint a task we need to know the list of children the
> task has, but there is no easy and fast way to generate the reverse
> parent->children chain from an arbitrary <pid> (while the parent pid is
> provided in the "PPid" field of /proc/<pid>/status).
> 
> So instead of walking over all pids in the system (creating one big process
> tree in memory just to figure out which children a task has), we add an
> explicit /proc/<pid>/task/<tid>/children entry, because the kernel already
> has this information but does not yet export it.
> 
> The entry lists first-level children only, not the whole process tree.
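Because the entry exposes only first-level children, a checkpoint tool that wants a whole subtree still has to recurse. Below is a minimal sketch of such a depth-first walk, assuming a hypothetical `children_fn` callback; `collect_descendants`, `table_children`, `struct edge` and `MAX_KIDS` are illustrative names, not part of the patch. A real callback would read /proc/<pid>/task/<tid>/children for every <tid> of the process, since children are normally linked to the thread that forked them; the table-backed callback only lets the walk run without a live /proc. As the documentation in the patch notes, the tree should be stopped or frozen for the result to be precise.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_KIDS 64	/* assumption: per-task child limit for this sketch */

/* Hypothetical callback: store up to max first-level children of pid in
 * out[] and return how many were stored. */
typedef size_t (*children_fn)(pid_t pid, pid_t *out, size_t max, void *ctx);

/* Depth-first walk storing every descendant of root (root excluded) in
 * out[], each parent before its own children.  Returns the count stored. */
static size_t collect_descendants(pid_t root, children_fn get_children,
				  void *ctx, pid_t *out, size_t max)
{
	pid_t kids[MAX_KIDS];
	size_t n = get_children(root, kids, MAX_KIDS, ctx);
	size_t stored = 0;

	for (size_t i = 0; i < n && stored < max; i++) {
		out[stored++] = kids[i];
		/* recurse into the child's own first-level children */
		stored += collect_descendants(kids[i], get_children, ctx,
					      out + stored, max - stored);
	}
	return stored;
}

/* Example callback over an in-memory (parent, child) table. */
struct edge { pid_t parent, child; };
struct edge_table { const struct edge *edges; size_t len; };

static size_t table_children(pid_t pid, pid_t *out, size_t max, void *ctx)
{
	const struct edge_table *t = ctx;
	size_t n = 0;

	for (size_t i = 0; i < t->len && n < max; i++)
		if (t->edges[i].parent == pid)
			out[n++] = t->edges[i].child;
	return n;
}
```

For the table {1->2, 1->3, 2->4, 4->5}, walking from pid 1 stores 2, 4, 5, 3: each child is followed immediately by its own subtree.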
> 
> v2:
>  - Kame suggested using a separate /proc/<pid>/children entry
>    instead of poking /proc/<pid>/status
>  - Andrew suggested using the RCU facility instead of locking
>    tasklist_lock
>  - Tejun pointed out that a non-seekable seq file might not be
>    enough for tasks with a large number of children
> 
> v3:
>  - To be on the safe side, use the %lu format for pid_t printing
> 
> v4:
>  - A newline gets printed when the sequence ends, not at seq->stop,
>    a nit pointed out by Tejun
>  - Documentation update
>  - tasklist_lock is back; Oleg pointed out that the ->children list
>    is actually not RCU-safe
> 
> v5:
>  - Oleg suggested making /proc/<pid>/task/<tid>/children
>    instead of a global /proc/<pid>/children, which eliminates
>    the difficulties related to threads and children migration,
>    and makes the patch much simpler.
> 
> v6:
>  - Drop ptrace_may_access tests; pids can be found anyway,
>    so there is nothing to protect here.
>  - Update comments and docs, as pointed out by Oleg.
> 
> v7:
>  - Use get_pid() on the proc pid directly to simplify the
>    code, as pointed out by Oleg.
> 
> v8:
>  - Obtain a starting pid from the proc's inode directly.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> ---
>  Documentation/filesystems/proc.txt |   18 +++++
>  fs/proc/array.c                    |  123 +++++++++++++++++++++++++++++++++++++
>  fs/proc/base.c                     |    1 
>  fs/proc/internal.h                 |    1 
>  4 files changed, 143 insertions(+)
> 
> Index: linux-2.6.git/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux-2.6.git.orig/Documentation/filesystems/proc.txt
> +++ linux-2.6.git/Documentation/filesystems/proc.txt
> @@ -40,6 +40,7 @@ Table of Contents
>    3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
>    3.5	/proc/<pid>/mountinfo - Information about mounts
>    3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
> +  3.7   /proc/<pid>/task/<tid>/children - Information about task children
>  
>    4	Configuring procfs
>    4.1	Mount options
> @@ -1549,6 +1550,23 @@ then the kernel's TASK_COMM_LEN (current
>  comm value.
>  
>  
> +3.7	/proc/<pid>/task/<tid>/children - Information about task children
> +-------------------------------------------------------------------------
> +This file provides a fast way to retrieve the first-level children pids
> +of the task identified by the <pid>/<tid> pair. The format is a
> +space-separated stream of pids.
> +
> +Note the "first level" here: if a child has its own children, they will
> +not be listed here; one needs to read /proc/<children-pid>/task/<tid>/children
> +to obtain the descendants.
> +
> +Since this interface is intended to be fast and cheap, it doesn't
> +guarantee precise results, and some children might be skipped,
> +especially if they exited right after we printed their pids. One
> +needs to either stop or freeze the processes being inspected if
> +precise results are needed.
> +
> +
>  ------------------------------------------------------------------------------
>  Configuring procfs
>  ------------------------------------------------------------------------------
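The documentation above describes the output as a space-separated stream of pids; in this v8 each pid is printed as " %d", and children_seq_next() emits a newline after the last one. A minimal userspace reader sketch follows, assuming the whole list fits in one 4 KiB buffer; `read_children` and `parse_children` are hypothetical helper names, not part of the patch. Parsing with strtol() skips leading whitespace, so the parser does not depend on whether the separator comes before or after each pid.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse a whitespace-separated pid stream such as " 101 102\n" into
 * out[]; returns how many pids were stored (at most max). */
static size_t parse_children(const char *buf, pid_t *out, size_t max)
{
	size_t n = 0;

	while (n < max) {
		char *end;
		long v;

		errno = 0;
		v = strtol(buf, &end, 10);	/* skips leading spaces/newline */
		if (end == buf || errno != 0)
			break;			/* no more numbers */
		out[n++] = (pid_t)v;
		buf = end;
	}
	return n;
}

/* Read /proc/<pid>/task/<tid>/children into out[].  Returns the number of
 * pids stored, or -1 if the file cannot be opened (e.g. on a kernel
 * without this entry).  Assumes the whole list fits in buf. */
static long read_children(pid_t pid, pid_t tid, pid_t *out, size_t max)
{
	char path[64], buf[4096];
	size_t len;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/task/%d/children",
		 (int)pid, (int)tid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	len = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[len] = '\0';
	return (long)parse_children(buf, out, max);
}
```

For example, read_children(getpid(), getpid(), pids, n) lists the children forked by the calling process's main thread; a process with many children would need a read loop instead of a single fixed buffer.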
> Index: linux-2.6.git/fs/proc/array.c
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/array.c
> +++ linux-2.6.git/fs/proc/array.c
> @@ -547,3 +547,126 @@ int proc_pid_statm(struct seq_file *m, s
>  
>  	return 0;
>  }
> +
> +static struct pid *
> +get_children_pid(struct inode *inode, struct pid *pid_prev, loff_t pos)
> +{
> +	struct task_struct *start, *task;
> +	struct pid *pid = NULL;
> +
> +	read_lock(&tasklist_lock);
> +
> +	start = pid_task(proc_pid(inode), PIDTYPE_PID);
> +	if (!start)
> +		goto out;
> +
> +	/*
> +	 * Let's try to continue the search first; this gives
> +	 * us a significant speedup on children-rich processes.
> +	 */
> +	if (pid_prev) {
> +		task = pid_task(pid_prev, PIDTYPE_PID);
> +		if (task && task->real_parent == start &&
> +		    !(list_empty(&task->sibling))) {
> +			if (list_is_last(&task->sibling, &start->children))
> +				goto out;
> +			task = list_first_entry(&task->sibling,
> +						struct task_struct, sibling);
> +			pid = get_pid(task_pid(task));
> +			goto out;
> +		}
> +	}
> +
> +	/*
> +	 * Slow search case.
> +	 *
> +	 * We might miss some children here if they
> +	 * exited while we were not holding the lock,
> +	 * but this interface was never promised to be
> +	 * that accurate.
> +	 *
> +	 * "Just suppose that the parent sleeps, but N children
> +	 *  exit after we printed their tids. Now the slow paths
> +	 *  skips N extra children, we miss N tasks." (c)
> +	 *
> +	 * So one needs to stop or freeze the leader and all
> +	 * its children to get a precise result.
> +	 */
> +	list_for_each_entry(task, &start->children, sibling) {
> +		if (pos-- == 0) {
> +			pid = get_pid(task_pid(task));
> +			break;
> +		}
> +	}
> +
> +out:
> +	read_unlock(&tasklist_lock);
> +	return pid;
> +}
> +
> +static int children_seq_show(struct seq_file *seq, void *v)
> +{
> +	struct inode *inode = seq->private;
> +	pid_t pid;
> +
> +	pid = pid_nr_ns(v, inode->i_sb->s_fs_info);
> +	return seq_printf(seq, " %d", pid);
> +}
> +
> +static void *children_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> +	return get_children_pid(seq->private, NULL, *pos);
> +}
> +
> +static void *children_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +	struct pid *pid = NULL;
> +
> +	pid = get_children_pid(seq->private, v, *pos + 1);
> +	if (!pid)
> +		seq_printf(seq, "\n");
> +	put_pid(v);
> +
> +	++*pos;
> +	return pid;
> +}
> +
> +static void children_seq_stop(struct seq_file *seq, void *v)
> +{
> +	put_pid(v);
> +}
> +
> +static const struct seq_operations children_seq_ops = {
> +	.start	= children_seq_start,
> +	.next	= children_seq_next,
> +	.stop	= children_seq_stop,
> +	.show	= children_seq_show,
> +};
> +
> +static int children_seq_open(struct inode *inode, struct file *file)
> +{
> +	struct seq_file *m;
> +	int ret;
> +
> +	ret = seq_open(file, &children_seq_ops);
> +	if (ret)
> +		return ret;
> +
> +	m = file->private_data;
> +	m->private = inode;
> +
> +	return ret;
> +}
> +
> +int children_seq_release(struct inode *inode, struct file *file)
> +{
> +	seq_release(inode, file);
> +	return 0;
> +}
> +
> +const struct file_operations proc_tid_children_operations = {
> +	.open    = children_seq_open,
> +	.read    = seq_read,
> +	.llseek  = seq_lseek,
> +	.release = children_seq_release,
> +};
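The fast and slow paths of get_children_pid(), and the "N children exit, we miss N tasks" effect described in its comment, can be reproduced with a small userspace model. This is a sketch under stated assumptions: the parent's ->children list is modelled as an ordered array of pids, a child that is reaped is simply removed from it (the kernel unlinks it from its parent's ->children list at that point), and `struct child_list`, `child_exit()` and `next_child()` are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model of a parent's ->children list: child pids in list
 * order. */
struct child_list {
	pid_t pids[16];
	size_t len;
};

/* Remove a reaped child, keeping the order of the others. */
static void child_exit(struct child_list *cl, pid_t pid)
{
	for (size_t i = 0; i < cl->len; i++) {
		if (cl->pids[i] == pid) {
			memmove(&cl->pids[i], &cl->pids[i + 1],
				(cl->len - i - 1) * sizeof(cl->pids[0]));
			cl->len--;
			return;
		}
	}
}

/* Modelled on get_children_pid(): return the child after prev, or the
 * child at index pos, or 0 when there is none.  prev == 0 plays the role
 * of the NULL pid_prev that ->start() passes. */
static pid_t next_child(const struct child_list *cl, pid_t prev, size_t pos)
{
	if (prev) {
		for (size_t i = 0; i < cl->len; i++) {
			if (cl->pids[i] == prev)
				/* fast path: prev is still a child, take its sibling */
				return i + 1 < cl->len ? cl->pids[i + 1] : 0;
		}
	}
	/* slow path: pick the pos-th child in list order */
	return pos < cl->len ? cl->pids[pos] : 0;
}
```

In this model, a reader that printed 10 and 20 from [10, 20, 30, 40] and then slept comes back through ->start(), which passes no previous pid, so the slow path indexes position 2. If 10 and 20 were reaped meanwhile, the list is now [30, 40] and nothing is returned: both remaining children are missed, which is why the documentation asks for stopped or frozen tasks when precision matters.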
> Index: linux-2.6.git/fs/proc/base.c
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/base.c
> +++ linux-2.6.git/fs/proc/base.c
> @@ -3454,6 +3454,7 @@ static const struct pid_entry tid_base_s
>  	ONE("stat",      S_IRUGO, proc_tid_stat),
>  	ONE("statm",     S_IRUGO, proc_pid_statm),
>  	REG("maps",      S_IRUGO, proc_maps_operations),
> +	REG("children",  S_IRUGO, proc_tid_children_operations),
>  #ifdef CONFIG_NUMA
>  	REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
>  #endif
> Index: linux-2.6.git/fs/proc/internal.h
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/internal.h
> +++ linux-2.6.git/fs/proc/internal.h
> @@ -53,6 +53,7 @@ extern int proc_pid_statm(struct seq_fil
>  				struct pid *pid, struct task_struct *task);
>  extern loff_t mem_lseek(struct file *file, loff_t offset, int orig);
>  
> +extern const struct file_operations proc_tid_children_operations;
>  extern const struct file_operations proc_maps_operations;
>  extern const struct file_operations proc_numa_maps_operations;
>  extern const struct file_operations proc_smaps_operations;


Thread overview: 8+ messages
2012-01-19 15:51 [PATCH] fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v7 Cyrill Gorcunov
2012-01-19 17:47 ` Cyrill Gorcunov
2012-01-19 17:57   ` Oleg Nesterov
2012-01-19 18:09     ` Cyrill Gorcunov
2012-01-19 18:13       ` Cyrill Gorcunov
2012-01-20 15:01         ` Oleg Nesterov [this message]
2012-01-19 23:14       ` Eric W. Biederman
2012-01-20  6:31         ` Cyrill Gorcunov