public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/7] [RFC] kernel: add a netlink interface to get information about processes
@ 2015-02-19 13:00 Pavel Odintsov
  2015-02-27 20:43 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Odintsov @ 2015-02-19 13:00 UTC (permalink / raw)
  To: linux-kernel

Hello!

In addition to my earlier post, I want to mention another issue related
to slow /proc reads, this time in the perf toolkit. On my server with
25 000 processes, perf top needs about 15 minutes to load completely.

https://bugzilla.kernel.org/show_bug.cgi?id=86991


-- 
Sincerely yours, Pavel Odintsov

* [PATCH 0/7] [RFC] kernel: add a netlink interface to get information about processes
@ 2015-02-19 12:50 Pavel Odintsov
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Odintsov @ 2015-02-19 12:50 UTC (permalink / raw)
  To: linux-kernel

Hello, folks!

These are very useful patches; they would make my tasks simpler and faster.

In my day-to-day work I deal with Linux servers that run an enormous
number of processes (~25 000 per server). These servers run several
hundred Linux containers.

If I want to analyze processor load or network load, or check something
else, I use top/atop/htop/netstat. But they are very slow and consume a
significant amount of CPU time parsing many thousands of text files in
/proc (like /proc/tcp, /proc/udp, /proc/status, /proc/$pid/status).

Some time ago I worked on a malware detection toolkit for Linux,
Antidoto (https://github.com/FastVPSEestiOu/Antidoto), which uses the
/proc filesystem very heavily. To detect malware I need to check every
descriptor and every socket, and get complete information about all
processes on the system.

But with the current text-file-based architecture of /proc I can't
achieve a suitable speed for my toolkit.

For example, here is the time needed to process all network connections
on a server with 20244 processes using linux_network_activity_tracker.pl
(https://github.com/FastVPSEestiOu/Antidoto/blob/master/linux_network_activity_tracker.pl):

real 1m26.637s
user 0m23.945s
sys 0m43.978s

As you can see, this takes a very long time even though I use the latest
Intel CPUs (Xeon E5-2697 v3).

I have several ideas for complete real-time Linux server monitoring, but
without the ability to pull information from the Linux kernel faster I
can't implement them.

-- 
Sincerely yours, Pavel Odintsov

* [PATCH 0/7] [RFC] kernel: add a netlink interface to get information about processes
@ 2015-02-17  8:20 Andrey Vagin
  2015-02-17  8:53 ` Arnd Bergmann
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Andrey Vagin @ 2015-02-17  8:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-api, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Andrey Vagin

Here is a preview version. It provides a restricted set of functionality.
I would like to collect feedback about this idea.

Currently we use the proc file system, where all information is
presented in text files, which is convenient for humans. But if we need
to get information about processes from code (e.g. in C), procfs
doesn't look so cool.

From code we would prefer to get information in a binary format and to
be able to specify which information is required and for which tasks.
Here is a new interface with all these features, called task_diag.
In addition, it's much faster than procfs.

task_diag is based on netlink sockets and looks like socket-diag, which
is used to get information about sockets.

A request is described by the task_diag_pid structure:

struct task_diag_pid {
       __u64   show_flags;	/* specify which information is required */
       __u64   dump_stratagy;   /* specify a group of processes */

       __u32   pid;
};

A response is a set of netlink messages. Each message describes one task.
All task properties are divided into groups. A message contains the
TASK_DIAG_MSG group and other groups if they have been requested in
show_flags. For example, if show_flags contains TASK_DIAG_SHOW_CRED, a
response will contain the TASK_DIAG_CRED group which is described by the
task_diag_creds structure.

struct task_diag_msg {
	__u32	tgid;
	__u32	pid;
	__u32	ppid;
	__u32	tpid;
	__u32	sid;
	__u32	pgid;
	__u8	state;
	char	comm[TASK_DIAG_COMM_LEN];
};
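
A minimal sketch of how a client might walk such a response, assuming
only what is stated above (one netlink message per task). It uses the
standard NLMSG_OK/NLMSG_NEXT macros from <linux/netlink.h>; the helper
name count_task_messages is hypothetical, and the payload of each
message (any family header and the attribute groups) is deliberately
not decoded, since its exact layout is defined by the patches.

```c
#include <linux/netlink.h>
#include <stddef.h>

/*
 * Count the data messages in one received buffer.
 * Stops at NLMSG_DONE or NLMSG_ERROR and reports that through *done.
 * Hypothetical helper for illustration; not part of the patch set.
 */
static int count_task_messages(void *buf, int len, int *done)
{
	struct nlmsghdr *nlh;
	int count = 0;

	*done = 0;
	for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
		if (nlh->nlmsg_type == NLMSG_DONE ||
		    nlh->nlmsg_type == NLMSG_ERROR) {
			*done = 1;
			break;
		}
		/* Per the description above, each message describes one task. */
		count++;
	}
	return count;
}
```

A caller would typically invoke this on every buffer returned by recv()
until *done is set, summing the returned counts.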

Another good feature of task_diag is the ability to request information
for several processes at once. Currently there are two strategies:
TASK_DIAG_DUMP_ALL	- get information for all tasks
TASK_DIAG_DUMP_CHILDREN	- get information for children of a specified
			  task
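
As a rough illustration, a request for the children of one task could be
filled in as sketched below. The structure is redeclared locally with
fixed-width types so the snippet stands alone. The EX_* constants are
placeholders invented for this example (the real values of
TASK_DIAG_SHOW_CRED and the dump strategies live in
include/uapi/linux/taskdiag.h), and fill_request is a hypothetical
helper.

```c
#include <stdint.h>
#include <string.h>

/* Local copy of the request layout quoted above (the patch uses __u64/__u32). */
struct task_diag_pid {
	uint64_t show_flags;	/* which groups of information are requested */
	uint64_t dump_stratagy;	/* which group of processes (spelled as in the patch) */
	uint32_t pid;
};

/* ASSUMPTION: placeholder values, not taken from the patch's header. */
#define EX_SHOW_CRED		(1ULL << 0)
#define EX_DUMP_ALL		0ULL
#define EX_DUMP_CHILDREN	1ULL

/* Hypothetical helper: build a request in caller-provided storage. */
static void fill_request(struct task_diag_pid *req, uint64_t strategy,
			 uint32_t pid, int want_creds)
{
	/* Zero everything first so no uninitialized padding reaches the kernel. */
	memset(req, 0, sizeof(*req));
	req->show_flags = want_creds ? EX_SHOW_CRED : 0;
	req->dump_stratagy = strategy;
	req->pid = pid;
}
```

The filled structure would then be sent as the payload of a netlink
request; how it is wrapped (family, command, attributes) is defined by
the patches and is not shown here.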

task_diag is much faster than the proc file system. We don't need to
create a new file descriptor for each task; we only need to send a
request and read the response. This makes it possible to get information
about many tasks in one request-response iteration.
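
For contrast, here is a sketch of the per-task procfs path that the
paragraph above refers to: every task costs at least one
open/read/close round trip plus text parsing. The helper names
parse_stat_ppid and read_ppid are hypothetical, and the code only
illustrates reading the parent PID from /proc/<pid>/stat; it is not
claimed to be what ps does internally.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Extract the parent PID from one /proc/<pid>/stat line, or -1 on error.
 * The comm field is wrapped in parentheses and may itself contain spaces
 * or ')', so parsing starts after the last ')'.
 */
static int parse_stat_ppid(const char *line)
{
	const char *p = strrchr(line, ')');
	char state;
	int ppid;

	if (!p || sscanf(p + 1, " %c %d", &state, &ppid) != 2)
		return -1;
	return ppid;
}

/* One open/read/close cycle per task: this is the cost task_diag avoids. */
static int read_ppid(pid_t pid)
{
	char path[64], buf[1024];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n <= 0)
		return -1;
	buf[n] = '\0';
	return parse_stat_ppid(buf);
}
```

Calling read_ppid() for each of N tasks issues about 3*N system calls,
whereas a dump-style netlink request returns many tasks per recv().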

I have compared performance of procfs and task-diag for the
"ps ax -o pid,ppid" command.

The test machine has 10348 processes.
$ ps ax -o pid,ppid | wc -l
10348

$ time ps ax -o pid,ppid > /dev/null

real	0m1.073s
user	0m0.086s
sys	0m0.903s

$ time ./task_diag_all > /dev/null

real	0m0.037s
user	0m0.004s
sys	0m0.020s

And here are statistics about syscalls which were called by each
command.
$ perf stat -e syscalls:sys_exit* -- ps ax -o pid,ppid  2>&1 | grep syscalls | sort -n -r | head -n 5
            20,713      syscalls:sys_exit_open
            20,710      syscalls:sys_exit_close
            20,708      syscalls:sys_exit_read
            10,348      syscalls:sys_exit_newstat
                31      syscalls:sys_exit_write

$ perf stat -e syscalls:sys_exit* -- ./task_diag_all  2>&1 | grep syscalls | sort -n -r | head -n 5
               114      syscalls:sys_exit_recvfrom
                49      syscalls:sys_exit_write
                 8      syscalls:sys_exit_mmap
                 4      syscalls:sys_exit_mprotect
                 3      syscalls:sys_exit_newfstat

You can find the test program from this experiment in the last patch.

The idea of this functionality was suggested by Pavel Emelyanov
(xemul@), when he found that operations with /proc form a significant
part of checkpointing time.

Ten years ago there was an attempt to add a netlink interface for
accessing /proc information:
http://lwn.net/Articles/99600/

Signed-off-by: Andrey Vagin <avagin@openvz.org>

git repo: https://github.com/avagin/linux-task-diag

Andrey Vagin (7):
  [RFC] kernel: add a netlink interface to get information about tasks
  kernel: move next_tgid from fs/proc
  task-diag: add ability to get information about all tasks
  task-diag: add a new group to get process credentials
  kernel: add ability to iterate children of a specified task
  task_diag: add ability to dump children
  selftest: check the task_diag functinonality

 fs/proc/array.c                                    |  58 +---
 fs/proc/base.c                                     |  43 ---
 include/linux/proc_fs.h                            |  13 +
 include/uapi/linux/taskdiag.h                      |  89 ++++++
 init/Kconfig                                       |  12 +
 kernel/Makefile                                    |   1 +
 kernel/pid.c                                       |  94 ++++++
 kernel/taskdiag.c                                  | 343 +++++++++++++++++++++
 tools/testing/selftests/task_diag/Makefile         |  16 +
 tools/testing/selftests/task_diag/task_diag.c      |  59 ++++
 tools/testing/selftests/task_diag/task_diag_all.c  |  82 +++++
 tools/testing/selftests/task_diag/task_diag_comm.c | 195 ++++++++++++
 tools/testing/selftests/task_diag/task_diag_comm.h |  47 +++
 tools/testing/selftests/task_diag/taskdiag.h       |   1 +
 14 files changed, 967 insertions(+), 86 deletions(-)
 create mode 100644 include/uapi/linux/taskdiag.h
 create mode 100644 kernel/taskdiag.c
 create mode 100644 tools/testing/selftests/task_diag/Makefile
 create mode 100644 tools/testing/selftests/task_diag/task_diag.c
 create mode 100644 tools/testing/selftests/task_diag/task_diag_all.c
 create mode 100644 tools/testing/selftests/task_diag/task_diag_comm.c
 create mode 100644 tools/testing/selftests/task_diag/task_diag_comm.h
 create mode 120000 tools/testing/selftests/task_diag/taskdiag.h

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Roger Luethi <rl@hellgate.ch>
-- 
2.1.0



Thread overview: 19+ messages
2015-02-19 13:00 [PATCH 0/7] [RFC] kernel: add a netlink interface to get information about processes Pavel Odintsov
2015-02-27 20:43 ` Arnaldo Carvalho de Melo
2015-02-27 20:54   ` David Ahern
2015-02-27 21:50     ` Arnaldo Carvalho de Melo
  -- strict thread matches above, loose matches on Subject: below --
2015-02-19 12:50 Pavel Odintsov
2015-02-17  8:20 Andrey Vagin
2015-02-17  8:53 ` Arnd Bergmann
2015-02-17 21:33   ` Andrew Vagin
2015-02-18 11:06     ` Arnd Bergmann
2015-02-18 12:42       ` Andrew Vagin
2015-02-18 14:46         ` Arnd Bergmann
2015-02-19 14:04           ` Andrew Vagin
2015-02-17 16:09 ` David Ahern
2015-02-17 20:32   ` Andrew Vagin
2015-02-17 19:05 ` Andy Lutomirski
2015-02-18 14:27   ` Andrew Vagin
2015-02-19  1:18     ` Andy Lutomirski
2015-02-19 21:39       ` Andrew Vagin
2015-02-20 20:33         ` Andy Lutomirski
