All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eugene Lubarsky <elubarsky.linux@gmail.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, adobriyan@gmail.com,
	avagin@gmail.com, dsahern@gmail.com
Subject: Re: [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes
Date: Tue, 25 Aug 2020 19:59:09 +1000	[thread overview]
Message-ID: <20200825195909.1d1dcd72@eug-lubuntu> (raw)
In-Reply-To: <20200810154132.GA4171851@kroah.com>

On Mon, 10 Aug 2020 17:41:32 +0200
Greg KH <gregkh@linuxfoundation.org> wrote:

> On Tue, Aug 11, 2020 at 01:27:00AM +1000, Eugene Lubarsky wrote:
> > On Mon, 10 Aug 2020 17:04:53 +0200
> > Greg KH <gregkh@linuxfoundation.org> wrote:  
> And have you benchmarked any of this?  Try working with the common
> tools that want this information and see if it actually is noticeable
> (hint, I have been doing that with the readfile work and it's
> surprising what the results are in places...)

Apologies for the delay. Here are some benchmarks with atop.

Patch to atop at: https://github.com/eug48/atop/commits/proc-all
Patch to add /proc/all/schedstat & cpuset below.
atop not collecting threads & cmdline as /proc/all/ doesn't support it.
10,000 processes, kernel 5.8, nested KVM, 2 cores of i7-6700HQ @ 2.60GHz

# USE_PROC_ALL=0 ./atop -w test 1 &
# pidstat -p $(pidof atop) 1

01:33:05   %usr %system  %guest   %wait    %CPU   CPU  Command
01:33:06  33.66   33.66    0.00    0.99   67.33     1  atop
01:33:07  33.00   32.00    0.00    2.00   65.00     0  atop
01:33:08  34.00   31.00    0.00    1.00   65.00     0  atop
...
Average:  33.15   32.79    0.00    1.09   65.94     -  atop


# USE_PROC_ALL=1 ./atop -w test 1 &
# pidstat -p $(pidof atop) 1

01:33:33   %usr %system  %guest   %wait    %CPU   CPU  Command
01:33:34  28.00   14.00    0.00    1.00   42.00     1  atop
01:33:35  28.00   14.00    0.00    0.00   42.00     1  atop
01:33:36  26.00   13.00    0.00    0.00   39.00     1  atop
...
Average:  27.08   12.86    0.00    0.35   39.94     -  atop

So CPU usage goes down from ~65% to ~40%.

Data collection times in milliseconds are:

# xsv cat columns proc.csv procall.csv \
> | xsv stats \
> | xsv select field,min,max,mean,stddev \
> | xsv table
field           min  max  mean     stddev
/proc time      558  625  586.59   18.29
/proc/all time  231  262  243.56   8.02

Much performance optimisation can still be done, e.g. the modified atop
uses fgets which is reading 1KB at a time, and seq_file seems to only
return 4KB pages. task_diag should be much faster still.

I'd imagine this sort of thing would be useful for daemons monitoring
large numbers of processes. I don't run such systems myself; my initial
motivation was frustration with the Kubernetes kubelet having ~2-4% CPU
usage even with a couple of containers. Basic profiling suggests syscalls
have a lot to do with it - it's actually reading loads of tiny cgroup files
and enumerating many directories every 10 seconds, but /proc has similar
issues and seemed easier to start with.

Anyway, I've read that io_uring could also help here in the near future,
which would be really cool especially if there was a way to enumerate
directories and read many files regex-style in a single operation,
e.g. /proc/[0-9].*/(stat|statm|io)

> > Currently I'm trying to re-use the existing code in fs/proc that
> > controls which PIDs are visible, but may well be missing
> > something..  
> 
> Try it out and see if it works correctly.  And pid namespaces are not
> the only thing these days from what I call :)
> 
I've tried `unshare --fork --pid --mount-proc cat /proc/all/stat`
which seems to behave correctly. ptrace flags are handled by the
existing code.


Best Wishes,
Eugene


From 2ffc2e388f7ce4e3f182c2442823e5f13bae03dd Mon Sep 17 00:00:00 2001
From: Eugene Lubarsky <elubarsky.linux@gmail.com>
Date: Tue, 25 Aug 2020 12:36:41 +1000
Subject: [RFC PATCH] fs/proc: /proc/all: add schedstat and cpuset

Signed-off-by: Eugene Lubarsky <elubarsky.linux@gmail.com>
---
 fs/proc/base.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0bba4b3a985e..44d73f1ade4a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3944,6 +3944,36 @@ static int proc_all_io(struct seq_file *m, void *v)
 }
 #endif
 
+#ifdef CONFIG_PROC_PID_CPUSET
+static int proc_all_cpuset(struct seq_file *m, void *v)
+{
+	struct all_iter *iter = (struct all_iter *) v;
+	struct pid_namespace *ns = iter->ns;
+	struct task_struct *task = iter->tgid_iter.task;
+	struct pid *pid = task->thread_pid;
+
+	seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
+	seq_puts(m, " ");
+
+	return proc_cpuset_show(m, ns, pid, task);
+}
+#endif
+
+#ifdef CONFIG_SCHED_INFO
+static int proc_all_schedstat(struct seq_file *m, void *v)
+{
+	struct all_iter *iter = (struct all_iter *) v;
+	struct pid_namespace *ns = iter->ns;
+	struct task_struct *task = iter->tgid_iter.task;
+	struct pid *pid = task->thread_pid;
+
+	seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
+	seq_puts(m, " ");
+
+	return proc_pid_schedstat(m, ns, pid, task);
+}
+#endif
+
 static int proc_all_statx(struct seq_file *m, void *v)
 {
 	struct all_iter *iter = (struct all_iter *) v;
@@ -3990,6 +4020,12 @@ PROC_ALL_OPS(status);
 #ifdef CONFIG_TASK_IO_ACCOUNTING
 	PROC_ALL_OPS(io);
 #endif
+#ifdef CONFIG_SCHED_INFO
+	PROC_ALL_OPS(schedstat);
+#endif
+#ifdef CONFIG_PROC_PID_CPUSET
+	PROC_ALL_OPS(cpuset);
+#endif
 
 #define PROC_ALL_CREATE(NAME) \
 	do { \
@@ -4011,4 +4047,10 @@ void __init proc_all_init(void)
 #ifdef CONFIG_TASK_IO_ACCOUNTING
 	PROC_ALL_CREATE(io);
 #endif
+#ifdef CONFIG_SCHED_INFO
+	PROC_ALL_CREATE(schedstat);
+#endif
+#ifdef CONFIG_PROC_PID_CPUSET
+	PROC_ALL_CREATE(cpuset);
+#endif
 }
-- 
2.25.1


  reply	other threads:[~2020-08-25  9:59 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-10 14:58 [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 1/5] fs/proc: Introduce /proc/all/stat Eugene Lubarsky
2020-08-10 18:21   ` kernel test robot
2020-08-10 14:58 ` [RFC PATCH 2/5] fs/proc: Introduce /proc/all/statm Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 3/5] fs/proc: Introduce /proc/all/status Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 4/5] fs/proc: Introduce /proc/all/io Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 5/5] fs/proc: Introduce /proc/all/statx Eugene Lubarsky
2020-08-10 15:04 ` [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes Greg KH
2020-08-10 15:27   ` Eugene Lubarsky
2020-08-10 15:41     ` Greg KH
2020-08-25  9:59       ` Eugene Lubarsky [this message]
2020-08-12  7:51 ` Andrei Vagin
2020-08-13  4:47   ` David Ahern
2020-08-13  8:03     ` Andrei Vagin
2020-08-13 15:01   ` Eugene Lubarsky
2020-08-20 17:41     ` Andrei Vagin
2020-08-25 10:00       ` Eugene Lubarsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200825195909.1d1dcd72@eug-lubuntu \
    --to=elubarsky.linux@gmail.com \
    --cc=adobriyan@gmail.com \
    --cc=avagin@gmail.com \
    --cc=dsahern@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.