From: Jonathan Corbet <corbet@lwn.net>
To: 王贇 <yun.wang@linux.alibaba.com>
Cc: "Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
"Luis Chamberlain" <mcgrof@kernel.org>,
"Kees Cook" <keescook@chromium.org>,
"Iurii Zaikin" <yzaikin@google.com>,
"Michal Koutný" <mkoutny@suse.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org,
"Paul E. McKenney" <paulmck@linux.ibm.com>
Subject: Re: [PATCH 3/3] sched/numa: documentation for per-cgroup numa stat
Date: Wed, 13 Nov 2019 08:09:12 -0700
Message-ID: <20191113080912.041918ce@lwn.net>
In-Reply-To: <896a7da3-f139-32e7-8a64-b3562df1a091@linux.alibaba.com>

On Wed, 13 Nov 2019 11:45:59 +0800
王贇 <yun.wang@linux.alibaba.com> wrote:

> Add the description for 'cg_numa_stat', also a new doc to explain
> the details on how to deal with the per-cgroup numa statistics.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Michal Koutný <mkoutny@suse.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
> ---
> Documentation/admin-guide/cg-numa-stat.rst | 161 ++++++++++++++++++++++++
> Documentation/admin-guide/kernel-parameters.txt | 4 +
> Documentation/admin-guide/sysctl/kernel.rst | 9 ++
> 3 files changed, 174 insertions(+)
> create mode 100644 Documentation/admin-guide/cg-numa-stat.rst

Thanks for adding documentation for your new feature! When you add a new
RST file, though, you should also add it to index.rst so that it becomes a
part of the docs build.
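
For illustration only, the index.rst change could look something like the
sketch below. The neighboring entry and the position in the toctree are
assumptions, not taken from the patch; the new file can go wherever it fits
best among the existing entries.

```rst
.. Sketch only: the surrounding toctree options and the "cgroup-v2"
   neighbor are assumed for illustration, not copied from the patch.

.. toctree::
   :maxdepth: 1

   cgroup-v2
   cg-numa-stat
```

The entry is the file name without the .rst extension, relative to the
directory that holds index.rst.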

A couple of nits below...

> diff --git a/Documentation/admin-guide/cg-numa-stat.rst b/Documentation/admin-guide/cg-numa-stat.rst
> new file mode 100644
> index 000000000000..87b716c51e16
> --- /dev/null
> +++ b/Documentation/admin-guide/cg-numa-stat.rst
> @@ -0,0 +1,161 @@
> +===============================
> +Per-cgroup NUMA statistics
> +===============================
> +
> +Background
> +----------
> +
> +On NUMA platforms, remote memory accessing always has a performance penalty,
> +although we have NUMA balancing working hard to maximum the local accessing
> +proportion, there are still situations it can't helps.
> +
> +This could happen in modern production environment, using bunch of cgroups
> +to classify and control resources which introduced complex configuration on
> +memory policy, CPUs and NUMA node, NUMA balancing could facing the wrong
> +memory policy or exhausted local NUMA node, lead into the low local page
> +accessing proportion.
> +
> +We need to perceive such cases, figure out which workloads from which cgroup
> +has introduced the issues, then we got chance to do adjustment to avoid
> +performance damages.
> +
> +However, there are no hardware counter for per-task local/remote accessing
> +info, we don't know how many remote page accessing has been done for a
> +particular task.
> +
> +Statistics
> +----------
> +
> +Fortunately, we have NUMA Balancing which scan task's mapping and trigger PF
> +periodically, give us the opportunity to record per-task page accessing info.
> +
> +By "echo 1 > /proc/sys/kernel/cg_numa_stat" on runtime or add boot parameter
> +'cg_numa_stat', we will enable the accounting of per-cgroup numa statistics,
> +the 'cpu.numa_stat' entry of CPU cgroup will show statistics:
> +
> + locality -- execution time sectioned by task NUMA locality (in ms)
> + exectime -- execution time sectioned by NUMA node (in ms)
> +
> +We define 'task NUMA locality' as:
> +
> + nr_local_page_access * 100 / (nr_local_page_access + nr_remote_page_access)
> +
> +this per-task percentage value will be updated on the ticks for current task,
> +and the access counter will be updated on task's NUMA balancing PF, so only
> +the pages which NUMA Balancing paid attention to will be accounted.
> +
> +On each tick, we acquire the locality of current task on that CPU, accumulating
> +the ticks into the counter of corresponding locality region, tasks from the
> +same group sharing the counters, becoming the group locality.
> +
> +Similarly, we acquire the NUMA node of current CPU where the current task is
> +executing on, accumulating the ticks into the counter of corresponding node,
> +becoming the per-cgroup node execution time.
> +
> +To be noticed, the accounting is in a hierarchy way, which means the numa
> +statistics representing not only the workload of this group, but also the
> +workloads of all it's descendants.
> +
> +For example the 'cpu.numa_stat' show:
> + locality 39541 60962 36842 72519 118605 721778 946553
> + exectime 1220127 1458684

You almost certainly want that rendered as a literal block, so say
"show::". There are other places where you'll want to do that as well.
> +The locality is sectioned into 7 regions, closely as:
> + 0-13% 14-27% 28-42% 43-56% 57-71% 72-85% 86-100%
> +
> +And exectime is sectioned into 2 nodes, 0 and 1 in this case.
> +
> +Thus we know the workload of this group and it's descendants have totally
> +executed 1220127ms on node_0 and 1458684ms on node_1, tasks with locality
> +around 0~13% executed for 39541 ms, and tasks with locality around 87~100%
> +executed for 946553 ms, which imply most of the memory access are local.
> +
> +Monitoring
> +-----------------

A slightly long underline :)

I'll stop here; thanks again for adding documentation.

jon