From: Richard Palethorpe <rpalethorpe@suse.de>
To: ltp@lists.linux.it
Subject: [LTP] [RESEND PATCH 1/4] controllers/memcg: account per-node kernel memory
Date: Wed, 11 Aug 2021 15:42:55 +0100 [thread overview]
Message-ID: <87v94ckpow.fsf@suse.de> (raw)
In-Reply-To: <20210811101058.36695-2-krzysztof.kozlowski@canonical.com>
Hello Krzysztof,
Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> writes:
> Recent Linux kernels () charge groups also with kernel memory. This is
> not limited only to process-allocated memory but also cgroup-handling
> code memory as well.
>
> For example since kernel v5.9 with commit 3e38e0aaca9e ("mm: memcg:
> charge memcg percpu memory to the parent cgroup") creating a subgroup
> causes several kernel allocations towards this group.
>
> These additional kernel memory allocations are proportional to number of
> CPUs and number of nodes.
>
> On c4.8xlarge AWS instance with 36 cores in two nodes with v5.11 Linux
> kernel the memcg_subgroup_charge and memcg_use_hierarchy_test tests were
> failing:
>
> memcg_use_hierarchy_test 1 TINFO: timeout per run is 0h 5m 0s
> memcg_use_hierarchy_test 1 TINFO: set /dev/memcg/memory.use_hierarchy to 0 failed
> memcg_use_hierarchy_test 1 TINFO: test if one of the ancestors goes over its limit, the proces will be killed
> mkdir: cannot create directory ?subgroup?: Cannot allocate memory
> /home/ubuntu/ltp-install/testcases/bin/memcg_use_hierarchy_test.sh: 26: cd: can't cd to subgroup
> memcg_use_hierarchy_test 1 TINFO: Running memcg_process --mmap-lock1 -s 8192
> memcg_use_hierarchy_test 1 TFAIL: process is not killed
> rmdir: failed to remove 'subgroup': No such file or directory
>
> The kernel was unable to create the subgroup (mkdir returned -ENOMEM)
> due to this additional per-node kernel memory allocations.
>
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
> ---
> .../controllers/memcg/functional/memcg_lib.sh | 44 +++++++++++++++++++
> .../memcg/functional/memcg_subgroup_charge.sh | 8 +---
> .../functional/memcg_use_hierarchy_test.sh | 8 +++-
> 3 files changed, 52 insertions(+), 8 deletions(-)
>
> diff --git a/testcases/kernel/controllers/memcg/functional/memcg_lib.sh b/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> index dad66c798e19..700e9e367bff 100755
> --- a/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> +++ b/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> @@ -63,6 +63,50 @@ memcg_require_hierarchy_disabled()
> fi
> }
>
> +# Kernel memory allocated for the process is also charged. It might depend on
> +# the number of CPUs and number of nodes. For example on kernel v5.11
> +# additionally total_cpus (plus 1 or 2) pages are charged to the group via
> +# kernel memory. For a two-node machine, additional 108 pages kernel memory
> +# are charged to the group.
> +#
> +# Adjust the limit to account such per-CPU and per-node kernel memory.
> +# $1 - variable name with limit to adjust
> +memcg_adjust_limit_for_kmem()
> +{
> + [ $# -ne 1 ] && tst_brk TBROK "memcg_adjust_limit_for_kmem expects 1 parameter"
> + eval "local _limit=\$$1"
Could we do this a simpler way?
It would be much easier to read if we just returned the value which
needed to be added.
> +
> + # Total number of CPUs
> + local total_cpus=`tst_ncpus`
> +
> + # Get the number of NODES
Is it acceptable or necessary to use /sys/devices/system/node/possible
(or online) instead?
> + if [ -f "/sys/devices/system/node/has_high_memory" ]; then
> + local mem_string="`cat /sys/devices/system/node/has_high_memory`"
> + else
> + local mem_string="`cat /sys/devices/system/node/has_normal_memory`"
> + fi
> +
> + local total_nodes="`echo $mem_string | tr ',' ' '`"
> + local count=0
> + for item in $total_nodes; do
> + local delta=1
> + if [ "${item#*-*}" != "$item" ]; then
> + delta=$((${item#*-*} - ${item%*-*} + 1))
> + fi
> + count=$((count + $delta))
> + done
Or perhaps we could count the number of 'node[0-9]+' directories? I
think that would be easier to understand.
> + total_nodes=$count
> + # Additional nodes impose charging the kmem, not having regular one node
> + local node_mem=0
> + if [ $total_nodes -gt 1 ]; then
> + node_mem=$((total_nodes - 1))
> + node_mem=$((node_mem * PAGESIZE * 128))
> + fi
> +
> + eval "$1='$((_limit + 4 * PAGESIZE + total_cpus * PAGESIZE + node_mem))'"
> + return 0
> +}
Otherwise looks good.
--
Thank you,
Richard.
next prev parent reply other threads:[~2021-08-11 14:42 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-11 10:10 [LTP] [RESEND PATCH 0/4] controllers/memcg: fixes for CPU nodes on newer kernels Krzysztof Kozlowski
2021-08-11 10:10 ` [LTP] [RESEND PATCH 1/4] controllers/memcg: account per-node kernel memory Krzysztof Kozlowski
2021-08-11 14:42 ` Richard Palethorpe [this message]
2021-08-12 6:45 ` Krzysztof Kozlowski
2021-08-12 7:53 ` Richard Palethorpe
2021-08-12 7:55 ` Krzysztof Kozlowski
2021-08-11 10:10 ` [LTP] [RESEND PATCH 2/4] controllers/memcg: fix memcg_stat_test on two node machines Krzysztof Kozlowski
2021-08-11 10:10 ` [LTP] [RESEND PATCH 3/4] controllers/memcg: fail early to avoid possible false-positives Krzysztof Kozlowski
2021-08-11 10:10 ` [LTP] [RESEND PATCH 4/4] controllers/memcg: check status of commands using interface Krzysztof Kozlowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v94ckpow.fsf@suse.de \
--to=rpalethorpe@suse.de \
--cc=ltp@lists.linux.it \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.