From: Bharata B Rao <bharata@linux.ibm.com>
To: Roman Gushchin <guro@fb.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeelb@google.com>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
linux-kernel@vger.kernel.org, kernel-team@fb.com,
Yafang Shao <laoar.shao@gmail.com>
Subject: Re: [PATCH v2 00/28] The new cgroup slab memory controller
Date: Thu, 30 Jan 2020 07:36:26 +0530 [thread overview]
Message-ID: <20200130020626.GA21973@in.ibm.com> (raw)
In-Reply-To: <20200127173453.2089565-1-guro@fb.com>
On Mon, Jan 27, 2020 at 09:34:25AM -0800, Roman Gushchin wrote:
> The existing cgroup slab memory controller is based on the idea of
> replicating slab allocator internals for each memory cgroup.
> This approach promises a low memory overhead (one pointer per page),
> and isn't adding too much code on hot allocation and release paths.
> But is has a very serious flaw: it leads to a low slab utilization.
>
> Using a drgn* script I've got an estimation of slab utilization on
> a number of machines running different production workloads. In most
> cases it was between 45% and 65%, and the best number I've seen was
> around 85%. Turning kmem accounting off brings it to high 90s. Also
> it brings back 30-50% of slab memory. It means that the real price
> of the existing slab memory controller is way bigger than a pointer
> per page.
>
> The real reason why the existing design leads to a low slab utilization
> is simple: slab pages are used exclusively by one memory cgroup.
> If there are only few allocations of certain size made by a cgroup,
> or if some active objects (e.g. dentries) are left after the cgroup is
> deleted, or the cgroup contains a single-threaded application which is
> barely allocating any kernel objects, but does it every time on a new CPU:
> in all these cases the resulting slab utilization is very low.
> If kmem accounting is off, the kernel is able to use free space
> on slab pages for other allocations.
>
> Arguably it wasn't an issue back to days when the kmem controller was
> introduced and was an opt-in feature, which had to be turned on
> individually for each memory cgroup. But now it's turned on by default
> on both cgroup v1 and v2. And modern systemd-based systems tend to
> create a large number of cgroups.
>
> This patchset provides a new implementation of the slab memory controller,
> which aims to reach a much better slab utilization by sharing slab pages
> between multiple memory cgroups. Below is the short description of the new
> design (more details in commit messages).
>
> Accounting is performed per-object instead of per-page. Slab-related
> vmstat counters are converted to bytes. Charging is performed on page-basis,
> with rounding up and remembering leftovers.
>
> Memcg ownership data is stored in a per-slab-page vector: for each slab page
> a vector of corresponding size is allocated. To keep slab memory reparenting
> working, instead of saving a pointer to the memory cgroup directly an
> intermediate object is used. It's simply a pointer to a memcg (which can be
> easily changed to the parent) with a built-in reference counter. This scheme
> allows to reparent all allocated objects without walking them over and
> changing memcg pointer to the parent.
>
> Instead of creating an individual set of kmem_caches for each memory cgroup,
> two global sets are used: the root set for non-accounted and root-cgroup
> allocations and the second set for all other allocations. This allows to
> simplify the lifetime management of individual kmem_caches: they are
> destroyed with root counterparts. It allows to remove a good amount of code
> and make things generally simpler.
>
> The patchset* has been tested on a number of different workloads in our
> production. In all cases it saved significant amount of memory, measured
> from high hundreds of MBs to single GBs per host. On average, the size
> of slab memory has been reduced by 35-45%.
Here are some numbers from multiple runs of sysbench and kernel compilation
with this patchset on a 10 core POWER8 host:
==========================================================================
Peak usage of memory.kmem.usage_in_bytes, memory.usage_in_bytes and
meminfo:Slab for Sysbench oltp_read_write with mysqld running as part
of a mem cgroup (Sampling every 5s)
--------------------------------------------------------------------------
5.5.0-rc7-mm1 +slab patch %reduction
--------------------------------------------------------------------------
memory.kmem.usage_in_bytes 15859712 4456448 72
memory.usage_in_bytes 337510400 335806464 .5
Slab: (kB) 814336 607296 25
memory.kmem.usage_in_bytes 16187392 4653056 71
memory.usage_in_bytes 318832640 300154880 5
Slab: (kB) 789888 559744 29
--------------------------------------------------------------------------
Peak usage of memory.kmem.usage_in_bytes, memory.usage_in_bytes and
meminfo:Slab for kernel compilation (make -s -j64) Compilation was
done from bash that is in a memory cgroup. (Sampling every 5s)
--------------------------------------------------------------------------
5.5.0-rc7-mm1 +slab patch %reduction
--------------------------------------------------------------------------
memory.kmem.usage_in_bytes 338493440 231931904 31
memory.usage_in_bytes 7368015872 6275923968 15
Slab: (kB) 1139072 785408 31
memory.kmem.usage_in_bytes 341835776 236453888 30
memory.usage_in_bytes 6540427264 6072893440 7
Slab: (kB) 1074304 761280 29
memory.kmem.usage_in_bytes 340525056 233570304 31
memory.usage_in_bytes 6406209536 6177357824 3
Slab: (kB) 1244288 739712 40
--------------------------------------------------------------------------
Slab consumption right after boot
--------------------------------------------------------------------------
5.5.0-rc7-mm1 +slab patch %reduction
--------------------------------------------------------------------------
Slab: (kB) 821888 583424 29
==========================================================================
Summary:
With sysbench and kernel compilation, memory.kmem.usage_in_bytes shows
around 70% and 30% reduction consistently.
Didn't see consistent reduction of memory.usage_in_bytes with sysbench and
kernel compilation.
Slab usage (from /proc/meminfo) shows consistent 30% reduction and the
same is seen right after boot too.
Regards,
Bharata.
next prev parent reply other threads:[~2020-01-30 2:06 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-27 17:34 [PATCH v2 00/28] The new cgroup slab memory controller Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 01/28] mm: kmem: cleanup (__)memcg_kmem_charge_memcg() arguments Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 02/28] mm: kmem: cleanup memcg_kmem_uncharge_memcg() arguments Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 03/28] mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 04/28] mm: kmem: switch to nr_pages in (__)memcg_kmem_charge_memcg() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 05/28] mm: memcg/slab: cache page number in memcg_(un)charge_slab() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 06/28] mm: kmem: rename (__)memcg_kmem_(un)charge_memcg() to __memcg_kmem_(un)charge() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 07/28] mm: memcg/slab: introduce mem_cgroup_from_obj() Roman Gushchin
2020-02-03 16:05 ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 08/28] mm: fork: fix kernel_stack memcg stats for various stack implementations Roman Gushchin
2020-02-03 16:12 ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 09/28] mm: memcg/slab: rename __mod_lruvec_slab_state() into __mod_lruvec_obj_state() Roman Gushchin
2020-02-03 16:13 ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 10/28] mm: memcg: introduce mod_lruvec_memcg_state() Roman Gushchin
2020-02-03 17:39 ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 11/28] mm: slub: implement SLUB version of obj_to_index() Roman Gushchin
2020-02-03 17:44 ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 12/28] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat Roman Gushchin
2020-02-03 17:58 ` Johannes Weiner
2020-02-03 18:25 ` Roman Gushchin
2020-02-03 20:34 ` Johannes Weiner
2020-02-03 22:28 ` Roman Gushchin
2020-02-03 22:39 ` Johannes Weiner
2020-02-04 1:44 ` Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 13/28] mm: vmstat: convert slab vmstat counter to bytes Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 14/28] mm: memcontrol: decouple reference counting from page accounting Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 15/28] mm: memcg/slab: obj_cgroup API Roman Gushchin
2020-02-03 19:31 ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 16/28] mm: memcg/slab: allocate obj_cgroups for non-root slab pages Roman Gushchin
2020-02-03 18:27 ` Johannes Weiner
2020-02-03 18:34 ` Roman Gushchin
2020-02-03 20:46 ` Johannes Weiner
2020-02-03 21:19 ` Roman Gushchin
2020-02-03 22:29 ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 17/28] mm: memcg/slab: save obj_cgroup for non-root slab objects Roman Gushchin
2020-02-03 19:53 ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 18/28] mm: memcg/slab: charge individual slab objects instead of pages Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 19/28] mm: memcg/slab: deprecate memory.kmem.slabinfo Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 20/28] mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 21/28] mm: memcg/slab: use a single set of kmem_caches for all memory cgroups Roman Gushchin
2020-02-03 19:50 ` Johannes Weiner
2020-02-03 20:58 ` Roman Gushchin
2020-02-03 22:17 ` Johannes Weiner
2020-02-03 22:38 ` Roman Gushchin
2020-02-04 1:15 ` Roman Gushchin
2020-02-04 2:47 ` Johannes Weiner
2020-02-04 4:35 ` Roman Gushchin
2020-02-04 18:41 ` Johannes Weiner
2020-02-05 15:58 ` Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 22/28] mm: memcg/slab: simplify memcg cache creation Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 23/28] mm: memcg/slab: deprecate memcg_kmem_get_cache() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 24/28] mm: memcg/slab: deprecate slab_root_caches Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 25/28] mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 26/28] tools/cgroup: add slabinfo.py tool Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 27/28] tools/cgroup: make slabinfo.py compatible with new slab controller Roman Gushchin
2020-01-30 2:17 ` Bharata B Rao
2020-01-30 2:44 ` Roman Gushchin
2020-01-31 22:24 ` Roman Gushchin
2020-02-12 5:21 ` Bharata B Rao
2020-02-12 20:42 ` Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 28/28] kselftests: cgroup: add kernel memory accounting tests Roman Gushchin
2020-01-30 2:06 ` Bharata B Rao [this message]
2020-01-30 2:41 ` [PATCH v2 00/28] The new cgroup slab memory controller Roman Gushchin
2020-08-12 23:16 ` Pavel Tatashin
2020-08-12 23:18 ` Pavel Tatashin
2020-08-13 0:04 ` Roman Gushchin
2020-08-13 0:31 ` Pavel Tatashin
2020-08-28 16:47 ` Pavel Tatashin
2020-09-01 5:28 ` Bharata B Rao
2020-09-01 12:52 ` Pavel Tatashin
2020-09-02 6:23 ` Bharata B Rao
2020-09-02 12:34 ` Pavel Tatashin
2020-09-02 9:53 ` Vlastimil Babka
2020-09-02 10:39 ` David Hildenbrand
2020-09-02 12:42 ` Pavel Tatashin
2020-09-02 13:50 ` Michal Hocko
2020-09-02 14:20 ` Pavel Tatashin
2020-09-03 18:09 ` David Hildenbrand
2020-09-02 11:26 ` Michal Hocko
2020-09-02 12:51 ` Pavel Tatashin
2020-09-02 13:51 ` Michal Hocko
2020-09-02 11:32 ` Michal Hocko
2020-09-02 12:53 ` Pavel Tatashin
2020-09-02 13:52 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200130020626.GA21973@in.ibm.com \
--to=bharata@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=laoar.shao@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=shakeelb@google.com \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.