public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Roman Gushchin <guro@fb.com>, Josef Bacik <jbacik@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Sasha Levin <alexander.levin@microsoft.com>
Subject: [PATCH AUTOSEL 4.18 48/48] mm: slowly shrink slabs with a relatively small number of objects
Date: Fri,  5 Oct 2018 12:14:24 -0400	[thread overview]
Message-ID: <20181005161424.20521-48-sashal@kernel.org> (raw)
In-Reply-To: <20181005161424.20521-1-sashal@kernel.org>

From: Roman Gushchin <guro@fb.com>

[ Upstream commit 172b06c32b949759fe6313abec514bc4f15014f4 ]

9092c71bb724 ("mm: use sc->priority for slab shrink targets") changed the
way that the target slab pressure is calculated and made it
priority-based:

    delta = freeable >> priority;
    delta *= 4;
    do_div(delta, shrinker->seeks);

The problem is that on a default priority (which is 12) no pressure is
applied at all, if the number of potentially reclaimable objects is less
than 4096 (1<<12).

This causes the last objects on slab caches of no longer used cgroups to
(almost) never get reclaimed.  It's obviously a waste of memory.

It can be especially painful, if these stale objects are holding a
reference to a dying cgroup.  Slab LRU lists are reparented on memcg
offlining, but corresponding objects are still holding a reference to the
dying cgroup.  If we don't scan these objects, the dying cgroup can't go
away.  Most likely, the parent cgroup hasn't any directly charged objects,
only remaining objects from dying children cgroups.  So it can easily hold
a reference to hundreds of dying cgroups.

If there are no big spikes in memory pressure, and new memory cgroups are
created and destroyed periodically, this causes the number of dying
cgroups grow steadily, causing a slow-ish and hard-to-detect memory
"leak".  It's not a real leak, as the memory can be eventually reclaimed,
but it could not happen in a real life at all.  I've seen hosts with a
steadily climbing number of dying cgroups, which doesn't show any signs of
a decline in months, despite the host is loaded with a production
workload.

It is an obvious waste of memory, and to prevent it, let's apply a minimal
pressure even on small shrinker lists.  E.g.  if there are freeable
objects, let's scan at least min(freeable, scan_batch) objects.

This fix significantly improves a chance of a dying cgroup to be
reclaimed, and together with some previous patches stops the steady growth
of the dying cgroups number on some of our hosts.

Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com
Fixes: 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 mm/vmscan.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..fc0436407471 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -386,6 +386,17 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	delta = freeable >> priority;
 	delta *= 4;
 	do_div(delta, shrinker->seeks);
+
+	/*
+	 * Make sure we apply some minimal pressure on default priority
+	 * even on small cgroups. Stale objects are not only consuming memory
+	 * by themselves, but can also hold a reference to a dying cgroup,
+	 * preventing it from being reclaimed. A dying cgroup with all
+	 * corresponding structures like per-cpu stats and kmem caches
+	 * can be really big, so it may lead to a significant waste of memory.
+	 */
+	delta = max_t(unsigned long long, delta, min(freeable, batch_size));
+
 	total_scan += delta;
 	if (total_scan < 0) {
 		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
-- 
2.17.1


      parent reply	other threads:[~2018-10-05 16:15 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-05 16:13 [PATCH AUTOSEL 4.18 01/48] ASoC: dapm: Fix NULL pointer deference on CODEC to CODEC DAIs Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 02/48] ASoC: max98373: Added speaker FS gain cotnrol register to volatile Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 03/48] ASoC: rt5514: Fix the issue of the delay volume applied again Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 04/48] selftests: android: move config up a level Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 05/48] selftests: kselftest: Remove outdated comment Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 06/48] ASoC: max98373: Added 10ms sleep after amp software reset Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 07/48] ASoC: wm8804: Add ACPI support Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 08/48] ASoC: sigmadsp: safeload should not have lower byte limit Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 09/48] ASoC: q6routing: initialize data correctly Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 10/48] selftests: add headers_install to lib.mk Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 11/48] selftests/efivarfs: add required kernel configs Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 12/48] selftests: memory-hotplug: add required configs Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 13/48] ASoC: rsnd: adg: care clock-frequency size Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 14/48] ASoC: rsnd: don't fallback to PIO mode when -EPROBE_DEFER Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 15/48] hwmon: (nct6775) Fix access to fan pulse registers Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 16/48] Fix cg_read_strcmp() Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 17/48] Add tests for memory.oom.group Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 18/48] ASoC: AMD: Ensure reset bit is cleared before configuring Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 19/48] drm/pl111: Make sure of_device_id tables are NULL terminated Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 20/48] Bluetooth: SMP: Fix trying to use non-existent local OOB data Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 21/48] Bluetooth: Use correct tfm to generate " Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 22/48] Bluetooth: hci_ldisc: Free rw_semaphore on close Sasha Levin
2018-10-05 16:13 ` [PATCH AUTOSEL 4.18 23/48] mfd: omap-usb-host: Fix dts probe of children Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 24/48] KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 25/48] scsi: iscsi: target: Don't use stack buffer for scatterlist Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 26/48] scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted() Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 27/48] sound: enable interrupt after dma buffer initialization Sasha Levin
2018-10-08  9:34   ` Mark Brown
2018-10-08  9:36     ` Takashi Iwai
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 28/48] sound: don't call skl_init_chip() to reset intel skl soc Sasha Levin
2018-10-08  9:34   ` Mark Brown
2018-10-08  9:37     ` Takashi Iwai
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 29/48] bpf: btf: Fix end boundary calculation for type section Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 30/48] bpf: use __GFP_COMP while allocating page Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 31/48] hwmon: (nct6775) Fix virtual temperature sources for NCT6796D Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 32/48] hwmon: (nct6775) Fix RPM output for fan7 on NCT6796D Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 33/48] stmmac: fix valid numbers of unicast filter entries Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 34/48] hwmon: (nct6775) Use different register to get fan RPM for fan7 Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 35/48] net: ethernet: ti: add missing GENERIC_ALLOCATOR dependency Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 36/48] net: macb: disable scatter-gather for macb on sama5d3 Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 37/48] ARM: dts: at91: add new compatibility string " Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 38/48] PCI: hv: support reporting serial number as slot information Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 39/48] hv_netvsc: pair VF based on serial number Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 40/48] clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 41/48] clk: x86: Stop marking clocks as CLK_IS_CRITICAL Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 42/48] pinctrl: cannonlake: Fix gpio base for GPP-E Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 43/48] x86/kvm/lapic: always disable MMIO interface in x2APIC mode Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 44/48] drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7 Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 45/48] drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9 Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 46/48] drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs Sasha Levin
2018-10-05 16:14 ` [PATCH AUTOSEL 4.18 47/48] ubifs: Check for name being NULL while mounting Sasha Levin
2018-10-05 16:14 ` Sasha Levin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181005161424.20521-48-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.levin@microsoft.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=jbacik@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=stable@vger.kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox