public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Nanyong Sun <sunnanyong@huawei.com>
Subject: [PATCH 5.4 50/71] mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
Date: Thu, 22 Jul 2021 18:31:25 +0200	[thread overview]
Message-ID: <20210722155619.545713554@linuxfoundation.org> (raw)
In-Reply-To: <20210722155617.865866034@linuxfoundation.org>

From: Nanyong Sun <sunnanyong@huawei.com>

The commit d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root
kmem_cache destroy") introduced a problem: If one thread destroy a
kmem_cache A and another thread concurrently create a kmem_cache B,
which is mergeable with A and has same size with A, the B may fail to
create due to the duplicate sysfs node.
The scenario in detail:
1) Thread 1 uses kmem_cache_destroy() to destroy kmem_cache A which is
mergeable, it decreases A's refcount and if refcount is 0, then call
memcg_set_kmem_cache_dying() which set A->memcg_params.dying = true,
then unlock the slab_mutex and call flush_memcg_workqueue(), it may cost
a while.
Note: now the sysfs node(like '/kernel/slab/:0000248') of A is still
present, it will be deleted in shutdown_cache() which will be called
after flush_memcg_workqueue() is done and lock the slab_mutex again.
2) Now if thread 2 is coming, it use kmem_cache_create() to create B, which
is mergeable with A(their size is same), it gain the lock of slab_mutex,
then call __kmem_cache_alias() trying to find a mergeable node, because
of the below added code in commit d38a2b7a9c93 ("mm: memcg/slab: fix
memory leak at non-root kmem_cache destroy"), B is not mergeable with
A whose memcg_params.dying is true.

int slab_unmergeable(struct kmem_cache *s)
 	if (s->refcount < 0)
 		return 1;

	/*
	 * Skip the dying kmem_cache.
	 */
	if (s->memcg_params.dying)
		return 1;

 	return 0;
 }

So B has to create its own sysfs node by calling:
 create_cache->
	__kmem_cache_create->
		sysfs_slab_add->
			kobject_init_and_add
Because B is mergeable itself, its filename of sysfs node is based on its size,
like '/kernel/slab/:0000248', which is duplicate with A, and the sysfs
node of A is still present now, so kobject_init_and_add() will return
fail and result in kmem_cache_create() fail.

Concurrently modprobe and rmmod the two modules below can reproduce the issue
quickly: nf_conntrack_expect, se_sess_cache. See call trace in the end.

LTS versions of v4.19.y and v5.4.y have this problem, whereas linux versions after
v5.9 do not have this problem because the patchset: ("The new cgroup slab memory
controller") almost refactored memcg slab.

A potential solution(this patch belongs): Just let the dying kmem_cache be mergeable,
the slab_mutex lock can prevent the race between alias kmem_cache creating thread
and root kmem_cache destroying thread. In the destroying thread, after
flush_memcg_workqueue() is done, judge the refcount again, if someone
reference it again during un-lock time, we don't need to destroy the kmem_cache
completely, we can reuse it.

Another potential solution: revert the commit d38a2b7a9c93 ("mm: memcg/slab:
fix memory leak at non-root kmem_cache destroy"), compare to the fail of
kmem_cache_create, the memory leak in special scenario seems less harmful.

Call trace:
 sysfs: cannot create duplicate filename '/kernel/slab/:0000248'
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xb0/0x100
  sysfs_warn_dup+0x6c/0x88
  sysfs_create_dir_ns+0x104/0x120
  kobject_add_internal+0xd0/0x378
  kobject_init_and_add+0x90/0xd8
  sysfs_slab_add+0x16c/0x2d0
  __kmem_cache_create+0x16c/0x1d8
  create_cache+0xbc/0x1f8
  kmem_cache_create_usercopy+0x1a0/0x230
  kmem_cache_create+0x50/0x68
  init_se_kmem_caches+0x38/0x258 [target_core_mod]
  target_core_init_configfs+0x8c/0x390 [target_core_mod]
  do_one_initcall+0x54/0x230
  do_init_module+0x64/0x1ec
  load_module+0x150c/0x16f0
  __se_sys_finit_module+0xf0/0x108
  __arm64_sys_finit_module+0x24/0x30
  el0_svc_common+0x80/0x1c0
  el0_svc_handler+0x78/0xe0
  el0_svc+0x10/0x260
 kobject_add_internal failed for :0000248 with -EEXIST, don't try to register things with the same name in the same directory.
 kmem_cache_create(se_sess_cache) failed with error -17
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xb0/0x100
  kmem_cache_create_usercopy+0xa8/0x230
  kmem_cache_create+0x50/0x68
  init_se_kmem_caches+0x38/0x258 [target_core_mod]
  target_core_init_configfs+0x8c/0x390 [target_core_mod]
  do_one_initcall+0x54/0x230
  do_init_module+0x64/0x1ec
  load_module+0x150c/0x16f0
  __se_sys_finit_module+0xf0/0x108
  __arm64_sys_finit_module+0x24/0x30
  el0_svc_common+0x80/0x1c0
  el0_svc_handler+0x78/0xe0
  el0_svc+0x10/0x260

Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/slab_common.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -325,14 +325,6 @@ int slab_unmergeable(struct kmem_cache *
 	if (s->refcount < 0)
 		return 1;
 
-#ifdef CONFIG_MEMCG_KMEM
-	/*
-	 * Skip the dying kmem_cache.
-	 */
-	if (s->memcg_params.dying)
-		return 1;
-#endif
-
 	return 0;
 }
 
@@ -973,6 +965,16 @@ void kmem_cache_destroy(struct kmem_cach
 	get_online_mems();
 
 	mutex_lock(&slab_mutex);
+
+	/*
+	 * Another thread referenced it again
+	 */
+	if (READ_ONCE(s->refcount)) {
+		spin_lock_irq(&memcg_kmem_wq_lock);
+		s->memcg_params.dying = false;
+		spin_unlock_irq(&memcg_kmem_wq_lock);
+		goto out_unlock;
+	}
 #endif
 
 	err = shutdown_memcg_caches(s);



  parent reply	other threads:[~2021-07-22 16:34 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-22 16:30 [PATCH 5.4 00/71] 5.4.135-rc1 review Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 01/71] ARM: dts: gemini: rename mdio to the right name Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 02/71] ARM: dts: gemini: add device_type on pci Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 03/71] ARM: dts: rockchip: fix pinctrl sleep nodename for rk3036-kylin and rk3288 Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 04/71] arm64: dts: rockchip: fix pinctrl sleep nodename for rk3399.dtsi Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 05/71] ARM: dts: rockchip: Fix the timer clocks order Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 06/71] ARM: dts: rockchip: Fix IOMMU nodes properties on rk322x Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 07/71] ARM: dts: rockchip: Fix power-controller node names for rk3066a Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 08/71] ARM: dts: rockchip: Fix power-controller node names for rk3188 Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 09/71] ARM: dts: rockchip: Fix power-controller node names for rk3288 Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 10/71] arm64: dts: rockchip: Fix power-controller node names for px30 Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 11/71] arm64: dts: rockchip: Fix power-controller node names for rk3328 Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 12/71] reset: ti-syscon: fix to_ti_syscon_reset_data macro Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 13/71] ARM: brcmstb: dts: fix NAND nodes names Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 14/71] ARM: Cygnus: " Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 15/71] ARM: NSP: " Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 16/71] ARM: dts: BCM63xx: Fix " Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 17/71] ARM: dts: Hurricane 2: " Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 18/71] ARM: dts: imx6: phyFLEX: Fix UART hardware flow control Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 19/71] ARM: imx: pm-imx5: Fix references to imx5_cpu_suspend_info Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 20/71] rtc: mxc_v2: add missing MODULE_DEVICE_TABLE Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 21/71] kbuild: sink stdout from cmd for silent build Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 22/71] ARM: dts: am57xx-cl-som-am57x: fix ti,no-reset-on-init flag for gpios Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 23/71] ARM: dts: am437x-gp-evm: " Greg Kroah-Hartman
2021-07-22 16:30 ` [PATCH 5.4 24/71] ARM: dts: stm32: fix gpio-keys node on STM32 MCU boards Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 25/71] ARM: dts: stm32: fix RCC node name on stm32f429 MCU Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 26/71] ARM: dts: stm32: fix timer nodes on STM32 MCU to prevent warnings Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 27/71] arm64: dts: juno: Update SCPI nodes as per the YAML schema Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 28/71] ARM: dts: rockchip: fix supply properties in io-domains nodes Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 29/71] ARM: dts: stm32: fix i2c node name on stm32f746 to prevent warnings Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 30/71] ARM: dts: stm32: move stmmac axi config in ethernet node on stm32mp15 Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 31/71] soc/tegra: fuse: Fix Tegra234-only builds Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 32/71] firmware: tegra: bpmp: " Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 33/71] arm64: dts: ls208xa: remove bus-num from dspi node Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 34/71] arm64: dts: imx8mq: assign PCIe clocks Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 35/71] thermal/core: Correct function name thermal_zone_device_unregister() Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 36/71] kbuild: mkcompile_h: consider timestamp if KBUILD_BUILD_TIMESTAMP is set Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 37/71] rtc: max77686: Do not enforce (incorrect) interrupt trigger type Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 38/71] scsi: aic7xxx: Fix unintentional sign extension issue on left shift of u8 Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 39/71] scsi: libsas: Add LUN number check in .slave_alloc callback Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 40/71] scsi: libfc: Fix array index out of bound exception Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 41/71] scsi: qedf: Add check to synchronize abort and flush Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 42/71] sched/fair: Fix CFS bandwidth hrtimer expiry type Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 43/71] s390: introduce proper type handling call_on_stack() macro Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 44/71] cifs: prevent NULL deref in cifs_compose_mount_options() Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 45/71] arm64: dts: armada-3720-turris-mox: add firmware node Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 46/71] firmware: turris-mox-rwtm: add marvell,armada-3700-rwtm-firmware compatible string Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 47/71] arm64: dts: marvell: armada-37xx: move firmware node to generic dtsi file Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 48/71] f2fs: Show casefolding support only when supported Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 49/71] usb: cdns3: Enable TDL_CHK only for OUT ep Greg Kroah-Hartman
2021-07-22 16:31 ` Greg Kroah-Hartman [this message]
2021-07-22 16:31 ` [PATCH 5.4 51/71] dm writecache: return the exact table values that were set Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 52/71] net: dsa: mv88e6xxx: enable .port_set_policy() on Topaz Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 53/71] net: dsa: mv88e6xxx: enable .rmu_disable() " Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 54/71] net: ipv6: fix return value of ip6_skb_dst_mtu Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 55/71] netfilter: ctnetlink: suspicious RCU usage in ctnetlink_dump_helpinfo Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 56/71] net/sched: act_ct: fix err check for nf_conntrack_confirm Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 57/71] net: bridge: sync fdb to new unicast-filtering ports Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 58/71] net: bcmgenet: Ensure all TX/RX queues DMAs are disabled Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 59/71] net: ip_tunnel: fix mtu calculation for ETHER tunnel devices Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 60/71] net: moxa: fix UAF in moxart_mac_probe Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 61/71] net: qcom/emac: fix UAF in emac_remove Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 62/71] net: ti: fix UAF in tlan_remove_one Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 63/71] net: send SYNACK packet with accepted fwmark Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 64/71] net: validate lwtstate->data before returning from skb_tunnel_info() Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 65/71] net: fddi: fix UAF in fza_probe Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 66/71] dma-buf/sync_file: Dont leak fences on merge failure Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 67/71] tcp: annotate data races around tp->mtu_info Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 68/71] ipv6: tcp: drop silly ICMPv6 packet too big messages Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 69/71] bpftool: Properly close va_list ap by va_end() on error Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 70/71] perf test bpf: Free obj_buf Greg Kroah-Hartman
2021-07-22 16:31 ` [PATCH 5.4 71/71] udp: annotate data races around unix_sk(sk)->gso_size Greg Kroah-Hartman
2021-07-23  6:36 ` [PATCH 5.4 00/71] 5.4.135-rc1 review Samuel Zou
2021-07-23  8:01 ` Jon Hunter
2021-07-23 11:28 ` Sudip Mukherjee
2021-07-23 12:54 ` Naresh Kamboju
2021-07-23 15:58 ` Shuah Khan
2021-07-23 16:16 ` Florian Fainelli
2021-07-23 21:07 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210722155619.545713554@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=sunnanyong@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox