From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Mukesh Ojha <quic_mojha@quicinc.com>,
shisiyuan <shisiyuan19870131@gmail.com>
Subject: [PATCH 5.4 13/71] cgroup: Use separate src/dst nodes when preloading css_sets for migration
Date: Tue, 19 Jul 2022 13:53:36 +0200 [thread overview]
Message-ID: <20220719114553.487372412@linuxfoundation.org> (raw)
In-Reply-To: <20220719114552.477018590@linuxfoundation.org>
From: Tejun Heo <tj@kernel.org>
commit 07fd5b6cdf3cc30bfde8fe0f644771688be04447 upstream.
Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.
Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:
#1> mkdir -p /sys/fs/cgroup/misc/a/b
#2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
#3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
#4> PID=$!
#5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
#6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs
the process including the group leader back into a. In this final migration,
non-leader threads would be doing identity migration while the group leader
is doing an actual one.
After #3, let's say the whole process was in cset A, and that after #4, the
leader moves to cset B. Then, during #6, the following happens:
1. cgroup_migrate_add_src() is called on B for the leader.
2. cgroup_migrate_add_src() is called on A for the other threads.
3. cgroup_migrate_prepare_dst() is called. It scans the src list.
4. It notices that B wants to migrate to A, so it tries to A to the dst
list but realizes that its ->mg_preload_node is already busy.
5. and then it notices A wants to migrate to A as it's an identity
migration, it culls it by list_del_init()'ing its ->mg_preload_node and
putting references accordingly.
6. The rest of migration takes place with B on the src list but nothing on
the dst list.
This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task pins it, the
cset will be destroyed leading to use-after-free.
This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.
This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: shisiyuan <shisiyuan19870131@gmail.com>
Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com
Link: https://www.spinics.net/lists/cgroups/msg33313.html
Fixes: f817de98513d ("cgroup: prepare migration path for unified hierarchy")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/cgroup-defs.h | 3 ++-
kernel/cgroup/cgroup.c | 37 +++++++++++++++++++++++--------------
2 files changed, 25 insertions(+), 15 deletions(-)
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -255,7 +255,8 @@ struct css_set {
* List of csets participating in the on-going migration either as
* source or destination. Protected by cgroup_mutex.
*/
- struct list_head mg_preload_node;
+ struct list_head mg_src_preload_node;
+ struct list_head mg_dst_preload_node;
struct list_head mg_node;
/*
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -743,7 +743,8 @@ struct css_set init_css_set = {
.task_iters = LIST_HEAD_INIT(init_css_set.task_iters),
.threaded_csets = LIST_HEAD_INIT(init_css_set.threaded_csets),
.cgrp_links = LIST_HEAD_INIT(init_css_set.cgrp_links),
- .mg_preload_node = LIST_HEAD_INIT(init_css_set.mg_preload_node),
+ .mg_src_preload_node = LIST_HEAD_INIT(init_css_set.mg_src_preload_node),
+ .mg_dst_preload_node = LIST_HEAD_INIT(init_css_set.mg_dst_preload_node),
.mg_node = LIST_HEAD_INIT(init_css_set.mg_node),
/*
@@ -1219,7 +1220,8 @@ static struct css_set *find_css_set(stru
INIT_LIST_HEAD(&cset->threaded_csets);
INIT_HLIST_NODE(&cset->hlist);
INIT_LIST_HEAD(&cset->cgrp_links);
- INIT_LIST_HEAD(&cset->mg_preload_node);
+ INIT_LIST_HEAD(&cset->mg_src_preload_node);
+ INIT_LIST_HEAD(&cset->mg_dst_preload_node);
INIT_LIST_HEAD(&cset->mg_node);
/* Copy the set of subsystem state objects generated in
@@ -2629,21 +2631,27 @@ int cgroup_migrate_vet_dst(struct cgroup
*/
void cgroup_migrate_finish(struct cgroup_mgctx *mgctx)
{
- LIST_HEAD(preloaded);
struct css_set *cset, *tmp_cset;
lockdep_assert_held(&cgroup_mutex);
spin_lock_irq(&css_set_lock);
- list_splice_tail_init(&mgctx->preloaded_src_csets, &preloaded);
- list_splice_tail_init(&mgctx->preloaded_dst_csets, &preloaded);
+ list_for_each_entry_safe(cset, tmp_cset, &mgctx->preloaded_src_csets,
+ mg_src_preload_node) {
+ cset->mg_src_cgrp = NULL;
+ cset->mg_dst_cgrp = NULL;
+ cset->mg_dst_cset = NULL;
+ list_del_init(&cset->mg_src_preload_node);
+ put_css_set_locked(cset);
+ }
- list_for_each_entry_safe(cset, tmp_cset, &preloaded, mg_preload_node) {
+ list_for_each_entry_safe(cset, tmp_cset, &mgctx->preloaded_dst_csets,
+ mg_dst_preload_node) {
cset->mg_src_cgrp = NULL;
cset->mg_dst_cgrp = NULL;
cset->mg_dst_cset = NULL;
- list_del_init(&cset->mg_preload_node);
+ list_del_init(&cset->mg_dst_preload_node);
put_css_set_locked(cset);
}
@@ -2685,7 +2693,7 @@ void cgroup_migrate_add_src(struct css_s
src_cgrp = cset_cgroup_from_root(src_cset, dst_cgrp->root);
- if (!list_empty(&src_cset->mg_preload_node))
+ if (!list_empty(&src_cset->mg_src_preload_node))
return;
WARN_ON(src_cset->mg_src_cgrp);
@@ -2696,7 +2704,7 @@ void cgroup_migrate_add_src(struct css_s
src_cset->mg_src_cgrp = src_cgrp;
src_cset->mg_dst_cgrp = dst_cgrp;
get_css_set(src_cset);
- list_add_tail(&src_cset->mg_preload_node, &mgctx->preloaded_src_csets);
+ list_add_tail(&src_cset->mg_src_preload_node, &mgctx->preloaded_src_csets);
}
/**
@@ -2721,7 +2729,7 @@ int cgroup_migrate_prepare_dst(struct cg
/* look up the dst cset for each src cset and link it to src */
list_for_each_entry_safe(src_cset, tmp_cset, &mgctx->preloaded_src_csets,
- mg_preload_node) {
+ mg_src_preload_node) {
struct css_set *dst_cset;
struct cgroup_subsys *ss;
int ssid;
@@ -2740,7 +2748,7 @@ int cgroup_migrate_prepare_dst(struct cg
if (src_cset == dst_cset) {
src_cset->mg_src_cgrp = NULL;
src_cset->mg_dst_cgrp = NULL;
- list_del_init(&src_cset->mg_preload_node);
+ list_del_init(&src_cset->mg_src_preload_node);
put_css_set(src_cset);
put_css_set(dst_cset);
continue;
@@ -2748,8 +2756,8 @@ int cgroup_migrate_prepare_dst(struct cg
src_cset->mg_dst_cset = dst_cset;
- if (list_empty(&dst_cset->mg_preload_node))
- list_add_tail(&dst_cset->mg_preload_node,
+ if (list_empty(&dst_cset->mg_dst_preload_node))
+ list_add_tail(&dst_cset->mg_dst_preload_node,
&mgctx->preloaded_dst_csets);
else
put_css_set(dst_cset);
@@ -2980,7 +2988,8 @@ static int cgroup_update_dfl_csses(struc
goto out_finish;
spin_lock_irq(&css_set_lock);
- list_for_each_entry(src_cset, &mgctx.preloaded_src_csets, mg_preload_node) {
+ list_for_each_entry(src_cset, &mgctx.preloaded_src_csets,
+ mg_src_preload_node) {
struct task_struct *task, *ntask;
/* all tasks in src_csets need to be migrated */
next prev parent reply other threads:[~2022-07-19 12:09 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-19 11:53 [PATCH 5.4 00/71] 5.4.207-rc1 review Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 01/71] ALSA: hda - Add fixup for Dell Latitidue E5430 Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 02/71] ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 03/71] ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671 Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 04/71] ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221 Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 05/71] ALSA: hda/realtek - Enable the headset-mic on a Xiaomis laptop Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 06/71] xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 07/71] tracing/histograms: Fix memory leak problem Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 08/71] net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 09/71] ip: fix dflt addr selection for connected nexthop Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 10/71] ARM: 9213/1: Print message about disabled Spectre workarounds only once Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 11/71] ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 12/71] wifi: mac80211: fix queue selection for mesh/OCB interfaces Greg Kroah-Hartman
2022-07-19 11:53 ` Greg Kroah-Hartman [this message]
2022-07-19 11:53 ` [PATCH 5.4 14/71] drm/panfrost: Fix shrinker list corruption by madvise IOCTL Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 15/71] nilfs2: fix incorrect masking of permission flags for symlinks Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 16/71] Revert "evm: Fix memleak in init_desc" Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 17/71] sched/rt: Disable RT_RUNTIME_SHARE by default Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 18/71] ext4: fix race condition between ext4_write and ext4_convert_inline_data Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 19/71] ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 20/71] ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 21/71] ARM: 9210/1: Mark the FDT_FIXED sections as shareable Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 22/71] drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector() Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 23/71] ima: Fix a potential integer overflow in ima_appraise_measurement Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 24/71] ASoC: sgtl5000: Fix noise on shutdown/remove Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 25/71] net: stmmac: dwc-qos: Disable split header for Tegra194 Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 26/71] inetpeer: Fix data-races around sysctl Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 27/71] net: Fix data-races around sysctl_mem Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 28/71] cipso: Fix data-races around sysctl Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 29/71] icmp: " Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 30/71] ipv4: Fix a data-race around sysctl_fib_sync_mem Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 31/71] ARM: dts: at91: sama5d2: Fix typo in i2s1 node Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 32/71] ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 33/71] drm/i915/gt: Serialize TLB invalidates with GT resets Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 34/71] icmp: Fix a data-race around sysctl_icmp_ratelimit Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 35/71] icmp: Fix a data-race around sysctl_icmp_ratemask Greg Kroah-Hartman
2022-07-19 11:53 ` [PATCH 5.4 36/71] raw: Fix a data-race around sysctl_raw_l3mdev_accept Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 37/71] ipv4: Fix data-races around sysctl_ip_dynaddr Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 38/71] net: ftgmac100: Hold reference returned by of_get_child_by_name() Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 39/71] sfc: fix use after free when disabling sriov Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 40/71] seg6: fix skb checksum evaluation in SRH encapsulation/insertion Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 41/71] seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 42/71] seg6: bpf: fix skb checksum in bpf_push_seg6_encap() Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 43/71] sfc: fix kernel panic when creating VF Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 44/71] mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 45/71] virtio_mmio: Add missing PM calls to freeze/restore Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 46/71] virtio_mmio: Restore guest page size on resume Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 47/71] netfilter: br_netfilter: do not skip all hooks with 0 priority Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 48/71] cpufreq: pmac32-cpufreq: Fix refcount leak bug Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 49/71] platform/x86: hp-wmi: Ignore Sanitization Mode event Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 50/71] net: tipc: fix possible refcount leak in tipc_sk_create() Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 51/71] NFC: nxp-nci: dont print header length mismatch on i2c error Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 52/71] nvme: fix regression when disconnect a recovering ctrl Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 53/71] net: sfp: fix memory leak in sfp_probe() Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 54/71] ASoC: ops: Fix off by one in range control validation Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 55/71] ASoC: wm5110: Fix DRE control Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 56/71] ASoC: cs47l15: Fix event generation for low power mux control Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 57/71] ASoC: madera: Fix event generation for OUT1 demux Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 58/71] ASoC: madera: Fix event generation for rate controls Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 59/71] irqchip: or1k-pic: Undefine mask_ack for level triggered hardware Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 60/71] x86: Clear .brk area at early boot Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 61/71] soc: ixp4xx/npe: Fix unused match warning Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 62/71] ARM: dts: stm32: use the correct clock source for CEC on stm32mp151 Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 63/71] signal handling: dont use BUG_ON() for debugging Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 64/71] USB: serial: ftdi_sio: add Belimo device ids Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 65/71] usb: typec: add missing uevent when partner support PD Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 66/71] usb: dwc3: gadget: Fix event pending check Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 67/71] tty: serial: samsung_tty: set dma burst_size to 1 Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 68/71] serial: 8250: fix return error code in serial8250_request_std_resource() Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 69/71] serial: stm32: Clear prev values before setting RTS delays Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 70/71] serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle Greg Kroah-Hartman
2022-07-19 11:54 ` [PATCH 5.4 71/71] can: m_can: m_can_tx_handler(): fix use after free of skb Greg Kroah-Hartman
2022-07-19 18:11 ` [PATCH 5.4 00/71] 5.4.207-rc1 review Florian Fainelli
2022-07-20 0:59 ` Samuel Zou
2022-07-20 6:18 ` Guenter Roeck
2022-07-20 9:42 ` Naresh Kamboju
2022-07-20 14:50 ` Sudip Mukherjee (Codethink)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220719114553.487372412@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=quic_mojha@quicinc.com \
--cc=shisiyuan19870131@gmail.com \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox