From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Mukesh Ojha <quic_mojha@quicinc.com>,
shisiyuan <shisiyuan19870131@gmail.com>
Subject: [PATCH 4.14 07/43] cgroup: Use separate src/dst nodes when preloading css_sets for migration
Date: Tue, 19 Jul 2022 13:53:38 +0200
Message-ID: <20220719114522.727899896@linuxfoundation.org>
In-Reply-To: <20220719114521.868169025@linuxfoundation.org>
From: Tejun Heo <tj@kernel.org>
commit 07fd5b6cdf3cc30bfde8fe0f644771688be04447 upstream.
Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.
Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:
#1> mkdir -p /sys/fs/cgroup/misc/a/b
#2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
#3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
#4> PID=$!
#5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
#6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs
#5 moves the group leader into a/b, and #6 then moves all threads of the
process, including the group leader, back into a. In this final migration,
the non-leader threads do an identity migration while the group leader does
an actual one.
After #3, let's say the whole process was in cset A, and that after #5, the
leader moves to cset B. Then, during #6, the following happens:
1. cgroup_migrate_add_src() is called on B for the leader.
2. cgroup_migrate_add_src() is called on A for the other threads.
3. cgroup_migrate_prepare_dst() is called. It scans the src list.
4. It notices that B wants to migrate to A, so it tries to add A to the dst
list but realizes that its ->mg_preload_node is already busy.
5. It then notices that A wants to migrate to A. As this is an identity
migration, it culls A by list_del_init()'ing its ->mg_preload_node and
putting references accordingly.
6. The rest of migration takes place with B on the src list but nothing on
the dst list.
This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task can pin it, the
cset is destroyed, leading to a use-after-free.
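To make steps 1 to 5 concrete, here is a minimal userspace sketch in C of the
bookkeeping with a single shared preload node. It is not kernel code: the list
helpers, struct model_cset, add_src(), prepare_one() and
run_shared_node_model() are hypothetical stand-ins that only mirror the list
linking and reference counting described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_append(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_unlink_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static bool list_contains(const struct list_head *head,
			  const struct list_head *n)
{
	const struct list_head *p;

	for (p = head->next; p != head; p = p->next)
		if (p == n)
			return true;
	return false;
}

/* Hypothetical, simplified css_set with ONE node shared by both lists. */
struct model_cset {
	int refs;			/* references held by the migration */
	struct list_head preload_node;	/* used for src and dst lists alike */
};

struct shared_result {
	int a_refs;
	bool a_on_src, a_on_dst, b_on_src;
};

/* Models cgroup_migrate_add_src(): pin the cset and link it as a source. */
static void add_src(struct model_cset *c, struct list_head *src)
{
	if (!list_is_empty(&c->preload_node))
		return;
	c->refs++;
	list_append(&c->preload_node, src);
}

/* Models one iteration of the loop in cgroup_migrate_prepare_dst(). */
static void prepare_one(struct model_cset *s, struct model_cset *d,
			struct list_head *dst)
{
	d->refs++;			/* reference from the dst lookup */
	if (s == d) {			/* identity migration: cull it */
		list_unlink_init(&s->preload_node);
		s->refs--;
		d->refs--;
		return;
	}
	if (list_is_empty(&d->preload_node))
		list_append(&d->preload_node, dst);
	else
		d->refs--;		/* node busy: assumed already held */
}

struct shared_result run_shared_node_model(void)
{
	struct list_head src, dst;
	struct model_cset a = { 0 }, b = { 0 };
	struct shared_result r;

	list_init(&src);
	list_init(&dst);
	list_init(&a.preload_node);
	list_init(&b.preload_node);

	add_src(&b, &src);		/* step 1: leader's cset B */
	add_src(&a, &src);		/* step 2: other threads' cset A */
	assert(!list_is_empty(&a.preload_node));	/* A sits on src */
	prepare_one(&b, &a, &dst);	/* steps 3-4: A's node is busy */
	prepare_one(&a, &a, &dst);	/* step 5: A -> A is culled */

	r.a_refs = a.refs;
	r.a_on_src = list_contains(&src, &a.preload_node);
	r.a_on_dst = list_contains(&dst, &a.preload_node);
	r.b_on_src = list_contains(&src, &b.preload_node);
	return r;
}
```

In this model, run_shared_node_model() ends with B on the src list, A on
neither list, and no migration reference left on A, which is the state
described in step 6.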
This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.
This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.
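As an illustration of why separate nodes help, the following userspace sketch
in C replays the same B -> A plus A -> A sequence with one node per list. It
is a simplified model under stated assumptions, not the kernel implementation:
struct split_cset, add_src(), prepare_one(), finish_one() and
run_split_node_model() are hypothetical names, and only the list linking and
reference counting are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_append(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_unlink_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical, simplified css_set with one node per preload list. */
struct split_cset {
	int refs;			/* references held by the migration */
	struct list_head src_node;	/* models ->mg_src_preload_node */
	struct list_head dst_node;	/* models ->mg_dst_preload_node */
};

struct split_result {
	bool a_on_src, a_on_dst;
	int a_refs_held;		/* refs on A while migration runs */
	int refs_after_finish;		/* leftover refs on A and B */
};

/* Models cgroup_migrate_add_src(): only the src node is consulted. */
static void add_src(struct split_cset *c, struct list_head *src)
{
	if (!list_is_empty(&c->src_node))
		return;
	c->refs++;
	list_append(&c->src_node, src);
}

/* Models one iteration of the loop in cgroup_migrate_prepare_dst(). */
static void prepare_one(struct split_cset *s, struct split_cset *d,
			struct list_head *dst)
{
	d->refs++;			/* reference from the dst lookup */
	if (s == d) {			/* identity migration: drop src only */
		list_unlink_init(&s->src_node);
		s->refs--;
		d->refs--;
		return;
	}
	if (list_is_empty(&d->dst_node))
		list_append(&d->dst_node, dst);	/* keeps the reference */
	else
		d->refs--;
}

/* Models cgroup_migrate_finish(): unlink from each list, drop its ref. */
static void finish_one(struct split_cset *c)
{
	if (!list_is_empty(&c->src_node)) {
		list_unlink_init(&c->src_node);
		c->refs--;
	}
	if (!list_is_empty(&c->dst_node)) {
		list_unlink_init(&c->dst_node);
		c->refs--;
	}
}

struct split_result run_split_node_model(void)
{
	struct list_head src, dst;
	struct split_cset a = { 0 }, b = { 0 };
	struct split_result r;

	list_init(&src);
	list_init(&dst);
	list_init(&a.src_node);
	list_init(&a.dst_node);
	list_init(&b.src_node);
	list_init(&b.dst_node);

	add_src(&b, &src);		/* leader's cset B */
	add_src(&a, &src);		/* other threads' cset A */
	prepare_one(&b, &a, &dst);	/* B -> A: A joins the dst list */
	prepare_one(&a, &a, &dst);	/* A -> A: A leaves the src list only */

	r.a_on_src = !list_is_empty(&a.src_node);
	r.a_on_dst = !list_is_empty(&a.dst_node);
	r.a_refs_held = a.refs;

	finish_one(&a);
	finish_one(&b);
	r.refs_after_finish = a.refs + b.refs;
	return r;
}
```

In this model A stays on the dst list with one reference held for the
duration of the migration, and the references balance back to zero once both
lists are torn down.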
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: shisiyuan <shisiyuan19870131@gmail.com>
Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com
Link: https://www.spinics.net/lists/cgroups/msg33313.html
Fixes: f817de98513d ("cgroup: prepare migration path for unified hierarchy")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/cgroup-defs.h | 3 ++-
kernel/cgroup/cgroup.c | 37 +++++++++++++++++++++++--------------
2 files changed, 25 insertions(+), 15 deletions(-)
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -235,7 +235,8 @@ struct css_set {
* List of csets participating in the on-going migration either as
* source or destination. Protected by cgroup_mutex.
*/
- struct list_head mg_preload_node;
+ struct list_head mg_src_preload_node;
+ struct list_head mg_dst_preload_node;
struct list_head mg_node;
/*
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -647,7 +647,8 @@ struct css_set init_css_set = {
.task_iters = LIST_HEAD_INIT(init_css_set.task_iters),
.threaded_csets = LIST_HEAD_INIT(init_css_set.threaded_csets),
.cgrp_links = LIST_HEAD_INIT(init_css_set.cgrp_links),
- .mg_preload_node = LIST_HEAD_INIT(init_css_set.mg_preload_node),
+ .mg_src_preload_node = LIST_HEAD_INIT(init_css_set.mg_src_preload_node),
+ .mg_dst_preload_node = LIST_HEAD_INIT(init_css_set.mg_dst_preload_node),
.mg_node = LIST_HEAD_INIT(init_css_set.mg_node),
};
@@ -1113,7 +1114,8 @@ static struct css_set *find_css_set(stru
INIT_LIST_HEAD(&cset->threaded_csets);
INIT_HLIST_NODE(&cset->hlist);
INIT_LIST_HEAD(&cset->cgrp_links);
- INIT_LIST_HEAD(&cset->mg_preload_node);
+ INIT_LIST_HEAD(&cset->mg_src_preload_node);
+ INIT_LIST_HEAD(&cset->mg_dst_preload_node);
INIT_LIST_HEAD(&cset->mg_node);
/* Copy the set of subsystem state objects generated in
@@ -2399,21 +2401,27 @@ int cgroup_migrate_vet_dst(struct cgroup
*/
void cgroup_migrate_finish(struct cgroup_mgctx *mgctx)
{
- LIST_HEAD(preloaded);
struct css_set *cset, *tmp_cset;
lockdep_assert_held(&cgroup_mutex);
spin_lock_irq(&css_set_lock);
- list_splice_tail_init(&mgctx->preloaded_src_csets, &preloaded);
- list_splice_tail_init(&mgctx->preloaded_dst_csets, &preloaded);
+ list_for_each_entry_safe(cset, tmp_cset, &mgctx->preloaded_src_csets,
+ mg_src_preload_node) {
+ cset->mg_src_cgrp = NULL;
+ cset->mg_dst_cgrp = NULL;
+ cset->mg_dst_cset = NULL;
+ list_del_init(&cset->mg_src_preload_node);
+ put_css_set_locked(cset);
+ }
- list_for_each_entry_safe(cset, tmp_cset, &preloaded, mg_preload_node) {
+ list_for_each_entry_safe(cset, tmp_cset, &mgctx->preloaded_dst_csets,
+ mg_dst_preload_node) {
cset->mg_src_cgrp = NULL;
cset->mg_dst_cgrp = NULL;
cset->mg_dst_cset = NULL;
- list_del_init(&cset->mg_preload_node);
+ list_del_init(&cset->mg_dst_preload_node);
put_css_set_locked(cset);
}
@@ -2455,7 +2463,7 @@ void cgroup_migrate_add_src(struct css_s
src_cgrp = cset_cgroup_from_root(src_cset, dst_cgrp->root);
- if (!list_empty(&src_cset->mg_preload_node))
+ if (!list_empty(&src_cset->mg_src_preload_node))
return;
WARN_ON(src_cset->mg_src_cgrp);
@@ -2466,7 +2474,7 @@ void cgroup_migrate_add_src(struct css_s
src_cset->mg_src_cgrp = src_cgrp;
src_cset->mg_dst_cgrp = dst_cgrp;
get_css_set(src_cset);
- list_add_tail(&src_cset->mg_preload_node, &mgctx->preloaded_src_csets);
+ list_add_tail(&src_cset->mg_src_preload_node, &mgctx->preloaded_src_csets);
}
/**
@@ -2491,7 +2499,7 @@ int cgroup_migrate_prepare_dst(struct cg
/* look up the dst cset for each src cset and link it to src */
list_for_each_entry_safe(src_cset, tmp_cset, &mgctx->preloaded_src_csets,
- mg_preload_node) {
+ mg_src_preload_node) {
struct css_set *dst_cset;
struct cgroup_subsys *ss;
int ssid;
@@ -2510,7 +2518,7 @@ int cgroup_migrate_prepare_dst(struct cg
if (src_cset == dst_cset) {
src_cset->mg_src_cgrp = NULL;
src_cset->mg_dst_cgrp = NULL;
- list_del_init(&src_cset->mg_preload_node);
+ list_del_init(&src_cset->mg_src_preload_node);
put_css_set(src_cset);
put_css_set(dst_cset);
continue;
@@ -2518,8 +2526,8 @@ int cgroup_migrate_prepare_dst(struct cg
src_cset->mg_dst_cset = dst_cset;
- if (list_empty(&dst_cset->mg_preload_node))
- list_add_tail(&dst_cset->mg_preload_node,
+ if (list_empty(&dst_cset->mg_dst_preload_node))
+ list_add_tail(&dst_cset->mg_dst_preload_node,
&mgctx->preloaded_dst_csets);
else
put_css_set(dst_cset);
@@ -2753,7 +2761,8 @@ static int cgroup_update_dfl_csses(struc
goto out_finish;
spin_lock_irq(&css_set_lock);
- list_for_each_entry(src_cset, &mgctx.preloaded_src_csets, mg_preload_node) {
+ list_for_each_entry(src_cset, &mgctx.preloaded_src_csets,
+ mg_src_preload_node) {
struct task_struct *task, *ntask;
/* all tasks in src_csets need to be migrated */