From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Li Zefan <lizefan@huawei.com>,
Tejun Heo <tj@kernel.org>
Subject: [PATCH 3.15 71/84] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
Date: Tue, 15 Jul 2014 16:18:08 -0700 [thread overview]
Message-ID: <20140715231715.310198631@linuxfoundation.org> (raw)
In-Reply-To: <20140715231713.193785557@linuxfoundation.org>
3.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Zefan <lizefan@huawei.com>
commit 3a32bd72d77058d768dbb38183ad517f720dd1bc upstream.
We've converted cgroup to kernfs so cgroup won't be intertwined with
vfs objects and locking, but there are dark areas.
Run two instances of this script concurrently:
for ((; ;))
{
mount -t cgroup -o cpuacct xxx /cgroup
umount /cgroup
}
After a while, I saw two mount processes were stuck at retrying, because
they were waiting for a subsystem to become free, but the root associated
with this subsystem never got freed.
This can happen, if thread A is in the process of killing superblock but
hasn't called percpu_ref_kill(), and at this time thread B is mounting
the same cgroup root and finds the root in the root list and performs
percpu_ref_try_get().
To fix this, we try to increase both the refcnt of the superblock and the
percpu refcnt of cgroup root.
v2:
- we should try to get both the superblock refcnt and cgroup_root refcnt,
because cgroup_root may have no superblock assosiated with it.
- adjust/add comments.
tj: Updated comments. Renamed @sb to @pinned_sb.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[lizf: Backported to 3.15:
- Adjust context
- s/percpu_tryget_live/atomic_inc_not_zero/]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/cgroup.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1484,6 +1484,7 @@ static struct dentry *cgroup_mount(struc
int flags, const char *unused_dev_name,
void *data)
{
+ struct super_block *pinned_sb = NULL;
struct cgroup_subsys *ss;
struct cgroup_root *root;
struct cgroup_sb_opts opts;
@@ -1584,10 +1585,25 @@ retry:
* destruction to complete so that the subsystems are free.
* We can use wait_queue for the wait but this path is
* super cold. Let's just sleep for a bit and retry.
+
+ * We want to reuse @root whose lifetime is governed by its
+ * ->cgrp. Let's check whether @root is alive and keep it
+ * that way. As cgroup_kill_sb() can happen anytime, we
+ * want to block it by pinning the sb so that @root doesn't
+ * get killed before mount is complete.
+ *
+ * With the sb pinned, inc_not_zero can reliably indicate
+ * whether @root can be reused. If it's being killed,
+ * drain it. We can use wait_queue for the wait but this
+ * path is super cold. Let's just sleep a bit and retry.
*/
- if (!atomic_inc_not_zero(&root->cgrp.refcnt)) {
+ pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
+ if (IS_ERR(pinned_sb) ||
+ !atomic_inc_not_zero(&root->cgrp.refcnt)) {
mutex_unlock(&cgroup_mutex);
mutex_unlock(&cgroup_tree_mutex);
+ if (!IS_ERR_OR_NULL(pinned_sb))
+ deactivate_super(pinned_sb);
msleep(10);
mutex_lock(&cgroup_tree_mutex);
mutex_lock(&cgroup_mutex);
@@ -1634,6 +1650,16 @@ out_unlock:
CGROUP_SUPER_MAGIC, &new_sb);
if (IS_ERR(dentry) || !new_sb)
cgroup_put(&root->cgrp);
+
+ /*
+ * If @pinned_sb, we're reusing an existing root and holding an
+ * extra ref on its sb. Mount is complete. Put the extra ref.
+ */
+ if (pinned_sb) {
+ WARN_ON(new_sb);
+ deactivate_super(pinned_sb);
+ }
+
return dentry;
}
next prev parent reply other threads:[~2014-07-15 23:30 UTC|newest]
Thread overview: 97+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-15 23:16 [PATCH 3.15 00/84] 3.15.6-stable review Greg Kroah-Hartman
2014-07-15 23:16 ` [PATCH 3.15 01/84] usb: option: Add ID for Telewell TW-LTE 4G v2 Greg Kroah-Hartman
2014-07-15 23:16 ` [PATCH 3.15 02/84] USB: cp210x: add support for Corsair usb dongle Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 03/84] USB: ftdi_sio: Add extra PID Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 04/84] USB: serial: ftdi_sio: Add Infineon Triboard Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 05/84] iio: ti_am335x_adc: Fix: Use same step id at FIFOs both ends Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 06/84] serial: Test for no tx data on tx restart Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 07/84] parisc: add serial ports of C8000/1GHz machine to hardware database Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 08/84] parisc: fix fanotify_mark() syscall on 32bit compat kernel Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 09/84] workqueue: fix dev_set_uevent_suppress() imbalance Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 10/84] cpuset,mempolicy: fix sleeping function called from invalid context Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 11/84] workqueue: zero cpumask of wq_numa_possible_cpumask on init Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 12/84] ahci: imx: manage only sata_ref_clk in imx_sata_enable[disable] Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 13/84] i8k: Fix non-SMP operation Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 14/84] serial: imx: Fix build breakage Greg Kroah-Hartman
2014-07-16 0:24 ` Stephen Rothwell
2014-07-16 0:42 ` Greg Kroah-Hartman
2014-07-16 0:56 ` Stephen Rothwell
2014-07-16 1:07 ` Greg Kroah-Hartman
2014-07-16 1:08 ` Stephen Rothwell
2014-07-15 23:17 ` [PATCH 3.15 15/84] thermal: hwmon: Make the check for critical temp valid consistent Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 16/84] hwmon: (adc128d818) Drop write support on inX_input attributes Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 17/84] hwmon: (amc6821) Fix permissions for temp2_input Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 18/84] hwmon: (emc2103) Clamp limits instead of bailing out Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 19/84] hwmon: (adm1031) Fix writes to limit registers Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 20/84] hwmon: (adm1029) Ensure the fan_div cache is updated in set_fan_div Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 21/84] hwmon: (adm1021) Fix cache problem when writing temperature limits Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 22/84] Revert "ACPI / AC: Remove ACs proc directory." Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 23/84] ACPI / resources: only reject zero length resources based at address zero Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 24/84] ACPI / EC: Avoid race condition related to advance_transaction() Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 25/84] ACPI / EC: Add asynchronous command byte write support Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 26/84] ACPI / EC: Remove duplicated ec_wait_ibf0() waiter Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 27/84] ACPI / EC: Fix race condition in ec_transaction_completed() Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 28/84] powerpc/kvm: Remove redundant save of SIER AND MMCR2 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 29/84] powerpc/perf: Never program book3s PMCs with values >= 0x80000000 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 30/84] powerpc/perf: Add PPMU_ARCH_207S define Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 31/84] powerpc/perf: Clear MMCR2 when enabling PMU Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 32/84] cpufreq: Makefile: fix compilation for davinci platform Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 33/84] crypto: sha512_ssse3 - fix byte count to bit count conversion Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 34/84] crypto: caam - fix memleak in caam_jr module Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 35/84] arm64: implement TASK_SIZE_OF Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 36/84] phy: core: Fix error path in phy_create() Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 37/84] clk: spear3xx: Use proper control register offset Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 38/84] clk: s2mps11: Fix double free corruption during driver unbind Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 39/84] clk: qcom: HDMI source sel is 3 not 2 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 40/84] Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 41/84] dm mpath: fix IO hang due to logic bug in multipath_busy Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 42/84] dm io: fix a race condition in the wake up code for sync_io Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 43/84] dm: allocate a special workqueue for deferred device removal Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 44/84] intel_pstate: Fix setting VID Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 45/84] intel_pstate: dont touch turbo bit if turbo disabled or unavailable Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 46/84] intel_pstate: Update documentation of {max,min}_perf_pct sysfs files Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 47/84] intel_pstate: Set CPU number before accessing MSRs Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 48/84] PCI: Fix unaligned access in AF transaction pending test Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 49/84] ext4: fix unjournalled bg descriptor while initializing inode bitmap Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 50/84] ext4: clarify error count warning messages Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 51/84] ext4: clarify ext4_error message in ext4_mb_generate_buddy_error() Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 52/84] ext4: disable synchronous transaction batching if max_batch_time==0 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 53/84] ext4: revert commit which was causing fs corruption after journal replays Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 54/84] ext4: fix a potential deadlock in __ext4_es_shrink() Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 55/84] drm/radeon/dpm: Reenabling SS on Cayman Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 56/84] drm/radeon: fix typo in ci_stop_dpm() Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 57/84] drm/radeon: fix typo in golden register setup on evergreen Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 60/84] drm/i915: quirk asserts controllable backlight presence, overriding VBT Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 61/84] drm/i915: Acer C720 and C720P have controllable backlights Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.15 62/84] drm/i915: Toshiba CB35 has a controllable backlight Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 64/84] DMA, CMA: fix possible memory leak Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 65/84] ring-buffer: Check if buffer exists before polling Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 66/84] i40e: fix passing wrong error code to i40e_open() Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 67/84] mtd: nand: omap: fix omap_calculate_ecc_bch() for-loop error Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 68/84] cgroup: fix mount failure in a corner case Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 69/84] kernfs: implement kernfs_root->supers list Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 70/84] kernfs: introduce kernfs_pin_sb() Greg Kroah-Hartman
2014-07-15 23:18 ` Greg Kroah-Hartman [this message]
2014-07-15 23:18 ` [PATCH 3.15 72/84] f2fs: adjust free mem size to flush dentry blocks Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 73/84] f2fs: check bdi->dirty_exceeded when trying to skip data writes Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 74/84] drivers/rtc/rtc-puv3.c: remove "&dev->" for typo issue Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 75/84] drivers/rtc/rtc-puv3.c: use dev_dbg() instead of dev_debug() " Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 76/84] powerpc: Disable RELOCATABLE for COMPILE_TEST with PPC64 Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 77/84] Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 78/84] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 79/84] x86, espfix: Move espfix definitions into a separate header file Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 80/84] x86, espfix: Fix broken header guard Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 81/84] x86, espfix: Make espfix64 a Kconfig option, fix UML Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 82/84] x86, espfix: Make it possible to disable 16-bit support Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 83/84] x86, ioremap: Speed up check for RAM pages Greg Kroah-Hartman
2014-07-15 23:18 ` [PATCH 3.15 84/84] ACPI / battery: Retry to get battery information if failed during probing Greg Kroah-Hartman
2014-07-16 4:28 ` [PATCH 3.15 00/84] 3.15.6-stable review Guenter Roeck
2014-07-16 10:53 ` Satoru Takeuchi
2014-07-17 1:51 ` Greg Kroah-Hartman
2014-07-16 23:09 ` Greg Kroah-Hartman
2014-07-17 0:12 ` Guenter Roeck
2014-07-17 1:50 ` Greg Kroah-Hartman
2014-07-17 12:29 ` Satoru Takeuchi
2014-07-17 21:23 ` Greg Kroah-Hartman
2014-07-17 13:24 ` Shuah Khan
2014-07-17 21:23 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140715231715.310198631@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox