From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Balbir Singh <bsingharora@gmail.com>,
Zefan Li <lizefan@huawei.com>, Oleg Nesterov <oleg@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>
Subject: [PATCH 4.7 15/59] cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
Date: Mon, 12 Sep 2016 17:29:35 +0200 [thread overview]
Message-ID: <20160912152129.402683650@linuxfoundation.org> (raw)
In-Reply-To: <20160912152128.765864031@linuxfoundation.org>
4.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Balbir Singh <bsingharora@gmail.com>
commit 568ac888215c7fb2fabe8ea739b00ec3c1f5d440 upstream.
cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork. It is also grabbed in write mode during
__cgroups_proc_write(). I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see
systemd
__switch_to+0x1f8/0x350
__schedule+0x30c/0x990
schedule+0x48/0xc0
percpu_down_write+0x114/0x170
__cgroup_procs_write.isra.12+0xb8/0x3c0
cgroup_file_write+0x74/0x1a0
kernfs_fop_write+0x188/0x200
__vfs_write+0x6c/0xe0
vfs_write+0xc0/0x230
SyS_write+0x6c/0x110
system_call+0x38/0xb4
This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit. The reader itself is under memory pressure and has gone into
reclaim after fork. There are times the reader also ends up waiting on
oom_lock as well.
__switch_to+0x1f8/0x350
__schedule+0x30c/0x990
schedule+0x48/0xc0
jbd2_log_wait_commit+0xd4/0x180
ext4_evict_inode+0x88/0x5c0
evict+0xf8/0x2a0
dispose_list+0x50/0x80
prune_icache_sb+0x6c/0x90
super_cache_scan+0x190/0x210
shrink_slab.part.15+0x22c/0x4c0
shrink_zone+0x288/0x3c0
do_try_to_free_pages+0x1dc/0x590
try_to_free_pages+0xdc/0x260
__alloc_pages_nodemask+0x72c/0xc90
alloc_pages_current+0xb4/0x1a0
page_table_alloc+0xc0/0x170
__pte_alloc+0x58/0x1f0
copy_page_range+0x4ec/0x950
copy_process.isra.5+0x15a0/0x1870
_do_fork+0xa8/0x4b0
ppc_clone+0x8/0xc
In the meanwhile, all processes exiting/forking are blocked almost
stalling the system.
This patch moves the threadgroup_change_begin from before
cgroup_fork() to just before cgroup_canfork(). There is no nee to
worry about threadgroup changes till the task is actually added to the
threadgroup. This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.
tj: Subject and description edits.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/fork.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1406,7 +1406,6 @@ static struct task_struct *copy_process(
p->real_start_time = ktime_get_boot_ns();
p->io_context = NULL;
p->audit_context = NULL;
- threadgroup_change_begin(current);
cgroup_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_dup(p->mempolicy);
@@ -1558,6 +1557,7 @@ static struct task_struct *copy_process(
INIT_LIST_HEAD(&p->thread_group);
p->task_works = NULL;
+ threadgroup_change_begin(current);
/*
* Ensure that the cgroup subsystem policies allow the new process to be
* forked. It should be noted the the new process's css_set can be changed
@@ -1658,6 +1658,7 @@ static struct task_struct *copy_process(
bad_fork_cancel_cgroup:
cgroup_cancel_fork(p);
bad_fork_free_pid:
+ threadgroup_change_end(current);
if (pid != &init_struct_pid)
free_pid(pid);
bad_fork_cleanup_thread:
@@ -1690,7 +1691,6 @@ bad_fork_cleanup_policy:
mpol_put(p->mempolicy);
bad_fork_cleanup_threadgroup_lock:
#endif
- threadgroup_change_end(current);
delayacct_tsk_free(p);
bad_fork_cleanup_count:
atomic_dec(&p->cred->user->processes);
next prev parent reply other threads:[~2016-09-12 15:46 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20160912153038uscas1p166159f601d12fb1e6621d90d3ef5d311@uscas1p1.samsung.com>
[not found] ` <20160912152128.765864031@linuxfoundation.org>
2016-09-12 15:29 ` [PATCH 4.7 01/59] Revert "floppy: refactor open() flags handling" Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 02/59] apparmor: fix refcount race when finding a child profile Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 03/59] kernel: Add noaudit variant of ns_capable() Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 04/59] net: Use ns_capable_noaudit() when determining net sysctl permissions Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 05/59] fs: Check for invalid i_uid in may_follow_link() Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 06/59] cred: Reject inodes with invalid ids in set_create_file_as() Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 07/59] ext4: validate that metadata blocks do not overlap superblock Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 08/59] ext4: fix xattr shifting when expanding inodes Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 09/59] ext4: fix xattr shifting when expanding inodes part 2 Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 10/59] ext4: properly align shifted xattrs when expanding inodes Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 11/59] ext4: avoid deadlock when expanding inode size Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 13/59] block: Fix race triggered by blk_set_queue_dying() Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 14/59] block: make sure a big bio is split into at most 256 bvecs Greg Kroah-Hartman
2016-09-12 15:29 ` Greg Kroah-Hartman [this message]
2016-09-12 15:29 ` [PATCH 4.7 16/59] cdc-acm: added sanity checking for probe() Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 19/59] drm/atomic: Dont potentially reset color_mgmt_changed on successive property updates Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 20/59] drm: Reject page_flip for !DRIVER_MODESET Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 21/59] drm/msm: fix use of copy_from_user() while holding spinlock Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 22/59] drm/vc4: Use drm_free_large() on handles to match its allocation Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 23/59] drm/vc4: Fix overflow mem unreferencing when the binner runs dry Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 24/59] drm/vc4: Fix oops when userspace hands in a bad BO Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 25/59] ASoC: atmel_ssc_dai: Dont unconditionally reset SSC on stream startup Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 26/59] xfs: fix superblock inprogress check Greg Kroah-Hartman
2016-09-12 15:29 ` Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 27/59] timekeeping: Cap array access in timekeeping_debug Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 28/59] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 30/59] ovl: proper cleanup of workdir Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 31/59] ovl: dont copy up opaqueness Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 32/59] ovl: remove posix_acl_default from workdir Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 33/59] ovl: listxattr: use strnlen() Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 34/59] ovl: fix workdir creation Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 35/59] mei: me: disable driver on SPT SPS firmware Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 36/59] ubifs: Fix xattr generic handler usage Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 38/59] bdev: fix NULL pointer dereference Greg Kroah-Hartman
2016-09-12 15:29 ` [PATCH 4.7 39/59] bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 40/59] irqchip/mips-gic: Cleanup chip and handler setup Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 41/59] irqchip/mips-gic: Implement activate op for device domain Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 42/59] vhost/scsi: fix reuse of &vq->iov[out] in response Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 43/59] x86/apic: Do not init irq remapping if ioapic is disabled Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 44/59] xprtrdma: Create common scatterlist fields in rpcrdma_mw Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 46/59] fscrypto: add authorization check for setting encryption policy Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 47/59] fscrypto: only allow setting encryption policy on directories Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 48/59] ALSA: usb-audio: Add sample rate inquiry quirk for B850V3 CP2114 Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 49/59] ALSA: firewire-tascam: accessing to user space outside spinlock Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 50/59] ALSA: fireworks: " Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 51/59] ALSA: rawmidi: Fix possible deadlock with virmidi registration Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 52/59] ALSA: hda - Add headset mic quirk for Dell Inspiron 5468 Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 53/59] ALSA: hda - Enable subwoofer on Dell Inspiron 7559 Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 54/59] ALSA: timer: fix NULL pointer dereference in read()/ioctl() race Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 55/59] ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 56/59] ALSA: timer: fix NULL pointer dereference on memory allocation failure Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 57/59] ALSA: timer: Fix zero-division by continue of uninitialized instance Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 58/59] scsi: fix upper bounds check of sense key in scsi_sense_key_string() Greg Kroah-Hartman
2016-09-12 15:30 ` [PATCH 4.7 59/59] cpufreq: dt: Add terminate entry for of_device_id tables Greg Kroah-Hartman
2016-09-13 3:07 ` [PATCH 4.7 00/59] 4.7.4-stable review Guenter Roeck
2016-09-13 8:57 ` Greg Kroah-Hartman
[not found] ` <57d7cc1f.0d8d1c0a.df2d3.a693@mx.google.com>
2016-09-13 12:12 ` Greg Kroah-Hartman
2016-09-13 17:40 ` Kevin Hilman
2016-09-13 18:32 ` Shuah Khan
2016-09-15 6:19 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160912152129.402683650@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=bsingharora@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=oleg@redhat.com \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.