From: Alexey Vlasov <renton@renton.name>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>, Mike Galbraith <bitbucket@online.de>,
linux-kernel@vger.kernel.org
Subject: Re: Kernel migration eat CPUs
Date: Thu, 29 Aug 2013 14:10:43 +0400
Message-ID: <20130829101042.GA13306@beaver>
In-Reply-To: <20130825142837.GD31370@twins.programming.kicks-ass.net>
Hi Peter,
On Sun, Aug 25, 2013 at 04:28:37PM +0200, Peter Zijlstra wrote:
>
> Gargh.. I've never seen anything like that. Nor ever had a report like
> this. Is there anything in particular one can do to try and reproduce
> this?
I don't know how to reproduce it. It happens by itself, and only on
heavily loaded servers. For example, it happens almost every hour on one
server running kernel 3.8.11 with 10k web sites and 5k MySQL databases.
On another server running kernel 3.9.4 under the same load, it happens
3-5 times per day. Sometimes it happens almost simultaneously on both
servers.
On the servers where this happened often, I went back to kernel 2.6.35;
the remaining servers are not loaded heavily enough for the effect to
appear.
For example, here is a server that previously ran kernel 3.9.4. It is
heavily loaded, but migration stopped eating CPU after the downgrade to
2.6.35.
# uname -r
2.6.35.7
# uptime
13:56:34 up 32 days, 10:31, 10 users, load average: 24.44, 23.44, 24.13
# ps -u root -o user,bsdtime,comm | grep -E 'COMMAND|migration'
USER TIME COMMAND
root 4:20 migration/0
root 6:07 migration/1
root 17:00 migration/2
root 5:23 migration/3
root 16:43 migration/4
root 3:48 migration/5
root 12:28 migration/6
root 3:44 migration/7
root 12:25 migration/8
root 3:49 migration/9
root 1:52 migration/10
root 2:51 migration/11
root 1:28 migration/12
root 2:43 migration/13
root 2:16 migration/14
root 4:53 migration/15
root 2:15 migration/16
root 4:13 migration/17
root 2:13 migration/18
root 4:21 migration/19
root 2:07 migration/20
root 4:13 migration/21
root 2:13 migration/22
root 3:26 migration/23
For comparison, here is a server running 3.9.4:
# uptime
13:55:49 up 11 days, 15:36, 11 users, load average: 24.62, 24.36, 23.63
USER TIME COMMAND
root 233:51 migration/0
root 233:38 migration/1
root 231:57 migration/2
root 233:26 migration/3
root 231:46 migration/4
root 233:26 migration/5
root 231:37 migration/6
root 232:56 migration/7
root 231:09 migration/8
root 232:34 migration/9
root 231:04 migration/10
root 232:22 migration/11
root 230:50 migration/12
root 232:16 migration/13
root 230:38 migration/14
root 231:51 migration/15
root 230:04 migration/16
root 230:16 migration/17
root 230:06 migration/18
root 230:22 migration/19
root 229:45 migration/20
root 229:43 migration/21
root 229:27 migration/22
root 229:24 migration/23
root 229:11 migration/24
root 229:25 migration/25
root 229:16 migration/26
root 228:58 migration/27
root 228:48 migration/28
root 229:06 migration/29
root 228:25 migration/30
root 228:25 migration/31
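
The two listings cover different uptimes, so the accumulated times are easier to compare once they are divided by uptime. The following is only a rough sketch and not part of the original report: the helper names are hypothetical, and only the first four values from each listing are used as sample input.

```python
def bsdtime_to_minutes(value):
    """Convert a ps 'bsdtime' field (MMM:SS) to minutes."""
    minutes, seconds = value.split(":")
    return int(minutes) + int(seconds) / 60.0


def uptime_to_days(days, hhmm):
    """Convert the 'up D days, HH:MM' part of uptime output to days."""
    hours, minutes = hhmm.split(":")
    return days + (int(hours) + int(minutes) / 60.0) / 24.0


def minutes_per_day(times, uptime_days):
    """Average accumulated CPU minutes per migration thread per day of uptime."""
    total = sum(bsdtime_to_minutes(t) for t in times)
    return total / len(times) / uptime_days


# Sample values copied from the listings above (migration/0 to migration/3 only).
old_kernel = minutes_per_day(["4:20", "6:07", "17:00", "5:23"],
                             uptime_to_days(32, "10:31"))
new_kernel = minutes_per_day(["233:51", "233:38", "231:57", "233:26"],
                             uptime_to_days(11, "15:36"))

print("2.6.35: %.2f min/day per thread" % old_kernel)
print("3.9.4:  %.2f min/day per thread" % new_kernel)
```

For this sample, the result is roughly 20 minutes per thread per day on 3.9.4, against well under one minute on 2.6.35.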
> Could you perhaps send your .config and a function (or function-graph)
> trace for when this happens?
My .config:
https://www.dropbox.com/s/vuwvalj58cfgahu/.config_3.9.4-1gb-csmb-tr
I can't capture a trace because tracing isn't enabled in my kernels.
There are many clients on these servers, so I can only reboot them on
the weekend; I will send you a trace after that.
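
Once tracing is available, a function-graph trace could be collected roughly as follows. This is a hedged sketch, not something taken from this thread: it assumes a kernel built with CONFIG_FUNCTION_GRAPH_TRACER, debugfs mounted at /sys/kernel/debug, and root privileges. The function name `capture_trace`, its parameters and the output file name are hypothetical.

```python
import os
import time

# Assumption: debugfs is mounted at /sys/kernel/debug.
TRACING_DIR = "/sys/kernel/debug/tracing"


def write_control(tracing_dir, name, value):
    """Write a value to one of the ftrace control files."""
    with open(os.path.join(tracing_dir, name), "w") as f:
        f.write(value)


def capture_trace(tracing_dir=TRACING_DIR, tracer="function_graph",
                  seconds=5.0, out_path="migration-trace.txt"):
    """Trace for a short window and copy the trace buffer to out_path."""
    write_control(tracing_dir, "current_tracer", tracer)
    write_control(tracing_dir, "tracing_on", "1")
    try:
        time.sleep(seconds)
    finally:
        # Always stop tracing, even if the wait is interrupted.
        write_control(tracing_dir, "tracing_on", "0")

    # Copy the buffer before resetting the tracer, which clears it.
    with open(os.path.join(tracing_dir, "trace")) as src:
        with open(out_path, "w") as dst:
            for line in src:
                dst.write(line)

    write_control(tracing_dir, "current_tracer", "nop")
    return out_path
```

Calling `capture_trace(seconds=10)` as root while the migration threads are busy would leave the trace in `migration-trace.txt`. On a heavily loaded machine a short window is advisable, since the function-graph tracer adds noticeable overhead.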
> Also, do you use weird things like cgroup/cpusets or other such fancy
> stuff? If so, could you outline your setup?
The Grsec patch is applied to all kernels. In addition, the following
patch is applied only to kernel 3.8.11:
--- kernel/cgroup.c.orig
+++ kernel/cgroup.c
@@ -1931,7 +1931,8 @@
 			ss->attach(cgrp, &tset);
 	}
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 	/*
 	 * wake up rmdir() waiter. the rmdir should fail since the
I also use https://github.com/facebook/flashcache/
And yes, I do use cgroups, namely the cpuacct, memory and blkio
controllers. I create a cgroup for every user on the server, and all of
that user's processes run in it. To make this work, patches are needed
in Apache/prefork, SSH and other user-related software. There can be
about 10k-15k users and, accordingly, the same number of cgroups.
The other day I disabled all cgroups, but the controllers are still mounted.
# cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 2 1 1
cpuacct 3 1 1
memory 4 1 1
blkio 5 1 1
But migration still eats CPU. Note, however, that I also use cgroups on
kernel 2.6.35.
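
In the /proc/cgroups listing above, num_cgroups is 1 for every controller, meaning only the root cgroup is left in each hierarchy. The check can be sketched as below; this is an illustration only, the function name is hypothetical, and the sample text is copied from the listing above.

```python
def parse_proc_cgroups(text):
    """Parse /proc/cgroups content into a dict keyed by controller name."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#subsys_name ...' header.
        if not line or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        result[name] = {
            "hierarchy": int(hierarchy),
            "num_cgroups": int(num_cgroups),
            "enabled": enabled == "1",
        }
    return result


# Sample input copied from the listing above; on a live system,
# read the text from /proc/cgroups instead.
SAMPLE = """#subsys_name hierarchy num_cgroups enabled
cpuset 2 1 1
cpuacct 3 1 1
memory 4 1 1
blkio 5 1 1
"""

controllers = parse_proc_cgroups(SAMPLE)
# num_cgroups == 1 means the hierarchy contains only its root cgroup.
only_root = all(info["num_cgroups"] == 1 for info in controllers.values())
print(only_root)
```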
--
BRGDS. Alexey Vlasov.