From: Alexey Vlasov <renton@renton.name>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Galbraith <bitbucket@online.de>,
	linux-kernel@vger.kernel.org
Subject: Re: Kernel migration eat CPUs
Date: Thu, 29 Aug 2013 14:10:43 +0400
Message-ID: <20130829101042.GA13306@beaver>
In-Reply-To: <20130825142837.GD31370@twins.programming.kicks-ass.net>

Hi Peter,

On Sun, Aug 25, 2013 at 04:28:37PM +0200, Peter Zijlstra wrote:
>
> Gargh.. I've never seen anything like that. Nor ever had a report like
> this. Is there anything in particular one can do to try and reproduce
> this?

I don't know how to reproduce it. It happens by itself, and only on
heavily loaded servers. For example, it happens almost every hour on one
server running kernel 3.8.11 with 10k web sites and 5k MySQL databases.
On another server running kernel 3.9.4 with the same load it happens 3-5
times per day. Sometimes it happens almost simultaneously on both
servers.
On the other servers I have either gone back to kernel 2.6.35, where
this happened often, or they are not loaded heavily enough for the
effect to appear.

For example, here is a server that previously ran kernel 3.9.4. It is
heavily loaded, but the migration threads stopped eating CPU after the
downgrade to 2.6.35.

# uname -r
2.6.35.7

# uptime
13:56:34 up 32 days, 10:31, 10 users, load average: 24.44, 23.44, 24.13

# ps -u root -o user,bsdtime,comm | grep -E 'COMMAND|migration'
USER       TIME COMMAND
root       4:20 migration/0
root       6:07 migration/1
root      17:00 migration/2
root       5:23 migration/3
root      16:43 migration/4
root       3:48 migration/5
root      12:28 migration/6
root       3:44 migration/7
root      12:25 migration/8
root       3:49 migration/9
root       1:52 migration/10
root       2:51 migration/11
root       1:28 migration/12
root       2:43 migration/13
root       2:16 migration/14
root       4:53 migration/15
root       2:15 migration/16
root       4:13 migration/17
root       2:13 migration/18
root       4:21 migration/19
root       2:07 migration/20
root       4:13 migration/21
root       2:13 migration/22
root       3:26 migration/23

For comparison, a server running 3.9.4:
# uptime
13:55:49 up 11 days, 15:36, 11 users, load average: 24.62, 24.36, 23.63

USER       TIME COMMAND
root     233:51 migration/0
root     233:38 migration/1
root     231:57 migration/2
root     233:26 migration/3
root     231:46 migration/4
root     233:26 migration/5
root     231:37 migration/6
root     232:56 migration/7
root     231:09 migration/8
root     232:34 migration/9
root     231:04 migration/10
root     232:22 migration/11
root     230:50 migration/12
root     232:16 migration/13
root     230:38 migration/14
root     231:51 migration/15
root     230:04 migration/16
root     230:16 migration/17
root     230:06 migration/18
root     230:22 migration/19
root     229:45 migration/20
root     229:43 migration/21
root     229:27 migration/22
root     229:24 migration/23
root     229:11 migration/24
root     229:25 migration/25
root     229:16 migration/26
root     228:58 migration/27
root     228:48 migration/28
root     229:06 migration/29
root     228:25 migration/30
root     228:25 migration/31
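
To make the two listings easier to compare, the accumulated CPU time of
a thread can be divided by the uptime of its server. The sketch below
does this for the busiest migration thread in each listing. The function
names are made up for illustration, and the result is only an average
over the whole uptime, so it hides the bursts described above.

```python
def bsdtime_to_seconds(bsdtime):
    """Convert a ps 'bsdtime' value (MMM:SS, accumulated CPU time) to seconds."""
    minutes, seconds = bsdtime.split(":")
    return int(minutes) * 60 + int(seconds)


def uptime_to_seconds(days, hours, minutes):
    """Convert the 'up D days, HH:MM' part of uptime output to seconds."""
    return (days * 24 + hours) * 3600 + minutes * 60


def cpu_share_percent(bsdtime, days, hours, minutes):
    """Average share of one CPU used by a thread since boot, in percent."""
    return 100.0 * bsdtime_to_seconds(bsdtime) / uptime_to_seconds(days, hours, minutes)


# Values copied from the two listings above.
old = cpu_share_percent("17:00", 32, 10, 31)   # migration/2 on 2.6.35.7
new = cpu_share_percent("233:51", 11, 15, 36)  # migration/0 on 3.9.4

print(f"2.6.35.7: {old:.3f}% of one CPU")
print(f"3.9.4:    {new:.3f}% of one CPU")
print(f"ratio:    {new / old:.1f}x")
```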
 

> Could you perhaps send your .config and a function (or function-graph)
> trace for when this happens?

My .config:
https://www.dropbox.com/s/vuwvalj58cfgahu/.config_3.9.4-1gb-csmb-tr

I can't capture a trace right now because tracing isn't enabled in my
kernels. There are many clients on these servers, so I can only reboot
them on the weekend; I will send you a trace after that.

> Also, do you use weird things like cgroup/cpusets or other such fancy
> stuff? If so, could you outline your setup?

The grsecurity patch is applied to all kernels. In addition, the
following patch is applied only on kernel 3.8.11:

--- kernel/cgroup.c.orig
+++ kernel/cgroup.c 
@@ -1931,7 +1931,8 @@
                           ss->attach(cgrp, &tset);
        }
-       synchronize_rcu();
+       synchronize_rcu_expedited();

        /*
	 * wake up rmdir() waiter. the rmdir should fail since the

I also use https://github.com/facebook/flashcache/

I do use cgroups, namely the cpuacct, memory and blkio controllers. I
create a cgroup for every user on the server, and all of that user's
processes run inside it. To make this work, patches are needed in
Apache/prefork, SSH and other user-related software. There can be about
10k-15k users and, accordingly, the same number of cgroups.
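
The per-user setup can be sketched roughly as follows. This is not the
code from the Apache or SSH patches. It only illustrates the cgroup v1
mechanics, assuming one mount point per controller under a common root
directory (for example /sys/fs/cgroup, which is an assumption) and a
hypothetical helper name.

```python
from pathlib import Path

# Controllers named in this message; each is assumed to be mounted
# separately at <cgroup_root>/<controller>.
CONTROLLERS = ("cpuacct", "memory", "blkio")


def attach_to_user_cgroup(cgroup_root, user, pid, controllers=CONTROLLERS):
    """Create <cgroup_root>/<controller>/<user> if needed and move pid into it."""
    for controller in controllers:
        group = Path(cgroup_root) / controller / user
        # On a cgroup filesystem, mkdir creates a new cgroup.
        group.mkdir(parents=True, exist_ok=True)
        # In cgroup v1, writing a PID to 'tasks' moves that task into the group.
        (group / "tasks").write_text(f"{pid}\n")


# Hypothetical usage (needs root and mounted v1 controllers):
# import os
# attach_to_user_cgroup("/sys/fs/cgroup", "user1234", os.getpid())
```

With 10k-15k users this produces the same number of directories in each
controller's hierarchy.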

The other day I disabled all cgroups, but the controllers are still
mounted.

# cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  2       1       1
cpuacct 3       1       1
memory  4       1       1
blkio   5       1       1
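
The table above can be read mechanically: a num_cgroups value of 1
means only the root cgroup is left in that hierarchy. A minimal sketch
that parses the same text (the function name is made up for
illustration):

```python
def parse_proc_cgroups(text):
    """Parse /proc/cgroups content into a dict keyed by subsystem name."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        name, hierarchy, num_cgroups, enabled = line.split()
        result[name] = {
            "hierarchy": int(hierarchy),
            "num_cgroups": int(num_cgroups),
            "enabled": enabled == "1",
        }
    return result


# The output quoted above.
SAMPLE = """\
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  2       1       1
cpuacct 3       1       1
memory  4       1       1
blkio   5       1       1
"""

info = parse_proc_cgroups(SAMPLE)
# Hierarchies that still contain cgroups besides the root one.
leftover = {name: v["num_cgroups"] for name, v in info.items() if v["num_cgroups"] > 1}
print(leftover or "only the root cgroup remains in every hierarchy")
```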

But the migration threads still eat CPU. Note, however, that I also use
cgroups on kernel 2.6.35.

-- 
BRGDS. Alexey Vlasov.
