From: Alexey Vlasov <renton@renton.name>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Galbraith <bitbucket@online.de>,
	linux-kernel@vger.kernel.org
Subject: Re: Kernel migration eat CPUs
Date: Thu, 29 Aug 2013 14:10:43 +0400
Message-ID: <20130829101042.GA13306@beaver>
In-Reply-To: <20130825142837.GD31370@twins.programming.kicks-ass.net>

Hi Peter,

On Sun, Aug 25, 2013 at 04:28:37PM +0200, Peter Zijlstra wrote:
>
> Gargh.. I've never seen anything like that. Nor ever had a report like
> this. Is there anything in particular one can do to try and reproduce
> this?

I don't know how to reproduce it. It happens by itself, and only on
heavily loaded servers. For example, it happens almost every hour on one
server running kernel 3.8.11 with 10k web sites and 5k MySQL databases.
On another server running kernel 3.9.4 under the same load it happens
3-5 times per day. Sometimes it happens almost simultaneously on both
servers.
I have gone back to kernel 2.6.35 on the servers where this happened
often. The remaining servers are apparently not loaded heavily enough
for the effect to appear.

For example, here is a server that previously ran kernel 3.9.4. It is
heavily loaded, but migration stopped eating CPU after the downgrade to
2.6.35.

# uname -r
2.6.35.7

# uptime
13:56:34 up 32 days, 10:31, 10 users, load average: 24.44, 23.44, 24.13

# ps -u root -o user,bsdtime,comm | grep -E 'COMMAND|migration'
USER       TIME COMMAND
root       4:20 migration/0
root       6:07 migration/1
root      17:00 migration/2
root       5:23 migration/3
root      16:43 migration/4
root       3:48 migration/5
root      12:28 migration/6
root       3:44 migration/7
root      12:25 migration/8
root       3:49 migration/9
root       1:52 migration/10
root       2:51 migration/11
root       1:28 migration/12
root       2:43 migration/13
root       2:16 migration/14
root       4:53 migration/15
root       2:15 migration/16
root       4:13 migration/17
root       2:13 migration/18
root       4:21 migration/19
root       2:07 migration/20
root       4:13 migration/21
root       2:13 migration/22
root       3:26 migration/23

For comparison, a server running 3.9.4:
# uptime
13:55:49 up 11 days, 15:36, 11 users, load average: 24.62, 24.36, 23.63

USER       TIME COMMAND
root     233:51 migration/0
root     233:38 migration/1
root     231:57 migration/2
root     233:26 migration/3
root     231:46 migration/4
root     233:26 migration/5
root     231:37 migration/6
root     232:56 migration/7
root     231:09 migration/8
root     232:34 migration/9
root     231:04 migration/10
root     232:22 migration/11
root     230:50 migration/12
root     232:16 migration/13
root     230:38 migration/14
root     231:51 migration/15
root     230:04 migration/16
root     230:16 migration/17
root     230:06 migration/18
root     230:22 migration/19
root     229:45 migration/20
root     229:43 migration/21
root     229:27 migration/22
root     229:24 migration/23
root     229:11 migration/24
root     229:25 migration/25
root     229:16 migration/26
root     228:58 migration/27
root     228:48 migration/28
root     229:06 migration/29
root     228:25 migration/30
root     228:25 migration/31
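
The two listings are easier to compare once the TIME values are divided
by uptime. A minimal sketch, assuming the bsdtime column is in
MINUTES:SECONDS form as in the listings above; the function name
migration_minutes_per_day is made up for illustration, and the uptime in
days is passed in by hand:

```shell
# Sum the TIME column of migration/N lines read from stdin and print
# the total plus the average per thread per day of uptime.
# $1: uptime in days (may be fractional), e.g. 11.65
migration_minutes_per_day() {
    awk -v days="$1" '
        $3 ~ /^migration\// {
            split($2, t, ":")            # t[1] = minutes, t[2] = seconds
            total += t[1] + t[2] / 60
            n++
        }
        END {
            if (n == 0 || days <= 0) exit 1
            printf "%d threads, %.1f min total, %.2f min/thread/day\n",
                   n, total, total / n / days
        }'
}

# Example (not run here):
#   ps -u root -o user,bsdtime,comm | migration_minutes_per_day 11.65
```

Fed with the listings above, this should come out at roughly 20 minutes
per thread per day on 3.9.4 (about 230 minutes over 11.65 days), against
well under one minute per thread per day on 2.6.35.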
 

> Could you perhaps send your .config and a function (or function-graph)
> trace for when this happens?

My .config
https://www.dropbox.com/s/vuwvalj58cfgahu/.config_3.9.4-1gb-csmb-tr

I can't capture a trace because tracing isn't enabled in my kernels.
Since there are many clients on these servers, I can only reboot them on
the weekend; I will send you a trace after that.
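
For reference, once a kernel with the function tracer is booted, a trace
of one migration thread could be recorded roughly as follows. This is
only a sketch: it assumes CONFIG_FUNCTION_TRACER is enabled, that the
tracing directory is available (commonly /sys/kernel/debug/tracing), and
that it is run as root. The function name trace_pid and the output path
are made up for illustration.

```shell
# Record a function trace limited to one PID for a few seconds.
# $1: tracing directory, $2: PID to trace,
# $3: seconds to record, $4: file to save the trace to
trace_pid() {
    tdir=$1 pid=$2 secs=$3 out=$4
    echo 0        > "$tdir/tracing_on"     || return 1  # stop recording while configuring
    echo "$pid"   > "$tdir/set_ftrace_pid" || return 1  # only trace this PID
    echo function > "$tdir/current_tracer" || return 1  # select the function tracer
    echo 1        > "$tdir/tracing_on"                  # start recording
    sleep "$secs"
    echo 0        > "$tdir/tracing_on"                  # stop recording
    cat "$tdir/trace" > "$out"                          # save the buffer
    echo nop      > "$tdir/current_tracer"              # switch the tracer off again
}

# Example (not run here):
#   trace_pid /sys/kernel/debug/tracing "$(pgrep -x migration/0)" 5 /tmp/migration0.trace
```

Writing function_graph instead of function to current_tracer would give
the function-graph variant, provided that tracer is built in as well.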

> Also, do you use weird things like cgroup/cpusets or other such fancy
> stuff? If so, could you outline your setup?

The grsecurity patch is applied to all kernels. In addition, the
following patch is applied only on kernel 3.8.11:

--- kernel/cgroup.c.orig
+++ kernel/cgroup.c 
@@ -1931,7 +1931,8 @@
                           ss->attach(cgrp, &tset);
        }
-       synchronize_rcu();
+       synchronize_rcu_expedited();

        /*
	 * wake up rmdir() waiter. the rmdir should fail since the

Also, I use https://github.com/facebook/flashcache/

I do use cgroups, namely the cpuacct, memory and blkio controllers.
I create a cgroup for every user on the server, and all of that user's
processes run inside it. Making this work requires patches to
Apache/prefork, SSH and other user-facing software. There can be about
10k-15k users and, accordingly, the same number of cgroups.
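
The patched daemons are not shown here, but the effect per process is
roughly what the following sketch does by hand. It assumes the cgroup v1
layout with each controller mounted in its own subdirectory of a common
root, and one cgroup directory per user named after the user; the
function name user_cgroup_add and the example mount root are
hypothetical, not the actual setup.

```shell
# Place one process into the per-user cgroup of each controller.
# $1: directory holding one mount point per controller (assumed layout)
# $2: user name, $3: PID to move
user_cgroup_add() {
    root=$1 user=$2 pid=$3
    for ctrl in cpuacct memory blkio; do
        # Creating the directory creates the cgroup (no-op if it exists).
        mkdir -p "$root/$ctrl/$user" || return 1
        # Writing a PID to "tasks" moves that task into the cgroup;
        # children it forks afterwards start in the same cgroup.
        echo "$pid" > "$root/$ctrl/$user/tasks" || return 1
    done
}

# Example (not run here; /cgroup is an assumed mount root):
#   user_cgroup_add /cgroup someuser "$$"
```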

The other day I disabled all cgroups, but the controllers are still
mounted.

# cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  2       1       1
cpuacct 3       1       1
memory  4       1       1
blkio   5       1       1

But migration still eats CPU. Note, however, that I also use cgroups on
kernel 2.6.35.
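
A num_cgroups value of 1 means that only the root cgroup is left in that
hierarchy. A small sketch for checking this on many servers at once; it
assumes the four-column /proc/cgroups format shown above, and the
function name nonroot_cgroups is made up for illustration:

```shell
# Read /proc/cgroups-style text from stdin and print every controller
# that still has cgroups besides the root one, with their count.
nonroot_cgroups() {
    awk '!/^#/ && $3 > 1 { print $1, $3 - 1 }'
}

# Example (not run here):
#   nonroot_cgroups < /proc/cgroups
```

Empty output means that no per-user cgroups remain in any hierarchy.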

-- 
BRGDS. Alexey Vlasov.
