From: Simon Kirby <sim@hostway.ca>
To: Mark Moseley <moseleymark@gmail.com>
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: System CPU increasing on idle 2.6.36
Date: Fri, 7 Jan 2011 16:52:03 -0800 [thread overview]
Message-ID: <20110108005203.GD24433@hostway.ca> (raw)
In-Reply-To: <AANLkTinLsREtOqNGW2j5m0UYros1AgTu953JHjjJZ-F+@mail.gmail.com>
On Fri, Jan 07, 2011 at 10:05:39AM -0800, Mark Moseley wrote:
> NOTE: NFS/Kernel guys: I've left the 2.6.36.2 box still thrashing
> about in case there's something you'd like me to look at.
>
> Ok, both my 2.6.36.2 kernel box and my 2.6.35->2.6.36 bisect box (was
> bisected to ce7db282a3830f57f5b05ec48288c23a5c4d66d5 -- this is my
> first time doing bisect, so I'll preemptively apologize for doing
That's a merge commit, so that doesn't sound right (it doesn't contain
changes by itself). How were you running the bisect? Were you
definitely seeing a good case since v2.6.35? Could you post your
"git bisect log" so I can see if I can follow it?
> anything silly) both went berserk within 15 mins of each other, after
> an uptime of around 63 hours for 2.6.36.2 and 65 hours for the
> bisected box. The 2.6.36.2 one is still running with all the various
> flush-0:xx threads spinning wildly. The bisected box just keeled over
> and died, but is back up now. The only kernel messages logged are just
> of the "task kworker/4:1:359 blocked for more than 120 seconds"
> variety, all with _raw_spin_lock_irq at the top of the stack trace.
>
> Looking at slabinfo, specifically, this: stats=$(grep nfs_inode_cache
> /proc/slabinfo | awk '{ print $2, $3 }')
> the active_objs and num_objs both increase to over a million (these
> boxes are delivering mail to NFS-mounted mailboxes, so that's
> perfectly reasonable). On both boxes, looking at sar, things start to
> go awry around 10am today EST. At that time on the 2.6.36.2 box, the
> NFS numbers look like this:
>
> Fri Jan 7 09:58:00 2011: 1079433 1079650
> Fri Jan 7 09:59:00 2011: 1079632 1080300
> Fri Jan 7 10:00:00 2011: 1080196 1080300
> Fri Jan 7 10:01:01 2011: 1080599 1080716
> Fri Jan 7 10:02:01 2011: 1081074 1081288
>
> on the bisected, like this:
>
> Fri Jan 7 09:59:34 2011: 1162786 1165320
> Fri Jan 7 10:00:34 2011: 1163301 1165320
> Fri Jan 7 10:01:34 2011: 1164369 1165450
> Fri Jan 7 10:02:35 2011: 1164179 1165450
> Fri Jan 7 10:03:35 2011: 1165795 1166958
>
> When the bisected box finally died, the last numbers were:
>
> Fri Jan 7 10:40:33 2011: 1177156 1177202
> Fri Jan 7 10:42:21 2011: 1177157 1177306
> Fri Jan 7 10:44:55 2011: 1177201 1177324
> Fri Jan 7 10:45:55 2011: 1177746 1177826
Yeah, I was seeing load proportional to these, but I don't think their
growth is anything wrong. It just seems to be something that is taking
up CPU from flush proceses as they grow, and something which is causing a
thundering herd of reclaim with lots of spinlock contention.
Hmmm.. I tried to see if drop_caches would clear nfs_inode_cache and thus
the CPU usage. The top of slabtop before doing anything:
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
1256343 1221799 97% 0.95K 38071 33 1218272K nfs_inode_cache
# echo 2 > /proc/sys/vm/drop_caches
this took about 40 seconds to execute, and perf top showed this for
pretty much the whole time:
samples pcnt function DSO
_______ _____ ________________________ ___________
1188.00 45.1% _raw_spin_lock [kernel]
797.00 30.2% rpcauth_cache_shrinker [kernel]
132.00 5.0% shrink_slab [kernel]
60.00 2.3% queue_io [kernel]
59.00 2.2% _cond_resched [kernel]
44.00 1.7% nfs_writepages [kernel]
35.00 1.3% writeback_single_inode [kernel]
34.00 1.3% writeback_sb_inodes [kernel]
23.00 0.9% do_writepages [kernel]
20.00 0.8% bit_waitqueue [kernel]
15.00 0.6% redirty_tail [kernel]
11.00 0.4% __iget [kernel]
11.00 0.4% __GI_strstr libc-2.7.so
slabtop then showed:
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
1256211 531720 42% 0.95K 38067 33 1218144K nfs_inode_cache
and this doesn't change further with drop_caches=3 or 1. It's strange
that most of them were freed, but not all... We do actually have
RPC_CREDCACHE_DEFAULT_HASHBITS set to 10 at build time, which might
explain a bigger credcache and time to free that. Did you change your
auth_hashtable_size at all? Anyway, even with all of that, system CPU
doesn't change.
Ok, weird, Running "sync" actually caused "vmstat 1" to go from 25% CPU
spikes every 5 seconds with nfs_writepages in the "perf top" every 5
seconds to being idle! /proc/vmstat showed only nr_dirty 8 and
nr_writeback 0 before I ran "sync". Either that, or it was just good
timing.
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 6767892 9696 41088 0 0 5 180 15 70 62 7 29 1
0 0 0 6767892 9696 41088 0 0 0 0 1124 1978 0 0 100 0
0 0 0 6767892 9696 41088 0 0 0 0 1105 1967 0 0 100 0
0 0 0 6767892 9696 41088 0 0 0 0 1180 1984 0 2 98 0
2 0 0 6767892 9696 41088 0 0 0 0 2102 1968 0 22 78 0
0 0 0 6767768 9704 41084 0 0 0 52 1504 2022 0 8 91 1
0 0 0 6767768 9704 41084 0 0 0 0 1059 1965 0 0 100 0
0 0 0 6767768 9704 41088 0 0 0 0 1077 1982 0 0 100 0
0 0 0 6767768 9704 41088 0 0 0 0 1166 1972 1 2 97 0
0 0 0 6767768 9704 41088 0 0 0 0 1353 1990 0 6 94 0
0 0 0 6767644 9704 41100 0 0 0 0 2531 2050 0 27 72 0
0 0 0 6767644 9704 41100 0 0 0 0 1065 1974 0 0 100 0
1 0 0 6767644 9704 41100 0 0 0 0 1090 1967 0 0 100 0
0 0 0 6767644 9704 41100 0 0 0 0 1175 1971 0 2 98 0
0 0 0 6767644 9704 41100 0 0 0 0 1321 1967 0 6 94 0
1 0 0 6767644 9712 41100 0 0 0 52 1995 1995 0 18 81 1
0 0 0 6767644 9712 41100 0 0 0 0 1342 1964 0 6 94 0
0 0 0 6767644 9712 41100 0 0 0 0 1064 1974 0 0 100 0
0 0 0 6767644 9712 41100 0 0 0 0 1163 1968 0 2 98 0
0 0 0 6767644 9712 41100 0 0 0 0 1390 1976 1 6 94 0
0 0 0 6767768 9712 41112 0 0 0 0 1282 2018 0 1 99 0
0 0 0 6767768 9712 41112 0 0 0 0 2166 1972 0 24 76 0
0 0 0 6786260 9712 41112 0 0 0 0 1092 1989 0 0 100 0
0 0 0 6786260 9712 41112 0 0 0 0 1039 2003 0 2 98 0
0 0 0 6774588 9712 41112 0 0 0 0 1524 2015 3 9 89 0
0 1 0 6770124 9716 41112 0 0 0 48 1315 1990 3 0 87 11
1 0 0 6769628 9720 41112 0 0 0 4 2196 1966 0 20 79 0
0 0 0 6768364 9724 41108 0 0 36 0 1403 2031 1 5 94 1
0 0 0 6768388 9724 41144 0 0 0 0 1185 1969 0 2 98 0
0 0 0 6768388 9724 41144 0 0 0 0 1351 1975 0 6 94 0
0 0 0 6768140 9724 41156 0 0 0 0 1261 2020 0 1 99 0
0 0 0 6768140 9724 41156 0 0 0 0 1053 1973 0 0 100 0
1 0 0 6768140 9724 41156 0 0 0 0 2194 1971 1 26 73 0
0 0 0 6768140 9732 41148 0 0 0 52 1283 2088 0 4 96 0
2 0 0 6768140 9732 41156 0 0 0 0 1184 1965 0 2 98 0
0 0 0 6786368 9732 41156 0 0 0 0 1297 2000 0 4 97 0
0 0 0 6786384 9732 41156 0 0 0 0 998 1949 0 0 100 0
(stop vmstat 1, run sync, restart vmstat 1)
1 0 0 6765304 9868 41736 0 0 0 0 1239 1967 0 0 100 0
0 0 0 6765304 9868 41736 0 0 0 0 1174 2035 0 0 100 0
0 0 0 6765304 9868 41736 0 0 0 0 1084 1976 0 0 100 0
1 0 0 6765172 9868 41736 0 0 0 0 1100 1975 0 0 100 0
0 0 0 6765304 9868 41760 0 0 0 0 1501 1997 1 1 98 0
0 0 0 6765304 9868 41760 0 0 0 0 1092 1964 0 0 100 0
1 0 0 6765304 9868 41760 0 0 0 0 1096 1963 0 0 100 0
0 0 0 6765304 9868 41760 0 0 0 0 1083 1967 0 0 100 0
0 0 0 6765304 9876 41752 0 0 0 76 1101 1980 0 0 99 1
0 0 0 6765304 9876 41760 0 0 0 4 1199 2097 0 2 98 0
0 0 0 6765304 9876 41760 0 0 0 0 1055 1964 0 0 100 0
0 0 0 6765304 9876 41760 0 0 0 0 1064 1977 0 0 100 0
0 0 0 6765304 9876 41760 0 0 0 0 1073 1968 0 0 100 0
0 0 0 6765304 9876 41760 0 0 0 0 1080 1976 0 0 100 0
0 0 0 6765156 9884 41772 0 0 0 20 1329 2034 1 1 98 1
0 0 0 6765180 9884 41780 0 0 0 0 1097 1971 0 1 99 0
0 0 0 6765180 9884 41780 0 0 0 0 1073 1960 0 0 100 0
0 0 0 6765180 9884 41780 0 0 0 0 1079 1976 0 0 100 0
0 0 0 6765172 9884 41780 0 0 0 0 1120 1980 0 0 100 0
0 0 0 6765180 9892 41772 0 0 0 60 1097 1982 0 0 100 0
0 0 0 6765180 9892 41780 0 0 0 0 1067 1969 0 0 100 0
0 0 0 6765180 9892 41780 0 0 0 0 1063 1973 0 0 100 0
0 0 0 6765180 9892 41780 0 0 0 0 1064 1969 0 1 100 0
Hmmm. I'll watch it and see if it climbs back up again. That would mean
that for now, echo 2 > /proc/sys/vm/drop_caches ; sync is a workaround
for the problem we're seeing with system CPU slowly creeping up since
2.6.36. I have no idea if this affects the sudden flush process spinlock
contention and attempted writeback without much progress during normal
load, but it certainly changes things here.
http://0x.ca/sim/ref/2.6.37/cpu_after_drop_caches_2_and_sync.png
Simon-
next prev parent reply other threads:[~2011-01-08 0:52 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-08 21:25 System CPU increasing on idle 2.6.36 Simon Kirby
2010-12-08 21:53 ` Trond Myklebust
2010-12-08 22:36 ` Simon Kirby
2010-12-09 4:37 ` Trond Myklebust
2010-12-14 23:38 ` Simon Kirby
2010-12-15 1:10 ` Simon Kirby
2010-12-15 1:56 ` Simon Kirby
2010-12-15 18:08 ` J. Bruce Fields
2010-12-15 18:22 ` Trond Myklebust
2010-12-15 18:38 ` J. Bruce Fields
2010-12-15 19:33 ` Trond Myklebust
2010-12-15 19:49 ` J. Bruce Fields
2010-12-15 19:57 ` Trond Myklebust
2010-12-15 20:19 ` J. Bruce Fields
2010-12-15 20:32 ` Trond Myklebust
2010-12-15 21:48 ` J. Bruce Fields
2010-12-15 22:15 ` Trond Myklebust
2010-12-15 22:29 ` J. Bruce Fields
2010-12-15 22:55 ` J. Bruce Fields
2010-12-15 23:58 ` Trond Myklebust
2010-12-16 0:36 ` J. Bruce Fields
2011-09-27 0:39 ` NFS client growing system CPU Simon Kirby
2011-09-27 11:42 ` Trond Myklebust
2011-09-27 16:49 ` Simon Kirby
2011-09-27 17:04 ` Trond Myklebust
2011-09-28 19:58 ` Simon Kirby
2011-09-30 0:58 ` Simon Kirby
2011-09-30 1:11 ` Myklebust, Trond
2011-10-05 23:07 ` Simon Kirby
2010-12-18 1:08 ` System CPU increasing on idle 2.6.36 Simon Kirby
2010-12-21 20:31 ` Mark Moseley
2010-12-29 22:03 ` Simon Kirby
2011-01-04 17:42 ` Mark Moseley
2011-01-04 21:40 ` Simon Kirby
2011-01-05 19:43 ` Mark Moseley
2011-01-07 18:05 ` Mark Moseley
2011-01-07 18:12 ` Mark Moseley
2011-01-07 19:33 ` Mark Moseley
2011-01-08 0:52 ` Simon Kirby [this message]
2011-01-08 1:30 ` Mark Moseley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110108005203.GD24433@hostway.ca \
--to=sim@hostway.ca \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=moseleymark@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.