Re: System CPU increasing on idle 2.6.36

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Simon Kirby <sim@hostway.ca>
To: Mark Moseley <moseleymark@gmail.com>
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: System CPU increasing on idle 2.6.36
Date: Fri, 7 Jan 2011 16:52:03 -0800	[thread overview]
Message-ID: <20110108005203.GD24433@hostway.ca> (raw)
In-Reply-To: <AANLkTinLsREtOqNGW2j5m0UYros1AgTu953JHjjJZ-F+@mail.gmail.com>

On Fri, Jan 07, 2011 at 10:05:39AM -0800, Mark Moseley wrote:

> NOTE: NFS/Kernel guys: I've left the 2.6.36.2 box still thrashing
> about in case there's something you'd like me to look at.
> 
> Ok, both my 2.6.36.2 kernel box and my 2.6.35->2.6.36 bisect box (was
> bisected to ce7db282a3830f57f5b05ec48288c23a5c4d66d5 -- this is my
> first time doing bisect, so I'll preemptively apologize for doing

That's a merge commit, so that doesn't sound right (it doesn't contain
changes by itself).  How were you running the bisect?  Were you
definitely seeing a good case since v2.6.35?  Could you post your
"git bisect log" so I can see if I can follow it?

> anything silly) both went berserk within 15 mins of each other, after
> an uptime of around 63 hours for 2.6.36.2 and 65 hours for the
> bisected box. The 2.6.36.2 one is still running with all the various
> flush-0:xx threads spinning wildly. The bisected box just keeled over
> and died, but is back up now. The only kernel messages logged are just
> of the "task kworker/4:1:359 blocked for more than 120 seconds"
> variety, all with _raw_spin_lock_irq at the top of the stack trace.
> 
> Looking at slabinfo, specifically, this: stats=$(grep nfs_inode_cache
> /proc/slabinfo | awk '{ print $2, $3 }')
> the active_objs and num_objs both increase to over a million (these
> boxes are delivering mail to NFS-mounted mailboxes, so that's
> perfectly reasonable). On both boxes, looking at sar, things start to
> go awry around 10am today EST. At that time on the 2.6.36.2 box, the
> NFS numbers look like this:
> 
> Fri Jan  7 09:58:00 2011: 1079433 1079650
> Fri Jan  7 09:59:00 2011: 1079632 1080300
> Fri Jan  7 10:00:00 2011: 1080196 1080300
> Fri Jan  7 10:01:01 2011: 1080599 1080716
> Fri Jan  7 10:02:01 2011: 1081074 1081288
> 
> on the bisected, like this:
> 
> Fri Jan  7 09:59:34 2011: 1162786 1165320
> Fri Jan  7 10:00:34 2011: 1163301 1165320
> Fri Jan  7 10:01:34 2011: 1164369 1165450
> Fri Jan  7 10:02:35 2011: 1164179 1165450
> Fri Jan  7 10:03:35 2011: 1165795 1166958
> 
> When the bisected box finally died, the last numbers were:
> 
> Fri Jan  7 10:40:33 2011: 1177156 1177202
> Fri Jan  7 10:42:21 2011: 1177157 1177306
> Fri Jan  7 10:44:55 2011: 1177201 1177324
> Fri Jan  7 10:45:55 2011: 1177746 1177826

Yeah, I was seeing load proportional to these, but I don't think their
growth is anything wrong.  It just seems to be something that is taking
up CPU from flush proceses as they grow, and something which is causing a
thundering herd of reclaim with lots of spinlock contention.

Hmmm.. I tried to see if drop_caches would clear nfs_inode_cache and thus
the CPU usage.  The top of slabtop before doing anything:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1256343 1221799  97%    0.95K  38071       33   1218272K nfs_inode_cache

# echo 2 > /proc/sys/vm/drop_caches

this took about 40 seconds to execute, and perf top showed this for
pretty much the whole time:

             samples  pcnt function                 DSO
             _______ _____ ________________________ ___________

             1188.00 45.1% _raw_spin_lock           [kernel]
              797.00 30.2% rpcauth_cache_shrinker   [kernel]
              132.00  5.0% shrink_slab              [kernel]
               60.00  2.3% queue_io                 [kernel]
               59.00  2.2% _cond_resched            [kernel]
               44.00  1.7% nfs_writepages           [kernel]
               35.00  1.3% writeback_single_inode   [kernel]
               34.00  1.3% writeback_sb_inodes      [kernel]
               23.00  0.9% do_writepages            [kernel]
               20.00  0.8% bit_waitqueue            [kernel]
               15.00  0.6% redirty_tail             [kernel]
               11.00  0.4% __iget                   [kernel]
               11.00  0.4% __GI_strstr              libc-2.7.so

slabtop then showed:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1256211 531720  42%    0.95K  38067       33   1218144K nfs_inode_cache

and this doesn't change further with drop_caches=3 or 1.  It's strange
that most of them were freed, but not all...  We do actually have
RPC_CREDCACHE_DEFAULT_HASHBITS set to 10 at build time, which might
explain a bigger credcache and time to free that.  Did you change your
auth_hashtable_size at all?  Anyway, even with all of that, system CPU
doesn't change.

Ok, weird, Running "sync" actually caused "vmstat 1" to go from 25% CPU
spikes every 5 seconds with nfs_writepages in the "perf top" every 5
seconds to being idle!  /proc/vmstat showed only nr_dirty 8 and
nr_writeback 0 before I ran "sync".  Either that, or it was just good
timing.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 6767892   9696  41088    0    0     5   180   15   70 62  7 29  1
 0  0      0 6767892   9696  41088    0    0     0     0 1124 1978  0  0 100  0
 0  0      0 6767892   9696  41088    0    0     0     0 1105 1967  0  0 100  0
 0  0      0 6767892   9696  41088    0    0     0     0 1180 1984  0  2 98  0
 2  0      0 6767892   9696  41088    0    0     0     0 2102 1968  0 22 78  0
 0  0      0 6767768   9704  41084    0    0     0    52 1504 2022  0  8 91  1
 0  0      0 6767768   9704  41084    0    0     0     0 1059 1965  0  0 100  0
 0  0      0 6767768   9704  41088    0    0     0     0 1077 1982  0  0 100  0
 0  0      0 6767768   9704  41088    0    0     0     0 1166 1972  1  2 97  0
 0  0      0 6767768   9704  41088    0    0     0     0 1353 1990  0  6 94  0
 0  0      0 6767644   9704  41100    0    0     0     0 2531 2050  0 27 72  0
 0  0      0 6767644   9704  41100    0    0     0     0 1065 1974  0  0 100  0
 1  0      0 6767644   9704  41100    0    0     0     0 1090 1967  0  0 100  0
 0  0      0 6767644   9704  41100    0    0     0     0 1175 1971  0  2 98  0
 0  0      0 6767644   9704  41100    0    0     0     0 1321 1967  0  6 94  0
 1  0      0 6767644   9712  41100    0    0     0    52 1995 1995  0 18 81  1
 0  0      0 6767644   9712  41100    0    0     0     0 1342 1964  0  6 94  0
 0  0      0 6767644   9712  41100    0    0     0     0 1064 1974  0  0 100  0
 0  0      0 6767644   9712  41100    0    0     0     0 1163 1968  0  2 98  0
 0  0      0 6767644   9712  41100    0    0     0     0 1390 1976  1  6 94  0
 0  0      0 6767768   9712  41112    0    0     0     0 1282 2018  0  1 99  0
 0  0      0 6767768   9712  41112    0    0     0     0 2166 1972  0 24 76  0
 0  0      0 6786260   9712  41112    0    0     0     0 1092 1989  0  0 100  0
 0  0      0 6786260   9712  41112    0    0     0     0 1039 2003  0  2 98  0
 0  0      0 6774588   9712  41112    0    0     0     0 1524 2015  3  9 89  0
 0  1      0 6770124   9716  41112    0    0     0    48 1315 1990  3  0 87 11
 1  0      0 6769628   9720  41112    0    0     0     4 2196 1966  0 20 79  0
 0  0      0 6768364   9724  41108    0    0    36     0 1403 2031  1  5 94  1
 0  0      0 6768388   9724  41144    0    0     0     0 1185 1969  0  2 98  0
 0  0      0 6768388   9724  41144    0    0     0     0 1351 1975  0  6 94  0
 0  0      0 6768140   9724  41156    0    0     0     0 1261 2020  0  1 99  0
 0  0      0 6768140   9724  41156    0    0     0     0 1053 1973  0  0 100  0
 1  0      0 6768140   9724  41156    0    0     0     0 2194 1971  1 26 73  0
 0  0      0 6768140   9732  41148    0    0     0    52 1283 2088  0  4 96  0
 2  0      0 6768140   9732  41156    0    0     0     0 1184 1965  0  2 98  0
 0  0      0 6786368   9732  41156    0    0     0     0 1297 2000  0  4 97  0
 0  0      0 6786384   9732  41156    0    0     0     0  998 1949  0  0 100  0
(stop vmstat 1, run sync, restart vmstat 1)
 1  0      0 6765304   9868  41736    0    0     0     0 1239 1967  0  0 100  0
 0  0      0 6765304   9868  41736    0    0     0     0 1174 2035  0  0 100  0
 0  0      0 6765304   9868  41736    0    0     0     0 1084 1976  0  0 100  0
 1  0      0 6765172   9868  41736    0    0     0     0 1100 1975  0  0 100  0
 0  0      0 6765304   9868  41760    0    0     0     0 1501 1997  1  1 98  0
 0  0      0 6765304   9868  41760    0    0     0     0 1092 1964  0  0 100  0
 1  0      0 6765304   9868  41760    0    0     0     0 1096 1963  0  0 100  0
 0  0      0 6765304   9868  41760    0    0     0     0 1083 1967  0  0 100  0
 0  0      0 6765304   9876  41752    0    0     0    76 1101 1980  0  0 99  1
 0  0      0 6765304   9876  41760    0    0     0     4 1199 2097  0  2 98  0
 0  0      0 6765304   9876  41760    0    0     0     0 1055 1964  0  0 100  0
 0  0      0 6765304   9876  41760    0    0     0     0 1064 1977  0  0 100  0
 0  0      0 6765304   9876  41760    0    0     0     0 1073 1968  0  0 100  0
 0  0      0 6765304   9876  41760    0    0     0     0 1080 1976  0  0 100  0
 0  0      0 6765156   9884  41772    0    0     0    20 1329 2034  1  1 98  1
 0  0      0 6765180   9884  41780    0    0     0     0 1097 1971  0  1 99  0
 0  0      0 6765180   9884  41780    0    0     0     0 1073 1960  0  0 100  0
 0  0      0 6765180   9884  41780    0    0     0     0 1079 1976  0  0 100  0
 0  0      0 6765172   9884  41780    0    0     0     0 1120 1980  0  0 100  0
 0  0      0 6765180   9892  41772    0    0     0    60 1097 1982  0  0 100  0
 0  0      0 6765180   9892  41780    0    0     0     0 1067 1969  0  0 100  0
 0  0      0 6765180   9892  41780    0    0     0     0 1063 1973  0  0 100  0
 0  0      0 6765180   9892  41780    0    0     0     0 1064 1969  0  1 100  0

Hmmm.  I'll watch it and see if it climbs back up again.  That would mean
that for now, echo 2 > /proc/sys/vm/drop_caches ; sync is a workaround
for the problem we're seeing with system CPU slowly creeping up since
2.6.36.  I have no idea if this affects the sudden flush process spinlock
contention and attempted writeback without much progress during normal
load, but it certainly changes things here.

http://0x.ca/sim/ref/2.6.37/cpu_after_drop_caches_2_and_sync.png

Simon-

next prev parent reply	other threads:[~2011-01-08  0:52 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-08 21:25 System CPU increasing on idle 2.6.36 Simon Kirby
2010-12-08 21:53 ` Trond Myklebust
2010-12-08 22:36   ` Simon Kirby
2010-12-09  4:37     ` Trond Myklebust
2010-12-14 23:38       ` Simon Kirby
2010-12-15  1:10         ` Simon Kirby
2010-12-15  1:56           ` Simon Kirby
2010-12-15 18:08             ` J. Bruce Fields
2010-12-15 18:22               ` Trond Myklebust
2010-12-15 18:38                 ` J. Bruce Fields
2010-12-15 19:33                   ` Trond Myklebust
2010-12-15 19:49                     ` J. Bruce Fields
2010-12-15 19:57                       ` Trond Myklebust
2010-12-15 20:19                         ` J. Bruce Fields
2010-12-15 20:32                           ` Trond Myklebust
2010-12-15 21:48                             ` J. Bruce Fields
2010-12-15 22:15                               ` Trond Myklebust
2010-12-15 22:29                                 ` J. Bruce Fields
2010-12-15 22:55                                   ` J. Bruce Fields
2010-12-15 23:58                                     ` Trond Myklebust
2010-12-16  0:36                                       ` J. Bruce Fields
2011-09-27  0:39   ` NFS client growing system CPU Simon Kirby
2011-09-27 11:42     ` Trond Myklebust
2011-09-27 16:49       ` Simon Kirby
2011-09-27 17:04         ` Trond Myklebust
2011-09-28 19:58           ` Simon Kirby
2011-09-30  0:58             ` Simon Kirby
2011-09-30  1:11               ` Myklebust, Trond
2011-10-05 23:07                 ` Simon Kirby
2010-12-18  1:08 ` System CPU increasing on idle 2.6.36 Simon Kirby
2010-12-21 20:31   ` Mark Moseley
2010-12-29 22:03   ` Simon Kirby
2011-01-04 17:42     ` Mark Moseley
2011-01-04 21:40       ` Simon Kirby
2011-01-05 19:43         ` Mark Moseley
2011-01-07 18:05           ` Mark Moseley
2011-01-07 18:12             ` Mark Moseley
2011-01-07 19:33               ` Mark Moseley
2011-01-08  0:52             ` Simon Kirby [this message]
2011-01-08  1:30               ` Mark Moseley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110108005203.GD24433@hostway.ca \
    --to=sim@hostway.ca \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=moseleymark@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.