From: Simon Kirby <sim@hostway.ca>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: System CPU increasing on idle 2.6.36
Date: Tue, 14 Dec 2010 15:38:43 -0800 [thread overview]
Message-ID: <20101214233843.GB836@hostway.ca> (raw)
In-Reply-To: <1291869437.2821.6.camel@heimdal.trondhjem.org>
On Wed, Dec 08, 2010 at 11:37:17PM -0500, Trond Myklebust wrote:
> On Wed, 2010-12-08 at 14:36 -0800, Simon Kirby wrote:
> > On Wed, Dec 08, 2010 at 04:53:09PM -0500, Trond Myklebust wrote:
> >
> > > Possibly a side effect of the fs/fs-writeback.c changes in 2.6.36? You
> > > do appear to be hitting a lot of spinlock contention, but I suspect that
> > > a lot of it is coming from writeback_sb_inodes, writeback_single_inode
> > > and queue_io, all of which seem unnaturally high on your stats above.
> > >
> > > I don't see how you can be seeing no traffic on the wire. You are
> > > certainly hitting some page writeout (0.2% nfs_pageio_doio).
> >
> > It really seems to not be doing anything. I ran nfsstat -Zcv for 5
> > minutes, and the only non-zero counters are rpc calls and authrefrsh,
> > even though perf top shows similar profiles the whole time.
>
> That sounds like a bug in nfsstat, then. As I said, your trace results
> definitely indicate that you are doing page writeout. I'd cross check
> using wireshark...
I know it's weird, but it really isn't writing anything. Confirmed with
tcpdump and tshark. Maybe it writes a single thing after a lot of
trying, or something.
2.6.36.1 is also doing the same thing when completely idle, after
finishing a bunch of log crunching. I'm just about to try
2.6.37-rc5-git3 on there plus your NFS fixes (which Linus seems to have
half-merged and uploaded as -git3 but not pushed to his public git) and
your NFSv4 UID patch, which I will be testing shortly. If it works, this
should show if the same flush issue exists there or not.
Also, I just saw something similar in from 2.6.37-rc4-git1 plus your leak
fixes (this ver of top's %sys cpu is broken, ignore):
top - 16:54:44 up 8 days, 16:04, 1 user, load average: 40.11, 43.36, 24.71
Tasks: 490 total, 28 running, 461 sleeping, 0 stopped, 1 zombie
Cpu(s): 17.8%us, 6.9%sy, 0.0%ni, 60.2%id, 13.1%wa, 0.0%hi, 2.0%si, 0.0%st
Mem: 4051256k total, 1717004k used, 2334252k free, 17112k buffers
Swap: 979960k total, 268756k used, 711204k free, 362940k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2221 root 20 0 0 0 0 R 16 0.0 4:37.71 flush-0:65
10557 root 20 0 0 0 0 R 16 0.0 2:54.90 flush-0:46
7240 root 20 0 0 0 0 R 15 0.0 7:46.64 flush-0:31
8682 root 20 0 0 0 0 R 15 0.0 3:04.89 flush-0:34
13915 root 20 0 0 0 0 R 15 0.0 4:37.43 flush-0:27
17778 root 20 0 0 0 0 R 15 0.0 4:53.57 flush-0:83
19281 root 20 0 0 0 0 R 15 0.0 3:59.55 flush-0:56
19484 root 20 0 0 0 0 R 15 0.0 17:12.80 flush-0:77
19693 root 20 0 0 0 0 R 15 0.0 6:16.20 flush-0:71
25272 root 20 0 0 0 0 R 15 0.0 3:47.70 flush-0:68
5494 root 20 0 0 0 0 R 13 0.0 28:23.03 flush-0:75
7095 root 20 0 0 0 0 R 13 0.0 1:55.52 flush-0:42
10298 root 20 0 0 0 0 R 13 0.0 3:39.00 flush-0:61
13900 root 20 0 0 0 0 R 13 0.0 8:38.14 flush-0:80
14389 root 20 0 0 0 0 R 13 0.0 4:52.59 flush-0:81
16412 root 20 0 0 0 0 R 13 0.0 2:52.46 flush-0:33
17225 root 20 0 0 0 0 R 13 0.0 4:01.92 flush-0:78
20985 root 20 0 0 0 0 R 13 0.0 3:09.37 flush-0:29
24424 root 20 0 0 0 0 R 13 0.0 2:01.69 flush-0:20
25273 root 20 0 0 0 0 R 13 0.0 4:08.11 flush-0:70
25560 root 20 0 0 0 0 R 13 0.0 3:40.00 flush-0:69
27578 root 20 0 0 0 0 R 13 0.0 6:17.51 flush-0:57
25270 root 20 0 0 0 0 R 12 0.0 5:15.30 flush-0:26
25392 root 20 0 0 0 0 R 12 0.0 2:39.36 flush-0:82
27488 root 20 0 0 0 0 R 12 0.0 3:35.94 flush-0:76
31283 root 20 0 0 0 0 R 12 0.0 2:58.57 flush-0:18
28461 root 20 0 29556 2404 1544 R 4 0.1 0:00.04 top
6426 root 20 0 0 0 0 S 1 0.0 1:57.74 flush-0:73
14639 root 20 0 0 0 0 S 1 0.0 2:41.38 flush-0:58
20081 root 20 0 0 0 0 S 1 0.0 1:51.78 flush-0:60
24845 root 20 0 0 0 0 S 1 0.0 2:55.30 flush-0:32
1 root 20 0 10364 332 328 S 0 0.0 0:10.35 init
PerfTop: 3886 irqs/sec kernel:98.8% exact: 0.0% [1000Hz cycles], (all, 4 CPUs)
----------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _______________________________ __________________________
17772.00 80.6% _raw_spin_lock [kernel.kallsyms]
770.00 3.5% nfs_writepages [kernel.kallsyms]
530.00 2.4% writeback_sb_inodes [kernel.kallsyms]
463.00 2.1% writeback_single_inode [kernel.kallsyms]
364.00 1.7% queue_io [kernel.kallsyms]
283.00 1.3% nfs_write_inode [kernel.kallsyms]
210.00 1.0% bit_waitqueue [kernel.kallsyms]
185.00 0.8% redirty_tail [kernel.kallsyms]
182.00 0.8% do_writepages [kernel.kallsyms]
155.00 0.7% __iget [kernel.kallsyms]
129.00 0.6% iput [kernel.kallsyms]
122.00 0.6% __mark_inode_dirty [kernel.kallsyms]
102.00 0.5% write_cache_pages [kernel.kallsyms]
89.00 0.4% file_softmagic /usr/lib/libmagic.so.1.0.0
67.00 0.3% _atomic_dec_and_lock [kernel.kallsyms]
53.00 0.2% is_bad_inode [kernel.kallsyms]
48.00 0.2% page_fault [kernel.kallsyms]
44.00 0.2% __wake_up_bit [kernel.kallsyms]
32.00 0.1% find_get_pages_tag [kernel.kallsyms]
25.00 0.1% wake_up_bit [kernel.kallsyms]
17.00 0.1% copy_page_c [kernel.kallsyms]
17.00 0.1% clear_page_c [kernel.kallsyms]
13.00 0.1% radix_tree_gang_lookup_tag_slot [kernel.kallsyms]
12.00 0.1% ktime_get [kernel.kallsyms]
12.00 0.1% symbol_filter /usr/local/bin/perf
12.00 0.1% pagevec_lookup_tag [kernel.kallsyms]
11.00 0.0% mapping_tagged [kernel.kallsyms]
10.00 0.0% do_page_fault [kernel.kallsyms]
10.00 0.0% unmap_vmas [kernel.kallsyms]
9.00 0.0% find_get_page [kernel.kallsyms]
9.00 0.0% dso__load_sym /usr/local/bin/perf
9.00 0.0% __phys_addr [kernel.kallsyms]
8.00 0.0% nfs_pageio_init [kernel.kallsyms]
8.00 0.0% __d_lookup [kernel.kallsyms]
8.00 0.0% filemap_fault [kernel.kallsyms]
7.00 0.0% sched_clock_local [kernel.kallsyms]
7.00 0.0% memset /lib/libc-2.7.so
7.00 0.0% number [kernel.kallsyms]
6.00 0.0% handle_mm_fault [kernel.kallsyms]
6.00 0.0% symbols__insert /usr/local/bin/perf
6.00 0.0% radix_tree_tagged [kernel.kallsyms]
5.00 0.0% perf_event_task_tick [kernel.kallsyms]
5.00 0.0% native_write_msr_safe [kernel.kallsyms]
5.00 0.0% kfree [kernel.kallsyms]
5.00 0.0% __switch_to [kernel.kallsyms]
5.00 0.0% strlen /lib/libc-2.7.so
5.00 0.0% _cond_resched [kernel.kallsyms]
5.00 0.0% format_decode [kernel.kallsyms]
5.00 0.0% nfs_pageio_doio [kernel.kallsyms]
# cat /proc/5494/stack
[<ffffffff81052025>] __cond_resched+0x25/0x40
[<ffffffff81135459>] writeback_sb_inodes+0xb9/0x140
[<ffffffff81135efc>] writeback_inodes_wb+0xcc/0x140
[<ffffffff8113629b>] wb_writeback+0x32b/0x370
[<ffffffff811363b4>] wb_do_writeback+0xd4/0x1d0
[<ffffffff8113658b>] bdi_writeback_thread+0xdb/0x240
[<ffffffff81075046>] kthread+0x96/0xb0
[<ffffffff8100bca4>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
24 0 256484 2346524 26168 385584 0 1 33 26 5 14 18 9 60 13
25 0 256320 2346528 26168 385676 4 0 156 0 2041 2900 2 98 0 0
24 0 256144 2348916 26168 385968 32 0 172 8 2030 2879 3 97 0 0
24 0 255984 2349260 26168 386136 8 0 156 4 2080 2970 1 99 0 0
24 0 255360 2351216 26176 386168 128 0 272 44 2082 2942 0 100 0 0
25 0 255152 2351352 26184 386248 52 0 196 44 2071 3030 2 98 0 0
23 0 254880 2353400 26192 386660 144 0 296 84 2249 2960 3 97 0 0
23 0 254660 2354028 26192 386756 40 0 200 8 2060 3114 1 100 0 0
Gradually, the flush processes go back to sleeping, and then everything
works again, as if nothing happened. This is in contrast to what we
are seeing on the log crunching server, where system time just keeps
increasing forever with each log cruching run. These two cases may or
may not be the same problem.
See longer vmstat, nfsstat output, etc. here:
http://0x.ca/sim/ref/2.6.37-rc4-git1_flushy_nfs/
Simon-
next prev parent reply other threads:[~2010-12-14 23:38 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-08 21:25 System CPU increasing on idle 2.6.36 Simon Kirby
2010-12-08 21:53 ` Trond Myklebust
2010-12-08 22:36 ` Simon Kirby
2010-12-09 4:37 ` Trond Myklebust
2010-12-14 23:38 ` Simon Kirby [this message]
2010-12-15 1:10 ` Simon Kirby
2010-12-15 1:56 ` Simon Kirby
2010-12-15 18:08 ` J. Bruce Fields
2010-12-15 18:22 ` Trond Myklebust
2010-12-15 18:38 ` J. Bruce Fields
2010-12-15 19:33 ` Trond Myklebust
2010-12-15 19:49 ` J. Bruce Fields
2010-12-15 19:57 ` Trond Myklebust
2010-12-15 20:19 ` J. Bruce Fields
2010-12-15 20:32 ` Trond Myklebust
2010-12-15 21:48 ` J. Bruce Fields
2010-12-15 22:15 ` Trond Myklebust
2010-12-15 22:29 ` J. Bruce Fields
2010-12-15 22:55 ` J. Bruce Fields
2010-12-15 23:58 ` Trond Myklebust
2010-12-16 0:36 ` J. Bruce Fields
2011-09-27 0:39 ` NFS client growing system CPU Simon Kirby
2011-09-27 11:42 ` Trond Myklebust
2011-09-27 16:49 ` Simon Kirby
2011-09-27 17:04 ` Trond Myklebust
2011-09-28 19:58 ` Simon Kirby
2011-09-30 0:58 ` Simon Kirby
2011-09-30 1:11 ` Myklebust, Trond
2011-10-05 23:07 ` Simon Kirby
2010-12-18 1:08 ` System CPU increasing on idle 2.6.36 Simon Kirby
2010-12-21 20:31 ` Mark Moseley
2010-12-29 22:03 ` Simon Kirby
2011-01-04 17:42 ` Mark Moseley
2011-01-04 21:40 ` Simon Kirby
2011-01-05 19:43 ` Mark Moseley
2011-01-07 18:05 ` Mark Moseley
2011-01-07 18:12 ` Mark Moseley
2011-01-07 19:33 ` Mark Moseley
2011-01-08 0:52 ` Simon Kirby
2011-01-08 1:30 ` Mark Moseley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101214233843.GB836@hostway.ca \
--to=sim@hostway.ca \
--cc=Trond.Myklebust@netapp.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).