From: Simon Kirby <sim@hostway.ca>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Mark Moseley <moseleymark@gmail.com>,
linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: NFS client growing system CPU
Date: Wed, 5 Oct 2011 16:07:43 -0700
Message-ID: <20111005230743.GB31168@hostway.ca>
In-Reply-To: <2E1EB2CF9ED1CB4AA966F0EB76EAB4430B6C979E@SACMVEXC2-PRD.hq.netapp.com>
On Thu, Sep 29, 2011 at 06:11:17PM -0700, Myklebust, Trond wrote:
> > -----Original Message-----
> > From: Simon Kirby [mailto:sim@hostway.ca]
> > Sent: Thursday, September 29, 2011 8:58 PM
> > To: Myklebust, Trond
> > Cc: linux-nfs@vger.kernel.org; linux-kernel@vger.kernel.org
> > Subject: Re: NFS client growing system CPU
> >
> > On Wed, Sep 28, 2011 at 12:58:35PM -0700, Simon Kirby wrote:
> >
> > > On Tue, Sep 27, 2011 at 01:04:15PM -0400, Trond Myklebust wrote:
> > >
> > > > On Tue, 2011-09-27 at 09:49 -0700, Simon Kirby wrote:
> > > > > On Tue, Sep 27, 2011 at 07:42:53AM -0400, Trond Myklebust wrote:
> > > > >
> > > > > > On Mon, 2011-09-26 at 17:39 -0700, Simon Kirby wrote:
> > > > > > > Hello!
> > > > > > >
> > > > > > > Following up on "System CPU increasing on idle 2.6.36", this
> > > > > > > issue is still happening even on 3.1-rc7. So, since it has
> > > > > > > been 9 months since I reported this, I figured I'd bisect this
> > > > > > > issue. The first bisection ended in an IPMI regression that
> > > > > > > looked like the problem, so I had to start again. Eventually,
> > > > > > > I got commit b80c3cb628f0ebc241b02e38dd028969fb8026a2
> > > > > > > which made it into 2.6.34-rc4.
> > > > > > >
> > > > > > > With this commit, system CPU keeps rising as the log crunch
> > > > > > > box runs (reads log files via NFS and spews out HTML files
> > > > > > > into NFS-mounted report directories). When it finishes the
> > > > > > > daily run, the system time stays non-zero and continues to be
> > > > > > > higher and higher after each run, until the box never completes a
> > > > > > > run within a day due to all of the wasted cycles.
> > > > > >
> > > > > > So reverting that commit fixes the problem on 3.1-rc7?
> > > > > >
> > > > > > As far as I can see, doing so should be safe thanks to commit
> > > > > > 5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update
> > > > > > dirty flags in two steps) which fixes the original problem at the
> > > > > > VFS level.
> > > > >
> > > > > Hmm, I went to git revert
> > > > > b80c3cb628f0ebc241b02e38dd028969fb8026a2, but for some reason git
> > > > > left the nfs_mark_request_dirty(req); line in
> > > > > nfs_writepage_setup(), even though the original commit had that. Is
> > > > > that OK or should I remove that as well?
> > > > >
> > > > > Once that is sorted, I'll build it and let it run for a day and
> > > > > let you know. Thanks!
> > > >
> > > > It shouldn't make any difference whether you leave it or remove it.
> > > > The resulting second call to __set_page_dirty_nobuffers() will
> > > > always be a no-op since the page will already be marked as dirty.
> > >
> > > Ok, confirmed, git revert b80c3cb628f0ebc241b02e38dd028969fb8026a2 on
> > > 3.1-rc7 fixes the problem for me. Does this make sense, then, or do we
> > > need further investigation and/or testing?
> >
> > Just to clear up what I said before, it seems that on plain 3.1-rc8, I
> > am actually able to clear the endless CPU use in nfs_writepages by just
> > running "sync". I am not sure when this changed, but I'm pretty sure
> > that some versions between 2.6.34 and 3.1-rc used to not be affected by
> > just "sync" unless it was paired with drop_caches. Maybe this makes the
> > problem more obvious...
>
> Hi Simon,
>
> I think you are just finding yourself cycling through the VFS writeback
> routines all the time because we dirty the inode for COMMIT at the same
> time as we dirty a new page. Usually, we want to wait until after the
> WRITE rpc call has completed, and so it was only the vfs race that
> forced us to write this workaround so that we can guarantee reliable
> fsync() behaviour.
>
> My only concern at this point is to make sure that in reverting that
> patch, we haven't overlooked some other fsync() bug that this patch
> fixed. So far, it looks as if Dmitry's patch is sufficient to deal with
> any issues that I can see.

Hello!

So, this is a regression that has caused uptime issues for us since
2.6.34-rc4. Dmitry's patch went into 2.6.35, so I think this revert
should be committed and marked as a stable candidate for 2.6.35 through 3.1.

We have not seen any problems resulting from the revert, but our loads
are not particularly fsync()-heavy. How did things work before this
patch, anyway?
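
Since the remaining worry is an fsync() bug that the reverted commit may have been masking, a small fsync()-heavy load could be pointed at an NFS mount to add coverage. The following is a hypothetical sketch, not something run in this thread: the function name, file naming scheme and default sizes are all assumptions. It writes a batch of files, fsync()s each one (which will raise OSError if the flush fails), and reports the mean time per file.

```python
import os
import time


def fsync_workload(directory: str, files: int = 100, size: int = 64 * 1024) -> float:
    """Create `files` files of `size` bytes in `directory`, fsync()ing each.

    Returns the mean wall-clock seconds per create+write+fsync. The files
    are deleted afterwards so the run can be repeated.
    """
    payload = os.urandom(size)
    paths = []
    start = time.monotonic()
    for i in range(files):
        # Hypothetical naming scheme for the scratch files.
        path = os.path.join(directory, f"fsync-test-{i}.dat")
        paths.append(path)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            view = memoryview(payload)
            while view:  # os.write() may perform a short write
                written = os.write(fd, view)
                view = view[written:]
            # Flush dirty pages; on an NFS mount this typically means
            # WRITE RPCs followed by a COMMIT. Errors surface as OSError.
            os.fsync(fd)
        finally:
            os.close(fd)
    elapsed = time.monotonic() - start
    for path in paths:
        os.remove(path)
    return elapsed / files if files else 0.0
```

For example, `fsync_workload("/mnt/nfs/scratch", files=1000)` (the mount path is hypothetical) could be run on both a patched and an unpatched kernel to compare latency and to check that no fsync() call reports an error.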

Here is another graph showing the revert fixing the problem on this box
with a relatively simple workload (revert applied Tuesday evening):
http://0x.ca/sim/ref/3.1-rc8/cpu-analog02-revert-b80c3cb6.png
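
The share of CPU time spent in system mode, which is what keeps climbing in this bug, can be derived from the aggregate `cpu` line in /proc/stat. The sketch below is a minimal illustration assuming the standard Linux /proc/stat field order; the function names and the 60-second default interval are hypothetical and not necessarily how the graph above was produced.

```python
import time


def read_cpu_ticks(stat_text: str) -> list[int]:
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":  # skip per-CPU lines like 'cpu0'
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def system_fraction(before: list[int], after: list[int]) -> float:
    """Fraction of elapsed CPU time spent in kernel ('system') mode.

    Field order: user nice system idle iowait irq softirq steal [guest
    guest_nice]. guest/guest_nice are already included in user/nice, so
    only the first eight fields are summed to avoid double counting.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return deltas[2] / total if total > 0 else 0.0


def sample_system_fraction(interval: float = 60.0) -> float:
    """Measure the system-time share over one interval on the live machine."""
    with open("/proc/stat") as f:
        before = read_cpu_ticks(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_ticks(f.read())
    return system_fraction(before, after)


def monitor(interval: float = 60.0) -> None:
    """Print one timestamped sample per interval, forever."""
    while True:
        share = sample_system_fraction(interval)
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} system={share * 100:.1f}%",
              flush=True)
```

Running `monitor()` across several daily runs and plotting its output should show whether idle system time ratchets upward between runs, as described at the start of this thread, or returns to baseline as it does after the revert.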

It is helping on many other boxes, too, but they see intermittent spurts
of memory pressure and other CPU spikes that make the difference harder
to see. We're still running your sunrpc/clnt.c debugging patch as well,
but haven't hit the hang again yet.

Simon-