From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dave Chinner <david@fromorbit.com>
Cc: davej@redhat.com, linux-kernel@vger.kernel.org,
torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
hch@lst.de, jack@suse.cz, curtw@google.com, jaxboe@fusionio.com,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation
Date: Fri, 26 Jul 2013 21:05:24 -0700 [thread overview]
Message-ID: <20130727040524.GC26694@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130727025703.GF21982@dastard>
On Sat, Jul 27, 2013 at 12:57:03PM +1000, Dave Chinner wrote:
> On Fri, Jul 26, 2013 at 04:28:52PM -0700, Paul E. McKenney wrote:
> > Dave Jones reported RCU stalls, overly long hrtimer interrupts, and
> > amazingly long NMI handlers from a trinity-induced workload involving
> > lots of concurrent sync() calls (https://lkml.org/lkml/2013/7/23/369).
> > There are any number of things that one might do to make sync() behave
> > better under high levels of contention, but it is also the case that
> > multiple concurrent sync() system calls can be satisfied by a single
> > sys_sync() invocation.
> >
> > Given that this situation is reminiscent of rcu_barrier(), this commit
> > applies the rcu_barrier() approach to sys_sync(). This approach uses
> > a global mutex and a sequence counter. The mutex is held across the
> > sync() operation, which eliminates contention between concurrent sync()
> > operations.
> >
> > The counter is incremented at the beginning and end of
> > each sync() operation, so that it is odd while a sync() operation is in
> > progress and even otherwise, just like sequence locks.
> >
> > The code that used to be in sys_sync() is now in do_sync(), and sys_sync()
> > now handles the concurrency. The sys_sync() function first takes a
> > snapshot of the counter, then acquires the mutex, and then takes another
> > snapshot of the counter. If the values of the two snapshots indicate that
> > a full do_sync() executed during the mutex acquisition, the sys_sync()
> > function releases the mutex and returns ("Our work is done!"). Otherwise,
> > sys_sync() increments the counter, invokes do_sync(), and increments
> > the counter again.
> >
> > This approach allows a single call to do_sync() to satisfy an arbitrarily
> > large number of sync() system calls, which should eliminate issues due
> > to large numbers of concurrent invocations of the sync() system call.
>
> This is not addressing the problem that is causing issues during
> sync. Indeed, it only puts a bandaid over the currently observed
> trigger.
>
> Indeed, i suspect that this will significantly slow down concurrent
> sync operations, as it serialised sync across all superblocks rather
> than serialising per-superblock like is currently done. Indeed, that
> per-superblock serialisation is where all the lock contention
> problems are. And it's not sync alone that causes the contention
> problems - it has to be combined with other concurrent workloads
> that add or remove inodes from the inode cache at tha same time.
Seems like something along the lines of wakeup_flusher_threads()
currently at the start of sys_sync() would address this.
> I have patches to address that by removing the source
> of the lock contention completely, and not just for the sys_sync
> trigger. Those patches make the problems with concurrent
> sys_sync operation go away completely for me, not to mention improve
> performance for 8+ thread metadata workloads on XFS significantly.
>
> IOWs, I don't see that concurrent sys_sync operation is a problem at
> all, and it is actively desirable for systems that have multiple
> busy filesystems as it allows concurrent dispatch of IO across those
> multiple filesystems. Serialising all sys_sync work might stop the
> contention problems, but it will also slow down concurrent sync
> operations on busy systems as it only allows one thread to dispatch
> and wait for IO at a time.
>
> So, let's not slap a bandaid over a symptom - let's address the
> cause of the lock contention properly....
Hmmm...
Could you please send your patches over to Dave Jones right now? I am
getting quite tired of getting RCU CPU stall warning complaints from
him that turn out to be due to highly contended sync() system calls.
Thanx, Paul
next prev parent reply other threads:[~2013-07-27 4:05 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-26 23:28 [PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation Paul E. McKenney
2013-07-27 0:29 ` Linus Torvalds
2013-07-27 1:23 ` Paul E. McKenney
2013-07-27 2:57 ` Dave Chinner
2013-07-27 4:05 ` Paul E. McKenney [this message]
2013-07-27 6:21 ` Dave Chinner
2013-07-27 11:26 ` Paul E. McKenney
2013-07-29 7:06 ` Dave Chinner
2013-07-30 17:43 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130727040524.GC26694@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=curtw@google.com \
--cc=davej@redhat.com \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jaxboe@fusionio.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).