From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dave Chinner <david@fromorbit.com>
Cc: davej@redhat.com, linux-kernel@vger.kernel.org,
	torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
	hch@lst.de, jack@suse.cz, curtw@google.com, jaxboe@fusionio.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation
Date: Fri, 26 Jul 2013 21:05:24 -0700	[thread overview]
Message-ID: <20130727040524.GC26694@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130727025703.GF21982@dastard>

On Sat, Jul 27, 2013 at 12:57:03PM +1000, Dave Chinner wrote:
> On Fri, Jul 26, 2013 at 04:28:52PM -0700, Paul E. McKenney wrote:
> > Dave Jones reported RCU stalls, overly long hrtimer interrupts, and
> > amazingly long NMI handlers from a trinity-induced workload involving
> > lots of concurrent sync() calls (https://lkml.org/lkml/2013/7/23/369).
> > There are any number of things that one might do to make sync() behave
> > better under high levels of contention, but it is also the case that
> > multiple concurrent sync() system calls can be satisfied by a single
> > sys_sync() invocation.
> > 
> > Given that this situation is reminiscent of rcu_barrier(), this commit
> > applies the rcu_barrier() approach to sys_sync().  This approach uses
> > a global mutex and a sequence counter.  The mutex is held across the
> > sync() operation, which eliminates contention between concurrent sync()
> > operations.
> >
> > The counter is incremented at the beginning and end of
> > each sync() operation, so that it is odd while a sync() operation is in
> > progress and even otherwise, just like sequence locks.
> > 
> > The code that used to be in sys_sync() is now in do_sync(), and sys_sync()
> > now handles the concurrency.  The sys_sync() function first takes a
> > snapshot of the counter, then acquires the mutex, and then takes another
> > snapshot of the counter.  If the values of the two snapshots indicate that
> > a full do_sync() executed during the mutex acquisition, the sys_sync()
> > function releases the mutex and returns ("Our work is done!").  Otherwise,
> > sys_sync() increments the counter, invokes do_sync(), and increments
> > the counter again.
> > 
> > This approach allows a single call to do_sync() to satisfy an arbitrarily
> > large number of sync() system calls, which should eliminate issues due
> > to large numbers of concurrent invocations of the sync() system call.
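
For concreteness, here is a minimal sketch of the counter-and-mutex scheme
described above, reconstructed from that description.  The do_sync() name
follows the text; sync_mutex, sync_seq, and the exact wraparound-safe
comparison are illustrative, not quoted from the patch:

static DEFINE_MUTEX(sync_mutex);	/* held across each do_sync() pass */
static unsigned long sync_seq;		/* odd while a pass is in progress */

SYSCALL_DEFINE0(sync)
{
	unsigned long snap, snap_done;

	snap = ACCESS_ONCE(sync_seq);	/* snapshot before waiting for the mutex */
	smp_mb();			/* order snapshot before mutex acquisition */
	mutex_lock(&sync_mutex);
	snap_done = sync_seq;

	/*
	 * A complete do_sync() pass began and ended after our snapshot
	 * iff the counter reached (snap + 3) & ~0x1: +2 for a full pass,
	 * rounded up past any pass already in flight when we snapshotted.
	 */
	if (ULONG_CMP_GE(snap_done, (snap + 3) & ~0x1)) {
		mutex_unlock(&sync_mutex);	/* our work is done! */
		return 0;
	}

	sync_seq++;			/* counter now odd: pass in progress */
	do_sync();			/* the code formerly in sys_sync() */
	sync_seq++;			/* counter even again: pass complete */
	mutex_unlock(&sync_mutex);
	return 0;
}

The key property: of any set of concurrent sync() callers, at most one
actually runs do_sync(); every caller that observes a full pass completed
after its snapshot returns immediately, which is what lets a single
invocation satisfy arbitrarily many requests.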
> 
> This does not address the problem that is causing issues during
> sync. It only puts a band-aid over the currently observed
> trigger.
> 
> Indeed, I suspect that this will significantly slow down concurrent
> sync operations, as it serialises sync across all superblocks rather
> than serialising per-superblock as is currently done. That
> per-superblock serialisation is where all the lock contention
> problems are. And it's not sync alone that causes the contention
> problems - it has to be combined with other concurrent workloads
> that add or remove inodes from the inode cache at the same time.

Seems like keeping something along the lines of the
wakeup_flusher_threads() call currently at the start of sys_sync()
would address this.
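
(For reference, a sketch of where that call would sit.
wakeup_flusher_threads() and WB_REASON_SYNC are the real helpers used
by sys_sync() at the time; the surrounding structure is abbreviated
and illustrative:)

SYSCALL_DEFINE0(sync)
{
	/*
	 * Kick asynchronous writeback on all devices before the
	 * serialised part, so IO dispatch stays concurrent across
	 * filesystems even when only one caller runs do_sync().
	 */
	wakeup_flusher_threads(0, WB_REASON_SYNC);

	/* ... then the snapshot/mutex/counter logic sketched above ... */
}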

> I have patches to address that by removing the source
> of the lock contention completely, and not just for the sys_sync
> trigger. Those patches make the problems with concurrent
> sys_sync operation go away completely for me, not to mention
> significantly improving performance for 8+ thread metadata
> workloads on XFS.
> 
> IOWs, I don't see that concurrent sys_sync operation is a problem at
> all, and it is actively desirable for systems that have multiple
> busy filesystems as it allows concurrent dispatch of IO across those
> multiple filesystems. Serialising all sys_sync work might stop the
> contention problems, but it will also slow down concurrent sync
> operations on busy systems as it only allows one thread to dispatch
> and wait for IO at a time.
> 
> So, let's not slap a bandaid over a symptom - let's address the
> cause of the lock contention properly....

Hmmm...

Could you please send your patches over to Dave Jones right now?  I am
getting quite tired of receiving RCU CPU stall warning complaints from
him that turn out to be due to highly contended sync() system calls.

							Thanx, Paul


Thread overview: 9+ messages
2013-07-26 23:28 [PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation Paul E. McKenney
2013-07-27  0:29 ` Linus Torvalds
2013-07-27  1:23   ` Paul E. McKenney
2013-07-27  2:57 ` Dave Chinner
2013-07-27  4:05   ` Paul E. McKenney [this message]
2013-07-27  6:21     ` Dave Chinner
2013-07-27 11:26       ` Paul E. McKenney
2013-07-29  7:06         ` Dave Chinner
2013-07-30 17:43           ` Paul E. McKenney
