linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com,
	linux-ext4@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org
Subject: Re: Hole punching and mmap races
Date: Wed, 6 Jun 2012 11:58:27 +0200	[thread overview]
Message-ID: <20120606095827.GA6304@quack.suse.cz> (raw)
In-Reply-To: <20120606000636.GG22848@dastard>

On Wed 06-06-12 10:06:36, Dave Chinner wrote:
> On Wed, Jun 06, 2012 at 01:15:30AM +0200, Jan Kara wrote:
> > On Tue 05-06-12 15:51:50, Dave Chinner wrote:
> > > On Thu, May 24, 2012 at 02:35:38PM +0200, Jan Kara wrote:
> > > > > To me the issue at hand is that we have no method of serialising
> > > > > multi-page operations on the mapping tree between the filesystem and
> > > > > the VM, and that seems to be the fundamental problem we face in this
> > > > > whole area of mmap/buffered/direct IO/truncate/holepunch coherency.
> > > > > Hence it might be better to try to work out how to fix this entire
> > > > > class of problems rather than just adding a complex kuldge that just
> > > > > papers over the current "hot" symptom....
> > > >   Yes, looking at the above table, the amount of different synchronization
> > > > mechanisms is really striking. So probably we should look at some
> > > > possibility of unifying at least some cases.
> > > 
> > > It seems to me that we need some thing in between the fine grained
> > > page lock and the entire-file IO exclusion lock. We need to maintain
> > > fine grained locking for mmap scalability, but we also need to be
> > > able to atomically lock ranges of pages.
> >   Yes, we also need to keep things fine grained to keep scalability of
> > direct IO and buffered reads...
> > 
> > > I guess if we were to nest a fine grained multi-state lock
> > > inside both the IO exclusion lock and the mmap_sem, we might be able
> > > to kill all problems in one go.
> > > 
> > > Exclusive access on a range needs to be granted to:
> > > 
> > > 	- direct IO
> > > 	- truncate
> > > 	- hole punch
> > > 
> > > so they can be serialised against mmap based page faults, writeback
> > > and concurrent buffered IO. Serialisation against themselves is an
> > > IO/fs exclusion problem.
> > > 
> > > Shared access for traversal or modification needs to be granted to:
> > > 
> > > 	- buffered IO
> > > 	- mmap page faults
> > > 	- writeback
> > > 
> > > Each of these cases can rely on the existing page locks or IO
> > > exclusion locks to provide safety for concurrent access to the same
> > > ranges. This means that once we have access granted to a range we
> > > can check truncate races once and ignore the problem until we drop
> > > the access.  And the case of taking a page fault within a buffered
> > > IO won't deadlock because both take a shared lock....
> >   You cannot just use a lock (not even a shared one) both above and under
> > mmap_sem. That is deadlockable in presence of other requests for exclusive
> > locking...
> 
> Well, that's assuming that exclusive lock requests form a barrier to
> new shared requests. Remember that I'm talking about a range lock
> here, which we can make up whatever semantics we'd need, including
> having "shared lock if already locked shared" nested locking
> semantics which avoids this page-fault-in-buffered-IO-copy-in/out
> problem....
  That's true. But if you have semantics like this, constant writing to
or reading from a file could starve e.g. truncate. So I'd prefer not to
open this can of worms and keep semantics of rw semaphores if possible.

Furthermore, with direct IO you have to set in stone the ordering of
mmap_sem and range lock anyway because there we need an exclusive lock.

> It also allows writeback to work the same way it does write now when
> we take a page fault on a page that is under writeback
  I'm not sure what would be the difference regardless of which semantics
we choose...

> > Luckily, with buffered writes the situation isn't that bad. You
> > need mmap_sem only before each page is processed (in
> > iov_iter_fault_in_readable()). Later on in the loop we use
> > iov_iter_copy_from_user_atomic() which doesn't need mmap_sem. So we can
> > just get our shared lock after iov_iter_fault_in_readable() (or simply
> > leave it for ->write_begin() if we want to give control over the locking to
> > filesystems).
> 
> That would probably work as well, but it much more likely that
> people would get it wrong as opposed to special casing the nested
> lock semantic in the page fault code...
  I suppose some helper functions could make this easier...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2012-06-06  9:58 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-15 22:48 Hole punching and mmap races Jan Kara
2012-05-16  2:14 ` Dave Chinner
2012-05-16 13:04   ` Jan Kara
2012-05-17  7:43     ` Dave Chinner
2012-05-17 23:28       ` Jan Kara
2012-05-18 10:12         ` Dave Chinner
2012-05-18 13:32           ` Jan Kara
2012-05-19  1:40             ` Dave Chinner
2012-05-24 12:35               ` Jan Kara
2012-06-05  5:51                 ` Dave Chinner
2012-06-05  6:22                   ` Marco Stornelli
2012-06-05 23:15                   ` Jan Kara
2012-06-06  0:06                     ` Dave Chinner
2012-06-06  9:58                       ` Jan Kara [this message]
2012-06-06 13:36                         ` Dave Chinner
2012-06-07 21:58                           ` Jan Kara
2012-06-08  0:57                             ` Dave Chinner
2012-06-08 21:36                               ` Jan Kara
2012-06-08 23:06                                 ` Dave Chinner
2012-06-12  8:56                                   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120606095827.GA6304@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=david@fromorbit.com \
    --cc=hughd@google.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).