From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com,
linux-ext4@vger.kernel.org, Hugh Dickins <hughd@google.com>,
linux-mm@kvack.org
Subject: Re: Hole punching and mmap races
Date: Wed, 6 Jun 2012 01:15:30 +0200
Message-ID: <20120605231530.GB4402@quack.suse.cz>
In-Reply-To: <20120605055150.GF4347@dastard>

On Tue 05-06-12 15:51:50, Dave Chinner wrote:
> On Thu, May 24, 2012 at 02:35:38PM +0200, Jan Kara wrote:
> > > To me the issue at hand is that we have no method of serialising
> > > multi-page operations on the mapping tree between the filesystem and
> > > the VM, and that seems to be the fundamental problem we face in this
> > > whole area of mmap/buffered/direct IO/truncate/holepunch coherency.
> > > Hence it might be better to try to work out how to fix this entire
> > > class of problems rather than just adding a complex kludge that just
> > > papers over the current "hot" symptom....
> > Yes, looking at the above table, the number of different synchronization
> > mechanisms is really striking. So probably we should look at some
> > possibility of unifying at least some cases.
>
> It seems to me that we need something in between the fine grained
> page lock and the entire-file IO exclusion lock. We need to maintain
> fine grained locking for mmap scalability, but we also need to be
> able to atomically lock ranges of pages.
Yes, we also need to keep things fine grained to keep scalability of
direct IO and buffered reads...
> I guess if we were to nest a fine grained multi-state lock
> inside both the IO exclusion lock and the mmap_sem, we might be able
> to kill all problems in one go.
>
> Exclusive access on a range needs to be granted to:
>
> - direct IO
> - truncate
> - hole punch
>
> so they can be serialised against mmap based page faults, writeback
> and concurrent buffered IO. Serialisation against themselves is an
> IO/fs exclusion problem.
>
> Shared access for traversal or modification needs to be granted to:
>
> - buffered IO
> - mmap page faults
> - writeback
>
> Each of these cases can rely on the existing page locks or IO
> exclusion locks to provide safety for concurrent access to the same
> ranges. This means that once we have access granted to a range we
> can check truncate races once and ignore the problem until we drop
> the access. And the case of taking a page fault within a buffered
> IO won't deadlock because both take a shared lock....
You cannot just use a lock (not even a shared one) both above and under
mmap_sem. That can deadlock in the presence of other requests for exclusive
locking... Luckily, with buffered writes the situation isn't that bad. You
need mmap_sem only before each page is processed (in
iov_iter_fault_in_readable()). Later on in the loop we use
iov_iter_copy_from_user_atomic() which doesn't need mmap_sem. So we can
just get our shared lock after iov_iter_fault_in_readable() (or simply
leave it for ->write_begin() if we want to give control over the locking to
filesystems).
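A rough userspace sketch of that ordering, to make the idea concrete. This
is only a model under stated assumptions: the two flags stand in for
mmap_sem and the proposed shared range lock, and the model_*() functions
are hypothetical stand-ins for iov_iter_fault_in_readable() and
iov_iter_copy_from_user_atomic(), not the kernel functions themselves. It
also ignores the retry that is needed when the atomic copy fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: these flags stand in for the real locks. */
static bool mmap_sem_held;
static bool range_lock_held;
static int pages_copied;

/* Stand-in for iov_iter_fault_in_readable(): may need mmap_sem. */
static void model_fault_in_readable(void)
{
	/* The ordering rule: never take mmap_sem under the range lock. */
	assert(!range_lock_held);
	mmap_sem_held = true;
	/* ... page fault handling would run here ... */
	mmap_sem_held = false;
}

/* Stand-in for iov_iter_copy_from_user_atomic(): no mmap_sem needed. */
static void model_copy_from_user_atomic(void)
{
	assert(range_lock_held && !mmap_sem_held);
	pages_copied++;
}

/* Hypothetical buffered-write loop, one iteration per page. */
static int model_buffered_write(int nr_pages)
{
	for (int i = 0; i < nr_pages; i++) {
		model_fault_in_readable();     /* 1. fault in the source page */
		range_lock_held = true;        /* 2. shared range lock for page i */
		model_copy_from_user_atomic(); /* 3. copy without faulting */
		range_lock_held = false;       /* 4. drop before the next fault-in */
	}
	return pages_copied;
}
```

The asserts encode the point: the range lock always nests inside mmap_sem
(as it would in the fault path), so the write path has to drop it before
anything that might fault.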
> We'd need some kind of efficient shared/exclusive range lock for
> this sort of exclusion, and it's entirely possible that it would
> have too much overhead to be acceptable in the page fault path. It's
> the best I can think of right now.....
>
> As it is, a range lock of this kind would be very handy for other
> things, too (like the IO exclusion locks so we can do concurrent
> buffered writes in XFS ;).
Yes, that's what I thought as well. In particular it should be pretty
efficient at locking a single-page range because that's going to be the
majority of calls. I'll try to write something and see how fast it can
be...
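To pin down the semantics being discussed, here is a minimal sketch of the
conflict rules such a shared/exclusive range lock would have to enforce.
Everything in it is an assumption for illustration: the names (struct
range_lock, range_trylock(), range_unlock()), the fixed-size table, and the
try-only interface are hypothetical. It does no sleeping and has no
protection against concurrent callers, and a linear scan is certainly not
what an efficient implementation would use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not a kernel API. */
#define MAX_HELD 64

struct held_range {
	unsigned long start;	/* first page index, inclusive */
	unsigned long end;	/* last page index, inclusive */
	bool exclusive;
	bool in_use;
};

struct range_lock {
	struct held_range held[MAX_HELD];
};

static bool ranges_overlap(const struct held_range *h,
			   unsigned long start, unsigned long end)
{
	return h->start <= end && start <= h->end;
}

/*
 * Try to take [start, end]. Returns a slot index >= 0 on success, or -1
 * if the request conflicts with a held range or the table is full.
 * Shared holders are compatible with each other; an exclusive request
 * conflicts with any overlapping holder, and vice versa.
 */
static int range_trylock(struct range_lock *rl, unsigned long start,
			 unsigned long end, bool exclusive)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_HELD; i++) {
		struct held_range *h = &rl->held[i];

		if (!h->in_use) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (ranges_overlap(h, start, end) &&
		    (exclusive || h->exclusive))
			return -1;
	}
	if (free_slot < 0)
		return -1;
	rl->held[free_slot] =
		(struct held_range){ start, end, exclusive, true };
	return free_slot;
}

static void range_unlock(struct range_lock *rl, int slot)
{
	assert(slot >= 0 && slot < MAX_HELD && rl->held[slot].in_use);
	rl->held[slot].in_use = false;
}
```

In this model a page fault would take a shared lock on a single page
(start == end), while a hole punch would take an exclusive lock on the
whole punched range and fail (in a real lock, wait) until the overlapping
shared holders are gone.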
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR