From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com,
	linux-ext4@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org
Subject: Re: Hole punching and mmap races
Date: Tue, 12 Jun 2012 10:56:39 +0200
Message-ID: <20120612085639.GA6021@quack.suse.cz>
In-Reply-To: <20120608230616.GA25389@dastard>

On Sat 09-06-12 09:06:16, Dave Chinner wrote:
> On Fri, Jun 08, 2012 at 11:36:29PM +0200, Jan Kara wrote:
> > On Fri 08-06-12 10:57:00, Dave Chinner wrote:
> > > On Thu, Jun 07, 2012 at 11:58:35PM +0200, Jan Kara wrote:
> > > > On Wed 06-06-12 23:36:16, Dave Chinner wrote:
> > > > Also we could implement the common case of locking a range
> > > > containing single page by just taking page lock so we save modification of
> > > > interval tree in the common case and generally make the tree smaller (of
> > > > course, at the cost of somewhat slowing down cases where we want to lock
> > > > larger ranges).
> > > 
> > > That seems like premature optimisation to me, and all the cases I
> > > think we need to care about are locking large ranges of the tree.
> > > Let's measure what the overhead of tracking everything in a single
> > > tree is first so we can then see what needs optimising...
> >   Umm, I agree that initially we probably just want to have the mapping
> > range lock, hook it somewhere into the IO path, and make things work.
> > Then we can look into making it faster / merging it with the page lock.
> > 
> > However I disagree we care most about locking large ranges. For all
> > buffered IO and all page faults we need to lock a range containing just a
> > single page. We cannot lock more due to locking constraints with mmap_sem.
> 
> Not sure I understand what that constraint is - I have been thinking
> that the buffered IO range lock would be across the entire IO, not
> individual pages.
> 
> As it is, if we want to do multipage writes (and we do), we have to
> be able to lock a range of the mapping in the buffered IO path at a
> time...
  The problem is that the buffered IO path does the following for each page
(e.g. in generic_perform_write()):
  iov_iter_fault_in_readable() - faults in one page's worth of the user
    buffer, which takes mmap_sem
  ->write_begin()
  copy data - iov_iter_copy_from_user_atomic()
  ->write_end()

  So we take mmap_sem before writing every page. We could fault in more,
but that increases the risk of iov_iter_copy_from_user_atomic() failing
because a page got reclaimed before we got to it. So the amount we fault in
would have to adapt to the current memory pressure. That's certainly possible,
but it is not related to the problem we are trying to solve now, so I'd prefer
to handle it separately.
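
  To make the per-page structure concrete, here is a minimal userspace
sketch of that loop. It is only a model, not the kernel code: the names
model_perform_write(), struct write_stats and MODEL_PAGE_SIZE are made up
for illustration, the "page cache" is a flat buffer, and the fault-in step
is reduced to a counter. It shows that the fault-in step (where mmap_sem
would be taken) runs once per page touched by the write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical counters, used only to observe the loop's behaviour. */
struct write_stats {
	size_t fault_ins;	/* times the fault-in step ran */
	size_t pages_copied;	/* times the copy step ran */
};

/*
 * Model of the per-page loop: 'cache' stands in for the page cache of one
 * file, 'ubuf' for the user buffer. Returns the number of bytes written.
 */
static size_t model_perform_write(char *cache, size_t cache_len, size_t pos,
				  const char *ubuf, size_t len,
				  struct write_stats *st)
{
	size_t written = 0;

	assert(cache && ubuf && st);
	assert(pos <= cache_len && len <= cache_len - pos);

	while (written < len) {
		/* Never cross a page boundary within one iteration. */
		size_t offset = (pos + written) % MODEL_PAGE_SIZE;
		size_t bytes = MODEL_PAGE_SIZE - offset;

		if (bytes > len - written)
			bytes = len - written;

		/* 1. Fault in the user buffer; the real code may take
		 *    mmap_sem here. */
		st->fault_ins++;

		/* 2. ->write_begin() would lock and prepare the page. */

		/* 3. Copy the data; the real code does this atomically. */
		memcpy(cache + pos + written, ubuf + written, bytes);

		/* 4. ->write_end() would unlock the page. */
		st->pages_copied++;

		written += bytes;
	}
	return written;
}
```

  In this model a 5000 byte write starting at offset 100 touches two pages,
so the fault-in step runs twice. A range lock spanning the whole write would
have to be held across each of those fault-ins, which is the ordering problem
with mmap_sem described above.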

> > So the places that will lock larger ranges are: direct IO, truncate, punch
> > hole. Writeback actually doesn't seem to need any additional protection at
> > least as I've sketched out things so far.
> 
> AFAICT, writeback needs protection against punching holes, just like
> mmap does, because they use the same "avoid truncated pages"
> mechanism.
  If hole punching does:
lock_mapping_range()
evict all pages in a range
punch blocks
unlock_mapping_range()

  Then we shouldn't race against writeback, because there are no pages in
the mapping range we punch and none can be created there while we hold the
lock. I agree this might be an unnecessary optimization, but the nice
result is that we can clean dirty pages regardless of what others do with
the mapping. So if there were problems with taking the mapping lock from
writeback, we could avoid taking it there.
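
  The sequence can be modelled in userspace roughly as follows. This is a
sketch under stated assumptions, not a proposed implementation: struct
mapping_model and all model_*() helpers are hypothetical, the range lock is
a single flag rather than an interval tree, and a page creation that would
sleep on the lock in the kernel simply fails here. It illustrates the
invariant: while the range is locked no page can appear in it, so after
eviction writeback finds nothing to write there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 16		/* assumed file size in pages */

/* Hypothetical model of one file's mapping. */
struct mapping_model {
	bool page_present[MODEL_NPAGES];	/* page is in the page cache */
	bool block_allocated[MODEL_NPAGES];	/* block exists on disk */
	bool range_locked;
	size_t lock_start, lock_end;		/* inclusive page indices */
};

/* Stands in for the proposed lock_mapping_range(). */
static void model_lock_range(struct mapping_model *m, size_t start, size_t end)
{
	assert(!m->range_locked && start <= end && end < MODEL_NPAGES);
	m->range_locked = true;
	m->lock_start = start;
	m->lock_end = end;
}

/* Stands in for the proposed unlock_mapping_range(). */
static void model_unlock_range(struct mapping_model *m)
{
	m->range_locked = false;
}

/*
 * A page fault or buffered write creating a page. The kernel would wait
 * for the range lock; the model just refuses.
 */
static bool model_add_page(struct mapping_model *m, size_t idx)
{
	if (idx >= MODEL_NPAGES)
		return false;
	if (m->range_locked && idx >= m->lock_start && idx <= m->lock_end)
		return false;
	m->page_present[idx] = true;
	return true;
}

/* The four steps: lock, evict pages, punch blocks, unlock. */
static void model_punch_hole(struct mapping_model *m, size_t start, size_t end)
{
	size_t i;

	model_lock_range(m, start, end);
	for (i = start; i <= end; i++)
		m->page_present[i] = false;	/* evict all pages in range */
	for (i = start; i <= end; i++)
		m->block_allocated[i] = false;	/* punch blocks */
	model_unlock_range(m);
}
```

  Writeback in this model would only ever look at entries with page_present
set, so it needs no lock of its own to stay out of the punched range.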

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
