Re: async O_DIRECT vs. buffered synchronization (or lack thereof)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-fsdevel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>
Subject: Re: async O_DIRECT vs. buffered synchronization (or lack thereof)
Date: Sun, 25 Oct 2015 08:50:14 +1100	[thread overview]
Message-ID: <20151024215014.GF8773@dastard> (raw)
In-Reply-To: <20151022234245.GA20005@kmo-pixel>

On Thu, Oct 22, 2015 at 03:42:45PM -0800, Kent Overstreet wrote:
> While off reading code, I noticed something that didn't look quite right...
> 
> Look at generic_file_direct_write(), in mm/filemap.c. What the code there is
> doing is:
> 
>  - dropping the range we're writing to from the page cache (writing it first if
>    necessary), then
>  - doing the write, then
>  - invalidating that range in the pagecache again.
....

> Yet _another_ fun fact: I mentioned that for the filemap_write_and_wait_range();
> invalidate_inode_pages2() sequence to work we have to be preventing pages from
> being redirtied. Well, i_mutex does the job for buffered writes, but not
> page faults - AFAICT page_mkwrite() would have to take i_mutex for this code to
> not race with page faults, and the default page_mkwrite implementation
> (filemap_page_mkwrite()) definitely does not.
> 
> It does _lock_ the page though, so if we had something that combined
> filemap_write_and_wait_range() with invalidating pages, making sure to have the
> page still locked when removing it from the page cache - that ought to work.
> 
> XFS does seem to attempt to get this right - its .page_mkwrite takes the inode
> XFS_MMAPLOCK_SHARED lock, and the xfs truncate and fallocate code both take
> XFS_MMAPLOCK_EXCL (truncate and (in particular) fcollapse also need to drop
> ranges from the page cache, fcollapse is where I first noticed this particular
> issue).
> 
> But AFAICT xfs's dio path does _not_ take the correct lock for this to work -
> although if you look at xfs_file_dio_aio_write() they were clearly thinking
> about page cache synchronization, so perhaps I'm missing something about how
> xfs's locking works.

We can't take it across direct IO submission/completion because that
creates a mmap_sem/XFS_MMAPLOCK inversion due to the direct IO code
calling get_user_pages().

We can't put locks in the page fault path to solve this problem - I
created the XFS_MMAPLOCK locking to solve the "page faults race with
extent manipulation operations" knowing that it couldn't be used to
solve the DIO vs mmap race conditions.

There's a reason why XFS developers still say "if you mix
buffered/mmap IO on the same file as direct IO, you get to keep all
the corrupted bits" and then point users at the open(2) man page:

	Applications should avoid mixing O_DIRECT and normal I/O to
	the same file, and especially to overlapping byte regions in
	the  same  file.   Even  when  the filesystem  correctly
	handles the coherency issues in this situation, overall I/O
	throughput is likely to be slower than using either mode
	alone.  Likewise, applications should avoid mixing mmap(2)
	of files with direct I/O to the same files.

We attempt best effort at maintaining coherency and preventing data
corruption, but we cannot guarantee coherency...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

     prev parent reply	other threads:[~2015-10-24 21:50 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-22 23:42 async O_DIRECT vs. buffered synchronization (or lack thereof) Kent Overstreet
2015-10-24 21:50 ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151024215014.GF8773@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.