From: Jan Kara <jack@suse.cz>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Lukas Czerner <lczerner@redhat.com>,
linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
esandeen@redhat.com, Christoph Hellwig <hch@lst.de>,
Jan Kara <jack@suse.cz>
Subject: Re: Async direct IO write vs buffered read race
Date: Fri, 23 Jun 2017 09:59:42 +0200
Message-ID: <20170623075942.GC25149@quack2.suse.cz>
In-Reply-To: <x49bmpg3qsp.fsf@segfault.boston.devel.redhat.com>
On Thu 22-06-17 12:55:50, Jeff Moyer wrote:
> Lukas Czerner <lczerner@redhat.com> writes:
>
> > Hello,
> >
> > I am dealing with a problem where, if a buffered read happens to land
> > between direct IO submission and completion, the page cache will contain
> > stale data while the new data is on disk.
> >
> > We are trying to avoid such problems by calling
> > invalidate_inode_pages2_range() before and after direct_IO() in
> > generic_file_direct_write(), however that does not seem to be enough,
> > because nothing prevents buffered reads from coming in afterwards and
> > populating the page cache.
>
> Ugh, right. With aio, we're doing the invalidate after the submission,
> not the completion.
>
> > Aside from the fact that mixing direct and buffered IO is not such a
> > good idea, we end up with the page cache showing different content than
> > what's on disk even after the aio dio completes, which seems very strange
> > to me.
> >
> > I can reproduce this on ext4 as well as xfs, and on kernel versions going
> > back at least to v3.10, which leads me to believe that this might
> > actually be known behaviour?
>
> At least I didn't know about it. ;-)
I'm actually aware of it :)
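For illustration, a minimal userspace sketch of the race being described --
entirely hypothetical (the file name, sizes and the lack of error checking
are mine), assuming libaio and a 4096-byte aligned O_DIRECT write; the window
sits between io_submit() and the completion, so it is timing dependent and
may need many runs to trigger:

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>		/* link with -laio */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SZ 4096

int main(void)
{
	char *newbuf, *rdbuf = malloc(SZ);
	int dfd, bfd;
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;

	/* Seed the file with 'O' (old) data through the page cache. */
	bfd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
	memset(rdbuf, 'O', SZ);
	pwrite(bfd, rdbuf, SZ, 0);
	fsync(bfd);

	/* Submit an O_DIRECT AIO write of 'N' (new) data over the same
	 * range; generic_file_direct_write() invalidates the cached 'O'
	 * pages around submission, not at completion. */
	dfd = open("testfile", O_WRONLY | O_DIRECT);
	posix_memalign((void **)&newbuf, SZ, SZ);
	memset(newbuf, 'N', SZ);
	io_setup(1, &ctx);
	io_prep_pwrite(&cb, dfd, newbuf, SZ, 0);
	io_submit(ctx, 1, cbs);

	/* Buffered read racing with the in-flight write: this can
	 * repopulate the page cache with the stale 'O' data. */
	pread(bfd, rdbuf, SZ, 0);

	/* Reap the completion, then read through the cache again: if the
	 * race hit, this still shows 'O' while the disk holds 'N'. */
	io_getevents(ctx, 1, 1, &ev, NULL);
	pread(bfd, rdbuf, SZ, 0);
	printf("buffered read after completion sees '%c'\n", rdbuf[0]);

	io_destroy(ctx);
	close(dfd);
	close(bfd);
	return 0;
}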
> > I was trying to avoid that by moving invalidate_inode_pages2_range() to
> > after the aio dio completion, into dio_complete (or the file system's
> > ->end_io callback), but it has its own problems - sometimes this appears
> > to be called from atomic context and I do not really see why...
>
> Well, I/O completion processing of course happens in atomic context. We
> do defer some things (like O_(D)SYNC processing) to process context. I
> guess we could add another qualifier inside of dio_bio_end_aio:
>
> 	bool defer_completion = false;
> 	if (dio->result)
> 		defer_completion = dio->defer_completion ||
> 			(dio->op == REQ_OP_WRITE && dio->inode->i_mapping->nrpages);
>
> 	if (remaining == 0) {
> 		if (defer_completion) {
> 			INIT_WORK(&dio->complete_work, dio_aio_complete_work);
> 			queue_work(dio->inode->i_sb->s_dio_done_wq,
> 				   &dio->complete_work);
> ...
>
> (I'm not sure whether we also have to take into account exceptional entries.)
>
> And then call invalidate_inode_pages2_range from dio_aio_complete_work.
> That at least wouldn't defer /all/ completion processing to a workqueue.
> However, it will slow things down when there is mixed buffered and
> direct I/O.
>
> Christoph or Jan, any thoughts on this?
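For reference, a rough sketch of where such an invalidation could sit -- this
is only an illustration of the idea, not a patch from this thread; it assumes
the code runs from dio_aio_complete_work() (i.e. process context, where
invalidate_inode_pages2_range() may sleep), and "offset"/"ret" stand in for
the start and byte count of the just-completed write:

	if (ret > 0 && dio->op == REQ_OP_WRITE &&
	    dio->inode->i_mapping->nrpages) {
		/* Drop page cache pages over the written range so that
		 * later buffered reads go back to disk for the new data. */
		int err = invalidate_inode_pages2_range(dio->inode->i_mapping,
				offset >> PAGE_SHIFT,
				(offset + ret - 1) >> PAGE_SHIFT);
		if (err)
			pr_warn("direct write left stale page cache pages\n");
	}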
So our stance has been: Do not ever mix buffered and direct IO! Definitely
not on the same file range, most definitely not at the same time.
What we do is a best-effort thing that more or less guarantees that if you
do, say, buffered IO and then direct IO after that, it will work reasonably.
However, if direct and buffered IO race, bad luck for your data. I don't
think we want to sacrifice any performance of AIO DIO (and offloading of
direct IO completion to a workqueue so that we can do the invalidation costs
a noticeable amount of performance) to support such a usecase.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
Thread overview: 8+ messages
2017-06-22 15:57 Async direct IO write vs buffered read race Lukas Czerner
2017-06-22 16:55 ` Jeff Moyer
2017-06-23 7:59 ` Jan Kara [this message]
2017-06-23 10:16 ` Lukas Czerner
2017-06-26 15:11 ` Jeff Moyer
2017-06-28 16:57 ` Rik van Riel
2017-06-30 11:16 ` Lukas Czerner
2017-06-23 18:04 ` Eric Sandeen