linux-fsdevel.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: Race between flush and write during an AIO+DIO+O_SYNC write?
Date: Tue, 6 Nov 2012 11:42:58 -0800
Message-ID: <20121106194258.GD3941@blackbox.djwong.org>
In-Reply-To: <x49ehk6pt8x.fsf@segfault.boston.devel.redhat.com>

On Tue, Nov 06, 2012 at 11:54:06AM -0500, Jeff Moyer wrote:
> "Darrick J. Wong" <darrick.wong@oracle.com> writes:
> 
> > Hi all,
> >
> > One of our (app) developers noticed that io_submit() takes a very long time to
> > return if the program initiates a write to a block device that's been opened in
> > O_SYNC and O_DIRECT mode.  We traced the slowness to blkdev_aio_write, which
> > seems to initiate a disk cache flush if __generic_file_aio_write returns a
> > positive value or -EIOCBQUEUED.  Usually we see -EIOCBQUEUED returned, which
> > triggers the flush, hence io_submit() stalls for a long time.  That doesn't
> > really feel like the intended usage pattern for aio.
> >
> > This -EIOCBQUEUED case seems a little strange -- if an async io has been queued
> > (but not necessarily completed), why would we immediately issue a cache flush?
> > This seems like a setup for the flush racing against the write, which means
> > that the write could happen after the flush, which would be bad.
> >
> > Jeff Moyer proposed a patchset last spring[1] that removed the -EIOCBQUEUED
> > case and deferred the flush issue to each filesystem's end_io handler.  Google
> > doesn't find any NAKs, but the patches don't seem to have gone anywhere.  Is
> > there a technical reason why these patches haven't gone anywhere?
> 
> I never got the sign-off on the xfs bits, and I then got distracted with
> other work.  I'll see about updating the patch set.
> 
> > Could one establish an end_io handler in blkdev_direct_IO so that async writes
> > to an O_SYNC+DIO block device will result in a blkdev_issue_flush before
> > aio_complete?  That would seem to fix the problem of the write and flush race.
> 
> You mean like patch 1 in that series, or something different?

The original patchset doesn't seem to modify the block device aio code -- I
think blkdev_direct_IO needs to pass DIO_SYNC_WRITES to __blockdev_direct_IO,
and the -EIOCBQUEUED check needs to be taken out of blkdev_aio_write.
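
To make the ordering problem concrete, here is a toy userspace model of a
disk with a volatile write cache.  It is only a sketch, not kernel code:
every toy_* name and both scenario functions are invented for illustration,
and the model assumes a flush makes durable only what has already reached
the cache at the moment the flush runs.

```c
#include <stdbool.h>

/* Toy model of a disk with a volatile write cache (hypothetical, not kernel code). */
struct toy_disk {
	int cache;	/* value currently sitting in the volatile cache */
	bool dirty;	/* true if the cache holds data not yet on media */
	int media;	/* value that has reached stable media */
};

/* The device finishes a queued write: data lands in the volatile cache. */
static void toy_write_complete(struct toy_disk *d, int value)
{
	d->cache = value;
	d->dirty = true;
}

/* Cache flush: only what is in the cache right now becomes durable. */
static void toy_flush(struct toy_disk *d)
{
	if (d->dirty) {
		d->media = d->cache;
		d->dirty = false;
	}
}

/* Flush issued at submit time (the -EIOCBQUEUED case): the write is still in flight. */
static int flush_at_submit(int old_value, int new_value)
{
	struct toy_disk d = { .cache = old_value, .dirty = false, .media = old_value };

	toy_flush(&d);				/* flush wins the race */
	toy_write_complete(&d, new_value);	/* write lands afterwards */
	return d.media;				/* what survives a power loss */
}

/* Flush issued from the completion path, before completion is reported. */
static int flush_at_end_io(int old_value, int new_value)
{
	struct toy_disk d = { .cache = old_value, .dirty = false, .media = old_value };

	toy_write_complete(&d, new_value);	/* write reaches the cache first */
	toy_flush(&d);				/* then the flush makes it durable */
	return d.media;				/* what survives a power loss */
}
```

In this model flush_at_submit(1, 2) leaves the old value 1 on media, while
flush_at_end_io(1, 2) leaves the new value 2, which is the behaviour an
O_SYNC writer expects once the aio completion has been delivered.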

I also observed a crash in the queue_work call when running against a block
device.  For block devices, it looks like in do_blockdev_direct_IO,
iocb->ki_filp->f_mapping->host is an inode in the bdev filesystem, whereas
iocb->ki_filp->f_dentry->d_inode is the device node the user opened, so its
->i_sb belongs to whichever filesystem holds that node (probably devtmpfs or
something).  The two inodes are of course the same for regular files.

Since block devices are (I think) part of their own bdev filesystem, I think it
makes more sense if we call queue_work against the flush_wq of the bdev fs, not
the fs that just happened to contain the device file.
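
A stripped-down userspace sketch of the two lookup paths, in case it helps.
The toy_* structs below are hypothetical stand-ins that keep only the
pointers discussed here; they are not the real kernel structures, and the
two helper scenarios are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; only the relevant pointers are kept. */
struct toy_super_block { const char *name; };
struct toy_inode { struct toy_super_block *i_sb; };
struct toy_file {
	struct toy_inode *mapping_host;	/* stands in for f_mapping->host */
	struct toy_inode *dentry_inode;	/* stands in for f_dentry->d_inode */
};

/* Do both paths reach the same superblock? */
static bool toy_paths_agree(const struct toy_file *f)
{
	return f->dentry_inode->i_sb == f->mapping_host->i_sb;
}

/* Regular file: both pointers name the same inode, so the superblocks match. */
static bool toy_regular_file_agrees(void)
{
	struct toy_super_block fs = { "somefs" };
	struct toy_inode ino = { &fs };
	struct toy_file f = { &ino, &ino };

	return toy_paths_agree(&f);
}

/* Block device: the node lives in one fs, the mapping host in the bdev fs. */
static bool toy_blockdev_agrees(void)
{
	struct toy_super_block devfs = { "devtmpfs" };
	struct toy_super_block bdevfs = { "bdev" };
	struct toy_inode node = { &devfs };
	struct toy_inode host = { &bdevfs };
	struct toy_file f = { &host, &node };

	return toy_paths_agree(&f);
}
```

toy_regular_file_agrees() returns true and toy_blockdev_agrees() returns
false, which is why going through f_mapping->host picks the bdev fs in the
block device case.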

Will send patches after I clean 'em up and test them a bit more.

--D
> 
> Cheers,
> Jeff

Thread overview: 4+ messages
2012-11-06  2:21 Race between flush and write during an AIO+DIO+O_SYNC write? Darrick J. Wong
2012-11-06 16:54 ` Jeff Moyer
2012-11-06 19:42   ` Darrick J. Wong [this message]
2012-11-06 20:26   ` [RFC PATCH] blkdev: Fix up AIO+DIO+O_SYNC to do the sync part correctly Darrick J. Wong
