* Race between flush and write during an AIO+DIO+O_SYNC write?
From: Darrick J. Wong @ 2012-11-06 2:21 UTC
To: linux-fsdevel, Jeff Moyer
Hi all,
One of our (app) developers noticed that io_submit() takes a very long time to
return if the program initiates a write to a block device that's been opened
with O_SYNC and O_DIRECT. We traced the slowness to blkdev_aio_write, which
seems to initiate a disk cache flush if __generic_file_aio_write returns a
positive value or -EIOCBQUEUED. Usually we see -EIOCBQUEUED returned, which
triggers the flush, hence io_submit() stalls for a long time. That doesn't
really feel like the intended usage pattern for aio.
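For the curious, here's a minimal reproducer sketch (userspace, link with
-laio; /dev/sdX is a stand-in for a scratch device you don't mind scribbling
on). The interesting number is how long io_submit() itself takes, since the
whole point of aio is that submission shouldn't block on the disk:

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        struct timespec t0, t1;
        void *buf;
        int fd;

        fd = open("/dev/sdX", O_WRONLY | O_DIRECT | O_SYNC);
        if (fd < 0 || io_setup(1, &ctx) || posix_memalign(&buf, 4096, 4096))
                return 1;

        io_prep_pwrite(&cb, fd, buf, 4096, 0);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        io_submit(ctx, 1, cbs);         /* this is the call that stalls */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("io_submit: %ld us\n",
               (long)((t1.tv_sec - t0.tv_sec) * 1000000 +
                      (t1.tv_nsec - t0.tv_nsec) / 1000));

        io_getevents(ctx, 1, 1, &ev, NULL);
        return 0;
}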
This -EIOCBQUEUED case seems a little strange -- if an async io has been queued
(but not necessarily completed), why would we immediately issue a cache flush?
This seems like a setup for the flush to race against the write -- the write
could land on disk after the flush, which would be bad.
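For reference, here is the logic in question, trimmed from 3.6-era
fs/block_dev.c (the comment is mine):

        ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
        if (ret > 0 || ret == -EIOCBQUEUED) {
                ssize_t err;

                /* For O_SYNC files this ends up in blkdev_issue_flush(),
                 * even though -EIOCBQUEUED means the write has only been
                 * queued, not completed. */
                err = generic_write_sync(file, pos, ret);
                if (err < 0 && ret > 0)
                        ret = err;
        }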
Jeff Moyer proposed a patchset last spring[1] that removed the -EIOCBQUEUED
case and deferred issuing the flush to each filesystem's end_io handler.
Google doesn't find any NAKs, but the patches don't seem to have gone
anywhere. Is there a technical reason why they stalled?
Could one establish an end_io handler in blkdev_direct_IO so that async writes
to an O_SYNC+DIO block device will result in a blkdev_issue_flush before
aio_complete? That would seem to fix the problem of the write and flush race.
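Roughly like this, as a sketch -- the names and the work struct below are
invented for illustration, not from any posted patch. The flush has to be
pushed off to a workqueue because dio end_io runs from bio completion
context and blkdev_issue_flush() can sleep:

struct bdev_sync_work {
        struct work_struct      work;
        struct kiocb            *iocb;
        ssize_t                 ret;
};

static void bdev_sync_work_fn(struct work_struct *w)
{
        struct bdev_sync_work *bsw =
                container_of(w, struct bdev_sync_work, work);
        struct inode *inode = bsw->iocb->ki_filp->f_mapping->host;

        /* The write has completed; now flush the disk cache, then
         * signal completion to userspace. */
        blkdev_issue_flush(I_BDEV(inode), GFP_KERNEL, NULL);
        aio_complete(bsw->iocb, bsw->ret, 0);
        inode_dio_done(inode);
        kfree(bsw);
}

The submit path would INIT_WORK(&bsw->work, bdev_sync_work_fn) and the dio
completion would queue_work() it instead of calling aio_complete directly.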
--D
[1] http://oss.sgi.com/archives/xfs/2012-03/msg00082.html
"fs: fix up AIO+DIO+O_SYNC to actually do the sync part"
* Re: Race between flush and write during an AIO+DIO+O_SYNC write?
From: Jeff Moyer @ 2012-11-06 16:54 UTC
To: Darrick J. Wong; +Cc: linux-fsdevel
"Darrick J. Wong" <darrick.wong@oracle.com> writes:
> [...]
>
> Jeff Moyer proposed a patchset last spring[1] that removed the -EIOCBQUEUED
> case and deferred issuing the flush to each filesystem's end_io handler.
> Google doesn't find any NAKs, but the patches don't seem to have gone
> anywhere. Is there a technical reason why they stalled?
I never got the sign-off on the xfs bits, and I then got distracted with
other work. I'll see about updating the patch set.
> Could one establish an end_io handler in blkdev_direct_IO so that async writes
> to an O_SYNC+DIO block device will result in a blkdev_issue_flush before
> aio_complete? That would seem to fix the problem of the write and flush race.
You mean like patch 1 in that series, or something different?
Cheers,
Jeff
* Re: Race between flush and write during an AIO+DIO+O_SYNC write?
From: Darrick J. Wong @ 2012-11-06 19:42 UTC
To: Jeff Moyer; +Cc: linux-fsdevel
On Tue, Nov 06, 2012 at 11:54:06AM -0500, Jeff Moyer wrote:
> "Darrick J. Wong" <darrick.wong@oracle.com> writes:
>
> > [...]
> >
> > Jeff Moyer proposed a patchset last spring[1] that removed the -EIOCBQUEUED
> > case and deferred issuing the flush to each filesystem's end_io handler.
> > Google doesn't find any NAKs, but the patches don't seem to have gone
> > anywhere. Is there a technical reason why they stalled?
>
> I never got the sign-off on the xfs bits, and I then got distracted with
> other work. I'll see about updating the patch set.
>
> > Could one establish an end_io handler in blkdev_direct_IO so that async writes
> > to an O_SYNC+DIO block device will result in a blkdev_issue_flush before
> > aio_complete? That would seem to fix the problem of the write and flush race.
>
> You mean like patch 1 in that series, or something different?
The original patchset doesn't seem to modify the block device aio code -- I
think blkdev_direct_IO needs to pass DIO_SYNC_WRITES to __blockdev_direct_IO,
and the -EIOCBQUEUED check needs to be taken out of blkdev_aio_write.
I also observed a crash in the queue_work call when running against a block
device. In do_blockdev_direct_IO, iocb->ki_filp->f_mapping->host is an inode
in the bdev filesystem, while iocb->ki_filp->f_dentry->d_inode is the inode
of the device node the user actually opened (on devtmpfs or wherever), so the
two sit on different superblocks. For regular files the two are of course the
same inode.

Since block devices live in their own bdev pseudo-filesystem, I think it
makes more sense to call queue_work against the flush_wq of the bdev fs, not
the fs that just happened to contain the device node.
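Spelled out (a sketch; the variable names are mine, field spellings per the
3.6-era struct file):

        /* For an open device node, the two inode lookups diverge: */
        struct inode *bd_inode = iocb->ki_filp->f_mapping->host;    /* bdev fs */
        struct inode *dn_inode = iocb->ki_filp->f_dentry->d_inode;  /* devtmpfs, ... */

        /* bd_inode->i_sb is the bdev pseudo-fs superblock; dn_inode->i_sb
         * is whichever fs holds the device node.  Only bd_inode's
         * superblock can be counted on to have a usable s_dio_flush_wq
         * here -- my guess as to why queue_work blew up. */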
Will send patches after I clean 'em up and test them a bit more.
--D
* [RFC PATCH] blkdev: Fix up AIO+DIO+O_SYNC to do the sync part correctly
From: Darrick J. Wong @ 2012-11-06 20:26 UTC
To: Jeff Moyer; +Cc: linux-fsdevel
For block devices, use the DIO_SYNC_WRITES flag so that flushes are issued /after/
the write completes, not before. Furthermore, we need to use the bdevfs
workqueue to issue the flush, not the fs that just happened to contain the
device node.
This patch requires Jeff Moyer's "fix up AIO+DIO+O_SYNC to actually do the sync
part" patch set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |    5 +++--
 fs/direct-io.c |    2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 1a1e5e3..05ff33a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -235,7 +235,8 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	struct inode *inode = file->f_mapping->host;
 
 	return __blockdev_direct_IO(rw, iocb, inode, I_BDEV(inode), iov, offset,
-				    nr_segs, blkdev_get_blocks, NULL, NULL, 0);
+				    nr_segs, blkdev_get_blocks, NULL, NULL,
+				    DIO_SYNC_WRITES);
 }
 
 int __sync_blockdev(struct block_device *bdev, int wait)
@@ -1631,7 +1632,7 @@ ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	percpu_down_read(&bdev->bd_block_size_semaphore);
 
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
-	if (ret > 0 || ret == -EIOCBQUEUED) {
+	if (ret > 0) {
 		ssize_t err;
 
 		err = generic_write_sync(file, pos, ret);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 25dbd14..33c0bc2 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -258,7 +258,7 @@ void generic_dio_end_io(struct kiocb *iocb, loff_t offset, ssize_t bytes,
 		work->ret = ret;
 		work->offset = offset;
 		work->len = bytes;
-		queue_work(inode->i_sb->s_dio_flush_wq, &work->work);
+		queue_work(iocb->ki_filp->f_mapping->host->i_sb->s_dio_flush_wq, &work->work);
 	} else {
 		aio_complete(iocb, ret, 0);
 		inode_dio_done(inode);