From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeff Moyer <jmoyer@redhat.com>
Subject: Re: Race between flush and write during an AIO+DIO+O_SYNC write?
Date: Tue, 06 Nov 2012 11:54:06 -0500
Message-ID: <x49ehk6pt8x.fsf@segfault.boston.devel.redhat.com>
References: <20121106022155.GA4255@blackbox.djwong.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-fsdevel@vger.kernel.org
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:11727 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750967Ab2KFQyI (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Tue, 6 Nov 2012 11:54:08 -0500
In-Reply-To: <20121106022155.GA4255@blackbox.djwong.org> (Darrick J. Wong's
	message of "Mon, 5 Nov 2012 18:21:55 -0800")
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

"Darrick J. Wong" <darrick.wong@oracle.com> writes:

> Hi all,
>
> One of our (app) developers noticed that io_submit() takes a very long time to
> return if the program initiates a write to a block device that's been opened in
> O_SYNC and O_DIRECT mode.  We traced the slowness to blkdev_aio_write, which
> seems to initiate a disk cache flush if __generic_file_aio_write returns a
> positive value or -EIOCBQUEUED.  Usually we see -EIOCBQUEUED returned, which
> triggers the flush, hence io_submit() stalls for a long time.  That doesn't
> really feel like the intended usage pattern for aio.
>
> This -EIOCBQUEUED case seems a little strange -- if an async io has been queued
> (but not necessarily completed), why would we immediately issue a cache flush?
> This seems like a setup for the flush racing against the write, which means
> that the write could happen after the flush, which would be bad.
>
> Jeff Moyer proposed a patchset last spring[1] that removed the -EIOCBQUEUED
> case and deferred the flush issue to each filesystem's end_io handler.  Google
> doesn't find any NAKs, but the patches don't seem to have gone anywhere.  Is
> there a technical reason why these patches haven't gone anywhere?

I never got the sign-off on the xfs bits, and I then got distracted with
other work.  I'll see about updating the patch set.

> Could one establish an end_io handler in blkdev_direct_IO so that async writes
> to an O_SYNC+DIO block device will result in a blkdev_issue_flush before
> aio_complete?  That would seem to fix the problem of the write and flush race.

You mean like patch 1 in that series, or something different?
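
To make the ordering concern concrete, here is a toy userspace model in C.
It is only a sketch, not kernel code: struct toy_dev, toy_write_complete(),
toy_flush(), submit_then_flush() and flush_in_end_io() are all hypothetical
names invented for illustration, and the "device" is just two integers
standing in for a volatile write cache and stable media.  The first ordering
mimics issuing the flush as soon as the write has been queued (the
-EIOCBQUEUED case); the second mimics issuing the flush from a completion
handler, before completion is reported.

```c
/* Toy model: writes land in a volatile cache; a flush makes them durable. */
struct toy_dev {
	int cache;	/* data in the volatile write cache, 0 means empty */
	int stable;	/* data that would survive a power loss */
};

/* The queued write finally reaches the device's volatile cache. */
static void toy_write_complete(struct toy_dev *d, int data)
{
	d->cache = data;
}

/* A cache flush moves whatever is currently cached to stable media. */
static void toy_flush(struct toy_dev *d)
{
	if (d->cache) {
		d->stable = d->cache;
		d->cache = 0;
	}
}

/*
 * Racy ordering: the write is only queued when the flush is issued,
 * so the flush finds an empty cache and the data lands afterwards.
 */
int submit_then_flush(struct toy_dev *d, int data)
{
	toy_flush(d);			/* flush issued at submit time */
	toy_write_complete(d, data);	/* write completes after the flush */
	return d->stable;
}

/*
 * Deferred ordering: the flush is issued from the completion path,
 * after the write has landed and before completion is reported.
 */
int flush_in_end_io(struct toy_dev *d, int data)
{
	toy_write_complete(d, data);	/* write completes first */
	toy_flush(d);			/* then the flush makes it durable */
	return d->stable;
}
```

Starting from an empty device and writing the value 42, submit_then_flush()
returns 0 (nothing reached stable media), while flush_in_end_io() returns 42.
A real device can of course complete the write before the early flush runs;
the model only shows the losing side of the race.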

Cheers,
Jeff