From: Jeff Moyer
Subject: Re: Race between flush and write during an AIO+DIO+O_SYNC write?
Date: Tue, 06 Nov 2012 11:54:06 -0500
To: "Darrick J. Wong"
Cc: linux-fsdevel@vger.kernel.org
In-Reply-To: <20121106022155.GA4255@blackbox.djwong.org> (Darrick J. Wong's message of "Mon, 5 Nov 2012 18:21:55 -0800")

"Darrick J. Wong" writes:

> Hi all,
>
> One of our (app) developers noticed that io_submit() takes a very long time to
> return if the program initiates a write to a block device that's been opened in
> O_SYNC and O_DIRECT mode.  We traced the slowness to blkdev_aio_write, which
> seems to initiate a disk cache flush if __generic_file_aio_write returns a
> positive value or -EIOCBQUEUED.  Usually we see -EIOCBQUEUED returned, which
> triggers the flush, hence io_submit() stalls for a long time.  That doesn't
> really feel like the intended usage pattern for aio.
>
> This -EIOCBQUEUED case seems a little strange -- if an async io has been queued
> (but not necessarily completed), why would we immediately issue a cache flush?
> This seems like a setup for the flush racing against the write, which means
> that the write could happen after the flush, which would be bad.
>
> Jeff Moyer proposed a patchset last spring[1] that removed the -EIOCBQUEUED
> case and deferred the flush issue to each filesystem's end_io handler.  Google
> doesn't find any NAKs, but the patches don't seem to have gone anywhere.  Is
> there a technical reason why these patches haven't gone anywhere?

I never got the sign-off on the xfs bits, and I then got distracted with other work.
I'll see about updating the patch set.

> Could one establish an end_io handler in blkdev_direct_IO so that async writes
> to an O_SYNC+DIO block device will result in a blkdev_issue_flush before
> aio_complete?  That would seem to fix the problem of the write and flush race.

You mean like patch 1 in that series, or something different?

Cheers,
Jeff
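
For anyone following the thread, here is a rough userspace sketch of the ordering problem being discussed. It is not kernel code and not taken from the patch series: the toy_dev structure and every toy_* function are hypothetical names invented for illustration, and the model assumes the worst-case ordering in which the flush reaches the device before the queued write has landed in its cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy device: a completed write lands in a volatile
 * cache, and only a flush moves cached data to stable media. */
struct toy_dev {
    bool write_in_flight;
    bool in_cache;
    bool on_media;
};

/* The write has been queued but not completed (the -EIOCBQUEUED case). */
static void toy_queue_write(struct toy_dev *d)
{
    d->write_in_flight = true;
}

/* The device finishes the write; the data is now in the volatile cache. */
static void toy_write_done(struct toy_dev *d)
{
    assert(d->write_in_flight);
    d->write_in_flight = false;
    d->in_cache = true;
}

/* A cache flush only covers data that is already in the cache. */
static void toy_flush(struct toy_dev *d)
{
    if (d->in_cache) {
        d->in_cache = false;
        d->on_media = true;
    }
}

/* Flush issued from the submit path while the write is only queued.
 * Returns whether the data is on stable media when completion is reported. */
static bool flush_at_submit(void)
{
    struct toy_dev d = {0};

    toy_queue_write(&d);
    toy_flush(&d);       /* races ahead of the write: nothing to flush yet */
    toy_write_done(&d);
    return d.on_media;
}

/* Flush deferred to an end_io-style step that runs after the write
 * completes and before completion is reported to the submitter. */
static bool flush_at_end_io(void)
{
    struct toy_dev d = {0};

    toy_queue_write(&d);
    toy_write_done(&d);
    toy_flush(&d);       /* the flush now covers the completed write */
    return d.on_media;
}
```

In this model flush_at_submit() reports completion with the data still sitting in the volatile cache, while flush_at_end_io() reports it only once the data is on media. The real code paths are asynchronous and the bad ordering is a possibility rather than a certainty, so treat this as an illustration of the concern only.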