Date: Fri, 30 Jun 2017 13:16:50 +0200
From: Lukas Czerner
To: Rik van Riel
Cc: Jeff Moyer, Jan Kara, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
    esandeen@redhat.com, Christoph Hellwig
Subject: Re: Async direct IO write vs buffered read race
Message-ID: <20170630111650.e52mkfnszh4haizn@localhost.localdomain>
References: <20170622155722.wnkicghc3rkpnvac@localhost.localdomain>
 <20170623075942.GC25149@quack2.suse.cz>
 <20170623101621.ggixwdjsnm7k5ch4@localhost.localdomain>
 <1498669079.20270.120.camel@redhat.com>
In-Reply-To: <1498669079.20270.120.camel@redhat.com>

On Wed, Jun 28, 2017 at 12:57:59PM -0400, Rik van Riel wrote:
> On Mon, 2017-06-26 at 11:11 -0400, Jeff Moyer wrote:
> > Lukas Czerner writes:
> >
> > > > What we do is a best-effort thing that more or less guarantees
> > > > that if you do, say, buffered IO and direct IO after that, it
> > > > will work reasonably. However, if direct and buffered IO can
> > > > race, bad luck for your data. I don't think we want to sacrifice
> > > > any performance of AIO DIO (and offloading direct IO completion
> > > > to a workqueue so that we can do the invalidation costs a
> > > > noticeable amount of performance) to support such a use case.
> > >
> > > What Jeff proposed would sacrifice performance only for the case
> > > where an AIO DIO write does race with buffered IO - the situation
> > > we agree is not ideal and should be avoided anyway. For the rest
> > > of AIO DIO this should have no effect, right? If true, I'd say
> > > this is a good effort to make sure we do not have disparity
> > > between the page cache and disk.
> >
> > Exactly. Jan, are you concerned about impacting performance for
> > mixed buffered I/O and direct writes? If so, we could look into
> > restricting the process context switch further, to just overlapping
> > buffered and direct I/O (assuming there are no locking issues).
> >
> > Alternatively, since we already know this is racy, we don't actually
> > have to defer I/O completion to process context. We could just
> > complete the I/O as we normally would, but also queue up an
> > invalidate_inode_pages2_range work item. It will be asynchronous,
> > but this is best effort, anyway.
> >
> > As Eric mentioned, the thing that bothers me is that we have invalid
> > data lingering in the page cache indefinitely.
>
> Given that the requirement is that the page cache gets invalidated
> after IO completion, would it be possible to defer only the page
> cache invalidation to task context, and handle the rest of the IO
> completion in interrupt context?

Hi,

if I am reading it correctly, that's basically how it works now for
I/O that has defer_completion set (filesystems set this to do extent
conversion at completion time). We would use the same path here for
the invalidation - see the sketch at the end of this mail.

-Lukas

>
> --
> All rights reversed
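
P.S. A minimal sketch of what I mean, assuming the deferred-completion
machinery in fs/direct-io.c as it exists today (around v4.12). The
struct and function names below (dio_sketch, dio_sketch_complete_work,
the offset/result fields) are illustrative stand-ins for the private
struct dio and its completion work item, not the actual code; only
defer_completion, the s_dio_done_wq workqueue and
invalidate_inode_pages2_range() are real. The point is just that the
invalidation rides the same workqueue path that defer_completion
already uses for extent conversion:

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/workqueue.h>

/* Illustrative stand-in for the private struct dio in fs/direct-io.c. */
struct dio_sketch {
	struct inode		*inode;
	loff_t			offset;		/* start of the write */
	ssize_t			result;		/* bytes transferred */
	struct work_struct	complete_work;	/* queued when defer_completion is set */
};

/*
 * Runs in process context off the s_dio_done_wq workqueue, i.e. the
 * path that AIO DIO writes with defer_completion set already take for
 * extent conversion.
 */
static void dio_sketch_complete_work(struct work_struct *work)
{
	struct dio_sketch *dio = container_of(work, struct dio_sketch,
					      complete_work);
	struct address_space *mapping = dio->inode->i_mapping;

	/* ... extent conversion / ->end_io work happens here today ... */

	/*
	 * New part: kick out any page cache pages that a racing
	 * buffered read instantiated over the range we just wrote,
	 * so the cache does not keep stale data indefinitely.
	 */
	if (dio->result > 0 && mapping->nrpages)
		invalidate_inode_pages2_range(mapping,
				dio->offset >> PAGE_SHIFT,
				(dio->offset + dio->result - 1) >> PAGE_SHIFT);

	/* ... followed by the usual AIO completion (ki_complete) ... */
}

This stays best effort, of course - a buffered read that comes in after
the invalidation can repopulate the cache - but it avoids leaving stale
pages around forever without adding anything to the common, non-racing
AIO DIO path.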