From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:54150 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751933AbdF3LRB (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
        Fri, 30 Jun 2017 07:17:01 -0400
Date: Fri, 30 Jun 2017 13:16:50 +0200
From: Lukas Czerner <lczerner@redhat.com>
To: Rik van Riel <riel@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>, Jan Kara <jack@suse.cz>,
        linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
        esandeen@redhat.com, Christoph Hellwig <hch@lst.de>
Subject: Re: Async direct IO write vs buffered read race
Message-ID: <20170630111650.e52mkfnszh4haizn@localhost.localdomain>
References: <20170622155722.wnkicghc3rkpnvac@localhost.localdomain>
 <x49bmpg3qsp.fsf@segfault.boston.devel.redhat.com>
 <20170623075942.GC25149@quack2.suse.cz>
 <20170623101621.ggixwdjsnm7k5ch4@localhost.localdomain>
 <x49y3seeqbm.fsf@segfault.boston.devel.redhat.com>
 <1498669079.20270.120.camel@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1498669079.20270.120.camel@redhat.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Wed, Jun 28, 2017 at 12:57:59PM -0400, Rik van Riel wrote:
> On Mon, 2017-06-26 at 11:11 -0400, Jeff Moyer wrote:
> > Lukas Czerner <lczerner@redhat.com> writes:
> > 
> > > > The thing we do is a best effort thing that more or less
> > > > guarantees that if
> > > > you do say buffered IO and direct IO after that, it will work
> > > > reasonably.
> > > > However if direct and buffered IO can race, bad luck for your
> > > > data. I don't
> > > > think we want to sacrifice any performance of AIO DIO (and
> > > > offloading of
> > > > direct IO completion to a workqueue so that we can do
> > > > invalidation costs
> > > > noticeable mount of performance) for supporting such usecase.
> > > 
> > > What Jeff proposed would sacrifice performance for the case where
> > > AIO
> > > DIO write does race with buffered IO - the situation we agree is
> > > not ideal
> > > and should be avoided anyway. For the rest of AIO DIO this should
> > > have no
> > > effect right ? If true, I'd say this is a good effort to make sure
> > > we do
> > > not have disparity between page cache and disk.
> > 
> > Exactly.��Jan, are you concerned about impacting performance for
> > mixed
> > buffered I/O and direct writes?��If so, we could look into
> > restricting
> > the process context switch further, to just overlapping buffered and
> > direct I/O (assuming there are no locking issues).
> > 
> > Alternatively, since we already know this is racy, we don't actually
> > have to defer I/O completion to process context.��We could just
> > complete
> > the I/O as we normally would, but also queue up an
> > invalidate_inode_pages2_range work item.��It will be asynchronous,
> > but
> > this is best effort, anyway.
> > 
> > As Eric mentioned, the thing that bothers me is that we have invalid
> > data lingering in the page cache indefinitely.
> 
> Given that the requirement is that the page cache
> gets invalidated after IO completion, would it be
> possible to defer only the page cache invalidation
> to task context, and handle the rest of the IO
> completion in interrupt context?

Hi,

if I am reading it correctly that's basically how it works now for the
IO that has defer_completion set (filesystems set this to do extent
conversion at the completion). We'd use the same path here for
the invalidation.

-Lukas

> 
> -- 
> All rights reversed