From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from userp2120.oracle.com ([156.151.31.85]:51280 "EHLO
	userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1725913AbeKUJb4 (ORCPT );
	Wed, 21 Nov 2018 04:31:56 -0500
Date: Tue, 20 Nov 2018 15:00:17 -0800
From: "Darrick J. Wong"
To: Dave Chinner
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/5] iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
Message-ID: <20181120230017.GL6792@magnolia>
References: <20181119211742.8824-1-david@fromorbit.com>
	<20181119211742.8824-2-david@fromorbit.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20181119211742.8824-2-david@fromorbit.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Tue, Nov 20, 2018 at 08:17:38AM +1100, Dave Chinner wrote:
> From: Dave Chinner
>
> When we write into an unwritten extent via direct IO, we dirty
> metadata on IO completion to convert the unwritten extent to
> written. However, when we do the FUA optimisation checks, the inode
> may be clean and so we issue a FUA write into the unwritten extent.
> This means we then bypass the generic_write_sync() call after
> unwritten extent conversion has been done and we don't force the
> modified metadata to stable storage.
>
> This violates O_DSYNC semantics. The window of exposure is a single
> IO, as the next DIO write will see the inode has dirty metadata and
> hence will not use the FUA optimisation. Calling
> generic_write_sync() after completion of the second IO will also
> sync the first write and its metadata.
>
> Fix this by avoiding the FUA optimisation when writing to unwritten
> extents.
>
> Signed-off-by: Dave Chinner

Looks ok,
Reviewed-by: Darrick J. Wong

--D

> ---
>  fs/iomap.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 64ce240217a1..72f3864a2e6b 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -1596,12 +1596,13 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
>
>  	if (iomap->flags & IOMAP_F_NEW) {
>  		need_zeroout = true;
> -	} else {
> +	} else if (iomap->type == IOMAP_MAPPED) {
>  		/*
> -		 * Use a FUA write if we need datasync semantics, this
> -		 * is a pure data IO that doesn't require any metadata
> -		 * updates and the underlying device supports FUA. This
> -		 * allows us to avoid cache flushes on IO completion.
> +		 * Use a FUA write if we need datasync semantics, this is a pure
> +		 * data IO that doesn't require any metadata updates (including
> +		 * after IO completion such as unwritten extent conversion) and
> +		 * the underlying device supports FUA. This allows us to avoid
> +		 * cache flushes on IO completion.
>  		 */
>  		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
>  		    (dio->flags & IOMAP_DIO_WRITE_FUA) &&
> --
> 2.19.1
>
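
For anyone reading along in the archive, the rule the patch enforces can be
sketched as a standalone predicate. This is only an illustration written in
userspace C, not the kernel code: the sketch_* types, flag values and the
function name are hypothetical stand-ins for the iomap definitions, and the
two boolean parameters stand for "the dio asked for FUA" and "the device
supports FUA" as described in the quoted comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's iomap types; values are arbitrary. */
enum sketch_iomap_type { SKETCH_HOLE, SKETCH_MAPPED, SKETCH_UNWRITTEN };

#define SKETCH_F_NEW    0x01u	/* blocks were newly allocated */
#define SKETCH_F_DIRTY  0x02u	/* inode already has dirty metadata */
#define SKETCH_F_SHARED 0x04u	/* blocks are shared, write needs remapping */

struct sketch_iomap {
	enum sketch_iomap_type type;
	unsigned int flags;
};

/*
 * Return true only when a FUA write by itself would satisfy O_DSYNC,
 * i.e. the write is pure data IO with no metadata update before or
 * after completion.
 */
bool sketch_can_use_fua(const struct sketch_iomap *iomap,
			bool dio_wants_fua, bool dev_supports_fua)
{
	/* New allocations take the zeroing path, never the FUA path. */
	if (iomap->flags & SKETCH_F_NEW)
		return false;

	/*
	 * The point of the patch: an unwritten extent is converted to
	 * written at IO completion, which dirties metadata, so only
	 * already-mapped extents qualify.
	 */
	if (iomap->type != SKETCH_MAPPED)
		return false;

	/* Metadata already dirty or pending: a full sync is needed anyway. */
	if (iomap->flags & (SKETCH_F_SHARED | SKETCH_F_DIRTY))
		return false;

	return dio_wants_fua && dev_supports_fua;
}
```

With this predicate, a clean mapped extent on a FUA-capable device returns
true, while the same request against an unwritten extent returns false and
so falls back to the generic_write_sync() path after completion.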