From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC9B8CA9EB9 for ; Tue, 29 Oct 2019 04:41:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9BD3420862 for ; Tue, 29 Oct 2019 04:41:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731729AbfJ2Eli (ORCPT ); Tue, 29 Oct 2019 00:41:38 -0400 Received: from mail104.syd.optusnet.com.au ([211.29.132.246]:57015 "EHLO mail104.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727294AbfJ2Eli (ORCPT ); Tue, 29 Oct 2019 00:41:38 -0400 Received: from dread.disaster.area (pa49-180-67-183.pa.nsw.optusnet.com.au [49.180.67.183]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id B96077E98C6; Tue, 29 Oct 2019 15:41:34 +1100 (AEDT) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1iPJK1-0000Jo-Cs; Tue, 29 Oct 2019 15:41:33 +1100 Date: Tue, 29 Oct 2019 15:41:33 +1100 From: Dave Chinner To: "Darrick J. Wong" Cc: linux-xfs@vger.kernel.org Subject: Re: [PATCH] xfs: properly serialise fallocate against AIO+DIO Message-ID: <20191029044133.GN4614@dread.disaster.area> References: <20191029034850.8212-1-david@fromorbit.com> <20191029041908.GB15222@magnolia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191029041908.GB15222@magnolia> User-Agent: Mutt/1.10.1 (2018-07-13) X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=3wLbm4YUAFX2xaPZIabsgw==:117 a=3wLbm4YUAFX2xaPZIabsgw==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=kj9zAlcOel0A:10 a=XobE76Q3jBoA:10 a=7-415B0cAAAA:8 a=8VJdBDI6uDDErkn2tlAA:9 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Mon, Oct 28, 2019 at 09:19:08PM -0700, Darrick J. Wong wrote: > On Tue, Oct 29, 2019 at 02:48:50PM +1100, Dave Chinner wrote: > > --- a/fs/xfs/xfs_bmap_util.c > > +++ b/fs/xfs/xfs_bmap_util.c > > @@ -1040,6 +1040,7 @@ xfs_unmap_extent( > > goto out_unlock; > > } > > > > +/* Caller must first wait for the completion of any pending DIOs if required. */ > > int > > xfs_flush_unmap_range( > > struct xfs_inode *ip, > > @@ -1051,9 +1052,6 @@ xfs_flush_unmap_range( > > xfs_off_t rounding, start, end; > > int error; > > > > - /* wait for the completion of any pending DIOs */ > > - inode_dio_wait(inode); > > Does xfs_reflink_remap_prep still need this function to call inode_dio_wait > before zapping the page cache prior to reflinking into an existing file? No, because that is done in generic_remap_file_range_prep() after we have locked the inodes and broken leases in xfs_reflink_remap_prep(). > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > index 525b29b99116..865543e41fb4 100644 > > --- a/fs/xfs/xfs_file.c > > +++ b/fs/xfs/xfs_file.c > > @@ -817,6 +817,36 @@ xfs_file_fallocate( > > if (error) > > goto out_unlock; > > > > + /* > > + * Must wait for all AIO to complete before we continue as AIO can > > + * change the file size on completion without holding any locks we > > + * currently hold. We must do this first because AIO can update both > > + * the on disk and in memory inode sizes, and the operations that follow > > + * require the in-memory size to be fully up-to-date. > > + */ > > + inode_dio_wait(inode); > > + > > + /* > > + * Now AIO and DIO has drained we flush and (if necessary) invalidate > > + * the cached range over the first operation we are about to run. > > + * > > + * We care about zero and collapse here because they both run a hole > > + * punch over the range first. Because that can zero data, and the range > > + * of invalidation for the shift operations is much larger, we still do > > + * the required flush for collapse in xfs_prepare_shift(). > > + * > > + * Insert has the same range requirements as collapse, and we extend the > > + * file first which can zero data. Hence insert has the same > > + * flush/invalidate requirements as collapse and so they are both > > + * handled at the right time by xfs_prepare_shift(). > > + */ > > + if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE | > > + FALLOC_FL_COLLAPSE_RANGE)) { > > Er... "Insert has the same requirements as collapse", but we don't test > for that here? Also ... xfs_prepare_shift handles flushing for both > collapse and insert range, but we still have to flush here for collapse? > > only do insert range after updating the isize?> Yes, exactly. The flush for collapse here is for the hole punch part of collapse, before we start shifting extents. insert does not hole punch, so it doesn't need flushing here but it still needs flush/inval before shifting. i.e.: collapse insert flush_unmap(off, len) punch hole(off, len) extends EOF writes zeros around (off,len) writes zeros around EOF collapse(off, len) insert(off, len) flush_unmap(off, EOF) flush_unmap(off, EOF) shift extents down shift extents up So once we start the actual extent shift operation (up or down) the flush/unmap requirements are identical. > I think the third paragraph of the comment is just confusing me more. > Does the following describe what's going on? > > "Insert range has the same range [should this be "page cache flushing"?] > requirements as collapse. Because we can zero data as part of extending > the file size, we skip the flush here and let the flush in > xfs_prepare_shift take care of invalidating the page cache." ? It's a bit better - that's kinda what I was trying to describe - but I'll try to reword it more clearly after I've let it settle in my head for a little while.... Cheers, Dave. -- Dave Chinner david@fromorbit.com