Date: Tue, 15 Jul 2014 18:47:52 +0800
From: Liu Bo
To: Miao Xie
Cc: linux-btrfs
Subject: Re: [PATCH] Btrfs: fix abnormal long waiting in fsync
Message-ID: <20140715104751.GB26977@localhost.localdomain>
Reply-To: bo.li.liu@oracle.com
References: <1405416674-17208-1-git-send-email-bo.li.liu@oracle.com>
 <53C500D2.3060801@cn.fujitsu.com>
In-Reply-To: <53C500D2.3060801@cn.fujitsu.com>

On Tue, Jul 15, 2014 at 06:22:10PM +0800, Miao Xie wrote:
> On Tue, 15 Jul 2014 17:31:14 +0800, Liu Bo wrote:
> > xfstests generic/127 detected this problem.
> >
> > Since commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, fsync only flushes
> > data within the passed range. This is the cause of the problem: btrfs's
> > fsync has a stage called 'sync log' which waits for all the ordered
> > extents it has recorded to finish.
> >
> > In xfstests/generic/127, with mixed operations such as truncate, fallocate,
> > punch hole and mapwrite, we get some preallocated extents, and mapwrite
> > does an mmap followed by an msync. I found that msync waits for quite a
> > long time (about 20s in my case). With ftrace, it turns out that the
> > preceding fallocate calls 'btrfs_wait_ordered_range()' to flush dirty
> > pages, but as the range of dirty pages may be larger than the range
> > 'btrfs_wait_ordered_range()' is asked to flush,
>
> btrfs_sync_file also calls 'btrfs_wait_ordered_range()' and introduces the
> same problem.

Yeah, it looks like it will.
>
> > there can be some ordered extents created whose corresponding pages are
> > not flushed. They are left in memory until an fsync reaches the 'sync
> > log' stage, where fsync simply waits for the system writeback thread to
> > flush those pages and finish the ordered extents, so the latency is
> > inevitable.
> >
> > This adds a non-blocking flush, filemap_flush(), in btrfs_sync_file() to
> > fix that.
>
> I think this fix is not so good, because it will also flush pages that are
> not related to the current sync. I think the key reason is
> btrfs_wait_logged_extents(), which just waits for the ordered extents but
> does not flush the related dirty pages.
>
> So a more reasonable fix is to use btrfs_start_ordered_extent() instead of
> wait_event in btrfs_wait_logged_extents().

It should work; the only difference is that here we wait for
BTRFS_ORDERED_IO_DONE instead of BTRFS_ORDERED_COMPLETE.

Will give it a shot.

thanks,
-liubo

>
> (The above is just my analysis.)
>
> Thanks
> Miao
>
> >
> > Signed-off-by: Liu Bo
> > ---
> >  fs/btrfs/file.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 1f2b99c..1af395d 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -2002,6 +2002,8 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >  
> >  	if (ret != BTRFS_NO_LOG_SYNC) {
> >  		if (!ret) {
> > +			filemap_flush(inode->i_mapping);
> > +
> >  			ret = btrfs_sync_log(trans, root, &ctx);
> >  			if (!ret) {
> >  				ret = btrfs_end_transaction(trans, root);
> >
>