From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from userp1040.oracle.com ([156.151.31.81]:43757 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751505AbaGQDAz (ORCPT ); Wed, 16 Jul 2014 23:00:55 -0400 Date: Thu, 17 Jul 2014 11:00:47 +0800 From: Liu Bo To: Miao Xie Cc: linux-btrfs Subject: Re: [PATCH v2] Btrfs: fix abnormal long waiting in fsync Message-ID: <20140717030046.GB4271@localhost.localdomain> Reply-To: bo.li.liu@oracle.com References: <1405416674-17208-1-git-send-email-bo.li.liu@oracle.com> <1405496225-22416-1-git-send-email-bo.li.liu@oracle.com> <53C64787.70700@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <53C64787.70700@cn.fujitsu.com> Sender: linux-btrfs-owner@vger.kernel.org List-ID: On Wed, Jul 16, 2014 at 05:36:07PM +0800, Miao Xie wrote: > On Wed, 16 Jul 2014 15:37:05 +0800, Liu Bo wrote: > > xfstests generic/127 detected this problem. > > > > With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, now fsync will only flush > > data within the passed range. This is the cause of the above problem, > > -- btrfs's fsync has a stage called 'sync log' which will wait for all the > > ordered extents it've recorded to finish. > > > > In xfstests/generic/127, with mixed operations such as truncate, fallocate, > > punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will > > mmap, and then msync. And I find that msync will wait for quite a long time > > (about 20s in my case), thanks to ftrace, it turns out that the previous > > fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the > > range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants, > > there can be some ordered extents created but not getting corresponding pages > > flushed, then they're left in memory until we fsync which runs into the > > stage 'sync log', and fsync will just wait for the system writeback thread > > to flush those pages and get ordered extents finished, so the latency is > > inevitable. > > > > This adds a flush similar to btrfs_start_ordered_extent() in > > btrfs_wait_logged_extents() to fix that. > > > > Signed-off-by: Liu Bo > > --- > > v2: > > Move flush part into btrfs_wait_logged_extents() to get the flush range > > more precise. > > > > fs/btrfs/ordered-data.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c > > index e12441c..3b52a76 100644 > > --- a/fs/btrfs/ordered-data.c > > +++ b/fs/btrfs/ordered-data.c > > @@ -484,8 +484,16 @@ void btrfs_wait_logged_extents(struct btrfs_root *log, u64 transid) > > log_list); > > list_del_init(&ordered->log_list); > > spin_unlock_irq(&log->log_extents_lock[index]); > > + > > + WARN_ON(!ordered->inode); > > + if (!test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) > > + filemap_fdatawrite_range(ordered->inode->i_mapping, > > + ordered->file_offset, > > + ordered->file_offset + ordered->len - 1); > > I can use bytes_left to filter the ordered extents that have been written out. I prefer to use BTRFS_ORDERED_IO_DONE, but no big difference with bytes_left :) > > The other is OK. > > Reviewed-by: Miao Xie thanks, -liubo > > > + > > wait_event(ordered->wait, test_bit(BTRFS_ORDERED_IO_DONE, > > &ordered->flags)); > > + > > btrfs_put_ordered_extent(ordered); > > spin_lock_irq(&log->log_extents_lock[index]); > > } > > >