From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx1.redhat.com ([209.132.183.28]:35616 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750916AbeAKHys (ORCPT ); Thu, 11 Jan 2018 02:54:48 -0500
Date: Thu, 11 Jan 2018 15:54:41 +0800
From: Eryu Guan
Subject: Re: [PATCH] xfs: recheck reflink / dirty page status before freeing CoW reservations
Message-ID: <20180111075441.GH5123@eguan.usersys.redhat.com>
References: <20180110220336.GU5602@magnolia>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180110220336.GU5602@magnolia>
Sender: linux-xfs-owner@vger.kernel.org
List-ID:
List-Id: xfs
To: "Darrick J. Wong"
Cc: xfs

On Wed, Jan 10, 2018 at 02:03:36PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong
>
> Eryu Guan reported seeing occasional hangs when running generic/269 with
> a new fsstress that supports clonerange/deduperange. The cause of this
> hang is an infinite loop when we convert the CoW fork extents from
> unwritten to real just prior to writing the pages out; the infinite
> loop happens because there's nothing in the CoW fork to convert, and so
> it spins forever.
>
> The underlying issue here is that when we go to perform these CoW fork
> conversions, we're supposed to have an extent waiting for us, but the
> low space CoW reaper has snuck in and blown them away! There are four
> conditions that can dissuade the reaper from touching our file -- no
> reflink iflag; dirty page cache; writeback in progress; or directio in
> progress. We check the four conditions prior to taking the locks, but
> we neglect to recheck them once we have the locks, which is how we end
> up whacking the writeback that's in progress.
>
> Therefore, refactor the four checks into a helper function and call it
> once again once we have the locks to make sure we really want to reap
> the inode. While we're at it, add an ASSERT for this weird condition so
> that we'll fail noisily if we ever screw this up again.
>
> Reported-by: Eryu Guan
> Signed-off-by: Darrick J. Wong

I applied this patch on top of the v4.15-rc5 kernel and ran generic/083,
generic/269 and generic/270 (where I hit the soft lockup and hang
before) multiple times, and all tests passed. I also ran all tests in
the 'enospc' group on 1k/2k/4k XFS with reflink enabled, and those
passed too. So

Tested-by: Eryu Guan

Thanks!
Eryu
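
For anyone following the thread, the check / lock / recheck pattern
described in the quoted patch can be sketched as a small userspace C
model. This is only an illustration under assumptions: the toy_* names,
the struct layout, the boolean "lock" and the race hook are all
hypothetical and are not the XFS code or the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; NOT the real struct xfs_inode. */
struct toy_inode {
	bool reflink_flag;      /* reflink iflag is set */
	bool dirty_pages;       /* dirty page cache */
	bool writeback_active;  /* writeback in progress */
	bool dio_active;        /* directio in progress */
	bool locked;            /* models "we hold the locks" */
	int  cow_extents;       /* extents sitting in the CoW fork */
};

/* The four checks from the patch description, gathered in one helper. */
static bool toy_can_reap(const struct toy_inode *ip)
{
	if (!ip->reflink_flag)
		return false;
	if (ip->dirty_pages || ip->writeback_active || ip->dio_active)
		return false;
	return true;
}

/*
 * Hypothetical hook that runs in the window between the unlocked check
 * and lock acquisition, so a caller can simulate a racing writeback.
 */
typedef void (*toy_race_fn)(struct toy_inode *ip);

/* Returns true if the CoW reservations were freed. */
static bool toy_reap_cow(struct toy_inode *ip, toy_race_fn race)
{
	bool ok;

	/* Cheap unlocked check: skip inodes that are obviously busy. */
	if (!toy_can_reap(ip))
		return false;

	if (race)
		race(ip);       /* state may change before we get the lock */

	ip->locked = true;      /* "take the locks" */

	/* Recheck under the lock; the earlier answer may be stale. */
	ok = toy_can_reap(ip);
	if (ok)
		ip->cow_extents = 0;

	ip->locked = false;     /* "drop the locks" */
	return ok;
}

/* Writeback side: fail noisily if the expected extent has vanished. */
static void toy_convert_cow(const struct toy_inode *ip)
{
	assert(ip->cow_extents > 0);
}

/* Simulated racer: writeback starts right after the unlocked check. */
static void toy_start_writeback(struct toy_inode *ip)
{
	ip->writeback_active = true;
}
```

Calling toy_reap_cow() with toy_start_writeback as the hook leaves the
CoW extents in place, because the second toy_can_reap() call sees the
writeback. Without that recheck the extents would be freed and
toy_convert_cow() would trip its assert, which is the toy analogue of
failing noisily instead of spinning forever.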