From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from aserp1040.oracle.com ([141.146.126.69]:49916 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755631Ab3JADhd (ORCPT );
	Mon, 30 Sep 2013 23:37:33 -0400
Date: Tue, 1 Oct 2013 11:37:22 +0800
From: Liu Bo
To: Josef Bacik
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix crash of compressed writes
Message-ID: <20131001033721.GA32575@localhost.localdomain>
Reply-To: bo.li.liu@oracle.com
References: <1380544797-29798-1-git-send-email-bo.li.liu@oracle.com>
	<20130930170249.GO18681@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20130930170249.GO18681@localhost.localdomain>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Mon, Sep 30, 2013 at 01:02:49PM -0400, Josef Bacik wrote:
> On Mon, Sep 30, 2013 at 08:39:57PM +0800, Liu Bo wrote:
> > The crash[1] is found by xfstests/generic/208 with "-o compress";
> > it's not reproduced every time, but it does panic.
> >
> > The bug is quite interesting; it was actually introduced by a recent
> > commit (573aecafca1cf7a974231b759197a1aebcf39c2a,
> > Btrfs: actually limit the size of delalloc range).
> >
> > Btrfs implements delayed allocation, so during writeback, we
> > (1) get a page A and lock it
> > (2) search the state tree for delalloc bytes and lock all pages
> >     within the range
> > (3) process the delalloc range, including finding disk space,
> >     creating an ordered extent, and so on
> > (4) submit page A
> >
> > This runs well in normal cases, but in a racy case, e.g. buffered
> > compressed writes racing with aio-dio writes, we may sometimes fail
> > to lock all pages in the 'delalloc' range, in which case we need to
> > fall back and search the state tree again with a smaller range limit
> > (max_bytes = PAGE_CACHE_SIZE - offset).
> >
> > The mentioned commit has a side effect: in the fallback case, we can
> > find delalloc bytes before the index of the page we already have
> > locked, so we end up in the (delalloc_end <= *start) case and return
> > with (found > 0).
> >
> > This ends up not locking the delalloc pages while ->writepage still
> > processes them, and the crash happens.
> >
> > This fixes it by not enforcing the 'max_bytes' limit in the special
> > fallback case.
> >
>
> Great analysis, thank you for that; however, I don't like the fix ;).
> Instead we need to change the
>
> if (!found || delalloc_end <= *start)
>
> to always return 0, since we should not call fill_delalloc if the
> delalloc range doesn't start at our current offset.  Secondly the

Setting 'found = 0' was also the first idea that came to my mind :),
but I'm not sure it's right.

I've checked it out: commit 70b99e6959a4c28ae1b314985eca731f3db72f1d
("Btrfs: Compression corner fixes", Signed-off-by: Chris Mason)
introduced the (delalloc_end <= *start) check.

The commit log says, "Make sure we lock pages in the correct order
during delalloc.  The search into the state tree for delalloc bytes can
return bytes before the page we already have locked."

But I think if we find bytes before the page we locked, then we should
get (found = 0).  So I don't know why we need the (delalloc_end <=
*start) check; maybe some corner case needs it?  Chris, can you explain
a bit?

>
> > max_bytes = PAGE_CACHE_SIZE - offset;
>
> is completely crap since we are talking about the entire page/sector.
> We should simply set max_bytes to sectorsize or PAGE_CACHE_SIZE; that
> way we avoid this issue.  Thanks,

This might be due to historical reasons, but for now it doesn't cause
problems, as the passed @start is page aligned and @offset is always 0.

thanks,
-liubo