From mboxrd@z Thu Jan  1 00:00:00 1970
Date:   Thu, 19 May 2022 14:38:50 +0100
From:   Filipe Manana <fdmanana@kernel.org>
To:     Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc:     Naohiro Aota <Naohiro.Aota@wdc.com>,
        "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
        David Sterba <dsterba@suse.com>,
        Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
        Filipe Manana <fdmanana@suse.com>,
        Josef Bacik <josef@toxicpanda.com>
Subject: Re: [PATCH] btrfs: ensure pages are unlocked on cow_file_range()
 failure
Message-ID: <20220519133850.GA2735952@falcondesktop>
References: <20211213034338.949507-1-naohiro.aota@wdc.com>
 <PH0PR04MB741660777362929B7E3D11DB9BD09@PH0PR04MB7416.namprd04.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <PH0PR04MB741660777362929B7E3D11DB9BD09@PH0PR04MB7416.namprd04.prod.outlook.com>
Precedence: bulk
List-ID: <linux-btrfs.vger.kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Thu, May 19, 2022 at 12:24:00PM +0000, Johannes Thumshirn wrote:
> What's the status of this patch? It fixes actual errors 
> (hung_tasks) for me.

Well, there was a previous review of it, and nothing has been addressed in
the meantime.

It may happen to fix the hang for some fstests test case on zoned mode.
But there's a fundamental problem with the error handling, as Josef pointed
out before - this is an existing problem, and not a problem with this patch
or exclusive to zoned mode.

The problem is that if we fail on the second iteration of the while loop,
when reserving an extent for example, we leave the loop with an ordered
extent created in the previous iteration, and return an error to the caller.
After that we never submit a bio for that ordered extent's range, which
means we end up with an ordered extent that will never be completed.
So something like an fsync after that will hang forever, as will anything
else calling btrfs_wait_ordered_range().

So on error we need to go through the previously created ordered extents,
set the IOERR flag on them, and complete them to wake up any waiters and
remove them, which also takes care of adding the reserved extents back to
the free space cache/tree.
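
To make the above concrete, here is a small userspace model of the loop
and of the cleanup I'm describing. It is only a sketch: every name in it
(inode_model, cow_range_model(), fail_created_extents(), the OE_* flags)
is made up for illustration and is not the kernel code, and "waking
waiters" is reduced to the extent leaving the pending list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
#define MAX_EXTENTS 8

enum oe_flag { OE_IOERR = 1u << 0, OE_COMPLETE = 1u << 1 };

struct ordered_extent {
	unsigned long start;
	unsigned long len;
	unsigned int flags;
};

struct inode_model {
	struct ordered_extent pending[MAX_EXTENTS]; /* created, not completed */
	size_t nr_pending;
	unsigned long free_space; /* stands in for the free space cache/tree */
};

/* Reserve len bytes; returns -1 (think -ENOSPC) when space runs out. */
static int reserve_extent(struct inode_model *ino, unsigned long len)
{
	if (ino->free_space < len)
		return -1;
	ino->free_space -= len;
	return 0;
}

/*
 * Error path: for every ordered extent created by this call, set the
 * error flag, complete it (a waiter would wake up here), remove it from
 * the pending list and give its reservation back.
 */
static void fail_created_extents(struct inode_model *ino, size_t first)
{
	assert(first <= ino->nr_pending);
	for (size_t i = first; i < ino->nr_pending; i++) {
		struct ordered_extent *oe = &ino->pending[i];

		oe->flags |= OE_IOERR | OE_COMPLETE;
		ino->free_space += oe->len;
	}
	ino->nr_pending = first;
}

/* One ordered extent per chunk of [start, end), like the while loop. */
static int cow_range_model(struct inode_model *ino, unsigned long start,
			   unsigned long end, unsigned long chunk,
			   bool cleanup_on_error)
{
	size_t first = ino->nr_pending;

	while (start < end) {
		unsigned long len = end - start < chunk ? end - start : chunk;

		if (ino->nr_pending == MAX_EXTENTS ||
		    reserve_extent(ino, len) < 0) {
			if (cleanup_on_error)
				fail_created_extents(ino, first);
			return -1;
		}
		ino->pending[ino->nr_pending++] = (struct ordered_extent){
			.start = start, .len = len, .flags = 0,
		};
		start += len;
	}
	return 0;
}

/* A wait on the range can only return once nothing is left pending. */
static bool wait_ordered_would_return(const struct inode_model *ino)
{
	return ino->nr_pending == 0;
}
```

With 6 units of free space and a request for [0, 8) in chunks of 4, the
second reservation fails. Without the cleanup the first ordered extent
stays pending and a waiter never returns; with it, the extent is flagged
and completed and the reservation goes back to free space.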


> 
> On 13/12/2021 04:43, Naohiro Aota wrote:
> > There is a hung_task report regarding page lock on zoned btrfs like below.
> > 
> > https://github.com/naota/linux/issues/59
> > 
> > [  726.328648] INFO: task rocksdb:high0:11085 blocked for more than 241 seconds.
> > [  726.329839]       Not tainted 5.16.0-rc1+ #1
> > [  726.330484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  726.331603] task:rocksdb:high0   state:D stack:    0 pid:11085 ppid: 11082 flags:0x00000000
> > [  726.331608] Call Trace:
> > [  726.331611]  <TASK>
> > [  726.331614]  __schedule+0x2e5/0x9d0
> > [  726.331622]  schedule+0x58/0xd0
> > [  726.331626]  io_schedule+0x3f/0x70
> > [  726.331629]  __folio_lock+0x125/0x200
> > [  726.331634]  ? find_get_entries+0x1bc/0x240
> > [  726.331638]  ? filemap_invalidate_unlock_two+0x40/0x40
> > [  726.331642]  truncate_inode_pages_range+0x5b2/0x770
> > [  726.331649]  truncate_inode_pages_final+0x44/0x50
> > [  726.331653]  btrfs_evict_inode+0x67/0x480
> > [  726.331658]  evict+0xd0/0x180
> > [  726.331661]  iput+0x13f/0x200
> > [  726.331664]  do_unlinkat+0x1c0/0x2b0
> > [  726.331668]  __x64_sys_unlink+0x23/0x30
> > [  726.331670]  do_syscall_64+0x3b/0xc0
> > [  726.331674]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  726.331677] RIP: 0033:0x7fb9490a171b
> > [  726.331681] RSP: 002b:00007fb943ffac68 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > [  726.331684] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb9490a171b
> > [  726.331686] RDX: 00007fb943ffb040 RSI: 000055a6bbe6ec20 RDI: 00007fb94400d300
> > [  726.331687] RBP: 00007fb943ffad00 R08: 0000000000000000 R09: 0000000000000000
> > [  726.331688] R10: 0000000000000031 R11: 0000000000000246 R12: 00007fb943ffb000
> > [  726.331690] R13: 00007fb943ffb040 R14: 0000000000000000 R15: 00007fb943ffd260
> > [  726.331693]  </TASK>
> > 
> > While debugging the issue, we found that running fstests generic/551
> > on a 5GB non-zoned null_blk device in emulated zoned mode also hit a
> > similar hang.
> > 
> > The hang occurs when cow_file_range() fails in the middle of
> > allocation. cow_file_range() called from do_allocation_zoned() can
> > split the given region ([start, end]) for allocation depending on
> > current block group usage. When btrfs can allocate bytes for one part
> > of the split regions but fails for the other region (e.g. because of
> > -ENOSPC), we return the error leaving the pages in the succeeded regions
> > locked. Technically, this occurs only when @unlock == 0. Otherwise, we
> > unlock the pages in an allocated region after creating an ordered
> > extent.
> > 
> > Theoretically, the same issue can happen on
> > submit_uncompressed_range(). However, I could not make it happen even
> > if I modified the code to go always through
> > submit_uncompressed_range().
> > 
> > Considering the callers of cow_file_range(unlock=0) won't write out
> > the pages, we can unlock the pages on error exit from
> > cow_file_range(). So, we can ensure all the pages except @locked_page
> > are unlocked in the error case.
> > 
> > In summary, cow_file_range now behaves like this:
> > 
> > - page_started == 1 (return value)
> >   - All the pages are unlocked. IO is started.
> > - unlock == 1
> >   - All the pages except @locked_page are unlocked in any case
> > - unlock == 0
> >   - On success, all the pages are locked for writing them out
> >   - On failure, all the pages except @locked_page are unlocked
> >