[PATCH v2 0/4] btrfs: fix error handling of cow_file_range(unlock = 0)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org
Cc: fdmanana@kernel.org, Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH v2 0/4] btrfs: fix error handling of cow_file_range(unlock = 0)
Date: Tue, 21 Jun 2022 15:40:58 +0900	[thread overview]
Message-ID: <cover.1655791781.git.naohiro.aota@wdc.com> (raw)

This series is a revisit of patch "btrfs: ensure pages are unlocked on
cow_file_range() failure" [1].

[1] https://lore.kernel.org/linux-btrfs/20211213034338.949507-1-naohiro.aota@wdc.com/

Changes
- v2:
  - Rephrase comment in patch 1 (Filipe)
  - Add stable target to patch 3
    - I choose 5.18+ as the target as they are after refactoring and we can
      apply the series cleanly.
  - Rephrase subject of patch 4 (Filipe)

We have a hang report like below which is caused by a locked page.

[  726.328648] INFO: task rocksdb:high0:11085 blocked for more than 241 seconds.
[  726.329839]       Not tainted 5.16.0-rc1+ #1
[  726.330484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  726.331603] task:rocksdb:high0   state:D stack:    0 pid:11085 ppid: 11082 flags:0x00000000
[  726.331608] Call Trace:
[  726.331611]  <TASK>
[  726.331614]  __schedule+0x2e5/0x9d0
[  726.331622]  schedule+0x58/0xd0
[  726.331626]  io_schedule+0x3f/0x70
[  726.331629]  __folio_lock+0x125/0x200
[  726.331634]  ? find_get_entries+0x1bc/0x240
[  726.331638]  ? filemap_invalidate_unlock_two+0x40/0x40
[  726.331642]  truncate_inode_pages_range+0x5b2/0x770
[  726.331649]  truncate_inode_pages_final+0x44/0x50
[  726.331653]  btrfs_evict_inode+0x67/0x480
[  726.331658]  evict+0xd0/0x180
[  726.331661]  iput+0x13f/0x200
[  726.331664]  do_unlinkat+0x1c0/0x2b0
[  726.331668]  __x64_sys_unlink+0x23/0x30
[  726.331670]  do_syscall_64+0x3b/0xc0
[  726.331674]  entry_SYSCALL_64_after_hwframe+0x44/0xae

When cow_file_range(unlock = 0), we never unlock a page after an ordered
extent is allocated for the page. When the allocation loop fails after some
ordered extents are allocated, we have three things to do: (1) unlock the
pages in the successfully allocated ordered extents, (2) clean-up the
allocated ordered extents, and (3) propagate the error to the userland.

However, current code fails to do (1) in the all cow_file_range(unlock = 0)
case and cause the above hang. Also, it fails to do (2) and (3) in
submit_uncompressed_range() case.

This series addresses these three issues on error handling of
cow_file_range(unlock = 0) case.

To test the series, I applied the following two patches to stress
submit_uncompressed_range() and to inject an error.

The following first diff forces the compress_type to be
BTRFS_COMPRESS_NONE, so that all "compress" writes go through
submit_uncompressed_range path.

    diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
    index 7a54f964ff37..eb12c47d02b8 100644
    --- a/fs/btrfs/inode.c
    +++ b/fs/btrfs/inode.c
    @@ -706,6 +706,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
     			compress_type = BTRFS_I(inode)->defrag_compress;
     		else if (BTRFS_I(inode)->prop_compress)
     			compress_type = BTRFS_I(inode)->prop_compress;
    +		compress_type = BTRFS_COMPRESS_NONE;
     
     		/*
     		 * we need to call clear_page_dirty_for_io on each

The following second diff limits the allocation size at most 256KB to run
the loop many times. Also, it fails the allocation at the specific offset
(fail_offset).

    diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
    index eb12c47d02b8..1247690e7021 100644
    --- a/fs/btrfs/inode.c
    +++ b/fs/btrfs/inode.c
    @@ -1155,6 +1155,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
     	bool extent_reserved = false;
     	int ret = 0;
     
    +	u64 fail_offset = SZ_1M + (u64)SZ_256K * 0;
    +
     	if (btrfs_is_free_space_inode(inode)) {
     		ret = -EINVAL;
     		goto out_unlock;
    @@ -1239,9 +1241,13 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
     
     	while (num_bytes > 0) {
     		cur_alloc_size = num_bytes;
    -		ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
    -					   min_alloc_size, 0, alloc_hint,
    -					   &ins, 1, 1);
    +		cur_alloc_size = min_t(u64, SZ_256K, num_bytes);
    +		if (start != fail_offset)
    +			ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
    +						   min_alloc_size, 0, alloc_hint,
    +						   &ins, 1, 1);
    +		else
    +			ret = -ENOSPC;
     		if (ret < 0)
     			goto out_unlock;
     		cur_alloc_size = ins.offset;

I ran the following script with these patches + the series applied changing
the fail_offset from "1MB + 256KB * 0" to "1MB + 256KB * 15" step by 256KB,
and confirmed the above three error handlings are properly done.

    run() {
    	local mkfs_opts=$1
    	local mount_opts=$2
    
    	for x in $(seq 100); do
    		echo $x
    		mkfs.btrfs -f -d single -m single ${mkfs_opts} /dev/nullb0
    		mount ${mount_opts} /dev/nullb0 /mnt/test
    		dd if=/dev/zero of=/mnt/test/file bs=1M count=4 seek=1 oflag=sync 2>&1 | tee /tmp/err
    		# check error propagation
    		grep -q 'No space left' /tmp/err || exit 1
    		sync
    		umount /mnt/test
    		dmesg | grep -q WARN && exit 1
    	done
    }
    
    run ""         ""
    run ""         "-o compress-force"
    run "-O zoned" ""
    run "-O zoned" "-o compress-force"

Patch 1 addresses the (1) by unlocking the pages in the error case. Also,
it adds a figure to clarify what to do for each range in the error case.

Patches 2 and 3 do (2) and (3) by fixing the error case of
submit_uncompressed_range().

Patch 4 is a refactoring patch to replace unnecessary "goto out"s with
direct return.

Naohiro Aota (4):
  btrfs: ensure pages are unlocked on cow_file_range() failure
  btrfs: extend btrfs_cleanup_ordered_extens for NULL locked_page
  btrfs: fix error handling of fallbacked uncompress write
  btrfs: replace unnecessary goto with direct return at cow_file_range()

 fs/btrfs/inode.c | 132 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 103 insertions(+), 29 deletions(-)

-- 
2.35.1

next             reply	other threads:[~2022-06-21  6:44 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-21  6:40 Naohiro Aota [this message]
2022-06-21  6:40 ` [PATCH v2 1/4] btrfs: ensure pages are unlocked on cow_file_range() failure Naohiro Aota
2022-06-21  6:41 ` [PATCH v2 2/4] btrfs: extend btrfs_cleanup_ordered_extens for NULL locked_page Naohiro Aota
2022-06-21  6:41 ` [PATCH v2 3/4] btrfs: fix error handling of fallbacked uncompress write Naohiro Aota
2022-06-21 15:04   ` Filipe Manana
2022-06-22  0:56     ` Naohiro Aota
2022-06-22  9:13       ` Filipe Manana
2022-06-21  6:41 ` [PATCH v2 4/4] btrfs: replace unnecessary goto with direct return at cow_file_range() Naohiro Aota
2022-06-22 13:58   ` David Sterba
2022-06-22  9:14 ` [PATCH v2 0/4] btrfs: fix error handling of cow_file_range(unlock = 0) Filipe Manana
2022-06-22 13:59 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1655791781.git.naohiro.aota@wdc.com \
    --to=naohiro.aota@wdc.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.