From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: dsterba@suse.com, linux-btrfs@vger.kernel.org, wqu@suse.com
Subject: Re: [PATCH v2] btrfs: Fix error handling in btrfs_cleanup_ordered_extents
Date: Tue, 20 Nov 2018 23:09:03 +0200 [thread overview]
Message-ID: <1af84f15-4ef4-0aaf-60a4-f4c3275e08ba@suse.com> (raw)
In-Reply-To: <20181120190050.u4pyaa6m5p5tck27@macbook-pro-91.dhcp.thefacebook.com>
On 20.11.18 г. 21:00 ч., Josef Bacik wrote:
> On Fri, Oct 26, 2018 at 02:41:55PM +0300, Nikolay Borisov wrote:
>> Running btrfs/124 in a loop hung up on me sporadically with the
>> following call trace:
>> btrfs D 0 5760 5324 0x00000000
>> Call Trace:
>> ? __schedule+0x243/0x800
>> schedule+0x33/0x90
>> btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
>> ? wait_woken+0xa0/0xa0
>> btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
>> btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
>> btrfs_relocate_chunk+0x49/0x100 [btrfs]
>> btrfs_balance+0xbeb/0x1740 [btrfs]
>> btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
>> btrfs_ioctl+0x1691/0x3110 [btrfs]
>> ? lockdep_hardirqs_on+0xed/0x180
>> ? __handle_mm_fault+0x8e7/0xfb0
>> ? _raw_spin_unlock+0x24/0x30
>> ? __handle_mm_fault+0x8e7/0xfb0
>> ? do_vfs_ioctl+0xa5/0x6e0
>> ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
>> do_vfs_ioctl+0xa5/0x6e0
>> ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
>> ksys_ioctl+0x3a/0x70
>> __x64_sys_ioctl+0x16/0x20
>> do_syscall_64+0x60/0x1b0
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Turns out during page writeback it's possible that the code in
>> writepage_delalloc can instantiate a delalloc range which doesn't
>> belong to the page currently being written back. This happens since
>> find_lock_delalloc_range returns up to BTRFS_MAX_EXTENT_SIZE delalloc
>> range when asked and doens't really consider the range of the passed
>> page. When such a foregin range is found the code proceeds to
>> run_delalloc_range and calls the appropriate function to fill the
>> delalloc and create ordered extents. If, however, a failure occurs
>> while this operation is in effect then the clean up code in
>> btrfs_cleanup_ordered_extents will be called. This function has the
>> wrong assumption that caller of run_delalloc_range always properly
>> cleans the first page of the range hence when it calls
>> __endio_write_update_ordered it explicitly ommits the first page of
>> the delalloc range. This is wrong because this function could be
>> cleaning a delalloc range that doesn't belong to the current page. This
>> in turn means that the page cleanup code in __extent_writepage will
>> not really free the initial page for the range, leaving a hanging
>> ordered extent with bytes_left set to 4k. This bug has been present
>> ever since the original introduction of the cleanup code in 524272607e88.
>>
>> Fix this by correctly checking whether the current page belongs to the
>> range being instantiated and if so correctly adjust the range parameters
>> passed for cleaning up. If it doesn't, then just clean the whole OE
>> range directly.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
>
> Can we just remove the endio cleanup in __extent_writepage() and make this do
> the proper cleanup? I'm not sure if that is feasible or not, but seems like it
> would make the cleanup stuff less confusing and more straightforward. If not
> you can add
Quickly skimming the code I think the cleanup in __extent_writepage
could be moved into __extent_writepage_io where we have 2 branches that
set PageError. So I guess it could be done, but I will have to
experiment with it.
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>
> Thanks,
>
> Josef
>
prev parent reply other threads:[~2018-11-20 21:09 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-26 11:41 [PATCH v2] btrfs: Fix error handling in btrfs_cleanup_ordered_extents Nikolay Borisov
2018-10-26 11:53 ` Qu Wenruo
2018-10-26 12:02 ` Nikolay Borisov
2018-10-29 5:53 ` Qu Wenruo
2018-10-29 7:51 ` Nikolay Borisov
2018-10-29 12:21 ` Nikolay Borisov
2018-10-29 14:39 ` Nikolay Borisov
2018-11-20 19:00 ` Josef Bacik
2018-11-20 21:09 ` Nikolay Borisov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1af84f15-4ef4-0aaf-60a4-f4c3275e08ba@suse.com \
--to=nborisov@suse.com \
--cc=dsterba@suse.com \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).