linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: dsterba@suse.com, linux-btrfs@vger.kernel.org, wqu@suse.com
Subject: Re: [PATCH v2] btrfs: Fix error handling in btrfs_cleanup_ordered_extents
Date: Tue, 20 Nov 2018 23:09:03 +0200	[thread overview]
Message-ID: <1af84f15-4ef4-0aaf-60a4-f4c3275e08ba@suse.com> (raw)
In-Reply-To: <20181120190050.u4pyaa6m5p5tck27@macbook-pro-91.dhcp.thefacebook.com>



On 20.11.18 г. 21:00 ч., Josef Bacik wrote:
> On Fri, Oct 26, 2018 at 02:41:55PM +0300, Nikolay Borisov wrote:
>> Running btrfs/124 in a loop hung up on me sporadically with the
>> following call trace:
>> 	btrfs           D    0  5760   5324 0x00000000
>> 	Call Trace:
>> 	 ? __schedule+0x243/0x800
>> 	 schedule+0x33/0x90
>> 	 btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
>> 	 ? wait_woken+0xa0/0xa0
>> 	 btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
>> 	 btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
>> 	 btrfs_relocate_chunk+0x49/0x100 [btrfs]
>> 	 btrfs_balance+0xbeb/0x1740 [btrfs]
>> 	 btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
>> 	 btrfs_ioctl+0x1691/0x3110 [btrfs]
>> 	 ? lockdep_hardirqs_on+0xed/0x180
>> 	 ? __handle_mm_fault+0x8e7/0xfb0
>> 	 ? _raw_spin_unlock+0x24/0x30
>> 	 ? __handle_mm_fault+0x8e7/0xfb0
>> 	 ? do_vfs_ioctl+0xa5/0x6e0
>> 	 ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
>> 	 do_vfs_ioctl+0xa5/0x6e0
>> 	 ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
>> 	 ksys_ioctl+0x3a/0x70
>> 	 __x64_sys_ioctl+0x16/0x20
>> 	 do_syscall_64+0x60/0x1b0
>> 	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Turns out during page writeback it's possible that the code in
>> writepage_delalloc can instantiate a delalloc range which doesn't
>> belong to the page currently being written back. This happens since
>> find_lock_delalloc_range returns up to BTRFS_MAX_EXTENT_SIZE delalloc
>> range when asked and doens't really consider the range of the passed
>> page. When such a foregin range is found the code proceeds to
>> run_delalloc_range and calls the appropriate function to fill the
>> delalloc and create ordered extents. If, however, a failure occurs
>> while this operation is in effect then the clean up code in
>> btrfs_cleanup_ordered_extents will be called. This function has the
>> wrong assumption that caller of run_delalloc_range always properly
>> cleans the first page of the range hence when it calls
>> __endio_write_update_ordered it explicitly ommits the first page of
>> the delalloc range. This is wrong because this function could be
>> cleaning a delalloc range that doesn't belong to the current page. This
>> in turn means that the page cleanup code in __extent_writepage will
>> not really free the initial page for the range, leaving a hanging
>> ordered extent with bytes_left set to 4k. This bug has been present
>> ever since the original introduction of the cleanup code in 524272607e88.
>>
>> Fix this by correctly checking whether the current page belongs to the
>> range being instantiated and if so correctly adjust the range parameters
>> passed for cleaning up. If it doesn't, then just clean the whole OE
>> range directly.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
> 
> Can we just remove the endio cleanup in __extent_writepage() and make this do
> the proper cleanup?  I'm not sure if that is feasible or not, but seems like it
> would make the cleanup stuff less confusing and more straightforward.  If not
> you can add

Quickly skimming the code I think the cleanup in __extent_writepage
could be moved into __extent_writepage_io where we have 2 branches that
set PageError. So I guess it could be done, but I will have to
experiment with it.

> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> 
> Thanks,
> 
> Josef
> 

      reply	other threads:[~2018-11-20 21:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-26 11:41 [PATCH v2] btrfs: Fix error handling in btrfs_cleanup_ordered_extents Nikolay Borisov
2018-10-26 11:53 ` Qu Wenruo
2018-10-26 12:02   ` Nikolay Borisov
2018-10-29  5:53     ` Qu Wenruo
2018-10-29  7:51       ` Nikolay Borisov
2018-10-29 12:21         ` Nikolay Borisov
2018-10-29 14:39           ` Nikolay Borisov
2018-11-20 19:00 ` Josef Bacik
2018-11-20 21:09   ` Nikolay Borisov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1af84f15-4ef4-0aaf-60a4-f4c3275e08ba@suse.com \
    --to=nborisov@suse.com \
    --cc=dsterba@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).