public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Hans van Kranenburg <hans@knorrie.org>, linux-btrfs@vger.kernel.org
Subject: Re: Balance + Ctrl-C = forced readonly
Date: Mon, 6 Jul 2020 14:15:43 +0800
Message-ID: <fc428d50-2853-1be7-4764-e643f59faca5@gmx.com>
In-Reply-To: <93419615-8f34-efc4-f50e-eac1151f0f37@knorrie.org>


On 2020/7/5 10:53 PM, Hans van Kranenburg wrote:
> On 7/5/20 3:13 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/7/5 8:49 PM, Hans van Kranenburg wrote:
>>> Hi,
>>>
>>> This is Linux kernel 5.7.6 (the Debian package, 5.7.6-1).
>>>
>>> So, I wanted to try out this new quicker balance interrupt thing, and
>>> the result was that I could crash the fs at my very first try using it,
>>> which was simply doing balance, and then pressing Ctrl-C.
>>>
>>> Recipe to reproduce: Start balance, wait a few seconds, then press
>>> Ctrl-C. For me here, ~ 5 out of 10 times, it ends up exploding:
>>>
>>> -# btrfs balance start --full /btrfs/
>>> ^C
>>>
>>> [41190.572977] BTRFS info (device xvdb): balance: start -d -m -s
>>> [41190.573035] BTRFS info (device xvdb): relocating block group 73001861120 flags metadata
>>> [41205.409600] BTRFS info (device xvdb): found 12236 extents, stage: move data extents
>>> [41205.509316] BTRFS info (device xvdb): relocating block group 71928119296 flags data
>>> [41205.695319] BTRFS info (device xvdb): found 3 extents, stage: move data extents
>>> [41205.723009] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
>>> [41205.750590] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
>>> [41208.183424] BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown
>>
>> -4 means -EINTR.
> 
> From extent-tree.c:
> 
>   5495         /*
>   5496          * So if we need to stop dropping the snapshot for whatever reason we
>   5497          * need to make sure to add it back to the dead root list so that we
>   5498          * keep trying to do the work later.  This also cleans up roots if we
>   5499          * don't have it in the radix (like when we recover after a power fail
>   5500          * or unmount) so we don't leak memory.
>   5501          */
>   5502         if (!for_reloc && !root_dropped)
>   5503                 btrfs_add_dead_root(root);
>   5504         if (err && err != -EAGAIN)
>   5505                 btrfs_handle_fs_error(fs_info, err, NULL);
>   5506         return err;
>   5507 }
> 
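To make the failing path explicit, here is a minimal sketch of the check on the quoted line 5504. The function name drop_snapshot_error_is_fatal() is hypothetical and exists only for illustration; the condition itself is copied from the snippet above. It shows that -EINTR is treated like any other hard error, while only -EAGAIN is exempt:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the condition at extent-tree.c:5504 quoted above.
 * Returns true when the error would be handed to btrfs_handle_fs_error(),
 * the call the log points at right before "forced readonly".
 */
bool drop_snapshot_error_is_fatal(int err)
{
	/* Same expression as the quoted kernel code: only -EAGAIN is tolerated. */
	return err && err != -EAGAIN;
}
```

With err == -EINTR the function returns true, which matches the "errno=-4" line followed by "forced readonly" in the log. This is only a model of the quoted condition, not a proposed fix.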
>> It means that during btrfs balance, a signal can interrupt code running
>> in kernel space??!!
> 
> What a wonderful world.
> 
> In the cases where the fs does not crash, it displays e.g.:
> 
> [ 1749.607057] BTRFS info (device xvdb): balance: start -d -m -s
> [ 1749.607154] BTRFS info (device xvdb): relocating block group 69780635648 flags data
> [ 1749.732598] BTRFS info (device xvdb): found 3 extents, stage: move data extents
> [ 1750.087368] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
> [ 1750.109675] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
> [ 1758.021840] BTRFS info (device xvdb): balance: ended with status: -4
> 
> ...and fairly quickly after Ctrl-C is pressed, it exits with status 130
> because of SIGINT (128+2).
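
For reference, the two numbers in this thread (-4 in the kernel log and 130 as the exit status) can be decoded with a small user-space sketch. The helper names describe_kernel_status() and shell_status_for_signal() are made up for this example, and the 128+N convention is the one used by bash and similar shells:

```c
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Kernel log lines print negated errno values, so "status: -4" is -EINTR. */
const char *describe_kernel_status(int status)
{
	return strerror(-status);
}

/* bash-style shells report a command killed by signal N as status 128 + N. */
int shell_status_for_signal(int signo)
{
	return 128 + signo;
}
```

On Linux, EINTR is 4 and SIGINT is 2, so describe_kernel_status(-4) gives the text for EINTR and shell_status_for_signal(SIGINT) gives 130.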

I can reproduce this now, with a fuller fs.

Although I haven't reproduced the transaction abort yet, this is already
a valid bug.

In this case, the next balance run can trigger a kernel warning because
the reloc tree has not been cleaned up yet.

This really exposes a new set of problems.

Thanks for the report, now it's time to debug it.

Thanks,
Qu

> 
> But when it goes wrong, then in between pressing Ctrl-C and the forced
> readonly happening, the balance in kernel continues for some time (this
> can even span several more block groups), until it hits the code path
> seen above (in btrfs_drop_snapshot), and it's *always* at that line.
> 
> So, it seems that depending on what part of the kernel code is running
> when the signal is sent, it's queued for being processed in that
> (different) part of the running code?
> 
>> I thought that once we enter the balance ioctl, we're unable to
>> receive/handle signals, as we are in kernel space, while signal
>> handling is all done in user space.
> 
> System calls can be interrupted from user space, e.g. a large read that
> goes too slow.
> 
> Previously, ^C during a btrfs balance run would exit once the block
> group in progress had finished. So, in that case the signal would also
> be picked up somewhere in the kernel.
> 
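The user-space side of this is easy to demonstrate. Below is a minimal sketch, assuming a POSIX system: a read() that blocks on an empty pipe is interrupted by SIGALRM and fails with EINTR, because the handler is installed without SA_RESTART. The names on_alarm() and errno_after_interrupted_read() are invented for this example, and nothing here is btrfs-specific:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int signo)
{
	(void)signo; /* The handler only has to exist so the signal is caught. */
}

/*
 * Block in read() on an empty pipe and let SIGALRM interrupt it.
 * Returns the errno left by read(), 0 if read() did not fail,
 * or -1 if the setup itself failed.
 */
int errno_after_interrupted_read(void)
{
	int fds[2];
	char byte;
	struct sigaction sa;
	struct itimerval timer;

	if (pipe(fds) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm; /* sa_flags stays 0: no SA_RESTART */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return -1;

	memset(&timer, 0, sizeof(timer));
	timer.it_value.tv_usec = 50000; /* fire once after 50 ms */
	if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
		return -1;

	errno = 0;
	ssize_t n = read(fds[0], &byte, 1); /* nothing is ever written */
	int saved = errno;

	close(fds[0]);
	close(fds[1]);
	return n == -1 ? saved : 0;
}
```

Calling errno_after_interrupted_read() from a small main() should yield EINTR, the same value (negated) that shows up as "status: -4" in the balance log.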
>> Or is there some config or out-of-tree patch that makes it possible?
>> Is this specific to Debian kernels?
>> At least I tried several times with an upstream kernel and was unable
>> to reproduce it yet (maybe my fs is too small?)
> 
> So, it at least seems to depend on the moment when Ctrl-C is pressed.
> 
> This is a two-disk fs, where I reflinked a single file many tens of
> thousands of times to generate quite some metadata. You might need
> some more data or metadata to have enough chance to hit Ctrl-C at
> the right time, but I can only make guesses about that now.
> 
> -# btrfs fi show /btrfs/
> Label: none  uuid: 4771ea11-6ec6-4c00-a5f5-58acb3233659
> 	Total devices 2 FS bytes used 5.76GiB
> 	devid    1 size 10.00GiB used 3.50GiB path /dev/xvdb
> 	devid    2 size 10.00GiB used 3.53GiB path /dev/xvdc
> 
> -# btrfs-search-metadata block_groups /btrfs
> block group vaddr 78370570240 length 1073741824 flags DATA used 1072177152 used_pct 100
> block group vaddr 79444312064 length 268435456 flags METADATA used 219824128 used_pct 82
> block group vaddr 79712747520 length 33554432 flags SYSTEM used 16384 used_pct 0
> block group vaddr 79746301952 length 1073741824 flags DATA used 1071206400 used_pct 100
> block group vaddr 80820043776 length 268435456 flags METADATA used 214712320 used_pct 80
> block group vaddr 81088479232 length 1073741824 flags DATA used 1073045504 used_pct 100
> block group vaddr 82162221056 length 268435456 flags METADATA used 262979584 used_pct 98
> block group vaddr 85920317440 length 1073741824 flags DATA used 1069948928 used_pct 100
> block group vaddr 86994059264 length 1073741824 flags DATA used 15978496 used_pct 1
> block group vaddr 90349502464 length 1073741824 flags DATA used 1073246208 used_pct 100
> block group vaddr 91423244288 length 268435456 flags METADATA used 109608960 used_pct 41
> 
>> If it's config related, then we must re-consider a lot of error handling.
> 
> I don't know, but I don't think so.
> 
>>
>> Thanks,
>> Qu
>>> [41208.183450] BTRFS info (device xvdb): forced readonly
>>> [41208.183469] BTRFS info (device xvdb): balance: ended with status: -4
>>>
>>> Boom, readonly FS.
>>>
>>> Hans
>>>
>>
> 
> Hans
> 


Thread overview: 5+ messages
2020-07-05 12:49 Balance + Ctrl-C = forced readonly Hans van Kranenburg
2020-07-05 13:13 ` Qu Wenruo
2020-07-05 14:53   ` Hans van Kranenburg
2020-07-06  6:15     ` Qu Wenruo [this message]
2020-07-06  7:48       ` Qu Wenruo
