From: Su Yue <suy.fnst@cn.fujitsu.com>
To: Marc MERLIN <marc@merlins.org>, Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: So, does btrfs check lowmem take days? weeks?
Date: Fri, 29 Jun 2018 14:02:19 +0800
Message-ID: <02ba7ad4-b618-85f0-a99f-c43b25d367de@cn.fujitsu.com>
In-Reply-To: <20180629052825.tifg2aw7oy3qyyvw@merlins.org>
On 06/29/2018 01:28 PM, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:
>>> lowmem repair seems to be going still, but it's been days and -p seems
>>> to do absolutely nothing.
>>
>> I'm afraid you hit a bug in lowmem repair code.
>> By all means, --repair shouldn't really be used unless you're pretty
>> sure the problem is something btrfs check can handle.
>>
>> That's also why --repair is still marked as dangerous.
>> Especially when it's combined with experimental lowmem mode.
>
> Understood, but btrfs got corrupted (by itself or not, I don't know)
> I cannot mount the filesystem read/write
> I cannot btrfs check --repair it since that code will kill my machine
> What do I have left?
>
>>> My filesystem is "only" 10TB or so, albeit with a lot of files.
>>
>> Unless you have tons of snapshots and reflinked (deduped) files, it
>> shouldn't take so long.
>
> I may have a fair amount.
> gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2
> enabling repair mode
> WARNING: low-memory mode repair support is only partial
> Checking filesystem on /dev/mapper/dshelf2
> UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Fixed 0 roots.
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
> Created new chunk [18457780224000 1073741824]
> Delete backref in extent [84302495744 69632]
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 3, have: 4
> Delete backref in extent [84302495744 69632]
> ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 181, have: 240
> Delete backref in extent [125712527360 12214272]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 302, have: 431
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 356, have: 433
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 161, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 162, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 170, have: 249
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 172, have: 251
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 348, have: 418
> Delete backref in extent [150850146304 17522688]
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 555, have: 1449
> Deleted root 2 item[156909494272, 178, 5476627808561673095]
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 556, have: 1452
> Deleted root 2 item[156909494272, 178, 7338474132555182983]
> ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
> Add one extent data backref [156909494272 55320576]
> ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
> Add one extent data backref [156909494272 55320576]
>
My bad.
It's very likely a bug in the extent check of lowmem mode, which
was reported by Chris too.
The extent check was wrong, so the repair did the wrong things.
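To see how far the counts in the quoted output diverge, the "referencer
count mismatch" lines can be grouped per extent. This is only a rough
sketch for reading the log: the regular expression is modelled on the
lines quoted above, and the names MISMATCH_RE and summarize_mismatches
are made up for illustration, not part of btrfs-progs.

```python
import re
from collections import defaultdict

# Pattern modelled on the quoted output, e.g.:
# ERROR: extent[84302495744, 69632] referencer count mismatch
#   (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
# (one line in the real log). Hypothetical helper, not a btrfs-progs API.
MISMATCH_RE = re.compile(
    r"extent\[(?P<bytenr>\d+), (?P<length>\d+)\] referencer count mismatch "
    r"\(root: (?P<root>\d+), owner: (?P<owner>\d+), offset: (?P<offset>\d+)\) "
    r"wanted: (?P<wanted>\d+), have: (?P<have>\d+)"
)


def summarize_mismatches(log_text):
    """Map each extent start to a list of (root, wanted, have) reports."""
    by_extent = defaultdict(list)
    for line in log_text.splitlines():
        # search() rather than match(), so "> " quote prefixes are tolerated
        m = MISMATCH_RE.search(line)
        if m:
            by_extent[int(m["bytenr"])].append(
                (int(m["root"]), int(m["wanted"]), int(m["have"]))
            )
    return dict(by_extent)


# A few lines copied from the quoted check output.
sample = """\
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 3, have: 4
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 555, have: 1449
"""

if __name__ == "__main__":
    for bytenr, reports in summarize_mismatches(sample).items():
        for root, wanted, have in reports:
            print(f"extent {bytenr} root {root}: "
                  f"wanted {wanted}, have {have}, diff {have - wanted}")
```

The same extent showing up under both roots (21872 and 22911) with
nearly identical counts fits a problem with shared blocks.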
I have figured out that the bug is that lowmem check can't deal with
shared tree blocks in the reloc tree. The fix is simple; you can try the
following repo:
https://github.com/Damenly/btrfs-progs/tree/tmp1
Please run lowmem check without "--repair" first to see whether
your filesystem is fine.
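If you script the run, something like the following keeps "--repair"
out of the command line. It is a minimal sketch: the helper names and
the btrfs_binary parameter are hypothetical, and pointing it at a binary
built from the tmp1 branch (for example "./btrfs" in the source tree) is
an assumption about where your build ends up. The options themselves are
the ones from your own command above, minus --repair and -p.

```python
import shlex
import subprocess


def lowmem_check_command(device, btrfs_binary="btrfs"):
    """Build a read-only lowmem check command; --repair is deliberately absent."""
    return [btrfs_binary, "check", "--mode=lowmem", device]


def run_lowmem_check(device, btrfs_binary="btrfs"):
    """Run the check and return (exit status, combined output).

    btrfs_binary is a hypothetical parameter so that a locally built
    binary can be used instead of the installed one.
    """
    cmd = lowmem_check_command(device, btrfs_binary)
    print("running:", shlex.join(cmd))
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    # Only prints the command; call run_lowmem_check() on an unmounted device.
    print(shlex.join(lowmem_check_command("/dev/mapper/dshelf2", "./btrfs")))
```

Keeping the output of such a run makes it easier to compare before and
after the fix.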
Though the bug and its symptoms are clear enough, before sending my patch
I have to make a test image. I have spent a week studying btrfs balance,
but it seems a little hard for me.
Thanks,
Su
> The last two ERROR lines took over a day to get generated, so I'm not sure if it's still working, but just slowly.
> For what it's worth non lowmem check used to take 12 to 24H on that filesystem back when it still worked.
>
>>> 2 things that come to mind
>>> 1) can lowmem have some progress working so that I know if I'm looking
>>> at days, weeks, or even months before it will be done?
>>
>> It's hard to estimate, especially when every cross check involves a lot
>> of disk IO.
>> But at least, we could add such indicator to show we're doing something.
>
> Yes, anything to show that I should still wait is still good :)
>
>>> 2) non lowmem is more efficient obviously when it doesn't completely
>>> crash your machine, but could lowmem be given an amount of memory to use
>>> for caching, or maybe use some heuristics based on RAM free so that it's
>>> not so excruciatingly slow?
>>
>> IIRC recent commit has added the ability.
>> a5ce5d219822 ("btrfs-progs: extent-cache: actually cache extent buffers")
>
> Oh, good.
>
>> That's already included in btrfs-progs v4.13.2.
>> So it should be a dead loop which lowmem repair code can't handle.
>
> I see. Is there any reasonably easy way to check on this running process?
>
> Both top and iotop show that it's working, but of course I can't tell if
> it's looping, or not.
>
> Then again, maybe it already fixed enough that I can mount my filesystem again.
>
> But back to the main point, it's sad that after so many years, the
> repair situation is still so suboptimal, especially when it's apparently
> pretty easy for btrfs to get damaged (through its own fault or not, hard
> to say).
>
> Thanks,
> Marc
>