linux-btrfs.vger.kernel.org archive mirror
From: Su Yue <suy.fnst@cn.fujitsu.com>
To: Marc MERLIN <marc@merlins.org>, Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: So, does btrfs check lowmem take days? weeks?
Date: Fri, 29 Jun 2018 14:02:19 +0800
Message-ID: <02ba7ad4-b618-85f0-a99f-c43b25d367de@cn.fujitsu.com>
In-Reply-To: <20180629052825.tifg2aw7oy3qyyvw@merlins.org>



On 06/29/2018 01:28 PM, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:
>>> lowmem repair seems to be going still, but it's been days and -p seems
>>> to do absolutely nothing.
>>
>> I'm afraid you hit a bug in the lowmem repair code.
>> In any case, --repair shouldn't really be used unless you're pretty
>> sure the problem is something btrfs check can handle.
>>
>> That's also why --repair is still marked as dangerous.
>> Especially when it's combined with experimental lowmem mode.
> 
> Understood, but btrfs got corrupted (by itself or not, I don't know).
> I cannot mount the filesystem read/write.
> I cannot btrfs check --repair it since that code will kill my machine.
> What do I have left?
> 
>>> My filesystem is "only" 10TB or so, albeit with a lot of files.
>>
>> Unless you have tons of snapshots and reflinked (deduped) files, it
>> shouldn't take so long.
> 
> I may have a fair amount.
> gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2
> enabling repair mode
> WARNING: low-memory mode repair support is only partial
> Checking filesystem on /dev/mapper/dshelf2
> UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Fixed 0 roots.
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
> Created new chunk [18457780224000 1073741824]
> Delete backref in extent [84302495744 69632]
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 3, have: 4
> Delete backref in extent [84302495744 69632]
> ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 181, have: 240
> Delete backref in extent [125712527360 12214272]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 302, have: 431
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 356, have: 433
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 161, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 162, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 170, have: 249
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 172, have: 251
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 348, have: 418
> Delete backref in extent [150850146304 17522688]
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 555, have: 1449
> Deleted root 2 item[156909494272, 178, 5476627808561673095]
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 556, have: 1452
> Deleted root 2 item[156909494272, 178, 7338474132555182983]
> ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
> Add one extent data backref [156909494272 55320576]
> ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
> Add one extent data backref [156909494272 55320576]
> 
My bad.
It's very likely a bug in the extent checking of lowmem mode, which
was also reported by Chris.
The extent check was wrong, so the repair did the wrong things.
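To get an overview of what the log quoted above actually flags, a minimal Python sketch can pull every "referencer count mismatch" line apart and tally the gap between the two counts per extent. The regex is written against the exact lines in this thread and is an assumption about the output format (other btrfs-progs versions may word it differently); the field names `bytenr` and `length` are my reading of `extent[X, Y]` as the extent's start and size in bytes. Since the extent check itself was buggy, treat the numbers as a description of what the check reported, not of what is really on disk.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

# Assumed format, copied from the lines quoted in this thread, e.g.:
# ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
MISMATCH_RE = re.compile(
    r"ERROR: extent\[(?P<bytenr>\d+), (?P<length>\d+)\] referencer count mismatch "
    r"\(root: (?P<root>\d+), owner: (?P<owner>\d+), offset: (?P<offset>\d+)\) "
    r"wanted: (?P<wanted>\d+), have: (?P<have>\d+)"
)


class Mismatch(NamedTuple):
    bytenr: int
    length: int
    root: int
    owner: int
    offset: int
    wanted: int
    have: int


def parse_mismatches(log: str) -> List[Mismatch]:
    """Extract every referencer-count-mismatch record from btrfs check output."""
    found = []
    for line in log.splitlines():
        # search() rather than match(), so "> "-quoted mail excerpts parse too.
        m = MISMATCH_RE.search(line)
        if m:
            found.append(Mismatch(**{k: int(v) for k, v in m.groupdict().items()}))
    return found


def excess_by_extent(records: List[Mismatch]) -> Dict[Tuple[int, int], int]:
    """Sum (have - wanted) over all roots for each (bytenr, length) extent."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for r in records:
        totals[(r.bytenr, r.length)] += r.have - r.wanted
    return dict(totals)
```

Feeding it the quoted log shows at a glance that the same owner (374857) appears under both roots 21872 and 22911, which fits a problem with shared blocks rather than many unrelated corruptions.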

I have figured out that the bug is that lowmem check can't deal with shared tree
blocks in the reloc tree. The fix is simple; you can try the following repo:

https://github.com/Damenly/btrfs-progs/tree/tmp1

Please run the lowmem check without --repair first to find out whether
your filesystem is fine.
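One way to make sure the first run really is read-only is to build the command line in one place and refuse write-mode flags. This is a hedged sketch, not part of btrfs-progs: the helper names are hypothetical, `./btrfs` assumes you built the tmp1 branch and run the binary from its source directory, and the blocked flags are the btrfs check options I know to modify the filesystem (`--repair`, `--init-csum-tree`, `--init-extent-tree`). The filesystem should be unmounted while the check runs.

```python
import subprocess
from typing import List, Sequence

# btrfs check options that write to the filesystem; excluded on purpose.
WRITE_FLAGS = {"--repair", "--init-csum-tree", "--init-extent-tree"}


def readonly_lowmem_argv(device: str, btrfs: str = "btrfs",
                         extra: Sequence[str] = ()) -> List[str]:
    """Build argv for a lowmem-mode check that must not modify the filesystem."""
    bad = WRITE_FLAGS.intersection(extra)
    if bad:
        raise ValueError(f"refusing write-mode flags: {sorted(bad)}")
    # -p asks btrfs check to print progress; --mode=lowmem selects lowmem mode.
    return [btrfs, "check", "--mode=lowmem", "-p", *extra, device]


def run_readonly_check(device: str, btrfs: str = "btrfs") -> int:
    """Run the read-only check with output going to the terminal; return its exit code."""
    return subprocess.run(readonly_lowmem_argv(device, btrfs)).returncode
```

For example, `run_readonly_check("/dev/mapper/dshelf2", btrfs="./btrfs")` would run the patched binary against the device from this thread; only once that run comes back clean (or with errors the patched code is known to handle) would --repair be worth considering.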

Though the bug and its symptoms are clear enough, before sending my patch
I have to make a test image. I have spent a week studying btrfs balance,
but it seems a little hard for me.

Thanks,
Su

> The last two ERROR lines took over a day to get generated, so I'm not sure if it's still working, but just slowly.
> For what it's worth non lowmem check used to take 12 to 24H on that filesystem back when it still worked.
> 
>>> 2 things that come to mind
>>> 1) can lowmem have some progress working so that I know if I'm looking
>>> at days, weeks, or even months before it will be done?
>>
>> It's hard to estimate, especially when every cross check involves a lot
>> of disk IO.
>> But at least we could add such an indicator to show we're doing something.
> 
> Yes, anything to show that I should still wait is still good :)
> 
>>> 2) non lowmem is more efficient obviously when it doesn't completely
>>> crash your machine, but could lowmem be given an amount of memory to use
>>> for caching, or maybe use some heuristics based on RAM free so that it's
>>> not so excruciatingly slow?
>>
>> IIRC a recent commit has added the ability.
>> a5ce5d219822 ("btrfs-progs: extent-cache: actually cache extent buffers")
>   
> Oh, good.
> 
>> That's already included in btrfs-progs v4.13.2.
>> So this is most likely an infinite loop that the lowmem repair code can't handle.
> 
> I see. Is there any reasonably easy way to check on this running process?
> 
> Both top and iotop show that it's working, but of course I can't tell if
> it's looping, or not.
> 
> Then again, maybe it already fixed enough that I can mount my filesystem again.
> 
> But back to the main point, it's sad that after so many years, the
> repair situation is still so suboptimal, especially when it's apparently
> pretty easy for btrfs to get damaged (through its own fault or not, hard
> to say).
> 
> Thanks,
> Marc
> 
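On the question of how to tell a slow check from a looping one: nobody in this message gives a tool for it, but Linux exposes per-process I/O counters in /proc/<pid>/io (readable by the same user or root). A minimal sketch that samples them is below; the helper names are hypothetical. Rising counters only prove the process is doing I/O, not that it is making progress. As a hedged heuristic, if rchar keeps climbing while read_bytes stays flat, the process may be re-reading the same (page-cached) metadata over and over, which would be consistent with a loop.

```python
import time
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)  # int() tolerates the leading space
    return counters


def read_io(pid: int) -> Dict[str, int]:
    """Read the current I/O counters of a running process (Linux only)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def io_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-counter difference between two samples."""
    return {k: after[k] - before[k] for k in after if k in before}


def watch(pid: int, interval: float = 60.0, samples: int = 5) -> None:
    """Print how much each counter grew during each interval."""
    prev = read_io(pid)
    for _ in range(samples):
        time.sleep(interval)
        cur = read_io(pid)
        d = io_delta(prev, cur)
        print(f"read_bytes +{d.get('read_bytes', 0)}  "
              f"rchar +{d.get('rchar', 0)}  syscr +{d.get('syscr', 0)}")
        prev = cur
```

Running `watch(<pid of btrfs check>)` as root for a few minutes gives a rough picture; it cannot replace a real progress indicator in btrfs check itself.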



