From: Dmitrii Tcvetkov <demfloro@demfloro.ru>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: 4.17-rc1 FS went read-only during balance
Date: Mon, 23 Apr 2018 11:40:16 +0300
Message-ID: <20180423114016.3cdc0ac1@job>
In-Reply-To: <d935c0e0-c2c8-a0ec-bb07-5c3879dd1be0@gmx.com>
> >>>>> TL;DR It seems to be a regression in 4.17, but I managed to find a
> >>>>> workaround to make the filesystem rw-mountable again.
> >>>>>
> >>>>> Kernel built from tag v4.17-rc1
> >>>>> btrfs-progs 4.16
> >>>>>
> >>>>> Tonight two of my machines (a PC with ECC RAM and a laptop with
> >>>>> non-ECC RAM) were running their usual weekly balance via cron:
> >>>>> btrfs balance start -musage=50 -dusage=50 <mountpoint>
> >>>>> Both machines run same kernel version.
> >>>>>
> >>>>> On PC that caused root and "data" filesystems to go readonly. Root
> >>>>> is on an SSD with data single and metadata DUP, "data" filesystem
> >>>>> is on 2 HDDs with RAID1 for data and metadata.
> >>>>>
> >>>>> On laptop only /home went ro, it's on NVMe SSD with data single and
> >>>>> metadata DUP.
> >>>>>
> >>>>> Btrfs check of the PC rootfs reported no errors in either mode. I ran
> >>>>> each mode once before reboot on the read-only filesystem with the
> >>>>> --force flag and then again from a live USB. Same output, no errors.
> >>>>>
> >>>>> After reboot the kernel refused to mount the rootfs rw, with the same
> >>>>> error as during the cron balance; an ro mount was accepted. Error
> >>>>> during rw mount:
> >>>>> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117
> >>>
> >>>> 117 means EUCLEAN, which could be caused by the newly introduced
> >>>> first_key and level check.
> >>>
> >>>> Please apply this hotfix to fix it.
> >>>> btrfs: Only check first key for committed tree blocks
> >>>> (Which is included in latest pull request)
> >>>
> >>>> Also, please consider enabling CONFIG_BTRFS_DEBUG to provide extra
> >>>> debug info.
> >>>
> >>>> Thanks,
> >>>> Qu
> >>>
> >>> I tried 4.17-rc2 (as the pull request was pulled) with
> >>> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb)
> >>> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg
> >>> attached.
> >>
> >> Thanks for the info and your previous btrfs-image.
> >>
> >> The image itself shows nothing wrong, so it should be a runtime problem.
> >> Would you please apply these two debug patches?
> >> https://patchwork.kernel.org/patch/10335133/
> >> https://patchwork.kernel.org/patch/10335135/
> >>
> >> And the attached diff file?
> >>
> >> My guess is the parent node is not initialized correctly in this case.
> >>
> >> Thanks,
> >> Qu
> >
> > Dmesg from kernel with all three patches applied attached.
> >
> Thanks for the debug info, it really helps a lot!
>
> It turns out that I'm just a super idiot, a typo in replace_path()
> caused this, and it could not be triggered unless we enter it from
> relocation recovery.
>
> Please try the attached patch to see if it solves the problem.
>
> Thanks,
> Qu
Glad to help. The patch solved the problem: the rw mount succeeds, the
balance finished with no errors or debug output, and btrfs check is
clean in both modes.
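
As an aside, the errno=-117 in the error line quoted earlier in this thread can be decoded mechanically. Below is a minimal Python sketch, assuming a Linux host (where errno 117 is EUCLEAN, as Qu noted). The regex, the function name decode_btrfs_error and the dictionary keys are illustrative choices for this sketch, not part of the kernel or btrfs-progs.

```python
import errno
import re

# Matches the kernel error line quoted in this thread. The group names
# (device, func, line, code) are labels chosen for this sketch only.
ERROR_RE = re.compile(
    r"BTRFS: error \(device (?P<device>[^)]+)\) in "
    r"(?P<func>\w+):(?P<line>\d+): errno=-(?P<code>\d+)"
)


def decode_btrfs_error(log_line):
    """Return the parsed fields of a BTRFS error line, or None if no match."""
    m = ERROR_RE.search(log_line)
    if m is None:
        return None
    code = int(m.group("code"))
    return {
        "device": m.group("device"),
        "func": m.group("func"),
        "line": int(m.group("line")),
        "errno": code,
        # errno.errorcode is platform-dependent; fall back if the code is absent.
        "name": errno.errorcode.get(code, "unknown"),
    }


if __name__ == "__main__":
    sample = "BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117"
    print(decode_btrfs_error(sample))
```

The lookup goes through errno.errorcode rather than a hard-coded table so that the symbolic name comes from the platform the script runs on.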
[ 2.842718] BTRFS: device label home devid 1 transid 277952 /dev/vdb
[ 2.924965] BTRFS: device label root devid 1 transid 84092 /dev/vda2
[ 3.072271] BTRFS info (device vda2): use lzo compression, level 0
[ 3.072897] BTRFS info (device vda2): enabling auto defrag
[ 3.073476] BTRFS info (device vda2): using free space tree
[ 3.074049] BTRFS info (device vda2): has skinny extents
[ 5.411821] BTRFS info (device vda2): using free space tree
[ 24.925293] BTRFS info (device vdb): using free space tree
[ 24.925324] BTRFS info (device vdb): has skinny extents
[ 31.711868] BTRFS info (device vdb): continuing balance
[ 31.721658] BTRFS info (device vdb): checking UUID tree
[ 31.822920] BTRFS info (device vdb): relocating block group 69889687552 flags data
[ 33.730399] BTRFS info (device vdb): found 12 extents
[ 36.950699] BTRFS info (device vdb): found 12 extents
[ 37.030813] BTRFS info (device vdb): relocating block group 67742203904 flags metadata|dup
[ 37.104174] BTRFS info (device vdb): relocating block group 67708649472 flags system|dup
[ 37.189843] BTRFS info (device vdb): found 1 extents
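
To check at a glance which block groups a resumed balance relocated, the "relocating block group" lines in a dmesg like the one above can be collected with a short script. This is only a sketch: the regex and the function name relocated_block_groups are assumptions made for illustration, and the pattern tolerates a missing space before "flags", which can happen when log lines are rewrapped.

```python
import re

# One match per "relocating block group" line. Group names are chosen
# for this sketch; \s* accepts zero or more spaces before "flags".
RELOC_RE = re.compile(
    r"\(device (?P<device>[^)]+)\): relocating block group "
    r"(?P<start>\d+)\s*flags (?P<flags>\S+)"
)


def relocated_block_groups(dmesg_text):
    """Return (device, block group start, flags) tuples in log order."""
    return [
        (m.group("device"), int(m.group("start")), m.group("flags"))
        for m in RELOC_RE.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    sample = "\n".join([
        "[ 31.822920] BTRFS info (device vdb): relocating block group 69889687552 flags data",
        "[ 33.730399] BTRFS info (device vdb): found 12 extents",
        "[ 37.030813] BTRFS info (device vdb): relocating block group 67742203904 flags metadata|dup",
        "[ 37.104174] BTRFS info (device vdb): relocating block group 67708649472 flags system|dup",
    ])
    for device, start, flags in relocated_block_groups(sample):
        print(device, start, flags)
```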