From: Marc Joliet <marcec@gmx.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: system hangs due to qgroups
Date: Mon, 05 Dec 2016 12:01:28 +0100
Message-ID: <29619565.vvZbx4DoIQ@thetick>
In-Reply-To: <5f35cc2d-4c3d-b14c-01f6-4bfb0f22823b@cn.fujitsu.com>
On Monday 05 December 2016 08:39:02 Qu Wenruo wrote:
> At 12/04/2016 02:40 AM, Marc Joliet wrote:
> > Hello all,
> >
> > I'm having some trouble with btrfs on a laptop, possibly due to qgroups.
> > Specifically, some file system activities (e.g., snapshot creation,
> > baloo_file_extractor from KDE Plasma) cause the system to hang for up to
> > about 40 minutes, maybe more. It always causes (most of) my desktop to
> > hang, (although I can usually navigate between pre-existing Konsole tabs)
> > and prevents new programs from starting. I've seen the system load go up
> > to >30 before the laptop suddenly resumes normal operation. I've been
> > seeing this since Linux 4.7, maybe already 4.6.
>
> Qgroup is a CPU-intensive operation.
>
> The main problem is the design of the btrfs extent tree, which is biased
> towards snapshot creation speed but is quite complicated to use for
> tracing all referencers (which qgroup relies on heavily).
>
>
> The main factor affecting qgroup speed is how many shared extents are
> in the fs.
> This includes reflinked files and snapshots; in most cases snapshots are
> the main part.
>
> Unless we find a better solution that keeps qgroup both accurate and fast,
> I'd recommend keeping the number of qgroups reasonably low.
> (Personally speaking, 10 would be good)
>
> Apart from qgroup, relocation (balancing) should also be affected by the
> number of shared extents.
OK
> > Now, I thought that maybe this was (indirectly) due to an overly full file
> > system (~90% full), so I deleted some things I didn't need to get it up to
> > 15% free. (For the record, I also tried mounting with ssd_spread.)
> > After that, I ran a balance with -dusage=50, which started out promising,
> > but then went back to the "bad" behaviour. *But* it seemed better than
> > before overall, so I started a balance with -musage=10, then -musage=50.
> > That turned out to be a mistake. Since I had to transport the laptop,
> > and couldn't wait for "balance cancel" to return (IIUC it only returns
> > after the next block (group?) is freed), I forced the laptop off.
> >
> > After I next turned on the laptop, the balance resumed, causing bootup to
> > fail, after which I remembered about the skip_balance mount option, which
> > I tried in a rescue shell from an initramfs. But wait, that failed, too!
> > Specifically, the stack trace I get whenever I try it includes as one of
> > the last lines:
> >
> > "RIP [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a8"
>
> This seems to be a NULL pointer bug in qgroup relocation fix.
>
> The latest fix (not merged yet) should address it.
>
> You could try the for-next-20161125 branch from David to fix it:
> https://github.com/kdave/btrfs-devel/tree/for-next-20161125
OK, I'll try that, thanks! I just have to wait for it to finish cloning...
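For reference, a shallow single-branch clone is usually much faster than a
full clone of a kernel tree. Below is a minimal sketch that only builds the
git command line, assuming the repository URL is the one from the link above
without the /tree/... part, and that the for-next-20161125 branch still
exists there. The helper name clone_command and the destination directory
are made up for illustration.

```python
from typing import List

# Assumption: repository URL derived from the branch link quoted above.
REPO_URL = "https://github.com/kdave/btrfs-devel.git"
BRANCH = "for-next-20161125"

def clone_command(url: str, branch: str, dest: str, shallow: bool = True) -> List[str]:
    """Build (but do not run) a git clone command for a single branch."""
    cmd = ["git", "clone", "--branch", branch, "--single-branch"]
    if shallow:
        # --depth 1 fetches only the tip commit, which is enough to build.
        cmd += ["--depth", "1"]
    cmd += [url, dest]
    return cmd

if __name__ == "__main__":
    # "btrfs-devel" is a placeholder destination directory.
    print(" ".join(clone_command(REPO_URL, BRANCH, "btrfs-devel")))
```

The resulting list can be passed to subprocess.run() as-is. A shallow clone
has no history, so it is only suitable for building, not for bisecting.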
> > (I can take photos of the full stack trace if requested.)
> >
> > So then I ran "btrfs qgroup show /sysroot/", which showed many quota
> > groups, much to my surprise. On the upside, at least now I discovered
> > the likely reason for the performance problems.
>
> So, the number of qgroups is the cause for the slowness.
OK
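To check how many qgroups a file system has, the output of "btrfs qgroup
show" can be counted. A minimal sketch, assuming each qgroup appears on its
own line starting with an ID of the form level/id; the helper names, the
sample text and the limit of 10 (Qu's personal suggestion above) are only
illustrative and not part of btrfs-progs.

```python
import re

# Matches a qgroup ID such as "0/257" at the start of a line.
QGROUP_ID = re.compile(r"^\s*(\d+)/(\d+)\b")

def count_qgroups(show_output: str) -> int:
    """Count lines of `btrfs qgroup show` output that begin with a qgroup ID."""
    return sum(1 for line in show_output.splitlines() if QGROUP_ID.match(line))

def too_many_qgroups(show_output: str, limit: int = 10) -> bool:
    """True if there are more than `limit` qgroups (10 is only a rule of thumb)."""
    return count_qgroups(show_output) > limit

# Illustrative sample only; the real layout depends on the btrfs-progs version.
SAMPLE = """\
qgroupid         rfer         excl
--------         ----         ----
0/5          16.00KiB     16.00KiB
0/257         1.20GiB    300.00MiB
0/258         1.21GiB     12.00MiB
"""

if __name__ == "__main__":
    print(count_qgroups(SAMPLE), too_many_qgroups(SAMPLE))
```

In practice the text would come from running "btrfs qgroup show <mnt>" as
root, for example via subprocess.run(..., capture_output=True, text=True).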
> > (I actually think I know why I'm seeing qgroups: at one point I was trying
> > out various snapshot/backup tools for btrfs, and one (I forgot which)
> > unconditionally activated quota support, which infuriated me, but I
> > promptly deactivated it, or so I thought. Is quota support automatically
> > enabled when qgroups are discovered, or did I perhaps not disable quota
> > support properly?)
> Qgroup stays enabled after "btrfs quota enable" until it is disabled
> with "btrfs quota disable".
>
> There is no method to temporarily disable quota, since quota must trace
> every modification, or the qgroup numbers will become inaccurate.
>
> So, one has to disable quota manually.
> (And the backup tool is the one to blame: it should either inform the
> user or disable qgroup on uninstallation)
Hmm, I must not be remembering the whole story then, because I was pretty sure
that I ran "quota disable" and verified that quotas were off, too, but then
again, it's been quite a while now (a year?) since it happened.
> > Since I couldn't use skip_balance, and logically can't destroy qgroups on
> > a read-only file system, I decided to wait for a regular mount to finish.
> > That has been running since Tuesday, and I am slowly growing impatient.
> >
> > Thus I arrive at my question(s): is there anything else I can try, short
> > of reformatting and restoring from backup? Can I use btrfs-check here, or
> > any other tool? Or...?
> >
> > Also, should I be able to avoid reformatting: how do I properly disable
> > quota support?
>
> "btrfs quota disable <mnt>", and yes, you need an RW mount.
> Any RW-mountable snapshot/subvolume is OK.
OK
> > (BTW, searching for qgroup_fix_relocated_data_extents turned up the ML
> > thread "[PATCH] Btrfs: fix endless loop in balancing block groups", could
> > that be related?)
>
> Nope, the actual fixing patches are:
> [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works
> [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow
> reserve,trace,account steps
> [PATCH 3/4] btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c
> [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>
>
> The 4th patch is the one with the actual fix, but it relies on the
> previous 3 to apply.
>
> The regression is also caused by my patch:
> [PATCH v3.1 2/3] btrfs: relocation: Fix leaking qgroups numbers on data
> extents
>
> Sorry for the trouble.
No problem, I just wish I had thought to check for qgroups before getting
into this mess.
Although I'm actually *relieved* that it's qgroups, because before that I was
worried that I had finally hit a nigh-show-stopping bug. I thought that I was
merely not seeing it on my other systems, but that it could happen at any
time. Now I'm more confident in the stability of my systems again :) .
> And for your recovery, I'd suggest installing Arch Linux onto a USB
> HDD or USB stick, then compiling David's branch and installing it onto
> the same device.
>
> Then use the USB storage as a rescue tool to mount the fs, which should
> allow an RW mount with or without the skip_balance mount option.
> You can then disable quota.
OK, I'll try that, thanks!
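The recovery steps above (mount read-write from the rescue system, then
disable quota) can be sketched as follows. This only builds the command
lines and runs nothing; the function name recovery_commands, the device
/dev/sdX2 and the mount point /mnt/rescue are placeholders, and it assumes a
kernel that includes the fix discussed above.

```python
import shlex
from typing import List

def recovery_commands(device: str, mountpoint: str,
                      skip_balance: bool = True) -> List[List[str]]:
    """Build (but do not run) the commands for the recovery steps above."""
    mount_cmd = ["mount"]
    if skip_balance:
        # skip_balance keeps the interrupted balance from resuming at mount.
        mount_cmd += ["-o", "skip_balance"]
    mount_cmd += [device, mountpoint]
    return [
        mount_cmd,
        # Quota can only be disabled on a read-write mount.
        ["btrfs", "quota", "disable", mountpoint],
    ]

if __name__ == "__main__":
    # Placeholder device and mount point; adjust before running anything.
    for cmd in recovery_commands("/dev/sdX2", "/mnt/rescue"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running
them by hand as root from the rescue system.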
> Thanks,
> Qu
>
> > The laptop is currently running Gentoo with Linux 4.8.10 and btrfs-progs
> > 4.8.4.
> >
> > Greetings
>
Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup