From: Marc Joliet <marcec@gmx.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: system hangs due to qgroups
Date: Mon, 05 Dec 2016 12:01:28 +0100 [thread overview]
Message-ID: <29619565.vvZbx4DoIQ@thetick> (raw)
In-Reply-To: <5f35cc2d-4c3d-b14c-01f6-4bfb0f22823b@cn.fujitsu.com>
[-- Attachment #1: Type: text/plain, Size: 7010 bytes --]
On Monday 05 December 2016 08:39:02 Qu Wenruo wrote:
> At 12/04/2016 02:40 AM, Marc Joliet wrote:
> > Hello all,
> >
> > I'm having some trouble with btrfs on a laptop, possibly due to qgroups.
> > Specifically, some file system activities (e.g., snapshot creation,
> > baloo_file_extractor from KDE Plasma) cause the system to hang for up to
> > about 40 minutes, maybe more. It always causes (most of) my desktop to
> > hang, (although I can usually navigate between pre-existing Konsole tabs)
> > and prevents new programs from starting. I've seen the system load go up
> > to >30 before the laptop suddenly resumes normal operation. I've been
> > seeing this since Linux 4.7, maybe already 4.6.
>
> Qgroup is CPU intensive operation.
>
> The main problem is the design of btrfs extent tree, which bias towards
> snapshot creating speed, but quite complicated if used for tracing all
> referencer (which qgroup heavily relies on it).
>
>
> The main factor affecting qgroup speed, is how many shared extents are
> in the fs.
> This including reflinked files and snapshot, under most case snapshot is
> the main part.
>
> Unless we find a better solution, to keep both qgroup accurate and fast,
> I'd recommend to keep qgroup under a reasonable number.
> (Personally speaking, 10 would be good)
>
> Despite the qgroup, relocation(balancing) should also be affected by the
> number of shared extents.
OK
> > Now, I thought that maybe this was (indirectly) due to an overly full file
> > system (~90% full), so I deleted some things I didn't need to get it up to
> > 15% free. (For the record, I also tried mounting with ssd_spread.)
> > After that, I ran a balance with -dusage=50, which started out promising,
> > but then went back to the "bad" behaviour. *But* it seemed better than
> > before overall, so I started a balance with -musage=10, then -musage=50.
> > That turned out to be a mistake. Since I had to transport the laptop,
> > and couldn't wait for "balance cancel" to return (IIUC it only returns
> > after the next block (group?) is freed), I forced the laptop off.
> >
> > After I next turned on the laptop, the balance resumed, causing bootup to
> > fail, after which I remembered about the skip_balance mount option, which
> > I
> > tried in a rescue shell from an initramfs. But wait, that failed, too!
> > Specifically, the stack trace I get whenever I try it includes as one of
> > the last lines:
> >
> > "RIP [<ffffffff8131226f>] qgroup_fix_relocated_data_extents+0x1f/0x2a8"
>
> This seems to be a NULL pointer bug in qgroup relocation fix.
>
> The latest fix (not merged yet) should address it.
>
> You could try the for-next-20161125 branch from David to fix it:
> https://github.com/kdave/btrfs-devel/tree/for-next-20161125
OK, I'll try that, thanks! I just have to wait for it to finish cloning...
> > (I can take photos of the full stack trace if requested.)
> >
> > So then I ran "btrfs qgroup show /sysroot/", which showed many quota
> > groups, much to my surprise. On the upside, at least now I discovered
> > the likely reason for the performance problems.
>
> So, the number of qgroups is the cause for the slowness.
OK
> > (I actually think I know why I'm seeing qgroups: at one point I was trying
> > out various snapshot/backup tools for btrfs, and one (I forgot which)
> > unconditionally activated quota support, which infuriated me, but I
> > promptly deactivated it, or so I thought. Is quota support automatically
> > enabled when qgroups are discovered, or did I perhaps not disable quota
> > support properly?)
> Qgroup will always be enabled after "btrfs quota enable", and until
> "btrfs quota disable" to disable it.
>
> No method to temporarily disable quota, since quota must trace any
> modification, or qgroup number will be out of true.
>
> So, one should manually disable quota.
> (And that's the backup tool to blame, it should either info user or
> disable qgroup on uninstallation)
Hmm, I must not be remembering the whole story then, because I was pretty sure
that I ran "quota disable" and verified that quotas were off, too, but then
again, it's been quite a while now (a year?) since it happened.
> > Since I couldn't use skip_balance, and logically can't destroy qgroups on
> > a
> > read-only file system, I decided to wait for a regular mount to finish.
> > That has been running since Tuesday, and I am slowly growing impatient.
> >
> > Thus I arrive at my question(s): is there anything else I can try, short
> > of
> > reformatting and restoring from backup? Can I use btrfs-check here, or
> > any
> > other tool? Or...?
> >
> > Also, should I be able to avoid reformatting: how do I properly disable
> > quota support?
>
> "btrfs quota disable <mnt>", yes you need RW mount.
> Any RW mountable snapshot/subvolume is OK.
OK
> > (BTW, searching for qgroup_fix_relocated_data_extents turned up the ML
> > thread "[PATCH] Btrfs: fix endless loop in balancing block groups", could
> > that be related?)
>
> Nope, the actual fixing patches are:
> [PATCH 1/4] btrfs: qgroup: Add comments explaining how btrfs qgroup works
> [PATCH 2/4] btrfs: qgroup: Rename functions to make it follow
> reserve,trace,account steps
> [PATCH 3/4] btrfs: Expoert and move leaf/subtree qgroup helpers to qgroup.c
> [PATCH 4/4] btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>
>
> The 4th patch is the real working one, but relies on previous 3 to apply.
>
> The regression is also caused by my patch:
> [PATCH v3.1 2/3] btrfs: relocation: Fix leaking qgroups numbers on data
> extents
>
> Sorry for the trouble.
No problem, I just wish I would've thought to check for qgroups before getting
into this mess.
Although I'm actually *relieved* that it's qgroups, because before that I was
worried that I had finally hit a nigh-show-stopping bug. I thought that I was
merely not seeing it on my other systems, but that it could happen at any
time. Now I'm more confident in the stability of my systems again :) .
> And for your recovery, I'd suggest to install an Archlinux into a USB
> HDD or USB stick, and compile David's branch and install it into the USB
> HDD.
>
> Then use the USB storage as rescue tool to mount the fs, which should do
> RW mount with or without skip_balance mount option.
> So you could disable quota then.
OK, I'll try that, thanks!
> Thanks,
> Qu
>
> > The laptop is currently running Gentoo with Linux 4.8.10 and btrfs-progs
> > 4.8.4.
> >
> > Greetings
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 801 bytes --]
next prev parent reply other threads:[~2016-12-05 11:01 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-03 18:40 system hangs due to qgroups Marc Joliet
2016-12-03 20:42 ` Chris Murphy
2016-12-03 21:46 ` Marc Joliet
2016-12-03 22:56 ` Chris Murphy
2016-12-04 16:02 ` Marc Joliet
2016-12-04 18:24 ` Duncan
2016-12-04 19:20 ` Marc Joliet
2016-12-05 2:32 ` Duncan
2016-12-04 18:52 ` Chris Murphy
2016-12-05 9:00 ` Marc Joliet
2016-12-05 10:16 ` Marc Joliet
2016-12-05 23:22 ` Marc Joliet
2016-12-19 11:17 ` Marc Joliet
2016-12-04 2:10 ` Adam Borowski
2016-12-04 16:02 ` Marc Joliet
2016-12-05 0:39 ` Qu Wenruo
2016-12-05 11:01 ` Marc Joliet [this message]
2016-12-05 12:10 ` Marc Joliet
2016-12-05 14:43 ` [SOLVED] " Marc Joliet
2016-12-06 0:29 ` Qu Wenruo
2016-12-06 10:12 ` Marc Joliet
2016-12-06 14:55 ` Marc Joliet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=29619565.vvZbx4DoIQ@thetick \
--to=marcec@gmx.de \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.