From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: filesystem goes ro trying to balance. "cpu stuck"
Date: Mon, 12 Oct 2015 05:33:20 +0000 (UTC) [thread overview]
Message-ID: <pan$121e9$47325977$22468ecf$501b7fa2@cox.net> (raw)
In-Reply-To: CAC=t97BtYg4W_Jxk6HdxbDmbJ1HFM1BbxTDPr+Y1KOd7M8W4gw@mail.gmail.com
Donald Pearson posted on Sun, 11 Oct 2015 11:46:14 -0500 as excerpted:
> Kernel 4.2.2-1.el7.elrepo btrfs-progs v4.2.1
>
> I'm attempting to convert a filesystem from raid6 to raid10. I didn't
> have any functional problems with it, but performance is abysmal
> compared to basically the same arrangement in raid10 so I thought I'd
> just get away from raid56 for a while (I also saw something about parity
> raid code developed beyond 2-disk parity that was ignored/thrown away so
> I'm thinking the devs don't care much about parity raid at least
> for now).
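For the record, that sort of conversion is a balance with convert
filters, along these lines -- just a sketch, with /mnt/backup standing
in for whatever the real mountpoint is:

  # Rewrite both data and metadata chunks as raid10; a convert
  # balance rewrites every chunk, so expect it to take a while.
  btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/backup

  # Check progress from another terminal:
  btrfs balance status /mnt/backup
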
Note on the parity-raid story: AFAIK at least the btrfs folks aren't
ignoring it (I don't know about the mdraid/dmraid folks). There are simply
more opportunities for new features than there are coders to code them
up, and while progress is indeed occurring, some of these features may
well take years.
Consider: even standard raid56 support was originally planned for IIRC
3.5, but it wasn't actually added until (IIRC) 3.9, and that was only
partial/runtime support (the parities were being calculated and written,
but the tools to rebuild from parity were incomplete/broken/non-existent,
so it was effectively a slow raid0 in terms of reliability, that would be
upgraded to raid56 "for free" once the tools were done). Complete raid56
support wasn't even nominally there until 3.19, with the initial bugs
still being worked out thru 4.0 and into 4.1. So it took about /three/
/years/ longer than initially planned.
This sort of longer-to-implement-than-planned pattern has repeated
multiple times over the life of btrfs, which is why it's taking so long
to mature and stabilize.
So it's not that multi-parity-raid is being rejected or ignored; it's
simply that there's far more to do than there are people to do it, and
btrfs, as a cow-based filesystem, isn't exactly the simplest thing to
implement correctly, so initial plans turned out to be /wildly/
optimistic. Honestly, some of these features, while not rejected, could
well be a decade out. Obviously others will be implemented before then,
but there are just so many of them, and so few devs working on what
really is a complex project, that something always ends up shoved back
to that decade-out mark, and that's the way it's going to stay unless
btrfs suddenly gets far more developer resources than it has now.
> Partway through the balance something goes wrong and the filesystem is
> forced read-only, stopping the balance.
>
> I did an fsck and it didn't complain about/find any errors. The drives
> aren't throwing any errors or incrementing any smart attributes. This
> is a backup array, so it's not the end of the world if I have to just
> blow it away and rebuild as raid10 from scratch.
>
> The console prints this error.
> NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
> [btrfs-balance:8015]
I'm a user not a dev, tho I am a regular on this list, and backtraces
don't mean a lot to me, so take this FWIW...
1) How old is the filesystem? It isn't quite new, created with
mkfs.btrfs from btrfs-progs v4.2.0 or v4.2.1, is it? There's a known
mkfs.btrfs bug in that range (I don't remember whether it was fixed in
4.2.1 or only in the latest 4.2.2) that creates invalid filesystems.
Btrfs check from 4.2.2 can detect the problem but can't fix it, and
since such filesystems are unstable as they stand, it's best to get what
you need off of them and recreate them with a non-buggy mkfs.btrfs ASAP.
2) Since you're on progs v4.2.1 ATM, that may apply to its mkfs.btrfs as
well. Please upgrade to 4.2.2 before creating any further btrfs, or
failing that, downgrade to 4.1.3 or whatever the last in the progs 4.1
series was.
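For both of the above, something along these lines should confirm what
you're running and give the filesystem a safe look-over -- a sketch,
with /dev/sdX as a placeholder for one of the array's devices, and the
filesystem unmounted for the check:

  # Confirm the btrfs-progs version actually in use:
  btrfs --version

  # A plain check is read-only by default; it reports problems
  # but doesn't write anything:
  btrfs check /dev/sdX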
3) Are you running btrfs quotas on the filesystem? Unfortunately, btrfs
quota handling code remains an unstable sore spot, tho the devs are
working hard on fixing it. I'm continuing to recommend, as I have for
some time now, that people don't use it unless they're willing to deal
with the problems and are actively working with the devs to fix them.
Otherwise, either they need quota support and should really choose a
filesystem where the feature is mature and stable, or they don't, in
which case just leaving it off (or turning it off if on) avoids the
problem.
There are at least two confirmed, reasonably recent cases where turning
off btrfs quota support eliminated the issues people were reporting, so this
isn't an idle recommendation, it really does help in at least some
cases. If you don't really need quotas, leave (or turn) them off. If
you do, you really should be using a filesystem where the quota feature
is mature and stable enough to rely on. Yes, it does make a difference.
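If you're not sure whether quotas are on, something like this will tell
you, and turn them off if you don't need them -- a sketch, with the
mountpoint assumed to be /mnt/backup:

  # Should error out if quotas aren't enabled on the filesystem:
  btrfs qgroup show /mnt/backup

  # If they're on and you don't actually need them:
  btrfs quota disable /mnt/backup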
4) Snapshots (scaling). While snapshots are a reasonably mature feature,
they do remain a scaling challenge. My recommendation is that you try to
keep to about 250-ish snapshots per subvolume, no more than 3000
snapshots worst-case total, and better no more than 1000 or 2000 (with
1000, at the 250-per number, obviously letting you do that for four
subvolumes). If you're doing scheduled snapshotting, set up a scheduled
thinning script as well, to keep your snapshots to around 250 or fewer
per subvolume (a minimal sketch follows below). With reasonable
thinning, that's actually a workable number, even for those starting at
multiple snapshots per hour.
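A minimal thinning sketch, assuming the snapshots of one subvolume live
together under one directory and carry date-stamped names so a plain ls
sorts them oldest-first -- the path and KEEP count are placeholders:

  #!/bin/sh
  # Delete all but the newest 250 snapshots under the given directory.
  KEEP=250
  cd /mnt/backup/snapshots || exit 1
  ls -1 | head -n -"$KEEP" | while read -r snap; do
      btrfs subvolume delete "./$snap"
  done

In real use you'd hook something like that to the same scheduler that
takes the snapshots, running it after each batch.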
Keeping the number of snapshots below 3000 at worst, and preferably to
1000 or less, should dramatically speed up maintenance operations such as
balance. We sometimes see people with hundreds of thousands of
snapshots, and then on top of that running quotas, and for them,
balancing TiB-scale filesystems really can take not hours or days, but
weeks or months, making it entirely unworkable in practice. Keeping to a
couple thousand snapshots, with quotas turned off, should at least keep
that in the semi-reasonable days range (assuming the absence of bugs like
the one you unfortunately seem to have, of course).
5) Snapshots (as a feature that can lock otherwise unrelated bugs in
place). While snapshots themselves are reasonably stable, btrfs as a
whole isn't yet entirely stable, and bugs still turn up from time to
time. Because snapshots lock down older file extents that would be
deleted or rewritten on a normal filesystem, or on btrfs without
snapshots, when a bug occurs and some part of the filesystem breaks,
people often find that the problem isn't actually in the current copy
of some file, but in some subset of their snapshots of that file. If
they simply delete all the snapshots that reference the bad bit of the
filesystem, it's freed, and the balance that was hanging before
suddenly works.
Again, this isn't a snapshot bug directly. It's simply that on a
filesystem with a snapshot history going back some time, whatever
filesystem bug or physical media defect occurred often affects only
older extents that haven't changed in a while. If the file has changed
over time, the current version is often no longer using the bad block,
so deleting the snapshots still referencing it eliminates the problem.
Several posters have reported balance problems that went away when they
deleted either their oldest snapshots, or all of them. It's by no means
everyone, but it's a significant enough number that if you do have a
bunch of old snapshots and can afford to delete them, often because you
have the same files backed up elsewhere anyway, it's worth a shot.
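If you want to try it, something along these lines finds the candidates
and removes them -- a sketch, with the paths as placeholders:

  # List only the snapshot subvolumes on the filesystem:
  btrfs subvolume list -s /mnt/backup

  # Then delete them oldest-first, retrying the balance as you go:
  btrfs subvolume delete /mnt/backup/snapshots/oldest-snapshot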
6) That's the obvious stuff. If it's nothing there, then with luck
somebody will recognize the trace and match it to a bug, or a dev will
have the time to look at it. Give it a couple days if you like, to see
if that happens, and if not, then I'd say blow it away and start over;
since it's backups anyway, you can.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman