* BTRFS RAID5 filesystem corruption during balance
@ 2015-05-21 21:43 Jan Voet
2015-05-22 4:43 ` Duncan
From: Jan Voet @ 2015-05-21 21:43 UTC (permalink / raw)
To: linux-btrfs
Hi,
I recently upgraded a fairly old home NAS system (Celeron M based) to Ubuntu
14.04 with a newer Linux kernel (3.19.8) and btrfs-progs v3.17. The system
has five brand new 6TB HGST drives, all handled directly by BTRFS, with both
data and metadata in RAID5.
After loading the system with 12.5TB of data (which took some time :-) ), I
started a btrfs balance to see how it would behave. Three days in, with 48%
still to go, the system locked up: it no longer responded to ssh or the USB
keyboard, and the VGA output was dead. Only pings (IP/ICMP Echo
Request/Reply) still worked, so the kernel IP stack was alive, but nothing
else responded and there was no disk activity at all.
So I did a hard reset, hoping the balance would resume on restart. It did
appear to resume, but showed only a few extents remaining (11 or so, instead
of the 3000+ shown originally), and after a short time it seemed to have
completed the balance ???
The result, however, seems to be a mess: the filesystem gets remounted
read-only after a few minutes, with lots of btrfs-related stack dumps in the
kernel log. Rebooting doesn't seem to help; it always ends up in the same
situation after some time.
The data is still visible, but I'm a bit at a loss as to how I should
continue. Any advice would be welcome.
Some data:
$ sudo btrfs fi show /dev/sdb
Label: none uuid: d278e7df-e26d-4a9b-99fb-71fbef819dd1
Total devices 5 FS bytes used 11.58TiB
devid 1 size 5.46TiB used 2.92TiB path /dev/sdb
devid 2 size 5.46TiB used 2.92TiB path /dev/sdc
devid 3 size 5.46TiB used 2.92TiB path /dev/sdd
devid 4 size 5.46TiB used 2.92TiB path /dev/sde
devid 5 size 5.46TiB used 2.92TiB path /dev/sdf
Btrfs v3.17
One of the stack dumps:
[ 328.224417] ------------[ cut here ]------------
[ 328.224446] WARNING: CPU: 0 PID: 1633 at
/home/kernel/COD/linux/fs/btrfs/disk-io.c:513 csum_dirty_buffer+0x6f/0xa0
[btrfs]()
[ 328.224448] Modules linked in: ppdev i915 video net2280 udc_core
drm_kms_helper lpc_ich drm serio_raw shpchp i2c_algo_bit 8250_fintek
parport_pc mac_hid lp parport btrfs xor raid6_pq hid_generic usbhid sata_mv
e1000 pata_acpi floppy hid
[ 328.224473] CPU: 0 PID: 1633 Comm: kworker/u2:12 Tainted: G W
3.19.8-031908-generic #201505110938
[ 328.224476] Hardware name: /i854GML-LPC47M182, BIOS 6.00 PG 06/21/2007
[ 328.224508] Workqueue: btrfs-worker btrfs_worker_helper [btrfs]
[ 328.224510] 00000000 00000000 c0ae5e40 c16e4a4d 00000000 c0ae5e70
c106250e c1907948
[ 328.224518] 00000000 00000661 f89c3444 00000201 f893142f f893142f
d6f3a8f0 f72b1ac8
[ 328.224525] f6d5d800 c0ae5e80 c1062572 00000009 00000000 c0ae5e9c
f893142f 187ced34
[ 328.224532] Call Trace:
[ 328.224537] [<c16e4a4d>] dump_stack+0x41/0x52
[ 328.224541] [<c106250e>] warn_slowpath_common+0x8e/0xd0
[ 328.224570] [<f893142f>] ? csum_dirty_buffer+0x6f/0xa0 [btrfs]
[ 328.224598] [<f893142f>] ? csum_dirty_buffer+0x6f/0xa0 [btrfs]
[ 328.224603] [<c1062572>] warn_slowpath_null+0x22/0x30
[ 328.224631] [<f893142f>] csum_dirty_buffer+0x6f/0xa0 [btrfs]
[ 328.224660] [<f893149f>] btree_csum_one_bio.isra.121+0x3f/0x50 [btrfs]
[ 328.224688] [<f89314c3>] __btree_submit_bio_start+0x13/0x20 [btrfs]
[ 328.224715] [<f892f81d>] run_one_async_start+0x3d/0x60 [btrfs]
[ 328.224750] [<f896e2b2>] normal_work_helper+0x62/0x180 [btrfs]
[ 328.224778] [<f8930630>] ? __btree_submit_bio_done+0x50/0x50 [btrfs]
[ 328.224812] [<f896e3e0>] btrfs_worker_helper+0x10/0x20 [btrfs]
[ 328.224817] [<c1077cb1>] process_one_work+0x121/0x3a0
[ 328.224822] [<c16f057c>] ? apic_timer_interrupt+0x34/0x3c
[ 328.224826] [<c107854d>] worker_thread+0xed/0x390
[ 328.224831] [<c1099fbf>] ? __wake_up_locked+0x1f/0x30
[ 328.224835] [<c1078460>] ? create_worker+0x1b0/0x1b0
[ 328.224840] [<c107d09b>] kthread+0x9b/0xb0
[ 328.224845] [<c16efb81>] ret_from_kernel_thread+0x21/0x30
[ 328.224850] [<c107d000>] ? flush_kthread_worker+0x80/0x80
[ 328.224853] ---[ end trace e8386011b87476a4 ]---
There are plenty more of those, as well as other messages such as:
[ 329.354420] BTRFS: error (device sdf) in btrfs_run_delayed_refs:2792:
errno=-5 IO failure
[ 329.354522] BTRFS info (device sdf): forced readonly
[ 476.620532] perf interrupt took too long (2512 > 2500), lowering
kernel.perf_event_max_sample_rate to 50000
[ 549.412065] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.425057] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.425415] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.425641] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.425655] BTRFS info (device sdf): no csum found for inode 15963 start 0
[ 549.425943] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.426154] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.426165] BTRFS info (device sdf): no csum found for inode 15963 start 4096
[ 549.426443] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.426653] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.426663] BTRFS info (device sdf): no csum found for inode 15963 start 8192
[ 549.426944] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.427153] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.427163] BTRFS info (device sdf): no csum found for inode 15963 start
12288
[ 549.427655] BTRFS info (device sdf): no csum found for inode 15963 start
16384
[ 549.428447] BTRFS info (device sdf): no csum found for inode 15963 start
20480
[ 549.429175] BTRFS info (device sdf): no csum found for inode 15963 start
24576
.....
I can provide more info on request, and don't mind trying out different
things (the data was fully backed up before I started this experiment).
Kind regards,
Jan
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-21 21:43 BTRFS RAID5 filesystem corruption during balance Jan Voet
@ 2015-05-22 4:43 ` Duncan
2015-05-22 18:11 ` Jan Voet
2015-05-22 19:15 ` Chris Murphy
From: Duncan @ 2015-05-22 4:43 UTC (permalink / raw)
To: linux-btrfs
Jan Voet posted on Thu, 21 May 2015 21:43:36 +0000 as excerpted:
> I recently upgraded a quite old home NAS system (Celeron M based) to
> Ubuntu 14.04 with an upgraded linux kernel (3.19.8) and BTRFS tools
> v3.17.
> This system has 5 brand new 6TB drives (HGST) with all drives directly
> handled by BTRFS, both data and metadata in RAID5.
FWIW, btrfs raid5 (and raid6, together called raid56 mode) is still
extremely new. As originally introduced, only normal runtime operation was
implemented; complete repair after a device failure was only finished in
kernel 3.19. While in theory complete, that implementation is still very
immature and poorly tested, and *WILL* have bugs, one of which you may very
well have found.
For in-production use, therefore, btrfs raid56 mode, while now at least
in theory complete, is really too immature at this point to recommend.
I'd recommend either btrfs raid1 or raid10 modes as more stable within
btrfs at this point, tho by the end of this year or early next, I predict
raid56 mode to have stabilized to about that of the rest of btrfs, which
is to say, not entirely stable, but heading that way.
IOW, for btrfs in general, the sysadmin's backup rule applies even more
strongly than it does to more stable filesystems: if you don't have backups,
by definition you don't care about the data, regardless of claims to the
contrary, and untested would-be backups aren't backups until you've verified
they can be read and restored from. Keeping up with current kernels is also
still very important, since by doing so you avoid known and already-fixed
bugs. Given those constraints, btrfs is /in/ /general/ usable. But not yet
raid56 mode, which I'd definitely consider to still be breakable at any time.
So certainly for the multiple TB of data you're dealing with, which you say
yourself takes some time to load (and is thus not something you can afford to
back up and restore trivially), I'd say stay off btrfs raid56 until around
the end of the year or early next, by which point it should have stabilized.
Until then, consider either btrfs raid1 mode (which I use) or, for that
amount of data, more likely btrfs raid10 mode.
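For reference, switching an existing (healthy) filesystem between profiles is
just a filtered balance. A minimal sketch, assuming it is mounted at /mnt and
has enough unallocated space to carry the new profile:

    $ sudo btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
    $ sudo btrfs balance status /mnt    # follow progress from another shell

The convert filters rewrite every chunk into the new profile (use raid1
instead of raid10 if preferred), so on 12.5TB expect it to run for a long
time.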
Or, if you must keep raid5 due to device and data size limitations, consider
sticking with mdraid5 or similar for now, potentially with btrfs on top, or
perhaps with the more stable xfs or ext3/4 (or my favorite, reiserfs, which I
have found /extremely/ reliable here even on less than absolutely reliable
hardware; the old tales about it being unreliable date from pre-data=ordered
times, early kernel 2.4 era and thus rather ancient history now, but as they
say, YMMV...).
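For completeness, a rough sketch of that layered setup; the device names are
only examples, and the initial mdraid sync on drives this size will itself
take a long time:

    $ sudo mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]
    $ sudo mkfs.btrfs /dev/md0          # btrfs sees just one device
    $ sudo mount /dev/md0 /mnt

Btrfs then keeps its checksumming and snapshots, while the parity handling
stays with the much older and better-tested md code.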
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-22 4:43 ` Duncan
@ 2015-05-22 18:11 ` Jan Voet
2015-05-23 15:02 ` Jan Voet
2015-05-22 19:15 ` Chris Murphy
From: Jan Voet @ 2015-05-22 18:11 UTC (permalink / raw)
To: linux-btrfs
Duncan <1i5t5.duncan <at> cox.net> writes:
> FWIW, btrfs raid5 (and raid6, together called raid56 mode) is still
> extremely new. As originally introduced, only normal runtime operation was
> implemented; complete repair after a device failure was only finished in
> kernel 3.19. While in theory complete, that implementation is still very
> immature and poorly tested, and *WILL* have bugs, one of which you may very
> well have found.
>
> For in-production use, therefore, btrfs raid56 mode, while now at least
> in theory complete, is really too immature at this point to recommend.
> I'd recommend either btrfs raid1 or raid10 modes as more stable within
> btrfs at this point, tho by the end of this year or early next, I predict
> raid56 mode to have stabilized to about that of the rest of btrfs, which
> is to say, not entirely stable, but heading that way.
>
Hi Duncan,
Thanks for your reply.
I was under the impression that RAID5/6 was considered quite stable in more
recent kernels, hence my use of the 3.19 kernel and the newer btrfs-progs.
Clearly that assumption was wrong, and perhaps btrfs RAID5 should be labeled
as experimental code.
A balance operation is supposed to be safe: it copies the existing data,
rewriting it and redistributing it over all devices, and only then frees the
original, right? That should never lead to kernel deadlocks ...
Ending up with a corrupted filesystem after a reboot because of this is even
more worrisome, I think. And worst of all are the btrfs kworker crashes;
kernel code should never crash IMHO, but maybe I'm slightly naive here ;-) .
Anyway, lots of lessons learned. I'll see if I can repair the filesystem as
described in https://btrfs.wiki.kernel.org/index.php/Btrfsck
If that doesn't work, I'll simply start over with an alternative filesystem.
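The rough plan, with the filesystem unmounted (the wiki is clear that
--repair should only ever be a last resort, so a read-only pass first):

    $ sudo btrfs check /dev/sdb             # read-only pass first
    $ sudo btrfs check --repair /dev/sdb    # last resort only, per the wiki

Running the read-only check first should at least show how bad the damage
actually is before anything gets rewritten.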
Regards,
Jan
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-22 4:43 ` Duncan
2015-05-22 18:11 ` Jan Voet
@ 2015-05-22 19:15 ` Chris Murphy
2015-05-23 2:56 ` Duncan
From: Chris Murphy @ 2015-05-22 19:15 UTC (permalink / raw)
To: Btrfs BTRFS
On Thu, May 21, 2015 at 10:43 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> For in-production use, therefore, btrfs raid56 mode, while now at least
> in theory complete, is really too immature at this point to recommend.
At some point perhaps a developer will have time to state the expected
stability level on stable hardware, and what should be included in a complete
report. I see many reports that include only the BUG/WARNING with call trace,
and too often the problems were already happening before that.
The XFS FAQ has an explicit "what to include in a report" entry that may
serve as a guide to adapt for Btrfs reports.
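In the meantime, something along these lines is what usually gets asked for
on the list anyway (the mount point is just an example):

    $ uname -a                  # kernel version
    $ btrfs --version           # btrfs-progs version
    $ sudo btrfs fi show
    $ sudo btrfs fi df /mnt
    $ dmesg | tail -n 200       # or the full log around the failure

Capturing the log from well before the first BUG/WARNING is the part most
reports miss.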
--
Chris Murphy
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-22 19:15 ` Chris Murphy
@ 2015-05-23 2:56 ` Duncan
From: Duncan @ 2015-05-23 2:56 UTC (permalink / raw)
To: linux-btrfs
Chris Murphy posted on Fri, 22 May 2015 13:15:09 -0600 as excerpted:
> On Thu, May 21, 2015 at 10:43 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> For in-production use, therefore, btrfs raid56 mode, while now at least
>> in theory complete, is really too immature at this point to recommend.
>
> At some point perhaps a developer will have time to state the expected
> stability level on stable hardware. And what things should be included
> in a complete report. I see many reports only including the bug/ Warning
> with call trace. And too often problems were happening before that.
>
> The XFS FAQ has an explicit "what to include in a report" other that may
> serve as a guide to adapt for Btrfs reports.
There's one spot on the wiki (bottom of the btrfs mailing lists page) that
lists the information to provide when filing a bug.
https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list
But, even being somewhat familiar with the wiki and knowing it was, or had
been, somewhere on the wiki, I had trouble finding it. It's definitely
not in the first place I looked, the Problem FAQ, under How do I report
bugs and issues? (Tho it does link to the list page.)
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#How_do_I_report_bugs_and_issues.3F
If I had ever gotten around to getting a wiki login, I'd fix that. But for
some reason, while I seem to be fine posting to newsgroups and mailing lists
(as newsgroups, via gmane.org's list2news service), I mostly treat the web,
wikis included, as read-only, other than the occasional reply to an article.
I never got into web forums that much either.
So if you have a wiki login and time to fix it... =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-22 18:11 ` Jan Voet
@ 2015-05-23 15:02 ` Jan Voet
2015-06-20 3:50 ` Russell Coker
From: Jan Voet @ 2015-05-23 15:02 UTC (permalink / raw)
To: linux-btrfs
Jan Voet <jan.voet <at> gmail.com> writes:
>
> Duncan <1i5t5.duncan <at> cox.net> writes:
>
> > FWIW, btrfs raid5 (and raid6, together called raid56 mode) is still
> > extremely new. As originally introduced, only normal runtime operation was
> > implemented; complete repair after a device failure was only finished in
> > kernel 3.19. While in theory complete, that implementation is still very
> > immature and poorly tested, and *WILL* have bugs, one of which you may very
> > well have found.
> >
> > For in-production use, therefore, btrfs raid56 mode, while now at least
> > in theory complete, is really too immature at this point to recommend.
> > I'd recommend either btrfs raid1 or raid10 modes as more stable within
> > btrfs at this point, tho by the end of this year or early next, I predict
> > raid56 mode to have stabilized to about that of the rest of btrfs, which
> > is to say, not entirely stable, but heading that way.
> >
>
Looks like the btrfs raid5 filesystem is back in working order.
What actually happened was that on each reboot of the server the interrupted
btrfs balance tried to resume, but couldn't because of an incorrect/invalid
state. The number of errors this spawned made it very difficult to diagnose,
as the kernel log got truncated very quickly.
Doing a 'btrfs balance cancel' immediately after the array was mounted seems
to have done the trick. A subsequent 'btrfs check' didn't show any errors
at all and all the data seems to be there. :-)
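For anyone who runs into the same thing, the sequence that worked here was
roughly this (the mount point is just an example):

    $ sudo mount /dev/sdb /mnt
    $ sudo btrfs balance cancel /mnt    # stop the auto-resumed balance
    $ sudo umount /mnt
    $ sudo btrfs check /dev/sdb         # read-only check

The cancel has to be issued quickly after mounting, before the auto-resumed
balance gets far enough to wedge the system again.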
Kind regards,
Jan
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-23 15:02 ` Jan Voet
@ 2015-06-20 3:50 ` Russell Coker
From: Russell Coker @ 2015-06-20 3:50 UTC (permalink / raw)
To: Jan Voet; +Cc: linux-btrfs
On Sun, 24 May 2015 01:02:21 AM Jan Voet wrote:
> Doing a 'btrfs balance cancel' immediately after the array was mounted
> seems to have done the trick. A subsequent 'btrfs check' didn't show any
> errors at all and all the data seems to be there. :-)
I add "rootflags=skip_balance" to the kernel command-line of all my Debian
systems to solve this. I've had problems with the balance resuming in the
past which had similar results. I've also never seen a situation where
resuming the balance did any good.
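Concretely, on a GRUB-based Debian setup that is something like one line in
/etc/default/grub for the root filesystem, plus optionally the same mount
option in /etc/fstab for any non-root btrfs filesystems (the UUID below is
just the one from the fi show output earlier, as an example):

    # /etc/default/grub -- append to whatever options are already there,
    # then run update-grub
    GRUB_CMDLINE_LINUX="rootflags=skip_balance"

    # /etc/fstab entry for a non-root btrfs filesystem
    UUID=d278e7df-e26d-4a9b-99fb-71fbef819dd1  /data  btrfs  defaults,skip_balance  0 0

With skip_balance, an interrupted balance stays paused after boot until you
explicitly resume or cancel it.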
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/