* BTRFS RAID5 filesystem corruption during balance
@ 2015-05-21 21:43 Jan Voet
2015-05-22 4:43 ` Duncan
From: Jan Voet @ 2015-05-21 21:43 UTC (permalink / raw)
To: linux-btrfs
Hi,
I recently upgraded a fairly old home NAS system (Celeron M based) to Ubuntu
14.04 with a newer Linux kernel (3.19.8) and btrfs-progs v3.17. The system
has five brand new 6TB HGST drives, all handled directly by BTRFS, with both
data and metadata in RAID5.
After loading the system with 12.5TB of data (which took some time :-) ), I
started a btrfs balance to see how it would behave. Three days in, with 48%
still to go, the system locked up: it no longer responded to ssh or the USB
keyboard, and the VGA output was dead. Only pings (IP/ICMP Echo
Request/Reply) still worked, so the kernel IP stack was alive, but nothing
else responded and there was no disk activity at all.
So I did a hard reset, hoping the balance would resume on restart. It did
appear to resume, but showed only a few extents remaining (11 or so, instead
of the 3000+ shown originally), and after a short time it seemed to have
completed the balance ???
The result, however, seems to be a mess: the filesystem gets remounted
read-only after a few minutes, with lots of btrfs-related stack dumps in the
kernel log. Rebooting doesn't seem to help; it always ends up in the same
situation after some time.
The data is still visible, but I'm a bit at a loss as to how I should
continue. Any advice would be welcome.
Some data:
$ sudo btrfs fi show /dev/sdb
Label: none uuid: d278e7df-e26d-4a9b-99fb-71fbef819dd1
Total devices 5 FS bytes used 11.58TiB
devid 1 size 5.46TiB used 2.92TiB path /dev/sdb
devid 2 size 5.46TiB used 2.92TiB path /dev/sdc
devid 3 size 5.46TiB used 2.92TiB path /dev/sdd
devid 4 size 5.46TiB used 2.92TiB path /dev/sde
devid 5 size 5.46TiB used 2.92TiB path /dev/sdf
Btrfs v3.17
One of the stack dumps:
[ 328.224417] ------------[ cut here ]------------
[ 328.224446] WARNING: CPU: 0 PID: 1633 at
/home/kernel/COD/linux/fs/btrfs/disk-io.c:513 csum_dirty_buffer+0x6f/0xa0
[btrfs]()
[ 328.224448] Modules linked in: ppdev i915 video net2280 udc_core
drm_kms_helper lpc_ich drm serio_raw shpchp i2c_algo_bit 8250_fintek
parport_pc mac_hid lp parport btrfs xor raid6_pq hid_generic usbhid sata_mv
e1000 pata_acpi floppy hid
[ 328.224473] CPU: 0 PID: 1633 Comm: kworker/u2:12 Tainted: G W
3.19.8-031908-generic #201505110938
[ 328.224476] Hardware name: /i854GML-LPC47M182, BIOS 6.00 PG 06/21/2007
[ 328.224508] Workqueue: btrfs-worker btrfs_worker_helper [btrfs]
[ 328.224510] 00000000 00000000 c0ae5e40 c16e4a4d 00000000 c0ae5e70
c106250e c1907948
[ 328.224518] 00000000 00000661 f89c3444 00000201 f893142f f893142f
d6f3a8f0 f72b1ac8
[ 328.224525] f6d5d800 c0ae5e80 c1062572 00000009 00000000 c0ae5e9c
f893142f 187ced34
[ 328.224532] Call Trace:
[ 328.224537] [<c16e4a4d>] dump_stack+0x41/0x52
[ 328.224541] [<c106250e>] warn_slowpath_common+0x8e/0xd0
[ 328.224570] [<f893142f>] ? csum_dirty_buffer+0x6f/0xa0 [btrfs]
[ 328.224598] [<f893142f>] ? csum_dirty_buffer+0x6f/0xa0 [btrfs]
[ 328.224603] [<c1062572>] warn_slowpath_null+0x22/0x30
[ 328.224631] [<f893142f>] csum_dirty_buffer+0x6f/0xa0 [btrfs]
[ 328.224660] [<f893149f>] btree_csum_one_bio.isra.121+0x3f/0x50 [btrfs]
[ 328.224688] [<f89314c3>] __btree_submit_bio_start+0x13/0x20 [btrfs]
[ 328.224715] [<f892f81d>] run_one_async_start+0x3d/0x60 [btrfs]
[ 328.224750] [<f896e2b2>] normal_work_helper+0x62/0x180 [btrfs]
[ 328.224778] [<f8930630>] ? __btree_submit_bio_done+0x50/0x50 [btrfs]
[ 328.224812] [<f896e3e0>] btrfs_worker_helper+0x10/0x20 [btrfs]
[ 328.224817] [<c1077cb1>] process_one_work+0x121/0x3a0
[ 328.224822] [<c16f057c>] ? apic_timer_interrupt+0x34/0x3c
[ 328.224826] [<c107854d>] worker_thread+0xed/0x390
[ 328.224831] [<c1099fbf>] ? __wake_up_locked+0x1f/0x30
[ 328.224835] [<c1078460>] ? create_worker+0x1b0/0x1b0
[ 328.224840] [<c107d09b>] kthread+0x9b/0xb0
[ 328.224845] [<c16efb81>] ret_from_kernel_thread+0x21/0x30
[ 328.224850] [<c107d000>] ? flush_kthread_worker+0x80/0x80
[ 328.224853] ---[ end trace e8386011b87476a4 ]---
There are plenty more of those, as well as other messages such as:
[ 329.354420] BTRFS: error (device sdf) in btrfs_run_delayed_refs:2792:
errno=-5 IO failure
[ 329.354522] BTRFS info (device sdf): forced readonly
[ 476.620532] perf interrupt took too long (2512 > 2500), lowering
kernel.perf_event_max_sample_rate to 50000
[ 549.412065] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.425057] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.425415] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.425641] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.425655] BTRFS info (device sdf): no csum found for inode 15963 start 0
[ 549.425943] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.426154] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.426165] BTRFS info (device sdf): no csum found for inode 15963 start 4096
[ 549.426443] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.426653] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.426663] BTRFS info (device sdf): no csum found for inode 15963 start 8192
[ 549.426944] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.427153] BTRFS (device sdf): bad tree block start 17003380002271197777
19274981785600
[ 549.427163] BTRFS info (device sdf): no csum found for inode 15963 start
12288
[ 549.427655] BTRFS info (device sdf): no csum found for inode 15963 start
16384
[ 549.428447] BTRFS info (device sdf): no csum found for inode 15963 start
20480
[ 549.429175] BTRFS info (device sdf): no csum found for inode 15963 start
24576
.....
I can provide more info on request, and don't mind trying out different
things (the data was fully backed up before I started this experiment).
Kind regards,
Jan
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-21 21:43 BTRFS RAID5 filesystem corruption during balance Jan Voet
@ 2015-05-22 4:43 ` Duncan
2015-05-22 18:11 ` Jan Voet
2015-05-22 19:15 ` Chris Murphy
From: Duncan @ 2015-05-22 4:43 UTC (permalink / raw)
To: linux-btrfs
Jan Voet posted on Thu, 21 May 2015 21:43:36 +0000 as excerpted:
> I recently upgraded a quite old home NAS system (Celeron M based) to
> Ubuntu 14.04 with an upgraded linux kernel (3.19.8) and BTRFS tools
> v3.17.
> This system has 5 brand new 6TB drives (HGST) with all drives directly
> handled by BTRFS, both data and metadata in RAID5.
FWIW, btrfs raid5 (and raid6, together called raid56 mode) is still
extremely new. As originally introduced, only normal runtime operation was
implemented; complete repair after a device failure was only finished in
kernel 3.19. While in theory complete, that implementation is still very
immature and poorly tested, and *WILL* have bugs, one of which you may very
well have found.
For in-production use, therefore, btrfs raid56 mode, while now at least
in theory complete, is really too immature at this point to recommend.
I'd recommend either btrfs raid1 or raid10 modes as more stable within
btrfs at this point, tho by the end of this year or early next, I predict
raid56 mode to have stabilized to about that of the rest of btrfs, which
is to say, not entirely stable, but heading that way.
IOW, for btrfs in general, the sysadmin's backup rule applies even more
strongly than it does to more stable filesystems: if you don't have backups,
by definition you don't care about the data, regardless of claims to the
contrary, and untested would-be backups aren't backups until you've verified
they can be read and restored from. Keeping up with current kernels is also
still very important, since by doing so you avoid known and already-fixed
bugs. Given those constraints, btrfs is /in/ /general/ usable. But not yet
raid56 mode, which I'd definitely consider to still be breakable at any time.
So certainly for the multiple TB of data you're dealing with, which you say
yourself takes some time to load (and is thus not something you can afford to
back up and restore trivially), I'd say stay off btrfs raid56 until around
the end of the year or early next, by which point it should have stabilized.
Until then, consider either btrfs raid1 mode (which I use) or, for that
amount of data, more likely btrfs raid10 mode.
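For reference, switching an existing (healthy) filesystem between profiles is
just a filtered balance. A minimal sketch, assuming it is mounted at /mnt and
has enough unallocated space to carry the new profile:

    $ sudo btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
    $ sudo btrfs balance status /mnt    # follow progress from another shell

The convert filters rewrite every chunk into the new profile (use raid1
instead of raid10 if preferred), so on 12.5TB expect it to run for a long
time.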
Or, if you must keep raid5 due to device and data size limitations, consider
sticking with mdraid5 or similar for now, potentially with btrfs on top, or
perhaps with the more stable xfs or ext3/4 (or my favorite, reiserfs, which I
have found /extremely/ reliable here even on less than absolutely reliable
hardware; the old tales about it being unreliable date from pre-data=ordered
times, early kernel 2.4 era and thus rather ancient history now, but as they
say, YMMV...).
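For completeness, a rough sketch of that layered setup; the device names are
only examples, and the initial mdraid sync on drives this size will itself
take a long time:

    $ sudo mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]
    $ sudo mkfs.btrfs /dev/md0          # btrfs sees just one device
    $ sudo mount /dev/md0 /mnt

Btrfs then keeps its checksumming and snapshots, while the parity handling
stays with the much older and better-tested md code.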
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-22 4:43 ` Duncan
@ 2015-05-22 18:11 ` Jan Voet
2015-05-23 15:02 ` Jan Voet
2015-05-22 19:15 ` Chris Murphy
From: Jan Voet @ 2015-05-22 18:11 UTC (permalink / raw)
To: linux-btrfs
Duncan <1i5t5.duncan <at> cox.net> writes:
> FWIW, btrfs raid5 (and raid6, together called raid56 mode) is still
> extremely new. As originally introduced, only normal runtime operation was
> implemented; complete repair after a device failure was only finished in
> kernel 3.19. While in theory complete, that implementation is still very
> immature and poorly tested, and *WILL* have bugs, one of which you may very
> well have found.
>
> For in-production use, therefore, btrfs raid56 mode, while now at least
> in theory complete, is really too immature at this point to recommend.
> I'd recommend either btrfs raid1 or raid10 modes as more stable within
> btrfs at this point, tho by the end of this year or early next, I predict
> raid56 mode to have stabilized to about that of the rest of btrfs, which
> is to say, not entirely stable, but heading that way.
>
Hi Duncan,
Thanks for your reply.
I was under the impression that RAID5/6 was considered quite stable in more
recent kernels, hence my use of the 3.19 kernel and the newer btrfs-progs.
Clearly that assumption was wrong, and perhaps btrfs RAID5 should be labeled
as experimental code.
A balance operation is supposed to be safe: it copies the existing data,
rewriting it and redistributing it over all devices, and only then frees the
original, right? That should never lead to kernel deadlocks ...
Ending up with a corrupted filesystem after a reboot because of this is even
more worrisome, I think. And worst of all are the btrfs kworker crashes;
kernel code should never crash IMHO, but maybe I'm slightly naive here ;-) .
Anyway, lots of lessons learned. I'll see if I can repair the filesystem as
described in https://btrfs.wiki.kernel.org/index.php/Btrfsck
If that doesn't work, I'll simply start over with an alternative filesystem.
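The rough plan, with the filesystem unmounted (the wiki is clear that
--repair should only ever be a last resort, so a read-only pass first):

    $ sudo btrfs check /dev/sdb             # read-only pass first
    $ sudo btrfs check --repair /dev/sdb    # last resort only, per the wiki

Running the read-only check first should at least show how bad the damage
actually is before anything gets rewritten.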
Regards,
Jan
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-22 4:43 ` Duncan
2015-05-22 18:11 ` Jan Voet
@ 2015-05-22 19:15 ` Chris Murphy
2015-05-23 2:56 ` Duncan
From: Chris Murphy @ 2015-05-22 19:15 UTC (permalink / raw)
To: Btrfs BTRFS
On Thu, May 21, 2015 at 10:43 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> For in-production use, therefore, btrfs raid56 mode, while now at least
> in theory complete, is really too immature at this point to recommend.
At some point perhaps a developer will have time to state the expected
stability level on stable hardware, and what should be included in a complete
report. I see many reports that include only the BUG/WARNING with call trace,
and too often the problems were already happening before that.
The XFS FAQ has an explicit "what to include in a report" entry that may
serve as a guide to adapt for Btrfs reports.
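In the meantime, something along these lines is what usually gets asked for
on the list anyway (the mount point is just an example):

    $ uname -a                  # kernel version
    $ btrfs --version           # btrfs-progs version
    $ sudo btrfs fi show
    $ sudo btrfs fi df /mnt
    $ dmesg | tail -n 200       # or the full log around the failure

Capturing the log from well before the first BUG/WARNING is the part most
reports miss.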
--
Chris Murphy
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-22 19:15 ` Chris Murphy
@ 2015-05-23 2:56 ` Duncan
From: Duncan @ 2015-05-23 2:56 UTC (permalink / raw)
To: linux-btrfs
Chris Murphy posted on Fri, 22 May 2015 13:15:09 -0600 as excerpted:
> On Thu, May 21, 2015 at 10:43 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> For in-production use, therefore, btrfs raid56 mode, while now at least
>> in theory complete, is really too immature at this point to recommend.
>
> At some point perhaps a developer will have time to state the expected
> stability level on stable hardware. And what things should be included
> in a complete report. I see many reports only including the bug/ Warning
> with call trace. And too often problems were happening before that.
>
> The XFS FAQ has an explicit "what to include in a report" other that may
> serve as a guide to adapt for Btrfs reports.
There's one spot on the wiki (bottom of the btrfs mailing lists page) that
lists the information to provide when filing a bug.
https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list
But, even being somewhat familiar with the wiki and knowing it was, or had
been, somewhere on the wiki, I had trouble finding it. It's definitely
not in the first place I looked, the Problem FAQ, under How do I report
bugs and issues? (Tho it does link to the list page.)
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#How_do_I_report_bugs_and_issues.3F
If I had ever gotten around to getting a wiki login, I'd fix that. But for
some reason, while I seem to be fine posting to newsgroups and mailing lists
(as newsgroups, via gmane.org's list2news service), I mostly treat the web,
wikis included, as read-only, other than the occasional reply to an article.
I never got into web forums that much either.
So if you have a wiki login and time to fix it... =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-22 18:11 ` Jan Voet
@ 2015-05-23 15:02 ` Jan Voet
2015-06-20 3:50 ` Russell Coker
From: Jan Voet @ 2015-05-23 15:02 UTC (permalink / raw)
To: linux-btrfs
Jan Voet <jan.voet <at> gmail.com> writes:
>
> Duncan <1i5t5.duncan <at> cox.net> writes:
>
> > FWIW, btrfs raid5 (and raid6, together called raid56 mode) is still
> > extremely new. As originally introduced, only normal runtime operation was
> > implemented; complete repair after a device failure was only finished in
> > kernel 3.19. While in theory complete, that implementation is still very
> > immature and poorly tested, and *WILL* have bugs, one of which you may very
> > well have found.
> >
> > For in-production use, therefore, btrfs raid56 mode, while now at least
> > in theory complete, is really too immature at this point to recommend.
> > I'd recommend either btrfs raid1 or raid10 modes as more stable within
> > btrfs at this point, tho by the end of this year or early next, I predict
> > raid56 mode to have stabilized to about that of the rest of btrfs, which
> > is to say, not entirely stable, but heading that way.
> >
>
Looks like the btrfs raid5 filesystem is back in working order.
What actually happened was that on each reboot of the server the interrupted
btrfs balance tried to resume, but couldn't because of an incorrect/invalid
state. The number of errors this spawned made it very difficult to diagnose,
as the kernel log got truncated very quickly.
Doing a 'btrfs balance cancel' immediately after the array was mounted seems
to have done the trick. A subsequent 'btrfs check' didn't show any errors
at all and all the data seems to be there. :-)
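For anyone who runs into the same thing, the sequence that worked here was
roughly this (the mount point is just an example):

    $ sudo mount /dev/sdb /mnt
    $ sudo btrfs balance cancel /mnt    # stop the auto-resumed balance
    $ sudo umount /mnt
    $ sudo btrfs check /dev/sdb         # read-only check

The cancel has to be issued quickly after mounting, before the auto-resumed
balance gets far enough to wedge the system again.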
Kind regards,
Jan
* Re: BTRFS RAID5 filesystem corruption during balance
2015-05-23 15:02 ` Jan Voet
@ 2015-06-20 3:50 ` Russell Coker
From: Russell Coker @ 2015-06-20 3:50 UTC (permalink / raw)
To: Jan Voet; +Cc: linux-btrfs
On Sun, 24 May 2015 01:02:21 AM Jan Voet wrote:
> Doing a 'btrfs balance cancel' immediately after the array was mounted
> seems to have done the trick. A subsequent 'btrfs check' didn't show any
> errors at all and all the data seems to be there. :-)
I add "rootflags=skip_balance" to the kernel command-line of all my Debian
systems to solve this. I've had problems with the balance resuming in the
past which had similar results. I've also never seen a situation where
resuming the balance did any good.
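Concretely, on a GRUB-based Debian setup that is something like one line in
/etc/default/grub for the root filesystem, plus optionally the same mount
option in /etc/fstab for any non-root btrfs filesystems (the UUID below is
just the one from the fi show output earlier, as an example):

    # /etc/default/grub -- append to whatever options are already there,
    # then run update-grub
    GRUB_CMDLINE_LINUX="rootflags=skip_balance"

    # /etc/fstab entry for a non-root btrfs filesystem
    UUID=d278e7df-e26d-4a9b-99fb-71fbef819dd1  /data  btrfs  defaults,skip_balance  0 0

With skip_balance, an interrupted balance stays paused after boot until you
explicitly resume or cancel it.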
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/