BTRFS balance segfault, where to go from here

Linux Btrfs filesystem development

* BTRFS balance segfault, where to go from here
@ 2014-10-27  9:26 Stephan Alz
From: Stephan Alz @ 2014-10-27  9:26 UTC
  To: linux-btrfs

Hello Folks,

I used to have an array of 4x4TB drives with BTRFS in raid10.
The kernel version is: 3.13-0.bpo.1-amd64
BTRFS version is: v3.14.1

When it reached 80% full, I added another 4TB drive to the array with:

> btrfs device add /dev/sdf /mnt/backup

and started a balance to spread the data onto the new drive:

> btrfs filesystem balance /mnt/backup

This ran for 5-6 hours before it aborted with a "No space left" error.
Now my configuration looks like this:

btrfs fi show /mnt/backup
Label: 'backup'  uuid: ...
	Total devices 5 FS bytes used 5.93TiB
	devid    1 size 3.64TiB used 2.82TiB path /dev/sdd
	devid    2 size 3.64TiB used 2.82TiB path /dev/sdc
	devid    3 size 3.64TiB used 2.81TiB path /dev/sdb
	devid    4 size 3.64TiB used 2.82TiB path /dev/sde
	devid    5 size 3.64TiB used 638.50GiB path /dev/sdf

After this crash happened during the balance (logs are attached at the end), the system remounted my /mnt/backup share read-only.
At this point I really started to worry. I unmounted and remounted it manually. At first it ran some self-checks, which took about 5 minutes; then, as iotop showed, it resumed the balance, which failed again the same way. The next time, right after mounting, I immediately paused the balance (which helped).

My question is: where do I go from here? Right now I am copying the most important data to a separate XFS drive.
What I am planning to do is:

1, Upgrade the kernel
2, Upgrade BTRFS
3, Continue the balancing.


Could someone please also explain how exactly raid10 works with an ODD number of drives in btrfs?
Raid10 should be a stripe of mirrors. So is this sdf drive mirrored, striped, or what?
Could some btrfs gurus tell me whether I should be worried about data loss because of this?

Would I need even more free space just to add a 5th drive? If so, how much more?

Kernel logs
-----------


Oct 24 17:25:44 backup kernel: [29396.873750] btrfs: relocating block group 5162588438528 flags 65
Oct 24 17:26:09 backup kernel: [29421.594524] btrfs: found 13126 extents
Oct 24 17:26:38 backup kernel: [29450.769228] btrfs: found 13126 extents
Oct 24 17:26:39 backup kernel: [29451.345198] btrfs: relocating block group 5161514696704 flags 68
Oct 24 17:31:33 backup kernel: [29745.776810] BTRFS debug (device sdb): run_one_delayed_ref returned -28
Oct 24 17:31:33 backup kernel: [29745.776818] ------------[ cut here ]------------
Oct 24 17:31:33 backup kernel: [29745.776847] WARNING: CPU: 1 PID: 1807 at /build/linux-t5aGFh/linux-3.13.10/fs/btrfs/super.c:254 __btrfs_abort_transaction+0x5a/0x140 [btrfs]()
Oct 24 17:31:33 backup kernel: [29745.776849] btrfs: Transaction aborted (error -28)
Oct 24 17:31:33 backup kernel: [29745.776851] Modules linked in: xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc 8021q garp mrp bridge stp llc loop iTCO_wdt iTCO_vendor_support lpc_ich radeon mfd_core processor evdev ttm drm_kms_helper drm i2c_algo_bit coretemp rng_core serio_raw pcspkr i2c_i801 i2c_core i3000_edac thermal_sys button shpchp edac_core ext4 crc16 mbcache jbd2 btrfs xor raid6_pq crc32c libcrc32c dm_mod xen_pciback sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common ata_generic ahci ata_piix libahci 3w_9xxx libata scsi_mod ehci_pci uhci_hcd ehci_hcd e1000e ptp pps_core usbcore usb_common
Oct 24 17:31:33 backup kernel: [29745.776902] CPU: 1 PID: 1807 Comm: btrfs-transacti Not tainted 3.13-0.bpo.1-amd64 #1 Debian 3.13.10-1~bpo70+1
Oct 24 17:31:33 backup kernel: [29745.776905] Hardware name: Supermicro PDSM4+/PDSM4+, BIOS 6.00 02/05/2007
Oct 24 17:31:33 backup kernel: [29745.776907]  0000000000000000 ffffffffa0257130 ffffffff814d16c9 ffff88006a7f3cc8
Oct 24 17:31:33 backup kernel: [29745.776911]  ffffffff81060967 00000000ffffffe4 ffff880004282800 ffff88003b813ec0
Oct 24 17:31:33 backup kernel: [29745.776914]  0000000000000aaa ffffffffa0253b60 ffffffff81060a55 ffffffffa0257260
Oct 24 17:31:33 backup kernel: [29745.776918] Call Trace:
Oct 24 17:31:33 backup kernel: [29745.776926]  [<ffffffff814d16c9>] ? dump_stack+0x41/0x51
Oct 24 17:31:33 backup kernel: [29745.776931]  [<ffffffff81060967>] ? warn_slowpath_common+0x87/0xc0
Oct 24 17:31:33 backup kernel: [29745.776935]  [<ffffffff81060a55>] ? warn_slowpath_fmt+0x45/0x50
Oct 24 17:31:33 backup kernel: [29745.776946]  [<ffffffffa01b73ca>] ? __btrfs_abort_transaction+0x5a/0x140 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.776959]  [<ffffffffa01d2e72>] ? btrfs_run_delayed_refs+0x372/0x530 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.776974]  [<ffffffffa01fa8c3>] ? btrfs_run_ordered_operations+0x213/0x2b0 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.776988]  [<ffffffffa01e2fea>] ? btrfs_commit_transaction+0x5a/0x990 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.777001]  [<ffffffffa01e1345>] ? transaction_kthread+0x1c5/0x240 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.777015]  [<ffffffffa01e1180>] ? open_ctree+0x1ff0/0x1ff0 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.777019]  [<ffffffff8108233c>] ? kthread+0xbc/0xe0
Oct 24 17:31:33 backup kernel: [29745.777022]  [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
Oct 24 17:31:33 backup kernel: [29745.777026]  [<ffffffff814dee4c>] ? ret_from_fork+0x7c/0xb0
Oct 24 17:31:33 backup kernel: [29745.777030]  [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
Oct 24 17:31:33 backup kernel: [29745.777032] ---[ end trace 5de5beb31698a3c1 ]---
Oct 24 17:31:33 backup kernel: [29745.777035] BTRFS error (device sdb) in btrfs_run_delayed_refs:2730: errno=-28 No space left
Oct 24 17:31:33 backup kernel: [29745.777512] BTRFS info (device sdb): forced readonly
Oct 24 17:31:33 backup kernel: [29745.784767] BTRFS debug (device sdb): run_one_delayed_ref returned -28
Oct 24 17:31:33 backup kernel: [29745.784773] BTRFS error (device sdb) in btrfs_run_delayed_refs:2730: errno=-28 No space left
Oct 24 17:35:53 backup kernel: [30005.015967] btrfs: device label backup_fs devid 3 transid 86656 /dev/sdb
Oct 24 17:35:53 backup kernel: [30005.063903] btrfs: disk space caching is enabled
Oct 24 17:43:01 backup kernel: [30433.356660] BTRFS debug (device sdf): unlinked 1 orphans
Oct 24 17:43:01 backup kernel: [30433.395645] btrfs: continuing balance
Oct 24 17:43:02 backup kernel: [30434.395936] btrfs: relocating block group 7434626138112 flags 65
Oct 24 17:43:17 backup kernel: [30449.104022] btrfs: found 8842 extents
Oct 24 17:43:24 backup kernel: [30456.043235] btrfs: found 8834 extents
Oct 24 17:43:24 backup kernel: [30456.580133] btrfs: relocating block group 7223098998784 flags 68
Oct 24 17:48:42 backup kernel: [30774.465707] btrfs: found 37187 extents
Oct 24 17:48:43 backup kernel: [30775.058570] btrfs: relocating block group 6782864850944 flags 68
Oct 24 17:52:16 backup kernel: [30988.070735] BTRFS debug (device sdf): run_one_delayed_ref returned -28
Oct 24 17:52:16 backup kernel: [30988.070742] ------------[ cut here ]------------
Oct 24 17:52:16 backup kernel: [30988.070772] WARNING: CPU: 1 PID: 15920 at /build/linux-t5aGFh/linux-3.13.10/fs/btrfs/super.c:254 __btrfs_abort_transaction+0x5a/0x140 [btrfs]()
Oct 24 17:52:16 backup kernel: [30988.070775] btrfs: Transaction aborted (error -28)
Oct 24 17:52:16 backup kernel: [30988.070776] Modules linked in: xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc 8021q garp mrp bridge stp llc loop iTCO_wdt iTCO_vendor_support lpc_ich radeon mfd_core processor evdev ttm drm_kms_helper drm i2c_algo_bit coretemp rng_core serio_raw pcspkr i2c_i801 i2c_core i3000_edac thermal_sys button shpchp edac_core ext4 crc16 mbcache jbd2 btrfs xor raid6_pq crc32c libcrc32c dm_mod xen_pciback sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common ata_generic ahci ata_piix libahci 3w_9xxx libata scsi_mod ehci_pci uhci_hcd ehci_hcd e1000e ptp pps_core usbcore usb_common
Oct 24 17:52:16 backup kernel: [30988.070828] CPU: 1 PID: 15920 Comm: btrfs-transacti Tainted: G        W    3.13-0.bpo.1-amd64 #1 Debian 3.13.10-1~bpo70+1
Oct 24 17:52:16 backup kernel: [30988.070830] Hardware name: Supermicro PDSM4+/PDSM4+, BIOS 6.00 02/05/2007
Oct 24 17:52:16 backup kernel: [30988.070833]  0000000000000000 ffffffffa0257130 ffffffff814d16c9 ffff880056d7bcc8
Oct 24 17:52:16 backup kernel: [30988.070838]  ffffffff81060967 00000000ffffffe4 ffff880003c97000 ffff88006ba9abe0
Oct 24 17:52:16 backup kernel: [30988.070841]  0000000000000aaa ffffffffa0253b60 ffffffff81060a55 ffffffffa0257260
Oct 24 17:52:16 backup kernel: [30988.070845] Call Trace:
Oct 24 17:52:16 backup kernel: [30988.070853]  [<ffffffff814d16c9>] ? dump_stack+0x41/0x51
Oct 24 17:52:16 backup kernel: [30988.070858]  [<ffffffff81060967>] ? warn_slowpath_common+0x87/0xc0
Oct 24 17:52:16 backup kernel: [30988.070862]  [<ffffffff81060a55>] ? warn_slowpath_fmt+0x45/0x50
Oct 24 17:52:16 backup kernel: [30988.070873]  [<ffffffffa01b73ca>] ? __btrfs_abort_transaction+0x5a/0x140 [btrfs]
Oct 24 17:52:16 backup kernel: [30988.070886]  [<ffffffffa01d2e72>] ? btrfs_run_delayed_refs+0x372/0x530 [btrfs]
Oct 24 17:52:16 backup kernel: [30988.070901]  [<ffffffffa01fa8c3>] ? btrfs_run_ordered_operations+0x213/0x2b0 [btrfs]
Oct 24 17:52:16 backup kernel: [30988.070915]  [<ffffffffa01e2fea>] ? btrfs_commit_transaction+0x5a/0x990 [btrfs]
Oct 24 17:52:16 backup kernel: [30988.070929]  [<ffffffffa01e1345>] ? transaction_kthread+0x1c5/0x240 [btrfs]
Oct 24 17:52:16 backup kernel: [30988.070942]  [<ffffffffa01e1180>] ? open_ctree+0x1ff0/0x1ff0 [btrfs]
Oct 24 17:52:16 backup kernel: [30988.070946]  [<ffffffff8108233c>] ? kthread+0xbc/0xe0
Oct 24 17:52:16 backup kernel: [30988.070949]  [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
Oct 24 17:52:16 backup kernel: [30988.070954]  [<ffffffff814dee4c>] ? ret_from_fork+0x7c/0xb0
Oct 24 17:52:16 backup kernel: [30988.070957]  [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
Oct 24 17:52:16 backup kernel: [30988.070960] ---[ end trace 5de5beb31698a3c2 ]---
Oct 24 17:52:16 backup kernel: [30988.070963] BTRFS error (device sdf) in btrfs_run_delayed_refs:2730: errno=-28 No space left
Oct 24 17:52:16 backup kernel: [30988.071439] BTRFS info (device sdf): forced readonly
Oct 24 17:52:16 backup kernel: [30988.081154] BTRFS debug (device sdf): run_one_delayed_ref returned -28
Oct 24 17:52:16 backup kernel: [30988.081161] BTRFS error (device sdf) in btrfs_run_delayed_refs:2730: errno=-28 No space left
Oct 24 17:55:34 backup kernel: [31186.936384] btrfs: device label backup_fs devid 3 transid 86683 /dev/sdb
Oct 24 17:55:35 backup kernel: [31187.067619] btrfs: disk space caching is enabled
Oct 24 18:01:23 backup kernel: [31535.301582] BTRFS debug (device sdf): unlinked 1 orphans
Oct 24 18:01:23 backup kernel: [31535.339410] btrfs: continuing balance
Oct 24 18:01:23 backup kernel: [31535.624023] btrfs: relocating block group 7438921105408 flags 68
Oct 24 18:02:37 backup kernel: [31609.293378] btrfs: found 26705 extents
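The "flags 65" and "flags 68" values in the relocation messages above, and the "-28" in the abort messages, can be decoded mechanically. This is a minimal sketch that assumes the block-group bit values from the kernel's btrfs headers of that era (BTRFS_BLOCK_GROUP_DATA = 1, SYSTEM = 2, METADATA = 4, RAID0 = 8, RAID1 = 16, DUP = 32, RAID10 = 64, RAID5 = 128, RAID6 = 256); decode_flags and decode_errno are hypothetical helper names, not part of btrfs-progs.

```python
import errno
import os
from typing import List

# Block-group type/profile bits, assumed to match the BTRFS_BLOCK_GROUP_*
# constants in the kernel's btrfs headers (ctree.h) of the 3.x era.
BLOCK_GROUP_BITS = [
    (1 << 0, "DATA"),
    (1 << 1, "SYSTEM"),
    (1 << 2, "METADATA"),
    (1 << 3, "RAID0"),
    (1 << 4, "RAID1"),
    (1 << 5, "DUP"),
    (1 << 6, "RAID10"),
    (1 << 7, "RAID5"),
    (1 << 8, "RAID6"),
]


def decode_flags(flags: int) -> List[str]:
    """Return the names of all block-group bits set in `flags`."""
    return [name for bit, name in BLOCK_GROUP_BITS if flags & bit]


def decode_errno(ret: int) -> str:
    """Turn a negative kernel return code such as -28 into 'NAME (message)'."""
    code = -ret
    return "%s (%s)" % (errno.errorcode.get(code, "?"), os.strerror(code))


if __name__ == "__main__":
    for flags in (65, 68):
        print(flags, decode_flags(flags))
    print(-28, decode_errno(-28))
```

Under those assumptions, 65 is a raid10 data block group, 68 is a raid10 metadata block group, and -28 is ENOSPC. In both runs logged above, the last block group being relocated before the transaction abort had flags 68.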


Thanks!

* Re: BTRFS balance segfault, where to go from here
@ 2014-10-27 16:51 ` Chris Murphy
From: Chris Murphy @ 2014-10-27 16:51 UTC
  To: Stephan Alz; +Cc: linux-btrfs

On Oct 27, 2014, at 3:26 AM, Stephan Alz <stephan008@gmx.com> wrote:
> 
> My question is where to go from here? What I going to do right now is to copy the most important data to another separated XFS drive.
> What I planning to do is:
> 
> 1, Upgrade the kernel
> 2, Upgrade BTRFS
> 3, Continue the balancing.

Definitely upgrade the kernel and see how that goes; there have been many, many changes since 3.13. I would upgrade the user space tools as well, but that's not as important.

FYI, you can mount with the skip_balance mount option to keep the balance from resuming; sometimes pausing the balance isn't fast enough when there are balance problems.

> 
> 
> Could someone please also explain that how is exactly the raid10 setup works with ODD number of drives with btrfs? 
> Raid10 should be a stripe of mirrors. Now then this sdf drive is mirrored or striped or what?

I have no idea, honestly. Btrfs is very tolerant of adding odd numbers and sizes of devices, but things sometimes get a bit nutty in actual operation. This might be one of those cases, because traditionally raid10 always uses an even number of drives; odd numbers just don't make sense. But Btrfs allows the addition; I think the expectation is that you'd have added two drives before doing the balance, though.

> Some btrfs gurus could tell me that should I be worried of dataloss because of this or not?

Anything is possible, so hopefully you have backups. My expectation is that in the worst case the fs gets confused and you can't mount it rw anymore, in which case you won't be able to make it an even-drive raid10. But even then, with it mounted ro, you can update your backups, blow away the Btrfs volume and start from scratch with an even number of drives, right?

> Would I need even more free space just to add a 5th drive? If so how much more?

Gonna guess you'd need to add a drive that's at least 2.83TiB in size if you want to keep it raid10.

Chris Murphy

* Re: BTRFS balance segfault, where to go from here
@ 2014-10-28  0:07 ` Duncan
From: Duncan @ 2014-10-28  0:07 UTC
  To: linux-btrfs

Chris Murphy posted on Mon, 27 Oct 2014 10:51:16 -0600 as excerpted:

> On Oct 27, 2014, at 3:26 AM, Stephan Alz <stephan008@gmx.com> wrote:
>> 
>> My question is where to go from here? What I going to do right now is
>> to copy the most important data to another separated XFS drive.
>> What I planning to do is:
>> 
>> 1, Upgrade the kernel
>> 2, Upgrade BTRFS
>> 3, Continue the balancing.
> 
> Definitely upgrade the kernel and see how that goes, there's been many
> many changes since 3.13. I would upgrade the user space tools also but
> that's not as important.

Just emphasizing...

Because btrfs is still under heavy development and not yet fully stable, 
keeping the kernel in particular updated is vital: running an old kernel 
often means running a kernel with known btrfs bugs that have been fixed in 
newer kernels.

The userspace isn't quite as important, since under normal operation it 
mostly just tells the kernel what operations to perform, and an older 
userspace simply means you might be missing newer features.  However, 
commands such as btrfs check (the old btrfsck) and btrfs restore work 
from userspace, so having a current btrfs-progs is important when you run 
into trouble and you're trying to fix things.

That said, a couple of recent kernels have known issues.  Don't use the 
3.15 series at all, and be sure you're on 3.16.3 or newer for the 3.16 
series.  3.17 introduced another bug, with the fix hopefully in 3.17.2 
(it didn't make 3.17.1) and in 3.18-rcs.

So 3.16.3 or later for stable kernel, or the latest 3.18-rc or live-git 
kernel, is what I'd recommend.  The other alternative if you're really 
conservative is the latest long-term stable series kernel, 3.14.x, as it 
gets critical bugfixes as well, tho it won't be quite as current as 
3.16.x or 3.18-rc.  But anything older than the latest 3.14.x stable 
series is old and outdated in btrfs terms, and is thus not recommended.  
And 3.15, 3.16 before 3.16.3, and 3.17 before 3.17.2 (hopefully), are 
blackout versions due to known btrfs bugs.  Avoid them.
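The version advice above can be written down as a small rule table. This is a hypothetical helper that encodes only what this message says (as of late October 2014); it treats any 3.14.x as acceptable because it cannot know which 3.14.x is the latest, and it is not an official compatibility list.

```python
import re
from typing import Tuple


def parse_version(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '3.16.3', '3.18-rc2'
    or Debian's '3.13-0.bpo.1-amd64'; a missing patch level counts as 0."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def kernel_verdict(release: str) -> str:
    """Classify a kernel per the Oct 2014 advice: 'avoid', 'outdated' or 'ok'."""
    major, minor, patch = parse_version(release)
    series = (major, minor)
    if series == (3, 15):
        return "avoid"      # the whole 3.15 series
    if series == (3, 16) and patch < 3:
        return "avoid"      # 3.16 before 3.16.3
    if series == (3, 17) and patch < 2:
        return "avoid"      # 3.17 before 3.17.2 (fix hoped for there)
    if series < (3, 14):
        return "outdated"   # older than the 3.14 long-term series
    return "ok"             # 3.14.x, 3.16.3+, 3.17.2+, 3.18-rc


for rel in ("3.13-0.bpo.1-amd64", "3.14.22", "3.15.10", "3.16.2",
            "3.16.3", "3.17.1", "3.18-rc2"):
    print(rel, kernel_verdict(rel))
```

The Debian 3.13 backports kernel from the original report falls into the "outdated" bucket under these rules.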

Of course with btrfs still not fully stable, the usual sysadmin rule of 
thumb that if you don't have a tested backup you don't have a backup, and 
if you don't have a backup, by definition you don't care if you lose the 
data, applies more than ever.  If you're on not-yet-fully-stable btrfs 
and you don't have backups, by definition you don't care if you lose that 
data.  There are people learning that the hard way, tho btrfs 
restore can often recover at least some of what would otherwise be lost.

> FYI you can mount with skip_balance mount option to inhibit resuming
> balance, sometimes pausing the balance isn't fast enough when there are
> balance problems.

=:^)

>> Could someone please also explain that how is exactly the raid10 setup
>> works with ODD number of drives with btrfs?
>> Raid10 should be a stripe of mirrors. Now then this sdf drive is
>> mirrored or striped or what?
> 
> I have no idea honestly. Btrfs is very tolerant of adding odd number and
> sizes of devices, but things get a bit nutty in actual operation
> sometimes.

In btrfs, raid1, including the raid1 side of raid10, is defined as 
exactly two copies of the data, one on each of two different devices.  
These copies are allocated by chunk size, 1 GiB size for data, quarter 
GiB size for metadata, and chunks are normally allocated on the device 
with the most unallocated space available, provided the other constraints 
(such as not putting both copies on the same device) are met.

Btrfs raid0 stripes will be as wide as possible, but again are allocated 
a chunk at a time, in sub-chunk-size strips.

While I've not run btrfs raid10 personally and thus (as a sysadmin not a 
dev) can't say for sure, what this implies to me is that, assuming equal 
sized devices, an odd number of devices in raid10 will alternate skipping 
one device at each chunk allocation.

So with a five same-size device btrfs raid10, if I'm not mistaken, btrfs 
will allocate chunks from four at once, two mirrors, two stripes, with 
the fifth one unused for that chunk allocation.  However, at the next 
chunk allocation, the device skipped in the previous allocation will now 
have the most free space and will thus get the first allocation, with 
one of the other four devices skipped in that allocation round.  After
five allocation rounds (assuming all allocation rounds were 1 GiB data 
chunks, not quarter-GiB metadata), usage should thus be balanced across 
all five devices.

Of course with six same-size devices, because btrfs raid1 does exactly 
two copies, no more, each stripe will be three devices wide.
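The allocation pattern described above can be checked with a small simulation. This is a simplified model under the assumptions stated in this message (equal-sized devices, one fixed-size slice per device per chunk, devices with the most unallocated space chosen first, an even number of devices because raid10 keeps exactly two copies, at least four devices); it is not the kernel's allocator, and pairing consecutive chosen devices as mirrors is an assumption of the sketch.

```python
from typing import List, Tuple


def allocate_chunk(free: List[int]) -> List[Tuple[int, int]]:
    """Allocate one raid10 chunk under the simplified model.

    free[i] is the unallocated space on device i, counted in slices (one
    slice per device per chunk). Returns the mirror pairs used by the chunk.
    """
    # Devices with room left, most unallocated space first; sorted() is
    # stable even with reverse=True, so ties go to the lower device index.
    order = sorted((i for i, f in enumerate(free) if f > 0),
                   key=lambda i: free[i], reverse=True)
    # Exactly two copies per stripe, so use an even number of devices.
    width = len(order) - len(order) % 2
    if width < 4:
        raise RuntimeError("no space: fewer than four devices have room")
    chosen = order[:width]
    for i in chosen:
        free[i] -= 1
    # Assumption of this sketch: consecutive chosen devices mirror each other.
    return [(chosen[k], chosen[k + 1]) for k in range(0, width, 2)]


def simulate(n_devices: int, rounds: int, size: int = 100):
    """Run `rounds` chunk allocations on `n_devices` equal devices."""
    free = [size] * n_devices
    chunks = [allocate_chunk(free) for _ in range(rounds)]
    used = [size - f for f in free]
    return used, chunks


used5, chunks5 = simulate(5, 5)
print("5 devices, slices used per device:", used5)
print("stripes per chunk:", [len(c) for c in chunks5])

used6, chunks6 = simulate(6, 1)
print("6 devices, stripes per chunk:", len(chunks6[0]))

# Each mirror pair sits on two different devices, so in this model losing
# any single device still leaves one copy of every stripe.
for chunk in chunks5:
    for a, b in chunk:
        assert a != b
```

In this model, with five devices each chunk uses four of them (two stripes, each mirrored), and after five chunks every device has been used four times. With six devices a chunk is three stripes wide. Every mirror pair spans two different devices, which is why, in the model at least, a raid10 chunk survives the loss of any one device.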

As for the dataloss question, unlike say raid56 mode which is known to be 
effectively little more than expensive raid0 at this point, raid10 should 
be as reliable as raid1, etc.  But I'd refer again to that sysadmin's 
rule of thumb above.  If you don't have tested backups, you don't have 
backups, and if you don't have backups, the data is by definition not 
valuable enough to be worth the hassle of backing it up; the calculated 
risk cost of data loss is lower than the given time required to make, 
test and keep current the backups.  After that, it's your decision 
whether you value that data more than the time required to make and 
maintain those backups, or not, given the risk factor including the fact 
that btrfs is still under heavy development and is not yet fully stable.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: BTRFS balance segfault, where to go from here
@ 2014-10-28 11:33 ` Stephan Alz
From: Stephan Alz @ 2014-10-28 11:33 UTC
  To: linux-btrfs

Hello Folks,

Thanks for the help so far. I did what you recommended and upgraded the kernel to 3.16.

After the reboot it automatically resumed the balance. For about 2 hours it went well:

Label: 'backup' ...
    Total devices 5 FS bytes used 5.81TiB
    devid    1 size 3.64TiB used 2.77TiB path /dev/sdc
    devid    2 size 3.64TiB used 2.77TiB path /dev/sdb
    devid    3 size 3.64TiB used 2.77TiB path /dev/sda
    devid    4 size 3.64TiB used 2.76TiB path /dev/sdd
    devid    5 size 3.64TiB used 572.00GiB path /dev/sdf < interestingly the used is now lower than it was

Then, all of a sudden, I lost the machine. As far as I can tell it was a kernel panic, but unlike with 3.13 it took down the whole system. Not even the magic SysRq keys worked.

http://i59.tinypic.com/5we5ib.jpg

After that, whenever I tried to boot it with 3.16, the system crashed at boot time while mounting the btrfs filesystem.

3.13 at least didn't crash the entire system, so I booted back into it and managed to pause the balance:

>btrfs filesystem balance status /mnt/backup

Balance on '/mnt/backup' is paused
1 out of about 10 chunks balanced (1 considered),  90% left

Now my filesystem is fortunately back to RW again. Backups can continue tonight.
As for the data "not being important enough to back up": hell yes, it is, so yesterday I made a "backup of the backups" to a good old XFS filesystem (something reliable). The problem is that our whole backup system was designed around BTRFS: every night it rsyncs from a lot of servers to the backup server and then creates snapshots. Switching to another filesystem would take a lot of time and effort, possibly rewriting all of our backup scripts.

What else can I do?
Should I try an even later 3.18 kernel version?
Could this be happening because it really doesn't have enough space?

The counter now says that:
 btrfs    19534313824 12468488824 3753187048  77%
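The figures on that line look like standard df output in 1K blocks (size, used, available, Use%); that column layout is an assumption, since the df header line was not included. Converting them to TiB and recomputing Use% the way GNU df does (used divided by used + available, rounded up) is a quick sanity check; df_summary is a hypothetical helper.

```python
from typing import Dict

KIB_PER_TIB = 1024 ** 3  # 1 TiB = 2**40 bytes = 2**30 KiB


def df_summary(size_k: int, used_k: int, avail_k: int) -> Dict[str, float]:
    """Convert df's 1K-block columns to TiB and recompute Use%."""
    total = used_k + avail_k
    # Integer ceiling division: df rounds the percentage up.
    pct = (used_k * 100 + total - 1) // total
    return {
        "size_tib": size_k / KIB_PER_TIB,
        "used_tib": used_k / KIB_PER_TIB,
        "avail_tib": avail_k / KIB_PER_TIB,
        "use_pct": pct,
    }


summary = df_summary(19534313824, 12468488824, 3753187048)
for key, value in summary.items():
    print("%-9s %.2f" % (key, value))
```

The size column comes to about 18.19 TiB, which is the five 3.64 TiB devices added together, and used (about 11.61 TiB) is close to the sum of the per-device "used" figures in the fi show output above, which suggests these are raw figures rather than raid10-usable space. Note also that used + available (about 15.1 TiB) is smaller than size, which is why the 77% is computed against used + available.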

The whole reason I added the new drive is that it was running out of space.
Could somebody explain how this balancing works in RAID10 mode? What I want to know is: if ANY of the drives fails, do we lose data or not? Does the fact that the balance is paused now change this? If any one of the 5 drives failed completely right now, would I lose all the data? I definitely don't want to leave the system in an inconsistent state like this. At least the backups only run at night, so if I can get the backup drive mounted RW by the end of the day, that's enough.
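One way to look at the space question is to feed the per-device numbers from the 3.16 fi show output above into the simplified allocation model Duncan described earlier in the thread (most unallocated space first, an even number of devices, at least four per raid10 chunk). The unallocated figures below are rounded conversions (3.64 TiB minus each device's "used", in GiB), each chunk is assumed to take 1 GiB from every device it uses, and raid10_chunks is a hypothetical helper, not a btrfs tool.

```python
from typing import List, Tuple


def raid10_chunks(free_gib: List[int]) -> Tuple[int, List[int]]:
    """Count raid10 chunks that fit before allocation fails under the
    simplified model (assumption: 1 GiB taken from each device used)."""
    free = list(free_gib)
    chunks = 0
    while True:
        # Devices with room left, most unallocated space first.
        order = sorted((i for i, f in enumerate(free) if f > 0),
                       key=lambda i: free[i], reverse=True)
        width = len(order) - len(order) % 2  # even number of devices
        if width < 4:                        # raid10 needs at least four
            return chunks, free
        for i in order[:width]:
            free[i] -= 1
        chunks += 1


# Approximate unallocated GiB for devid 1-5 (sdc, sdb, sda, sdd, sdf).
now = [890, 890, 890, 901, 3155]
chunks, left = raid10_chunks(now)
print("chunks before running out:", chunks)
print("left over per device (GiB):", left)

# Same total raw space spread evenly, as a completed balance would aim for.
even = [sum(now) // 5] * 5
print("chunks if evenly spread:", raid10_chunks(even)[0])
```

In this model about 1190 chunks fit before the four older drives are exhausted, leaving nearly 2 TiB on sdf that raid10 cannot use, because every chunk also needs three other devices with room; with the same raw space spread evenly, about 1680 chunks fit. This only illustrates why an unfinished balance leaves the new drive underused; it is not a diagnosis of the ENOSPC aborts.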

Thanks

Below are some recent logs from 3.13 (maybe they are of some help).

[Tue Oct 28 12:01:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Oct 28 12:01:35 2014] btrfs           D ffff88007fc14280     0  3820   3202 0x00000000
[Tue Oct 28 12:01:35 2014]  ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480
[Tue Oct 28 12:01:35 2014]  0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800
[Tue Oct 28 12:01:35 2014]  0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0
[Tue Oct 28 12:01:35 2014] Call Trace:
[Tue Oct 28 12:01:35 2014]  [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs]
[Tue Oct 28 12:01:35 2014]  [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10
[Tue Oct 28 12:01:35 2014]  [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs]
[Tue Oct 28 12:01:35 2014]  [<ffffffff81199ea1>] ? path_openat+0xd1/0x630
[Tue Oct 28 12:01:35 2014]  [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0
[Tue Oct 28 12:01:35 2014]  [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540
[Tue Oct 28 12:01:35 2014]  [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0
[Tue Oct 28 12:01:35 2014]  [<ffffffff81154a88>] ? do_brk+0x198/0x2f0
[Tue Oct 28 12:01:35 2014]  [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0
[Tue Oct 28 12:01:35 2014]  [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b
[Tue Oct 28 12:03:35 2014] INFO: task btrfs:3820 blocked for more than 120 seconds.
[Tue Oct 28 12:03:35 2014]       Not tainted 3.13-0.bpo.1-amd64 #1
[Tue Oct 28 12:03:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Oct 28 12:03:35 2014] btrfs           D ffff88007fc14280     0  3820   3202 0x00000000
[Tue Oct 28 12:03:35 2014]  ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480
[Tue Oct 28 12:03:35 2014]  0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800
[Tue Oct 28 12:03:35 2014]  0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0
[Tue Oct 28 12:03:35 2014] Call Trace:
[Tue Oct 28 12:03:35 2014]  [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs]
[Tue Oct 28 12:03:35 2014]  [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10
[Tue Oct 28 12:03:35 2014]  [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs]
[Tue Oct 28 12:03:35 2014]  [<ffffffff81199ea1>] ? path_openat+0xd1/0x630
[Tue Oct 28 12:03:35 2014]  [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0
[Tue Oct 28 12:03:35 2014]  [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540
[Tue Oct 28 12:03:35 2014]  [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0
[Tue Oct 28 12:03:35 2014]  [<ffffffff81154a88>] ? do_brk+0x198/0x2f0
[Tue Oct 28 12:03:35 2014]  [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0
[Tue Oct 28 12:03:35 2014]  [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b
[Tue Oct 28 12:03:48 2014] btrfs: found 16561 extents

* Re: BTRFS balance segfault, where to go from here
@ 2014-10-28 13:12 ` E V
From: E V @ 2014-10-28 13:12 UTC
  To: Stephan Alz; +Cc: linux-btrfs

I've seen deadlocks on 3.16.3. Personally, I'm staying with 3.14
until something newer stabilizes; I haven't had any issues with it. You
might want to try the latest 3.14, though I think there should be a
new one pretty soon with quite a few btrfs patches.

On Tue, Oct 28, 2014 at 7:33 AM, Stephan Alz <stephan008@gmx.com> wrote:
> Hello Folks,
>
> Thanks for the help I've gotten so far. I did what you recommended and upgraded the kernel to 3.16.
>
> After reboot it automatically resumed the balancing operation. For about 2 hours it went well:
>
> Label: 'backup' ...
>     Total devices 5 FS bytes used 5.81TiB
>     devid    1 size 3.64TiB used 2.77TiB path /dev/sdc
>     devid    2 size 3.64TiB used 2.77TiB path /dev/sdb
>     devid    3 size 3.64TiB used 2.77TiB path /dev/sda
>     devid    4 size 3.64TiB used 2.76TiB path /dev/sdd
>     devid    5 size 3.64TiB used 572.00GiB path /dev/sdf < interestingly the used is now lower than it was
>
> After that, all of a sudden, I lost the machine. As I thought, it crashed with a kernel panic, but unlike with 3.13, it killed the whole system. Not even the magic SysRq keys worked.
>
> http://i59.tinypic.com/5we5ib.jpg
>
> Then, when I tried to reboot with 3.16, the system always segfaulted at boot time when it tried to mount the btrfs filesystem.
>
> With 3.13 it at least didn't crash the entire system, so I booted back into that and managed to stop the balancing:
>
>>btrfs filesystem balance status /mnt/backup
>
> Balance on '/mnt/backup' is paused
> 1 out of about 10 chunks balanced (1 considered),  90% left
>
> Now my filesystem is fortunately back to RW again. Backups can continue tonight.
> As for the "data not being important enough to back up": hell yes, it is, so yesterday I did a "backup of the backups" to a good old XFS filesystem (something which is reliable). The problem is that our whole backup system was designed around BTRFS. It rsyncs from a lot of servers to the backup server every night and then creates snapshots. Changing this and going back to another filesystem would require a lot of time and effort, possibly rewriting all of our backup scripts.
>
> What else can I do?
> Should I try an even newer 3.18 kernel?
> Could this happen because it really doesn't have enough space?
>
>
> The counter now says that:
>  btrfs    19534313824 12468488824 3753187048  77%
>
> The whole reason I added the new drive is that it was running out of space.
> Could somebody please explain how balancing works in RAID10 mode? What I want to know is whether we lose data if ANY of the drives fails. And does the fact that the balance is paused now change this? If any one of the 5 drives failed completely right now, would I lose all the data? I definitely don't want to leave the system in an inconsistent state like this. At least the backups only run at night, so if I can get the backup drive mounted RW by the end of the day, that's enough.
>
> Thanks
>
> I've attached some recent 3.13 crash logs at the end (maybe they're of some help).
>
>
> [Tue Oct 28 12:01:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Tue Oct 28 12:01:35 2014] btrfs           D ffff88007fc14280     0  3820   3202 0x00000000
> [Tue Oct 28 12:01:35 2014]  ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480
> [Tue Oct 28 12:01:35 2014]  0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800
> [Tue Oct 28 12:01:35 2014]  0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0
> [Tue Oct 28 12:01:35 2014] Call Trace:
> [Tue Oct 28 12:01:35 2014]  [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs]
> [Tue Oct 28 12:01:35 2014]  [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10
> [Tue Oct 28 12:01:35 2014]  [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs]
> [Tue Oct 28 12:01:35 2014]  [<ffffffff81199ea1>] ? path_openat+0xd1/0x630
> [Tue Oct 28 12:01:35 2014]  [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0
> [Tue Oct 28 12:01:35 2014]  [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540
> [Tue Oct 28 12:01:35 2014]  [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0
> [Tue Oct 28 12:01:35 2014]  [<ffffffff81154a88>] ? do_brk+0x198/0x2f0
> [Tue Oct 28 12:01:35 2014]  [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0
> [Tue Oct 28 12:01:35 2014]  [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b
> [Tue Oct 28 12:03:35 2014] INFO: task btrfs:3820 blocked for more than 120 seconds.
> [Tue Oct 28 12:03:35 2014]       Not tainted 3.13-0.bpo.1-amd64 #1
> [Tue Oct 28 12:03:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Tue Oct 28 12:03:35 2014] btrfs           D ffff88007fc14280     0  3820   3202 0x00000000
> [Tue Oct 28 12:03:35 2014]  ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480
> [Tue Oct 28 12:03:35 2014]  0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800
> [Tue Oct 28 12:03:35 2014]  0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0
> [Tue Oct 28 12:03:35 2014] Call Trace:
> [Tue Oct 28 12:03:35 2014]  [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs]
> [Tue Oct 28 12:03:35 2014]  [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10
> [Tue Oct 28 12:03:35 2014]  [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs]
> [Tue Oct 28 12:03:35 2014]  [<ffffffff81199ea1>] ? path_openat+0xd1/0x630
> [Tue Oct 28 12:03:35 2014]  [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0
> [Tue Oct 28 12:03:35 2014]  [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540
> [Tue Oct 28 12:03:35 2014]  [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0
> [Tue Oct 28 12:03:35 2014]  [<ffffffff81154a88>] ? do_brk+0x198/0x2f0
> [Tue Oct 28 12:03:35 2014]  [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0
> [Tue Oct 28 12:03:35 2014]  [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b
> [Tue Oct 28 12:03:48 2014] btrfs: found 16561 extents
>
> Sent: Tuesday, October 28, 2014 at 1:07 AM
> From: Duncan <1i5t5.duncan@cox.net>
> To: linux-btrfs@vger.kernel.org
> Subject: Re: BTRFS balance segfault, where to go from here
> Chris Murphy posted on Mon, 27 Oct 2014 10:51:16 -0600 as excerpted:
>
>> On Oct 27, 2014, at 3:26 AM, Stephan Alz <stephan008@gmx.com> wrote:
>>>
>>> My question is where to go from here? What I going to do right now is
>>> to copy the most important data to another separated XFS drive.
>>> What I planning to do is:
>>>
>>> 1, Upgrade the kernel 2, Upgrade BTRFS 3, Continue the balancing.
>>
>> Definitely upgrade the kernel and see how that goes; there have been
>> many, many changes since 3.13. I would upgrade the user space tools
>> also, but that's not as important.
>
> Just emphasizing...
>
> Because btrfs is still under heavy development and not yet fully stable,
> keeping particularly the kernel updated is vital, because running an old
> kernel often means running a kernel with known btrfs bugs, fixed in newer
> kernels.
>
> The userspace isn't quite as important since under normal operation it
> mostly simply tells the kernel what operations to perform, and an older
> userspace simply means you might be missing newer features. However,
> commands such as btrfs check (the old btrfsck) and btrfs restore work
> from userspace, so having a current btrfs-progs is important when you run
> into trouble and you're trying to fix things.
>
> That said, a couple of recent kernels have known issues. Don't use the
> 3.15 series at all, and be sure you're on 3.16.3 or newer for the 3.16
> series. 3.17 introduced another bug, with the fix hopefully in 3.17.2
> (it didn't make 3.17.1) and in 3.18-rcs.
>
> So 3.16.3 or later for stable kernel, or the latest 3.18-rc or live-git
> kernel, is what I'd recommend. The other alternative if you're really
> conservative is the latest long-term stable series kernel, 3.14.x, as it
> gets critical bugfixes as well, tho it won't be quite as current as
> 3.16.x or 3.18-rc. But anything older than the latest 3.14.x stable
> series is old and outdated in btrfs terms, and is thus not recommended.
> And 3.15, 3.16 before 3.16.3, and 3.17 before 3.17.2 (hopefully), are
> blackout versions due to known btrfs bugs. Avoid them.
>
> Of course with btrfs still not fully stable, the usual sysadmin rule of
> thumb that if you don't have a tested backup you don't have a backup, and
> if you don't have a backup, by definition you don't care if you lose the
> data, applies more than ever. If you're on not-yet-fully-stable btrfs
> and you don't have backups, by definition you don't care if you lose that
> data. There are people having to learn that the hard way, tho btrfs
> restore can often recover at least some of what would otherwise be lost.
>
>> FYI you can mount with skip_balance mount option to inhibit resuming
>> balance, sometimes pausing the balance isn't fast enough when there are
>> balance problems.
>
> =:^)
>
>>> Could someone please also explain that how is exactly the raid10 setup
>>> works with ODD number of drives with btrfs?
>>> Raid10 should be a stripe of mirrors. Now then this sdf drive is
>>> mirrored or striped or what?
>>
>> I have no idea, honestly. Btrfs is very tolerant of adding odd numbers and
>> sizes of devices, but things get a bit nutty in actual operation
>> sometimes.
>
> In btrfs, raid1, including the raid1 side of raid10, is defined as
> exactly two copies of the data, one on each of two different devices.
> These copies are allocated by chunk size, 1 GiB size for data, quarter
> GiB size for metadata, and chunks are normally allocated on the device
> with the most unallocated space available, provided the other constraints
> (such as don't put both copies on the same device) are met.
>
> Btrfs raid0 stripes will be as wide as possible, but again are allocated
> a chunk at a time, in sub-chunk-size strips.
>
> While I've not run btrfs raid10 personally and thus (as a sysadmin not a
> dev) can't say for sure, what this implies to me is that, assuming equal
> sized devices, an odd number of devices in raid10 will alternate skipping
> one device at each chunk allocation.
>
> So with a five same-size device btrfs raid10, if I'm not mistaken, btrfs
> will allocate chunks from four at once, two mirrors, two stripes, with
> the fifth one unused for that chunk allocation. However, at the next
> chunk allocation, the device skipped in the previous allocation will now
> have the most free space and will thus get the first allocation, with
> one of the other four devices skipped in that allocation round. After
> five allocation rounds (assuming all allocation rounds were 1 GiB data
> chunks, not quarter-GiB metadata), usage should thus be balanced across
> all five devices.
>
> Of course with six same-size devices, because btrfs raid1 does exactly
> two copies, no more, each stripe will be three devices wide.
>
>
> As for the data-loss question: unlike, say, raid56 mode, which is known to
> be effectively little more than an expensive raid0 at this point, raid10
> should be as reliable as raid1. But I'd refer again to the sysadmin's rule
> of thumb above. If you don't have tested backups, you don't have backups,
> and if you don't have backups, the data is by definition not valuable
> enough to be worth the hassle of backing it up: the calculated risk cost
> of losing it is lower than the cost of the time required to make, test,
> and keep the backups current. Beyond that, it's your decision whether you
> value that data more than the time required to make and maintain those
> backups, given the risk factor, including the fact that btrfs is still
> under heavy development and is not yet fully stable.
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman
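The skip_balance tip in the quoted thread, together with the pause Stephan describes, suggests a recovery sequence like the one below. It is only a sketch: `DEV` and `MNT` are placeholders for this example, the function names are hypothetical, and the commands need root and the real multi-device filesystem. Mounting with `skip_balance` leaves the interrupted balance paused instead of resuming it at mount time; after that it can be cancelled or resumed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the skip_balance recovery path. DEV and MNT are
# placeholders; the commands need root and the real multi-device filesystem.
DEV=${DEV:-/dev/sdb}      # any member device of the array (placeholder)
MNT=${MNT:-/mnt/backup}

# Mount without letting the interrupted balance resume, then show its state.
mount_skip_balance() {
  mount -o skip_balance "$DEV" "$MNT" &&
    btrfs balance status "$MNT"
}

# Drop the paused balance entirely...
cancel_balance() {
  btrfs balance cancel "$MNT"
}

# ...or pick it up again later, e.g. after moving to a better kernel.
resume_balance() {
  btrfs balance resume "$MNT"
}
```

After sourcing the file, run `mount_skip_balance`, then either `cancel_balance` or, once on a kernel you trust, `resume_balance`.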


* Re: BTRFS balance segfault, where to go from here
  2014-10-28 13:12       ` E V
@ 2014-10-28 14:02         ` Rich Freeman
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Freeman @ 2014-10-28 14:02 UTC (permalink / raw)
  To: E V; +Cc: Stephan Alz, linux-btrfs

On Tue, Oct 28, 2014 at 9:12 AM, E V <eliventer@gmail.com> wrote:
> I've seen dead locks on 3.16.3. Personally, I'm staying with 3.14
> until something newer stabilizes, haven't had any issues with it. You
> might want to try the latest 3.14, though I think there should be a
> new one pretty soon with quite a few btrfs patches.

Yeah, I forget what drove me to switch to a newer kernel, but I'm
wishing I had stuck with 3.14.  The last set of stable kernels has
been a pretty rough ride.  :)

My sense browsing the list is that the activity level has picked up a
bit, and that might be why 3.15-17 have been a bit more bug-ridden
than is normal.  For the long-term it is actually a good sign for the
vitality of btrfs.

But, I'll probably track 3.17 until a new longterm is announced and be
a bit more conservative.

--
Rich


* Re: BTRFS balance segfault, where to go from here
  2014-10-28 11:33     ` Stephan Alz
  2014-10-28 13:12       ` E V
@ 2014-10-28 13:33       ` Duncan
  2014-10-28 17:01         ` Rich Freeman
  1 sibling, 1 reply; 8+ messages in thread
From: Duncan @ 2014-10-28 13:33 UTC (permalink / raw)
  To: linux-btrfs

Stephan Alz posted on Tue, 28 Oct 2014 12:33:12 +0100 as excerpted:

> As for the "data not being important enough to back up": hell yes, it
> is, so yesterday I did a "backup of the backups" to a good old XFS
> filesystem (something which is reliable).

Makes sense.  FWIW, my second backup is to reiserfs, which (counter to 
reputation) I've found extremely reliable, even thru various bits of 
faulty hardware over the years, at least since the switch to data=ordered 
by default.

> The problem is that our whole backup
> system was designed around BTRFS. It rsyncs from a lot of servers to the
> backup server every night and then creates snapshots. Changing this and
> going back to another filesystem would require a lot of time and effort,
> possibly rewriting all of our backup scripts.

Ouch.  The filesystem access and write pattern rsync does appears to be 
quite stressful for btrfs, and has triggered a number of race-condition 
and similar bugs over time as the filesystem has continued to develop and 
mature.  They get fixed, but the number of times rsync has been a trigger 
does demonstrate that it's rather higher stress on a btrfs than most 
access and write patterns.

Between that and the fact that btrfs /isn't/ yet fully stable and mature, 
it's not a filesystem I'd normally recommend... yet... for production or 
production backup use, where down-time waiting for the non-btrfs second 
level backup to restore is really going to hurt.  Here I'm running it, 
but it's just my own system and primary backup, and if it goes down, 
nothing but a bit of personal inconvenience and stress is at stake.

In hindsight, I guess you'd have done a bit more research into your 
backup filesystem before designing a backup system dependent on it, but 
it's kinda late for that now.  Like I said, ouch.

FWIW, you might look into zfs, either on Linux, or under FreeBSD or the 
like.  While it does have license issues on Linux (and isn't an option 
for me), it's the closest parallel to the btrfs feature set, except it's 
actually mature.  I'm told it does require significantly more memory, 
preferably ECC, however, but under the circumstances, it may be less of a 
redesign to substitute it in place of btrfs and use its mature feature-
set, even if you have to throw some money into hardware to run it 
properly, than it'd be to try to redesign your entire backup setup so 
it's not dependent on otherwise btrfs-specific features such as snapshots, 
etc.  Since it's not an option here I've not looked into it too closely 
personally, and don't know if it'll fit your needs, but if it does, it 
may well be simpler to substitute it into the existing backup setup 
without rewriting the WHOLE thing, than to do that full rewrite from 
scratch, without the btrfs/zfs features.  I'd at least look into it, 
assuming you haven't already.

> What else can I do?
> Should I try an even newer 3.18 kernel?
> Could this happen because it really doesn't have enough space?
> 
> 
> The counter now says that:
>  btrfs    19534313824 12468488824 3753187048  77%
> 
> The whole reason I added the new drive is that it was running out of
> space.
> Could somebody please explain how balancing works in RAID10 mode?
> What I want to know is whether we lose data if ANY of the drives fails.
> And does the fact that the balance is paused now change this? If any
> one of the 5 drives failed completely right now, would I lose all the
> data? I definitely don't want to leave the system in an inconsistent
> state like this. At least the backups only run at night, so if I can
> get the backup drive mounted RW by the end of the day, that's enough.

In theory, your data isn't in danger unless two devices fail at once, 
because the not yet rebalanced data is raid10 over four devices, while 
the rebalanced data is raid10 over five, so either way, dropout of a 
single device should at worst force the filesystem read-only.  And one of 
the reasons rebalance does require additional space is because it does 
/not/ rewrite in-place, but creates new chunks to write the data and 
metadata in, and only deactivates the old ones once the new ones are 
fully written and online.  So the balance activity itself shouldn't put 
the data in further danger; you should have two full copies of every 
chunk at every possible point.
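The single-failure argument above can be checked on a toy layout. The bash sketch below assumes that each raid10 chunk is a device list read as mirror pairs (first two devices, next two), and that a chunk is lost only when both devices of one pair fail. The layouts, one old four-device chunk plus rotated five-device chunks, are illustrative and not read from a real filesystem.

```bash
#!/usr/bin/env bash
# Toy check of the single-failure argument above. Assumptions for this
# sketch: a raid10 chunk is a list of devices read as mirror pairs
# (1st+2nd, 3rd+4th), and data is lost only when both devices of a pair
# are gone. The layouts are illustrative, not read from a real filesystem.
chunks=(
  "1 2 3 4"   # not yet rebalanced: raid10 over the original four devices
  "5 1 2 3"   # rebalanced chunks spread over all five devices
  "4 5 1 2"
  "3 4 5 1"
  "2 3 4 5"
)

# lost_chunks DEVID...: count chunks that have a mirror pair made up
# entirely of failed devices.
lost_chunks() {
  local -A failed=()
  local -a devs
  local d chunk i lost=0
  for d in "$@"; do failed[$d]=1; done
  for chunk in "${chunks[@]}"; do
    read -r -a devs <<< "$chunk"
    for ((i = 0; i < ${#devs[@]}; i += 2)); do
      if [[ -n ${failed[${devs[i]}]:-} && -n ${failed[${devs[i+1]}]:-} ]]; then
        lost=$(( lost + 1 ))
        break   # one dead pair is enough to lose this chunk
      fi
    done
  done
  echo "$lost"
}

for d in 1 2 3 4 5; do
  echo "device $d fails: $(lost_chunks "$d") chunks lost"
done
echo "devices 1 and 2 fail: $(lost_chunks 1 2) chunks lost"
```

Every single-device case reports 0 lost chunks. Losing devices 1 and 2 together loses the two chunks that place a mirror pair on exactly those devices, which is the two-devices-at-once caveat.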

However, btrfs is /not/ yet fully stable, and you're obviously running 
into a bug of /some/ sort, so while the theory says you're fine as long 
as you lose only a single device, due to the unknown nature of the bugs 
you're already seeing and more specifically, the unknown effect the bug 
itself might have on the raid10 mode, reality can't guarantee that.

Or at least I can't.  But I'm only an admin and list regular, not a dev, 
so it's not as if I can look at the code and the bugs and personally say, 
one way or the other, that they do or don't affect the raid10 
distribution and thus the defined existence of the second copy.
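On the df line in the quoted question, a bit of arithmetic on the posted 1K-block figures shows what the 77% is measured against. This assumes GNU coreutils df, which computes Use% as used / (used + avail), rounded up, rather than as used / size. The sketch does not try to explain how btrfs arrives at the avail figure.

```bash
#!/usr/bin/env bash
# Worked arithmetic for the df line quoted above (1K-blocks, as posted).
# Assumption: this is GNU coreutils df, which rounds Use% up and computes
# it against used + avail rather than against the size column.
size=19534313824
used=12468488824
avail=3753187048

# ceil(used * 100 / (used + avail)): the figure df prints as Use%
use_pct=$(( (used * 100 + used + avail - 1) / (used + avail) ))
# used as a share of the size column, for comparison (rounded down)
raw_pct=$(( used * 100 / size ))
# space in the size column that is neither used nor reported as available
gap=$(( size - used - avail ))

echo "Use% as df reports it: ${use_pct}%"
echo "used / size: ${raw_pct}%"
echo "size - used - avail: ${gap} KiB"
```

So the 77% is relative to used plus available space. The same used figure is under 64% of the size column, and about 3.1 TiB of the size column is counted as neither used nor available.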

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: BTRFS balance segfault, where to go from here
  2014-10-28 13:33       ` Duncan
@ 2014-10-28 17:01         ` Rich Freeman
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Freeman @ 2014-10-28 17:01 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

On Tue, Oct 28, 2014 at 9:33 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Since it's not an option here I've not looked into it too closely
> personally, and don't know if it'll fit your needs, but if it does, it
> may well be simpler to substitute it into the existing backup setup
> without rewriting the WHOLE thing, than to do that full rewrite from
> scratch, without the btrfs/zfs features.  I'd at least look into it,
> assuming you haven't already.

I haven't researched zfs as thoroughly as btrfs and I'm not running
it, but you're certainly right that it is more mature (though I would
not say that zfs on linux is as mature as zfs on BSD or especially
Solaris).

Keep in mind that ZFS is marketed more towards enterprise workloads.
It isn't quite as dynamic as btrfs is intended to be, though in truth
many of those btrfs features like reshaping a raid5 aren't implemented
yet.  My sense is that you're going to need to plan ahead a bit more
with ZFS and making changes without doing a full backup/re-create is
going to be harder.  It also isn't designed for SSD (though it does
have features for SSD caching of the write log and I think also
read-caching, which is something that does not yet exist for btrfs).

From what I understand of both I'd say that btrfs actually has the
better overall design, but zfs just has a LOT more maturity.  I think
that btrfs will eventually overtake it, but just when that will happen
is anybody's guess, and it certainly isn't there today.

The one thing that zfs does have going for you is that you're very
unlikely to get BUGs and PANICs anytime you do something as simple as
running rsync on it.

I will also note that I rsync data off of my btrfs filesystem all the
time without issue.  I do not have experience with using rsync to
write TO a btrfs filesystem.  Right now I don't trust btrfs send
enough to rely on it - the whole purpose of using rsync right now is
to backup my btrfs data to an ext4 partition which lets me sleep well
at night while still getting to play around with btrfs and make use of
features like snapshots/etc.  :)

If I were running a large (i.e., measured in tens of disks) storage system
I'd probably go with ZFS now.  In such a setup being limited to RAID6s
of maybe 7 drives each and having to add/remove drives 7 at a time
wouldn't be a big deal.  When you're running a system with 6 disks
total that is a much bigger limitation.  If you look at something like
Backblaze's storage pods, that is the perfect example of the kind of
situation ZFS was designed to handle.  On the other hand, btrfs aims
to eventually address that while being a decent default filesystem for
your smartphone.

--
Rich


end of thread

Thread overview: 8+ messages
2014-10-27  9:26 BTRFS balance segfault, where to go from here Stephan Alz
2014-10-27 16:51 ` Chris Murphy
2014-10-28  0:07   ` Duncan
2014-10-28 11:33     ` Stephan Alz
2014-10-28 13:12       ` E V
2014-10-28 14:02         ` Rich Freeman
2014-10-28 13:33       ` Duncan
2014-10-28 17:01         ` Rich Freeman
