* BTRFS balance segfault, where to go from here
@ 2014-10-27  9:26 Stephan Alz
  2014-10-27 16:51 ` Chris Murphy
  0 siblings, 1 reply; 8+ messages in thread
From: Stephan Alz @ 2014-10-27  9:26 UTC (permalink / raw)
To: linux-btrfs

Hello Folks,

I used to have an array of 4x4TB drives with BTRFS in raid10.
The kernel version is: 3.13-0.bpo.1-amd64
BTRFS version is: v3.14.1

When it reached 80% of its space I added another 4TB drive to the array with:

> btrfs device add /dev/sdf /mnt/backup

And started the balancing to the new drive:

> btrfs filesystem balance /mnt/backup

This ran for 5-6 hours before it segfaulted with a "not enough free space" message.

Now my configuration looks like this:

btrfs fi show /mnt/backup
Label: 'backup' uuid: ...
  Total devices 5 FS bytes used 5.93TiB
  devid 1 size 3.64TiB used 2.82TiB path /dev/sdd
  devid 2 size 3.64TiB used 2.82TiB path /dev/sdc
  devid 3 size 3.64TiB used 2.81TiB path /dev/sdb
  devid 4 size 3.64TiB used 2.82TiB path /dev/sde
  devid 5 size 3.64TiB used 638.50GiB path /dev/sdf

After this crash happened during the balancing (logs are attached at the end) the system remounted my /mnt/backup share as RO. At this point I started to really worry. I unmounted and remounted it manually. At the beginning it ran some self checks which took about 5 minutes; then, as iotop showed, it continued with the balancing, which failed again in the same way. The next time, right after mounting, I immediately paused the balancing (which helped).

My question is where to go from here? What I am going to do right now is copy the most important data to a separate XFS drive.
What I am planning to do is:

1, Upgrade the kernel
2, Upgrade BTRFS
3, Continue the balancing.

Could someone please also explain how exactly the raid10 setup works with an ODD number of drives in btrfs?
Raid10 should be a stripe of mirrors. So is this sdf drive mirrored or striped or what?
Some btrfs gurus could tell me that should I be worried of dataloss because of this or not? Would I need even more free space just to add a 5th drive? If so how much more? Kernel logs ----------- Oct 24 17:25:44 backup kernel: [29396.873750] btrfs: relocating block group 5162588438528 flags 65 Oct 24 17:26:09 backup kernel: [29421.594524] btrfs: found 13126 extents Oct 24 17:26:38 backup kernel: [29450.769228] btrfs: found 13126 extents Oct 24 17:26:39 backup kernel: [29451.345198] btrfs: relocating block group 5161514696704 flags 68 Oct 24 17:31:33 backup kernel: [29745.776810] BTRFS debug (device sdb): run_one_delayed_ref returned -28 Oct 24 17:31:33 backup kernel: [29745.776818] ------------[ cut here ]------------ Oct 24 17:31:33 backup kernel: [29745.776847] WARNING: CPU: 1 PID: 1807 at /build/linux-t5aGFh/linux-3.13.10/fs/btrfs/super.c:254 __btrfs_abort_transaction+0x5a/0x140 [btrfs]() Oct 24 17:31:33 backup kernel: [29745.776849] btrfs: Transaction aborted (error -28) Oct 24 17:31:33 backup kernel: [29745.776851] Modules linked in: xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc 8021q garp mrp bridge stp llc loop iTCO_wdt iTCO_vendor_support lpc_ich radeon mfd_core processor evdev ttm drm_kms_helper drm i2c_algo_bit coretemp rng_core serio_raw pcspkr i2c_i801 i2c_core i3000_edac thermal_sys button shpchp edac_core ext4 crc16 mbcache jbd2 btrfs xor raid6_pq crc32c libcrc32c dm_mod xen_pciback sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common ata_generic ahci ata_piix libahci 3w_9xxx libata scsi_mod ehci_pci uhci_hcd ehci_hcd e1000e ptp pps_core usbcore usb_common Oct 24 17:31:33 backup kernel: [29745.776902] CPU: 1 PID: 1807 Comm: btrfs-transacti Not tainted 3.13-0.bpo.1-amd64 #1 Debian 3.13.10-1~bpo70+1 Oct 24 17:31:33 backup kernel: [29745.776905] Hardware name: Supermicro PDSM4+/PDSM4+, BIOS 6.00 02/05/2007 Oct 24 17:31:33 backup kernel: [29745.776907] 0000000000000000 ffffffffa0257130 
ffffffff814d16c9 ffff88006a7f3cc8 Oct 24 17:31:33 backup kernel: [29745.776911] ffffffff81060967 00000000ffffffe4 ffff880004282800 ffff88003b813ec0 Oct 24 17:31:33 backup kernel: [29745.776914] 0000000000000aaa ffffffffa0253b60 ffffffff81060a55 ffffffffa0257260 Oct 24 17:31:33 backup kernel: [29745.776918] Call Trace: Oct 24 17:31:33 backup kernel: [29745.776926] [<ffffffff814d16c9>] ? dump_stack+0x41/0x51 Oct 24 17:31:33 backup kernel: [29745.776931] [<ffffffff81060967>] ? warn_slowpath_common+0x87/0xc0 Oct 24 17:31:33 backup kernel: [29745.776935] [<ffffffff81060a55>] ? warn_slowpath_fmt+0x45/0x50 Oct 24 17:31:33 backup kernel: [29745.776946] [<ffffffffa01b73ca>] ? __btrfs_abort_transaction+0x5a/0x140 [btrfs] Oct 24 17:31:33 backup kernel: [29745.776959] [<ffffffffa01d2e72>] ? btrfs_run_delayed_refs+0x372/0x530 [btrfs] Oct 24 17:31:33 backup kernel: [29745.776974] [<ffffffffa01fa8c3>] ? btrfs_run_ordered_operations+0x213/0x2b0 [btrfs] Oct 24 17:31:33 backup kernel: [29745.776988] [<ffffffffa01e2fea>] ? btrfs_commit_transaction+0x5a/0x990 [btrfs] Oct 24 17:31:33 backup kernel: [29745.777001] [<ffffffffa01e1345>] ? transaction_kthread+0x1c5/0x240 [btrfs] Oct 24 17:31:33 backup kernel: [29745.777015] [<ffffffffa01e1180>] ? open_ctree+0x1ff0/0x1ff0 [btrfs] Oct 24 17:31:33 backup kernel: [29745.777019] [<ffffffff8108233c>] ? kthread+0xbc/0xe0 Oct 24 17:31:33 backup kernel: [29745.777022] [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0 Oct 24 17:31:33 backup kernel: [29745.777026] [<ffffffff814dee4c>] ? ret_from_fork+0x7c/0xb0 Oct 24 17:31:33 backup kernel: [29745.777030] [<ffffffff81082280>] ? 
flush_kthread_worker+0xa0/0xa0 Oct 24 17:31:33 backup kernel: [29745.777032] ---[ end trace 5de5beb31698a3c1 ]--- Oct 24 17:31:33 backup kernel: [29745.777035] BTRFS error (device sdb) in btrfs_run_delayed_refs:2730: errno=-28 No space left Oct 24 17:31:33 backup kernel: [29745.777512] BTRFS info (device sdb): forced readonly Oct 24 17:31:33 backup kernel: [29745.784767] BTRFS debug (device sdb): run_one_delayed_ref returned -28 Oct 24 17:31:33 backup kernel: [29745.784773] BTRFS error (device sdb) in btrfs_run_delayed_refs:2730: errno=-28 No space left Oct 24 17:35:53 backup kernel: [30005.015967] btrfs: device label backup_fs devid 3 transid 86656 /dev/sdb Oct 24 17:35:53 backup kernel: [30005.063903] btrfs: disk space caching is enabled Oct 24 17:43:01 backup kernel: [30433.356660] BTRFS debug (device sdf): unlinked 1 orphans Oct 24 17:43:01 backup kernel: [30433.395645] btrfs: continuing balance Oct 24 17:43:02 backup kernel: [30434.395936] btrfs: relocating block group 7434626138112 flags 65 Oct 24 17:43:17 backup kernel: [30449.104022] btrfs: found 8842 extents Oct 24 17:43:24 backup kernel: [30456.043235] btrfs: found 8834 extents Oct 24 17:43:24 backup kernel: [30456.580133] btrfs: relocating block group 7223098998784 flags 68 Oct 24 17:48:42 backup kernel: [30774.465707] btrfs: found 37187 extents Oct 24 17:48:43 backup kernel: [30775.058570] btrfs: relocating block group 6782864850944 flags 68 Oct 24 17:52:16 backup kernel: [30988.070735] BTRFS debug (device sdf): run_one_delayed_ref returned -28 Oct 24 17:52:16 backup kernel: [30988.070742] ------------[ cut here ]------------ Oct 24 17:52:16 backup kernel: [30988.070772] WARNING: CPU: 1 PID: 15920 at /build/linux-t5aGFh/linux-3.13.10/fs/btrfs/super.c:254 __btrfs_abort_transaction+0x5a/0x140 [btrfs]() Oct 24 17:52:16 backup kernel: [30988.070775] btrfs: Transaction aborted (error -28) Oct 24 17:52:16 backup kernel: [30988.070776] Modules linked in: xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss 
oid_registry nfs_acl nfs lockd fscache sunrpc 8021q garp mrp bridge stp llc loop iTCO_wdt iTCO_vendor_support lpc_ich radeon mfd_core processor evdev ttm drm_kms_helper drm i2c_algo_bit coretemp rng_core serio_raw pcspkr i2c_i801 i2c_core i3000_edac thermal_sys button shpchp edac_core ext4 crc16 mbcache jbd2 btrfs xor raid6_pq crc32c libcrc32c dm_mod xen_pciback sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common ata_generic ahci ata_piix libahci 3w_9xxx libata scsi_mod ehci_pci uhci_hcd ehci_hcd e1000e ptp pps_core usbcore usb_common Oct 24 17:52:16 backup kernel: [30988.070828] CPU: 1 PID: 15920 Comm: btrfs-transacti Tainted: G W 3.13-0.bpo.1-amd64 #1 Debian 3.13.10-1~bpo70+1 Oct 24 17:52:16 backup kernel: [30988.070830] Hardware name: Supermicro PDSM4+/PDSM4+, BIOS 6.00 02/05/2007 Oct 24 17:52:16 backup kernel: [30988.070833] 0000000000000000 ffffffffa0257130 ffffffff814d16c9 ffff880056d7bcc8 Oct 24 17:52:16 backup kernel: [30988.070838] ffffffff81060967 00000000ffffffe4 ffff880003c97000 ffff88006ba9abe0 Oct 24 17:52:16 backup kernel: [30988.070841] 0000000000000aaa ffffffffa0253b60 ffffffff81060a55 ffffffffa0257260 Oct 24 17:52:16 backup kernel: [30988.070845] Call Trace: Oct 24 17:52:16 backup kernel: [30988.070853] [<ffffffff814d16c9>] ? dump_stack+0x41/0x51 Oct 24 17:52:16 backup kernel: [30988.070858] [<ffffffff81060967>] ? warn_slowpath_common+0x87/0xc0 Oct 24 17:52:16 backup kernel: [30988.070862] [<ffffffff81060a55>] ? warn_slowpath_fmt+0x45/0x50 Oct 24 17:52:16 backup kernel: [30988.070873] [<ffffffffa01b73ca>] ? __btrfs_abort_transaction+0x5a/0x140 [btrfs] Oct 24 17:52:16 backup kernel: [30988.070886] [<ffffffffa01d2e72>] ? btrfs_run_delayed_refs+0x372/0x530 [btrfs] Oct 24 17:52:16 backup kernel: [30988.070901] [<ffffffffa01fa8c3>] ? btrfs_run_ordered_operations+0x213/0x2b0 [btrfs] Oct 24 17:52:16 backup kernel: [30988.070915] [<ffffffffa01e2fea>] ? 
btrfs_commit_transaction+0x5a/0x990 [btrfs] Oct 24 17:52:16 backup kernel: [30988.070929] [<ffffffffa01e1345>] ? transaction_kthread+0x1c5/0x240 [btrfs] Oct 24 17:52:16 backup kernel: [30988.070942] [<ffffffffa01e1180>] ? open_ctree+0x1ff0/0x1ff0 [btrfs] Oct 24 17:52:16 backup kernel: [30988.070946] [<ffffffff8108233c>] ? kthread+0xbc/0xe0 Oct 24 17:52:16 backup kernel: [30988.070949] [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0 Oct 24 17:52:16 backup kernel: [30988.070954] [<ffffffff814dee4c>] ? ret_from_fork+0x7c/0xb0 Oct 24 17:52:16 backup kernel: [30988.070957] [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0 Oct 24 17:52:16 backup kernel: [30988.070960] ---[ end trace 5de5beb31698a3c2 ]--- Oct 24 17:52:16 backup kernel: [30988.070963] BTRFS error (device sdf) in btrfs_run_delayed_refs:2730: errno=-28 No space left Oct 24 17:52:16 backup kernel: [30988.071439] BTRFS info (device sdf): forced readonly Oct 24 17:52:16 backup kernel: [30988.081154] BTRFS debug (device sdf): run_one_delayed_ref returned -28 Oct 24 17:52:16 backup kernel: [30988.081161] BTRFS error (device sdf) in btrfs_run_delayed_refs:2730: errno=-28 No space left Oct 24 17:55:34 backup kernel: [31186.936384] btrfs: device label backup_fs devid 3 transid 86683 /dev/sdb Oct 24 17:55:35 backup kernel: [31187.067619] btrfs: disk space caching is enabled Oct 24 18:01:23 backup kernel: [31535.301582] BTRFS debug (device sdf): unlinked 1 orphans Oct 24 18:01:23 backup kernel: [31535.339410] btrfs: continuing balance Oct 24 18:01:23 backup kernel: [31535.624023] btrfs: relocating block group 7438921105408 flags 68 Oct 24 18:02:37 backup kernel: [31609.293378] btrfs: found 26705 extents Thanks! ^ permalink raw reply [flat|nested] 8+ messages in thread
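Editorial aside on the logs above: a minimal Python sketch for reading two of the numbers the kernel prints. The bit values are the block group type and profile bits of the btrfs on-disk format; the helper names (`decode_block_group_flags`, `decode_kernel_error`, `as_signed32`) are invented for this illustration and are not part of any btrfs tool.

```python
import errno

# Block group type/profile bits from the btrfs on-disk format.
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}


def decode_block_group_flags(flags):
    """Return the names of the bits set in a 'relocating block group ... flags N' value."""
    return [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]


def decode_kernel_error(value):
    """Map a negative kernel return code such as -28 to its errno name."""
    return errno.errorcode[-value]


def as_signed32(hex_word):
    """Read the low 32 bits of a hex word from a stack dump as a signed integer."""
    low = int(hex_word, 16) & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


if __name__ == "__main__":
    for flags in (65, 68):
        print(flags, decode_block_group_flags(flags))
    print(decode_kernel_error(-28))
    print(as_signed32("00000000ffffffe4"))
```

Read this way, `flags 65` is a raid10 data block group, `flags 68` is a raid10 metadata block group, and -28 is ENOSPC ("No space left"). In both traces the abort follows the relocation of a `flags 68` block group, although that observation alone does not establish the cause.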
* Re: BTRFS balance segfault, where to go from here
  2014-10-27  9:26 BTRFS balance segfault, where to go from here Stephan Alz
@ 2014-10-27 16:51 ` Chris Murphy
  2014-10-28  0:07   ` Duncan
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Murphy @ 2014-10-27 16:51 UTC (permalink / raw)
To: Stephan Alz; +Cc: linux-btrfs

On Oct 27, 2014, at 3:26 AM, Stephan Alz <stephan008@gmx.com> wrote:
>
> My question is where to go from here? What I going to do right now is to copy the most important data to another separated XFS drive.
> What I planning to do is:
>
> 1, Upgrade the kernel
> 2, Upgrade BTRFS
> 3, Continue the balancing.

Definitely upgrade the kernel and see how that goes; there have been many, many changes since 3.13. I would upgrade the user space tools also, but that's not as important.

FYI you can mount with the skip_balance mount option to inhibit resuming the balance; sometimes pausing the balance isn't fast enough when there are balance problems.

>
>
> Could someone please also explain that how is exactly the raid10 setup works with ODD number of drives with btrfs?
> Raid10 should be a stripe of mirrors. Now then this sdf drive is mirrored or striped or what?

I have no idea, honestly. Btrfs is very tolerant of adding odd numbers and sizes of devices, but things get a bit nutty in actual operation sometimes. This might be one of those cases, because traditionally raid10 always has an even number of drives; odd numbers just don't make sense. But Btrfs allows the addition; I think the expectation is you'd have added two before doing the balance though.

> Some btrfs gurus could tell me that should I be worried of dataloss because of this or not?

Anything is possible so hopefully you have backups. My expectation is that in the worst-case scenario the fs gets confused and you can't mount rw anymore, in which case you won't be able to make it an even-drive raid10. But in that case, even as ro, you can update your backups, blow away the Btrfs volume and start from scratch with an even number of drives, right?

> Would I need even more free space just to add a 5th drive? If so how much more?

Gonna guess you'd need to add a drive that's at least 2.83TiB in size if you want to keep it raid10.

Chris Murphy

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: BTRFS balance segfault, where to go from here
  2014-10-27 16:51 ` Chris Murphy
@ 2014-10-28  0:07   ` Duncan
  2014-10-28 11:33     ` Stephan Alz
  0 siblings, 1 reply; 8+ messages in thread
From: Duncan @ 2014-10-28  0:07 UTC (permalink / raw)
To: linux-btrfs

Chris Murphy posted on Mon, 27 Oct 2014 10:51:16 -0600 as excerpted:

> On Oct 27, 2014, at 3:26 AM, Stephan Alz <stephan008@gmx.com> wrote:
>>
>> My question is where to go from here? What I going to do right now is
>> to copy the most important data to another separated XFS drive.
>> What I planning to do is:
>>
>> 1, Upgrade the kernel 2, Upgrade BTRFS 3, Continue the balancing.
>
> Definitely upgrade the kernel and see how that goes, there's been many
> many changes since 3.13. I would upgrade the user space tools also but
> that's not as important.

Just emphasizing...

Because btrfs is still under heavy development and not yet fully stable, keeping the kernel in particular updated is vital, because running an old kernel often means running a kernel with known btrfs bugs that are fixed in newer kernels. The userspace isn't quite as important, since under normal operation it mostly just tells the kernel what operations to perform, and an older userspace simply means you might be missing newer features. However, commands such as btrfs check (the old btrfsck) and btrfs restore work from userspace, so having a current btrfs-progs is important when you run into trouble and you're trying to fix things.

That said, a couple of recent kernels have known issues. Don't use the 3.15 series at all, and be sure you're on 3.16.3 or newer for the 3.16 series. 3.17 introduced another bug, with the fix hopefully in 3.17.2 (it didn't make 3.17.1) and in the 3.18-rcs.

So 3.16.3 or later for a stable kernel, or the latest 3.18-rc or live-git kernel, is what I'd recommend. The other alternative, if you're really conservative, is the latest long-term stable series kernel, 3.14.x, as it gets critical bugfixes as well, tho it won't be quite as current as 3.16.x or 3.18-rc. But anything older than the latest 3.14.x stable series is old and outdated in btrfs terms, and is thus not recommended. And 3.15, 3.16 before 3.16.3, and 3.17 before 3.17.2 (hopefully), are blackout versions due to known btrfs bugs. Avoid them.

Of course with btrfs still not fully stable, the usual sysadmin rule of thumb that if you don't have a tested backup you don't have a backup, and if you don't have a backup, by definition you don't care if you lose the data, applies more than ever. If you're on not-yet-fully-stable btrfs and you don't have backups, by definition you don't care if you lose that data. There are people having to learn that the hard way, tho btrfs restore can often recover at least some of what would otherwise be lost.

> FYI you can mount with skip_balance mount option to inhibit resuming
> balance, sometimes pausing the balance isn't fast enough when there are
> balance problems.

=:^)

>> Could someone please also explain that how is exactly the raid10 setup
>> works with ODD number of drives with btrfs?
>> Raid10 should be a stripe of mirrors. Now then this sdf drive is
>> mirrored or striped or what?
>
> I have no idea honestly. Btrfs is very tolerant of adding odd number and
> sizes of devices, but things get a bit nutty in actual operation
> sometimes.

In btrfs, raid1, including the raid1 side of raid10, is defined as exactly two copies of the data, one on each of two different devices. These copies are allocated by chunk, 1 GiB in size for data and a quarter GiB for metadata, and chunks are normally allocated on the device with the most unallocated space available, provided the other constraints (such as don't put both copies on the same device) are met.

Btrfs raid0 stripes will be as wide as possible, but again are allocated a chunk at a time, in sub-chunk-size strips.

While I've not run btrfs raid10 personally and thus (as a sysadmin not a dev) can't say for sure, what this implies to me is that, assuming equal sized devices, an odd number of devices in raid10 will alternate skipping one device at each chunk allocation.

So with a btrfs raid10 of five same-size devices, if I'm not mistaken, btrfs will allocate chunks from four at once, two mirrors, two stripes, with the fifth one unused for that chunk allocation. However, at the next chunk allocation, the device skipped in the previous allocation will now have the most free space and will thus get the first allocation, with one of the other four devices skipped in that allocation round. After five allocation rounds (assuming all allocation rounds were 1 GiB data chunks, not quarter-GiB metadata), usage should thus be balanced across all five devices.

Of course with six same-size devices, because btrfs raid1 does exactly two copies, no more, each stripe will be three devices wide.

As for the dataloss question, unlike say raid56 mode, which is known to be effectively little more than expensive raid0 at this point, raid10 should be as reliable as raid1, etc. But I'd refer again to that sysadmin's rule of thumb above. If you don't have tested backups, you don't have backups, and if you don't have backups, the data is by definition not valuable enough to be worth the hassle of backing it up; the calculated risk cost of data loss is lower than the given time required to make, test and keep current the backups. After that, it's your decision whether you value that data more than the time required to make and maintain those backups, or not, given the risk factor including the fact that btrfs is still under heavy development and is not yet fully stable.

-- 
Duncan - List replies preferred.   No HTML msgs.
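Editorial aside: the rotation described above for five equal devices can be sketched as a toy model in Python. This is only an illustration of the stated rule ("use the devices with the most unallocated space, in an even-sized group"), not the kernel's allocator; the function names, the one-unit-per-device stripe size, and the tie-break on lowest devid are all assumptions made for the sketch.

```python
def allocate_raid10_chunk(free, stripe_size=1):
    """Allocate one raid10 chunk in the toy model; return the devids used.

    `free` maps devid -> unallocated units and is updated in place.
    """
    # Devices that can still hold one more stripe.
    candidates = [d for d, f in free.items() if f >= stripe_size]
    # Toy rule: use an even number of devices, as many as possible, at least four.
    width = len(candidates) - (len(candidates) % 2)
    if width < 4:
        raise RuntimeError("not enough devices with unallocated space")
    # Prefer the devices with the most unallocated space (ties: lowest devid).
    chosen = sorted(candidates, key=lambda d: (-free[d], d))[:width]
    for d in chosen:
        free[d] -= stripe_size
    return sorted(chosen)


def simulate(num_devices=5, units_per_device=10, rounds=5):
    """Run several allocation rounds; return final free space and skipped devids per round."""
    free = {devid: units_per_device for devid in range(1, num_devices + 1)}
    skipped = []
    for _ in range(rounds):
        used = allocate_raid10_chunk(free)
        skipped.append([d for d in free if d not in used])
    return free, skipped


if __name__ == "__main__":
    free, skipped = simulate()
    print("skipped per round:", skipped)
    print("unallocated units left:", free)
```

In this model a different device sits out each round, and after five rounds all five devices have the same amount allocated; with six devices nothing is skipped. A filesystem that starts out unbalanced, like the one in this thread, would first favour the nearly empty new device in every round.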
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: BTRFS balance segfault, where to go from here
  2014-10-28  0:07   ` Duncan
@ 2014-10-28 11:33     ` Stephan Alz
  2014-10-28 13:12       ` E V
  2014-10-28 13:33       ` Duncan
  0 siblings, 2 replies; 8+ messages in thread
From: Stephan Alz @ 2014-10-28 11:33 UTC (permalink / raw)
To: linux-btrfs

Hello Folks,

Thanks for the help I have got so far. I did what you recommended and upgraded the kernel to 3.16.

After the reboot it automatically resumed the balancing operation. For about 2 hours it went well:

Label: 'backup' ...
  Total devices 5 FS bytes used 5.81TiB
  devid 1 size 3.64TiB used 2.77TiB path /dev/sdc
  devid 2 size 3.64TiB used 2.77TiB path /dev/sdb
  devid 3 size 3.64TiB used 2.77TiB path /dev/sda
  devid 4 size 3.64TiB used 2.76TiB path /dev/sdd
  devid 5 size 3.64TiB used 572.00GiB path /dev/sdf < interestingly the used is now lower than it was

After that, all of a sudden, I just lost the machine. I think it crashed with a kernel panic, but unlike with 3.13 it killed the whole system. Not even the magic keys worked.

http://i59.tinypic.com/5we5ib.jpg

Then when I tried to reboot it with 3.16 the system always segfaulted at boot time when it tried to mount the btrfs filesystem.

With 3.13 it at least didn't crash the entire system, so I booted back into that and managed to stop the balancing:

>btrfs filesystem balance status /mnt/backup

Balance on '/mnt/backup' is paused
1 out of about 10 chunks balanced (1 considered), 90% left

Now my filesystem is fortunately back to RW again. Backups can continue tonight.

And about the "data not being important enough to be backed up", hell yes it is, so yesterday I did a "backup of the backups" to a good old XFS filesystem (something which is reliable). The problem is that our whole backup system was designed to use BTRFS. It rsyncs from a lot of servers to the backup server every night and then creates snapshots. Changing this and going back to another filesystem would require a lot of time and effort, possibly rewriting all of our backup scripts.
What else can I do?
Should I try an even later 3.18 kernel version?
Can this happen because it really doesn't have enough space?

The counter now says:
btrfs 19534313824 12468488824 3753187048 77%

The whole reason I added the new drive is that it was running out of space.
Could somebody please explain how this balancing works in RAID10 mode? What I want to know is whether we lose data if ANY one of the drives fails. And does the fact that the balancing is paused now change this? If any one of the 5 drives failed completely right now, would I lose all the data? I definitely don't want to leave the system in an inconsistent state like this. At least the backups are only done at night, so if I can get the backup drive mounted RW by the end of the day that's enough.

Thanks

At the end I attached some recent 3.13 crash logs (maybe they are of some help).

[Tue Oct 28 12:01:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Tue Oct 28 12:01:35 2014] btrfs D ffff88007fc14280 0 3820 3202 0x00000000 [Tue Oct 28 12:01:35 2014] ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480 [Tue Oct 28 12:01:35 2014] 0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800 [Tue Oct 28 12:01:35 2014] 0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0 [Tue Oct 28 12:01:35 2014] Call Trace: [Tue Oct 28 12:01:35 2014] [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs] [Tue Oct 28 12:01:35 2014] [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10 [Tue Oct 28 12:01:35 2014] [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs] [Tue Oct 28 12:01:35 2014] [<ffffffff81199ea1>] ? path_openat+0xd1/0x630 [Tue Oct 28 12:01:35 2014] [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0 [Tue Oct 28 12:01:35 2014] [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540 [Tue Oct 28 12:01:35 2014] [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0 [Tue Oct 28 12:01:35 2014] [<ffffffff81154a88>] ?
do_brk+0x198/0x2f0 [Tue Oct 28 12:01:35 2014] [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0 [Tue Oct 28 12:01:35 2014] [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b [Tue Oct 28 12:03:35 2014] INFO: task btrfs:3820 blocked for more than 120 seconds. [Tue Oct 28 12:03:35 2014] Not tainted 3.13-0.bpo.1-amd64 #1 [Tue Oct 28 12:03:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Tue Oct 28 12:03:35 2014] btrfs D ffff88007fc14280 0 3820 3202 0x00000000 [Tue Oct 28 12:03:35 2014] ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480 [Tue Oct 28 12:03:35 2014] 0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800 [Tue Oct 28 12:03:35 2014] 0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0 [Tue Oct 28 12:03:35 2014] Call Trace: [Tue Oct 28 12:03:35 2014] [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs] [Tue Oct 28 12:03:35 2014] [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10 [Tue Oct 28 12:03:35 2014] [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs] [Tue Oct 28 12:03:35 2014] [<ffffffff81199ea1>] ? path_openat+0xd1/0x630 [Tue Oct 28 12:03:35 2014] [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0 [Tue Oct 28 12:03:35 2014] [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540 [Tue Oct 28 12:03:35 2014] [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0 [Tue Oct 28 12:03:35 2014] [<ffffffff81154a88>] ? do_brk+0x198/0x2f0 [Tue Oct 28 12:03:35 2014] [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0 [Tue Oct 28 12:03:35 2014] [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b [Tue Oct 28 12:03:48 2014] btrfs: found 16561 extents

^ permalink raw reply	[flat|nested] 8+ messages in thread
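Editorial aside on the "counter" line quoted in the message above (btrfs 19534313824 12468488824 3753187048 77%): a small Python sketch of the arithmetic, assuming the three numbers are df's size, used and available columns in 1 KiB blocks and that df rounds the percentage up. The function name `df_summary` is made up for this illustration.

```python
import math

KIB_PER_TIB = 2 ** 30  # 1 TiB expressed in 1 KiB blocks


def df_summary(size_kib, used_kib, avail_kib):
    """Convert df's 1K-block columns to TiB and recompute the Use% column."""
    return {
        "size_tib": round(size_kib / KIB_PER_TIB, 2),
        "used_tib": round(used_kib / KIB_PER_TIB, 2),
        "avail_tib": round(avail_kib / KIB_PER_TIB, 2),
        # Assumption: df computes used / (used + avail) and rounds up.
        "use_pct": math.ceil(100 * used_kib / (used_kib + avail_kib)),
    }


if __name__ == "__main__":
    print(df_summary(19534313824, 12468488824, 3753187048))
```

Under those assumptions the size column comes to about 18.19 TiB, close to 5 x 3.64 TiB of raw device capacity, and the used column to about 11.61 TiB, close to twice the "FS bytes used 5.81TiB" reported by btrfs fi show. That would be consistent with df counting both raid10 copies here, so the 77% figure describes raw space rather than how much more data fits.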
* Re: BTRFS balance segfault, where to go from here
  2014-10-28 11:33     ` Stephan Alz
@ 2014-10-28 13:12       ` E V
  2014-10-28 14:02         ` Rich Freeman
  2014-10-28 13:33       ` Duncan
  1 sibling, 1 reply; 8+ messages in thread
From: E V @ 2014-10-28 13:12 UTC (permalink / raw)
To: Stephan Alz; +Cc: linux-btrfs

I've seen deadlocks on 3.16.3. Personally, I'm staying with 3.14 until something newer stabilizes; I haven't had any issues with it. You might want to try the latest 3.14, though I think there should be a new one pretty soon with quite a few btrfs patches.

On Tue, Oct 28, 2014 at 7:33 AM, Stephan Alz <stephan008@gmx.com> wrote:
> Hello Folks,
>
> Thanks for the help what I got so far. I did what you have recommended and upgraded the kernel to 3.16.
>
> After reboot it automatically resumed the balancing operation. For about 2 hours it went well:
>
> Label: 'backup' ...
> Total devices 5 FS bytes used 5.81TiB
> devid 1 size 3.64TiB used 2.77TiB path /dev/sdc
> devid 2 size 3.64TiB used 2.77TiB path /dev/sdb
> devid 3 size 3.64TiB used 2.77TiB path /dev/sda
> devid 4 size 3.64TiB used 2.76TiB path /dev/sdd
> devid 5 size 3.64TiB used 572.00GiB path /dev/sdf < interestingly the used is now lower than it was
>
> After that all the sudden I just lost the machine. As I thought it crashed with kernel panic but this wasn't like with the 3.13, it killed the whole system. Not even the magic keys worked.
>
> http://i59.tinypic.com/5we5ib.jpg
>
> Then when I tried to reboot it with 3.16 the system always segfaulted at boot time when it tried to mount the btrfs filesystem.
>
> With 3.13 it at least didn't crash the entire system so I booted back to that and managed to stop the balancing:
>
>>btrfs filesystem balance status /mnt/backup
>
> Balance on '/mnt/backup' is paused
> 1 out of about 10 chunks balanced (1 considered), 90% left
>
> Now my filesystem is fortunately back to RW again. Backups can continue tonight.
> And about the "data not being important to backed up", hell yes it is so yesterday I did a "backup of the backups" to a good old XFS filesystem (something which is reliable). The problem is that our whole backup system was designed to use BTRFS. It rsync from a lot of servers to the backup server every night then creates snapshots. Changing this and going back to other filesystem would require a lot of time and effort, possibly rewriting all of our backup scripts. > > What else can I do? > Should I try an even later 3.18 kernel version? > Can this happen because it doesn't have enough space for real? > > > The counter now says that: > btrfs 19534313824 12468488824 3753187048 77% > > The whole point I added the new drive is because it was running out of space. > Somebody could really explain how this balancing works with RAID10 mode. What I want to know that if ANY of the drives are fail do we lose data or not? And the fact that the balancing is paused now changes this or not? If any of the drives out of the 5 would completely fail right now, would I lose all the data? I definitely don't want to leave the system in an inconsistent state like this. At least the backups are only done at nights so if I can get the backup drive mounted to RW by the end of the day that's enough. > > Thanks > > At the end I attached some recent 3.13 crash logs (maybe it's any help). > > > [Tue Oct 28 12:01:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [Tue Oct 28 12:01:35 2014] btrfs D ffff88007fc14280 0 3820 3202 0x00000000 > [Tue Oct 28 12:01:35 2014] ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480 > [Tue Oct 28 12:01:35 2014] 0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800 > [Tue Oct 28 12:01:35 2014] 0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0 > [Tue Oct 28 12:01:35 2014] Call Trace: > [Tue Oct 28 12:01:35 2014] [<ffffffffa02c486d>] ? 
btrfs_pause_balance+0x7d/0xf0 [btrfs] > [Tue Oct 28 12:01:35 2014] [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10 > [Tue Oct 28 12:01:35 2014] [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs] > [Tue Oct 28 12:01:35 2014] [<ffffffff81199ea1>] ? path_openat+0xd1/0x630 > [Tue Oct 28 12:01:35 2014] [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0 > [Tue Oct 28 12:01:35 2014] [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540 > [Tue Oct 28 12:01:35 2014] [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0 > [Tue Oct 28 12:01:35 2014] [<ffffffff81154a88>] ? do_brk+0x198/0x2f0 > [Tue Oct 28 12:01:35 2014] [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0 > [Tue Oct 28 12:01:35 2014] [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b > [Tue Oct 28 12:03:35 2014] INFO: task btrfs:3820 blocked for more than 120 seconds. > [Tue Oct 28 12:03:35 2014] Not tainted 3.13-0.bpo.1-amd64 #1 > [Tue Oct 28 12:03:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [Tue Oct 28 12:03:35 2014] btrfs D ffff88007fc14280 0 3820 3202 0x00000000 > [Tue Oct 28 12:03:35 2014] ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480 > [Tue Oct 28 12:03:35 2014] 0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800 > [Tue Oct 28 12:03:35 2014] 0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0 > [Tue Oct 28 12:03:35 2014] Call Trace: > [Tue Oct 28 12:03:35 2014] [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs] > [Tue Oct 28 12:03:35 2014] [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10 > [Tue Oct 28 12:03:35 2014] [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs] > [Tue Oct 28 12:03:35 2014] [<ffffffff81199ea1>] ? path_openat+0xd1/0x630 > [Tue Oct 28 12:03:35 2014] [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0 > [Tue Oct 28 12:03:35 2014] [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540 > [Tue Oct 28 12:03:35 2014] [<ffffffff8119c4c1>] ? 
do_vfs_ioctl+0x81/0x4d0 > [Tue Oct 28 12:03:35 2014] [<ffffffff81154a88>] ? do_brk+0x198/0x2f0 > [Tue Oct 28 12:03:35 2014] [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0 > [Tue Oct 28 12:03:35 2014] [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b > [Tue Oct 28 12:03:48 2014] btrfs: found 16561 extents > > Sent: Tuesday, October 28, 2014 at 1:07 AM > From: Duncan <1i5t5.duncan@cox.net> > To: linux-btrfs@vger.kernel.org > Subject: Re: BTRFS balance segfault, where to go from here > Chris Murphy posted on Mon, 27 Oct 2014 10:51:16 -0600 as excerpted: > >> On Oct 27, 2014, at 3:26 AM, Stephan Alz <stephan008@gmx.com> wrote: >>> >>> My question is where to go from here? What I going to do right now is >>> to copy the most important data to another separated XFS drive. >>> What I planning to do is: >>> >>> 1, Upgrade the kernel 2, Upgrade BTRFS 3, Continue the balancing. >> >> Definitely upgrade the kernel and see how that goes, there's been many >> many changes since 3.13. I would upgrade the user space tools also but >> that's not as important. > > Just emphasizing... > > Because btrfs is still under heavy development and not yet fully stable, > keeping particularly the kernel updated is vital, because running an old > kernel often means running a kernel with known btrfs bugs, fixed in newer > kernels. > > The userspace isn't quite as important since under normal operation it > mostly simply tells the kernel what operations to perform, and an older > userspace simply means you might be missing newer features. However, > commands such as btrfs check (the old btrfsck) and btrfs restore work > from userspace, so having a current btrfs-progs is important when you run > into trouble and you're trying to fix things. > > That said, a couple of recent kernels has known issues. Don't use the > 3.15 series at all, and be sure you're on 3.16.3 or newer for the 3.16 > series. 
3.17 introduced another bug, with the fix hopefully in 3.17.2 > (it didn't make 3.17.1) and in 3.18-rcs. > > So 3.16.3 or later for stable kernel, or the latest 3.18-rc or live-git > kernel, is what I'd recommend. The other alternative if you're really > conservative is the latest long-term stable series kernel, 3.14.x, as it > gets critical bugfixes as well, tho it won't be quite as current as > 3.16.x or 3.18-rc. But anything older than the latest 3.14.x stable > series is old and outdated in btrfs terms, and is thus not recommended. > And 3.15, 3.16 before 3.16.3, and 3.17 before 3.17.2 (hopefully), are > blackout versions due to known btrfs bugs. Avoid them. > > Of course with btrfs still not fully stable, the usual sysadmin rule of > thumb that if you don't have a tested backup you don't have a backup, and > if you don't have a backup, by definition you don't care if you lose the > data, applies more than ever. If you're on not-yet-fully-stable btrfs > and you don't have backups, by definition you don't care if you lose that > data. There's people having to learn that the hard way, tho btrfs > restore can often recover at least some of what would otherwise be lost. > >> FYI you can mount with skip_balance mount option to inhibit resuming >> balance, sometimes pausing the balance isn't fast enough when there are >> balance problems. > > =:^) > >>> Could someone please also explain that how is exactly the raid10 setup >>> works with ODD number of drives with btrfs? >>> Raid10 should be a stripe of mirrors. Now then this sdf drive is >>> mirrored or striped or what? >> >> I have no idea honestly. Btrfs is very tolerant of adding odd number and >> sizes of devices, but things get a bit nutty in actual operation >> sometimes. > > In btrfs, raid1, including the raid1 side of raid10, is defined as > exactly two copies of the data, one on each of two different devices. 
> These copies are allocated by chunk size, 1 GiB size for data, quarter > GiB size for metadata, and chunks are normally allocated on the device > with the most unallocated space available, provided the other constraints > (such as don't but both copies on the same device) are met. > > Btrfs raid0 stripes will be as wide as possible, but again are allocated > a chunk at a time, in sub-chunk-size strips. > > While I've not run btrfs raid10 personally and thus (as a sysadmin not a > dev) can't say for sure, what this implies to me is that, assuming equal > sized devices, an odd number of devices in raid10 will alternate skipping > one device at each chunk allocation. > > So with a five same-size device btrfs raid10, if I'm not mistaken, btrfs > will allocate chunks from four at once, two mirrors, two stripes, with > the fifth one unused for that chunk allocation. However, at the next > chunk allocation, the device skipped in the previous allocation will now > have the most free space and will thus get the first allocation, with the > one of the other four devices skipped in that allocation round. After > five allocation rounds (assuming all allocation rounds were 1 GiB data > chunks, not quarter-GiB metadata), usage should thus be balanced across > all five devices. > > Of course with six same-size devices, because btrfs raid1 does exactly > two copies, no more, each stripe will be three devices wide. > > > As for the dataloss question, unlike say raid56 mode which is known to be > effectively little more than expensive raid0 at this point, raid10 should > be as reliable as raid1, etc. But I'd refer again to that sysadmin's > rule of thumb above. If you don't have tested backups, you don't have > backups, and if you don't have backups, the data is by definition not > valuable enough to be worth the hassle of backing it up; the calculated > risk cost of data loss is lower than the given time required to make, > test and keep current the backups. 
> After that, it's your decision
> whether you value that data more than the time required to make and
> maintain those backups, or not, given the risk factor including the fact
> that btrfs is still under heavy development and is not yet fully stable.
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman
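[Editor's sketch] The kernel guidance in the quoted message (nothing older than 3.14.x, avoid 3.15 entirely, 3.16 only from 3.16.3, 3.17 only from 3.17.2) can be written down as a small check. This is a minimal sketch, assuming the version string starts with `major.minor[.patch]`; the helper names `parse_kernel_version` and `classify_for_btrfs` are hypothetical, and the thresholds simply restate the opinions in this October 2014 thread (the 3.17.2 fix was described as "hopefully", and the reply above reports deadlocks on 3.16.3), so treat the result as a reading aid rather than an authoritative compatibility list.

```python
import re


def parse_kernel_version(release):
    """Return (major, minor, patch) from a string such as '3.16.3'.

    A missing patch level (e.g. '3.13-0.bpo.1-amd64') is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in match.groups())


def classify_for_btrfs(release):
    """Classify a kernel release using the thresholds quoted in this thread."""
    major, minor, patch = parse_kernel_version(release)
    if (major, minor) < (3, 14):
        return "outdated"   # older than the 3.14 long-term series
    if (major, minor) == (3, 15):
        return "avoid"      # whole series named as a blackout version
    if (major, minor) == (3, 16) and patch < 3:
        return "avoid"      # 3.16 before 3.16.3
    if (major, minor) == (3, 17) and patch < 2:
        return "avoid"      # 3.17 before 3.17.2 (fix only "hopefully" there)
    return "ok"


for release in ["3.13-0.bpo.1-amd64", "3.14.22", "3.15.8", "3.16.3", "3.17.1"]:
    print(release, "->", classify_for_btrfs(release))
```

The first entry is the `uname -r` style string from the original report; it carries no patch level, which does not matter here because anything in the 3.13 series falls below the 3.14 floor.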
* Re: BTRFS balance segfault, where to go from here
From: Rich Freeman @ 2014-10-28 14:02 UTC
To: E V; +Cc: Stephan Alz, linux-btrfs

On Tue, Oct 28, 2014 at 9:12 AM, E V <eliventer@gmail.com> wrote:
> I've seen dead locks on 3.16.3. Personally, I'm staying with 3.14
> until something newer stabilizes, haven't had any issues with it. You
> might want to try the latest 3.14, though I think there should be a
> new one pretty soon with quite a few btrfs patches.

Yeah, I forget what drove me to switch to a newer kernel, but I'm wishing I had stuck with 3.14. The last set of stable kernels has been a pretty rough ride. :)

My sense browsing the list is that the activity level has picked up a bit, and that might be why 3.15-17 have been a bit more bug-ridden than is normal. For the long-term it is actually a good sign for the vitality of btrfs. But, I'll probably track 3.17 until a new longterm is announced and be a bit more conservative.

--
Rich
* Re: BTRFS balance segfault, where to go from here
From: Duncan @ 2014-10-28 13:33 UTC
To: linux-btrfs

Stephan Alz posted on Tue, 28 Oct 2014 12:33:12 +0100 as excerpted:

> And about the "data not being important to backed up", hell yes it is so
> yesterday I did a "backup of the backups" to a good old XFS filesystem
> (something which is reliable).

Makes sense. FWIW, my second backup is to reiserfs, which (counter to reputation) I've found extremely reliable, even thru various bits of faulty hardware over the years, at least since the switch to data=ordered by default.

> The problem is that our whole backup
> system was designed to use BTRFS. It rsync from a lot of servers to the
> backup server every night then creates snapshots. Changing this and
> going back to other filesystem would require a lot of time and effort,
> possibly rewriting all of our backup scripts.

Ouch. The filesystem access and write pattern that rsync uses appears to be quite stressful for btrfs, and has triggered a number of race-condition and similar bugs over time as the filesystem has continued to develop and mature. They get fixed, but the number of times rsync has been a trigger does demonstrate that it puts rather higher stress on a btrfs than most access and write patterns.

Between that and the fact that btrfs /isn't/ yet fully stable and mature, it's not a filesystem I'd normally recommend... yet... for production or production backup use, where down-time waiting for the non-btrfs second level backup to restore is really going to hurt. Here I'm running it, but it's just my own system and primary backup, and if it goes down, nothing but a bit of personal inconvenience and stress is at stake.
In hindsight, I guess you'd do a bit more research into your backup filesystem before designing a backup system dependent on it, but kinda late for that now. Like I said, ouch.

FWIW, you might look into zfs, either on Linux, or under FreeBSD or the like. While it does have license issues on Linux (and isn't an option for me), it's the closest parallel to the btrfs feature set, except it's actually mature. I'm told it does require significantly more memory, preferably ECC, however. Under the circumstances, it may be less of a redesign to substitute it in place of btrfs and use its mature feature-set, even if you have to throw some money into hardware to run it properly, than it'd be to try to redesign your entire backup setup so it's not dependent on otherwise btrfs-specific features such as snapshots, etc.

Since it's not an option here I've not looked into it too closely personally, and don't know if it'll fit your needs, but if it does, it may well be simpler to substitute it into the existing backup setup without rewriting the WHOLE thing, than to do that full rewrite from scratch, without the btrfs/zfs features. I'd at least look into it, assuming you haven't already.

> What else can I do?
> Should I try an even later 3.18 kernel version?
> Can this happen because it doesn't have enough space for real?
>
>
> The counter now says that:
> btrfs 19534313824 12468488824 3753187048 77%
>
> The whole point I added the new drive is because it was running out of
> space.
> Somebody could really explain how this balancing works with RAID10 mode.
> What I want to know that if ANY of the drives are fail do we lose data
> or not? And the fact that the balancing is paused now changes this or
> not? If any of the drives out of the 5 would completely fail right now,
> would I lose all the data? I definitely don't want to leave the system
> in an inconsistent state like this.
> At least the backups are only done
> at nights so if I can get the backup drive mounted to RW by the end of
> the day that's enough.

In theory, your data isn't in danger unless two devices fail at once, because the not yet rebalanced data is raid10 over four devices, while the rebalanced data is raid10 over five, so either way, dropout of a single device should at worst force the filesystem read-only.

And one of the reasons rebalance does require additional space is because it does /not/ rewrite in-place, but creates new chunks to write the data and metadata in, and only deactivates the old ones once the new ones are fully written and online. So the balance activity itself shouldn't put the data in further danger; you should have two full copies of every chunk at every possible point.

However, btrfs is /not/ yet fully stable, and you're obviously running into a bug of /some/ sort. So while the theory says you're fine as long as you lose only a single device, the unknown nature of the bugs you're already seeing, and more specifically the unknown effect those bugs might have on the raid10 mode, means reality can't guarantee that. Or at least I can't. But I'm only an admin and list regular, not a dev, so it's not as if I can look at the code and the bugs and personally say, one way or the other, that they do or don't affect the raid10 distribution and thus the defined existence of the second copy.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
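[Editor's sketch] The "safe against one failed device, not against two" reasoning in the message above can be checked on a toy layout. This is a minimal sketch under stated assumptions: the chunk layout, the device numbering 1-5 and the helper name `survives` are all hypothetical, each chunk is modelled as a list of mirror pairs, and the model assumes the filesystem behaves as designed, which the message itself notes cannot be guaranteed while bugs are being hit.

```python
from itertools import combinations


def survives(chunks, failed):
    """True if every mirror pair in every chunk keeps at least one live copy."""
    return all(
        any(dev not in failed for dev in pair)
        for chunk in chunks
        for pair in chunk
    )


# Hypothetical mix of chunks during a paused balance. Each chunk is two
# mirror pairs striped together; the pairings are made up for illustration.
chunks = [
    [(1, 2), (3, 4)],  # written before the fifth device was added
    [(1, 2), (3, 4)],
    [(5, 1), (2, 3)],  # rewritten by the balance, now touching device 5
    [(4, 5), (1, 2)],
]
devices = [1, 2, 3, 4, 5]

# Any single failed device leaves one copy of every mirror pair.
single = {d: survives(chunks, {d}) for d in devices}
# Some two-device failures take out both copies of a pair.
double_losses = [
    pair for pair in combinations(devices, 2) if not survives(chunks, set(pair))
]

print(single)
print(double_losses)
```

In this layout every single-device failure is survivable, whether the affected chunks were written before or after the fifth device joined, while a two-device failure loses data whenever the two devices happen to hold both copies of some mirror pair.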
* Re: BTRFS balance segfault, where to go from here
From: Rich Freeman @ 2014-10-28 17:01 UTC
To: Duncan; +Cc: Btrfs BTRFS

On Tue, Oct 28, 2014 at 9:33 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Since it's not an option here I've not looked into it too closely
> personally, and don't know if it'll fit your needs, but if it does, it
> may well be simpler to substitute it into the existing backup setup
> without rewriting the WHOLE thing, than to do that full rewrite from
> scratch, without the btrfs/zfs features. I'd at least look into it,
> assuming you haven't already.

I haven't researched zfs as thoroughly as btrfs and I'm not running it, but you're certainly right that it is more mature (though I would not say that zfs on linux is as mature as zfs on BSD or especially Solaris).

Keep in mind that ZFS is marketed more towards enterprise workloads. It isn't quite as dynamic as btrfs is intended to be, though in truth many of those btrfs features like reshaping a raid5 aren't implemented yet. My sense is that you're going to need to plan ahead a bit more with ZFS, and making changes without doing a full backup/re-create is going to be harder. It also isn't designed for SSD (though it does have features for SSD caching of the write log and I think also read-caching, which is something that does not yet exist for btrfs).

From what I understand of both I'd say that btrfs actually has the better overall design, but zfs just has a LOT more maturity. I think that btrfs will eventually overtake it, but just when that will happen is anybody's guess, and it certainly isn't there today. The one thing that zfs does have going for it is that you're very unlikely to get BUGs and PANICs anytime you do something as simple as running rsync on it.

I will also note that I rsync data off of my btrfs filesystem all the time without issue.
I do not have experience with using rsync to write TO a btrfs filesystem. Right now I don't trust btrfs send enough to rely on it - the whole purpose of using rsync right now is to back up my btrfs data to an ext4 partition, which lets me sleep well at night while still getting to play around with btrfs and make use of features like snapshots/etc. :)

If I was running a large (i.e. measured in 10s of disks) storage system I'd probably go with ZFS now. In such a setup being limited to RAID6s of maybe 7 drives each and having to add/remove drives 7 at a time wouldn't be a big deal. When you're running a system with 6 disks total that is a much bigger limitation. If you look at something like Backblaze's storage pods, that is the perfect example of the kind of situation ZFS was designed to handle. On the other hand, btrfs aims to eventually address that while being a decent default filesystem for your smartphone.

--
Rich
end of thread, newest message: 2014-10-28 17:01 UTC

Thread overview: 8+ messages
2014-10-27  9:26 BTRFS balance segfault, where to go from here Stephan Alz
2014-10-27 16:51 ` Chris Murphy
2014-10-28  0:07   ` Duncan
2014-10-28 11:33     ` Stephan Alz
2014-10-28 13:12       ` E V
2014-10-28 14:02         ` Rich Freeman
2014-10-28 13:33       ` Duncan
2014-10-28 17:01         ` Rich Freeman