* Understanding BTRFS storage @ 2015-08-26 8:56 George Duffield 2015-08-26 11:41 ` Austin S Hemmelgarn ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: George Duffield @ 2015-08-26 8:56 UTC (permalink / raw) To: linux-btrfs Hi Is there a more comprehensive discussion/documentation of Btrfs features than is referenced in https://btrfs.wiki.kernel.org/index.php/Main_Page ... I'd love to learn more, but it seems there's no readily available authoritative documentation out there? I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based solution that will involve duplicating a data store on a second machine for backup purposes (the machine is only powered up for backups). Two quick questions: - If I were simply to create a Btrfs volume using 5x3TB drives and not create a raid5/6/10 array, I understand data would be striped across the 5 drives with no redundancy ... i.e. if a drive fails all data is lost? Is this correct? - Is Btrfs RAID10 (for data) ready to be used reliably? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-26 8:56 Understanding BTRFS storage George Duffield @ 2015-08-26 11:41 ` Austin S Hemmelgarn 2015-08-26 11:50 ` Hugo Mills ` (2 subsequent siblings) 3 siblings, 0 replies; 18+ messages in thread From: Austin S Hemmelgarn @ 2015-08-26 11:41 UTC (permalink / raw) To: George Duffield, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 990 bytes --] On 2015-08-26 04:56, George Duffield wrote: > Hi > > Is there a more comprehensive discussion/ documentation of Btrfs > features than is referenced in > https://btrfs.wiki.kernel.org/index.php/Main_Page...I'd love to learn > more but it seems there's no readily available authoritative > documentation out there? > > I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based > solution that will involve duplicating a data store on a second > machine for backup purposes (the machine is only powered up for > backups). > > Two quick questions: > - If I were simply to create a Btrfs volume using 5x3TB drives and not > create a raid5/6/10 array I understand data would be striped across > the 5 drives with no reduncancy ... i.e. if a drive fails all data is > lost? Is this correct? Yes, although the striping is at a much larger granularity than a typical RAID0. > > - Is Btrfs RAID10 (for data) ready to be used reliably? Yes, in general it is. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 3019 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-26 8:56 Understanding BTRFS storage George Duffield 2015-08-26 11:41 ` Austin S Hemmelgarn @ 2015-08-26 11:50 ` Hugo Mills 2015-08-26 11:50 ` Roman Mamedov 2015-08-26 11:50 ` Duncan 3 siblings, 0 replies; 18+ messages in thread From: Hugo Mills @ 2015-08-26 11:50 UTC (permalink / raw) To: George Duffield; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 1701 bytes --] On Wed, Aug 26, 2015 at 10:56:03AM +0200, George Duffield wrote: > Hi > > Is there a more comprehensive discussion/ documentation of Btrfs > features than is referenced in > https://btrfs.wiki.kernel.org/index.php/Main_Page...I'd love to learn > more but it seems there's no readily available authoritative > documentation out there? > > I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based > solution that will involve duplicating a data store on a second > machine for backup purposes (the machine is only powered up for > backups). > > Two quick questions: > - If I were simply to create a Btrfs volume using 5x3TB drives and not > create a raid5/6/10 array I understand data would be striped across > the 5 drives with no reduncancy ... i.e. if a drive fails all data is > lost? Is this correct? With RAID-1 metadata and single data, when you lose a device the FS will continue to be usable. Any data that was stored on the missing device will return an I/O error when you try to read it. With single data, the data space is assigned to devices in 1 GiB chunks in turn. Within that, files that are written once and not modified are likely to be placed linearly within that sequence. Files that get modified may have their modifications placed out of sequence on other chunks and devices. > - Is Btrfs RAID10 (for data) ready to be used reliably? I'd say yes, others may say no. I'd suggest using RAID-1 for now anyway -- it uses the space better when you come to add new devices or replace them (with different sizes). Hugo. -- Hugo Mills | Preventing talpidian orogenesis. hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4 | [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
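A minimal sketch of the two layouts Hugo describes above, for anyone wanting the concrete commands; the device names /dev/sdb through /dev/sdf are purely illustrative:

    # single data, raid1 metadata (the filesystem stays mountable if a drive
    # dies, but data that lived on the lost drive returns I/O errors):
    mkfs.btrfs -d single -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

    # raid1 data and metadata, as suggested above (two copies of everything):
    mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf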
* Re: Understanding BTRFS storage 2015-08-26 8:56 Understanding BTRFS storage George Duffield 2015-08-26 11:41 ` Austin S Hemmelgarn 2015-08-26 11:50 ` Hugo Mills @ 2015-08-26 11:50 ` Roman Mamedov 2015-08-26 12:03 ` Austin S Hemmelgarn 2015-08-26 11:50 ` Duncan 3 siblings, 1 reply; 18+ messages in thread From: Roman Mamedov @ 2015-08-26 11:50 UTC (permalink / raw) To: George Duffield; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 674 bytes --] On Wed, 26 Aug 2015 10:56:03 +0200 George Duffield <forumscollective@gmail.com> wrote: > I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based > solution that will involve duplicating a data store on a second > machine for backup purposes (the machine is only powered up for > backups). What do you want to achieve by switching? As Btrfs RAID5/6 is not safe yet, do you also plan to migrate to RAID10, losing in storage efficiency? Why not use Btrfs in single-device mode on top of your mdadm RAID5/6? Can even migrate without moving any data if you currently use Ext4, as it can be converted to Btrfs in-place. -- With respect, Roman [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
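A rough sketch of the layered setup Roman suggests, created fresh here rather than via btrfs-convert (see the caveat in the following message); device names and array size are illustrative assumptions:

    mdadm --create /dev/md0 --level=6 --raid-devices=5 /dev/sd[b-f]
    mkfs.btrfs /dev/md0        # single-device btrfs on top of the md array
    mount /dev/md0 /mnt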
* Re: Understanding BTRFS storage 2015-08-26 11:50 ` Roman Mamedov @ 2015-08-26 12:03 ` Austin S Hemmelgarn 2015-08-27 2:58 ` Duncan 2015-08-28 8:50 ` George Duffield 0 siblings, 2 replies; 18+ messages in thread From: Austin S Hemmelgarn @ 2015-08-26 12:03 UTC (permalink / raw) To: Roman Mamedov, George Duffield; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 878 bytes --] On 2015-08-26 07:50, Roman Mamedov wrote: > On Wed, 26 Aug 2015 10:56:03 +0200 > George Duffield <forumscollective@gmail.com> wrote: > >> I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based >> solution that will involve duplicating a data store on a second >> machine for backup purposes (the machine is only powered up for >> backups). > > What do you want to achieve by switching? As Btrfs RAID5/6 is not safe yet, do > you also plan to migrate to RAID10, losing in storage efficiency? > > Why not use Btrfs in single-device mode on top of your mdadm RAID5/6? Can even > migrate without moving any data if you currently use Ext4, as it can be > converted to Btrfs in-place. > As of right now, btrfs-convert does not work reliably or safely. I would strongly advise against using it unless you are trying to help get it working again. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 3019 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-26 12:03 ` Austin S Hemmelgarn @ 2015-08-27 2:58 ` Duncan 2015-08-27 12:01 ` Austin S Hemmelgarn 2015-08-28 8:50 ` George Duffield 1 sibling, 1 reply; 18+ messages in thread From: Duncan @ 2015-08-27 2:58 UTC (permalink / raw) To: linux-btrfs Austin S Hemmelgarn posted on Wed, 26 Aug 2015 08:03:40 -0400 as excerpted: > On 2015-08-26 07:50, Roman Mamedov wrote: >> On Wed, 26 Aug 2015 10:56:03 +0200 George Duffield >> <forumscollective@gmail.com> wrote: >> >>> I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based >>> solution that will involve duplicating a data store on a second >>> machine for backup purposes (the machine is only powered up for >>> backups). >> >> What do you want to achieve by switching? As Btrfs RAID5/6 is not safe >> yet, do you also plan to migrate to RAID10, losing in storage >> efficiency? >> >> Why not use Btrfs in single-device mode on top of your mdadm RAID5/6? >> Can even migrate without moving any data if you currently use Ext4, as >> it can be converted to Btrfs in-place. Someone (IIRC it was Austin H) posted what I thought was an extremely good setup, a few weeks ago. Create two (or more) mdraid0s, and put btrfs raid1 (or raid5/6 when it's a bit more mature, I've been recommending waiting until 4.4 and see what the on-list reports for it look like then) on top. The btrfs raid on top lets you use btrfs' data integrity features, while the mdraid0s beneath help counteract the fact that btrfs isn't well optimized for speed yet, the way mdraid has been. And the btrfs raid on top means all is not lost with a device going bad in the mdraid0, as would normally be the case, since the other raid0(s), functioning as the remaining btrfs devices, let you rebuild the missing btrfs device, by recreating the missing raid0. Normally, that sort of raid01 is discouraged in favor of raid10, with raid1 at the lower level and raid0 on top, for more efficient rebuilds, but btrfs' data integrity features change that story entirely. =:^) > As of right now, btrfs-convert does not work reliably or safely. I > would strongly advise against using it unless you are trying to help get > it working again. Seconded. Better to use your existing ext4 as a backup, which you should have anyway if you value your data, and copy the data from that ext4 "backup" to the new btrfs you created with mkfs.btrfs using your preferred options. That leaves the existing ext4 in place /as/ a backup, while starting with a fresh and clean btrfs, setup with exactly the options you want. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 18+ messages in thread
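A rough sketch of the mdraid0-under-btrfs-raid1 arrangement Duncan describes, assuming four example disks split into two md RAID-0 pairs (device names are illustrative):

    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdd /dev/sde
    # btrfs raid1 across the two md stripes: checksums and self-healing at the
    # btrfs layer, striping speed at the md layer
    mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1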
* Re: Understanding BTRFS storage 2015-08-27 2:58 ` Duncan @ 2015-08-27 12:01 ` Austin S Hemmelgarn 2015-08-28 9:47 ` Duncan 0 siblings, 1 reply; 18+ messages in thread From: Austin S Hemmelgarn @ 2015-08-27 12:01 UTC (permalink / raw) To: Duncan, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 2575 bytes --] On 2015-08-26 22:58, Duncan wrote: > Austin S Hemmelgarn posted on Wed, 26 Aug 2015 08:03:40 -0400 as > excerpted: > >> On 2015-08-26 07:50, Roman Mamedov wrote: >>> On Wed, 26 Aug 2015 10:56:03 +0200 George Duffield >>> <forumscollective@gmail.com> wrote: >>> >>>> I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based >>>> solution that will involve duplicating a data store on a second >>>> machine for backup purposes (the machine is only powered up for >>>> backups). >>> >>> What do you want to achieve by switching? As Btrfs RAID5/6 is not safe >>> yet, do you also plan to migrate to RAID10, losing in storage >>> efficiency? >>> >>> Why not use Btrfs in single-device mode on top of your mdadm RAID5/6? >>> Can even migrate without moving any data if you currently use Ext4, as >>> it can be converted to Btrfs in-place. > > Someone (IIRC it was Austin H) posted what I thought was an extremely > good setup, a few weeks ago. Create two (or more) mdraid0s, and put > btrfs raid1 (or raid5/6 when it's a bit more mature, I've been > recommending waiting until 4.4 and see what the on-list reports for it > look like then) on top. The btrfs raid on top lets you use btrfs' data > integrity features, while the mdraid0s beneath help counteract the fact > that btrfs isn't well optimized for speed yet, the way mdraid has been. > And the btrfs raid on top means all is not lost with a device going bad > in the mdraid0, as would normally be the case, since the other raid0(s), > functioning as the remaining btrfs devices, let you rebuild the missing > btrfs device, by recreating the missing raid0. > > Normally, that sort of raid01 is discouraged in favor of raid10, with > raid1 at the lower level and raid0 on top, for more efficient rebuilds, > but btrfs' data integrity features change that story entirely. =:^) > Two additional things: 1. If you use MD RAID1 instead of RAID0, it's just as fast for reads, no slower than on top of single disks for writes, and gets you better data safety guarantees than even raid6 (if you do 2 MD RAID 1 devices with BTRFS raid1 on top, you can lose all but one disk and still have all your data). 2. I would be cautious of MD/DM RAID on the most recent kernels: the clustered MD code that went in recently broke a lot of things initially, and I'm not yet convinced that they have managed to glue everything back together yet (I'm still having occasional problems with RAID1 and RAID10 on LVM), so do some testing on a non-production system first. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 3019 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
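For comparison, the MD RAID1 variant Austin describes differs only in the md level (again with purely illustrative device names); btrfs raid1 on top of two md mirrors keeps four physical copies of every block:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd /dev/sde
    mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1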
* Re: Understanding BTRFS storage 2015-08-27 12:01 ` Austin S Hemmelgarn @ 2015-08-28 9:47 ` Duncan 2015-08-28 12:54 ` Austin S Hemmelgarn 0 siblings, 1 reply; 18+ messages in thread From: Duncan @ 2015-08-28 9:47 UTC (permalink / raw) To: linux-btrfs Austin S Hemmelgarn posted on Thu, 27 Aug 2015 08:01:58 -0400 as excerpted: >> Someone (IIRC it was Austin H) posted what I thought was an extremely >> good setup, a few weeks ago. Create two (or more) mdraid0s, and put >> btrfs raid1 (or raid5/6 when it's a bit more mature, I've been >> recommending waiting until 4.4 and see what the on-list reports for it >> look like then) on top. The btrfs raid on top lets you use btrfs' data >> integrity features, while the mdraid0s beneath help counteract the fact >> that btrfs isn't well optimized for speed yet, the way mdraid has been. >> And the btrfs raid on top means all is not lost with a device going bad >> in the mdraid0, as would normally be the case, since the other >> raid0(s), >> functioning as the remaining btrfs devices, let you rebuild the missing >> btrfs device, by recreating the missing raid0. >> >> Normally, that sort of raid01 is discouraged in favor of raid10, with >> raid1 at the lower level and raid0 on top, for more efficient rebuilds, >> but btrfs' data integrity features change that story entirely. =:^) >> > Two additional things: > 1. If you use MD RAID1 instead of RAID0, it's just as fast for reads, no > slower than on top of single disks for writes, and get's you better data > safety guarantees than even raid6 (if you do 2 MD RAID 1 devices with > BTRFS raid1 on top, you can lose all but one disk and still have all > your data). My hesitation for btrfs raid1 on top of mdraid1, is that a btrfs scrub doesn't scrub all the mdraid component devices. Of course if btrfs scrub finds an error, it will try to rewrite the bad copy from the (hopefully good) other btrfs raid1 copy, and that will trigger a rewrite of both/all copies on that underlying mdraid1, which should catch the bad one in the process no matter which one it was. But if one of the lower level mdraid1 component devices is bad while the other(s) are good, and mdraid happens to pick the good device, it won't even see and thus can't scrub the bad lower-level copy. To avoid that problem, one can of course do an mdraid level scrub followed by a btrfs scrub. The mdraid level scrub won't tell bad from good but will simply ensure they match, and if it happens to pick the bad one at that level, the followon btrfs level scrub will detect that and trigger a rewrite from its other copy, which again, will rewrite both/all the underlying mdraid1 component devices on that btrfs raid1 side, but that still wouldn't ensure that the rewrite actually happened properly, so then you're left redoing both levels yet again, to ensure that. Which in theory can work, but in practice, particularly on spinning rust, you pretty quickly reach a point when you're running 24/7 scrubs, which, again particularly on spinning rust, is going to kill throughput for pretty much any other IO going on at the same time. 
Which is one of the reasons I found btrfs raid1 on mdraid0 so appealing in comparison -- raid0 has only the single copy, which is either correct or incorrect, and if the btrfs scrub turns up a problem, it does the rewrite, and a single second pass of that btrfs scrub can verify that the rewrite happened correctly, because there's no hidden copies being picked more or less randomly at the mdraid level, only the single copy, which is either correct or incorrect. I like that determinism! =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 18+ messages in thread
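The two-level scrub Duncan describes for the md RAID1 case can be driven roughly as follows; the md device name and mount point are assumptions:

    echo check > /sys/block/md0/md/sync_action   # md-level scrub of one component array
    cat /proc/mdstat                             # wait for the check to finish
    btrfs scrub start -B /mnt                    # btrfs-level scrub, verifies checksums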
* Re: Understanding BTRFS storage 2015-08-28 9:47 ` Duncan @ 2015-08-28 12:54 ` Austin S Hemmelgarn 0 siblings, 0 replies; 18+ messages in thread From: Austin S Hemmelgarn @ 2015-08-28 12:54 UTC (permalink / raw) To: Duncan, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 3983 bytes --] On 2015-08-28 05:47, Duncan wrote: > Austin S Hemmelgarn posted on Thu, 27 Aug 2015 08:01:58 -0400 as > excerpted: > >>> Someone (IIRC it was Austin H) posted what I thought was an extremely >>> good setup, a few weeks ago. Create two (or more) mdraid0s, and put >>> btrfs raid1 (or raid5/6 when it's a bit more mature, I've been >>> recommending waiting until 4.4 and see what the on-list reports for it >>> look like then) on top. The btrfs raid on top lets you use btrfs' data >>> integrity features, while the mdraid0s beneath help counteract the fact >>> that btrfs isn't well optimized for speed yet, the way mdraid has been. >>> And the btrfs raid on top means all is not lost with a device going bad >>> in the mdraid0, as would normally be the case, since the other >>> raid0(s), >>> functioning as the remaining btrfs devices, let you rebuild the missing >>> btrfs device, by recreating the missing raid0. >>> >>> Normally, that sort of raid01 is discouraged in favor of raid10, with >>> raid1 at the lower level and raid0 on top, for more efficient rebuilds, >>> but btrfs' data integrity features change that story entirely. =:^) >>> >> Two additional things: >> 1. If you use MD RAID1 instead of RAID0, it's just as fast for reads, no >> slower than on top of single disks for writes, and get's you better data >> safety guarantees than even raid6 (if you do 2 MD RAID 1 devices with >> BTRFS raid1 on top, you can lose all but one disk and still have all >> your data). > > My hesitation for btrfs raid1 on top of mdraid1, is that a btrfs scrub > doesn't scrub all the mdraid component devices. > > Of course if btrfs scrub finds an error, it will try to rewrite the bad > copy from the (hopefully good) other btrfs raid1 copy, and that will > trigger a rewrite of both/all copies on that underlying mdraid1, which > should catch the bad one in the process no matter which one it was. > > But if one of the lower level mdraid1 component devices is bad while the > other(s) are good, and mdraid happens to pick the good device, it won't > even see and thus can't scrub the bad lower-level copy. > > To avoid that problem, one can of course do an mdraid level scrub > followed by a btrfs scrub. The mdraid level scrub won't tell bad from > good but will simply ensure they match, and if it happens to pick the bad > one at that level, the followon btrfs level scrub will detect that and > trigger a rewrite from its other copy, which again, will rewrite both/all > the underlying mdraid1 component devices on that btrfs raid1 side, but > that still wouldn't ensure that the rewrite actually happened properly, > so then you're left redoing both levels yet again, to ensure that. > > Which in theory can work, but in practice, particularly on spinning rust, > you pretty quickly reach a point when you're running 24/7 scrubs, which, > again particularly on spinning rust, is going to kill throughput for > pretty much any other IO going on at the same time. Well yes, but only if you are working with large data sets. In my use case, the usage amounts to write once, read at most twice, and the data sets are both less than 32G, so scrubbing the lower level RAID1 takes about 10 minutes as of right now. 
In particular, the arrays get written to at most once a day, and only read when the primary data sources fail. In my use case, performance isn't as important as up-time. > > Which is one of the reasons I found btrfs raid1 on mdraid0 so appealing > in comparison -- raid0 has only the single copy, which is either correct > or incorrect, and if the btrfs scrub turns up a problem, it does the > rewrite, and a single second pass of that btrfs scrub can verify that the > rewrite happened correctly, because there's no hidden copies being picked > more or less randomly at the mdraid level, only the single copy, which is > either correct or incorrect. I like that determinism! =:^) > [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 3019 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-26 12:03 ` Austin S Hemmelgarn 2015-08-27 2:58 ` Duncan @ 2015-08-28 8:50 ` George Duffield 2015-08-28 9:35 ` Hugo Mills 2015-08-28 9:46 ` Roman Mamedov 1 sibling, 2 replies; 18+ messages in thread From: George Duffield @ 2015-08-28 8:50 UTC (permalink / raw) To: Austin S Hemmelgarn; +Cc: Roman Mamedov, linux-btrfs Running a traditional raid5 array of that size is statistically guaranteed to fail in the event of a rebuild. I also need to expand the size of available storage to accommodate future storage requirements. My understanding is that a Btrfs array is easily expanded without the overhead associated with expanding a traditional array. Add to that the ability to throw varying drive sizes at the problem and a Btrfs RAID array looks pretty appealing. For clarity, my intention is to create a Btrfs array using new drives, not to convert the existing ext4 raid5 array. On Wed, Aug 26, 2015 at 2:03 PM, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote: > On 2015-08-26 07:50, Roman Mamedov wrote: >> >> On Wed, 26 Aug 2015 10:56:03 +0200 >> George Duffield <forumscollective@gmail.com> wrote: >> >>> I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based >>> solution that will involve duplicating a data store on a second >>> machine for backup purposes (the machine is only powered up for >>> backups). >> >> >> What do you want to achieve by switching? As Btrfs RAID5/6 is not safe >> yet, do >> you also plan to migrate to RAID10, losing in storage efficiency? >> >> Why not use Btrfs in single-device mode on top of your mdadm RAID5/6? Can >> even >> migrate without moving any data if you currently use Ext4, as it can be >> converted to Btrfs in-place. >> > As of right now, btrfs-convert does not work reliably or safely. I would > strongly advise against using it unless you are trying to help get it > working again. > ^ permalink raw reply [flat|nested] 18+ messages in thread
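For reference, growing a mounted btrfs array is roughly the following; the new device name and mount point are illustrative:

    btrfs device add /dev/sdg /mnt
    btrfs balance start /mnt    # optional: spread existing data across the new device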
* Re: Understanding BTRFS storage 2015-08-28 8:50 ` George Duffield @ 2015-08-28 9:35 ` Hugo Mills 2015-08-28 15:42 ` Chris Murphy ` (2 more replies) 2015-08-28 9:46 ` Roman Mamedov 1 sibling, 3 replies; 18+ messages in thread From: Hugo Mills @ 2015-08-28 9:35 UTC (permalink / raw) To: George Duffield; +Cc: Austin S Hemmelgarn, Roman Mamedov, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 2350 bytes --] On Fri, Aug 28, 2015 at 10:50:12AM +0200, George Duffield wrote: > Running a traditional raid5 array of that size is statistically > guaranteed to fail in the event of a rebuild. Except that if it were, you wouldn't see anyone running RAID-5 arrays of that size and (considerably) larger. And successfully replacing devices in them. As I understand it, the calculations that lead to the conclusion you quote are based on the assumption that the bit error rate (BER) of the drive is applied on all reads -- this is not the case. The BER is the error rate of the platter after the device has been left unread (and powered off) for some long period of time. (I've seen 5 years been quoted for that). Hugo. > I also need to expand > the size of available storage to accomodate future storage > requirements. My understanding is that a Btrfs array is easily > expanded without the overhead associated with expanding a traditional > array. Add to that the ability to throw varying drive sizes at the > problem and a Btrfs RAID array looks pretty appealing. > > For clarity, my intention is to create a Btrfs array using new drives, > not to convert the existing ext4 raid5 array. > > On Wed, Aug 26, 2015 at 2:03 PM, Austin S Hemmelgarn > <ahferroin7@gmail.com> wrote: > > On 2015-08-26 07:50, Roman Mamedov wrote: > >> > >> On Wed, 26 Aug 2015 10:56:03 +0200 > >> George Duffield <forumscollective@gmail.com> wrote: > >> > >>> I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based > >>> solution that will involve duplicating a data store on a second > >>> machine for backup purposes (the machine is only powered up for > >>> backups). > >> > >> > >> What do you want to achieve by switching? As Btrfs RAID5/6 is not safe > >> yet, do > >> you also plan to migrate to RAID10, losing in storage efficiency? > >> > >> Why not use Btrfs in single-device mode on top of your mdadm RAID5/6? Can > >> even > >> migrate without moving any data if you currently use Ext4, as it can be > >> converted to Btrfs in-place. > >> > > As of right now, btrfs-convert does not work reliably or safely. I would > > strongly advise against using it unless you are trying to help get it > > working again. > > -- Hugo Mills | Beware geeks bearing GIFs hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4 | [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-28 9:35 ` Hugo Mills @ 2015-08-28 15:42 ` Chris Murphy 2015-08-28 17:11 ` Austin S Hemmelgarn 0 siblings, 1 reply; 18+ messages in thread From: Chris Murphy @ 2015-08-28 15:42 UTC (permalink / raw) To: Hugo Mills, George Duffield, Austin S Hemmelgarn, Roman Mamedov, Btrfs BTRFS On Fri, Aug 28, 2015 at 3:35 AM, Hugo Mills <hugo@carfax.org.uk> wrote: > On Fri, Aug 28, 2015 at 10:50:12AM +0200, George Duffield wrote: >> Running a traditional raid5 array of that size is statistically >> guaranteed to fail in the event of a rebuild. > > Except that if it were, you wouldn't see anyone running RAID-5 > arrays of that size and (considerably) larger. And successfully > replacing devices in them. > > As I understand it, the calculations that lead to the conclusion > you quote are based on the assumption that the bit error rate (BER) of > the drive is applied on all reads -- this is not the case. The BER is > the error rate of the platter after the device has been left unread > (and powered off) for some long period of time. (I've seen 5 years > been quoted for that). I think the confusion comes from the Unrecovered Read Error (URE) or "Non-recoverable read errors per bits read" in the drive spec sheet. e.g. on a WDC Red this is written as "<1 in 10^14" but this gets (wrongly) reinterpreted into an *expected* URE once every 12.5TB (not TiB) read, which is of course complete utter bullshit. But it gets repeated all the time. It's as if symbols have no meaning, and < is some sort of arrow, or someone got bored and just didn't want to use a space. That symbol makes the URE value a maximum for what is ostensibly a scientific sample of drives. We have no idea what the minimum is, we don't even know the mean, and it's not in the manufacturer's best interest to do that. The mean between consumer SATA and enterprise SAS may not be all that different, while the maximum is two orders of magnitude better for enterprise SAS so it makes sense to try to upsell us with that promise. -- Chris Murphy ^ permalink raw reply [flat|nested] 18+ messages in thread
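For reference, the arithmetic behind the oft-repeated 12.5TB figure, which (as noted above) treats a spec-sheet upper bound as an expected rate:

    10^14 bits / 8 bits per byte = 1.25 * 10^13 bytes = 12.5 TB (decimal terabytes; roughly 11.4 TiB)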
* Re: Understanding BTRFS storage 2015-08-28 15:42 ` Chris Murphy @ 2015-08-28 17:11 ` Austin S Hemmelgarn 0 siblings, 0 replies; 18+ messages in thread From: Austin S Hemmelgarn @ 2015-08-28 17:11 UTC (permalink / raw) To: Chris Murphy, Hugo Mills, George Duffield, Roman Mamedov, Btrfs BTRFS [-- Attachment #1: Type: text/plain, Size: 2044 bytes --] On 2015-08-28 11:42, Chris Murphy wrote: > On Fri, Aug 28, 2015 at 3:35 AM, Hugo Mills <hugo@carfax.org.uk> wrote: >> On Fri, Aug 28, 2015 at 10:50:12AM +0200, George Duffield wrote: >>> Running a traditional raid5 array of that size is statistically >>> guaranteed to fail in the event of a rebuild. >> >> Except that if it were, you wouldn't see anyone running RAID-5 >> arrays of that size and (considerably) larger. And successfully >> replacing devices in them. >> >> As I understand it, the calculations that lead to the conclusion >> you quote are based on the assumption that the bit error rate (BER) of >> the drive is applied on all reads -- this is not the case. The BER is >> the error rate of the platter after the device has been left unread >> (and powered off) for some long period of time. (I've seen 5 years >> been quoted for that). > > I think the confusion comes from the Unrecovered Read Error (URE) or > "Non-recoverable read errors per bits read" in the drive spec sheet. > e.g. on a WDC Red this is written as "<1 in 10^14" but this gets > (wrongly) reinterpreted into an *expected* URE once every 12.5TB (not > TiB) read, which is of course complete utter bullshit. But it gets > repeated all the time. > > It's as if symbols have no meaning, and < is some sort of arrow, or > someone got bored and just didn't want to use a space. That symbol > makes the URE value a maximum for what is ostensibly a scientific > sample of drives. We have no idea what the minimum is, we don't even > know the mean, and it's not in the manufacturer's best interest to do > that. The mean between consumer SATA and enterprise SAS may not be all > that different, while the maximum is two orders magnitude better for > enterprise SAS so it makes sense to try to upsell us with that > promise. > That probably is the case, the truly sad thing is that there are so many engineers (read as 'people who are supposed to actually pay attention to the specs') who do this on a regular basis. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 3019 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-28 9:35 ` Hugo Mills 2015-08-28 15:42 ` Chris Murphy @ 2015-08-29 8:52 ` George Duffield 2015-08-29 22:28 ` Chris Murphy 2015-09-02 5:01 ` Russell Coker 2 siblings, 1 reply; 18+ messages in thread From: George Duffield @ 2015-08-29 8:52 UTC (permalink / raw) To: Hugo Mills, George Duffield, Austin S Hemmelgarn, Roman Mamedov, linux-btrfs Funny you should say that, whilst I'd read about it it didn't concern me much until Neil Brown himself advised me against expanding the raid5 arrays any further (one was built using 3TB drives and the other using 4TB drives). My understanding is that larger arrays are typically built using more drives of lower capacity. I'm also loathe to use mdadm as expanding arrays takes forever whereas a Btrfs array should expand much quicker. If Btrfs raid isn't yet ready for prime time I'll just hold off doing anything for the moment, frustrating as that is. On Fri, Aug 28, 2015 at 11:35 AM, Hugo Mills <hugo@carfax.org.uk> wrote: > On Fri, Aug 28, 2015 at 10:50:12AM +0200, George Duffield wrote: >> Running a traditional raid5 array of that size is statistically >> guaranteed to fail in the event of a rebuild. > > Except that if it were, you wouldn't see anyone running RAID-5 > arrays of that size and (considerably) larger. And successfully > replacing devices in them. > > As I understand it, the calculations that lead to the conclusion > you quote are based on the assumption that the bit error rate (BER) of > the drive is applied on all reads -- this is not the case. The BER is > the error rate of the platter after the device has been left unread > (and powered off) for some long period of time. (I've seen 5 years > been quoted for that). > > Hugo. > >> I also need to expand >> the size of available storage to accomodate future storage >> requirements. My understanding is that a Btrfs array is easily >> expanded without the overhead associated with expanding a traditional >> array. Add to that the ability to throw varying drive sizes at the >> problem and a Btrfs RAID array looks pretty appealing. >> >> For clarity, my intention is to create a Btrfs array using new drives, >> not to convert the existing ext4 raid5 array. >> >> On Wed, Aug 26, 2015 at 2:03 PM, Austin S Hemmelgarn >> <ahferroin7@gmail.com> wrote: >> > On 2015-08-26 07:50, Roman Mamedov wrote: >> >> >> >> On Wed, 26 Aug 2015 10:56:03 +0200 >> >> George Duffield <forumscollective@gmail.com> wrote: >> >> >> >>> I'm looking to switch from a 5x3TB mdadm raid5 array to a Btrfs based >> >>> solution that will involve duplicating a data store on a second >> >>> machine for backup purposes (the machine is only powered up for >> >>> backups). >> >> >> >> >> >> What do you want to achieve by switching? As Btrfs RAID5/6 is not safe >> >> yet, do >> >> you also plan to migrate to RAID10, losing in storage efficiency? >> >> >> >> Why not use Btrfs in single-device mode on top of your mdadm RAID5/6? Can >> >> even >> >> migrate without moving any data if you currently use Ext4, as it can be >> >> converted to Btrfs in-place. >> >> >> > As of right now, btrfs-convert does not work reliably or safely. I would >> > strongly advise against using it unless you are trying to help get it >> > working again. >> > > > -- > Hugo Mills | Beware geeks bearing GIFs > hugo@... carfax.org.uk | > http://carfax.org.uk/ | > PGP: E2AB1DE4 | ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-29 8:52 ` George Duffield @ 2015-08-29 22:28 ` Chris Murphy 0 siblings, 0 replies; 18+ messages in thread From: Chris Murphy @ 2015-08-29 22:28 UTC (permalink / raw) To: George Duffield Cc: Hugo Mills, Austin S Hemmelgarn, Roman Mamedov, Btrfs BTRFS On Sat, Aug 29, 2015 at 2:52 AM, George Duffield <forumscollective@gmail.com> wrote: > Funny you should say that, whilst I'd read about it it didn't concern > me much until Neil Brown himself advised me against expanding the > raid5 arrays any further (one was built using 3TB drives and the other > using 4TB drives). My understanding is that larger arrays are > typically built using more drives of lower capacity. I'm also loathe > to use mdadm as expanding arrays takes forever whereas a Btrfs array > should expand much quicker. If Btrfs raid isn't yet ready for prime > time I'll just hold off doing anything for the moment, frustrating as > that is. I think a grid of mdadm vs btrfs feature/behavior comparisons might be useful. The main thing to be aware of with btrfs multiple-device setups is that failure handling is really not present, whereas it is with mdadm and lvm raids. This means btrfs tolerates read and write failures where md will "eject" the drive from the array after even one write failure, and after so many read failures (not sure what it is). There's also no spares support. And no notifications of problems, just kernel messages. Instead of notification emails the mdadm way, I think it's better to look at maybe libblockdev and storaged projects since both of those are taking on standardizing the manipulation of mdadm arrays, LVM, LUKS, and other Linux storage technologies. And then projects like (but not limited to) openLMI and a future udisks2 replacement can then get information and state on such things, and propagate that up to the user (with email, text message, web browser, whatever). -- Chris Murphy ^ permalink raw reply [flat|nested] 18+ messages in thread
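Since btrfs currently reports problems only through per-device error counters and kernel messages, monitoring amounts to polling something like the following by hand or from a cron job (mount point assumed):

    btrfs device stats /mnt    # per-device read/write/flush/corruption/generation error counters
    dmesg | grep -i btrfs      # the kernel log is the only other signal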
* Re: Understanding BTRFS storage 2015-08-28 9:35 ` Hugo Mills 2015-08-28 15:42 ` Chris Murphy 2015-08-29 8:52 ` George Duffield @ 2015-09-02 5:01 ` Russell Coker 2 siblings, 0 replies; 18+ messages in thread From: Russell Coker @ 2015-09-02 5:01 UTC (permalink / raw) To: Hugo Mills, linux-btrfs On Fri, 28 Aug 2015 07:35:02 PM Hugo Mills wrote: > On Fri, Aug 28, 2015 at 10:50:12AM +0200, George Duffield wrote: > > Running a traditional raid5 array of that size is statistically > > guaranteed to fail in the event of a rebuild. > > Except that if it were, you wouldn't see anyone running RAID-5 > arrays of that size and (considerably) larger. And successfully > replacing devices in them. Let's not assume that everyone who thinks that they are "successfully" running a RAID-5 array is actually doing so. One of the features of BTRFS is that you won't get undetected data corruption. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-28 8:50 ` George Duffield 2015-08-28 9:35 ` Hugo Mills @ 2015-08-28 9:46 ` Roman Mamedov 1 sibling, 0 replies; 18+ messages in thread From: Roman Mamedov @ 2015-08-28 9:46 UTC (permalink / raw) To: George Duffield; +Cc: Austin S Hemmelgarn, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 858 bytes --] On Fri, 28 Aug 2015 10:50:12 +0200 George Duffield <forumscollective@gmail.com> wrote: > Running a traditional raid5 array of that size is statistically > guaranteed to fail in the event of a rebuild. Yeah I consider RAID5 to be safe up to about 4 devices. As you already have 5 and are looking to expand, I'd recommend going RAID6. The "fail on rebuild" issue is almost completely mitigated by it, perhaps up to a dozen drives or more. Don't know about your usage scenarios, but as for me the loss of storage efficiency in RAID10 compared to RAID6 is unacceptable, and I also don't need the performance benefits of RAID10 at all. So both from the efficiency and stability standpoints my personal choice currently is Btrfs single device mode on top of MD RAID5 (in a 4 drive array) and RAID6 (with 7 drives). -- With respect, Roman [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Understanding BTRFS storage 2015-08-26 8:56 Understanding BTRFS storage George Duffield ` (2 preceding siblings ...) 2015-08-26 11:50 ` Roman Mamedov @ 2015-08-26 11:50 ` Duncan 3 siblings, 0 replies; 18+ messages in thread From: Duncan @ 2015-08-26 11:50 UTC (permalink / raw) To: linux-btrfs George Duffield posted on Wed, 26 Aug 2015 10:56:03 +0200 as excerpted: > Two quick questions: > - If I were simply to create a Btrfs volume using 5x3TB drives and not > create a raid5/6/10 array I understand data would be striped across the > 5 drives with no reduncancy ... i.e. if a drive fails all data is lost? > Is this correct? I'm not actually sure if the data default on a multi-device is raid0 (all data effectively lost) or what btrfs calls single mode, which is what it uses for a single device; on a multi-device fs, single mode is sort of like raid0 but with very large strips. Earlier on it was single mode, but somebody commented that it's raid0 mode now instead, so I'm no longer sure what the current default is. In single mode, files written all at once and not changed, up to a gig in size (that being the nominal data chunk size), will likely appear on a single device. With five devices, dropping out only one should in theory leave many of those files and even a reasonable number of 2 GiB files intact. However, fragmentation or rewriting some data within a file would tend to spread it out among data chunks, and thus likely across more devices, making the chance of losing it higher. Meanwhile, metadata default remains paired-mirrored raid1, regardless of the number of devices. But you can always specify the data and metadata raid levels as desired, assuming you have at least the minimum number of devices required for that raid level. I always specify them here, preferring raid1 for both data and metadata, tho if it were available, I'd probably use 3-way mirroring. That's roadmapped but probably won't be available for a year or so yet, and it'll take some time to stabilize after that. > - Is Btrfs RAID10 (for data) ready to be used reliably? Btrfs raid0/1/10 modes as well as single and (for single device metadata) dup modes are all relatively mature, and should be as stable as btrfs itself, meaning stabilizing, but not fully stable just yet, with bugs from time to time. Basically, that means the sysadmin's backups rule, that if it's not backed up, by action and definition it wasn't valuable, regardless of claims to the contrary (and complete backups are tested: if it's not tested usable/restorable, the backup isn't complete yet), applies double -- really, have backups or you're playing Russian roulette with your data, but those modes are stable enough for daily use, as long as you do have those backups or the data is simply throw-away. Btrfs raid56 (5 and 6, it's the same code dealing with both) modes were nominally code-complete as of 3.19, but are still new enough they've not reached the stability of the rest of btrfs, yet. As such, I've been suggesting that unless people are prepared to deal with that additional potential instability and bugginess, they wait for a year after introduction, effectively five kernel cycles, which should put btrfs-stability-match at about the 4.4 kernel timeframe. Similarly, quota code has been a problem and remains less than stable, so don't use btrfs quotas in the near term (until at least 4.3, then see what behavior looks like), unless of course you're doing so in cooperation with the devs working on it specifically to help test and stabilize it. 
Other features are generally as stable as btrfs as a whole, except that keeping to say 250-ish snapshots per subvolume, 1000-2000 snapshots per filesystem, is recommended, as snapshotting, while it works well in general as long as there's not too many, simply doesn't scale well in terms of maintenance time -- device replaces, balances, btrfs checks, etc. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 18+ messages in thread
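To see which data and metadata profiles an existing filesystem is actually using, and to switch them in place as Duncan describes, something like the following (mount point assumed):

    btrfs filesystem df /mnt                                   # shows the Data/Metadata/System profiles
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt   # convert profiles on a mounted filesystem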