On 2015-10-05 09:14, Hugo Mills wrote:
> On Mon, Oct 05, 2015 at 08:30:17AM -0400, Austin S Hemmelgarn wrote:
>> I've recently been having issues with a relatively simple setup: a
>> two-device BTRFS raid1 on top of two two-device md RAID0 arrays.
>> Every time I've rebooted since I started trying to use this
>> particular filesystem, I've found it unable to mount and have had to
>> recreate it from scratch.  This is more of an inconvenience than
>> anything else (I don't have backups of it, but all the data is
>> trivial to recreate; in fact, it's so trivial that doing backups
>> would be more effort than recreating the data by hand), but it's
>> still something that I would like to try to fix.
>>
>> First off, general info:
>> Kernel version: 4.2.1-local+ (4.2.1 with minor modifications;
>> sources are available at https://github.com/ferroin/linux)
>> Btrfs-progs version: 4.2
>>
>> I would post output from btrfs fi show, but it's spouting
>> obviously wrong data: it says I'm using only 127MB with 2GB of
>> allocations on each 'disk', whereas I had been storing approximately
>> 4-6GB of actual data on the filesystem.
>>
>> This particular filesystem is a BTRFS raid1 across two LVM-managed
>> DM/MD RAID0 devices, each of which spans 2 physical hard drives.  I
>> have a couple of other filesystems with the exact same configuration
>> that have never displayed this issue.
>>
>> When I run 'btrfs check' on the filesystem after it refuses to mount,
>> I get a number of lines like the following:
>> bad metadata [<bytenr>, <bytenr>) crossing stripe boundary
>>
>> followed eventually by:
>> Errors found in extent allocation tree or chunk allocation
>
>     I _think_ this is a bug in mkfs from 4.2.0, fixed in later
> releases of the btrfs-progs.
If so, that's good news (that is, that it's just a mkfs bug).  I guess 
it's time for me to quit waiting around for Gentoo to package the newest 
version and build it myself.
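
For reference, here's my understanding of what the "crossing stripe
boundary" message is checking: a metadata block occupying
[bytenr, bytenr + nodesize) should sit entirely within a single
stripe.  Below is a minimal Python sketch of that test, assuming a
64KiB stripe length (which I believe matches BTRFS_STRIPE_LEN in
btrfs-progs).  The function name is hypothetical; this is not the
actual btrfs-progs code.

```python
# Hypothetical sketch of the "crossing stripe boundary" test.
# Assumption: stripe length is 64 KiB (believed to match BTRFS_STRIPE_LEN).
STRIPE_LEN = 64 * 1024


def crosses_stripe_boundary(bytenr: int, length: int,
                            stripe_len: int = STRIPE_LEN) -> bool:
    """Return True if the half-open range [bytenr, bytenr + length)
    starts in one stripe and ends in another."""
    first_stripe = bytenr // stripe_len
    last_stripe = (bytenr + length - 1) // stripe_len  # last byte's stripe
    return first_stripe != last_stripe
```

With a 16KiB nodesize, a block at 48KiB stays within the first stripe,
while one at 56KiB straddles the 64KiB boundary and would be flagged.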
>
>> As is typical of a failed mount, dmesg shows 'failed to read the
>> system array on <device>' and 'open_ctree failed'.
>>
>> I doubt that this is a hardware issue because:
>> 1. Memory is brand new, and I ran a 48 hour burn-in test that showed
>> no errors.
>> 2. A failing storage controller, PSU, or CPU would be causing many
>> more problems than just this.
>> 3. A disk failure would mean that two different disks, from
>> different manufacturing lots, are encountering errors on exactly the
>> same LBAs at exactly the same time, which, while possible, is
>> astronomically unlikely for disks bigger than a few hundred
>> gigabytes (the disks in question are 1TB each).
>>