On 2015-07-16 07:49, Austin S Hemmelgarn wrote:
> On 2015-07-14 07:49, Austin S Hemmelgarn wrote:
>> So, after experiencing this same issue multiple times (on almost a
>> dozen different kernel versions since 4.0) and ruling out the
>> possibility of it being caused by my hardware (or at least, the RAM,
>> SATA controller and disk drives themselves), I've decided to report it
>> here.
>>
>> The general symptom is that my raid6-profile filesystems work fine for
>> multiple weeks, until I either reboot or otherwise try to remount
>> them, at which point the system refuses to mount them.
>>
>> I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've
>> been seeing this with versions of both since 4.0.
>>
>> Output of 'btrfs fi show' for the most recent fs that I had this issue
>> with:
>> Label: 'altroot'  uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5
>>     Total devices 4 FS bytes used 9.70GiB
>>     devid    1 size 24.00GiB used 6.03GiB path /dev/mapper/vg-altroot.0
>>     devid    2 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.1
>>     devid    3 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.2
>>     devid    4 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.3
>>
>> btrfs-progs v4.1
>>
>> Each of the individual LVs in the FS is just a flat chunk of space on
>> a separate disk from the others.
>>
>> The FS itself passes btrfs check just fine (no reported errors, exit
>> value of 0), but the kernel refuses to mount it with the message
>> 'open_ctree failed'.
>>
>> I've run 'btrfs rescue chunk-recover' and attached the output from that.
>>
>> Here's a link to an image from 'btrfs-image -c9 -w':
>> https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0
>> (That link will expire in 30 days, let me know if you need access to
>> it beyond that).
>>
>> The filesystems in question all see relatively light but consistent
>> usage as targets for receiving daily incremental snapshots for
>> on-system backups (and because I know someone will mention it: yes, I
>> do have other backups of the data; these are just my online backups).
>>
> Secondary but possibly related issue: I'm seeing similar problems with
> all data/metadata profiles when using BTRFS on top of a dm-thinp volume
> with zeroing mode turned off (that is, discard doesn't clear data from
> the areas that were discarded).
>
Following up further on this specific issue, I've tracked it down to
dm-thinp not clearing the discard_zeroes_data flag on the devices when
you turn off zeroing mode.  I'm going to do some more digging regarding
that and probably send a patch to lkml to fix it.
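
For anyone who wants to check whether their own thin volumes advertise
this flag, here is a minimal sketch in Python.  It assumes the kernel
exposes the flag as the queue/discard_zeroes_data attribute under
/sys/block/<dev>/; the function name and the sysfs_root parameter are
just illustrative, and the device names you pass (dm-3 and so on) are
whatever your system happens to use.

```python
from pathlib import Path
from typing import Optional


def read_discard_zeroes_data(dev: str, sysfs_root: str = "/sys/block") -> Optional[bool]:
    """Return the discard_zeroes_data queue flag for a block device.

    dev is a kernel block device name such as "dm-3" (hypothetical here).
    Returns True if the attribute reads "1", False for any other value,
    and None if the attribute is missing or unreadable.
    """
    # Assumed location of the attribute: <sysfs_root>/<dev>/queue/discard_zeroes_data
    attr = Path(sysfs_root) / dev / "queue" / "discard_zeroes_data"
    try:
        return attr.read_text().strip() == "1"
    except OSError:
        # No such device, no such attribute, or no permission to read it.
        return None


if __name__ == "__main__":
    import sys

    # Usage: python3 check_dzd.py dm-3 dm-4 ...
    for name in sys.argv[1:]:
        flag = read_discard_zeroes_data(name)
        state = "unknown" if flag is None else ("set" if flag else "clear")
        print(f"{name}: discard_zeroes_data is {state}")
```

If a thin volume in a pool with zeroing mode turned off still reports
the flag as set, that would match the behaviour described above.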