* Issues with unmountable BTRFS raid1 filesystem
@ 2015-10-05 12:30 Austin S Hemmelgarn
  2015-10-05 13:14 ` Hugo Mills
  0 siblings, 1 reply; 5+ messages in thread
From: Austin S Hemmelgarn @ 2015-10-05 12:30 UTC (permalink / raw)
  To: linux-btrfs


I've recently been having issues with a relatively simple setup: a 
two-device BTRFS raid1 on top of two two-device MD RAID0 arrays.  Every 
time I've rebooted since I started using this particular filesystem, 
I've found it unable to mount and have had to recreate it from scratch. 
This is more of an inconvenience than anything else (I don't have 
backups of it, but all the data is trivial to recreate; so trivial, in 
fact, that keeping backups would be more effort than recreating the 
data by hand), but it's still something I would like to try and fix.

First off, general info:
Kernel version: 4.2.1-local+ (4.2.1 with minor modifications, sources 
can be found here: https://github.com/ferroin/linux)
Btrfs-progs version: 4.2

I would post the output of 'btrfs fi show', but it's reporting 
obviously wrong data: it claims only 127MB is used out of 2GB of 
allocations on each 'disk', whereas I had been storing approximately 
4-6GB of actual data on the filesystem.
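
For anyone who wants to sanity-check numbers like that on a filesystem 
that won't mount, this is roughly what I'd run (the device path below 
is made up, not the real one); both read the superblocks directly, so 
they work without a successful mount:

  btrfs fi show /dev/vg0/stripe0
  btrfs-show-super /dev/vg0/stripe0 | grep -E 'bytes_used|total_bytes'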

This particular filesystem is a BTRFS raid1 across two LVM-managed 
DM/MD RAID0 devices, each of which spans two physical hard drives.  I 
have a couple of other filesystems with exactly the same configuration 
that have never displayed this issue.
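
For reference, the stack looks roughly like the following (disk, VG, 
and LV names here are made up for illustration, and I'm showing plain 
LVM striping as a stand-in for the exact DM/MD RAID0 setup on the 
actual machine):

  pvcreate /dev/sda /dev/sdb /dev/sdc /dev/sdd
  vgcreate vg0 /dev/sda /dev/sdb
  vgcreate vg1 /dev/sdc /dev/sdd
  lvcreate -i 2 -l 100%FREE -n stripe0 vg0   # -i 2 stripes across both PVs
  lvcreate -i 2 -l 100%FREE -n stripe1 vg1
  # BTRFS raid1 for both data and metadata across the two striped LVs
  mkfs.btrfs -d raid1 -m raid1 /dev/vg0/stripe0 /dev/vg1/stripe1
  mount /dev/vg0/stripe0 /mnt/scratch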

When I run 'btrfs check' on the filesystem after it refuses to mount, I 
get a number of lines like the following:
bad metadata [<bytenr>, <bytenr>) crossing stripe boundary

followed eventually by:
Errors found in extent allocation tree or chunk allocation
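
If I'm reading the btrfs-progs code correctly, that message just means 
a metadata extent's byte range spans a 64KiB stripe boundary; my rough 
paraphrase of the test (not the actual progs code) is:

  # Does the metadata extent [bytenr, bytenr+len) cross a 64KiB boundary?
  STRIPE_LEN=65536
  crosses_stripe() {
      local bytenr=$1 len=$2
      [ $(( bytenr / STRIPE_LEN )) -ne $(( (bytenr + len - 1) / STRIPE_LEN )) ]
  }
  # Example: a 16KiB metadata block starting 4KiB before a stripe boundary
  crosses_stripe 126976 16384 && echo 'crossing stripe boundary'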

As is typical of a failed mount, dmesg shows 'failed to read the 
system array on <device>' followed by 'open_ctree failed'.
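
In case it's useful, this is the sort of thing I'd normally try next 
before giving up and recreating the filesystem (device path is made up 
again, and none of this is something I've already run here):

  dmesg | grep -i btrfs                        # capture the full failure messages
  mount -o ro,recovery /dev/vg0/stripe0 /mnt   # fall back to the backup tree roots
  btrfs rescue super-recover -v /dev/vg0/stripe0   # check/restore superblock copies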

I doubt that this is a hardware issue because:
1. The memory is brand new, and a 48-hour burn-in test showed no 
errors.
2. A failing storage controller, PSU, or CPU would manifest as many 
more issues than just this.
3. A disk failure would mean that two different disks, from different 
manufacturing lots, are encountering errors on exactly the same LBAs at 
exactly the same time, which, while possible, is astronomically 
unlikely for disks bigger than a few hundred gigabytes (the disks in 
question are 1TB each).





Thread overview: 5+ messages
2015-10-05 12:30 Issues with unmountable BTRFS raid1 filesystem Austin S Hemmelgarn
2015-10-05 13:14 ` Hugo Mills
2015-10-05 14:19   ` Austin S Hemmelgarn
2015-10-05 16:01     ` Austin S Hemmelgarn
2015-10-05 16:04     ` Holger Hoffstätte
