linux-btrfs.vger.kernel.org archive mirror
* lost with degraded RAID1
@ 2014-01-29 19:16 Johan Kröckel
  2014-01-30  3:28 ` Duncan
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Johan Kröckel @ 2014-01-29 19:16 UTC (permalink / raw)
  To: linux-btrfs

My situation:

Former btrfs-RAID1 on two luks encrypted partitions (bunkerA and bunkerB).
Disk holding bunkerB died online.
Now I started rebalancing bunkerA to single, but partway through I had the
idea to try a reboot ("maybe the disk reappears?"). So I stopped the
rebalancing and rebooted.
Now the damaged disk disappeared completely. No sign of it in /proc/diskstats.
So I wanted to finish the rebalancing to single.

Label: bunker  uuid: 7f954a85-7566-4251-832c-44f2d3de2211
        Total devices 2 FS bytes used 1.58TiB
        devid    1 size 1.82TiB used 1.58TiB path
        devid    2 size 1.82TiB used 1.59TiB path /dev/mapper/bunkerA

btrfs filesystem df /mnt
Data, RAID1: total=1.58TiB, used=1.57TiB
Data, single: total=11.00GiB, used=10.00GiB
System, RAID1: total=8.00MiB, used=240.00KiB
Metadata, RAID1: total=3.00GiB, used=1.61GiB

The 11 GiB of single Data was balanced this way before the reboot, while
there was no actually functioning second device in the array!

But now, I can't mount bunkerA degraded,RW because degraded
filesystems are not allowed to be mounted RW (?). The consequence is
that I felt unable to do anything other than mount it RO and back
it up (all data is fine) to another filesystem. I don't have a new
substitute disk yet, so I couldn't test whether I can add a new
device to a RO-mounted filesystem. Adding a loop device didn't work.

What are my options now? What further information could be useful? I
can't believe that I have all my data in front of me and have to start
over again with a new filesystem because of safety checks and no
option to force a RW mount.

Thanks

Johan Kröckel


* Re: lost with degraded RAID1
  2014-01-29 19:16 lost with degraded RAID1 Johan Kröckel
@ 2014-01-30  3:28 ` Duncan
  2014-01-30 11:53 ` Johan Kröckel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Duncan @ 2014-01-30  3:28 UTC (permalink / raw)
  To: linux-btrfs

Johan Kröckel posted on Wed, 29 Jan 2014 20:16:09 +0100 as excerpted:

[btrfs raid1 on a pair of luks-encrypted partitions, one went south and a 
balance back to single was started on the good one, interrupted by a 
reboot.]

> But now, I can't mount bunkerA degraded,RW because degraded filesystems
> are not allowed to be mounted RW (?).

AFAIK that shouldn't be the case.  Degraded should allow the RW mount -- 
I know it did some kernels ago when I tried it then, and if it changed, 
it's news to me too, in which case I need to do some reevaluation here.

What I think /might/ have happened is that there's some other damage to 
the filesystem unrelated to the missing device in the raid1, which forces 
it read-only as soon as btrfs finds that damage.  However, mount -o 
remount,rw,degraded should still work, I /think/.


> The consequence is
> that I felt unable to do anything other than mount it RO and back it
> up (all data is fine) to another filesystem. I don't have a new
> substitute disk yet, so I couldn't test whether I can add a new device
> to a RO-mounted filesystem. Adding a loop device didn't work.

Well, given that btrfs isn't yet entirely stable, and both mkfs.btrfs and 
the btrfs wiki at btrfs.wiki.kernel.org warn to keep tested backups in 
case something goes wrong with the btrfs you're testing and you lose that 
copy... you should have already had that backup, and wouldn't need to
make it, only perhaps update a few files that changed since your last
backup whose changes you were willing to lose if it came to it.
But since you can still mount ro, you might as well take the chance to
update the backup while you can.

> What are my options now? What further information could be useful? I
> can't believe that I have all my data in front of me and have to start
> over again with a new filesystem because of safety checks and no
> option to force a RW mount.

As I said, to the best of my (non-dev btrfs user and list regular) 
knowledge, mount -o degraded,rw should work.  If it doesn't, there's likely 
something else going on triggering the ro remount, and seeing any related 
dmesg output would presumably help.

One other thing that might help is the skip_balance mount option, to 
avoid restarting the in-process balance immediately.  See if that lets 
you mount degraded,rw without immediately forcing back to ro.  You can 
then try resuming the balance.  If that fails, try canceling the balance 
and then starting a new balance using balance filters (see the balance 
questions in the FAQ on the wiki, which link to the balance-filters page).
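
For reference, the rough sequence that advice amounts to (untested here;
the mountpoint and the single convert targets are my assumptions from this
thread, so adjust as needed):

  # mount degraded without auto-resuming the interrupted balance
  mount -o degraded,skip_balance /dev/mapper/bunkerA /mnt

  # then either resume the paused balance...
  btrfs balance resume /mnt

  # ...or cancel it and start a fresh one with explicit convert filters
  btrfs balance cancel /mnt
  btrfs balance start -dconvert=single -mconvert=single /mnt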

Meanwhile, at least you have ro access to all the data and can take that 
backup that you apparently ignored the warnings about making, previously.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Re: lost with degraded RAID1
  2014-01-29 19:16 lost with degraded RAID1 Johan Kröckel
  2014-01-30  3:28 ` Duncan
@ 2014-01-30 11:53 ` Johan Kröckel
  2014-01-30 17:57   ` Chris Murphy
                     ` (2 more replies)
  2014-01-30 17:33 ` Chris Murphy
  2014-01-31  6:40 ` Johan Kröckel
  3 siblings, 3 replies; 25+ messages in thread
From: Johan Kröckel @ 2014-01-30 11:53 UTC (permalink / raw)
  To: linux-btrfs

[Answer from Duncan, 1i5t5.duncan@DOMAIN.HIDDEN (Thanks for the try)]

[AFAIK that shouldn't be the case.  Degraded should allow the RW mount --
I know it did some kernels ago when I tried it then, and if it changed,
it's news to me too, in which case I need to do some reevaluation here.]

[What I think /might/ have happened is that there's some other damage to
the filesystem unrelated to the missing device in the raid1, which forces
it read-only as soon as btrfs finds that damage.  However, mount -o
remount,rw,degraded should still work, I /think/.]

http://www.spinics.net/lists/linux-btrfs/msg20164.html

[you should have already had that backup, and wouldn't need to
make it, only perhaps update a few files that changed since your last
backup that you were willing to lose the changes too if it came to it,
but since you can still mount ro, you might as well take the chance to
update the backup given that you can.]

I have backups on other disks; that's not my problem. That was
just a note to show that the data IS totally fine. It was/is
RAID1, so why shouldn't it be?

[As I said, to the best of my (non-dev btrfs user and list regular)
knowledge, mount -o degraded,rw should work.]

No, it doesn't.
Syslog says: "Jan 30 12:44:02 fortknox kernel: [756677.795661] Btrfs:
too many missing devices, writeable mount is not allowed"

[One other thing that might help is the skip_balance mount option, to
avoid restarting the in-process balance immediately.]

No, same in syslog.

[If that fails, try canceling the balance
and then starting a new balance using balance filters...]

The balance was already canceled, AND it's not possible to balance a ro
filesystem.

[Meanwhile, at least you have ro access to all the data and can take that
backup that you apparently ignored the warnings about making, previously.]

Yes, I can, and no, I did not.


* Re: lost with degraded RAID1
  2014-01-29 19:16 lost with degraded RAID1 Johan Kröckel
  2014-01-30  3:28 ` Duncan
  2014-01-30 11:53 ` Johan Kröckel
@ 2014-01-30 17:33 ` Chris Murphy
  2014-01-30 17:58   ` Hugo Mills
       [not found]   ` <CABgvyo822bOAHeA1GH28MPaBAU+Zdi72MD_uwL+dhopt+nwMig@mail.gmail.com>
  2014-01-31  6:40 ` Johan Kröckel
  3 siblings, 2 replies; 25+ messages in thread
From: Chris Murphy @ 2014-01-30 17:33 UTC (permalink / raw)
  To: Btrfs BTRFS


On Jan 29, 2014, at 12:16 PM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:

> My situation:
> 
> Former btrfs-RAID1 on two luks encrypted partitions (bunkerA and bunkerB).
> Disk holding bunkerB died online.
> Now I started rebalancing bunkerA to single,


You're doing an online conversion of a degraded raid1 volume into single? Does anyone know if this is expected or intended to work?

Obviously the tools allow you to start it, but for some reason it doesn't seem like a good idea to me.  I'd just leave it -o degraded until I get a replacement drive. Deciding to do a conversion on a degraded volume doesn't seem like proper timing.

> 
> Now the damaged disk disappeared completely. No sign of it in /proc/diskstats.

I don't know what you mean by it's disappeared completely. Are you talking about the physical block device? Or the dm logical block device?

What do you get for:
lsblk
blkid
btrfs device scan --all-devices
btrfs fi show



> But now, I can't mount bunkerA degraded,RW because degraded
> filesystems are not allowed to be mounted RW (?)

Have you tried? What errors do you get in user space and dmesg?


Chris Murphy



* Re: lost with degraded RAID1
  2014-01-30 11:53 ` Johan Kröckel
@ 2014-01-30 17:57   ` Chris Murphy
       [not found]     ` <CABgvyo9FYGSYpj+jL1oqCvtNUqsC8HZ+z=x-Gz7naWoEcCKWpQ@mail.gmail.com>
  2014-01-30 22:18   ` Duncan
  2014-01-31  2:19   ` Chris Murphy
  2 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2014-01-30 17:57 UTC (permalink / raw)
  To: Johan Kröckel; +Cc: linux-btrfs


On Jan 30, 2014, at 4:53 AM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:
> 
> [As I said, to the best of my (non-dev btrfs user and list regular)
> knowledge, mount -o degraded,rw should work.]
> 
> No, it doesnt.
> Syslog says: "Jan 30 12:44:02 fortknox kernel: [756677.795661] Btrfs:
> too many missing devices, writeable mount is not allowed"

Is this encrypted Btrfs volume used for rootfs? Or is it only for data?  If it's only for data, then make sure the volume (all subvolumes) is unmounted, then mount with this:

-o degraded,recovery,skip_balance

Do you still get too many missing devices message?

Vaguely related questions (for everyone) are:

- If this is rootfs, by default grub options mount ro. So when we use rootflags=degraded,recovery that's effectively -o ro,degraded,recovery. Upon fstab being read, rootfs is remounted rw. Does the previous ro,degraded,recovery repair get written to the volume upon fstab remount rw?

This is somewhat important to understand: whether the grub default of an ro mount, followed by the rw remount at fstab time, is sufficient for detecting and repairing Btrfs problems (with and without rootflags=recovery). See the sketch after these questions.

- Is an online conversion from profile X to profile Y expected to work when the volume is mounted degraded? If it's not expected to work, or isn't a good idea, then maybe balance needs to be disabled on degraded mounts?
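
For concreteness, the boot flow in the first question looks roughly like
this (the kernel and root paths are illustrative, not from this thread):

  # grub.cfg kernel line -- note the ro plus the btrfs-specific rootflags
  linux /vmlinuz-3.12 root=UUID=<rootfs-uuid> ro rootflags=degraded,recovery

  # later in boot, the initscripts/systemd remount rootfs rw per fstab
  mount -o remount,rw /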

Chris Murphy



* Re: lost with degraded RAID1
  2014-01-30 17:33 ` Chris Murphy
@ 2014-01-30 17:58   ` Hugo Mills
  2014-01-30 18:25     ` Chris Murphy
       [not found]   ` <CABgvyo822bOAHeA1GH28MPaBAU+Zdi72MD_uwL+dhopt+nwMig@mail.gmail.com>
  1 sibling, 1 reply; 25+ messages in thread
From: Hugo Mills @ 2014-01-30 17:58 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


On Thu, Jan 30, 2014 at 10:33:21AM -0700, Chris Murphy wrote:
> 
> On Jan 29, 2014, at 12:16 PM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:
> 
> > My situation:
> > 
> > Former btrfs-RAID1 on two luks encrypted partitions (bunkerA and bunkerB).
> > Disk holding bunkerB died online.
> > Now I started rebalancing bunkerA to single,
> 
> 
> You're doing an online conversion of a degraded raid1 volume into single? Does anyone know if this is expected or intended to work?

   I don't see why not. One suggested method of recovering RAID from a
degraded situation is to rebalance over just the remaining devices
(space permitting, of course).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- Everything simple is false. Everything which is ---         
                          complex is unusable.                           



* Re: lost with degraded RAID1
  2014-01-30 17:58   ` Hugo Mills
@ 2014-01-30 18:25     ` Chris Murphy
  2014-02-03 20:55       ` Johan Kröckel
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2014-01-30 18:25 UTC (permalink / raw)
  To: Hugo Mills; +Cc: Btrfs BTRFS


On Jan 30, 2014, at 10:58 AM, Hugo Mills <hugo@carfax.org.uk> wrote:

> On Thu, Jan 30, 2014 at 10:33:21AM -0700, Chris Murphy wrote:
>> 
>> On Jan 29, 2014, at 12:16 PM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:
>> 
>>> My situation:
>>> 
>>> Former btrfs-RAID1 on two luks encrypted partitions (bunkerA and bunkerB).
>>> Disk holding bunkerB died online.
>>> Now I started rebalancing bunkerA to single,
>> 
>> 
>> You're doing an online conversion of a degraded raid1 volume into single? Does anyone know if this is expected or intended to work?
> 
>   I don't see why not. One suggested method of recovering RAID from a
> degraded situation is to rebalance over just the remaining devices
> (space permitting, of course).

Right but that's not a conversion. That's a regular balance on a degraded mount, with multiple remaining devices: e.g. a 4 disk raid1, drive fails, mount -o degraded, delete missing, then balance will replicate any missing 2nd copies onto three drives.
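
As a sketch of that recovery path (device name and mountpoint are
placeholders, not from this thread):

  # mount the surviving members degraded
  mount -o degraded /dev/sdX1 /mnt
  # drop the dead device from the filesystem
  btrfs device delete missing /mnt
  # re-replicate any chunks that lost their second copy
  btrfs balance start /mnt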

The bigger problem at the moment is that -o degraded isn't working for Johan. The "too many missing devices" message seems like a bug, and with the limited information available it may even be whatever bug caused the conversion to fail. Some 11GB were converted prior to the failure.


Chris Murphy



* Re: lost with degraded RAID1
       [not found]     ` <CABgvyo9FYGSYpj+jL1oqCvtNUqsC8HZ+z=x-Gz7naWoEcCKWpQ@mail.gmail.com>
@ 2014-01-30 19:32       ` Chris Murphy
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Murphy @ 2014-01-30 19:32 UTC (permalink / raw)
  To: Johan Kröckel; +Cc: Btrfs BTRFS


On Jan 30, 2014, at 11:44 AM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:

> 2014-01-30 Chris Murphy <lists@colorremedies.com>:
>> Is this encrypted Btrfs volume used for rootfs? Or is it only for data?  If it's only for data, then make sure the volume (all subvolumes) are umounted, then mount with this:
>> 
>> -o degraded,recovery,skip_balance
>> 
>> Do you still get too many missing devices message?
> 
> root@fortknox:/# mount  -odegraded,recovery,skip_balance
> /dev/mapper/bunkerA /mnt
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/bunkerA,
>       missing codepage or helper program, or other error
>       In some cases useful info is found in syslog - try
>       dmesg | tail  or so
> 
> root@fortknox:/# dmesg|tail
> [781630.598155] btrfs: bdev (null) errs: wr 1375454, rd 186442, flush
> 0, corrupt 0, gen 0


That's a lot of read and write errors. I wonder what that's all about. Is btrfs tracking the resulting read/write failures due to the missing device?

So at this point it's worth a btrfs check /dev/mapper/bunkerA. This is read only, does not make repairs. And also I'm curious what you get for btrfs-show-super -a /dev/mapper/bunkerA. The volume shouldn't be mounted for the first one, and can be either mounted or not for the 2nd.


Chris Murphy



* Re: lost with degraded RAID1
  2014-01-30 11:53 ` Johan Kröckel
  2014-01-30 17:57   ` Chris Murphy
@ 2014-01-30 22:18   ` Duncan
  2014-01-31  3:00     ` Chris Murphy
  2014-01-31  2:19   ` Chris Murphy
  2 siblings, 1 reply; 25+ messages in thread
From: Duncan @ 2014-01-30 22:18 UTC (permalink / raw)
  To: linux-btrfs

Johan Kröckel posted on Thu, 30 Jan 2014 12:53:44 +0100 as excerpted:

> [Answer from Duncan, 1i5t5.duncan@DOMAIN.HIDDEN (Thanks for the try)]
> 
> [AFAIK that shouldn't be the case.  Degraded should allow the RW mount
> -- I know it did some kernels ago when I tried it then, and if it
> changed, it's news to me too, in which case I need to do some
> reevaluation here.]
> 
> What I think /might/ have happened is that there's some other damage to
> the filesystem unrelated to the missing device in the raid1, which
> forces it read-only as soon as btrfs finds that damage.  However, mount
> -o remount,rw,degraded should still work, I /think/.]
> 
> http://www.spinics.net/lists/linux-btrfs/msg20164.html

Thanks.  I had seen that message go by but obviously missed all the 
implications.

IMO the idea of the patch is correct as a default, but there should be a 
way to override it, since AFAIK an ro mount will at times prevent 
undegrading as well, certainly when the admin's chosen way to clear the 
degraded state is reduced redundancy, which is what you're doing.

IOW, I guess I don't agree with that patch as it was apparently 
committed.  There needs to be a force option as well.

Meanwhile, back to the current situation.  Given that you found the patch 
preventing what you were trying to do, what happens if you simply find 
that commit in the git repo, revert and rebuild?  Assuming there are no 
further commits building on that one, a revert and rebuild should at 
least allow you to complete the half-completed balance.
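
Roughly, something like this (a sketch only; the -S search string comes
from your syslog message, and the build steps are generic and need
adapting to your kernel config):

  cd linux                                             # kernel source tree
  git log --oneline -S 'too many missing devices' -- fs/btrfs
  git revert <commit-id>                               # the commit found above
  make olddefconfig && make -j"$(nproc)"
  make modules_install install                         # then reboot into it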

And regardless of whether btrfs policy is to prevent that entirely in the 
future or not, definitely letting you start the balance before a reboot 
and not letting you finish it afterward is a bug.  It should either be 
prevented entirely, or allowed to finish after a reboot if it was allowed 
to start.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: lost with degraded RAID1
  2014-01-30 11:53 ` Johan Kröckel
  2014-01-30 17:57   ` Chris Murphy
  2014-01-30 22:18   ` Duncan
@ 2014-01-31  2:19   ` Chris Murphy
  2 siblings, 0 replies; 25+ messages in thread
From: Chris Murphy @ 2014-01-31  2:19 UTC (permalink / raw)
  To: Johan Kröckel; +Cc: linux-btrfs


On Jan 30, 2014, at 4:53 AM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:

> http://www.spinics.net/lists/linux-btrfs/msg20164.html

and

> Syslog says: "Jan 30 12:44:02 fortknox kernel: [756677.795661] Btrfs:
> too many missing devices, writeable mount is not allowed"


By the way, the cited patch says "Btrfs: too many missing devices, writeable remount is not allowed"

The patch says "remount" but your error says "mount". I'm not sure you have the right patch.


Chris Murphy



* Re: lost with degraded RAID1
  2014-01-30 22:18   ` Duncan
@ 2014-01-31  3:00     ` Chris Murphy
  2014-01-31  5:58       ` Duncan
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2014-01-31  3:00 UTC (permalink / raw)
  To: Btrfs BTRFS


On Jan 30, 2014, at 3:18 PM, Duncan <1i5t5.duncan@cox.net> wrote:

> IOW, I guess I don't agree with that patch as it was apparently 
> committed.  There needs to be a force option as well.

-o degraded is the force option. I think the problem here is that there's sufficient damage to the one remaining device that it cannot be mounted rw. It's sort of a chicken-and-egg problem: the single available device has a filesystem that's sufficiently damaged that it needs undamaged metadata to get a rw mount. Since there are too few devices to provide that, it fails to mount rw.

I'm not seeing it as any different from a single device volume with data/metadata profile single, with sufficient damage to cause it to not mount rw. If -o recovery can't fix it, I think it's done for.

So something Johan sent to me but didn't make the list (I've asked him to repost) is that his attempts to mount with degraded,recovery,skip_balance result in:
/dev/mapper/bunkerA /mnt
mount: wrong fs type, bad option, bad superblock 

He gets other errors also, and he has the results of btrfs check that might be more revealing, but to say the least it looks pretty nasty.

Another question for Johan is what exact balance command was used to go back to single? Was there -dconvert and -mconvert? Both are required to go from raid1/raid1 to single/DUP or single/single and actually get rid of the 2nd device. And looking at the man page, I'm not sure how we do that conversion and specify which of multiple devices we're dropping:

[filesystem] balance start [options] <path>

With a missing device, presumably this is obvious, but…



Chris Murphy



* Re: lost with degraded RAID1
  2014-01-31  3:00     ` Chris Murphy
@ 2014-01-31  5:58       ` Duncan
  2014-01-31  6:10         ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Duncan @ 2014-01-31  5:58 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Thu, 30 Jan 2014 20:00:39 -0700 as excerpted:

> Another question for Johan is what exact balance command was used to go
> back to single? Was there -dconvert and -mconvert? Both are required to
> go from raid1/raid1 to single/DUP or single/single and actually get rid
> of the 2nd device. And looking at the man page, I'm not sure how we do
> that conversion and specify which of multiple devices we're dropping:
> 
> [filesystem] balance start [options] <path>
> 
> With a missing device, presumably this is obvious, but…

Of course the normal no-missing-device usage of a balance converting back 
to single would be one of:

1) Multi-device raidN converting back to single with all devices present, 
presumably either to get more space (from raid1/5/6/10), or to get back 
/some/ device-loss tolerance when converting back from raid0 (presumably 
a data-only conversion in that case, with metadata kept as the default 
multi-device raid1).

2) Single-device, converting metadata-only from dup to single, either for 
space reasons, or because the physical device is an SSD (in which case 
single metadata is the mkfs default anyway, since some SSDs do hardware/
firmware dedup, which makes the effect of dup metadata on such a device 
not entirely predictable and dup mode not particularly useful).

IOW, just because it's a conversion to single mode doesn't mean we're 
dropping a device, and a rebalance to single mode wasn't in fact designed 
to drop a device (that's what device delete is for), so it doesn't really 
need a way to specify a device to drop.  If one is missing, a rebalance 
will obviously rebalance onto the existing devices, but otherwise, 
balance isn't the command for dropping a device; device delete is.

But your question to Johan remains valid, since we don't know for sure 
that he was doing the "correct" full balance -dconvert -mconvert, and so 
far I've simply assumed he was.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: lost with degraded RAID1
  2014-01-31  5:58       ` Duncan
@ 2014-01-31  6:10         ` Chris Murphy
  2014-01-31  6:13           ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2014-01-31  6:10 UTC (permalink / raw)
  To: Duncan, Johan Kröckel; +Cc: linux-btrfs


On Jan 30, 2014, at 10:58 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
> IOW, just because it's a conversion to single mode doesn't mean we're 
> dropping a device, and a rebalance to single mode wasn't in fact designed 
> to drop a device (that's what device delete is for)

Ahh yes, of course. So for two device raid1/raid1 going to two device single/DUP, we'd expect to see both drives still used something close to round robin. And then we'd do a device delete which would migrate chunks. I see a future optimization here :-) to avoid much of the writing this two step technique involves.

I'm also seeing many "Error reading 1647012864000, -1" with different block addresses (same -1 though), and also "1659900002304failed to load free space cache for block group" also with different numbers. Maybe hundreds of these. I'm not sure if this is due to the missing device, and it's reporting missing meta data? Or if the working device also has some problem, which depending on the configuration might implicate a single SATA controller.

Johan, can you post a full dmesg somewhere? And also smartctl -x results for the working drive?

Chris Murphy



* Re: lost with degraded RAID1
  2014-01-31  6:10         ` Chris Murphy
@ 2014-01-31  6:13           ` Chris Murphy
  2014-01-31  7:37             ` Duncan
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2014-01-31  6:13 UTC (permalink / raw)
  To: Duncan, Johan Kröckel; +Cc: linux-btrfs


On Jan 30, 2014, at 11:10 PM, Chris Murphy <lists@colorremedies.com> wrote:
> 
> I'm also seeing many "Error reading 1647012864000, -1" with different block addresses (same -1 though), and also "1659900002304failed to load free space cache for block group" also with different numbers. Maybe hundreds of these. I'm not sure if this is due to the missing device, and it's reporting missing meta data? Or if the working device also has some problem, which depending on the configuration might implicate a single SATA controller.

Sorry I'm being impatient since I'm away snowboarding tomorrow. The above refers to parts of Johan's btrfs check output which hasn't yet made it to the list.

Chris Murphy



* Fwd: lost with degraded RAID1
       [not found]   ` <CABgvyo822bOAHeA1GH28MPaBAU+Zdi72MD_uwL+dhopt+nwMig@mail.gmail.com>
@ 2014-01-31  6:28     ` Johan Kröckel
  0 siblings, 0 replies; 25+ messages in thread
From: Johan Kröckel @ 2014-01-31  6:28 UTC (permalink / raw)
  To: linux-btrfs

2014-01-30 Chris Murphy <lists@colorremedies.com>:
>> Now the damaged disk disappeared completely. No sign of it in /proc/diskstats.
>
> I don't know what you mean by it's disappeared complete. Are you talking about the physical block device? Or the dm logical block device?
It failed and doesn't appear anywhere. I don't have physical access to
the server at the moment.
> What do you get for:
> lsblk
> blkid
> btrfs device scan --all-devices
> btrfs fi show
root@fortknox:/# lsblk
NAME               MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                  8:0    0 232,9G  0 disk
└─sda1               8:1    0 232,9G  0 part  /
sdb                  8:16   0   1,8T  0 disk
└─sdb1               8:17   0   1,8T  0 part
  └─bunkerA (dm-0) 253:0    0   1,8T  0 crypt
sdc                  8:32   0   2,7T  0 disk
└─sdc1               8:33   0   2,7T  0 part  /USBPLATTE
loop1                7:1    0   500G  0 loop
root@fortknox:/# blkid
/dev/sda1: UUID="d055ffe7-80ed-455e-8117-4a2c8bef3ed5" TYPE="ext4"
/dev/sdc1: UUID="812c784b-df5a-48a6-98ff-5b755e29b563" TYPE="ext4"
/dev/sdb1: UUID="947206e2-488e-4dc4-b243-912901512ed4" TYPE="crypto_LUKS"
/dev/mapper/bunkerA: LABEL="bunker"
UUID="7f954a85-7566-4251-832c-44f2d3de2211"
UUID_SUB="6c66c3c1-df8f-4b29-9acb-bd923a69b3ee" TYPE="btrfs"
root@fortknox:/# btrfs device scan --all-devices
Scanning for Btrfs filesystems
root@fortknox:/# btrfs fi show
Label: 'bunker'  uuid: 7f954a85-7566-4251-832c-44f2d3de2211
        Total devices 2 FS bytes used 1.58TiB
        devid    2 size 1.82TiB used 1.59TiB path /dev/mapper/bunkerA
        *** Some devices missing

Btrfs v3.12

>> But now, I can't mount bunkerA degraded,RW because degraded
>> filesystems are not allowed to be mounted RW (?)
>
> Have you tried? What errors do you get in user space and dmesg?
root@fortknox:/# umount /mnt
umount: /mnt: not mounted
root@fortknox:/# mount -odegraded,rw /dev/mapper/bunkerA /mnt
mount: wrong fs type, bad option, bad superblock on /dev/mapper/bunkerA,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

root@fortknox:/# dmesg|tail
[756751.574065] btrfs: bdev (null) errs: wr 1375454, rd 186442, flush
0, corrupt 0, gen 0
[756751.837133] Btrfs: too many missing devices, writeable mount is not allowed
[756751.865278] btrfs: open_ctree failed
[781332.902000] btrfs: device label bunker devid 2 transid 121487 /dev/dm-0
[781630.561889] btrfs: device label bunker devid 2 transid 121487
/dev/mapper/bunkerA
[781630.564673] btrfs: allowing degraded mounts
[781630.564684] btrfs: disk space caching is enabled
[781630.598155] btrfs: bdev (null) errs: wr 1375454, rd 186442, flush
0, corrupt 0, gen 0
[781630.816989] Btrfs: too many missing devices, writeable mount is not allowed
[781630.842901] btrfs: open_ctree failed


* Re: lost with degraded RAID1
  2014-01-29 19:16 lost with degraded RAID1 Johan Kröckel
                   ` (2 preceding siblings ...)
  2014-01-30 17:33 ` Chris Murphy
@ 2014-01-31  6:40 ` Johan Kröckel
  3 siblings, 0 replies; 25+ messages in thread
From: Johan Kröckel @ 2014-01-31  6:40 UTC (permalink / raw)
  To: linux-btrfs

Sorry, I failed at using this list in conjunction with Gmail and sent
most answers directly to Duncan and Chris.

So I sum it up here again:
> What do you get for:
> lsblk
> blkid
> btrfs device scan --all-devices
> btrfs fi show
root@fortknox:/# lsblk
NAME               MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                  8:0    0 232,9G  0 disk
└─sda1               8:1    0 232,9G  0 part  /
sdb                  8:16   0   1,8T  0 disk
└─sdb1               8:17   0   1,8T  0 part
  └─bunkerA (dm-0) 253:0    0   1,8T  0 crypt
sdc                  8:32   0   2,7T  0 disk
└─sdc1               8:33   0   2,7T  0 part  /USBPLATTE
loop1                7:1    0   500G  0 loop
root@fortknox:/# blkid
/dev/sda1: UUID="d055ffe7-80ed-455e-8117-4a2c8bef3ed5" TYPE="ext4"
/dev/sdc1: UUID="812c784b-df5a-48a6-98ff-5b755e29b563" TYPE="ext4"
/dev/sdb1: UUID="947206e2-488e-4dc4-b243-912901512ed4" TYPE="crypto_LUKS"
/dev/mapper/bunkerA: LABEL="bunker"
UUID="7f954a85-7566-4251-832c-44f2d3de2211"
UUID_SUB="6c66c3c1-df8f-4b29-9acb-bd923a69b3ee" TYPE="btrfs"
root@fortknox:/# btrfs device scan --all-devices
Scanning for Btrfs filesystems
root@fortknox:/# btrfs fi show
Label: 'bunker'  uuid: 7f954a85-7566-4251-832c-44f2d3de2211
        Total devices 2 FS bytes used 1.58TiB
        devid    2 size 1.82TiB used 1.59TiB path /dev/mapper/bunkerA
        *** Some devices missing

Btrfs v3.12

>> But now, I can't mount bunkerA degraded,RW because degraded
>> filesystems are not allowed to be mounted RW (?)
>
> Have you tried? What errors do you get in user space and dmesg?
root@fortknox:/# umount /mnt
umount: /mnt: not mounted
root@fortknox:/# mount -odegraded,rw /dev/mapper/bunkerA /mnt
mount: wrong fs type, bad option, bad superblock on /dev/mapper/bunkerA,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

root@fortknox:/# dmesg|tail
[756751.574065] btrfs: bdev (null) errs: wr 1375454, rd 186442, flush
0, corrupt 0, gen 0
[756751.837133] Btrfs: too many missing devices, writeable mount is not allowed
[756751.865278] btrfs: open_ctree failed
[781332.902000] btrfs: device label bunker devid 2 transid 121487 /dev/dm-0
[781630.561889] btrfs: device label bunker devid 2 transid 121487
/dev/mapper/bunkerA
[781630.564673] btrfs: allowing degraded mounts
[781630.564684] btrfs: disk space caching is enabled
[781630.598155] btrfs: bdev (null) errs: wr 1375454, rd 186442, flush
0, corrupt 0, gen 0
[781630.816989] Btrfs: too many missing devices, writeable mount is not allowed
[781630.842901] btrfs: open_ctree failed




root@fortknox:/# mount  -odegraded,recovery,skip_balance
/dev/mapper/bunkerA /mnt
mount: wrong fs type, bad option, bad superblock on /dev/mapper/bunkerA,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

root@fortknox:/# dmesg|tail
[781630.598155] btrfs: bdev (null) errs: wr 1375454, rd 186442, flush
0, corrupt 0, gen 0
[781630.816989] Btrfs: too many missing devices, writeable mount is not allowed
[781630.842901] btrfs: open_ctree failed
[781781.311741] btrfs: device label bunker devid 2 transid 121487
/dev/mapper/bunkerA
[781781.314204] btrfs: allowing degraded mounts
[781781.314219] btrfs: enabling auto recovery
[781781.314227] btrfs: disk space caching is enabled
[781781.348118] btrfs: bdev (null) errs: wr 1375454, rd 186442, flush
0, corrupt 0, gen 0
[781781.572222] Btrfs: too many missing devices, writeable mount is not allowed
[781781.597722] btrfs: open_ctree failed





But (I think) the read and write errors are not increasing. I don't know
whether this is because of the missing drive or because the filesystem
has not been mounted writeable since some point in time.
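
(For reference, those per-device counters can also be read from a mounted
filesystem with something like:

  btrfs device stats /mnt

which prints the write/read/flush/corruption/generation error counts per
device.)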

root@fortknox:~# btrfs-show-super -a /dev/mapper/bunkerA
superblock: bytenr=65536, device=/dev/mapper/bunkerA
---------------------------------------------------------
csum 0x40e23443 [match]
bytenr 65536
flags 0x1
magic _BHRfS_M [match]
fsid 7f954a85-7566-4251-832c-44f2d3de2211
label bunker
generation 121487
root 1888523018240
sys_array_size 129
chunk_root_generation 121477
root_level 2
chunk_root 3304906031104
chunk_root_level 1
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 4000791559680
bytes_used 1743686074368
sectorsize 4096
nodesize 4096
leafsize 4096
stripesize 4096
root_dir 6
num_devices 2
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x1
csum_type 0
csum_size 4
cache_generation 121487
uuid_tree_generation 121487
dev_item.uuid 6c66c3c1-df8f-4b29-9acb-bd923a69b3ee
dev_item.fsid 7f954a85-7566-4251-832c-44f2d3de2211 [match]
dev_item.type 0
dev_item.total_bytes 2000395771392
dev_item.bytes_used 1746986336256
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 2
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0

superblock: bytenr=67108864, device=/dev/mapper/bunkerA
---------------------------------------------------------
csum 0xe0831c8d [match]
bytenr 67108864
flags 0x1
magic _BHRfS_M [match]
fsid 7f954a85-7566-4251-832c-44f2d3de2211
label bunker
generation 121487
root 1888523018240
sys_array_size 129
chunk_root_generation 121477
root_level 2
chunk_root 3304906031104
chunk_root_level 1
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 4000791559680
bytes_used 1743686074368
sectorsize 4096
nodesize 4096
leafsize 4096
stripesize 4096
root_dir 6
num_devices 2
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x1
csum_type 0
csum_size 4
cache_generation 121487
uuid_tree_generation 121487
dev_item.uuid 6c66c3c1-df8f-4b29-9acb-bd923a69b3ee
dev_item.fsid 7f954a85-7566-4251-832c-44f2d3de2211 [match]
dev_item.type 0
dev_item.total_bytes 2000395771392
dev_item.bytes_used 1746986336256
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 2
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0

superblock: bytenr=274877906944, device=/dev/mapper/bunkerA
---------------------------------------------------------
csum 0x1d044abc [match]
bytenr 274877906944
flags 0x1
magic _BHRfS_M [match]
fsid 7f954a85-7566-4251-832c-44f2d3de2211
label bunker
generation 121487
root 1888523018240
sys_array_size 129
chunk_root_generation 121477
root_level 2
chunk_root 3304906031104
chunk_root_level 1
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 4000791559680
bytes_used 1743686074368
sectorsize 4096
nodesize 4096
leafsize 4096
stripesize 4096
root_dir 6
num_devices 2
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x1
csum_type 0
csum_size 4
cache_generation 121487
uuid_tree_generation 121487
dev_item.uuid 6c66c3c1-df8f-4b29-9acb-bd923a69b3ee
dev_item.fsid 7f954a85-7566-4251-832c-44f2d3de2211 [match]
dev_item.type 0
dev_item.total_bytes 2000395771392
dev_item.bytes_used 1746986336256
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 2
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0




root@fortknox:~# btrfsck /dev/mapper/bunkerA
warning, device 1 is missing
warning devid 1 not found already
Checking filesystem on /dev/mapper/bunkerA
UUID: 7f954a85-7566-4251-832c-44f2d3de2211
checking extents
checking free space cache
Error reading 1647012864000, -1
Error reading 1647819816960, -1
Error reading 1647965962240, -1
Error reading 1647006314496, -1
Error reading 2778306510848, -1
Error reading 1655481655296, -1
Error reading 1655482179584, -1
Error reading 1655493296128, -1
Error reading 1788748169216, -1
Error reading 1644927381504, -1
Error reading 1671399948288, -1
Error reading 1646999830528, -1
Error reading 1647013462016, -1
Error reading 1738282172416, -1
Error reading 1648014381056, -1
Error reading 1655474946048, -1
Error reading 2778315161600, -1
failed to load free space cache for block group 1644867616768failed to
load free space cache for block group 1645941358592failed to load free
space cache for block group 1647015100416failed to load free space
cache for block group 1654531293184failed to load free space cache for
block group 1657752518656failed to load free space cache for block
group 1658826260480failed to load free space cache for block group
1659900002304failed to load free space cache for block group
1667416195072failed to load free space cache for block group
1670637420544failed to load free space cache for block group
1676006129664failed to load free space cache for block group
1681374838784failed to load free space cache for block group
1695333482496failed to load free space cache for block group
1698554707968failed to load free space cache for block group
1699628449792failed to load free space cache for block group
1700702191616failed to load free space cache for block group
1703923417088failed to load free space cache for block group Error
reading 2778315685888, -1
Error reading 2778315948032, -1
Error reading 2778316210176, -1
Error reading 2778316472320, -1
Error reading 2778317520896, -1
Error reading 2778318307328, -1
Error reading 2778318569472, -1
Error reading 2778320142336, -1
Error reading 2778320404480, -1
Error reading 2778320666624, -1
Error reading 1937998594048, -1
Error reading 1646997389312, -1
Error reading 1794117312512, -1
Error reading 1647971205120, -1
Error reading 1655480606720, -1
Error reading 1655481917440, -1
Error reading 1968063332352, -1
1715734577152failed to load free space cache for block group
1726471995392failed to load free space cache for block group
1742578122752failed to load free space cache for block group
1745799348224failed to load free space cache for block group
1752241799168failed to load free space cache for block group
1753315540992failed to load free space cache for block group
1758684250112failed to load free space cache for block group
1761905475584failed to load free space cache for block group
1762979217408failed to load free space cache for block group
1766200442880failed to load free space cache for block group
1774790377472failed to load free space cache for block group
1775864119296failed to load free space cache for block group
1780159086592failed to load free space cache for block group
1781232828416failed to load free space cache for block group
1782306570240failed to load free space cache for block group
1783380312064failed to load free space cache for block group
1786601537536failed to load free space cache forError reading
1655482441728, -1
Error reading 1655482949632, -1
Error reading 1975579820032, -1
Error reading 1646999044096, -1
Error reading 2099058638848, -1
Error reading 1655506403328, -1
Error reading 1661504389120, -1
Error reading 1747946557440, -1
Error reading 1752241340416, -1
Error reading 1766200131584, -1
Error reading 1769420718080, -1
Error reading 1776937304064, -1
Error reading 1778010673152, -1
Error reading 1778010935296, -1
Error reading 1784452825088, -1
Error reading 1786600423424, -1
Error reading 1787673284608, -1
 block group 1787675279360failed to load free space cache for block
group 1788749021184failed to load free space cache for block group
1789822763008failed to load free space cache for block group
1790896504832failed to load free space cache for block group
1791970246656failed to load free space cache for block group
1794117730304failed to load free space cache for block group
1795191472128failed to load free space cache for block group
1796265213952failed to load free space cache for block group
1797338955776failed to load free space cache for block group
1798412697600failed to load free space cache for block group
1799486439424failed to load free space cache for block group
1800560181248failed to load free space cache for block group
1801633923072failed to load free space cache for block group
1802707664896failed to load free space cache for block group
1803781406720failed to load free space cache for block group
1804855148544failed to load free space cache for block group
1805928890368failed to load free spError reading 1787674857472, -1
Error reading 1768347648000, -1
Error reading 1645348323328, -1
Error reading 1671399424000, -1
Error reading 1647011557376, -1
Error reading 1661503520768, -1
Error reading 1811296747520, -1
Error reading 1992759660544, -1
Error reading 2778269958144, -1
Error reading 2099059163136, -1
Error reading 3065128484864, -1
Error reading 1720028954624, -1
Error reading 1778011197440, -1
Error reading 1920818544640, -1
Error reading 1937998856192, -1
Error reading 1947662532608, -1
Error reading 2474869772288, -1
ace cache for block group 1807002632192failed to load free space cache
for block group 1808076374016failed to load free space cache for block
group 1810223857664failed to load free space cache for block group
1813445083136failed to load free space cache for block group
1815592566784failed to load free space cache for block group
1816666308608failed to load free space cache for block group
1818813792256failed to load free space cache for block group
1820961275904failed to load free space cache for block group
1827403726848failed to load free space cache for block group
1828477468672failed to load free space cache for block group
1829551210496failed to load free space cache for block group
1830624952320failed to load free space cache for block group
1832772435968failed to load free space cache for block group
1834919919616failed to load free space cache for block group
1835993661440failed to load free space cache for block group
1843509854208failed to load free space cache for block group
1844583596032failed toError reading 1721102303232, -1
Error reading 1957325766656, -1
Error reading 2164558200832, -1
Error reading 1912229044224, -1
Error reading 3099322155008, -1
Error reading 2778243129344, -1
Error reading 2150599700480, -1
Error reading 2163484598272, -1
Error reading 1722176700416, -1
Error reading 3099324702720, -1
Error reading 2099058900992, -1
Error reading 2778206109696, -1
Error reading 2778271531008, -1
Error reading 1646997127168, -1
Error reading 2499565780992, -1
Error reading 2099059609600, -1
 load free space cache for block group 1845657337856failed to load
free space cache for block group 1846731079680failed to load free
space cache for block group 1849952305152failed to load free space
cache for block group 1851026046976failed to load free space cache for
block group 1854247272448failed to load free space cache for block
group 1857468497920failed to load free space cache for block group
1858542239744failed to load free space cache for block group
1859615981568failed to load free space cache for block group
1863910948864failed to load free space cache for block group
1866058432512failed to load free space cache for block group
1872500883456failed to load free space cache for block group
1874648367104failed to load free space cache for block group
1880017076224failed to load free space cache for block group
1882164559872failed to load free space cache for block group
1887533268992failed to load free space cache for block group
1888607010816failed to load free space cache for block group
188968075Error reading 2080806600704, -1
Error reading 1671398899712, -1
Error reading 2257973387264, -1
Error reading 2173148192768, -1
Error reading 2500639551488, -1
Error reading 3222112632832, -1
Error reading 1941220032512, -1
Error reading 2049667981312, -1
Error reading 1646987771904, -1
Error reading 1743650672640, -1
Error reading 1943366107136, -1
Error reading 2778275590144, -1
Error reading 1644910018560, -1
Error reading 2778272055296, -1
Error reading 2778272317440, -1
Error reading 3211456638976, -1
Error reading 1749019860992, -1
2640failed to load free space cache for block group
1890754494464failed to load free space cache for block group
1891828236288failed to load free space cache for block group
1893975719936failed to load free space cache for block group
1895049461760failed to load free space cache for block group
1909008105472failed to load free space cache for block group
1911155589120failed to load free space cache for block group
1912229330944failed to load free space cache for block group
1917598040064failed to load free space cache for block group
1919745523712failed to load free space cache for block group
1925114232832failed to load free space cache for block group
1931556683776failed to load free space cache for block group
1936925392896failed to load free space cache for block group
1937999134720failed to load free space cache for block group
1939072876544failed to load free space cache for block group
1940146618368failed to load free space cache for block group
1944441585664failed to load free space cache for block grError reading
2147378200576, -1
Error reading 1959473307648, -1
Error reading 2109797539840, -1
Error reading 2398634049536, -1
Error reading 3075524984832, -1
Error reading 2873227935744, -1
Error reading 2139862138880, -1
Error reading 2895542550528, -1
Error reading 2165631942656, -1
Error reading 2778217775104, -1
Error reading 2778272841728, -1
Error reading 2235425173504, -1
Error reading 2778277281792, -1
Error reading 2150599438336, -1
Error reading 2313807921152, -1
Error reading 1945514999808, -1
Error reading 2538220527616, -1
oup 1945515327488failed to load free space cache for block group
1946589069312failed to load free space cache for block group
1953031520256failed to load free space cache for block group
1956252745728failed to load free space cache for block group
1958400229376failed to load free space cache for block group
1959473971200failed to load free space cache for block group
1960547713024failed to load free space cache for block group
1966990163968failed to load free space cache for block group
1971285131264failed to load free space cache for block group
1974506356736failed to load free space cache for block group
1978801324032failed to load free space cache for block group
1983096291328failed to load free space cache for block group
1985243774976failed to load free space cache for block group
1987391258624failed to load free space cache for block group
1990612484096failed to load free space cache for block group
1991686225920failed to load free space cache for block group
2005644869632failed to load free space cacheError reading
2645594722304, -1
Error reading 1782306279424, -1
Error reading 2944094769152, -1
Error reading 1647003238400, -1
Error reading 2778243915776, -1
Error reading 2778210566144, -1
Error reading 2778244177920, -1
Error reading 2273005338624, -1
Error reading 2256900136960, -1
Error reading 3203940081664, -1
Error reading 2280522493952, -1
Error reading 3257540214784, -1
Error reading 2596202586112, -1
Error reading 2240794132480, -1
Error reading 2778308608000, -1
Error reading 2849604960256, -1
Error reading 3005297262592, -1
 for block group 2008866095104failed to load free space cache for
block group 2028193447936failed to load free space cache for block
group 2030340931584failed to load free space cache for block group
2035709640704failed to load free space cache for block group
2038930866176failed to load free space cache for block group
2043225833472failed to load free space cache for block group
2046447058944failed to load free space cache for block group
2047520800768failed to load free space cache for block group
2048594542592failed to load free space cache for block group
2057184477184failed to load free space cache for block group
2061479444480failed to load free space cache for block group
2064700669952failed to load free space cache for block group
2065774411776failed to load free space cache for block group
2068995637248failed to load free space cache for block group
2071143120896failed to load free space cache for block group
2072216862720failed to load free space cache for block group
2082954280960failed to load freError reading 1647005265920, -1
Error reading 3051468955648, -1
Error reading 2273006305280, -1
Error reading 3099324964864, -1
Error reading 2276227534848, -1
Error reading 2247236485120, -1
Error reading 2908661415936, -1
Error reading 3060058935296, -1
Error reading 2778244440064, -1
Error reading 2987044610048, -1
Error reading 3337841999872, -1
Error reading 1751167737856, -1
Error reading 1647006646272, -1
Error reading 1825255927808, -1
Error reading 3099325227008, -1
Error reading 3133073588224, -1
Error reading 2842089160704, -1
e space cache for block group 2084028022784failed to load free space
cache for block group 2085101764608failed to load free space cache for
block group 2090470473728failed to load free space cache for block
group 2092617957376failed to load free space cache for block group
2093691699200failed to load free space cache for block group
2096912924672failed to load free space cache for block group
2097986666496failed to load free space cache for block group
2102281633792failed to load free space cache for block group
2108724084736failed to load free space cache for block group
2109797826560failed to load free space cache for block group
2110871568384failed to load free space cache for block group
2113019052032failed to load free space cache for block group
2116240277504failed to load free space cache for block group
2121608986624failed to load free space cache for block group
2123756470272failed to load free space cache for block group
2136641372160failed to load free space cache for block group
2146305048576faileError reading 3203940343808, -1
Error reading 2300923604992, -1
Error reading 2842089422848, -1
Error reading 3209145024512, -1
Error reading 2945168691200, -1
Error reading 2924767465472, -1
Error reading 2917251407872, -1
Error reading 2353536950272, -1
Error reading 1809149812736, -1
Error reading 3101692198912, -1
Error reading 2778244702208, -1
Error reading 3101692461056, -1
Error reading 3101692723200, -1
Error reading 2778267860992, -1
Error reading 3099325489152, -1
Error reading 2598350065664, -1
d to load free space cache for block group 2148452532224failed to load
free space cache for block group 2149526274048failed to load free
space cache for block group 2154894983168failed to load free space
cache for block group 2155968724992failed to load free space cache for
block group 2157042466816failed to load free space cache for block
group 2159189950464failed to load free space cache for block group
2162411175936failed to load free space cache for block group
2163484917760failed to load free space cache for block group
2164558659584failed to load free space cache for block group
2167779885056failed to load free space cache for block group
2168853626880failed to load free space cache for block group
2169927368704failed to load free space cache for block group
2172074852352failed to load free space cache for block group
2173148594176failed to load free space cache for block group
2175296077824failed to load free space cache for block group
2181738528768failed to load free space cache for block group
21849Error reading 1647008669696, -1
Error reading 2778218299392, -1
Error reading 2778245226496, -1
Error reading 3099326013440, -1
Error reading 2275153801216, -1
Error reading 3078046679040, -1
Error reading 1646993993728, -1
Error reading 3101693247488, -1
Error reading 2999929532416, -1
Error reading 2972012052480, -1
Error reading 2778245488640, -1
Error reading 2778206633984, -1
Error reading 3099327062016, -1
Error reading 3099328110592, -1
Error reading 3099328372736, -1
Error reading 2368569409536, -1
Error reading 2160263286784, -1
failed to load free space cache for block group 59754240
failed to load free space cache for block group 2193549688832
failed to load free space cache for block group 2194623430656
failed to load free space cache for block group 2196770914304
failed to load free space cache for block group 2197844656128
failed to load free space cache for block group 2199992139776
failed to load free space cache for block group 2207508332544
failed to load free space cache for block group 2212877041664
failed to load free space cache for block group 2215024525312
failed to load free space cache for block group 2216098267136
failed to load free space cache for block group 2217172008960
failed to load free space cache for block group 2218245750784
failed to load free space cache for block group 2219319492608
failed to load free space cache for block group 2220393234432
failed to load free space cache for block group 2225761943552
failed to load free space cache for block group 2228983169024
failed to load free space cache for block group 2230056910848
Error reading 3099329159168, -1
Error reading 3099330207744, -1
Error reading 2813098323968, -1
Error reading 2778216988672, -1
Error reading 1646990393344, -1
Error reading 2111944982528, -1
Error reading 2203213070336, -1
Error reading 1648066887680, -1
Error reading 2502787137536, -1
Error reading 2777428246528, -1
Error reading 2217171730432, -1
Error reading 2283743698944, -1
Error reading 2778245750784, -1
Error reading 2155968266240, -1
Error reading 2510303260672, -1
Error reading 2711092965376, -1
Error reading 2788402348032, -1
failed to load free space cache for block group 2232204394496
failed to load free space cache for block group 2233278136320
failed to load free space cache for block group 2234351878144
failed to load free space cache for block group 2235425619968
failed to load free space cache for block group 2236499361792
failed to load free space cache for block group 2237573103616
failed to load free space cache for block group 2238646845440
failed to load free space cache for block group 2240794329088
failed to load free space cache for block group 2244015554560
failed to load free space cache for block group 2246163038208
failed to load free space cache for block group 2247236780032
failed to load free space cache for block group 2248310521856
failed to load free space cache for block group 2249384263680
failed to load free space cache for block group 2250458005504
failed to load free space cache for block group 2252605489152
failed to load free space cache for block group 2253679230976
failed to load free space cache for block group 2254752972800
Error reading 2543589195776, -1
Error reading 2970937540608, -1
Error reading 2996708245504, -1
Error reading 1701765120000, -1
Error reading 2778246012928, -1
Error reading 2623046090752, -1
Error reading 2777427148800, -1
Error reading 2917250883584, -1
Error reading 2917251145728, -1
Error reading 3013888114688, -1
Error reading 1788747644928, -1
Error reading 2365347962880, -1
Error reading 3054690205696, -1
Error reading 2918324613120, -1
Error reading 2777427722240, -1
Error reading 2777428508672, -1
Error reading 2961274667008, -1
failed to load free space cache for block group 2255826714624
failed to load free space cache for block group 2256900456448
failed to load free space cache for block group 2257974198272
failed to load free space cache for block group 2259047940096
failed to load free space cache for block group 2260121681920
failed to load free space cache for block group 2263342907392
failed to load free space cache for block group 2264416649216
failed to load free space cache for block group 2271932841984
failed to load free space cache for block group 2273006583808
failed to load free space cache for block group 2274080325632
failed to load free space cache for block group 2275154067456
failed to load free space cache for block group 2277301551104
failed to load free space cache for block group 2278375292928
failed to load free space cache for block group 2282670260224
failed to load free space cache for block group 2283744002048
failed to load free space cache for block group 2284817743872
failed to load free space cache for block group 2299850129408
Error reading 3033215467520, -1
Error reading 3076165009408, -1
Error reading 2778280181760, -1
Error reading 3099331256320, -1
Error reading 1738281648128, -1
Error reading 3099340431360, -1
Error reading 2777431654400, -1
Error reading 2973085794304, -1
Error reading 2777426886656, -1
Error reading 2923693670400, -1
Error reading 3099340693504, -1
Error reading 3099340955648, -1
Error reading 2537146613760, -1
Error reading 3099341479936, -1
Error reading 3020330565632, -1
Error reading 3025699323904, -1
Error reading 2975232884736, -1
failed to load free space cache for block group 2312735031296
failed to load free space cache for block group 2319177482240
failed to load free space cache for block group 2320251224064
failed to load free space cache for block group 2321324965888
failed to load free space cache for block group 2326693675008
failed to load free space cache for block group 2327767416832
failed to load free space cache for block group 2328841158656
failed to load free space cache for block group 2352463478784
failed to load free space cache for block group 2354610962432
failed to load free space cache for block group 2355684704256
failed to load free space cache for block group 2364274638848
failed to load free space cache for block group 2376085798912
failed to load free space cache for block group 2378233282560
failed to load free space cache for block group 2384675733504
failed to load free space cache for block group 2386823217152
failed to load free space cache for block group 2397560635392
failed to load free space cache for block group 2408298053632
Error reading 1791968215040, -1
Error reading 1647009984512, -1
Error reading 2929062400000, -1
Error reading 2149525815296, -1
Error reading 1646985674752, -1
Error reading 3029994242048, -1
Error reading 2778207158272, -1
Error reading 3079048380416, -1
Error reading 3058985140224, -1
Error reading 3074017714176, -1
Error reading 2778224590848, -1
Error reading 3099342528512, -1
Error reading 3099342790656, -1
Error reading 2157042008064, -1
Error reading 2778225901568, -1
Error reading 3209144762368, -1
failed to load free space cache for block group 2420109213696
failed to load free space cache for block group 2427625406464
failed to load free space cache for block group 2439436566528
failed to load free space cache for block group 2440510308352
failed to load free space cache for block group 2441584050176
failed to load free space cache for block group 2451247726592
failed to load free space cache for block group 2473796304896
failed to load free space cache for block group 2478091272192
failed to load free space cache for block group 2479165014016
failed to load free space cache for block group 2498492366848
failed to load free space cache for block group 2499566108672
failed to load free space cache for block group 2509229785088
failed to load free space cache for block group 2515672236032
failed to load free space cache for block group 2530704621568
failed to load free space cache for block group 2534999588864
failed to load free space cache for block group 2542515781632
Error reading 2778230095872, -1
Error reading 2777433219072, -1
Error reading 2778207420416, -1
Error reading 1655478509568, -1
Error reading 1646986199040, -1
Error reading 3217720803328, -1
Error reading 3222111846400, -1
Error reading 2778305724416, -1
Error reading 2778229309440, -1
Error reading 2778270220288, -1
Error reading 2778277560320, -1
Error reading 2882891501568, -1
Error reading 2778219085824, -1
Error reading 2778226163712, -1
Error reading 1646986723328, -1
Error reading 1646986985472, -1
Error reading 1646987247616, -1
failed to load free space cache for block group 2546810748928
failed to load free space cache for block group 2570433069056
failed to load free space cache for block group 2571506810880
failed to load free space cache for block group 2573654294528
failed to load free space cache for block group 2582244229120
failed to load free space cache for block group 2585465454592
failed to load free space cache for block group 2595129131008
failed to load free space cache for block group 2597276614656
failed to load free space cache for block group 2620898934784
failed to load free space cache for block group 2621972676608
failed to load free space cache for block group 2632710094848
failed to load free space cache for block group 2661701124096
failed to load free space cache for block group 2686397186048
failed to load free space cache for block group 2688544669696
failed to load free space cache for block group 2718609440768
failed to load free space cache for block group 2752969179136
failed to load free space cache for block group 2758337888256
Error reading 3099343052800, -1
Error reading 1728619188224, -1
Error reading 3079048642560, -1
Error reading 1647012601856, -1
Error reading 1786600161280, -1
Error reading 1655494344704, -1
Error reading 2778308083712, -1
Error reading 2778310180864, -1
Error reading 1644921614336, -1
Error reading 2778312540160, -1
Error reading 1683453624320, -1
Error reading 1797337710592, -1
Error reading 1645331808256, -1
Error reading 3209129295872, -1
Error reading 2998855598080, -1
Error reading 2787328196608, -1
Error reading 2831352135680, -1
failed to load free space cache for block group 2759411630080
failed to load free space cache for block group 2773370273792
failed to load free space cache for block group 2774444015616
failed to load free space cache for block group 2775517757440
failed to load free space cache for block group 2776591499264
failed to load free space cache for block group 2777665241088
failed to load free space cache for block group 2778738982912
failed to load free space cache for block group 2779812724736
failed to load free space cache for block group 2780886466560
failed to load free space cache for block group 2786255175680
failed to load free space cache for block group 2789476401152
failed to load free space cache for block group 2795918852096
failed to load free space cache for block group 2807730012160
failed to load free space cache for block group 2810951237632
failed to load free space cache for block group 2812024979456
failed to load free space cache for block group 2841016008704
failed to load free space cache for block group 2847458459648
Error reading 1645349634048, -1
Error reading 2883965411328, -1
Error reading 2837794250752, -1
Error reading 2970937802752, -1
Error reading 3099362435072, -1
Error reading 1646997995520, -1
Error reading 2877522681856, -1
Error reading 3222112894976, -1
Error reading 2778232455168, -1
Error reading 3066863624192, -1
failed to load free space cache for block group 2860343361536
failed to load free space cache for block group 2862490845184
failed to load free space cache for block group 2865712070656
failed to load free space cache for block group 2873228263424
failed to load free space cache for block group 2876449488896
failed to load free space cache for block group 2882891939840
failed to load free space cache for block group 2883965681664
failed to load free space cache for block group 2885039423488
failed to load free space cache for block group 2886113165312
failed to load free space cache for block group 2889334390784
failed to load free space cache for block group 2890408132608
free space inode generation (0) did not match free space cache generation (20905)
Error reading 1661505437696, -1
Error reading 2165631680512, -1
Error reading 2778314375168, -1
Error reading 2778314899456, -1
Error reading 1646988296192, -1
Error reading 2318103281664, -1
Error reading 2778318045184, -1
Error reading 3066863886336, -1
Error reading 3099360862208, -1
Error reading 1646989869056, -1
Error reading 3066864148480, -1
Error reading 1647005597696, -1
Error reading 3099361124352, -1
Error reading 1646992752640, -1
Error reading 1655480868864, -1
Error reading 1644921085952, -1
Error reading 2891481677824, -1
failed to load free space cache for block group 2894703099904
failed to load free space cache for block group 2900071809024
failed to load free space cache for block group 2902219292672
failed to load free space cache for block group 2903293034496
failed to load free space cache for block group 2910809227264
failed to load free space cache for block group 2912956710912
failed to load free space cache for block group 2914030452736
failed to load free space cache for block group 2915104194560
failed to load free space cache for block group 2917251678208
failed to load free space cache for block group 2921546645504
failed to load free space cache for block group 2922620387328
failed to load free space cache for block group 2924767870976
failed to load free space cache for block group 2925841612800
failed to load free space cache for block group 2926915354624
failed to load free space cache for block group 2930136580096
failed to load free space cache for block group 2931210321920
Error reading 2916176310272, -1
Error reading 3038584225792, -1
Error reading 3046100369408, -1
Error reading 1647972515840, -1
Error reading 2923693408256, -1
Error reading 1645349109760, -1
Error reading 3058345299968, -1
Error reading 1988464431104, -1
Error reading 3058346348544, -1
Error reading 1647004549120, -1
Error reading 2951610437632, -1
Error reading 3099360337920, -1
Error reading 2994560761856, -1
Error reading 2997781913600, -1
Error reading 2348168204288, -1
Error reading 3066864934912, -1
Error reading 1644895338496, -1
failed to load free space cache for block group 2936579031040
failed to load free space cache for block group 2940873998336
failed to load free space cache for block group 2944095223808
failed to load free space cache for block group 2948390191104
failed to load free space cache for block group 2950537674752
failed to load free space cache for block group 2954832642048
failed to load free space cache for block group 2956980125696
failed to load free space cache for block group 2964496318464
failed to load free space cache for block group 2965570060288
failed to load free space cache for block group 2969865027584
failed to load free space cache for block group 2973086253056
failed to load free space cache for block group 2975233736704
failed to load free space cache for block group 2977381220352
failed to load free space cache for block group 2980602445824
failed to load free space cache for block group 2988118638592
failed to load free space cache for block group 2990266122240
failed to load free space cache for block group 2993487347712
Error reading 3066865197056, -1
Error reading 1646989344768, -1
Error reading 2999929008128, -1
Error reading 2887186579456, -1
Error reading 1655477182464, -1
Error reading 1645349371904, -1
Error reading 3099361386496, -1
Error reading 1671399686144, -1
Error reading 3005297000448, -1
Error reading 1644915576832, -1
Error reading 1645348847616, -1
Error reading 2379306696704, -1
Error reading 3061920546816, -1
Error reading 3099362172928, -1
Error reading 1647976710144, -1
Error reading 1644915838976, -1
Error reading 1644926586880, -1
failed to load free space cache for block group 2994561089536
failed to load free space cache for block group 2996708573184
failed to load free space cache for block group 2998856056832
failed to load free space cache for block group 2999929798656
failed to load free space cache for block group 3002077282304
failed to load free space cache for block group 3004224765952
failed to load free space cache for block group 3005298507776
failed to load free space cache for block group 3006372249600
failed to load free space cache for block group 3009593475072
failed to load free space cache for block group 3013888442368
failed to load free space cache for block group 3021404635136
failed to load free space cache for block group 3022478376960
failed to load free space cache for block group 3036437020672
failed to load free space cache for block group 3037510762496
failed to load free space cache for block group 3047174438912
failed to load free space cache for block group 3066501791744
failed to load free space cache for block group 3098714046464
Error reading 1648000040960, -1
Error reading 1669556535296, -1
Error reading 1671400472576, -1
Error reading 1644917489664, -1
Error reading 1644922470400, -1
Error reading 1644929810432, -1
Error reading 1644925353984, -1
Error reading 1645326041088, -1
Error reading 1647002632192, -1
Error reading 1647004286976, -1
Error reading 1648030711808, -1
Error reading 1647968583680, -1
Error reading 1648054566912, -1
Error reading 1648064790528, -1
Error reading 1648074096640, -1
Error reading 1647985623040, -1
Error reading 1655474683904, -1
failed to load free space cache for block group 3099787788288
failed to load free space cache for block group 3100861530112
failed to load free space cache for block group 3169581006848
failed to load free space cache for block group 3171728490496
failed to load free space cache for block group 3221120614400
failed to load free space cache for block group 3222194356224
failed to load free space cache for block group 3308135645184
failed to load free space cache for block group 3311356870656
failed to load free space cache for block group 3312430612480
failed to load free space cache for block group 3313504354304
failed to load free space cache for block group 3314578096128
failed to load free space cache for block group 3315651837952
failed to load free space cache for block group 3316725579776
failed to load free space cache for block group 3317799321600
failed to load free space cache for block group 3319946805248
failed to load free space cache for block group 3321020547072
failed to load free space cache for block group 3322094288896
Error reading 1655477985280, -1
Error reading 1655479296000, -1
Error reading 1655499849728, -1
Error reading 1655506927616, -1
Error reading 1671703769088, -1
Error reading 1676003966976, -1
Error reading 1677069975552, -1
Error reading 1699627622400, -1
Error reading 1655479820288, -1
checking fs roots
checking csums
checking root refs
failed to load free space cache for block group 3323168030720
failed to load free space cache for block group 3324241772544
failed to load free space cache for block group 3325315514368
failed to load free space cache for block group 3326389256192
failed to load free space cache for block group 3327462998016
failed to load free space cache for block group 3328536739840
failed to load free space cache for block group 3329610481664
failed to load free space cache for block group 3330684223488
failed to load free space cache for block group 3350011576320
failed to load free space cache for block group 3454164533248
found 23759468894 bytes used err is 0
total csum bytes: 1533072864
total tree bytes: 1728692224
total fs tree bytes: 44019712
total extent tree bytes: 47996928
btree space waste bytes: 46886467
file data blocks allocated: 4140676993024
 referenced 1741813084160
Btrfs v3.12

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-01-31  6:13           ` Chris Murphy
@ 2014-01-31  7:37             ` Duncan
  0 siblings, 0 replies; 25+ messages in thread
From: Duncan @ 2014-01-31  7:37 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Thu, 30 Jan 2014 23:13:36 -0700 as excerpted:

> On Jan 30, 2014, at 11:10 PM, Chris Murphy <lists@colorremedies.com>
> wrote:
>> 
>> I'm also seeing many "Error reading 1647012864000, -1" with different
>> block addresses (same -1 though), and also "1659900002304failed to load
>> free space cache for block group" also with different numbers. Maybe
>> hundreds of these. I'm not sure if this is due to the missing device,
>> and it's reporting missing meta data? Or if the working device also has
>> some problem, which depending on the configuration might implicate a
>> single SATA controller.
> 
> Sorry I'm being impatient since I'm away snowboarding tomorrow. The
> above refers to parts of Johan's btrfs check output which hasn't yet
> made it to the list.

Break a leg! =:^)  (If you've done much drama/acting you may know the 
reference; there's a traditional superstition that wishing someone .... 
luck is a hex and they'll forget their lines or worse!  So you wish them 
to break a leg instead!  =:^)


I'm assuming the read errors are due to the missing device, with the 
round-robin trying to read the bad device 50% of the time.  But if it 
was full raid1, then the existing copy should be found and read instead, 
assuming of course that the checksum verifies.  So despite all the 
alarming-looking noise, I believe it's harmless... assuming it was indeed 
full raid1, both data and metadata.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-01-30 18:25     ` Chris Murphy
@ 2014-02-03 20:55       ` Johan Kröckel
  2014-02-03 21:08         ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Johan Kröckel @ 2014-02-03 20:55 UTC (permalink / raw)
  Cc: Btrfs BTRFS

2014-01-30 Chris Murphy <lists@colorremedies.com>:
>
> On Jan 30, 2014, at 10:58 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>
>> On Thu, Jan 30, 2014 at 10:33:21AM -0700, Chris Murphy wrote:
>>> You're doing an online conversion of a degraded raid1 volume into single? Does anyone know if this is expected or intended to work?
>>
>>   I don't see why not. One suggested method of recovering RAID from a
>> degraded situation is to rebalance over just the remaining devices
>> (space permitting, of course).
>
> Right but that's not a conversion. That's a regular balance on a degraded mount, with multiple remaining devices: e.g. a 4 disk raid1, drive fails, mount -o degraded, delete missing, then balance will replicate any missing 2nd copies onto three drives.
>
> The bigger problem at the moment is that -o degraded isn't working for Johan. The too many missing devices message seems like a bug and with limited information it may even be whatever that bug is, that cause the conversion to fail. Some 11GB were converted prior to the failure.
What useful information can I provide? Over the weekend I was at the
server and found out that the drive vanishing at reboot was due to
strange behavior of the BIOS. So the drive is online again. The
filesystem is still showing strange behavior, but now I can mount it
rw.
> Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-02-03 20:55       ` Johan Kröckel
@ 2014-02-03 21:08         ` Chris Murphy
  2014-02-03 21:31           ` Johan Kröckel
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2014-02-03 21:08 UTC (permalink / raw)
  To: Johan Kröckel; +Cc: Btrfs BTRFS


On Feb 3, 2014, at 1:55 PM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:

> 2014-01-30 Chris Murphy <lists@colorremedies.com>:
>> 
>> On Jan 30, 2014, at 10:58 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> 
>>> On Thu, Jan 30, 2014 at 10:33:21AM -0700, Chris Murphy wrote:
>>>> You're doing an online conversion of a degraded raid1 volume into single? Does anyone know if this is expected or intended to work?
>>> 
>>>  I don't see why not. One suggested method of recovering RAID from a
>>> degraded situation is to rebalance over just the remaining devices
>>> (space permitting, of course).
>> 
>> Right but that's not a conversion. That's a regular balance on a degraded mount, with multiple remaining devices: e.g. a 4 disk raid1, drive fails, mount -o degraded, delete missing, then balance will replicate any missing 2nd copies onto three drives.
>> 
>> The bigger problem at the moment is that -o degraded isn't working for Johan. The too many missing devices message seems like a bug and with limited information it may even be whatever that bug is, that cause the conversion to fail. Some 11GB were converted prior to the failure.
> Which usefull information can provide. On the weekend I was at the
> server and found out, that the vanishing of the drive at reboot was
> strange behavior of the bios. So the drive is online again. but the
> filesystem is still showing strange behavior, but now I can mount it
> rw.

I'd like to see btrfs fi df results for the volume, and a new btrfs check. Then a backup if needed, and then a scrub to see if that fixes anything broken between the two devices. I'm not sure what scrub will do if a new generation object is broken and the old generation is OK -- maybe it just reports it, I'm not sure. If you want, you could do a read-only scrub (btrfs scrub start -r), which just reports what the problems are.

You also have an incomplete balance, right? So it's possible that some things won't be fixable where the conversion to single already succeeded, since those chunks only have one copy left. You'll need to decide whether you want to convert data/metadata back to raid1 from whatever mix you're at now.
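
Roughly this sequence -- a sketch only, assuming the volume is mounted at /mnt and the surviving device is /dev/mapper/bunkerA; adjust to your setup:

# current chunk profiles and usage
btrfs filesystem df /mnt
# offline consistency check, run against the unmounted device
btrfs check /dev/mapper/bunkerA
# read-only scrub: only reports problems, repairs nothing
btrfs scrub start -r /mnt
# normal scrub: rewrites bad copies from the good copy where a second copy exists
btrfs scrub start /mnt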


Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-02-03 21:08         ` Chris Murphy
@ 2014-02-03 21:31           ` Johan Kröckel
  2014-02-07 11:34             ` Johan Kröckel
  0 siblings, 1 reply; 25+ messages in thread
From: Johan Kröckel @ 2014-02-03 21:31 UTC (permalink / raw)
  Cc: Btrfs BTRFS

The state is: I won't use this filesystem again. I have a backup. So I
am happy to provide the necessary information for debugging it, and
afterwards I will format it and create a new one. I already ran fsck and
btrfsck --repair and saved the output to text files, but they are more
than 4 MB in size.

So I will post excerpts:

============file: btrfsck.out===========================
Checking filesystem on /dev/mapper/bunkerA
UUID: 7f954a85-7566-4251-832c-44f2d3de2211
42
parent transid verify failed on 1887688011776 wanted 121037 found 88533
parent transid verify failed on 1888518615040 wanted 121481 found 90267
parent transid verify failed on 1681394102272 wanted 110919 found 91024
parent transid verify failed on 1888522838016 wanted 121486 found 90270
parent transid verify failed on 1888398331904 wanted 121062 found 89987
leaf parent key incorrect 1887867330560
bad block 1887867330560
leaf parent key incorrect 1888120320000
bad block 1888120320000
leaf parent key incorrect 1888124637184
bad block 1888124637184
leaf parent key incorrect 1888131444736
bad block 1888131444736

[...and so on for 4MB]

bad block 1888513552384
leaf parent key incorrect 1888513642496
bad block 1888513642496
leaf parent key incorrect 1888513654784
bad block 1888513654784
leaf parent key incorrect 1888514023424
bad block 1888514023424
btrfsck: cmds-check.c:2212: check_owner_ref: Assertion `!(rec->is_root)' failed.

================file: smartctl-before-btrfschk-repair==============
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.12-0.bpo.1-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       172055696
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       7
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       9085642
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2769
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       7
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   083   083   000    Old_age   Always       -       17
190 Airflow_Temperature_Cel 0x0022   077   071   045    Old_age   Always       -       23 (Min/Max 22/23)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   023   040   000    Old_age   Always       -       23 (0 20 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

=================file:btrfsck-repair.out========================
enabling repair mode
Checking filesystem on /dev/mapper/bunkerA
UUID: 7f954a85-7566-4251-832c-44f2d3de2211
parent transid verify failed on 1887688011776 wanted 121037 found 88533
parent transid verify failed on 1888518615040 wanted 121481 found 90267
parent transid verify failed on 1681394102272 wanted 110919 found 91024
parent transid verify failed on 1888522838016 wanted 121486 found 90270
parent transid verify failed on 1888398331904 wanted 121062 found 89987
leaf parent key incorrect 1887867330560
bad block 1887867330560

[...and so on for 4MB]

bad block 1888513642496
leaf parent key incorrect 1888513654784
bad block 1888513654784
leaf parent key incorrect 1888514023424
bad block 1888514023424
btrfsck: cmds-check.c:2212: check_owner_ref: Assertion `!(rec->is_root)' failed.

==============file:smartctl-after-btrfschk-repair==================
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.12-0.bpo.1-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       178377016
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       7
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       9087571
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2769
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       7
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   083   083   000    Old_age   Always       -       17
190 Airflow_Temperature_Cel 0x0022   077   071   045    Old_age   Always       -       23 (Min/Max 22/23)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   023   040   000    Old_age   Always       -       23 (0 20 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

2014-02-03 Chris Murphy <lists@colorremedies.com>:
>
> On Feb 3, 2014, at 1:55 PM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:
>
>> 2014-01-30 Chris Murphy <lists@colorremedies.com>:
>>>
>>> On Jan 30, 2014, at 10:58 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>>>
>>>> On Thu, Jan 30, 2014 at 10:33:21AM -0700, Chris Murphy wrote:
>>>>> You're doing an online conversion of a degraded raid1 volume into single? Does anyone know if this is expected or intended to work?
>>>>
>>>>  I don't see why not. One suggested method of recovering RAID from a
>>>> degraded situation is to rebalance over just the remaining devices
>>>> (space permitting, of course).
>>>
>>> Right but that's not a conversion. That's a regular balance on a degraded mount, with multiple remaining devices: e.g. a 4 disk raid1, drive fails, mount -o degraded, delete missing, then balance will replicate any missing 2nd copies onto three drives.
>>>
>>> The bigger problem at the moment is that -o degraded isn't working for Johan. The too many missing devices message seems like a bug and with limited information it may even be whatever that bug is, that cause the conversion to fail. Some 11GB were converted prior to the failure.
>> Which usefull information can provide. On the weekend I was at the
>> server and found out, that the vanishing of the drive at reboot was
>> strange behavior of the bios. So the drive is online again. but the
>> filesystem is still showing strange behavior, but now I can mount it
>> rw.
>
> I'd like to see btrfs fi df results for the volume. And new btrfs check. And then a backup if needed, and then a scrub to see if that fixes anything broken between them. I'm not sure what happens if a new generation object is broken and the old generation is OK, what scrub will do? Maybe it just reports it, I'm not sure. If you want you could do a btrfs scrub -r which is read only and just reports what the problems are.
>
> You also have an incomplete balance, right? So it's possible some things might not be fixable if the conversion to single was successful. You'll need to decide if you want to reconvert back to data/metadata raid1/raid from whatever you're at now.
>
>
> Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-02-03 21:31           ` Johan Kröckel
@ 2014-02-07 11:34             ` Johan Kröckel
  2014-02-07 17:43               ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Johan Kröckel @ 2014-02-07 11:34 UTC (permalink / raw)
  Cc: Btrfs BTRFS

Is there anything else I should do with this setup or may I nuke the
two partitions and reuse them?

2014-02-03 Johan Kröckel <johan.kroeckel@gmail.com>:
> State is: I wont use this filesystem again. I have a backup. So I am
> interested to give the necessary information for debuging it and
> afterwards format it and create a new one. I already did fscks and
> btrfschk --repair and pushed the output to txt-files but they are more
> than 4 mb in size.
>
> So I will post excerpts:
>
> ============file: btrfsck.out===========================
> Checking filesystem on /dev/mapper/bunkerA
> UUID: 7f954a85-7566-4251-832c-44f2d3de2211
> 42
> parent transid verify failed on 1887688011776 wanted 121037 found 88533
> parent transid verify failed on 1888518615040 wanted 121481 found 90267
> parent transid verify failed on 1681394102272 wanted 110919 found 91024
> parent transid verify failed on 1888522838016 wanted 121486 found 90270
> parent transid verify failed on 1888398331904 wanted 121062 found 89987
> leaf parent key incorrect 1887867330560
> bad block 1887867330560
> leaf parent key incorrect 1888120320000
> bad block 1888120320000
> leaf parent key incorrect 1888124637184
> bad block 1888124637184
> leaf parent key incorrect 1888131444736
> bad block 1888131444736
>
> [...and so on for 4MB]
>
> bad block 1888513552384
> leaf parent key incorrect 1888513642496
> bad block 1888513642496
> leaf parent key incorrect 1888513654784
> bad block 1888513654784
> leaf parent key incorrect 1888514023424
> bad block 1888514023424
> btrfsck: cmds-check.c:2212: check_owner_ref: Assertion `!(rec->is_root)' failed.
>
> ================file: smartctl-before-btrfschk-repair==============
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.12-0.bpo.1-amd64] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE
> UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail
> Always       -       172055696
>   3 Spin_Up_Time            0x0003   093   093   000    Pre-fail
> Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age
> Always       -       7
>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail
> Always       -       0
>   7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail
> Always       -       9085642
>   9 Power_On_Hours          0x0032   097   097   000    Old_age
> Always       -       2769
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail
> Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age
> Always       -       7
> 184 End-to-End_Error        0x0032   100   100   099    Old_age
> Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age
> Always       -       0
> 188 Command_Timeout         0x0032   100   100   000    Old_age
> Always       -       0
> 189 High_Fly_Writes         0x003a   083   083   000    Old_age
> Always       -       17
> 190 Airflow_Temperature_Cel 0x0022   077   071   045    Old_age
> Always       -       23 (Min/Max 22/23)
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age
> Always       -       0
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age
> Always       -       5
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age
> Always       -       7
> 194 Temperature_Celsius     0x0022   023   040   000    Old_age
> Always       -       23 (0 20 0 0)
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age
> Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age
> Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age
> Always       -       0
>
> =================file:btrfsck-repair.out========================
> enabling repair mode
> Checking filesystem on /dev/mapper/bunkerA
> UUID: 7f954a85-7566-4251-832c-44f2d3de2211
> ify failed on 1887688011776 wanted 121037 found 88533
> parent transid verify failed on 1888518615040 wanted 121481 found 90267
> parent transid verify failed on 1681394102272 wanted 110919 found 91024
> parent transid verify failed on 1888522838016 wanted 121486 found 90270
> parent transid verify failed on 1888398331904 wanted 121062 found 89987
> leaf parent key incorrect 1887867330560
> bad block 1887867330560
>
> [...and so on for 4MB]
>
> bad block 1888513642496
> leaf parent key incorrect 1888513654784
> bad block 1888513654784
> leaf parent key incorrect 1888514023424
> bad block 1888514023424
> btrfsck: cmds-check.c:2212: check_owner_ref: Assertion `!(rec->is_root)' failed.
>
> ==============file:smartctl-after-btrfschk-repair==================
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.12-0.bpo.1-amd64] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE
> UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail
> Always       -       178377016
>   3 Spin_Up_Time            0x0003   093   093   000    Pre-fail
> Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age
> Always       -       7
>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail
> Always       -       0
>   7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail
> Always       -       9087571
>   9 Power_On_Hours          0x0032   097   097   000    Old_age
> Always       -       2769
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail
> Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age
> Always       -       7
> 184 End-to-End_Error        0x0032   100   100   099    Old_age
> Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age
> Always       -       0
> 188 Command_Timeout         0x0032   100   100   000    Old_age
> Always       -       0
> 189 High_Fly_Writes         0x003a   083   083   000    Old_age
> Always       -       17
> 190 Airflow_Temperature_Cel 0x0022   077   071   045    Old_age
> Always       -       23 (Min/Max 22/23)
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age
> Always       -       0
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age
> Always       -       5
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age
> Always       -       7
> 194 Temperature_Celsius     0x0022   023   040   000    Old_age
> Always       -       23 (0 20 0 0)
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age
> Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age
> Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age
> Always       -       0
>
> 2014-02-03 Chris Murphy <lists@colorremedies.com>:
>>
>> On Feb 3, 2014, at 1:55 PM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:
>>
>>> 2014-01-30 Chris Murphy <lists@colorremedies.com>:
>>>>
>>>> On Jan 30, 2014, at 10:58 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>>>>
>>>>> On Thu, Jan 30, 2014 at 10:33:21AM -0700, Chris Murphy wrote:
>>>>>> You're doing an online conversion of a degraded raid1 volume into single? Does anyone know if this is expected or intended to work?
>>>>>
>>>>>  I don't see why not. One suggested method of recovering RAID from a
>>>>> degraded situation is to rebalance over just the remaining devices
>>>>> (space permitting, of course).
>>>>
>>>> Right but that's not a conversion. That's a regular balance on a degraded mount, with multiple remaining devices: e.g. a 4 disk raid1, drive fails, mount -o degraded, delete missing, then balance will replicate any missing 2nd copies onto three drives.
>>>>
>>>> The bigger problem at the moment is that -o degraded isn't working for Johan. The too many missing devices message seems like a bug and with limited information it may even be whatever that bug is, that cause the conversion to fail. Some 11GB were converted prior to the failure.
>>> Which usefull information can provide. On the weekend I was at the
>>> server and found out, that the vanishing of the drive at reboot was
>>> strange behavior of the bios. So the drive is online again. but the
>>> filesystem is still showing strange behavior, but now I can mount it
>>> rw.
>>
>> I'd like to see btrfs fi df results for the volume. And new btrfs check. And then a backup if needed, and then a scrub to see if that fixes anything broken between them. I'm not sure what happens if a new generation object is broken and the old generation is OK, what scrub will do? Maybe it just reports it, I'm not sure. If you want you could do a btrfs scrub -r which is read only and just reports what the problems are.
>>
>> You also have an incomplete balance, right? So it's possible some things might not be fixable if the conversion to single was successful. You'll need to decide if you want to reconvert back to data/metadata raid1/raid from whatever you're at now.
>>
>>
>> Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-02-07 11:34             ` Johan Kröckel
@ 2014-02-07 17:43               ` Chris Murphy
  2014-02-08 11:09                 ` Johan Kröckel
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2014-02-07 17:43 UTC (permalink / raw)
  To: Johan Kröckel; +Cc: Btrfs BTRFS


On Feb 7, 2014, at 4:34 AM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:

> Is there anything else I should do with this setup or may I nuke the
> two partitions and reuse them?

Well, I'm pretty sure that once you've run 'btrfs check --repair', you've hit the end of the road. Possibly btrfs restore can still extract some files; it might be worth testing whether that works.
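
For example, something like this -- just a sketch, with /mnt/rescue as a placeholder for wherever you want the recovered files dumped:

mkdir -p /mnt/rescue
# -v lists files as they are recovered, -i ignores errors and keeps going
btrfs restore -v -i /dev/mapper/bunkerA /mnt/rescue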

Otherwise blow it away. I'd say test 3.14-rc2 with a new file system and see if you can reproduce the sequence that caused this problem in the first place. If it's reproducible, I think there's a bug here somewhere.


Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-02-07 17:43               ` Chris Murphy
@ 2014-02-08 11:09                 ` Johan Kröckel
  2014-02-09  5:40                   ` Duncan
  0 siblings, 1 reply; 25+ messages in thread
From: Johan Kröckel @ 2014-02-08 11:09 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

OK, I nuked it now and created the fs again using the 3.12 kernel. So
far so good; it runs fine.
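For reference, recreating the two-device raid1 on the luks mappings boils
down to something like this (a sketch, not the exact command line I used):

# mirror both data and metadata across the two devices
mkfs.btrfs -L bunker -d raid1 -m raid1 /dev/mapper/bunkerA /dev/mapper/bunkerB
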
Finally, I know it's kind of off-topic, but can someone help me
interpret this (I think this is the error in the SMART log which
started the whole mess)?

Error 1 occurred at disk power-on lifetime: 2576 hours (107 days + 8 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 00 ff ff ff 0f  Device Fault; Error: ABRT at LBA = 0x0fffffff
= 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 08 ff ff ff 4f 00   5d+04:53:11.169  WRITE FPDMA QUEUED
  61 00 08 80 18 00 40 00   5d+04:52:45.129  WRITE FPDMA QUEUED
  61 00 08 ff ff ff 4f 00   5d+04:52:44.701  WRITE FPDMA QUEUED
  61 00 08 ff ff ff 4f 00   5d+04:52:44.700  WRITE FPDMA QUEUED
  61 00 08 ff ff ff 4f 00   5d+04:52:44.679  WRITE FPDMA QUEUED

2014-02-07 Chris Murphy <lists@colorremedies.com>:
>
> On Feb 7, 2014, at 4:34 AM, Johan Kröckel <johan.kroeckel@gmail.com> wrote:
>
>> Is there anything else I should do with this setup or may I nuke the
>> two partitions and reuse them?
>
> Well I'm pretty sure once you run 'btrfs check --repair' that you've hit the end of the road. Possibly btrfs restore can still extract some files, it might be worth testing whether that works.
>
> Otherwise blow it away. I'd say test with 3.14-rc2 with a new file system and see if you can reproduce the sequence that caused this problem in the first place. If it's reproducible, I think there's a bug here somewhere.
>
>
> Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-02-08 11:09                 ` Johan Kröckel
@ 2014-02-09  5:40                   ` Duncan
  2014-02-10 14:05                     ` Johan Kröckel
  0 siblings, 1 reply; 25+ messages in thread
From: Duncan @ 2014-02-09  5:40 UTC (permalink / raw)
  To: linux-btrfs

Johan Kröckel posted on Sat, 08 Feb 2014 12:09:46 +0100 as excerpted:

> Ok, I did nuke it now and created the fs again using 3.12 kernel. So far
> so good. Runs fine.
> Finally, I know its kind of offtopic, but can some help me interpreting
> this (I think this is the error in the smart-log which started the whole
> mess)?
> 
> Error 1 occurred at disk power-on lifetime: 2576 hours (107 days + 8
> hours)
>   When the command that caused the error occurred, the device was
> active or idle.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   04 71 00 ff ff ff 0f
>  Device Fault; Error: ABRT at LBA = 0x0fffffff = 268435455

I'm no SMART expert, but that LBA number is incredibly suspicious.  With 
standard 512-byte sectors that's the 128 GiB boundary, the old 28-bit LBA 
limit (LBA28, introduced with ATA-1 in 1994, modern drives are LBA48, 
introduced in 2003 with ATA-6 and offering an addressing capacity of 128 
PiB, according to wikipedia's article on LBA).
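
The arithmetic, if you want to check it yourself (plain shell, assuming 
512-byte logical sectors):

$ echo $(( 0x0fffffff ))              # highest sector number LBA28 can address
268435455
$ echo $(( (0x0fffffff + 1) * 512 ))  # capacity below that boundary, in bytes
137438953472                          # = 128 * 1024^3, i.e. exactly 128 GiB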

It looks like something flipped back to LBA28, and when a continuing 
operation happened to write past that value... it triggered the abort you 
see in the SMART log.

Double-check your BIOS to be sure it didn't somehow revert to the old 
LBA28 compatibility mode or some such, and the drives, to make sure they 
aren't "clipped" to LBA28 compatibility mode as well.
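
One way to check for clipping from userspace, as a sketch -- substitute the 
real device node:

# visible max sector count vs. the drive's native max (HPA / clipping status)
hdparm -N /dev/sdX
# what the kernel currently sees, in 512-byte sectors
blockdev --getsz /dev/sdX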

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: lost with degraded RAID1
  2014-02-09  5:40                   ` Duncan
@ 2014-02-10 14:05                     ` Johan Kröckel
  0 siblings, 0 replies; 25+ messages in thread
From: Johan Kröckel @ 2014-02-10 14:05 UTC (permalink / raw)
  Cc: Btrfs BTRFS

Thanks, that explains something. There was indeed a BIOS problem (the
drive that vanished had suddenly been disabled in the BIOS and was only
usable again after reactivating it there). So it must have been a
BIOS problem.

2014-02-09 Duncan <1i5t5.duncan@cox.net>:
> Johan Kröckel posted on Sat, 08 Feb 2014 12:09:46 +0100 as excerpted:
>
>> Ok, I did nuke it now and created the fs again using 3.12 kernel. So far
>> so good. Runs fine.
>> Finally, I know its kind of offtopic, but can some help me interpreting
>> this (I think this is the error in the smart-log which started the whole
>> mess)?
>>
>> Error 1 occurred at disk power-on lifetime: 2576 hours (107 days + 8
>> hours)
>>   When the command that caused the error occurred, the device was
>> active or idle.
>>
>>   After command completion occurred, registers were:
>>   ER ST SC SN CL CH DH
>>   -- -- -- -- -- -- --
>>   04 71 00 ff ff ff 0f
>>  Device Fault; Error: ABRT at LBA = 0x0fffffff = 268435455
>
> I'm no SMART expert, but that LBA number is incredibly suspicious.  With
> standard 512-byte sectors that's the 128 GiB boundary, the old 28-bit LBA
> limit (LBA28, introduced with ATA-1 in 1994, modern drives are LBA48,
> introduced in 2003 with ATA-6 and offering an addressing capacity of 128
> PiB, according to wikipedia's article on LBA).
>
> It looks like something flipped back to LBA28, and when a continuing
> operation happened to write past that value... it triggered the abort you
> see in the SMART log.
>
> Double-check your BIOS to be sure it didn't somehow revert to the old
> LBA28 compatibility mode or some such, and the drives, to make sure they
> aren't "clipped" to LBA28 compatibility mode as well.
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-02-10 14:06 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-29 19:16 lost with degraded RAID1 Johan Kröckel
2014-01-30  3:28 ` Duncan
2014-01-30 11:53 ` Johan Kröckel
2014-01-30 17:57   ` Chris Murphy
     [not found]     ` <CABgvyo9FYGSYpj+jL1oqCvtNUqsC8HZ+z=x-Gz7naWoEcCKWpQ@mail.gmail.com>
2014-01-30 19:32       ` Chris Murphy
2014-01-30 22:18   ` Duncan
2014-01-31  3:00     ` Chris Murphy
2014-01-31  5:58       ` Duncan
2014-01-31  6:10         ` Chris Murphy
2014-01-31  6:13           ` Chris Murphy
2014-01-31  7:37             ` Duncan
2014-01-31  2:19   ` Chris Murphy
2014-01-30 17:33 ` Chris Murphy
2014-01-30 17:58   ` Hugo Mills
2014-01-30 18:25     ` Chris Murphy
2014-02-03 20:55       ` Johan Kröckel
2014-02-03 21:08         ` Chris Murphy
2014-02-03 21:31           ` Johan Kröckel
2014-02-07 11:34             ` Johan Kröckel
2014-02-07 17:43               ` Chris Murphy
2014-02-08 11:09                 ` Johan Kröckel
2014-02-09  5:40                   ` Duncan
2014-02-10 14:05                     ` Johan Kröckel
     [not found]   ` <CABgvyo822bOAHeA1GH28MPaBAU+Zdi72MD_uwL+dhopt+nwMig@mail.gmail.com>
2014-01-31  6:28     ` Fwd: " Johan Kröckel
2014-01-31  6:40 ` Johan Kröckel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).