* problem replacing failing drive
From: sam tygier @ 2012-10-22 9:07 UTC (permalink / raw)
To: linux-btrfs
hi,
I have a two-drive btrfs RAID setup. It was created with a single drive first; I then added a second drive and ran
btrfs fi balance start -dconvert=raid1 /data
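(For reference, the steps before that balance were roughly the following; the device names are illustrative rather than necessarily the ones I used:)

mkfs.btrfs -L bdata /dev/sdd2      # original single-drive filesystem
mount /dev/sdd2 /data
btrfs device add /dev/sde1 /data   # add the second drive, then run the balance above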
The original drive is showing SMART errors, so I want to replace it. I don't easily have space in my desktop for an extra disk, so I decided to proceed by shutting down, taking out the old failing drive, and putting in the new drive. This is similar to the description at
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_Failed_Devices
(The other reason to try this is to simulate what would happen if a drive completely failed.)
After swapping the drives and rebooting, I tried to mount the filesystem as degraded and instantly got a kernel panic: http://www.hep.man.ac.uk/u/sam/pub/IMG_5397_crop.png
So far all of this was with a 3.5 kernel, so I upgraded to 3.6.2 and tried to mount degraded again.
First with just sudo mount /dev/sdd2 /mnt, then with sudo mount -o degraded /dev/sdd2 /mnt:
[ 582.535689] device label bdata devid 1 transid 25342 /dev/sdd2
[ 582.536196] btrfs: disk space caching is enabled
[ 582.536602] btrfs: failed to read the system array on sdd2
[ 582.536860] btrfs: open_ctree failed
[ 606.784176] device label bdata devid 1 transid 25342 /dev/sdd2
[ 606.784647] btrfs: allowing degraded mounts
[ 606.784650] btrfs: disk space caching is enabled
[ 606.785131] btrfs: failed to read chunk root on sdd2
[ 606.785331] btrfs warning page private not zero on page 3222292922368
[ 606.785408] btrfs: open_ctree failed
[ 782.422959] device label bdata devid 1 transid 25342 /dev/sdd2
No panic is good progress, but something is still not right.
My options would seem to be:
1) Reconnect the old drive (probably in a USB caddy) and see if it mounts as if nothing ever happened, or possibly try to recover it back to a working RAID 1; then try again by adding the new drive first and removing the old one afterwards.
2) Give up experimenting, create a new btrfs RAID 1, and restore from backup.
Both leave me worried about what would happen if a disk in a RAID 1 really did die (unless it was the panic that did some damage and borked the filesystem).
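(If I go with option 1, I assume the add-first-then-remove route would look something like this; the device paths are placeholders and I have not tested it:)

mount /dev/sdd2 /data                 # mount with both current drives present
btrfs device add /dev/sdf1 /data      # add the new drive first
btrfs device delete /dev/sde1 /data   # then remove the failing drive, letting btrfs migrate its data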
thanks.
sam
* Re: problem replacing failing drive
From: sam tygier @ 2012-10-25 21:02 UTC (permalink / raw)
To: linux-btrfs
On 22/10/12 10:07, sam tygier wrote:
> hi,
>
> I have a two-drive btrfs RAID setup. It was created with a single drive first; I then added a second drive and ran
> btrfs fi balance start -dconvert=raid1 /data
>
> The original drive is showing SMART errors, so I want to replace it. I don't easily have space in my desktop for an extra disk, so I decided to proceed by shutting down, taking out the old failing drive, and putting in the new drive. This is similar to the description at
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_Failed_Devices
> (The other reason to try this is to simulate what would happen if a drive completely failed.)
>
> After swapping the drives and rebooting, I tried to mount the filesystem as degraded and instantly got a kernel panic: http://www.hep.man.ac.uk/u/sam/pub/IMG_5397_crop.png
>
> So far all of this was with a 3.5 kernel, so I upgraded to 3.6.2 and tried to mount degraded again.
>
> First with just sudo mount /dev/sdd2 /mnt, then with sudo mount -o degraded /dev/sdd2 /mnt:
>
> [ 582.535689] device label bdata devid 1 transid 25342 /dev/sdd2
> [ 582.536196] btrfs: disk space caching is enabled
> [ 582.536602] btrfs: failed to read the system array on sdd2
> [ 582.536860] btrfs: open_ctree failed
> [ 606.784176] device label bdata devid 1 transid 25342 /dev/sdd2
> [ 606.784647] btrfs: allowing degraded mounts
> [ 606.784650] btrfs: disk space caching is enabled
> [ 606.785131] btrfs: failed to read chunk root on sdd2
> [ 606.785331] btrfs warning page private not zero on page 3222292922368
> [ 606.785408] btrfs: open_ctree failed
> [ 782.422959] device label bdata devid 1 transid 25342 /dev/sdd2
>
> No panic is good progress, but something is still not right.
>
> My options would seem to be:
> 1) Reconnect the old drive (probably in a USB caddy) and see if it mounts as if nothing ever happened, or possibly try to recover it back to a working RAID 1; then try again by adding the new drive first and removing the old one afterwards.
> 2) Give up experimenting, create a new btrfs RAID 1, and restore from backup.
>
> Both leave me worried about what would happen if a disk in a RAID 1 really did die (unless it was the panic that did some damage and borked the filesystem).
Some more details.
If I reconnect the failing drive then I can mount the filesystem with no errors; a quick glance suggests that the data is all there.
Label: 'bdata' uuid: 1f07081c-316b-48be-af73-49e6f76535cc
Total devices 2 FS bytes used 2.50TB
devid 2 size 2.73TB used 2.73TB path /dev/sde1 <-- this is the drive that I wish to remove
devid 1 size 2.73TB used 2.73TB path /dev/sdd2
sudo btrfs filesystem df /mnt
Data, RAID1: total=2.62TB, used=2.50TB
System, DUP: total=40.00MB, used=396.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=112.00GB, used=3.84GB
Metadata: total=8.00MB, used=0.00
Is the failure to mount when I remove sde due to the system and metadata chunks being DUP rather than RAID1?
Is adding a second drive to a btrfs filesystem and running
btrfs fi balance start -dconvert=raid1 /mnt
not sufficient to create an array that can survive the loss of a disk? Do I need -mconvert as well? Is there an -sconvert for system?
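(My guess at the full conversion, if -dconvert alone is not enough, would be a single balance with both filters; I have not tried this yet:)

btrfs fi balance start -dconvert=raid1 -mconvert=raid1 /mnt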
thanks
Sam
* RE: problem replacing failing drive
From: Kyle Gates @ 2012-10-25 21:37 UTC (permalink / raw)
To: sam tygier, linux-btrfs@vger.kernel.org
----------------------------------------
> To: linux-btrfs@vger.kernel.org
> From: samtygier@yahoo.co.uk
> Subject: Re: problem replacing failing drive
> Date: Thu, 25 Oct 2012 22:02:23 +0100
>
> On 22/10/12 10:07, sam tygier wrote:
> > hi,
> >
> > I have a two-drive btrfs RAID setup. It was created with a single drive first; I then added a second drive and ran
> > btrfs fi balance start -dconvert=raid1 /data
> >
> > The original drive is showing SMART errors, so I want to replace it. I don't easily have space in my desktop for an extra disk, so I decided to proceed by shutting down, taking out the old failing drive, and putting in the new drive. This is similar to the description at
> > https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_Failed_Devices
> > (The other reason to try this is to simulate what would happen if a drive completely failed.)
> >
> > After swapping the drives and rebooting, I tried to mount the filesystem as degraded and instantly got a kernel panic: http://www.hep.man.ac.uk/u/sam/pub/IMG_5397_crop.png
> >
> > So far all of this was with a 3.5 kernel, so I upgraded to 3.6.2 and tried to mount degraded again.
> >
> > First with just sudo mount /dev/sdd2 /mnt, then with sudo mount -o degraded /dev/sdd2 /mnt:
> >
> > [ 582.535689] device label bdata devid 1 transid 25342 /dev/sdd2
> > [ 582.536196] btrfs: disk space caching is enabled
> > [ 582.536602] btrfs: failed to read the system array on sdd2
> > [ 582.536860] btrfs: open_ctree failed
> > [ 606.784176] device label bdata devid 1 transid 25342 /dev/sdd2
> > [ 606.784647] btrfs: allowing degraded mounts
> > [ 606.784650] btrfs: disk space caching is enabled
> > [ 606.785131] btrfs: failed to read chunk root on sdd2
> > [ 606.785331] btrfs warning page private not zero on page 3222292922368
> > [ 606.785408] btrfs: open_ctree failed
> > [ 782.422959] device label bdata devid 1 transid 25342 /dev/sdd2
> >
> > No panic is good progress, but something is still not right.
> >
> > My options would seem to be:
> > 1) Reconnect the old drive (probably in a USB caddy) and see if it mounts as if nothing ever happened, or possibly try to recover it back to a working RAID 1; then try again by adding the new drive first and removing the old one afterwards.
> > 2) Give up experimenting, create a new btrfs RAID 1, and restore from backup.
> >
> > Both leave me worried about what would happen if a disk in a RAID 1 really did die (unless it was the panic that did some damage and borked the filesystem).
>
> Some more details.
>
> If I reconnect the failing drive then I can mount the filesystem with no errors; a quick glance suggests that the data is all there.
>
> Label: 'bdata' uuid: 1f07081c-316b-48be-af73-49e6f76535cc
> Total devices 2 FS bytes used 2.50TB
> devid 2 size 2.73TB used 2.73TB path /dev/sde1 <-- this is the drive that I wish to remove
> devid 1 size 2.73TB used 2.73TB path /dev/sdd2
>
> sudo btrfs filesystem df /mnt
> Data, RAID1: total=2.62TB, used=2.50TB
> System, DUP: total=40.00MB, used=396.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=112.00GB, used=3.84GB
> Metadata: total=8.00MB, used=0.00
>
> Is the failure to mount when I remove sde due to the system and metadata chunks being DUP rather than RAID1?
Yes, I would say so.
Try a
btrfs balance start -mconvert=raid1 /mnt
so all metadata is on each drive.
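Afterwards,

btrfs filesystem df /mnt

should show the Metadata line (and, I would expect, System too) as RAID1 rather than DUP.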
>
> Is adding a second drive to a btrfs filesystem and running
> btrfs fi balance start -dconvert=raid1 /mnt
> not sufficient to create an array that can survive the loss of a disk? Do I need -mconvert as well? Is there an -sconvert for system?
>
> thanks
>
> Sam
>
>
* Re: problem replacing failing drive
From: sam tygier @ 2012-10-26 9:02 UTC (permalink / raw)
To: linux-btrfs
On 25/10/12 22:37, Kyle Gates wrote:
>> On 22/10/12 10:07, sam tygier wrote:
>>> hi,
>>>
>>> I have a two-drive btrfs RAID setup. It was created with a single drive first; I then added a second drive and ran
>>> btrfs fi balance start -dconvert=raid1 /data
>>>
>>> The original drive is showing SMART errors, so I want to replace it. I don't easily have space in my desktop for an extra disk, so I decided to proceed by shutting down, taking out the old failing drive, and putting in the new drive. This is similar to the description at
>>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_Failed_Devices
>>> (The other reason to try this is to simulate what would happen if a drive completely failed.)
>>
>> If I reconnect the failing drive then I can mount the filesystem with no errors; a quick glance suggests that the data is all there.
>>
>> Label: 'bdata' uuid: 1f07081c-316b-48be-af73-49e6f76535cc
>> Total devices 2 FS bytes used 2.50TB
>> devid 2 size 2.73TB used 2.73TB path /dev/sde1 <-- this is the drive that I wish to remove
>> devid 1 size 2.73TB used 2.73TB path /dev/sdd2
>>
>> sudo btrfs filesystem df /mnt
>> Data, RAID1: total=2.62TB, used=2.50TB
>> System, DUP: total=40.00MB, used=396.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, DUP: total=112.00GB, used=3.84GB
>> Metadata: total=8.00MB, used=0.00
>>
>> Is the failure to mount when I remove sde due to the system and metadata chunks being DUP rather than RAID1?
>
> Yes, I would say so.
> Try a
> btrfs balance start -mconvert=raid1 /mnt
> so all metadata is on each drive.
Thanks
btrfs balance start -mconvert=raid1 /mnt
did the trick. It gave "btrfs: 9 enospc errors during balance" errors the first few times I ran it, but got there in the end (with a smaller number of errors each time). The volume is pretty full, so I'll forgive it (though is "Metadata, RAID1: total=111.84GB, used=3.83GB" a reasonable ratio?).
I can now successfully remove the failing device and mount the filesystem in degraded mode.
It seems like the system chunks get converted automatically.
I have added an example of how to do this at https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Adding_New_Devices
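(For the archive, the rough end-to-end sequence, with illustrative device paths, looks like this; the last two steps are the usual add-new-then-delete-missing finish from the wiki's Replacing Failed Devices page rather than something spelled out above:)

btrfs balance start -mconvert=raid1 /mnt   # make sure metadata (and system) is raid1, not DUP
# shut down, swap the failing drive for the new one, reboot
mount -o degraded /dev/sdd2 /mnt
btrfs device add /dev/sdf1 /mnt            # add the replacement drive
btrfs device delete missing /mnt           # drop the now-absent old drive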
Thanks,
Sam