* RAID6 duplicate device in array after replacing a drive. what the?
From: Rob @ 2015-09-21 10:20 UTC (permalink / raw)
To: linux-btrfs
Hi all,
I managed to get a test raid6 into a strange state after removing a drive that
was going faulty, putting in a blank replacement and then mounting with -o
degraded
uname -a
Linux urvile 4.1.7-v7+ #815 SMP PREEMPT Thu Sep 17 18:34:33 BST 2015 armv7l GNU/Linux

btrfs --version
btrfs-progs v4.1.2
Here are the steps taken to get into this state:
1. btrfs scrub start /media/btrfs-rpi-raid6
- at some point the I/O rate dropped, scrub status started showing a lot of
corrections on sdc1, and the kernel was spitting heaps of sector errors for sdc.
smartctl showed lots of current_pending_sectors etc. Time to replace the drive.
2. btrfs scrub cancel /media/btrfs-rpi-raid6
- I waited 4h but this didn't return to a prompt (tried unmounting, killall -9
btrfs), so I switched off power to the disks, replaced the faulty disk and
switched the enclosure on again.
at this point we have:
   8        0 1953514584 sda
   8       48 1953514584 sdd
   8       49 1953513560 sdd1
   8       16 1953514584 sdb
   8       17 1953513560 sdb1
   8       32 1953514584 sdc
   8       33 1953513560 sdc1
You can see the drives have shifted position: there's a new "sdc", which is one
of the good drives, and sda is the new, unpartitioned replacement (one way to
double-check which physical disk is which is sketched just after the usage
output below).
3. mount /dev/sdb1 /media/btrfs-rpi-raid6 -o degraded,noatime
4. btrfs fi usage btrfs-rpi-raid6/
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
Overall:
    Device size:                   7.28TiB
    Device allocated:             20.00MiB
    Device unallocated:            7.28TiB
    Device missing:                  0.00B
    Used:                            0.00B
    Free (estimated):             83.70PiB      (min: 7.37TiB)
    Data ratio:                       0.00
    Metadata ratio:                   0.00
    Global reserve:               48.00MiB      (used: 0.00B)

Data,single: Size:8.00MiB, Used:0.00B
   /dev/sdc1       8.00MiB

Data,RAID6: Size:92.00GiB, Used:91.34GiB
   /dev/sdb1      46.00GiB
   /dev/sdc1      46.00GiB
   /dev/sdc1      46.00GiB
   /dev/sdd1      46.00GiB

Metadata,single: Size:8.00MiB, Used:0.00B
   /dev/sdc1       8.00MiB

Metadata,RAID6: Size:2.00GiB, Used:125.38MiB
   /dev/sdb1       1.00GiB
   /dev/sdc1       1.00GiB
   /dev/sdc1       1.00GiB
   /dev/sdd1       1.00GiB

System,single: Size:4.00MiB, Used:0.00B
   /dev/sdc1       4.00MiB

System,RAID6: Size:16.00MiB, Used:16.00KiB
   /dev/sdb1       8.00MiB
   /dev/sdc1       8.00MiB
   /dev/sdc1       8.00MiB
   /dev/sdd1       8.00MiB

Unallocated:
   /dev/sdb1       1.77TiB
   /dev/sdc1       1.77TiB
   /dev/sdc1       1.77TiB
   /dev/sdd1       1.77TiB
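Since the drive letters shifted after the power cycle, here is roughly how I'd
map the kernel names back to physical disks. Only a sketch; the sd[a-d] paths
are just the ones from the listing above and nothing below depends on it:

# persistent names include model and serial, so the mapping survives a reshuffle
ls -l /dev/disk/by-id/ | grep -E 'sd[a-d]1?$'

# or read the serial number straight off each disk
for d in /dev/sd[a-d]; do
    echo "== $d"
    smartctl -i "$d" | grep -E 'Device Model|Serial Number'
done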
The files inside look ok, but looking at the above, I'm scared to touch it in
case the array gets corrupted.
So my questions for anyone who is knowledgeable are:
- Should the above not say there's a missing device?
- Why is sdc1 listed twice?
- Will bad things (tm) happen if I write to the filesystem?
- Are there commands I can run to safely remove the duplicate so I can
add the replacement drive and balance? (a rough sketch of what I have in mind
follows below)
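To make that last question concrete, this is the sort of sequence I'd naively
expect to run once the duplicate entry is sorted out. It's only a sketch of the
generic add-then-remove-missing path, not something I've tried on this array,
and the path for the replacement is just a guess from the listing above:

# partition the new disk like the others, then add it to the array
btrfs device add /dev/sda1 /media/btrfs-rpi-raid6

# drop the member that is physically gone
btrfs device delete missing /media/btrfs-rpi-raid6

# optionally rebalance afterwards (as I understand it, 'delete missing'
# already relocates the data onto the remaining devices)
btrfs balance start /media/btrfs-rpi-raid6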
For completeness, here are some more data points:
btrfs fi df btrfs-rpi-raid6
--------------------
Data, single: total=8.00MiB, used=0.00B
Data, RAID6: total=92.00GiB, used=91.34GiB
System, single: total=4.00MiB, used=0.00B
System, RAID6: total=16.00MiB, used=16.00KiB
Metadata, single: total=8.00MiB, used=0.00B
Metadata, RAID6: total=2.00GiB, used=125.38MiB
GlobalReserve, single: total=48.00MiB, used=0.00B
btrfs fi show btrfs-rpi-raid6
--------------------
Label: 'btrfs-rpi-raid6'  uuid: de8aa5e0-1d58-43df-a14e-160471caef5b
        Total devices 4 FS bytes used 91.46GiB
        devid    1 size 1.82TiB used 47.03GiB path /dev/sdc1
        devid    2 size 1.82TiB used 47.01GiB path /dev/sdc1
        devid    3 size 1.82TiB used 47.01GiB path /dev/sdd1
        devid    4 size 1.82TiB used 47.01GiB path /dev/sdb1
Again, the duplicate device listed as devid 1 *and* devid 2 looks abnormal.
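In case it helps with the diagnosis, I assume the on-disk superblocks can be
dumped to see which devid each partition really carries, e.g. something like
this (btrfs-show-super ships with btrfs-progs; the exact field names are from
memory):

for d in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
    echo "== $d"
    btrfs-show-super "$d" | grep -E '^(fsid|dev_item.devid|dev_item.uuid)'
done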
Thanks for reading!
-Rob
* Re: RAID6 duplicate device in array after replacing a drive. what the?
From: Goffredo Baroncelli @ 2015-09-23 16:32 UTC (permalink / raw)
To: Rob, linux-btrfs
On 2015-09-21 12:20, Rob wrote:
> 2. btrfs scrub cancel /media/btrfs-rpi-raid6
>
> - I waited 4h but this didn't return to a prompt (tried unmounting, killall -9
> btrfs), so I switched off power to the disks, replaced the faulty disk and
> switched the enclosure on again.
I don't understand whether the disks are in an external enclosure that was
switched OFF (leaving the system ON), or whether you switched OFF the whole
system. Could you clarify?
BR
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
* Re: RAID6 duplicate device in array after replacing a drive. what the?
From: Goffredo Baroncelli @ 2015-09-23 18:50 UTC (permalink / raw)
To: Rob; +Cc: linux-btrfs, Anand Jain
On 2015-09-23 18:32, Goffredo Baroncelli wrote:
> On 2015-09-21 12:20, Rob wrote:
>> 2. btrfs scrub cancel /media/btrfs-rpi-raid6
>>
>> - I waited 4h but this didn't return to a prompt (tried unmounting,
>> killall -9 btrfs), so I switched off power to the disks, replaced
>> the faulty disk and switched the enclosure on again.
>
> I don't understand whether the disks are in an external enclosure that
> was switched OFF (leaving the system ON), or whether you switched OFF
> the whole system. Could you clarify?
>
I was able to reproduce this; the good news is that I reproduced the issue with an old kernel (v4.1.5), while a more recent kernel (v4.2.1) doesn't show the problem.
I suspect that the commit below solved this issue:
commit 4fde46f0cc71c7aba299ee6dfb4f017fb97b6e70
Author: Anand Jain <Anand.Jain@oracle.com>
Date: Wed Jun 17 21:10:48 2015 +0800
Btrfs: free the stale device
When btrfs on a device is overwritten with a new btrfs (mkfs),
the old btrfs instance in the kernel becomes stale. So with this
patch, if kernel finds device is overwritten then delete the stale
fsid/uuid.
To trigger the problem you have to re-register two different devices (== different dev_uuid) with the same device name and the same fs_uuid (without rebooting).
Below is how I reproduced the issue (a possible workaround is sketched after the output):
# creating the filesystem
truncate -s 20G img0
truncate -s 20G img1
truncate -s 20G img2
truncate -s 20G img3
losetup /dev/loop0 img0
losetup /dev/loop1 img1
losetup /dev/loop2 img2
losetup /dev/loop3 img3
mkfs.btrfs -draid6 -mraid6 /dev/loop[0-3]
# mount and use the filesystem
mount /dev/loop1 /mnt/test
[...]
umount /mnt/test
# leave img2 out and rotate the remaining images onto different loop devices
losetup -d /dev/loop0
losetup -d /dev/loop1
losetup -d /dev/loop2
losetup -d /dev/loop3
losetup /dev/loop1 img0
losetup /dev/loop2 img1
losetup /dev/loop3 img3
# NOTE: /dev/loop0 is left unassigned
mount -o degraded /dev/loop1 /mnt/test
btrfs fi usage /mnt/test
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
Overall:
    Device size:                  80.00GiB
    Device allocated:             20.00MiB
    Device unallocated:           79.98GiB
    Device missing:                  0.00B
    Used:                            0.00B
    Free (estimated):             20.07TiB      (min: 81.99GiB)
    Data ratio:                       0.00
    Metadata ratio:                   0.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,single: Size:8.00MiB, Used:0.00B
   /dev/loop1      8.00MiB

Data,RAID6: Size:2.00GiB, Used:11.00MiB
   /dev/loop1      1.00GiB
   /dev/loop2      1.00GiB
   /dev/loop2      1.00GiB
   /dev/loop3      1.00GiB

Metadata,single: Size:8.00MiB, Used:0.00B
   /dev/loop1      8.00MiB

Metadata,RAID6: Size:2.00GiB, Used:112.00KiB
   /dev/loop1      1.00GiB
   /dev/loop2      1.00GiB
   /dev/loop2      1.00GiB
   /dev/loop3      1.00GiB

System,single: Size:4.00MiB, Used:0.00B
   /dev/loop1      4.00MiB

System,RAID6: Size:16.00MiB, Used:16.00KiB
   /dev/loop1      8.00MiB
   /dev/loop2      8.00MiB
   /dev/loop2      8.00MiB
   /dev/loop3      8.00MiB

Unallocated:
   /dev/loop1     17.97GiB
   /dev/loop2     17.99GiB
   /dev/loop2     17.99GiB
   /dev/loop3     17.99GiB
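A practical consequence (only a sketch; the only thing I actually verified is
the kernel-version difference above): since the trigger requires re-registering
the devices without a reboot, on an affected kernel a reboot between the disk
swap and the degraded mount should avoid the duplicated entry. On real
hardware, using the paths from the original report, that would look roughly
like:

# after physically replacing the failed disk
reboot

# once back up, let the kernel re-discover the members from scratch
btrfs device scan
mount -o degraded /dev/sdb1 /media/btrfs-rpi-raid6
btrfs fi usage /media/btrfs-rpi-raid6   # each member should now appear only once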
BR
G.Baroncelli
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5