* RAID-1 not accessible after disk replacement
@ 2024-05-23 11:47 Richard
2024-05-23 12:20 ` Yu Kuai
2024-05-23 16:23 ` Phillip Susi
0 siblings, 2 replies; 8+ messages in thread
From: Richard @ 2024-05-23 11:47 UTC (permalink / raw)
To: linux-raid
Hello,
I was referred to this mailing list via the https://raid.wiki.kernel.org/index.php/RAID_Recovery
page, which says: "Make sure to contact someone who's experienced in dealing
with md RAID problems and failures - either someone you know in person, or,
alternatively, the friendly and helpful folk on the linux-raid mailing list".
As I have a problem with a RAID-1 array after replacing a failed disk, I hope
you can help me make it at least readable again.
Information about the RAID:
/dev/sda3 is the disk partition that has the data on it.
/dev/sdb6 is the new disk partition - no data on it.
Information about the involved RAID before the disk replacement:
/dev/md126:
Version : 1.0
Creation Time : Sat Apr 29 20:30:27 2017
Raid Level : raid1
Array Size : 247464768 (236.00 GiB 253.40 GB)
Used Dev Size : 247464768 (236.00 GiB 253.40 GB)
I grew (--grow) the RAID to a smaller size, as it was complaining about the
size (I have no log of that).
After this action the RAID was still functioning and fully accessible.
After a reboot the RAID is not accessible anymore.
The actual data on the RAID is about 37 GB.
After the reboot it became md125.
[ 23.872267] md/raid1:md125: active with 1 out of 2 mirrors
[ 23.903300] md125: detected capacity change from 0 to 488636416
2024-05-23T12:13:29.708106+02:00 cloud kernel: [  632.338266][ T2913] EXT4-fs
(md125): bad geometry: block count 61865984 exceeds size of device (61079552 blocks)
# mount /dev/md125 /srv
mount: /srv: wrong fs type, bad option, bad superblock on /dev/md125, missing
codepage or helper program, or other error
# LANG=C fsck /dev/md125
fsck from util-linux 2.37.4
e2fsck 1.46.4 (18-Aug-2021)
The filesystem size (according to the superblock) is 61865984 blocks
The physical size of the device is 61079552 blocks
Either the superblock or the partition table is likely to be corrupt!
# cat /proc/mdstat
md125 : active raid1 sda3[2]
244318208 blocks super 1.0 [2/1] [U_]
bitmap: 1/2 pages [4KB], 65536KB chunk
mdadm --detail /dev/md125
/dev/md125:
Version : 1.0
Creation Time : Sat Apr 29 20:30:27 2017
Raid Level : raid1
Array Size : 244318208 (233.00 GiB 250.18 GB)
Used Dev Size : 244318208 (233.00 GiB 250.18 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon May 20 18:40:03 2024
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : any:srv
UUID : d96112c1:4c249022:96b4488c:642e84a6
Events : 576774
Number Major Minor RaidDevice State
2 8 3 0 active sync /dev/sda3
- 0 0 1 removed
# LANG=C fdisk -l /dev/sda
Disk /dev/sda: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDC WD5000LPCX-2
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xc28b197d
Device     Boot    Start       End   Sectors  Size Id Type
/dev/sda3       46139392 891291647 845152256  403G fd Linux raid autodetect
# LANG=C fdisk -l /dev/sdb
Disk /dev/sdb: 298.09 GiB, 320072933376 bytes, 625142448 sectors
Disk model: WDC WD3200BUCT-6
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x975182e6
Device Boot Start End Sectors Size Id Type
/dev/sdb6 136321024 625142447 488821424 233.1G fd Linux raid autodetect
# mdadm --examine /dev/sda3
/dev/sda3:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : d96112c1:4c249022:96b4488c:642e84a6
Name : any:srv
Creation Time : Sat Apr 29 20:30:27 2017
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 845151976 sectors (403.00 GiB 432.72 GB)
Array Size : 244318208 KiB (233.00 GiB 250.18 GB)
Used Dev Size : 488636416 sectors (233.00 GiB 250.18 GB)
Super Offset : 845152240 sectors
Unused Space : before=0 sectors, after=356515808 sectors
State : clean
Device UUID : d75cd979:3401034d:41932cc1:4c98b232
Internal Bitmap : -16 sectors from superblock
Update Time : Mon May 20 18:40:03 2024
Bad Block Log : 512 entries available at offset -8 sectors
Checksum : f6864782 - correct
Events : 576774
Device Role : Active device 0
Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdb6
/dev/sdb6:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : d96112c1:4c249022:96b4488c:642e84a6
Name : any:srv
Creation Time : Sat Apr 29 20:30:27 2017
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 488821392 sectors (233.09 GiB 250.28 GB)
Array Size : 244318208 KiB (233.00 GiB 250.18 GB)
Used Dev Size : 488636416 sectors (233.00 GiB 250.18 GB)
Super Offset : 488821408 sectors
Unused Space : before=0 sectors, after=184976 sectors
State : clean
Device UUID : c077a12d:9ba6c3a0:9c5749ba:600e9aef
Internal Bitmap : -16 sectors from superblock
Update Time : Mon May 20 18:15:52 2024
Bad Block Log : 512 entries available at offset -8 sectors
Checksum : 899ae2ec - correct
Events : 576771
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
Is there anything that can be done to access the data on /dev/sda3?
--
Thanks in advance,
Richard
* Re: RAID-1 not accessible after disk replacement
2024-05-23 11:47 RAID-1 not accessible after disk replacement Richard
@ 2024-05-23 12:20 ` Yu Kuai
2024-05-23 12:47 ` Richard
2024-05-23 16:23 ` Phillip Susi
1 sibling, 1 reply; 8+ messages in thread
From: Yu Kuai @ 2024-05-23 12:20 UTC (permalink / raw)
To: Richard, linux-raid, yukuai (C)
Hi,
On 2024/05/23 19:47, Richard wrote:
> Hello,
>
> I was reference to this mailinglist via the https://raid.wiki.kernel.org/
> index.php/RAID_Recovery page, "Make sure to contact someone who's experienced
> in dealing with md RAID problems and failures - either someone you know in
> person, or, alternatively, the friendly and helpful folk linux-raid mailing
> list".
>
>
> As I've a problem with an RAID-1 array after replacing a failed disk, I hope
> that you can help me to make it at least readable again.
>
>
>
> Information about the RAID:
>
> /dev/sda3 is the disk partition that has the data on it.
> /dev/sdb6 is the new disk partition - no data on it.
>
>
> Information of the involved RAID before the disk replacement:
>
> /dev/md126:
> Version : 1.0
> Creation Time : Sat Apr 29 20:30:27 2017
> Raid Level : raid1
> Array Size : 247464768 (236.00 GiB 253.40 GB)
> Used Dev Size : 247464768 (236.00 GiB 253.40 GB)
>
> I grew (--grow) the RAID to an smaller size as it was complaining about the
> size (no logging of that).
This is insane: there is no way ext4 can be mounted again after that. And
what's worse, it looks like you did this while ext4 was still mounted.
> After the this action the RAID was functioning and fully accessible.
>
> After reboot the RAID is not accessible anymore.
>
> The actual data on the RAID is about 37GB.
>
> After reboot it became md125.
>
> [ 23.872267] md/raid1:md125: active with 1 out of 2 mirrors
> [ 23.903300] md125: detected capacity change from 0 to 488636416
> 2024-05-23T12:13:29.708106+02:00 cloud kernel: [ 632.338266][ T2913] EXT4-fs
> (md125): bad geometry: block count 6186598
> 4 exceeds size of device (61079552 blocks)
And the kernel log already told you the reason.
I suggest you grow the RAID back to its original size; however, there
is no guarantee that you won't lose data.
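For example, something along these lines (just a sketch, using the original
array size of 247464768 KiB from your old --detail output; please double-check
that number on your side before running anything):

# mdadm --grow /dev/md125 --size=247464768
# mdadm --detail /dev/md125
# fsck.ext4 -fn /dev/md125

The -n keeps fsck read-only, so you can first check whether the superblock and
the device size agree again before you try to mount.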
Thanks,
Kuai
>
>
> # mount /dev/md125 /srv
> mount: /srv: wrong fs type, bad option, bad superblock on /dev/md125, missing
> codepage or helper program, or other error
>
> # LANG=C fsck /dev/md125
> fsck from util-linux 2.37.4
> e2fsck 1.46.4 (18-Aug-2021)
> The filesystem size (according to the superblock) is 61865984 blocks
> The physical size of the device is 61079552 blocks
> Either the superblock or the partition table is likely to be corrupt!
>
> # cat /proc/mdstat
> md125 : active raid1 sda3[2]
> 244318208 blocks super 1.0 [2/1] [U_]
> bitmap: 1/2 pages [4KB], 65536KB chunk
> mdadm --detail /dev/md125
> /dev/md125:
> Version : 1.0
> Creation Time : Sat Apr 29 20:30:27 2017
> Raid Level : raid1
> Array Size : 244318208 (233.00 GiB 250.18 GB)
> Used Dev Size : 244318208 (233.00 GiB 250.18 GB)
> Raid Devices : 2
> Total Devices : 1
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Mon May 20 18:40:03 2024
> State : clean, degraded
> Active Devices : 1
> Working Devices : 1
> Failed Devices : 0
> Spare Devices : 0
>
> Consistency Policy : bitmap
>
> Name : any:srv
> UUID : d96112c1:4c249022:96b4488c:642e84a6
> Events : 576774
>
> Number Major Minor RaidDevice State
> 2 8 3 0 active sync /dev/sda3
> - 0 0 1 removed
>
> # LANG=C fdisk -l /dev/sda
> Disk /dev/sda: 465.76 GiB, 500107862016 bytes, 976773168 sectors
> Disk model: WDC WD5000LPCX-2
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0xc28b197d
> Apparaat Op. Begin Einde Sectoren Grootte ID Type
> /dev/sda3 46139392 891291647 845152256 403G fd Linux raidautodetectie
>
>
> # LANG=C fdisk -l /dev/sdb
> Disk /dev/sdb: 298.09 GiB, 320072933376 bytes, 625142448 sectors
> Disk model: WDC WD3200BUCT-6
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x975182e6
> Device Boot Start End Sectors Size Id Type
> /dev/sdb6 136321024 625142447 488821424 233.1G fd Linux raid autodetect
>
> # mdadm --examine /dev/sda3
> /dev/sda3:
> Magic : a92b4efc
> Version : 1.0
> Feature Map : 0x1
> Array UUID : d96112c1:4c249022:96b4488c:642e84a6
> Name : any:srv
> Creation Time : Sat Apr 29 20:30:27 2017
> Raid Level : raid1
> Raid Devices : 2
>
> Avail Dev Size : 845151976 sectors (403.00 GiB 432.72 GB)
> Array Size : 244318208 KiB (233.00 GiB 250.18 GB)
> Used Dev Size : 488636416 sectors (233.00 GiB 250.18 GB)
> Super Offset : 845152240 sectors
> Unused Space : before=0 sectors, after=356515808 sectors
> State : clean
> Device UUID : d75cd979:3401034d:41932cc1:4c98b232
>
> Internal Bitmap : -16 sectors from superblock
> Update Time : Mon May 20 18:40:03 2024
> Bad Block Log : 512 entries available at offset -8 sectors
> Checksum : f6864782 - correct
> Events : 576774
>
>
> Device Role : Active device 0
> Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
>
>
> # mdadm --examine /dev/sdb6
> /dev/sdb6:
> Magic : a92b4efc
> Version : 1.0
> Feature Map : 0x1
> Array UUID : d96112c1:4c249022:96b4488c:642e84a6
> Name : any:srv
> Creation Time : Sat Apr 29 20:30:27 2017
> Raid Level : raid1
> Raid Devices : 2
>
> Avail Dev Size : 488821392 sectors (233.09 GiB 250.28 GB)
> Array Size : 244318208 KiB (233.00 GiB 250.18 GB)
> Used Dev Size : 488636416 sectors (233.00 GiB 250.18 GB)
> Super Offset : 488821408 sectors
> Unused Space : before=0 sectors, after=184976 sectors
> State : clean
> Device UUID : c077a12d:9ba6c3a0:9c5749ba:600e9aef
>
> Internal Bitmap : -16 sectors from superblock
> Update Time : Mon May 20 18:15:52 2024
> Bad Block Log : 512 entries available at offset -8 sectors
> Checksum : 899ae2ec - correct
> Events : 576771
>
>
> Device Role : Active device 1
> Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
>
>
> Is there anything that can be done, to access the data on /dev/sda3?
>
>
>
> --
> Thanks in advance,
>
>
> Richard
>
>
>
>
> .
>
* Re: RAID-1 not accessible after disk replacement
2024-05-23 12:20 ` Yu Kuai
@ 2024-05-23 12:47 ` Richard
0 siblings, 0 replies; 8+ messages in thread
From: Richard @ 2024-05-23 12:47 UTC (permalink / raw)
To: linux-raid
Hi,
On Thursday 23 May 2024 14:20:35 CEST, Yu Kuai wrote:
> Hi,
>
> On 2024/05/23 19:47, Richard wrote:
> > Hello,
> >
...
> > Information of the involved RAID before the disk replacement:
> >
> > /dev/md126:
> > Version : 1.0
> >
> > Creation Time : Sat Apr 29 20:30:27 2017
> >
> > Raid Level : raid1
> > Array Size : 247464768 (236.00 GiB 253.40 GB)
> >
> > Used Dev Size : 247464768 (236.00 GiB 253.40 GB)
> >
> > I grew (--grow) the RAID to an smaller size as it was complaining about
> > the
> > size (no logging of that).
>
> This is insane, there is no way ext4 can mount again. And what's
> worse, looks like you're doing this with ext4 still mounted.
Thanks for the swift answer. I will try this later.
It's (also) good to know that it may not be possible to fix (as that saves time).
If (a big if) there are still possibilities, I'll try them.
--
Richard
* Re: RAID-1 not accessible after disk replacement
2024-05-23 11:47 RAID-1 not accessible after disk replacement Richard
2024-05-23 12:20 ` Yu Kuai
@ 2024-05-23 16:23 ` Phillip Susi
2024-05-24 15:23 ` Richard
1 sibling, 1 reply; 8+ messages in thread
From: Phillip Susi @ 2024-05-23 16:23 UTC (permalink / raw)
To: Richard, linux-raid
Richard <richard@radoeka.nl> writes:
> I grew (--grow) the RAID to an smaller size as it was complaining about the
> size (no logging of that).
> After the this action the RAID was functioning and fully accessible.
I think you mean you used -z to reduce the size of the array. It
appears that you tried to replace the failed drive with one that is
half the size and then shrank the array, which truncated your filesystem,
which is why you can no longer access it. You can't shrink the disk out
from under the filesystem.
Grow the array back to the full size of the larger disk and most likely
you will be able to mount the filesystem again. If you want to replace the
failed disk, you will need a replacement that is at least as large as the
original; alternatively, if you can shrink the filesystem to fit on the new
disk, you have to do that FIRST and only then shrink the raid array.
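To make the order concrete, a rough sketch only (the device name is taken from
your mail, and the sizes are placeholders rather than values to copy):

# umount /srv
# resize2fs /dev/md125 230G
# mdadm --grow /dev/md125 --size=241172480

Note that mdadm's --size here is in KiB and must never end up smaller than the
filesystem you just shrank; growing back is simply the same steps with a larger
size, or --size=max followed by resize2fs with no size argument.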
* Re: RAID-1 not accessible after disk replacement
2024-05-23 16:23 ` Phillip Susi
@ 2024-05-24 15:23 ` Richard
2024-05-24 15:55 ` Richard
0 siblings, 1 reply; 8+ messages in thread
From: Richard @ 2024-05-24 15:23 UTC (permalink / raw)
To: linux-raid
Philip, Kuai,
On Thursday 23 May 2024 18:23:31 CEST, Phillip Susi wrote:
> Richard <richard@radoeka.nl> writes:
> > I grew (--grow) the RAID to an smaller size as it was complaining about
> > the
> > size (no logging of that).
> > After the this action the RAID was functioning and fully accessible.
>
> I think you mean you used -z to reduce the size of the array. It
> appears that you are trying to replace the failed drive with one that is
> half the size, then shrunk the array, which truncated your filesystem,
> which is why you can no longer access it. You can't shrink the disk out
> from under the filesystem.
>
> Grow the array back to the full size of the larger disk and most likely
> you should be able to mount the filesystem again. You will need to get
> a replacement disk that is the same size as the original that failed if
> you want to replace it, or if you can shrink the filesystem to fit on
> the new disk, you have to do that FIRST, then shrink the raid array.
I followed your advice, and made the array size the same as it used to be.
I'm now able to see the data on the partition (RAID) again.
Very nice.
Thanks a lot for your support.
--
Richard
* Re: RAID-1 not accessible after disk replacement
2024-05-24 15:23 ` Richard
@ 2024-05-24 15:55 ` Richard
2024-05-25 19:27 ` Roger Heflin
0 siblings, 1 reply; 8+ messages in thread
From: Richard @ 2024-05-24 15:55 UTC (permalink / raw)
To: linux-raid
On Friday 24 May 2024 17:23:49 CEST, Richard wrote:
> Philip, Kuai,
>
> On Thursday 23 May 2024 18:23:31 CEST, Phillip Susi wrote:
> > Richard <richard@radoeka.nl> writes:
> > > I grew (--grow) the RAID to an smaller size as it was complaining about
> > > the
> > > size (no logging of that).
> > > After the this action the RAID was functioning and fully accessible.
> >
> > I think you mean you used -z to reduce the size of the array. It
> > appears that you are trying to replace the failed drive with one that is
> > half the size, then shrunk the array, which truncated your filesystem,
> > which is why you can no longer access it. You can't shrink the disk out
> > from under the filesystem.
> >
> > Grow the array back to the full size of the larger disk and most likely
> > you should be able to mount the filesystem again. You will need to get
> > a replacement disk that is the same size as the original that failed if
> > you want to replace it, or if you can shrink the filesystem to fit on
> > the new disk, you have to do that FIRST, then shrink the raid array.
>
> I followed your advice, and made the array size the same as it used to be.
> I'm now able to see the data on the partition (RAID) again.
> Very nice.
>
> Thanks a lot for your support.
I'm getting a bigger drive. That means that I'm going to get the following
setup:
/dev/sda6 403GB (the one that is now the active partition)
I'll make /dev/sdb6 the same size, also 403 GB.
The array size is now set at 236 GB (with sda6 having a size of 403GB).
Once both 403GB partitions are part of the array, would it then be possible to
grow the array from 236GB to 400GB? Or will that result in problems as well?
--
Richard
* Re: RAID-1 not accessible after disk replacement
2024-05-24 15:55 ` Richard
@ 2024-05-25 19:27 ` Roger Heflin
2024-05-28 19:04 ` Phillip Susi
0 siblings, 1 reply; 8+ messages in thread
From: Roger Heflin @ 2024-05-25 19:27 UTC (permalink / raw)
To: Richard; +Cc: linux-raid
Going bigger works once you also resize the fs (that requires a separate
command specific to your fs).
Going smaller typically requires the FS to be unmounted, possibly fscked,
and resized smaller (assuming the FS even supports shrinking; xfs does not)
before the array is made smaller. Resizing the array or LV smaller before,
or without, resizing the fs only ends well once the shrink is undone (as
you did). When going smaller I also tend to make the fs a decent amount
smaller than I actually need, then shrink the array, and then resize the
fs back up with no size argument (so it grows to the current, larger
device size).
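For the grow direction that would look roughly like this (assuming the array
keeps the name /dev/md125 and the filesystem is ext4, as in your earlier
output; adjust to your own names):

# mdadm --grow /dev/md125 --size=max
# resize2fs /dev/md125

mdadm first extends the array to the largest size both members can hold, and
resize2fs with no size argument then grows ext4 to fill it; ext4 can normally
be grown like this while it is still mounted.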
On Fri, May 24, 2024 at 10:56 AM Richard <richard@radoeka.nl> wrote:
>
> On Friday 24 May 2024 17:23:49 CEST, Richard wrote:
> > Philip, Kuai,
> >
> > On Thursday 23 May 2024 18:23:31 CEST, Phillip Susi wrote:
> > > Richard <richard@radoeka.nl> writes:
> > > > I grew (--grow) the RAID to an smaller size as it was complaining about
> > > > the
> > > > size (no logging of that).
> > > > After the this action the RAID was functioning and fully accessible.
> > >
> > > I think you mean you used -z to reduce the size of the array. It
> > > appears that you are trying to replace the failed drive with one that is
> > > half the size, then shrunk the array, which truncated your filesystem,
> > > which is why you can no longer access it. You can't shrink the disk out
> > > from under the filesystem.
> > >
> > > Grow the array back to the full size of the larger disk and most likely
> > > you should be able to mount the filesystem again. You will need to get
> > > a replacement disk that is the same size as the original that failed if
> > > you want to replace it, or if you can shrink the filesystem to fit on
> > > the new disk, you have to do that FIRST, then shrink the raid array.
> >
> > I followed your advice, and made the array size the same as it used to be.
> > I'm now able to see the data on the partition (RAID) again.
> > Very nice.
> >
> > Thanks a lot for your support.
>
> I'm getting a bigger drive. That means that I'm going to get the following
> setup:
>
> /dev/sda6 403GB (the one that is now the active partition)
>
> I'll make /dev/sdb6 the same size, also 403 GB.
>
> The array size is now set at 236 GB (with sda6 having a size of 403GB).
>
> Once both 403GB partitions are part of the array, would it then be possible to
> grow the array from 236GB to 400GB? Or will that result in problems as well?
>
> --
> Richard
>
>
>
>
* Re: RAID-1 not accessible after disk replacement
2024-05-25 19:27 ` Roger Heflin
@ 2024-05-28 19:04 ` Phillip Susi
0 siblings, 0 replies; 8+ messages in thread
From: Phillip Susi @ 2024-05-28 19:04 UTC (permalink / raw)
To: Roger Heflin, Richard; +Cc: linux-raid
Roger Heflin <rogerheflin@gmail.com> writes:
> Going bigger works once you resize the fs (that requires a separate
> command specific to your fs).
>
> Going smaller typically requires the FS to be umounted, maybe fscked,
> and resized smaller (assuming the FS even supports that, xfs does not)
> before the array is made smaller. Resizing the array or LV smaller
> before and/or without the fs being resized only ends when the resize
> smaller is undone (like you did). When going smaller I also tend to
> make the fs a decent amount smaller than I need to, then make the
> array smaller and then resize the fs up using no options (so it uses
> the current larger device size).
Just going to mention that btrfs can be shrunk while it is mounted.
It's a pretty neat thing to see gparted shrink the partition of a
mounted btrfs volume.
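For instance (just an illustration, with /srv standing in for wherever the
btrfs volume is mounted):

# btrfs filesystem resize -50G /srv

shrinks the mounted filesystem by 50 GiB; only after that is it safe to shrink
the partition or array underneath it.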