* Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-05 9:14 UTC
To: Linux-RAID
List, good morning,
We use a 2 x 3TB RAID1 configuration in a Debian Oldstable (Wheezy)
machine employed for backup, and one of the RAID pair has dropped off
the array. An lsdrv report [1] is pasted below; sdb has failed.
Oddly, the offending disc now seems to have no partition table
either. The backup space occupies an LVM volume on a 2.5TB RAID1
array; the discs also carry other RAID1 arrays. I haven't altered any
of the information on the surviving element of that RAID1/LVM, which
still seems to function for reads and has its data complete.
I've re-seated cables and power connectors and read-tested the failed
disc using
dd if=/dev/sdb of=/dev/null bs=1M
which, after running overnight, reported the whole (suspect) disc read
without any errors.
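(The kernel log is also worth checking after a read test like that,
in case errors were retried and survived silently; roughly:)
dmesg | grep -iE 'ata|sdb' | tail -n 50   # any link resets or I/O errors logged during the read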
I'd like to try to restore the array using the previously-failed disk,
temporarily, while another drive arrives from suppliers. I'm not sure
of the precise mechanism for restoring the array. I think the same
process will be needed when the replacement disk - which will be
blank, too - arrives.
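(Before touching anything, the current state of the three degraded
arrays can be captured for reference with something like this - array
and partition names taken from the lsdrv listing below:)
cat /proc/mdstat                               # overview of all md arrays
mdadm --detail /dev/md2 /dev/md4 /dev/md5      # per-array state and members
mdadm --examine /dev/sdc2 /dev/sdc4 /dev/sdc5  # superblocks on the surviving disc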
Presumably, I need to set up the partition table again. These are
identical discs from the same manufacturer. Elsewhere, people have
suggested using:
sfdisk -d /dev/sdc | sfdisk /dev/sdb
Will this be OK for mdadm, or will this command also replicate
UUIDs, headers, or partition content that mdadm prefers to be kept
unique?
Having prepared the partition table on sdb, is the next step a simple
mdadm --manage /dev/md(x) --add /dev/sdb(y)
sequence of commands? Do I need to disable any attempt by mdadm to
rebuild itself?
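(As far as I know, md starts resynchronising automatically once a
device is added; it can be throttled, rather than disabled, via the
raid speed sysctls - a sketch, the value below is illustrative only:)
sysctl dev.raid.speed_limit_min            # current floor, in KiB/s
sysctl -w dev.raid.speed_limit_max=10000   # cap the rebuild at roughly 10 MB/s
cat /proc/mdstat                           # resync progress appears here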
I'd be grateful for any pointers to anything incorrect in what I'm
proposing. Though it is 'only' backup material on the disks, it is
quite important to us: the backups were incremental, so they are also
the repository of anything accidentally deleted since.
regards, Ron
[1] lsdrv output
root@D7bak:/home/user# ./lsdrv
PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation NM10/ICH7
Family SATA Controller [IDE mode] (rev 01)
-scsi 0:0:1:0 ATA WDC WD2500AAKX-2 {WD-WCC2ED752256}
-:sda 232.88g [8:0] Partitioned (dos)
- -sda1 17.27g [8:1] ext2 {ac6943f4-24dd-4605-ab97-9633bdf4aa0f}
- -sda2 1.00k [8:2] Partitioned (dos)
- -sda5 1.86g [8:5] ext4 {d0f0831b-b67b-43c7-869a-74ccd2b82f0a}
- -:Mounted as /dev/disk/by-uuid/d0f0831b-b67b-43c7-869a-74ccd2b82f0a @ /
- -sda6 13.97g [8:6] ext4 {777a0299-5756-4bb4-ae86-4b968a25ed20}
- -:Mounted as /dev/sda6 @ /usr
- -sda7 23.28g [8:7] ext4 {519b7c6e-cc14-4b4e-b8cb-8d9573da48fb}
- -:Mounted as /dev/sda7 @ /var
- -sda8 2.05g [8:8] swap {c1527154-1cfe-42c9-bc00-780fda199508}
- -sda9 2.79g [8:9] ext4 {ca294bd6-4c95-4008-8531-1f467d2a3d7b}
- -:Mounted as /dev/sda9 @ /tmp
- :sda10 79.16g [8:10] ext4 {1a4793f3-cb46-4256-88f6-f065b4c49d88}
- :Mounted as /dev/sda10 @ /home
-scsi 1:0:0:0 ATA WDC WD30EZRX-00D {WD-WMC1T4003426}
-:sdb 2.73t [8:16] Empty/Unknown
:scsi 1:0:1:0 ATA WDC WD30EZRX-00D {WD-WMC1T3894559}
:sdc 2.73t [8:32] Partitioned (gpt)
-sdc1 1.00m [8:33] Empty/Unknown
-sdc2 100.00m [8:34] MD raid1 (1/2) in_sync
{cc72a049-f9e6-7085-4a8f-b478c8ee9588}
-:md2 99.94m [9:2] MD v0.90 raid1 (2) read-auto DEGRADED
{cc72a049:f9e67085:4a8fb478:c8ee9588}
- ext2 'boot' {e211fdf8-4179-4d52-b994-61e958789b6c}
-sdc3 2.00g [8:35] Empty/Unknown
-sdc4 150.00g [8:36] MD raid1 (1/2) in_sync 'D7bak:4'
{741fb6b3-491b-f823-ba5d-bb377101ce96}
-:md4 149.87g [9:4] MD v1.2 raid1 (2) read-auto DEGRADED
{741fb6b3:491bf823:ba5dbb37:7101ce96}
- ext4 'OS' {618eae7a-2a6b-44fd-90d4-74595d3f24bd}
:sdc5 2.58t [8:37] MD raid1 (1/2) in_sync 'D7bak:5'
{0a1bd77e-6f0d-4fba-3260-932c021ed347}
:md5 2.58t [9:5] MD v1.2 raid1 (2) clean DEGRADED
{0a1bd77e:6f0d4fba:3260932c:021ed347}
- PV LVM2_member 2.58t used, 0 free
{5b0KRp-rFJ3-WiBR-JW3i-U01v-SITm-fNcVbr}
:VG bkp100vg 2.58t 0 free {zWgmjF-zYiv-X9fp-0XCU-RFIf-kT8E-7bropr}
:dm-0 2.58t [253:0] LV bkp100lv ext4
{709a00ef-9306-4617-b464-4f30a4790f60}
:Mounted as /dev/mapper/bkp100vg-bkp100lv @ /mnt/bkp
Other Block Devices
-loop0 0.00k [7:0] Empty/Unknown
-loop1 0.00k [7:1] Empty/Unknown
-loop2 0.00k [7:2] Empty/Unknown
-loop3 0.00k [7:3] Empty/Unknown
-loop4 0.00k [7:4] Empty/Unknown
-loop5 0.00k [7:5] Empty/Unknown
-loop6 0.00k [7:6] Empty/Unknown
-loop7 0.00k [7:7] Empty/Unknown
root@D7bak:/home/user#
* Re: Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-05 11:04 UTC
To: Linux RAID Mailing List
On 05/04/2016 10:14, Ron Leach wrote:
>
> Presumably, I need to set up the partition table again. These are
> identical discs from the same manufacturer. Elsewhere, people have
> suggested using:
>
> sfdisk -d /dev/sdc | sfdisk /dev/sdb
>
(Apologies for replying to my own post.)
Everybody will have noticed how ill-prepared I am:
man sfdisk
explains, right at the start, that sfdisk will NOT work with GPT
partitions.
I'll look at parted and gdisk.
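With gdisk's companion tool sgdisk (it ships in Debian's gdisk
package, I believe), copying the GPT from the good disc and then
re-randomising the GUIDs should look roughly like this - untested
here, and note the argument order (destination first):
sgdisk -R /dev/sdb /dev/sdc   # replicate sdc's partition table onto sdb
sgdisk -G /dev/sdb            # give sdb fresh disk and partition GUIDs
sgdisk -p /dev/sdb            # print the result to check it
Copying the table copies no partition contents, so it shouldn't
duplicate any md superblocks; only the disk and partition GUIDs are
cloned, and -G then refreshes those.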
Ron
* Re: Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-05 12:22 UTC
Cc: Linux RAID Mailing List
On 05/04/2016 12:04, Ron Leach wrote:
>
> I'll look at parted and gdisk.
>
gdisk reports /dev/sdb has a damaged main GPT partition table, and a
reasonable backup GPT partition table - but with an invalid header.
(/dev/sda and /dev/sdc are fine.) So the immediate problem here is
fixing the GPT partition table.
Strictly, that isn't a RAID1 problem, and I don't think it's
appropriate to ask for comment on that issue on the raid list; people
here are truly helpful, so I'll fix the partition table myself. I
could, though, use a comment on whether I should then just use the
mdadm --manage .. --add ..
commands and whether mdadm needs to be inhibited from taking any
automatic remedial action.
regards, Ron
* Re: Repairing R1: Part tabl, & precise command
From: Phil Turmel @ 2016-04-05 15:28 UTC
To: Ron Leach; +Cc: Linux RAID Mailing List
On 04/05/2016 08:22 AM, Ron Leach wrote:
>
> mdadm --manage .. --add ..
>
> commands and whether mdadm needs to be inhibited from taking any
> automatic remedial action.
If your array has write-intent bitmaps, use --re-add instead of --add.
It'll be quick. Otherwise just --add and let it rebuild.
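A rough sketch of that, with array and partition names taken from the
earlier lsdrv listing (adjust to suit):
cat /proc/mdstat                             # a bitmap shows up as a "bitmap:" line
mdadm --detail /dev/md5 | grep -i bitmap     # "Intent Bitmap : Internal" if present
mdadm --manage /dev/md5 --re-add /dev/sdb5   # with a bitmap: only changed blocks copied
mdadm --manage /dev/md5 --add /dev/sdb5      # otherwise: full rebuild
If --re-add is refused - likely here, since sdb's metadata went with
its partition table - fall back to --add.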
Phil
* Re: Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-05 16:34 UTC
To: Linux RAID Mailing List
On 05/04/2016 16:28, Phil Turmel wrote:
>
> If your array has write-intent bitmaps, use --re-add instead of --add.
> It'll be quick. Otherwise just --add and let it rebuild.
>
Phil, thanks for the advice.
I hit an unexpected problem fixing the partition table on /dev/sdb,
the disk that dropped from the Raid1 array. The problem is caused by
/dev/sdb being *smaller* than /dev/sdc (the working array member) -
despite the disks being identical products from WD. gdisk complains
that partition 5 (/dev/sdb5), which is to be the Raid1 partner for the
LVM containing all our backed up files, is too big (together with the
other partitions) for the /dev/sdb disk.
Presumably, raid1 doesn't work if an 'add'ed disk partition is smaller
than the existing, running, degraded array? Am I right in thinking
that the LVM won't be able to be carried securely on the underlying md
system? lsdrv is reporting that /dev/md127 has 0 free, so it seems
that the LVM is occupying the complete space of /dev/md127, and it
must be using the complete space of the underlying /dev/sdc5 because
only sdc is active, at the moment (the Raid1 being still degraded).
To protect the LVM, what would be a good thing to do? Should I define
a slightly shorter 'partner' partition on the failed disk (/dev/sdb) -
I would think not, but I would welcome advice.
I did think about reducing the size of one of the other partitions on
/dev/sdb - there's a swap partition of 2G which could become 1.5G,
because there's another 2G on the working disk anyway. Doing that,
the partner partitions for the real data could be the same size,
though not in exactly the same place on both disks. I think this
might work?
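(The size difference itself can be confirmed outside gdisk with
something like the following; exact byte counts will differ:)
blockdev --getsize64 /dev/sdb                 # whole-disc size in bytes, suspect disc
blockdev --getsize64 /dev/sdc                 # same for the working disc
lsblk -b -o NAME,SIZE,TYPE /dev/sdb /dev/sdc  # discs and partitions, in bytes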
regards, Ron
* Re: Repairing R1: Part tabl, & precise command
From: Adam Goryachev @ 2016-04-05 23:47 UTC
To: Ron Leach, Linux RAID Mailing List
On 06/04/16 02:34, Ron Leach wrote:
> On 05/04/2016 16:28, Phil Turmel wrote:
>
>>
>> If your array has write-intent bitmaps, use --re-add instead of --add.
>> It'll be quick. Otherwise just --add and let it rebuild.
>>
>
> Phil, thanks for the advice.
>
> I hit an unexpected problem fixing the partition table on /dev/sdb,
> the disk that dropped from the Raid1 array. The problem is caused by
> /dev/sdb being *smaller* than /dev/sdc (the working array member) -
> despite the disks being identical products from WD. gdisk complains
> that partition 5 (/dev/sdb5), which is to be the Raid1 partner for the
> LVM containing all our backed up files, is too big (together with the
> other partitions) for the /dev/sdb disk.
>
> Presumably, raid1 doesn't work if an 'add'ed disk partition is smaller
> than the existing, running, degraded array? Am I right in thinking
> that the LVM won't be able to be carried securely on the underlying md
> system? lsdrv is reporting that /dev/md127 has 0 free, so it seems
> that the LVM is occupying the complete space of /dev/md127, and it
> must be using the complete space of the underlying /dev/sdc5 because
> only sdc is active, at the moment (the Raid1 being still degraded).
>
> To protect the LVM, what would be a good thing to do? Should I define
> a slightly shorter 'partner' partition on the failed disk (/dev/sdb) -
> I would think not, but I would welcome advice.
>
> I did think about reducing the size of one of the other partitions on
> /dev/sdb - there's a swap partition of 2G which could become 1.5G,
> because there's another 2G on the working disk anyway. Doing that, the
> partner partitions for the real data could be the same size, though
> not in exactly the same place on both disks. I think this might work?
>
> regards, Ron
Hi Ron,
That is one option (reduce the swap partition size). You might also look
at the mdadm information for the array; generally it is possible to
create a raid1 array across two devices of different sizes, and
mdadm will automatically ignore the "excess" space of the larger drive.
eg:
sda1 1000M
sdb1 1050M
The disks and partition tables will show both disks 100% full, because
the partition fills the disk
mdadm will ignore the extra 50M on sdb1 and create a raid1 array of 1000M
LVM (or whatever you put onto the raid1) will show 1000M as the total
size, and will know nothing about the extra 50M
I think mdadm is silent about size differences if the difference is less
than 10% (or some other percentage value).
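If you want to convince yourself before touching the real discs, a
throw-away test with loop devices shows the behaviour (sizes, paths
and the md device number are purely illustrative):
truncate -s 1000M /tmp/small.img
truncate -s 1050M /tmp/large.img
SMALL=$(losetup -f --show /tmp/small.img)
LARGE=$(losetup -f --show /tmp/large.img)
# mdadm may warn about the size mismatch and ask for confirmation,
# but builds an array sized to the smaller member
mdadm --create /dev/md100 --level=1 --raid-devices=2 "$SMALL" "$LARGE"
mdadm --detail /dev/md100 | grep -E 'Array Size|Used Dev Size'
mdadm --stop /dev/md100       # tidy up afterwards
losetup -d "$SMALL"
losetup -d "$LARGE"
rm /tmp/small.img /tmp/large.img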
Another concern I have is that the drive has a number of damaged
sectors, has used up all the "spare" sectors that it has for
re-allocation, and is now reporting a smaller size because it knows that
a number of sectors are bad. I don't think drives do this, but it is a
failed drive, and manufacturers might do some strange things.
Can you provide the full output of smartctl? It should show more
details on the status of the drive and what damage it might have, etc.
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au
* Re: Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-06 10:14 UTC
To: Linux RAID Mailing List
On 06/04/2016 00:47, Adam Goryachev wrote:
> That is one option (reduce the swap partition size). You might also
> look at the mdadm information for the array; generally it is possible
> to create a raid1 array across two devices of different sizes, and
> mdadm will automatically ignore the "excess" space of the larger
> drive.
>
> eg:
> sda1 1000M
> sdb1 1050M
>
> The disks and partition tables will show both disks 100% full, because
> the partition fills the disk
> mdadm will ignore the extra 50M on sdb1 and create a raid1 array of 1000M
> LVM (or whatever you put onto the raid1) will show 1000M as the total
> size, and will know nothing about the extra 50M
>
> I think mdadm is silent about size differences if the difference is
> less than 10% (or some other percentage value).
Adam, thank you for confirming that reducing swap might work. I'd
prefer that because I worry that if the md beneath the LVM reduces in
size, then I could lose data that is in that LVM at present.
I checked fstab and realised that the swap partitions on /dev/sdb are
not used, so I have gone ahead and reduced the swap partition, and
rebuilt a good GPT table. That went OK. mdadm is now 'add'ing the
partner partitions to the 3 md devices in the system.
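(For the record, the rebuild progress can be watched with something
along these lines:)
watch -n 30 cat /proc/mdstat                              # recovery percentage and ETA
mdadm --detail /dev/md5 | grep -E 'State|Rebuild Status'  # per-array view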
>
> [snip]
>
> Can you provide the full output of smartctl? It should show more
> details on the status of the drive and what damage it might have, etc.
>
I thought this was good advice but apt-get isn't finding smartctl. I
think your surmise is likely, though, that this disk has 'reduced' in
size because of faults, and I've a new 3TB drive on its way. I'll
replace this suspect disk as soon as it arrives.
Grateful for the help
regards, Ron
* Re: Repairing R1: Part tabl, & precise command
From: Étienne Buira @ 2016-04-06 10:31 UTC
To: Ron Leach; +Cc: Linux RAID Mailing List
On Wed, Apr 06, 2016 at 11:14:02AM +0100, Ron Leach wrote:
> On 06/04/2016 00:47, Adam Goryachev wrote:
../..
> I thought this was good advice but apt-get isn't finding smartctl. I
> think your surmise is likely, though, that this disk has 'reduced' in
> size because of faults, and I've a new 3TB drive on its way. I'll
> replace this suspect disk as soon as it arrives.
Hi,
The package is named smartmontools on Debian, and IMHO you want to
check HDD health status regularly (smartctl -a /dev/sdX).
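Something like this, with a few attributes particularly worth
watching on a drive suspected of shrinking (attribute names as
smartctl reports them):
apt-get install smartmontools
smartctl -a /dev/sdb          # identity, attributes, error and self-test logs
# Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable
# (attributes 5, 197, 198) are the usual signs of surface trouble:
smartctl -A /dev/sdb | grep -E 'Reallocated|Pending|Uncorrect'
smartctl -t long /dev/sdb     # start an extended self-test
smartctl -l selftest /dev/sdb # read the result once it finishes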
Regards