* Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-05 9:14 UTC
To: Linux-RAID
List, good morning,
We use a 2 x 3TB RAID1 configuration in a Debian Oldstable (Wheezy)
machine employed for backup, and one of the RAID pair has dropped off
the array. An lsdrv report [1] is pasted below; sdb has failed.
Oddly, the offending disc now seems to have no partition table
either. The backup space occupies an LVM volume on a 2.5TB RAID1
array; the discs also carry other RAID1 arrays. I haven't altered any
of the information on the surviving element of that RAID1/LVM, which
still seems to function for reads and has its data complete.
I've re-seated cables and power connectors and read-tested the failed
disc using
dd if=/dev/sdb of=/dev/null bs=1M
which, after running overnight, reported the whole (suspect) disc read
without any errors.
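(The kernel log is also worth checking after a read test like that,
in case errors were retried and survived silently; roughly:)
dmesg | grep -iE 'ata|sdb' | tail -n 50   # any link resets or I/O errors logged during the read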
I'd like to try to restore the array using the previously-failed disk,
temporarily, while another drive arrives from suppliers. I'm not sure
of the precise mechanism for restoring the array. I think the same
process will be needed when the replacement disk - which will be
blank, too - arrives.
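(Before touching anything, the current state of the three degraded
arrays can be captured for reference with something like this - array
and partition names taken from the lsdrv listing below:)
cat /proc/mdstat                               # overview of all md arrays
mdadm --detail /dev/md2 /dev/md4 /dev/md5      # per-array state and members
mdadm --examine /dev/sdc2 /dev/sdc4 /dev/sdc5  # superblocks on the surviving disc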
Presumably, I need to set up the partition table again. These are
identical discs from the same manufacturer. Elsewhere, people have
suggested using:
sfdisk -d /dev/sdc | sfdisk /dev/sdb
Will this be OK for mdadm, or will this command also replicate
UUIDs, headers, or partition content that mdadm prefers to be kept
unique?
Having prepared the partition table on sdb, is the next step a simple
mdadm --manage /dev/md(x) --add /dev/sdb(y)
sequence of commands? Do I need to disable any attempt by mdadm to
rebuild itself?
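(As far as I know, md starts resynchronising automatically once a
device is added; it can be throttled, rather than disabled, via the
raid speed sysctls - a sketch, the value below is illustrative only:)
sysctl dev.raid.speed_limit_min            # current floor, in KiB/s
sysctl -w dev.raid.speed_limit_max=10000   # cap the rebuild at roughly 10 MB/s
cat /proc/mdstat                           # resync progress appears here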
I'd be grateful for any pointers to anything incorrect in what I'm
proposing. Though it is 'only' backup material on the disks, it is
quite important to us: the backups were incremental, so they are also
the repository of anything accidentally deleted since.
regards, Ron
[1] lsdrv output
root@D7bak:/home/user# ./lsdrv
PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation NM10/ICH7
Family SATA Controller [IDE mode] (rev 01)
-scsi 0:0:1:0 ATA WDC WD2500AAKX-2 {WD-WCC2ED752256}
-:sda 232.88g [8:0] Partitioned (dos)
- -sda1 17.27g [8:1] ext2 {ac6943f4-24dd-4605-ab97-9633bdf4aa0f}
- -sda2 1.00k [8:2] Partitioned (dos)
- -sda5 1.86g [8:5] ext4 {d0f0831b-b67b-43c7-869a-74ccd2b82f0a}
- -:Mounted as /dev/disk/by-uuid/d0f0831b-b67b-43c7-869a-74ccd2b82f0a @ /
- -sda6 13.97g [8:6] ext4 {777a0299-5756-4bb4-ae86-4b968a25ed20}
- -:Mounted as /dev/sda6 @ /usr
- -sda7 23.28g [8:7] ext4 {519b7c6e-cc14-4b4e-b8cb-8d9573da48fb}
- -:Mounted as /dev/sda7 @ /var
- -sda8 2.05g [8:8] swap {c1527154-1cfe-42c9-bc00-780fda199508}
- -sda9 2.79g [8:9] ext4 {ca294bd6-4c95-4008-8531-1f467d2a3d7b}
- -:Mounted as /dev/sda9 @ /tmp
- :sda10 79.16g [8:10] ext4 {1a4793f3-cb46-4256-88f6-f065b4c49d88}
- :Mounted as /dev/sda10 @ /home
-scsi 1:0:0:0 ATA WDC WD30EZRX-00D {WD-WMC1T4003426}
-:sdb 2.73t [8:16] Empty/Unknown
:scsi 1:0:1:0 ATA WDC WD30EZRX-00D {WD-WMC1T3894559}
:sdc 2.73t [8:32] Partitioned (gpt)
-sdc1 1.00m [8:33] Empty/Unknown
-sdc2 100.00m [8:34] MD raid1 (1/2) in_sync
{cc72a049-f9e6-7085-4a8f-b478c8ee9588}
-:md2 99.94m [9:2] MD v0.90 raid1 (2) read-auto DEGRADED
{cc72a049:f9e67085:4a8fb478:c8ee9588}
- ext2 'boot' {e211fdf8-4179-4d52-b994-61e958789b6c}
-sdc3 2.00g [8:35] Empty/Unknown
-sdc4 150.00g [8:36] MD raid1 (1/2) in_sync 'D7bak:4'
{741fb6b3-491b-f823-ba5d-bb377101ce96}
-:md4 149.87g [9:4] MD v1.2 raid1 (2) read-auto DEGRADED
{741fb6b3:491bf823:ba5dbb37:7101ce96}
- ext4 'OS' {618eae7a-2a6b-44fd-90d4-74595d3f24bd}
:sdc5 2.58t [8:37] MD raid1 (1/2) in_sync 'D7bak:5'
{0a1bd77e-6f0d-4fba-3260-932c021ed347}
:md5 2.58t [9:5] MD v1.2 raid1 (2) clean DEGRADED
{0a1bd77e:6f0d4fba:3260932c:021ed347}
- PV LVM2_member 2.58t used, 0 free
{5b0KRp-rFJ3-WiBR-JW3i-U01v-SITm-fNcVbr}
:VG bkp100vg 2.58t 0 free {zWgmjF-zYiv-X9fp-0XCU-RFIf-kT8E-7bropr}
:dm-0 2.58t [253:0] LV bkp100lv ext4
{709a00ef-9306-4617-b464-4f30a4790f60}
:Mounted as /dev/mapper/bkp100vg-bkp100lv @ /mnt/bkp
Other Block Devices
-loop0 0.00k [7:0] Empty/Unknown
-loop1 0.00k [7:1] Empty/Unknown
-loop2 0.00k [7:2] Empty/Unknown
-loop3 0.00k [7:3] Empty/Unknown
-loop4 0.00k [7:4] Empty/Unknown
-loop5 0.00k [7:5] Empty/Unknown
-loop6 0.00k [7:6] Empty/Unknown
-loop7 0.00k [7:7] Empty/Unknown
root@D7bak:/home/user#
* Re: Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-05 11:04 UTC
To: Linux RAID Mailing List
On 05/04/2016 10:14, Ron Leach wrote:
>
> Presumably, I need to set up the partition table again. These are
> identical discs from the same manufacturer. Elsewhere, people have
> suggested using:
>
> sfdisk -d /dev/sdc | sfdisk /dev/sdb
>
(Apologies for replying to my own post.)
Everybody will have noticed how ill-prepared I am:
man sfdisk
explains, right at the start, that sfdisk will NOT work with GPT
partitions.
I'll look at parted and gdisk.
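With gdisk's companion tool sgdisk (it ships in Debian's gdisk
package, I believe), copying the GPT from the good disc and then
re-randomising the GUIDs should look roughly like this - untested
here, and note the argument order (destination first):
sgdisk -R /dev/sdb /dev/sdc   # replicate sdc's partition table onto sdb
sgdisk -G /dev/sdb            # give sdb fresh disk and partition GUIDs
sgdisk -p /dev/sdb            # print the result to check it
Copying the table copies no partition contents, so it shouldn't
duplicate any md superblocks; only the disk and partition GUIDs are
cloned, and -G then refreshes those.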
Ron
* Re: Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-05 12:22 UTC
Cc: Linux RAID Mailing List
On 05/04/2016 12:04, Ron Leach wrote:
>
> I'll look at parted and gdisk.
>
gdisk reports /dev/sdb has a damaged main GPT partition table, and a
reasonable backup GPT partition table - but with an invalid header.
(/dev/sda and /dev/sdc are fine.) So the immediate problem here is
fixing the GPT partition table.
Strictly, that isn't a RAID1 problem, and I don't think it's
appropriate to ask for comment on that issue on the raid list; people
here are truly helpful, so I'll fix the partition table myself. I
could, though, use a comment on whether I should then just use the
mdadm --manage .. --add ..
commands and whether mdadm needs to be inhibited from taking any
automatic remedial action.
regards, Ron
* Re: Repairing R1: Part tabl, & precise command
From: Phil Turmel @ 2016-04-05 15:28 UTC
To: Ron Leach; +Cc: Linux RAID Mailing List
On 04/05/2016 08:22 AM, Ron Leach wrote:
>
> mdadm --manage .. --add ..
>
> commands and whether mdadm needs to be inhibited from taking any
> automatic remedial action.
If your array has write-intent bitmaps, use --re-add instead of --add.
It'll be quick. Otherwise just --add and let it rebuild.
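A rough sketch of that, with array and partition names taken from the
earlier lsdrv listing (adjust to suit):
cat /proc/mdstat                             # a bitmap shows up as a "bitmap:" line
mdadm --detail /dev/md5 | grep -i bitmap     # "Intent Bitmap : Internal" if present
mdadm --manage /dev/md5 --re-add /dev/sdb5   # with a bitmap: only changed blocks copied
mdadm --manage /dev/md5 --add /dev/sdb5      # otherwise: full rebuild
If --re-add is refused - likely here, since sdb's metadata went with
its partition table - fall back to --add.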
Phil
* Re: Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-05 16:34 UTC
To: Linux RAID Mailing List
On 05/04/2016 16:28, Phil Turmel wrote:
>
> If your array has write-intent bitmaps, use --re-add instead of --add.
> It'll be quick. Otherwise just --add and let it rebuild.
>
Phil, thanks for the advice.
I hit an unexpected problem fixing the partition table on /dev/sdb,
the disk that dropped from the Raid1 array. The problem is caused by
/dev/sdb being *smaller* than /dev/sdc (the working array member) -
despite the disks being identical products from WD. gdisk complains
that partition 5 (/dev/sdb5), which is to be the Raid1 partner for the
LVM containing all our backed up files, is too big (together with the
other partitions) for the /dev/sdb disk.
Presumably, raid1 doesn't work if an 'add'ed disk partition is smaller
than the existing, running, degraded array? Am I right in thinking
that the LVM won't be able to be carried securely on the underlying md
system? lsdrv is reporting that /dev/md127 has 0 free, so it seems
that the LVM is occupying the complete space of /dev/md127, and it
must be using the complete space of the underlying /dev/sdc5 because
only sdc is active, at the moment (the Raid1 being still degraded).
To protect the LVM, what would be a good thing to do? Should I define
a slightly shorter 'partner' partition on the failed disk (/dev/sdb) -
I would think not, but I would welcome advice.
I did think about reducing the size of one of the other partitions on
/dev/sdb - there's a swap partition of 2G which could become 1.5G,
because there's another 2G on the working disk anyway. Doing that,
the partner partitions for the real data could be the same size,
though not in exactly the same place on both disks. I think this
might work?
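(The size difference itself can be confirmed outside gdisk with
something like the following; exact byte counts will differ:)
blockdev --getsize64 /dev/sdb                 # whole-disc size in bytes, suspect disc
blockdev --getsize64 /dev/sdc                 # same for the working disc
lsblk -b -o NAME,SIZE,TYPE /dev/sdb /dev/sdc  # discs and partitions, in bytes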
regards, Ron
* Re: Repairing R1: Part tabl, & precise command
From: Adam Goryachev @ 2016-04-05 23:47 UTC
To: Ron Leach, Linux RAID Mailing List
On 06/04/16 02:34, Ron Leach wrote:
> On 05/04/2016 16:28, Phil Turmel wrote:
>
>>
>> If your array has write-intent bitmaps, use --re-add instead of --add.
>> It'll be quick. Otherwise just --add and let it rebuild.
>>
>
> Phil, thanks for the advice.
>
> I hit an unexpected problem fixing the partition table on /dev/sdb,
> the disk that dropped from the Raid1 array. The problem is caused by
> /dev/sdb being *smaller* than /dev/sdc (the working array member) -
> despite the disks being identical products from WD. gdisk complains
> that partition 5 (/dev/sdb5), which is to be the Raid1 partner for the
> LVM containing all our backed up files, is too big (together with the
> other partitions) for the /dev/sdb disk.
>
> Presumably, raid1 doesn't work if an 'add'ed disk partition is smaller
> than the existing, running, degraded array? Am I right in thinking
> that the LVM won't be able to be carried securely on the underlying md
> system? lsdrv is reporting that /dev/md127 has 0 free, so it seems
> that the LVM is occupying the complete space of /dev/md127, and it
> must be using the complete space of the underlying /dev/sdc5 because
> only sdc is active, at the moment (the Raid1 being still degraded).
>
> To protect the LVM, what would be a good thing to do? Should I define
> a slightly shorter 'partner' partition on the failed disk (/dev/sdb) -
> I would think not, but I would welcome advice.
>
> I did think about reducing the size of one of the other partitions on
> /dev/sdb - there's a swap partition of 2G which could become 1.5G,
> because there's another 2G on the working disk anyway. Doing that, the
> partner partitions for the real data could be the same size, though
> not in exactly the same place on both disks. I think this might work?
>
> regards, Ron
Hi Ron,
That is one option (reduce the swap partition size). You might also look
at the mdadm information for the array; generally it is possible to
create a raid1 array across two devices of different sizes, and
mdadm will automatically ignore the "excess" space of the larger drive.
eg:
sda1 1000M
sdb1 1050M
The disks and partition tables will show both disks 100% full, because
the partition fills the disk
mdadm will ignore the extra 50M on sdb1 and create a raid1 array of 1000M
LVM (or whatever you put onto the raid1) will show 1000M as the total
size, and will know nothing about the extra 50M
I think mdadm is silent about size differences if the difference is less
than 10% (or some other percentage value).
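If you want to convince yourself before touching the real discs, a
throw-away test with loop devices shows the behaviour (sizes, paths
and the md device number are purely illustrative):
truncate -s 1000M /tmp/small.img
truncate -s 1050M /tmp/large.img
SMALL=$(losetup -f --show /tmp/small.img)
LARGE=$(losetup -f --show /tmp/large.img)
# mdadm may warn about the size mismatch and ask for confirmation,
# but builds an array sized to the smaller member
mdadm --create /dev/md100 --level=1 --raid-devices=2 "$SMALL" "$LARGE"
mdadm --detail /dev/md100 | grep -E 'Array Size|Used Dev Size'
mdadm --stop /dev/md100       # tidy up afterwards
losetup -d "$SMALL"
losetup -d "$LARGE"
rm /tmp/small.img /tmp/large.img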
Another concern I have is that the drive has a number of damaged
sectors, has used up all the "spare" sectors that it has for
re-allocation, and is now reporting a smaller size because it knows that
a number of sectors are bad. I don't think drives do this, but it is a
failed drive, and manufacturers might do some strange things.
Can you provide the full output of smartctl? It should show more
details on the status of the drive and what damage it might have, etc.
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au
* Re: Repairing R1: Part tabl, & precise command
From: Ron Leach @ 2016-04-06 10:14 UTC
To: Linux RAID Mailing List
On 06/04/2016 00:47, Adam Goryachev wrote:
> That is one option (reduce the swap partition size). You might also
> look at the mdadm information for the array; generally it is possible
> to create a raid1 array across two devices of different sizes, and
> mdadm will automatically ignore the "excess" space of the larger
> drive.
>
> eg:
> sda1 1000M
> sdb1 1050M
>
> The disks and partition tables will show both disks 100% full, because
> the partition fills the disk
> mdadm will ignore the extra 50M on sdb1 and create a raid1 array of 1000M
> LVM (or whatever you put onto the raid1) will show 1000M as the total
> size, and will know nothing about the extra 50M
>
> I think mdadm is silent about size differences if the difference is
> less than 10% (or some other percentage value).
Adam, thank you for confirming that reducing swap might work. I'd
prefer that because I worry that if the md beneath the LVM reduces in
size, then I could lose data that is in that LVM at present.
I checked fstab and realised that the swap partitions on /dev/sdb are
not used, so I have gone ahead and reduced the swap partition, and
rebuilt a good GPT table. That went OK. mdadm is now 'add'ing the
partner partitions to the 3 md devices in the system.
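(For the record, the rebuild progress can be watched with something
along these lines:)
watch -n 30 cat /proc/mdstat                              # recovery percentage and ETA
mdadm --detail /dev/md5 | grep -E 'State|Rebuild Status'  # per-array view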
>
> [snip]
>
> Can you provide the full output of smartctl? It should show more
> details on the status of the drive and what damage it might have, etc.
>
I thought this was good advice but apt-get isn't finding smartctl. I
think your surmise is likely, though, that this disk has 'reduced' in
size because of faults, and I've a new 3TB drive on its way. I'll
replace this suspect disk as soon as it arrives.
Grateful for the help
regards, Ron
* Re: Repairing R1: Part tabl, & precise command
From: Étienne Buira @ 2016-04-06 10:31 UTC
To: Ron Leach; +Cc: Linux RAID Mailing List
On Wed, Apr 06, 2016 at 11:14:02AM +0100, Ron Leach wrote:
> On 06/04/2016 00:47, Adam Goryachev wrote:
../..
> I thought this was good advice but apt-get isn't finding smartctl. I
> think your surmise is likely, though, that this disk has 'reduced' in
> size because of faults, and I've a new 3TB drive on its way. I'll
> replace this suspect disk as soon as it arrives.
Hi,
The package is named smartmontools on Debian, and IMHO you want to
check HDD health status regularly (smartctl -a /dev/sdX).
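Something like this, with a few attributes particularly worth
watching on a drive suspected of shrinking (attribute names as
smartctl reports them):
apt-get install smartmontools
smartctl -a /dev/sdb          # identity, attributes, error and self-test logs
# Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable
# (attributes 5, 197, 198) are the usual signs of surface trouble:
smartctl -A /dev/sdb | grep -E 'Reallocated|Pending|Uncorrect'
smartctl -t long /dev/sdb     # start an extended self-test
smartctl -l selftest /dev/sdb # read the result once it finishes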
Regards