Re: RAID6 seemingly shrunk itself after hard power outage and rebuild with replacement disk

From: Phil Turmel <philip@turmel.org>
To: linux-raid@vger.kernel.org, robbat2@gentoo.org
Subject: Re: RAID6 seemingly shrunk itself after hard power outage and rebuild with replacement disk
Date: Sat, 05 Mar 2011 11:57:58 -0500
Message-ID: <4D726B96.3010501@turmel.org>
In-Reply-To: <robbat2-20110304T221807-988505787Z@orbis-terrarum.net>

Hi Robin,

On 03/04/2011 06:27 PM, Robin H. Johnson wrote:
> (Please CC, not subscribed to linux-raid).
> 
> Problem summary:
> -------------------
> After a rebuild following disk replacement, the MD array (RAID6, 12 devices)
> appears to have shrunk by 10880KiB. Presumed at the start of the device, but no
> confirmation.

Sounds similar to a problem recently encountered by Simon McNeil...

> Background:
> -----------
> I got called in to help a friend with a data loss problem after a catastrophic
> UPS failure which killed at least one motherboard and several disks. Almost
> all of which led to no data loss, except for one system...
> 
> For the system in question, one disk died (cciss/c1d12), and was
> promptly replaced, and this problem started when the rebuild kicked in.
> 
> Prior to calling me, my friend had already tried a few things from a rescue
> env, and almost certainly contributed to making the problem worse, and doesn't
> have good logs of what he did.

I suspect that 'mdadm --create --assume-clean' or some variant was one of those, and that the rescue environment has mdadm >= 3.1.2.  The default metadata alignment changed in that version.

> The MD array was portions of two very large LVM LVs (15TiB and ~20TiB
> respectively).  Specifically, the PV of the MD array was a chunk in the middle of
> each of the two LVs.
> 
> The kernel version 2.6.35.4 did not change during the power outage.
> 
> Problem identification:
> -----------------------
> When bringing the system back online, LVM refused to make one LV accessible as
> it complained of a shrunk device. One other LV exhibited corruption.
> 
> The entry in /proc/partitions noted the array size of 14651023360KiB, while
> older LVM backups showed the usable size of the array to previously be
> 14651034240KiB, a difference of 10880KiB.
> 
> The first LV has inaccessible data for all files at or after the missing chunk.
> All files prior to that point are accessible.
> 
> LVM refused to bring the second LV online as it complained the physical device
> was now too small for all the extents. 
> 
> Prior to the outage, 800KiB of the collected devices was used for metadata, and
> after the outage 11680KiB is used (a difference of 10880KiB).
> 
> Questions:
> ----------
> Why did the array shrink? How can I get it back to the original size, or
> accurately identify the missing chunk size and offset, so that I can adjust the
> LVM definitions and recover the other data.

Please share the output of mdadm -E for every device in the problem array, plus a sample of mdadm -E from some of the devices in the working arrays.  I think you'll find the data offsets differ: newer mdadm aligns the data offset to 1MB, while older mdadm aligns to "superblock size + bitmap size".
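
To make the comparison easier to read, a small helper can pull just the offset lines out of each report.  This is only a sketch: show_offsets is a made-up name, it has to run as root, and it assumes the "Data Offset" and "Super Offset" labels that mdadm -E prints for 1.x metadata.

```shell
# Print the offset lines from "mdadm -E" for each device given.
# show_offsets is a hypothetical helper name, not part of mdadm.
show_offsets() {
    for dev in "$@"; do
        printf '%s\n' "$dev"
        # Keep only the lines that say where the superblock and data start.
        mdadm -E "$dev" | grep -E 'Data Offset|Super Offset'
    done
}
```

Call it with the member partitions of the problem array and of one working array, then compare the "Data Offset" values side by side.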

"mdadm -E /dev/cciss/c1d{12..23}p1" should show us individual device details for the problem array.

> Collected information:
> ----------------------
> 
> Relevant lines from /proc/partitions:
> =====================================
>    9        3 14651023360 md3
>  105      209 1465103504 cciss/c1d13p1
>  ...
> 
> Line from mdstat right now:
> ===========================
> md3 : active raid6 cciss/c1d18p1[5] cciss/c1d17p1[4] cciss/c1d13p1[0]
> cciss/c1d21p1[8] cciss/c1d20p1[7] cciss/c1d19p1[6] cciss/c1d15p1[2]
> cciss/c1d12p1[12] cciss/c1d14p1[1] cciss/c1d23p1[10] cciss/c1d16p1[3]
> cciss/c1d22p1[9]
>       14651023360 blocks super 1.2 level 6, 64k chunk, algorithm 2
> 	  [12/12] [UUUUUUUUUUUU]
> 
> MDADM output:
> =============
> # mdadm --detail /dev/md3
> /dev/md3:
>         Version : 1.2
>   Creation Time : Wed Feb 16 19:53:05 2011
>      Raid Level : raid6
>      Array Size : 14651023360 (13972.30 GiB 15002.65 GB)
>   Used Dev Size : 1465102336 (1397.23 GiB 1500.26 GB)
>    Raid Devices : 12
>   Total Devices : 12
>     Persistence : Superblock is persistent
> 
>     Update Time : Fri Mar  4 17:19:43 2011
>           State : clean
>  Active Devices : 12
> Working Devices : 12
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>            Name : CENSORED:3  (local to host CENSORED)
>            UUID : efa04ecf:4dbd0bfa:820a5942:de8a234f
>          Events : 25
> 
>     Number   Major   Minor   RaidDevice State
>        0     105      209        0      active sync   /dev/cciss/c1d13p1
>        1     105      225        1      active sync   /dev/cciss/c1d14p1
>        2     105      241        2      active sync   /dev/cciss/c1d15p1
>        3     105      257        3      active sync   /dev/cciss/c1d16p1
>        4     105      273        4      active sync   /dev/cciss/c1d17p1
>        5     105      289        5      active sync   /dev/cciss/c1d18p1
>        6     105      305        6      active sync   /dev/cciss/c1d19p1
>        7     105      321        7      active sync   /dev/cciss/c1d20p1
>        8     105      337        8      active sync   /dev/cciss/c1d21p1
>        9     105      353        9      active sync   /dev/cciss/c1d22p1
>       10     105      369       10      active sync   /dev/cciss/c1d23p1
>       12     105      193       11      active sync   /dev/cciss/c1d12p1

The lowest-numbered device node holds the last device role?  Any chance the devices are also out of order?

> LVM PV definition:
> ==================
>   pv1 {
>       id = "CENSORED"
>       device = "/dev/md3" # Hint only
>       status = ["ALLOCATABLE"]
>       flags = []
>       dev_size = 29302068480  # 13.6448 Terabytes
>       pe_start = 384 
>       pe_count = 3576912  # 13.6448 Terabytes
>   }   

It would be good to know where the LVM PV signature is on the problem array's devices, and which one has it.  LVM stores a text copy of the VG's configuration in its metadata blocks at the beginning of a PV, so you should find it on the true "Raid device 0", at the original MD data offset from the beginning of the device.

I suggest scripting a loop over the devices that pipes the first 1MB of each (with dd) through "strings -t x" into grep, looking for the PV uuid in clear text.
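
Such a loop could look roughly like this.  It is a sketch, not a tested recovery tool: scan_for_pv is a made-up name, the search text is whatever you pass in (the PV uuid from the LVM backup, for example), and it only reads from the devices.

```shell
# Search the first 1MB of each device for a text pattern and print
# "device: hex-offset matching-string" for every hit.
# scan_for_pv is a hypothetical helper name.
scan_for_pv() {
    pattern=$1
    shift
    for dev in "$@"; do
        # dd reads 1MB; strings -t x prefixes each printable run with
        # its hex offset; grep -F matches the pattern literally.
        dd if="$dev" bs=1M count=1 2>/dev/null |
            strings -t x |
            grep -F -- "$pattern" |
            awk -v d="$dev" '{ print d ": " $0 }'
    done
}
```

Run it as root with the uuid and the twelve member partitions.  The device that reports a hit should be the original first member, and the hex offset of the hit shows roughly where the old data area began.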

> LVM segments output:
> ====================
> 
> # lvs --units 1m --segments \
>   -o lv_name,lv_size,seg_start,seg_start_pe,seg_size,seg_pe_ranges \
>   vg/LV1 vg/LV2
>   LV    LSize     Start     Start   SSize    PE Ranges               
>   LV1   15728640m        0m       0 1048576m /dev/md2:1048576-1310719
>   LV1   15728640m  1048576m  262144 1048576m /dev/md2:2008320-2270463
>   LV1   15728640m  2097152m  524288 7936132m /dev/md3:1592879-3576911
>   LV1   15728640m 10033284m 2508321  452476m /dev/md4:2560-115678    
>   LV1   15728640m 10485760m 2621440 5242880m /dev/md4:2084381-3395100
>   LV2   20969720m        0m       0 4194304m /dev/md2:0-1048575      
>   LV2   20969720m  4194304m 1048576 1048576m /dev/md2:1746176-2008319
>   LV2   20969720m  5242880m 1310720  456516m /dev/md2:2270464-2384592
>   LV2   20969720m  5699396m 1424849  511996m /dev/md2:1566721-1694719
>   LV2   20969720m  6211392m 1552848       4m /dev/md2:1566720-1566720
>   LV2   20969720m  6211396m 1552849 6371516m /dev/md3:0-1592878      
>   LV2   20969720m 12582912m 3145728  512000m /dev/md2:1438720-1566719
>   LV2   20969720m 13094912m 3273728 7874808m /dev/md4:115679-2084380 
> 

If my suspicions are right, you'll have to use an older version of mdadm (before 3.1.2) to redo the 'mdadm --create --assume-clean'.
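
The sizes in the report can be checked against that idea with a little arithmetic.  The sketch below copies every figure from the report; the function names are made up for illustration, and it relies on a 12-device RAID6 having 10 data members.

```shell
# Sizes in KiB, copied from the report.
old_array=14651034240   # usable size recorded in the older LVM backups
new_array=14651023360   # md3 in /proc/partitions after the rebuild
member=1465103504       # cciss/c1d13p1 in /proc/partitions
used_dev=1465102336     # "Used Dev Size" from mdadm --detail
data_members=10         # 12 RAID6 devices minus 2 for parity

# Data area lost on each member.
shrink_per_member() { echo $(( (old_array - new_array) / data_members )); }
# Space per member not used for data, before and after.
old_overhead() { echo $(( member - old_array / data_members )); }
new_overhead() { echo $(( member - used_dev )); }

echo "lost per member: $(shrink_per_member) KiB"
echo "overhead before: $(old_overhead) KiB"
echo "overhead after:  $(new_overhead) KiB"
```

On these numbers each member lost 1088KiB of data area, and the per-member overhead went from 80KiB to 1168KiB.  That is the sort of jump a larger data offset would produce, but only the mdadm -E output can confirm it.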

HTH,

Phil
