how to proceed with possible corruption

linux-raid.vger.kernel.org archive mirror

* how to proceed with possible corruption
From: Ross Boylan @ 2012-12-19 21:06 UTC
  To: linux-raid; +Cc: ross

Short version: I suspect some of my array components may be corrupt, and
wonder what the best way is to check for it and fix it.

I have a VM configured similarly to my real machine for testing.
When I brought it up there were some complaints about the arrays and
needing to resync them.  While the sync appeared to complete
successfully, the VM was quite unstable afterwards, as it had not been
before.

The VM was most likely shut down abruptly when the real machine had a
power failure.  Also, I had added a 3rd disk to my RAID-1 arrays since I
last booted the VM, but my mdadm.conf had -num-devices=2 (which we have
already established is a recipe for trouble).

I installed a new kernel in the VM, and have not had problems since.  So
I wonder if some of the kernel files got corrupted, and more generally
if the virtual disks are trustworthy.

Does this /proc/mdstat offer any clues about which disks might be a
problem?

Personalities : [raid1]
md0 : active raid1 sda1[0] sdc1[2] sdb1[1]
      96256 blocks [3/3] [UUU]

md1 : active raid1 sdb3[0] sdc3[2] sda3[1]
      8187712 blocks [3/3] [UUU]

unused devices: <none>

It seems odd that the disks are out of order, that is, not sda, sdb, sdc.

I know I could fail some components and add them back to assure
consistency, but this wouldn't tell me if they were inconsistent before
that.  There's also the possibility they are consistent but corrupt.

Rebuilding the VM would take significant time.

Thanks for any advice.
Ross Boylan

* Re: how to proceed with possible corruption
From: Robin Hill @ 2012-12-19 22:30 UTC
  To: Ross Boylan; +Cc: linux-raid

On Wed Dec 19, 2012 at 01:06:35PM -0800, Ross Boylan wrote:

> It seems odd the disks are out of order, that is not sda, sdb, sdc.
> 
> I know I could fail some components and add them back to assure
> consistency, but this wouldn't tell me if they were inconsistent before
> that.  There's also the possibility they are consistent but corrupt.
> 
The arrays are RAID1, so the order of the disks is irrelevant - the
data should be identical on all disks.

You can check whether the disks are all in sync by doing:
    echo check > /sys/block/mdX/md/sync_action

Once the check is complete (you can see the progress via /proc/mdstat)
then /sys/block/mdX/md/mismatch_cnt will indicate whether or not there
are any mismatches. If so, use "repair" instead of "check" in the above
command to resync the drives.
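
The check-then-read sequence above can be scripted. What follows is only
a minimal sketch, not a tested tool: the function names and the
MD_POLL_SECONDS variable are invented for illustration, and it assumes
sync_action reads back "idle" once the check has finished.

```shell
#!/bin/sh
# Each function takes the array's md directory, e.g. /sys/block/md1/md.
# The function names and MD_POLL_SECONDS are illustrative, not part of md.

# Start a comparison of all mirrors ("check" reports, it does not fix).
md_start_check() {
    echo check > "$1/sync_action"
}

# Block until the array is no longer checking or resyncing.
md_wait_idle() {
    while [ "$(cat "$1/sync_action")" != "idle" ]; do
        sleep "${MD_POLL_SECONDS:-5}"
    done
}

# Print the mismatch count; return 0 if it is zero, 1 if not, 2 on error.
md_mismatch_report() {
    count=$(cat "$1/mismatch_cnt") || return 2
    if [ "$count" -eq 0 ]; then
        echo "md: no mismatches"
    else
        echo "md: mismatch_cnt=$count, consider 'repair'"
        return 1
    fi
}
```

On a real system this needs root, for example:
    d=/sys/block/md1/md
    md_start_check "$d" && md_wait_idle "$d" && md_mismatch_report "$d"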

Otherwise, the issue could be filesystem corruption. A "fsck -f" on the
array should detect that.
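
A cautious first pass is a forced, read-only check, so nothing gets
modified before you have seen the report. For ext3, fsck hands off to
e2fsck, where -f checks even a filesystem marked clean and -n opens it
read-only and answers "no" to every prompt. The sketch below is only an
illustration: the helper names and the optional mounts-file argument are
made up, and the mounted test is a plain string match against
/proc/mounts, so it can miss a device listed under another name.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears in the first column of MOUNTS_FILE
# (default /proc/mounts). Exact string match only.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# safe_fsck DEVICE [MOUNTS_FILE]
# Refuses to run on a device that looks mounted, otherwise does a
# forced read-only check.
safe_fsck() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "$1 is mounted; unmount it or boot rescue media first" >&2
        return 1
    fi
    # -f: check even if marked clean; -n: read-only, answer "no" to prompts.
    fsck -f -n "$1"
}
```

For example, "safe_fsck /dev/md1" as root. If the read-only pass reports
problems, rerun fsck without -n to repair them.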

HTH,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |


* Re: how to proceed with possible corruption
From: Ross Boylan @ 2012-12-19 23:16 UTC
  To: linux-raid

On 12/19/2012 2:30 PM, Robin Hill wrote:
> The arrays are RAID1, so the order of the disks is irrelevant - the
> data should be identical on all disks.
>
> You can check whether the disks are all in sync by doing:
>      echo check > /sys/block/mdX/md/sync_action
>
> Once the check is complete (you can see the progress via /proc/mdstat)
> then /sys/block/mdX/md/mismatch_cnt will indicate whether or not there
> are any mismatches. If so, use "repair" instead of "check" in the above
> command to resync the drives.
Thank you for the tip.  This says there are no mismatches, and so I can 
count on the components being in sync.
> Otherwise, the issue could be filesystem corruption. A "fsck -f" on the
> array should detect that.
I think that's my next step.  My understanding is that this guarantees 
the integrity of the file system, but not necessarily the integrity of 
the contents of individual files.  I'm on ext3.
>
> HTH,
>      Robin


* Re: how to proceed with possible corruption
From: Robin Hill @ 2012-12-20  8:38 UTC
  To: Ross Boylan; +Cc: linux-raid

On Wed Dec 19, 2012 at 03:16:22PM -0800, Ross Boylan wrote:

> On 12/19/2012 2:30 PM, Robin Hill wrote:

> > Otherwise, the issue could be filesystem corruption. A "fsck -f" on the
> > array should detect that.
> I think that's my next step.  My understanding is that this guarantees 
> the integrity of the file system, but not necessarily the integrity of 
> the contents of individual files.  I'm on ext3.
> 
To check the files themselves, you'll need some record of their correct
state. Many distributions include this as part of the packaging, so
that should be your first port of call (e.g. rpm --verify). I'm not
aware of any other way to verify file contents without having
previously recorded their correct state (there are various file
integrity checking tools, intended for security checks to detect
surreptitious replacement of system files).
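
On an RPM-based system, "rpm -Va" verifies every installed package
against the package database. For files that no package owns, the only
option is to record a baseline while the data is known to be good and
compare against it later. The following is a rough sketch of that idea
using sha256sum from GNU coreutils; the function names are invented for
illustration, and it is no substitute for a proper integrity checker.

```shell
#!/bin/sh
# make_baseline DIR MANIFEST
# Record a SHA-256 checksum for every regular file under DIR.
# Keep MANIFEST outside DIR so it is not checksummed itself.
make_baseline() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# verify_baseline DIR MANIFEST
# Re-hash the files listed in MANIFEST; prints only failures and
# returns non-zero if any file is missing or has changed.
verify_baseline() {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

For example, run "make_baseline /etc /root/etc.sha256" while the system
is healthy and "verify_baseline /etc /root/etc.sha256" after an unclean
shutdown. Files changed on purpose will be reported too, so the baseline
has to be refreshed after legitimate edits.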

HTH,
    Robin
