Repairing a RAID1 with non-zero mismatch_cnt

linux-raid.vger.kernel.org archive mirror

* Repairing a RAID1 with non-zero mismatch_cnt
@ 2020-01-20 10:02 Andrey ``Bass'' Shcheglov
  2020-01-20 10:56 ` Wols Lists
From: Andrey ``Bass'' Shcheglov @ 2020-01-20 10:02 UTC
  To: linux-raid

Greetings,

I have a question on how to repair a RAID1 array (consisting of 2
physical hard drives, metadata version 1.2) which went split-brain.

One of my md-devices repeatedly shows a non-zero mismatch_cnt:

# cat /sys/block/md4/md/mismatch_cnt
1024
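
A small sketch for reading the counter on every array at once. It also prints `last_sync_action`, which says whether the count came from a "check" or a "repair". That attribute exists on reasonably recent kernels, but check that yours has it. `report_mismatches` and the `SYSFS_ROOT` override are hypothetical names. The override lets the loop run against a copy of the sysfs tree.

```shell
# Print name, last scrub action and mismatch_cnt for every md array.
# SYSFS_ROOT is a hypothetical override (for testing); it defaults to /sys.
report_mismatches() {
    root=${SYSFS_ROOT:-/sys}
    for dir in "$root"/block/md*/md; do
        # an unmatched glob stays literal, so skip anything unreadable
        [ -r "$dir/mismatch_cnt" ] || continue
        name=${dir%/md}      # drop the trailing "/md"
        name=${name##*/}     # keep just "md4", "md127", ...
        if [ -r "$dir/last_sync_action" ]; then
            last=$(cat "$dir/last_sync_action")
        else
            last=unknown     # attribute missing on older kernels
        fi
        printf '%s last_sync_action=%s mismatch_cnt=%s\n' \
            "$name" "$last" "$(cat "$dir/mismatch_cnt")"
    done
}
```

On the system above it would print a line of the form `md4 last_sync_action=... mismatch_cnt=1024`.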

Zeroing out free space (with `zerofree`, as recommended here:
<http://decafbad.net/2017/01/03/mismatch_cnt,-raid1,-and-a-clever-fix/>)
and disabling swap both leave the mismatch count at exactly the same
level.
Also, neither drive is failing (the 18x and 19x SMART attributes are OK).
Checking file systems (ext4) doesn't show any problem, either, so the
file system metadata is most probably correct, too.

With the usual suspects ruled out, I'm starting to think that my data got
corrupted, and that at least one of the two replicas is affected.
Of course I can

# echo repair > /sys/block/md0/md/sync_action

but I have a 50% chance of losing information stored on the "right" replica.

So, assuming my /dev/md0 is now assembled from /dev/sda1 and /dev/sdb1,
I'd like to assemble and run two separate degraded mirrors from
/dev/sda1 and /dev/sdb1, respectively (`mdadm -A`),
mount the corresponding file systems R/O,
create two backups (one backup per replica)
and then compare them with each other (`diff -urN`).
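
Here is a rough, untested sketch of the plan above. It assumes the ext4 file system sits directly on the md device. `/dev/md10`, the mount points and the backup paths are hypothetical. `mdadm --readonly` should keep md from writing to the member, but check the `Events` line of `mdadm --examine` before and after to be sure. `run` and `DRY_RUN` are helper names invented for the sketch. By default the sketch only prints the commands.

```shell
# Print commands when DRY_RUN=1 (the default), execute them otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# copy_replica MEMBER MOUNTPOINT DEST (hypothetical helper)
copy_replica() {
    member=$1   # one RAID1 member, e.g. /dev/sda1
    mnt=$2      # temporary mount point
    dest=$3     # where this replica's files are copied
    # --run starts the array without its partner; --readonly blocks writes
    run mdadm --assemble --readonly --run /dev/md10 "$member"
    run mkdir -p "$mnt" "$dest"
    # noload skips ext4 journal replay, which would otherwise need writes
    run mount -o ro,noload /dev/md10 "$mnt"
    run cp -a "$mnt/." "$dest/"
    run umount "$mnt"
    run mdadm --stop /dev/md10
}

compare_replicas() {
    run mdadm --stop /dev/md0
    copy_replica /dev/sda1 /mnt/replica-a /backup/replica-a
    copy_replica /dev/sdb1 /mnt/replica-b /backup/replica-b
    # if the event counts were left untouched, md should not need a resync
    run mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
    run diff -urN /backup/replica-a /backup/replica-b
}
```

Run `compare_replicas` to review the command list, then run it as root with `DRY_RUN=0`.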

The question is: is it possible to assemble an array in a read-only mode,
so that the underlying block device is never written to,
the metadata in the superblock remains intact and the event count is
not incremented?

My intention is to avoid the resync when my original /dev/md0 is
reassembled from /dev/sda1 and /dev/sdb1.

If you have any other recommendations on how to interactively repair
the array (I want to be able to peek at the data being synced),
I'd appreciate you sharing them.

Regards,
Andrey.


* Re: Repairing a RAID1 with non-zero mismatch_cnt
  2020-01-20 10:02 Repairing a RAID1 with non-zero mismatch_cnt Andrey ``Bass'' Shcheglov
@ 2020-01-20 10:56 ` Wols Lists
  2020-01-20 22:30   ` Andrey ``Bass'' Shcheglov
From: Wols Lists @ 2020-01-20 10:56 UTC
  To: andrewbass, linux-raid

On 20/01/20 10:02, Andrey ``Bass'' Shcheglov wrote:
> Greetings,
> 
> I have a question on how to repair a RAID1 array (consisting of 2
> physical hard drives, metadata version 1.2) which went split-brain.
> 
> One of my md-devices repeatedly shows a non-zero mismatch_cnt:
> 
> # cat /sys/block/md4/md/mismatch_cnt
> 1024
> 
> Zeroing out free space (with `zerofree`, as recommended here:
> <http://decafbad.net/2017/01/03/mismatch_cnt,-raid1,-and-a-clever-fix/>)
> and disabling swap both leave the mismatch count at exactly the same
> level.
> Also, neither drive is failing (the 18x and 19x SMART attributes are OK).
> Checking file systems (ext4) doesn't show any problem, either, so the
> file system metadata is most probably correct, too.
> 
> With the usual suspects ruled out, I'm starting to think that my data got
> corrupted, and that at least one of the two replicas is affected.
> Of course I can
> 
> # echo repair > /sys/block/md0/md/sync_action
> 
> but I have a 50% chance of losing information stored on the "right" replica.
> 
> 
> So, assuming my /dev/md0 is now assembled from /dev/sda1 and /dev/sdb1,
> I'd like to assemble and run two separate degraded mirrors from
> /dev/sda1 and /dev/sdb1, respectively (`mdadm -A`),
> mount the corresponding file systems R/O,
> create two backups (one backup per replica)
> and then compare them with each other (`diff -urN`).
> 
> 
> The question is: is it possible to assemble an array in a read-only mode,
> so that the underlying block device is never written to,
> the metadata in the superblock remains intact and the event count is
> not incremented?
> 
> My intention is to avoid the resync when my original /dev/md0 is
> reassembled from /dev/sda1 and /dev/sdb1.
> 
Then how (assuming one drive is corrupt) are you going to re-assemble
the array without forcing a resync on that drive?
> 
> If you have any other recommendations on how to interactively repair
> the array (I want to be able to peek at the data being synced),
> I'd appreciate you sharing them.
> 
My inclination (no warranty included!) would be to shut down the array,
then assemble it with "/dev/sda1 missing" and --force if necessary. fsck
that, then rinse and repeat with the second drive.

Assuming neither drive has problems, you should then be able to assemble
--assume-clean, which will prevent the sync, otherwise you'll have to
just re-add the duff drive and let it resync.

(In other words, why worry about the resync, because if you find the
problem then you're going to have to resync to fix it, anyway.)

Hint - look at dm-integrity. I believe you can put the integrity
information elsewhere (if you've got a spare bit of disk space) so this
issue won't arise again. It's new with raid, but apparently works fine
with raid-1. Don't try it with the higher raids - 5 or 6 - yet.
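
Here is a rough sketch of how this hint might look in practice. It assumes cryptsetup's `integritysetup` tool and two hypothetical members, /dev/sda1 and /dev/sdb1. The sketch destroys existing data, so use it only on empty disks or after a backup. It keeps the checksums on the same device, which is the default. `integritysetup` also offers a `--data-device` option for storing them elsewhere; read its man page before relying on that. With this stacking, a sector that fails its checksum reaches md as a read error. RAID1 can then rewrite that sector from the other half instead of silently diverging. `run` and `integrity_raid1` are invented helpers, and by default the sketch only prints commands.

```shell
# Print commands when DRY_RUN=1 (the default), execute them otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# integrity_raid1 ARRAY MEMBER... (hypothetical helper)
integrity_raid1() {
    md=$1
    shift
    legs=""
    for dev in "$@"; do
        name="int-${dev##*/}"                  # e.g. int-sda1
        run integritysetup format "$dev"       # asks for confirmation, wipes $dev
        run integritysetup open "$dev" "$name" # creates /dev/mapper/$name
        legs="$legs /dev/mapper/$name"
    done
    # $legs is left unquoted on purpose so each path becomes one argument
    run mdadm --create "$md" --level=1 --raid-devices="$#" $legs
}
```

Example: `integrity_raid1 /dev/md0 /dev/sda1 /dev/sdb1`.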

> Regards,
> Andrey.
> 
Cheers,
Wol


* Re: Repairing a RAID1 with non-zero mismatch_cnt
  2020-01-20 10:56 ` Wols Lists
@ 2020-01-20 22:30   ` Andrey ``Bass'' Shcheglov
  2020-04-08 22:13     ` Repairing a RAID1 with non-zero mismatch_cnt, vol. 2 Andrey ``Bass'' Shcheglov
From: Andrey ``Bass'' Shcheglov @ 2020-01-20 22:30 UTC
  To: Wols Lists; +Cc: linux-raid

Many thanks for your response, Wol.

On Mon, 20 Jan 2020 at 13:56, Wols Lists <antlists@youngman.org.uk> wrote:
>
> > The question is: is it possible to assemble an array in a read-only mode,
> > so that the underlying block device is never written to,
> > the metadata in the superblock remains intact and the event count is
> > not incremented?
> >
> > My intention is to avoid the resync when my original /dev/md0 is
> > reassembled from /dev/sda1 and /dev/sdb1.
> >
> Then how (assuming one drive is corrupt) are you going to re-assemble
> the array without forcing a resync on that drive?

Well, of course I will resync eventually, but

1. My original intention was to find the files residing on top of
corrupted blocks and then restore (rewrite) them using the data
recovered from the intact replica.
2. In my experience, an MD array may start re-syncing automatically
at system boot, and this is what I'd rather avoid -- I want to go
through all the steps manually, fully understanding what is being
done.
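
Point 1 could start from a sketch like this. It compares the two members byte by byte and turns each differing offset into an ext4 block number. Check its assumptions first:

- both members report the same `Data Offset` in `mdadm --examine` (given in 512-byte sectors);
- the file system sits directly on the md device, not on a partition inside it;
- nothing writes to either member while `cmp` runs.

`mismatched_blocks` is a hypothetical helper name.

```shell
# mismatched_blocks MEMBER_A MEMBER_B DATA_OFFSET_SECTORS [FS_BLOCK_SIZE]
mismatched_blocks() {
    a=$1            # first member, e.g. /dev/sda1
    b=$2            # second member, e.g. /dev/sdb1
    offset=$3       # data offset in 512-byte sectors, from mdadm --examine
    bs=${4:-4096}   # file-system block size (tune2fs -l shows it)
    # cmp -l prints one line per differing byte: 1-based position, then the
    # two byte values in octal. Bytes before the data offset belong to the
    # md metadata area (superblock, bitmap) and are skipped.
    cmp -l "$a" "$b" 2>/dev/null |
        awk -v off="$offset" -v bs="$bs" '
            { pos = $1 - 1 - off * 512 }
            pos >= 0 { print int(pos / bs) }' |
        uniq   # positions arrive in order, so uniq removes repeats
}
```

Pass the resulting block numbers to `debugfs -R "icheck <block> ..." /dev/md0` to get inode numbers. Then `debugfs -R "ncheck <inode> ..." /dev/md0` turns those into path names. A single replica assembled read-only works too. Reading both disks in full takes about as long as a complete check.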

> >
> > If you have any other recommendations on how to interactively repair
> > the array (I want to be able to peek at the data being synced),
> > I'd appreciate you sharing them.
> >
> My inclination (no warranty included!) would be to shut down the array,
> then assemble it with "/dev/sda1 missing" and --force if necessary. fsck
> that, then rinse and repeat with the second drive.
>
> Assuming neither drive has problems, you should then be able to assemble
> --assume-clean, which will prevent the sync, otherwise you'll have to
> just re-add the duff drive and let it resync.
>
> (In other words, why worry about the resync, because if you find the
> problem then you're going to have to resync to fix it, anyway.)

Thanks, will try that.

Does it make any sense to back up the superblocks of the replicas
(e.g. using `dd`)?
And if so, what precautions should be taken before restoring a
superblock from a backup?
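
A minimal sketch of such a backup for metadata 1.2, where the superblock starts 4 KiB from the start of each member device. It saves the 4 KiB block at that offset, plus the `mdadm --examine` output as a human-readable record. An internal bitmap and the data area are not covered. `backup_superblock` and the output names are hypothetical.

```shell
# backup_superblock MEMBER PREFIX (hypothetical helper)
backup_superblock() {
    dev=$1     # member device, e.g. /dev/sda1
    out=$2     # output prefix, e.g. sda1-sb
    # Human-readable record of the metadata, if mdadm is available
    if command -v mdadm >/dev/null 2>&1; then
        mdadm --examine "$dev" > "$out.examine.txt" ||
            echo "mdadm --examine failed for $dev" >&2
    fi
    # Raw copy: skip the first 4 KiB, save the next 4 KiB (v1.2 superblock)
    dd if="$dev" of="$out.bin" bs=4096 skip=1 count=1 2>/dev/null
}
```

To restore, swap `if`/`of` and use `seek=1` instead of `skip=1`. Do this only with the array stopped and only onto the same member. The superblock stores that device's own UUID and its event count, so an older copy can make md treat the member as stale.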

> Hint - look at dm-integrity. I believe you can put the integrity
> information elsewhere (if you've got a spare bit of disk space) so this
> issue won't arise again. It's new with raid, but apparently works fine
> with raid-1. Don't try it with the higher raids - 5 or 6 - yet.

Thanks, I'll give it a try.

Regards,
Andrey.


* Repairing a RAID1 with non-zero mismatch_cnt, vol. 2
  2020-01-20 22:30   ` Andrey ``Bass'' Shcheglov
@ 2020-04-08 22:13     ` Andrey ``Bass'' Shcheglov
  2020-04-09  1:06       ` Phil Turmel
  2020-04-09  7:19       ` Robin Hill
From: Andrey ``Bass'' Shcheglov @ 2020-04-08 22:13 UTC
  To: linux-raid

Greetings,

I posted a question about non-zero mismatch_cnt here a while ago.

Now I have backed up all the data (from both replicas of the mirror,
and the two data copies turned out to be identical),
so I was finally free to mess with my mirror w/o risking the data.

My first idea was that the metadata (ownership, mtime, ctime, atime)
had diverged, so I ran `chown -R` (a couple of times) and a recursive
`touch`.

Then I once again zeroed out the free space (with zerofree).

The above didn't help, with mismatch_cnt holding at 1024.

Then I deleted the data,
removed and re-created disk partitions (within the raid1 device),
formatted the partitions as ext4,
ran tune2fs
and zeroed out the free space -- yet another time.

Now I have 3 empty ext4 partitions with only lost+found directories.

And the value of mismatch_cnt dropped to 384.

Okay, so far, so good. I don't have any data, so a repair action can't
possibly harm it.

> echo repair >>/sys/block/md4/md/sync_action

And the value of mismatch_cnt is still 384.

My first guess was that one of the hard drives was degrading, but
SMART attributes of both disks are ok (and nearly identical).

Can you please suggest an explanation for the non-zero value?
And what else can I do to finally make it drop to zero (w/o
reassembling the whole array)?

Regards,
Andrey.


* Re: Repairing a RAID1 with non-zero mismatch_cnt, vol. 2
  2020-04-08 22:13     ` Repairing a RAID1 with non-zero mismatch_cnt, vol. 2 Andrey ``Bass'' Shcheglov
@ 2020-04-09  1:06       ` Phil Turmel
  2020-04-09  7:19       ` Robin Hill
From: Phil Turmel @ 2020-04-09  1:06 UTC
  To: andrewbass, linux-raid

Hi Andrew,

On 4/8/20 6:13 PM, Andrey ``Bass'' Shcheglov wrote:
> Greetings,

[trim /]

> Okay, so far, so good. I don't have any data, so a repair action can't
> possibly harm it.
> 
>> echo repair >>/sys/block/md4/md/sync_action
> 
> And the value of mismatch_cnt is still 384.
> 
> My first guess was that one of the hard drives was degrading, but
> SMART attributes of both disks are ok (and nearly identical).
> 
> 
> Can you please suggest an explanation for the non-zero value?
> And what else can I do to finally make it drop to zero (w/o
> reassembling the whole array)?

This has always been a possible phenomenon with raid1, particularly when 
swap is involved anywhere on top, but also where filesystems don't 
exactly fill the raid device.  Tail-packing inodes can leave strays, too.

My understanding is that it is an artifact of abandoned writes from 
buffer cache, where the write to at least one mirror made it to the 
device, but not to all mirrors.  Logically harmless, except when scrubbing.

If it bothers you that much, fail one device, fill the entire device 
with zeroes (dd with bs=512), then repartition, --add and wait for 
the rebuild to complete.  Then fail and do the same with the other.  Use 
--replace with a temporary third device if degraded is too risky.
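
Here is the procedure above as a dry-run sketch for one member, to be repeated for the other. The names are hypothetical: array /dev/md4 and member /dev/sda1. The sketch zeroes only the member partition, so other arrays on the same disks are left alone. If the whole disk is zeroed instead, restore the partition table (for example `sfdisk /dev/sda < sda.sfdisk`, saved earlier with `sfdisk -d`) before `--add`. It uses bs=1M rather than bs=512 only for speed. The array stays degraded from `--fail` until `mdadm --wait` returns.

```shell
# Print commands when DRY_RUN=1 (the default), execute them otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# rebuild_member ARRAY MEMBER (hypothetical helper)
rebuild_member() {
    md=$1      # array, e.g. /dev/md4
    part=$2    # member to recycle, e.g. /dev/sda1
    run mdadm "$md" --fail "$part" --remove "$part"
    # Zero the member, md superblock included; dd ends with
    # "No space left on device" once it reaches the end, as expected.
    run dd if=/dev/zero of="$part" bs=1M
    # Re-add it as a blank device and block until the full rebuild is done
    run mdadm "$md" --add "$part"
    run mdadm --wait "$md"
}
```

With a spare third disk, the `--replace` variant would run `mdadm /dev/md4 --add /dev/sdc1` followed by `mdadm /dev/md4 --replace /dev/sda1 --with /dev/sdc1`. This keeps two copies of the data during the copy.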

Phil


* Re: Repairing a RAID1 with non-zero mismatch_cnt, vol. 2
  2020-04-08 22:13     ` Repairing a RAID1 with non-zero mismatch_cnt, vol. 2 Andrey ``Bass'' Shcheglov
  2020-04-09  1:06       ` Phil Turmel
@ 2020-04-09  7:19       ` Robin Hill
  2020-04-09 21:18         ` Andrey ``Bass'' Shcheglov
From: Robin Hill @ 2020-04-09  7:19 UTC
  To: Andrey ``Bass'' Shcheglov; +Cc: linux-raid

On Thu Apr 09, 2020 at 01:13:40AM +0300, Andrey ``Bass'' Shcheglov wrote:

> Now I have 3 empty ext4 partitions with only lost+found directories.
> 
> And the value of mismatch_cnt dropped to 384.
> 
> 
> Okay, so far, so good. I don't have any data, so a repair action can't
> possibly harm it.
> 
> > echo repair >>/sys/block/md4/md/sync_action
> 
> And the value of mismatch_cnt is still 384.
> 
The mismatch_cnt after repair indicates how many repairs were completed.
You need to run a new repair/check to see whether there are any
remaining mismatches.
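
The same point as a sketch: start a fresh "check", wait until md reports "idle", and only then read the counter. The default sysfs path for md4 and the poll interval are assumptions, and `check_and_count` is a hypothetical name. It must run as root. It waits only for "idle", so an array left in another state such as "frozen" would keep it waiting.

```shell
# check_and_count [SYSFS_DIR] [POLL_SECONDS] (hypothetical helper)
check_and_count() {
    dir=${1:-/sys/block/md4/md}
    poll=${2:-10}
    echo check > "$dir/sync_action"
    # sync_action reads "check" while the scrub runs and "idle" once done
    while [ "$(cat "$dir/sync_action")" != idle ]; do
        sleep "$poll"
    done
    # Now mismatch_cnt describes this check, not the earlier repair
    cat "$dir/mismatch_cnt"
}
```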

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |


* Re: Repairing a RAID1 with non-zero mismatch_cnt, vol. 2
  2020-04-09  7:19       ` Robin Hill
@ 2020-04-09 21:18         ` Andrey ``Bass'' Shcheglov
From: Andrey ``Bass'' Shcheglov @ 2020-04-09 21:18 UTC
  To: robin; +Cc: linux-raid

Indeed.

The subsequent check reported a mismatch_cnt of zero.

Thank you, Robin.

Regards,
Andrey.

On Thu, 9 Apr 2020 at 10:19, Robin Hill <robin@robinhill.me.uk> wrote:
>
> The mismatch_cnt after repair indicates how many repairs were completed.
> You need to run a new repair/check to see whether there are any
> remaining mismatches.


end of thread

Thread overview: 7+ messages
2020-01-20 10:02 Repairing a RAID1 with non-zero mismatch_cnt Andrey ``Bass'' Shcheglov
2020-01-20 10:56 ` Wols Lists
2020-01-20 22:30   ` Andrey ``Bass'' Shcheglov
2020-04-08 22:13     ` Repairing a RAID1 with non-zero mismatch_cnt, vol. 2 Andrey ``Bass'' Shcheglov
2020-04-09  1:06       ` Phil Turmel
2020-04-09  7:19       ` Robin Hill
2020-04-09 21:18         ` Andrey ``Bass'' Shcheglov
