* Cleaning up a Raid5 after discrepancies discovered
@ 2024-06-13 16:36 dfc
From: dfc @ 2024-06-13 16:36 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org

I noticed some data inconsistencies in my raid5 (5 disks, 3.6T per
disk) and discovered via smartmontools that one disk was about to fail
(many reallocated sectors). mismatch_cnt was approximately 128 at this
point.
I don't have a spare 6th disk in the setup.
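
(For reference, this is roughly the sort of thing I was looking at; a
sketch only, the device name below is just an example:

  # SMART health and reallocated-sector count on the suspect member
  $ sudo smartctl -H -A /dev/sdX | grep -i -E 'overall-health|Reallocated'

  # mismatch count reported by md after the last check
  $ cat /sys/block/md127/md/mismatch_cnt
)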

I dd'd the failing disk's entire contents (including partition table)
to a new (8T) disk and inserted it in the array. The new configuration
was recognized without problems. I ran check without mounting the file
system. This completed (I failed to check dmesg to see how many
inconsistencies it found). I mounted the file system and things seemed
OK.
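
(By "ran check" I mean the md consistency check, started along these
lines; I'm paraphrasing the exact commands from memory:

  # kick off a read-only consistency check of the whole array
  $ echo check | sudo tee /sys/block/md127/md/sync_action

  # watch progress until the check finishes
  $ cat /proc/mdstat
)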

Next I did a diff with respect to a backup (unfortunately a close but
not perfect backup). There were definitely some differences within
some binary files.

So I ran check again at this point and saw some mismatched sectors
reported in the dmesg log, and mismatch_cnt is again 128.

My question is "how to clean up this array?"

Should I try to delete the specific files I know have discrepancies
and recopy them from the backup? Does that cure the mismatches in the
space occupied by those files?

I have seen a post where a user filled the filesystem with a file of
zeros and then deleted that file, to deal with mismatches in free space.
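
(As far as I understood that post, the idea was roughly the following; I
have not tried it myself, and the mountpoint below is simply my own:

  # fill the mounted filesystem's free space with zeros (dd stops when full),
  # so every free-space stripe gets rewritten with freshly computed parity
  $ sudo dd if=/dev/zero of=/mnt/backup/zerofill bs=1M || true
  $ sync

  # then remove the fill file to give the space back
  $ sudo rm /mnt/backup/zerofill
)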

What strategy should one take when it's clear that there's been a
limited amount of bitrot?

Thanks
David

PS Detailed information is attached below

-------------------------------------------------------------------

$ uname -a
Linux xxxxxxx 6.8.11-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Sun May 26 20:05:41 UTC 2024 x86_64 GNU/Linux

$ mdadm --version
mdadm - v4.2 - 2021-12-30

$ more /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md127 : active raid5 sdi1[3] sdk1[1] sdl1[0] sdj1[2] sdh1[5]
      15627542528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 0/30 pages [0KB], 65536KB chunk

unused devices: <none>

$ more /sys/block/md127/md/mismatch_cnt 
128

checking operation:

$ more dmesg

[518371.195611] md/raid:md127: device sdi1 operational as raid disk 3
[518371.195621] md/raid:md127: device sdk1 operational as raid disk 1
[518371.195625] md/raid:md127: device sdl1 operational as raid disk 0
[518371.195627] md/raid:md127: device sdj1 operational as raid disk 2
[518371.195630] md/raid:md127: device sdh1 operational as raid disk 4
[518371.197612] md/raid:md127: raid level 5 active with 5 out of 5 devices, algorithm 2
[518371.233040] md127: detected capacity change from 0 to 31255085056
[518615.655340] SGI XFS with ACLs, security attributes, realtime, scrub, quota, no debug enabled
[518615.661545] XFS (md127): Deprecated V4 format (crc=0) will not be supported after September 2030.
[518615.661970] XFS (md127): Mounting V4 Filesystem 134d3d10-3a73-462d-bf7f-03b2310638c1
[518616.108155] XFS (md127): Starting recovery (logdev: internal)
[518616.182117] XFS (md127): Ending recovery (logdev: internal)
[518616.182357] XFS (md127): Unmounting Filesystem 134d3d10-3a73-462d-bf7f-03b2310638c1
[518633.338736] XFS (md127): Mounting V4 Filesystem 134d3d10-3a73-462d-bf7f-03b2310638c1
[518633.740966] XFS (md127): Ending clean mount
[525118.638537] md: data-check of RAID array md127
[560647.736826] perf: interrupt took too long (6462 > 6453), lowering kernel.perf_event_max_sample_rate to 30000
[757745.588678] md127: mismatch sector in range 3574914288-3574914296
[757745.588690] md127: mismatch sector in range 3574914296-3574914304
[757748.955261] md127: mismatch sector in range 3575062536-3575062544
[757827.106584] md127: mismatch sector in range 3576178688-3576178696
[779366.372926] md127: mismatch sector in range 3907250080-3907250088
[779383.573705] md127: mismatch sector in range 3907600576-3907600584
[820930.145928] md127: mismatch sector in range 4559852464-4559852472
[820930.145940] md127: mismatch sector in range 4559852472-4559852480
[820930.145943] md127: mismatch sector in range 4559852480-4559852488
[820930.145946] md127: mismatch sector in range 4559852488-4559852496
[820930.145948] md127: mismatch sector in range 4559852496-4559852504
[820930.145953] md127: mismatch sector in range 4559852504-4559852512
[820930.145955] md127: mismatch sector in range 4559852512-4559852520
[820930.145958] md127: mismatch sector in range 4559852520-4559852528
[820930.145960] md127: mismatch sector in range 4559852528-4559852536
[820930.145963] md127: mismatch sector in range 4559852536-4559852544
[1024770.015887] md: md127: data-check done.


$ sudo mdadm --examine /dev/sdi
/dev/sdi:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdj
/dev/sdj:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdk
/dev/sdk:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdl
/dev/sdl:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdh
/dev/sdh:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdi1
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 954b2546:5c467e9c:a4eb74e3:27dad837
           Name : impala:0
  Creation Time : Fri May 22 15:32:31 2015
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
     Array Size : 15627542528 KiB (14.55 TiB 16.00 TB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 2e1a57ff:f892fb23:1f698390:53dd98e3

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Jun 13 06:44:05 2024
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 2f92510e - correct
         Events : 209201

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdj1
/dev/sdj1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 954b2546:5c467e9c:a4eb74e3:27dad837
           Name : impala:0
  Creation Time : Fri May 22 15:32:31 2015
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
     Array Size : 15627542528 KiB (14.55 TiB 16.00 TB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 3be7bbb4:4e5f07e3:f78f3c31:5bd6df6b

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Jun 13 06:44:05 2024
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : cd030a4f - correct
         Events : 209201

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdk1
/dev/sdk1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 954b2546:5c467e9c:a4eb74e3:27dad837
           Name : impala:0
  Creation Time : Fri May 22 15:32:31 2015
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
     Array Size : 15627542528 KiB (14.55 TiB 16.00 TB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 2b09eed0:0a6ead54:48671d28:0abd1b6e

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Jun 13 06:44:05 2024
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 53f7fcb2 - correct
         Events : 209201

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdl1
/dev/sdl1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 954b2546:5c467e9c:a4eb74e3:27dad837
           Name : impala:0
  Creation Time : Fri May 22 15:32:31 2015
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
     Array Size : 15627542528 KiB (14.55 TiB 16.00 TB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 324b49de:233d8769:7f75afad:dddb0ec8

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Jun 13 06:44:05 2024
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 55b23724 - correct
         Events : 209201

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdh1
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 954b2546:5c467e9c:a4eb74e3:27dad837
           Name : impala:0
  Creation Time : Fri May 22 15:32:31 2015
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
     Array Size : 15627542528 KiB (14.55 TiB 16.00 TB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : f0cda836:8c1c28d1:53710d20:db8d088a

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Jun 13 06:44:05 2024
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4a0a4721 - correct
         Events : 209201

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

$ lsblk
NAME                            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                               8:0    0 476.9G  0 disk  
├─sda1                            8:1    0    10G  0 part  /boot
└─sda2                            8:2    0 466.9G  0 part  
  └─fedora_localhost--live-root 253:0    0 466.9G  0 lvm   /
sdb                               8:16   0   7.3T  0 disk  
└─sdb1                            8:17   0   7.3T  0 part  
sdc                               8:32   0   1.8T  0 disk  
└─sdc1                            8:33   0   1.8T  0 part  
sdd                               8:48   0   7.3T  0 disk  
└─sdd1                            8:49   0   7.3T  0 part  
sde                               8:64   0   1.8T  0 disk  
└─sde1                            8:65   0   1.8T  0 part  
sdf                               8:80   1     0B  0 disk  
sdg                               8:96   1     0B  0 disk  
sdh                               8:112  0   3.6T  0 disk  
└─sdh1                            8:113  0   3.6T  0 part  
  └─md127                         9:127  0  14.6T  0 raid5 /mnt/backup
sdi                               8:128  0   3.6T  0 disk  
└─sdi1                            8:129  0   3.6T  0 part  
  └─md127                         9:127  0  14.6T  0 raid5 /mnt/backup
sdj                               8:144  0   3.6T  0 disk  
└─sdj1                            8:145  0   3.6T  0 part  
  └─md127                         9:127  0  14.6T  0 raid5 /mnt/backup
sdk                               8:160  0   7.3T  0 disk  
└─sdk1                            8:161  0   3.6T  0 part  
  └─md127                         9:127  0  14.6T  0 raid5 /mnt/backup
sdl                               8:176  0   3.6T  0 disk  
└─sdl1                            8:177  0   3.6T  0 part  
  └─md127                         9:127  0  14.6T  0 raid5 /mnt/backup
sr0                              11:0    1  1024M  0 rom   
sr1                              11:1    1  1024M  0 rom   
zram0                           252:0    0     8G  0 disk  [SWAP]






* Re: Cleaning up a Raid5 after discrepancies discovered
From: Roman Mamedov @ 2024-06-13 19:55 UTC (permalink / raw)
  To: dfc; +Cc: linux-raid@vger.kernel.org

On Thu, 13 Jun 2024 12:36:50 -0400
dfc <chernoff@astro.cornell.edu> wrote:

> I noticed some data inconsistencies in my raid5 (5 disks, 3.6T per
> disk) and discovered via smartmontools that one disk was about to fail
> (many reallocated sectors). mismatch_cnt was approximately 128 at this
> point.
> I don't have a spare 6th disk in the setup.
> 
> I dd'd the failing disk's entire contents (including partition table)
> to a new (8T) disk and inserted it in the array. The new configuration
> was recognized without problems. I ran check without mounting the file
> system. This completed (I failed to check dmesg to see how many
> inconsistencies it found). I mounted the file system and things seemed
> OK.
> 
> Next I did a diff with respect to a backup (unfortunately a close but
> not perfect backup). There were definitely some differences within
> some binary files.

If I'm not mistaken, regular RAID5 cannot protect against data corruption: if
the content of one RAID member becomes corrupt (but still readable), md cannot
tell which member holds the wrong data, so "repairing" the affected stripe's
consistency will likely damage the user data.

If you know one disk has corrupted content, you may be better off removing
that one from the array ASAP, and putting in a clean new disk, then rebuilding
onto that from the known-good other RAID members. (Of course then you take the
usual risk in any RAID5 rebuild, of another drive failing...)
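
Roughly like this, if you go that route (the device names below are only
placeholders; double-check everything against your own layout first):

  # mark the member with the corrupt content as failed, then remove it
  mdadm /dev/md127 --fail /dev/sdX1
  mdadm /dev/md127 --remove /dev/sdX1

  # add the fresh disk (partitioned the same way) and let md rebuild it
  # from the data and parity on the remaining, known-good members
  mdadm /dev/md127 --add /dev/sdY1
  cat /proc/mdstat    # watch the recovery progress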

Meanwhile RAID6 supposedly can do better and detect which disk had the wrong
content, but I remember reading something to the effect that this math may or
may not have been implemented in mdadm RAID yet.

To protect from data corruption you need a RAID coupled with a checksumming
filesystem, like Btrfs or ZFS. But Btrfs RAID5/6 are not mature and not
recommended for use.
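
For example, just as a sketch of what that buys you (not a suggestion to
convert this particular array):

  # two-device Btrfs with redundant data and metadata; every block is checksummed
  mkfs.btrfs -d raid1 -m raid1 /dev/sdX /dev/sdY
  mount /dev/sdX /mnt/pool

  # a scrub reads everything, verifies checksums and repairs from the good copy
  btrfs scrub start -B /mnt/pool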

> My question is "how to clean up this array?"
> 
> Should I try to delete the specific files I know have discrepancies
> and recopy them from the backup? Does that cure the mismatches in the
> space occupied by those files?

I would say yes, barring some corner case I'm missing. Writing new data will
write the new and consistent stripe content for that data on all disks
including the problematic one.
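
So for the files you already know are bad, something along these lines
should do (the paths are placeholders, only /mnt/backup is your actual
mountpoint):

  # replace the damaged copy with the good one from the backup
  rm /mnt/backup/path/to/damaged-file
  cp -a /your/backup/path/to/damaged-file /mnt/backup/path/to/

  # then re-run a check and, once it finishes, see whether the count dropped
  echo check > /sys/block/md127/md/sync_action
  cat /sys/block/md127/md/mismatch_cnt   # read after the check completes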

> What strategy should one take when it's clear that there's been a
> limited amount of bitrot?

If you do not use a checksumming filesystem, have a tool like "cfv" store
checksum files in each dir with rarely-modified content (such as a media
library). If you had had those prior to this incident, you could easily
recheck them all and tell which files need to be restored from backups or
reobtained elsewhere, and where you simply have to live with some rot of the
content (which may not be a fatal issue for video files, for example).
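
If cfv isn't handy, plain coreutils can do a cruder version of the same
thing, e.g. one checksum list per tree (the paths here are just examples):

  # record checksums for everything under the media directory
  find /mnt/backup/media -type f -print0 | xargs -0 sha256sum > /root/media.sha256

  # later, verify them and list only the files that no longer match
  sha256sum -c --quiet /root/media.sha256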

-- 
With respect,
Roman

