linux-raid.vger.kernel.org archive mirror
From: Peter Rabbitson <rabbit+list@rabbit.us>
To: linux-raid@vger.kernel.org
Subject: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)
Date: Mon, 28 Jan 2008 11:05:26 +0100
Message-ID: <479DA8E6.6040209@rabbit.us>

[-- Attachment #1: Type: text/plain, Size: 2719 bytes --]

Hello,

It seems that mdadm/md do not perform proper sanity checks before adding a 
component to a degraded array. If the new component has just the right size 
(in the test below, the original size minus the data offset), the superblock 
will overlap with the data area. This happens without any error indication in 
the syslog or elsewhere.
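For illustration, the missing sanity check amounts to a size comparison: a replacement component must have room for the data offset used by the existing members plus the per-device data size. The sketch below assumes that is the right requirement; `component_fits` is a hypothetical helper, not part of mdadm. The example values come from the test that follows: a 16-sector data offset and 1024 KiB of data per member (the 2048-block array spread over two data disks).

```shell
#!/bin/bash
# Hypothetical helper (not mdadm code): succeeds only if a device of NEW_BYTES
# can hold DATA_OFFSET_SECTORS of leading metadata area plus COMPONENT_KIB of data.
component_fits() {
    local new_bytes=$1 data_offset_sectors=$2 component_kib=$3
    local required=$(( data_offset_sectors * 512 + component_kib * 1024 ))
    if (( new_bytes < required )); then
        echo "too small: $new_bytes < $required bytes" >&2
        return 1
    fi
    return 0
}

component_fits 1056768 16 1024 && echo "original component size fits"
component_fits 1048576 16 1024 || echo "re-created component size rejected"
```

With these values the original 1056768-byte component passes, while the re-created 1048576-byte one falls 8192 bytes (16 sectors) short and is rejected.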

I came up with a reproducible scenario; the complete test script is attached 
to this email. I have not tested other raid levels or other superblock types, 
but I suspect the same problem will occur in many other configurations.

I am willing to test patches. That said, the attached script is non-intrusive 
enough to be run on any machine.

The output of the script follows below.

Peter

======================================================================
======================================================================
======================================================================

root@Thesaurus:/media/space/testmd# ./md_overlap_test
Creating component 1 (1056768 bytes)... done.
Creating component 2 (1056768 bytes)... done.
Creating component 3 (1056768 bytes)... done.


===============================================================
Creating 3 disk raid5 array with v1.1 superblock
mdadm: array /dev/md9 started.
Waiting for resync to finish... done.

md9 : active raid5 loop3[3] loop2[1] loop1[0]
       2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Initial checksum of raw raid5 device: 4df1921524a3b717a956fceaed0ae691  /dev/md9


===============================================================
Failing first component
mdadm: set /dev/loop1 faulty in /dev/md9
mdadm: hot removed /dev/loop1

md9 : active raid5 loop3[3] loop2[1]
       2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/2] [_UU]

Checksum of raw raid5 device after failing component: 4df1921524a3b717a956fceaed0ae691  /dev/md9


===============================================================
Re-creating block device with size 1048576 bytes, so both the superblock and data start at the same spot
Adding back to array
mdadm: added /dev/loop1
Waiting for resync to finish... done.

md9 : active raid5 loop1[4] loop3[3] loop2[1]
       2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Checksum of raw raid5 device after adding back smaller component: bb854f77ad222d224fcdd8c8f96b51f0  /dev/md9


===============================================================
Attempting recovery
Waiting for recovery to finish... done.
Performing check
Waiting for check to finish... done.

Current value of mismatch_cnt: 0

Checksum of raw raid5 device after repair/check: 146f5c37305c42cda64538782c8c3794  /dev/md9
root@Thesaurus:/media/space/testmd#
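Against a live array, the same comparison could be made from sysfs before running `mdadm -a`. This is a sketch under stated assumptions: `md/rdN/offset` is in 512-byte sectors (as the script below treats it), `md/component_size` is in KiB (consistent with the 1024 KiB per member implied by the 2048-block array above; please verify on your kernel), and the new device must accommodate the existing members' data offset. `preflight_add` and the `SYSFS_ROOT` override are hypothetical; `blockdev --getsize64` is the util-linux command that prints a device's size in bytes.

```shell
#!/bin/bash
# Hypothetical pre-flight check to run before "mdadm -a /dev/MD NEWDEV".
# Assumptions (verify on your kernel):
#   md/rd*/offset      data offset of each member, in 512-byte sectors
#   md/component_size  per-device data size, in KiB
# SYSFS_ROOT can be overridden to point at a fake tree for testing.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

preflight_add() {
    local md=$1 new_bytes=$2
    local mddir="$SYSFS_ROOT/block/$md/md"
    local offset=0 o f kib required

    # Take the largest data offset among the current members.
    for f in "$mddir"/rd*/offset; do
        [ -r "$f" ] || continue
        o=$(<"$f")
        if (( o > offset )); then offset=$o; fi
    done

    kib=$(<"$mddir/component_size") || return 2
    required=$(( offset * 512 + kib * 1024 ))
    if (( new_bytes < required )); then
        echo "refusing: new device has $new_bytes bytes, $required needed" >&2
        return 1
    fi
    echo "ok: $new_bytes >= $required bytes"
}

# Example on a real system (as root):
#   preflight_add md9 "$(blockdev --getsize64 /dev/loop1)" && mdadm -a /dev/md9 /dev/loop1
```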

[-- Attachment #2: md_overlap_test --]
[-- Type: text/plain, Size: 3212 bytes --]

#!/bin/bash

echo "Please read the script first, and comment out the exit line at the top."
echo "This script requires about 3MB of free space; it will free (and use)"
echo "loop devices 1, 2 and 3, and will use the md device specified in MD_DEV."
exit 0

MD_DEV="md9"    # make sure this is not an array you use
COMP_NUM=3
COMP_SIZE=$((1 * 1024 * 1024 + 8192))     # 1 MiB components plus 8 KiB (16 sectors) of room for metadata

mdadm -S /dev/$MD_DEV &>/dev/null

DEVS=""
for i in $(seq $COMP_NUM); do
    echo -n "Creating component $i ($COMP_SIZE bytes)... "
    losetup -d /dev/loop${i} &>/dev/null
    
    set -e
    PCMD="print \"\\x${i}${i}\" x $COMP_SIZE"   # fill the entire image with byte 0x${i}${i} (0x11 for component 1, 0x22 for 2, ...)
    perl -e "$PCMD" > dummy${i}.img
    losetup /dev/loop${i} dummy${i}.img
    DEVS="$DEVS /dev/loop${i}"
    set +e
    echo "done."
done

echo
echo
echo "==============================================================="
echo "Creating $COMP_NUM disk raid5 array with v1.1 superblock"
# superblock at beginning of blockdev guarantees that it will overlap with real data, not with parity
mdadm -C /dev/$MD_DEV -l 5 -n $COMP_NUM -e 1.1 $DEVS

echo -n "Waiting for resync to finish..."
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ] ; do
    echo -n "."
    sleep 1
done
echo " done."
echo
grep -A1 $MD_DEV /proc/mdstat 

echo
echo -n "Initial checksum of raw raid5 device: "
md5sum /dev/$MD_DEV

echo
echo
echo "==============================================================="
echo "Failing first component"
mdadm -f /dev/$MD_DEV /dev/loop1
mdadm -r /dev/$MD_DEV /dev/loop1

echo
grep -A1 $MD_DEV /proc/mdstat 

echo
echo -n "Checksum of raw raid5 device after failing component: "
md5sum /dev/$MD_DEV

echo
echo
echo "==============================================================="
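# rd1 is the surviving member in role 1 (loop2); all members were created with the same data offset
NEWSIZE=$(( $COMP_SIZE - $(cat /sys/block/$MD_DEV/md/rd1/offset) * 512 ))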
echo "Re-creating block device with size $NEWSIZE bytes, so both the superblock and data start at the same spot"
losetup -d /dev/loop1 &>/dev/null
PCMD="print \"\\x11\" x $NEWSIZE"
perl -e "$PCMD" > dummy1.img
losetup /dev/loop1 dummy1.img

echo "Adding back to array"
mdadm -a /dev/$MD_DEV /dev/loop1

echo -n "Waiting for resync to finish..."
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ] ; do
    echo -n "."
    sleep 1
done
echo " done."

echo
grep -A1 $MD_DEV /proc/mdstat 

echo
echo -n "Checksum of raw raid5 device after adding back smaller component: "
md5sum /dev/$MD_DEV

echo
echo 
echo "==============================================================="
echo "Attempting recovery"
echo repair > /sys/block/$MD_DEV/md/sync_action
echo -n "Waiting for recovery to finish..."
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ] ; do
    echo -n "."
    sleep 1
done
echo " done."

echo "Performing check"
echo check > /sys/block/$MD_DEV/md/sync_action
echo -n "Waiting for check to finish..."
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ] ; do
    echo -n "."
    sleep 1
done
echo " done."

echo
echo -n "Current value of mismatch_cnt: "
cat /sys/block/$MD_DEV/md/mismatch_cnt

echo
echo -n "Checksum of raw raid5 device after repair/check: "
md5sum /dev/$MD_DEV
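The script above polls `md/sync_action` in four identical loops. One reusable helper with a timeout could replace them; `wait_idle` and its timeout argument are hypothetical additions, not part of the original script, and the sysfs file path is passed in so the function can also be exercised on an ordinary file.

```shell
#!/bin/bash
# Hypothetical helper: poll an md sync_action file until it reads "idle".
# Returns 0 once idle, 1 if TIMEOUT seconds (default 600) pass first.
wait_idle() {
    local action_file=$1 timeout=${2:-600} waited=0
    while [ "$(cat "$action_file")" != "idle" ]; do
        if (( waited >= timeout )); then
            echo " timed out" >&2
            return 1
        fi
        echo -n "."
        sleep 1
        waited=$(( waited + 1 ))
    done
    echo " done."
}

# Usage inside the test script:
#   echo -n "Waiting for resync to finish..."
#   wait_idle /sys/block/$MD_DEV/md/sync_action
```

The timeout guards against a sync that never returns to idle, which the original unbounded loops would wait on forever.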


Thread overview: 4+ messages
2008-01-28 10:05 Peter Rabbitson [this message]
2008-01-28 11:34 ` BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too) Neil Brown
2008-01-28 11:54   ` Peter Rabbitson
2008-01-29  0:38 ` Neil Brown
