raid1 (re)-add recovery data corruption

From: Yi Zhang <yizhan@redhat.com>
To: linux-raid@vger.kernel.org
Cc: Xiao Ni <xni@redhat.com>, jes sorensen <jes.sorensen@redhat.com>,
	Yi Zhang <yizhan@redhat.com>
Subject: raid1 (re)-add recovery data corruption
Date: Thu, 30 Jul 2015 07:35:10 -0400 (EDT)
Message-ID: <705099586.901961.1438256110567.JavaMail.zimbra@redhat.com>
In-Reply-To: <543799034.898896.1438255427047.JavaMail.zimbra@redhat.com>

Hi Neil,
I observed data corruption in a raid1 test after re-adding a failed device. The test environment, reproduction steps, and logs are below; please take a look.

Kernel-version: 4.2.0-rc3
Test-steps:
1. Create a 2GB file named bigfile in the working directory.
2. Run the script below.
#!/bin/bash
Create_Loop()
{
for i in `seq 0 7`;do
        dd if=/dev/zero of=/tmp/$i.tmp bs=1M count=3000 &
done
wait
for i in `seq 0 7`;do
        losetup /dev/loop$i /tmp/$i.tmp
done
}
Prepare()
{
mdadm --create --run /dev/md0 --level 1 --metadata 1.2 --raid-devices 8 /dev/loop[0-7] --chunk 512 --bitmap=internal --bitmap-chunk=64M
mdadm --wait /dev/md0
mkfs.ext4  /dev/md0
mkdir /mnt/fortest
mount /dev/md0 /mnt/fortest
md5sum bigfile  >md5sum1
}
Create_Loop
Prepare
cnt=0
while [ 1 ]; do
        echo "-----------------------------------------------------$cnt"
        cp bigfile /mnt/fortest &
        sleep 10
        mdadm /dev/md0 -f /dev/loop0
        sleep 5
        # Retry the hot-remove until it succeeds. Testing $? inside
        # "while [ 1 ]" would only see the status of "[ 1 ]", so loop
        # on the mdadm command itself.
        until mdadm /dev/md0 -r /dev/loop0; do
                sleep 5
        done
        sleep 30
        mdadm /dev/md0 -a /dev/loop0
        wait
        echo "cp done"
        mdadm --wait /dev/md0
        echo "recovery done"
        md5sum /mnt/fortest/bigfile > md5sum2
        tmp1=`awk '{print $1}' ./md5sum1`
        tmp2=`awk '{print $1}' ./md5sum2`
        echo $tmp1 > a
        echo $tmp2 > b
        diff a b        # data corruption is observed here
        if [ $? -ne 0 ]; then
                echo "There are some date corruption, cnt is $cnt"
                exit 1
        fi
        ((cnt++))
        rm -rf /mnt/fortest/bigfile
done
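
Step 1 does not say how bigfile was created. A minimal sketch of one way to do it follows. The helper name make_bigfile, its parameters, and the choice of /dev/urandom as the data source are assumptions for illustration, not part of the original test.

```shell
#!/bin/bash
# Hypothetical helper (not from the original report): create the input
# file for step 1. Random data is an assumption; the report only says
# the file is 2GB.
make_bigfile()
{
        local out=${1:-bigfile}     # output path, defaults to ./bigfile
        local size_mb=${2:-2048}    # size in MiB, defaults to 2GB
        # iflag=fullblock (GNU dd) avoids short reads from /dev/urandom.
        dd if=/dev/urandom of="$out" bs=1M count="$size_mb" iflag=fullblock 2>/dev/null
}

# Example: make_bigfile bigfile 2048
```

Calling make_bigfile with no arguments writes a 2GB file named bigfile into the current directory, which is where the script above expects it.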


Kernel-Log:
[ 1113.577378] loop: module loaded
[ 1290.190065] md: bind<loop0>
[ 1290.193214] md: bind<loop1>
[ 1290.196387] md: bind<loop2>
[ 1290.199542] md: bind<loop3>
[ 1290.202704] md: bind<loop4>
[ 1290.205854] md: bind<loop5>
[ 1290.209003] md: bind<loop6>
[ 1290.212170] md: bind<loop7>
[ 1290.229799] md: raid1 personality registered for level 1
[ 1290.235946] md/raid1:md0: not clean -- starting background reconstruction
[ 1290.243515] md/raid1:md0: active with 8 out of 8 mirrors
[ 1290.249449] created bitmap (1 pages) for device md0
[ 1290.254927] md0: bitmap initialized from disk: read 1 pages, set 47 of 47 bits
[ 1290.328736] md0: detected capacity change from 0 to 3143630848
[ 1290.335316] md: resync of RAID array md0
[ 1290.339689] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1290.346192] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 1290.356702] md: using 128k window, over a total of 3069952k.
[ 1640.101181] md: md0: resync done.
[ 1668.352287] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
[ 1681.845966] md/raid1:md0: Disk failure on loop0, disabling device.
[ 1681.845966] md/raid1:md0: Operation continuing on 7 devices.
[ 1844.296614] md: unbind<loop0>
[ 1844.302013] md: export_rdev(loop0)
[ 1874.363488] md: bind<loop0>
[ 1874.566435] md: recovery of RAID array md0
[ 1874.571006] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1874.577514] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 1874.588224] md: using 128k window, over a total of 3069952k.
[ 1889.487210] md: md0: recovery done.
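
As a sanity check on the log above, the "47 of 47 bits" figure is consistent with the array size and the 64M bitmap chunk, assuming one bitmap bit per bitmap chunk. A short sketch of the arithmetic (variable names are illustrative):

```shell
#!/bin/bash
# Values taken from the kernel log and the mdadm --create command above.
size_kib=3069952                # "over a total of 3069952k"
chunk_kib=$((64 * 1024))        # --bitmap-chunk=64M expressed in KiB

# Ceiling division: the last, partial chunk still needs its own bit.
bits=$(( (size_kib + chunk_kib - 1) / chunk_kib ))
echo "bitmap bits: $bits"       # prints "bitmap bits: 47"
```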


Test Log:
-----------------------------------------------------0
mdadm: set /dev/loop0 faulty in /dev/md0
mdadm: hot removed /dev/loop0 from /dev/md0
mdadm: re-added /dev/loop0
cp done
recovery done
-----------------------------------------------------1
mdadm: set /dev/loop0 faulty in /dev/md0
mdadm: hot removed /dev/loop0 from /dev/md0
mdadm: re-added /dev/loop0
cp done
recovery done
1c1
< c4eddcf325ba5741d37f164750412619
---
> 4444f8bbfb1d22f1731fb5b0c846ef8a
There are some date corruption, cnt is 1
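
Since the script exits without deleting /mnt/fortest/bigfile when the checksums differ, the two copies can still be compared byte by byte. The following is only a sketch of a possible follow-up step, not something that was run for this report. The helper name diff_summary is hypothetical, and the offsets it prints are file offsets, which do not map directly to device sectors or bitmap chunks.

```shell
#!/bin/bash
# Hypothetical follow-up helper (not part of the original test).
# Prints how many bytes differ and the first and last differing offsets.
diff_summary()
{
        # cmp -l prints one line per differing byte: offset, octal, octal.
        cmp -l "$1" "$2" 2>/dev/null | awk '
                NR == 1 { first = $1 }
                        { last = $1 }
                END {
                        if (NR == 0)
                                print "identical"
                        else
                                print NR " differing bytes, offsets " first ".." last " (1-based)"
                }'
}

# Example: diff_summary bigfile /mnt/fortest/bigfile
```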



Best Regards,
 Yi Zhang
