public inbox for linux-kernel@vger.kernel.org
From: NeilBrown <neilb@suse.de>
To: clplayer <cl.player@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Content Of Files May Be Changed After One Disk Is Failed In RAID5
Date: Fri, 7 Sep 2012 12:33:48 +1000
Message-ID: <20120907123348.798dfc28@notabene.brown>
In-Reply-To: <CAOHz1948UduxDDvpA33T9BaM4QoMpwF1wAQKjY3UgSSOOy6k8g@mail.gmail.com>

On Fri, 7 Sep 2012 09:40:18 +0800 clplayer <cl.player@gmail.com> wrote:

> I am stressing the RAID5 functions on my desktop.
> 
> I installed 8 hard disks, 4 of which are on the internal SATA ports;
> the other 4 are connected via eSATA.
> 
> The operating system on the desktop is Ubuntu 12.04.1 LTS 64-bit.
> 
> I have written a script to check the files in the raid while
> disks are failing.
> 
> The actions are as below:
> 
> 1. creating an 8-disk raid, with one of the 8 disks set as the spare.
> 2. making an ext4 file system on the raid and mounting it.
> 3. generating a 1GB file from /dev/urandom in the root file system.
> 4. calculating the checksum of the file with the command "cksum".
> 5. making 10 duplicates of the file in the raid, and then
> calculating the checksum of each duplicate.
> 6. marking one of the disks in the raid as failed after the 10
> duplicates are stored and checked.
> 7. immediately calculating the checksums of the duplicates again, in parallel.
> 
> Curiously, several files have usually changed: their checksums
> no longer match the originals.
> 
> Then I tried the same scenario with an 8-disk raid with no spare, and
> the result is the same.
> 
> I have also tried RAID1 and RAID6, and with those two levels the
> checksums stay consistent.
> 
> It looks like there is something wrong within the raid5 functions. I
> am tracing the file raid5.c but I cannot figure out the root causes yet.
> 
> Would someone please suggest any ideas? Thank you very much.
> 
> My script is attached below:
> 
> #!/bin/sh
> 
> TESTSEQ="0 1 2 3 4 5 6 7 8 9"
> 
> mdadm --create /dev/md0 --level=raid5 --raid-devices=7 \
>     --spare-devices=1 /dev/sd[a-h]3 --assume-clean -z 10485760 -f -R

--assume-clean is not safe with RAID5 unless the array actually is clean.
It is safe with RAID1 and RAID6 due to details of the specific implementation.
So I suspect that is the cause of the corruption.
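
To illustrate why stale parity matters, here is a minimal sketch, not the
md code itself. It models one stripe with three data blocks and one XOR
parity block, each shrunk to a single integer. The values, the variable
names and the stale parity of 0 are all made up for the example, and it
assumes the write is done as a read-modify-write (patching the old parity)
rather than recomputing parity from the whole stripe.

```shell
#!/bin/sh
# Toy model of one RAID5 stripe: three data blocks plus one XOR parity block.
# Each "block" is a single small integer; the values are arbitrary.
d0=17
d1=42
d2=99

# With --assume-clean no initial resync runs, so parity is whatever was
# already on the disk. Assumption for this sketch: a stale parity of 0.
# A clean stripe would hold d0 ^ d1 ^ d2 instead.
p=0

# Read-modify-write update of d0: the new parity is derived from the old
# parity, so any existing inconsistency is carried forward.
new_d0=200
p=$(( p ^ d0 ^ new_d0 ))
d0=$new_d0

# The disk holding d1 fails; rebuild its contents from the survivors.
rebuilt_d1=$(( p ^ d0 ^ d2 ))

echo "stored d1=$d1 rebuilt d1=$rebuilt_d1"
if [ "$rebuilt_d1" -ne "$d1" ]; then
    echo "mismatch: a degraded read would return wrong data"
fi
```

As long as all disks are present, d1 is read directly and looks fine; the
bad parity only shows up once a disk is failed and its data has to be
reconstructed, which matches the symptom reported above.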

NeilBrown

> 
> mkfs.ext4 /dev/md0
> 
> mount /dev/md0 /mnt
> 
> #duplicating the source file and calculating the checksum
> for ITEM in $TESTSEQ
> do
>         echo "copying 1Gr.${ITEM}..."
>         cp /1Gr /mnt/1Gr.${ITEM}
> 
>         cksum /mnt/1Gr.${ITEM} >> /tmp/cksum_org.${ITEM}
>         cat /tmp/cksum_org.${ITEM} | while read tmpline
>         do
>                 orgcksum=${tmpline%% *}
>                 echo "checksum is ${orgcksum}"
>         done
> done
> 
> sync
> 
> sleep 10
> 
> mdadm -f /dev/md0 /dev/sdb3
> 
> echo "producing checksum..."
> for ITEM in $TESTSEQ
> do
>         cksum /mnt/1Gr.${ITEM} > /tmp/cksum_out.${ITEM} &
> done
> 
> #wait for the 10 background cksum processes to finish
> wait
> 
> echo "checking the result..."
> for ITEM in $TESTSEQ
> do
>         cat /tmp/cksum_out.${ITEM} | while read line
>         do
>                 item=${line%% *}
> 
>                 #the value 2606882893 was pre-calculated manually
>                 if [ x"$item" != "x2606882893" ]
>                 then
>                         echo "get wrong cksum on ${ITEM}"
>                 else
>                         rm /tmp/cksum_out.${ITEM}
>                 fi
>         done
> done
> 
> Thanks.
> Peng.



