From: NeilBrown <neilb@suse.de>
To: clplayer <cl.player@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Content Of Files May Be Changed After One Disk Is Failed In RAID5
Date: Fri, 7 Sep 2012 12:33:48 +1000
Message-ID: <20120907123348.798dfc28@notabene.brown>
In-Reply-To: <CAOHz1948UduxDDvpA33T9BaM4QoMpwF1wAQKjY3UgSSOOy6k8g@mail.gmail.com>

On Fri, 7 Sep 2012 09:40:18 +0800 clplayer <cl.player@gmail.com> wrote:

> I am stressing the RAID5 functions on my desktop.
> 
> I installed 8 hard disks, of which 4 were on the internal SATA ports and
> the others were connected via eSATA.
> 
> The operating system on the desktop is Ubuntu 12.04.1 LTS 64-bit.
> 
> I have made a script to check the files in the raid while disks are
> being failed.
> 
> The actions are as below:
> 
> 1. creating an 8-disk raid, one of the 8 disks is set as the spare.
> 2. making a ext4 file system on the raid and mounting that raid.
> 3. generating a file from /dev/urandom in the root file system, and
> the size of the file is 1GB.
> 4. calculating the checksum of the file by the command "cksum."
> 5. making 10 duplicates of the file and store in the raid, and then
> calculating the checksums of each duplicate.
> 6. marking one of the disks in the raid as failed after the 10
> duplicates are stored and checked.
> 7. immediately recalculating the checksums of the duplicates in parallel.
> 
> Curiously, there are usually several files changed and the checksums
> are not consistent.
> 
> Then I tried the same scenario with an 8-disk raid with no spare, and
> the result is the same.
> 
> I have also tried with RAID1 and RAID6, and the checksums are
> consistent with the two algorithms.
> 
> It looks like there is something wrong within the raid5 functions. I
> am tracing the file raid5.c but I cannot figure out the root causes yet.
> 
> Would someone please suggest any ideas? Thank you very much.
> 
> My script is attached below:
> 
> #!/bin/sh
> 
> TESTSEQ="0 1 2 3 4 5 6 7 8 9"
> 
> mdadm --create /dev/md0 --level=raid5 --raid-devices=7 \
>     --spare-devices=1 /dev/sd[a-h]3 --assume-clean -z 10485760 -f -R

--assume-clean is not safe with RAID5 unless the array actually is clean.
It is safe with RAID1 and RAID6 due to details of the specific implementation.
So I suspect that is the cause of the corruption.
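
To illustrate why an unclean RAID5 only misbehaves once a disk is gone, here
is a toy sketch of a single stripe in plain sh. It is not the md code: each
block is modelled as one small integer, the values are arbitrary, and the
read-modify-write parity update is an assumption about how a partial-stripe
write may refresh parity, not something taken from raid5.c. The names
xor_all, stale_parity and so on are hypothetical.

```shell
#!/bin/sh
# Toy model of one RAID5 stripe: three data blocks plus one parity block,
# each block represented by a single small integer.

# XOR all arguments together and print the result.
xor_all() {
    acc=0
    for v in "$@"; do
        acc=$(( acc ^ v ))
    done
    echo "$acc"
}

d0=23
d1=42
d2=99

# Parity as a completed initial sync would leave it.
good_parity=$(xor_all "$d0" "$d1" "$d2")

# Parity as --assume-clean may leave it on disks that were not really clean:
# whatever happened to be there. 7 is an arbitrary stand-in for stale data.
stale_parity=7

# With every disk present, d1 is read directly and parity is never consulted,
# so both arrays look identical to the file system.
echo "healthy read of d1:            $d1"

# Once the disk holding d1 has failed, d1 is rebuilt from the survivors.
echo "d1 rebuilt, good parity:       $(xor_all "$d0" "$d2" "$good_parity")"
echo "d1 rebuilt, stale parity:      $(xor_all "$d0" "$d2" "$stale_parity")"

# Assumed read-modify-write update: new parity = old parity ^ old data ^ new data.
# An error already present in the old parity is carried into the new parity.
new_d1=55
rmw_from_good=$(xor_all "$good_parity" "$d1" "$new_d1")
rmw_from_stale=$(xor_all "$stale_parity" "$d1" "$new_d1")

echo "new d1 rebuilt, good parity:   $(xor_all "$d0" "$d2" "$rmw_from_good")"
echo "new d1 rebuilt, stale parity:  $(xor_all "$d0" "$d2" "$rmw_from_stale")"
```

In this model, writing new data on top of bad parity does not repair it, which
would match checksums that are fine while all disks are present and wrong only
after one is failed.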

NeilBrown

> 
> mkfs.ext4 /dev/md0
> 
> mount /dev/md0 /mnt
> 
> #duplicating the source file and calculating the checksum
> for ITEM in $TESTSEQ
> do
>         echo "copying 1Gr.${ITEM}..."
>         cp /1Gr /mnt/1Gr.${ITEM}
> 
>         cksum /mnt/1Gr.${ITEM} >> /tmp/cksum_org.${ITEM}
>         cat /tmp/cksum_org.${ITEM} | while read tmpline
>         do
>                 orgcksum=${tmpline%% *}
>                 echo "checksum is ${orgcksum}"
>         done
> done
> 
> sync
> 
> sleep 10
> 
> mdadm -f /dev/md0 /dev/sdb3
> 
> echo "producing checksum..."
> for ITEM in $TESTSEQ
> do
>         cksum /mnt/1Gr.${ITEM} > /tmp/cksum_out.${ITEM} &
> done
> 
> #wait for the 10 cksum process being done
> sleep 120
> 
> echo "checking the result..."
> for ITEM in $TESTSEQ
> do
>         cat /tmp/cksum_out.${ITEM} | while read line
>         do
>                 item=${line%% *}
> 
> 		#the value 2606882893 was pre-calculated manually
>                 if [ x"$item" != "x2606882893" ]
>                 then
>                         echo "get wrong cksum on ${ITEM}"
>                 else
>                         rm /tmp/cksum_out.${ITEM}
>                 fi
>         done
> done
> 
> Thanks.
> Peng.
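
Separately from the parity question, the fixed "sleep 120" in the quoted
script only guesses when the background cksum jobs have finished. A minimal
sketch of the same check using the shell's "wait" builtin follows. The
function name check_copies and its argument layout are hypothetical, and it
assumes mktemp(1) is available.

```shell
#!/bin/sh
# Hypothetical helper: checksum every file given, in parallel, and print the
# names of files whose checksum differs from the expected one.
# $1 = expected checksum, remaining arguments = files to check.
check_copies() {
    expected=$1
    shift
    tmpdir=$(mktemp -d) || return 1

    n=0
    for f in "$@"; do
        # Reading from stdin keeps the file name out of the cksum output.
        cksum < "$f" > "$tmpdir/$n" &
        n=$((n + 1))
    done

    # Block until every background cksum has exited.
    wait

    n=0
    bad=0
    for f in "$@"; do
        # First field is the checksum, the rest (byte count) is ignored.
        read -r sum rest < "$tmpdir/$n"
        if [ "$sum" != "$expected" ]; then
            echo "wrong cksum on $f"
            bad=1
        fi
        n=$((n + 1))
    done

    rm -rf "$tmpdir"
    return "$bad"
}
```

With the value from the quoted script this could be called as
"check_copies 2606882893 /mnt/1Gr.*" right after failing the disk.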

