Re: RAID6 - repeated hot-pulls issue


From: NeilBrown <neilb@suse.de>
To: John Gehring <john.gehring@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID6 - repeated hot-pulls issue
Date: Mon, 5 Dec 2011 17:15:40 +1100
Message-ID: <20111205171540.4fe659e2@notabene.brown>
In-Reply-To: <CALwOXvL6avy3f76TcpMdMxqObCRY-SphRErjo-j1=iun-etBpA@mail.gmail.com>


On Fri, 2 Dec 2011 09:34:40 -0700 John Gehring <john.gehring@gmail.com> wrote:

> I am having trouble with a hot-pull scenario.
> 
> - linux 2.6.38.8
> - LSI 2008 sas
> - RAID6 via md
> - 8 drives (2 TB each)
> 
> Suspect sequence:
> 
> 1 - Create Raid6 array using all 8 drives (/dev/md1). Each drive is
> partitioned identically with two partitions. The second partition of
> each drive is used for the raid set. The size of the partition varies,
> but I have been using a 4GB partition for testing in order to have
> quick re-sync times.
> 2 - Wait for raid re-sync to complete.
> 3 - Start read-only IO against /dev/md1 via following command:  dd
> if=/dev/md1 of=/dev/null bs=1  This step ensures that pulled drives
> are detected by the md.
> 4 - Physically pull a drive from the array.
> 5 - Verify that the md has removed the drive/device from the array.
> mdadm --detail /dev/md1 should show it as faulty and removed from the
> array.
> 6 - Remove the device from the raid array:  mdadm /dev/md1 -r /dev/sd[?]2
> 7 - Re-insert the drive back into the slot.
> 8 - Take a look at dmesg to see what device name has been assigned.
> Typically has the same letter assigned as before.
> 9 - Add the drive back into the raid array: mdadm /dev/md1 -a
> /dev/sd[?]2   Now some folks might say that I should use --re-add, but
> the mdadm documentation states that re-add will be used anyway if the
> system detects that a drive has been 're-inserted'. Additionally, the
> mdadm response to this command shows that an 'add' or 'readd' was
> executed depending on the state of the disk inserted.
> --All is apparently going fine at this point. The add command succeeds
> and cat /proc/mdstat shows the re-sync in progress and it eventually
> finishes.
> --Now for the interesting part.
> 10 - Verify that the dd command is still running.
> 11 - Pull the same drive again.
> 
> This time, the device is not removed from the array, although it is
> marked as faulty in the /proc/mdstat report.
> 
> In mdadm --detail /dev/md1, the device is still in the raid set and is
> marked as "faulty spare rebuilding". I have not found a command that
> will remove drive from the raid set at this point. There were a couple
> of instances/tests where after 10+ minutes, the device came out of the
> array and was simply marked faulty, at which point I could add a new
> drive, but that has been the exception. Usually, it remains in the
> 'faulty spare rebuilding' mode.
> 
> I don't understand why there is different behavior the second time the
> drive is pulled. I tried zeroing out both partitions on the drive,
> re-partitioning, mdadm --zero-superblock, but still the same behavior.
> If I pull a drive and replace it, I am able to do a subsequent pull of
> the new drive without trouble, albeit only once.
> 
> Comments? Suggestions? I'm glad to provide more info.
>

Yes, strange.

The only thing that should stop you being able to remove the device is if
there are outstanding IO requests.
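
A rough way to look for outstanding requests, assuming the kernel exposes
/sys/block/<disk>/inflight (two counters: reads, then writes). The function
name is made up for illustration. This is the block layer's count for the
disk, not md's own per-device pending count, so treat it as a hint only.

```shell
# Print the total number of in-flight requests recorded in a sysfs
# "inflight" file. The file is assumed to hold two integers on one
# line: reads in flight, then writes in flight.
inflight_total() {
    awk '{ print $1 + $2 }' "$1"
}
```

For example, inflight_total /sys/block/sdX/inflight, where sdX is a
placeholder for the pulled disk. A value that stays above zero after the
pull would point at requests the driver has not yet aborted. The sysfs node
may disappear once the disk is gone, in which case this tells you nothing.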

Maybe the driver is slow to abort requests the second time.  That could be
a bug in the LSI driver.

You could try using blktrace to watch all the requests and make sure every
request that starts also completes.
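
Something along these lines might do for the matching, as a sketch only. It
assumes blkparse's default text layout (dev cpu seq timestamp pid action
rwbs sector + nblocks [process]) and pairs each issue event (D) with a
completion event (C) on the same sector and length. The function name is
hypothetical, and requests that get merged or split between issue and
completion will not pair up this way.

```shell
# Read blkparse default text output on stdin and list requests that
# were issued to the driver (action D) but never completed (action C).
# Field positions assume the default blkparse line layout:
#   $4 = timestamp, $6 = action, $8 = sector, $9 = "+", $10 = nblocks
find_incomplete_requests() {
    awk '
        # Remember when each sector+length was last issued.
        $6 == "D" && $9 == "+" { pending[$8 "+" $10] = $4 }
        # A completion for the same sector+length clears it.
        $6 == "C" && $9 == "+" { delete pending[$8 "+" $10] }
        END {
            for (key in pending)
                printf "issued at %s, never completed: sector %s\n", pending[key], key
        }
    '
}
```

Feed it from a live trace, e.g.
blktrace -d /dev/sdX -o - | blkparse -i - | find_incomplete_requests
with sdX standing in for the member disk, then stop the trace after the
pull. Anything it prints is a request the driver accepted and never
finished.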

NeilBrown



Thread overview: 5+ messages
2011-12-02 16:34 RAID6 - repeated hot-pulls issue John Gehring
2011-12-05  6:15 ` NeilBrown [this message]
2012-01-21 17:16   ` Alexander Lyakas
     [not found]     ` <CALwOXvL9c32=BstLn7BHF2PkwnS3UOM-cOGSRQep=eWX7FQiwA@mail.gmail.com>
2012-01-31 10:49       ` Alexander Lyakas
2012-01-31 15:46         ` John Gehring
