linux-raid.vger.kernel.org archive mirror
From: Neil Brown <neilb@suse.de>
To: Richard Scobie <richard@sauce.co.nz>
Cc: 'Linux RAID' <linux-raid@vger.kernel.org>
Subject: Re: md and sd out of sync
Date: Mon, 5 Jul 2010 10:05:17 +1000
Message-ID: <20100705100517.77515870@notabene.brown>
In-Reply-To: <4C310A61.1020209@sauce.co.nz>

On Mon, 05 Jul 2010 10:25:37 +1200
Richard Scobie <richard@sauce.co.nz> wrote:

> I have 16 x 2TB drives that are each partitioned into 3 equal sized partitions.
> 
> Three md RAID6 arrays have then been built, each utilising one partition 
> on each drive.
> 
> Over the weekend, one member of one array was failed out:
> 
> end_request: I/O error, dev sdz, sector 1302228737
> md: super_written gets error=-5, uptodate=0
> raid5: Disk failure on sdz1, disabling device.
> raid5: Operation continuing on 15 devices.
> 
> Checking with smartctl is not an option as the controller (LSI SAS) 
> reacts badly. On the basis of it possibly being a transitory error or a 
> sector that could be remapped on resync I re-added it to the array.
> 
> This failed part way through and caused enough disruption to the 
> controller that the whole drive was taken offline:
> 
> sd 8:0:24:0: [sdz] <6>sd 8:0:24:0: [sdz] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdz, sector 569772337
> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdz, sector 569769777
> sd 8:0:24:0: [sdz] <6>mptsas: ioc0: removing sata device, channel 0, id 32, phy 11
> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdz, sector 569770097
>   port-8:1:8: mptsas: ioc0: delete port (8)
> 
> 
> I would have expected mdadm at this point to report that the 
> remaining 2 complete arrays had each lost their /dev/sdz components, but 
> this is not the case - it shows healthy arrays.
> 
> Is this expected behaviour?

Yes.  md only notices that a device has failed when it tries to perform I/O
and gets an error.
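
Until such an error occurs, one way to spot the mismatch by hand is to
compare the members md still lists against the device nodes that actually
exist.  The following is only a rough sketch, not part of mdadm: the
function name stale_md_members is made up for illustration, and it assumes
that a detached disk loses its node under /dev while md keeps showing the
old name in /proc/mdstat.

```shell
#!/bin/sh
# Hypothetical helper: list md members that have no device node.
# Usage: stale_md_members [mdstat-file] [dev-dir]
stale_md_members() {
    mdstat=${1:-/proc/mdstat}
    devdir=${2:-/dev}
    # Array lines look like: "md1 : active raid6 sdz2[15] sdy2[14] ..."
    # Member fields are "name[slot]", optionally followed by a flag like (F).
    awk '/^md/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[A-Za-z0-9_-]+\[[0-9]+\]/) {
                sub(/\[.*/, "", $i)   # keep only the device name
                print $1, $i
            }
    }' "$mdstat" |
    while read -r array member; do
        # Report members whose node is missing from the device directory.
        [ -e "$devdir/$member" ] || echo "$array: $member has no device node"
    done
}
```

Called with no arguments it reads /proc/mdstat and checks /dev; both can
be overridden, which makes it easy to try against a saved copy of mdstat.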

The next release of mdadm will have "mdadm --incremental --fail", which can be
called by udev when udev notices a device disappearing.  mdadm will find any
array that included the given device and fail/remove it.

> 
> To complicate things further, without any intervention, the disconnected 
> drive was then recognised again as a new device and reconnected as 
> /dev/sdai:
> 
> mptsas: ioc0: attaching sata device, channel 0, id 32, phy 11
> scsi 8:0:34:0: Direct-Access     ATA      WDC WD2003FYYS-0 0D02 PQ: 0 ANSI: 5
> sd 8:0:34:0: [sdai] 3907029168 512-byte hardware sectors (2000399 MB)
> sd 8:0:34:0: [sdai] Write Protect is off
> sd 8:0:34:0: [sdai] Mode Sense: 73 00 00 08
> sd 8:0:34:0: [sdai] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 8:0:34:0: [sdai] 3907029168 512-byte hardware sectors (2000399 MB)
> sd 8:0:34:0: [sdai] Write Protect is off
> sd 8:0:34:0: [sdai] Mode Sense: 73 00 00 08
> sd 8:0:34:0: [sdai] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>   sdai: sdai1 sdai2 sdai3
> sd 8:0:34:0: [sdai] Attached SCSI disk
> sd 8:0:34:0: Attached scsi generic sg26 type 0
> 
> 
> Because sdz no longer exists, I cannot fail and remove /dev/sdz2 and 
> /dev/sdz3 from the other 2 md arrays.

You can.
  mdadm /dev/mdXX --fail detached
  mdadm /dev/mdXX --remove detached
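
With several arrays affected, the two commands above can be wrapped in a
small loop.  This is only a sketch: the function name drop_detached and
the DRY_RUN switch are made up for illustration, it assumes an mdadm
version that understands the "detached" keyword, and by default it just
prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical wrapper: fail, then remove, detached members of each array.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
drop_detached() {
    for md in "$@"; do
        # Order matters: a member must be failed before it can be removed.
        for action in --fail --remove; do
            if [ "${DRY_RUN:-1}" = 1 ]; then
                echo "mdadm $md $action detached"
            else
                mdadm "$md" "$action" detached || return 1
            fi
        done
    done
}
```

For example, "drop_detached /dev/md1 /dev/md2" (array names chosen only
for illustration) prints the four commands; setting DRY_RUN=0 runs them.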

NeilBrown


> 
> I will proceed by just replacing the drive and rebooting, at which point 
> I should just be able to re-add it to all arrays, but I just wanted to 
> draw attention to how ignorant md seems to be of all the changes that 
> have occurred. Maybe things have changed in later versions:
> 
> Kernel 2.6.27.19-78.2.30.fc9.x86_64
> mdadm 2.6.4
> 
> Regards,
> 
> Richard

