linux-ide.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: Chris Webb <chris@arachsys.com>
Cc: linux-scsi@vger.kernel.org, Ric Wheeler <rwheeler@redhat.com>,
	Andrei Tanas <andrei@tanas.ca>, NeilBrown <neilb@suse.de>,
	linux-kernel@vger.kernel.org,
	IDE/ATA development list <linux-ide@vger.kernel.org>,
	Jeff Garzik <jgarzik@redhat.com>, Mark Lord <mlord@pobox.com>
Subject: Re: MD/RAID time out writing superblock
Date: Mon, 14 Sep 2009 16:41:56 +0900
Message-ID: <4AADF3C4.5060004@kernel.org>
In-Reply-To: <20090909120218.GB21829@arachsys.com>

Hello, Chris.

Chris Webb wrote:
> Chris Webb <chris@arachsys.com> writes:
> 
>> I've also noticed that during this recovery, I'm seeing lots of timeouts but
>> they don't seem to interrupt the resync:
>>
>>   05:47:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>   05:47:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>>   05:47:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>>   05:47:39 ata5.00: status: { DRDY }
>>   05:47:39 ata5: hard resetting link
>>   05:47:49 ata5: softreset failed (device not ready)
>>   05:47:49 ata5: hard resetting link
>>   05:47:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>   05:47:49 ata5.00: configured for UDMA/133
>>   05:47:49 ata5: EH complete
>>   
>>   08:17:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>   08:17:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>>   08:17:39         res 40/00:00:35:83:f8/00:00:4d:00:00/40 Emask 0x4 (timeout)
>>   08:17:39 ata5.00: status: { DRDY }
>>   08:17:39 ata5: hard resetting link
>>   08:17:49 ata5: softreset failed (device not ready)
>>   08:17:49 ata5: hard resetting link
>>   08:17:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>   08:17:49 ata5.00: configured for UDMA/133
>>   08:17:49 ata5: EH complete
>>   
>>   10:22:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>   10:22:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>>   10:22:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>>   10:22:39 ata5.00: status: { DRDY }
>>   10:22:39 ata5: hard resetting link
>>   10:22:49 ata5: softreset failed (device not ready)
>>   10:22:49 ata5: hard resetting link
>>   10:22:50 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>   10:22:51 ata5.00: configured for UDMA/133
>>   10:22:51 ata5: EH complete
> 
> ... the difference being that a timeout which causes a super_written failure
> seems to return an I/O error whereas the others don't:

The above are IDENTIFY commands.  Who's issuing IDENTIFY regularly?  It
isn't coming from the regular IO paths or md.  It's probably being issued
via SG_IO from userland.  These failures don't affect normal operation.
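
As a rough aid for reading such logs: the first byte after "cmd" in a
libata exception report is the ATA command opcode, so 0xec is IDENTIFY
DEVICE and 0xea is FLUSH CACHE EXT.  The Python sketch below decodes it.
The names decode_cmd, CMD_RE and ATA_OPCODES are made up for illustration,
the table lists only a handful of opcodes, and the regular expression
assumes the log line layout quoted in this thread.

```python
import re

# A few opcodes from the ATA command set; deliberately incomplete.
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}

# Assumed layout: "... cmd ec/00:01:00:00:00/... tag 0 ..."
CMD_RE = re.compile(r"\bcmd ([0-9a-f]{2})/")


def decode_cmd(line):
    """Return the command name for a libata 'cmd' line, or None if the
    line is not a 'cmd' line at all."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    opcode = int(match.group(1), 16)
    return ATA_OPCODES.get(opcode, f"unknown (0x{opcode:02x})")


identify = decode_cmd(
    "ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in")
flush = decode_cmd(
    "ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0")
print(identify)
print(flush)
```

Feeding it the whole kernel log and counting the results would show
quickly whether the timeouts are all IDENTIFY or whether FLUSH is
involved as well.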

>   ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>           res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
>   ata5.00: status: { DRDY }
>   ata5: hard resetting link
>   ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   ata5.00: configured for UDMA/133
>   ata5: EH complete
>   end_request: I/O error, dev sde, sector 1465147272
>   md: super_written gets error=-5, uptodate=0
>   raid10: Disk failure on sde3, disabling device.
> 
> I wonder what's different about these two timeouts such that one causes an I/O
> error and the other just causes a retry after reset? Presumably if the latter
> was also just a retry, everything would be (closer to being) fine.

Because this error is actually seen by the md layer, and FLUSH in
general can't be retried cleanly.  On retry, the drive goes on and
retries the sectors after the point of failure.  I'm not sure whether
FLUSH is actually failing here or it's a communication glitch.  At any
rate, if FLUSH is failing or timing out, the only right thing to do is
to kick the drive out of the array, as keeping it after retrying may
lead to silent data corruption.  Seriously, it's most likely a hardware
malfunction, although I can't tell where the problem is from the given
data.  Get the hardware fixed.
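
To illustrate why a successful retry of FLUSH can't be trusted, here is
a toy model in Python.  Everything in it is an assumption made for
illustration: the ToyDrive class, the sector numbers, and in particular
the rule that a sector which fails during a flush is no longer tracked
as dirty.  It is not a description of any real firmware, only of the
failure pattern described above.

```python
class ToyDrive:
    """Toy model of a drive with a volatile write cache (hypothetical)."""

    def __init__(self, bad_sectors):
        self.bad_sectors = set(bad_sectors)
        self.dirty = {}   # sector -> data acknowledged but only in cache
        self.media = {}   # sector -> data that actually reached the media

    def write(self, sector, data):
        # The write is acknowledged as soon as it lands in the cache.
        self.dirty[sector] = data

    def flush(self):
        """Write out dirty sectors in order; stop at the first failure.

        Assumption: the failing sector is dropped from the dirty list,
        so a later flush carries on from the sectors after it.
        """
        for sector in sorted(self.dirty):
            data = self.dirty.pop(sector)
            if sector in self.bad_sectors:
                return False
            self.media[sector] = data
        return True


drive = ToyDrive(bad_sectors={20})
for sector in (10, 20, 30):
    drive.write(sector, f"data{sector}")

first = drive.flush()    # fails at sector 20
second = drive.flush()   # the retry only covers sector 30
lost = [s for s in (10, 20, 30) if s not in drive.media]
print(first, second, lost)
```

In this model the retried flush reports success even though sector 20
never reached the media, which is why the caller has to treat the first
failure as final.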

Thanks.

-- 
tejun
