From: Tejun Heo <tj@kernel.org>
To: Ric Wheeler <rwheeler@redhat.com>
Cc: Andrei Tanas <andrei@tanas.ca>, NeilBrown <neilb@suse.de>,
linux-kernel@vger.kernel.org,
IDE/ATA development list <linux-ide@vger.kernel.org>,
linux-scsi@vger.kernel.org, Jeff Garzik <jgarzik@redhat.com>,
Mark Lord <mlord@pobox.com>
Subject: Re: MD/RAID time out writing superblock
Date: Mon, 31 Aug 2009 17:10:43 +0900
Message-ID: <4A9B8583.9050601@kernel.org>
In-Reply-To: <4A970154.2020507@redhat.com>
Ric Wheeler wrote:
> On 08/27/2009 05:22 PM, Andrei Tanas wrote:
>> Hello,
>>
>> This is about the same problem that I wrote about two days ago (md gets an
>> error while writing superblock and fails a hard drive).
>>
>> I've tried to figure out what's really going on, and as far as I can tell,
>> the disk doesn't really fail (as confirmed by multiple tests), it times out
>> trying to execute ATA_CMD_FLUSH_EXT ("ata2.00: cmd ea..." in the log)
>> command. The reason for this I believe is that md_super_write queues the
>> write command with BIO_RW_SYNCIO flag.
>> As I wrote before, with 32MB cache it is conceivable that it will take the
>> drive longer than 30 seconds (defined by SD_TIMEOUT in scsi/sd.h) to flush
>> its buffers.
>>
>> Changing safe_mode_delay to more conservative 2 seconds should definitely
>> help, but is it really necessary to write the superblock synchronously when
>> array changes status from active to active-idle?
>>
>> [90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>> [90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>> [90307.328277] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>> [90307.328280] ata2.00: status: { DRDY }
>> [90307.328288] ata2: hard resetting link
>> [90313.218511] ata2: link is slow to respond, please be patient (ready=0)
>> [90317.377711] ata2: SRST failed (errno=-16)
>> [90317.377720] ata2: hard resetting link
>> [90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [90318.338026] ata2.00: configured for UDMA/133
>> [90318.338062] ata2: EH complete
>> [90318.370625] end_request: I/O error, dev sdb, sector 1953519935
>> [90318.370632] md: super_written gets error=-5, uptodate=0
>>
>>
>
> 30 seconds is a very long time for a drive to respond, but I think that
> your explanation fits the facts pretty well...
Even with a 32MB cache, 30 seconds should be more than enough. It's not
like the drive is going to do random writes on that data. It's likely to
make only a few strokes across the platter, and that really shouldn't
take very long. I have yet to see an actual case where a properly
functioning drive timed out on a flush because the flush itself took
too long.
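For what it's worth, the "few strokes" argument can be put into rough
numbers. The sketch below is only a back-of-the-envelope model, not a
measurement. Apart from the 32MB cache size and the 30 second timeout,
every figure in it (7200 rpm, 64 KiB average extent, 1 ms short seek,
20 ms full stroke, 2 sweeps, 60 MB/s media rate) is an assumed,
illustrative value, and the function name is made up for the example.

```python
def estimate_flush_seconds(cache_bytes, extent_bytes, rpm,
                           short_seek_ms, full_stroke_ms, sweeps,
                           media_mb_per_s):
    """Rough time to write back a full cache in LBA-sorted sweeps.

    All parameters are assumptions supplied by the caller; this is a
    toy model, not how any particular drive firmware behaves.
    """
    # Number of separate contiguous pieces the cache is assumed to hold.
    extents = cache_bytes // extent_bytes
    # Average rotational latency is half a revolution.
    avg_rotational_ms = 0.5 * 60_000.0 / rpm
    # Each extent costs one short seek plus the rotational wait.
    positioning_ms = extents * (short_seek_ms + avg_rotational_ms)
    # Cost of the head travelling across the platter for each sweep.
    sweep_ms = sweeps * full_stroke_ms
    # Time to actually put the bytes on the media.
    transfer_s = cache_bytes / (media_mb_per_s * 1_000_000.0)
    return (positioning_ms + sweep_ms) / 1000.0 + transfer_s


SD_TIMEOUT_S = 30.0  # the 30 second timeout quoted in this thread

estimate = estimate_flush_seconds(
    cache_bytes=32 * 1024 * 1024,  # 32MB cache, from the report
    extent_bytes=64 * 1024,        # assumed average extent size
    rpm=7200,                      # assumed spindle speed
    short_seek_ms=1.0,             # assumed short seek
    full_stroke_ms=20.0,           # assumed full-stroke seek
    sweeps=2,                      # assumed number of strokes
    media_mb_per_s=60.0,           # assumed sustained media rate
)
print(f"estimated flush time: {estimate:.1f} s "
      f"(timeout {SD_TIMEOUT_S:.0f} s)")
```

With these assumed values the estimate comes out at a few seconds. It
scales with the number of separate extents in the cache, so the assumed
extent size dominates the result.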
> The drive might take a longer time like this when doing error handling
> (sector remapping, etc), but then I would expect to see your remapped
> sector count grow.
Yes, this is a possibility, and according to the spec libata EH should
be retrying flushes a few times before giving up, but I'm not sure
whether retrying for several minutes is a good idea either.
Is it?
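To give a feel for how quickly retries add up, here is a small sketch
that takes the two timestamps from the log quoted above (the timeout
exception and "EH complete") and multiplies the per-attempt cost out.
The number of attempts is a hypothetical value picked for illustration;
nothing in this thread fixes it, and the helper names are made up.

```python
import re

# Kernel log timestamps look like "[90307.328266]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def eh_recovery_seconds(first_line, last_line):
    """Seconds between the timestamps of two kernel log lines."""
    start = float(TIMESTAMP.search(first_line).group(1))
    end = float(TIMESTAMP.search(last_line).group(1))
    return end - start


def worst_case_seconds(attempts, timeout_s, recovery_s):
    """Each attempt waits out the full timeout, then goes through EH."""
    return attempts * (timeout_s + recovery_s)


# Two lines taken from the log quoted earlier in this message.
recovery = eh_recovery_seconds(
    "[90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 "
    "action 0x6 frozen",
    "[90318.338062] ata2: EH complete",
)

ATTEMPTS = 4       # hypothetical: one initial try plus three retries
TIMEOUT_S = 30.0   # the 30 second timeout quoted in this thread

total = worst_case_seconds(ATTEMPTS, TIMEOUT_S, recovery)
print(f"EH recovery took {recovery:.1f} s; "
      f"{ATTEMPTS} attempts could take {total / 60.0:.1f} min")
```

With four attempts that is already a bit under three minutes during
which the array would be waiting on the drive.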
Thanks.
--
tejun