From: Mark Lord <liml@rtr.ca>
To: Tejun Heo <tj@kernel.org>
Cc: Chris Webb <chris@arachsys.com>,
Ric Wheeler <rwheeler@redhat.com>, Andrei Tanas <andrei@tanas.ca>,
NeilBrown <neilb@suse.de>,
linux-kernel@vger.kernel.org,
IDE/ATA development list <linux-ide@vger.kernel.org>,
linux-scsi@vger.kernel.org, Jeff Garzik <jgarzik@redhat.com>,
Mark Lord <mlord@pobox.com>
Subject: Re: MD/RAID time out writing superblock
Date: Thu, 17 Sep 2009 09:35:19 -0400 [thread overview]
Message-ID: <4AB23B17.2040204@rtr.ca> (raw)
In-Reply-To: <4AB17905.90606@kernel.org>
Tejun Heo wrote:
> Hello,
>
> Chris Webb wrote:
>> Hi Tejun. Thanks for following up to this. We've done some more
>> experimentation over the last couple of days based on your
>> suggestions and thoughts.
>>
>> Tejun Heo <tj@kernel.org> writes:
>>> Seriously, it's most likely a hardware malfunction although I can't tell
>>> where the problem is with the given data. Get the hardware fixed.
>> We know this isn't caused by a single faulty piece of hardware,
>> because we have a cluster of identical machines and all have shown
>> this behaviour. This doesn't mean that there isn't a hardware
>> problem, but if there is one, it's a design problem or firmware bug
>> affecting all of our hosts.
>
> If it's multiple machines, it's much less likely to be faulty drives,
> but if the machines are configured mostly identically, hardware
> problems can't be ruled out either.
>
>> There have also been a few reports of problems which look very
>> similar in this thread from people with somewhat different hardware
>> and drives to ours.
>
> I wouldn't connect the reported cases too eagerly at this point. Too
> many different causes end up showing similar symptoms especially with
> timeouts.
>
>>> The aboves are IDENTIFY. Who's issuing IDENTIFY regularly? It isn't
>>> from the regular IO paths or md. It's probably being issued via SG_IO
>>> from userland. These failures don't affect normal operation.
>> [...]
>>> Oooh, another possibility is the above continuous IDENTIFY tries.
>>> Doing things like that generally isn't a good idea because vendors
>>> don't expect IDENTIFY to be mixed regularly with normal IOs and
>>> firmwares aren't tested against that. Even smart commands sometimes
>>> cause problems. So, finding out the thing which is obsessed with the
>>> identity of the drive and stopping it might help.
>> We tracked this down to some (excessively frequent!) monitoring we
>> were doing using smartctl. Things were improved considerably by
>> stopping smartd and disabling all callers of smartctl, although it
>> doesn't appear to have been a cure. The frequency of these timeouts
>> during resync seems to have gone from about once every two hours to
>> about once a day, which means we've been able to complete some
>> resyncs whereas we were unable to before.
>
> That's interesting. One important side effect of issuing IDENTIFY is
> that they will serialize command streams as they are not NCQ commands
> and thus could change command patterns significantly.
..
SMART is the opcode that is most frequently implicated here, not IDENTIFY.
Note that even a barrier FLUSH CACHE is non NCQ and will serialize the stream.
Cheers
next prev parent reply other threads:[~2009-09-17 13:35 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-26 0:32 MD/RAID: what's wrong with sector 1953519935? Andrei Tanas
2009-08-26 0:50 ` NeilBrown
2009-08-26 1:06 ` Ric Wheeler
2009-08-26 1:24 ` NeilBrown
2009-08-26 1:31 ` Ric Wheeler
2009-08-26 2:22 ` Andrei Tanas
2009-08-26 2:41 ` Ric Wheeler
2009-08-26 3:45 ` Andrei Tanas
2009-08-26 10:34 ` Ric Wheeler
2009-08-26 14:46 ` Andrei Tanas
2009-08-26 14:49 ` Andrei Tanas
2009-08-26 15:39 ` Ric Wheeler
2009-08-26 18:12 ` Andrei Tanas
2009-08-26 18:12 ` Andrei Tanas
2009-08-27 0:07 ` Mark Lord
2009-08-27 1:37 ` Andrei Tanas
2009-08-27 1:37 ` Andrei Tanas
2009-08-27 2:33 ` Robert Hancock
2009-08-27 21:22 ` MD/RAID time out writing superblock Andrei Tanas
2009-08-27 21:57 ` Ric Wheeler
2009-08-31 8:10 ` Tejun Heo
2009-08-31 12:04 ` Ric Wheeler
2009-08-31 12:20 ` Tejun Heo
2009-09-07 11:44 ` Chris Webb
2009-09-07 11:59 ` Chris Webb
2009-09-09 12:02 ` Chris Webb
2009-09-14 7:41 ` Tejun Heo
2009-09-14 7:44 ` Tejun Heo
2009-09-14 12:48 ` Mark Lord
2009-09-14 13:05 ` Tejun Heo
2009-09-14 14:25 ` Mark Lord
2009-09-16 23:19 ` Chris Webb
2009-09-17 13:29 ` Mark Lord
2009-09-17 13:32 ` Mark Lord
2009-09-17 13:37 ` Chris Webb
2009-09-17 15:35 ` Tejun Heo
2009-09-17 16:16 ` Mark Lord
2009-09-17 16:17 ` Mark Lord
2009-09-18 17:05 ` Chris Webb
2009-09-20 17:35 ` Allan Wind
2009-09-28 5:32 ` Allan Wind
2009-09-21 10:26 ` Chris Webb
2009-09-21 19:47 ` Mark Lord
2009-09-22 6:16 ` Robert Hancock
2009-09-20 18:36 ` Robert Hancock
2009-09-14 13:11 ` Henrique de Moraes Holschuh
2009-09-14 13:24 ` Tejun Heo
2009-09-14 14:02 ` Henrique de Moraes Holschuh
2009-09-14 14:34 ` Tejun Heo
2009-09-14 13:14 ` Gabor Gombas
2009-09-07 16:55 ` Allan Wind
2009-09-07 23:26 ` Thomas Fjellstrom
2009-09-07 23:26 ` Thomas Fjellstrom
2009-09-14 7:46 ` Tejun Heo
2009-09-14 21:13 ` Thomas Fjellstrom
2009-09-14 22:23 ` Tejun Heo
2009-09-07 16:55 ` Allan Wind
2009-09-16 22:28 ` Chris Webb
2009-09-16 23:47 ` Tejun Heo
2009-09-17 0:34 ` Neil Brown
2009-09-17 12:00 ` Chris Webb
2009-09-17 11:57 ` Chris Webb
2009-09-17 15:44 ` Tejun Heo
2009-09-17 16:36 ` Allan Wind
2009-09-18 0:16 ` Tejun Heo
2009-09-18 2:47 ` Allan Wind
2009-09-18 17:07 ` Chris Webb
2009-09-20 18:46 ` Robert Hancock
2009-09-21 0:02 ` Kyle Moffett
2009-09-17 13:35 ` Mark Lord [this message]
2009-09-17 15:47 ` Tejun Heo
2009-08-31 12:21 ` Mark Lord
2009-08-31 23:45 ` Mark Lord
2009-09-01 13:07 ` Andrei Tanas
2009-09-01 13:07 ` Andrei Tanas
2009-09-01 13:15 ` Mark Lord
2009-09-01 13:30 ` Tejun Heo
2009-09-01 13:47 ` Ric Wheeler
2009-09-01 14:18 ` Andrei Tanas
2009-09-01 14:18 ` Andrei Tanas
2009-09-14 5:30 ` Marc Giger
2009-09-14 5:30 ` Marc Giger
2009-09-02 21:58 ` Allan Wind
2009-09-04 19:39 ` Andrei Tanas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4AB23B17.2040204@rtr.ca \
--to=liml@rtr.ca \
--cc=andrei@tanas.ca \
--cc=chris@arachsys.com \
--cc=jgarzik@redhat.com \
--cc=linux-ide@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=mlord@pobox.com \
--cc=neilb@suse.de \
--cc=rwheeler@redhat.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.