From: Ric Wheeler <rwheeler@redhat.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Chris Friesen" <chris.friesen@genband.com>,
"Mathias Burén" <mathias.buren@gmail.com>,
"Roy Sigurd Karlsbakk" <roy@karlsbakk.net>,
"Neil Brown" <neilb@suse.de>,
Linux-RAID <linux-raid@vger.kernel.org>,
"Jens Axboe" <axboe@kernel.dk>,
"IDE/ATA development list" <linux-ide@vger.kernel.org>,
linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: getting I/O errors in super_written()...any ideas what would cause this?
Date: Wed, 05 Dec 2012 06:41:02 -0500
Message-ID: <50BF32CE.2010704@redhat.com>
In-Reply-To: <1354699254.2243.5.camel@dabdike.int.hansenpartnership.com>
On 12/05/2012 04:20 AM, James Bottomley wrote:
> On Tue, 2012-12-04 at 16:00 -0600, Chris Friesen wrote:
>> As another data point, it looks like we may be doing a SEND DIAGNOSTIC
>> command specifying the default self-test in addition to the background
>> short self-test. This seems a bit risky and excessive to me, but
>> apparently the guy that wrote it is no longer with the company.
> This is a really bad idea. A lot of disks go out to lunch until the
> diagnostics complete (the same goes for SMART diagnostics). This means
> that if you do diagnostics on a running device, the drivers start to get
> timeouts on commands which are queued waiting for diagnostics to
> complete ... if those go over the standard SCSI timeouts, we'll start to
> try error recovery and likely have the disaster you see above.
>
>> What is the recommended method for monitoring disks on a system that
>> is likely to go a long time between boots? Do we avoid any in-service
>> testing and just monitor the SMART data and only test it if something
>> actually goes wrong? Or should we intentionally drop a disk out of the
>> array and test it? (The downside of that is that we lose
>> redundancy since we only have 2 disks.)
> What do you mean by "monitoring" ... as in what are you looking for? To
> make sure the disk is healthy and responding, a simple test unit ready
> works. To look at other parameters, read the mode pages.
>
> Anything that actively causes the disk to go out and check something is
> a bad idea in a running environment. Only do this if you can quiesce
> the I/O before starting the active diagnostic (or drop the disk from the
> array as you suggest).
>
> To be honest, though, modern disks do a whole host of diagnostics as
> they write data just to check that it is safely committed, so passive
> monitoring should be fine.
>
> James
>
>
I don't think that the basic stat gathering (smartctl -a ....) has this kind of
impact, but I am worried about running the diagnostics.
ric
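
The passive monitoring discussed above could look roughly like the sketch
below. It assumes smartmontools is installed and that the exit-status bits
mean what the smartctl(8) man page says they mean. It only reads data with
"smartctl -a" and never starts a self-test. The function names
decode_smartctl_status and passive_check are hypothetical, chosen for this
illustration.

```python
import subprocess

# Meaning of each bit in smartctl's exit status, as described in smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "SMART or ATA command failed, or checksum error",
    3: "SMART status reports DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains records",
    7: "self-test log contains errors",
}


def decode_smartctl_status(status):
    """Return the list of conditions flagged in a smartctl exit status."""
    return [
        message
        for bit, message in sorted(SMARTCTL_EXIT_BITS.items())
        if status & (1 << bit)
    ]


def passive_check(device):
    """Read SMART data without starting any self-test (hypothetical helper).

    Only 'smartctl -a' is issued; no '-t' option is passed, so the drive is
    not asked to run a diagnostic while it is in service.
    """
    result = subprocess.run(
        ["smartctl", "-a", device],
        capture_output=True,
        text=True,
    )
    return decode_smartctl_status(result.returncode)
```

For example, passive_check("/dev/sda") (the device path is an assumption, and
smartctl normally needs root) returns an empty list when nothing is flagged,
so a periodic job can alert only when the list is non-empty.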