From: Ric Wheeler <rwheeler@redhat.com>
To: Chris Friesen <chris.friesen@genband.com>
Cc: "Mathias Burén" <mathias.buren@gmail.com>,
"Roy Sigurd Karlsbakk" <roy@karlsbakk.net>,
"Neil Brown" <neilb@suse.de>,
Linux-RAID <linux-raid@vger.kernel.org>,
"Jens Axboe" <axboe@kernel.dk>,
"IDE/ATA development list" <linux-ide@vger.kernel.org>,
linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: getting I/O errors in super_written()...any ideas what would cause this?
Date: Tue, 04 Dec 2012 18:55:45 -0500
Message-ID: <50BE8D81.4050700@redhat.com>
In-Reply-To: <50BE7293.8060200@genband.com>

On 12/04/2012 05:00 PM, Chris Friesen wrote:
> On 12/03/2012 03:53 PM, Ric Wheeler wrote:
>> On 12/03/2012 04:08 PM, Chris Friesen wrote:
>>> On 12/03/2012 02:52 PM, Ric Wheeler wrote:
>>>
>>>> I jumped into this thread late - can you repost detail on the specific
>>>> drive and HBA used here? In any case, it sounds like this is a better
>>>> topic for the linux-scsi or linux-ide list where most of the low level
>>>> storage people lurk :)
>>> Okay, expanding the receiver list. :)
>>>
>>> To recap:
>>>
>>> I'm running 2.6.27 with LVM over software RAID 1 over a pair of SAS
>>> disks.
>>> Disks are WD9001BKHG, controller is Intel C600.
>>>
>>> Recently we started seeing messages of the following pattern, and we
>>> don't know what's causing them:
>>>
>>> Nov 28 08:57:10 kernel: end_request: I/O error, dev sda, sector
>>> 1758169523
>>> Nov 28 08:57:10 kernel: md: super_written gets error=-5, uptodate=0
>>> Nov 28 08:57:10 kernel: raid1: Disk failure on sda2, disabling device.
>>> Nov 28 08:57:10 kernel: raid1: Operation continuing on 1 devices.
>>>
>>> We've been assuming it's a software issue since it's reproducible on
>>> multiple systems, although so far we've only seen the problem with
>>> these particular disks.
>>>
>>> We've seen the problems with disk write cache enabled and disabled.
>> Hi Chris,
>>
>> Are there any earlier IO errors or sda related errors in the log?
> Nope, at least not nearby. On one system for instance we boot up and
> get into steady-state, then there are no kernel logs for about half an
> hour then out of the blue we see:
>
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: end_request: I/O error, dev sda, sector 1758169523
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: md: super_written gets error=-5, uptodate=0
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: raid1: Disk failure on sda2, disabling device.
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: raid1: Operation continuing on 1 devices.
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: end_request: I/O error, dev sdb, sector 1758169523
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: md: super_written gets error=-5, uptodate=0
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: RAID1 conf printout:
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: --- wd:1 rd:2
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: disk 0, wo:1, o:0, dev:sda2
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: disk 1, wo:0, o:1, dev:sdb2
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: RAID1 conf printout:
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: --- wd:1 rd:2
> Nov 27 14:58:13 base0-0-0-13-0-11-1 kernel: disk 1, wo:0, o:1, dev:sdb2
>
>
> As another data point, it looks like we may be doing a SEND DIAGNOSTIC
> command specifying the default self-test in addition to the background
> short self-test. This seems a bit risky and excessive to me, but
> apparently the guy that wrote it is no longer with the company.
>
> What is the recommended method for monitoring disks on a system that
> is likely to go a long time between boots? Do we avoid any in-service
> testing and just monitor the SMART data and only test it if something
> actually goes wrong? Or should we intentionally drop a disk out of the
> array and test it? (The downside of that is that we lose
> redundancy since we only have 2 disks.)
>
> Chris

I don't know whether running the self-tests really helps. Normally, I would
simply suggest scanning for remapped sectors (and watching for a large number
of them rather than just a handful, since a few are fairly normal in disks).
You can do that with smartctl.
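
As a rough illustration of that kind of scan, here is a minimal sketch that pulls a remapped-sector count out of smartctl's text output. It assumes the usual smartctl report layout: SAS/SCSI disks print an "Elements in grown defect list" line, and ATA disks print a Reallocated_Sector_Ct row whose raw value is the last column. The function names and the REMAP_THRESHOLD value are hypothetical and not from this thread; pick a threshold that suits your drives.

```python
import re
import subprocess

# Hypothetical alert threshold: a handful of remapped sectors is normal,
# so only a larger count should be treated as a warning sign.
REMAP_THRESHOLD = 50


def remapped_sectors(smartctl_output):
    """Return the remapped-sector count found in smartctl text, or None."""
    # SAS/SCSI disks report the grown defect list length.
    match = re.search(r"Elements in grown defect list:\s*(\d+)", smartctl_output)
    if match:
        return int(match.group(1))
    # ATA disks report an attribute table; the raw value is the 10th column.
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def check_disk(device):
    """Run 'smartctl -a' on device; return (count, over_threshold)."""
    # smartctl's exit status is a bitmask and is often nonzero even when it
    # prints a full report, so the return code is deliberately not checked.
    result = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True
    )
    count = remapped_sectors(result.stdout)
    return count, (count is not None and count > REMAP_THRESHOLD)
```

Calling check_disk("/dev/sda") periodically (it typically needs root) and logging the count makes it possible to watch for growth over time without taking a disk out of the array.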

The best advice is to consult your disk vendor directly about their
recommendations, if you have that connection, of course :)

Ric