linux-ide.vger.kernel.org archive mirror
From: Chris Webb <chris@arachsys.com>
To: Mark Lord <liml@rtr.ca>
Cc: Tejun Heo <teheo@suse.de>,
	linux-scsi@vger.kernel.org, Ric Wheeler <rwheeler@redhat.com>,
	Andrei Tanas <andrei@tanas.ca>, NeilBrown <neilb@suse.de>,
	linux-kernel@vger.kernel.org,
	IDE/ATA development list <linux-ide@vger.kernel.org>,
	Jeff Garzik <jgarzik@redhat.com>, Mark Lord <mlord@pobox.com>
Subject: Re: MD/RAID time out writing superblock
Date: Mon, 21 Sep 2009 11:26:54 +0100
Message-ID: <20090921102654.GD8789@arachsys.com>
In-Reply-To: <20090918170517.GI2141@arachsys.com>

Chris Webb <chris@arachsys.com> writes:

> Mark Lord <liml@rtr.ca> writes:
> 
> > Speaking of which..
> > 
> > Chris:  I wonder if the errors will also vanish in your situation
> > by disabling the onboard write-caches in the drives ?
> > 
> > Eg.  hdparm -W0 /dev/sd?
> 
> Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
> I check this one out on it too.

Our test machine is still being built, but we had an opportunity to try this on
a couple of the live machines when their RAID arrays failed over the weekend.
We still got timeouts, but (predictably!) they're not on flushes any more:

  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/33
  ata2: EH complete
  [...] 
  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 35/00:08:18:94:68/00:00:3d:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/33
  ata2: EH complete
  [...]

all the way through the night.
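
The first byte of the cmd field in these lines is the ATA opcode, which is
how one can tell the timeouts are no longer on flushes. A minimal sketch of
that check follows, assuming the usual libata log layout. The opcode table is
a small hand-picked subset, and decode_cmd and is_flush are illustrative names
rather than anything from the kernel:

```python
import re

# Subset of ATA command opcodes; only those relevant to this thread.
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x61: "WRITE FPDMA QUEUED",
    0xC8: "READ DMA",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xEF: "SET FEATURES",
}

# The opcode is the first hex byte after "cmd " in a libata error line.
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/")


def decode_cmd(line):
    """Return (opcode, name) for a libata 'cmd' line, or None if absent."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(1), 16)
    return opcode, ATA_OPCODES.get(opcode, "unknown")


def is_flush(opcode):
    """True for the two cache-flush opcodes."""
    return opcode in (0xE7, 0xEA)


if __name__ == "__main__":
    line = "ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm"
    opcode, name = decode_cmd(line)
    # prints: 0x35 WRITE DMA EXT flush=False
    print(f"0x{opcode:02x} {name} flush={is_flush(opcode)}")
```

Run over the log above, this would report WRITE DMA EXT (0x35) for both
failing commands, i.e. ordinary writes rather than cache flushes.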

I also have these in the log, but they appear immediately after write caching
was turned off on all drives, so they may be a red herring caused by cached
data still being written out.

  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/133
  ata2: EH complete
  ata2.00: limiting speed to UDMA/100:PIO4
  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 25/00:08:80:3e:2d/00:00:4e:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0)
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/100
  ata2: EH complete
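
The SStatus and SControl values in the "SATA link up" lines are hex register
dumps. As a rough sketch of how to read them, assuming the standard SATA
layout of DET in bits 3:0, SPD in bits 7:4 and IPM in bits 11:8 (the function
names here are made up for illustration):

```python
import re

# libata prints both registers in hex without a 0x prefix.
LINK_RE = re.compile(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)")


def decode_sata_reg(value):
    """Split an SStatus/SControl value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # bits 3:0, device detection
        "spd": (value >> 4) & 0xF,  # bits 7:4, speed (1 = 1.5 Gbps, 2 = 3.0 Gbps)
        "ipm": (value >> 8) & 0xF,  # bits 11:8, interface power management
    }


def parse_link_line(line):
    """Return decoded SStatus/SControl fields from a 'link up' line, or None."""
    m = LINK_RE.search(line)
    if m is None:
        return None
    return {
        "sstatus": decode_sata_reg(int(m.group(1), 16)),
        "scontrol": decode_sata_reg(int(m.group(2), 16)),
    }


if __name__ == "__main__":
    info = parse_link_line("ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)")
    print(info)
```

Under that layout, SStatus 113 means a device is present with the link
negotiated at 1.5 Gbps, and SControl 310 means the host has limited the link
to 1.5 Gbps, whereas SControl 300 places no speed limit.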

On another machine, I saw this with write caching turned off:

  ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
  ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
           res 40/00:00:40:1f:80/00:00:00:00:00/40 Emask 0x4 (timeout)
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata2.00: configured for UDMA/133
  ata2: EH complete
  ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
  ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
           res 40/00:00:20:1f:80/00:00:00:00:00/40 Emask 0x4 (timeout)
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata2.00: qc timeout (cmd 0xef)
  ata2.00: failed to set xfermode (err_mask=0x4)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0)
  ata2: softreset failed (device not ready)
  ata2: limiting SATA link speed to 1.5 Gbps
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: reset failed, giving up
  ata2.00: disabled
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0)
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2: EH complete
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396584
  end_request: I/O error, dev sdb, sector 8396584
  md: super_written gets error=-5, uptodate=0
  raid1: Disk failure on sdb1, disabling device.
  raid1: Operation continuing on 5 devices.
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396632
  end_request: I/O error, dev sdb, sector 8396632
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 654934840
  raid10: sdb3: rescheduling sector 1788594488
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 1311583568
  raid10: Disk failure on sdb3, disabling device.
  raid10: Operation continuing on 3 devices.
  Buffer I/O error on device dm-51, logical block 31930
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31931
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31932
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31933
  lost page write due to I/O error on dm-51
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 1465147272
  end_request: I/O error, dev sdb, sector 1465147272
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396584
  end_request: I/O error, dev sdb, sector 8396584
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] READ CAPACITY(16) failed
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  sd 1:0:0:0: [sdb] Sense not available.
  sd 1:0:0:0: [sdb] READ CAPACITY failed
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  sd 1:0:0:0: [sdb] Sense not available.
  sd 1:0:0:0: [sdb] Asking for cache data failed
  sd 1:0:0:0: [sdb] Assuming drive cache: write through
  sdb: detected capacity change from 750156374016 to 0
  raid10: sdb: unrecoverable I/O read error for block 1788594488
  Buffer I/O error on device dm-59, logical block 204023
  Buffer I/O error on device dm-59, logical block 204023
  Buffer I/O error on device dm-43, logical block 24845
  lost page write due to I/O error on dm-43
  Buffer I/O error on device dm-62, logical block 558722
  lost page write due to I/O error on dm-62
  Buffer I/O error on device dm-43, logical block 24846
  lost page write due to I/O error on dm-43
  RAID1 conf printout:
   --- wd:5 rd:6
   disk 0, wo:0, o:1, dev:sda1
   disk 1, wo:1, o:0, dev:sdb1
   disk 2, wo:0, o:1, dev:sdc1
  RAID1 conf printout:
   --- wd:5 rd:6
   disk 0, wo:0, o:1, dev:sda1
   disk 2, wo:0, o:1, dev:sdc1
   disk 3, wo:0, o:1, dev:sdd1
   disk 4, wo:0, o:1, dev:sde1
   disk 5, wo:0, o:1, dev:sdf1
  raid10: Disk failure on sdb2, disabling device.
  raid10: Operation continuing on 3 devices.
  raid10: sdb: unrecoverable I/O read error for block 0
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 1, wo:1, o:0, dev:sdb2
   disk 2, wo:0, o:1, dev:sdc2
   disk 3, wo:0, o:1, dev:sdd2
   disk 5, wo:0, o:1, dev:sdf2
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 2, wo:0, o:1, dev:sdc2
   disk 3, wo:0, o:1, dev:sdd2
   disk 5, wo:0, o:1, dev:sdf2
  md: md2: resync done.
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 1, wo:1, o:0, dev:sdb3
   disk 2, wo:0, o:1, dev:sdc3
   disk 3, wo:0, o:1, dev:sdd3
   disk 5, wo:0, o:1, dev:sdf3
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 2, wo:0, o:1, dev:sdc3
   disk 3, wo:0, o:1, dev:sdd3
   disk 5, wo:0, o:1, dev:sdf3
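
For anyone reading the "Unhandled error code" lines, the hostbyte value can
be mapped to the SCSI midlayer's DID_* names. The sketch below assumes the
host byte numbering from the Linux SCSI headers; decode_result is an
illustrative helper, not a kernel function:

```python
import re

# Host byte codes as defined by the Linux SCSI midlayer.
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}

RESULT_RE = re.compile(r"hostbyte=0x([0-9a-fA-F]+) driverbyte=0x([0-9a-fA-F]+)")


def decode_result(line):
    """Return (host byte name, driver byte value) from a 'Result:' line."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    host = int(m.group(1), 16)
    driver = int(m.group(2), 16)
    return HOST_BYTES.get(host, "unknown"), driver


if __name__ == "__main__":
    line = "sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00"
    # prints: ('DID_BAD_TARGET', 0)
    print(decode_result(line))
```

A host byte of 0x04 decodes to DID_BAD_TARGET, which would be consistent with
the I/O being rejected after "ata2.00: disabled" rather than timing out again,
though that reading is an interpretation and not something the log states.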

Cheers,

Chris.


