From: Chris Webb <chris@arachsys.com>
To: Mark Lord <liml@rtr.ca>
Cc: Tejun Heo <teheo@suse.de>,
	linux-scsi@vger.kernel.org, Ric Wheeler <rwheeler@redhat.com>,
	Andrei Tanas <andrei@tanas.ca>, NeilBrown <neilb@suse.de>,
	linux-kernel@vger.kernel.org,
	IDE/ATA development list <linux-ide@vger.kernel.org>,
	Jeff Garzik <jgarzik@redhat.com>, Mark Lord <mlord@pobox.com>
Subject: Re: MD/RAID time out writing superblock
Date: Mon, 21 Sep 2009 11:26:54 +0100
Message-ID: <20090921102654.GD8789@arachsys.com>
In-Reply-To: <20090918170517.GI2141@arachsys.com>

Chris Webb <chris@arachsys.com> writes:

> Mark Lord <liml@rtr.ca> writes:
> 
> > Speaking of which..
> > 
> > Chris:  I wonder if the errors will also vanish in your situation
> > by disabling the onboard write-caches in the drives ?
> > 
> > Eg.  hdparm -W0 /dev/sd?
> 
> Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
> I check this one out on it too.

Our test machine is still being built, but we had an opportunity to try this on
a couple of the live machines when their RAID arrays failed over the weekend.
We still got timeouts, but (predictably!) they're not on flushes any more:

  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/33
  ata2: EH complete
  [...] 
  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 35/00:08:18:94:68/00:00:3d:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/33
  ata2: EH complete
  [...]

all the way through the night.
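
For anyone following the logs: the first byte after "cmd" in each libata
exception line is the ATA command opcode, which is how the timeouts above can
be identified as ordinary writes (0x35) rather than cache flushes. Below is a
minimal Python sketch, assuming the log format shown above. The helper name
decode_cmd and the small opcode table are illustrative only, not part of any
tool mentioned in this thread.

```python
import re

# A small subset of ATA command opcodes, enough for the logs in this message.
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x61: "WRITE FPDMA QUEUED",
    0xC8: "READ DMA",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xEF: "SET FEATURES",
}

# Matches the opcode byte in lines such as:
#   ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 ...
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/")


def decode_cmd(line):
    """Return the ATA command name for a libata 'cmd' line, or None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(1), 16)
    return ATA_OPCODES.get(opcode, "unknown (0x%02x)" % opcode)
```

Fed the "cmd 35/..." lines above, this reports WRITE DMA EXT; a flush would
show up as opcode e7 or ea instead.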

I also have these in the log, but they appear immediately after I turned off
write caching on all drives, so they may be a red herring caused by cached data
still being written out.

  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/133
  ata2: EH complete
  ata2.00: limiting speed to UDMA/100:PIO4
  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 25/00:08:80:3e:2d/00:00:4e:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0)
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/100
  ata2: EH complete

On another machine, I saw this with write caching turned off:

  ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
  ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
           res 40/00:00:40:1f:80/00:00:00:00:00/40 Emask 0x4 (timeout)
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata2.00: configured for UDMA/133
  ata2: EH complete
  ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
  ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
           res 40/00:00:20:1f:80/00:00:00:00:00/40 Emask 0x4 (timeout)
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata2.00: qc timeout (cmd 0xef)
  ata2.00: failed to set xfermode (err_mask=0x4)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0)
  ata2: softreset failed (device not ready)
  ata2: limiting SATA link speed to 1.5 Gbps
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: reset failed, giving up
  ata2.00: disabled
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0)
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2: EH complete
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396584
  end_request: I/O error, dev sdb, sector 8396584
  md: super_written gets error=-5, uptodate=0
  raid1: Disk failure on sdb1, disabling device.
  raid1: Operation continuing on 5 devices.
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396632
  end_request: I/O error, dev sdb, sector 8396632
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 654934840
  raid10: sdb3: rescheduling sector 1788594488
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 1311583568
  raid10: Disk failure on sdb3, disabling device.
  raid10: Operation continuing on 3 devices.
  Buffer I/O error on device dm-51, logical block 31930
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31931
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31932
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31933
  lost page write due to I/O error on dm-51
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 1465147272
  end_request: I/O error, dev sdb, sector 1465147272
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396584
  end_request: I/O error, dev sdb, sector 8396584
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] READ CAPACITY(16) failed
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  sd 1:0:0:0: [sdb] Sense not available.
  sd 1:0:0:0: [sdb] READ CAPACITY failed
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  sd 1:0:0:0: [sdb] Sense not available.
  sd 1:0:0:0: [sdb] Asking for cache data failed
  sd 1:0:0:0: [sdb] Assuming drive cache: write through
  sdb: detected capacity change from 750156374016 to 0
  raid10: sdb: unrecoverable I/O read error for block 1788594488
  Buffer I/O error on device dm-59, logical block 204023
  Buffer I/O error on device dm-59, logical block 204023
  Buffer I/O error on device dm-43, logical block 24845
  lost page write due to I/O error on dm-43
  Buffer I/O error on device dm-62, logical block 558722
  lost page write due to I/O error on dm-62
  Buffer I/O error on device dm-43, logical block 24846
  lost page write due to I/O error on dm-43
  RAID1 conf printout:
   --- wd:5 rd:6
   disk 0, wo:0, o:1, dev:sda1
   disk 1, wo:1, o:0, dev:sdb1
   disk 2, wo:0, o:1, dev:sdc1
  RAID1 conf printout:
   --- wd:5 rd:6
   disk 0, wo:0, o:1, dev:sda1
   disk 2, wo:0, o:1, dev:sdc1
   disk 3, wo:0, o:1, dev:sdd1
   disk 4, wo:0, o:1, dev:sde1
   disk 5, wo:0, o:1, dev:sdf1
  raid10: Disk failure on sdb2, disabling device.
  raid10: Operation continuing on 3 devices.
  raid10: sdb: unrecoverable I/O read error for block 0
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 1, wo:1, o:0, dev:sdb2
   disk 2, wo:0, o:1, dev:sdc2
   disk 3, wo:0, o:1, dev:sdd2
   disk 5, wo:0, o:1, dev:sdf2
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 2, wo:0, o:1, dev:sdc2
   disk 3, wo:0, o:1, dev:sdd2
   disk 5, wo:0, o:1, dev:sdf2
  md: md2: resync done.
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 1, wo:1, o:0, dev:sdb3
   disk 2, wo:0, o:1, dev:sdc3
   disk 3, wo:0, o:1, dev:sdd3
   disk 5, wo:0, o:1, dev:sdf3
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 2, wo:0, o:1, dev:sdc3
   disk 3, wo:0, o:1, dev:sdd3
   disk 5, wo:0, o:1, dev:sdf3
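
To tie the failed command to the sectors md complains about, the taskfile
bytes in the "cmd" line can be decoded into an LBA. Below is a rough Python
sketch, assuming the libata field order command/feature:nsect:lbal:lbam:lbah/
hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and a 48-bit command
(the EXT and NCQ opcodes). 28-bit commands such as c8 keep LBA bits 27:24 in
the device byte and are not handled here. The function name taskfile_lba is
made up for illustration.

```python
import re

# cmd CC/f:n:l:m:h/F:N:L:M:H/DD  (all fields are two hex digits)
TF_RE = re.compile(
    r"cmd ([0-9a-f]{2})"
    r"/([0-9a-f]{2}(?::[0-9a-f]{2}){4})"
    r"/([0-9a-f]{2}(?::[0-9a-f]{2}){4})"
    r"/([0-9a-f]{2})"
)


def taskfile_lba(line):
    """Return the 48-bit LBA encoded in a libata 'cmd' line, or None."""
    m = TF_RE.search(line)
    if m is None:
        return None
    # Current register values: feature, nsect, lbal, lbam, lbah.
    _, _, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    # Previous ("high order byte") values carry LBA bits 47:24.
    _, _, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(3).split(":")
    )
    return (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
```

For the NCQ write above (cmd 61/08:00:28:1f:80/00:00:00:00:00/40) this gives
0x801f28 = 8396584, which is the sector reported in the first end_request
I/O error on sdb.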

Cheers,

Chris.
