* No response?
@ 2005-01-20 17:55 David Dougall
2005-01-20 18:12 ` Peter T. Breuer
` (4 more replies)
0 siblings, 5 replies; 22+ messages in thread
From: David Dougall @ 2005-01-20 17:55 UTC (permalink / raw)
To: linux-raid
Perhaps I was asking a stupid question or an obvious one, but I have
received no response.
Maybe if I simplify the question...
If I am running software raid1 and a disk device starts throwing I/O
errors, Is the filesystem supposed to see any indication of this? I
thought software raid would mask all of this and just fail the drive.
I have servers with xfs as the filesystem and xfs will start to throw I/O
errors when a disk starts acting up even with software raid in between.
Please advise on how I can confirm my setup or, if this is possibly a bug,
how to diagnose it further.
If it makes a difference, I am running linux-2.4.26
Thanks
--David Dougall
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 17:55 No response? David Dougall
@ 2005-01-20 18:12 ` Peter T. Breuer
2005-01-20 18:14 ` Gordon Henderson
` (3 subsequent siblings)
4 siblings, 0 replies; 22+ messages in thread
From: Peter T. Breuer @ 2005-01-20 18:12 UTC (permalink / raw)
To: linux-raid

David Dougall <davidd@et.byu.edu> wrote:
> If I am running software raid1 and a disk device starts throwing I/O
> errors, Is the filesystem supposed to see any indication of this? I

No - not if the error is on only one disk. The first error will fault
the disk from the array and the driver will retry the read, and must
retry from another disk (the first is no longer there).

The actual i/o to the disks does not form part of the i/o to the raid
array itself, so there is little chance of contamination between the
two. The raid i/o is only acked back to the user when one of the disk
i/o's (on read) has succeeded.

> thought software raid would mask all of this and just fail the drive.
>
> I have servers with xfs as the filesystem and xfs will start to throw I/O
> errors when a disk starts acting up even with software raid in between.

That's strange. But not impossible - coding for error situations is
always difficult, and more difficult to test.

> If it makes a difference, I am running linux-2.4.26

Peter

^ permalink raw reply [flat|nested] 22+ messages in thread
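The read path Peter describes (fault the disk on the first error, retry
from another mirror, ack only once some disk read has succeeded) can be
sketched as a toy model. This is only an illustration, not the md
driver's code; the Disk and Raid1 classes and their method names are
made up for the example.

```python
class Disk:
    """Hypothetical stand-in for one mirror half; not a real kernel object."""

    def __init__(self, name, bad_sectors=()):
        self.name = name
        self.bad_sectors = set(bad_sectors)

    def read(self, sector):
        # Simulate a medium/hardware error on known-bad sectors.
        if sector in self.bad_sectors:
            raise IOError(f"I/O error: dev {self.name}, sector {sector}")
        return f"data@{sector}"


class Raid1:
    """Toy RAID1: the caller only sees an error when no mirror can answer."""

    def __init__(self, disks):
        self.active = list(disks)
        self.faulted = []

    def read(self, sector):
        while self.active:
            disk = self.active[0]
            try:
                # The array-level read is acknowledged only on success.
                return disk.read(sector)
            except IOError:
                # First error faults the disk; the retry goes to another mirror.
                self.active.remove(disk)
                self.faulted.append(disk)
        raise IOError(f"raid1: no working mirror left for sector {sector}")


md = Raid1([Disk("sda", bad_sectors={385}), Disk("sdb")])
print(md.read(385), [d.name for d in md.faulted])
```

In this model the filesystem above the array would see an error only
after every mirror has been faulted, which is the behaviour the replies
in this thread say should be expected.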
* Re: No response?
2005-01-20 17:55 No response? David Dougall
2005-01-20 18:12 ` Peter T. Breuer
@ 2005-01-20 18:14 ` Gordon Henderson
2005-01-20 18:37 ` Mark Bellon
2005-01-20 18:21 ` Mike Hardy
` (2 subsequent siblings)
4 siblings, 1 reply; 22+ messages in thread
From: Gordon Henderson @ 2005-01-20 18:14 UTC (permalink / raw)
To: David Dougall; +Cc: linux-raid

On Thu, 20 Jan 2005, David Dougall wrote:

> Perhaps I was asking a stupid question or an obvious one, but I have
> received no response.
> Maybe if I simplify the question...
>
> If I am running software raid1 and a disk device starts throwing I/O
> errors, Is the filesystem supposed to see any indication of this?

No..

> I
> thought software raid would mask all of this and just fail the drive.

It should.

> I have servers with xfs as the filesystem and xfs will start to throw I/O
> errors when a disk starts acting up even with software raid in between.
> Please advise on how I can confirm my setup or if this is possibly a bug
> how to diagnose further.

I've experienced long delays (30 seconds? It seemed longer) in a system
when a disk fails for a genuine reason - (I've deliberately run badblocks
on an md device when I knew one of the underlying devices had genuine bad
blocks) maybe the md code really tries hard to read the block, maybe the
underlying device driver tries really hard), but in these cases, I've seen
the system more or less freeze (all processes accessing that device
anyway) until the raid code decided to kick the device out of the array.

Maybe XFS has a timer and doesn't like devices to "go away" for a long
period of time?

> If it makes a difference, I am running linux-2.4.26

I've used 2.4.x for a long time - I did try xfs about a year ago, but
wasn't happy with it all (for various reasons).

Gordon

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 18:14 ` Gordon Henderson
@ 2005-01-20 18:37 ` Mark Bellon
2005-01-20 19:15 ` David Dougall
2005-01-20 19:37 ` Gordon Henderson
0 siblings, 2 replies; 22+ messages in thread
From: Mark Bellon @ 2005-01-20 18:37 UTC (permalink / raw)
To: Gordon Henderson; +Cc: David Dougall, linux-raid

Gordon Henderson wrote:

>On Thu, 20 Jan 2005, David Dougall wrote:
>
>>Perhaps I was asking a stupid question or an obvious one, but I have
>>received no response.
>>Maybe if I simplify the question...
>>
>>If I am running software raid1 and a disk device starts throwing I/O
>>errors, Is the filesystem supposed to see any indication of this?
>
>No..
>
>> I
>>thought software raid would mask all of this and just fail the drive.
>
>It should.
>
>>I have servers with xfs as the filesystem and xfs will start to throw I/O
>>errors when a disk starts acting up even with software raid in between.
>>Please advise on how I can confirm my setup or if this is possibly a bug
>>how to diagnose further.
>
>I've experienced long delays (30 seconds? It seemed longer) in a system
>when a disk fails for a genuine reason - (I've deliberately run badblocks
>on an md device when I knew one of the underlying devices had genuine bad
>blocks) maybe the md code really tries hard to read the block, maybe the
>underlying device driver tries really hard), but in these cases, I've seen
>the system more or less freeze (all processes accessing that device
>anyway) until the raid code decided to kick the device out of the array.

I've seen this too. The worst case can actually last for over 2 minutes.

We've been running with a patch to the RAID 1 driver that handles this
so critical applications do not hang for too long. Basically it uses
timers in the RAID 1 driver to force the disk to be treated as actually
having failed if it doesn't respond within a reasonable time (tunable
but usually ~3 seconds). It then handles the I/O requests coming back
async. and does the clean up.

>Maybe XFS has a timer and doesn't like devices to "go away" for a long period of time?

Not that I know of but I would need to look. Any XFS wizard's comments?

mark

>>If it makes a difference, I am running linux-2.4.26
>
>I've used 2.4.x for a long time - I did try xfs about a year ago, but
>wasn't happy with it all (for various reasons).
>
>Gordon
>-
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 18:37 ` Mark Bellon
@ 2005-01-20 19:15 ` David Dougall
2005-01-20 19:35 ` Mark Bellon
2005-01-20 19:37 ` Gordon Henderson
1 sibling, 1 reply; 22+ messages in thread
From: David Dougall @ 2005-01-20 19:15 UTC (permalink / raw)
To: Mark Bellon; +Cc: Gordon Henderson, linux-raid

Oooh, that ~3 second patch sounds very interesting. I actually think that
the theory about timeouts causing the problem is correct. I didn't
realize that applications/fs calls could stall for that long. My NFS
servers have a timeout themselves of about 10 seconds before they start to
try to shut things down.
--David Dougall

On Thu, 20 Jan 2005, Mark Bellon wrote:

> Gordon Henderson wrote:
>
> >On Thu, 20 Jan 2005, David Dougall wrote:
> >
> >>Perhaps I was asking a stupid question or an obvious one, but I have
> >>received no response.
> >>Maybe if I simplify the question...
> >>
> >>If I am running software raid1 and a disk device starts throwing I/O
> >>errors, Is the filesystem supposed to see any indication of this?
> >
> >No..
> >
> >> I
> >>thought software raid would mask all of this and just fail the drive.
> >
> >It should.
> >
> >>I have servers with xfs as the filesystem and xfs will start to throw I/O
> >>errors when a disk starts acting up even with software raid in between.
> >>Please advise on how I can confirm my setup or if this is possibly a bug
> >>how to diagnose further.
> >
> >I've experienced long delays (30 seconds? It seemed longer) in a system
> >when a disk fails for a genuine reason - (I've deliberately run badblocks
> >on an md device when I knew one of the underlying devices had genuine bad
> >blocks) maybe the md code really tries hard to read the block, maybe the
> >underlying device driver tries really hard), but in these cases, I've seen
> >the system more or less freeze (all processes accessing that device
> >anyway) until the raid code decided to kick the device out of the array.
>
> I've seen this too. The worst case can actually last for over 2 minutes.
>
> We've been running with a patch to the RAID 1 driver that handles this
> so critical applications do not hang for too long. Basically it uses
> timers in the RAID 1 driver to force the disk to be treated as actually
> having failed if it doesn't respond within a reasonable time (tunable
> but usually ~3 seconds). It then handles the I/O requests coming back
> async. and does the clean up.
>
> >Maybe XFS has a timer and doesn't like devices to "go away" for a long period of time?
>
> Not that I know of but I would need to look. Any XFS wizard's comments?
>
> mark
>
> >>If it makes a difference, I am running linux-2.4.26
> >
> >I've used 2.4.x for a long time - I did try xfs about a year ago, but
> >wasn't happy with it all (for various reasons).
> >
> >Gordon

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 19:15 ` David Dougall
@ 2005-01-20 19:35 ` Mark Bellon
0 siblings, 0 replies; 22+ messages in thread
From: Mark Bellon @ 2005-01-20 19:35 UTC (permalink / raw)
To: David Dougall; +Cc: Gordon Henderson, linux-raid

David Dougall wrote:

>Oooh, that ~3 second patch sounds very interesting. I actually think that
>the theory about timeouts causing the problem is correct. I didn't
>realize that applications/fs calls could stall for that long. My NFS
>servers have a timeout themselves of about 10 seconds before they start to
>try to shut things down.

I could generate one for 2.4.26 for you but I need a bit of time - I'm
running a 2.4.20 with a great many enhancements and there are a few
differences.

If there is interest I can post it to linux-raid too.

mark

>--David Dougall
>
>On Thu, 20 Jan 2005, Mark Bellon wrote:
>
>>Gordon Henderson wrote:
>>
>>>On Thu, 20 Jan 2005, David Dougall wrote:
>>>
>>>>Perhaps I was asking a stupid question or an obvious one, but I have
>>>>received no response.
>>>>Maybe if I simplify the question...
>>>>
>>>>If I am running software raid1 and a disk device starts throwing I/O
>>>>errors, Is the filesystem supposed to see any indication of this?
>>>
>>>No..
>>>
>>>>I
>>>>thought software raid would mask all of this and just fail the drive.
>>>
>>>It should.
>>>
>>>>I have servers with xfs as the filesystem and xfs will start to throw I/O
>>>>errors when a disk starts acting up even with software raid in between.
>>>>Please advise on how I can confirm my setup or if this is possibly a bug
>>>>how to diagnose further.
>>>
>>>I've experienced long delays (30 seconds? It seemed longer) in a system
>>>when a disk fails for a genuine reason - (I've deliberately run badblocks
>>>on an md device when I knew one of the underlying devices had genuine bad
>>>blocks) maybe the md code really tries hard to read the block, maybe the
>>>underlying device driver tries really hard), but in these cases, I've seen
>>>the system more or less freeze (all processes accessing that device
>>>anyway) until the raid code decided to kick the device out of the array.
>>
>>I've seen this too. The worst case can actually last for over 2 minutes.
>>
>>We've been running with a patch to the RAID 1 driver that handles this
>>so critical applications do not hang for too long. Basically it uses
>>timers in the RAID 1 driver to force the disk to be treated as actually
>>having failed if it doesn't respond within a reasonable time (tunable
>>but usually ~3 seconds). It then handles the I/O requests coming back
>>async. and does the clean up.
>>
>>>Maybe XFS has a timer and doesn't like devices to "go away" for a long period of time?
>>
>>Not that I know of but I would need to look. Any XFS wizard's comments?
>>
>>mark
>>
>>>>If it makes a difference, I am running linux-2.4.26
>>>
>>>I've used 2.4.x for a long time - I did try xfs about a year ago, but
>>>wasn't happy with it all (for various reasons).
>>>
>>>Gordon

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 18:37 ` Mark Bellon
2005-01-20 19:15 ` David Dougall
@ 2005-01-20 19:37 ` Gordon Henderson
2005-01-20 19:41 ` Mark Bellon
1 sibling, 1 reply; 22+ messages in thread
From: Gordon Henderson @ 2005-01-20 19:37 UTC (permalink / raw)
To: Mark Bellon; +Cc: linux-raid

On Thu, 20 Jan 2005, Mark Bellon wrote:

> I've seen this too. The worst case can actually last for over 2 minutes.
>
> We've been running with a patch to the RAID 1 driver that handles this
> so critical applications do not hang for too long. Basically it uses
> timers in the RAID 1 driver to force the disk to be treated as actually
> having failed if it doesn't respond within a reasonable time (tunable
> but usually ~3 seconds). It then handles the I/O requests coming back
> async. and does the clean up.

This is interesting, but make it an option (kernel compile, sysctl,
etc.)... I have a small home server/firewall that I run with the disks
spun down (noflushd) and spinning up a disk sometimes takes 8 seconds -
it's a RAID-1 set and seems to cope OK with the disks spinning down & up
again as required...

Gordon

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 19:37 ` Gordon Henderson
@ 2005-01-20 19:41 ` Mark Bellon
2005-01-20 19:49 ` David Dougall
0 siblings, 1 reply; 22+ messages in thread
From: Mark Bellon @ 2005-01-20 19:41 UTC (permalink / raw)
To: Gordon Henderson; +Cc: linux-raid

Gordon Henderson wrote:

>On Thu, 20 Jan 2005, Mark Bellon wrote:
>
>>I've seen this too. The worst case can actually last for over 2 minutes.
>>
>>We've been running with a patch to the RAID 1 driver that handles this
>>so critical applications do not hang for too long. Basically it uses
>>timers in the RAID 1 driver to force the disk to be treated as actually
>>having failed if it doesn't respond within a reasonable time (tunable
>>but usually ~3 seconds). It then handles the I/O requests coming back
>>async. and does the clean up.
>
>This is interesting, but make it an option (kernel compile, sysctl,
>etc.)... I have a small home server/firewall that I run with the disks
>spun down (noflushd) and spinning up a disk sometimes takes 8 seconds -
>it's a RAID-1 set and seems to cope OK with the disks spinning down & up
>again as required...

The current patch has config options to adjust the
Non-Responsive-Disk-Timer. A zero specifies no timeout and a non-zero
value is the timeout in seconds.

Let me pull a 2.4.26 kernel source and see how fast I can work up a
patch. Or would it be better to generate it against 2.4.29?

mark

>Gordon

^ permalink raw reply [flat|nested] 22+ messages in thread
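The timer semantics Mark describes (zero means no timeout, a non-zero
value is the timeout in seconds, and a disk that does not answer in time
is treated as failed) can be illustrated with a user-space sketch. This
is not the patch, which was not posted in this thread; read_with_timer
and the two fake read functions are hypothetical names, and the delays
are scaled down so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def read_with_timer(mirrors, sector, timeout_s):
    """Read `sector` from the first mirror that answers in time.

    mirrors   -- list of (name, read_fn) pairs (hypothetical interface)
    timeout_s -- 0 disables the timer; otherwise seconds to wait per disk
    Returns (data, names_of_mirrors_treated_as_failed).
    """
    faulted = []
    wait = timeout_s if timeout_s > 0 else None  # None: wait indefinitely
    for name, read_fn in mirrors:
        future = _pool.submit(read_fn, sector)
        try:
            return future.result(timeout=wait), faulted
        except FutureTimeout:
            # Disk did not respond in time: treat it as failed. Its late
            # reply, if any, is simply discarded in this sketch.
            faulted.append(name)
        except IOError:
            faulted.append(name)
    raise IOError("no mirror answered")


def slow_read(sector):
    time.sleep(0.3)  # stands in for a disk that is hanging or spinning up
    return "slow"


def fast_read(sector):
    return "fast"


print(read_with_timer([("sda", slow_read), ("sdb", fast_read)], 385, 0.05))
print(read_with_timer([("sda", slow_read), ("sdb", fast_read)], 385, 0))
```

The second call shows why Gordon asks for the timer to be optional: with
the timer disabled the slow disk is simply waited for, whereas a timer
shorter than a normal spin-up would fault a healthy disk.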
* Re: No response?
2005-01-20 19:41 ` Mark Bellon
@ 2005-01-20 19:49 ` David Dougall
0 siblings, 0 replies; 22+ messages in thread
From: David Dougall @ 2005-01-20 19:49 UTC (permalink / raw)
To: Mark Bellon; +Cc: Gordon Henderson, linux-raid

It would certainly be easier for me if it were 2.4.26. The other 2
patches I have to apply have only been released for 2.4.26 and 2.4.25-pre9
or something like that. If it is better for everyone else to do 2.4.29, I
can just backport it to my kernel.
Thanks
--David Dougall

On Thu, 20 Jan 2005, Mark Bellon wrote:

> Gordon Henderson wrote:
>
> >On Thu, 20 Jan 2005, Mark Bellon wrote:
> >
> >>I've seen this too. The worst case can actually last for over 2 minutes.
> >>
> >>We've been running with a patch to the RAID 1 driver that handles this
> >>so critical applications do not hang for too long. Basically it uses
> >>timers in the RAID 1 driver to force the disk to be treated as actually
> >>having failed if it doesn't respond within a reasonable time (tunable
> >>but usually ~3 seconds). It then handles the I/O requests coming back
> >>async. and does the clean up.
> >
> >This is interesting, but make it an option (kernel compile, sysctl,
> >etc.)... I have a small home server/firewall that I run with the disks
> >spun down (noflushd) and spinning up a disk sometimes takes 8 seconds -
> >it's a RAID-1 set and seems to cope OK with the disks spinning down & up
> >again as required...
>
> The current patch has config options to adjust the
> Non-Responsive-Disk-Timer. A zero specifies no timeout and a non-zero
> value is the timeout in seconds.
>
> Let me pull a 2.4.26 kernel source and see how fast I can work up a
> patch. Or would it be better to generate it against 2.4.29?
>
> mark
>
> >Gordon

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 17:55 No response? David Dougall
2005-01-20 18:12 ` Peter T. Breuer
2005-01-20 18:14 ` Gordon Henderson
@ 2005-01-20 18:21 ` Mike Hardy
2005-01-20 18:30 ` Mario Holbe
2005-01-20 18:49 ` Kanoa Withington
4 siblings, 0 replies; 22+ messages in thread
From: Mike Hardy @ 2005-01-20 18:21 UTC (permalink / raw)
To: linux-raid

This hasn't been my experience, and I just had a drive journey across
the River Styx a few days back. It was in a RAID1 mirror, and while I got
some log messages about it, and smartd and mdadm both sent me email, the
software raid device and the machine in general both kept ticking along.

That was 2.6.10 and ext3, but I've also had this experience in the 2.4
series, with ext3.

Either it's an xfs interaction, or there's something else going on. There
was mention recently of RAID1 corruption, but I believe that was in
reference to 2.6.10, and there have only been two reports, where I'd
expect a storm of reports if it was occurring to more people.

Regardless, perhaps it's possible to test it yourself by assembling a
couple of disk files into loopback bindings with LVM's faulty block
support on them, and finally a raid mirror on top. Sounds a bit like a
house of cards, but you'd be able to simulate a failing drive that way
with xfs on it, and see how the mirror and filesystem react.

-Mike

David Dougall wrote:
> Perhaps I was asking a stupid question or an obvious one, but I have
> received no response.
> Maybe if I simplify the question...
>
> If I am running software raid1 and a disk device starts throwing I/O
> errors, Is the filesystem supposed to see any indication of this? I
> thought software raid would mask all of this and just fail the drive.
>
> I have servers with xfs as the filesystem and xfs will start to throw I/O
> errors when a disk starts acting up even with software raid in between.
> Please advise on how I can confirm my setup or if this is possibly a bug
> how to diagnose further.
> If it makes a difference, I am running linux-2.4.26
> Thanks
> --David Dougall

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 17:55 No response? David Dougall
` (2 preceding siblings ...)
2005-01-20 18:21 ` Mike Hardy
@ 2005-01-20 18:30 ` Mario Holbe
2005-01-20 18:57 ` David Dougall
2005-01-20 18:49 ` Kanoa Withington
4 siblings, 1 reply; 22+ messages in thread
From: Mario Holbe @ 2005-01-20 18:30 UTC (permalink / raw)
To: linux-raid

David Dougall <davidd@et.byu.edu> wrote:
> If I am running software raid1 and a disk device starts throwing I/O
> errors, Is the filesystem supposed to see any indication of this? I

Usually this should not happen. Presumed a) this device is not the
only active device in this RAID1 and b) this device is the only
failing one.

> I have servers with xfs as the filesystem and xfs will start to throw I/O
> errors when a disk starts acting up even with software raid in between.

It could be helpful to show the messages appearing (dmesg), the
RAID setup (cat /proc/mdstat) and the mount (cat /etc/fstab /etc/mtab
or /proc/mounts).


regards,
Mario
--
<jv> Oh well, config
<jv> one actually wonders what force in the universe is holding it
<jv> and makes it working
<Beeth> chances and accidents :)

^ permalink raw reply [flat|nested] 22+ messages in thread
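When looking at the /proc/mdstat output Mario asks for, the points of
interest are the (F) marker after a member device and the
[total/working] counts on the following line. A small sketch of reading
those fields is below. The sample text is invented for illustration and
is not output from David's servers; the function name is likewise
hypothetical.

```python
import re

# Illustrative sample only; NOT taken from the servers in this thread.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: {"failed": [devices], "degraded": bool}}."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+) : ", line)
        if header:
            current = header.group(1)
            # Member devices flagged as faulty carry an (F) suffix.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", line)
            result[current] = {"failed": failed, "degraded": bool(failed)}
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            total, working = int(counts.group(1)), int(counts.group(2))
            if working < total:
                result[current]["degraded"] = True
    return result


print(degraded_arrays(SAMPLE_MDSTAT))
```

If the array containing the failing disk still shows all members up
while the kernel log is full of I/O errors for that disk, that would
suggest md never saw the errors, which is worth checking against the
actual device stacking.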
* Re: No response?
2005-01-20 18:30 ` Mario Holbe
@ 2005-01-20 18:57 ` David Dougall
2005-01-20 19:12 ` Kanoa Withington
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: David Dougall @ 2005-01-20 18:57 UTC (permalink / raw)
To: Mario Holbe; +Cc: linux-raid

The following appears to be relevant information from the syslog file:

Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:06 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:06 linux-sg2 kernel: I/O error: dev 08:10, sector 314179976
Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:06 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:06 linux-sg2 kernel: I/O error: dev 08:10, sector 314179969
Jan 10 11:56:06 linux-sg2 kernel: XFS: device device-mapper(254,1)- XFS write error in file system meta-data block 0x2bb20008 in device-mapper(254,1)
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 144067
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 62129592
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 144131
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 104726920
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 385
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 157090184
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 209453448
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 343219280
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 392
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 104726913
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 144143
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 157090177
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 209453441
Jan 10 11:56:08 linux-sg2 kernel: I/O error in filesystem ("device-mapper(254,1)") meta-data dev device-mapper(254,1) block 0x18fa318f ("xlog_iodone") error 5 buf count 2048
Jan 10 11:56:08 linux-sg2 kernel: xfs_force_shutdown(device-mapper(254,1),0x2) called from line 966 of file xfs_log.c. Return address = 0xc0246d9b
Jan 10 11:56:08 linux-sg2 kernel: Filesystem "device-mapper(254,1)": Log I/O Error Detected. Shutting down filesystem: device-mapper(254,1)
Jan 10 11:56:08 linux-sg2 kernel: Please umount the filesystem, and rectify the problem(s)


I don't see any error messages from md in any of these logs.
--David Dougall


On Thu, 20 Jan 2005, Mario Holbe wrote:

> David Dougall <davidd@et.byu.edu> wrote:
> > If I am running software raid1 and a disk device starts throwing I/O
> > errors, Is the filesystem supposed to see any indication of this? I
>
> Usually this should not happen. Presumed a) this device is not the
> only active device in this RAID1 and b) this device is the only
> failing one.
>
> > I have servers with xfs as the filesystem and xfs will start to throw I/O
> > errors when a disk starts acting up even with software raid in between.
>
> It could be helpful to show the messages appearing (dmesg), the
> RAID setup (cat /proc/mdstat) and the mount (cat /etc/fstab /etc/mtab
> or /proc/mounts).
>
>
> regards,
> Mario
> --
> <jv> Oh well, config
> <jv> one actually wonders what force in the universe is holding it
> <jv> and makes it working
> <Beeth> chances and accidents :)

^ permalink raw reply [flat|nested] 22+ messages in thread
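A log like the one above can be summarised mechanically: count the
block-layer I/O errors per device and check whether md or raid1 logged
anything at all. The sketch below does this on a short excerpt. It
assumes that the 2.4 "dev 08:10" notation is hexadecimal major:minor and
that SCSI disks use major 8 with 16 minors per disk, so 08:10 would be
/dev/sdb; treat both as assumptions to verify on the machine itself.
The function names are made up for the example.

```python
import re
from collections import Counter

# Short excerpt of the syslog lines quoted in this thread.
LOG_EXCERPT = """\
Jan 10 11:56:06 linux-sg2 kernel: I/O error: dev 08:10, sector 314179976
Jan 10 11:56:06 linux-sg2 kernel: I/O error: dev 08:10, sector 314179969
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 144067
Jan 10 11:56:08 linux-sg2 kernel: Filesystem "device-mapper(254,1)": Log I/O Error Detected. Shutting down filesystem: device-mapper(254,1)
"""

IO_ERROR = re.compile(r"I/O error: dev ([0-9a-f]{2}):([0-9a-f]{2}), sector (\d+)")


def summarize(log_text):
    """Return (Counter of errors per (major, minor), number of md/raid1 lines)."""
    errors = Counter()
    md_lines = 0
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            # Assumption: major and minor are printed in hexadecimal.
            major, minor = int(match.group(1), 16), int(match.group(2), 16)
            errors[(major, minor)] += 1
        # Assumption: md messages start with "md" or "raid1:" after "kernel: ".
        if re.search(r"kernel: (md|raid1)\b", line):
            md_lines += 1
    return errors, md_lines


def sd_name(major, minor):
    """Map a SCSI disk (major 8) minor number to a name such as sdb or sdb1."""
    if major != 8:
        raise ValueError("only major 8 (sd) is handled in this sketch")
    disk = chr(ord("a") + minor // 16)
    part = minor % 16
    return f"sd{disk}" + (str(part) if part else "")


errors, md_lines = summarize(LOG_EXCERPT)
for (major, minor), count in errors.items():
    print(sd_name(major, minor), count, "I/O errors;", md_lines, "md lines")
```

On this excerpt every error is on the same whole-disk device and there
are no md lines, which matches David's observation that md logged
nothing while XFS, mounted on a device-mapper device, shut down.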
* Re: No response?
2005-01-20 18:57 ` David Dougall
@ 2005-01-20 19:12 ` Kanoa Withington
2005-01-20 19:17 ` David Dougall
2005-01-20 19:18 ` Guy
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Kanoa Withington @ 2005-01-20 19:12 UTC (permalink / raw)
To: David Dougall; +Cc: Mario Holbe, linux-raid

Yes, that's a standard XFS timeout and shutdown. If your second disk
is on the same SCSI channel try moving it to a different one,
preferably a different controller altogether.

Your disk 08:10 does have real problems, but they are separate from
the XFS shutdown which should be prevented by the MD layer.

-Kanoa

On Thu, 20 Jan 2005, David Dougall wrote:

> return code = 8000002
> Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
> Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 209453441
> Jan 10 11:56:08 linux-sg2 kernel: I/O error in filesystem ("device-mapper(254,1)") meta-data dev device-mapper(254,1) block 0x18fa318f ("xlog_iodone") error 5 buf count 2048
> Jan 10 11:56:08 linux-sg2 kernel: xfs_force_shutdown(device-mapper(254,1),0x2) called from line 966 of file xfs_log.c. Return address = 0xc0246d9b
> Jan 10 11:56:08 linux-sg2 kernel: Filesystem "device-mapper(254,1)": Log I/O Error Detected. Shutting down filesystem: device-mapper(254,1)
> Jan 10 11:56:08 linux-sg2 kernel: Please umount the filesystem, and rectify the problem(s)
>
>
> I don't see any error messages from md in any of these logs.
> --David Dougall

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 19:12 ` Kanoa Withington
@ 2005-01-20 19:17 ` David Dougall
2005-01-20 19:23 ` Guy
2005-01-20 19:34 ` Kanoa Withington
0 siblings, 2 replies; 22+ messages in thread
From: David Dougall @ 2005-01-20 19:17 UTC (permalink / raw)
To: Kanoa Withington; +Cc: Mario Holbe, linux-raid

By "different controller" do you mean HBA controller or disk controller?
The disk devices are on completely different jbods. They are both through
the same HBA (the server only has 1 PCI slot).
--David Dougall


On Thu, 20 Jan 2005, Kanoa Withington wrote:

>
> Yes, that's a standard XFS timeout and shutdown. If your second disk
> is on the same SCSI channel try moving it to a different one,
> preferably a different controller altogether.
>
> Your disk 08:10 does have real problems, but they are separate from
> the XFS shutdown which should be prevented by the MD layer.
>
> -Kanoa
>
> On Thu, 20 Jan 2005, David Dougall wrote:
>
> > return code = 8000002
> > Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
> > Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 209453441
> > Jan 10 11:56:08 linux-sg2 kernel: I/O error in filesystem ("device-mapper(254,1)") meta-data dev device-mapper(254,1) block 0x18fa318f ("xlog_iodone") error 5 buf count 2048
> > Jan 10 11:56:08 linux-sg2 kernel: xfs_force_shutdown(device-mapper(254,1),0x2) called from line 966 of file xfs_log.c. Return address = 0xc0246d9b
> > Jan 10 11:56:08 linux-sg2 kernel: Filesystem "device-mapper(254,1)": Log I/O Error Detected. Shutting down filesystem: device-mapper(254,1)
> > Jan 10 11:56:08 linux-sg2 kernel: Please umount the filesystem, and rectify the problem(s)
> >
> >
> > I don't see any error messages from md in any of these logs.
> > --David Dougall

^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: No response?
2005-01-20 19:17 ` David Dougall
@ 2005-01-20 19:23 ` Guy
2005-01-20 19:34 ` Kanoa Withington
1 sibling, 0 replies; 22+ messages in thread
From: Guy @ 2005-01-20 19:23 UTC (permalink / raw)
To: 'David Dougall', 'Kanoa Withington'
Cc: 'Mario Holbe', linux-raid

At least: Different SCSI or IDE bus.

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Dougall
Sent: Thursday, January 20, 2005 2:18 PM
To: Kanoa Withington
Cc: Mario Holbe; linux-raid@vger.kernel.org
Subject: Re: No response?

By "different controller" do you mean HBA controller or disk controller?
The disk devices are on completely different jbods. They are both through
the same HBA (the server only has 1 PCI slot).
--David Dougall


On Thu, 20 Jan 2005, Kanoa Withington wrote:

>
> Yes, that's a standard XFS timeout and shutdown. If your second disk
> is on the same SCSI channel try moving it to a different one,
> preferably a different controller altogether.
>
> Your disk 08:10 does have real problems, but they are separate from
> the XFS shutdown which should be prevented by the MD layer.
>
> -Kanoa
>
> On Thu, 20 Jan 2005, David Dougall wrote:
>
> > return code = 8000002
> > Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
> > Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 209453441
> > Jan 10 11:56:08 linux-sg2 kernel: I/O error in filesystem ("device-mapper(254,1)") meta-data dev device-mapper(254,1) block 0x18fa318f ("xlog_iodone") error 5 buf count 2048
> > Jan 10 11:56:08 linux-sg2 kernel: xfs_force_shutdown(device-mapper(254,1),0x2) called from line 966 of file xfs_log.c. Return address = 0xc0246d9b
> > Jan 10 11:56:08 linux-sg2 kernel: Filesystem "device-mapper(254,1)": Log I/O Error Detected. Shutting down filesystem: device-mapper(254,1)
> > Jan 10 11:56:08 linux-sg2 kernel: Please umount the filesystem, and rectify the problem(s)
> >
> >
> > I don't see any error messages from md in any of these logs.
> > --David Dougall

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: No response?
2005-01-20 19:17 ` David Dougall
2005-01-20 19:23 ` Guy
@ 2005-01-20 19:34 ` Kanoa Withington
2005-01-20 19:44 ` Mark Bellon
1 sibling, 1 reply; 22+ messages in thread
From: Kanoa Withington @ 2005-01-20 19:34 UTC (permalink / raw)
To: David Dougall; +Cc: Mario Holbe, linux-raid

Ideally a different HBA altogether, but a different channel on a
multichannel HBA at a minimum. If your SCSI card is not a multichannel
card, think about getting one or think about a completely different
arrangement.

It may be possible to tune the HBA reset behavior or the XFS timeout
threshold but as a matter of principle when constructing disk mirrors
you should try to keep the disks as separate as possible. You should
only need to tune, tweak or patch if you are trying to do something
unusual - which you are not.

In the short term, unplug the failing disk:

Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47

You are better off without it if your system is crashing.

-Kanoa

On Thu, 20 Jan 2005, David Dougall wrote:

> By "different controller" do you mean HBA controller or disk controller?
> The disk devices are on completely different jbods. They are both through
> the same HBA(the server only has 1 PCI slot)
> --David Dougall
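The ranking Kanoa gives above (different HBA, else a different channel, else the same channel) can be sketched as a comparison of SCSI addresses. This is only an illustration: the function names are made up for this example, the address of the second mirror half is hypothetical, and it assumes the kernel's "host" number corresponds to one HBA, which may not hold for every driver or multichannel card.

```python
import re

# Matches the address part of a 2.4-style message such as
# "SCSI disk error : host 0 channel 0 id 0 lun 47".
ADDRESS_RE = re.compile(r"host (\d+) channel (\d+) id (\d+) lun (\d+)")


def parse_scsi_address(log_line):
    """Return (host, channel, id, lun) parsed from a kernel log line."""
    match = ADDRESS_RE.search(log_line)
    if match is None:
        raise ValueError("no SCSI address found in: %r" % log_line)
    return tuple(int(field) for field in match.groups())


def separation(addr_a, addr_b):
    """Classify how far apart two mirror halves are on the SCSI side.

    Assumption: equal host numbers mean the same HBA.
    """
    if addr_a[0] != addr_b[0]:
        return "different HBA"
    if addr_a[1] != addr_b[1]:
        return "same HBA, different channel"
    return "same HBA and channel"


if __name__ == "__main__":
    failing = parse_scsi_address(
        "Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : "
        "host 0 channel 0 id 0 lun 47"
    )
    other_half = (0, 0, 1, 12)  # hypothetical address of the healthy disk
    print(failing, other_half, separation(failing, other_half))
```

With the hypothetical second address above, both halves share host 0 and channel 0, which is the arrangement the message advises against.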
* Re: No response?
2005-01-20 19:34 ` Kanoa Withington
@ 2005-01-20 19:44 ` Mark Bellon
0 siblings, 0 replies; 22+ messages in thread
From: Mark Bellon @ 2005-01-20 19:44 UTC (permalink / raw)
To: Kanoa Withington; +Cc: David Dougall, Mario Holbe, linux-raid

Kanoa Withington wrote:

>Ideally a different HBA altogether, but a different channel on a
>multichannel HBA at a minimum. If your SCSI card is not a multichannel
>card, think about getting one or think about a completely different
>arrangement.
>
>It may be possible to tune the HBA reset behavior or the XFS timeout
>threshold but as a matter of principle when constructing disk mirrors
>you should try to keep the disks as separate as possible. You should
>only need to tune, tweak or patch if you are trying to do something
>unusual - which you are not.

Very true. The default parameters for SCSI (5 retries as I recall) can
take a very long time when a SCSI bus reset is called for (settle times
and such) - I've seen 2+ minutes. Even with totally redundant
controllers a logical I/O (to the RAID) could be held up waiting for a
physical I/O by this long. The XFS parameter would need to be raised
above the threshold.

mark
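Mark's point about retries and bus resets adding up can be put as back-of-the-envelope arithmetic. The sketch below is an assumption-laden illustration, not kernel behavior: the function names are invented, and the per-command timeout, settle time, and filesystem-side threshold are hypothetical numbers chosen only to show the calculation. Only the figure of 5 retries comes from the message above.

```python
def worst_case_stall_seconds(retries, command_timeout_s, bus_resets, reset_settle_s):
    """Rough upper bound on how long one physical I/O can be held up.

    Assumptions (not taken from the thread): every attempt runs to its
    full timeout, and each bus reset adds a fixed settle time.
    """
    attempts = retries + 1  # the initial attempt plus each retry
    return attempts * command_timeout_s + bus_resets * reset_settle_s


def outlasts_threshold(stall_s, threshold_s):
    """True if the stall would exceed an upper-layer timeout threshold."""
    return stall_s > threshold_s


if __name__ == "__main__":
    # 5 retries is the figure recalled in the message; the rest is hypothetical.
    stall = worst_case_stall_seconds(
        retries=5, command_timeout_s=30.0, bus_resets=1, reset_settle_s=10.0
    )
    print("worst-case stall: %.0f s" % stall)
    print("exceeds a hypothetical 60 s threshold:", outlasts_threshold(stall, 60.0))
```

Under these made-up numbers the stall comes out above two minutes, which is the same order of magnitude as the delays described in the message.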
* RE: No response?
2005-01-20 18:57 ` David Dougall
2005-01-20 19:12 ` Kanoa Withington
@ 2005-01-20 19:18 ` Guy
2005-01-20 19:24 ` Peter T. Breuer
2005-01-20 19:28 ` Mark Bellon
3 siblings, 0 replies; 22+ messages in thread
From: Guy @ 2005-01-20 19:18 UTC (permalink / raw)
To: 'David Dougall', 'Mario Holbe'; +Cc: linux-raid

Are you sure it is RAID? Maybe hardware RAID?

Send the output of these commands:
cat /proc/mdstat
df
mdadm -D /dev/md?

If using LVM:
vgdisplay -v

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Dougall
Sent: Thursday, January 20, 2005 1:57 PM
To: Mario Holbe
Cc: linux-raid@vger.kernel.org
Subject: Re: No response?

The following appears to be relevant information from the syslog file:

Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:06 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:06 linux-sg2 kernel: I/O error: dev 08:10, sector 314179976
Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:06 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:06 linux-sg2 kernel: I/O error: dev 08:10, sector 314179969
Jan 10 11:56:06 linux-sg2 kernel: XFS: device device-mapper(254,1)- XFS write error in file system meta-data block 0x2bb20008 in device-mapper(254,1)
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 144067
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 62129592
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 144131
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 104726920
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:07 linux-sg2 kernel: I/O error: dev 08:10, sector 385
Jan 10 11:56:07 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:07 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 157090184
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 209453448
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 343219280
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 392
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 104726913
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 144143
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 157090177
Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47 return code = 8000002
Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10: sense key Hardware Error
Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 209453441
Jan 10 11:56:08 linux-sg2 kernel: I/O error in filesystem ("device-mapper(254,1)") meta-data dev device-mapper(254,1) block 0x18fa318f ("xlog_iodone") error 5 buf count 2048
Jan 10 11:56:08 linux-sg2 kernel: xfs_force_shutdown(device-mapper(254,1),0x2) called from line 966 of file xfs_log.c. Return address = 0xc0246d9b
Jan 10 11:56:08 linux-sg2 kernel: Filesystem "device-mapper(254,1)": Log I/O Error Detected. Shutting down filesystem: device-mapper(254,1)
Jan 10 11:56:08 linux-sg2 kernel: Please umount the filesystem, and rectify the problem(s)

I don't see any error messages from md in any of these logs.
--David Dougall

On Thu, 20 Jan 2005, Mario Holbe wrote:

> David Dougall <davidd@et.byu.edu> wrote:
> > If I am running software raid1 and a disk device starts throwing I/O
> > errors, Is the filesystem supposed to see any indication of this? I
>
> Usually this should not happen. Presumed a) this device is not the
> only active device in this RAID1 and b) this device is the only
> failing one.
>
> > I have servers with xfs as the filesystem and xfs will start to throw I/O
> > errors when a disk starts acting up even with software raid in between.
>
> It could be helpful to show the messages appearing (dmesg), the
> RAID setup (cat /proc/mdstat) and the mount (cat /etc/fstab /etc/mtab
> or /proc/mounts).
>
>
> regards,
> Mario
> --
> <jv> Oh well, config
> <jv> one actually wonders what force in the universe is holding it
> <jv> and makes it working
> <Beeth> chances and accidents :)
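The first command Guy asks for, cat /proc/mdstat, is also the quickest way to see whether md has actually failed a mirror half. As a rough sketch of how that output can be read, the snippet below scans mdstat-style text for arrays whose status shows a missing member. The sample text is invented for illustration (it is not from David's servers), the function name is made up, and the layout assumed is the common "[n/m] [UU]" status line; other kernels may format it differently.

```python
import re

# Hypothetical /proc/mdstat contents, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      104320 blocks [2/1] [U_]

md1 : active raid1 sdd1[1] sdc1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status line shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status looks like "[2/1] [U_]": configured/working counts, then
        # one letter per member, with "_" marking a missing or failed one.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3) or status.group(1) != status.group(2):
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system the same function could be fed the text read from /proc/mdstat. If a disk is throwing errors and no array shows up as degraded, that would suggest md has not failed the drive, or that the filesystem is not sitting on an md device at all.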
* Re: No response?
2005-01-20 18:57 ` David Dougall
2005-01-20 19:12 ` Kanoa Withington
2005-01-20 19:18 ` Guy
@ 2005-01-20 19:24 ` Peter T. Breuer
2005-01-20 19:51 ` David Dougall
2005-01-20 19:28 ` Mark Bellon
3 siblings, 1 reply; 22+ messages in thread
From: Peter T. Breuer @ 2005-01-20 19:24 UTC (permalink / raw)
To: linux-raid

David Dougall <davidd@et.byu.edu> wrote:
> Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0
> lun 47
>  return code = 8000002

That is sda.

> Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 343219280

Well, I don't really understand - that is sdb, no? No? (or sda10 if the
numbers are in decimal instead of hex).

I don't know why sda and sdb seem to be confused! The "lun 47" also has
me wondering!

> Jan 10 11:56:08 linux-sg2 kernel: Info fld=0xc7c0181, Current sd08:10:
> sense key
>  Hardware Error
> Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 392

sdb, very insistently, at a completely different sector. Looks belly
up.

> Jan 10 11:56:08 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0
> lun 47
>  return code = 8000002

But 0:0:0 should be sda!

> Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 209453441
> Jan 10 11:56:08 linux-sg2 kernel: I/O error in filesystem ("device-mapper(254,1)") meta-data dev device-mapper(254,1) block 0x18fa318f ("xlog_iodone") error 5 buf count 2048
> Jan 10 11:56:08 linux-sg2 kernel: xfs_force_shutdown(device-mapper(254,1),0x2) called from line 966 of file xfs_log.c. Return address = 0xc0246d9b
> Jan 10 11:56:08 linux-sg2 kernel: Filesystem "device-mapper(254,1)": Log I/O Error Detected. Shutting down filesystem: device-mapper(254,1)
> Jan 10 11:56:08 linux-sg2 kernel: Please umount the filesystem, and rectify the problem(s)
>
>
> I don't see any error messages from md in any of these logs.

Why is the device mapper implicated? Doesn't that mean that you are
using LVM and not raid? 2.4.26, you said? Hmm.

Peter
* Re: No response?
2005-01-20 19:24 ` Peter T. Breuer
@ 2005-01-20 19:51 ` David Dougall
0 siblings, 0 replies; 22+ messages in thread
From: David Dougall @ 2005-01-20 19:51 UTC (permalink / raw)
To: Peter T. Breuer; +Cc: linux-raid

Unfortunately my email client wrapped the line. It is not sda. It is
actually sdb.
--David Dougall

On Thu, 20 Jan 2005, Peter T. Breuer wrote:

> David Dougall <davidd@et.byu.edu> wrote:
> > Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0
> > lun 47
> >  return code = 8000002
>
> That is sda.
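The "dev 08:10" reading debated in the two messages above can be checked with a few lines of arithmetic. The sketch below assumes the conventional Linux numbering for the first SCSI disk major (major 8, sixteen minors per disk, minor 0 of each group being the whole disk) and that the kernel prints the pair in hexadecimal; the function names are invented for this example, and the decimal case is included only to show the alternative reading Peter mentions.

```python
def sd_name(major, minor):
    """Map a (major, minor) pair on SCSI disk major 8 to an sdX[N] name.

    Assumes the conventional layout: 16 minors per disk, minor 0 of each
    group being the whole disk and 1-15 its partitions.
    """
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk_index, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk_index)
    return name + (str(partition) if partition else "")


def decode_dev(dev, base=16):
    """Decode a "MM:mm" pair as printed in the kernel log."""
    major_text, minor_text = dev.split(":")
    return sd_name(int(major_text, base), int(minor_text, base))


if __name__ == "__main__":
    print(decode_dev("08:10"))           # read as hex
    print(decode_dev("08:10", base=10))  # read as decimal
```

Read as hex, 08:10 is minor 16, the whole second disk (sdb), which matches David's correction. Read as decimal it would be minor 10, the tenth partition of the first disk (sda10).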
* Re: No response?
2005-01-20 18:57 ` David Dougall
` (2 preceding siblings ...)
2005-01-20 19:24 ` Peter T. Breuer
@ 2005-01-20 19:28 ` Mark Bellon
3 siblings, 0 replies; 22+ messages in thread
From: Mark Bellon @ 2005-01-20 19:28 UTC (permalink / raw)
To: David Dougall; +Cc: Mario Holbe, linux-raid

This looks like MD did its thing properly and there is a timeout within
XFS.

mark
* Re: No response?
2005-01-20 17:55 No response? David Dougall
` (3 preceding siblings ...)
2005-01-20 18:30 ` Mario Holbe
@ 2005-01-20 18:49 ` Kanoa Withington
4 siblings, 0 replies; 22+ messages in thread
From: Kanoa Withington @ 2005-01-20 18:49 UTC (permalink / raw)
To: David Dougall; +Cc: linux-raid

Hi David,

I have several systems with similar kernels, raid1 mirrors and XFS
filesystems and I don't have the problem you are talking about so
there is no inherent incompatibility.

XFS will, however, shut down a filesystem if an I/O timeout is
reached. I have seen this on a system with SCSI drives where the host
controller keeps resetting itself to try to get a failing disk back.
If both disks are on the same SCSI channel, the card resets can
sometimes be long enough to trip the XFS timeout.

The solution is to make sure your mirror elements are on different
host controllers, no matter what type of interface (SCSI/IDE/etc.)
they are using.

After an XFS timeout you can unmount and remount the filesystem,
unless of course it is the root filesystem.

-Kanoa

On Thu, 20 Jan 2005, David Dougall wrote:

> Perhaps I was asking a stupid question or an obvious one, but I have
> received no response.
> Maybe if I simplify the question...
>
> If I am running software raid1 and a disk device starts throwing I/O
> errors, Is the filesystem supposed to see any indication of this? I
> thought software raid would mask all of this and just fail the drive.
>
> I have servers with xfs as the filesystem and xfs will start to throw I/O
> errors when a disk starts acting up even with software raid in between.
> Please advise on how I can confirm my setup or if this is possibly a bug
> how to diagnose further.
> If it makes a difference, I am running linux-2.4.26
> Thanks
> --David Dougall
Thread overview: 22+ messages
2005-01-20 17:55 No response? David Dougall
2005-01-20 18:12 ` Peter T. Breuer
2005-01-20 18:14 ` Gordon Henderson
2005-01-20 18:37 ` Mark Bellon
2005-01-20 19:15 ` David Dougall
2005-01-20 19:35 ` Mark Bellon
2005-01-20 19:37 ` Gordon Henderson
2005-01-20 19:41 ` Mark Bellon
2005-01-20 19:49 ` David Dougall
2005-01-20 18:21 ` Mike Hardy
2005-01-20 18:30 ` Mario Holbe
2005-01-20 18:57 ` David Dougall
2005-01-20 19:12 ` Kanoa Withington
2005-01-20 19:17 ` David Dougall
2005-01-20 19:23 ` Guy
2005-01-20 19:34 ` Kanoa Withington
2005-01-20 19:44 ` Mark Bellon
2005-01-20 19:18 ` Guy
2005-01-20 19:24 ` Peter T. Breuer
2005-01-20 19:51 ` David Dougall
2005-01-20 19:28 ` Mark Bellon
2005-01-20 18:49 ` Kanoa Withington