could someone plz explain those ext3/hard disk errors

public inbox for linux-kernel@vger.kernel.org

* could someone plz explain those ext3/hard disk errors
From: JG @ 2004-02-08 17:53 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6418 bytes --]

hi,

i'm getting many, many disk errors, and 4 hard disks (IDE, different
vendors, even the newer ones) have already died within two months. i
was able to save most of the data, but now even my spare disks are
full, and for the past hour the next 2 disks have been showing problems.
i don't know whether the mainboard, the raid controller or the disks
themselves are dying. everything else works fine, but it seems very
strange that 6 disks would die that fast.

could someone please explain those errors? i know there are a lot of
them, but i'm completely lost.

hdi = 200gb maxtor
hdj = 200gb maxtor
hdk = 120gb seagate
mainboard = elitegroup k7s5a
raidcontroller = highpoint rocketraid 404, hpt374
kernel 2.6.2

recently died: 3x 160gb maxtor and 1x 120gb hitachi/ibm
------------
init_special_inode: bogus i_mode (76557)
init_special_inode: bogus i_mode (74557)
init_special_inode: bogus i_mode (74557)
EXT3-fs error (device hdf1): ext3_readdir: bad entry in directory #15941633: directory entry across blocks - offset=0, inode=410738721, rec_len=4124, name_len=25
Aborting journal on device hdf1.

hdi: task_out_intr: status=0x7f { DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index Error }
hdi: task_out_intr: error=0x7f { DriveStatusError UncorrectableError SectorIdNotFound TrackZeroNotFound AddrMarkNotFound }, LBAsect=280923064991615, high=16744319, low=8355711, sector=199145949
ide4: reset: success

hdi: drive not ready for command
ide4: reset: master: error (0x7f?)
blk: queue e7ca1800, I/O limit 4095Mb (mask 0xffffffff)
end_request: I/O error, dev hdi, sector 199145948
Buffer I/O error on device hdi2, logical block 526
lost page write due to I/O error on hdi2
end_request: I/O error, dev hdi, sector 57151695
end_request: I/O error, dev hdi, sector 199141740
Buffer I/O error on device hdi2, logical block 0
lost page write due to I/O error on hdi2
hdk: dma_timer_expiry: dma status == 0x00
hdk: DMA timeout retry
hdk: timeout waiting for DMA
hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hdk: drive not ready for command
end_request: I/O error, dev hdi, sector 57151695
end_request: I/O error, dev hdi, sector 57151695
EXT3-fs error (device hdi1): ext3_readdir: directory #3571715 contains a hole at offset 0
Aborting journal on device hdi1.
end_request: I/O error, dev hdi, sector 4271
Buffer I/O error on device hdi1, logical block 526
lost page write due to I/O error on hdi1
end_request: I/O error, dev hdi, sector 63
Buffer I/O error on device hdi1, logical block 0
lost page write due to I/O error on hdi1
ext3_abort called.
EXT3-fs abort (device hdi1): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
end_request: I/O error, dev hdi, sector 57151695
EXT3-fs error (device hdi1): ext3_readdir: directory #3571715 contains a hole at offset 0
end_request: I/O error, dev hdi, sector 57151695
end_request: I/O error, dev hdi, sector 57151695

EXT3-fs error (device hdi2): ext3_readdir: directory #32769 contains a hole at offset 0
buffer layer error at fs/buffer.c:1262
Call Trace:
 [<c0150684>] mark_buffer_dirty+0x54/0x60
 [<c0192151>] ext3_commit_super+0x51/0x90
 [<c018fdb5>] ext3_handle_error+0x75/0xb0
 [<c018fe45>] ext3_error+0x55/0x60
 [<c0185c7d>] ext3_readdir+0x4cd/0x500
 [<c01ca863>] capable+0x23/0x50
 [<c014363d>] do_brk+0x15d/0x230
 [<c015f85c>] vfs_readdir+0x9c/0xa0
 [<c015fb70>] filldir64+0x0/0x150
 [<c015fd3b>] sys_getdents64+0x7b/0xc1
 [<c015fb70>] filldir64+0x0/0x150
 [<c0108f0f>] syscall_call+0x7/0xb

buffer layer error at fs/buffer.c:2666
Call Trace:
 [<c0152a90>] submit_bh+0x1a0/0x200
 [<c01398ab>] __set_page_dirty_nobuffers+0x7b/0x90
 [<c0152bdc>] sync_dirty_buffer+0x5c/0xc0
 [<c018fdb5>] ext3_handle_error+0x75/0xb0
 [<c018fe45>] ext3_error+0x55/0x60
 [<c0185c7d>] ext3_readdir+0x4cd/0x500
 [<c01ca863>] capable+0x23/0x50
 [<c014363d>] do_brk+0x15d/0x230
 [<c015f85c>] vfs_readdir+0x9c/0xa0
 [<c015fb70>] filldir64+0x0/0x150
 [<c015fd3b>] sys_getdents64+0x7b/0xc1
 [<c015fb70>] filldir64+0x0/0x150
 [<c0108f0f>] syscall_call+0x7/0xb

end_request: I/O error, dev hdi, sector 199141740
Buffer I/O error on device hdi2, logical block 0
lost page write due to I/O error on hdi2
ext3_abort called.
EXT3-fs abort (device hdi2): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only

buffer layer error at fs/buffer.c:1262
Call Trace:
 [<c0150684>] mark_buffer_dirty+0x54/0x60
 [<c019c4a0>] journal_update_superblock+0x50/0xb0
 [<c01656f3>] destroy_inode+0x43/0x70
 [<c019c7e9>] journal_destroy+0xa9/0x190
 [<c019493e>] ext3_xattr_put_super+0x1e/0x30
 [<c01902f9>] ext3_put_super+0x29/0x190
 [<c0154296>] generic_shutdown_super+0xf6/0x110
 [<c0154bbd>] kill_block_super+0x1d/0x50
 [<c01540d8>] deactivate_super+0x48/0x80
 [<c0168a6f>] sys_umount+0x3f/0x90
 [<c0168ad7>] sys_oldumount+0x17/0x20
 [<c0108f0f>] syscall_call+0x7/0xb

buffer layer error at fs/buffer.c:2666
Call Trace:
 [<c0152a90>] submit_bh+0x1a0/0x200
 [<c0152bdc>] sync_dirty_buffer+0x5c/0xc0
 [<c015066b>] mark_buffer_dirty+0x3b/0x60
 [<c019c4b4>] journal_update_superblock+0x64/0xb0
 [<c01656f3>] destroy_inode+0x43/0x70
 [<c019c7e9>] journal_destroy+0xa9/0x190
 [<c019493e>] ext3_xattr_put_super+0x1e/0x30
 [<c01902f9>] ext3_put_super+0x29/0x190
 [<c0154296>] generic_shutdown_super+0xf6/0x110
 [<c0154bbd>] kill_block_super+0x1d/0x50
 [<c01540d8>] deactivate_super+0x48/0x80
 [<c0168a6f>] sys_umount+0x3f/0x90
 [<c0168ad7>] sys_oldumount+0x17/0x20
 [<c0108f0f>] syscall_call+0x7/0xb

-------
hdj: task_out_intr: status=0x51 { DriveReady SeekComplete Error }
hdj: task_out_intr: error=0x04 { DriveStatusError }
hdj: task_out_intr: status=0x58 { DriveReady SeekComplete DataRequest }
hdj: task_out_intr: status=0x51 { DriveReady SeekComplete Error }
hdj: task_out_intr: error=0x51 { UncorrectableError SectorIdNotFound AddrMarkNotFound }, LBAsect=141284313497587, high=8421201, low=5341171, sector=4326
end_request: I/O error, dev hdj, sector 4319
Buffer I/O error on device hdj1, logical block 532
lost page write due to I/O error on hdj1
hdj: status error: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdj: status error: error=0x51 { UncorrectableError SectorIdNotFound AddrMarkNotFound }, LBAsect=141284313497471, high=8421201, low=5341055, sector=63
end_request: I/O error, dev hdj, sector 63
Buffer I/O error on device hdj1, logical block 0
lost page write due to I/O error on hdj1
hdj: no DRQ after issuing WRITE_EXT

[...]
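The status= and error= values in the log above are bitmasks; the names in braces are simply the bits that happen to be set. A minimal sketch of that decoding, assuming the standard ATA status/error register layout and copying the flag names exactly as this log prints them. `decode_status` and `decode_error` are hypothetical helpers, not kernel functions, and error bits 0x20 and 0x08 are left unnamed because error=0x7f sets them but the log does not name them.

```python
# Hypothetical helpers: decode IDE status/error bytes into the flag
# names printed in the log. Assumes the standard ATA register layout.

# Status register bits below BUSY (0x80), most significant first.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Error register bits in the order the log lists them. 0x20 and 0x08
# are set in error=0x7f but never named in the log, so they are left
# out here; 0x80 does not appear in this log at all.
ERROR_BITS = [
    (0x04, "DriveStatusError"),  # ABRT: command aborted
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(status: int) -> list[str]:
    """Names of the bits set in a status byte."""
    if status & 0x80:
        # While BUSY is set, the other status bits are not valid.
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]


def decode_error(err: int) -> list[str]:
    """Names of the (named) bits set in an error byte."""
    return [name for mask, name in ERROR_BITS if err & mask]


if __name__ == "__main__":
    print("hdi status=0x7f", decode_status(0x7F))
    print("hdi error=0x7f ", decode_error(0x7F))
    print("hdj error=0x51 ", decode_error(0x51))
```

Worth noticing: for hdi, status and error are both 0x7f, i.e. nearly every bit set at once.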

thx!
JG


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: could someone plz explain those ext3/hard disk errors
From: Micha Feigin @ 2004-02-08 18:55 UTC
  To: linux-kernel

On Sun, Feb 08, 2004 at 06:53:34PM +0100, JG wrote:
> hi,
> 
> i'm getting many many disk errors and there already died 4 hard disks (IDE, different vendors) within two months (even the newer ones). i was able to save most of the data, but now even my spare disks are full and since 1 hour the next 2 disks are showing problems.
> i don't know if the mainboard, the raid controller or all disks are dying, everything else works fine but it seems very strange that 6 disks die that fast.
> 

It could be power surges. Hard disks are very sensitive to that.

Also, have you run fsck on these disks? Could it be that you are not
shutting them down properly?
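fsck and the kernel both sanity-check on-disk structures, and two of the ext3 messages in the original post are exactly such checks failing. A minimal sketch of those two checks, assuming 4096-byte blocks and that the bogus i_mode values are printed in octal (their digits suggest so). `is_valid_file_type`, `dir_rec_len` and `check_dir_entry` are hypothetical helpers, not ext3 code; only the "directory entry across blocks" wording is taken from the log, and the other three messages are illustrative.

```python
import stat
from typing import Optional

# Hypothetical sanity checks mirroring two ext3 messages in the log.
# Assumptions: 4096-byte blocks; i_mode values printed in octal.

VALID_TYPES = {
    stat.S_IFSOCK, stat.S_IFLNK, stat.S_IFREG, stat.S_IFBLK,
    stat.S_IFDIR, stat.S_IFCHR, stat.S_IFIFO,
}


def is_valid_file_type(mode: int) -> bool:
    """True if the file-type bits of an inode mode name a real type."""
    return stat.S_IFMT(mode) in VALID_TYPES


def dir_rec_len(name_len: int) -> int:
    """Minimum record length: 8-byte entry header (inode, rec_len,
    name_len, file_type) plus the name, rounded up to a multiple of 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset: int, rec_len: int, name_len: int,
                    block_size: int = 4096) -> Optional[str]:
    """Return a description of the first problem found, or None."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    return None


if __name__ == "__main__":
    for mode in (0o76557, 0o74557):
        print(oct(mode), "type bits", oct(stat.S_IFMT(mode)),
              "valid:", is_valid_file_type(mode))
    # Values from the hdf1 message: offset=0, rec_len=4124, name_len=25
    print(check_dir_entry(offset=0, rec_len=4124, name_len=25))
```

Both bogus modes carry file-type bits 070000, which is not any valid type, and a 4124-byte record starting at offset 0 cannot fit in a 4096-byte block. Either way the on-disk data (or what was read back) is garbage rather than a plausible filesystem state.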


* Re: could someone plz explain those ext3/hard disk errors
From: JG @ 2004-02-08 19:05 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 654 bytes --]

hi,

thanks for your answer!

> It could be power surges. Hard disks are very sensitive to that.
> 
> Also have you run fsck on these disks ? could it be that you are not
> shutting them down properly?

i'm using an enermax 550W PSU attached to an APC UPS; at the moment
there are 10 disks and 1 cdrom drive in the server.
the server usually runs 24/7, but i had some crashes over the last few
weeks due to hard disk failures, so it's possible that the other disks
were affected. i don't dare to fsck the disks now because i haven't
been able to back up the data yet (no spare disks), and i have lost
too much data fsck'ing my hard disks in the past.

JG

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: could someone plz explain those ext3/hard disk errors
From: Heriberto A Tejeda @ 2004-02-09  1:47 UTC
  To: linux-kernel

I have had exactly the same errors, and I have already RMA'ed two
disks. I just thought it was bad luck.

I power down properly, and I don't think it's power surges, since I am
in school and it only happens to me. I would expect my neighbors to
have problems too.

-eddie

-- 
eddie tejeda


* Re: could someone plz explain those ext3/hard disk errors
From: JG @ 2004-02-09  9:52 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1017 bytes --]

hi,

> I have had the same exact errors.. and i have RMA'ed two disks... already.
> i just thought it was bad luck.. 
> 
> i power down properly and i dont think its power surges, since i am in school and it only happens to me. i would expect my neighbors to have
> problems too.

really strange, me too.
now... hm, it all started when i upgraded from kernel 2.4.19 to 2.6.0
in late december. the system worked very well for a week or so (with
great response times!), but then all of a sudden the problems started
and 2 disks died. then my gigabit network card could only transmit
200kb/s (but that really was a hardware problem; a new card works fine
again, well...). a week later the next disks started having problems,
and i still have to RMA three disks. and now the next two disks...
i'm going insane ;) i can't stand seeing EXT3 errors anymore *g* the
next disks will be reiserfs, just to see different error messages ;)
well, that doesn't solve the problem of 6 disks dying within 2
months... this is so unlikely.

JG

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: could someone plz explain those ext3/hard disk errors
From: John Bradford @ 2004-02-09 10:26 UTC
  To: JG, linux-kernel

> now...hm, it all started when i upgraded from kernel 2.4.19 to 2.6.0
> in late december, the system worked very fine for a week or so
> (having great response times!) but then all of a sudden the problems
> started. 2 disks died. then my gigabit network card was only able to
> transmit 200kb/s (but this was really a hardware problem, a new card
> is working fine again, well...). a week later the next disks are
> having problems and i have yet to RMA three disks. and now the next
> two disks..., i'm getting insane ;) i can't see any EXT3 error anymore
> *g* the next disks will be reiserfs only to see other error messages
> ;) well, but that doesn't solve the problem of 6 disks within 2
> months...this is so unlikely.

Please read the FAQ, fix your mail application - you are sending long
lines, and don't break the CC list.

As to your problem, look at the LBA sector addresses in the error
message:

280923064991615

is your drive really over 100 EB?  No...
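The LBAsect figure is not independent of the rest of the line: it is exactly (high << 24) | low, and for hdi both 24-bit halves are the byte 0x7F repeated (high=0xFF7F7F, low=0x7F7F7F), the same 0x7f seen in the status and error fields. At 512 bytes per sector it works out to about 1.4 × 10^17 bytes, roughly 144 PB, which is indeed absurd for a 200 GB drive. The separate sector= value on the same line, by contrast, is within range. A minimal sketch of that arithmetic, assuming 512-byte sectors and a nominal 200 GB = 200 × 10^9 bytes; `lba48` is a hypothetical helper, and this arithmetic alone cannot settle what the driver actually sent to the drive.

```python
# Hypothetical arithmetic check. Assumptions: LBAsect = high << 24 | low,
# 512-byte sectors, and a nominal 200 GB = 200 * 10**9 bytes drive.

SECTOR_SIZE = 512
DRIVE_SECTORS = 200 * 10**9 // SECTOR_SIZE  # 390,625,000 sectors


def lba48(high: int, low: int) -> int:
    """Join the two 24-bit halves printed as high= and low=."""
    return (high << 24) | low


# (device, high, low, sector=) from the task_out_intr error lines
SAMPLES = [
    ("hdi", 16744319, 8355711, 199145949),
    ("hdj", 8421201, 5341171, 4326),
]

if __name__ == "__main__":
    for dev, high, low, sector in SAMPLES:
        lba = lba48(high, low)
        print(f"{dev}: LBAsect={lba} (high=0x{high:06X}, low=0x{low:06X})")
        print(f"  implied size ~{lba * SECTOR_SIZE / 1e15:.0f} PB, "
              f"in range: {lba < DRIVE_SECTORS}")
        print(f"  sector={sector} in range: {sector < DRIVE_SECTORS}")
```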

John.


* Re: could someone plz explain those ext3/hard disk errors
From: JG @ 2004-02-09 11:06 UTC
  To: John Bradford; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 534 bytes --]

hi,

> Please read the FAQ, fix your mail application - you are sending long
> lines, and don't break the CC list.

i've re-read it now, but i'm sorry, i don't know what you mean by "don't break the CC list".
the long lines were my mistake.


> As to your problem, look at the LBA sector addresses in the error
> message:
> 
> 280923064991615
> 
> is your drive really over 100 EB?  No...

i know this value can't be right, but why does such a problem arise? is it the raid controller's driver or its bios? or something else?

thx,
JG


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: could someone plz explain those ext3/hard disk errors
From: Bill Davidsen @ 2004-02-18 13:31 UTC
  To: John Bradford; +Cc: JG, linux-kernel

John Bradford wrote:
>>now...hm, it all started when i upgraded from kernel 2.4.19 to 2.6.0
>>in late december, the system worked very fine for a week or so
>>(having great response times!) but then all of a sudden the problems
>>started. 2 disks died. then my gigabit network card was only able to
>>transmit 200kb/s (but this was really a hardware problem, a new card
>>is working fine again, well...). a week later the next disks are
>>having problems and i have yet to RMA three disks. and now the next
>>two disks..., i'm getting insane ;) i can't see any EXT3 error anymore
>>*g* the next disks will be reiserfs only to see other error messages
>>;) well, but that doesn't solve the problem of 6 disks within 2
>>months...this is so unlikely.
> 
> 
> Please read the FAQ, fix your mail application - you are sending long
> lines, and don't break the CC list.
> 
> As to your problem, look at the LBA sector addresses in the error
> message:
> 
> 280923064991615
> 
> is your drive really over 100 EB?  No...

I think we could assume that (a) he never told the kernel the disk was 
"over 100 EB" and (b) the kernel was trying to use that LBA anyway. 
Which could be due to either a kernel bug or memory corruption (or CPU 
problems, but unlikely).

-- 
bill davidsen <davidsen@tmr.com>
   CTO TMR Associates, Inc
   Doing interesting things with small computers since 1979


* Re: could someone plz explain those ext3/hard disk errors
From: John Bradford @ 2004-02-18 14:42 UTC
  To: Bill Davidsen; +Cc: JG, linux-kernel

Quote from Bill Davidsen <davidsen@tmr.com>:
> John Bradford wrote:
> >>now...hm, it all started when i upgraded from kernel 2.4.19 to 2.6.0
> >>in late december, the system worked very fine for a week or so
> >>(having great response times!) but then all of a sudden the problems
> >>started. 2 disks died. then my gigabit network card was only able to
> >>transmit 200kb/s (but this was really a hardware problem, a new card
> >>is working fine again, well...). a week later the next disks are
> >>having problems and i have yet to RMA three disks. and now the next
> >>two disks..., i'm getting insane ;) i can't see any EXT3 error anymore
> >>*g* the next disks will be reiserfs only to see other error messages
> >>;) well, but that doesn't solve the problem of 6 disks within 2
> >>months...this is so unlikely.
> > 
> > 
> > Please read the FAQ, fix your mail application - you are sending long
> > lines, and don't break the CC list.
> > 
> > As to your problem, look at the LBA sector addresses in the error
> > message:
> > 
> > 280923064991615
> > 
> > is your drive really over 100 EB?  No...
> 
> I think we could assume that (a) he never told the kernel the disk was 
> "over 100 EB" and (b) the kernel was trying to use that LBA anyway. 
> Which could be due to either a kernel bug or memory corruption (or CPU 
> problems, but unlikely).

What I was trying to point out is that the error message is clearly
the result of a problem elsewhere.  Unless the drive firmware is
buggy or something very strange is going on inside the drive (bad
internal RAM or something like that), the kernel did send a
request for a sector which is well out of range.  What caused that
request we don't know - quite possibly corruption of some filesystem
structure on the disk caused that request, but it's important to be
clear that the error message is the expected response to a request for
such a high block number, and doesn't in itself indicate a problem
with the disk.

I.e. even though there is every chance that the drive is faulty, the
posted error message doesn't indicate a drive failure in itself, and
you should look elsewhere.

John.


* Re: could someone plz explain those ext3/hard disk errors
From: JG @ 2004-02-18 15:54 UTC
  To: John Bradford; +Cc: Bill Davidsen, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]

hi,

> I.E. Even though there is every chance that the drive is faulty, the
> posted error message doesn't indicate a drive failure in itself, and
> you should look elsewhere.

i recently got the new disks and could backup nearly everything (after reboot the disks were accessible again, though i've lost some data).

i tried to zero out the disk with 'dd if=/dev/zero of=/dev/hdX' which led to a complete system lockup after some time.

after a reboot i wanted to run the long S.M.A.R.T. tests (smartctl -t long /dev/hdX, smartctl v5.26). it said the test would run in the background for about 80 minutes, but again, after some time => complete lockup.
i couldn't do anything anymore on the server, only the sysrq keys were working. killing the processes gave me some error messages (i can't remember the exact wording, but they were like "DMA lost" on nearly every disk, plus some weird interrupt errors related to the NIC).

$ cat /proc/interrupts
           CPU0
  0:  435370782          XT-PIC  timer
  1:        315          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:    6144225          XT-PIC  ide2, ide3, ide4, ide5
  8:          2          XT-PIC  rtc
 10:          0          XT-PIC  ohci_hcd
 11:   68722839          XT-PIC  eth1
 12:  227629214          XT-PIC  ohci_hcd, eth0
 14:    4515100          XT-PIC  ide0
 15:     643567          XT-PIC  ide1
NMI:          0
LOC:  435357136
ERR:     680356
MIS:          0

don't know if the ERR rate is too high; this is with an uptime of 5 days. i usually have much higher ERR numbers.
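To put those numbers in perspective, the Python sketch below parses the /proc/interrupts output shown above, lists the IRQ lines shared by more than one device (IRQ 5 carries all four ide2-ide5 channels, IRQ 12 carries both ohci_hcd and eth0), and turns the ERR count into a per-second rate using the stated uptime of 5 days. The function name and the fixed 5-day uptime are assumptions for illustration; the parser only handles the single-CPU layout shown here and says nothing about what ERR rate is normal.

```python
# Parse a single-CPU /proc/interrupts dump (the one shown in the mail) and
# summarise shared IRQ lines plus the ERR counter rate.
# Assumption: uptime is taken as exactly 5 days, as stated in the mail.

PROC_INTERRUPTS = """\
           CPU0
  0:  435370782          XT-PIC  timer
  1:        315          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:    6144225          XT-PIC  ide2, ide3, ide4, ide5
  8:          2          XT-PIC  rtc
 10:          0          XT-PIC  ohci_hcd
 11:   68722839          XT-PIC  eth1
 12:  227629214          XT-PIC  ohci_hcd, eth0
 14:    4515100          XT-PIC  ide0
 15:     643567          XT-PIC  ide1
NMI:          0
LOC:  435357136
ERR:     680356
MIS:          0
"""

UPTIME_SECONDS = 5 * 24 * 3600  # "an uptime of 5 days" (approximate)


def parse_interrupts(text):
    """Return ({irq: (count, [devices])}, {counter_name: count})."""
    irqs, counters = {}, {}
    for line in text.splitlines()[1:]:          # skip the "CPU0" header
        label, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        label, count = label.strip(), int(fields[0])
        if label.isdigit():
            # fields[1] is the controller ("XT-PIC"); the rest are device names
            devices = [d.strip() for d in " ".join(fields[2:]).split(",")]
            irqs[int(label)] = (count, devices)
        else:
            counters[label] = count             # NMI, LOC, ERR, MIS
    return irqs, counters


irqs, counters = parse_interrupts(PROC_INTERRUPTS)
shared = {irq: devs for irq, (_, devs) in irqs.items() if len(devs) > 1}
err_per_second = counters["ERR"] / UPTIME_SECONDS

if __name__ == "__main__":
    for irq, devs in sorted(shared.items()):
        print(f"IRQ {irq} shared by: {', '.join(devs)}")
    print(f"ERR: {counters['ERR']} total, about {err_per_second:.2f}/s")
```

With these figures it reports IRQ 5 and IRQ 12 as shared and roughly 1.57 ERR events per second; comparing that against the "much higher" numbers seen on earlier boots would need the same calculation with those boots' uptimes.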

JG

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: could someone plz explain those ext3/hard disk errors
  2004-02-18 14:42         ` John Bradford
  2004-02-18 15:54           ` JG
@ 2004-02-18 18:26           ` Bill Davidsen
  1 sibling, 0 replies; 13+ messages in thread
From: Bill Davidsen @ 2004-02-18 18:26 UTC (permalink / raw)
  To: John Bradford; +Cc: JG, linux-kernel

John Bradford wrote:
> Quote from Bill Davidsen <davidsen@tmr.com>:
> 
>>John Bradford wrote:
>>
>>>>now...hm, it all started when i upgraded from kernel 2.4.19 to 2.6.0
>>>>in late december, the system worked very fine for a week or so
>>>>(having great response times!) but then all of a sudden the problems
>>>>started. 2 disks died. then my gigabit network card was only able to
>>>>transmit 200kb/s (but this was really a hardware problem, a new card
>>>>is working fine again, well...). a week later the next disks are
>>>>having problems and i have yet to RMA three disks. and now the next
>>>>two disks..., i'm getting insane ;) i can't see any EXT3 error anymore
>>>>*g* the next disks will be reiserfs only to see other error messages
>>>>;) well, but that doesn't solve the problem of 6 disks within 2
>>>>months...this is so unlikely.
>>>
>>>
>>>Please read the FAQ, fix your mail application - you are sending long
>>>lines, and don't break the CC list.
>>>
>>>As to your problem, look at the LBA sector addresses in the error
>>>message:
>>>
>>>280923064991615
>>>
>>>is your drive really over 100 EB?  No...
>>
>>I think we could assume that (a) he never told the kernel the disk was 
>>"over 100 EB" and (b) the kernel was trying to use that LBA anyway. 
>>Which could be due to either a kernel bug or memory corruption (or CPU 
>>problems, but unlikely).
> 
> 
> What I was trying to point out is that the error message is clearly
> the result of a problem elsewhere.  Unless the drive firmware is
> buggy or something very strange is going on inside the drive (bad
> internal RAM or something like that), the kernel really did send a
> request for a sector which is well out of range.  What caused that
> request we don't know - quite possibly corruption of some filesystem
> structure on the disk - but it's important to be clear that the error
> message is the expected response to a request for such a high block
> number, and doesn't in itself indicate a problem with the disk.
> 
> I.E. Even though there is every chance that the drive is faulty, the
> posted error message doesn't indicate a drive failure in itself, and
> you should look elsewhere.

Yes, I think we are saying the same thing, and by now hopefully the O.P. 
has gotten that. I suspect non-disk cause, just my guess.

-- 
bill davidsen <davidsen@tmr.com>
   CTO TMR Associates, Inc
   Doing interesting things with small computers since 1979


* Re: could someone plz explain those ext3/hard disk errors
  2004-02-09  9:52   ` JG
  2004-02-09 10:26     ` John Bradford
@ 2004-02-09 11:52     ` Gene Heskett
  2004-02-18 13:41       ` Bill Davidsen
  1 sibling, 1 reply; 13+ messages in thread
From: Gene Heskett @ 2004-02-09 11:52 UTC (permalink / raw)
  To: JG; +Cc: linux-kernel

On Monday 09 February 2004 04:52, JG wrote:
>hi,
>
>> I have had the same exact errors.. and i have RMA'ed two disks...
>> already. i just thought it was bad luck..
>>
>> i power down properly and i dont think its power surges, since i
>> am in school and it only happens to me. i would expect my
>> neighbors to have problems too.
>
>really strange, me too.
>now...hm, it all started when i upgraded from kernel 2.4.19 to 2.6.0
> in late december, the system worked very fine for a week or so
> (having great response times!) but then all of a sudden the
> problems started. 2 disks died. then my gigabit network card was
> only able to transmit 200kb/s (but this was really a hardware
> problem, a new card is working fine again, well...). a week later
> the next disks are having problems and i have yet to RMA three
> disks. and now the next two disks..., i'm getting insane ;) i can't
> see any EXT3 error anymore *g* the next disks will be reiserfs only
> to see other error messages ;) well, but that doesn't solve the
> problem of 6 disks within 2 months...this is so unlikely.
>
>JG

This thread seems to be related to a slashdot story about a 
mis-formulated epoxy-b that went into production circa 18-20 months 
ago.  It contains a time bomb chemical reaction involving trace 
amounts of red phosphorus, and is said to be the reason all HD makers
went to a 1 year warranty.  You may have to google for the story now 
as I don't have a record of the link, sorry.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: could someone plz explain those ext3/hard disk errors
  2004-02-09 11:52     ` Gene Heskett
@ 2004-02-18 13:41       ` Bill Davidsen
  0 siblings, 0 replies; 13+ messages in thread
From: Bill Davidsen @ 2004-02-18 13:41 UTC (permalink / raw)
  To: gene.heskett; +Cc: JG, linux-kernel

Gene Heskett wrote:
	[___snip___]
> This thread seems to be related to a slashdot story about a 
> mis-formulated epoxy-b that went into production circa 18-20 months 
> ago.  It contains a time bomb chemical reaction involving trace 
> amounts of red phosphorus, and is said to be the reason all HD makers
> went to a 1 year warranty.  You may have to google for the story now 
> as I don't have a record of the link, sorry.

http://www.geek.com/news/geeknews/2004Feb/gee20040210023815.htm

-- 
bill davidsen <davidsen@tmr.com>
   CTO TMR Associates, Inc
   Doing interesting things with small computers since 1979


Thread overview: 13+ messages
2004-02-08 17:53 could someone plz explain those ext3/hard disk errors JG
2004-02-08 18:55 ` Micha Feigin
2004-02-08 19:05   ` JG
2004-02-09  1:47 ` Heriberto A Tejeda
2004-02-09  9:52   ` JG
2004-02-09 10:26     ` John Bradford
2004-02-09 11:06       ` JG
2004-02-18 13:31       ` Bill Davidsen
2004-02-18 14:42         ` John Bradford
2004-02-18 15:54           ` JG
2004-02-18 18:26           ` Bill Davidsen
2004-02-09 11:52     ` Gene Heskett
2004-02-18 13:41       ` Bill Davidsen
