* RE: Disk Errors
@ 2005-02-02 14:12 Salyzyn, Mark
  2005-02-03  8:18 ` Andi Kleen
  2005-02-15  5:56 ` Douglas Gilbert
  0 siblings, 2 replies; 20+ messages in thread
From: Salyzyn, Mark @ 2005-02-02 14:12 UTC
  To: dougg; +Cc: Bryan Henderson, Kit Gerrits, linux-scsi

From: Douglas Gilbert [mailto:dougg@torque.net] writes:
> All may not be lost. If a medium error occurs and the ASC and
> ASCQ imply the sector could be read but
> failed ECC then the READ LONG SCSI command should fetch the
> block (plus ECC and other data). For example a Fujitsu MAM3184
> returns 576 bytes. It is probably too much to expect that all
> the damage will be in the last 64 bytes.

However, the drive has already taken whatever action it could to
reconstruct the data; its failure to return the block for a standard
read means that the data is in fact `lost'. The data+ECC combination
must be in a state where there are more bits of damage than the error
correction can deal with; 64 bytes of ECC deals with single-bit errors,
thus we know that we have more than 1 bit of damage to the disk. We
could have 4096 bits of damage in the worst case :-) and never know
that fact.
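
The point about exceeding the correction capacity can be illustrated with a
toy code. The sketch below uses Hamming(7,4), chosen only because it is short;
it is an assumption for illustration and says nothing about the ECC that the
Fujitsu drive (or any other drive) actually uses. It shows a single flipped
bit being repaired, and two flipped bits producing data that is silently wrong.

```python
# Toy Hamming(7,4) code: corrects exactly one flipped bit per 7-bit codeword.
# Illustration only; real drive ECC is much stronger, but it has a limit too.

def encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7 are: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(c):
    """Correct at most one bit error, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means "looks clean"
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the position the syndrome names
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
word = encode(data)

one_bad = word.copy()
one_bad[4] ^= 1                      # one bit of damage

two_bad = word.copy()
two_bad[0] ^= 1                      # two bits of damage
two_bad[4] ^= 1

print(decode(one_bad) == data)       # True: the single error is corrected
print(decode(two_bad) == data)       # False: "corrected" into the wrong data
```

In the two-bit case the decoder has no way of telling that its output is
wrong, which is the reason for not trusting a block that the drive itself
refused to return.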

If I wanted in desperation to recover whatever data I could, this would
be grand, but as it stands, from the Linux File System Driver
perspective, it would be dangerous to accept this block as anything more
than it is.

If the data is of the form to permit some loss, for example video, audio
content or an error correcting stream of data, someone can make a case
where READ_LONG is an appropriate action to take to help fill in missing
content. 

A fun thought ...

* RE: Disk Errors
@ 2005-02-01 18:24 Salyzyn, Mark
  2005-02-02  3:55 ` Douglas Gilbert
  0 siblings, 1 reply; 20+ messages in thread
From: Salyzyn, Mark @ 2005-02-01 18:24 UTC
  To: Bryan Henderson, dougg; +Cc: Kit Gerrits, linux-scsi

An unrecoverable medium error is typically `corrected' when a write to
the block occurs. RAID cards will use the redundancy to calculate the
data and write it back to the offending drive for instance.
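
As a rough sketch of the redundancy calculation mentioned above: with
single-parity RAID (such as RAID-5), the parity chunk is the byte-wise XOR of
the data chunks, so any one missing chunk is the XOR of the others. The chunk
contents and the three-drive layout below are made up for illustration; how a
particular controller's firmware does this is not described in this thread.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe on three drives: two data chunks plus one parity chunk.
d0 = b"usr data"
d1 = b"homedata"
parity = xor_blocks([d0, d1])

# Suppose the drive holding d1 reports an unrecoverable medium error.
# The surviving chunks are enough to recompute it ...
rebuilt = xor_blocks([d0, parity])

# ... and writing `rebuilt` back to the failing drive gives the drive the
# write it needs to `correct' (or reassign) the bad block.
print(rebuilt == d1)
```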

Otherwise, for non-redundant stores, bad media is as good as anything
to remind one that the data is gone ;->

Sincerely -- Mark Salyzyn

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org
[mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Bryan Henderson
Sent: Tuesday, February 01, 2005 1:01 PM
To: dougg@torque.net
Cc: Kit Gerrits; linux-scsi@vger.kernel.org
Subject: Re: Disk Errors

>So there are two situations in which damaged blocks remain
>accessible:
>    1) unrecoverable medium errors
> ...

What's the rationale behind leaving a damaged block accessible in the
case of an unrecoverable medium error?  A possibility that someone might
actually be able to recover the data?

* RE: Disk Errors
@ 2005-02-01 15:56 Cress, Andrew R
  0 siblings, 0 replies; 20+ messages in thread
From: Cress, Andrew R @ 2005-02-01 15:56 UTC
  To: Salyzyn, Mark, dougg, Kit Gerrits; +Cc: linux-scsi

Kit,

If you have another (non-RAID) SCSI system, you could take the faulty
drive there to modify the mode pages to turn on AWRE and ARRE with
either sgmode (scsirastools.sf.net) or sginfo (sg3_utils).

Otherwise, you are dependent on the tools that are provided for the
PowerEdge RAID controller.

Andy

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org
[mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Douglas Gilbert
Sent: Tuesday, February 01, 2005 7:44 AM
To: Kit Gerrits
Cc: linux-scsi@vger.kernel.org
Subject: Re: Disk Errors

Kit Gerrits wrote:
> I have found 08:05 to correspond to /dev/sda5, mounted as /usr (Thanks
> for the pointer!).
> 
> Sda is the single-drive volume
> (non-RAID, as it is only for the O/S,
> which needs to be speedy and can be pulled from tape easily).
> 
> This explains several things:
> A/ Why a single error can take an entire volume offline
> B/ Why the error is not logged
> 	If it only took the partition offline, 
> 	it would still have been logged, 
> 	as / is mounted from sda3
> 
> And leaves one question:
> What caused the error?
> 
> There are no GROWN defects on the drive in this volume

Kit,
A block/sector is added to the grown defect list after it
has been reassigned. Reassignment occurs automatically for
recoverable (medium) errors if the AWRE and/or ARRE bits are
set (those bits are in the read write error recovery mode page).

So there are two situations in which damaged blocks remain
accessible:
    1) unrecoverable medium errors
    2) recoverable medium errors when AWRE and/or ARRE
       are clear

Case 2) can be ignored ** or could be handled by setting
ARRE and then reading the whole disk (e.g. with dd). Both cases
can be handled with the REASSIGN BLOCKS SCSI command
once the defective logical block address (lba) or
addresses have been identified.
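
For readers curious what REASSIGN BLOCKS carries on the wire, here is a
minimal sketch that only builds the bytes and sends nothing to a device. It
assumes the short form of the command as I understand it from the SCSI block
command set (opcode 0x07, a 4-byte parameter header, then 4-byte big-endian
LBAs); check the standard before relying on it. The function name and the LBA
value are hypothetical.

```python
import struct

REASSIGN_BLOCKS_OPCODE = 0x07

def build_reassign_blocks(lbas):
    """Return (cdb, parameter_list) for the short-list, 32-bit-LBA form."""
    # 6-byte CDB: opcode, flags byte left at 0 (short list, short LBAs),
    # three reserved bytes, control byte.
    cdb = bytes([REASSIGN_BLOCKS_OPCODE, 0, 0, 0, 0, 0])
    # Defect list: one 4-byte big-endian LBA per block to reassign.
    defect_list = b"".join(struct.pack(">I", lba) for lba in lbas)
    # Header: 2 reserved bytes, then the defect list length in bytes.
    header = struct.pack(">HH", 0, len(defect_list))
    return cdb, header + defect_list

cdb, params = build_reassign_blocks([0x1234])   # hypothetical defective LBA
print(cdb.hex(), params.hex())
```

In practice a utility such as sg_reassign does this work; the sketch is only
meant to show that the command is little more than a list of LBAs.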

Using the sg3_utils package various things can be
done:
    - "sginfo -e /dev/sda" will show the AWRE and ARRE
      settings. Changing them with sginfo is a bit ugly
    - "sginfo -G /dev/sda" will show the grown defect list
      in "index" format (up to 3 other formats may be
      available)
    - "sg_dd if=/dev/sg0 of=/dev/null bs=512" will read the
      whole disk or fail at the first unrecoverable (medium)
      error. If a medium error is detected the "info"
      field is the lba of the defect. ***
    - "sg_reassign -a <lba> /dev/sda" will reassign the
      <lba> block. If this succeeds <lba> should appear
      in the grown defect list ("sginfo -G -Flogical /dev/sda").
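
The "info" field mentioned in the list above lives in the sense data that
comes back with the failed read. A small sketch of decoding it, assuming
fixed-format sense data (response code 0x70 or 0x71) and a 32-bit LBA; the
sample buffer is hand-made for illustration, not captured from a drive, and
the function name is hypothetical.

```python
import struct

MEDIUM_ERROR = 0x3   # sense key reported for medium errors

def parse_fixed_sense(sense):
    """Decode the fields of interest from fixed-format sense data."""
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "valid": bool(sense[0] & 0x80),               # info field is meaningful
        "sense_key": sense[2] & 0x0F,
        "info": struct.unpack(">I", sense[3:7])[0],   # LBA of the defect
        "asc": sense[12],
        "ascq": sense[13],
    }

# Hand-made example: MEDIUM ERROR, ASC/ASCQ 0x11/0x00 (unrecovered read
# error), info field pointing at a hypothetical LBA 0x1234.
sample = bytes([0xF0, 0x00, 0x03, 0x00, 0x00, 0x12, 0x34, 0x0A,
                0x00, 0x00, 0x00, 0x00, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00])

parsed = parse_fixed_sense(sample)
if parsed["valid"] and parsed["sense_key"] == MEDIUM_ERROR:
    print("defective lba:", parsed["info"])
```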

When a logical block with unrecoverable errors is reassigned
then the new contents are vendor specific. I'm not sure how
file systems react to this.


** recoverable errors can be ignored. Assuming these
    recoverable errors occur on read operations then the
    "read error counter" log page's
    recovered error counter (one of them depending on the
    duration of the recovery process) will be incremented

*** due to error processing, it is still better to use /dev/sg0
     rather than /dev/sda with the sg_dd utility. Recent
     changes (lk 2.6.11-rc2-bk8) make the following work:
     "sg_dd if=/dev/sda blk_sgio=1 of=/dev/null bs=512"
     in the presence of errors

Doug Gilbert

> ---------------
> Reference logs:
> ---------------
> 
> Executing: disk show defects (ID=0)
> Number of PRIMARY defects on drive: 1912
> Number of GROWN defects on drive: 0
> 
> Executing: container list
> Num          Total  Oth Chunk          Scsi   Partition    
> Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
> ----- ------ ------ --- ------ ------- ------ -------------
>  0    Volume 8.47GB            Open    0:00:0 64.0KB:8.47GB
>  /dev/sda             NT
>  1    RAID-5 16.9GB       32KB Open    0:01:0 64.0KB:8.47GB
>  /dev/sdb             DATA             0:02:0 64.0KB:8.47GB
>                                        ?:??:?  - Missing -
> Mount points it to:
> # /dev/sda5             5.3G  1.5G  3.6G  30% /usr
>  
> 
> 
>>-----Oorspronkelijk bericht-----
>>Van: Salyzyn, Mark [mailto:mark_salyzyn@adaptec.com]
>>Verzonden: dinsdag 1 februari 2005 4:15
>>Aan: Kit Gerrits
>>Onderwerp: RE: Disk errors
>>
>>The controller does not appear to be busted; you have a Volume and a 
>>RAID-5. Are you missing an Array?
>>
>>A two drive failure on a RAID-5 gives you an offline array.
>>
>>A single drive failure in a Volume gives you an offline array.
>>
>>You need to find who is 08:05, look through /dev for the major/minor 
>>number and relate it to the 'device'. Look through /proc/scsi/scsi and
>>/var/messages to help correlate it.
>>
>>Sincerely -- Mark Salyzyn
>>
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


* RE: Disk Errors
@ 2005-02-01 12:50 Salyzyn, Mark
  0 siblings, 0 replies; 20+ messages in thread
From: Salyzyn, Mark @ 2005-02-01 12:50 UTC
  To: dougg, Kit Gerrits; +Cc: linux-scsi

Good information for a single drive on a simple SCSI card. This will not
work for drives that are part of an array (volume) as /dev/sda
references a pseudo device. Besides, the firmware in the RAID controller
takes the actions necessary to perform recoverable bad block remaps.

Sincerely -- Mark Salyzyn

* Re: Disk Errors
@ 2005-02-01  8:53 Kit Gerrits
  2005-02-01 12:43 ` Douglas Gilbert
  0 siblings, 1 reply; 20+ messages in thread
From: Kit Gerrits @ 2005-02-01  8:53 UTC
  To: linux-scsi

I have found 08:05 to correspond to /dev/sda5, mounted as /usr(Thanks for
the pointer!).
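
For reference, the 08:05 to /dev/sda5 mapping can be worked out by hand. A
minimal sketch, assuming the kernel printed the pair as two hexadecimal
numbers, that major 8 is the first SCSI disk major, and that each disk owns 16
minors (minor 0 for the whole disk, 1-15 for partitions). The function name is
hypothetical.

```python
import string

SD_MAJOR = 8            # first SCSI disk major number
MINORS_PER_DISK = 16    # minor 0 is the whole disk, 1-15 are partitions

def sd_name(dev):
    """Translate a 'major:minor' pair (hex, as logged) into a device name."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != SD_MAJOR:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "/dev/sd" + string.ascii_lowercase[disk]
    return name + (str(part) if part else "")

print(sd_name("08:05"))   # /dev/sda5
```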

Sda is the single-drive volume
(non-RAID, as it is only for the O/S,
which needs to be speedy and can be pulled from tape easily).

This explains several things:
A/ Why a single error can take an entire volume offline
B/ Why the error is not logged
	If it only took the partition offline, 
	it would still have been logged, 
	as / is mounted from sda3

And leaves one question:
What caused the error?

There are no GROWN defects on the drive in this volume


---------------
Reference logs:
---------------

Executing: disk show defects (ID=0)
Number of PRIMARY defects on drive: 1912
Number of GROWN defects on drive: 0

Executing: container list
Num          Total  Oth Chunk          Scsi   Partition    
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
----- ------ ------ --- ------ ------- ------ -------------
 0    Volume 8.47GB            Open    0:00:0 64.0KB:8.47GB
 /dev/sda             NT
 1    RAID-5 16.9GB       32KB Open    0:01:0 64.0KB:8.47GB
 /dev/sdb             DATA             0:02:0 64.0KB:8.47GB
                                       ?:??:?  - Missing -
Mount points it to:
# /dev/sda5             5.3G  1.5G  3.6G  30% /usr
 

> -----Oorspronkelijk bericht-----
> Van: Salyzyn, Mark [mailto:mark_salyzyn@adaptec.com]
> Verzonden: dinsdag 1 februari 2005 4:15
> Aan: Kit Gerrits
> Onderwerp: RE: Disk errors
> 
> The controller does not appear to be busted; you have a Volume and a 
> RAID-5. Are you missing an Array?
> 
> A two drive failure on a RAID-5 gives you an offline array.
> 
> A single drive failure in a Volume gives you an offline array.
> 
> You need to find who is 08:05, look through /dev for the major/minor 
> number and relate it to the 'device'. Look through /proc/scsi/scsi and 
> /var/messages to help correlate it.
> 
> Sincerely -- Mark Salyzyn
> 


* RE: Disk errors
@ 2005-01-31 18:21 Salyzyn, Mark
  2005-01-31 23:41 ` Kit Gerrits
  0 siblings, 1 reply; 20+ messages in thread
From: Salyzyn, Mark @ 2005-01-31 18:21 UTC
  To: Kit Gerrits; +Cc: linux-scsi

The PERC controller looks after bad block reassignment.

Sincerely -- Mark Salyzyn

-----Original Message-----
From: Kit Gerrits [mailto:kit@gerritsacc.nl] 
Sent: Monday, January 31, 2005 11:44 AM
To: Salyzyn, Mark
Cc: linux-scsi@vger.kernel.org
Subject: RE: Disk errors

Indeed, I had an entire screenful of errors (a few each second) when I
came in in the morning...
The strange thing is, that the drive with the grown error is part of the
DATA container (/home and /data), whilst the disk with the rest ( / )
was fine.

You'd expect the error to show  up in /var/log/messages, but it didn't. 
I think the entire controller gave up as soon as the error popped up.

-----
Is there a way of having the controller detect / handle grown errors?
Will setting automatic remapping handle this?

Does Anyone know how to read / write mode pages?
----

Thanks all!

Kit

> -----Oorspronkelijk bericht-----
> Van: Salyzyn, Mark [mailto:mark_salyzyn@adaptec.com] 
> Verzonden: maandag 31 januari 2005 17:03
> Aan: Kit Gerrits
> Onderwerp: RE: Disk errors
> 
> You get tons of I/O error messages from the filesystem 
> driver once the device goes offline. You can check 
> /var/log/messages to find the root cause.
> 
> You will need to run the RAID management tools (afacli) to 
> display the underlying components (container list). Dell has 
> their own customized tools for this, I can not comment on their usage.
> 
> Sincerely -- Mark Salyzyn
> 


* RE: Disk errors
@ 2005-01-31 17:11 Cress, Andrew R
  0 siblings, 0 replies; 20+ messages in thread
From: Cress, Andrew R @ 2005-01-31 17:11 UTC
  To: Kit Gerrits, linux-scsi


I don't know much about afacli.

The mode pages do have bits to enable SMART, but that's not what I think
you are interested in.
However, SMART can generate info events that the OS may not be
recognizing.  

What you are interested in is mode page 0x01, to see if AWRE and ARRE are
turned on (bits 7 & 6, 0xC0).
The default setting for these may be documented in the disk manual for
your drives also, which can be obtained from the vendor web site.  Or,
the PERC vendor may be able to help get this info.
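
As a small illustration of the bit test described above: given the byte of
mode page 0x01 that holds these flags (byte 2 of the page, as far as I know),
AWRE is mask 0x80 and ARRE is mask 0x40. The sketch only does the arithmetic;
fetching and writing the mode page is left to a tool such as sgmode or sginfo,
and the function names are hypothetical.

```python
AWRE = 0x80   # bit 7: automatic write reallocation enabled
ARRE = 0x40   # bit 6: automatic read reallocation enabled

def reallocation_flags(recovery_byte):
    """Report whether AWRE and ARRE are set in the error recovery byte."""
    return {
        "AWRE": bool(recovery_byte & AWRE),
        "ARRE": bool(recovery_byte & ARRE),
    }

def enable_reallocation(recovery_byte):
    """Return the byte with both bits turned on, other bits untouched."""
    return recovery_byte | AWRE | ARRE

current = 0x04                                  # hypothetical value from a drive
print(reallocation_flags(current))              # both False for this value
print(hex(enable_reallocation(current)))        # 0xc4
```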

Andy

-----Original Message-----
From: Kit Gerrits [mailto:kit@gerritsacc.nl] 
Sent: Monday, January 31, 2005 10:22 AM
To: Cress, Andrew R; linux-scsi@vger.kernel.org
Subject: RE: Disk errors


Andrew,

Thanks for explaining the initial vs grown error list.
 
Unfortunately, the tool itself monitors software RAID and SCSI devices.
This means that sgmode itself sees only the containers on the PERC.


Would you happen to know how to accomplish this in afacli?


AFA0> disk set ?
disk set default - Sets the various disk defaults for all subsequent CLI
commands.
disk set smart - Change a device's SMART configuration.

AFA0> disk show ?
disk show default - Shows the various defaults set for the CLI commands.
disk show defects - Shows the number of defects and/or defect list on a
particular disk drive.
disk show partition - Shows the partitions on the disks attached to this
controller.
disk show smart - Displays SMART values and settings for SMART enabled
devices.
disk show space - Shows space usage on the disks attached to the
controller.

AFA0> disk show default
Executing: disk show default
No Default

AFA0>disk show smart
Executing: disk show smart
        Smart    Method of         Enable
        Capable  Informational     Exception  Performance  Error
B:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  ------
0:00:0     Y            6             Y           N             0
0:01:0     Y            6             Y           N             0
0:02:0     Y            6             Y           N             0
0:03:0     Y            6             Y           N             0
0:06:0     N


Thanks for the info

Kit


> -----Oorspronkelijk bericht-----
> Van: Cress, Andrew R [mailto:andrew.r.cress@intel.com] 
> Verzonden: maandag 31 januari 2005 15:46
> Aan: Kit Gerrits; linux-scsi@vger.kernel.org
> Onderwerp: RE: Disk errors
> 
> Kit,
> 
> With the growing size of disk drives, and more sectors 
> set aside as reserve sectors, the number of defects alone is 
> not a big concern, especially if they are PRIMARY defects 
> (found at manufacture time).
> What would be of concern is an increase in the number of 
> GROWN defects over a short period of time.  Unfortunately, it 
> is quite common for one defect to cause a disk to be 
> replaced, when it could be remapped without the expense and 
> trouble of a field replacement.
> 
> The automatic remapping of grown defects is a feature of SCSI 
> disks, but may not be configured in the disk's mode pages.  
> The mode pages can be changed without affecting the content 
> of the disk (with the exception of size & sector mapping 
> parameters).  There are several Linux tools to read/set mode 
> pages, among which is 'sgmode' from http://scsirastools.sf.net.
> 
> As a guess, it appears that you had a grown defect occur on 
> one of your disks, but the remapping was not set to occur 
> automatically on that disk, so a write never finished.
> 
> Andy
> 
> 
> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org
> [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Kit Gerrits
> Sent: Monday, January 31, 2005 9:28 AM
> To: linux-scsi@vger.kernel.org
> Subject: Disk errors
> 
> 
> Exactly how many errors is a SCSI disk allowed to have?
> 
> I have a PE2400 with a PERC2/Si with 4x9GB
> 
> My disks show:
> AFA0> disk show defects 0
> Executing: disk show defects (ID=0)
> Number of PRIMARY defects on drive: 1912
> Number of GROWN defects on drive: 0
> 
> AFA0> disk show defects 1
> Executing: disk show defects (ID=1)
> Number of PRIMARY defects on drive: 952
> Number of GROWN defects on drive: 1
> 
> AFA0> disk show defects 2
> Executing: disk show defects (ID=2)
> Number of PRIMARY defects on drive: 2457
> Number of GROWN defects on drive: 0
> 
> AFA0> disk show defects 3
> Executing: disk show defects (ID=3)
> Number of PRIMARY defects on drive: 2794
> Number of GROWN defects on drive: 0
> 
> The reason I ask is that my O/S (RedHat Enterprise Linux 3.0) 
> has recently hung with the error:
> 
> I/O Error Dev 08:05 Sector 529712
> 
> I would assume that this error is generated by the harddrive, 
> but shouldn't the controller catch SCSI errors (and relocate 
> sectors automagically)?
> 
> Thanks in advance,
> 
> Kit Gerrits
> 
> kit@gerritsaa.nl
> 
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-scsi" in the body of a message to 
> majordomo@vger.kernel.org More majordomo info at  
> http://vger.kernel.org/majordomo-info.html


[parent not found: <60807403EABEB443939A5A7AA8A7458BB51FD1@otce2k01.adaptec.com>]
* RE: Disk errors
@ 2005-01-31 14:46 Cress, Andrew R
  2005-01-31 15:22 ` Kit Gerrits
  0 siblings, 1 reply; 20+ messages in thread
From: Cress, Andrew R @ 2005-01-31 14:46 UTC
  To: Kit Gerrits, linux-scsi

Kit,

With the growing size of disk drives, and more sectors set aside as
reserve sectors, the number of defects alone is not a big concern,
especially if they are PRIMARY defects (found at manufacture time).
What would be of concern is an increase in the number of GROWN defects
over a short period of time.  Unfortunately, it is quite common for one
defect to cause a disk to be replaced, when it could be remapped without
the expense and trouble of a field replacement.
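
One way to watch for that kind of increase is to keep the counts from
successive runs and compare them. The sketch below parses text in the format
of the `disk show defects` output quoted in this thread; the "later" sample
and the function names are invented for illustration, and how the text is
captured from the CLI is left open.

```python
import re

DEFECT_RE = re.compile(r"Number of (PRIMARY|GROWN) defects on drive:\s*(\d+)")

def parse_defects(text):
    """Return {'PRIMARY': n, 'GROWN': m} from 'disk show defects' output."""
    return {kind: int(count) for kind, count in DEFECT_RE.findall(text)}

def grown_increase(before, after):
    """Number of grown defects added between two readings."""
    return after["GROWN"] - before["GROWN"]

earlier = parse_defects("""\
Executing: disk show defects (ID=1)
Number of PRIMARY defects on drive: 952
Number of GROWN defects on drive: 1
""")

# Hypothetical later reading of the same drive.
later = parse_defects("""\
Executing: disk show defects (ID=1)
Number of PRIMARY defects on drive: 952
Number of GROWN defects on drive: 4
""")

print(earlier, grown_increase(earlier, later))
```

The PRIMARY count should stay constant for a given drive; only the GROWN
count is worth trending.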

The automatic remapping of grown defects is a feature of SCSI disks, but
may not be configured in the disk's mode pages.  The mode pages can be
changed without affecting the content of the disk (with the exception of
size & sector mapping parameters).  There are several Linux tools to
read/set mode pages, among which is 'sgmode' from
http://scsirastools.sf.net.

As a guess, it appears that you had a grown defect occur on one of your
disks, but the remapping was not set to occur automatically on that
disk, so a write never finished.

Andy


-----Original Message-----
From: linux-scsi-owner@vger.kernel.org
[mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Kit Gerrits
Sent: Monday, January 31, 2005 9:28 AM
To: linux-scsi@vger.kernel.org
Subject: Disk errors


Exactly how many errors is a SCSI disk allowed to have?

I have a PE2400 with a PERC2/Si with 4x9GB

My disks show:
AFA0> disk show defects 0
Executing: disk show defects (ID=0)
Number of PRIMARY defects on drive: 1912
Number of GROWN defects on drive: 0

AFA0> disk show defects 1
Executing: disk show defects (ID=1)
Number of PRIMARY defects on drive: 952
Number of GROWN defects on drive: 1

AFA0> disk show defects 2
Executing: disk show defects (ID=2)
Number of PRIMARY defects on drive: 2457
Number of GROWN defects on drive: 0

AFA0> disk show defects 3
Executing: disk show defects (ID=3)
Number of PRIMARY defects on drive: 2794
Number of GROWN defects on drive: 0

The reason I ask is that my O/S (RedHat Enterprise Linux 3.0) has
recently hung with the error:

I/O Error Dev 08:05 Sector 529712

I would assume that this error is generated by the harddrive, but
shouldn't the controller catch SCSI errors (and relocate sectors
automagically)?

Thanks in advance,

Kit Gerrits

kit@gerritsaa.nl


* Disk errors
@ 2005-01-31 14:27 Kit Gerrits
  0 siblings, 0 replies; 20+ messages in thread
From: Kit Gerrits @ 2005-01-31 14:27 UTC
  To: linux-scsi

Exactly how many errors is a SCSI disk allowed to have?

I have a PE2400 with a PERC2/Si with 4x9GB

My disks show:
AFA0> disk show defects 0
Executing: disk show defects (ID=0)
Number of PRIMARY defects on drive: 1912
Number of GROWN defects on drive: 0

AFA0> disk show defects 1
Executing: disk show defects (ID=1)
Number of PRIMARY defects on drive: 952
Number of GROWN defects on drive: 1

AFA0> disk show defects 2
Executing: disk show defects (ID=2)
Number of PRIMARY defects on drive: 2457
Number of GROWN defects on drive: 0

AFA0> disk show defects 3
Executing: disk show defects (ID=3)
Number of PRIMARY defects on drive: 2794
Number of GROWN defects on drive: 0

The reason I ask is that my O/S (RedHat Enterprise Linux 3.0) has recently
hung with the error:

I/O Error Dev 08:05 Sector 529712

I would assume that this error is generated by the harddrive, but shouldn't
the controller catch SCSI errors (and relocate sectors automagically)?

Thanks in advance,

Kit Gerrits

kit@gerritsaa.nl



end of thread, other threads:[~2005-02-15  5:56 UTC | newest]

Thread overview: 20+ messages
2005-02-02 14:12 Disk Errors Salyzyn, Mark
2005-02-03  8:18 ` Andi Kleen
2005-02-15  5:56 ` Douglas Gilbert
  -- strict thread matches above, loose matches on Subject: below --
2005-02-01 18:24 Salyzyn, Mark
2005-02-02  3:55 ` Douglas Gilbert
2005-02-03 18:50   ` Bryan Henderson
2005-02-01 15:56 Cress, Andrew R
2005-02-01 12:50 Salyzyn, Mark
2005-02-01  8:53 Kit Gerrits
2005-02-01 12:43 ` Douglas Gilbert
2005-02-01 18:01   ` Bryan Henderson
2005-01-31 18:21 Disk errors Salyzyn, Mark
2005-01-31 23:41 ` Kit Gerrits
2005-01-31 23:55   ` Matt Domsch
2005-02-01  2:05   ` Guy
2005-01-31 17:11 Cress, Andrew R
     [not found] <60807403EABEB443939A5A7AA8A7458BB51FD1@otce2k01.adaptec.com>
2005-01-31 16:43 ` Kit Gerrits
2005-01-31 14:46 Cress, Andrew R
2005-01-31 15:22 ` Kit Gerrits
2005-01-31 14:27 Kit Gerrits
