Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

* Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
       [not found]       ` <48B9C261.4010609@xms.se>
@ 2008-08-30 22:12         ` Justin Piszcz
  2008-08-31 10:00           ` Jonas Petersson
  2008-09-02 13:39           ` Owen Martin
  0 siblings, 2 replies; 5+ messages in thread
From: Justin Piszcz @ 2008-08-30 22:12 UTC (permalink / raw)
  To: Jonas Petersson; +Cc: linux-ide, smartmontools-support, linux-kernel

On Sat, 30 Aug 2008, Jonas Petersson wrote:

> Justin Piszcz wrote:
>> On Sat, 30 Aug 2008, Jonas Petersson wrote:
>>> [...]
>> smartctl -a would be useful (#1)
>
> # smartctl -a /dev/sda
> smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/

I have the same controller in my host as well, but it does not appear to
matter whether it happens on the ICH8 controller or other controllers.

I have noticed that on Velociraptors I get the same or a similar error to
yours (more so than on the regular old raptor150s). I ran all the same
tests as you, without getting any closer to the root cause.

Besides the annoying messages in the kernel log/syslog/dmesg, does it
affect your system stability in any way as of yet?

I must add a very important note here, though: you are using an ICH8
chipset and so am I, and we both have the same or similar problems.
However, I also have another machine set up VERY similarly (different HDDs
for the RAID5, but the RAID1 is the same as in one of my ICH8 boxes: dual
raptor150s), and to date it has never, or only rarely, thrown the frozen
error, except when a disk actually failed or when NCQ was enabled for a WD
drive (NCQ on Linux with WD drives is broken).

I have disks in RAID sets (both raid1 and raid5) that get the same or
similar warnings, as mentioned above, and so far I have not noticed any
impact from these specific errors.

I think for now we just have to live with them, I am not sure what else
to say here..

CC'ing linux-ide and linux-kernel with your original error from the start
of this e-mail thread:

Here is a snippet from this morning - this time it came back to life:

[46874.898690] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[46874.898703] ata3.00: cmd c8/00:08:90:3c:59/00:00:00:00:00/ef tag 0 dma 4096 in
[46874.898705]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[46874.898709] ata3.00: status: { DRDY }
[46879.643962] ata3: port is slow to respond, please be patient (Status 0xd0)
[46884.473195] ata3: device not ready (errno=-16), forcing hardreset
[46884.473202] ata3: soft resetting link
[46912.740010] ata3.00: qc timeout (cmd 0xec)
[46912.740020] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[46912.740023] ata3.00: revalidation failed (errno=-5)
[46912.740028] ata3: failed to recover some devices, retrying in 5 secs
[46917.458070] ata3: soft resetting link
[46917.636464] ata3.00: configured for UDMA/100
[46917.636482] ata3: EH complete
[46917.699224] sd 2:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[46917.699257] sd 2:0:0:0: [sda] Write Protect is off
[46917.699263] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[46917.699300] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Here is an example from my host (same/similar issue):

Aug 23 20:00:32 p34 kernel: [189770.219773] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 23 20:00:32 p34 kernel: [189770.219784] ata1.00: cmd 35/00:40:9a:d9:7a/00:00:12:00:00/e0 tag 0 dma 32768 out
Aug 23 20:00:32 p34 kernel: [189770.219786]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 23 20:00:32 p34 kernel: [189770.219790] ata1.00: status: { DRDY }
Aug 23 20:00:32 p34 kernel: [189770.219795] ata1: hard resetting link
Aug 23 20:00:32 p34 kernel: [189770.524770] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 23 20:00:32 p34 kernel: [189770.543960] ata1.00: configured for UDMA/133
Aug 23 20:00:32 p34 kernel: [189770.543977] ata1: EH complete
Aug 23 20:00:32 p34 kernel: [189770.544810] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
Aug 23 20:00:32 p34 kernel: [189770.551810] sd 0:0:0:0: [sda] Write Protect is off
Aug 23 20:00:32 p34 kernel: [189770.551810] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 23 20:00:32 p34 kernel: [189770.863810] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

What is the root cause of this? As far as I can tell it is still a mystery
to most, but the one thing we have in common is that we are both using ICH8
chipsets, which just may happen to be part of the problem.

Justin.

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

* Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  2008-08-30 22:12         ` exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Justin Piszcz
@ 2008-08-31 10:00           ` Jonas Petersson
  2008-09-02 13:39           ` Owen Martin
  1 sibling, 0 replies; 5+ messages in thread
From: Jonas Petersson @ 2008-08-31 10:00 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-ide, smartmontools-support, linux-kernel

Hi again Justin,

Justin Piszcz wrote:
> On Sat, 30 Aug 2008, Jonas Petersson wrote:
>> Justin Piszcz wrote:
>>> On Sat, 30 Aug 2008, Jonas Petersson wrote:
>>>> [...]
>>> smartctl -a would be useful (#1)
>> # smartctl -a /dev/sda
>> smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
> 
> I have the same controller in my host as well, but it does not appear to
> matter whether it happens on the ICH8 controller or other controllers.
> 
> I have noticed on Velociraptors I seem to get the same/similar error that
> you do as well, and I ran all the same tests as you, to no avail as to getting
> any closer to finding the root cause/problem.
> (.. more so than the regular old raptor150s)
> 
> Besides the annoying messages in the kernel log/syslog/dmesg, does it
> affect your system stability in any way as of yet?

Very much so, yes.

At best, all disk access will hang for a while and then resume after the 
reset has worked out - this often happens a couple of times per day now.

At worst, the reset will not work and the disk is remounted read-only 
and I can sort of use the system a bit this way. It seems somewhat 
random how much still works: Up until today I could at least always use 
dmesg and tail various logs to try to hunt down what happened, but this 
morning dmesg could not be found and I got I/O errors when accessing 
anything in /var/log. Rebooting helped as usual.

This fatal variant has happened about every second day lately.

The first two weeks I had the system, it showed nothing at all like this:
I have log files since July 26 and the first recorded (resettable) glitch
is from Aug 16. Obviously, any non-resettable problem would have been
easy to spot.

> I must add a very important note here though, you are using an ICH8 chipset
> and so am I, we both have same/similar problems-- however, I also have
> another machine setup VERY similarly (except different HDDs) for the RAID5
> but the RAID1 is the same as one of my ICH8 boxes (dual raptor150s)--
> and to date it has never? or rarely thrown the frozen error except when a disk 
> actually failed (or when NCQ is enabled for a WD drive), (NCQ+Linux for WD) is
> broken.

Yes, I would not point fingers at the ICH8 chipset either: The other 
MacBookPro I have experimented with now is a 2,2 (ATI based) and has 
ICH7, but I'm 99.9% sure my previous MacBookPro 3,1 (nvidia based) was 
ICH8 and it worked flawlessly (I saw no reason to swap for the 4,1 
version, but it was stolen from me in June). As far as I know the 
significant differences with my current MBP are just: higher screen 
resolution, multitouch ("iphone") touchpad and more memory. Alas, I 
didn't keep a lshw dump.

> [...]
> CC'ing linux-ide and linux-kernel with your original error from the start
> of this e-mail thread:
> 
> Here is a snippet from this morning - this time it came back to life:
> 
> [46874.898690] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> [46874.898703] ata3.00: cmd c8/00:08:90:3c:59/00:00:00:00:00/ef tag 0 dma 4096 in
> [46874.898705]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> [46874.898709] ata3.00: status: { DRDY }
> [46879.643962] ata3: port is slow to respond, please be patient (Status 0xd0)
> [46884.473195] ata3: device not ready (errno=-16), forcing hardreset
> [46884.473202] ata3: soft resetting link
> [46912.740010] ata3.00: qc timeout (cmd 0xec)
> [46912.740020] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [46912.740023] ata3.00: revalidation failed (errno=-5)
> [46912.740028] ata3: failed to recover some devices, retrying in 5 secs
> [46917.458070] ata3: soft resetting link
> [46917.636464] ata3.00: configured for UDMA/100
> [46917.636482] ata3: EH complete
> [46917.699224] sd 2:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
> [46917.699257] sd 2:0:0:0: [sda] Write Protect is off
> [46917.699263] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [46917.699300] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

I'll just clarify that the errno after "revalidation failed" is not 
always -5. When it ends up fatal I've also seen -3 and possibly 
something else too. I would have taken a screen shot this morning if 
only dmesg had worked. :-(
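
For reference, the negative numbers libata prints are ordinary Linux errno
values with the sign flipped. A minimal sketch for looking them up follows;
the helper name describe() is made up for illustration, and the names come
from Python's errno table, not from anything libata-specific. What the
driver means by each code in this context is not something the thread
establishes.

```python
import errno
import os


def describe(kernel_errno):
    """Map a kernel-style negative errno (e.g. -5) to (symbolic name, message)."""
    code = abs(kernel_errno)
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


# The values mentioned in this thread: -16 and -5 from the log, -3 from memory.
for value in (-16, -5, -3):
    name, text = describe(value)
    print(f"errno={value}: {name} ({text})")
```

On Linux this names -16 as EBUSY, -5 as EIO and -3 as ESRCH.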

> What is the root cause of this? It still seems to be a mystery to most as far
> as I can tell, but the one thing in common is we are both using ICH8 chipsets,
> which, just may happen to be part of the problem?

For the record: My current theory is that it is some kind of hardware 
problem - either in the disk or on the motherboard - so I have persuaded 
my local AppleStore to swap the harddisk on Monday and then they will 
run their full hardware stress test (4+ hours according to the techie). 
The stress test was apparently suggested by the central repair people (who 
have no idea I run Linux on it - the local techie knows, but has no 
problem with it as long as I keep a small OSX partition) so I guess this 
sort of hints that they are aware of hardware issues.

(Note: I've had the same techie replace a broken motherboard in the past 
when the Linux messages were at least as clear as the OSX ones - in 
that case drives would in the end only show up in the boot menu when 
the system had cooled down for at least 20 minutes. To be on the safe 
side, I've upped the minimum fan speed by 50% to ensure all sensors give 
me happy readings all the time - luckily the 4,1 fans are very silent 
compared to the 2,2)

I hope to have everything back in shape on Wednesday and I'll let you 
know how it fares.

BTW: For a while I displayed the hddtemp sensor all the time along with 
coretemp etc, but I now understand that this is also SMART based so I've 
turned it off in the past weeks' experimentation. Again, it seemed to 
work flawlessly for months on my previous (stolen) MBP 3,1.

			Best / Jonas


* Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  2008-08-30 22:12         ` exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Justin Piszcz
  2008-08-31 10:00           ` Jonas Petersson
@ 2008-09-02 13:39           ` Owen Martin
  2008-09-02 15:49             ` [smartmontools-support] " Jonas Petersson
  1 sibling, 1 reply; 5+ messages in thread
From: Owen Martin @ 2008-09-02 13:39 UTC (permalink / raw)
  To: Justin Piszcz, Jonas Petersson
  Cc: linux-ide, smartmontools-support, linux-kernel

This looks like a timeout during a read command:

ata3.00: cmd c8/00:08:90:3c:59

Read dma of 8 blocks from 0x903c59

Next time it happens, see if it is the same LBA. The fact that the drive
came back after the bus reset makes me think it was probably in error
recovery for an extended amount of time.
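
The decoding of the "cmd" field can be sketched in code. This is a minimal
sketch, assuming the field order that (as far as I know) libata uses when
it prints a taskfile: command/feature:nsect:lbal:lbam:lbah, then the five
"hob" (high order) bytes in the same order, then the device register. The
function and table names are made up for illustration, and the tables only
cover opcodes that appear in this thread. Under that assumption the LBA
bytes are printed low byte first, and for a 28-bit command the low nibble
of the device register holds LBA bits 27:24, so the read above would decode
to LBA 0x0f593c90 (257506448) rather than 0x903c59. It may be worth
checking both sectors.

```python
# Opcodes seen in this thread, named after the ATA command set.
ATA_COMMANDS = {
    0xC8: "READ DMA",
    0x35: "WRITE DMA EXT",
    0xB0: "SMART",
    0xEC: "IDENTIFY DEVICE",
}

# Commands that use 48-bit addressing (only the one seen in this thread).
LBA48_COMMANDS = {0x35}


def decode_taskfile(cmd_field):
    """Decode a 'cc/ff:nn:l0:l1:l2/hf:hn:l3:l4:l5/dd' string from a libata log."""
    groups = cmd_field.strip().split("/")
    command = int(groups[0], 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in groups[1].split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in groups[2].split(":")
    )
    device = int(groups[3], 16)

    if command in LBA48_COMMANDS:
        # 48-bit LBA: the hob bytes supply bits 47:24.
        lba = (
            (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
            | (lbah << 16) | (lbam << 8) | lbal
        )
        count = ((hob_nsect << 8) | nsect) or 65536
    else:
        # 28-bit LBA: the low nibble of the device register is bits 27:24.
        # For non-read/write commands such as SMART this value is not an LBA.
        lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
        count = nsect or 256

    return {
        "command": ATA_COMMANDS.get(command, hex(command)),
        "feature": feature,
        "sectors": count,
        "lba": lba,
    }


read_cmd = decode_taskfile("c8/00:08:90:3c:59/00:00:00:00:00/ef")
write_cmd = decode_taskfile("35/00:40:9a:d9:7a/00:00:12:00:00/e0")
print(read_cmd)
print(write_cmd)
```

Both decoded sector counts agree with the byte counts in the logs (8 x 512
= 4096 for the read, 64 x 512 = 32768 for the write), and both LBAs fall
inside the sector counts the kernel reports for the respective disks.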

Sorry, but I am new to using smartmontools for decoding SMART
attributes. Your previous email showed:

Device is:        Not in smartctl database [for details use: -P showall]

Does that imply the tool will not know the exact meaning of all the
attributes? I am not familiar with Fujitsu's implementation.

From the data you sent about the attributes before, it looks like the
pending and reallocated sector counts are zero, so the block must not
have failed recovery. Can you try to dump the sector using hdparm-8.9 to
see if it reproduces?

hdparm --read-sector 9452633 /dev/sda

What is the timeout set to?

cat /sys/block/sda/device/timeout

Maybe try to increase that. You want to be sure that it is not a drive
issue by verifying the block is readable and the raw values from the
pending, uncorrectable or reallocated sector attributes don't change.
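
One way to check that the raw values do not change is to save the output
of "smartctl -A /dev/sda" before and after a test and compare the relevant
rows. The sketch below assumes the usual ten-column attribute table that
smartctl prints, and assumes the conventional attribute IDs 5 (reallocated),
197 (pending) and 198 (offline uncorrectable). Those IDs are vendor
dependent and may not apply to a drive that is not in the smartctl
database. The function names and the two sample tables are made up for
illustration; they are not data from this thread.

```python
# Conventional IDs: 5 reallocated, 197 pending, 198 offline uncorrectable.
WATCHED_IDS = (5, 197, 198)


def parse_raw_values(smartctl_output):
    """Return {attribute id: raw value} from 'smartctl -A' style text."""
    raw = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        try:
            raw[int(parts[0])] = int(parts[9])
        except ValueError:
            continue  # raw value is not a plain integer; skip it
    return raw


def changed_attributes(before, after, ids=WATCHED_IDS):
    """Return {id: (old, new)} for watched attributes whose raw value differs."""
    old, new = parse_raw_values(before), parse_raw_values(after)
    return {i: (old.get(i), new.get(i)) for i in ids if old.get(i) != new.get(i)}


# Invented sample tables, for illustration only.
HEADER = "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n"
BEFORE = HEADER + (
    "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0\n"
    "197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0\n"
    "198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0\n"
)
AFTER = BEFORE.replace(
    "Always       -       0\n198", "Always       -       1\n198"
)

print(changed_attributes(BEFORE, AFTER))
```

An empty result means none of the watched raw values moved between the two
snapshots.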

I was seeing the exact same thing when I was trying to run the SMART
selftest in captive mode (not using smartmon). When I increased the
timeout it was able to complete.
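
Raising the timeout amounts to writing a larger number of seconds to the
sysfs file shown above. A minimal sketch follows; the helper names are made
up for illustration, the value 120 in the usage note is an arbitrary
example rather than a recommendation, and writing to sysfs needs root.

```python
from pathlib import Path


def read_timeout(path):
    """Return the SCSI command timeout, in seconds, stored in the given file."""
    return int(Path(path).read_text().strip())


def raise_timeout(path, seconds):
    """Write a new timeout only if it is larger; return the previous value."""
    current = read_timeout(path)
    if seconds > current:
        Path(path).write_text(f"{seconds}\n")
    return current
```

As root, raise_timeout("/sys/block/sda/device/timeout", 120) would return
the old value and leave 120 in place. The setting does not survive a
reboot.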

Aug 27 02:36:42 spu0201 user.err kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 27 02:36:42 spu0201 user.err kernel: ata1.00: cmd b0/d4:00:83:4f:c2/00:00:00:00:00/00 tag 0
Aug 27 02:36:42 spu0201 user.warn kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Justin's error is from a write:

ata1.00: cmd 35/00:40:9a:d9:7a/00:00:12:00:00/e0 tag 0 dma 32768 out

That typically only happens in a high vibration environment. Since the
write is open loop, typically the only thing that can prevent it from
completing is position error. It might be a PHY issue, but without a bus
analyzer, it is hard to tell. The new Seagate drives have attribute 199,
SATA R-err count, which might help to identify the issue, if you think
it is related to the chipset/PHY.

-Owen


* Re: [smartmontools-support] exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  2008-09-02 13:39           ` Owen Martin
@ 2008-09-02 15:49             ` Jonas Petersson
  2008-09-07 20:48               ` Jonas Petersson
  0 siblings, 1 reply; 5+ messages in thread
From: Jonas Petersson @ 2008-09-02 15:49 UTC (permalink / raw)
  To: Owen Martin; +Cc: Justin Piszcz, linux-ide, smartmontools-support, linux-kernel

Hi Owen,

Owen Martin wrote:
 > This looks like a timeout during a read command:
 >
 > ata3.00: cmd c8/00:08:90:3c:59
 >
 > Read dma of 8 blocks from 0x903c59
 >
 > Next time it happens, see if it is the same LBA. Since the drive came
 > back after the bus reset makes me think it was probably in error
 > recovery for an extended amount of time.

Sounds like a good idea. However, I had the drive swapped yesterday and 
have now reinstalled on a (seemingly) identical one which so far seems 
to be free from these messages. Hence, I keep my fingers crossed that 
this was indeed a hw error.

As it was on warranty I was not allowed to keep the bad drive for 
further experiments.

 > Sorry, but I am new to using smartmontools for decoding SMART
 > attributes. Your previous email showed:
 >
 > Device is:        Not in smartctl database [for details use: -P showall]
 >
 > Does that imply the tool will not know the exact meaning of all the
 > attributes? I am not familiar with Fujitsu's implementation.

I believe you are correct.

 > From the data you sent about the attributes before, it looks like the
 > pending and reallocated sector counts are zero, so the block must have
 > not failed recovery. Can you try to dump the sector using hdparm-8.9 to
 > see if it reproduces?
 >
 > hdparm --read-sector 9452633 /dev/sda

Would if I could... The messages I sent were indeed only from cases 
where the driver succeeded in writing to the disk in the end (extracts 
from /var/log/messages). In the failure cases I did not make a hard copy.

 > What is the timeout set to?
 >
 > cat /sys/block/sda/device/timeout

30

 > Maybe try to increase that. You want to be sure that it is not a drive
 > issue by verifying the block is readable and the raw values from the
 > pending, uncorrectable or reallocated sector attributes don't change.

Will do if I ever see it again.

			Best / Jonas




* Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  2008-09-02 15:49             ` [smartmontools-support] " Jonas Petersson
@ 2008-09-07 20:48               ` Jonas Petersson
  0 siblings, 0 replies; 5+ messages in thread
From: Jonas Petersson @ 2008-09-07 20:48 UTC (permalink / raw)
  Cc: linux-ide, smartmontools-support, linux-kernel

For the record:

Jonas Petersson wrote:
> Owen Martin wrote:
>  > This looks like a timeout during a read command:
>  >
>  > ata3.00: cmd c8/00:08:90:3c:59
>  >
>  > Read dma of 8 blocks from 0x903c59
>  >
>  > Next time it happens, see if it is the same LBA. Since the drive came
>  > back after the bus reset makes me think it was probably in error
>  > recovery for an extended amount of time.
> 
> Sounds like a good idea. However, I had the drive swapped yesterday and 
> have now reinstalled on a (seemingly) identical one which so far seems 
> to be free from these messages. Hence, I keep my fingers crossed that 
> this was indeed a hw error.
 > [...]

I've now stressed the new disk for almost a week and seen no indication 
at all of the previous error. Everything else is the same as before - I 
even installed from the very same DVD. My conclusion is therefore that I 
really had a disk that was broken in a way that normal tests will not 
detect.

Hence, my tip to anyone having a similar experience: Don't blame the 
driver, nor the motherboard/chipset - just replace the drive. It would 
of course be even nicer if the error message could spell this out 
somewhat more clearly too, but I guess the "I/O error" in the middle is 
a fair hint in retrospect.

			Best / Jonas



