SATA error while resume

linux-ide.vger.kernel.org archive mirror

* SATA error while resume
@ 2007-08-19  9:12 Maciek Rutecki
  2007-08-19 10:45 ` Tejun Heo
  2007-08-19 14:25 ` Mark Lord
  0 siblings, 2 replies; 6+ messages in thread
From: Maciek Rutecki @ 2007-08-19  9:12 UTC (permalink / raw)
  To: Linux-kernel, htejun, linux-ide, linux-acpi, rjw

Kernel: 2.6.23-rc2 with patches [1], but older and stable versions are also
affected.

[1] http://www.ussg.iu.edu/hypermail/linux/kernel/0708.0/2655.html
+ipw3945 and truecrypt.

Sometimes (one time in ten, or less often) I get this error while the system
resumes from suspend to disk:

=================
swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
swsusp: Basic memory bitmaps created
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
Loading image data pages (117687 pages)
...  0%  1%  2%  3%  4%  5%  6%  7%  8%  9% 10% 11% 12% 13% 14% 15% 16% 17% 18% 19% 20%
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: cmd 25/00:00:10:0b:f3/00:04:05:00:00/e0 tag 0 cdb 0x0 data 524288 in
         res 51/40:a4:6c:0b:f3/00:03:05:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: cmd 25/00:00:10:0b:f3/00:04:05:00:00/e0 tag 0 cdb 0x0 data 524288 in
         res 51/40:a4:6c:0b:f3/00:03:05:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: cmd 25/00:00:10:0b:f3/00:04:05:00:00/e0 tag 0 cdb 0x0 data 524288 in
         res 51/40:a4:6c:0b:f3/00:03:05:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: cmd 25/00:00:10:0b:f3/00:04:05:00:00/e0 tag 0 cdb 0x0 data 524288 in
         res 51/40:a4:6c:0b:f3/00:03:05:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: cmd 25/00:00:10:0b:f3/00:04:05:00:00/e0 tag 0 cdb 0x0 data 524288 in
         res 51/40:a4:6c:0b:f3/00:03:05:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: cmd 25/00:00:10:0b:f3/00:04:05:00:00/e0 tag 0 cdb 0x0 data 524288 in
         res 51/40:a4:6c:0b:f3/00:03:05:00:00/e0 Emask 0x9 (media error)
ata1.00: configured for UDMA/100
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
        05 f3 0b 6c
sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 99814252
Read-error on swap-device (8:0:99814256)
Read-error on swap-device (8:0:99814264)
Read-error on swap-device (8:0:99814272)
...
Read-error on swap-device (8:0:99815184)
ata1: EH complete
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Read 470748 kbytes in 30.97 seconds (15.20 MB/s)
PM: Restore failed, recovering.
Restarting tasks ... done.
swsusp: Basic memory bitmaps freed
=================
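
As a cross-check of the log above, the failing sector can be recovered from
the taskfile bytes themselves. The sketch below is in Python and assumes the
field order libata uses when it prints `cmd` and `res` lines (command or
status, then feature or error, count and the three low LBA bytes, then the
same five fields for the high-order bytes, then the device register).
`parse_taskfile` is a name made up for this illustration.

```python
def parse_taskfile(tf):
    """Split a libata 'cmd' or 'res' taskfile string into a few decoded fields.

    Assumed layout (not stated in the thread):
    CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
    """
    head, cur, hob, device = tf.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in cur.split(":"))
    _hob_feat, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in hob.split(":")
    )
    return {
        "command_or_status": int(head, 16),
        "feature_or_error": feature,
        # 16-bit sector count: high-order byte first
        "count": (hob_nsect << 8) | nsect,
        # 48-bit LBA assembled from the six LBA bytes
        "lba": (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
               | (lbah << 16) | (lbam << 8) | lbal,
        "device": int(device, 16),
    }


cmd = parse_taskfile("25/00:00:10:0b:f3/00:04:05:00:00/e0")
res = parse_taskfile("51/40:a4:6c:0b:f3/00:03:05:00:00/e0")

print("request: start LBA", cmd["lba"], "sectors", cmd["count"],
      "bytes", cmd["count"] * 512)
print("result:  error LBA", res["lba"], "error byte", hex(res["feature_or_error"]))
```

Under that assumption the command is a 1024-sector (524288-byte) read starting
at LBA 99814160, and the drive reports the error at LBA 99814252. That is the
sector end_request prints, and its bytes (05 f3 0b 6c) are the ones that close
the sense data. Bit 0x40 of the ATA error register is the uncorrectable-data
(UNC) flag.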

Then the system continues booting without resuming.


I checked the disk twice with smartctl and ran fsck and mkswap -c, and I got
no errors:

=================
 rutek:/home/maciek/kernel.org/libata_error# smartctl -A /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   046    Pre-fail  Always       -       28879
  2 Throughput_Performance  0x0005   100   100   030    Pre-fail  Offline      -       20381999
  3 Spin_Up_Time            0x0003   100   100   025    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1599
  5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       8589934592000
  7 Seek_Error_Rate         0x000f   100   100   047    Pre-fail  Always       -       3713
  8 Seek_Time_Performance   0x0005   100   100   019    Pre-fail  Offline      -       0
  9 Power_On_Seconds        0x0032   096   096   000    Old_age   Always       -       0h+41m+39s
 10 Spin_Retry_Count        0x0013   100   100   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1354
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2776
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       33 (Lifetime Min/Max 15/46)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       344
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       444268544
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000f   100   100   060    Pre-fail  Always       -       22830
203 Run_Out_Cancel          0x0002   100   100   000    Old_age   Always       -       2632796799455
240 Head_Flying_Hours       0x003e   200   200   000    Old_age   Always       -       0

=================
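
The raw values of attributes 5 and 196 look alarming at first sight. A small
Python sketch, assuming the raw field is a 48-bit number that some vendors
pack as three 16-bit words (an assumption; nothing in this output confirms
the packing), splits them up. `split_raw48` is a hypothetical helper.

```python
def split_raw48(raw):
    """Return a 48-bit SMART raw value as three 16-bit words, high word first."""
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


# Raw values copied from the table above.
attributes = [
    ("Reallocated_Sector_Ct", 8589934592000),
    ("Reallocated_Event_Count", 444268544),
]

for name, raw in attributes:
    print(name, split_raw48(raw))
```

Both numbers have a zero low word, so they may be packed vendor fields rather
than counts of thousands of reallocated sectors. That does not clear the disk:
Current_Pending_Sector still reports 1.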

Dmesg and config:
http://www.unixy.pl/maciek/download/kernel/libata_error/

Regards
-- 
Maciej Rutecki
http://www.maciek.unixy.pl


* Re: SATA error while resume
  2007-08-19  9:12 SATA error while resume Maciek Rutecki
@ 2007-08-19 10:45 ` Tejun Heo
  2007-08-19 14:25 ` Mark Lord
  1 sibling, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2007-08-19 10:45 UTC (permalink / raw)
  To: Maciek Rutecki; +Cc: Linux-kernel, linux-ide, linux-acpi, rjw

Maciek Rutecki wrote:
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata1.00: irq_stat 0x40000001
> ata1.00: cmd 25/00:00:10:0b:f3/00:04:05:00:00/e0 tag 0 cdb 0x0 data
> 524288 in
>          res 51/40:a4:6c:0b:f3/00:03:05:00:00/e0 Emask 0x9 (media error)
> ata1.00: configured for UDMA/100
> 
> Then the system continues booting without resuming.
> 
> I checked the disk twice with smartctl and ran fsck and mkswap -c, and I got
> no errors:
> 
> =================
>   5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always
>       -       8589934592000
> 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always
>       -       444268544

It very much looks like the disk is dying.  Dunno why it doesn't show up
during SMART testing but you better back up and contact the hardware vendor.

-- 
tejun


* Re: SATA error while resume
  2007-08-19  9:12 SATA error while resume Maciek Rutecki
  2007-08-19 10:45 ` Tejun Heo
@ 2007-08-19 14:25 ` Mark Lord
  2007-08-19 15:16   ` Maciek Rutecki
  1 sibling, 1 reply; 6+ messages in thread
From: Mark Lord @ 2007-08-19 14:25 UTC (permalink / raw)
  To: Maciek Rutecki; +Cc: Linux-kernel, htejun, linux-ide, linux-acpi, rjw

Maciek Rutecki wrote:
> Kernel: 2.6.23-rc2 with patches [1], but older and stable versions also
.. 
> Sometimes (one time in ten, or less often) I get this error while the system
> resumes from suspend to disk:
..
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata1.00: irq_stat 0x40000001
> ata1.00: cmd 25/00:00:10:0b:f3/00:04:05:00:00/e0 tag 0 cdb 0x0 data
> 524288 in
>          res 51/40:a4:6c:0b:f3/00:03:05:00:00/e0 Emask 0x9 (media error)
> ata1.00: configured for UDMA/100
> sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
>         05 f3 0b 6c
> sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate
> failed
> end_request: I/O error, dev sda, sector 99814252
> Read-error on swap-device (8:0:99814256)
...

Looks like a bad sector in the swap partition.
You can probably repair it by using this sequence of commands:

swapoff /dev/sdX	<--- replace sdX with actual swap partition dev name
sync
cat /dev/zero > /dev/sdX
mkswap /dev/sdX
swapon /dev/sdX

If it recurs after doing that, then it's time for a new drive.

-ml


* Re: SATA error while resume
  2007-08-19 14:25 ` Mark Lord
@ 2007-08-19 15:16   ` Maciek Rutecki
  2007-08-19 15:49     ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Maciek Rutecki @ 2007-08-19 15:16 UTC (permalink / raw)
  To: Mark Lord; +Cc: Linux-kernel, htejun, linux-ide, linux-acpi, rjw

Mark Lord wrote:

> 
> Looks like a bad sector in the swap partition.
> You can probably repair it by using this sequence of commands:
> 
> swapoff /dev/sdX    <--- replace sdX with actual swap partition dev name
> sync
> cat /dev/zero > /dev/sdX
> mkswap /dev/sdX
> swapon /dev/sdX
> 
> If it recurs after doing that, then it's time for a new drive.
> 
> -ml
> -

rutek:/home/maciek# swapoff /dev/sda6
rutek:/home/maciek# sync
rutek:/home/maciek# cat /dev/zero > /dev/sda6
cat: write error: Input/output error (after a few minutes,
probably because sda6 is full)

rutek:/home/maciek# dd if=/dev/zero of=/dev/sda6
dd: writing to `/dev/sda6': Input/output error
5992177+0 records in
5992176+0 records out
3067994112 bytes (3.1 GB) copied, 298.159 seconds, 10.3 MB/s
rutek:/home/maciek# mkswap /dev/sda6
Setting up swapspace version 1, size = 3067990 kB
no label, UUID=2061df6e-d385-4367-9a4c-c8431e57b73a
rutek:/home/maciek# swapon /dev/sda6


dmesg:
Adding 2996080k swap on /dev/sda6.  Priority:-2 extents:1 across:2996080k
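
The numbers in this transcript can be checked against each other. The Python
sketch below assumes 512-byte dd records (the dd default), a 4096-byte page
that mkswap reserves for its header, and that mkswap reports kB in units of
1000 bytes. The last two are assumptions about this mkswap version, not
something the output states.

```python
SECTOR = 512        # dd default block size in bytes
PAGE = 4096         # assumed page size reserved by mkswap for its header

records_out = 5992176          # records dd reports as written
elapsed_seconds = 298.159      # elapsed time dd reports

bytes_written = records_out * SECTOR
expected_mkswap_kb = (bytes_written - PAGE) // 1000
rate_mb_per_s = round(bytes_written / elapsed_seconds / 1e6, 1)

print("bytes written:        ", bytes_written)
print("expected mkswap size: ", expected_mkswap_kb, "kB")
print("throughput:           ", rate_mb_per_s, "MB/s")
```

If those assumptions hold, dd wrote as many bytes as the partition holds
before it stopped. That is consistent with the write error being reported at
the end of the device rather than at a bad sector inside it.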

I also tried:
dd if=/dev/sda... of=/dev/null for all partitions
I tested the disk with the BIOS utility and smartctl, and ran autotest (bash
shared mapping and disktest). No errors or warnings. The error shows up only
(sometimes) while the system resumes from suspend to disk. The disk is 10
months old...

Regards
-- 
Maciej Rutecki
http://www.unixy.pl




* Re: SATA error while resume
  2007-08-19 15:16   ` Maciek Rutecki
@ 2007-08-19 15:49     ` Tejun Heo
  2007-08-19 16:25       ` Maciek Rutecki
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2007-08-19 15:49 UTC (permalink / raw)
  To: Maciek Rutecki; +Cc: Mark Lord, Linux-kernel, linux-ide, linux-acpi, rjw

Maciek Rutecki wrote:
> rutek:/home/maciek# swapoff /dev/sda6
> rutek:/home/maciek# sync
> rutek:/home/maciek# cat /dev/zero > /dev/sda6
> cat: write error: Input/output error (after a few minutes,
> probably because sda6 is full)
> 
> rutek:/home/maciek# dd if=/dev/zero of=/dev/sda6
> dd: writing to `/dev/sda6': Input/output error
> 5992177+0 records in
> 5992176+0 records out
> 3067994112 bytes (3.1 GB) copied, 298.159 seconds, 10.3 MB/s
> rutek:/home/maciek# mkswap /dev/sda6
> Setting up swapspace version 1, size = 3067990 kB
> no label, UUID=2061df6e-d385-4367-9a4c-c8431e57b73a
> rutek:/home/maciek# swapon /dev/sda6
> 
> 
> dmesg:
> Adding 2996080k swap on /dev/sda6.  Priority:-2 extents:1 across:2996080k
> 
> I also tried:
> dd if=/dev/sda... of=/dev/null for all partitions
> I tested the disk with the BIOS utility and smartctl, and ran autotest (bash
> shared mapping and disktest). No errors or warnings. The error shows up only
> (sometimes) while the system resumes from suspend to disk. The disk is 10
> months old...

Hmmmm... Does Power-Off_Retract_Count increase after suspend/resume cycle?

-- 
tejun


* Re: SATA error while resume
  2007-08-19 15:49     ` Tejun Heo
@ 2007-08-19 16:25       ` Maciek Rutecki
  0 siblings, 0 replies; 6+ messages in thread
From: Maciek Rutecki @ 2007-08-19 16:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Mark Lord, Linux-kernel, linux-ide, linux-acpi, rjw

Tejun Heo wrote:

> Hmmmm... Does Power-Off_Retract_Count increase after suspend/resume cycle?
>


No.

Before:

rutek:/home/maciek# smartctl -A -d ata /dev/sda | grep Power-Off_Retract_Count
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       66


Tested:

2.6.22.1 OK (double spin down of disk issue).

2.6.23-rc2 with patches [1] (prevent double spin down during suspend to
disk), also tested earlier [2]: OK

[1] http://www.ussg.iu.edu/hypermail/linux/kernel/0708.0/2655.html
[2] http://www.ussg.iu.edu/hypermail/linux/kernel/0708.0/2784.html


After:
rutek:/home/maciek# smartctl -A -d ata /dev/sda | grep Power-Off_Retract_Count
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       66
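
The before/after comparison can be scripted. This is a minimal Python sketch
that takes two rows of `smartctl -A` output as text and compares the RAW_VALUE
column. It assumes the ten-column layout shown in this thread, and `raw_value`
is a hypothetical helper, not part of smartmontools.

```python
def raw_value(row):
    """Return the RAW_VALUE column (10th field) of one `smartctl -A` row."""
    # Split on whitespace at most 9 times so a raw value with spaces stays whole.
    fields = row.split(None, 9)
    if len(fields) < 10:
        raise ValueError("not a complete attribute row: %r" % row)
    return fields[9].strip()


# Rows copied from the output before and after the suspend/resume cycle.
before = ("192 Power-Off_Retract_Count 0x0032   100   100   000    "
          "Old_age   Always       -       66")
after = ("192 Power-Off_Retract_Count 0x0032   100   100   000    "
         "Old_age   Always       -       66")

delta = int(raw_value(after)) - int(raw_value(before))
print("Power-Off_Retract_Count changed by", delta)
```

A delta of zero matches the answer given here: the counter did not move across
the suspend/resume cycle.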


-- 
Maciej Rutecki
http://www.maciek.unixy.pl

