linux-ide.vger.kernel.org archive mirror
* smartd causing SATA timeouts on sleeping drives
@ 2007-10-06  1:38 Andrew Paprocki
From: Andrew Paprocki @ 2007-10-06  1:38 UTC (permalink / raw)
  To: linux-ide; +Cc: Tejun Heo, Bruce Allen

Tejun/Bruce,

I tracked down the source of the timeouts I have been getting
frequently. It appears smartd does not properly handle drives that
have been spun down by the BIOS ACPI settings. The SATA timeouts occur
every half hour (the default -i 1800 in smartd) and do not occur when
smartd is not running. The drives smartd is configured to monitor have
a sleep time configured in the BIOS. When the drives are asleep, I get
a soft reset every half hour as smartd attempts to access them. While
the drives are in this state, smartd also logs implausible values to
syslog (e.g. temperature changes to 200C). For comparison, hddtemp
knows the drives are sleeping:

# hddtemp /dev/sda
/dev/sda: Hitachi HDS721010KLA330                 : drive is sleeping
# ls /storage
... wakes up the drives ...
# hddtemp /dev/sda
/dev/sda: Hitachi HDS721010KLA330                 :  29°C or °F

The example cmd / timeout error / soft reset is pasted below, along
with the invalid attribute values smartd reports when the drives are
in this state. What needs to change for smartd to recognize that the
drives are sleeping and either skip its checks or forcefully wake the
drives up to perform them? (Should that be a configuration parameter
in smartd?)
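
To make the question concrete, here is a minimal Python sketch of the
policy being asked about. It is not smartd's code. It assumes the
drive's power state is obtained from the ATA CHECK POWER MODE command,
whose result is returned in the sector count register (0x00 standby,
0x80 idle, 0xFF active or idle), and that hddtemp detects sleeping
drives in a similar way. The names classify_power_mode, should_poll
and wake_sleeping are hypothetical.

```python
from enum import Enum


class PowerMode(Enum):
    STANDBY = "standby"
    IDLE = "idle"
    ACTIVE_OR_IDLE = "active or idle"
    UNKNOWN = "unknown"


# Sector count values returned by ATA CHECK POWER MODE (command 0xE5).
_SECTOR_COUNT_TO_MODE = {
    0x00: PowerMode.STANDBY,
    0x80: PowerMode.IDLE,
    0xFF: PowerMode.ACTIVE_OR_IDLE,
}


def classify_power_mode(sector_count: int) -> PowerMode:
    """Map a CHECK POWER MODE result byte to a power mode."""
    return _SECTOR_COUNT_TO_MODE.get(sector_count, PowerMode.UNKNOWN)


def should_poll(sector_count: int, wake_sleeping: bool = False) -> bool:
    """Decide whether a monitoring pass should touch the drive.

    wake_sleeping is a hypothetical configuration switch: when False,
    a drive in standby is skipped; when True, it is polled anyway
    (which spins it up).
    """
    if classify_power_mode(sector_count) is PowerMode.STANDBY:
        return wake_sleeping
    # Idle, active, or unrecognized values: poll as usual (an
    # assumption made for this sketch).
    return True


print(should_poll(0x00))                      # drive in standby, skip it
print(should_poll(0x00, wake_sleeping=True))  # standby, but wake it
print(should_poll(0xFF))                      # drive is spun up
```

smartd.conf documents a -n POWERMODE directive (for example
-n standby) for this kind of policy. Whether it avoids the timeouts
described here is not established in this message.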

Thanks,
-Andrew

# uname -a
Linux (none) 2.6.22.6 #5 Mon Sep 10 02:15:22 EDT 2007 i586 unknown
(Using sata_sil on 3114 chips)

# smartctl -V
smartmontools release 5.38 dated 2006/12/20 at 20:37:59 UTC
...
smartctl compile dated Sep 17 2007 at 13:47:25
(repository code checked out on Sep 17th)

# cat /var/run/smartd.conf
/dev/sda -d ata -a -S on -s (S/../.././02|L/../../6/03)
/dev/sdb -d ata -a -S on -s (S/../.././02|L/../../6/03)

What happens every 30 minutes when drives are sleeping:

Oct  6 01:05:48 (none) user.err kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Oct  6 01:05:48 (none) user.err kernel: ata2.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
Oct  6 01:05:48 (none) user.warn kernel:          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  6 01:05:53 (none) user.warn kernel: ata2: port is slow to respond, please be patient (Status 0xd0)
Oct  6 01:05:55 (none) user.info kernel: ata2: soft resetting port
Oct  6 01:05:56 (none) user.info kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct  6 01:05:56 (none) user.info kernel: ata2.00: configured for UDMA/100
Oct  6 01:05:56 (none) user.info kernel: ata2: EH complete
Oct  6 01:05:56 (none) user.notice kernel: sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 01:05:56 (none) user.notice kernel: sd 1:0:0:0: [sdb] Write Protect is off
Oct  6 01:05:56 (none) user.debug kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Oct  6 01:05:56 (none) user.notice kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
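
The cmd and res lines in the log above can be decoded to see which
command timed out. The Python sketch below assumes the libata layout
of these lines: command/feature:nsect:lbal:lbam:lbah, then the five
high-order bytes, then the device register, with status and error in
the first two positions of a res line. The function names are
hypothetical. Under that assumption, b0/da is the ATA SMART command
(0xB0) with the RETURN STATUS subcommand (0xDA), and 4f:c2 is the
SMART signature in the LBA mid and high registers.

```python
import re

# SMART subcommands, selected by the Features register when the
# command register holds 0xB0.
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD1: "SMART READ ATTRIBUTE THRESHOLDS",
    0xD2: "SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE",
    0xD4: "SMART EXECUTE OFF-LINE IMMEDIATE",
    0xD5: "SMART READ LOG",
    0xD6: "SMART WRITE LOG",
    0xD8: "SMART ENABLE OPERATIONS",
    0xD9: "SMART DISABLE OPERATIONS",
    0xDA: "SMART RETURN STATUS",
}

STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]

# \s+ after cmd/res tolerates the line wrapping added by mail clients.
TASKFILE_RE = re.compile(
    r"\b(cmd|res)\s+([0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"([0-9a-f]{2})"
)


def status_flags(status: int) -> list:
    """Name the well-known bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


def decode_taskfile(text: str):
    """Decode a libata 'cmd' or 'res' taskfile dump, or return None."""
    m = TASKFILE_RE.search(text)
    if m is None:
        return None
    kind, first = m.group(1), int(m.group(2), 16)
    feature, nsect, lbal, lbam, lbah = (
        int(b, 16) for b in m.group(3).split(":")
    )
    fields = {"kind": kind, "nsect": nsect, "lbal": lbal,
              "lbam": lbam, "lbah": lbah}
    if kind == "cmd":
        fields.update(command=first, feature=feature, description="unknown")
        if first == 0xB0:
            fields["description"] = SMART_FEATURES.get(
                feature, "SMART (unknown subcommand)")
    else:
        fields.update(status=first, error=feature,
                      status_flags=status_flags(first))
    return fields


cmd = decode_taskfile("ata2.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0")
res = decode_taskfile("res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4")
print(cmd["description"])
print(res["status_flags"], status_flags(0xD0))
```

The same helper applied to the "Status 0xd0" in the "slow to respond"
message shows the BSY bit set, i.e. the device was still busy when the
port was reset.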

Invalid attribute values:

Oct  2 22:35:21 (none) daemon.info smartd[585]: Device: /dev/sda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 87 to 86
Oct  2 23:35:21 (none) daemon.info smartd[585]: Device: /dev/sda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 86 to 85
Oct  5 20:05:56 (none) daemon.info smartd[585]: Device: /dev/sdb, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 84 to 85
Oct  6 01:05:38 (none) daemon.info smartd[585]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 200 to 206
Oct  6 01:05:56 (none) daemon.info smartd[585]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 193 to 200

Once the drives are started up, those values report:

  3 Spin_Up_Time            0x0007   085   085   024    Pre-fail  Always       -       821 (Average 820)
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Lifetime Min/Max 24/67)


Thread overview: 13+ messages
2007-10-06  1:38 smartd causing SATA timeouts on sleeping drives Andrew Paprocki
2007-10-06 20:15 ` Tejun Heo
2007-10-08  5:51   ` Andrew Paprocki
2007-10-08  6:06     ` Tejun Heo
2007-10-08  6:32       ` Andrew Paprocki
2007-10-10 19:39         ` Bruce Allen
2007-10-11  2:02           ` Tejun Heo
2007-10-11  2:46             ` Andrew Paprocki
2007-10-11  3:06               ` Tejun Heo
2007-10-11  4:00             ` Andrew Paprocki
2007-10-12  9:15             ` [smartmontools-devel] " Bruce Allen
2007-10-10 19:42   ` Bruce Allen
2007-10-10 19:46 ` Bruce Allen
