* regarding crazy head unloads
From: Tejun Heo @ 2008-05-23 1:36 UTC
To: Bruce Allen
Cc: roland.kletzing, Smartmontools Developers List,
IDE/ATA development list, Jeff Garzik, Mark Lord, Alan Cox, scott
[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]
Hello, ATA/SMART fellows.
This message is regarding crazy head unloads on certain laptops. In a
desperate attempt to increase battery time, some vendors configure ATA
APM (advanced power management) so aggressively that it becomes
fragile (the problem can even be triggered on Windows): the drive
unloads its heads like crazy and kills itself quickly (within months).
For more information, please take a look at the following links.
https://bugzilla.novell.com/show_bug.cgi?id=386555
https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695
http://www.thinkwiki.org/wiki/Problem_with_hard_drive_clicking
This is primarily the hardware vendors' fault, and updating their
firmware is probably the best way to fix it; however, the problem can
actually kill the hard drive, which usually causes a lot of anxiety and
stress for the user, so I think we need to take some measures.
Attached are the storage-fixup script, which is to be called during boot
and resume, and a configuration file to go under /etc. The script can
match DMI and HAL properties and execute commands on the matching
devices. The config file currently contains only three rules.
Here are two ideas to better handle this problem:
1. Describe the problem on linux-ata.org and ask people to report
   dmidecode and hdparm -I output on affected machines. Share
   storage-fixup (or any other alternative) and storage-fixup.conf on
   the page.
2. This is from Roland. Make smartd aware of the problem and warn the
   user if the load/unload count per powered-on hour gets too high.
   Maybe the warning can direct the user to the linux-ata.org page?
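Idea 2 boils down to a simple ratio. Below is a minimal sketch, assuming
the load/unload count and the power-on hours have already been read from
the drive's SMART attributes. The function names and the limit of 30
unloads per hour are illustrative placeholders, not values proposed in
this thread.

```shell
#!/bin/bash
# Hypothetical helper names; the default limit is an arbitrary placeholder.

# Print load/unload cycles per powered-on hour, rounded down.
unloads_per_hour() {
    local load_count="$1" hours="$2"
    if [ "$hours" -le 0 ]; then
        echo 0
        return 1
    fi
    echo $((load_count / hours))
}

# Return 0 (and print a warning) if the rate exceeds the limit.
check_unload_rate() {
    local load_count="$1" hours="$2" limit="${3:-30}"
    local rate
    rate=$(unloads_per_hour "$load_count" "$hours") || return 1
    if [ "$rate" -gt "$limit" ]; then
        echo "warning: $rate head unloads per power-on hour (limit $limit)"
        return 0
    fi
    return 1
}

# Example: 90000 cycles over 1000 hours is 90 per hour.
check_unload_rate 90000 1000
```

A real threshold would have to come from drive datasheets or collected
reports; the sketch only shows where such a number would plug in.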
Thanks.
--
tejun
[-- Attachment #2: storage-fixup --]
[-- Type: text/plain, Size: 4116 bytes --]
#! /bin/bash
#
# storage-fixup - Tejun Heo <teheo@suse.de>
#
# Script to issue fix up commands for weird disks. This is primarily
# to adjust ATA APM setting. Some laptop BIOSen set this value too
# aggressively causing frequent head unloads which can kill the drive
# quickly. This script should be called during boot and resume. It
# examines rules from /etc/storage-fixup.conf and executes matching
# commands.
#
# In storage-fixup.conf, empty lines and lines starting w/ # are
# ignored. Each line starts with rule, dmi, hal or act.
#
# rule RULENAME
#     Starts a rule. RULENAME can't contain whitespace.
#
# dmi KEY VALUE
#     Checks whether the DMI value for KEY matches VALUE. If not, the
#     rule is skipped.
#
# hal KEY VALUE
#     Checks whether there are devices whose KEY value matches
#     VALUE. storage-fixup applies actions to devices which match
#     all hal matches, so all rules should have at least one hal
#     match.
#
# act ACTION
#     Executes ACTION on matched devices. ACTION can contain $DEV
#     which will be substituted with the device file of the matching
#     device.
#
# For example, the following (useless) rule disables APM on the first
# harddrive of my machine.
#
# rule p5w64
# dmi baseboard-product-name P5W64 WS Pro
# dmi baseboard-manufacturer ASUSTeK Computer INC.
# hal storage.model WDC WD5000YS-01M
# hal storage.serial SATA_WDC_WD5000YS-01_WD-WMANU1217262
# act hdparm -B 255 $DEV
#
conf_file=${CONF_FILE:-/etc/storage-fixup.conf}
hal_find_by_property=${HAL_FIND_BY_PROPERTY:-hal-find-by-property}
hal_get_property=${HAL_GET_PROPERTY:-hal-get-property}
dmidecode=${DMIDECODE:-dmidecode}
verbose=0
lineno=0
skip=0
rule_name=""
declare -a dev_ids
newline=$'\n'
log() {
    echo "storage-fixup: $*"
}
warn() {
    log "$@" 1>&2
}
debug() {
    if [ $verbose -ne 0 ]; then
        warn "$@"
    fi
}
#
# Match functions - do_dmi() and do_hal() - execute DMI and HAL
# matches respectively. Return value 0 indicates match, 1 invalid
# match (triggers warning) and 2 mismatch.
#
do_dmi() {
    local val
    if [ -z "$1" -o -z "$2" ]; then
        return 1
    fi
    val=$($dmidecode --string "$1")
    if [ "$?" -ne 0 ]; then
        return 1
    fi
    if [ "$val" = "$2" ]; then
        debug "Y $lineno $rule_name dmi $1=$2"
        return 0
    fi
    debug "N $lineno $rule_name dmi $1=$2"
    return 2
}
do_hal() {
    local i out ifs_store append=0
    if [ -z "$1" -o -z "$2" ]; then
        return 1
    fi
    if [ ${#dev_ids[@]} -eq 0 ]; then
        append=1
    fi
    #
    # bash really isn't a good programming language for this kind of
    # stuff and makes it look much more complex than it needs to be.
    # The following loop executes hal-find-by-property and ands the
    # result with the previous result.
    #
    ifs_store="$IFS"
    IFS="$newline"
    dev_ids=(
        $($hal_find_by_property --key "$1" --string "$2" \
            | while read found; do
                if [ $append -ne 0 ]; then
                    echo "$found"
                else
                    for id in "${dev_ids[@]}"; do
                        if [ "$id" = "$found" ]; then
                            echo "$found"
                            break
                        fi
                    done
                fi
            done))
    IFS="$ifs_store"
    # $? would only reflect the IFS assignment above, so test whether
    # any device survived the match instead.
    if [ ${#dev_ids[@]} -eq 0 ]; then
        debug "N $lineno $rule_name hal $1=$2"
        return 2
    fi
    debug "Y $lineno $rule_name hal nr_devs=${#dev_ids[@]} $1=$2"
    return 0
}
do_act() {
    local id dev
    for id in "${dev_ids[@]}"; do
        if ! DEV=$($hal_get_property --udi "$id" --key block.device); then
            warn "can't find device node for $id"
            continue
        fi
        eval log "$rule_name: executing \"$1\""
        eval "$1"
    done
    return 0
}
while read f0 f1 f2; do
    true $((lineno++))
    # skip empty lines and comments
    if [ -z "${f0###*}" ]; then
        continue
    fi
    if [ "$f0" = rule ]; then
        rule_name=$f1
        skip=0
        dev_ids=()
        continue
    fi
    if [ $skip -ne 0 ]; then
        continue
    fi
    case "$f0" in
        dmi)
            do_dmi "$f1" "$f2"
            ;;
        hal)
            do_hal "$f1" "$f2"
            ;;
        act)
            do_act "$f1 $f2"
            ;;
        *)
            false
            ;;
    esac
    ret=$?
    if [ $ret -ne 0 ]; then
        if [ $ret -eq 1 ]; then
            # warn() already writes to stderr
            warn "malformed line $lineno \"$f0 $f1 $f2\","\
                 "skipping rule $rule_name"
        fi
        skip=1
    fi
done < "$conf_file"
[-- Attachment #3: storage-fixup.conf --]
[-- Type: text/plain, Size: 510 bytes --]
rule tp-t60
dmi system-manufacturer LENOVO
dmi system-product-name 1952W5R
dmi system-version ThinkPad T60
hal storage.model Hitachi HTS722020K9SA00
act hdparm -B 255 $DEV
rule hp-dv6500
dmi system-manufacturer Hewlett-Packard
dmi system-product-name HP Pavilion dv6500 Notebook PC
dmi system-version Rev 1
hal storage.model SAMSUNG HM250JI
act hdparm -B 255 $DEV
rule dell-e1505
dmi system-manufacturer Dell Inc.
dmi system-product-name MM061
hal storage.model ST9100824AS
act hdparm -B 255 $DEV
* Re: regarding crazy head unloads
From: Tejun Heo @ 2008-05-23 11:36 UTC
To: Bruce Allen
Cc: roland.kletzing, Smartmontools Developers List,
IDE/ATA development list, Jeff Garzik, Mark Lord, Alan Cox, scott
[-- Attachment #1: Type: text/plain, Size: 173 bytes --]
Here's an updated version which can do glob matching and other stuff.
I've also set up a git tree.
http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git
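For readers unfamiliar with glob matching in bash, here is a minimal
sketch of the idea. It uses `[[ ... == pattern ]]` with an unquoted
right-hand side, whereas the attached script uses the `${val##$pattern}`
parameter expansion; `glob_match` is a hypothetical helper name, and the
model string and pattern are taken from the attached configuration file.

```shell
#!/bin/bash
# glob_match VALUE PATTERN
# Return 0 if VALUE matches the bash glob PATTERN in full, 1 otherwise.
glob_match() {
    local val="$1" pattern="$2"
    # The right-hand side is left unquoted so it is treated as a pattern.
    [[ "$val" == $pattern ]]
}

if glob_match "Hitachi HTS722020K9SA00" "Hitachi HTS7220*K9*A*"; then
    echo "match"
fi
```

Run as-is, this prints "match", because each `*` absorbs the part of the
model string that varies within the drive family.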
--
tejun
[-- Attachment #2: storage-fixup --]
[-- Type: text/plain, Size: 7392 bytes --]
#! /bin/bash
#
# storage-fixup - Tejun Heo <teheo@suse.de>
#
# Script to issue fix up commands for weird disks. This is primarily
# to adjust ATA APM setting. Some laptop BIOSen set this value too
# aggressively causing frequent head unloads which can kill the drive
# quickly. This script should be called during boot and resume. It
# examines rules from /etc/storage-fixup.conf and executes matching
# commands.
#
# In storage-fixup.conf, empty lines and lines starting w/ # are
# ignored. Each line starts with rule, dmi, hal or act.
#
# rule RULENAME
#     Starts a rule. RULENAME can't contain whitespace.
#
# dmi KEY PATTERN
#     Checks whether the DMI value for KEY matches PATTERN. If not,
#     the rule is skipped.
#
# hal KEY PATTERN
#     Checks whether there are devices whose KEY value matches
#     PATTERN. storage-fixup applies actions to devices which match
#     all hal matches, so all rules should have at least one hal
#     match.
#
# act ACTION
#     Executes ACTION on matched devices. ACTION can contain $DEV
#     which will be substituted with the device file of the matching
#     device.
#
# PATTERN is a bash glob pattern.
#
# For example, the following (useless) rule disables APM on the first
# harddrive of my machine.
#
# rule p5w64
# dmi baseboard-product-name P5W64 WS Pro
# dmi baseboard-manufacturer ASUSTeK Computer INC.
# hal storage.model WDC WD5000YS-01M
# hal storage.serial *-01_WD-WMANU1217262
# act hdparm -B 255 $DEV
#
declare usage="
Usage: storage-fixup [-h] [-V] [-v] [-d] [-c config_file]
  -h    Print this help message and exit
  -V    Print version and exit
  -v    Verbose
  -d    Dry run, don't actually execute action
  -c    Use config_file instead of /etc/storage-fixup.conf
"
declare hal_find_by_capability=${HAL_FIND_BY_CAPABILITY:-hal-find-by-capability}
declare hal_get_property=${HAL_GET_PROPERTY:-hal-get-property}
declare dmidecode=${DMIDECODE:-dmidecode}
declare version=0.1
declare conf_file=/etc/storage-fixup.conf
declare newline=$'\n'
declare dry_run=0 verbose=0 lineno=0 skip=0 rule_name="" reply
declare -a storage_ids
declare -a hal_cache
declare -a matches
log() {
    echo "storage-fixup: $*"
}
warn() {
    log "$@" 1>&2
}
debug() {
    if [ $verbose -ne 0 ]; then
        warn "$@"
    fi
}
#
# do_dmi - perform DMI match
# @key: DMI key to be passed as --string argument to dmidecode
# @pattern: glob pattern to match
#
# Returns 0 on match, 1 on mismatch, 2 on invalid match (triggers
# warning).
#
do_dmi() {
    local key="$1" pattern="$2"
    local val
    # a missing key or pattern is an invalid match, not a mismatch
    if [ -z "$key" -o -z "$pattern" ]; then
        return 2
    fi
    val=$($dmidecode --string "$key")
    if [ "$?" -ne 0 ]; then
        return 2
    fi
    if [ -z "${val##$pattern}" ]; then
        debug "Y $lineno $rule_name dmi $key=$pattern"
        return 0
    fi
    debug "N $lineno $rule_name dmi $key=$pattern"
    return 1
}
#
# search_hal_cache - search hal cache
# @id: udi of the device to search for
# @key: key of hal property to search
#
# Searches hal cache and returns 0 if found, 1 if @key properties are
# cached but matching entry is not found, 2 if @key properties are not
# cached yet. On success, the matched property is returned in $reply.
#
search_hal_cache() {
    local id="$1" key="$2"
    local i key_found=0 cache len match
    reply=
    for ((i=0;i<${#hal_cache[@]};i++)); do
        cache=${hal_cache[i]}
        len=${#cache}
        match="${cache#$key }"
        if [ ${#match} -ne $len ]; then
            key_found=1
        elif [ $key_found -eq 1 ]; then
            return 1
        fi
        match="${cache#$key $id }"
        if [ ${#match} -ne $len ]; then
            reply="$match"
            return 0
        fi
    done
    if [ $key_found -eq 1 ]; then
        return 1
    else
        return 2
    fi
}
#
# fetch_hal_property - fetch hal property matching id and key
# @id: udi of the device to fetch property for
# @key: key of the property to fetch
#
# Fetch @key property for udi @id. If @key properties are already
# cached, it's returned from cache. If not, cache is populated with
# @key properties and searched again.
#
# Returns 0 if found, 1 if not found, 2 if something went wrong. On
# success, the matched property is returned in $reply.
#
fetch_hal_property() {
    local id="$1" key="$2" property
    local i ret tid cnt=0
    # search cache
    search_hal_cache "$id" "$key"
    ret=$?
    if [ $ret -ne 2 ]; then
        return $ret
    fi
    # $key wasn't in the cache, populate the cache
    # placeholder indicating $key has been populated
    hal_cache+=("$key ")
    # run hal-get-property on each storage device and put the result in cache
    for ((i=0;i<${#storage_ids[@]};i++)); do
        tid="${storage_ids[i]}"
        property="$($hal_get_property --udi "$tid" --key "$key")"
        if [ -n "$property" ]; then
            hal_cache+=("$key $tid $property")
            true $((cnt++))
        fi
    done
    debug "C $cnt entries added to hal cache for $key"
    # and retry
    search_hal_cache "$id" "$key"
    return $?
}
#
# do_hal - perform HAL match
# @key: property key of interest
# @pattern: pattern to match
#
# Walk through $matches array and match each id against @key and
# @pattern. Entries which don't match are removed from $matches.
#
# Returns 0 if $matches contain any entry after matching, 1 if it's
# empty, 2 if something went wrong.
#
do_hal() {
    local key="$1" pattern="$2" property i
    local -a old_matches=("${matches[@]}")
    if [ -z "$1" -o -z "$2" ]; then
        return 2
    fi
    matches=()
    for ((i=0;i<${#old_matches[@]};i++)); do
        fetch_hal_property "${old_matches[i]}" "$key"
        if [ $? -eq 0 -a -z "${reply##$pattern}" ]; then
            matches+=("${old_matches[i]}")
        fi
    done
    if [ ${#matches[@]} -eq 0 ]; then
        debug "N $lineno $rule_name hal $1=$2"
        return 1
    fi
    debug "Y $lineno $rule_name hal nr_devs=${#matches[@]} $1=$2"
    return 0
}
#
# do_act - execute action
# @act: action to execute
#
# Execute @act for each device in $matches. "$DEV" in @act is
# substituted with the /dev node of each match. If $dry_run is set,
# the action is logged but not actually executed.
#
# Returns 0.
#
do_act() {
    local act="$1"
    local id dev
    for id in "${matches[@]}"; do
        if ! DEV=$($hal_get_property --udi "$id" --key block.device); then
            warn "can't find device node for $id"
            continue
        fi
        if [ $dry_run -eq 0 ]; then
            eval log "$rule_name: executing \"$act\""
            eval "$1"
        else
            eval log "$rule_name: dry-run \"$act\""
        fi
    done
    return 0
}
#
# Execution starts here
#
while getopts "dvVc:h" option; do
    case $option in
        d)
            dry_run=1;;
        v)
            verbose=1;;
        V)
            echo "$version"
            exit 0;;
        c)
            conf_file=$OPTARG;;
        *)
            # print usage on stderr
            echo "$usage" 1>&2
            exit 1;;
    esac
done
storage_ids=($($hal_find_by_capability --capability storage))
debug "I ${#storage_ids[@]} storage devices"
while read f0 f1 f2; do
    true $((lineno++))
    # skip empty lines and comments
    if [ -z "${f0###*}" ]; then
        continue
    fi
    if [ "$f0" = rule ]; then
        rule_name=$f1
        skip=0
        matches=("${storage_ids[@]}")
        continue
    fi
    if [ $skip -ne 0 ]; then
        continue
    fi
    case "$f0" in
        dmi)
            do_dmi "$f1" "$f2"
            ;;
        hal)
            do_hal "$f1" "$f2"
            ;;
        act)
            do_act "$f1 $f2"
            ;;
        *)
            # unknown keyword: report as invalid (2), not as mismatch
            (exit 2)
            ;;
    esac
    ret=$?
    if [ $ret -ne 0 ]; then
        if [ $ret -eq 2 ]; then
            # warn() already writes to stderr
            warn "malformed line $lineno \"$f0 $f1 $f2\","\
                 "skipping rule $rule_name"
        fi
        skip=1
    fi
done < "$conf_file"
[-- Attachment #3: storage-fixup.conf --]
[-- Type: text/plain, Size: 1351 bytes --]
#
# /etc/storage-fixup.conf - Configuration file for storage-fixup
#
# Blank lines and lines starting with # are ignored. Please read
# comment at the top of storage-fixup for more information.
#
# Drive model patterns are generalized to cover drives from the same
# family. Drive manufacturers usually have datasheets or web pages
# listing all models of the same family.
#
# The DMI part is difficult to generalize as there's no such
# information. We'll have to generalize as we collect entries.
#
# If you have a harddrive which does crazy unloading but is not listed
# here, please write to linux-ide@vger.kernel.org with the outputs of
# "dmidecode" and "hdparm -I DRIVE". On a laptop, DRIVE is usually
# /dev/sda.
#
# Reported drive model: Hitachi HTS722020K9SA00
rule tp-t60
dmi system-manufacturer LENOVO
dmi system-product-name 1952W5R
dmi system-version ThinkPad T60
hal storage.model Hitachi HTS7220*K9*A*
act hdparm -B 255 $DEV
# Reported drive model: SAMSUNG HM250JI
rule hp-dv6500
dmi system-manufacturer Hewlett-Packard
dmi system-product-name HP Pavilion dv6500 Notebook PC
dmi system-version Rev 1
hal storage.model SAMSUNG HM*I
act hdparm -B 255 $DEV
# Reported drive model: ST9100824AS
rule dell-e1505
dmi system-manufacturer Dell Inc.
dmi system-product-name MM061
hal storage.model ST9*AS
act hdparm -B 255 $DEV
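A minimal sketch of how a new entry in the format above could be drafted
from reported values. `make_rule` is a hypothetical helper, not part of
storage-fixup; the arguments would come from the "dmidecode" and
"hdparm -I" output of the affected machine.

```shell
#!/bin/bash
# make_rule NAME MANUFACTURER PRODUCT MODEL_PATTERN
# Print a storage-fixup.conf stanza that disables APM on matching drives.
make_rule() {
    local name="$1" manufacturer="$2" product="$3" model="$4"
    printf 'rule %s\n' "$name"
    printf 'dmi system-manufacturer %s\n' "$manufacturer"
    printf 'dmi system-product-name %s\n' "$product"
    printf 'hal storage.model %s\n' "$model"
    # $DEV is printed literally; storage-fixup substitutes it at run time.
    printf 'act hdparm -B 255 $DEV\n'
}

make_rule dell-e1505 "Dell Inc." MM061 "ST9*AS"
```

The example call reproduces the dell-e1505 rule listed above.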
* Re: regarding crazy head unloads
From: Scott James Remnant @ 2008-06-02 9:55 UTC
To: Tejun Heo
Cc: Bruce Allen, roland.kletzing, Smartmontools Developers List,
IDE/ATA development list, Jeff Garzik, Mark Lord, Alan Cox
[-- Attachment #1: Type: text/plain, Size: 441 bytes --]
On Fri, 2008-05-23 at 10:36 +0900, Tejun Heo wrote:
> Attached are storage-fixup script which is to be called during boot and
> resume and configuration file to go under /etc. The script can match
> dmi and hal properties and execute commands on the matching devices.
> The config file currently only contains three rules.
>
Could this not be done better inside HAL itself?
Scott
--
Scott James Remnant
scott@ubuntu.com
* Re: regarding crazy head unloads
From: Tejun Heo @ 2008-06-09 1:56 UTC
To: Scott James Remnant
Cc: Bruce Allen, roland.kletzing, Smartmontools Developers List,
IDE/ATA development list, Jeff Garzik, Mark Lord, Alan Cox
Scott James Remnant wrote:
> On Fri, 2008-05-23 at 10:36 +0900, Tejun Heo wrote:
>
>> Attached are storage-fixup script which is to be called during boot and
>> resume and configuration file to go under /etc. The script can match
>> dmi and hal properties and execute commands on the matching devices.
>> The config file currently only contains three rules.
>>
> Could this not be done better inside HAL itself?
Could be. I really don't know much about HAL. There were several
problems w/ HAL, though, and I had to drop the HAL dependency. HAL
depends on a lot of things and thus is started too late, which makes it
impossible to use in early stages of boot (e.g. single-user mode).
Another problem was that HAL currently uses the truncated ATA model ID
string reported through the SCSI layer. The updated version is in the
following git tree.
http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git;a=shortlog;h=master
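A minimal sketch of the truncation problem, assuming the SCSI layer
exposes only the first 16 characters of the ATA model string.
`scsi_model` is a hypothetical helper; the model and pattern are the
ones from the tp-t60 rule in storage-fixup.conf.

```shell
#!/bin/bash
# Assumption: the SCSI-level model field holds at most 16 characters.
scsi_model() {
    local ata_model="$1"
    echo "${ata_model:0:16}"
}

full="Hitachi HTS722020K9SA00"
pattern="Hitachi HTS7220*K9*A*"
short=$(scsi_model "$full")

echo "truncated model: $short"
# Unquoted right-hand side so that $pattern is treated as a glob.
if [[ "$short" == $pattern ]]; then
    echo "pattern still matches"
else
    echo "pattern no longer matches"
fi
```

With the truncated string the "K9" part of the model is cut off, so a
pattern written against the full model ID stops matching.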
If there's any better way to deal with this, I'll be happy to drop the
ugly script.
Thanks.
--
tejun
* Re: regarding crazy head unloads
From: Bruce Allen @ 2008-09-13 11:44 UTC
To: Tejun Heo
Cc: roland.kletzing, Smartmontools Developers List,
IDE/ATA development list, Jeff Garzik, Mark Lord, Alan Cox, scott
Hi Tejun,
Sorry, this is a really really slow reply. This email came at an
unusually busy time and I did not reply before it got buried in the
deluge.
> 2. This is from Roland. Make smartd aware of the problem and warn user
> if load/unload count per powered on hours goes too high. Maybe the
> warning can direct the user to linux-ata.org page?
It would be easy to implement this in smartd. But I'm concerned about
three things:
-- most people don't run smartd (I think by default it is off on
the main Linux distros). Those wise enough to run smartd will
probably be aware of the issue, since it's gotten lots of buzz. And
even if smartd is turned on in the distro it is normally not
configured to send warning emails.
-- I'm not sure if the attribute used for head load/unload is a standard
one. On my current laptop Hitachi disk it is attribute 193. Does
anyone know if this is standard?
-- I'm not sure at what threshold to send warning emails. If we send
too many false alarms, they will be ignored by users.
Let me know if you have suggestions or comments about these points.
Cheers,
Bruce
* Re: regarding crazy head unloads
From: Tejun Heo @ 2008-09-30 5:36 UTC
To: Bruce Allen
Cc: roland.kletzing, Smartmontools Developers List,
IDE/ATA development list, Jeff Garzik, Mark Lord, Alan Cox, scott
Bruce Allen wrote:
> Hi Tejun,
>
> Sorry, this is a really really slow reply. This email came at an
> unusually busy time and I did not reply before it got buried in the deluge.
Well, you were slow but I'm slower. :-)
>> 2. This is from Roland. Make smartd aware of the problem and warn user
>> if load/unload count per powered on hours goes too high. Maybe the
>> warning can direct the user to linux-ata.org page?
>
> It would be easy to implement this in smartd. But I'm concerned about
> three things:
>
> -- most people don't run smartd (I think by default it is off on
> the main Linux distros). Those wise enough to run smartd will
> probably be aware of the issue, since it's gotten lots of buzz. And
> even if smartd is turned on in the distro it is normally not
> configured to send warning emails.
We only have to hit one instance of each configuration where such a
problem is occurring. I think adding a warning message to smartctl
output or letting smartd warn users via email or log messages
should be good enough for this problem.
> -- I'm not sure if the attribute used for head load/unload is a standard
> one. On my current laptop Hitachi disk it is attribute 193. Does
> anyone know if this is standard?
I have no idea whatsoever. Maybe implement regexp matching on the
attribute name?
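A minimal sketch of the name-matching idea, assuming the attribute table
layout printed by "smartctl -A" (attribute name in the second column,
raw value in the tenth). `raw_value_by_name` is a hypothetical helper,
the sample rows are made up for illustration, and the regular expression
is only a guess at how vendors name the attribute.

```shell
#!/bin/bash
# raw_value_by_name REGEX
# Read a smartctl -A style table on stdin and print the raw value of the
# first attribute whose name matches REGEX (case-insensitive).
raw_value_by_name() {
    awk -v re="$1" '
        $1 ~ /^[0-9]+$/ && tolower($2) ~ tolower(re) { print $10; exit }
    '
}

# Made-up sample rows in smartctl -A layout.
sample='  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       1000
193 Load_Cycle_Count        0x0012   010   010   000    Old_age   Always       -       90000'

echo "$sample" | raw_value_by_name 'load.*cycle'
```

Matching on the name avoids hard-coding attribute 193, at the cost of
depending on how each drive's attributes happen to be labeled.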
> -- I'm not sure at what threshold to send warning emails. If we send
> too many false alarms, they will be ignored by users.
To get it completely right, I think we'll need a per-device database.
Not easy.... :-(
Thanks.
--
tejun