From: Tejun Heo <tj@kernel.org>
To: Janos Haar <janos.haar@netcenter.hu>
Cc: linux-ide@vger.kernel.org
Subject: Re: possible data corruption on ICH8 or WD raptor
Date: Sat, 02 Aug 2008 00:36:06 +0900
Message-ID: <48932D66.10700@kernel.org>
In-Reply-To: <037e01c8f3c2$f81c5330$0400a8c0@dcccs>

Janos Haar wrote:
>> You're getting PHY event on flush which is a pretty strong indication
>> that you're having power problem. The disk goes out to transfer data in
>> its buffer to the platter and draws more power from the cable. For some
>> reason, power is not maintained properly. Disk checks out momentarily
>> causing the PHY event and losing the data in its buffer. Try to connect
>> the harddrive to a separate PSU and see whether the problem goes away.
>
> Thank you for the answer.
>
> Now, this server is a production system, and runs an important application.
> The problem generally exists, but looks like comes only when i am
> testing the transfer with big files.
> (the application does not do that)
>
> About the power:
> This PC have one 650W Chieftech PS, 1 quad core cpu, and 6 hdd.
> I have previously measured the power current on the line, and the PC
> uses only 100-120W on peak.
>
> The problem only comes on the 4 raptor hdd, and this drive only uses
> each 6W. (from the documentation).
>
> It is hard to try separate PS or something hw solution.
> Additionally, generally i think it is not power issue, i am 90% sure.

Don't be too sure. Power problems seem pretty common. We (or rather I)
often suggest ruling out power problems first, and an unexpectedly high
portion of weird problems turn out to be caused by power. In most of
those cases, the wattage or brand printed on the PSU didn't mean much.

> Are you sure this can not be software issue?
> If you say yes, i will go into the server room, and will try another ps
> anyway....

No, I'm not sure at all that it can't be a software issue. What I know is:

* FLUSH is one of the commands less likely to trigger a state machine
  or transfer logic problem. It's a command without any data. It's
  pretty difficult to get that wrong while getting the others right.

* Without ruling out power problems, debugging is really difficult, as
  power problems can manifest in unpredictable ways. Plus, ruling out
  a power problem isn't too difficult. Just hook up a separate PSU and
  connect the problematic hard drives to it.

* For some reason, we've been seeing a good portion of weird
  link-related or data corruption problems following a timeout or PHY
  event turn out to be power related. I understand the link problems, as
  high-speed serial links are highly susceptible to interference. I
  don't know why there suddenly seem to be more machines where disks
  lose data due to power instability. Maybe SATA made it cheap and easy
  to hook up more disks to a machine. Maybe those multi-lane power
  supplies just suck. I don't know.

If you can't hook up a separate PSU, can you please run "smartctl -a
/dev/sdX" right after boot and again after the phy error occurs and
report the results?
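
For comparing the two smartctl reports, a small script can save some squinting. The following is only a sketch, not part of the original request: it assumes the usual attribute table layout printed by "smartctl -a" (ID#, ATTRIBUTE_NAME, FLAG, ..., RAW_VALUE), and the function names and sample rows are made up for illustration. One plausible thing to look for, and this is an assumption rather than something stated above, is a power-cycle style counter that increases even though the machine was never rebooted.

```python
import re

# One row of the SMART attribute table as printed by `smartctl -a`:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\S+)"
    r"\s+\S+\s+\S+\s+\S+\s+(.*)$"
)


def parse_attributes(text):
    """Return {attribute_name: raw_value_string} from `smartctl -a` output."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = m.group(6).strip()
    return attrs


def changed_attributes(before, after):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    old = parse_attributes(before)
    new = parse_attributes(after)
    return {k: (old.get(k), new[k]) for k in new if old.get(k) != new[k]}


# Hypothetical sample rows, invented for illustration only (not real drive data).
HEADER = (
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE"
    "      UPDATED  WHEN_FAILED RAW_VALUE\n"
)
BEFORE = HEADER + (
    "  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4012\n"
    " 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31\n"
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0\n"
)
AFTER = HEADER + (
    "  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4013\n"
    " 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       32\n"
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0\n"
)

for name, (old, new) in sorted(changed_attributes(BEFORE, AFTER).items()):
    print(f"{name}: {old} -> {new}")
```

In practice the two strings would come from files, e.g. the output of "smartctl -a /dev/sdX" saved right after boot and again after the PHY error. The full reports are still the thing to post to the list; the diff just points at where to look first.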
--
tejun
Thread overview: 5+ messages
2008-07-14 21:48 possible data corruption on ICH8 or WD raptor Janos Haar
2008-08-01 4:29 ` Tejun Heo
2008-08-01 10:40 ` Janos Haar
2008-08-01 15:36 ` Tejun Heo [this message]
2008-08-01 22:40 ` Alan Cox