From: Oliver Schinagl <oliver+list@schinagl.nl>
To: Peter Grandi <pg@lxra2.for.sabi.co.UK>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: OT: silent data corruption reading from hard drives
Date: Thu, 16 Aug 2012 09:30:51 +0200
Message-ID: <502CA1AB.8030706@schinagl.nl>
In-Reply-To: <20524.6904.183888.747899@tree.ty.sabi.co.UK>
On 15-08-12 23:55, Peter Grandi wrote:
> [ ... ]
>
>> In my opinion, any corruption noticed in a non-ECC system is
>> most likely due to the RAM.
> That's pretty common, but many disk drive models also have bugs,
> and most hw RAID host adapters have many (terrible) bugs.
>
>> You really need to run memtest86 on your system, preferably
>> for 24 hours or more.
> Even that is not conclusive. Some "memory" errors are due to
> activity/noise spikes on the PCI/PCIe bus due to hw bugs or
> poorly electrically designed cards.
>
>>>> Hard drives write extensive ECC payloads to catch
>>>> corruptions there; SATA and SAS protocols have CRC checks on
>>>> every frame transferred;
> A warning to the masses: USB mass storage is weak as to this and
> in particular as to error recovery, and most USB chipsets
> (especially USB-drive ones, but also motherboard ones) are
> massively buggy.
>
>>>> the PCIe bus uses CRC checks on each lane, with low-level
>>>> encoding very similar to SATA. Even modern processors are
>>>> using PCIe-style encoded [ ... ]
>> [ ... ] machine handling data you really care about
> ... should have end-to-end verification, that is the data itself
> should be checksummed at least to detect corruption. For example
> by putting it into checksummed containers (even just ZIP without
> compression).
>
>> should have ECC ram.
> Oh yes, and any machine should have ECC RAM as the cost is
> really modest. Unfortunately the usual evil marketers like to
> segment artificially the market into cheap stuff without ECC and
> premium stuff with ECC, and will not put ECC into cheap stuff to
> avoid tempting business customers to buy it instead of the
> premium stuff.
While I agree that all machines should have ECC RAM (there are still
some people who think it's not worth it), the last time I checked on
Newegg I found ECC prices not that much higher. My servers both run
happily with ECC RAM.
As for data corruption, I've been there too and know that it simply
happens. Yes, I had shitty IDE drives on a shitty 'rocketraid 404'
controller, but that's no excuse to assume all data will always be
right. Maybe a few years from now we'll have some 'open cores' for
properly designed, almost bug-free hardware :)
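
For what it's worth, the "checksummed container" idea quoted above (ZIP
without compression) can be sketched in a few lines of Python. This is
only a rough illustration under my own assumptions: the helper names
store_checksummed and first_corrupt_member are made up for this example,
and the archive is kept in memory to keep the sketch short. It relies on
the fact that ZIP records a CRC-32 per member even when nothing is
compressed.

```python
import io
import zipfile


def store_checksummed(payloads):
    """Pack {name: bytes} into an uncompressed ZIP and return its bytes.

    Hypothetical helper: ZIP_STORED skips compression, but a CRC-32 is
    still recorded for every member.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in payloads.items():
            zf.writestr(name, data)
    return buf.getvalue()


def first_corrupt_member(archive_bytes):
    """Return the name of the first member whose CRC-32 fails, else None.

    Hypothetical helper: ZipFile.testzip() re-reads every member and
    compares it against the stored CRC-32.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.testzip()


if __name__ == "__main__":
    payload = b"important data " * 100
    archive = store_checksummed({"a.bin": payload})
    print(first_corrupt_member(archive))      # intact archive

    # Simulate silent corruption: flip one bit inside the stored payload.
    damaged = bytearray(archive)
    damaged[archive.find(payload) + 10] ^= 0x01
    print(first_corrupt_member(bytes(damaged)))
```

Note that CRC-32 only detects accidental corruption; it does not repair
anything, and it is no protection against deliberate tampering.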