From: Tejun Heo <tj@kernel.org>
To: Pierre Ossman <pierre-list@ossman.eu>
Cc: linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Strange read data corruption on ext4/LVM/md
Date: Thu, 20 May 2010 11:42:29 +0200
Message-ID: <4BF50405.4070706@kernel.org>
In-Reply-To: <20100520112945.61bf9705@mjolnir.ossman.eu>
Hello,
On 05/20/2010 11:29 AM, Pierre Ossman wrote:
> The machine is rather crammed right now, with one controller in each of
> the three available pci-e slots (5 disks). I am running continuous tests
> on the disks right now though to see if the problem is on all disks or
> just some. If just one slot is causing problems then we should see some
> results there.
I see.
> When you say FIS corruption, do you mean corruption in the sense of
Oh, not FIS. FIS is the name for SATA packets; I meant the PCI-e
packets. What were they called... yep, TLPs.
> randomly flipped bits? I don't know if you saw the first couple of
> mails (before linux-ide was added), but the problem is data being moved
> around, not just randomly changed.
I only saw your previous posting. TLP corruption can happen during
the command setup phase, and bit flipping in the command address part
is definitely possible, so reads and writes can be directed to the
wrong places in both memory and disk. I don't know whether this would
fit your symptom, though.
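To make the address case concrete, here is a minimal sketch in Python.
It is not driver code: flip_bit, displacement_bytes and SECTOR_SIZE are
names made up for illustration, and the 512-byte sector size is an
assumption. A single flipped bit in a block address leaves the data
intact but sends the transfer to a different offset, which would look
like data being moved rather than randomly changed:

```python
SECTOR_SIZE = 512  # assumed logical sector size, for illustration only


def flip_bit(lba: int, bit: int) -> int:
    """Return lba with one bit inverted, as a corrupted address field might be."""
    return lba ^ (1 << bit)


def displacement_bytes(lba: int, bit: int) -> int:
    """Signed byte distance between the intended target and the actual one."""
    return (flip_bit(lba, bit) - lba) * SECTOR_SIZE


if __name__ == "__main__":
    intended = 0x12340  # hypothetical block address
    for bit in (0, 3, 11):
        # Show where the transfer would land for each flipped bit position.
        print(bit, hex(flip_bit(intended, bit)), displacement_bytes(intended, bit))
```

The higher the flipped bit, the further the data lands from where it
was meant to go.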
> Another note is that the problem seems to worsen under load. I'm
> running the dd thing in the background, which seems to make read errors
> more common on my test files on the filesystem level.
It would be great if you could try a different controller in a similar
setup. But please keep trying to narrow down the problem and, if
possible, remove the filesystem from the stack and test against the
block device directly.
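One possible way to test against the block device directly is to read
the same region several times and compare checksums. The Python sketch
below is only an illustration under assumptions: region_digest and
find_unstable_reads are hypothetical helper names, the device path and
sizes are up to you, and the page cache hint is best effort, so
dropping caches between passes (or reading with O_DIRECT) may still be
needed to make sure each pass really hits the disk.

```python
import hashlib
import os


def region_digest(path, offset, length, chunk=1 << 20):
    """SHA-256 of `length` bytes starting at `offset` in a device or file."""
    h = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        # Best-effort hint to discard cached pages so the read goes to the disk.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
        os.lseek(fd, offset, os.SEEK_SET)
        remaining = length
        while remaining > 0:
            buf = os.read(fd, min(chunk, remaining))
            if not buf:  # hit end of device/file early
                break
            h.update(buf)
            remaining -= len(buf)
    finally:
        os.close(fd)
    return h.hexdigest()


def find_unstable_reads(path, offset, length, passes=3):
    """Return the pass numbers whose digest differs from the first read."""
    reference = region_digest(path, offset, length)
    return [n for n in range(1, passes)
            if region_digest(path, offset, length) != reference]
```

For example, find_unstable_reads("/dev/sdX", 0, 1 << 30) would re-read
the first GiB of a placeholder device /dev/sdX; an empty list means
every pass matched the first one. This only reads, so it should be safe
to run on a disk that is in use, as long as nothing writes to that
region in the meantime.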
Thanks.
--
tejun