From: Tomasz Kusmierz <tom.kusmierz@gmail.com>
To: Chris Mason <chris.mason@fusionio.com>,
Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
Chris Mason <clmason@fusionio.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs for files > 10GB = random spontaneous CRC failure.
Date: Tue, 05 Feb 2013 14:10:43 +0000
Message-ID: <511112E3.1020309@gmail.com>
In-Reply-To: <20130205124923.GA20797@shiny>
On 05/02/13 12:49, Chris Mason wrote:
> On Tue, Feb 05, 2013 at 03:16:34AM -0700, Tomasz Kusmierz wrote:
>> On 16/01/13 09:21, Bernd Schubert wrote:
>>> On 01/16/2013 12:32 AM, Tom Kusmierz wrote:
>>>
>>>> P.S. Bizarrely, when I "fill" an ext4 partition with test data, everything
>>>> checks out OK (CRC over all files), but with Chris's tool it gets
>>>> corrupted, both on the crappy Adaptec PCIe controller and on the
>>>> motherboard's built-in one. Also, since history has proven that my
>>>> testing facilities are crap, any suggestions on how I can test the RAM,
>>>> CPU and controller would be appreciated.
>>> Similar issues were the reason we wrote ql-fstest at q-leap. Maybe
>>> you could try that? You can easily see the pattern of the corruption
>>> with that. But maybe Chris' stress.sh also provides it.
>>> Anyway, yesterday I added support for specifying the min and max file
>>> size, as before it only used sizes from 1MiB to 1GiB... It's a bit
>>> cryptic with bits, though; I will improve that later.
>>> https://bitbucket.org/aakef/ql-fstest/downloads
>>>
>>>
>>> Cheers,
>>> Bernd
>>>
>>>
>>> PS: But see my other thread: using ql-fstest, yesterday I entirely
>>> broke a btrfs test file system, resulting in kernel panics.
>> Hi,
>>
>> It's been a while, but I think I should provide a "definite answer", or
>> simply say what the cause of the whole problem was:
>>
>> It was a printer!
>>
>> Long story short, I was going nuts trying to diagnose which bit of my
>> server was going bad, and I was effectively down to blaming an interface
>> card that connects the hot-swappable disks to the mobo / PCIe controllers.
>> When I got back from my holiday I sat in front of the server and decided
>> to go with ql-fstest, which in a very nice way reports errors with very
>> low lag (~2 minutes) after they occur. At this point my printer kicked
>> in with a "self clean" and an error showed up after ~2 minutes, so I
>> restarted the printer, and while it was going through its own POST with
>> self clean another error showed up. The issue turned out to be that I was
>> using one of those fantastic PCI 4-port Ethernet cards and the printer
>> was connected directly to it. After moving it and everything else to a
>> switch, all the problems went away. At the moment I've been running the
>> server for 2 weeks without any corruption, random kernel btrfs crashes,
>> etc.
> Wow, I've never heard that one before. You might want to try a
> different 4 port card and/or report it to the driver maintainer. That
> shouldn't happen ;)
>
> ql-fstest looks neat, I'll check it out (thanks Bernd).
>
> -chris
>
I forgot to mention that the server sits on a UPS, and the printer is
directly connected to the mains. Thinking about it, that creates a ground
shift effect, since nothing on a cheap PSU has a "real" ground. Anyway,
this is not the fault of the 4-port card: I tried moving the printer to a
cheap ne2000 and to the motherboard's integrated port, and the effect was
the same. Diagnosis was also veeery problematic because, besides the
corruption on the HDD, memtest was reporting corruption in RAM, but only on
very rare occasions, and a CPU test was reporting corruption about once a
day. I replaced nearly everything in this server, including the PSU (with
the 1400W one from my dev rig), and it made NO difference. I should also
mention that this printer is a colour laser printer with 4 drums to
clean, so I would assume it produces enough static electricity to power a
small cattle.
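
For anyone who wants to repeat the "fill the disk, then CRC every file"
style of check mentioned earlier in the thread, here is a minimal sketch.
It is not ql-fstest or Chris's stress.sh; the function names, the 1 MiB
chunk size and the seeded pseudo-random data are all my own assumptions.
Note that re-reading right after writing is normally served from the page
cache, so to exercise the disk the data set has to be larger than RAM or
the cache has to be dropped first.

```python
import os
import random
import zlib

CHUNK = 1 << 20  # assumed I/O size: read and write in 1 MiB pieces


def write_test_file(path, size, seed):
    """Write `size` pseudo-random bytes to `path`; return the CRC32 of the data."""
    rng = random.Random(seed)
    crc = 0
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            block = rng.getrandbits(8 * n).to_bytes(n, "little")
            f.write(block)
            crc = zlib.crc32(block, crc)  # running CRC over everything written
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out of the process to the device
    return crc


def crc_of_file(path):
    """Re-read `path` from start to end and return its CRC32."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            crc = zlib.crc32(block, crc)
    return crc


def verify(expected):
    """`expected` maps path -> CRC32 at write time; return the paths that changed."""
    return [p for p, crc in expected.items() if crc_of_file(p) != crc]
```

Running verify() in a loop while something noisy (say, a printer) is
switched on and off gives a rough timestamp for when corruption appears.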
P.S. It shouldn't be a driver issue, since the errors in RAM were 1-4 bits
in size and located within the same 32-bit word. Hence I think a single
transfer must have been corrupted, rather than a whole Ethernet packet
being shoved into random memory.
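
To make the "1-4 bits inside one 32-bit word" reasoning concrete, here is a
small sketch of how a corrupted buffer can be compared against the expected
data word by word. This is not what memtest does internally; the function
name and the little-endian word layout are assumptions of mine.

```python
import struct


def flipped_bits_per_word(expected, observed):
    """Compare two equal-length buffers as little-endian 32-bit words.

    Returns a list of (word_index, xor_mask, flipped_bit_count) tuples,
    one for every word that differs.
    """
    if len(expected) != len(observed) or len(expected) % 4:
        raise ValueError("buffers must have equal length, a multiple of 4")
    n = len(expected) // 4
    exp_words = struct.unpack("<%dI" % n, expected)
    obs_words = struct.unpack("<%dI" % n, observed)
    diffs = []
    for i, (e, o) in enumerate(zip(exp_words, obs_words)):
        mask = e ^ o  # set bits mark the positions that flipped
        if mask:
            diffs.append((i, mask, bin(mask).count("1")))
    return diffs
```

A handful of flipped bits confined to a single word points towards one
corrupted transfer; a stray packet written into the wrong place would be
expected to change many consecutive words instead.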