linux-btrfs.vger.kernel.org archive mirror
From: Tomasz Kusmierz <tom.kusmierz@gmail.com>
To: Chris Mason <chris.mason@fusionio.com>,
	Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
	Chris Mason <clmason@fusionio.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs for files > 10GB = random spontaneous CRC failure.
Date: Tue, 05 Feb 2013 14:10:43 +0000
Message-ID: <511112E3.1020309@gmail.com>
In-Reply-To: <20130205124923.GA20797@shiny>

On 05/02/13 12:49, Chris Mason wrote:
> On Tue, Feb 05, 2013 at 03:16:34AM -0700, Tomasz Kusmierz wrote:
>> On 16/01/13 09:21, Bernd Schubert wrote:
>>> On 01/16/2013 12:32 AM, Tom Kusmierz wrote:
>>>
>>>> p.s. bizarre that when I "fill" an ext4 partition with test data everything
>>>> checks out OK (crc over all files), but with Chris' tool it gets
>>>> corrupted - for both the crappy Adaptec pcie controller and the motherboard's
>>>> built-in one. Also, since the course of history has proven that my testing
>>>> facilities are crap - any suggestions on how I can test ram, cpu &
>>>> controller would be appreciated.
>>> Similar issues had been the reason we wrote ql-fstest at q-leap. Maybe
>>> you could try that? You can easily see the pattern of the corruption
>>> with that. But maybe Chris' stress.sh also provides it.
>>> Anyway, I yesterday added support to specify min and max file size, as
>>> it before only used 1MiB to 1GiB sizes... It's a bit cryptic with
>>> bits, though, I will improve that later.
>>> https://bitbucket.org/aakef/ql-fstest/downloads
>>>
>>>
>>> Cheers,
>>> Bernd
>>>
>>>
>>> PS: But see my other thread, using ql-fstest I yesterday entirely
>>> broke a btrfs test file system resulting in kernel panics.
>> Hi,
>>
>> It's been a while, but I think I should provide a "definite answer", or
>> simply say what the cause of the whole problem was:
>>
>> It was a printer!
>>
>> Long story short, I was going nuts trying to diagnose which bit of my
>> server was going bad, and I was effectively down to blaming an interface
>> card that connects the hotswappable disks to the mobo / pcie controllers. When
>> I got back from my holiday I sat down in front of the server and decided to
>> go with ql-fstest, which in a very nice way reports errors with a very
>> low lag (~2 minutes) after they occur. At this point my printer
>> kicked in with a "self clean" and an error showed up ~two minutes later,
>> so I restarted the printer, and while it was going through its own POST
>> with self clean another error showed up. The issue turned out to be
>> that I was using one of those fantastic pci 4 port ethernet cards and the
>> printer was connected directly to it. After moving it and everything else to a
>> switch, all the problems went away. At the moment I've been running the
>> server for 2 weeks without any corruption, random kernel btrfs
>> crashes etc.
> Wow, I've never heard that one before.  You might want to try a
> different 4 port card and/or report it to the driver maintainer.  That
> shouldn't happen ;)
>
> ql-fstest looks neat, I'll check it out (thanks Bernd).
>   
> -chris
>
I forgot to mention that the server sits on a UPS, while the printer is
connected directly to mains. Thinking about it, that creates a ground shift
effect, since nothing on a cheap PSU has a "real" ground. Anyway, this is
not a fault of the 4 port card: I tried moving the printer connection to a
cheap ne2000 and to the motherboard's integrated port, and the effect was
the same. Diagnosis was also veeery problematic because, besides the
corruption on hdd, memtest was reporting corruption in ram, but only on
very rare occasions, and a cpu test was reporting corruption about once a
day. I replaced nearly everything in this server, including the psu (with
the 1400W one from my dev rig), and it made NO difference. I should also
mention that this printer is a colour laser printer with 4 drums to clean,
so I would assume it produces enough static electricity to power a small
cattle.

ps. it shouldn't be a driver issue, since the errors in ram were 1 - 4 bits
in size and located within the same 32 bit word. Hence I think a single
transfer had to be corrupted, rather than a whole eth packet being shoved
into random memory.
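
To make the reasoning in the ps. concrete, here is a minimal sketch in Python of how one might compare an expected buffer against what was actually read back and count flipped bits per 32 bit word. It is not taken from memtest or ql-fstest; the function names `flipped_bits_per_word` and `looks_like_single_transfer_error`, and the 4 bit threshold, are made up for illustration.

```python
def flipped_bits_per_word(expected: bytes, observed: bytes, word_size: int = 4):
    """Return {word_index: flipped_bit_count} for every word that differs.

    word_size is in bytes; 4 bytes = one 32 bit word (assumption: the
    buffers are compared as little-endian words, which does not affect
    the bit count).
    """
    if len(expected) != len(observed):
        raise ValueError("buffers must have the same length")
    flips = {}
    for offset in range(0, len(expected), word_size):
        e = int.from_bytes(expected[offset:offset + word_size], "little")
        o = int.from_bytes(observed[offset:offset + word_size], "little")
        diff = e ^ o  # set bits mark positions that changed
        if diff:
            flips[offset // word_size] = bin(diff).count("1")
    return flips


def looks_like_single_transfer_error(flips: dict, max_bits: int = 4) -> bool:
    """Hypothetical heuristic: exactly one damaged word with few flipped bits."""
    return len(flips) == 1 and all(n <= max_bits for n in flips.values())


if __name__ == "__main__":
    expected = bytes(16)              # four zeroed 32 bit words
    observed = bytearray(expected)
    observed[5] ^= 0b00000101         # flip two bits inside word 1
    flips = flipped_bits_per_word(expected, bytes(observed))
    print(flips, looks_like_single_transfer_error(flips))
```

A stray packet written into the wrong place would be expected to damage many consecutive words with many bits each, so the heuristic would come back false for it; a couple of flipped bits confined to one word is the pattern described above.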
