From: Tomasz Kusmierz <tom.kusmierz@gmail.com>
To: Chris Mason <chris.mason@fusionio.com>,
	Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
	Chris Mason <clmason@fusionio.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs for files > 10GB = random spontaneous CRC failure.
Date: Tue, 05 Feb 2013 14:10:43 +0000
Message-ID: <511112E3.1020309@gmail.com>
In-Reply-To: <20130205124923.GA20797@shiny>

On 05/02/13 12:49, Chris Mason wrote:
> On Tue, Feb 05, 2013 at 03:16:34AM -0700, Tomasz Kusmierz wrote:
>> On 16/01/13 09:21, Bernd Schubert wrote:
>>> On 01/16/2013 12:32 AM, Tom Kusmierz wrote:
>>>
>>>> p.s. bizarre that when I "fill" an ext4 partition with test data everything
>>>> checks out OK (crc over all files), but with Chris' tool it gets
>>>> corrupted - both with the crappy Adaptec pcie controller and with the
>>>> motherboard's built-in one. Also, since the course of history has proven
>>>> that my testing facilities are crap - any suggestions on how I can test
>>>> ram, cpu & controller would be appreciated.
>>> Similar issues were the reason we wrote ql-fstest at q-leap. Maybe
>>> you could try that? You can easily see the pattern of the corruption
>>> with that. But maybe Chris' stress.sh also provides it.
>>> Anyway, yesterday I added support for specifying min and max file size,
>>> as before it only used 1MiB to 1GiB sizes... It's a bit cryptic with
>>> bits, though; I will improve that later.
>>> https://bitbucket.org/aakef/ql-fstest/downloads
>>>
>>>
>>> Cheers,
>>> Bernd
>>>
>>>
>>> PS: But see my other thread: using ql-fstest, yesterday I entirely
>>> broke a btrfs test file system, resulting in kernel panics.
>> Hi,
>>
>> It's been a while, but I think I should provide a "definite answer", or
>> simply say what the cause of the whole problem was:
>>
>> It was a printer!
>>
>> Long story short, I was going nuts trying to diagnose which bit of my
>> server was going bad, and I was effectively down to blaming an interface
>> card that connects the hot-swappable disks to the mobo / pcie controllers.
>> When I got back from my holiday I sat in front of the server and decided
>> to go with ql-fstest, which in a very nice way reports errors with a very
>> low lag (~2 minutes) after they occur. At this point my printer
>> kicked in with a "self clean" and an error showed up after ~2 minutes,
>> so I restarted the printer, and while it was going through its own POST
>> with self clean another error showed up. The issue turned out to be
>> that I was using one of those fantastic pci 4 port ethernet cards and the
>> printer was connected directly to it - after moving it and everything else
>> to a switch, all the problems went away. At the moment I've been running
>> the server for 2 weeks without any corruption, any random kernel btrfs
>> crashes etc.
> Wow, I've never heard that one before.  You might want to try a
> different 4 port card and/or report it to the driver maintainer.  That
> shouldn't happen ;)
>
> ql-fstest looks neat, I'll check it out (thanks Bernd).
>   
> -chris
>
I forgot to mention that the server sits on a UPS, while the printer is
connected directly to the mains. Thinking about it, that creates a ground
shift effect, since nothing on a cheap PSU has a "real" ground. But anyway,
this is not the fault of the 4 port card: I tried moving the printer to a
cheap ne2000 and to the motherboard's integrated port, and the effect was the
same. Diagnostics were also veeery problematic, because besides the
corruption on the hdd, memtest was reporting corruption in ram, but only on
very rare occasions, and a cpu test was reporting corruption about once a
day. I replaced nearly everything in this server - including the psu (with a
1400W one from my dev rig) - and it made NO difference. I should also mention
that this printer is a colour laser printer with 4 drums to clean, so I
would assume that it produces enough static electricity to power a small
cattle.

ps. it shouldn't be a driver issue, since the errors in ram were 1 - 4 bits
in size and located in the same 32 bit word - hence I think a single
transfer had to be corrupted, rather than a whole eth packet being shoved
into random memory.
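
The reasoning in the ps. can be illustrated with a rough sketch: compare a
known-good buffer against what was read back, and list which bits differ in
each 32 bit word. Damage confined to a few bits of one word points at a
single corrupted transfer, whereas a stray packet would be expected to
overwrite many consecutive words. The function names, the little-endian
word layout and the 4 bit threshold are assumptions made for illustration,
not part of memtest or ql-fstest.

```python
def flipped_bits_by_word(expected: bytes, actual: bytes, word_size: int = 4):
    """Map word index -> list of bit positions that differ in that word."""
    if len(expected) != len(actual):
        raise ValueError("buffers must have the same length")
    report = {}
    for offset in range(0, len(expected), word_size):
        # Interpret each word as a little-endian integer (assumed layout).
        want = int.from_bytes(expected[offset:offset + word_size], "little")
        got = int.from_bytes(actual[offset:offset + word_size], "little")
        diff = want ^ got  # set bits mark the positions that were flipped
        if diff:
            report[offset // word_size] = [
                bit for bit in range(word_size * 8) if (diff >> bit) & 1
            ]
    return report


def looks_like_single_transfer(report, max_bits: int = 4) -> bool:
    """Heuristic: exactly one damaged word, with at most max_bits flipped."""
    return len(report) == 1 and all(
        len(bits) <= max_bits for bits in report.values()
    )


# Hypothetical example: a 64 byte pattern with two bits flipped in byte 20.
expected = bytes(64)
actual = bytearray(expected)
actual[20] ^= 0b00000101  # byte 20 is the lowest byte of word 5
report = flipped_bits_by_word(expected, bytes(actual))
print(report)                              # {5: [0, 2]}
print(looks_like_single_transfer(report))  # True
```

This is only a heuristic for reading a corruption report; it does not say
anything about where on the path (bus, ram, cpu) the bits were flipped.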
