From: Chris Mason <chris.mason@fusionio.com>
To: Tom Kusmierz <tom.kusmierz@gmail.com>
Cc: Chris Mason <clmason@fusionio.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs for files > 10GB = random spontaneous CRC failure.
Date: Tue, 15 Jan 2013 18:44:11 -0500
Message-ID: <20130115234411.GA30647@shiny.int.fusionio.com>
In-Reply-To: <50F5E6FA.60803@gmail.com>

On Tue, Jan 15, 2013 at 04:32:10PM -0700, Tom Kusmierz wrote:
> Chris & all,
> 
> Sorry for taking so long to reply, but Chris's old friend "stress.sh"
> has proven that all my storage is affected by this bug, and the first
> thing to do was bring everything down before the corruption could
> spread any further. Anyway, for the subject's sake: the btrfs stress
> run failed after 2h and the ext4 stress run failed after 8h (according
> to "time ./stress.sh blablabla"), so it might be related to the fact
> that ext4 always seemed slower on my machine than btrfs.

Ok, great.  These problems are really hard to debug, and I'm glad we've
nailed it down to the lower layers.

> 
> 
> Anyway, I wanted to use this opportunity to thank Chris and everybody
> involved in btrfs development: your file system found a hidden bug in
> my setup that would otherwise have stayed there until it had corrupted
> pretty much everything. I don't even want to think how much my main
> storage got corrupted over time (ext4 over LVM over md RAID 5).
> 
> P.S. It's bizarre that when I "fill" an ext4 partition with test data
> everything checks out OK (CRC over all files), but with Chris's tool it
> gets corrupted, both on the crappy Adaptec PCIe controller and on the
> motherboard's built-in one.

One really hard part of tracking down corruptions is that our boxes have
so much RAM right now that the corruptions are often hidden by the page
cache.  My first advice is to boot with much less RAM (1G/2G) or pin down
all your RAM for testing.  A problem that triggers in 10 minutes is a
billion times easier to figure out than one that triggers in 8 hours.
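
One way to boot with less RAM is the kernel's mem= boot parameter (for
example mem=2G on the kernel command line); how to set it depends on your
bootloader and is not shown here.  The sketch below only checks, after
the reboot, that the limit took effect by reading MemTotal from
/proc/meminfo.  The function name ram_is_limited and the 2048 MiB default
are illustrative assumptions, not part of any existing tool.

```shell
#!/bin/sh
# ram_is_limited [LIMIT_MIB] [MEMINFO_FILE]
# Succeeds (exit 0) if MemTotal is at or below LIMIT_MIB (default 2048).
# MEMINFO_FILE defaults to /proc/meminfo; it is a parameter only so the
# check can be tried against a sample file.
ram_is_limited() {
    limit_mib=${1:-2048}
    meminfo=${2:-/proc/meminfo}

    # MemTotal is reported in kB, e.g. "MemTotal:  2031616 kB".
    total_kib=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")

    # No MemTotal line found: report that as a distinct error.
    [ -n "$total_kib" ] || return 2

    [ "$total_kib" -le $((limit_mib * 1024)) ]
}
```

For example, `ram_is_limited 2048 && echo "RAM limit is active"` before
starting the stress run avoids spending hours on a test that still has
the full page cache available.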

> Also, since history has proven that my testing facilities are crap,
> any suggestions on how I can test RAM, CPU & controller would be
> appreciated.

Step one is to figure out if you've got a CPU/memory problem or an IO problem.
memtest is often able to find CPU and memory problems, but if you pass
memtest I like to use gcc for extra hard testing.

If you have the RAM, make a copy of the Linux kernel tree in /dev/shm or
any ramdisk/tmpfs mount.  Then run make -j ; make clean in a loop until
your box either crashes, gcc reports an internal compiler error, or 16
hours go by.  Your loop will need to check for failed makes and stop
once you get the first failure.
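
A minimal sketch of such a loop, under a few assumptions: the helper
names stress_loop and build_cycle are made up for illustration, the tree
is assumed to be already configured (for example with make defconfig),
and -j"$(nproc)" is used as a bounded stand-in for the bare "make -j",
which places no limit on the number of parallel jobs.

```shell
#!/bin/sh
# stress_loop MAX_SECONDS CMD [ARGS...]
# Re-run CMD until it fails or MAX_SECONDS have elapsed.
# Returns 1 on the first failure, 0 if the time limit passes cleanly.
stress_loop() {
    max_seconds=$1
    shift
    start=$(date +%s)
    pass=0
    while [ $(( $(date +%s) - start )) -lt "$max_seconds" ]; do
        pass=$((pass + 1))
        if ! "$@"; then
            # Stop at the first failed make so the evidence is not lost.
            echo "pass $pass FAILED at $(date)" >&2
            return 1
        fi
    done
    echo "no failure after $pass passes"
    return 0
}

# One build cycle: full parallel build, then clean for the next pass.
build_cycle() {
    make -j"$(nproc)" && make clean
}
```

Run from inside the copied tree it would look like
`cd /dev/shm/linux && stress_loop $((16 * 3600)) build_cycle 2>&1 | tee ~/stress.log`,
where /dev/shm/linux and the log path are placeholders.  Keeping the log
outside the tmpfs mount means any "internal compiler error" message is
still there to read after a crash or reboot.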

Hopefully that will catch it.  Otherwise we need to look at the IO
stack.

-chris
