linux-btrfs.vger.kernel.org archive mirror
From: Tom Kusmierz <tom.kusmierz@gmail.com>
To: Chris Mason <chris.mason@fusionio.com>,
	Chris Mason <clmason@fusionio.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs for files > 10GB = random spontaneous CRC failure.
Date: Tue, 15 Jan 2013 23:32:10 +0000
Message-ID: <50F5E6FA.60803@gmail.com>
In-Reply-To: <20130114163433.GD1387@shiny>

On 14/01/13 16:34, Chris Mason wrote:
> On Mon, Jan 14, 2013 at 09:32:25AM -0700, Tomasz Kusmierz wrote:
>> On 14/01/13 15:57, Chris Mason wrote:
>>> On Mon, Jan 14, 2013 at 08:22:36AM -0700, Tomasz Kusmierz wrote:
>>>> On 14/01/13 14:59, Chris Mason wrote:
>>>>> On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Since I had some free time over Christmas, I decided to conduct a few
>>>>>> tests on btrfs to see how it would cope with "real life storage" for
>>>>>> normal "gray users", and I've found that the filesystem will always mess
>>>>>> up your files that are larger than 10GB.
>>>>> Hi Tom,
>>>>>
>>>>> I'd like to nail down the test case a little better.
>>>>>
>>>>> 1) Create on one drive, fill with data
>>>>> 2) Add a second drive, convert to raid1
>>>>> 3) find corruptions?
>>>>>
>>>>> What happens if you start with two drives in raid1?  In other words, I'm
>>>>> trying to see if this is a problem with the conversion code.
>>>>>
>>>>> -chris
>>>> OK, my description might be a bit enigmatic, so to cut a long story
>>>> short, the tests are:
>>>> 1) create a default single-drive btrfs volume on a single partition ->
>>>> fill with test data -> scrub -> admire errors.
>>>> 2) create a raid1 (-d raid1 -m raid1) volume with two partitions on
>>>> separate disks, each the same size etc. -> fill with test data -> scrub ->
>>>> admire errors.
>>>> 3) create a raid10 (-d raid10 -m raid1) volume with four partitions on
>>>> separate disks, each the same size etc. -> fill with test data -> scrub ->
>>>> admire errors.
>>>>
>>>> All disks are the same age + size + model ... from two different
>>>> batches to avoid simultaneous failure.
>>> Ok, so we have two possible causes.  #1 btrfs is writing garbage to your
>>> disks.  #2 something in your kernel is corrupting your data.
>>>
>>> Since you're able to see this 100% of the time, let's assume that if #2
>>> were true, we'd be able to trigger it on other filesystems.
>>>
>>> So, I've attached an old friend, stress.sh.  Use it like this:
>>>
>>> stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>
>>>
>>> It will run in a loop with 5 parallel processes and make 5 copies of
>>> your data set into the destination.  It will run forever until there are
>>> errors.  You can use a higher process count (-n) to force more
>>> concurrency and use more ram.  It may help to pin down all but 2 or 3 GB
>>> of your memory.
>>>
>>> What I'd like you to do is find a data set and command line that make
>>> the script find errors on btrfs.  Then, try the same thing on xfs or
>>> ext4 and let it run at least twice as long.  Then report back ;)
>>>
>>> -chris
>>>
>> Chris,
>>
>> Will do, just please remember that 2TB of test data on "customer
>> grade" SATA drives will take a while to test :)
> Many thanks.  You might want to start with a smaller data set, 20GB or
> so total.
>
> -chris
>
Chris & all,

Sorry for not replying for so long, but Chris's old friend "stress.sh"
has proven that all my storage is affected by this bug, and the first
thing I did was to bring everything down before the corruption could
spread any further. Anyway, for the record: the btrfs stress run failed
after 2h and the ext4 stress run failed after 8h (according to
"time ./stress.sh blablabla"), so the difference might be related to
ext4 always having seemed slower than btrfs on my machine.
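The attached stress.sh is not reproduced in this archive. Below is a hypothetical Python sketch of the copy-and-compare loop Chris describes: N parallel copies of a source tree, a byte-for-byte comparison of each copy against the source, and repetition until a mismatch appears. The names `stress`, `copy_and_verify` and `same_bytes` are invented for illustration. It uses threads rather than processes, and it adds a `max_rounds` cap so that it can terminate. A real run would also need far more data than RAM (or dropped caches); otherwise the comparison may be served from the page cache instead of the disk.

```python
from __future__ import annotations

import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def same_bytes(a: str, b: str, chunk: int = 1 << 20) -> bool:
    """Compare two regular files byte by byte, reading 1 MiB at a time."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk), fb.read(chunk)
            if ca != cb:
                return False
            if not ca:  # both files reached EOF at the same point
                return True


def copy_and_verify(src: str, dst_root: str, worker: int) -> list[str]:
    """Copy src into dst_root/copy-<worker>; return relative paths that differ."""
    dst = os.path.join(dst_root, f"copy-{worker}")
    shutil.rmtree(dst, ignore_errors=True)  # start every round from scratch
    shutil.copytree(src, dst)
    bad = []
    for dirpath, _dirs, files in os.walk(src):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), src)
            if not same_bytes(os.path.join(src, rel), os.path.join(dst, rel)):
                bad.append(rel)
    return bad


def stress(src: str, dst_root: str, n: int = 5, max_rounds: int | None = None):
    """Run rounds of n parallel copy+verify passes until a mismatch shows up.

    Returns (rounds_run, sorted mismatching paths). With max_rounds=None it
    loops until it finds an error, similar to the behaviour described above.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        with ThreadPoolExecutor(max_workers=n) as pool:
            results = list(pool.map(lambda w: copy_and_verify(src, dst_root, w), range(n)))
        bad = sorted({path for r in results for path in r})
        if bad:
            return rounds, bad
    return rounds, []
```

For example, `stress("/data/testset", "/mnt/btrfs/stress", n=5)` (hypothetical paths) would keep copying until some copy no longer matches its source.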


Anyway, I wanted to use this opportunity to thank Chris and everybody
involved in btrfs development: your file system found a hidden bug in my
setup that would otherwise have stayed there until it had corrupted
pretty much everything. I don't even want to think about how much of my
main storage has been corrupted over time (ext4 over LVM over md RAID 5).

P.S. It's bizarre that when I "fill" an ext4 partition with test data,
everything checks out OK (CRC over all files), but with Chris's tool it
gets corrupted, both with the crappy Adaptec PCIe controller and with the
motherboard's built-in one. Also, since events have proven that my
testing facilities are crap, any suggestions on how I can test RAM, CPU
and controller would be appreciated.
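A "CRC over all files" check like the one mentioned above can be sketched as a manifest. You record a CRC-32 for every file once after filling the disk, and you re-verify later. This is a hypothetical illustration (`crc32_file`, `build_manifest` and `verify` are invented names), not the tool actually used here. One caveat, stated as an assumption about how such a check is run: re-reading files right after writing them may be answered from the Linux page cache. Verifying after a reboot, or after `echo 3 > /proc/sys/vm/drop_caches`, is more likely to read what actually reached the disk.

```python
import os
import zlib


def crc32_file(path: str, chunk: int = 1 << 20) -> int:
    """Streaming CRC-32 of a file, so multi-GB files never sit fully in RAM."""
    crc = 0
    with open(path, "rb") as f:
        while block := f.read(chunk):
            crc = zlib.crc32(block, crc)  # continue the running checksum
    return crc


def build_manifest(root: str) -> dict:
    """Map every file's path (relative to root) to its CRC-32."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = crc32_file(full)
    return manifest


def verify(root: str, manifest: dict) -> list:
    """Return sorted relative paths that are missing or whose CRC changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.exists(full) or crc32_file(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

A typical use would be to store `build_manifest(mount_point)` (for example as JSON) right after the fill, then run `verify` against it on a later day.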




Thread overview: 20+ messages
2013-01-14 11:09 btrfs for files > 10GB = random spontaneous CRC failure Tomasz Kusmierz
2013-01-14 14:59 ` Chris Mason
2013-01-14 15:22   ` Tomasz Kusmierz
2013-01-14 15:57     ` Chris Mason
2013-01-14 16:32       ` Tomasz Kusmierz
2013-01-14 16:34         ` Chris Mason
2013-01-15 16:54           ` Lars Weber
2013-01-15 23:32           ` Tom Kusmierz [this message]
2013-01-15 23:44             ` Chris Mason
2013-01-16  9:21             ` Bernd Schubert
2013-02-05 10:16               ` Tomasz Kusmierz
2013-02-05 12:49                 ` Chris Mason
2013-02-05 14:10                   ` Tomasz Kusmierz
2013-02-05 13:46                 ` Roman Mamedov
2013-02-05 14:18                   ` Tomasz Kusmierz
2013-01-14 16:20     ` Roman Mamedov
2013-01-14 16:34       ` Tomasz Kusmierz
  -- strict thread matches above, loose matches on Subject: below --
2013-01-14 11:17 Tomasz Kusmierz
2013-01-14 11:25 ` Roman Mamedov
2013-01-14 11:43   ` Tomasz Kusmierz
