From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
Date: Mon, 9 May 2016 19:18:59 +0000 (UTC)
Message-ID: <pan$5176$b51afa6d$75bca86d$7e2b8169@cox.net>
In-Reply-To: 4aa3dda7-70d6-5dcf-2fa7-4f2b509e4a1e@gmail.com

Austin S. Hemmelgarn posted on Mon, 09 May 2016 14:21:57 -0400 as
excerpted:

> This practice evolved out of the fact that the only bad RAM I've ever
> dealt with either completely failed to POST (which can have all kinds of
> interesting symptoms if it's just one module, some MBs refuse to boot,
> some report the error, others just disable the module and act like
> nothing happened), or passed all the memory testing tools I threw at it
> (memtest86, memtest86+, memtester, concurrent memtest86 invocations from
> Xen domains, inventive acrobatics with tmpfs and FIO, etc), but failed
> under heavy concurrent random access, which can be reliably produced by
> running a bunch of big software builds at the same time with the CPU
> insanely over-committed.

My (likely much more limited) experience matches yours.

Tho FWIW, in my case I did find that one of the more common memory 
failure indicators was bz2-ed tarball decompression, where the tarball 
would fail its decompression checksum safety checks.  However, that most 
reliably happened in the context of a heavily loaded system doing other 
package builds in parallel with the package tarball extraction that failed.
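As a rough illustration of using decompression checksums as a memory-fault canary, here's a minimal Python sketch, assuming you supply the load yourself (say, by running it during a big parallel build).  It compresses a known payload once, decompresses it repeatedly from several threads, and counts runs where bz2 rejects the stream (bzip2 carries its own CRCs) or where the output's SHA-256 doesn't match the original.  The helper names (make_payload, check_once, run_canary) are hypothetical; on healthy hardware the expected failure count is zero.

```python
import bz2
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def make_payload(size: int = 1 << 20, seed: bytes = b"canary") -> bytes:
    """Deterministic, partly compressible test data (hypothetical helper)."""
    chunks = []
    total = 0
    counter = 0
    while total < size:
        digest = hashlib.sha256(seed + counter.to_bytes(8, "little")).digest()
        chunk = digest * 4  # repeat each digest so bz2 has something to squeeze
        chunks.append(chunk)
        total += len(chunk)
        counter += 1
    return b"".join(chunks)[:size]


def check_once(compressed: bytes, expected_digest: str) -> bool:
    """Decompress once; False if bz2 rejects the stream or the output differs."""
    try:
        data = bz2.decompress(compressed)
    except (OSError, ValueError):
        # OSError: corrupt stream (e.g. a block CRC mismatch);
        # ValueError: stream ended before its end-of-stream marker.
        return False
    return hashlib.sha256(data).hexdigest() == expected_digest


def run_canary(rounds: int = 8, workers: int = 4, size: int = 1 << 20) -> int:
    """Return how many of `rounds` decompressions failed verification."""
    payload = make_payload(size)
    expected = hashlib.sha256(payload).hexdigest()
    compressed = bz2.compress(payload)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check_once, [compressed] * rounds, [expected] * rounds))
    return results.count(False)


if __name__ == "__main__":
    n = os.cpu_count() or 2
    print(f"{run_canary(rounds=4 * n, workers=n)} failed decompressions")
```

A handful of Python threads generates far less memory traffic than a real multi-package parallel build, so a zero count here is weak evidence at best; the failures described above only showed up reliably under heavy load.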

In my case, I even had ECC RAM, but it was apparently just slightly out 
of spec for its labeled and internally configured memory speeds (PC3200 
DDR1 at the time), at least on my hardware.  Once I got a BIOS update 
that let me, I slightly downclocked the memory (to PC3000, IIRC), and it 
was absolutely solid, no more errors, even with tightened-up wait-state
timings.  Later I upgraded RAM, and the new RAM worked just fine at the 
same PC3200 speeds that were a problem for the older RAM.
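To put "slightly" in numbers: a DDR module's PC rating is its nominal peak transfer rate in MB/s over a 64-bit (8-byte) bus, so dividing by 8 gives the transfer rate in MT/s, and the I/O clock is half that because DDR transfers on both clock edges.  Using the nominal figures implied by the ratings, and taking the recalled PC3000 at face value, the downclock works out to roughly 6%:

```latex
\begin{aligned}
\text{PC3200:}\quad & 3200\ \text{MB/s} \div 8\ \text{B} = 400\ \text{MT/s} \;(200\ \text{MHz clock}) \\
\text{PC3000:}\quad & 3000\ \text{MB/s} \div 8\ \text{B} = 375\ \text{MT/s} \;(187.5\ \text{MHz clock}) \\
\text{reduction:}\quad & 1 - \tfrac{375}{400} = 0.0625 \approx 6\%
\end{aligned}
```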

The problem was apparently that while the RAM cells that memtest checks
were fine, it was testing in an otherwise calm environment (not much
choice, since you can only boot directly to the test and can't do
anything else at the same time).  It was missing all the other activity
of a hectic multi-package parallel build, and that activity is what
apparently triggered the occasional edge case that corrupted things.
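The idea of checking memory while other work is going on, rather than in a quiet boot-only environment, can be sketched in a few lines.  This is a toy Python illustration under stated assumptions, not a substitute for memtester or memtest86+: several threads each own a slice of a shared buffer, write an address-dependent pattern in random order, then read it back in a different random order and count mismatches.  The pattern function and all names here are hypothetical.

```python
import random
import struct
import threading

WORD = struct.Struct("<Q")  # one unsigned 64-bit little-endian word
MASK64 = (1 << 64) - 1


def expected(index: int, round_no: int) -> int:
    """Hypothetical address- and round-dependent test pattern."""
    # Mix the word index with the round number so a stale value from an
    # earlier round, or a write that landed at the wrong address, mismatches.
    return ((index * 0x9E3779B97F4A7C15) ^ (round_no * 0xD1B54A32D192ED03)) & MASK64


def worker(buf: bytearray, start: int, count: int, rounds: int,
           seed: int, errors: list) -> None:
    """Fill and verify words [start, start + count) of buf in random order."""
    rng = random.Random(seed)
    view = memoryview(buf)
    indices = list(range(start, start + count))
    bad = 0
    for r in range(rounds):
        rng.shuffle(indices)  # random write order
        for i in indices:
            WORD.pack_into(view, i * WORD.size, expected(i, r))
        rng.shuffle(indices)  # a different random read-back order
        for i in indices:
            if WORD.unpack_from(view, i * WORD.size)[0] != expected(i, r):
                bad += 1
    errors.append(bad)  # list.append is safe to call from several threads


def stress(words: int = 1 << 15, threads: int = 4, rounds: int = 3) -> int:
    """Return the total number of mismatched words (0 on healthy hardware)."""
    buf = bytearray(words * WORD.size)
    per = words // threads  # any remainder words are simply left unused
    errors: list = []
    pool = [
        threading.Thread(target=worker, args=(buf, t * per, per, rounds, t, errors))
        for t in range(threads)
    ]
    for th in pool:
        th.start()
    for th in pool:
        th.join()
    return sum(errors)


if __name__ == "__main__":
    print(f"{stress()} mismatched words")
```

Pure Python touches memory far too slowly to reproduce the kind of load described above; the point is only the structure (random order, concurrent writers, verify after write), which a real tool would implement in C and run alongside the heavy build load.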

And FWIW, I still have major respect for how well reiserfs behaved under
those conditions.  No filesystem can be expected to be 100% reliable when
it's getting corrupted data due to bad memory, but reiserfs held up
remarkably well, far better than btrfs did under similar conditions a few
years later, when the flaky hardware was the PCI and SATA bus instead.
That forced me back to reiserfs for a time, and again it continued to
work like a champ, even under hardware conditions that were absolutely
unworkable with btrfs.

I also had a heat-related head crash on a disk (AC went out, in Phoenix,
in the summer, 40+ C outside, 50+ C inside, who knows what the disks
were!?).  The partitions that were mounted, and likely had the head
flying over them, were damaged beyond (easy) recovery, but other
partitions on the same disk were absolutely fine, and I actually
continued to run off them for a few months after cooling everything back
down.

That sort of experience is the reason I still use reiserfs on spinning
rust, including my second- and third-level backups, even while I'm
running btrfs on the ssds for the working system and primary backup.
It's also the reason I continue to use a partitioned system with multiple
independent filesystems (btrfs raid1 on a pair of ssds for most of the
working btrfs filesystems and primary backups, plus an individual-ssd
btrfs in dup mode for /boot, with its backup on the other ssd), instead
of putting my data eggs all in the same filesystem basket with
subvolumes, where if the filesystem goes out, all the subvolumes go with
it!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

