From: "Niccolò Belli" <darkbasic@linuxsystems.it>
To: <linux-btrfs@vger.kernel.org>
Cc: Clemens Eisserer <linuxhippy@gmail.com>,
"Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
Patrik Lundquist <patrik.lundquist@gmail.com>,
Chris Murphy <lists@colorremedies.com>,
Qu Wenruo <quwenruo@cn.fujitsu.com>,
Omar Sandoval <osandov@osandov.com>
Subject: Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
Date: Mon, 09 May 2016 16:53:13 +0200
Message-ID: <52f0c710-d695-443d-b6d5-266e3db634f8@linuxsystems.it>
In-Reply-To: <799cf552-4612-56c5-b44d-59458119e2b0@gmail.com>
On domenica 8 maggio 2016 20:27:55 CEST, Patrik Lundquist wrote:
> Are you using any power management tweaks?
Yes, as stated in my very first post I use TLP with
SATA_LINKPWR_ON_BAT=max_performance, but I managed to reproduce the bug
even without TLP. Also, for the past week I've always been on AC.
On lunedì 9 maggio 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
> Memtest doesn't replicate typical usage patterns very well. My
> usual testing for RAM involves not just memtest, but also
> booting into a LiveCD (usually SystemRescueCD), pulling down a
> copy of the kernel source, and then running as many concurrent
> kernel builds as cores, each with as many make jobs as cores (so
> if you've got a quad core CPU (or a dual core with
> hyperthreading), it would be running 4 builds with -j4 passed to
> make). GCC seems to have memory usage patterns that reliably
> trigger memory errors that aren't caught by memtest, so this
> generally gives good results.
Building a kernel with 4 concurrent threads is not an issue for my system;
in fact I compile a lot and I have never had any issue.
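
As a rough sketch, the stress test Austin describes (as many concurrent
kernel builds as cores, each with -j equal to the core count) could be
scripted as below. The source and work paths, the use of "defconfig" and
the separate O= output directories are assumptions made for illustration;
they are not something specified in this thread. O= builds also expect a
clean source tree.

```python
import os
import subprocess


def plan_builds(src, work, cores):
    """Return one (config_cmd, build_cmd) pair per core.

    Each build gets its own O= output directory so the builds do not
    trample each other. 'defconfig' is an assumed configuration choice.
    """
    plans = []
    for i in range(cores):
        out = f"{work}/build-{i}"
        config_cmd = ["make", "-C", src, f"O={out}", "defconfig"]
        build_cmd = ["make", "-C", src, f"O={out}", f"-j{cores}"]
        plans.append((config_cmd, build_cmd))
    return plans


def run_builds(src, work, cores=None):
    """Configure every output directory, then run all builds concurrently."""
    cores = cores or os.cpu_count() or 1
    plans = plan_builds(src, work, cores)
    for config_cmd, _ in plans:
        subprocess.run(config_cmd, check=True)
    procs = [subprocess.Popen(build_cmd) for _, build_cmd in plans]
    # A non-zero exit code (for example a compiler crash) on otherwise
    # known-good sources is the kind of symptom this test looks for.
    return [proc.wait() for proc in procs]
```

On a quad-core machine, run_builds("/path/to/linux", "/path/to/work") would
start 4 builds with -j4 each, matching the example in the quoted text.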
On lunedì 9 maggio 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
> On a similar note, badblocks doesn't replicate filesystem like
> access patterns, it just runs sequentially through the entire
> disk. This isn't as likely to give bad results, but it's still
> important to know. In particular, try running it over a dmcrypt
> volume a couple of times (preferably with a different key each
> time, pulling keys from /dev/urandom works well for this), as
> that will result in writing different data. For what it's
> worth, when I'm doing initial testing of new disks, I always use
> ddrescue to copy /dev/zero over the whole disk, then do it twice
> through dmcrypt with different keys, copying from the disk to
> /dev/null after each pass. This gives random data on disk as a
> starting point (which is good if you're going to use dmcrypt),
> and usually triggers reallocation of any bad sectors as early as
> possible.
While trying to find a common denominator for my issue I made lots of
backups of /dev/mapper/cryptroot and restored them onto
/dev/mapper/cryptroot dozens of times (triggering a 150GB+ random data
write every time) without any issue (after restoring the backup I always
check the partition with btrfs check). So the disk doesn't seem to be the
culprit.
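
For reference, the disk burn-in Austin describes (zero the whole disk, then
write it twice through dmcrypt with a fresh random key each time, reading
the raw disk back after every pass) might be planned roughly as follows.
This is only a sketch: it builds the command list and does not run it, the
mapping name "burnin" is hypothetical, and using dd for the read-back is an
assumption. Executing these commands destroys all data on the disk. Recent
cryptsetup versions may also want an explicit cipher and key size in plain
mode.

```python
def burn_in_plan(disk, passes=2, mapping="burnin"):
    """Return the burn-in command sequence as argv lists.

    DESTRUCTIVE if executed: every pass overwrites the whole disk.
    """
    mapped = f"/dev/mapper/{mapping}"
    # Read the raw disk back so that unreadable sectors surface early.
    read_back = ["dd", f"if={disk}", "of=/dev/null", "bs=1M"]
    plan = [["ddrescue", "--force", "/dev/zero", disk], read_back]
    for _ in range(passes):
        plan += [
            # Plain dm-crypt mapping keyed from /dev/urandom: a new key on
            # every pass, so the zeros land on disk as different data.
            ["cryptsetup", "open", "--type", "plain",
             "--key-file", "/dev/urandom", disk, mapping],
            ["ddrescue", "--force", "/dev/zero", mapped],
            ["cryptsetup", "close", mapping],
            read_back,
        ]
    return plan
```

Printing the plan for a scratch disk before running anything by hand is the
intended use; the function itself never touches a device.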
On lunedì 9 maggio 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
> 1. If you have an eSATA port, try plugging your hard disk in
> there and see if things work. If that works but having the hard
> drive plugged in internally doesn't, then the issue is probably
> either that specific SATA port (in which case your chip-set is
> bad and you should get a new system), or the SATA connector
> itself (or the wiring, but that's not as likely when it's traces
> on a PCB). Normally I'd suggest just swapping cables and SATA
> ports, but that's not really possible with a laptop.
> 2. If you have access to a reasonably large flash drive, or to
> a USB to SATA adapter, try that as well, if it works on that but
> not internally (or on an eSATA port), you've probably got a bad
> SATA controller, and should get a new system.
My laptop doesn't have an eSATA port and my only external drive that is big
enough is currently used for daily backups, since I fear data loss.
On lunedì 9 maggio 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
> 3. Try things without dmcrypt. Adding extra layers makes it
> harder to determine what is actually wrong. If it works without
> dmcrypt, try using different parameters for the encryption
> (different ciphers is what I would try first). If it works
> reliably without dmcrypt, then it's either a bug in dmcrypt
> (which I don't think is very likely), or it's bad interaction
> between dmcrypt and BTRFS. If it works with some encryption
> parameters but not others, then that will help narrow down where
> the issue is.
On domenica 8 maggio 2016 01:35:16 CEST, Chris Murphy wrote:
> You're making the troubleshooting unnecessarily difficult by
> continuing to use non-default options. *shrug*
>
> Every single layer you add complicates the setup and troubleshooting.
> Of course all of it should work together, many people do. But you're
> the one having the problem so in order to demonstrate whether this is
> a software bug or hardware problem, you need to test it with the most
> basic setup possible --> btrfs on plain partitions and default mount
> options.
I will try to recap because you obviously missed my previous e-mail: I
managed to replicate the irrecoverable corruption bug even with default
options and no dmcrypt at all. It was somewhat harder to replicate with
default options, so I started to play with different combinations to see
whether anything increased the chances of getting corruption. I have the
feeling that "autodefrag" increases the chances of corruption, but I'm not
100% sure about it. Anyway, triggering a reinstall of all packages with
"pacaur -S $(pacman -Qe)" gives a high chance of irrecoverable corruption.
That command simply extracts the tarballs from the cache and overwrites the
already installed files. It doesn't write lots of data (after
reinstallation my system is still quite small, just a few GBs) but it seems
to be enough to upset the filesystem.
To avoid losing my data, every time I power on or reboot my laptop I first
boot from an external drive and run btrfs check on /dev/mapper/cryptroot.
If it's still sane I back up /dev/mapper/cryptroot to an external SSD with
dd, otherwise I restore the previous copy from the SSD onto
/dev/mapper/cryptroot.
I cannot keep up such an annoying workflow for long, so I really hope
someone will manage to track the bug down soon.
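
The check-then-backup-or-restore routine above could be automated along
these lines. This is a minimal sketch, not the exact procedure used here:
the image path on the external SSD is hypothetical, the dd options are
assumptions, and it treats a zero exit status from btrfs check as "sane",
which may not hold for every btrfs-progs version.

```python
import subprocess


def check_is_clean(device):
    """True if `btrfs check` exits with status 0 on the unmounted device."""
    return subprocess.run(["btrfs", "check", device]).returncode == 0


def dd_copy(src, dst):
    """Build a dd command that copies src to dst and flushes before exit."""
    return ["dd", f"if={src}", f"of={dst}", "bs=4M", "conv=fsync"]


def next_step(clean, device, image):
    """Back up a sane filesystem; otherwise restore the last good image."""
    return dd_copy(device, image) if clean else dd_copy(image, device)


def guard(device, image):
    """Run the check, then the backup or restore that it calls for."""
    cmd = next_step(check_is_clean(device), device, image)
    subprocess.run(cmd, check=True)
    return cmd
```

Called from the rescue system as, say,
guard("/dev/mapper/cryptroot", "/mnt/ssd/cryptroot.img"), it overwrites the
image only when the check passes, so the SSD always holds the last copy
that was known to be good.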
Thanks for your help, I really appreciate it.
Niccolò