linux-btrfs.vger.kernel.org archive mirror
From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: BTRFS free space handling still needs more work: Hangs again
Date: Sat, 27 Dec 2014 04:26:10 +0000 (UTC)
Message-ID: <pan$248af$ca4818d4$b1ce7df3$527f91c1@cox.net>
In-Reply-To: 6794875.ElpGA7E3Vr@merkaba

Martin Steigerwald posted on Fri, 26 Dec 2014 16:59:09 +0100 as excerpted:

> Dec 26 16:17:57 merkaba kernel: [ 8102.029438] mce:
> [Hardware Error]: Machine check events logged
> Dec 26 16:20:27 merkaba kernel: [ 8252.054015] mce:
> [Hardware Error]: Machine check events logged

Have you checked these MCEs?  What are they?

MCEs are hardware errors.  These are *NOT* kernel errors, tho of course 
they may /trigger/ kernel errors.  The reported event codes can be looked 
up and translated into English. 
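
As a rough illustration of the first step, here is a minimal Python sketch that pulls the MCE notices out of a kernel log and shows how far apart they are.  The sample text is just the two quoted lines above, unwrapped; the regex and the name mce_uptimes are my own assumptions for illustration, not part of any tool.  Note that this only locates the notices.  Translating the event codes themselves is a job for a decoder such as mcelog.

```python
import re

# Matches kernel log lines such as:
#   Dec 26 16:17:57 merkaba kernel: [ 8102.029438] mce: ...
# The pattern is an assumption based on the two lines quoted above.
MCE_LINE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] mce:")


def mce_uptimes(log_text):
    """Return the kernel uptime (in seconds) of each MCE notice in log_text."""
    return [float(m.group(1)) for m in MCE_LINE.finditer(log_text)]


sample = """\
Dec 26 16:17:57 merkaba kernel: [ 8102.029438] mce: [Hardware Error]: Machine check events logged
Dec 26 16:20:27 merkaba kernel: [ 8252.054015] mce: [Hardware Error]: Machine check events logged
"""

times = mce_uptimes(sample)
print(len(times), "MCE notices")
for earlier, later in zip(times, times[1:]):
    print("gap between notices: %.1f s" % (later - earlier))
```

Lining those timestamps up against the stalls would at least show whether the two tend to coincide.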

Between shortly after the first one and a bit before the second one here, 
you had hardware thermal throttling, so the CPUs, on-chip cache, and 
possibly the memory were working pretty hard.

FWIW, I had an AMD machine that would MCE with memory related errors some 
time (about a decade) ago.  I had ECC RAM, but it was cheap and 
apparently not quite up to the speeds it was actually rated for.  MemTest 
checked out the memory fine, but under high stress especially, it would 
sometimes have bus/transit related corruption, which would sometimes (not 
always) trigger those MCEs.

Eventually a BIOS update gave me the ability to turn down the memory 
timings, and turning them down just one notch made everything rock-stable 
-- I was even able to decrease some of the wait-states to get a bit of 
the memory speed back.  It just so happened that it was borderline stable 
at the rated clock, and turning the memory clock down just one notch was 
all it took.  Later, I upgraded the RAM (the bad RAM was two half-gig 
sticks, back when they were $100+ apiece; I upgraded to four 2-gig 
sticks), and the new RAM didn't have the problem at all -- the bad RAM 
sticks simply weren't /quite/ stable at the rated speed, that was it.

I run gentoo so of course do a lot of building from sources, and 
interestingly enough, the thing that turned out to detect the corruption 
the most often was bzip2 compression checksums -- I'd get errors on 
source decompression prior to the build rather more often than actual 
build failures, altho those would happen occasionally as well, while 
redoing it would work fine -- checksums passed, and I never had a build 
that actually finished then fail to run due to a bad build.

Now here's the thing.  Of course a decade ago was well before I was 
running btrfs (FWIW I was running reiserfs at the time, and it seemed 
pretty resilient given the bad RAM I had), so it was the bzip2 checksums 
it failed on.

But guess what btrfs uses for file integrity: checksums.  If your MCEs 
are either like my memory-related MCEs were, or are similar CPU-cache or 
CPU related but still something that would affect checksumming, btrfs may 
well be fighting bad checksums due to the same issues, and that would of 
course throw all sorts of wrenches into things.  Another thing I've seen 
reported as triggering MCEs is bad power (in that case it was a UPS that 
was either underpowered or going bad; once it was out of the picture, the 
MCEs and problems stopped).
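
To make the checksum point concrete, here is a small Python sketch of how a single flipped bit makes a previously recorded checksum stop matching.  It is only an illustration under stated assumptions: it uses zlib.crc32 (plain CRC-32), whereas btrfs by default uses CRC-32C, and flip_bit is a hypothetical helper standing in for a memory or bus error.

```python
import zlib


def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


block = bytes(range(256)) * 16      # 4 KiB of sample data
stored_sum = zlib.crc32(block)      # checksum recorded at write time

damaged = flip_bit(block, 12345)    # simulate a one-bit hardware error
if zlib.crc32(damaged) != stored_sum:
    print("checksum mismatch: corruption detected")
```

The same mismatch shows up whether the bit flipped on its way to disk or in memory while the checksum was being verified, which is why flaky hardware can look like filesystem trouble.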

Now I think you're having other btrfs issues as well, some of which are 
likely legit bugs.  However, your MCEs certainly aren't helping things, 
and I'd definitely recommend checking up on them to see what's actually 
happening to your hardware.  It may well be that without whatever 
hardware issues are triggering those MCEs, you'd end up with fewer btrfs 
problems as well.

Or maybe not, but it's something to look into, because right now, 
regardless of whether they're making things worse physically, they're at 
minimum obscuring a troubleshooting picture that would be clearer without 
them.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

