From: Marc Joliet <marcec@gmx.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: Two persistent problems
Date: Wed, 26 Nov 2014 19:35:27 +0100
Message-ID: <20141126193527.5f5a09c6@marcec.fritz.box>
In-Reply-To: <54667B7A.8050704@fb.com>
On Fri, 14 Nov 2014 17:00:26 -0500,
Josef Bacik <jbacik@fb.com> wrote:
> On 11/14/2014 04:51 PM, Hugo Mills wrote:
> > Chris, Josef, anyone else who's interested,
> >
> > On IRC, I've been seeing reports of two persistent unsolved
> > problems. Neither is showing up very often, but both have turned up
> > often enough to indicate that there's something specific going on
> > worthy of investigation.
> >
> > One of them is definitely a btrfs problem. The other may be btrfs,
> > or something in the block layer, or just broken hardware; it's hard to
> > tell from where I sit.
> >
> > Problem 1: ENOSPC on balance
> >
> > This has been going on since about March this year. I can
> > reasonably certainly recall 8-10 cases, possibly a number more. When
> > running a balance, the operation fails with ENOSPC when there's plenty
> > of space remaining unallocated. This happens on full balance, filtered
> > balance, and device delete. Other than the ENOSPC on balance, the FS
> > seems to work OK. It seems to be more prevalent on filesystems
> > converted from ext*. The first few or more reports of this didn't make
> > it to bugzilla, but a few of them since then have gone in.
> >
> > Problem 2: Unexplained zeroes
> >
> > Failure to mount. Transid failure, "expected xyz, have 0". Chris
> > looked at an early one of these (for Ke, on IRC) back in September
> > (the 27th -- sadly, the public IRC logs aren't there for it, but I can
> > supply a copy of the private log). He rapidly came to the conclusion
> > that it was something bad going on with TRIM, replacing some blocks
> > with zeroes. Since then, I've seen a bunch of these coming past on
> > IRC. It seems to be a 3.17 thing. I can successfully predict the
> > presence of an SSD and -odiscard from the "have 0". I've successfully
> > persuaded several people to put this into bugzilla and capture
> > btrfs-images. btrfs recover doesn't generally seem to be helpful in
> > recovering data.
> >
> >
> > I think Josef had problem 1 in his sights, but I don't know if
> > additional images or reports are helpful at this point. For problem 2,
> > there's obviously something bad going on, but there's not much else to
> > go on -- and the inability to recover data isn't good.
> >
> > For each of these, what more information should I be trying to
> > collect from any future reporters?
> >
> >
>
> So for #2 I've been looking at that the last two weeks. I'm always
> paranoid we're screwing up one of our data integrity sort of things,
> either not waiting on IO to complete properly or something like that.
> I've built a dm target to be as evil as possible and have been running
> it trying to make bad things happen. I got slightly side tracked since
> my stress test exposed a bug in the tree log stuff and csums which I just
> fixed. Now that I've fixed that I'm going back to try and make the
> "expected blah, have 0" type errors happen.
Just a quick question from a user: does Filipe's patch "Btrfs: fix race between
fs trimming and block group remove/allocation" fix this? Judging by the commit
message, it looks like it. If so, can you say whether it will make it into
3.17.x?

Maybe I'm being overly paranoid, but I stuck with 3.16.7 because of this. (I
mean, I have backups, but there's no need to provoke a situation where I will
need them ;-) .)
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup