From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Another ENOSPC situation
Date: Sat, 2 Apr 2016 07:31:44 +0000 (UTC)
Message-ID: <pan$9f448$74821e54$55580fe7$7b88b8c6@cox.net>
In-Reply-To: CAJCQCtR8RNJddBXAcTsbFN51YrwEDSz2Get45oQyJsP3o6xS-w@mail.gmail.com
Chris Murphy posted on Fri, 01 Apr 2016 23:43:46 -0600 as excerpted:
> On Fri, Apr 1, 2016 at 10:55 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted:
>
>>> [4/502]mh@swivel:~$ sudo btrfs fi usage /
>>> Overall:
>>>     Device size:                 600.00GiB
>>>     Device allocated:            600.00GiB
>>>     Device unallocated:            1.00MiB
>>>     Global reserve:              512.00MiB  (used: 0.00B)
>>
>> That's the problem right there. The admin didn't do his job and spot
>> the near full allocation issue
>
> I don't yet agree this is an admin problem. This is the 2nd or 3rd case
> we've seen only recently where there's plenty of space in all chunk
> types and yet ENOSPC happens, seemingly only because there's no
> unallocated space remaining. I don't know that this is a regression for
> sure, but it sure seems like one.
Notice that he said _balance_ failed with ENOSPC. He did _NOT_ say he
was getting it in ordinary usage, just yet. Which would fit a 100%
allocated situation, with plenty of space left in both data and metadata
chunks. That space left inside the chunks keeps ordinary usage from
running into problems just yet, but balance really /does/ need room to
allocate at least one new chunk in order to properly handle the chunk
rewrite via COW. (At least for data; metadata seems to work a bit
differently. See below.)
Balance has always failed with ENOSPC if there was no unallocated space
left. It used to happen all the time, before btrfs learned how to delete
empty chunks in 3.17, but while that helps, it only works for literally
/empty/ chunks. Chunks with even a single block/node still in use don't
get deleted automatically.
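(If anyone wants to nudge that along by hand, a usage=0 filtered balance
should drop completely empty chunks without rewriting anything, so it
shouldn't need a new chunk allocation and should work even at 100%
allocation. A minimal sketch, mountpoint illustrative:

  # drop data and metadata chunks that are 100% empty
  btrfs balance start -dusage=0 /mnt
  btrfs balance start -musage=0 /mnt

Anything above usage=0 actually rewrites chunk contents, which is where
the unallocated-space requirement comes in.)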
What I think is happening now is that the empty-chunk deletion from 3.17
on helped, but only so much. People with particular usage patterns, I'd
strongly suspect those with heavy snapshotting, don't tend to fully
empty their chunks to the extent that those with other usage patterns
do, so automatic deletion wasn't freeing enough chunks to keep up in
their particular use-cases. It has been just long enough since then that
we're beginning to see the problem reported again.
>>> Data,single: Size:553.93GiB, Used:405.73GiB
>>> /dev/mapper/swivelbtr 553.93GiB
>>>
>>> Metadata,DUP: Size:23.00GiB, Used:3.83GiB
>>> /dev/mapper/swivelbtr 46.00GiB
>>>
>>> System,DUP: Size:32.00MiB, Used:112.00KiB
>>> /dev/mapper/swivelbtr 64.00MiB
>>>
>>> Unallocated:
>>> /dev/mapper/swivelbtr 1.00MiB
>>> [5/503]mh@swivel:~$
>>
>> Both data and metadata have several GiB free, data ~148 GiB free, and
>> metadata isn't into global reserve, so the system isn't totally wedged,
>> only partially, due to the lack of unallocated space.
>
> Unallocated space alone hasn't ever caused this that I can remember.
> It's most often been totally full metadata chunks, with free space in
> allocated data chunks, with no unallocated space out of which to create
> another metadata chunk to write out changes.
Unallocated space alone doesn't cause ENOSPC with normal operations; for
those you're correct, running out of either data or metadata space is
required as well. (Normally it's metadata that runs out, but I recall
seeing one post from someone who had metadata room but full data. The
behavior was... "interesting": he could do renames, etc., and even
create small files, as long as they were small enough to be inlined in
metadata. As soon as he tried anything that needed an actual data
extent, however, ENOSPC.)
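(A rough illustration of that state, paths and sizes hypothetical, on a
filesystem with full data chunks but metadata room to spare:

  # tiny files can be inlined into the metadata tree (see the max_inline
  # mount option), so this may still succeed:
  echo hello > /mnt/tiny-file
  # but anything needing a real data extent gets ENOSPC:
  dd if=/dev/zero of=/mnt/big-file bs=1M count=16

Whether the small-file case works depends on the file fitting under the
inline limit, of course.)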
But balance has always required space to allocate at least one chunk, as
COW means the existing chunk can't be released until everything is
rewritten into the new one.
Tho it seems that btrfs can sometimes either write very small metadata
chunks, which, don't forget, are dup by default on a single device, as
they are in this case. He has 1 MiB unallocated; split in half for the
dup copies, that's 512 KiB each. I'm not sure btrfs can go that small,
but if it can, and if it can find a metadata chunk with low enough usage
to rewrite into that space, it could free the larger metadata chunk...
Or maybe btrfs can actually use the global reserve for that, since global
reserve is part of metadata. If it can, a 512 MiB global reserve would
be just large enough to write the two copies of a nominally 256 MiB
metadata chunk.
Either way, I've seen a number of times now where btrfs was able to
balance metadata, when it had less than the 256 (*2 if dup) MiB
unallocated that would normally be required. Maybe it /is/ able to use
global reserve for that, which would allow it to work, as long as
metadata isn't so tight that it's already using global reserve. That's
actually what I bet it's doing, now that I think about it. Because as
long as the global reserve isn't being used, 512 MiB of global reserve
would be exactly 2*256 MiB metadata chunks, and if they're unused, that
would allow balance to claim them without actually having to allocate
them. But, I'd bet it works only if global reserve remains at absolutely
0 usage.
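(Easy enough to check before trying, values illustrative:

  # if the reserve shows used=0, a metadata balance may still squeak
  # through even at 100% allocation, if the theory above holds:
  btrfs fi df / | grep GlobalReserve
  # e.g. GlobalReserve, single: total=512.00MiB, used=0.00B
  btrfs balance start -musage=10 /

That's speculation about the mechanism on my part, of course, not
documented behavior.)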
> There should be plenty of space for either a -dusage=1 or -musage=1
> balance to free up a bunch of partially allocated chunks. Offhand I
> don't think the profiles filter is helpful in this case.
>
> OK so where I could be wrong is that I'm expecting balance doesn't
> require allocated space to work. I'd expect that it can COW extents from
> one chunk into another existing chunk (of the same type) and then once
> that's successful, free up that chunk, i.e. revert it back to
> unallocated. If balance can only copy into newly allocated chunks, that
> seems like a big problem. I thought that problems had been fixed a very
> long time ago.
I don't believe it can. It has to create new chunks. (Tho if it works
as in the metadata and global reserve discussion above, that would be an
exception, as it could then use those 100% unused metadata global reserve
chunks without having to actually allocate them first.)
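(For reference, the filtered balances Chris mentions would be, path
illustrative:

  # rewrite only chunks at most 1% used; cheap, and frees them quickly
  btrfs balance start -dusage=1 /
  btrfs balance start -musage=1 /

Tho per the above, anything other than usage=0 still needs room to
allocate at least one new chunk of the type being balanced.)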
> And what we don't see from 'usage' that we will see from 'df' is the
> GlobalReserve values. I'd like to see that.
Actually... look again. It's there, and I quoted it, but you snipped
that part. =:^)
Tho I don't blame you, an actually usable btrfs fi usage is new enough to
all of us that we're still getting used to it, and don't have its format
hard-wired into our wetware by repetition just yet, as we do btrfs fi
show and btrfs fi df. I know there've been several times when I "lost" a
figure in fi usage that I knew was there... somewhere! and had to start
from the top and go thru every line one by one to find it, because I
just don't know usage like I know show and df yet. =:^\
Plus, I think it's a bit more difficult because the display is more
spread out, so there's more "haystack" to lose the "needle" in. =;^P
But I suppose we'll get used to it, over time.
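(For anyone else hunting that particular needle, it's a single line in
the Overall section of fi usage; output from memory, values made up:

  btrfs fi usage / | grep -i 'global reserve'
  # e.g. Global reserve:  512.00MiB  (used: 0.00B)

while fi df shows it as its own GlobalReserve row at the end.)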
> Anyway, in the meantime there is a work around:
>
> btrfs dev add
>
> Just add a device, even if it's an 8GiB flash drive. But it can be
> spare space on a partition, or it can be a logical volume, or whatever
> you want. That'll add some gigs of unallocated space. Now the balance
> will work, or for absolutely sure there's a bug (and a new one because
> this has always worked in the past). After whatever filtered or full
> balance is done, make sure to 'btrfs dev rem' and confirm it's gone with
> 'btrfs fi show' before removing the device. It's a two device volume
> until that device is successfully removed and is in something of a
> fragile state until then because any loss of data on that 2nd device has
> a good chance of face planting the file system.
Agreed. This is the next step if he can't finagle enough room out of
metadata without it ENOSPCing. But as I said, I've actually seen a
metadata balance succeed a couple of times recently (metadata only, not
data... tho once metadata shrank enough to leave some gigs unallocated,
data worked too) when I didn't think it could, due to no unallocated
space, and I'm actually beginning to think that's due to balance being
able to use the (metadata-only) global reserve.
Which would make sense, but it's either a relatively new development, or
one I simply didn't know about previously.
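(For the record, the whole dance would look something like this, device
path illustrative:

  # temporarily add a scratch device to get some unallocated space
  btrfs device add /dev/sdX1 /
  # now balance has room to allocate new chunks
  btrfs balance start -dusage=20 /
  # migrate everything back off the scratch device and drop it
  btrfs device remove /dev/sdX1 /
  # confirm it's really gone before unplugging anything
  btrfs fi show /

And as Chris says, don't skip that last check; losing the second device
while it's still part of the filesystem can kill the whole thing.)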
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman