* Re: Fixing Btrfs Filesystem Full Problems typo?
[not found] <CAA7pwKNH-Cbd+_D+sCEJxxdervLC=_3_AzaywSE3mXi8MLydxw@mail.gmail.com>
@ 2014-11-22 22:26 ` Marc MERLIN
2014-11-22 23:26 ` Patrik Lundquist
0 siblings, 1 reply; 36+ messages in thread
From: Marc MERLIN @ 2014-11-22 22:26 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org
+btrfs list so that someone can correct me if I'm wrong.
On Sat, Nov 22, 2014 at 09:34:59PM +0100, Patrik Lundquist wrote:
> Hi,
>
> I was scratching my head over a failing btrfs balance and read your
> very informative
> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html,
> but shouldn't
>
> "I can ask balance to rewrite all chunks that are more than 55% full"
>
> be
>
> "I can ask balance to rewrite all chunks that are less than 55% full"?
This one hurts my brain every time I think about it :)
So, the bigger the -dusage number, the more work btrfs has to do.
-dusage=0 does almost nothing
-dusage=100 effectively rebalances everything
But saying "less than 95% full" for -dusage=95 would mean
rebalancing everything that isn't almost full, so I'm not sure it makes
sense either (I would think you'd want to rebalance full blocks first).
The logical wording would be "less than 95% space free".
I'll update my page since this is what makes the most sense.
Now, just to be sure, if I'm getting this right, if your filesystem is
55% full, you could rebalance all blocks that have less than 55% space
free, and use -dusage=55
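(For concreteness, the invocation being discussed is something like the
following; the mountpoint is just an example:)

btrfs balance start -dusage=55 /mnt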
Does that sound right?
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-22 22:26 ` Fixing Btrfs Filesystem Full Problems typo? Marc MERLIN
@ 2014-11-22 23:26 ` Patrik Lundquist
2014-11-22 23:46 ` Marc MERLIN
2014-11-23 0:05 ` Hugo Mills
0 siblings, 2 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-11-22 23:26 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-btrfs@vger.kernel.org
On 22 November 2014 at 23:26, Marc MERLIN <marc@merlins.org> wrote:
>
> This one hurts my brain every time I think about it :)
I'm new to Btrfs so I may very well be wrong, since I haven't really
read up on it. :-)
> So, the bigger the -dusage number, the more work btrfs has to do.
Agreed.
> -dusage=0 does almost nothing
> -dusage=100 effectively rebalances everything
And -dusage=0 effectively reclaims empty chunks, right?
> But saying "less than 95% full" for -dusage=95 would mean
> rebalancing everything that isn't almost full,
But isn't that what rebalance does? Rewriting chunks <=95% full into
completely full chunks effectively defragments them and most
likely reduces the number of chunks.
A -dusage=0 rebalance reduced my number of chunks from 1173 to 998 and
dev_item.bytes_used went from 1593466421248 to 1491460947968.
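(For reference, that kind of run and check is roughly the following; the
device and mountpoint are illustrative, and the field names are as
btrfs-show-super prints them here:)

btrfs balance start -dusage=0 /mnt
btrfs-show-super /dev/sdc1 | grep bytes_used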
> Now, just to be sure, if I'm getting this right, if your filesystem is
> 55% full, you could rebalance all blocks that have less than 55% space
> free, and use -dusage=55
I realize that I interpret the usage parameter as operating on blocks
(chunks? are they the same in this case?) that are <= 55% full while
you interpret it as <= 55% free.
Which is correct?
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-22 23:26 ` Patrik Lundquist
@ 2014-11-22 23:46 ` Marc MERLIN
2014-11-23 0:05 ` Hugo Mills
1 sibling, 0 replies; 36+ messages in thread
From: Marc MERLIN @ 2014-11-22 23:46 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org
On Sun, Nov 23, 2014 at 12:26:38AM +0100, Patrik Lundquist wrote:
> I realize that I interpret the usage parameter as operating on blocks
> (chunks? are they the same in this case?) that are <= 55% full while
> you interpret it as <= 55% free.
>
> Which is correct?
I will let someone else answer because I'm not 100% certain anymore.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-22 23:26 ` Patrik Lundquist
2014-11-22 23:46 ` Marc MERLIN
@ 2014-11-23 0:05 ` Hugo Mills
2014-11-23 1:07 ` Marc MERLIN
1 sibling, 1 reply; 36+ messages in thread
From: Hugo Mills @ 2014-11-23 0:05 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: Marc MERLIN, linux-btrfs@vger.kernel.org
On Sun, Nov 23, 2014 at 12:26:38AM +0100, Patrik Lundquist wrote:
> On 22 November 2014 at 23:26, Marc MERLIN <marc@merlins.org> wrote:
> >
> > This one hurts my brain every time I think about it :)
>
> I'm new to Btrfs so I may very well be wrong, since I haven't really
> read up on it. :-)
>
>
> > So, the bigger the -dusage number, the more work btrfs has to do.
>
> Agreed.
>
>
> > -dusage=0 does almost nothing
> > -dusage=100 effectively rebalances everything
>
> And -dusage=0 effectively reclaims empty chunks, right?
>
>
> > But saying "less than 95% full" for -dusage=95 would mean
> > rebalancing everything that isn't almost full,
>
> But isn't that what rebalance does? Rewriting chunks <=95% full into
> completely full chunks effectively defragments them and most
> likely reduces the number of chunks.
>
> A -dusage=0 rebalance reduced my number of chunks from 1173 to 998 and
> dev_item.bytes_used went from 1593466421248 to 1491460947968.
>
>
> > Now, just to be sure, if I'm getting this right, if your filesystem is
> > 55% full, you could rebalance all blocks that have less than 55% space
> > free, and use -dusage=55
>
> I realize that I interpret the usage parameter as operating on blocks
> (chunks? are they the same in this case?) that are <= 55% full while
> you interpret it as <= 55% free.
>
> Which is correct?
Less than or equal to 55% full.
0 gives you less than or equal to 0% full -- i.e. the empty block
groups. 100 gives you less than or equal to 100% full, i.e. all block
groups.
A chunk is the part of a block group that lives on one device, so
in RAID-1, every block group is precisely two chunks; in RAID-0, every
block group is 2 or more chunks, up to the number of devices in the
FS. A chunk is usually 1 GiB in size for data and 250 MiB for
metadata, but can be smaller under some circumstances.
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- And what rough beast, its hour come round at last / slouches ---
towards Bethlehem, to be born?
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 0:05 ` Hugo Mills
@ 2014-11-23 1:07 ` Marc MERLIN
2014-11-23 7:52 ` Duncan
2014-11-24 18:05 ` Brendan Hide
0 siblings, 2 replies; 36+ messages in thread
From: Marc MERLIN @ 2014-11-23 1:07 UTC (permalink / raw)
To: Hugo Mills, Patrik Lundquist, linux-btrfs@vger.kernel.org
On Sun, Nov 23, 2014 at 12:05:04AM +0000, Hugo Mills wrote:
> > Which is correct?
>
> Less than or equal to 55% full.
This confuses me. Does that mean that the fullest blocks do not get
rebalanced?
I guess I was under the mistaken impression that the more data you had the
more you could be out of balance.
> A chunk is the part of a block group that lives on one device, so
> in RAID-1, every block group is precisely two chunks; in RAID-0, every
> block group is 2 or more chunks, up to the number of devices in the
> FS. A chunk is usually 1 GiB in size for data and 250 MiB for
> metadata, but can be smaller under some circumstances.
Right. So, why would you rebalance empty chunks or near empty chunks?
Don't you want to rebalance almost full chunks first, and work your way to
less and less full as needed?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 1:07 ` Marc MERLIN
@ 2014-11-23 7:52 ` Duncan
2014-11-23 15:12 ` Patrik Lundquist
` (2 more replies)
2014-11-24 18:05 ` Brendan Hide
1 sibling, 3 replies; 36+ messages in thread
From: Duncan @ 2014-11-23 7:52 UTC (permalink / raw)
To: linux-btrfs
Marc MERLIN posted on Sat, 22 Nov 2014 17:07:42 -0800 as excerpted:
> On Sun, Nov 23, 2014 at 12:05:04AM +0000, Hugo Mills wrote:
>> > Which is correct?
>>
>> Less than or equal to 55% full.
>
> This confuses me. Does that mean that the fullest blocks do not get
> rebalanced?
Yes. =:^)
> I guess I was under the mistaken impression that the more data you had
> the more you could be out of balance.
What you were thinking is a misstatement of the situation, so yes, again,
that was a mistaken impression. =:^)
>> A chunk is the part of a block group that lives on one device, so
>> in RAID-1, every block group is precisely two chunks; in RAID-0, every
>> block group is 2 or more chunks, up to the number of devices in the FS.
>> A chunk is usually 1 GiB in size for data and 250 MiB for metadata, but
>> can be smaller under some circumstances.
>
> Right. So, why would you rebalance empty chunks or near empty chunks?
> Don't you want to rebalance almost full chunks first, and work your way
> to less and less full as needed?
No, the closer to empty a chunk is, the more effect you can get in
rebalancing it along with others of the same fullness.
Think of it this way.
One goal of a rebalance, the goal we have when data and metadata is
unbalanced and we're hitting ENOSPC as a result (as opposed to the goal
of converting or balancing among devices when one has just been added or
removed), and thus the goal that the usage filter is designed to help
solve, is this: Free excess chunk-allocated but chunk-empty space back to
unallocated, so it can be used by the other type, data or metadata.
More specifically, all available space has been allocated to data and
metadata chunks leaving no space available to allocate more chunks, and
one of two extremes has been reached, we'll call them D and M:
(
D1: All data chunks are full and more need to be allocated, but they
can't be as there's no more unallocated space to allocate the new data
chunks from,
*AND*
D2: There's a whole bunch of excess metadata chunks allocated, using up
all that unallocated space, but they're mostly empty, and need to be
rebalanced to consolidate usage into fewer but fuller metadata chunks,
thus freeing the space currently taken by all those mostly empty metadata
chunks.
)
*OR* the reverse:
(
M1: All metadata chunks are full and more need to be allocated, but they
can't be as there's no more unallocated space to allocate the new
metadata chunks from,
*AND*
M2: There's a whole bunch of excess data chunks allocated, using up all
the unallocated space, but they're mostly empty, and need to be
rebalanced to consolidate usage into fewer but fuller data chunks, thus
freeing the space currently taken by all those mostly empty data chunks.
)
In both cases, the one type is full and needs more allocation, but the
other type is hogging all the space with mostly empty chunks. In both
cases, then, you *DON'T* want to bother with the full type, since it's
full and rewriting it won't do anything but shuffle the full chunks
around -- you can't combine any because they're all full.
In both cases, what you *WANT* to do is deal with the EMPTY type, the
chunks that are hogging all the space but not actually using it.
This is evidently a bit counterintuitive on first glance as you're not
the first to have problems with it, but it /is/ the case, and once you
understand what's actually happening and why, it /does/ make sense.
More specifically, in the D case, where all /data/ chunks are full, you
want to rebalance the mostly empty /metadata/ chunks, combining for
example 5 near 20% full metadata chunks into a single near 100% full
metadata chunk, deallocating the other four metadata chunks (instead of
rewriting empty chunks) once there's nothing in them at all. Five just
became one, freeing four to unallocated space, which can now be used to
allocate new data chunks.
And the reverse in the M case, where all metadata chunks are full. Here,
you want to rebalance the mostly empty data chunks, again combining say
five 20% usage data chunks into a single 100% usage data chunk,
deallocating the other four data chunks once there's nothing in them at
all. Again, five just become one, freeing four to unallocated space,
which now can be used to allocate new, in this case, metadata chunks.
Thus the goal is to rebalance the nearly /empty/ chunks of the *OPPOSITE*
type to the one you're running short on, combining multiple nearly empty
chunks of the type you have too many of, thus freeing that empty space
back to unallocated, so the type that you're actually short on can
actually allocate chunks from the just freed to unallocated space.
That being the goal, working with the full chunks won't get you much.
Suppose you work with the 95% full chunks, 5% empty. You'll have to
rewrite *TWENTY* of them to combine all those 5% empties to free just
*ONE* chunk! And rewriting 100% full chunks won't get you anything at
all toward this goal, since they're already full and no more can be
stuffed into them. Rewrite 100 chunks 100% full, and you still have 100
chunks 100% full! =:^(
OTOH, suppose you work with 5% full chunks, 95% empty. Rewrite just two
of them, and you've already freed one, with the one left only 10% full.
Add a third one and free a second, with the one you're left with still
only 15% full. Continue until you've rewritten 20 of them, AND YOU FREE
19 OF THEM! =:^)
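(Putting rough numbers on that, under the simplifying assumption that the
rewritten data packs perfectly into new chunks, rewriting N chunks that
are each U% full frees about N - ceil(N*U/100) of them:)

# shell arithmetic for the two cases above (perfect packing assumed)
echo $(( 20 - (20*5  + 99) / 100 ))   # U=5%:  frees 19 of 20
echo $(( 20 - (20*95 + 99) / 100 ))   # U=95%: frees  1 of 20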
So it *CLEARLY* pays to work with the mostly empty ones. Usage=N, where
balance only works with the ones with LESS than or equal usage to that,
lets you do exactly that, work with the mostly EMPTY ones.
*BUT*, the payoff is even HIGHER than that. Consider, since only the
actually used blocks in a block group need to be rewritten, an almost full chunk
is going to take FAR longer than an almost empty chunk to rewrite. Now
there's going to be /some/ overhead, but let's consider that 5% full
example again. For chunks only 5% full, you're only writing 5% of the
data or metadata that you'd be writing for a 100% full chunk, 1/20th as
much.
So in our example above, where we find and rewrite 20 5% usage chunks into
a single 100% usage chunk, while there will be /some/ overhead, you might
well write those 20 5% used chunks into a single 100% used chunk in
perhaps the same time it'd take you to rewrite just ONE 95% usage chunk.
IOW, rewriting 20 95% usage chunks to 19, freeing just one, is going to
take you nearly 20 times as long as rewriting 20 5% usage chunks, freeing
19 of them, since in the latter case you're actually only rewriting one
full chunk's worth of data or metadata.
So working with 5% usage chunks as opposed to 95% usage chunks, you free
19 times as much space, using only a bit over a 20th as much time. Even
with 100% overhead, you'd still spend a tenth as much time freeing 19
times as many chunks!
Which is why the usage= filter is such a big deal. In many cases, it
allows you *HUGE* bang for the buck! While I'm pulling numbers out of
the air for this example, they're well within reason. Something like
usage=10 might take you half an hour and free up 70% of the space that a
full balance would free, while the full balance may well take a whole 24-
hour day!
OK, so what /is/ the effect of a fuller filesystem? Simply this. As the
filesystem fills up, there's less and less fully free unallocated space
available even after a full balance, meaning that free space can be used
up with fewer and fewer chunk allocations, so you have to rebalance more
and more often to keep what's left from getting out of balance and
running into ENOSPC conditions.
Compounding the problem, as the filesystem fills up, it's less and less
likely that there will be more than just one mostly free chunk available
(the one that's actively being written into), with others full or nearly
so, so it'll be necessary to use higher and higher usage=N balances to
get anything back, and the bonus payoff we had above will be working in
reverse as now we WILL be having to do 20 95% full chunks to free just
one chunk back to unallocated. Compounding the problem even FURTHER,
will be the fact that we have ALL THOSE GiB (TiB?) of actual data to
rewrite, so it'll be a worse and worse slog for fewer and fewer freed
chunks in payback.
Again, numbers out of thin air, but for illustrative purposes...
When a TiB filesystem is say 10% full, 90% of it could be in almost-empty
chunks. Not only will it take a relatively long time to get to that
point with only 10% usage, but a usage=10 filter will very likely free
say 80% (leaving 10% that would require a higher usage filter to
recover), in only a few minutes or a half hour or whatever. And you do
it once and could be good for six months or a year before you start
running low on space again and need to redo it.
When it's 90% full, you're likely to need at least usage=80 to get
anywhere, and you'll be rewriting a good portion of that 900+ GiB in
order to get just a handful of chunks worth of space recovered, with
the balance taking say 10-12 hours, perhaps longer. What's worse, you
may well find yourself having to do a rebalance like that every week,
because your total deallocatable free space (even after a full balance)
is approaching your weekly working set!
Obviously at/before that point it's time to invest in more storage!
But, beware! Just because your filesystem is say 55% full (number from
your example earlier), does **NOT** mean usage=55 is the best number to
use. That may well be the case, or it may not. There's simply no
necessarily direct correlation in that regard, and a recommended N for
usage=N cannot be determined without a LOT more use-case information than
simply knowing the filesystem is at 55% capacity.
The most that can be /reliably/ stated is that in general, as usage of
the filesystem goes up, so will the necessary N for the usage=N balance
filter -- there's a general correlation, yes, but it's nowhere NEAR
possible to assume any particular ratio like 1:1, without knowing rather
more about the use-case.
In particular, with the filesystem at 55% capacity, the extremes are all
used chunks at 100% capacity except for one (the one that's actively
being used, this is in theory the case immediately after a full balance,
and even a full balance wouldn't do anything further here), *OR* all used
chunks at 56% usage but for one (in this case usage=55 would do nothing,
since all those 56% used chunks are above the 55% cutoff and the single
chunk that might be rewritten has nothing to combine with, but a usage=56
or a usage=60 would be as effective as a full balance), *OR* most chunks
are actually empty, with the remainder but one at 100% usage (nearly the
same as the first case, except in that case there's no empty chunks
allocated, in this case all available space is allocated to empty chunks,
such that a usage=0 would be as effective as a full balance), *OR* all
used chunks but one are at 54-55% usage (usage=55 would in this case
just /happen/ to be the magic number that is as effective as a full
balance, while usage=54 would do nothing).
Another way of looking at that would be the old game of picking a number
between 0 and 100. Say you're using two d10 (10-sided dice, with one
marked as the 10s digit, thus generating 01-(1)00 as the range) to
generate the number, and you know the dice are weighted slightly to favor
5s. You and two friends are picking, and you pick first.
So you pick 55. But your two friends, not being dummies, pick 54 and
56. Unless those d10s are HEAVILY weighted, despite the weighting, your
odds of being the closest with that 55 aren't very good, are they?
Given no differences in time necessary and no additional knowledge about
how long it has been since the last full balance (which would have tended
to cram everything to 100% usage), and no knowledge about usage pattern,
55 would indeed be arguably the best choice to begin with.
But given the huge time advantage of lower values of N for usage=N if
they /do/ happen to do what you need, and thus the chance of usage=20
either doing the job in MUCH less time, or getting done in even LESS time
because it couldn't actually do /anything/, there's a good chance I'd try
something like that first, if only to then have some idea how much higher
I might want to go, because it'll be done SO much faster and has a /small/
chance of doing all I need anyway!
If usage=20 wasn't enough, I might then try usage=40, hoping that it
would do the rest, knowing that a rerun at a higher but still under-100
number would at most redo only a single chunk from the previous run, the
one that didn't get filled up all the way at the end -- all the others
would either be 100% or would have been deallocated as empty, and knowing
that the higher the number, the MUCH higher the time required, in general.
So the 55% filesystem capacity would probably inform my choice of jumps,
say 20% at a time, but I'd still start much lower and jump at that 20% or
so at a time.
Meanwhile, if the filesystem was only at say 20% capacity, I'd probably
start with usage=0 and jump by 5% at a time, while if it was at say 80%
capacity, I might still start at usage=0 to see if I could get lucky, but
then jump to usage=60, and then usage=98 or 99, because the really high
number still under 100 would still avoid rewriting all the full chunks
I'd created with the previous runs as well as all 100% full chunks that
would yield no benefit toward our goal, but would still recover pretty
much everything it was possible to recover, which once you reach 80%
capacity is going to start looking pretty necessary at some point.
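(In command form, that last sequence would look something like this, with
the mountpoint illustrative:)

btrfs balance start -dusage=0 /mnt
btrfs balance start -dusage=60 /mnt
btrfs balance start -dusage=98 /mnt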
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 7:52 ` Duncan
@ 2014-11-23 15:12 ` Patrik Lundquist
2014-11-24 4:23 ` Duncan
2014-11-23 21:16 ` Marc MERLIN
2014-12-07 21:38 ` Marc MERLIN
2 siblings, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-11-23 15:12 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 23 November 2014 at 08:52, Duncan <1i5t5.duncan@cox.net> wrote:
> [a whole lot]
Thanks for the long post, Duncan.
My venture into the finer details of balance began with converting an
ext4 fs to btrfs and, after an initial defrag, having a full balance fail
with about a third to go.
Consecutive full balances further reduced the number of chunks and got
me closer to finish without the infamous ENOSPC. After 3-4 full
balance runs it failed with less than 8% to go.
The balance run now finishes without errors with usage=99 and I think
I'll leave it at that. No RAID yet but will convert to RAID1.
Is it correct that there is no reason to ever do a 100% balance as
routine maintenance? I mean if you really need that last 1% space you
actually need a disk upgrade instead.
How about running a monthly maintenance job that uses bytes_used and
dev_item.bytes_used from btrfs-show-super to approximate the balance
need?
(dev_item.bytes_used - bytes_used) / bytes_used == extra device space used
The extra device space used after my balance usage=99 is 0.15%. It was
7.0% before I began tinkering with usage and ran into ENOSPC and I
think it is safe to assume that it was a lot more right after the fs
conversion.
So let's iterate a balance run which begins with usage=0 and increases
in steps of 5 or 10 and stops at 90 or 99 or when the extra device
space used is less than 1%.
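(A rough sketch of that job, assuming btrfs-show-super prints the
bytes_used and dev_item.bytes_used fields as above; device, mountpoint
and step size are illustrative:)

#!/bin/sh
DEV=/dev/sdc1
MNT=/mnt
for u in 0 10 20 30 40 50 60 70 80 90 99; do
    used=$(btrfs-show-super "$DEV" | awk '$1 == "bytes_used" {print $2}')
    dev_used=$(btrfs-show-super "$DEV" | awk '$1 == "dev_item.bytes_used" {print $2}')
    # stop when (dev_item.bytes_used - bytes_used) / bytes_used < 1%
    [ $(( (dev_used - used) * 100 / used )) -lt 1 ] && break
    btrfs balance start -dusage=$u -musage=$u "$MNT"
done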
Does it make sense?
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 7:52 ` Duncan
2014-11-23 15:12 ` Patrik Lundquist
@ 2014-11-23 21:16 ` Marc MERLIN
2014-11-23 22:49 ` Holger Hoffstätte
2014-12-07 21:38 ` Marc MERLIN
2 siblings, 1 reply; 36+ messages in thread
From: Marc MERLIN @ 2014-11-23 21:16 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On Sun, Nov 23, 2014 at 07:52:29AM +0000, Duncan wrote:
> > Right. So, why would you rebalance empty chunks or near empty chunks?
> > Don't you want to rebalance almost full chunks first, and work your way
> > to less and less full as needed?
>
> No, the closer to empty a chunk is, the more effect you can get in
> rebalancing it along with others of the same fullness.
Ok, now I see what I was thinking the wrong way around:
Rebalancing is not rebalancing data within a chunk or optimizing some
tree data structure.
Rebalancing is taking a nearly empty chunk and merging it with other
chunks to free up that chunk's space.
So, -dusage=10 only picks chunks that are 10% used or less, and tries
to free them up by putting their data elsewhere.
Did I get it right this time? :)
> IOW, rewriting 20 95% usage chunks to 19, freeing just one, is going to
> take you nearly 20 times as long as rewriting 20 5% usage chunks, freeing
> 19 of them, since in the latter case you're actually only rewriting one
> full chunk's worth of data or metadata.
Right, that makes sense.
> OK, so what /is/ the effect of a fuller filesystem? Simply this. As the
> filesystem fills up, there's less and less fully free unallocated space
> available even after a full balance, meaning that free space can be used
> up with fewer and fewer chunk allocations, so you have to rebalance more
> and more often to keep what's left from getting out of balance and
> running into ENOSPC conditions.
Yes, been there, done that :)
> But, beware! Just because your filesystem is say 55% full (number from
> your example earlier), does **NOT** mean usage=55 is the best number to
> use. That may well be the case, or it may not. There's simply no
> necessarily direct correlation in that regard, and a recommended N for
> usage=N cannot be determined without a LOT more use-case information than
> simply knowing the filesystem is at 55% capacity.
Yeah, I remember that. I'm ok with using the same number but I
understand it's not a given that it's the perfect number.
> So the 55% filesystem capacity would probably inform my choice of jumps,
> say 20% at a time, but I'd still start much lower and jump at that 20% or
> so at a time.
That makes sense. I'll try to synthesize all this and rewrite my blog
post and the wiki to make this clearer.
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 21:16 ` Marc MERLIN
@ 2014-11-23 22:49 ` Holger Hoffstätte
2014-11-24 4:40 ` Duncan
0 siblings, 1 reply; 36+ messages in thread
From: Holger Hoffstätte @ 2014-11-23 22:49 UTC (permalink / raw)
To: linux-btrfs
On Sun, 23 Nov 2014 13:16:50 -0800, Marc MERLIN wrote:
(snip)
> That makes sense. I'll try to synthesize all this and rewrite my blog
> post and the wiki to make this clearer.
Maybe also add that as of 3.18 empty block groups are automatically
collected, so balancing to prevent ENOSPC-by-empty-chunks is no longer
necessary. This works pretty well; I haven't run balance in weeks,
and my total-vs.-used overhead has always been <10 GB.
Holger
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 15:12 ` Patrik Lundquist
@ 2014-11-24 4:23 ` Duncan
2014-11-24 12:35 ` Patrik Lundquist
0 siblings, 1 reply; 36+ messages in thread
From: Duncan @ 2014-11-24 4:23 UTC (permalink / raw)
To: linux-btrfs
Patrik Lundquist posted on Sun, 23 Nov 2014 16:12:54 +0100 as excerpted:
> The balance run now finishes without errors with usage=99 and I think
> I'll leave it at that. No RAID yet but will convert to RAID1.
Converting between raid modes is done with a balance, so if you can't get
that last bit to balance, you can't do a full conversion to raid1.
> Is it correct that there is no reason to ever do a 100% balance as
> routine maintenance? I mean if you really need that last 1% space you
> actually need a disk upgrade instead.
I'm too cautious to make an unequivocal statement like that, but at least
off the top of my head, I can't think of any reason why /routine/
maintenance needs a full balance. Like I said above, the mode
conversions need it as that's what rewrites them to the new mode, but
that's not /routine/. Similarly, adding/deleting devices, where balance
is used to rebalance the usage between remaining devices, isn't routine.
Certainly, I've had no reason to do that full balance, as opposed to 99%
or whatever not-quite-full value, here, in routine usage. That doesn't
mean I won't someday find such a reason, but I've not seen one so far.
> How about running a monthly maintenance job that uses bytes_used and
> dev_item.bytes_used from btrfs-show-super to approximate the balance
> need?
I'm not familiar enough with the individual btrfs-show-super line items
to address that specific question in an intelligent manner.
What I'd recommend using instead is the output from btrfs filesystem df
<mountpoint> and/or btrfs fi show <mountpoint>. These commands spit out
information that's more "human readable", that should be usable in a
script that conditionally triggers a balance as needed, as well.
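(That is, with the mountpoint illustrative:)

btrfs filesystem show /mnt
btrfs filesystem df /mnt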
In btrfs fi show, you're primarily interested in the devid line(s). That
tells you how much of the total available space is chunk-allocated for
that device, with the difference between total and used being the
unallocated space, available to allocate to either data or metadata
chunks as needed.
What you're watching for there is of course nearly all space used. How
much you want to keep free will depend to some extent on the size of the
devices and how close to full they actually are, but with data chunks
being 1 GiB in size and metadata chunks being a quarter GiB in size,
until the filesystem gets really too full to do so, keeping enough room
to allocate several chunks of each shouldn't hurt. With the usual multi-
hundred-gig filesystems[1], I'd suggest doing a rebalance whenever
unallocated space is under 20 GiB. If in fact you have /lots/ of unused
space, say a TB filesystem with only a couple hundred GiB used, I'd
probably set the safety margin higher, say 100 GiB or even 200 GiB, while
at the same time using a lower usage=N balance filter. No sense getting
anywhere /close/ to the wire in that case. As the filesystem fills that
can be reduced as necessary, but you'll want to keep at *LEAST* 3 GiB or
so unallocated, so the filesystem always has room to do at least a couple
more chunk-allocations each of data and metadata. That should also
guarantee that there's at least enough room for balance to create a new
chunk in order to be able to do its rewriting thing, thus allowing you
to free /more/ space.
In btrfs fi df, watch the data and metadata lines. Specifically, you're
interested in the spread between total, which is what is chunk-allocated
for the filesystem, and used, actual usage within those allocated
chunks. High spread indicates a bunch of empty chunks that a balance can
free back to unallocated space, our goal in this case.
Again, data chunks are 1 GiB in size, so for the data line, a spread of
under a GiB indicates that even a full balance isn't likely to free
anything back to unallocated. Generally if it's within a single-digit
number of GiB difference, don't worry about balancing it. Similarly, on
a TB-class filesystem, if btrfs fi show says you still have hundreds of
GiB of room, there's little reason to worry about a balance even if the
spread in fi df is a similar hundreds of GiB, because you still have
plenty of unallocated room left.
Metadata chunks are a quarter-GiB in size, but on a single-device-
filesystem, they normally default to DUP mode, so two will be allocated
at a time. So if you're under a half-gig difference between total (aka
allocated) and used metadata, doing even a full metadata balance is
unlikely to get anything back, and it's normally not worth worrying about
a metadata balance unless the spread is over a couple GiB. Basically the
same general rules apply as for data, only at half the metadata size. So
under 5-10 GiB spread is unlikely to be worth the hassle. On a TB-class
filesystem, still don't worry about it if there's hundreds of GiB
unallocated, but if the fi df metadata spread between total/allocated and
used is 50 GiB or more, you may wish to do a metadata balance just to get
some of that back, even if unallocated (from fi show, as above) /is/
still hundreds of GiB.
So bottom line, on a TB-class filesystem with plenty of room (a couple
hundred GiB free still, or more), I'd rebalance if unallocated (fi show,
difference between total and used on a device line) drops under 100 GiB,
rebalancing data if fi df shows over 100 GiB spread between data total
(aka allocated) and used, and rebalancing metadata if there's over a 50
GiB spread.
As the filesystem fills up, say with only 100 GiB free, that'd drop to
triggering a balance if there's under perhaps 20 GiB unallocated on fi
show, with a data balance at a similar 20 GiB data spread, and a metadata
balance with a 10 GiB metadata spread.
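(A minimal sketch of such a trigger for the data case, assuming the
"Data, <profile>: total=..., used=..." line format that btrfs fi df
prints in this era of btrfs-progs; the thresholds and the usage value
are just the examples from above:)

#!/bin/sh
MNT=/mnt
spread=$(btrfs filesystem df "$MNT" | awk -F'[=,]' '
    function gib(s,  n, u) {              # convert "1.35TiB" etc. to GiB
        n = s + 0; u = s; gsub(/[0-9. ]/, "", u)
        if (u == "TiB") return n * 1024
        if (u == "GiB") return n
        if (u == "MiB") return n / 1024
        return 0
    }
    /^Data/ { printf "%d", gib($3) - gib($5) }')
if [ "$spread" -gt 100 ]; then
    btrfs balance start -dusage=60 "$MNT"
fi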
On a TB-class filesystem or even a half-TB-class filesystem, once you're
having trouble maintaining at least 10 GiB free, you should really be
adding more devices or upgrading to bigger hardware, because you really
don't want the unallocated to drop below 3 GiB or balance itself can have
trouble running.
On my sub-100 GiB filesystems, I tend to have the filesystem sized much
closer to what I actually need. For instance, my rootfs is btrfs raid1
mode, 8 GiB per device, two devices, so 8 GiB filesystem capacity.
/bin/df reports 2.1 G used, 5.8 GiB available.
btrfs fi show reports (per device) 8 GiB size, 2.78 GiB used. So call it
3 GiB used and 5 GiB unallocated.
btrfs fi df reports data of 2 GiB (obviously two 1 GiB chunks) total,
1.75 GiB of which is used, for a spread of a quarter GiB. That's under
the 1 GiB data chunk size so even a full balance likely won't return
anything.
Btrfs fi df reports metadata of 768 MiB total (obviously three quarter-
GiB chunks, remember this is raid1 so it's not duping the metadata chunks
to the same device, the other copy is on the other device), 298.12 MiB
used.
So in theory I could get 1 chunk of that metadata back, reducing it to 2
metadata chunks. However, there's typically a couple-hundred MiB
metadata overhead that btrfs won't actually let you use as it uses it
internally, and even a full balance doesn't recover it. So it's unlikely
I could recover that /apparently/ spare metadata block.
So I appear to be at optimum. Obviously on an 8 GiB filesystem, I'm
going to have to watch unallocated space very closely. However, because
this /is/ a specific-purpose filesystem (system root, with all installed
programs and config) and I'm already using it for that specific purpose,
usage shouldn't and doesn't change /that/ much, even tho I'm on gentoo
and thus have rolling updates. It's thus /easier/ to keep an eye on data/
metadata spread as well as on total allocated usage and do a balance
(which on an 8 GiB only filesystem on SSD, tends to take only perhaps a
couple minutes for a full balance anyway) when I need to, because while
it's small, even with updates the general data/metadata ratio doesn't
tend to change much and normally the data and metadata usage stays about
the same even as the files are updated, because it's simply reusing the
chunks it has.
---
[1] Multi-hundred-gig filesystems: These are usual for me as I like to
keep my physical devices partitioned up and my filesystems small and
manageable, but most people just create a big filesystem or two out of
the multi-hundred-gig physical device, so their filesystems are commonly
multi-hundred-gig as well.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 22:49 ` Holger Hoffstätte
@ 2014-11-24 4:40 ` Duncan
0 siblings, 0 replies; 36+ messages in thread
From: Duncan @ 2014-11-24 4:40 UTC (permalink / raw)
To: linux-btrfs
Holger Hoffstätte posted on Sun, 23 Nov 2014 22:49:01 +0000 as excerpted:
> On Sun, 23 Nov 2014 13:16:50 -0800, Marc MERLIN wrote:
>
> (snip)
>
>> That makes sense. I'll try to synthesize all this and rewrite my blog
>> post and the wiki to make this clearer.
>
> Maybe also add that as of 3.18 empty block groups are automatically
> collected, so balancing to prevent ENOSPC-by-empty-chunks is no longer
> necessary. This works pretty well; I haven't run balance in weeks, and
> my total-vs.-used overhead has always been <10 GB.
For those of us who have been around btrfs for a while, this still sounds
like the stuff of science fiction, perhaps possible sometime in the
future, but definitely not something we're yet used to having actually
automatically handled for us.
=:^)
So I think we're all glad it's here now, but kind of holding our breath
waiting for the bug due to this feature that stops everything cold. I think the
feeling is, OK, but let's not rock the boat too much in our haste to
celebrate, or we might find ourselves unexpectedly in the water once
again. Let's just go on teaching people to swim, assuming they'll need
to know how, and if they never do, well then that's a bonus! =:^)
But I think once we get into the 3.19 development cycle, if there's no
critical bugs with this 3.18 feature yet, then and only then are we
likely to start really talking about it. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-24 4:23 ` Duncan
@ 2014-11-24 12:35 ` Patrik Lundquist
2014-12-09 22:29 ` Patrik Lundquist
0 siblings, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-11-24 12:35 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 24 November 2014 at 05:23, Duncan <1i5t5.duncan@cox.net> wrote:
> Patrik Lundquist posted on Sun, 23 Nov 2014 16:12:54 +0100 as excerpted:
>
>> The balance run now finishes without errors with usage=99 and I think
>> I'll leave it at that. No RAID yet but will convert to RAID1.
>
> Converting between raid modes is done with a balance, so if you can't get
> that last bit to balance, you can't do a full conversion to raid1.
Good point! It slipped my mind. I'll report back if incremental
balances eventually solve the balance-after-conversion ENOSPC
problem.
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 1:07 ` Marc MERLIN
2014-11-23 7:52 ` Duncan
@ 2014-11-24 18:05 ` Brendan Hide
1 sibling, 0 replies; 36+ messages in thread
From: Brendan Hide @ 2014-11-24 18:05 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Hugo Mills, Patrik Lundquist, linux-btrfs@vger.kernel.org
On 2014/11/23 03:07, Marc MERLIN wrote:
> On Sun, Nov 23, 2014 at 12:05:04AM +0000, Hugo Mills wrote:
>>> Which is correct?
>> Less than or equal to 55% full.
>
> This confuses me. Does that mean that the fullest blocks do not get
> rebalanced?
"Balance has three primary benefits:
- free up some space for new allocations
- change storage profile
- balance/migrate data to or away from new or failing disks (the
original purpose of balance)
and one fringe benefit:
- force a data re-write (good if you think your spinning-rust needs to
re-allocate sectors)
In the regular case where you're not changing the storage profile or
migrating data between disks, there isn't much to gain from balancing
full chunks - and it involves a lot of work. For SSDs, it is
particularly bad for wear. For spinning rust it is merely a lot of
unnecessary work.
> I guess I was under the mistaken impression that the more data you had the
> more you could be out of balance.
>
>> A chunk is the part of a block group that lives on one device, so
>> in RAID-1, every block group is precisely two chunks; in RAID-0, every
>> block group is 2 or more chunks, up to the number of devices in the
>> FS. A chunk is usually 1 GiB in size for data and 250 MiB for
>> metadata, but can be smaller under some circumstances.
> Right. So, why would you rebalance empty chunks or near empty chunks?
> Don't you want to rebalance almost full chunks first, and work your way to
> less and less full as needed?
Balancing empty chunks makes them available for re-allocation - so that
is directly useful and light on workload.
--
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 7:52 ` Duncan
2014-11-23 15:12 ` Patrik Lundquist
2014-11-23 21:16 ` Marc MERLIN
@ 2014-12-07 21:38 ` Marc MERLIN
2 siblings, 0 replies; 36+ messages in thread
From: Marc MERLIN @ 2014-12-07 21:38 UTC (permalink / raw)
To: Duncan, Holger Hoffstätte; +Cc: linux-btrfs
On Sun, Nov 23, 2014 at 07:52:29AM +0000, Duncan wrote:
> > Right. So, why would you rebalance empty chunks or near empty chunks?
> > Don't you want to rebalance almost full chunks first, and work your way
> > to less and less full as needed?
>
> No, the closer to empty a chunk is, the more effect you can get in
> rebalancing it along with others of the same fullness.
On Sun, Nov 23, 2014 at 10:49:01PM +0000, Holger Hoffstätte wrote:
> Maybe also add that as of 3.18 empty block groups are automatically
> collected, so balancing to prevent ENOSPC-by-empty-chunks is no longer
> necessary. This works pretty well; I haven't run balance in weeks,
> and my total-vs.-used overhead has always been <10 GB.
Sorry for the delay in confirming this.
I've corrected both
https://btrfs.wiki.kernel.org/index.php/Balance_Filters#Balancing_to_fix_filesystem_full_errors
and
http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
with your input.
Thanks much for that.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-24 12:35 ` Patrik Lundquist
@ 2014-12-09 22:29 ` Patrik Lundquist
2014-12-09 23:13 ` Robert White
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-09 22:29 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 24 November 2014 at 13:35, Patrik Lundquist
<patrik.lundquist@gmail.com> wrote:
> On 24 November 2014 at 05:23, Duncan <1i5t5.duncan@cox.net> wrote:
>> Patrik Lundquist posted on Sun, 23 Nov 2014 16:12:54 +0100 as excerpted:
>>
>>> The balance run now finishes without errors with usage=99 and I think
>>> I'll leave it at that. No RAID yet but will convert to RAID1.
>>
>> Converting between raid modes is done with a balance, so if you can't get
>> that last bit to balance, you can't do a full conversion to raid1.
>
> Good point! It slipped my mind. I'll report back if incremental
> balances eventually solves the balance after conversion ENOSPC
> problem.
I'm having no luck with a full balance of the converted filesystem.
Tried it again with Linux v3.18.0 and btrfs-progs v3.17.3.
What conclusions can be drawn from the following?
BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 4773171200 free, is not full
BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
reserved=99700736, may_use=2102390784, readonly=241664
BTRFS: block group 234109272064 has 5368709120 bytes, 5368709120 used
0 pinned 0 reserved
BTRFS info (device sdc1): block group has cluster?: no
BTRFS info (device sdc1): 0 blocks of free space at or bigger than bytes is
BTRFS: block group 242699206656 has 5368709120 bytes, 5368709120 used
0 pinned 0 reserved
BTRFS info (device sdc1): block group has cluster?: no
BTRFS info (device sdc1): 0 blocks of free space at or bigger than bytes is
BTRFS: block group 339335970816 has 5368709120 bytes, 5368705024 used
0 pinned 0 reserved
BTRFS critical (device sdc1): entry offset 344704675840, bytes 4096, bitmap no
Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
Total devices 1 FS bytes used 1.35TiB
devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
Data, single: total=1.35TiB, used=1.35TiB
System, single: total=32.00MiB, used=112.00KiB
Metadata, single: total=3.00GiB, used=1.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Checking filesystem on /dev/sdc1
UUID: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
found 825003219475 bytes used err is 0
total csum bytes: 1452612464
total tree bytes: 1669943296
total fs tree bytes: 39600128
total extent tree bytes: 52903936
btree space waste bytes: 79921034
file data blocks allocated: 1487627730944
referenced 1487627730944
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 22:29 ` Patrik Lundquist
@ 2014-12-09 23:13 ` Robert White
2014-12-10 7:19 ` Patrik Lundquist
2014-12-09 23:20 ` Robert White
2014-12-09 23:48 ` Robert White
2 siblings, 1 reply; 36+ messages in thread
From: Robert White @ 2014-12-09 23:13 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
> Total devices 1 FS bytes used 1.35TiB
> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
>
>
> Data, single: total=1.35TiB, used=1.35TiB
> System, single: total=32.00MiB, used=112.00KiB
> Metadata, single: total=3.00GiB, used=1.55GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
Are you trying to convert a filesystem on a single device/partition to
RAID 1?
I don't think that's legal. Without a second slice to distribute the
copies of the data onto there is no raiding to be done.
Add the second device with btrfs device add, and _then_ use balance to
redistribute and copy the data to the second device.
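(I.e., roughly, with the device and mountpoint illustrative:)

btrfs device add /dev/sdd1 /mountpoint
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mountpoint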
ASIDE: I, personally, think that a single device RAID1 should be legal.
I also think that it should be possible to tell the system that you want
N copies if you have N-or-more slices onto which they would spread.
These would match my expectations from mdadm and several hardware and
appliance RAID solutions. But my opinions in the matter do _not_ match
the BTRFS code base. RAID1 means exactly two devices (for any given
piece of information) [though I don't know whether it always has to be
the _same_ two devices for two different pieces of information.]
So yea, if that is what you are trying to do, the inability to find a
second drive on which to allocate the peer-block(s) for an extent would
produce interesting errors. I can't say for sure that this is the exact
genesis of your issue, but I've read here in other threads a number of
comments that would translate as "trying to set RAID1 with on a
one-slice file system will be full of fail".
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 22:29 ` Patrik Lundquist
2014-12-09 23:13 ` Robert White
@ 2014-12-09 23:20 ` Robert White
2014-12-09 23:48 ` Robert White
2 siblings, 0 replies; 36+ messages in thread
From: Robert White @ 2014-12-09 23:20 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
> Data, single: total=1.35TiB, used=1.35TiB
> System, single: total=32.00MiB, used=112.00KiB
> Metadata, single: total=3.00GiB, used=1.55GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
P.S. you should re-balance your System and Metadata as "DUP" for now.
Two copies of that stuff are better than one, as right now you have no
real recovery path for that stuff. If you didn't make that change on
purpose it probably got down-revved from DUP automagically when you
tried to RAID it, e.g. block-by-block the dups were removed but then
there was no place to put the mirrored copy.
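(Something like the following should do it; as far as I recall the
balance tool insists on -f before it will touch system chunks, so
double-check against your btrfs-progs:)

btrfs balance start -mconvert=dup -sconvert=dup -f /mountpoint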
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 22:29 ` Patrik Lundquist
2014-12-09 23:13 ` Robert White
2014-12-09 23:20 ` Robert White
@ 2014-12-09 23:48 ` Robert White
2014-12-10 0:01 ` Robert White
2 siblings, 1 reply; 36+ messages in thread
From: Robert White @ 2014-12-09 23:48 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
> (stuff depicting a nearly full file system).
Having taken another look at it all, I'd bet (there is not sufficient
information to be _sure_ from the output you've provided) that you don't
have the necessary 1 GiB free on your disk slice to allocate another data
extent. The COW portion of some consolidation event is being blocked for
lack of any place to put the condensed/congealed result of balancing one
or more of your blocks. You are going to have to grow your filesystem by
at least 1 gig to get the balance to complete as-is; or alternately
remove at least a gig worth of "large files" (e.g. files that are stored
in a DATA extent as opposed to small ones stored in the metadata).
In the alternate, if you have a "bigger drive" to use, then add that
device to the file system (q.v. "btrfs device add /dev/sdd1 /muntpoint")
and then remove the current device (q.v. "btrfs device delete /dev/sdc1
/mountpoint"). You now have the filesystem on a bigger media where stuff
can happen correctly.
At _that_ point you can RAID the larger device to equally sized peers
etc. if your actual goal is to establish full redundancy.
===
If the whole raiding thing was about running out of space, then you are
actually "done" as soon as you add the second device. It will be used
automatically, and you can balance over to it directly or not as you see
fit.
In particular if you have another drive/slice of equal size and you
intend to spread out onto it, your best choices are ::
btrfs device add /dev/sdd1 /mp
btrfs balance start -dconvert=raid0 -mconvert=dup -sconvert=dup /mp
--or--
btrfs balance start -dconvert=raid0 -mconvert=raid1 -sconvert=raid1 /mp
(the latter better preserves information if a media fails, but
since your bulk data isn't being stored redundantly it probably doesn't
matter which you use.)
Once the second drive is in place you'll have the room you need for the
balances to finish.
In the second model you'll be spreading your bulk data out onto the two
drives "evenly" via striping (raid0) and your metadata will be
duplicated between the two slices.
If you are adding storage and you are _not_ going to be adding it all
with -dconvert=raid1, it doesn't matter that you don't currently have
the needed space for a balance to complete on the currently full media.
If you are trying to raid1 your entire filesystem when you are already
effectively out of space, you will find no joy. Adding a full media
raid1 is a no-op for available space.
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 23:48 ` Robert White
@ 2014-12-10 0:01 ` Robert White
2014-12-10 12:47 ` Duncan
0 siblings, 1 reply; 36+ messages in thread
From: Robert White @ 2014-12-10 0:01 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/09/2014 03:48 PM, Robert White wrote:
> On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
> > (stuff depicting a nearly full file system).
>
> Having taken another look at it all, I'd bet (there is not sufficient
> information to be _sure_ from the output you've provided) that you don't
> have the necessary 1Gb free on your disk slice to allocate another data
> extent. The COW portion of some consolidation event is being blocked for
> lack of any place to put the condensed/congealed result of balancing one
> or more of your blocks. You are gong to have to grow your filesystem by
> at least 1 gig to get the balance to complete as-is; or alternately
> remove at least a gig worth of "large files" (e.g. files that are stored
> in a DATA extent as opposed to small ones stored in the metadata).
>
> In the alternate, if you have a "bigger drive" to use, then add that
> device to the file system (q.v. "btrfs device add /dev/sdd1 /muntpoint")
> and then remove the current device (q.v. "btrfs device delete /dev/sdc1
> /mountpoint"). You now have the filesystem on a bigger media where stuff
> can happen correctly.
>
> At _that_ point you can RAID the larger device to equally sized peers
> etc. if your actual goal is to establish full redundancy.
>
> ===
>
> If the whole raiding thing was about running out of space, then you are
> actually "done" as soon as you add the second device. It will be used
> automatically, and you can balance over to it directly or not as you see
> fit.
>
> In particular if you have another drive/slice of equal size and you
> intend to spread out onto it, your best choices are ::
>
> btrfs device add /dev/sdd1 /mp
EDIT :: SLIGHT BOO BOO maybe...
You may not have enough spare room to get the -dconvert=raid0 to run
right away (I've never done the experiment, but you are probably still
COW-blocked), as I don't know if the convert will drop back to "make
room" mode and just bump some data aside before doing the conversion. So
you might need to do a limited data balance before you run either of the
commands below.
btrfs balance start -dlimit=20 /mp #20 is a wild guess at a good number
This will examine 20 chunks, and likely move at least one or two, if not
all twenty, onto the second drive. This will make room for the
subsequent raid0 segments on the first drive.
>
> btrfs balance start -dconvert=raid0 -mconvert=dup -sconvert=dup /mp
> --or--
> btrfs balance start -dconvert=raid0 -mconvert=raid1 -sconvert=raid1 /mp
>
> (the latter better preserves information if a media fails, but
> since your bulk data isn't being stored redundantly it probably doesn't
> matter which you use.)
>
> Once the second drive is in place you'll have the room you need for the
> balances to finish.
>
> In the second model you'll be spreading your bulk data out onto the two
> drives "evenly" via striping (raid0) and your metadata will be
> duplicated between the two slices.
>
> If you are adding storage and you are _not_ going to be adding it all
> with -dconvert=raid1, it doesn't matter that you don't currently have
> the needed space for a balance to complete on the currently full media.
> If you are trying to raid1 your entire filesystem when you are already
> effectively out of space, you will find no joy. Adding a full media
> raid1 is a no-op for available space.
>
(EDIT::Continued.) Worst case, just add the second device and run a
balance with no arguments to spread your data out. Then run the format
specific conversions to get it all rational and optimal.
Full filesystems always get into corner cases.
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 23:13 ` Robert White
@ 2014-12-10 7:19 ` Patrik Lundquist
2014-12-10 12:17 ` Robert White
0 siblings, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-10 7:19 UTC (permalink / raw)
To: Robert White; +Cc: linux-btrfs@vger.kernel.org
On 10 December 2014 at 00:13, Robert White <rwhite@pobox.com> wrote:
> On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
>>
>> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
>> Total devices 1 FS bytes used 1.35TiB
>> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
>>
>>
>> Data, single: total=1.35TiB, used=1.35TiB
>> System, single: total=32.00MiB, used=112.00KiB
>> Metadata, single: total=3.00GiB, used=1.55GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> Are you trying to convert a filesystem on a single device/partition to RAID
> 1?
Not yet. I'm stuck at the full balance after the conversion from ext4.
I haven't added the disks for RAID1 and might need them for starting
over instead.
A balance with -musage=100 -dusage=99 works but a full balance fails. It would
be nice to nail the bug since the fs passes btrfs check and it seems
to be a clear ENOSPC bug.
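(For reference, the filtered balance that does complete here is along
the lines of
  btrfs balance start -musage=100 -dusage=99 /mnt
while a plain "btrfs balance start /mnt" is the one that runs out of space.)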
I don't know how to interpret the space_info error. Why is only
4773171200 (4,4GiB) free?
Can I inspect block group 1821099687936 to try to find out what makes
it problematic?
BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 4773171200 free, is not full
BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
reserved=99700736, may_use=2102390784, readonly=241664
> P.S. you should re-balance your System and Metadata as "DUP" for now. Two
> copies of that stuff is better than one as right now you have no real
> recovery path for that stuff. If you didn't make that change on purpose it
> probably got down-revved from DUP automagically when you tried to RAID it.
Good point. Maybe btrfs-convert should do that by default? I don't
think it has ever been DUP.
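(The DUP re-balance Robert suggests would be along the lines of
  btrfs balance start -mconvert=dup /mnt
with an -sconvert=dup filter for the system chunks as well; this is a
sketch only, /mnt being the mount point used elsewhere in the thread.)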
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 7:19 ` Patrik Lundquist
@ 2014-12-10 12:17 ` Robert White
2014-12-10 13:11 ` Duncan
2014-12-10 13:36 ` Patrik Lundquist
0 siblings, 2 replies; 36+ messages in thread
From: Robert White @ 2014-12-10 12:17 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org
On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
> On 10 December 2014 at 00:13, Robert White <rwhite@pobox.com> wrote:
>> On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
>>>
>>> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
>>> Total devices 1 FS bytes used 1.35TiB
>>> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
>>>
>>>
>>> Data, single: total=1.35TiB, used=1.35TiB
>>> System, single: total=32.00MiB, used=112.00KiB
>>> Metadata, single: total=3.00GiB, used=1.55GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>>
>> Are you trying to convert a filesystem on a single device/partition to RAID
>> 1?
>
> Not yet. I'm stuck at the full balance after the conversion from ext4.
> I haven't added the disks for RAID1 and might need them for starting
> over instead.
You are not "stuck" here as this step is not mandatory. (see below)
>
> A balance with -musage=100 -dusage=99 works but a full fails. It would
> be nice to nail the bug since the fs passes btrfs check and it seems
> to be a clear ENOSPC bug.
Conversion from ext2/3/4 is constrained because it needs to be reversible.
If you are out of space this isn't a "bug", you are just out of space.
So by telling the system to ignore the 100% full clusters it is free to
juggle the fragments. But once you get into moving the fully full
extents, the COW features _MUST_ have access to _contiguous_ 1 GiB blocks
to make the new extents into which the Copy will be Written. If your file
system was nearly full it's completely likely that there are no such
contiguous blocks available to make the necessary extents.
BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
filesystem. That is, the recommended balance (and recursive defrag) is
_not_ a usability issue, it's an efficiency issue.
Check what you've got. Make sure it is good. Make sure you are cool with
it all. When you know everything is usable then remove the undo
information snapshot. That snapshot is pinning a _lot_ of data into
exact positions on disk. It's memorializing your previous fragmentation
and the anniversary positions of all the EXT4 data structures. Since
your system is basically full that undo information has to go.
At that point your balance will probably have the room it needs.
_Then_ you can balance if you feel the desire.
If you are _still_ out of space you'll need to add some, at least
temporarily, to give the system enough room to work.
Since we all _know_ you are a diligent system administrator and
architect with a good, recent, and well tested backup we know we can
recommend that you just dump the undo partition with a nice btrfs subvol
delete, right? Because you made a backup and everything yes?
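As a minimal sketch of that step (assuming the default ext2_saved name
that btrfs-convert gives the undo snapshot, and /mnt as the mount point):
  btrfs subvolume list /mnt               # the saved image shows up as ext2_saved
  btrfs subvolume delete /mnt/ext2_saved
  btrfs balance start /mnt                # afterwards, and only if you still want to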
So anyway. Your system isn't "bugged" or "broken", it's "full", but it's a
fragmented fullness that has lots of free sectors but insufficient
contiguous free sectors, so it cannot satisfy the request.
That Said...
I suspect you _have_ revealed a problem with the error reporting in the
case of "scary and wrong error message".
The allocator in extent-tree.c just tells you the raw free space on the
disk and says "hua... there are lots of bytes out there".
Which is _WAY_ different from "there are enough bytes all in one clump
to satisfy my needs". E.g. there is _not_ a lot of brains behind the message.
ret = find_free_extent(root, num_bytes, empty_size, hint_byte, ins,
                       flags, delalloc);

if (ret == -ENOSPC) {
        if (!final_tried && ins->offset) {
                num_bytes = min(num_bytes >> 1, ins->offset);
                num_bytes = round_down(num_bytes, root->sectorsize);
                num_bytes = max(num_bytes, min_alloc_size);
                if (num_bytes == min_alloc_size)
                        final_tried = true;
                goto again;
        } else if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
                struct btrfs_space_info *sinfo;

                sinfo = __find_space_info(root->fs_info, flags);
                btrfs_err(root->fs_info,
                          "allocation failed flags %llu, wanted %llu",
                          flags, num_bytes);
                if (sinfo)
                        dump_space_info(sinfo, num_bytes, 1);
        }
}
>
>
> I don't know how to interpret the space_info error. Why is only
> 4773171200 (4,4GiB) free?
> Can I inspect block group 1821099687936 to try to find out what makes
> it problematic?
>
> BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
> BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
> BTRFS: space_info 1 has 4773171200 free, is not full
> BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
> reserved=99700736, may_use=2102390784, readonly=241664
So it was looking for a single chunk 2013265920 bytes long and it
couldn't find one because all the spaces were smaller and there was no
room to make a new suitable space.
The problem is that it wanted 2013265920 bytes and the system as a
whole had no way to satisfy that desire. It asked for something just shy
of two gigs as a single extent. That's a tough order on a full platter.
Since your entire free size is 2102390784 that is an attempt to allocate
about 80% of your free space as one contiguous block. That's never going
to happen. 8-)
I don't even know if 2GiB is normally a legal size for an extent. My
understanding is that data is allocated in 1G chunks, so I'd expect all
extents to be smaller than 1G.
Normally...
But... I would bet that this 2gig monster is the image file, or part
thereof, that btrfs-convert left behind, and it may well be a magical
allocation of some sort. It may even be beyond the reach of balance et
al for being so large. But it _is_ within the bounds of the byte offsets
and sizes the file system uses.
After a quick glance at the btrfs-convert, it looks like it might make
some pretty atypical extents if the underlying donor filesystem needed
them. It wouldn't have had a choice. So it's easily within the
realm of reason that you'd have some really fascinating data as a result
of converting a nearly full EXT4 file system of the Terabyte+ size. This
would be quadruply true if you'd tweaked the block group ratios when you
made the original file system.
So since you have nice backups... you should probably drop the
ext2_saved subvolume and then get on with your life for good or ill.
But it's do or undo time.
AND UNDO IS NOT A BAD OPTION.
If you've got the media, building a fresh filesystem and copying the
contents onto it is my preferred method anyway. I get to set the options
I want (compression, skinny metadata, whatever) and I know I've got a
good backup on the original media. It's also the perfectly natural way
to get the subvolume boundaries where I want them and all that stuff.
Think of the time and worry you'd have saved if you'd copied the thing
in the first place. 8-)
So anyway...
Probably fine.
Probably just very full filesystem.
Clearly got some big whale files that just won't balance due to space.
Probably those files are the leftover EXT4 structures.
Probably okay to revert.
Probably okay to just delete the revert info.
The prior two items are mutually exclusive.
Since you have nice and validated backups you can't go wrong either way.
>
>> P.S. you should re-balance your System and Metadata as "DUP" for now. Two
>> copies of that stuff is better than one as right now you have no real
>> recovery path for that stuff. If you didn't make that change on purpose it
>> probably got down-revved from DUP automagically when you tried to RAID it.
>
> Good point. Maybe btrfs-convert should do that by default? I don't
> think it has ever been DUP.
Eyup.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 0:01 ` Robert White
@ 2014-12-10 12:47 ` Duncan
2014-12-10 20:11 ` Patrik Lundquist
0 siblings, 1 reply; 36+ messages in thread
From: Duncan @ 2014-12-10 12:47 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Tue, 09 Dec 2014 16:01:02 -0800 as excerpted:
> On 12/09/2014 03:48 PM, Robert White wrote:
>> On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
>>> (stuff depicting a nearly full file system).
>>
>> Having taken another look at it all, I'd bet (there is not sufficient
>> information to be _sure_ from the output you've provided) that you
>> don't have the necessary 1Gb free on your disk slice to allocate
>> another data extent.
[snip most of both quote levels]
> Full filesystems always get into corner cases.
But, from the content you snipped from his post, this from btrfs fi show:
>>> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
>>> Total devices 1 FS bytes used 1.35TiB
>>> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
Device 2.73 TiB, used only 1.36 TiB.
That's over a TiB of entirely unallocated space, so a mere 1 GiB chunk
allocation shouldn't be a problem.
I'm sticking with my original hypothesis (assuming this is a continuation
from the thread I think it was), that there's something about the
conversion from ext* that didn't work correctly; most likely a file
larger than the btrfs 1 GiB data-chunk size, that has an extent larger
than that size as well. Btrfs balance couldn't do anything with that, as
it's larger than the native 1 GiB data-chunk size and balance alone
doesn't know how to split it up.
The recursive btrfs defrag after deleting the saved ext* subvolume
_should_ have split up any such > 1 GiB extents so balance could deal
with them, but either it failed for some reason on at least one such
file, or there's some other weird corner-case going on, very likely
something else having to do with the conversion.
Patrik, assuming no btrfs snapshots yet, can you do a du --all --block-
size=1M | sort -n (or similar), then take a look at all results over 1024
(1 GiB since the du specified 1 MiB blocks), and see if it's reasonable
to move all those files out of the filesystem and back? Assuming there's
not too many of them, the idea is to kill the copy in the filesystem by
moving them elsewhere, then move them back so they get recreated using
native btrfs semantics -- no extents larger than the native btrfs data
chunk size of 1 GiB.
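A sketch of that filter, run against the mounted filesystem (the 1024
cutoff is in MiB, i.e. 1 GiB):

  du --all --block-size=1M /mnt | sort -n | awk '$1 > 1024'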
If you have lots of memory to work with, one method would be to create a
tmpfs, then /copy/ the files to tmpfs and /move/ them back to a temporary
tree on the btrfs, deleting the originals on btrfs only after the move
back from tmpfs and a sync (or btrfs fi sync) so there's always a
permanent copy if the machine should crash and take down the tmpfs with
it. After all the files have been processed and the originals deleted
you can then move the contents of the temporary tree back into the
original location.
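A rough sketch of that round trip; the size, directory names and file
name here are placeholders only:

  mkdir -p /tmp/stash /mnt/rewritten
  mount -t tmpfs -o size=8g tmpfs /tmp/stash
  cp /mnt/path/to/bigfile /tmp/stash/
  mv /tmp/stash/bigfile /mnt/rewritten/    # rewritten with native btrfs extents
  btrfs filesystem sync /mnt               # or plain sync
  rm /mnt/path/to/bigfile                  # only delete the original after the sync
  mv /mnt/rewritten/bigfile /mnt/path/to/bigfile
  umount /tmp/stash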
That should ensure no more > 1 GiB file extents and will I hope get rid
of the problem, as this workaround has been demonstrated to fix problems
other people had with converted-from-ext* btrfs, generally where they had
failed to run the defrag right after the conversion, and now had a bunch
more data on the filesystem and didn't want to have to defrag it too.
Obviously it works best when there's only a handful of > 1 GiB files,
however, and snapshots containing references to the affected files will
prevent the file delete from actually deleting the problematic extents.
With luck that'll allow a full 100% balance without error. If not, at
least it should eliminate the > 1 GiB file extents possibility, and the
focus can move to something else.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 12:17 ` Robert White
@ 2014-12-10 13:11 ` Duncan
2014-12-10 18:56 ` Patrik Lundquist
2014-12-10 13:36 ` Patrik Lundquist
1 sibling, 1 reply; 36+ messages in thread
From: Duncan @ 2014-12-10 13:11 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Wed, 10 Dec 2014 04:17:50 -0800 as excerpted:
>> BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
>> BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
>> BTRFS: space_info 1 has 4773171200 free, is not full
>> BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
>> reserved=99700736, may_use=2102390784, readonly=241664
>
> So it was looking for a single chunk 2013265920 bytes long and it
> couldn't find one because all the spaces were smaller and there was no
> room to make a new suitable space.
>
>> The problem is that it wanted 2013265920 bytes and the system as a
> whole had no way to satisfy that desire. It asked for something just shy
> of two gigs as a single extent. That's a tough order on a full platter.
>
> Since your entire free size is 2102390784 that is an attempt to allocate
> about 80% of your free space as one contiguous block. That's never going
> to happen. 8-)
>
> I don't even know if 2GiB is normally a legal size for an extent. My
> understanding is that data is allocated in 1G chunks, so I'd expect all
> extents to be smaller than 1G.
On native btrfs, an extent must fit within the 1 GiB data chunk size,
with extents inherited from an ext* conversion being an obvious non-
native exception.
I hadn't looked at the actual output, but that confirms my earlier
suspicion, that after the ext* saved subvolume delete, the defrag somehow
missed at least one file > 1 GiB with a "super-extent" also > 1 GiB in
size.
From there... I've never used it but I /think/ btrfs inspect-internal
logical-resolve should let you map the 182109... address to a filename.
From there, moving that file out of the filesystem and back in should
eliminate that issue.
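With the block-group address from the log above and /mnt as the mount
point, that would look like:
  btrfs inspect-internal logical-resolve 1821099687936 /mnt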
Assuming no snapshots still contain the file, of course, and that the
ext* saved subvolume has already been deleted.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 12:17 ` Robert White
2014-12-10 13:11 ` Duncan
@ 2014-12-10 13:36 ` Patrik Lundquist
2014-12-11 8:42 ` Robert White
1 sibling, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-10 13:36 UTC (permalink / raw)
To: Robert White; +Cc: linux-btrfs@vger.kernel.org
On 10 December 2014 at 13:17, Robert White <rwhite@pobox.com> wrote:
> On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
>>
> BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
> filesystem. That is, the recommended balance (and recursive defrag) is _not_
> a usability issue, it's an efficiency issue.
But if I can't start with an efficient filesystem I'd rather start
over now/soon. I intend to add four more old disks for a RAID1 and it
will be problematic to start over later on (I'd have to buy new, large
disks).
I deleted the subvolume after being satisfied with the conversion,
defragged recursively, and balanced. In that order.
> Because you made a backup and everything yes?
Shh!
> So anyway. Your system isn't "bugged" or "broken", it's "full", but it's a
> fragmented fullness that has lots of free sectors but insufficient
> free sectors, so it cannot satisfy the request.
It's a half full 3TB disk. There _is_ space, somewhere. I can't speak
for contiguous space though.
>> I don't know how to interpret the space_info error. Why is only
>> 4773171200 (4,4GiB) free?
>> Can I inspect block group 1821099687936 to try to find out what makes
>> it problematic?
>>
>> BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
>> BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
>> BTRFS: space_info 1 has 4773171200 free, is not full
>> BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
>> reserved=99700736, may_use=2102390784, readonly=241664
>
>
> So it was looking for a single chunk 2013265920 bytes long and it couldn't
> find one because all the spaces were smaller and there was no room to make a
> new suitable space.
>
> The problem is that it wanted 2013265920 bytes and the system as a
> whole had no way to satisfy that desire. It asked for something just shy of
> two gigs as a single extent. That's a tough order on a full platter.
>
> Since your entire free size is 2102390784 that is an attempt to allocate
> about 80% of your free space as one contiguous block. That's never going to
> happen. 8-)
What about "space_info 1 has 4773171200 free"? Besides the other 1,5TB
free space.
> I don't even know if 2GiB is normally a legal size for an extent. My
> understanding is that data is allocated in 1G chunks, so I'd expect all
> extents to be smaller than 1G.
The 'summary' after the failed balances is always something like "98
enospc errors" which now makes me suspect that I have 98 files with
extents larger than 1GiB that the defrag didn't take care of.
So if I can find out which files have >1GiB extents I can then copy
them back and forth to solve the problem.
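(One way to check a suspect file, as a sketch: filefrag from e2fsprogs
prints each extent's length in filesystem blocks with -v, and at 4 KiB
blocks anything longer than 262144 blocks is a >1GiB extent. The path
below is a placeholder.
  filefrag -v /mnt/path/to/suspect-file
)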
Maybe running defrag more times can also solve it? Can I get a list of
fragmented files?
Suppose an old file with 2GiB extent isn't fragmented, will btrfs
defrag still try to defrag it?
> After a quick glance at the btrfs-convert, it looks like it might make some
> pretty atypical extents if the underlying donor filesystem needed
> them. It wouldn't have had a choice. So it's easily within the realm of
> reason that you'd have some really fascinating data as a result of
> converting a nearly full EXT4 file system of the Terabyte+ size.
It was about half full at conversion.
> This would
> be quadruply true if you'd tweaked the block group ratios when you made the
> original file system.
Ext4 created with defaults, but I think it has been completely full at one time.
> So since you have nice backups... you should probably drop the ext2_saved
> subvolume and then get on with your life for good or ill.
Done before defrag and balance attempts.
> Think of the time and worry you'd have saved if you'd copied the thing in
> the first place. 8-)
But then I wouldn't learn as much. :-)
>>> P.S. you should re-balance your System and Metadata as "DUP" for now. Two
>>> copies of that stuff is better than one as right now you have no real
>>> recovery path for that stuff. If you didn't make that change on purpose
>>> it
>>> probably got down-revved from DUP automagically when you tried to RAID
>>> it.
>>
>>
>> Good point. Maybe btrfs-convert should do that by default? I don't
>> think it has ever been DUP.
>
> Eyup.
And the metadata is now DUP. That's ~1.5GB extra metadata that was
allocated just fine after the failed balance.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 13:11 ` Duncan
@ 2014-12-10 18:56 ` Patrik Lundquist
2014-12-10 22:28 ` Robert White
0 siblings, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-10 18:56 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 10 December 2014 at 14:11, Duncan <1i5t5.duncan@cox.net> wrote:
>
> From there... I've never used it but I /think/ btrfs inspect-internal
> logical-resolve should let you map the 182109... address to a filename.
> From there, moving that file out of the filesystem and back in should
> eliminate that issue.
btrfs inspect-internal logical-resolve 1821099687936 /mnt gives me the
filename and it's only a 54175 bytes file.
> Assuming no snapshots still contain the file, of course, and that the
> ext* saved subvolume has already been deleted.
Got no snapshots or subvolumes. Keeping it simple for now.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 12:47 ` Duncan
@ 2014-12-10 20:11 ` Patrik Lundquist
2014-12-11 4:02 ` Duncan
2014-12-11 4:49 ` Duncan
0 siblings, 2 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-10 20:11 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 10 December 2014 at 13:47, Duncan <1i5t5.duncan@cox.net> wrote:
>
> The recursive btrfs defrag after deleting the saved ext* subvolume
> _should_ have split up any such > 1 GiB extents so balance could deal
> with them, but either it failed for some reason on at least one such
> file, or there's some other weird corner-case going on, very likely
> something else having to do with the conversion.
I've run defrag several times again and it doesn't do anything additional.
> Patrik, assuming no btrfs snapshots yet, can you do a du --all --block-
> size=1M | sort -n (or similar), then take a look at all results over 1024
> (1 GiB since the du specified 1 MiB blocks), and see if it's reasonable
> to move all those files out of the filesystem and back?
Good idea, but it's quite a lot of files. I'd rather start over.
But I've identified 46 files from Btrfs errors in syslog and will try
to move them to another disk. They're ranging from 41KiB to 6.6GiB in
size.
Is btrfs-debug-tree -e useful in finding problematic files?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 18:56 ` Patrik Lundquist
@ 2014-12-10 22:28 ` Robert White
2014-12-11 4:13 ` Duncan
2014-12-11 6:16 ` Patrik Lundquist
0 siblings, 2 replies; 36+ messages in thread
From: Robert White @ 2014-12-10 22:28 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/10/2014 10:56 AM, Patrik Lundquist wrote:
> On 10 December 2014 at 14:11, Duncan <1i5t5.duncan@cox.net> wrote:
>> Assuming no snapshots still contain the file, of course, and that the
>> ext* saved subvolume has already been deleted.
>
> Got no snapshots or subvolumes. Keeping it simple for now.
Does that mean that you have already manually removed the subvolume that
was automatically created by btrfs-convert?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 20:11 ` Patrik Lundquist
@ 2014-12-11 4:02 ` Duncan
2014-12-11 4:49 ` Duncan
1 sibling, 0 replies; 36+ messages in thread
From: Duncan @ 2014-12-11 4:02 UTC (permalink / raw)
To: linux-btrfs
Patrik Lundquist posted on Wed, 10 Dec 2014 21:11:52 +0100 as excerpted:
> Is btrfs-debug-tree -e useful in finding problematic files?
Since you were replying directly to me, my answer...
ENOTENOUGHINFO
I don't know enough about it to honestly say, as I've never used it
myself and haven't seen anyone posting practical usage that I could make
note of in case I or someone else needed it later.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 22:28 ` Robert White
@ 2014-12-11 4:13 ` Duncan
2014-12-11 10:29 ` Patrik Lundquist
2014-12-11 6:16 ` Patrik Lundquist
1 sibling, 1 reply; 36+ messages in thread
From: Duncan @ 2014-12-11 4:13 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Wed, 10 Dec 2014 14:28:10 -0800 as excerpted:
> On 12/10/2014 10:56 AM, Patrik Lundquist wrote:
>> On 10 December 2014 at 14:11, Duncan <1i5t5.duncan@cox.net> wrote:
>>> Assuming no snapshots still contain the file, of course, and that the
>>> ext* saved subvolume has already been deleted.
>>
>> Got no snapshots or subvolumes. Keeping it simple for now.
>
> Does that mean that you have already manually removed the subvolume that
> was automatically created by btrfs-convert?
Yes, he had.
Patrik correct me if I have this wrong, but filling in the history as I
believe I have it...
If I'm keeping my cases straight, he had actually posted a thread some
weeks ago with the initial problem, saying he had followed the conversion
instructions to the letter -- conversion, delete-saved, defrag, balance,
and ran into this problem with balance. The conclusion at that time was
that he'd try successively larger balance -dusage=N figures, hoping to
work thru it that way.
That original thread could well have been shortly before you appeared on
the list, however, and you may not have seen it. Either that, or you saw
it but didn't connect that case with this one.
Anyway, yes, assuming I haven't gotten my casefiles mixed up, and
evidence so far is that I haven't, he did everything he was supposed to
and still ended up with this issue. Obviously there's still a bug
somewhere.
And now he's back. The incrementally increasing usage= balances reached
99%, but that last 1% is the sticking point, and he, and the rest of us,
are trying to figure out what happened and how to get him past it.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 20:11 ` Patrik Lundquist
2014-12-11 4:02 ` Duncan
@ 2014-12-11 4:49 ` Duncan
1 sibling, 0 replies; 36+ messages in thread
From: Duncan @ 2014-12-11 4:49 UTC (permalink / raw)
To: linux-btrfs
Patrik Lundquist posted on Wed, 10 Dec 2014 21:11:52 +0100 as excerpted:
>> Patrik, assuming no btrfs snapshots yet, can you do a du --all --block-
>> size=1M | sort -n (or similar), then take a look at all results over
>> 1024 (1 GiB since the du specified 1 MiB blocks), and see if it's
>> reasonable to move all those files out of the filesystem and back?
>
> Good idea, but it's quite a lot of files. I'd rather start over.
>
> But I've identified 46 files from Btrfs errors in syslog and will try to
> move them to another disk. They're ranging from 41KiB to 6.6GiB in size.
There's one as yet incomplete piece of the puzzle. I guess the devs
could probably answer this, but being a simple sysadmin, I don't claim to
read code well and don't know...
That log snippet you quoted earlier gave block-group addresses. That's
the chunks, in this case normally 1 GiB data chunks, but here we're
dealing with a conversion from ext4 and apparently the extents are
larger, nearly 2 GiB in this case according to that snippet.
That had me thinking the problem files were all > 1 GiB and had these
super-extents that btrfs can't work with.
But you say you tracked down the file as I suggested using btrfs-inspect-
internal, and the file is much smaller than that.
Now I don't even know for sure what that log snippet was from, a normal
dmesg during an attempted balance, or dmesg with btrfs debug turned on in
the kernel, or the userspace debug you ask about, or...
And not being a dev and not having done anything like this level myself,
I'm sort of feeling my way along here too, trying to figure things out as
you report them.
So the missing piece I'm talking about is this. OK, we have the address
of a nearly 2 GiB block group reported, and I recalled seeing in an
earlier post that trick with btrfs-inspect-internal, so I thought to try
it here.
But with the file being so much smaller than the 2 GiB block group
reported, something's not matching. Either the file is somehow using an
extent much much larger than it is (possible with fallocate, then writing
a shorter file, I believe), or the referred to block group actually
contains more than one file -- certainly btrfs data chunks can do so, but
given that we're dealing with a conversion here, I don't know if the same
rules apply, or...
Anyway, it's possible that smaller file is simply the first one in the
block group, thus being the one that was mapped when you plugged that
address into inspect-internal, and that the problem file is actually a
much larger file located after it in the same block group.
So if moving the small files doesn't do the trick, try feeding inspect-
internal with an address after that. Given that btrfs blocks are 4 KiB
in size, round the size of the small file up to the nearest 4 KiB and add
that to the address originally obtained from the log, and see if inspect-
internal, fed the new offset address, points at a different, presumably
much larger file (> 1 GiB, or at least big enough that it'd extend more
than a GiB beyond the original address). If so, try moving /that/
file, and see if you have any better luck.
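Worked through with the numbers from this thread, assuming the
54175-byte file sits right at the start of that block group: 54175
rounds up to 57344 bytes (14 x 4096), so the next address to try would
be 1821099687936 + 57344 = 1821099745280, i.e.
  btrfs inspect-internal logical-resolve 1821099745280 /mnt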
I was /hoping/ it would be the simple case and all the problem block-
group addresses would point to > 1 GiB files and moving them would be
it. But with a significant number of those addresses pointing at far
smaller files, either I was wrong about the use of inspect-internal here
and they're entirely unrelated, or the situation is otherwise rather more
complex than I was hoping to be the case.
OTOH, if for whatever reason all those smaller files were fallocated to
some huge size and then written smaller, or something similar happened
such that they're using huge > 1 GiB extents even while being smaller
than 1 GiB in size, that COULD go some distance to explaining why defrag
missed them. If defrag is looking at filesize and the files happen to be
small but in huge extents, and it's those extents causing the problem,
then we just found our bug, and all that's left is figuring out how to
fix it, which is where I step out and the devs step in. With a bit of
luck, that's it, and we're now well on the way to fixing a bug that could
have otherwise triggered unexplained problems for some people doing
conversions, but not others, for quite some time to come. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 22:28 ` Robert White
2014-12-11 4:13 ` Duncan
@ 2014-12-11 6:16 ` Patrik Lundquist
1 sibling, 0 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-11 6:16 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 10 December 2014 at 23:28, Robert White <rwhite@pobox.com> wrote:
> On 12/10/2014 10:56 AM, Patrik Lundquist wrote:
>>
>> On 10 December 2014 at 14:11, Duncan <1i5t5.duncan@cox.net> wrote:
>>>
>>> Assuming no snapshots still contain the file, of course, and that the
>>> ext* saved subvolume has already been deleted.
>>
>> Got no snapshots or subvolumes. Keeping it simple for now.
>
> Does that mean that you have already manually removed the subvolume that was
> automatically created by btrfs-convert?
Yes.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 13:36 ` Patrik Lundquist
@ 2014-12-11 8:42 ` Robert White
2014-12-11 9:02 ` Duncan
2014-12-11 9:55 ` Patrik Lundquist
0 siblings, 2 replies; 36+ messages in thread
From: Robert White @ 2014-12-11 8:42 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org
On 12/10/2014 05:36 AM, Patrik Lundquist wrote:
> On 10 December 2014 at 13:17, Robert White <rwhite@pobox.com> wrote:
>> On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
>>>
>> BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
>> filesystem. That is, the recommended balance (and recursive defrag) is _not_
>> a usability issue, it's an efficiency issue.
>
> But if I can't start with an efficient filesystem I'd rather start
> over now/soon. I intend to add four more old disks for a RAID1 and it
> will be problematic to start over later on (I'd have to buy new, large
> disks).
Nope, not an issue.
When you add the space and rebalance with the conversions by adding all
those other disks and such it will _completely_ _obliterate_ the current
balance.
You are cleaning the house before the maid comes.
PLUS:::
If you are going to add four more volumes, if those volumes are big
enough just make a new filesystem on them then copy the files over. You
wont have any freakish nonsense left over from the old drive and its
foibles. Then just add the existing drive to the "new" filesystem and
_then_ do the balance.
Right now you are at best trying to iron over cruft from the conversion
with the larger-than-1G extents and stuff that would never happen on a
fresh system.
PLUS:::
The whole "time saving" chance of doing a conversion? Well that window
closed last freaking month... 8-)
> I deleted the subvolume after being satisfied with the conversion,
> defragged recursively, and balanced. In that order.
Yea, but your file system is full and you are out of space so get on
with the adding space.
>> Because you made a backup and everything yes?
>
> Shh!
>> So anyway. Your system isn't "bugged" or "broken", it's "full", but it's a
>> fragmented fullness that has lots of free sectors but insufficient contiguous
>> free sectors, so it cannot satisfy the request.
>
> It's a half full 3TB disk. There _is_ space, somewhere. I can't speak
> for contiguous space though.
Contiguous space is all that matters here. It's trying to swallow a
brick that is _slightly_ larger than any extent ext4 would have likely
left hanging about.
(looking back through my mail spool) You haven't sent the output of
/bin/df or btrfs fi df yet, I'd like to see what those two commands say.
No Space (to allocate a storage extent)
is different than
No Space (to allocate file contents).
So the space may just be sitting there in the difference between your
data total= and your data used=
I mean this could easily be "situation normal" if your output looks like
"Data, single: total=3TiB, used=1.5TiB" or something.
>>> I don't know how to interpret the space_info error. Why is only
>>> 4773171200 (4,4GiB) free?
>>> Can I inspect block group 1821099687936 to try to find out what makes
>>> it problematic?
>>>
>>> BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
>>> BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
>>> BTRFS: space_info 1 has 4773171200 free, is not full
>>> BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
>>> reserved=99700736, may_use=2102390784, readonly=241664
>>
>>
>> So it was looking for a single chunk 2013265920 bytes long and it couldn't
>> find one because all the spaces were smaller and there was no room to make a
>> new suitable space.
>>
>> The problem is that it wanted 2013265920 bytes and the system as a
>> whole had no way to satisfy that desire. It asked for something just shy of
>> two gigs as a single extent. That's a tough order on a full platter.
>>
>> Since your entire free size is 2102390784 that is an attempt to allocate
>> about 80% of your free space as one contiguous block. That's never going to
>> happen. 8-)
>
> What about "space_info 1 has 4773171200 free"? Besides the other 1,5TB
> free space.
The "1" is the drive. That 4773171200 is not contiguous. I didn't look
much further in the code because it's a new code base to me. But its
asking for one contiguous extent of size 2013265920 and that's a
non-starter for me. With the odd sized chunks possible after a
conversion ... well pshaw...
>> I don't even know if 2GiB is normally a legal size for an extent. My
>> understanding is that data is allocated in 1G chunks, so I'd expect all
>> extents to be smaller than 1G.
>
> The 'summary' after the failed balances is always something like "98
> enospc errors" which now makes me suspect that I have 98 files with
> extents larger than 1GiB that the defrag didn't take care of.
Files? No, extents, a.k.a. "chunks", whatever those are after a
conversion. Room for extents is different from room for files.
> So if I can find out which files have >1GiB extents I can then copy
> them back and forth to solve the problem.
Deck Chairs. You are playing a game of musical deck chairs. Don't
obsess. 8-)
> Maybe running defrag more times can also solve it? Can I get a list of
> fragmented files?
I wouldn't expect defrag to do a thing about this. The extents in the
extent tree are not necessarily for single files. (they might _never_ be
for single files.)
> Suppose an old file with 2GiB extent isn't fragmented, will btrfs
> defrag still try to defrag it?
No idea. I'd think it would not move something that is already
contiguous. This isn't windows where the defrager itself leaves
micro-fragments after each file. (Don't get me started on that nonsense. 8-)
>> After a quick glance at the btrfs-convert, it looks like it might make some
>> pretty atypical extents if the underlying donor filesystem needed needed
>> them. It wouldn't have had a choice. So it's easily within the realm of
>> reason that you'd have some really fascinating data as a result of
>> converting a nearly full EXT4 file system of the Terabyte+ size.
>
> It was about half full at conversion.
Being the opposite of an expert on btrfs-convert I can't help wondering
where the threshold of discrimination is. I mean did it just take every
single block group and toss it into a separate extent with no eye to the
actual contents? That would be valid and fast. Then the allocation maps
per-file would handle you re-using the referenced space etc.
>> This would
>> be quadruply true if you'd tweaked the block group ratios when you made the
>> original file system.
>
> Ext4 created with defaults, but I think it has been completely full at one time.
Did you use e4defrag before you did the conversion or is this the result
of converting chaos most profound?
>> So since you have nice backups... you should probably drop the ext2_saved
>> subvolume and then get on with your life for good or ill.
>
> Done before defrag and balance attempts.
Good job.
>> Think of the time and worry you'd have saved if you'd copied the thing in
>> the first place. 8-)
>
> But then I wouldn't learn as much. :-)
Learning not to cut corners is a lesson... 8-)
>>>> P.S. you should re-balance your System and Metadata as "DUP" for now. Two
>>>> copies of that stuff is better than one as right now you have no real
>>>> recovery path for that stuff. If you didn't make that change on purpose
>>>> it
>>>> probably got down-revved from DUP automagically when you tried to RAID
>>>> it.
>>>
>>>
>>> Good point. Maybe btrfs-convert should do that by default? I don't
>>> think it has ever been DUP.
>>
>> Eyup.
>
> And the metadata is now DUP. That's ~1.5GB extra metadata that was
> allocated just fine after the failed balance.
More evidence that you are just trying to swallow a brick. Metadata is
done in like 256Mb chunks I think, so yea, lots of room for that left
sitting around on a typical EXT4 etc.
TRUTH BE TOLD :: After two very "eventful" conversions not too long ago
I just don't do those any more. The total amount of time I "saved" by
not copying the files was in the negative numbers before I just copied
the files onto an external media and reformatted and restored.
Additionally I got the chance to lay out my subvolumes and decide about
compression and such before doing the restore.
With a new filesystem I knew exactly what I was getting for a layout and
I've had no mysteries since.
I don't know if that's politic to say in this list but really, most
format conversions I've ever done (hearkening all the way back to some
9-track tape excitement in the eighties) usually leave me feeling like
maybe I hacked the corners off a cardboard box with a machete to make it
fit in under a sofa.
Then again I am getting old and sometimes it's easier to just chase kids
off your lawn. 8-)
SO .....
What I'd do, most to least likely.
(0) look at df and btrfs fi df output and see if I could account for the
free space I expected. If it's there I'd post a "oh hey, look at that"
message on the list and then move on to one of the latter options.
then
(1) Make a new FS on those other drives and copy my working set onto
it; that way I get all the defaults in sizes and extents and it will all
be nice round numbers like 1G and 256Mb, because some day I might be
adding in SSDs or something.
(1a) Then I could maybe keep the old drive and dissect its contents for
fun and knowledge.
(1b) Then I could just add the old drive into the new array once I
needed the storage.
or else
(2) Hook up the new drives and add them into the existing filesystem.
Then balance everything (see the sketch after this list).
(2a) Look at the extent maps after that and discover that I still had
odd 2-ish gig extents and silently fume at the asymmetry.
or else
(3) Keep fiddling with it till I got frustrated then go back to one of
the prior options. 8-)
It's like a choose-your-own-adventure book! 8-)
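A sketch of option (2), with made-up device names for the four added
drives and raid1 as the target since that was the stated plan:
  btrfs device add /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /mnt
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt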
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-11 8:42 ` Robert White
@ 2014-12-11 9:02 ` Duncan
2014-12-11 9:55 ` Patrik Lundquist
1 sibling, 0 replies; 36+ messages in thread
From: Duncan @ 2014-12-11 9:02 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Thu, 11 Dec 2014 00:42:38 -0800 as excerpted:
> TRUTH BE TOLD :: After two very "eventful" conversions not too long ago
> I just don't do those any more. The total amount of time I "saved" by
> not copying the files was in the negative numbers before I just copied
> the files onto an external media and reformatted and restored.
While I was running reiserfs and thus wasn't a conversion candidate, I
have basically the same opinion of the ext* -> btrfs conversion tool.
It's for people who don't have the extra space resources necessary to do
a full backup, wipe clean and set it up the way you like, then restore.
That said, the conversion and subsequent btrfs troubleshooting has
certainly been a "real world" learning experience for you (Patrik, not
Robert as quoted above), and while I'd certainly start clean when I was
really going to do it, for initially playing around, learning the tools,
some troubleshooting, etc, and just to be able to say I've tried the
conversion, I could easily see myself spending some time doing what you
did, just learning the ropes, etc. When I was done playing and ready to
do it for real, I'd wipe the playground and start clean, more confident
in my setup and management since I had spent some time playing with it
and familiarizing myself with how it worked and what might work best in
terms of my own setup.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-11 8:42 ` Robert White
2014-12-11 9:02 ` Duncan
@ 2014-12-11 9:55 ` Patrik Lundquist
2014-12-11 11:01 ` Robert White
1 sibling, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-11 9:55 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 11 December 2014 at 09:42, Robert White <rwhite@pobox.com> wrote:
> On 12/10/2014 05:36 AM, Patrik Lundquist wrote:
>>
>> On 10 December 2014 at 13:17, Robert White <rwhite@pobox.com> wrote:
>>>
>>> On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
>>>>
>>>>
>>> BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
>>> filesystem. That is, the recommended balance (and recursive defrag) is
>>> _not_
>>> a usability issue, it's an efficiency issue.
>>
>>
>> But if I can't start with an efficient filesystem I'd rather start
>> over now/soon. I intend to add four more old disks for a RAID1 and it
>> will be problematic to start over later on (I'd have to buy new, large
>> disks).
>
>
> Nope, not an issue.
>
> When you add the space and rebalance with the conversions by adding all
> those other disks and such it will _completely_ _obliterate_ the current
> balance.
But if the issue is too large extents, why would they fit on any added
btrfs space?
> You are cleaning the house before the maid comes.
Indeed, as a health check. And the patient is slightly ill.
> If you are going to add four more volumes, if those volumes are big enough
> just make a new filesystem on them then copy the files over.
As it looks now, I will, but I also think there's a bug which I'm
trying to zero in on.
>> I deleted the subvolume after being satisfied with the conversion,
>> defragged recursively, and balanced. In that order.
>
> Yea, but your file system is full and you are out of space so get on with
> the adding space.
I don't think it is full. balance -musage=100 -dusage=99 completes
with ~1.5TB free space. The remaining unbalanced data is using full or
close to full blocks. Still can't speak for contiguous space though.
> (looking back through my mail spool) You haven't sent the output of /bin/df
> or btrfs fi df yet, I'd like to see what those two commands say.
I have posted these before, but not /bin/df (no access at the moment).
btrfs fi show
Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
Total devices 1 FS bytes used 1.35TiB
devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
btrfs fi df /mnt
Data, single: total=1.35TiB, used=1.35TiB
System, single: total=32.00MiB, used=112.00KiB
Metadata, single: total=3.00GiB, used=1.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
btrfs check /dev/sdc1
Checking filesystem on /dev/sdc1
UUID: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
found 825003219475 bytes used err is 0
total csum bytes: 1452612464
total tree bytes: 1669943296
total fs tree bytes: 39600128
total extent tree bytes: 52903936
btree space waste bytes: 79921034
file data blocks allocated: 1487627730944
referenced 1487627730944
>>> This would
>>> be quadruply true if you'd tweaked the block group ratios when you made
>>> the original file system.
>>
>> Ext4 created with defaults, but I think it has been completely full at one
>> time.
>
> Did you use e4defrag before you did the conversion or is this the result of
> converting chaos most profound?
Didn't use e4defrag.
>>> Think of the time and worry you'd have saved if you'd copied the thing in
>>> the first place. 8-)
>>
>> But then I wouldn't learn as much. :-)
>
> Learning not to cut corners is a lesson... 8-)
This is more of an experiment than cutting corners, but yeah.
> TRUTH BE TOLD :: After two very "eventful" conversions not too long ago I
> just don't do those any more. The total amount of time I "saved" by not
> copying the files was in the negative numbers before I just copied the files
> onto an external media and reformatted and restored.
Conversion probably should be discouraged on the wiki then.
> It's like a choose-your-own-adventure book! 8-)
I like that! :-)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-11 4:13 ` Duncan
@ 2014-12-11 10:29 ` Patrik Lundquist
0 siblings, 0 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-11 10:29 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 11 December 2014 at 05:13, Duncan <1i5t5.duncan@cox.net> wrote:
>
> Patrik correct me if I have this wrong, but filling in the history as I
> believe I have it...
You're right Duncan, except it began as a private question about an
error in a blog and went from there. Not that it matters, except the
subject is not very fitting anymore and I tried to reboot the thread
with a summary since it's getting a bit hard to find the facts.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-11 9:55 ` Patrik Lundquist
@ 2014-12-11 11:01 ` Robert White
0 siblings, 0 replies; 36+ messages in thread
From: Robert White @ 2014-12-11 11:01 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/11/2014 01:55 AM, Patrik Lundquist wrote:
> On 11 December 2014 at 09:42, Robert White <rwhite@pobox.com> wrote:
>> On 12/10/2014 05:36 AM, Patrik Lundquist wrote:
>>>
>>> On 10 December 2014 at 13:17, Robert White <rwhite@pobox.com> wrote:
>>>>
>>>> On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
>>>>>
>>>>>
>>>> BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
>>>> filesystem. That is, the recommended balance (and recursive defrag) is
>>>> _not_
>>>> a usability issue, it's an efficiency issue.
>>>
>>>
>>> But if I can't start with an efficient filesystem I'd rather start
>>> over now/soon. I intend to add four more old disks for a RAID1 and it
>>> will be problematic to start over later on (I'd have to buy new, large
>>> disks).
>>
>>
>> Nope, not an issue.
>>
>> When you add the space and rebalance with the conversions by adding all
>> those other disks and such it will _completely_ _obliterate_ the current
>> balance.
>
> But if the issue is too large extents, why would they fit on any added
> btrfs space?
Because that added btrfs space will be _empty_. It's not that the extent
is "too big" by some absolute measure. It's that it's too big to fit in
the available space at the _extent_ _tree_ level.
You can't put two feet into one shoe.
>> You are cleaning the house before the maid comes.
>
> Indeed, as a health check. And the patient is slightly ill.
Not really...
So let's say I have a bunch of things that are all size 10-inches
And lets say I space them along a rail with 9-inches between each object.
And I glue them down (because Copy On Write only)
And I do that until the rail is "full", say it takes 100 to fill the rail.
So I still have 900 inches of "free space" but I don't have _any_ _more_
_room_ available if I need to mount another 10-inch item.
There's plenty of space but there is no room.
This is what you've got going on.
The conversion hoovered up all the block groups from the ext4 donor
image more-or-less, and then it built the metadata blocks
(see btrfs-convert at about line 1486)
/* for each block group, create device extent and chunk item */
etc...
>> If you are going to add four more volumes, if those volumes are big enough
>> just make a new filesystem on them then copy the files over.
>
> As it looks now, I will, but I also think there's a bug which I'm
> trying to zero in on.
It doesn't exist. There is no bug that I can see from anything you've shown.
You are confusing the word "extent" as used in ext4, which is a per-file
thing, with the word "extent" as used differently in btrfs which is a
raw storage region into which other structures or data is placed.
>>> I deleted the subvolume after being satisfied with the conversion,
>>> defragged recursively, and balanced. In that order.
>>
>> Yea, but your file system is full and you are out of space so get on with
>> the adding space.
>
> I don't think it is full. balance -musage=100 -dusage=99 completes
> with ~1.5TB free space. The remaining unbalanced data is using full or
> close to full blocks. Still can't speak for contiguous space though.
>
>
>> (looking back through my mail spool) You haven't sent the output of /bin/df
>> or btrfs fi df yet, I'd like to see what those two commands say.
>
> I have posted these before, but not /bin/df (no access at the moment).
Ah, yes, I remember these, but the /bin/df is what's going to be
dispositive.
> btrfs fi show
> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
> Total devices 1 FS bytes used 1.35TiB
> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
>
>
> btrfs fi df /mnt
> Data, single: total=1.35TiB, used=1.35TiB
> System, single: total=32.00MiB, used=112.00KiB
> Metadata, single: total=3.00GiB, used=1.55GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> btrfs check /dev/sdc1
> Checking filesystem on /dev/sdc1
> UUID: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
> found 825003219475 bytes used err is 0
> total csum bytes: 1452612464
> total tree bytes: 1669943296
> total fs tree bytes: 39600128
> total extent tree bytes: 52903936
> btree space waste bytes: 79921034
> file data blocks allocated: 1487627730944
> referenced 1487627730944
>>>> This would
>>>> be quadruply true if you'd tweaked the block group ratios when you made
>>>> the original file system.
>>>
>>> Ext4 created with defaults, but I think it has been completely full at one
>>> time.
>>
>> Did you use e4defrag before you did the conversion or is this the result of
>> converting chaos most profound?
>
> Didn't use e4defrag.
Probably doesn't matter. Now that I've read more of btrfs-convert.c I
think I can see how this is shaking out. e4defrag might have packed the
block groups tighter but it doesn't really try to maximize free space
within the extent.
>>>> Think of the time and worry you'd have saved if you'd copied the thing in
>>>> the first place. 8-)
>>>
>>> But then I wouldn't learn as much. :-)
>>
>> Learning not to cut corners is a lesson... 8-)
>
> This is more of an experiment than cutting corners, but yeah.
>
>
>> TRUTH BE TOLD :: After two very "eventful" conversions not too long ago I
>> just don't do those any more. The total amount of time I "saved" by not
>> copying the files was in the negative numbers before I just copied the files
>> onto an external media and reformatted and restored.
>
> Conversion probably should be discouraged on the wiki then.
I didn't pursue the wiki on the matter, but conversion of anything to
anything always requires living with the limits of both, at least to
start. In this case you are suffering under the burden of the block
group alignment and layout that was selected by mkfs.ext4, which is
based on assumptions optimal to ext4.
Systems are _rarely_ replaced by other systems based on the same
assumptions.
As a terrible aside example, EXT4 says it can support file extent sizes
up to two gig. But that assumes your CPU memory page size is 64k. On a
typical Intel PC the page size is 4k, so your maximum extent size is
1/16th that size (128 MiB). I filed a bug on that some time ago because e4defrag
output didn't take that into account.
e.g. http://sourceforge.net/p/e2fsprogs/bugs/314/
The mythology of that two-gig file extent has people allocating VM drive
stripes and rdbms files (etc) in two-gig chunks thinking they are
optimally aligning things with their drive allocations. But when they do
it on an intel box they are wrong. Those extents should have been 128Meg
if they wanted one file equals one extent layouts.
So assumptions in systems can become pernicious, and when you try to do
any sort of in-place conversion you are likely to end up with the least
of all worlds.
The devils are always in the details.
Heck, we are still dragging around head/track/sector disk geometry
nonsense despite variable pitch recording performed on modern drives.
That's because we just keep converting old ideas to new.
My "eventful" conversions of those two disks may well have been (and
probably were) completely my own doing. It's a poor craftsman that blames his tools.
My house is a mess. My computers tend to be scrupulously organized. And
the result from btrfs-convert just doesn't seem optimal for all future
geometries. After all, if the default extent sizes of 0x1000 and 0x8000 were
chosen for optimal cause (instead of beauty), ending up with a bunch of
two-gig-ish extents would oppose that cause.
It just feels ookie to use btrfs-convert in _my_ _humble_ _opinion_.
>> It's like a choose-your-own-adventure book! 8-)
>
> I like that! :-)
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2014-12-11 11:01 UTC | newest]
Thread overview: 36+ messages
[not found] <CAA7pwKNH-Cbd+_D+sCEJxxdervLC=_3_AzaywSE3mXi8MLydxw@mail.gmail.com>
2014-11-22 22:26 ` Fixing Btrfs Filesystem Full Problems typo? Marc MERLIN
2014-11-22 23:26 ` Patrik Lundquist
2014-11-22 23:46 ` Marc MERLIN
2014-11-23 0:05 ` Hugo Mills
2014-11-23 1:07 ` Marc MERLIN
2014-11-23 7:52 ` Duncan
2014-11-23 15:12 ` Patrik Lundquist
2014-11-24 4:23 ` Duncan
2014-11-24 12:35 ` Patrik Lundquist
2014-12-09 22:29 ` Patrik Lundquist
2014-12-09 23:13 ` Robert White
2014-12-10 7:19 ` Patrik Lundquist
2014-12-10 12:17 ` Robert White
2014-12-10 13:11 ` Duncan
2014-12-10 18:56 ` Patrik Lundquist
2014-12-10 22:28 ` Robert White
2014-12-11 4:13 ` Duncan
2014-12-11 10:29 ` Patrik Lundquist
2014-12-11 6:16 ` Patrik Lundquist
2014-12-10 13:36 ` Patrik Lundquist
2014-12-11 8:42 ` Robert White
2014-12-11 9:02 ` Duncan
2014-12-11 9:55 ` Patrik Lundquist
2014-12-11 11:01 ` Robert White
2014-12-09 23:20 ` Robert White
2014-12-09 23:48 ` Robert White
2014-12-10 0:01 ` Robert White
2014-12-10 12:47 ` Duncan
2014-12-10 20:11 ` Patrik Lundquist
2014-12-11 4:02 ` Duncan
2014-12-11 4:49 ` Duncan
2014-11-23 21:16 ` Marc MERLIN
2014-11-23 22:49 ` Holger Hoffstätte
2014-11-24 4:40 ` Duncan
2014-12-07 21:38 ` Marc MERLIN
2014-11-24 18:05 ` Brendan Hide