* Re: Fixing Btrfs Filesystem Full Problems typo?
[not found] <CAA7pwKNH-Cbd+_D+sCEJxxdervLC=_3_AzaywSE3mXi8MLydxw@mail.gmail.com>
@ 2014-11-22 22:26 ` Marc MERLIN
2014-11-22 23:26 ` Patrik Lundquist
0 siblings, 1 reply; 36+ messages in thread
From: Marc MERLIN @ 2014-11-22 22:26 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org
+btrfs list so that someone can correct me if I'm wrong.
On Sat, Nov 22, 2014 at 09:34:59PM +0100, Patrik Lundquist wrote:
> Hi,
>
> I was scratching my head over a failing btrfs balance and read your
> very informative
> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html,
> but shouldn't
>
> "I can ask balance to rewrite all chunks that are more than 55% full"
>
> be
>
> "I can ask balance to rewrite all chunks that are less than 55% full"?
This one hurts my brain every time I think about it :)
So, the bigger the -dusage number, the more work btrfs has to do.
-dusage=0 does almost nothing
-dusage=100 effectively rebalances everything
But saying "less than 95% full" for -dusage=95 would mean
rebalancing everything that isn't almost full, so I'm not sure it makes
sense either (I would think you'd want to rebalance full blocks first).
The logical wording would be "less than 95% space free".
I'll update my page since this is what makes the most sense.
Now, just to be sure, if I'm getting this right, if your filesystem is
55% full, you could rebalance all blocks that have less than 55% space
free, and use -dusage=55
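(For concreteness, the invocation being discussed is something like the
following; the mountpoint is just an example:)

btrfs balance start -dusage=55 /mnt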
Does that sound right?
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-22 22:26 ` Fixing Btrfs Filesystem Full Problems typo? Marc MERLIN
@ 2014-11-22 23:26 ` Patrik Lundquist
2014-11-22 23:46 ` Marc MERLIN
2014-11-23 0:05 ` Hugo Mills
0 siblings, 2 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-11-22 23:26 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-btrfs@vger.kernel.org
On 22 November 2014 at 23:26, Marc MERLIN <marc@merlins.org> wrote:
>
> This one hurts my brain every time I think about it :)
I'm new to Btrfs so I may very well be wrong, since I haven't really
read up on it. :-)
> So, the bigger the -dusage number, the more work btrfs has to do.
Agreed.
> -dusage=0 does almost nothing
> -dusage=100 effectively rebalances everything
And -dusage=0 effectively reclaims empty chunks, right?
> But saying "less than 95% full" for -dusage=95 would mean
> rebalancing everything that isn't almost full,
But isn't that what rebalance does? Rewriting chunks <=95% full into
completely full chunks effectively defragments them and most
likely reduces the number of chunks.
A -dusage=0 rebalance reduced my number of chunks from 1173 to 998 and
dev_item.bytes_used went from 1593466421248 to 1491460947968.
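(For reference, that kind of run and check is roughly the following; the
device and mountpoint are illustrative, and the field names are as
btrfs-show-super prints them here:)

btrfs balance start -dusage=0 /mnt
btrfs-show-super /dev/sdc1 | grep bytes_used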
> Now, just to be sure, if I'm getting this right, if your filesystem is
> 55% full, you could rebalance all blocks that have less than 55% space
> free, and use -dusage=55
I realize that I interpret the usage parameter as operating on blocks
(chunks? are they the same in this case?) that are <= 55% full while
you interpret it as <= 55% free.
Which is correct?
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-22 23:26 ` Patrik Lundquist
@ 2014-11-22 23:46 ` Marc MERLIN
2014-11-23 0:05 ` Hugo Mills
1 sibling, 0 replies; 36+ messages in thread
From: Marc MERLIN @ 2014-11-22 23:46 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org
On Sun, Nov 23, 2014 at 12:26:38AM +0100, Patrik Lundquist wrote:
> I realize that I interpret the usage parameter as operating on blocks
> (chunks? are they the same in this case?) that are <= 55% full while
> you interpret it as <= 55% free.
>
> Which is correct?
I will let someone else answer because I'm not 100% certain anymore.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-22 23:26 ` Patrik Lundquist
2014-11-22 23:46 ` Marc MERLIN
@ 2014-11-23 0:05 ` Hugo Mills
2014-11-23 1:07 ` Marc MERLIN
1 sibling, 1 reply; 36+ messages in thread
From: Hugo Mills @ 2014-11-23 0:05 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: Marc MERLIN, linux-btrfs@vger.kernel.org
On Sun, Nov 23, 2014 at 12:26:38AM +0100, Patrik Lundquist wrote:
> On 22 November 2014 at 23:26, Marc MERLIN <marc@merlins.org> wrote:
> >
> > This one hurts my brain every time I think about it :)
>
> I'm new to Btrfs so I may very well be wrong, since I haven't really
> read up on it. :-)
>
>
> > So, the bigger the -dusage number, the more work btrfs has to do.
>
> Agreed.
>
>
> > -dusage=0 does almost nothing
> > -dusage=100 effectively rebalances everything
>
> And -dusage=0 effectively reclaims empty chunks, right?
>
>
> > But saying "less than 95% full" for -dusage=95 would mean
> > rebalancing everything that isn't almost full,
>
> But isn't that what rebalance does? Rewriting chunks <=95% full into
> completely full chunks effectively defragments them and most
> likely reduces the number of chunks.
>
> A -dusage=0 rebalance reduced my number of chunks from 1173 to 998 and
> dev_item.bytes_used went from 1593466421248 to 1491460947968.
>
>
> > Now, just to be sure, if I'm getting this right, if your filesystem is
> > 55% full, you could rebalance all blocks that have less than 55% space
> > free, and use -dusage=55
>
> I realize that I interpret the usage parameter as operating on blocks
> (chunks? are they the same in this case?) that are <= 55% full while
> you interpret it as <= 55% free.
>
> Which is correct?
Less than or equal to 55% full.
0 gives you less than or equal to 0% full -- i.e. the empty block
groups. 100 gives you less than or equal to 100% full, i.e. all block
groups.
A chunk is the part of a block group that lives on one device, so
in RAID-1, every block group is precisely two chunks; in RAID-0, every
block group is 2 or more chunks, up to the number of devices in the
FS. A chunk is usually 1 GiB in size for data and 250 MiB for
metadata, but can be smaller under some circumstances.
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- And what rough beast, its hour come round at last / slouches ---
towards Bethlehem, to be born?
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 0:05 ` Hugo Mills
@ 2014-11-23 1:07 ` Marc MERLIN
2014-11-23 7:52 ` Duncan
2014-11-24 18:05 ` Brendan Hide
0 siblings, 2 replies; 36+ messages in thread
From: Marc MERLIN @ 2014-11-23 1:07 UTC (permalink / raw)
To: Hugo Mills, Patrik Lundquist, linux-btrfs@vger.kernel.org
On Sun, Nov 23, 2014 at 12:05:04AM +0000, Hugo Mills wrote:
> > Which is correct?
>
> Less than or equal to 55% full.
This confuses me. Does that mean that the fullest blocks do not get
rebalanced?
I guess I was under the mistaken impression that the more data you had the
more you could be out of balance.
> A chunk is the part of a block group that lives on one device, so
> in RAID-1, every block group is precisely two chunks; in RAID-0, every
> block group is 2 or more chunks, up to the number of devices in the
> FS. A chunk is usually 1 GiB in size for data and 250 MiB for
> metadata, but can be smaller under some circumstances.
Right. So, why would you rebalance empty chunks or near empty chunks?
Don't you want to rebalance almost full chunks first, and work your way to
less and less full as needed?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 1:07 ` Marc MERLIN
@ 2014-11-23 7:52 ` Duncan
2014-11-23 15:12 ` Patrik Lundquist
` (2 more replies)
2014-11-24 18:05 ` Brendan Hide
1 sibling, 3 replies; 36+ messages in thread
From: Duncan @ 2014-11-23 7:52 UTC (permalink / raw)
To: linux-btrfs
Marc MERLIN posted on Sat, 22 Nov 2014 17:07:42 -0800 as excerpted:
> On Sun, Nov 23, 2014 at 12:05:04AM +0000, Hugo Mills wrote:
>> > Which is correct?
>>
>> Less than or equal to 55% full.
>
> This confuses me. Does that mean that the fullest blocks do not get
> rebalanced?
Yes. =:^)
> I guess I was under the mistaken impression that the more data you had
> the more you could be out of balance.
What you were thinking is a misstatement of the situation, so yes, again,
that was a mistaken impression. =:^)
>> A chunk is the part of a block group that lives on one device, so
>> in RAID-1, every block group is precisely two chunks; in RAID-0, every
>> block group is 2 or more chunks, up to the number of devices in the FS.
>> A chunk is usually 1 GiB in size for data and 250 MiB for metadata, but
>> can be smaller under some circumstances.
>
> Right. So, why would you rebalance empty chunks or near empty chunks?
> Don't you want to rebalance almost full chunks first, and work your way
> to less and less full as needed?
No, the closer to empty a chunk is, the more effect you can get in
rebalancing it along with others of the same fullness.
Think of it this way.
One goal of a rebalance, the goal we have when data and metadata is
unbalanced and we're hitting ENOSPC as a result (as opposed to the goal
of converting or balancing among devices when one has just been added or
removed), and thus the goal that the usage filter is designed to help
solve, is this: Free excess chunk-allocated but chunk-empty space back to
unallocated, so it can be used by the other type, data or metadata.
More specifically, all available space has been allocated to data and
metadata chunks leaving no space available to allocate more chunks, and
one of two extremes has been reached, we'll call them D and M:
(
D1: All data chunks are full and more need to be allocated, but they
can't be as there's no more unallocated space to allocate the new data
chunks from,
*AND*
D2: There's a whole bunch of excess metadata chunks allocated, using up
all that unallocated space, but they're mostly empty, and need to be
rebalanced to consolidate usage into fewer but fuller metadata chunks,
thus freeing the space currently taken by all those mostly empty metadata
chunks.
)
*OR* the reverse:
(
M1: All metadata chunks are full and more need to be allocated, but they
can't be as there's no more unallocated space to allocate the new
metadata chunks from,
*AND*
M2: There's a whole bunch of excess data chunks allocated, using up all
the unallocated space, but they're mostly empty, and need to be
rebalanced to consolidate usage into fewer but fuller data chunks, thus
freeing the space currently taken by all those mostly empty data chunks.
)
In both cases, the one type is full and needs more allocation, but the
other type is hogging all the space with mostly empty chunks. In both
cases, then, you *DON'T* want to bother with the full type, since it's
full and rewriting it won't do anything but shuffle the full chunks
around -- you can't combine any because they're all full.
In both cases, what you *WANT* to do is deal with the EMPTY type, the
chunks that are hogging all the space but not actually using it.
This is evidently a bit counterintuitive on first glance as you're not
the first to have problems with it, but it /is/ the case, and once you
understand what's actually happening and why, it /does/ make sense.
More specifically, in the D case, where all /data/ chunks are full, you
want to rebalance the mostly empty /metadata/ chunks, combining for
example 5 near 20% full metadata chunks into a single near 100% full
metadata chunk, deallocating the other four metadata chunks (instead of
rewriting empty chunks) once there's nothing in them at all. Five just
became one, freeing four to unallocated space, which can now be used to
allocate new data chunks.
And the reverse in the M case, where all metadata chunks are full. Here,
you want to rebalance the mostly empty data chunks, again combining say
five 20% usage data chunks into a single 100% usage data chunk,
deallocating the other four data chunks once there's nothing in them at
all. Again, five just become one, freeing four to unallocated space,
which now can be used to allocate new, in this case, metadata chunks.
Thus the goal is to rebalance the nearly /empty/ chunks of the *OPPOSITE*
type to the one you're running short on, combining multiple nearly empty
chunks of the type you have too many of, thus freeing that empty space
back to unallocated, so the type that you're actually short on can
actually allocate chunks from the just freed to unallocated space.
That being the goal, working with the full chunks won't get you much.
Suppose you work with the 95% full chunks, 5% empty. You'll have to
rewrite *TWENTY* of them to combine all those 5% empties to free just
*ONE* chunk! And rewriting 100% full chunks won't get you anything at
all toward this goal, since they're already full and no more can be
stuffed into them. Rewrite 100 chunks 100% full, and you still have 100
chunks 100% full! =:^(
OTOH, suppose you work with 5% full chunks, 95% empty. Rewrite just two
of them, and you've already freed one, with the one left only 10% full.
Add a third one and free a second, with the one you're left with still
only 15% full. Continue until you've rewritten 20 of them, AND YOU FREE
19 OF THEM! =:^)
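(Putting rough numbers on that, under the simplifying assumption that the
rewritten data packs perfectly into new chunks, rewriting N chunks that
are each U% full frees about N - ceil(N*U/100) of them:)

# shell arithmetic for the two cases above (perfect packing assumed)
echo $(( 20 - (20*5  + 99) / 100 ))   # U=5%:  frees 19 of 20
echo $(( 20 - (20*95 + 99) / 100 ))   # U=95%: frees  1 of 20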
So it *CLEARLY* pays to work with the mostly empty ones. Usage=N, where
balance only works with the ones with LESS than or equal usage to that,
lets you do exactly that, work with the mostly EMPTY ones.
*BUT*, the payoff is even HIGHER than that. Consider, since only the
actually used blocks in a block group need to be rewritten, an almost full chunk
is going to take FAR longer than an almost empty chunk to rewrite. Now
there's going to be /some/ overhead, but let's consider that 5% full
example again. For chunks only 5% full, you're only writing 5% of the
data or metadata that you'd be writing for a 100% full chunk, 1/20th as
much.
So in our example above, where we find and rewrite 20 5% usage chunks into
a single 100% usage chunk, while there will be /some/ overhead, you might
well write those 20 5% used chunks into a single 100% used chunk in
perhaps the same time it'd take you to rewrite just ONE 95% usage chunk.
IOW, rewriting 20 95% usage chunks to 19, freeing just one, is going to
take you nearly 20 times as long as rewriting 20 5% usage chunks, freeing
19 of them, since in the latter case you're actually only rewriting one
full chunk's worth of data or metadata.
So working with 5% usage chunks as opposed to 95% usage chunks, you free
19 times as much space, using only a bit over a 20th as much time. Even
with 100% overhead, you'd still spend a tenth as much time freeing 19
times as many chunks!
Which is why the usage= filter is such a big deal. In many cases, it
allows you *HUGE* bang for the buck! While I'm pulling numbers out of
the air for this example, they're well within reason. Something like
usage=10 might take you half an hour and free up 70% of the space that a
full balance would free, while the full balance may well take a whole 24-
hour day!
OK, so what /is/ the effect of a fuller filesystem? Simply this. As the
filesystem fills up, there's less and less fully free unallocated space
available even after a full balance, meaning that free space can be used
up with fewer and fewer chunk allocations, so you have to rebalance more
and more often to keep what's left from getting out of balance and
running into ENOSPC conditions.
Compounding the problem, as the filesystem fills up, it's less and less
likely that there will be more than just one mostly free chunk available
(the one that's actively being written into), with others full or nearly
so, so it'll be necessary to use higher and higher usage=N balances to
get anything back, and the bonus payoff we had above will be working in
reverse as now we WILL be having to do 20 95% full chunks to free just
one chunk back to unallocated. Compounding the problem even FURTHER,
will be the fact that we have ALL THOSE GiB (TiB?) of actual data to
rewrite, so it'll be a worse and worse slog for fewer and fewer freed
chunks in payback.
Again, numbers out of thin air, but for illustrative purposes...
When a TiB filesystem is say 10% full, 90% of it could be in almost-empty
chunks. Not only will it take a relatively long time to get to that
point with only 10% usage, but a usage=10 filter will very likely free
say 80% (leaving 10% that would require a higher usage filter to
recover), in only a few minutes or a half hour or whatever. And you do
it once and could be good for six months or a year before you start
running low on space again and need to redo it.
When it's 90% full, you're likely to need at least usage=80 to get
anywhere, and you'll be rewriting a good portion of that 900+ GiB in
order to get just a handful of chunks worth of space recovered, with
the balance taking say 10-12 hours, perhaps longer. What's worse, you
may well find yourself having to do a rebalance like that every week,
because your total deallocatable free space (even after a full balance)
is approaching your weekly working set!
Obviously at/before that point it's time to invest in more storage!
But, beware! Just because your filesystem is say 55% full (number from
your example earlier), does **NOT** mean usage=55 is the best number to
use. That may well be the case, or it may not. There's simply no
necessarily direct correlation in that regard, and a recommended N for
usage=N cannot be determined without a LOT more use-case information than
simply knowing the filesystem is at 55% capacity.
The most that can be /reliably/ stated is that in general, as usage of
the filesystem goes up, so will the necessary N for the usage=N balance
filter -- there's a general correlation, yes, but it's nowhere NEAR
possible to assume any particular ratio like 1:1, without knowing rather
more about the use-case.
In particular, with the filesystem at 55% capacity, the extremes are all
used chunks at 100% capacity except for one (the one that's actively
being used, this is in theory the case immediately after a full balance,
and even a full balance wouldn't do anything further here), *OR* all used
chunks at 56% usage but for one (in this case usage=55 would do nothing,
since all those 56% used chunks are above the 55% cutoff and the single
chunk that might be rewritten has nothing to combine with, but a usage=56
or a usage=60 would be as effective as a full balance), *OR* most chunks
are actually empty, with the remainder but one at 100% usage (nearly the
same as the first case, except in that case there's no empty chunks
allocated, in this case all available space is allocated to empty chunks,
such that a usage=0 would be as effective as a full balance), *OR* all
used chunks but one are at 54-55% usage (usage=55 would in this case
just /happen/ to be the magic number that is as effective as a full
balance, while usage=54 would do nothing).
Another way of looking at that would be the old game of picking a number
between 0 and 100. Say you're using two d10 (10-sided dice, with one
marked as the 10s digit, thus generating 01-(1)00 as the range) to
generate the number, and you know the dice are weighted slightly to favor
5s. You and two friends are picking, and you pick first.
So you pick 55. But your two friends, not being dummies, pick 54 and
56. Unless those d10s are HEAVILY weighted, despite the weighting, your
odds of being the closest with that 55 aren't very good, are they?
Given no differences in time necessary and no additional knowledge about
how long it has been since the last full balance (which would have tended
to cram everything to 100% usage), and no knowledge about usage pattern,
55 would indeed be arguably the best choice to begin with.
But given the huge time advantage of lower values of N for usage=N if
they /do/ happen to do what you need, and thus the chance of usage=20
either doing the job in MUCH less time, or getting done in even LESS time
because it couldn't actually do /anything/, there's a good chance I'd try
something like that first, if only to then have some idea how much higher
I might want to go, because it'll be done SO much faster and has a /small/
chance of doing all I need anyway!
If usage=20 wasn't enough, I might then try usage=40, hoping that it
would do the rest, knowing that a rerun at a higher but still under-100
number would at most redo only a single chunk from the previous run, the
one that didn't get filled up all the way at the end -- all the others
would either be 100% or would have been deallocated as empty, and knowing
that the higher the number, the MUCH higher the time required, in general.
So the 55% filesystem capacity would probably inform my choice of jumps,
say 20% at a time, but I'd still start much lower and jump at that 20% or
so at a time.
Meanwhile, if the filesystem was only at say 20% capacity, I'd probably
start with usage=0 and jump by 5% at a time, while if it was at say 80%
capacity, I might still start at usage=0 to see if I could get lucky, but
then jump to usage=60, and then usage=98 or 99, because the really high
number still under 100 would still avoid rewriting all the full chunks
I'd created with the previous runs as well as all 100% full chunks that
would yield no benefit toward our goal, but would still recover pretty
much everything it was possible to recover, which once you reach 80%
capacity is going to start looking pretty necessary at some point.
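(In command form, that last sequence would look something like this, with
the mountpoint illustrative:)

btrfs balance start -dusage=0 /mnt
btrfs balance start -dusage=60 /mnt
btrfs balance start -dusage=98 /mnt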
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 7:52 ` Duncan
@ 2014-11-23 15:12 ` Patrik Lundquist
2014-11-24 4:23 ` Duncan
2014-11-23 21:16 ` Marc MERLIN
2014-12-07 21:38 ` Marc MERLIN
2 siblings, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-11-23 15:12 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 23 November 2014 at 08:52, Duncan <1i5t5.duncan@cox.net> wrote:
> [a whole lot]
Thanks for the long post, Duncan.
My venture into the finer details of balance began with converting an
ext4 fs to btrfs and, after an initial defrag, having a full balance fail
with about a third to go.
Consecutive full balances further reduced the number of chunks and got
me closer to finish without the infamous ENOSPC. After 3-4 full
balance runs it failed with less than 8% to go.
The balance run now finishes without errors with usage=99 and I think
I'll leave it at that. No RAID yet but will convert to RAID1.
Is it correct that there is no reason to ever do a 100% balance as
routine maintenance? I mean if you really need that last 1% space you
actually need a disk upgrade instead.
How about running a monthly maintenance job that uses bytes_used and
dev_item.bytes_used from btrfs-show-super to approximate the balance
need?
(dev_item.bytes_used - bytes_used) / bytes_used == extra device space used
The extra device space used after my balance usage=99 is 0.15%. It was
7.0% before I began tinkering with usage and ran into ENOSPC and I
think it is safe to assume that it was a lot more right after the fs
conversion.
So let's iterate a balance run which begins with usage=0 and increases
in steps of 5 or 10 and stops at 90 or 99 or when the extra device
space used is less than 1%.
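(A rough sketch of that job, assuming btrfs-show-super prints the
bytes_used and dev_item.bytes_used fields as above; device, mountpoint
and step size are illustrative:)

#!/bin/sh
DEV=/dev/sdc1
MNT=/mnt
for u in 0 10 20 30 40 50 60 70 80 90 99; do
    used=$(btrfs-show-super "$DEV" | awk '$1 == "bytes_used" {print $2}')
    dev_used=$(btrfs-show-super "$DEV" | awk '$1 == "dev_item.bytes_used" {print $2}')
    # stop when (dev_item.bytes_used - bytes_used) / bytes_used < 1%
    [ $(( (dev_used - used) * 100 / used )) -lt 1 ] && break
    btrfs balance start -dusage=$u -musage=$u "$MNT"
done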
Does it make sense?
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 7:52 ` Duncan
2014-11-23 15:12 ` Patrik Lundquist
@ 2014-11-23 21:16 ` Marc MERLIN
2014-11-23 22:49 ` Holger Hoffstätte
2014-12-07 21:38 ` Marc MERLIN
2 siblings, 1 reply; 36+ messages in thread
From: Marc MERLIN @ 2014-11-23 21:16 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On Sun, Nov 23, 2014 at 07:52:29AM +0000, Duncan wrote:
> > Right. So, why would you rebalance empty chunks or near empty chunks?
> > Don't you want to rebalance almost full chunks first, and work your way
> > to less and less full as needed?
>
> No, the closer to empty a chunk is, the more effect you can get in
> rebalancing it along with others of the same fullness.
Ok, now I see what I was thinking the wrong way around:
Rebalancing is not rebalancing data within a chunk or optimizing some
tree data structure.
Rebalancing is taking a nearly empty chunk and merging it with other
chunks to free up that chunk's space.
So, -dusage=10 only picks chunks that are 10% used or less, and tries
to free them up by putting their data elsewhere.
Did I get it right this time? :)
> IOW, rewriting 20 95% usage chunks to 19, freeing just one, is going to
> take you nearly 20 times as long as rewriting 20 5% usage chunks, freeing
> 19 of them, since in the latter case you're actually only rewriting one
> full chunk's worth of data or metadata.
Right, that makes sense.
> OK, so what /is/ the effect of a fuller filesystem? Simply this. As the
> filesystem fills up, there's less and less fully free unallocated space
> available even after a full balance, meaning that free space can be used
> up with fewer and fewer chunk allocations, so you have to rebalance more
> and more often to keep what's left from getting out of balance and
> running into ENOSPC conditions.
Yes, been there, done that :)
> But, beware! Just because your filesystem is say 55% full (number from
> your example earlier), does **NOT** mean usage=55 is the best number to
> use. That may well be the case, or it may not. There's simply no
> necessarily direct correlation in that regard, and a recommended N for
> usage=N cannot be determined without a LOT more use-case information than
> simply knowing the filesystem is at 55% capacity.
Yeah, I remember that. I'm ok with using the same number but I
understand it's not a given that it's the perfect number.
> So the 55% filesystem capacity would probably inform my choice of jumps,
> say 20% at a time, but I'd still start much lower and jump at that 20% or
> so at a time.
That makes sense. I'll try to synthesize all this and rewrite my blog
post and the wiki to make this clearer.
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 21:16 ` Marc MERLIN
@ 2014-11-23 22:49 ` Holger Hoffstätte
2014-11-24 4:40 ` Duncan
0 siblings, 1 reply; 36+ messages in thread
From: Holger Hoffstätte @ 2014-11-23 22:49 UTC (permalink / raw)
To: linux-btrfs
On Sun, 23 Nov 2014 13:16:50 -0800, Marc MERLIN wrote:
(snip)
> That makes sense. I'll try to synthesize all this and rewrite my blog
> post and the wiki to make this clearer.
Maybe also add that as of 3.18 empty block groups are automatically
collected, so balancing to prevent ENOSPC-by-empty-chunks is no longer
necessary. This works pretty well; I haven't run balance in weeks,
and my total-vs.-used overhead has always been <10 GB.
Holger
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 15:12 ` Patrik Lundquist
@ 2014-11-24 4:23 ` Duncan
2014-11-24 12:35 ` Patrik Lundquist
0 siblings, 1 reply; 36+ messages in thread
From: Duncan @ 2014-11-24 4:23 UTC (permalink / raw)
To: linux-btrfs
Patrik Lundquist posted on Sun, 23 Nov 2014 16:12:54 +0100 as excerpted:
> The balance run now finishes without errors with usage=99 and I think
> I'll leave it at that. No RAID yet but will convert to RAID1.
Converting between raid modes is done with a balance, so if you can't get
that last bit to balance, you can't do a full conversion to raid1.
> Is it correct that there is no reason to ever do a 100% balance as
> routine maintenance? I mean if you really need that last 1% space you
> actually need a disk upgrade instead.
I'm too cautious to make an unequivocal statement like that, but at least
off the top of my head, I can't think of any reason why /routine/
maintenance needs a full balance. Like I said above, the mode
conversions need it as that's what rewrites them to the new mode, but
that's not /routine/. Similarly, adding/deleting devices, where balance
is used to rebalance the usage between remaining devices, isn't routine.
Certainly, I've had no reason to do that full balance, as opposed to 99%
or whatever not-quite-full value, here, in routine usage. That doesn't
mean I won't someday find such a reason, but I've not seen one so far.
> How about running a monthly maintenance job that uses bytes_used and
> dev_item.bytes_used from btrfs-show-super to approximate the balance
> need?
I'm not familiar enough with the individual btrfs-show-super line items
to address that specific question in an intelligent manner.
What I'd recommend using instead is the output from btrfs filesystem df
<mountpoint> and/or btrfs fi show <mountpoint>. These commands spit out
information that's more "human readable", that should be usable in a
script that conditionally triggers a balance as needed, as well.
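(That is, with the mountpoint illustrative:)

btrfs filesystem show /mnt
btrfs filesystem df /mnt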
In btrfs fi show, you're primarily interested in the devid line(s). That
tells you how much of the total available space is chunk-allocated for
that device, with the difference between total and used being the
unallocated space, available to allocate to either data or metadata
chunks as needed.
What you're watching for there is of course nearly all space used. How
much you want to keep free will depend to some extent on the size of the
devices and how close to full they actually are, but with data chunks
being 1 GiB in size and metadata chunks being a quarter GiB in size,
until the filesystem gets really too full to do so, keeping enough room
to allocate several chunks of each shouldn't hurt. With the usual multi-
hundred-gig filesystems[1], I'd suggest doing a rebalance whenever
unallocated space is under 20 GiB. If in fact you have /lots/ of unused
space, say a TB filesystem with only a couple hundred GiB used, I'd
probably set the safety margin higher, say 100 GiB or even 200 GiB, while
at the same time using a lower usage=N balance filter. No sense getting
anywhere /close/ to the wire in that case. As the filesystem fills that
can be reduced as necessary, but you'll want to keep at *LEAST* 3 GiB or
so unallocated, so the filesystem always has room to do at least a couple
more chunk-allocations each of data and metadata. That should also
guarantee that there's at least enough room for balance to create a new
chunk in order to be able to do its rewriting thing, thus allowing you
to free /more/ space.
In btrfs fi df, watch the data and metadata lines. Specifically, you're
interested in the spread between total, which is what is chunk-allocated
for the filesystem, and used, actual usage within those allocated
chunks. High spread indicates a bunch of empty chunks that a balance can
free back to unallocated space, our goal in this case.
Again, data chunks are 1 GiB in size, so for the data line, a spread of
under a GiB indicates that even a full balance isn't likely to free
anything back to unallocated. Generally if it's within a single-digit
number of GiB difference, don't worry about balancing it. Similarly, on
a TB-class filesystem, if btrfs fi show says you still have hundreds of
GiB of room, there's little reason to worry about a balance even if the
spread in fi df is a similar hundreds of GiB, because you still have
plenty of unallocated room left.
Metadata chunks are a quarter-GiB in size, but on a single-device-
filesystem, they normally default to DUP mode, so two will be allocated
at a time. So if you're under a half-gig difference between total (aka
allocated) and used metadata, doing even a full metadata balance is
unlikely to get anything back, and it's normally not worth worrying about
a metadata balance unless the spread is over a couple GiB. Basically the
same general rules apply as for data, only at half the metadata size. So
under 5-10 GiB spread is unlikely to be worth the hassle. On a TB-class
filesystem, still don't worry about it if there's hundreds of GiB
unallocated, but if the fi df metadata spread between total/allocated and
used is 50 GiB or more, you may wish to do a metadata balance just to get
some of that back, even if unallocated (from fi show, as above) /is/
still hundreds of GiB.
So bottom line, on a TB-class filesystem with plenty of room (a couple
hundred GiB free still, or more), I'd rebalance if unallocated (fi show,
difference between total and used on a device line) drops under 100 GiB,
rebalancing data if fi df shows over 100 GiB spread between data total
(aka allocated) and used, and rebalancing metadata if there's over a 50
GiB spread.
As the filesystem fills up, say with only 100 GiB free, that'd drop to
triggering a balance if there's under perhaps 20 GiB unallocated on fi
show, with a data balance at a similar 20 GiB data spread, and a metadata
balance with a 10 GiB metadata spread.
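(A minimal sketch of such a trigger for the data case, assuming the
"Data, <profile>: total=..., used=..." line format that btrfs fi df
prints in this era of btrfs-progs; the thresholds and the usage value
are just the examples from above:)

#!/bin/sh
MNT=/mnt
spread=$(btrfs filesystem df "$MNT" | awk -F'[=,]' '
    function gib(s,  n, u) {              # convert "1.35TiB" etc. to GiB
        n = s + 0; u = s; gsub(/[0-9. ]/, "", u)
        if (u == "TiB") return n * 1024
        if (u == "GiB") return n
        if (u == "MiB") return n / 1024
        return 0
    }
    /^Data/ { printf "%d", gib($3) - gib($5) }')
if [ "$spread" -gt 100 ]; then
    btrfs balance start -dusage=60 "$MNT"
fi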
On a TB-class filesystem or even a half-TB-class filesystem, once you're
having trouble maintaining at least 10 GiB free, you should really be
adding more devices or upgrading to bigger hardware, because you really
don't want the unallocated to drop below 3 GiB or balance itself can have
trouble running.
On my sub-100 GiB filesystems, I tend to have the filesystem sized much
closer to what I actually need. For instance, my rootfs is btrfs raid1
mode, 8 GiB per device, two devices, so 8 GiB filesystem capacity.
/bin/df reports 2.1 G used, 5.8 GiB available.
btrfs fi show reports (per device) 8 GiB size, 2.78 GiB used. So call it
3 GiB used and 5 GiB unallocated.
btrfs fi df reports data of 2 GiB (obviously two 1 GiB chunks) total,
1.75 GiB of which is used, for a spread of a quarter GiB. That's under
the 1 GiB data chunk size so even a full balance likely won't return
anything.
Btrfs fi df reports metadata of 768 MiB total (obviously three quarter-
GiB chunks, remember this is raid1 so it's not duping the metadata chunks
to the same device, the other copy is on the other device), 298.12 MiB
used.
So in theory I could get 1 chunk of that metadata back, reducing it to 2
metadata chunks. However, there's typically a couple-hundred MiB
metadata overhead that btrfs won't actually let you use as it uses it
internally, and even a full balance doesn't recover it. So it's unlikely
I could recover that /apparently/ spare metadata block.
So I appear to be at optimum. Obviously on an 8 GiB filesystem, I'm
going to have to watch unallocated space very closely. However, because
this /is/ a specific-purpose filesystem (system root, with all installed
programs and config) and I'm already using it for that specific purpose,
usage shouldn't and doesn't change /that/ much, even tho I'm on gentoo
and thus have rolling updates. It's thus /easier/ to keep an eye on data/
metadata spread as well as on total allocated usage and do a balance
(which on an 8 GiB only filesystem on SSD, tends to take only perhaps a
couple minutes for a full balance anyway) when I need to, because while
it's small, even with updates the general data/metadata ratio doesn't
tend to change much and normally the data and metadata usage stays about
the same even as the files are updated, because it's simply reusing the
chunks it has.
---
[1] Multi-hundred-gig filesystems: These are usual for me as I like to
keep my physical devices partitioned up and my filesystems small and
manageable, but most people just create a big filesystem or two out of
the multi-hundred-gig physical device, so their filesystems are commonly
multi-hundred-gig as well.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 22:49 ` Holger Hoffstätte
@ 2014-11-24 4:40 ` Duncan
0 siblings, 0 replies; 36+ messages in thread
From: Duncan @ 2014-11-24 4:40 UTC (permalink / raw)
To: linux-btrfs
Holger Hoffstätte posted on Sun, 23 Nov 2014 22:49:01 +0000 as excerpted:
> On Sun, 23 Nov 2014 13:16:50 -0800, Marc MERLIN wrote:
>
> (snip)
>
>> That makes sense. I'll try to synthesize all this and rewrite my blog
>> post and the wiki to make this clearer.
>
> Maybe also add that as of 3.18 empty block groups are automatically
> collected, so balancing to prevent ENOSPC-by-empty-chunks is no longer
> necessary. This works pretty well; I haven't run balance in weeks, and
> my total-vs.-used overhead has always been <10 GB.
For those of us who have been around btrfs for a while, this still sounds
like the stuff of science fiction, perhaps possible sometime in the
future, but definitely not something we're yet used to having actually
automatically handled for us.
=:^)
So I think we're all glad it's here now, but kind of holding our breath
waiting for the bug due to this feature that stops everything cold. I think the
feeling is, OK, but let's not rock the boat too much in our haste to
celebrate, or we might find ourselves unexpectedly in the water once
again. Let's just go on teaching people to swim, assuming they'll need
to know how, and if they never do, well then that's a bonus! =:^)
But I think once we get into the 3.19 development cycle, if there's no
critical bugs with this 3.18 feature yet, then and only then are we
likely to start really talking about it. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-24 4:23 ` Duncan
@ 2014-11-24 12:35 ` Patrik Lundquist
2014-12-09 22:29 ` Patrik Lundquist
0 siblings, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-11-24 12:35 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 24 November 2014 at 05:23, Duncan <1i5t5.duncan@cox.net> wrote:
> Patrik Lundquist posted on Sun, 23 Nov 2014 16:12:54 +0100 as excerpted:
>
>> The balance run now finishes without errors with usage=99 and I think
>> I'll leave it at that. No RAID yet but will convert to RAID1.
>
> Converting between raid modes is done with a balance, so if you can't get
> that last bit to balance, you can't do a full conversion to raid1.
Good point! It slipped my mind. I'll report back if incremental
balances eventually solve the balance-after-conversion ENOSPC
problem.
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 1:07 ` Marc MERLIN
2014-11-23 7:52 ` Duncan
@ 2014-11-24 18:05 ` Brendan Hide
1 sibling, 0 replies; 36+ messages in thread
From: Brendan Hide @ 2014-11-24 18:05 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Hugo Mills, Patrik Lundquist, linux-btrfs@vger.kernel.org
On 2014/11/23 03:07, Marc MERLIN wrote:
> On Sun, Nov 23, 2014 at 12:05:04AM +0000, Hugo Mills wrote:
>>> Which is correct?
>> Less than or equal to 55% full.
>
> This confuses me. Does that mean that the fullest blocks do not get
> rebalanced?
"Balance has three primary benefits:
- free up some space for new allocations
- change storage profile
- balance/migrate data to or away from new or failing disks (the
original purpose of balance)
and one fringe benefit:
- force a data re-write (good if you think your spinning-rust needs to
re-allocate sectors)
In the regular case where you're not changing the storage profile or
migrating data between disks, there isn't much to gain from balancing
full chunks - and it involves a lot of work. For SSDs, it is
particularly bad for wear. For spinning rust it is merely a lot of
unnecessary work.
> I guess I was under the mistaken impression that the more data you had the
> more you could be out of balance.
>
>> A chunk is the part of a block group that lives on one device, so
>> in RAID-1, every block group is precisely two chunks; in RAID-0, every
>> block group is 2 or more chunks, up to the number of devices in the
>> FS. A chunk is usually 1 GiB in size for data and 250 MiB for
>> metadata, but can be smaller under some circumstances.
> Right. So, why would you rebalance empty chunks or near empty chunks?
> Don't you want to rebalance almost full chunks first, and work your way to
> less and less full as needed?
Balancing empty chunks makes them available for re-allocation - so that
is directly useful and light on workload.
--
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-23 7:52 ` Duncan
2014-11-23 15:12 ` Patrik Lundquist
2014-11-23 21:16 ` Marc MERLIN
@ 2014-12-07 21:38 ` Marc MERLIN
2 siblings, 0 replies; 36+ messages in thread
From: Marc MERLIN @ 2014-12-07 21:38 UTC (permalink / raw)
To: Duncan, Holger Hoffstätte; +Cc: linux-btrfs
On Sun, Nov 23, 2014 at 07:52:29AM +0000, Duncan wrote:
> > Right. So, why would you rebalance empty chunks or near empty chunks?
> > Don't you want to rebalance almost full chunks first, and work your way
> > to less and less full as needed?
>
> No, the closer to empty a chunk is, the more effect you can get in
> rebalancing it along with others of the same fullness.
On Sun, Nov 23, 2014 at 10:49:01PM +0000, Holger Hoffstätte wrote:
> Maybe also add that as of 3.18 empty block groups are automatically
> collected, so balancing to prevent ENOSPC-by-empty-chunks is no longer
> necessary. This works pretty well; I haven't run balance in weeks,
> and my total-vs.-used overhead has always been <10 GB.
Sorry for the delay in confirming this.
I've corrected both
https://btrfs.wiki.kernel.org/index.php/Balance_Filters#Balancing_to_fix_filesystem_full_errors
and
http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
with your input.
Thanks much for that.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-11-24 12:35 ` Patrik Lundquist
@ 2014-12-09 22:29 ` Patrik Lundquist
2014-12-09 23:13 ` Robert White
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-09 22:29 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 24 November 2014 at 13:35, Patrik Lundquist
<patrik.lundquist@gmail.com> wrote:
> On 24 November 2014 at 05:23, Duncan <1i5t5.duncan@cox.net> wrote:
>> Patrik Lundquist posted on Sun, 23 Nov 2014 16:12:54 +0100 as excerpted:
>>
>>> The balance run now finishes without errors with usage=99 and I think
>>> I'll leave it at that. No RAID yet but will convert to RAID1.
>>
>> Converting between raid modes is done with a balance, so if you can't get
>> that last bit to balance, you can't do a full conversion to raid1.
>
> Good point! It slipped my mind. I'll report back if incremental
> balances eventually solves the balance after conversion ENOSPC
> problem.
I'm having no luck with a full balance of the converted filesystem.
Tried it again with Linux v3.18.0 and btrfs-progs v3.17.3.
What conclusions can be drawn from the following?
BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 4773171200 free, is not full
BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
reserved=99700736, may_use=2102390784, readonly=241664
BTRFS: block group 234109272064 has 5368709120 bytes, 5368709120 used
0 pinned 0 reserved
BTRFS info (device sdc1): block group has cluster?: no
BTRFS info (device sdc1): 0 blocks of free space at or bigger than bytes is
BTRFS: block group 242699206656 has 5368709120 bytes, 5368709120 used
0 pinned 0 reserved
BTRFS info (device sdc1): block group has cluster?: no
BTRFS info (device sdc1): 0 blocks of free space at or bigger than bytes is
BTRFS: block group 339335970816 has 5368709120 bytes, 5368705024 used
0 pinned 0 reserved
BTRFS critical (device sdc1): entry offset 344704675840, bytes 4096, bitmap no
Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
Total devices 1 FS bytes used 1.35TiB
devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
Data, single: total=1.35TiB, used=1.35TiB
System, single: total=32.00MiB, used=112.00KiB
Metadata, single: total=3.00GiB, used=1.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Checking filesystem on /dev/sdc1
UUID: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
found 825003219475 bytes used err is 0
total csum bytes: 1452612464
total tree bytes: 1669943296
total fs tree bytes: 39600128
total extent tree bytes: 52903936
btree space waste bytes: 79921034
file data blocks allocated: 1487627730944
referenced 1487627730944
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 22:29 ` Patrik Lundquist
@ 2014-12-09 23:13 ` Robert White
2014-12-10 7:19 ` Patrik Lundquist
2014-12-09 23:20 ` Robert White
2014-12-09 23:48 ` Robert White
2 siblings, 1 reply; 36+ messages in thread
From: Robert White @ 2014-12-09 23:13 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
> Total devices 1 FS bytes used 1.35TiB
> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
>
>
> Data, single: total=1.35TiB, used=1.35TiB
> System, single: total=32.00MiB, used=112.00KiB
> Metadata, single: total=3.00GiB, used=1.55GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
Are you trying to convert a filesystem on a single device/partition to
RAID 1?
I don't think that's legal. Without a second slice to distribute the
copies of the data onto there is no raiding to be done.
Add the second device with btrfs device add, and _then_ use balance to
redistribute and copy the data to the second device.
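(I.e., roughly, with the device and mountpoint illustrative:)

btrfs device add /dev/sdd1 /mountpoint
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mountpoint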
ASIDE: I, personally, think that a single device RAID1 should be legal.
I also think that it should be possible to tell the system that you want
N copies if you have N-or-more slices onto which they would spread.
These would match my expectations from mdadm and several hardware and
appliance RAID solutions. But my opinions in the matter do _not_ match
the BTRFS code base. RAID1 means exactly two devices (for any given
piece of information) [though I don't know whether it always has to be
the _same_ two devices for two different pieces of information.]
So yea, if that is what you are trying to do, the inability to find a
second drive on which to allocate the peer-block(s) for an extent would
produce interesting errors. I can't say for sure that this is the exact
genesis of your issue, but I've read here in other threads a number of
comments that would translate as "trying to set RAID1 with on a
one-slice file system will be full of fail".
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 22:29 ` Patrik Lundquist
2014-12-09 23:13 ` Robert White
@ 2014-12-09 23:20 ` Robert White
2014-12-09 23:48 ` Robert White
2 siblings, 0 replies; 36+ messages in thread
From: Robert White @ 2014-12-09 23:20 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
> Data, single: total=1.35TiB, used=1.35TiB
> System, single: total=32.00MiB, used=112.00KiB
> Metadata, single: total=3.00GiB, used=1.55GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
P.S. you should re-balance your System and Metadata as "DUP" for now.
Two copies of that stuff are better than one, as right now you have no
real recovery path for that stuff. If you didn't make that change on
purpose it probably got down-revved from DUP automagically when you
tried to RAID it, e.g. block-by-block the dups were removed but then
there was no place to put the mirrored copy.
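(Something like the following should do it; as far as I recall the
balance tool insists on -f before it will touch system chunks, so
double-check against your btrfs-progs:)

btrfs balance start -mconvert=dup -sconvert=dup -f /mountpoint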
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 22:29 ` Patrik Lundquist
2014-12-09 23:13 ` Robert White
2014-12-09 23:20 ` Robert White
@ 2014-12-09 23:48 ` Robert White
2014-12-10 0:01 ` Robert White
2 siblings, 1 reply; 36+ messages in thread
From: Robert White @ 2014-12-09 23:48 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
> (stuff depicting a nearly full file system).
Having taken another look at it all, I'd bet (there is not sufficient
information to be _sure_ from the output you've provided) that you don't
have the necessary 1 GiB free on your disk slice to allocate another data
extent. The COW portion of some consolidation event is being blocked for
lack of any place to put the condensed/congealed result of balancing one
or more of your blocks. You are going to have to grow your filesystem by
at least 1 gig to get the balance to complete as-is; or alternately
remove at least a gig worth of "large files" (e.g. files that are stored
in a DATA extent as opposed to small ones stored in the metadata).
In the alternate, if you have a "bigger drive" to use, then add that
device to the file system (q.v. "btrfs device add /dev/sdd1 /muntpoint")
and then remove the current device (q.v. "btrfs device delete /dev/sdc1
/mountpoint"). You now have the filesystem on a bigger media where stuff
can happen correctly.
At _that_ point you can RAID the larger device to equally sized peers
etc. if your actual goal is to establish full redundancy.
===
If the whole raiding thing was about running out of space, then you are
actually "done" as soon as you add the second device. It will be used
automatically, and you can balance over to it directly or not as you see
fit.
In particular if you have another drive/slice of equal size and you
intend to spread out onto it, your best choices are ::
btrfs device add /dev/sdd1 /mp
btrfs balance start -dconvert=raid0 -mconvert=dup -sconvert=dup /mp
--or--
btrfs balance start -dconvert=raid0 -mconvert=raid1 -sconvert=raid1 /mp
(the latter better preserves information if a media fails, but
since your bulk data isn't being stored redundantly it probably doesn't
matter which you use.)
Once the second drive is in place you'll have the room you need for the
balances to finish.
In the second model you'll be spreading your bulk data out onto the two
drives "evenly" via striping (raid0) and your metadata will be
duplicated between the two slices.
If you are adding storage and you are _not_ going to be adding it all
with -dconvert=raid1, it doesn't matter that you don't currently have
the needed space for a balance to complete on the currently full media.
If you are trying to raid1 your entire filesystem when you are already
effectively out of space, you will find no joy. Adding a full media
raid1 is a no-op for available space.
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 23:48 ` Robert White
@ 2014-12-10 0:01 ` Robert White
2014-12-10 12:47 ` Duncan
0 siblings, 1 reply; 36+ messages in thread
From: Robert White @ 2014-12-10 0:01 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/09/2014 03:48 PM, Robert White wrote:
> On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
> > (stuff depicting a nearly full file system).
>
> Having taken another look at it all, I'd bet (there is not sufficient
> information to be _sure_ from the output you've provided) that you don't
> have the necessary 1Gb free on your disk slice to allocate another data
> extent. The COW portion of some consolidation event is being blocked for
> lack of any place to put the condensed/congealed result of balancing one
> or more of your blocks. You are gong to have to grow your filesystem by
> at least 1 gig to get the balance to complete as-is; or alternately
> remove at least a gig worth of "large files" (e.g. files that are stored
> in a DATA extent as opposed to small ones stored in the metadata).
>
> In the alternate, if you have a "bigger drive" to use, then add that
> device to the file system (q.v. "btrfs device add /dev/sdd1 /muntpoint")
> and then remove the current device (q.v. "btrfs device delete /dev/sdc1
> /mountpoint"). You now have the filesystem on a bigger media where stuff
> can happen correctly.
>
> At _that_ point you can RAID the larger device to equally sized peers
> etc. if your actual goal is to establish full redundancy.
>
> ===
>
> If the whole raiding thing was about running out of space, then you are
> actually "done" as soon as you add the second device. It will be used
> automatically, and you can balance over to it directly or not as you see
> fit.
>
> In particular if you have another drive/slice of equal size and you
> intend to spread out onto it, your best choices are ::
>
> btrfs device add /dev/sdd1 /mp
EDIT :: SLIGHT BOO BOO maybe...
You may not have enough spare room to get the -dconvert=raid0 to run
right away (I've never done the experiment, but you are probably still
COW-blocked), as I don't know if the convert will drop back to "make
room" mode and just bump some data aside before doing the conversion. So
you might need to do a limited data balance before you run either of the
commands below.
btrfs balance start -dlimit=20 /mp #20 is a wild guess at a good number
This will examine 20 chunks, and likely move at least one or two, if not
all twenty, onto the second drive. This will make room for the
subsequent raid0 segments on the first drive.
>
> btrfs balance start -dconvert=raid0 -mconvert=dup -sconvert=dup /mp
> --or--
> btrfs balance start -dconvert=raid0 -mconvert=raid1 -sconvert=raid1 /mp
>
> (the latter better preserves information if a media fails, but
> since your bulk data isn't being stored redundantly it probably doesn't
> matter which you use.)
>
> Once the second drive is in place you'll have the room you need for the
> balances to finish.
>
> In the second model you'll be spreading your bulk data out onto the two
> drives "evenly" via striping (raid0) and your metadata will be
> duplicated between the two slices.
>
> If you are adding storage and you are _not_ going to be adding it all
> with -dconvert=raid1, it doesn't matter that you don't currently have
> the needed space for a balance to complete on the currently full media.
> If you are trying to raid1 your entire filesystem when you are already
> effectively out of space, you will find no joy. Adding a full media
> raid1 is a no-op for available space.
>
(EDIT::Continued.) Worst case, just add the second device and run a
balance with no arguments to spread your data out. Then run the format
specific conversions to get it all rational and optimal.
Full filesystems always get into corner cases.
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-09 23:13 ` Robert White
@ 2014-12-10 7:19 ` Patrik Lundquist
2014-12-10 12:17 ` Robert White
0 siblings, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-10 7:19 UTC (permalink / raw)
To: Robert White; +Cc: linux-btrfs@vger.kernel.org
On 10 December 2014 at 00:13, Robert White <rwhite@pobox.com> wrote:
> On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
>>
>> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
>> Total devices 1 FS bytes used 1.35TiB
>> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
>>
>>
>> Data, single: total=1.35TiB, used=1.35TiB
>> System, single: total=32.00MiB, used=112.00KiB
>> Metadata, single: total=3.00GiB, used=1.55GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> Are you trying to convert a filesystem on a single device/partition to RAID
> 1?
Not yet. I'm stuck at the full balance after the conversion from ext4.
I haven't added the disks for RAID1 and might need them for starting
over instead.
A balance with -musage=100 -dusage=99 works but a full balance fails. It would
be nice to nail the bug since the fs passes btrfs check and it seems
to be a clear ENOSPC bug.
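(For reference, the filtered balance that does complete here is along
the lines of
  btrfs balance start -musage=100 -dusage=99 /mnt
while a plain "btrfs balance start /mnt" is the one that runs out of space.)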
I don't know how to interpret the space_info error. Why is only
4773171200 (4,4GiB) free?
Can I inspect block group 1821099687936 to try to find out what makes
it problematic?
BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 4773171200 free, is not full
BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
reserved=99700736, may_use=2102390784, readonly=241664
> P.S. you should re-balance your System and Metadata as "DUP" for now. Two
> copies of that stuff is better than one as right now you have no real
> recovery path for that stuff. If you didn't make that change on purpose it
> probably got down-revved from DUP automagically when you tried to RAID it.
Good point. Maybe btrfs-convert should do that by default? I don't
think it has ever been DUP.
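(The DUP re-balance Robert suggests would be along the lines of
  btrfs balance start -mconvert=dup /mnt
with an -sconvert=dup filter for the system chunks as well; this is a
sketch only, /mnt being the mount point used elsewhere in the thread.)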
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 7:19 ` Patrik Lundquist
@ 2014-12-10 12:17 ` Robert White
2014-12-10 13:11 ` Duncan
2014-12-10 13:36 ` Patrik Lundquist
0 siblings, 2 replies; 36+ messages in thread
From: Robert White @ 2014-12-10 12:17 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org
On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
> On 10 December 2014 at 00:13, Robert White <rwhite@pobox.com> wrote:
>> On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
>>>
>>> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
>>> Total devices 1 FS bytes used 1.35TiB
>>> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
>>>
>>>
>>> Data, single: total=1.35TiB, used=1.35TiB
>>> System, single: total=32.00MiB, used=112.00KiB
>>> Metadata, single: total=3.00GiB, used=1.55GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>>
>> Are you trying to convert a filesystem on a single device/partition to RAID
>> 1?
>
> Not yet. I'm stuck at the full balance after the conversion from ext4.
> I haven't added the disks for RAID1 and might need them for starting
> over instead.
You are not "stuck" here as this step is not mandatory. (see below)
>
> A balance with -musage=100 -dusage=99 works but a full fails. It would
> be nice to nail the bug since the fs passes btrfs check and it seems
> to be a clear ENOSPC bug.
Conversion from ext2/3/4 is constrained because it needs to be reversible.
If you are out of space this isn't a "bug", you are just out of space.
So by telling the system to ignore the 100% full clusters it is free to
juggle the fragments. But once you get into moving the fully full
extents, the COW features _MUST_ have access to _contiguous_ 1 GiB blocks
to make the new extents into which the Copy will be Written. If your file
system was nearly full it's completely likely that there are no such
contiguous blocks available to make the necessary extents.
BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
filesystem. That is, the recommended balance (and recursive defrag) is
_not_ a usability issue, it's an efficiency issue.
Check what you've got. Make sure it is good. Make sure you are cool with
it all. When you know everything is usable then remove the undo
information snapshot. That snapshot is pinning a _lot_ of data into
exact positions on disk. It's memorializing your previous fragmentation
and the anniversary positions of all the EXT4 data structures. Since
your system is basically full that undo information has to go.
At that point your balance will probably have the room it needs.
_Then_ you can balance if you feel the desire.
If you are _still_ out of space you'll need to add some, at least
temporarily, to give the system enough room to work.
Since we all _know_ you are a diligent system administrator and
architect with a good, recent, and well tested backup we know we can
recommend that you just dump the undo partition with a nice btrfs subvol
delete, right? Because you made a backup and everything yes?
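As a minimal sketch of that step (assuming the default ext2_saved name
that btrfs-convert gives the undo snapshot, and /mnt as the mount point):
  btrfs subvolume list /mnt               # the saved image shows up as ext2_saved
  btrfs subvolume delete /mnt/ext2_saved
  btrfs balance start /mnt                # afterwards, and only if you still want to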
So anyway. Your system isn't "bugged" or "broken", it's "full", but it's a
fragmented fullness that has lots of free sectors but insufficient
contiguous free sectors, so it cannot satisfy the request.
That Said...
I suspect you _have_ revealed a problem with the error reporting in the
case of "scary and wrong error message".
The allocator in extent-tree.c just tells you the raw free space on the
disk and says "hua... there are lots of bytes out there".
Which is _WAY_ different from "there are enough bytes all in one clump
to satisfy my needs". E.g. there is _not_ a lot of brains behind the message.
ret = find_free_extent(root, num_bytes, empty_size, hint_byte, ins,
                       flags, delalloc);

if (ret == -ENOSPC) {
        if (!final_tried && ins->offset) {
                num_bytes = min(num_bytes >> 1, ins->offset);
                num_bytes = round_down(num_bytes, root->sectorsize);
                num_bytes = max(num_bytes, min_alloc_size);
                if (num_bytes == min_alloc_size)
                        final_tried = true;
                goto again;
        } else if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
                struct btrfs_space_info *sinfo;

                sinfo = __find_space_info(root->fs_info, flags);
                btrfs_err(root->fs_info,
                          "allocation failed flags %llu, wanted %llu",
                          flags, num_bytes);
                if (sinfo)
                        dump_space_info(sinfo, num_bytes, 1);
        }
}
>
>
> I don't know how to interpret the space_info error. Why is only
> 4773171200 (4,4GiB) free?
> Can I inspect block group 1821099687936 to try to find out what makes
> it problematic?
>
> BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
> BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
> BTRFS: space_info 1 has 4773171200 free, is not full
> BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
> reserved=99700736, may_use=2102390784, readonly=241664
So it was looking for a single chunk 2013265920 bytes long and it
couldn't find one because all the spaces were smaller and there was no
room to make a new suitable space.
The problem is that it wanted 2013265920 bytes and the system as a
whole had no way to satisfy that desire. It asked for something just shy
of two gigs as a single extent. That's a tough order on a full platter.
Since your entire free size is 2102390784 that is an attempt to allocate
about 80% of your free space as one contiguous block. That's never going
to happen. 8-)
I don't even know if 2GiB is normally a legal size for an extent. My
understanding is that data is allocated in 1G chunks, so I'd expect all
extents to be smaller than 1G.
Normally...
But... I would bet that this 2gig monster is the image file, or part
thereof, that btrfs-convert left behind, and it may well be a magical
allocation of some sort. It may even be beyond the reach of balance et
al for being so large. But it _is_ within the bounds of the byte offsets
and sizes the file system uses.
After a quick glance at the btrfs-convert, it looks like it might make
some pretty atypical extents if the underlying donor filesystem needed
them. It wouldn't have had a choice. So it's easily within the
realm of reason that you'd have some really fascinating data as a result
of converting a nearly full EXT4 file system of the Terabyte+ size. This
would be quadruply true if you'd tweaked the block group ratios when you
made the original file system.
So since you have nice backups... you should probably drop the
ext2_saved subvolume and then get on with your life for good or ill.
But it's do or undo time.
AND UNDO IS NOT A BAD OPTION.
If you've got the media, building a fresh filesystem and copying the
contents onto it is my preferred method anyway. I get to set the options
I want (compression, skinny metadata, whatever) and I know I've got a
good backup on the original media. It's also the perfectly natural way
to get the subvolume boundaries where I want them and all that stuff.
Think of the time and worry you'd have saved if you'd copied the thing
in the first place. 8-)
So anyway...
Probably fine.
Probably just very full filesystem.
Clearly got some big whale files that just won't balance due to space.
Probably those files are the leftover EXT4 structures.
Probably okay to revert.
Probably okay to just delete the revert info.
The prior two items are mutually exclusive.
Since you have nice and validated backups you can't go wrong either way.
>
>> P.S. you should re-balance your System and Metadata as "DUP" for now. Two
>> copies of that stuff is better than one as right now you have no real
>> recovery path for that stuff. If you didn't make that change on purpose it
>> probably got down-revved from DUP automagically when you tried to RAID it.
>
> Good point. Maybe btrfs-convert should do that by default? I don't
> think it has ever been DUP.
Eyup.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 0:01 ` Robert White
@ 2014-12-10 12:47 ` Duncan
2014-12-10 20:11 ` Patrik Lundquist
0 siblings, 1 reply; 36+ messages in thread
From: Duncan @ 2014-12-10 12:47 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Tue, 09 Dec 2014 16:01:02 -0800 as excerpted:
> On 12/09/2014 03:48 PM, Robert White wrote:
>> On 12/09/2014 02:29 PM, Patrik Lundquist wrote:
>>> (stuff depicting a nearly full file system).
>>
>> Having taken another look at it all, I'd bet (there is not sufficient
>> information to be _sure_ from the output you've provided) that you
>> don't have the necessary 1Gb free on your disk slice to allocate
>> another data extent.
[snip most of both quote levels]
> Full filesystems always get into corner cases.
But, from the content you snipped from his post, this from btrfs fi show:
>>> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
>>> Total devices 1 FS bytes used 1.35TiB
>>> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
Device 2.73 TiB, used only 1.36 TiB.
That's over a TiB of entirely unallocated space, so a mere 1 GiB chunk
allocation shouldn't be a problem.
I'm sticking with my original hypothesis (assuming this is a continuation
from the thread I think it was), that there's something about the
conversion from ext* that didn't work correctly; most likely a file
larger than the btrfs 1 GiB data-chunk size, that has an extent larger
than that size as well. Btrfs balance couldn't do anything with that, as
it's larger than the native 1 GiB data-chunk size and balance alone
doesn't know how to split it up.
The recursive btrfs defrag after deleting the saved ext* subvolume
_should_ have split up any such > 1 GiB extents so balance could deal
with them, but either it failed for some reason on at least one such
file, or there's some other weird corner-case going on, very likely
something else having to do with the conversion.
Patrik, assuming no btrfs snapshots yet, can you do a du --all --block-
size=1M | sort -n (or similar), then take a look at all results over 1024
(1 GiB since the du specified 1 MiB blocks), and see if it's reasonable
to move all those files out of the filesystem and back? Assuming there's
not too many of them, the idea is to kill the copy in the filesystem by
moving them elsewhere, then move them back so they get recreated using
native btrfs semantics -- no extents larger than the native btrfs data
chunk size of 1 GiB.
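A sketch of that filter, run against the mounted filesystem (the 1024
cutoff is in MiB, i.e. 1 GiB):

  du --all --block-size=1M /mnt | sort -n | awk '$1 > 1024'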
If you have lots of memory to work with, one method would be to create a
tmpfs, then /copy/ the files to tmpfs and /move/ them back to a temporary
tree on the btrfs, deleting the originals on btrfs only after the move
back from tmpfs and a sync (or btrfs fi sync) so there's always a
permanent copy if the machine should crash and take down the tmpfs with
it. After all the files have been processed and the originals deleted
you can then move the contents of the temporary tree back into the
original location.
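A rough sketch of that round trip; the size, directory names and file
name here are placeholders only:

  mkdir -p /tmp/stash /mnt/rewritten
  mount -t tmpfs -o size=8g tmpfs /tmp/stash
  cp /mnt/path/to/bigfile /tmp/stash/
  mv /tmp/stash/bigfile /mnt/rewritten/    # rewritten with native btrfs extents
  btrfs filesystem sync /mnt               # or plain sync
  rm /mnt/path/to/bigfile                  # only delete the original after the sync
  mv /mnt/rewritten/bigfile /mnt/path/to/bigfile
  umount /tmp/stash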
That should ensure no more > 1 GiB file extents and will I hope get rid
of the problem, as this workaround has been demonstrated to fix problems
other people had with converted-from-ext* btrfs, generally where they had
failed to run the defrag right after the conversion, and now had a bunch
more data on the filesystem and didn't want to have to defrag it too.
Obviously it works best when there's only a handful of > 1 GiB files,
however, and snapshots containing references to the affected files will
prevent the file delete from actually deleting the problematic extents.
With luck that'll allow a full 100% balance without error. If not, at
least it should eliminate the > 1 GiB file extents possibility, and the
focus can move to something else.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 12:17 ` Robert White
@ 2014-12-10 13:11 ` Duncan
2014-12-10 18:56 ` Patrik Lundquist
2014-12-10 13:36 ` Patrik Lundquist
1 sibling, 1 reply; 36+ messages in thread
From: Duncan @ 2014-12-10 13:11 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Wed, 10 Dec 2014 04:17:50 -0800 as excerpted:
>> BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
>> BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
>> BTRFS: space_info 1 has 4773171200 free, is not full
>> BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
>> reserved=99700736, may_use=2102390784, readonly=241664
>
> So it was looking for a single chunk 2013265920 bytes long and it
> couldn't find one because all the spaces were smaller and there was no
> room to make a new suitable space.
>
>> The problem is that it wanted 2013265920 bytes and the system as a
> whole had no way to satisfy that desire. It asked for something just shy
> of two gigs as a single extent. That's a tough order on a full platter.
>
> Since your entire free size is 2102390784 that is an attempt to allocate
> about 80% of your free space as one contiguous block. That's never going
> to happen. 8-)
>
> I don't even know if 2GiB is normally a legal size for an extent. My
> understanding is that data is allocated in 1G chunks, so I'd expect all
> extents to be smaller than 1G.
On native btrfs, an extent must fit within the 1 GiB data chunk size,
with extents inherited from an ext* conversion being an obvious non-
native exception.
I hadn't looked at the actual output, but that confirms my earlier
suspicion, that after the ext* saved subvolume delete, the defrag somehow
missed at least one file > 1 GiB with a "super-extent" also > 1 GiB in
size.
From there... I've never used it but I /think/ btrfs inspect-internal
logical-resolve should let you map the 182109... address to a filename.
From there, moving that file out of the filesystem and back in should
eliminate that issue.
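With the block-group address from the log above and /mnt as the mount
point, that would look like:
  btrfs inspect-internal logical-resolve 1821099687936 /mnt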
Assuming no snapshots still contain the file, of course, and that the
ext* saved subvolume has already been deleted.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 12:17 ` Robert White
2014-12-10 13:11 ` Duncan
@ 2014-12-10 13:36 ` Patrik Lundquist
2014-12-11 8:42 ` Robert White
1 sibling, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-10 13:36 UTC (permalink / raw)
To: Robert White; +Cc: linux-btrfs@vger.kernel.org
On 10 December 2014 at 13:17, Robert White <rwhite@pobox.com> wrote:
> On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
>>
> BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
> filesystem. That is, the recommended balance (and recursive defrag) is _not_
> a usability issue, it's an efficiency issue.
But if I can't start with an efficient filesystem I'd rather start
over now/soon. I intend to add four more old disks for a RAID1 and it
will be problematic to start over later on (I'd have to buy new, large
disks).
I deleted the subvolume after being satisfied with the conversion,
defragged recursively, and balanced. In that order.
> Because you made a backup and everything yes?
Shh!
> So anyway. Your system isn't "bugged" or "broken", it's "full", but it's a
> fragmented fullness that has lots of free sectors but insufficient
> free sectors, so it cannot satisfy the request.
It's a half full 3TB disk. There _is_ space, somewhere. I can't speak
for contiguous space though.
>> I don't know how to interpret the space_info error. Why is only
>> 4773171200 (4,4GiB) free?
>> Can I inspect block group 1821099687936 to try to find out what makes
>> it problematic?
>>
>> BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
>> BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
>> BTRFS: space_info 1 has 4773171200 free, is not full
>> BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
>> reserved=99700736, may_use=2102390784, readonly=241664
>
>
> So it was looking for a single chunk 2013265920 bytes long and it couldn't
> find one because all the spaces were smaller and there was no room to make a
> new suitable space.
>
> The problem is that it wanted 2013265920 bytes and the system as a
> whole had no way to satisfy that desire. It asked for something just shy of
> two gigs as a single extent. That's a tough order on a full platter.
>
> Since your entire free size is 2102390784 that is an attempt to allocate
> about 80% of your free space as one contiguous block. That's never going to
> happen. 8-)
What about "space_info 1 has 4773171200 free"? Besides the other 1,5TB
free space.
> I don't even know if 2GiB is normally a legal size for an extent. My
> understanding is that data is allocated in 1G chunks, so I'd expect all
> extents to be smaller than 1G.
The 'summary' after the failed balances is always something like "98
enospc errors" which now makes me suspect that I have 98 files with
extents larger than 1GiB that the defrag didn't take care of.
So if I can find out which files have >1GiB extents I can then copy
them back and forth to solve the problem.
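(One way to check a suspect file, as a sketch: filefrag from e2fsprogs
prints each extent's length in filesystem blocks with -v, and at 4 KiB
blocks anything longer than 262144 blocks is a >1GiB extent. The path
below is a placeholder.
  filefrag -v /mnt/path/to/suspect-file
)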
Maybe running defrag more times can also solve it? Can I get a list of
fragmented files?
Suppose an old file with 2GiB extent isn't fragmented, will btrfs
defrag still try to defrag it?
> After a quick glance at the btrfs-convert, it looks like it might make some
> pretty atypical extents if the underlying donor filesystem needed
> them. It wouldn't have had a choice. So it's easily within the realm of
> reason that you'd have some really fascinating data as a result of
> converting a nearly full EXT4 file system of the Terabyte+ size.
It was about half full at conversion.
> This would
> be quadruply true if you'd tweaked the block group ratios when you made the
> original file system.
Ext4 created with defaults, but I think it has been completely full at one time.
> So since you have nice backups... you should probably drop the ext2_saved
> subvolume and then get on with your life for good or ill.
Done before defrag and balance attempts.
> Think of the time and worry you'd have saved if you'd copied the thing in
> the first place. 8-)
But then I wouldn't learn as much. :-)
>>> P.S. you should re-balance your System and Metadata as "DUP" for now. Two
>>> copies of that stuff is better than one as right now you have no real
>>> recovery path for that stuff. If you didn't make that change on purpose
>>> it
>>> probably got down-revved from DUP automagically when you tried to RAID
>>> it.
>>
>>
>> Good point. Maybe btrfs-convert should do that by default? I don't
>> think it has ever been DUP.
>
> Eyup.
And the metadata is now DUP. That's ~1.5GB extra metadata that was
allocated just fine after the failed balance.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 13:11 ` Duncan
@ 2014-12-10 18:56 ` Patrik Lundquist
2014-12-10 22:28 ` Robert White
0 siblings, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-10 18:56 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 10 December 2014 at 14:11, Duncan <1i5t5.duncan@cox.net> wrote:
>
> From there... I've never used it but I /think/ btrfs inspect-internal
> logical-resolve should let you map the 182109... address to a filename.
> From there, moving that file out of the filesystem and back in should
> eliminate that issue.
btrfs inspect-internal logical-resolve 1821099687936 /mnt gives me the
filename and it's only a 54175 bytes file.
> Assuming no snapshots still contain the file, of course, and that the
> ext* saved subvolume has already been deleted.
Got no snapshots or subvolumes. Keeping it simple for now.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 12:47 ` Duncan
@ 2014-12-10 20:11 ` Patrik Lundquist
2014-12-11 4:02 ` Duncan
2014-12-11 4:49 ` Duncan
0 siblings, 2 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-10 20:11 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 10 December 2014 at 13:47, Duncan <1i5t5.duncan@cox.net> wrote:
>
> The recursive btrfs defrag after deleting the saved ext* subvolume
> _should_ have split up any such > 1 GiB extents so balance could deal
> with them, but either it failed for some reason on at least one such
> file, or there's some other weird corner-case going on, very likely
> something else having to do with the conversion.
I've run defrag several times again and it doesn't do anything additional.
> Patrik, assuming no btrfs snapshots yet, can you do a du --all --block-
> size=1M | sort -n (or similar), then take a look at all results over 1024
> (1 GiB since the du specified 1 MiB blocks), and see if it's reasonable
> to move all those files out of the filesystem and back?
Good idea, but it's quite a lot of files. I'd rather start over.
But I've identified 46 files from Btrfs errors in syslog and will try
to move them to another disk. They're ranging from 41KiB to 6.6GiB in
size.
Is btrfs-debug-tree -e useful in finding problematic files?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 18:56 ` Patrik Lundquist
@ 2014-12-10 22:28 ` Robert White
2014-12-11 4:13 ` Duncan
2014-12-11 6:16 ` Patrik Lundquist
0 siblings, 2 replies; 36+ messages in thread
From: Robert White @ 2014-12-10 22:28 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/10/2014 10:56 AM, Patrik Lundquist wrote:
> On 10 December 2014 at 14:11, Duncan <1i5t5.duncan@cox.net> wrote:
>> Assuming no snapshots still contain the file, of course, and that the
>> ext* saved subvolume has already been deleted.
>
> Got no snapshots or subvolumes. Keeping it simple for now.
Does that mean that you have already manually removed the subvolume that
was automatically created by btrfs-convert?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 20:11 ` Patrik Lundquist
@ 2014-12-11 4:02 ` Duncan
2014-12-11 4:49 ` Duncan
1 sibling, 0 replies; 36+ messages in thread
From: Duncan @ 2014-12-11 4:02 UTC (permalink / raw)
To: linux-btrfs
Patrik Lundquist posted on Wed, 10 Dec 2014 21:11:52 +0100 as excerpted:
> Is btrfs-debug-tree -e useful in finding problematic files?
Since you were replying directly to me, my answer...
ENOTENOUGHINFO
I don't know enough about it to honestly say, as I've never used it
myself and haven't seen anyone posting practical usage that I could make
note of in case I or someone else needed it later.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 22:28 ` Robert White
@ 2014-12-11 4:13 ` Duncan
2014-12-11 10:29 ` Patrik Lundquist
2014-12-11 6:16 ` Patrik Lundquist
1 sibling, 1 reply; 36+ messages in thread
From: Duncan @ 2014-12-11 4:13 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Wed, 10 Dec 2014 14:28:10 -0800 as excerpted:
> On 12/10/2014 10:56 AM, Patrik Lundquist wrote:
>> On 10 December 2014 at 14:11, Duncan <1i5t5.duncan@cox.net> wrote:
>>> Assuming no snapshots still contain the file, of course, and that the
>>> ext* saved subvolume has already been deleted.
>>
>> Got no snapshots or subvolumes. Keeping it simple for now.
>
> Does that mean that you have already manually removed the subvolume that
> was automatically created by btrfs-convert?
Yes, he had.
Patrik correct me if I have this wrong, but filling in the history as I
believe I have it...
If I'm keeping my cases straight, he had actually posted a thread some
weeks ago with the initial problem, saying he had followed the conversion
instructions to the letter -- conversion, delete-saved, defrag, balance,
and ran into this problem with balance. The conclusion at that time was
that he'd try successively larger balance -dusage=N figures, hoping to
work thru it that way.
That original thread could well have been shortly before you appeared on
the list, however, and you may not have seen it. Either that, or you saw
it but didn't connect that case with this one.
Anyway, yes, assuming I haven't gotten my casefiles mixed up, and
evidence so far is that I haven't, he did everything he was supposed to
and still ended up with this issue. Obviously there's still a bug
somewhere.
And now he's back. The incrementally increasing usage= balances reached
99%, but that last 1% is the sticking point, and he, and the rest of us,
are trying to figure out what happened and how to get him past it.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 20:11 ` Patrik Lundquist
2014-12-11 4:02 ` Duncan
@ 2014-12-11 4:49 ` Duncan
1 sibling, 0 replies; 36+ messages in thread
From: Duncan @ 2014-12-11 4:49 UTC (permalink / raw)
To: linux-btrfs
Patrik Lundquist posted on Wed, 10 Dec 2014 21:11:52 +0100 as excerpted:
>> Patrik, assuming no btrfs snapshots yet, can you do a du --all --block-
>> size=1M | sort -n (or similar), then take a look at all results over
>> 1024 (1 GiB since the du specified 1 MiB blocks), and see if it's
>> reasonable to move all those files out of the filesystem and back?
>
> Good idea, but it's quite a lot of files. I'd rather start over.
>
> But I've identified 46 files from Btrfs errors in syslog and will try to
> move them to another disk. They're ranging from 41KiB to 6.6GiB in size.
There's one as yet incomplete piece of the puzzle. I guess the devs
could probably answer this, but being a simple sysadmin, I don't claim to
read code well and don't know...
That log snippet you quoted earlier gave block-group addresses. That's
the chunks, in this case normally 1 GiB data chunks, but here we're
dealing with a conversion from ext4 and apparently the extents are
larger, nearly 2 GiB in this case according to that snippet.
That had me thinking the problem files were all > 1 GiB and had these
super-extents that btrfs can't work with.
But you say you tracked down the file as I suggested using btrfs-inspect-
internal, and the file is much smaller than that.
Now I don't even know for sure what that log snippet was from, a normal
dmesg during an attempted balance, or dmesg with btrfs debug turned on in
the kernel, or the userspace debug you ask about, or...
And not being a dev and not having done anything like this level myself,
I'm sort of feeling my way along here too, trying to figure things out as
you report them.
So the missing piece I'm talking about is this. OK, we have the address
of a nearly 2 GiB block group reported, and I recalled seeing in an
earlier post that trick with btrfs-inspect-internal, so I thought to try
it here.
But with the file being so much smaller than the 2 GiB block group
reported, something's not matching. Either the file is somehow using an
extent much much larger than it is (possible with fallocate, then writing
a shorter file, I believe), or the referred to block group actually
contains more than one file -- certainly btrfs data chunks can do so, but
given that we're dealing with a conversion here, I don't know if the same
rules apply, or...
Anyway, it's possible that smaller file is simply the first one in the
block group, thus being the one that was mapped when you plugged that
address into inspect-internal, and that the problem file is actually a
much larger file located after it in the same block group.
So if moving the small files doesn't do the trick, try feeding inspect-
internal with an address after that. Given that btrfs blocks are 4 KiB
in size, round the size of the small file up to the nearest 4 KiB and add
that to the address originally obtained from the log, and see if inspect-
internal, fed the new offset address, points at a different, presumably
much larger file (> 1 GiB, or at least big enough that it'd extend more
than a GiB beyond the original address). If so, try moving /that/
file, and see if you have any better luck.
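Worked through with the numbers from this thread, assuming the
54175-byte file sits right at the start of that block group: 54175
rounds up to 57344 bytes (14 x 4096), so the next address to try would
be 1821099687936 + 57344 = 1821099745280, i.e.
  btrfs inspect-internal logical-resolve 1821099745280 /mnt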
I was /hoping/ it would be the simple case and all the problem block-
group addresses would point to > 1 GiB files and moving them would be
it. But with a significant number of those addresses pointing at far
smaller files, either I was wrong about the use of inspect-internal here
and they're entirely unrelated, or the situation is otherwise rather more
complex than I was hoping to be the case.
OTOH, if for whatever reason all those smaller files were fallocated to
some huge size and then written smaller, or something similar happened
such that they're using huge > 1 GiB extents even while being smaller
than 1 GiB in size, that COULD go some distance to explaining why defrag
missed them. If defrag is looking at filesize and the files happen to be
small but in huge extents, and it's those extents causing the problem,
then we just found our bug, and all that's left is figuring out how to
fix it, which is where I step out and the devs step in. With a bit of
luck, that's it, and we're now well on the way to fixing a bug that could
have otherwise triggered unexplained problems for some people doing
conversions, but not others, for quite some time to come. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 22:28 ` Robert White
2014-12-11 4:13 ` Duncan
@ 2014-12-11 6:16 ` Patrik Lundquist
1 sibling, 0 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-11 6:16 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 10 December 2014 at 23:28, Robert White <rwhite@pobox.com> wrote:
> On 12/10/2014 10:56 AM, Patrik Lundquist wrote:
>>
>> On 10 December 2014 at 14:11, Duncan <1i5t5.duncan@cox.net> wrote:
>>>
>>> Assuming no snapshots still contain the file, of course, and that the
>>> ext* saved subvolume has already been deleted.
>>
>> Got no snapshots or subvolumes. Keeping it simple for now.
>
> Does that mean that you have already manually removed the subvolume that was
> automatically created by btrfs-convert?
Yes.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-10 13:36 ` Patrik Lundquist
@ 2014-12-11 8:42 ` Robert White
2014-12-11 9:02 ` Duncan
2014-12-11 9:55 ` Patrik Lundquist
0 siblings, 2 replies; 36+ messages in thread
From: Robert White @ 2014-12-11 8:42 UTC (permalink / raw)
To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org
On 12/10/2014 05:36 AM, Patrik Lundquist wrote:
> On 10 December 2014 at 13:17, Robert White <rwhite@pobox.com> wrote:
>> On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
>>>
>> BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
>> filesystem. That is, the recommended balance (and recursive defrag) is _not_
>> a usability issue, it's an efficiency issue.
>
> But if I can't start with an efficient filesystem I'd rather start
> over now/soon. I intend to add four more old disks for a RAID1 and it
> will be problematic to start over later on (I'd have to buy new, large
> disks).
Nope, not an issue.
When you add the space and rebalance with the conversions by adding all
those other disks and such it will _completely_ _obliterate_ the current
balance.
You are cleaning the house before the maid comes.
PLUS:::
If you are going to add four more volumes, if those volumes are big
enough just make a new filesystem on them then copy the files over. You
wont have any freakish nonsense left over from the old drive and its
foibles. Then just add the existing drive to the "new" filesystem and
_then_ do the balance.
Right now you are at best trying to iron over cruft from the conversion
with the larger-than-1G extents and stuff that would never happen on a
fresh system.
PLUS:::
The whole "time saving" chance of doing a conversion? Well that window
closed last freaking month... 8-)
> I deleted the subvolume after being satisfied with the conversion,
> defragged recursively, and balanced. In that order.
Yea, but your file system is full and you are out of space so get on
with the adding space.
>> Because you made a backup and everything yes?
>
> Shh!
>> So anyway. Your system isn't "bugged" or "broken", it's "full", but it's a
>> fragmented fullness that has lots of free sectors but insufficient contiguous
>> free sectors, so it cannot satisfy the request.
>
> It's a half full 3TB disk. There _is_ space, somewhere. I can't speak
> for contiguous space though.
Contiguous space is all that matters here. It's trying to swallow a
brick that is _slightly_ larger than any extent ext4 would have likely
left hanging about.
(looking back through my mail spool) You haven't sent the output of
/bin/df or btrfs fi df yet, I'd like to see what those two commands say.
No Space (to allocate a storage extent)
is different than
No Space (to allocate file contents).
So the space may just be sitting there in the difference between your
data total= and your data used=
I mean this could easily be "situation normal" if your output looks like
"Data, single: total=3TiB, used=1.5TiB" or something.
>>> I don't know how to interpret the space_info error. Why is only
>>> 4773171200 (4,4GiB) free?
>>> Can I inspect block group 1821099687936 to try to find out what makes
>>> it problematic?
>>>
>>> BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
>>> BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
>>> BTRFS: space_info 1 has 4773171200 free, is not full
>>> BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
>>> reserved=99700736, may_use=2102390784, readonly=241664
>>
>>
>> So it was looking for a single chunk 2013265920 bytes long and it couldn't
>> find one because all the spaces were smaller and there was no room to make a
>> new suitable space.
>>
>> The problem is that it wanted 2013265920 bytes and the system as a
>> whole had no way to satisfy that desire. It asked for something just shy of
>> two gigs as a single extent. That's a tough order on a full platter.
>>
>> Since your entire free size is 2102390784 that is an attempt to allocate
>> about 80% of your free space as one contiguous block. That's never going to
>> happen. 8-)
>
> What about "space_info 1 has 4773171200 free"? Besides the other 1,5TB
> free space.
The "1" is the drive. That 4773171200 is not contiguous. I didn't look
much further in the code because it's a new code base to me. But its
asking for one contiguous extent of size 2013265920 and that's a
non-starter for me. With the odd sized chunks possible after a
conversion ... well pshaw...
>> I don't even know if 2GiB is normally a legal size for an extent. My
>> understanding is that data is allocated in 1G chunks, so I'd expect all
>> extents to be smaller than 1G.
>
> The 'summary' after the failed balances is always something like "98
> enospc errors" which now makes me suspect that I have 98 files with
> extents larger than 1GiB that the defrag didn't take care of.
Files? No, extents, a.k.a. "chunks", whatever those are after a
conversion. Room for extents is different from room for files.
> So if I can find out which files have >1GiB extents I can then copy
> them back and forth to solve the problem.
Deck Chairs. You are playing a game of musical deck chairs. Don't
obsess. 8-)
> Maybe running defrag more times can also solve it? Can I get a list of
> fragmented files?
I wouldn't expect defrag to do a thing about this. The extents in the
extent tree are not necessarily for single files. (they might _never_ be
for single files.)
> Suppose an old file with 2GiB extent isn't fragmented, will btrfs
> defrag still try to defrag it?
No idea. I'd think it would not move something that is already
contiguous. This isn't windows where the defrager itself leaves
micro-fragments after each file. (Don't get me started on that nonsense. 8-)
>> After a quick glance at the btrfs-convert, it looks like it might make some
>> pretty atypical extents if the underlying donor filesystem needed needed
>> them. It wouldn't have had a choice. So it's easily within the realm of
>> reason that you'd have some really fascinating data as a result of
>> converting a nearly full EXT4 file system of the Terabyte+ size.
>
> It was about half full at conversion.
Being the opposite of an expert on btrfs-convert I can't help wondering
where the threshold of discrimination is. I mean did it just take every
single block group and toss it into a separate extent with no eye to the
actual contents? That would be valid and fast. Then the allocation maps
per-file would handle you re-using the referenced space etc.
>> This would
>> be quadruply true if you'd tweaked the block group ratios when you made the
>> original file system.
>
> Ext4 created with defaults, but I think it has been completely full at one time.
Did you use e4defrag before you did the conversion or is this the result
of converting chaos most profound?
>> So since you have nice backups... you should probably drop the ext2_saved
>> subvolume and then get on with your life for good or ill.
>
> Done before defrag and balance attempts.
Good job.
>> Think of the time and worry you'd have saved if you'd copied the thing in
>> the first place. 8-)
>
> But then I wouldn't learn as much. :-)
Learning not to cut corners is a lesson... 8-)
>>>> P.S. you should re-balance your System and Metadata as "DUP" for now. Two
>>>> copies of that stuff is better than one as right now you have no real
>>>> recovery path for that stuff. If you didn't make that change on purpose
>>>> it
>>>> probably got down-revved from DUP automagically when you tried to RAID
>>>> it.
>>>
>>>
>>> Good point. Maybe btrfs-convert should do that by default? I don't
>>> think it has ever been DUP.
>>
>> Eyup.
>
> And the metadata is now DUP. That's ~1.5GB extra metadata that was
> allocated just fine after the failed balance.
More evidence that you are just trying to swallow a brick. Metadata is
done in like 256Mb chunks I think, so yea, lots of room for that left
sitting around on a typical EXT4 etc.
TRUTH BE TOLD :: After two very "eventful" conversions not too long ago
I just don't do those any more. The total amount of time I "saved" by
not copying the files was in the negative numbers before I just copied
the files onto an external media and reformatted and restored.
Additionally I got the chance to lay out my subvolumes and decide about
compression and such before doing the restore.
With a new filesystem I knew exactly what I was getting for a layout and
I've had no mysteries since.
I don't know if that's politic to say in this list but really, most
format conversions I've ever done (hearkening all the way back to some
9-track tape excitement in the eighties) usually leave me feeling like
maybe I hacked the corners off a cardboard box with a machete to make it
fit in under a sofa.
Then again I am getting old and sometimes it's easier to just chase kids
off your lawn. 8-)
SO .....
What I'd do, most to least likely.
(0) look at df and btrfs fi df output and see if I could account for the
free space I expected. If it's there I'd post a "oh hey, look at that"
message on the list and then move on to one of the latter options.
then
(1) Make a new FS on those other drives and copy my working set onto
it; that way I get all the defaults in sizes and extents and it will all
be nice round numbers like 1G and 256Mb, because some day I might be
adding in SSDs or something.
(1a) Then I could maybe keep the old drive and dissect its contents for
fun and knowledge.
(1b) Then I could just add the old drive into the new array once I
needed the storage.
or else
(2) Hook up the new drives and add them into the existing filesystem.
Then balance everything (see the sketch after this list).
(2a) Look at the extent maps after that and discover that I still had
odd 2-ish gig extents and silently fume at the asymmetry.
or else
(3) Keep fiddling with it till I got frustrated then go back to one of
the prior options. 8-)
It's like a choose-your-own-adventure book! 8-)
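A sketch of option (2), with made-up device names for the four added
drives and raid1 as the target since that was the stated plan:
  btrfs device add /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /mnt
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt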
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-11 8:42 ` Robert White
@ 2014-12-11 9:02 ` Duncan
2014-12-11 9:55 ` Patrik Lundquist
1 sibling, 0 replies; 36+ messages in thread
From: Duncan @ 2014-12-11 9:02 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Thu, 11 Dec 2014 00:42:38 -0800 as excerpted:
> TRUTH BE TOLD :: After two very "eventful" conversions not too long ago
> I just don't do those any more. The total amount of time I "saved" by
> not copying the files was in the negative numbers before I just copied
> the files onto an external media and reformatted and restored.
While I was running reiserfs and thus wasn't a conversion candidate, I
have basically the same opinion of the ext* -> btrfs conversion tool.
It's for people who don't have the extra space resources necessary to do
a full backup, wipe clean and set it up the way you like, then restore.
That said, the conversion and subsequent btrfs troubleshooting has
certainly been a "real world" learning experience for you (Patrik, not
Robert as quoted above), and while I'd certainly start clean when I was
really going to do it, for initially playing around, learning the tools,
some troubleshooting, etc, and just to be able to say I've tried the
conversion, I could easily see myself spending some time doing what you
did, just learning the ropes, etc. When I was done playing and ready to
do it for real, I'd wipe the playground and start clean, more confident
in my setup and management since I had spent some time playing with it
and familiarizing myself with how it worked and what might work best in
terms of my own setup.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-11 8:42 ` Robert White
2014-12-11 9:02 ` Duncan
@ 2014-12-11 9:55 ` Patrik Lundquist
2014-12-11 11:01 ` Robert White
1 sibling, 1 reply; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-11 9:55 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 11 December 2014 at 09:42, Robert White <rwhite@pobox.com> wrote:
> On 12/10/2014 05:36 AM, Patrik Lundquist wrote:
>>
>> On 10 December 2014 at 13:17, Robert White <rwhite@pobox.com> wrote:
>>>
>>> On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
>>>>
>>>>
>>> BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
>>> filesystem. That is, the recommended balance (and recursive defrag) is
>>> _not_
>>> a usability issue, it's an efficiency issue.
>>
>>
>> But if I can't start with an efficient filesystem I'd rather start
>> over now/soon. I intend to add four more old disks for a RAID1 and it
>> will be problematic to start over later on (I'd have to buy new, large
>> disks).
>
>
> Nope, not an issue.
>
> When you add the space and rebalance with the conversions by adding all
> those other disks and such it will _completely_ _obliterate_ the current
> balance.
But if the issue is too large extents, why would they fit on any added
btrfs space?
> You are cleaning the house before the maid comes.
Indeed, as a health check. And the patient is slightly ill.
> If you are going to add four more volumes, if those volumes are big enough
> just make a new filesystem on them then copy the files over.
As it looks now, I will, but I also think there's a bug which I'm
trying to zero in on.
>> I deleted the subvolume after being satisfied with the conversion,
>> defragged recursively, and balanced. In that order.
>
> Yea, but your file system is full and you are out of space so get on with
> the adding space.
I don't think it is full. balance -musage=100 -dusage=99 completes
with ~1.5TB free space. The remaining unbalanced data is using full or
close to full blocks. Still can't speak for contiguous space though.
> (looking back through my mail spool) You haven't sent the output of /bin/df
> or btrfs fi df yet, I'd like to see what those two commands say.
I have posted these before, but not /bin/df (no access at the moment).
btrfs fi show
Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
Total devices 1 FS bytes used 1.35TiB
devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
btrfs fi df /mnt
Data, single: total=1.35TiB, used=1.35TiB
System, single: total=32.00MiB, used=112.00KiB
Metadata, single: total=3.00GiB, used=1.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
btrfs check /dev/sdc1
Checking filesystem on /dev/sdc1
UUID: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
found 825003219475 bytes used err is 0
total csum bytes: 1452612464
total tree bytes: 1669943296
total fs tree bytes: 39600128
total extent tree bytes: 52903936
btree space waste bytes: 79921034
file data blocks allocated: 1487627730944
referenced 1487627730944
>>> This would
>>> be quadruply true if you'd tweaked the block group ratios when you made
>>> the original file system.
>>
>> Ext4 created with defaults, but I think it has been completely full at one
>> time.
>
> Did you use e4defrag before you did the conversion or is this the result of
> converting chaos most profound?
Didn't use e4defrag.
>>> Think of the time and worry you'd have saved if you'd copied the thing in
>>> the first place. 8-)
>>
>> But then I wouldn't learn as much. :-)
>
> Learning not to cut corners is a lesson... 8-)
This is more of an experiment than cutting corners, but yeah.
> TRUTH BE TOLD :: After two very "eventful" conversions not too long ago I
> just don't do those any more. The total amount of time I "saved" by not
> copying the files was in the negative numbers before I just copied the files
> onto an external media and reformatted and restored.
Conversion probably should be discouraged on the wiki then.
> It's like a choose-your-own-adventure book! 8-)
I like that! :-)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-11 4:13 ` Duncan
@ 2014-12-11 10:29 ` Patrik Lundquist
0 siblings, 0 replies; 36+ messages in thread
From: Patrik Lundquist @ 2014-12-11 10:29 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 11 December 2014 at 05:13, Duncan <1i5t5.duncan@cox.net> wrote:
>
> Patrik correct me if I have this wrong, but filling in the history as I
> believe I have it...
You're right Duncan, except it began as a private question about an
error in a blog and went from there. Not that it matters, except the
subject is not very fitting anymore and I tried to reboot the thread
with a summary since it's getting a bit hard to find the facts.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Fixing Btrfs Filesystem Full Problems typo?
2014-12-11 9:55 ` Patrik Lundquist
@ 2014-12-11 11:01 ` Robert White
0 siblings, 0 replies; 36+ messages in thread
From: Robert White @ 2014-12-11 11:01 UTC (permalink / raw)
To: Patrik Lundquist, linux-btrfs@vger.kernel.org
On 12/11/2014 01:55 AM, Patrik Lundquist wrote:
> On 11 December 2014 at 09:42, Robert White <rwhite@pobox.com> wrote:
>> On 12/10/2014 05:36 AM, Patrik Lundquist wrote:
>>>
>>> On 10 December 2014 at 13:17, Robert White <rwhite@pobox.com> wrote:
>>>>
>>>> On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
>>>>>
>>>>>
>>>> BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
>>>> filesystem. That is, the recommended balance (and recursive defrag) is
>>>> _not_
>>>> a usability issue, it's an efficiency issue.
>>>
>>>
>>> But if I can't start with an efficient filesystem I'd rather start
>>> over now/soon. I intend to add four more old disks for a RAID1 and it
>>> will be problematic to start over later on (I'd have to buy new, large
>>> disks).
>>
>>
>> Nope, not an issue.
>>
>> When you add the space and rebalance with the conversions by adding all
>> those other disks and such it will _completely_ _obliterate_ the current
>> balance.
>
> But if the issue is too large extents, why would they fit on any added
> btrfs space?
Because that added btrfs space will be _empty_. It's not that the extent
is "too big" by some absolute measure. It's that it's too big to fit in
the available space at the _extent_ _tree_ level.
You can't put two feet into one shoe.
>> You are cleaning the house before the maid comes.
>
> Indeed, as a health check. And the patient is slightly ill.
Not really...
So let's say I have a bunch of things that are all size 10-inches
And lets say I space them along a rail with 9-inches between each object.
And I glue them down (because Copy On Write only)
And I do that until the rail is "full", say it takes 100 to fill the rail.
So I still have 900 inches of "free space" but I don't have _any_ _more_
_room_ available if I need to mount another 10-inch item.
There's plenty of space but there is no room.
This is what you've got going on.
The conversion hoovered up all the block groups from the ext4 donor
image more-or-less, and then it built the metadata blocks
(see btrfs-convert at about line 1486)
/* for each block group, create device extent and chunk item */
etc...
>> If you are going to add four more volumes, if those volumes are big enough
>> just make a new filesystem on them then copy the files over.
>
> As it looks now, I will, but I also think there's a bug which I'm
> trying to zero in on.
It doesn't exist. There is no bug that I can see from anything you've shown.
You are confusing the word "extent" as used in ext4, which is a per-file
thing, with the word "extent" as used differently in btrfs which is a
raw storage region into which other structures or data is placed.
>>> I deleted the subvolume after being satisfied with the conversion,
>>> defragged recursively, and balanced. In that order.
>>
>> Yea, but your file system is full and you are out of space so get on with
>> the adding space.
>
> I don't think it is full. balance -musage=100 -dusage=99 completes
> with ~1.5TB free space. The remaining unbalanced data is using full or
> close to full blocks. Still can't speak for contiguous space though.
>
>
>> (looking back through my mail spool) You haven't sent the output of /bin/df
>> or btrfs fi df yet, I'd like to see what those two commands say.
>
> I have posted these before, but not /bin/df (no access at the moment).
Ah, yes, I remember these, but the /bin/df is what's going to be
dispositive.
> btrfs fi show
> Label: none uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
> Total devices 1 FS bytes used 1.35TiB
> devid 1 size 2.73TiB used 1.36TiB path /dev/sdc1
>
>
> btrfs fi df /mnt
> Data, single: total=1.35TiB, used=1.35TiB
> System, single: total=32.00MiB, used=112.00KiB
> Metadata, single: total=3.00GiB, used=1.55GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> btrfs check /dev/sdc1
> Checking filesystem on /dev/sdc1
> UUID: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
> found 825003219475 bytes used err is 0
> total csum bytes: 1452612464
> total tree bytes: 1669943296
> total fs tree bytes: 39600128
> total extent tree bytes: 52903936
> btree space waste bytes: 79921034
> file data blocks allocated: 1487627730944
> referenced 1487627730944
>>>> This would
>>>> be quadruply true if you'd tweaked the block group ratios when you made
>>>> the original file system.
>>>
>>> Ext4 created with defaults, but I think it has been completely full at one
>>> time.
>>
>> Did you use e4defrag before you did the conversion or is this the result of
>> converting chaos most profound?
>
> Didn't use e4defrag.
Probably doesn't matter. Now that I've read more of btrfs-convert.c I
think I can see how this is shaking out. e4defrag might have packed the
block groups tighter but it doesn't really try to maximize free space
within the extent.
>>>> Think of the time and worry you'd have saved if you'd copied the thing in
>>>> the first place. 8-)
>>>
>>> But then I wouldn't learn as much. :-)
>>
>> Learning not to cut corners is a lesson... 8-)
>
> This is more of an experiment than cutting corners, but yeah.
>
>
>> TRUTH BE TOLD :: After two very "eventful" conversions not too long ago I
>> just don't do those any more. The total amount of time I "saved" by not
>> copying the files was in the negative numbers before I just copied the files
>> onto an external media and reformatted and restored.
>
> Conversion probably should be discouraged on the wiki then.
I didn't pursue the wiki on the matter, but conversion of anything to
anything always requires living with the limits of both, at least to
start. In this case you are suffering under the burden of the block
group alignment and layout that was selected by mkfs.ext4, which is
based on assumptions optimal to ext4.
Systems are _rarely_ replaced by other systems based on the same
assumptions.
As a terrible aside example, EXT4 says it can support file extent sizes
up to two gig. But that assumes your CPU memory page size is 64k. On a
typical Intel PC the page size is 4k, so your maximum extent size is
1/16th that size (128 MiB). I filed a bug on that some time ago because e4defrag
output didn't take that into account.
e.g. http://sourceforge.net/p/e2fsprogs/bugs/314/
The mythology of that two-gig file extent has people allocating VM drive
stripes and rdbms files (etc) in two-gig chunks thinking they are
optimally aligning things with their drive allocations. But when they do
it on an intel box they are wrong. Those extents should have been 128Meg
if they wanted one file equals one extent layouts.
So assumptions in systems can become pernicious, and when you try to do
any sort of in-place conversion you are likely to end up with the least
of all worlds.
The devils are always in the details.
Heck, we are still dragging around head/track/sector disk geometry
nonsense despite variable pitch recording performed on modern drives.
That's because we just keep converting old ideas to new.
My "eventful" conversions of those two disks may well have been (and
probably were) completely my own doing. It's a poor craftsman that blames his tools.
My house is a mess. My computers tend to be scrupulously organized. And
the result from btrfs-convert just doesn't seem optimal for all future
geometries. After all, if the default extent sizes of 0x1000 and 0x8000 were
chosen for optimal cause (instead of beauty), ending up with a bunch of
two-gig-ish extents would oppose that cause.
It just feels ookie to use btrfs-convert in _my_ _humble_ _opinion_.
>> It's like a choose-your-own-adventure book! 8-)
>
> I like that! :-)
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2014-12-11 11:01 UTC | newest]
Thread overview: 36+ messages
[not found] <CAA7pwKNH-Cbd+_D+sCEJxxdervLC=_3_AzaywSE3mXi8MLydxw@mail.gmail.com>
2014-11-22 22:26 ` Fixing Btrfs Filesystem Full Problems typo? Marc MERLIN
2014-11-22 23:26 ` Patrik Lundquist
2014-11-22 23:46 ` Marc MERLIN
2014-11-23 0:05 ` Hugo Mills
2014-11-23 1:07 ` Marc MERLIN
2014-11-23 7:52 ` Duncan
2014-11-23 15:12 ` Patrik Lundquist
2014-11-24 4:23 ` Duncan
2014-11-24 12:35 ` Patrik Lundquist
2014-12-09 22:29 ` Patrik Lundquist
2014-12-09 23:13 ` Robert White
2014-12-10 7:19 ` Patrik Lundquist
2014-12-10 12:17 ` Robert White
2014-12-10 13:11 ` Duncan
2014-12-10 18:56 ` Patrik Lundquist
2014-12-10 22:28 ` Robert White
2014-12-11 4:13 ` Duncan
2014-12-11 10:29 ` Patrik Lundquist
2014-12-11 6:16 ` Patrik Lundquist
2014-12-10 13:36 ` Patrik Lundquist
2014-12-11 8:42 ` Robert White
2014-12-11 9:02 ` Duncan
2014-12-11 9:55 ` Patrik Lundquist
2014-12-11 11:01 ` Robert White
2014-12-09 23:20 ` Robert White
2014-12-09 23:48 ` Robert White
2014-12-10 0:01 ` Robert White
2014-12-10 12:47 ` Duncan
2014-12-10 20:11 ` Patrik Lundquist
2014-12-11 4:02 ` Duncan
2014-12-11 4:49 ` Duncan
2014-11-23 21:16 ` Marc MERLIN
2014-11-23 22:49 ` Holger Hoffstätte
2014-11-24 4:40 ` Duncan
2014-12-07 21:38 ` Marc MERLIN
2014-11-24 18:05 ` Brendan Hide