Linux Btrfs filesystem development
* counting fragments takes more time than defragmenting
@ 2015-06-04  8:42 Marc MERLIN
  2015-06-24  3:20 ` Marc MERLIN
  0 siblings, 1 reply; 13+ messages in thread
From: Marc MERLIN @ 2015-06-04  8:42 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

Hi Chris,

After our quick chat, I gave it a shot on 3.19.6, and things are better
than last time I tried.

legolas:/var/local/nobck/VirtualBox VMs# lsattr Win7/
---------------C Win7/Logs
---------------C Win7/Snapshots
---------------C Win7/Win7.vdi
---------------C Win7/Win7.png
---------------C Win7/autotune1.png
---------------C Win7/new_autotune2.png
---------------C Win7/Win7.vbox-prev
---------------C Win7/Win7.vbox

But I have snapshots of that subvolume, so obviously that gets
in the way of disabling COW.

I had a look, and I have 100K fragments. That took 10 minutes to figure out:

legolas:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi
Win7.vdi: 104306 extents found

This first filefrag run took about 10 minutes to count all the fragments on
my SSD. That feels a bit slow, but maybe the userland tool is doing things
in suboptimal ways.

Defrag actually worked (mostly) and wasn't too slow. It used to run for hours
without finishing, and now it completed in under 4 minutes:
legolas:/var/local/nobck/VirtualBox VMs/Win7# time btrfs fi defrag Win7.vdi 
real	3m43.807s
user	0m0.000s
sys	0m44.044s

This is definitely better than before.
Note that it's not fully defragged, but close enough. Each subsequent
run, filefrag is faster, and defrag is still faster than filefrag:

legolas:/var/local/nobck/VirtualBox VMs/Win7# time filefrag Win7.vdi
Win7.vdi: 11428 extents found
real	2m42.090s
user	0m0.000s
sys	2m37.308s

legolas:/var/local/nobck/VirtualBox VMs/Win7# time btrfs fi defrag Win7.vdi 
real	0m7.483s
user	0m0.000s
sys	0m2.672s

legolas:/var/local/nobck/VirtualBox VMs/Win7# time filefrag Win7.vdi
Win7.vdi: 11132 extents found
real	0m22.525s
user	0m0.000s
sys	0m22.264s

It's a bit unexpected that I still have 10k fragments after 2 defrag
runs, but it's better than 100k :)
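
For reference, the per-extent cost implied by the three filefrag runs above can be worked out directly. This is only a back-of-the-envelope sketch: the extent counts and the last two timings are copied from the output above, while the 600 s used for the first run is an assumption standing in for "about 10 minutes".

```python
# Extent counts and wall-clock times from the filefrag runs in this message.
# The first run was only reported as "about 10 minutes", so 600 s is assumed.
runs = [
    ("first filefrag", 104306, 600.0),
    ("second filefrag", 11428, 162.090),
    ("third filefrag", 11132, 22.525),
]

# Extents counted per second of wall-clock time, for each run.
rates = {name: extents / seconds for name, extents, seconds in runs}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} extents/s")
```

The second and third runs count almost the same number of extents at very different rates, which suggests the time filefrag needs is not a simple function of the extent count.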

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: counting fragments takes more time than defragmenting
  2015-06-04  8:42 counting fragments takes more time than defragmenting Marc MERLIN
@ 2015-06-24  3:20 ` Marc MERLIN
  2015-06-24  8:28   ` Patrik Lundquist
  0 siblings, 1 reply; 13+ messages in thread
From: Marc MERLIN @ 2015-06-24  3:20 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

Hello again,

Just curious, is anyone seeing similar things with big VM images or other
DBs?
I forgot to mention that my vdi file is 88GB.

It's surprising that it took longer to count the fragments than to actually
defragment the file.
Or that it took 3 defrag runs to get down to 11K extents from 104K.

Are others seeing similar things?

Marc




* Re: counting fragments takes more time than defragmenting
  2015-06-24  3:20 ` Marc MERLIN
@ 2015-06-24  8:28   ` Patrik Lundquist
  2015-06-24 10:46     ` Duncan
  0 siblings, 1 reply; 13+ messages in thread
From: Patrik Lundquist @ 2015-06-24  8:28 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Chris Mason, linux-btrfs

On 24 June 2015 at 05:20, Marc MERLIN <marc@merlins.org> wrote:
>
> Hello again,
>
> Just curious, is anyone seeing similar things with big VM images or other
> DBs?
> I forgot to mention that my vdi file is 88GB.
>
> It's surprising that it took longer to count the fragments than to actually
> defragment the file.
> Or that it took 3 defrag runs to get down to 11K extents from 104K.
>
> Are others seeing similar things?

Filefrag is pretty much instant for my 30GB (150 extents) virtual
disk, no CoW on file, no snapshots on volume.

But what doesn't make sense to me is btrfs fi defrag; the -t option says

           -t <size>
               defragment only files at least <size> bytes big

The -t value goes into struct
btrfs_ioctl_defrag_range_args.extent_thresh which is documented as

        /*
         * any extent bigger than this will be considered
         * already defragged.  Use 0 to take the kernel default
         * Use 1 to say every single extent must be rewritten
         */

Default extent_thresh is 256K. I can't see how 1 would say every
single extent must be rewritten. On the contrary; 1 skips every
extent. The compress flag even sets extent_thresh=(u32)-1 to force a
rewrite.
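
The point can be illustrated with a literal reading of the quoted comment. This is a simplified model under stated assumptions, not the kernel's actual code: the function name `considered_defragged` is hypothetical, the 256K default and the `(u32)-1` value are taken from the text above, and any further conditions the kernel applies are ignored.

```python
KERNEL_DEFAULT_THRESH = 256 * 1024   # default quoted in the message above
U32_MAX = 0xFFFFFFFF                 # (u32)-1, as set by the compress flag


def considered_defragged(extent_size, extent_thresh):
    """Literal reading of the comment: bigger than the threshold means done."""
    if extent_thresh == 0:           # 0 selects the kernel default
        extent_thresh = KERNEL_DEFAULT_THRESH
    return extent_size > extent_thresh


# Which sample extent sizes would be skipped for each threshold value.
for thresh in (0, 1, U32_MAX):
    skipped = [size for size in (4096, 128 * 1024, 1 << 20)
               if considered_defragged(size, thresh)]
    print(thresh, "skips", skipped)
```

Under this reading a threshold of 1 skips every realistic extent, while the maximum 32-bit value skips none of the sample sizes.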

Marc, try btrfs fi defrag -t 4294967295 Win7.vdi for maximum defrag
and time filefrag again with fewer extents.

/Patrik




* Re: counting fragments takes more time than defragmenting
  2015-06-24  8:28   ` Patrik Lundquist
@ 2015-06-24 10:46     ` Duncan
  2015-06-24 12:05       ` Patrik Lundquist
  2015-07-14 11:57       ` Patrik Lundquist
  0 siblings, 2 replies; 13+ messages in thread
From: Duncan @ 2015-06-24 10:46 UTC (permalink / raw)
  To: linux-btrfs

Patrik Lundquist posted on Wed, 24 Jun 2015 10:28:09 +0200 as excerpted:

> But what doesn't make sense to me is btrfs fi defrag; the -t option says
> 
>            -t <size>
>                defragment only files at least <size> bytes big
> 
> The -t value goes into struct
> btrfs_ioctl_defrag_range_args.extent_thresh which is documented as
> 
>        /*
>         * any extent bigger than this will be considered
>         * already defragged.  Use 0 to take the kernel default
>         * Use 1 to say every single extent must be rewritten
>         */
> 
> Default extent_thresh is 256K. I can't see how 1 would say every single
> extent must be rewritten. On the contrary; 1 skips every extent. The
> compress flag even sets extent_thresh=(u32)-1 to force a rewrite.
> 
> Marc, try btrfs fi defrag -t 4294967295 Win7.vdi for maximum defrag and
> time filefrag again with fewer extents.

The manpage wording for btrfs fi defrag -t has been debated on-list 
several times, and I believe it remains confusing as of btrfs-progs v4.1.

First, under the general defragment description, before the individual 
options, it says:

>>>> Any extent bigger than threshold given by -t option, will be
>>>> considered already defragged. Use 0 to take the kernel default.

So according to that, an extent BIGGER than -t is treated as already 
defragged (it defrags SMALLER extents)

But, under the -t option, it says:

>>>> -t <size>[kKmMgGtTpPeE]
>>>>     defragment only files at least <size> bytes big

So according to that, only extents BIGGER than -t are defragged (smaller 
is ignored).

Again, that's the btrfs-filesystem (8) manpage as of -progs 4.1.

So which is it?  The manpage itself can't make up its mind.


AFAIK, it's set huge to defrag everything, but last time I posted on this 
I got it wrong, and I don't remember for sure what I said then, so... try 
it and see to be sure, which is what I'd do.

Meanwhile, it's worth noting that btrfs data chunks are normally 1 GiB 
(tho apparently they can be bigger under certain circumstances).  1 
extent per chunk is the best btrfs normally does, which means 1 GiB per 
extent is nominally the best that can be done, with the first and last 
extent possibly less than a gig (the first taking up the remainder of a 
partially used chunk and the last finishing up the file, which probably 
won't end on an even chunk boundary).

Assuming "set a huge -t to defrag to the maximum extent possible" is 
correct, that means -t 1G should be exactly as effective as -t 1T...

Regardless of whether 1 or huge -t means maximum defrag, however, the 
nominal data chunk size of 1 GiB means that 30 GiB file you mentioned 
should be considered ideally defragged at 31 extents.  This is a 
departure from ext4, which AFAIK in theory has no extent upper limit, so 
should be able to do that 30 GiB file in a single extent.

But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents 
still indicates at least some remaining fragmentation.
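
The "31 extents" figure can be reproduced with a small sketch. It assumes, as the paragraphs above do, that no extent crosses a 1 GiB data chunk boundary; `ideal_extents` and its `first_chunk_free` parameter are hypothetical names introduced only for this illustration.

```python
GIB = 1 << 30


def ideal_extents(file_size, first_chunk_free=GIB, chunk_size=GIB):
    """Fewest extents if no extent may cross a chunk boundary.

    first_chunk_free is the space left in the partially used chunk
    where the file starts (a full chunk if the file starts on a boundary).
    """
    first = min(file_size, first_chunk_free)
    rest = file_size - first
    return 1 + -(-rest // chunk_size)     # 1 + ceiling division


# A 30 GiB file starting halfway into a chunk, then on a chunk boundary.
print(ideal_extents(30 * GIB, first_chunk_free=GIB // 2))
print(ideal_extents(30 * GIB))
```

With a partially used first chunk the model gives 31 extents for a 30 GiB file, and 30 when the file happens to start on a chunk boundary.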

Finally, last I remember, filefrag didn't understand btrfs compression 
(which is off for nocow, so this shouldn't apply there), which uses 128 
KiB blocks IIRC.  Until it does, large btrfs-compressed files will always 
show many extents (8 per MiB, so thousands on anything even close to a GiB, 
and tens of thousands on multiple GiBs).  But I believe there had been some 
work to teach filefrag about btrfs compression, tho I don't know if it 
has made it into an e2fsprogs release, yet.  If so, it'll be pretty close 
to the latest release.  So anything but the latest filefrag won't be 
accurate with btrfs-compressed files, while the latest may now be 
accurate, or not, I'm not sure.  I guess one could check e2fsprogs' 
release notes...
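
The "8 per MiB" arithmetic can be checked with a short sketch. It assumes the 128 KiB compressed block size recalled above and that an older filefrag reports each such block as a separate extent; `reported_extents` is a hypothetical helper name.

```python
COMPRESSED_BLOCK = 128 * 1024   # block size as recalled in the message ("IIRC")


def reported_extents(file_size):
    """Extent count if every 128 KiB compressed block is reported separately."""
    return -(-file_size // COMPRESSED_BLOCK)   # ceiling division


for label, size in (("1 MiB", 1 << 20), ("1 GiB", 1 << 30), ("4 GiB", 4 << 30)):
    print(label, reported_extents(size))
```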

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: counting fragments takes more time than defragmenting
  2015-06-24 10:46     ` Duncan
@ 2015-06-24 12:05       ` Patrik Lundquist
  2015-06-25  4:01         ` Duncan
  2015-07-14 11:57       ` Patrik Lundquist
  1 sibling, 1 reply; 13+ messages in thread
From: Patrik Lundquist @ 2015-06-24 12:05 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

On 24 June 2015 at 12:46, Duncan <1i5t5.duncan@cox.net> wrote:
> Patrik Lundquist posted on Wed, 24 Jun 2015 10:28:09 +0200 as excerpted:
>
> AFAIK, it's set huge to defrag everything,

It's set to 256K by default.


> Assuming "set a huge -t to defrag to the maximum extent possible" is
> correct, that means -t 1G should be exactly as effective as -t 1T...

1G is actually more effective because 1T overflows the uint32
extent_thresh field, so 1T, 0, and 256K are currently the same.

3G is the largest value that works with -t as expected (disregarding
the man page) and is easy to type.
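
The wrap-around can be sketched as follows. This is a model under stated assumptions rather than the btrfs-progs code: `parse_size` is a hypothetical stand-in for the real size parser, and the truncation to 32 bits and the 256K default are taken from the description in this thread.

```python
UINT32_MASK = 0xFFFFFFFF
KERNEL_DEFAULT = 256 * 1024
SUFFIX = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Hypothetical parser for values such as '256K', '3G' or '0'."""
    text = text.upper()
    if text[-1] in SUFFIX:
        return int(text[:-1]) * SUFFIX[text[-1]]
    return int(text)


def effective_thresh(text):
    """Value that takes effect after truncation to a 32-bit field."""
    raw = parse_size(text) & UINT32_MASK      # what fits in extent_thresh
    return KERNEL_DEFAULT if raw == 0 else raw


for arg in ("0", "256K", "1G", "3G", "4G", "1T"):
    print(arg, effective_thresh(arg))
```

In this model 4G and 1T both truncate to 0 and therefore fall back to the 256K default, while 3G still fits in 32 bits.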


> But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
> still indicates at least some remaining fragmentation.

I gave it another shot but I've now got 154 extents instead. :-)


* Re: counting fragments takes more time than defragmenting
  2015-06-24 12:05       ` Patrik Lundquist
@ 2015-06-25  4:01         ` Duncan
  2015-06-25  6:30           ` Patrik Lundquist
  0 siblings, 1 reply; 13+ messages in thread
From: Duncan @ 2015-06-25  4:01 UTC (permalink / raw)
  To: linux-btrfs

Patrik Lundquist posted on Wed, 24 Jun 2015 14:05:57 +0200 as excerpted:

> On 24 June 2015 at 12:46, Duncan <1i5t5.duncan@cox.net> wrote:
>> Patrik Lundquist posted on Wed, 24 Jun 2015 10:28:09 +0200 as
>> excerpted:
>>
>> AFAIK, it's set huge to defrag everything,
> 
> It's set to 256K by default.

What I meant is that, AFAIK, you set it huge to defrag everything...

>> Assuming "set a huge -t to defrag to the maximum extent possible" is
>> correct, that means -t 1G should be exactly as effective as -t 1T...
> 
> 1G is actually more effective because 1T overflows the uint32
> extent_thresh field, so 1T, 0, and 256K are currently the same.

Then the manpage needs some work (in addition to the more serious 
ambiguity over whether 1 or 1G means defrag everything), since it 
mentions up to petabyte (P), without any indication that setting anything 
that large won't work as expected.

If it's uint32 limited, either kill everything above that in both the 
documentation and code, or alias everything above that to 3G (your next 
paragraph) or whatever.

> 3G is the largest value that works with -t as expected (disregarding the
> man page) and is easy to type.
> 
> 
>> But btrfs or ext4, 31 extents ideal or a single extent ideal, 150
>> extents still indicates at least some remaining fragmentation.
> 
> I gave it another shot but I've now got 154 extents instead. :-)

Is it possible there's simply no gig-size free-space holes in the 
filesystem allocation, so it simply /can't/ defrag further than that, 
because there's no place to allocate whole-gig data chunks at a time?

Which brings up a more general defrag functionality question.  For
multi-gig files, does btrfs fi defrag allocate fresh data chunks in order to 
create the largest extents possible (possibly after filling the remainder 
of the original first chunk), thereby increasing data chunk allocation 
before fully using currently allocated chunks, or does it try to find the 
biggest extents possible in currently allocated chunks, first, and only 
allocate new chunks when all current allocation is full?

Obviously if it uses up current allocations first, that could explain 
your problem.  OTOH, if either defrag or general allocation strategy 
favors new chunks for large extents when necessary, that would explain 
the "deoptimization" some people report from running balance.




* Re: counting fragments takes more time than defragmenting
  2015-06-25  4:01         ` Duncan
@ 2015-06-25  6:30           ` Patrik Lundquist
  0 siblings, 0 replies; 13+ messages in thread
From: Patrik Lundquist @ 2015-06-25  6:30 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

On 25 June 2015 at 06:01, Duncan <1i5t5.duncan@cox.net> wrote:
>
> Patrik Lundquist posted on Wed, 24 Jun 2015 14:05:57 +0200 as excerpted:
>
> > On 24 June 2015 at 12:46, Duncan <1i5t5.duncan@cox.net> wrote:
>
> If it's uint32 limited, either kill everything above that in both the
> documentation and code, or alias everything above that to 3G (your next
> paragraph) or whatever.

My simple overflow patch from yesterday fixes the problem, so a value of
4G or larger now means the maximum instead of wrapping to 0.
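
The patch itself is not shown in this thread, so the following is only a sketch of the behaviour described: saturate at the largest 32-bit value instead of letting the number wrap. The name `clamp_thresh` is hypothetical.

```python
U32_MAX = 0xFFFFFFFF


def clamp_thresh(value):
    """Saturate at the largest 32-bit value instead of wrapping around."""
    return min(value, U32_MAX)


# 3G fits unchanged; 4G and 1T are clamped to the maximum.
for value in (3 << 30, 4 << 30, 1 << 40):
    print(value, "->", clamp_thresh(value))
```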


> >> But btrfs or ext4, 31 extents ideal or a single extent ideal, 150
> >> extents still indicates at least some remaining fragmentation.
> >
> > I gave it another shot but I've now got 154 extents instead. :-)
>
> Is it possible there's simply no gig-size free-space holes in the
> filesystem allocation, so it simply /can't/ defrag further than that,
> because there's no place to allocate whole-gig data chunks at a time?

I would guess so, without allocating new chunks. Defrag can probably
be smarter and avoid rewriting extents if it means splitting them
(unless the compression flag is set and it must rewrite everything).


* Re: counting fragments takes more time than defragmenting
  2015-06-24 10:46     ` Duncan
  2015-06-24 12:05       ` Patrik Lundquist
@ 2015-07-14 11:57       ` Patrik Lundquist
  2015-07-14 17:32         ` Duncan
  2015-07-14 18:41         ` Hugo Mills
  1 sibling, 2 replies; 13+ messages in thread
From: Patrik Lundquist @ 2015-07-14 11:57 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

On 24 June 2015 at 12:46, Duncan <1i5t5.duncan@cox.net> wrote:
>
> Regardless of whether 1 or huge -t means maximum defrag, however, the
> nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
> should be considered ideally defragged at 31 extents.  This is a
> departure from ext4, which AFAIK in theory has no extent upper limit, so
> should be able to do that 30 GiB file in a single extent.
>
> But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
> still indicates at least some remaining fragmentation.

So I converted the VMware VMDK file to a VirtualBox VDI file:

-rw------- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
-rw------- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi

$ filefrag Windows7.vdi
Windows7.vdi: 15 extents found

$ btrfs filesystem defragment -t 3g Windows7.vdi
$ filefrag Windows7.vdi
Windows7.vdi: 24 extents found

How can it be less than 28 extents with a chunk size of 1 GiB?

E2fsprogs version 1.42.12
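
The figure of 28 and the reason the observed counts are surprising can both be derived from the file size listed above. The sketch assumes a strict 1 GiB upper limit per extent and takes filefrag's counts at face value.

```python
GIB = 1 << 30
size = 28993126400           # Windows7.vdi, from the ls output above

# Fewest extents possible if no extent may exceed 1 GiB.
whole_gib, remainder = divmod(size, GIB)
min_extents_at_1gib = whole_gib + (1 if remainder else 0)
print("minimum with 1 GiB extents:", min_extents_at_1gib)

# Average extent size for the two filefrag results (before and after defrag).
for count in (15, 24):
    print(count, "extents -> average", round(size / count / GIB, 2), "GiB")
```

Both averages come out above 1 GiB, so in each layout at least one extent must be larger than 1 GiB if the counts are accurate.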


* Re: counting fragments takes more time than defragmenting
  2015-07-14 11:57       ` Patrik Lundquist
@ 2015-07-14 17:32         ` Duncan
  2015-07-14 18:41         ` Hugo Mills
  1 sibling, 0 replies; 13+ messages in thread
From: Duncan @ 2015-07-14 17:32 UTC (permalink / raw)
  To: linux-btrfs

Patrik Lundquist posted on Tue, 14 Jul 2015 13:57:07 +0200 as excerpted:

> On 24 June 2015 at 12:46, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> Regardless of whether 1 or huge -t means maximum defrag, however, the
>> nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
>> should be considered ideally defragged at 31 extents.  This is a
>> departure from ext4, which AFAIK in theory has no extent upper limit,
>> so should be able to do that 30 GiB file in a single extent.
>>
>> But btrfs or ext4, 31 extents ideal or a single extent ideal, 150
>> extents still indicates at least some remaining fragmentation.
> 
> So I converted the VMware VMDK file to a VirtualBox VDI file:
> 
> -rw------- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
> -rw------- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi
> 
> $ filefrag Windows7.vdi
> Windows7.vdi: 15 extents found
> 
> $ btrfs filesystem defragment -t 3g Windows7.vdi
> $ filefrag Windows7.vdi
> Windows7.vdi: 24 extents found
> 
> How can it be less than 28 extents with a chunk size of 1 GiB?
> 
> E2fsprogs version 1.42.12

That's why I said "nominal"[1] 1 GiB.  I'm just a list and filesystem 
user, not a dev, and I don't know the details, but someone (a dev or at 
least someone that can actually read code, but not a btrfs dev) mentioned 
in reply to a post of mine a few months ago, that under the right 
conditions, btrfs can allocate larger-than 1 GiB data chunks.

I /believe/ data chunk allocation size has something to do with the 
amount of unallocated space on the filesystem; that on large (TiB plus, 
perhaps) btrfs some of the initial allocations will be multiple GiB, 
which of course would allow greater-than 1 GiB extents as well.  But I 
really don't know the conditions under which that can happen and I've not 
seen an actual btrfs dev comment on it, and AFAIK the "base" data chunk 
size remains 1 GiB under most conditions.  Meanwhile, I tend to partition 
up my storage here, and while I have multiple separate btrfs, the 
partitions are all under 50 GiB, so I'm unlikely to see that sort of > 1 
GiB data chunk allocations at all, here.

So rather than go to the complexity of explaining all this detail that 
I'm not sure of anyway, I deliberately blurred out a bit as not necessary 
to the primary point, which was that for files over a GiB, don't expect 
to see or be able to defrag to a single extent, as 1 GiB data chunks and 
thus extents are nominal/normal.

If it does happen, I'd consider it due to those data "superchunks" and 
wouldn't be entirely surprised, but the point remains that you're 
unlikely to get the number of extents much below the file size number in 
GiB using defrag, even when everything is working "perfectly as designed".

---
[1] Nominal: In the sense of normal or standard as-designed value, see 
wiktionary's English adjective sense 6 and 10, as well as the wikipedia 
writeups on real vs. nominal values and nominal size:

https://en.wiktionary.org/wiki/nominal#Adjective
https://en.wikipedia.org/wiki/Real_versus_nominal_value
https://en.wikipedia.org/wiki/Nominal_size




* Re: counting fragments takes more time than defragmenting
  2015-07-14 11:57       ` Patrik Lundquist
  2015-07-14 17:32         ` Duncan
@ 2015-07-14 18:41         ` Hugo Mills
  2015-07-14 19:09           ` Patrik Lundquist
  1 sibling, 1 reply; 13+ messages in thread
From: Hugo Mills @ 2015-07-14 18:41 UTC (permalink / raw)
  To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org

On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote:
> On 24 June 2015 at 12:46, Duncan <1i5t5.duncan@cox.net> wrote:
> >
> > Regardless of whether 1 or huge -t means maximum defrag, however, the
> > nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
> > should be considered ideally defragged at 31 extents.  This is a
> > departure from ext4, which AFAIK in theory has no extent upper limit, so
> > should be able to do that 30 GiB file in a single extent.
> >
> > But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
> > still indicates at least some remaining fragmentation.
> 
> So I converted the VMware VMDK file to a VirtualBox VDI file:
> 
> -rw------- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
> -rw------- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi
> 
> $ filefrag Windows7.vdi
> Windows7.vdi: 15 extents found
> 
> $ btrfs filesystem defragment -t 3g Windows7.vdi
> $ filefrag Windows7.vdi
> Windows7.vdi: 24 extents found
> 
> How can it be less than 28 extents with a chunk size of 1 GiB?

   I _think_ the fragment size will be limited by the block group
size. This is not the same as the chunk size for some RAID levels --
for example, RAID-0, a block group can be anything from 2 to n chunks
(across the same number of devices), where each chunk is 1 GiB, so
potentially you could have arbitrary-sized block groups. The same
would apply to RAID-10, -5 and -6.

   (Note, I haven't verified this, but it makes sense based on what I
know of the internal data structures).
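
   The idea can be sketched as a simple model. Like the explanation above, it is unverified: it assumes 1 GiB chunks, treats a RAID-0 block group as one chunk per device, and treats a RAID-1 block group as the size of a single chunk. Other profiles are left out because the thread gives no arithmetic for them, and the names `block_group_size` and `min_extents` are hypothetical.

```python
GIB = 1 << 30


def block_group_size(profile, num_devices, chunk_size=GIB):
    """Block group size under the assumptions stated in the lead-in."""
    if profile == "raid0":
        if num_devices < 2:
            raise ValueError("raid0 needs at least two devices")
        return num_devices * chunk_size
    if profile == "raid1":
        return chunk_size
    raise ValueError("profile not modelled in this sketch")


def min_extents(file_size, group_size):
    """Fewest extents if no extent may exceed one block group."""
    return -(-file_size // group_size)    # ceiling division


size = 28993126400                        # Windows7.vdi from the thread
print(min_extents(size, block_group_size("raid1", 2)))
print(min_extents(size, block_group_size("raid0", 4)))
```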

   Hugo.

-- 
Hugo Mills             | Go not to the elves for counsel, for they will say
hugo@... carfax.org.uk | both no and yes.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |



* Re: counting fragments takes more time than defragmenting
  2015-07-14 18:41         ` Hugo Mills
@ 2015-07-14 19:09           ` Patrik Lundquist
  2015-07-14 19:15             ` Hugo Mills
  0 siblings, 1 reply; 13+ messages in thread
From: Patrik Lundquist @ 2015-07-14 19:09 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

On 14 July 2015 at 20:41, Hugo Mills <hugo@carfax.org.uk> wrote:
>
>    I _think_ the fragment size will be limited by the block group
> size. This is not the same as the chunk size for some RAID levels --
> for example, RAID-0, a block group can be anything from 2 to n chunks
> (across the same number of devices), where each chunk is 1 GiB, so
> potentially you could have arbitrary-sized block groups. The same
> would apply to RAID-10, -5 and -6.
>
>    (Note, I haven't verified this, but it makes sense based on what I
> know of the internal data structures).

It's a raid1 filesystem, so the block group ought to be the same size
as the chunk, right?

A 2GiB block group would suffice to explain it though.


* Re: counting fragments takes more time than defragmenting
  2015-07-14 19:09           ` Patrik Lundquist
@ 2015-07-14 19:15             ` Hugo Mills
  2015-07-21 15:35               ` Patrik Lundquist
  0 siblings, 1 reply; 13+ messages in thread
From: Hugo Mills @ 2015-07-14 19:15 UTC (permalink / raw)
  To: Patrik Lundquist; +Cc: linux-btrfs@vger.kernel.org

On Tue, Jul 14, 2015 at 09:09:00PM +0200, Patrik Lundquist wrote:
> It's a raid1 filesystem, so the block group ought to be the same size
> as the chunk, right?

   Yes.

> A 2GiB block group would suffice to explain it though.

   Not with RAID-1 -- I'd expect the block group size to be 1 GiB.

   Hugo.

-- 
Hugo Mills             | There isn't a noun that can't be verbed.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

* Re: counting fragments takes more time than defragmenting
  2015-07-14 19:15             ` Hugo Mills
@ 2015-07-21 15:35               ` Patrik Lundquist
  0 siblings, 0 replies; 13+ messages in thread
From: Patrik Lundquist @ 2015-07-21 15:35 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

On 14 July 2015 at 21:15, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Tue, Jul 14, 2015 at 09:09:00PM +0200, Patrik Lundquist wrote:
>> On 14 July 2015 at 20:41, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote:
>> >> On 24 June 2015 at 12:46, Duncan <1i5t5.duncan@cox.net> wrote:
>> >> >
>> >> > Regardless of whether 1 or huge -t means maximum defrag, however, the
>> >> > nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
>> >> > should be considered ideally defragged at 31 extents.  This is a
>> >> > departure from ext4, which AFAIK in theory has no extent upper limit, so
>> >> > should be able to do that 30 GiB file in a single extent.
>> >> >
>> >> > But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
>> >> > still indicates at least some remaining fragmentation.
>> >>
>> >> So I converted the VMware VMDK file to a VirtualBox VDI file:
>> >>
>> >> -rw------- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
>> >> -rw------- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi
>> >>
>> >> $ filefrag Windows7.vdi
>> >> Windows7.vdi: 15 extents found
>> >>
>> >> $ btrfs filesystem defragment -t 3g Windows7.vdi
>> >> $ filefrag Windows7.vdi
>> >> Windows7.vdi: 24 extents found
>> >>
>> >> How can it be less than 28 extents with a chunk size of 1 GiB?
>> >
>> >    I _think_ the fragment size will be limited by the block group
>> > size. This is not the same as the chunk size for some RAID levels --
>> > for example, RAID-0, a block group can be anything from 2 to n chunks
>> > (across the same number of devices), where each chunk is 1 GiB, so
>> > potentially you could have arbitrary-sized block groups. The same
>> > would apply to RAID-10, -5 and -6.
>> >
>> >    (Note, I haven't verified this, but it makes sense based on what I
>> > know of the internal data structures).
>>
>> It's a raid1 filesystem, so the block group ought to be the same size
>> as the chunk, right?
>
>    Yes.
>
>> A 2GiB block group would suffice to explain it though.
>
>    Not with RAID-1 -- I'd expect the block group size to be 1 GiB.

So I had a look at the filefrag source, and filefrag doesn't actually
print the number of extents but the number of disk fragments.
Contiguously allocated extents count as one fragment.

"Windows7.vdi: 47 extents found" is really 213 extents over 47 disk fragments.

But I have one 2GiB extent, according to filefrag -v, so the question
remains. :-)
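
To illustrate the distinction (a sketch only, not filefrag's actual code:
the `Extent` tuple, `count_fragments`, and the sample offsets are all made
up for illustration), an extent that starts on disk exactly where the
previous one ended can be folded into the same fragment:

```python
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    logical: int   # offset of the extent within the file
    physical: int  # offset of the extent on disk
    length: int    # extent length, in the same units as the offsets


def count_fragments(extents: Sequence[Extent]) -> int:
    """Count runs of physically contiguous extents, in file order."""
    fragments = 0
    prev_end: Optional[int] = None
    for ext in sorted(extents, key=lambda e: e.logical):
        # A new fragment starts whenever this extent does not begin
        # exactly where the previous one ended on disk.
        if prev_end is None or ext.physical != prev_end:
            fragments += 1
        prev_end = ext.physical + ext.length
    return fragments


# Hypothetical layout: four extents, two of which follow their predecessor.
sample = [
    Extent(logical=0, physical=1000, length=100),
    Extent(logical=100, physical=1100, length=50),   # contiguous
    Extent(logical=150, physical=5000, length=200),  # gap: new fragment
    Extent(logical=350, physical=5200, length=10),   # contiguous
]

print(len(sample), "extents over", count_fragments(sample), "disk fragments")
# 4 extents over 2 disk fragments
```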

Thread overview: 13+ messages
2015-06-04  8:42 counting fragments takes more time than defragmenting Marc MERLIN
2015-06-24  3:20 ` Marc MERLIN
2015-06-24  8:28   ` Patrik Lundquist
2015-06-24 10:46     ` Duncan
2015-06-24 12:05       ` Patrik Lundquist
2015-06-25  4:01         ` Duncan
2015-06-25  6:30           ` Patrik Lundquist
2015-07-14 11:57       ` Patrik Lundquist
2015-07-14 17:32         ` Duncan
2015-07-14 18:41         ` Hugo Mills
2015-07-14 19:09           ` Patrik Lundquist
2015-07-14 19:15             ` Hugo Mills
2015-07-21 15:35               ` Patrik Lundquist