Putting very big and small files in one subvolume?

linux-btrfs.vger.kernel.org archive mirror

* Putting very big and small files in one subvolume?
@ 2014-08-17  8:56 Shriramana Sharma
  2014-08-17 12:31 ` Duncan
  2014-08-29 16:04 ` Shriramana Sharma
  0 siblings, 2 replies; 8+ messages in thread
From: Shriramana Sharma @ 2014-08-17  8:56 UTC (permalink / raw)
  To: linux-btrfs

Hello. One more Q re generic BTRFS behaviour.
https://btrfs.wiki.kernel.org/index.php/Main_Page specifically
advertises BTRFS's "Space-efficient packing of small files".

So far (on ext3/4) I have been using two partitions for small/regular
files (like my source code repos, home directory with its hidden
config subdirectories etc) and big files (like downloaded Linux ISOs,
VMs etc) under some sort of understanding that this will help curb
fragmentation -- frankly I'm not a professional sysadmin in some
company or such so my assumption may not be valid.

In any case, since BTRFS effectively discourages usage of separate
partitions to take advantage of subvolumes etc, and given the above
claim to the FS automatically handling small files efficiently, I
wonder if it makes sense any longer to create separate subvolumes for
such big/small files as I describe in my use case?

Thanks!

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा


* Re: Putting very big and small files in one subvolume?
  2014-08-17  8:56 Putting very big and small files in one subvolume? Shriramana Sharma
@ 2014-08-17 12:31 ` Duncan
  2014-08-17 14:51   ` Russell Coker
  2014-08-18 18:16   ` Martin
  2014-08-29 16:04 ` Shriramana Sharma
  1 sibling, 2 replies; 8+ messages in thread
From: Duncan @ 2014-08-17 12:31 UTC (permalink / raw)
  To: linux-btrfs

Shriramana Sharma posted on Sun, 17 Aug 2014 14:26:06 +0530 as excerpted:

> Hello. One more Q re generic BTRFS behaviour.
> https://btrfs.wiki.kernel.org/index.php/Main_Page specifically
> advertises BTRFS's "Space-efficient packing of small files".
> 
> So far (on ext3/4) I have been using two partitions for small/regular
> files (like my source code repos, home directory with its hidden config
> subdirectories etc) and big files (like downloaded Linux ISOs,
> VMs etc) under some sort of understanding that this will help curb
> fragmentation -- frankly I'm not a professional sysadmin in some company
> or such so my assumption may not be valid.
> 
> In any case, since BTRFS effectively discourages usage of separate
> partitions to take advantage of subvolumes etc, and given the above
> claim to the FS automatically handling small files efficiently, I wonder
> if it makes sense any longer to create separate subvolumes for such
> big/small files as I describe in my use case?

It's worth noting that btrfs subvolumes are a reasonably lightweight 
construct, comparable enough to ordinary subdirectories that they're 
presented that way when browsing a parent subvolume, and there was 
actually discussion of making subvolumes and subdirs the exact same 
thing, effectively turning all subdirs into subvolumes.

As it turns out, that wasn't feasible, not because of btrfs limitations 
but (as I understand it) because of assumptions about subdirectories vs. 
mountable entities (subvolumes) built into the Linux POSIX and VFS 
levels.  Tho I admit to not really understanding the details, either 
because that discussion mostly happened before I became a regular or 
because it's above my head, or both; I'm not sure which.

But the point is, there really /is/ little overhead in creating a 
subvolume in btrfs.  It's basically a subdir that happens to be directly 
mountable on its own, tho if you're doing snapshotting there's also the 
bit about snapshots stopping at subvolume boundaries, while they don't 
stop at subdirs, to consider.

Based on that, there's really nothing stopping you from creating as many 
subvolumes as you want on btrfs.

OTOH, I tend to be rather more of an independent partition booster than 
many.  The biggest reason for that is the too many eggs in one basket 
problem.  Fully separate filesystems on separate partitions separate 
those data "eggs" into separate baskets, so if the metaphorical bottom 
drops out of one of those filesystem baskets, only the data eggs in that 
filesystem basket are lost, while the eggs in the separate filesystem 
baskets are still safe and sound, not affected at all. =:^)

The thing that troubles me about replacing a bunch of independent 
partitions and filesystems with a bunch of subvolumes on a single btrfs 
filesystem is thus just that, you've nicely divided that big basket into 
little subvolume compartments, but it's still one big basket, and if the 
bottom falls out, you potentially lose EVERYTHING in that filesystem 
basket!

Particularly while btrfs remains not entirely mature and stable, that 
doesn't seem to me to be a particularly wise move.  Both out of caution 
and because over the years I've evolved a partitioning scheme that works 
well for me, I'll probably keep that scheme even after I'm satisfied with 
btrfs stability, but certainly until then, I personally shudder at the 
additional risk every time I see someone mention replacing partitions 
with subvolumes.  (Of course as I said about something else in a previous 
reply, given that btrfs isn't fully stable, by definition all data that's 
important to you is backed up, and if it's on btrfs and not backed up, by 
definition it's not that important to you.  By that argument, there's 
really nothing for me to be shuddering /about/, but the fact remains, I 
do.)

I actually learned that lesson back on MS before the turn of the 
century.  This was before IE4 came out, when I along with many others was 
running the public betas.  As it happened, in order to speed up IE the 
devs changed it to keep the temporary-internet-files cache index file 
location in RAM and to direct-write index changes to the appropriate file 
block locations without going thru the normal filesystem layers.  What 
they forgot about was the critical fact that they had combined the 
previously separate Windows Explorer shell with IE, and that as a 
consequence it was now running all the time.  Fine most of the time, but 
what happens when defrag comes along and decides the index file needs to 
be moved out from where the still running combined IE/WE shell thinks it 
is?

Most people running that beta who also had defrag scheduled to run 
automatically, as many did since it was a beta and they were power users, 
ended up with cross-linked files and an otherwise badly mangled 
filesystem that chkdsk couldn't completely sort out, because IE ended up 
simply overwriting whatever files defrag decided to stick where that 
index file had been, with index file data that would have been routed to 
the new index file location had MSIE not bypassed the normal filesystem 
access routines.  A number of those testers lost important files they 
didn't have backups for as a result.

Eventually MS "solved" the problem by simply marking the index file with 
the system attribute, which caused defrag to skip it, leaving it where it 
was regardless of what else it wanted to put there or how many fragments 
it might be in.

But while I did get a bit of temporary internet file cache corruption, 
that was all.  Why?  Because I had a separate partition for my temporary 
stuff, including both $TEMP/TMP and temporary internet files.  Defrag 
still moved the index file out from under IE, and IE still overwrote 
whatever else defrag put in its place, but since I had the temporary 
internet files cache configured to be on the tempfiles partition, the 
only thing there to overwrite was temporary files anyway, so none of my 
valuable files were ever in danger.

Talk about a lesson reinforcing a choice to put my tempfiles on a 
separate partition!  That's ONE thing I've *ALWAYS* been sure I did ever 
since, and indeed, these days my $TMP, /tmp and /var/tmp are actually on 
tmpfs, a memory-based filesystem, so it's all in memory and erased when I 
reboot, kept as far away from permanent on-drive files as I can keep it. 
=:^)

The second reason for separate partitions is that they take less time to 
fsck, backup/restore, and on btrfs, balance/scrub, than big huge 
monolithic partitions.  Especially since it's likely some partitions 
aren't mounted and / is read-only mounted (see below), that means less 
time spent in recovery, and it means btrfs gets scrubbed and balanced 
more frequently, since it's a matter of minutes (especially on ssd), not 
hours.

FWIW, while I've evolved my partitioning scheme over the years, here's 
what I use now:

/	8 GB, on ssd, read-only mounted by default.

/ includes most of /usr and /var as well as /etc.  It's mounted read-only 
unless I'm actively updating, which dramatically increases its robustness 
in the event of a crash: a filesystem that's nearly always mounted 
read-only isn't likely to be corrupted.

/home	20 GB, on ssd.

/home includes my normal user stuff of course, but not my big media 
files, etc.  I also symlink some state dirs from /var to /home/var/, so 
they can be written while /var itself, on /, remains read-only mounted.

/var/log	Half a GB, ssd.

/var/log is very small.  I keep a tight logrotate schedule. =:^)  It's a 
dedicated partition for two reasons.  First, if something starts run-away 
logging, it can fill up the log partition but can't otherwise affect the 
system.  Second, logs being what they are, in the event of a crash, it's 
very likely some log entry will have been in the process of being 
written, and thus this partition will see some potential corruption.  
Limiting that corruption risk to a dedicated log partition seems wise. 
=:^)

/mnt/packages	24 GB, not mounted by default, ssd.

This contains my distro's package tree and various overlays.  I'm running 
gentoo, so the package tree is simply build scripts and configuration, 
but I also keep all the source tarballs here, along with my kernel tree 
(git), binpkgs for quick reinstallation without rebuilding from sources, 
and ccache.  Additionally, I build for my 32-bit netbook on this machine 
and keep the binpkgs and ccache for it here too.  That's why it's a full 
24 GB.

This isn't mounted at all unless I'm updating, thus keeping it out of 
harm's way in the event of a crash.

/mm		100 GB+, not mounted by default, spinning rust.

/mm is my media partition, mostly long term storage for pretty big files.  
It doesn't need to be mounted by default, but is often mounted to access 
the media files.  While not ideal for large files, this one's reiserfs, 
since that's what I standardized on before switching to ssd and btrfs.  
From my experience, reiserfs is in fact more stable than ext3/4, since 
relatively fewer kernel devs dare mess with it, and it has proven stable 
for me even thru hardware issues such as bad memory.

That's also why I keep a reiserfs backup of all the SSD/btrfs partitions 
too, since I know it is long-term stable and isn't likely to suddenly bug 
out on me.

/tmp, /var/tmp, /run...  tmpfs.

Additionally I have primary backup partitions of all the btrfs/SSD 
partitions (except /var/log) on (separate) btrfs/SSD, and as mentioned 
secondary backups of all partitions on reiserfs/spinning-rust.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: Putting very big and small files in one subvolume?
  2014-08-17 12:31 ` Duncan
@ 2014-08-17 14:51   ` Russell Coker
  2014-08-18 18:16   ` Martin
  1 sibling, 0 replies; 8+ messages in thread
From: Russell Coker @ 2014-08-17 14:51 UTC (permalink / raw)
  To: linux-btrfs

On Sun, 17 Aug 2014 12:31:42 Duncan wrote:
> OTOH, I tend to be rather more of an independent partition booster than 
> many.  The biggest reason for that is the too many eggs in one basket 
> problem.  Fully separate filesystems on separate partitions separate 
> those data "eggs" into separate baskets, so if the metaphorical bottom 
> drops out of one of those filesystem baskets, only the data eggs in that 
> filesystem basket are lost, while the eggs in the separate filesystem 
> baskets are still safe and sound, not affected at all. =:^)
> 
> The thing that troubles me about replacing a bunch of independent 
> partitions and filesystems with a bunch of subvolumes on a single btrfs 
> filesystem is thus just that, you've nicely divided that big basket into 
> little subvolume compartments, but it's still one big basket, and if the 
> bottom falls out, you potentially lose EVERYTHING in that filesystem 
> basket!

I'll write the counter-point to this.

If you have several partitions for /, /var/log, and /home then losing any one 
of them will result in a system that's mostly unusable.  So for continuous 
service there doesn't seem to be a benefit in having multiple partitions.

When you have to restore a backup in adverse circumstances the restore time is 
important.  For example if you have 10*4TB disks and need RAID-1 redundancy 
(which you need on any BTRFS filesystem of note as I don't think RAID-5 and 
RAID-6 are trustworthy) then an advantage of 5*4TB RAID-1 filesystems over a 
20TB RAID-10 is that restore time will be a lot smaller.  But this isn't an 
issue for typical BTRFS users who are working with much smaller amounts of 
data, at this time I have to recommend ZFS over BTRFS for most systems that 
manage 20TB of data.

If you have a RAID-1 array of the biggest disks available (which is probably 
the biggest storage for >99% of BTRFS users) then you are looking at a restore 
time of maybe 4TB at 160MB/s == something less than 7 hours.  For a home 
network 7 hours delay in getting things going after a major failure is quite 
OK.
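
As a rough check on that figure, the arithmetic can be sketched as below.
It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a
sustained sequential rate with no seek or verification overhead; the
function name is just for illustration.

```python
# Rough restore-time estimate: capacity divided by sustained throughput.
# Assumes decimal units and a constant transfer rate (no seeks, no verify pass).

TB = 10**12  # bytes in a decimal terabyte
MB = 10**6   # bytes in a decimal megabyte


def restore_hours(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the hours needed to stream capacity_bytes at the given rate."""
    seconds = capacity_bytes / throughput_bytes_per_s
    return seconds / 3600


hours = restore_hours(4 * TB, 160 * MB)
print(f"4 TB at 160 MB/s: {hours:.2f} hours")
```

A real restore is likely to be slower than this, since the drive will not
hold its peak sequential rate for the whole run.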

Finally failures of filesystems on different partitions won't be independent.  
If one filesystem on a disk becomes unusable due to drive firmware issues or 
other serious problems then other filesystems on the same physical disk are 
likely to suffer the same fate.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: Putting very big and small files in one subvolume?
  2014-08-17 12:31 ` Duncan
  2014-08-17 14:51   ` Russell Coker
@ 2014-08-18 18:16   ` Martin
  2014-08-19  4:07     ` Duncan
  2014-08-19  5:26     ` Duncan
  1 sibling, 2 replies; 8+ messages in thread
From: Martin @ 2014-08-18 18:16 UTC (permalink / raw)
  To: linux-btrfs

Good questions, and good comments already given.

For another view...

On 17/08/14 13:31, Duncan wrote:
> Shriramana Sharma posted on Sun, 17 Aug 2014 14:26:06 +0530 as excerpted:
> 
>> Hello. One more Q re generic BTRFS behaviour.
>> https://btrfs.wiki.kernel.org/index.php/Main_Page specifically
>> advertises BTRFS's "Space-efficient packing of small files".
>>
>> So far (on ext3/4) I have been using two partitions for small/regular
>> files (like my source code repos, home directory with its hidden config
>> subdirectories etc) and big files (like downloaded Linux ISOs,
>> VMs etc) under some sort of understanding that this will help curb
>> fragmentation...

The cases of pathological fragmentation by btrfs (for 'database-style'
files and VM image files especially) have been mentioned, as have the
use of nocow and/or using separate subvolumes to reduce or slow down the
buildup of the fragmentation.

systemd logging even bulldozed blindly into that one spectacularly!...

There is now a defragment option. However, that does not scale well for
large or frequently rewritten files and you gamble how much IO bandwidth
you can afford to lose rewriting *entire* files.

The COW fragmentation problem is not going to go away. Also, there is
quite a high requirement for user awareness to specially mark
directories/files as nocow. And yet then, that still does not work well
if multiple snapshots are being taken...!

Could a better and more complete fix be to automatically defragment say
just x4 the size being written for a file segment?

Also, for the file segment being defragged, abandon any links to other
snapshots to in effect deliberately replicate the data where appropriate
so that data segment is fully defragged.

>> In any case, since BTRFS effectively discourages usage of separate
>> partitions to take advantage of subvolumes etc, and given the above
>> claim to the FS automatically handling small files efficiently, I wonder
>> if it makes sense any longer to create separate subvolumes for such
>> big/small files as I describe in my use case?
> 
> It's worth noting that btrfs subvolumes are a reasonably lightweight 
> construct, comparable enough to ordinary subdirectories that they're 
> presented that way when browsing a parent subvolume, and there was 
> actually discussion of making subvolumes and subdirs the exact same 
> thing, effectively turning all subdirs into subvolumes.
> 
> As it turns out that wasn't feasible due not to btrfs limitations, but 
> (as I understand it) to assumptions about subdirectories vs. mountable 
> entities (subvolumes) built into the Linux POSIX and VFS levels...

Due to namespaces and inode number spaces?...

> OTOH, I tend to be rather more of an independent partition booster than 
> many.  The biggest reason for that is the too many eggs in one basket 
> problem.  Fully separate filesystems on separate partitions...

I do so similarly myself. A good scheme that I have found to work well
for my cases is to have separate partitions for:

/boot
/var
/var/log
/
/usr
/home
/mnt/data...

And all the better and easy to do using GPT partition tables.

One benefit of all that is that you can protect your system from becoming
jammed by a full disk, for whatever reason, all without needing to
resort to quotas. So for example, rogue logging can fill up /var/log
and you can still use the system and be able to easily tidy things up.

However, that scheme does also require that you have a good idea of what
partition sizes you will need right from when first set up.

You can 'cheat' and gain flexibility at the expense of HDD head seek
time by cobbling together LVM volumes as and when needed to resize
whichever filesystem.

Which is where btrfs comes into play: if you can trust btrfs not to
lose all your eggs to corruption, you can implement your partition
scheme with subvolumes and quotas and let the intelligence in btrfs
make everything work well even if you change what size (quota) you want
for a subvolume. The ENTIRE disk (no partition table) is all btrfs.

Special NOTE: Myself, I consider btrfs *quotas* to be still very
experimental at the moment and not to be used with valued data!

Other big plusses for btrfs for me are the raid and snapshots.

The killer question, though, is how robust the filesystem is against
corruption and random data/hardware failure.

btrfsck?

Always keep multiple backups!

Regards,
Martin


* Re: Putting very big and small files in one subvolume?
  2014-08-18 18:16   ` Martin
@ 2014-08-19  4:07     ` Duncan
  2014-08-19  5:26     ` Duncan
  1 sibling, 0 replies; 8+ messages in thread
From: Duncan @ 2014-08-19  4:07 UTC (permalink / raw)
  To: linux-btrfs

Martin posted on Mon, 18 Aug 2014 19:16:20 +0100 as excerpted:

> Also, for the file segment being defragged, abandon any links to other
> snapshots to in effect deliberately replicate the data where appropriate
> so that data segment is fully defragged.

FWIW, this is the current state.

The initial attempt at snapshot-aware-defrag was committed to mainline 
(the kernel release is listed in the wiki's changelog page) but people 
quickly ran into scaling issues with multi-thousand-snapshot systems as 
well as quotas that weren't originally expected or, I guess, tested, pre-
mainline-merge, so they ended up reverting the snapshot awareness 
temporarily, until they could come up with something more scalable.  I 
believe they've addressed some of that now, but I'm not sure it's yet 
scaling the way the original trial suggested it needed to, and snapshot-
aware-defrag remains disabled for the time being.

So at least currently, defragging isn't snapshot aware, and as a result, 
if the filesystem is highly fragmented, attempting to defrag will 
increase space usage substantially, as all those snapshot links are 
broken in order to defrag the file on the currently mounted snapshot 
that is being defragged.
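
A toy model of that effect, not btrfs code: each view of a file (the live
copy or a snapshot) is a hypothetical mapping from extent id to extent
size, and space used is the total size of the distinct extents that any
view still references.  The extent names and sizes are made up for
illustration.

```python
# Toy model of snapshot-unaware defrag. Each "view" maps extent id -> size.
# Extents shared between views are only stored (and counted) once.


def unique_bytes(*views: dict) -> int:
    """Total size of the distinct extents referenced by any of the views."""
    referenced = {}
    for view in views:
        referenced.update(view)
    return sum(referenced.values())


# A fragmented file made of three 4 KiB extents, plus a snapshot sharing them.
live = {"e1": 4096, "e2": 4096, "e3": 4096}
snapshot = dict(live)
before = unique_bytes(live, snapshot)

# Defrag rewrites the live copy into one new contiguous extent.  The snapshot
# still references the old extents, so nothing can be freed.
live = {"e4": 12288}
after = unique_bytes(live, snapshot)

print(before, after)
```

In this model the file's footprint doubles after the defrag, and it grows
again for every further snapshot-era copy that gets rewritten.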



* Re: Putting very big and small files in one subvolume?
  2014-08-18 18:16   ` Martin
  2014-08-19  4:07     ` Duncan
@ 2014-08-19  5:26     ` Duncan
  1 sibling, 0 replies; 8+ messages in thread
From: Duncan @ 2014-08-19  5:26 UTC (permalink / raw)
  To: linux-btrfs

Martin posted on Mon, 18 Aug 2014 19:16:20 +0100 as excerpted:

>> OTOH, I tend to be rather more of an independent partition booster than
>> many.  The biggest reason for that is the too many eggs in one basket
>> problem.  Fully separate filesystems on separate partitions...
> 
> I do so similarly myself. A good scheme that I have found to work well
> for my cases is to have separate partitions for:
> 
> /boot
> /var
> /var/log
> /
> /usr
> /home
> /mnt/data...

Of course with "the new layout" ala systemd, etc, a separate /usr is 
claimed to be broken (but see below), as too many binaries and libs, plus 
a lot of config needed in early boot, are located there.  It used to be 
that distros went to some pain to ensure that all binaries needed in 
early boot were located in /bin and were either statically linked or had 
the necessary libs in /lib (instead of /usr/bin and /usr/lib 
respectively), but as systems have become more complicated, that has 
become more difficult, and the systemd folks made headlines saying it was 
no longer supported, tho I imagine that was to the great relief of the 
various distro folks who formerly had to deal with it.

Tho it can still be made to work with an initr* that ensures both / and 
/usr are mounted before doing the pivot_root or whatever and handing 
control over to the main system init (which on more and more distros is 
systemd).  And since most distros use an initr* anyway, as long as it's 
configured to mount /usr before switching to the main system init, it 
doesn't necessarily appear to be broken.

But that's why some distros are doing root-and-usr unification, 
effectively putting /everything/ in /usr/bin and /usr/lib, and taking 
advantage of /usr/share for distro-based config so /etc can be left for 
site-specific config.

FWIW, I've actually reversed that here:  /usr -> .  So anything installed 
in /usr/bin actually ends up installed in /bin. =:^)

Meanwhile, another lesson I learned, this one unfortunately the hard way 
unlike the always-keep-tmp-on-its-own-filesystem rule I mentioned above, 
is that either restoring from backup or simply falling back to that 
backup as /, is *MUCH* simpler if everything the package manager installs 
to is on the same filesystem.

I learned that when I lost air conditioning, here in Phoenix in the 
summer, when I had gone away with the system left on.  Outside temps in 
the shade can reach 50C (122F) here, and when I came back I expect the 
room was 60C+ (140F), with the computer easily reaching 70C+.  The CPU 
survived and was fine after I cooled things back down, but the disk had 
head-crashed.

Unfortunately I had separate / /usr and /var, and I ended up restoring 
from backups from different dates for each.  The package manager's 
installed-package database was in /var/db, so what it said was installed 
didn't match what actually got restored to either / or /usr.

Fortunately the system worked well enough that I could boot back up and 
reinstall from (nearly) current binary builds (gentoo, so without the 
binary builds I'd have had to rebuild from sources), but because the 
installed-package database was out-of-sync with what was actually 
in-place, uninstalling the old versions as I reinstalled missed quite a 
few odd files here and there.

I was dealing with THAT mistake for QUITE some time as I was still 
cleaning up stray files nearly two years later.

So these days (with some minor state tracking exceptions) my policy is 
that anything the package manager touches is on root, along with the 
database tracking it all, so the installed-package database is always 
automatically in sync with whatever backup I end up using, since all 
installed files are on the same partition as the database and that 
partition is backed up as a whole.

As a result, that partition includes /usr (which made it extremely easy 
to make /usr a symlink /usr -> . as mentioned above, when I decided to do 
so) and /var, tho as I mentioned /var/log is separate.  The minor state-
tracking exceptions are the packages that need actively writable state, 
mostly kept in /var/lib by default, since my rootfs is read-only by 
default.  Where necessary I have these symlinked to corresponding subdirs 
of /home/var, which as part of /home is writable by default.  But they're 
state only and should it be necessary I could start with clean state.  
But / is still only 8 GB in size, more or less half used.  That's small 
enough I can keep multiple identically sized backup root partitions on 
various devices.

> And all the better and easy to do using GPT partition tables.

Absolutely and enthusiastically agreed!

I use GPT on anything I partition these days, misc. USB based external 
and thumb drives included.  Among other reasons (robustness and lack of 
primary/secondary/logical partition hassles) GPT partition names/labels 
form an integral part of my device/partition ID scheme, such that I can 
immediately identify on-sight the functionality (home, root, log, etc), 
target machine (netbook, workstation, portable, etc), containing device 
(938 GiB Seagate external drive #3, etc), working or backup generation, 
date I setup the partition, etc.

That dramatically helps keeping things straight when there's several 
generations of backup for various partitions on various machines, 
floating around on various devices both internal and external. =:^)

> Special NOTE: Myself, I consider btrfs *quotas* to be still very
> experimental at the moment and not to be used with valued data!

Definitely so.  There's a set of patches currently floating around that 
should dramatically improve quota stability and reliability, but they've 
not hit mainline yet (and look to be delayed until 3.18 now due to a mixup 
during the 3.17 merge window), and the existing btrfs quota code simply 
has too many known problems to be recommended for anything besides pure 
experimental usage.



* Re: Putting very big and small files in one subvolume?
  2014-08-17  8:56 Putting very big and small files in one subvolume? Shriramana Sharma
  2014-08-17 12:31 ` Duncan
@ 2014-08-29 16:04 ` Shriramana Sharma
  2014-08-29 16:24   ` Hugo Mills
  1 sibling, 1 reply; 8+ messages in thread
From: Shriramana Sharma @ 2014-08-29 16:04 UTC (permalink / raw)
  To: linux-btrfs

On 8/17/14, Shriramana Sharma <samjnaa@gmail.com> wrote:
> Hello. One more Q re generic BTRFS behaviour.
> https://btrfs.wiki.kernel.org/index.php/Main_Page specifically
> advertises BTRFS's "Space-efficient packing of small files".

Hello. I realized that while I got lots of interesting advice on how
best to lay out my FS on multiple devices/FSs, I would like to
specifically know how exactly the above works (in not-too-technical
terms) so that I can decide for myself if the above feature of BTRFS
would suit my particular purpose.

Thank you!

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा


* Re: Putting very big and small files in one subvolume?
  2014-08-29 16:04 ` Shriramana Sharma
@ 2014-08-29 16:24   ` Hugo Mills
  0 siblings, 0 replies; 8+ messages in thread
From: Hugo Mills @ 2014-08-29 16:24 UTC (permalink / raw)
  To: Shriramana Sharma; +Cc: linux-btrfs


On Fri, Aug 29, 2014 at 09:34:54PM +0530, Shriramana Sharma wrote:
> On 8/17/14, Shriramana Sharma <samjnaa@gmail.com> wrote:
> > Hello. One more Q re generic BTRFS behaviour.
> > https://btrfs.wiki.kernel.org/index.php/Main_Page specifically
> > advertises BTRFS's "Space-efficient packing of small files".
> 
> Hello. I realized that while I got lots of interesting advice on how
> to best layout my FS on multiple devices/FSs, I would like to
> specifically know how exactly the above works (in not-too-technical
> terms) so I'd like to decide for myself if the above feature of BTRFS
> would suit my particular purpose.

   In brief: For small files (typically under about 3.5k), the FS can
put the file's data in the metadata -- specifically, the extent tree
-- so that the data is directly available without a second seek to
find it.

   The longer version: btrfs has a number of B-trees in its metadata.
These are trees with a high fan-out (from memory, it's something like
30-240 children each, depending on the block size), and with the
actual data being stored at the leaves of the tree. Each leaf of the
tree is a fixed size, depending on the options passed to mkfs.
Typically 4k-32k.

   The data in the trees is stored as a key and a value -- the tree
indexes the keys efficiently, and stores the values (usually some data
structure like an inode or file extent information) in the same leaf
node as the key -- keys at the front of the leaf, data at the back.
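
   A toy model of that layout may make it easier to picture.  This is
only a sketch, not the btrfs on-disk format: the class name, the 25-byte
per-item header size and the integer keys are assumptions for
illustration, and the headers are only accounted for, not serialized.

```python
import bisect


class ToyLeaf:
    """Fixed-size leaf: item headers fill from the front, values from the back."""

    HEADER_SIZE = 25  # assumed per-item header size for this toy model

    def __init__(self, size: int = 16384):
        self.size = size
        self.buf = bytearray(size)   # value bytes live at the back of this buffer
        self.items = []              # sorted (key, offset, length) headers
        self.data_start = size       # values occupy buf[data_start:size]

    def free_space(self) -> int:
        """Bytes left between the end of the headers and the start of the data."""
        return self.data_start - len(self.items) * self.HEADER_SIZE

    def insert(self, key, value: bytes) -> bool:
        """Store value under key; return False if the leaf is full.

        Duplicate keys are not handled in this sketch.
        """
        if self.HEADER_SIZE + len(value) > self.free_space():
            return False
        self.data_start -= len(value)
        self.buf[self.data_start:self.data_start + len(value)] = value
        bisect.insort(self.items, (key, self.data_start, len(value)))
        return True

    def lookup(self, key):
        """Binary-search the sorted headers, then read the value they point at."""
        i = bisect.bisect_left(self.items, (key,))
        if i < len(self.items) and self.items[i][0] == key:
            _, offset, length = self.items[i]
            return bytes(self.buf[offset:offset + length])
        return None


leaf = ToyLeaf(size=128)
leaf.insert(2, b"inode for file B")
leaf.insert(1, b"inode for file A")
print(leaf.lookup(1), leaf.free_space())
```

   The point of growing from both ends is that neither the headers nor
the values need a fixed share of the leaf; the leaf is full only when
the two regions meet.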

   The extent tree keeps track of the contiguous byte sequences of
each file, and where those sequences can be found on the FS. To read a
file, the FS looks up the file's extents in the extent tree, and then
has to go and find the data that it points to. This involves an extra
read of the disk, which is slow. However, the metadata tree leaf is
already in RAM (because the FS has just read it). So, for performance
and space efficiency reasons, it can optionally store data for small
files as part of the "value" component of the key/value pair for the
file's extent. This means that the file's data is available
immediately, without the extra disk read.

   Drawbacks -- metadata on btrfs is usually DUP, which means two
copies, so storing lots of medium-small files (2k-4k) will take up
more space than it would otherwise, because you're storing two copies
and not saving enough space to make it worthwhile. It also makes it
harder to calculate the "used" vs "free" values for df.
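
   The space trade-off can be sketched with some simple arithmetic.  The
numbers below are assumptions rather than measurements: a 4096-byte data
block, data stored as a single copy, metadata stored as DUP (two copies),
and the per-item metadata overhead ignored in both cases.  The function
names are made up for illustration.

```python
# Compare the space used by one small file stored inline in DUP metadata
# against the same file stored in a separate, single-copy data block.

BLOCK_SIZE = 4096  # assumed data block size in bytes


def inline_cost(file_size: int, metadata_copies: int = 2) -> int:
    """Bytes used when the file data lives inside (duplicated) metadata."""
    return file_size * metadata_copies


def extent_cost(file_size: int, data_copies: int = 1) -> int:
    """Bytes used when the file data is rounded up to whole data blocks."""
    blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    return blocks * BLOCK_SIZE * data_copies


for size in (500, 2048, 3000):
    print(size, inline_cost(size), extent_cost(size))
```

   Under those assumptions the break-even point is half a block: below
about 2k, inlining saves space even with two copies, and between 2k and
the inline limit it costs more than a separate block would.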

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
       --- Great films about cricket: Umpire of the Rising Sun ---       

