* btrfs and 1 billion small files
[not found] <1913174825.1910.1336382310577.JavaMail.root@zimbra.interconnessioni.it>
@ 2012-05-07 9:28 ` Alessio Focardi
2012-05-07 9:58 ` Hubert Kario
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Alessio Focardi @ 2012-05-07 9:28 UTC (permalink / raw)
To: linux-btrfs
Hi,
I need some help in designing a storage structure for 1 billion small files (<512 bytes), and I was wondering how btrfs would fit in this scenario. Keep in mind that I have never worked with btrfs - I just read some documentation and browsed this mailing list - so forgive me if my questions are silly! :X
On with the main questions, then:
- What's the advice to maximize disk capacity using such small files, even sacrificing some speed?
- Would you store all the files "flat", or would you build a hierarchical tree of directories to speed up file lookups? (basically duplicating the filesystem Btree indexes)
I tried to answer those questions, and here is what I found:
it seems that the smallest block size is 4K. So, in this scenario, if every file uses a full block I will end up with lots of space wasted. It wouldn't change much if the block size were 2K, anyhow.
I thought about compression, but it is not clear to me whether compression is handled at the file level or at the block level.
Also, I read that there is a mode that uses blocks for shared storage of metadata and data, designed for small filesystems. I haven't found any other info about it.
It is still not clear to me whether btrfs fits my situation; would you recommend it over XFS?
XFS has a minimum block size of 512 bytes, but btrfs is more modern and, given that it handles indexes on its own, it could help us speed up file operations (could it?)
Thank you for any advice!
Alessio Focardi
------------------
* Re: btrfs and 1 billion small files
2012-05-07 9:28 ` btrfs and 1 billion small files Alessio Focardi
@ 2012-05-07 9:58 ` Hubert Kario
2012-05-07 10:06 ` Boyd Waters
2012-05-07 10:55 ` Hugo Mills
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Hubert Kario @ 2012-05-07 9:58 UTC (permalink / raw)
To: Alessio Focardi; +Cc: linux-btrfs
On Monday 07 of May 2012 11:28:13 Alessio Focardi wrote:
> Hi,
>
> I need some help in designing a storage structure for 1 billion small
> files (<512 bytes), and I was wondering how btrfs would fit in this
> scenario. Keep in mind that I have never worked with btrfs - I just read
> some documentation and browsed this mailing list - so forgive me if my
> questions are silly! :X
>
>
> On with the main questions, then:
>
> - What's the advice to maximize disk capacity using such small files,
> even sacrificing some speed?
>
> - Would you store all the files "flat", or would you build a hierarchical
> tree of directories to speed up file lookups? (basically duplicating the
> filesystem Btree indexes)
>
>
> I tried to answer those questions, and here is what I found:
>
> it seems that the smallest block size is 4K. So, in this scenario, if
> every file uses a full block I will end up with lots of space wasted. It
> wouldn't change much if the block size were 2K, anyhow.
>
> I thought about compression, but it is not clear to me whether
> compression is handled at the file level or at the block level.
>
> Also, I read that there is a mode that uses blocks for shared storage of
> metadata and data, designed for small filesystems. I haven't found any
> other info about it.
>
>
> It is still not clear to me whether btrfs fits my situation; would you
> recommend it over XFS?
>
> XFS has a minimum block size of 512 bytes, but btrfs is more modern and,
> given that it handles indexes on its own, it could help us speed up file
> operations (could it?)
>
> Thank you for any advice!
>
btrfs will inline such small files in metadata blocks.
I'm not sure about limits to the size of a directory, but I'd guess that
going over a few tens of thousands of files in a single flat directory
will have speed penalties.
Regards,
--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
* Re: btrfs and 1 billion small files
2012-05-07 9:58 ` Hubert Kario
@ 2012-05-07 10:06 ` Boyd Waters
2012-05-08 6:31 ` Chris Samuel
0 siblings, 1 reply; 14+ messages in thread
From: Boyd Waters @ 2012-05-07 10:06 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org; +Cc: Alessio Focardi
Use a directory hierarchy. Even if the filesystem handles a flat structure effectively, userspace programs will choke on tens of thousands of files in a single directory. For example 'ls' will try to lexically sort its output (very slowly) unless given the command-line option not to do so.
Sent from my iPad
On May 7, 2012, at 3:58 AM, Hubert Kario <hka@qbs.com.pl> wrote:
> I'm not sure about limits to the size of a directory, but I'd guess that going over
> a few tens of thousands of files in a single flat directory will have speed
> penalties
* Re: btrfs and 1 billion small files
2012-05-07 9:28 ` btrfs and 1 billion small files Alessio Focardi
2012-05-07 9:58 ` Hubert Kario
@ 2012-05-07 10:55 ` Hugo Mills
2012-05-07 11:15 ` Alessio Focardi
2012-05-07 11:05 ` vivo75
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Hugo Mills @ 2012-05-07 10:55 UTC (permalink / raw)
To: Alessio Focardi; +Cc: linux-btrfs
On Mon, May 07, 2012 at 11:28:13AM +0200, Alessio Focardi wrote:
> Hi,
>
> I need some help in designing a storage structure for 1 billion small files (<512 bytes), and I was wondering how btrfs would fit in this scenario. Keep in mind that I have never worked with btrfs - I just read some documentation and browsed this mailing list - so forgive me if my questions are silly! :X
>
>
> On with the main questions, then:
> - What's the advice to maximize disk capacity using such small
> files, even sacrificing some speed?
See my comments below about inlining files.
> - Would you store all the files "flat", or would you build a
> hierarchical tree of directories to speed up file lookups?
> (basically duplicating the filesystem Btree indexes)
Hierarchically, for the reasons Hubert and Boyd gave. (And it's not
duplicating the btree indexes -- the tree of the btree does not
reflect the tree of the directory hierarchy).
> I tried to answer those questions, and here is what I found:
>
> it seems that the smallest block size is 4K. So, in this scenario,
> if every file uses a full block I will end up with lots of space
> wasted. Wouldn't change much if block was 2K, anyhow.
With small files, they will typically be inlined into the metadata.
This is a lot more compact (as you can have several files' data in a
single block), but by default will write two copies of each file, even
on a single disk.
So, if you want to use some form of redundancy (e.g. RAID-1), then
that's great, and you need to do nothing unusual. However, if you want
to maximise space usage at the expense of robustness in a device
failure, then you need to ensure that you only keep one copy of your
data. This will mean that you should format the filesystem with the -m
single option.
> I thought about compression, but it is not clear to me whether
> compression is handled at the file level or at the block level.
> Also I read that there is a mode that uses blocks for shared storage
> of metadata and data, designed for small filesystems. Haven't found
> any other info about it.
Don't use that unless your filesystem is <16GB or so in size. It
won't help here (i.e. file data stored in data chunks will still be
allocated on a block-by-block basis).
> It is still not clear to me whether btrfs fits my situation; would
> you recommend it over XFS?
The relatively small metadata overhead (e.g. compared to ext4) and
inline capability of btrfs would seem to be a good match for your
use-case.
> XFS has a minimum block size of 512 bytes, but btrfs is more modern
> and, given that it handles indexes on its own, it could
> help us speed up file operations (could it?)
Not sure what you mean by "handle indexes on its own". XFS will
have its own set of indexes and file metadata -- it wouldn't be much
of a filesystem if it didn't.
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- argc, argv, argh! ---
* Re: btrfs and 1 billion small files
2012-05-07 9:28 ` btrfs and 1 billion small files Alessio Focardi
2012-05-07 9:58 ` Hubert Kario
2012-05-07 10:55 ` Hugo Mills
@ 2012-05-07 11:05 ` vivo75
2012-05-08 16:46 ` Martin
2012-05-07 15:13 ` David Sterba
2012-05-08 12:31 ` Chris Mason
4 siblings, 1 reply; 14+ messages in thread
From: vivo75 @ 2012-05-07 11:05 UTC (permalink / raw)
To: Alessio Focardi; +Cc: linux-btrfs
On 07/05/2012 11:28, Alessio Focardi wrote:
> Hi,
>
> I need some help in designing a storage structure for 1 billion small files (<512 bytes), and I was wondering how btrfs would fit in this scenario. Keep in mind that I have never worked with btrfs - I just read some documentation and browsed this mailing list - so forgive me if my questions are silly! :X
Are you *really* sure a database is *not* what you are looking for?
> On with the main questions, then:
>
> - What's the advice to maximize disk capacity using such small files, even sacrificing some speed?
>
> - Would you store all the files "flat", or would you build a hierarchical tree of directories to speed up file lookups? (basically duplicating the filesystem Btree indexes)
>
>
> I tried to answer those questions, and here is what I found:
>
> it seems that the smallest block size is 4K. So, in this scenario, if every file uses a full block I will end up with lots of space wasted. Wouldn't change much if block was 2K, anyhow.
>
> I thought about compression, but it is not clear to me whether compression is handled at the file level or at the block level.
>
> Also I read that there is a mode that uses blocks for shared storage of metadata and data, designed for small filesystems. Haven't found any other info about it.
>
>
> It is still not clear to me whether btrfs fits my situation; would you recommend it over XFS?
>
> XFS has a minimum block size of 512 bytes, but btrfs is more modern and, given that it handles indexes on its own, it could help us speed up file operations (could it?)
>
> Thank you for any advice!
>
> Alessio Focardi
> ------------------
>
>
* Re: btrfs and 1 billion small files
2012-05-07 10:55 ` Hugo Mills
@ 2012-05-07 11:15 ` Alessio Focardi
2012-05-07 11:39 ` Hugo Mills
0 siblings, 1 reply; 14+ messages in thread
From: Alessio Focardi @ 2012-05-07 11:15 UTC (permalink / raw)
To: Hugo Mills; +Cc: linux-btrfs
> This is a lot more compact (as you can have several files' data in a
> single block), but by default will write two copies of each file,
> even
> on a single disk.
Great, no (or less) space wasted, then! I will have a filesystem that's composed mostly of metadata blocks, if I understand correctly. Will this create any problems?
> So, if you want to use some form of redundancy (e.g. RAID-1), then
> that's great, and you need to do nothing unusual. However, if you
> want
> to maximise space usage at the expense of robustness in a device
> failure, then you need to ensure that you only keep one copy of your
> data. This will mean that you should format the filesystem with the
> -m
> single option.
That's a very clever suggestion; I'm preparing a test server right now and am going to use the -m single option. Any other suggestions regarding format options?
pagesize? leafsize?
> > XFS has a minimum block size of 512 bytes, but btrfs is more modern
> > and, given that it handles indexes on its own, it could
> > help us speed up file operations (could it?)
>
> Not sure what you mean by "handle indexes on its own". XFS will
> have its own set of indexes and file metadata -- it wouldn't be much
> of a filesystem if it didn't.
Yes, you are perfectly right; I thought that recreating a tree like /d/u/m/m/y/ to store "dummy" would have been redundant, since the whole filesystem is based on trees. I don't have to "ls" directories - we are using PHP to write and read files - so I will have to find a "compromise" between the number of directory levels and the number of files in each one of them.
May I ask you about compression? Would you use it in the scenario I described?
Thank you for your help!
* Re: btrfs and 1 billion small files
2012-05-07 11:15 ` Alessio Focardi
@ 2012-05-07 11:39 ` Hugo Mills
2012-05-07 12:19 ` Johannes Hirte
0 siblings, 1 reply; 14+ messages in thread
From: Hugo Mills @ 2012-05-07 11:39 UTC (permalink / raw)
To: Alessio Focardi; +Cc: linux-btrfs
On Mon, May 07, 2012 at 01:15:26PM +0200, Alessio Focardi wrote:
> > This is a lot more compact (as you can have several files' data in a
> > single block), but by default will write two copies of each file,
> > even
> > on a single disk.
>
> Great, no (or less) space wasted, then!
Less space wasted -- you will still have empty bytes left at the
end(*) of most metadata blocks, but you will definitely be packing in
storage far more densely than otherwise.
(*) Actually, the middle, but let's ignore that here.
> I will have a filesystem that's composed mostly of metadata blocks,
> if I understand correctly. Will this create any problem?
Not that I'm aware of -- but you probably need to run proper tests
of your likely behaviour just to see what it'll be like.
> > So, if you want to use some form of redundancy (e.g. RAID-1), then
> > that's great, and you need to do nothing unusual. However, if you
> > want
> > to maximise space usage at the expense of robustness in a device
> > failure, then you need to ensure that you only keep one copy of your
> > data. This will mean that you should format the filesystem with the
> > -m
> > single option.
>
>
> That's a very clever suggestion, I'm preparing a test server right now: going to use the -m single option. Any other suggestion regarding format options?
>
> pagesize? leafsize?
I'm not sure about these -- some values of them definitely break
things. I think they are required to be the same, and that you could
take them up to 64k with no major problems, but do check that first
with someone who actually knows.
Having a larger pagesize/leafsize will reduce the depth of the
trees, and will allow you to store more items in each tree block,
which gives you less wastage again. I don't know what the drawbacks
are, though.
> > > XFS has a minimum block size of 512 bytes, but btrfs is more modern
> > > and, given that it handles indexes on its own, it could
> > > help us speed up file operations (could it?)
> >
> > Not sure what you mean by "handle indexes on its own". XFS will
> > have its own set of indexes and file metadata -- it wouldn't be much
> > of a filesystem if it didn't.
> Yes, you are perfectly right; I thought that recreating a tree like
> /d/u/m/m/y/ to store "dummy" would have been redundant, since the
> whole filesystem is based on trees. I don't have to "ls"
> directories - we are using PHP to write and read files - so I will
> have to find a "compromise" between the number of directory levels
> and the number of files in each one of them.
The FS tree (which is the bit that stores the directory hierarchy
and file metadata) is (broadly) a tree-structured index of inodes,
ordered by inode number. Don't confuse the inode index structure with
the directory structure -- they're totally different arrangements of
the data. You may want to try looking at [1], which attempts to
describe how the FS tree holds file data.
> May I ask you about compression? Would you use it in the scenario I
> described?
I'm not sure if compression will apply to inline file data. Again,
someone else may be able to answer; and you should probably test it
with your own use-cases anyway.
Hugo.
[1] http://btrfs.ipv5.de/index.php?title=Trees
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Welcome to Rivendell, Mr Anderson... ---
* Re: btrfs and 1 billion small files
2012-05-07 11:39 ` Hugo Mills
@ 2012-05-07 12:19 ` Johannes Hirte
0 siblings, 0 replies; 14+ messages in thread
From: Johannes Hirte @ 2012-05-07 12:19 UTC (permalink / raw)
To: Hugo Mills; +Cc: Alessio Focardi, linux-btrfs
Am Mon, 7 May 2012 12:39:28 +0100
schrieb Hugo Mills <hugo@carfax.org.uk>:
> On Mon, May 07, 2012 at 01:15:26PM +0200, Alessio Focardi wrote:
...
> > That's a very clever suggestion, I'm preparing a test server right
> > now: going to use the -m single option. Any other suggestion
> > regarding format options?
> >
> > pagesize? leafsize?
>
> I'm not sure about these -- some values of them definitely break
> things. I think they are required to be the same, and that you could
> take them up to 64k with no major problems, but do check that first
> with someone who actually knows.
First, if you have this filesystem as rootfs, a separate /boot partition
is needed: GRUB is unable to boot from btrfs with a non-default
node/leaf size. Second, a very recent kernel is needed (linux-3.4-rc1 at
least).
regards,
Johannes
* Re: btrfs and 1 billion small files
2012-05-07 9:28 ` btrfs and 1 billion small files Alessio Focardi
` (2 preceding siblings ...)
2012-05-07 11:05 ` vivo75
@ 2012-05-07 15:13 ` David Sterba
2012-05-08 12:31 ` Chris Mason
4 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2012-05-07 15:13 UTC (permalink / raw)
To: Alessio Focardi; +Cc: linux-btrfs
On Mon, May 07, 2012 at 11:28:13AM +0200, Alessio Focardi wrote:
> I thought about compression, but it is not clear to me whether
> compression is handled at the file level or at the block level.
I don't recommend using compression for your expected file size range.
Unless the files are highly compressible (50-75%, which I don't
expect), the extra CPU processing of compression will only make things
worse.
david
* Re: btrfs and 1 billion small files
2012-05-07 10:06 ` Boyd Waters
@ 2012-05-08 6:31 ` Chris Samuel
0 siblings, 0 replies; 14+ messages in thread
From: Chris Samuel @ 2012-05-08 6:31 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On 07/05/12 20:06, Boyd Waters wrote:
> Use a directory hierarchy. Even if the filesystem handles a
> flat structure effectively, userspace programs will choke on
> tens of thousands of files in a single directory. For example
> 'ls' will try to lexically sort its output (very slowly) unless
> given the command-line option not to do so.
In my experience it's not so much the lexical sorting that kills you
but the default -F option which gets set for users these days; that
results in ls doing an lstat() on every file to work out whether it's an
executable, directory, symlink, etc., so it can modify how it displays it to you.
For instance, on one of our HPC systems here we have a user with over
200,000 files in one directory. It takes about 4 seconds for \ls,
whereas \ls -F takes... well, I can't tell you, because it was still running
after 53 minutes (strace confirmed it was still lstat()ing) when I
killed it.
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Re: btrfs and 1 billion small files
2012-05-07 9:28 ` btrfs and 1 billion small files Alessio Focardi
` (3 preceding siblings ...)
2012-05-07 15:13 ` David Sterba
@ 2012-05-08 12:31 ` Chris Mason
2012-05-08 16:51 ` Martin
4 siblings, 1 reply; 14+ messages in thread
From: Chris Mason @ 2012-05-08 12:31 UTC (permalink / raw)
To: Alessio Focardi; +Cc: linux-btrfs
On Mon, May 07, 2012 at 11:28:13AM +0200, Alessio Focardi wrote:
> Hi,
>
> I need some help in designing a storage structure for 1 billion small files (<512 bytes), and I was wondering how btrfs would fit in this scenario. Keep in mind that I have never worked with btrfs - I just read some documentation and browsed this mailing list - so forgive me if my questions are silly! :X
A few people have already mentioned how btrfs will pack these small
files into metadata blocks. If you're running btrfs on a single disk,
the mkfs default will duplicate metadata blocks, which will decrease the
number of files per disk you're able to store.
If you use mkfs.btrfs -m single, you'll store each file only once. I
recommend some kind of RAID for data you care about, though - either
hardware RAID or putting the files across two drives (mkfs.btrfs -m
raid1 -d raid1).
I suggest you experiment with compression. Both lzo and zlib will make
the files smaller, but exactly how much depends quite a lot on your
workload. We compress at a per-extent level, which varies from a single
block up to much larger sizes.
Newer kernels (3.4 and higher) can support larger metadata block sizes.
This increases storage efficiency because we need fewer extent records
to describe all your metadata blocks. It also allows us to pack many
more files into a single block, reducing internal btree block
fragmentation.
But the cost is increased CPU usage. Btrfs hits memmove and memcpy
pretty hard when you're using larger blocks.
I suggest using a 16K or 32K block size. You can go up to 64K, it may
work well if you have beefy CPUs. Example for 16K:
mkfs.btrfs -l 16K -n 16K /dev/xxx
Others have already recommended deeper directory trees. You can
experiment with a few variations here, but a few subdirs will improve
performance. Too many subdirs will waste kernel ram and resources on
the dentries.
Another thing to keep in mind is that btrfs uses a btree for each
subvolume. Using multiple subvolumes does allow you to break up the
btree locks and improve concurrency. You can safely use a subvolume in
most places you would use a top level directory, but remember that
snapshots don't recurse into subvolumes.
-chris
* Re: btrfs and 1 billion small files
2012-05-07 11:05 ` vivo75
@ 2012-05-08 16:46 ` Martin
0 siblings, 0 replies; 14+ messages in thread
From: Martin @ 2012-05-08 16:46 UTC (permalink / raw)
To: linux-btrfs
On 07/05/12 12:05, vivo75@gmail.com wrote:
> On 07/05/2012 11:28, Alessio Focardi wrote:
>> Hi,
>>
>> I need some help in designing a storage structure for 1 billion small
>> files (<512 bytes), and I was wondering how btrfs would fit in
>> this scenario. Keep in mind that I have never worked with btrfs - I just
>> read some documentation and browsed this mailing list - so forgive me
>> if my questions are silly! :X
> Are you *really* sure a database is *not* what are you looking for?
My thought also.
Or:
1 billion 512-byte files... is that not a 512 GByte HDD?
With that, use a database to index your data by sector number and
read/write your data directly to the disk?
For that example, your database just holds filename, size, and sector.
If your 512 byte files are written and accessed sequentially, then just
use a HDD and address them by sector number from a database index. That
then becomes your 'filesystem'.
If you need fast random access, then use SSDs.
Plausible?
Regards,
Martin
* Re: btrfs and 1 billion small files
2012-05-08 12:31 ` Chris Mason
@ 2012-05-08 16:51 ` Martin
2012-05-08 20:54 ` Chris Mason
0 siblings, 1 reply; 14+ messages in thread
From: Martin @ 2012-05-08 16:51 UTC (permalink / raw)
To: linux-btrfs
On 08/05/12 13:31, Chris Mason wrote:
[...]
> A few people have already mentioned how btrfs will pack these small
> files into metadata blocks. If you're running btrfs on a single disk,
[...]
> But the cost is increased CPU usage. Btrfs hits memmove and memcpy
> pretty hard when you're using larger blocks.
>
> I suggest using a 16K or 32K block size. You can go up to 64K, it may
> work well if you have beefy CPUs. Example for 16K:
>
> mkfs.btrfs -l 16K -n 16K /dev/xxx
Is that still with "-s 4K" ?
Might that help SSDs that work in 16kByte chunks?
And why are memmove and memcpy more heavily used?
Does that suggest better optimisation of the (meta)data, or just a
greater housekeeping overhead to shuffle data to new offsets?
Regards,
Martin
* Re: btrfs and 1 billion small files
2012-05-08 16:51 ` Martin
@ 2012-05-08 20:54 ` Chris Mason
0 siblings, 0 replies; 14+ messages in thread
From: Chris Mason @ 2012-05-08 20:54 UTC (permalink / raw)
To: Martin; +Cc: linux-btrfs
On Tue, May 08, 2012 at 05:51:05PM +0100, Martin wrote:
> On 08/05/12 13:31, Chris Mason wrote:
>
> [...]
> > A few people have already mentioned how btrfs will pack these small
> > files into metadata blocks. If you're running btrfs on a single disk,
>
> [...]
> > But the cost is increased CPU usage. Btrfs hits memmove and memcpy
> > pretty hard when you're using larger blocks.
> >
> > I suggest using a 16K or 32K block size. You can go up to 64K, it may
> > work well if you have beefy CPUs. Example for 16K:
> >
> > mkfs.btrfs -l 16K -n 16K /dev/xxx
>
> Is that still with "-s 4K" ?
Yes, the data sector size should still be the same as the page size.
>
>
> Might that help SSDs that work in 16kByte chunks?
Most SSDs today work in much larger chunks, so the bulk of the benefit
comes from better packing, and fewer extent records required to hold the
same amount of metadata.
>
> And why are memmove and memcpy more heavily used?
>
> Does that suggest better optimisation of the (meta)data, or just a
> greater housekeeping overhead to shuffle data to new offsets?
Inserting something into the middle of a block is more expensive because
we have to shift left and right first. The bigger the block, the more
we have to shift.
-chris
Thread overview: 14+ messages
[not found] <1913174825.1910.1336382310577.JavaMail.root@zimbra.interconnessioni.it>
2012-05-07 9:28 ` btrfs and 1 billion small files Alessio Focardi
2012-05-07 9:58 ` Hubert Kario
2012-05-07 10:06 ` Boyd Waters
2012-05-08 6:31 ` Chris Samuel
2012-05-07 10:55 ` Hugo Mills
2012-05-07 11:15 ` Alessio Focardi
2012-05-07 11:39 ` Hugo Mills
2012-05-07 12:19 ` Johannes Hirte
2012-05-07 11:05 ` vivo75
2012-05-08 16:46 ` Martin
2012-05-07 15:13 ` David Sterba
2012-05-08 12:31 ` Chris Mason
2012-05-08 16:51 ` Martin
2012-05-08 20:54 ` Chris Mason