From: Chris Mason <chris.mason@oracle.com>
To: Alessio Focardi <alessiof@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs and 1 billion small files
Date: Tue, 8 May 2012 08:31:53 -0400
Message-ID: <20120508123153.GS11876@shiny>
In-Reply-To: <711331964.2091.1336382892940.JavaMail.root@zimbra.interconnessioni.it>
On Mon, May 07, 2012 at 11:28:13AM +0200, Alessio Focardi wrote:
> Hi,
>
> I need some help in designing a storage structure for 1 billion of small files (<512 Bytes), and I was wondering how btrfs will fit in this scenario. Keep in mind that I never worked with btrfs - I just read some documentation and browsed this mailing list - so forgive me if my questions are silly! :X
A few people have already mentioned how btrfs will pack these small
files into metadata blocks. If you're running btrfs on a single disk,
the mkfs default duplicates metadata blocks, which roughly halves the
number of packed files you're able to store per disk.
If you use mkfs.btrfs -m single, you'll store each file only once. I
recommend some kind of raid for data you care about, though: either
hardware raid or mirroring the files across two drives (mkfs.btrfs -m
raid1 -d raid1).
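A minimal sketch of the two-drive variant; the device names /dev/sdb
and /dev/sdc are placeholders for your own drives:

```
# Mirror both metadata (-m) and data (-d) across two drives.
# WARNING: destroys existing data on the named devices.
mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc
```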
I suggest you experiment with compression. Both lzo and zlib will make
the files smaller, but exactly how much depends quite a lot on your
workload. We compress at a per-extent level, which can range from a
single block up to much larger sizes.
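Compression is selected at mount time; the device and mount point below
are placeholders:

```
# Enable lzo (faster) or zlib (better ratio) compression per mount.
mount -o compress=lzo /dev/xxx /mnt
# or, for a better compression ratio at higher CPU cost:
mount -o compress=zlib /dev/xxx /mnt
```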
Newer kernels (3.4 and higher) can support larger metadata block sizes.
This increases storage efficiency because we need fewer extent records
to describe all your metadata blocks. It also allows us to pack many
more files into a single block, reducing internal btree block
fragmentation.
But the cost is increased CPU usage. Btrfs hits memmove and memcpy
pretty hard when you're using larger blocks.
I suggest using a 16K or 32K block size. You can go up to 64K; that may
work well if you have beefy CPUs. Example for 16K:
mkfs.btrfs -l 16K -n 16K /dev/xxx
Others have already recommended deeper directory trees. You can
experiment with a few variations here, but a few subdirs will improve
performance. Too many subdirs will waste kernel RAM and resources on
the dentries.
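One common way to spread files over a shallow tree (an illustration,
not from the original mail): derive two directory levels from a hash of
the file name, giving 256 x 256 buckets. The file name below is
hypothetical:

```shell
#!/bin/sh
# Illustrative sharding: take the first four hex digits of the
# file name's md5 hash and use them as two directory levels.
name="example_file_000123"            # hypothetical file name
h=$(printf '%s' "$name" | md5sum | cut -c1-4)
d1=$(printf '%s' "$h" | cut -c1-2)    # first level: 256 buckets
d2=$(printf '%s' "$h" | cut -c3-4)    # second level: 256 buckets
path="$d1/$d2/$name"
echo "$path"
```

Because the hash is uniform, each leaf directory ends up holding
roughly (total files / 65536) entries.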
Another thing to keep in mind is that btrfs uses a btree for each
subvolume. Using multiple subvolumes does allow you to break up the
btree locks and improve concurrency. You can safely use a subvolume in
most places you would use a top level directory, but remember that
snapshots don't recurse into subvolumes.
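For instance, a hypothetical layout putting each top-level bucket in
its own subvolume (the mount point /mnt is a placeholder):

```
# Each subvolume gets its own btree, splitting lock contention.
btrfs subvolume create /mnt/bucket-00
btrfs subvolume create /mnt/bucket-01
```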
-chris