From: "Ted Ts'o" <tytso@mit.edu>
To: Jacek Luczak <difrost.kernel@gmail.com>
Cc: Andreas Dilger <adilger@whamcloud.com>,
Lukas Czerner <lczerner@redhat.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: getdents - ext4 vs btrfs performance
Date: Sun, 18 Mar 2012 16:56:58 -0400
Message-ID: <20120318205658.GB31682@thunk.org>
In-Reply-To: <CADDYkjRSd-Dv2jECwKt=3Q95hceVhiA+StZj1fvwywZHWaEgfw@mail.gmail.com>

On Thu, Mar 15, 2012 at 11:42:24AM +0100, Jacek Luczak wrote:
>
> That was not a SVN server. It was a build host having checkouts of SVN
> projects.
>
> The many files/dirs case is common for VCS and the SVN is not the only
> that would be affected here.

Well, with SVN it's 2x or 3x the number of files in the checked out
source code directory, right? So if a particular source tree has
2,000 files in a source directory, then SVN might have at most 6,000
files, and if you assume each directory entry is 64 bytes, we're still
talking about 375k. Do you have more files than that in a directory
in practice with SVN? And if so why?
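
For reference, the back-of-the-envelope estimate above works out as follows. This is only a sketch: the flat 64 bytes per entry is the assumption stated above (real ext4 entries vary with name length), and the helper name `dir_size_bytes` is made up for illustration.

```python
def dir_size_bytes(num_entries: int, bytes_per_entry: int) -> int:
    """Rough directory size, ignoring block headers and slack space."""
    return num_entries * bytes_per_entry


source_files = 2_000
svn_multiplier = 3  # upper end of the 2x-3x guess above
entries = source_files * svn_multiplier

size = dir_size_bytes(entries, 64)
# 6000 entries * 64 bytes = 384000 bytes = 375 KiB
print(entries, size, size / 1024)
```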

> AFAIR git.kernel.org was also suffering from the getdents().

git.kernel.org was suffering from a different problem, which was that
the git.kernel.org administrators didn't feel like automatically doing
a "git gc" on all of the repositories, and a lot of people were just
doing "git pushes" and not bothering to gc their repositories. Since
git.kernel.org users don't have shell access any more, the
git.kernel.org administrators have to be doing automatic git gc's. By
default git is supposed to automatically do a gc when there are more
than 6700 loose object files (which are distributed across 256 1st
level directories, so in practice a .git/objects/XX directory
shouldn't have more than 30 objects in it, with each directory entry
taking 48 bytes). The problem I believe is that "git push" commands
weren't checking gc.auto limit, and so that's why git.kernel.org had
in the past suffered from large directories. This is arguably a git
bug, though, and as I mentioned, since none of us have shell access
to git.kernel.org, this has to be handled automatically now....
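
The per-directory numbers above can be checked with a few lines. This is a rough sketch that assumes loose objects spread evenly across the 256 fan-out directories (an average, not a hard limit) and uses the 48-byte entry size from the text; the constant names are mine.

```python
GC_AUTO_DEFAULT = 6700  # git's default gc.auto threshold
FANOUT_DIRS = 256       # .git/objects/00 .. .git/objects/ff
ENTRY_BYTES = 48        # per-entry size assumed in the text above

avg_objects_per_dir = GC_AUTO_DEFAULT / FANOUT_DIRS
avg_dir_bytes = avg_objects_per_dir * ENTRY_BYTES

# Prints: 26.2 objects, about 1256 bytes per directory
print(f"{avg_objects_per_dir:.1f} objects, about {avg_dir_bytes:.0f} bytes per directory")
```

So a repository that actually honors gc.auto keeps each .git/objects/XX directory well inside a single file system block.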

> Same applies to commercial products that are
> heavily stuffed with many files/dirs, e.g. ClearCase or Synergy.

How many files in a directory do we commonly see with these systems?
I'm not familiar with them, and so I don't have a good feel for what
typical directory sizes tend to be.

> A medium size you are referring would most probably fit into 256k and
> this could be enough for 90% of cases. Large production system running
> on ext4 need backups thus those would benefit the most here.

Yeah, 256k or 512k is probably the best. Alternatively, the backup
programs could simply be taught to sort the directory entries by inode
number, and if that's not enough, to grab the initial block numbers
using FIEMAP and then sort by block number. Of course, all of this
optimization may or may not actually give us as much of a return as we
think, given that the disk is probably seeking from other workloads
happening in parallel anyway (another reason why I am suspicious that
timing the tar command may not be an accurate way of measuring actual
performance when you have other tasks accessing the file system in
parallel with the backup).
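
To make the inode-sorting suggestion concrete, here is a minimal sketch in Python. It is not taken from any existing backup tool: the function names are hypothetical, it does only the first step (sorting by inode number, with no FIEMAP pass), and it assumes a Linux system where the inode number comes straight from the directory entry.

```python
import os
import tempfile


def entries_sorted_by_inode(path):
    """Return the os.DirEntry objects in `path`, sorted by inode number."""
    with os.scandir(path) as it:
        # entry.inode() is filled in from the directory entry on Linux,
        # so sorting does not need a stat() call per file.
        return sorted(it, key=lambda entry: entry.inode())


def read_all_in_inode_order(path):
    """Read every regular file in `path` in inode order; return total bytes."""
    total = 0
    for entry in entries_sorted_by_inode(path):
        if entry.is_file(follow_symlinks=False):
            with open(entry.path, "rb") as f:
                total += len(f.read())
    return total


if __name__ == "__main__":
    # Small self-contained demo on a scratch directory.
    with tempfile.TemporaryDirectory() as d:
        for i in range(5):
            with open(os.path.join(d, f"file{i}"), "wb") as f:
                f.write(b"x" * (i + 1))
        print(read_all_in_inode_order(d))
```

The point of the sort is that readdir on an ext4 htree directory returns names in hash order, which has nothing to do with where the inodes live on disk. Whether visiting them in inode order buys much on a disk that is busy with other workloads is exactly the open question above.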
- Ted