From: Chris Mason <mason@suse.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Krzysztof Halasa <khc@pm.waw.pl>, git@vger.kernel.org
Subject: Re: [PATCH] multi item packed files
Date: Fri, 22 Apr 2005 14:58:34 -0400
Message-ID: <200504221458.36300.mason@suse.com>
In-Reply-To: <Pine.LNX.4.58.0504220916060.2344@ppc970.osdl.org>
On Friday 22 April 2005 12:22, Linus Torvalds wrote:
> On Thu, 21 Apr 2005, Chris Mason wrote:
> > We can sort by the files before reading them in, but even if we order
> > things perfectly, we're spreading the io out too much across the drive.
>
> No we don't.
>
> It's easy to just copy the repository in a way where this just isn't true:
> you sort the objects by how far they are from the current HEAD, and you
> just copy the repository in that order ("furthest" objects first - commits
> last).
>
> That's what I meant by defragmentation - you can actually do this on your
> own, even if your filesystem doesn't support it.
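Here is a minimal sketch of the copy order Linus describes. It assumes a hypothetical in-memory map `refs` from each object id to the ids it references (a commit points to its tree and parents, a tree to its entries). It also takes "how far from HEAD" to mean hop count in a breadth-first walk, which is an interpretation rather than something stated in the thread. Objects come back furthest first, so HEAD itself is copied last:

```python
from collections import deque


def copy_order(refs, head):
    """Return object ids ordered furthest-from-HEAD first, HEAD last.

    refs: hypothetical mapping of object id -> list of referenced ids.
    Distance is the hop count found by a breadth-first walk from head.
    """
    dist = {head: 0}
    queue = deque([head])
    while queue:
        obj = queue.popleft()
        for child in refs.get(obj, ()):
            if child not in dist:  # first BFS visit gives the shortest distance
                dist[child] = dist[obj] + 1
                queue.append(child)
    # Descending distance; ties broken by id so the order is deterministic.
    return sorted(dist, key=lambda o: (-dist[o], o))
```

For example, with refs = {"c2": ["t2", "c1"], "c1": ["t1"], "t2": ["b2", "b1"], "t1": ["b1"]}, copy_order(refs, "c2") returns ["b1", "b2", "t1", "c1", "t2", "c2"]. The deepest blobs and trees come first and the HEAD commit comes last.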
This certainly can help. Based on some ideas from Andrea, I wrote a similar
poor man's defrag script last year. It worked by copying files into a flat
dir in the order you expected to read them in, deleting the original, then
hard linking the copy back under the original name.
Copying in order straight into a new git tree doesn't help much when the
filesystem uses the subdirectory as a hint for block allocation. So you'll
probably have to copy them all into a flat directory and then hard link them
back into the git tree (the flat dir can then be deleted, of course).
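The poor man's defrag described above might look roughly like this in Python. This is a sketch, not the original script. The function name poor_mans_defrag and its arguments are hypothetical, and the caller supplies the read order (for example, a furthest-from-HEAD ordering). flat_dir must be on the same filesystem as the files, because hard links cannot cross filesystems:

```python
import os
import shutil


def poor_mans_defrag(paths_in_read_order, flat_dir):
    """Hypothetical sketch: rewrite files in read order via a flat scratch dir.

    Each file is copied into one flat directory (so the old subdirectory
    is not used as an allocation hint), the original is deleted, and the
    new copy is hard linked back under the original name.
    Assumes flat_dir is on the same filesystem as every path.
    """
    os.makedirs(flat_dir, exist_ok=True)
    for i, path in enumerate(paths_in_read_order):
        tmp = os.path.join(flat_dir, "%08d" % i)  # sequential, unique names
        shutil.copy2(path, tmp)  # fresh copy; data blocks are newly allocated
        os.remove(path)          # drop the old, possibly scattered copy
        os.link(tmp, path)       # original name now points at the new copy
    # The tree's hard links keep the data alive; the flat names can go.
    shutil.rmtree(flat_dir)
```

Copying before removing means an interrupted run never loses a file's only copy. Whether the filesystem actually lays the new copies out contiguously depends on its allocator and free space.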
The problem I see for git is that once you have enough data, the layout will
degrade again fairly quickly after each defrag. My own guess is that you'll need to
run the script at least monthly. If we're designing the thing now and say
'wow, that's going to be really slow without help', it doesn't hurt to look
at alternatives.
I grabbed Ingo's tarball of 28,000 patches since 2.4.0 and applied them all
into git on ext3 (htree). It only took ~2.5 hrs to apply. I used my
write-tree patch, which takes a list of directories to search, but I don't
think it helped much since the operation was mostly bound by disk writes.
Anyway, I ended up with a 2.6GB .git directory. Then I ran:

    rm .git/index
    umount ; mount again
    time read-tree `tree-id`                           (24.45s)
    time checkout-cache --prefix=../checkout/ -a -f    (4m30s)

--prefix is neat ;)
The tree that ended up in checkout was 239456k, giving checkout-cache an
effective I/O rate of 885k/s (this drive does 24MB/s for sequential reads).
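As a quick check on that figure, dividing the tree size by the wall time gives roughly the same number. This assumes k means KiB and reads 24MB/s as 24 MiB/s. The small gap from 885k/s is presumably rounding in the 4m30s timing. Either way, checkout ran at under 4% of the drive's sequential read rate:

```python
# Figures quoted in the message above.
tree_kib = 239456       # size of the checked-out tree, in KiB
seconds = 4 * 60 + 30   # checkout-cache wall time (4m30s)
seq_kib_s = 24 * 1024   # sequential read rate, assuming 24MB/s means 24 MiB/s

rate = tree_kib / seconds
print(f"effective rate: {rate:.0f} KiB/s")                # effective rate: 887 KiB/s
print(f"fraction of sequential: {rate / seq_kib_s:.1%}")  # fraction of sequential: 3.6%
```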
I'll have numbers for the packed files later on today. No, I don't really
expect the numbers will convince you to implement some kind of packing ;)
But it's still a good data point to have, and generating them here is just
poking the box every 2 hours or so.
-chris