From: Andreas Ericsson <ae@op5.se>
To: Edward Ned Harvey <git@nedharvey.com>
Cc: git@vger.kernel.org
Subject: Re: git performance
Date: Thu, 23 Oct 2008 09:41:21 +0200
Message-ID: <49002AA1.80203@op5.se>
In-Reply-To: <000901c93490$e0c40ed0$a24c2c70$@com>
Edward Ned Harvey wrote:
>> Yes, it does stat all the files. How many files are you talking
>> about, and what platform? From a warm cache on Linux, the 23,000
>> files kernel repo takes about a tenth of a second to stat all files
>> for me (and this on a several year-old machine). And of course many
>> operations don't require stat'ing at all (like looking at logs, or
>> diffs that don't involve the working tree).
>
> No worries. No solution can meet everyone's needs.
>
> I'm talking about 40-50,000 files, on multi-user production linux,

Umm... using git to track a production server? I think there's something
in your specific use-case that eluded pretty much everyone here the
first time you asked about it.

git was built to maintain the Linux kernel, with its patch-and-merge based
workflow, 117k commits and 25k files. It's *good* at that sort of thing,
but a lot of its features are specific to source-code management. It
sounds to me like you're asking for something that will keep a backup of
most of your entire system (apart from /home), which git isn't really
suited for. For instance, it doesn't keep track of mode bits on files
(apart from "executable or not").

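To make the mode-bit point concrete, here is a small sketch in a throwaway repository. It assumes a Linux filesystem that honours permission bits (so "git init" leaves core.fileMode at true); the file name and the demo identity are made up for illustration.

```sh
#!/bin/sh
# Which permission changes does git notice? (throwaway repo, made-up names)
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > file.txt
chmod 644 file.txt
git add file.txt
git commit -q -m 'add file'
git ls-files -s file.txt     # the index records the mode as 100644

chmod 600 file.txt           # change read/write bits only
after_600=$(git status --porcelain)
echo "after chmod 600: ${after_600:-clean}"

chmod 755 file.txt           # set the executable bit
after_755=$(git status --porcelain)
echo "after chmod 755: $after_755"
```

Only the second chmod shows up as a modification, because git records a regular file as either 100644 or 100755 and nothing finer.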
> which means the cache is never warm, except when I'm benchmarking.
> Specifically RHEL 4 with the files on NFS mount. Cold cache "svn st"
> takes ~10 mins. Warm cache 20-30 sec. Surprisingly to me,
> performance was approx the same for files on local disk versus NFS.
> Probably the best solution for us is perforce, we just don't like the
> pricetag.
>
> Out of curiosity, what are they talking about, when they say "git is
> fast?"

Merges, patch application, committing, history walking and data
transfers are all extremely quick operations under git.

Actually, history walking isn't extremely quick, but several neat
tricks are in place that make it *seem* quick. Running
"git log drivers/net/wireless" on the Linux kernel with a cold
cache starts spitting out output after about 1 second on my measly
laptop (where the kernel repository has 117k commits and 25k files).

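For anyone unfamiliar with path-limited history, the toy repository below sketches what "git log drivers/net/wireless" asks for: only the commits that touch that path. The file names and commit messages are made up, and a repository this small says nothing about the speed tricks mentioned above.

```sh
#!/bin/sh
# Path-limited history in a throwaway repo; the paths mimic the kernel tree.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p drivers/net/wireless fs

echo a > drivers/net/wireless/radio.c
git add drivers
git commit -q -m 'wireless: add radio'

echo b > fs/inode.c
git add fs
git commit -q -m 'fs: add inode'

echo c >> drivers/net/wireless/radio.c
git add drivers
git commit -q -m 'wireless: fix radio'

# Only commits touching drivers/net/wireless are listed, newest first.
# "--" separates revisions from paths.
wireless_log=$(git log --format=%s -- drivers/net/wireless)
echo "$wireless_log"
```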
> Just the fact that it's all local disk, or is there more to
> it than that? I could see - git would probably outperform perforce
> for versioning of large files (let's say iso files) to benefit from
> sustained local disk IO, while perforce would probably outperform
> anything I can think of, operating on thousands of tiny files,
> because it will never walk the tree.
>
Git doesn't *have* to walk the tree either. "git status" obviously
has to, since you're asking "which files in this tree have changed
since I last added stuff to the index?", but you can use git just
fine without ever issuing "git status" (assuming you're the one
controlling the changes, that is).

"git rm" and "git add" won't walk the tree. They're only interested in
the paths you give them and won't touch anything else.

"git commit path1 path2" won't walk the tree. It has to walk the paths
you name (which can be entire subdirectories, or all of them), but
nothing beyond that.

"git push" (i.e., sending your changes upstream) won't walk the tree.
It just looks at the history and at how yours differs from the remote's.

"git merge" (and therefore also "git pull") doesn't walk the tree. It
only makes sure that paths touched by the merge are up-to-date.
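A minimal sketch of the path-limited commit described above, in a throwaway repository with made-up file names: only the named paths are examined and committed, and a third modified file is simply left alone.

```sh
#!/bin/sh
# Committing named paths only (throwaway repo, made-up file names).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
for f in path1 path2 path3; do echo 1 > "$f"; done
git add path1 path2 path3
git commit -q -m initial

for f in path1 path2 path3; do echo 2 > "$f"; done
# Only the named paths go into the commit; path3 stays modified.
git commit -q -m 'update path1 and path2' path1 path2

committed=$(git diff --name-only HEAD~1 HEAD)
# This check stats the whole (tiny) tree; it is only here to verify the demo.
pending=$(git diff --name-only)
echo "committed: $committed"
echo "still modified: $pending"
```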

Apart from that, it would be trivial to hack up an inotify config and
some scripts that stage changes in a separate index file, and then add
a simple wrapper that operates on that separate index file rather than
the "regular" one.
Sample "giti" wrapper:
--%<--%<--%<--
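#!/bin/sh
# giti - inotify driven git wrapper
# Run it from the top of the work tree, since the index path is relative.
GIT_INDEX_FILE=.git/inotify-index
export GIT_INDEX_FILE
# Seed the separate index from HEAD on first use.
[ -f "$GIT_INDEX_FILE" ] || git read-tree HEAD || exit 1
case "$1" in
status)
	# only report what inotify has staged; never stat the work tree
	exec git diff --name-only --cached
	;;
esac
exec git "$@"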
--%<--%<--%<--
Sample inotify script:
--%<--%<--%<--
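#!/bin/sh
cd /watched/path || exit 1   # assumed to be the top of the work tree
export GIT_INDEX_FILE=.git/inotify-index
[ -f "$GIT_INDEX_FILE" ] || git read-tree HEAD || exit 1
git add "$1"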
--%<--%<--%<--
Sample incrontab(5) entry:
--%<--%<--%<--
/watched/path IN_CLOSE_WRITE inotify.git $@/$#
--%<--%<--%<--
Totally untested, of course, so it probably needs tweaking. It should
work rather well though, assuming you're somewhat careful about which
arguments you pass to the "giti" wrapper, and make sure never to use
any git commands that *have* to walk the entire tree (such as
"git commit -a").
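The separate-index idea can be checked in isolation. The sketch below uses a throwaway repository with made-up file names and the same .git/inotify-index path as the scripts above. It seeds the separate index once, stages a change the way the incron-triggered script would, and commits through it while the regular index stays untouched.

```sh
#!/bin/sh
# Throwaway-repo check of the separate-index idea behind giti.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
alt=.git/inotify-index       # same index path the scripts above use

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo one > a.txt
echo one > b.txt
git add a.txt b.txt
git commit -q -m initial

# One-time setup: seed the separate index from HEAD; otherwise it starts
# out empty and every tracked file looks deleted.
GIT_INDEX_FILE=$alt git read-tree HEAD

# What the incron-triggered script does once a.txt is written and closed.
echo two > a.txt
GIT_INDEX_FILE=$alt git add a.txt

# "giti status": what inotify has staged, without stat'ing the work tree.
staged=$(GIT_INDEX_FILE=$alt git diff --name-only --cached)
# The regular .git/index has not been touched.
regular=$(git diff --name-only --cached)

# Committing through the separate index records only what inotify staged.
GIT_INDEX_FILE=$alt git commit -q -m 'via inotify index'
echo "staged: $staged; regular: ${regular:-nothing}"
```

Note that commits made this way do not update the regular .git/index, so plain git commands run in the same work tree afterwards will report confusing staged changes; sticking to the wrapper avoids that.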
Let us know how it pans out.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231