* Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial @ 2007-08-01 0:16 Jakub Narebski 2007-08-01 2:14 ` Linus Torvalds 2007-08-01 2:17 ` Shawn O. Pearce 0 siblings, 2 replies; 29+ messages in thread From: Jakub Narebski @ 2007-08-01 0:16 UTC (permalink / raw) To: git I have lately added a new Git speed benchmark, from Bryan Murdock's blog. The repository is a bit atypical: <quote> By performance, I mean that I used the UNIX time command to see how long various basic operations took. Performing the various basic operations gave me some insight into the usability of each as well. For this test I used a directory with 266 MB of files, 258 KB of which were text files, with the rest being image files. I know, kind of weird to version all those binary files, but that was the project I was interested in testing this out on. Your mileage may vary and all that. Here’s a table summarizing the real times reported by time(1): </quote> If I remember correctly there were some patches to git which tried to better deal with large blobs. In this simple benchmark git was outperformed by Mercurial and even Bazaar-NG a bit. http://git.or.cz/gitwiki/GitBenchmarks#head-5657b8361895b5a02c0de39337c410e4d8dcdbce http://bryan-murdock.blogspot.com/2007/03/cutting-edge-revision-control.html -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 0:16 Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial Jakub Narebski @ 2007-08-01 2:14 ` Linus Torvalds 2007-08-01 5:50 ` Junio C Hamano 2007-08-01 8:33 ` Jakub Narebski 2007-08-01 2:17 ` Shawn O. Pearce 1 sibling, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2007-08-01 2:14 UTC (permalink / raw) To: Jakub Narebski; +Cc: Git Mailing List On Wed, 1 Aug 2007, Jakub Narebski wrote: > > If I remember correctly there were some patches to git which tried to > better deal with large blobs. In this simple benchmark git was > outperformed by Mercurial and even Bazaar-NG a bit. It's almost certainly not the binary blobs. I think almost all the difference is from the cloning, without repacking the source or using a local clone. The default action for a git clone is to create a pack-file, and do a local clone as if you did it over the network. That is obviously much slower than using the "-l" flag for the _clone_ action, but it tends to be better for the end result - since you get a nice packed starting point, and none of the confusion with hardlinks etc. [ Maybe I'm just a worry-wart, but hardlinking two repos still makes me worried. Even though we never modify the object files. Quite frankly, I almost wish we hadn't ever done "-l" at all, and I cannot really suggest using it. Either use "-s" for the truly shared repository, or use the default pack-generating one. The hardlinking one was simple and made sense, but it's really not very nice. But that aversion to "git clone -l" is really totally illogical. The way we do the object handling, hardlinking object files in git is just about the most safe operation you can think of - and I *still* shudder at it ] Now, I think the "always act as if you were network transparent" by default is great, but especially if you have never run "git gc" to generate a pack to begin with, it's going to be a very costly thing. 
And I think that's what the numbers show. That's the only op we do a *lot* worse on than we should. (The "nonconflicting merge" is probably - once more - the diffstat generation that bites us. That's generally the most costly thing of the whole merge, but I *love* the diffstat). That said, even if he had done a "git gc", to be fair he would have had to include the cost of that first garbage collect in the "initial import", so the end result would have been exactly the same. Git _does_ end up having a very odd performance profile, and while it's optimized for certain things, the "initial import" is not one of them. (Which admittedly is a bit odd. The reason I didn't ever seriously even consider monotone was that the initial import was so *incredibly* sucky, and took hours for the kernel. So use "-l" for benchmarks, and damn my "I hate hardlinking repos" idiocy). So the only way to truly do a fast initial import *and* get a reasonably good initial clone is likely one of: - take full advantage of git, and use local branches, instead of bothering with lots of clones. I think that this is often the right thing to do, but it's obviously not fair for comparisons, since it's really something different from what's likely available in the other SCM's. But it's the "git way". - use "git clone -s" (or "-l"). I think the hg numbers are the result of hg defaulting to "-l" behaviour. Which makes sense for hg, since people need to clone more (in git, you'd generally work with local branches instead). - or the initial import would be done with some "git fast-import" thing, rather than "git add ." We don't do it now, and the resulting pack-file wouldn't be optimal, but it would be reasonable. It would at least cut down a _bit_ on the clone cost. The other reaction I took away from that (quite reasonable, I think) comparison is that I think Murdock would have been much happier if git diff defaulted to "-C". 
We don't do that (for the best of reasons: interoperability), but maybe we should document the "-M/-C" options more. The options do show up in the man-page, but apparently not obviously enough, since he hadn't noticed. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
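For readers who, like the benchmark's author, have not met the "-M/-C" options mentioned above, here is a minimal sketch of rename detection in "git diff". The repository, the file names and the commit identity are made up for illustration, and it assumes a git recent enough to understand "git -c".

```shell
#!/bin/sh
# Illustrative only: throwaway repository, hypothetical file names.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q .

# A hypothetical identity so the commits work on an unconfigured machine.
commit () {
	git -c user.name=Example -c user.email=example@example.invalid \
		commit -q -m "$1"
}

printf 'line one\nline two\nline three\n' >old-name.txt
git add old-name.txt && commit "add file" || exit 1

git mv old-name.txt new-name.txt && commit "rename file" || exit 1

# With -M the change is reported as a rename (status letter R) rather
# than as one deleted file plus one added file.
renamed=$(git diff -M --name-status HEAD~1 HEAD)
echo "$renamed"
```

"-C" is used the same way and additionally looks for copies, by default among the files changed in the same commit.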
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 2:14 ` Linus Torvalds @ 2007-08-01 5:50 ` Junio C Hamano 2007-08-01 8:48 ` David Kastrup 2007-08-01 9:24 ` Theodore Tso 2007-08-01 8:33 ` Jakub Narebski 1 sibling, 2 replies; 29+ messages in thread From: Junio C Hamano @ 2007-08-01 5:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jakub Narebski, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > (Which admittedly is a bit odd. The reason I didn't ever seriously even > consider monotone was that the initial import was so *incredibly* sucky, > and took hours for the kernel. So use "-l" for benchmarks, and damn my > "I hate hardlinking repos" idiocy). I would call aversion to -l a superstition, while aversion to -s has sound technical reasons. The latter means you need to know what you are doing --- namely, you are making the clone still dependent on the original. ^ permalink raw reply [flat|nested] 29+ messages in thread
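To make the "-s" caveat concrete: a shared clone records where the original's objects live instead of storing its own. A rough sketch, with made-up directory names and commit identity:

```shell
#!/bin/sh
# Illustrative only: throwaway directories, hypothetical names.
top=$(mktemp -d) && cd "$top" || exit 1

git init -q orig
(
	cd orig &&
	echo hello >file &&
	git add file &&
	git -c user.name=Example -c user.email=example@example.invalid \
		commit -q -m initial
) || exit 1

# -s copies no objects; it writes the path of the original's object
# directory into objects/info/alternates of the new repository.
git clone -q -s orig shared || exit 1
alternates=$(cat shared/.git/objects/info/alternates)
echo "borrowing from: $alternates"

# Count the object files stored in the clone itself, leaving out the
# bookkeeping under info/ (none are expected for a fresh -s clone).
own_objects=$(find shared/.git/objects -type f ! -path '*/info/*' | wc -l)
echo "object files stored in the clone: $own_objects"
```

If orig is later pruned or deleted, shared loses the objects it was borrowing, which is the dependency described above.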
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 5:50 ` Junio C Hamano @ 2007-08-01 8:48 ` David Kastrup 2007-08-01 9:24 ` Theodore Tso 1 sibling, 0 replies; 29+ messages in thread From: David Kastrup @ 2007-08-01 8:48 UTC (permalink / raw) To: git Junio C Hamano <gitster@pobox.com> writes: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> (Which admittedly is a bit odd. The reason I didn't ever seriously even >> consider monotone was that the initial import was so *incredibly* sucky, >> and took hours for the kernel. So use "-l" for benchmarks, and damn my >> "I hate hardlinking repos" idiocy). > > I would call aversion to -l a superstition, while aversion to -s > has sound technical reasons. The latter means you need to know > what you are doing --- namely, you are making the clone still > dependent on the original. Well, I'd not call the -l aversion a complete superstition: it means that cloning a repository won't provide any redundancy worth noting against file system corruption. -- David Kastrup ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 5:50 ` Junio C Hamano 2007-08-01 8:48 ` David Kastrup @ 2007-08-01 9:24 ` Theodore Tso 2007-08-01 10:15 ` Junio C Hamano 1 sibling, 1 reply; 29+ messages in thread From: Theodore Tso @ 2007-08-01 9:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Jakub Narebski, Git Mailing List On Tue, Jul 31, 2007 at 10:50:48PM -0700, Junio C Hamano wrote: > I would call aversion to -l a superstition, while aversion to -s > has sound technical reasons. The latter means you need to know > what you are doing --- namely, you are making the clone still > dependent on the original. So would you accept a patch which adds a git-config variable which specifies whether or not local clones should use hard links by default (defaulting to yes), and which adds a --no-hard-links option to git-clone to override the config option? I could imagine a situation where, if you are using a git repository exclusively on a local system, with no remote repositories to act as backups, you might want git clone to make full copies to provide backups in case of filesystem or disk induced corruption. But most of the time there are enough copies of the repo on other machines that making separate copies of the git objects/packs isn't really needed. - Ted ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 9:24 ` Theodore Tso @ 2007-08-01 10:15 ` Junio C Hamano 2007-08-01 13:20 ` Alex Riesen ` (4 more replies) 0 siblings, 5 replies; 29+ messages in thread From: Junio C Hamano @ 2007-08-01 10:15 UTC (permalink / raw) To: Theodore Tso; +Cc: Linus Torvalds, Jakub Narebski, Git Mailing List Theodore Tso <tytso@mit.edu> writes: > On Tue, Jul 31, 2007 at 10:50:48PM -0700, Junio C Hamano wrote: >> I would call aversion to -l a superstition, while aversion to -s >> has sound technical reasons. The latter means you need to know >> what you are doing --- namely, you are making the clone still >> dependent on the original. > > So would you accept a patch which adds a git-config variable which > specifies whether or not local clones should use hard links by default > (defaulting to yes), and which adds a --no-hard-links option to > git-clone to override the config option? Are you suggesting to make -l the default for local, in other words? I personally do not make local clone often enough that I am not disturbed having to type extra " -l" on the command line. But giving a way to force "copy not hardlink" while still avoiding "the same as the networked case by doing pack transfer" overhead may be a good thing to do. Perhaps if the destination is local, - if -s is given, just set up alternates, do nothing else; - by default, do "always copy never hardlink"; - with -l, do "hardlink if possible"; Hmmmm... ^ permalink raw reply [flat|nested] 29+ messages in thread
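The three cases listed above could be sketched roughly as follows. This is not git-clone.sh: populate_objects and its mode names are hypothetical, it expects absolute paths, and it copies file by file instead of using cpio.

```shell
#!/bin/sh
# Hypothetical helper, not the real clone script: fill the object
# directory dst from src according to mode (shared | link | copy).
# Both src and dst are expected to be absolute paths.
populate_objects () {
	mode=$1 src=$2 dst=$3
	mkdir -p "$dst/info" || return 1
	case "$mode" in
	shared)
		# -s: only record where to borrow objects from.
		echo "$src" >>"$dst/info/alternates"
		;;
	link|copy)
		( cd "$src" && find . -type f -print ) |
		while IFS= read -r f
		do
			mkdir -p "$dst/$(dirname "$f")" || exit 1
			# -l: hardlink if possible, otherwise fall back to a copy.
			if test "$mode" = link && ln "$src/$f" "$dst/$f" 2>/dev/null
			then
				:
			else
				cp "$src/$f" "$dst/$f" || exit 1
			fi
		done
		;;
	*)
		echo >&2 "unknown mode: $mode"
		return 1
		;;
	esac
}

# Example with a fake one-object store.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/ab"
echo data >"$tmp/src/ab/cdef"
populate_objects copy "$tmp/src" "$tmp/copy"
populate_objects link "$tmp/src" "$tmp/link"
populate_objects shared "$tmp/src" "$tmp/shared"
```

Only the "link" mode leaves files that share an inode with the source; "copy" gives an independent set of files at the cost of reading and writing every object once.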
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 10:15 ` Junio C Hamano @ 2007-08-01 13:20 ` Alex Riesen 2007-08-01 13:20 ` Alex Riesen 2007-08-01 15:49 ` Carl Worth ` (3 subsequent siblings) 4 siblings, 1 reply; 29+ messages in thread From: Alex Riesen @ 2007-08-01 13:20 UTC (permalink / raw) To: Junio C Hamano Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List On 8/1/07, Junio C Hamano <gitster@pobox.com> wrote: > Theodore Tso <tytso@mit.edu> writes: > > So would you accept a patch which adds a git-config variable which > > specifies whether or not local clones should use hard links by default > > (defaulting to yes), and which adds a --no-hard-links option to > > git-clone to override the config option? > > Are you suggesting to make -l the default for local, in other > words? I personally do not make local clone often enough that I > am not disturbed having to type extra " -l" on the command line. ...as long as the underlying filesystem _supports_ hardlinks. BTW, we need a warning when falling back to normal copy, if git-clone -l is used. The user _asked_ for a hard-linked clone, but silently got something else. Something like this: diff --git a/git-clone.sh b/git-clone.sh index 0922554..a744f5b 100755 --- a/git-clone.sh +++ b/git-clone.sh @@ -266,6 +266,7 @@ yes,yes) l= if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null then + echo >&2 "Hardlinks not supported. Falling back to copy" l=l fi && rm -f "$GIT_DIR/objects/sample" && ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 13:20 ` Alex Riesen @ 2007-08-01 13:20 ` Alex Riesen 2007-08-01 13:23 ` Alex Riesen 0 siblings, 1 reply; 29+ messages in thread From: Alex Riesen @ 2007-08-01 13:20 UTC (permalink / raw) To: Junio C Hamano Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List On 8/1/07, Alex Riesen <raa.lkml@gmail.com> wrote: > if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null > then > + echo >&2 "Hardlinks not supported. Falling back to copy" > l=l > fi && Err, the other way around, of course. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 13:20 ` Alex Riesen @ 2007-08-01 13:23 ` Alex Riesen 0 siblings, 0 replies; 29+ messages in thread From: Alex Riesen @ 2007-08-01 13:23 UTC (permalink / raw) To: Junio C Hamano Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List On 8/1/07, Alex Riesen <raa.lkml@gmail.com> wrote: > On 8/1/07, Alex Riesen <raa.lkml@gmail.com> wrote: > > if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null > > then > > + echo >&2 "Hardlinks not supported. Falling back to copy" > > l=l > > fi && > > Err, the other way around, of course. >

diff --git a/git-clone.sh b/git-clone.sh
index 0922554..483b91d 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -264,8 +264,10 @@ yes,yes)
 		test -f "$repo/$sample_file" || exit
 
 		l=
-		if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
+		if ! ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
 		then
+			echo >&2 "Hardlinks not supported. Falling back to copy"
+		else
 			l=l
 		fi &&
 		rm -f "$GIT_DIR/objects/sample" &&

^ permalink raw reply related [flat|nested] 29+ messages in thread
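Pulled out of the clone script, the corrected probe-and-warn logic amounts to something like the sketch below. The function name and the probe file name are made up; only the "ln" test and the warning text come from the patch.

```shell
#!/bin/sh
# Hypothetical helper: succeed if sample_file can be hardlinked into
# dest_dir, otherwise warn on stderr and fail.
can_hardlink () {
	sample_file=$1 dest_dir=$2
	probe="$dest_dir/hardlink-probe.$$"
	if ln "$sample_file" "$probe" 2>/dev/null
	then
		rm -f "$probe"
		return 0
	fi
	echo >&2 "Hardlinks not supported. Falling back to copy"
	return 1
}

# Example: pick the cpio "l" flag only when linking actually works.
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/dst"
echo data >"$tmp/src/sample"

l=
if can_hardlink "$tmp/src/sample" "$tmp/dst"
then
	l=l
fi
echo "link flag: '$l'"
```

Keeping the warning in the failure branch means a user who asked for "-l" is told when the request could not be honoured, which was the point of the patch.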
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 10:15 ` Junio C Hamano 2007-08-01 13:20 ` Alex Riesen @ 2007-08-01 15:49 ` Carl Worth 2007-08-01 17:03 ` Linus Torvalds 2007-08-01 22:03 ` Theodore Tso ` (2 subsequent siblings) 4 siblings, 1 reply; 29+ messages in thread From: Carl Worth @ 2007-08-01 15:49 UTC (permalink / raw) To: Junio C Hamano Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List [-- Attachment #1: Type: text/plain, Size: 1108 bytes --] On Wed, 01 Aug 2007 03:15:25 -0700, Junio C Hamano wrote: > > Are you suggesting to make -l the default for local, in other > words? I personally do not make local clone often enough that I > am not disturbed having to type extra " -l" on the command line. Personally, I think it would be a great default. And I think the frequency with which you type this command is not a good metric for deciding if a command-line option should be required. Instead, the focus should be on having good defaults for a good user experience, (for example, the benchmarking that started this thread that gave a bad first impression of git). So, just making git-clone go as fast as possible when local, without requiring any additional options from the user, would be a very good thing. As for the concern that new users might do local clones in the hope to get some redundancy, hopefully the fact that the operation is instantaneous will give plenty of clue to the user that no redundancy has been provided. That should be enough to send the user looking for the documentation to find the --no-hard-links option. -Carl [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 15:49 ` Carl Worth @ 2007-08-01 17:03 ` Linus Torvalds 2007-08-01 18:17 ` David Kastrup 2007-08-02 6:09 ` Junio C Hamano 0 siblings, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2007-08-01 17:03 UTC (permalink / raw) To: Carl Worth; +Cc: Junio C Hamano, Theodore Tso, Jakub Narebski, Git Mailing List On Wed, 1 Aug 2007, Carl Worth wrote: > > On Wed, 01 Aug 2007 03:15:25 -0700, Junio C Hamano wrote: > > > > Are you suggesting to make -l the default for local, in other > > words? I personally do not make local clone often enough that I > > am not disturbed having to type extra " -l" on the command line. > > Personally, I think it would be a great default. I suspect it probably *would* make sense to default to "-l". Even if it makes me get goose-bumps. I freely admit that my worries are totally illogical. We might make it something like: "if you use an url, we don't default to local", so the difference would be that git clone file:///directory/to/repo would work the way it does now, but git clone /directory/to/repo would default to "-l" behaviour. That kind of would make sense (and should be easy to implement: it would be a trivial fixup to "connect.c"). Something like this adds support for "file://". And then git-clone could just do something like

	# if the source is a local directory, default to local
	if [ -d "$src" ]; then
		use_local=yes
	fi

or similar.
Linus --- connect.c | 12 +++++++----- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/connect.c b/connect.c index 715cdc0..ae49c5a 100644 --- a/connect.c +++ b/connect.c @@ -145,6 +145,8 @@ static enum protocol get_protocol(const char *name) return PROTO_SSH; if (!strcmp(name, "ssh+git")) return PROTO_SSH; + if (!strcmp(name, "file")) + return PROTO_LOCAL; die("I don't handle protocol '%s'", name); } @@ -498,13 +500,13 @@ pid_t git_connect(int fd[2], char *url, const char *prog, int flags) end = host; path = strchr(end, c); - if (c == ':') { - if (path) { + if (path) { + if (c == ':') { protocol = PROTO_SSH; *path++ = '\0'; - } else - path = host; - } + } + } else + path = end; if (!path || !*path) die("No path specified. See 'man git-pull' for valid url syntax"); ^ permalink raw reply related [flat|nested] 29+ messages in thread
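The rule described above (anything written as a URL goes through the transport, a plain directory is treated as local) might look like this on the git-clone side. classify_source is a hypothetical name, and a real script would set use_local instead of printing a word.

```shell
#!/bin/sh
# Hypothetical helper: print "local" for a plain directory path and
# "transport" for anything written as a URL (file:// included) or
# anything that is not an existing directory (for example host:path).
classify_source () {
	case "$1" in
	*://*)
		echo transport ;;
	*)
		if [ -d "$1" ]
		then
			echo local
		else
			echo transport
		fi ;;
	esac
}

repo=$(mktemp -d)
classify_source "$repo"
classify_source "file://$repo"
classify_source "example.invalid:project.git"
```

With this split, spelling the source as file:// remains a way to ask for the same-as-network behaviour even when the repository is on the local disk.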
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 17:03 ` Linus Torvalds @ 2007-08-01 18:17 ` David Kastrup 2007-08-01 20:36 ` Florian Weimer 2007-08-02 6:09 ` Junio C Hamano 1 sibling, 1 reply; 29+ messages in thread From: David Kastrup @ 2007-08-01 18:17 UTC (permalink / raw) To: Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > I suspect it probably *would* make sense to default to "-l". Even if it > makes me get goose-bumps. I freely admit that my worries are totally > illogical. > > We might make it something like: "if you use an url, we don't default to > local", Couldn't git clone http://host/directory/to/repo tell the proxy that it should enter off-line mode and stop updating? -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 18:17 ` David Kastrup @ 2007-08-01 20:36 ` Florian Weimer 0 siblings, 0 replies; 29+ messages in thread From: Florian Weimer @ 2007-08-01 20:36 UTC (permalink / raw) To: git * David Kastrup: > Couldn't git clone http://host/directory/to/repo tell the proxy that > it should enter off-line mode and stop updating? Huh? I don't see how this is relevant to the current thread. Anyway, I don't think the max-stale cache control mechanism is widely implemented. If you want effective expiry controls, you need to implement them on the server side. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 17:03 ` Linus Torvalds 2007-08-01 18:17 ` David Kastrup @ 2007-08-02 6:09 ` Junio C Hamano 2007-08-02 10:29 ` David Kastrup 1 sibling, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2007-08-02 6:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Carl Worth, Theodore Tso, Jakub Narebski, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > We might make it something like: "if you use an url, we don't default to > local", so the difference would be that > > git clone file:///directory/to/repo > > would work the way it does now, but > > git clone /directory/to/repo > > would default to "-l" behaviour. That kind of would make sense (and should > be easy to implement: it would be a trivial fixup to "connect.c". The attached does not default to "-l", but filesystem level copy behaviour, which is what happens with "clone -l" across filesystem boundaries with the current code. Clone of linux-2.6 repository (the source is well packed) (hardlink -- obviously, almost no cost) $ /usr/bin/time git clone -l --bare linux-2.6 l-clone.git Initialized empty Git repository in /git/l-clone.git/ 0 blocks 0.55user 1.00system 0:01.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+206724minor)pagefaults 0swaps (same-as-network) $ /usr/bin/time git clone --bare file://`pwd`/linux-2.6 n-clone.git Initialized empty Git repository in /git/n-clone.git/ remote: Generating pack... remote: Counting objects: 1076746 remote: Done counting 1169654 objects. remote: Deltifying 1169654 objects... remote: 100% (1169654/1169654) done Indexing 1169654 objects... 100% (1169654/1169654) done remote: Total 1169654 (delta 959223), reused 1160595 (delta 950164) Resolving 959223 deltas... 
100% (959223/959223) done 172.85user 20.94system 4:25.88elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6294major+2019874minor)pagefaults 0swaps (copy -- takes a lot more than hardlink but cheaper than net) $ /usr/bin/time git clone --bare linux-2.6 c-clone.git Initialized empty Git repository in /git/c-clone.git/ 1266644 blocks 0.92user 10.81system 0:38.38elapsed 30%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (406major+204775minor)pagefaults 0swaps I am ambivalent between -l vs no -l. * Without -l (i.e. have all objects/ copied via cpio) would not catch the source repository corruption, and also risks corrupted recipient repository if an alpha-particle hits memory cell while indexing and resolving deltas. As long as the recipient is made uncorrupted, you have a good back-up. * same-as-network is expensive, but it would catch if the source is already corrupted. It still risks corrupted recipient repository. As long as the recipient is made uncorrupted, you have a good back-up. * With -l, as long as the source repository is healthy, it is very likely that the recipient would be, too. Also it is very cheap. You do not get any back-up benefit. None of the method is resilient against the source repository corruption, so let's discount that from the comparison. Then the differences between -l and non -l matters primarily if you value the back-up benefit or not. If you want to use the cloned repository as a back-up, then it is cheaper to do a non -l clone and two git-fsck (source before clone, recipient after clone) than same-as-network clone, especially as you are likely to do a git-fsck on the recipient if you are so paranoid anyway. Which leads me to believe that being able to use file:/// is probably a good idea, if only for testability, but probably of little practical value, and we can default to -l for everyday use, and paranoids can use non -l as a way to make a back-up. 
--- git-clone.sh | 61 +++++++++++++++++++++++--------------------- t/t5500-fetch-pack.sh | 2 +- t/t5700-clone-reference.sh | 2 +- t/t5701-clone-local.sh | 17 ++++++++++++ 4 files changed, 51 insertions(+), 31 deletions(-) diff --git a/git-clone.sh b/git-clone.sh index 0922554..0583f64 100755 --- a/git-clone.sh +++ b/git-clone.sh @@ -87,7 +87,7 @@ Perhaps git-update-server-info needs to be run there?" quiet= local=no -use_local=no +use_local_hardlink=no local_shared=no unset template no_checkout= @@ -108,9 +108,10 @@ while no_checkout=yes ;; *,--na|*,--nak|*,--nake|*,--naked|\ *,-b|*,--b|*,--ba|*,--bar|*,--bare) bare=yes ;; - *,-l|*,--l|*,--lo|*,--loc|*,--loca|*,--local) use_local=yes ;; + *,-l|*,--l|*,--lo|*,--loc|*,--loca|*,--local) + use_local_hardlink=yes ;; *,-s|*,--s|*,--sh|*,--sha|*,--shar|*,--share|*,--shared) - local_shared=yes; use_local=yes ;; + local_shared=yes; ;; 1,--template) usage ;; *,--template) shift; template="--template=$1" ;; @@ -249,34 +250,36 @@ fi rm -f "$GIT_DIR/CLONE_HEAD" # We do local magic only when the user tells us to. -case "$local,$use_local" in -yes,yes) +case "$local" in +yes) ( cd "$repo/objects" ) || - die "-l flag seen but repository '$repo' is not local." + die "cannot chdir to local '$repo/objects'." - case "$local_shared" in - no) - # See if we can hardlink and drop "l" if not. - sample_file=$(cd "$repo" && \ - find objects -type f -print | sed -e 1q) - - # objects directory should not be empty since we are cloning! 
- test -f "$repo/$sample_file" || exit - - l= - if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null - then - l=l - fi && - rm -f "$GIT_DIR/objects/sample" && - cd "$repo" && - find objects -depth -print | cpio -pumd$l "$GIT_DIR/" || exit 1 - ;; - yes) - mkdir -p "$GIT_DIR/objects/info" - echo "$repo/objects" >> "$GIT_DIR/objects/info/alternates" - ;; - esac + if test "$local_shared" = yes + then + mkdir -p "$GIT_DIR/objects/info" + echo "$repo/objects" >>"$GIT_DIR/objects/info/alternates" + else + l= && + if test "$use_local_hardlink" = yes + then + # See if we can hardlink and drop "l" if not. + sample_file=$(cd "$repo" && \ + find objects -type f -print | sed -e 1q) + # objects directory should not be empty because + # we are cloning! + test -f "$repo/$sample_file" || exit + if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null + then + rm -f "$GIT_DIR/objects/sample" + l=l + else + echo >&2 "Warning: -l asked but cannot hardlink to $repo" + fi + fi && + cd "$repo" && + find objects -depth -print | cpio -pumd$l "$GIT_DIR/" || exit 1 + fi git-ls-remote "$repo" >"$GIT_DIR/CLONE_HEAD" || exit 1 ;; *) diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index 7da5153..7b6798d 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -129,7 +129,7 @@ pull_to_client 2nd "B" $((64*3)) pull_to_client 3rd "A" $((1*3)) # old fails -test_expect_success "clone shallow" "git-clone --depth 2 . shallow" +test_expect_success "clone shallow" "git-clone --depth 2 file://`pwd`/. 
shallow" (cd shallow; git count-objects -v) > count.shallow diff --git a/t/t5700-clone-reference.sh b/t/t5700-clone-reference.sh index 6d43252..4e93aaa 100755 --- a/t/t5700-clone-reference.sh +++ b/t/t5700-clone-reference.sh @@ -51,7 +51,7 @@ diff expected current' cd "$base_dir" test_expect_success 'cloning with reference (no -l -s)' \ -'git clone --reference B A D' +'git clone --reference B file://`pwd`/A D' cd "$base_dir" diff --git a/t/t5701-clone-local.sh b/t/t5701-clone-local.sh index b093327..032c498 100755 --- a/t/t5701-clone-local.sh +++ b/t/t5701-clone-local.sh @@ -43,4 +43,21 @@ test_expect_success 'local clone from x.git that does not exist' ' fi ' +test_expect_success 'Without -l, local will make a copy' ' + cd "$D" && + git clone --bare x w && + cd w && + linked=$(find objects -type f ! -links 1 | wc -l) && + test "$linked" = 0 +' + +test_expect_success 'With -l, local will make a hardlink' ' + cd "$D" && + rm -fr w && + git clone -l --bare x w && + cd w && + copied=$(find objects -type f -links 1 | wc -l) && + test "$copied" = 0 +' + test_done ^ permalink raw reply related [flat|nested] 29+ messages in thread
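The two new tests tell a copy from a hardlink by looking at link counts. The same check can be run by hand. The sketch below uses throwaway repositories with made-up names, compares "-l" against a file:// clone, and assumes the filesystem holding the temporary directory supports hardlinks.

```shell
#!/bin/sh
# Illustrative only: throwaway repositories, hypothetical names.
top=$(mktemp -d) && cd "$top" || exit 1

git init -q x
(
	cd x &&
	echo content >file &&
	git add file &&
	git -c user.name=Example -c user.email=example@example.invalid \
		commit -q -m initial
) || exit 1

# Hardlinked clone: object files share their inodes with the source.
git clone -q -l --bare x linked.git || exit 1
# Same-as-network clone: objects arrive as a freshly written pack.
git clone -q --bare "file://$top/x" packed.git || exit 1

# Object files with exactly one link are private copies.
private_in_linked=$(find linked.git/objects -type f -links 1 |
	wc -l | tr -d ' ')
# Object files with more than one link are shared with x.
shared_in_packed=$(find packed.git/objects -type f ! -links 1 |
	wc -l | tr -d ' ')

echo "linked.git private object files: $private_in_linked"
echo "packed.git shared object files:  $shared_in_packed"
```

Prefixing either clone command with "time" gives a small-scale version of the comparison quoted in the message above.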
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-02 6:09 ` Junio C Hamano @ 2007-08-02 10:29 ` David Kastrup 2007-08-03 0:51 ` Junio C Hamano 0 siblings, 1 reply; 29+ messages in thread From: David Kastrup @ 2007-08-02 10:29 UTC (permalink / raw) To: git Junio C Hamano <gitster@pobox.com> writes: > * With -l, as long as the source repository is healthy, it is > very likely that the recipient would be, too. Also it is > very cheap. You do not get any back-up benefit. Oh, but one does: an overzealous prune or rm -oopswrongoption in one repo does not hurt the other. > Which leads me to believe that being able to use file:/// is > probably a good idea, if only for testability, but probably of > little practical value, and we can default to -l for everyday > use, and paranoids can use non -l as a way to make a back-up. Sane enough, I guess. -- David Kastrup ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-02 10:29 ` David Kastrup @ 2007-08-03 0:51 ` Junio C Hamano 2007-08-03 6:14 ` David Kastrup 2007-08-03 8:20 ` Johan Herland 0 siblings, 2 replies; 29+ messages in thread From: Junio C Hamano @ 2007-08-03 0:51 UTC (permalink / raw) To: David Kastrup; +Cc: git David Kastrup <dak@gnu.org> writes: > Junio C Hamano <gitster@pobox.com> writes: > >> * With -l, as long as the source repository is healthy, it is >> very likely that the recipient would be, too. Also it is >> very cheap. You do not get any back-up benefit. > > Oh, but one does: an overzealous prune or rm -oopswrongoption in one > repo does not hurt the other. That's not the "back-up" benefit I was thinking about. It is more about protecting your data from hardware failure. You physically have bits in two places, preferably on separate disk drives. And that is what you do not get from a hardlinked clone. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-03 0:51 ` Junio C Hamano @ 2007-08-03 6:14 ` David Kastrup 2007-08-03 8:20 ` Johan Herland 1 sibling, 0 replies; 29+ messages in thread From: David Kastrup @ 2007-08-03 6:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > David Kastrup <dak@gnu.org> writes: > >> Junio C Hamano <gitster@pobox.com> writes: >> >>> * With -l, as long as the source repository is healthy, it is >>> very likely that the recipient would be, too. Also it is >>> very cheap. You do not get any back-up benefit. >> >> Oh, but one does: an overzealous prune or rm -oopswrongoption in one >> repo does not hurt the other. > > That's not "back-up" benefit I was thinking about. It is more > about protecting your data from hardware failure. You > physically have bits in two places, preferably on separate disk > drives. > > And that is what you do not get from hardlinked clone. Not at the inode/blob level, but at least the directory manipulations of one are safe from the other. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-03 0:51 ` Junio C Hamano 2007-08-03 6:14 ` David Kastrup @ 2007-08-03 8:20 ` Johan Herland 1 sibling, 0 replies; 29+ messages in thread From: Johan Herland @ 2007-08-03 8:20 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, David Kastrup On Friday 03 August 2007, Junio C Hamano wrote: > David Kastrup <dak@gnu.org> writes: > > > Junio C Hamano <gitster@pobox.com> writes: > > > >> * With -l, as long as the source repository is healthy, it is > >> very likely that the recipient would be, too. Also it is > >> very cheap. You do not get any back-up benefit. > > > > Oh, but one does: an overzealous prune or rm -oopswrongoption in one > > repo does not hurt the other. > > That's not "back-up" benefit I was thinking about. It is more > about protecting your data from hardware failure. If one is serious about backing up one's repo to protect it from hardware failure, there is not much use at all in cloning (by copy, hardlink, or otherwise) to a different location on the _same_ filesystem. In order for a backup to be at least marginally useful, it should be on a different disk drive (which you hint at below), or, even better, on a different continent... My point is as follows: One has to clone a repo onto (at least) a different filesystem if one is serious about backup. But if one is cloning to a different filesystem, hardlinking is no longer an option; git _has_ to make a copy of some sort. Therefore we might as well hardlink as long as we're on a single filesystem (since the extra copy would not be worth much, backup-wise). > You physically have bits in two places, preferably on separate disk > drives. > And that is what you do not get from hardlinked clone. If the two copies are on separate disk drives (i.e. separate filesystems), you cannot make a hardlink in the first place. If the two copies are on the same filesystem, they're not worth much more than a single copy (backup-wise). 
Given the clone-to-same-filesystem(-with-hardlink-capability) scenario (which is the only scenario where we have the option of using hardlinks), we have the following pros and cons when using hardlinks instead of copying:

Pros:
- Hardlink is _much_ faster (for big repos, we're talking orders of magnitude faster)

Cons:
- Hardlink will not leave two copies on the disk. But I'm arguing that the additional copy will have pretty much _no_ value from a redundancy POV, since the copy is still left on the _same_ filesystem. Some would even go as far as to say that the second copy provides a false sense of security as long as it is located on the same filesystem.

Have fun!

...Johan

-- Johan Herland, <johan@herland.net> www.herland.net ^ permalink raw reply [flat|nested] 29+ messages in thread
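The "no second copy" side of this trade-off can be shown with a small sketch, assuming a POSIX-style filesystem with hard-link support. The file names and the helper name are made up for illustration. It checks that both names share one inode, then overwrites the bytes through one name and reads them through the other.

```python
import os
import tempfile


def hardlink_shares_data():
    """Return (same_inode, link_count, bytes_seen_via_second_name)."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "original")  # hypothetical file names
        linked = os.path.join(root, "linked")
        with open(original, "wb") as f:
            f.write(b"AAAA")

        os.link(original, linked)
        st_a, st_b = os.stat(original), os.stat(linked)
        same_inode = st_a.st_ino == st_b.st_ino
        link_count = st_a.st_nlink

        # Overwrite in place through one name, simulating damage to the
        # stored bytes rather than removal of a directory entry.
        with open(original, "r+b") as f:
            f.write(b"BB")

        with open(linked, "rb") as f:
            seen = f.read()
        return same_inode, link_count, seen


if __name__ == "__main__":
    print(hardlink_shares_data())
```

Because there is only one set of bytes on disk, damage done through one name shows up under the other. That is why a hardlinked clone offers no redundancy against bit-level corruption.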
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 10:15 ` Junio C Hamano 2007-08-01 13:20 ` Alex Riesen 2007-08-01 15:49 ` Carl Worth @ 2007-08-01 22:03 ` Theodore Tso 2007-08-01 22:49 ` Brandon Casey 2007-08-02 4:02 ` Allan Wind 2007-08-01 22:18 ` Jakub Narebski 2007-08-02 18:08 ` Ramsay Jones 4 siblings, 2 replies; 29+ messages in thread From: Theodore Tso @ 2007-08-01 22:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Jakub Narebski, Git Mailing List On Wed, Aug 01, 2007 at 03:15:25AM -0700, Junio C Hamano wrote: > > So would you accept a patch which adds a git-config variable which > > specifies whether or not local clones should use hard links by default > > (defaulting to yes), and which adds a --no-hard-links option to > > git-clone to override the config option? > > Are you suggesting to make -l the default for local, in other > words? I personally do not make local clone often enough that I > am not disturbed having to type extra " -l" on the command line. Yeah, essentially, with a git-config option (and command-line option) to override the default for those people who are "squeamish" about git clone -l. Linus's suggestion of using file:// as a way to indicate non-local also makes a lot of sense to me. > Perhaps if the destination is local, > > - if -s is given, just set up alternates, do nothing else; As I understand it, the main objection to making -s the default is the surprising result that could happen if you do a git-prune in the base repository, which removes objects that are borrowed from the base repository via .git/objects/info/alternates, right? What if git clone -s appended the repository which is borrowing objects via alternates to a file located in the base repository, .git/objects/info/shared-repos? Then git-prune could also use the refs marked in each of the downstream repositories that are sharing objects with the base repository and not make those objects go away. 
That way, git-gc --prune won't do anything surprising to any shared repositories, since it will scan those shared repositories automatically. Would that be considered acceptable? That way you can get instant clones even on filesystems that don't support hard links, such as Windows systems.

Once we implement support for .git/objects/info/shared-repos, I would suggest that git-clone do the following by default:

* If the source repo is specified via a file:// URL, per Linus's suggestion, do the clone via copying.
* If the source repo is specified via a local pathname, and .git/objects/info/shared-repos in the source repository is writeable by the user who is doing the clone, then do a clone -s.
* If the above fails, try clone -l
* If the above fails, do a clone via copying over a new pack

Would this be acceptable? Patches to do the above should be fairly easy to whip up. Obviously this wouldn't be for 1.5.3. :-)

- Ted ^ permalink raw reply [flat|nested] 29+ messages in thread
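The fallback order Ted proposes can be written down as a small decision function. This is only a sketch of the proposal as stated in this message, not of anything git-clone implements. The function name, the two boolean parameters and the returned labels are all hypothetical.

```python
def pick_clone_strategy(source: str,
                        shared_repos_writable: bool,
                        hardlinks_possible: bool) -> str:
    """Return a label for the clone method Ted's proposal would select.

    source                 -- what the user typed: a file:// URL or a local path
    shared_repos_writable  -- assumption: the proposed objects/info/shared-repos
                              file in the source repo can be written by the user
    hardlinks_possible     -- assumption: source and destination can be hardlinked
    """
    if source.startswith("file://"):
        return "copy"      # file:// URL: clone via copying
    if shared_repos_writable:
        return "shared"    # local path: behave like "clone -s"
    if hardlinks_possible:
        return "hardlink"  # next fallback: behave like "clone -l"
    return "new-pack"      # last resort: copy over a new pack


if __name__ == "__main__":
    print(pick_clone_strategy("file:///srv/repo.git", True, True))
    print(pick_clone_strategy("/srv/repo.git", False, True))
```

Each branch corresponds to one bullet in the list above, in the same order, so the first applicable method wins.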
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 22:03 ` Theodore Tso @ 2007-08-01 22:49 ` Brandon Casey 2007-08-02 4:02 ` Allan Wind 1 sibling, 0 replies; 29+ messages in thread From: Brandon Casey @ 2007-08-01 22:49 UTC (permalink / raw) To: Theodore Tso Cc: Junio C Hamano, Linus Torvalds, Jakub Narebski, Git Mailing List Theodore Tso wrote: > On Wed, Aug 01, 2007 at 03:15:25AM -0700, Junio C Hamano wrote: >> Perhaps if the destination is local, >> >> - if -s is given, just set up alternates, do nothing else; > > As I understand it, the main objection with making -s the default is > surprising result that could happen if you do a git-prune in the base > repository which causes objects which are borrowed from the base > repository via .git/objects/info/alternates, right? -s would be a lot safer to use if repack -a -d (as used by git-gc) was smarter. -a -d has the nasty side effect of doing what it seems only prune is intended to do... that is to remove unreferenced objects. -s usage currently has to be very well thought out, unless you're just using it for a short-lived temporary branch. If this unintended pruning could be avoided then an average user could go about their merry business repacking and git-gc'ing without a care, and only when doing a git-gc --prune would they need to do something special. -brandon ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 22:03 ` Theodore Tso 2007-08-01 22:49 ` Brandon Casey @ 2007-08-02 4:02 ` Allan Wind 2007-08-02 4:13 ` Linus Torvalds 1 sibling, 1 reply; 29+ messages in thread From: Allan Wind @ 2007-08-02 4:02 UTC (permalink / raw) To: Theodore Tso Cc: Junio C Hamano, Linus Torvalds, Jakub Narebski, Git Mailing List On 2007-08-01T18:03:50-0400, Theodore Tso wrote: > Yeah, essentially, with a git-config option (and command-line option) > to override the default for those people who are "squeamish" about git > clone -l. Linus's suggestion of using file:// as a way to indicate > non-local also makes a lot of sense to me. I would expect /something and file:///something to behave exactly the same way (the latter just having a bit of extra syntactic sugar). /Allan ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-02 4:02 ` Allan Wind @ 2007-08-02 4:13 ` Linus Torvalds 0 siblings, 0 replies; 29+ messages in thread From: Linus Torvalds @ 2007-08-02 4:13 UTC (permalink / raw) To: Allan Wind; +Cc: Theodore Tso, Junio C Hamano, Jakub Narebski, Git Mailing List On Thu, 2 Aug 2007, Allan Wind wrote: > > I would expect /something and file:///something to behave exactly the > same way (the latter just having a bit of extra syntactic sugar). I do agree that they should be basically the same, but from an implementation standpoint it actually makes a lot of sense to separate them. Also, there's actually a small amount of "logic" in it: the /something is obviously a "raw filename", while the "file://something" clearly is something a lot more abstract. I don't actually have a very strong opinion, but I do think that "file://" makes sense regardless (ie the patch I sent out is probably a good idea). I also strongly dispute that "file://something" is _identical_ to just "something". There's a huge difference, as anybody who has ever tried to do cp file://file-A file-B will have hopefully found out. They may mean the same thing, but they have totally different levels of abstraction, so it does actually make some sense that you end up *cloning* the same thing, but different ways. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
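The distinction Linus draws between a raw filename and a `file://` URL can be sketched as a tiny classifier. The function name and the returned labels are hypothetical, and the prefix check is an assumption made for illustration, not how git parses its arguments. Only `urllib.parse.urlsplit` from the Python standard library is used.

```python
from urllib.parse import urlsplit


def classify_source(source: str):
    """Split a clone source into (kind, filesystem_path).

    A string starting with "file://" is treated as a URL and its path
    component is extracted; anything else is taken as a raw filename.
    """
    if source.startswith("file://"):
        return ("url", urlsplit(source).path)
    return ("path", source)


if __name__ == "__main__":
    print(classify_source("file:///srv/repo.git"))
    print(classify_source("/srv/repo.git"))
```

Both inputs resolve to the same location on disk, yet the caller can still tell them apart and pick a different clone method for each, which is the separation argued for above.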
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 10:15 ` Junio C Hamano ` (2 preceding siblings ...) 2007-08-01 22:03 ` Theodore Tso @ 2007-08-01 22:18 ` Jakub Narebski 2007-08-02 11:19 ` Jakub Narebski 2007-08-02 18:08 ` Ramsay Jones 4 siblings, 1 reply; 29+ messages in thread From: Jakub Narebski @ 2007-08-01 22:18 UTC (permalink / raw) To: Junio C Hamano; +Cc: Theodore Tso, Linus Torvalds, Git Mailing List Junio C Hamano wrote: > Perhaps if the destination is local, > > - if -s is given, just set up alternates, do nothing else; > - by default, do "always copy never hardlink"; > - with -l, do "hardlink if possible"; > > Hmmmm... That, I think, is the best solution, together with support for the file:///path/to/repo.git scheme, which would turn on the old repacking behavior. I'm all for it. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 22:18 ` Jakub Narebski @ 2007-08-02 11:19 ` Jakub Narebski 0 siblings, 0 replies; 29+ messages in thread From: Jakub Narebski @ 2007-08-02 11:19 UTC (permalink / raw) To: Junio C Hamano; +Cc: Theodore Tso, Linus Torvalds, Git Mailing List Jakub Narebski wrote: > Junio C Hamano wrote: > > > Perhaps if the destination is local, > > > > - if -s is given, just set up alternates, do nothing else; > > - by default, do "always copy never hardlink"; > > - with -l, do "hardlink if possible"; > > > > Hmmmm... > > That, I think, is the best solution, together with support for the > file:///path/to/repo.git scheme, which would turn on the old repacking > behavior. I'm all for it. By the way, with "-l" you have hardlinks only until the next repack ("git gc"), right? -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 10:15 ` Junio C Hamano ` (3 preceding siblings ...) 2007-08-01 22:18 ` Jakub Narebski @ 2007-08-02 18:08 ` Ramsay Jones 4 siblings, 0 replies; 29+ messages in thread From: Ramsay Jones @ 2007-08-02 18:08 UTC (permalink / raw) To: Junio C Hamano Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List Junio C Hamano wrote: > Are you suggesting to make -l the default for local, in other > words? I personally do not make local clone often enough that I > am not disturbed having to type extra " -l" on the command line. > > But giving a way to force "copy not hardlink" while still > avoiding "the same as the networked case by doing pack transfer" > overhead may be a good thing to do. > > Perhaps if the destination is local, > > - if -s is given, just set up alternates, do nothing else; > - by default, do "always copy never hardlink"; > - with -l, do "hardlink if possible"; > > Hmmmm... > About six weeks ago, I finally got around to installing Linux (ubuntu 7.04) on my laptop. Naturally, I cloned my sparse and git repositories over from the Windows XP partition. Unfortunately, that left me with a sparse repo that I could not modify; during the clone cpio copied the object directory, with perhaps a little too much fidelity, which resulted in a .git/objects tree with 555 permissions (both files and directories). [It also set the file timestamps with utime(), BTW]. A quick chmod fixed it up without problem, but still ... When I cloned the git repo, however, I forgot the -l parameter and git-clone effectively did a "git-fetch-pack --all -k $repo", leaving me with a working, and fully repacked, repository. Nice. So, I was about to suggest that when invoked with -l, if the object database cannot be linked, due to EXDEV for example, it should fall back to the "fetch-pack" behaviour. 
Since I don't have access to a large repo, I can't compare the filesystem-copy time versus the fetch-pack time for a "realistic" repo, but I suppose the copy would always be faster. Oh Well. Just a data point. ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 29+ messages in thread
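Ramsay's suggestion, falling back when linking fails with EXDEV, can be sketched as follows. Note the substitution: a plain file copy stands in for the "fetch-pack" path he actually proposes, since that part cannot be shown in a few lines. The function name `link_or_copy` and the injectable `link` parameter are hypothetical and exist only so the fallback can be exercised without two filesystems.

```python
import errno
import os
import shutil


def link_or_copy(src: str, dst: str, link=os.link) -> str:
    """Try to hard-link src to dst; on a cross-device error, copy instead.

    Returns "hardlink" or "copy" so the caller knows which path was taken.
    Any error other than EXDEV is re-raised unchanged.
    """
    try:
        link(src, dst)
        return "hardlink"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Stand-in for the fetch-pack fallback described in the message.
        shutil.copyfile(src, dst)
        return "copy"
```

Checking specifically for `errno.EXDEV` keeps other failures, such as permission errors, visible instead of silently degrading to a copy.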
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 2:14 ` Linus Torvalds 2007-08-01 5:50 ` Junio C Hamano @ 2007-08-01 8:33 ` Jakub Narebski 2007-08-01 8:48 ` Junio C Hamano 1 sibling, 1 reply; 29+ messages in thread From: Jakub Narebski @ 2007-08-01 8:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds wrote: > (The "nonconflicting merge" is probably - once more - the diffstat > generation that bites us. That's generally the most costly thing of the > whole merge, but I *love* the diffstat). http://bryan-murdock.blogspot.com/2007/03/cutting-edge-revision-control.html doesn't say what the directory structure of the imported files is. If it is flat, then git cannot take advantage of its hierarchical tree structure. By the way, I guess that the "nonconflicting merge" is a trivial tree-level merge, as a "no changes" merge should be faster (or a fast-forward). About clone: there was "pack loose, copy existing packs" idea. I don't remember what happened with it. At least for local clone it would be nice. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 8:33 ` Jakub Narebski @ 2007-08-01 8:48 ` Junio C Hamano 2007-08-01 23:51 ` Jakub Narebski 0 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2007-08-01 8:48 UTC (permalink / raw) To: Jakub Narebski; +Cc: Linus Torvalds, Git Mailing List Jakub Narebski <jnareb@gmail.com> writes: > About clone: there was "pack loose, copy existing packs" idea. Can you give more details --- I do not recall such an "idea" discussed. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 8:48 ` Junio C Hamano @ 2007-08-01 23:51 ` Jakub Narebski 0 siblings, 0 replies; 29+ messages in thread From: Jakub Narebski @ 2007-08-01 23:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > > > About clone: there was "pack loose, copy existing packs" idea. > > Can you give more details --- I do not recall such an "idea" > discussed. The idea was to avoid repacking, and just pack loose, unpacked objects (and save this pack if possible), then concatenate all packs and send this concatenated pack as the result. This saves a bit (quite a bit) of CPU at the cost of additional bandwidth usage if packfiles are not optimized. The only result of the discussion was that it would be fairly easy to send multiple packs concatenated into one pack, without needing to add some multi-pack extension, as only minor changes would be required to split "concatenated" packfiles. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 29+ messages in thread
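For orientation on what a receiver of concatenated packs would have to recognise: every packfile starts with a 12-byte header holding the signature `PACK`, a version number and an object count, the latter two as 32-bit integers in network byte order. The sketch below only parses that header. How pack boundaries would actually be detected in a concatenated stream is not specified in this thread, and the function name is hypothetical.

```python
import struct

# Signature, version, object count; ">" selects network (big-endian) order.
PACK_HEADER = struct.Struct(">4sII")


def parse_pack_header(data: bytes):
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    signature, version, count = PACK_HEADER.unpack_from(data, 0)
    if signature != b"PACK":
        raise ValueError("not a packfile: bad signature")
    return version, count


if __name__ == "__main__":
    sample = b"PACK" + struct.pack(">II", 2, 3)  # synthetic header, no objects
    print(parse_pack_header(sample))
```

Since each pack announces its own object count up front, a reader that has consumed that many objects (and the checksum trailer) can expect either end of stream or another `PACK` signature.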
* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial 2007-08-01 0:16 Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial Jakub Narebski 2007-08-01 2:14 ` Linus Torvalds @ 2007-08-01 2:17 ` Shawn O. Pearce 1 sibling, 0 replies; 29+ messages in thread From: Shawn O. Pearce @ 2007-08-01 2:17 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> wrote: > I have lately added new Git speed benchmark, from Bryan Murdock blog. > The repository is bit untypical: > > <quote> > By performance, I mean that I used the UNIX time command to see how > long various basic operations took. Performing the various basic > operations gave me some insight into the usability of each as well. > For this test I used a directory with 266 MB of files, 258 KB of which > were text files, with the rest being image files. I know, kind of > weird to version all those binary files, but that was the project I > was interested in testing this out on. Your mileage may vary and all > that. Here’s a table summarizing the real times reported by time(1): > </quote> > > If I remember correctly there were some patches to git which tried to > better deal with large blobs. In this simple benchmark git was > outperformed by Mercurial and even Bazaar-NG a bit. Yes. And we backed them out more recently. :-( A while ago someone had issues with large binary blobs being added to the repository as loose objects (e.g. by git-add/git-update-index). Repacking that repository (for just git-gc or for transport/clone) was ugly as the large binary blob had to be deflated then reinflated to encode it in the packfile. The solution was the core.legacyheaders = false configuration setting, which used packfile encoding for loose objects, thereby allowing the packer to just copy the already compressed data into the output packfile. Unfortunately we backed that out recently to "simplify the code". 
We can still read that loose object format, but we cannot create it and during packing we don't copy the data (we deflate/inflate anyway). So we're back to the horrible deflate/inflate problem. That probably explains the large clone time seen by the author. I wonder if hg realizes that the two repositories are on the same filesystem and automatically uses hardlinks if possible (aka git clone -l). That would easily explain how they can clone so dang fast. Maybe we should do the same in git-clone; it's a pretty simple thing to do. I do have to question the author's timing method. I don't know if this was hot-cache or not, and he doesn't say. I don't know if the system was 100% idle when running these times, or the times were averaged over a few runs. Usually the first run of anything can give inaccurate timings, as for example the executable code may not be paged in from disk. One of the tools may have had a bias as maybe he poked around with that tool first, before starting the timings, so its executables were still hot in cache. Etc. However assuming everything was actually done in a way that the timings can be accurately relied upon... Regarding the initial file import it looks like we about broke even with bzr if you add the "initial file import" and "initial commit" times together. Remember we have to hash and compress the data during git-add; bzr probably delayed their equivalent operation(s) until the commit operation. Summing these two times is probably needed to really compare them. We were also rather close to hg if you again sum the times up. But we do appear to be slower, by about 27s. I guess I find that hard to believe, but sure, maybe hg somehow has a faster codepath for their file revision disk IO than we do. Maybe it's because hg is streaming data and we're loading it all in-core first; maybe the author's system had to swap to get enough virtual memory for git-add. 
Maybe it is just because the author's testing methodology was not very good and one or more of these numbers are just bunk. Our merge time is pretty respectable given the competition. It's probably within the margin of error of the author's testing methodology. -- Shawn. ^ permalink raw reply [flat|nested] 29+ messages in thread
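Shawn's methodological concerns (cold first run, single samples) suggest a simple harness: run the operation once or more as a warm-up, then time several repetitions and report the median. The sketch below is one way to do that under those assumptions. The function name and its parameters are hypothetical, and it does not control for system load or flush caches.

```python
import statistics
import time


def time_runs(run, repeats: int = 5, warmups: int = 1):
    """Call run() `warmups` times untimed, then `repeats` times timed.

    Returns (median_seconds, all_samples). The warm-up runs give the
    executable and data a chance to be paged in before measuring.
    """
    for _ in range(warmups):
        run()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), samples
```

To time a real command, `run` could be something like `lambda: subprocess.run(["git", "status"], check=True, capture_output=True)`, executed inside a test repository. The median is used rather than the mean so that a single slow outlier does not dominate the result.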