git.vger.kernel.org archive mirror
* Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
@ 2007-08-01  0:16 Jakub Narebski
  2007-08-01  2:14 ` Linus Torvalds
  2007-08-01  2:17 ` Shawn O. Pearce
  0 siblings, 2 replies; 29+ messages in thread
From: Jakub Narebski @ 2007-08-01  0:16 UTC (permalink / raw)
  To: git

I have recently added a new Git speed benchmark, from Bryan Murdock's blog. 
The repository is a bit atypical:

<quote>  
  By performance, I mean that I used the UNIX time command to see how
  long various basic operations took. Performing the various basic
  operations gave me some insight into the usability of each as well.
  For this test I used a directory with 266 MB of files, 258 KB of which
  were text files, with the rest being image files. I know, kind of
  weird to version all those binary files, but that was the project I
  was interested in testing this out on. Your mileage may vary and all
  that. Here’s a table summarizing the real times reported by time(1):
</quote>

If I remember correctly there were some patches to git which tried to 
better deal with large blobs. In this simple benchmark git was 
outperformed by Mercurial and even Bazaar-NG a bit.

http://git.or.cz/gitwiki/GitBenchmarks#head-5657b8361895b5a02c0de39337c410e4d8dcdbce
http://bryan-murdock.blogspot.com/2007/03/cutting-edge-revision-control.html
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  0:16 Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial Jakub Narebski
@ 2007-08-01  2:14 ` Linus Torvalds
  2007-08-01  5:50   ` Junio C Hamano
  2007-08-01  8:33   ` Jakub Narebski
  2007-08-01  2:17 ` Shawn O. Pearce
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-01  2:14 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Git Mailing List



On Wed, 1 Aug 2007, Jakub Narebski wrote:
> 
> If I remember correctly there were some patches to git which tried to 
> better deal with large blobs. In this simple benchmark git was 
> outperformed by Mercurial and even Bazaar-NG a bit.

It's almost certainly not the binary blobs.

I think almost all the difference is from the cloning, without repacking 
the source or using a local clone.

The default action for a git clone is to create a pack-file, and do a 
local clone as if you did it over the network. That is obviously much 
slower than using the "-l" flag for the _clone_ action, but it tends to be 
better for the end result - since you get a nice packed starting point, 
and none of the confusion with hardlinks etc.

[ Maybe I'm just a worry-wart, but hardlinking two repos still makes me 
  worried. Even though we never modify the object files. 

  Quite frankly, I almost wish we hadn't ever done "-l" at all, and I 
  cannot really suggest using it. Either use "-s" for the truly shared 
  repository, or use the default pack-generating one. The hardlinking one 
  was simple and made sense, but it's really not very nice.

  But that aversion to "git clone -l" is really totally illogical. The way 
  we do the object handling, hardlinking object files in git is just about 
  the most safe operation you can think of - and I *still* shudder at it ]

Now, I think the "always act as if you were network transparent" by 
default is great, but especially if you have never run "git gc" to 
generate a pack to begin with, it's going to be a very costly thing. And I 
think that's what the numbers show. That's the only op we do a *lot* worse 
on than we should.

(The "nonconflicting merge" is probably - once more - the diffstat 
generation that bites us. That's generally the most costly thing of the 
whole merge, but I *love* the diffstat).

That said, even if he had done a "git gc", to be fair he would have had to 
include the cost of that first garbage collect in the "initial import", so 
the end result would have been exactly the same. Git _does_ end up having 
a very odd performance profile, and while it's optimized for certain 
things, the "initial import" is not one of them.

(Which admittedly is a bit odd. The reason I didn't ever seriously even 
consider monotone was that the initial import was so *incredibly* sucky, 
and took hours for the kernel. So use "-l" for benchmarks, and damn my 
"I hate hardlinking repos" idiocy).

So the only way to truly do a fast initial import *and* get a reasonably 
good initial clone is likely one of:

 - take full advantage of git, and use local branches, instead of 
   bothering with lots of clones.

   I think that this is often the right thing to do, but it's obviously 
   not fair for comparisons, since it's really something different from 
   what's likely available in the other SCM's. But it's the "git way".

 - use "git clone -s" (or "-l").

   I think the hg numbers are the result of hg defaulting to "-l" 
   behaviour.  Which makes sense for hg, since people need to clone more 
   (in git, you'd generally work with local branches instead).

 - or the initial import would be done with some "git fast-import" thing, 
   rather than "git add ." We don't do it now, and the resulting pack-file 
   wouldn't be optimal, but it would be reasonable. It would at least cut 
   down a _bit_ on the clone cost.

The other reaction I took away from that (quite reasonable, I think) 
comparison is that I think Murdock would have been much happier if git 
diff defaulted to "-C". We don't do that (for the best of reasons: 
interoperability), but maybe we should document the "-M/-C" options more. 

The options do show up in the man-page, but apparently not 
obviously enough, since he hadn't noticed.

			Linus


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  0:16 Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial Jakub Narebski
  2007-08-01  2:14 ` Linus Torvalds
@ 2007-08-01  2:17 ` Shawn O. Pearce
  1 sibling, 0 replies; 29+ messages in thread
From: Shawn O. Pearce @ 2007-08-01  2:17 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:
> I have recently added a new Git speed benchmark, from Bryan Murdock's blog. 
> The repository is a bit atypical:
> 
> <quote>  
>   By performance, I mean that I used the UNIX time command to see how
>   long various basic operations took. Performing the various basic
>   operations gave me some insight into the usability of each as well.
>   For this test I used a directory with 266 MB of files, 258 KB of which
>   were text files, with the rest being image files. I know, kind of
>   weird to version all those binary files, but that was the project I
>   was interested in testing this out on. Your mileage may vary and all
>   that. Here’s a table summarizing the real times reported by time(1):
> </quote>
> 
> If I remember correctly there were some patches to git which tried to 
> better deal with large blobs. In this simple benchmark git was 
> outperformed by Mercurial and even Bazaar-NG a bit.

Yes.  And we backed them out more recently.  :-(

A while ago someone had issues with large binary blobs being added to
the repository as loose objects (e.g. by git-add/git-update-index).
Repacking that repository (for just git-gc or for transport/clone)
was ugly as the large binary blob had to be deflated then
reinflated to encode it in the packfile.  The solution was the
core.legacyheaders = false configuration setting, which used
packfile encoding for loose objects, thereby allowing the packer
to just copy the already compressed data into the output packfile.

Unfortunately we backed that out recently to "simplify the code".
We can still read that loose object format, but we cannot create
it and during packing we don't copy the data (we deflate/inflate
anyway).  So we're back to the horrible deflate/inflate problem.
That probably explains the large clone time seen by the author.

I wonder if hg realizes that the two repositories are on the
same filesystem and automatically uses hardlinks if possible (aka
git clone -l).  That would easily explain how they can clone so
dang fast.  Maybe we should do the same in git-clone, it's a pretty
simple thing to do.


I do have to question the author's timing method.  I don't know if
this was hot-cache or not, and he doesn't say.  I don't know if the
system was 100% idle when running these times, or the times were
averaged over a few runs.  Usually the first run of anything can
give inaccurate timings, as for example the executable code may
not be paged in from disk.  One of the tools may have had a bias
as maybe he poked around with that tool first, before starting the
timings, so its executables were still hot in cache.  Etc.
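
A minimal sketch of a timing loop that would expose the cold-cache
effect described above. It assumes a POSIX shell and a date(1) that
supports "+%s"; the helper name time_runs is hypothetical and is not
something the benchmark author used.

```shell
# Hypothetical helper: run a command several times and print the
# wall-clock seconds of each run, so that a cold first run can be told
# apart from the later hot-cache runs.
time_runs() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}
```

For example, "time_runs 3 git status" inside a repository prints three
lines; comparing the first against the rest gives a rough idea of how
much of a single measurement is cache warm-up.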

However assuming everything was actually done in a way that the
timings can be accurately relied upon...

Regarding the initial file import it looks like we about broke even
with bzr if you add the "initial file import" and "initial commit"
times together.  Remember we have to hash and compress the data
during git-add; bzr probably delayed their equivalent operation(s)
until the commit operation.  Summing these two times is probably
needed to really compare them.

We were also rather close to hg if you again sum the times up.
But we do appear to be slower, by about 27s.  I guess I find that
hard to believe, but sure, maybe hg somehow has a faster codepath
for their file revision disk IO than we do.  Maybe it's because hg
is streaming data and we're loading it all in-core first; maybe the
author's system had to swap to get enough virtual memory for git-add.
Maybe it is just because the author's testing methodology was not
very good and one or more of these numbers are just bunk.


Our merge time is pretty respectable given the competition.
It's probably within the margin of error of the author's testing
methodology.
 
-- 
Shawn.


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  2:14 ` Linus Torvalds
@ 2007-08-01  5:50   ` Junio C Hamano
  2007-08-01  8:48     ` David Kastrup
  2007-08-01  9:24     ` Theodore Tso
  2007-08-01  8:33   ` Jakub Narebski
  1 sibling, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2007-08-01  5:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> (Which admittedly is a bit odd. The reason I didn't ever seriously even 
> consider monotone was that the initial import was so *incredibly* sucky, 
> and took hours for the kernel. So use "-l" for benchmarks, and damn my 
> "I hate hardlinking repos" idiocy).

I would call aversion to -l a superstition, while aversion to -s
has sound technical reasons.  The latter means you need to know
what you are doing --- namely, you are making the clone still
dependent on the original.
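
A minimal sketch of how that dependency can be inspected. It assumes a
non-bare repository whose git directory is "$1/.git"; the function
name list_alternates is hypothetical. The objects/info/alternates file
is where a "-s" clone records the object store it borrows from.

```shell
# Hypothetical helper: print the object stores a repository borrows
# from. Prints nothing if the repository has no alternates file, i.e.
# if it does not depend on any other repository's objects.
list_alternates() {
    alt="$1/.git/objects/info/alternates"
    if [ -f "$alt" ]; then
        cat "$alt"
    fi
}
```

Any path this prints must stay intact for the clone to remain usable.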


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  2:14 ` Linus Torvalds
  2007-08-01  5:50   ` Junio C Hamano
@ 2007-08-01  8:33   ` Jakub Narebski
  2007-08-01  8:48     ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Jakub Narebski @ 2007-08-01  8:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds wrote:

> (The "nonconflicting merge" is probably - once more - the diffstat 
> generation that bites us. That's generally the most costly thing of the 
> whole merge, but I *love* the diffstat).

http://bryan-murdock.blogspot.com/2007/03/cutting-edge-revision-control.html
doesn't say what the directory structure of the imported files is.
If it is flat, then git gets no advantage from its hierarchical tree
structure.

By the way, I guess that "nonconflicting merge" is trivial tree-level
merge, as "no changes" merge should be faster (or fast-forward).

About clone: there was a "pack loose, copy existing packs" idea. I don't
remember what happened to it. At least for local clone it would be
nice.

-- 
Jakub Narebski
Poland


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  8:33   ` Jakub Narebski
@ 2007-08-01  8:48     ` Junio C Hamano
  2007-08-01 23:51       ` Jakub Narebski
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2007-08-01  8:48 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Linus Torvalds, Git Mailing List

Jakub Narebski <jnareb@gmail.com> writes:

> About clone: there was "pack loose, copy existing packs" idea.

Can you give more details --- I do not recall such an "idea"
discussed.


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  5:50   ` Junio C Hamano
@ 2007-08-01  8:48     ` David Kastrup
  2007-08-01  9:24     ` Theodore Tso
  1 sibling, 0 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-01  8:48 UTC (permalink / raw)
  To: git


Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> (Which admittedly is a bit odd. The reason I didn't ever seriously even 
>> consider monotone was that the initial import was so *incredibly* sucky, 
>> and took hours for the kernel. So use "-l" for benchmarks, and damn my 
>> "I hate hardlinking repos" idiocy).
>
> I would call aversion to -l a superstition, while aversion to -s
> has sound technical reasons.  The latter means you need to know
> what you are doing --- namely, you are making the clone still
> dependent on the original.

Well, I'd not call the -l aversion a complete superstition: it means
that cloning a repository won't provide any redundancy worth noting
against file system corruption.

-- 
David Kastrup


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  5:50   ` Junio C Hamano
  2007-08-01  8:48     ` David Kastrup
@ 2007-08-01  9:24     ` Theodore Tso
  2007-08-01 10:15       ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Theodore Tso @ 2007-08-01  9:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Jakub Narebski, Git Mailing List

On Tue, Jul 31, 2007 at 10:50:48PM -0700, Junio C Hamano wrote:
> I would call aversion to -l a superstition, while aversion to -s
> has sound technical reasons.  The latter means you need to know
> what you are doing --- namely, you are making the clone still
> dependent on the original.

So would you accept a patch which adds a git-config variable which
specifies whether or not local clones should use hard links by default
(defaulting to yes), and which adds a --no-hard-links option to
git-clone to override the config option?

I could imagine a situation where you are using a git repository
exclusively on a local system, with no remote repositories to act as
backups, and you might want git clone to make full copies to
provide backups in case of filesystem or disk induced corruption.  But
most of the time there are enough copies of the repo on other
machines that making separate copies of the git objects/packs isn't
really needed.

					- Ted


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  9:24     ` Theodore Tso
@ 2007-08-01 10:15       ` Junio C Hamano
  2007-08-01 13:20         ` Alex Riesen
                           ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Junio C Hamano @ 2007-08-01 10:15 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Linus Torvalds, Jakub Narebski, Git Mailing List

Theodore Tso <tytso@mit.edu> writes:

> On Tue, Jul 31, 2007 at 10:50:48PM -0700, Junio C Hamano wrote:
>> I would call aversion to -l a superstition, while aversion to -s
>> has sound technical reasons.  The latter means you need to know
>> what you are doing --- namely, you are making the clone still
>> dependent on the original.
>
> So would you accept a patch which adds a git-config variable which
> specifies whether or not local clones should use hard links by default
> (defaulting to yes), and which adds a --no-hard-links option to
> git-clone to override the config option?

Are you suggesting to make -l the default for local, in other
words?  I personally make local clones rarely enough that I
am not bothered by having to type the extra " -l" on the command line.

But giving a way to force "copy not hardlink" while still
avoiding "the same as the networked case by doing pack transfer"
overhead may be a good thing to do.

Perhaps if the destination is local,

         - if -s is given, just set up alternates, do nothing else;
         - by default, do "always copy never hardlink";
         - with -l, do "hardlink if possible";

Hmmmm...


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 10:15       ` Junio C Hamano
@ 2007-08-01 13:20         ` Alex Riesen
  2007-08-01 13:20           ` Alex Riesen
  2007-08-01 15:49         ` Carl Worth
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Alex Riesen @ 2007-08-01 13:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List

On 8/1/07, Junio C Hamano <gitster@pobox.com> wrote:
> Theodore Tso <tytso@mit.edu> writes:
> > So would you accept a patch which adds a git-config variable which
> > specifies whether or not local clones should use hard links by default
> > (defaulting to yes), and which adds a --no-hard-links option to
> > git-clone to override the config option?
>
> Are you suggesting to make -l the default for local, in other
> words?  I personally make local clones rarely enough that I
> am not bothered by having to type the extra " -l" on the command line.

...as long as the underlying filesystem _supports_ hardlinks.

BTW, we need a warning when falling back to normal copy,
if git-clone -l is used. The user _asked_ for a hard-linked
clone, but silently got something else. Something like this:

diff --git a/git-clone.sh b/git-clone.sh
index 0922554..a744f5b 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -266,6 +266,7 @@ yes,yes)
 	    l=
 	    if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
 	    then
+		    echo >&2 "Hardlinks not supported. Falling back to copy"
 		    l=l
 	    fi &&
 	    rm -f "$GIT_DIR/objects/sample" &&


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 13:20         ` Alex Riesen
@ 2007-08-01 13:20           ` Alex Riesen
  2007-08-01 13:23             ` Alex Riesen
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Riesen @ 2007-08-01 13:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List

On 8/1/07, Alex Riesen <raa.lkml@gmail.com> wrote:
>             if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
>             then
> +                   echo >&2 "Hardlinks not supported. Falling back to copy"
>                     l=l
>             fi &&

Err, the other way around, of course.


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 13:20           ` Alex Riesen
@ 2007-08-01 13:23             ` Alex Riesen
  0 siblings, 0 replies; 29+ messages in thread
From: Alex Riesen @ 2007-08-01 13:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List

On 8/1/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> On 8/1/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> >             if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
> >             then
> > +                   echo >&2 "Hardlinks not supported. Falling back to copy"
> >                     l=l
> >             fi &&
>
> Err, the other way around, of course.
>

diff --git a/git-clone.sh b/git-clone.sh
index 0922554..483b91d 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -264,8 +264,10 @@ yes,yes)
 	    test -f "$repo/$sample_file" || exit

 	    l=
-	    if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
+	    if ! ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
 	    then
+		    echo >&2 "Hardlinks not supported. Falling back to copy"
+	    else
 		    l=l
 	    fi &&
 	    rm -f "$GIT_DIR/objects/sample" &&
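
The probe that the corrected patch relies on can be sketched on its
own, outside git-clone.sh. This is only an illustration under the
assumption of a POSIX shell; the function name can_hardlink and the
probe file name are hypothetical and do not appear in git.

```shell
# Hypothetical helper: return 0 if a hard link to file $1 can be
# created inside directory $2, and 1 otherwise (for example when the
# two are on different filesystems or the filesystem has no hard links).
can_hardlink() {
    probe="$2/.hardlink-probe.$$"
    if ln "$1" "$probe" 2>/dev/null; then
        rm -f "$probe"    # clean up the temporary link
        return 0
    fi
    return 1
}
```

A caller would then print the "Falling back to copy" warning only in
the failure branch, which is the ordering the second patch fixes.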


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 10:15       ` Junio C Hamano
  2007-08-01 13:20         ` Alex Riesen
@ 2007-08-01 15:49         ` Carl Worth
  2007-08-01 17:03           ` Linus Torvalds
  2007-08-01 22:03         ` Theodore Tso
                           ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Carl Worth @ 2007-08-01 15:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List


On Wed, 01 Aug 2007 03:15:25 -0700, Junio C Hamano wrote:
>
> Are you suggesting to make -l the default for local, in other
> words?  I personally make local clones rarely enough that I
> am not bothered by having to type the extra " -l" on the command line.

Personally, I think it would be a great default.

And I think the frequency with which you type this command is not a
good metric for deciding if a command-line option should be required.

Instead, the focus should be on having good defaults for a good user
experience, (for example, the benchmarking that started this thread
that gave a bad first impression of git).

So, just making git-clone go as fast as possible when local, without
requiring any additional options from the user, would be a very good
thing.

As for the concern that new users might do local clones in the hope of
getting some redundancy, hopefully the fact that the operation is
instantaneous will give the user plenty of clues that no redundancy
has been provided. That should be enough to send the user looking for
the documentation to find the --no-hard-links option.

-Carl



* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 15:49         ` Carl Worth
@ 2007-08-01 17:03           ` Linus Torvalds
  2007-08-01 18:17             ` David Kastrup
  2007-08-02  6:09             ` Junio C Hamano
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-01 17:03 UTC (permalink / raw)
  To: Carl Worth; +Cc: Junio C Hamano, Theodore Tso, Jakub Narebski, Git Mailing List



On Wed, 1 Aug 2007, Carl Worth wrote:
>
> On Wed, 01 Aug 2007 03:15:25 -0700, Junio C Hamano wrote:
> >
> > Are you suggesting to make -l the default for local, in other
> > words?  I personally make local clones rarely enough that I
> > am not bothered by having to type the extra " -l" on the command line.
> 
> Personally, I think it would be a great default.

I suspect it probably *would* make sense to default to "-l". Even if it 
makes me get goose-bumps. I freely admit that my worries are totally 
illogical.

We might make it something like: "if you use an url, we don't default to 
local", so the difference would be that

	git clone file:///directory/to/repo

would work the way it does now, but

	git clone /directory/to/repo

would default to "-l" behaviour. That kind of would make sense (and should 
be easy to implement: it would be a trivial fixup to "connect.c").

Something like this adds support for "file://". And then git-clone could 
just do something like

	# if the source is a local directory, default to local
	if [ -d "$src" ]; then
		use_local=yes
	fi

or similar.
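
The rule described above (a URL keeps the current behaviour, a plain
directory defaults to local) might be sketched as a standalone shell
function. The function name clone_mode_for and the output words
"local" and "transport" are hypothetical labels for this illustration,
not anything git-clone prints.

```shell
# Hypothetical helper: decide how a clone source would be treated.
# Anything written with a scheme (file://, git://, ...) goes through
# the normal pack transport; an existing plain directory is "local".
clone_mode_for() {
    case "$1" in
        *://*)
            echo transport
            ;;
        *)
            if [ -d "$1" ]; then
                echo local
            else
                echo transport
            fi
            ;;
    esac
}
```

With this, "clone_mode_for /directory/to/repo" reports "local" when
the directory exists, while "clone_mode_for file:///directory/to/repo"
always reports "transport".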

		Linus

---
 connect.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/connect.c b/connect.c
index 715cdc0..ae49c5a 100644
--- a/connect.c
+++ b/connect.c
@@ -145,6 +145,8 @@ static enum protocol get_protocol(const char *name)
 		return PROTO_SSH;
 	if (!strcmp(name, "ssh+git"))
 		return PROTO_SSH;
+	if (!strcmp(name, "file"))
+		return PROTO_LOCAL;
 	die("I don't handle protocol '%s'", name);
 }
 
@@ -498,13 +500,13 @@ pid_t git_connect(int fd[2], char *url, const char *prog, int flags)
 		end = host;
 
 	path = strchr(end, c);
-	if (c == ':') {
-		if (path) {
+	if (path) {
+		if (c == ':') {
 			protocol = PROTO_SSH;
 			*path++ = '\0';
-		} else
-			path = host;
-	}
+		}
+	} else
+		path = end;
 
 	if (!path || !*path)
 		die("No path specified. See 'man git-pull' for valid url syntax");


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 17:03           ` Linus Torvalds
@ 2007-08-01 18:17             ` David Kastrup
  2007-08-01 20:36               ` Florian Weimer
  2007-08-02  6:09             ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: David Kastrup @ 2007-08-01 18:17 UTC (permalink / raw)
  To: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> I suspect it probably *would* make sense to default to "-l". Even if it 
> makes me get goose-bumps. I freely admit that my worries are totally 
> illogical.
>
> We might make it something like: "if you use an url, we don't default to 
> local",

Couldn't git clone http://host/directory/to/repo tell the proxy that
it should enter off-line mode and stop updating?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 18:17             ` David Kastrup
@ 2007-08-01 20:36               ` Florian Weimer
  0 siblings, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2007-08-01 20:36 UTC (permalink / raw)
  To: git

* David Kastrup:

> Couldn't git clone http://host/directory/to/repo tell the proxy that
> it should enter off-line mode and stop updating?

Huh? I don't see how this is relevant to the current thread.

Anyway, I don't think the max-stale cache control mechanism is widely
implemented.  If you want effective expiry controls, you need to
implement them on the server side.


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 10:15       ` Junio C Hamano
  2007-08-01 13:20         ` Alex Riesen
  2007-08-01 15:49         ` Carl Worth
@ 2007-08-01 22:03         ` Theodore Tso
  2007-08-01 22:49           ` Brandon Casey
  2007-08-02  4:02           ` Allan Wind
  2007-08-01 22:18         ` Jakub Narebski
  2007-08-02 18:08         ` Ramsay Jones
  4 siblings, 2 replies; 29+ messages in thread
From: Theodore Tso @ 2007-08-01 22:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Jakub Narebski, Git Mailing List

On Wed, Aug 01, 2007 at 03:15:25AM -0700, Junio C Hamano wrote:
> > So would you accept a patch which adds a git-config variable which
> > specifies whether or not local clones should use hard links by default
> > (defaulting to yes), and which adds a --no-hard-links option to
> > git-clone to override the config option?
> 
> Are you suggesting to make -l the default for local, in other
> words?  I personally make local clones rarely enough that I
> am not bothered by having to type the extra " -l" on the command line.

Yeah, essentially, with a git-config option (and command-line option)
to override the default for those people who are "squeamish" about git
clone -l.  Linus's suggestion of using file:// as a way to indicate
non-local also makes a lot of sense to me.

> Perhaps if the destination is local,
> 
>          - if -s is given, just set up alternates, do nothing else;

As I understand it, the main objection to making -s the default is the
surprising result that could happen if you do a git-prune in the base
repository, which can remove objects that are borrowed from the base
repository via .git/objects/info/alternates, right?

What if git clone -s appended the repository which is borrowing
objects via alternates to a file located in the base repository,
.git/objects/info/shared-repos?

Then git-prune could also use the refs marked in each of the
downstream repositories that are sharing objects with base repository
and not make those objects go away.  That way, git-gc --prune won't do
anything surprising to any shared repositories, since it will scan
those shared repositories automatically.  Would that be considered
acceptable?  That way you can get instant clones even on filesystems
that don't support hard links, such as Windows systems.

Once we implement support for .git/objects/info/shared-repos, I would
suggest that git-clone do the following by default:

   	* If the source repo is specified via a file:// URL, per Linus's
          suggestion, do the clone via copying.

	* If the source repo is specified via a local pathname, and
          .git/objects/info/shared-repos in the source repository is
          writeable by the user who is doing the clone, then do a
          clone -s.

	* If the above fails, try clone -l

	* If the above fails, do a clone via copying over a new pack
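
The fallback order above could be sketched as a shell function. This
is only an illustration of the proposal: the shared-repos file does
not exist in git, and the function name choose_clone_strategy, the
output labels, and the use of .git/HEAD as the file to probe with are
all assumptions made for the sketch.

```shell
# Hypothetical helper: pick a clone strategy for source $1 and
# destination directory $2, following the proposed order.
choose_clone_strategy() {
    src=$1
    dst=$2
    case "$src" in
        file://*)
            echo pack-copy        # URL form: clone by copying
            return 0
            ;;
    esac
    # Proposed (not existing) registry of repositories sharing objects.
    if [ -w "$src/.git/objects/info/shared-repos" ]; then
        echo shared               # would correspond to "clone -s"
        return 0
    fi
    probe="$dst/.hardlink-probe.$$"
    if ln "$src/.git/HEAD" "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo hardlink             # would correspond to "clone -l"
        return 0
    fi
    echo pack-copy                # last resort: copy over a new pack
}
```

The function only reports a decision; performing the clone itself is
left out of the sketch.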

Would this be acceptable?  Patches to do the above should be
fairly easy to whip up.  Obviously this wouldn't be for 1.5.3.  :-)

       	       	    	 	   	- Ted


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 10:15       ` Junio C Hamano
                           ` (2 preceding siblings ...)
  2007-08-01 22:03         ` Theodore Tso
@ 2007-08-01 22:18         ` Jakub Narebski
  2007-08-02 11:19           ` Jakub Narebski
  2007-08-02 18:08         ` Ramsay Jones
  4 siblings, 1 reply; 29+ messages in thread
From: Jakub Narebski @ 2007-08-01 22:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Theodore Tso, Linus Torvalds, Git Mailing List

Junio C Hamano wrote:

> Perhaps if the destination is local,
> 
>          - if -s is given, just set up alternates, do nothing else;
>          - by default, do "always copy never hardlink";
>          - with -l, do "hardlink if possible";
> 
> Hmmmm...

That, I think, is the best solution, together with support for
file:///path/to/repo.git scheme which would turn on old repacking
behavior. I'm all for it.

-- 
Jakub Narebski
Poland


* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 22:03         ` Theodore Tso
@ 2007-08-01 22:49           ` Brandon Casey
  2007-08-02  4:02           ` Allan Wind
  1 sibling, 0 replies; 29+ messages in thread
From: Brandon Casey @ 2007-08-01 22:49 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Junio C Hamano, Linus Torvalds, Jakub Narebski, Git Mailing List

Theodore Tso wrote:
> On Wed, Aug 01, 2007 at 03:15:25AM -0700, Junio C Hamano wrote:

>> Perhaps if the destination is local,
>>
>>          - if -s is given, just set up alternates, do nothing else;
> 
> As I understand it, the main objection with making -s the default is
> surprising result that could happen if you do a git-prune in the base
> repository which causes objects which are borrowed from the base
> repository via .git/objects/info/alternates, right?

-s would be a lot safer to use if repack -a -d (as used by git-gc) was smarter.
-a -d has the nasty side effect of doing what it seems only prune is intended
to do... that is to remove unreferenced objects.

-s usage currently has to be very well thought out, unless you're just using it
for a short-lived temporary branch. If this unintended pruning could be avoided
then an average user could go about their merry business repacking and git-gc'ing
without a care, and only when doing a git-gc --prune would they need to do
something special.

-brandon

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01  8:48     ` Junio C Hamano
@ 2007-08-01 23:51       ` Jakub Narebski
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Narebski @ 2007-08-01 23:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > About clone: there was "pack loose, copy existing packs" idea.
> 
> Can you give more details --- I do not recall such an "idea"
> discussed.

The idea was to avoid repacking, and just pack loose, unpacked objects 
(and save this pack if possible), then concatenate all packs and send 
this concatenated pack as the result. This saves a bit (quite a bit) of 
CPU at the cost of additional bandwidth usage if packfiles are not 
optimized.

The only result of the discussion was that it would be fairly easy to 
send multiple packs concatenated into one pack, without needing to add 
a multi-pack extension, as only minor changes would be required to 
split "concatenated" packfiles.

-- 
Jakub Narebski
Poland

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 22:03         ` Theodore Tso
  2007-08-01 22:49           ` Brandon Casey
@ 2007-08-02  4:02           ` Allan Wind
  2007-08-02  4:13             ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Allan Wind @ 2007-08-02  4:02 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Junio C Hamano, Linus Torvalds, Jakub Narebski, Git Mailing List

On 2007-08-01T18:03:50-0400, Theodore Tso wrote:
> Yeah, essentially, with a git-config option (and command-line option)
> to override the default for those people who are "squeamish" about git
> clone -l.  Linus's suggestion of using file:// as a way to indicate
> non-local also makes a lot of sense to me.

I would expect /something and file:///something to behave exactly the 
same way (the latter just having a bit of extra syntactic sugar).


/Allan

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-02  4:02           ` Allan Wind
@ 2007-08-02  4:13             ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2007-08-02  4:13 UTC (permalink / raw)
  To: Allan Wind; +Cc: Theodore Tso, Junio C Hamano, Jakub Narebski, Git Mailing List



On Thu, 2 Aug 2007, Allan Wind wrote:
> 
> I would expect /something and file:///something to behave exactly the 
> same way (the latter just having a bit of extra syntactic sugar).

I do agree that they should be basically the same, but from an 
implementation standpoint it actually makes a lot of sense to separate 
them. Also, there's actually a small amount of "logic" in it: the 
/something is obviously a "raw filename", while the "file://something" 
clearly is something a lot more abstract.

I don't actually have a very strong opinion, but I do think that "file://" 
makes sense regardless (ie the patch I sent out is probably a good idea).

I also strongly dispute that "file://something" is _identical_ to just 
"something". There's a huge difference, as anybody who has ever tried to 
do

	cp file://file-A file-B

will have hopefully found out. They may mean the same thing, but they have 
totally different levels of abstraction, so it does actually make some 
sense that you end up *cloning* the same thing, but different ways.
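
To make that distinction concrete, here is a minimal sketch of how a
script could tell the two spellings apart. The helper name
classify_source is made up for illustration and is not taken from
git-clone.sh, and the sketch deliberately ignores scp-style host:path
sources:

```shell
#!/bin/sh
# Hypothetical helper (not from git-clone.sh): classify a clone source
# as a URL or as a raw filesystem path.
classify_source () {
	case "$1" in
	*://*)	echo url ;;	# any scheme://..., including file://
	*)	echo path ;;	# everything else is treated as a raw filename
	esac
}

classify_source /directory/to/repo
classify_source file:///directory/to/repo
```

A caller could then pick the cheap local code path only for the "path"
answer and send everything else through the normal transport.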

		Linus

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 17:03           ` Linus Torvalds
  2007-08-01 18:17             ` David Kastrup
@ 2007-08-02  6:09             ` Junio C Hamano
  2007-08-02 10:29               ` David Kastrup
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2007-08-02  6:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Carl Worth, Theodore Tso, Jakub Narebski, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> We might make it something like: "if you use an url, we don't default to 
> local", so the difference would be that
>
> 	git clone file:///directory/to/repo
>
> would work the way it does now, but
>
> 	git clone /directory/to/repo
>
> would default to "-l" behaviour. That kind of would make sense (and should 
> be easy to implement: it would be a trivial fixup to "connect.c").

The attached patch does not default to "-l", but to a filesystem-level
copy, which is what happens with "clone -l" across
filesystem boundaries with the current code.

Clone of linux-2.6 repository (the source is well packed)

(hardlink -- obviously, almost no cost)
$ /usr/bin/time git clone -l --bare linux-2.6 l-clone.git
Initialized empty Git repository in /git/l-clone.git/
0 blocks
0.55user 1.00system 0:01.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+206724minor)pagefaults 0swaps

(same-as-network)
$ /usr/bin/time git clone --bare file://`pwd`/linux-2.6 n-clone.git
Initialized empty Git repository in /git/n-clone.git/
remote: Generating pack...
remote: Counting objects: 1076746
remote: Done counting 1169654 objects.
remote: Deltifying 1169654 objects...
remote:  100% (1169654/1169654) done
Indexing 1169654 objects...
 100% (1169654/1169654) done
remote: Total 1169654 (delta 959223), reused 1160595 (delta 950164)
Resolving 959223 deltas...
 100% (959223/959223) done
172.85user 20.94system 4:25.88elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6294major+2019874minor)pagefaults 0swaps

(copy -- takes a lot more than hardlink but cheaper than net)
$ /usr/bin/time git clone --bare linux-2.6 c-clone.git
Initialized empty Git repository in /git/c-clone.git/
1266644 blocks
0.92user 10.81system 0:38.38elapsed 30%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (406major+204775minor)pagefaults 0swaps

I am ambivalent between -l vs no -l.

 * Without -l (i.e. with all of objects/ copied via cpio), the clone
   would not catch source repository corruption, and it also risks a
   corrupted recipient repository if an alpha-particle hits a
   memory cell while indexing and resolving deltas.  As long as
   the recipient is made uncorrupted, you have a good back-up.

 * same-as-network is expensive, but it would catch if the
   source is already corrupted.  It still risks corrupted
   recipient repository.  As long as the recipient is made
   uncorrupted, you have a good back-up.

 * With -l, as long as the source repository is healthy, it is
   very likely that the recipient would be, too.  Also it is
   very cheap.  You do not get any back-up benefit.

None of the methods is resilient against source repository
corruption, so let's discount that from the comparison.  Then
the difference between -l and non -l matters primarily if you
value the back-up benefit.  If you want to use the cloned
repository as a back-up, then it is cheaper to do a non -l clone
and two git-fsck (source before clone, recipient after clone)
than same-as-network clone, especially as you are likely to do a
git-fsck on the recipient if you are so paranoid anyway.

Which leads me to believe that being able to use file:/// is
probably a good idea, if only for testability, but probably of
little practical value, and we can default to -l for everyday
use, and paranoids can use non -l as a way to make a back-up.
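
As a rough illustration of the three local behaviours compared above,
the selection made by the patch below could be sketched as a small
function. The name choose_local_mode and its two yes/no arguments are
assumptions for this sketch only; the real script uses the variables
local_shared and use_local_hardlink:

```shell
#!/bin/sh
# Hypothetical sketch of the mode selection.
# $1 = yes if -s was given, $2 = yes if -l was given.
choose_local_mode () {
	if test "$1" = yes
	then
		echo alternates	# -s: only record the source in objects/info/alternates
	elif test "$2" = yes
	then
		echo hardlink	# -l: hardlink objects where the filesystem allows it
	else
		echo copy	# neither flag: plain filesystem copy
	fi
}

choose_local_mode no no
choose_local_mode no yes
choose_local_mode yes no
```

Note that -s wins when both flags are given, which matches the order of
the tests in the patch.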

---

 git-clone.sh               |   61 +++++++++++++++++++++++---------------------
 t/t5500-fetch-pack.sh      |    2 +-
 t/t5700-clone-reference.sh |    2 +-
 t/t5701-clone-local.sh     |   17 ++++++++++++
 4 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/git-clone.sh b/git-clone.sh
index 0922554..0583f64 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -87,7 +87,7 @@ Perhaps git-update-server-info needs to be run there?"
 
 quiet=
 local=no
-use_local=no
+use_local_hardlink=no
 local_shared=no
 unset template
 no_checkout=
@@ -108,9 +108,10 @@ while
 	  no_checkout=yes ;;
 	*,--na|*,--nak|*,--nake|*,--naked|\
 	*,-b|*,--b|*,--ba|*,--bar|*,--bare) bare=yes ;;
-	*,-l|*,--l|*,--lo|*,--loc|*,--loca|*,--local) use_local=yes ;;
+	*,-l|*,--l|*,--lo|*,--loc|*,--loca|*,--local)
+	  use_local_hardlink=yes ;;
         *,-s|*,--s|*,--sh|*,--sha|*,--shar|*,--share|*,--shared)
-          local_shared=yes; use_local=yes ;;
+          local_shared=yes; ;;
 	1,--template) usage ;;
 	*,--template)
 		shift; template="--template=$1" ;;
@@ -249,34 +250,36 @@ fi
 rm -f "$GIT_DIR/CLONE_HEAD"
 
 # We do local magic only when the user tells us to.
-case "$local,$use_local" in
-yes,yes)
+case "$local" in
+yes)
 	( cd "$repo/objects" ) ||
-		die "-l flag seen but repository '$repo' is not local."
+		die "cannot chdir to local '$repo/objects'."
 
-	case "$local_shared" in
-	no)
-	    # See if we can hardlink and drop "l" if not.
-	    sample_file=$(cd "$repo" && \
-			  find objects -type f -print | sed -e 1q)
-
-	    # objects directory should not be empty since we are cloning!
-	    test -f "$repo/$sample_file" || exit
-
-	    l=
-	    if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
-	    then
-		    l=l
-	    fi &&
-	    rm -f "$GIT_DIR/objects/sample" &&
-	    cd "$repo" &&
-	    find objects -depth -print | cpio -pumd$l "$GIT_DIR/" || exit 1
-	    ;;
-	yes)
-	    mkdir -p "$GIT_DIR/objects/info"
-	    echo "$repo/objects" >> "$GIT_DIR/objects/info/alternates"
-	    ;;
-	esac
+	if test "$local_shared" = yes
+	then
+		mkdir -p "$GIT_DIR/objects/info"
+		echo "$repo/objects" >>"$GIT_DIR/objects/info/alternates"
+	else
+		l= &&
+		if test "$use_local_hardlink" = yes
+		then
+			# See if we can hardlink and drop "l" if not.
+			sample_file=$(cd "$repo" && \
+				      find objects -type f -print | sed -e 1q)
+			# objects directory should not be empty because
+			# we are cloning!
+			test -f "$repo/$sample_file" || exit
+			if ln "$repo/$sample_file" "$GIT_DIR/objects/sample" 2>/dev/null
+			then
+				rm -f "$GIT_DIR/objects/sample"
+				l=l
+			else
+				echo >&2 "Warning: -l asked but cannot hardlink to $repo"
+			fi
+		fi &&
+		cd "$repo" &&
+		find objects -depth -print | cpio -pumd$l "$GIT_DIR/" || exit 1
+	fi
 	git-ls-remote "$repo" >"$GIT_DIR/CLONE_HEAD" || exit 1
 	;;
 *)
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 7da5153..7b6798d 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -129,7 +129,7 @@ pull_to_client 2nd "B" $((64*3))
 
 pull_to_client 3rd "A" $((1*3)) # old fails
 
-test_expect_success "clone shallow" "git-clone --depth 2 . shallow"
+test_expect_success "clone shallow" "git-clone --depth 2 file://`pwd`/. shallow"
 
 (cd shallow; git count-objects -v) > count.shallow
 
diff --git a/t/t5700-clone-reference.sh b/t/t5700-clone-reference.sh
index 6d43252..4e93aaa 100755
--- a/t/t5700-clone-reference.sh
+++ b/t/t5700-clone-reference.sh
@@ -51,7 +51,7 @@ diff expected current'
 cd "$base_dir"
 
 test_expect_success 'cloning with reference (no -l -s)' \
-'git clone --reference B A D'
+'git clone --reference B file://`pwd`/A D'
 
 cd "$base_dir"
 
diff --git a/t/t5701-clone-local.sh b/t/t5701-clone-local.sh
index b093327..032c498 100755
--- a/t/t5701-clone-local.sh
+++ b/t/t5701-clone-local.sh
@@ -43,4 +43,21 @@ test_expect_success 'local clone from x.git that does not exist' '
 	fi
 '
 
+test_expect_success 'Without -l, local will make a copy' '
+	cd "$D" &&
+	git clone --bare x w &&
+	cd w &&
+	linked=$(find objects -type f ! -links 1 | wc -l) &&
+	test "$linked" = 0
+'
+
+test_expect_success 'With -l, local will make a hardlink' '
+	cd "$D" &&
+	rm -fr w &&
+	git clone -l --bare x w &&
+	cd w &&
+	copied=$(find objects -type f -links 1 | wc -l) &&
+	test "$copied" = 0
+'
+
 test_done

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-02  6:09             ` Junio C Hamano
@ 2007-08-02 10:29               ` David Kastrup
  2007-08-03  0:51                 ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: David Kastrup @ 2007-08-02 10:29 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster@pobox.com> writes:

>  * With -l, as long as the source repository is healthy, it is
>    very likely that the recipient would be, too.  Also it is
>    very cheap.  You do not get any back-up benefit.

Oh, but one does: an overzealous prune or rm -oopswrongoption in one
repo does not hurt the other.

> Which leads me to believe that being able to use file:/// is
> probably a good idea, if only for testability, but probably of
> little practical value, and we can default to -l for everyday
> use, and paranoids can use non -l as a way to make a back-up.

Sane enough, I guess.

-- 
David Kastrup

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 22:18         ` Jakub Narebski
@ 2007-08-02 11:19           ` Jakub Narebski
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Narebski @ 2007-08-02 11:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Theodore Tso, Linus Torvalds, Git Mailing List

Jakub Narebski wrote:
> Junio C Hamano wrote:
> 
> > Perhaps if the destination is local,
> > 
> >          - if -s is given, just set up alternates, do nothing else;
> >          - by default, do "always copy never hardlink";
> >          - with -l, do "hardlink if possible";
> > 
> > Hmmmm...
> 
> I think that is the best solution, together with support for
> file:///path/to/repo.git scheme which would turn on old repacking
> behavior. I'm all for it.

By the way, with "-l" you have hardlinks only until the next repack
("git gc"), don't you?

-- 
Jakub Narebski
Poland

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-01 10:15       ` Junio C Hamano
                           ` (3 preceding siblings ...)
  2007-08-01 22:18         ` Jakub Narebski
@ 2007-08-02 18:08         ` Ramsay Jones
  4 siblings, 0 replies; 29+ messages in thread
From: Ramsay Jones @ 2007-08-02 18:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Git Mailing List

Junio C Hamano wrote:
> Are you suggesting to make -l the default for local, in other
> words?  I personally do not make local clone often enough that I
> am not disturbed having to type extra " -l" on the command line.
> 
> But giving a way to force "copy not hardlink" while still
> avoiding "the same as the networked case by doing pack transfer"
> overhead may be a good thing to do.
> 
> Perhaps if the destination is local,
> 
>          - if -s is given, just set up alternates, do nothing else;
>          - by default, do "always copy never hardlink";
>          - with -l, do "hardlink if possible";
> 
> Hmmmm...
> 

About six weeks ago, I finally got around to installing Linux (ubuntu 7.04)
on my laptop. Naturally, I cloned my sparse and git repositories over from
the Windows XP partition. Unfortunately, that left me with a sparse repo that
I could not modify; during the clone cpio copied the object directory, with
perhaps a little too much fidelity, which resulted in a .git/objects tree
with 555 permissions (both files and directories). [It also set the file
timestamps with utime(), BTW]. A quick chmod fixed it up without problem,
but still ...
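
For reference, the fix-up amounts to something like the following
sketch. The helper names and the idea of passing the objects directory
as an argument are assumptions made for illustration; they are not part
of git:

```shell
#!/bin/sh
# Hypothetical helpers for a copied object store.
# $1 is the objects directory of the clone.

# List entries the owner cannot write to (mode bit 0200 missing).
list_unwritable () {
	find "$1" ! -perm -200 -print
}

# Give the owner write permission on everything below $1.
make_objects_writable () {
	chmod -R u+w "$1"
}
```

Running list_unwritable before and after make_objects_writable shows
whether anything was left behind.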

When I cloned the git repo, however, I forgot the -l parameter and git-clone
effectively did a "git-fetch-pack --all -k $repo", leaving me with a
working, and fully repacked, repository. Nice.

So, I was about to suggest that when invoked with -l, if the object database
cannot be linked, due to EXDEV for example, it should fall back to the
"fetch-pack" behaviour. Since I don't have access to a large repo, I can't
compare the filesystem-copy time versus the fetch-pack time for a "realistic"
repo, but I suppose the copy would always be faster. Oh Well.
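
The fallback pattern itself is small. The sketch below uses a plain
copy as the fallback, which is an assumption chosen to keep the example
self-contained; swapping in a fetch-pack based transfer would be the
variant described above. The helper name link_or_copy is made up:

```shell
#!/bin/sh
# Hypothetical helper: hardlink $1 to $2, or copy it when linking fails
# (for example with EXDEV across filesystems). Prints the method used.
link_or_copy () {
	if ln "$1" "$2" 2>/dev/null
	then
		echo hardlink
	else
		cp "$1" "$2" && echo copy
	fi
}
```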

Just a data point.

ATB,

Ramsay Jones

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-02 10:29               ` David Kastrup
@ 2007-08-03  0:51                 ` Junio C Hamano
  2007-08-03  6:14                   ` David Kastrup
  2007-08-03  8:20                   ` Johan Herland
  0 siblings, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2007-08-03  0:51 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>>  * With -l, as long as the source repository is healthy, it is
>>    very likely that the recipient would be, too.  Also it is
>>    very cheap.  You do not get any back-up benefit.
>
> Oh, but one does: an overzealous prune or rm -oopswrongoption in one
> repo does not hurt the other.

That's not the "back-up" benefit I was thinking about.  It is more
about protecting your data from hardware failure.  You
physically have bits in two places, preferably on separate disk
drives.

And that is what you do not get from hardlinked clone.

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-03  0:51                 ` Junio C Hamano
@ 2007-08-03  6:14                   ` David Kastrup
  2007-08-03  8:20                   ` Johan Herland
  1 sibling, 0 replies; 29+ messages in thread
From: David Kastrup @ 2007-08-03  6:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>>  * With -l, as long as the source repository is healthy, it is
>>>    very likely that the recipient would be, too.  Also it is
>>>    very cheap.  You do not get any back-up benefit.
>>
>> Oh, but one does: an overzealous prune or rm -oopswrongoption in one
>> repo does not hurt the other.
>
> That's not the "back-up" benefit I was thinking about.  It is more
> about protecting your data from hardware failure.  You
> physically have bits in two places, preferably on separate disk
> drives.
>
> And that is what you do not get from hardlinked clone.

Not at the inode/blob level, but at least the directory manipulations
of one are safe from the other.
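
A small scratch-directory experiment shows both halves of this. The
directory and file names are invented for the demonstration and have
nothing to do with a real object store layout:

```shell
#!/bin/sh
# Self-contained demonstration; all names here are made up.
demo=$(mktemp -d) || exit 1
mkdir "$demo/a" "$demo/b"
echo "object data" >"$demo/a/object"
ln "$demo/a/object" "$demo/b/object"	# second name for the same inode

# Inode level: an in-place overwrite through one name is visible
# through the other name as well.
echo "damaged" >"$demo/a/object"
cat "$demo/b/object"

# Directory level: removing one name does not remove the other.
rm "$demo/a/object"
cat "$demo/b/object"
```

Both cat commands print "damaged": the overwrite reached the shared
inode, while the rm only removed one directory entry.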

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial
  2007-08-03  0:51                 ` Junio C Hamano
  2007-08-03  6:14                   ` David Kastrup
@ 2007-08-03  8:20                   ` Johan Herland
  1 sibling, 0 replies; 29+ messages in thread
From: Johan Herland @ 2007-08-03  8:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, David Kastrup

On Friday 03 August 2007, Junio C Hamano wrote:
> David Kastrup <dak@gnu.org> writes:
> 
> > Junio C Hamano <gitster@pobox.com> writes:
> >
> >>  * With -l, as long as the source repository is healthy, it is
> >>    very likely that the recipient would be, too.  Also it is
> >>    very cheap.  You do not get any back-up benefit.
> >
> > Oh, but one does: an overzealous prune or rm -oopswrongoption in one
> > repo does not hurt the other.
> 
> That's not the "back-up" benefit I was thinking about.  It is more
> about protecting your data from hardware failure.

If one is serious about backing up one's repo to protect it from hardware 
failure, there is not much use at all in cloning (by copy, hardlink, or 
otherwise) to a different location on the _same_ filesystem. In order for a 
backup to be at least marginally useful, it should be on a different disk 
drive (which you hint at below), or, even better, on a different 
continent...

My point is as follows: One has to clone a repo onto (at least) a different 
filesystem if one is serious about backup. But if one is cloning to a 
different filesystem, hardlinking is no longer an option; git _has_ to make 
a copy of some sort. Therefore we might as well hardlink as long as we're 
on a single filesystem (since the extra copy would not be worth much, 
backup-wise).

> You physically have bits in two places, preferably on separate disk
> drives.
> And that is what you do not get from hardlinked clone.

If the two copies are on separate disk drives (i.e. separate filesystems), 
you cannot make a hardlink in the first place. If the two copies are on the 
same filesystem, they're not worth much more than a single copy 
(backup-wise).

Given the clone-to-same-filesystem(-with-hardlink-capability) scenario 
(which is the only scenario where we have the option of using hardlinks), 
we have the following pros and cons when using hardlinks instead of 
copying:

Pros:
- Hardlink is _much_ faster (for big repos, we're talking orders of 
magnitude faster)

Cons:
- Hardlink will not leave two copies on the disk. But I'm arguing that the 
additional copy will have pretty much _no_ value from a redundancy POV, 
since the copy is still left on the _same_ filesystem. Some would even go 
as far as to say that the second copy provides a false sense of security as 
long as it is located on the same filesystem.


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
