git.vger.kernel.org archive mirror
* Make "git clone" less of a deathly quiet experience
@ 2006-02-11  4:31 Linus Torvalds
  2006-02-11  4:37 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Linus Torvalds @ 2006-02-11  4:31 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano, Petr Baudis


I was on IRC today (which is definitely not normal, but hey, I tried it), 
and somebody was complaining about how horribly slow "git clone" was on 
the WineHQ repository.

The WineHQ git repo is actually fairly big: 120+MB packed, 220+ thousand 
objects. So creating the pack is actually a big operation, and yes, it's 
too slow. We should be better at it, and it would be good if the pack-file 
generation were much faster.

However, it turns out that the "slow" git-pack-objects was only using up 
2.3% of CPU time. The fact is, the primary reason it took a long time is 
that even packed, it had to get 120 MB of data. So in this case, it 
appears that the fact that it uses a lot of CPU is actually a good 
trade-off, because the damn thing would have been even slower if it hadn't 
been packed.

(Of course, pre-generated packs would be good regardless)

Anyway, what _really_ made for a pissed-off user was that "git clone" was 
just very very silent all the time. No updates on what the hell it was 
doing. Was it working at all? Was something broken? Is git just a piece of 
cr*p? But "git clone" would not say a peep about it.

It used to be that "git-unpack-objects" would give nice percentages, but 
now that we don't unpack the initial clone pack any more, it doesn't. And 
I'd love to do that nice percentage view in the pack objects downloader 
too, but the thing doesn't even read the pack header, much less know how 
much it's going to get, so I was lazy and didn't.

Instead, it at least prints out how much data it's gotten, and what the 
packing speed is. Which makes the user realize that it's actually doing 
something useful instead of sitting there silently (and if the recipient 
knows how large the final result is, he can at least make a guess about 
when it might be done).

So with this patch, I get something like this on my DSL line:

	[torvalds@g5 ~]$ time git clone master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 clone-test
	Packing 188543 objects
	  48.398MB  (154 kB/s)

where even the speed approximation seems to be roughly correct (even 
though my algorithm is a truly stupid one, and only really gives "speed in 
the last half second or so").
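
To make the arithmetic concrete, here is a minimal standalone sketch of
the same rate calculation (illustration only, not part of the patch). The
names elapsed_ticks() and rate_kib_per_sec() are made up for this example.
The trick is that with time counted in 1/1024ths of a second, bytes
divided by ticks comes out directly in KiB/s.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Elapsed time between two timestamps in "binary milliseconds"
 * (1/1024th of a second).  Dividing microseconds by 1024 is only an
 * approximation of that unit, which is good enough for a display.
 */
unsigned long elapsed_ticks(const struct timeval *prev,
			    const struct timeval *now)
{
	long sec, usec, ticks;

	assert(prev != NULL && now != NULL);
	sec = (long)(now->tv_sec - prev->tv_sec);
	usec = (long)now->tv_usec - (long)prev->tv_usec; /* may be negative */
	ticks = sec * 1024 + usec / 1024;
	return ticks > 0 ? (unsigned long)ticks : 0;
}

/* Bytes received during `ticks` -> KiB/s; 0 if no time has passed. */
unsigned long rate_kib_per_sec(unsigned long bytes, unsigned long ticks)
{
	return ticks ? bytes / ticks : 0;
}
```

For example, 78848 bytes received over 512 ticks (half a second) gives
rate_kib_per_sec(78848, 512) == 154.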

Anyway, _something_ like this is definitely needed. It could certainly be 
better (if it showed the same kind of thing that git-unpack-objects did, 
that would be much nicer, but would require parsing the object stream as 
it comes in). But this is a big step forward, I think.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

Comments? Hate-mail? Improvements?

diff --git a/cache.h b/cache.h
index bdbe2d6..c255421 100644
--- a/cache.h
+++ b/cache.h
@@ -348,6 +348,6 @@ extern int copy_fd(int ifd, int ofd);
 
 /* Finish off pack transfer receiving end */
 extern int receive_unpack_pack(int fd[2], const char *me, int quiet);
-extern int receive_keep_pack(int fd[2], const char *me);
+extern int receive_keep_pack(int fd[2], const char *me, int quiet);
 
 #endif /* CACHE_H */
diff --git a/clone-pack.c b/clone-pack.c
index f634431..719e1c4 100644
--- a/clone-pack.c
+++ b/clone-pack.c
@@ -6,6 +6,8 @@ static const char clone_pack_usage[] =
 "git-clone-pack [--exec=<git-upload-pack>] [<host>:]<directory> [<heads>]*";
 static const char *exec = "git-upload-pack";
 
+static int quiet = 0;
+
 static void clone_handshake(int fd[2], struct ref *ref)
 {
 	unsigned char sha1[20];
@@ -123,7 +125,9 @@ static int clone_pack(int fd[2], int nr_
 	}
 	clone_handshake(fd, refs);
 
-	status = receive_keep_pack(fd, "git-clone-pack");
+	if (!quiet)
+		fprintf(stderr, "Generating pack ...\r");
+	status = receive_keep_pack(fd, "git-clone-pack", quiet);
 
 	if (!status) {
 		if (nr_match == 0)
@@ -154,8 +158,10 @@ int main(int argc, char **argv)
 		char *arg = argv[i];
 
 		if (*arg == '-') {
-			if (!strcmp("-q", arg))
+			if (!strcmp("-q", arg)) {
+				quiet = 1;
 				continue;
+			}
 			if (!strncmp("--exec=", arg, 7)) {
 				exec = arg + 7;
 				continue;
diff --git a/fetch-clone.c b/fetch-clone.c
index 859f400..b67d976 100644
--- a/fetch-clone.c
+++ b/fetch-clone.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "exec_cmd.h"
 #include <sys/wait.h>
+#include <sys/time.h>
 
 static int finish_pack(const char *pack_tmp_name, const char *me)
 {
@@ -129,10 +130,12 @@ int receive_unpack_pack(int fd[2], const
 	die("git-unpack-objects died of unnatural causes %d", status);
 }
 
-int receive_keep_pack(int fd[2], const char *me)
+int receive_keep_pack(int fd[2], const char *me, int quiet)
 {
 	char tmpfile[PATH_MAX];
 	int ofd, ifd;
+	unsigned long total;
+	static struct timeval prev_tv;
 
 	ifd = fd[0];
 	snprintf(tmpfile, sizeof(tmpfile),
@@ -141,6 +144,8 @@ int receive_keep_pack(int fd[2], const c
 	if (ofd < 0)
 		return error("unable to create temporary file %s", tmpfile);
 
+	gettimeofday(&prev_tv, NULL);
+	total = 0;
 	while (1) {
 		char buf[8192];
 		ssize_t sz, wsz, pos;
@@ -165,6 +170,27 @@ int receive_keep_pack(int fd[2], const c
 			}
 			pos += wsz;
 		}
+		total += sz;
+		if (!quiet) {
+			static unsigned long last;
+			struct timeval tv;
+			unsigned long diff = total - last;
+			/* not really "msecs", but a power-of-two millisec (1/1024th of a sec) */
+			unsigned long msecs;
+
+			gettimeofday(&tv, NULL);
+			msecs = tv.tv_sec - prev_tv.tv_sec;
+			msecs <<= 10;
+			msecs += (int)(tv.tv_usec - prev_tv.tv_usec) >> 10;
+			if (msecs > 500) {
+				prev_tv = tv;
+				last = total;
+				fprintf(stderr, "%4lu.%03luMB  (%lu kB/s)        \r",
+					total >> 20,
+					1000*((total >> 10) & 1023)>>10,
+					diff / msecs );
+			}
+		}
 	}
 	close(ofd);
 	return finish_pack(tmpfile, me);
diff --git a/fetch-pack.c b/fetch-pack.c
index 27f5d2a..aa6f42a 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -378,7 +378,7 @@ static int fetch_pack(int fd[2], int nr_
 		fprintf(stderr, "warning: no common commits\n");
 
 	if (keep_pack)
-		status = receive_keep_pack(fd, "git-fetch-pack");
+		status = receive_keep_pack(fd, "git-fetch-pack", quiet);
 	else
 		status = receive_unpack_pack(fd, "git-fetch-pack", quiet);
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  4:31 Make "git clone" less of a deathly quiet experience Linus Torvalds
@ 2006-02-11  4:37 ` Linus Torvalds
  2006-02-11  5:50   ` Junio C Hamano
  2006-02-11  5:48 ` Junio C Hamano
  2006-02-11 18:39 ` Alex Riesen
  2 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2006-02-11  4:37 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano, Petr Baudis



On Fri, 10 Feb 2006, Linus Torvalds wrote:
> 
> Instead, it at least prints out how much data it's gotten, and what the 
> packing speed is. Which makes the user realize that it's actually doing 
> something useful instead of sitting there silently (and if the recipient 
> knows how large the final result is, he can at least make a guess about 
> when it might be done).

Btw, we should print out the other "stages" too - the checkout in 
particular can be a big part of the overhead, and it would probably make 
sense to tell people about the fact that "hey, now we're checking the 
result out, we're not actually trying to destroy your disk".

Quite often, the way to make users happy is not by being impossibly fast 
or beautiful or otherwise wonderful, but by just _managing_ their 
expectations, so that they don't say "that's some slow crud", but instead 
say "Ok, it's a nice program, and it's doing a lot of hard work for me".

It takes me 15 minutes to clone a kernel repo over the network. Once I can 
see that most of that is getting a 106MB pack-file at 146 kB/s, I say "ok, 
that's fairly reasonable".

			Linus


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  4:31 Make "git clone" less of a deathly quiet experience Linus Torvalds
  2006-02-11  4:37 ` Linus Torvalds
@ 2006-02-11  5:48 ` Junio C Hamano
  2006-02-11  7:35   ` Craig Schlenter
                     ` (2 more replies)
  2006-02-11 18:39 ` Alex Riesen
  2 siblings, 3 replies; 24+ messages in thread
From: Junio C Hamano @ 2006-02-11  5:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Petr Baudis

Linus Torvalds <torvalds@osdl.org> writes:

> Anyway, _something_ like this is definitely needed. It could certainly be 
> better (if it showed the same kind of thing that git-unpack-objects did, 
> that would be much nicer, but would require parsing the object stream as 
> it comes in). But this is a big step forward, I think.
>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> ---
>
> Comments? Hate-mail? Improvements?

It probably should default to quiet if (!isatty(1)).

The real improvement, independent of this client-side patch,
would be to reuse recently generated packs, but that needs
writable cache directory on the server side.  Another thing that
I stumbled upon last time I tried it was that it did not look
totally trivial to modify the csum-file interface so that I can
splice the output from it into two different destinations (one
to cachefile, the other to the consumer).
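
As a rough illustration of the "two destinations" idea, the sketch below
fans one buffer out to a consumer fd and a cache fd. It does not touch the
csum-file interface at all. write_fully() and write_both() are
hypothetical helpers invented for this example, and the policy of quietly
giving up on the cache when a cache write fails is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
int write_fully(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Send the same chunk to the consumer and to a cache file.  A failed
 * cache write does not abort the transfer; it only disables further
 * caching by closing the cache fd and setting *cache_fd to -1.
 */
int write_both(int consumer_fd, int *cache_fd, const void *buf, size_t len)
{
	assert(cache_fd != NULL);
	if (write_fully(consumer_fd, buf, len) < 0)
		return -1;
	if (*cache_fd >= 0 && write_fully(*cache_fd, buf, len) < 0) {
		close(*cache_fd);
		*cache_fd = -1;
	}
	return 0;
}
```

The consumer is written first so that a slow or broken cache can never
hold up the client.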


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  4:37 ` Linus Torvalds
@ 2006-02-11  5:50   ` Junio C Hamano
  2006-02-11 17:39     ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2006-02-11  5:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Petr Baudis

Linus Torvalds <torvalds@osdl.org> writes:

> Btw, we should print out the other "stages" too - the checkout in 
> particular can be a big part of the overhead, and it would probably make 
> sense to tell people about the fact that "hey, now we're checking the 
> result out, we're not actually trying to destroy your disk".

Would you suggest doing that with "checkout-index -v", that
shows "1 path1\r2 path2\r3 path3\r...\rDone.\n"?


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  5:48 ` Junio C Hamano
@ 2006-02-11  7:35   ` Craig Schlenter
  2006-02-11  8:44     ` Radoslaw Szkodzinski
  2006-02-11 13:33   ` Petr Baudis
  2006-02-11 17:45   ` Linus Torvalds
  2 siblings, 1 reply; 24+ messages in thread
From: Craig Schlenter @ 2006-02-11  7:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On 11 Feb 2006, at 7:48 AM, Junio C Hamano wrote:
[snip]
> The real improvement, independent of this client-side patch,
> would be to reuse recently generated packs, but that needs
> writable cache directory on the server side.

Speaking of improvements, I've noticed that my attempts to track
the 2.6 kernel via the git protocol result in inefficiencies from time
to time when the connection hangs or is terminated when my
flakey wireless link goes down. When I restart the pull, the data
that has already been downloaded is lost and things start from
scratch which is painful if it's a big update.

It would be nice if the "partial pack" or whatever that has been
downloaded at the time of the breakage could be re-used and
things could start "from that point onwards" or the bits that were
already received could be unpacked. Comments?

Thank you,

--Craig


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  7:35   ` Craig Schlenter
@ 2006-02-11  8:44     ` Radoslaw Szkodzinski
  2006-02-11 13:05       ` Petr Baudis
  0 siblings, 1 reply; 24+ messages in thread
From: Radoslaw Szkodzinski @ 2006-02-11  8:44 UTC (permalink / raw)
  To: Craig Schlenter; +Cc: Junio C Hamano, Git Mailing List


Craig Schlenter wrote:
> On 11 Feb 2006, at 7:48 AM, Junio C Hamano wrote:
> It would be nice if the "partial pack" or whatever that has been
> downloaded at the time of the breakage could be re-used and
> things could start "from that point onwards" or the bits that were
> already received could be unpacked. Comments?

It even already works on plain http repos with git fetch.
(e.g. WineHQ repository)
Why doesn't the git protocol support it?

+10

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm




* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  8:44     ` Radoslaw Szkodzinski
@ 2006-02-11 13:05       ` Petr Baudis
  2006-02-11 13:15         ` Radoslaw Szkodzinski
  0 siblings, 1 reply; 24+ messages in thread
From: Petr Baudis @ 2006-02-11 13:05 UTC (permalink / raw)
  To: Radoslaw Szkodzinski; +Cc: Craig Schlenter, Junio C Hamano, Git Mailing List

Dear diary, on Sat, Feb 11, 2006 at 09:44:00AM CET, I got a letter
where Radoslaw Szkodzinski <astralstorm@gorzow.mm.pl> said that...
> Craig Schlenter wrote:
> > On 11 Feb 2006, at 7:48 AM, Junio C Hamano wrote:
> > It would be nice if the "partial pack" or whatever that has been
> > downloaded at the time of the breakage could be re-used and
> > things could start "from that point onwards" or the bits that were
> > already received could be unpacked. Comments?
> 
> It even already works on plain http repos with git fetch.
> (e.g. WineHQ repository)
> Why doesn't the git protocol support it?

Because it works totally differently. When downloading from plain HTTP
repos, you are just downloading files from the remote repository and it
is easy to pick up wherever you left off (and last night, I just added a
possibility to Cogito to resume an interrupted cg-clone by just cd'ing
inside and running cg-fetch, as is; it's pretty neat) - you just resume
downloading of the file you downloaded last, and don't download again
the files you already have.

But the native git protocol works completely differently - you tell the
server "give me all objects you have between object X and head", the
server will generate a completely custom pack just for you and send it
over the network. The next time you fetch, you just ask for a pack
between object X and head again, but the head can be already totally
different. What we would have to do is to check for interrupted
packfiles before fetching, attempt to fix them (cutting out the
incomplete objects and broken delta chains, if applicable), and then
tell the remote side to skip those objects; but that may not be easy
because there can be a lot of "loose fibres". Another way would be to
just tell the server "if head is still Y, start sending the pack only
after N bytes". *shudder*

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11 13:05       ` Petr Baudis
@ 2006-02-11 13:15         ` Radoslaw Szkodzinski
  0 siblings, 0 replies; 24+ messages in thread
From: Radoslaw Szkodzinski @ 2006-02-11 13:15 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Craig Schlenter, Junio C Hamano, Git Mailing List


Petr Baudis wrote:
> But the native git protocol works completely differently - you tell the
> server "give me all objects you have between object X and head", the
> server will generate a completely custom pack just for you and send it
> over the network. The next time you fetch, you just ask for a pack
> between object X and head again, but the head can be already totally
> different. What we would have to do is to check for interrupted
> packfiles before fetching, attempt to fix them (cutting out the
> incomplete objects and broken delta chains, if applicable), and then
> tell the remote side to skip those objects; but that may not be easy
> because there can be a lot of "loose fibres". Another way would be to
> just tell the server "if head is still Y, start sending the pack only
> after N bytes". *shudder*
> 

The other way would be:
 - generate pack file between X and Y
 - start sending from N bytes

It could break if the repo has been rebased in the meantime.
But we could safeguard against it by sending the hash of the packfile
up to N bytes.
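
A minimal sketch of that safeguard, seen from the server side, might look
like the following. Everything here is hypothetical: the function names
are invented, FNV-1a stands in for whatever real hash (presumably SHA-1)
would be used, and it assumes the server can regenerate a byte-identical
pack for the same X..Y range, which is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, a stand-in for the hash a real implementation would use. */
uint64_t prefix_hash(const unsigned char *data, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/*
 * The client claims to hold the first `have` bytes of the pack and
 * sends their hash.  Return the offset to resume sending from, or 0
 * if the regenerated pack does not match and we must start over.
 */
size_t resume_offset(const unsigned char *pack, size_t pack_len,
		     size_t have, uint64_t client_hash)
{
	assert(pack != NULL);
	if (have > pack_len)
		return 0;
	return prefix_hash(pack, have) == client_hash ? have : 0;
}
```

If the first N bytes of the regenerated pack hash differently, the server
falls back to sending from offset 0, so a changed pack costs a full
download but never corrupts the result.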

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm




* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  5:48 ` Junio C Hamano
  2006-02-11  7:35   ` Craig Schlenter
@ 2006-02-11 13:33   ` Petr Baudis
  2006-02-11 13:41     ` Petr Baudis
  2006-02-11 17:24     ` Alex Riesen
  2006-02-11 17:45   ` Linus Torvalds
  2 siblings, 2 replies; 24+ messages in thread
From: Petr Baudis @ 2006-02-11 13:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

BTW, some historical (from the very channel beginning) logs of #git for
fun, profit and late night reading are available at
http://pasky.or.cz/~pasky/cp/%23git/, e.g. the 2006-02-10 early morning
features the King Penguin explaining the deepness and intricacies of
pack files construction! Don't miss the opportunity!

New files won't be world-readable by default, but I hope to get some
irclogger with cutesy web interface set up for #git.


Dear diary, on Sat, Feb 11, 2006 at 06:48:55AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > Anyway, _something_ like this is definitely needed. It could certainly be 
> > better (if it showed the same kind of thing that git-unpack-objects did, 
> > that would be much nicer, but would require parsing the object stream as 
> > it comes in). But this is a big step forward, I think.
> >
> > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > ---
> >
> > Comments? Hate-mail? Improvements?
> 
> It probably should default to quiet if (!isatty(1)).

isatty(2) or something, 1 is in practice always a ref generator. Perhaps
it would be better not to clutter stderr, though; what about directly
opening /dev/tty? Does Cygwin support that?

> The real improvement, independent of this client-side patch,
> would be to reuse recently generated packs, but that needs
> writable cache directory on the server side.  Another thing that
> I stumbled upon last time I tried it was that it did not look
> totally trivial to modify the csum-file interface so that I can
> splice the output from it into two different destinations (one
> to cachefile, the other to the consumer).

Yes, I said that on IRC yesterday as well. I don't think even a cache is
needed; just look at the repository and say:

	* while there are packs containing only objects we are going to
	  send, pick the largest one and send it as-is.
	* if there is a pack with more than a 75% (totally arbitrary)
	  overlap with the objects we are going to send, send it as-is.
	* pack the loose objects.
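
The three rules above can be sketched as a selection function. This is one
possible reading, not an existing git interface: struct pack_info and
pick_pack() are invented for the example, and "overlap" is taken to mean
the fraction of a pack's objects that the client wants.

```c
#include <assert.h>
#include <stddef.h>

struct pack_info {
	size_t nr_objects;	/* objects stored in this pack */
	size_t nr_wanted;	/* how many of them the client needs */
	int sent;		/* already streamed to the client */
};

/*
 * Return the index of the next pack to send as-is, or -1 when only
 * the leftover objects remain to be packed on the fly.
 */
int pick_pack(const struct pack_info *packs, size_t nr)
{
	int best = -1;
	size_t i;

	assert(packs != NULL || nr == 0);

	/* Rule 1: largest pack made up entirely of wanted objects. */
	for (i = 0; i < nr; i++) {
		const struct pack_info *p = &packs[i];

		if (p->sent || !p->nr_objects || p->nr_wanted != p->nr_objects)
			continue;
		if (best < 0 || p->nr_objects > packs[best].nr_objects)
			best = (int)i;
	}
	if (best >= 0)
		return best;

	/* Rule 2: any pack with more than 75% wanted objects. */
	for (i = 0; i < nr; i++) {
		const struct pack_info *p = &packs[i];

		if (!p->sent && p->nr_wanted * 4 > p->nr_objects * 3)
			return (int)i;
	}
	return -1;
}
```

A caller would loop until pick_pack() returns -1, marking each chosen pack
as sent and recomputing nr_wanted for the others (an object can live in
more than one pack), and then pack whatever is left.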

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11 13:33   ` Petr Baudis
@ 2006-02-11 13:41     ` Petr Baudis
  2006-02-11 17:24     ` Alex Riesen
  1 sibling, 0 replies; 24+ messages in thread
From: Petr Baudis @ 2006-02-11 13:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

Dear diary, on Sat, Feb 11, 2006 at 02:33:40PM CET, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> BTW, some historical (from the very channel beginning) logs of #git for
> fun, profit and late night reading are available at
> http://pasky.or.cz/~pasky/cp/%23git/, e.g. the 2006-02-10 early morning
> features the King Penguin explaining the deepness and intricacies of
> pack files construction! Don't miss the opportunity!
> 
> New files won't be world-readable by default, but I hope to get some
> irclogger with cutesy web interface set up for #git.

Like,

	http://colabti.de/irclogger/irclogger_logs/git

(Courtesy of Francois Beerten.)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11 13:33   ` Petr Baudis
  2006-02-11 13:41     ` Petr Baudis
@ 2006-02-11 17:24     ` Alex Riesen
  1 sibling, 0 replies; 24+ messages in thread
From: Alex Riesen @ 2006-02-11 17:24 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List

Petr Baudis, Sat, Feb 11, 2006 14:33:40 +0100:
> > It probably should default to quiet if (!isatty(1)).
> 
> isatty(2) or something, 1 is in practice always a ref generator. Perhaps
> it would be better not to clutter stderr, though; what about directly
> opening /dev/tty? Does Cygwin support that?

It can't. Windows has no terminals (as in "none at all"). It has a
Console, which is a special kind of window attached to an application
and where the unbuffered stdout and stderr are magically redirected.

A test for whether stdout/err is a tty can only check if the process has
the console attached, and an attempt to open it for writing will
probably just create the thing.


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  5:50   ` Junio C Hamano
@ 2006-02-11 17:39     ` Linus Torvalds
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2006-02-11 17:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Petr Baudis



On Fri, 10 Feb 2006, Junio C Hamano wrote:
> 
> Would you suggest doing that with "checkout-index -v", that
> shows "1 path1\r2 path2\r3 path3\r...\rDone.\n"?

Not if it shows every single path.

When doing tty output, we should be careful to limit it to not do tons and 
tons of lines. The download output does gettimeofday to limit itself to 
max 2 times per sec, and the percentage output of git-unpack-objects 
similarly limits itself so that it never spews _tons_ of stuff to the 
terminal.

Under many loads, the terminal will be a lot slower than actually writing 
a file ("context switch to gnome-term + context switch to X + set up 
complex text output").
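
One simple way to get that kind of limiting for a checkout is to redraw
only when the integer percentage changes. The sketch below is an
illustration under that assumption, not the code git-unpack-objects
actually uses; struct progress and progress_update() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct progress {
	unsigned long total;	/* number of items to process */
	int last_percent;	/* -1 until the first update */
};

/*
 * Print "NN% (done/total)" only when the integer percentage changes,
 * so a checkout of 20000 paths produces at most 101 terminal updates.
 * Returns 1 if a line was printed, 0 otherwise.
 */
int progress_update(struct progress *p, unsigned long done)
{
	int percent;

	assert(p != NULL && p->total > 0 && done <= p->total);
	percent = (int)(done * 100 / p->total);
	if (percent == p->last_percent)
		return 0;
	p->last_percent = percent;
	fprintf(stderr, "%3d%% (%lu/%lu)\r", percent, done, p->total);
	return 1;
}
```

The cheap integer compare runs for every path, but the expensive write to
the terminal happens at most once per percentage point.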

		Linus


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  5:48 ` Junio C Hamano
  2006-02-11  7:35   ` Craig Schlenter
  2006-02-11 13:33   ` Petr Baudis
@ 2006-02-11 17:45   ` Linus Torvalds
  2006-02-11 19:10     ` Keith Packard
  2 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2006-02-11 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Petr Baudis



On Fri, 10 Feb 2006, Junio C Hamano wrote:
> 
> It probably should default to quiet if (!isatty(1)).

Sounds fine. isatty(2), though, since we use stderr for these messages 
(stdout is usually the data-stream).
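
In code, that default could look roughly like this. progress_is_quiet() is
a hypothetical helper for illustration; the patch itself only has the
"quiet" flag set by -q. isatty() is the standard POSIX call.

```c
#include <assert.h>
#include <unistd.h>

/*
 * explicit_quiet is 1 if the user passed -q.  Otherwise progress is
 * shown only when fd (2, i.e. stderr, for these messages) is a
 * terminal.
 */
int progress_is_quiet(int explicit_quiet, int fd)
{
	assert(fd >= 0);
	return explicit_quiet || !isatty(fd);
}
```

A caller would use something like "quiet = progress_is_quiet(quiet, 2);"
after option parsing, so that redirecting stderr to a file or a pipe
silences the progress output automatically.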

> The real improvement, independent of this client-side patch,
> would be to reuse recently generated packs, but that needs
> writable cache directory on the server side.

More importantly, it really wouldn't have helped that much in this 
situation. At least for me, the network is 90% of the problem, the 
pack-file generation is at most 10%. So cached packfiles really only 
matter for server-side problems (high CPU load, or lack of memory, or 
heavy disk activity).

So the problems really are very independent.

			Linus


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11  4:31 Make "git clone" less of a deathly quiet experience Linus Torvalds
  2006-02-11  4:37 ` Linus Torvalds
  2006-02-11  5:48 ` Junio C Hamano
@ 2006-02-11 18:39 ` Alex Riesen
  2006-02-11 19:04   ` Linus Torvalds
  2 siblings, 1 reply; 24+ messages in thread
From: Alex Riesen @ 2006-02-11 18:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano, Petr Baudis

Linus Torvalds, Sat, Feb 11, 2006 05:31:09 +0100:
> So with this patch, I get something like this on my DSL line:
> 
> 	[torvalds@g5 ~]$ time git clone master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 clone-test
> 	Packing 188543 objects
> 	  48.398MB  (154 kB/s)

I get this:

    $ git clone . ../cloned
    Packing 15440 objects
    $ 2 kB/s)

I'd put a \n before finish_pack to make it nicer.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>

diff --git a/fetch-clone.c b/fetch-clone.c
index b67d976..37141e9 100644
--- a/fetch-clone.c
+++ b/fetch-clone.c
@@ -193,5 +193,7 @@ int receive_keep_pack(int fd[2], const c
 		}
 	}
 	close(ofd);
+	if ( !quiet )
+	    fputc('\n', stderr);
 	return finish_pack(tmpfile, me);
 }


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11 18:39 ` Alex Riesen
@ 2006-02-11 19:04   ` Linus Torvalds
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2006-02-11 19:04 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Git Mailing List, Junio C Hamano, Petr Baudis



On Sat, 11 Feb 2006, Alex Riesen wrote:
> 
> I'd put a \n before finish_pack to make it nicer.

Yes.

Duh. I did all my testing with "time git clone ..", so I had the extra \n 
added by the fact that "time" itself will do it.

Side comment: the pack preparation stage seems to take about 90s for the 
kernel. Of course, that will keep growing with history, but so will 
probably the pack-size, so percentage-wise, the 90% / 10% thing is likely 
to hold for DSL (yes, DSL gets faster too, but so do CPUs ;).

That 90s is unquestionably irritating, though, so we do want to either 
cache them, or add similar "I'm working on it" output to that phase too.

		Linus


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11 17:45   ` Linus Torvalds
@ 2006-02-11 19:10     ` Keith Packard
  2006-02-12  3:43       ` Andreas Ericsson
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Packard @ 2006-02-11 19:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: keithp, Junio C Hamano, Git Mailing List, Petr Baudis


On Sat, 2006-02-11 at 09:45 -0800, Linus Torvalds wrote:

> More importantly, it really wouldn't have helped that much in this 
> situation. At least for me, the network is 90% of the problem, the 
> pack-file generation is at most 10%. So cached packfiles really only 
> matter for server-side problems (high CPU load, or lack of memory, or 
> heavy disk activity).

I'd like to see git use less CPU than CVS does on my distribution host;
some mechanism for re-using either existing or cached packs would help a
whole lot with that. The alternative is to see people switch to rsync
instead, which seems like a far worse idea.   

-- 
keith.packard@intel.com



* Re: Make "git clone" less of a deathly quiet experience
  2006-02-11 19:10     ` Keith Packard
@ 2006-02-12  3:43       ` Andreas Ericsson
  2006-02-12  4:11         ` Keith Packard
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Ericsson @ 2006-02-12  3:43 UTC (permalink / raw)
  To: Keith Packard
  Cc: Linus Torvalds, Junio C Hamano, Git Mailing List, Petr Baudis

Keith Packard wrote:
> On Sat, 2006-02-11 at 09:45 -0800, Linus Torvalds wrote:
> 
> 
>>More importantly, it really wouldn't have helped that much in this 
>>situation. At least for me, the network is 90% of the problem, the 
>>pack-file generation is at most 10%. So cached packfiles really only 
>>matter for server-side problems (high CPU load, or lack of memory, or 
>>heavy disk activity).
> 
> 
> I'd like to see git use less CPU than CVS does on my distribution host;
> some mechanism for re-using either existing or cached packs would help a
> whole lot with that. The alternative is to see people switch to rsync
> instead, which seems like a far worse idea.   
> 

A weird oddity; Cloning is faster over rsync, day-to-day pulling is not.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-12  3:43       ` Andreas Ericsson
@ 2006-02-12  4:11         ` Keith Packard
  2006-02-12 11:02           ` Andreas Ericsson
  2006-02-13  2:06           ` Martin Langhoff
  0 siblings, 2 replies; 24+ messages in thread
From: Keith Packard @ 2006-02-12  4:11 UTC (permalink / raw)
  To: Andreas Ericsson
  Cc: keithp, Linus Torvalds, Junio C Hamano, Git Mailing List,
	Petr Baudis


On Sun, 2006-02-12 at 04:43 +0100, Andreas Ericsson wrote:

> A weird oddity; Cloning is faster over rsync, day-to-day pulling is not.

Precisely. If the protocol could deliver existing packs instead of
unpacking and repacking them, then git would be as fast as rsync and I
wouldn't have to worry about supporting two protocols.

-- 
keith.packard@intel.com



* Re: Make "git clone" less of a deathly quiet experience
  2006-02-12  4:11         ` Keith Packard
@ 2006-02-12 11:02           ` Andreas Ericsson
  2006-02-12 21:04             ` Keith Packard
  2006-02-16  6:56             ` Eric W. Biederman
  2006-02-13  2:06           ` Martin Langhoff
  1 sibling, 2 replies; 24+ messages in thread
From: Andreas Ericsson @ 2006-02-12 11:02 UTC (permalink / raw)
  To: Keith Packard
  Cc: Linus Torvalds, Junio C Hamano, Git Mailing List, Petr Baudis

Keith Packard wrote:
> On Sun, 2006-02-12 at 04:43 +0100, Andreas Ericsson wrote:
> 
> 
>>A weird oddity; Cloning is faster over rsync, day-to-day pulling is not.
> 
> 
> Precisely. If the protocol could deliver existing packs instead of
> unpacking and repacking them, then git would be as fast as rsync and I
> wouldn't have to worry about supporting two protocols.
> 

Caching features have been discussed, but that means the daemon needs to 
have write-access to some directory within the repository. It would also 
work poorly for projects that see very rapid development unless the 
cached pack-files can be amended. A sort of "create packs on demand". 
It shouldn't be too difficult, really.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-12 11:02           ` Andreas Ericsson
@ 2006-02-12 21:04             ` Keith Packard
  2006-02-16  6:56             ` Eric W. Biederman
  1 sibling, 0 replies; 24+ messages in thread
From: Keith Packard @ 2006-02-12 21:04 UTC (permalink / raw)
  To: Andreas Ericsson
  Cc: keithp, Linus Torvalds, Junio C Hamano, Git Mailing List,
	Petr Baudis


On Sun, 2006-02-12 at 12:02 +0100, Andreas Ericsson wrote:

> Caching features have been discussed, but that means the daemon needs to 
> have write-access to some directory within the repository. 

Caching seems a bit dicey to me; security concerns and all. I would much
rather have it discover packs on disk that provided a subset of the
necessary objects; repository cloning would then be a process of
delivering any available packs and then packing up the remaining
objects. Clever administration of the repository could then construct a
single pack of 'historical' data followed by periodic packs of
incremental data.
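
The selection step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `plan_clone`, the pack names, and the idea of representing each pack as a set of object names are hypothetical, and nothing here reflects how git-daemon or upload-pack is actually written.

```python
def plan_clone(needed, packs):
    """Pick existing packs to send whole, plus the leftover objects.

    needed: set of object names the client must end up with.
    packs:  dict mapping a (hypothetical) pack name to the set of
            object names stored in that pack.
    Returns (names of packs to send as-is, objects still to be packed).
    """
    send_whole = []
    covered = set()
    # Try the largest packs first so a big 'historical' pack is picked early;
    # the name is a tie-breaker to keep the result deterministic.
    for name, objects in sorted(packs.items(),
                                key=lambda kv: (-len(kv[1]), kv[0])):
        # A pack qualifies only if it is a subset of what is needed
        # and contributes at least one object not already covered.
        if objects <= needed and not objects <= covered:
            send_whole.append(name)
            covered |= objects
    # Whatever no existing pack provided still has to be packed on the fly.
    remaining = needed - covered
    return send_whole, remaining
```

With one large historical pack and a few small incremental ones, the leftover set stays small, which is the point of the administration scheme described above.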

Yeah, I know, I should just implement this and see how well it works in
practice. I apologize for thinking in public.

-- 
keith.packard@intel.com


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-12  4:11         ` Keith Packard
  2006-02-12 11:02           ` Andreas Ericsson
@ 2006-02-13  2:06           ` Martin Langhoff
  2006-02-13  3:36             ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Martin Langhoff @ 2006-02-13  2:06 UTC (permalink / raw)
  To: Keith Packard
  Cc: Andreas Ericsson, Linus Torvalds, Junio C Hamano,
	Git Mailing List, Petr Baudis

On 2/12/06, Keith Packard <keithp@keithp.com> wrote:
> On Sun, 2006-02-12 at 04:43 +0100, Andreas Ericsson wrote:
>
> > A weird oddity; Cloning is faster over rsync, day-to-day pulling is not.
>
> Precisely. If the protocol could deliver existing packs instead of
> unpacking and repacking them, then git would be as fast as rsync and I
> wouldn't have to worry about supporting two protocols.

+1... there should be an easy-to-compute threshold trigger to say --
hey, let's quit being smart and send this client the packs we got and
get it over with. Or perhaps a client flag so large projects can
recommend that users do their initial clone with --gimme-all-packs?

My workaround for large repos is to clone over http, and s/http:/git:/
on the origin file once it's done ;-)


martin


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-13  2:06           ` Martin Langhoff
@ 2006-02-13  3:36             ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2006-02-13  3:36 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Keith Packard, Andreas Ericsson, Linus Torvalds, Git Mailing List,
	Petr Baudis

Martin Langhoff <martin.langhoff@gmail.com> writes:

> +1... there should be an easy-to-compute threshold trigger to say --
> hey, let's quit being smart and send this client the packs we got and
> get it over with. Or perhaps a client flag so large projects can
> recommend that users do their initial clone with --gimme-all-packs?

What upload-pack does boils down to:

    * find out the latest of what the client has and what the client
      asked for.

    * run "rev-list --objects ^client ours" to make a list of
      objects client needs.  The actual command line has multiple
      "clients" to exclude what is unneeded to be sent, and
      multiple "ours" to include refs asked.  When you are doing
      a full clone, ^client is empty and ours is essentially
      --all.

    * feed that output to "pack-objects --stdout" and send out
      the result.
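
As a toy model of the second step, "rev-list --objects ^client ours" can be thought of as a set difference between two reachability walks. The sketch below is an illustration only: the graph representation and the names `reachable` and `objects_to_send` are hypothetical, and the real rev-list does not walk the excluded side in full the way this does.

```python
def reachable(graph, tips):
    """Return every object reachable from the given tips.

    graph: dict mapping an object name to the names it refers to
           (commit -> parents and tree, tree -> subtrees and blobs).
    """
    seen = set()
    stack = list(tips)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(graph.get(obj, ()))
    return seen


def objects_to_send(graph, ours, client):
    """Objects reachable from 'ours' but not from what the client has."""
    return reachable(graph, ours) - reachable(graph, client)
```

For a full clone the `client` list is empty, so the result is simply everything reachable from the requested refs.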

If you run this command:

	$ git-rev-list --objects --all |
	  git-pack-objects --stdout >/dev/null

It would say some things.  The phases of operations are:

	Generating pack...
	Counting objects XXXX...
	Done counting XXXX objects.
	Packing XXXXX objects.....

Phase (1).  Between the time it says "Generating pack..." up to
"Done counting XXXX objects.", the time is spent by rev-list to
list all the objects to be sent out.

Phase (2). After that, it decides which object to delta against
which other object, while twenty or so dots are
printed after "Packing XXXXX objects." (see #git irc log a
couple of days ago; Linus describes how pack building works).

Phase (3). After the dots stop, the program becomes silent.
That is where it actually does delta compression and writeout.

You would notice that quite a lot of time is spent in all
phases.

There is an internal hook to create full repository pack inside
upload-pack (which is what runs on the other end when you run
fetch-pack or clone-pack), but it works slightly differently
from what you are suggesting, in that it still tries to do the
"correct" thing.  It still runs "rev-list --objects --all", so
"dangling objects" are never sent out.

We could cheat in all phases to speed things up, at the expense
of ending up sending excess objects.  So let's pretend we
decided to treat everything in .git/objects/pack/pack-* (and
the ones found in alternates as well) as having interesting
objects for the cloner.

(1) This part unfortunately cannot be totally eliminated.  By
    assuming all packs are interesting, we could use the object
    names from the pack index, which is a lot cheaper than
    rev-list object traversal.  We still need to run rev-list
    --objects --all --unpacked to pick up the loose objects,
    which we cannot find by looking at the pack index.

    This however needs to be done in conjunction with the second
    phase change.  pack-objects depends on the hint rev-list
    --objects output gives it to group the blobs and trees with
    the same pathnames together, and that greatly affects the
    packing efficiency.  Unfortunately pack index does not have
    that information -- it does not know type, nor pathnames.
    Type is relatively cheap to obtain but pathnames for blob
    objects are inherently unavailable.

(2) This part can be mostly eliminated for already packed
    objects, because we have already decided to cheat by sending
    everything, so we can just reuse how objects are deltified
    in existing packs.  It still needs to be done for loose
    objects we collected to fill the gap in (1).

(3) This also can be sped up by reusing what is already in
    packs.  The pack index records the starting (but not the
    ending) offset of each object in the pack, so we can sort
    by offset to find out which part of the existing pack
    corresponds to which object, to reorder the objects in the
    final pack.  This
    needs to be done somewhat carefully to preserve the locality
    of objects (again, see #git log).  The deltifying and
    compressing for loose objects cannot be avoided.

    While we are writing things out in (3), we need to keep
    track of running SHA1 sum of what we write out so that we
    can fill out the correct checksum at the end, but I am
    guessing that is relatively cheap compared to the
    deltification and compression cost we are currently paying
    in this phase.
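
The two mechanical pieces of (3) can be sketched like this: deriving each object's byte range from start offsets alone, and keeping a running SHA1 while writing so the trailing checksum comes out right. This is a sketch under assumptions, not the pack-objects code: `object_extents` and `write_pack` are hypothetical names, index entries are modelled as plain (name, offset) pairs, and the encoded object entries are treated as opaque bytes. It relies on the pack layout of a 12-byte header ("PACK", version, object count) and a 20-byte SHA1 trailer.

```python
import hashlib
import struct

TRAILER_LEN = 20  # a pack ends with the SHA1 of everything before it


def object_extents(index_entries, pack_size):
    """Turn (name, start_offset) pairs into (name, start, end) ranges.

    The index only knows where each object starts, so an object's end
    is the start of the next object in offset order; the last object
    ends where the trailing checksum begins.
    """
    ordered = sorted(index_entries, key=lambda entry: entry[1])
    data_end = pack_size - TRAILER_LEN
    extents = []
    for i, (name, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else data_end
        extents.append((name, start, end))
    return extents


def write_pack(out, chunks):
    """Write a header, the given pre-encoded entries, and the checksum."""
    sha = hashlib.sha1()

    def emit(data):
        # Every byte written is also fed to the running checksum.
        sha.update(data)
        out.write(data)

    emit(b"PACK" + struct.pack(">II", 2, len(chunks)))  # version 2
    for chunk in chunks:
        emit(chunk)
    out.write(sha.digest())  # trailer is not part of its own checksum
```

Copying a byte range verbatim only works if the entry is still valid at its new position, which this sketch assumes because deltas name their base object rather than pointing at a position in the pack.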

NB. In the #git log, Linus made it sound like I am clueless
about how pack is generated, but if you check commit 9d5ab96,
the "recency of delta is inherited from base", one of the tricks
that have a big performance impact, was done by me ;-).


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-12 11:02           ` Andreas Ericsson
  2006-02-12 21:04             ` Keith Packard
@ 2006-02-16  6:56             ` Eric W. Biederman
  2006-02-16  7:33               ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Eric W. Biederman @ 2006-02-16  6:56 UTC (permalink / raw)
  To: Andreas Ericsson
  Cc: Keith Packard, Linus Torvalds, Junio C Hamano, Git Mailing List,
	Petr Baudis

Andreas Ericsson <ae@op5.se> writes:

> Keith Packard wrote:
>> On Sun, 2006-02-12 at 04:43 +0100, Andreas Ericsson wrote:
>>
>>>A weird oddity; Cloning is faster over rsync, day-to-day pulling is not.
>> Precisely. If the protocol could deliver existing packs instead of
>> unpacking and repacking them, then git would be as fast as rsync and I
>> wouldn't have to worry about supporting two protocols.
>>
>
> Caching features have been discussed, but that means the daemon needs to have
> write-access to some directory within the repository. It would also work poorly
> for projects that see very rapid development unless the cached pack-files can be
> amended, a sort of "create packs on demand". It shouldn't be too difficult,
> really.

Actually for the clone case we don't need a writable directory for the
git-daemon. 

If we assume that a repository up for download is reasonably packed,
we can just lob all of the packs in the current repository, and then
pack the few remaining objects and send them.

I don't know how well multiple packs will work with the current git
protocol but it should be pretty natural, and the clone case is easy
to detect as there are no heads in common.  Can that be detected quickly?

I don't have a patch but it feels like a pretty straightforward thing
to implement.
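
On the question of detecting the clone case quickly, a minimal sketch could look like the following. It assumes the server already has the list of commits the client reported having and a set of its own commit names; `is_initial_clone` and both parameters are hypothetical and do not describe the actual protocol code.

```python
def is_initial_clone(client_haves, our_commits):
    """Return True when the client shares no commit with this repository.

    client_haves: commit names the client reported having
                  (empty for a fresh clone).
    our_commits:  set of commit names present in this repository.
    """
    # One set lookup per reported commit; an empty list answers at once.
    return not any(have in our_commits for have in client_haves)
```

Under these assumptions the check costs nothing for a fresh clone, since the client has nothing to report.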

Eric


* Re: Make "git clone" less of a deathly quiet experience
  2006-02-16  6:56             ` Eric W. Biederman
@ 2006-02-16  7:33               ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2006-02-16  7:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andreas Ericsson, Keith Packard, Linus Torvalds, Git Mailing List,
	Petr Baudis

ebiederm@xmission.com (Eric W. Biederman) writes:

> I don't know how well multiple packs will work with the current git
> protocol...

Then I wonder why you are making this observation ... ;-)

In any case, I suspect this would be helped to a certain degree
by a pack-objects that reuses delta data from existing packs,
if your repository is reasonably packed.

Thread overview: 24+ messages
2006-02-11  4:31 Make "git clone" less of a deathly quiet experience Linus Torvalds
2006-02-11  4:37 ` Linus Torvalds
2006-02-11  5:50   ` Junio C Hamano
2006-02-11 17:39     ` Linus Torvalds
2006-02-11  5:48 ` Junio C Hamano
2006-02-11  7:35   ` Craig Schlenter
2006-02-11  8:44     ` Radoslaw Szkodzinski
2006-02-11 13:05       ` Petr Baudis
2006-02-11 13:15         ` Radoslaw Szkodzinski
2006-02-11 13:33   ` Petr Baudis
2006-02-11 13:41     ` Petr Baudis
2006-02-11 17:24     ` Alex Riesen
2006-02-11 17:45   ` Linus Torvalds
2006-02-11 19:10     ` Keith Packard
2006-02-12  3:43       ` Andreas Ericsson
2006-02-12  4:11         ` Keith Packard
2006-02-12 11:02           ` Andreas Ericsson
2006-02-12 21:04             ` Keith Packard
2006-02-16  6:56             ` Eric W. Biederman
2006-02-16  7:33               ` Junio C Hamano
2006-02-13  2:06           ` Martin Langhoff
2006-02-13  3:36             ` Junio C Hamano
2006-02-11 18:39 ` Alex Riesen
2006-02-11 19:04   ` Linus Torvalds
