Git development


* Re: [PATCH 2/2] cvsimport: cleanup commit function
From: Jeff King @ 2006-05-24  9:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Morten Welinder, Martin Langhoff, Matthias Urlichs, git
In-Reply-To: <7vpsi41f82.fsf@assigned-by-dhcp.cox.net>

On Tue, May 23, 2006 at 04:41:33PM -0700, Junio C Hamano wrote:

> Are you two talking about running git-commit-tree via env being two
> fork-execs instead of just one?  Does that have a measurable
> difference?

Yes, that's what I was talking about. No, probably not a huge
difference. I did some performance measurements of all of the recent
cvsimport changes on a small-ish personal repo (I don't have the gentoo
repo). The results were not significant (<= 1% improvement for each
change).  I would expect some of the changes (index-info, fetchfile) to
have an impact on a repo with different characteristics (like the gentoo
one).

-Peff
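
The extra process launch discussed above can be avoided by setting the environment in the forked child and then doing a single exec, rather than going through `env`. A minimal sketch in C follows; git-cvsimport itself is Perl, and `run_with_env` is a hypothetical helper named here only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with one extra environment variable.
 * The variable is set in the child after fork(), so only one exec is
 * needed and no intermediate "env" process is started.
 * Returns the child's exit status, or -1 on failure.
 */
int run_with_env(const char *name, const char *value, char *const argv[])
{
	pid_t pid;
	int status;

	assert(name != NULL && value != NULL && argv != NULL && argv[0] != NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: adjust the environment, then exec the real command. */
		if (setenv(name, value, 1) != 0)
			_exit(127);
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	/* Parent: wait for the child, retrying if interrupted by a signal. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

As the measurements above suggest, the saving per commit is small; the sketch only shows where the second exec goes away.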


* Re: file name case-sensitivity issues
From: Ben Clifford @ 2006-05-24  9:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vd5e4xkrh.fsf@assigned-by-dhcp.cox.net>

On Tue, 23 May 2006, Junio C Hamano wrote:

> That's interesting.  I wonder how...  Does this sequence remove FOO
> on that filesystem?
> 
> 	$ date >FOO
>         $ rm -f foo
>         $ ls

yes.

$ ls
$ date >FOO
$ ls
FOO
$ rm -f foo
$ ls

> Also if you do the final "git pull" using resolve strategy, does
> it change the result (say "git pull -s resolve . side" instead)?

Different result:

$ mkdir case-sensitivity-test
$ cd case-sensitivity-test
$ git init-db
defaulting to local storage area
$ echo foo > foo
$ echo bar > bar
$ git add foo bar
$ git commit -m initial\ commit
Committing initial tree 89ff1a2aefcbff0f09197f0fd8beeb19a7b6e51c
$ git checkout -b side
$ echo bar-side >> bar
$ git commit -m side\ commit -o bar
$ git checkout master
$ rm foo
$ git update-index --remove foo
$ echo FOO > FOO
$ git add FOO
$ git commit -m case\ change
$ ls
FOO bar
$ git pull -s resolve . side
Trying really trivial in-index merge...
fatal: Merge requires file-level merging
Nope.
Trying simple merge.
Merge 06c11eeb08edefba8178b091287ec6d951d1ef1d, made by resolve.
 bar |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
$ ls
FOO bar
$ 

-- 


* Re: [PATCH] gitk: Replace "git-" commands with "git "
From: Timo Hirvonen @ 2006-05-24 10:34 UTC (permalink / raw)
  To: Alex Riesen; +Cc: paulus, git
In-Reply-To: <81b0412b0605240323q29b64949s80d4738cb54c22c8@mail.gmail.com>

"Alex Riesen" <raa.lkml@gmail.com> wrote:

> On 5/24/06, Timo Hirvonen <tihirvon@gmail.com> wrote:
> > git-* commands work only if gitexecdir is in PATH.
> >
> 
> How about getting the exec-path (git --exec-path) and prepending it
> to every git-<call> instead? You'll save a fork+exec per call in this case.

Many commands are already built-in so I don't think it's a problem
anymore.

-- 
http://onion.dynserv.net/~timo/


* [PATCH] Builtin git-cat-file
From: Timo Hirvonen @ 2006-05-24 11:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>

---

Not huge disc space savings but avoids fork+exec.

95174d93a8fb39b907f4f7359a381b9ad5757e5d
 Makefile                         |    6 +++---
 cat-file.c => builtin-cat-file.c |    3 ++-
 builtin.h                        |    1 +
 git.c                            |    1 +
 4 files changed, 7 insertions(+), 4 deletions(-)
 rename cat-file.c => builtin-cat-file.c (98%)

95174d93a8fb39b907f4f7359a381b9ad5757e5d
diff --git a/Makefile b/Makefile
index 5423b7a..faab3f9 100644
--- a/Makefile
+++ b/Makefile
@@ -149,7 +149,7 @@ SIMPLE_PROGRAMS = \
 
 # ... and all the rest that could be moved out of bindir to gitexecdir
 PROGRAMS = \
-	git-apply$X git-cat-file$X \
+	git-apply$X \
 	git-checkout-index$X git-clone-pack$X git-commit-tree$X \
 	git-convert-objects$X git-diff-files$X \
 	git-diff-index$X git-diff-stages$X \
@@ -171,7 +171,7 @@ PROGRAMS = \
 BUILT_INS = git-log$X git-whatchanged$X git-show$X \
 	git-count-objects$X git-diff$X git-push$X \
 	git-grep$X git-rev-list$X git-check-ref-format$X \
-	git-init-db$X
+	git-init-db$X git-cat-file$X
 
 # what 'all' will build and 'install' will install, in gitexecdir
 ALL_PROGRAMS = $(PROGRAMS) $(SIMPLE_PROGRAMS) $(SCRIPTS)
@@ -220,7 +220,7 @@ LIB_OBJS = \
 BUILTIN_OBJS = \
 	builtin-log.o builtin-help.o builtin-count.o builtin-diff.o builtin-push.o \
 	builtin-grep.o builtin-rev-list.o builtin-check-ref-format.o \
-	builtin-init-db.o
+	builtin-init-db.o builtin-cat-file.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
 LIBS = $(GITLIBS) -lz
diff --git a/cat-file.c b/builtin-cat-file.c
similarity index 98%
rename from cat-file.c
rename to builtin-cat-file.c
index 7413fee..8ab136e 100644
--- a/cat-file.c
+++ b/builtin-cat-file.c
@@ -7,6 +7,7 @@ #include "cache.h"
 #include "exec_cmd.h"
 #include "tag.h"
 #include "tree.h"
+#include "builtin.h"
 
 static void flush_buffer(const char *buf, unsigned long size)
 {
@@ -93,7 +94,7 @@ static int pprint_tag(const unsigned cha
 	return 0;
 }
 
-int main(int argc, char **argv)
+int cmd_cat_file(int argc, const char **argv, char **envp)
 {
 	unsigned char sha1[20];
 	char type[20];
diff --git a/builtin.h b/builtin.h
index 6054126..01f2eec 100644
--- a/builtin.h
+++ b/builtin.h
@@ -27,5 +27,6 @@ extern int cmd_grep(int argc, const char
 extern int cmd_rev_list(int argc, const char **argv, char **envp);
 extern int cmd_check_ref_format(int argc, const char **argv, char **envp);
 extern int cmd_init_db(int argc, const char **argv, char **envp);
+extern int cmd_cat_file(int argc, const char **argv, char **envp);
 
 #endif
diff --git a/git.c b/git.c
index 3216d31..6df0902 100644
--- a/git.c
+++ b/git.c
@@ -52,6 +52,7 @@ static void handle_internal_command(int 
 		{ "grep", cmd_grep },
 		{ "rev-list", cmd_rev_list },
 		{ "init-db", cmd_init_db },
+		{ "cat-file", cmd_cat_file },
 		{ "check-ref-format", cmd_check_ref_format }
 	};
 	int i;
-- 
1.3.3.g40505-dirty
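
The patch above works because git.c looks up the subcommand name in a table of function pointers and calls the match in-process instead of executing a separate git-cat-file binary. A minimal sketch of that kind of dispatch table follows. The `cmd_*` signature is taken from builtin.h in the patch; `find_builtin`, the struct layout, and the stub command bodies are assumptions made for illustration and are not the code in git.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same signature as the cmd_* declarations in builtin.h. */
typedef int (*builtin_fn)(int argc, const char **argv, char **envp);

/* Hypothetical table entry: command name plus its entry point. */
struct cmd_struct {
	const char *cmd;
	builtin_fn fn;
};

/* Stub bodies standing in for the real built-in commands. */
static int cmd_init_db(int argc, const char **argv, char **envp)
{
	(void)argc; (void)argv; (void)envp;
	return 0;
}

static int cmd_cat_file(int argc, const char **argv, char **envp)
{
	(void)argv; (void)envp;
	return argc > 1 ? 0 : 129;	/* pretend: fail when no arguments are given */
}

static const struct cmd_struct commands[] = {
	{ "init-db", cmd_init_db },
	{ "cat-file", cmd_cat_file },
};

/* Return the built-in handler for `name`, or NULL if it is not built in. */
builtin_fn find_builtin(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
		if (!strcmp(commands[i].cmd, name))
			return commands[i].fn;
	}
	return NULL;
}
```

A caller would try `find_builtin()` first and fall back to executing an external `git-<name>` program only when it returns NULL, which is where the fork+exec is saved.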


* Re: Incremental cvsimports
From: Geoff Russell @ 2006-05-24 11:19 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90605240121o117fadb6vf3ce910a3ad3e90@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]

Dear Martin,


On 5/24/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 5/24/06, Geoff Russell <geoffrey.russell@gmail.com> wrote:
> > Dear Git,
>
> Dear Geoff,
>
> if you look at the list archive for the last couple of days, you'll
> see there's been quite a bit of activity in tuning cvsimport so that
> it scales better with large imports like yours. We have been playing
> with a gentoo cvs repo with 300K commits / 1.6GB uncompressed.
>
> Don't split up the tree... that'll lead to something rather awkward.
> Instead, fetch and build git from Junio's 'master' branch which seems
> to have collected most (all?) of the patches posted, including one
> from Linus that will repack the repo every 1K commits -- keeping the
> import size down.

I got the latest git and yes, the size is kept down. I've only tried with
a smaller repository but it looks promising. When I ran git-cvsimport without a
CVS-module name (wanting the entire repository), it gave me a Usage message
indicating that the CVS-module name was optional - but it isn't :)

I did have to change 2 lines in git-cvsimport to get it to run with my
5.8.0 perl (problems with POSIX errno). I've attached a patch, but my
workaround isn't as quick as what it replaced.

Many thanks, I'll have a go with the big repository at work tomorrow!

Cheers,
Geoff Russell

P.S. I've just started to look at git. We have wanted a cvs replacement for
a while but have been too scared to change (until now).



>
> You _will_ need a lot of memory though, as cvsps grows large (working
> on a workaround now) and cvsimport grows a bit over time (where is
> that last leak?!). And a fast machine -- especially fast IO. I've just
> switched from an old test machine to an AMD64 with fast disks, and
> it's importing around 10K commits per hour.

>
> You will probably want to run cvsps by hand, and later use the -P flag.
>
> cheers,
>
>
> martin
>
>

[-- Attachment #2: 999 --]
[-- Type: application/octet-stream, Size: 903 bytes --]

*** git-cvsimport	2006-05-24 20:13:19.000000000 +0930
--- /usr/local/bin/git-cvsimport	2006-05-24 20:22:27.000000000 +0930
*************** use File::Basename qw(basename dirname);
*** 23,29 ****
  use Time::Local;
  use IO::Socket;
  use IO::Pipe;
! use POSIX qw(strftime dup2 :errno_h);
  use IPC::Open2;
  
  $SIG{'PIPE'}="IGNORE";
--- 23,29 ----
  use Time::Local;
  use IO::Socket;
  use IO::Pipe;
! use POSIX qw(strftime dup2);
  use IPC::Open2;
  
  $SIG{'PIPE'}="IGNORE";
*************** sub get_headref ($$) {
*** 446,452 ****
  	    is_sha1($r) or die "Cannot get head id for $name ($r): $!";
  	    return $r;
      }
!     die "unable to open $f: $!" unless $! == POSIX::ENOENT;
      return undef;
  }
  
--- 446,452 ----
  	    is_sha1($r) or die "Cannot get head id for $name ($r): $!";
  	    return $r;
      }
!     die "unable to open $f: $!" if -f $f;
      return undef;
  }
  

* Re: Incremental cvsimports
From: Jeff King @ 2006-05-24 12:22 UTC (permalink / raw)
  To: geoff; +Cc: Martin Langhoff, git
In-Reply-To: <93c3eada0605240419o48891cdle6c100fc0ac870ff@mail.gmail.com>

On Wed, May 24, 2006 at 08:49:03PM +0930, Geoff Russell wrote:

> I did have to change 2 lines in git-cvsimport to get it to run with my
> 5.8.0 perl (problems with POSIX errno). I've attached a patch but my
> workaround isn't as quick as what it replaced.

Can you describe your problem in more detail? The POSIX errno constants
have been available since long before 5.8.0, so we should be able to use
them.

(btw, the change was introduced in my commit() cleanups:
  e73aefe4fdba0d161d9878642c69b40d83a0204c).

-Peff


* Re: Incremental cvsimports
From: Geoff Russell @ 2006-05-24 12:33 UTC (permalink / raw)
  To: geoff, Martin Langhoff, git
In-Reply-To: <20060524122246.GA3997@coredump.intra.peff.net>

Dear Jeff,

See below.

On 5/24/06, Jeff King <peff@peff.net> wrote:
> On Wed, May 24, 2006 at 08:49:03PM +0930, Geoff Russell wrote:
>
> > I did have to change 2 lines in git-cvsimport to get it to run with my
> > 5.8.0 perl (problems with POSIX errno). I've attached a patch but my
> > workaround isn't as quick as what it replaced.
>
> Can you describe your problem in more detail? The POSIX errno constants
> have been available since long before 5.8.0, so we should be able to use
> them.


   $ ./git-cvsimport

   ":errno_h" is not exported by the POSIX module
   Can't continue after import errors at
/usr/lib/perl5/5.8.0/i386-linux-thread-multi/POSIX.pm line 19
    BEGIN failed--compilation aborted at ./git-cvsimport line 26.

When I deleted ":errno_h" I needed to patch the place it was used (as per the
patch I attached in my original post).

Cheers,
Geoff Russell



>
> (btw, the change was introduced in my commit() cleanups:
>   e73aefe4fdba0d161d9878642c69b40d83a0204c).
>
> -Peff


* Slow fetches of tags
From: Ralf Baechle @ 2006-05-24 13:10 UTC (permalink / raw)
  To: git

I have a fairly large git tree (with a 320MB pack file containing some
700,000 objects).  A small fetch like

  git fetch git://www.kernel.org/pub/scm/linux/kernel/git/stable/\
       linux-2.6.16.y.git master:v2.6.16-stable

which only fetches a handful of objects (v2.6.16.17 -> v2.6.16.18) will
take on the order of 4-5 minutes.  Adding the "-n" option will bring
the operation down to under a second, so it really is just the tags
that are slowing things down so much.

  Ralf


* Re: Incremental cvsimports
From: Jeff King @ 2006-05-24 13:23 UTC (permalink / raw)
  To: geoff; +Cc: Martin Langhoff, git
In-Reply-To: <93c3eada0605240533q4d1b5b81p128dc2b905aa9976@mail.gmail.com>

On Wed, May 24, 2006 at 10:03:44PM +0930, Geoff Russell wrote:

>   ":errno_h" is not exported by the POSIX module
>   Can't continue after import errors at
> /usr/lib/perl5/5.8.0/i386-linux-thread-multi/POSIX.pm line 19
>    BEGIN failed--compilation aborted at ./git-cvsimport line 26.

Hmm. It looks like something is nonstandard in your setup. I just compiled
5.8.0 from source and the :errno_h tag works fine. What is your
platform?  Can you try the following and let me know which work:
  $ perl -e 'use POSIX qw(:errno_h)'
  $ perl -e 'use POSIX qw(errno_h)'
  $ perl -e 'use Errno'

-Peff


* Re: [PATCH] Add a test-case for git-apply trying to add an ending line
From: Catalin Marinas @ 2006-05-24 13:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7v8xosqaqm.fsf@assigned-by-dhcp.cox.net>

On 24/05/06, Junio C Hamano <junkio@cox.net> wrote:
> I'd admit that trying to apply a patch without context like the
> above example _is_ insane, and I realize that I am making this
> problem unsolvable by refusing the heuristics to consider that
> the patch is anchored at the end when we do not see any trailing
> context.  But somehow it feels wrong...

The reason I sent you this test is that GNU patch fails to apply the
diff but git-apply succeeds (and I thought git-apply is more
restrictive).

When there are context lines either before or after the "+" line, it
should be OK to assume that the diff has context and therefore the EOF
should be considered.

If there are no context lines at all, the diff is either without
context or it is meant to patch an empty file. The latter is safer and
probably valid for most of the cases but if you have a patch without
context, you could explicitly pass the -C0 option to git-apply.

-- 
Catalin


* Re: Incremental cvsimports
From: Geoff Russell @ 2006-05-24 13:47 UTC (permalink / raw)
  To: Martin Langhoff, git, Jeff King
In-Reply-To: <20060524132317.GA4594@coredump.intra.peff.net>

Hi Jeff,

On 5/24/06, Jeff King <peff@peff.net> wrote:
> On Wed, May 24, 2006 at 10:03:44PM +0930, Geoff Russell wrote:
>
> >   ":errno_h" is not exported by the POSIX module
> >   Can't continue after import errors at
> > /usr/lib/perl5/5.8.0/i386-linux-thread-multi/POSIX.pm line 19
> >    BEGIN failed--compilation aborted at ./git-cvsimport line 26.
>
> Hmm. It looks like something is nonstandard in your setup. I just compiled
> 5.8.0 from source and the :errno_h tag works fine. What is your
> platform?  Can you try the following and let me know which work:

I compiled perl from source on Mandrake 9.1.

>   $ perl -e 'use POSIX qw(:errno_h)'
>   $ perl -e 'use POSIX qw(errno_h)'
>   $ perl -e 'use Errno'

All 3 work.  But if I add a second tag before the ":errno_h", then I
get an error.

The "use" line that makes git-cvsimport compile for me is:

        use POSIX qw(strftime dup2 ENOENT);

Which just imports the required symbol and not the full tag list.

Cheers,
Geoff.

>
> -Peff


* Re: Incremental cvsimports
From: Jeff King @ 2006-05-24 13:58 UTC (permalink / raw)
  To: junkio; +Cc: Martin Langhoff, git, geoff
In-Reply-To: <93c3eada0605240647i48db0588ja343e348f5feb08e@mail.gmail.com>

On Wed, May 24, 2006 at 11:17:32PM +0930, Geoff Russell wrote:

> All 3 work.  But if I add a second tag before the ":errno_h", then I
> get an error.
> 
> The "use" line that makes git-cvsimport compile for me is:
> 
>        use POSIX qw(strftime dup2 ENOENT);

Odd. It's either a bug with importing tags in older versions, or there's
some deep perl voodoo that I don't understand (either way, it is "fixed"
in more recent versions).  Importing ENOENT directly is reasonable.

Junio, can you apply the following fix?

diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index af331d9..76f6246 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -23,7 +23,7 @@ use File::Basename qw(basename dirname);
 use Time::Local;
 use IO::Socket;
 use IO::Pipe;
-use POSIX qw(strftime dup2 :errno_h);
+use POSIX qw(strftime dup2 ENOENT);
 use IPC::Open2;
 
 $SIG{'PIPE'}="IGNORE";


* Re: [osol-bugs] access() behaves strange when used as root
From: Stefan Pfetzing @ 2006-05-24 14:08 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <447460C1.6070305@sun.com>

Hi Alan,

2006/5/24, Alan Coopersmith <Alan.Coopersmith@sun.com>:

> Compilers also fall in the class of things I've never understood why
> people would ever run as root.   Far too complex and completely unnecessary.
> "make all" as a normal user, and then if you absolutely must, "make install"
> as root.  (After running "make -n install" first to see what it will do.)

Yes, that's completely true, but it still leaves the case where you want
to manage some of your config files with git.

bye

Stefan

-- 
       http://www.dreamind.de/
Oroborus and Debian GNU/Linux Developer.


* Re: [osol-bugs] access() behaves strange when used as root
From: Stefan Pfetzing @ 2006-05-24 14:09 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <f3d7535d0605240708k7e55cc3fu8c2e8ad744f738c9@mail.gmail.com>

[snip]

Oops - sorry, wrong recipient... :( *shrug*

bye

Stefan

-- 
       http://www.dreamind.de/
Oroborus and Debian GNU/Linux Developer.


* Re: [PATCH] Add a test-case for git-apply trying to add an ending line
From: Linus Torvalds @ 2006-05-24 14:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Catalin Marinas, git
In-Reply-To: <7v8xosqaqm.fsf@assigned-by-dhcp.cox.net>

On Tue, 23 May 2006, Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > On Tue, 23 May 2006, Junio C Hamano wrote:
> >
> >> The issue is if we can reliably tell if there is such an EOF
> >> context by looking at the diff.  Not having the same number of
> >> lines that starts with ' ' in the hunk is not really a nice way
> >> of doing so (you could make a unified diff that does not have
> >> trailing context at all), and I do not offhand think of a good
> >> way to do so.
> >
> > We can. Something like this should do it.
> >
> > (The same thing could be done for "match_beginning", perhaps).
> 
> But this is exactly what I said I had trouble with in the above.

Well, not quite. You said "not the same number of lines", and I say "no 
ending context". Very different.

My patch actually is totally self-consistent: not having any context at 
the end of a unified diff really means that it is the end of the file (ie, 
the "end of file" there _is_ the context). And if you want to apply files 
without context, you should use "-Cx", and my patch does that too - if you 
asked for "relaxed context checking", it will re-try without the "only at 
end" check thanks to the

	if (match_end) {
		match_end = 0;
		continue;
	}

so it all should work.

Not that I _tested_ it, of course ;)

		Linus
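
The rule described above (no trailing context means the hunk must match at the end of the file, with a retry that drops the restriction when relaxed context checking was requested) can be sketched as follows. Only the `match_end` retry comes from the message; `find_fragment`, `locate_hunk`, and the line-array representation are assumptions made for illustration and are not git-apply's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look for the lines frag[0..nfrag) inside file[0..nfile).
 * When match_end is set, only a match ending exactly at the last
 * line of the file is accepted. Returns the starting line or -1.
 */
static int find_fragment(const char **file, size_t nfile,
			 const char **frag, size_t nfrag, int match_end)
{
	size_t start, i;

	if (nfrag > nfile)
		return -1;
	/* With match_end there is only one candidate position. */
	start = match_end ? nfile - nfrag : 0;
	for (; start + nfrag <= nfile; start++) {
		for (i = 0; i < nfrag; i++) {
			if (strcmp(file[start + i], frag[i]) != 0)
				break;
		}
		if (i == nfrag)
			return (int)start;
	}
	return -1;
}

/*
 * A hunk with no trailing context is treated as anchored at EOF.
 * If that fails and relaxed checking was requested, retry once
 * without the "only at end" restriction.
 */
int locate_hunk(const char **file, size_t nfile,
		const char **frag, size_t nfrag,
		int trailing_context, int relaxed)
{
	int match_end = (trailing_context == 0);

	assert(frag != NULL || nfrag == 0);
	for (;;) {
		int pos = find_fragment(file, nfile, frag, nfrag, match_end);

		if (pos >= 0)
			return pos;
		if (relaxed && match_end) {
			match_end = 0;
			continue;
		}
		return -1;
	}
}
```

For a file with the lines "a", "b", "a" and a one-line hunk "a" without trailing context, the sketch picks the last line rather than the first, which is the behaviour the end-of-file anchoring is meant to give.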


* cg-clone -a
From: Belmar-Letelier @ 2006-05-24 15:12 UTC (permalink / raw)
  To: git

Hello

Any news about a way to get all tags with cogito?

Some kind of

$ cg-clone -a

-- 
Luis Belmar-Letelier


* Clean up sha1 file writing
From: Linus Torvalds @ 2006-05-24 15:30 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


This cleans up and future-proofs the sha1 file writing in sha1_file.c.

In particular, instead of doing a simple "write()" call and just verifying 
that it succeeds (or - as in one place - just assuming it does), it uses 
"write_buffer()" to write data to the file descriptor while correctly 
checking for partial writes, EINTR etc.

It also splits up write_sha1_to_fd() to be a lot more readable: if we need 
to re-create the compressed object, we do so in a separate helper 
function, making the logic a whole lot more modular and obvious.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

This shouldn't change any behaviour, and it's obviously touching some core 
code, so maybe it's not worth it. On the other hand, from a longer-term 
maintenance standpoint and from a "be much more careful when doing file 
writes" standpoint, I think it's worth it.

The re-write is "obviously correct" (famous last words) and is mostly 
just moving code around and getting rid of a few temporaries that become 
unnecessary as a result.

The patch looks a bit messy: the changes aren't actually that big, but the
split-up and the resulting re-indentation make the patch fairly
unreadable, so the cleanups are more obvious when you look at the
before-and-after side by side rather than when looking at the unified
diff.

diff --git a/sha1_file.c b/sha1_file.c
index 2230010..c2fe7c2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1399,6 +1399,25 @@ int move_temp_to_file(const char *tmpfil
 	return 0;
 }
 
+static int write_buffer(int fd, const void *buf, size_t len)
+{
+	while (len) {
+		ssize_t size;
+
+		size = write(fd, buf, len);
+		if (!size)
+			return error("file write: disk full");
+		if (size < 0) {
+			if (errno == EINTR || errno == EAGAIN)
+				continue;
+			return error("file write error (%s)", strerror(errno));
+		}
+		len -= size;
+		buf += size;
+	}
+	return 0;
+}
+
 int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
 {
 	int size;
@@ -1465,8 +1484,8 @@ int write_sha1_file(void *buf, unsigned 
 	deflateEnd(&stream);
 	size = stream.total_out;
 
-	if (write(fd, compressed, size) != size)
-		die("unable to write file");
+	if (write_buffer(fd, compressed, size) < 0)
+		die("unable to write sha1 file");
 	fchmod(fd, 0444);
 	close(fd);
 	free(compressed);
@@ -1474,73 +1493,70 @@ int write_sha1_file(void *buf, unsigned 
 	return move_temp_to_file(tmpfile, filename);
 }
 
-int write_sha1_to_fd(int fd, const unsigned char *sha1)
+/*
+ * We need to unpack and recompress the object for writing
+ * it out to a different file.
+ */
+static void *repack_object(const unsigned char *sha1, unsigned long *objsize)
 {
-	ssize_t size;
-	unsigned long objsize;
-	int posn = 0;
-	void *map = map_sha1_file_internal(sha1, &objsize);
-	void *buf = map;
-	void *temp_obj = NULL;
+	size_t size;
 	z_stream stream;
+	unsigned char *unpacked;
+	unsigned long len;
+	char type[20];
+	char hdr[50];
+	int hdrlen;
+	void *buf;
 
-	if (!buf) {
-		unsigned char *unpacked;
-		unsigned long len;
-		char type[20];
-		char hdr[50];
-		int hdrlen;
-		// need to unpack and recompress it by itself
-		unpacked = read_packed_sha1(sha1, type, &len);
+	// need to unpack and recompress it by itself
+	unpacked = read_packed_sha1(sha1, type, &len);
 
-		hdrlen = sprintf(hdr, "%s %lu", type, len) + 1;
+	hdrlen = sprintf(hdr, "%s %lu", type, len) + 1;
 
-		/* Set it up */
-		memset(&stream, 0, sizeof(stream));
-		deflateInit(&stream, Z_BEST_COMPRESSION);
-		size = deflateBound(&stream, len + hdrlen);
-		temp_obj = buf = xmalloc(size);
+	/* Set it up */
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_BEST_COMPRESSION);
+	size = deflateBound(&stream, len + hdrlen);
+	buf = xmalloc(size);
 
-		/* Compress it */
-		stream.next_out = buf;
-		stream.avail_out = size;
+	/* Compress it */
+	stream.next_out = buf;
+	stream.avail_out = size;
 		
-		/* First header.. */
-		stream.next_in = (void *)hdr;
-		stream.avail_in = hdrlen;
-		while (deflate(&stream, 0) == Z_OK)
-			/* nothing */;
+	/* First header.. */
+	stream.next_in = (void *)hdr;
+	stream.avail_in = hdrlen;
+	while (deflate(&stream, 0) == Z_OK)
+		/* nothing */;
 
-		/* Then the data itself.. */
-		stream.next_in = unpacked;
-		stream.avail_in = len;
-		while (deflate(&stream, Z_FINISH) == Z_OK)
-			/* nothing */;
-		deflateEnd(&stream);
-		free(unpacked);
-		
-		objsize = stream.total_out;
-	}
+	/* Then the data itself.. */
+	stream.next_in = unpacked;
+	stream.avail_in = len;
+	while (deflate(&stream, Z_FINISH) == Z_OK)
+		/* nothing */;
+	deflateEnd(&stream);
+	free(unpacked);
 
-	do {
-		size = write(fd, buf + posn, objsize - posn);
-		if (size <= 0) {
-			if (!size) {
-				fprintf(stderr, "write closed\n");
-			} else {
-				perror("write ");
-			}
-			return -1;
-		}
-		posn += size;
-	} while (posn < objsize);
+	*objsize = stream.total_out;
+	return buf;
+}
 
-	if (map)
-		munmap(map, objsize);
-	if (temp_obj)
-		free(temp_obj);
+int write_sha1_to_fd(int fd, const unsigned char *sha1)
+{
+	int retval;
+	unsigned long objsize;
+	void *buf = map_sha1_file_internal(sha1, &objsize);
 
-	return 0;
+	if (buf) {
+		retval = write_buffer(fd, buf, objsize);
+		munmap(buf, objsize);
+		return retval;
+	}
+
+	buf = repack_object(sha1, &objsize);
+	retval = write_buffer(fd, buf, objsize);    
+	free(buf);
+	return retval;
 }
 
 int write_sha1_from_fd(const unsigned char *sha1, int fd, char *buffer,
@@ -1579,7 +1595,8 @@ int write_sha1_from_fd(const unsigned ch
 				SHA1_Update(&c, discard, sizeof(discard) -
 					    stream.avail_out);
 			} while (stream.avail_in && ret == Z_OK);
-			write(local, buffer, *bufposn - stream.avail_in);
+			if (write_buffer(local, buffer, *bufposn - stream.avail_in) < 0)
+				die("unable to write sha1 file");
 			memmove(buffer, buffer + *bufposn - stream.avail_in,
 				stream.avail_in);
 			*bufposn = stream.avail_in;
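
The pattern behind write_buffer() in the patch above is a loop that keeps calling write() until every byte has been accepted, treating EINTR and EAGAIN as "try again". A standalone sketch of that pattern follows; `write_all` is a hypothetical name, and unlike the patch it reports failure through the return value and errno rather than through git's error() helper. It uses a `const char *` cursor so the pointer arithmetic is standard C.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write exactly len bytes from data to fd.
 * Returns 0 on success and -1 on failure (errno describes the error).
 */
int write_all(int fd, const void *data, size_t len)
{
	const char *p = data;

	assert(data != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			/* Interrupted or temporarily unable to write: retry. */
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return -1;
		}
		if (n == 0) {
			/* No progress; the patch reports this as "disk full". */
			errno = ENOSPC;
			return -1;
		}
		/* Partial write: advance past what was accepted. */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller checks the result the same way the patch does, for example `if (write_all(fd, compressed, size) < 0)` followed by its own error handling.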


* Re: Slow fetches of tags
From: Linus Torvalds @ 2006-05-24 16:45 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: git
In-Reply-To: <20060524131022.GA11449@linux-mips.org>



On Wed, 24 May 2006, Ralf Baechle wrote:
>
> I have a fairly large git tree (with a 320MB pack file containing some
> 700,000 objects).  A small fetch like
> 
>   git fetch git://www.kernel.org/pub/scm/linux/kernel/git/stable/\
>        linux-2.6.16.y.git master:v2.6.16-stable
> 
> which only fetches a handful of objects (v2.6.16.17 -> v2.6.16.18) will
> take on the order of 4-5 minutes.  Adding the "-n" option will bring
> the operation down to under a second, so it really is just the tags
> that are slowing things down so much.

So this is a tree where you already _have_ most of the tags, no?

Can you add a printout to show what the "taglist" is for you in 
git-fetch.sh (just before the thing that does that

	fetch_main "$taglist"

thing?). It _should_ have pruned out all the tags you already have.

Or is it just the "git-ls-remote" that takes forever? (Or, if you run 
"top", is there something that is an obviously heavy operation on the 
client side?)

		Linus


* Re: Incremental cvsimports
From: Junio C Hamano @ 2006-05-24 17:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Langhoff, git, geoff
In-Reply-To: <20060524135828.GA23934@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Odd. It's either a bug with importing tags in older versions, or there's
> some deep perl voodoo that I don't understand (either way, it is "fixed"
> in more recent versions).  Importing ENOENT directly is reasonable.

Sounds good.  Thanks for the back-and-forth helping others in
the community.  I appreciate it.

> Junio, can you apply the following fix?

Will do, but I would have preferred if you did the commit log
message and the stuff properly.  Less work for me ;-).

>
> diff --git a/git-cvsimport.perl b/git-cvsimport.perl
> index af331d9..76f6246 100755
> --- a/git-cvsimport.perl
> +++ b/git-cvsimport.perl
> @@ -23,7 +23,7 @@ use File::Basename qw(basename dirname);
>  use Time::Local;
>  use IO::Socket;
>  use IO::Pipe;
> -use POSIX qw(strftime dup2 :errno_h);
> +use POSIX qw(strftime dup2 ENOENT);
>  use IPC::Open2;
>  
>  $SIG{'PIPE'}="IGNORE";


* Re: Slow fetches of tags
From: Linus Torvalds @ 2006-05-24 17:21 UTC (permalink / raw)
  To: Ralf Baechle, Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605240931480.5623@g5.osdl.org>

On Wed, 24 May 2006, Linus Torvalds wrote:
> 
> Can you add a printout to show what the "taglist" is for you in 
> git-fetch.sh (just before the thing that does that
> 
> 	fetch_main "$taglist"
> 
> thing?). It _should_ have pruned out all the tags you already have.

Actually, looking at that tag-fetching logic, we already know that we have 
the objects that the tags point to (because those are the only kinds that 
we should auto-follow). I wonder if the slowness is because of all the 
have/want commit following, which walks the whole tree to say "I have 
this", when in this case we really should directly say "I have these" for 
the objects that the tags point to.

So the problem may be that we basically send a totally unnecessary list of 
all the objects we have, when the other end really only cares about the 
fact that we have the objects that the tags point to. Which we know we do, 
but we didn't say so, because "git-fetch" didn't really mark them that 
way.

And instead of sending the commits that we know we have, and that we know 
are the interesting ones and that will cut off the tag-object-walk, we 
start from all the local tips, and use the regular "parse commits in date 
order" thing and send "have" lines for everything we see that isn't 
common. Walking a lot of unnecessary crud.

Junio? Any ideas? I didn't want to do that tag-auto-following, and while I 
admit it's damn convenient, it's really quite broken, methinks. 

I almost suspect that we need to have a syntax where-by the local 
fetch-list ends up doing

	"$tagname:$tagname:$sha1wehave"

as the argument to fetch-pack, and then fetch-pack would be modified to 
send those "$sha1wehave" objects early as "have" objects. Ie start from 
something like

	diff --git a/git-fetch.sh b/git-fetch.sh
	index 280f62e..dce3812 100755
	--- a/git-fetch.sh
	+++ b/git-fetch.sh
	@@ -400,7 +400,7 @@ case "$no_tags$tags" in
	 			}
	 			git-cat-file -t "$sha1" >/dev/null 2>&1 || continue
	 			echo >&2 "Auto-following $name"
	-			echo ".${name}:${name}"
	+			echo ".${name}:${name}:${sha1}"
	 		done)
	 	esac
	 	case "$taglist" in

and then pass the info all the way up (the above patch will obviously 
result in a totally broken script, everything downstream from that point 
would have to be taught about the "already have this" part too).

Ralf, which repo is this, so that others (me, if I get the time and 
energy, Junio or some other hapless sucker^W^Whero if I'm lucky) can try 
things out?

		Linus
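
The three-field form floated above, "$tagname:$tagname:$sha1wehave", would need a parser on the receiving side. The sketch below shows one way such a string could be split. The syntax is only a proposal in this thread, and `struct fetch_spec`, `parse_fetch_spec`, and the field sizes are hypothetical; nothing here is fetch-pack's actual interface.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of ".remote:local[:sha1]". */
struct fetch_spec {
	int not_for_merge;	/* leading '.' was present */
	char remote[256];
	char local[256];
	char have[41];		/* 40 hex digits, or "" when absent */
};

/* Copy len bytes into dst as a string; reject empty or oversized fields. */
static int copy_field(char *dst, size_t dstsz, const char *src, size_t len)
{
	if (len == 0 || len >= dstsz)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

/* Returns 0 on success, -1 if the spec is malformed. */
int parse_fetch_spec(const char *spec, struct fetch_spec *out)
{
	const char *c1, *c2;
	size_t i;

	assert(spec != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (*spec == '.') {
		out->not_for_merge = 1;
		spec++;
	}
	c1 = strchr(spec, ':');
	if (!c1)
		return -1;
	if (copy_field(out->remote, sizeof(out->remote), spec, (size_t)(c1 - spec)) < 0)
		return -1;
	c2 = strchr(c1 + 1, ':');
	if (!c2)	/* two-field form: no "already have" object given */
		return copy_field(out->local, sizeof(out->local), c1 + 1, strlen(c1 + 1));
	if (copy_field(out->local, sizeof(out->local), c1 + 1, (size_t)(c2 - (c1 + 1))) < 0)
		return -1;
	/* Third field must be exactly 40 hex digits. */
	if (strlen(c2 + 1) != 40)
		return -1;
	for (i = 0; i < 40; i++) {
		if (!isxdigit((unsigned char)c2[1 + i]))
			return -1;
	}
	memcpy(out->have, c2 + 1, 41);
	return 0;
}
```

With something like this, the `have` field could be sent early as a "have" line, which is the point of the proposal; how that is wired into the negotiation is left open in the message.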


* Re: Slow fetches of tags
From: Ralf Baechle @ 2006-05-24 18:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0605240931480.5623@g5.osdl.org>

On Wed, May 24, 2006 at 09:45:29AM -0700, Linus Torvalds wrote:

> So this is a tree where you already _have_ most of the tags, no?

Yes, git did end up only fetching v2.6.16.18 as the single tag.

> Can you add a printout to show what the "taglist" is for you in 
> git-fetch.sh (just before the thing that does that
> 
> 	fetch_main "$taglist"
> 
> thing?). It _should_ have pruned out all the tags you already have.

Right, it's just "refs/tags/v2.6.16.18:refs/tags/v2.6.16.18".

> Or is it just the "git-ls-remote" that takes forever?

git-ls-remote git://www.kernel.org/pub/scm/linux/kernel/git/stable/\
linux-2.6.16.y takes about 1.5s.

> (Or, if you run 
> "top", is there something that is an obviously heavy operation on the 
> client side?)

git-fetch-pack was burning some 6min CPU.  Nothing else even shows
up on the "top" radar.

Another funny thing I noticed in top is that the git-fetch-pack arguments
got overwritten:

$ cat /proc/1702/cmdline | tr '\0' ' '
git-fetch-pack --thin git //www.kernel.org pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git  efs/heads/master  efs/tags/v2.6.16.18

Guess that doesn't matter.  Anyway, so I ran strace on this git-fetch-pack
invocation:

[...]
munmap(0xb7fe5000, 229)                 = 0
getdents(5, /* 0 entries */, 4096)      = 0
close(5)                                = 0
getdents(4, /* 0 entries */, 4096)      = 0
close(4)                                = 0
write(3, "0046want 9b549d8e1e2f16cffbb414a"..., 70) = 70
write(3, "0000", 4)                     = 4
write(3, "0032have 0bcf7932d0ea742e765a40b"..., 50) = 50
write(3, "0032have 54e938a80873e85f9c02ab4"..., 50) = 50
write(3, "0032have 2d0a9369c540519bab8018e"..., 50) = 50
write(3, "0032have bf3060065ef9f0a8274fc32"..., 50) = 50
write(3, "0032have 27602bd8de8456ac619b77c"..., 50) = 50
[... another 42,000+ similar lines chopped off ...]

9b549d8e1e2f16cffbb414a is Chris Wright's tag for v2.6.16.18.  So far,
as expected.
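
For anyone reading the strace output above: each write is one pkt-line, where the four leading hex digits give the total length of the line including those four digits, and "0000" is a flush packet with no payload. A rough sketch of that framing follows; `pkt_line()` is a hypothetical helper written for this note, not the function git itself uses.

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame a payload as a pkt-line: four hex digits holding the total
 * length (prefix included), followed by the payload itself.
 * Returns the number of bytes in the framed line, or -1 if the
 * payload is too long or does not fit in the output buffer.
 */
int pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t total = strlen(payload) + 4;

	if (total > 0xffff || total + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04zx%s", total, payload);
	return (int)total;
}
```

With a payload of "have ", a 40-digit object name and a newline, the total comes to 4 + 5 + 40 + 1 = 50 bytes, i.e. 0x0032, which matches the 50-byte writes in the trace.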

And this is where things are getting interesting:

$ git-name-rev 0bcf7932d0ea742e765a40b
0bcf7932d0ea742e765a40b master
$ git-name-rev 54e938a80873e85f9c02ab4
54e938a80873e85f9c02ab4 34k-2.6.16.18
$ git-name-rev 2d0a9369c540519bab8018e
2d0a9369c540519bab8018e 34k-2.6.16.18~1
$ git-name-rev bf3060065ef9f0a8274fc32
bf3060065ef9f0a8274fc32 34k-2.6.16.18~2
$ git-name-rev 27602bd8de8456ac619b77c
27602bd8de8456ac619b77c 34k-2.6.16.18~3

It's sending every object back to the start of history ...

  Ralf

^ permalink raw reply

* Re: Slow fetches of tags
From: Junio C Hamano @ 2006-05-24 18:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ralf Baechle, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605240947580.5623@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> So the problem may be that we basically send a totally unnecessary list of 
> all the objects we have, when the other end really only cares about the 
> fact that we have the objects that the tags point to. Which we know we do, 
> but we didn't say so, because "git-fetch" didn't really mark them that 
> way.

I think this speculation is correct.  We should be able to do
better.

> I almost suspect that we need to have a syntax where-by the local 
> fetch-list ends up doing
>
> 	"$tagname:$tagname:$sha1wehave"
>
> as the argument to fetch-pack, and then fetch-pack would be modified to 
> send those "$sha1wehave" objects early as "have" objects.

But this logic has to be a bit more involved.

A "have" object is not just has_sha1_file(), but it needs to be
reachable from one of our tips we have already verified as
complete, so either the caller of fetch-pack does the
verification and gives a verified $sha1wehave, or fetch-pack
takes $sha1weseemtohave, does its own verification and then
sends it as one of the "have" objects (the issue is the same as
the one in my previous message to Eric W. Biederman -- we trust
only refs, not the mere presence of a single object).

It might be useful to have a helper script you can give N object
names and M refs (and/or --all flag to mean "all of the refs"),
which returns the ones that are reachable from the given refs.
It would be even more useful if it were a helper function, but
given that the computation would involve walking the ancestry
chain, I suspect it would have a bad interaction with any user
of such a helper function that wants to do its own ancestry
walking, because many of them seem to assume that objects that
have already been parsed are the ones they parsed for their own
purpose.
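
As a rough illustration of what such a helper would compute, here is a sketch over a toy history where commits are small integers rather than real objects. Everything in it (`struct toy_graph`, `filter_reachable()`, the fixed-size limits) is made up for this example and is not git code. It keeps its "seen" marks in a private array instead of on shared objects, which is one way to sidestep the interaction problem described above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_COMMITS 64
#define MAX_PARENTS 2

/* Toy history: commit i has up to two parents; -1 means "no parent". */
struct toy_graph {
	int nr;
	int parent[MAX_COMMITS][MAX_PARENTS];
};

/*
 * Copy into out[] those candidates that are reachable from any of the
 * given refs, preserving their order. Returns how many were copied.
 */
size_t filter_reachable(const struct toy_graph *g,
			const int *refs, size_t nr_refs,
			const int *cand, size_t nr_cand,
			int *out)
{
	unsigned char seen[MAX_COMMITS];	/* private marks */
	int stack[MAX_COMMITS];
	size_t top = 0, n = 0, i;
	int p;

	if (g->nr < 0 || g->nr > MAX_COMMITS)
		return 0;
	memset(seen, 0, sizeof(seen));

	/* Seed the walk with the ref tips. */
	for (i = 0; i < nr_refs; i++) {
		if (refs[i] < 0 || refs[i] >= g->nr || seen[refs[i]])
			continue;
		seen[refs[i]] = 1;
		stack[top++] = refs[i];
	}

	/* Walk the ancestry chain; each commit is pushed at most once. */
	while (top) {
		int c = stack[--top];

		for (p = 0; p < MAX_PARENTS; p++) {
			int par = g->parent[c][p];

			if (par < 0 || par >= g->nr || seen[par])
				continue;
			seen[par] = 1;
			stack[top++] = par;
		}
	}

	for (i = 0; i < nr_cand; i++)
		if (cand[i] >= 0 && cand[i] < g->nr && seen[cand[i]])
			out[n++] = cand[i];
	return n;
}
```

A real helper would of course have to walk actual commit objects, and the cost of that walk is exactly the part this toy version hides.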

^ permalink raw reply

* Re: Clean up sha1 file writing
From: Matthias Lederhofer @ 2006-05-24 18:14 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605240820560.5623@g5.osdl.org>

> checking for partial writes
Just out of interest: is this to be safe on any OS or should this
be checked always?

> +		size = write(fd, buf, len);
> +		if (!size)
> +			return error("file write: disk full");
Shouldn't write to a full disk return -1 with ENOSPC?

^ permalink raw reply

* Re: Slow fetches of tags
From: Junio C Hamano @ 2006-05-24 18:41 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: git
In-Reply-To: <20060524180813.GA32519@linux-mips.org>

Ralf Baechle <ralf@linux-mips.org> writes:

>> Or is it just the "git-ls-remote" that takes forever?
>
> git-ls-remote git://www.kernel.org/pub/scm/linux/kernel/git/stable/\
> linux-2.6.16.y takes about 1.5s.

Good; that is as expected.  ls-remote over git protocol just
gets the initial "have" lines from the upload-pack and exits,
and there is no handshaking.

> Another funny thing I noticed in top is that the git-fetch-pack arguments
> got overwritten:
>
> $ cat /proc/1702/cmdline | tr '\0' ' '
> git-fetch-pack --thin git //www.kernel.org pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git  efs/heads/master  efs/tags/v2.6.16.18
>
> Guess that doesn't matter.

This is also expected - fetch-pack (connect.c::path_match(), actually)
smudges the list of refs to remember which of the ones the caller
asked for are going to be fulfilled and which are not.  Not the most
beautiful part of the code ;-).

> Guess that doesn't matter.  Anyway, so I ran strace on this git-fetch-pack
> invocation:
>
> [...]
> munmap(0xb7fe5000, 229)                 = 0
> getdents(5, /* 0 entries */, 4096)      = 0
> close(5)                                = 0
> getdents(4, /* 0 entries */, 4096)      = 0
> close(4)                                = 0
> write(3, "0046want 9b549d8e1e2f16cffbb414a"..., 70) = 70
> write(3, "0000", 4)                     = 4
> write(3, "0032have 0bcf7932d0ea742e765a40b"..., 50) = 50
> write(3, "0032have 54e938a80873e85f9c02ab4"..., 50) = 50
> write(3, "0032have 2d0a9369c540519bab8018e"..., 50) = 50
> write(3, "0032have bf3060065ef9f0a8274fc32"..., 50) = 50
> write(3, "0032have 27602bd8de8456ac619b77c"..., 50) = 50
> [... another 42,000+ similar lines chopped off ...]
>
> 9b549d8e1e2f16cffbb414a is Chris Wright's tag for v2.6.16.18.  So far,
> as expected.
>
> And this is where things are getting interesting:
>
> $ git-name-rev 0bcf7932d0ea742e765a40b
> 0bcf7932d0ea742e765a40b master
> $ git-name-rev 54e938a80873e85f9c02ab4
> 54e938a80873e85f9c02ab4 34k-2.6.16.18
> $ git-name-rev 2d0a9369c540519bab8018e
> 2d0a9369c540519bab8018e 34k-2.6.16.18~1
> $ git-name-rev bf3060065ef9f0a8274fc32
> bf3060065ef9f0a8274fc32 34k-2.6.16.18~2
> $ git-name-rev 27602bd8de8456ac619b77c
> 27602bd8de8456ac619b77c 34k-2.6.16.18~3
>
> It's sending every object back to the start of history ...

Is this "master" commit 0bcf79 part of v2.6.16.18 history?  If
not, how diverged are you?  That is, what does this command tell
you?

	git rev-list b7d0617..master | wc -l

Here, b7d0617 is the name of the commit object that the
v2.6.16.18 tag points to.

^ permalink raw reply

* Re: Clean up sha1 file writing
From: Linus Torvalds @ 2006-05-24 18:52 UTC (permalink / raw)
  To: Matthias Lederhofer; +Cc: Git Mailing List
In-Reply-To: <E1Fixs4-0005pD-10@moooo.ath.cx>

On Wed, 24 May 2006, Matthias Lederhofer wrote:

> > checking for partial writes
>
> Just out of interest: is this to be safe on any OS or should this
> be checked always?

Any POSIX-conformant OS/filesystem should always do a full write for a 
regular file, unless a serious error happens.

HOWEVER. 

In practice, you can get partial writes at least over NFS (hey, it may not 
be posix, but it's _common_) when the filesystem has been mounted soft 
(and/or interruptible). And obviously if your file descriptor isn't a 
regular file, you can easily get partial writes.

Doing the loop is always safe, so it's worth doing it that way.

> > +		size = write(fd, buf, len);
> > +		if (!size)
> > +			return error("file write: disk full");
>
> Shouldn't write to a full disk return -1 with ENOSPC?

In that case, the "size < 0" check will catch it. The "return zero for 
full" case is an alternate error return (it happens for block device files 
at the end, it could happen for other things too). So the "returns zero 
means full" is the portable/safe thing to do.
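
For reference, a minimal sketch of such a loop, assuming a plain blocking file descriptor. This is not the code from the patch; `write_fully()` is a hypothetical name, and retrying on EINTR and reporting a zero return as ENOSPC are choices made for this illustration.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd, looping over partial writes.
 * Returns 0 on success, -1 on error with errno set.
 */
int write_fully(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: just retry */
			return -1;		/* real error, e.g. ENOSPC */
		}
		if (n == 0) {
			/* No progress at all: report it as "full". */
			errno = ENOSPC;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

The zero check also keeps the loop from spinning forever if write() keeps returning 0 without making progress.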

		Linus

^ permalink raw reply
