Git development


* Re: irc usage..
From: Matthias Urlichs @ 2006-05-22 21:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221256090.3697@g5.osdl.org>

Hi,

Linus Torvalds:
> I wonder why those "git-update-index" calls seem to be (assuming I read 
> the perl correctly) done only a few files at a time. We can do hundreds 
> in one go, but it seems to want to do just ten files or something at the 
> same time.

No, fifty.

I simply was too lazy to count the actual filenames' lengths. ;-)

> That thing would probably be an order of magnitude faster if written to 
> use the git library interfaces directly. Of course, the CVS part is 
> probably a big overhead, so it might not help much 

The beast *was* mainly written to do this remotely...

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
The worst form of inequality is to try to make unequal things equal.
					-- Aristotle

^ permalink raw reply

* Re: Current Issues #3
From: Carl Worth @ 2006-05-22 22:02 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0605221738090.6713@iabervon.org>

On Mon, 22 May 2006 17:54:28 -0400 (EDT), Daniel Barkalow wrote:
> On Mon, 22 May 2006, Junio C Hamano wrote:
> 
> > * reflog

Am I the only one that read that as re-flog rather than ref-log the
first time?

-Carl

^ permalink raw reply

* Re: Local clone/fetch with cogito is glacial
From: Petr Baudis @ 2006-05-22 22:02 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Sean, git
In-Reply-To: <44722A8F.9020609@zytor.com>

Dear diary, on Mon, May 22, 2006 at 11:18:07PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> said that...
> Sean wrote:
> >On Sun, 21 May 2006 16:47:45 -0700
> >"H. Peter Anvin" <hpa@zytor.com> wrote:
> >
> >>It appears that doing a *local* -- meaning using a file path or file URL 
> >>-- clone or fetch with cogito is just glacial when the repository has an 
> >>even moderate number of tags (and it's fetching the tags that takes all 
> >>the time.)  That's a really serious problem for me.
> >>
> >
> >Peter, does git clone work acceptably for you?
> >
> 
> Well, it does, except it doesn't set up the cogito branches (which one can 
> of course copy manually.)

What about incremental fetches using git-fetch? From a quick scan of the
git-fetch automagic tags following code, it seems to be even
significantly more expensive than Cogito's (in terms of number of
forks).

git-clone has an advantage here since it clones _everything_ while
Cogito fetches only stuff related to the branch you are cloning, and
verifying if what it fetches is sensible for you unfortunately takes a
lot of time. :/ I guess there is no way to verify presence of multiple
objects at once and there is also no way to order local fetch of
multiple objects at once.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* Re: Current Issues #3
From: Daniel Barkalow @ 2006-05-22 21:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v8xoue9eo.fsf@assigned-by-dhcp.cox.net>

On Mon, 22 May 2006, Junio C Hamano wrote:

> * reflog
> 
>   I still haven't merged this series to "next" -- I do not have
>   much against what the code does, but I am unconvinced if it is
>   useful.  Also, the objections raised on the list, that this can be
>   replaced by making sure that a repository with hundreds of
>   tags is usable, certainly have a point.

I think it would make gitweb's summary view clearer, and Linus seemed 
interested in being able to look up what happened in the fast forward 
which was the first of several merges in a day.

It could be replaced by a repository with hundreds of machine-readable 
tags with code to parse dates into queries for suitable tags. But I don't 
think there's an advantage to using the tag mechanism here, because you 
never want to look the history up by exactly which history it is (the 
thing that a tag ref is good for); you'll be looking for whatever reflog 
item is the newest not after a specified time, where the specified time is 
almost never a time that a reflog item was created.
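
The lookup described here (newest entry not after a given time) can be
sketched in a few lines of shell. This is only an illustration: the
"<unix-timestamp> <value>" line format and the name newest_not_after are
assumptions made for the sketch, not the actual reflog format or any
existing git command.

```shell
# newest_not_after TIME
# Read "<unix-timestamp> <value>" lines on stdin and print the value of
# the newest entry whose timestamp is <= TIME; print nothing if there
# is none. The line format is an assumption made for this sketch.
newest_not_after() {
	awk -v t="$1" '
		$1 + 0 <= t + 0 && (!found || $1 + 0 >= best) {
			best = $1 + 0
			val = $2
			found = 1
		}
		END { if (found) print val }
	'
}

# The query time 250 falls between two entries; the older one wins.
printf '%s\n' '100 aaaa' '200 bbbb' '300 cccc' | newest_not_after 250
```

The query time almost never matches an entry exactly, which is why the
comparison is "<=" rather than "=". An exact-name lookup, which is what a
tag ref offers, would not help here.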

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply

* Re: irc usage..
From: Donnie Berkholz @ 2006-05-22 21:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221312380.3697@g5.osdl.org>

Linus Torvalds wrote:
> The latest stable CVS release is 1.11.21, I think: you seem to be running 
> the "development" version (1.12.x).

Backed down to the 1.11 series, things seem to be going fine so far.

Thanks,
Donnie


^ permalink raw reply

* Re: Local clone/fetch with cogito is glacial
From: H. Peter Anvin @ 2006-05-22 21:18 UTC (permalink / raw)
  To: Sean; +Cc: git
In-Reply-To: <BAYC1-PASMTP11FDE05B530CFF43C043E5AE9A0@CEZ.ICE>

Sean wrote:
> On Sun, 21 May 2006 16:47:45 -0700
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
>> It appears that doing a *local* -- meaning using a file path or file URL 
>> -- clone or fetch with cogito is just glacial when the repository has an 
>> even moderate number of tags (and it's fetching the tags that takes all 
>> the time.)  That's a really serious problem for me.
>>
> 
> Peter, does git clone work acceptably for you?
> 

Well, it does, except it doesn't set up the cogito branches (which one can of course copy 
manually.)

cg-clone probably should be rewritten as a thin wrapper around git-clone.

	-hpa

^ permalink raw reply

* [PATCH] git status: ignore empty directories (because they cannot be added)
From: Matthias Lederhofer @ 2006-05-22 21:02 UTC (permalink / raw)
  To: git

and a new option -u / --untracked-files to show files in untracked
directories.

---
A few things I'm not sure about:
- Should there be another option to disable --no-empty-directory?
- Is the option name --untracked-files ok?
- Should it be documented (probably yes :))? At the moment the
  git-status man page does not document any command line options at
  all, and for git-commit this option does not make sense.

 git-commit.sh |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

---

1921592d5e7809f72a902cca1a38217b150800a9
diff --git a/git-commit.sh b/git-commit.sh
index 6ef1a9d..6785826 100755
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -3,7 +3,7 @@ #
 # Copyright (c) 2005 Linus Torvalds
 # Copyright (c) 2006 Junio C Hamano
 
-USAGE='[-a] [-s] [-v] [--no-verify] [-m <message> | -F <logfile> | (-C|-c) <commit>) [--amend] [-e] [--author <author>] [[-i | -o] <path>...]'
+USAGE='[-a] [-s] [-v] [--no-verify] [-m <message> | -F <logfile> | (-C|-c) <commit>] [-u] [--amend] [-e] [--author <author>] [[-i | -o] <path>...]'
 SUBDIRECTORY_OK=Yes
 . git-sh-setup
 
@@ -134,13 +134,17 @@ #'
 	report "Changed but not updated" \
 	    "use git-update-index to mark for commit"
 
+        option=""
+        if test -z "$untracked_files"; then
+            option="--directory --no-empty-directory"
+        fi
 	if test -f "$GIT_DIR/info/exclude"
 	then
-	    git-ls-files -z --others --directory \
+	    git-ls-files -z --others $option \
 		--exclude-from="$GIT_DIR/info/exclude" \
 		--exclude-per-directory=.gitignore
 	else
-	    git-ls-files -z --others --directory \
+	    git-ls-files -z --others $option \
 		--exclude-per-directory=.gitignore
 	fi |
 	perl -e '$/ = "\0";
@@ -203,6 +207,7 @@ verbose=
 signoff=
 force_author=
 only_include_assumed=
+untracked_files=
 while case "$#" in 0) break;; esac
 do
   case "$1" in
@@ -340,6 +345,12 @@ do
       verbose=t
       shift
       ;;
+  -u|--u|--un|--unt|--untr|--untra|--untrac|--untrack|--untracke|--untracked|\
+  --untracked-|--untracked-f|--untracked-fi|--untracked-fil|--untracked-file|\
+  --untracked-files)
+      untracked_files=t
+      shift
+      ;;
   --)
       shift
       break
-- 
1.3.2
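
The long case arm in the patch spells out every abbreviation of
--untracked-files by hand. A minimal sketch of the same test written as
a prefix match follows; the function name is_untracked_opt is invented
for this illustration and is not part of git-commit.sh.

```shell
# is_untracked_opt ARG
# Succeeds if ARG is -u or any leading part of --untracked-files that
# is at least "--u" long, i.e. the same set the case arm in the patch
# spells out by hand. The function name is invented for this sketch.
is_untracked_opt() {
	case "$1" in
	-u)
		return 0 ;;
	--u*)
		# "$1" is quoted, so it is matched literally; the trailing *
		# then accepts whatever is left of the full option name.
		case "--untracked-files" in
		"$1"*) return 0 ;;
		esac ;;
	esac
	return 1
}

for arg in -u --untr --untracked-files --unknown; do
	if is_untracked_opt "$arg"; then
		echo "$arg: accepted"
	else
		echo "$arg: rejected"
	fi
done
```

The explicit list has one advantage the sketch lacks: it sits next to
the other case arms, where a clash with another option's abbreviation is
easy to spot.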

^ permalink raw reply related

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 20:33 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221256090.3697@g5.osdl.org>

On Mon, 22 May 2006, Linus Torvalds wrote:
> 
> Of course, the CVS part is probably a big overhead, so it might not help 
> much (I would not be surprised at all if a number of the fork/exec/exit 
> things are due to the CVS server starting RCS or something, not due to 
> git-cvsimport itself)

Ahh. stracing the CVS server seems to imply that it forks off a subprocess 
for every command. It doesn't actually execute any external program, but 
just does a fork + muck around in the ,v files + exit.

Maybe one of the changes in the 1.12.x versions is to not do that, which 
might explain why Donnie seems to see much better performance, but also 
sees all the memory leakage?

		Linus

^ permalink raw reply

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 20:20 UTC (permalink / raw)
  To: Donnie Berkholz
  Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <447215D4.5020403@gentoo.org>

On Mon, 22 May 2006, Donnie Berkholz wrote:
>
> Linus Torvalds wrote:
> > Hmm. My cvs server doesn't really grow at all. It's at 13M RSS.
> 
> Yeah, that's the thing. RSS stayed about the same (according to top),
> but virtual just kept growing.

Not for me. The virtual size is certainly bigger than RSS, but not by a 
huge amount. So this might be a regression in CVS, since you seem to have 
a newer version than I do.

The latest stable CVS release is 1.11.21, I think: you seem to be running 
the "development" version (1.12.x).

			Linus

^ permalink raw reply

* Re: irc usage..
From: Donnie Berkholz @ 2006-05-22 20:16 UTC (permalink / raw)
  To: Donnie Berkholz
  Cc: Martin Langhoff, Linus Torvalds, Yann Dirson, Git Mailing List,
	Matthias Urlichs, Johannes Schindelin
In-Reply-To: <44720C66.6040304@gentoo.org>

Donnie Berkholz wrote:
> OK, I started a new run without -L, and I'm watching it in top right
> now.

Tried a run with -L 1024 and it broke in just a couple of minutes:

Fetching
sys-kernel/linux/files/2.4.0.8/linux-2.4.0-ac8-reiserfs-3.6.25-nfs.diff.gz
  v 1.1
New
sys-kernel/linux/files/2.4.0.8/linux-2.4.0-ac8-reiserfs-3.6.25-nfs.diff.gz:
6367 bytes
Tree ID 457f629df10e70a5ef430f431eca27ed02a83d46
Parent ID 0541d8b54a02df3be50d529497236556c6862a4c
Committed patch 1024 (origin 2001-01-13 00:29:39)
Commit ID ba9d995d12a37502a851e198b67e141623f79544
DONE; creating master branch
cat: write error: Broken pipe

Thanks,
Donnie


^ permalink raw reply

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 20:11 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <46a038f90605221241x58ffa2a4o26159d38d86a8092@mail.gmail.com>

On Tue, 23 May 2006, Martin Langhoff wrote:
> 
> The dev machine where I am running the import is a slug! It's still
> working on it, only gotten to 7700 commits, with the cvsimport process
> stable at 28MB RAM and cvs stable at 4MB.

I have to say, that cvsimport script really does do horrible things. It's 
basically a fork/exec/exit benchmark, as far as I can tell. Running 
oprofile on the thing, the top offenders are (ignore the 45% idle thing: 
it's just because this was run on a dual-cpu system, so since it's almost 
completely single-threaded you get ~50% idle by default).

	3117654  45.8708  vmlinux                  vmlinux                  .power4_idle
	802313   11.8046  vmlinux                  vmlinux                  .unmap_vmas
	632913    9.3122  vmlinux                  vmlinux                  .copy_page_range
	150359    2.2123  vmlinux                  vmlinux                  .release_pages
	131330    1.9323  vmlinux                  vmlinux                  .vm_normal_page
	117836    1.7337  libperl.so               libperl.so               (no symbols)
	74098     1.0902  libgklayout.so           libgklayout.so           (no symbols)
	54680     0.8045  vmlinux                  vmlinux                  .free_pages_and_swap_cache
	54300     0.7989  libfb.so                 libfb.so                 (no symbols)
	49052     0.7217  vmlinux                  vmlinux                  .copy_4K_page
	46559     0.6850  libc-2.4.so              libc-2.4.so              getc
	42677     0.6279  vmlinux                  vmlinux                  .page_remove_rmap
	41133     0.6052  libc-2.4.so              libc-2.4.so              ferror
	..

those kernel functions are all about process create/exit, and COW faulting 
after the fork.

Now, this is on ppc, so process creation is likely slower (idiotic PPC VM 
page table hashes), but Linux is actually very good at doing this, and the 
fact that process create/exit is so high is a very big sign that the 
script just ends up executing a _ton_ of small simple processes that do 
almost nothing.

I wonder why those "git-update-index" calls seem to be (assuming I read 
the perl correctly) done only a few files at a time. We can do hundreds 
in one go, but it seems to want to do just ten files or something at the 
same time. Although since most commits should hopefully just modify a 
couple of files, that probably isn't a big deal.
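
A minimal sketch of the batching idea, not the actual git-cvsimport
code: xargs already splits its input into command lines that fit the
system's argument limit, so nobody has to count filename lengths by
hand. The helper name in_batches is made up for this illustration, and
paths are assumed to contain no whitespace or quotes.

```shell
# in_batches N CMD [ARG...]
# Read whitespace-separated items from stdin and run CMD with at most
# N of them appended per invocation. "in_batches" is a hypothetical
# helper, not part of git.
in_batches() {
	n=$1
	shift
	xargs -n "$n" "$@"
}

# Five paths, at most two per invocation => three invocations of echo.
printf '%s\n' a.c b.c c.c d.c e.c | in_batches 2 echo update:
```

With git the command would presumably be something like
"in_batches 50 git update-index --add --". Each batch still costs one
fork and exec, so larger batches mean fewer processes.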

That thing would probably be an order of magnitude faster if written to 
use the git library interfaces directly. Of course, the CVS part is 
probably a big overhead, so it might not help much (I would not be 
surprised at all if a number of the fork/exec/exit things are due to the 
CVS server starting RCS or something, not due to git-cvsimport itself)

		Linus

^ permalink raw reply

* Re: irc usage..
From: Donnie Berkholz @ 2006-05-22 19:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221234430.3697@g5.osdl.org>

Linus Torvalds wrote:
> Hmm. My cvs server doesn't really grow at all. It's at 13M RSS.

Yeah, that's the thing. RSS stayed about the same (according to top),
but virtual just kept growing.

> What version of cvs are you running?
> 
> 	[torvalds@g5 ~]$ cvs --version
> 
> 	Concurrent Versions System (CVS) 1.11.21 (client/server)

Concurrent Versions System (CVS) 1.12.12 (client/server)

Looks like there's a .13 out but the zlib interaction is badly broken
(-z >=1) so my system didn't get upgraded. I'll try it anyway after the
-L run finishes.

Thanks,
Donnie


^ permalink raw reply

* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 19:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221013020.3697@g5.osdl.org>

On 5/23/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Ok, initial results are promising. git-cvsimport appears to be still
> slowly growing, but it's at 40M (ie pretty tiny, considering that cvsps
> grew to 800+MB on this archive) and growth seems to actually be slowing.

That's great news. The cvs archive seems to have large commits every
once in a while, so I suspect the residual memory growth may be
related to those. Or to a smaller leak I haven't nailed.

My test box is bloody slow it seems. I'll try and get hold of a faster
machine to run this if I can.

> As to packing, doing something like

Given that we are running batch, it is safe and simple to stop the
import, repack, prune-packed, and keep going. Don't think we'll win
any races by running it in parallel ;-)

cheers,

martin

^ permalink raw reply

* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 19:41 UTC (permalink / raw)
  To: Donnie Berkholz
  Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <44720C66.6040304@gentoo.org>

On 5/23/06, Donnie Berkholz <spyderous@gentoo.org> wrote:
> So it seems the problem is in cvs itself. I will try another run with -L
> now.

What version of cvs are you using? Perhaps trying a different one?

The dev machine where I am running the import is a slug! It's still
working on it, only gotten to 7700 commits, with the cvsimport process
stable at 28MB RAM and cvs stable at 4MB.

cheers,

martin

^ permalink raw reply

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 19:38 UTC (permalink / raw)
  To: Donnie Berkholz
  Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <44720C66.6040304@gentoo.org>



On Mon, 22 May 2006, Donnie Berkholz wrote:
> 
> OK, I started a new run without -L, and I'm watching it in top right
> now. The cvsimport seems to be doing alright, but the cvs server process
> sucks about another megabyte of virtual every 4-5 seconds. This is a bit
> concerning since I don't have any swap. Shortly after it hit 670M, I got
> "Cannot allocate memory" again. I've got a gig of RAM, and around 300M
> was resident in various processes at the time.

Hmm. My cvs server doesn't really grow at all. It's at 13M RSS.

What version of cvs are you running?

	[torvalds@g5 ~]$ cvs --version

	Concurrent Versions System (CVS) 1.11.21 (client/server)

maybe that matters.

(but my import is only up to Jun 22, 2003 so far).

		Linus

^ permalink raw reply

* [PATCH] Problem: 'trap...exit' causes error message when /bin/sh is ash.
From: Yakov Lerner @ 2006-05-22 19:34 UTC (permalink / raw)
  To: git; +Cc: iler.ml

Problem: 'trap...exit' causes error message when /bin/sh is ash.
Fix: Change 'trap...exit' to 'trap...0' like in other scripts.

Signed-off-by: Yakov Lerner <iler.ml@gmail.com>

---

 git-clone.sh |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

954e49bc242cacd27e002f194d54a6895e64f88c
diff --git a/git-clone.sh b/git-clone.sh
index 227245c..d96894d 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -199,7 +199,7 @@ # Try using "humanish" part of source re
 [ -e "$dir" ] && echo "$dir already exists." && usage
 mkdir -p "$dir" &&
 D=$(cd "$dir" && pwd) &&
-trap 'err=$?; cd ..; rm -r "$D"; exit $err' exit
+trap 'err=$?; cd ..; rm -r "$D"; exit $err' 0
 case "$bare" in
 yes) GIT_DIR="$D" ;;
 *) GIT_DIR="$D/.git" ;;
@@ -407,5 +407,5 @@ Pull: refs/heads/$head_points_at:$origin
 fi
 rm -f "$GIT_DIR/CLONE_HEAD" "$GIT_DIR/REMOTE_HEAD"
 
-trap - exit
+trap - 0
 
-- 
1.3.GIT
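
For readers who have not met the numeric form: signal number 0 is the
portable spelling of "run this when the shell exits". A small sketch of
the cleanup pattern used in git-clone.sh follows; make_then_fail and the
temporary path are made up for this demo.

```shell
# make_then_fail DIR STATUS
# Runs in a subshell (note the parentheses around the body): creates
# DIR, installs the cleanup on signal number 0 (shell exit), then
# exits with STATUS. "make_then_fail" is a made-up name for this demo.
make_then_fail() (
	D=$1
	mkdir -p "$D" || exit 1
	trap 'err=$?; rm -rf "$D"; exit $err' 0
	exit "$2"
)

demo=${TMPDIR:-/tmp}/trap0-demo.$$
make_then_fail "$demo" 3 || echo "exit status $? preserved"
[ -e "$demo" ] || echo "directory removed by the trap"
```

Saving $? first matters: without it, the exit status of rm would
replace the status of the command that actually failed.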

^ permalink raw reply related

* Re: [PATCH 2/2] added more informative error messages to git-mktag
From: Junio C Hamano @ 2006-05-22 19:22 UTC (permalink / raw)
  To: Björn Engelmann; +Cc: git
In-Reply-To: <4471CF91.9010202@gmx.de>

Björn Engelmann <BjEngelmann@gmx.de> writes:

> -    if (size < 64)
> +    if (size < 64) {
> +        printf("wanna fool me ? you obviously got the size wrong !\n");
>          return -1;
> +    }

Please do this instead:

	return error("wanna ...");

you can lose the braces and the message goes to the stderr.

> -    if (memcmp(object, "object ", 7))
> +    if (memcmp(object, "object ", 7)) {
> +        printf("char%i: does not start with \"object \"\n", 0);
>          return -1;

Although they may be synonyms, we tend to use %d for ints and it
is more conventional.

>      tag_line++;
> -    if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n')
> +    if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n') {
> +        printf("char%i: no \"tag \" found\n", (int)tag_line - (int)buffer);
>          return -1;
> +    }

If you have to cast, please do not cast pointers to ints and
take their difference, but take the difference and cast the
resulting ptrdiff_t to int, like this:

	(int)(tag_line - buffer)

Or use "%td" instead of "%i" and lose the cast.

^ permalink raw reply

* Re: [PATCH 1/2] removes the artificial restriction tagsize < 8kb from git-mktag
From: Junio C Hamano @ 2006-05-22 19:19 UTC (permalink / raw)
  To: Björn Engelmann; +Cc: git
In-Reply-To: <4471CF5A.702@gmx.de>

Björn Engelmann <BjEngelmann@gmx.de> writes:

> @@ -154,6 +154,7 @@ extern int ce_match_stat(struct cache_en
>  extern int ce_modified(struct cache_entry *ce, struct stat *st, int);
>  extern int ce_path_match(const struct cache_entry *ce, const char
> **pathspec);
>  extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int
> write_object, const char *type);
> +extern int read_pipe(int fd, char** return_buf, unsigned long*
> return_size);
>  extern int index_pipe(unsigned char *sha1, int fd, const char *type,
> int write_object);

I smell whitespace breakage around here...

> diff --git a/mktag.c b/mktag.c
> index 2328878..79c466c 100644
> --- a/mktag.c
> +++ b/mktag.c
>...
> @@ -114,21 +114,24 @@ int main(int argc, char **argv)
>  
>      setup_git_directory();
>  
> -    // Read the signature
> -    size = 0;
> -    for (;;) {
> -        int ret = xread(0, buffer + size, MAXSIZE - size);
> -        if (ret <= 0)
> -            break;
> -        size += ret;
> +    if (read_pipe(0, &buffer, &size)) {
> +        free(buffer);
> +        die("could not read from stdin");
>      }
> -
> +    

Please do not introduce lines with trailing whitespaces.

>      // Verify it for some basic sanity: it needs to start with "object
> <sha1>\ntype\ntagger "
> -    if (verify_tag(buffer, size) < 0)
> +    if (verify_tag(buffer, size) < 0) {
> +        free(buffer);
>          die("invalid tag signature file");
> +    }

You seem to be striving for extra cleanliness, but I personally
consider it is not worth calling free() immediately before you
call die().

>  
> -    if (write_sha1_file(buffer, size, tag_type, result_sha1) < 0)
> +    if (write_sha1_file(buffer, size, tag_type, result_sha1) < 0) {
> +        free(buffer);
>          die("unable to write tag file");
> +    }
> +        
> +    free(buffer);
> +    
>      printf("%s\n", sha1_to_hex(result_sha1));
>      return 0;
>  }

A call to free() immediately before returning from main() might
look similar to the die() issue I mentioned above, but we might
extend it to do a lot more after writing the tag in the future,
so this one is very good to keep.

^ permalink raw reply

* Re: [PATCH 0/2] tagsize < 8kb restriction
From: Junio C Hamano @ 2006-05-22 19:18 UTC (permalink / raw)
  To: Björn Engelmann; +Cc: git
In-Reply-To: <4471CF23.1070807@gmx.de>

Björn Engelmann <BjEngelmann@gmx.de> writes:

> I am currently working on an interface for source code quality assurance
> tools to automatically scan newly committed code. Since it is the only
> way to add data (scan-results) to an already-existing commit, I decided
> to use tags for that.
>
> Since the scan-results will most definitely exceed the 8kb-limit, I would
> like to remove this artificial restriction.

Lifting the limit is good, but I am not sure if the use of tags
for that purpose is appropriate (or any git object for that
matter).  I'll talk about that at the end.

> What I found odd when writing the patch was that main() in mktag.c uses
> xread() to read from stdin (which respects EAGAIN and EINTR return
> values), but index_pipe() in sha1_file.c just uses read() to do
> essentially the same thing. To unify both routines, I found that xread()
> might be the better choice.

Good.

> Removing the restriction was pretty straightforward but do you think
> this would break something in other places ?

I do not think so offhand.

Now, about the usage of such a long tag for your purpose.

As you noticed, commits and tags are the only types of objects
that can refer to other commits structurally.  But there are
cases where you do not even need nor want structural reference.
For example, 'git cherry-pick' records the commit object name of
the cherry-picked commit in the commit message as part of the
text -- such a commit does not have structural reference to the
original commit, and we would not _want_ one.  I have a strong
suspicion that your application does not need or want structural
reference to commits, and it might be better to merely mention
their object names as part of the text the application produces,
just like what 'git cherry-pick' does.

Presumably you will have one such tag per commit, and by default
'fetch' (both cg and git) tries to follow tags, which means
anybody who fetches new revision would automatically download
this QA data -- that is one implication of using a tag to store
this information.  Without knowing the nature of it, I am not
sure if everybody who tracks the source wants such baggage.  If
not, then use of a tag for this may not be appropriate.

Another question is whether the QA data is expected to be amended or
annotated later, after it is created.

If the answer is yes, then you probably would not want tags --
you can create a new tag that points at the same commit to
update the data, but then you have no structural relationships
given by git between such tags that point at the same commit.
You could infer their order by timestamp but that is about it.
I think you are better off creating a separate QA project that
adds one new file per commit on the main project, and have the
file identify the commit object on the main project (either
start your text file format for QA data with the commit object
name, or name each such QA data file after the commit object
name).  Then your automated procedure could scan and add a new
file to the QA project every time a new commit is made to the
main project, and the data in the QA project can be amended or
annotated and the changes will be version controlled.
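
One way to read this suggestion, as a sketch only: a directory in the
separate QA project holding one file per commit of the main project,
named after the commit object name and repeating that name on its first
line. The helper name record_scan and the "commit <name>" header line
are assumptions made for this illustration.

```shell
# record_scan QADIR COMMIT
# Store scan results read from stdin in QADIR/COMMIT, with the commit
# object name repeated on the first line. Hypothetical helper; the
# layout is only one way to read the suggestion above.
record_scan() {
	mkdir -p "$1" || return 1
	{
		echo "commit $2"
		cat
	} > "$1/$2"
}

qa=${TMPDIR:-/tmp}/qa-demo.$$
echo "lint: 0 warnings" |
	record_scan "$qa" ba9d995d12a37502a851e198b67e141623f79544
cat "$qa/ba9d995d12a37502a851e198b67e141623f79544"
rm -rf "$qa"
```

Adding and committing the new file in the QA project after each scan
would then put later amendments under version control.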

If the answer is no, then it is probably better to just use an
append-only log file that textually records which entry
corresponds to which commit in the project.  If it is not
version controlled, and if it is not part of the main project, I
do not see much point in putting the data under git control and
in the same project.

^ permalink raw reply

* Re: irc usage..
From: Donnie Berkholz @ 2006-05-22 19:09 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Linus Torvalds, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <46a038f90605220554y569c11b9p24027772bd2ee79a@mail.gmail.com>

Martin Langhoff wrote:
> On 5/22/06, Linus Torvalds <torvalds@osdl.org> wrote:
>> On Mon, 22 May 2006, Martin Langhoff wrote:
>> >
>> > Or a slow leak in Perl? The 5.8.8 release notes do talk about some
>> > leaks being fixed, but this 5.8.8 isn't making a difference.
>> >
>> > Working on it.
>>
>> Thanks. Looking at what I did convert, that horrid gentoo CVS tree is
>> interesting. The resulting (partial) git history has 93413 commits and
>> 850,000+ objects total, all in a totally linear history.
> 
> Ok, so there's 3 patches posted that should help narrow down the
> > problem. There's a new -L <limit> so that Donnie can get his stuff done
> by running it in a while(true) loop. Not proud of it, but hey.
> 
> And there are two patches that I suspect may fix the leak. After
> applying them, the cvsimport process grows up to ~13MB and then tapers
> off, at least as far as my patience has gotten me. It's late on this
> side of the globe so I'll look at the results tomorrow morning.

OK, I started a new run without -L, and I'm watching it in top right
now. The cvsimport seems to be doing alright, but the cvs server process
sucks about another megabyte of virtual every 4-5 seconds. This is a bit
concerning since I don't have any swap. Shortly after it hit 670M, I got
"Cannot allocate memory" again. I've got a gig of RAM, and around 300M
was resident in various processes at the time.

So it seems the problem is in cvs itself. I will try another run with -L
now.

Thanks,
Donnie


^ permalink raw reply

* Re: irc usage..
From: Junio C Hamano @ 2006-05-22 19:09 UTC (permalink / raw)
  To: Matthias Lederhofer; +Cc: git
In-Reply-To: <E1FiFgL-0003m6-Eb@moooo.ath.cx>

Matthias Lederhofer <matled@gmx.net> writes:

> ...  Is there any way to
> delete unnecessary packs (those which would repack -a -d delete)?
> Making it possible to do a git repack -a and delete those packs the
> next night?

pack-redundant is supposed to figure it out, but I have never
used it myself so your mileage may vary.

^ permalink raw reply

* Re: irc usage..
From: Matthias Lederhofer @ 2006-05-22 19:03 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0605221055270.3697@g5.osdl.org>

> But people _should_ realize that removing objects is very very special. 

Just a similar question: is there any reason not to run git
repack/prune-packed as a cron job? I am thinking of something like this
for every night:

- git prune-packed (remove objects packed last time)
- check how many objects git-count-objects counts; if there are not
  enough, abort
- git repack

git repack -a -d is probably a bad idea, I guess, because a program
could try to open them after they were deleted.  Is there any way to
delete unnecessary packs (those which would repack -a -d delete)?
Making it possible to do a git repack -a and delete those packs the
next night?
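
The three steps in the list above could be sketched as a nightly script
like the following. It assumes that git count-objects prints a line of
the form "<N> objects, <M> kilobytes". The helper names and the default
threshold of 1000 are arbitrary choices for this illustration, and the
sketch does nothing about the question of deleting old packs.

```shell
# loose_count: read one line of "git count-objects" output
# ("<N> objects, <M> kilobytes") on stdin and print <N>.
loose_count() {
	read -r count rest
	echo "$count"
}

# nightly_repack [THRESHOLD]: the three steps from the list above.
# Hypothetical helper; THRESHOLD defaults to an arbitrary 1000.
nightly_repack() {
	threshold=${1:-1000}
	git prune-packed || return 1
	n=$(git count-objects | loose_count)
	if [ "$n" -lt "$threshold" ]; then
		return 0    # not enough loose objects to bother
	fi
	git repack
}

# Only the parsing step is exercised here; it prints 1234.
echo "1234 objects, 5678 kilobytes" | loose_count
```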

^ permalink raw reply

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 18:03 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e4stna$o1g$1@sea.gmane.org>

On Mon, 22 May 2006, Jakub Narebski wrote:
>
> Linus Torvalds wrote:
> 
> >                       git repack -a
> >                       #
> >                       # Stupid sleep to make sure that nobody is still
> >                       # using any unpacked objects after the pack got
> >                       # generated
> >                       #
> >                       sleep 10
> >                       git prune-packed
> 
> Is it really necessary (on Linux at least)? Git boasts of its atomicity...

I don't think it's necessary in practice.

But people _should_ realize that removing objects is very very special. 
Whether it's done by "git prune-packed" or "git prune", that's a very 
dangerous operation. "git prune" a lot more so than "git prune-packed", 
of course (in fact, you should _never_ run "git prune" on a repository 
that is active - you _will_ corrupt it).

Doing "git prune-packed" _should_ be mostly safe on UNIX, since the 
objects all exist in packs, and anybody who already opened an object will 
keep the fd open, and not even notice that the name is gone. However, 
there is at least one race:

	object lookup			"git repack -a -d"
	=============			==================

 - a process does its object
   database setup. No new pack-file
   yet.

					 - mv tmp-packfile active-packfile

					 - git prune-packed

 - the process looks up the object,
   and doesn't look in the pack-file
   because it didn't see the pack-file.

   So it tries to look up an object,
   fails, and errors out.

   It's not a fatal error (just re-try)
   but it could break something like a
   cvsimport

Now, in PRACTICE, I doubt you'd ever hit this. But the fact is, pruning 
your repository (whether prune-packed or a full prune) is _the_ special 
operation. It's something that removes a filesystem representation of an 
object that is otherwise immutable.
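
Since the failure in that race is transient, a caller that cannot
afford to die on it could re-run the failed step. A minimal sketch of
such a wrapper follows; "retry" is a hypothetical helper, not something
git or git-cvsimport provides.

```shell
# retry TRIES CMD [ARG...]
# Run CMD until it succeeds, at most TRIES times; return the status of
# the last attempt. "retry" is a hypothetical helper for this sketch.
retry() {
	tries=$1
	shift
	while :; do
		"$@" && return 0
		status=$?
		tries=$((tries - 1))
		[ "$tries" -gt 0 ] || return "$status"
	done
}

# Demo: "flaky" fails on the first call and succeeds on the second.
marker=${TMPDIR:-/tmp}/retry-demo.$$
flaky() {
	if [ -e "$marker" ]; then
		return 0
	fi
	: > "$marker"
	return 1
}
retry 3 flaky && echo "succeeded after retry"
rm -f "$marker"
```

A retry helps here only because each attempt runs as a new process,
which sets up its object database again and so sees the new pack-file.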

		Linus

^ permalink raw reply

* Re: irc usage..
From: Jakub Narebski @ 2006-05-22 17:51 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0605221013020.3697@g5.osdl.org>

Linus Torvalds wrote:

>                       git repack -a
>                       #
>                       # Stupid sleep to make sure that nobody is still
>                       # using any unpacked objects after the pack got
>                       # generated
>                       #
>                       sleep 10
>                       git prune-packed

Is it really necessary (on Linux at least)? Git boasts of its atomicity...

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 17:27 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <46a038f90605220554y569c11b9p24027772bd2ee79a@mail.gmail.com>

On Tue, 23 May 2006, Martin Langhoff wrote:
> 
> And there are two patches that I suspect may fix the leak. After
> applying them, the cvsimport process grows up to ~13MB and then tapers
> off, at least as far as my patience has gotten me. It's late on this
> side of the globe so I'll look at the results tomorrow morning.

Ok, initial results are promising. git-cvsimport appears to be still 
slowly growing, but it's at 40M (ie pretty tiny, considering that cvsps 
grew to 800+MB on this archive) and growth seems to actually be slowing.

My conversion is only up to September 2002, but if it doesn't suddenly hit 
some huge growth spurt, I wouldn't expect it to run out of memory. The CVS 
server process itself is tiny, and doesn't seem to grow at all.

As to packing, doing something like

	while :
	do
		sleep 30

		#
		# repack roughly every 25600 objects
		#
		n=$(ls .git/objects/00 2> /dev/null | wc -l)
		if [ $n -gt 100 ]; then
			git repack -a
			#
			# Stupid sleep to make sure that nobody is still
			# using any unpacked objects after the pack got
			# generated
			#
			sleep 10
			git prune-packed
		fi
	done

or similar (the above is totally untested - I've just done it by hand a 
few times) should work. It's perfectly ok to repack the archive even while 
the cvsimport script is adding more data and changing it.
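
The "roughly every 25600 objects" in the comment comes from sampling:
object names are hashes, so loose objects spread about evenly over the
256 fan-out directories 00..ff, and 100 files in one directory suggests
about 100 * 256 = 25600 in total. A sketch of that estimate follows; the
function name is made up for this illustration.

```shell
# estimate_loose_objects OBJDIR
# Count the files in the single fan-out directory OBJDIR/00 and scale
# by 256, the number of fan-out directories (00..ff). The function
# name is made up for this sketch.
estimate_loose_objects() {
	n=$(ls "$1/00" 2>/dev/null | wc -l)
	echo $((n * 256))
}

# Demo on a fake object directory with three files in 00/.
demo=${TMPDIR:-/tmp}/loose-demo.$$
mkdir -p "$demo/00"
touch "$demo/00/aa" "$demo/00/bb" "$demo/00/cc"
estimate_loose_objects "$demo"    # prints 768
rm -rf "$demo"
```

Counting one directory keeps the check cheap enough to run every 30
seconds while the import is busy writing objects.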

		Linus

^ permalink raw reply
