Git development
* Re: [PATCH] strbuf: instate cleanup rule in case of non-memory errors
From: Junio C Hamano @ 2009-01-07 21:19 UTC (permalink / raw)
  To: René Scharfe; +Cc: Pierre Habouzit, Linus Torvalds, git
In-Reply-To: <4963C1EA.504@lsrfire.ath.cx>

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Make all strbuf functions that can fail free() their memory on error if
> they have allocated it.  They don't shrink buffers that have been grown,
> though.

Thanks; applied.


* Re: [PATCH 4/3] shortlog: handle multi-line subjects like log --pretty=oneline et. al. do
From: Junio C Hamano @ 2009-01-07 21:19 UTC (permalink / raw)
  To: René Scharfe; +Cc: markus.heidelberg, git
In-Reply-To: <4963C1E2.8070906@lsrfire.ath.cx>

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> The commit message parser of git shortlog used to treat only the first
> non-empty line of the commit message as the subject.  Other log commands
> (e.g. --pretty=oneline) show the whole first paragraph instead (unwrapped
> into a single line).
>
> For consistency, this patch borrows format_subject() from pretty.c to
> make shortlog do the same.

Thanks; will queue.


* Re: [PATCH] diff --no-index -q: fix endless loop
From: Junio C Hamano @ 2009-01-07 21:30 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git
In-Reply-To: <1231326930-7132-1-git-send-email-trast@student.ethz.ch>

Thanks.


* Re: [PATCH] gitweb: don't use pathinfo for global actions
From: Giuseppe Bilotta @ 2009-01-07 21:32 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano, Devin Doucette
In-Reply-To: <200901061837.23637.jnareb@gmail.com>

On Tue, Jan 6, 2009 at 6:37 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Fri, 2 Jan 2009, Giuseppe Bilotta wrote:
>> Accepting global actions in use_pathinfo is not a very robust solution
>> due to possible present and future conflicts between project names and
>> global actions, therefore we just refuse to create PATH_INFO URLs when
>> the project is not defined.
>
> I think it is quite a robust solution and it makes sense; we use
> shortcuts http://git.example.com for projects_list page, and
> http://git.example.com/path/to/repo.git for overview 'summary'
> action for a project, therefore pathinfo has to look like the
> following: http://git.example.com/repo/action/hash with "action"
> _after_ "project".  And there is also the matter of backward compatibility
> of URLs (URLs shouldn't break).
>
> Anyway, we have $home_link for default project_list page, which
> is path_info without project, and query without query string...

Today I had this idea: a possible way to have global actions in the
path would be to use an invalid project name, but I'm not sure if
there ARE invalid project names at all. Maybe using something very
abstruse such as _projects_ (underscore "projects" underscore) or even
just _ (underscore).

The only thing I can think of for which global actions in path WOULD
be interesting would be that project tag paths would become something
like http://git.example.com/_/tag/sometagname which can be tagged by
the rel-tag microformat http://microformats.org/wiki/rel-tag ...

-- 
Giuseppe "Oblomov" Bilotta


* Re: [PATCH (topgit) 1/2] Implement setup_pager just like in git
From: Kirill Smelkov @ 2009-01-07 22:00 UTC (permalink / raw)
  To: Thomas Rast, Bert Wesarg, Pierre Habouzit, Petr Baudis
  Cc: martin f krafft, git
In-Reply-To: <200901071324.57222.trast@student.ethz.ch>

On Wed, Jan 07, 2009 at 01:24:44PM +0100, Thomas Rast wrote:
> Kirill Smelkov wrote:
> > On Tue, Jan 06, 2009 at 09:32:03PM +0100, martin f krafft wrote:
> > > I find this very confusing. Why not simply
> > > 
> > >   TG_PAGER="${GIT_PAGER:-}"
> > >   TG_PAGER="${TG_PAGER:-$PAGER}"
> > > 
> > > ?
> > 
> > I find it confusing too, but this is needed because they usually do
> > something like this
> > 
> >     $ GIT_PAGER='' <some-git-command>
> > 
> > to force it to be pagerless.
> [...]
> > So I think it would be better to preserve the same semantics for `tg
> > patch` callers, though it's a pity that it's hard (maybe I'm wrong ?) to
> > see whether an env var is unset.
> 
> Admittedly I haven't really studied your patch, but the ${} constructs
> can in fact tell empty from unset:
> 
>   $ EMPTY=
>   $ unset UNDEFINED
>   $ echo ${UNDEFINED-foo}
>   foo
>   $ echo ${UNDEFINED:-foo}
>   foo
>   $ echo ${EMPTY-foo}
> 
>   $ echo ${EMPTY:-foo}
>   foo
> 
> 'man bash' indeed says
> 
>   When not performing substring expansion, bash tests for a parameter
>   that is unset or null; omitting the colon results in a test only for
>   a parameter that is unset.
> 
> So I suppose you could use
> 
>   ${GIT_PAGER-${PAGER-less}}
> 
> or similar.

Good eyes, thanks!

I'll rework it.
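
A minimal sketch of the selection rule Thomas describes, assuming a helper
named pick_pager (the name is made up for illustration and is not part of
tg.sh). Omitting the colon means an explicitly empty GIT_PAGER still wins
over PAGER and over the `less` fallback:

```shell
# pick_pager: hypothetical helper, prints the pager that would be used.
# ${VAR-default} only falls back when VAR is unset, so GIT_PAGER='' is kept.
pick_pager()
{
	printf '%s\n' "${GIT_PAGER-${PAGER-less}}"
}

# Both unset: falls through to the built-in default.
( unset GIT_PAGER PAGER; pick_pager )

# Only PAGER set: PAGER is used.
( unset GIT_PAGER; PAGER=more; pick_pager )

# GIT_PAGER set but empty: the result is empty, i.e. "no pager".
( GIT_PAGER=; PAGER=more; pick_pager )
```

With ${GIT_PAGER:-...} the last case would fall back to PAGER instead, which
is exactly the behaviour the colon-less form avoids.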


On Wed, Jan 07, 2009 at 03:24:02PM +0100, Bert Wesarg wrote:
> On Wed, Jan 7, 2009 at 12:27, Kirill Smelkov <kirr@landau.phys.spbu.ru> wrote:
> > Martin, thanks for your review.
> > +       # atexit(close(1); wait pager)
> > +       trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
> I think you need to escape the double quotes.

Good eyes -- corrected and thanks!
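
To make the quoting problem concrete, here is a small sketch. It assumes a
hypothetical helper count_args, a made-up path containing a space, and the
default IFS. With the unescaped inner quotes the string ends early, $path is
expanded outside any quotes, and the command is split into several words:

```shell
# count_args: hypothetical helper, prints how many arguments it received.
count_args()
{
	echo "$#"
}

path='/tmp/tg pager/0'	# made-up path; the space makes the splitting visible

# Inner quotes close the string, so $path is unquoted and gets split.
count_args "rm "$path"; wait"

# Escaped inner quotes stay inside one argument, as trap expects.
count_args "rm \"$path\"; wait"
```

The first call prints 2 and the second prints 1. Passed to trap, the first
form would hand it extra arguments instead of a single command string.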


On Wed, Jan 07, 2009 at 04:10:00PM +0100, Petr Baudis wrote:
> On Wed, Jan 07, 2009 at 02:27:54PM +0300, Kirill Smelkov wrote:
> > >From 2193b7c703c2d31c8739eec617b8c0e8c1d09b79 Mon Sep 17 00:00:00 2001
> > From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
> > Date: Tue, 6 Jan 2009 17:56:37 +0300
> > Subject: [PATCH (topgit) v2] Implement setup_pager just like in git
> > 
> > setup_pager() spawns a pager process and redirects the rest of our output
> > to it.
> > 
> > This will be needed to fix `tg patch` output in the next commit.
> > 
> > Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
> 
> But you never use it...?

What do you mean?

It is used in the next patch, as posted in the original series:

http://marc.info/?l=git&m=123125495000600&w=2

For completeness, I'll include both patches in this email.

> > ---
> >  tg.sh |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 54 insertions(+), 0 deletions(-)
> > 
> > diff --git a/tg.sh b/tg.sh
> > index 8c23d26..bf9cf5c 100644
> > --- a/tg.sh
> > +++ b/tg.sh
> > @@ -243,6 +243,60 @@ do_help()
> >  	fi
> >  }
> >  
> > +## Pager stuff
> > +
> > +# isatty FD
> > +isatty()
> > +{
> > +	tty -s 0<&$1
> > +}
> > +
> > +# setup_pager
> > +# Spawn pager process and redirect the rest of our output to it
> > +setup_pager()
> > +{
> > +	isatty 1 || return 0
> > +
> > +	# TG_PAGER = GIT_PAGER | PAGER
> > +	# (but differentiate between GIT_PAGER='' and unset variables)
> > +	# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-03/0792.html
> > +	case ${GIT_PAGER+XXX} in
> > +	'')
> > +		case ${PAGER+XXX} in
> > +		'')
> 
> I'm pretty sure there's been a nice trick for this, but I can't remember
> it at all now.

Already corrected to ${GIT_PAGER-${PAGER-less}}, thanks to Thomas.

> > +			# both GIT_PAGER & PAGER unset
> > +			TG_PAGER=''
> > +			;;
> > +		*)
> > +			TG_PAGER="$PAGER"
> > +			;;
> > +		esac
> > +		;;
> > +	*)
> > +		TG_PAGER="$GIT_PAGER"
> > +		;;
> > +	esac
> > +
> > +	[ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
> > +
> > +
> > +	# now spawn pager
> > +	export LESS=${LESS:-FRSX}	# as in pager.c:pager_preexec()
> > +
> > +	_pager_fifo_dir="$(mktemp -t -d tg-pager-fifo.XXXXXX)"
> > +	_pager_fifo="$_pager_fifo_dir/0"
> > +	mkfifo -m 600 "$_pager_fifo"
> > +
> > +	"$TG_PAGER" < "$_pager_fifo" &
> > +	exec > "$_pager_fifo"		# dup2(pager_fifo.in, 1)
> > +
> > +	# this is needed so e.g. `git diff` will still colorize its output if
> > +	# requested in ~/.gitconfig with color.diff=auto
> > +	export GIT_PAGER_IN_USE=1
> > +
> > +	# atexit(close(1); wait pager)
> > +	trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
> > +}
> 
> Frankly, I would have been just much happier if something like git
> pager--helper would be provided for external tools to use. Seeing how it
> gets reimplemented like this just pains me greatly.

After we settle on implementation, would it make sense to include this
setup_pager into git-sh-setup?

I propose we include this stuff into tg.sh first, so that topgit would
work correctly with previous versions of git.

> On Wed, Jan 07, 2009 at 03:44:32PM +0100, Pierre Habouzit wrote:
> > On Wed, Jan 07, 2009 at 11:27:54AM +0000, Kirill Smelkov wrote:
> > > isatty()
> > > {
> > > 	tty -s 0<&$1
> > > }
> > 
> > why not test -t 0 ? I'm not sure it's POSIX though.
> 
> It's SUS for many issues already it seems.

Pierre and Petr - thanks for the info. Yes, `test -t $fd` looks better.


Here is the improved patch:

Changes since v1:

 o Simplify TG_PAGER setup  (thanks to Thomas Rast)
 o Properly escape "        (thanks to Bert Wesarg)
 o Simpler isatty           (thanks to Pierre Habouzit & Petr Baudis)


(interdiff)

diff --git a/tg.sh b/tg.sh
index bf9cf5c..b64fc3a 100644
--- a/tg.sh
+++ b/tg.sh
@@ -248,7 +248,7 @@ do_help()
 # isatty FD
 isatty()
 {
-	tty -s 0<&$1
+	test -t $1
 }
 
 # setup_pager
@@ -257,25 +257,9 @@ setup_pager()
 {
 	isatty 1 || return 0
 
-	# TG_PAGER = GIT_PAGER | PAGER
-	# (but differentiate between GIT_PAGER='' and unset variables)
-	# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-03/0792.html
-	case ${GIT_PAGER+XXX} in
-	'')
-		case ${PAGER+XXX} in
-		'')
-			# both GIT_PAGER & PAGER unset
-			TG_PAGER=''
-			;;
-		*)
-			TG_PAGER="$PAGER"
-			;;
-		esac
-		;;
-	*)
-		TG_PAGER="$GIT_PAGER"
-		;;
-	esac
+	# TG_PAGER = GIT_PAGER | PAGER | less
+	# NOTE: GIT_PAGER='' is significant
+	TG_PAGER=${GIT_PAGER-${PAGER-less}}
 
 	[ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
 
@@ -295,7 +279,7 @@ setup_pager()
 	export GIT_PAGER_IN_USE=1
 
 	# atexit(close(1); wait pager)
-	trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
+	trap "exec >&-; rm \"$_pager_fifo\"; rmdir \"$_pager_fifo_dir\"; wait" EXIT
 }
 
 ## Startup


From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
To: Petr Baudis <pasky@suse.cz>
Cc: Git Mailing List <git@vger.kernel.org>
Bcc: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Subject: Implement setup_pager just like in git

setup_pager() spawns a pager process and redirects the rest of our output
to it.

This will be needed to fix `tg patch` output in the next commit.

Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>

---
 tg.sh |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/tg.sh b/tg.sh
index 8c23d26..b64fc3a 100644
--- a/tg.sh
+++ b/tg.sh
@@ -243,6 +243,44 @@ do_help()
 	fi
 }
 
+## Pager stuff
+
+# isatty FD
+isatty()
+{
+	test -t $1
+}
+
+# setup_pager
+# Spawn pager process and redirect the rest of our output to it
+setup_pager()
+{
+	isatty 1 || return 0
+
+	# TG_PAGER = GIT_PAGER | PAGER | less
+	# NOTE: GIT_PAGER='' is significant
+	TG_PAGER=${GIT_PAGER-${PAGER-less}}
+
+	[ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
+
+
+	# now spawn pager
+	export LESS=${LESS:-FRSX}	# as in pager.c:pager_preexec()
+
+	_pager_fifo_dir="$(mktemp -t -d tg-pager-fifo.XXXXXX)"
+	_pager_fifo="$_pager_fifo_dir/0"
+	mkfifo -m 600 "$_pager_fifo"
+
+	"$TG_PAGER" < "$_pager_fifo" &
+	exec > "$_pager_fifo"		# dup2(pager_fifo.in, 1)
+
+	# this is needed so e.g. `git diff` will still colorize its output if
+	# requested in ~/.gitconfig with color.diff=auto
+	export GIT_PAGER_IN_USE=1
+
+	# atexit(close(1); wait pager)
+	trap "exec >&-; rm \"$_pager_fifo\"; rmdir \"$_pager_fifo_dir\"; wait" EXIT
+}
 
 ## Startup
 
-- 
tg: (8c77c34..) t/setup-pager (depends on: master)
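
For readers who want to try the FIFO hand-off in isolation, here is a
minimal sketch of the same mechanism. It assumes `cat` as a stand-in pager,
uses fd 3 instead of stdout so the demo can still print its result, and
relies on `mktemp -d` being available; none of the names below come from
tg.sh:

```shell
# Sketch only: `cat` plays the pager, fd 3 plays the redirected stdout.
demo_dir=$(mktemp -d) || exit 1		# hypothetical scratch directory
demo_fifo="$demo_dir/0"
mkfifo -m 600 "$demo_fifo"

# The "pager" reads from the FIFO in the background.
cat < "$demo_fifo" > "$demo_dir/paged" &

# Like `exec > "$_pager_fifo"`, but on fd 3.
exec 3> "$demo_fifo"

echo "line sent through the fifo" >&3

# Close the write end first so the reader sees EOF, then wait for it.
exec 3>&-
wait

paged=$(cat "$demo_dir/paged")
rm -r "$demo_dir"
echo "$paged"
```

The order in the EXIT trap matters for the same reason: the pager only sees
end-of-file once the write end is closed, so `wait` has to come after
`exec >&-`.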


The second patch, which uses setup_pager:


From 1b723ebf740c58bc25ac97eff0a31b07373d8d1e Mon Sep 17 00:00:00 2001
From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Date: Tue, 6 Jan 2009 18:03:21 +0300
Subject: [TopGit PATCH] tg-patch: fix pagination

Previously, when I was invoking `tg patch` the following used to happen:

1. .topmsg content was sent directly to _terminal_
2. for each file in the patch, its diff was generated with `git diff`
   and sent to *PAGER*
3. trailing 'tg: ...' was sent to terminal again

So the problem is that while `tg patch >file` works as expected, plain
`tg patch` does not -- in pager there is only a part of the whole patch
(first file diff) and header and trailer are omitted.

I've finally decided to fix this inconvenience, and the way it works is
like in git -- we just hook the `setup_pager` function into commands which
need to be paginated.

Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
---
 tg-patch.sh |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tg-patch.sh b/tg-patch.sh
index a704375..dc699d2 100644
--- a/tg-patch.sh
+++ b/tg-patch.sh
@@ -24,6 +24,9 @@ done
 base_rev="$(git rev-parse --short --verify "refs/top-bases/$name" 2>/dev/null)" ||
 	die "not a TopGit-controlled branch"
 
+
+setup_pager
+
 git cat-file blob "$name:.topmsg"
 echo
 [ -n "$(git grep '^[-]--' "$name" -- ".topmsg")" ] || echo '---'
-- 
1.6.1.48.ge9b8


Thanks,
Kirill


* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 22:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Sam Vilain, Pierre Habouzit, Linus Torvalds, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <7v63kqall2.fsf@gitster.siamese.dyndns.org>

Hi,

On Wed, 7 Jan 2009, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > After compiling and installing, something like this should be fun to 
> > watch:
> >
> > 	$ git rev-list --all --parents | \
> > 	  grep " .* " | \
> > 	  while read commit parent1 parent2 otherparents
> > 	  do
> > 		test -z "$otherparents" || continue
> > 		git checkout $parent1 &&
> > 		git merge $parent2 &&
> > 		git diff > without-patience.txt &&
> > ...
> > 		if ! cmp without-patience.txt with-patience.txt
> > 		then
> > 			echo '==============================='
> > 			echo "differences found in merge $commit"
> > ...
> > 			cat with-patience.txt
> > 		fi ||
> > 		exit
> > 	  done | tee patience-merge.out
> 
> An even more interesting test would be possible by dropping "&&" from the
> two "git merge" invocations.
> 
>  - Your sample will exit at the first conflicting merge otherwise.
> 
>  - You may find cases where one resolves cleanly while the other leaves
>    conflicts.

Yeah, that's why I always put "like" into that phrase "something like 
this"... :-)

Actually, I had to read something and did not want my box to sit idle 
while I was doing that, so...

The most interesting thing to me was: of the 4072 merges I have in my 
local git.git clone, only 66 show a difference.

The next interesting thing: none -- I repeat, none! -- resulted in only 
one of both methods having conflicts.  In all cases, if patience merge had 
conflicts, so had the classical merge, and vice versa.  I would have 
expected patience merge to handle some conflicts more gracefully.

So let's go on to the next metric: what are the differences in the --cc 
diffs' line counts?

On average, patience merge produced 1.03225806451613 more lines of --cc 
diff, and the standard deviation between the line counts is 
42.9823669772587.

So from the line counts' point of view, the difference is lost in the 
noise.

So let's look at a concrete example.  I take 
41a3e3aa9bdaede9ab7afed206428c1b071060d2, as it is one of the three merges 
with minimal --cc diff line counts (they all have 33 lines) and where 
patience merge makes a difference.

This is the --cc diff without patience merge:

-- snip --
diff --cc git-am.sh
index a391254,5a7695e..0000000
--- a/git-am.sh
+++ b/git-am.sh
@@@ -327,11 -327,20 +327,28 @@@ d
  			echo "Patch is empty.  Was it split wrong?"
  			stop_here $this
  		}
++<<<<<<< HEAD:git-am.sh
 +		SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
 +		case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
 +
 +		(printf '%s\n\n' "$SUBJECT"; cat "$dotest/msg") |
 +			git stripspace > "$dotest/msg-clean"
++=======
+ 		if test -f "$dotest/rebasing" &&
+ 			commit=$(sed -e 's/^From \([0-9a-f]*\) .*/\1/' \
+ 				-e q "$dotest/$msgnum") &&
+ 			test "$(git cat-file -t "$commit")" = commit
+ 		then
+ 			git cat-file commit "$commit" |
+ 			sed -e '1,/^$/d' >"$dotest/msg-clean"
+ 		else
+ 			SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
+ 			case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
+ 
+ 			(echo "$SUBJECT" ; echo ; cat "$dotest/msg") |
+ 				git stripspace > "$dotest/msg-clean"
+ 		fi
++>>>>>>> 5e835cac8622373724235d299f1331ac4cf81ccf:git-am.sh
  		;;
  	esac
  
-- snap --

Compare this with the --cc diff _with_ patience merge:

-- snip --
diff --cc git-am.sh
index a391254,5a7695e..0000000
--- a/git-am.sh
+++ b/git-am.sh
@@@ -327,11 -327,20 +327,25 @@@ d
  			echo "Patch is empty.  Was it split wrong?"
  			stop_here $this
  		}
- 		SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
- 		case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
- 
+ 		if test -f "$dotest/rebasing" &&
+ 			commit=$(sed -e 's/^From \([0-9a-f]*\) .*/\1/' \
+ 				-e q "$dotest/$msgnum") &&
+ 			test "$(git cat-file -t "$commit")" = commit
+ 		then
+ 			git cat-file commit "$commit" |
+ 			sed -e '1,/^$/d' >"$dotest/msg-clean"
+ 		else
+ 			SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
+ 			case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
+ 
++<<<<<<< HEAD:git-am.sh
 +		(printf '%s\n\n' "$SUBJECT"; cat "$dotest/msg") |
 +			git stripspace > "$dotest/msg-clean"
++=======
+ 			(echo "$SUBJECT" ; echo ; cat "$dotest/msg") |
+ 				git stripspace > "$dotest/msg-clean"
+ 		fi
++>>>>>>> 5e835cac8622373724235d299f1331ac4cf81ccf:git-am.sh
  		;;
  	esac
  
-- snap --

So, the patience merge resulted in a much smaller _conflict_.

However, another such merge is 276328ffb87cefdc515bee5f09916aea6e0244ed.  
This is the --cc diff without patience merge:

-- snip --
diff --cc diff.c
index 4e4e439,f91f256..0000000
--- a/diff.c
+++ b/diff.c
@@@ -1498,19 -1464,13 +1498,28 @@@ static void builtin_diff(const char *na
  	char *a_one, *b_two;
  	const char *set = diff_get_color_opt(o, DIFF_METAINFO);
  	const char *reset = diff_get_color_opt(o, DIFF_RESET);
 +	const char *a_prefix, *b_prefix;
 +
 +	diff_set_mnemonic_prefix(o, "a/", "b/");
 +	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 +		a_prefix = o->b_prefix;
 +		b_prefix = o->a_prefix;
 +	} else {
 +		a_prefix = o->a_prefix;
 +		b_prefix = o->b_prefix;
 +	}
  
++<<<<<<< HEAD:diff.c
 +	a_one = quote_two(a_prefix, name_a + (*name_a == '/'));
 +	b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
++=======
+ 	/* Never use a non-valid filename anywhere if at all possible */
+ 	name_a = DIFF_FILE_VALID(one) ? name_a : name_b;
+ 	name_b = DIFF_FILE_VALID(two) ? name_b : name_a;
+ 
+ 	a_one = quote_two(o->a_prefix, name_a + (*name_a == '/'));
+ 	b_two = quote_two(o->b_prefix, name_b + (*name_b == '/'));
++>>>>>>> e261cf94848d31868c21fb11cade51c30dfcdbe7:diff.c
  	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
  	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
  	fprintf(o->file, "%sdiff --git %s %s%s\n", set, a_one, b_two, reset);
-- snap --

And this is _with_ patience merge:

-- snip --
diff --cc diff.c
index 4e4e439,f91f256..0000000
--- a/diff.c
+++ b/diff.c
@@@ -1498,19 -1464,13 +1498,28 @@@ static void builtin_diff(const char *na
  	char *a_one, *b_two;
  	const char *set = diff_get_color_opt(o, DIFF_METAINFO);
  	const char *reset = diff_get_color_opt(o, DIFF_RESET);
 +	const char *a_prefix, *b_prefix;
 +
++<<<<<<< HEAD:diff.c
 +	diff_set_mnemonic_prefix(o, "a/", "b/");
 +	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 +		a_prefix = o->b_prefix;
 +		b_prefix = o->a_prefix;
 +	} else {
 +		a_prefix = o->a_prefix;
 +		b_prefix = o->b_prefix;
 +	}
  
 +	a_one = quote_two(a_prefix, name_a + (*name_a == '/'));
 +	b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
++=======
+ 	/* Never use a non-valid filename anywhere if at all possible */
+ 	name_a = DIFF_FILE_VALID(one) ? name_a : name_b;
+ 	name_b = DIFF_FILE_VALID(two) ? name_b : name_a;
+ 
+ 	a_one = quote_two(o->a_prefix, name_a + (*name_a == '/'));
+ 	b_two = quote_two(o->b_prefix, name_b + (*name_b == '/'));
++>>>>>>> e261cf94848d31868c21fb11cade51c30dfcdbe7:diff.c
  	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
  	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
  	fprintf(o->file, "%sdiff --git %s %s%s\n", set, a_one, b_two, reset);
-- snap --

So again, we have no clear winner.

Therefore I counted the lines between conflict markers (actually, a perl 
script did).  Of these 66 merges, on average patience merge produced 
4.46774193548387 _fewer_ lines between conflict markers.
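
The perl script itself is not shown here. As a rough sketch of that kind of
count, assuming the --cc diff arrives on stdin and that the "++=======" 
separator is not counted as a conflict line (both are assumptions, as is the
helper name count_conflict_lines):

```shell
# count_conflict_lines: hypothetical helper, reads a --cc diff on stdin and
# prints the number of lines between "++<<<<<<<" and "++>>>>>>>" markers.
count_conflict_lines()
{
	awk '
		/^\+\+<<<<<<< / { inside = 1; next }	# conflict starts
		/^\+\+>>>>>>> / { inside = 0; next }	# conflict ends
		/^\+\+=======$/ { next }		# separator, not counted
		inside { n++ }
		END { print n + 0 }
	'
}

# Tiny made-up input: two lines on one side, one on the other.
printf '%s\n' \
	'  context' \
	'++<<<<<<< HEAD:file' \
	' +ours 1' \
	' +ours 2' \
	'++=======' \
	'+ theirs 1' \
	'++>>>>>>> abc:file' \
	'  context' |
count_conflict_lines
```

For the made-up input this prints 3. Whether the original script counted
the separator line is not stated, so absolute numbers may differ slightly.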

Take that with a grain of salt, though: the standard deviation of this 
difference is a hefty 121.163046639509 lines.

The worst case for patience diff was the merge 
4698ef555a1706fe322a68a02a21fb1087940ac3, where the --cc diff line counts 
are 1300 (without) vs 1301 (with patience merge), but the lines between 
conflict markers are 197 vs a ridiculous 826 lines!

But you should take that also with a grain of salt: this merge is a 
_subtree_ merge, and my test redid it as a _non-subtree_ merge.

So I restricted the analysis to the non-subtree merges, and now 
non-patience merge comes out 6.97297297297297 conflict lines fewer than 
patience merge, with a standard deviation of 58.941106657867 (with a total 
count of 37 merges).

Note that ~7 lines difference with a standard deviation of ~59 lines is 
pretty close to ~0 lines difference.
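
One way to put a number on "pretty close to ~0", assuming the 37 per-merge
differences can be treated as independent samples (an assumption, not
something checked here), is to compare the mean with its standard error.
The figures are the ones quoted above; the function name is made up:

```shell
# mean_vs_noise: hypothetical helper, compares the mean difference with the
# standard error sd / sqrt(n), using the numbers quoted in this mail.
mean_vs_noise()
{
	awk 'BEGIN {
		mean = 6.97297297297297	# mean difference in conflict lines
		sd = 58.941106657867	# standard deviation of the difference
		n = 37			# number of non-subtree merges
		se = sd / sqrt(n)
		printf "standard error: %.2f lines, mean/se: %.2f\n", se, mean / se
	}'
}

mean_vs_noise
```

This prints a standard error of about 9.69 lines and a ratio of about 0.72,
i.e. the mean is less than one standard error away from zero.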

In the end, the additional expense of patience merge might just not be 
worth it.

Ciao,
Dscho


* Re: Comments on Presentation Notes Request.
From: Daniel Barkalow @ 2009-01-07 22:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Tim Visher, git
In-Reply-To: <20090107063629.GB22616@coredump.intra.peff.net>

On Wed, 7 Jan 2009, Jeff King wrote:

> On Tue, Jan 06, 2009 at 05:33:02PM -0500, Tim Visher wrote:
> 
> > ** Advantages of SCM
> > *** One Source to Rule Them All.
> > *** Unlimited Undo/Redo.
> > *** Safe Concurrent Editing.
> > *** Diff Debugging
> 
> I would add to this metadata and "software archeology": finding the
> author of a change or piece of code, the motivation behind it, related
> changes (by position within history, by content, or by commit message),
> etc.

If you look at the git source code, the comments in the code are almost 
never sufficient to really understand the code, because a full 
line-by-line explanation would make it hard to find the code under the 
comments. On the other hand, if you take "git blame" in one window and a 
series of "git show"s in another window, and look at the commit messages 
for the commits that introduced each of those lines, you get really 
detailed and in-depth documentation of the subtle changes.
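
As a rough sketch of that workflow in a single step, assuming a helper
named why_line (made up for illustration, not a git command) that is given
a file and a line number inside a git work tree:

```shell
# why_line FILE LINE: hypothetical helper, prints the commit message of the
# commit that last touched LINE of FILE.
why_line()
{
	# The first line of porcelain blame output starts with the commit id.
	commit=$(git blame --porcelain -L "$2,$2" -- "$1" |
		head -n 1 | cut -d ' ' -f 1) || return
	git show -s --format=%B "$commit"
}
```

For example, `why_line diff.c 1500` would print the message of whichever
commit introduced that line. Lines that are not yet committed have no
commit to show, so the sketch simply fails for them.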

> I think people who have not used an SCM before, and people coming from
> SCMs where it is painful to look at history (like CVS) undervalue this
> because it's not part of their workflow.  But having used git for a few
> years now, it is an integral part of how I develop (especially when
> doing maintenance or bugfixes).
> 
> You touch on this in "Diff Debugging", but I think bisection is just a
> part of it.
> 
> > * SCM Best Practices
> >
> > ** Commit Early, Commit Often
> > ** Don't Commit Broken Code (To the Public Tree)
> 
> People talk a lot about using their SCM on a plane, but I think these
> two seemingly opposite commands highlight the _real_ useful thing about
> a distributed system for most people: commit and publish are two
> separate actions.
> 
> So I think it might be better to say "Commit Early, Commit Often" but
> "Don't _Publish_ Broken Code". Which is what you end up saying in the
> discussion, but I think using that terminology makes clear the important
> distinction between two actions that are conflated in centralized
> systems.
> 
> > *** Backup Becomes A Separate Process
> > Because there is only a single repository, you need a back-up strategy
> > or else you are exposing yourself to a single point of failure.
> > [...]
> > *** Natural Backup
> > Because every developer has a copy of the repository, every developer
> > you add adds an extra failure point.  The more developers you have,
> > the more backups you have of the repository.
> 
> The "natural backup" thing gets brought out a lot for DVCS. And it is
> sort of true: instead of each developer having a backup of the latest
> version (or some recent version which they checked out), they have a
> backup of the whole history. But they still might not have everything.
> Developers might not clone all branches. They might not be up to date
> with some "master" repository. Useful work might be unpublished in the
> master repo (e.g., I am working on feature X which is 99% complete, but
> not ready for me to merge into master and push).

It is the case that everything in the central repo (including speculative 
stuff) will also be on its author's machine, with the metadata needed to 
identify that it's not in the main history and how everything is supposed 
to be arranged. This is likely to be particularly helpful for the work 
that everybody did between the last backup and the crash.

> So yes, you are much more likely to salvage useful (if not all) data
> from developer repositories in the event of a crash. But I still think
> it's crazy not to have a backup strategy for your DVCS repo.

I think it's very important to have a backup strategy, but it's nice that 
the developers can get work done while the server is still down.

> > ** Excellent Merge algorithms
> > 
> > Git has excellent merge algorithms.  This is widely attributed and
> > doesn't require much explanation.  It was one of Git's original design
> > goals, and it has been proven by Git's implementation.  Merging in Git
> > is _much_ less painful than in other systems.
> 
> Actually, git has a really _stupid_ merge algorithm that has been around
> forever: the 3-way merge. And by stupid I don't mean bad, but just
> simple and predictable. I think the git philosophy is more about making
> it easy to merge often, and about making sure conflicts are simple to
> understand and fix, than it is about being clever.

Git is clever about finding the 3 inputs to the 3-way merge, particularly 
the common ancestor of commits that don't have a common ancestor. I think 
merge-recursive is novel to git, and may not be available anywhere else.

> Which isn't to say there aren't systems with less clever merge
> algorithms. CVS doesn't even do a 3-way merge, since it doesn't bother
> to remember where the last branch intersection was.

CVS did do 3-way merge, but only between your uncommitted changes, the 
latest commit, and the common ancestor (the commit that you started 
changing). IIRC, arch actually didn't support 3-way merge at all.

	-Daniel
*This .sig left intentionally blank*


* Re: Comments on Presentation Notes Request.
From: Boyd Stephen Smith Jr. @ 2009-01-07 22:40 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git
In-Reply-To: <alpine.LNX.1.00.0901071654530.19665@iabervon.org>


On Wednesday 2009 January 07 16:30:04 Daniel Barkalow wrote:
> Git is clever about finding [...]
> the common ancestor of commits that don't have a common ancestor.

*confused*

Please elaborate.
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     



* Re: [BUG PATCH RFC] mailinfo: correctly handle multiline 'Subject:' header
From: Kirill Smelkov @ 2009-01-07 22:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <1230316721-14339-1-git-send-email-kirr@mns.spb.ru>

On Fri, Dec 26, 2008 at 09:38:41PM +0300, Kirill Smelkov wrote:
> When native language (RU) is in use, subject header usually contains several
> parts, e.g.
> 
> Subject: [Navy-patches] [PATCH]
> 	=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=
> 	=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=
> 	=?utf-8?b?0YHQsdC+0YDQutC4?=

Which btw should be extracted by git-mailinfo to:

    'Subject: Изменён список пакетов необходимых для сборки'
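
A quick way to check that expectation by hand, as a sketch: decode each
base64 encoded-word on its own and join the results with nothing in
between, since RFC 2047 says whitespace between adjacent encoded-words is
ignored. The helper name decode_b_words is made up, it only handles the
utf-8 "b" form seen above, and it assumes a base64 tool that accepts -d:

```shell
# decode_b_words: hypothetical helper, decodes =?utf-8?b?...?= words given
# as arguments and prints the payloads joined without separators.
decode_b_words()
{
	for word in "$@"
	do
		payload=${word#=\?utf-8\?b\?}	# strip the leading =?utf-8?b?
		payload=${payload%\?=}		# strip the trailing ?=
		printf '%s' "$payload" | base64 -d
	done
	echo
}

decode_b_words \
	'=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=' \
	'=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=' \
	'=?utf-8?b?0YHQsdC+0YDQutC4?='
```

The first word ends in the middle of "пакетов", which is why inserting
anything between the decoded parts visibly breaks the subject.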

> This exposes several bugs in builtin-mailinfo.c that I try to fix:
> 
> 
> 1. decode_b_segment: do not append explicit NUL -- explicit NUL was preventing
>    correct header construction on parts concatenation via strbuf_addbuf in
>    decode_header_bq. Fixes:
> 
> -Subject: Изменён список пакетов необходимых для сборки
> +Subject: Изменён список па
> 
> 
> Then
> 
> 2. (hackish) do not emit '\n' after processing of every header segment. It
>    seems we should emit previous part as-is only if it does not end with
>    '=?='. Fixes:
> 
> -Subject: Изменён список пакетов необходимых для сборки
> +Subject: Изменён список па кетов необходимых для сборки
> 
> 
> Sorry for the low-quality patch and description. I did what I could and don't
> have the energy and time to dig more into MIME.
> 
> Please help.
> 
> Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
> 
> ---
>  builtin-mailinfo.c  |   18 ++++++++++++++++-
>  t/t5100-mailinfo.sh |    2 +-
>  t/t5100/info0012    |    5 ++++
>  t/t5100/msg0012     |    7 ++++++
>  t/t5100/patch0012   |   30 +++++++++++++++++++++++++++++
>  t/t5100/sample.mbox |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 112 insertions(+), 2 deletions(-)

Junio, All,

What about this patch?

At the very least it exposes a bug in git-mailinfo's handling of multiline
subjects, documents it in detail, and adds a test for it.


Yes, my fixes are of 'low quality', but may I try to attract git
community attention one more time?


Thanks beforehand,
Kirill


P.S. original post with patch:

http://marc.info/?l=git&m=123031899307286&w=2


* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Pierre Habouzit @ 2009-01-07 22:45 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Sam Vilain, Linus Torvalds, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901072213570.7496@intel-tinevez-2-302>


On Wed, Jan 07, 2009 at 10:00:07PM +0000, Johannes Schindelin wrote:
> Therefore I counted the lines between conflict markers (actually, a perl 
> script did).  Of these 66 merges, on average patience merge produced 
> 4.46774193548387 _fewer_ lines between conflict markers.
> 
> Take that with a grain of salt, though: the standard deviation of this 
> difference is a hefty 121.163046639509 lines.
> 
> The worst case for patience diff was the merge 
> 4698ef555a1706fe322a68a02a21fb1087940ac3, where the --cc diff line counts 
> are 1300 (without) vs 1301 (with patience merge), but the lines between 
> conflict markers are 197 vs a ridiculous 826 lines!
> 
> But you should take that also with a grain of salt: this merge is a 
> _subtree_ merge, and my test redid it as a _non-subtree_ merge.
> 
> So I restricted the analysis to the non-subtree merges, and now 
> non-patience merge comes out 6.97297297297297 conflict lines fewer than 
> patience merge, with a standard deviation of 58.941106657867 (with a total 
> count of 37 merges).
> 
> Note that ~7 lines difference with a standard deviation of ~59 lines is 
> pretty close to ~0 lines difference.
> 
> In the end, the additional expense of patience merge might just not be 
> worth it.

Depends: if it can help generate nicer merges, it's good to have.

We could have an option to git-merge that tries hard to generate the
smallest conflict possible. _that_ would really really be worth it. I
mean, I've had really really tricky conflicts to work with where
git-merge generated ridiculously big conflicts, and where I had to
resort to using UI tools to perform the merge (meld IIRC to name it), and
given how slow and crappy those tools are, I would gladly restart a
merge with a --generate-smallest-conflicts-as-possible if it can save me
from those merge tools.

YMMV though.

PS: I never thought the patience diff is a silver bullet, it's just yet
    another tool in the toolbox.
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-07 22:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <alpine.LFD.2.00.0901070743070.3057@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 4576 bytes --]

On Wed, 2009-01-07 at 08:07 -0800, Linus Torvalds wrote:
> Well, that's not necessarily "unfortunate". It does actually end up 
> showing that the objects themselves were apparently never really corrupt.
> 
> So there is no fundamental data structure corruption - because when you 
> copy the repository, it's all good again!
>  - it could be some _temporary_ git corruption caused internally inside a 
>    git process - ie a wild pointer, or perhaps a race condition (but we 
>    don't really use threading in 1.6.0.4 unless you ask for it, and even 
>    then just for pack-file generation)

I have a feeling it's something like this, one of our operations guys
did some research while I was looking at code and he came across this:

        On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote:
        > git-merge is using too much RAM, and failing to malloc() but NOT
        > reporting it.  This is all sorts of bad:
        > 
        >   A) using an unscalable amount of RAM
        >   B) failing to detect malloc() failure
        >   C) reporting file corruption instead
        > I was able to reproduce this.
        >
        > limit ~1.5GB -> corrupt file
        > limit ~3GB -> magically no longer corrupt.
        >
        > The false fail may be limited to git-merge, but git status also  
        > allocates the same amount of RAM.
        > 
        > To temporarily work around this problem, issue this once you log in to
        > a dev box:
        > 
        > tcsh:
        >         limit vmemoryuse 3000000
        > bash:
        >         ulimit -v 3000000
        > 
        > Be gentle.
        

> And quite frankly, since the corruption seems to be site-specific, I 
> really do suspect the second case. Although it's possible, of course, that 
> it could be some compiler issue that makes _your_ binaries have issues 
> even when nobody else sees it.

I think you're correct insofar that our major site-specific alteration
has come up on the mailing list before (okay maybe two site-specific
things). 
	* Our Git repo is ~7.1GB
	* ulimit -v is set to ~1.5G


I think I know how this could be failing and corrupting things (assuming
it's malloc(2) related).


What I'm thinking is that in xmalloc() or one of the other x*()
functions, the malloc(size) is failing because of the ulimits, and then
potentially somewhere it's silently failing or maybe even
accidentally returning one of those "malloc(1)" pointers?

I've got two new tarred repositories from two developers the issue
happened to today, so I'm flush full of sample repositories to try stuff
on :)


> 
> Hmm. That's actually _normal_ under some circumstances. At least with 
> older git versions, or if your .git/index file couldn't be rewritten for 
> some reason - your existing index file contains all the old stat 
> information, and if git cannot (or, in the case of older git version, just 
> will not) refresh it automatically, it will show all the files as changed, 
> even if it's just the inode number that really changed.
> 
> A _normal_ git install should have auto-refreshed the index, though. 
> Unless the tar archive only contained the ".git" directory, and not the 
> checkout?

I believe the issues I noticed when untarring the repo were a red
herring, I did the `git diff` after untarring and I noticed that only a
certain set of files were changed, I'm willing to go so far as to guess
that they were the files affected in the corrupted packs. Of the 32k
files in our repository, 98 were actually different after untarring
(according to git-diff(1))

> And nobody else saw it than this one person, and it was a total mystery to 
> everybody until we realized that he used this one feature that nobody else 
> was using. So as you're on OS X, I assume you don't have CRLF conversion, 
> but maybe you use some other feature that we support but nobody really 
> actually uses. Like keyword expansion or something?

The two new folks this happened to today had nothing "special" about
them other than the ulimit.


I've got the script(1) output of performing git-ls-files(1) and some
other commands that I tried, nothing they output was particularly
informative or interesting, and I don't think it will help if this
really is a memory related issue, that said I'd be more than happy to
send it to a couple of you (Junio, Linus, Nico).


I'm *so* ready for this bug to die >=\


Cheers

-- 
-R. Tyler Ballance
Slide, Inc.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-07 23:03 UTC (permalink / raw)
  To: Pierre Habouzit
  Cc: Junio C Hamano, Sam Vilain, Linus Torvalds, davidel,
	Francis Galiegue, Git ML
In-Reply-To: <20090107224504.GA29537@artemis.corp>

Hi,

On Wed, 7 Jan 2009, Pierre Habouzit wrote:

> On Wed, Jan 07, 2009 at 10:00:07PM +0000, Johannes Schindelin wrote:
> > Therefore I counted the lines between conflict markers (actually, a perl 
> > script did).  Of these 66 merges, on average patience merge produced 
> > 4.46774193548387 _fewer_ lines between conflict markers.
> > 
> > Take that with a grain of salt, though: the standard deviation of this 
> > difference is a hefty 121.163046639509 lines.
> > 
> > The worst case for patience diff was the merge 
> > 4698ef555a1706fe322a68a02a21fb1087940ac3, where the --cc diff line counts 
> > are 1300 (without) vs 1301 (with patience merge), but the lines between 
> > conflict markers are 197 vs a ridiculous 826 lines!
> > 
> > But you should take that also with a grain of salt: this merge is a 
> > _subtree_ merge, and my test redid it as a _non-subtree_ merge.
> > 
> > So I restricted the analysis to the non-subtree merges, and now 
> > non-patience merge comes out 6.97297297297297 conflict lines fewer than 
> > patience merge, with a standard deviation of 58.941106657867 (with a total 
> > count of 37 merges).
> > 
> > Note that ~7 lines difference with a standard deviation of ~59 lines is 
> > pretty close to ~0 lines difference.
> > 
> > In the end, the additional expense of patience merge might just not be 
> > worth it.
> 
> Depends, if it can help generating nicer merges, it's good to have.
> 
> We could have an option to git-merge that tries hard to generate the
> smallest conflict possible.

I also showed you examples, in addition to numbers, exactly to point out 
that shorter conflicts do not necessarily mean nicer conflicts.
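
As for the numbers themselves, here is a rough sketch of why a ~7 line
difference with a ~59 line standard deviation is so close to noise.  It
assumes the 37 per-merge differences are independent samples, which
nothing in this thread verifies, and the function names are only for
illustration.

```python
import math


def standard_error(std_dev, n):
    """Standard error of the mean for n independent samples."""
    return std_dev / math.sqrt(n)


def mean_in_standard_errors(mean_diff, std_dev, n):
    """How many standard errors the mean difference lies from zero."""
    return mean_diff / standard_error(std_dev, n)


# Figures quoted above for the non-subtree merges.
mean_diff = 6.97297297297297
std_dev = 58.941106657867
n = 37

se = standard_error(std_dev, n)
ratio = mean_in_standard_errors(mean_diff, std_dev, n)
print(f"standard error of the mean: {se:.2f} lines")
print(f"mean difference: {ratio:.2f} standard errors from zero")
```

With these inputs the mean difference comes out below one standard
error, so under those assumptions it cannot be told apart from zero.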

Ciao,
Dscho

^ permalink raw reply

* Re: Google Summer of Code 2009
From: Miklos Vajna @ 2009-01-07 23:11 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git
In-Reply-To: <20090107183033.GB10790@spearce.org>

[-- Attachment #1: Type: text/plain, Size: 297 bytes --]

On Wed, Jan 07, 2009 at 10:30:33AM -0800, "Shawn O. Pearce" <spearce@spearce.org> wrote:
> The ideas box is once again open for suggestions.  Please start
> proposing student projects, and possible mentors.

I think restartable git-clone from last year is still relevant, and would
be nice to have.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: Google Summer of Code 2009
From: Alex Riesen @ 2009-01-07 23:12 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git
In-Reply-To: <20090107183033.GB10790@spearce.org>

2009/1/7 Shawn O. Pearce <spearce@spearce.org>:
>
>  Organization ideas page:
>    http://git.or.cz/gitwiki/SoC2009Ideas
>

BTW, what happened to GitTorrent?

^ permalink raw reply

* Re: Google Summer of Code 2009
From: Shawn O. Pearce @ 2009-01-07 23:14 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git
In-Reply-To: <81b0412b0901071512k64a7d5e2u2c602b903f5233d3@mail.gmail.com>

Alex Riesen <raa.lkml@gmail.com> wrote:
> 2009/1/7 Shawn O. Pearce <spearce@spearce.org>:
> >
> >  Organization ideas page:
> >    http://git.or.cz/gitwiki/SoC2009Ideas
> 
> BTW, what happened to GitTorrent?

I got lazy and didn't copy everything over.  ;-)

GitTorrent and restartable clone both should probably be on the 2009
idea list, though GitTorrent already has a code base from the failed
2008 project that someone might be able to start and pick up from...

-- 
Shawn.

^ permalink raw reply

* [PATCH] gitweb: support the rel=vcs-* microformat
From: Joey Hess @ 2009-01-07 23:24 UTC (permalink / raw)
  To: git
In-Reply-To: <20090107190238.GA3909@gnu.kitenet.net>

The rel=vcs-* microformat allows a web page to indicate the locations of
repositories related to it in a machine-parseable manner.
(See http://kitenet.net/~joey/rfc/rel-vcs/)

Make gitweb use the microformat if it has been configured with project url
information in any of the usual ways. On the project summary page, the
repository URL display is simply marked up using the microformat. On the
project list page and forks list page, the microformat is embedded in the
header, since the URLs do not appear on the page.

The microformat could be included on other pages too, but I've skipped
doing so for now, since it would mean reading another file for every page
displayed.

There is a small overhead in including the microformat on project list
and forks list pages, but getting the project descriptions for those pages
already incurs a similar overhead, and the ability to get every repo url
in one place seems worthwhile.

This changes git_get_project_url_list() to not check wantarray, and only
return in list context -- the only way it is used AFAICS. It memoizes
both that function and git_get_project_description(), to avoid redundant
file reads.
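
For readers less used to the Perl closure idiom, the memoization amounts
to roughly the following, sketched here in Python rather than the Perl
of the patch.  The names make_cached_reader and read_from_disk are made
up for this sketch and are not gitweb functions.

```python
def make_cached_reader(read_from_disk):
    """Wrap a per-project reader so each path is read at most once."""
    cache = {}  # path -> value, private to the returned function

    def cached_read(path):
        # Test for key presence, not truthiness, so that an empty or
        # missing description (None) is remembered as well.
        if path not in cache:
            cache[path] = read_from_disk(path)
        return cache[path]

    return cached_read
```

The cache lives for as long as the wrapped function does, so, as in the
patch, it only saves work when the same project is looked up more than
once while rendering a page.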

Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
---
 gitweb/gitweb.perl |   78 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 62 insertions(+), 16 deletions(-)

This incorporates Giuseppe Bilotta's feedback, and uses new features
of the microformat. You can see this version running at
http://git.ikiwiki.info/
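
To give an idea of what "machine-parseable" buys a consumer, here is a
minimal sketch of a client pulling repository URLs out of such a page.
It only looks for the rel="vcs-git" value used in this patch, on <link>
and <a> elements; the class and function names and the example.org URLs
are invented for the sketch, and it makes no claim to cover the whole
rel-vcs draft.

```python
from html.parser import HTMLParser


class VcsLinkParser(HTMLParser):
    """Collect (href, title) pairs from elements marked rel="vcs-git"."""

    def __init__(self):
        super().__init__()
        self.repos = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens.
        rels = (attr.get("rel") or "").split()
        if "vcs-git" in rels and attr.get("href"):
            self.repos.append((attr["href"], attr.get("title")))


def find_git_repos(html):
    """Return every (url, title) advertised as a git repository."""
    parser = VcsLinkParser()
    parser.feed(html)
    parser.close()
    return parser.repos


if __name__ == "__main__":
    # Hypothetical page fragment in the shape this patch emits.
    page = (
        '<head><link rel="vcs-git" href="git://example.org/foo.git" '
        'title="foo - demo project"/></head>'
    )
    for url, title in find_git_repos(page):
        print(url, "-", title)
```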

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 99f71b4..c238717 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2020,9 +2020,14 @@ sub git_get_path_by_hash {
 ## ......................................................................
 ## git utility functions, directly accessing git repository
 
+{
+my %project_descriptions; # cache
+
 sub git_get_project_description {
 	my $path = shift;
 
+	return $project_descriptions{$path} if exists $project_descriptions{$path};
+
 	$git_dir = "$projectroot/$path";
 	open my $fd, "$git_dir/description"
 		or return git_get_project_config('description');
@@ -2031,7 +2036,9 @@ sub git_get_project_description {
 	if (defined $descr) {
 		chomp $descr;
 	}
-	return $descr;
+	return $project_descriptions{$path}=$descr;
+}
+
 }
 
 sub git_get_project_ctags {
@@ -2099,18 +2106,30 @@ sub git_show_project_tagcloud {
 	}
 }
 
+{
+my %project_url_lists; # cache
+
 sub git_get_project_url_list {
+	# use per project git URL list in $projectroot/$path/cloneurl
+	# or make project git URL from git base URL and project name
 	my $path = shift;
 
+	return @{$project_url_lists{$path}} if exists $project_url_lists{$path};
+
+	my @ret;
 	$git_dir = "$projectroot/$path";
-	open my $fd, "$git_dir/cloneurl"
-		or return wantarray ?
-		@{ config_to_multi(git_get_project_config('url')) } :
-		   config_to_multi(git_get_project_config('url'));
-	my @git_project_url_list = map { chomp; $_ } <$fd>;
-	close $fd;
+	if (open my $fd, "$git_dir/cloneurl") {
+		@ret = map { chomp; $_ } <$fd>;
+		close $fd;
+	} else {
+	       @ret = @{ config_to_multi(git_get_project_config('url')) };
+	}
+	@ret=map { "$_/$project" } @git_base_url_list if ! @ret;
+
+	$project_url_lists{$path}=\@ret;
+	return @ret;
+}
 
-	return wantarray ? @git_project_url_list : \@git_project_url_list;
 }
 
 sub git_get_projects_list {
@@ -2856,6 +2875,7 @@ sub blob_contenttype {
 sub git_header_html {
 	my $status = shift || "200 OK";
 	my $expires = shift;
+	my $extraheader = shift;
 
 	my $title = "$site_name";
 	if (defined $project) {
@@ -2953,6 +2973,8 @@ EOF
 		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
 	}
 
+	print $extraheader if defined $extraheader;
+
 	print "</head>\n" .
 	      "<body>\n";
 
@@ -4365,6 +4387,26 @@ sub git_search_grep_body {
 	print "</table>\n";
 }
 
+sub git_link_title {
+	my $project=shift;
+	
+	my $description=git_get_project_description($project);
+	return $project.(length $description ? " - $description" : "");
+}
+
+# generates header with links to the specified projects
+sub git_links_header {
+	my $ret='';
+	foreach my $project (@_) {
+		# rel=vcs-* microformat
+		my $title=git_link_title($project);
+		foreach my $url (git_get_project_url_list($project)) {
+			$ret.=qq{<link rel="vcs-git" href="$url" title="$title"/>\n}
+		}
+	}
+	return $ret;
+}
+
 ## ======================================================================
 ## ======================================================================
 ## actions
@@ -4380,7 +4422,9 @@ sub git_project_list {
 		die_error(404, "No projects found");
 	}
 
-	git_header_html();
+	my $extraheader=git_links_header(map { $_->{path} } @list);
+
+	git_header_html(undef, undef, $extraheader);
 	if (-f $home_text) {
 		print "<div class=\"index_include\">\n";
 		insert_file($home_text);
@@ -4405,8 +4449,10 @@ sub git_forks {
 	if (!@list) {
 		die_error(404, "No forks found");
 	}
+	
+	my $extraheader=git_links_header(map { $_->{path} } @list);
 
-	git_header_html();
+	git_header_html(undef, undef, $extraheader);
 	git_print_page_nav('','');
 	git_print_header_div('summary', "$project forks");
 	git_project_list_body(\@list, $order);
@@ -4468,14 +4514,14 @@ sub git_summary {
 		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
 	}
 
-	# use per project git URL list in $projectroot/$project/cloneurl
-	# or make project git URL from git base URL and project name
 	my $url_tag = "URL";
-	my @url_list = git_get_project_url_list($project);
-	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
-	foreach my $git_url (@url_list) {
+	my $title=git_link_title($project);
+	foreach my $git_url (git_get_project_url_list($project)) {
 		next unless $git_url;
-		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
+		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>".
+		      # rel=vcs-* microformat
+		      "<a rel=\"vcs-git\" href=\"$git_url\" title=\"$title\">$git_url</a>".
+		      "</td></tr>\n";
 		$url_tag = "";
 	}
 
-- 
1.5.6.5



-- 
see shy jo

^ permalink raw reply related

* Re: Google Summer of Code 2009
From: Johannes Schindelin @ 2009-01-07 23:30 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Alex Riesen, git
In-Reply-To: <20090107231431.GC10790@spearce.org>

Hi,

On Wed, 7 Jan 2009, Shawn O. Pearce wrote:

> Alex Riesen <raa.lkml@gmail.com> wrote:
> > 2009/1/7 Shawn O. Pearce <spearce@spearce.org>:
> > >
> > >  Organization ideas page:
> > >    http://git.or.cz/gitwiki/SoC2009Ideas
> > 
> > BTW, what happened to GitTorrent?
> 
> I got lazy and didn't copy everything over.  ;-)

Actually, that would have been lazy. :-)

> GitTorrent and restartable clone both should probably be on the 2009 
> idea list, though GitTorrent already has a code base from the failed 
> 2008 project that someone might be able to start and pick up from...

According to

	http://repo.or.cz/w/VCS-Git-Torrent.git

Joshua is still working on it (albeit slowly).

However, from what Sam said at the GitTogether, it might be a much better 
idea to look at the existing code as a fact-finding experiment, scrap it 
(excluding the experience), and start modifying git-daemon.

AFAICT Sam has a pretty clear idea how to go about it, and staying with C 
should make it much easier for other people to comment.

Note that there has been a flurry of emails on the gittorrent list a few 
weeks back, where somebody challenged the approach Sam wants to take, 
saying that BitTorrent has some very nice features that are absolutely 
necessary, such as its pretty awkward custom encoding.

But AFAICT Sam did a pretty good job at dispelling all of the objections.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-07 23:29 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <1231368935.8870.584.camel@starfruit>



On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
>
> >    git process - ie a wild pointer, or perhaps a race condition (but we 
> >    don't really use threading in 1.6.0.4 unless you ask for it, and even 
> >    then just for pack-file generation)
> 
> I have a feeling it's something like this, one of our operations guys
> did some research while I was looking at code and he came across this:
> 
>         On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote:
>         > git-merge is using too much RAM, and failing to malloc() but NOT
>         > reporting it.  This is all sorts of bad:
>         > 
>         >   A) using an unscalable amount of RAM
>         >   B) failing to detect malloc() failure
>         >   C) reporting file corruption instead

Well, I don't think that's exactly it. git internally doesn't really use 
malloc at all, and uses xmalloc() instead which will die() if the malloc 
fails. So there's almost certainly no "failing to detect failures"

Yes, there's a few places that don't use the wrapper, but they should be 
safe (eg either they SIGSEGV, or they are like create_delta_index() and 
just create a sub-optimal pack with a warning).

HOWEVER:

>         > I was able to reproduce this.
>         >
>         > limit ~1.5GB -> corrupt file
>         > limit ~3GB -> magically no longer corrupt.

That is interesting, although I also worry that there might be other 
issues going on (ie since you've reported things magically fixing 
themselves, maybe the ulimit tests just _happened_ to show that, even if 
it wasn't the core reason).

BUT! This is definitely worth looking at.

For example, we do have some cases where we try to do "mmap()", and if it 
fails, we try to free some memory and try again. In particular, in 
xmmap(), if an mmap() fails - which may be due to running out of virtual 
address space - we'll actually try to release some pack-file memory and 
try again. Maybe there's a bug there - and it would be one that seldom 
triggers for others.
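
The retry pattern described above is roughly the following.  This is
only an illustrative Python sketch of the idea, not git's actual C code;
map_file and release_cached_memory are stand-in callables supplied by
the caller.

```python
def map_with_retry(map_file, release_cached_memory, length):
    """Try to map `length` bytes; on failure free some memory and retry once."""
    try:
        return map_file(length)
    except MemoryError:
        # The first attempt failed, possibly for lack of address space:
        # give back some cached memory, then try exactly once more.
        release_cached_memory(length)
        return map_file(length)  # a second failure propagates to the caller
```

The interesting part for a bug hunt is the except branch: it only runs
when memory is tight, so any mistake in what gets released there would
rarely show up for anybody without a low limit.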

> I think you're correct insofar that our major site-specific alteration
> has come up on the mailing list before (okay maybe two site-specific
> things). 
> 	* Our Git repo is ~7.1GB
> 	* ulimit -v is set to ~1.5G

It is certainly possible. It's too bad that it's private, because it makes 
it _much_ harder to try to pinpoint this.

				Linus

^ permalink raw reply

* Re: Google Summer of Code 2009
From: Alex Riesen @ 2009-01-07 23:40 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, git
In-Reply-To: <alpine.DEB.1.00.0901080024170.7496@intel-tinevez-2-302>

2009/1/8 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
>> GitTorrent and restartable clone both should probably be on the 2009
>> idea list, though GitTorrent already has a code base from the failed
>> 2008 project that someone might be able to start and pick up from...
>
> According to
>
>        http://repo.or.cz/w/VCS-Git-Torrent.git
>
> Joshua is still working on it (albeit slowly).
>
> However, from what Sam said at the GitTogether, it might be a much better
> idea to look at the existing code as a fact-finding experiment, scrap it
> (excluding the experience), and start modifying git-daemon.

Takes courage, saying things like that :)

^ permalink raw reply

* Re: Problems with large compressed binaries when converting from svn
From: Alex Riesen @ 2009-01-07 23:55 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
In-Reply-To: <c09652430901060455l5179888ep3c51ff4e3dd5a6ef@mail.gmail.com>

2009/1/6 Øyvind Harboe <oyvind.harboe@zylin.com>:
> I'm converting from svn and I've run into a
> problem with tar.gz and tar.bz2 compressed files.
>
> (This is separate from, but slightly related to, my previous post).
>
> In subversion we committed large tar.bz2/gz files. These files would
> change relatively rarely, but only very slightly.  The trouble with the tar.bz2
> format is that if the first byte changes, then the rest of the file will also
> be different. .zip does not have this problem, but .zip isn't a very friendly
> format for our purposes.
>
> Later on the tar.bz2/gz files started to change fairly often, but harddrives
> get bigger much more quickly than the .svn repository grows so we just
> kept doing things the same way rather than reeducate and reengineer
> the procedures.
>
> With .git we need to handle this differently somehow.
>
> Does git have some capability to store diffs of compressed files efficiently?

No, but you can unpack the tarballs and include the toolchains as submodules
(aka subprojects) in the projects which need them.

See the man page for git submodule, the user-manual.txt on "submodule" and
gitmodules.txt (submodule configuration formats and conventions).

^ permalink raw reply

* fetch branch blacklist
From: jidanni @ 2009-01-08  0:07 UTC (permalink / raw)
  To: git

If one wants to always fetch all except one remote branch, one cannot
just blacklist it, but must instead whitelist all the rest.
$ git branch -rd origin/man origin/html
Deleted remote branch origin/man.
Deleted remote branch origin/html.
Plus I edited them out of FETCH_HEAD. Nonetheless, back from the dead:
$ git pull
From git://git.kernel.org/pub/scm/git/git
 * [new branch]      html       -> origin/html
 * [new branch]      man        -> origin/man
The only solution is to change .git/config:
[remote "origin"]
	url = git://git.kernel.org/pub/scm/git/git.git
#	fetch = +refs/heads/*:refs/remotes/origin/*
	fetch = +refs/heads/maint:refs/remotes/origin/maint
	fetch = +refs/heads/master:refs/remotes/origin/master
	fetch = +refs/heads/next:refs/remotes/origin/next
	fetch = +refs/heads/pu:refs/remotes/origin/pu
	fetch = +refs/heads/todo:refs/remotes/origin/todo
(Such explicit whitelisting will also sacrifice automatic addition or
even notification, if desired, of future new branches too.)
There is a remote.<name>.skipDefaultUpdate variable, but it probably
isn't fine grained enough.
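
Short of a real blacklist, the whitelist can at least be generated
rather than typed.  A minimal sketch, assuming the remote's branch names
have already been obtained somehow (e.g. from the output of
git ls-remote --heads origin); whitelist_refspecs is a hypothetical
helper, not a git feature:

```python
def whitelist_refspecs(branches, blacklist, remote="origin"):
    """Build one explicit fetch refspec per branch that is not blacklisted."""
    skip = set(blacklist)
    return [
        f"+refs/heads/{branch}:refs/remotes/{remote}/{branch}"
        for branch in branches
        if branch not in skip
    ]


if __name__ == "__main__":
    branches = ["html", "maint", "man", "master", "next", "pu", "todo"]
    for spec in whitelist_refspecs(branches, blacklist=["html", "man"]):
        # Lines in the shape expected under [remote "origin"] in .git/config.
        print(f"\tfetch = {spec}")
```

This still has to be re-run whenever the remote grows a branch, so it
does nothing about the notification problem mentioned above.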

^ permalink raw reply

* collapsing commits with rebase
From: Geoff Russell @ 2009-01-08  0:08 UTC (permalink / raw)
  To: git

Dear gits,

I have a series of commits:

    A---B---C---D---E---F

I want to collapse B---C---D into one single commit. git rebase -i B  will allow
me to do this, but I'm looking for a non-interactive incantation.

Cheers,
Geoff Russell


P.S. The context is a program that performs a single high level
operation on a repository
as a series of commits but then wants to turn  it back into a single
commit without
user intervention so it subsequently looks like a single op in the history.

^ permalink raw reply

* Re: Comments on Presentation Notes Request.
From: Daniel Barkalow @ 2009-01-08  0:14 UTC (permalink / raw)
  To: Tim Visher; +Cc: git
In-Reply-To: <c115fd3c0901061433i78bf3b26v77e5981aada6728e@mail.gmail.com>

On Tue, 6 Jan 2009, Tim Visher wrote:

> Hello Everyone,
> 
> I'm putting together a little 15 minute presentation for my company
> regarding SCMSes in an attempt to convince them to at the very least
> use a Distributed SCMS and at best to use git.  I put together all my
> notes, although I didn't put together the actual presentation yet.  I
> figured I'd post them here and maybe get some feedback about it.  Let
> me know what you think.
> 
> Thanks in advance!
> 
> Notes
> ---------
> 
> SCM: Distributed, Centralized, and Everything in Between.
> 
> * SCM Best Practices
> 
> ** Allow and Encourage Customer Participation
> 
> Most shops seem to attempt to funnel customer participation through
> the developers.  This is a cache miss for many operations such as
> developing the user manual by a design team external to the
> development team.  Basic operations such as commit and update are
> fairly simple to grasp and can even be simplified further through
> scripts and other such tools that non-developers can quickly be taught
> to use.
> 
> Of note is the Tortoise family of tools which integrate directly into
> Windows Explorer.  This makes it fairly easy for anyone who is
> familiar with Windows Explorer to get into using any of the tools that
> there is a Tortoise implementation for.

I still want an office software package with "commit" instead of "save" 
(when in a repository), and a mail program with "push" instead of "attach" 
and "fetch" instead of "open". (See below)

I think that the sales department should be using distributed version 
control, neatly packaged up.

> * The Centralized Model
> 
> ** We Know About This One
> 
> This is traditional, plain vanilla, ubiquitous SCM.
> 
> The great majority of the SCMSes out there are centralized.
> 
> Closely resembles the Client/Server system model.
> 
> ** Work Flow
> 
> <http://whygitisbetterthanx.com/#any-workflow>
> 
> *** 2 basic models: 'Lock, Modify, Unlock' and 'Copy, Modify, Merge'.
> 
> Older systems were primarily Lock, Modify, Unlock implementations.
> You would checkout a file that you intended to work on, and no one
> else would be able to check it out until you unlocked it, signaling
> that you were done editing it.  This is inherently inefficient as on a
> team of developers, the chances that two are working on the exact same
> part of a system without knowing it and coordinating are fairly low.
> Also, any disparate features that still touch the same files in the
> system cannot be worked on simultaneously.
> 
> The answer to this is Copy, Modify, Merge.  In this system, every
> developer gets a complete copy of the HEAD.  Everyone changes the HEAD
> concurrently.  When commits happen, the system attempts to
> intelligently merge them.  If it fails (usually doesn't happen unless
> there is bad coordination), then it asks you to merge them.  This has
> been proven to work well.

Git is almost unique in that, at the point where the user is asked to do a 
merge, the user's work is already preserved.

That is, most systems are: Copy, Modify, Merge, Commit. Git is: Copy, 
Modify, Commit, Merge.

> * The Distributed Model
> 
> ** This One's New
> 
> At least new as in unfamiliar.  The concept is over a decade old.

In some fundamental ways, this actually resembles the "broadcast email" 
collaboration method. That is, a group is writing a document. Someone 
writes a skeleton, and emails it to everybody else. They make changes to 
different sections. When each person has changed something, they email the 
full document to everybody else. Before people send out their 
versions, they check their email and (painfully) merge the changes into 
what they've done.

This evolved into having a certain location to avoid the painful merge, 
and then to version control. Distributed systems go back to this model, 
except without the "(painfully)" and with all the other benefits of 
version control.

> There are a few different popular distributed SCMSes (Git, Mercurial
> (hg), Bazaar (bzr), Bitkeeper)
> 
> Very closely resembles a peer-to-peer network and the organic
> relationships that evolve in that space.
> 
> In a distributed system, there is no one point where all development
> comes together for any reason other than policy.  Everyone who is
> working on a system intrinsically has their own copy of the entire
> repository.  All of the history, all of the source code, all of the
> public branches, all of the public tags, etc.  Because of this,
> developers can also have private branches, private tags, private
> commits, private history.  The distinction between public and private
> is very important in this context.  This has several distinct features
> which I'll go into now.
> 
> ** Work Flow (Pick Your Poison)
> 
> <http://whygitisbetterthanx.com/#any-workflow>
> 
> ** Key Properties
> 
> *** Private/Public Concept
> 
> Distributed SCMSes' Private/Public ontology is __much__ richer.
> Whereas in a central system, private means only what you have yet to
> commit or what you are leaving untracked, in a distributed system,
> private means anything that you have not yet _chosen_ to make public.
> In other words, you can have private branches, private tags, private
> committed changes to your copy of the head, etc.  Anything that you do
> not specifically publish to a location that others can access is
> intrinsically private.
> 
> In other words, you can finally SCM your sandbox!  You can commit as
> many broken things as you want to a private repository, giving you the
> ability to have a nearly infinite set of undoable and recoverable
> changes, without breaking anyone else's build.  Or, you can just as
> easily ignore TDD, never commit anything for 3 weeks and then do a
> big, massive commit and as long as your final product is tested and
> merges with the rest of the tree, you're good to go and no one cares.

Although you'll be really sad if you accidentally wipe out your work after 
2 1/2 weeks...

> Because you have a rich ontology for private/public data, you can also
> do crazy things like rewriting your local history before anyone else
> sees it.  Because your repository is the only one that has to know
> about the history as long as you're dealing with private data, this is
> a completely safe (although policy debatable) operation.  Of course,
> once data has been published, you really shouldn't mess with its
> history anymore.

You can also see this as writing a new history. If you knew starting out 
everything that you knew when you finished, you might do things 
differently, and the results would likely be more useful. Writing a new 
history lets you start over from where you started, while being able to 
refer to the final working state that you came up with.

> *** Must Learn New Work Flows.
> 
> In order to fully experience the advantages of distributed systems,
> new work flows must be learned.  In other words, it's possible to use
> distributed systems nearly the exact same way as you use a centralized
> system (you just need to learn new commands), but you don't get many
> of the benefits except the speed improvements.  The real game change
> happens when you realize that you can keep things private until they're
> finished.  Once you realize that, new branching patterns emerge, new
> work flows happen, you commit more often, and have the ability to
> become much looser and freer in your development process.

My experience bringing git to a small company is that people don't need to 
learn new workflows. They can go on with their old workflows and develop 
new ones as they streamline their work. The one exception is really that 
they have to be told that, in git, you commit before merging instead of 
merging before committing.

> *** Impossible To Completely Enforce A Single, Canonical
> Representation of the Code Base.
> 
> By nature, a distributed system cannot enforce a single canonical
> representation of the code base except by policy, and policies can
> always be broken.  Also, any intentionally private data is not backed
> up because it is not shared.  However, backup becomes much simpler
> because you know that no one else is committing to your repository.
> 
> This bears some explanation.  Within a distributed system, you can
> have a single official release point that everyone has blessed (or the
> company has blessed, or the original developer has blessed, or
> whatever).  However, you cannot _stop_ someone else from making a
> release point because their repository is just as valid as yours.  You
> cannot _stop_ developers from sharing code between themselves without
> going out to the official central location.  All you can do is ask
> them not to.

And you might not want to ask them not to. It's really nice to be able to 
reassign a developer to a different task and pass that developer's 
incomplete and not-ready-for-prime-time work to somebody else.

> * Why Git is the Best Choice
> 
> ** (Un)Staged Changes
> 
> Git employs the concept of the Index or Cache or Commit Stage.  This
> is also unique to Git, and it's pretty strange for developers coming
> from a system without it.
> 
> Basically, there are 4 states that any content can be in under Git.
> 
> 1. Untracked: This is content that Git is completely unaware of.
> 2. Tracked but Unstaged: This is content that has changed that Git is
> aware of but will not commit on the next commit command.
> 3. Tracked and Staged: This is the same as unstaged except that this
> content will be committed on the next commit.
> 4. Tracked and Committed:  This is content that has not changed since
> the previous commit that Git is aware of.

1, 4, and something in between are normal; the only extra is 
distinguishing 2 and 3.
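
As a rough illustration (a toy model in Python, not how git actually 
stores or compares anything), the four states can be told apart by 
comparing a path's content in the last commit, the index, and the working 
tree. The function name classify and the use of None for "absent" are 
made up for this sketch:

```python
def classify(head, index, worktree):
    """Classify one path from its content in HEAD, the index, and the
    working tree. None means "not present there". Toy model only."""
    if index is None:
        # Git has never been told about this path.
        return "untracked"
    if worktree != index:
        # Changed on disk, but the change has not been staged.
        return "tracked but unstaged"
    if index != head:
        # The staged content differs from the last commit.
        return "tracked and staged"
    # Nothing differs anywhere.
    return "tracked and committed"


states = [
    classify(None, None, "new file"),
    classify("v1", "v1", "v2"),
    classify("v1", "v2", "v2"),
    classify("v1", "v1", "v1"),
]
```

In real git a single file can have both staged and unstaged changes at 
once; this sketch just reports the first difference it finds. Only the 
middle two checks involve the index, which is the "extra" part.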

> This is very powerful yet somewhat awkward to grasp.  Basically, the
> upshot of this feature is that you can manually build commits if you
> want to.  Say you were working on feature foo and then made some other
> changes because you came across feature bar and thought it would be
> quick to do.  In any other system, the only way you could commit parts
> of what you'd changed is if you were lucky enough for the disparate
> changes to be in different files.  In that case, you could commit only
> the files that you wanted to change for the different features.
> However, if you made disparate changes to the same file, you were
> stuck.  In Git, you can stage only parts of the files to an extreme
> degree.  This allows you to create as many commits as you want out of
> a single change set until the whole change set is committed.

It's pretty common for a system to support:

$ (sys) commit <filenames...>

At its core, the index just lets you tell git about those files on 
multiple command lines instead of just one. And it lets you make 
unincluded changes after you give it a file but before you commit. And it 
lets you fabricate the contents that you're putting in. But really, it's 
about being able to list the things to include one-by-one. (Well, really, 
it's about being able to make 100 commits of a 30000-file project in under 
a second, but that's just the original inspiration.)
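
A minimal sketch of that idea in Python, assuming a toy repository where 
"content" is just a string per path. The ToyRepo class and its method 
names are hypothetical and only mimic the behavior described above; they 
are not git's data structures:

```python
class ToyRepo:
    """Toy model: a working tree, an index, and a list of commits."""

    def __init__(self):
        self.worktree = {}   # path -> current content on disk
        self.index = {}      # path -> content as of the last add()
        self.commits = []    # list of (message, snapshot) pairs

    def add(self, path):
        # Staging copies the content as it is right now into the index.
        self.index[path] = self.worktree[path]

    def commit(self, message):
        # A commit records the index, not the working tree.
        snapshot = dict(self.index)
        self.commits.append((message, snapshot))
        return snapshot


repo = ToyRepo()
repo.worktree["foo.c"] = "feature foo"
repo.worktree["bar.c"] = "feature bar"

repo.add("foo.c")                                   # list one file now...
repo.worktree["foo.c"] = "feature foo, later edit"  # ...and keep editing
first = repo.commit("Add feature foo")

repo.add("bar.c")                                   # list another file later
second = repo.commit("Add feature bar")
```

The first commit contains only the staged version of foo.c; the later 
edit and bar.c stay out of it until they are added in their own step.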

> I've found this to be particularly useful when working with an
> existing code base that was not properly formatted.  Often, I'll come
> to a file that has a bunch of wonky white space choices and improperly
> indented logical constructs and I'll just quickly run through it
> correcting that stuff before continuing with the feature I was working
> on.  Afterwards, I'll stage the formatting and commit it, and then
> stage the feature I was working on and commit that.  You may not want
> that kind of control (and if you don't, you don't need to use it), but
> I like it.
> 
> ** Cryptographically Guarantees Content
> 
> One of the most surprising things I learned as I was researching this
> was that most SCMSes do not guarantee that your content does not get
> corrupted.  In other words, if the repository's disk doesn't fail but
> instead just gets corrupted, you'll never know unless you actually
> notice the corruption in the files.  If you have memory corruption
> locally and commit your changes, you just won't know.
>
> Git guarantees absolutely that if corruption happens, you will know
> about it.  It does this by creating SHA-1 hashes of your content and
> then checking to make sure that the SHA-1 hash does not change for an
> object.  The details of this aren't as important as the fact that Git
> is one of the very few systems that do this and it's obviously
> desirable.

You can still get a situation where the content gets corrupted before it 
gets into git, and git happily tracks your corrupt content. But that's 
pretty obvious.
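
For the curious, here is a minimal sketch of the check in Python, assuming 
git's blob object format (a "blob <size>" header, a NUL byte, then the 
content) hashed with SHA-1. The helper names blob_id and is_intact are 
invented for this example; this is not git's code:

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object name git would give a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(expected_id: str, content: bytes) -> bool:
    """Re-hash the content and compare it with the name it was stored under."""
    return blob_id(content) == expected_id


original = b"hello\n"
name = blob_id(original)      # computed when the content enters the repository

corrupted = b"hellp\n"        # one byte flipped on disk afterwards
ok_before = is_intact(name, original)
ok_after = is_intact(name, corrupted)
```

The result of blob_id should match what git hash-object prints for the 
same bytes. Note that if the bytes are already wrong when they are first 
hashed, the name is computed over the wrong bytes and the check passes, 
which is the case described above.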

^ permalink raw reply

* Re: Comments on Presentation Notes Request.
From: Daniel Barkalow @ 2009-01-08  0:28 UTC (permalink / raw)
  To: Boyd Stephen Smith Jr.; +Cc: git
In-Reply-To: <200901071640.06288.bss@iguanasuicide.net>

On Wed, 7 Jan 2009, Boyd Stephen Smith Jr. wrote:

> On Wednesday 2009 January 07 16:30:04 Daniel Barkalow wrote:
> > Git is clever about finding [...]
> > the common ancestor of commits that don't have a common ancestor.
> 
> *confused*
> 
> Please elaborate.

I meant to say "a *unique* closest common ancestor". The clever trick is 
that, if there are multiple common ancestors which aren't closer than each 
other, you can merge those ancestors (based, recursively, on their common 
ancestors) to generate a new commit with merge conflicts in it. You then 
pretend that this commit is the unique common ancestor for 3-way merge. 
This works because, from the point of view of each branch, every merge 
conflict in that commit seems to have been replaced: the conflict region 
is just some arbitrary chunk of text in between other context. The 3-way 
merge output doesn't show the original text (which would be weird junk in 
this case: a merge conflict that didn't really happen, in the middle of 
other merge conflicts), but only the text from the two sides being merged, 
so it's not necessary to resolve the old merge that didn't happen.
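
To show what "multiple common ancestors which aren't closer than each 
other" looks like, here is a small Python sketch over a hand-written 
history. The graph, the commit names, and the functions ancestors and 
merge_bases are all made up for illustration; it only finds the 
candidate ancestors and does not perform the recursive merge itself:

```python
def ancestors(parents, commit):
    """Return the commit plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# Criss-cross history: each side merged the other's earlier commit.
parents = {
    "root": [],
    "x1": ["root"],
    "y1": ["root"],
    "x2": ["x1", "y1"],
    "y2": ["y1", "x1"],
}

bases = merge_bases(parents, "x2", "y2")
inner = merge_bases(parents, "x1", "y1")
```

Here bases holds both x1 and y1, so there is no unique closest ancestor 
for merging x2 and y2. The trick described above would first merge x1 
and y1 (whose own base, inner, is just root) and use the result as the 
ancestor for the real 3-way merge.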

I think all of the other systems, if you have crossing history such that 
there isn't a unique common ancestor, do one of: (a) give up, (b) generate 
conflicts between your change as it stayed in your branch and the same 
change as it went out and came back, or (c) mishandle some cases involving 
reverts.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply

* Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-08  0:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
In-Reply-To: <alpine.LFD.2.00.0901071520330.3057@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 2370 bytes --]

On Wed, 2009-01-07 at 15:29 -0800, Linus Torvalds wrote:
> It is certainly possible. It's too bad that it's private, because it makes 
> it _much_ harder to try to pinpoint this.

My most esteemed colleague (Ken aka kb) who pointed out the memory issue
was on the right path (I think), and I have a reproduction case you can
try with your very own Linux kernel tree!

WOO!

I set ulimit -v really low (150M), and the operations I made got an
mmap(2) fatal error, but there is a sweet spot that I found, see the
transcript below. I basically chose an arbitrary revision from a couple
of weeks ago, and rolled the repository back to that point, then I tried
with iterations of ulimit -v 150, 250, 450, and then back down to 350.

        tyler@grapefruit:~/source/git/linux-2.6> limit
        cputime         unlimited
        filesize        unlimited
        datasize        unlimited
        stacksize       8MB
        coredumpsize    0kB
        memoryuse       2561MB
        maxproc         24564
        descriptors     1024
        memorylocked    64kB
        addressspace    unlimited
        maxfilelocks    unlimited
        sigpending      24564
        msgqueue        819200
        nice            0
        rt_priority     0
        tyler@grapefruit:~/source/git/linux-2.6> export START=56d18e9932ebf4e8eca42d2ce509450e6c9c1666
        tyler@grapefruit:~/source/git/linux-2.6> git reset --hard $START
        HEAD is now at 56d18e9 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
        tyler@grapefruit:~/source/git/linux-2.6> ulimit -v `echo "350 * 1024" | bc -l`
        tyler@grapefruit:~/source/git/linux-2.6> git pull
        error: failed to read object be1b87c70af69acfadb8a27a7a76dfb61de92643 at offset 1850923 from .git/objects/pack/pack-dbe154052997a05499eb6b4fd90b924da68e799a.pack
        fatal: object be1b87c70af69acfadb8a27a7a76dfb61de92643 is corrupted
        tyler@grapefruit:~/source/git/linux-2.6>
        
I've tried this a couple of times, and it does seem to be reproducible.
Let me know if you have any issues reproducing it locally and I'll try
to dig into it more with valgrind or something a bit more pin-pointing
than "ulimit -v && try, try again".


Cheers
-- 
-R. Tyler Ballance
Slide, Inc.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

