Git development

Git development
 help / color / mirror / Atom feed

* Re: BUG?? INSTALL MAKEFILE
From: Lars Sadau @ 2009-01-07 10:44 UTC (permalink / raw)
  To: git
In-Reply-To: <vpqzli45p6q.fsf@bauges.imag.fr>

Matthieu Moy <Matthieu.Moy <at> imag.fr> writes:

> 
> Ted Pavlic <ted <at> tedpavlic.com> writes:
> 
> > According to the INSTALL doc, the default prefix should be ~.

I am the same opinion

> I didn't read that in INSTALL. What I read is that if I only run "make
> install", the prefix is $HOME, which is true. Now, ./configure uses a
> default value which is not the one of the Makefile, but that's another
> point.
> 

May be not, confuses newbies.

------
Lars

^ permalink raw reply

* [PATCH] diff --no-index -q: fix endless loop
From: Thomas Rast @ 2009-01-07 11:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7veizfbnuw.fsf@gitster.siamese.dyndns.org>

We forgot to move to the next argument when parsing -q, getting stuck
in an endless loop.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>

---

Junio C Hamano wrote:
> I'll queue the "--" fix, "-q" fix and this pager fix.  Thanks.

Seems the after-midnight rule indeed has some value.  I pointed out
the argv[i] bug, which I see you have squashed into the -- fix, but
failed to notice that the -q parsing is also missing an i++.

I'm still not convinced of the option's value, but this patch at least
fixes the bug.


 diff-no-index.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/diff-no-index.c b/diff-no-index.c
index 12ff1f1..60ed174 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -207,8 +207,10 @@ void diff_no_index(struct rev_info *revs,
 		int j;
 		if (!strcmp(argv[i], "--no-index"))
 			i++;
-		else if (!strcmp(argv[i], "-q"))
+		else if (!strcmp(argv[i], "-q")) {
 			options |= DIFF_SILENT_ON_REMOVED;
+			i++;
+		}
 		else if (!strcmp(argv[i], "--"))
 			i++;
 		else {
-- 
tg: (3bbe36c..) t/diff-q-endless (depends on: next)

^ permalink raw reply related

* Re: [PATCH (topgit) 1/2] Implement setup_pager just like in git
From: Kirill Smelkov @ 2009-01-07 11:27 UTC (permalink / raw)
  To: martin f krafft; +Cc: git, pasky
In-Reply-To: <20090106203203.GA11274@lapse.rw.madduck.net>

Martin, thanks for your review.

I'll too reply inline:

On Tue, Jan 06, 2009 at 09:32:03PM +0100, martin f krafft wrote:
> Thanks, Kirill, for the patches. A couple of comments inline. I hope
> Petr has a chance to look too.
> 
> also sprach Kirill Smelkov <kirr@landau.phys.spbu.ru> [2009.01.06.1616 +0100]:
> > +# isatty FD
> > +isatty()
> > +{
> > +	tty -s 0<&$1 || return 1
> > +	return 0
> > +}
> 
> You don't need any of the return statements. Functions' return
> values are the return values of the last commands they execute.

Agree, I'll rework isatty to be just

isatty()
{
	tty -s 0<&$1
}

> Since we are not using set -e, just "tty -s 0<&$1" in the body will
> have the same effect.

We _are_ using `set -e` in tg.sh:

http://repo.or.cz/w/topgit.git?a=blob;f=tg.sh;h=8c23d26b9a62ddcc1869bb70299862c32edd4403;hb=HEAD#l254

And also I think `tty -s 0<&$1` is a bit obscure, so maybe it is still
better to put it into a function with well-known name such as isatty?

> 
> > +	# TG_PAGER = GIT_PAGER | PAGER
> > +	# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-03/0792.html
> > +	case ${GIT_PAGER+XXX} in
> > +	'')
> > +		case ${PAGER+XXX} in
> > +		'')
> > +			# both GIT_PAGER & PAGER unset
> 
> I find this very confusing. Why not simply
> 
>   TG_PAGER="${GIT_PAGER:-}"
>   TG_PAGER="${TG_PAGER:-$PAGER}"
> 
> ?

I find it confusing too, but this is needed because they usually do
something like this

    $ GIT_PAGER='' <some-git-command>

to force it to be pagerless.

For example in git-rebase.sh:

http://repo.or.cz/w/git.git?a=blob;f=git-rebase.sh;h=ebd4df3a0e821ddcfd1eabfcaac17f854e172a85;hb=HEAD#l415

And other places.


So I think it would be better to preserve the same semantics for `tg
patch` callers, though it's a pity that it's hard (maybe I'm wrong ?) to
see whether an env var is unset.

I'll add additional note about why this is needed.


> 
> > +     [ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
> 
> What if I set my pager to /bin/cat? But I suppose then there's just
> a wasted FIFO and process, nothing big.

Yes. I'd drop "cat" from here completely, but since I was mimicing
pager.c I left it:

http://repo.or.cz/w/git.git?a=blob;f=pager.c;h=f19ddbc87df04f117cd5e39189c8322fd5f29d68;hb=HEAD#l55

I'm ok with dropping cat. Should we?

> > +	_pager_fifo="$(mktemp -t tg-pager-fifo.XXXXXX)"
> > +	rm "$_pager_fifo" && mkfifo -m 600 "$_pager_fifo"
> 
> There's a race condition here. I can't see a real problem with it,
> though, nor do I know of a better way.

Yes I know and agree here is a race.

But netheir did I knew how to overcome this (I though we'll need mkfifo
to support creating fifos with temp names, but mkfifo lacks support for
this)

The real problem here is that in bash, it's hard to get such constructs
going without fifos at all -- just using pipes like we would do in C,
because:

    exec >file

redirects the rest of current process output to file, but

    exec |less

does _not_ redirect the rest of the current process output through pipe
to less.

Likewise

    exec > >(less)

works (process redirection), but is a bashishm, so = not an option for
us.

Fortunately Adeodato suggested to use `mktemp -d`, so this is reworked
to:

    _pager_fifo_dir="$(mktemp -t -d tg-pager-fifo.XXXXXX)"
    _pager_fifo="$_pager_fifo_dir/0"
    mkfifo -m 600 "$_pager_fifo"

+ appopriate cleanup.

> > +	"$TG_PAGER" < "$_pager_fifo" &
> > +	exec > "$_pager_fifo"		# dup2(pager_fifo.in, 1)
> > +
> > +	# this is needed so e.g. `git diff` will still colorize it's output if
> > +	# requested in ~/.gitconfig with color.diff=auto
> > +	export GIT_PAGER_IN_USE=1
> > +
> > +	# atexit(close(1); wait pager)
> > +	trap "exec >&-; rm $_pager_fifo; wait" EXIT
> 
> Consistency: $_pager_fifo is not passed as a quoted string to rm
> here. In the unlikely event that $TMPDIR contains a space, this
> would fail.

Thanks, corrected.

> I definitely want Petr's opinion on this before I integrate it.

Ok, let's wait what Petr says. Here is improved patch:

Changes since v1:

 o isatty() simplified
 o added a note about different meaning of GIT_PAGER='' and unset
   GIT_PAGER
 o fixed race wrt to mkfifo creation
 o fixed potential whitespace problem in "$_pager_fifo" cleanup


(interdiff)

diff --git a/tg.sh b/tg.sh
index 51a82af..bf9cf5c 100644
--- a/tg.sh
+++ b/tg.sh
@@ -248,8 +248,7 @@ do_help()
 # isatty FD
 isatty()
 {
-	tty -s 0<&$1 || return 1
-	return 0
+	tty -s 0<&$1
 }
 
 # setup_pager
@@ -259,6 +258,7 @@ setup_pager()
 	isatty 1 || return 0
 
 	# TG_PAGER = GIT_PAGER | PAGER
+	# (but differentiate between GIT_PAGER='' and unset variables)
 	# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-03/0792.html
 	case ${GIT_PAGER+XXX} in
 	'')
@@ -283,8 +283,9 @@ setup_pager()
 	# now spawn pager
 	export LESS=${LESS:-FRSX}	# as in pager.c:pager_preexec()
 
-	_pager_fifo="$(mktemp -t tg-pager-fifo.XXXXXX)"
-	rm "$_pager_fifo" && mkfifo -m 600 "$_pager_fifo"
+	_pager_fifo_dir="$(mktemp -t -d tg-pager-fifo.XXXXXX)"
+	_pager_fifo="$_pager_fifo_dir/0"
+	mkfifo -m 600 "$_pager_fifo"
 
 	"$TG_PAGER" < "$_pager_fifo" &
 	exec > "$_pager_fifo"		# dup2(pager_fifo.in, 1)
@@ -294,7 +295,7 @@ setup_pager()
 	export GIT_PAGER_IN_USE=1
 
 	# atexit(close(1); wait pager)
-	trap "exec >&-; rm $_pager_fifo; wait" EXIT
+	trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
 }
 
 ## Startup


>From 2193b7c703c2d31c8739eec617b8c0e8c1d09b79 Mon Sep 17 00:00:00 2001
From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Date: Tue, 6 Jan 2009 17:56:37 +0300
Subject: [PATCH (topgit) v2] Implement setup_pager just like in git

setup_pager() spawns a pager process and redirect the rest of our output
to it.

This will be needed to fix `tg patch` output in the next commit.

Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
---
 tg.sh |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/tg.sh b/tg.sh
index 8c23d26..bf9cf5c 100644
--- a/tg.sh
+++ b/tg.sh
@@ -243,6 +243,60 @@ do_help()
 	fi
 }
 
+## Pager stuff
+
+# isatty FD
+isatty()
+{
+	tty -s 0<&$1
+}
+
+# setup_pager
+# Spawn pager process and redirect the rest of our output to it
+setup_pager()
+{
+	isatty 1 || return 0
+
+	# TG_PAGER = GIT_PAGER | PAGER
+	# (but differentiate between GIT_PAGER='' and unset variables)
+	# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-03/0792.html
+	case ${GIT_PAGER+XXX} in
+	'')
+		case ${PAGER+XXX} in
+		'')
+			# both GIT_PAGER & PAGER unset
+			TG_PAGER=''
+			;;
+		*)
+			TG_PAGER="$PAGER"
+			;;
+		esac
+		;;
+	*)
+		TG_PAGER="$GIT_PAGER"
+		;;
+	esac
+
+	[ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
+
+
+	# now spawn pager
+	export LESS=${LESS:-FRSX}	# as in pager.c:pager_preexec()
+
+	_pager_fifo_dir="$(mktemp -t -d tg-pager-fifo.XXXXXX)"
+	_pager_fifo="$_pager_fifo_dir/0"
+	mkfifo -m 600 "$_pager_fifo"
+
+	"$TG_PAGER" < "$_pager_fifo" &
+	exec > "$_pager_fifo"		# dup2(pager_fifo.in, 1)
+
+	# this is needed so e.g. `git diff` will still colorize it's output if
+	# requested in ~/.gitconfig with color.diff=auto
+	export GIT_PAGER_IN_USE=1
+
+	# atexit(close(1); wait pager)
+	trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
+}
 
 ## Startup
 
-- 
1.6.1.48.ge9b8



Thanks,
Kirill

^ permalink raw reply related

* Re: [PATCH (topgit) 1/2] Implement setup_pager just like in git
From: Thomas Rast @ 2009-01-07 12:24 UTC (permalink / raw)
  To: Kirill Smelkov; +Cc: martin f krafft, git, pasky
In-Reply-To: <20090107112754.GA15158@roro3>

[-- Attachment #1: Type: text/plain, Size: 1198 bytes --]

Kirill Smelkov wrote:
> On Tue, Jan 06, 2009 at 09:32:03PM +0100, martin f krafft wrote:
> > I find this very confusing. Why not simply
> > 
> >   TG_PAGER="${GIT_PAGER:-}"
> >   TG_PAGER="${TG_PAGER:-$PAGER}"
> > 
> > ?
> 
> I find it confusing too, but this is needed because they usually do
> something like this
> 
>     $ GIT_PAGER='' <some-git-command>
> 
> to force it to be pagerless.
[...]
> So I think it would be better to preserve the same semantics for `tg
> patch` callers, though it's a pity that it's hard (maybe I'm wrong ?) to
> see whether an env var is unset.

Admittedly I haven't really studied your patch, but the ${} constructs
can in fact tell empty from unset:

  $ EMPTY=
  $ unset UNDEFINED
  $ echo ${UNDEFINED-foo}
  foo
  $ echo ${UNDEFINED:-foo}
  foo
  $ echo ${EMPTY-foo}

  $ echo ${EMPTY:-foo}
  foo

'man bash' indeed says

  When not performing substring expansion, bash tests for a parameter
  that is unset or null; omitting the colon results in a test only for
  a parameter that is unset.

So I suppose you could use

  ${GIT_PAGER-${PAGER-less}}

or similar.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Sitaram Chamarty @ 2009-01-07 12:29 UTC (permalink / raw)
  To: git
In-Reply-To: <slrngm92hr.72d.sitaramc@sitaramc.homelinux.net>

On 2009-01-07, Sitaram Chamarty <sitaramc@gmail.com> wrote:
>     $ dummy_commit                   
>     $ jeeves dummy_commit

err, helper functions.  Should have mentioned.

I do a lot of playing around with git :-)

^ permalink raw reply

* Re: [RFC/PATCH 2/3] replace_object: add mechanism to replace objects found in "refs/replace/"
From: Johannes Schindelin @ 2009-01-07 12:29 UTC (permalink / raw)
  To: Christian Couder; +Cc: Junio C Hamano, git
In-Reply-To: <20090107084341.1554d8cd.chriscool@tuxfamily.org>

Hi,

On Wed, 7 Jan 2009, Christian Couder wrote:

> diff --git a/replace_object.c b/replace_object.c
> new file mode 100644
> index 0000000..b50890d
> --- /dev/null
> +++ b/replace_object.c
> @@ -0,0 +1,102 @@
> +#include "cache.h"
> +#include "refs.h"
> +
> +static struct replace_object {
> +	unsigned char sha1[2][20];
> +} **replace_object;
> +
> +static int replace_object_alloc, replace_object_nr;
> +
> +static int replace_object_pos(const unsigned char *sha1)
> +{
> +	int lo, hi;
> +	lo = 0;
> +	hi = replace_object_nr;
> +	while (lo < hi) {
> [...]

I suspect that this sorted list should be a hashmap.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Giuseppe Bilotta @ 2009-01-07 12:30 UTC (permalink / raw)
  To: git
In-Reply-To: <20090107042518.GB24735@gnu.kitenet.net>

On Wednesday 07 January 2009 05:25, Joey Hess wrote:

> The rel=vcs microformat allows a web page to indicate the locations of
> repositories related to it in a machine-parseable manner.
> (See http://kitenet.net/~joey/rfc/rel-vcs/)

Interesting idea, I like it. However, I see a problem in the proposed
implementation versus the spec. According to the spec:

"""
The "title" is optional, but recommended if there are multiple, different
repositories linked to on one page. It is a human-readable description of the
repository.
[...]
If there are multiple repositories listed, without titles, tools should assume
they are different repositories.
"""

In this patch you do NOT add titles to the rel=vcs links, which means that
everything works fine only if there is a single URL for each project. If a
project has different URLs, it's going to appear multiple times as _different_
projects to a spec-compliant reader.

A possible solution would be to make @git_url_list into a map keyed by the
project name and having the description and repo URL(s) as values.

Since there is the possibility of different projects having the same
description (e.g. the default one), the link title could be composed of
"$project - $description" rather than simply $description.

Note that both in summary and in project list view you already retrieve the
description, so there are no additional disk hits.

> Make gitweb use the microformat in the header of pages it generates,
> if it has been configured with project url information in any of the usual
> ways.
> 
> Since getting the urls can require hitting disk, I avoided putting the
> microformat on *every* page gitweb generates. Just put it on the project
> summary page, the project list page, and the forks list page.
> The first of these already looks up the urls, so adding the microformat was
> free. There is a small overhead in including the microformat on the
> latter two pages, but getting the project descriptions for those pages
> already incurs a similar overhead, and the ability to get every repo url
> in one place seems worthwhile.
> 
> This changes git_get_project_description() to not check wantarray, and only
> return in list context -- the only way it is used AFAICS.

I assume you mean git_get_project_url_list()?

> 
> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
>  gitweb/gitweb.perl |   38 ++++++++++++++++++++++++++------------
>  1 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99f71b4..3f8a228 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -789,6 +789,9 @@ $git_dir = "$projectroot/$project" if $project;
>  our @snapshot_fmts = gitweb_get_feature('snapshot');
>  @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
>  
> +# populated later with git urls for the project
> +our @git_url_list;
> +
>  # dispatch
>  if (!defined $action) {
>       if (defined $hash) {
> @@ -2100,17 +2103,22 @@ sub git_show_project_tagcloud {
>  }
>  
>  sub git_get_project_url_list {
> +     # use per project git URL list in $projectroot/$path/cloneurl
> +     # or make project git URL from git base URL and project name
>       my $path = shift;
>  
> +     my @ret;
> +
>       $git_dir = "$projectroot/$path";
> -     open my $fd, "$git_dir/cloneurl"
> -             or return wantarray ?
> -             @{ config_to_multi(git_get_project_config('url')) } :
> -                config_to_multi(git_get_project_config('url'));
> -     my @git_project_url_list = map { chomp; $_ } <$fd>;
> -     close $fd;
> +     if (open my $fd, "$git_dir/cloneurl") {
> +             @ret = map { chomp; $_ } <$fd>;
> +             close $fd;
> +     }
> +     else {

Coding style: } else {

> +            @ret = @{ config_to_multi(git_get_project_config('url')) };
> +     }
>  
> -     return wantarray ? @git_project_url_list : \@git_project_url_list;
> +     return @ret ? @ret : map { "$_/$project" } @git_base_url_list;
>  }
>  
>  sub git_get_projects_list {
> @@ -2953,6 +2961,10 @@ EOF
>               print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
>       }
>  
> +     foreach my $url (@git_url_list) {
> +             print qq{<link rel="vcs" type="git" href="$url" />\n};
> +     }
> +
>       print "</head>\n" .
>             "<body>\n";
>  
> @@ -4380,6 +4392,8 @@ sub git_project_list {
>               die_error(404, "No projects found");
>       }
>  
> +     @git_url_list = map { git_get_project_url_list($_->{path}) } @list;
> +
>       git_header_html();
>       if (-f $home_text) {
>               print "<div class=\"index_include\">\n";
> @@ -4400,6 +4414,8 @@ sub git_forks {
>       if (defined $order && $order !~ m/none|project|descr|owner|age/) {
>               die_error(400, "Unknown order parameter");
>       }
> +     
> +     @git_url_list = map { git_get_project_url_list($_->{path}) } @list;
>  
>       my @list = git_get_projects_list($project);
>       if (!@list) {
> @@ -4457,6 +4473,8 @@ sub git_summary {
>               @forklist = git_get_projects_list($project);
>       }
>  
> +     @git_url_list = git_get_project_url_list($project);
> +
>       git_header_html();
>       git_print_page_nav('summary','', $head);
>  
> @@ -4468,12 +4486,8 @@ sub git_summary {
>               print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
>       }
>  
> -     # use per project git URL list in $projectroot/$project/cloneurl
> -     # or make project git URL from git base URL and project name
>       my $url_tag = "URL";
> -     my @url_list = git_get_project_url_list($project);
> -     @url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
> -     foreach my $git_url (@url_list) {
> +     foreach my $git_url (@git_url_list) {
>               next unless $git_url;
>               print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
>               $url_tag = "";

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply

* Re: Problems getting rid of large files using git-filter-branch
From: Johannes Schindelin @ 2009-01-07 12:45 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
In-Reply-To: <c09652430901070215p436db79boae4c56bfa1afbc1a@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 162 bytes --]

Hi,

On Wed, 7 Jan 2009, Øyvind Harboe wrote:

> How would git behave if it ran out of memory?

Something like

	fatal: Out of memory, malloc failed

Ciao,
Dscho

^ permalink raw reply

* [PATCH/RFC v3 0/2] git checkout: optimise away lots of lstat() calls
From: Kjetil Barvik @ 2009-01-07 13:24 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Kjetil Barvik

I have just started to clone some interesting Linux git trees to watch
the development more closely, and therefore also started to use git. I
noticed that 'git checkout' takes some time, and especially that the
'git checkout' command does lots and lots of lstat() calls.

After some more investigation and thinking, I have made 2 patches and
been able to optimise away over 42% of all lstat() calls in some cases
for the 'git checkout' command.  I have not tested other git porcelain
commands for reduced lstat() calls, but I would guess that the more
effective 'lstat_cache()' compared to 'has_leading_symlink_cache()',
should also give better numbers in other cases.

Both patches is against git master, and the git 'make test' test suite
still passes after each patch.

To document the improvement, below is some numbers, which compares
before and after the 2 patches. To reproduce the numbers:

- git clone the Linux git tree to be able to get the Linux tags
  'v2.6.25' and 'v2.6.27'.
- git checkout -b my-v2.6.27 v2.6.27
- git checkout -b my-v2.6.25 v2.6.25

Then, when the current branch is 'my-v2.6.25' do:

  strace -o strace_to27 -T git checkout -q my-v2.6.27

And then you pretty print and collect stats from the 'strace_to27'
file.  If someone wants a copy of the strace_stat.pl script, which I
made/used to do the pretty printing, then give me a hint.

Below is the stats/numbers from the current git version (before the 2
patches).  Notice that we do an lstat() call on the "arch" directory
over 6000 times!

TOTAL      185151 100.000% OK:165544 NOT: 19607  11.136001 sec   60 usec/call
lstat64    120954  65.327% OK:107013 NOT: 13941   5.388727 sec   45 usec/call
  strings  120954 tot  30163 uniq   4.010 /uniq   5.388727 sec   45 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   2.740520 sec   45 usec/call
  dirs      45522 tot   1436 uniq  31.701 /uniq   1.994448 sec   44 usec/call
  errors    13941 tot   5189 uniq   2.687 /uniq   0.653759 sec   47 usec/call
             6297   5.206% OK:  6297 NOT:     0  "arch"
             4544   3.757% OK:  4544 NOT:     0  "drivers"
             1816   1.501% OK:  1816 NOT:     0  "arch/arm"
             1499   1.239% OK:  1499 NOT:     0  "include"
              912   0.754% OK:   912 NOT:     0  "arch/powerpc"
              764   0.632% OK:   764 NOT:     0  "fs"
              746   0.617% OK:   746 NOT:     0  "drivers/net"
              662   0.547% OK:   662 NOT:     0  "net"
              652   0.539% OK:   325 NOT:   327  "arch/sparc/include"
              636   0.526% OK:   636 NOT:     0  "drivers/media"
              606   0.501% OK:   606 NOT:     0  "include/linux"
              533   0.441% OK:   533 NOT:     0  "arch/sh"
              522   0.432% OK:   260 NOT:   262  "arch/powerpc/include"
              488   0.403% OK:   243 NOT:   245  "arch/sh/include"
              413   0.341% OK:   413 NOT:     0  "arch/sparc"
              390   0.322% OK:   390 NOT:     0  "arch/x86"
              383   0.317% OK:   383 NOT:     0  "Documentation"
              370   0.306% OK:   184 NOT:   186  "arch/ia64/include"
              366   0.303% OK:   366 NOT:     0  "drivers/media/video"
              348   0.288% OK:   173 NOT:   175  "arch/arm/include"

Here is the stats/numbers after applying the 2 patches.  Notice how
nice the top 20 entries list now looks!

TOTAL      133655 100.000% OK:121615 NOT: 12040  10.429999 sec   78 usec/call
lstat64     69603  52.077% OK: 63218 NOT:  6385   3.419920 sec   49 usec/call
  strings   69603 tot  30163 uniq   2.308 /uniq   3.419920 sec   49 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   3.034869 sec   49 usec/call
  dirs       1727 tot   1164 uniq   1.484 /uniq   0.075681 sec   44 usec/call
  errors     6385 tot   5189 uniq   1.230 /uniq   0.309370 sec   48 usec/call
                4   0.006% OK:     4 NOT:     0  ".gitignore"
                4   0.006% OK:     4 NOT:     0  ".mailmap"
                4   0.006% OK:     4 NOT:     0  "CREDITS"
                4   0.006% OK:     4 NOT:     0  "Documentation/00-INDEX"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-block"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-firmware-acpi"
                4   0.006% OK:     4 NOT:     0  "Documentation/CodingStyle"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-API.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-mapping.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/Makefile"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/gadget.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-api.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-locking.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs-guide.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs_example.c"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/rapidio.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/s390-drivers.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/uio-howto.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/videobook.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/writing_usb_driver.tmpl"

Note that the overall used system time as recorded from 'strace -T',
does not drop so much that the reduced lstat() time should indicate
for _this_ particular test run.  This is because now each unlink()
call takes much more time, at least for me on an slow ide disk (using
ext3) on a laptop.

A simple test gives me an overall improvement of 2.937 seconds: real
time drops from 28.195s (best of 5 runs with 'time git ...'), to
25.381s (best of 5 runs).

I have also noticed that inside the unlink_entry() function in file
unpack-trees.c, one could often end up calling rmdir() lots and lots
of times on none-empty directories.  Maybe one should schedule each
directory for removal by an appropriate function, and then at the end
call a new function to clean all the directories at once?

Comments?

----
  Changes since v2:
  = inside patch 1/2
      [[Following is updates after comments from Linus Torvalds - Thanks!]]
    - simplified the interface: introduce 2 static inline functions
      has_symlink_leading_path() and has_symlink_or_noent_leading_path()
    - similar, introduce 2 defines: clear_symlink_cache() and
      clear_symlink_or_noent_cache()
    - reorganise the patches: previous patch 2/4 and 4/4 is put into
      this one
    - update the commit message accordingly
    - keep the symlinks.c file
  = inside patch 2/2
    - was patch 3/4 in v2
    - always null terminate the dirs_path array
    - update the patch with some of the comments regarding patch 1/4
      from Junio C Hamano

  Changes since v1:
  = inside patch 1/4
    - always null terminate the cache_path array
    - added a paragraph to the commit message for this patch
    - small cleanup on 2 comments, and a small line indentation change
      [[Following is updates after comments from Junio C Hamano - Thanks!]]
    - removed the 'static inline update_path_cache()' function
    - replaced the else-part of the above inline function with a call
      to the 'clear_lstat_cache()' function.
    - deleted the '|| errno == ENOTDIR' part inside the big while-loop
      inside check_lstat_cache(), and updated the named BIT-fields
      accordingly
  = inside patch 2/4
    - moved a paragraph out from the commit message for this patch and
      into this cover-letter
      [[Following is updates after comments from Junio C Hamano - Thanks!]]
    - Removed the '|LSTAT_NOTDIR' part from the call to lstat_cache()
      inside function 'check_removed()' inside file diff-lib.c


Kjetil Barvik (2):
  Optimised, faster, more effective symlink/directory detection
  create_directories() inside entry.c: only check each directory once!

 builtin-add.c          |    1 +
 builtin-apply.c        |    1 +
 builtin-update-index.c |    1 +
 cache.h                |   23 +++++++++-
 diff-lib.c             |    1 +
 entry.c                |   82 +++++++++++++++++++++++++-------
 symlinks.c             |  123 +++++++++++++++++++++++++++++++-----------------
 unpack-trees.c         |    6 ++-
 8 files changed, 173 insertions(+), 65 deletions(-)

^ permalink raw reply

* [PATCH/RFC v3 1/2] Optimised, faster, more effective symlink/directory detection
From: Kjetil Barvik @ 2009-01-07 13:24 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Kjetil Barvik
In-Reply-To: <1231334689-17135-1-git-send-email-barvik@broadpark.no>

Changes includes the following:

- The cache functionality is more effective.  Previously when A/B/C/D
  was in the cache and A/B/C/E/file.c was called for, there was no
  match at all from the cache.  Now we use the fact that the paths
  "A", "A/B" and "A/B/C" is already tested, and we only need to do an
  lstat() call on "A/B/C/E".

- We only cache/store the last path regardless of it's type.  Since the
  cache functionality is always used with alphabetically sorted names
  (at least it seams so for me), there is no need to store both the
  last symlink-leading path and the last real-directory path.  Note
  that if the cache is not called with (mostly) alphabetically sorted
  names, neither the old, nor this new one, would be very effective.

- We also can cache the fact that a directory does not exist.
  Previously we could end up doing lots of lstat() calls for a removed
  directory which previously contained lots of files.  Since we
  already have simplified the cache functionality and only store the
  last path (see above), this new functionality was easy to add.

- Previously, when symlink A/B/C/S was cached/stored in the
  symlink-leading path, and A/B/C/file.c was called for, it was not
  easy to use the fact that we already known that the paths "A", "A/B"
  and "A/B/C" is real directories.  Since we now only store one single
  path (the last one), we also get similar logic for free regarding
  the new "non-exsisting-directory-cache".

- Avoid copying the first path components of the name 2 zillions times
  when we tests new path components.  Since we always cache/store the
  last path, we can copy each component as we test those directly into
  the cache.  Previously we ended up doing a memcpy() for the full
  path/name right before each lstat() call, and when updating the
  cache for each time we have tested an new path component.

- We also use less memory, that is PATH_MAX bytes less memory on the
  stack and PATH_MAX bytes less memory on the heap.

- Introduce a 3rd argument, 'unsigned int track_flags', to the
  cache-test function, check_lstat_cache().  This new argument can be
  used to tell the cache functionality which types of directories
  should be cached.

- Also introduce a 'void clear_lstat_cache(void)' function, which
  should be used to clean the cache before usage.  If for instance,
  you have changed the types of directories which should be cached,
  the cache could contain a path which was not wanted.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 719de8b... 870961e... M	builtin-add.c
:100644 100644 a8f75ed... d3d001a... M	builtin-apply.c
:100644 100644 5604977... 8907219... M	builtin-update-index.c
:100644 100644 231c06d... 768ba38... M	cache.h
:100644 100644 ae96c64... c9caa0e... M	diff-lib.c
:100644 100644 5a5e781... 2f025c8... M	symlinks.c
:100644 100644 54f301d... 28e2759... M	unpack-trees.c
 builtin-add.c          |    1 +
 builtin-apply.c        |    1 +
 builtin-update-index.c |    1 +
 cache.h                |   22 ++++++++-
 diff-lib.c             |    1 +
 symlinks.c             |  123 +++++++++++++++++++++++++++++++-----------------
 unpack-trees.c         |    5 +-
 7 files changed, 107 insertions(+), 47 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 719de8b0f2d2d831f326d948aa18700e5c474950..870961e8ca4e3d6f9333020083d0a232bccd542c 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -225,6 +225,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	argc = parse_options(argc, argv, builtin_add_options,
 			  builtin_add_usage, 0);
+	clear_symlink_cache();
 	if (patch_interactive)
 		add_interactive = 1;
 	if (add_interactive)
diff --git a/builtin-apply.c b/builtin-apply.c
index a8f75ed3ed411d8cf7a3ec9dfefef7407c50f447..d3d001a96be6e502d6338af4467f7c313370d78e 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -3154,6 +3154,7 @@ int cmd_apply(int argc, const char **argv, const char *unused_prefix)
 	if (apply_default_whitespace)
 		parse_whitespace_option(apply_default_whitespace);
 
+	clear_symlink_cache();
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 		char *end;
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 560497750586ec61be4e34de6dedd9c307129817..8907219fb9cb438113e29ee17854edb5dd4baa4d 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -581,6 +581,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (entries < 0)
 		die("cache corrupted");
 
+	clear_symlink_cache();
 	for (i = 1 ; i < argc; i++) {
 		const char *path = argv[i];
 		const char *p;
diff --git a/cache.h b/cache.h
index 231c06d7726b575f6e522d5b0c0fe43557e8c651..768ba3825f3015828381490b0c387177a4f71578 100644
--- a/cache.h
+++ b/cache.h
@@ -719,7 +719,27 @@ struct checkout {
 };
 
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
-extern int has_symlink_leading_path(int len, const char *name);
+
+#define LSTAT_DIR       (1u << 0)
+#define LSTAT_NOENT     (1u << 1)
+#define LSTAT_SYMLINK   (1u << 2)
+#define LSTAT_LSTATERR  (1u << 3)
+#define LSTAT_ERR       (1u << 4)
+extern unsigned int check_lstat_cache(int len, const char *name,
+				      unsigned int track_flags);
+extern void clear_lstat_cache(void);
+static inline unsigned int has_symlink_leading_path(int len, const char *name)
+{
+	return check_lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_DIR) &
+		LSTAT_SYMLINK;
+}
+#define clear_symlink_cache() clear_lstat_cache()
+static inline unsigned int has_symlink_or_noent_leading_path(int len, const char *name)
+{
+	return check_lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR) &
+		(LSTAT_SYMLINK|LSTAT_NOENT);
+}
+#define clear_symlink_or_noent_cache() clear_lstat_cache()
 
 extern struct alternate_object_database {
 	struct alternate_object_database *next;
diff --git a/diff-lib.c b/diff-lib.c
index ae96c64ca209f4df9008198e8a04b160bed618c7..c9caa0e6ef0f4a8ee8b850869ef6d0f52b712385 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -69,6 +69,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		diff_unmerged_stage = 2;
 	entries = active_nr;
 	symcache[0] = '\0';
+	clear_symlink_cache();
 	for (i = 0; i < entries; i++) {
 		struct stat st;
 		unsigned int oldmode, newmode;
diff --git a/symlinks.c b/symlinks.c
index 5a5e781a15d7d9cb60797958433eca896b31ec85..2f025c87b0be0e713af7b1400e71284a22c25e9f 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -1,64 +1,99 @@
 #include "cache.h"
 
-struct pathname {
-	int len;
-	char path[PATH_MAX];
-};
+static char         cache_path[PATH_MAX];
+static int          cache_len   = 0;
+static unsigned int cache_flags = 0;
 
-/* Return matching pathname prefix length, or zero if not matching */
-static inline int match_pathname(int len, const char *name, struct pathname *match)
+static inline int
+greatest_common_path_cache_prefix(int len, const char *name)
 {
-	int match_len = match->len;
-	return (len > match_len &&
-		name[match_len] == '/' &&
-		!memcmp(name, match->path, match_len)) ? match_len : 0;
-}
+	int max_len, match_len = 0, i = 0;
 
-static inline void set_pathname(int len, const char *name, struct pathname *match)
-{
-	if (len < PATH_MAX) {
-		match->len = len;
-		memcpy(match->path, name, len);
-		match->path[len] = 0;
+	max_len = len < cache_len ? len : cache_len;
+	while (i < max_len && name[i] == cache_path[i]) {
+		if (name[i] == '/') match_len = i;
+		i++;
 	}
+	if (i == cache_len && len > cache_len && name[cache_len] == '/')
+		match_len = cache_len;
+	return match_len;
 }
 
-int has_symlink_leading_path(int len, const char *name)
+/*
+ * Check if name 'name' of length 'len' has a symlink leading
+ * component, or if the directory exists and is real, or not.
+ *
+ * To speed up the check, some information is allowed to be cached.
+ * This is indicated by the 'track_flags' argument.
+ */
+unsigned int
+check_lstat_cache(int len, const char *name, unsigned int track_flags)
 {
-	static struct pathname link, nonlink;
-	char path[PATH_MAX];
+	int match_len, last_slash, max_len;
+	unsigned int match_flags, ret_flags, save_flags;
 	struct stat st;
-	char *sp;
-	int known_dir;
 
-	/*
-	 * See if the last known symlink cache matches.
+	/* Check if match from the cache for 2 "excluding" path types.
 	 */
-	if (match_pathname(len, name, &link))
-		return 1;
+	match_len = last_slash =
+		greatest_common_path_cache_prefix(len, name);
+	match_flags =
+		cache_flags & track_flags & (LSTAT_NOENT|LSTAT_SYMLINK);
+	if (match_flags && match_len == cache_len)
+		return match_flags;
 
-	/*
-	 * Get rid of the last known directory part
+	/* Okay, no match from the cache so far, so now we have to
+	 * check the rest of the path components.
 	 */
-	known_dir = match_pathname(len, name, &nonlink);
-
-	while ((sp = strchr(name + known_dir + 1, '/')) != NULL) {
-		int thislen = sp - name ;
-		memcpy(path, name, thislen);
-		path[thislen] = 0;
+	ret_flags = LSTAT_DIR;
+	max_len = len < PATH_MAX ? len : PATH_MAX;
+	while (match_len < max_len) {
+		do {
+			cache_path[match_len] = name[match_len];
+			match_len++;
+		} while (match_len < max_len && name[match_len] != '/');
+		if (match_len >= max_len)
+			break;
+		last_slash = match_len;
+		cache_path[last_slash] = '\0';
 
-		if (lstat(path, &st))
-			return 0;
-		if (S_ISDIR(st.st_mode)) {
-			set_pathname(thislen, path, &nonlink);
-			known_dir = thislen;
+		if (lstat(cache_path, &st)) {
+			ret_flags = LSTAT_LSTATERR;
+			if (errno == ENOENT)
+				ret_flags |= LSTAT_NOENT;
+		} else if (S_ISDIR(st.st_mode)) {
 			continue;
-		}
-		if (S_ISLNK(st.st_mode)) {
-			set_pathname(thislen, path, &link);
-			return 1;
+		} else if (S_ISLNK(st.st_mode)) {
+			ret_flags = LSTAT_SYMLINK;
+		} else {
+			ret_flags = LSTAT_ERR;
 		}
 		break;
 	}
-	return 0;
+
+	/* At the end update the cache.  Note that max 3 different
+	 * path types can be cached for the moment!
+	 */
+	save_flags = ret_flags & track_flags &
+		(LSTAT_NOENT|LSTAT_SYMLINK|LSTAT_DIR);
+	if (save_flags && last_slash > 0 && last_slash < PATH_MAX) {
+		cache_path[last_slash] = '\0';
+		cache_len   = last_slash;
+		cache_flags = save_flags;
+	} else {
+		clear_lstat_cache();
+	}
+	return ret_flags;
+}
+
+/*
+ * Before usage of the check_lstat_cache() function one should call
+ * clear_lstat_cache() (at an appropriate place) to make sure that the
+ * cache is clean.
+ */
+void clear_lstat_cache(void)
+{
+	cache_path[0] = '\0';
+	cache_len     = 0;
+	cache_flags   = 0;
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301da67be879c80426bc21776427fdd38c02e..28e275981a21b033459ef9c7e420cce4bf7e5513 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -61,7 +61,7 @@ static void unlink_entry(struct cache_entry *ce)
 	char *cp, *prev;
 	char *name = ce->name;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
 		return;
 	if (unlink(name))
 		return;
@@ -105,6 +105,7 @@ static int check_updates(struct unpack_trees_options *o)
 		cnt = 0;
 	}
 
+	clear_symlink_or_noent_cache();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -584,7 +585,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 	if (o->index_only || o->reset || !o->update)
 		return 0;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH/RFC v3 2/2] create_directories() inside entry.c: only check each directory once!
From: Kjetil Barvik @ 2009-01-07 13:24 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Kjetil Barvik
In-Reply-To: <1231334689-17135-1-git-send-email-barvik@broadpark.no>

When we do an 'git checkout' after some time we end up in the
'checkout_entry()' function inside entry.c, and from here we call the
'create_directories()' function to make sure the all the directories
exists for the possible new file or entry.

The 'create_directories()' function happily started to check that all
path component exists.  This resulted in tons and tons of calls to
lstat() or stat() when we checkout files nested deep inside a
directory.

We try to avoid this by remembering the last checked and possible
newly created directory.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 768ba38... ec1297f... M	cache.h
:100644 100644 aa2ee46... 36d6f98... M	entry.c
:100644 100644 28e2759... 0a03e65... M	unpack-trees.c
 cache.h        |    1 +
 entry.c        |   82 +++++++++++++++++++++++++++++++++++++++++++------------
 unpack-trees.c |    1 +
 3 files changed, 66 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index 768ba3825f3015828381490b0c387177a4f71578..ec1297ff5621cc9eb7fce51cc025f18a030ac9ea 100644
--- a/cache.h
+++ b/cache.h
@@ -718,6 +718,7 @@ struct checkout {
 		 refresh_cache:1;
 };
 
+extern void clear_created_dirs_cache(void);
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
 
 #define LSTAT_DIR       (1u << 0)
diff --git a/entry.c b/entry.c
index aa2ee46a84033585d8e07a585610c5a697af82c2..36d6f98c1f59a86a5e9c117e1181c1169208168f 100644
--- a/entry.c
+++ b/entry.c
@@ -1,33 +1,67 @@
 #include "cache.h"
 #include "blob.h"
 
-static void create_directories(const char *path, const struct checkout *state)
+static char dirs_path[PATH_MAX];
+static int  dirs_len = 0;
+
+static inline int
+greatest_common_created_dirs_prefix(int len, const char *name)
 {
-	int len = strlen(path);
-	char *buf = xmalloc(len + 1);
-	const char *slash = path;
+	int max_len, match_len = 0, i = 0;
 
-	while ((slash = strchr(slash+1, '/')) != NULL) {
-		struct stat st;
-		int stat_status;
+	max_len = len < dirs_len ? len : dirs_len;
+	while (i < max_len && name[i] == dirs_path[i]) {
+		if (name[i] == '/') match_len = i;
+		i++;
+	}
+	if (i == dirs_len && len > dirs_len && name[dirs_len] == '/')
+		match_len = dirs_len;
+	return match_len;
+}
 
-		len = slash - path;
-		memcpy(buf, path, len);
-		buf[len] = 0;
+void clear_created_dirs_cache(void)
+{
+	dirs_path[0] = '\0';
+	dirs_len     = 0;
+}
 
-		if (len <= state->base_dir_len)
+static void
+create_directories(int len, const char *path, const struct checkout *state)
+{
+	int i, max_len, last_slash, stat_status;
+	struct stat st;
+
+	/* Check the cache for previously checked or created
+	 * directories (and components) within this function.  There
+	 * is no need to check or re-create directory components more
+	 * than once!
+	 */
+	max_len = len < PATH_MAX ? len : PATH_MAX;
+	i = last_slash = greatest_common_created_dirs_prefix(max_len, path);
+
+	while (i < max_len) {
+		do {
+			dirs_path[i] = path[i];
+			i++;
+		} while (i < max_len && path[i] != '/');
+		if (i >= max_len)
+			break;
+		last_slash = i;
+		dirs_path[last_slash] = '\0';
+
+		if (last_slash <= state->base_dir_len)
 			/*
 			 * checkout-index --prefix=<dir>; <dir> is
 			 * allowed to be a symlink to an existing
 			 * directory.
 			 */
-			stat_status = stat(buf, &st);
+			stat_status = stat(dirs_path, &st);
 		else
 			/*
 			 * if there currently is a symlink, we would
 			 * want to replace it with a real directory.
 			 */
-			stat_status = lstat(buf, &st);
+			stat_status = lstat(dirs_path, &st);
 
 		if (!stat_status && S_ISDIR(st.st_mode))
 			continue; /* ok, it is already a directory. */
@@ -38,14 +72,20 @@ static void create_directories(const char *path, const struct checkout *state)
 		 * error codepath; we do not care, as we unlink and
 		 * mkdir again in such a case.
 		 */
-		if (mkdir(buf, 0777)) {
+		if (mkdir(dirs_path, 0777)) {
 			if (errno == EEXIST && state->force &&
-			    !unlink(buf) && !mkdir(buf, 0777))
+			    !unlink(dirs_path) && !mkdir(dirs_path, 0777))
 				continue;
-			die("cannot create directory at %s", buf);
+			die("cannot create directory at %s", dirs_path);
 		}
 	}
-	free(buf);
+	/* Update the cache of already created/checked directories */
+	if (last_slash > 0 && last_slash < PATH_MAX) {
+		dirs_path[last_slash] = '\0';
+		dirs_len = last_slash;
+	} else {
+		clear_created_dirs_cache();
+	}
 }
 
 static void remove_subtree(const char *path)
@@ -55,6 +95,11 @@ static void remove_subtree(const char *path)
 	char pathbuf[PATH_MAX];
 	char *name;
 
+	/* To be utterly safe we invalidate the cache of the
+	 * previously created directories.
+	 */
+	clear_created_dirs_cache();
+
 	if (!dir)
 		die("cannot opendir %s (%s)", path, strerror(errno));
 	strcpy(pathbuf, path);
@@ -201,6 +246,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 
 	memcpy(path, state->base_dir, len);
 	strcpy(path + len, ce->name);
+	len += ce_namelen(ce);
 
 	if (!lstat(path, &st)) {
 		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
@@ -229,6 +275,6 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 			return error("unable to unlink old '%s' (%s)", path, strerror(errno));
 	} else if (state->not_new)
 		return 0;
-	create_directories(path, state);
+	create_directories(len, path, state);
 	return write_entry(ce, path, state, 0);
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 28e275981a21b033459ef9c7e420cce4bf7e5513..0a03e65f9c9d869ab2d8b3c337f032ff2b8e7b2f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -119,6 +119,7 @@ static int check_updates(struct unpack_trees_options *o)
 		}
 	}
 
+	clear_created_dirs_cache();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* Re: [PATCH (topgit) 1/2] Implement setup_pager just like in git
From: Bert Wesarg @ 2009-01-07 14:24 UTC (permalink / raw)
  To: Kirill Smelkov; +Cc: martin f krafft, git, pasky
In-Reply-To: <20090107112754.GA15158@roro3>

On Wed, Jan 7, 2009 at 12:27, Kirill Smelkov <kirr@landau.phys.spbu.ru> wrote:
> Martin, thanks for your review.
> +       # atexit(close(1); wait pager)
> +       trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
I think you need to escape the double quotes.

Bert

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Pierre Habouzit @ 2009-01-07 14:39 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901062037250.30769@pacific.mpi-cbg.de>


[-- Attachment #1.1: Type: text/plain, Size: 1832 bytes --]

On Tue, Jan 06, 2009 at 07:40:02PM +0000, Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 6 Jan 2009, Pierre Habouzit wrote:
> 
> > On jeu, jan 01, 2009 at 04:38:09 +0000, Johannes Schindelin wrote:
> > > 
> > > Nothing fancy, really, just a straight-forward implementation of the
> > > heavily under-documented and under-analyzed paience diff algorithm.
> > 
> > Btw, what is the status of this series ? I see it neither in pu nor in 
> > next. And I would gladly see it included in git.
> 
> AFAICT people wanted to be reasonably sure that it is worth the effort.

Fair enough.

> Although I would like to see it in once it is fleshed out -- even if it 
> does not meet our usefulness standard -- because people said Git is 
> inferior for not providing a patience diff.  If we have --patience, we can 
> say "but we have it, it's just not useful, check for yourself".

Well I believe it's useful, but maybe the standard algorithm could be
tweaked the way Linus proposes to make the "long" lines weight louder or
so.

> Due to the lines being much longer than 80 characters, this example was 
> not useful to me.

I hope the other one was ;)

WRT the leaks, you want to squash the attached patch on the proper
patches of your series (maybe the xdl_free on map.entries could be put
in a hasmap_destroy or similar btw, but valgrind reports no more leaks
in xdiff now). As of timings, with the current implementation, patience
diff is around 30% slower than the default implementation, which is a
bit worse than what I would expect, probably due to the fact that you
had to work around the libxdiff interface I guess.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #1.2: memory-leaks.diff --]
[-- Type: text/plain, Size: 1120 bytes --]

diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index d01cbdd..c2ffb03 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -203,8 +203,10 @@ static struct entry *find_longest_common_sequence(struct hashmap *map)
 	}
 
 	/* No common unique lines were found */
-	if (!longest)
+	if (!longest) {
+		xdl_free(sequence);
 		return NULL;
+	}
 
 	/* Iterate starting at the last element, adjusting the "next" members */
 	entry = sequence[longest - 1];
@@ -213,6 +215,7 @@ static struct entry *find_longest_common_sequence(struct hashmap *map)
 		entry->previous->next = entry;
 		entry = entry->previous;
 	}
+	xdl_free(sequence);
 	return entry;
 }
 
@@ -350,6 +353,7 @@ static int patience_diff(mmfile_t *file1, mmfile_t *file2,
 			env->xdf1.rchg[line1++ - 1] = 1;
 		while(count2--)
 			env->xdf2.rchg[line2++ - 1] = 1;
+		xdl_free(map.entries);
 		return 0;
 	}
 
@@ -361,6 +365,7 @@ static int patience_diff(mmfile_t *file1, mmfile_t *file2,
 		result = fall_back_to_classic_diff(&map,
 			line1, count1, line2, count2);
 
+	xdl_free(map.entries);
 	return result;
 }
 

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply related

* Re: [PATCH (topgit) 1/2] Implement setup_pager just like in git
From: Pierre Habouzit @ 2009-01-07 14:44 UTC (permalink / raw)
  To: Kirill Smelkov; +Cc: martin f krafft, git, pasky
In-Reply-To: <20090107112754.GA15158@roro3>

[-- Attachment #1: Type: text/plain, Size: 977 bytes --]

On Wed, Jan 07, 2009 at 11:27:54AM +0000, Kirill Smelkov wrote:
> Martin, thanks for your review.
> 
> I'll too reply inline:
> 
> On Tue, Jan 06, 2009 at 09:32:03PM +0100, martin f krafft wrote:
> > Thanks, Kirill, for the patches. A couple of comments inline. I hope
> > Petr has a chance to look too.
> > 
> > also sprach Kirill Smelkov <kirr@landau.phys.spbu.ru> [2009.01.06.1616 +0100]:
> > > +# isatty FD
> > > +isatty()
> > > +{
> > > +	tty -s 0<&$1 || return 1
> > > +	return 0
> > > +}
> > 
> > You don't need any of the return statements. Functions' return
> > values are the return values of the last commands they execute.
> 
> Agree, I'll rework isatty to be just
> 
> isatty()
> {
> 	tty -s 0<&$1
> }

why not test -t 0 ? I'm not sure it's POSIX though.


-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH v2] parse-opt: migrate builtin-ls-files.
From: Pierre Habouzit @ 2009-01-07 14:46 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Junio C Hamano, git
In-Reply-To: <1231297894-31809-1-git-send-email-vmiklos@frugalware.org>

[-- Attachment #1: Type: text/plain, Size: 746 bytes --]

On Wed, Jan 07, 2009 at 03:11:34AM +0000, Miklos Vajna wrote:

> +static int option_parse_ignored(const struct option *opt,
> +				const char *arg, int unset)
> +{
> +	struct dir_struct *dir = opt->value;
> +
> +	if (unset)
> +		dir->show_ignored = 0;
> +	else
> +		dir->show_ignored = 1;

dir->show_ignored = !unset ?

> +static int option_parse_directory(const struct option *opt,
> +				  const char *arg, int unset)
> +{

ditto

> +static int option_parse_empty(const struct option *opt,
> +				 const char *arg, int unset)
> +{

ditto


-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* git rebase orthodontics
From: jidanni @ 2009-01-07 14:47 UTC (permalink / raw)
  To: git

In today's adventure we follow junior user me as he sets off in his
first bumper car ride on the wrong side of git-rebase, wherein he
discovers there are no guard rails. git version 1.6.0.6.

$ git branch
* jidanni
  master
$ git commit -m bla --allow-empty
$ git rebase --interactive master
(In the editor I change the single pick offered into a squash)
grep: ....git/rebase-merge/done: No such file or directory
Cannot 'squash' without a previous commit
(OK, but the grep error being shown to the user is a bug.)

$ git show --abbrev-commit --pretty=raw jidanni master
commit 07aef4a...
tree 28f9caca33a8294d36b3d42f21ff472c6126da16
parent 3ad166e006afc3ce57a35b6ac650569e557b024a...
commit 3ad166e...
tree 28f9caca33a8294d36b3d42f21ff472c6126da16
parent 726f8d08e2f7642d56a568eb82a685de0da0baf7
$ EDITOR=cat git rebase --interactive master
pick 07aef4a This is a commit with No files, wow. bla.
# Rebase 3ad166e..07aef4a onto 3ad166e ...
Successfully rebased and updated refs/heads/jidanni.
(But it didn't. git show shows no change. ls -l shows
refs/heads/jidanni was not touched.
OK, it seems like all I am doing is changing
              A jidanni
             /
D---E---F---G master
into the same thing, a noop. But shouldn't it warn and quit, instead
of rewarding me with the success message? Let's try it the other way
around:
$ git checkout master
$ git rebase --interactive jidanni #Wherein one sees:
noop
# Rebase 07aef4a..3ad166e onto 07aef4a
Successfully rebased and updated refs/heads/master.
OK, now I have achieved
D---E---F---G---A master, jidanni
Observations:
When I tried a noop, it didn't say noop in the editor.
When I tried a yesop, it did say noop in the editor.
In both cases it gave the same success message.

^ permalink raw reply

* Re: Problems getting rid of large files using git-filter-branch
From: Nicolas Pitre @ 2009-01-07 15:02 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
In-Reply-To: <c09652430901070026m6ca0ec98ndc7483aac8dfde89@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2045 bytes --]

On Wed, 7 Jan 2009, Øyvind Harboe wrote:

> Here is a summary of the solution I used. I'm a beginner in git
> and just summarizing what others told me and what I did. Use at
> your own risk!
> 
> 1. Remove anything you know should be removed, e.g.:
> 
> git filter-branch --tree-filter 'find . -regex ".*toolchain\..*" -exec
> rm -f {} \;' HEAD
> 
> 2. Expire the log:
> 
> git reflog expire --all
> 
> 3. Delete stuff from .git that should be manually "verified" to be
> correct. I don't actually
> know how to "verify" that at this point... Use backups Luke!
> 
> rm -rf .git/refs/original
> # delete lines w/"refs/original" from .git/packed-refs
> vi .git/packed-refs
> # for good measure...
> git reflog expire --all
> git gc
> 
> 4. Your repository is still huge. By creating a new repository and pulling from
> this one, the garbage will stay in the old one...
> 
> mkdir newrep
> cd newrep
> git init
> git pull file:///oldrep

I'd suggest you skip 2 and 3, and do 4 only.  Using 4 makes 2 
unnecessary, and is far safer than 3.  Manually deleting stuff in .git 
is fine only if you really know what you're doing and have some 
acquaintance with the git internals.

> 5. Check size of .git. If it is still too big, try figuring out which
> files that are big by looking at the packs(.git/objects/pack/xxx):
> 
> $ git verify-pack -v $PACK | grep -v "^chain " | sort -n -k 4
> 
> and then for the last few lines do a
> 
> $ git rev-list --all --objects | grep $SHA1
> 
> 6. Go back to #1 until done.
> 
> Your repository should now be of reasonable size...
> 
> I've found some great scripts for converting from svn/cvs, but really
> the above procedure
> is necessary to run when converting nasty old repositories...
> 
> -- 
> Øyvind Harboe
> http://www.zylin.com/zy1000.html
> ARM7 ARM9 XScale Cortex
> JTAG debugger and flash programmer
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH (topgit) 1/2] Implement setup_pager just like in git
From: Petr Baudis @ 2009-01-07 15:10 UTC (permalink / raw)
  To: Kirill Smelkov, Pierre Habouzit; +Cc: martin f krafft, git
In-Reply-To: <20090107112754.GA15158@roro3>

On Wed, Jan 07, 2009 at 02:27:54PM +0300, Kirill Smelkov wrote:
> >From 2193b7c703c2d31c8739eec617b8c0e8c1d09b79 Mon Sep 17 00:00:00 2001
> From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
> Date: Tue, 6 Jan 2009 17:56:37 +0300
> Subject: [PATCH (topgit) v2] Implement setup_pager just like in git
> 
> setup_pager() spawns a pager process and redirect the rest of our output
> to it.
> 
> This will be needed to fix `tg patch` output in the next commit.
> 
> Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>

But you never use it...?

> ---
>  tg.sh |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 54 insertions(+), 0 deletions(-)
> 
> diff --git a/tg.sh b/tg.sh
> index 8c23d26..bf9cf5c 100644
> --- a/tg.sh
> +++ b/tg.sh
> @@ -243,6 +243,60 @@ do_help()
>  	fi
>  }
>  
> +## Pager stuff
> +
> +# isatty FD
> +isatty()
> +{
> +	tty -s 0<&$1
> +}
> +
> +# setup_pager
> +# Spawn pager process and redirect the rest of our output to it
> +setup_pager()
> +{
> +	isatty 1 || return 0
> +
> +	# TG_PAGER = GIT_PAGER | PAGER
> +	# (but differentiate between GIT_PAGER='' and unset variables)
> +	# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-03/0792.html
> +	case ${GIT_PAGER+XXX} in
> +	'')
> +		case ${PAGER+XXX} in
> +		'')

I'm pretty sure there's been a nice trick for this, but I can't remember
it at all now.

> +			# both GIT_PAGER & PAGER unset
> +			TG_PAGER=''
> +			;;
> +		*)
> +			TG_PAGER="$PAGER"
> +			;;
> +		esac
> +		;;
> +	*)
> +		TG_PAGER="$GIT_PAGER"
> +		;;
> +	esac
> +
> +	[ -z "$TG_PAGER"  -o  "$TG_PAGER" = "cat" ]  && return 0
> +
> +
> +	# now spawn pager
> +	export LESS=${LESS:-FRSX}	# as in pager.c:pager_preexec()
> +
> +	_pager_fifo_dir="$(mktemp -t -d tg-pager-fifo.XXXXXX)"
> +	_pager_fifo="$_pager_fifo_dir/0"
> +	mkfifo -m 600 "$_pager_fifo"
> +
> +	"$TG_PAGER" < "$_pager_fifo" &
> +	exec > "$_pager_fifo"		# dup2(pager_fifo.in, 1)
> +
> +	# this is needed so e.g. `git diff` will still colorize it's output if
> +	# requested in ~/.gitconfig with color.diff=auto
> +	export GIT_PAGER_IN_USE=1
> +
> +	# atexit(close(1); wait pager)
> +	trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
> +}

Frankly, I would have been just much happier if something like git
pager--helper would be provided for external tools to use. Seeing how it
gets reimplemented like this just pains me greatly.

On Wed, Jan 07, 2009 at 03:44:32PM +0100, Pierre Habouzit wrote:
> On Wed, Jan 07, 2009 at 11:27:54AM +0000, Kirill Smelkov wrote:
> > isatty()
> > {
> > 	tty -s 0<&$1
> > }
> 
> why not test -t 0 ? I'm not sure it's POSIX though.

It's SUS for many issues already it seems.

-- 
				Petr "Pasky" Baudis
The average, healthy, well-adjusted adult gets up at seven-thirty
in the morning feeling just terrible. -- Jean Kerr

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Nicolas Pitre @ 2009-01-07 15:31 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Linus Torvalds, Jan Krüger, Git ML
In-Reply-To: <1231314099.8870.415.camel@starfruit>

On Tue, 6 Jan 2009, R. Tyler Ballance wrote:

> On Tue, 2009-01-06 at 20:54 -0800, Linus Torvalds wrote:
> > 
> > On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> > > 
> > > I'll back the patch out and redeploy, it's worth mentioning that a
> > > coworker of mine just got the issue as well (on 1.6.1). He was able to
> > > `git pull` and the error went away, but I doubt that it "magically fixed
> > > itself"
> > 
> > Quite frankly, that behaviour sounds like a disk _cache_ corruption issue. 
> > The fact that some corruption "comes and goes" and sometimes magically 
> > heals itself sounds very much like some disk cache problem, and then that 
> > particular part of the cache gets replaced and then when re-populated it 
> > is magically correct.
> > 
> > We had that in one case with a Linux NFS client, where a rename across 
> > directories caused problems.
> > 
> > This was a networked filesystem on OS X, right? File caching is much more 
> > "interesting" in networked filesystems than it is in normal private 
> > on-disk ones.
> 
> Not quite, what I meant was that some users (not all) who've experienced
> this issue are using Samba to copy files over directly into the Git
> repository. I was mentioning this in case somewhere between Finder,
> Samba, ext3 and Git, some file system change events were pissing Git off
> and causing it.

As long as those files are not within the .git directory that should be 
fine.

> I don't think this is the case as the coworker that I
> mentioned earlier doesn't use Samba and neither do I (we both experience
> the issue today, mine disappeared by upgrading to 1.6.1, his by `git
> pull`).

Problem is that none of that "makes sense".  If a real git corruption 
was there, it wouldn't disappear without explicit action from your part.  
What git v1.6.1 is able to do over earlier versions is to still function 
properly if some corrupted objects have a redundant copy in the same 
repository, but it wouldn't stop complaining about the existence of 
corrupted data.  Doing a 'git gc' or 'git repack -a' might get rid of 
the corruption.  And a 'git pull' might hide it if the received pack 
during the pull operation happens to contain another copy of the object 
that was corrupted before, but it wouldn't prevent 'git fsck --full' 
from seeing it.

The fact that you cannot reproduce the corruption issues after 
unarchiving a repository that was known to have problems right before it 
was archived is really really strange.  That does not rule out git bugs 
of course, but at least this shows that no actual corruption on disk was 
initially involved.

Again, I'd suggest you perform your git usage within a script session so 
to capture the exact operation performed and error messages produced 
when/if similar problems do occur again.  Otherwise we're only running 
after our tail.

Nicolas

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Boyd Stephen Smith Jr. @ 2009-01-07 15:48 UTC (permalink / raw)
  To: git; +Cc: Sitaram Chamarty
In-Reply-To: <slrngm92hr.72d.sitaramc@sitaramc.homelinux.net>

[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]

On Wednesday 2009 January 07 04:55:56 you wrote:
> On 2009-01-07, Boyd Stephen Smith Jr. <bss@iguanasuicide.net> wrote:
> > On Tuesday 06 January 2009, Sitaram Chamarty <sitaramc@gmail.com>
> > wrote=20
> >
> >>    chmod g+ws .git
> >>
> >>Now set the group to something (I use "gitpushers" ;-)
> >>
> >>    chgrp -R gitpushers .git
> >
> > ISTR this breaking here when someone on the team had a umask like 077
> > and=20 was using file:// or ssh:// to push.  I tended up "fixing" things
> > with a=20 cronjob, (which is a bit of a hack) IIRC.
>
> That doesn't sound right.  "git help init" says:
>  - 0xxx: 0xxx is an octal number and each file will have mode 0xxx
>  - 0xxx will override users umask(2) value, and thus, users
>    with a safe umask (0077) can use this option
>  - 0660 is equivalent to group.
>
> So when you say "group", you're saying "0660", and when you
> say "0660", you're overriding users umask value.

Very good then.  It is from when I was significantly less experienced in using 
git and managing repos, so I may have forgotten to use the correct options OR 
it could just have been the version of git I was using (1.4.4.4, IIRC) -- 
still using that in at least one place, as it is the current version in 
Debian Etch.

It's good to know it works properly now, if all the right switches are set.
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Joey Hess @ 2009-01-07 15:50 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git
In-Reply-To: <gk2794$djn$1@ger.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]

Giuseppe Bilotta wrote:
> In this patch you do NOT add titles to the rel=vcs links, which means that
> everything works fine only if there is a single URL for each project. If a
> project has different URLs, it's going to appear multiple times as _different_
> projects to a spec-compliant reader.
> 
> A possible solution would be to make @git_url_list into a map keyed by the
> project name and having the description and repo URL(s) as values.

Yes. I considered doing that, but didn't immediatly see a way to get the
project description w/o additional overhead (of looking it up a second
time).

> > This changes git_get_project_description() to not check wantarray, and only
> > return in list context -- the only way it is used AFAICS.
> 
> I assume you mean git_get_project_url_list()?

In fact yes.
 

Thanks for the feedback. There are some changes happening to the
microformat that should make gitweb's job slightly easier, I'll respin
the patch soon.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: git rebase orthodontics
From: Boyd Stephen Smith Jr. @ 2009-01-07 16:10 UTC (permalink / raw)
  To: jidanni; +Cc: git
In-Reply-To: <87sknvxje8.fsf@jidanni.org>

[-- Attachment #1: Type: text/plain, Size: 559 bytes --]

If you want to go from:
>               A jidanni
>              /
> D---E---F---G master

To:
> D---E---F---G---A master, jidanni

You don't want rebase.  You want 'git checkout master && git merge jidanni'.  
I think you can throw --no-commit on the merge is you want to avoid a non-ff 
update to the master ref.
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: Comments on Presentation Notes Request.
From: Tim Visher @ 2009-01-07 16:11 UTC (permalink / raw)
  To: david; +Cc: git
In-Reply-To: <alpine.DEB.1.10.0901070030320.31038@asgard.lang.hm>

[-- Attachment #1: Type: text/plain, Size: 8698 bytes --]

Thanks for the suggestions so far.  I've updated the notes.

@Peff: Thanks especially for pointing me towards Junio's
presentatation.  That's an excellent source.

Here's the patch for your suggestions:

diff --git a/scmOutline.txt b/scmOutline.txt
index 1791fa0..d25198c 100644
--- a/scmOutline.txt
+++ b/scmOutline.txt
@@ -1,4 +1,4 @@
-SCM: Distributed, Centralized, and Everything in Between.

+SCM: Centralized, Distributed, and Everything in Between.

 * What is SCM and Why is it Useful?

@@ -20,7 +20,11 @@ Not only is it unlimited, but it's random access.
If you changed a function a w

 Many people can edit the same code base at the same time and know,
without a doubt, that when they pull all those changes together, the
system will merge the content intelligently or inform you of the
conflict and let you merge it.  You don't need to lock files.
Obviously, if there is bad coordination then the possibilities of
conflicts rise, but this should not happen regularly.

-*** Diff Debugging

+*** Software Archeology

+

+With a proper SCMS, it becomes a somewhat trivial operation to
discover the author and reasons for a given change.  This is because
of the rich metadata associated with commits (author, date, complete
change set, diffs, and commentary).  So rather than wandering asking
if anyone remembers doing something and why, you simply commit that
information into the system and then refer to it when you need to.

+

+**** Diff Debugging

 You can find where a bug was introduced by learning how to reproduce
the bug and then doing a binary chop search back through the History
to come to the exact commit that introduced the bug.

@@ -30,11 +34,11 @@ You can find where a bug was introduced by
learning how to reproduce the bug and

 The more you commit, the more fine grained control you have over the
undo feature of SCM.  Most documents that I have read suggested a TDD
approach wherein you commit whenever you have written just enough code
for your test to pass. But...

-** Don't Commit Broken Code (To the Public Tree)

+** Don't _Publish_ Broken Code

 Of primary concern is the fact that your central HEAD should _always_
build.  This is why practices like Continuous Integration and TDD are
so important.  TDD gives you the freedom to be sure that a change you
made hasn't broken anything you weren't expecting it to break.
Continuous Integration allows you to be sure that your whole system
will build every time.  Thus, you should _never_ commit broken code to
the (public) tree.

-Of course, in a centralized system, committing is intrinsically
public.  Even on branches, every time you commit any sort of change,
everyone is able to see it and so you could be breaking the build for
someone (even if it's just yourself and the build system).  One of the
nice features of a distributed system is that your public/private
ontology is much richer and thus allows you to have broken code in
your SCMS.

+Of course, in a centralized system, committing is intrinsically
public.  Even on branches, every time you commit any sort of change,
everyone is able to see it and so you could be breaking the build for
someone (even if it's just yourself and the build system).  One of the
nice features of a distributed system is that your public/private
ontology is much richer and thus allows you to have broken code in
your SCMS, so long as you haven't published it, at no penalty to
anyone but yourself.

 ** Whole Hog

@@ -130,7 +134,9 @@ Once you've published, however, not much changes.
Almost everything except upda

 *** Natural Backup

-Because every developer has a copy of the repository, every developer
you add adds an extra failure point.  The more developers you have,
the more backups you have of the repository.

+Because every developer has a copy of the repository, every developer
you add adds an extra layer of redundancy.  The more developers you
have, the more backups you have of the repository.

+

+An important point to make clear here is that you only are backing up
what everyone is duplicating.  If you have 10 unpublished branches
that no one else has cloned, then those are obviously not backed up.
However, the idea here would be that anything that is being developed
actively by multiple people is backed up by as many developers.  Other
than that, your private data must be backed up by you (which is what
you do anyway, right? ;).

 *** Must Learn New Work Flows.

@@ -148,6 +154,8 @@ This bears some explanation.  Within a distributed
system, you can have a single

 Git's implementation just happens to be wickedly fast.  It's faster
than mercurial, it's faster than bazaar, etc.  Everything, committing,
merging, viewing history, branching, and even updating and and pushing
are all faster.

+This is much more important than just shaving a few seconds off the
operations.  Because Git is so much faster, you begin to do things
differently because of how fast it is.  Git's blazing fast branching
and merging wouldn't matter at all if you never branched and merged
(which is possible), but because their blazing fast you _should_ begin
to branch and merge much more often, which __does__ fundamentally
change the way you develop your code (hopefully for the better).

+

 ** Tracks Content, not Files

 Git tracks content, not files, and it's the only SCMS at the moment
that does this.  This has many effects internally, but the most
apparent effect I know of is that for the first time Git can easily
tell you the history of even a function in a file because Git can tell
you which files that function existed (or does exist) in over the
course of development.

@@ -171,9 +179,9 @@ This is very powerful yet somewhat awkward to
grasp.  Basically, the upshot of t

 I've found this to be particularly useful when working with an
existing code base that was not properly formatted.  Often, I'll come
to a file that has a bunch of wonky white space choices and improperly
indented logical constructs and I'll just quickly run through it
correcting that stuff before continuing with the feature I was working
on.  Afterwords, I'll stage the formatting and commit it, and then
stage the feature I was working on and commit that.  You may not want
that kind of control (and if you don't, you don't need to use it), but
I like it.

-** Excellent Merge algorithms

+** Stupid but _Fast_ Merge Algorithms

-Git has excellent merge algorithms.  This is widely attributed and
doesn't require much explanation.  It was one of Git's original design
goals, and it has been proven by Git's implementation.  Merging in Git
is _much_ less painful than in other systems.

+Merging in Git is _much_ less painful than in other systems.  This is
mainly because of how fast it is and how much data it remembers when
it does a merge.  As opposed to CVS which can't merge a branch twice
because it doesn't remember where the last merge happened, Git keeps
track of that information so you can merge between branches as much as
you want.  Git's philosophy is to make merging as fast and painless as
possible so that you merge early and often enough to not develop
really bad conflicts that are nearly impossible to resolve.

 ** Has powerful 'maintainer tools'

@@ -196,3 +204,4 @@ Git guarantees absolutely that if corruption
happens, you will know about it.  I
 - <http://svnbook.red-bean.com/> - Rolling publish book on
Subversion.  Chapter 1 is a good introduction to general centralized
SCM concepts and principles.

 - <http://www.perforce.com/perforce/bestpractices.html> - An
excellent set of best practices from the Perforce team.  Some of it
(especially the branches) has a distinct centralized lean, but most of
it is quite good.

 - <http://www.bobev.com/PresentationsAndPapers/Common%20SCM%20Patterns.pdf>
- Interesting presentation by Pretzel Logic from 2001 attempting to
outline some common SCM best practices as Patterns.

+- <http://members.cox.net/junkio/200607-ols.pdf> - A presentation by
Junio Hamano (the Git maintainer) at a Linux symposium on what Git is
with some tutorials.

I've also attached it as a file.  It was generated by `git diff -p`.

I'm also looking for anyplace where I'm technically inaccurate.
Unfortunately, I've written a lot of this from things that I've either
read or heard.  I'm mainly experienced with VSS and Subversion (and
both of those to a very small degree), and making a lot of progress
with Git.  I've kind of been swept away by all the energy surrounding
git right now, though, so I'm sure my judgement is somewhat clouded.

Thanks again for your help!

-- 

In Christ,

Timmy V.

http://burningones.com/
http://five.sentenc.es/ - Spend less time on e-mail

[-- Attachment #2: suggestionsPatch01 --]
[-- Type: application/octet-stream, Size: 7915 bytes --]

diff --git a/scmOutline.txt b/scmOutline.txt
index 1791fa0..d25198c 100644
--- a/scmOutline.txt
+++ b/scmOutline.txt
@@ -1,4 +1,4 @@
-SCM: Distributed, Centralized, and Everything in Between.
+SCM: Centralized, Distributed, and Everything in Between.

 * What is SCM and Why is it Useful?

@@ -20,7 +20,11 @@ Not only is it unlimited, but it's random access.  If you changed a function a w

 Many people can edit the same code base at the same time and know, without a doubt, that when they pull all those changes together, the system will merge the content intelligently or inform you of the conflict and let you merge it.  You don't need to lock files.  Obviously, if there is bad coordination then the possibilities of conflicts rise, but this should not happen regularly.

-*** Diff Debugging
+*** Software Archeology
+
+With a proper SCMS, it becomes a somewhat trivial operation to discover the author and reasons for a given change.  This is because of the rich metadata associated with commits (author, date, complete change set, diffs, and commentary).  So rather than wandering asking if anyone remembers doing something and why, you simply commit that information into the system and then refer to it when you need to.
+
+**** Diff Debugging

 You can find where a bug was introduced by learning how to reproduce the bug and then doing a binary chop search back through the History to come to the exact commit that introduced the bug.

@@ -30,11 +34,11 @@ You can find where a bug was introduced by learning how to reproduce the bug and

 The more you commit, the more fine grained control you have over the undo feature of SCM.  Most documents that I have read suggested a TDD approach wherein you commit whenever you have written just enough code for your test to pass. But...

-** Don't Commit Broken Code (To the Public Tree)
+** Don't _Publish_ Broken Code

 Of primary concern is the fact that your central HEAD should _always_ build.  This is why practices like Continuous Integration and TDD are so important.  TDD gives you the freedom to be sure that a change you made hasn't broken anything you weren't expecting it to break.  Continuous Integration allows you to be sure that your whole system will build every time.  Thus, you should _never_ commit broken code to the (public) tree.

-Of course, in a centralized system, committing is intrinsically public.  Even on branches, every time you commit any sort of change, everyone is able to see it and so you could be breaking the build for someone (even if it's just yourself and the build system).  One of the nice features of a distributed system is that your public/private ontology is much richer and thus allows you to have broken code in your SCMS.
+Of course, in a centralized system, committing is intrinsically public.  Even on branches, every time you commit any sort of change, everyone is able to see it and so you could be breaking the build for someone (even if it's just yourself and the build system).  One of the nice features of a distributed system is that your public/private ontology is much richer and thus allows you to have broken code in your SCMS, so long as you haven't published it, at no penalty to anyone but yourself.

 ** Whole Hog

@@ -130,7 +134,9 @@ Once you've published, however, not much changes.  Almost everything except upda

 *** Natural Backup

-Because every developer has a copy of the repository, every developer you add adds an extra failure point.  The more developers you have, the more backups you have of the repository.  
+Because every developer has a copy of the repository, every developer you add adds an extra layer of redundancy.  The more developers you have, the more backups you have of the repository.  
+
+An important point to make clear here is that you only are backing up what everyone is duplicating.  If you have 10 unpublished branches that no one else has cloned, then those are obviously not backed up.  However, the idea here would be that anything that is being developed actively by multiple people is backed up by as many developers.  Other than that, your private data must be backed up by you (which is what you do anyway, right? ;).

 *** Must Learn New Work Flows.

@@ -148,6 +154,8 @@ This bears some explanation.  Within a distributed system, you can have a single

 Git's implementation just happens to be wickedly fast.  It's faster than mercurial, it's faster than bazaar, etc.  Everything, committing, merging, viewing history, branching, and even updating and and pushing are all faster.

+This is much more important than just shaving a few seconds off the operations.  Because Git is so much faster, you begin to do things differently because of how fast it is.  Git's blazing fast branching and merging wouldn't matter at all if you never branched and merged (which is possible), but because their blazing fast you _should_ begin to branch and merge much more often, which __does__ fundamentally change the way you develop your code (hopefully for the better).
+
 ** Tracks Content, not Files

 Git tracks content, not files, and it's the only SCMS at the moment that does this.  This has many effects internally, but the most apparent effect I know of is that for the first time Git can easily tell you the history of even a function in a file because Git can tell you which files that function existed (or does exist) in over the course of development.
@@ -171,9 +179,9 @@ This is very powerful yet somewhat awkward to grasp.  Basically, the upshot of t

 I've found this to be particularly useful when working with an existing code base that was not properly formatted.  Often, I'll come to a file that has a bunch of wonky white space choices and improperly indented logical constructs and I'll just quickly run through it correcting that stuff before continuing with the feature I was working on.  Afterwords, I'll stage the formatting and commit it, and then stage the feature I was working on and commit that.  You may not want that kind of control (and if you don't, you don't need to use it), but I like it.

-** Excellent Merge algorithms
+** Stupid but _Fast_ Merge Algorithms

-Git has excellent merge algorithms.  This is widely attributed and doesn't require much explanation.  It was one of Git's original design goals, and it has been proven by Git's implementation.  Merging in Git is _much_ less painful than in other systems.
+Merging in Git is _much_ less painful than in other systems.  This is mainly because of how fast it is and how much data it remembers when it does a merge.  As opposed to CVS which can't merge a branch twice because it doesn't remember where the last merge happened, Git keeps track of that information so you can merge between branches as much as you want.  Git's philosophy is to make mergining as fast and painless as possible so that you merge early and often enough to not develop really bad conflicts that are nearly impossible to resolve.

 ** Has powerful 'maintainer tools'

@@ -196,3 +204,4 @@ Git guarantees absolutely that if corruption happens, you will know about it.  I
 - <http://svnbook.red-bean.com/> - Rolling publish book on Subversion.  Chapter 1 is a good introduction to general centralized SCM concepts and principles.
 - <http://www.perforce.com/perforce/bestpractices.html> - An excellent set of best practices from the Perforce team.  Some of it (especially the branches) has a distinct centralized lean, but most of it is quite good.
 - <http://www.bobev.com/PresentationsAndPapers/Common%20SCM%20Patterns.pdf> - Interesting presentation by Pretzel Logic from 2001 attempting to outline some common SCM best practices as Patterns.
+- <http://members.cox.net/junkio/200607-ols.pdf> - A presentation by Junio Hamano (the Git maintainer) at a Linux symposium on what Git is with some tutorials.

^ permalink raw reply related

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-07 16:07 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <1231314099.8870.415.camel@starfruit>

On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> > 
> > The thing to do is
> > 
> >  - untar it on some trusted machine with a local disk and a known-good 
> >    filesystem.
> > 
> >    IOW, not that networked samba share.
> > 
> >  - verify that it really does happen on that machine, with that untarred 
> >    image. Because maybe it doesn't. 
> 
> Unfortunately it doesn't

Well, that's not necessarily "unfortunate". It does actually end up 
showing that the objects themselves were apparently never really corrupt.

So there is no fundamental data structure corrupttion - because when you 
copy the repository, it's all good agin!

In other words, that's not a worthless piece of information at all, and it 
does tell us a lot, namely that the corruption was never real long-term 
data corruption of the git object archive, but something local and 
temporary. Again, we're really back to either:

 - it could be some _temporary_ git corruption caused internally inside a 
   git process - ie a wild pointer, or perhaps a race condition (but we 
   don't really use threading in 1.6.0.4 unless you ask for it, and even 
   then just for pack-file generation)

 - or it's the disk cache corruption, and the tar/untar ended up flushing 
   it.

And quite frankly, since the corruption seems to be site-specific, I 
really do suspect the second case. Although it's possible, of course, that 
it could be some compiler issue that makes _your_ binaries have issues 
even when nobody else sees it.

> what I did notice was this when I did a `git
> status` in the directory right after untarring:
>         tyler@grapefruit:~/jburgess_main> git status
>         #
>         # ---impressive amount of file names fly by---
>         # ----snip---
>         #
>         # Untracked files:
>         #   (use "git add <file>..." to include in what will be
>         committed)
>         #
>         #       artwork/
>         #       bt/
>         #       flash/

Hmm. That's actually _normal_ under some circumstances. At least with 
older git versions, or if your .git/index file couldn't be rewritten for 
some reason - your existing index file contains all the old stat 
information, and if git cannot (or, in the case of older git version, just 
will not) refresh it automatically, it will show all the files as changed, 
even if it's just the inode number that really changed.

A _normal_ git install should have auto-refreshed the index, though. 
Unless the tar archive only contained the ".git" directory, and not the 
checkout?

>         tyler@grapefruit:~/jburgess_main>
> 
> Basically, somehow Git thinks that *every* file in the repository is
> deleted at this point. I went ahead and performed a `git reset --hard`
> to see if the issue would manifest itself thereafter, but it did not.

That would be what I'd expect if you had only tarred up .git, although 
then I wouldn't have expected those "Untracked files:". Hmm. Without being 
able to look at the archive, I'm just guessing randomly.

> I did try to do a git-fsck(1), and this is what I got:
>         tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full
>         [1]    19381 segmentation fault  /usr/local/bin/git fsck --full
>         tyler@grapefruit:~/jburgess_main> 

.. and that's the unrelated fsck bug that got fixed later.

> >    The hope is that you caught the corruption in the cache, and it 
> >    actually got written out to the tar-file. But if it _is_ a disk cache 
> >    (well, network cache) issue, maybe the IO required to tar everything up 
> >    was enough to flush it, and the tar-file actually _works_ because it 
> >    got repopulated correctly.
> 
> When I was working through this with Jan, one of the things that we did
> was move the actual object file in .git/objects, they existed so maybe I
> could look into those to check?

Yes. If you have any bad loose objects, if you compare them to the good 
objects with the same name, that's going to be interesting information. 
The pattern of corruption can be very telling. For example, on Linux, a 
disk cache corruption would usually be at 4kB block boundaries, because 
that's the granularity of the cache. While a bit error would be obvious 
etc etc.

> I checked with our operations team, and contrary to my suspicion (your
> NFS comment piqued my curiosity), these disks that are actually on the
> machines are not NFS mounts but rather local disk arrays.

Ok, that generally makes caching much simpler. What filesystem?

Is there anything else that you do that is site-specific and/or slightly 
different? For example, a long time ago we had a bug related to CRLF 
conversion which would cause a use-after-free problem, and that would 
corrupt the data internally to git.

And dobody else saw it than this one person, and it was a total mystery to 
everybody until we realized that he used this one feature that nobody else 
was using. So as you're on OS X, I assume you don't have CRLF conversion, 
but maybe you use some other feature that we support but nobody really 
actually uses. Like keyword expansion or something?

Oh - that would also explain why you got all those entries in "git status" 
that went away when you did a "git reset --hard": if you had some keyword 
expansion (or CRLF) enabled in the original users "~/.gitconfig", that 
checkout would have had expansion/CRLF/whatever conversion, but then when 
you tarred/untarred it on another setup, the expansion would be seen as a 
difference because it wasn't enabled.

Hmm?

		Linus

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-07 16:08 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <alpine.LFD.2.00.0901070743070.3057@localhost.localdomain>

On Wed, 7 Jan 2009, Linus Torvalds wrote:
> 
> And dobody else saw it than this one person, and it was a total mystery to 
> everybody until we realized that he used this one feature that nobody else 
> was using. So as you're on OS X, I assume you don't have CRLF conversion, 
> but maybe you use some other feature that we support but nobody really 
> actually uses. Like keyword expansion or something?
> 
> Oh - that would also explain why you got all those entries in "git status" 
> that went away when you did a "git reset --hard": if you had some keyword 
> expansion (or CRLF) enabled in the original users "~/.gitconfig", that 
> checkout would have had expansion/CRLF/whatever conversion, but then when 
> you tarred/untarred it on another setup, the expansion would be seen as a 
> difference because it wasn't enabled.

Btw, if you untar it again, and just do a "git diff", that should show any 
such effects. Rather than showing just that something changed, it should 
show _how_ it changed.

		Linus

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox