Git development
* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 18:46 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.62.0505191426000.20274@localhost.localdomain>

>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:

NP> On Thu, 19 May 2005, Junio C Hamano wrote:
>> - the command line interface "-M" to read "-M" or "-M[0-9]"
>> (one digit); -M defaults to -M5 and gives the cut-off point at
>> similarity score 5000, -M9 at 9000, etc.

NP> Why not a fractional value instead?  -M1 is 100% the same while -M.95 
NP> allows for some 5% changes.

We are essentially saying the same thing.  Internally the diff core
uses a score between 0 and 10000, but both the single digit proposed
above and a fractional value hide that from the user by normalizing
the scale to something less arbitrary (0-9 in my case, 0-1.0 in
yours).
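
To make the two proposals concrete, here is a minimal sketch in C of how
either spelling could be mapped onto the internal 0-10000 score.  The
function names and the MAX_SCORE and DEFAULT_SCORE constants are
hypothetical and are not taken from the patch; only the 0-10000 range and
the -M5 default come from the discussion above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SCORE 10000     /* internal scale mentioned in the thread */
#define DEFAULT_SCORE 5000  /* plain "-M" behaves like "-M5" */

/* "-M" or "-M<digit>": digit N maps to N * 1000; returns -1 if malformed. */
static int score_from_digit_opt(const char *opt)
{
	assert(opt != NULL);
	if (strncmp(opt, "-M", 2) != 0)
		return -1;
	if (opt[2] == '\0')
		return DEFAULT_SCORE;
	if (opt[2] < '0' || opt[2] > '9' || opt[3] != '\0')
		return -1;
	return (opt[2] - '0') * (MAX_SCORE / 10);
}

/* "-M<fraction>" with the fraction in [0, 1]; returns -1 if malformed. */
static int score_from_fraction_opt(const char *opt)
{
	char *end;
	double frac;

	assert(opt != NULL);
	if (strncmp(opt, "-M", 2) != 0 || opt[2] == '\0')
		return -1;
	frac = strtod(opt + 2, &end);
	if (end == opt + 2 || *end != '\0')
		return -1;
	/* Written this way so that NaN is rejected as well. */
	if (!(frac >= 0.0 && frac <= 1.0))
		return -1;
	return (int)(frac * MAX_SCORE + 0.5);
}
```

Both functions end up on the same internal scale, so the choice between
them only affects what the user types.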





* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 18:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7v4qcz16n6.fsf@assigned-by-dhcp.cox.net>

Replying to myself.

JCH> Oops,... thanks.  I still had some doubts about it and that's
JCH> why I said it was beta, but that is fine.  My doubts are minor:

Here is another "doubt" point.  I am almost embarrassed to ask
this, but what's the right way to express the following?  I
could not figure out how to silence const warnings from gcc
without using the cast there, which defeats the whole point of
const warnings:

	if (!pid) {
		const char *pgm = external_diff();
		if (pgm) {
			if (one && two) {
				const char *exec_arg[9];
				const char **arg = &exec_arg[0];
				*arg++ = pgm;
				*arg++ = name;
				*arg++ = temp[0].name;
				*arg++ = temp[0].hex;
				*arg++ = temp[0].mode;
				*arg++ = temp[1].name;
				*arg++ = temp[1].hex;
				*arg++ = temp[1].mode;
				if (other)
					*arg++ = other;
				*arg = 0;
				execvp(pgm, (char *const*) exec_arg);
			}

Here, pgm and name are const char *, while execvp expects
char *const argv[] as its second argument.
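
As far as I know the cast cannot be avoided in plain C, because
"const char **" does not convert implicitly to "char *const *"; the
POSIX rationale for the exec family notes the same limitation.  One
option is to confine the cast to a single wrapper so that the rest of
the code stays const-clean.  The sketch below assumes that approach;
build_args() and execvp_const() are hypothetical helpers, not functions
in diff.c.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Fill argv with pgm, name and (optionally) other, followed by a NULL
 * terminator.  Returns the number of arguments stored, not counting
 * the terminator.  argv must have room for four pointers.
 */
static size_t build_args(const char **argv, const char *pgm,
			 const char *name, const char *other)
{
	const char **arg = argv;

	assert(argv != NULL && pgm != NULL && name != NULL);
	*arg++ = pgm;
	*arg++ = name;
	if (other)
		*arg++ = other;
	*arg = NULL;
	return (size_t)(arg - argv);
}

/*
 * The one place that casts away the inner const.  execvp() does not
 * modify the strings, so the cast is harmless here.
 */
static int execvp_const(const char *pgm, const char *const *argv)
{
	return execvp(pgm, (char *const *)argv);
}
```

Passing a "const char **" to the "const char *const *" parameter needs
no cast, since that conversion only adds a qualifier.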



* Re: [PATCH] packed delta git
From: Thomas Glanzmann @ 2005-05-19 18:38 UTC (permalink / raw)
  To: git
In-Reply-To: <200505191428.52238.mason@suse.com>

Hello Chris,

> size (du -sh .git)              2.5G                  227M

Wow, that beats BitKeeper in size. What is missing to actually use such
an approach in a distributed environment?

	Thomas


* Re: [PATCH] Detect renames in diff family.
From: Nicolas Pitre @ 2005-05-19 18:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7v4qcz16n6.fsf@assigned-by-dhcp.cox.net>

On Thu, 19 May 2005, Junio C Hamano wrote:

>  - the command line interface "-M" to read "-M" or "-M[0-9]"
>    (one digit); -M defaults to -M5 and gives the cut-off point at
>    similarity score 5000, -M9 at 9000, etc.

Why not a fractional value instead?  -M1 is 100% the same while -M.95 
allows for some 5% changes.

This is clear and not based on some arbitrary level values.

>  - I have been assuming that diff_delta treats its two inputs as
>    read-only but have not verified that myself yet.

It does.


Nicolas


* Re: [PATCH] packed delta git
From: Chris Mason @ 2005-05-19 18:28 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre
In-Reply-To: <200505171857.46370.mason@suse.com>

On Tuesday 17 May 2005 18:57, Chris Mason wrote:
> Hello everyone,
>
> Here's a new version of my packed git patch, diffed on top of Nicolas'
> delta code (link below to that).  It doesn't change the core git commands
> to create packed/delta files, that is done via a new git-pack command.  The
> git-pack usage is very simple:
>
> git-pack [<reference_sha1>:]<target_sha1> [ <next_sha1> ... ]

My original git-pack-changes script didn't properly limit the length of the 
delta chains, so you could use it to create a repo that you can't later read.

The new one below fixes that, and also changes the direction of the delta.
Deltas are now done in reverse, leaving the most recent sha1 as a whole file
and diffing old revisions against it.
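
The chain-length limit can be sketched in C roughly as follows.  This is
only an illustration of the check that add_delta() performs in the script
below, under the assumption that objects are identified by small integer
indexes rather than sha1 strings; try_add_delta(), NO_BASE and MAX_CHAIN
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BASE (-1)
#define MAX_CHAIN 32  /* same limit as the script uses */

/*
 * base[i] is the object that i is stored as a delta against, or NO_BASE
 * if i is stored whole (or has not been seen yet).
 *
 * Try to record "target is a delta against ref".  Returns 1 if the delta
 * was recorded, 0 if target should be stored whole instead.
 */
static int try_add_delta(int *base, size_t n, int ref, int target)
{
	int cur = ref;
	int chain = 0;

	assert(base != NULL);
	assert(ref >= 0 && (size_t)ref < n);
	assert(target >= 0 && (size_t)target < n);

	if (ref == target || base[target] != NO_BASE)
		return 0;	/* nothing to do or already a delta */
	while (base[cur] != NO_BASE) {
		if (base[cur] == target)
			return 0;	/* would create a loop */
		if (++chain > MAX_CHAIN)
			return 0;	/* chain would get too long */
		cur = base[cur];
	}
	base[target] = ref;
	return 1;
}
```

Walking the chain from ref before recording the delta is what keeps a
reader from having to apply an unbounded number of deltas to rebuild one
object.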

The result is the same size (62M for current linux-2.6 git tree) and faster
checkout times for head (9s vs 15s).  I also tested against the bk-cvs patch
set:
                          vanilla            packed/delta
checkout-cache (hot)      (only 1.5G ram)    15s
checkout-cache (cold)     4m30s              1m19s
size (du -sh .git)        2.5G               227M

The steps to pack/delta the 2.6 git tree have changed:

# step one, pack all of HEAD together
git-ls-tree -r HEAD | awk '{print $3}' | xargs git-pack

# step two pack deltas for all revs back to the first commit
git-pack-changes-script | xargs git-pack

# step three, pack a delta from 2.6.12-rc2 to 2.6.11
git-pack-changes-script -t 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 c39ae07f393806ccf406ef966e9a15afc43cc36a | xargs git-pack

New git-pack-changes-script:

--
#!/usr/bin/perl
#
# script to search through the rev-list output and generate delta history
# you can specify either a start and stop commit or two trees to search.
# with no command line args it searches the entire revision history.
# output is suitable for piping to xargs git-pack

use strict;

my $ret;
my $i;
my @wanted = ();
my $argc = scalar(@ARGV);
my $commit;
my $stop;
my %delta = ();
my %packed = ();

sub add_packed($) {
    my ($sha1) = @_;
    if (defined($packed{$sha1})) {
        return 1;
    }
    if (defined($delta{$sha1})) {
        return 1;
    }
    $packed{$sha1} = 1;
    print "$sha1\n";
    return 0;
}

sub add_delta($$) {
    my ($ref, $target) = @_;
    my $chain = 0;
    my $recur = $ref;
    if (defined($delta{$target})) {
        return 1;
    }
    if (defined($packed{$target})) {
        return 1;
    }
    while(1) {
	last if (!defined($delta{$recur}));
	if ($target eq $delta{$recur}) {
	    add_packed($target);
	    return 1;
	}
	$chain++;
	if ($chain > 32) {
	    add_packed($target);
	    return 1;
	}
	$recur = $delta{$recur};
    }
    $delta{$target} = $ref;
    print "$ref:$target\n";
    return 0;
}

sub print_usage() {
    print STDERR "usage: pack-changes [-c commit] [-s stop commit] [-t tree1 tree2]\n";
    exit(1);
}

sub find_tree($) {
    my ($commit) = @_;
    open(CM, "git-cat-file commit $commit|") || die "git-cat-file failed";
    while(<CM>) {
        chomp;
	my @words = split;
	if ($words[0] eq "tree") {
	    return $words[1];
	} elsif ($words[0] ne "parent") {
	    last;
	}
    }
    close(CM);
    if ($? && ($ret = $? >> 8)) {
        die "cat-file $commit failed with $ret";
    }
    return undef;
}

sub test_diff($$) {
    my ($a, $b) = @_;
    open(DT, "git-diff-tree -r -t $a $b|") || die "diff-tree failed";
    while(<DT>) {
        chomp;
	my @words = split;
	my $sha1 = $words[2];
	my $change = $words[0];
	if ($change =~ m/^\*/) {
	    @words = split("->", $sha1);
	    add_delta($words[0], $words[1]);
	} elsif ($change =~ m/^\-/) {
	    next;
	} else {
	    add_packed($sha1);
	}
    }
    close(DT);
    if ($? && ($ret = $? >> 8)) {
	die "git-diff-tree failed with $ret";
    }
    return 0;
}

for ($i = 0 ; $i < $argc ; $i++)  {
    if ($ARGV[$i] eq "-c") {
    	if ($i == $argc - 1) {
	    print_usage();
	}
	$commit = $ARGV[++$i];
    } elsif ($ARGV[$i] eq "-s") {
    	if ($i == $argc - 1) {
	    print_usage();
	}
	$stop = $ARGV[++$i];
    } elsif ($ARGV[$i] eq "-t") {
        if ($argc != 3 || $i != 0) {
	    print_usage();
	}
	if (test_diff($ARGV[1], $ARGV[2])) {
	    die "test_diff failed\n";
	}
	add_delta($ARGV[1], $ARGV[2]);
	exit(0);
    }
}

if (!defined($commit)) {
    $commit = `commit-id`;
    if ($?) {
    	print STDERR "commit-id failed, try using -c to specify a commit\n";
	exit(1);
    }
    chomp $commit;
}

open(RL, "git-rev-list $commit|") || die "rev-list failed";
while(<RL>) {
    chomp;
    my $cur = $_;
    my $cur_tree;
    my $parent_tree;
    my $parent_commit = undef;
    open(PARENT, "git-cat-file commit $cur|") || die "cat-file failed";
    while(<PARENT>) {
        chomp;
	my @words = split;
	if ($words[0] eq "tree") {
	    $cur_tree = $words[1];
	    next;
	} elsif ($words[0] ne "parent") {
	    last;
	}
	$parent_commit = $words[1];
	my $next = <PARENT>;
	# ignore merge sets for now
	if ($next =~ m/^parent/) {
	    last;
	}
	# note that we run test_diff to generate a reverse
	# diff
	if (test_diff($cur, $words[1])) {
	    die "test_diff failed\n";
	}
	$parent_tree = find_tree($words[1]);
	if (!defined($parent_tree)) {
	    die "failed to find tree for $words[1]\n";
	}
	add_delta($cur_tree, $parent_tree);
	add_packed($cur);
	last;
    }
    close(PARENT);
    if (!defined($parent_commit)) {
        print STDERR "parentless commit $cur\n";
    }
    if ($? && ($ret = $? >> 8)) {
        die "cat-file failed with $ret";
    }
    if ($cur eq $stop) {
        last;
    }
}
close(RL);

if ($? && ($ret = $? >> 8)) {
    die "rev-list failed with $ret";
}



* Re: manpage name conflict
From: Sebastian Kuzminsky @ 2005-05-19 18:18 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.58.0505190956330.2322@ppc970.osdl.org>

Linus Torvalds <torvalds@osdl.org> wrote:
> On Thu, 19 May 2005, Sebastian Kuzminsky wrote:
> > Anyway, here's the documentation patch:
> 
> It's whitespace-corrupted, with tabs turned into spaces..


<blush>


Index: Documentation/Makefile
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/Makefile  (mode:100644)
+++ uncommitted/Documentation/Makefile  (mode:100644)
@@ -1,6 +1,6 @@
 DOC_SRC=$(wildcard git*.txt)
 DOC_HTML=$(patsubst %.txt,%.html,$(DOC_SRC))
-DOC_MAN=$(patsubst %.txt,%.1,$(DOC_SRC))
+DOC_MAN=$(patsubst %.txt,%.1,$(wildcard git-*.txt)) git.7
 
 all: $(DOC_HTML) $(DOC_MAN)
 
@@ -13,13 +13,15 @@
 	touch $@
 
 clean:
-	rm -f *.xml *.html *.1
+	rm -f *.xml *.html *.1 *.7
 
 %.html : %.txt
 	asciidoc -b css-embedded -d manpage $<
 
-%.1 : %.xml
+%.1 %.7 : %.xml
 	xmlto man $<
+	# FIXME: this next line works around an output filename bug in asciidoc 6.0.3
+	[ "$@" = "git.7" ] || mv git.1 $@
 
 %.xml : %.txt
 	asciidoc -b docbook -d manpage $<
Index: Documentation/git-diff-helper.txt
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/git-diff-helper.txt  (mode:100644)
+++ uncommitted/Documentation/git-diff-helper.txt  (mode:100644)
@@ -1,5 +1,5 @@
 git-diff-helper(1)
-=======================
+==================
 v0.1, May 2005
 
 NAME
Index: Documentation/git.txt
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/git.txt  (mode:100644)
+++ uncommitted/Documentation/git.txt  (mode:100644)
@@ -1,4 +1,4 @@
-git(1)
+git(7)
 ======
 v0.1, May 2005
 


-- 
Sebastian Kuzminsky
"Marie will know I'm headed south, so's to meet me by and by"
-Townes Van Zandt


* Re: [PATCH] Detect renames in diff family.
From: Joel Becker @ 2005-05-19 17:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.58.0505190901340.2322@ppc970.osdl.org>

On Thu, May 19, 2005 at 09:19:28AM -0700, Linus Torvalds wrote:
> In other words, let's say that we create a new architecture or a new 
> filesystem, and we have tons of _new_ files, but not a lot of removed 
> files. It would literally be very cool to see that the new files are based 
> on contents of old files, and that it would thus potentially be very 
> interesting to see a diff like

	Subversion encourages exactly this with the 'svn cp' command.
Just as knowing when a file was renamed allows you to track the history
past its first appearance under the current name, 'cp' allows you to
follow the history even if the original name still exists.  I have found
this useful more than once.
	Now, whether you track this up front with an expensive commit or
use tools to discover the relationship at query time (ala your
why-rename-tracking-isnt-needed argument) is a different question.  As
we all know, most tools ask the user to explicitly declare the
relationship at the time it happens with 'svn rename' and 'svn cp' or
the analog.  But git could do the comparisons, with appropriate
heuristics, at the time someone asks.

Joel

-- 

Life's Little Instruction Book #335

	"Every so often, push your luck."

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: [PATCH] Deltification library work by Nicolas Pitre.
From: Nicolas Pitre @ 2005-05-19 17:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7vekc3178w.fsf@assigned-by-dhcp.cox.net>

On Thu, 19 May 2005, Junio C Hamano wrote:

> >>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:
> 
> NP> In fact I think the code in that file might be simplified even further 
> NP> eventually, at which point there  might not be much of the original code 
> NP> left anymore and the license switched to GPL v2.
> 
> I am afraid that kind of code transformation would not change
> the copyright issues.

Maybe you're right.  Anyway it is a non issue now.


Nicolas


* Re: [PATCH] Detect renames in diff family.
From: Linus Torvalds @ 2005-05-19 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4qcz16n6.fsf@assigned-by-dhcp.cox.net>



On Thu, 19 May 2005, Junio C Hamano wrote:
> 
>  - I have been assuming that diff_delta treats its two inputs as
>    read-only but have not verified that myself yet.

Since test-delta uses mmap(PROT_READ), we'd get SIGSEGV if diff_delta 
actually wrote to the thing. So this is a safe assumption.
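
The effect can be checked with a small experiment.  The following is a
sketch, not code from test-delta: it maps /dev/zero read-only in a child
process and attempts a write.  The function name is hypothetical, and the
sketch assumes a POSIX system with /dev/zero; on Linux the child is
expected to die with SIGSEGV, while some other systems may deliver SIGBUS.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>	/* SIGSEGV, SIGBUS for callers to compare against */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that maps len bytes of /dev/zero with PROT_READ only and
 * then writes to the mapping.  Returns the signal that killed the child,
 * 0 if the child survived the write, or -1 if the experiment could not
 * be set up.
 */
static int signal_on_write_to_readonly_map(size_t len)
{
	int status;
	pid_t pid;

	assert(len > 0);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		int fd = open("/dev/zero", O_RDONLY);
		void *map;
		volatile char *p;

		if (fd < 0)
			_exit(2);
		map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
		if (map == MAP_FAILED)
			_exit(2);
		p = map;
		p[0] = 1;	/* expected to fault */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFSIGNALED(status))
		return WTERMSIG(status);
	return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

The write happens in a child so that the caller survives to report which
signal was delivered.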

		Linus


* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 17:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505190901340.2322@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I notice that you left some debugging output in there ("**score **" 
LT> stuff), and I'll remove it, but it's merged and pushed out and passed my 
LT> trivial tests. 

Oops,... thanks.  I still had some doubts about it and that's
why I said it was beta, but that is fine.  My doubts are minor:

 - the command line interface "-M" to read "-M" or "-M[0-9]"
   (one digit); -M defaults to -M5 and gives the cut-off point at
   similarity score 5000, -M9 at 9000, etc.

 - I was debating with myself whether adding something like this
   was a good idea (using a scale between 0 and 9 corresponding to
   the -M[0-9] option):

	diff --git a/arch/um/kernel/sys_call_table.c b/arch/um/sys-x86_64/sys_call_table.c
   ***  rename similarity index 8
	rename old arch/um/kernel/sys_call_table.c
	rename new arch/um/sys-x86_64/sys_call_table.c
	--- a/arch/um/kernel/sys_call_table.c
	+++ b/arch/um/sys-x86_64/sys_call_table.c

 - I have been assuming that diff_delta treats its two inputs as
   read-only but have not verified that myself yet.

 - I did not check for leaks and knew I had outdated comments in
   some places while doing the diff core interface cleanups.

A small clean-up patch is attached; it may not apply exactly if you
removed the **score** stuff.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
# - HEAD: Detect renames in diff family.
# + 11: Cleanup and leak fix after rename in diff family patch.
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -85,12 +85,10 @@ struct diff_spec {
 	unsigned char blob_sha1[20];
 	unsigned short mode;	 /* file mode */
 	unsigned sha1_valid : 1; /* if true, use blob_sha1 and trust mode;
-				  * however with a NULL SHA1, read them
-				  * from the file system.
-				  * if false, use the name and read mode from
+				  * if false, use the name and read from
 				  * the filesystem.
 				  */
-	unsigned file_valid : 1; /* if false the file does not even exist */
+	unsigned file_valid : 1; /* if false the file does not exist */
 };
 
 static void builtin_diff(const char *name_a,
@@ -506,6 +504,7 @@ static void free_data(struct diff_spec_h
 	else if (s->flags & SHOULD_MUNMAP)
 		munmap(s->data, s->size);
 	s->flags &= ~(SHOULD_FREE|SHOULD_MUNMAP);
+	s->data = 0;
 }
 
 static void flush_remaining_diff(struct diff_spec_hold *elem,
@@ -625,9 +624,17 @@ void diff_flush(void)
 
 	/* We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
+	 *
+	 * With the current callers, we should not have already
+	 * matched entries at this point, but it is nonetheless
+	 * checked for sanity.
 	 */
 	for (dst = createdfile; dst; dst = dst->next) {
+		if (dst->flags & MATCHED)
+			continue;
 		for (src = deletedfile; src; src = src->next) {
+			if (src->flags & MATCHED)
+				continue;
 			if (! is_exact_match(src, dst))
 				continue;
 			flush_rename_pair(src, dst);
@@ -665,6 +672,7 @@ void diff_flush(void)
 	}
 	qsort(mx, num_create*num_delete, sizeof(*mx), score_compare); 
 
+#if 0
  	for (c = 0; c < num_create * num_delete; c++) {
 		src = mx[c].src;
 		dst = mx[c].dst;
@@ -674,6 +682,7 @@ void diff_flush(void)
 			"**score ** %d %s %s\n",
 			mx[c].score, src->path, dst->path);
 	}
+#endif
 
  	for (c = 0; c < num_create * num_delete; c++) {
 		src = mx[c].src;
@@ -684,6 +693,7 @@ void diff_flush(void)
 			break;
 		flush_rename_pair(src, dst);
 	}
+	free(mx);
 
  exit_path:
 	flush_remaining_diff(createdfile, 1);



* Re: [PATCH] Deltification library work by Nicolas Pitre.
From: Junio C Hamano @ 2005-05-19 16:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.62.0505191104410.20274@localhost.localdomain>

>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:

NP> In fact I think the code in that file might be simplified even further 
NP> eventually, at which point there  might not be much of the original code 
NP> left anymore and the license switched to GPL v2.

I am afraid that kind of code transformation would not change
the copyright issues.



* Re: manpage name conflict
From: Linus Torvalds @ 2005-05-19 16:57 UTC (permalink / raw)
  To: Sebastian Kuzminsky; +Cc: git
In-Reply-To: <E1DYnpO-0003cF-I6@highlab.com>



On Thu, 19 May 2005, Sebastian Kuzminsky wrote:
> 
> Anyway, here's the documentation patch:

It's whitespace-corrupted, with tabs turned into spaces..

		Linus


* Re: manpage name conflict
From: Linus Torvalds @ 2005-05-19 16:47 UTC (permalink / raw)
  To: Sebastian Kuzminsky; +Cc: git
In-Reply-To: <E1DYnpO-0003cF-I6@highlab.com>



On Thu, 19 May 2005, Sebastian Kuzminsky wrote:
> 
> But what is going to be the name of the git package?  Let's please
> not make it "git", because that's taken by the GNU Interactive Tools.
> How about "git-core" or "git-plumbing" or "linus-is-a-git"?

"git-core" sounds good to me. I don't mind "linus-is-a-git" either, but I
suspect it would end up confusing people if the git packages are installed
with something that starts with "linus-"

		Linus


* Re: manpage name conflict
From: Sebastian Kuzminsky @ 2005-05-19 16:24 UTC (permalink / raw)
  To: git
In-Reply-To: <20050519155804.GB4513@pasky.ji.cz>

Petr Baudis <pasky@ucw.cz> wrote:
> Does this manpage actually belong to man1? What about git(7) or
> something? It's not an actual command.


Good point.


Ok, I've appended a patch (against the top of git-pb) that moves the
git manpage to man7.  It also does two other things:

    * Sort of works around the asciidoc 6.0.3 bug where the manpages all
      get called "git.1".  It just renames them to what they should have
      been called.

    * Fixes a cut-n-paste bug in git-diff-helper.txt that was making
      asciidoc choke.




> Not directly related to this problem, but just FYI - git isn't staying
> as part of Cogito forever, actually I think its time in Cogito
> distribution is running over soon (now that I've pushed all the interesting
> local changes to git-pb, consequently to git-linus).
> 
> So you will have to either bundle it manually in the distribution
> packages, or provide a separate git package for cogito to depend on
> (when the unbundling really happens).  Either way, this is git issue,
> not cogito. :-)


Right.  Hm.  It's no problem to have git be its own separate package
with all the appropriate relationships (cogito Requires git, and git
suggests cogito).


But what is going to be the name of the git package?  Let's please
not make it "git", because that's taken by the GNU Interactive Tools.
How about "git-core" or "git-plumbing" or "linus-is-a-git"?


;)




Anyway, here's the documentation patch:


Index: Documentation/Makefile
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/Makefile  (mode:100644)
+++ uncommitted/Documentation/Makefile  (mode:100644)
@@ -1,6 +1,6 @@
 DOC_SRC=$(wildcard git*.txt)
 DOC_HTML=$(patsubst %.txt,%.html,$(DOC_SRC))
-DOC_MAN=$(patsubst %.txt,%.1,$(DOC_SRC))
+DOC_MAN=$(patsubst %.txt,%.1,$(wildcard git-*.txt)) git.7

 all: $(DOC_HTML) $(DOC_MAN)

@@ -13,13 +13,15 @@
        touch $@

 clean:
-       rm -f *.xml *.html *.1
+       rm -f *.xml *.html *.1 *.7

 %.html : %.txt
        asciidoc -b css-embedded -d manpage $<

-%.1 : %.xml
+%.1 %.7 : %.xml
        xmlto man $<
+       # FIXME: this next line works around an output filename bug in asciidoc 6.0.3
+       [ "$@" = "git.7" ] || mv git.1 $@

 %.xml : %.txt
        asciidoc -b docbook -d manpage $<
Index: Documentation/git-diff-helper.txt
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/git-diff-helper.txt  (mode:100644)
+++ uncommitted/Documentation/git-diff-helper.txt  (mode:100644)
@@ -1,5 +1,5 @@
 git-diff-helper(1)
-=======================
+==================
 v0.1, May 2005

 NAME
Index: Documentation/git.txt
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/git.txt  (mode:100644)
+++ uncommitted/Documentation/git.txt  (mode:100644)
@@ -1,4 +1,4 @@
-git(1)
+git(7)
 ======
 v0.1, May 2005


-- 
Sebastian Kuzminsky
"Marie will know I'm headed south, so's to meet me by and by"
-Townes Van Zandt


* Re: [PATCH] Detect renames in diff family.
From: Linus Torvalds @ 2005-05-19 16:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vu0kz1p6k.fsf@assigned-by-dhcp.cox.net>



On Thu, 19 May 2005, Junio C Hamano wrote:
> 
> Special request for Linus is to check if I did not screw up the
> various calls into the diff core from diff-tree.  Essentially
> the idea is to start one patchset session with diff_setup() and
> close it with diff_flush() before you start another patchset
> session.

It all looks ok from a quick setup, and with this I can now do

	git-whatchanged -M

in the kernel, and searching for renames I find:

	diff --git a/arch/um/kernel/sys_call_table.c b/arch/um/sys-x86_64/sys_call_table.c
	rename old arch/um/kernel/sys_call_table.c
	rename new arch/um/sys-x86_64/sys_call_table.c
	--- a/arch/um/kernel/sys_call_table.c
	+++ b/arch/um/sys-x86_64/sys_call_table.c
	@@ -1,4 +1,4 @@
	-/* 
	+/*
	  * Copyright (C) 2000 Jeff Dike (jdike@karaya.com)
	  * Copyright 2003 PathScale, Inc.
	  * Licensed under the GPL
	@@ -14,6 +14,12 @@
	 #include "sysdep/syscalls.h"
	 #include "kern_util.h"
	 
	+#ifdef CONFIG_NFSD
	....

which looks quite correct.

I notice that you left some debugging output in there ("**score **" 
stuff), and I'll remove it, but it's merged and pushed out and passed my 
trivial tests. 

[ rambling mode on: ]

One thing that struck me is that there is nothing wrong with having the 
same old file marked twice for a rename, or considering new files to be 
copies of old files. So if we ever allow that, then "rename" may be the 
wrong name for this, since the logic certainly allows the old file to 
still exist (or be removed and show up multiple times in a new guise).

In other words, let's say that we create a new architecture or a new 
filesystem, and we have tons of _new_ files, but not a lot of removed 
files. It would literally be very cool to see that the new files are based 
on contents of old files, and that it would thus potentially be very 
interesting to see a diff like

	diff --git a/arch/i386/kernel/irq.c b/arch/x86-64/kernel/irq.c
	based-on old arch/i386/kernel/irq.c
	creates new arch/x86-64/kernel/irq.c
	--- arch/i386/kernel/irq.c
	+++ arch/x86_64/kernel/irq.c
	@@ -1,205 +1,31 @@
	 /*
	- *     linux/arch/i386/kernel/irq.c
	+ *     linux/arch/x86_64/kernel/irq.c
	  *
	  *     Copyright (C) 1992, 1998 Linus Torvalds, Ingo Molnar
	  *
	...

(the above is a made-up example, but it's at least _half-way_ valid).

I'm not suggesting you actually do this, if only because it's quite
expensive: it means that any newly added file would have to be compared
with _all_ files in the previous archive, which is just too damn
expensive. But I'd like people to kind of keep this in mind as a
possibility, because maybe wasting CPU time in a big way might actually be
acceptable in some cases, and having a separate flag to enable this kind 
of thing might be interesting, no?
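
To put a rough number on "expensive": rename detection scores each
created file against each deleted file, while the copy detection
described above would score each created file against every file in the
previous tree.  The sketch below just counts candidate pairs under that
assumption; candidate_pairs() and its detect_copies flag are hypothetical
and only illustrate the idea of a separate opt-in switch.

```c
#include <assert.h>
#include <limits.h>

/*
 * Number of (source, destination) pairs that would need a similarity
 * score.  With detect_copies off only deleted files are candidate
 * sources; with it on, every file in the previous tree is.
 */
static unsigned long candidate_pairs(unsigned long created,
				     unsigned long deleted,
				     unsigned long previous_total,
				     int detect_copies)
{
	unsigned long sources = detect_copies ? previous_total : deleted;

	/* Guard the multiplication against overflow. */
	assert(created == 0 || sources <= ULONG_MAX / created);
	return created * sources;
}
```

With made-up numbers, 10 new files and 2 deleted files give 20 pairs for
rename detection, but 170000 pairs if the previous tree holds 17000 files
and all of them are candidates.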

		Linus


* Re: [PATCH 0/4] Pulling refs files
From: Daniel Barkalow @ 2005-05-19 16:00 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, Linus Torvalds
In-Reply-To: <20050519065207.GB18281@pasky.ji.cz>

On Thu, 19 May 2005, Petr Baudis wrote:

> Dear diary, on Thu, May 19, 2005 at 05:19:01AM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> >  2) fetching reference files by name, and making them available to the
> >     local program without writing them to disk at all.
> >  3) fetching other files by name and writing them to either the
> >     corresponding filename or a provided replacement.
> > 
> > I had thought that (2) could be done as a special case of (3), but I think
> > that it has to be separate, because (2) just returns the value, while
> > (3) can't just return the contents, but has to write it somewhere, since
> > it isn't constrained to be exactly 20 bytes.
> 
> Huh. How would (2) be useful and why can't you just still write it e.g.
> to some user-supplied temporary file? I think that'd be still actually
> much less trouble for the scripts to handle.

(2) is what is needed if the user just requests downloading objects
starting with a reference stored remotely, and doesn't request that the
reference be written anywhere. It is also useful because the system wants
to verify that it has actually downloaded the objects successfully before
writing the reference.

Note that the scripts see a higher-level interface; these are the
operations that (e.g.) http-pull.c has to provide for pull.c, which builds
a larger operation (determine the target hash, download the objects, write
the specified ref file) out of them. It would be inconvenient for pull.c 
to download to a temporary file and then read the temporary file, which
shouldn't normally be visible yet, to figure out what it's doing. It wants
to have a function that takes a string and returns a hash, getting the
value from the remote host, and it's inconvenient to deal with the disk in
the middle.
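
As an illustration of the "string in, hash out" shape described above,
here is a minimal sketch of the last step: turning the fetched contents of
a ref into a 20-byte hash entirely in memory.  It assumes the ref contents
are 40 hex digits optionally followed by a newline; ref_to_sha1() and
hex_value() are hypothetical names, not the actual pull.c interface.

```c
#include <assert.h>
#include <stddef.h>

#define SHA1_RAWSZ 20

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Convert the contents of a ref (40 hex digits, optionally followed by
 * a newline) into 20 raw bytes.  Returns 0 on success, -1 if the input
 * is malformed.  Nothing is written to disk.
 */
static int ref_to_sha1(const char *buf, size_t len,
		       unsigned char sha1[SHA1_RAWSZ])
{
	size_t i;

	assert(buf != NULL && sha1 != NULL);
	if (len != 2 * SHA1_RAWSZ &&
	    !(len == 2 * SHA1_RAWSZ + 1 && buf[len - 1] == '\n'))
		return -1;
	for (i = 0; i < SHA1_RAWSZ; i++) {
		int hi = hex_value(buf[2 * i]);
		int lo = hex_value(buf[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

Because the result always fits in a fixed 20-byte buffer, the caller can
hold it in memory until the objects have been verified and only then
decide whether to write a ref file.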

	-Daniel
*This .sig left intentionally blank*



* Re: manpage name conflict
From: Petr Baudis @ 2005-05-19 15:58 UTC (permalink / raw)
  To: Sebastian Kuzminsky; +Cc: git
In-Reply-To: <E1DYmy8-0003YB-JW@highlab.com>

Dear diary, on Thu, May 19, 2005 at 05:29:52PM CEST, I got a letter
where Sebastian Kuzminsky <seb@highlab.com> told me that...
> Hi folks, I maintain a Debian package for Cogito (it just went into "Sid"
> aka "unstable"), and I just got a bug report from a user that I'd like
> your input on.
> 
> 
> The problem is that Cogito wants to install a git(1) manpage, and so does
> the GNU Interactive Tools.  The GNU Interactive Tools actually have a
> program called "git", so it seems only fair that they get to call their
> manpage by the same name.  The GIT-as-in-Cogito git(1) manpage gives
> an overview of the GIT-as-in-Cogito core, so maybe we could install it
> as git-core(1)?

Does this manpage actually belong to man1? What about git(7) or
something? It's not an actual command.


Not directly related to this problem, but just FYI - git isn't staying
as part of Cogito forever, actually I think its time in Cogito
distribution is running over soon (now that I've pushed all the interesting
local changes to git-pb, consequently to git-linus).

So you will have to either bundle it manually in the distribution
packages, or provide a separate git package for cogito to depend on
(when the unbundling really happens).  Either way, this is git issue,
not cogito. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: [PATCH] Deltification library work by Nicolas Pitre.
From: Davide Libenzi @ 2005-05-19 15:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505190833380.2322@ppc970.osdl.org>

On Thu, 19 May 2005, Linus Torvalds wrote:

>
> [ This goes to the list because Davide can apparently receive the list
>   emails, but for some reason apparently doesn't like my osdl.org
>   address ]

[Just greylist timeout ;)]


> Davide,
>
> would you mind signing off on me adding the lines
>
>  *  This file is free software; you can redistribute it and/or
>  *  modify it under the terms of the GNU Lesser General Public
>  *  License as published by the Free Software Foundation; either
>  *  version 2.1 of the License, or (at your option) any later version.
> + *
> + *  Use of this within git automatically means that the LGPL
> + *  licensing gets turned into GPLv2 within this project.
>  */

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>


- Davide



* Re: [PATCH] Deltification library work by Nicolas Pitre.
From: Linus Torvalds @ 2005-05-19 15:36 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505190736020.2322@ppc970.osdl.org>


[ This goes to the list because Davide can apparently receive the list 
   emails, but for some reason apparently doesn't like my osdl.org 
   address ]

Davide,

would you mind signing off on me adding the lines

  *  This file is free software; you can redistribute it and/or
  *  modify it under the terms of the GNU Lesser General Public
  *  License as published by the Free Software Foundation; either
  *  version 2.1 of the License, or (at your option) any later version.
+ *
+ *  Use of this within git automatically means that the LGPL
+ *  licensing gets turned into GPLv2 within this project. 
  */

(If you just send me an ack and your "signed-off-by" line, I'll edit 
Nico's patch appropriately, and check it in with all of our sign-offs).

That way there's no question about the dual-licensing.

		Linus


* manpage name conflict
From: Sebastian Kuzminsky @ 2005-05-19 15:29 UTC (permalink / raw)
  To: git

Hi folks, I maintain a Debian package for Cogito (it just went into "Sid"
aka "unstable"), and I just got a bug report from a user that I'd like
your input on.


The problem is that Cogito wants to install a git(1) manpage, and so does
the GNU Interactive Tools.  The GNU Interactive Tools actually have a
program called "git", so it seems only fair that they get to call their
manpage by the same name.  The GIT-as-in-Cogito git(1) manpage gives
an overview of the GIT-as-in-Cogito core, so maybe we could install it
as git-core(1)?


What do you think?


-- 
Sebastian Kuzminsky
"Marie will know I'm headed south, so's to meet me by and by"
-Townes Van Zandt


* Re: [PATCH] Deltification library work by Nicolas Pitre.
From: Nicolas Pitre @ 2005-05-19 15:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.58.0505190736020.2322@ppc970.osdl.org>

On Thu, 19 May 2005, Linus Torvalds wrote:

> Sure. I'll apply this one and merge in Junio's rename on top of it, but I 
> wanted to verify one thing first:
> 
> > + *  This file is free software; you can redistribute it and/or
> > + *  modify it under the terms of the GNU Lesser General Public
> > + *  License as published by the Free Software Foundation; either
> > + *  version 2.1 of the License, or (at your option) any later version.
> 
> I don't know the different LGPL versions, so can somebody verify that LGPL
> 2.1 is fully compatible with GPLv2...
> 
> In fact I'd prefer to have that notice in the code to make it obvious that 
> the LGPL becomes the GPLv2 when linked into the rest of git.

I don't mind switching it to GPL v2 if I'm allowed to.  I kept LGPL v2.1 
for that file since that's the license used for xdiff, from which a 
significant portion of that file has been copied.

In fact I think the code in that file might be simplified even further 
eventually, at which point there might not be much of the original code 
left and the license could be switched to GPL v2.  But in the meantime 
someone else with better knowledge of GPL vs LGPL interaction is needed 
to give advice.


Nicolas


* Re: [PATCH] Deltification library work by Nicolas Pitre.
From: Linus Torvalds @ 2005-05-19 14:38 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.62.0505191019180.20274@localhost.localdomain>



On Thu, 19 May 2005, Nicolas Pitre wrote:
> 
> I'd prefer if the following patch was applied instead, following the 
> patch separation I've done already.

Sure. I'll apply this one and merge in Junio's rename on top of it, but I 
wanted to verify one thing first:

> + *  This file is free software; you can redistribute it and/or
> + *  modify it under the terms of the GNU Lesser General Public
> + *  License as published by the Free Software Foundation; either
> + *  version 2.1 of the License, or (at your option) any later version.

I don't know the different LGPL versions, so can somebody verify that LGPL
2.1 is fully compatible with GPLv2...

In fact I'd prefer to have that notice in the code to make it obvious that 
the LGPL becomes the GPLv2 when linked into the rest of git.

		Linus


* Re: [PATCH] Deltification library work by Nicolas Pitre.
From: Nicolas Pitre @ 2005-05-19 14:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: torvalds, git
In-Reply-To: <7vwtpv1pd4.fsf@assigned-by-dhcp.cox.net>

On Thu, 19 May 2005, Junio C Hamano wrote:

> This is stolen from the deltification patch by Nicolas Pitre.
> Although the deltification patch has not been submitted for
> inclusion, the library part here is useful for the rename
> detection logic in the diff work I have been doing.  The next
> patch will depend on this, so if Nico is OK with this one,
> please consider inclusion of this patch.

I'd prefer if the following patch was applied instead, following the 
patch separation I've done already.

=====

This patch adds basic library functions to create and replay delta 
information. Also included is a test-delta utility to validate the code.

Signed-off-by: Nicolas Pitre <nico@cam.org>

Index: git/diff-delta.c
===================================================================
--- /dev/null
+++ git/diff-delta.c
@@ -0,0 +1,330 @@
+/*
+ * diff-delta.c: generate a delta between two buffers
+ *
+ *  Many parts of this file have been lifted from LibXDiff version 0.10.
+ *  http://www.xmailserver.org/xdiff-lib.html
+ *
+ *  LibXDiff was written by Davide Libenzi <davidel@xmailserver.org>
+ *  Copyright (C) 2003	Davide Libenzi
+ *
+ *  Many mods for GIT usage by Nicolas Pitre <nico@cam.org>, (C) 2005.
+ *
+ *  This file is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ */
+
+#include <stdlib.h>
+#include "delta.h"
+
+
+/* block size: min = 16, max = 64k, power of 2 */
+#define BLK_SIZE 16
+
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+
+#define GR_PRIME 0x9e370001
+#define HASH(v, b) (((unsigned int)(v) * GR_PRIME) >> (32 - (b)))
+	
+/* largest prime smaller than 65536 */
+#define BASE 65521
+
+/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
+#define NMAX 5552
+
+#define DO1(buf, i)  { s1 += buf[i]; s2 += s1; }
+#define DO2(buf, i)  DO1(buf, i); DO1(buf, i + 1);
+#define DO4(buf, i)  DO2(buf, i); DO2(buf, i + 2);
+#define DO8(buf, i)  DO4(buf, i); DO4(buf, i + 4);
+#define DO16(buf)    DO8(buf, 0); DO8(buf, 8);
+
+static unsigned int adler32(unsigned int adler, const unsigned char *buf, int len)
+{
+	int k;
+	unsigned int s1 = adler & 0xffff;
+	unsigned int s2 = adler >> 16;
+
+	while (len > 0) {
+		k = MIN(len, NMAX);
+		len -= k;
+		while (k >= 16) {
+			DO16(buf);
+			buf += 16;
+			k -= 16;
+		}
+		if (k != 0)
+			do {
+				s1 += *buf++;
+				s2 += s1;
+			} while (--k);
+		s1 %= BASE;
+		s2 %= BASE;
+	}
+
+	return (s2 << 16) | s1;
+}
+
+static unsigned int hashbits(unsigned int size)
+{
+	unsigned int val = 1, bits = 0;
+	while (val < size && bits < 32) {
+		val <<= 1;
+		bits++;
+	}
+	return bits ? bits: 1;
+}
+
+typedef struct s_chanode {
+	struct s_chanode *next;
+	int icurr;
+} chanode_t;
+
+typedef struct s_chastore {
+	chanode_t *head, *tail;
+	int isize, nsize;
+	chanode_t *ancur;
+	chanode_t *sncur;
+	int scurr;
+} chastore_t;
+
+static void cha_init(chastore_t *cha, int isize, int icount)
+{
+	cha->head = cha->tail = NULL;
+	cha->isize = isize;
+	cha->nsize = icount * isize;
+	cha->ancur = cha->sncur = NULL;
+	cha->scurr = 0;
+}
+
+static void *cha_alloc(chastore_t *cha)
+{
+	chanode_t *ancur;
+	void *data;
+
+	ancur = cha->ancur;
+	if (!ancur || ancur->icurr == cha->nsize) {
+		ancur = malloc(sizeof(chanode_t) + cha->nsize);
+		if (!ancur)
+			return NULL;
+		ancur->icurr = 0;
+		ancur->next = NULL;
+		if (cha->tail)
+			cha->tail->next = ancur;
+		if (!cha->head)
+			cha->head = ancur;
+		cha->tail = ancur;
+		cha->ancur = ancur;
+	}
+
+	data = (char *)ancur + sizeof(chanode_t) + ancur->icurr;
+	ancur->icurr += cha->isize;
+	return data;
+}
+
+static void cha_free(chastore_t *cha)
+{
+	chanode_t *cur = cha->head;
+	while (cur) {
+		chanode_t *tmp = cur;
+		cur = cur->next;
+		free(tmp);
+	}
+}
+
+typedef struct s_bdrecord {
+	struct s_bdrecord *next;
+	unsigned int fp;
+	const unsigned char *ptr;
+} bdrecord_t;
+
+typedef struct s_bdfile {
+	const unsigned char *data, *top;
+	chastore_t cha;
+	unsigned int fphbits;
+	bdrecord_t **fphash;
+} bdfile_t;
+
+static int delta_prepare(const unsigned char *buf, int bufsize, bdfile_t *bdf)
+{
+	unsigned int fphbits;
+	int i, hsize;
+	const unsigned char *base, *data, *top;
+	bdrecord_t *brec;
+	bdrecord_t **fphash;
+
+	fphbits = hashbits(bufsize / BLK_SIZE + 1);
+	hsize = 1 << fphbits;
+	fphash = malloc(hsize * sizeof(bdrecord_t *));
+	if (!fphash)
+		return -1;
+	for (i = 0; i < hsize; i++)
+		fphash[i] = NULL;
+	cha_init(&bdf->cha, sizeof(bdrecord_t), hsize / 4 + 1);
+
+	bdf->data = data = base = buf;
+	bdf->top = top = buf + bufsize;
+	data += (bufsize / BLK_SIZE) * BLK_SIZE;
+	if (data == top)
+		data -= BLK_SIZE;
+
+	for ( ; data >= base; data -= BLK_SIZE) {
+		brec = cha_alloc(&bdf->cha);
+		if (!brec) {
+			cha_free(&bdf->cha);
+			free(fphash);
+			return -1;
+		}
+		brec->fp = adler32(0, data, MIN(BLK_SIZE, top - data));
+		brec->ptr = data;
+		i = HASH(brec->fp, fphbits);
+		brec->next = fphash[i];
+		fphash[i] = brec;
+	}
+
+	bdf->fphbits = fphbits;
+	bdf->fphash = fphash;
+
+	return 0;
+}
+
+static void delta_cleanup(bdfile_t *bdf)
+{
+	free(bdf->fphash);
+	cha_free(&bdf->cha);
+}
+
+#define COPYOP_SIZE(o, s) \
+    (!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \
+     !!(s & 0xff) + !!(s & 0xff00) + 1)
+
+void *diff_delta(void *from_buf, unsigned long from_size,
+		 void *to_buf, unsigned long to_size,
+		 unsigned long *delta_size)
+{
+	int i, outpos, outsize, inscnt, csize, msize, moff;
+	unsigned int fp;
+	const unsigned char *data, *top, *ptr1, *ptr2;
+	unsigned char *out, *orig;
+	bdrecord_t *brec;
+	bdfile_t bdf;
+
+	if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf))
+		return NULL;
+	
+	outpos = 0;
+	outsize = 8192;
+	out = malloc(outsize);
+	if (!out) {
+		delta_cleanup(&bdf);
+		return NULL;
+	}
+
+	data = to_buf;
+	top = data + to_size;
+
+	/* store reference buffer size */
+	orig = out + outpos++;
+	*orig = i = 0;
+	do {
+		if (from_size & 0xff) {
+			*orig |= (1 << i);
+			out[outpos++] = from_size;
+		}
+		i++;
+		from_size >>= 8;
+	} while (from_size);
+
+	/* store target buffer size */
+	orig = out + outpos++;
+	*orig = i = 0;
+	do {
+		if (to_size & 0xff) {
+			*orig |= (1 << i);
+			out[outpos++] = to_size;
+		}
+		i++;
+		to_size >>= 8;
+	} while (to_size);
+
+	inscnt = 0;
+	moff = 0;
+	while (data < top) {
+		msize = 0;
+		fp = adler32(0, data, MIN(top - data, BLK_SIZE));
+		i = HASH(fp, bdf.fphbits);
+		for (brec = bdf.fphash[i]; brec; brec = brec->next) {
+			if (brec->fp == fp) {
+				csize = bdf.top - brec->ptr;
+				if (csize > top - data)
+					csize = top - data;
+				for (ptr1 = brec->ptr, ptr2 = data; 
+				     csize && *ptr1 == *ptr2;
+				     csize--, ptr1++, ptr2++);
+
+				csize = ptr1 - brec->ptr;
+				if (csize > msize) {
+					moff = brec->ptr - bdf.data;
+					msize = csize;
+					if (msize >= 0x10000) {
+						msize = 0x10000;
+						break;
+					}
+				}
+			}
+		}
+
+		if (!msize || msize < COPYOP_SIZE(moff, msize)) {
+			if (!inscnt)
+				outpos++;
+			out[outpos++] = *data++;
+			inscnt++;
+			if (inscnt == 0x7f) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+		} else {
+			if (inscnt) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+
+			data += msize;
+			orig = out + outpos++;
+			i = 0x80;
+
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x01; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x02; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x04; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x08; }
+
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x10; }
+			msize >>= 8;
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x20; }
+
+			*orig = i;
+		}
+
+		/* next time around the largest possible output is 1 + 4 + 3 */
+		if (outpos > outsize - 8) {
+			void *tmp = out;
+			outsize = outsize * 3 / 2;
+			out = realloc(out, outsize);
+			if (!out) {
+				free(tmp);
+				delta_cleanup(&bdf);
+				return NULL;
+			}
+		}
+	}
+
+	if (inscnt)
+		out[outpos - inscnt - 1] = inscnt;
+
+	delta_cleanup(&bdf);
+	*delta_size = outpos;
+	return out;
+}
Index: git/delta.h
===================================================================
--- /dev/null
+++ git/delta.h
@@ -0,0 +1,6 @@
+extern void *diff_delta(void *from_buf, unsigned long from_size,
+			void *to_buf, unsigned long to_size,
+			unsigned long *delta_size);
+extern void *patch_delta(void *src_buf, unsigned long src_size,
+			 void *delta_buf, unsigned long delta_size,
+			 unsigned long *dst_size);
Index: git/Makefile
===================================================================
--- git.orig/Makefile
+++ git/Makefile
@@ -36,9 +36,9 @@
 	$(INSTALL) $(PROG) $(SCRIPTS) $(dest)$(bin)
 
 LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o \
-	 tag.o date.o
+	 tag.o date.o diff-delta.o patch-delta.o
 LIB_FILE=libgit.a
-LIB_H=cache.h object.h blob.h tree.h commit.h tag.h
+LIB_H=cache.h object.h blob.h tree.h commit.h tag.h delta.h
 
 LIB_H += strbuf.h
 LIB_OBJS += strbuf.o
@@ -72,6 +72,9 @@
 test-date: test-date.c date.o
 	$(CC) $(CFLAGS) -o $@ test-date.c date.o
 
+test-delta: test-delta.c diff-delta.o patch-delta.o
+	$(CC) $(CFLAGS) -o $@ $^
+
 git-%: %.c $(LIB_FILE)
 	$(CC) $(CFLAGS) -o $@ $(filter %.c,$^) $(LIBS)
 
Index: git/patch-delta.c
===================================================================
--- /dev/null
+++ git/patch-delta.c
@@ -0,0 +1,88 @@
+/*
+ * patch-delta.c:
+ * recreate a buffer from a source and the delta produced by diff-delta.c
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include "delta.h"
+
+void *patch_delta(void *src_buf, unsigned long src_size,
+		  void *delta_buf, unsigned long delta_size,
+		  unsigned long *dst_size)
+{
+	const unsigned char *data, *top;
+	unsigned char *dst_buf, *out, cmd;
+	unsigned long size;
+	int i;
+
+	/* the smallest delta size possible is 6 bytes */
+	if (delta_size < 6)
+		return NULL;
+
+	data = delta_buf;
+	top = data + delta_size;
+
+	/* make sure the orig file size matches what we expect */
+	size = i = 0;
+	cmd = *data++;
+	while (cmd) {
+		if (cmd & 1)
+			size |= (unsigned long)*data++ << i;
+		i += 8;
+		cmd >>= 1;
+	}
+	if (size != src_size)
+		return NULL;
+
+	/* now the result size */
+	size = i = 0;
+	cmd = *data++;
+	while (cmd) {
+		if (cmd & 1)
+			size |= (unsigned long)*data++ << i;
+		i += 8;
+		cmd >>= 1;
+	}
+	dst_buf = malloc(size);
+	if (!dst_buf)
+		return NULL;
+
+	out = dst_buf;
+	while (data < top) {
+		cmd = *data++;
+		if (cmd & 0x80) {
+			unsigned long cp_off = 0, cp_size = 0;
+			const unsigned char *buf;
+			if (cmd & 0x01) cp_off = *data++;
+			if (cmd & 0x02) cp_off |= (*data++ << 8);
+			if (cmd & 0x04) cp_off |= (*data++ << 16);
+			if (cmd & 0x08) cp_off |= ((unsigned long)*data++ << 24);
+			if (cmd & 0x10) cp_size = *data++;
+			if (cmd & 0x20) cp_size |= (*data++ << 8);
+			if (cp_size == 0) cp_size = 0x10000;
+			buf = (cmd & 0x40) ? dst_buf : src_buf;
+			memcpy(out, buf + cp_off, cp_size);
+			out += cp_size;
+		} else {
+			memcpy(out, data, cmd);
+			out += cmd;
+			data += cmd;
+		}
+	}
+
+	/* sanity check */
+	if (data != top || out - dst_buf != size) {
+		free(dst_buf);
+		return NULL;
+	}
+
+	*dst_size = size;
+	return dst_buf;
+}
Index: git/test-delta.c
===================================================================
--- /dev/null
+++ git/test-delta.c
@@ -0,0 +1,79 @@
+/*
+ * test-delta.c: test code to exercise diff-delta.c and patch-delta.c
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include "delta.h"
+
+static const char *usage =
+	"test-delta (-d|-p) <from_file> <data_file> <out_file>";
+
+int main(int argc, char *argv[])
+{
+	int fd;
+	struct stat st;
+	void *from_buf, *data_buf, *out_buf;
+	unsigned long from_size, data_size, out_size;
+
+	if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) {
+		fprintf(stderr, "Usage: %s\n", usage);
+		return 1;
+	}
+
+	fd = open(argv[2], O_RDONLY);
+	if (fd < 0 || fstat(fd, &st)) {
+		perror(argv[2]);
+		return 1;
+	}
+	from_size = st.st_size;
+	from_buf = mmap(NULL, from_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (from_buf == MAP_FAILED) {
+		perror(argv[2]);
+		return 1;
+	}
+	close(fd);
+
+	fd = open(argv[3], O_RDONLY);
+	if (fd < 0 || fstat(fd, &st)) {
+		perror(argv[3]);
+		return 1;
+	}
+	data_size = st.st_size;
+	data_buf = mmap(NULL, data_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (data_buf == MAP_FAILED) {
+		perror(argv[3]);
+		return 1;
+	}
+	close(fd);
+
+	if (argv[1][1] == 'd')
+		out_buf = diff_delta(from_buf, from_size,
+				     data_buf, data_size, &out_size);
+	else
+		out_buf = patch_delta(from_buf, from_size,
+				      data_buf, data_size, &out_size);
+	if (!out_buf) {
+		fprintf(stderr, "delta operation failed (returned NULL)\n");
+		return 1;
+	}
+
+	fd = open (argv[4], O_WRONLY|O_CREAT|O_TRUNC, 0666);
+	if (fd < 0 || write(fd, out_buf, out_size) != out_size) {
+		perror(argv[4]);
+		return 1;
+	}
+
+	return 0;
+}
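
[ Editorial sketch, not part of the patch above. ]

The delta stream written by diff_delta() and read by patch_delta() relies on
two small encodings: a size header (one mask byte followed by only the
non-zero bytes of the value, least significant first) and a copy command
(bit 0x80 set, bits 0x01-0x08 flagging which offset bytes follow, bits
0x10-0x20 flagging which length bytes follow, with an absent length meaning
0x10000). The following is a minimal standalone illustration of those two
encodings as read from the code above. The helper names put_size, get_size,
put_copy and get_copy are hypothetical and do not appear in the patch.

```c
#include <stddef.h>

/* Hypothetical helper: encode a size the way diff_delta() writes its
 * headers. out[0] is a mask; bit i set means byte i of the value follows. */
static size_t put_size(unsigned char *out, unsigned long size)
{
	size_t pos = 1;
	int i = 0;

	out[0] = 0;
	do {
		if (size & 0xff) {
			out[0] |= (unsigned char)(1u << i);
			out[pos++] = (unsigned char)(size & 0xff);
		}
		i++;
		size >>= 8;
	} while (size);
	return pos;	/* bytes written */
}

/* Hypothetical helper: decode a size header the way patch_delta() does. */
static size_t get_size(const unsigned char *in, unsigned long *size)
{
	size_t pos = 1;
	unsigned char mask = in[0];
	int shift = 0;

	*size = 0;
	while (mask) {
		if (mask & 1)
			*size |= (unsigned long)in[pos++] << shift;
		shift += 8;
		mask >>= 1;
	}
	return pos;	/* bytes consumed */
}

/* Hypothetical helper: encode a copy command; len must be 1..0x10000. */
static size_t put_copy(unsigned char *out, unsigned long off, unsigned long len)
{
	size_t pos = 1;
	unsigned char cmd = 0x80;
	int i;

	for (i = 0; i < 4; i++, off >>= 8)
		if (off & 0xff) {
			out[pos++] = (unsigned char)(off & 0xff);
			cmd |= (unsigned char)(0x01 << i);
		}
	for (i = 0; i < 2; i++, len >>= 8)
		if (len & 0xff) {
			out[pos++] = (unsigned char)(len & 0xff);
			cmd |= (unsigned char)(0x10 << i);
		}
	out[0] = cmd;
	return pos;	/* bytes written */
}

/* Hypothetical helper: decode a copy command (in[0] must have 0x80 set). */
static size_t get_copy(const unsigned char *in, unsigned long *off,
		       unsigned long *len)
{
	size_t pos = 1;
	unsigned char cmd = in[0];
	int i;

	*off = *len = 0;
	for (i = 0; i < 4; i++)
		if (cmd & (0x01 << i))
			*off |= (unsigned long)in[pos++] << (8 * i);
	for (i = 0; i < 2; i++)
		if (cmd & (0x10 << i))
			*len |= (unsigned long)in[pos++] << (8 * i);
	if (*len == 0)
		*len = 0x10000;	/* an absent length means the 64k maximum */
	return pos;	/* bytes consumed */
}
```

Under this reading, a size of 0x10005 takes three bytes (mask 0x05, then
0x05 and 0x01), and a copy of 0x10000 bytes from offset 0x1200 takes two
(command 0x82, then 0x12). Skipping zero bytes is what keeps small offsets
and round sizes cheap.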


* Re: gitk-1.0 released
From: Ingo Molnar @ 2005-05-19 13:30 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git
In-Reply-To: <20050519132411.GA29111@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> - the ability to copy & paste from all the windows would be nice. (e.g. 
>   in the bugreport above i had to type out the "Octopus merge .." text 
>   instead of pasting it from gitk)

scrap this one - the patch view window allows copy & paste, and the name 
of the patch is included there too.

	Ingo


* Re: gitk-1.0 released
From: Ingo Molnar @ 2005-05-19 13:24 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git
In-Reply-To: <17036.36624.911071.810357@cargo.ozlabs.ibm.com>


* Paul Mackerras <paulus@samba.org> wrote:

> I have released a new version of gitk.  I got brave and called it 1.0
> and it is at:
> 
> 	http://ozlabs.org/~paulus/gitk-1.0

very nice! Works well and it's pretty fast on a 2GHz P4.

a bugreport: when looking at the main git history, the following commit 
seems to be rendered incorrectly:

 211232bae64bcc60bbf5d1b5e5b2344c22ed767e

The "Octopus merge ..." text is incorrectly overlaid with a graph line.

here's a feature wishlist if you don't mind:

- the ability to copy & paste from all the windows would be nice. (e.g. 
  in the bugreport above i had to type out the "Octopus merge .." text 
  instead of pasting it from gitk)

- i guess this one is on your todo list: the history graph of a single
  object (file).

- first window appearance on an uncached repository can be pretty slow 
  due to disk seeks - so it might make sense to display something (an 
  hourglass?) sooner - when i first started it i thought it hung. On 
  already cached repositories the window comes up immediately, and the 
  list of commits is updated dynamically.

(and the biggest missing feature of GIT right now is author + 
last-commit annotated file viewing, which could be integrated into gitk 
a la BK's revtool: selecting a given line of the file would bring one to 
that commit, etc.)

	Ingo
