* Re: the war on trailing whitespace
From: Andrew Morton @ 2006-02-26 5:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7v1wxq7psj.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> wrote:
>
> Andrew Morton <akpm@osdl.org> writes:
>
> > It's invariably pointless to add lines which have trailing whitespace.
> > Nobody cares much, but my scripts spam me when it happens, so I've become
> > obsessive....
>
> I do not call me obsessive, but I do enable pre-commit and
> pre-applypatch hooks I ship with git myself.
It's apparent that few others do this.
> > I realise that we cannot do this when doing git fetches, but when importing
> > patches and mboxes, git ought to whine loudly about input which matches the
> > above regexp, and it should offer an option to tidy it up. Perhaps by
> > default.
>
> I stole the policy the sample hook scripts use from you; it is
> not enabled by default, and as the tool manufacturer I am a bit
> reluctant to do so.
>
It's not strong enough. I mean, if you ask a developer "do you wish to add
new trialing whitespace to the kernel" then obviously their answer would be
"no". So how do we help them in this?
I'd suggest a) git will simply refuse to apply such a patch unless given a
special `forcing' flag, b) even when thus forced, it will still warn and c)
with a different flag, it will strip-then-apply, without generating a
warning.
^ permalink raw reply
* Re: [PATCH] First cut at libifying revlist generation
From: Junio C Hamano @ 2006-02-26 3:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0602251608160.22647@g5.osdl.org>
Linus Torvalds <torvalds@osdl.org> writes:
> This really just splits things up partially, and creates the
> interface to set things up by parsign the command line.
Sorry, I'll keep this but I ended up wasting the whole day
setting up a new machine for my wife (a windows box). The next
task would be to install Cygwin so I can have fun with git...
^ permalink raw reply
* Re: the war on trailing whitespace
From: Junio C Hamano @ 2006-02-26 3:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: git
In-Reply-To: <20060225174047.0e9a6d29.akpm@osdl.org>
Andrew Morton <akpm@osdl.org> writes:
> It's invariably pointless to add lines which have trailing whitespace.
> Nobody cares much, but my scripts spam me when it happens, so I've become
> obsessive....
I do not call me obsessive, but I do enable pre-commit and
pre-applypatch hooks I ship with git myself.
> I realise that we cannot do this when doing git fetches, but when importing
> patches and mboxes, git ought to whine loudly about input which matches the
> above regexp, and it should offer an option to tidy it up. Perhaps by
> default.
I stole the policy the sample hook scripts use from you; it is
not enabled by default, and as the tool manufacturer I am a bit
reluctant to do so.
However, as a kernel project maintainer high in the foodchain,
I'd imagine your plea to your fellow maintainers who apply
patches using git tools would be heard well.
^ permalink raw reply
* [PATCH] annotate: Convert all -| calls to use a helper open_pipe().
From: Ryan Anderson @ 2006-02-26 3:02 UTC (permalink / raw)
To: junkio; +Cc: git, Ryan Anderson
In-Reply-To: <11409185133616-git-send-email-ryan@michonline.com>
When we settle on a solution for ActiveState's forking issues, all
compatibility checks can be handled inside this one function.
Also, fixed an abuse of global variables in the process of cleaning this up.
Signed-off-by: Ryan Anderson <ryan@michonline.com>
---
I should have done this before I sent the other patch out, sorry about
that. This one will also be in my annotate-upstream branch, but I'm a
bit less sure where we stand on this kind of fixup at the moment. Even
if this is the wrong place to start, this at least flags all the spots
that need fixing in a nice way.
git-annotate.perl | 73 ++++++++++++++++++++++++++++++++---------------------
1 files changed, 44 insertions(+), 29 deletions(-)
3e72e9de7ece2b9c2c9447e761ee721bdd94a33f
diff --git a/git-annotate.perl b/git-annotate.perl
index 91da6d5..ee8ff15 100755
--- a/git-annotate.perl
+++ b/git-annotate.perl
@@ -98,15 +98,15 @@ while (my $bound = pop @stack) {
push @revqueue, $head;
init_claim( defined $starting_rev ? $starting_rev : 'dirty');
unless (defined $starting_rev) {
- open(DIFF,"-|","git","diff","-R", "HEAD", "--",$filename)
+ my $diff = open_pipe("git","diff","-R", "HEAD", "--",$filename)
or die "Failed to call git diff to check for dirty state: $!";
- _git_diff_parse(*DIFF, $head, "dirty", (
+ _git_diff_parse($diff, $head, "dirty", (
'author' => gitvar_name("GIT_AUTHOR_IDENT"),
'author_date' => sprintf("%s +0000",time()),
)
);
- close(DIFF);
+ close($diff);
}
handle_rev();
@@ -172,20 +172,21 @@ sub handle_rev {
sub git_rev_list {
my ($rev, $file) = @_;
+ my $revlist;
if ($rev_file) {
- open(P, '<' . $rev_file);
+ open($revlist, '<' . $rev_file);
} else {
- open(P,"-|","git-rev-list","--parents","--remove-empty",$rev,"--",$file)
+ $revlist = open_pipe("git-rev-list","--parents","--remove-empty",$rev,"--",$file)
or die "Failed to exec git-rev-list: $!";
}
my @revs;
- while(my $line = <P>) {
+ while(my $line = <$revlist>) {
chomp $line;
my ($rev, @parents) = split /\s+/, $line;
push @revs, [ $rev, @parents ];
}
- close(P);
+ close($revlist);
printf("0 revs found for rev %s (%s)\n", $rev, $file) if (@revs == 0);
return @revs;
@@ -194,22 +195,22 @@ sub git_rev_list {
sub find_parent_renames {
my ($rev, $file) = @_;
- open(P,"-|","git-diff-tree", "-M50", "-r","--name-status", "-z","$rev")
+ my $patch = open_pipe("git-diff-tree", "-M50", "-r","--name-status", "-z","$rev")
or die "Failed to exec git-diff: $!";
local $/ = "\0";
my %bound;
- my $junk = <P>;
- while (my $change = <P>) {
+ my $junk = <$patch>;
+ while (my $change = <$patch>) {
chomp $change;
- my $filename = <P>;
+ my $filename = <$patch>;
chomp $filename;
if ($change =~ m/^[AMD]$/ ) {
next;
} elsif ($change =~ m/^R/ ) {
my $oldfilename = $filename;
- $filename = <P>;
+ $filename = <$patch>;
chomp $filename;
if ( $file eq $filename ) {
my $parent = git_find_parent($rev, $oldfilename);
@@ -218,7 +219,7 @@ sub find_parent_renames {
}
}
}
- close(P);
+ close($patch);
return \%bound;
}
@@ -227,14 +228,14 @@ sub find_parent_renames {
sub git_find_parent {
my ($rev, $filename) = @_;
- open(REVPARENT,"-|","git-rev-list","--remove-empty", "--parents","--max-count=1","$rev","--",$filename)
+ my $revparent = open_pipe("git-rev-list","--remove-empty", "--parents","--max-count=1","$rev","--",$filename)
or die "Failed to open git-rev-list to find a single parent: $!";
- my $parentline = <REVPARENT>;
+ my $parentline = <$revparent>;
chomp $parentline;
my ($revfound,$parent) = split m/\s+/, $parentline;
- close(REVPARENT);
+ close($revparent);
return $parent;
}
@@ -245,13 +246,13 @@ sub git_find_parent {
sub git_diff_parse {
my ($parent, $rev, %revinfo) = @_;
- open(DIFF,"-|","git-diff-tree","-M","-p",$rev,$parent,"--",
+ my $diff = open_pipe("git-diff-tree","-M","-p",$rev,$parent,"--",
$revs{$rev}{'filename'}, $revs{$parent}{'filename'})
or die "Failed to call git-diff for annotation: $!";
- _git_diff_parse(*DIFF, $parent, $rev, %revinfo);
+ _git_diff_parse($diff, $parent, $rev, %revinfo);
- close(DIFF);
+ close($diff);
}
sub _git_diff_parse {
@@ -264,7 +265,7 @@ sub _git_diff_parse {
my $gotheader = 0;
my ($remstart);
my ($hunk_start, $hunk_index);
- while(<DIFF>) {
+ while(<$diff>) {
chomp;
if (m/^@@ -(\d+),(\d+) \+(\d+),(\d+)/) {
$remstart = $1;
@@ -335,15 +336,15 @@ sub git_cat_file {
my $blob = git_ls_tree($rev, $filename);
- open(C,"-|","git","cat-file", "blob", $blob)
+ my $catfile = open_pipe("git","cat-file", "blob", $blob)
or die "Failed to git-cat-file blob $blob (rev $rev, file $filename): " . $!;
my @lines;
- while(<C>) {
+ while(<$catfile>) {
chomp;
push @lines, $_;
}
- close(C);
+ close($catfile);
return @lines;
}
@@ -351,15 +352,15 @@ sub git_cat_file {
sub git_ls_tree {
my ($rev, $filename) = @_;
- open(T,"-|","git","ls-tree",$rev,$filename)
+ my $lstree = open_pipe("git","ls-tree",$rev,$filename)
or die "Failed to call git ls-tree: $!";
my ($mode, $type, $blob, $tfilename);
- while(<T>) {
+ while(<$lstree>) {
($mode, $type, $blob, $tfilename) = split(/\s+/, $_, 4);
last if ($tfilename eq $filename);
}
- close(T);
+ close($lstree);
return $blob if $filename eq $filename;
die "git-ls-tree failed to find blob for $filename";
@@ -379,11 +380,11 @@ sub claim_line {
sub git_commit_info {
my ($rev) = @_;
- open(COMMIT, "-|","git-cat-file", "commit", $rev)
+ my $commit = open_pipe("git-cat-file", "commit", $rev)
or die "Failed to call git-cat-file: $!";
my %info;
- while(<COMMIT>) {
+ while(<$commit>) {
chomp;
last if (length $_ == 0);
@@ -397,7 +398,7 @@ sub git_commit_info {
$info{'committer_date'} = $3;
}
}
- close(COMMIT);
+ close($commit);
return %info;
}
@@ -430,3 +431,17 @@ sub gitvar_name {
return join(' ', @field[0...(@field-4)]);
}
+
+sub open_pipe {
+ my (@execlist) = @_;
+
+ my $pid = open my $kid, "-|";
+ defined $pid or die "Cannot fork: $!";
+
+ unless ($pid) {
+ exec @execlist;
+ die "Cannot exec @execlist: $!";
+ }
+
+ return $kid;
+}
--
1.2.3.g9ca3
^ permalink raw reply related
* [PATCH] annotate: Handle dirty state and arbitrary revisions.
From: Ryan Anderson @ 2006-02-26 1:48 UTC (permalink / raw)
To: junkio; +Cc: git, Ryan Anderson
Also, use Getopt::Long and only process each rev once.
(Thanks to Morten Welinder for spotting the performance problems.)
Signed-off-by: Ryan Anderson <ryan@michonline.com>
---
Or pull from http://h4x0r5.com/~ryan/git/ryan.git/ annotate-upstream
git-annotate.perl | 150 ++++++++++++++++++++++++++++++++++++++++-------------
1 files changed, 113 insertions(+), 37 deletions(-)
9ca393b055b53df4eb9284b39f38c5b42b7f93b1
diff --git a/git-annotate.perl b/git-annotate.perl
index 3800c46..91da6d5 100755
--- a/git-annotate.perl
+++ b/git-annotate.perl
@@ -8,44 +8,62 @@
use warnings;
use strict;
-use Getopt::Std;
+use Getopt::Long;
use POSIX qw(strftime gmtime);
sub usage() {
- print STDERR 'Usage: ${\basename $0} [-s] [-S revs-file] file
-
- -l show long rev
- -r follow renames
- -S commit use revs from revs-file instead of calling git-rev-list
+ print STDERR 'Usage: ${\basename $0} [-s] [-S revs-file] file [ revision ]
+ -l, --long
+ Show long rev (Defaults off)
+ -r, --rename
+ Follow renames (Defaults on).
+ -S, --rev-file revs-file
+ use revs from revs-file instead of calling git-rev-list
+ -h, --help
+ This message.
';
exit(1);
}
-our ($opt_h, $opt_l, $opt_r, $opt_S);
-getopts("hlrS:") or usage();
-$opt_h && usage();
+our ($help, $longrev, $rename, $starting_rev, $rev_file) = (0, 0, 1);
+
+my $rc = GetOptions( "long|l" => \$longrev,
+ "help|h" => \$help,
+ "rename|r" => \$rename,
+ "rev-file|S" => \$rev_file);
+if (!$rc or $help) {
+ usage();
+}
my $filename = shift @ARGV;
+if (@ARGV) {
+ $starting_rev = shift @ARGV;
+}
my @stack = (
{
- 'rev' => "HEAD",
+ 'rev' => defined $starting_rev ? $starting_rev : "HEAD",
'filename' => $filename,
},
);
-our (@lineoffsets, @pendinglineoffsets);
our @filelines = ();
-open(F,"<",$filename)
- or die "Failed to open filename: $!";
-while(<F>) {
- chomp;
- push @filelines, $_;
+if (defined $starting_rev) {
+ @filelines = git_cat_file($starting_rev, $filename);
+} else {
+ open(F,"<",$filename)
+ or die "Failed to open filename: $!";
+
+ while(<F>) {
+ chomp;
+ push @filelines, $_;
+ }
+ close(F);
+
}
-close(F);
-our $leftover_lines = @filelines;
+
our %revs;
our @revqueue;
our $head;
@@ -66,7 +84,7 @@ while (my $bound = pop @stack) {
next;
}
- if (!$opt_r) {
+ if (!$rename) {
next;
}
@@ -78,8 +96,18 @@ while (my $bound = pop @stack) {
}
}
push @revqueue, $head;
-init_claim($head);
-$revs{$head}{'lineoffsets'} = {};
+init_claim( defined $starting_rev ? $starting_rev : 'dirty');
+unless (defined $starting_rev) {
+ open(DIFF,"-|","git","diff","-R", "HEAD", "--",$filename)
+ or die "Failed to call git diff to check for dirty state: $!";
+
+ _git_diff_parse(*DIFF, $head, "dirty", (
+ 'author' => gitvar_name("GIT_AUTHOR_IDENT"),
+ 'author_date' => sprintf("%s +0000",time()),
+ )
+ );
+ close(DIFF);
+}
handle_rev();
@@ -88,7 +116,7 @@ foreach my $l (@filelines) {
my ($output, $rev, $committer, $date);
if (ref $l eq 'ARRAY') {
($output, $rev, $committer, $date) = @$l;
- if (!$opt_l && length($rev) > 8) {
+ if (!$longrev && length($rev) > 8) {
$rev = substr($rev,0,8);
}
} else {
@@ -102,7 +130,6 @@ foreach my $l (@filelines) {
sub init_claim {
my ($rev) = @_;
- my %revinfo = git_commit_info($rev);
for (my $i = 0; $i < @filelines; $i++) {
$filelines[$i] = [ $filelines[$i], '', '', '', 1];
# line,
@@ -117,7 +144,9 @@ sub init_claim {
sub handle_rev {
my $i = 0;
+ my %seen;
while (my $rev = shift @revqueue) {
+ next if $seen{$rev}++;
my %revinfo = git_commit_info($rev);
@@ -143,8 +172,8 @@ sub handle_rev {
sub git_rev_list {
my ($rev, $file) = @_;
- if ($opt_S) {
- open(P, '<' . $opt_S);
+ if ($rev_file) {
+ open(P, '<' . $rev_file);
} else {
open(P,"-|","git-rev-list","--parents","--remove-empty",$rev,"--",$file)
or die "Failed to exec git-rev-list: $!";
@@ -216,24 +245,31 @@ sub git_find_parent {
sub git_diff_parse {
my ($parent, $rev, %revinfo) = @_;
- my ($ri, $pi) = (0,0);
open(DIFF,"-|","git-diff-tree","-M","-p",$rev,$parent,"--",
$revs{$rev}{'filename'}, $revs{$parent}{'filename'})
or die "Failed to call git-diff for annotation: $!";
+ _git_diff_parse(*DIFF, $parent, $rev, %revinfo);
+
+ close(DIFF);
+}
+
+sub _git_diff_parse {
+ my ($diff, $parent, $rev, %revinfo) = @_;
+
+ my ($ri, $pi) = (0,0);
my $slines = $revs{$rev}{'lines'};
my @plines;
my $gotheader = 0;
- my ($remstart, $remlength, $addstart, $addlength);
- my ($hunk_start, $hunk_index, $hunk_adds);
+ my ($remstart);
+ my ($hunk_start, $hunk_index);
while(<DIFF>) {
chomp;
if (m/^@@ -(\d+),(\d+) \+(\d+),(\d+)/) {
- ($remstart, $remlength, $addstart, $addlength) = ($1, $2, $3, $4);
+ $remstart = $1;
# Adjust for 0-based arrays
$remstart--;
- $addstart--;
# Reinit hunk tracking.
$hunk_start = $remstart;
$hunk_index = 0;
@@ -279,7 +315,6 @@ sub git_diff_parse {
}
$hunk_index++;
}
- close(DIFF);
for (my $i = $ri; $i < @{$slines} ; $i++) {
push @plines, $slines->[$ri++];
}
@@ -295,13 +330,13 @@ sub get_line {
}
sub git_cat_file {
- my ($parent, $filename) = @_;
- return () unless defined $parent && defined $filename;
- my $blobline = `git-ls-tree $parent $filename`;
- my ($mode, $type, $blob, $tfilename) = split(/\s+/, $blobline, 4);
+ my ($rev, $filename) = @_;
+ return () unless defined $rev && defined $filename;
- open(C,"-|","git-cat-file", "blob", $blob)
- or die "Failed to git-cat-file blob $blob (rev $parent, file $filename): " . $!;
+ my $blob = git_ls_tree($rev, $filename);
+
+ open(C,"-|","git","cat-file", "blob", $blob)
+ or die "Failed to git-cat-file blob $blob (rev $rev, file $filename): " . $!;
my @lines;
while(<C>) {
@@ -313,6 +348,25 @@ sub git_cat_file {
return @lines;
}
+sub git_ls_tree {
+ my ($rev, $filename) = @_;
+
+ open(T,"-|","git","ls-tree",$rev,$filename)
+ or die "Failed to call git ls-tree: $!";
+
+ my ($mode, $type, $blob, $tfilename);
+ while(<T>) {
+ ($mode, $type, $blob, $tfilename) = split(/\s+/, $_, 4);
+ last if ($tfilename eq $filename);
+ }
+ close(T);
+
+ return $blob if $filename eq $filename;
+ die "git-ls-tree failed to find blob for $filename";
+
+}
+
+
sub claim_line {
my ($floffset, $rev, $lines, %revinfo) = @_;
@@ -354,3 +408,25 @@ sub format_date {
return strftime("%Y-%m-%d %H:%M:%S " . $timezone, gmtime($timestamp));
}
+# Copied from git-send-email.perl - We need a Git.pm module..
+sub gitvar {
+ my ($var) = @_;
+ my $fh;
+ my $pid = open($fh, '-|');
+ die "$!" unless defined $pid;
+ if (!$pid) {
+ exec('git-var', $var) or die "$!";
+ }
+ my ($val) = <$fh>;
+ close $fh or die "$!";
+ chomp($val);
+ return $val;
+}
+
+sub gitvar_name {
+ my ($name) = @_;
+ my $val = gitvar($name);
+ my @field = split(/\s+/, $val);
+ return join(' ', @field[0...(@field-4)]);
+}
+
--
1.2.3.g9ca3
^ permalink raw reply related
* the war on trailing whitespace
From: Andrew Morton @ 2006-02-26 1:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
It's invariably pointless to add lines which have trailing whitespace.
Nobody cares much, but my scripts spam me when it happens, so I've become
obsessive. Looking at Dave Miller's current net devel tree:
bix:/usr/src/25> grep '^+.*[ ]$' patches/git-net.patch | wc -l
170
Note that this is purely _added_ trailing whitespace. I'm not proposing
that the revision control system should care about pre-existing trailing
whitespace.
I got the quilt guys to generate warnings when patches add trailing
whitespace, and to provide the tools to strip it. And I believe Larry made
similar changes to bk.
I realise that we cannot do this when doing git fetches, but when importing
patches and mboxes, git ought to whine loudly about input which matches the
above regexp, and it should offer an option to tidy it up. Perhaps by
default.
Of course, this means that the person who sent the patch will find that his
as-yet-unsent patches don't apply to the upstream tree any more. Well,
that's tough luck - perhaps it'll motivate him to stop adding trailing
whitespace. The patches will still apply with `patch -l'.
Thanks for listening ;)
^ permalink raw reply
* [PATCH] First cut at libifying revlist generation
From: Linus Torvalds @ 2006-02-26 0:19 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
This really just splits things up partially, and creates the
interface to set things up by parsign the command line.
No real code changes so far, although the parsing of filenames is a bit
stricter. In particular, if there is a "--", then we do not accept any
filenames before it, and if there isn't any "--", then we check that _all_
paths listed are valid, not just the first one.
The new argument parsing automatically also gives us "--default" and
"--not" handling as in git-rev-parse.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
The path checking just makes sense.
This also makes git-rev-list handle "--all" and "--not" correctly (before,
it wouldn't handle "--not"), and teaches it to handle "--default". I
didn't do "--since" and "--before", but I will eventually. The final end
result is that we won't need git-rev-parse for most things.
But basically there should be no code changes, and this is just a mid-way
point where a _partial_ set of routines from "git-rev-list" has been moved
into a library files - revision.c. The half-way point has some ugly
interfaces, but I don't want to do it all in one go.
The _plan_ is to:
- move the actual history traversal into revision.c too
- leave all the git-rev-list -specific stuff (the "--bisect" logic,
the print-out logic etc) in rev-list.c
- make it trivial to make a small C version of "git diff" that just uses
the same "setup_revisions()" interface and then looks at
"revs->commits" to see what revisions were passed in (the library
interface is already at the point where that should work)
- when the actual history _traversal_ is moved into "revision.c" too, it
should then be possible to make an equally small "git log" and friends
(whatchanged etc) into C using the revision.c code.
Comments? I think this is safe to apply, because I've done a diff of the
old non-split rev-list.c against both the new rev-list.c and revision.c,
and all the changes _look_ like just moving things around and taking the
new "struct rev_info" into account.
It also passes all the tests, and in general seems straightforward enough.
HOEVER, I'd still like to point out that I could have screwed something
up, and this is just about the most core program in all of git, so it
would make a lot of sense if more people double-checked my "trivial"
split-up.
Linus
diff --git a/Makefile b/Makefile
index 6c59cee..3575489 100644
--- a/Makefile
+++ b/Makefile
@@ -192,7 +192,7 @@ LIB_FILE=libgit.a
LIB_H = \
blob.h cache.h commit.h count-delta.h csum-file.h delta.h \
diff.h epoch.h object.h pack.h pkt-line.h quote.h refs.h \
- run-command.h strbuf.h tag.h tree.h git-compat-util.h
+ run-command.h strbuf.h tag.h tree.h git-compat-util.h revision.h
DIFF_OBJS = \
diff.o diffcore-break.o diffcore-order.o diffcore-pathspec.o \
@@ -205,7 +205,7 @@ LIB_OBJS = \
quote.o read-cache.o refs.o run-command.o \
server-info.o setup.o sha1_file.o sha1_name.o strbuf.o \
tag.o tree.o usage.o config.o environment.o ctype.o copy.o \
- fetch-clone.o \
+ fetch-clone.o revision.o \
$(DIFF_OBJS)
LIBS = $(LIB_FILE)
diff --git a/epoch.c b/epoch.c
index 3a76748..0f37492 100644
--- a/epoch.c
+++ b/epoch.c
@@ -15,6 +15,7 @@
#include "cache.h"
#include "commit.h"
+#include "revision.h"
#include "epoch.h"
struct fraction {
diff --git a/epoch.h b/epoch.h
index 7493d5a..3756009 100644
--- a/epoch.h
+++ b/epoch.h
@@ -11,7 +11,6 @@ typedef int (*emitter_func) (struct comm
int sort_list_in_merge_order(struct commit_list *list, emitter_func emitter);
/* Low bits are used by rev-list */
-#define UNINTERESTING (1u<<10)
#define BOUNDARY (1u<<11)
#define VISITED (1u<<12)
#define DISCONTINUITY (1u<<13)
diff --git a/rev-list.c b/rev-list.c
index 67d2a48..d1c52a6 100644
--- a/rev-list.c
+++ b/rev-list.c
@@ -6,9 +6,10 @@
#include "blob.h"
#include "epoch.h"
#include "diff.h"
+#include "revision.h"
+
+/* bits #0 and #1 in revision.h */
-#define SEEN (1u << 0)
-#define INTERESTING (1u << 1)
#define COUNTED (1u << 2)
#define SHOWN (1u << 3)
#define TREECHANGE (1u << 4)
@@ -38,60 +39,20 @@ static const char rev_list_usage[] =
" --bisect"
;
-static int dense = 1;
+struct rev_info revs;
+
static int unpacked = 0;
static int bisect_list = 0;
-static int tag_objects = 0;
-static int tree_objects = 0;
-static int blob_objects = 0;
-static int edge_hint = 0;
static int verbose_header = 0;
static int abbrev = DEFAULT_ABBREV;
static int show_parents = 0;
static int hdr_termination = 0;
static const char *commit_prefix = "";
-static unsigned long max_age = -1;
-static unsigned long min_age = -1;
-static int max_count = -1;
static enum cmit_fmt commit_format = CMIT_FMT_RAW;
static int merge_order = 0;
static int show_breaks = 0;
static int stop_traversal = 0;
-static int topo_order = 0;
-static int lifo = 1;
static int no_merges = 0;
-static const char **paths = NULL;
-static int remove_empty_trees = 0;
-
-struct name_path {
- struct name_path *up;
- int elem_len;
- const char *elem;
-};
-
-static char *path_name(struct name_path *path, const char *name)
-{
- struct name_path *p;
- char *n, *m;
- int nlen = strlen(name);
- int len = nlen + 1;
-
- for (p = path; p; p = p->up) {
- if (p->elem_len)
- len += p->elem_len + 1;
- }
- n = xmalloc(len);
- m = n + len - (nlen + 1);
- strcpy(m, name);
- for (p = path; p; p = p->up) {
- if (p->elem_len) {
- m -= p->elem_len + 1;
- memcpy(m, p->elem, p->elem_len);
- m[p->elem_len] = '/';
- }
- }
- return n;
-}
static void show_commit(struct commit *commit)
{
@@ -168,15 +129,15 @@ static int filter_commit(struct commit *
return STOP;
if (commit->object.flags & (UNINTERESTING|SHOWN))
return CONTINUE;
- if (min_age != -1 && (commit->date > min_age))
+ if (revs.min_age != -1 && (commit->date > revs.min_age))
return CONTINUE;
- if (max_age != -1 && (commit->date < max_age)) {
+ if (revs.max_age != -1 && (commit->date < revs.max_age)) {
stop_traversal=1;
return CONTINUE;
}
if (no_merges && (commit->parents && commit->parents->next))
return CONTINUE;
- if (paths && dense) {
+ if (revs.paths && revs.dense) {
if (!(commit->object.flags & TREECHANGE))
return CONTINUE;
rewrite_parents(commit);
@@ -196,7 +157,7 @@ static int process_commit(struct commit
return CONTINUE;
}
- if (max_count != -1 && !max_count--)
+ if (revs.max_count != -1 && !revs.max_count--)
return STOP;
show_commit(commit);
@@ -204,19 +165,6 @@ static int process_commit(struct commit
return CONTINUE;
}
-static struct object_list **add_object(struct object *obj,
- struct object_list **p,
- struct name_path *path,
- const char *name)
-{
- struct object_list *entry = xmalloc(sizeof(*entry));
- entry->item = obj;
- entry->next = *p;
- entry->name = path_name(path, name);
- *p = entry;
- return &entry->next;
-}
-
static struct object_list **process_blob(struct blob *blob,
struct object_list **p,
struct name_path *path,
@@ -224,7 +172,7 @@ static struct object_list **process_blob
{
struct object *obj = &blob->object;
- if (!blob_objects)
+ if (!revs.blob_objects)
return p;
if (obj->flags & (UNINTERESTING | SEEN))
return p;
@@ -241,7 +189,7 @@ static struct object_list **process_tree
struct tree_entry_list *entry;
struct name_path me;
- if (!tree_objects)
+ if (!revs.tree_objects)
return p;
if (obj->flags & (UNINTERESTING | SEEN))
return p;
@@ -314,75 +262,6 @@ static void show_commit_list(struct comm
}
}
-static void mark_blob_uninteresting(struct blob *blob)
-{
- if (!blob_objects)
- return;
- if (blob->object.flags & UNINTERESTING)
- return;
- blob->object.flags |= UNINTERESTING;
-}
-
-static void mark_tree_uninteresting(struct tree *tree)
-{
- struct object *obj = &tree->object;
- struct tree_entry_list *entry;
-
- if (!tree_objects)
- return;
- if (obj->flags & UNINTERESTING)
- return;
- obj->flags |= UNINTERESTING;
- if (!has_sha1_file(obj->sha1))
- return;
- if (parse_tree(tree) < 0)
- die("bad tree %s", sha1_to_hex(obj->sha1));
- entry = tree->entries;
- tree->entries = NULL;
- while (entry) {
- struct tree_entry_list *next = entry->next;
- if (entry->directory)
- mark_tree_uninteresting(entry->item.tree);
- else
- mark_blob_uninteresting(entry->item.blob);
- free(entry);
- entry = next;
- }
-}
-
-static void mark_parents_uninteresting(struct commit *commit)
-{
- struct commit_list *parents = commit->parents;
-
- while (parents) {
- struct commit *commit = parents->item;
- commit->object.flags |= UNINTERESTING;
-
- /*
- * Normally we haven't parsed the parent
- * yet, so we won't have a parent of a parent
- * here. However, it may turn out that we've
- * reached this commit some other way (where it
- * wasn't uninteresting), in which case we need
- * to mark its parents recursively too..
- */
- if (commit->parents)
- mark_parents_uninteresting(commit);
-
- /*
- * A missing commit is ok iff its parent is marked
- * uninteresting.
- *
- * We just mark such a thing parsed, so that when
- * it is popped next time around, we won't be trying
- * to parse it and get an error.
- */
- if (!has_sha1_file(commit->object.sha1))
- commit->object.parsed = 1;
- parents = parents->next;
- }
-}
-
static int everybody_uninteresting(struct commit_list *orig)
{
struct commit_list *list = orig;
@@ -413,7 +292,7 @@ static int count_distance(struct commit_
if (commit->object.flags & (UNINTERESTING | COUNTED))
break;
- if (!paths || (commit->object.flags & TREECHANGE))
+ if (!revs.paths || (commit->object.flags & TREECHANGE))
nr++;
commit->object.flags |= COUNTED;
p = commit->parents;
@@ -447,7 +326,7 @@ static struct commit_list *find_bisectio
nr = 0;
p = list;
while (p) {
- if (!paths || (p->item->object.flags & TREECHANGE))
+ if (!revs.paths || (p->item->object.flags & TREECHANGE))
nr++;
p = p->next;
}
@@ -457,7 +336,7 @@ static struct commit_list *find_bisectio
for (p = list; p; p = p->next) {
int distance;
- if (paths && !(p->item->object.flags & TREECHANGE))
+ if (revs.paths && !(p->item->object.flags & TREECHANGE))
continue;
distance = count_distance(p);
@@ -483,7 +362,7 @@ static void mark_edge_parents_uninterest
if (!(parent->object.flags & UNINTERESTING))
continue;
mark_tree_uninteresting(parent->tree);
- if (edge_hint && !(parent->object.flags & SHOWN)) {
+ if (revs.edge_hint && !(parent->object.flags & SHOWN)) {
parent->object.flags |= SHOWN;
printf("-%s\n", sha1_to_hex(parent->object.sha1));
}
@@ -613,7 +492,7 @@ static void try_to_simplify_commit(struc
return;
case TREE_NEW:
- if (remove_empty_trees && same_tree_as_empty(p->tree)) {
+ if (revs.remove_empty_trees && same_tree_as_empty(p->tree)) {
*pp = parent->next;
continue;
}
@@ -664,7 +543,7 @@ static void add_parents_to_list(struct c
* simplify the commit history and find the parent
* that has no differences in the path set if one exists.
*/
- if (paths)
+ if (revs.paths)
try_to_simplify_commit(commit);
parent = commit->parents;
@@ -693,7 +572,7 @@ static struct commit_list *limit_list(st
list = list->next;
free(entry);
- if (max_age != -1 && (commit->date < max_age))
+ if (revs.max_age != -1 && (commit->date < revs.max_age))
obj->flags |= UNINTERESTING;
if (unpacked && has_sha1_pack(obj->sha1))
obj->flags |= UNINTERESTING;
@@ -704,155 +583,40 @@ static struct commit_list *limit_list(st
break;
continue;
}
- if (min_age != -1 && (commit->date > min_age))
+ if (revs.min_age != -1 && (commit->date > revs.min_age))
continue;
p = &commit_list_insert(commit, p)->next;
}
- if (tree_objects)
+ if (revs.tree_objects)
mark_edges_uninteresting(newlist);
if (bisect_list)
newlist = find_bisection(newlist);
return newlist;
}
-static void add_pending_object(struct object *obj, const char *name)
-{
- add_object(obj, &pending_objects, NULL, name);
-}
-
-static struct commit *get_commit_reference(const char *name, const unsigned char *sha1, unsigned int flags)
-{
- struct object *object;
-
- object = parse_object(sha1);
- if (!object)
- die("bad object %s", name);
-
- /*
- * Tag object? Look what it points to..
- */
- while (object->type == tag_type) {
- struct tag *tag = (struct tag *) object;
- object->flags |= flags;
- if (tag_objects && !(object->flags & UNINTERESTING))
- add_pending_object(object, tag->tag);
- object = parse_object(tag->tagged->sha1);
- if (!object)
- die("bad object %s", sha1_to_hex(tag->tagged->sha1));
- }
-
- /*
- * Commit object? Just return it, we'll do all the complex
- * reachability crud.
- */
- if (object->type == commit_type) {
- struct commit *commit = (struct commit *)object;
- object->flags |= flags;
- if (parse_commit(commit) < 0)
- die("unable to parse commit %s", name);
- if (flags & UNINTERESTING)
- mark_parents_uninteresting(commit);
- return commit;
- }
-
- /*
- * Tree object? Either mark it uniniteresting, or add it
- * to the list of objects to look at later..
- */
- if (object->type == tree_type) {
- struct tree *tree = (struct tree *)object;
- if (!tree_objects)
- return NULL;
- if (flags & UNINTERESTING) {
- mark_tree_uninteresting(tree);
- return NULL;
- }
- add_pending_object(object, "");
- return NULL;
- }
-
- /*
- * Blob object? You know the drill by now..
- */
- if (object->type == blob_type) {
- struct blob *blob = (struct blob *)object;
- if (!blob_objects)
- return NULL;
- if (flags & UNINTERESTING) {
- mark_blob_uninteresting(blob);
- return NULL;
- }
- add_pending_object(object, "");
- return NULL;
- }
- die("%s is unknown object", name);
-}
-
-static void handle_one_commit(struct commit *com, struct commit_list **lst)
-{
- if (!com || com->object.flags & SEEN)
- return;
- com->object.flags |= SEEN;
- commit_list_insert(com, lst);
-}
-
-/* for_each_ref() callback does not allow user data -- Yuck. */
-static struct commit_list **global_lst;
-
-static int include_one_commit(const char *path, const unsigned char *sha1)
-{
- struct commit *com = get_commit_reference(path, sha1, 0);
- handle_one_commit(com, global_lst);
- return 0;
-}
-
-static void handle_all(struct commit_list **lst)
-{
- global_lst = lst;
- for_each_ref(include_one_commit);
- global_lst = NULL;
-}
-
int main(int argc, const char **argv)
{
- const char *prefix = setup_git_directory();
- struct commit_list *list = NULL;
+ struct commit_list *list;
int i, limited = 0;
+ argc = setup_revisions(argc, argv, &revs);
+
for (i = 1 ; i < argc; i++) {
- int flags;
const char *arg = argv[i];
- char *dotdot;
- struct commit *commit;
- unsigned char sha1[20];
/* accept -<digit>, like traditilnal "head" */
if ((*arg == '-') && isdigit(arg[1])) {
- max_count = atoi(arg + 1);
+ revs.max_count = atoi(arg + 1);
continue;
}
if (!strcmp(arg, "-n")) {
if (++i >= argc)
die("-n requires an argument");
- max_count = atoi(argv[i]);
+ revs.max_count = atoi(argv[i]);
continue;
}
if (!strncmp(arg,"-n",2)) {
- max_count = atoi(arg + 2);
- continue;
- }
- if (!strncmp(arg, "--max-count=", 12)) {
- max_count = atoi(arg + 12);
- continue;
- }
- if (!strncmp(arg, "--max-age=", 10)) {
- max_age = atoi(arg + 10);
- limited = 1;
- continue;
- }
- if (!strncmp(arg, "--min-age=", 10)) {
- min_age = atoi(arg + 10);
- limited = 1;
+ revs.max_count = atoi(arg + 2);
continue;
}
if (!strcmp(arg, "--header")) {
@@ -893,23 +657,6 @@ int main(int argc, const char **argv)
bisect_list = 1;
continue;
}
- if (!strcmp(arg, "--all")) {
- handle_all(&list);
- continue;
- }
- if (!strcmp(arg, "--objects")) {
- tag_objects = 1;
- tree_objects = 1;
- blob_objects = 1;
- continue;
- }
- if (!strcmp(arg, "--objects-edge")) {
- tag_objects = 1;
- tree_objects = 1;
- blob_objects = 1;
- edge_hint = 1;
- continue;
- }
if (!strcmp(arg, "--unpacked")) {
unpacked = 1;
limited = 1;
@@ -923,100 +670,42 @@ int main(int argc, const char **argv)
show_breaks = 1;
continue;
}
- if (!strcmp(arg, "--topo-order")) {
- topo_order = 1;
- lifo = 1;
- limited = 1;
- continue;
- }
- if (!strcmp(arg, "--date-order")) {
- topo_order = 1;
- lifo = 0;
- limited = 1;
- continue;
- }
- if (!strcmp(arg, "--dense")) {
- dense = 1;
- continue;
- }
- if (!strcmp(arg, "--sparse")) {
- dense = 0;
- continue;
- }
- if (!strcmp(arg, "--remove-empty")) {
- remove_empty_trees = 1;
- continue;
- }
- if (!strcmp(arg, "--")) {
- i++;
- break;
- }
-
- if (show_breaks && !merge_order)
- usage(rev_list_usage);
+ usage(rev_list_usage);
- flags = 0;
- dotdot = strstr(arg, "..");
- if (dotdot) {
- unsigned char from_sha1[20];
- char *next = dotdot + 2;
- *dotdot = 0;
- if (!*next)
- next = "HEAD";
- if (!get_sha1(arg, from_sha1) && !get_sha1(next, sha1)) {
- struct commit *exclude;
- struct commit *include;
-
- exclude = get_commit_reference(arg, from_sha1, UNINTERESTING);
- include = get_commit_reference(next, sha1, 0);
- if (!exclude || !include)
- die("Invalid revision range %s..%s", arg, next);
- limited = 1;
- handle_one_commit(exclude, &list);
- handle_one_commit(include, &list);
- continue;
- }
- *dotdot = '.';
- }
- if (*arg == '^') {
- flags = UNINTERESTING;
- arg++;
- limited = 1;
- }
- if (get_sha1(arg, sha1) < 0) {
- struct stat st;
- if (lstat(arg, &st) < 0)
- die("'%s': %s", arg, strerror(errno));
- break;
- }
- commit = get_commit_reference(arg, sha1, flags);
- handle_one_commit(commit, &list);
}
+ list = revs.commits;
+ if (list && list->next)
+ limited = 1;
+
+ if (revs.topo_order)
+ limited = 1;
+
if (!list &&
- (!(tag_objects||tree_objects||blob_objects) && !pending_objects))
+ (!(revs.tag_objects||revs.tree_objects||revs.blob_objects) && !revs.pending_objects))
usage(rev_list_usage);
- paths = get_pathspec(prefix, argv + i);
- if (paths) {
+ if (revs.paths) {
limited = 1;
- diff_tree_setup_paths(paths);
+ diff_tree_setup_paths(revs.paths);
}
+ if (revs.max_age || revs.min_age)
+ limited = 1;
save_commit_buffer = verbose_header;
track_object_refs = 0;
if (!merge_order) {
sort_by_date(&list);
- if (list && !limited && max_count == 1 &&
- !tag_objects && !tree_objects && !blob_objects) {
+ if (list && !limited && revs.max_count == 1 &&
+ !revs.tag_objects && !revs.tree_objects && !revs.blob_objects) {
show_commit(list->item);
return 0;
}
if (limited)
list = limit_list(list);
- if (topo_order)
- sort_in_topological_order(&list, lifo);
+ if (revs.topo_order)
+ sort_in_topological_order(&list, revs.lifo);
show_commit_list(list);
} else {
#ifndef NO_OPENSSL
diff --git a/revision.c b/revision.c
new file mode 100644
index 0000000..17dbf9a
--- /dev/null
+++ b/revision.c
@@ -0,0 +1,370 @@
+#include "cache.h"
+#include "tag.h"
+#include "blob.h"
+#include "tree.h"
+#include "commit.h"
+#include "refs.h"
+#include "revision.h"
+
+static char *path_name(struct name_path *path, const char *name)
+{
+ struct name_path *p;
+ char *n, *m;
+ int nlen = strlen(name);
+ int len = nlen + 1;
+
+ for (p = path; p; p = p->up) {
+ if (p->elem_len)
+ len += p->elem_len + 1;
+ }
+ n = xmalloc(len);
+ m = n + len - (nlen + 1);
+ strcpy(m, name);
+ for (p = path; p; p = p->up) {
+ if (p->elem_len) {
+ m -= p->elem_len + 1;
+ memcpy(m, p->elem, p->elem_len);
+ m[p->elem_len] = '/';
+ }
+ }
+ return n;
+}
+
+struct object_list **add_object(struct object *obj,
+ struct object_list **p,
+ struct name_path *path,
+ const char *name)
+{
+ struct object_list *entry = xmalloc(sizeof(*entry));
+ entry->item = obj;
+ entry->next = *p;
+ entry->name = path_name(path, name);
+ *p = entry;
+ return &entry->next;
+}
+
+static void mark_blob_uninteresting(struct blob *blob)
+{
+ if (blob->object.flags & UNINTERESTING)
+ return;
+ blob->object.flags |= UNINTERESTING;
+}
+
+void mark_tree_uninteresting(struct tree *tree)
+{
+ struct object *obj = &tree->object;
+ struct tree_entry_list *entry;
+
+ if (obj->flags & UNINTERESTING)
+ return;
+ obj->flags |= UNINTERESTING;
+ if (!has_sha1_file(obj->sha1))
+ return;
+ if (parse_tree(tree) < 0)
+ die("bad tree %s", sha1_to_hex(obj->sha1));
+ entry = tree->entries;
+ tree->entries = NULL;
+ while (entry) {
+ struct tree_entry_list *next = entry->next;
+ if (entry->directory)
+ mark_tree_uninteresting(entry->item.tree);
+ else
+ mark_blob_uninteresting(entry->item.blob);
+ free(entry);
+ entry = next;
+ }
+}
+
+void mark_parents_uninteresting(struct commit *commit)
+{
+ struct commit_list *parents = commit->parents;
+
+ while (parents) {
+ struct commit *commit = parents->item;
+ commit->object.flags |= UNINTERESTING;
+
+ /*
+ * Normally we haven't parsed the parent
+ * yet, so we won't have a parent of a parent
+ * here. However, it may turn out that we've
+ * reached this commit some other way (where it
+ * wasn't uninteresting), in which case we need
+ * to mark its parents recursively too..
+ */
+ if (commit->parents)
+ mark_parents_uninteresting(commit);
+
+ /*
+ * A missing commit is ok iff its parent is marked
+ * uninteresting.
+ *
+ * We just mark such a thing parsed, so that when
+ * it is popped next time around, we won't be trying
+ * to parse it and get an error.
+ */
+ if (!has_sha1_file(commit->object.sha1))
+ commit->object.parsed = 1;
+ parents = parents->next;
+ }
+}
+
+static void add_pending_object(struct rev_info *revs, struct object *obj, const char *name)
+{
+ add_object(obj, &revs->pending_objects, NULL, name);
+}
+
+static struct commit *get_commit_reference(struct rev_info *revs, const char *name, const unsigned char *sha1, unsigned int flags)
+{
+ struct object *object;
+
+ object = parse_object(sha1);
+ if (!object)
+ die("bad object %s", name);
+
+ /*
+ * Tag object? Look what it points to..
+ */
+ while (object->type == tag_type) {
+ struct tag *tag = (struct tag *) object;
+ object->flags |= flags;
+ if (revs->tag_objects && !(object->flags & UNINTERESTING))
+ add_pending_object(revs, object, tag->tag);
+ object = parse_object(tag->tagged->sha1);
+ if (!object)
+ die("bad object %s", sha1_to_hex(tag->tagged->sha1));
+ }
+
+ /*
+ * Commit object? Just return it, we'll do all the complex
+ * reachability crud.
+ */
+ if (object->type == commit_type) {
+ struct commit *commit = (struct commit *)object;
+ object->flags |= flags;
+ if (parse_commit(commit) < 0)
+ die("unable to parse commit %s", name);
+ if (flags & UNINTERESTING)
+ mark_parents_uninteresting(commit);
+ return commit;
+ }
+
+ /*
+ * Tree object? Either mark it uniniteresting, or add it
+ * to the list of objects to look at later..
+ */
+ if (object->type == tree_type) {
+ struct tree *tree = (struct tree *)object;
+ if (!revs->tree_objects)
+ return NULL;
+ if (flags & UNINTERESTING) {
+ mark_tree_uninteresting(tree);
+ return NULL;
+ }
+ add_pending_object(revs, object, "");
+ return NULL;
+ }
+
+ /*
+ * Blob object? You know the drill by now..
+ */
+ if (object->type == blob_type) {
+ struct blob *blob = (struct blob *)object;
+ if (!revs->blob_objects)
+ return NULL;
+ if (flags & UNINTERESTING) {
+ mark_blob_uninteresting(blob);
+ return NULL;
+ }
+ add_pending_object(revs, object, "");
+ return NULL;
+ }
+ die("%s is unknown object", name);
+}
+
+static void add_one_commit(struct commit *commit, struct rev_info *revs)
+{
+ if (!commit || (commit->object.flags & SEEN))
+ return;
+ commit->object.flags |= SEEN;
+ commit_list_insert(commit, &revs->commits);
+}
+
+static int all_flags;
+static struct rev_info *all_revs;
+
+static int handle_one_ref(const char *path, const unsigned char *sha1)
+{
+ struct commit *commit = get_commit_reference(all_revs, path, sha1, all_flags);
+ add_one_commit(commit, all_revs);
+ return 0;
+}
+
+static void handle_all(struct rev_info *revs, unsigned flags)
+{
+ all_revs = revs;
+ all_flags = flags;
+ for_each_ref(handle_one_ref);
+}
+
+/*
+ * Parse revision information, filling in the "rev_info" structure,
+ * and removing the used arguments from the argument list.
+ *
+ * Returns the number of arguments left ("new argc").
+ */
+int setup_revisions(int argc, const char **argv, struct rev_info *revs)
+{
+ int i, flags, seen_dashdash;
+ const char *def = NULL;
+ const char **unrecognized = argv+1;
+ int left = 1;
+
+ memset(revs, 0, sizeof(*revs));
+ revs->lifo = 1;
+ revs->dense = 1;
+ revs->prefix = setup_git_directory();
+ revs->max_age = -1;
+ revs->min_age = -1;
+ revs->max_count = -1;
+
+ /* First, search for "--" */
+ seen_dashdash = 0;
+ for (i = 1; i < argc; i++) {
+ const char *arg = argv[i];
+ if (strcmp(arg, "--"))
+ continue;
+ argv[i] = NULL;
+ argc = i;
+ revs->paths = get_pathspec(revs->prefix, argv + i + 1);
+ seen_dashdash = 1;
+ break;
+ }
+
+ flags = 0;
+ for (i = 1; i < argc; i++) {
+ struct commit *commit;
+ const char *arg = argv[i];
+ unsigned char sha1[20];
+ char *dotdot;
+ int local_flags;
+
+ if (*arg == '-') {
+ if (!strncmp(arg, "--max-count=", 12)) {
+ revs->max_count = atoi(arg + 12);
+ continue;
+ }
+ if (!strncmp(arg, "--max-age=", 10)) {
+ revs->max_age = atoi(arg + 10);
+ continue;
+ }
+ if (!strncmp(arg, "--min-age=", 10)) {
+ revs->min_age = atoi(arg + 10);
+ continue;
+ }
+ if (!strcmp(arg, "--all")) {
+ handle_all(revs, flags);
+ continue;
+ }
+ if (!strcmp(arg, "--not")) {
+ flags ^= UNINTERESTING;
+ continue;
+ }
+ if (!strcmp(arg, "--default")) {
+ if (++i >= argc)
+ die("bad --default argument");
+ def = argv[i];
+ continue;
+ }
+ if (!strcmp(arg, "--topo-order")) {
+ revs->topo_order = 1;
+ continue;
+ }
+ if (!strcmp(arg, "--date-order")) {
+ revs->lifo = 0;
+ revs->topo_order = 1;
+ continue;
+ }
+ if (!strcmp(arg, "--dense")) {
+ revs->dense = 1;
+ continue;
+ }
+ if (!strcmp(arg, "--sparse")) {
+ revs->dense = 0;
+ continue;
+ }
+ if (!strcmp(arg, "--remove-empty")) {
+ revs->remove_empty_trees = 1;
+ continue;
+ }
+ if (!strcmp(arg, "--objects")) {
+ revs->tag_objects = 1;
+ revs->tree_objects = 1;
+ revs->blob_objects = 1;
+ continue;
+ }
+ if (!strcmp(arg, "--objects-edge")) {
+ revs->tag_objects = 1;
+ revs->tree_objects = 1;
+ revs->blob_objects = 1;
+ revs->edge_hint = 1;
+ continue;
+ }
+ *unrecognized++ = arg;
+ left++;
+ continue;
+ }
+ dotdot = strstr(arg, "..");
+ if (dotdot) {
+ unsigned char from_sha1[20];
+ char *next = dotdot + 2;
+ *dotdot = 0;
+ if (!*next)
+ next = "HEAD";
+ if (!get_sha1(arg, from_sha1) && !get_sha1(next, sha1)) {
+ struct commit *exclude;
+ struct commit *include;
+
+ exclude = get_commit_reference(revs, arg, from_sha1, flags ^ UNINTERESTING);
+ include = get_commit_reference(revs, next, sha1, flags);
+ if (!exclude || !include)
+ die("Invalid revision range %s..%s", arg, next);
+ add_one_commit(exclude, revs);
+ add_one_commit(include, revs);
+ continue;
+ }
+ *dotdot = '.';
+ }
+ local_flags = 0;
+ if (*arg == '^') {
+ local_flags = UNINTERESTING;
+ arg++;
+ }
+ if (get_sha1(arg, sha1) < 0) {
+ struct stat st;
+ int j;
+
+ if (seen_dashdash || local_flags)
+ die("bad revision '%s'", arg);
+
+ /* If we didn't have a "--", all filenames must exist */
+ for (j = i; j < argc; j++) {
+ if (lstat(argv[j], &st) < 0)
+ die("'%s': %s", arg, strerror(errno));
+ }
+ revs->paths = get_pathspec(revs->prefix, argv + i);
+ break;
+ }
+ commit = get_commit_reference(revs, arg, sha1, flags ^ local_flags);
+ add_one_commit(commit, revs);
+ }
+ if (def && !revs->commits) {
+ unsigned char sha1[20];
+ struct commit *commit;
+ if (get_sha1(def, sha1) < 0)
+ die("bad default revision '%s'", def);
+ commit = get_commit_reference(revs, def, sha1, 0);
+ add_one_commit(commit, revs);
+ }
+ *unrecognized = NULL;
+ return left;
+}
diff --git a/revision.h b/revision.h
new file mode 100644
index 0000000..5170ac4
--- /dev/null
+++ b/revision.h
@@ -0,0 +1,48 @@
+#ifndef REVISION_H
+#define REVISION_H
+
+#define SEEN (1u<<0)
+#define UNINTERESTING (1u<<1)
+
+struct rev_info {
+ /* Starting list */
+ struct commit_list *commits;
+ struct object_list *pending_objects;
+
+ /* Basic information */
+ const char *prefix;
+ const char **paths;
+
+ /* Traversal flags */
+ unsigned int dense:1,
+ remove_empty_trees:1,
+ lifo:1,
+ topo_order:1,
+ tag_objects:1,
+ tree_objects:1,
+ blob_objects:1,
+ edge_hint:1;
+
+ /* special limits */
+ int max_count;
+ unsigned long max_age;
+ unsigned long min_age;
+};
+
+/* revision.c */
+extern int setup_revisions(int argc, const char **argv, struct rev_info *revs);
+extern void mark_parents_uninteresting(struct commit *commit);
+extern void mark_tree_uninteresting(struct tree *tree);
+
+struct name_path {
+ struct name_path *up;
+ int elem_len;
+ const char *elem;
+};
+
+extern struct object_list **add_object(struct object *obj,
+ struct object_list **p,
+ struct name_path *path,
+ const char *name);
+
+#endif
^ permalink raw reply related
* Re: [PATCH] git-fetch: print the new and old ref when fast-forwarding
From: Junio C Hamano @ 2006-02-25 20:53 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Lukas Sandström, Git Mailing List
In-Reply-To: <Pine.LNX.4.63.0602252107110.17999@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> This is useful when you check out new changes with gitk.
>> Just copy/paste the old ref into gitk from the terminal.
>
> Why does "gitk ORIG_HEAD..HEAD" not work? (It also does the correct thing
> when pulling...)
For most projects and repositories with single interesting head,
that would work just fine.
If you use additional Pull: lines to track more than one remote
refs, this patch would help.
For example, if you are tracking my "next" while keeping an eye
on my "master" and "pu", your .git/remotes/origin file may have
something like this:
URL: git://git.kernel.org/pub/scm/git/git.git
Pull: next:origin
Pull: master:ko-master
Pull: pu:ko-pu
When "git pull origin" pulls my next branch into your current
branch (typically "master"), it also fast forwards your tracking
branches ko-master and ko-pu. If you want to see what I merged
in the meantime, you would want to get the old value of
ko-master and the new value and feed them to gitk (or git log).
ORIG_HEAD in this case was the old value of _your_ current
branch head, and is not useful to see what happened to my master
branch.
^ permalink raw reply
* Re: [PATCH] git-fetch: print the new and old ref when fast-forwarding
From: Johannes Schindelin @ 2006-02-25 20:08 UTC (permalink / raw)
To: Lukas Sandström; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <44003D6D.2010406@etek.chalmers.se>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 273 bytes --]
Hi,
On Sat, 25 Feb 2006, Lukas Sandström wrote:
> This is useful when you check out new changes with gitk.
> Just copy/paste the old ref into gitk from the terminal.
Why does "gitk ORIG_HEAD..HEAD" not work? (It also does the correct thing
when pulling...)
Hth,
Dscho
^ permalink raw reply
* Re: [PATCH] diff-delta: produce optimal pack data
From: Linus Torvalds @ 2006-02-25 19:18 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, Carl Baldwin
In-Reply-To: <Pine.LNX.4.64.0602242326381.31162@localhost.localdomain>
On Sat, 25 Feb 2006, Nicolas Pitre wrote:
>
> Yes, the hash is larger. There is a cost in memory usage but not really
> in CPU cycles.
Note that memory usage translates almost 1:1 (or worse) to CPU cycles in
almost all real-life behaviours. Only in carefully tuned benchmarks does
it not.
Increased memory usage means more paging, and worse cache behaviour. Now,
hashes aren't wonderful for caches in the first place, but imagine the
hump you pass when the data doesn't fit in a 64kB L1 any more (or a 256kB
L2). Huge.
> > You'll find a lot of that in any file: three or four bytes of similarity
> > just doesn't sound worthwhile to go digging after.
>
> Well after having experimented a lot with multiple parameters I think
> they are worth it after all. Not only they provide for optimal deltas,
> but their hash is faster to compute than larger blocks which seems to
> counter balance for the cost of increased hash list.
Hey, numbers talk. If you've got the numbers, I'll just shut up ;)
Linus
^ permalink raw reply
* [PATCH] git-fetch: print the new and old ref when fast-forwarding
From: Lukas Sandström @ 2006-02-25 11:20 UTC (permalink / raw)
To: Git Mailing List, Junio C Hamano
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
---
This is useful when you check out new changes with gitk.
Just copy/paste the old ref into gitk from the terminal.
git-fetch.sh | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
78f2844c17bb8627e84389e3026906d074c43363
diff --git a/git-fetch.sh b/git-fetch.sh
index fcc24f8..a3e2a34 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -164,6 +164,7 @@ fast_forward_local () {
;;
*,$local)
echo >&2 "* $1: fast forward to $3"
+ echo >&2 " from $local to $2"
git-update-ref "$1" "$2" "$local"
;;
*)
--
1.2.3.gc412
^ permalink raw reply related
* Re: gitview: Use monospace font to draw the branch and tag name
From: Aneesh Kumar @ 2006-02-25 7:00 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vzmkg6kj7.fsf@assigned-by-dhcp.cox.net>
On 2/25/06, Junio C Hamano <junkio@cox.net> wrote:
>
> >> I have an impression that hardcoding UI policy like this is
> >> generally frowned upon.
> >
> > But with that changes branch and the tag name looks neat.
>
> The point being that it only looks neat on your display, with
> its size and resolution, and to your eye.
>
> People have different tastes and equipments. One thing I really
> liked gitk (I do not know if gitview does this as well -- if so
> I am happy) is I can do - and + to change the text size
> depending on what display I am working on.
>
>
Point taken. Right now gitview doesn't support + and - key binding. I
will try to add that. For the time being can you can drop my font and
font size related changes and apply rest of the patches.
-aneesh
^ permalink raw reply
* Re: [PATCH] git-rm: Fix to properly handle files with spaces, tabs, newlines, etc.
From: Junio C Hamano @ 2006-02-25 6:05 UTC (permalink / raw)
To: Alex Riesen; +Cc: git, Carl Worth, Krzysiek Pawlik
In-Reply-To: <81b0412b0602240523l1b10c910q6e5d2e3cef82e306@mail.gmail.com>
"Alex Riesen" <raa.lkml@gmail.com> writes:
> On 2/23/06, Carl Worth <cworth@cworth.org> wrote:
>> +# Setup some files to be removed, some with funny characters
>> +touch -- foo bar baz 'space embedded' 'tab embedded' 'newline
>> +embedded' -q
>> +git-add -- foo bar baz 'space embedded' 'tab embedded' 'newline
>> +embedded' -q
>> +git-commit -m "add files"
>
> This doesn't work on some exotic filesystems (ntfs and fat):
Sorry to have applied this without thinking. Yes, we had
disabled a test that uses tab for this exact reason, but I have
forgotten about it.
^ permalink raw reply
* Re: gitview: Use monospace font to draw the branch and tag name
From: Junio C Hamano @ 2006-02-25 6:05 UTC (permalink / raw)
To: Aneesh Kumar; +Cc: git
In-Reply-To: <cc723f590602240829y3ebffaf9mdd7de0cd2a9051e@mail.gmail.com>
"Aneesh Kumar" <aneesh.kumar@gmail.com> writes:
> On 2/24/06, Junio C Hamano <junkio@cox.net> wrote:
>> "Aneesh Kumar K.V" <aneesh.kumar@gmail.com> writes:
>>
>> > This patch address the below:
>> > Use monospace font to draw branch and tag name
>> > set the font size to 13.
>>
>> I have an impression that hardcoding UI policy like this is
>> generally frowned upon.
>
> But with that changes branch and the tag name looks neat.
The point being that it only looks neat on your display, with
its size and resolution, and to your eye.
People have different tastes and equipments. One thing I really
liked gitk (I do not know if gitview does this as well -- if so
I am happy) is I can do - and + to change the text size
depending on what display I am working on.
^ permalink raw reply
* Re: FYI: git-am allows creation of empty commits.
From: Junio C Hamano @ 2006-02-25 6:04 UTC (permalink / raw)
To: Eric Wong; +Cc: Eric W. Biederman, git, Martin Langhoff, Matthias Urlichs
In-Reply-To: <20060224131922.GA19401@localdomain>
Eric Wong <normalperson@yhbt.net> writes:
>> I think 99.9% of the time it is a mistake if a single-parented
>> commit has the same tree as its parent commit has, so having a
>> check in commit-tree may not be a bad idea.
>
> This would break importers, more than 0.1% I think... Arch
> definitely allows empty commits for getting log messages in.
> SVN forbids them from their POV, but they also have things
> that we can't see when we import (properties like: mime,
> externals, eol-style) causing us to write the same tree twice.
> Not sure about CVS...
>
> Maybe a flag such as --force could be added.
Good point perhaps. Maybe an explicit --no-empty should ask for
it, and git-am should use it. As to git-commit I am unsure.
In the meantime I'd drop the patch.
^ permalink raw reply
* Re: [PATCH] diff-delta: produce optimal pack data
From: Nicolas Pitre @ 2006-02-25 5:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git, Carl Baldwin
In-Reply-To: <Pine.LNX.4.64.0602242326381.31162@localhost.localdomain>
OOps.... Forgot to complete one paragraph.
On Sat, 25 Feb 2006, Nicolas Pitre wrote:
> I of course looked at the time to pack vs the size reduction in my
> tests. And really like I said above the cost is well balanced. The
> only issue is that smaller blocks are more likely to trap into
> patological data sets. But that problem does exist with larger blocks
> too, to a lesser degree of course but still. For example, using a 16
> block size with adler32, computing a delta between two files
... as provided by Carl takes up to _nine_ minutes for a _single_ delta !
So regardless of the block size used, the issue right now has more to do
with that combinatorial explosion than the actual block size. And
preventing that patological case from expending out of bounds is pretty
easy to do.
OK I just tested a tentative patch to trap that case and the time to
delta those two 20MB files passed from over 9 minutes to only 36 seconds
here, with less than 10% in delta size difference. So I think I might
be on the right track. Further tuning might help even further.
Nicolas
^ permalink raw reply
* Re: [PATCH] diff-delta: produce optimal pack data
From: Nicolas Pitre @ 2006-02-25 5:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git, Carl Baldwin
In-Reply-To: <Pine.LNX.4.64.0602241952140.22647@g5.osdl.org>
On Fri, 24 Feb 2006, Linus Torvalds wrote:
>
>
> On Fri, 24 Feb 2006, Nicolas Pitre wrote:
> >
> > Well, that's the compromize to make. By default the version with
> > adler32 used 16 byte blocks to index the reference buffer. That means
> > you can match target data against the reference only if whole 16 byte
> > blocks match. Then, if you fix a typo in the target buffer then you'll
> > inevitably need 16 literal bytes in the delta instead of only
> > one because you won't be able to resynchronize with the reference buffer
> > until the next 16 byte block.
>
> That shouldn't be true. Once you find a 16-byte match, you can search
> forward (in theory you should be able to search backwards too, but by that
> time we've already expanded the non-matching previous bytes, I think).
Obviously, but that's not my point. What I mean is that small changes
will kind of sabotage the match since you'll have to scan forward
(adding literal bytes to the delta output in the mean time) until you
find another 16 byte block that matches. This is harder to find than a
3 byte block. Hence you'll end up adding more literal bytes (up to 16
of them) until you can match the reference buffer again even if you only
flipped one byte in the target buffer. In some cases that makes up for
deltas that are twice as large.
> > What I've made in my last delta patch is to reduce that 16 byte block to
> > only 3 bytes. Why 3 bytes? Because less than that produces smaller
> > delta data if done with literal bytes directly, and 3 bytes provided
> > enough bits to hash.
>
> On the other hand, the cost is that your lookups _are_ going to be more
> expensive. Regardless of how good the hash is, basically you have 16/3
> more hash-entries to look up, so you've made compression more expensive in
> footprint, at least (I assume you've made the hash appropriately larger).
Yes, the hash is larger. There is a cost in memory usage but not really
in CPU cycles.
> Also, at 3 bytes, insertion is at least equally dense (three bytes of data
> vs three bytes of offset into the source), and can be worse (the offset
> might be 5 bytes, no?). So it would seem like you'd be better off with 4+
> bytes, at which point the delta should be a win.
The code already discriminate the space of a block copy notation given
the offset and size vs the space for the equivalent literal bytes. So
the optimal encoding is always chosen already.
In fact, if you want to copy up to 15 bytes from offset 0 that will be
encoded with only 2 bytes in the delta. The only case that is
suboptimal is when you want to copy only two bytes from offset 0 (2
delta bytes) but only two bytes is mever matched by the hash lookup
since the hash is computed with 3 bytes. In that case 2 literal bytes
will be added to the delta plus opcode = 3 bytes. I considered that
special case not worth it. However copying a block of 3 bytes that gets
encoded into 3 bytes of delta is quite common (that'd take 4 bytes of
delta if they were literals).
As for using more bytes for block hashing, that increase thenumber of
cycles to compute the hash. The adler32 version reads 16 bytes for
every byte offset in the target file while my latest version only reads
3 bytes for every byte offset. So in effect my target hash computation
is faster than the adler32 one. However there is potentially more
entries in the same hash bucket to validate especially with repetitive
data.
> Have you tried some half-way point, like ~8 bytes?
Yes, and while the needed cycles tend to remain the same on average, the
resulting pack gets larger.
> > Now the problem comes when indexing a reference file full of:
> >
> > 0x46f8, 0x000b, 0x42fe, 0x0000, 0xffc0, 0x0001, 0xff00, 0x0008,
> ...
> >
> > There is a bunch of ", 0x" that get hashed to the same thing.
>
> You'll find a lot of that in any file: three or four bytes of similarity
> just doesn't sound worthwhile to go digging after.
Well after having experimented a lot with multiple parameters I think
they are worth it after all. Not only they provide for optimal deltas,
but their hash is faster to compute than larger blocks which seems to
counter balance for the cost of increased hash list.
> > The adler32 made that particular example a non issue since the
> > likelyhood of many 16 byte blocks to be the same is pretty low in this
> > case. But the flaw remains if for example there is lots of similar 16
> > byte blocks, like a binary file with lots of zeroes for example. In
> > fact, the performance problem Carl is having does use the diff-delta
> > version still using adler32.
>
> Agreed. I think limiting the hash length is a fine idea regardless, I just
> think it sounds dangerous with the three-byte thing where a lot of matches
> should be expected (never mind ", 0x", just things like newlines and tabs
> in source code).
They are usually less than the number of lines. Yet if you have a 1000
line source file and let's suppose that we keep only 50 hashed "\n\t\t"
this is sufficient to provide enough opportunities for matching that
plus common patterns like a following "if (" for example.
> Only considering 16-byte sequences of similarities is a trade-off of
> packing cost vs win.. Not saying other block-sizes aren't worth testing,
> but I suspect trying too hard is going to be too costly.
I of course looked at the time to pack vs the size reduction in my
tests. And really like I said above the cost is well balanced. The
only issue is that smaller blocks are more likely to trap into
patological data sets. But that problem does exist with larger blocks
too, to a lesser degree of course but still. For example, using a 16
block size with adler32, computing a delta between two files
> Especially as deltas compress _worse_ the smaller they are. Bigger
> "insert" chunks probably compress a lot better than a copy chunk.
Yes, but given that we favor deltas from larger to smaller already those
inserts are already not making much differences. They have to be quite
large to effectively provide better zlib compression.
> Have you looked at the delta size vs compression?
That's certainly an additional test worth adding to try_delta().
max_size should be smaller than the original object deflated size making
sure we won't store deltas that might end up larger than the undeltified
object.
Nicolas
^ permalink raw reply
* Re: [PATCH] diff-delta: produce optimal pack data
From: Linus Torvalds @ 2006-02-25 4:05 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, Carl Baldwin
In-Reply-To: <Pine.LNX.4.64.0602242130030.31162@localhost.localdomain>
On Fri, 24 Feb 2006, Nicolas Pitre wrote:
>
> Well, that's the compromize to make. By default the version with
> adler32 used 16 byte blocks to index the reference buffer. That means
> you can match target data against the reference only if whole 16 byte
> blocks match. Then, if you fix a typo in the target buffer then you'll
> inevitably need 16 literal bytes in the delta instead of only
> one because you won't be able to resynchronize with the reference buffer
> until the next 16 byte block.
That shouldn't be true. Once you find a 16-byte match, you can search
forward (in theory you should be able to search backwards too, but by that
time we've already expanded the non-matching previous bytes, I think).
But I'm no xdelta expert, so I migt be wrong.
> What I've made in my last delta patch is to reduce that 16 byte block to
> only 3 bytes. Why 3 bytes? Because less than that produces smaller
> delta data if done with literal bytes directly, and 3 bytes provided
> enough bits to hash.
On the other hand, the cost is that your lookups _are_ going to be more
expensive. Regardless of how good the hash is, basically you have 16/3
more hash-entries to look up, so you've made compression more expensive in
footprint, at least (I assume you've made the hash appropriately larger).
Also, at 3 bytes, insertion is at least equally dense (three bytes of data
vs three bytes of offset into the source), and can be worse (the offset
might be 5 bytes, no?). So it would seem like you'd be better off with 4+
bytes, at which point the delta should be a win.
Have you tried some half-way point, like ~8 bytes?
> Now the problem comes when indexing a reference file full of:
>
> 0x46f8, 0x000b, 0x42fe, 0x0000, 0xffc0, 0x0001, 0xff00, 0x0008,
...
>
> There is a bunch of ", 0x" that get hashed to the same thing.
You'll find a lot of that in any file: three or four bytes of similarity
just doesn't sound worthwhile to go digging after.
> The adler32 made that particular example a non issue since the
> likelyhood of many 16 byte blocks to be the same is pretty low in this
> case. But the flaw remains if for example there is lots of similar 16
> byte blocks, like a binary file with lots of zeroes for example. In
> fact, the performance problem Carl is having does use the diff-delta
> version still using adler32.
Agreed. I think limiting the hash length is a fine idea regardless, I just
think it sounds dangerous with the three-byte thing where a lot of matches
should be expected (never mind ", 0x", just things like newlines and tabs
in source code).
Only considering 16-byte sequences of similarities is a trade-off of
packing cost vs win.. Not saying other block-sizes aren't worth testing,
but I suspect trying too hard is going to be too costly.
Especially as deltas compress _worse_ the smaller they are. Bigger
"insert" chunks probably compress a lot better than a copy chunk.
Have you looked at the delta size vs compression?
Linus
^ permalink raw reply
* Re: [PATCH] diff-delta: produce optimal pack data
From: Nicolas Pitre @ 2006-02-25 3:53 UTC (permalink / raw)
To: Carl Baldwin; +Cc: Junio C Hamano, git
In-Reply-To: <20060224225023.GA28538@hpsvcnb.fc.hp.com>
On Fri, 24 Feb 2006, Carl Baldwin wrote:
> > If you can add them into a single .tgz with instructions on how
> > to reproduce the issue and provide me with an URL where I can fetch it
> > that'd be perfect.
>
> I will do this in an email off of the list because these files really
> shouldn't be available on a public list.
OK I have the files, and I can confirm that your problem is of the same
combinatorial explosion type I already talked about, _even_ with the
version using adler32. This is really O(m*n) where m and n being the
size of two consecutive versions of the files. But since m and n are
both _huge_ then the delta code really goes out to lunch.
I'm working on a patch to cap this to something like O(m+n).
Stay tuned.
Nicolas
^ permalink raw reply
* Re: [PATCH] diff-delta: produce optimal pack data
From: Nicolas Pitre @ 2006-02-25 3:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git, Carl Baldwin
In-Reply-To: <Pine.LNX.4.64.0602241637480.22647@g5.osdl.org>
On Fri, 24 Feb 2006, Linus Torvalds wrote:
>
>
> On Fri, 24 Feb 2006, Nicolas Pitre wrote:
> >
> > Currently, diff-delta takes blocks of data in the reference file and
> > hash them. When the target file is scanned, it uses the hash to match
> > blocks from the target file with the reference file.
> >
> > If blocks are hashed evenly the cost of producing a delta is at most
> > O(n+m) where n and m are the size of the reference and target files
> > respectively. In other words, with good data set the cost is linear.
>
> Assuming the hash is good, of course.
>
> I think this was the problem with you trying something simpler than
> adler32..
Well, that's the compromize to make. By default the version with
adler32 used 16 byte blocks to index the reference buffer. That means
you can match target data against the reference only if whole 16 byte
blocks match. Then, if you fix a typo in the target buffer then you'll
inevitably need 16 literal bytes in the delta instead of only
one because you won't be able to resynchronize with the reference buffer
until the next 16 byte block.
What I've made in my last delta patch is to reduce that 16 byte block to
only 3 bytes. Why 3 bytes? Because less than that produces smaller
delta data if done with literal bytes directly, and 3 bytes provided
enough bits to hash. I also made those 3 byte blocks overlap so
indexing would start at any offset with byte precision. This really
allows for optimal deltas so that they cannot be smaller.
Now the problem comes when indexing a reference file full of:
0x46f8, 0x000b, 0x42fe, 0x0000, 0xffc0, 0x0001, 0xff00, 0x0008,
0x03e0, 0x0009, 0x0f01, 0x0003, 0x8072, 0x0000, 0x0400, 0x0000,
0x0046, 0x0003, 0x9180, 0x0001, 0x0003, 0x0008, 0x02eb, 0x0003,
0x8072, 0x0000, 0x0400, 0x0000, 0x8010, 0x0008, 0x0010, 0x0000,
0x0361, 0x0003, 0x037e, 0x0004, 0x3941, 0x0002, 0x0b0f, 0x0003,
0x8072, 0x0000, 0x0400, 0x0000, 0x000a, 0x000b, 0x0346, 0x000c,
0x11fe, 0x0000, 0x3717, 0x0003, 0x8072, 0x0000, 0x0400, 0x0000,
0x8010, 0x0008, 0x000e, 0x0000, 0x0361, 0x0003, 0x8060, 0x0000,
There is a bunch of ", 0x" that get hashed to the same thing. And when
the second phase i.e. trying to find the best match into the reference
buffer for each occurrence of the same many ", 0x" in the target buffer
you get a conbinatorial explosion.
The adler32 made that particular example a non issue since the
likelyhood of many 16 byte blocks to be the same is pretty low in this
case. But the flaw remains if for example there is lots of similar 16
byte blocks, like a binary file with lots of zeroes for example. In
fact, the performance problem Carl is having does use the diff-delta
version still using adler32.
> > But if many blocks from the reference buffer do hash to the same bucket
> > then for each block in the target file many blocks from the reference
> > buffer have to be tested against, making it tend towards O(n^m) which is
> > pretty highly exponential.
> >
> > The solution I'm investigating is to put a limit on the number of
> > entries in the same hash bucket so to bring the cost back to something
> > more linear. That means the delta might miss on better matches that
> > have not been hashed but still benefit from a limited set.
>
> Sounds fair enough.
Testing appear to show that this is a worthwhile safety valve. And in
most case that safety valve should not be activated at all.
> However, you migt also want to consider another approach..
>
> One of the biggest costs for the xdelta algorithm is probably just the
> "delta_prepare()", but at the same time, that is constant wrt the source
> buffer.
Actually it is not that costly. Much much less than computing the sha1
of the same buffer for example.
> Now, the sad part is that when I wrote pack-objects, I didn't really
> understand the diff-delta algorithm, I just plugged it in. Which means
> that when I did it, I made the (obvious and simple) decision to keep the
> _result_ that we are looking at constant, and try to delta against
> different sources.
>
> HOWEVER.
>
> I suspect you already see where this is going..
>
> We _could_ switch the "pack-objects" window handling around, and instead
> of looking at the object we want to pack, and looking at the ten (or
> "window") previous objects to delta against, we could do it the other way
> around: keep the object we delta against constant, and see what deltas we
> could prepare for the ten next objects.
>
> And since the source would now be constant, you'd need to do the
> "delta_prepare()" just _once_ per window, instead of every single time.
Might be worth trying. Actually, this can be tested without even
changing the window handling just yet, since diff-delta() could return
the index data instead of freeing it, and pack-objects can store it
along side with the object data it tries to delta against. That
wouldn't be memory efficient, but at least that would give an idea of
the magnitude of the saving on CPU time. But I really doubt that'll
save more than a few percent.
>
> Now, I haven't done any profiling on the diff-delta code, and maybe my
> guess that delta_prepare() is a pretty expensive part is wrong, and maybe
> it wouldn't help to switch the window probing around. But I thought I'd
> mention it as one thing to explore..
Just to give you an idea, the bulk of my current "prepare" code looks
like this:
/* then populate the index */
data = buf + bufsize - 2;
while (data > buf) {
entry->ptr = --data;
i = (data[0] << hshift) ^ data[1];
i ^= (i << hshift) ^ data[2];
entry->next = hash[i];
hash[i] = entry++;
}
As you can see it is pretty lightweight.
But that would probably be a worthwhile optimization to have even if it
saves 10% of CPU time.
Nicolas
^ permalink raw reply
* Re: git-annotate
From: Nicolas Vilz 'niv' @ 2006-02-25 1:21 UTC (permalink / raw)
To: GIT Mailing List
In-Reply-To: <118833cc0602240721s1c64219y161cc0536fb361e4@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 668 bytes --]
Morten Welinder wrote:
> git-annotate ought to call usage() if it doesn't get its file.
Full Ack.
>>git annotate
>
> Use of uninitialized value in open at
> /scratch/welinder/git/git-annotate line 40.
> Failed to open filename: No such file or directory at
> /scratch/welinder/git/git-annotate line 40.
oh yes, that was something, i slipped over today, didn't have the time
to report it myself.
additionally i got following with git-annotate --help
niv@hermes ~ $ git-annotate --help
/usr/bin/git-annotate version [unknown] calling Getopt::Std::getopts
(version 1.05 [paranoid]),
running under Perl version 5.8.8.
i don't know if that should happen so.
Nicolas
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]
^ permalink raw reply
* Re: [PATCH] diff-delta: produce optimal pack data
From: Linus Torvalds @ 2006-02-25 0:45 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, Carl Baldwin
In-Reply-To: <Pine.LNX.4.64.0602241613030.31162@localhost.localdomain>
On Fri, 24 Feb 2006, Nicolas Pitre wrote:
>
> Currently, diff-delta takes blocks of data in the reference file and
> hash them. When the target file is scanned, it uses the hash to match
> blocks from the target file with the reference file.
>
> If blocks are hashed evenly the cost of producing a delta is at most
> O(n+m) where n and m are the size of the reference and target files
> respectively. In other words, with good data set the cost is linear.
Assuming the hash is good, of course.
I think this was the problem with you trying something simpler than
adler32..
> But if many blocks from the reference buffer do hash to the same bucket
> then for each block in the target file many blocks from the reference
> buffer have to be tested against, making it tend towards O(n^m) which is
> pretty highly exponential.
>
> The solution I'm investigating is to put a limit on the number of
> entries in the same hash bucket so to bring the cost back to something
> more linear. That means the delta might miss on better matches that
> have not been hashed but still benefit from a limited set.
Sounds fair enough.
However, you migt also want to consider another approach..
One of the biggest costs for the xdelta algorithm is probably just the
"delta_prepare()", but at the same time, that is constant wrt the source
buffer.
Now, the sad part is that when I wrote pack-objects, I didn't really
understand the diff-delta algorithm, I just plugged it in. Which means
that when I did it, I made the (obvious and simple) decision to keep the
_result_ that we are looking at constant, and try to delta against
different sources.
HOWEVER.
I suspect you already see where this is going..
We _could_ switch the "pack-objects" window handling around, and instead
of looking at the object we want to pack, and looking at the ten (or
"window") previous objects to delta against, we could do it the other way
around: keep the object we delta against constant, and see what deltas we
could prepare for the ten next objects.
And since the source would now be constant, you'd need to do the
"delta_prepare()" just _once_ per window, instead of every single time.
Now, I haven't done any profiling on the diff-delta code, and maybe my
guess that delta_prepare() is a pretty expensive part is wrong, and maybe
it wouldn't help to switch the window probing around. But I thought I'd
mention it as one thing to explore..
Linus
^ permalink raw reply
* Re: [PATCH] diff-delta: produce optimal pack data
From: Junio C Hamano @ 2006-02-24 23:55 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602241029360.23719@localhost.localdomain>
Nicolas Pitre <nico@cam.org> writes:
> On Fri, 24 Feb 2006, Junio C Hamano wrote:
>
>> In Linux 2.6 repository, these object pairs take forever to
>> delta.
>>
>> blob 9af06ba723df75fed49f7ccae5b6c9c34bc5115f ->
>> blob dfc9cd58dc065d17030d875d3fea6e7862ede143
>> size (491102 -> 496045)
>> 58 seconds
>>
>> blob 4917ec509720a42846d513addc11cbd25e0e3c4f ->
>> blob dfc9cd58dc065d17030d875d3fea6e7862ede143
>> size (495831 -> 496045)
>> 64 seconds
>
> Thanks for this. I'll see what I can do to tweak the code to better
> cope with those. Just keep my fourth delta patch in the pu branch for
> now.
If apply this on top of pack-objects.c, you can find more of
them yourself.
---
diff --git a/pack-objects.c b/pack-objects.c
index be7a200..3f88e86 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -62,6 +62,7 @@ static const char *base_name;
static unsigned char pack_file_sha1[20];
static int progress = 1;
static volatile int progress_update = 0;
+static volatile int progress_update_cnt = 0;
/*
* The object names in objects array are hashed with this hashtable,
@@ -826,6 +827,7 @@ static int try_delta(struct unpacked *cu
struct object_entry *old_entry = old->entry;
int old_preferred = (old_entry->preferred_base ||
old_entry->based_on_preferred);
+ int last_up;
unsigned long size, oldsize, delta_size, sizediff;
long max_size;
void *delta_buf;
@@ -890,8 +892,17 @@ static int try_delta(struct unpacked *cu
}
if (sizediff >= max_size)
return -1;
+ last_up = progress_update_cnt;
delta_buf = diff_delta(old->data, oldsize,
cur->data, size, &delta_size, max_size);
+ if (last_up + 1 < progress_update_cnt) {
+ /* It took more than one second */
+ fprintf(stderr, "%d -> %d: %s -> ",
+ last_up, progress_update_cnt,
+ sha1_to_hex(old_entry->sha1));
+ fprintf(stderr, "%s (%lu -> %lu)\n",
+ sha1_to_hex(cur_entry->sha1), oldsize, size);
+ }
if (!delta_buf)
return 0;
cur_entry->delta = old_entry;
@@ -906,6 +917,7 @@ static void progress_interval(int signum
{
signal(SIGALRM, progress_interval);
progress_update = 1;
+ progress_update_cnt++;
}
static void find_deltas(struct object_entry **list, int window, int depth)
^ permalink raw reply related
* [PATCH] Add missing programs to ignore list
From: Shawn Pearce @ 2006-02-24 22:51 UTC (permalink / raw)
To: git
Added recently added programs to the default exclude list.
---
I found these were missing in master. pg won't let me work in a
directory if there are untracked files laying around which aren't
excluded by an ignore pattern.
.gitignore | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
base 809da5f8a21a10112eece4ee9be55fe64371ce68
last 9752e066a00144fc1b7657e6d54569e9d9764092
diff --git a/.gitignore b/.gitignore
index 94f66d5a1ef9d1e2bec2c14b392565ee5bf98dd6..5be239a4aa95f78cb09e4921459e1678b9f3c36e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,7 @@ GIT-VERSION-FILE
git
git-add
git-am
+git-annotate
git-apply
git-applymbox
git-applypatch
@@ -22,6 +23,7 @@ git-convert-objects
git-count-objects
git-cvsexportcommit
git-cvsimport
+git-cvsserver
git-daemon
git-diff
git-diff-files
@@ -53,6 +55,7 @@ git-mailsplit
git-merge
git-merge-base
git-merge-index
+git-merge-tree
git-merge-octopus
git-merge-one-file
git-merge-ours
@@ -60,6 +63,7 @@ git-merge-recursive
git-merge-resolve
git-merge-stupid
git-mktag
+git-mktree
git-name-rev
git-mv
git-pack-redundant
--
1.2.3.g809d
^ permalink raw reply related
* Re: [PATCH] diff-delta: produce optimal pack data
From: Carl Baldwin @ 2006-02-24 22:50 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0602241544270.31162@localhost.localdomain>
On Fri, Feb 24, 2006 at 04:12:14PM -0500, Nicolas Pitre wrote:
> On Fri, 24 Feb 2006, Carl Baldwin wrote:
> > After the twelve large objects were packed into individual packs the
> > rest of the packing went very quickly and git v1.2.3's date reuse worked
> > very well. This was sort of my attempt at simulating how things would
> > be if git avoided deltification of each of these large files. I'm sorry
> > to have been so harsh earlier I just didn't understand that
> > incrementally packing one-by-one was going to help this much.
>
> Hmmmmmmm....
>
> I don't think I understand what is going on here.
>
> You say that, if you add those big files and incrementally repack after
> each commit using git repack with no option, then it requires only about
> one second each time. Right?
Well, actually I was packing them individually by calling
git-pack-objects directly on each blob.
I'll try doing it exactly as you describe...
Ok, I tried it. Basically I do the following.
% mkdir test
% cd test
% git init-db
% cp ../files/binfile.1 binfile
% time git add binfile
real 0m2.459s
user 0m2.443s
sys 0m0.019s
% git commit -a -m "Rev 1"
% time git repack
[snip]
real 0m1.111s
user 0m1.046s
sys 0m0.061s
% for i in $(seq 2 12); do
cp ../files/binfile.$i binfile
time git commit -a -m "Rev $i"
time git repack
done
Each commit takes around 2.8-3.5 seconds and each repack takes about
1.2-1.5 seconds. These are prettly reasonable.
Now, I try 'git repack -a -f' (or even without -f) and it goes out to
lunch. I think it would take on the order of a day to actually finish
because it wasn't very far after an hour.
[snip]
> How many files besides those 12 big blobs do you have?
This repository has been completely stripped to the 12 revisions of the
one file. So, there are 36 objects.
12 blobs.
12 trees.
12 commits.
That is all.
[snip]
> If you can add them into a single .tgz with instructions on how
> to reproduce the issue and provide me with an URL where I can fetch it
> that'd be perfect.
I will do this in an email off of the list because these files really
shouldn't be available on a public list.
Carl
--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Carl Baldwin RADCAD (R&D CAD)
Hewlett Packard Company
MS 88 work: 970 898-1523
3404 E. Harmony Rd. work: Carl.N.Baldwin@hp.com
Fort Collins, CO 80525 home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox