Git development

* Re: [PATCH v2] Add --track option to git clone
From: David Soria Parra @ 2009-12-02 19:27 UTC
  To: git
In-Reply-To: <20091202190807.GB30778@coredump.intra.peff.net>

On 2009-12-02, Jeff King <peff@peff.net> wrote:
>   1. Add "--track foo" as a convenience wrapper for "-f foo -b foo".
>
>   2. If no "-b" is given, the first "-f" is assumed as "-b". So "git
>      clone -f foo" becomes equivalent to David's --track.
>
> And of course the name "-f" (for --fetch, if you were wondering) is open
> to suggestion.
>
> What do you think?
>
This approach is much better than my initial proposal. Sadly I won't have time
to implement this, which is why I wrote the simplest working solution for me.
Fetch seems reasonable. I can rewrite the patch to be able to use refspecs, but
it would require additional refactoring to be able to specify multiple --fetch parameters.
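
For reference, the two points quoted above could be sketched roughly as follows. This is only an illustration of the proposed semantics: `struct clone_opts`, `add_fetch`, `add_track`, `branch_to_checkout` and `MAX_FETCH` are hypothetical names, not code from the patch or from builtin-clone.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FETCH 16

/* Hypothetical option state for "git clone"; not git's actual struct. */
struct clone_opts {
	const char *branch;            /* value of -b, or NULL if not given */
	const char *fetch[MAX_FETCH];  /* values of -f/--fetch, in order */
	size_t nr_fetch;
};

/* Record one "-f <name>"; returns 0 on success, -1 if the table is full. */
static int add_fetch(struct clone_opts *o, const char *name)
{
	assert(o && name);
	if (o->nr_fetch == MAX_FETCH)
		return -1;
	o->fetch[o->nr_fetch++] = name;
	return 0;
}

/* Point 1: "--track foo" is shorthand for "-f foo -b foo". */
static int add_track(struct clone_opts *o, const char *name)
{
	if (add_fetch(o, name) < 0)
		return -1;
	o->branch = name;
	return 0;
}

/* Point 2: without "-b", the first "-f" decides the checked-out branch. */
static const char *branch_to_checkout(const struct clone_opts *o)
{
	assert(o);
	if (o->branch)
		return o->branch;
	return o->nr_fetch ? o->fetch[0] : NULL;
}
```

With this shape, supporting several --fetch parameters is just repeated calls to `add_fetch`; the refactoring mentioned above would be in the code that consumes the list.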

David

* Re: [PATCH/RFC 01/11] mingw: add network-wrappers for daemon
From: Johannes Sixt @ 2009-12-02 19:34 UTC
  To: kusmabite; +Cc: Martin Storsjö, msysgit, git, dotzenlabs
In-Reply-To: <40aa078e0912020501v9378c37l106e1e23b5e7b43d@mail.gmail.com>

On Wednesday, 2 December 2009, Erik Faye-Lund wrote:
> I'm not very familiar with poll(), but if I understand the man-pages
> correctly it's waiting for events on file descriptors, and is in our
> case used to check for incoming connections, right? If so, I see three
> possible ways forward: (1) extending our poll()-emulation to handle
> multiple sockets, (2) changing daemon.c to check one socket at a time,
> and (3) using select() instead of poll().
>
> (1) seems like the "correct" but tricky thing to do, (2) like the
> "easy" but nasty thing to do. However, (3) strikes me as the least
> dangerous thing to do ;)
>
> For (1), there's also a WSAPoll() function in Windows, but I'm not
> sure how to figure out if an fd is a socket or a pipe. There's also
> WaitForMultipleObjects.

GetFileType() returns FILE_TYPE_PIPE for both pipes and sockets. But once you 
know this, you can use getsockopt(): If it succeeds, it is a socket, and in 
this case, assume that poll() was called from git-daemon, i.e. all polled-for 
fds are sockets and you can select().
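
A minimal sketch of the getsockopt() test, written against the POSIX socket API so it can be tried anywhere. It is an assumption that the same idea carries over to the mingw poll() emulation, where the fd would first have to be mapped to a Winsock handle (for example via _get_osfhandle()) and where the GetFileType() pre-check is not shown. The names `fd_is_socket` and `all_fds_are_sockets` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Returns 1 if fd refers to a socket, 0 otherwise.  getsockopt() only
 * succeeds on sockets, so its return value is the whole test.
 */
static int fd_is_socket(int fd)
{
	int type;
	socklen_t len = sizeof(type);

	return getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == 0;
}

/* Returns 1 if every descriptor is a socket, i.e. select() is usable. */
static int all_fds_are_sockets(const int *fds, size_t nr)
{
	size_t i;

	assert(fds != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (!fd_is_socket(fds[i]))
			return 0;
	return 1;
}
```

The shortcut suggested above is cheaper still: test only the first fd and assume the rest are sockets too, since git-daemon is the only caller that polls sockets.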

-- Hannes

* [PATCH 2/5] git-svn: Make merge metadata accessible to make_log_entry
From: Alex Vandiver @ 2009-12-02 19:07 UTC
  To: git
In-Reply-To: <1259780874-14706-1-git-send-email-alex@chmrr.net>


Signed-off-by: Alex Vandiver <alex@chmrr.net>
---
 git-svn.perl |   22 ++++++++++++++--------
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 53bf20c..5337326 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -2924,7 +2924,7 @@ sub check_author {
 }
 
 sub find_extra_svk_parents {
-	my ($self, $ed, $tickets, $parents) = @_;
+	my ($self, $ed, $tickets, $parents, $merges) = @_;
 	# aha!  svk:merge property changed...
 	my @tickets = split "\n", $tickets;
 	my @known_parents;
@@ -2944,14 +2944,15 @@ sub find_extra_svk_parents {
 				# wahey!  we found it, but it might be
 				# an old one (!)
 				push @known_parents, [ $rev, $commit ];
+				push @known_parents, [ $rev, $path, $commit ];
 			}
 		}
 	}
 	# Ordering matters; highest-numbered commit merge tickets
 	# first, as they may account for later merge ticket additions
 	# or changes.
-	@known_parents = map {$_->[1]} sort {$b->[0] <=> $a->[0]} @known_parents;
-	for my $parent ( @known_parents ) {
+	for my $merge ( sort {$b->[0] <=> $a->[0]} @known_parents ) {
+		my ($rev, $path, $parent) = @{$merge};
 		my @cmd = ('rev-list', $parent, map { "^$_" } @$parents );
 		my ($msg_fh, $ctx) = command_output_pipe(@cmd);
 		my $new;
@@ -2963,6 +2964,7 @@ sub find_extra_svk_parents {
 			print STDERR
 			    "Found merge parent (svk:merge ticket): $parent\n";
 			push @$parents, $parent;
+			push @$merges, "$path:$rev";
 		}
 	}
 }
@@ -3061,27 +3063,31 @@ sub make_log_entry {
 	my ($self, $rev, $parents, $ed) = @_;
 	my $untracked = $self->get_untracked($ed);
 
-	my @parents = @$parents;
+	my %log_entry = ( parents => $parents,
+	                  merged_branches => [],
+	                  revision => $rev,
+	                  log => '');
 	my $ps = $ed->{path_strip} || "";
 	for my $path ( grep { m/$ps/ } %{$ed->{dir_prop}} ) {
 		my $props = $ed->{dir_prop}{$path};
 		if ( $props->{"svk:merge"} ) {
 			$self->find_extra_svk_parents
-				($ed, $props->{"svk:merge"}, \@parents);
+				($ed,
+				 $props->{"svk:merge"},
+				 $log_entry{parents},
+				 $log_entry{merged_branches});
 		}
 		if ( $props->{"svn:mergeinfo"} ) {
 			$self->find_extra_svn_parents
 				($ed,
 				 $props->{"svn:mergeinfo"},
-				 \@parents);
+				 $log_entry{parents});
 		}
 	}
 
 	open my $un, '>>', "$self->{dir}/unhandled.log" or croak $!;
 	print $un "r$rev\n" or croak $!;
 	print $un $_, "\n" foreach @$untracked;
-	my %log_entry = ( parents => \@parents, revision => $rev,
-	                  log => '');
 
 	my $headrev;
 	my $logged = delete $self->{logged_rev_props};
-- 
1.6.6.rc0.327.g032bc

* [PATCH 1/5] git-svn: Allow setting the committer and author date separately
From: Alex Vandiver @ 2009-12-02 19:07 UTC
  To: git
In-Reply-To: <1259780874-14706-1-git-send-email-alex@chmrr.net>


Signed-off-by: Alex Vandiver <alex@chmrr.net>
---
 git-svn.perl |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 51f03ad..53bf20c 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -2491,7 +2491,7 @@ sub set_commit_header_env {
 
 	$ENV{GIT_AUTHOR_NAME} = $log_entry->{name};
 	$ENV{GIT_AUTHOR_EMAIL} = $log_entry->{email};
-	$ENV{GIT_AUTHOR_DATE} = $ENV{GIT_COMMITTER_DATE} = $log_entry->{date};
+	$ENV{GIT_AUTHOR_DATE} = $log_entry->{date};
 
 	$ENV{GIT_COMMITTER_NAME} = (defined $log_entry->{commit_name})
 						? $log_entry->{commit_name}
@@ -2499,6 +2499,9 @@ sub set_commit_header_env {
 	$ENV{GIT_COMMITTER_EMAIL} = (defined $log_entry->{commit_email})
 						? $log_entry->{commit_email}
 						: $log_entry->{email};
+        $ENV{GIT_COMMITTER_DATE} =  (defined $log_entry->{commit_date})
+						? $log_entry->{commit_date}
+						: $log_entry->{date};
 	\%env;
 }
 
-- 
1.6.6.rc0.327.g032bc

* [PATCH 3/5] git-svn: Strip SVK headers, optionally parsing author information
From: Alex Vandiver @ 2009-12-02 19:07 UTC
  To: git
In-Reply-To: <1259780874-14706-1-git-send-email-alex@chmrr.net>

SVK adds additional headers (often nested arbitrarily) detailing
information on the local commit.  When possible, strip these headers
so that the first line of git's commit message is actually descriptive
of the commit.

Additionally, these headers contain information about the original
author's username, and their local commit time.  If the
--use-log-author flag is set, use this information to set the
information on the git commit.  Note that the username thus extracted
may be a _local_ username, and thus may require additional, somewhat
unexpected, entries in the authors file.

Signed-off-by: Alex Vandiver <alex@chmrr.net>
---
 git-svn.perl |   98 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 5337326..0731425 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -3110,7 +3110,8 @@ sub make_log_entry {
 	close $un or croak $!;
 
 	$log_entry{date} = parse_svn_date($log_entry{date});
-	$log_entry{log} .= "\n";
+	parse_svk_log(\%log_entry) if $log_entry{log} =~ svk_header_regex( lenient => 1 );
+
 	my $author = $log_entry{author} = check_author($log_entry{author});
 	my ($name, $email) = defined $::users{$author} ? @{$::users{$author}}
 						       : ($author, undef);
@@ -3118,7 +3119,15 @@ sub make_log_entry {
 	my ($commit_name, $commit_email) = ($name, $email);
 	if ($_use_log_author) {
 		my $name_field;
-		if ($log_entry{log} =~ /From:\s+(.*\S)\s*\n/i) {
+		if ($log_entry{log_author}) {
+			$log_entry{commit_date} = $log_entry{date};
+			$log_entry{date} = $log_entry{log_author_date};
+			$log_entry{log_author} = check_author($log_entry{log_author});
+			my ($log_author_name, $log_author_email)
+			    = defined $::users{$log_entry{log_author}} ? @{$::users{$log_entry{log_author}}}
+			        : ($log_entry{log_author}, undef);
+			$name_field = "$log_author_name <$log_author_email>";
+		} elsif ($log_entry{log} =~ /From:\s+(.*\S)\s*\n/i) {
 			$name_field = $1;
 		} elsif ($log_entry{log} =~ /Signed-off-by:\s+(.*\S)\s*\n/i) {
 			$name_field = $1;
@@ -3182,6 +3191,91 @@ sub make_log_entry {
 	\%log_entry;
 }
 
+sub svk_header_regex {
+	my %args = ( lenient => 0, orig => 0, @_ );
+	my $orig = $args{orig} ? qr/ \(orig r\d+\)/ : "";
+	my $atstart = "";
+	if ($args{lenient}) {
+		$atstart = qr/\s*/;
+		$orig = qr/(?: \(orig r\d+\))?/;
+	}
+	return qr/^${atstart}r\d+\@\S+$orig:\s*(\S+)\s*\|\s*(.*?)\s*([+-]\d+)$/m;
+}
+
+sub parse_svk_log {
+	my $log_entry = shift;
+	my $log = $log_entry->{log};
+
+	# Strip off blank lines at the start and end
+	$log =~ s/^(\s*?\n)+//;
+	$log =~ s/\s*$//;
+
+	# If each line starts with a space, this might be an
+	# unmodified SVK log format.  As a side effect, this also
+	# trims the leading space off of the lines.
+	my $lines = $log =~ s/^//mg;
+	my $spaced = $log =~ s/^ //mg;
+	return unless $lines == $spaced;
+
+	my $regex = svk_header_regex( orig => 1 );
+	if ($log =~ /\A$regex/) {
+		# This is either a merge commit, or a base-less merge
+		# (replay from a different repository) The \A assures
+		# that this is an _unedited_ merge commit with no
+		# hand-supplied log message.
+		if (@{$log_entry->{merged_branches} || []}) {
+			# This is a merge with no description; provide
+			# one.
+			$log_entry->{log} = "Merge from @{$log_entry->{merged_branches}}\n\n$log";
+		} else {
+			my $commits = 0;
+			$commits++ while $log =~ /$regex/g;
+			if ($commits == 1) {
+				# This is a baseless merge of one
+				# commit; strip off the original
+				# commit info
+				$log_entry->{log_author} = $1;
+				$log_entry->{log_author_date} = "$3 $2";
+				$log =~ s/\A$regex\n*//;
+				$log_entry->{log} = $log;
+				parse_svk_log($log_entry);
+			} else {
+				# A lump baseless merge?  Remove all
+				# of the SVK headers on this level,
+				# and add a summary.  Trailing
+				# newlines on the svk header lines are
+				# left unmolested, so they become
+				# blank lines.
+				$log =~ s/$regex//g;
+				$log_entry->{log} = "Lump commit\n$log";
+			}
+		}
+	} else {
+		# Look for svk header lines without the (orig r12345),
+		# which were local commits.
+		$regex = svk_header_regex();
+		my $commits = 0;
+		$commits++ while $log =~ /$regex/g;
+		if ($commits == 0) {
+			# No more svk-like commits; don't change anything.
+		} elsif ($commits == 1) {
+			# Only one top-level commit-like object; strip
+			# it off, recurse down.
+			$log_entry->{log_author} = $1;
+			$log_entry->{log_author_date} = "$3 $2";
+			$log =~ s/$regex\n*//;
+			$log_entry->{log} = $log;
+			parse_svk_log($log_entry);
+		} else {
+			# This is a lump push of local commits.	 Strip
+			# off all of the svk headers in this level,
+			# and call it quits.
+			$log =~ s/$regex//g;
+			$log_entry->{log} = $log;
+		}
+	}
+}
+
 sub fetch {
 	my ($self, $min_rev, $max_rev, @parents) = @_;
 	my ($last_rev, $last_commit) = $self->last_rev_commit;
-- 
1.6.6.rc0.327.g032bc
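
As an illustration of the header shape that svk_header_regex() matches in its lenient form, here is a small C sketch using POSIX regex.h. It is an approximation, not a port of the patch: POSIX ERE has no non-greedy match, so the date group here requires at least one non-blank character, and `struct svk_header`, `copy_group` and `parse_svk_header` are hypothetical names. The sample header lines in the usage note are invented to fit the regex, not taken from a real SVK repository.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Fields captured from one SVK header line. */
struct svk_header {
	char author[64];
	char date[64];   /* local timestamp, without the zone */
	char zone[16];   /* e.g. "-0500" */
};

/* Copy capture group m of line into dst (size n), truncating if needed. */
static void copy_group(char *dst, size_t n, const char *line, regmatch_t m)
{
	size_t len = (size_t)(m.rm_eo - m.rm_so);

	if (len >= n)
		len = n - 1;
	memcpy(dst, line + m.rm_so, len);
	dst[len] = '\0';
}

/*
 * POSIX ERE approximation of the patch's "lenient" Perl regex:
 *   r<rev>@<host>[ (orig r<rev>)]: <author> | <date> <zone>
 * Returns 1 and fills *out on a match, 0 otherwise.
 */
static int parse_svk_header(const char *line, struct svk_header *out)
{
	static const char pattern[] =
		"^[[:space:]]*r[0-9]+@[^[:space:]]+( \\(orig r[0-9]+\\))?:"
		"[[:space:]]*([^[:space:]]+)[[:space:]]*\\|"
		"[[:space:]]*(.*[^[:space:]])[[:space:]]*([+-][0-9]+)$";
	regex_t re;
	regmatch_t m[5];
	int matched;

	assert(line && out);
	if (regcomp(&re, pattern, REG_EXTENDED))
		return 0;
	matched = regexec(&re, line, 5, m, 0) == 0;
	regfree(&re);
	if (!matched)
		return 0;
	copy_group(out->author, sizeof(out->author), line, m[2]);
	copy_group(out->date, sizeof(out->date), line, m[3]);
	copy_group(out->zone, sizeof(out->zone), line, m[4]);
	return 1;
}
```

For a line such as "r1234@host: alice | 2009-12-02 19:07:00 -0500" this fills in the author, the timestamp and the zone separately, which corresponds to the $1, $2 and $3 captures that parse_svk_log() stores as log_author and log_author_date.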

* [PATCH 0/5] git-svn: svk log message cleanup
From: Alex Vandiver @ 2009-12-02 19:07 UTC
  To: git

This patch series tries to clean up the cruft that svk leaves in log
messages, while optionally (using the existing --use-log-author
option) using the information therein to set author username and time.
 - Alex

* [PATCH 4/5] git-svn: Provide a default "empty commit message" so the metadata is not the header
From: Alex Vandiver @ 2009-12-02 19:07 UTC
  To: git
In-Reply-To: <1259780874-14706-1-git-send-email-alex@chmrr.net>

git-svn adds a trailing line of metadata to the commit message.  If
the commit message would otherwise be empty, this can lead to
confusing display in `gitk` and `git log --oneline`.  Thus, provide a
no-op "(empty commit message)" message for the first line of such
messages.

Signed-off-by: Alex Vandiver <alex@chmrr.net>
---
 git-svn.perl |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 0731425..87462c9 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -3111,6 +3111,7 @@ sub make_log_entry {
 
 	$log_entry{date} = parse_svn_date($log_entry{date});
 	parse_svk_log(\%log_entry) if $log_entry{log} =~ svk_header_regex( lenient => 1 );
+	$log_entry{log} = "(empty commit message)\n" unless $log_entry{log} =~ /\S/;
 
 	my $author = $log_entry{author} = check_author($log_entry{author});
 	my ($name, $email) = defined $::users{$author} ? @{$::users{$author}}
-- 
1.6.6.rc0.327.g032bc
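
The one-line check in patch 4/5 amounts to "if the log has no non-whitespace character, substitute a placeholder". A C rendering of that rule, purely as a sketch: `subject_or_placeholder` and `EMPTY_LOG_PLACEHOLDER` are hypothetical names, and only the placeholder text itself comes from the patch.

```c
#include <assert.h>
#include <ctype.h>

/* Placeholder text taken from the patch. */
static const char EMPTY_LOG_PLACEHOLDER[] = "(empty commit message)\n";

/*
 * Returns log unchanged if it contains any non-whitespace character
 * (the patch's /\S/ test); otherwise returns the placeholder.
 */
static const char *subject_or_placeholder(const char *log)
{
	const char *p;

	assert(log);
	for (p = log; *p; p++)
		if (!isspace((unsigned char)*p))
			return log;
	return EMPTY_LOG_PLACEHOLDER;
}
```

A log consisting only of blank lines is treated the same as an empty string, so the git-svn-id metadata line never ends up as the subject.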

* [PATCH 5/5] git-svn: Correct a copy-and-pasted misleading comment
From: Alex Vandiver @ 2009-12-02 19:07 UTC
  To: git
In-Reply-To: <1259780874-14706-1-git-send-email-alex@chmrr.net>


Signed-off-by: Alex Vandiver <alex@chmrr.net>
---
 git-svn.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 87462c9..804a26c 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -2973,7 +2973,7 @@ sub find_extra_svk_parents {
 # have actually changed
 sub find_extra_svn_parents {
 	my ($self, $ed, $mergeinfo, $parents) = @_;
-	# aha!  svk:merge property changed...
+	# aha!  svn:mergeinfo property changed...
 
 	# We first search for merged tips which are not in our
 	# history.  Then, we figure out which git revisions are in
-- 
1.6.6.rc0.327.g032bc

* Re: [PATCH] builtin-commit: add --date option
From: Junio C Hamano @ 2009-12-02 19:38 UTC
  To: Jeff King; +Cc: Miklos Vajna, git
In-Reply-To: <20091202192614.GD30778@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Do you really want to set the date to something arbitrary, or do you
> just want to set it to "now"? If the latter case, do you really just
> want the recently discussed --reset-author?
>
> Also, is there a good reason why GIT_AUTHOR_DATE is not respected in
> this case?  If not, should we simply be fixing that bug instead?

I expect I won't be on the list for the rest of the day (I've started the
preparation to tag -rc1), but I have two-and-a-half points before this
discussion goes too far:

 - The "--reset-author" patch will be in 1.6.6 (it has been in 'master'
   since yesterday, I think);

 - We should honor GIT_AUTHOR_DATE if --reset-author is given.

 - I _think_ we should ignore GIT_AUTHOR_DATE if --reset-author is not
   given, as --amend/-c/-C, being command-line options, are stronger than
   an environment variable.

So I do not think --date is something we urgently need, even though it
might be nice to have it to be consistent with --author.
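
The precedence in the three points above can be written down as a small decision function. This is a sketch of the stated opinion only, not of what builtin-commit.c does; `pick_author_date` and its parameters are hypothetical, and falling back to the current time under --reset-author when the variable is unset is an assumption based on that option renewing the author timestamp.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the author date for "git commit --amend/-c/-C":
 *   reset_author set:   honor env_date (GIT_AUTHOR_DATE) if present,
 *                       else use the current time (now);
 *   reset_author unset: keep the reused commit's date, ignoring env_date.
 * env_date may be NULL when the variable is not set.
 */
static const char *pick_author_date(int reset_author, const char *env_date,
				    const char *reused_date, const char *now)
{
	assert(reused_date && now);
	if (reset_author)
		return env_date ? env_date : now;
	return reused_date;
}
```

A --date option, if added, would presumably sit above all of these as the most explicit request, which is why it is a convenience rather than a necessity.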

* Re: [RFC PATCH 0/8] Git remote helpers to implement smart transports.
From: Jeff King @ 2009-12-02 19:39 UTC
  To: Junio C Hamano
  Cc: Sverre Rabbelier, Johannes Schindelin, Ilari Liusvaara, git
In-Reply-To: <7vy6lldwyo.fsf@alter.siamese.dyndns.org>

On Wed, Dec 02, 2009 at 10:58:39AM -0800, Junio C Hamano wrote:

> After all my main objection is against closing the door to others by one
> particular implementation squatting on the "remote-http" name and refusing the
> use of that nice, authoritative-sounding name by others.

I would think that it would be useful to use the "remote-http" name as
the extra level of indirection (as a symlink, hardlink, or wrapper
script to remote-curl). Then you could have competing first-class
implementations that would be easy for the user (or package manager) to
switch between.

For example, Debian contains versions of curl built against gnutls and
against openssl. Right now the debian git package requires the gnutls
version. But let's say they ship two packages: git-http-curl-openssl and
git-http-curl-gnutls. Then you can install whichever you prefer, and the
package will contain the file "git-remote-http" pointing to
"git-remote-curl-$whatever".

And yes, if you think about it, this particular situation already works
with a hard-coded "git-remote-curl", since both are built on top of
curl, and that makes a reasonable name. But now extend it to "you don't
want to use curl, but rather some other http library". I don't think we
have any interest in providing a non-curl version as part of git itself,
but it provides a hook should somebody want to write their own http
handler (either using a different library, or maybe a wrapper that does
caching, or whatever).

Just my two cents. I don't plan on writing any such third-party remote
handlers, but it seems simple enough to leave the door open.
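
To make the indirection concrete: if git derived the helper name from the URL scheme, the lookup side could be as small as the sketch below, and which implementation "git-remote-http" resolves to (symlink, hardlink or wrapper) would be left entirely to the package. `helper_name_for_url` is a hypothetical function for illustration, not git's current code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "git-remote-<scheme>" for url into buf (size n).
 * Returns 0 on success, -1 if url has no "<scheme>://" or buf is too small.
 */
static int helper_name_for_url(const char *url, char *buf, size_t n)
{
	const char *sep;
	int len;

	assert(url && buf);
	sep = strstr(url, "://");
	if (!sep || sep == url)
		return -1;
	len = snprintf(buf, n, "git-remote-%.*s", (int)(sep - url), url);
	return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```

For "http://example.com/repo.git" this yields "git-remote-http", which git would then run without knowing or caring whether curl is behind it.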

-Peff

* Re: [RFC PATCH 0/8] Git remote helpers to implement smart transports.
From: Ilari Liusvaara @ 2009-12-02 19:52 UTC
  To: Johannes Schindelin; +Cc: Sverre Rabbelier, Junio C Hamano, git
In-Reply-To: <alpine.DEB.1.00.0912021832480.4985@pacific.mpi-cbg.de>

On Wed, Dec 02, 2009 at 06:39:19PM +0100, Johannes Schindelin wrote:
> 
> This is definitely a good direction, and it would be even better if the 
> absence of the remote helper was also handled gracefully.  Just think 
> about a (as of now fictitious) git-remote-http.rpm relying on git-core.rpm 
> and libcurl.rpm.  If you do not want to access http:// URLs, you can 
> install just git-core.  Once you encounter an http:// URL you need to 
> access, you install git-remote-http.  Keeping git-core.  (I like to call 
> this setup "modular".)

There are some rather unfortunate details relating to this.

The main git executable currently has no good way to discover what went wrong
when remote helper execution fails before reaching the capabilities
exchange.

Ideally, executions failing with ENOENT would be reported as the remote helper
not existing, other exec errors as failed execution, and fatal signals as the
remote helper crashing; for any other exit, git would rely on the remote helper
reporting the problem.

Unfortunately, this can't be done without breaking the remote helper interface,
either by requiring an initial response from the helper or by requiring helpers not
to explicitly fail due to bad parameters before reaching the capabilities exchange,
since one can't know whether execution was successful without seeing at least
one incoming line.

IIRC, current versions print some rather funky error if you try to use a
nonexistent helper: 'remote-foo is not git command' or some such.
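
The ideal classification described above could be sketched like this, under the assumption (which, as noted, does not hold today) that the parent can learn the errno of a failed exec. `enum helper_outcome` and `classify_helper` are hypothetical names for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/wait.h>

enum helper_outcome {
	HELPER_MISSING,       /* exec failed with ENOENT */
	HELPER_EXEC_FAILED,   /* exec failed for another reason */
	HELPER_CRASHED,       /* killed by a signal */
	HELPER_SELF_REPORTED  /* exited on its own; trust its own message */
};

/*
 * exec_errno:  errno from the failed exec, or 0 if exec succeeded.
 * wait_status: status from waitpid(); only consulted when exec succeeded.
 */
static enum helper_outcome classify_helper(int exec_errno, int wait_status)
{
	assert(exec_errno >= 0);
	if (exec_errno == ENOENT)
		return HELPER_MISSING;
	if (exec_errno)
		return HELPER_EXEC_FAILED;
	if (WIFSIGNALED(wait_status))
		return HELPER_CRASHED;
	return HELPER_SELF_REPORTED;
}
```

The hard part is not this function but getting exec_errno to the parent at all, which is exactly the interface problem.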

> Of course, I never understood why the backend should know the 
> implementation detail that it is based on cURL, so it would be even more 
> modular (at least by my definition) if there was no hard-coded mapping 
> (Sverre -- Cc'ed -- seemed to like URLs of the form "svn::http://..." and 
> "cvs::pserver..." to trigger looking for a remote helper explicitly).  I 
> find the compiled-in mapping rather limiting.

That syntax is rather nice for handling foreign VCSes that may have URL forms
that overlap with native ones. But it sure isn't nice for those remote helpers
that implement git native transports (remote-curl is already a precedent for
doing that).

The API is already general enough to do both: Git native transports (currently
only dumb ones without lots of effort, which is what this patchset addresses)
and foreign VCS bridges.

-Ilari

* Re: warning in git version 1.6.6.rc0.114.gc8648
From: Alejandro Riveira @ 2009-12-02 19:53 UTC
  To: git
In-Reply-To: <7veindgt8v.fsf@alter.siamese.dyndns.org>

On Wed, 02 Dec 2009 09:50:40 -0800, Junio C Hamano wrote:

> Alejandro Riveira <ariveira@gmail.com> writes:
> 
> 
> Yes we are aware of the issue and have a patch to do so which requires
> another change which we also already have a patch for.  It will be fixed
> before 1.6.6-rc1

 Thanks for the quick answer :)

> 
> Thanks for reporting.
 
 No; thank *you* for your work on git ^_^

* [PATCH v2] Detailed diagnosis when parsing an object name fails.
From: y @ 2009-12-02 20:01 UTC
  To: git, gitster; +Cc: Matthieu Moy

From: Matthieu Moy <Matthieu.Moy@imag.fr>

The previous error message was the same in many situations (unknown
revision or path not in the working tree). We try to help the user as
much as possible to understand the error, especially with the
sha1:filename notation. In this case, we say whether the sha1 or the
filename is problematic, and precisely diagnose the confusion between
relative-to-root and relative-to-$PWD paths.

The 6 new error messages are tested.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
Changes since v1:

* Fixed a segfault with

+	if (!prefix)
+		prefix = "";

* Added testcases.

 cache.h                        |    6 ++-
 setup.c                        |   15 +++++-
 sha1_name.c                    |   95 ++++++++++++++++++++++++++++++++++++++--
 t/t1506-rev-parse-diagnosis.sh |   67 ++++++++++++++++++++++++++++
 4 files changed, 176 insertions(+), 7 deletions(-)
 create mode 100755 t/t1506-rev-parse-diagnosis.sh

diff --git a/cache.h b/cache.h
index 0e69384..5c8cb5f 100644
--- a/cache.h
+++ b/cache.h
@@ -708,7 +708,11 @@ static inline unsigned int hexval(unsigned char c)
 #define DEFAULT_ABBREV 7
 
 extern int get_sha1(const char *str, unsigned char *sha1);
-extern int get_sha1_with_mode(const char *str, unsigned char *sha1, unsigned *mode);
+extern int get_sha1_with_mode_1(const char *str, unsigned char *sha1, unsigned *mode, int fatal, const char *prefix);
+static inline int get_sha1_with_mode(const char *str, unsigned char *sha1, unsigned *mode)
+{
+	return get_sha1_with_mode_1(str, sha1, mode, 0, NULL);
+}
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
 extern int read_ref(const char *filename, unsigned char *sha1);
diff --git a/setup.c b/setup.c
index f67250b..3094e8b 100644
--- a/setup.c
+++ b/setup.c
@@ -74,6 +74,18 @@ int check_filename(const char *prefix, const char *arg)
 	die_errno("failed to stat '%s'", arg);
 }
 
+static void NORETURN die_verify_filename(const char *prefix, const char *arg)
+{
+	unsigned char sha1[20];
+	unsigned mode;
+	/* try a detailed diagnostic ... */
+	get_sha1_with_mode_1(arg, sha1, &mode, 1, prefix);
+	/* ... or fall back to the most general message. */
+	die("ambiguous argument '%s': unknown revision or path not in the working tree.\n"
+	    "Use '--' to separate paths from revisions", arg);
+
+}
+
 /*
  * Verify a filename that we got as an argument for a pathspec
  * entry. Note that a filename that begins with "-" never verifies
@@ -87,8 +99,7 @@ void verify_filename(const char *prefix, const char *arg)
 		die("bad flag '%s' used after filename", arg);
 	if (check_filename(prefix, arg))
 		return;
-	die("ambiguous argument '%s': unknown revision or path not in the working tree.\n"
-	    "Use '--' to separate paths from revisions", arg);
+	die_verify_filename(prefix, arg);
 }
 
 /*
diff --git a/sha1_name.c b/sha1_name.c
index 44bb62d..030e2ac 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -804,7 +804,77 @@ int get_sha1(const char *name, unsigned char *sha1)
 	return get_sha1_with_mode(name, sha1, &unused);
 }
 
-int get_sha1_with_mode(const char *name, unsigned char *sha1, unsigned *mode)
+static void diagnose_invalid_sha1_path(const char *prefix,
+				       const char *filename,
+				       const char *tree_sha1,
+				       const char *object_name)
+{
+	struct stat st;
+	unsigned char sha1[20];
+	unsigned mode;
+
+	if (!prefix)
+		prefix = "";
+
+	if (!lstat(filename, &st))
+		die("Path '%s' exists on disk, but not in '%s'.",
+		    filename, object_name);
+	if (errno == ENOENT || errno == ENOTDIR) {
+		char *fullname = malloc(strlen(filename)
+					     + strlen(prefix) + 1);
+		strcpy(fullname, prefix);
+		strcat(fullname, filename);
+
+		if (!get_tree_entry(tree_sha1, fullname,
+				    sha1, &mode)) {
+			die("Path '%s' exists, but not '%s'.\n"
+			    "Did you mean '%s:%s'?",
+			    fullname,
+			    filename,
+			    object_name,
+			    fullname);
+		}
+		die("Path '%s' does not exist in '%s'",
+		    filename, object_name);
+	}
+}
+
+static void diagnose_invalid_index_path(int stage,
+					const char *prefix,
+					const char *filename)
+{
+	struct stat st;
+
+	if (!prefix)
+		prefix = "";
+
+	if (!lstat(filename, &st))
+		die("Path '%s' exists on disk, but not in the index.", filename);
+	if (errno == ENOENT || errno == ENOTDIR) {
+		struct cache_entry *ce;
+		int pos;
+		int namelen = strlen(filename) + strlen(prefix);
+		char *fullname = malloc(namelen + 1);
+		strcpy(fullname, prefix);
+		strcat(fullname, filename);
+		pos = cache_name_pos(fullname, namelen);
+		if (pos < 0)
+			pos = -pos - 1;
+		ce = active_cache[pos];
+		if (ce_namelen(ce) == namelen &&
+		    !memcmp(ce->name, fullname, namelen))
+			die("Path '%s' is in the index, but not '%s'.\n"
+			    "Did you mean ':%d:%s'?",
+			    fullname, filename,
+			    stage, fullname);
+
+		die("Path '%s' does not exist (neither on disk nor in the index).",
+		    filename);
+	}
+}
+
+
+int get_sha1_with_mode_1(const char *name, unsigned char *sha1, unsigned *mode, int fatal, const char *prefix)
 {
 	int ret, bracket_depth;
 	int namelen = strlen(name);
@@ -850,6 +920,8 @@ int get_sha1_with_mode(const char *name, unsigned char *sha1, unsigned *mode)
 			}
 			pos++;
 		}
+		if (fatal)
+			diagnose_invalid_index_path(stage, prefix, cp);
 		return -1;
 	}
 	for (cp = name, bracket_depth = 0; *cp; cp++) {
@@ -862,9 +934,24 @@ int get_sha1_with_mode(const char *name, unsigned char *sha1, unsigned *mode)
 	}
 	if (*cp == ':') {
 		unsigned char tree_sha1[20];
-		if (!get_sha1_1(name, cp-name, tree_sha1))
-			return get_tree_entry(tree_sha1, cp+1, sha1,
-					      mode);
+		char *object_name;
+		if (fatal) {
+			object_name = malloc(cp-name+1);
+			strncpy(object_name, name, cp-name);
+			object_name[cp-name] = '\0';
+		}
+		if (!get_sha1_1(name, cp-name, tree_sha1)) {
+			const char *filename = cp+1;
+			ret = get_tree_entry(tree_sha1, filename, sha1, mode);
+			if (fatal)
+				diagnose_invalid_sha1_path(prefix, filename,
+							   tree_sha1, object_name);
+
+			return ret;
+		} else {
+			if (fatal)
+				die("Invalid object name '%s'.", object_name);
+		}
 	}
 	return ret;
 }
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
new file mode 100755
index 0000000..8112d56
--- /dev/null
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -0,0 +1,67 @@
+#!/bin/sh
+
+test_description='test git rev-parse diagnosis for invalid argument'
+
+exec </dev/null
+
+. ./test-lib.sh
+
+HASH_file=
+
+test_expect_success 'set up basic repo' '
+	echo one > file.txt &&
+	mkdir subdir &&
+	echo two > subdir/file.txt &&
+	echo three > subdir/file2.txt &&
+	git add . &&
+	git commit -m init &&
+	echo four > index-only.txt &&
+	git add index-only.txt &&
+	echo five > disk-only.txt
+'
+
+test_expect_success 'correct file objects' '
+	HASH_file=$(git rev-parse HEAD:file.txt) &&
+	git rev-parse HEAD:subdir/file.txt &&
+	git rev-parse :index-only.txt &&
+	cd subdir &&
+	git rev-parse HEAD:file.txt &&
+	git rev-parse HEAD:subdir/file2.txt &&
+	test $HASH_file = $(git rev-parse HEAD:file.txt) &&
+	test $HASH_file = $(git rev-parse :file.txt) &&
+	test $HASH_file = $(git rev-parse :0:file.txt) &&
+	cd ..
+'
+
+test_expect_success 'incorrect revision id' '
+	test_must_fail git rev-parse foobar:file.txt 2>&1 |
+		grep "Invalid object name '"'"'foobar'"'"'." &&
+	test_must_fail git rev-parse foobar 2>&1 |
+		grep "unknown revision or path not in the working tree."
+'
+
+test_expect_success 'incorrect file in sha1:path' '
+	test_must_fail git rev-parse HEAD:nothing.txt 2>&1 |
+		grep "fatal: Path '"'"'nothing.txt'"'"' does not exist in '"'"'HEAD'"'"'" &&
+	test_must_fail git rev-parse HEAD:index-only.txt 2>&1 |
+		grep "fatal: Path '"'"'index-only.txt'"'"' exists on disk, but not in '"'"'HEAD'"'"'." &&
+	cd subdir &&
+	test_must_fail git rev-parse HEAD:file2.txt 2>&1 |
+		grep "Did you mean '"'"'HEAD:subdir/file2.txt'"'"'?" &&
+	cd ..
+'
+
+test_expect_success 'incorrect file in :path and :0:path' '
+	test_must_fail git rev-parse :nothing.txt 2>&1 |
+		grep "fatal: Path '"'"'nothing.txt'"'"' does not exist (neither on disk nor in the index)." &&
+	test_must_fail git rev-parse :1:nothing.txt 2>&1 |
+		grep "Path '"'"'nothing.txt'"'"' does not exist (neither on disk nor in the index)." &&
+	cd subdir &&
+	test_must_fail git rev-parse :file2.txt 2>&1 |
+		grep "Did you mean '"'"':0:subdir/file2.txt'"'"'?" &&
+	cd .. &&
+	test_must_fail git rev-parse :disk-only.txt 2>&1 |
+		grep "fatal: Path '"'"'disk-only.txt'"'"' exists on disk, but not in the index."
+'
+
+test_done
-- 
1.6.6.rc0.256.g6060
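
The "Did you mean" hint comes down to gluing the current directory's prefix onto the filename the user typed. A self-contained sketch of that step follows; `suggest_full_path` is a hypothetical helper that uses a caller-supplied buffer instead of the patch's malloc/strcpy/strcat, and it assumes, as git does, that a non-empty prefix ends in a slash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the "<object>:<prefix><filename>" suggestion into buf (size n).
 * prefix is the current directory relative to the repository root, with a
 * trailing slash (e.g. "subdir/"), or NULL/"" at the root.
 * Returns 0 on success, -1 if buf is too small.
 */
static int suggest_full_path(const char *object_name, const char *prefix,
			     const char *filename, char *buf, size_t n)
{
	size_t need;

	assert(object_name && filename && buf);
	if (!prefix)
		prefix = "";	/* same guard as the v2 segfault fix */
	/* object + ':' + prefix + filename + NUL */
	need = strlen(object_name) + 1 + strlen(prefix) + strlen(filename) + 1;
	if (need > n)
		return -1;
	snprintf(buf, n, "%s:%s%s", object_name, prefix, filename);
	return 0;
}
```

Called from "subdir/" with object "HEAD" and filename "file2.txt", this produces the "HEAD:subdir/file2.txt" string that the test script greps for.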

^ permalink raw reply related

* [PATCH v2] Detailed diagnosis when parsing an object name fails.
From: y @ 2009-12-02 20:01 UTC (permalink / raw)
  To: git, gitster; +Cc: Matthieu Moy

From: Matthieu Moy <Matthieu.Moy@imag.fr>

The previous error message was the same in many situations (unknown
revision or path not in the working tree). We try to help the user as
much as possible to understand the error, especially with the
sha1:filename notation. In this case, we say whether the sha1 or the
filename is problematic, and diagnose the confusion between
relative-to-root and relative-to-$PWD confusion precisely.

The 6 new error messages are tested.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
Changes since v1:

* Fixed a segfault with

+	if (!prefix)
+		prefix = "";

* Added testcases.

 cache.h                        |    6 ++-
 setup.c                        |   15 +++++-
 sha1_name.c                    |   95 ++++++++++++++++++++++++++++++++++++++--
 t/t1506-rev-parse-diagnosis.sh |   67 ++++++++++++++++++++++++++++
 4 files changed, 176 insertions(+), 7 deletions(-)
 create mode 100755 t/t1506-rev-parse-diagnosis.sh

diff --git a/cache.h b/cache.h
index 0e69384..5c8cb5f 100644
--- a/cache.h
+++ b/cache.h
@@ -708,7 +708,11 @@ static inline unsigned int hexval(unsigned char c)
 #define DEFAULT_ABBREV 7
 
 extern int get_sha1(const char *str, unsigned char *sha1);
-extern int get_sha1_with_mode(const char *str, unsigned char *sha1, unsigned *mode);
+static inline get_sha1_with_mode(const char *str, unsigned char *sha1, unsigned *mode)
+{
+	return get_sha1_with_mode_1(str, sha1, mode, 0, NULL);
+}
+extern int get_sha1_with_mode_1(const char *str, unsigned char *sha1, unsigned *mode, int fatal, const char *prefix);
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
 extern int read_ref(const char *filename, unsigned char *sha1);
diff --git a/setup.c b/setup.c
index f67250b..3094e8b 100644
--- a/setup.c
+++ b/setup.c
@@ -74,6 +74,18 @@ int check_filename(const char *prefix, const char *arg)
 	die_errno("failed to stat '%s'", arg);
 }
 
+static void NORETURN die_verify_filename(const char *prefix, const char *arg)
+{
+	unsigned char sha1[20];
+	unsigned mode;
+	/* try a detailed diagnostic ... */
+	get_sha1_with_mode_1(arg, sha1, &mode, 1, prefix);
+	/* ... or fall back the most general message. */
+	die("ambiguous argument '%s': unknown revision or path not in the working tree.\n"
+	    "Use '--' to separate paths from revisions", arg);
+
+}
+
 /*
  * Verify a filename that we got as an argument for a pathspec
  * entry. Note that a filename that begins with "-" never verifies
@@ -87,8 +99,7 @@ void verify_filename(const char *prefix, const char *arg)
 		die("bad flag '%s' used after filename", arg);
 	if (check_filename(prefix, arg))
 		return;
-	die("ambiguous argument '%s': unknown revision or path not in the working tree.\n"
-	    "Use '--' to separate paths from revisions", arg);
+	die_verify_filename(prefix, arg);
 }
 
 /*
diff --git a/sha1_name.c b/sha1_name.c
index 44bb62d..030e2ac 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -804,7 +804,77 @@ int get_sha1(const char *name, unsigned char *sha1)
 	return get_sha1_with_mode(name, sha1, &unused);
 }
 
-int get_sha1_with_mode(const char *name, unsigned char *sha1, unsigned *mode)
+static void diagnose_invalid_sha1_path(const char *prefix,
+				       const char *filename,
+				       const char *tree_sha1,
+				       const char *object_name)
+{
+	struct stat st;
+	unsigned char sha1[20];
+	unsigned mode;
+
+	if (!prefix)
+		prefix = "";
+
+	if (!lstat(filename, &st))
+		die("Path '%s' exists on disk, but not in '%s'.",
+		    filename, object_name);
+	if (errno == ENOENT || errno == ENOTDIR) {
+		char *fullname = malloc(strlen(filename)
+					     + strlen(prefix) + 1);
+		strcpy(fullname, prefix);
+		strcat(fullname, filename);
+
+		if (!get_tree_entry(tree_sha1, fullname,
+				    sha1, &mode)) {
+			die("Path '%s' exists, but not '%s'.\n"
+			    "Did you mean '%s:%s'?",
+			    fullname,
+			    filename,
+			    object_name,
+			    fullname);
+		}
+		die("Path '%s' does not exist in '%s'",
+		    filename, object_name);
+	}
+}
+
+static void diagnose_invalid_index_path(int stage,
+					const char *prefix,
+					const char *filename)
+{
+	struct stat st;
+
+	if (!prefix)
+		prefix = "";
+
+	if (!lstat(filename, &st))
+		die("Path '%s' exists on disk, but not in the index.", filename);
+	if (errno == ENOENT || errno == ENOTDIR) {
+		struct cache_entry *ce;
+		int pos;
+		int namelen = strlen(filename) + strlen(prefix);
+		char *fullname = malloc(namelen + 1);
+		strcpy(fullname, prefix);
+		strcat(fullname, filename);
+		pos = cache_name_pos(fullname, namelen);
+		if (pos < 0)
+			pos = -pos - 1;
+		ce = pos < active_nr ? active_cache[pos] : NULL;
+		if (ce && ce_namelen(ce) == namelen &&
+		    !memcmp(ce->name, fullname, namelen))
+			die("Path '%s' is in the index, but not '%s'.\n"
+			    "Did you mean ':%d:%s'?",
+			    fullname, filename,
+			    stage, fullname);
+
+		die("Path '%s' does not exist (neither on disk nor in the index).",
+		    filename);
+	}
+}
+
+
+int get_sha1_with_mode_1(const char *name, unsigned char *sha1, unsigned *mode, int fatal, const char *prefix)
 {
 	int ret, bracket_depth;
 	int namelen = strlen(name);
@@ -850,6 +920,8 @@ int get_sha1_with_mode(const char *name, unsigned char *sha1, unsigned *mode)
 			}
 			pos++;
 		}
+		if (fatal)
+			diagnose_invalid_index_path(stage, prefix, cp);
 		return -1;
 	}
 	for (cp = name, bracket_depth = 0; *cp; cp++) {
@@ -862,9 +934,24 @@ int get_sha1_with_mode(const char *name, unsigned char *sha1, unsigned *mode)
 	}
 	if (*cp == ':') {
 		unsigned char tree_sha1[20];
-		if (!get_sha1_1(name, cp-name, tree_sha1))
-			return get_tree_entry(tree_sha1, cp+1, sha1,
-					      mode);
+		char *object_name;
+		if (fatal) {
+			object_name = malloc(cp-name+1);
+			strncpy(object_name, name, cp-name);
+			object_name[cp-name] = '\0';
+		}
+		if (!get_sha1_1(name, cp-name, tree_sha1)) {
+			const char *filename = cp+1;
+			ret = get_tree_entry(tree_sha1, filename, sha1, mode);
+			if (fatal)
+				diagnose_invalid_sha1_path(prefix, filename,
+							   tree_sha1, object_name);
+
+			return ret;
+		} else {
+			if (fatal)
+				die("Invalid object name '%s'.", object_name);
+		}
 	}
 	return ret;
 }
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
new file mode 100755
index 0000000..8112d56
--- /dev/null
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -0,0 +1,67 @@
+#!/bin/sh
+
+test_description='test git rev-parse diagnosis for invalid argument'
+
+exec </dev/null
+
+. ./test-lib.sh
+
+HASH_file=
+
+test_expect_success 'set up basic repo' '
+	echo one > file.txt &&
+	mkdir subdir &&
+	echo two > subdir/file.txt &&
+	echo three > subdir/file2.txt &&
+	git add . &&
+	git commit -m init &&
+	echo four > index-only.txt &&
+	git add index-only.txt &&
+	echo five > disk-only.txt
+'
+
+test_expect_success 'correct file objects' '
+	HASH_file=$(git rev-parse HEAD:file.txt) &&
+	git rev-parse HEAD:subdir/file.txt &&
+	git rev-parse :index-only.txt &&
+	cd subdir &&
+	git rev-parse HEAD:file.txt &&
+	git rev-parse HEAD:subdir/file2.txt &&
+	test $HASH_file = $(git rev-parse HEAD:file.txt) &&
+	test $HASH_file = $(git rev-parse :file.txt) &&
+	test $HASH_file = $(git rev-parse :0:file.txt) &&
+	cd ..
+'
+
+test_expect_success 'incorrect revision id' '
+	test_must_fail git rev-parse foobar:file.txt 2>&1 |
+		grep "Invalid object name '"'"'foobar'"'"'." &&
+	test_must_fail git rev-parse foobar 2>&1 |
+		grep "unknown revision or path not in the working tree."
+'
+
+test_expect_success 'incorrect file in sha1:path' '
+	test_must_fail git rev-parse HEAD:nothing.txt 2>&1 |
+		grep "fatal: Path '"'"'nothing.txt'"'"' does not exist in '"'"'HEAD'"'"'" &&
+	test_must_fail git rev-parse HEAD:index-only.txt 2>&1 |
+		grep "fatal: Path '"'"'index-only.txt'"'"' exists on disk, but not in '"'"'HEAD'"'"'." &&
+	cd subdir &&
+	test_must_fail git rev-parse HEAD:file2.txt 2>&1 |
+		grep "Did you mean '"'"'HEAD:subdir/file2.txt'"'"'?" &&
+	cd ..
+'
+
+test_expect_success 'incorrect file in :path and :0:path' '
+	test_must_fail git rev-parse :nothing.txt 2>&1 |
+		grep "fatal: Path '"'"'nothing.txt'"'"' does not exist (neither on disk nor in the index)." &&
+	test_must_fail git rev-parse :1:nothing.txt 2>&1 |
+		grep "Path '"'"'nothing.txt'"'"' does not exist (neither on disk nor in the index)." &&
+	cd subdir &&
+	test_must_fail git rev-parse :file2.txt 2>&1 |
+		grep "Did you mean '"'"':0:subdir/file2.txt'"'"'?" &&
+	cd .. &&
+	test_must_fail git rev-parse :disk-only.txt 2>&1 |
+		grep "fatal: Path '"'"'disk-only.txt'"'"' exists on disk, but not in the index."
+'
+
+test_done
-- 
1.6.6.rc0.256.g6060


^ permalink raw reply related

* Re: "git merge" merges too much!
From: Jeff King @ 2009-12-02 20:09 UTC (permalink / raw)
  To: The Git Mailing List
In-Reply-To: <m1NFAji-000kn2C@most.weird.com>

On Mon, Nov 30, 2009 at 01:12:31PM -0500, Greg A. Woods wrote:

> (From a first pass through the documentation I would never have guessed
> that "tags" were also a form of "refs".  All these different names for

I find git is much simpler to use and understand if you start "at the
bottom" with the basic concepts (because for the most part, git is
really a set of tools for manipulating the few basic data structures).
For a short intro, try:

  http://eagain.net/articles/git-for-computer-scientists/

I think Scott Chacon's "Pro Git" book also takes a similar approach, but
I confess that I have not actually read it carefully. At this point, I
know enough about git to make reading it not very interesting. :) You
can find it online at:

  http://progit.org/book/

> features.  Even the gitglossary(7) is somewhat inconsistent on how it
> uses "ref" and "refs".  Perhaps all that's needed is some firm editing
> and clean-up of the manuals and documentation by a good strong technical
> editor.)

I skimmed it and didn't see any inconsistency. If you have something
specific in mind, please point it out so we can fix it.

> "git rebase" will not work for me unless it grows a "copy" option ,
> i.e. one which does not delete the original branch (i.e. avoids the
> "reset" phase of its operation).  This option would likely only make
> sense when used with the "--onto" option, I would guess.

I think Dmitry already mentioned this, but you probably want to create a
new branch to hold your rebased history if you don't want to modify the
existing branch.

> (git-log(1) is worse than ls(1) for having too many options, but worst
> of all in the release I'm still using it doesn't respond sensibly nor
> consistently with other commands when given the "-?" option.)

$ ls -?
ls: invalid option -- '?'
Try `ls --help' for more information.

$ ls --help ;# or ls -h
[copious usage information]

$ git log -?
fatal: unrecognized argument: -?

$ git log --help
[the man page]

$ git log -h
usage: git log [<options>] [<since>..<until>] [[--] <path>...]
   or: git show [options] <object>...

$ cd /outside/of/git/repo
$ git log -?
fatal: Not a git repository (or any of the parent directories): .git

So "-?" is bogus for both ls and git. But there are two failings I see:

  1. Outside of a repository, "git log" does not even get to the
     argument-parsing phase to see that "-?" is bogus. We short-circuit
     "-h" and "--help" to avoid actually looking for a git repository,
     but obviously cannot do so for every "--bogus" argument we see.
     We could potentially also short-circuit "-?" (and probably map it
     to "-h" if we were going to do that). However, I didn't think "-?"
     was in common use.

  2. "git log -h" doesn't mention any of the options specifically,
     though other git commands do (e.g., try "git archive -h"). This is
     because the option list is generated by our parseopt library, but
     the revision and diff options (which are the only ones that "git
     log" takes) do not use parseopt. Maybe we should point to "--help"
     for the full list in that case.
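
To make point 1 concrete, here is a minimal sketch of the kind of check
that could run before repository discovery. The helper name
wants_short_usage() is hypothetical (it is not git's actual code), and
treating "-?" as a synonym for "-h" is only the idea floated above, not
existing behavior.

```c
#include <string.h>

/*
 * Hypothetical helper, not git's real implementation: report whether
 * the first argument after the subcommand name asks for short usage.
 * Mapping "-?" to "-h" is an assumption taken from the discussion.
 */
int wants_short_usage(int argc, const char **argv)
{
	if (argc < 2)
		return 0;
	return !strcmp(argv[1], "-h") || !strcmp(argv[1], "-?");
}
```

A caller would test this before looking for a .git directory and print
the usage string right away, so "git log -?" would behave the same
inside and outside a repository.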

-Peff

^ permalink raw reply

* Re: [RFC PATCH 4/8] Support remote helpers implementing smart transports
From: Ilari Liusvaara @ 2009-12-02 20:10 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git
In-Reply-To: <20091202170457.GC31648@spearce.org>

On Wed, Dec 02, 2009 at 09:04:57AM -0800, Shawn O. Pearce wrote:
> Ilari Liusvaara <ilari.liusvaara@elisanet.fi> wrote:
> 
> Drop invoke-r.

Dropped.

> Modify transport-helper.c to allow pushing TRANS_OPT_UPLOADPACK and
> TRANS_OPT_RECEIVEPACK down into the helper via the option capability.

NAK. Modified _process_connect_or_invoke (now _process_connect) to pass a
new option that applies to connecting all subprotocols (if needed).

It looks like the following:

 > capabilities
 < option
 < connect
 <
 > option servpath <blahblah>
 < ok
 > connect git-upload-pack
 < 

And from the helper's POV, all subprotocols should appear identical at
layer 6, so it doesn't make sense to differentiate between the path
for upload-pack and receive-pack (or upload-archive!).

> I'd rename connect-r to just connect.

Yeah, putting the repository explicitly in the RPC signature is a bit ugly
(there is probably never going to be a signature that doesn't contain the
repo as an argument). That would make it 'connect'.

Renamed.

-Ilari

^ permalink raw reply

* Re: [PATCH] builtin-commit: add --date option
From: Miklos Vajna @ 2009-12-02 20:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20091202192425.GC30778@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 826 bytes --]

On Wed, Dec 02, 2009 at 02:24:25PM -0500, Jeff King <peff@peff.net> wrote:
> On Wed, Dec 02, 2009 at 09:35:48AM -0800, Junio C Hamano wrote:
> 
> > > Is there any documentation describing what does parse_date() accept?
> > [...]
> > The above are all supported (you can label 2 as ISO even though the
> > official ISO8601 wants "T" instead of " " between date and time).
> > 
> > For more amusing ones, see
> > 
> >     http://article.gmane.org/gmane.comp.version-control.git/12241
> > 
> > and follow the discussion there ;-)
> 
> Aren't the amusing ones the result of approxidate, and not parse_date?
> At least that is my recollection from working on the date code when I
> ate 30 hot dogs last August.

I think you are right, at least --date="2008-12-02 18:04:00" works here,
but not --date="teatime".

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [RFC/PATCHv9 01/11] fast-import: Proper notes tree manipulation
From: Shawn O. Pearce @ 2009-12-02 20:39 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, gitster
In-Reply-To: <1259719783-4674-2-git-send-email-johan@herland.net>

Johan Herland <johan@herland.net> wrote:
> diff --git a/fast-import.c b/fast-import.c
> index b41d29f..b51ffbc 100644
> --- a/fast-import.c
> +++ b/fast-import.c

This new version is much cleaner, thank you.

> @@ -161,6 +161,7 @@ Format of STDIN stream:
>  #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
>  #define DEPTH_BITS 13
>  #define MAX_DEPTH ((1<<DEPTH_BITS)-1)
> +#define NOTE_MODE 0170000

Be consistent with S_IFGITLINK:

#define S_IFNOTE 0170000
#define S_ISNOTE(m) (((m) & S_IFMT) == S_IFNOTE)

> @@ -245,6 +246,7 @@ struct branch
>  	const char *name;
>  	struct tree_entry branch_tree;
>  	uintmax_t last_commit;
> +	unsigned int num_notes;

Use uintmax_t here?

> +static unsigned int do_change_note_fanout(
> +		struct tree_entry *orig_root, struct tree_entry *root,
> +		char *hex_sha1, unsigned int hex_sha1_len,
> +		char *fullpath, unsigned int fullpath_len,
> +		unsigned char fanout)
> +{
> +	struct tree_content *t = root->tree;
> +	struct tree_entry *e, leaf;
> +	unsigned int i, tmp_hex_sha1_len, tmp_fullpath_len, num_notes = 0;
> +	unsigned char sha1[20];
> +	char realpath[60];
> +	int is_note;
> +
> +	for (i = 0; i < t->entry_count; i++) {
> +		e = t->entries[i];
> +		is_note = (e->versions[1].mode & NOTE_MODE) == NOTE_MODE;
> +		tmp_hex_sha1_len = hex_sha1_len + e->name->str_len;
> +		tmp_fullpath_len = fullpath_len;
> +
> +		if (tmp_hex_sha1_len <= 40 && e->name->str_len >= 2) {
> +			memcpy(hex_sha1 + hex_sha1_len, e->name->str_dat,
> +				e->name->str_len);
> +			if (tmp_fullpath_len)
> +				fullpath[tmp_fullpath_len++] = '/';
> +			memcpy(fullpath + tmp_fullpath_len, e->name->str_dat,
> +				e->name->str_len);
> +			tmp_fullpath_len += e->name->str_len;
> +			assert(tmp_fullpath_len < 60);
> +			fullpath[tmp_fullpath_len] = '\0';
> +		} else {
> +			assert(!is_note);
> +			continue;
> +		}

Are we rejecting a mixed content-tree here?  I thought a notes
tree was allowed to hold anything, e.g. isn't it ok to put a
".gitattributes" file into a notes tree?

I think we'd do better to have at the top of our loop:

	if (!is_note && !S_ISDIR(e->versions[1].mode))
		continue;

so that we ignore non-notes and non-subdirectories which might
contain notes.

Also, fast-import never uses assert.  I'd prefer to die() because
then the recent command trace can go into the crash report.
It gives the user more context about what the hell just went wrong.

> +		/* The above may have reallocated the current tree_content */
> +		if (t != root->tree)
> +			t = root->tree;

Why bother with the condition?  Just do the assignment every time
in the loop.

> @@ -2080,8 +2195,10 @@ static void note_change_n(struct branch *b)
>  			    typename(type), command_buf.buf);
>  	}
> 
> -	tree_content_set(&b->branch_tree, sha1_to_hex(commit_sha1), sha1,
> -		S_IFREG | 0644, NULL);
> +	construct_path_with_fanout(sha1_to_hex(commit_sha1), fanout, path);
> +	b->num_notes += adjust_num_notes(&b->branch_tree, path, sha1);
> +	mode = (is_null_sha1(sha1) ? S_IFREG : NOTE_MODE) | 0644;
> +	tree_content_set(&b->branch_tree, path, sha1, mode, NULL);

I wonder if it wouldn't be better to compute the fan out here on
each put.  That way if an importer drives 2,000,000 notes at once
to us in a single commit, we don't wind up with a flat 0 fan-out
tree and trying to perform a linear insert on each one, but instead
will start to increase the fan out as the number of entries goes up.

Basically, tree_content_set/remove are O(N) operations on N paths
in the tree, because their structures aren't necessarily sorted.
IIRC at one point in time I tried to do this with a binary search but
gave up and just did it unsorted.  At least using the fan out here
would help partition the search space dramatically on large inserts.
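
To illustrate what computing the fan-out on each put might look like,
here is a rough sketch. Both function names and the "one extra level
per factor of 256 notes" threshold are assumptions made up for this
example; they are not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical heuristic: use one more two-digit directory level for
 * every factor of 256 notes.  The cap of 19 keeps at least two hex
 * digits for the leaf name.
 */
unsigned char fanout_for(unsigned long num_notes)
{
	unsigned char fanout = 0;

	while ((num_notes >>= 8) && fanout < 19)
		fanout++;
	return fanout;
}

/*
 * Write the 40-digit hex_sha1 into path using `fanout` directory
 * levels, e.g. fanout 2 gives "ab/cd/ef01...".  The caller must pass
 * fanout <= 19 and a buffer of at least 40 + fanout + 1 bytes.
 */
void fanout_path(const char *hex_sha1, unsigned char fanout, char *path)
{
	size_t i = 0, j = 0;
	unsigned char level;

	for (level = 0; level < fanout; level++) {
		path[i++] = hex_sha1[j++];
		path[i++] = hex_sha1[j++];
		path[i++] = '/';
	}
	memcpy(path + i, hex_sha1 + j, 40 - j);
	path[i + 40 - j] = '\0';
}
```

With this heuristic, 2,000,000 notes would end up two levels deep, so
each tree_content_set() only has to scan a few hundred entries per
level rather than one flat list.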

-- 
Shawn.

^ permalink raw reply

* Re: [PATCH 2/5] git-svn: Make merge metadata accessible to make_log_entry
From: Alex Vandiver @ 2009-12-02 20:46 UTC (permalink / raw)
  To: git
In-Reply-To: <1259780874-14706-3-git-send-email-alex@chmrr.net>

At Wed Dec 02 14:07:51 -0500 2009, Alex Vandiver wrote:
> @@ -2944,14 +2944,15 @@ sub find_extra_svk_parents {
>                  # wahey!  we found it, but it might be
>                  # an old one (!)
>                  push @known_parents, [ $rev, $commit ];
> +                push @known_parents, [ $rev, $path, $commit ];
>              }
>          }
>      }

This hunk is wrong due to a mis-merge on my part -- the first 'push'
line should have been removed, obviously.  I'll wait for other
comments before I push a v2, however.

 - Alex
-- 
Networking -- only one letter away from not working

^ permalink raw reply

* How do you best store structured data in git repositories?
From: Sebastian Setzer @ 2009-12-02 21:08 UTC (permalink / raw)
  To: git

Hi,
when you design a file format to store structured data, and you want to
manage these files with git, how do you do this best?

I'd like to hear about best practices, experiences, links to discussions
on this subject, ...

Here are some of my questions:

Do you store everything in a single file and configure git to use
special diff- and merge-tools?
Do you use XML for this purpose?
Do you take care that the contents of your file is as stable as possible
when it's saved or do you let your diff tools cope with issues like
reordering, reassignment of identifiers (for example when identifiers
are offsets in the file), ...?

Do you store one object/record per file (with filename=id, for example
with GUID-s) and hope that git will not mess them up when it merges
them?

Do you store records as directories, with very small files which contain
single attributes (because records can be considered sets of
key-value-pairs and the same applies to directories)? Do you configure
git to do a scalar merge on non-text "attributes" (with special file
extensions)?

When you don't store everything in a single, binary file: Do you use git
hooks to update an index for efficient queries on your structured data?
Do you update the whole index for every change? Or do you use git hashes
to decide which segment of your index needs to be updated?

greetings,
Sebastian

^ permalink raw reply

* Re: [PATCH v2] Add --track option to git clone
From: Nanako Shiraishi @ 2009-12-02 21:07 UTC (permalink / raw)
  To: Jeff King; +Cc: David Soria Parra, git
In-Reply-To: <20091202190807.GB30778@coredump.intra.peff.net>

Quoting Jeff King <peff@peff.net>

> I would find something like this useful for cloning git.git, where I
> explicitly fetch maint, master, next, and pu, but none of html, man, or
> todo. This makes "gitk --all" much nicer to view.

Thank you for explaining. I now can understand why it can be useful.

>   # most general case
>   git clone -f 'refs/heads/subset/*:refs/remotes/origin/*' remote.git

Because this is only about branches and no other kinds of
references, I think this is overkill.

>   git clone -f 'subset/*' remote.git

But I think this is a good idea.

>   # multiple -f should add multiple refspec lines
>   git clone -f maint -f master -f next -f pu git.git
>
>   # choose your favorite branch
>   git clone -f maint -f master -f next -f pu -b next git.git
> ...
> What do you think?

I think your rule to make the first branch given by -f the default
for -b is a good idea. But I'm not very happy with the example
with four -f. Could we perhaps write it like this?

  git clone -f maint,master,next,pu git.git

If it isn't a good idea to use a comma, we can use a colon to split
the list of branch names instead.
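
As a rough sketch of how the comma form could be expanded into one
refspec per branch: the function name expand_branch_list() and the
emit callback are hypothetical, and the "origin" remote name is simply
taken from the general-case example quoted above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: expand a comma-separated branch list into one
 * fetch refspec per branch and pass each to emit().
 * Returns the number of refspecs produced, or -1 if a branch name is
 * empty or longer than 200 bytes.
 */
int expand_branch_list(const char *list, void (*emit)(const char *refspec))
{
	char refspec[512];
	int count = 0;

	for (;;) {
		size_t n = strcspn(list, ",");

		if (n == 0 || n > 200)
			return -1;
		snprintf(refspec, sizeof(refspec),
			 "refs/heads/%.*s:refs/remotes/origin/%.*s",
			 (int)n, list, (int)n, list);
		emit(refspec);
		count++;
		if (!list[n])
			return count;	/* reached the end of the list */
		list += n + 1;		/* skip past the comma */
	}
}
```

Under this sketch, "-f maint,master,next,pu" would produce the same
four refspecs as four separate -f options, and the first name in the
list could still serve as the default for -b.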

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

^ permalink raw reply

* Re: [PATCH] builtin-commit: add --date option
From: Miklos Vajna @ 2009-12-02 21:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git
In-Reply-To: <7vpr6xcgki.fsf@alter.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

On Wed, Dec 02, 2009 at 11:38:05AM -0800, Junio C Hamano <gitster@pobox.com> wrote:
>  - We should honor GIT_AUTHOR_DATE if --reset-author is given.

This is already the case, as far as I can see.

>  - I _think_ we should ignore GIT_AUTHOR_DATE if --reset-author is not
>    given, as --amend/-c/-C is stronger for being command line options than
>    an environment variable.

We already ignore GIT_AUTHOR_DATE if --reset-author is not given, also
when I change it like this:

diff --git a/builtin-commit.c b/builtin-commit.c
index e93a647..7234c7d 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -397,7 +397,8 @@ static void determine_author_info(void)

                name = xstrndup(a + 8, lb - (a + 8));
                email = xstrndup(lb + 2, rb - (lb + 2));
-               date = xstrndup(rb + 2, eol - (rb + 2));
+               if (!date)
+                       date = xstrndup(rb + 2, eol - (rb + 2));
        }

        if (force_author) {

tests 27, 33, 38 and 39 fail in t7501-commit.sh.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply related

* [PATCH] gitweb: Describe (possible) gitweb.js minification in gitweb/README
From: Jakub Narebski @ 2009-12-02 21:14 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski
In-Reply-To: <7vaay2tkfh.fsf@alter.siamese.dyndns.org>

On Tue, 1 Dec 2009, Junio C Hamano wrote:

> * jn/gitweb-blame (2009-11-24) 8 commits.
>   (merged to 'next' on 2009-11-25 at 0a5b649)
>  + gitweb.js: fix padLeftStr() and its usage
>  + gitweb.js: Harden setting blamed commit info in incremental blame
>  + gitweb.js: fix null object exception in initials calculation
>  + gitweb: Minify gitweb.js if JSMIN is defined

This commit is somehow missing the description of JSMIN in gitweb/README
from the original patch.  Here is the missing piece.

>  + gitweb: Create links leading to 'blame_incremental' using JavaScript
>   (merged to 'next' on 2009-10-11 at 73c4a83)
>  + gitweb: Colorize 'blame_incremental' view during processing
>  + gitweb: Incremental blame (using JavaScript)
>  + gitweb: Add optional "time to generate page" info in footer
> 
> With two more changes to disable this by default to make it
> suitable as "new feature with known breakages" for 1.6.6

-- >8 --
Subject: [PATCH] gitweb: Describe (possible) gitweb.js minification in gitweb/README

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/README |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/gitweb/README b/gitweb/README
index b69b0e5..e34ee79 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -95,7 +95,8 @@ You can specify the following configuration variables when building GIT:
  * GITWEB_JS
    Points to the localtion where you put gitweb.js on your web server
    (or to be more generic URI of JavaScript code used by gitweb).
-   Relative to base URI of gitweb.  [Default: gitweb.js]
+   Relative to base URI of gitweb.  [Default: gitweb.js (or gitweb.min.js
+   if JSMIN build variable is defined / JavaScript minifier is used)]
  * GITWEB_CONFIG
    This Perl file will be loaded using 'do' and can be used to override any
    of the options above as well as some other options -- see the "Runtime
-- 
1.6.5.3

^ permalink raw reply related

* Re: Git GUI client SmartGit released
From: Marc Strapetz @ 2009-12-02 21:15 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git, Alexander Kitaev
In-Reply-To: <20091202164810.GB31648@spearce.org>

> I noticed you use JGit and the Trilead SSH client.
> 
> I'm curious, did you guys replace JSch because its a pile of junk?
> Did you patch JGit to use Trilead SSH instead of JSch?  If so,
> would you be interested in contributing that change back to JGit?
> I'm rather fed up with JSch...  :-)

We currently don't use JGit's transport, but we plant a custom SSH
client on the git executable which connects back to SmartGit and just
tunnels SSH data through. Anyway, I remember that SVNKit was using
JSch initially and that they switched to Trilead because of problems with
JSch (maybe Alexander in Cc can shed more light on that).

--
Best regards,
Marc Strapetz
=============
syntevo GmbH
http://www.syntevo.com
http://blog.syntevo.com

Shawn O. Pearce wrote:
> Marc Strapetz <marc.strapetz@syntevo.com> wrote:
>> We are proud to announce the general availability of our Git client
>> SmartGit[1]:
>>
>>  http://www.syntevo.com/smartgit/index.html
> 
> Congrats on your release.
> 
> I noticed you use JGit and the Trilead SSH client.
> 
> I'm curious, did you guys replace JSch because its a pile of junk?
> Did you patch JGit to use Trilead SSH instead of JSch?  If so,
> would you be interested in contributing that change back to JGit?
> I'm rather fed up with JSch...  :-)
> 

^ permalink raw reply

* Re: How do you best store structured data in git repositories?
From: Avery Pennarun @ 2009-12-02 21:17 UTC (permalink / raw)
  To: sebastianspublicaddress; +Cc: git
In-Reply-To: <1259788097.3590.29.camel@nord26-amd64>

On Wed, Dec 2, 2009 at 4:08 PM, Sebastian Setzer
<sebastianspublicaddress@googlemail.com> wrote:
> Do you store everything in a single file and configure git to use
> special diff- and merge-tools?
> Do you use XML for this purpose?

XML is terrible for most data storage purposes.  Data exchange, maybe,
but IMHO the best thing you can do when you get XML data is to put it
in some other format ASAP.

As it happens, I've been doing a project where we store a bunch of
stuff in csv format in git, and it works fairly well.  We made a
special merge driver that can merge csv data (based on knowing which
columns should be treated as the "primary key").
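
For what it's worth, once rows from the three versions are matched up
by primary key, the per-field decision can be as simple as the usual
three-way rule. This is only a sketch of that rule under that
assumption, with a hypothetical function name; it is not the actual
driver.

```c
#include <string.h>

/*
 * Hypothetical per-field three-way merge for rows already matched by
 * primary key.  Returns the merged value, or NULL when both sides
 * changed the field in different ways (a conflict).
 */
const char *merge_field(const char *base, const char *ours,
			const char *theirs)
{
	if (!strcmp(ours, theirs))
		return ours;	/* both sides agree */
	if (!strcmp(base, ours))
		return theirs;	/* only their side changed */
	if (!strcmp(base, theirs))
		return ours;	/* only our side changed */
	return NULL;		/* both changed differently */
}
```

Rows that exist on only one side still need their own handling (added
vs. deleted), which this sketch leaves out.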

> Do you take care that the contents of your file is as stable as possible
> when it's saved or do you let your diff tools cope with issues like
> reordering, reassignment of identifiers (for example when identifiers
> are offsets in the file), ...?

A custom merge driver is better, by far, than the builtin ones (which
were designed for source code) if you have any kind of structured data
that you don't want to have to merge by hand.

That said, however, you should still try to make your files as stable
as possible, because:

- If your program outputs the data in random order, it's just being
sloppy anyway

- 'git diff' doesn't work usefully otherwise (for examining the data
and debugging)

Of course, all bets are off if your file is actually binary; merging
and diffing is mostly impossible unless you use a totally custom
engine.  And if your file contains byte offsets, then it's a binary
file, no matter what it looks like in your text editor.  Adding a byte
in the middle would make such a file entirely nonsense, which is not
an attribute of a text file.

> Do you store one object/record per file (with filename=id, for example
> with GUID-s) and hope that git will not mess them up when it merges
> them?
>
> Do you store records as directories, with very small files which contain
> single attributes (because records can be considered sets of
> key-value-pairs and the same applies to directories)? Do you configure
> git to do a scalar merge on non-text "attributes" (with special file
> extensions)?

In git, you have to balance between its different limitations.  If you
have a tonne of small files, it'll take you longer to retrieve a large
amount of data.  If you have one big huge file, git will suck a lot of
memory when repacking.  The best is to achieve a reasonable balance.

One trick that I've been using lately is to split large files
according to a rolling checksum:
http://alumnit.ca/~apenwarr/log/?m=200910#04

This generally keeps diffs useful, but keeps individual file sizes
down.  Obviously the implementation pointed to there is just a toy,
but the idea is sound.
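
To show the idea in a few lines (this is a simplified sketch, not the
implementation from that post): keep a running sum over the last few
bytes and cut a chunk whenever the low bits of the sum hit a fixed
pattern. The window size, mask, and maximum chunk size below are
arbitrary values picked for the example.

```c
#include <stddef.h>

#define WINDOW    16	/* bytes that influence each boundary decision */
#define MASK      0x3f	/* cut when the low 6 bits of the sum are all set */
#define MAX_CHUNK 1024	/* never let a chunk grow beyond this */

/*
 * Return the length of the next chunk at the start of buf.  A boundary
 * depends only on the last WINDOW bytes, so inserting data early in a
 * file shifts later boundaries along with the content instead of
 * changing every chunk after the insertion point.
 */
size_t next_chunk_len(const unsigned char *buf, size_t len)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		sum += buf[i];
		if (i >= WINDOW)
			sum -= buf[i - WINDOW];	/* drop the byte leaving the window */
		if (i + 1 >= WINDOW && (sum & MASK) == MASK)
			return i + 1;
		if (i + 1 == MAX_CHUNK)
			return MAX_CHUNK;
	}
	return len;
}
```

Calling this repeatedly and writing each chunk to its own file gives
pieces whose boundaries follow the data rather than fixed offsets. A
plain byte sum is a weak checksum; a real tool would want something
stronger.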

> When you don't store everything in a single, binary file: Do you use git
> hooks to update an index for efficient queries on your structured data?
> Do you update the whole index for every change? Or do you use git hashes
> to decide which segment of your index needs to be updated?

We keep a separate index file that's not part of git.  When the git
repo is updated, we note which rows have changed, then update the
index.

Avery

^ permalink raw reply
