Git development

Git development
 help / color / mirror / Atom feed

* Re: support gnupg-2.x in git.
From: Junio C Hamano @ 2011-11-29 20:29 UTC (permalink / raw)
  To: Paweł Sikora; +Cc: git
In-Reply-To: <201111291937.34324.pawel.sikora@agmk.net>

Paweł Sikora <pawel.sikora@agmk.net> writes:

> i'm using a gnupg-2.0.18 and currently i'm not able to use git tag/verify
> due to hadcoded "gpg" literals in builtin/{tag,verifiy-tag}.c.

Stating the obvious...

  $ ln -s /usr/local/not/on/my/path/bin/gnupg-2.0.18 $HOME/bin/gpg
  $ PATH=$HOME/bin:$PATH

Or this untested patch, which applies on top of jc/signed-commit, as the
GnuPG interface is in the process of getting heavily refactored.

-- >8 --
Subject: gpg-interface: allow use of a custom GPG binary

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt  |   11 +++++++++++
 Documentation/git-tag.txt |    8 +++++---
 gpg-interface.c           |   11 ++++++++---
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b30c7e6..094c1c9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1094,6 +1094,17 @@ grep.lineNumber::
 grep.extendedRegexp::
 	If set to true, enable '--extended-regexp' option by default.
 
+gpg.program::
+	Use this custom program instead of "gpg" found on $PATH when
+	making or verifying a PGP signature. The program must support the
+	same command line interface as GPG, namely, to verify a detached
+	signature, "gpg --verify $file - <$signature" is run, and the
+	program is expected to signal a good signature by exiting with
+	code 0, and to generate an ascii-armored detached signature, the
+	standard input of "gpg -bsau $key" is fed with the contents to be
+	signed, and the program is expected to send the result to its
+	standard output.
+
 gui.commitmsgwidth::
 	Defines how wide the commit message window is in the
 	linkgit:git-gui[1]. "75" is the default.
diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index c83cb13..74fc7e0 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -38,7 +38,9 @@ created (i.e. a lightweight tag).
 A GnuPG signed tag object will be created when `-s` or `-u
 <key-id>` is used.  When `-u <key-id>` is not used, the
 committer identity for the current user is used to find the
-GnuPG key for signing.
+GnuPG key for signing. 	The configuration variable `gpg.program`
+is used to specify custom GnuPG binary.
+
 
 OPTIONS
 -------
@@ -48,11 +50,11 @@ OPTIONS
 
 -s::
 --sign::
-	Make a GPG-signed tag, using the default e-mail address's key
+	Make a GPG-signed tag, using the default e-mail address's key.
 
 -u <key-id>::
 --local-user=<key-id>::
-	Make a GPG-signed tag, using the given key
+	Make a GPG-signed tag, using the given key.
 
 -f::
 --force::
diff --git a/gpg-interface.c b/gpg-interface.c
index ff232c8..18630ff 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -5,6 +5,7 @@
 #include "sigchain.h"
 
 static char *configured_signing_key;
+static const char *gpg_program = "gpg";
 
 void set_signing_key(const char *key)
 {
@@ -15,9 +16,12 @@ void set_signing_key(const char *key)
 int git_gpg_config(const char *var, const char *value, void *cb)
 {
 	if (!strcmp(var, "user.signingkey")) {
+		set_signing_key(value);
+	}
+	if (!strcmp(var, "gpg.program")) {
 		if (!value)
 			return config_error_nonbool(var);
-		set_signing_key(value);
+		gpg_program = xstrdup(value);
 	}
 	return 0;
 }
@@ -46,7 +50,7 @@ int sign_buffer(struct strbuf *buffer, struct strbuf *signature, const char *sig
 	gpg.argv = args;
 	gpg.in = -1;
 	gpg.out = -1;
-	args[0] = "gpg";
+	args[0] = gpg_program;
 	args[1] = "-bsau";
 	args[2] = signing_key;
 	args[3] = NULL;
@@ -101,10 +105,11 @@ int verify_signed_buffer(const char *payload, size_t payload_size,
 			 struct strbuf *gpg_output)
 {
 	struct child_process gpg;
-	const char *args_gpg[] = {"gpg", "--verify", "FILE", "-", NULL};
+	const char *args_gpg[] = {NULL, "--verify", "FILE", "-", NULL};
 	char path[PATH_MAX];
 	int fd, ret;
 
+	args_gpg[0] = gpg_program;
 	fd = git_mkstemp(path, PATH_MAX, ".git_vtag_tmpXXXXXX");
 	if (fd < 0)
 		return error("could not create temporary file '%s': %s",

^ permalink raw reply related

* Re: [PATCH] Make feed title valid utf8
From: Junio C Hamano @ 2011-11-29 20:11 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jürgen Kreileder, git
In-Reply-To: <m38vmylnfu.fsf@localhost.localdomain>

Thanks for patches and reviews.

I am not keeping track of what is acked unconditionally, what is acked
with reservation i.e. "with this tweak on top", and what is rejected with
suggestion of an alternative, though. I hope I can expect to see a series
that is finished between you two to be resent for application after 1.7.8
ships.

^ permalink raw reply

* Re: [PATCH] rebase -i: interrupt rebase when "commit --amend" failed during "reword"
From: Junio C Hamano @ 2011-11-29 20:08 UTC (permalink / raw)
  To: Andrew Wong; +Cc: git
In-Reply-To: <1322496952-23819-2-git-send-email-andrew.kw.w@gmail.com>

Andrew Wong <andrew.kw.w@gmail.com> writes:

> "commit --amend" could fail in cases like the user empties the commit
> message, or pre-commit failed.  When it fails, rebase should be
> interrupted, rather than ignoring the error and continue on rebasing.
> This gives users a way to gracefully interrupt a "reword" if they
> decided they actually want to do an "edit", or even "rebase --abort".
>
> Signed-off-by: Andrew Wong <andrew.kw.w@gmail.com>

Makes sense, especially if "commit" itself failed due to some unknown
reason or a refusal from the pre-commit hook. Even though a user could
have been using the "empty the commit log message and the original is
kept" as a trick to recover from a botched rewording attempt and this
change will regress for such use cases, I have a feeling that it does
not matter.

Is there anything we should be saying more than "fatal: Cannot amend" to
help users when this new "die" triggers? What is the recommended recovery
procedure? Run "git commit --amend" after doing whatever is needed to fix
the tree (e.g. if pre-commit refused because of a coding style violation,
it may involve fixing the tree being committed; if it refused because of a
typo in the log message, the tree itself may be OK and nothing needs to be
done) and then "git rebase --continue"?

>  git-rebase--interactive.sh |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 804001b..669f378 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -408,7 +408,8 @@ do_next () {
>  		mark_action_done
>  		pick_one $sha1 ||
>  			die_with_patch $sha1 "Could not apply $sha1... $rest"
> -		git commit --amend --no-post-rewrite
> +		git commit --amend --no-post-rewrite ||
> +			die_with_patch $sha1 "Cannot amend commit after successfully picking $sha1... $rest"
>  		record_in_rewritten $sha1
>  		;;
>  	edit|e)

^ permalink raw reply

* Re: [PATCH] Make feed title valid utf8
From: Jakub Narebski @ 2011-11-29 19:48 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git, Jakub Narebski
In-Reply-To: <CAKD0UuxFVtCRT+hqO5vkDRanaX3Gvwf9MAFqNUwAiFA+wEwxXg@mail.gmail.com>

Subject: gitweb: Make feed title valid utf8

Jürgen Kreileder <jk@blackdown.de> writes:

> gitweb doesn't properly handle UTF8 site names when generating feed titles.
> 
> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..a2838c3 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -7482,7 +7482,7 @@ sub git_feed {
>  	return if ($cgi->request_method() eq 'HEAD');
> 
>  	# header variables
> -	my $title = "$site_name - $project/$action";
> +	my $title = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";
>  	my $feed_type = 'log';
>  	if (defined $hash) {
>  		$title .= " - '$hash'";
> -- 

Thanks.  With the minor nit of prefixing subject with subsystem
designation, i.e. with "gitweb: " -- ACK.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH] gitweb: Output valid utf8 in git_blame_common('data')
From: Jakub Narebski @ 2011-11-29 19:46 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git, Jakub Narebski
In-Reply-To: <CAKD0Uuxq+wLdRy5r_hz9qgjHkDmFHHbeAVkb07HizX9xaGMptw@mail.gmail.com>

Jürgen Kreileder <jk@blackdown.de> writes:

> With javascript-actions enabled gitweb showed broken author names in
> the tooltips on blame pages.
> 
> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..c863afe 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -6107,7 +6107,9 @@ sub git_blame_common {
>  			-type=>"text/plain", -charset => "utf-8",
>  			-status=> "200 OK");
>  		local $| = 1; # output autoflush
> -		print while <$fd>;
> +		while (my $line = <$fd>) {
> +			print to_utf8($line);
> +		}
>  		close $fd
>  			or print "ERROR $!\n";
> 
> -- 

Thanks.  ACK.


BTW. all those troubles with not forgetting to call to_utf8() make me
wonder if we wouldn't be better to forget about supporting
$fallback_encoding and just put

  use open qw(:encoding(UTF-8) :std); 

at the beginning of gitweb, c.f. http://training.perl.com/OSCON2011/index.html


Or

  use open qw(:utf8 :std); 

though then we simply discard errors.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH] gitweb: Output site name with valid utf8 in OPML
From: Jakub Narebski @ 2011-11-29 19:35 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git
In-Reply-To: <CAKD0UuyJDaHyofM5VYG-R03mgYi_1QBJvQufv927+x6YYzPU2w@mail.gmail.com>

Jürgen Kreileder <jk@blackdown.de> writes:

> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..df747c1 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -7699,11 +7699,12 @@ sub git_opml {
>  		-charset => 'utf-8',
>  		-content_disposition => 'inline; filename="opml.xml"');
> 
> +	my $title = esc_html($site_name);
>  	print <<XML;
>  <?xml version="1.0" encoding="utf-8"?>
>  <opml version="1.0">
>  <head>
> -  <title>$site_name OPML Export</title>
> +  <title>$title OPML Export</title>
>  </head>
>  <body>
>  <outline text="git RSS feeds">
> -- 

Thanks.  That is certainly correct... but it is more than just
handling utf8, isn't it.  It was also about not escaping XML
(think of site name containing eg. '<< foo >>' etc.).

So the subject / commit message should be corrected.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH] gitweb: Escape attribute in chop_and_escape_str()
From: Jakub Narebski @ 2011-11-29 19:33 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git, Jakub Narebski
In-Reply-To: <CAKD0UuyOMRFHE6DvuDj0ancfwFfg8ADKab6emv99+FN5RfZ=mQ@mail.gmail.com>

Jürgen Kreileder <jk@blackdown.de> writes:

> Fixes the title attribute in <span title="JÃ¼rgen Kreileder">Jürgen
> Kreileder</span> for example because to_utf8() is called implicitly now.
> 
> (Not sure why the attribute is there at all in the example. From my
> point of view nothing got chopped.)

Hmmm... this should not happen because of

	my $chopped = chop_str(@_);
	if ($chopped eq $str) {
		return esc_html($chopped);
	} else 
 
Perhaps it is a matter of doing to_utf8() on $str prior to comparison?

> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..fd76407 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1699,7 +1699,7 @@ sub chop_and_escape_str {
>  		return esc_html($chopped);
>  	} else {
>  		$str =~ s/[[:cntrl:]]/?/g;
> -		return $cgi->span({-title=>$str}, esc_html($chopped));
> +		return $cgi->span({-title => esc_attr($str)}, esc_html($chopped));
>  	}
>  }
> 
> -- 

esc_attr() is a wrong solution here, because $cgi->span(...) should
properly escape attributes.  You should simply use to_utf8() or
sanitize().

Well, uless we simply do

  $str = to_utf8($str);

earlier.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH] gitweb: Don't append ';js=(0|1)' to external links
From: Jakub Narebski @ 2011-11-29 19:28 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git, Jakub Narebski
In-Reply-To: <CAKD0UuzU4hAe7RGYukGyPpvfGeYJ3pgJ5pynupneMpQSaX5Cjw@mail.gmail.com>

Jürgen Kreileder <jk@blackdown.de> writes:

> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl                       |    2 +-
>  gitweb/static/js/javascript-detection.js |    4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..dfe3407 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3974,7 +3974,7 @@ sub git_footer_html {
>  		print qq!<script type="text/javascript">\n!.
>  		      qq!window.onload = function () {\n!;
>  		if (gitweb_check_feature('javascript-actions')) {
> -			print qq!	fixLinks();\n!;
> +			print qq!	fixLinks('$my_url');\n!;
>  		}
>  		if ($jstimezone && $tz_cookie && $datetime_class) {
>  			print qq!	var tz_cookie = { name: '$tz_cookie', expires: 14, path:
> '/' };\n!. # in days
> diff --git a/gitweb/static/js/javascript-detection.js
> b/gitweb/static/js/javascript-detection.js
> index fa2596f..36964ad 100644
> --- a/gitweb/static/js/javascript-detection.js
> +++ b/gitweb/static/js/javascript-detection.js
> @@ -29,11 +29,11 @@ var jsExceptionsRe = /[;?]js=[01](#.*)?$/;
>   *
>   * @globals jsExceptionsRe
>   */
> -function fixLinks() {
> +function fixLinks(baseurl) {
>  	var allLinks = document.getElementsByTagName("a") || document.links;
>  	for (var i = 0, len = allLinks.length; i < len; i++) {
>  		var link = allLinks[i];
> -		if (!jsExceptionsRe.test(link)) {
> +		if (!jsExceptionsRe.test(link) && !link.href.indexOf(baseurl)) {
>  			link.href = link.href.replace(/(#|$)/,
>  				(link.href.indexOf('?') === -1 ? '?' : ';') + 'js=1$1');
>  		}
> -- 

Thanks for this, but I think a better solution would be to explicitly
mark the few external links we have e.g. with 'class="external"', and
use that to avoid adding ';js=(0|1)' to them.

This has the advantage that we can use different style to mark
outgoing external links.

I even have such patch somewhere in the StGit stack...
-- >8 --
Subject: [PATCH] gitweb: Mark external links

...and do not add 'js=1' to them with JavaScript.

Both $logo_url and $home_link links are now marked with "external"
class, and fixLink does not add 'js=1' to them on click.  We add
'js=1' to internal link to make server-side of gitweb know that it can
use JavaScript-only actions; we shouldn't do this for extrenal links,
as 'js=1' might mean something else to them.

Note that only links using A element matter: images (linked using
IMG), stylesheets (linked using STYLE) and JavaScript files (linked
using SCRIPT) were never affected.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl                       |    5 ++++-
 gitweb/static/js/javascript-detection.js |    5 +++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7456a4b..f1c1caa 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3626,13 +3626,16 @@ EOF
 	print "<div class=\"page_header\">\n";
 	if (defined $logo) {
 		print $cgi->a({-href => esc_url($logo_url),
+		               -class => "external",
 		               -title => $logo_label},
 		              $cgi->img({-src => esc_url($logo),
 		                         -width => 72, -height => 27,
 		                         -alt => "git",
 		                         -class => "logo"}));
 	}
-	print $cgi->a({-href => esc_url($home_link)}, $home_link_str) . " / ";
+	print $cgi->a({-href => esc_url($home_link)
+	               -class => "external"},
+	              $home_link_str) . " / ";
 	if (defined $project) {
 		print $cgi->a({-href => href(action=>"summary")}, esc_html($project));
 		if (defined $action) {
diff --git a/gitweb/static/js/javascript-detection.js b/gitweb/static/js/javascript-detection.js
index 2b51e55..fc59e42 100644
--- a/gitweb/static/js/javascript-detection.js
+++ b/gitweb/static/js/javascript-detection.js
@@ -60,6 +60,11 @@ function fixLink(link) {
 	 */
 	var jsExceptionsRe = /[;?]js=[01]$/;
 
+	// don't change links marked as external ($logo_url, $home_link)
+	if (link.className === 'external') {
+		return;
+	}
+
 	if (!jsExceptionsRe.test(link)) { // =~ /[;?]js=[01]$/;
 		link.href +=
 			(link.href.indexOf('?') === -1 ? '?' : ';') + 'js=1';

^ permalink raw reply related

* Re: Git Submodule Problem - Bug?
From: Junio C Hamano @ 2011-11-29 19:21 UTC (permalink / raw)
  To: Manuel Koller
  Cc: Jens Lehmann, Fredrik Gustafsson, Thomas Rast, git, Heiko Voigt
In-Reply-To: <839E634E-76C0-40ED-8CCC-43E52F782079@gmail.com>

Manuel Koller <koller.manuel@gmail.com> writes:

> In have wrote a workaround for the problem i posted that goes into this
> direction. I just check whether the url has changed and remove the
> submodule by hand if it did.

... which might be wrong, depending on how you implement it, so be careful
(see below).

> ...
> What is the purpose of having two different submodules in the same path?
> Identifying the submodule by url however would probably make things
> considerably faster...

There are three primary things involved when talking about a submodule in
a superproject. Its name, its path in the superproject hierarchy, and its
URL (if it is foreign). The canonical example used when discussing the
design of Git submodule support is to imagine your superproject that is
about a small router that has a kernel component as a submodule that lives
in kernel/ hierarchy (your userland may live elsewhere in the tree, either
as a submodule bound at src/ hierarchy or contained in the superproject
itself and it does not make a difference).

You may had two major versions of your product. v1.0 line used to use BSD
kernels, and v2.0 line with linux-2.6 kernels. Even though a checkout of
any version of your superproject has a submodule bound at its kernel/
directory, the logical identity of the submodule used in your v1.0 line of
product and v2.0 line are different. You can differenciate them by giving
them different names, say 'kernel' and 'linux-kernel' [*1*]. When you
checkout the v1.0 tag, kernel/ directory should be populated by the
submodule that house the BSD kernel. When you checkout the v2.0 tag, it
should house the Linux 2.6 kernel.

Now, your v3.0 line will still use linux-2.6 (and now linux-3.0)
kernel. Logically, the kernel project is the same thing. linux-3.0 is just
a continuation of linux-2.6 series and maintained in the *logically* same
upstream repository. However, the repository location has recently changed
and now lives in

    http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

even though it used to be linux-2.6.git in the same hierarchy [*2*]. The
name of the submodule ('linux-kernel') does not have to change, as the
repository in the new location has all the history for linux-2.6 kernel as
well. The URL mapping recorded in .gitmodules in-tree from the submodule
name to the location however needs to change [*3*]. Ideally (I do not
think the current "git submodule" implementation does this), there should
be a more automated way to optionally allow switching the upstream URL of
the 'linux-kernel' subproject bound at kernel/ directory when you switch
between commits in the v2.0 series and in the v3.0 series of the
superproject [*3*], so that the user can say "Yes, I know linux.git URL
contains everything needed from linux-2.6.git URL, so just update the
submodule's upstream URL to linux.git one, and keep it like so; there is
no need to use the older URL even when I check out v2.0 from the
superproject" or "Notice when the URL recorded in .gitmodules for this
project changes when I check out a different version of the superproject,
and use the old URL to update the submodule when I check out the old
version of the superproject".

Even though it is an ancient design discussion, this thread is worth a
read before discussing anything about submodules:

  http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47498

not because the ideas that were discussed but not implemented in the
current "git submodule" are all good things that should be added, but
because the discussion shows what kind of considerations need to go into
the design.

[Footnotes]

*1* In the ideal world, you would have called it 'bsd' back in your v1.0
days with a foresight that someday you would switch to 'linux', but it is
likely you would not have been perfect.

*2* You can still access it as linux-2.6.git because k.org is trying to be
nice to its users, but that is besides the point.

*3* It becomes relevant if you imagine a case where the old linux-2.6.git
repository at k.org were left frozen at the last version of the 2.6 series
and the new history is only available in linux.git repository.

^ permalink raw reply

* support gnupg-2.x in git.
From: Paweł Sikora @ 2011-11-29 18:37 UTC (permalink / raw)
  To: git

Hi,

i'm using a gnupg-2.0.18 and currently i'm not able to use git tag/verify
due to hadcoded "gpg" literals in builtin/{tag,verifiy-tag}.c.
could you please improve git to use global config key with specified path
to gpg{,2} executable?

BR,
Paweł.

please CC me on reply.

^ permalink raw reply

* log: option "--follow" not the default for a single file?
From: Ralf Thielow @ 2011-11-29 18:25 UTC (permalink / raw)
  To: git

Why is the option "--follow" not the default if the log-command
is used with a single file? Many GUI tools don't show me the
full history of a single file if there was a rename in it.

Thanks

^ permalink raw reply

* Re: [PATCH 12/13] credentials: add "store" helper
From: Junio C Hamano @ 2011-11-29 18:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111124110756.GJ8417@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> +static void parse_credential_file(const char *fn,
> +				  struct credential *c,
> +				  void (*match_cb)(struct credential *),
> +				  void (*other_cb)(struct strbuf *))
> +{
> +	FILE *fh;
> +	struct strbuf line = STRBUF_INIT;
> +	struct credential entry = CREDENTIAL_INIT;
> +
> +	fh = fopen(fn, "r");
> +	if (!fh) {
> +		if (errno != ENOENT)
> +			die_errno("unable to open %s", fn);
> +		return;
> +	}
> +
> +	while (strbuf_getline(&line, fh, '\n') != EOF) {
> +		credential_from_url(&entry, line.buf);
> +		if (entry.username && entry.password &&
> +		    credential_match(c, &entry)) {

This looks curious; isn't checking .username and .password part of the
responsibility of credential_match()? And even if entry lacks password
(which won't happen in the context of this program, given the
implementation of store_credential() below) shouldn't it still be
considered a match?

> +			if (match_cb) {
> +				match_cb(&entry);
> +				break;

Multiple matches not allowed, which sounds fine.

> +			}
> +		}
> +		else if (other_cb)
> +			other_cb(&line);
> +	}
> +
> +	credential_clear(&entry);
> +	strbuf_release(&line);
> +	fclose(fh);
> +}
> +
> +static void print_entry(struct credential *c)
> +{
> +	printf("username=%s\n", c->username);
> +	printf("password=%s\n", c->password);
> +}
> +
> +static void print_line(struct strbuf *buf)
> +{
> +	strbuf_addch(buf, '\n');
> +	write_or_die(credential_lock.fd, buf->buf, buf->len);
> +}
> +
> +static void rewrite_credential_file(const char *fn, struct credential *c,
> +				    struct strbuf *extra)
> +{
> +	umask(077);

Curious placement of umask(). I would expect a function that has its own
call to umask() restore it before it returns, and a stand-alone program
whose sole purpose is to work with a private file, setting a tight umask
upfront at the beginning of main() may be easier to understand.

> +	if (hold_lock_file_for_update(&credential_lock, fn, 0) < 0)
> +		die_errno("unable to get credential storage lock");
> +	parse_credential_file(fn, c, NULL, print_line);
> +	if (extra)
> +		print_line(extra);

An entry for a newly updated password comes at the end of the file,
instead of replacing an entry already in the file in-place? Given that
parse_credential_file() when processing a look-up request (which is the
majority of the case) stops upon finding a match, it might make more sense
to have the new one (which may be expected to be used often) at the
beginning instead, no?

> +	if (commit_lock_file(&credential_lock) < 0)
> +		die_errno("unable to commit credential store");
> +}
> +
> +static void store_credential(const char *fn, struct credential *c)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	if (!c->protocol || !(c->host || c->path) ||
> +	    !c->username || !c->password)
> +		return;
> +
> +	strbuf_addf(&buf, "%s://", c->protocol);
> +	strbuf_addstr_urlencode(&buf, c->username, 1);
> +	strbuf_addch(&buf, ':');
> +	strbuf_addstr_urlencode(&buf, c->password, 1);
> +	strbuf_addch(&buf, '@');
> +	if (c->host)
> +		strbuf_addstr_urlencode(&buf, c->host, 1);
> +	if (c->path) {
> +		strbuf_addch(&buf, '/');
> +		strbuf_addstr_urlencode(&buf, c->path, 0);
> +	}
> +
> +	rewrite_credential_file(fn, c, &buf);
> +	strbuf_release(&buf);
> +}
> +
> +static void remove_credential(const char *fn, struct credential *c)
> +{
> +	if (!c->protocol || !(c->host || c->path))
> +		return;

The choice of the fields looks rather arbitrary. I cannot say "remove all
the credentials whose username is 'gitster' at 'github.com' no matter what
protocol is used", but I can say "remove all credentials under any name
for any host as long as the transfer goes over 'https' and accesses a
repository at 'if/xyzzy' path", it seems.

This filtering matches what lookup_credential() does but shouldn't it be
implemented at a single place in any case?

> +	rewrite_credential_file(fn, c, NULL);
> +}
> +
> +static int lookup_credential(const char *fn, struct credential *c)
> +{
> +	if (!c->protocol || !(c->host || c->path))
> +		return 0;
> +	parse_credential_file(fn, c, print_entry, NULL);
> +	return c->username && c->password;
> +}

^ permalink raw reply

* Re: [PATCH 11/13] strbuf: add strbuf_add*_urlencode
From: Junio C Hamano @ 2011-11-29 18:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111124110728.GI8417@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> This just follows the rfc3986 rules for percent-encoding
> url data into a strbuf.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  strbuf.c |   37 +++++++++++++++++++++++++++++++++++++
>  strbuf.h |    5 +++++
>  2 files changed, 42 insertions(+), 0 deletions(-)
>
> diff --git a/strbuf.c b/strbuf.c
> index 3ad2cc0..60e5e59 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -397,3 +397,40 @@ int strbuf_read_file(struct strbuf *sb, const char *path, size_t hint)
>  
>  	return len;
>  }
> +
> +static int is_rfc3986_reserved(char ch)
> +{
> +	switch (ch) {
> +	case '!': case '*': case '\'': case '(': case ')': case ';':
> +	case ':': case '@': case '&': case '=': case '+': case '$':
> +	case ',': case '/': case '?': case '#': case '[': case ']':
> +		return 1;
> +	}
> +	return 0;
> +}

Part of me wonders if we still have extra bits in sane_ctype[] array but
that one is cumbersome to update, and the above should be easier to read
and maintain.

> +void strbuf_add_urlencode(struct strbuf *sb, const char *s, size_t len,
> +			  int reserved)

Does "reserved" parameter mean "must-encode-reserved", or
"may-encode-reserved" (the latter would be more like "if set to 0,
per-cent encoding the result would be an error")?

^ permalink raw reply

* Re: [PATCH 1/2] checkout,merge: loosen overwriting untracked file check based on info/exclude
From: Junio C Hamano @ 2011-11-29 18:17 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Johannes Sixt, Taylor Hedberg, Bertrand BENOIT
In-Reply-To: <1322388933-6284-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> Back in 1127148 (Loosen "working file will be lost" check in
> Porcelain-ish - 2006-12-04), git-checkout.sh learned to quietly
> overwrite ignored files. Howver the code only took .gitignore files
> into account.

v1.5.3.5-721-g039bc64 also tried to make it harder to make this kind of
mistake, and forgot to spot it in git-checkout.sh which was a bit
unfortunate.

Thanks. I think both patches make sense, and especially 2/2 opens the door
to possibly make ignored ones automatically precious.

^ permalink raw reply

* Re: Git Submodule Problem - Bug?
From: Manuel Koller @ 2011-11-29 18:15 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Fredrik Gustafsson, Thomas Rast, git, Heiko Voigt
In-Reply-To: <4ED5196B.5030200@web.de>

Thanks for taking my problem seriously.

On 29.11.2011, at 18:42, Jens Lehmann wrote:

> Am 29.11.2011 11:41, schrieb Fredrik Gustafsson:
>> On Tue, Nov 29, 2011 at 11:25:41AM +0100, Thomas Rast wrote:
>>> So maybe the right questions to ask would be: what's the *official*
>>> way of removing a submodule completely?  Do we support overwriting
>>> submodules in the way Manuel wanted to?  Why not? :-)
>> 
>> I suggest that we add a command for that;
>> git submodule remove <submodule>
> 
> Hmm, to me it looks like the problem is in "git submodule add". It
> doesn't check if the submodule repo it finds in .git/modules matches
> the one the user wants to create. So we end up reviving the first
> submodule although the user wants to use a completely different repo.

In have wrote a workaround for the problem i posted that goes into this direction. I just check whether the url has changed and remove the submodule by hand if it did. See https://github.com/kollerma/git-submodule-tools if you're interested (its in git-fix-submodules).

> One solution could be to only let "git submodule update" revive
> submodules from .git/modules and make "git submodule add" error out
> if it finds the git directory of a submodule with the same name in
> .git/modules. But currently there is no way to tell "git submodule add"
> to use a different submodule name (it always uses the path as a name),
> so we might have to add an option to do that and tell the user in the
> error message how he can add a different submodule under the same
> path.
> 
> Another solution could be that "git submodule add" detects that a
> submodule with the name "sub" did exist and chooses a different name
> (say "sub2") for the the new one. Then the user wouldn't have to
> cope with the problem himself.

What is the purpose of having two different submodules in the same path? Identifying the submodule by url however would probably make things considerably faster. At least in my case, since I have checked out different branches of the same repo at different paths as submodules. 

^ permalink raw reply

* Re: Git Submodule Problem - Bug?
From: Jens Lehmann @ 2011-11-29 17:42 UTC (permalink / raw)
  To: Manuel Koller; +Cc: Fredrik Gustafsson, Thomas Rast, git, Heiko Voigt
In-Reply-To: <20111129104105.GA10839@kolya>

Am 29.11.2011 11:41, schrieb Fredrik Gustafsson:
> On Tue, Nov 29, 2011 at 11:25:41AM +0100, Thomas Rast wrote:
>> So maybe the right questions to ask would be: what's the *official*
>> way of removing a submodule completely?  Do we support overwriting
>> submodules in the way Manuel wanted to?  Why not? :-)
> 
> I suggest that we add a command for that;
> git submodule remove <submodule>

Hmm, to me it looks like the problem is in "git submodule add". It
doesn't check if the submodule repo it finds in .git/modules matches
the one the user wants to create. So we end up reviving the first
submodule although the user wants to use a completely different repo.

One solution could be to only let "git submodule update" revive
submodules from .git/modules and make "git submodule add" error out
if it finds the git directory of a submodule with the same name in
.git/modules. But currently there is no way to tell "git submodule add"
to use a different submodule name (it always uses the path as a name),
so we might have to add an option to do that and tell the user in the
error message how he can add a different submodule under the same
path.

Another solution could be that "git submodule add" detects that a
submodule with the name "sub" did exist and chooses a different name
(say "sub2") for the the new one. Then the user wouldn't have to
cope with the problem himself.

^ permalink raw reply

* Re: [PATCH 03/13] introduce credentials API
From: Junio C Hamano @ 2011-11-29 17:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111129050425.GA32022@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Mon, Nov 28, 2011 at 01:46:35PM -0800, Junio C Hamano wrote:
>
>> > diff --git a/Documentation/technical/api-credentials.txt b/Documentation/technical/api-credentials.txt
>> > new file mode 100644
>> > index 0000000..3061077
>> > --- /dev/null
>> > +++ b/Documentation/technical/api-credentials.txt
>> > @@ -0,0 +1,148 @@
>> > + ...
>> > +`credential_fill`::
>> > +
>> > +	Attempt to fill the username and password fields of the passed
>> > +	credential struct, first consulting storage helpers, then asking
>> > +	the user. Guarantees that the username and password fields will
>> > +	be filled afterwards (or die() will be called).
>> > +
>> > +`credential_reject`::
>> > +
>> > +	Inform the credential subsystem that the provided credentials
>> > +	have been rejected. This will notify any storage helpers of the
>> > +	rejection (which allows them to, for example, purge the invalid
>> > +	credentials from storage), and then clear the username and
>> > +	password fields in `struct credential`. It can then be
>> > +	`credential_fill`-ed again.
>> > +
>> > +`credential_approve`::
>> > +
>> > +	Inform the credential subsystem that the provided credentials
>> > +	were successfully used for authentication. This will notify any
>> > +	storage helpers of the approval, so that they can store the
>> > +	result to be used again.
>> 
>> It's a bit hard to read and understand which part of the system calls
>> these and which other part of the system is responsible for implementing
>> them, and how "helper" fits into the picture (perhaps calling some of
>> these interfaces will result in "helper" getting called?).
>
> How about following it with a rough example of how they would be used,
> like:
>
>   /*
>    * Create a credential with some context; we don't yet know the
>    * username or password.
>    */
>   struct credential c = CREDENTIAL_INIT;
>   c.protocol = xstrdup("https");
>   c.host = xstrdup("example.com");
>   c.path = xstrdup("foo.git");
>
>   /*
>    * Fill in the username and password fields by contacting helpers
>    * and/or asking the user. The function will die if it fails.
>    */
>   credential_fill(&c);
>
>   /*
>    * And then finally make our request, reporting back to the credential
>    * system on whether it succeeded or failed.
>    */
>   if (make_http_request(c.host, c.path, c.username, c.password) < 0)
>           credential_reject(&c);
>   else
>           credential_accept(&c);
>
> Does that make it more clear?

Immensely, at least to me. From the perspective of a user of the API, a
call to credential_fill() is to "fill in the credential" in the sense that
"call the function to fill in the credential" but I find it easier to
understand if it were explained to me as "ask the API to fill in the
credential, which may involve helpers to interact with the user--the point
of the API is that the caller does not care how it is done".  Same for the
reject/accept calls---the example makes it clear that they are to tell the
decision to reject/accept made by the application to the credential API,
and it is up to the API layer what it does using that decision (like
removing the cached and now stale password).

The above example is a bit too simplistic and misleading, though. You
would call reject only on authentication failure (do not trash stored and
good password upon network being unreachable temporarily or the server
being overloaded).

> So one possible rule would be:
>
>   1. If it starts with "!", clip off the "!" and hand it to the shell.
>
>   2. Otherwise, if is_absolute_path(), hand it to the shell directly.
>
>   3. Otherwise, prepend "git credential-" and hand it to the shell.
>
> I think that is slightly less confusing than the "first word is alnum"
> thing.

Simpler and easier to explain. Good ;-)

> How do you feel about the "values cannot contain a newline" requirement?

In the context of asking username, password, or passphrase, I think "LF is
the end of the line and you cannot have that byte in your response" is
perfectly reasonable. I've yet to find a way to use LF in a passphrase to
unlock my Gnome keychain ;-).

>> > +int credential_read(struct credential *c, FILE *fp)
>> > +{
>> > ...
>> > +			c->host = xstrdup(value);
>> > +		}
>> > +		else if (!strcmp(key, "path")) {
>> > ...
>> > +		/* ignore other lines; we don't know what they mean, but
>> > +		 * this future-proofs us when later versions of git do
>> > +		 * learn new lines, and the helpers are updated to match */
>> 
>> Two style nits.
>
> I'm supposed to guess? ;P

Sorry, but you guessed right.

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Shawn Pearce @ 2011-11-29 17:08 UTC (permalink / raw)
  To: Bill Zaumen; +Cc: git, gitster, pclouds, peff, torvalds
In-Reply-To: <1322546563.1719.22.camel@yos>

On Mon, Nov 28, 2011 at 22:02, Bill Zaumen <bill.zaumen+git@gmail.com> wrote:
> Several years ago (in 2006) there was a discussion about the whether
> SHA-1 is adequate given the very small but non-zero probability of a
> hash collision, particularly given the possibility of a malicious
> attempt to generate a collision. At roughly the same time, Git was
> modified to support "thin packs" during data transfers. These allow
> one to send deltas based on objects that are not in the pack file
> being transferred. As a result a previously undetected hash collision
> could result in a corrupted repository when (for example) the same
> delta is applied to different objects that have the same SHA-1 hash.

I don't think you understand how these thin packs are processed.

If the pack contains <100 objects, it is inflated to loose objects. If
the receiving side (so client in fetch, server in push) already has an
object by that SHA-1, the new object is discarded. If the pack
contains >=100 objects, and the receiving side already has the object,
it is compared byte-for-byte to ensure the incoming copy exactly
matches the already existing copy.

Either way the first object to arrive always wins.

The recipient has to trust that the remote side is providing it
something reasonable. If the recipient has *ZERO* trust in the sender,
then s/he should read the content of all newly arrived objects before
compiling or executing them. This is one reason why Git does not run
hooks that are transported as part of the repository. If the recipient
thinks reading the content is too onerous or impossible, then they
have to make a trust assertion on the remote side.

This trust assertion should be derived from the community, and not so
much around the machine claiming the content is what it says it is. We
have yet to disprove the halting problem, so we have yet to construct
a machine that can verify those Linux kernel sources you downloaded
don't contain a local root exploit (for example). Instead we have to
trust the community of developers and users who work on and run that
code to have confidence that the code works as expected, etc. We base
our trust off reputable people making statements like "Linus kernel at
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git is
pretty good", and "kernel.org is where Linus pushes directly to, so
its reasonable to trust the kernel.org server".

The recipient should have some understanding of the remote server's
security policies, or pay attention to notices posted by others who
are also fetching and reviewing content from the same repository. At
some level, the community using a repository from a given site should
be policing that site and establishing trust that the host is not
providing garbage content. After an incident, it is possible to pick
up again by rebuilding the environment from an already known
repository that people trust. In the case of the recent kernel.org
environment rebuild, that is exactly what they did, the community
picked up again from Linus' personal repository.

DNS could be abused to send you the wrong IP for a site, but most
people don't use random DNS servers, they have some level of trust in
their DNS provider. DNSSEC is helping to improve the security of the
name->IP translation process, and using protocols like HTTPS with SSL
certificate verification can help to reduce the chances that a forged
DNS entry sends you to an evil source, rather than the community
trusted one. (Although SSL certificates seem to be getting forged left
and right these days, so again you can't really rely on strong
cryptography to magically solve security problems when the attacker
holds the private key you have decided to trust with no further
verification.)

But trust aside, consider an object C is sent as a delta to the remote
side. The delta base B is not included in the pack, and is referenced
by SHA-1. When the remote side processed delta C, it looks up a copy
of base B from its own repository. We assume this content of B is
correct, due to the "first to arrive wins" rule, and the community
review/trust/notification process.

The inflated length of B is checked against a size that is stored in
the front of the delta instruction stream that describes C. These
lengths must match exactly, if they do not match then the delta
application aborts, the pack is rejected, and any temporary data is
removed from the filesystem. As Peff pointed out elsewhere in this
thread, the odds of a SHA-1 collision in a project are low, on the
order of 1/(2^80). Although there are some attacks on message digest
functions like MD-5 or SHA-1 that might be possible to generate a
duplicate in 2^57 time, any that I have read require producing a
different length content than the original you are trying to replace.
Assuming the copy of B on the remote system actually inflates and
computes to the correct SHA-1 B, it probably does not also have the
correct length if it is an object with correct hash but wrong content.
So delta application should still be checking for collision with a
1/(2^80) probability.

Assuming the remote's copy of B passed the size check, the delta is
applied on top, and the SHA-1 of the result buffer is computed. The
attacker must craft the delta such that the SHA-1 of C is the result,
otherwise connectivity checks will fail.

Assuming the attacker successfully stores a C' that has the right
SHA-1, but wrong content... the community around that repository will
eventually notice this and message that the source site cannot be
trusted. I refer you back to the statement above about trusting the
site you pull from, or trusting the users that you authorize to push
into a repository.

But thin pack aside, this problem exists in any form of a packed
object. An attacker can try to send object P' (as a non-delta) in
place of P. SHA1(P') = SHA1(P), but the content differs. This is far
easier to construct than the thin pack delta case you think is a
problem, and is the most likely approach for an attacker to take. I
refer you back to the statement above about trusting the site you pull
from, or trusting the users that you authorize to push into a
repository, or reading every object you receive.

Even if you magically fix the hash function somehow to decrease the
odds of collision (e.g. by switching to a member of the SHA-2 or SHA-3
family), there is no way to prevent a bug or root exploit from
entering the project except by never adding new code, or by carefully
reviewing everything that is submitted, and building up a basis of
trust around that review method. It is far more likely for an attacker
to try and submit a plain text patch to the Linux kernel mailing list
that reads completely correct, hashes to the correct SHA-1s when
applied in Git, etc... but just "happens" to contain an off by one
pointer bug in some weird case that allows the attacker to overflow a
critical memory buffer and later inject some code that can later be
used to create an exploit. If they are ever "caught" they may just
claim "I AM A MORON I AM SORRY I MISSED THAT BUFFER CHECK AND SO DID
YOU DURING CODE REVIEW SO ITS NOT ALL MY FAULT LEAVE ME ALONE" and get
away with it.

Trust. Review. Verify.

I don't know about you, but I don't just pull random code from
arbitrary sites on the Internet. Nor do I compile or execute that code
on my workstation. I do trust some individuals based on their
reputation on the Internet, or my past experiences working with their
code. And I also trust some hosting environments like kernel.org, or
GitHub, or code.google.com, to provide reasonably secure hosting, and
to aggressively react to any event that might make it harder for me to
trust the content they supply. And I also read a lot of code that I
pull.

It really isn't the problem you try to claim it is.

^ permalink raw reply

* Re: subversion-perl missing
From: Jason @ 2011-11-29 15:39 UTC (permalink / raw)
  To: git
In-Reply-To: <4E83433E.9000702@gjlay.de>

Hi Johann,
Were you able to get this working?  I'm also getting the "Bad URL passed to RA
layer: Unrecognized URL scheme" error when trying to run "git svn clone".  

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Shawn Pearce @ 2011-11-29 15:23 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, Bill Zaumen, git, gitster, torvalds
In-Reply-To: <CACsJy8BRhoYM+Lb8afp=3rKzYNOEq0JY8uS9mD1ovz3uyJVWOA@mail.gmail.com>

On Tue, Nov 29, 2011 at 05:17, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Tue, Nov 29, 2011 at 4:07 PM, Jeff King <peff@peff.net> wrote:
>> However, it's harder for trees. Each entry needs to have the new digest
>> added, but there simply isn't room in the format. So we would have to
>> change the tree format, breaking interoperability with older versions of
>> git. And all of your tree sha1's would change as soon as you wrote them
>> with the new format. That's only slightly better than just swapping
>> sha1 out for a better algorithm.
>
> I think we could hide the new digest after NUL at the end of path
> name. C Git at least does not seem to care whatever after NUL.

No, you can't. The next byte after the NUL at the end of a path name
is the SHA-1 of that entry. After those 20 bytes of SHA-1 data is
consumed, the "mode SP name\0" of the next entry is parsed.

There is not room in the tree format for any additional data. Period.
At best you can modify the mode value to be a new octal code that is
not recognized by current Git implementations. But that doesn't get
you a whole lot of data, and its pretty risky change because its
rather incompatible.

^ permalink raw reply

* Re: [PATCH] grep: enable multi-threading for -p and -W
From: Thomas Rast @ 2011-11-29 14:07 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Junio C Hamano
In-Reply-To: <4ED4E2D7.9090804@lsrfire.ath.cx>

René Scharfe wrote:
> > * none of the patches:
> >   git grep --cached INITRAMFS_ROOT_UID
> >     2.13user 0.14system 0:02.10elapsed
> >   git grep --cached -W INITRAMFS_ROOT_UID    # note: broken!
> >     2.23user 0.18system 0:02.21elapsed 
> 
> Broken is a tad too hard a word

Sorry, I just wanted to mark it as: this is unattainable since we're
now doing extra work.

> > * my patch, but not yours:
> >   git grep --cached -W INITRAMFS_ROOT_UID
> >     3.01user 0.05system 0:03.07elapsed
> > 
> > * my patch + your patch:
> >   git grep --cached -W INITRAMFS_ROOT_UID
> >     4.45user 0.22system 0:02.67elapsed
> > 
> > So while it ends up being faster overall, it also uses way more CPU
> > and would presumably be *slower* on a single-processor system.
> > Apparently those attribute lookups really hurt.
> 
> Hmm, why are they *that* expensive?
> 
> callgrind tells me that userdiff_find_by_path() contributes only 0.18%
> to the total cost with your first patch.  Timings in my virtual machine
> are very volatile, but it seems that here the difference is in the
> system time while user is basically the same for all combinations of
> patches.

Not sure, perhaps callgrind does not model syscalls as expensive
enough.  I also suspect (from my very cursory look at the attributes
machinery) that loading attributes for all paths *in random order*
(i.e., threaded) causes a lot of discards of the existing attributes
data.  (The order is still random with my new patch, but we only load
them for files that have matches.)

> I wonder what caused the slight speedup for the case without -W.  It's
> probably just noise, though.

Yeah, it's very noisy, but I was too lazy for best-of-50 or something ;-)

[...]
> > +#ifndef NO_PTHREADS
> > +static inline void grep_attr_lock(struct grep_opt *opt)
[...]
> We'd need stubs here for the case of NO_PTHREADS being defined.

Right, thanks.  Sorry for not testing with NO_PTHREADS to begin with.

> Perhaps leave the thread stuff in builtin/grep.c and export a function
> from there that does [the userdiff lookup], with locking and all?

That seems like a layering violation to me.  Isn't builtin/grep.c
supposed to call out to grep.c, but not the other way around?

Maybe it would be less messy if we had a global (across all of git)
flag that says whether threads are on.  Then subsystems that are used
from threaded code, but cannot handle it, can learn to lock themselves
around their work.  But it would be pretty much the opposite of what
git-grep now does.

Thanks for the review.  I'll reroll as a proper patch later.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Nguyen Thai Ngoc Duy @ 2011-11-29 14:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Bill Zaumen, git, gitster, spearce, torvalds
In-Reply-To: <20111129090733.GA22046@sigill.intra.peff.net>

On Tue, Nov 29, 2011 at 4:07 PM, Jeff King <peff@peff.net> wrote:
> That brings me to my second concern: how does this alternative message
> digest have any authority?
>
> For example, your patch teaches the git protocol a new extension to pass
> these digests along with the object sha1s. But how do we know the server
> isn't feeding us a bad digest along with the bad object?
>
> The "usual" security model discussed in git is that of verifying a
> signed tag. Linus signs a tag and pushes it to a server. I fetch the
> tag, and can verify the signature on the tag. I want to know that I have
> the exact same objects that Linus had. But I can't assume the server is
> trustworthy; it may have fed me a bad object with a collided sha1.
>
> But what's in the signed part of the tag object? Only the sha1 of the
> commit the tag points to, but not the new digest. So an attacker can
> replace the commit with one that collides, and it can in turn point to
> arbitrary trees and blobs.
>
> You can fix this by including an extra header in the signed part of the
> tag that says "also, the digest of the commit I point to is X". Then you
> know you have the same commit that Linus had. But the commit points to a
> tree by its sha1. So you have to add a similar header in the commit
> object that says "also, the digest of the tree I point to is X". And
> ditto for all of the parent pointers, if you want to care about signing
> history. And then you have the same problem in the tree: each sub-tree
> and blob is referenced by its sha1.
>
> ..

(Security ignorant speaking)

Can we just hash all objects in a pack from bottom up, (replacing
sha-1 in trees/commits/tags with the new digest in memory before
hashing), then attach the new top digest to tag's content? The sender
is required by the receiver to send new digests for all objects in the
pack together with the pack. The receiver can then go through the same
process to produce the top digest and match it with one saved in tag.

I do agree that we should use stronger digests, not weaker, preferably
a combination of them.
-- 
Duy

^ permalink raw reply

* Re: [PATCH] grep: enable multi-threading for -p and -W
From: René Scharfe @ 2011-11-29 13:49 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Junio C Hamano
In-Reply-To: <201111291054.39477.trast@student.ethz.ch>

Am 29.11.2011 10:54, schrieb Thomas Rast:
> René Scharfe wrote:
>> Move attribute reading, which is not thread-safe, into add_work(), under
>> grep_mutex.  That way we can stop turning off multi-threading if -p or -W
>> is given, as the diff attribute for each file is gotten safely now.
>>
>> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
>> ---
>> Goes on top of your patch.  Needs testing..
> 
> On a random old linux-2.6 clone at v2.6.37-rc2, I'm seeing (best of 5):
> 
> * none of the patches:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.13user 0.14system 0:02.10elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID    # note: broken!
>     2.23user 0.18system 0:02.21elapsed 

Broken is a tad too hard a word; IIUC -W just lacks support for function
patterns of different file types, which is unintended and surprising of
course.  For the default case it works correctly (and threaded).

> * my patch, but not yours:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.21user 0.21system 0:02.27elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID
>     3.01user 0.05system 0:03.07elapsed
> 
> * my patch + your patch:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.22user 0.17system 0:02.22elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID
>     4.45user 0.22system 0:02.67elapsed
> 
> So while it ends up being faster overall, it also uses way more CPU
> and would presumably be *slower* on a single-processor system.
> Apparently those attribute lookups really hurt.

Hmm, why are they *that* expensive?

callgrind tells me that userdiff_find_by_path() contributes only 0.18%
to the total cost with your first patch.  Timings in my virtual machine
are very volatile, but it seems that here the difference is in the
system time while user is basically the same for all combinations of
patches.

> So I had the following idea: if we load attributes only very lazily
> (that is, when match_funcname() is first called), we can ask for them
> much more rarely.  The revised timings:
> 
> * my patch + the new patch at the end:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.15user 0.16system 0:02.15elapsed 107%CPU
>   git grep --cached -W INITRAMFS_ROOT_UID
>     2.20user 0.18system 0:02.24elapsed

Nice.

I wonder what caused the slight speedup for the case without -W.  It's
probably just noise, though.

> However, locking on a specific lock relies on the fact that the
> attributes are only read from the tree, but never from the object
> store.  Perhaps it would be more futureproof to lock on
> read_sha1_mutex instead.  Either way the lazy attributes lookup seems
> a big win.

You could lock read_sha1_mutex after grep_attr_mutex.  With lazy loading
it should have a low impact most of the time.

> ------ 8< ------ 8< ------
> Subject: [PATCH] grep: fine-grained locking around attributes access
> 
> Lazily load the userdiff attributes in match_funcname().  Use a
> separate mutex around this loading to protect the (not thread-safe)
> attributes machinery.  This lets us re-enable threading with -p and
> -W while reducing the overhead caused by looking up attributes.
> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 988ea1d..822b32f 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt)
>  
>  	pthread_mutex_init(&grep_mutex, NULL);
>  	pthread_mutex_init(&read_sha1_mutex, NULL);
> +	pthread_mutex_init(&grep_attr_mutex, NULL);
>  	pthread_cond_init(&cond_add, NULL);
>  	pthread_cond_init(&cond_write, NULL);
>  	pthread_cond_init(&cond_result, NULL);
> @@ -303,6 +304,7 @@ static int wait_all(void)
>  
>  	pthread_mutex_destroy(&grep_mutex);
>  	pthread_mutex_destroy(&read_sha1_mutex);
> +	pthread_mutex_destroy(&grep_attr_mutex);
>  	pthread_cond_destroy(&cond_add);
>  	pthread_cond_destroy(&cond_write);
>  	pthread_cond_destroy(&cond_result);
> @@ -1002,9 +1004,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  		opt.regflags |= REG_ICASE;
>  
>  #ifndef NO_PTHREADS
> -	if (online_cpus() == 1 || !grep_threads_ok(&opt))
> +	if (online_cpus() == 1)
>  		use_threads = 0;
> +#endif
> +
> +	opt.use_threads = use_threads;
>  
> +#ifndef NO_PTHREADS
>  	if (use_threads) {
>  		if (opt.pre_context || opt.post_context || opt.file_break ||
>  		    opt.funcbody)
> diff --git a/grep.c b/grep.c
> index 7a070e9..e9c3df3 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -2,6 +2,7 @@
>  #include "grep.h"
>  #include "userdiff.h"
>  #include "xdiff-interface.h"
> +#include "thread-utils.h"
>  
>  void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
>  {
> @@ -806,10 +807,38 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
>  	opt->output(opt, "\n", 1);
>  }
>  
> -static int match_funcname(struct grep_opt *opt, char *bol, char *eol)
> +#ifndef NO_PTHREADS
> +pthread_mutex_t grep_attr_mutex;
> +
> +static inline void grep_attr_lock(struct grep_opt *opt)
> +{
> +	if (opt->use_threads)
> +		pthread_mutex_lock(&grep_attr_mutex);
> +}
> +
> +static inline void grep_attr_unlock(struct grep_opt *opt)
> +{
> +	if (opt->use_threads)
> +		pthread_mutex_unlock(&grep_attr_mutex);
> +}
> +#endif

We'd need stubs here for the case of NO_PTHREADS being defined.

> +
> +static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol)
>  {
>  	xdemitconf_t *xecfg = opt->priv;
> -	if (xecfg && xecfg->find_func) {
> +	if (xecfg && !xecfg->find_func) {
> +		grep_attr_lock(opt);
> +		struct userdiff_driver *drv = userdiff_find_by_path(name);
> +		grep_attr_unlock(opt);
> +		if (drv && drv->funcname.pattern) {
> +			const struct userdiff_funcname *pe = &drv->funcname;
> +			xdiff_set_find_func(xecfg, pe->pattern, pe->cflags);
> +		} else {
> +			xecfg = opt->priv = NULL;
> +		}
> +	}

Perhaps leave the thread stuff in builtin/grep.c and export a function
from there that does the above, with locking and all?

> +
> +	if (xecfg) {
>  		char buf[1];
>  		return xecfg->find_func(bol, eol - bol, buf, 1,
>  					xecfg->find_func_priv) >= 0;
> @@ -835,7 +864,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name,
>  		if (lno <= opt->last_shown)
>  			break;
>  
> -		if (match_funcname(opt, bol, eol)) {
> +		if (match_funcname(opt, name, bol, eol)) {
>  			show_line(opt, bol, eol, name, lno, '=');
>  			break;
>  		}
> @@ -848,7 +877,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
>  	unsigned cur = lno, from = 1, funcname_lno = 0;
>  	int funcname_needed = !!opt->funcname;
>  
> -	if (opt->funcbody && !match_funcname(opt, bol, end))
> +	if (opt->funcbody && !match_funcname(opt, name, bol, end))
>  		funcname_needed = 2;
>  
>  	if (opt->pre_context < lno)
> @@ -864,7 +893,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
>  		while (bol > buf && bol[-1] != '\n')
>  			bol--;
>  		cur--;
> -		if (funcname_needed && match_funcname(opt, bol, eol)) {
> +		if (funcname_needed && match_funcname(opt, name, bol, eol)) {
>  			funcname_lno = cur;
>  			funcname_needed = 0;
>  		}
> @@ -942,19 +971,6 @@ static int look_ahead(struct grep_opt *opt,
>  	return 0;
>  }
>  
> -int grep_threads_ok(const struct grep_opt *opt)
> -{
> -	/* If this condition is true, then we may use the attribute
> -	 * machinery in grep_buffer_1. The attribute code is not
> -	 * thread safe, so we disable the use of threads.
> -	 */
> -	if ((opt->funcname || opt->funcbody)
> -	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
> -		return 0;
> -
> -	return 1;
> -}
> -
>  static void std_output(struct grep_opt *opt, const void *buf, size_t size)
>  {
>  	fwrite(buf, size, 1, stdout);
> @@ -1011,12 +1027,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
>  	if ((opt->funcname || opt->funcbody)
>  	    && !opt->unmatch_name_only && !opt->status_only &&
>  	    !opt->name_only && !binary_match_only && !collect_hits) {
> -		struct userdiff_driver *drv = userdiff_find_by_path(name);
> -		if (drv && drv->funcname.pattern) {
> -			const struct userdiff_funcname *pe = &drv->funcname;
> -			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
> -			opt->priv = &xecfg;
> -		}
> +		opt->priv = &xecfg;
>  	}

opt->priv can be set unconditionally here now, since we only access it
from within match_funcname() and that function is not called if any of
the short-circuit options is set.

>  	try_lookahead = should_lookahead(opt);
>  
> @@ -1093,7 +1104,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
>  				show_function = 1;
>  			goto next_line;
>  		}
> -		if (show_function && match_funcname(opt, bol, eol))
> +		if (show_function && match_funcname(opt, name, bol, eol))
>  			show_function = 0;
>  		if (show_function ||
>  		    (last_hit && lno <= last_hit + opt->post_context)) {
> diff --git a/grep.h b/grep.h
> index a652800..15d227c 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -115,6 +115,7 @@ struct grep_opt {
>  	int show_hunk_mark;
>  	int file_break;
>  	int heading;
> +	int use_threads;
>  	void *priv;
>  
>  	void (*output)(struct grep_opt *opt, const void *data, size_t size);
> @@ -131,4 +132,10 @@ struct grep_opt {
>  extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
>  extern int grep_threads_ok(const struct grep_opt *opt);
>  
> +#ifndef NO_PTHREADS
> +/* Mutex used around access to the attributes machinery if
> + * opt->use_threads.  Must be initialized/destroyed by callers! */
> +extern pthread_mutex_t grep_attr_mutex;
> +#endif
> +
>  #endif
> 
> 

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Nguyen Thai Ngoc Duy @ 2011-11-29 13:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Bill Zaumen, git, gitster, spearce, torvalds
In-Reply-To: <20111129090733.GA22046@sigill.intra.peff.net>

On Tue, Nov 29, 2011 at 4:07 PM, Jeff King <peff@peff.net> wrote:
> However, it's harder for trees. Each entry needs to have the new digest
> added, but there simply isn't room in the format. So we would have to
> change the tree format, breaking interoperability with older versions of
> git. And all of your tree sha1's would change as soon as you wrote them
> with the new format. That's only slightly better than just swapping
> sha1 out for a better algorithm.

I think we could hide the new digest after NUL at the end of path
name. C Git at least does not seem to care whatever after NUL.
-- 
Duy

^ permalink raw reply

* Re: Copy branch into master
From: Tay Ray Chuan @ 2011-11-29 12:49 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: Ron Eggler, git
In-Reply-To: <20111128183616.GB29503@foghorn.codeweavers.com>

On Tue, Nov 29, 2011 at 2:36 AM, Andrew Eikum <aeikum@codeweavers.com> wrote:
> On Mon, Nov 28, 2011 at 10:25:33AM -0800, Ron Eggler wrote:
>> Some time ago I created a DVT branch in my project and I have almost been
>> exclusively working in it. Now the time for some test deployment came and I
>> didn't have time to merge it all back into the master thus I gave out the
>> DVT branch version. Now I would like to copy exactly what I have in that
>> branch back into my master to have an exact copy in my master of what got
>> deployed with out any changes.
>> How can I do this?
>
> Couple options, depending on what you want:
>
> Rename DVT to master (similar to 'mv DVT master', including
> losing the contents of 'master'):
> $ git checkout --detach HEAD
> $ git branch -M DVT master
> $ git checkout master

It might not be wise to take the strict definition of rename = move
(copy + delete). You will lose *all* your reflog associated with
master. The old master is gone forever.

Resetting the branch is "safer". It's like pseudo-copying the branch.
The reflog for both branches are still around. You can do this with

 # on master
 $ git reset --hard DVT

or

 # not on master
 $ git branch -f master dvt

-- 
Cheers,
Ray Chuan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox