Git development
 help / color / mirror / Atom feed
* Re: [PATCH 12/13] credentials: add "store" helper
From: Jeff King @ 2011-11-29 21:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vsjl6ssf9.fsf@alter.siamese.dyndns.org>

On Tue, Nov 29, 2011 at 10:19:06AM -0800, Junio C Hamano wrote:

> > +	while (strbuf_getline(&line, fh, '\n') != EOF) {
> > +		credential_from_url(&entry, line.buf);
> > +		if (entry.username && entry.password &&
> > +		    credential_match(c, &entry)) {
> 
> This looks curious; isn't checking .username and .password part of the
> responsibility of credential_match()? And even if entry lacks password
> (which won't happen in the context of this program, given the
> implementation of store_credential() below) shouldn't it still be
> considered a match?

credential_match will check .username, if the pattern mentions it. It
will never check .password. My intent here was to enforce well-formed
entries in the credential file. So you could add:

  http://example.com/

to the credential file, but it's just meaningless noise. It
doesn't actually tell us a username or password.

The helper won't add such an entry itself, but given the simplicity of
the format, I wanted to leave the door open for curious hackers to
populate it manually if they choose.

I think you're right that:

  http://user@example.com/

is potentially meaningful, and this would skip that. OTOH, you would be
much better served to just do:

  git config credential.http://example.com.username user

So I consider it a slight abuse of this helper in the first place.

> > +static void rewrite_credential_file(const char *fn, struct credential *c,
> > +				    struct strbuf *extra)
> > +{
> > +	umask(077);
> 
> Curious placement of umask(). I would expect a function that has its own
> call to umask() restore it before it returns, and a stand-alone program
> whose sole purpose is to work with a private file, setting a tight umask
> upfront at the beginning of main() may be easier to understand.

I think that is largely a holdover from the original implementation,
which set the umask and did other black magic before calling
git_config_set. I agree it would make more sense at the beginning of the
program. Will change.

> > +	if (hold_lock_file_for_update(&credential_lock, fn, 0) < 0)
> > +		die_errno("unable to get credential storage lock");
> > +	parse_credential_file(fn, c, NULL, print_line);
> > +	if (extra)
> > +		print_line(extra);
> 
> An entry for a newly updated password comes at the end of the file,
> instead of replacing an entry already in the file in-place? Given that
> parse_credential_file() when processing a look-up request (which is the
> majority of the case) stops upon finding a match, it might make more sense
> to have the new one (which may be expected to be used often) at the
> beginning instead, no?

Yeah. It's a linear search. Your worst-case is always going to be O(n),
but I just assumed n would remain relatively small and we wouldn't care
(if it isn't, the right solution is probably a smarter data structure).

But your optimization is trivial to implement, so it's probably worth
doing.

> > +	if (commit_lock_file(&credential_lock) < 0)
> > +		die_errno("unable to commit credential store");
> > +}
> > +
> > +static void store_credential(const char *fn, struct credential *c)
> > +{
> > +	struct strbuf buf = STRBUF_INIT;
> > +
> > +	if (!c->protocol || !(c->host || c->path) ||
> > +	    !c->username || !c->password)
> > +		return;
> [...]
> > +static void remove_credential(const char *fn, struct credential *c)
> > +{
> > +	if (!c->protocol || !(c->host || c->path))
> > +		return;
> 
> The choice of the fields looks rather arbitrary. I cannot say "remove all
> the credentials whose username is 'gitster' at 'github.com' no matter what
> protocol is used", but I can say "remove all credentials under any name
> for any host as long as the transfer goes over 'https' and accesses a
> repository at 'if/xyzzy' path", it seems.

It is kind of arbitrary. The storage format is URLs, which is why
store_credential is a little pedantic. We can't store something that
doesn't have a protocol part, as that is a required part of the URL
(actually, in URL-speak this is the "scheme"; I wonder if we should use
the same term here).

I was thinking we need a protocol for the same reason in
remove_credential, but I think you are right. We never actually convert
it to a URL, so in theory you could do:

  git credential-store erase <<\EOF
  username=gitster
  host=github.com
  EOF

Again, not an operation that git will ever perform, but I guess
something that people might want to do (I had always assumed the
"$EDITOR ~/.git-credentials" was going to be the preferred way of doing
such operations :) ).

I don't think there's any harm in loosening that condition.

-Peff

^ permalink raw reply

* Re: [PATCH] gitweb: Don't append ';js=(0|1)' to external links
From: Jürgen Kreileder @ 2011-11-29 21:31 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <m3pqgaloda.fsf@localhost.localdomain>

On Tue, Nov 29, 2011 at 20:28, Jakub Narebski <jnareb@gmail.com> wrote:
> Jürgen Kreileder <jk@blackdown.de> writes:
[...]
> Thanks for this, but I think a better solution would be to explicitly
> mark the few external links we have e.g. with 'class="external"', and
> use that to avoid adding ';js=(0|1)' to them.

This won't work because there are more than a few external links.  Think of
links added in the header or footer or via a project specific README.html.

You would have to do it the other way round: Mark all internal links.

> This has the advantage that we can use different style to mark
> outgoing external links.
>
> I even have such patch somewhere in the StGit stack...
> -- >8 --
> Subject: [PATCH] gitweb: Mark external links
>
> ...and do not add 'js=1' to them with JavaScript.
>
> Both $logo_url and $home_link links are now marked with "external"
> class, and fixLink does not add 'js=1' to them on click.  We add
> 'js=1' to internal link to make server-side of gitweb know that it can
> use JavaScript-only actions; we shouldn't do this for extrenal links,
> as 'js=1' might mean something else to them.
>
> Note that only links using A element matter: images (linked using
> IMG), stylesheets (linked using STYLE) and JavaScript files (linked
> using SCRIPT) were never affected.
>
> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
> ---
>  gitweb/gitweb.perl                       |    5 ++++-
>  gitweb/static/js/javascript-detection.js |    5 +++++
>  2 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 7456a4b..f1c1caa 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3626,13 +3626,16 @@ EOF
>        print "<div class=\"page_header\">\n";
>        if (defined $logo) {
>                print $cgi->a({-href => esc_url($logo_url),
> +                              -class => "external",
>                               -title => $logo_label},
>                              $cgi->img({-src => esc_url($logo),
>                                         -width => 72, -height => 27,
>                                         -alt => "git",
>                                         -class => "logo"}));
>        }
> -       print $cgi->a({-href => esc_url($home_link)}, $home_link_str) . " / ";
> +       print $cgi->a({-href => esc_url($home_link)
> +                      -class => "external"},
> +                     $home_link_str) . " / ";
>        if (defined $project) {
>                print $cgi->a({-href => href(action=>"summary")}, esc_html($project));
>                if (defined $action) {
> diff --git a/gitweb/static/js/javascript-detection.js b/gitweb/static/js/javascript-detection.js
> index 2b51e55..fc59e42 100644
> --- a/gitweb/static/js/javascript-detection.js
> +++ b/gitweb/static/js/javascript-detection.js
> @@ -60,6 +60,11 @@ function fixLink(link) {
>         */
>        var jsExceptionsRe = /[;?]js=[01]$/;
>
> +       // don't change links marked as external ($logo_url, $home_link)
> +       if (link.className === 'external') {
> +               return;
> +       }
> +
>        if (!jsExceptionsRe.test(link)) { // =~ /[;?]js=[01]$/;
>                link.href +=
>                        (link.href.indexOf('?') === -1 ? '?' : ';') + 'js=1';
>
>
>



-- 
http://blog.blackdown.de/
http://www.flickr.com/photos/jkreileder/

^ permalink raw reply

* Re: [PATCH 11/13] strbuf: add strbuf_add*_urlencode
From: Jeff King @ 2011-11-29 21:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vzkfessff.fsf@alter.siamese.dyndns.org>

On Tue, Nov 29, 2011 at 10:19:00AM -0800, Junio C Hamano wrote:

> > +static int is_rfc3986_reserved(char ch)
> > +{
> > +	switch (ch) {
> > +	case '!': case '*': case '\'': case '(': case ')': case ';':
> > +	case ':': case '@': case '&': case '=': case '+': case '$':
> > +	case ',': case '/': case '?': case '#': case '[': case ']':
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> 
> Part of me wonders if we still have extra bits in sane_ctype[] array but
> that one is cumbersome to update, and the above should be easier to read
> and maintain.

We have 2 bits left. I did consider it, but it just seemed excessively
cumbersome for something that really doesn't need to be that fast (if it
is indeed any faster than this case statement).

> > +void strbuf_add_urlencode(struct strbuf *sb, const char *s, size_t len,
> > +			  int reserved)
> 
> Does "reserved" parameter mean "must-encode-reserved", or
> "may-encode-reserved" (the latter would be more like "if set to 0,
> per-cent encoding the result would be an error")?

It is "must-encode-reserved". The difference, from my reading of the
rfc, is that we can relax our encoding in the path-name portion of the
URI. For example, in:

  https://user@host/path/to/repo.git

You definitely want to quote "/" in the user or hostname, but doing so
in path/to/repo.git is just annoying.

-Peff

^ permalink raw reply

* Re: [PATCH 03/13] introduce credentials API
From: Jeff King @ 2011-11-29 21:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmxbeu91d.fsf@alter.siamese.dyndns.org>

On Tue, Nov 29, 2011 at 09:34:54AM -0800, Junio C Hamano wrote:

> >> > +`credential_fill`::
> >> > +
> >> > +	Attempt to fill the username and password fields of the passed
> >> > +	credential struct, first consulting storage helpers, then asking
> >> > +	the user. Guarantees that the username and password fields will
> >> > +	be filled afterwards (or die() will be called).

> Immensely, at least to me. From the perspective of a user of the API, a
> call to credential_fill() is to "fill in the credential" in the sense that
> "call the function to fill in the credential" but I find it easier to
> understand if it were explained to me as "ask the API to fill in the
> credential, which may involve helpers to interact with the user--the point
> of the API is that the caller does not care how it is done".  Same for the
> reject/accept calls---the example makes it clear that they are to tell the
> decision to reject/accept made by the application to the credential API,
> and it is up to the API layer what it does using that decision (like
> removing the cached and now stale password).

Ahh, I see your confusion now. It is not that the description is
necessarily lacking any content, but that the tense of the sentences is
misleading. I'll fix that.

> The above example is a bit too simplistic and misleading, though. You
> would call reject only on authentication failure (do not trash stored and
> good password upon network being unreachable temporarily or the server
> being overloaded).

Good point. I'll fix that and add the example to the documentation.

> > So one possible rule would be:
> >
> >   1. If it starts with "!", clip off the "!" and hand it to the shell.
> >
> >   2. Otherwise, if is_absolute_path(), hand it to the shell directly.
> >
> >   3. Otherwise, prepend "git credential-" and hand it to the shell.
> >
> > I think that is slightly less confusing than the "first word is alnum"
> > thing.
> 
> Simpler and easier to explain. Good ;-)

OK, I'll implement that, then.

> > How do you feel about the "values cannot contain a newline" requirement?
> 
> In the context of asking username, password, or passphrase, I think "LF is
> the end of the line and you cannot have that byte in your response" is
> perfectly reasonable. I've yet to find a way to use LF in a passphrase to
> unlock my Gnome keychain ;-).

The potential issue is that other values get that, too. So if you have a
URL with "\n" in the path, it cannot be transmitted verbatim. We can
url-encode, of course, but I didn't want the helpers to have to deal
with quoting issues.

> >> Two style nits.
> >
> > I'm supposed to guess? ;P
> 
> Sorry, but you guessed right.

OK, will fix.

-Peff

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Jeff King @ 2011-11-29 20:59 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Bill Zaumen, git, gitster, spearce, torvalds
In-Reply-To: <CACsJy8DboVU4kSbJSV=8NP08OyLYVgOKsm8tt=koZ0=JcGSE=A@mail.gmail.com>

On Tue, Nov 29, 2011 at 09:04:06PM +0700, Nguyen Thai Ngoc Duy wrote:

> > You can fix this by including an extra header in the signed part of the
> > tag that says "also, the digest of the commit I point to is X". Then you
> > know you have the same commit that Linus had. But the commit points to a
> > tree by its sha1. So you have to add a similar header in the commit
> > object that says "also, the digest of the tree I point to is X". And
> > ditto for all of the parent pointers, if you want to care about signing
> > history. And then you have the same problem in the tree: each sub-tree
> > and blob is referenced by its sha1.
> >
> Can we just hash all objects in a pack from bottom up, (replacing
> sha-1 in trees/commits/tags with the new digest in memory before
> hashing), then attach the new top digest to tag's content? The sender
> is required by the receiver to send new digests for all objects in the
> pack together with the pack. The receiver can then go through the same
> process to produce the top digest and match it with one saved in tag.

I think that is conflating two different layers of git. The security for
tags happens at the conceptual object db layer: you sign a tag, and that
points to a commit, which points to a tree, and so on. The authenticity
comes from the tag signature, but the integrity of each link in the
chain is verifiable because of the has property. The pack layer, on the
other hand, is just an implementation detail about how those conceptual
objects are stored. More than just your tag will be in a pack, and the
contents of your tag may be spread across several packs (or even loose
objects).

So I don't think it's right to talk about packs at all in the signature
model.

If you wanted to say "make a digest of all of the sub-objects pointed to
by the tag", then yes, that does work (security-wise). But it's
expensive to calculate. Instead, you want to use a "digest of digests"
as much as possible. Which is what git already does, of course; you hash
the tree object, which contains hashes of the blob sha1s. Git's
conceptual model is fine. The only problem is that sha1 is potentially
going to lose its security properties, weakening the links in the chain.
So as much as possible, we want to insert additional links at the exact
same places, but using a stronger algorithm.

Does that make sense?

-Peff

^ permalink raw reply

* Re: support gnupg-2.x in git.
From: Junio C Hamano @ 2011-11-29 20:29 UTC (permalink / raw)
  To: Paweł Sikora; +Cc: git
In-Reply-To: <201111291937.34324.pawel.sikora@agmk.net>

Paweł Sikora <pawel.sikora@agmk.net> writes:

> i'm using a gnupg-2.0.18 and currently i'm not able to use git tag/verify
> due to hadcoded "gpg" literals in builtin/{tag,verifiy-tag}.c.

Stating the obvious...

  $ ln -s /usr/local/not/on/my/path/bin/gnupg-2.0.18 $HOME/bin/gpg
  $ PATH=$HOME/bin:$PATH

Or this untested patch, which applies on top of jc/signed-commit, as the
GnuPG interface is in the process of getting heavily refactored.

-- >8 --
Subject: gpg-interface: allow use of a custom GPG binary

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt  |   11 +++++++++++
 Documentation/git-tag.txt |    8 +++++---
 gpg-interface.c           |   11 ++++++++---
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b30c7e6..094c1c9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1094,6 +1094,17 @@ grep.lineNumber::
 grep.extendedRegexp::
 	If set to true, enable '--extended-regexp' option by default.
 
+gpg.program::
+	Use this custom program instead of "gpg" found on $PATH when
+	making or verifying a PGP signature. The program must support the
+	same command line interface as GPG, namely, to verify a detached
+	signature, "gpg --verify $file - <$signature" is run, and the
+	program is expected to signal a good signature by exiting with
+	code 0, and to generate an ascii-armored detached signature, the
+	standard input of "gpg -bsau $key" is fed with the contents to be
+	signed, and the program is expected to send the result to its
+	standard output.
+
 gui.commitmsgwidth::
 	Defines how wide the commit message window is in the
 	linkgit:git-gui[1]. "75" is the default.
diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index c83cb13..74fc7e0 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -38,7 +38,9 @@ created (i.e. a lightweight tag).
 A GnuPG signed tag object will be created when `-s` or `-u
 <key-id>` is used.  When `-u <key-id>` is not used, the
 committer identity for the current user is used to find the
-GnuPG key for signing.
+GnuPG key for signing. 	The configuration variable `gpg.program`
+is used to specify custom GnuPG binary.
+
 
 OPTIONS
 -------
@@ -48,11 +50,11 @@ OPTIONS
 
 -s::
 --sign::
-	Make a GPG-signed tag, using the default e-mail address's key
+	Make a GPG-signed tag, using the default e-mail address's key.
 
 -u <key-id>::
 --local-user=<key-id>::
-	Make a GPG-signed tag, using the given key
+	Make a GPG-signed tag, using the given key.
 
 -f::
 --force::
diff --git a/gpg-interface.c b/gpg-interface.c
index ff232c8..18630ff 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -5,6 +5,7 @@
 #include "sigchain.h"
 
 static char *configured_signing_key;
+static const char *gpg_program = "gpg";
 
 void set_signing_key(const char *key)
 {
@@ -15,9 +16,12 @@ void set_signing_key(const char *key)
 int git_gpg_config(const char *var, const char *value, void *cb)
 {
 	if (!strcmp(var, "user.signingkey")) {
+		set_signing_key(value);
+	}
+	if (!strcmp(var, "gpg.program")) {
 		if (!value)
 			return config_error_nonbool(var);
-		set_signing_key(value);
+		gpg_program = xstrdup(value);
 	}
 	return 0;
 }
@@ -46,7 +50,7 @@ int sign_buffer(struct strbuf *buffer, struct strbuf *signature, const char *sig
 	gpg.argv = args;
 	gpg.in = -1;
 	gpg.out = -1;
-	args[0] = "gpg";
+	args[0] = gpg_program;
 	args[1] = "-bsau";
 	args[2] = signing_key;
 	args[3] = NULL;
@@ -101,10 +105,11 @@ int verify_signed_buffer(const char *payload, size_t payload_size,
 			 struct strbuf *gpg_output)
 {
 	struct child_process gpg;
-	const char *args_gpg[] = {"gpg", "--verify", "FILE", "-", NULL};
+	const char *args_gpg[] = {NULL, "--verify", "FILE", "-", NULL};
 	char path[PATH_MAX];
 	int fd, ret;
 
+	args_gpg[0] = gpg_program;
 	fd = git_mkstemp(path, PATH_MAX, ".git_vtag_tmpXXXXXX");
 	if (fd < 0)
 		return error("could not create temporary file '%s': %s",

^ permalink raw reply related

* Re: [PATCH] Make feed title valid utf8
From: Junio C Hamano @ 2011-11-29 20:11 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jürgen Kreileder, git
In-Reply-To: <m38vmylnfu.fsf@localhost.localdomain>

Thanks for patches and reviews.

I am not keeping track of what is acked unconditionally, what is acked
with reservation i.e. "with this tweak on top", and what is rejected with
suggestion of an alternative, though. I hope I can expect to see a series
that is finished between you two to be resent for application after 1.7.8
ships.

^ permalink raw reply

* Re: [PATCH] rebase -i: interrupt rebase when "commit --amend" failed during "reword"
From: Junio C Hamano @ 2011-11-29 20:08 UTC (permalink / raw)
  To: Andrew Wong; +Cc: git
In-Reply-To: <1322496952-23819-2-git-send-email-andrew.kw.w@gmail.com>

Andrew Wong <andrew.kw.w@gmail.com> writes:

> "commit --amend" could fail in cases like the user empties the commit
> message, or pre-commit failed.  When it fails, rebase should be
> interrupted, rather than ignoring the error and continue on rebasing.
> This gives users a way to gracefully interrupt a "reword" if they
> decided they actually want to do an "edit", or even "rebase --abort".
>
> Signed-off-by: Andrew Wong <andrew.kw.w@gmail.com>

Makes sense, especially if "commit" itself failed due to some unknown
reason or a refusal from the pre-commit hook. Even though a user could
have been using the "empty the commit log message and the original is
kept" as a trick to recover from a botched rewording attempt and this
change will regress for such use cases, I have a feeling that it does
not matter.

Is there anything we should be saying more than "fatal: Cannot amend" to
help users when this new "die" triggers? What is the recommended recovery
procedure? Run "git commit --amend" after doing whatever is needed to fix
the tree (e.g. if pre-commit refused because of a coding style violation,
it may involve fixing the tree being committed; if it refused because of a
typo in the log message, the tree itself may be OK and nothing needs to be
done) and then "git rebase --continue"?

>  git-rebase--interactive.sh |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 804001b..669f378 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -408,7 +408,8 @@ do_next () {
>  		mark_action_done
>  		pick_one $sha1 ||
>  			die_with_patch $sha1 "Could not apply $sha1... $rest"
> -		git commit --amend --no-post-rewrite
> +		git commit --amend --no-post-rewrite ||
> +			die_with_patch $sha1 "Cannot amend commit after successfully picking $sha1... $rest"
>  		record_in_rewritten $sha1
>  		;;
>  	edit|e)

^ permalink raw reply

* Re: [PATCH] Make feed title valid utf8
From: Jakub Narebski @ 2011-11-29 19:48 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git, Jakub Narebski
In-Reply-To: <CAKD0UuxFVtCRT+hqO5vkDRanaX3Gvwf9MAFqNUwAiFA+wEwxXg@mail.gmail.com>

Subject: gitweb: Make feed title valid utf8

Jürgen Kreileder <jk@blackdown.de> writes:

> gitweb doesn't properly handle UTF8 site names when generating feed titles.
> 
> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..a2838c3 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -7482,7 +7482,7 @@ sub git_feed {
>  	return if ($cgi->request_method() eq 'HEAD');
> 
>  	# header variables
> -	my $title = "$site_name - $project/$action";
> +	my $title = to_utf8($site_name) . " - " . to_utf8($project) . "/$action";
>  	my $feed_type = 'log';
>  	if (defined $hash) {
>  		$title .= " - '$hash'";
> -- 

Thanks.  With the minor nit of prefixing subject with subsystem
designation, i.e. with "gitweb: " -- ACK.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH] gitweb: Output valid utf8 in git_blame_common('data')
From: Jakub Narebski @ 2011-11-29 19:46 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git, Jakub Narebski
In-Reply-To: <CAKD0Uuxq+wLdRy5r_hz9qgjHkDmFHHbeAVkb07HizX9xaGMptw@mail.gmail.com>

Jürgen Kreileder <jk@blackdown.de> writes:

> With javascript-actions enabled gitweb showed broken author names in
> the tooltips on blame pages.
> 
> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..c863afe 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -6107,7 +6107,9 @@ sub git_blame_common {
>  			-type=>"text/plain", -charset => "utf-8",
>  			-status=> "200 OK");
>  		local $| = 1; # output autoflush
> -		print while <$fd>;
> +		while (my $line = <$fd>) {
> +			print to_utf8($line);
> +		}
>  		close $fd
>  			or print "ERROR $!\n";
> 
> -- 

Thanks.  ACK.


BTW. all those troubles with not forgetting to call to_utf8() make me
wonder if we wouldn't be better to forget about supporting
$fallback_encoding and just put

  use open qw(:encoding(UTF-8) :std); 

at the beginning of gitweb, c.f. http://training.perl.com/OSCON2011/index.html


Or

  use open qw(:utf8 :std); 

though then we simply discard errors.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH] gitweb: Output site name with valid utf8 in OPML
From: Jakub Narebski @ 2011-11-29 19:35 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git
In-Reply-To: <CAKD0UuyJDaHyofM5VYG-R03mgYi_1QBJvQufv927+x6YYzPU2w@mail.gmail.com>

Jürgen Kreileder <jk@blackdown.de> writes:

> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..df747c1 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -7699,11 +7699,12 @@ sub git_opml {
>  		-charset => 'utf-8',
>  		-content_disposition => 'inline; filename="opml.xml"');
> 
> +	my $title = esc_html($site_name);
>  	print <<XML;
>  <?xml version="1.0" encoding="utf-8"?>
>  <opml version="1.0">
>  <head>
> -  <title>$site_name OPML Export</title>
> +  <title>$title OPML Export</title>
>  </head>
>  <body>
>  <outline text="git RSS feeds">
> -- 

Thanks.  That is certainly correct... but it is more than just
handling utf8, isn't it.  It was also about not escaping XML
(think of site name containing eg. '<< foo >>' etc.).

So the subject / commit message should be corrected.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH] gitweb: Escape attribute in chop_and_escape_str()
From: Jakub Narebski @ 2011-11-29 19:33 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git, Jakub Narebski
In-Reply-To: <CAKD0UuyOMRFHE6DvuDj0ancfwFfg8ADKab6emv99+FN5RfZ=mQ@mail.gmail.com>

Jürgen Kreileder <jk@blackdown.de> writes:

> Fixes the title attribute in <span title="Jürgen Kreileder">Jürgen
> Kreileder</span> for example because to_utf8() is called implicitly now.
> 
> (Not sure why the attribute is there at all in the example. From my
> point of view nothing got chopped.)

Hmmm... this should not happen because of

	my $chopped = chop_str(@_);
	if ($chopped eq $str) {
		return esc_html($chopped);
	} else 
 
Perhaps it is a matter of doing to_utf8() on $str prior to comparison?

> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..fd76407 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1699,7 +1699,7 @@ sub chop_and_escape_str {
>  		return esc_html($chopped);
>  	} else {
>  		$str =~ s/[[:cntrl:]]/?/g;
> -		return $cgi->span({-title=>$str}, esc_html($chopped));
> +		return $cgi->span({-title => esc_attr($str)}, esc_html($chopped));
>  	}
>  }
> 
> -- 

esc_attr() is a wrong solution here, because $cgi->span(...) should
properly escape attributes.  You should simply use to_utf8() or
sanitize().

Well, uless we simply do

  $str = to_utf8($str);

earlier.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH] gitweb: Don't append ';js=(0|1)' to external links
From: Jakub Narebski @ 2011-11-29 19:28 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git, Jakub Narebski
In-Reply-To: <CAKD0UuzU4hAe7RGYukGyPpvfGeYJ3pgJ5pynupneMpQSaX5Cjw@mail.gmail.com>

Jürgen Kreileder <jk@blackdown.de> writes:

> Signed-off-by: Juergen Kreileder <jk@blackdown.de>
> ---
>  gitweb/gitweb.perl                       |    2 +-
>  gitweb/static/js/javascript-detection.js |    4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4f0c3bd..dfe3407 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3974,7 +3974,7 @@ sub git_footer_html {
>  		print qq!<script type="text/javascript">\n!.
>  		      qq!window.onload = function () {\n!;
>  		if (gitweb_check_feature('javascript-actions')) {
> -			print qq!	fixLinks();\n!;
> +			print qq!	fixLinks('$my_url');\n!;
>  		}
>  		if ($jstimezone && $tz_cookie && $datetime_class) {
>  			print qq!	var tz_cookie = { name: '$tz_cookie', expires: 14, path:
> '/' };\n!. # in days
> diff --git a/gitweb/static/js/javascript-detection.js
> b/gitweb/static/js/javascript-detection.js
> index fa2596f..36964ad 100644
> --- a/gitweb/static/js/javascript-detection.js
> +++ b/gitweb/static/js/javascript-detection.js
> @@ -29,11 +29,11 @@ var jsExceptionsRe = /[;?]js=[01](#.*)?$/;
>   *
>   * @globals jsExceptionsRe
>   */
> -function fixLinks() {
> +function fixLinks(baseurl) {
>  	var allLinks = document.getElementsByTagName("a") || document.links;
>  	for (var i = 0, len = allLinks.length; i < len; i++) {
>  		var link = allLinks[i];
> -		if (!jsExceptionsRe.test(link)) {
> +		if (!jsExceptionsRe.test(link) && !link.href.indexOf(baseurl)) {
>  			link.href = link.href.replace(/(#|$)/,
>  				(link.href.indexOf('?') === -1 ? '?' : ';') + 'js=1$1');
>  		}
> -- 

Thanks for this, but I think a better solution would be to explicitly
mark the few external links we have e.g. with 'class="external"', and
use that to avoid adding ';js=(0|1)' to them.

This has the advantage that we can use different style to mark
outgoing external links.

I even have such patch somewhere in the StGit stack...
-- >8 --
Subject: [PATCH] gitweb: Mark external links

...and do not add 'js=1' to them with JavaScript.

Both $logo_url and $home_link links are now marked with "external"
class, and fixLink does not add 'js=1' to them on click.  We add
'js=1' to internal link to make server-side of gitweb know that it can
use JavaScript-only actions; we shouldn't do this for extrenal links,
as 'js=1' might mean something else to them.

Note that only links using A element matter: images (linked using
IMG), stylesheets (linked using STYLE) and JavaScript files (linked
using SCRIPT) were never affected.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl                       |    5 ++++-
 gitweb/static/js/javascript-detection.js |    5 +++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7456a4b..f1c1caa 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3626,13 +3626,16 @@ EOF
 	print "<div class=\"page_header\">\n";
 	if (defined $logo) {
 		print $cgi->a({-href => esc_url($logo_url),
+		               -class => "external",
 		               -title => $logo_label},
 		              $cgi->img({-src => esc_url($logo),
 		                         -width => 72, -height => 27,
 		                         -alt => "git",
 		                         -class => "logo"}));
 	}
-	print $cgi->a({-href => esc_url($home_link)}, $home_link_str) . " / ";
+	print $cgi->a({-href => esc_url($home_link)
+	               -class => "external"},
+	              $home_link_str) . " / ";
 	if (defined $project) {
 		print $cgi->a({-href => href(action=>"summary")}, esc_html($project));
 		if (defined $action) {
diff --git a/gitweb/static/js/javascript-detection.js b/gitweb/static/js/javascript-detection.js
index 2b51e55..fc59e42 100644
--- a/gitweb/static/js/javascript-detection.js
+++ b/gitweb/static/js/javascript-detection.js
@@ -60,6 +60,11 @@ function fixLink(link) {
 	 */
 	var jsExceptionsRe = /[;?]js=[01]$/;
 
+	// don't change links marked as external ($logo_url, $home_link)
+	if (link.className === 'external') {
+		return;
+	}
+
 	if (!jsExceptionsRe.test(link)) { // =~ /[;?]js=[01]$/;
 		link.href +=
 			(link.href.indexOf('?') === -1 ? '?' : ';') + 'js=1';

^ permalink raw reply related

* Re: Git Submodule Problem - Bug?
From: Junio C Hamano @ 2011-11-29 19:21 UTC (permalink / raw)
  To: Manuel Koller
  Cc: Jens Lehmann, Fredrik Gustafsson, Thomas Rast, git, Heiko Voigt
In-Reply-To: <839E634E-76C0-40ED-8CCC-43E52F782079@gmail.com>

Manuel Koller <koller.manuel@gmail.com> writes:

> In have wrote a workaround for the problem i posted that goes into this
> direction. I just check whether the url has changed and remove the
> submodule by hand if it did.

... which might be wrong, depending on how you implement it, so be careful
(see below).

> ...
> What is the purpose of having two different submodules in the same path?
> Identifying the submodule by url however would probably make things
> considerably faster...

There are three primary things involved when talking about a submodule in
a superproject. Its name, its path in the superproject hierarchy, and its
URL (if it is foreign). The canonical example used when discussing the
design of Git submodule support is to imagine your superproject that is
about a small router that has a kernel component as a submodule that lives
in kernel/ hierarchy (your userland may live elsewhere in the tree, either
as a submodule bound at src/ hierarchy or contained in the superproject
itself and it does not make a difference).

You may had two major versions of your product. v1.0 line used to use BSD
kernels, and v2.0 line with linux-2.6 kernels. Even though a checkout of
any version of your superproject has a submodule bound at its kernel/
directory, the logical identity of the submodule used in your v1.0 line of
product and v2.0 line are different. You can differenciate them by giving
them different names, say 'kernel' and 'linux-kernel' [*1*]. When you
checkout the v1.0 tag, kernel/ directory should be populated by the
submodule that house the BSD kernel. When you checkout the v2.0 tag, it
should house the Linux 2.6 kernel.

Now, your v3.0 line will still use linux-2.6 (and now linux-3.0)
kernel. Logically, the kernel project is the same thing. linux-3.0 is just
a continuation of linux-2.6 series and maintained in the *logically* same
upstream repository. However, the repository location has recently changed
and now lives in

    http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

even though it used to be linux-2.6.git in the same hierarchy [*2*]. The
name of the submodule ('linux-kernel') does not have to change, as the
repository in the new location has all the history for linux-2.6 kernel as
well. The URL mapping recorded in .gitmodules in-tree from the submodule
name to the location however needs to change [*3*]. Ideally (I do not
think the current "git submodule" implementation does this), there should
be a more automated way to optionally allow switching the upstream URL of
the 'linux-kernel' subproject bound at kernel/ directory when you switch
between commits in the v2.0 series and in the v3.0 series of the
superproject [*3*], so that the user can say "Yes, I know linux.git URL
contains everything needed from linux-2.6.git URL, so just update the
submodule's upstream URL to linux.git one, and keep it like so; there is
no need to use the older URL even when I check out v2.0 from the
superproject" or "Notice when the URL recorded in .gitmodules for this
project changes when I check out a different version of the superproject,
and use the old URL to update the submodule when I check out the old
version of the superproject".

Even though it is an ancient design discussion, this thread is worth a
read before discussing anything about submodules:

  http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47498

not because the ideas that were discussed but not implemented in the
current "git submodule" are all good things that should be added, but
because the discussion shows what kind of considerations need to go into
the design.


[Footnotes]

*1* In the ideal world, you would have called it 'bsd' back in your v1.0
days with a foresight that someday you would switch to 'linux', but it is
likely you would not have been perfect.

*2* You can still access it as linux-2.6.git because k.org is trying to be
nice to its users, but that is besides the point.

*3* It becomes relevant if you imagine a case where the old linux-2.6.git
repository at k.org were left frozen at the last version of the 2.6 series
and the new history is only available in linux.git repository.

^ permalink raw reply

* support gnupg-2.x in git.
From: Paweł Sikora @ 2011-11-29 18:37 UTC (permalink / raw)
  To: git

Hi,

i'm using a gnupg-2.0.18 and currently i'm not able to use git tag/verify
due to hadcoded "gpg" literals in builtin/{tag,verifiy-tag}.c.
could you please improve git to use global config key with specified path
to gpg{,2} executable?

BR,
Paweł.

please CC me on reply.

^ permalink raw reply

* log: option "--follow" not the default for a single file?
From: Ralf Thielow @ 2011-11-29 18:25 UTC (permalink / raw)
  To: git

Why is the option "--follow" not the default if the log-command
is used with a single file? Many GUI tools don't show me the
full history of a single file if there was a rename in it.

Thanks

^ permalink raw reply

* Re: [PATCH 12/13] credentials: add "store" helper
From: Junio C Hamano @ 2011-11-29 18:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111124110756.GJ8417@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> +static void parse_credential_file(const char *fn,
> +				  struct credential *c,
> +				  void (*match_cb)(struct credential *),
> +				  void (*other_cb)(struct strbuf *))
> +{
> +	FILE *fh;
> +	struct strbuf line = STRBUF_INIT;
> +	struct credential entry = CREDENTIAL_INIT;
> +
> +	fh = fopen(fn, "r");
> +	if (!fh) {
> +		if (errno != ENOENT)
> +			die_errno("unable to open %s", fn);
> +		return;
> +	}
> +
> +	while (strbuf_getline(&line, fh, '\n') != EOF) {
> +		credential_from_url(&entry, line.buf);
> +		if (entry.username && entry.password &&
> +		    credential_match(c, &entry)) {

This looks curious; isn't checking .username and .password part of the
responsibility of credential_match()? And even if entry lacks password
(which won't happen in the context of this program, given the
implementation of store_credential() below) shouldn't it still be
considered a match?

> +			if (match_cb) {
> +				match_cb(&entry);
> +				break;

Multiple matches not allowed, which sounds fine.

> +			}
> +		}
> +		else if (other_cb)
> +			other_cb(&line);
> +	}
> +
> +	credential_clear(&entry);
> +	strbuf_release(&line);
> +	fclose(fh);
> +}
> +
> +static void print_entry(struct credential *c)
> +{
> +	printf("username=%s\n", c->username);
> +	printf("password=%s\n", c->password);
> +}
> +
> +static void print_line(struct strbuf *buf)
> +{
> +	strbuf_addch(buf, '\n');
> +	write_or_die(credential_lock.fd, buf->buf, buf->len);
> +}
> +
> +static void rewrite_credential_file(const char *fn, struct credential *c,
> +				    struct strbuf *extra)
> +{
> +	umask(077);

Curious placement of umask(). I would expect a function that has its own
call to umask() restore it before it returns, and a stand-alone program
whose sole purpose is to work with a private file, setting a tight umask
upfront at the beginning of main() may be easier to understand.

> +	if (hold_lock_file_for_update(&credential_lock, fn, 0) < 0)
> +		die_errno("unable to get credential storage lock");
> +	parse_credential_file(fn, c, NULL, print_line);
> +	if (extra)
> +		print_line(extra);

An entry for a newly updated password comes at the end of the file,
instead of replacing an entry already in the file in-place? Given that
parse_credential_file() when processing a look-up request (which is the
majority of the case) stops upon finding a match, it might make more sense
to have the new one (which may be expected to be used often) at the
beginning instead, no?

> +	if (commit_lock_file(&credential_lock) < 0)
> +		die_errno("unable to commit credential store");
> +}
> +
> +static void store_credential(const char *fn, struct credential *c)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	if (!c->protocol || !(c->host || c->path) ||
> +	    !c->username || !c->password)
> +		return;
> +
> +	strbuf_addf(&buf, "%s://", c->protocol);
> +	strbuf_addstr_urlencode(&buf, c->username, 1);
> +	strbuf_addch(&buf, ':');
> +	strbuf_addstr_urlencode(&buf, c->password, 1);
> +	strbuf_addch(&buf, '@');
> +	if (c->host)
> +		strbuf_addstr_urlencode(&buf, c->host, 1);
> +	if (c->path) {
> +		strbuf_addch(&buf, '/');
> +		strbuf_addstr_urlencode(&buf, c->path, 0);
> +	}
> +
> +	rewrite_credential_file(fn, c, &buf);
> +	strbuf_release(&buf);
> +}
> +
> +static void remove_credential(const char *fn, struct credential *c)
> +{
> +	if (!c->protocol || !(c->host || c->path))
> +		return;

The choice of the fields looks rather arbitrary. I cannot say "remove all
the credentials whose username is 'gitster' at 'github.com' no matter what
protocol is used", but I can say "remove all credentials under any name
for any host as long as the transfer goes over 'https' and accesses a
repository at 'if/xyzzy' path", it seems.

This filtering matches what lookup_credential() does but shouldn't it be
implemented at a single place in any case?

> +	rewrite_credential_file(fn, c, NULL);
> +}
> +
> +static int lookup_credential(const char *fn, struct credential *c)
> +{
> +	if (!c->protocol || !(c->host || c->path))
> +		return 0;
> +	parse_credential_file(fn, c, print_entry, NULL);
> +	return c->username && c->password;
> +}

^ permalink raw reply

* Re: [PATCH 11/13] strbuf: add strbuf_add*_urlencode
From: Junio C Hamano @ 2011-11-29 18:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111124110728.GI8417@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> This just follows the rfc3986 rules for percent-encoding
> url data into a strbuf.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  strbuf.c |   37 +++++++++++++++++++++++++++++++++++++
>  strbuf.h |    5 +++++
>  2 files changed, 42 insertions(+), 0 deletions(-)
>
> diff --git a/strbuf.c b/strbuf.c
> index 3ad2cc0..60e5e59 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -397,3 +397,40 @@ int strbuf_read_file(struct strbuf *sb, const char *path, size_t hint)
>  
>  	return len;
>  }
> +
> +static int is_rfc3986_reserved(char ch)
> +{
> +	switch (ch) {
> +	case '!': case '*': case '\'': case '(': case ')': case ';':
> +	case ':': case '@': case '&': case '=': case '+': case '$':
> +	case ',': case '/': case '?': case '#': case '[': case ']':
> +		return 1;
> +	}
> +	return 0;
> +}

Part of me wonders if we still have extra bits in sane_ctype[] array but
that one is cumbersome to update, and the above should be easier to read
and maintain.

> +void strbuf_add_urlencode(struct strbuf *sb, const char *s, size_t len,
> +			  int reserved)

Does "reserved" parameter mean "must-encode-reserved", or
"may-encode-reserved" (the latter would be more like "if set to 0,
per-cent encoding the result would be an error")?

^ permalink raw reply

* Re: [PATCH 1/2] checkout,merge: loosen overwriting untracked file check based on info/exclude
From: Junio C Hamano @ 2011-11-29 18:17 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Johannes Sixt, Taylor Hedberg, Bertrand BENOIT
In-Reply-To: <1322388933-6284-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> Back in 1127148 (Loosen "working file will be lost" check in
> Porcelain-ish - 2006-12-04), git-checkout.sh learned to quietly
> overwrite ignored files. Howver the code only took .gitignore files
> into account.

v1.5.3.5-721-g039bc64 also tried to make it harder to make this kind of
mistake, and forgot to spot it in git-checkout.sh which was a bit
unfortunate.

Thanks. I think both patches make sense, and especially 2/2 opens the door
to possibly make ignored ones automatically precious.

^ permalink raw reply

* Re: Git Submodule Problem - Bug?
From: Manuel Koller @ 2011-11-29 18:15 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Fredrik Gustafsson, Thomas Rast, git, Heiko Voigt
In-Reply-To: <4ED5196B.5030200@web.de>

Thanks for taking my problem seriously.

On 29.11.2011, at 18:42, Jens Lehmann wrote:

> Am 29.11.2011 11:41, schrieb Fredrik Gustafsson:
>> On Tue, Nov 29, 2011 at 11:25:41AM +0100, Thomas Rast wrote:
>>> So maybe the right questions to ask would be: what's the *official*
>>> way of removing a submodule completely?  Do we support overwriting
>>> submodules in the way Manuel wanted to?  Why not? :-)
>> 
>> I suggest that we add a command for that;
>> git submodule remove <submodule>
> 
> Hmm, to me it looks like the problem is in "git submodule add". It
> doesn't check if the submodule repo it finds in .git/modules matches
> the one the user wants to create. So we end up reviving the first
> submodule although the user wants to use a completely different repo.

In have wrote a workaround for the problem i posted that goes into this direction. I just check whether the url has changed and remove the submodule by hand if it did. See https://github.com/kollerma/git-submodule-tools if you're interested (its in git-fix-submodules).

> One solution could be to only let "git submodule update" revive
> submodules from .git/modules and make "git submodule add" error out
> if it finds the git directory of a submodule with the same name in
> .git/modules. But currently there is no way to tell "git submodule add"
> to use a different submodule name (it always uses the path as a name),
> so we might have to add an option to do that and tell the user in the
> error message how he can add a different submodule under the same
> path.
> 
> Another solution could be that "git submodule add" detects that a
> submodule with the name "sub" did exist and chooses a different name
> (say "sub2") for the the new one. Then the user wouldn't have to
> cope with the problem himself.

What is the purpose of having two different submodules in the same path? Identifying the submodule by url however would probably make things considerably faster. At least in my case, since I have checked out different branches of the same repo at different paths as submodules. 

^ permalink raw reply

* Re: Git Submodule Problem - Bug?
From: Jens Lehmann @ 2011-11-29 17:42 UTC (permalink / raw)
  To: Manuel Koller; +Cc: Fredrik Gustafsson, Thomas Rast, git, Heiko Voigt
In-Reply-To: <20111129104105.GA10839@kolya>

Am 29.11.2011 11:41, schrieb Fredrik Gustafsson:
> On Tue, Nov 29, 2011 at 11:25:41AM +0100, Thomas Rast wrote:
>> So maybe the right questions to ask would be: what's the *official*
>> way of removing a submodule completely?  Do we support overwriting
>> submodules in the way Manuel wanted to?  Why not? :-)
> 
> I suggest that we add a command for that;
> git submodule remove <submodule>

Hmm, to me it looks like the problem is in "git submodule add". It
doesn't check if the submodule repo it finds in .git/modules matches
the one the user wants to create. So we end up reviving the first
submodule although the user wants to use a completely different repo.

One solution could be to only let "git submodule update" revive
submodules from .git/modules and make "git submodule add" error out
if it finds the git directory of a submodule with the same name in
.git/modules. But currently there is no way to tell "git submodule add"
to use a different submodule name (it always uses the path as a name),
so we might have to add an option to do that and tell the user in the
error message how he can add a different submodule under the same
path.

Another solution could be that "git submodule add" detects that a
submodule with the name "sub" did exist and chooses a different name
(say "sub2") for the the new one. Then the user wouldn't have to
cope with the problem himself.

^ permalink raw reply

* Re: [PATCH 03/13] introduce credentials API
From: Junio C Hamano @ 2011-11-29 17:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111129050425.GA32022@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Mon, Nov 28, 2011 at 01:46:35PM -0800, Junio C Hamano wrote:
>
>> > diff --git a/Documentation/technical/api-credentials.txt b/Documentation/technical/api-credentials.txt
>> > new file mode 100644
>> > index 0000000..3061077
>> > --- /dev/null
>> > +++ b/Documentation/technical/api-credentials.txt
>> > @@ -0,0 +1,148 @@
>> > + ...
>> > +`credential_fill`::
>> > +
>> > +	Attempt to fill the username and password fields of the passed
>> > +	credential struct, first consulting storage helpers, then asking
>> > +	the user. Guarantees that the username and password fields will
>> > +	be filled afterwards (or die() will be called).
>> > +
>> > +`credential_reject`::
>> > +
>> > +	Inform the credential subsystem that the provided credentials
>> > +	have been rejected. This will notify any storage helpers of the
>> > +	rejection (which allows them to, for example, purge the invalid
>> > +	credentials from storage), and then clear the username and
>> > +	password fields in `struct credential`. It can then be
>> > +	`credential_fill`-ed again.
>> > +
>> > +`credential_approve`::
>> > +
>> > +	Inform the credential subsystem that the provided credentials
>> > +	were successfully used for authentication. This will notify any
>> > +	storage helpers of the approval, so that they can store the
>> > +	result to be used again.
>> 
>> It's a bit hard to read and understand which part of the system calls
>> these and which other part of the system is responsible for implementing
>> them, and how "helper" fits into the picture (perhaps calling some of
>> these interfaces will result in "helper" getting called?).
>
> How about following it with a rough example of how they would be used,
> like:
>
>   /*
>    * Create a credential with some context; we don't yet know the
>    * username or password.
>    */
>   struct credential c = CREDENTIAL_INIT;
>   c.protocol = xstrdup("https");
>   c.host = xstrdup("example.com");
>   c.path = xstrdup("foo.git");
>
>   /*
>    * Fill in the username and password fields by contacting helpers
>    * and/or asking the user. The function will die if it fails.
>    */
>   credential_fill(&c);
>
>   /*
>    * And then finally make our request, reporting back to the credential
>    * system on whether it succeeded or failed.
>    */
>   if (make_http_request(c.host, c.path, c.username, c.password) < 0)
>           credential_reject(&c);
>   else
>           credential_accept(&c);
>
> Does that make it more clear?

Immensely, at least to me. From the perspective of a user of the API, a
call to credential_fill() is to "fill in the credential" in the sense that
"call the function to fill in the credential" but I find it easier to
understand if it were explained to me as "ask the API to fill in the
credential, which may involve helpers to interact with the user--the point
of the API is that the caller does not care how it is done".  Same for the
reject/accept calls---the example makes it clear that they are to tell the
decision to reject/accept made by the application to the credential API,
and it is up to the API layer what it does using that decision (like
removing the cached and now stale password).

The above example is a bit too simplistic and misleading, though. You
would call reject only on authentication failure (do not trash stored and
good password upon network being unreachable temporarily or the server
being overloaded).

> So one possible rule would be:
>
>   1. If it starts with "!", clip off the "!" and hand it to the shell.
>
>   2. Otherwise, if is_absolute_path(), hand it to the shell directly.
>
>   3. Otherwise, prepend "git credential-" and hand it to the shell.
>
> I think that is slightly less confusing than the "first word is alnum"
> thing.

Simpler and easier to explain. Good ;-)

> How do you feel about the "values cannot contain a newline" requirement?

In the context of asking username, password, or passphrase, I think "LF is
the end of the line and you cannot have that byte in your response" is
perfectly reasonable. I've yet to find a way to use LF in a passphrase to
unlock my Gnome keychain ;-).

>> > +int credential_read(struct credential *c, FILE *fp)
>> > +{
>> > ...
>> > +			c->host = xstrdup(value);
>> > +		}
>> > +		else if (!strcmp(key, "path")) {
>> > ...
>> > +		/* ignore other lines; we don't know what they mean, but
>> > +		 * this future-proofs us when later versions of git do
>> > +		 * learn new lines, and the helpers are updated to match */
>> 
>> Two style nits.
>
> I'm supposed to guess? ;P

Sorry, but you guessed right.

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Shawn Pearce @ 2011-11-29 17:08 UTC (permalink / raw)
  To: Bill Zaumen; +Cc: git, gitster, pclouds, peff, torvalds
In-Reply-To: <1322546563.1719.22.camel@yos>

On Mon, Nov 28, 2011 at 22:02, Bill Zaumen <bill.zaumen+git@gmail.com> wrote:
> Several years ago (in 2006) there was a discussion about the whether
> SHA-1 is adequate given the very small but non-zero probability of a
> hash collision, particularly given the possibility of a malicious
> attempt to generate a collision. At roughly the same time, Git was
> modified to support "thin packs" during data transfers. These allow
> one to send deltas based on objects that are not in the pack file
> being transferred. As a result a previously undetected hash collision
> could result in a corrupted repository when (for example) the same
> delta is applied to different objects that have the same SHA-1 hash.

I don't think you understand how these thin packs are processed.

If the pack contains <100 objects, it is inflated to loose objects. If
the receiving side (so client in fetch, server in push) already has an
object by that SHA-1, the new object is discarded. If the pack
contains >=100 objects, and the receiving side already has the object,
it is compared byte-for-byte to ensure the incoming copy exactly
matches the already existing copy.

Either way the first object to arrive always wins.


The recipient has to trust that the remote side is providing it
something reasonable. If the recipient has *ZERO* trust in the sender,
then s/he should read the content of all newly arrived objects before
compiling or executing them. This is one reason why Git does not run
hooks that are transported as part of the repository. If the recipient
thinks reading the content is too onerous or impossible, then they
have to make a trust assertion on the remote side.

This trust assertion should be derived from the community, and not so
much around the machine claiming the content is what it says it is. We
have yet to disprove the halting problem, so we have yet to construct
a machine that can verify those Linux kernel sources you downloaded
don't contain a local root exploit (for example). Instead we have to
trust the community of developers and users who work on and run that
code to have confidence that the code works as expected, etc. We base
our trust off reputable people making statements like "Linus kernel at
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git is
pretty good", and "kernel.org is where Linus pushes directly to, so
its reasonable to trust the kernel.org server".

The recipient should have some understanding of the remote server's
security policies, or pay attention to notices posted by others who
are also fetching and reviewing content from the same repository. At
some level, the community using a repository from a given site should
be policing that site and establishing trust that the host is not
providing garbage content. After an incident, it is possible to pick
up again by rebuilding the environment from an already known
repository that people trust. In the case of the recent kernel.org
environment rebuild, that is exactly what they did, the community
picked up again from Linus' personal repository.

DNS could be abused to send you the wrong IP for a site, but most
people don't use random DNS servers, they have some level of trust in
their DNS provider. DNSSEC is helping to improve the security of the
name->IP translation process, and using protocols like HTTPS with SSL
certificate verification can help to reduce the chances that a forged
DNS entry sends you to an evil source, rather than the community
trusted one. (Although SSL certificates seem to be getting forged left
and right these days, so again you can't really rely on strong
cryptography to magically solve security problems when the attacker
holds the private key you have decided to trust with no further
verification.)


But trust aside, consider an object C is sent as a delta to the remote
side. The delta base B is not included in the pack, and is referenced
by SHA-1. When the remote side processed delta C, it looks up a copy
of base B from its own repository. We assume this content of B is
correct, due to the "first to arrive wins" rule, and the community
review/trust/notification process.

The inflated length of B is checked against a size that is stored in
the front of the delta instruction stream that describes C. These
lengths must match exactly, if they do not match then the delta
application aborts, the pack is rejected, and any temporary data is
removed from the filesystem. As Peff pointed out elsewhere in this
thread, the odds of a SHA-1 collision in a project are low, on the
order of 1/(2^80). Although there are some attacks on message digest
functions like MD-5 or SHA-1 that might be possible to generate a
duplicate in 2^57 time, any that I have read require producing a
different length content than the original you are trying to replace.
Assuming the copy of B on the remote system actually inflates and
computes to the correct SHA-1 B, it probably does not also have the
correct length if it is an object with correct hash but wrong content.
So delta application should still be checking for collision with a
1/(2^80) probability.

Assuming the remote's copy of B passed the size check, the delta is
applied on top, and the SHA-1 of the result buffer is computed. The
attacker must craft the delta such that the SHA-1 of C is the result,
otherwise connectivity checks will fail.

Assuming the attacker successfully stores a C' that has the right
SHA-1, but wrong content... the community around that repository will
eventually notice this and message that the source site cannot be
trusted. I refer you back to the statement above about trusting the
site you pull from, or trusting the users that you authorize to push
into a repository.

But thin pack aside, this problem exists in any form of a packed
object. An attacker can try to send object P' (as a non-delta) in
place of P. SHA1(P') = SHA1(P), but the content differs. This is far
easier to construct than the thin pack delta case you think is a
problem, and is the most likely approach for an attacker to take. I
refer you back to the statement above about trusting the site you pull
from, or trusting the users that you authorize to push into a
repository, or reading every object you receive.


Even if you magically fix the hash function somehow to decrease the
odds of collision (e.g. by switching to a member of the SHA-2 or SHA-3
family), there is no way to prevent a bug or root exploit from
entering the project except by never adding new code, or by carefully
reviewing everything that is submitted, and building up a basis of
trust around that review method. It is far more likely for an attacker
to try and submit a plain text patch to the Linux kernel mailing list
that reads completely correct, hashes to the correct SHA-1s when
applied in Git, etc... but just "happens" to contain an off by one
pointer bug in some weird case that allows the attacker to overflow a
critical memory buffer and later inject some code that can later be
used to create an exploit. If they are ever "caught" they may just
claim "I AM A MORON I AM SORRY I MISSED THAT BUFFER CHECK AND SO DID
YOU DURING CODE REVIEW SO ITS NOT ALL MY FAULT LEAVE ME ALONE" and get
away with it.

Trust. Review. Verify.

I don't know about you, but I don't just pull random code from
arbitrary sites on the Internet. Nor do I compile or execute that code
on my workstation. I do trust some individuals based on their
reputation on the Internet, or my past experiences working with their
code. And I also trust some hosting environments like kernel.org, or
GitHub, or code.google.com, to provide reasonably secure hosting, and
to aggressively react to any event that might make it harder for me to
trust the content they supply. And I also read a lot of code that I
pull.

It really isn't the problem you try to claim it is.

^ permalink raw reply

* Re: subversion-perl missing
From: Jason @ 2011-11-29 15:39 UTC (permalink / raw)
  To: git
In-Reply-To: <4E83433E.9000702@gjlay.de>

Hi Johann,
Were you able to get this working?  I'm also getting the "Bad URL passed to RA
layer: Unrecognized URL scheme" error when trying to run "git svn clone".  

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Shawn Pearce @ 2011-11-29 15:23 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, Bill Zaumen, git, gitster, torvalds
In-Reply-To: <CACsJy8BRhoYM+Lb8afp=3rKzYNOEq0JY8uS9mD1ovz3uyJVWOA@mail.gmail.com>

On Tue, Nov 29, 2011 at 05:17, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Tue, Nov 29, 2011 at 4:07 PM, Jeff King <peff@peff.net> wrote:
>> However, it's harder for trees. Each entry needs to have the new digest
>> added, but there simply isn't room in the format. So we would have to
>> change the tree format, breaking interoperability with older versions of
>> git. And all of your tree sha1's would change as soon as you wrote them
>> with the new format. That's only slightly better than just swapping
>> sha1 out for a better algorithm.
>
> I think we could hide the new digest after NUL at the end of path
> name. C Git at least does not seem to care whatever after NUL.

No, you can't. The next byte after the NUL at the end of a path name
is the SHA-1 of that entry. After those 20 bytes of SHA-1 data is
consumed, the "mode SP name\0" of the next entry is parsed.

There is not room in the tree format for any additional data. Period.
At best you can modify the mode value to be a new octal code that is
not recognized by current Git implementations. But that doesn't get
you a whole lot of data, and its pretty risky change because its
rather incompatible.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox