Git development

Git development
 help / color / mirror / Atom feed

* Re: git email From: parsing (was Re: [GIT PULL] Staging/IIO driver patches for 4.11-rc1)
From: Jeff King @ 2017-02-23  6:17 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Simon Sandström, Andrew Morton, git,
	Linux Kernel Mailing List, Linux Driver Project
In-Reply-To: <20170223060444.GA26196@kroah.com>

On Thu, Feb 23, 2017 at 07:04:44AM +0100, Greg KH wrote:

> > Poor Simon Sandström.
> > 
> > Funnily enough, this only exists for one commit. You've got several
> > other commits from Simon that get his name right.
> > 
> > What happened?
> 
> I don't know what happened, I used git for this, I don't use quilt for
> "normal" patches accepted into my trees anymore, only for stable kernel
> work.
> 
> So either the mail is malformed, or git couldn't figure it out, I've
> attached the original message below, and cc:ed the git mailing list.
> 
> Also, Simon emailed me after this was committed saying something went
> wrong, but I couldn't go back and rebase my tree.  Simon, did you ever
> figure out if something was odd on your end?
> 
> Git developers, any ideas?

The problem isn't on the applying end, but rather on the generating end.
The From header in the attached mbox is:

  From: =?us-ascii?B?PT9VVEYtOD9xP1NpbW9uPTIwU2FuZHN0cj1DMz1CNm0/PQ==?= <simon@nikanor.nu>

If you de-base64 that, you get:

  =?UTF-8?q?Simon=20Sandstr=C3=B6m?=

So something double-encoded it before it got to your mbox.

-Peff

^ permalink raw reply

* Re: [PATCH v2] config: reject invalid VAR in 'git -c VAR=VAL command'
From: Jeff King @ 2017-02-23  5:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Stefan Beller
In-Reply-To: <xmqqd1ebfd9l.fsf_-_@gitster.mtv.corp.google.com>

On Tue, Feb 21, 2017 at 01:24:38PM -0800, Junio C Hamano wrote:

> The parsing of one-shot assignments of configuration variables that
> come from the command line historically was quite loose and allowed
> anything to pass.
> 
> The configuration variable names that come from files are validated
> in git_config_parse_source(), which uses get_base_var() that grabs
> the <section> (and subsection) while making sure that <section>
> consists of iskeychar() letters, the function itself that makes sure
> that the first letter in <variable> is isalpha(), and get_value()
> that grabs the remainder of the <variable> name while making sure
> that it consists of iskeychar() letters.
> 
> Perform an equivalent check in canonicalize_config_variable_name()
> to catch invalid configuration variable names that come from the
> command line.

FWIW, the code looks OK here. It is a shame to duplicate the policy
found in git_config_parse_key(), though.

I wonder if we could make a master version of that which canonicalizes
in-place, and then just wrap it for the git_config_parse_key()
interface. Actually, I guess the function you just wrote _is_ that inner
function, as long as it learned about the "quiet" flag.

-Peff

^ permalink raw reply

* Re: [RFC][Git GUI] Make Commit message field in git GUI re sizable.
From: Jessie Hernandez @ 2017-02-23  6:46 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: Bert Wesarg, Git Mailing List, Pat Thoyts
In-Reply-To: <59ab152c-62d1-a2ec-eee5-0bd909674988@xiplink.com>

> On 2017-02-22 06:59 AM, Jessie Hernandez wrote:
>>> HI,
>>>
>>> the reason why it is fixed, is because commit messages should be
>>> wrapped at 76 characters to be used in mails. So it helps you with the
>>> wrapping.
>>>
>>> Bert
>>
>> Right ok. I understand.
>> Knowing this I think I might start writing my commit messages
>> differently
>> then.
>
> You can configure gui.commitMsgWidth if you don't like the default
> (which is 75, not 76).
>
> 		M.

Hi Marc,

Yeah I did that in my local version and got my desired effect.
But knowing it was designed like this and and not a bug,
I am happy to use the designed behavior and adjust my way of working.

-----------------
Jessie Hernandez

>
>
>> Thank you for this.
>>
>> Regards
>>
>> -----------------
>> Jessie Hernandez
>>
>>>
>>>
>>> On Wed, Feb 22, 2017 at 10:27 AM, Jessie Hernandez
>>> <jessie@jessiehernandez.com> wrote:
>>>> Hi all,
>>>>
>>>> I have been using git for a few years now and really like the
>>>> software.
>>>> I have a small annoyance and was wondering if I could get the
>>>> communities
>>>> view on this.
>>>>
>>>> When using git GUI I find it handy to be able to re-size the "Unstaged
>>>> Changes" and the "Staged Changed" fields.
>>>>
>>>> I would like the same thing for the "Commit Message" field, or to have
>>>> it
>>>> re-size with the git GUI window.
>>>>
>>>> I can re-size the "Commit Message" vertically when making the
>>>> "Modified"
>>>> panel smaller.
>>>>
>>>> Does this make sense?
>>>> I would be happy to get into more detail if that is necessary or if
>>>> something is not clear.
>>>>
>>>> Thank you.
>>>>
>>>> -----------------
>>>> Jessie Hernandez
>>>>
>>>>
>>>
>>
>>
>



^ permalink raw reply

* Re: [PATCH v2] config: reject invalid VAR in 'git -c VAR=VAL command'
From: Junio C Hamano @ 2017-02-23  7:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org, Stefan Beller
In-Reply-To: <20170223055831.u3yofkby3c56t7l4@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> FWIW, the code looks OK here. It is a shame to duplicate the policy
> found in git_config_parse_key(), though.
>
> I wonder if we could make a master version of that which canonicalizes
> in-place, and then just wrap it for the git_config_parse_key()
> interface. Actually, I guess the function you just wrote _is_ that inner
> function, as long as it learned about the "quiet" flag.

Hmm, I obviously missed an opportunity.  I thought about doing a
similar thing with the policy in parse-source, but that side didn't
seem worth doing, as the config-parse-source callgraph is quite a
mess (as it has to parse the .ini like format with line breaks and
comments, not the simple "<string>[.<string>].<string>" thing, it
cannot quite avoid it), and it needs to take advantage of _some_
policy to parse the pieces.

We could loosen the policy implemented by config-parse-key interface
(e.g. change config-parse-source to let anything that begins with a
non-whitespace continue to be processed with get_value(), instead of
only allowing string that begins with isalpha(); similarly loosen
get_value() to allow any non-dot non-space string, not just
iskeychar() bytes) and first turn what is read into the simple
"<string>[.<string>].<string>" format, and then reuse the new
"master" logic to validate.  That would allow us to update the
"master" logic to make it tighter or looser to some degree, but the
source parser still needs to hardcode _some_ policy (e.g. the first
level and the last level names begin with a non-space) that allows
it to guess what "<string>" pieces the contents being parsed from
the .ini looking format meant to express.

But you are right.  config-parse-key does have the simpler string
that can just be given to the canonicalize thing and we should be
able to reuse it.

^ permalink raw reply

* Re: git email From: parsing (was Re: [GIT PULL] Staging/IIO driver patches for 4.11-rc1)
From: Simon Sandström @ 2017-02-23  7:59 UTC (permalink / raw)
  To: Jeff King, Greg KH
  Cc: Linus Torvalds, Andrew Morton, git, Linux Kernel Mailing List,
	Linux Driver Project
In-Reply-To: <20170223061702.bzzgrntotppvwdw6@sigill.intra.peff.net>

On Thu, Feb 23, 2017 at 01:17:02AM -0500, Jeff King wrote:
> On Thu, Feb 23, 2017 at 07:04:44AM +0100, Greg KH wrote:
> 
> > 
> > I don't know what happened, I used git for this, I don't use quilt for
> > "normal" patches accepted into my trees anymore, only for stable kernel
> > work.
> > 
> > So either the mail is malformed, or git couldn't figure it out, I've
> > attached the original message below, and cc:ed the git mailing list.
> > 
> > Also, Simon emailed me after this was committed saying something went
> > wrong, but I couldn't go back and rebase my tree.  Simon, did you ever
> > figure out if something was odd on your end?
> > 
> > Git developers, any ideas?
> 
> The problem isn't on the applying end, but rather on the generating end.
> The From header in the attached mbox is:
> 
>   From: =?us-ascii?B?PT9VVEYtOD9xP1NpbW9uPTIwU2FuZHN0cj1DMz1CNm0/PQ==?= <simon@nikanor.nu>
> 
> If you de-base64 that, you get:
> 
>   =?UTF-8?q?Simon=20Sandstr=C3=B6m?=
> 
> So something double-encoded it before it got to your mbox.
> 
> -Peff

Hi,

Yes, Mutt on my end caused this. I used Mutt 1.5.23, and I either had
it misconfigured or it simply can't handle files that contains encoded
From:/Subject:-fields when you run it with mutt -H <file>.

I upgraded to Mutt 1.7.1 which solved the problem.

- Simon

^ permalink raw reply

* Re: possible bug:  inconsistent CLI behaviour for empty user.name
From: Jeff King @ 2017-02-23  8:11 UTC (permalink / raw)
  To: bs.x.ttp; +Cc: git
In-Reply-To: <20170203051309.a737846dd26a6ed8df1e4112@gmx.de>

On Fri, Feb 03, 2017 at 05:13:09AM +0100, bs.x.ttp@recursor.net wrote:

> The problem is that GIT accepts a user.name of " " for some operations
> (for example when doing a simple "git commit"), but does require a
> "non-empty" user.name for others (like git commit --amend and git
> rebase). In case of the latter commands GIT fails with the message
> "fatal: empty ident name (for <email@address>) not allowed".

I think it's a bug. We try to always reject empty usernames, but the
"empty" check is done before we cut off leading and trailing cruft (like
whitespace).

The "--amend" command notices because it actually parses the name out of
the existing commit. That version has already had its whitespace eaten
up (when it was written into the original commit), and so it ends up as
blank.

Here's a series which fixes that along with a few other oddities I
noticed.

  [1/4]: ident: mark error messages for translation
  [2/4]: ident: handle NULL email when complaining of empty name
  [3/4]: ident: reject all-crud ident name
  [4/4]: ident: do not ignore empty config name/email

 ident.c                       | 49 ++++++++++++++++++++++++++-----------------
 t/t7518-ident-corner-cases.sh | 36 +++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 19 deletions(-)
 create mode 100755 t/t7518-ident-corner-cases.sh


-Peff

^ permalink raw reply

* [PATCH 2/4] ident: handle NULL email when complaining of empty name
From: Jeff King @ 2017-02-23  8:13 UTC (permalink / raw)
  To: bs.x.ttp; +Cc: git
In-Reply-To: <20170223081157.hwfn3msfux5udmng@sigill.intra.peff.net>

If we see an empty name, we complain about and mention the
matching email in the error message (to give it some
context). However, the "email" pointer may be NULL here if
we were planning to fill it in later from ident_default_email().

This was broken by 59f929596 (fmt_ident: refactor strictness
checks, 2016-02-04). Prior to that commit, we would look up
the default name and email before doing any other actions.
So one solution would be to go back to that.

However, we can't just do so blindly. The logic for handling
the "!email" condition has grown since then. In particular,
looking up the default email can die if getpwuid() fails,
but there are other errors that should take precedence.
Commit 734c7789a (ident: check for useConfigOnly before
auto-detection of name/email, 2016-03-30) reordered the
checks so that we prefer the error message for
useConfigOnly.

Instead, we can observe that while the name-handling depends
on "email" being set, the reverse is not true. So we can
simply set up the email variable first.

This does mean that if both are bogus, we'll complain about
the email before the name. But between the two, there is no
reason to prefer one over the other.

Signed-off-by: Jeff King <peff@peff.net>
---
 ident.c                       | 26 +++++++++++++-------------
 t/t7518-ident-corner-cases.sh | 20 ++++++++++++++++++++
 2 files changed, 33 insertions(+), 13 deletions(-)
 create mode 100755 t/t7518-ident-corner-cases.sh

diff --git a/ident.c b/ident.c
index dde82983a..ea6034581 100644
--- a/ident.c
+++ b/ident.c
@@ -351,6 +351,19 @@ const char *fmt_ident(const char *name, const char *email,
 	int want_date = !(flag & IDENT_NO_DATE);
 	int want_name = !(flag & IDENT_NO_NAME);

+	if (!email) {
+		if (strict && ident_use_config_only
+		    && !(ident_config_given & IDENT_MAIL_GIVEN)) {
+			fputs(_(env_hint), stderr);
+			die(_("no email was given and auto-detection is disabled"));
+		}
+		email = ident_default_email();
+		if (strict && default_email_is_bogus) {
+			fputs(_(env_hint), stderr);
+			die(_("unable to auto-detect email address (got '%s')"), email);
+		}
+	}
+
 	if (want_name) {
 		int using_default = 0;
 		if (!name) {
@@ -378,19 +391,6 @@ const char *fmt_ident(const char *name, const char *email,
 		}
 	}

-	if (!email) {
-		if (strict && ident_use_config_only
-		    && !(ident_config_given & IDENT_MAIL_GIVEN)) {
-			fputs(_(env_hint), stderr);
-			die(_("no email was given and auto-detection is disabled"));
-		}
-		email = ident_default_email();
-		if (strict && default_email_is_bogus) {
-			fputs(_(env_hint), stderr);
-			die(_("unable to auto-detect email address (got '%s')"), email);
-		}
-	}
-
 	strbuf_reset(&ident);
 	if (want_name) {
 		strbuf_addstr_without_crud(&ident, name);
diff --git a/t/t7518-ident-corner-cases.sh b/t/t7518-ident-corner-cases.sh
new file mode 100755
index 000000000..6c057afc1
--- /dev/null
+++ b/t/t7518-ident-corner-cases.sh
@@ -0,0 +1,20 @@
+#!/bin/sh
+
+test_description='corner cases in ident strings'
+. ./test-lib.sh
+
+# confirm that we do not segfault _and_ that we do not say "(null)", as
+# glibc systems will quietly handle our NULL pointer
+#
+# Note also that we can't use "env" here because we need to unset a variable,
+# and "-u" is not portable.
+test_expect_success 'empty name and missing email' '
+	(
+		sane_unset GIT_AUTHOR_EMAIL &&
+		GIT_AUTHOR_NAME= &&
+		test_must_fail git commit --allow-empty -m foo 2>err &&
+		test_i18ngrep ! null err
+	)
+'
+
+test_done
-- 
2.12.0.rc2.597.g959f68882

^ permalink raw reply related

* [PATCH 1/4] ident: mark error messages for translation
From: Jeff King @ 2017-02-23  8:12 UTC (permalink / raw)
  To: bs.x.ttp; +Cc: git
In-Reply-To: <20170223081157.hwfn3msfux5udmng@sigill.intra.peff.net>

We already translate the big "please tell me who you are"
hint, but missed the individual error messages that go with
it.

Signed-off-by: Jeff King <peff@peff.net>
---
 ident.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/ident.c b/ident.c
index ac4ae02b4..dde82983a 100644
--- a/ident.c
+++ b/ident.c
@@ -357,13 +357,13 @@ const char *fmt_ident(const char *name, const char *email,
 			if (strict && ident_use_config_only
 			    && !(ident_config_given & IDENT_NAME_GIVEN)) {
 				fputs(_(env_hint), stderr);
-				die("no name was given and auto-detection is disabled");
+				die(_("no name was given and auto-detection is disabled"));
 			}
 			name = ident_default_name();
 			using_default = 1;
 			if (strict && default_name_is_bogus) {
 				fputs(_(env_hint), stderr);
-				die("unable to auto-detect name (got '%s')", name);
+				die(_("unable to auto-detect name (got '%s')"), name);
 			}
 		}
 		if (!*name) {
@@ -371,7 +371,7 @@ const char *fmt_ident(const char *name, const char *email,
 			if (strict) {
 				if (using_default)
 					fputs(_(env_hint), stderr);
-				die("empty ident name (for <%s>) not allowed", email);
+				die(_("empty ident name (for <%s>) not allowed"), email);
 			}
 			pw = xgetpwuid_self(NULL);
 			name = pw->pw_name;
@@ -382,12 +382,12 @@ const char *fmt_ident(const char *name, const char *email,
 		if (strict && ident_use_config_only
 		    && !(ident_config_given & IDENT_MAIL_GIVEN)) {
 			fputs(_(env_hint), stderr);
-			die("no email was given and auto-detection is disabled");
+			die(_("no email was given and auto-detection is disabled"));
 		}
 		email = ident_default_email();
 		if (strict && default_email_is_bogus) {
 			fputs(_(env_hint), stderr);
-			die("unable to auto-detect email address (got '%s')", email);
+			die(_("unable to auto-detect email address (got '%s')"), email);
 		}
 	}
 
@@ -403,7 +403,7 @@ const char *fmt_ident(const char *name, const char *email,
 		strbuf_addch(&ident, ' ');
 		if (date_str && date_str[0]) {
 			if (parse_date(date_str, &ident) < 0)
-				die("invalid date format: %s", date_str);
+				die(_("invalid date format: %s"), date_str);
 		}
 		else
 			strbuf_addstr(&ident, ident_default_date());
-- 
2.12.0.rc2.597.g959f68882


^ permalink raw reply related

* [PATCH 3/4] ident: reject all-crud ident name
From: Jeff King @ 2017-02-23  8:15 UTC (permalink / raw)
  To: bs.x.ttp; +Cc: git
In-Reply-To: <20170223081157.hwfn3msfux5udmng@sigill.intra.peff.net>

An ident name consisting of only "crud" characters (like
whitespace or punctuation) is effectively the same as an
empty one, because our strbuf_addstr_without_crud() will
remove those characters.

We reject an empty name when formatting a strict ident, but
don't notice an all-crud one because our check happens
before the crud-removal step.

We could skip past the crud before checking for an empty
name, but let's make it a separate code path, for two
reasons. One is that we can give a more specific error
message. And two is that unlike a blank name, we probably
don't want to kick in the fallback-to-username behavior.

Signed-off-by: Jeff King <peff@peff.net>
---
 ident.c                       | 11 +++++++++++
 t/t7518-ident-corner-cases.sh |  5 +++++
 2 files changed, 16 insertions(+)

diff --git a/ident.c b/ident.c
index ea6034581..ead09ff7f 100644
--- a/ident.c
+++ b/ident.c
@@ -203,6 +203,15 @@ static int crud(unsigned char c)
 		c == '\'';
 }
 
+static int has_non_crud(const char *str)
+{
+	for (; *str; str++) {
+		if (!crud(*str))
+			return 1;
+	}
+	return 0;
+}
+
 /*
  * Copy over a string to the destination, but avoid special
  * characters ('\n', '<' and '>') and remove crud at the end
@@ -389,6 +398,8 @@ const char *fmt_ident(const char *name, const char *email,
 			pw = xgetpwuid_self(NULL);
 			name = pw->pw_name;
 		}
+		if (strict && !has_non_crud(name))
+			die(_("name consists only of disallowed characters: %s"), name);
 	}
 
 	strbuf_reset(&ident);
diff --git a/t/t7518-ident-corner-cases.sh b/t/t7518-ident-corner-cases.sh
index 6c057afc1..3d2560c3c 100755
--- a/t/t7518-ident-corner-cases.sh
+++ b/t/t7518-ident-corner-cases.sh
@@ -17,4 +17,9 @@ test_expect_success 'empty name and missing email' '
 	)
 '
 
+test_expect_success 'commit rejects all-crud name' '
+	test_must_fail env GIT_AUTHOR_NAME=" .;<>" \
+		git commit --allow-empty -m foo
+'
+
 test_done
-- 
2.12.0.rc2.597.g959f68882


^ permalink raw reply related

* [PATCH 4/4] ident: do not ignore empty config name/email
From: Jeff King @ 2017-02-23  8:17 UTC (permalink / raw)
  To: bs.x.ttp; +Cc: git
In-Reply-To: <20170223081157.hwfn3msfux5udmng@sigill.intra.peff.net>

When we read user.name and user.email from a config file,
they go into strbufs. When a caller asks ident_default_name()
for the value, we fallback to auto-detecting if the strbuf
is empty.

That means that explicitly setting an empty string in the
config is identical to not setting it at all. This is
potentially confusing, as we usually accept a configured
value as the final value.

Signed-off-by: Jeff King <peff@peff.net>
---
This one is perhaps questionable. Maybe somebody is relying on setting a
per-repo user.name to override a ~/.gitconfig value and enforce
auto-detection?

 ident.c                       |  4 ++--
 t/t7518-ident-corner-cases.sh | 11 +++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/ident.c b/ident.c
index ead09ff7f..c0364fe3a 100644
--- a/ident.c
+++ b/ident.c
@@ -153,7 +153,7 @@ static void copy_email(const struct passwd *pw, struct strbuf *email,
 
 const char *ident_default_name(void)
 {
-	if (!git_default_name.len) {
+	if (!(ident_config_given & IDENT_NAME_GIVEN) && !git_default_name.len) {
 		copy_gecos(xgetpwuid_self(&default_name_is_bogus), &git_default_name);
 		strbuf_trim(&git_default_name);
 	}
@@ -162,7 +162,7 @@ const char *ident_default_name(void)
 
 const char *ident_default_email(void)
 {
-	if (!git_default_email.len) {
+	if (!(ident_config_given & IDENT_MAIL_GIVEN) && !git_default_email.len) {
 		const char *email = getenv("EMAIL");
 
 		if (email && email[0]) {
diff --git a/t/t7518-ident-corner-cases.sh b/t/t7518-ident-corner-cases.sh
index 3d2560c3c..ef570ac62 100755
--- a/t/t7518-ident-corner-cases.sh
+++ b/t/t7518-ident-corner-cases.sh
@@ -22,4 +22,15 @@ test_expect_success 'commit rejects all-crud name' '
 		git commit --allow-empty -m foo
 '
 
+# We must test the actual error message here, as an unwanted
+# auto-detection could fail for other reasons.
+test_expect_success 'empty configured name does not auto-detect' '
+	(
+		sane_unset GIT_AUTHOR_NAME &&
+		test_must_fail \
+			git -c user.name= commit --allow-empty -m foo 2>err &&
+		test_i18ngrep "empty ident name" err
+	)
+'
+
 test_done
-- 
2.12.0.rc2.597.g959f68882

^ permalink raw reply related

* [PATCH 1/4] t7003: ensure --prune-empty can prune root commit
From: Devin J. Pohly @ 2017-02-23  8:27 UTC (permalink / raw)
  To: git; +Cc: Devin J. Pohly

New test to expose a bug in filter-branch whereby the root commit is
never pruned, even though its tree is empty and --prune-empty is given.

The setup isn't exactly pretty, but I couldn't think of a simpler way to
create a parallel commit graph sans the first commit.

Signed-off-by: Devin J. Pohly <djpohly@gmail.com>
---
 t/t7003-filter-branch.sh | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index cb8fbd8e5..2dfe46250 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -313,6 +313,27 @@ test_expect_success 'Tag name filtering allows slashes in tag names' '
 	git cat-file tag X/2 > actual &&
 	test_cmp expect actual
 '
+test_expect_success 'setup --prune-empty comparisons' '
+	git checkout --orphan master-no-a &&
+	git rm -rf . &&
+	unset test_tick &&
+	test_tick &&
+	GIT_COMMITTER_DATE="@0 +0000" GIT_AUTHOR_DATE="@0 +0000" &&
+	test_commit --notick B B.t B Bx &&
+	git checkout -b branch-no-a Bx &&
+	test_commit D D.t D Dx &&
+	mkdir dir &&
+	test_commit dir/D dir/D.t dir/D dir/Dx &&
+	test_commit E E.t E Ex &&
+	git checkout master-no-a &&
+	test_commit C C.t C Cx &&
+	git checkout branch-no-a &&
+	git merge Cx -m "Merge tag '\''C'\'' into branch" &&
+	git tag Fx &&
+	test_commit G G.t G Gx &&
+	test_commit H H.t H Hx &&
+	git checkout branch
+'
 
 test_expect_success 'Prune empty commits' '
 	git rev-list HEAD > expect &&
@@ -341,6 +362,15 @@ test_expect_success 'prune empty works even without index/tree filters' '
 	test_cmp expect actual
 '
 
+test_expect_success '--prune-empty is able to prune root commit' '
+	git rev-list branch-no-a >expect &&
+	git branch testing H &&
+	git filter-branch -f --prune-empty --index-filter "git update-index --remove A.t" testing &&
+	git rev-list testing >actual &&
+	git branch -D testing &&
+	test_cmp expect actual
+'
+
 test_expect_success '--remap-to-ancestor with filename filters' '
 	git checkout master &&
 	git reset --hard A &&
-- 
2.11.1


^ permalink raw reply related

* [PATCH 2/4] t7003: ensure --prune-empty removes entire branch when applicable
From: Devin J. Pohly @ 2017-02-23  8:27 UTC (permalink / raw)
  To: git; +Cc: Devin J. Pohly
In-Reply-To: <20170223082736.31283-1-djpohly@gmail.com>

Sanity check before changing the logic in git_commit_non_empty_tree.

Signed-off-by: Devin J. Pohly <djpohly@gmail.com>
---
 t/t7003-filter-branch.sh | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 2dfe46250..7cb60799b 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -371,6 +371,13 @@ test_expect_success '--prune-empty is able to prune root commit' '
 	test_cmp expect actual
 '
 
+test_expect_success '--prune-empty is able to prune entire branch' '
+	git branch prune-entire B &&
+	git filter-branch -f --prune-empty --index-filter "git update-index --remove A.t B.t" prune-entire &&
+	test_path_is_missing .git/refs/heads/prune-entire &&
+	test_must_fail git reflog exists refs/heads/prune-entire
+'
+
 test_expect_success '--remap-to-ancestor with filename filters' '
 	git checkout master &&
 	git reset --hard A &&
-- 
2.11.1


^ permalink raw reply related

* [PATCH 3/4] filter-branch: fix --prune-empty on parentless commits
From: Devin J. Pohly @ 2017-02-23  8:27 UTC (permalink / raw)
  To: git; +Cc: Devin J. Pohly
In-Reply-To: <20170223082736.31283-1-djpohly@gmail.com>

Previously, the git_commit_non_empty_tree function would always pass any
commit with no parents to git-commit-tree, regardless of whether the
tree was nonempty.  The new commit would then be recorded in the
filter-branch revision map, and subsequent commits which leave the tree
untouched would be correctly filtered.

With this change, parentless commits with an empty tree are correctly
pruned, and an empty file is recorded in the revision map, signifying
that it was rewritten to "no commits."  This works naturally with the
parent mapping for subsequent commits.

Signed-off-by: Devin J. Pohly <djpohly@gmail.com>
---
 Documentation/git-filter-branch.txt | 14 ++++++--------
 git-filter-branch.sh                |  2 ++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 0a09698c0..6e4bb0220 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -167,14 +167,12 @@ to other tags will be rewritten to point to the underlying commit.
 	project root. Implies <<Remap_to_ancestor>>.
 
 --prune-empty::
-	Some kind of filters will generate empty commits, that left the tree
-	untouched.  This switch allow git-filter-branch to ignore such
-	commits.  Though, this switch only applies for commits that have one
-	and only one parent, it will hence keep merges points. Also, this
-	option is not compatible with the use of `--commit-filter`. Though you
-	just need to use the function 'git_commit_non_empty_tree "$@"' instead
-	of the `git commit-tree "$@"` idiom in your commit filter to make that
-	happen.
+	Some filters will generate empty commits that leave the tree untouched.
+	This option instructs git-filter-branch to remove such commits if they
+	have exactly one or zero non-pruned parents; merge commits will
+	therefore remain intact.  This option cannot be used together with
+	`--commit-filter`, though the same effect can be achieved by using the
+	provided `git_commit_non_empty_tree` function in a commit filter.
 
 --original <namespace>::
 	Use this option to set the namespace where the original commits
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 86b2ff1e0..2b8cdba15 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -46,6 +46,8 @@ git_commit_non_empty_tree()
 {
 	if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
 		map "$3"
+	elif test $# = 1 && test "$1" = 4b825dc642cb6eb9a060e54bf8d69288fbee4904; then
+		:
 	else
 		git commit-tree "$@"
 	fi
-- 
2.11.1


^ permalink raw reply related

* [PATCH 4/4] p7000: add test for filter-branch with --prune-empty
From: Devin J. Pohly @ 2017-02-23  8:27 UTC (permalink / raw)
  To: git; +Cc: Devin J. Pohly
In-Reply-To: <20170223082736.31283-1-djpohly@gmail.com>

Signed-off-by: Devin J. Pohly <djpohly@gmail.com>
---
 t/perf/p7000-filter-branch.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/perf/p7000-filter-branch.sh b/t/perf/p7000-filter-branch.sh
index 15ee5d1d5..b029586cc 100755
--- a/t/perf/p7000-filter-branch.sh
+++ b/t/perf/p7000-filter-branch.sh
@@ -16,4 +16,9 @@ test_perf 'noop filter' '
 	git filter-branch -f base..HEAD
 '
 
+test_perf 'noop prune-empty' '
+	git checkout --detach tip &&
+	git filter-branch -f --prune-empty base..HEAD
+'
+
 test_done
-- 
2.11.1


^ permalink raw reply related

* Re: [PATCH] http(s): automatically try NTLM authentication first
From: Mantas Mikulėnas @ 2017-02-23  9:13 UTC (permalink / raw)
  To: David Turner
  Cc: 'brian m. carlson', 'Junio C Hamano',
	git@vger.kernel.org, Johannes Schindelin, Eric Sunshine,
	Jeff King
In-Reply-To: <b152fad7e79046c5aa6cac9e21066c1c@exmbdft7.ad.twosigma.com>

On 2017-02-23 03:03, David Turner wrote:
> Actually, though, I am not sure this is as bad as it seems, because gssapi
> might protect us.  When I locally tried a fake server, git (libcurl) refused to 
> send my Kerberos credentials because "Server not found in Kerberos 
> database".  I don't have a machine set up with NTLM authentication 
> (because, apparently, that would be insane), so I don't know how to 
> confirm that gssapi would operate off of a whitelist for NTLM as well. 

NTLM and Kerberos work very differently in that regard.

Kerberos is ticket-based so the client *first* has to obtain a ticket
from the domain's KDC, so a malicious server at minimum needs to know
what principal name to provide (i.e. which real server to try
impersonating). And even if it does that, the ticket doesn't contain
crackable hashes, just data encrypted with a key known only to the KDC
and the real server. So the whitelist is only for privacy and/or
performance reasons, I guess?

NTLM is challenge/response without any third party, and yes, it requires
the application to implement its own whitelisting to avoid the security
problems.

-- 
Mantas Mikulėnas <grawity@gmail.com>

^ permalink raw reply

* Re: [PATCH v2 1/4] delete_ref: accept a reflog message argument
From: Duy Nguyen @ 2017-02-23  9:33 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: Junio C Hamano, Jeff King, Git Mailing List
In-Reply-To: <20170221011035.847-2-kyle@kyleam.com>

On Tue, Feb 21, 2017 at 8:10 AM, Kyle Meyer <kyle@kyleam.com> wrote:
> diff --git a/refs.h b/refs.h
> index 9fbff90e7..5880886a7 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -276,8 +276,8 @@ int reflog_exists(const char *refname);
>   * exists, regardless of its old value. It is an error for old_sha1 to
>   * be NULL_SHA1. flags is passed through to ref_transaction_delete().
>   */
> -int delete_ref(const char *refname, const unsigned char *old_sha1,
> -              unsigned int flags);
> +int delete_ref(const char *msg, const char *refname,
> +              const unsigned char *old_sha1, unsigned int flags);

Is it just me who thinks it's weird that msg comes in front here? The
key identity, refname, should be the first argument imo. You'll
probably want to update the comment block above if msg can be NULL. We
have _very_ good documentation in this file, let's keep it uptodate.
-- 
Duy

^ permalink raw reply

* Re: [PATCH v5 01/24] refs.h: add forward declaration for structs used in this file
From: Duy Nguyen @ 2017-02-23  9:26 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Junio C Hamano, Michael Haggerty,
	Johannes Schindelin, Ramsay Jones, David Turner
In-Reply-To: <CAGZ79kZTr924RrvG6RsasmhT1yswN875XShtmrrcJ8ztaAGJUw@mail.gmail.com>

On Thu, Feb 23, 2017 at 1:18 AM, Stefan Beller <sbeller@google.com> wrote:
> On Wed, Feb 22, 2017 at 6:04 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>>  refs.h | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/refs.h b/refs.h
>> index 9fbff90e7..c494b641a 100644
>> --- a/refs.h
>> +++ b/refs.h
>> @@ -1,6 +1,11 @@
>>  #ifndef REFS_H
>>  #define REFS_H
>>
>> +struct object_id;
>> +struct ref_transaction;
>> +struct strbuf;
>> +struct string_list;
>> +
>>  /*
>>   * Resolve a reference, recursively following symbolic refererences.
>>   *
>> @@ -144,7 +149,6 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **ref);
>>   * `ref_transaction_commit` is called.  So `ref_transaction_verify`
>>   * won't report a verification failure until the commit is attempted.
>>   */
>> -struct ref_transaction;
>
> Leaving the detailed comment about ref_transaction dangling?
> I can understand if you don't want to move it with the declaration,
> as you want all declarations terse in a few lines.
> Maybe move the comment to be part of the first large comment
> (The one that you can see in the first hunk, starting with
> " * Resolve a reference, recursively following")

I thought the comment block covered the following declarations too,
not just ref_transaction. But on second read it's not that.
Transaction functions are way down below. I'll probably move
ref_transaction back where it was.
-- 
Duy

^ permalink raw reply

* [ANNOUNCE] Git Rev News edition 24
From: Christian Couder @ 2017-02-23  9:41 UTC (permalink / raw)
  To: git
  Cc: lwn, Thomas Ferris Nicolaisen, Jakub Narebski, Markus Jansen,
	Junio C Hamano, Johannes Schindelin, Jeff King, Stefan Beller,
	Pranit Bauva, Michael J Gruber, Jacob Vosmaer,
	Ævar Arnfjörð Bjarmason, Josh Triplett,
	Erik van Zijst, Brendan Forster, Stefan Saasen, David Turner,
	Nguyen Thai Ngoc Duy, Shawn O. Pierce, Carlos Martín Nieto,
	Charles Bailey, Karen Sandler, Tim Pettersen, Lars Schneider,
	Ben Peart, Cornelius Schumacher

Hi everyone,

The 24th edition of Git Rev News is now published:

https://git.github.io/rev_news/2017/02/22/edition-24/

Thanks a lot to all the contributors and helpers!

Enjoy,
Christian, Thomas, Jakub and Markus.

^ permalink raw reply

* [PATCH v5 1/1] config: add conditional include
From: Nguyễn Thái Ngọc Duy @ 2017-02-23 12:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, sschuberth, Matthieu Moy,
	Nguyễn Thái Ngọc Duy
In-Reply-To: <20170223122346.12222-1-pclouds@gmail.com>

This allows some more flexibility in managing configuration across
repositories. The most often seen use case on the mailing list is when
the user needs to use different email addresses on different
repositories. If these repositories share something that we can use to
group them up, then we can set the same configuration for groups
automatically.

In this patch, the only supported grouping is based on $GIT_DIR (in
absolute path), so you would need to group repositories by directory, or
something like that to take advantage of it.

We already have include.path for unconditional includes. This patch goes
with include-if.xxx.path to make it clearer that a condition is
required.

Similar to include.path, older git versions that don't understand
include-if will simply ignore them.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt  |  54 +++++++++++++++++++++++++
 config.c                  | 100 ++++++++++++++++++++++++++++++++++++++++++++++
 t/t1305-config-include.sh |  56 ++++++++++++++++++++++++++
 3 files changed, 210 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 015346c417..8cadf2b776 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -91,6 +91,49 @@ found at the location of the include directive. If the value of the
 relative to the configuration file in which the include directive was
 found.  See below for examples.
 
+Conditional includes
+~~~~~~~~~~~~~~~~~~~~
+
+You can include one config file from another conditionally by setting
+a special `include-if.<condition>.path` variable to the name of the
+file to be included. The variable is treated the same way as
+`include.path`.
+
+The condition starts with a keyword, followed by a colon and a
+pattern. The interpretation of the pattern depends on the keyword.
+Supported keywords are:
+
+`gitdir`::
+	The environment variable `GIT_DIR` must match the following
+	pattern for files to be included. The pattern can contain
+	standard globbing wildcards and two additional ones, `**/` and
+	`/**`, that can match multiple path components. Please refer
+	to linkgit:gitignore[5] for details. For convenience:
+
+ * If the pattern starts with `~/`, `~` will be substituted with the
+   content of the environment variable `HOME`.
+
+ * If the pattern starts with `./`, it is replaced with the directory
+   containing the current config file.
+
+ * If the pattern does not start with either `~/`, `./` or `/`, `**/`
+   will be automatically prepended. For example, the pattern `foo/bar`
+   becomes `**/foo/bar` and would match `/any/path/to/foo/bar`.
+
+ * If the pattern ends with `/`, `**` will be automatically added. For
+   example, the pattern `foo/` becomes `foo/**`. In other words, it
+   matches "foo" and everything inside, recursively.
+
+`gitdir/i`::
+	This is the same as `gitdir` except that matching is done
+	case-insensitively (e.g. on case-insensitive file sytems)
+
+A few more notes on matching:
+
+ * Symlinks in `$GIT_DIR` are not resolved before matching.
+
+ * Note that "../" is not special and will match literally, which is
+   unlikely what you want.
 
 Example
 ~~~~~~~
@@ -119,6 +162,17 @@ Example
 		path = foo ; expand "foo" relative to the current file
 		path = ~/foo ; expand "foo" in your `$HOME` directory
 
+	; include if $GIT_DIR is /path/to/foo/.git
+	[include-if "gitdir:/path/to/foo/.git"]
+		path = /path/to/foo.inc
+
+	; include for all repositories inside /path/to/group
+	[include-if "gitdir:/path/to/group/"]
+		path = /path/to/foo.inc
+
+	; include for all repositories inside $HOME/to/group
+	[include-if "gitdir:~/to/group/"]
+		path = /path/to/foo.inc
 
 Values
 ~~~~~~
diff --git a/config.c b/config.c
index c6b874a7bf..3090fbf1a8 100644
--- a/config.c
+++ b/config.c
@@ -13,6 +13,7 @@
 #include "hashmap.h"
 #include "string-list.h"
 #include "utf8.h"
+#include "dir.h"
 
 struct config_source {
 	struct config_source *prev;
@@ -170,9 +171,101 @@ static int handle_path_include(const char *path, struct config_include_data *inc
 	return ret;
 }
 
+static int prepare_include_condition_pattern(struct strbuf *pat)
+{
+	struct strbuf path = STRBUF_INIT;
+	int prefix = 0;
+
+	/* TODO: maybe support ~user/ too */
+	if (pat->buf[0] == '~' && is_dir_sep(pat->buf[1])) {
+		const char *home = getenv("HOME");
+
+		if (!home)
+			return error(_("$HOME is not defined"));
+
+		strbuf_add_absolute_path(&path, home);
+		strbuf_splice(pat, 0, 1, path.buf, path.len);
+		prefix = path.len + 1 /*slash*/;
+	} else if (pat->buf[0] == '.' && is_dir_sep(pat->buf[1])) {
+		const char *slash;
+
+		if (!cf || !cf->path)
+			return error(_("relative config include "
+				       "conditionals must come from files"));
+
+		/* TODO: escape wildcards */
+		strbuf_add_absolute_path(&path, cf->path);
+		slash = find_last_dir_sep(path.buf);
+		if (!slash)
+			die("BUG: how is this possible?");
+		strbuf_splice(pat, 0, 1, path.buf, slash - path.buf);
+		prefix = slash - path.buf + 1 /* slash */;
+	} else if (!is_absolute_path(pat->buf))
+		strbuf_insert(pat, 0, "**/", 3);
+
+	if (pat->len && is_dir_sep(pat->buf[pat->len - 1]))
+		strbuf_addstr(pat, "**");
+
+	strbuf_release(&path);
+	return prefix;
+}
+
+static int include_by_gitdir(const char *cond, size_t cond_len, int icase)
+{
+	struct strbuf text = STRBUF_INIT;
+	struct strbuf pattern = STRBUF_INIT;
+	int ret = 0, prefix;
+
+	strbuf_add_absolute_path(&text, get_git_dir());
+	strbuf_add(&pattern, cond, cond_len);
+	prefix = prepare_include_condition_pattern(&pattern);
+
+	if (prefix < 0)
+		goto done;
+
+	if (prefix > 0) {
+		/*
+		 * perform literal matching on the prefix part so that
+		 * any wildcard character in it can't create side effects.
+		 */
+		if (text.len < prefix)
+			goto done;
+		if (!icase && strncmp(pattern.buf, text.buf, prefix))
+			goto done;
+		if (icase && strncasecmp(pattern.buf, text.buf, prefix))
+			goto done;
+	}
+
+	ret = !wildmatch(pattern.buf + prefix, text.buf + prefix,
+			 icase ? WM_CASEFOLD : 0, NULL);
+
+done:
+	strbuf_release(&pattern);
+	strbuf_release(&text);
+	return ret;
+}
+
+static int include_condition_is_true(const char *cond, size_t cond_len)
+{
+	/* no condition (i.e., "include.path") is always true */
+	if (!cond)
+		return 1;
+
+	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
+		return include_by_gitdir(cond, cond_len, 0);
+	else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len))
+		return include_by_gitdir(cond, cond_len, 1);
+
+	error(_("unrecognized include condition: %.*s"), (int)cond_len, cond);
+	/* unknown conditionals are always false */
+	return 0;
+}
+
 int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
+	const char *cond, *key;
+	int cond_len;
 	int ret;
 
 	/*
@@ -185,6 +278,13 @@ int git_config_include(const char *var, const char *value, void *data)
 
 	if (!strcmp(var, "include.path"))
 		ret = handle_path_include(value, inc);
+
+	if (!parse_config_key(var, "include-if", &cond, &cond_len, &key) &&
+	    include_condition_is_true(cond, cond_len)) {
+		if (!strcmp(key, "path"))
+			ret = handle_path_include(value, inc);
+		/* else we do not know about this type of include; ignore */
+	}
 	return ret;
 }
 
diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
index 9ba2ba11c3..d5b586e270 100755
--- a/t/t1305-config-include.sh
+++ b/t/t1305-config-include.sh
@@ -152,6 +152,62 @@ test_expect_success 'relative includes from stdin line fail' '
 	test_must_fail git config --file - test.one
 '
 
+test_expect_success 'conditional include, both unanchored' '
+	git init foo &&
+	(
+		cd foo &&
+		echo "[include-if \"gitdir:foo/\"]path=bar" >>.git/config &&
+		echo "[test]one=1" >.git/bar &&
+		echo 1 >expect &&
+		git config test.one >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'conditional include, $HOME expansion' '
+	(
+		cd foo &&
+		echo "[include-if \"gitdir:~/foo/\"]path=bar2" >>.git/config &&
+		echo "[test]two=2" >.git/bar2 &&
+		echo 2 >expect &&
+		git config test.two >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'conditional include, full pattern' '
+	(
+		cd foo &&
+		echo "[include-if \"gitdir:**/foo/**\"]path=bar3" >>.git/config &&
+		echo "[test]three=3" >.git/bar3 &&
+		echo 3 >expect &&
+		git config test.three >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'conditional include, relative path' '
+	echo "[include-if \"gitdir:./foo/.git\"]path=bar4" >>.gitconfig &&
+	echo "[test]four=4" >bar4 &&
+	(
+		cd foo &&
+		echo 4 >expect &&
+		git config test.four >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'conditional include, both unanchored, icase' '
+	(
+		cd foo &&
+		echo "[include-if \"gitdir/i:FOO/\"]path=bar5" >>.git/config &&
+		echo "[test]five=5" >.git/bar5 &&
+		echo 5 >expect &&
+		git config test.five >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'include cycles are detected' '
 	cat >.gitconfig <<-\EOF &&
 	[test]value = gitconfig
-- 
2.11.0.157.gd943d85


^ permalink raw reply related

* [PATCH v5 0/1] Conditional config include
From: Nguyễn Thái Ngọc Duy @ 2017-02-23 12:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, sschuberth, Matthieu Moy,
	Nguyễn Thái Ngọc Duy

Let's try this again. v4 and before can be found in the original
thread [1]. The remaining issues of v4 were

On Fri, Aug 19, 2016 at 8:54 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Aug 13, 2016 at 03:40:59PM +0700, Duy Nguyen wrote:
>
>> Ping..
>
> There was some discussion after v4. I think the open issues are:
>
>   - the commit message is rather terse (it should describe motivation,
>     and can refer to the docs for the "how")

Fixed

>
>   - the syntax might be more clear as:
>
>        [include-if "gitdir:..."]
>
>     or
>
>        [include "gitdir-is:..."]

I'll go with include-if. It feels a bit awkward to add -is to every
new type of condition.

>   - there is an open question of whether we would like to go this route,
>     maintaining backwards compatibility syntactically (and thus having
>     these includes quietly skipped in older versions), or introduce a
>     breaking syntax that could be more convenient, like:
>
>       [if-gitdir(...):section]
>       conditional-but-no-include-necessary = true
>
>     I do not have a strong opinion myself, though I had written the
>     original [include] assuming syntactic compatibility, and I think it
>     has worked out (e.g., you can manipulate include.path via "git
>     config" just as you would any other key).

I'll go with the what we have done, at least it's proven not provoking
too bad effects. So no breaking syntax.

[1] http://public-inbox.org/git/20160714153311.2166-1-pclouds@gmail.com/

Nguyễn Thái Ngọc Duy (1):
  config: add conditional include

 Documentation/config.txt  |  54 +++++++++++++++++++++++++
 config.c                  | 100 ++++++++++++++++++++++++++++++++++++++++++++++
 t/t1305-config-include.sh |  56 ++++++++++++++++++++++++++
 3 files changed, 210 insertions(+)

-- 
2.11.0.157.gd943d85


^ permalink raw reply

* RE: [RFC] Add support for downloading blobs on demand
From: Ben Peart @ 2017-02-23 15:39 UTC (permalink / raw)
  To: 'Christian Couder'
  Cc: 'Jeff King', 'git', 'Johannes Schindelin',
	'Ben Peart'
In-Reply-To: <002701d2816e$f4682fa0$dd388ee0$@gmail.com>

I've completed the work of switching our read_object proposal to use a 
background process (refactored from the LFS code) and have extricated it
from the rest of our GVFS fork so that it can be examined/tested
separately.  It is currently based on a Git For Windows fork that I've
pushed to GitHub for anyone who is interested in viewing it at:

https://github.com/benpeart/git/tree/read-object-process

After some additional conversations with Christian, we're working to
combine our RFC/patch series into a single solution that should meet the
requirements of both.

The combined solution needs to have an "info" function which requests
info about a single object instead of a "have" function which must
return information on all objects the ODB knows as this doesn't scale
when the number of objects is large.  

This means the "info" call has to be fast so spawning a process on every
call won't work.  The background process with a versioned interface that
allows you to negotiate capabilities should solve this problem.

Ben

> -----Original Message-----
> From: Ben Peart [mailto:peartben@gmail.com]
> Sent: Tuesday, February 7, 2017 1:21 PM
> To: 'Christian Couder' <christian.couder@gmail.com>
> Cc: 'Jeff King' <peff@peff.net>; 'git' <git@vger.kernel.org>; 'Johannes
> Schindelin' <Johannes.Schindelin@gmx.de>; Ben Peart
> <benpeart@microsoft.com>
> Subject: RE: [RFC] Add support for downloading blobs on demand
> 
> No worries about a late response, I'm sure this is the start of a long
> conversation. :)
> 
> > -----Original Message-----
> > From: Christian Couder [mailto:christian.couder@gmail.com]
> > Sent: Sunday, February 5, 2017 9:04 AM
> > To: Ben Peart <peartben@gmail.com>
> > Cc: Jeff King <peff@peff.net>; git <git@vger.kernel.org>; Johannes
> > Schindelin <Johannes.Schindelin@gmx.de>
> > Subject: Re: [RFC] Add support for downloading blobs on demand
> >
> > (Sorry for the late reply and thanks to Dscho for pointing me to this
> > thread.)
> >
> > On Tue, Jan 17, 2017 at 10:50 PM, Ben Peart <peartben@gmail.com>
> wrote:
> > >> From: Jeff King [mailto:peff@peff.net] On Fri, Jan 13, 2017 at
> > >> 10:52:53AM -0500, Ben Peart wrote:
> > >>
> > >> > Clone and fetch will pass a  --lazy-clone  flag (open to a better
> > >> > name
> > >> > here) similar to  --depth  that instructs the server to only
> > >> > return commits and trees and to ignore blobs.
> > >> >
> > >> > Later during git operations like checkout, when a blob cannot be
> > >> > found after checking all the regular places (loose, pack,
> > >> > alternates, etc), git will download the missing object and place
> > >> > it into the local object store (currently as a loose object) then
> > >> > resume the
> > operation.
> > >>
> > >> Have you looked at the "external odb" patches I wrote a while ago,
> > >> and which Christian has been trying to resurrect?
> > >>
> > >>
> > >>
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpub
> > >> li
> > >> c-
> > >> inbox.org%2Fgit%2F20161130210420.15982-1-
> > >>
> >
> chriscool%40tuxfamily.org%2F&data=02%7C01%7CBen.Peart%40microsoft.c
> > >>
> >
> om%7C9596d3bf32564f123e0c08d43f08a9e1%7C72f988bf86f141af91ab2d7c
> > >>
> >
> d011db47%7C1%7C0%7C636202753822020527&sdata=a6%2BGOAQoRhjFoxS
> > >> vftY8JZAVUssmrXuDZ9OBy3xqNZk%3D&reserved=0
> > >>
> > >> This is a similar approach, though I pushed the policy for "how do
> > >> you get the objects" out into an external script. One advantage
> > >> there is that large objects could easily be fetched from another
> > >> source entirely (e.g., S3 or equivalent) rather than the repo itself.
> > >>
> > >> The downside is that it makes things more complicated, because a
> > >> push or a fetch now involves three parties (server, client, and the
> > >> alternate object store). So questions like "do I have all the
> > >> objects I need" are hard to reason about.
> > >>
> > >> If you assume that there's going to be _some_ central Git repo
> > >> which has all of the objects, you might as well fetch from there
> > >> (and do it over normal git protocols). And that simplifies things a
> > >> bit, at the cost of
> > being less flexible.
> > >
> > > We looked quite a bit at the external odb patches, as well as lfs
> > > and even using alternates.  They all share a common downside that
> > > you must maintain a separate service that contains _some_ of the files.
> >
> > Pushing the policy for "how do you get the objects" out into an
> > external helper doesn't mean that the external helper cannot use the main
> service.
> > The external helper is still free to do whatever it wants including
> > calling the main service if it thinks it's better.
> 
> That is a good point and you're correct, that means you can avoid having to
> build out multiple services.
> 
> >
> > > These
> > > files must also be versioned, replicated, backed up and the service
> > > itself scaled out to handle the load.  As you mentioned, having
> > > multiple services involved increases flexability but it also
> > > increases the complexity and decreases the reliability of the
> > > overall version control service.
> >
> > About reliability, I think it depends a lot on the use case. If you
> > want to get very big files over an unreliable connection, it can
> > better if you send those big files over a restartable protocol and
> > service like HTTP/S on a regular web server.
> >
> 
> My primary concern about reliability was the multiplicative effect of making
> multiple requests across multiple servers to complete a single request.
> Having putting this all in a single service like you suggested above brings us
> back to parity on the complexity.
> 
> > > For operational simplicity, we opted to go with a design that uses a
> > > single, central git repo which has _all_ the objects and to focus on
> > > enhancing it to handle large numbers of files efficiently.  This
> > > allows us to focus our efforts on a great git service and to avoid
> > > having to build out these other services.
> >
> > Ok, but I don't think it prevents you from using at least some of the
> > same mechanisms that the external odb series is using.
> > And reducing the number of mechanisms in Git itself is great for its
> > maintainability and simplicity.
> 
> I completely agree with the goal of reducing the number of mechanisms in
> Git itself.  Our proposal is primarily targeting speeding operations when
> dealing with large numbers of files.  ObjectDB is primarily targeting large
> objects but there is a lot of similarity in how we're approaching the solution.
> I hope/believe we can come to a common solution that will solve both.
> 
> >
> > >> > To prevent git from accidentally downloading all missing blobs,
> > >> > some git operations are updated to be aware of the potential for
> > missing blobs.
> > >> > The most obvious being check_connected which will return success
> > >> > as if everything in the requested commits is available locally.
> > >>
> > >> Actually, Git is pretty good about trying not to access blobs when
> > >> it doesn't need to. The important thing is that you know enough
> > >> about the blobs to fulfill has_sha1_file() and sha1_object_info()
> > >> requests without actually fetching the data.
> > >>
> > >> So the client definitely needs to have some list of which objects
> > >> exist, and which it _could_ get if it needed to.
> >
> > Yeah, and the external odb series handles that already, thanks to
> > Peff's initial work.
> >
> 
> I'm currently working on a patch series that will reimplement our current
> read-object hook to use the LFS model for long running background
> processes.  As part of that, I am building a versioned interface that will
> support multiple commands (like get, have, put).  In my initial
> implementation, I'm only supporting the "get" verb as that is what we
> currently need but my intent is to build it so that we could add have and put
> in future versions.  When I have the first iteration ready, I'll push it up to our
> fork on github for review as code is clearer than my description in email.
> 
> Moving forward, the "have" verb is a little problematic as we would "have"
> 3+ million shas that we'd be required to fetch from the server and then pass
> along to git when requested.  It would be nice to come up with a way to
> avoid or reduce that cost.
> 
> > >> The one place you'd probably want to tweak things is in the diff
> > >> code, as a single "git log -Sfoo" would fault in all of the blobs.
> > >
> > > It is an interesting idea to explore how we could be smarter about
> > > preventing blobs from faulting in if we had enough info to fulfill
> > > has_sha1_file() and sha1_object_info().  Given we also heavily prune
> > > the working directory using sparse-checkout, this hasn't been our
> > > top focus but it is certainly something worth looking into.
> >
> > The external odb series doesn't handle preventing blobs from faulting
> > in yet, so this could be a common problem.
> >
> 
> Agreed.  This is one we've been working on quite a bit out of necessity.  If
> you look at our patch series, most of the changes are related to dealing with
> missing objects.
> 
> > [...]
> >
> > >> One big hurdle to this approach, no matter the protocol, is how you
> > >> are going to handle deltas. Right now, a git client tells the
> > >> server "I have this commit, but I want this other one". And the
> > >> server knows which objects the client has from the first, and which
> > >> it needs from the second. Moreover, it knows that it can send
> > >> objects in delta form directly from disk if the other side has the delta
> base.
> > >>
> > >> So what happens in this system? We know we don't need to send any
> > >> blobs in a regular fetch, because the whole idea is that we only
> > >> send blobs on demand. So we wait for the client to ask us for blob
> > >> A. But then what do we send? If we send the whole blob without
> > >> deltas, we're going to waste a lot of bandwidth.
> > >>
> > >> The on-disk size of all of the blobs in linux.git is ~500MB. The
> > >> actual data size is ~48GB. Some of that is from zlib, which you get
> > >> even for non-deltas. But the rest of it is from the delta
> > >> compression. I don't think it's feasible to give that up, at least
> > >> not for "normal" source repos like linux.git (more on that in a minute).
> > >>
> > >> So ideally you do want to send deltas. But how do you know which
> > >> objects the other side already has, which you can use as a delta
> > >> base? Sending the list of "here are the blobs I have" doesn't scale.
> > >> Just the sha1s start to add up, especially when you are doing
> > >> incremental
> > fetches.
> >
> > To initialize some paths that the client wants, it could perhaps just
> > ask for some pack files, or maybe bundle files, related to these paths.
> > Those packs or bundles could be downloaded either directly from the
> > main server or from other web or proxy servers.
> >
> > >> I think this sort of things performs a lot better when you just
> > >> focus on large objects. Because they don't tend to delta well
> > >> anyway, and the savings are much bigger by avoiding ones you don't
> > >> want. So a directive like "don't bother sending blobs larger than
> > >> 1MB" avoids a lot of these issues. In other words, you have some
> > >> quick shorthand to communicate between the client and server: this
> > >> what I have, and what I
> > don't.
> > >> Normal git relies on commit reachability for that, but there are
> > >> obviously other dimensions. The key thing is that both sides be
> > >> able to express the filters succinctly, and apply them efficiently.
> > >
> > > Our challenge has been more the sheer _number_ of files that exist
> > > in the repo rather than the _size_ of the files in the repo.  With
> > > >3M source files and any typical developer only needing a small
> > > percentage of those files to do their job, our focus has been
> > > pruning the tree as much as possible such that they only pay the
> > > cost for the files they actually need.  With typical text source
> > > files being 10K - 20K in size, the overhead of the round trip is a
> > > significant part of the overall transfer time so deltas don't help
> > > as much.  I agree that large files are also a problem but it isn't my top
> focus at this point in time.
> >
> > Ok, but it would be nice if both problems could be solved using some
> > common mechanisms.
> > This way it could probably work better in situations where there are
> > both a large number of files _and_ some big files.
> > And from what I am seeing, there could be no real downside from using
> > some common mechanisms.
> >
> 
> Agree completely.  I'm hopeful that we can come up with some common
> mechanisms that will allow us to solve both problems.
> 
> > >> If most of your benefits are not from avoiding blobs in general,
> > >> but rather just from sparsely populating the tree, then it sounds
> > >> like sparse clone might be an easier path forward. The general idea
> > >> is to restrict not just the checkout, but the actual object
> > >> transfer and reachability (in the tree dimension, the way shallow
> > >> clone limits it in the time dimension, which will require
> > >> cooperation between the client
> > and server).
> > >>
> > >> So that's another dimension of filtering, which should be expressed
> > >> pretty
> > >> succinctly: "I'm interested in these paths, and not these other
> > >> ones." It's pretty easy to compute on the server side during graph
> > >> traversal (though it interacts badly with reachability bitmaps, so
> > >> there would need to be some hacks there).
> > >>
> > >> It's an idea that's been talked about many times, but I don't
> > >> recall that there were ever working patches. You might dig around
> > >> in the list archive under the name "sparse clone" or possibly "narrow
> clone".
> > >
> > > While a sparse/narrow clone would work with this proposal, it isn't
> > > required.  You'd still probably want all the commits and trees but
> > > the clone would also bring down the specified blobs.  Combined with
> > > using "depth" you could further limit it to those blobs at tip.
> > >
> > > We did run into problems with this model however as our usage
> > > patterns are such that our working directories often contain very
> > > sparse trees and as a result, we can end up with thousands of
> > > entries in the sparse checkout file.  This makes it difficult for
> > > users to manually specify a sparse-checkout before they even do a
> > > clone.  We have implemented a hashmap based sparse-checkout to deal
> > > with the performance issues of having that many entries but that's a
> > > different RFC/PATCH.  In short, we found that a "lazy-clone" and
> > > downloading blobs on demand provided a better developer experience.
> >
> > I think both ways are possible using the external odb mechanism.
> >
> > >> > Future Work
> > >> > ~~~~~~~~~~~
> > >> >
> > >> > The current prototype calls a new hook proc in
> > >> > sha1_object_info_extended and read_object, to download each
> > >> > missing blob.  A better solution would be to implement this via a
> > >> > long running process that is spawned on the first download and
> > >> > listens for requests to download additional objects until it
> > >> > terminates when the parent git operation exits (similar to the
> > >> > recent long running smudge and clean filter
> > >> work).
> > >>
> > >> Yeah, see the external-odb discussion. Those prototypes use a
> > >> process per object, but I think we all agree after seeing how the
> > >> git-lfs interface has scaled that this is a non-starter. Recent
> > >> versions of git-lfs do the single- process thing, and I think any
> > >> sort of external-odb hook should be modeled on that protocol.
> >
> > I agree that the git-lfs scaling work is great, but I think it's not
> > necessary in the external odb work to have the same kind of
> > single-process protocol from the beginning (though it should be possible
> and easy to add it).
> > For example if the external odb work can be used or extended to handle
> > restartable clone by downloading a single bundle when cloning, this
> > would not need that kind of protocol.
> >
> > > I'm looking into this now and plan to re-implement it this way
> > > before sending out the first patch series.  Glad to hear you think
> > > it is a good protocol to model it on.
> >
> > Yeah, for your use case on Windows, it looks really worth it to use
> > this kind of protocol.
> >
> > >> > Need to investigate an alternate batching scheme where we can
> > >> > make a single request for a set of "related" blobs and receive
> > >> > single a packfile (especially during checkout).
> > >>
> > >> I think this sort of batching is going to be the really hard part
> > >> to retrofit onto git. Because you're throwing out the procedural
> > >> notion that you can loop over a set of objects and ask for each
> individually.
> > >> You have to start deferring computation until answers are ready.
> > >> Some operations can do that reasonably well (e.g., checkout), but
> > >> something like "git log -p" is constantly digging down into history.
> > >> I suppose you could just perform the skeleton of the operation
> > >> _twice_, once to find the list of objects to fault in, and the
> > >> second time to
> > actually do it.
> >
> > In my opinion, perhaps we can just prevent "git log -p" from faulting
> > in blobs and have it show a warning saying that it was performed only
> > on a subset of all the blobs.
> >
> 
> You might be surprised at how many other places end up faulting in blobs. :)
> Rename detection is one we've recently been working on.
> 
> > [...]



^ permalink raw reply

* SHAttered (the first practical SHA1 attack)
From: Santiago Torres @ 2017-02-23 15:50 UTC (permalink / raw)
  To: Git

[-- Attachment #1: Type: text/plain, Size: 321 bytes --]

Hello all,

I ran into this website presenting the "first practical attack on
sha1"[1]. I don't recall seeing this on the ML, so I'm sharing this just
in case. I know there are proposals to move out of sha1 already. I
wonder if this affects the timeline for their adoption?

Thanks,
-Santiago.

[1] https://shattered.io/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* RE: [PATCH 2/2] http: add an "auto" mode for http.emptyauth
From: David Turner @ 2017-02-23 16:31 UTC (permalink / raw)
  To: 'Jeff King'
  Cc: Junio C Hamano, git@vger.kernel.org, sandals@crustytoothpaste.net,
	Johannes Schindelin, Eric Sunshine
In-Reply-To: <20170223013746.lturqad7lnehedb4@sigill.intra.peff.net>



> -----Original Message-----
> From: Jeff King [mailto:peff@peff.net]
> Sent: Wednesday, February 22, 2017 8:38 PM
> To: David Turner <David.Turner@twosigma.com>
> Cc: Junio C Hamano <gitster@pobox.com>; git@vger.kernel.org;
> sandals@crustytoothpaste.net; Johannes Schindelin
> <johannes.schindelin@gmx.de>; Eric Sunshine <sunshine@sunshineco.com>
> Subject: Re: [PATCH 2/2] http: add an "auto" mode for http.emptyauth
> 
> On Thu, Feb 23, 2017 at 01:16:33AM +0000, David Turner wrote:
> 
> > I don't know enough about how libcurl handles authentication to know
> > whether these patches are a good idea, but I have a minor comment
> anyway.
> 
> As somebody who is using non-Basic auth, can you apply these patches and
> show us the output of:
> 
>    GIT_TRACE_CURL=1 \
>    git ls-remote https://your-server 2>&1 >/dev/null |
>    egrep '(Send|Recv) header: (GET|HTTP|Auth)'
> 
> (without http.emptyauth turned on, obviously).

The results appear to be identical with and without
the patch.  With http.emptyauth turned off,
16:27:28.208924 http.c:524              => Send header: GET /info/refs?service=git-upload-pack HTTP/1.1
16:27:28.212872 http.c:524              <= Recv header: HTTP/1.1 401 Authorization Required
Username for 'http://git': [I just pressed enter]
Password for 'http://git': [ditto]
16:27:29.928872 http.c:524              => Send header: GET /info/refs?service=git-upload-pack HTTP/1.1
16:27:29.929787 http.c:524              <= Recv header: HTTP/1.1 401 Authorization Required

(if someone else wants to replicate this, delete >/dev/null bit 
from Jeff's shell snippet)



^ permalink raw reply

* SHA1 collisions found
From: Joey Hess @ 2017-02-23 16:43 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 748 bytes --]

https://shattered.io/static/shattered.pdf
https://freedom-to-tinker.com/2017/02/23/rip-sha-1/

IIRC someone has been working on parameterizing git's SHA1 assumptions
so a repository could eventually use a more secure hash. How far has
that gotten? There are still many "40" constants in git.git HEAD.

In the meantime, git commit -S, and checks that commits are signed,
seems like the only way to mitigate against attacks such as
the ones described in the threads at
https://joeyh.name/blog/sha-1/ and
https://joeyh.name/blog/entry/size_of_the_git_sha1_collision_attack_surface/

Since we now have collisions in valid PDF files, collisions in valid git
commit and tree objects are probably able to be constructed.

-- 
see shy jo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: SHA1 collisions found
From: Junio C Hamano @ 2017-02-23 17:02 UTC (permalink / raw)
  To: Joey Hess; +Cc: Git Mailing List
In-Reply-To: <20170223164306.spg2avxzukkggrpb@kitenet.net>

On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <id@joeyh.name> wrote:
>
> Since we now have collisions in valid PDF files, collisions in valid git
> commit and tree objects are probably able to be constructed.

That may be true, but
https://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901@ppc970.osdl.org/

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox