Git development
 help / color / mirror / Atom feed
* Re: [RFC PATCH] GIT-VERSION-GEN: set --abbrev=9 to match auto-scaling
From: Ramsay Jones @ 2016-12-06 20:43 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: GIT Mailing-list
In-Reply-To: <20161206182648.sxoftkd4hjhvenaf@sigill.intra.peff.net>



On 06/12/16 18:26, Jeff King wrote:
> On Tue, Dec 06, 2016 at 10:17:55AM -0800, Junio C Hamano wrote:
> 
>> Yup, that is what I meant to say with "that is already possible" and
>> we are on the same page.  As all three of us seem to be happy with
>> just dropping --abbrev and letting describe do its default thing (as
>> configured by whoever is doing the build), let's do so.
>>
>> -- >8 --
>> From: Ramsay Jones <ramsay@ramsayjones.plus.com>
>> Date: Sun, 4 Dec 2016 20:45:59 +0000
>> Subject: [PATCH] GIT-VERSION-GEN: do not force abbreviation length used by 'describe'
> 
> Thanks, this is a nice summary of the discussion, and the patch is
> obviously correct.

Yep, looks very good to me! (I had started to write a commit
message for this patch, when your email arrived. Needless to
say, but your message is much better than mine!)

Thanks!

ATB,
Ramsay Jones


^ permalink raw reply

* Re: [PATCH] stash: disable renames when calling git-diff
From: Junio C Hamano @ 2016-12-06 20:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Charles Bailey, Matthew Patey, git
In-Reply-To: <20161206202521.zxgxqsetgqejsyl3@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Here it is in patch form. Perhaps I am missing some subtle case that
> diff-index would not handle, but it does pass the test suite. And
> looking over the original thread, I don't see any particular reason to
> prefer git-diff.

Ah, our mails crossed ;-)  I am reasonably sure that "diff HEAD" and
"diff-index HEAD" would behave identically.

> @@ -115,7 +115,7 @@ create_stash () {
>  			git read-tree --index-output="$TMPindex" -m $i_tree &&
>  			GIT_INDEX_FILE="$TMPindex" &&
>  			export GIT_INDEX_FILE &&
> -			git diff --name-only -z HEAD -- >"$TMP-stagenames" &&
> +			git diff-index --name-only -z HEAD -- >"$TMP-stagenames" &&
>  			git update-index -z --add --remove --stdin <"$TMP-stagenames" &&
>  			git write-tree &&
>  			rm -f "$TMPindex"

Will queue this one instead.  Thanks.


^ permalink raw reply

* Re: [PATCH] stash: disable renames when calling git-diff
From: Jeff King @ 2016-12-06 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, Matthew Patey, git
In-Reply-To: <20161206201130.azfmqvtk3iettlx7@sigill.intra.peff.net>

On Tue, Dec 06, 2016 at 03:11:30PM -0500, Jeff King wrote:

> > Yeah, it looks like "add -u" to me, too.  Perhaps the script was old
> > enough that it didn't exist back then?  I dunno.
> 
> Hmm, nope. It _was_ "git add -u" at one point and switched. See
> 7aa5d43cc (stash: Don't overwrite files that have gone from the index,
> 2010-04-18).
> 
> I think you are right that diff-index could work, though. I always
> forget that without "--cached", diff-index looks at the working tree
> files.

Here it is in patch form. Perhaps I am missing some subtle case that
diff-index would not handle, but it does pass the test suite. And
looking over the original thread, I don't see any particular reason to
prefer git-diff.

+cc Charles just in case he remembers anything.

-- >8 --
Subject: [PATCH] stash: prefer plumbing over git-diff

When creating a stash, we need to look at the diff between
the working tree and HEAD, and do so using the git-diff
porcelain.  Because git-diff enables porcelain config like
renames by default, this causes at least one problem. The
--name-only format will not mention the source side of a
rename, meaning we will fail to stash a deletion that is
part of a rename.

We could fix that case by passing --no-renames, but this is
a symptom of a larger problem. We should be using the
diff-index plumbing here, which does not have renames
enabled by default, and also does not respect any
potentially confusing config options.

Reported-by: Matthew Patey <matthew.patey2167@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 git-stash.sh     | 2 +-
 t/t3903-stash.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/git-stash.sh b/git-stash.sh
index 4546abaae..10c284d1a 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -115,7 +115,7 @@ create_stash () {
 			git read-tree --index-output="$TMPindex" -m $i_tree &&
 			GIT_INDEX_FILE="$TMPindex" &&
 			export GIT_INDEX_FILE &&
-			git diff --name-only -z HEAD -- >"$TMP-stagenames" &&
+			git diff-index --name-only -z HEAD -- >"$TMP-stagenames" &&
 			git update-index -z --add --remove --stdin <"$TMP-stagenames" &&
 			git write-tree &&
 			rm -f "$TMPindex"
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index e1a6ccc00..2de3e18ce 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -766,4 +766,13 @@ test_expect_success 'stash list --cc shows combined diff' '
 	test_cmp expect actual
 '
 
+test_expect_success 'stash is not confused by partial renames' '
+	mv file renamed &&
+	git add renamed &&
+	git stash &&
+	git stash apply &&
+	test_path_is_file renamed &&
+	test_path_is_missing file
+'
+
 test_done
-- 
2.11.0.191.gdb26c57


^ permalink raw reply related

* Re: [PATCH] stash: disable renames when calling git-diff
From: Junio C Hamano @ 2016-12-06 20:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Matthew Patey, git
In-Reply-To: <xmqq37i024fy.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> I think you are right that diff-index could work, though. I always
>> forget that without "--cached", diff-index looks at the working tree
>> files.
>
> I'll reword the log message while queuing.

The last paragraph became like so:

    The correct solution is to move to an appropriate plumbing
    command. But in the short term lets ask git-diff not to respect
    renames.


^ permalink raw reply

* Re: [PATCH] stash: disable renames when calling git-diff
From: Jeff King @ 2016-12-06 20:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthew Patey, git
In-Reply-To: <xmqq37i024fy.fsf@gitster.mtv.corp.google.com>

On Tue, Dec 06, 2016 at 12:22:25PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I think you are right that diff-index could work, though. I always
> > forget that without "--cached", diff-index looks at the working tree
> > files.
> 
> I'll reword the log message while queuing.  It was super surprising
> to me to hear that there was something "git diff" did that three
> "git diff-*" plumbing brothers couldn't do at the basic "what to
> compare with what" level, as I wrote all four ;-)

Oops, our emails just crossed; I just sent a revised patch.

I know there are a few cases that git-diff can handle that the others
can't, but I think they are all direct blob-to-blob comparisons.

-Peff

^ permalink raw reply

* Re: [PATCH] stash: disable renames when calling git-diff
From: Junio C Hamano @ 2016-12-06 20:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Matthew Patey, git
In-Reply-To: <20161206201130.azfmqvtk3iettlx7@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I think you are right that diff-index could work, though. I always
> forget that without "--cached", diff-index looks at the working tree
> files.

I'll reword the log message while queuing.  It was super surprising
to me to hear that there was something "git diff" did that three
"git diff-*" plumbing brothers couldn't do at the basic "what to
compare with what" level, as I wrote all four ;-)

^ permalink raw reply

* Re: [PATCH] stash: disable renames when calling git-diff
From: Jeff King @ 2016-12-06 20:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthew Patey, git
In-Reply-To: <xmqqd1h425vv.fsf@gitster.mtv.corp.google.com>

On Tue, Dec 06, 2016 at 11:51:16AM -0800, Junio C Hamano wrote:

> > I don't think there's a plumbing command which works for diffing the
> > working tree directly to a git tree. In the long run, it might be a good
> > idea to remedy that.
> 
> I do not think that one is doing anything different from "git
> diff-index --name-only -z HEAD".
> [...]
> Yeah, it looks like "add -u" to me, too.  Perhaps the script was old
> enough that it didn't exist back then?  I dunno.

Hmm, nope. It _was_ "git add -u" at one point and switched. See
7aa5d43cc (stash: Don't overwrite files that have gone from the index,
2010-04-18).

I think you are right that diff-index could work, though. I always
forget that without "--cached", diff-index looks at the working tree
files.

-Peff

^ permalink raw reply

* Re: [PATCH] stash: disable renames when calling git-diff
From: Junio C Hamano @ 2016-12-06 19:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Matthew Patey, git
In-Reply-To: <20161206193154.vf7cd7lk5gyxrra5@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Dec 06, 2016 at 11:20:18AM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> >> If you run:
>> >> 
>> >>   git -c diff.renames=false stash
>> >> 
>> >> then it works.
>> >
>> > And here's a patch to fix it.
>> 
>> Yuck.  This obviously has easier to bite more people since we
>> enabled the renames by default.  Thanks for a quick fix.
>> 
>> I wonder why we are using "git diff" here, not the plumbing,
>> though.
>
> I don't think there's a plumbing command which works for diffing the
> working tree directly to a git tree. In the long run, it might be a good
> idea to remedy that.

I do not think that one is doing anything different from "git
diff-index --name-only -z HEAD".

> Though I'm not sure that "git add -u" would not accomplish the same
> thing as these several commands.

Yeah, it looks like "add -u" to me, too.  Perhaps the script was old
enough that it didn't exist back then?  I dunno.


^ permalink raw reply

* Re: [PATCH v15 18/27] bisect--helper: `bisect_autostart` shell function in C
From: Pranit Bauva @ 2016-12-06 19:47 UTC (permalink / raw)
  To: Stephan Beyer; +Cc: Git List
In-Reply-To: <723476f6-2c8c-38df-1771-9a525196d9de@gmx.net>

Hey Stephan,

On Mon, Nov 21, 2016 at 1:45 AM, Stephan Beyer <s-beyer@gmx.net> wrote:
> Hi,
>
> On 10/14/2016 04:14 PM, Pranit Bauva wrote:
>> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
>> index 502bf18..1767916 100644
>> --- a/builtin/bisect--helper.c
>> +++ b/builtin/bisect--helper.c
>> @@ -422,6 +425,7 @@ static int bisect_next(...)
>>  {
>>       int res, no_checkout;
>>
>> +     bisect_autostart(terms);
>
> You are not checking for return values here. (The shell code simply
> exited if there is no tty, but you don't.)

True. I didn't notice it carefully. Thanks for pointing it out.

>> @@ -754,6 +758,32 @@ static int bisect_start(struct bisect_terms *terms, int no_checkout,
>>       return retval || bisect_auto_next(terms, NULL);
>>  }
>>
>> +static int bisect_autostart(struct bisect_terms *terms)
>> +{
>> +     if (is_empty_or_missing_file(git_path_bisect_start())) {
>> +             const char *yesno;
>> +             const char *argv[] = {NULL};
>> +             fprintf(stderr, _("You need to start by \"git bisect "
>> +                               "start\"\n"));
>> +
>> +             if (!isatty(0))
>
> isatty(STDIN_FILENO)?

Seems better.

>> +                     return 1;
>> +
>> +             /*
>> +              * TRANSLATORS: Make sure to include [Y] and [n] in your
>> +              * translation. THe program will only accept English input
>
> Typo "THe"

Sure.

>> +              * at this point.
>> +              */
>
> Taking "at this point" into consideration, I think the Y and n can be
> easily translated now that it is in C. I guess, by using...
>
>> +             yesno = git_prompt(_("Do you want me to do it for you "
>> +                                  "[Y/n]? "), PROMPT_ECHO);
>> +             if (starts_with(yesno, "n") || starts_with(yesno, "N"))
>
> ... starts_with(yesno, _("n")) || starts_with(yesno, _("N"))
> here (but not sure). However, this would be an extra patch on top of
> this series.

Can add it as an extra patch. Thanks for informing.

>> +                     exit(0);
>
> Shouldn't this also be "return 1;"? Saying "no" is the same outcome as
> not having a tty to ask for yes or no.

Yes.

>>  int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
>>  {
>>       enum {
>> @@ -790,6 +821,8 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
>>                        N_("find the next bisection commit"), BISECT_NEXT),
>>               OPT_CMDMODE(0, "bisect-auto-next", &cmdmode,
>>                        N_("verify the next bisection state then find the next bisection state"), BISECT_AUTO_NEXT),
>> +             OPT_CMDMODE(0, "bisect-autostart", &cmdmode,
>> +                      N_("start the bisection if BISECT_START empty or missing"), BISECT_AUTOSTART),
>
> The word "is" is missing.

Sure. Thanks for going through these patches very carefully.

Regards,
Pranit Bauva

^ permalink raw reply

* Re: [PATCH v15 08/27] bisect--helper: `is_expected_rev` & `check_expected_revs` shell function in C
From: Pranit Bauva @ 2016-12-06 19:33 UTC (permalink / raw)
  To: Stephan Beyer; +Cc: Git List
In-Reply-To: <a4c7fec8-0e84-eb53-ca22-c369ce3facfa@gmx.net>

Hey Stephan,

On Thu, Nov 17, 2016 at 5:17 AM, Stephan Beyer <s-beyer@gmx.net> wrote:
> Hi,
>
> On 10/14/2016 04:14 PM, Pranit Bauva wrote:
>> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
>> index d84ba86..c542e8b 100644
>> --- a/builtin/bisect--helper.c
>> +++ b/builtin/bisect--helper.c
>> @@ -123,13 +123,40 @@ static int bisect_reset(const char *commit)
>>       return bisect_clean_state();
>>  }
>>
>> +static int is_expected_rev(const char *expected_hex)
>> +{
>> +     struct strbuf actual_hex = STRBUF_INIT;
>> +     int res = 0;
>> +     if (strbuf_read_file(&actual_hex, git_path_bisect_expected_rev(), 0) >= 40) {
>> +             strbuf_trim(&actual_hex);
>> +             res = !strcmp(actual_hex.buf, expected_hex);
>> +     }
>> +     strbuf_release(&actual_hex);
>> +     return res;
>> +}
>
> I am not sure it does what it should.
>
> I would expect the following behavior from this function:
>  - file does not exist (or is "broken") => return 0
>  - actual_hex != expected_hex => return 0
>  - otherwise return 1
>
> If I am not wrong, the code does the following instead:
>  - file does not exist (or is "broken") => return 0
>  - actual_hex != expected_hex => return 1
>  - otherwise => return 0

Yeah, you are right. I should update this. Thanks for pointing it out.

>> +static int check_expected_revs(const char **revs, int rev_nr)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < rev_nr; i++) {
>> +             if (!is_expected_rev(revs[i])) {
>> +                     unlink_or_warn(git_path_bisect_ancestors_ok());
>> +                     unlink_or_warn(git_path_bisect_expected_rev());
>> +                     return 0;
>> +             }
>> +     }
>> +     return 0;
>> +}
>
> Here I am not sure what the function *should* do. However, I see that it
> basically mimics the behavior of the shell function (assuming
> is_expected_rev() is implemented correctly).
>
> I don't understand why the return value is int and not void. To avoid a
> "return 0;" line when calling this function?

Initially I thought I would be using the return value but now I
realize that it is meaningless to do so. Using void seems better. :)

>> @@ -167,6 +196,8 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
>>               if (argc > 1)
>>                       die(_("--bisect-reset requires either zero or one arguments"));
>>               return bisect_reset(argc ? argv[0] : NULL);
>> +     case CHECK_EXPECTED_REVS:
>> +             return check_expected_revs(argv, argc);
>
> I note that you check the correct number of arguments for some
> subcommands and you do not check it for some other subcommands like this
> one. (I don't care, I just want to mention it.)

Here we should be able to accept any number of arguments. I think it
would be good to add a non-zero check though just to maintain the
uniformity. Though this is something programmer needs to be careful
about rather than the user.

>>       default:
>>               die("BUG: unknown subcommand '%d'", cmdmode);
>>       }
>
> ~Stephan

Regards,
Pranit Bauva

^ permalink raw reply

* Re: [PATCH] stash: disable renames when calling git-diff
From: Jeff King @ 2016-12-06 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthew Patey, git
In-Reply-To: <xmqqh96g27bh.fsf@gitster.mtv.corp.google.com>

On Tue, Dec 06, 2016 at 11:20:18AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> If you run:
> >> 
> >>   git -c diff.renames=false stash
> >> 
> >> then it works.
> >
> > And here's a patch to fix it.
> 
> Yuck.  This obviously has easier to bite more people since we
> enabled the renames by default.  Thanks for a quick fix.
> 
> I wonder why we are using "git diff" here, not the plumbing,
> though.

I don't think there's a plumbing command which works for diffing the
working tree directly to a git tree. In the long run, it might be a good
idea to remedy that.

Though I'm not sure that "git add -u" would not accomplish the same
thing as these several commands.

-Peff

^ permalink raw reply

* Re: [PATCH] stash: disable renames when calling git-diff
From: Junio C Hamano @ 2016-12-06 19:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Matthew Patey, git
In-Reply-To: <20161206154120.yyuca35ugyuifpq6@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

>> If you run:
>> 
>>   git -c diff.renames=false stash
>> 
>> then it works.
>
> And here's a patch to fix it.

Yuck.  This obviously has easier to bite more people since we
enabled the renames by default.  Thanks for a quick fix.

I wonder why we are using "git diff" here, not the plumbing,
though.

>
> -- >8 --
> Subject: [PATCH] stash: disable renames when calling git-diff
>
> When creating a stash, we need to look at the diff between
> the working tree and HEAD. There's no plumbing command that
> covers this case, so we do so by calling the git-diff
> porcelain. This means we also inherit the usual list of
> porcelain options, which can impact the output in confusing
> ways.
>
> There's at least one known problem: --name-only will not
> mention the source side of a rename, meaning that we will
> fail to stash a deletion in such a case.
>
> The correct solution is to move to an appropriate plumbing
> command. But since there isn't one available, in the short
> term we can plug this particular problem by manually asking
> git-diff not to respect renames.
>
> Reported-by: Matthew Patey <matthew.patey2167@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There may be other similar problems lurking depending on the various
> config options you have set, but I don't think so. Most of the options
> only affect --patch output.
>
> I do find it interesting that --name-only does not mention the source
> side of a rename. The longer forms like --name-status mention both
> sides. Certainly --name-status does not have any way of saying "this is
> the source side of a rename", but I'd think it would simply mention both
> sides. Anyway, even if that were changed (which would fix this bug), I
> think adding --no-renames is still a good thing to be doing here.
>
>  git-stash.sh     | 2 +-
>  t/t3903-stash.sh | 9 +++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/git-stash.sh b/git-stash.sh
> index 4546abaae..40ab2710e 100755
> --- a/git-stash.sh
> +++ b/git-stash.sh
> @@ -115,7 +115,7 @@ create_stash () {
>  			git read-tree --index-output="$TMPindex" -m $i_tree &&
>  			GIT_INDEX_FILE="$TMPindex" &&
>  			export GIT_INDEX_FILE &&
> -			git diff --name-only -z HEAD -- >"$TMP-stagenames" &&
> +			git diff --name-only -z --no-renames HEAD -- >"$TMP-stagenames" &&
>  			git update-index -z --add --remove --stdin <"$TMP-stagenames" &&
>  			git write-tree &&
>  			rm -f "$TMPindex"
> diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
> index e1a6ccc00..2de3e18ce 100755
> --- a/t/t3903-stash.sh
> +++ b/t/t3903-stash.sh
> @@ -766,4 +766,13 @@ test_expect_success 'stash list --cc shows combined diff' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'stash is not confused by partial renames' '
> +	mv file renamed &&
> +	git add renamed &&
> +	git stash &&
> +	git stash apply &&
> +	test_path_is_file renamed &&
> +	test_path_is_missing file
> +'
> +
>  test_done

^ permalink raw reply

* BUG: "cherry-pick A..B || git reset --hard OTHER"
From: Junio C Hamano @ 2016-12-06 18:58 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, SZEDER Gábor

I was burned a few times with this in the past few years, but it did
not irritate me often enough that I didn't write it down.  But I
think this is serious enough that deserves attention from those who
were involved.

A short reproduction recipe, using objects from git.git that are
publicly available and stable, shows how bad it is.

    $ git checkout v2.9.3^0

    $ git cherry-pick 0582a34f52..a94bb68397
    ... see conflict, decide to give up backporting to
    ... such an old fork point.
    ... the git-prompt gives "|CHERRY-PICKING" correctly.

    $ git reset --hard v2.10.2^0
    ... the git-prompt no longer says "|CHERRY-PICKING"

    $ edit && git commit -m "prelim work for backporting"
    [detached HEAD cc5a6a9219] prelim work for backporting

    $ git cherry-pick 0582a34f52..a94bb68397
    error: a cherry-pick or revert is already in progress
    hint: try "git cherry-pick (--continue | --quit | --abort)"
    fatal: cherry-pick failed

    $ git cherry-pick --abort
    ... we come back to v2.9.3^0, losing the new commit!

The above shows two bugs.

 (1) The third invocation of "cherry-pick" with "--abort" to get rid
     of the state from the unfinished cherry-pick we did previously
     is necessary, but the command does not notice that we resetted
     to a new branch AND we even did some other work there.  This
     loses end-user's work.  

     "git cherry-pick --abort" should learn from "git am --abort"
     that has an extra safety to deal with the above workflow.  The
     state from the unfinished "am" is removed, but the head is not
     rewound to avoid losing end-user's work.

     You can try by replacing two instances of

	$ git cherry-pick 0582a34f52..a94bb68397

     with

	$ git format-patch --stdout 0582a34f52..a94bb68397 | git am

     in the above sequence, and conclude with "git am--abort" to see
     how much more pleasant and safe "git am --abort" is.

 (2) The bug in "cherry-pick --abort" is made worse because the
     git-completion script seems to be unaware of $GIT_DIR/sequencer
     and stops saying "|CHERRY-PICKING" after the step to switch to
     a different state with "git reset --hard v2.10.2^0".  If the
     prompt showed it after "git reset", the end user would have
     thought twice before starting the "prelim work".

Thanks.

^ permalink raw reply

* Re: [PATCH v15 11/27] bisect--helper: `bisect_next_check` & bisect_voc shell function in C
From: Pranit Bauva @ 2016-12-06 18:39 UTC (permalink / raw)
  To: Stephan Beyer; +Cc: Git List
In-Reply-To: <b78a4cbf-86ed-938e-1d41-6c48e0df981e@gmx.net>

Hey Stephan,

Sorry for the late replies. My end semester exams just got over.

On Fri, Nov 18, 2016 at 2:29 AM, Stephan Beyer <s-beyer@gmx.net> wrote:
>
> Hi Pranit,
>
> On 10/14/2016 04:14 PM, Pranit Bauva wrote:
> > Also reimplement `bisect_voc` shell function in C and call it from
> > `bisect_next_check` implementation in C.
>
> Please don't! ;D
>
> > +static char *bisect_voc(char *revision_type)
> > +{
> > +     if (!strcmp(revision_type, "bad"))
> > +             return "bad|new";
> > +     if (!strcmp(revision_type, "good"))
> > +             return "good|old";
> > +
> > +     return NULL;
> > +}
>
> Why not simply use something like this:
>
> static const char *voc[] = {
>         "bad|new",
>         "good|old",
> };
>
> Then...

This probably seems a good idea.

> > +static int bisect_next_check(const struct bisect_terms *terms,
> > +                          const char *current_term)
> > +{
> > +     int missing_good = 1, missing_bad = 1, retval = 0;
> > +     char *bad_ref = xstrfmt("refs/bisect/%s", terms->term_bad);
> > +     char *good_glob = xstrfmt("%s-*", terms->term_good);
> > +     char *bad_syn, *good_syn;
>
> ...you don't need bad_syn and good_syn...
>
> > +     bad_syn = xstrdup(bisect_voc("bad"));
> > +     good_syn = xstrdup(bisect_voc("good"));
>
> ...and hence not these two lines...
>
> > +     if (!is_empty_or_missing_file(git_path_bisect_start())) {
> > +             error(_("You need to give me at least one %s and "
> > +                     "%s revision. You can use \"git bisect %s\" "
> > +                     "and \"git bisect %s\" for that. \n"),
> > +                     bad_syn, good_syn, bad_syn, good_syn);
>
> ...and write
>                         voc[0], voc[1], voc[0], voc[1]);
> instead...
>
> > +             retval = -1;
> > +             goto finish;
> > +     }
> > +     else {
> > +             error(_("You need to start by \"git bisect start\". You "
> > +                     "then need to give me at least one %s and %s "
> > +                     "revision. You can use \"git bisect %s\" and "
> > +                     "\"git bisect %s\" for that.\n"),
> > +                     good_syn, bad_syn, bad_syn, good_syn);
>
> ...and here
>                         voc[1], voc[0], voc[0], voc[1]);
> ...
>
> > +             retval = -1;
> > +             goto finish;
> > +     }
> > +     goto finish;
> > +finish:
> > +     if (!bad_ref)
> > +             free(bad_ref);
> > +     if (!good_glob)
> > +             free(good_glob);
> > +     if (!bad_syn)
> > +             free(bad_syn);
> > +     if (!good_syn)
> > +             free(good_syn);
>
> ...and you can remove the 4 lines above.
>
> > +     return retval;
> > +}
>
> Besides that, there are again some things that I've already mentioned
> and that can be applied here, too, for example, not capitalizing
> TERM_GOOD and TERM_BAD, the goto fail simplification, the terms memory leak.

Your suggestion simplifies things, I will use that.

Regards,
Pranit Bauva

^ permalink raw reply

* Re: [PATCH v3 1/2] difftool: add a skeleton for the upcoming builtin
From: Jeff King @ 2016-12-06 18:35 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Johannes Schindelin, Junio C Hamano, git@vger.kernel.org,
	David Aguilar, Dennis Kaarsemaker
In-Reply-To: <CAGZ79kZoy2zSgSEc7kfAZ9tg9_uJxa+_FNFVO8UEDLVK6YDxVg@mail.gmail.com>

On Tue, Dec 06, 2016 at 10:22:21AM -0800, Stefan Beller wrote:

> >> Maybe even go a step further and say that the config code needs a context
> >> "object".
> >
> > If I were writing git from scratch, I'd consider making a "struct
> > repository" object. I'm not sure how painful it would be to retro-fit it
> > at this point.
> 
> Would it be possible to introduce "the repo" struct similar to "the index"
> in cache.h?
> 
> From a submodule perspective I would very much welcome this
> object oriented approach to repositories.

I think it may be more complicated, because there's some implicit global
state in "the repo", like where files are relative to our cwd. All of
those low-level functions would have to start caring about which repo
we're talking about so they can prefix the appropriate working tree
path, etc.

For some operations that would be fine, but there are things that would
subtly fail for submodules. I'm thinking we'd end up with some code
state like:

  /* finding a repo does not modify global state; good */
  struct repository *repo = repo_discover(".");

  /* obvious repo-level operations like looking up refs can be done with
   * a repository object; good */
  repo_for_each_ref(repo, callback, NULL);

  /*
   * "enter" the repo so that we are at the top-level of the working
   * tree, etc. After this you can actually look at the index without
   * things breaking.
   */
  repo_enter(repo);

That would be enough to implement a lot of submodule-level stuff, but it
would break pretty subtly as soon as you asked the submodule about its
working tree. The solution is to make everything that accesses the
working tree aware of the idea of a working tree root besides the cwd.
But that's a pretty invasive change.

-Peff

^ permalink raw reply

* Re: [PATCH] clone,fetch: explain the shallow-clone option a little more clearly
From: Junio C Hamano @ 2016-12-06 18:29 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Alex Henrie, Git Mailing List
In-Reply-To: <CACsJy8Cy+iWw4NhR7B_fZhBMMYt2QxdMeFsC=++LLnrTLNWfKw@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Dec 6, 2016 at 1:28 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> I however offhand do not think the feature can be used to make the
>> repository shallower
>
> I'm pretty sure it can,...

I wrote my message after a short local testing, but it is very
possible I botched it and reached a wrong conclusion above.

If we can use the command to make it shallower, then the phrase
"deepen" would probably be what we need to be fixing in this patch:

>  	OPT_STRING_LIST(0, "shallow-exclude", &option_not, N_("revision"),
> -			N_("deepen history of shallow clone by excluding rev")),
> +			N_("deepen history of shallow clone, excluding rev")),

Perhaps a shorter version of:

    Adjust the depth of shallow clone so that commits that are
    decendants of the named rev are made available, while commits
    that are ancestors of the named rev are made beyond reach.

or something like that.  Here is my (somewhat botched) attempt:

    Adjust shallow clone's history to be cut at the rev



^ permalink raw reply

* [PATCH v2] jk/http-walker-limit-redirect rebased to maint-2.9
From: Jeff King @ 2016-12-06 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, sbeller, bburky, jrnieder
In-Reply-To: <20161206181008.yaz2md3343pukaov@sigill.intra.peff.net>

On Tue, Dec 06, 2016 at 01:10:08PM -0500, Jeff King wrote:

> > I think I merged yours and then Brandon's on jch/pu branches in that
> > order, and the conflict resolution should look OK.
> > 
> > I however forked yours on v2.11.0-rc1, which would need to be
> > rebased to one of the earlier maintenance tracks, before we can
> > merge it to 'next'.
> 
> Yeah, I built it on top of master.
> 
> It does depend on some of the http-walker changes Eric made a few months
> ago. In particular, 17966c0a6 (http: avoid disconnecting on 404s for
> loose objects, 2016-07-11) added some checks against the HTTP status
> code, and my series modifies the checks (mostly so that ">= 400" becomes
> ">= 300").
> 
> Rebasing on maint-2.9 means omitting those changes. That preserves the
> security properties, but means that the error handling is worse when we
> see an illegal redirect. That may be OK, though.
> 
> Since the resolution is to omit the changes entirely from my series,
> merging up to v2.11 wouldn't produce any conflicts. We'd need to have a
> separate set of patches adding those changes back in.

This actually turned out to be pretty easy. The final patch from my
original series is the one that tweaks the error-handling from
17966c0a6. The rest of them are _almost_ untouched, except that one of
the error-handling tweaks gets bumped to the final patch.

So here's a resend of the patches, rebased on your maint-2.9 branch.
Patches 1-5 should be applied directly there.

On maint-2.10, you can merge up maint-2.9, and then apply patch 6.

Hopefully that makes sense, but I've pushed branches
jk/maint-X-http-redirect to https://github.com/peff/git that show the
final configuration (and a diff of jk/maint-2.10-http-redirect shows the
outcome is identical to applying the original series on top of 2.10).

Merging up to 2.11 should be trivial.

  [1/6]: http: simplify update_url_from_redirect
  [2/6]: http: always update the base URL for redirects
  [3/6]: remote-curl: rename shadowed options variable
  [4/6]: http: make redirects more obvious
  [5/6]: http: treat http-alternates like redirects
  [6/6]: http-walker: complain about non-404 loose object errors

-Peff

^ permalink raw reply

* Re: [RFC PATCH] GIT-VERSION-GEN: set --abbrev=9 to match auto-scaling
From: Jeff King @ 2016-12-06 18:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ramsay Jones, GIT Mailing-list
In-Reply-To: <xmqqwpfc2a7g.fsf@gitster.mtv.corp.google.com>

On Tue, Dec 06, 2016 at 10:17:55AM -0800, Junio C Hamano wrote:

> Yup, that is what I meant to say with "that is already possible" and
> we are on the same page.  As all three of us seem to be happy with
> just dropping --abbrev and letting describe do its default thing (as
> configured by whoever is doing the build), let's do so.
> 
> -- >8 --
> From: Ramsay Jones <ramsay@ramsayjones.plus.com>
> Date: Sun, 4 Dec 2016 20:45:59 +0000
> Subject: [PATCH] GIT-VERSION-GEN: do not force abbreviation length used by 'describe'

Thanks, this is a nice summary of the discussion, and the patch is
obviously correct.

-Peff

^ permalink raw reply

* [PATCH v2 6/6] http-walker: complain about non-404 loose object errors
From: Jeff King @ 2016-12-06 18:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, sbeller, bburky, jrnieder
In-Reply-To: <20161206182414.466uotqfufcimpqb@sigill.intra.peff.net>

Since commit 17966c0a6 (http: avoid disconnecting on 404s
for loose objects, 2016-07-11), we turn off curl's
FAILONERROR option and instead manually deal with failing
HTTP codes.

However, the logic to do so only recognizes HTTP 404 as a
failure. This is probably the most common result, but if we
were to get another code, the curl result remains CURLE_OK,
and we treat it as success. We still end up detecting the
failure when we try to zlib-inflate the object (which will
fail), but instead of reporting the HTTP error, we just
claim that the object is corrupt.

Instead, let's catch anything in the 300's or above as an
error (300's are redirects which are not an error at the
HTTP level, but are an indication that we've explicitly
disabled redirects, so we should treat them as such; we
certainly don't have the resulting object content).

Note that we also fill in req->errorstr, which we didn't do
before. Without FAILONERROR, curl will not have filled this
in, and it will remain a blank string. This never mattered
for the 404 case, because in the logic below we hit the
"missing_target()" branch and print nothing. But for other
errors, we'd want to say _something_, if only to fill in the
blank slot in the error message.

Signed-off-by: Jeff King <peff@peff.net>
---
The second hunk here is new in v2; earlier it appeared in patch 3. But
arguably it goes better here anyway; I didn't even need to modify the
commit message to explain it.

 http-walker.c | 7 +++++--
 http.c        | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/http-walker.c b/http-walker.c
index 25a8b1ad4..c2f81cd6a 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -482,10 +482,13 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 	 * we turned off CURLOPT_FAILONERROR to avoid losing a
 	 * persistent connection and got CURLE_OK.
 	 */
-	if (req->http_code == 404 && req->curl_result == CURLE_OK &&
+	if (req->http_code >= 300 && req->curl_result == CURLE_OK &&
 			(starts_with(req->url, "http://") ||
-			 starts_with(req->url, "https://")))
+			 starts_with(req->url, "https://"))) {
 		req->curl_result = CURLE_HTTP_RETURNED_ERROR;
+		xsnprintf(req->errorstr, sizeof(req->errorstr),
+			  "HTTP request failed");
+	}
 
 	if (obj_req->state == ABORTED) {
 		ret = error("Request for %s aborted", hex);
diff --git a/http.c b/http.c
index ff4d88231..c25d1d540 100644
--- a/http.c
+++ b/http.c
@@ -2021,7 +2021,7 @@ static size_t fwrite_sha1_file(char *ptr, size_t eltsize, size_t nmemb,
 		if (c != CURLE_OK)
 			die("BUG: curl_easy_getinfo for HTTP code failed: %s",
 				curl_easy_strerror(c));
-		if (slot->http_code >= 400)
+		if (slot->http_code >= 300)
 			return size;
 	}
 
-- 
2.11.0.191.gdb26c57

^ permalink raw reply related

* [PATCH v2 5/6] http: treat http-alternates like redirects
From: Jeff King @ 2016-12-06 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, sbeller, bburky, jrnieder
In-Reply-To: <20161206182414.466uotqfufcimpqb@sigill.intra.peff.net>

The previous commit made HTTP redirects more obvious and
tightened up the default behavior. However, there's another
way for a server to ask a git client to fetch arbitrary
content: by having an http-alternates file (or a regular
alternates file, which is used as a backup).

Similar to the HTTP redirect case, a malicious server can
claim to have refs pointing at object X, return a 404 when
the client asks for X, but point to some other URL via
http-alternates, which the client will transparently fetch.
The end result is that it looks from the user's perspective
like the objects came from the malicious server, as the
other URL is not mentioned at all.

Worse, because we feed the new URL to curl ourselves, the
usual protocol restrictions do not kick in (neither curl's
default of disallowing file://, nor the protocol
whitelisting in f4113cac0 (http: limit redirection to
protocol-whitelist, 2015-09-22).

Let's apply the same rules here as we do for HTTP redirects.
Namely:

  - unless http.followRedirects is set to "always", we will
    not follow remote redirects from http-alternates (or
    alternates) at all

  - set CURLOPT_PROTOCOLS alongside CURLOPT_REDIR_PROTOCOLS
    restrict ourselves to a known-safe set and respect any
    user-provided whitelist.

  - mention alternate object stores on stderr so that the
    user is aware another source of objects may be involved

The first item may prove to be too restrictive. The most
common use of alternates is to point to another path on the
same server. While it's possible for a single-server
redirect to be an attack, it takes a fairly obscure setup
(victim and evil repository on the same host, host speaks
dumb http, and evil repository has access to edit its own
http-alternates file).

So we could make the checks more specific, and only cover
cross-server redirects. But that means parsing the URLs
ourselves, rather than letting curl handle them. This patch
goes for the simpler approach. Given that they are only used
with dumb http, http-alternates are probably pretty rare.
And there's an escape hatch: the user can allow redirects on
a specific server by setting http.<url>.followRedirects to
"always".

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 http-walker.c              |  8 +++++---
 http.c                     |  1 +
 t/t5550-http-fetch-dumb.sh | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/http-walker.c b/http-walker.c
index 2c721f0c3..4bff31c44 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -290,9 +290,8 @@ static void process_alternates_response(void *callback_data)
 				struct strbuf target = STRBUF_INIT;
 				strbuf_add(&target, base, serverlen);
 				strbuf_add(&target, data + i, posn - i - 7);
-				if (walker->get_verbosely)
-					fprintf(stderr, "Also look at %s\n",
-						target.buf);
+				warning("adding alternate object store: %s",
+					target.buf);
 				newalt = xmalloc(sizeof(*newalt));
 				newalt->next = NULL;
 				newalt->base = strbuf_detach(&target, NULL);
@@ -318,6 +317,9 @@ static void fetch_alternates(struct walker *walker, const char *base)
 	struct alternates_request alt_req;
 	struct walker_data *cdata = walker->data;
 
+	if (http_follow_config != HTTP_FOLLOW_ALWAYS)
+		return;
+
 	/*
 	 * If another request has already started fetching alternates,
 	 * wait for them to arrive and return to processing this request's
diff --git a/http.c b/http.c
index b99ade5fa..a9778bfa4 100644
--- a/http.c
+++ b/http.c
@@ -581,6 +581,7 @@ static CURL *get_curl_handle(void)
 	if (is_transport_allowed("ftps"))
 		allowed_protocols |= CURLPROTO_FTPS;
 	curl_easy_setopt(result, CURLOPT_REDIR_PROTOCOLS, allowed_protocols);
+	curl_easy_setopt(result, CURLOPT_PROTOCOLS, allowed_protocols);
 #else
 	if (transport_restrict_protocols())
 		warning("protocol restrictions not applied to curl redirects because\n"
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index ad94ed7b1..22011f0b6 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -322,5 +322,43 @@ test_expect_success 'http.followRedirects defaults to "initial"' '
 	test_must_fail git clone $HTTPD_URL/redir-objects/repo.git default
 '
 
+# The goal is for a clone of the "evil" repository, which has no objects
+# itself, to cause the client to fetch objects from the "victim" repository.
+test_expect_success 'set up evil alternates scheme' '
+	victim=$HTTPD_DOCUMENT_ROOT_PATH/victim.git &&
+	git init --bare "$victim" &&
+	git -C "$victim" --work-tree=. commit --allow-empty -m secret &&
+	git -C "$victim" repack -ad &&
+	git -C "$victim" update-server-info &&
+	sha1=$(git -C "$victim" rev-parse HEAD) &&
+
+	evil=$HTTPD_DOCUMENT_ROOT_PATH/evil.git &&
+	git init --bare "$evil" &&
+	# do this by hand to avoid object existence check
+	printf "%s\\t%s\\n" $sha1 refs/heads/master >"$evil/info/refs"
+'
+
+# Here we'll just redirect via HTTP. In a real-world attack these would be on
+# different servers, but we should reject it either way.
+test_expect_success 'http-alternates is a non-initial redirect' '
+	echo "$HTTPD_URL/dumb/victim.git/objects" \
+		>"$evil/objects/info/http-alternates" &&
+	test_must_fail git -c http.followRedirects=initial \
+		clone $HTTPD_URL/dumb/evil.git evil-initial &&
+	git -c http.followRedirects=true \
+		clone $HTTPD_URL/dumb/evil.git evil-initial
+'
+
+# Curl supports a lot of protocols that we'd prefer not to allow
+# http-alternates to use, but it's hard to test whether curl has
+# accessed, say, the SMTP protocol, because we are not running an SMTP server.
+# But we can check that it does not allow access to file://, which would
+# otherwise allow this clone to complete.
+test_expect_success 'http-alternates cannot point at funny protocols' '
+	echo "file://$victim/objects" >"$evil/objects/info/http-alternates" &&
+	test_must_fail git -c http.followRedirects=true \
+		clone "$HTTPD_URL/dumb/evil.git" evil-file
+'
+
 stop_httpd
 test_done
-- 
2.11.0.191.gdb26c57


^ permalink raw reply related

* [PATCH v2 4/6] http: make redirects more obvious
From: Jeff King @ 2016-12-06 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, sbeller, bburky, jrnieder
In-Reply-To: <20161206182414.466uotqfufcimpqb@sigill.intra.peff.net>

We instruct curl to always follow HTTP redirects. This is
convenient, but it creates opportunities for malicious
servers to create confusing situations. For instance,
imagine Alice is a git user with access to a private
repository on Bob's server. Mallory runs her own server and
wants to access objects from Bob's repository.

Mallory may try a few tricks that involve asking Alice to
clone from her, build on top, and then push the result:

  1. Mallory may simply redirect all fetch requests to Bob's
     server. Git will transparently follow those redirects
     and fetch Bob's history, which Alice may believe she
     got from Mallory. The subsequent push seems like it is
     just feeding Mallory back her own objects, but is
     actually leaking Bob's objects. There is nothing in
     git's output to indicate that Bob's repository was
     involved at all.

     The downside (for Mallory) of this attack is that Alice
     will have received Bob's entire repository, and is
     likely to notice that when building on top of it.

  2. If Mallory happens to know the sha1 of some object X in
     Bob's repository, she can instead build her own history
     that references that object. She then runs a dumb http
     server, and Alice's client will fetch each object
     individually. When it asks for X, Mallory redirects her
     to Bob's server. The end result is that Alice obtains
     objects from Bob, but they may be buried deep in
     history. Alice is less likely to notice.

Both of these attacks are fairly hard to pull off. There's a
social component in getting Mallory to convince Alice to
work with her. Alice may be prompted for credentials in
accessing Bob's repository (but not always, if she is using
a credential helper that caches). Attack (1) requires a
certain amount of obliviousness on Alice's part while making
a new commit. Attack (2) requires that Mallory knows a sha1
in Bob's repository, that Bob's server supports dumb http,
and that the object in question is loose on Bob's server.

But we can probably make things a bit more obvious without
any loss of functionality. This patch does two things to
that end.

First, when we encounter a whole-repo redirect during the
initial ref discovery, we now inform the user on stderr,
making attack (1) much more obvious.

Second, the decision to follow redirects is now
configurable. The truly paranoid can set the new
http.followRedirects to false to avoid any redirection
entirely. But for a more practical default, we will disallow
redirects only after the initial ref discovery. This is
enough to thwart attacks similar to (2), while still
allowing the common use of redirects at the repository
level. Since c93c92f30 (http: update base URLs when we see
redirects, 2013-09-28) we re-root all further requests from
the redirect destination, which should generally mean that
no further redirection is necessary.

As an escape hatch, in case there really is a server that
needs to redirect individual requests, the user can set
http.followRedirects to "true" (and this can be done on a
per-server basis via http.*.followRedirects config).

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config.txt   | 10 ++++++++++
 http.c                     | 31 +++++++++++++++++++++++++++++--
 http.h                     | 10 +++++++++-
 remote-curl.c              |  4 ++++
 t/lib-httpd/apache.conf    |  6 ++++++
 t/t5550-http-fetch-dumb.sh | 23 +++++++++++++++++++++++
 6 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f4721a048..815333643 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1833,6 +1833,16 @@ http.userAgent::
 	of common USER_AGENT strings (but not including those like git/1.7.1).
 	Can be overridden by the `GIT_HTTP_USER_AGENT` environment variable.
 
+http.followRedirects::
+	Whether git should follow HTTP redirects. If set to `true`, git
+	will transparently follow any redirect issued by a server it
+	encounters. If set to `false`, git will treat all redirects as
+	errors. If set to `initial`, git will follow redirects only for
+	the initial request to a remote, but not for subsequent
+	follow-up HTTP requests. Since git uses the redirected URL as
+	the base for the follow-up requests, this is generally
+	sufficient. The default is `initial`.
+
 http.<url>.*::
 	Any of the http.* options above can be applied selectively to some URLs.
 	For a config key to match a URL, each element of the config key is
diff --git a/http.c b/http.c
index 718d2109b..b99ade5fa 100644
--- a/http.c
+++ b/http.c
@@ -98,6 +98,8 @@ static int http_proactive_auth;
 static const char *user_agent;
 static int curl_empty_auth;
 
+enum http_follow_config http_follow_config = HTTP_FOLLOW_INITIAL;
+
 #if LIBCURL_VERSION_NUM >= 0x071700
 /* Use CURLOPT_KEYPASSWD as is */
 #elif LIBCURL_VERSION_NUM >= 0x070903
@@ -337,6 +339,16 @@ static int http_options(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp("http.followredirects", var)) {
+		if (value && !strcmp(value, "initial"))
+			http_follow_config = HTTP_FOLLOW_INITIAL;
+		else if (git_config_bool(var, value))
+			http_follow_config = HTTP_FOLLOW_ALWAYS;
+		else
+			http_follow_config = HTTP_FOLLOW_NONE;
+		return 0;
+	}
+
 	/* Fall back on the default ones */
 	return git_default_config(var, value, cb);
 }
@@ -553,7 +565,6 @@ static CURL *get_curl_handle(void)
 				 curl_low_speed_time);
 	}
 
-	curl_easy_setopt(result, CURLOPT_FOLLOWLOCATION, 1);
 	curl_easy_setopt(result, CURLOPT_MAXREDIRS, 20);
 #if LIBCURL_VERSION_NUM >= 0x071301
 	curl_easy_setopt(result, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL);
@@ -882,6 +893,16 @@ struct active_request_slot *get_active_slot(void)
 	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
 	curl_easy_setopt(slot->curl, CURLOPT_RANGE, NULL);
 
+	/*
+	 * Default following to off unless "ALWAYS" is configured; this gives
+	 * callers a sane starting point, and they can tweak for individual
+	 * HTTP_FOLLOW_* cases themselves.
+	 */
+	if (http_follow_config == HTTP_FOLLOW_ALWAYS)
+		curl_easy_setopt(slot->curl, CURLOPT_FOLLOWLOCATION, 1);
+	else
+		curl_easy_setopt(slot->curl, CURLOPT_FOLLOWLOCATION, 0);
+
 #if LIBCURL_VERSION_NUM >= 0x070a08
 	curl_easy_setopt(slot->curl, CURLOPT_IPRESOLVE, git_curl_ipresolve);
 #endif
@@ -1122,9 +1143,12 @@ static int handle_curl_result(struct slot_results *results)
 	 * If we see a failing http code with CURLE_OK, we have turned off
 	 * FAILONERROR (to keep the server's custom error response), and should
 	 * translate the code into failure here.
+	 *
+	 * Likewise, if we see a redirect (30x code), that means we turned off
+	 * redirect-following, and we should treat the result as an error.
 	 */
 	if (results->curl_result == CURLE_OK &&
-	    results->http_code >= 400) {
+	    results->http_code >= 300) {
 		results->curl_result = CURLE_HTTP_RETURNED_ERROR;
 		/*
 		 * Normally curl will already have put the "reason phrase"
@@ -1443,6 +1467,9 @@ static int http_request(const char *url,
 		strbuf_addstr(&buf, " no-cache");
 	if (options && options->keep_error)
 		curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 0);
+	if (options && options->initial_request &&
+	    http_follow_config == HTTP_FOLLOW_INITIAL)
+		curl_easy_setopt(slot->curl, CURLOPT_FOLLOWLOCATION, 1);
 
 	headers = curl_slist_append(headers, buf.buf);
 
diff --git a/http.h b/http.h
index 36f558bfb..31b4cc94b 100644
--- a/http.h
+++ b/http.h
@@ -116,6 +116,13 @@ extern struct credential http_auth;
 
 extern char curl_errorstr[CURL_ERROR_SIZE];
 
+enum http_follow_config {
+	HTTP_FOLLOW_NONE,
+	HTTP_FOLLOW_ALWAYS,
+	HTTP_FOLLOW_INITIAL
+};
+extern enum http_follow_config http_follow_config;
+
 static inline int missing__target(int code, int result)
 {
 	return	/* file:// URL -- do we ever use one??? */
@@ -139,7 +146,8 @@ extern char *get_remote_object_url(const char *url, const char *hex,
 /* Options for http_get_*() */
 struct http_get_options {
 	unsigned no_cache:1,
-		 keep_error:1;
+		 keep_error:1,
+		 initial_request:1;
 
 	/* If non-NULL, returns the content-type of the response. */
 	struct strbuf *content_type;
diff --git a/remote-curl.c b/remote-curl.c
index e3803daa3..05ae8dead 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -276,6 +276,7 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	http_options.charset = &charset;
 	http_options.effective_url = &effective_url;
 	http_options.base_url = &url;
+	http_options.initial_request = 1;
 	http_options.no_cache = 1;
 	http_options.keep_error = 1;
 
@@ -294,6 +295,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		die("unable to access '%s': %s", url.buf, curl_errorstr);
 	}
 
+	if (options.verbosity && !starts_with(refs_url.buf, url.buf))
+		warning(_("redirecting to %s"), url.buf);
+
 	last= xcalloc(1, sizeof(*last_discovery));
 	last->service = service;
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 9a355fb1c..5b408d649 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -123,6 +123,7 @@ ScriptAlias /error/ error.sh/
 </Files>
 
 RewriteEngine on
+RewriteRule ^/dumb-redir/(.*)$ /dumb/$1 [R=301]
 RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
 RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
 RewriteRule ^/smart-redir-auth/(.*)$ /auth/smart/$1 [R=301]
@@ -140,6 +141,11 @@ RewriteRule ^/loop-redir/(.*)$ /loop-redir/x-$1 [R=302]
 RewriteRule ^/insane-redir/(.*)$ /intern-redir/$1/foo [R=301]
 RewriteRule ^/intern-redir/(.*)/foo$ /smart/$1 [PT]
 
+# Serve info/refs internally without redirecting, but
+# issue a redirect for any object requests.
+RewriteRule ^/redir-objects/(.*/info/refs)$ /dumb/$1 [PT]
+RewriteRule ^/redir-objects/(.*/objects/.*)$ /dumb/$1 [R=301]
+
 # Apache 2.2 does not understand <RequireAll>, so we use RewriteCond.
 # And as RewriteCond does not allow testing for non-matches, we match
 # the desired case first (one has abra, two has cadabra), and let it
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 3484b6f0f..ad94ed7b1 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -299,5 +299,28 @@ test_expect_success 'git client does not send an empty Accept-Language' '
 	! grep "^Accept-Language:" stderr
 '
 
+test_expect_success 'redirects can be forbidden/allowed' '
+	test_must_fail git -c http.followRedirects=false \
+		clone $HTTPD_URL/dumb-redir/repo.git dumb-redir &&
+	git -c http.followRedirects=true \
+		clone $HTTPD_URL/dumb-redir/repo.git dumb-redir 2>stderr
+'
+
+test_expect_success 'redirects are reported to stderr' '
+	# just look for a snippet of the redirected-to URL
+	test_i18ngrep /dumb/ stderr
+'
+
+test_expect_success 'non-initial redirects can be forbidden' '
+	test_must_fail git -c http.followRedirects=initial \
+		clone $HTTPD_URL/redir-objects/repo.git redir-objects &&
+	git -c http.followRedirects=true \
+		clone $HTTPD_URL/redir-objects/repo.git redir-objects
+'
+
+test_expect_success 'http.followRedirects defaults to "initial"' '
+	test_must_fail git clone $HTTPD_URL/redir-objects/repo.git default
+'
+
 stop_httpd
 test_done
-- 
2.11.0.191.gdb26c57


^ permalink raw reply related

* [PATCH v2 2/6] http: always update the base URL for redirects
From: Jeff King @ 2016-12-06 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, sbeller, bburky, jrnieder
In-Reply-To: <20161206182414.466uotqfufcimpqb@sigill.intra.peff.net>

If a malicious server redirects the initial ref
advertisement, it may be able to leak sha1s from other,
unrelated servers that the client has access to. For
example, imagine that Alice is a git user, she has access to
a private repository on a server hosted by Bob, and Mallory
runs a malicious server and wants to find out about Bob's
private repository.

Mallory asks Alice to clone an unrelated repository from her
over HTTP. When Alice's client contacts Mallory's server for
the initial ref advertisement, the server issues an HTTP
redirect for Bob's server. Alice contacts Bob's server and
gets the ref advertisement for the private repository. If
there is anything to fetch, she then follows up by asking
the server for one or more sha1 objects. But who is the
server?

If it is still Mallory's server, then Alice will leak the
existence of those sha1s to her.

Since commit c93c92f30 (http: update base URLs when we see
redirects, 2013-09-28), the client usually rewrites the base
URL such that all further requests will go to Bob's server.
But this is done by textually matching the URL. If we were
originally looking for "http://mallory/repo.git/info/refs",
and we got pointed at "http://bob/other.git/info/refs", then
we know that the right root is "http://bob/other.git".

If the redirect appears to change more than just the root,
we punt and continue to use the original server. E.g.,
imagine the redirect adds a URL component that Bob's server
will ignore, like "http://bob/other.git/info/refs?dummy=1".

We can solve this by aborting in this case rather than
silently continuing to use Mallory's server. In addition to
protecting from sha1 leakage, it's arguably safer and more
sane to refuse a confusing redirect like that in general.
For example, part of the motivation in c93c92f30 is
avoiding accidentally sending credentials over clear http,
just to get a response that says "try again over https". So
even in a non-malicious case, we'd prefer to err on the side
of caution.

The downside is that it's possible this will break a
legitimate but complicated server-side redirection scheme.
The setup given in the newly added test does work, but it's
convoluted enough that we don't need to care about it. A
more plausible case would be a server which redirects a
request for "info/refs?service=git-upload-pack" to just
"info/refs" (because it does not do smart HTTP, and for some
reason really dislikes query parameters).  Right now we
would transparently downgrade to dumb-http, but with this
patch, we'd complain (and the user would have to set
GIT_SMART_HTTP=0 to fetch).

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 http.c                        | 12 ++++++++----
 t/lib-httpd/apache.conf       |  8 ++++++++
 t/t5551-http-fetch-smart.sh   |  4 ++++
 t/t5812-proto-disable-http.sh |  1 +
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/http.c b/http.c
index d4034a14b..718d2109b 100644
--- a/http.c
+++ b/http.c
@@ -1491,9 +1491,9 @@ static int http_request(const char *url,
  *
  * Note that this assumes a sane redirect scheme. It's entirely possible
  * in the example above to end up at a URL that does not even end in
- * "info/refs".  In such a case we simply punt, as there is not much we can
- * do (and such a scheme is unlikely to represent a real git repository,
- * which means we are likely about to abort anyway).
+ * "info/refs".  In such a case we die. There's not much we can do, such a
+ * scheme is unlikely to represent a real git repository, and failing to
+ * rewrite the base opens options for malicious redirects to do funny things.
  */
 static int update_url_from_redirect(struct strbuf *base,
 				    const char *asked,
@@ -1511,10 +1511,14 @@ static int update_url_from_redirect(struct strbuf *base,
 
 	new_len = got->len;
 	if (!strip_suffix_mem(got->buf, &new_len, tail))
-		return 0; /* insane redirect scheme */
+		die(_("unable to update url base from redirection:\n"
+		      "  asked for: %s\n"
+		      "   redirect: %s"),
+		    asked, got->buf);
 
 	strbuf_reset(base);
 	strbuf_add(base, got->buf, new_len);
+
 	return 1;
 }
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 018a83a5a..9a355fb1c 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -132,6 +132,14 @@ RewriteRule ^/ftp-redir/(.*)$ ftp://localhost:1000/$1 [R=302]
 RewriteRule ^/loop-redir/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-(.*) /$1 [R=302]
 RewriteRule ^/loop-redir/(.*)$ /loop-redir/x-$1 [R=302]
 
+# The first rule issues a client-side redirect to something
+# that _doesn't_ look like a git repo. The second rule is a
+# server-side rewrite, so that it turns out the odd-looking
+# thing _is_ a git repo. The "[PT]" tells Apache to match
+# the usual ScriptAlias rules for /smart.
+RewriteRule ^/insane-redir/(.*)$ /intern-redir/$1/foo [R=301]
+RewriteRule ^/intern-redir/(.*)/foo$ /smart/$1 [PT]
+
 # Apache 2.2 does not understand <RequireAll>, so we use RewriteCond.
 # And as RewriteCond does not allow testing for non-matches, we match
 # the desired case first (one has abra, two has cadabra), and let it
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 2f375eb94..d8826acde 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -110,6 +110,10 @@ test_expect_success 'redirects re-root further requests' '
 	git clone $HTTPD_URL/smart-redir-limited/repo.git repo-redir-limited
 '
 
+test_expect_success 're-rooting dies on insane schemes' '
+	test_must_fail git clone $HTTPD_URL/insane-redir/repo.git insane
+'
+
 test_expect_success 'clone from password-protected repository' '
 	echo two >expect &&
 	set_askpass user@host pass@host &&
diff --git a/t/t5812-proto-disable-http.sh b/t/t5812-proto-disable-http.sh
index 0d105d541..044cc152f 100755
--- a/t/t5812-proto-disable-http.sh
+++ b/t/t5812-proto-disable-http.sh
@@ -18,6 +18,7 @@ test_proto "smart http" http "$HTTPD_URL/smart/repo.git"
 
 test_expect_success 'curl redirects respect whitelist' '
 	test_must_fail env GIT_ALLOW_PROTOCOL=http:https \
+			   GIT_SMART_HTTP=0 \
 		git clone "$HTTPD_URL/ftp-redir/repo.git" 2>stderr &&
 	{
 		test_i18ngrep "ftp.*disabled" stderr ||
-- 
2.11.0.191.gdb26c57


^ permalink raw reply related

* [PATCH v2 3/6] remote-curl: rename shadowed options variable
From: Jeff King @ 2016-12-06 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, sbeller, bburky, jrnieder
In-Reply-To: <20161206182414.466uotqfufcimpqb@sigill.intra.peff.net>

The discover_refs() function has a local "options" variable
to hold the http_get_options we pass to http_get_strbuf().
But this shadows the global "struct options" that holds our
program-level options, which cannot be accessed from this
function.

Let's give the local one a more descriptive name so we can
tell the two apart.

Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 6b83b7783..e3803daa3 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -254,7 +254,7 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	struct strbuf effective_url = STRBUF_INIT;
 	struct discovery *last = last_discovery;
 	int http_ret, maybe_smart = 0;
-	struct http_get_options options;
+	struct http_get_options http_options;
 
 	if (last && !strcmp(service, last->service))
 		return last;
@@ -271,15 +271,15 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		strbuf_addf(&refs_url, "service=%s", service);
 	}
 
-	memset(&options, 0, sizeof(options));
-	options.content_type = &type;
-	options.charset = &charset;
-	options.effective_url = &effective_url;
-	options.base_url = &url;
-	options.no_cache = 1;
-	options.keep_error = 1;
+	memset(&http_options, 0, sizeof(http_options));
+	http_options.content_type = &type;
+	http_options.charset = &charset;
+	http_options.effective_url = &effective_url;
+	http_options.base_url = &url;
+	http_options.no_cache = 1;
+	http_options.keep_error = 1;
 
-	http_ret = http_get_strbuf(refs_url.buf, &buffer, &options);
+	http_ret = http_get_strbuf(refs_url.buf, &buffer, &http_options);
 	switch (http_ret) {
 	case HTTP_OK:
 		break;
-- 
2.11.0.191.gdb26c57


^ permalink raw reply related

* [PATCH v2 1/6] http: simplify update_url_from_redirect
From: Jeff King @ 2016-12-06 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, sbeller, bburky, jrnieder
In-Reply-To: <20161206182414.466uotqfufcimpqb@sigill.intra.peff.net>

This function looks for a common tail between what we asked
for and where we were redirected to, but it open-codes the
comparison. We can avoid some confusing subtractions by
using strip_suffix_mem().

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/http.c b/http.c
index d8e427b69..d4034a14b 100644
--- a/http.c
+++ b/http.c
@@ -1500,7 +1500,7 @@ static int update_url_from_redirect(struct strbuf *base,
 				    const struct strbuf *got)
 {
 	const char *tail;
-	size_t tail_len;
+	size_t new_len;
 
 	if (!strcmp(asked, got->buf))
 		return 0;
@@ -1509,14 +1509,12 @@ static int update_url_from_redirect(struct strbuf *base,
 		die("BUG: update_url_from_redirect: %s is not a superset of %s",
 		    asked, base->buf);
 
-	tail_len = strlen(tail);
-
-	if (got->len < tail_len ||
-	    strcmp(tail, got->buf + got->len - tail_len))
+	new_len = got->len;
+	if (!strip_suffix_mem(got->buf, &new_len, tail))
 		return 0; /* insane redirect scheme */
 
 	strbuf_reset(base);
-	strbuf_add(base, got->buf, got->len - tail_len);
+	strbuf_add(base, got->buf, new_len);
 	return 1;
 }
 
-- 
2.11.0.191.gdb26c57


^ permalink raw reply related

* Re: [PATCH v3 1/2] difftool: add a skeleton for the upcoming builtin
From: Stefan Beller @ 2016-12-06 18:22 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, Junio C Hamano, git@vger.kernel.org,
	David Aguilar, Dennis Kaarsemaker
In-Reply-To: <20161206150955.mvq4ocamaei52bap@sigill.intra.peff.net>

On Tue, Dec 6, 2016 at 7:09 AM, Jeff King <peff@peff.net> wrote:

>>
>> Maybe even go a step further and say that the config code needs a context
>> "object".
>
> If I were writing git from scratch, I'd consider making a "struct
> repository" object. I'm not sure how painful it would be to retro-fit it
> at this point.

Would it be possible to introduce "the repo" struct similar to "the index"
in cache.h?

From a submodule perspective I would very much welcome this
object oriented approach to repositories.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox