Git development

Git development
 help / color / mirror / Atom feed

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Schindelin @ 2017-01-16 11:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqh94zbwlu.fsf@gitster.mtv.corp.google.com>

Hi Junio,

On Sun, 15 Jan 2017, Junio C Hamano wrote:

> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.

A suggestion: since it is very, very tedious to find the latest
iteration's thread, as well as the corresponding commit in `pu` (or
whatever other branch you use), and since you auto-generate these lists
anyway, it would make things less cumbersome if the commit names of the
tips, as well as the Message-IDs of the cover-letter (or first patch) were
displayed with the topic branches.

It still would not address e.g. the problem that the original authors are
not notified about the current state of their submission by these What's
Cooking mails.

But it would improve the situation.

> * js/difftool-builtin (2017-01-09) 5 commits
>  - t7800: run both builtin and scripted difftool, for now
>  - difftool: implement the functionality in the builtin
>  - difftool: add a skeleton for the upcoming builtin
>  - git_exec_path: do not return the result of getenv()

This patch was not in my patch submission. Sneaking it into this topic
branch is not incorrect.

It does not matter, though, as I will drop the Coverity patches from this
patch series: the conflation of Coverity fixes with the builtin difftool
was a mistake, and I will thereby address that mistake.

Ciao,
Johannes

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Schindelin @ 2017-01-16 12:52 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <257b4175-9879-7814-5d8d-02050792574d@kdbg.org>

Hi Hannes,

On Mon, 16 Jan 2017, Johannes Sixt wrote:

> Am 16.01.2017 um 02:51 schrieb Junio C Hamano:
> > * jk/vreport-sanitize (2017-01-11) 2 commits
> >  - vreport: sanitize ASCII control chars
> >  - Revert "vreportf: avoid intermediate buffer"
> >
> >  An error message with an ASCII control character like '\r' in it
> >  can alter the message to hide its early part, which is problematic
> >  when a remote side gives such an error message that the local side
> >  will relay with a "remote: " prefix.
> >
> >  Will merge to 'next'.
> 
> Please be not too hasty with advancing this topic to master. I could imagine
> that there is some unwanted fallout on systems where the end-of-line marker
> CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> expect regressions in *my* typical workflow.

Thank you for being so attentive.

This topic branch would indeed have caused problems. Worse: it would have
caused problems that are not covered by our test suite, as Git for
Windows' own utilities do not generate CR/LF line endings. So this
regression would have bit our users. Nasty.

Something like this is necessary, at least:

-- snipsnap --
Subject: [PATCH] fixup! vreport: sanitize ASCII control chars

The original patch is incorrect, as it turns carriage returns into
question marks.

However, carriage returns should be left alone when preceding a line feed,
and simply turned into line feeds otherwise.

To make the end result more readable, the logic is inverted so that the
common case (no substitution) is handled first.

While at it, let's lose the unnecessary curly braces.

We also add a regression test that verifies that carriage returns are
handled properly. And as we expect CR/LF to be handled properly also on
platforms other than Windows, this test case is not guarded by the MINGW
prerequisite.

Spotted by Hannes Sixt.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t0000-basic.sh | 8 ++++++++
 usage.c          | 9 ++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index 60811a3a7c..d494cb7c05 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -1063,4 +1063,12 @@ test_expect_success 'very long name in the index handled sanely' '
 	test $len = 4098
 '
 
+# This test verifies that the error reporting functions handle CR correctly.
+# --abbrev-ref is simply used out of convenience, as it reports an error including
+# a user-specified argument.
+test_expect_success 'error messages treat CR/LF correctly' '
+	test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 2>err &&
+	grep "^fatal: .*CR/LF$" err
+'
+
 test_done
diff --git a/usage.c b/usage.c
index 50a6ccee44..71bc7c0329 100644
--- a/usage.c
+++ b/usage.c
@@ -15,10 +15,13 @@ void vreportf(const char *prefix, const char *err, va_list params)
 	char *p;
 
 	vsnprintf(msg, sizeof(msg), err, params);
-	for (p = msg; *p; p++) {
-		if (iscntrl(*p) && *p != '\t' && *p != '\n')
+	for (p = msg; *p; p++)
+		if (!iscntrl(*p) || *p == '\t' || *p == '\n')
+			continue;
+		else if (*p != '\r')
 			*p = '?';
-	}
+		else if (p[1] != '\n')
+			*p = '\n';
 	fprintf(fh, "%s%s\n", prefix, msg);
 }
 
-- 
2.11.0.windows.3


^ permalink raw reply related

* Re: [RFC] stash: support filename argument
From: Johannes Schindelin @ 2017-01-16 13:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Gummerer, git, kes-kes
In-Reply-To: <alpine.DEB.2.20.1701161150320.3469@virtualbox>

Hi,

On Mon, 16 Jan 2017, Johannes Schindelin wrote:

> On Sun, 15 Jan 2017, Junio C Hamano wrote:
> 
> > I haven't spent enough time to think if it even makes sense to
> > "stash" partially, leaving the working tree still dirty.
> 
> Think no more! We already support that with --keep-index, and it is a very
> useful feature.

And of course there is `git stash -p`, which is also very useful,
*because* it leaves the working tree still dirty, in the desired way.

Ciao,
Johannes

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Jeff King @ 2017-01-16 16:04 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Sixt, Junio C Hamano, git
In-Reply-To: <alpine.DEB.2.20.1701161251100.3469@virtualbox>

On Mon, Jan 16, 2017 at 01:52:39PM +0100, Johannes Schindelin wrote:

> > >  An error message with an ASCII control character like '\r' in it
> > >  can alter the message to hide its early part, which is problematic
> > >  when a remote side gives such an error message that the local side
> > >  will relay with a "remote: " prefix.
> > >
> > >  Will merge to 'next'.
> > 
> > Please be not too hasty with advancing this topic to master. I could imagine
> > that there is some unwanted fallout on systems where the end-of-line marker
> > CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> > expect regressions in *my* typical workflow.
> 
> Thank you for being so attentive.
> 
> This topic branch would indeed have caused problems. Worse: it would have
> caused problems that are not covered by our test suite, as Git for
> Windows' own utilities do not generate CR/LF line endings. So this
> regression would have bit our users. Nasty.

Hmm.  I am not sure to what degree CRLFs are actually a problem here.
Keep in mind these are error messages generated via error(), and so not
processing arbitrary data. I can imagine that CRs might come from:

  1. A parameter like a filename or revision name. If I ask for a
     rev-parse of "foo\r\n", then it's probably useful to mention the
     "\r" in the error message, rather than ignoring it (or converting
     it to a line-feed).

     And I think that would apply to any input parameter we show via
     error(), etc, if it is connected to a newline (ideally we would
     show newlines as "?", too, but we cannot tell the difference
     between ones from parameters, and ones that are part of the error
     message).

  2. The printf-fmt strings themselves. But these come from C code,
     which just uses "\n". My impression is that it is fprintf() which
     is responsible for converting that to "\r\n". And we are doing our
     sanitizing here between an snprintf(), and an fprintf() of the
     result. So it should see only the raw "\n" fields.

I am certainly open to loosening the sanitizing for CR to make things
work seamlessly on Windows. But I am having trouble imagining a case
that is actually negatively impacted.

> -- snipsnap --
> Subject: [PATCH] fixup! vreport: sanitize ASCII control chars

Given the subtlety here, I'd much rather have a patch on top.

> The original patch is incorrect, as it turns carriage returns into
> question marks.
> 
> However, carriage returns should be left alone when preceding a line feed,
> and simply turned into line feeds otherwise.

The question of whether to leave CRLFs alone is addressed above. But I
do not understand why you'd want a lone CR to be converted to a line
feed. Running:

  git rev-parse --abbrev-ref "$(printf "foo\r")"

with my patch gets you:

  fatal: ambiguous argument 'foo?': unknown revision or path not in the working tree.

But with yours:

  fatal: ambiguous argument 'foo
  ': unknown revision or path not in the working tree.

Obviously the "?" thing is a lossy transformation, but I do not see how
a newline is any less confusing (but see below for some thoughts).

> To make the end result more readable, the logic is inverted so that the
> common case (no substitution) is handled first.
> 
> While at it, let's lose the unnecessary curly braces.

Please don't. Obviously C treats the "if/else" as a single unit, but
IMHO it's less error-prone to include the braces any time there are
multiple visual lines. E.g., something like:

  while (foo)
	if (bar)
		one();
	else
		two();
	three();

is much easier to spot as wrong when you would require braces either
way (and not relevant here, but I'd say that even an inner block with a
comment deserves braces for the same reason).

> We also add a regression test that verifies that carriage returns are
> handled properly. And as we expect CR/LF to be handled properly also on
> platforms other than Windows, this test case is not guarded by the MINGW
> prerequisite.

I am not sure what "properly" means here. In your test:

> +# This test verifies that the error reporting functions handle CR correctly.
> +# --abbrev-ref is simply used out of convenience, as it reports an error including
> +# a user-specified argument.
> +test_expect_success 'error messages treat CR/LF correctly' '
> +	test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 2>err &&
> +	grep "^fatal: .*CR/LF$" err
> +'

The "\n" is eaten by the shell, and git sees only "CR/LF\r". So we are
not testing the CRLF case in vreportf() at all.

We do end up with "CR/LF\n" in vreportf(), which is presumably converted
by fprintf() to "CR/LF\r\n" on Windows. And so perhaps that is why you
are doing the "convert \r to \n" thing above.

But I still think it's not doing the right thing. Git _didn't_ see CRLF,
it saw CR.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Schindelin @ 2017-01-16 17:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Junio C Hamano, git
In-Reply-To: <20170116160456.ltbb7ofe47xos7xo@sigill.intra.peff.net>

Hi Peff,

On Mon, 16 Jan 2017, Jeff King wrote:

> On Mon, Jan 16, 2017 at 01:52:39PM +0100, Johannes Schindelin wrote:
> 
> > > >  An error message with an ASCII control character like '\r' in it
> > > >  can alter the message to hide its early part, which is problematic
> > > >  when a remote side gives such an error message that the local side
> > > >  will relay with a "remote: " prefix.
> > > >
> > > >  Will merge to 'next'.
> > > 
> > > Please be not too hasty with advancing this topic to master. I could imagine
> > > that there is some unwanted fallout on systems where the end-of-line marker
> > > CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> > > expect regressions in *my* typical workflow.
> > 
> > Thank you for being so attentive.
> > 
> > This topic branch would indeed have caused problems. Worse: it would have
> > caused problems that are not covered by our test suite, as Git for
> > Windows' own utilities do not generate CR/LF line endings. So this
> > regression would have bit our users. Nasty.
> 
> Hmm.  I am not sure to what degree CRLFs are actually a problem here.
> Keep in mind these are error messages generated via error(), and so not
> processing arbitrary data. I can imagine that CRs might come from:

Please note the regression test I added. It uses rev-parse's --abbrev-ref
option which quotes the argument when erroring out. This argument then
gets munged.

So error() (or in this case, die()) *very much* processes arbitrary data.

I *know* that rev-parse --abbrev-ref is an artificial example, it is
highly unlikely that anybody will use

	git rev-parse --abbrev-ref "$(<call an external program here that
		generates CR/LF line endings>)"

However, there are plenty other cases in regular Git usage where arguments
are generated by external programs to which we have no business dictating a
specific line ending style.

If you absolutely insist, I will spend time to find a plausible example
and use that in the regression test. However, that would not really
improve anything, as the purpose of the regression test is simply to
demonstrate that a user-provided argument's CR/LF does not get munged
incorrectly. And the test I provided serves that purpose perfectly.

>   1. A parameter like a filename or revision name. If I ask for a
>      rev-parse of "foo\r\n", then it's probably useful to mention the
>      "\r" in the error message, rather than ignoring it (or converting
>      it to a line-feed).
> 
>      And I think that would apply to any input parameter we show via
>      error(), etc, if it is connected to a newline (ideally we would
>      show newlines as "?", too, but we cannot tell the difference
>      between ones from parameters, and ones that are part of the error
>      message).

I think it is doing users a really great disservice to munge up CR or LF
into question marks. I *guarantee* you that it confuses users. And not
because they are dumb, but because the code violates the Law of Least
Surprise.

> I am certainly open to loosening the sanitizing for CR to make things
> work seamlessly on Windows. But I am having trouble imagining a case
> that is actually negatively impacted.

Git accepts all kinds of arguments, not just file names. It is totally
legitimate, and you probably can show use cases of that yourself, to
generate those arguments by other programs.

These programs can generate CR/LF line endings.

While well-intentioned, your changes could make things even unreadable.
Certainly confusing.

> > -- snipsnap --
> > Subject: [PATCH] fixup! vreport: sanitize ASCII control chars
> 
> Given the subtlety here, I'd much rather have a patch on top.

Fine.

> > The original patch is incorrect, as it turns carriage returns into
> > question marks.
> > 
> > However, carriage returns should be left alone when preceding a line feed,
> > and simply turned into line feeds otherwise.
> 
> The question of whether to leave CRLFs alone is addressed above. But I
> do not understand why you'd want a lone CR to be converted to a line
> feed. Running:
> 
>   git rev-parse --abbrev-ref "$(printf "foo\r")"
> 
> with my patch gets you:
> 
>   fatal: ambiguous argument 'foo?': unknown revision or path not in the working tree.
> 
> But with yours:
> 
>   fatal: ambiguous argument 'foo
>   ': unknown revision or path not in the working tree.
> 
> Obviously the "?" thing is a lossy transformation,

s/lossy/confusing/

Probably not to you, because you came up with the transformation, but to
literally everybody else.

> but I do not see how a newline is any less confusing (but see below for
> some thoughts).

The more common bug report to be expected would show that multi-line
arguments are converted to ending in question marks. That is not the user
experience for which I am aiming.

> > To make the end result more readable, the logic is inverted so that the
> > common case (no substitution) is handled first.
> > 
> > While at it, let's lose the unnecessary curly braces.
> 
> Please don't. Obviously C treats the "if/else" as a single unit, but
> IMHO it's less error-prone to include the braces any time there are
> multiple visual lines. E.g., something like:
> 
>   while (foo)
> 	if (bar)
> 		one();
> 	else
> 		two();
> 	three();
> 
> is much easier to spot as wrong when you would require braces either
> way (and not relevant here, but I'd say that even an inner block with a
> comment deserves braces for the same reason).

There is no documentation about the preferred coding style.

For years, I saw patches, and provided patches myself, that *avoided*
curly braces when not necessary (in addition to aiming for shorter arms to
come before longer arms in conditionals).

Now all of a sudden Junio *and* you suggest unnecessary curly braces to be
added.

That is inconsistent at best, and confusing. Maybe you two gentle people
can make up your mind and document the final verdict, and for extra
brownie points automate the formatting so that patch reviews are not
dominated by coding style comments that would be better addressed by
machines than by humans.

> > We also add a regression test that verifies that carriage returns are
> > handled properly. And as we expect CR/LF to be handled properly also on
> > platforms other than Windows, this test case is not guarded by the MINGW
> > prerequisite.
> 
> I am not sure what "properly" means here. In your test:
> 
> > +# This test verifies that the error reporting functions handle CR correctly.
> > +# --abbrev-ref is simply used out of convenience, as it reports an error including
> > +# a user-specified argument.
> > +test_expect_success 'error messages treat CR/LF correctly' '
> > +	test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 2>err &&
> > +	grep "^fatal: .*CR/LF$" err
> > +'
> 
> The "\n" is eaten by the shell, and git sees only "CR/LF\r". So we are
> not testing the CRLF case in vreportf() at all.

True. Should be easy to fix, starting from my patch.

> We do end up with "CR/LF\n" in vreportf(), which is presumably converted
> by fprintf() to "CR/LF\r\n" on Windows. And so perhaps that is why you
> are doing the "convert \r to \n" thing above.
> 
> But I still think it's not doing the right thing. Git _didn't_ see CRLF,
> it saw CR.

You know, I don't care. As long as carriage returns are either left alone
(which conflicts specifically with your stated goal) or at least are
handled gracefully when coming before line feeds.

Converting stray carriage returns to question marks is confusing, and of
course I would take the brunt of the bug reports, so I do not look
favorably on that change.

My patch was just a starter, to help with fixing the patch series before
it hits `next`.

Ciao,
Johannes

^ permalink raw reply

* [PATCH 2/2] configure.ac: Fix --without-iconv
From: Bernd Kuhls @ 2017-01-16 19:56 UTC (permalink / raw)
  To: git; +Cc: Bernd Kuhls
In-Reply-To: <20170116195638.3713-1-bernd.kuhls@writeme.com>

GIT_PARSE_WITH(iconv)) sets NO_ICONV=YesPlease in
https://github.com/git/git/blob/maint/configure.ac#L327

But the command GIT_CONF_SUBST([NO_ICONV]) in
https://github.com/git/git/blob/maint/configure.ac#L618

is only executed when NO_ICONV is an empty variable
https://github.com/git/git/blob/maint/configure.ac#L578

which has the effect that NO_ICONV=YesPlease is not written to
config.mak.autogen which breaks compilation in systems without iconv.

Signed-off-by: Bernd Kuhls <bernd.kuhls@writeme.com>
---
 configure.ac | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/configure.ac b/configure.ac
index 63e71a472..419469315 100644
--- a/configure.ac
+++ b/configure.ac
@@ -614,15 +614,15 @@ LIBS="$old_LIBS"
 
 GIT_UNSTASH_FLAGS($ICONVDIR)
 
-GIT_CONF_SUBST([NEEDS_LIBICONV])
-GIT_CONF_SUBST([NO_ICONV])
-
 if test -n "$NO_ICONV"; then
     NEEDS_LIBICONV=
 fi
 
 fi
 
+GIT_CONF_SUBST([NEEDS_LIBICONV])
+GIT_CONF_SUBST([NO_ICONV])
+
 #
 # Define NO_DEFLATE_BOUND if deflateBound is missing from zlib.
 
-- 
2.11.0


^ permalink raw reply related

* [PATCH 1/2] configure.ac: fix old iconv check
From: Bernd Kuhls @ 2017-01-16 19:56 UTC (permalink / raw)
  To: git; +Cc: Bernd Kuhls

According to
https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Running-the-Compiler.html
the parameter syntax of AC_COMPILE_IFELSE is

(input, [action-if-true], [action-if-false])

Displaying "no" when the test was positive and enabling support for old
iconv implementations by OLD_ICONV=UnfortunatelyYes when the test fails
it obviously wrong. This patch switches the actions to fix the problem.

Signed-off-by: Bernd Kuhls <bernd.kuhls@writeme.com>
---
 configure.ac | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 0b15f04b1..63e71a472 100644
--- a/configure.ac
+++ b/configure.ac
@@ -759,9 +759,9 @@ GIT_STASH_FLAGS($ICONVDIR)
 
 AC_MSG_CHECKING([for old iconv()])
 AC_COMPILE_IFELSE([OLDICONVTEST_SRC],
-	[AC_MSG_RESULT([no])],
 	[AC_MSG_RESULT([yes])
-	OLD_ICONV=UnfortunatelyYes])
+	OLD_ICONV=UnfortunatelyYes],
+	[AC_MSG_RESULT([no])])
 
 GIT_UNSTASH_FLAGS($ICONVDIR)
 
-- 
2.11.0


^ permalink raw reply related

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Sixt @ 2017-01-16 20:33 UTC (permalink / raw)
  To: Johannes Schindelin, Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <alpine.DEB.2.20.1701161746200.3469@virtualbox>

Am 16.01.2017 um 18:06 schrieb Johannes Schindelin:
> On Mon, 16 Jan 2017, Jeff King wrote:
>> Hmm.  I am not sure to what degree CRLFs are actually a problem here.
>> Keep in mind these are error messages generated via error(), and so not
>> processing arbitrary data. I can imagine that CRs might come from:
>
> Please note the regression test I added. It uses rev-parse's --abbrev-ref
> option which quotes the argument when erroring out. This argument then
> gets munged.
>
> So error() (or in this case, die()) *very much* processes arbitrary data.
>
> I *know* that rev-parse --abbrev-ref is an artificial example, it is
> highly unlikely that anybody will use
>
> 	git rev-parse --abbrev-ref "$(<call an external program here that
> 		generates CR/LF line endings>)"
>
> However, there are plenty other cases in regular Git usage where arguments
> are generated by external programs to which we have no business dictating a
> specific line ending style.

However, Jeff's patch is intended to catch exactly these cases (not for 
the cases where this happens accidentally, but when they happen with 
malicious intent).

We are talking about user-provided data that is reproduced by die() or 
error(). I daresay that we do not have a single case where it is 
intended that this data is intentionally multi-lined, like a commit 
message. It can only be an accident or malicious when it spans across lines.

I know we allow CR and LF in file names, but in all cases where such a 
name appears in an error message, it is *not important* that the data is 
reproduced exactly. On the contrary, it is usually more helpful to know 
that something strange is going on. The question marks are a strong 
indication to the user for this.

> If you absolutely insist, I will spend time to find a plausible example
> and use that in the regression test.

I don't want to see you on an endeavor with dubious results. I'd prefer 
to wait until the first case of "incorrectly munged data" is reported 
because, as I said, I have a gut feeling that there is none.

>> I am certainly open to loosening the sanitizing for CR to make things
>> work seamlessly on Windows. But I am having trouble imagining a case
>> that is actually negatively impacted.

I came to the same conclusion. I regret having sent out a warning 
message in, well, such a haste(*), without thinking the case through 
first. IMHO, Jeff's patch should be fine as is.

(*) literally; I had to catch a train.

-- Hannes

^ permalink raw reply

* [PATCH 0/6] fsck --connectivity-check misses some corruption
From: Jeff King @ 2017-01-16 21:22 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

I came across a repository today that was missing an object, and for
which "git fsck" reported the error but "git fsck --connectivity-check"
did not. It turns out that the shortcut taken by --connectivity-check
violates some assumptions made by the rest of fsck (namely, that every
object in the repo has a "struct object" loaded).

And fsck being a generally neglected tool, I couldn't help but find
several more bugs on the way. :)

  [1/6]: t1450: clean up sub-objects in duplicate-entry test
  [2/6]: fsck: report trees as dangling
  [3/6]: fsck: prepare dummy objects for --connectivity-check
  [4/6]: fsck: tighten error-checks of "git fsck <head>"
  [5/6]: fsck: do not fallback "git fsck <bogus>" to "git fsck"
  [6/6]: fsck: check HAS_OBJ more consistently

 builtin/fsck.c  | 131 ++++++++++++++++++++++++++++++++++++++++++++------------
 t/t1450-fsck.sh |  70 ++++++++++++++++++++++++++++--
 2 files changed, 171 insertions(+), 30 deletions(-)

-Peff

^ permalink raw reply

* [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test
From: Jeff King @ 2017-01-16 21:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

This test creates a multi-level set of trees, but its
cleanup routine only removes the top-level tree. After the
test finishes, the inner tree and the blob it points to
remain, making the inner tree dangling.

A later test ("cleaned up") verifies that we've removed any
cruft and "git fsck" output is clean. This passes only
because of a bug in git-fsck which fails to notice dangling
trees.

In preparation for fixing the bug, let's teach this earlier
test to clean up after itself correctly. We have to remove
the inner tree (and therefore the blob, too, which becomes
dangling after removing that tree).

Since the setup code happens inside a subshell, we can't
just set a variable for each object. However, we can stuff
all of the sha1s into the $T output variable, which is not
used for anything except cleanup.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1450-fsck.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index ee7d4736d..6eef8b28e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -189,14 +189,16 @@ test_expect_success 'commit with NUL in header' '
 '

 test_expect_success 'tree object with duplicate entries' '
-	test_when_finished "remove_object \$T" &&
+	test_when_finished "for i in \$T; do remove_object \$i; done" &&
 	T=$(
 		GIT_INDEX_FILE=test-index &&
 		export GIT_INDEX_FILE &&
 		rm -f test-index &&
 		>x &&
 		git add x &&
+		git rev-parse :x &&
 		T=$(git write-tree) &&
+		echo $T &&
 		(
 			git cat-file tree $T &&
 			git cat-file tree $T
-- 
2.11.0.642.gd6f8cda6c

^ permalink raw reply related

* [PATCH 2/6] fsck: report trees as dangling
From: Jeff King @ 2017-01-16 21:25 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

After checking connectivity, fsck looks through the list of
any objects we've seen mentioned, and reports unreachable
and un-"used" ones as dangling. However, it skips any object
which is not marked as "parsed", as that is an object that
we _don't_ have (but that somebody mentioned).

Since 6e454b9a3 (clear parsed flag when we free tree
buffers, 2013-06-05), that flag can't be relied on, and the
correct method is to check the HAS_OBJ flag. The cleanup in
that commit missed this callsite, though. As a result, we
would generally fail to report dangling trees.

We never noticed because there were no tests in this area
(for trees or otherwise). Let's add some.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  |  2 +-
 t/t1450-fsck.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index f01b81eeb..3e67203f9 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -225,7 +225,7 @@ static void check_unreachable_object(struct object *obj)
 	 * to complain about it being unreachable (since it does
 	 * not exist).
 	 */
-	if (!obj->parsed)
+	if (!(obj->flags & HAS_OBJ))
 		return;
 
 	/*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 6eef8b28e..e88ec7747 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -559,4 +559,31 @@ test_expect_success 'fsck --name-objects' '
 	)
 '
 
+# for each of type, we have one version which is referenced by another object
+# (and so while unreachable, not dangling), and another variant which really is
+# dangling.
+test_expect_success 'fsck notices dangling objects' '
+	git init dangling &&
+	(
+		cd dangling &&
+		blob=$(echo not-dangling | git hash-object -w --stdin) &&
+		dblob=$(echo dangling | git hash-object -w --stdin) &&
+		tree=$(printf "100644 blob %s\t%s\n" $blob one | git mktree) &&
+		dtree=$(printf "100644 blob %s\t%s\n" $blob two | git mktree) &&
+		commit=$(git commit-tree $tree) &&
+		dcommit=$(git commit-tree -p $commit $tree) &&
+
+		cat >expect <<-EOF &&
+		dangling blob $dblob
+		dangling commit $dcommit
+		dangling tree $dtree
+		EOF
+
+		git fsck >actual &&
+		# the output order is non-deterministic, as it comes from a hash
+		sort <actual >actual.sorted &&
+		test_cmp expect actual.sorted
+	)
+'
+
 test_done
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related

* [PATCH 5/6] fsck: do not fallback "git fsck <bogus>" to "git fsck"
From: Jeff King @ 2017-01-16 21:34 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

Since fsck tries to continue as much as it can after seeing
an error, we still do the reachability check even if some
heads we were given on the command-line are bogus. But if
_none_ of the heads is is valid, we fallback to checking all
refs and the index, which is not what the user asked for at
all.

Instead of checking "heads", the number of successful heads
we got, check "argc" (which we know only has non-options in
it, because parse_options removed the others).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  |  2 +-
 t/t1450-fsck.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index c7d0590e5..8ae065b2d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -778,7 +778,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	 * default ones from .git/refs. We also consider the index file
 	 * in this case (ie this implies --cache).
 	 */
-	if (!heads) {
+	if (!argc) {
 		get_default_heads();
 		keep_cache_objects = 1;
 	}
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 2f3b05276..96b74dc9a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -610,4 +610,15 @@ test_expect_success 'fsck $name notices bogus $name' '
 	test_must_fail git fsck $_z40
 '
 
+test_expect_success 'bogus head does not fallback to all heads' '
+	# set up a case that will cause a reachability complaint
+	echo to-be-deleted >foo &&
+	git add foo &&
+	blob=$(git rev-parse :foo) &&
+	test_when_finished "git rm --cached foo" &&
+	remove_object $blob &&
+	test_must_fail git fsck $_z40 >out 2>&1 &&
+	! grep $blob out
+'
+
 test_done
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related

* [PATCH 6/6] fsck: check HAS_OBJ more consistently
From: Jeff King @ 2017-01-16 21:34 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

There are two spots that call lookup_object() and assume
that a non-NULL result means we have the object:

  1. When we're checking the objects given to us by the user
     on the command line.

  2. When we're checking if a reflog entry is valid.

This generally follows fsck's mental model that we will have
looked at and loaded a "struct object" for each object in
the repository. But it misses one case: if another object
_mentioned_ an object, but we didn't actually parse it or
verify that it exists, it will still have a struct.

It's not clear if this is a triggerable bug or not.
Certainly the later parts of the reachability check need to
be careful of this, and do so by checking the HAS_OBJ flag.
But both of these steps happen before we start traversing,
so probably we won't have followed any links yet. Still,
it's easy enough to be defensive here.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 8ae065b2d..28d3cbb14 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -398,7 +398,7 @@ static void fsck_handle_reflog_sha1(const char *refname, unsigned char *sha1,

 	if (!is_null_sha1(sha1)) {
 		obj = lookup_object(sha1);
-		if (obj) {
+		if (obj && (obj->flags & HAS_OBJ)) {
 			if (timestamp && name_objects)
 				add_decoration(fsck_walk_options.object_names,
 					obj,
@@ -755,7 +755,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		if (!get_sha1(arg, sha1)) {
 			struct object *obj = lookup_object(sha1);

-			if (!obj) {
+			if (!obj || !(obj->flags & HAS_OBJ)) {
 				error("%s: object missing", sha1_to_hex(sha1));
 				errors_found |= ERROR_OBJECT;
 				continue;
-- 
2.11.0.642.gd6f8cda6c

^ permalink raw reply related

* [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check
From: Jeff King @ 2017-01-16 21:32 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.

Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.

However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:

  1. mark_object() will not queue objects for examination,
     so recursively following links from commits to trees,
     etc, did nothing. I.e., we were checking the
     reachability of hardly anything at all.

  2. When a set of heads is given on the command-line, we
     use lookup_object() to see if they exist. But without
     the initial pass, we assume nothing exists.

  3. When loading reflog entries, we do a similar
     lookup_object() check, and complain that the reflog is
     broken if the object doesn't exist in our hash.

So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.

One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).

Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.

And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.

One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.

Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.

That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).

Signed-off-by: Jeff King <peff@peff.net>
---
The cmd_fsck() part of the diff is pretty nasty without
"-b".

I had originally planned to pull the "stop calling
verify_pack()" bit out into its own patch. But doing this
without treating loose and packed objects the same raises a
lot of questions about why one and not the other. It seemed
simpler to just explain it all together.

 builtin/fsck.c  | 118 +++++++++++++++++++++++++++++++++++++++++++++-----------
 t/t1450-fsck.sh |  23 ++++++++++-
 2 files changed, 117 insertions(+), 24 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3e67203f9..f527d8a02 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -205,8 +205,6 @@ static void check_reachable_object(struct object *obj)
 	if (!(obj->flags & HAS_OBJ)) {
 		if (has_sha1_pack(obj->oid.hash))
 			return; /* it is in pack - forget about it */
-		if (connectivity_only && has_object_file(&obj->oid))
-			return;
 		printf("missing %s %s\n", typename(obj->type),
 			describe_object(obj));
 		errors_found |= ERROR_REACHABLE;
@@ -584,6 +582,79 @@ static int fsck_cache_tree(struct cache_tree *it)
 	return err;
 }

+static void mark_object_for_connectivity(const unsigned char *sha1)
+{
+	struct object *obj = lookup_object(sha1);
+
+	/*
+	 * Setting the object type here isn't strictly necessary here for a
+	 * connectivity check. In most cases, our walk will expect a certain
+	 * type (e.g., a tree referencing a blob) and will use lookup_blob() to
+	 * assign the type. But doing it here has two advantages:
+	 *
+	 *   1. When the fsck_walk code looks at objects that _don't_ come from
+	 *      links (e.g., the tip of a ref), it may complain about the
+	 *      "unknown object type".
+	 *
+	 *   2. This serves as a nice cross-check that the graph links are
+	 *      sane. So --connectivity-only does not check that the bits of
+	 *      blobs are not corrupted, but it _does_ check that 100644 tree
+	 *      entries point to blobs, and so forth.
+	 *
+	 * Unfortunately we can't just use parse_object() here, because the
+	 * whole point of --connectivity-only is to avoid reading the object
+	 * data more than necessary.
+	 */
+	if (!obj || obj->type == OBJ_NONE) {
+		enum object_type type = sha1_object_info(sha1, NULL);
+		switch (type) {
+		case OBJ_BAD:
+			error("%s: unable to read object type",
+			      sha1_to_hex(sha1));
+			break;
+		case OBJ_COMMIT:
+			obj = (struct object *)lookup_commit(sha1);
+			break;
+		case OBJ_TREE:
+			obj = (struct object *)lookup_tree(sha1);
+			break;
+		case OBJ_BLOB:
+			obj = (struct object *)lookup_blob(sha1);
+			break;
+		case OBJ_TAG:
+			obj = (struct object *)lookup_tag(sha1);
+			break;
+		default:
+			error("%s: unknown object type %d",
+			      sha1_to_hex(sha1), type);
+		}
+
+		if (!obj || obj->type == OBJ_NONE) {
+			errors_found |= ERROR_OBJECT;
+			return;
+		}
+	}
+
+	obj->flags |= HAS_OBJ;
+}
+
+static int mark_loose_for_connectivity(const unsigned char *sha1,
+				       const char *path,
+				       void *data)
+{
+	mark_object_for_connectivity(sha1);
+	return 0;
+}
+
+static int mark_packed_for_connectivity(const unsigned char *sha1,
+					struct packed_git *pack,
+					uint32_t pos,
+					void *data)
+{
+	mark_object_for_connectivity(sha1);
+	return 0;
+}
+
 static char const * const fsck_usage[] = {
 	N_("git fsck [<options>] [<object>...]"),
 	NULL
@@ -646,32 +717,35 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		prepare_alt_odb();
 		for (alt = alt_odb_list; alt; alt = alt->next)
 			fsck_object_dir(alt->path);
-	}

-	if (check_full) {
-		struct packed_git *p;
-		uint32_t total = 0, count = 0;
-		struct progress *progress = NULL;
+		if (check_full) {
+			struct packed_git *p;
+			uint32_t total = 0, count = 0;
+			struct progress *progress = NULL;
+
+			prepare_packed_git();

-		prepare_packed_git();
+			if (show_progress) {
+				for (p = packed_git; p; p = p->next) {
+					if (open_pack_index(p))
+						continue;
+					total += p->num_objects;
+				}

-		if (show_progress) {
+				progress = start_progress(_("Checking objects"), total);
+			}
 			for (p = packed_git; p; p = p->next) {
-				if (open_pack_index(p))
-					continue;
-				total += p->num_objects;
+				/* verify gives error messages itself */
+				if (verify_pack(p, fsck_obj_buffer,
+						progress, count))
+					errors_found |= ERROR_PACK;
+				count += p->num_objects;
 			}
-
-			progress = start_progress(_("Checking objects"), total);
-		}
-		for (p = packed_git; p; p = p->next) {
-			/* verify gives error messages itself */
-			if (verify_pack(p, fsck_obj_buffer,
-					progress, count))
-				errors_found |= ERROR_PACK;
-			count += p->num_objects;
+			stop_progress(&progress);
 		}
-		stop_progress(&progress);
+	} else {
+		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
+		for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
 	}

 	heads = 0;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e88ec7747..c1b2dda33 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' '
 		touch empty &&
 		git add empty &&
 		test_commit empty &&
+
+		# Drop the index now; we want to be sure that we
+		# recursively notice that we notice the broken objects
+		# because they are reachable from refs, not because
+		# they are in the index.
+		rm -f .git/index &&
+
+		# corrupt the blob, but in a way that we can still identify
+		# its type. That lets us see that --connectivity-only is
+		# not actually looking at the contents, but leaves it
+		# free to examine the type if it chooses.
 		empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 &&
-		rm -f $empty &&
-		echo invalid >$empty &&
+		blob=$(echo unrelated | git hash-object -w --stdin) &&
+		mv $(sha1_file $blob) $empty &&
+
 		test_must_fail git fsck --strict &&
 		git fsck --strict --connectivity-only &&
 		tree=$(git rev-parse HEAD:) &&
@@ -537,6 +549,13 @@ test_expect_success 'fsck --connectivity-only' '
 	)
 '

+test_expect_success 'fsck --connectivity-only with explicit head' '
+	(
+		cd connectivity-only &&
+		test_must_fail git fsck --connectivity-only $tree
+	)
+'
+
 remove_loose_object () {
 	sha1="$(git rev-parse "$1")" &&
 	remainder=${sha1#??} &&
-- 
2.11.0.642.gd6f8cda6c

^ permalink raw reply related

* [PATCH 4/6] fsck: tighten error-checks of "git fsck <head>"
From: Jeff King @ 2017-01-16 21:33 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

Instead of checking reachability from the refs, you can ask
fsck to check from a particular set of heads. However, the
error checking here is quite lax. In particular:

  1. It claims lookup_object() will report an error, which
     is not true. It only does a hash lookup, and the user
     has no clue that their argument was skipped.

  2. When either the name or sha1 cannot be resolved, we
     continue to exit with a successful error code, even
     though we didn't check what the user asked us to.

This patch fixes both of these cases.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  | 7 +++++--
 t/t1450-fsck.sh | 5 +++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index f527d8a02..c7d0590e5 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -755,9 +755,11 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		if (!get_sha1(arg, sha1)) {
 			struct object *obj = lookup_object(sha1);
 
-			/* Error is printed by lookup_object(). */
-			if (!obj)
+			if (!obj) {
+				error("%s: object missing", sha1_to_hex(sha1));
+				errors_found |= ERROR_OBJECT;
 				continue;
+			}
 
 			obj->used = 1;
 			if (name_objects)
@@ -768,6 +770,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 		error("invalid parameter: expected sha1, got '%s'", arg);
+		errors_found |= ERROR_OBJECT;
 	}
 
 	/*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index c1b2dda33..2f3b05276 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -605,4 +605,9 @@ test_expect_success 'fsck notices dangling objects' '
 	)
 '
 
+test_expect_success 'fsck $name notices bogus $name' '
+	test_must_fail git fsck bogus &&
+	test_must_fail git fsck $_z40
+'
+
 test_done
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related

* Re: [PATCH 1/2] configure.ac: fix old iconv check
From: Jeff King @ 2017-01-16 21:41 UTC (permalink / raw)
  To: Bernd Kuhls; +Cc: git
In-Reply-To: <20170116195638.3713-1-bernd.kuhls@writeme.com>

On Mon, Jan 16, 2017 at 08:56:37PM +0100, Bernd Kuhls wrote:

> According to
> https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Running-the-Compiler.html
> the parameter syntax of AC_COMPILE_IFELSE is
> 
> (input, [action-if-true], [action-if-false])
> 
> Displaying "no" when the test was positive and enabling support for old
> iconv implementations by OLD_ICONV=UnfortunatelyYes when the test fails
> it obviously wrong. This patch switches the actions to fix the problem.

Hrm. But the test code is:

  # Define OLD_ICONV if your library has an old iconv(), where the second
  # (input buffer pointer) parameter is declared with type (const char **).
  AC_DEFUN([OLDICONVTEST_SRC], [
  AC_LANG_PROGRAM([[
  #include <iconv.h>

  extern size_t iconv(iconv_t cd,
                      char **inbuf, size_t *inbytesleft,
                      char **outbuf, size_t *outbytesleft);
  ]], [])])

Which will compile correctly for the _new_ code, but not the old. So
when the test is positive, then "no", we do not need to set the
OLD_ICONV flag.

The logic would probably be clearer if the test were reversed. It would
mean that other compilation problems (e.g., you do not have iconv.h at
all) would fail to see OLD_ICONV, but that seems reasonable.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Jeff King @ 2017-01-16 21:44 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Johannes Schindelin, Junio C Hamano, git
In-Reply-To: <677a335f-889c-2924-b7bd-93c2b6663175@kdbg.org>

On Mon, Jan 16, 2017 at 09:33:07PM +0100, Johannes Sixt wrote:

> However, Jeff's patch is intended to catch exactly these cases (not for the
> cases where this happens accidentally, but when they happen with malicious
> intent).
> 
> We are talking about user-provided data that is reproduced by die() or
> error(). I daresay that we do not have a single case where it is intended
> that this data is intentionally multi-lined, like a commit message. It can
> only be an accident or malicious when it spans across lines.
> 
> I know we allow CR and LF in file names, but in all cases where such a name
> appears in an error message, it is *not important* that the data is
> reproduced exactly. On the contrary, it is usually more helpful to know that
> something strange is going on. The question marks are a strong indication to
> the user for this.

Yes, exactly. Thanks for explaining this better than I obviously was
doing. :)

> > If you absolutely insist, I will spend time to find a plausible example
> > and use that in the regression test.
> 
> I don't want to see you on an endeavor with dubious results. I'd prefer to
> wait until the first case of "incorrectly munged data" is reported because,
> as I said, I have a gut feeling that there is none.

Agreed.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Jeff King @ 2017-01-16 22:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Sixt, Junio C Hamano, git
In-Reply-To: <alpine.DEB.2.20.1701161746200.3469@virtualbox>

On Mon, Jan 16, 2017 at 06:06:35PM +0100, Johannes Schindelin wrote:

> >      And I think that would apply to any input parameter we show via
> >      error(), etc, if it is connected to a newline (ideally we would
> >      show newlines as "?", too, but we cannot tell the difference
> >      between ones from parameters, and ones that are part of the error
> >      message).
> 
> I think it is doing users a really great disservice to munge up CR or LF
> into question marks. I *guarantee* you that it confuses users. And not
> because they are dumb, but because the code violates the Law of Least
> Surprise.

I'm not sure if you realize that with stock git, the example from your
test looks like this (at least in my terminal):

  $ git.v2.11.0 rev-parse --abbrev-ref "$(printf 'CR/LF\r\n')" >/dev/null
  ': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

The "\r" causes us to overwrite the rest of the message, including the
actual filename. With my patch it's:

  $ git rev-parse --abbrev-ref "$(printf 'CR/LF\r\n')" >/dev/null
  fatal: ambiguous argument 'CR/LF?': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

I am certainly sympathetic to the idea that the "?" is ugly and
potentially confusing. But I think it's at least a step forward for this
particular example.

I'll snip liberally from the rest of your response, because I think what
JSixt wrote covers it.

> > > While at it, let's lose the unnecessary curly braces.
> > 
> > Please don't. Obviously C treats the "if/else" as a single unit, but
> > IMHO it's less error-prone to include the braces any time there are
> > multiple visual lines. E.g., something like:
> > 
> >   while (foo)
> > 	if (bar)
> > 		one();
> > 	else
> > 		two();
> > 	three();
> > 
> > is much easier to spot as wrong when you would require braces either
> > way (and not relevant here, but I'd say that even an inner block with a
> > comment deserves braces for the same reason).
> 
> There is no documentation about the preferred coding style.

Documentation/CodingGuidelines says:

 - We avoid using braces unnecessarily.  I.e.

        if (bla) {
                x = 1;
        }

   is frowned upon.  A gray area is when the statement extends
   over a few lines, and/or you have a lengthy comment atop of
   it.  Also, like in the Linux kernel, if there is a long list
   of "else if" statements, it can make sense to add braces to
   single line blocks.

I think this is pretty clearly the "gray area" mentioned there. Which
yes, does not say "definitely do it this way", but I hope makes it clear
that you're supposed to use judgement about readability.

-Peff

^ permalink raw reply

* [PATCH] CodingGuidelines: clarify multi-line brace style
From: Jeff King @ 2017-01-16 22:08 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Sixt, Junio C Hamano, git
In-Reply-To: <20170116220014.bwi5xi2br56lyqsw@sigill.intra.peff.net>

On Mon, Jan 16, 2017 at 05:00:14PM -0500, Jeff King wrote:

> > > Please don't. Obviously C treats the "if/else" as a single unit, but
> > > IMHO it's less error-prone to include the braces any time there are
> > > multiple visual lines. E.g., something like:
> > > 
> > >   while (foo)
> > > 	if (bar)
> > > 		one();
> > > 	else
> > > 		two();
> > > 	three();
> > > 
> > > is much easier to spot as wrong when you would require braces either
> > > way (and not relevant here, but I'd say that even an inner block with a
> > > comment deserves braces for the same reason).
> > 
> > There is no documentation about the preferred coding style.
> 
> Documentation/CodingGuidelines says:
> 
>  - We avoid using braces unnecessarily.  I.e.
> 
>         if (bla) {
>                 x = 1;
>         }
> 
>    is frowned upon.  A gray area is when the statement extends
>    over a few lines, and/or you have a lengthy comment atop of
>    it.  Also, like in the Linux kernel, if there is a long list
>    of "else if" statements, it can make sense to add braces to
>    single line blocks.
> 
> I think this is pretty clearly the "gray area" mentioned there. Which
> yes, does not say "definitely do it this way", but I hope makes it clear
> that you're supposed to use judgement about readability.

So here's a patch.

I know we've usually tried to keep this file to guidelines and not
rules, but clearly it has not been clear-cut enough in this instance.

-- >8 --
Subject: [PATCH] CodingGuidelines: clarify multi-line brace style

There are some "gray areas" around when to omit braces from
a conditional or loop body. Since that seems to have
resulted in some arguments, let's be a little more clear
about our preferred style.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/CodingGuidelines | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 4cd95da6b..0e336e99d 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -206,11 +206,38 @@ For C programs:
 		x = 1;
 	}
 
-   is frowned upon.  A gray area is when the statement extends
-   over a few lines, and/or you have a lengthy comment atop of
-   it.  Also, like in the Linux kernel, if there is a long list
-   of "else if" statements, it can make sense to add braces to
-   single line blocks.
+   is frowned upon. But there are a few exceptions:
+
+	- When the statement extends over a few lines (e.g., a while loop
+	  with an embedded conditional, or a comment). E.g.:
+
+		while (foo) {
+			if (x)
+				one();
+			else
+				two();
+		}
+
+		if (foo) {
+			/*
+			 * This one requires some explanation,
+			 * so we're better off with braces to make
+			 * it obvious that the indentation is correct.
+			 */
+			doit();
+		}
+
+	- When there are multiple arms to a conditional, it can make
+	  sense to add braces to single line blocks for consistency.
+	  E.g.:
+
+		if (foo) {
+			doit();
+		} else {
+			one();
+			two();
+			three();
+		}
 
  - We try to avoid assignments in the condition of an "if" statement.
 
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related

* Re: [RFC] stash: support filename argument
From: Thomas Gummerer @ 2017-01-16 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, kes-kes
In-Reply-To: <xmqqvatfc0rt.fsf@gitster.mtv.corp.google.com>

On 01/15, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > While working on a repository, it's often helpful to stash the changes
> > of a single or multiple files, and leave others alone.  Unfortunately
> > git currently offers no such option.  git stash -p can be used to work
> > around this, but it's often impractical when there are a lot of changes
> > over multiple files.
> >
> > Add a --file option to git stash save, which allows for stashing a
> > single file.  Specifying the --file argument multiple times allows
> > stashing more than one file at a time.
> >
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >
> > Marked as RFC and without documentation updates to first get a feeling
> > for the user interface, and whether people are interested in this
> > change.
> >
> > Ideally I wanted the the user interface to look like something like:
> > git stash save -- [<filename1,...>], but unfortunately that's already
> > taken up by the stash message.  So to preserve backward compatibility
> > I used the new --file argument.
> 
> I haven't spent enough time to think if it even makes sense to
> "stash" partially, leaving the working tree still dirty.  My initial
> reaction was "then stashing away the dirty WIP state to get a spiffy
> clean working environment becomes impossible", and I still need time
> to recover from that ;-)  So as to the desirablity of this "feature",
> I have no strong opinion for or against yet.

As others mentioned, this would be similar to stash -p.  My main
usecase is stashing away a config file or changes in one test file
while I forget to test something without these changes.  I definitely
used git stash -p on more than one occasion to simulate this behaviour
before.

> But if we were to do this, then we should bite the bullet and
> declare that "stash save <message>" was a mistake.  It should have
> been "stash save -m <message>" and we should transition to that (one
> easy way out would be to find another verb that is not 'save').
>
> Then we can do "git stash save [-m <msg>] [--] [pathspec...]" which
> follows the usual command line convention.

I like this interface much better than the one in my patch.  I'm not
sure about the verb we could use for the transition, maybe 'push'.  If
anyone can come up with a different suggestion I'd be happy to hear
it, as I'm not very good at naming things.

-- 
Thomas

^ permalink raw reply

* [BUG/RFC] Git Gui: GIT_DIR leaks to spawned Git Bash
From: Max Kirillov @ 2017-01-16 22:40 UTC (permalink / raw)
  To: Pat Thoyts; +Cc: git, Junio C Hamano, Johannes Schindelin

Hello.

Apparently various GIT_* environment variables (most
interesting is GIT_DIR but AFAIR there were more) leak to
shell session launched from Git Gui's "Git Bash" menu item.
Which can cause unexpected results if user navigates in that
shell to some other repository (in can also include starting
another terminal window etc. to make user forget original
context) and tries to do something there.

I think is would be appropriate to clean up the environment
while starting any program from git gui. It may have
downside that the environment could be set intentionally,
and now is's removed, but (1) there seems to be no way to
distinguish intentionally vs automatically set GIT_DIR, and
(2) the spawned program may be intended to be used
independly even if the environment was set intentionally for
git gui.

But I would like to note that I find it wrong for the "git"
dispatcher to set up any enrironment to the spawned
command-specific executable. Because it mixes the concerns
of that variables - are they user-settable variables to
tweak the git's behavior, or internal ones to transport the
discovered paths and other info to scripts which cannot do
the standard discovery? Not clear. There has been bugs
related to that already, for example [1].

[1] https://public-inbox.org/git/1409784120-2228-1-git-send-email-max@max630.net/

-- 
Max

^ permalink raw reply

* [RFH - Tcl/Tk] use of procedure before declaration?
From: Philip Oakley @ 2017-01-16 22:44 UTC (permalink / raw)
  To: Git List; +Cc: Pat Thoyts

I'm looking into a user git-gui problem
(https://github.com/git-for-windows/git/issues/1014) that I'd seen in the
past - I'd started some patches back in Dec 2015
http://public-inbox.org/git/1450310287-4936-1-git-send-email-philipoakley@iee.org/

I'm trying to make sure I have covered the corner cases correctly, and I'm
not sure if the current code actually works as advertised.

In
https://github.com/git/git/blob/master/git-gui/lib/choose_repository.tcl#L242
the procedure `_unset_recentrepo` is called, however the procedure isn't
declared until line 248. My reading of the various Tcl tutorials suggest
(but not explictly) that this isn't the right way.

Should 3c6a287 ("git-gui: Keep repo_config(gui.recentrepos) and .gitconfig
in sync", 2010-01-23) have declared `proc _unset_recentrepo {p}` before
`proc _get_recentrepos {}` ?

--

Philip

^ permalink raw reply

* HEAD's reflog entry for a renamed branch
From: Kyle Meyer @ 2017-01-16 23:17 UTC (permalink / raw)
  To: git

Hello,

I noticed that, after renaming the current branch, the corresponding
message in .git/logs/HEAD is empty.

For example, running

    $ mkdir test-repo
    $ cd test-repo
    $ git init
    $ echo abc >file.txt
    $ git add file.txt
    $ git commit -m"Add file.txt"
    $ git branch -m master new-master

resulted in the following reflogs:

   $ cat .git/logs/refs/heads/new-master
   00000... 68730... Kyle Meyer <kyle@kyleam.com> 1484607020 -0500	commit (initial): Add file.txt
   68730... 68730... Kyle Meyer <kyle@kyleam.com> 1484607020 -0500	Branch: renamed refs/heads/master to refs/heads/new-master

   $ cat .git/logs/HEAD
   00000... 68730... Kyle Meyer <kyle@kyleam.com> 1484607020 -0500	commit (initial): Add file.txt
   68730... 00000... Kyle Meyer <kyle@kyleam.com> 1484607020 -0500

I expected the second line of .git/logs/HEAD to mirror the second line
of .git/logs/refs/heads/new-master.  Are the empty message and null sha1
in HEAD's entry intentional?

--
Kyle

^ permalink raw reply

* Re: [PATCH 0/6] fsck --connectivity-check misses some corruption
From: Jeff King @ 2017-01-16 23:26 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

On Mon, Jan 16, 2017 at 04:22:31PM -0500, Jeff King wrote:

> I came across a repository today that was missing an object, and for
> which "git fsck" reported the error but "git fsck --connectivity-check"
> did not. It turns out that the shortcut taken by --connectivity-check
> violates some assumptions made by the rest of fsck (namely, that every
> object in the repo has a "struct object" loaded).
> 
> And fsck being a generally neglected tool, I couldn't help but find
> several more bugs on the way. :)
> 
>   [1/6]: t1450: clean up sub-objects in duplicate-entry test
>   [2/6]: fsck: report trees as dangling
>   [3/6]: fsck: prepare dummy objects for --connectivity-check
>   [4/6]: fsck: tighten error-checks of "git fsck <head>"
>   [5/6]: fsck: do not fallback "git fsck <bogus>" to "git fsck"
>   [6/6]: fsck: check HAS_OBJ more consistently
> 
>  builtin/fsck.c  | 131 ++++++++++++++++++++++++++++++++++++++++++++------------
>  t/t1450-fsck.sh |  70 ++++++++++++++++++++++++++++--
>  2 files changed, 171 insertions(+), 30 deletions(-)

Oh, one thing I forgot to mention: I tried test-merging this with my
other recent topic in the area, jk/loose-object-fsck. There are a few
conflicts in the test script, but nothing hard to resolve (just both of
them adding tests at the end, but both are careful to clean up after
themselves).

-Peff

^ permalink raw reply

* Re: [RFC] stash: support filename argument
From: Jeff King @ 2017-01-16 23:49 UTC (permalink / raw)
  To: Stephan Beyer; +Cc: Junio C Hamano, Thomas Gummerer, git, kes-kes
In-Reply-To: <c0d46a97-b1c0-d9c9-e475-28e0368ac61f@gmx.net>

On Mon, Jan 16, 2017 at 01:41:02AM +0100, Stephan Beyer wrote:

> However, going further, the next feature to be requested could be "git
> stash --patch" ;D This leads me to think that the mentioned simulation
> of partial stashes might be something everyone might come up with who
> has basic understanding of git features and "git stash", and might be
> the more flexible and probably better solution to the problem.

Don't we have "git stash -p" already?[1]

I use it all the time, and it's very handy for picking apart changes. I
have often wanted "git stash -p <paths>" to work, though.

-Peff

[1] I had to double check that it was not something I hacked together
    locally and forgot about. :) It's from dda1f2a5c (Implement 'git
    stash save --patch', 2009-08-13). I also worked up a "git stash
    apply -p", but I remember it was buggy, and I haven't looked at it
    in years. It's at:

      https://github.com/peff/git/commit/2c4ed43987b20cfb1605fbd7648d81e454c45c71

    if anybody is curious. I don't even recall what the problems were at
    this point.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox