Git development

Git development
 help / color / mirror / Atom feed

* [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check
From: Jeff King @ 2017-01-16 21:32 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.

Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.

However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:

  1. mark_object() will not queue objects for examination,
     so recursively following links from commits to trees,
     etc, did nothing. I.e., we were checking the
     reachability of hardly anything at all.

  2. When a set of heads is given on the command-line, we
     use lookup_object() to see if they exist. But without
     the initial pass, we assume nothing exists.

  3. When loading reflog entries, we do a similar
     lookup_object() check, and complain that the reflog is
     broken if the object doesn't exist in our hash.

So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.

One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).

Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.

And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.

One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.

Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.

That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).

Signed-off-by: Jeff King <peff@peff.net>
---
The cmd_fsck() part of the diff is pretty nasty without
"-b".

I had originally planned to pull the "stop calling
verify_pack()" bit out into its own patch. But doing this
without treating loose and packed objects the same raises a
lot of questions about why one and not the other. It seemed
simpler to just explain it all together.

 builtin/fsck.c  | 118 +++++++++++++++++++++++++++++++++++++++++++++-----------
 t/t1450-fsck.sh |  23 ++++++++++-
 2 files changed, 117 insertions(+), 24 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3e67203f9..f527d8a02 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -205,8 +205,6 @@ static void check_reachable_object(struct object *obj)
 	if (!(obj->flags & HAS_OBJ)) {
 		if (has_sha1_pack(obj->oid.hash))
 			return; /* it is in pack - forget about it */
-		if (connectivity_only && has_object_file(&obj->oid))
-			return;
 		printf("missing %s %s\n", typename(obj->type),
 			describe_object(obj));
 		errors_found |= ERROR_REACHABLE;
@@ -584,6 +582,79 @@ static int fsck_cache_tree(struct cache_tree *it)
 	return err;
 }

+static void mark_object_for_connectivity(const unsigned char *sha1)
+{
+	struct object *obj = lookup_object(sha1);
+
+	/*
+	 * Setting the object type here isn't strictly necessary here for a
+	 * connectivity check. In most cases, our walk will expect a certain
+	 * type (e.g., a tree referencing a blob) and will use lookup_blob() to
+	 * assign the type. But doing it here has two advantages:
+	 *
+	 *   1. When the fsck_walk code looks at objects that _don't_ come from
+	 *      links (e.g., the tip of a ref), it may complain about the
+	 *      "unknown object type".
+	 *
+	 *   2. This serves as a nice cross-check that the graph links are
+	 *      sane. So --connectivity-only does not check that the bits of
+	 *      blobs are not corrupted, but it _does_ check that 100644 tree
+	 *      entries point to blobs, and so forth.
+	 *
+	 * Unfortunately we can't just use parse_object() here, because the
+	 * whole point of --connectivity-only is to avoid reading the object
+	 * data more than necessary.
+	 */
+	if (!obj || obj->type == OBJ_NONE) {
+		enum object_type type = sha1_object_info(sha1, NULL);
+		switch (type) {
+		case OBJ_BAD:
+			error("%s: unable to read object type",
+			      sha1_to_hex(sha1));
+			break;
+		case OBJ_COMMIT:
+			obj = (struct object *)lookup_commit(sha1);
+			break;
+		case OBJ_TREE:
+			obj = (struct object *)lookup_tree(sha1);
+			break;
+		case OBJ_BLOB:
+			obj = (struct object *)lookup_blob(sha1);
+			break;
+		case OBJ_TAG:
+			obj = (struct object *)lookup_tag(sha1);
+			break;
+		default:
+			error("%s: unknown object type %d",
+			      sha1_to_hex(sha1), type);
+		}
+
+		if (!obj || obj->type == OBJ_NONE) {
+			errors_found |= ERROR_OBJECT;
+			return;
+		}
+	}
+
+	obj->flags |= HAS_OBJ;
+}
+
+static int mark_loose_for_connectivity(const unsigned char *sha1,
+				       const char *path,
+				       void *data)
+{
+	mark_object_for_connectivity(sha1);
+	return 0;
+}
+
+static int mark_packed_for_connectivity(const unsigned char *sha1,
+					struct packed_git *pack,
+					uint32_t pos,
+					void *data)
+{
+	mark_object_for_connectivity(sha1);
+	return 0;
+}
+
 static char const * const fsck_usage[] = {
 	N_("git fsck [<options>] [<object>...]"),
 	NULL
@@ -646,32 +717,35 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		prepare_alt_odb();
 		for (alt = alt_odb_list; alt; alt = alt->next)
 			fsck_object_dir(alt->path);
-	}

-	if (check_full) {
-		struct packed_git *p;
-		uint32_t total = 0, count = 0;
-		struct progress *progress = NULL;
+		if (check_full) {
+			struct packed_git *p;
+			uint32_t total = 0, count = 0;
+			struct progress *progress = NULL;
+
+			prepare_packed_git();

-		prepare_packed_git();
+			if (show_progress) {
+				for (p = packed_git; p; p = p->next) {
+					if (open_pack_index(p))
+						continue;
+					total += p->num_objects;
+				}

-		if (show_progress) {
+				progress = start_progress(_("Checking objects"), total);
+			}
 			for (p = packed_git; p; p = p->next) {
-				if (open_pack_index(p))
-					continue;
-				total += p->num_objects;
+				/* verify gives error messages itself */
+				if (verify_pack(p, fsck_obj_buffer,
+						progress, count))
+					errors_found |= ERROR_PACK;
+				count += p->num_objects;
 			}
-
-			progress = start_progress(_("Checking objects"), total);
-		}
-		for (p = packed_git; p; p = p->next) {
-			/* verify gives error messages itself */
-			if (verify_pack(p, fsck_obj_buffer,
-					progress, count))
-				errors_found |= ERROR_PACK;
-			count += p->num_objects;
+			stop_progress(&progress);
 		}
-		stop_progress(&progress);
+	} else {
+		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
+		for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
 	}

 	heads = 0;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e88ec7747..c1b2dda33 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' '
 		touch empty &&
 		git add empty &&
 		test_commit empty &&
+
+		# Drop the index now; we want to be sure that we
+		# recursively notice that we notice the broken objects
+		# because they are reachable from refs, not because
+		# they are in the index.
+		rm -f .git/index &&
+
+		# corrupt the blob, but in a way that we can still identify
+		# its type. That lets us see that --connectivity-only is
+		# not actually looking at the contents, but leaves it
+		# free to examine the type if it chooses.
 		empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 &&
-		rm -f $empty &&
-		echo invalid >$empty &&
+		blob=$(echo unrelated | git hash-object -w --stdin) &&
+		mv $(sha1_file $blob) $empty &&
+
 		test_must_fail git fsck --strict &&
 		git fsck --strict --connectivity-only &&
 		tree=$(git rev-parse HEAD:) &&
@@ -537,6 +549,13 @@ test_expect_success 'fsck --connectivity-only' '
 	)
 '

+test_expect_success 'fsck --connectivity-only with explicit head' '
+	(
+		cd connectivity-only &&
+		test_must_fail git fsck --connectivity-only $tree
+	)
+'
+
 remove_loose_object () {
 	sha1="$(git rev-parse "$1")" &&
 	remainder=${sha1#??} &&
-- 
2.11.0.642.gd6f8cda6c

^ permalink raw reply related

* [PATCH 6/6] fsck: check HAS_OBJ more consistently
From: Jeff King @ 2017-01-16 21:34 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

There are two spots that call lookup_object() and assume
that a non-NULL result means we have the object:

  1. When we're checking the objects given to us by the user
     on the command line.

  2. When we're checking if a reflog entry is valid.

This generally follows fsck's mental model that we will have
looked at and loaded a "struct object" for each object in
the repository. But it misses one case: if another object
_mentioned_ an object, but we didn't actually parse it or
verify that it exists, it will still have a struct.

It's not clear if this is a triggerable bug or not.
Certainly the later parts of the reachability check need to
be careful of this, and do so by checking the HAS_OBJ flag.
But both of these steps happen before we start traversing,
so probably we won't have followed any links yet. Still,
it's easy enough to be defensive here.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 8ae065b2d..28d3cbb14 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -398,7 +398,7 @@ static void fsck_handle_reflog_sha1(const char *refname, unsigned char *sha1,

 	if (!is_null_sha1(sha1)) {
 		obj = lookup_object(sha1);
-		if (obj) {
+		if (obj && (obj->flags & HAS_OBJ)) {
 			if (timestamp && name_objects)
 				add_decoration(fsck_walk_options.object_names,
 					obj,
@@ -755,7 +755,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		if (!get_sha1(arg, sha1)) {
 			struct object *obj = lookup_object(sha1);

-			if (!obj) {
+			if (!obj || !(obj->flags & HAS_OBJ)) {
 				error("%s: object missing", sha1_to_hex(sha1));
 				errors_found |= ERROR_OBJECT;
 				continue;
-- 
2.11.0.642.gd6f8cda6c

^ permalink raw reply related

* [PATCH 5/6] fsck: do not fallback "git fsck <bogus>" to "git fsck"
From: Jeff King @ 2017-01-16 21:34 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

Since fsck tries to continue as much as it can after seeing
an error, we still do the reachability check even if some
heads we were given on the command-line are bogus. But if
_none_ of the heads is is valid, we fallback to checking all
refs and the index, which is not what the user asked for at
all.

Instead of checking "heads", the number of successful heads
we got, check "argc" (which we know only has non-options in
it, because parse_options removed the others).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  |  2 +-
 t/t1450-fsck.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index c7d0590e5..8ae065b2d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -778,7 +778,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	 * default ones from .git/refs. We also consider the index file
 	 * in this case (ie this implies --cache).
 	 */
-	if (!heads) {
+	if (!argc) {
 		get_default_heads();
 		keep_cache_objects = 1;
 	}
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 2f3b05276..96b74dc9a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -610,4 +610,15 @@ test_expect_success 'fsck $name notices bogus $name' '
 	test_must_fail git fsck $_z40
 '
 
+test_expect_success 'bogus head does not fallback to all heads' '
+	# set up a case that will cause a reachability complaint
+	echo to-be-deleted >foo &&
+	git add foo &&
+	blob=$(git rev-parse :foo) &&
+	test_when_finished "git rm --cached foo" &&
+	remove_object $blob &&
+	test_must_fail git fsck $_z40 >out 2>&1 &&
+	! grep $blob out
+'
+
 test_done
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related

* [PATCH 2/6] fsck: report trees as dangling
From: Jeff King @ 2017-01-16 21:25 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

After checking connectivity, fsck looks through the list of
any objects we've seen mentioned, and reports unreachable
and un-"used" ones as dangling. However, it skips any object
which is not marked as "parsed", as that is an object that
we _don't_ have (but that somebody mentioned).

Since 6e454b9a3 (clear parsed flag when we free tree
buffers, 2013-06-05), that flag can't be relied on, and the
correct method is to check the HAS_OBJ flag. The cleanup in
that commit missed this callsite, though. As a result, we
would generally fail to report dangling trees.

We never noticed because there were no tests in this area
(for trees or otherwise). Let's add some.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  |  2 +-
 t/t1450-fsck.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index f01b81eeb..3e67203f9 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -225,7 +225,7 @@ static void check_unreachable_object(struct object *obj)
 	 * to complain about it being unreachable (since it does
 	 * not exist).
 	 */
-	if (!obj->parsed)
+	if (!(obj->flags & HAS_OBJ))
 		return;
 
 	/*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 6eef8b28e..e88ec7747 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -559,4 +559,31 @@ test_expect_success 'fsck --name-objects' '
 	)
 '
 
+# for each of type, we have one version which is referenced by another object
+# (and so while unreachable, not dangling), and another variant which really is
+# dangling.
+test_expect_success 'fsck notices dangling objects' '
+	git init dangling &&
+	(
+		cd dangling &&
+		blob=$(echo not-dangling | git hash-object -w --stdin) &&
+		dblob=$(echo dangling | git hash-object -w --stdin) &&
+		tree=$(printf "100644 blob %s\t%s\n" $blob one | git mktree) &&
+		dtree=$(printf "100644 blob %s\t%s\n" $blob two | git mktree) &&
+		commit=$(git commit-tree $tree) &&
+		dcommit=$(git commit-tree -p $commit $tree) &&
+
+		cat >expect <<-EOF &&
+		dangling blob $dblob
+		dangling commit $dcommit
+		dangling tree $dtree
+		EOF
+
+		git fsck >actual &&
+		# the output order is non-deterministic, as it comes from a hash
+		sort <actual >actual.sorted &&
+		test_cmp expect actual.sorted
+	)
+'
+
 test_done
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related

* [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test
From: Jeff King @ 2017-01-16 21:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin
In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net>

This test creates a multi-level set of trees, but its
cleanup routine only removes the top-level tree. After the
test finishes, the inner tree and the blob it points to
remain, making the inner tree dangling.

A later test ("cleaned up") verifies that we've removed any
cruft and "git fsck" output is clean. This passes only
because of a bug in git-fsck which fails to notice dangling
trees.

In preparation for fixing the bug, let's teach this earlier
test to clean up after itself correctly. We have to remove
the inner tree (and therefore the blob, too, which becomes
dangling after removing that tree).

Since the setup code happens inside a subshell, we can't
just set a variable for each object. However, we can stuff
all of the sha1s into the $T output variable, which is not
used for anything except cleanup.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1450-fsck.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index ee7d4736d..6eef8b28e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -189,14 +189,16 @@ test_expect_success 'commit with NUL in header' '
 '

 test_expect_success 'tree object with duplicate entries' '
-	test_when_finished "remove_object \$T" &&
+	test_when_finished "for i in \$T; do remove_object \$i; done" &&
 	T=$(
 		GIT_INDEX_FILE=test-index &&
 		export GIT_INDEX_FILE &&
 		rm -f test-index &&
 		>x &&
 		git add x &&
+		git rev-parse :x &&
 		T=$(git write-tree) &&
+		echo $T &&
 		(
 			git cat-file tree $T &&
 			git cat-file tree $T
-- 
2.11.0.642.gd6f8cda6c

^ permalink raw reply related

* [PATCH 0/6] fsck --connectivity-check misses some corruption
From: Jeff King @ 2017-01-16 21:22 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

I came across a repository today that was missing an object, and for
which "git fsck" reported the error but "git fsck --connectivity-check"
did not. It turns out that the shortcut taken by --connectivity-check
violates some assumptions made by the rest of fsck (namely, that every
object in the repo has a "struct object" loaded).

And fsck being a generally neglected tool, I couldn't help but find
several more bugs on the way. :)

  [1/6]: t1450: clean up sub-objects in duplicate-entry test
  [2/6]: fsck: report trees as dangling
  [3/6]: fsck: prepare dummy objects for --connectivity-check
  [4/6]: fsck: tighten error-checks of "git fsck <head>"
  [5/6]: fsck: do not fallback "git fsck <bogus>" to "git fsck"
  [6/6]: fsck: check HAS_OBJ more consistently

 builtin/fsck.c  | 131 ++++++++++++++++++++++++++++++++++++++++++++------------
 t/t1450-fsck.sh |  70 ++++++++++++++++++++++++++++--
 2 files changed, 171 insertions(+), 30 deletions(-)

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Sixt @ 2017-01-16 20:33 UTC (permalink / raw)
  To: Johannes Schindelin, Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <alpine.DEB.2.20.1701161746200.3469@virtualbox>

Am 16.01.2017 um 18:06 schrieb Johannes Schindelin:
> On Mon, 16 Jan 2017, Jeff King wrote:
>> Hmm.  I am not sure to what degree CRLFs are actually a problem here.
>> Keep in mind these are error messages generated via error(), and so not
>> processing arbitrary data. I can imagine that CRs might come from:
>
> Please note the regression test I added. It uses rev-parse's --abbrev-ref
> option which quotes the argument when erroring out. This argument then
> gets munged.
>
> So error() (or in this case, die()) *very much* processes arbitrary data.
>
> I *know* that rev-parse --abbrev-ref is an artificial example, it is
> highly unlikely that anybody will use
>
> 	git rev-parse --abbrev-ref "$(<call an external program here that
> 		generates CR/LF line endings>)"
>
> However, there are plenty other cases in regular Git usage where arguments
> are generated by external programs to which we have no business dictating a
> specific line ending style.

However, Jeff's patch is intended to catch exactly these cases (not for 
the cases where this happens accidentally, but when they happen with 
malicious intent).

We are talking about user-provided data that is reproduced by die() or 
error(). I daresay that we do not have a single case where it is 
intended that this data is intentionally multi-lined, like a commit 
message. It can only be an accident or malicious when it spans across lines.

I know we allow CR and LF in file names, but in all cases where such a 
name appears in an error message, it is *not important* that the data is 
reproduced exactly. On the contrary, it is usually more helpful to know 
that something strange is going on. The question marks are a strong 
indication to the user for this.

> If you absolutely insist, I will spend time to find a plausible example
> and use that in the regression test.

I don't want to see you on an endeavor with dubious results. I'd prefer 
to wait until the first case of "incorrectly munged data" is reported 
because, as I said, I have a gut feeling that there is none.

>> I am certainly open to loosening the sanitizing for CR to make things
>> work seamlessly on Windows. But I am having trouble imagining a case
>> that is actually negatively impacted.

I came to the same conclusion. I regret having sent out a warning 
message in, well, such a haste(*), without thinking the case through 
first. IMHO, Jeff's patch should be fine as is.

(*) literally; I had to catch a train.

-- Hannes

^ permalink raw reply

* [PATCH 1/2] configure.ac: fix old iconv check
From: Bernd Kuhls @ 2017-01-16 19:56 UTC (permalink / raw)
  To: git; +Cc: Bernd Kuhls

According to
https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Running-the-Compiler.html
the parameter syntax of AC_COMPILE_IFELSE is

(input, [action-if-true], [action-if-false])

Displaying "no" when the test was positive and enabling support for old
iconv implementations by OLD_ICONV=UnfortunatelyYes when the test fails
it obviously wrong. This patch switches the actions to fix the problem.

Signed-off-by: Bernd Kuhls <bernd.kuhls@writeme.com>
---
 configure.ac | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 0b15f04b1..63e71a472 100644
--- a/configure.ac
+++ b/configure.ac
@@ -759,9 +759,9 @@ GIT_STASH_FLAGS($ICONVDIR)
 
 AC_MSG_CHECKING([for old iconv()])
 AC_COMPILE_IFELSE([OLDICONVTEST_SRC],
-	[AC_MSG_RESULT([no])],
 	[AC_MSG_RESULT([yes])
-	OLD_ICONV=UnfortunatelyYes])
+	OLD_ICONV=UnfortunatelyYes],
+	[AC_MSG_RESULT([no])])
 
 GIT_UNSTASH_FLAGS($ICONVDIR)
 
-- 
2.11.0


^ permalink raw reply related

* [PATCH 2/2] configure.ac: Fix --without-iconv
From: Bernd Kuhls @ 2017-01-16 19:56 UTC (permalink / raw)
  To: git; +Cc: Bernd Kuhls
In-Reply-To: <20170116195638.3713-1-bernd.kuhls@writeme.com>

GIT_PARSE_WITH(iconv)) sets NO_ICONV=YesPlease in
https://github.com/git/git/blob/maint/configure.ac#L327

But the command GIT_CONF_SUBST([NO_ICONV]) in
https://github.com/git/git/blob/maint/configure.ac#L618

is only executed when NO_ICONV is an empty variable
https://github.com/git/git/blob/maint/configure.ac#L578

which has the effect that NO_ICONV=YesPlease is not written to
config.mak.autogen which breaks compilation in systems without iconv.

Signed-off-by: Bernd Kuhls <bernd.kuhls@writeme.com>
---
 configure.ac | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/configure.ac b/configure.ac
index 63e71a472..419469315 100644
--- a/configure.ac
+++ b/configure.ac
@@ -614,15 +614,15 @@ LIBS="$old_LIBS"
 
 GIT_UNSTASH_FLAGS($ICONVDIR)
 
-GIT_CONF_SUBST([NEEDS_LIBICONV])
-GIT_CONF_SUBST([NO_ICONV])
-
 if test -n "$NO_ICONV"; then
     NEEDS_LIBICONV=
 fi
 
 fi
 
+GIT_CONF_SUBST([NEEDS_LIBICONV])
+GIT_CONF_SUBST([NO_ICONV])
+
 #
 # Define NO_DEFLATE_BOUND if deflateBound is missing from zlib.
 
-- 
2.11.0


^ permalink raw reply related

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Schindelin @ 2017-01-16 17:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Junio C Hamano, git
In-Reply-To: <20170116160456.ltbb7ofe47xos7xo@sigill.intra.peff.net>

Hi Peff,

On Mon, 16 Jan 2017, Jeff King wrote:

> On Mon, Jan 16, 2017 at 01:52:39PM +0100, Johannes Schindelin wrote:
> 
> > > >  An error message with an ASCII control character like '\r' in it
> > > >  can alter the message to hide its early part, which is problematic
> > > >  when a remote side gives such an error message that the local side
> > > >  will relay with a "remote: " prefix.
> > > >
> > > >  Will merge to 'next'.
> > > 
> > > Please be not too hasty with advancing this topic to master. I could imagine
> > > that there is some unwanted fallout on systems where the end-of-line marker
> > > CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> > > expect regressions in *my* typical workflow.
> > 
> > Thank you for being so attentive.
> > 
> > This topic branch would indeed have caused problems. Worse: it would have
> > caused problems that are not covered by our test suite, as Git for
> > Windows' own utilities do not generate CR/LF line endings. So this
> > regression would have bit our users. Nasty.
> 
> Hmm.  I am not sure to what degree CRLFs are actually a problem here.
> Keep in mind these are error messages generated via error(), and so not
> processing arbitrary data. I can imagine that CRs might come from:

Please note the regression test I added. It uses rev-parse's --abbrev-ref
option which quotes the argument when erroring out. This argument then
gets munged.

So error() (or in this case, die()) *very much* processes arbitrary data.

I *know* that rev-parse --abbrev-ref is an artificial example, it is
highly unlikely that anybody will use

	git rev-parse --abbrev-ref "$(<call an external program here that
		generates CR/LF line endings>)"

However, there are plenty other cases in regular Git usage where arguments
are generated by external programs to which we have no business dictating a
specific line ending style.

If you absolutely insist, I will spend time to find a plausible example
and use that in the regression test. However, that would not really
improve anything, as the purpose of the regression test is simply to
demonstrate that a user-provided argument's CR/LF does not get munged
incorrectly. And the test I provided serves that purpose perfectly.

>   1. A parameter like a filename or revision name. If I ask for a
>      rev-parse of "foo\r\n", then it's probably useful to mention the
>      "\r" in the error message, rather than ignoring it (or converting
>      it to a line-feed).
> 
>      And I think that would apply to any input parameter we show via
>      error(), etc, if it is connected to a newline (ideally we would
>      show newlines as "?", too, but we cannot tell the difference
>      between ones from parameters, and ones that are part of the error
>      message).

I think it is doing users a really great disservice to munge up CR or LF
into question marks. I *guarantee* you that it confuses users. And not
because they are dumb, but because the code violates the Law of Least
Surprise.

> I am certainly open to loosening the sanitizing for CR to make things
> work seamlessly on Windows. But I am having trouble imagining a case
> that is actually negatively impacted.

Git accepts all kinds of arguments, not just file names. It is totally
legitimate, and you probably can show use cases of that yourself, to
generate those arguments by other programs.

These programs can generate CR/LF line endings.

While well-intentioned, your changes could make things even unreadable.
Certainly confusing.

> > -- snipsnap --
> > Subject: [PATCH] fixup! vreport: sanitize ASCII control chars
> 
> Given the subtlety here, I'd much rather have a patch on top.

Fine.

> > The original patch is incorrect, as it turns carriage returns into
> > question marks.
> > 
> > However, carriage returns should be left alone when preceding a line feed,
> > and simply turned into line feeds otherwise.
> 
> The question of whether to leave CRLFs alone is addressed above. But I
> do not understand why you'd want a lone CR to be converted to a line
> feed. Running:
> 
>   git rev-parse --abbrev-ref "$(printf "foo\r")"
> 
> with my patch gets you:
> 
>   fatal: ambiguous argument 'foo?': unknown revision or path not in the working tree.
> 
> But with yours:
> 
>   fatal: ambiguous argument 'foo
>   ': unknown revision or path not in the working tree.
> 
> Obviously the "?" thing is a lossy transformation,

s/lossy/confusing/

Probably not to you, because you came up with the transformation, but to
literally everybody else.

> but I do not see how a newline is any less confusing (but see below for
> some thoughts).

The more common bug report to be expected would show that multi-line
arguments are converted to ending in question marks. That is not the user
experience for which I am aiming.

> > To make the end result more readable, the logic is inverted so that the
> > common case (no substitution) is handled first.
> > 
> > While at it, let's lose the unnecessary curly braces.
> 
> Please don't. Obviously C treats the "if/else" as a single unit, but
> IMHO it's less error-prone to include the braces any time there are
> multiple visual lines. E.g., something like:
> 
>   while (foo)
> 	if (bar)
> 		one();
> 	else
> 		two();
> 	three();
> 
> is much easier to spot as wrong when you would require braces either
> way (and not relevant here, but I'd say that even an inner block with a
> comment deserves braces for the same reason).

There is no documentation about the preferred coding style.

For years, I saw patches, and provided patches myself, that *avoided*
curly braces when not necessary (in addition to aiming for shorter arms to
come before longer arms in conditionals).

Now all of a sudden Junio *and* you suggest unnecessary curly braces to be
added.

That is inconsistent at best, and confusing. Maybe you two gentle people
can make up your mind and document the final verdict, and for extra
brownie points automate the formatting so that patch reviews are not
dominated by coding style comments that would be better addressed by
machines than by humans.

> > We also add a regression test that verifies that carriage returns are
> > handled properly. And as we expect CR/LF to be handled properly also on
> > platforms other than Windows, this test case is not guarded by the MINGW
> > prerequisite.
> 
> I am not sure what "properly" means here. In your test:
> 
> > +# This test verifies that the error reporting functions handle CR correctly.
> > +# --abbrev-ref is simply used out of convenience, as it reports an error including
> > +# a user-specified argument.
> > +test_expect_success 'error messages treat CR/LF correctly' '
> > +	test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 2>err &&
> > +	grep "^fatal: .*CR/LF$" err
> > +'
> 
> The "\n" is eaten by the shell, and git sees only "CR/LF\r". So we are
> not testing the CRLF case in vreportf() at all.

True. Should be easy to fix, starting from my patch.

> We do end up with "CR/LF\n" in vreportf(), which is presumably converted
> by fprintf() to "CR/LF\r\n" on Windows. And so perhaps that is why you
> are doing the "convert \r to \n" thing above.
> 
> But I still think it's not doing the right thing. Git _didn't_ see CRLF,
> it saw CR.

You know, I don't care. As long as carriage returns are either left alone
(which conflicts specifically with your stated goal) or at least are
handled gracefully when coming before line feeds.

Converting stray carriage returns to question marks is confusing, and of
course I would take the brunt of the bug reports, so I do not look
favorably on that change.

My patch was just a starter, to help with fixing the patch series before
it hits `next`.

Ciao,
Johannes

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Jeff King @ 2017-01-16 16:04 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Sixt, Junio C Hamano, git
In-Reply-To: <alpine.DEB.2.20.1701161251100.3469@virtualbox>

On Mon, Jan 16, 2017 at 01:52:39PM +0100, Johannes Schindelin wrote:

> > >  An error message with an ASCII control character like '\r' in it
> > >  can alter the message to hide its early part, which is problematic
> > >  when a remote side gives such an error message that the local side
> > >  will relay with a "remote: " prefix.
> > >
> > >  Will merge to 'next'.
> > 
> > Please be not too hasty with advancing this topic to master. I could imagine
> > that there is some unwanted fallout on systems where the end-of-line marker
> > CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> > expect regressions in *my* typical workflow.
> 
> Thank you for being so attentive.
> 
> This topic branch would indeed have caused problems. Worse: it would have
> caused problems that are not covered by our test suite, as Git for
> Windows' own utilities do not generate CR/LF line endings. So this
> regression would have bit our users. Nasty.

Hmm.  I am not sure to what degree CRLFs are actually a problem here.
Keep in mind these are error messages generated via error(), and so not
processing arbitrary data. I can imagine that CRs might come from:

  1. A parameter like a filename or revision name. If I ask for a
     rev-parse of "foo\r\n", then it's probably useful to mention the
     "\r" in the error message, rather than ignoring it (or converting
     it to a line-feed).

     And I think that would apply to any input parameter we show via
     error(), etc, if it is connected to a newline (ideally we would
     show newlines as "?", too, but we cannot tell the difference
     between ones from parameters, and ones that are part of the error
     message).

  2. The printf-fmt strings themselves. But these come from C code,
     which just uses "\n". My impression is that it is fprintf() which
     is responsible for converting that to "\r\n". And we are doing our
     sanitizing here between an snprintf(), and an fprintf() of the
     result. So it should see only the raw "\n" fields.

I am certainly open to loosening the sanitizing for CR to make things
work seamlessly on Windows. But I am having trouble imagining a case
that is actually negatively impacted.

> -- snipsnap --
> Subject: [PATCH] fixup! vreport: sanitize ASCII control chars

Given the subtlety here, I'd much rather have a patch on top.

> The original patch is incorrect, as it turns carriage returns into
> question marks.
> 
> However, carriage returns should be left alone when preceding a line feed,
> and simply turned into line feeds otherwise.

The question of whether to leave CRLFs alone is addressed above. But I
do not understand why you'd want a lone CR to be converted to a line
feed. Running:

  git rev-parse --abbrev-ref "$(printf "foo\r")"

with my patch gets you:

  fatal: ambiguous argument 'foo?': unknown revision or path not in the working tree.

But with yours:

  fatal: ambiguous argument 'foo
  ': unknown revision or path not in the working tree.

Obviously the "?" thing is a lossy transformation, but I do not see how
a newline is any less confusing (but see below for some thoughts).

> To make the end result more readable, the logic is inverted so that the
> common case (no substitution) is handled first.
> 
> While at it, let's lose the unnecessary curly braces.

Please don't. Obviously C treats the "if/else" as a single unit, but
IMHO it's less error-prone to include the braces any time there are
multiple visual lines. E.g., something like:

  while (foo)
	if (bar)
		one();
	else
		two();
	three();

is much easier to spot as wrong when you would require braces either
way (and not relevant here, but I'd say that even an inner block with a
comment deserves braces for the same reason).

> We also add a regression test that verifies that carriage returns are
> handled properly. And as we expect CR/LF to be handled properly also on
> platforms other than Windows, this test case is not guarded by the MINGW
> prerequisite.

I am not sure what "properly" means here. In your test:

> +# This test verifies that the error reporting functions handle CR correctly.
> +# --abbrev-ref is simply used out of convenience, as it reports an error including
> +# a user-specified argument.
> +test_expect_success 'error messages treat CR/LF correctly' '
> +	test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 2>err &&
> +	grep "^fatal: .*CR/LF$" err
> +'

The "\n" is eaten by the shell, and git sees only "CR/LF\r". So we are
not testing the CRLF case in vreportf() at all.

We do end up with "CR/LF\n" in vreportf(), which is presumably converted
by fprintf() to "CR/LF\r\n" on Windows. And so perhaps that is why you
are doing the "convert \r to \n" thing above.

But I still think it's not doing the right thing. Git _didn't_ see CRLF,
it saw CR.

-Peff

^ permalink raw reply

* Re: [RFC] stash: support filename argument
From: Johannes Schindelin @ 2017-01-16 13:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Gummerer, git, kes-kes
In-Reply-To: <alpine.DEB.2.20.1701161150320.3469@virtualbox>

Hi,

On Mon, 16 Jan 2017, Johannes Schindelin wrote:

> On Sun, 15 Jan 2017, Junio C Hamano wrote:
> 
> > I haven't spent enough time to think if it even makes sense to
> > "stash" partially, leaving the working tree still dirty.
> 
> Think no more! We already support that with --keep-index, and it is a very
> useful feature.

And of course there is `git stash -p`, which is also very useful,
*because* it leaves the working tree still dirty, in the desired way.

Ciao,
Johannes

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Schindelin @ 2017-01-16 12:52 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <257b4175-9879-7814-5d8d-02050792574d@kdbg.org>

Hi Hannes,

On Mon, 16 Jan 2017, Johannes Sixt wrote:

> Am 16.01.2017 um 02:51 schrieb Junio C Hamano:
> > * jk/vreport-sanitize (2017-01-11) 2 commits
> >  - vreport: sanitize ASCII control chars
> >  - Revert "vreportf: avoid intermediate buffer"
> >
> >  An error message with an ASCII control character like '\r' in it
> >  can alter the message to hide its early part, which is problematic
> >  when a remote side gives such an error message that the local side
> >  will relay with a "remote: " prefix.
> >
> >  Will merge to 'next'.
> 
> Please be not too hasty with advancing this topic to master. I could imagine
> that there is some unwanted fallout on systems where the end-of-line marker
> CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> expect regressions in *my* typical workflow.

Thank you for being so attentive.

This topic branch would indeed have caused problems. Worse: it would have
caused problems that are not covered by our test suite, as Git for
Windows' own utilities do not generate CR/LF line endings. So this
regression would have bit our users. Nasty.

Something like this is necessary, at least:

-- snipsnap --
Subject: [PATCH] fixup! vreport: sanitize ASCII control chars

The original patch is incorrect, as it turns carriage returns into
question marks.

However, carriage returns should be left alone when preceding a line feed,
and simply turned into line feeds otherwise.

To make the end result more readable, the logic is inverted so that the
common case (no substitution) is handled first.

While at it, let's lose the unnecessary curly braces.

We also add a regression test that verifies that carriage returns are
handled properly. And as we expect CR/LF to be handled properly also on
platforms other than Windows, this test case is not guarded by the MINGW
prerequisite.

Spotted by Hannes Sixt.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t0000-basic.sh | 8 ++++++++
 usage.c          | 9 ++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index 60811a3a7c..d494cb7c05 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -1063,4 +1063,12 @@ test_expect_success 'very long name in the index handled sanely' '
 	test $len = 4098
 '
 
+# This test verifies that the error reporting functions handle CR correctly.
+# --abbrev-ref is simply used out of convenience, as it reports an error including
+# a user-specified argument.
+test_expect_success 'error messages treat CR/LF correctly' '
+	test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 2>err &&
+	grep "^fatal: .*CR/LF$" err
+'
+
 test_done
diff --git a/usage.c b/usage.c
index 50a6ccee44..71bc7c0329 100644
--- a/usage.c
+++ b/usage.c
@@ -15,10 +15,13 @@ void vreportf(const char *prefix, const char *err, va_list params)
 	char *p;
 
 	vsnprintf(msg, sizeof(msg), err, params);
-	for (p = msg; *p; p++) {
-		if (iscntrl(*p) && *p != '\t' && *p != '\n')
+	for (p = msg; *p; p++)
+		if (!iscntrl(*p) || *p == '\t' || *p == '\n')
+			continue;
+		else if (*p != '\r')
 			*p = '?';
-	}
+		else if (p[1] != '\n')
+			*p = '\n';
 	fprintf(fh, "%s%s\n", prefix, msg);
 }
 
-- 
2.11.0.windows.3


^ permalink raw reply related

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Schindelin @ 2017-01-16 11:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqh94zbwlu.fsf@gitster.mtv.corp.google.com>

Hi Junio,

On Sun, 15 Jan 2017, Junio C Hamano wrote:

> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.

A suggestion: since it is very, very tedious to find the latest
iteration's thread, as well as the corresponding commit in `pu` (or
whatever other branch you use), and since you auto-generate these lists
anyway, it would make things less cumbersome if the commit names of the
tips, as well as the Message-IDs of the cover-letter (or first patch) were
displayed with the topic branches.

It still would not address e.g. the problem that the original authors are
not notified about the current state of their submission by these What's
Cooking mails.

But it would improve the situation.

> * js/difftool-builtin (2017-01-09) 5 commits
>  - t7800: run both builtin and scripted difftool, for now
>  - difftool: implement the functionality in the builtin
>  - difftool: add a skeleton for the upcoming builtin
>  - git_exec_path: do not return the result of getenv()

This patch was not in my patch submission. Sneaking it into this topic
branch is not incorrect.

It does not matter, though, as I will drop the Coverity patches from this
patch series: the conflation of Coverity fixes with the builtin difftool
was a mistake, and I will thereby address that mistake.

Ciao,
Johannes

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Schindelin @ 2017-01-16 11:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqh94zbwlu.fsf@gitster.mtv.corp.google.com>

Hi Junio,

On Sun, 15 Jan 2017, Junio C Hamano wrote:

> * js/prepare-sequencer-more (2017-01-09) 38 commits

I think that it adds confusion rather than value to specifically use a
different branch name than I indicated in my submission, unless there is a
really good reason to do so (which I fail to see here).

>  - sequencer (rebase -i): write out the final message
>  - sequencer (rebase -i): write the progress into files
>  - sequencer (rebase -i): show the progress
>  - sequencer (rebase -i): suggest --edit-todo upon unknown command
>  - sequencer (rebase -i): show only failed cherry-picks' output
>  - sequencer (rebase -i): show only failed `git commit`'s output
>  - sequencer: use run_command() directly
>  - sequencer: make reading author-script more elegant
>  - sequencer (rebase -i): differentiate between comments and 'noop'
>  - sequencer (rebase -i): implement the 'drop' command
>  - sequencer (rebase -i): allow rescheduling commands
>  - sequencer (rebase -i): respect strategy/strategy_opts settings
>  - sequencer (rebase -i): respect the rebase.autostash setting
>  - sequencer (rebase -i): run the post-rewrite hook, if needed
>  - sequencer (rebase -i): record interrupted commits in rewritten, too
>  - sequencer (rebase -i): copy commit notes at end
>  - sequencer (rebase -i): set the reflog message consistently
>  - sequencer (rebase -i): refactor setting the reflog message
>  - sequencer (rebase -i): allow fast-forwarding for edit/reword
>  - sequencer (rebase -i): implement the 'reword' command
>  - sequencer (rebase -i): leave a patch upon error
>  - sequencer (rebase -i): update refs after a successful rebase
>  - sequencer (rebase -i): the todo can be empty when continuing
>  - sequencer (rebase -i): skip some revert/cherry-pick specific code path
>  - sequencer (rebase -i): remove CHERRY_PICK_HEAD when no longer needed
>  - sequencer (rebase -i): allow continuing with staged changes
>  - sequencer (rebase -i): write an author-script file
>  - sequencer (rebase -i): implement the short commands
>  - sequencer (rebase -i): add support for the 'fixup' and 'squash' commands
>  - sequencer (rebase -i): write the 'done' file
>  - sequencer (rebase -i): learn about the 'verbose' mode
>  - sequencer (rebase -i): implement the 'exec' command
>  - sequencer (rebase -i): implement the 'edit' command
>  - sequencer (rebase -i): implement the 'noop' command
>  - sequencer: support a new action: 'interactive rebase'
>  - sequencer: use a helper to find the commit message
>  - sequencer: move "else" keyword onto the same line as preceding brace
>  - sequencer: avoid unnecessary curly braces
> 
>  The sequencer has further been extended in preparation to act as a
>  back-end for "rebase -i".
> 
>  Waiting for review comments to be addressed.

The only outstanding review comments I know about are your objection to
the name of the read_env_script() function (which I shot down as bogus),
and the rather real bug fix I sent out as a fixup! which you may want to
squash in (in the alternative, I can mailbomb v4 of the entire sequencer-i
patch series, that is your choice).

Ciao,
Johannes

^ permalink raw reply

* Re: [RFC] stash --continue
From: Johannes Schindelin @ 2017-01-16 10:54 UTC (permalink / raw)
  To: Stephan Beyer; +Cc: git
In-Reply-To: <cd784a4e-ee99-564e-81de-9f7f6cc26c67@gmx.net>

Hi Stephan,

On Mon, 16 Jan 2017, Stephan Beyer wrote:

> a git-newbie-ish co-worker uses git-stash sometimes. Last time he used
> "git stash pop", he got into a merge conflict. After he resolved the
> conflict, he did not know what to do to get the repository into the
> wanted state. In his case, it was only "git add <resolved files>"
> followed by a "git reset" and a "git stash drop", but there may be more
> involved cases when your index is not clean before "git stash pop" and
> you want to have your index as before.
> 
> This led to the idea to have something like "git stash --continue"[1]

More like "git stash pop --continue". Without the "pop" command, it does
not make too much sense.

Ciao,
Johannes

^ permalink raw reply

* Re: [RFC] stash: support filename argument
From: Johannes Schindelin @ 2017-01-16 10:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Gummerer, git, kes-kes
In-Reply-To: <xmqqvatfc0rt.fsf@gitster.mtv.corp.google.com>

Hi Junio,

On Sun, 15 Jan 2017, Junio C Hamano wrote:

> I haven't spent enough time to think if it even makes sense to
> "stash" partially, leaving the working tree still dirty.

Think no more! We already support that with --keep-index, and it is a very
useful feature.

Ciao,
Johannes

^ permalink raw reply

* Re: [PATCH v3 00/38] Teach the sequencer to act as rebase -i's backend
From: Johannes Schindelin @ 2017-01-16 10:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Kevin Daudt, Dennis Kaarsemaker, Stephan Beyer, Jeff King
In-Reply-To: <xmqqinpgdass.fsf@gitster.mtv.corp.google.com>

Hi Junio,

On Sat, 14 Jan 2017, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > You stated elsewhere that converting a script into a builtin should focus
> > on a faithful conversion.
> >
> > The original code is:
> >
> > 	. "$author_script"
> 
> [...]
>
> If the code in the sequencer.c reads things other than the three
> variables we ourselves set, and make them into environment variables
> and propagate to subprocesses (hooks and editors), it would be a
> bug.  The original did not intend to do that (the dot-sourcing is
> overly loose than reading three known variables and nothing else,
> but is OK because we do not support the case where end users muck
> with the file).  Also, writing FOO=BAR alone (not "export FOO=BAR"
> or "FOO=BAR; export FOO") to the file wouldn't have exported FOO to
> subprocesses anyway.

That analysis cannot be completely correct, as the GIT_AUTHOR_* variables
*are* used by the `git commit` subprocess.

In any case, it is clear that we (I include all reviewers here) messed
this patch series up quite a bit. Hence I will be more careful from now on
to only act on suggestions that do, in fact, improve the patch series from
my point of view.

Ciao,
Johannes

^ permalink raw reply

* Re: [PATCH] Documentation/bisect: improve on (bad|new) and (good|bad)
From: Matthieu Moy @ 2017-01-16  9:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Couder, git, Manuel Ullmann, Christian Couder
In-Reply-To: <xmqqinpihiwz.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> Christian Couder <christian.couder@gmail.com> writes:
>
>> The following part of the description:
>>
>> git bisect (bad|new) [<rev>]
>> git bisect (good|old) [<rev>...]
>>
>> may be a bit confusing, as a reader may wonder if instead it should be:
>>
>> git bisect (bad|good) [<rev>]
>> git bisect (old|new) [<rev>...]
>>
>> Of course the difference between "[<rev>]" and "[<rev>...]" should hint
>> that there is a good reason for the way it is.
>>
>> But we can further clarify and complete the description by adding
>> "<term-new>" and "<term-old>" to the "bad|new" and "good|old"
>> alternatives.
>>
>> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>> ---
>>  Documentation/git-bisect.txt | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Thanks.  The patch looks good.

Looks good to me too.

> I think the answer to the question "why do we think we need a single
> bisect/bad?" is "because bisection is about assuming that there is
> only one commit that flips the tree state from 'old' to 'new' and
> finding that single commit".

I wouldn't say it's about "assuming" there's only one commit, but it's
about finding *one* such commit, i.e. it works if there are several such
commits, but won't find them all.

> But what if bad-A and bad-B have more than one merge bases?  We
> won't know which side the badness came from.
>
>                           o---o---o---bad-A
>                          /     \ / 
>     -----Good---o---o---o       / 
>                          \     / \
>                           o---o---o---bad-B
>
> Being able to bisect the region of DAG bound by "^Good bad-A bad-B"
> may have value in such a case.  I dunno.

I could help finding several guilty commits, but anyway you can't
guarantee you'll find them all as soon as you use a binary search: if
the history looks like

--- Good --- Bad --- Good --- Good --- Bad --- Good --- Bad

then without examining all commits, you can't tell how many good->bad
switches occured.

But keeping several bad commits wouldn't help keeping the set of
potentially guilty commits small: bad commits appear on the positive
side in "^Good bad-A bad-B", so having more bad commits mean having a
larger DAG to explore (which is a bit counter-intuitive: without
thinking about it I'd have said "more info => less commits to explore").

So, if finding all guilty commits is not possible, I'm not sure how
valuable it is to try to find several of them.

OTOH, keeping several good commits is needed to find a commit for which
all parents are good and the commit is bad, i.e. distinguish

Good
    \
     Bad <-- this is the one.
    /
Good

and

Good
    \
     Bad <-- need to dig further
    /
 Bad

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply

* Re: [RFC] stash: support filename argument
From: Marc Strapetz @ 2017-01-16  8:18 UTC (permalink / raw)
  To: Stephan Beyer, Junio C Hamano, Thomas Gummerer; +Cc: git, kes-kes
In-Reply-To: <c0d46a97-b1c0-d9c9-e475-28e0368ac61f@gmx.net>

On 16.01.2017 01:41, Stephan Beyer wrote:
> Hi,
>
> On 01/16/2017 01:21 AM, Junio C Hamano wrote:
>> I haven't spent enough time to think if it even makes sense to
>> "stash" partially, leaving the working tree still dirty.  My initial
>> reaction was "then stashing away the dirty WIP state to get a spiffy
>> clean working environment becomes impossible", and I still need time
>> to recover from that ;-)  So as to the desirablity of this "feature",
>> I have no strong opinion for or against yet.
> I do remember that I simulated that feature a few times (either by
> adding the to-be-keep stuff (hunks, not only files) to the index and use
> --keep-index, and sometimes by making a temporary commit (to make sure
> to not lose anything) and then stash). So I think there is a valid
> desire of the feature.

I can confirm this from a GUI client perspective, for which this feature 
makes probably even more sense than for command line. It has been 
requested by our users quite often compared to other features and 
compared to "git stash -p" support.

-Marc




^ permalink raw reply

* Re: [PATCH 00/27] Revamp the attribute system; another round
From: Jeff King @ 2017-01-16  8:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, pclouds, sbeller
In-Reply-To: <xmqq37gjdgxn.fsf@gitster.mtv.corp.google.com>

On Sun, Jan 15, 2017 at 03:47:16PM -0800, Junio C Hamano wrote:

> This one unfortunately clashes with jk/nofollow-attr-ignore where
> Peff adds sanity to refuse following symbolic links when reading
> .gitignore and .gitattributes; I'll eject jk/nofollow-attr-ignore
> topic for now and see how well this topic fits together with the
> remainder of the topics in flight.

Yeah, that's a good plan. I think my re-roll of the nofollow stuff will
be pretty major, and may not end up touching the attribute code directly
at all.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Junio C Hamano @ 2017-01-16  7:48 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Jeff King, Johannes Schindelin
In-Reply-To: <257b4175-9879-7814-5d8d-02050792574d@kdbg.org>

Johannes Sixt <j6t@kdbg.org> writes:

> Am 16.01.2017 um 02:51 schrieb Junio C Hamano:
>> * jk/vreport-sanitize (2017-01-11) 2 commits
>>  - vreport: sanitize ASCII control chars
>>  - Revert "vreportf: avoid intermediate buffer"
>>
>>  An error message with an ASCII control character like '\r' in it
>>  can alter the message to hide its early part, which is problematic
>>  when a remote side gives such an error message that the local side
>>  will relay with a "remote: " prefix.
>>
>>  Will merge to 'next'.
>
> Please be not too hasty with advancing this topic to master. I could
> imagine that there is some unwanted fallout on systems where the
> end-of-line marker CRLF is common. Though, I haven't tested the topic
> myself, yet, nor do I expect regressions in *my* typical workflow.

Thanks; will wait for a further discussion on the topic's thread
then.


^ permalink raw reply

* Re: gitk pull request // was: Re: gitk: "lime" color incompatible with older Tk versions
From: Junio C Hamano @ 2017-01-16  7:48 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: David Aguilar, Stefan Beller, Andrew Janke, git@vger.kernel.org
In-Reply-To: <20170116031706.GA3322@fergus.ozlabs.ibm.com>

Paul Mackerras <paulus@ozlabs.org> writes:

>> Paul, is it a good time to pull, or do you still have something not
>> published yet that should go together with what you have already
>> queued?
>
> I recently pushed out one more commit to update the Russian
> translation from Dimitriy Ryazantcev.  The head is now 8fef3f36b779.
> I have a couple more series that I am currently reviewing, but nothing
> immediately ready to publish.  It would be a good time for you to do a
> pull, since the "lime" color fix and the memory consumption fixes
> should be helpful for a lot of people.

Thanks.  I did want to get the memory consumption fix sooner rather
than later, and this is very much appreciated.

Pulled.

^ permalink raw reply

* Re: Different merges from translation perspective
From: Junio C Hamano @ 2017-01-16  7:44 UTC (permalink / raw)
  To: Alexander Shopov; +Cc: Git List
In-Reply-To: <CAP6f5MkOoDUqHCvLNQ+xJGWTbrdecet9W_JK5y7JeAnBpGeAaw@mail.gmail.com>

Alexander Shopov <ash@kambanaria.org> writes:

> What is the difference between simple, fast forward, automatic and
> trivial merge?

"fast forward" and "trivial" (and "automatic" to some degree) are
technical terms with precise meaning.  Other phrases that are
related to "merge" that are not in your list are "already
up-to-date" and "real merge".

I do not think "simple" is among these words, but I can see it used
colloquially to mean "a real merge that is easy to resolve",
e.g. "if you get conflicts that is not simple, you may have to study
what both sides did carefully before you can decide the correct
merge result".

When you are on commit X and trying to merge commit Y, various
things can happen.  

 * There is one case where nothing happens.  You may have cloned
   from another repository and the tip of the branch was at commit Y
   back then---since then you built on it and you are now at commit
   X, and nobody else did anything in that repository in the
   meantime.  You try to merge from there, and find that the tip of
   the branch is still Y.  Your history leading to X contains
   everything the other side of the history leading to Y contains,
   so there is no need to do anything.

   We say that in this situation, your branch is already up-to-date.

 * There is another case where no new commit is created.  You may
   have cloned like the above case and got commit X, and haven't
   done anything since then, while others worked on the branch to
   advance the tip to commit Y.  You try to merge their work.  Their
   history leading to Y contains everything you have in your history
   leading to X contains.  The only thing we need is to "fast forward"
   your tip of the branch from X to Y, and make the index and the
   working tree to match.

   We say that in this situation, your branch can be fast forwarded.

 * All other cases, we need to come up with a new state that is the
   result of merging X and Y, and record it as a child commit of
   both X and Y, i.e. create a merge commit.  

   We say that this situation requires a real merge.

 * When a real merge is needed, there are a few subcases.

   - The two histories being merged may have changed their own set
     of files, without overlap.  Your commit X, since your history
     diverged from the history that leads to commit Y, may have
     worked only on the source file, while commit Y, since it
     diverged from your history, may have worked only on the
     documentation file, in a hypothetical two-file project.  In
     such a case, the copy of the documentation file you have at
     commit X is in the state the file was in when the histories
     diverged, and only the other side modified it, and we can take
     the documentation file from commit Y (i.e. the other side) as
     the result.  Similarly, because only you changed the source
     file while the other side didn't touch it, we take the source
     file from commit X.  We can come up with the merge result
     without even inspecting the file contents.

     The act of coming up with the result of the merge by pure
     equality of the file contents (i.e. one side modified, the
     other side left intact) is called "tree level merge", and a
     merge, all of whose paths can be resolved by tree level merge,
     is called "trivial merge".

   - If a merge is not "trivial", we'd need to dig down to "file
     level merge".  The contents of a file that was touched by both
     sides need to be computed. 

   - In a merge that requires "file level merge", you and the other
     side may have modified the same file, but touched different and
     non-overlapping parts.  You may have updated text in section 1
     of the documentation file while the other side may have fixed
     typo in section 3.  We use the same "three-way merge" principle
     used in the "tree-level merge"; if you did not touch a part of
     file that was modified by the other side, we take what the
     other side did, and vice versa, to come up with the merge
     result.  There are other ways to mechanically come up with the
     file level merge result that the user can supply (low level
     merge drivers).  The result of such a mechanical merge, when
     successfully recorded, is often called "automatic merge".

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)
From: Johannes Sixt @ 2017-01-16  6:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Johannes Schindelin
In-Reply-To: <xmqqh94zbwlu.fsf@gitster.mtv.corp.google.com>

Am 16.01.2017 um 02:51 schrieb Junio C Hamano:
> * jk/vreport-sanitize (2017-01-11) 2 commits
>  - vreport: sanitize ASCII control chars
>  - Revert "vreportf: avoid intermediate buffer"
>
>  An error message with an ASCII control character like '\r' in it
>  can alter the message to hide its early part, which is problematic
>  when a remote side gives such an error message that the local side
>  will relay with a "remote: " prefix.
>
>  Will merge to 'next'.

Please be not too hasty with advancing this topic to master. I could 
imagine that there is some unwanted fallout on systems where the 
end-of-line marker CRLF is common. Though, I haven't tested the topic 
myself, yet, nor do I expect regressions in *my* typical workflow.

-- Hannes


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox