Git development
 help / color / mirror / Atom feed
* [PATCH 1/6] t1450: refactor loose-object removal
From: Jeff King @ 2017-01-13 17:54 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git
In-Reply-To: <20170113175258.e66taigy4wpokohk@sigill.intra.peff.net>

Commit 90cf590f5 (fsck: optionally show more helpful info
for broken links, 2016-07-17) added a remove_loose_object()
helper, but we already had a remove_object() helper that did
the same thing. Let's combine these into one.

The implementations had a few subtle differences, so I've
tried to take the best of both:

  - the original used "sed", but the newer version avoids
    spawning an extra process

  - the original processed "$*", which was nonsense, as it
    assumed only a single sha1. Use "$1" to make that more
    clear.

  - the newer version ran an extra rev-parse, but it was not
    necessary; it's sole caller already converted the
    argument into a raw sha1

  - the original used "rm -f", whereas the new one uses
    "rm". The latter is better because it may notice a bug
    or other unexpected failure in the test. (The original
    does check that the object exists before we remove it,
    which is good, but that's a subset of the possible
    unexpected conditions).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1450-fsck.sh | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index ee7d4736d..3297d4cb2 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -43,13 +43,13 @@ test_expect_success 'HEAD is part of refs, valid objects appear valid' '
 
 test_expect_success 'setup: helpers for corruption tests' '
 	sha1_file() {
-		echo "$*" | sed "s#..#.git/objects/&/#"
+		remainder=${1#??} &&
+		firsttwo=${1%$remainder} &&
+		echo ".git/objects/$firsttwo/$remainder"
 	} &&
 
 	remove_object() {
-		file=$(sha1_file "$*") &&
-		test -e "$file" &&
-		rm -f "$file"
+		rm "$(sha1_file "$1")"
 	}
 '
 
@@ -535,13 +535,6 @@ test_expect_success 'fsck --connectivity-only' '
 	)
 '
 
-remove_loose_object () {
-	sha1="$(git rev-parse "$1")" &&
-	remainder=${sha1#??} &&
-	firsttwo=${sha1%$remainder} &&
-	rm .git/objects/$firsttwo/$remainder
-}
-
 test_expect_success 'fsck --name-objects' '
 	rm -rf name-objects &&
 	git init name-objects &&
@@ -550,7 +543,7 @@ test_expect_success 'fsck --name-objects' '
 		test_commit julius caesar.t &&
 		test_commit augustus &&
 		test_commit caesar &&
-		remove_loose_object $(git rev-parse julius:caesar.t) &&
+		remove_object $(git rev-parse julius:caesar.t) &&
 		test_must_fail git fsck --name-objects >out &&
 		tree=$(git rev-parse --verify julius:) &&
 		grep "$tree (\(refs/heads/master\|HEAD\)@{[0-9]*}:" out
-- 
2.11.0.629.g10075098c


^ permalink raw reply related

* [PATCH 2/6] sha1_file: fix error message for alternate objects
From: Jeff King @ 2017-01-13 17:54 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git
In-Reply-To: <20170113175258.e66taigy4wpokohk@sigill.intra.peff.net>

When we fail to open a corrupt loose object, we report an
error and mention the filename via sha1_file_name().
However, that function will always give us a path in the
local repository, whereas the corrupt object may have come
from an alternate. The result is a very misleading error
message.

Teach the open_sha1_file() and stat_sha1_file() helpers to
pass back the path they found, so that we can report it
correctly.

Note that the pointers we return go to static storage (e.g.,
from sha1_file_name()), which is slightly dangerous.
However, these helpers are static local helpers, and the
names are used for immediately generating error messages.
The simplicity is an acceptable tradeoff for the danger.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c     | 46 +++++++++++++++++++++++++++++++---------------
 t/t1450-fsck.sh | 10 ++++++++++
 2 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 1eb47f611..c6b990f41 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1630,39 +1630,54 @@ int git_open_cloexec(const char *name, int flags)
 	return fd;
 }
 
-static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
+/*
+ * Find "sha1" as a loose object in the local repository or in an alternate.
+ * Returns 0 on success, negative on failure.
+ *
+ * The "path" out-parameter will give the path of the object we found (if any).
+ * Note that it may point to static storage and is only valid until another
+ * call to sha1_file_name(), etc.
+ */
+static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
+			  const char **path)
 {
 	struct alternate_object_database *alt;
 
-	if (!lstat(sha1_file_name(sha1), st))
+	*path = sha1_file_name(sha1);
+	if (!lstat(*path, st))
 		return 0;
 
 	prepare_alt_odb();
 	errno = ENOENT;
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		const char *path = alt_sha1_path(alt, sha1);
-		if (!lstat(path, st))
+		*path = alt_sha1_path(alt, sha1);
+		if (!lstat(*path, st))
 			return 0;
 	}
 
 	return -1;
 }
 
-static int open_sha1_file(const unsigned char *sha1)
+/*
+ * Like stat_sha1_file(), but actually open the object and return the
+ * descriptor. See the caveats on the "path" parameter above.
+ */
+static int open_sha1_file(const unsigned char *sha1, const char **path)
 {
 	int fd;
 	struct alternate_object_database *alt;
 	int most_interesting_errno;
 
-	fd = git_open(sha1_file_name(sha1));
+	*path = sha1_file_name(sha1);
+	fd = git_open(*path);
 	if (fd >= 0)
 		return fd;
 	most_interesting_errno = errno;
 
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		const char *path = alt_sha1_path(alt, sha1);
-		fd = git_open(path);
+		*path = alt_sha1_path(alt, sha1);
+		fd = git_open(*path);
 		if (fd >= 0)
 			return fd;
 		if (most_interesting_errno == ENOENT)
@@ -1674,10 +1689,11 @@ static int open_sha1_file(const unsigned char *sha1)
 
 void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
 {
+	const char *path;
 	void *map;
 	int fd;
 
-	fd = open_sha1_file(sha1);
+	fd = open_sha1_file(sha1, &path);
 	map = NULL;
 	if (fd >= 0) {
 		struct stat st;
@@ -1686,7 +1702,7 @@ void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
 			*size = xsize_t(st.st_size);
 			if (!*size) {
 				/* mmap() is forbidden on empty files */
-				error("object file %s is empty", sha1_file_name(sha1));
+				error("object file %s is empty", path);
 				return NULL;
 			}
 			map = xmmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -2806,8 +2822,9 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	 * object even exists.
 	 */
 	if (!oi->typep && !oi->typename && !oi->sizep) {
+		const char *path;
 		struct stat st;
-		if (stat_sha1_file(sha1, &st) < 0)
+		if (stat_sha1_file(sha1, &st, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
 			*oi->disk_sizep = st.st_size;
@@ -3003,6 +3020,8 @@ void *read_sha1_file_extended(const unsigned char *sha1,
 {
 	void *data;
 	const struct packed_git *p;
+	const char *path;
+	struct stat st;
 	const unsigned char *repl = lookup_replace_object_extended(sha1, flag);
 
 	errno = 0;
@@ -3018,12 +3037,9 @@ void *read_sha1_file_extended(const unsigned char *sha1,
 		die("replacement %s not found for %s",
 		    sha1_to_hex(repl), sha1_to_hex(sha1));
 
-	if (has_loose_object(repl)) {
-		const char *path = sha1_file_name(sha1);
-
+	if (!stat_sha1_file(repl, &st, &path))
 		die("loose object %s (stored in %s) is corrupt",
 		    sha1_to_hex(repl), path);
-	}
 
 	if ((p = has_packed_and_bad(repl)) != NULL)
 		die("packed object %s (stored in %s) is corrupt",
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 3297d4cb2..f95174c9d 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -550,4 +550,14 @@ test_expect_success 'fsck --name-objects' '
 	)
 '
 
+test_expect_success 'alternate objects are correctly blamed' '
+	test_when_finished "rm -rf alt.git .git/objects/info/alternates" &&
+	git init --bare alt.git &&
+	echo "../../alt.git/objects" >.git/objects/info/alternates &&
+	mkdir alt.git/objects/12 &&
+	>alt.git/objects/12/34567890123456789012345678901234567890 &&
+	test_must_fail git fsck >out 2>&1 &&
+	grep alt.git out
+'
+
 test_done
-- 
2.11.0.629.g10075098c


^ permalink raw reply related

* [PATCH 3/6] t1450: test fsck of packed objects
From: Jeff King @ 2017-01-13 17:55 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git
In-Reply-To: <20170113175258.e66taigy4wpokohk@sigill.intra.peff.net>

The code paths in fsck for packed and loose objects are
quite different, and it is not immediately obvious that the
packed case behaves well. In particular:

  1. The fsck_loose() function always returns "0" to tell the
     iterator to keep checking more objects. Whereas
     fsck_obj_buffer() (which handles packed objects)
     returns -1. This is OK, because the callback machinery
     for verify_pack() does not stop when it sees a non-zero
     return.

  2. The fsck_loose() function sets the ERROR_OBJECT bit
     when fsck_obj() fails, whereas fsck_obj_buffer() sets it
     only when it sees a corrupt object. This turns out not
     to matter. We don't actually do anything with this bit
     except exit the program with a non-zero code, and that
     is handled already by the non-zero return from the
     function.

So there are no bugs here, but it was certainly confusing to
me. And we do not test either of the properties in t1450
(neither that a non-corruption error will caused a non-zero
exit for a packed object, nor that we keep going after
seeing the first error). Let's test both of those
conditions, so that we'll notice if any of those assumptions
becomes invalid.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1450-fsck.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f95174c9d..c39d42120 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -560,4 +560,25 @@ test_expect_success 'alternate objects are correctly blamed' '
 	grep alt.git out
 '
 
+test_expect_success 'fsck errors in packed objects' '
+	git cat-file commit HEAD >basis &&
+	sed "s/</one/" basis >one &&
+	sed "s/</foo/" basis >two &&
+	one=$(git hash-object -t commit -w one) &&
+	two=$(git hash-object -t commit -w two) &&
+	pack=$(
+		{
+			echo $one &&
+			echo $two
+		} | git pack-objects .git/objects/pack/pack
+	) &&
+	test_when_finished "rm -f .git/objects/pack/pack-$pack.*" &&
+	remove_object $one &&
+	remove_object $two &&
+	test_must_fail git fsck 2>out &&
+	grep "error in commit $one.* - bad name" out &&
+	grep "error in commit $two.* - bad name" out &&
+	! grep corrupt out
+'
+
 test_done
-- 
2.11.0.629.g10075098c


^ permalink raw reply related

* Re: [PATCH 1/3] xdiff: -W: relax end-of-file function detection
From: René Scharfe @ 2017-01-13 17:49 UTC (permalink / raw)
  To: Vegard Nossum, Junio C Hamano, git
In-Reply-To: <1484324112-17773-1-git-send-email-vegard.nossum@oracle.com>

Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
> When adding a new function to the end of a file, it's enough to know
> that 1) the addition is at the end of the file; and 2) there is a
> function _somewhere_ in there.
>
> If we had simply been changing the end of an existing function, then we
> would also be deleting something from the old version.

That makes sense, thanks.

> This fixes the case where we add e.g.
>
> 	// Begin of dummy
> 	static int dummy(void)
> 	{
> 	}
>
> to the end of the file.

Without this patch the unchanged function before the added lines is 
shown in its entirety as (uncalled for) context.

>
> Cc: René Scharfe <l.s.r@web.de>
> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
> ---
>  xdiff/xemit.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index 7389ce4..8c88dbd 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -183,16 +183,14 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>
>  				/*
>  				 * We don't need additional context if
> -				 * a whole function was added, possibly
> -				 * starting with empty lines.
> +				 * a whole function was added.
>  				 */
> -				while (i2 < xe->xdf2.nrec &&
> -				       is_empty_rec(&xe->xdf2, i2))
> +				while (i2 < xe->xdf2.nrec) {
> +					if (match_func_rec(&xe->xdf2, xecfg, i2,
> +						dummy, sizeof(dummy)) >= 0)

Nit: I don't like the indentation here.  Giving "dummy" its own line is 
also not exactly pretty, but at least would allow the parameters to be 
aligned on the opening parenthesis.

> +						goto post_context_calculation;
>  					i2++;
> -				if (i2 < xe->xdf2.nrec &&
> -				    match_func_rec(&xe->xdf2, xecfg, i2,
> -						   dummy, sizeof(dummy)) >= 0)
> -					goto post_context_calculation;
> +				}
>
>  				/*
>  				 * Otherwise get more context from the
>

^ permalink raw reply

* [PATCHv3 2/2] builtin/commit.c: switch to xstrfmt(), instead of snprintf,
From: Elia Pinto @ 2017-01-13 17:58 UTC (permalink / raw)
  To: git; +Cc: Elia Pinto
In-Reply-To: <20170113175801.39468-1-gitter.spiros@gmail.com>

In this patch, instead of using xnprintf instead of snprintf, which asserts
that we don't truncate, we are switching to dynamic allocation with  xstrfmt(),
, so we can avoid dealing with magic numbers in the code and reduce the cognitive burden from
the programmers, because they no longer have to count bytes needed for static allocation.
As a side effect of this patch we have also reduced the snprintf() calls, that may silently truncate 
results if the programmer is not careful.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net> 
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
---
This is the third  version of the patch.

Changes from the first version: I have split the original commit in two, as discussed here
http://public-inbox.org/git/20161213132717.42965-1-gitter.spiros@gmail.com/.

Changes from the second version:
- Changed the commit message to clarify the purpose of the patch (
as suggested by Junio)
https://public-inbox.org/git/xmqqtw95mfo3.fsf@gitster.mtv.corp.google.com/T/#m2e6405a8a78a8ca1ed770614c91398290574c4a1



 builtin/commit.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 09bcc0f13..37228330c 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1526,12 +1526,10 @@ static int git_commit_config(const char *k, const char *v, void *cb)
 static int run_rewrite_hook(const unsigned char *oldsha1,
 			    const unsigned char *newsha1)
 {
-	/* oldsha1 SP newsha1 LF NUL */
-	static char buf[2*40 + 3];
+	char *buf;
 	struct child_process proc = CHILD_PROCESS_INIT;
 	const char *argv[3];
 	int code;
-	size_t n;
 
 	argv[0] = find_hook("post-rewrite");
 	if (!argv[0])
@@ -1547,11 +1545,11 @@ static int run_rewrite_hook(const unsigned char *oldsha1,
 	code = start_command(&proc);
 	if (code)
 		return code;
-	n = snprintf(buf, sizeof(buf), "%s %s\n",
-		     sha1_to_hex(oldsha1), sha1_to_hex(newsha1));
+	buf = xstrfmt("%s %s\n", sha1_to_hex(oldsha1), sha1_to_hex(newsha1));
 	sigchain_push(SIGPIPE, SIG_IGN);
-	write_in_full(proc.in, buf, n);
+	write_in_full(proc.in, buf, strlen(buf));
 	close(proc.in);
+	free(buf);
 	sigchain_pop(SIGPIPE);
 	return finish_command(&proc);
 }
-- 
2.11.0.154.g5f5f154


^ permalink raw reply related

* [PATCHv3 1/2] builtin/commit.c: removes the PATH_MAX limitation via dynamic allocation
From: Elia Pinto @ 2017-01-13 17:58 UTC (permalink / raw)
  To: git; +Cc: Elia Pinto

This patch removes the PATH_MAX limitation from the environment 
setting that points to a filename, we'd want to handle larger
paths anyway, so we switch to dynamic allocation. As a side effect 
of this patch we have also reduced the snprintf() calls, that 
may silently truncate results if the programmer is not careful.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
---
This is the third  version of the patch.

Changes from the first version: I have split the original commit in two, as discussed here
http://public-inbox.org/git/20161213132717.42965-1-gitter.spiros@gmail.com/.

Changes from the second version:
- Changed the commit message to clarify the purpose of the patch (
as suggested by Junio)
- drop the arg_array_clear call after exit(1)
https://public-inbox.org/git/xmqqtw95mfo3.fsf@gitster.mtv.corp.google.com/T/#m2e6405a8a78a8ca1ed770614c91398290574c4a1


 builtin/commit.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 0ed634b26..09bcc0f13 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -960,15 +960,15 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		return 0;
 
 	if (use_editor) {
-		char index[PATH_MAX];
-		const char *env[2] = { NULL };
-		env[0] =  index;
-		snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", index_file);
-		if (launch_editor(git_path_commit_editmsg(), NULL, env)) {
+		struct argv_array env = ARGV_ARRAY_INIT;
+
+		argv_array_pushf(&env, "GIT_INDEX_FILE=%s", index_file);
+		if (launch_editor(git_path_commit_editmsg(), NULL, env.argv)) {
 			fprintf(stderr,
 			_("Please supply the message using either -m or -F option.\n"));
 			exit(1);
 		}
+		argv_array_clear(&env);
 	}
 
 	if (!no_verify &&
@@ -1557,23 +1557,22 @@ static int run_rewrite_hook(const unsigned char *oldsha1,
 
 int run_commit_hook(int editor_is_used, const char *index_file, const char *name, ...)
 {
-	const char *hook_env[3] =  { NULL };
-	char index[PATH_MAX];
+	struct argv_array hook_env = ARGV_ARRAY_INIT;
 	va_list args;
 	int ret;
 
-	snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", index_file);
-	hook_env[0] = index;
+	argv_array_pushf(&hook_env, "GIT_INDEX_FILE=%s", index_file);
 
 	/*
 	 * Let the hook know that no editor will be launched.
 	 */
 	if (!editor_is_used)
-		hook_env[1] = "GIT_EDITOR=:";
+		argv_array_push(&hook_env, "GIT_EDITOR=:");
 
 	va_start(args, name);
-	ret = run_hook_ve(hook_env, name, args);
+	ret = run_hook_ve(hook_env.argv,name, args);
 	va_end(args);
+	argv_array_clear(&hook_env);
 
 	return ret;
 }
-- 
2.11.0.154.g5f5f154


^ permalink raw reply related

* [PATCH 4/6] sha1_file: add read_loose_object() function
From: Jeff King @ 2017-01-13 17:58 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git
In-Reply-To: <20170113175258.e66taigy4wpokohk@sigill.intra.peff.net>

It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.

However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.

The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.

We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().

Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h     |  13 ++++++
 sha1_file.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 143 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 1b67f078d..33f1c2fa7 100644
--- a/cache.h
+++ b/cache.h
@@ -1140,6 +1140,19 @@ extern int finalize_object_file(const char *tmpfile, const char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 
+/*
+ * Open the loose object at path, check its sha1, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const unsigned char *expected_sha1,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents);
+
 /*
  * Return true iff we have an object named sha1, whether local or in
  * an alternate object database, and whether packed or loose.  This
diff --git a/sha1_file.c b/sha1_file.c
index c6b990f41..c0fccb73c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1687,13 +1687,21 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
 	return -1;
 }
 
-void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
+/*
+ * Map the loose object at "path" if it is not NULL, or the path found by
+ * searching for a loose object named "sha1".
+ */
+static void *map_sha1_file_1(const char *path,
+			     const unsigned char *sha1,
+			     unsigned long *size)
 {
-	const char *path;
 	void *map;
 	int fd;
 
-	fd = open_sha1_file(sha1, &path);
+	if (path)
+		fd = git_open(path);
+	else
+		fd = open_sha1_file(sha1, &path);
 	map = NULL;
 	if (fd >= 0) {
 		struct stat st;
@@ -1712,6 +1720,11 @@ void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
 	return map;
 }
 
+void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
+{
+	return map_sha1_file_1(NULL, sha1, size);
+}
+
 unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		unsigned long len, enum object_type *type, unsigned long *sizep)
 {
@@ -3809,3 +3822,117 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
 	}
 	return r ? r : pack_errors;
 }
+
+static int check_stream_sha1(git_zstream *stream,
+			     const char *hdr,
+			     unsigned long size,
+			     const char *path,
+			     const unsigned char *expected_sha1)
+{
+	git_SHA_CTX c;
+	unsigned char real_sha1[GIT_SHA1_RAWSZ];
+	unsigned char buf[4096];
+	unsigned long total_read;
+	int status = Z_OK;
+
+	git_SHA1_Init(&c);
+	git_SHA1_Update(&c, hdr, stream->total_out);
+
+	/*
+	 * We already read some bytes into hdr, but the ones up to the NUL
+	 * do not count against the object's content size.
+	 */
+	total_read = stream->total_out - strlen(hdr) - 1;
+
+	/*
+	 * This size comparison must be "<=" to read the final zlib packets;
+	 * see the comment in unpack_sha1_rest for details.
+	 */
+	while (total_read <= size &&
+	       (status == Z_OK || status == Z_BUF_ERROR)) {
+		stream->next_out = buf;
+		stream->avail_out = sizeof(buf);
+		if (size - total_read < stream->avail_out)
+			stream->avail_out = size - total_read;
+		status = git_inflate(stream, Z_FINISH);
+		git_SHA1_Update(&c, buf, stream->next_out - buf);
+		total_read += stream->next_out - buf;
+	}
+	git_inflate_end(stream);
+
+	if (status != Z_STREAM_END) {
+		error("corrupt loose object '%s'", sha1_to_hex(expected_sha1));
+		return -1;
+	}
+
+	git_SHA1_Final(real_sha1, &c);
+	if (hashcmp(expected_sha1, real_sha1)) {
+		error("sha1 mismatch for %s (expected %s)", path,
+		      sha1_to_hex(expected_sha1));
+		return -1;
+	}
+
+	return 0;
+}
+
+int read_loose_object(const char *path,
+		      const unsigned char *expected_sha1,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents)
+{
+	int ret = -1;
+	int fd = -1;
+	void *map = NULL;
+	unsigned long mapsize;
+	git_zstream stream;
+	char hdr[32];
+
+	*contents = NULL;
+
+	map = map_sha1_file_1(path, NULL, &mapsize);
+	if (!map) {
+		error_errno("unable to mmap %s", path);
+		goto out;
+	}
+
+	if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+		error("unable to unpack header of %s", path);
+		goto out;
+	}
+
+	*type = parse_sha1_header(hdr, size);
+	if (*type < 0) {
+		error("unable to parse header of %s", path);
+		git_inflate_end(&stream);
+		goto out;
+	}
+
+	if (*type == OBJ_BLOB) {
+		if (check_stream_sha1(&stream, hdr, *size, path, expected_sha1) < 0)
+			goto out;
+	} else {
+		*contents = unpack_sha1_rest(&stream, hdr, *size, expected_sha1);
+		if (!*contents) {
+			error("unable to unpack contents of %s", path);
+			git_inflate_end(&stream);
+			goto out;
+		}
+		if (check_sha1_signature(expected_sha1, *contents,
+					 *size, typename(*type))) {
+			error("sha1 mismatch for %s (expected %s)", path,
+			      sha1_to_hex(expected_sha1));
+			free(*contents);
+			goto out;
+		}
+	}
+
+	ret = 0; /* everything checks out */
+
+out:
+	if (map)
+		munmap(map, mapsize);
+	if (fd >= 0)
+		close(fd);
+	return ret;
+}
-- 
2.11.0.629.g10075098c


^ permalink raw reply related

* [PATCH 5/6] fsck: parse loose object paths directly
From: Jeff King @ 2017-01-13 17:59 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git
In-Reply-To: <20170113175258.e66taigy4wpokohk@sigill.intra.peff.net>

When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.

In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.

The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  | 46 +++++++++++++++++++++++++++++++++-------------
 t/t1450-fsck.sh | 16 ++++++++++++++++
 2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index f01b81eeb..4b91ee95e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -362,18 +362,6 @@ static int fsck_obj(struct object *obj)
 	return 0;
 }
 
-static int fsck_sha1(const unsigned char *sha1)
-{
-	struct object *obj = parse_object(sha1);
-	if (!obj) {
-		errors_found |= ERROR_OBJECT;
-		return error("%s: object corrupt or missing",
-			     sha1_to_hex(sha1));
-	}
-	obj->flags |= HAS_OBJ;
-	return fsck_obj(obj);
-}
-
 static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
 			   unsigned long size, void *buffer, int *eaten)
 {
@@ -488,9 +476,41 @@ static void get_default_heads(void)
 	}
 }
 
+static struct object *parse_loose_object(const unsigned char *sha1,
+					 const char *path)
+{
+	struct object *obj;
+	void *contents;
+	enum object_type type;
+	unsigned long size;
+	int eaten;
+
+	if (read_loose_object(path, sha1, &type, &size, &contents) < 0)
+		return NULL;
+
+	if (!contents && type != OBJ_BLOB)
+		die("BUG: read_loose_object streamed a non-blob");
+
+	obj = parse_object_buffer(sha1, type, size, contents, &eaten);
+
+	if (!eaten)
+		free(contents);
+	return obj;
+}
+
 static int fsck_loose(const unsigned char *sha1, const char *path, void *data)
 {
-	if (fsck_sha1(sha1))
+	struct object *obj = parse_loose_object(sha1, path);
+
+	if (!obj) {
+		errors_found |= ERROR_OBJECT;
+		error("%s: object corrupt or missing: %s",
+		      sha1_to_hex(sha1), path);
+		return 0; /* keep checking other objects */
+	}
+
+	obj->flags = HAS_OBJ;
+	if (fsck_obj(obj))
 		errors_found |= ERROR_OBJECT;
 	return 0;
 }
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index c39d42120..455c186fe 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -581,4 +581,20 @@ test_expect_success 'fsck errors in packed objects' '
 	! grep corrupt out
 '
 
+test_expect_success 'fsck finds problems in duplicate loose objects' '
+	rm -rf broken-duplicate &&
+	git init broken-duplicate &&
+	(
+		cd broken-duplicate &&
+		test_commit duplicate &&
+		# no "-d" here, so we end up with duplicates
+		git repack &&
+		# now corrupt the loose copy
+		file=$(sha1_file "$(git rev-parse HEAD)") &&
+		rm "$file" &&
+		echo broken >"$file" &&
+		test_must_fail git fsck
+	)
+'
+
 test_done
-- 
2.11.0.629.g10075098c


^ permalink raw reply related

* [PATCH 6/6] fsck: detect trailing garbage in all object types
From: Jeff King @ 2017-01-13 18:00 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git
In-Reply-To: <20170113175258.e66taigy4wpokohk@sigill.intra.peff.net>

When a loose tree or commit is read by fsck (or any git
program), unpack_sha1_rest() checks whether there is extra
cruft at the end of the object file, after the zlib data.
Blobs that are streamed, however, do not have this check.

For normal git operations, it's not a big deal. We know the
sha1 and size checked out, so we have the object bytes we
wanted.  The trailing garbage doesn't affect what we're
trying to do.

But since the point of fsck is to find corruption or other
problems, it should be more thorough. This patch teaches its
loose-sha1 reader to detect extra bytes after the zlib
stream and complain.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c     |  5 +++++
 t/t1450-fsck.sh | 22 ++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/sha1_file.c b/sha1_file.c
index c0fccb73c..b77ab6d5c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3864,6 +3864,11 @@ static int check_stream_sha1(git_zstream *stream,
 		error("corrupt loose object '%s'", sha1_to_hex(expected_sha1));
 		return -1;
 	}
+	if (stream->avail_in) {
+		error("garbage at end of loose object '%s'",
+		      sha1_to_hex(expected_sha1));
+		return -1;
+	}
 
 	git_SHA1_Final(real_sha1, &c);
 	if (hashcmp(expected_sha1, real_sha1)) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 455c186fe..8975b4d1b 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -597,4 +597,26 @@ test_expect_success 'fsck finds problems in duplicate loose objects' '
 	)
 '
 
+test_expect_success 'fsck detects trailing loose garbage (commit)' '
+	git cat-file commit HEAD >basis &&
+	echo bump-commit-sha1 >>basis &&
+	commit=$(git hash-object -w -t commit basis) &&
+	file=$(sha1_file $commit) &&
+	test_when_finished "remove_object $commit" &&
+	chmod +w "$file" &&
+	echo garbage >>"$file" &&
+	test_must_fail git fsck 2>out &&
+	test_i18ngrep "garbage.*$commit" out
+'
+
+test_expect_success 'fsck detects trailing loose garbage (blob)' '
+	blob=$(echo trailing | git hash-object -w --stdin) &&
+	file=$(sha1_file $blob) &&
+	test_when_finished "remove_object $blob" &&
+	chmod +w "$file" &&
+	echo garbage >>"$file" &&
+	test_must_fail git fsck 2>out &&
+	test_i18ngrep "garbage.*$blob" out
+'
+
 test_done
-- 
2.11.0.629.g10075098c

^ permalink raw reply related

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
From: René Scharfe @ 2017-01-13 18:19 UTC (permalink / raw)
  To: Vegard Nossum, Junio C Hamano, git
In-Reply-To: <1484324112-17773-2-git-send-email-vegard.nossum@oracle.com>

Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
> When using -W to include the whole function in the diff context, you
> are typically doing this to be able to review the change in its entirety
> within the context of the function. It is therefore almost always
> desirable to include any comments that immediately precede the function.
>
> This also the fixes the case for C where the declaration is split across
> multiple lines (where the first line of the declaration would not be
> included in the output), e.g.:
>
> 	void
> 	dummy(void)
> 	{
> 		...
> 	}
>

That's true, but I'm not sure "non-empty line before function line" is 
good enough a definition for desirable lines.  It wouldn't work for 
people who don't believe in empty lines.  Or for those that put a blank 
line between comment and function.  (I have an opinion on such habits, 
but git diff should probably stay neutral.)  And that's just for C code; 
I have no idea how this heuristic would hold up for other file types 
like HTML.

We can identify function lines with arbitrary precision (with a 
xfuncname regex, if needed), but there is no accurate way to classify 
lines as comments, or as the end of functions.  Adding optional regexes 
for single- and multi-line comments would help, at least for C.

René

^ permalink raw reply

* Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path
From: Junio C Hamano @ 2017-01-13 18:27 UTC (permalink / raw)
  To: Eric Wong; +Cc: Pat Pannuto, Johannes Schindelin, Johannes Sixt, git
In-Reply-To: <20170113024842.GA20572@starla>

Eric Wong <e@80x24.org> writes:

> Pat Pannuto <pat.pannuto@gmail.com> wrote:
>> You may still want the 1/2 patch in this series, just to make things
>> internally consistent with "-w" vs "use warnings;" inside git's perl
>> scripts.
>
> No, that is a step back.  "-w" affects the entire process, so it
> spots more potential problems.  The "warnings" pragma is scoped
> to the enclosing block, so it won't span across files.

OK, so with "-w", we do not have to write "use warnings" in each of
our files to get them checked.  It is handy when we ship our own
libs (e.g. Git.pm) that are used by our programs.

If something we write outselves trigger a false-positive, we can
work it around with "no warnings;" in the smallest block that
encloses the offending code in the worst case, or just rephrase it
in a way that won't trigger a false-positive.

If something we _use_ from a third-party is not warnings-clean,
there is no easy way to squelch them if we use "-w", which is a
potential downside, isn't it?  I do not know how serious a problem
it is in practice.  I suspect that the core package we use from perl
distribution are supposed to be warnings-clean, but we use a handful
of things from outside the core and I do not know what state they
are in.

^ permalink raw reply

* submodule network operations [WAS: Re: [RFC/PATCH 0/4] working tree operations: support superprefix]
From: Stefan Beller @ 2017-01-13 18:30 UTC (permalink / raw)
  To: Brian J. Davis; +Cc: Brandon Williams, git@vger.kernel.org, David Turner

This question is about networking; the patch you originally replied to
was strictly about local operations in the filesystem, which is quite
a difference, so let's discuss it separately.

On Fri, Jan 13, 2017 at 9:56 AM, Brian J. Davis <bitminer@gmail.com> wrote:
>
> In response to a post at:
>
> https://groups.google.com/forum/#!topic/git-users/BVLcKHhSUKo
>
> I was asked to post here to explain a potential problem with current modules
> implementation.  Apologies if I am posting in the wrong place... so good bad
> or otherwise here goes:
>
> +-------------------------------
> With:
>
> git push origin branchname
>
> or
>
> git push server2 branchname
>
> I can push or pull from multiple servers so no problem as git supports this.
>
> Now the problem with use of submodules
>
> submodules are listed:
>
> submodule.directory_to_
> checkout/proj1_dir.url=https://git.someserver1/git/proj1_dir
> <https://git.someserver1/git/proj1_dir>

Technically it is submodule.<name>.url instead of
submodule.<path>.url. The name is usually the path initially, and once
you move the submodule, only the path changes, the name is supposed to
be constant and stay the same.

>
>
> but if say I want to pull from some server 2 and do a
>
> git submodule update --init --recursive

That is why the "git submodule init" exists at all.

    git submodule init
    $EDIT .git/config # change submodule.<name>.url to server2
    git submodule update # that uses the adapted url and
    # creates the submodule repository.

    # From now on the submodule is on its own.
    cd <submodule> && git config --list
    # prints an "origin" remote and a lot more

For a better explanation, I started a documentation series, see
https://github.com/gitster/git/commit/e2b51b9df618ceeff7c4ec044e20f5ce9a87241e

(It is not finished, but that is supposed to explain this exact pain
point in the
STATES section, feel free to point out missing things or what is hard
to understand)

>
> what I would get is from someserver1 not someserver2 as there is no "origin"
> support for submodules.  Is this correct?  If so can origin support be added
> to submodules?

Can you explain more in detail what you mean by origin support?

> +-------------------------------
>
> So above was post from google group.  Maybe above is enough to explain the
> problem that can result, but if not here is a discussion as to why this can
> be confusing resulting to pushing or pulling from the incorrect server.
>
> Lets say projects starts at origin on https://server0.org. Project is then
> brought in house to server https://indhouse.corp by developer A.
>
> Developer A then changes the submodule projects to point to indhouse and
> changes submodules in super project to point to indhouse so everything is
> great.
>
> Then Developer B clones from indhouse makes changes to submodule1 and
> submodule1 and pushes them to indhouse.  Dev A tells Dev B hey thoes changes
> are great why don't you push to public server0 so every one can get thoes
> changes.  Dev B then git push origin branch_name on submodule1 and push
> origin on submodule 2 and superproject.  And everything is great ... right?
>
> Yes by now those who know git know what dev B  forgot or more to the point
> what git does not support in a "clean" way.  For those who don't enter the
> life of dev C
>
> So dev C clones from server0 and performs a git submodule update --init
> --recursive.  Now Dev C is on the www and can't see indhouse.corp and ...
> kerpow... Dev B and Dev C just experienced one of the many SW mines on the
> battlefield.
>
> I ask that git devs first see if I am correct with this assessment as I
> could be wrong... maybe there is a way of doing this... and if not add a
> feature to git to handle submodules and multiple origins cleanly.... and yes
> beating dev B with rubber chicken might be a solution though I am looking
> for a slightly better option.

Yes this is a big point that we want to solve eventually.

When devA brought it inhouse, what they meant to do was this:
"This superproject is actually from server0, but we want to work on it, which
may have submodules diverge from server0 eventually. So if a submodule changed
you need to get it from the inhouse server, otherwise fall back to the server0".

That way developerB can just make changes to some submodules and when
devC clones
they get the "correct" submodule.

A weak attempt to do this is to use *relative* submodule urls. When
using relative urls, and then mirroring the supeproject inhouse, then
Git will look for the submodules as well inhouse, but there
is no such "or if not found look at the original superprojects
origin", which means, you have to mirror all submodules.

And then about upstreaming changes. If you have a single repo (no
submodules), you have to teach people to run "git remote add remote
server0.org && git push upstream ...", but that you can do for each
submodule. (This is tedious:/ but maybe ok; some submodules are free
to sent things upstream whereas others are supersecret that you do not
want to push upstream ever.)

So yeah maybe we want to have more power in the superprojects push operation

    (in the superproject) $ git push --recurse-submodules \
       --only-these-submodules=subA,subB \
       --submodule-to=upstream-as-configered-in-super-project

This is a lot of words but for explaining that is ok?

Thanks,
Stefan

^ permalink raw reply

* Re: [PATCH 1/2] asciidoctor: fix user-manual to be built by `asciidoctor`
From: Junio C Hamano @ 2017-01-13 18:31 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git, 마누엘
In-Reply-To: <xmqq37gokrdj.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> ...  Perhaps 1/2 posted as-is
> is the best we could do within that constraint.
>
> Let's queue it on 'pu' and see if somebody else comes up with an
> update that is more visually pleasing with both backends.

... and I noticed that I didn't get to actually queuing this
yesterday, even though I sent the message above.

Now I have, and the patch will be part of tonight's pushout.

Thanks.

^ permalink raw reply

* Re: [PATCHv3 2/2] builtin/commit.c: switch to xstrfmt(), instead of snprintf,
From: Brandon Williams @ 2017-01-13 18:33 UTC (permalink / raw)
  To: Elia Pinto; +Cc: git
In-Reply-To: <20170113175801.39468-2-gitter.spiros@gmail.com>

On 01/13, Elia Pinto wrote:
> In this patch, instead of using xnprintf instead of snprintf, which asserts
> that we don't truncate, we are switching to dynamic allocation with  xstrfmt(),
> , so we can avoid dealing with magic numbers in the code and reduce the cognitive burden from
> the programmers, because they no longer have to count bytes needed for static allocation.
> As a side effect of this patch we have also reduced the snprintf() calls, that may silently truncate 
> results if the programmer is not careful.
> 
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Helped-by: Jeff King <peff@peff.net> 
> Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>

Small nit's with the commit message:
* Stray comma ',' of on its own
* lines are longer than 80 characters

> ---
> This is the third  version of the patch.
> 
> Changes from the first version: I have split the original commit in two, as discussed here
> http://public-inbox.org/git/20161213132717.42965-1-gitter.spiros@gmail.com/.
> 
> Changes from the second version:
> - Changed the commit message to clarify the purpose of the patch (
> as suggested by Junio)
> https://public-inbox.org/git/xmqqtw95mfo3.fsf@gitster.mtv.corp.google.com/T/#m2e6405a8a78a8ca1ed770614c91398290574c4a1
> 
> 
> 
>  builtin/commit.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 09bcc0f13..37228330c 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1526,12 +1526,10 @@ static int git_commit_config(const char *k, const char *v, void *cb)
>  static int run_rewrite_hook(const unsigned char *oldsha1,
>  			    const unsigned char *newsha1)
>  {
> -	/* oldsha1 SP newsha1 LF NUL */
> -	static char buf[2*40 + 3];
> +	char *buf;
>  	struct child_process proc = CHILD_PROCESS_INIT;
>  	const char *argv[3];
>  	int code;
> -	size_t n;
>  
>  	argv[0] = find_hook("post-rewrite");
>  	if (!argv[0])
> @@ -1547,11 +1545,11 @@ static int run_rewrite_hook(const unsigned char *oldsha1,
>  	code = start_command(&proc);
>  	if (code)
>  		return code;
> -	n = snprintf(buf, sizeof(buf), "%s %s\n",
> -		     sha1_to_hex(oldsha1), sha1_to_hex(newsha1));
> +	buf = xstrfmt("%s %s\n", sha1_to_hex(oldsha1), sha1_to_hex(newsha1));
>  	sigchain_push(SIGPIPE, SIG_IGN);
> -	write_in_full(proc.in, buf, n);
> +	write_in_full(proc.in, buf, strlen(buf));
>  	close(proc.in);
> +	free(buf);
>  	sigchain_pop(SIGPIPE);
>  	return finish_command(&proc);
>  }
> -- 
> 2.11.0.154.g5f5f154
> 

-- 
Brandon Williams

^ permalink raw reply

* Re: [PATCH] lib-submodule-update.sh: drop unneeded shell
From: Junio C Hamano @ 2017-01-13 18:40 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git
In-Reply-To: <20170111184732.26416-1-sbeller@google.com>

Stefan Beller <sbeller@google.com> writes:

> In modern Git we prefer "git -C <cmd" over "(cd <somewhere && git <cmd>)"
> as it doesn't need an extra shell.

There is a matching '>' missing.  The description is correct (I am
not sure if there actually is "preference", though), but I found the
title a bit misleading.  With 'drop unneeded subshell', I would
imagine a patch like

		(
	-		( do something )
	+		do something;
			do something else;
		)

It looks more like "reduce use of subshell by using 'git -C <dir>'"
to me.

> And because it is in a setup function, we actually save the invocation
> of 22 shells for a single run of the whole test suite.

Nice.

>
> Noticed while adding a lot more in near vincinity, though not as near
> to cause merge conflicts, so sending it extra.
>
> Thanks,
> Stefan
>
>  t/lib-submodule-update.sh | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
> index 79cdd34a54..915eb4a7c6 100755
> --- a/t/lib-submodule-update.sh
> +++ b/t/lib-submodule-update.sh
> @@ -69,10 +69,7 @@ create_lib_submodule_repo () {
>  
>  		git checkout -b "replace_sub1_with_directory" "add_sub1" &&
>  		git submodule update &&
> -		(
> -			cd sub1 &&
> -			git checkout modifications
> -		) &&
> +		git -C sub1 checkout modifications &&
>  		git rm --cached sub1 &&
>  		rm sub1/.git* &&
>  		git config -f .gitmodules --remove-section "submodule.sub1" &&

^ permalink raw reply

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
From: Stefan Beller @ 2017-01-13 18:44 UTC (permalink / raw)
  To: René Scharfe, Michael Haggerty
  Cc: Vegard Nossum, Junio C Hamano, git@vger.kernel.org
In-Reply-To: <e55dc4dd-768b-8c9b-e3b2-e850d5d521f5@web.de>

On Fri, Jan 13, 2017 at 10:19 AM, René Scharfe <l.s.r@web.de> wrote:
> Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
>>
>> When using -W to include the whole function in the diff context, you
>> are typically doing this to be able to review the change in its entirety
>> within the context of the function. It is therefore almost always
>> desirable to include any comments that immediately precede the function.

Do we need a small comment in the actual code to hint at why we count
upwards there?

>>
>> This also the fixes the case for C where the declaration is split across
>> multiple lines (where the first line of the declaration would not be
>> included in the output), e.g.:
>>
>>         void
>>         dummy(void)
>>         {
>>                 ...
>>         }
>>
>
> That's true, but I'm not sure "non-empty line before function line" is good
> enough a definition for desirable lines.  It wouldn't work for people who
> don't believe in empty lines.  Or for those that put a blank line between
> comment and function.  (I have an opinion on such habits, but git diff
> should probably stay neutral.)  And that's just for C code; I have no idea
> how this heuristic would hold up for other file types like HTML.

I think empty lines are "good as a first approach", see e.g.
433860f3d0beb0c6 the "compaction" heuristic for a similar
thing (the compaction was introduced at d634d61ed), and then
we can build a more elaborate thing on top.

>
> We can identify function lines with arbitrary precision (with a xfuncname
> regex, if needed), but there is no accurate way to classify lines as
> comments, or as the end of functions.  Adding optional regexes for single-
> and multi-line comments would help, at least for C.

That would cover Java and whole lot of other C like languages. So a good
start as well IMHO.

>
> René

^ permalink raw reply

* Re: [PATCH 0/5] extend git-describe pattern matching
From: Junio C Hamano @ 2017-01-13 18:48 UTC (permalink / raw)
  To: Jacob Keller; +Cc: git, Jacob Keller
In-Reply-To: <20170112001721.2534-1-jacob.e.keller@intel.com>

Jacob Keller <jacob.e.keller@intel.com> writes:

> From: Jacob Keller <jacob.keller@gmail.com>
>
> Teach git describe and git name-rev the ability to match multiple
> patterns inclusively. Additionally, teach these commands to also accept
> negative patterns to discard any refs which match.

You made quick responses to reviews with "will change", so I am not
queuing this round to my tree; please don't mistake that as my
indifference or opposition to the topic.  This sounds like a good
thing.

As to the semantics of mixing positives and negatives, I would
recommend this to follow how positive and negative pathspecs mix.
IIRC we chose to use the most simple and easy to explain option,
i.e. a thing must match at least one of the positives and must not
match any of the negatives to be considered a match.



^ permalink raw reply

* Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path
From: Eric Wong @ 2017-01-13 18:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Pat Pannuto, Johannes Schindelin, Johannes Sixt, git
In-Reply-To: <xmqq4m12izmd.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> > Pat Pannuto <pat.pannuto@gmail.com> wrote:
> >> You may still want the 1/2 patch in this series, just to make things
> >> internally consistent with "-w" vs "use warnings;" inside git's perl
> >> scripts.
> >
> > No, that is a step back.  "-w" affects the entire process, so it
> > spots more potential problems.  The "warnings" pragma is scoped
> > to the enclosing block, so it won't span across files.
> 
> OK, so with "-w", we do not have to write "use warnings" in each of
> our files to get them checked.  It is handy when we ship our own
> libs (e.g. Git.pm) that are used by our programs.

Yes.  "use warnings" should be in our own libs in case other
people run without "-w"

> If something we write outselves trigger a false-positive, we can
> work it around with "no warnings;" in the smallest block that
> encloses the offending code in the worst case, or just rephrase it
> in a way that won't trigger a false-positive.

Correct.

> If something we _use_ from a third-party is not warnings-clean,
> there is no easy way to squelch them if we use "-w", which is a
> potential downside, isn't it?  I do not know how serious a problem
> it is in practice.  I suspect that the core package we use from perl
> distribution are supposed to be warnings-clean, but we use a handful
> of things from outside the core and I do not know what state they
> are in.

Yes, "-w" will trigger warnings in third party packages.
Existing uses we have should be fine, and I think most Perl
modules we use or would use are vigilant about being
warnings-clean.  If we have to leave off a "-w", there should
probably be a comment at the top stating the reason:

#!/usr/bin/perl
# Not using "perl -w" since Foo::Bar <= X.Y.Y is not warnings-clean
use strict;
use warnings;
use Foo::Bar;
...

^ permalink raw reply

* [PATCH] lib-submodule-update.sh: reduce use of subshell by using git -C <dir>
From: Stefan Beller @ 2017-01-13 19:03 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller
In-Reply-To: <xmqqtw92hkgc.fsf@gitster.mtv.corp.google.com>

In modern Git we prefer
    "git -C <cmd>"
over
    "(cd <somewhere && git <cmd>)"
as it doesn't need an extra shell.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/lib-submodule-update.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 79cdd34a54..915eb4a7c6 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -69,10 +69,7 @@ create_lib_submodule_repo () {
 
 		git checkout -b "replace_sub1_with_directory" "add_sub1" &&
 		git submodule update &&
-		(
-			cd sub1 &&
-			git checkout modifications
-		) &&
+		git -C sub1 checkout modifications &&
 		git rm --cached sub1 &&
 		rm sub1/.git* &&
 		git config -f .gitmodules --remove-section "submodule.sub1" &&
-- 
2.11.0.300.g08194d1431


^ permalink raw reply related

* Re: [PATCH] Documentation/bisect: improve on (bad|new) and (good|bad)
From: Junio C Hamano @ 2017-01-13 19:14 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Manuel Ullmann, Matthieu Moy, Christian Couder
In-Reply-To: <20170113144405.3963-1-chriscool@tuxfamily.org>

Christian Couder <christian.couder@gmail.com> writes:

> The following part of the description:
>
> git bisect (bad|new) [<rev>]
> git bisect (good|old) [<rev>...]
>
> may be a bit confusing, as a reader may wonder if instead it should be:
>
> git bisect (bad|good) [<rev>]
> git bisect (old|new) [<rev>...]
>
> Of course the difference between "[<rev>]" and "[<rev>...]" should hint
> that there is a good reason for the way it is.
>
> But we can further clarify and complete the description by adding
> "<term-new>" and "<term-old>" to the "bad|new" and "good|old"
> alternatives.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/git-bisect.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks.  The patch looks good.

A related tangent.  

Last night, I was trying to think if there is a fundamental reason
why "bad/new/term-new" cannot take more than one <rev>s on the newer
side of the bisection, and couldn't quite think of any before
falling asleep.

Currently we keep track of a single bisect/bad, while marking all the
revs given as good previously as bisect/good-<SHA-1>.

Because the next "bad" is typically chosen from the region of the
commit DAG that is bounded by bad and good commits, i.e. "rev-list
bisect/bad --not bisect/good-*", the current bisect/bad will always
be an ancestor of all bad commits that used to be bisect/bad, and
keeping previous bisect/bad as bisect/bad-<SHA-1> won't change the
region of the commit DAG yet to be explored.

As a reason why we need to use only a single bisect/bad, the above
description is understandable.  But as a reason why we cannot have
more than one, it is tautological.  It merely says "if we start from
only one and dig history to find older culprit, we need only one
bad".

I fell asleep last night without thinking further than that.

I think the answer to the question "why do we think we need a single
bisect/bad?" is "because bisection is about assuming that there is
only one commit that flips the tree state from 'old' to 'new' and
finding that single commit".  That would mean that even if we had
bisect/bad-A and bisect/bad-B, e.g.

                          o---o---o---bad-A
                         /
    -----Good---o---o---o
                         \
                          o---o---o---bad-B


where 'o' are all commits whose goodness is not yet known, because
bisection is valid only when we are hunting for a single commit that
flips the state from good to bad, that commit MUST be at or before
the merge base of bad-A and bad-B.  So even if we allowed

	$ git bisect bad bad-A bad-B

on the command line, we won't have to set bisect/bad-A and
bisect/bad-B.  We only need a single bisect/bad that points at the
merge base of these two.

But what if bad-A and bad-B have more than one merge bases?  We
won't know which side the badness came from.

                          o---o---o---bad-A
                         /     \ / 
    -----Good---o---o---o       / 
                         \     / \
                          o---o---o---bad-B

Being able to bisect the region of DAG bound by "^Good bad-A bad-B"
may have value in such a case.  I dunno.


^ permalink raw reply

* [PATCH] submodule update: run custom update script for initial populating as well
From: Stefan Beller @ 2017-01-13 19:43 UTC (permalink / raw)
  To: bmwill, gitster, judge.packham, olsonse; +Cc: git, Stefan Beller

In 1b4735d9f3 (submodule: no [--merge|--rebase] when newly cloned,
2011-02-17), all actions were defaulted to checkout for populating
a submodule initially, because merging or rebasing makes no sense
in that situation.

Other commands however do make sense, such as the custom command
that was added later (6cb5728c43, submodule update: allow custom
command to update submodule working tree, 2013-07-03).

I am unsure about the "none" command, as I can see an initial
checkout there as a useful thing. On the other hand going strictly
by our own documentation, we should do nothing in case of "none"
as well, because the user asked for it.

Reported-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh            |  7 ++++++-
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 554bd1c494..aeb721ab7e 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -606,7 +606,12 @@ cmd_update()
 		if test $just_cloned -eq 1
 		then
 			subsha1=
-			update_module=checkout
+			if test "$update_module" = "merge" ||
+			   test "$update_module" = "rebase" ||
+			   test "$update_module" = "none"
+			then
+				update_module=checkout
+			fi
 		else
 			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 64f322c4cc..1407fa6098 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -424,6 +424,19 @@ test_expect_success 'submodule update - command in .git/config catches failure -
 	test_i18ncmp actual expect
 '
 
+test_expect_success 'submodule update - command run for initial population of submodule' '
+	cat <<-\ EOF >expect
+	Execution of '\''false $submodulesha1'\'' failed in submodule path '\''submodule'\''
+	EOF
+	(
+		cd super &&
+		rm -rf submodule
+		test_must_fail git submodule update >../actual
+	)
+	test_cmp expect actual
+	git -C super submodule update --checkout
+'
+
 cat << EOF >expect
 Execution of 'false $submodulesha1' failed in submodule path '../super/submodule'
 Failed to recurse into submodule path '../super'
@@ -476,6 +489,7 @@ test_expect_success 'submodule init picks up merge' '
 '
 
 test_expect_success 'submodule update --merge  - ignores --merge  for new submodules' '
+	test_config -C super submodule.submodule.update checkout &&
 	(cd super &&
 	 rm -rf submodule &&
 	 git submodule update submodule &&
@@ -488,6 +502,7 @@ test_expect_success 'submodule update --merge  - ignores --merge  for new submod
 '
 
 test_expect_success 'submodule update --rebase - ignores --rebase for new submodules' '
+	test_config -C super submodule.submodule.update checkout &&
 	(cd super &&
 	 rm -rf submodule &&
 	 git submodule update submodule &&
-- 
2.11.0.300.g08194d1431.dirty


^ permalink raw reply related

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
From: Junio C Hamano @ 2017-01-13 19:51 UTC (permalink / raw)
  To: René Scharfe; +Cc: Vegard Nossum, git
In-Reply-To: <e55dc4dd-768b-8c9b-e3b2-e850d5d521f5@web.de>

René Scharfe <l.s.r@web.de> writes:

> Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
>> When using -W to include the whole function in the diff context, you
>> are typically doing this to be able to review the change in its entirety
>> within the context of the function. It is therefore almost always
>> desirable to include any comments that immediately precede the function.
>>
>> This also the fixes the case for C where the declaration is split across
>> multiple lines (where the first line of the declaration would not be
>> included in the output), e.g.:
>>
>> 	void
>> 	dummy(void)
>> 	{
>> 		...
>> 	}
>>
>
> That's true, but I'm not sure "non-empty line before function line" is
> good enough a definition for desirable lines.  It wouldn't work for
> people who don't believe in empty lines.  Or for those that put a
> blank line between comment and function.  (I have an opinion on such
> habits, but git diff should probably stay neutral.)  And that's just
> for C code; I have no idea how this heuristic would hold up for other
> file types like HTML.

As you are, I am fairly negative on the heuristic based on the
"non-blank" thing.  We tried once with compaction-heuristics already
and it did not quite perform well.  Let's not hardcode another one.

> We can identify function lines with arbitrary precision (with a
> xfuncname regex, if needed), but there is no accurate way to classify
> lines as comments, or as the end of functions.  Adding optional
> regexes for single- and multi-line comments would help, at least for
> C.

The funcline regexp is used for two related but different purposes.
It identifies a single line to be placed on @@ ... @@ line before a
diff hunk.  This line however does not have to be at the beginning
of a function.  It has to be the line that conveys the most
significant information (e.g. the name of the function).

The way "diff -W" codepath used it as if it were always the very
first line of a function was bound to invite a patch like this, and
if we want to be extra elaborate, I agree that an extra mechanism to
say "the line the funcline regexp matches is not the beginning of a
function, but the beginning is a line that matches this other regexp
before that line" may help.

Do we really want to be that elaborate, though?  I dunno.

I wonder if it would be sufficient to make -W take an optional
number, e.g. "git show -W4", to add extre context lines before the
funcline.

^ permalink raw reply

* Re: [PATCH] lib-submodule-update.sh: drop unneeded shell
From: Junio C Hamano @ 2017-01-13 19:55 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git
In-Reply-To: <xmqqtw92hkgc.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <sbeller@google.com> writes:
>
>> In modern Git we prefer "git -C <cmd" over "(cd <somewhere && git <cmd>)"
>> as it doesn't need an extra shell.
>
> There is a matching '>' missing.  The description is correct (I am
> not sure if there actually is "preference", though), but I found the
> title a bit misleading....

It turns out that there were two missing '>' ;-)  It tentatively has
become like this in my tree.

-- >8 --
From: Stefan Beller <sbeller@google.com>
Date: Wed, 11 Jan 2017 10:47:32 -0800
Subject: [PATCH] lib-submodule-update.sh: reduce use of subshell by using "git -C"

We write

    (cd <dir> && git <cmd>)

to avoid

    cd <dir> && git <cmd> && cd ..

that allows a breakage in one part of the test script to leave the
entire test process in an unexpected place.  We can do this more
concisely with "git -C <dir> <cmd>" with modern Git.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 79cdd34a54..915eb4a7c6 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -69,10 +69,7 @@ create_lib_submodule_repo () {
 
 		git checkout -b "replace_sub1_with_directory" "add_sub1" &&
 		git submodule update &&
-		(
-			cd sub1 &&
-			git checkout modifications
-		) &&
+		git -C sub1 checkout modifications &&
 		git rm --cached sub1 &&
 		rm sub1/.git* &&
 		git config -f .gitmodules --remove-section "submodule.sub1" &&

^ permalink raw reply related

* Re: [PATCH 0/3] updates to gitk & git-gui doc now gitview has gone
From: Junio C Hamano @ 2017-01-13 20:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Philip Oakley, GitList
In-Reply-To: <alpine.DEB.2.20.1701131622510.3469@virtualbox>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Lets remove the reference in the gitk man page, and do some other
>> fixups while in the area.
>
> These changes all look sensible to me.

To me, too.  Thanks, both.

^ permalink raw reply

* Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path
From: Junio C Hamano @ 2017-01-13 20:01 UTC (permalink / raw)
  To: Eric Wong; +Cc: Pat Pannuto, Johannes Schindelin, Johannes Sixt, git
In-Reply-To: <20170113185246.GA17441@starla>

Eric Wong <e@80x24.org> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Wong <e@80x24.org> writes:
>> > Pat Pannuto <pat.pannuto@gmail.com> wrote:
>> >> You may still want the 1/2 patch in this series, just to make things
>> >> internally consistent with "-w" vs "use warnings;" inside git's perl
>> >> scripts.
>> >
>> > No, that is a step back.  "-w" affects the entire process, so it
>> > spots more potential problems.  The "warnings" pragma is scoped
>> > to the enclosing block, so it won't span across files.
>> 
>> OK, so with "-w", we do not have to write "use warnings" in each of
>> our files to get them checked.  It is handy when we ship our own
>> libs (e.g. Git.pm) that are used by our programs.
>
> Yes.  "use warnings" should be in our own libs in case other
> people run without "-w"

Would it mean that we need both anyway?  That is, add missing "use
warnings" without removing "-w" from she-bang line?

> Yes, "-w" will trigger warnings in third party packages.
> Existing uses we have should be fine, and I think most Perl
> modules we use or would use are vigilant about being
> warnings-clean.  If we have to leave off a "-w", there should
> probably be a comment at the top stating the reason:
>
> #!/usr/bin/perl
> # Not using "perl -w" since Foo::Bar <= X.Y.Y is not warnings-clean
> use strict;
> use warnings;
> use Foo::Bar;
> ...

Good.

Speaking of Perl, I recall that somebody complained that we ship
with and do use a stale copy of Error.pm that has been deprecated.
I am not asking you to do so, but we may want to see somebody look
into it (i.e. assessing the current situation, and if it indeed is
desirable for us to wean ourselves away from Error.pm, update our
codepaths that use it).

Thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox