Git development
 help / color / mirror / Atom feed
* bug: git fetch reports too many unreachable loose objects
From: Josh Wolfe @ 2018-12-08  2:12 UTC (permalink / raw)
  To: git

git version 2.19.1
steps to reproduce:

# start in a brand new repo
git init

# create lots of unreachable loose objects
for i in {1..10000}; do git commit-tree -m "$(head -c 12 /dev/urandom
| base64)" "$(git mktree <&-)" <&-; done
# this prints a lot of output and takes a minute or so to run

# trigger git gc to run in the background
git fetch
# Auto packing the repository in background for optimum performance.
# See "git help gc" for manual housekeeping.

# trigger it again
git fetch
# Auto packing the repository in background for optimum performance.
# See "git help gc" for manual housekeeping.
# error: The last gc run reported the following. Please correct the root cause
# and remove .git/gc.log.
# Automatic cleanup will not be performed until the file is removed.
#
# warning: There are too many unreachable loose objects; run 'git
prune' to remove them.

# to manually fix this, run git prune:
git prune

# note that `git gc` does not fix the problem, and appears to do
nothing in this situation:
git gc


According to the `git fetch` output, the `git help gc` docs, and the
`git help prune` docs, I don't think I shouldn't ever have to run `git
prune` manually, so this behavior seems like a bug to me. Please
correct me if this is expected behavior.

In case anyone's wondering why I'm creating unreachable loose objects,
here's the usecase: https://stackoverflow.com/a/50403179/367916 . I
would love a first-class solution to obviate that workaround, but that
is probably a separate issue.

Josh

^ permalink raw reply

* Re: [PATCH v2 1/3] git clone <url> C:\cygwin\home\USER\repo' is working (again)
From: Steven Penny @ 2018-12-08  0:49 UTC (permalink / raw)
  To: tboegi; +Cc: git
In-Reply-To: <20181207170456.8994-1-tboegi@web.de>

On Fri, Dec 7, 2018 at 11:04 AM wrote:
> The solution is to implement has_dos_drive_prefix(), skip_dos_drive_prefix()
> is_dir_sep(), offset_1st_component() and convert_slashes() for cygwin
> in the same way as it is done in 'Git for Windows' in compat/mingw.[ch]
>
> Instead of duplicating the code, it is extracted into compat/mingw-cygwin.[ch]
> Some need for refactoring and cleanup came up in the review, they are adressed
> in a seperate commit.

i have applied the 3 patches against current master, and my original test
passes, so looks good to me.

however like Johannes i take issue with the naming. as he said "mingw-cygwin"
really isnt appropriate. ideally it would be "windows.h", but as that is
conspicuously in use, something like these:

- pc-windows
- pc-win
- win

i disagree with him on using "win32" - that doesnt really make sense, as
obviously you can compile 64-bit Git for Windows. if you wanted to go that route
you would want to use something like:

- windows-api
- win-api
- winapi

further - i disagree with the "DOS" moniker being used at all. DOS is a defunkt
operating system that i dont think Git has *ever* supported, so it doesnt make
sense to be referring to it this way. again, a more approriate name would be
something like "win_drive_prefix".

^ permalink raw reply

* Re: [PATCH] docs/gitweb.conf: config variable typo
From: FeRD @ 2018-12-08  0:48 UTC (permalink / raw)
  To: jrnieder; +Cc: git
In-Reply-To: <20181208004504.GG73340@google.com>

On Fri, Dec 7, 2018 at 7:45 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
>
> Thanks for fixing it.  May we forge your sign-off?

Yes please, guess I didn't read far enough down that document. My apologies.
Consider the previous patch email:

Signed-off-by: FeRD (Frank Dana) <ferdnyc@gmail.com>

^ permalink raw reply

* Re: [PATCH] docs/gitweb.conf: config variable typo
From: Jonathan Nieder @ 2018-12-08  0:45 UTC (permalink / raw)
  To: Frank Dana; +Cc: git
In-Reply-To: <010201678b350faa-868bbac4-9242-427a-9d3b-fc6f00a95270-000000@eu-west-1.amazonses.com>

Hi,

Frank Dana wrote:

> The documentation for the feature 'snapshot' claimed
> "This feature can be configured on a per-repository basis via
> repository's `gitweb.blame` configuration variable"
>
> Fixed to specify `gitweb.snapshot` as the variable name.
> ---
>  Documentation/gitweb.conf.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks for fixing it.  May we forge your sign-off?  See
Documentation/SubmittingPatches section "Certify your work" for what
this means.

Sincerely,
Jonathan

^ permalink raw reply

* [PATCH] docs/gitweb.conf: config variable typo
From: Frank Dana @ 2018-12-08  0:26 UTC (permalink / raw)
  To: git

The documentation for the feature 'snapshot' claimed
"This feature can be configured on a per-repository basis via
repository's `gitweb.blame` configuration variable"

Fixed to specify `gitweb.snapshot` as the variable name.
---
 Documentation/gitweb.conf.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index c0a326e3883c3..40c9563ef67af 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -684,7 +684,7 @@ compressed tar archive) and "zip"; please consult gitweb sources for
 a definitive list.  By default only "tgz" is offered.
 +
 This feature can be configured on a per-repository basis via
-repository's `gitweb.blame` configuration variable, which contains
+repository's `gitweb.snapshot` configuration variable, which contains
 a comma separated list of formats or "none" to disable snapshots.
 Unknown values are ignored.
 

--
https://github.com/git/git/pull/562

^ permalink raw reply related

* Difficulty with parsing colorized diff output
From: George King @ 2018-12-08  0:09 UTC (permalink / raw)
  To: Git Mailing List

Hello, I have a rather elaborate diff highlighter that I have implemented as a post-processor to regular git output. I am writing to discuss some difficult aspects of git diff's color output that I am observing with version 2.19.2. This is not a regression report; I am trying to implement a new feature and am stymied by these details.

My goal is to detect SGR color sequences, e.g. '\x1b[32m', that exist in the source text, and have my highlighter print escaped representations of those. For example, I have checked in files that are expected test outputs for tools that emit color codes, and diffs of those get very confusing.

Figuring out which color codes are from the source text and which were added by git is proving very difficult. The obvious solution is to turn git diff coloring off, but as far as I can see this also turns off all coloring for logs, which is undesirable.

Then I tried to remove just the color codes that git adds to the diff. This almost works, but there are some irregularities. Most lines begin with a style/color code and end with a reset code, which would be a perfect indicator that git is using colors. However:

* Context lines do not begin with reset code, but do end with a reset code. It would be preferable in my opinion if they had both (like every other line), or none at all.

* Added lines have excess codes after the plus sign. The entire prefix is, `\x1b[32m+\x1b[m\x1b[32m` translating to GREEN PLUS RESET GREEN. Emitting codes after the plus sign makes the parsing more complex and idiosyncratic.


In summary, I would like to suggest the following improvements:

* Remove the excess codes after the plus sign.

* When git diff is adding colors, ensure that every line begins with an SGR code and ends with the RESET code.

* Add a config feature to turn on log coloring while leaving diff coloring off.


I would be willing to attempt a fix for this myself, but I'd like to hear what the maintainers think first, and would appreciate any hints as to where I should start looking in the code base.


If anyone is curious about the implementation it is called `same-same` and lives here: https://github.com/gwk/pithy/blob/master/pithy/bin/same_same.py

I configure it like this in .gitconfig:

[core]
  pager = same-same | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS
[interactive]
  diffFilter = same-same -interactive | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS


Thank you,
George


^ permalink raw reply

* [PATCH 4/4] submodule deinit: unset core.worktree
From: Stefan Beller @ 2018-12-07 23:54 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller
In-Reply-To: <20181207235425.128568-1-sbeller@google.com>

This re-introduces 984cd77ddb (submodule deinit: unset core.worktree,
2018-06-18), which was reverted as part of f178c13fda (Revert "Merge
branch 'sb/submodule-core-worktree'", 2018-09-07)

The whole series was reverted as the offending commit e98317508c
(submodule: ensure core.worktree is set after update, 2018-06-18)
was relied on by other commits such as 984cd77ddb.

Keep the offending commit reverted, but its functionality came back via
4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17), such
that we can reintroduce 984cd77ddb now.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 2 ++
 t/lib-submodule-update.sh   | 2 +-
 t/t7400-submodule-basic.sh  | 5 +++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 31ac30cf2f..672b74db89 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1131,6 +1131,8 @@ static void deinit_submodule(const char *path, const char *prefix,
 		if (!(flags & OPT_QUIET))
 			printf(format, displaypath);
 
+		submodule_unset_core_worktree(sub);
+
 		strbuf_release(&sb_rm);
 	}
 
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 51d4555549..5b56b23166 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -235,7 +235,7 @@ reset_work_tree_to_interested () {
 	then
 		mkdir -p submodule_update/.git/modules/sub1/modules &&
 		cp -r submodule_update_repo/.git/modules/sub1/modules/sub2 submodule_update/.git/modules/sub1/modules/sub2
-		GIT_WORK_TREE=. git -C submodule_update/.git/modules/sub1/modules/sub2 config --unset core.worktree
+		# core.worktree is unset for sub2 as it is not checked out
 	fi &&
 	# indicate we are interested in the submodule:
 	git -C submodule_update config submodule.sub1.url "bogus" &&
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 76a7cb0af7..aba2d4d6ee 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -984,6 +984,11 @@ test_expect_success 'submodule deinit should remove the whole submodule section
 	rmdir init
 '
 
+test_expect_success 'submodule deinit should unset core.worktree' '
+	test_path_is_file .git/modules/example/config &&
+	test_must_fail git config -f .git/modules/example/config core.worktree
+'
+
 test_expect_success 'submodule deinit from subdirectory' '
 	git submodule update --init &&
 	git config submodule.example.foo bar &&
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply related

* [PATCH 3/4] submodule--helper: fix BUG message in ensure_core_worktree
From: Stefan Beller @ 2018-12-07 23:54 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller
In-Reply-To: <20181207235425.128568-1-sbeller@google.com>

Shortly after f178c13fda (Revert "Merge branch
'sb/submodule-core-worktree'", 2018-09-07), we had another series
that implemented partially the same, ensuring that core.worktree was
set in a checked out submodule, namely 74d4731da1 (submodule--helper:
replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13)

As the series 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c',
2018-09-17) has different goals than the reverted series 7e25437d35
(Merge branch 'sb/submodule-core-worktree', 2018-07-18), I'd wanted to
replay the series on top of it to reach the goal of having `core.worktree`
correctly set when the submodules worktree is present, and unset when the
worktree is not present.

The replay resulted in a strange merge conflict highlighting that
the BUG message was not changed in 74d4731da1 (submodule--helper:
replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13).

Fix the error message.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d38113a31a..31ac30cf2f 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2045,7 +2045,7 @@ static int ensure_core_worktree(int argc, const char **argv, const char *prefix)
 	struct repository subrepo;
 
 	if (argc != 2)
-		BUG("submodule--helper connect-gitdir-workingtree <name> <path>");
+		BUG("submodule--helper ensure-core-worktree <path>");
 
 	path = argv[1];
 
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply related

* [PATCH 2/4] submodule: unset core.worktree if no working tree is present
From: Stefan Beller @ 2018-12-07 23:54 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Junio C Hamano
In-Reply-To: <20181207235425.128568-1-sbeller@google.com>

This reintroduces 4fa4f90ccd (submodule: unset core.worktree if no working
tree is present, 2018-06-12), which was reverted as part of f178c13fda
(Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07).

4fa4f90ccd was reverted as its followup commit was faulty, but without
the accompanying change of the followup, we'd have an incomplete workflow
of setting `core.worktree` again, when it is needed such as checking out
a revision that contains a submodule.

So re-introduce that commit as-is, focusing on fixing up the followup

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c               | 14 ++++++++++++++
 submodule.h               |  2 ++
 t/lib-submodule-update.sh |  3 ++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 6415cc5580..d393e947e6 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1561,6 +1561,18 @@ int bad_to_remove_submodule(const char *path, unsigned flags)
 	return ret;
 }
 
+void submodule_unset_core_worktree(const struct submodule *sub)
+{
+	char *config_path = xstrfmt("%s/modules/%s/config",
+				    get_git_common_dir(), sub->name);
+
+	if (git_config_set_in_file_gently(config_path, "core.worktree", NULL))
+		warning(_("Could not unset core.worktree setting in submodule '%s'"),
+			  sub->path);
+
+	free(config_path);
+}
+
 static const char *get_super_prefix_or_empty(void)
 {
 	const char *s = get_super_prefix();
@@ -1726,6 +1738,8 @@ int submodule_move_head(const char *path,
 
 			if (is_empty_dir(path))
 				rmdir_or_warn(path);
+
+			submodule_unset_core_worktree(sub);
 		}
 	}
 out:
diff --git a/submodule.h b/submodule.h
index a680214c01..9e18e9b807 100644
--- a/submodule.h
+++ b/submodule.h
@@ -131,6 +131,8 @@ int submodule_move_head(const char *path,
 			const char *new_head,
 			unsigned flags);
 
+void submodule_unset_core_worktree(const struct submodule *sub);
+
 /*
  * Prepare the "env_array" parameter of a "struct child_process" for executing
  * a submodule by clearing any repo-specific environment variables, but
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 016391723c..51d4555549 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -709,7 +709,8 @@ test_submodule_recursing_with_args_common() {
 			git branch -t remove_sub1 origin/remove_sub1 &&
 			$command remove_sub1 &&
 			test_superproject_content origin/remove_sub1 &&
-			! test -e sub1
+			! test -e sub1 &&
+			test_must_fail git config -f .git/modules/sub1/config core.worktree
 		)
 	'
 	# ... absorbing a .git directory along the way.
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply related

* [PATCH 1/4] submodule update: add regression test with old style setups
From: Stefan Beller @ 2018-12-07 23:54 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller
In-Reply-To: <20181207235425.128568-1-sbeller@google.com>

As f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'",
2018-09-07) was produced shortly before a release, nobody asked for
a regression test to be included. Add a regression test that makes sure
that the invocation of `git submodule update` on old setups doesn't
produce errors as pointed out in f178c13fda.

The place to add such a regression test may look odd in t7412, but
that is the best place as there we setup old style submodule setups
explicitly.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t7412-submodule-absorbgitdirs.sh | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh
index ce74c12da2..1cfa150768 100755
--- a/t/t7412-submodule-absorbgitdirs.sh
+++ b/t/t7412-submodule-absorbgitdirs.sh
@@ -75,7 +75,12 @@ test_expect_success 're-setup nested submodule' '
 	GIT_WORK_TREE=../../../nested git -C sub1/.git/modules/nested config \
 		core.worktree "../../../nested" &&
 	# make sure this re-setup is correct
-	git status --ignore-submodules=none
+	git status --ignore-submodules=none &&
+
+	# also make sure this old setup does not regress
+	git submodule update --init --recursive >out 2>err &&
+	test_must_be_empty out &&
+	test_must_be_empty err
 '
 
 test_expect_success 'absorb the git dir in a nested submodule' '
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply related

* [PATCH 0/4]
From: Stefan Beller @ 2018-12-07 23:54 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

A couple days before the 2.19 release we had a bug report about
broken submodules[1] and reverted[2] the commits leading up to them.

The behavior of said bug fixed itself by taking a different approach[3],
specifically by a weaker enforcement of having `core.worktree` set in a
submodule [4].

The revert [2] was overly broad as we neared the release, such that we wanted
to rather keep the known buggy behavior of always having `core.worktree` set,
rather than figuring out how to fix the new bug of having 'git submodule update'
not working in old style repository setups.

This series re-introduces those reverted patches, with no changes in code,
but with drastically changed commit messages, as those focus on why it is safe
to re-introduce them instead of explaining the desire for the change.

[1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight
[2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07)
[3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17)
[4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13)

Stefan Beller (4):
  submodule update: add regression test with old style setups
  submodule: unset core.worktree if no working tree is present
  submodule--helper: fix BUG message in ensure_core_worktree
  submodule deinit: unset core.worktree

 builtin/submodule--helper.c        |  4 +++-
 submodule.c                        | 14 ++++++++++++++
 submodule.h                        |  2 ++
 t/lib-submodule-update.sh          |  5 +++--
 t/t7400-submodule-basic.sh         |  5 +++++
 t/t7412-submodule-absorbgitdirs.sh |  7 ++++++-
 6 files changed, 33 insertions(+), 4 deletions(-)

-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply

* Re: [PATCH] commit: abort before commit-msg if empty message
From: Jonathan Nieder @ 2018-12-07 23:07 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <20181207224817.231957-1-jonathantanmy@google.com>

Hi,

Jonathan Tan wrote:

> (The implementation in this commit reads the commit message twice even
> if there is no commit-msg hook. I think that this is fine, since this is
> for interactive use - an alternative would be to plumb information about
> the absence of the hook from run_hook_ve() upwards, which seems more
> complicated.)

Sounds like a good followup for an interested person to do later.  Can
you include a NEEDSWORK comment describing this?

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> This was noticed with the commit-msg hook that comes with Gerrit, which
> basically just calls git interpret-trailers to add a Change-Id trailer.

Thanks for fixing it.

This kind of context tends to be very useful when looking back at a
commit later, so I'd be happy to see it in the commit message too.

[...]
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -652,6 +652,21 @@ static void adjust_comment_line_char(const struct strbuf *sb)
>  	comment_line_char = *p;
>  }
>  
> +static void read_and_clean_commit_message(struct strbuf *sb)
> +{
> +	if (strbuf_read_file(sb, git_path_commit_editmsg(), 0) < 0) {
> +		int saved_errno = errno;
> +		rollback_index_files();
> +		die(_("could not read commit message: %s"), strerror(saved_errno));

Long line.

More importantly, I wonder if this can use die_errno.
rollback_index_files calls delete_tempfile, which can clobber errno, so
that would require restoring errno either here or in that function:

diff --git i/lockfile.h w/lockfile.h
index 35403ccc0d..d592f384e7 100644
--- i/lockfile.h
+++ w/lockfile.h
@@ -298,10 +298,14 @@ static inline int commit_lock_file_to(struct lock_file *lk, const char *path)
  * remove the lockfile. It is a NOOP to call `rollback_lock_file()`
  * for a `lock_file` object that has already been committed or rolled
  * back.
+ *
+ * Saves and restores errno for more convenient use during error handling.
  */
 static inline void rollback_lock_file(struct lock_file *lk)
 {
+	int save_errno = errno;
 	delete_tempfile(&lk->tempfile);
+	errno = save_errno;
 }
 
 #endif /* LOCKFILE_H */

It doesn't make the code more obvious so what you have is probably
better.

Also, on second glance, this is just moved code.  Still hopefully some
of the above is amusing (and rewrapping would be fine to do during the
move).

[...]
> @@ -970,6 +985,22 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
>  		argv_array_clear(&env);
>  	}
>  
> +	if (use_editor && !no_verify) {
> +		/*
> +		 * Abort the commit if the user supplied an empty commit
> +		 * message in the editor. (Because the commit-msg hook is to be
> +		 * run, we need to check this now, since that hook may change
> +		 * the commit message.)
> +		 */
> +		read_and_clean_commit_message(&sb);
> +		if (message_is_empty(&sb, cleanup_mode) && !allow_empty_message) {
> +			rollback_index_files();
> +			fprintf(stderr, _("Aborting commit due to empty commit message.\n"));
> +			exit(1);

What about the template_untouched case?

Should the two call sites share code?

[...]
> +		}
> +		strbuf_release(&sb);
> +	}
> +
>  	if (!no_verify &&
>  	    run_commit_hook(use_editor, index_file, "commit-msg", git_path_commit_editmsg(), NULL)) {
>  		return 0;

This means we have a little duplication of code: first we check whether
to abort here in prepare_to_commit, and then again after the hook is run
in cmd_commit.

Is there some other code structure that would make things more clear?

cmd_commit also seems to be rather long --- is there some logical way
to split it up that would make the code clearer (as a preparatory or
followup patch)?

In cmd_commit, we spend a while (in number of lines, not actual
running time) to determine parents before deciding whether the user
has chosen to abort by writing an empty message.  Should we perform
that check sooner, closer to the prepare_to_commit call?

> @@ -1608,17 +1639,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  
>  	/* Finally, get the commit message */
>  	strbuf_reset(&sb);
> -	if (strbuf_read_file(&sb, git_path_commit_editmsg(), 0) < 0) {
> -		int saved_errno = errno;
> -		rollback_index_files();
> -		die(_("could not read commit message: %s"), strerror(saved_errno));
> -	}
> -
> -	if (verbose || /* Truncate the message just before the diff, if any. */
> -	    cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
> -		strbuf_setlen(&sb, wt_status_locate_end(sb.buf, sb.len));
> -	if (cleanup_mode != COMMIT_MSG_CLEANUP_NONE)
> -		strbuf_stripspace(&sb, cleanup_mode == COMMIT_MSG_CLEANUP_ALL);
> +	read_and_clean_commit_message(&sb);
>  
>  	if (message_is_empty(&sb, cleanup_mode) && !allow_empty_message) {
>  		rollback_index_files();
> diff --git a/t/t7504-commit-msg-hook.sh b/t/t7504-commit-msg-hook.sh
> index 31b9c6a2c1..b44d6fc43e 100755
> --- a/t/t7504-commit-msg-hook.sh
> +++ b/t/t7504-commit-msg-hook.sh
> @@ -122,6 +122,17 @@ test_expect_success 'with failing hook (editor)' '
>  
>  '
>  
> +test_expect_success 'hook is not run if commit message was empty' '
> +	echo "yet more another" >>file &&
> +	git add file &&
> +	echo >FAKE_MSG &&
> +	test_must_fail env GIT_EDITOR="\"\$FAKE_EDITOR\"" git commit 2>err &&
> +
> +	# Verify that git stopped because it saw an empty message, not because
> +	# the hook exited with non-zero error code
> +	test_i18ngrep "Aborting commit due to empty commit message" err
> +'
> +

Nice, makes sense.

Thanks and hope that helps,
Jonathan

^ permalink raw reply related

* Re: [RFC PATCH 2/3] Documentation/git-rev-list: s/<commit>/<object>/
From: Matthew DeVore @ 2018-12-07 22:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Stefan Beller, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, pclouds
In-Reply-To: <xmqqo9bmyb56.fsf@gitster-ct.c.googlers.com>

On Sun, Oct 21, 2018 at 7:21 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matthew DeVore <matvore@google.com> writes:
>
> >> It is more like "this is a set operation across commits.  We also
> >> show objects that are reachable from the commits in the resulting
> >> set and are not reachable from the commits in the set that were
> >> excluded when --objects option is given".
> >>
> > That would be correct though it wouldn't tell that you can use
> > "--objects ^foo-tree bar-tree."
>
> Yeah, but quite honestly, I consider that it is working by accident,
> not by design, in the current code (iow, you are looking at a
> behaviour of whatever the code happens to do).  "rev-list" is pretty
> much set operation across commits, and anything that deals with a
> non commit-ish given from the command line is an afterthought at
> best, and happenstance in reality.
>
> I do not mean to say that the code must stay that way, though.

I tried fixing the issue of "--objects ^commitobj treeobj" not
properly excluding objects reachable from commitobj, but this ended up
causing t5616-partial-clone.sh to fail. In the test labeled "manual
prefetch of missing objects", we create a clone of srv.bare without
blobs called "pc1", then push some new commits to srv.bare (via a
separate "local" repo), and try to fetch missing blobs with this
command:

$ git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" <observed.oids

Where observed.oids contains all the blobs that were missing. It tells
the remote that it already has the "refs/heads/master" commit, which
means it is excluded. Before, this worked fine, since it didn't mean
the blobs were excluded, only the commit itself. With the fix I
naively put together (and didn't share), because one of the blobs is
reachable from "refs/heads/master" it wouldn't pull it. I guess I can
change it so that items given directly to stdin or argv are always
fetched, though I'm not sure how easy that will be, and I'm not as
interested in fixing this as I once was. I just wanted the
documentation to outline the object-enumeration capabilities. So I'll
send a re-roll which makes much more modest changes to the
documentation.

^ permalink raw reply

* [PATCH] commit: abort before commit-msg if empty message
From: Jonathan Tan @ 2018-12-07 22:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When a user runs "git commit" without specifying a message, an editor
appears with advice:

    Please enter the commit message for your changes. Lines starting
    with '#' will be ignored, and an empty message aborts the commit.

However, if the user supplies an empty message and has a commit-msg hook
which updates the message to be non-empty, the commit proceeds to occur,
despite what the advice states.

Teach commit to also check the emptiness of the commit message before it
invokes the commit-msg hook if an editor is used and if no_verify is not
set (that is, commit-msg is not suppressed). This makes the advice true.

(The implementation in this commit reads the commit message twice even
if there is no commit-msg hook. I think that this is fine, since this is
for interactive use - an alternative would be to plumb information about
the absence of the hook from run_hook_ve() upwards, which seems more
complicated.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
This was noticed with the commit-msg hook that comes with Gerrit, which
basically just calls git interpret-trailers to add a Change-Id trailer.
---
 builtin/commit.c           | 43 ++++++++++++++++++++++++++++----------
 t/t7504-commit-msg-hook.sh | 11 ++++++++++
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index c021b119bb..3681a59af8 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -652,6 +652,21 @@ static void adjust_comment_line_char(const struct strbuf *sb)
 	comment_line_char = *p;
 }
 
+static void read_and_clean_commit_message(struct strbuf *sb)
+{
+	if (strbuf_read_file(sb, git_path_commit_editmsg(), 0) < 0) {
+		int saved_errno = errno;
+		rollback_index_files();
+		die(_("could not read commit message: %s"), strerror(saved_errno));
+	}
+
+	if (verbose || /* Truncate the message just before the diff, if any. */
+	    cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
+		strbuf_setlen(sb, wt_status_locate_end(sb->buf, sb->len));
+	if (cleanup_mode != COMMIT_MSG_CLEANUP_NONE)
+		strbuf_stripspace(sb, cleanup_mode == COMMIT_MSG_CLEANUP_ALL);
+}
+
 static int prepare_to_commit(const char *index_file, const char *prefix,
 			     struct commit *current_head,
 			     struct wt_status *s,
@@ -970,6 +985,22 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		argv_array_clear(&env);
 	}
 
+	if (use_editor && !no_verify) {
+		/*
+		 * Abort the commit if the user supplied an empty commit
+		 * message in the editor. (Because the commit-msg hook is to be
+		 * run, we need to check this now, since that hook may change
+		 * the commit message.)
+		 */
+		read_and_clean_commit_message(&sb);
+		if (message_is_empty(&sb, cleanup_mode) && !allow_empty_message) {
+			rollback_index_files();
+			fprintf(stderr, _("Aborting commit due to empty commit message.\n"));
+			exit(1);
+		}
+		strbuf_release(&sb);
+	}
+
 	if (!no_verify &&
 	    run_commit_hook(use_editor, index_file, "commit-msg", git_path_commit_editmsg(), NULL)) {
 		return 0;
@@ -1608,17 +1639,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 
 	/* Finally, get the commit message */
 	strbuf_reset(&sb);
-	if (strbuf_read_file(&sb, git_path_commit_editmsg(), 0) < 0) {
-		int saved_errno = errno;
-		rollback_index_files();
-		die(_("could not read commit message: %s"), strerror(saved_errno));
-	}
-
-	if (verbose || /* Truncate the message just before the diff, if any. */
-	    cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
-		strbuf_setlen(&sb, wt_status_locate_end(sb.buf, sb.len));
-	if (cleanup_mode != COMMIT_MSG_CLEANUP_NONE)
-		strbuf_stripspace(&sb, cleanup_mode == COMMIT_MSG_CLEANUP_ALL);
+	read_and_clean_commit_message(&sb);
 
 	if (message_is_empty(&sb, cleanup_mode) && !allow_empty_message) {
 		rollback_index_files();
diff --git a/t/t7504-commit-msg-hook.sh b/t/t7504-commit-msg-hook.sh
index 31b9c6a2c1..b44d6fc43e 100755
--- a/t/t7504-commit-msg-hook.sh
+++ b/t/t7504-commit-msg-hook.sh
@@ -122,6 +122,17 @@ test_expect_success 'with failing hook (editor)' '
 
 '
 
+test_expect_success 'hook is not run if commit message was empty' '
+	echo "yet more another" >>file &&
+	git add file &&
+	echo >FAKE_MSG &&
+	test_must_fail env GIT_EDITOR="\"\$FAKE_EDITOR\"" git commit 2>err &&
+
+	# Verify that git stopped because it saw an empty message, not because
+	# the hook exited with non-zero error code
+	test_i18ngrep "Aborting commit due to empty commit message" err
+'
+
 test_expect_success '--no-verify with failing hook' '
 
 	echo "stuff" >> file &&
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related

* Re: enhancement: support for author.email and author.name in "git config"
From: Jonathan Nieder @ 2018-12-07 22:44 UTC (permalink / raw)
  To: William Hubbs
  Cc: Johannes Schindelin, Martin Ågren, Git Mailing List,
	chutzpah
In-Reply-To: <20181207222147.GA22590@whubbs1.gaikai.biz>

William Hubbs wrote:
> On Thu, Dec 06, 2018 at 11:20:07PM +0100, Johannes Schindelin wrote:

>> There *is* a way to get what you want that is super easy and will
>> definitely work: if you sit down and do it ;-)
>>
>> Please let us know if you need any additional information before you can
>> start.
>
> Which branch should I work off of in the repo?

"master".

Also, please make sure the documentation (e.g. in
Documentation/config/user.txt) describes when a user would want to set
this option.

See also:
- Documentation/SubmittingPatches
- the DISCUSSION section of "git help format-patch"
- [1]
- https://www.kernel.org/pub/software/scm/git/docs/user-manual.html#hacking-git

Thanks,
Jonathan

[1] https://github.com/gitgitgadget/gitgitgadget#a-bot-to-serve-as-glue-between-github-pull-requests-and-the-git-mailing-list

^ permalink raw reply

* Re: RFE: git-patch-id should handle patches without leading "diff"
From: Jonathan Nieder @ 2018-12-07 22:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Konstantin Ryabitsev, git
In-Reply-To: <87tvjpx9fy.fsf@evledraar.gmail.com>

Ævar Arnfjörð Bjarmason wrote:
> On Fri, Dec 07 2018, Jonathan Nieder wrote:

>> The patch-id appears to only care about the diff text, so it should be
>> able to handle this.  So if we have a better heuristic for where the
>> diff starts, it would be good to use it.
>
> No, the patch-id doesn't just care about the diff, it cares about the
> context before the diff too.

Sorry, I did a bad job of communicating.  When I said "diff text", I was
including context.

[...]
> Observe that the diff --git line matters, we hash it:
>
>     $ git diff-tree -p HEAD~.. | git patch-id
>     5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
>     $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
>     5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
>     $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
>     4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

Oh, hm.  That's unfortunate.

[...]
> So it seems most sensible to me if this is going to be supported that we
> go a bit beyond the call of duty and fake up the start of it, namely:
>
>     --- a/arch/x86/kernel/process.c
>     +++ b/arch/x86/kernel/process.c
>
> To be:
>
>     diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>     --- a/arch/x86/kernel/process.c
>     +++ b/arch/x86/kernel/process.c

Right.  We may want to handle diff.mnemonicPrefix as well.

Jonathan

^ permalink raw reply

* [PATCH v3 3/3] Makefile: correct example fuzz build
From: Josh Steadmon @ 2018-12-07 22:27 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, avarab, peff
In-Reply-To: <cover.1544221121.git.steadmon@google.com>

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6b72f37c29..bbcfc2bc9f 100644
--- a/Makefile
+++ b/Makefile
@@ -3104,7 +3104,7 @@ cover_db_html: cover_db
 # An example command to build against libFuzzer from LLVM 4.0.0:
 #
 # make CC=clang CXX=clang++ \
-#      FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
+#      CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
 #      LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \
 #      fuzz-all
 #
-- 
2.20.0.rc2.12.g4c11c11dec


^ permalink raw reply related

* [PATCH v3 2/3] commit-graph: fix buffer read-overflow
From: Josh Steadmon @ 2018-12-07 22:27 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, avarab, peff
In-Reply-To: <cover.1544221121.git.steadmon@google.com>

fuzz-commit-graph identified a case where Git will read past the end of
a buffer containing a commit graph if the graph's header has an
incorrect chunk count. A simple bounds check in parse_commit_graph()
prevents this.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 commit-graph.c          | 14 ++++++++++++--
 t/t5318-commit-graph.sh | 15 +++++++++++++--
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 07dd410f3c..836d65a1d3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -165,10 +165,20 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	last_chunk_offset = 8;
 	chunk_lookup = data + 8;
 	for (i = 0; i < graph->num_chunks; i++) {
-		uint32_t chunk_id = get_be32(chunk_lookup + 0);
-		uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+		uint32_t chunk_id;
+		uint64_t chunk_offset;
 		int chunk_repeated = 0;
 
+		if (data + graph_size - chunk_lookup <
+		    GRAPH_CHUNKLOOKUP_WIDTH) {
+			error(_("chunk lookup table entry missing; graph file may be incomplete"));
+			free(graph);
+			return NULL;
+		}
+
+		chunk_id = get_be32(chunk_lookup + 0);
+		chunk_offset = get_be64(chunk_lookup + 4);
+
 		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
 		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5fe21db99f..5b6b44b78e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -366,24 +366,30 @@ GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
 GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
 
-# usage: corrupt_graph_and_verify <position> <data> <string>
+# usage: corrupt_graph_and_verify <position> <data> <string> [<zero_pos>]
 # Manipulates the commit-graph file at the position
-# by inserting the data, then runs 'git commit-graph verify'
+# by inserting the data, optionally zeroing the file
+# starting at <zero_pos>, then runs 'git commit-graph verify'
 # and places the output in the file 'err'. Test 'err' for
 # the given string.
 corrupt_graph_and_verify() {
 	pos=$1
 	data="${2:-\0}"
 	grepstr=$3
+	orig_size=$(stat --format=%s $objdir/info/commit-graph)
+	zero_pos=${4:-${orig_size}}
 	cd "$TRASH_DIRECTORY/full" &&
 	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
 	cp $objdir/info/commit-graph commit-graph-backup &&
 	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
+	truncate --size=$zero_pos $objdir/info/commit-graph &&
+	truncate --size=$orig_size $objdir/info/commit-graph &&
 	test_must_fail git commit-graph verify 2>test_err &&
 	grep -v "^+" test_err >err
 	test_i18ngrep "$grepstr" err
 }
 
+
 test_expect_success 'detect bad signature' '
 	corrupt_graph_and_verify 0 "\0" \
 		"graph signature"
@@ -484,6 +490,11 @@ test_expect_success 'detect invalid checksum hash' '
 		"incorrect checksum"
 '
 
+test_expect_success 'detect incorrect chunk count' '
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
+		"chunk lookup table entry missing" $GRAPH_CHUNK_LOOKUP_OFFSET
+'
+
 test_expect_success 'git fsck (checks commit-graph)' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git fsck &&
-- 
2.20.0.rc2.12.g4c11c11dec


^ permalink raw reply related

* [PATCH v3 1/3] commit-graph, fuzz: Add fuzzer for commit-graph
From: Josh Steadmon @ 2018-12-07 22:27 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, avarab, peff
In-Reply-To: <cover.1544221121.git.steadmon@google.com>

Breaks load_commit_graph_one() into a new function,
parse_commit_graph(). The latter function operates on arbitrary buffers,
which makes it suitable as a fuzzing target. Since parse_commit_graph()
is only called by load_commit_graph_one() (and the fuzzer described
below), we omit error messages that would be duplicated by the caller.

Adds fuzz-commit-graph.c, which provides a fuzzing entry point
compatible with libFuzzer (and possibly other fuzzing engines).

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 .gitignore          |  1 +
 Makefile            |  1 +
 commit-graph.c      | 53 ++++++++++++++++++++++++++++++---------------
 commit-graph.h      |  3 +++
 fuzz-commit-graph.c | 16 ++++++++++++++
 5 files changed, 57 insertions(+), 17 deletions(-)
 create mode 100644 fuzz-commit-graph.c

diff --git a/.gitignore b/.gitignore
index 0d77ea5894..8bcf153ed9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
+/fuzz-commit-graph
 /fuzz_corpora
 /fuzz-pack-headers
 /fuzz-pack-idx
diff --git a/Makefile b/Makefile
index 1a44c811aa..6b72f37c29 100644
--- a/Makefile
+++ b/Makefile
@@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 
 ETAGS_TARGET = TAGS
 
+FUZZ_OBJS += fuzz-commit-graph.o
 FUZZ_OBJS += fuzz-pack-headers.o
 FUZZ_OBJS += fuzz-pack-idx.o
 
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..07dd410f3c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -84,16 +84,10 @@ static int commit_graph_compatible(struct repository *r)
 struct commit_graph *load_commit_graph_one(const char *graph_file)
 {
 	void *graph_map;
-	const unsigned char *data, *chunk_lookup;
 	size_t graph_size;
 	struct stat st;
-	uint32_t i;
-	struct commit_graph *graph;
+	struct commit_graph *ret;
 	int fd = git_open(graph_file);
-	uint64_t last_chunk_offset;
-	uint32_t last_chunk_id;
-	uint32_t graph_signature;
-	unsigned char graph_version, hash_version;
 
 	if (fd < 0)
 		return NULL;
@@ -108,27 +102,55 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 		die(_("graph file %s is too small"), graph_file);
 	}
 	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	ret = parse_commit_graph(graph_map, fd, graph_size);
+
+	if (!ret) {
+		munmap(graph_map, graph_size);
+		close(fd);
+		exit(1);
+	}
+
+	return ret;
+}
+
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+					size_t graph_size)
+{
+	const unsigned char *data, *chunk_lookup;
+	uint32_t i;
+	struct commit_graph *graph;
+	uint64_t last_chunk_offset;
+	uint32_t last_chunk_id;
+	uint32_t graph_signature;
+	unsigned char graph_version, hash_version;
+
+	if (!graph_map)
+		return NULL;
+
+	if (graph_size < GRAPH_MIN_SIZE)
+		return NULL;
+
 	data = (const unsigned char *)graph_map;
 
 	graph_signature = get_be32(data);
 	if (graph_signature != GRAPH_SIGNATURE) {
 		error(_("graph signature %X does not match signature %X"),
 		      graph_signature, GRAPH_SIGNATURE);
-		goto cleanup_fail;
+		return NULL;
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
 	if (graph_version != GRAPH_VERSION) {
 		error(_("graph version %X does not match version %X"),
 		      graph_version, GRAPH_VERSION);
-		goto cleanup_fail;
+		return NULL;
 	}
 
 	hash_version = *(unsigned char*)(data + 5);
 	if (hash_version != GRAPH_OID_VERSION) {
 		error(_("hash version %X does not match version %X"),
 		      hash_version, GRAPH_OID_VERSION);
-		goto cleanup_fail;
+		return NULL;
 	}
 
 	graph = alloc_commit_graph();
@@ -152,7 +174,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
 			error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
 			      (uint32_t)chunk_offset);
-			goto cleanup_fail;
+			free(graph);
+			return NULL;
 		}
 
 		switch (chunk_id) {
@@ -187,7 +210,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 
 		if (chunk_repeated) {
 			error(_("chunk id %08x appears multiple times"), chunk_id);
-			goto cleanup_fail;
+			free(graph);
+			return NULL;
 		}
 
 		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
@@ -201,11 +225,6 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	return graph;
-
-cleanup_fail:
-	munmap(graph_map, graph_size);
-	close(fd);
-	exit(1);
 }
 
 static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
diff --git a/commit-graph.h b/commit-graph.h
index 9db40b4d3a..813e7c19f1 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -54,6 +54,9 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+					size_t graph_size);
+
 /*
  * Return 1 if and only if the repository has a commit-graph
  * file and generation numbers are computed in that file.
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
new file mode 100644
index 0000000000..cf790c9d04
--- /dev/null
+++ b/fuzz-commit-graph.c
@@ -0,0 +1,16 @@
+#include "commit-graph.h"
+
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+					size_t graph_size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+	struct commit_graph *g;
+
+	g = parse_commit_graph((void *)data, -1, size);
+	free(g);
+
+	return 0;
+}
-- 
2.20.0.rc2.12.g4c11c11dec


^ permalink raw reply related

* [PATCH v3 0/3] Add commit-graph fuzzer and fix buffer overflow
From: Josh Steadmon @ 2018-12-07 22:27 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, avarab, peff
In-Reply-To: <cover.1544127439.git.steadmon@google.com>

Ad a new fuzz test for the commit graph and fix a buffer read-overflow
that it discovered. Additionally, fix the Makefile instructions for
building fuzzers.

Changes since V2:
  * Avoid pointer arithmetic overflow when checking the graph's chunk
    count.
  * Merge the corrupt_graph_and_verify and
    corrupt_and_zero_graph_then_verify test functions.

Josh Steadmon (3):
  commit-graph, fuzz: Add fuzzer for commit-graph
  commit-graph: fix buffer read-overflow
  Makefile: correct example fuzz build

 .gitignore              |  1 +
 Makefile                |  3 +-
 commit-graph.c          | 67 +++++++++++++++++++++++++++++------------
 commit-graph.h          |  3 ++
 fuzz-commit-graph.c     | 16 ++++++++++
 t/t5318-commit-graph.sh | 15 +++++++--
 6 files changed, 83 insertions(+), 22 deletions(-)
 create mode 100644 fuzz-commit-graph.c

Range-diff against v2:
1:  af45c2337f ! 1:  675d58ecea commit-graph: fix buffer read-overflow
    @@ -22,8 +22,8 @@
     +		uint64_t chunk_offset;
      		int chunk_repeated = 0;
      
    -+		if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH >
    -+		    data + graph_size) {
    ++		if (data + graph_size - chunk_lookup <
    ++		    GRAPH_CHUNKLOOKUP_WIDTH) {
     +			error(_("chunk lookup table entry missing; graph file may be incomplete"));
     +			free(graph);
     +			return NULL;
    @@ -40,31 +40,34 @@
      --- a/t/t5318-commit-graph.sh
      +++ b/t/t5318-commit-graph.sh
     @@
    - 	test_i18ngrep "$grepstr" err
    - }
    + GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
    + GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
      
    -+
    -+# usage: corrupt_and_zero_graph_then_verify <corrupt_position> <data> <zero_position> <string>
    -+# Manipulates the commit-graph file at <corrupt_position> by inserting the data,
    -+# then zeros the file starting at <zero_position>. Finally, runs
    -+# 'git commit-graph verify' and places the output in the file 'err'. Tests 'err'
    -+# for the given string.
    -+corrupt_and_zero_graph_then_verify() {
    -+	corrupt_pos=$1
    -+	data="${2:-\0}"
    -+	zero_pos=$3
    -+	grepstr=$4
    +-# usage: corrupt_graph_and_verify <position> <data> <string>
    ++# usage: corrupt_graph_and_verify <position> <data> <string> [<zero_pos>]
    + # Manipulates the commit-graph file at the position
    +-# by inserting the data, then runs 'git commit-graph verify'
    ++# by inserting the data, optionally zeroing the file
    ++# starting at <zero_pos>, then runs 'git commit-graph verify'
    + # and places the output in the file 'err'. Test 'err' for
    + # the given string.
    + corrupt_graph_and_verify() {
    + 	pos=$1
    + 	data="${2:-\0}"
    + 	grepstr=$3
     +	orig_size=$(stat --format=%s $objdir/info/commit-graph)
    -+	cd "$TRASH_DIRECTORY/full" &&
    -+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
    -+	cp $objdir/info/commit-graph commit-graph-backup &&
    -+	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$corrupt_pos" conv=notrunc &&
    ++	zero_pos=${4:-${orig_size}}
    + 	cd "$TRASH_DIRECTORY/full" &&
    + 	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
    + 	cp $objdir/info/commit-graph commit-graph-backup &&
    + 	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
     +	truncate --size=$zero_pos $objdir/info/commit-graph &&
     +	truncate --size=$orig_size $objdir/info/commit-graph &&
    -+	test_must_fail git commit-graph verify 2>test_err &&
    -+	grep -v "^+" test_err >err &&
    -+	test_i18ngrep "$grepstr" err
    -+}
    + 	test_must_fail git commit-graph verify 2>test_err &&
    + 	grep -v "^+" test_err >err
    + 	test_i18ngrep "$grepstr" err
    + }
    + 
     +
      test_expect_success 'detect bad signature' '
      	corrupt_graph_and_verify 0 "\0" \
    @@ -73,9 +76,9 @@
      		"incorrect checksum"
      '
      
    -+test_expect_success 'detect truncated graph' '
    -+	corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
    -+		$GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing"
    ++test_expect_success 'detect incorrect chunk count' '
    ++	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
    ++		"chunk lookup table entry missing" $GRAPH_CHUNK_LOOKUP_OFFSET
     +'
     +
      test_expect_success 'git fsck (checks commit-graph)' '
2:  7519fc76df = 2:  06a32bfe8b Makefile: correct example fuzz build
-- 
2.20.0.rc2.12.g4c11c11dec


^ permalink raw reply

* Re: RFE: git-patch-id should handle patches without leading "diff"
From: Ævar Arnfjörð Bjarmason @ 2018-12-07 22:23 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Konstantin Ryabitsev, git
In-Reply-To: <20181207220116.GB73340@google.com>


On Fri, Dec 07 2018, Jonathan Nieder wrote:

> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting.  As Ævar mentioned, the relevant code is
>
> 		/* Ignore commit comments */
> 		if (!patchlen && !starts_with(line, "diff "))
> 			continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this.  So if we have a better heuristic for where the
> diff starts, it would be good to use it.

No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too.

See this patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    index 9183875dad..dd8abe9185 100644
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs,
                    break;
            case REF_TYPE_OTHER_PSEUDOREF:
            case REF_TYPE_MAIN_PSEUDOREF:
    -               return files_reflog_path_other_worktrees(refs, sb, refname);
    +               files_reflog_path_other_worktrees(refs, sb, refname);
    +               break;
            case REF_TYPE_NORMAL:
                    strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
                    break;

Observe that the diff --git line matters, we hash it:

    $ git diff-tree -p HEAD~.. | git patch-id
    5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
    5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

The thing it doesn't care about is the "index" between the "diff" and
patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

We also care about the +++ and --- lines:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
    56985c2c38cce6079de2690082e1770a8e81214c 0000000000000000000000000000000000000000

Then we normalize the @@ line, e.g.:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000


There's other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:
    
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs,
                    break;
            case REF_TYPE_OTHER_PSEUDOREF:
            case REF_TYPE_MAIN_PSEUDOREF:
    -               return files_reflog_path_other_worktrees(refs, sb, refname);
    +               files_reflog_path_other_worktrees(refs, sb, refname);
    +               break;
            case REF_TYPE_NORMAL:
                    strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
                    break;

Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.

So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:

    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

To be:

    diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

It'll make the state machine a bit more complex, but IMO it would suck
more if we generate a different hash depending on the tool generating
the diff. OTOH the "diff --git" line was never there, and it *does*
matter, so should we be faking it up? Maybe not, bah!

> "git apply" uses apply.c::find_header, which is more permissive.
> Maybe it would be possible to unify these somehow.  (I haven't looked
> closely enough to tell how painful that would be.)
>
> Thanks and hope that helps,
> Jonathan

^ permalink raw reply

* Re: [PATCH] mailmap: update brandon williams's email address
From: Jonathan Nieder @ 2018-12-07 22:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: bwilliamseng, git, bwilliams.eng
In-Reply-To: <CAGZ79kYrgpZDqAhg8c11V_qJTCzzw4-qrVN2z_Y_OAeCbWU6dQ@mail.gmail.com>

Hi,

Stefan Beller wrote:

> What would be more of interest is why we'd be interested in this patch
> as there is no commit/patch sent by Brandon with this email in gits history.

I think there's an implicit assumption in this question that isn't
spelled out.  Do I understand correctly that you're saying the main
purpose of .mailmap is to figure out whether two commits are by the
same author?

My own uses of .mailmap primarily have a different purpose: to find
out the preferred contact address for the author of a given commit.

Thanks,
Jonathan

^ permalink raw reply

* Re: enhancement: support for author.email and author.name in "git config"
From: William Hubbs @ 2018-12-07 22:21 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Martin Ågren, Git Mailing List, chutzpah
In-Reply-To: <nycvar.QRO.7.76.6.1812062318570.41@tvgsbejvaqbjf.bet>

Hi Johannes,

On Thu, Dec 06, 2018 at 11:20:07PM +0100, Johannes Schindelin wrote:
> Hi William,
> 
>[...]
> 
> There *is* a way to get what you want that is super easy and will
> definitely work: if you sit down and do it ;-)
> 
> Please let us know if you need any additional information before you can
> start.

Which branch should I work off of in the repo?

William

^ permalink raw reply

* Re: [PATCH v2 3/3] Refactor mingw_cygwin_offset_1st_component()
From: Johannes Schindelin @ 2018-12-07 22:18 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, svnpenn
In-Reply-To: <20181207170500.9078-1-tboegi@web.de>

Hi Torsten,

On Fri, 7 Dec 2018, tboegi@web.de wrote:

> diff --git a/compat/mingw-cygwin.c b/compat/mingw-cygwin.c
> index 5552c3ac20..c379a72775 100644
> --- a/compat/mingw-cygwin.c
> +++ b/compat/mingw-cygwin.c
> @@ -10,10 +10,8 @@ size_t mingw_cygwin_skip_dos_drive_prefix(char **path)
>  size_t mingw_cygwin_offset_1st_component(const char *path)
>  {
>  	char *pos = (char *)path;
> -
> -	/* unc paths */

This comment is still useful (and now even more correct), and should stay.

> -	if (!skip_dos_drive_prefix(&pos) &&
> -			is_dir_sep(pos[0]) && is_dir_sep(pos[1])) {
> +	if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) {
> +		/* unc path */
>  		/* skip server name */
>  		pos = strpbrk(pos + 2, "\\/");
>  		if (!pos)
> @@ -22,7 +20,8 @@ size_t mingw_cygwin_offset_1st_component(const char *path)
>  		do {
>  			pos++;
>  		} while (*pos && !is_dir_sep(*pos));
> +	} else {
> +		skip_dos_drive_prefix(&pos);
>  	}
> -

Why remove this empty line? It structures the code quite nicely.

The rest looks correct to me,
Johannes

>  	return pos + is_dir_sep(*pos) - path;
>  }
> -- 
> 2.19.0.271.gfe8321ec05
> 
> 

^ permalink raw reply

* Re: [PATCH] mailmap: update brandon williams's email address
From: Stefan Beller @ 2018-12-07 22:11 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: bwilliamseng, git, bwilliams.eng
In-Reply-To: <20181207214013.GA73340@google.com>

On Fri, Dec 7, 2018 at 1:40 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Brandon Williams wrote:
>
> > Signed-off-by: Brandon Williams <bwilliams.eng@gmail.com>
> > ---
> >  .mailmap | 1 +
> >  1 file changed, 1 insertion(+)
>
> I can confirm that this is indeed the same person.

What would be more of interest is why we'd be interested in this patch
as there is no commit/patch sent by Brandon with this email in gits history.

Is that so you get cc'd on your private address and can follow
things you worked on without being subscribed to the mailing list?
(I'd be interested to see the use case in the commit message;)

Thanks,
Stefan

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox