Git development
 help / color / mirror / Atom feed
* Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
From: Stefan Beller @ 2018-12-06 21:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan
In-Reply-To: <xmqqa7lkeki3.fsf@gitster-ct.c.googlers.com>

On Tue, Dec 4, 2018 at 7:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
> > with all feedback addressed. As it took some time, I'll send it
> > without range-diff, but would ask for full review.
>
> Is that a "resend" or reroll/update (or whatever word that does not
> imply "just sending the same thing again")?

As you noticed, it is an actual update. I started to use resend
as DScho seems very unhappy about the word reroll claiming we'd
be the only Software community that uses the term reroll for
an iteration of a change.

I see how resend could sound like retransmission without change.


>                         child_process_init(cp);
>      -                  cp->dir = strbuf_detach(&submodule_path, NULL);
>     -                   prepare_submodule_repo_env(&cp->env_array);
>      +                  cp->dir = xstrdup(repo->worktree);
>     +                   prepare_submodule_repo_env(&cp->env_array);
>
> Hmph, I offhand do not see there would be any difference if you
> assigned to cp->dir before or after preparing the repo env, but is
> there a reason these two must be done in this updated order that I
> am missing?  Very similar changes appear multiple times in this
> range-diff.

Jonathan Tan asked for it to be "diff friendly". This -of course- is
range-diff unfriendly.

> [...]

you seem to be OK with a lot of the changes, I did not find an
actionable suggestion.

Thanks for still queuing topics during -rc time,
Stefan

^ permalink raw reply

* Re: [wishlist] git submodule update --reset-hard
From: Stefan Beller @ 2018-12-06 21:55 UTC (permalink / raw)
  To: Yaroslav Halchenko; +Cc: git
In-Reply-To: <20181206212459.GN4633@hopa.kiewit.dartmouth.edu>

On Thu, Dec 6, 2018 at 1:25 PM Yaroslav Halchenko <yoh@onerussian.com> wrote:
>
>
> On Thu, 06 Dec 2018, Stefan Beller wrote:
>
> > On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko <yoh@onerussian.com> wrote:
>
> > > Dear Git Gurus,
>
> > > I wondered what would be your take on my wishlist request to add
> > > --reset-hard option, which would be very similar to regular "update" which
> > > checks out necessary commit, but I want it to remain in the branch.
>
> > What if the branch differs from the sha1 recorded in the superproject?
>
> git reset --hard  itself is an operation which should be done with some
> level of competence in doing "the right thing" by calling it.  You
> can hop branches even in current (without any submodules in question)
> repository with it and cause as much chaos as you desire.

Right.

git reset --hard would the branch (as well as the working tree) to the
given sha1, which is confusing as submodules get involved.

The Right Thing as of now is the sha1 as found in the
superprojects gitlink. But as that can be different from any branch
in the submodule, we'd rather detach the HEAD to make it
deterministic.

There was a proposal to "re-attach HEAD" in the submodule, i.e.
if the branch branch points at the same commit, we don't need
a detached HEAD, but could go with the branch instead.

> If desired though, a number of prevention mechanisms could be in place (but
> would require option(s) to overcome) to allow submodule to be reset --hard'ed
> only when some conditions met (e.g. only to the commit which is among parent
> commits path of the current branch).  This way wild hops would be prevented,
> although you might still end up in some feature branch.  But since "reset
> --hard" itself doesn't have any safe-guards, I do not really think they should
> be implemented here either.

So are you looking for
a) "stay on submodule branch (i.e. HEAD still points at $branch), and
reset --hard"
    such that the submodule has a clean index and at that $branch or
b) "stay on submodule branch (i.e. HEAD still points at $branch), but $branch is
   set to the gitlink from the superproject, and then a reset --hard
will have the worktree
   set to it as well.

(a) is what the referenced submodule.repoLike option implements.

I'd understand the desire for (b) as well, as it is a "real" hard reset on
the superproject level, without detaching branches.

> >   git reset --hard --recurse-submodules HEAD

> it does indeed some trick(s) but not all seems to be the ones I desire:
>
> 1. Seems to migrate submodule's .git directories into the top level
> .git/modules

Ah yes, that happens too. This will help once you want to git-rm
a submodule and checkout states before and after.

> > undesirable in the sense of still having local changes (that is what
> > the above reset with `--recurse` would fix) or changed the branch
> > state? (i.e. is detached but was on a branch before?)
>
> right -- I meant the local changes and indeed reset --recurse-submodules
> indeed seems to recurse nicely.  Then the undesired effect remaining only
> the detached HEAD

For that we may want to revive discussions in
https://public-inbox.org/git/20170501180058.8063-5-sbeller@google.com/


> > >   git submodule update --recursive
>
> > > I would end up in the detached HEADs within submodules.
>
> > > What I want is to retain current branch they are at (or may be possible
> > > "were in"? reflog records might have that information)
>
> > So something like
>
> >   git submodule foreach --recursive git reset --hard
>
> > ?
>
> not quite  -- this would just kill all local changes within each submodule, not
> to reset it to the desired state, which wouldn't be specified in such
> invocation, and is only known to the repo containing it

With this answer it sounds like you'd want (b) from above.

> > You may be interested in
> > https://public-inbox.org/git/20180927221603.148025-1-sbeller@google.com/
> > which introduces a switch `submodule.repoLike [ = true]`, which
> > when set would not detach HEAD in submodules.
>
> Thanks! looks interesting -- was there more discussion/activity beyond those 5
> posts in the thread?

Unfortunately there was not.

> This feature might indeed come handy but if I got it right, it is somewhat
> complimentary to just having submodule update --reset-hard .  E.g.  submodules
> might be in different branches (if I am not tracking based on branch names), so
> I would not want a recursive checkout with -b|-B.  But we would indeed benefit
> from such functionality, since this difficulty of managing branches of
> submodules I think would be elevated with it! (e.g. in one use case we probably
> will end up with a few thousands of submodules, and at least 3 branches in each
> which would need to be in sync, and typically you wouldn't want different
> branches to be checked out in different submodules)
>
> > Can you say more about the first question above:
> > Would you typically have situations where the
> > submodule branch is out of sync with the superproject
> > and how do you deal with that?
>
> typically I do not have anything out of sync.  My primary use-case is to
> "cancel" recent changes in the entire tree of repositories.  I guess for
> my use case, instead of needing two commands
>
>    git reset --hard PREVIOUSPOINT
>    git submodule update --reset--hard --recursive
>
> I wish there was just one
>
>    git reset --hard --recursive PREVIOUSPOINT

Maybe this could learn options like

  git reset --hard --recursive=hard,keep-branch PREVIOUSPOINT

which then could be put into options like

  git config reset.recurseSubmodules  hard,keep-branch &&
  # maybe not needed, depending on the exact meaning
  # of reset.recurseSubmodules:
  git config submodule.recurse

and then

  git reset --hard PREVIOUS

would do what you'd desire.

> but I felt that   submodule update   might be a better starting point
> since it already  provides different modes for update.  If I was even greedier,
> I would have asked for
>
>    git revert --recursive <commit>...
>    git rebase --recursive [-i] ...
>
> which I also frequently desire (could elaborate on the use cases etc).

These would be nice to have. It would be nice if you'd elaborate on the
use cases for future reference in the mailing list archive. :-)

>
> NB or --recurse-submodules to avoid confusion with recursive merge
> strategy?

... and sometimes recursing in the file system, c.f. `ls-tree -r`.

^ permalink raw reply

* [PATCH] terminology tweak: prune -> path limiting
From: Matthew DeVore @ 2018-12-06 21:33 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, jrn, matvore, dstolee, gitster, pclouds, peff

In the codebase, "prune" is a highly overloaded term, and it caused me a
lot of trouble to figure out what it meant when it was used in the
context of path limiting. Stop using the word "prune" when we really
mean "path limiting."

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/technical/api-history-graph.txt |  5 +-
 builtin/add.c                                 |  4 +-
 builtin/diff.c                                |  8 +-
 builtin/fast-export.c                         |  2 +-
 builtin/log.c                                 |  2 +-
 builtin/rev-list.c                            |  2 +-
 diff-lib.c                                    |  6 +-
 revision.c                                    | 90 ++++++++++---------
 revision.h                                    |  4 +-
 t/t7811-grep-open.sh                          |  2 +-
 tree-walk.c                                   | 10 +--
 wt-status.c                                   |  4 +-
 12 files changed, 72 insertions(+), 67 deletions(-)

diff --git a/Documentation/technical/api-history-graph.txt b/Documentation/technical/api-history-graph.txt
index d0d1707c8c..f9a100f88c 100644
--- a/Documentation/technical/api-history-graph.txt
+++ b/Documentation/technical/api-history-graph.txt
@@ -100,8 +100,9 @@ Limitations
   on all parents of that commit.  Parents must not be skipped, or the graph
   output will appear incorrect.
 +
-`graph_update()` may be used on a pruned set of commits only if the parent list
-has been rewritten so as to include only ancestors from the pruned set.
+`graph_update()` may be used on a pruned (e.g. path-limited) set of commits only
+if the parent list has been rewritten so as to include only ancestors from the
+pruned set.
 
 * The graph API does not currently support reverse commit ordering.  In
   order to implement reverse ordering, the graphing API needs an
diff --git a/builtin/add.c b/builtin/add.c
index f65c172299..4abd8ebba8 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -113,14 +113,14 @@ int add_files_to_cache(const char *prefix,
 	repo_init_revisions(the_repository, &rev, prefix);
 	setup_revisions(0, NULL, &rev, NULL);
 	if (pathspec)
-		copy_pathspec(&rev.prune_data, pathspec);
+		copy_pathspec(&rev.path_limits, pathspec);
 	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = update_callback;
 	rev.diffopt.format_callback_data = &data;
 	rev.diffopt.flags.override_submodule_config = 1;
 	rev.max_count = 0; /* do not compare unmerged paths with stage #2 */
 	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
-	clear_pathspec(&rev.prune_data);
+	clear_pathspec(&rev.path_limits);
 	return !!data.add_errors;
 }
 
diff --git a/builtin/diff.c b/builtin/diff.c
index f0393bba23..9010b3228a 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -74,8 +74,8 @@ static int builtin_diff_b_f(struct rev_info *revs,
 	if (argc > 1)
 		usage(builtin_diff_usage);
 
-	GUARD_PATHSPEC(&revs->prune_data, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
-	path = revs->prune_data.items[0].match;
+	GUARD_PATHSPEC(&revs->path_limits, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
+	path = revs->path_limits.items[0].match;
 
 	if (lstat(path, &st))
 		die_errno(_("failed to stat '%s'"), path);
@@ -421,8 +421,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 			die(_("unhandled object '%s' given."), name);
 		}
 	}
-	if (rev.prune_data.nr)
-		paths += rev.prune_data.nr;
+	if (rev.path_limits.nr)
+		paths += rev.path_limits.nr;
 
 	/*
 	 * Now, do the arguments look reasonable?
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9e283482ef..6a675b5737 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -1143,7 +1143,7 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		import_marks(import_filename);
 	lastimportid = last_idnum;
 
-	if (import_filename && revs.prune_data.nr)
+	if (import_filename && revs.path_limits.nr)
 		full_tree = 1;
 
 	get_tags_and_duplicates(&revs.cmdline);
diff --git a/builtin/log.c b/builtin/log.c
index 45aa376a59..f8554d7fa1 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -690,7 +690,7 @@ static void log_setup_revisions_tweak(struct rev_info *rev,
 				      struct setup_revision_opt *opt)
 {
 	if (rev->diffopt.flags.default_follow_renames &&
-	    rev->prune_data.nr == 1)
+	    rev->path_limits.nr == 1)
 		rev->diffopt.flags.follow_renames = 1;
 
 	/* Turn --cc/-c into -p --cc/-c when -p was not given */
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3a2c0c23b6..edb0799533 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -517,7 +517,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (show_progress)
 		progress = start_delayed_progress(show_progress, 0);
 
-	if (use_bitmap_index && !revs.prune) {
+	if (use_bitmap_index && !revs.path_limiting) {
 		if (revs.count && !revs.left_right && !revs.cherry_mark) {
 			uint32_t commit_count;
 			int max_count = revs.max_count;
diff --git a/diff-lib.c b/diff-lib.c
index 23c8d351b3..431bed0e5a 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -110,7 +110,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		if (diff_can_quit_early(&revs->diffopt))
 			break;
 
-		if (!ce_path_match(istate, ce, &revs->prune_data, NULL))
+		if (!ce_path_match(istate, ce, &revs->path_limits, NULL))
 			continue;
 
 		if (ce_stage(ce)) {
@@ -477,7 +477,7 @@ static int oneway_diff(const struct cache_entry * const *src,
 
 	if (ce_path_match(revs->diffopt.repo->index,
 			  idx ? idx : tree,
-			  &revs->prune_data, NULL)) {
+			  &revs->path_limits, NULL)) {
 		do_oneway_diff(o, idx, tree);
 		if (diff_can_quit_early(&revs->diffopt)) {
 			o->exiting_early = 1;
@@ -543,7 +543,7 @@ int do_diff_cache(const struct object_id *tree_oid, struct diff_options *opt)
 	struct rev_info revs;
 
 	repo_init_revisions(opt->repo, &revs, NULL);
-	copy_pathspec(&revs.prune_data, &opt->pathspec);
+	copy_pathspec(&revs.path_limits, &opt->pathspec);
 	revs.diffopt = *opt;
 
 	if (diff_cache(&revs, tree_oid, NULL, 1))
diff --git a/revision.c b/revision.c
index 13e0519c02..156a89b7f0 100644
--- a/revision.c
+++ b/revision.c
@@ -488,7 +488,7 @@ static int rev_compare_tree(struct rev_info *revs,
 		 * tagged commit by specifying both --simplify-by-decoration
 		 * and pathspec.
 		 */
-		if (!revs->prune_data.nr)
+		if (!revs->path_limits.nr)
 			return REV_TREE_SAME;
 	}
 
@@ -616,14 +616,14 @@ static unsigned update_treesame(struct rev_info *revs, struct commit *commit)
 static inline int limiting_can_increase_treesame(const struct rev_info *revs)
 {
 	/*
-	 * TREESAME is irrelevant unless prune && dense;
+	 * TREESAME is irrelevant unless path_limiting && dense;
 	 * if simplify_history is set, we can't have a mixture of TREESAME and
 	 *    !TREESAME INTERESTING parents (and we don't have treesame[]
 	 *    decoration anyway);
 	 * if first_parent_only is set, then the TREESAME flag is locked
 	 *    against the first parent (and again we lack treesame[] decoration).
 	 */
-	return revs->prune && revs->dense &&
+	return revs->path_limiting && revs->dense &&
 	       !revs->simplify_history &&
 	       !revs->first_parent_only;
 }
@@ -638,7 +638,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 	/*
 	 * If we don't do pruning, everything is interesting
 	 */
-	if (!revs->prune)
+	if (!revs->path_limiting)
 		return;
 
 	if (!get_commit_tree(commit))
@@ -786,7 +786,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 
 	/*
 	 * If the commit is uninteresting, don't try to
-	 * prune parents - we want the maximal uninteresting
+	 * path limit parents - we want the maximal uninteresting
 	 * set.
 	 *
 	 * Normally we haven't parsed the parent
@@ -1511,8 +1511,8 @@ static void prepare_show_merge(struct rev_info *revs)
 	struct commit_list *bases;
 	struct commit *head, *other;
 	struct object_id oid;
-	const char **prune = NULL;
-	int i, prune_num = 1; /* counting terminating NULL */
+	const char **limiting_paths = NULL;
+	int i, limiting_paths_num = 1; /* counting terminating NULL */
 	struct index_state *istate = revs->repo->index;
 
 	if (get_oid("HEAD", &oid))
@@ -1535,19 +1535,20 @@ static void prepare_show_merge(struct rev_info *revs)
 		const struct cache_entry *ce = istate->cache[i];
 		if (!ce_stage(ce))
 			continue;
-		if (ce_path_match(istate, ce, &revs->prune_data, NULL)) {
-			prune_num++;
-			REALLOC_ARRAY(prune, prune_num);
-			prune[prune_num-2] = ce->name;
-			prune[prune_num-1] = NULL;
+		if (ce_path_match(istate, ce, &revs->path_limits, NULL)) {
+			limiting_paths_num++;
+			REALLOC_ARRAY(limiting_paths, limiting_paths_num);
+			limiting_paths[limiting_paths_num-2] = ce->name;
+			limiting_paths[limiting_paths_num-1] = NULL;
 		}
 		while ((i+1 < istate->cache_nr) &&
 		       ce_same_name(ce, istate->cache[i+1]))
 			i++;
 	}
-	clear_pathspec(&revs->prune_data);
-	parse_pathspec(&revs->prune_data, PATHSPEC_ALL_MAGIC & ~PATHSPEC_LITERAL,
-		       PATHSPEC_PREFER_FULL | PATHSPEC_LITERAL_PATH, "", prune);
+	clear_pathspec(&revs->path_limits);
+	parse_pathspec(&revs->path_limits, PATHSPEC_ALL_MAGIC & ~PATHSPEC_LITERAL,
+		       PATHSPEC_PREFER_FULL | PATHSPEC_LITERAL_PATH, "",
+		       limiting_paths);
 	revs->limited = 1;
 }
 
@@ -1736,14 +1737,14 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 }
 
 static void read_pathspec_from_stdin(struct rev_info *revs, struct strbuf *sb,
-				     struct argv_array *prune)
+				     struct argv_array *limiting_paths)
 {
 	while (strbuf_getline(sb, stdin) != EOF)
-		argv_array_push(prune, sb->buf);
+		argv_array_push(limiting_paths, sb->buf);
 }
 
 static void read_revisions_from_stdin(struct rev_info *revs,
-				      struct argv_array *prune)
+				      struct argv_array *limiting_paths)
 {
 	struct strbuf sb;
 	int seen_dashdash = 0;
@@ -1769,7 +1770,7 @@ static void read_revisions_from_stdin(struct rev_info *revs,
 			die("bad revision '%s'", sb.buf);
 	}
 	if (seen_dashdash)
-		read_pathspec_from_stdin(revs, &sb, prune);
+		read_pathspec_from_stdin(revs, &sb, limiting_paths);
 
 	strbuf_release(&sb);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -1884,7 +1885,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->simplify_history = 0;
 		revs->simplify_by_decoration = 1;
 		revs->limited = 1;
-		revs->prune = 1;
+		revs->path_limiting = 1;
 		load_ref_decorations(NULL, DECORATE_SHORT_REFS);
 	} else if (!strcmp(arg, "--date-order")) {
 		revs->sort_order = REV_SORT_BY_COMMIT_DATE;
@@ -2338,7 +2339,7 @@ static void NORETURN diagnose_missing_default(const char *def)
 int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct setup_revision_opt *opt)
 {
 	int i, flags, left, seen_dashdash, got_rev_arg = 0, revarg_opt;
-	struct argv_array prune_data = ARGV_ARRAY_INIT;
+	struct argv_array path_limits = ARGV_ARRAY_INIT;
 	const char *submodule = NULL;
 
 	if (opt)
@@ -2356,7 +2357,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			argv[i] = NULL;
 			argc = i;
 			if (argv[i + 1])
-				argv_array_pushv(&prune_data, argv + i + 1);
+				argv_array_pushv(&path_limits, argv + i + 1);
 			seen_dashdash = 1;
 			break;
 		}
@@ -2387,7 +2388,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 				}
 				if (revs->read_from_stdin++)
 					die("--stdin given twice?");
-				read_revisions_from_stdin(revs, &prune_data);
+				read_revisions_from_stdin(revs, &path_limits);
 				continue;
 			}
 
@@ -2416,32 +2417,32 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			for (j = i; j < argc; j++)
 				verify_filename(revs->prefix, argv[j], j == i);
 
-			argv_array_pushv(&prune_data, argv + i);
+			argv_array_pushv(&path_limits, argv + i);
 			break;
 		}
 		else
 			got_rev_arg = 1;
 	}
 
-	if (prune_data.argc) {
+	if (path_limits.argc) {
 		/*
 		 * If we need to introduce the magic "a lone ':' means no
 		 * pathspec whatsoever", here is the place to do so.
 		 *
-		 * if (prune_data.nr == 1 && !strcmp(prune_data[0], ":")) {
-		 *	prune_data.nr = 0;
-		 *	prune_data.alloc = 0;
-		 *	free(prune_data.path);
-		 *	prune_data.path = NULL;
+		 * if (path_limits.nr == 1 && !strcmp(path_limits[0], ":")) {
+		 *	path_limits.nr = 0;
+		 *	path_limits.alloc = 0;
+		 *	free(path_limits.path);
+		 *	path_limits.path = NULL;
 		 * } else {
-		 *	terminate prune_data.alloc with NULL and
-		 *	call init_pathspec() to set revs->prune_data here.
+		 *	terminate path_limits.alloc with NULL and
+		 *	call init_pathspec() to set revs->path_limits here.
 		 * }
 		 */
-		parse_pathspec(&revs->prune_data, 0, 0,
-			       revs->prefix, prune_data.argv);
+		parse_pathspec(&revs->path_limits, 0, 0,
+			       revs->prefix, path_limits.argv);
 	}
-	argv_array_clear(&prune_data);
+	argv_array_clear(&path_limits);
 
 	if (revs->def == NULL)
 		revs->def = opt ? opt->def : NULL;
@@ -2475,14 +2476,17 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 	if (revs->topo_order && !generation_numbers_enabled(the_repository))
 		revs->limited = 1;
 
-	if (revs->prune_data.nr) {
-		copy_pathspec(&revs->pruning.pathspec, &revs->prune_data);
-		/* Can't prune commits with rename following: the paths change.. */
+	if (revs->path_limits.nr) {
+		copy_pathspec(&revs->pruning.pathspec, &revs->path_limits);
+		/*
+		 * Can't path-limit commits with rename following; the paths
+		 * change.
+		 */
 		if (!revs->diffopt.flags.follow_renames)
-			revs->prune = 1;
+			revs->path_limiting = 1;
 		if (!revs->full_diff)
 			copy_pathspec(&revs->diffopt.pathspec,
-				      &revs->prune_data);
+				      &revs->path_limits);
 	}
 	if (revs->combine_merges)
 		revs->ignore_merges = 0;
@@ -2845,7 +2849,7 @@ static void simplify_merges(struct rev_info *revs)
 	struct commit_list *yet_to_do, **tail;
 	struct commit *commit;
 
-	if (!revs->prune)
+	if (!revs->path_limiting)
 		return;
 
 	/* feed the list reversed */
@@ -3361,7 +3365,7 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
 	}
 	if (!commit_match(commit, revs))
 		return commit_ignore;
-	if (revs->prune && revs->dense) {
+	if (revs->path_limiting && revs->dense) {
 		/* Commit without changes? */
 		if (commit->object.flags & TREESAME) {
 			int n;
@@ -3446,7 +3450,7 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
 	enum commit_action action = get_commit_action(revs, commit);
 
 	if (action == commit_show &&
-	    revs->prune && revs->dense && want_ancestry(revs)) {
+	    revs->path_limiting && revs->dense && want_ancestry(revs)) {
 		/*
 		 * --full-diff on simplified parents is no good: it
 		 * will show spurious changes from the commits that
diff --git a/revision.h b/revision.h
index 7987bfcd2e..88b7451362 100644
--- a/revision.h
+++ b/revision.h
@@ -87,7 +87,7 @@ struct rev_info {
 	/* Basic information */
 	const char *prefix;
 	const char *def;
-	struct pathspec prune_data;
+	struct pathspec path_limits;
 
 	/*
 	 * Whether the arguments parsed by setup_revisions() included any
@@ -111,7 +111,7 @@ struct rev_info {
 
 	/* Traversal flags */
 	unsigned int	dense:1,
-			prune:1,
+			path_limiting:1,
 			no_walk:2,
 			remove_empty_trees:1,
 			simplify_history:1,
diff --git a/t/t7811-grep-open.sh b/t/t7811-grep-open.sh
index d1ebfd88c7..79af1b7187 100755
--- a/t/t7811-grep-open.sh
+++ b/t/t7811-grep-open.sh
@@ -23,7 +23,7 @@ enum grep_pat_token {
 	test_commit add-user revision.c "
 	}
 	if (seen_dashdash)
-		read_pathspec_from_stdin(revs, &sb, prune);
+		read_pathspec_from_stdin(revs, &sb, limiting_paths);
 	strbuf_release(&sb);
 }
 
diff --git a/tree-walk.c b/tree-walk.c
index 79bafbd1a2..b60170b6b4 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -365,10 +365,10 @@ static void free_extended_entry(struct tree_desc_x *t)
 	}
 }
 
-static inline int prune_traversal(struct name_entry *e,
-				  struct traverse_info *info,
-				  struct strbuf *base,
-				  int still_interesting)
+static inline int path_limit_traversal(struct name_entry *e,
+				       struct traverse_info *info,
+				       struct strbuf *base,
+				       int still_interesting)
 {
 	if (!info->pathspec || still_interesting == 2)
 		return 2;
@@ -461,7 +461,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 		}
 		if (!mask)
 			break;
-		interesting = prune_traversal(e, info, &base, interesting);
+		interesting = path_limit_traversal(e, info, &base, interesting);
 		if (interesting < 0)
 			break;
 		if (interesting) {
diff --git a/wt-status.c b/wt-status.c
index 0fe3bcd4cd..b0a3efea4b 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -603,7 +603,7 @@ static void wt_status_collect_changes_worktree(struct wt_status *s)
 	rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
 	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
 	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
-	copy_pathspec(&rev.prune_data, &s->pathspec);
+	copy_pathspec(&rev.path_limits, &s->pathspec);
 	run_diff_files(&rev, 0);
 }
 
@@ -639,7 +639,7 @@ static void wt_status_collect_changes_index(struct wt_status *s)
 	rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
 	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
 	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
-	copy_pathspec(&rev.prune_data, &s->pathspec);
+	copy_pathspec(&rev.path_limits, &s->pathspec);
 	run_diff_index(&rev, 1);
 }
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH] fetch: ensure submodule objects fetched
From: Stefan Beller @ 2018-12-06 21:26 UTC (permalink / raw)
  To: jonathantanmy; +Cc: git, sbeller
In-Reply-To: <20181205010704.84790-1-jonathantanmy@google.com>

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

But this default fetch is not sufficient, as a newly fetched commit in
the superproject could point to a commit in the submodule that is not
in the default refspec. This is common in workflows like Gerrit's.
When fetching a Gerrit change under review (from refs/changes/??), the
commits in that change likely point to submodule commits that have not
been merged to a branch yet.

Fetch a submodule object by id if the object that the superproject
points to, cannot be found. For now this object is fetched from the
'origin' remote as we defer getting the default remote to a later patch.

A list of new submodule commits are already generated in certain
conditions (by check_for_new_submodule_commits()); this new feature
invokes that function in more situations.

The submodule checks were done only when a ref in the superproject
changed, these checks were extended to also be performed when fetching
into FETCH_HEAD for completeness, and add a test for that too.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

Thanks Jonathan for the review!
So it looks like only the last patch needs some improvements,
which is why I'd only resend the last patch here.
Also note the test with interious superproject commits.

All suggestions sounded sensible, addressing them all,
here is a range-diff to the currently queued version:

Range-diff:
1:  04eb06607b ! 1:  ac6558cbc9 fetch: try fetching submodules if needed objects were not fetched
    @@ -1,6 +1,6 @@
     Author: Stefan Beller <sbeller@google.com>
     
    -    fetch: try fetching submodules if needed objects were not fetched
    +    fetch: ensure submodule objects fetched
     
         Currently when git-fetch is asked to recurse into submodules, it dispatches
         a plain "git-fetch -C <submodule-dir>" (with some submodule related options
    @@ -14,22 +14,19 @@
         commits in that change likely point to submodule commits that have not
         been merged to a branch yet.
     
    -    Try fetching a submodule by object id if the object id that the
    -    superproject points to, cannot be found.
    +    Fetch a submodule object by id if the object that the superproject
    +    points to, cannot be found. For now this object is fetched from the
    +    'origin' remote as we defer getting the default remote to a later patch.
     
    -    builtin/fetch used to only inspect submodules when they were fetched
    -    "on-demand", as in either on/off case it was clear whether the submodule
    -    needs to be fetched. However to know whether we need to try fetching the
    -    object ids, we need to identify the object names, which is done in this
    -    function check_for_new_submodule_commits(), so we'll also run that code
    -    in case the submodule recursion is set to "on".
    +    A list of new submodule commits are already generated in certain
    +    conditions (by check_for_new_submodule_commits()); this new feature
    +    invokes that function in more situations.
     
         The submodule checks were done only when a ref in the superproject
         changed, these checks were extended to also be performed when fetching
         into FETCH_HEAD for completeness, and add a test for that too.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/builtin/fetch.c b/builtin/fetch.c
      --- a/builtin/fetch.c
    @@ -82,7 +79,7 @@
      
      	struct string_list changed_submodule_names;
     +
    -+	/* The submodules to fetch in */
    ++	/* Pending fetches by OIDs */
     +	struct fetch_task **oid_fetch_tasks;
     +	int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
      };
    @@ -97,13 +94,16 @@
      	return spf->default_option;
      }
      
    ++/*
    ++ * Fetch in progress (if callback data) or
    ++ * pending (if in oid_fetch_tasks in struct submodule_parallel_fetch)
    ++ */
     +struct fetch_task {
     +	struct repository *repo;
     +	const struct submodule *sub;
     +	unsigned free_sub : 1; /* Do we need to free the submodule? */
     +
    -+	/* fetch specific oids if set, otherwise fetch default refspec */
    -+	struct oid_array *commits;
    ++	struct oid_array *commits; /* Ensure these commits are fetched */
     +};
     +
     +/**
    @@ -176,7 +176,6 @@
      
      	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
     -		struct strbuf submodule_prefix = STRBUF_INIT;
    -+		int recurse_config;
      		const struct cache_entry *ce = spf->r->index->cache[spf->count];
      		const char *default_argv;
     -		const struct submodule *submodule;
    @@ -199,11 +198,9 @@
     +		task = fetch_task_create(spf->r, ce->name);
     +		if (!task)
     +			continue;
    -+
    -+		recurse_config = get_fetch_recurse_config(task->sub, spf);
      
     -		switch (get_fetch_recurse_config(submodule, spf))
    -+		switch (recurse_config)
    ++		switch (get_fetch_recurse_config(task->sub, spf))
      		{
      		default:
      		case RECURSE_SUBMODULES_DEFAULT:
    @@ -314,7 +311,7 @@
      	return 0;
      }
      
    -+static int commit_exists_in_sub(const struct object_id *oid, void *data)
    ++static int commit_missing_in_sub(const struct object_id *oid, void *data)
     +{
     +	struct repository *subrepo = data;
     +
    @@ -340,7 +337,7 @@
     +
     +	/* Is this the second time we process this submodule? */
     +	if (task->commits)
    -+		return 0;
    ++		goto out;
     +
     +	it = string_list_lookup(&spf->changed_submodule_names, task->sub->name);
     +	if (!it)
    @@ -349,7 +346,7 @@
     +
     +	commits = it->util;
     +	oid_array_filter(commits,
    -+			 commit_exists_in_sub,
    ++			 commit_missing_in_sub,
     +			 task->repo);
     +
     +	/* Are there commits we want, but do not exist? */
    @@ -408,7 +405,7 @@
     +	)
     +'
     +
    -+test_expect_success 'fetch new submodule commits on-demand in FETCH_HEAD' '
    ++test_expect_success 'fetch new submodule commit on-demand in FETCH_HEAD' '
     +	# depends on the previous test for setup
     +
     +	C=$(git -C submodule commit-tree -m "another change outside refs/heads" HEAD^{tree}) &&
    @@ -462,5 +459,36 @@
     +		git checkout --recurse-submodules FETCH_HEAD
     +	)
     +'
    ++
    ++test_expect_success 'fetch new submodule commit intermittently referenced by superproject' '
    ++	# depends on the previous test for setup
    ++
    ++	D=$(git -C sub1 commit-tree -m "change 10 outside refs/heads" HEAD^{tree}) &&
    ++	E=$(git -C sub1 commit-tree -m "change 11 outside refs/heads" HEAD^{tree}) &&
    ++	F=$(git -C sub1 commit-tree -m "change 12 outside refs/heads" HEAD^{tree}) &&
    ++
    ++	git -C sub1 update-ref refs/changes/10 $D &&
    ++	git update-index --cacheinfo 160000 $D sub1 &&
    ++	git commit -m "updated submodules outside of refs/heads" &&
    ++
    ++	git -C sub1 update-ref refs/changes/11 $E &&
    ++	git update-index --cacheinfo 160000 $E sub1 &&
    ++	git commit -m "updated submodules outside of refs/heads" &&
    ++
    ++	git -C sub1 update-ref refs/changes/12 $F &&
    ++	git update-index --cacheinfo 160000 $F sub1 &&
    ++	git commit -m "updated submodules outside of refs/heads" &&
    ++
    ++	G=$(git rev-parse HEAD) &&
    ++	git update-ref refs/changes/13 $G &&
    ++	(
    ++		cd downstream &&
    ++		git fetch --recurse-submodules origin refs/changes/13 &&
    ++
    ++		git -C sub1 cat-file -t $D &&
    ++		git -C sub1 cat-file -t $E &&
    ++		git -C sub1 cat-file -t $F
    ++	)
    ++'
     +
      test_done

 builtin/fetch.c             |  11 +-
 submodule.c                 | 206 +++++++++++++++++++++++++++++++-----
 t/t5526-fetch-submodules.sh | 117 ++++++++++++++++++++
 3 files changed, 296 insertions(+), 38 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e0140327aa..91f9b7d9c8 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -763,9 +763,6 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
-			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
 			       r ? _("unable to update local ref") : NULL,
@@ -779,9 +776,6 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
-			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
 			       r ? _("unable to update local ref") : NULL,
@@ -794,9 +788,6 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
-			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
 			       r ? _("unable to update local ref") : _("forced update"),
@@ -892,6 +883,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				ref->force = rm->peer_ref->force;
 			}
 
+			if (recurse_submodules != RECURSE_SUBMODULES_OFF)
+				check_for_new_submodule_commits(&rm->old_oid);
 
 			if (!strcmp(rm->name, "HEAD")) {
 				kind = "";
diff --git a/submodule.c b/submodule.c
index d1b6646f42..b88343d977 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1231,8 +1231,14 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+
+	/* Pending fetches by OIDs */
+	struct fetch_task **oid_fetch_tasks;
+	int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  NULL, 0, 0}
 
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
@@ -1259,6 +1265,76 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+/*
+ * Fetch in progress (if callback data) or
+ * pending (if in oid_fetch_tasks in struct submodule_parallel_fetch)
+ */
+struct fetch_task {
+	struct repository *repo;
+	const struct submodule *sub;
+	unsigned free_sub : 1; /* Do we need to free the submodule? */
+
+	struct oid_array *commits; /* Ensure these commits are fetched */
+};
+
+/**
+ * When a submodule is not defined in .gitmodules, we cannot access it
+ * via the regular submodule-config. Create a fake submodule, which we can
+ * work on.
+ */
+static const struct submodule *get_non_gitmodules_submodule(const char *path)
+{
+	struct submodule *ret = NULL;
+	const char *name = default_name_or_path(path);
+
+	if (!name)
+		return NULL;
+
+	ret = xmalloc(sizeof(*ret));
+	memset(ret, 0, sizeof(*ret));
+	ret->path = name;
+	ret->name = name;
+
+	return (const struct submodule *) ret;
+}
+
+static struct fetch_task *fetch_task_create(struct repository *r,
+					    const char *path)
+{
+	struct fetch_task *task = xmalloc(sizeof(*task));
+	memset(task, 0, sizeof(*task));
+
+	task->sub = submodule_from_path(r, &null_oid, path);
+	if (!task->sub) {
+		/*
+		 * No entry in .gitmodules? Technically not a submodule,
+		 * but historically we supported repositories that happen to be
+		 * in-place where a gitlink is. Keep supporting them.
+		 */
+		task->sub = get_non_gitmodules_submodule(path);
+		if (!task->sub) {
+			free(task);
+			return NULL;
+		}
+
+		task->free_sub = 1;
+	}
+
+	return task;
+}
+
+static void fetch_task_release(struct fetch_task *p)
+{
+	if (p->free_sub)
+		free((void*)p->sub);
+	p->free_sub = 0;
+	p->sub = NULL;
+
+	if (p->repo)
+		repo_clear(p->repo);
+	FREE_AND_NULL(p->repo);
+}
+
 static struct repository *get_submodule_repo_for(struct repository *r,
 						 const struct submodule *sub)
 {
@@ -1286,39 +1362,29 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
-	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *default_argv;
-		const struct submodule *submodule;
-		struct repository *repo;
-		struct submodule default_submodule = SUBMODULE_INIT;
+		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		submodule = submodule_from_path(spf->r, &null_oid, ce->name);
-		if (!submodule) {
-			const char *name = default_name_or_path(ce->name);
-			if (name) {
-				default_submodule.path = name;
-				default_submodule.name = name;
-				submodule = &default_submodule;
-			}
-		}
+		task = fetch_task_create(spf->r, ce->name);
+		if (!task)
+			continue;
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		switch (get_fetch_recurse_config(task->sub, spf))
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule ||
+			if (!task->sub ||
 			    !string_list_lookup(
 					&spf->changed_submodule_names,
-					submodule->name))
+					task->sub->name))
 				continue;
 			default_argv = "on-demand";
 			break;
@@ -1329,11 +1395,11 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		repo = get_submodule_repo_for(spf->r, submodule);
-		if (repo) {
+		task->repo = get_submodule_repo_for(spf->r, task->sub);
+		if (task->repo) {
+			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
-			cp->dir = xstrdup(repo->gitdir);
+			cp->dir = task->repo->gitdir;
 			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
 			cp->git_cmd = 1;
 			if (!spf->quiet)
@@ -1343,12 +1409,22 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_pushv(&cp->args, spf->args.argv);
 			argv_array_push(&cp->args, default_argv);
 			argv_array_push(&cp->args, "--submodule-prefix");
+
+			strbuf_addf(&submodule_prefix, "%s%s/",
+						       spf->prefix,
+						       task->sub->path);
 			argv_array_push(&cp->args, submodule_prefix.buf);
 
-			repo_clear(repo);
-			free(repo);
-			ret = 1;
+			spf->count++;
+			*task_cb = task;
+
+			strbuf_release(&submodule_prefix);
+			return 1;
 		} else {
+
+			fetch_task_release(task);
+			free(task);
+
 			/*
 			 * An empty directory is normal,
 			 * the submodule is not initialized
@@ -1361,12 +1437,38 @@ static int get_next_submodule(struct child_process *cp,
 					    ce->name);
 			}
 		}
+	}
+
+	if (spf->oid_fetch_tasks_nr) {
+		struct fetch_task *task =
+			spf->oid_fetch_tasks[spf->oid_fetch_tasks_nr - 1];
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		spf->oid_fetch_tasks_nr--;
+
+		strbuf_addf(&submodule_prefix, "%s%s/",
+			    spf->prefix, task->sub->path);
+
+		child_process_init(cp);
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		cp->dir = task->repo->gitdir;
+
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from submodule--helper */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(task->commits,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = task;
 		strbuf_release(&submodule_prefix);
-		if (ret) {
-			spf->count++;
-			return 1;
-		}
+		return 1;
 	}
+
 	return 0;
 }
 
@@ -1374,20 +1476,66 @@ static int fetch_start_failure(struct strbuf *err,
 			       void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct fetch_task *task = task_cb;
 
 	spf->result = 1;
 
+	fetch_task_release(task);
 	return 0;
 }
 
+static int commit_missing_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type != OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct fetch_task *task = task_cb;
+
+	struct string_list_item *it;
+	struct oid_array *commits;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!task || !task->sub)
+		BUG("callback cookie bogus");
+
+	/* Is this the second time we process this submodule? */
+	if (task->commits)
+		goto out;
+
+	it = string_list_lookup(&spf->changed_submodule_names, task->sub->name);
+	if (!it)
+		/* Could be an unchanged submodule, not contained in the list */
+		goto out;
+
+	commits = it->util;
+	oid_array_filter(commits,
+			 commit_missing_in_sub,
+			 task->repo);
+
+	/* Are there commits we want, but do not exist? */
+	if (commits->nr) {
+		task->commits = commits;
+		ALLOC_GROW(spf->oid_fetch_tasks,
+			   spf->oid_fetch_tasks_nr + 1,
+			   spf->oid_fetch_tasks_alloc);
+		spf->oid_fetch_tasks[spf->oid_fetch_tasks_nr] = task;
+		spf->oid_fetch_tasks_nr++;
+		return 0;
+	}
+
+out:
+	fetch_task_release(task);
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6c2f9b2ba2..9f8c744eb5 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -600,4 +600,121 @@ test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new submodule commits on-demand outside standard refspec" '
+	# add a second submodule and ensure it is around in downstream first
+	git clone submodule sub1 &&
+	git submodule add ./sub1 &&
+	git commit -m "adding a second submodule" &&
+	git -C downstream pull &&
+	git -C downstream submodule update --init --recursive &&
+
+	git checkout --detach &&
+
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	test_tick &&
+
+	D=$(git -C sub1 commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C sub1 update-ref refs/changes/2 $D &&
+	git update-index --cacheinfo 160000 $D sub1 &&
+
+	git commit -m "updated submodules outside of refs/heads" &&
+	E=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/3 $E &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin refs/changes/3:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git -C sub1 cat-file -t $D &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
+test_expect_success 'fetch new submodule commit on-demand in FETCH_HEAD' '
+	# depends on the previous test for setup
+
+	C=$(git -C submodule commit-tree -m "another change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/4 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	test_tick &&
+
+	D=$(git -C sub1 commit-tree -m "another change outside refs/heads" HEAD^{tree}) &&
+	git -C sub1 update-ref refs/changes/5 $D &&
+	git update-index --cacheinfo 160000 $D sub1 &&
+
+	git commit -m "updated submodules outside of refs/heads" &&
+	E=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/6 $E &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin refs/changes/6 &&
+		git -C submodule cat-file -t $C &&
+		git -C sub1 cat-file -t $D &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
+test_expect_success 'fetch new submodule commits on-demand without .gitmodules entry' '
+	# depends on the previous test for setup
+
+	git config -f .gitmodules --remove-section submodule.sub1 &&
+	git add .gitmodules &&
+	git commit -m "delete gitmodules file" &&
+	git checkout -B master &&
+	git -C downstream fetch &&
+	git -C downstream checkout origin/master &&
+
+	C=$(git -C submodule commit-tree -m "yet another change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/7 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	test_tick &&
+
+	D=$(git -C sub1 commit-tree -m "yet another change outside refs/heads" HEAD^{tree}) &&
+	git -C sub1 update-ref refs/changes/8 $D &&
+	git update-index --cacheinfo 160000 $D sub1 &&
+
+	git commit -m "updated submodules outside of refs/heads" &&
+	E=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/9 $E &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin refs/changes/9 &&
+		git -C submodule cat-file -t $C &&
+		git -C sub1 cat-file -t $D &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
+test_expect_success 'fetch new submodule commit intermittently referenced by superproject' '
+	# depends on the previous test for setup
+
+	D=$(git -C sub1 commit-tree -m "change 10 outside refs/heads" HEAD^{tree}) &&
+	E=$(git -C sub1 commit-tree -m "change 11 outside refs/heads" HEAD^{tree}) &&
+	F=$(git -C sub1 commit-tree -m "change 12 outside refs/heads" HEAD^{tree}) &&
+
+	git -C sub1 update-ref refs/changes/10 $D &&
+	git update-index --cacheinfo 160000 $D sub1 &&
+	git commit -m "updated submodules outside of refs/heads" &&
+
+	git -C sub1 update-ref refs/changes/11 $E &&
+	git update-index --cacheinfo 160000 $E sub1 &&
+	git commit -m "updated submodules outside of refs/heads" &&
+
+	git -C sub1 update-ref refs/changes/12 $F &&
+	git update-index --cacheinfo 160000 $F sub1 &&
+	git commit -m "updated submodules outside of refs/heads" &&
+
+	G=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/13 $G &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin refs/changes/13 &&
+
+		git -C sub1 cat-file -t $D &&
+		git -C sub1 cat-file -t $E &&
+		git -C sub1 cat-file -t $F
+	)
+'
+
 test_done
-- 
2.20.0.rc2.230.gc28305e538


^ permalink raw reply related

* Re: [wishlist] git submodule update --reset-hard
From: Yaroslav Halchenko @ 2018-12-06 21:24 UTC (permalink / raw)
  To: git
In-Reply-To: <CAGZ79kY8uv8zDm3f8Jb6aC-nit7OZduixyOekGYWa_xnqFqw-w@mail.gmail.com>


On Thu, 06 Dec 2018, Stefan Beller wrote:

> On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko <yoh@onerussian.com> wrote:

> > Dear Git Gurus,

> > I wondered what would be your take on my wishlist request to add
> > --reset-hard option, which would be very similar to regular "update" which
> > checks out necessary commit, but I want it to remain in the branch.

> What if the branch differs from the sha1 recorded in the superproject?

git reset --hard  itself is an operation which should be done with some
level of competence in doing "the right thing" by calling it.  You
can hop branches even in current (without any submodules in question)
repository with it and cause as much chaos as you desire.

If desired though, a number of prevention mechanisms could be in place (but
would require option(s) to overcome) to allow submodule to be reset --hard'ed
only when some conditions met (e.g. only to the commit which is among parent
commits path of the current branch).  This way wild hops would be prevented,
although you might still end up in some feature branch.  But since "reset
--hard" itself doesn't have any safe-guards, I do not really think they should
be implemented here either.

> > Rationale: In DataLad we heavily rely on submodules, and we have established
> > easy ways to do some manipulations across full hierarchies of them. E.g. a
> > single command could introduce a good number of commits across deep hierarchy
> > of submodules, e.g. while committing changes within deep submodule, while also
> > doing all necessary commits in the repositories leading to that submodule so
> > the entire tree of them stays in a "clean" state. The difficulty comes when
> > there is a need to just "forget" some changes.  The obvious way is to e.g.

> >    git reset --hard PREVIOUS_STATE

>   git reset --hard --recurse-submodules HEAD

> would do the trick

it does indeed some trick(s) but not all seems to be the ones I desire:

1. Seems to migrate submodule's .git directories into the top level
.git/modules

	$>  git reset --hard --recurse-submodules HEAD^^^
	Migrating git directory of 'famface' from        
	'/tmp/gobbini/famface/.git' to
	'/tmp/gobbini/.git/modules/famface'
	Migrating git directory of 'famface/data' from
	'/tmp/gobbini/famface/data/.git' to
	'/tmp/gobbini/.git/modules/famface/modules/data'
	Migrating git directory of 'famface/data/scripts/mridefacer' from
	'/tmp/gobbini/famface/data/scripts/mridefacer/.git' to
	'/tmp/gobbini/.git/modules/famface/modules/data/modules/scripts/mridefacer'
	HEAD is now at 9b4296d [DATALAD] aggregated meta data

we might eventually adopt this default already for years model (git annex seems
to be ok, in that it then replaces .git symlink file with the actual
symlink .git -> ../../.git/modules/...  So things seems to keep working
for annex)

2. It still does the detached HEAD for me

	$> git submodule status --recursive              
	 2569ab436501a832d35afbbe9cc20ffeb6077eb1 famface (2569ab4)
	 f1e8c9b8b025c311424283b9711efc6bc906ba2b famface/data (BIDS-v1.0.1)
	 49b0fe42696724c2a8492f999736056e51b77358 famface/data/scripts/mridefacer (49b0fe4)


> > in the top level repository.  But that leaves all the submodules now in
> > the undesired state.  If I do

> undesirable in the sense of still having local changes (that is what
> the above reset with `--recurse` would fix) or changed the branch
> state? (i.e. is detached but was on a branch before?)

right -- I meant the local changes and indeed reset --recurse-submodules
indeed seems to recurse nicely.  Then the undesired effect remaining only
the detached HEAD

> >   git submodule update --recursive

> > I would end up in the detached HEADs within submodules.

> > What I want is to retain current branch they are at (or may be possible
> > "were in"? reflog records might have that information)

> So something like

>   git submodule foreach --recursive git reset --hard

> ?

not quite  -- this would just kill all local changes within each submodule, not
to reset it to the desired state, which wouldn't be specified in such
invocation, and is only known to the repo containing it

> You may be interested in
> https://public-inbox.org/git/20180927221603.148025-1-sbeller@google.com/
> which introduces a switch `submodule.repoLike [ = true]`, which
> when set would not detach HEAD in submodules.

Thanks! looks interesting -- was there more discussion/activity beyond those 5
posts in the thread?
https://public-inbox.org/git/87h8i9ift4.fsf@evledraar.gmail.com/#r 

This feature might indeed come handy but if I got it right, it is somewhat
complimentary to just having submodule update --reset-hard .  E.g.  submodules
might be in different branches (if I am not tracking based on branch names), so
I would not want a recursive checkout with -b|-B.  But we would indeed benefit
from such functionality, since this difficulty of managing branches of
submodules I think would be elevated with it! (e.g. in one use case we probably
will end up with a few thousands of submodules, and at least 3 branches in each
which would need to be in sync, and typically you wouldn't want different
branches to be checked out in different submodules)

> Can you say more about the first question above:
> Would you typically have situations where the
> submodule branch is out of sync with the superproject
> and how do you deal with that?

typically I do not have anything out of sync.  My primary use-case is to
"cancel" recent changes in the entire tree of repositories.  I guess for
my use case, instead of needing two commands

   git reset --hard PREVIOUSPOINT
   git submodule update --reset--hard --recursive

I wish there was just one

   git reset --hard --recursive PREVIOUSPOINT

but I felt that   submodule update   might be a better starting point
since it already  provides different modes for update.  If I was even greedier,
I would have asked for 

   git revert --recursive <commit>...
   git rebase --recursive [-i] ...

which I also frequently desire (could elaborate on the use cases etc).

NB or --recurse-submodules to avoid confusion with recursive merge
strategy?


But for a complete answer -- if submodule branch is ahead of the superproject's
record, I just commit new state for it in superproject.  Or if I see that all
what I have done was actually a throw away -- "reset --hard" to that needed
state, again manually in submodule.  With  submodule update --reset-hard
--recursive  or  git reset --hard --recursive   I would be just ready
regardless of the depth and complexity of the hierarchy ;-)

> Adding another mode to `git submodule update` sounds
> reasonable to me, too.

cool, thanks!

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        

^ permalink raw reply

* [PATCH v2 3/3] Makefile: correct example fuzz build
From: Josh Steadmon @ 2018-12-06 20:20 UTC (permalink / raw)
  To: git, gitster, stolee, avarab
In-Reply-To: <cover.1544127439.git.steadmon@google.com>

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6b72f37c29..bbcfc2bc9f 100644
--- a/Makefile
+++ b/Makefile
@@ -3104,7 +3104,7 @@ cover_db_html: cover_db
 # An example command to build against libFuzzer from LLVM 4.0.0:
 #
 # make CC=clang CXX=clang++ \
-#      FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
+#      CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
 #      LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \
 #      fuzz-all
 #
-- 
2.20.0.rc2.10.g7519fc76df


^ permalink raw reply related

* [PATCH v2 2/3] commit-graph: fix buffer read-overflow
From: Josh Steadmon @ 2018-12-06 20:20 UTC (permalink / raw)
  To: git, gitster, stolee, avarab
In-Reply-To: <cover.1544127439.git.steadmon@google.com>

fuzz-commit-graph identified a case where Git will read past the end of
a buffer containing a commit graph if the graph's header has an
incorrect chunk count. A simple bounds check in parse_commit_graph()
prevents this.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 commit-graph.c          | 14 ++++++++++++--
 t/t5318-commit-graph.sh | 28 ++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 07dd410f3c..224a5f161e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -165,10 +165,20 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	last_chunk_offset = 8;
 	chunk_lookup = data + 8;
 	for (i = 0; i < graph->num_chunks; i++) {
-		uint32_t chunk_id = get_be32(chunk_lookup + 0);
-		uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+		uint32_t chunk_id;
+		uint64_t chunk_offset;
 		int chunk_repeated = 0;
 
+		if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH >
+		    data + graph_size) {
+			error(_("chunk lookup table entry missing; graph file may be incomplete"));
+			free(graph);
+			return NULL;
+		}
+
+		chunk_id = get_be32(chunk_lookup + 0);
+		chunk_offset = get_be64(chunk_lookup + 4);
+
 		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
 		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5fe21db99f..2503cb0345 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -384,6 +384,29 @@ corrupt_graph_and_verify() {
 	test_i18ngrep "$grepstr" err
 }
 
+
+# usage: corrupt_and_zero_graph_then_verify <corrupt_position> <data> <zero_position> <string>
+# Manipulates the commit-graph file at <corrupt_position> by inserting the data,
+# then zeros the file starting at <zero_position>. Finally, runs
+# 'git commit-graph verify' and places the output in the file 'err'. Tests 'err'
+# for the given string.
+corrupt_and_zero_graph_then_verify() {
+	corrupt_pos=$1
+	data="${2:-\0}"
+	zero_pos=$3
+	grepstr=$4
+	orig_size=$(stat --format=%s $objdir/info/commit-graph)
+	cd "$TRASH_DIRECTORY/full" &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$corrupt_pos" conv=notrunc &&
+	truncate --size=$zero_pos $objdir/info/commit-graph &&
+	truncate --size=$orig_size $objdir/info/commit-graph &&
+	test_must_fail git commit-graph verify 2>test_err &&
+	grep -v "^+" test_err >err &&
+	test_i18ngrep "$grepstr" err
+}
+
 test_expect_success 'detect bad signature' '
 	corrupt_graph_and_verify 0 "\0" \
 		"graph signature"
@@ -484,6 +507,11 @@ test_expect_success 'detect invalid checksum hash' '
 		"incorrect checksum"
 '
 
+test_expect_success 'detect truncated graph' '
+	corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
+		$GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing"
+'
+
 test_expect_success 'git fsck (checks commit-graph)' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git fsck &&
-- 
2.20.0.rc2.10.g7519fc76df


^ permalink raw reply related

* [PATCH v2 1/3] commit-graph, fuzz: Add fuzzer for commit-graph
From: Josh Steadmon @ 2018-12-06 20:20 UTC (permalink / raw)
  To: git, gitster, stolee, avarab
In-Reply-To: <cover.1544127439.git.steadmon@google.com>

Breaks load_commit_graph_one() into a new function,
parse_commit_graph(). The latter function operates on arbitrary buffers,
which makes it suitable as a fuzzing target. Since parse_commit_graph()
is only called by load_commit_graph_one() (and the fuzzer described
below), we omit error messages that would be duplicated by the caller.

Adds fuzz-commit-graph.c, which provides a fuzzing entry point
compatible with libFuzzer (and possibly other fuzzing engines).

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 .gitignore          |  1 +
 Makefile            |  1 +
 commit-graph.c      | 53 ++++++++++++++++++++++++++++++---------------
 commit-graph.h      |  3 +++
 fuzz-commit-graph.c | 16 ++++++++++++++
 5 files changed, 57 insertions(+), 17 deletions(-)
 create mode 100644 fuzz-commit-graph.c

diff --git a/.gitignore b/.gitignore
index 0d77ea5894..8bcf153ed9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
+/fuzz-commit-graph
 /fuzz_corpora
 /fuzz-pack-headers
 /fuzz-pack-idx
diff --git a/Makefile b/Makefile
index 1a44c811aa..6b72f37c29 100644
--- a/Makefile
+++ b/Makefile
@@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 
 ETAGS_TARGET = TAGS
 
+FUZZ_OBJS += fuzz-commit-graph.o
 FUZZ_OBJS += fuzz-pack-headers.o
 FUZZ_OBJS += fuzz-pack-idx.o
 
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..07dd410f3c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -84,16 +84,10 @@ static int commit_graph_compatible(struct repository *r)
 struct commit_graph *load_commit_graph_one(const char *graph_file)
 {
 	void *graph_map;
-	const unsigned char *data, *chunk_lookup;
 	size_t graph_size;
 	struct stat st;
-	uint32_t i;
-	struct commit_graph *graph;
+	struct commit_graph *ret;
 	int fd = git_open(graph_file);
-	uint64_t last_chunk_offset;
-	uint32_t last_chunk_id;
-	uint32_t graph_signature;
-	unsigned char graph_version, hash_version;
 
 	if (fd < 0)
 		return NULL;
@@ -108,27 +102,55 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 		die(_("graph file %s is too small"), graph_file);
 	}
 	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	ret = parse_commit_graph(graph_map, fd, graph_size);
+
+	if (!ret) {
+		munmap(graph_map, graph_size);
+		close(fd);
+		exit(1);
+	}
+
+	return ret;
+}
+
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+					size_t graph_size)
+{
+	const unsigned char *data, *chunk_lookup;
+	uint32_t i;
+	struct commit_graph *graph;
+	uint64_t last_chunk_offset;
+	uint32_t last_chunk_id;
+	uint32_t graph_signature;
+	unsigned char graph_version, hash_version;
+
+	if (!graph_map)
+		return NULL;
+
+	if (graph_size < GRAPH_MIN_SIZE)
+		return NULL;
+
 	data = (const unsigned char *)graph_map;
 
 	graph_signature = get_be32(data);
 	if (graph_signature != GRAPH_SIGNATURE) {
 		error(_("graph signature %X does not match signature %X"),
 		      graph_signature, GRAPH_SIGNATURE);
-		goto cleanup_fail;
+		return NULL;
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
 	if (graph_version != GRAPH_VERSION) {
 		error(_("graph version %X does not match version %X"),
 		      graph_version, GRAPH_VERSION);
-		goto cleanup_fail;
+		return NULL;
 	}
 
 	hash_version = *(unsigned char*)(data + 5);
 	if (hash_version != GRAPH_OID_VERSION) {
 		error(_("hash version %X does not match version %X"),
 		      hash_version, GRAPH_OID_VERSION);
-		goto cleanup_fail;
+		return NULL;
 	}
 
 	graph = alloc_commit_graph();
@@ -152,7 +174,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
 			error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
 			      (uint32_t)chunk_offset);
-			goto cleanup_fail;
+			free(graph);
+			return NULL;
 		}
 
 		switch (chunk_id) {
@@ -187,7 +210,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 
 		if (chunk_repeated) {
 			error(_("chunk id %08x appears multiple times"), chunk_id);
-			goto cleanup_fail;
+			free(graph);
+			return NULL;
 		}
 
 		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
@@ -201,11 +225,6 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	return graph;
-
-cleanup_fail:
-	munmap(graph_map, graph_size);
-	close(fd);
-	exit(1);
 }
 
 static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
diff --git a/commit-graph.h b/commit-graph.h
index 9db40b4d3a..813e7c19f1 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -54,6 +54,9 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+					size_t graph_size);
+
 /*
  * Return 1 if and only if the repository has a commit-graph
  * file and generation numbers are computed in that file.
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
new file mode 100644
index 0000000000..cf790c9d04
--- /dev/null
+++ b/fuzz-commit-graph.c
@@ -0,0 +1,16 @@
+#include "commit-graph.h"
+
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+					size_t graph_size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+	struct commit_graph *g;
+
+	g = parse_commit_graph((void *)data, -1, size);
+	free(g);
+
+	return 0;
+}
-- 
2.20.0.rc2.10.g7519fc76df


^ permalink raw reply related

* [PATCH v2 0/3] Add commit-graph fuzzer and fix buffer overflow
From: Josh Steadmon @ 2018-12-06 20:20 UTC (permalink / raw)
  To: git, gitster, stolee, avarab
In-Reply-To: <cover.1544048946.git.steadmon@google.com>

Add a new fuzz test for the commit graph and fix a buffer read-overflow
that it discovered. Additionally, fix the Makefile instructions for
building fuzzers.

Changes since V1:
  * Moved the parse_commit_graph() declaration to the header file, since
    we don't mind if others use it.
  * Moved some unnecessary comments into commit messages.
  * Fixed some style issues.
  * Added a test case for detecting commit graphs with missing chunk
    lookup entries.
  * Ævar's comments on the Makefile made me realize the fuzzer build
    instructions were using the wrong variable. Added a new commit to
    fix this.

Josh Steadmon (3):
  commit-graph, fuzz: Add fuzzer for commit-graph
  commit-graph: fix buffer read-overflow
  Makefile: correct example fuzz build

 .gitignore              |  1 +
 Makefile                |  3 +-
 commit-graph.c          | 67 +++++++++++++++++++++++++++++------------
 commit-graph.h          |  3 ++
 fuzz-commit-graph.c     | 16 ++++++++++
 t/t5318-commit-graph.sh | 28 +++++++++++++++++
 6 files changed, 98 insertions(+), 20 deletions(-)
 create mode 100644 fuzz-commit-graph.c

Range-diff against v1:
1:  53e62baaa8 ! 1:  0b57ecbe1b commit-graph, fuzz: Add fuzzer for commit-graph
    @@ -4,7 +4,9 @@
     
         Breaks load_commit_graph_one() into a new function,
         parse_commit_graph(). The latter function operates on arbitrary buffers,
    -    which makes it suitable as a fuzzing target.
    +    which makes it suitable as a fuzzing target. Since parse_commit_graph()
    +    is only called by load_commit_graph_one() (and the fuzzer described
    +    below), we omit error messages that would be duplicated by the caller.
     
         Adds fuzz-commit-graph.c, which provides a fuzzing entry point
         compatible with libFuzzer (and possibly other fuzzing engines).
    @@ -35,17 +37,6 @@
      diff --git a/commit-graph.c b/commit-graph.c
      --- a/commit-graph.c
      +++ b/commit-graph.c
    -@@
    - #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
    - 			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
    - 
    -+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
    -+					size_t graph_size);
    -+
    -+
    - char *get_commit_graph_filename(const char *obj_dir)
    - {
    - 	return xstrfmt("%s/info/commit-graph", obj_dir);
     @@
      struct commit_graph *load_commit_graph_one(const char *graph_file)
      {
    @@ -70,7 +61,7 @@
      	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
     +	ret = parse_commit_graph(graph_map, fd, graph_size);
     +
    -+	if (ret == NULL) {
    ++	if (!ret) {
     +		munmap(graph_map, graph_size);
     +		close(fd);
     +		exit(1);
    @@ -79,10 +70,6 @@
     +	return ret;
     +}
     +
    -+/*
    -+ * This function is intended to be used only from load_commit_graph_one() or in
    -+ * fuzz tests.
    -+ */
     +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
     +					size_t graph_size)
     +{
    @@ -94,11 +81,9 @@
     +	uint32_t graph_signature;
     +	unsigned char graph_version, hash_version;
     +
    -+	/*
    -+	 * This should already be checked in load_commit_graph_one, but we still
    -+	 * need a check here for when we're calling parse_commit_graph directly
    -+	 * from fuzz tests. We can omit the error message in that case.
    -+	 */
    ++	if (!graph_map)
    ++		return NULL;
    ++
     +	if (graph_size < GRAPH_MIN_SIZE)
     +		return NULL;
     +
    @@ -162,12 +147,25 @@
      
      static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
     
    + diff --git a/commit-graph.h b/commit-graph.h
    + --- a/commit-graph.h
    + +++ b/commit-graph.h
    +@@
    + 
    + struct commit_graph *load_commit_graph_one(const char *graph_file);
    + 
    ++struct commit_graph *parse_commit_graph(void *graph_map, int fd,
    ++					size_t graph_size);
    ++
    + /*
    +  * Return 1 if and only if the repository has a commit-graph
    +  * file and generation numbers are computed in that file.
    +
      diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
      new file mode 100644
      --- /dev/null
      +++ b/fuzz-commit-graph.c
     @@
    -+#include "object-store.h"
     +#include "commit-graph.h"
     +
     +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
    @@ -179,9 +177,8 @@
     +{
     +	struct commit_graph *g;
     +
    -+	g = parse_commit_graph((void *) data, -1, size);
    -+	if (g)
    -+		free(g);
    ++	g = parse_commit_graph((void *)data, -1, size);
    ++	free(g);
     +
     +	return 0;
     +}
2:  ad2e761f44 ! 2:  af45c2337f commit-graph: fix buffer read-overflow
    @@ -22,7 +22,8 @@
     +		uint64_t chunk_offset;
      		int chunk_repeated = 0;
      
    -+		if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH > data + graph_size) {
    ++		if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH >
    ++		    data + graph_size) {
     +			error(_("chunk lookup table entry missing; graph file may be incomplete"));
     +			free(graph);
     +			return NULL;
    @@ -34,3 +35,49 @@
      		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
      
      		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
    +
    + diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
    + --- a/t/t5318-commit-graph.sh
    + +++ b/t/t5318-commit-graph.sh
    +@@
    + 	test_i18ngrep "$grepstr" err
    + }
    + 
    ++
    ++# usage: corrupt_and_zero_graph_then_verify <corrupt_position> <data> <zero_position> <string>
    ++# Manipulates the commit-graph file at <corrupt_position> by inserting the data,
    ++# then zeros the file starting at <zero_position>. Finally, runs
    ++# 'git commit-graph verify' and places the output in the file 'err'. Tests 'err'
    ++# for the given string.
    ++corrupt_and_zero_graph_then_verify() {
    ++	corrupt_pos=$1
    ++	data="${2:-\0}"
    ++	zero_pos=$3
    ++	grepstr=$4
    ++	orig_size=$(stat --format=%s $objdir/info/commit-graph)
    ++	cd "$TRASH_DIRECTORY/full" &&
    ++	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
    ++	cp $objdir/info/commit-graph commit-graph-backup &&
    ++	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$corrupt_pos" conv=notrunc &&
    ++	truncate --size=$zero_pos $objdir/info/commit-graph &&
    ++	truncate --size=$orig_size $objdir/info/commit-graph &&
    ++	test_must_fail git commit-graph verify 2>test_err &&
    ++	grep -v "^+" test_err >err &&
    ++	test_i18ngrep "$grepstr" err
    ++}
    ++
    + test_expect_success 'detect bad signature' '
    + 	corrupt_graph_and_verify 0 "\0" \
    + 		"graph signature"
    +@@
    + 		"incorrect checksum"
    + '
    + 
    ++test_expect_success 'detect truncated graph' '
    ++	corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
    ++		$GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing"
    ++'
    ++
    + test_expect_success 'git fsck (checks commit-graph)' '
    + 	cd "$TRASH_DIRECTORY/full" &&
    + 	git fsck &&
-:  ---------- > 3:  7519fc76df Makefile: correct example fuzz build
-- 
2.20.0.rc2.10.g7519fc76df


^ permalink raw reply

* Re: git, monorepos, and access control
From: Johannes Schindelin @ 2018-12-06 20:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Coiner, John, git@vger.kernel.org
In-Reply-To: <20181205210104.GC19936@sigill.intra.peff.net>

Hi,

On Wed, 5 Dec 2018, Jeff King wrote:

> The model that fits more naturally with how Git is implemented would be
> to use submodules. There you leak the hash of the commit from the
> private submodule, but that's probably obscure enough (and if you're
> really worried, you can add a random nonce to the commit messages in the
> submodule to make their hashes unguessable).

I hear myself frequently saying: "Friends don't let friends use
submodules". It's almost like: "Some people think their problem is solved
by using submodules. Only now they have two problems."

There are big reasons, after all, why some companies go for monorepos: it
is not for lack of trying to go with submodules, it is the problems that
were incurred by trying to treat entire repositories the same as single
files (or even trees): they are just too different.

In a previous life, I also tried to go for submodules, was burned, and had
to restart the whole thing. We ended up with something that might work in
this instance, too, although our use case was not need-to-know type of
encapsulation. What we went for was straight up modularization.

What I mean is that we split the project up into over 100 individual
projects that are now all maintained in individual repositories, and they
are connected completely outside of Git, via a dependency management
system (in this case, Maven, although that is probably too Java-centric
for AMD's needs).

I just wanted to throw that out here: if you can split up your project
into individual projects, it might make sense not to maintain them as
submodules but instead as individual repositories whose artifacts are
uploaded into a central, versioned artifact store (Maven, NuGet, etc). And
those artifacts would then be retrieved by the projects that need them.

I figure that that scheme might work for you better than submodules: I
could imagine that you need to make the build artifacts available even to
people who are not permitted to look at the corresponding source code,
anyway.

Ciao,
Johannes

^ permalink raw reply

* Re: enhancement: support for author.email and author.name in "git config"
From: William Hubbs @ 2018-12-06 19:17 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, chutzpah, williamh
In-Reply-To: <CAN0heSp2g0A82YYvMw-RaERESXFtj3TgPKA3RysC07Lf=tHBcg@mail.gmail.com>

On Thu, Dec 06, 2018 at 07:38:08PM +0100, Martin Ågren wrote:
> Hi William,
> 
> [...]
> 
> This idea was floated a couple of months ago [1]. Junio seemed to find
> the request sensible and outlined a design. No patches materialized, as
> far as I know, but that could be because Eric suggested a tool called
> direnv. Maybe that would work for you.
 
 Yes, this design would solve the issue.

 I'll take a look at direnv, but it would be nice to have native git
 support for this. :-)

Thanks,

William

> [1] https://public-inbox.org/git/0f66ad7a-2289-2cce-6533-a27e19945187@rasmusvillemoes.dk/
> 
> Martin

^ permalink raw reply

* Re: BUG: CR marker ^M doesn't show up in '-' lines of diffs when the ending of the removed line is CR+LF
From: Frank Schäfer @ 2018-12-06 18:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, brian m. carlson, git
In-Reply-To: <xmqq1s6vbhdr.fsf@gitster-ct.c.googlers.com>


Am 06.12.18 um 01:58 schrieb Junio C Hamano:
> Frank Schäfer <fschaefer.oss@googlemail.com> writes:
>
>> Just to be sure that I'm not missing anything here:
>> What's your definition of "LF in repository, CRLF in working tree" in
>> terms of config parameters ?
> :::Documentation/config/core.txt:::
>
> core.autocrlf::
> 	Setting this variable to "true" is the same as setting
> 	the `text` attribute to "auto" on all files and core.eol to "crlf".
> 	Set to true if you want to have `CRLF` line endings in your
> 	working directory and the repository has LF line endings.
> 	This variable can be set to 'input',
> 	in which case no output conversion is performed.
Argh... crap. I was missing that I have to set the "text" attribute
manually...
Thats why core.eol=crlf didn't make a difference in my tests. :/

Let me thoroughly re-validate all cases.
I will likely not find the time for that before Monday, but I think that
break could be helpful. ;)

Regards,
Frank


^ permalink raw reply

* Re: [RFC PATCH] Introduce "precious" file concept
From: Duy Nguyen @ 2018-12-06 18:39 UTC (permalink / raw)
  To: Jacob Keller
  Cc: per.lundberg, Ævar Arnfjörð Bjarmason,
	brian m. carlson, Git Mailing List, jost, Joshua Jensen,
	Junio C Hamano, git, Clemens Buchacher, Holger Hellmuth (IKS),
	Kevin Ballard
In-Reply-To: <CA+P7+xri1=peNpEiZCE802HwCXhojyp2BDvOR+6BBSoRtsZyzA@mail.gmail.com>

On Tue, Nov 27, 2018 at 1:56 PM Jacob Keller <jacob.keller@gmail.com> wrote:
> Personally, I would rather err on the side which requires the least
> interaction from users to avoid silently clobbering an ignored file.
>
> Either Duy's solution with a sort of "untracked" reflog, or the
> garbage/trashable notion.

The "untracked reflog" is partially functional now [1] if you want to
have a look. I'm not going to post the series until post-2.20, but if
you do look, I suggest the first patch that lays out the design and a
plumbing command to manage it. Basically you'll do

    git backup-log --id=worktree log <some-path>

then pick up the version you like with "git backup-log cat" and do
whatever you want with it. High level UI is not there and will be a
topic of discussion.

The precious/trashable/garbage notion can be used to suppress making backups.

[1] https://gitlab.com/pclouds/git/commits/backup-log
-- 
Duy

^ permalink raw reply

* Re: enhancement: support for author.email and author.name in "git config"
From: Martin Ågren @ 2018-12-06 18:38 UTC (permalink / raw)
  To: williamh; +Cc: Git Mailing List, chutzpah
In-Reply-To: <20181206181739.GB30045@whubbs1.gaikai.biz>

Hi William,

On Thu, 6 Dec 2018 at 19:18, William Hubbs <williamh@gentoo.org> wrote:
> We are in a situation where we would like to use author information that is
> different from committer information when we commit to certain
> repositories.

[...]

> [...] I would like to propose the addition of author.email and
> author.name settings to the git-config system.
>
> Additionally you could add committer.name and committer.email, but the
> only reason I bring the committer variations up is consistency since I
> see you also have GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment
> variables.

This idea was floated a couple of months ago [1]. Junio seemed to find
the request sensible and outlined a design. No patches materialized, as
far as I know, but that could be because Eric suggested a tool called
direnv. Maybe that would work for you.

[1] https://public-inbox.org/git/0f66ad7a-2289-2cce-6533-a27e19945187@rasmusvillemoes.dk/

Martin

^ permalink raw reply

* Re: [wishlist] git submodule update --reset-hard
From: Stefan Beller @ 2018-12-06 18:29 UTC (permalink / raw)
  To: Yaroslav Halchenko; +Cc: git
In-Reply-To: <20181206173554.GH4633@hopa.kiewit.dartmouth.edu>

On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko <yoh@onerussian.com> wrote:
>
> Dear Git Gurus,
>
> I wondered what would be your take on my wishlist request to add
> --reset-hard option, which would be very similar to regular "update" which
> checks out necessary commit, but I want it to remain in the branch.

What if the branch differs from the sha1 recorded in the superproject?

> Rationale: In DataLad we heavily rely on submodules, and we have established
> easy ways to do some manipulations across full hierarchies of them. E.g. a
> single command could introduce a good number of commits across deep hierarchy
> of submodules, e.g. while committing changes within deep submodule, while also
> doing all necessary commits in the repositories leading to that submodule so
> the entire tree of them stays in a "clean" state. The difficulty comes when
> there is a need to just "forget" some changes.  The obvious way is to e.g.
>
>    git reset --hard PREVIOUS_STATE

  git reset --hard --recurse-submodules HEAD

would do the trick

> in the top level repository.  But that leaves all the submodules now in
> the undesired state.  If I do

undesirable in the sense of still having local changes (that is what
the above reset with `--recurse` would fix) or changed the branch
state? (i.e. is detached but was on a branch before?)

>   git submodule update --recursive
>
> I would end up in the detached HEADs within submodules.
>
> What I want is to retain current branch they are at (or may be possible
> "were in"? reflog records might have that information)

So something like

  git submodule foreach --recursive git reset --hard

?

You may be interested in
https://public-inbox.org/git/20180927221603.148025-1-sbeller@google.com/
which introduces a switch `submodule.repoLike [ = true]`, which
when set would not detach HEAD in submodules.

Can you say more about the first question above:
Would you typically have situations where the
submodule branch is out of sync with the superproject
and how do you deal with that?

Adding another mode to `git submodule update` sounds
reasonable to me, too.

Stefan

^ permalink raw reply

* enhancement: support for author.email and author.name in "git config"
From: William Hubbs @ 2018-12-06 18:17 UTC (permalink / raw)
  To: git; +Cc: chutzpah, williamh

Hi all,

We are in a situation where we would like to use author information that is
different from committer information when we commit to certain
repositories.

Currently, it looks like there are two ways to do this, and I'm not sure
how to make either of them work well.

There are the GIT_AUTHOR_EMAIL and GIT_AUTHOR_NAME environment
variables, but  these would have to be set globally. Also, there is the
--author command line switch for the "git commit" command, but this is
easy to forget to use.

Is there something I'm missing?

If not, I would like to propose the addition of author.email and
author.name settings to the git-config system.

Additionally you could add committer.name and committer.email, but the
only reason I bring the committer variations up is consistency since I
see you also have GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment
variables.

Does anyone have any thoughts on this?

I don't think either of us is on the mailing list, so please keep us in
CC's when you reply.

Thanks,

William

^ permalink raw reply

* Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others
From: Stefan Beller @ 2018-12-06 18:11 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Ævar Arnfjörð Bjarmason, git
In-Reply-To: <b23422fa-8a1d-e508-a008-a2fe27b7b49f@talktalk.net>

On Thu, Dec 6, 2018 at 6:58 AM Phillip Wood <phillip.wood@talktalk.net> wrote:

> > So is there some "must be at least two consecutive lines" condition for
> > not-plain, or is something else going on here?
>
> To be considered a block has to have 20 alphanumeric characters - see
> commit f0b8fb6e59 ("diff: define block by number of alphanumeric chars",
> 2017-08-15). This stops things like random '}' lines being marked as
> moved on their own.

This is spot on.

All but the "plain" mode use the concept of "blocks" of code
(there is even one mode called "blocks", which adds to the confusion).

> It might be better to use some kind of frequency
> information (a bit like python's difflib junk parameter) instead so that
> (fairly) unique short lines also get marked properly.

Yes that is what I was initially thinking about. However to have good
information, you'd need to index a whole lot (the whole repository,
i.e. all text blobs in existence?) to get an accurate picture of frequency
information, which I'd prefer to call entropy as I come from a background
familiar with https://en.wikipedia.org/wiki/Information_theory, I am not
sure where 'frequency information' comes from -- it sounds like the
same concept.

Of course it is too expensive to run an operation O(repository size)
just for this diff, so maybe we could get away with some smaller
corpus to build up this information on what is sufficient for coloring.

When only looking at the given diff, I would imagine that each line
would not carry a whole lot of information as its characters occur
rather frequently compared to the rest of the diff.

Best,
Stefan

^ permalink raw reply

* [wishlist] git submodule update --reset-hard
From: Yaroslav Halchenko @ 2018-12-06 17:35 UTC (permalink / raw)
  To: git@vger.kernel.org

Dear Git Gurus,

I wondered what would be your take on my wishlist request to add
--reset-hard option, which would be very similar to regular "update" which
checks out necessary commit, but I want it to remain in the branch.

Rationale: In DataLad we heavily rely on submodules, and we have established
easy ways to do some manipulations across full hierarchies of them. E.g. a
single command could introduce a good number of commits across deep hierarchy
of submodules, e.g. while committing changes within deep submodule, while also
doing all necessary commits in the repositories leading to that submodule so
the entire tree of them stays in a "clean" state. The difficulty comes when
there is a need to just "forget" some changes.  The obvious way is to e.g. 

   git reset --hard PREVIOUS_STATE

in the top level repository.  But that leaves all the submodules now in
the undesired state.  If I do

  git submodule update --recursive

I would end up in the detached HEADs within submodules.  

What I want is to retain current branch they are at (or may be possible
"were in"? reflog records might have that information)

Example:

# Have to use datalad install  since  git clone --recurse-submodules
# seems to not consider alternative locations for submodules' .git/
# with url being just a relative path, and where submodules aren't 
# all residing up under toplevel URL .git/

	$> datalad install -r http://datasets.datalad.org/labs/gobbini/.git
	[INFO   ] Cloning http://datasets.datalad.org/labs/gobbini/.git into '/tmp/gobbini' 
	install(ok): /tmp/gobbini (dataset)                                                                             
	[INFO   ] Installing <Dataset path=/tmp/gobbini> recursively 
	[INFO   ] Cloning http://datasets.datalad.org/labs/gobbini/famface/.git into '/tmp/gobbini/famface' 
	[INFO   ] Cloning http://datasets.datalad.org/labs/gobbini/famface/data/.git into '/tmp/gobbini/famface/data'   
	[INFO   ] access to dataset sibling "datasets.datalad.org" not auto-enabled, enable with:                       
	| 		datalad siblings -d "/tmp/gobbini/famface/data" enable -s datasets.datalad.org 
	[INFO   ] Cloning http://datasets.datalad.org/labs/gobbini/famface/data/scripts/mridefacer/.git [2 other candidates] into '/tmp/gobbini/famface/data/scripts/mridefacer' 
	action summary:                                                                                                 
	  install (ok: 4)

so I have a hierarchy in a good state and all checked out in master
branch

	$> cd gobbini

	$> git submodule status --recursive       
	 b9071a6bc9f7665f7c75549c63d29f16d40e8af7 famface (heads/master)
	 e59ba76b42f219bdf14b6b547dd6d9cc0ed5227f famface/data (BIDS-v1.0.1-3-ge59ba76b)
	 5d8036c0aaeebb448a00df6296ddc9f799efdd1f famface/data/scripts/mridefacer (heads/master)

	$> git submodule foreach --recursive cat .git/HEAD                 
	Entering 'famface'
	ref: refs/heads/master
	Entering 'famface/data'
	ref: refs/heads/master
	Entering 'famface/data/scripts/mridefacer'
	ref: refs/heads/master


and if I do roll back

	$> git reset --hard HEAD^^^        
	HEAD is now at 9b4296d [DATALAD] aggregated meta data
	changes on filesystem:                                                                                          
	 famface | 2 +-

and default update --recursive

	$> git submodule update --recursive
	Submodule path 'famface': checked out '2569ab436501a832d35afbbe9cc20ffeb6077eb1'
	Submodule path 'famface/data': checked out 'f1e8c9b8b025c311424283b9711efc6bc906ba2b'
	Submodule path 'famface/data/scripts/mridefacer': checked out '49b0fe42696724c2a8492f999736056e51b77358'

I end up in detached HEADs

	$> git submodule status --recursive 
	 2569ab436501a832d35afbbe9cc20ffeb6077eb1 famface (2569ab4)
	 f1e8c9b8b025c311424283b9711efc6bc906ba2b famface/data (BIDS-v1.0.1)
	 49b0fe42696724c2a8492f999736056e51b77358 famface/data/scripts/mridefacer (49b0fe4)


I do see that there is a "custom command" way to do it via
"submodule.<name>.update" config setting, but that is not easy to use for my
case since all the `<name>` would be different to specify !git reset --hard for
all of them via config option and I could not find any way to "glob" config
(like submodule.*.update).  But in effect that is probably what I need:

	# restarting from a clean state here
	$> git -c submodule.famface.update='!git reset --hard' submodule update --recursive    
	HEAD is now at 2569ab4 [DATALAD] aggregated meta data
	Submodule path 'famface': 'git reset --hard 2569ab436501a832d35afbbe9cc20ffeb6077eb1'
	Submodule path 'famface/data': checked out 'f1e8c9b8b025c311424283b9711efc6bc906ba2b'
	Submodule path 'famface/data/scripts/mridefacer': checked out '49b0fe42696724c2a8492f999736056e51b77358'

	$> git submodule status --recursive                                                
	 2569ab436501a832d35afbbe9cc20ffeb6077eb1 famface (heads/master)
	 f1e8c9b8b025c311424283b9711efc6bc906ba2b famface/data (BIDS-v1.0.1)
	 49b0fe42696724c2a8492f999736056e51b77358 famface/data/scripts/mridefacer (49b0fe4)


Corner cases I see which might make it trickier for a full blown
solution (might be relevant to current state as well for other
strategies):

-  If between those commits we got an additional submodule added (in
   immediate repository or within one of the subdatasets), ideally it
   should also be wiped out

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        

^ permalink raw reply

* Re: [BUG REPORT] Git does not correctly replay bisect log
From: Lukáš Krejčí @ 2018-12-06 17:30 UTC (permalink / raw)
  To: Christian Couder; +Cc: git
In-Reply-To: <CAP8UFD3VCzEdtfRqi_d2ibPtTN0uocGW+CshY_5m16TW1_YUdw@mail.gmail.com>

On Thu, 2018-12-06 at 17:31 +0100, Christian Couder wrote:
> > When Git replays the bisect log, it only updates refs/bisect/bad,
> > refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in
> > .git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies
> > that all good commits are ancestors of the bad commit, and if not, the
> > message "Bisecting: a merge base must be tested" is printed and the
> > branch is switched to the merge base of the bad and all the good
> > commits.
> 
> I am not sure if you are talking about running `git bisect replay` or
> sourcing the log in the above.

I am talking about `git bisect replay`. The shell script, as far as I
can see, only updates the references (ref/bisect/*) and never checks if
the revisions marked as 'good' are ancestors of the 'bad' one.
Therefore, $GIT_DIR/BISECT_ANCESTORS_OK file is never created.

The first time the ancestors are checked is in the helper (`git-bisect-
-help --next-all`) that has only limited information from refs/bisect*.


^ permalink raw reply

* Re: [BUG REPORT] Git does not correctly replay bisect log
From: Christian Couder @ 2018-12-06 16:31 UTC (permalink / raw)
  To: lskrejci; +Cc: git
In-Reply-To: <278123655fbdb565aed16bba804711774716c554.camel@gmail.com>

Hi,

On Thu, Dec 6, 2018 at 3:43 PM Lukáš Krejčí <lskrejci@gmail.com> wrote:
>
> Hello again,
>
> after looking into this today, I'm not sure if this can be considered a
> bug - it's just that I expected Git to check out the exact commit to
> test that was there before resetting the bisect. That made me uncertain
> whether Git restored the correct state.
>
> When I looked at what Git actually does, it became clear that the
> behavior is not incorrect but perhaps a bit surprising.

Yeah, I agree. I suspect, but I am not sure, that the difference of
behavior is because in one case we only check merge bases once at the
beginning (maybe because the BISECT_ANCESTORS_OK file always exists)
while in the other case we check them more than once during the
bisection. I haven't had time to look closely at this, but I would
like to.

> When Git replays the bisect log, it only updates refs/bisect/bad,
> refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in
> .git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies
> that all good commits are ancestors of the bad commit, and if not, the
> message "Bisecting: a merge base must be tested" is printed and the
> branch is switched to the merge base of the bad and all the good
> commits.

I am not sure if you are talking about running `git bisect replay` or
sourcing the log in the above.

> Basically, some state is lost because Git "forgot" the first good
> commit from the log already was an ancestor of the first bad one.

The BISECT_ANCESTORS_OK file should be there to avoid forgetting that
we already checked the merge bases.

> In other words, it's as if I just started the bisect with the following
> commands and just pasted the whole bisect log to .git/BISECT_LOG:
>
> $ git bisect start
> $ git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703
> $ git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d
> $ git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0
> Bisecting: a merge base must be tested
> [1e4b044d22517cae7047c99038abb444423243ca] Linux 4.18-rc4

Yeah, when we start a new bisection the BISECT_ANCESTORS_OK file
should be erased if it exists, while it shouldn't be erased when we
are already in the middle of an existing bisection.

[...]

> And indeed, marking the merge base as good switches to the correct
> commit after the bisect. Marking it as bad will fail, so at least you
> can't make a mistake after replaying the bisect log:
> $ git bisect bad
> The merge base 1e4b044d22517cae7047c99038abb444423243ca is bad.
> This means the bug has been fixed between 1e4b044d22517cae7047c99038abb444423243ca and [94710cac0ef4ee177a63b5227664b38c95bbf703 958f338e96f874a0d29442396d6adf9c1e17aa2d].

Yeah, I think this works as expected.

> Once again, I'm sorry for the noise. I guess it wasn't clear from the
> man page that something like this could happen and that made me think
> that this was a bug.

No reason to be sorry, there might still be a bug related to the
BISECT_ANCESTORS_OK file or something. I hope I can take a look at
this more closely soon.

Thanks for your report and your work on this,
Christian.

^ permalink raw reply

* [PATCH] Indent code with TABs
From: Nguyễn Thái Ngọc Duy @ 2018-12-06 15:42 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

We indent with TABs and sometimes for fine alignment, TABs followed by
spaces, but never all spaces (unless the indentation is less than 8
columns). Indenting with spaces slips through in some places. Fix
them.

Imported code and compat/ are left alone on purpose. The former should
remain as close as upstream as possible. The latter pretty much has
separate maintainers, it's up to them to decide.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Should be quite safe to merge since "git diff -b" is empty

 archive-tar.c     |   2 +-
 archive.c         |   4 +-
 builtin/add.c     |   2 +-
 builtin/gc.c      |   2 +-
 cache-tree.c      |   2 +-
 convert.c         |   6 +--
 git-compat-util.h |   2 +-
 parse-options.c   |   2 +-
 parse-options.h   |   6 +--
 quote.c           |   2 +-
 read-cache.c      | 118 +++++++++++++++++++++++-----------------------
 revision.c        |   4 +-
 symlinks.c        |   2 +-
 13 files changed, 77 insertions(+), 77 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index a58e1a8ebf..4aabd566fb 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -142,7 +142,7 @@ static int stream_blocked(const struct object_id *oid)
  * string and appends it to a struct strbuf.
  */
 static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
-                                     const char *value, unsigned int valuelen)
+				     const char *value, unsigned int valuelen)
 {
 	int len, tmp;
 
diff --git a/archive.c b/archive.c
index fd556c28e4..c2fe16ad33 100644
--- a/archive.c
+++ b/archive.c
@@ -36,8 +36,8 @@ void init_archivers(void)
 }
 
 static void format_subst(const struct commit *commit,
-                         const char *src, size_t len,
-                         struct strbuf *buf)
+			 const char *src, size_t len,
+			 struct strbuf *buf)
 {
 	char *to_free = NULL;
 	struct strbuf fmt = STRBUF_INIT;
diff --git a/builtin/add.c b/builtin/add.c
index f65c172299..12247b48fd 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -176,7 +176,7 @@ static void refresh(int verbose, const struct pathspec *pathspec)
 			die(_("pathspec '%s' did not match any files"),
 			    pathspec->items[i].match);
 	}
-        free(seen);
+	free(seen);
 }
 
 int run_add_interactive(const char *revision, const char *patch_mode,
diff --git a/builtin/gc.c b/builtin/gc.c
index 871a56f1c5..7696017cd4 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -317,7 +317,7 @@ static void add_repack_all_option(struct string_list *keep_pack)
 
 static void add_repack_incremental_option(void)
 {
-       argv_array_push(&repack, "--no-write-bitmap-index");
+	argv_array_push(&repack, "--no-write-bitmap-index");
 }
 
 static int need_to_gc(void)
diff --git a/cache-tree.c b/cache-tree.c
index 9d454d24bc..d6dbbebfb2 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -448,7 +448,7 @@ int cache_tree_update(struct index_state *istate, int flags)
 }
 
 static void write_one(struct strbuf *buffer, struct cache_tree *it,
-                      const char *path, int pathlen)
+		      const char *path, int pathlen)
 {
 	int i;
 
diff --git a/convert.c b/convert.c
index e0848226d2..5f60c11ce0 100644
--- a/convert.c
+++ b/convert.c
@@ -705,7 +705,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 }
 
 static int apply_single_file_filter(const char *path, const char *src, size_t len, int fd,
-                        struct strbuf *dst, const char *cmd)
+				    struct strbuf *dst, const char *cmd)
 {
 	/*
 	 * Create a pipeline to have the command filter the buffer's
@@ -1091,7 +1091,7 @@ static int count_ident(const char *cp, unsigned long size)
 }
 
 static int ident_to_git(const char *path, const char *src, size_t len,
-                        struct strbuf *buf, int ident)
+			struct strbuf *buf, int ident)
 {
 	char *dst, *dollar;
 
@@ -1135,7 +1135,7 @@ static int ident_to_git(const char *path, const char *src, size_t len,
 }
 
 static int ident_to_worktree(const char *path, const char *src, size_t len,
-                             struct strbuf *buf, int ident)
+			     struct strbuf *buf, int ident)
 {
 	struct object_id oid;
 	char *to_free = NULL, *dollar, *spc;
diff --git a/git-compat-util.h b/git-compat-util.h
index 09b0102cae..f281aa5185 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -721,7 +721,7 @@ extern const char *githstrerror(int herror);
 #ifdef NO_MEMMEM
 #define memmem gitmemmem
 void *gitmemmem(const void *haystack, size_t haystacklen,
-                const void *needle, size_t needlelen);
+		const void *needle, size_t needlelen);
 #endif
 
 #ifdef OVERRIDE_STRDUP
diff --git a/parse-options.c b/parse-options.c
index 3b874a83a0..27353c8e8d 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -236,7 +236,7 @@ static int parse_short_opt(struct parse_opt_ctx_t *p, const struct option *optio
 }
 
 static int parse_long_opt(struct parse_opt_ctx_t *p, const char *arg,
-                          const struct option *options)
+			  const struct option *options)
 {
 	const struct option *all_opts = options;
 	const char *arg_end = strchrnul(arg, '=');
diff --git a/parse-options.h b/parse-options.h
index 6c4fe2016d..bd88f6424a 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -175,11 +175,11 @@ struct option {
  * Returns the number of arguments left in argv[].
  */
 extern int parse_options(int argc, const char **argv, const char *prefix,
-                         const struct option *options,
-                         const char * const usagestr[], int flags);
+			 const struct option *options,
+			 const char * const usagestr[], int flags);
 
 extern NORETURN void usage_with_options(const char * const *usagestr,
-                                        const struct option *options);
+					const struct option *options);
 
 extern NORETURN void usage_msg_opt(const char *msg,
 				   const char * const *usagestr,
diff --git a/quote.c b/quote.c
index c95dd2cafb..7f2aa6faa4 100644
--- a/quote.c
+++ b/quote.c
@@ -234,7 +234,7 @@ static size_t next_quote_pos(const char *s, ssize_t maxlen)
  *     Return value is the same as in (1).
  */
 static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
-                                    struct strbuf *sb, FILE *fp, int no_dq)
+				    struct strbuf *sb, FILE *fp, int no_dq)
 {
 #undef EMIT
 #define EMIT(c)                                 \
diff --git a/read-cache.c b/read-cache.c
index bd45dc3e24..3428d34f34 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -3494,71 +3494,71 @@ static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context,
 
 static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset)
 {
-       const char *index = NULL;
-       uint32_t extsize, ext_version;
-       struct index_entry_offset_table *ieot;
-       int i, nr;
-
-       /* find the IEOT extension */
-       if (!offset)
-	       return NULL;
-       while (offset <= mmap_size - the_hash_algo->rawsz - 8) {
-	       extsize = get_be32(mmap + offset + 4);
-	       if (CACHE_EXT((mmap + offset)) == CACHE_EXT_INDEXENTRYOFFSETTABLE) {
-		       index = mmap + offset + 4 + 4;
-		       break;
-	       }
-	       offset += 8;
-	       offset += extsize;
-       }
-       if (!index)
-	       return NULL;
-
-       /* validate the version is IEOT_VERSION */
-       ext_version = get_be32(index);
-       if (ext_version != IEOT_VERSION) {
-	       error("invalid IEOT version %d", ext_version);
-	       return NULL;
-       }
-       index += sizeof(uint32_t);
-
-       /* extension size - version bytes / bytes per entry */
-       nr = (extsize - sizeof(uint32_t)) / (sizeof(uint32_t) + sizeof(uint32_t));
-       if (!nr) {
-	       error("invalid number of IEOT entries %d", nr);
-	       return NULL;
-       }
-       ieot = xmalloc(sizeof(struct index_entry_offset_table)
-	       + (nr * sizeof(struct index_entry_offset)));
-       ieot->nr = nr;
-       for (i = 0; i < nr; i++) {
-	       ieot->entries[i].offset = get_be32(index);
-	       index += sizeof(uint32_t);
-	       ieot->entries[i].nr = get_be32(index);
-	       index += sizeof(uint32_t);
-       }
-
-       return ieot;
+	const char *index = NULL;
+	uint32_t extsize, ext_version;
+	struct index_entry_offset_table *ieot;
+	int i, nr;
+
+	/* find the IEOT extension */
+	if (!offset)
+		return NULL;
+	while (offset <= mmap_size - the_hash_algo->rawsz - 8) {
+		extsize = get_be32(mmap + offset + 4);
+		if (CACHE_EXT((mmap + offset)) == CACHE_EXT_INDEXENTRYOFFSETTABLE) {
+			index = mmap + offset + 4 + 4;
+			break;
+		}
+		offset += 8;
+		offset += extsize;
+	}
+	if (!index)
+		return NULL;
+
+	/* validate the version is IEOT_VERSION */
+	ext_version = get_be32(index);
+	if (ext_version != IEOT_VERSION) {
+		error("invalid IEOT version %d", ext_version);
+		return NULL;
+	}
+	index += sizeof(uint32_t);
+
+	/* extension size - version bytes / bytes per entry */
+	nr = (extsize - sizeof(uint32_t)) / (sizeof(uint32_t) + sizeof(uint32_t));
+	if (!nr) {
+		error("invalid number of IEOT entries %d", nr);
+		return NULL;
+	}
+	ieot = xmalloc(sizeof(struct index_entry_offset_table)
+		       + (nr * sizeof(struct index_entry_offset)));
+	ieot->nr = nr;
+	for (i = 0; i < nr; i++) {
+		ieot->entries[i].offset = get_be32(index);
+		index += sizeof(uint32_t);
+		ieot->entries[i].nr = get_be32(index);
+		index += sizeof(uint32_t);
+	}
+
+	return ieot;
 }
 
 static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot)
 {
-       uint32_t buffer;
-       int i;
+	uint32_t buffer;
+	int i;
 
-       /* version */
-       put_be32(&buffer, IEOT_VERSION);
-       strbuf_add(sb, &buffer, sizeof(uint32_t));
+	/* version */
+	put_be32(&buffer, IEOT_VERSION);
+	strbuf_add(sb, &buffer, sizeof(uint32_t));
 
-       /* ieot */
-       for (i = 0; i < ieot->nr; i++) {
+	/* ieot */
+	for (i = 0; i < ieot->nr; i++) {
 
-	       /* offset */
-	       put_be32(&buffer, ieot->entries[i].offset);
-	       strbuf_add(sb, &buffer, sizeof(uint32_t));
+		/* offset */
+		put_be32(&buffer, ieot->entries[i].offset);
+		strbuf_add(sb, &buffer, sizeof(uint32_t));
 
-	       /* count */
-	       put_be32(&buffer, ieot->entries[i].nr);
-	       strbuf_add(sb, &buffer, sizeof(uint32_t));
-       }
+		/* count */
+		put_be32(&buffer, ieot->entries[i].nr);
+		strbuf_add(sb, &buffer, sizeof(uint32_t));
+	}
 }
diff --git a/revision.c b/revision.c
index 13e0519c02..f9ca1a8d53 100644
--- a/revision.c
+++ b/revision.c
@@ -1495,8 +1495,8 @@ void repo_init_revisions(struct repository *r,
 }
 
 static void add_pending_commit_list(struct rev_info *revs,
-                                    struct commit_list *commit_list,
-                                    unsigned int flags)
+				    struct commit_list *commit_list,
+				    unsigned int flags)
 {
 	while (commit_list) {
 		struct object *object = &commit_list->item->object;
diff --git a/symlinks.c b/symlinks.c
index 5261e8cf49..69d458a24d 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -221,7 +221,7 @@ int has_symlink_leading_path(const char *name, int len)
  */
 int check_leading_path(const char *name, int len)
 {
-    return threaded_check_leading_path(&default_cache, name, len);
+	return threaded_check_leading_path(&default_cache, name, len);
 }
 
 /*
-- 
2.20.0.rc2.461.g47d1d16d8d


^ permalink raw reply related

* Re: gitweb: local configuration not found
From: Martin Mareš @ 2018-12-06 15:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Jonathan Nieder, git
In-Reply-To: <xmqq5zw7bhgv.fsf@gitster-ct.c.googlers.com>

Hello!

> Yeah, it does look indirect.  Despite what you said, it also would
> support users giving an absolute path via GITWEB_CONFIG.
> 
> With "use File::Spec", perhaps something like this?

Yes, this looks right.

				Martin

^ permalink raw reply

* Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others
From: Phillip Wood @ 2018-12-06 14:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Stefan Beller; +Cc: git
In-Reply-To: <87zhtiyd45.fsf@evledraar.gmail.com>

Hi Ævar

On 06/12/2018 13:54, Ævar Arnfjörð Bjarmason wrote:
> Let's ignore how bad this patch is for git.git, and just focus on how
> diff.colorMoved treats it:
> 
>      diff --git a/builtin/add.c b/builtin/add.c
>      index f65c172299..d1155322ef 100644
>      --- a/builtin/add.c
>      +++ b/builtin/add.c
>      @@ -6,5 +6,3 @@
>       #include "cache.h"
>      -#include "config.h"
>       #include "builtin.h"
>      -#include "lockfile.h"
>       #include "dir.h"
>      diff --git a/builtin/am.c b/builtin/am.c
>      index 8f27f3375b..eded15aa8a 100644
>      --- a/builtin/am.c
>      +++ b/builtin/am.c
>      @@ -6,3 +6,2 @@
>       #include "cache.h"
>      -#include "config.h"
>       #include "builtin.h"
>      diff --git a/builtin/blame.c b/builtin/blame.c
>      index 06a7163ffe..44a754f190 100644
>      --- a/builtin/blame.c
>      +++ b/builtin/blame.c
>      @@ -8,3 +8,2 @@
>       #include "cache.h"
>      -#include "config.h"
>       #include "color.h"
>      diff --git a/cache.h b/cache.h
>      index ca36b44ee0..ea8d60b94a 100644
>      --- a/cache.h
>      +++ b/cache.h
>      @@ -4,2 +4,4 @@
>       #include "git-compat-util.h"
>      +#include "config.h"
>      +#include "new.h"
>       #include "strbuf.h"
> 
> This is a common thing that's useful to have highlighted, e.g. we move
> includes of config.h to some common file, so I want to se all the
> deleted config.h lines as moved into the cache.h line, and then the
> "lockfile.h" I removed while I was at it plain remove, and the new
> "new.h" plain added.
> 
> Exactly that is what you get with diff.colorMoved=plain, but the default
> of diff.colorMoved=zebra gets confused by this and highlights no moves
> at all, same or "blocks" and "dimmed-zebra".
> 
> So at first I thought this had something to do with the many->one
> detection, but it seems to be simpler, we just don't detect a move of
> 1-line with anything but plain, e.g. this works as expected in all modes
> and detects the many->one:
> 
>      diff --git a/builtin/add.c b/builtin/add.c
>      index f65c172299..f4fda75890 100644
>      --- a/builtin/add.c
>      +++ b/builtin/add.c
>      @@ -5,4 +5,2 @@
>        */
>      -#include "cache.h"
>      -#include "config.h"
>       #include "builtin.h"
>      diff --git a/builtin/branch.c b/builtin/branch.c
>      index 0c55f7f065..52e39924d3 100644
>      --- a/builtin/branch.c
>      +++ b/builtin/branch.c
>      @@ -7,4 +7,2 @@
> 
>      -#include "cache.h"
>      -#include "config.h"
>       #include "color.h"
>      diff --git a/cache.h b/cache.h
>      index ca36b44ee0..d4146dbf8a 100644
>      --- a/cache.h
>      +++ b/cache.h
>      @@ -3,2 +3,4 @@
> 
>      +#include "cache.h"
>      +#include "config.h"
>       #include "git-compat-util.h"
> 
> So is there some "must be at least two consecutive lines" condition for
> not-plain, or is something else going on here?

To be considered a block has to have 20 alphanumeric characters - see 
commit f0b8fb6e59 ("diff: define block by number of alphanumeric chars", 
2017-08-15). This stops things like random '}' lines being marked as 
moved on their own. It might be better to use some kind of frequency 
information (a bit like python's difflib junk parameter) instead so that 
(fairly) unique short lines also get marked properly.

Best Wishes

Phillip

^ permalink raw reply

* Re: [BUG REPORT] Git does not correctly replay bisect log
From: Lukáš Krejčí @ 2018-12-06 14:43 UTC (permalink / raw)
  To: Christian Couder; +Cc: git
In-Reply-To: <CAP8UFD3cD5KtvPJK5WkWGVUT6grbL=xL2MV1YWNJGpOjD3uRiQ@mail.gmail.com>

Hello again,

after looking into this today, I'm not sure if this can be considered a
bug - it's just that I expected Git to check out the exact commit to
test that was there before resetting the bisect. That made me uncertain
whether Git restored the correct state.

When I looked at what Git actually does, it became clear that the
behavior is not incorrect but perhaps a bit surprising.

When Git replays the bisect log, it only updates refs/bisect/bad,
refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in
.git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies
that all good commits are ancestors of the bad commit, and if not, the
message "Bisecting: a merge base must be tested" is printed and the
branch is switched to the merge base of the bad and all the good
commits.

Basically, some state is lost because Git "forgot" the first good
commit from the log already was an ancestor of the first bad one.
In other words, it's as if I just started the bisect with the following
commands and just pasted the whole bisect log to .git/BISECT_LOG:

$ git bisect start
$ git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703
$ git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d
$ git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0
Bisecting: a merge base must be tested
[1e4b044d22517cae7047c99038abb444423243ca] Linux 4.18-rc4

(here's the full bisect log again for reference)
git bisect start
# bad: [5b394b2ddf0347bef56e50c69a58773c94343ff3] Linux 4.19-rc1
git bisect bad 5b394b2ddf0347bef56e50c69a58773c94343ff3
# good: [94710cac0ef4ee177a63b5227664b38c95bbf703] Linux 4.18
git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703
# bad: [54dbe75bbf1e189982516de179147208e90b5e45] Merge tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm
git bisect bad 54dbe75bbf1e189982516de179147208e90b5e45
# bad: [0a957467c5fd46142bc9c52758ffc552d4c5e2f7] x86: i8259: Add missing include file
git bisect bad 0a957467c5fd46142bc9c52758ffc552d4c5e2f7
# good: [958f338e96f874a0d29442396d6adf9c1e17aa2d] Merge branch 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d
# bad: [2c20443ec221dcb76484b30933593e8ecd836bbd] Merge tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 2c20443ec221dcb76484b30933593e8ecd836bbd
# bad: [c2fc71c9b74c1e87336a27dba1a5edc69d2690f1] Merge tag 'mtd/for-4.19' of git://git.infradead.org/linux-mtd
git bisect bad c2fc71c9b74c1e87336a27dba1a5edc69d2690f1
# bad: [b86d865cb1cae1e61527ea0b8977078bbf694328] blkcg: Make blkg_root_lookup() work for queues in bypass mode
git bisect bad b86d865cb1cae1e61527ea0b8977078bbf694328
# bad: [1b0d274523df5ef1caedc834da055ff721e4d4f0] nvmet: don't use uuid_le type
git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0

And indeed, marking the merge base as good switches to the correct
commit after the bisect. Marking it as bad will fail, so at least you
can't make a mistake after replaying the bisect log:
$ git bisect bad
The merge base 1e4b044d22517cae7047c99038abb444423243ca is bad.
This means the bug has been fixed between 1e4b044d22517cae7047c99038abb444423243ca and [94710cac0ef4ee177a63b5227664b38c95bbf703 958f338e96f874a0d29442396d6adf9c1e17aa2d].

Once again, I'm sorry for the noise. I guess it wasn't clear from the
man page that something like this could happen and that made me think
that this was a bug.


^ permalink raw reply

* How to investigate further a seemingly 'ghost' commit (after a merge)?
From: fan2linux @ 2018-12-06 14:45 UTC (permalink / raw)
  To: git
In-Reply-To: <2070302649.44051.1544107334387.JavaMail.zimbra@laposte.net>

Hello, 

I am trying to understand how a fix from a bug-correction branch vanished and 
the bug found its way back into the main branch after two merges. 

I am using git version 2.19.2. 

Checkouting tag 18.40.1 and checking its graph: 

> $ git checkout 18.40.1 
> 
> $ git log --oneline --graph 
> * ee4721a1 (HEAD, tag: 18.40.1) Merge branch 'suppressionNumAdhCTP' into 'homol2' 
> |\ 
> | * 52aeaf64 Retire le numéro de contrat et le numéro d'adhérent qui étaient affichés en haut de la page et empiétaient sur le préimprimé. 
> |/ 
> * 9f68ec4b (tag: 18.40.0) Merge branch 'cherry-pick-7c956b5f' into 'homol2' 

For the record, here is the git log excerpt on the file in question: 

> $ git log app/CTPCB-AC001/CTPCB-AC001.tex 
> commit 52aeaf64c808c1e3ee8c5cbf5f0221d4e8a7699d 
> Author: Guillaume Ballin < guillaume.ballin@mnt.fr > 
> Date: Tue Nov 13 11:52:03 2018 +0100 
> 
> Retire le numéro de contrat et le numéro d'adhérent qui étaient affichés en haut de la page et empiétaient sur le préimprimé. 
> 
> commit 9336db5c25bb0f3af19183fe1db2fb05ce28b9f3 
> Author: arnavander < arnavander@mnt.fr > 
> Date: Mon Nov 5 14:46:27 2018 +0000 
> 
> Correction retour anomalie de Carte TP 
> 
> Signed-off-by: arnavander < arnavander@mnt.fr > 
> 
> 
> (cherry picked from commit 7c956b5f29bf23c624684c9300a13abecfb451c5) 

Checkouting commit 25ca67a9 and checking its graph: 

> $ git checkout 25ca67a9 
> 
> $ git log --oneline --graph 
> * 25ca67a9 (HEAD) Merge branch 'homol2' into 'homol' 
> |\ 
> | * ee4721a1 (tag: 18.40.1) Merge branch 'suppressionNumAdhCTP' into 'homol2' 
> | |\ 
> | | * 52aeaf64 Retire le numéro de contrat et le numéro d'adhérent qui étaient affichés en haut de la page et empiétaient sur le préimprimé. 
> | |/ 
> | * 9f68ec4b (tag: 18.40.0) Merge branch 'cherry-pick-7c956b5f' into 'homol2' 

Checking the log of that file again: 

> $ git log app/CTPCB-AC001/CTPCB-AC001.tex 
> commit 7c956b5f29bf23c624684c9300a13abecfb451c5 
> Author: arnavander < arnavander@mnt.fr > 
> Date: Mon Nov 5 15:46:27 2018 +0100 
> 
> Correction retour anomalie de Carte TP 
> 
> Signed-off-by: arnavander < arnavander@mnt.fr > 

Commit 52aeaf64c808c1e3ee8c5cbf5f0221d4e8a7699d does not show up any more in 
the pointed to by commit 25ca67a9! 

The fix only deleted a block of lines: 

> $ git diff 9f68ec4b 52aeaf64 
> diff --git a/app/CTPCB-AC001/CTPCB-AC001.tex b/app/CTPCB-AC001/CTPCB-AC001.tex 
> index 148b3ed2..741d3dff 100644 
> --- a/app/CTPCB-AC001/CTPCB-AC001.tex 
> +++ b/app/CTPCB-AC001/CTPCB-AC001.tex 
> @@ -122,12 +122,6 @@ 
> } 
> 
> \newcommand{\CarteTP}{ 
> - \begin{textblock*}{200pt}(50pt,225pt)% 
> - \No contrat : \NUMADH\\ 
> - \ifdefined\NUMABA 
> - \No adhérent : \NUMABA\\ 
> - \fi 
> - \end{textblock*}% 
> \edToAddrList{DADDR} 
> 
> %PREMIERE CARTE 

How a commit can disapear like that from the log of a file and of course not 
patching the file while staying visible in the git log of the branch? 
It's just like if grandchildren forgetted grandparent commit while staying connected 
to parents... 

--
Fan2linux - G. Ballin

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox