Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Derrick Stolee @ 2020-09-01 13:04 UTC (permalink / raw)
  To: Jeff King, git
In-Reply-To: <20200901111800.GA3115584@coredump.intra.peff.net>

On 9/1/2020 7:18 AM, Jeff King wrote:
> This patch fixes a bug where xrealloc(ptr, 0) can double-free and
> corrupt the heap on some platforms (including at least glibc).

!!! Good find !!!

> The simplest fix here is to just pass "ret" (which we know to be NULL)
> to the follow-up realloc(). That does mean that a system which _doesn't_
> free the original pointer would leak it. But that interpretation of the
> standard seems unlikely (if a system didn't deallocate in this case, I'd
> expect it to simply return the original pointer). If it turns out to be
> an issue, we can handle the "!size" case up front instead, before we
> call realloc() at all.

Adding an `if (!size) {free(ptr); return NULL;}` block was what I
expected. Was that chosen just so we can rely more on the system
realloc(), or is there a performance implication that I'm not
seeing?

> @@ -120,7 +120,7 @@ void *xrealloc(void *ptr, size_t size)
>  	memory_limit_check(size, 0);
>  	ret = realloc(ptr, size);
>  	if (!ret && !size)
> -		ret = realloc(ptr, 1);
> +		ret = realloc(ret, 1);

I appreciate all the additional context for such a small change.

LGTM.
-Stolee

^ permalink raw reply

* Re: [PATCH v3 11/11] doc: add corrected commit date info
From: Abhishek Kumar @ 2020-09-01 13:01 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: abhishekkumar8222, git, gitgitgadget, jnareb
In-Reply-To: <e7bbce30-93a6-b7e2-844b-5f2af4dbddf3@gmail.com>

On Thu, Aug 27, 2020 at 09:15:56AM -0400, Derrick Stolee wrote:
> On 8/27/2020 2:39 AM, Abhishek Kumar wrote:
> > Thinking about this, I feel creating a new section called "Handling
> > Mixed Generation Number Chains" made more sense:
> > 
> >   ## Handling Mixed Generation Number Chains
> > 
> >   With the introduction of generation number v2 and generation data chunk,
> >   the following scenario is possible:
> > 
> >   1. "New" Git writes a commit-graph with a GDAT chunk.
> >   2. "Old" Git writes a split commit-graph on top without a GDAT chunk.
> 
> I like the idea of this section, and this setup is good.
> 
> >   The commits in the lower layer will be interpreted as having very large
> >   generation values (commit date plus offset) compared to the generation
> >   numbers in the top layer (toplogical level). This violates the
> >   expectation that the generation of a parent is strictly smaller than the
> >   generation of a child. In such cases, we revert to using topological
> >   levels for all layers to maintain backwards compatability.
> 
> s/toplogical/topological
> 
> But also, we don't want to phrase this as "in this case, we do the wrong
> thing" but instead
> 
>   A naive approach of using the newest available generation number from
>   each layer would lead to violated expectations: the lower layer would
>   use corrected commit dates which are much larger than the topological
>   levels of the higher layer. For this reason, Git inspects each layer
>   to see if any layer is missing corrected commit dates. In such a case,
>   Git only uses topological levels.
> 
> >   When writing a new layer in split commit-graph, we write a GDAT chunk
> >   only if the topmost layer has a GDAT chunk. This guarantees that if a
> >   lyer has GDAT chunk, all lower layers must have a GDAT chunk as well.
> 
> s/lyer/layer
> 
> Perhaps leaving this at a higher level than referencing "GDAT chunk" is
> advisable. Perhaps use "we write corrected commit dates" or "all lower
> layers must store corrected commit dates as well", for example.
> 
> >   Rewriting layers follows similar approach: if the topmost layer below
> >   set of layers being rewriteen (in the split commit-graph chain) exists,
> >   and it does not contain GDAT chunk, then the result of rewrite does not
> >   have GDAT chunks either.
> 
> This could use more positive language to make it clear that sometimes
> we _do_ want to write corrected commit dates when merging layers:
> 
>   When merging layers, we do not consider whether the merged layers had
>   corrected commit dates. Instead, the new layer will have corrected
>   commit dates if and only if all existing layers below the new layer
>   have corrected commit dates.

Thanks, that is a great suggestion! Using positive language is more
straightforward and easier to understand.

> 
> Thanks,
> -Stolee

^ permalink raw reply

* Re: Git in Outreachy?
From: Jeff King @ 2020-09-01 12:51 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Christian Couder, Johannes Schindelin
In-Reply-To: <20200831180537.GC331156@google.com>

On Mon, Aug 31, 2020 at 11:05:37AM -0700, Emily Shaffer wrote:

> I'm interested in mentoring or co-mentoring this term. (I'm not working
> this week but I didn't want to miss a deadline to volunteer.)

OK, between you and Christian, then, it sounds like it's worth pursuing.
I'll sign us up and try to arrange funding.

> One thought I had was whether it might be cool to shop for another
> co-mentor from Wireshark and have an intern teach Wireshark how to
> decipher Git protocol. But it seems large, and maybe last-minute for
> this term.

That sounds neat. I don't think it would be too large (in fact, I think
it might end up being a bit small for a whole-term project). It would
definitely require somebody from Wireshark being a co-mentor, though.

-Peff

^ permalink raw reply

* Re: [PATCH v3 05/11] commit-graph: return 64-bit generation number
From: Abhishek Kumar @ 2020-09-01 12:06 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: abhishekkumar8222, git, stolee, me
In-Reply-To: <85ft8adilr.fsf@gmail.com>

On Tue, Aug 25, 2020 at 02:18:24PM +0200, Jakub Narębski wrote:
> Abhishek Kumar <abhishekkumar8222@gmail.com> writes:
> 
> ...
> 
> However, as I wrote, handling GENERATION_NUMBER_V2_OFFSET_MAX is
> difficult.  As far as I can see, we can choose one of the *three*
> solutions (the third one is _new_):
> 
> a. store 64-bit corrected commit date in the GDAT chunk
>    all possible values are able to be stored, no need for
>    GENERATION_NUMBER_V2_MAX,
> 
> b. store 32-bit corrected commit date offset in the GDAT chunk,
>    if its value is larger than GENERATION_NUMBER_V2_OFFSET_MAX,
>    do not write GDAT chunk at all (like for backward compatibility
>    with mixed-version chains of split commit-graph layers),
> 
> c. store 32-bit corrected commit date offset in the GDAT chunk,
>    using some kind of overflow handling scheme; for example if
>    the most significant bit of 32-bit value is 1, then the
>    rest 31-bits are position in GDOV chunk, which uses 64-bit
>    to store those corrected commit date offsets that do not
>    fit in 32 bits.
> 

Alright, so the third solution leverages the fact that in practice,
very few offsets would overflow the 32-bit limit. Using 64-bits for all
offsets would be wasteful, we can trade off a miniscule amount of
computation to save large amounts of disk space.

>
> This type of schema is used in other places in Git code, if I remember
> it correctly.
> 

Yes, it's a similar idea to the extra edge list chunk, where the most
significant bit of second parent indicates whether they are more than
two parents.

It's definitely feasible, albeit a little complex.

What's the overall consensus on the third solution?

>
> >> The commit message says nothing about the new symbolic constant
> >> GENERATION_NUMBER_V1_INFINITY, though.
> >>
> >> I'm not sure it is even needed (see comments below).
> >
> > Yes, you are correct. I tried it out with your suggestions and it wasn't
> > really needed.
> >
> > Thanks for catching this!
> 
> Mistakes can happen when changig how the series is split into commits.
> 
> Best,
> -- 
> Jakub Narębski

Thanks
- Abhishek

^ permalink raw reply

* Re: [PATCH] core.abbrev <off|false|no> disables abbreviations
From: Derrick Stolee @ 2020-09-01 12:14 UTC (permalink / raw)
  To: Eric Wong, git; +Cc: brian m. carlson
In-Reply-To: <20200901074355.GA4498@dcvr>

On 9/1/2020 3:43 AM, Eric Wong wrote:
> These allows users to write hash-agnostic scripts and configs to
> disable abbreviations.  Using "-c core.abbrev=40" will be
> insufficient with SHA-256, and "-c core.abbrev=64" won't work
> with SHA-1 repos today.
> 
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>   I kinda wanted to allow a value of "max", but I figured the existing
>   boolean falsiness words might make more sense with `--no-abbrev' in
>   for some commands...  Naming is hard :x
> 
>  config.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/config.c b/config.c
> index 2bdff4457b..f2e09c72ca 100644
> --- a/config.c
> +++ b/config.c
> @@ -1217,6 +1217,10 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
>  			return config_error_nonbool(var);
>  		if (!strcasecmp(value, "auto"))
>  			default_abbrev = -1;
> +		else if (!strcasecmp(value, "false") ||
> +			 !strcasecmp(value, "no") ||
> +			 !strcasecmp(value, "off"))
> +			default_abbrev = the_hash_algo->hexsz;

I'm not sure we need three synonyms for "no-abbrev" here.

"false" would be natural, except I think in a few places
the config value "0" is also interpreted as "false", but
as seen below a value of "0" snaps up to the minimum
allowed abbreviation.

>  		else {
>  			int abbrev = git_config_int(var, value);
>  			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)

Perhaps "core.abbrev = never" would be a good option?

After we decide on the word, this patch needs:

* Updates to Documentation/config/core.txt
* A test that works with both hash versions.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info
From: Abhishek Kumar @ 2020-09-01 11:35 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: abhishekkumar8222, git, gitgitgadget, stolee
In-Reply-To: <85mu2jc75c.fsf@gmail.com>

On Tue, Aug 25, 2020 at 01:11:11PM +0200, Jakub Narębski wrote:
> Hello,
> 
> ...
> 
> All right.
> 
> We might want to add here the information that we also move loading the
> commit date from the commit-graph file from fill_commit_in_graph() down
> the [new] call chain into fill_commit_graph_info().  The commit date
> would be needed in fill_commit_graph_info() in the next commit to
> compute corrected commit date out of corrected commit date offset, and
> store it as generation number.
> 
> 
> NOTE that this means that if we switch to storing 64-bit corrected
> commit date directly in the commit-graph file, instead of storing 32-bit
> offsets, neither this Move Statement Into Function Out of Caller
> refactoring nor change to the 'generate tar with future mtime' test
> would be necessary.
> 
> >
> > The test 'generate tar with future mtime' creates a commit with commit
> > time of (2 ^ 36 + 1) seconds since EPOCH. The CDAT chunk provides
> > 34-bits for storing commiter date, thus committer time overflows into
> > generation number (within CDAT chunk) and has undefined behavior.
> >
> > The test used to pass as fill_commit_graph_info() would not set struct
> > member `date` of struct commit and loads committer date from the object
> > database, generating a tar file with the expected mtime.
> 
> I guess that in the case of generating a tar file we would read the
> commit out of 'object database', and then only add commit-graph specific
> info with fill_commit_graph_info().  Possibly because we need more
> information that commit-graph provides for a commit.
> 
> >
> > However, with corrected commit date, we will load the committer date
> > from CDAT chunk (truncated to lower 34-bits) to populate the generation
> > number. Thus, fill_commit_graph_info() sets date and generates tar file
> > with the truncated mtime and the test fails.
> >
> > Let's fix the test by setting a timestamp of (2 ^ 34 - 1) seconds, which
> > will not be truncated.
> 
> Now I got interested why the value of (2 ^ 36 + 1) seconds since EPOCH
> was used.
> 
> The commit that introduced the 'generate tar with future mtime' test,
> namely e51217e15 (t5000: test tar files that overflow ustar headers,
> 30-06-2016), says:
> 
> 	The ustar format only has room for 11 (or 12, depending on
> 	some implementations) octal digits for the size and mtime of
> 	each file. For values larger than this, we have to add pax
> 	extended headers to specify the real data, and git does not
> 	yet know how to do so.
> 
> 	Before fixing that, let's start off with some test
> 	infrastructure [...]
> 
> The value of 2 ^ 36 equals 2 ^ 3*12 = (2 ^ 3) ^ 12 = 8 ^ 12.
> So we need the value of (2 ^ 36 + 1) for this test do do its job.
> Possibly the value of 8 ^ 11 + 1 = 2 ^ 33 + 1 would be enough
> (if we skip testing "some implementations").
> 
> So I think to make this test more clear (for inquisitive minds) we
> should set a timestamp of (2 ^ 33 + 1), not (2 ^ 34 - 1) seconds
> since EPOCH.  Maybe even add a variant of this test that uses the
> origial value of (2 ^ 36 + 1) seconds since EPOCH, but turns off
> use of serialized commit-graph.

That's pretty interesting! I didn't look into this either, will modify
the existing test and add a new test for it.

Thanks for investigating this further.

> 
> I'm sorry for not checking this earlier.
> 
> Best,
> -- 
> Jakub Narębski

Thanks
- Abhishek

^ permalink raw reply

* Re: [PATCH v3 07/11] commit-graph: implement corrected commit date
From: Abhishek Kumar @ 2020-09-01 11:01 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: abhishekkumar8222, git, gitgitgadget, stolee
In-Reply-To: <85wo1nca3u.fsf@gmail.com>

On Tue, Aug 25, 2020 at 12:07:17PM +0200, Jakub Narębski wrote:
> Hello,
> 
> ...
> 
> I think I was not clear enough (in trying to be brief).  I meant here
> loading available generation numbers for use in graph traversal,
> done in later patches in this series.
> 
> In _next_ commit we store topological levels in `generation` field:
> 
>   @@ -755,7 +763,11 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
>    	date_low = get_be32(commit_data + g->hash_len + 12);
>    	item->date = (timestamp_t)((date_high << 32) | date_low);
> 
>   -	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
>   +	if (g->chunk_generation_data)
>   +		graph_data->generation = item->date +
>   +			(timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
>   +	else
>   +		graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> 
> 
> We use topo_level slab only when writing the commit-graph file.
> 

Right, I thought the agenda outlined points in the process of writing
commit-graph file.

>
> > We could avoid initializing topo_slab if we are not writing generation
> > data chunk (and thus don't need corrected commit dates) but that
> > wouldn't have an impact on run time while writing commit-graph because
> > computing corrected commit dates is cheap as the main cost is in walking
> > the graph and writing the file.
> 
> Right.
> 
> Though you need to add the cost of allocation and managing extra
> commit slab, I think that amortized cost is negligible.
> 
> But what would be better is showing benchmark data: does writing the
> commit graph without GDAT take not insigificant more time than without
> this patch?

Right, we could compare time taken by master and series until (but not
including this patcth) to write a commit-graph file. Will add.

> 
> [...]
> >>> @@ -2372,8 +2384,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
> >>>  	for (i = 0; i < g->num_commits; i++) {
> >>>  		struct commit *graph_commit, *odb_commit;
> >>>  		struct commit_list *graph_parents, *odb_parents;
> >>> -		timestamp_t max_generation = 0;
> >>> -		timestamp_t generation;
> >>> +		timestamp_t max_corrected_commit_date = 0;
> >>> +		timestamp_t corrected_commit_date;
> >>
> >> This is simple, and perhaps unnecessary, rename of variables.
> >> Shouldn't we however verify *both* topological level, and
> >> (if exists) corrected commit date?
> >
> > The problem with verifying both topological level and corrected commit
> > dates is that we would have to re-fill commit_graph_data slab with commit
> > data chunk as we cannot modify data->generation otherwise, essentially
> > repeating the whole verification process.
> >
> > While it's okay for now, I might take this up in a future series [1].
> >
> > [1]: https://lore.kernel.org/git/4043ffbc-84df-0cd6-5c75-af80383a56cf@gmail.com/
> 
> All right, I believe you that verifying both topological level and
> corrected commit date would be more difficult.
> 
> That doesn't change the conclusion that this variable should remain to
> be named `generation`, as when verifying GDAT-less commit-graph files it
> would check topological levels (it uses commit_graph_generation(), which
> in turn uses `generation` field in commit graph info, which as I have
> show above in later patch could be v1 or v2 generation number).
> 

Right, I completely misunderstood you initially. Reverted the variable
name changes.

> Best,
> -- 
> Jakub Narębski

^ permalink raw reply

* [PATCH v2 05/11] merge-resolve: rewrite in C
From: Alban Gruin @ 2020-09-01 10:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

This rewrites `git merge-resolve' from shell to C.  As for `git
merge-one-file', this port is not completely straightforward and removes
calls to external processes to avoid reading and writing the index over
and over again.

 - The call to `update-index -q --refresh' is replaced by a call to
   refresh_index().

 - The call to `read-tree' is replaced by a call to unpack_trees() (and
   all the setup needed).

 - The call to `write-tree' is replaced by a call to
   write_index_as_tree().

 - The call to `merge-index', needed to invoke `git merge-one-file', is
   replaced by a call to the new merge_all() function.  A callback
   function, merge_one_file_cb(), is added to allow it to call
   merge_one_file() without forking.

Here too, the index is read in cmd_merge_resolve(), but
merge_strategies_resolve() takes care of writing it back to the disk.

The parameters of merge_strategies_resolve() will be surprising at first
glance: why using a commit list for `bases' and `remote', where we could
use an oid array, and a pointer to an oid?  Because, in a later commit,
try_merge_strategy() will be able to call merge_strategies_resolve()
directly, and it already uses a commit list for `bases' (`common') and
`remote' (`remoteheads'), and a string for `head_arg'.  To reduce
frictions later, merge_strategies_resolve() takes the same types of
parameters.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 Makefile                |  2 +-
 builtin.h               |  1 +
 builtin/merge-resolve.c | 69 +++++++++++++++++++++++++++++++++
 git-merge-resolve.sh    | 54 --------------------------
 git.c                   |  1 +
 merge-strategies.c      | 85 +++++++++++++++++++++++++++++++++++++++++
 merge-strategies.h      |  5 +++
 7 files changed, 162 insertions(+), 55 deletions(-)
 create mode 100644 builtin/merge-resolve.c
 delete mode 100755 git-merge-resolve.sh

diff --git a/Makefile b/Makefile
index 8849d54063..929c3dc3eb 100644
--- a/Makefile
+++ b/Makefile
@@ -596,7 +596,6 @@ SCRIPT_SH += git-bisect.sh
 SCRIPT_SH += git-difftool--helper.sh
 SCRIPT_SH += git-filter-branch.sh
 SCRIPT_SH += git-merge-octopus.sh
-SCRIPT_SH += git-merge-resolve.sh
 SCRIPT_SH += git-mergetool.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-request-pull.sh
@@ -1092,6 +1091,7 @@ BUILTIN_OBJS += builtin/merge-index.o
 BUILTIN_OBJS += builtin/merge-one-file.o
 BUILTIN_OBJS += builtin/merge-ours.o
 BUILTIN_OBJS += builtin/merge-recursive.o
+BUILTIN_OBJS += builtin/merge-resolve.o
 BUILTIN_OBJS += builtin/merge-tree.o
 BUILTIN_OBJS += builtin/merge.o
 BUILTIN_OBJS += builtin/mktag.o
diff --git a/builtin.h b/builtin.h
index 9205d5ecdc..6ea207c9fd 100644
--- a/builtin.h
+++ b/builtin.h
@@ -174,6 +174,7 @@ int cmd_merge_ours(int argc, const char **argv, const char *prefix);
 int cmd_merge_file(int argc, const char **argv, const char *prefix);
 int cmd_merge_one_file(int argc, const char **argv, const char *prefix);
 int cmd_merge_recursive(int argc, const char **argv, const char *prefix);
+int cmd_merge_resolve(int argc, const char **argv, const char *prefix);
 int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 int cmd_mktag(int argc, const char **argv, const char *prefix);
 int cmd_mktree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/merge-resolve.c b/builtin/merge-resolve.c
new file mode 100644
index 0000000000..59f734473b
--- /dev/null
+++ b/builtin/merge-resolve.c
@@ -0,0 +1,69 @@
+/*
+ * Builtin "git merge-resolve"
+ *
+ * Copyright (c) 2020 Alban Gruin
+ *
+ * Based on git-merge-resolve.sh, written by Linus Torvalds and Junio C
+ * Hamano.
+ *
+ * Resolve two trees, using enhanced multi-base read-tree.
+ */
+
+#include "cache.h"
+#include "builtin.h"
+#include "merge-strategies.h"
+
+static const char builtin_merge_resolve_usage[] =
+	"git merge-resolve <bases>... -- <head> <remote>";
+
+int cmd_merge_resolve(int argc, const char **argv, const char *prefix)
+{
+	int i, is_baseless = 1, sep_seen = 0;
+	const char *head = NULL;
+	struct commit_list *bases = NULL, *remote = NULL;
+	struct commit_list **next_base = &bases;
+
+	if (argc < 5)
+		usage(builtin_merge_resolve_usage);
+
+	setup_work_tree();
+	if (repo_read_index(the_repository) < 0)
+		die("invalid index");
+
+	/* The first parameters up to -- are merge bases; the rest are
+	 * heads. */
+	for (i = 1; i < argc; i++) {
+		if (strcmp(argv[i], "--") == 0)
+			sep_seen = 1;
+		else if (strcmp(argv[i], "-h") == 0)
+			usage(builtin_merge_resolve_usage);
+		else if (sep_seen && !head)
+			head = argv[i];
+		else if (remote) {
+			/* Give up if we are given two or more remotes.
+			 * Not handling octopus. */
+			return 2;
+		} else {
+			struct object_id oid;
+
+			get_oid(argv[i], &oid);
+			is_baseless &= sep_seen;
+
+			if (!oideq(&oid, the_hash_algo->empty_tree)) {
+				struct commit *commit;
+				commit = lookup_commit_or_die(&oid, argv[i]);
+
+				if (sep_seen)
+					commit_list_append(commit, &remote);
+				else
+					next_base = commit_list_append(commit, next_base);
+			}
+		}
+	}
+
+	/* Give up if this is a baseless merge. */
+	if (is_baseless)
+		return 2;
+
+	return merge_strategies_resolve(the_repository, bases, head, remote);
+}
diff --git a/git-merge-resolve.sh b/git-merge-resolve.sh
deleted file mode 100755
index 343fe7bccd..0000000000
--- a/git-merge-resolve.sh
+++ /dev/null
@@ -1,54 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2005 Linus Torvalds
-# Copyright (c) 2005 Junio C Hamano
-#
-# Resolve two trees, using enhanced multi-base read-tree.
-
-# The first parameters up to -- are merge bases; the rest are heads.
-bases= head= remotes= sep_seen=
-for arg
-do
-	case ",$sep_seen,$head,$arg," in
-	*,--,)
-		sep_seen=yes
-		;;
-	,yes,,*)
-		head=$arg
-		;;
-	,yes,*)
-		remotes="$remotes$arg "
-		;;
-	*)
-		bases="$bases$arg "
-		;;
-	esac
-done
-
-# Give up if we are given two or more remotes -- not handling octopus.
-case "$remotes" in
-?*' '?*)
-	exit 2 ;;
-esac
-
-# Give up if this is a baseless merge.
-if test '' = "$bases"
-then
-	exit 2
-fi
-
-git update-index -q --refresh
-git read-tree -u -m --aggressive $bases $head $remotes || exit 2
-echo "Trying simple merge."
-if result_tree=$(git write-tree 2>/dev/null)
-then
-	exit 0
-else
-	echo "Simple merge failed, trying Automatic merge."
-	if git merge-index -o git-merge-one-file -a
-	then
-		exit 0
-	else
-		exit 1
-	fi
-fi
diff --git a/git.c b/git.c
index c97fea36c1..794ca6e9f0 100644
--- a/git.c
+++ b/git.c
@@ -538,6 +538,7 @@ static struct cmd_struct commands[] = {
 	{ "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
+	{ "merge-resolve", cmd_merge_resolve, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-tree", cmd_merge_tree, RUN_SETUP | NO_PARSEOPT },
 	{ "mktag", cmd_mktag, RUN_SETUP | NO_PARSEOPT },
diff --git a/merge-strategies.c b/merge-strategies.c
index 00738863e4..6b905dfc38 100644
--- a/merge-strategies.c
+++ b/merge-strategies.c
@@ -1,8 +1,11 @@
 #include "cache.h"
+#include "cache-tree.h"
 #include "dir.h"
 #include "ll-merge.h"
+#include "lockfile.h"
 #include "merge-strategies.h"
 #include "run-command.h"
+#include "unpack-trees.h"
 #include "xdiff-interface.h"
 
 static int add_to_index_cacheinfo(struct index_state *istate,
@@ -307,3 +310,85 @@ int merge_all(struct index_state *istate, int oneshot, int quiet,
 
 	return err;
 }
+
+static int add_tree(const struct object_id *oid, struct tree_desc *t)
+{
+	struct tree *tree;
+
+	tree = parse_tree_indirect(oid);
+	if (parse_tree(tree))
+		return -1;
+
+	init_tree_desc(t, tree->buffer, tree->size);
+	return 0;
+}
+
+int merge_strategies_resolve(struct repository *r,
+			     struct commit_list *bases, const char *head_arg,
+			     struct commit_list *remote)
+{
+	int i = 0;
+	struct lock_file lock = LOCK_INIT;
+	struct tree_desc t[MAX_UNPACK_TREES];
+	struct unpack_trees_options opts;
+	struct object_id head, oid;
+	struct commit_list *j;
+
+	if (head_arg)
+		get_oid(head_arg, &head);
+
+	repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);
+	refresh_index(r->index, 0, NULL, NULL, NULL);
+
+	memset(&opts, 0, sizeof(opts));
+	opts.head_idx = 1;
+	opts.src_index = r->index;
+	opts.dst_index = r->index;
+	opts.update = 1;
+	opts.merge = 1;
+	opts.aggressive = 1;
+
+	for (j = bases; j && j->item; j = j->next) {
+		if (add_tree(&j->item->object.oid, t + (i++)))
+			goto out;
+	}
+
+	if (head_arg && add_tree(&head, t + (i++)))
+		goto out;
+	if (remote && add_tree(&remote->item->object.oid, t + (i++)))
+		goto out;
+
+	if (i == 1)
+		opts.fn = oneway_merge;
+	else if (i == 2) {
+		opts.fn = twoway_merge;
+		opts.initial_checkout = is_index_unborn(r->index);
+	} else if (i >= 3) {
+		opts.fn = threeway_merge;
+		opts.head_idx = i - 1;
+	}
+
+	if (unpack_trees(i, t, &opts))
+		goto out;
+
+	puts(_("Trying simple merge."));
+	write_locked_index(r->index, &lock, COMMIT_LOCK);
+
+	if (write_index_as_tree(&oid, r->index, r->index_file,
+				WRITE_TREE_SILENT, NULL)) {
+		int ret;
+
+		puts(_("Simple merge failed, trying Automatic merge."));
+		repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);
+		ret = merge_all(r->index, 0, 0, merge_one_file_cb, r);
+
+		write_locked_index(r->index, &lock, COMMIT_LOCK);
+		return !!ret;
+	}
+
+	return 0;
+
+ out:
+	rollback_lock_file(&lock);
+	return 2;
+}
diff --git a/merge-strategies.h b/merge-strategies.h
index 40e175ca39..778f8ce9d6 100644
--- a/merge-strategies.h
+++ b/merge-strategies.h
@@ -1,6 +1,7 @@
 #ifndef MERGE_STRATEGIES_H
 #define MERGE_STRATEGIES_H
 
+#include "commit.h"
 #include "object.h"
 
 int merge_strategies_one_file(struct repository *r,
@@ -33,4 +34,8 @@ int merge_one_path(struct index_state *istate, int oneshot, int quiet,
 int merge_all(struct index_state *istate, int oneshot, int quiet,
 	      merge_cb cb, void *data);
 
+int merge_strategies_resolve(struct repository *r,
+			     struct commit_list *bases, const char *head_arg,
+			     struct commit_list *remote);
+
 #endif /* MERGE_STRATEGIES_H */
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 07/11] merge-octopus: rewrite in C
From: Alban Gruin @ 2020-09-01 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

This rewrites `git merge-octopus' from shell to C.  As for the two last
conversions, this port removes calls to external processes to avoid
reading and writing the index over and over again.

 - Calls to `read-tree -u -m (--aggressive)?' are replaced by calls to
   unpack_trees().

 - The call to `write-tree' is replaced by a call to
   write_index_as_tree().

 - The call to `diff-index ...' is replaced by a call to
   repo_index_has_changes(), and is moved from cmd_merge_octopus() to
   merge_octopus().

 - The call to `merge-index', needed to invoke `git merge-one-file', is
   replaced by a call to merge_all().

The index is read in cmd_merge_octopus(), and is wrote back by
merge_strategies_octopus().

Here to, merge_strategies_octopus() takes two commit lists and a string
to reduce frictions when try_merge_strategies() will be modified to call
it directly.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 Makefile                |   2 +-
 builtin.h               |   1 +
 builtin/merge-octopus.c |  65 +++++++++++++
 git-merge-octopus.sh    | 112 ----------------------
 git.c                   |   1 +
 merge-strategies.c      | 200 ++++++++++++++++++++++++++++++++++++++++
 merge-strategies.h      |   3 +
 7 files changed, 271 insertions(+), 113 deletions(-)
 create mode 100644 builtin/merge-octopus.c
 delete mode 100755 git-merge-octopus.sh

diff --git a/Makefile b/Makefile
index 929c3dc3eb..2fb26d9692 100644
--- a/Makefile
+++ b/Makefile
@@ -595,7 +595,6 @@ unexport CDPATH
 SCRIPT_SH += git-bisect.sh
 SCRIPT_SH += git-difftool--helper.sh
 SCRIPT_SH += git-filter-branch.sh
-SCRIPT_SH += git-merge-octopus.sh
 SCRIPT_SH += git-mergetool.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-request-pull.sh
@@ -1088,6 +1087,7 @@ BUILTIN_OBJS += builtin/mailsplit.o
 BUILTIN_OBJS += builtin/merge-base.o
 BUILTIN_OBJS += builtin/merge-file.o
 BUILTIN_OBJS += builtin/merge-index.o
+BUILTIN_OBJS += builtin/merge-octopus.o
 BUILTIN_OBJS += builtin/merge-one-file.o
 BUILTIN_OBJS += builtin/merge-ours.o
 BUILTIN_OBJS += builtin/merge-recursive.o
diff --git a/builtin.h b/builtin.h
index 6ea207c9fd..5a587ab70c 100644
--- a/builtin.h
+++ b/builtin.h
@@ -170,6 +170,7 @@ int cmd_mailsplit(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
+int cmd_merge_octopus(int argc, const char **argv, const char *prefix);
 int cmd_merge_ours(int argc, const char **argv, const char *prefix);
 int cmd_merge_file(int argc, const char **argv, const char *prefix);
 int cmd_merge_one_file(int argc, const char **argv, const char *prefix);
diff --git a/builtin/merge-octopus.c b/builtin/merge-octopus.c
new file mode 100644
index 0000000000..37bbdf11cc
--- /dev/null
+++ b/builtin/merge-octopus.c
@@ -0,0 +1,65 @@
+/*
+ * Builtin "git merge-octopus"
+ *
+ * Copyright (c) 2020 Alban Gruin
+ *
+ * Based on git-merge-octopus.sh, written by Junio C Hamano.
+ *
+ * Resolve two or more trees.
+ */
+
+#include "cache.h"
+#include "builtin.h"
+#include "commit.h"
+#include "merge-strategies.h"
+
+static const char builtin_merge_octopus_usage[] =
+	"git merge-octopus [<bases>...] -- <head> <remote1> <remote2> [<remotes>...]";
+
+int cmd_merge_octopus(int argc, const char **argv, const char *prefix)
+{
+	int i, sep_seen = 0;
+	struct commit_list *bases = NULL, *remotes = NULL;
+	struct commit_list **next_base = &bases, **next_remote = &remotes;
+	const char *head_arg = NULL;
+
+	if (argc < 5)
+		usage(builtin_merge_octopus_usage);
+
+	setup_work_tree();
+	if (repo_read_index(the_repository) < 0)
+		die("corrupted cache");
+
+	/* The first parameters up to -- are merge bases; the rest are
+	 * heads. */
+	for (i = 1; i < argc; i++) {
+		if (strcmp(argv[i], "--") == 0)
+			sep_seen = 1;
+		else if (strcmp(argv[i], "-h") == 0)
+			usage(builtin_merge_octopus_usage);
+		else if (sep_seen && !head_arg)
+			head_arg = argv[i];
+		else {
+			struct object_id oid;
+
+			get_oid(argv[i], &oid);
+
+			if (!oideq(&oid, the_hash_algo->empty_tree)) {
+				struct commit *commit;
+				commit = lookup_commit_or_die(&oid, argv[i]);
+
+				if (sep_seen)
+					next_remote = commit_list_append(commit, next_remote);
+				else
+					next_base = commit_list_append(commit, next_base);
+			}
+		}
+	}
+
+	/* Reject if this is not an octopus -- resolve should be used
+	 * instead. */
+	if (commit_list_count(remotes) < 2)
+		return 2;
+
+	return merge_strategies_octopus(the_repository, bases, head_arg, remotes);
+}
diff --git a/git-merge-octopus.sh b/git-merge-octopus.sh
deleted file mode 100755
index 7d19d37951..0000000000
--- a/git-merge-octopus.sh
+++ /dev/null
@@ -1,112 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2005 Junio C Hamano
-#
-# Resolve two or more trees.
-#
-
-. git-sh-setup
-
-LF='
-'
-
-# The first parameters up to -- are merge bases; the rest are heads.
-bases= head= remotes= sep_seen=
-for arg
-do
-	case ",$sep_seen,$head,$arg," in
-	*,--,)
-		sep_seen=yes
-		;;
-	,yes,,*)
-		head=$arg
-		;;
-	,yes,*)
-		remotes="$remotes$arg "
-		;;
-	*)
-		bases="$bases$arg "
-		;;
-	esac
-done
-
-# Reject if this is not an octopus -- resolve should be used instead.
-case "$remotes" in
-?*' '?*)
-	;;
-*)
-	exit 2 ;;
-esac
-
-# MRC is the current "merge reference commit"
-# MRT is the current "merge result tree"
-
-if ! git diff-index --quiet --cached HEAD --
-then
-    gettextln "Error: Your local changes to the following files would be overwritten by merge"
-    git diff-index --cached --name-only HEAD -- | sed -e 's/^/    /'
-    exit 2
-fi
-MRC=$(git rev-parse --verify -q $head)
-MRT=$(git write-tree)
-NON_FF_MERGE=0
-OCTOPUS_FAILURE=0
-for SHA1 in $remotes
-do
-	case "$OCTOPUS_FAILURE" in
-	1)
-		# We allow only last one to have a hand-resolvable
-		# conflicts.  Last round failed and we still had
-		# a head to merge.
-		gettextln "Automated merge did not work."
-		gettextln "Should not be doing an octopus."
-		exit 2
-	esac
-
-	eval pretty_name=\${GITHEAD_$SHA1:-$SHA1}
-	if test "$SHA1" = "$pretty_name"
-	then
-		SHA1_UP="$(echo "$SHA1" | tr a-z A-Z)"
-		eval pretty_name=\${GITHEAD_$SHA1_UP:-$pretty_name}
-	fi
-	common=$(git merge-base --all $SHA1 $MRC) ||
-		die "$(eval_gettext "Unable to find common commit with \$pretty_name")"
-
-	case "$LF$common$LF" in
-	*"$LF$SHA1$LF"*)
-		eval_gettextln "Already up to date with \$pretty_name"
-		continue
-		;;
-	esac
-
-	if test "$common,$NON_FF_MERGE" = "$MRC,0"
-	then
-		# The first head being merged was a fast-forward.
-		# Advance MRC to the head being merged, and use that
-		# tree as the intermediate result of the merge.
-		# We still need to count this as part of the parent set.
-
-		eval_gettextln "Fast-forwarding to: \$pretty_name"
-		git read-tree -u -m $head $SHA1 || exit
-		MRC=$SHA1 MRT=$(git write-tree)
-		continue
-	fi
-
-	NON_FF_MERGE=1
-
-	eval_gettextln "Trying simple merge with \$pretty_name"
-	git read-tree -u -m --aggressive  $common $MRT $SHA1 || exit 2
-	next=$(git write-tree 2>/dev/null)
-	if test $? -ne 0
-	then
-		gettextln "Simple merge did not work, trying automatic merge."
-		git merge-index -o git-merge-one-file -a ||
-		OCTOPUS_FAILURE=1
-		next=$(git write-tree 2>/dev/null)
-	fi
-
-	MRC="$MRC $SHA1"
-	MRT=$next
-done
-
-exit "$OCTOPUS_FAILURE"
diff --git a/git.c b/git.c
index 794ca6e9f0..df0bebdafc 100644
--- a/git.c
+++ b/git.c
@@ -533,6 +533,7 @@ static struct cmd_struct commands[] = {
 	{ "merge-base", cmd_merge_base, RUN_SETUP },
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
 	{ "merge-index", cmd_merge_index, RUN_SETUP | NO_PARSEOPT },
+	{ "merge-octopus", cmd_merge_octopus, RUN_SETUP | NO_PARSEOPT },
 	{ "merge-ours", cmd_merge_ours, RUN_SETUP | NO_PARSEOPT },
 	{ "merge-one-file", cmd_merge_one_file, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
diff --git a/merge-strategies.c b/merge-strategies.c
index 6b905dfc38..dee86389e3 100644
--- a/merge-strategies.c
+++ b/merge-strategies.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "cache-tree.h"
+#include "commit-reach.h"
 #include "dir.h"
 #include "ll-merge.h"
 #include "lockfile.h"
@@ -392,3 +393,202 @@ int merge_strategies_resolve(struct repository *r,
 	rollback_lock_file(&lock);
 	return 2;
 }
+
+static int fast_forward(struct repository *r, const struct object_id *oids,
+			int nr, int aggressive)
+{
+	int i;
+	struct tree_desc t[MAX_UNPACK_TREES];
+	struct unpack_trees_options opts;
+	struct lock_file lock = LOCK_INIT;
+
+	repo_read_index_preload(r, NULL, 0);
+	if (refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL))
+		return -1;
+
+	repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);
+
+	memset(&opts, 0, sizeof(opts));
+	opts.head_idx = 1;
+	opts.src_index = r->index;
+	opts.dst_index = r->index;
+	opts.merge = 1;
+	opts.update = 1;
+	opts.aggressive = aggressive;
+
+	for (i = 0; i < nr; i++) {
+		struct tree *tree;
+		tree = parse_tree_indirect(oids + i);
+		if (parse_tree(tree))
+			return -1;
+		init_tree_desc(t + i, tree->buffer, tree->size);
+	}
+
+	if (nr == 1)
+		opts.fn = oneway_merge;
+	else if (nr == 2) {
+		opts.fn = twoway_merge;
+		opts.initial_checkout = is_index_unborn(r->index);
+	} else if (nr >= 3) {
+		opts.fn = threeway_merge;
+		opts.head_idx = nr - 1;
+	}
+
+	if (unpack_trees(nr, t, &opts))
+		return -1;
+
+	if (write_locked_index(r->index, &lock, COMMIT_LOCK))
+		return error(_("unable to write new index file"));
+
+	return 0;
+}
+
+static int write_tree(struct repository *r, struct tree **reference_tree)
+{
+	struct object_id oid;
+	int ret;
+
+	ret = write_index_as_tree(&oid, r->index, r->index_file, 0, NULL);
+	if (!ret)
+		*reference_tree = lookup_tree(r, &oid);
+
+	return ret;
+}
+
+int merge_strategies_octopus(struct repository *r,
+			     struct commit_list *bases, const char *head_arg,
+			     struct commit_list *remotes)
+{
+	int non_ff_merge = 0, ret = 0, references = 1;
+	struct commit **reference_commit;
+	struct tree *reference_tree;
+	struct commit_list *j;
+	struct object_id head;
+	struct strbuf sb = STRBUF_INIT;
+
+	get_oid(head_arg, &head);
+
+	reference_commit = xcalloc(commit_list_count(remotes) + 1, sizeof(struct commit *));
+	reference_commit[0] = lookup_commit_reference(r, &head);
+	reference_tree = repo_get_commit_tree(r, reference_commit[0]);
+
+	if (repo_index_has_changes(r, reference_tree, &sb)) {
+		error(_("Your local changes to the following files "
+			"would be overwritten by merge:\n  %s"),
+		      sb.buf);
+		strbuf_release(&sb);
+		ret = 2;
+		goto out;
+	}
+
+	for (j = remotes; j && j->item; j = j->next) {
+		struct commit *c = j->item;
+		struct object_id *oid = &c->object.oid;
+		struct commit_list *common, *k;
+		char *branch_name;
+		int can_ff = 1;
+
+		if (ret) {
+			/* We allow only last one to have a
+			   hand-resolvable conflicts.  Last round failed
+			   and we still had a head to merge. */
+			puts(_("Automated merge did not work."));
+			puts(_("Should not be doing an octopus."));
+
+			ret = 2;
+			goto out;
+		}
+
+		branch_name = merge_get_better_branch_name(oid_to_hex(oid));
+		common = get_merge_bases_many(c, references, reference_commit);
+
+		if (!common)
+			die(_("Unable to find common commit with %s"), branch_name);
+
+		for (k = common; k && !oideq(&k->item->object.oid, oid); k = k->next);
+
+		if (k) {
+			printf(_("Already up to date with %s\n"), branch_name);
+			free(branch_name);
+			free_commit_list(common);
+			continue;
+		}
+
+		if (!non_ff_merge) {
+			int i;
+
+			for (i = 0, k = common; k && i < references && can_ff; k = k->next, i++) {
+				can_ff = oideq(&k->item->object.oid,
+					       &reference_commit[i]->object.oid);
+			}
+		}
+
+		if (!non_ff_merge && can_ff) {
+			/* The first head being merged was a
+			   fast-forward.  Advance the reference commit
+			   to the head being merged, and use that tree
+			   as the intermediate result of the merge.  We
+			   still need to count this as part of the
+			   parent set. */
+			struct object_id oids[2];
+			printf(_("Fast-forwarding to: %s\n"), branch_name);
+
+			oidcpy(oids, &head);
+			oidcpy(oids + 1, oid);
+
+			ret = fast_forward(r, oids, 2, 0);
+			if (ret) {
+				free(branch_name);
+				free_commit_list(common);
+				goto out;
+			}
+
+			references = 0;
+			write_tree(r, &reference_tree);
+		} else {
+			int i = 0;
+			struct tree *next = NULL;
+			struct object_id oids[MAX_UNPACK_TREES];
+
+			non_ff_merge = 1;
+			printf(_("Trying simple merge with %s\n"), branch_name);
+
+			for (k = common; k; k = k->next)
+				oidcpy(oids + (i++), &k->item->object.oid);
+
+			oidcpy(oids + (i++), &reference_tree->object.oid);
+			oidcpy(oids + (i++), oid);
+
+			if (fast_forward(r, oids, i, 1)) {
+				ret = 2;
+
+				free(branch_name);
+				free_commit_list(common);
+
+				goto out;
+			}
+
+			if (write_tree(r, &next)) {
+				struct lock_file lock = LOCK_INIT;
+
+				puts(_("Simple merge did not work, trying automatic merge."));
+				repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);
+				ret = !!merge_all(r->index, 0, 0, merge_one_file_cb, r);
+				write_locked_index(r->index, &lock, COMMIT_LOCK);
+
+				write_tree(r, &next);
+			}
+
+			reference_tree = next;
+		}
+
+		reference_commit[references++] = c;
+
+		free(branch_name);
+		free_commit_list(common);
+	}
+
+out:
+	free(reference_commit);
+	return ret;
+}
diff --git a/merge-strategies.h b/merge-strategies.h
index 778f8ce9d6..938411a04e 100644
--- a/merge-strategies.h
+++ b/merge-strategies.h
@@ -37,5 +37,8 @@ int merge_all(struct index_state *istate, int oneshot, int quiet,
 int merge_strategies_resolve(struct repository *r,
 			     struct commit_list *bases, const char *head_arg,
 			     struct commit_list *remote);
+int merge_strategies_octopus(struct repository *r,
+			     struct commit_list *bases, const char *head_arg,
+			     struct commit_list *remote);
 
 #endif /* MERGE_STRATEGIES_H */
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 08/11] merge: use the "resolve" strategy without forking
From: Alban Gruin @ 2020-09-01 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

This teaches `git merge' to invoke the "resolve" strategy with a
function call instead of forking.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/merge.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/merge.c b/builtin/merge.c
index 74829a838e..541d9bed02 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -41,6 +41,7 @@
 #include "commit-reach.h"
 #include "wt-status.h"
 #include "commit-graph.h"
+#include "merge-strategies.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
@@ -740,7 +741,10 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common,
 				       COMMIT_LOCK | SKIP_IF_UNCHANGED))
 			die(_("unable to write %s"), get_index_file());
 		return clean ? 0 : 1;
-	} else {
+	} else if (!strcmp(strategy, "resolve"))
+		return merge_strategies_resolve(the_repository, common,
+						head_arg, remoteheads);
+	else {
 		return try_merge_command(the_repository,
 					 strategy, xopts_nr, xopts,
 					 common, head_arg, remoteheads);
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 11/11] sequencer: use the "octopus" merge strategy without forking
From: Alban Gruin @ 2020-09-01 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

This teaches the sequencer to invoke the "octopus" strategy with a
function call instead of forking.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 sequencer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index c4c7b28d24..34853b8970 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1927,6 +1927,9 @@ static int do_pick_commit(struct repository *r,
 		if (!strcmp(opts->strategy, "resolve")) {
 			repo_read_index(r);
 			res |= merge_strategies_resolve(r, common, oid_to_hex(&head), remotes);
+		} else if (!strcmp(opts->strategy, "octopus")) {
+			repo_read_index(r);
+			res |= merge_strategies_octopus(r, common, oid_to_hex(&head), remotes);
 		} else
 			res |= try_merge_command(r, opts->strategy,
 						 opts->xopts_nr, (const char **)opts->xopts,
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 09/11] merge: use the "octopus" strategy without forking
From: Alban Gruin @ 2020-09-01 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

This teaches `git merge' to invoke the "octopus" strategy with a
function call instead of forking.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/merge.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/merge.c b/builtin/merge.c
index 541d9bed02..90e092ad02 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -744,6 +744,9 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common,
 	} else if (!strcmp(strategy, "resolve"))
 		return merge_strategies_resolve(the_repository, common,
 						head_arg, remoteheads);
+	else if (!strcmp(strategy, "octopus"))
+		return merge_strategies_octopus(the_repository, common,
+						head_arg, remoteheads);
 	else {
 		return try_merge_command(the_repository,
 					 strategy, xopts_nr, xopts,
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 03/11] merge-index: libify merge_one_path() and merge_all()
From: Alban Gruin @ 2020-09-01 10:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

The "resolve" and "octopus" merge strategies do not call directly `git
merge-one-file', they delegate the work to another git command, `git
merge-index', that will loop over files in the index and call the
specified command.  Unfortunately, these functions are not part of
libgit.a, which means that once rewritten, the strategies would still
have to invoke `merge-one-file' by spawning a new process first.

To avoid this, this moves merge_one_path(), merge_all(), and their
helpers to merge-strategies.c.  They also take a callback to dictate
what they should do for each file.  For now, only one launching a new
process is defined to preserve the behaviour of the builtin version.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/merge-index.c | 77 +++------------------------------
 merge-strategies.c    | 99 +++++++++++++++++++++++++++++++++++++++++++
 merge-strategies.h    | 17 ++++++++
 3 files changed, 123 insertions(+), 70 deletions(-)

diff --git a/builtin/merge-index.c b/builtin/merge-index.c
index 38ea6ad6ca..6cb666cc78 100644
--- a/builtin/merge-index.c
+++ b/builtin/merge-index.c
@@ -1,74 +1,11 @@
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
-#include "run-command.h"
-
-static const char *pgm;
-static int one_shot, quiet;
-static int err;
-
-static int merge_entry(int pos, const char *path)
-{
-	int found;
-	const char *arguments[] = { pgm, "", "", "", path, "", "", "", NULL };
-	char hexbuf[4][GIT_MAX_HEXSZ + 1];
-	char ownbuf[4][60];
-
-	if (pos >= active_nr)
-		die("git merge-index: %s not in the cache", path);
-	found = 0;
-	do {
-		const struct cache_entry *ce = active_cache[pos];
-		int stage = ce_stage(ce);
-
-		if (strcmp(ce->name, path))
-			break;
-		found++;
-		oid_to_hex_r(hexbuf[stage], &ce->oid);
-		xsnprintf(ownbuf[stage], sizeof(ownbuf[stage]), "%o", ce->ce_mode);
-		arguments[stage] = hexbuf[stage];
-		arguments[stage + 4] = ownbuf[stage];
-	} while (++pos < active_nr);
-	if (!found)
-		die("git merge-index: %s not in the cache", path);
-
-	if (run_command_v_opt(arguments, 0)) {
-		if (one_shot)
-			err++;
-		else {
-			if (!quiet)
-				die("merge program failed");
-			exit(1);
-		}
-	}
-	return found;
-}
-
-static void merge_one_path(const char *path)
-{
-	int pos = cache_name_pos(path, strlen(path));
-
-	/*
-	 * If it already exists in the cache as stage0, it's
-	 * already merged and there is nothing to do.
-	 */
-	if (pos < 0)
-		merge_entry(-pos-1, path);
-}
-
-static void merge_all(void)
-{
-	int i;
-	for (i = 0; i < active_nr; i++) {
-		const struct cache_entry *ce = active_cache[i];
-		if (!ce_stage(ce))
-			continue;
-		i += merge_entry(i, ce->name)-1;
-	}
-}
+#include "merge-strategies.h"
 
 int cmd_merge_index(int argc, const char **argv, const char *prefix)
 {
-	int i, force_file = 0;
+	int i, force_file = 0, err = 0, one_shot = 0, quiet = 0;
+	const char *pgm;
 
 	/* Without this we cannot rely on waitpid() to tell
 	 * what happened to our children.
@@ -98,14 +35,14 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 			if (!strcmp(arg, "-a")) {
-				merge_all();
+				err |= merge_all(&the_index, one_shot, quiet,
+						 merge_program_cb, (void *)pgm);
 				continue;
 			}
 			die("git merge-index: unknown option %s", arg);
 		}
-		merge_one_path(arg);
+		err |= merge_one_path(&the_index, one_shot, quiet, arg,
+				      merge_program_cb, (void *)pgm);
 	}
-	if (err && !quiet)
-		die("merge program failed");
 	return err;
 }
diff --git a/merge-strategies.c b/merge-strategies.c
index f2af4a894d..ffd6cf77d6 100644
--- a/merge-strategies.c
+++ b/merge-strategies.c
@@ -2,6 +2,7 @@
 #include "dir.h"
 #include "ll-merge.h"
 #include "merge-strategies.h"
+#include "run-command.h"
 #include "xdiff-interface.h"
 
 static int add_to_index_cacheinfo(struct index_state *istate,
@@ -197,3 +198,101 @@ int merge_strategies_one_file(struct repository *r,
 
 	return 0;
 }
+
+int merge_program_cb(const struct object_id *orig_blob,
+		     const struct object_id *our_blob,
+		     const struct object_id *their_blob, const char *path,
+		     unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
+		     void *data)
+{
+	char ownbuf[3][GIT_MAX_HEXSZ] = {{0}};
+	const char *arguments[] = { (char *)data, "", "", "", path,
+				    ownbuf[0], ownbuf[1], ownbuf[2],
+				    NULL };
+
+	if (orig_blob)
+		arguments[1] = oid_to_hex(orig_blob);
+	if (our_blob)
+		arguments[2] = oid_to_hex(our_blob);
+	if (their_blob)
+		arguments[3] = oid_to_hex(their_blob);
+
+	xsnprintf(ownbuf[0], sizeof(ownbuf[0]), "%o", orig_mode);
+	xsnprintf(ownbuf[1], sizeof(ownbuf[1]), "%o", our_mode);
+	xsnprintf(ownbuf[2], sizeof(ownbuf[2]), "%o", their_mode);
+
+	return run_command_v_opt(arguments, 0);
+}
+
+static int merge_entry(struct index_state *istate, int quiet, int pos,
+		       const char *path, merge_cb cb, void *data)
+{
+	int found = 0;
+	const struct object_id *oids[3] = {NULL};
+	unsigned int modes[3] = {0};
+
+	do {
+		const struct cache_entry *ce = istate->cache[pos];
+		int stage = ce_stage(ce);
+
+		if (strcmp(ce->name, path))
+			break;
+		found++;
+		oids[stage - 1] = &ce->oid;
+		modes[stage - 1] = ce->ce_mode;
+	} while (++pos < istate->cache_nr);
+	if (!found)
+		return error(_("%s is not in the cache"), path);
+
+	if (cb(oids[0], oids[1], oids[2], path, modes[0], modes[1], modes[2], data)) {
+		if (!quiet)
+			error(_("Merge program failed"));
+		return -2;
+	}
+
+	return found;
+}
+
+int merge_one_path(struct index_state *istate, int oneshot, int quiet,
+		   const char *path, merge_cb cb, void *data)
+{
+	int pos = index_name_pos(istate, path, strlen(path)), ret;
+
+	/*
+	 * If it already exists in the cache as stage0, it's
+	 * already merged and there is nothing to do.
+	 */
+	if (pos < 0) {
+		ret = merge_entry(istate, quiet, -pos - 1, path, cb, data);
+		if (ret == -1)
+			return -1;
+		else if (ret == -2)
+			return 1;
+	}
+	return 0;
+}
+
+int merge_all(struct index_state *istate, int oneshot, int quiet,
+	      merge_cb cb, void *data)
+{
+	int err = 0, i, ret;
+	for (i = 0; i < istate->cache_nr; i++) {
+		const struct cache_entry *ce = istate->cache[i];
+		if (!ce_stage(ce))
+			continue;
+
+		ret = merge_entry(istate, quiet, i, ce->name, cb, data);
+		if (ret > 0)
+			i += ret - 1;
+		else if (ret == -1)
+			return -1;
+		else if (ret == -2) {
+			if (oneshot)
+				err++;
+			else
+				return 1;
+		}
+	}
+
+	return err;
+}
diff --git a/merge-strategies.h b/merge-strategies.h
index b527d145c7..cf78d7eaf4 100644
--- a/merge-strategies.h
+++ b/merge-strategies.h
@@ -10,4 +10,21 @@ int merge_strategies_one_file(struct repository *r,
 			      unsigned int orig_mode, unsigned int our_mode,
 			      unsigned int their_mode);
 
+typedef int (*merge_cb)(const struct object_id *orig_blob,
+			const struct object_id *our_blob,
+			const struct object_id *their_blob, const char *path,
+			unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
+			void *data);
+
+int merge_program_cb(const struct object_id *orig_blob,
+		     const struct object_id *our_blob,
+		     const struct object_id *their_blob, const char *path,
+		     unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
+		     void *data);
+
+int merge_one_path(struct index_state *istate, int oneshot, int quiet,
+		   const char *path, merge_cb cb, void *data);
+int merge_all(struct index_state *istate, int oneshot, int quiet,
+	      merge_cb cb, void *data);
+
 #endif /* MERGE_STRATEGIES_H */
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 10/11] sequencer: use the "resolve" strategy without forking
From: Alban Gruin @ 2020-09-01 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

This teaches the sequencer to invoke the "resolve" strategy with a
function call instead of forking.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 sequencer.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 2425896911..c4c7b28d24 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -33,6 +33,7 @@
 #include "commit-reach.h"
 #include "rebase-interactive.h"
 #include "reset.h"
+#include "merge-strategies.h"
 
 #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
 
@@ -1922,9 +1923,15 @@ static int do_pick_commit(struct repository *r,
 
 		commit_list_insert(base, &common);
 		commit_list_insert(next, &remotes);
-		res |= try_merge_command(r, opts->strategy,
-					 opts->xopts_nr, (const char **)opts->xopts,
-					common, oid_to_hex(&head), remotes);
+
+		if (!strcmp(opts->strategy, "resolve")) {
+			repo_read_index(r);
+			res |= merge_strategies_resolve(r, common, oid_to_hex(&head), remotes);
+		} else
+			res |= try_merge_command(r, opts->strategy,
+						 opts->xopts_nr, (const char **)opts->xopts,
+						 common, oid_to_hex(&head), remotes);
+
 		free_commit_list(common);
 		free_commit_list(remotes);
 	}
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 02/11] merge-one-file: rewrite in C
From: Alban Gruin @ 2020-09-01 10:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

This rewrites `git merge-one-file' from shell to C.  This port is not
completely straightforward: to save precious cycles by avoiding reading
and flushing the index repeatedly, write temporary files when an
operation can be performed in-memory, or allow other function to use the
rewrite without forking nor worrying about the index, the calls to
external processes are replaced by calls to functions in libgit.a:

 - calls to `update-index --add --cacheinfo' are replaced by calls to
   add_cache_entry();

 - calls to `update-index --remove' are replaced by calls to
   remove_file_from_cache();

 - calls to `checkout-index -u -f' are replaced by calls to
   checkout_entry();

 - calls to `unpack-file' and `merge-files' are replaced by calls to
   read_mmblob() and ll_merge(), respectively, to merge files
   in-memory;

 - calls to `checkout-index -f --stage=2' are replaced by calls to
   cache_file_exists();

 - calls to `update-index' are replaced by calls to add_file_to_cache().

The bulk of the rewrite is done in a new file in libgit.a,
merge-strategies.c.  This will enable the resolve and octopus strategies
to directly call it instead of forking.

This also fixes a bug present in the original script: instead of
checking if a _regular_ file exists when a file exists in the branch to
merge, but not in our branch, the rewritten version checks if a file of
any kind (ie. a directory, ...) exists.  This fixes the tests t6035.14,
where the branch to merge had a new file, `a/b', but our branch had a
directory there; it should have failed because a directory exists, but
it did not because there was no regular file called `a/b'.  This test is
now marked as successful.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 Makefile                        |   3 +-
 builtin.h                       |   1 +
 builtin/merge-one-file.c        |  85 ++++++++++++++
 git-merge-one-file.sh           | 167 ---------------------------
 git.c                           |   1 +
 merge-strategies.c              | 199 ++++++++++++++++++++++++++++++++
 merge-strategies.h              |  13 +++
 t/t6415-merge-dir-to-symlink.sh |   2 +-
 8 files changed, 302 insertions(+), 169 deletions(-)
 create mode 100644 builtin/merge-one-file.c
 delete mode 100755 git-merge-one-file.sh
 create mode 100644 merge-strategies.c
 create mode 100644 merge-strategies.h

diff --git a/Makefile b/Makefile
index 65f8cfb236..8849d54063 100644
--- a/Makefile
+++ b/Makefile
@@ -596,7 +596,6 @@ SCRIPT_SH += git-bisect.sh
 SCRIPT_SH += git-difftool--helper.sh
 SCRIPT_SH += git-filter-branch.sh
 SCRIPT_SH += git-merge-octopus.sh
-SCRIPT_SH += git-merge-one-file.sh
 SCRIPT_SH += git-merge-resolve.sh
 SCRIPT_SH += git-mergetool.sh
 SCRIPT_SH += git-quiltimport.sh
@@ -911,6 +910,7 @@ LIB_OBJS += match-trees.o
 LIB_OBJS += mem-pool.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
+LIB_OBJS += merge-strategies.o
 LIB_OBJS += merge.o
 LIB_OBJS += mergesort.o
 LIB_OBJS += midx.o
@@ -1089,6 +1089,7 @@ BUILTIN_OBJS += builtin/mailsplit.o
 BUILTIN_OBJS += builtin/merge-base.o
 BUILTIN_OBJS += builtin/merge-file.o
 BUILTIN_OBJS += builtin/merge-index.o
+BUILTIN_OBJS += builtin/merge-one-file.o
 BUILTIN_OBJS += builtin/merge-ours.o
 BUILTIN_OBJS += builtin/merge-recursive.o
 BUILTIN_OBJS += builtin/merge-tree.o
diff --git a/builtin.h b/builtin.h
index a5ae15bfe5..9205d5ecdc 100644
--- a/builtin.h
+++ b/builtin.h
@@ -172,6 +172,7 @@ int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
 int cmd_merge_ours(int argc, const char **argv, const char *prefix);
 int cmd_merge_file(int argc, const char **argv, const char *prefix);
+int cmd_merge_one_file(int argc, const char **argv, const char *prefix);
 int cmd_merge_recursive(int argc, const char **argv, const char *prefix);
 int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 int cmd_mktag(int argc, const char **argv, const char *prefix);
diff --git a/builtin/merge-one-file.c b/builtin/merge-one-file.c
new file mode 100644
index 0000000000..306a86c2f0
--- /dev/null
+++ b/builtin/merge-one-file.c
@@ -0,0 +1,85 @@
+/*
+ * Builtin "git merge-one-file"
+ *
+ * Copyright (c) 2020 Alban Gruin
+ *
+ * Based on git-merge-one-file.sh, written by Linus Torvalds.
+ *
+ * This is the git per-file merge utility, called with
+ *
+ *   argv[1] - original file SHA1 (or empty)
+ *   argv[2] - file in branch1 SHA1 (or empty)
+ *   argv[3] - file in branch2 SHA1 (or empty)
+ *   argv[4] - pathname in repository
+ *   argv[5] - original file mode (or empty)
+ *   argv[6] - file in branch1 mode (or empty)
+ *   argv[7] - file in branch2 mode (or empty)
+ *
+ * Handle some trivial cases. The _really_ trivial cases have been
+ * handled already by git read-tree, but that one doesn't do any merges
+ * that might change the tree layout.
+ */
+
+#include "cache.h"
+#include "builtin.h"
+#include "lockfile.h"
+#include "merge-strategies.h"
+
+static const char builtin_merge_one_file_usage[] =
+	"git merge-one-file <orig blob> <our blob> <their blob> <path> "
+	"<orig mode> <our mode> <their mode>\n\n"
+	"Blob ids and modes should be empty for missing files.";
+
+int cmd_merge_one_file(int argc, const char **argv, const char *prefix)
+{
+	struct object_id orig_blob, our_blob, their_blob,
+		*p_orig_blob = NULL, *p_our_blob = NULL, *p_their_blob = NULL;
+	unsigned int orig_mode = 0, our_mode = 0, their_mode = 0, ret = 0;
+	struct lock_file lock = LOCK_INIT;
+
+	if (argc != 8)
+		usage(builtin_merge_one_file_usage);
+
+	if (repo_read_index(the_repository) < 0)
+		die("invalid index");
+
+	repo_hold_locked_index(the_repository, &lock, LOCK_DIE_ON_ERROR);
+
+	if (!get_oid(argv[1], &orig_blob)) {
+		p_orig_blob = &orig_blob;
+		orig_mode = strtol(argv[5], NULL, 8);
+
+		if (!(S_ISREG(orig_mode) || S_ISDIR(orig_mode) || S_ISLNK(orig_mode)))
+			ret |= error(_("invalid 'orig' mode: %o"), orig_mode);
+	}
+
+	if (!get_oid(argv[2], &our_blob)) {
+		p_our_blob = &our_blob;
+		our_mode = strtol(argv[6], NULL, 8);
+
+		if (!(S_ISREG(our_mode) || S_ISDIR(our_mode) || S_ISLNK(our_mode)))
+			ret |= error(_("invalid 'our' mode: %o"), our_mode);
+	}
+
+	if (!get_oid(argv[3], &their_blob)) {
+		p_their_blob = &their_blob;
+		their_mode = strtol(argv[7], NULL, 8);
+
+		if (!(S_ISREG(their_mode) || S_ISDIR(their_mode) || S_ISLNK(their_mode)))
+			ret = error(_("invalid 'their' mode: %o"), their_mode);
+	}
+
+	if (ret)
+		return ret;
+
+	ret = merge_strategies_one_file(the_repository,
+					p_orig_blob, p_our_blob, p_their_blob, argv[4],
+					orig_mode, our_mode, their_mode);
+
+	if (ret) {
+		rollback_lock_file(&lock);
+		return ret;
+	}
+
+	return write_locked_index(the_repository->index, &lock, COMMIT_LOCK);
+}
diff --git a/git-merge-one-file.sh b/git-merge-one-file.sh
deleted file mode 100755
index f6d9852d2f..0000000000
--- a/git-merge-one-file.sh
+++ /dev/null
@@ -1,167 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) Linus Torvalds, 2005
-#
-# This is the git per-file merge script, called with
-#
-#   $1 - original file SHA1 (or empty)
-#   $2 - file in branch1 SHA1 (or empty)
-#   $3 - file in branch2 SHA1 (or empty)
-#   $4 - pathname in repository
-#   $5 - original file mode (or empty)
-#   $6 - file in branch1 mode (or empty)
-#   $7 - file in branch2 mode (or empty)
-#
-# Handle some trivial cases.. The _really_ trivial cases have
-# been handled already by git read-tree, but that one doesn't
-# do any merges that might change the tree layout.
-
-USAGE='<orig blob> <our blob> <their blob> <path>'
-USAGE="$USAGE <orig mode> <our mode> <their mode>"
-LONG_USAGE="usage: git merge-one-file $USAGE
-
-Blob ids and modes should be empty for missing files."
-
-SUBDIRECTORY_OK=Yes
-. git-sh-setup
-cd_to_toplevel
-require_work_tree
-
-if test $# != 7
-then
-	echo "$LONG_USAGE"
-	exit 1
-fi
-
-case "${1:-.}${2:-.}${3:-.}" in
-#
-# Deleted in both or deleted in one and unchanged in the other
-#
-"$1.." | "$1.$1" | "$1$1.")
-	if { test -z "$6" && test "$5" != "$7"; } ||
-	   { test -z "$7" && test "$5" != "$6"; }
-	then
-		echo "ERROR: File $4 deleted on one branch but had its" >&2
-		echo "ERROR: permissions changed on the other." >&2
-		exit 1
-	fi
-
-	if test -n "$2"
-	then
-		echo "Removing $4"
-	else
-		# read-tree checked that index matches HEAD already,
-		# so we know we do not have this path tracked.
-		# there may be an unrelated working tree file here,
-		# which we should just leave unmolested.  Make sure
-		# we do not have it in the index, though.
-		exec git update-index --remove -- "$4"
-	fi
-	if test -f "$4"
-	then
-		rm -f -- "$4" &&
-		rmdir -p "$(expr "z$4" : 'z\(.*\)/')" 2>/dev/null || :
-	fi &&
-		exec git update-index --remove -- "$4"
-	;;
-
-#
-# Added in one.
-#
-".$2.")
-	# the other side did not add and we added so there is nothing
-	# to be done, except making the path merged.
-	exec git update-index --add --cacheinfo "$6" "$2" "$4"
-	;;
-"..$3")
-	echo "Adding $4"
-	if test -f "$4"
-	then
-		echo "ERROR: untracked $4 is overwritten by the merge." >&2
-		exit 1
-	fi
-	git update-index --add --cacheinfo "$7" "$3" "$4" &&
-		exec git checkout-index -u -f -- "$4"
-	;;
-
-#
-# Added in both, identically (check for same permissions).
-#
-".$3$2")
-	if test "$6" != "$7"
-	then
-		echo "ERROR: File $4 added identically in both branches," >&2
-		echo "ERROR: but permissions conflict $6->$7." >&2
-		exit 1
-	fi
-	echo "Adding $4"
-	git update-index --add --cacheinfo "$6" "$2" "$4" &&
-		exec git checkout-index -u -f -- "$4"
-	;;
-
-#
-# Modified in both, but differently.
-#
-"$1$2$3" | ".$2$3")
-
-	case ",$6,$7," in
-	*,120000,*)
-		echo "ERROR: $4: Not merging symbolic link changes." >&2
-		exit 1
-		;;
-	*,160000,*)
-		echo "ERROR: $4: Not merging conflicting submodule changes." >&2
-		exit 1
-		;;
-	esac
-
-	src1=$(git unpack-file $2)
-	src2=$(git unpack-file $3)
-	case "$1" in
-	'')
-		echo "Added $4 in both, but differently."
-		orig=$(git unpack-file $(git hash-object /dev/null))
-		;;
-	*)
-		echo "Auto-merging $4"
-		orig=$(git unpack-file $1)
-		;;
-	esac
-
-	git merge-file "$src1" "$orig" "$src2"
-	ret=$?
-	msg=
-	if test $ret != 0 || test -z "$1"
-	then
-		msg='content conflict'
-		ret=1
-	fi
-
-	# Create the working tree file, using "our tree" version from the
-	# index, and then store the result of the merge.
-	git checkout-index -f --stage=2 -- "$4" && cat "$src1" >"$4" || exit 1
-	rm -f -- "$orig" "$src1" "$src2"
-
-	if test "$6" != "$7"
-	then
-		if test -n "$msg"
-		then
-			msg="$msg, "
-		fi
-		msg="${msg}permissions conflict: $5->$6,$7"
-		ret=1
-	fi
-
-	if test $ret != 0
-	then
-		echo "ERROR: $msg in $4" >&2
-		exit 1
-	fi
-	exec git update-index -- "$4"
-	;;
-
-*)
-	echo "ERROR: $4: Not handling case $1 -> $2 -> $3" >&2
-	;;
-esac
-exit 1
diff --git a/git.c b/git.c
index 8bd1d7551d..c97fea36c1 100644
--- a/git.c
+++ b/git.c
@@ -534,6 +534,7 @@ static struct cmd_struct commands[] = {
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
 	{ "merge-index", cmd_merge_index, RUN_SETUP | NO_PARSEOPT },
 	{ "merge-ours", cmd_merge_ours, RUN_SETUP | NO_PARSEOPT },
+	{ "merge-one-file", cmd_merge_one_file, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
diff --git a/merge-strategies.c b/merge-strategies.c
new file mode 100644
index 0000000000..f2af4a894d
--- /dev/null
+++ b/merge-strategies.c
@@ -0,0 +1,199 @@
+#include "cache.h"
+#include "dir.h"
+#include "ll-merge.h"
+#include "merge-strategies.h"
+#include "xdiff-interface.h"
+
+static int add_to_index_cacheinfo(struct index_state *istate,
+				  unsigned int mode,
+				  const struct object_id *oid, const char *path)
+{
+	struct cache_entry *ce;
+	int len, option;
+
+	if (!verify_path(path, mode))
+		return error(_("Invalid path '%s'"), path);
+
+	len = strlen(path);
+	ce = make_empty_cache_entry(istate, len);
+
+	oidcpy(&ce->oid, oid);
+	memcpy(ce->name, path, len);
+	ce->ce_flags = create_ce_flags(0);
+	ce->ce_namelen = len;
+	ce->ce_mode = create_ce_mode(mode);
+	if (assume_unchanged)
+		ce->ce_flags |= CE_VALID;
+	option = ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE;
+	if (add_index_entry(istate, ce, option))
+		return error(_("%s: cannot add to the index"), path);
+
+	return 0;
+}
+
+static int checkout_from_index(struct index_state *istate, const char *path)
+{
+	struct checkout state = CHECKOUT_INIT;
+	struct cache_entry *ce;
+
+	state.istate = istate;
+	state.force = 1;
+	state.base_dir = "";
+	state.base_dir_len = 0;
+
+	ce = index_file_exists(istate, path, strlen(path), 0);
+	if (checkout_entry(ce, &state, NULL, NULL) < 0)
+		return error(_("%s: cannot checkout file"), path);
+	return 0;
+}
+
+static int merge_one_file_deleted(struct index_state *istate,
+				  const struct object_id *orig_blob,
+				  const struct object_id *our_blob,
+				  const struct object_id *their_blob, const char *path,
+				  unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode)
+{
+	if ((our_blob && orig_mode != our_mode) ||
+	    (their_blob && orig_mode != their_mode))
+		return error(_("File %s deleted on one branch but had its "
+			       "permissions changed on the other."), path);
+
+	if (our_blob) {
+		printf(_("Removing %s\n"), path);
+
+		if (file_exists(path))
+			remove_path(path);
+	}
+
+	if (remove_file_from_index(istate, path))
+		return error("%s: cannot remove from the index", path);
+	return 0;
+}
+
+static int do_merge_one_file(struct index_state *istate,
+			     const struct object_id *orig_blob,
+			     const struct object_id *our_blob,
+			     const struct object_id *their_blob, const char *path,
+			     unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode)
+{
+	int ret, i, dest;
+	mmbuffer_t result = {NULL, 0};
+	mmfile_t mmfs[3];
+	struct ll_merge_options merge_opts = {0};
+	struct cache_entry *ce;
+
+	if (our_mode == S_IFLNK || their_mode == S_IFLNK)
+		return error(_("%s: Not merging symbolic link changes."), path);
+	else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK)
+		return error(_("%s: Not merging conflicting submodule changes."), path);
+
+	read_mmblob(mmfs + 1, our_blob);
+	read_mmblob(mmfs + 2, their_blob);
+
+	if (orig_blob) {
+		printf(_("Auto-merging %s\n"), path);
+		read_mmblob(mmfs + 0, orig_blob);
+	} else {
+		printf(_("Added %s in both, but differently.\n"), path);
+		read_mmblob(mmfs + 0, &null_oid);
+	}
+
+	merge_opts.xdl_opts = XDL_MERGE_ZEALOUS_ALNUM;
+	ret = ll_merge(&result, path,
+		       mmfs + 0, "orig",
+		       mmfs + 1, "our",
+		       mmfs + 2, "their",
+		       istate, &merge_opts);
+
+	for (i = 0; i < 3; i++)
+		free(mmfs[i].ptr);
+
+	if (ret > 127 || !orig_blob)
+		ret = error(_("content conflict in %s"), path);
+
+	/* Create the working tree file, using "our tree" version from
+	   the index, and then store the result of the merge. */
+	ce = index_file_exists(istate, path, strlen(path), 0);
+	if (!ce)
+		BUG("file is not present in the cache?");
+
+	unlink(path);
+	dest = open(path, O_WRONLY | O_CREAT, ce->ce_mode);
+	write_in_full(dest, result.ptr, result.size);
+	close(dest);
+
+	free(result.ptr);
+
+	if (ret && our_mode != their_mode)
+		return error(_("permission conflict: %o->%o,%o in %s"),
+			     orig_mode, our_mode, their_mode, path);
+	if (ret)
+		return 1;
+
+	return add_file_to_index(istate, path, 0);
+}
+
+int merge_strategies_one_file(struct repository *r,
+			      const struct object_id *orig_blob,
+			      const struct object_id *our_blob,
+			      const struct object_id *their_blob, const char *path,
+			      unsigned int orig_mode, unsigned int our_mode,
+			      unsigned int their_mode)
+{
+	if (orig_blob &&
+	    ((!their_blob && our_blob && oideq(orig_blob, our_blob)) ||
+	     (!our_blob && their_blob && oideq(orig_blob, their_blob))))
+		/* Deleted in both or deleted in one and unchanged in
+		   the other */
+		return merge_one_file_deleted(r->index,
+					      orig_blob, our_blob, their_blob, path,
+					      orig_mode, our_mode, their_mode);
+	else if (!orig_blob && our_blob && !their_blob) {
+		/* Added in one.  The other side did not add and we
+		   added so there is nothing to be done, except making
+		   the path merged. */
+		return add_to_index_cacheinfo(r->index, our_mode, our_blob, path);
+	} else if (!orig_blob && !our_blob && their_blob) {
+		printf(_("Adding %s\n"), path);
+
+		if (file_exists(path))
+			return error(_("untracked %s is overwritten by the merge."), path);
+
+		if (add_to_index_cacheinfo(r->index, their_mode, their_blob, path))
+			return 1;
+		return checkout_from_index(r->index, path);
+	} else if (!orig_blob && our_blob && their_blob &&
+		   oideq(our_blob, their_blob)) {
+		/* Added in both, identically (check for same
+		   permissions). */
+		if (our_mode != their_mode)
+			return error(_("File %s added identically in both branches, "
+				       "but permissions conflict %o->%o."),
+				     path, our_mode, their_mode);
+
+		printf(_("Adding %s\n"), path);
+
+		if (add_to_index_cacheinfo(r->index, our_mode, our_blob, path))
+			return 1;
+		return checkout_from_index(r->index, path);
+	} else if (our_blob && their_blob)
+		/* Modified in both, but differently. */
+		return do_merge_one_file(r->index,
+					 orig_blob, our_blob, their_blob, path,
+					 orig_mode, our_mode, their_mode);
+	else {
+		char *orig_hex = "", *our_hex = "", *their_hex = "";
+
+		if (orig_blob)
+			orig_hex = oid_to_hex(orig_blob);
+		if (our_blob)
+			our_hex = oid_to_hex(our_blob);
+		if (their_blob)
+			their_hex = oid_to_hex(their_blob);
+
+		return error(_("%s: Not handling case %s -> %s -> %s"),
+			     path, orig_hex, our_hex, their_hex);
+	}
+
+	return 0;
+}
diff --git a/merge-strategies.h b/merge-strategies.h
new file mode 100644
index 0000000000..b527d145c7
--- /dev/null
+++ b/merge-strategies.h
@@ -0,0 +1,13 @@
+#ifndef MERGE_STRATEGIES_H
+#define MERGE_STRATEGIES_H
+
+#include "object.h"
+
+int merge_strategies_one_file(struct repository *r,
+			      const struct object_id *orig_blob,
+			      const struct object_id *our_blob,
+			      const struct object_id *their_blob, const char *path,
+			      unsigned int orig_mode, unsigned int our_mode,
+			      unsigned int their_mode);
+
+#endif /* MERGE_STRATEGIES_H */
diff --git a/t/t6415-merge-dir-to-symlink.sh b/t/t6415-merge-dir-to-symlink.sh
index 2eddcc7664..5fb74e39a0 100755
--- a/t/t6415-merge-dir-to-symlink.sh
+++ b/t/t6415-merge-dir-to-symlink.sh
@@ -94,7 +94,7 @@ test_expect_success SYMLINKS 'a/b was resolved as symlink' '
 	test -h a/b
 '
 
-test_expect_failure 'do not lose untracked in merge (resolve)' '
+test_expect_success 'do not lose untracked in merge (resolve)' '
 	git reset --hard &&
 	git checkout baseline^0 &&
 	>a/b/c/e &&
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 06/11] merge-recursive: move better_branch_name() to merge.c
From: Alban Gruin @ 2020-09-01 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

get_better_branch_name() will be used by rebase-octopus once it is
rewritten in C, so instead of duplicating it, this moves this function
preventively inside an appropriate file in libgit.a.  This function is
also renamed to reflect its usage by merge strategies.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/merge-recursive.c | 16 ++--------------
 cache.h                   |  2 +-
 merge.c                   | 12 ++++++++++++
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/builtin/merge-recursive.c b/builtin/merge-recursive.c
index a4bfd8fc51..972243b5e9 100644
--- a/builtin/merge-recursive.c
+++ b/builtin/merge-recursive.c
@@ -8,18 +8,6 @@
 static const char builtin_merge_recursive_usage[] =
 	"git %s <base>... -- <head> <remote> ...";
 
-static char *better_branch_name(const char *branch)
-{
-	static char githead_env[8 + GIT_MAX_HEXSZ + 1];
-	char *name;
-
-	if (strlen(branch) != the_hash_algo->hexsz)
-		return xstrdup(branch);
-	xsnprintf(githead_env, sizeof(githead_env), "GITHEAD_%s", branch);
-	name = getenv(githead_env);
-	return xstrdup(name ? name : branch);
-}
-
 int cmd_merge_recursive(int argc, const char **argv, const char *prefix)
 {
 	const struct object_id *bases[21];
@@ -75,8 +63,8 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix)
 	if (get_oid(o.branch2, &h2))
 		die(_("could not resolve ref '%s'"), o.branch2);
 
-	o.branch1 = better1 = better_branch_name(o.branch1);
-	o.branch2 = better2 = better_branch_name(o.branch2);
+	o.branch1 = better1 = merge_get_better_branch_name(o.branch1);
+	o.branch2 = better2 = merge_get_better_branch_name(o.branch2);
 
 	if (o.verbosity >= 3)
 		printf(_("Merging %s with %s\n"), o.branch1, o.branch2);
diff --git a/cache.h b/cache.h
index 4cad61ffa4..a926b0bc87 100644
--- a/cache.h
+++ b/cache.h
@@ -1917,7 +1917,7 @@ int checkout_fast_forward(struct repository *r,
 			  const struct object_id *from,
 			  const struct object_id *to,
 			  int overwrite_ignore);
-
+char *merge_get_better_branch_name(const char *branch);
 
 int sane_execvp(const char *file, char *const argv[]);
 
diff --git a/merge.c b/merge.c
index 5fb88af102..801d673c5f 100644
--- a/merge.c
+++ b/merge.c
@@ -109,3 +109,15 @@ int checkout_fast_forward(struct repository *r,
 		return error(_("unable to write new index file"));
 	return 0;
 }
+
+char *merge_get_better_branch_name(const char *branch)
+{
+	static char githead_env[8 + GIT_MAX_HEXSZ + 1];
+	char *name;
+
+	if (strlen(branch) != the_hash_algo->hexsz)
+		return xstrdup(branch);
+	xsnprintf(githead_env, sizeof(githead_env), "GITHEAD_%s", branch);
+	name = getenv(githead_env);
+	return xstrdup(name ? name : branch);
+}
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 04/11] merge-index: don't fork if the requested program is `git-merge-one-file'
From: Alban Gruin @ 2020-09-01 10:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

Since `git-merge-one-file' has been rewritten and libified, this teaches
`merge-index' to call merge_strategies_one_file() without forking using
a new callback, merge_one_file_cb().

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/merge-index.c | 29 +++++++++++++++++++++++++++--
 merge-strategies.c    | 11 +++++++++++
 merge-strategies.h    |  6 ++++++
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/builtin/merge-index.c b/builtin/merge-index.c
index 6cb666cc78..19fff9a113 100644
--- a/builtin/merge-index.c
+++ b/builtin/merge-index.c
@@ -1,11 +1,15 @@
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
+#include "lockfile.h"
 #include "merge-strategies.h"
 
 int cmd_merge_index(int argc, const char **argv, const char *prefix)
 {
 	int i, force_file = 0, err = 0, one_shot = 0, quiet = 0;
 	const char *pgm;
+	void *data;
+	merge_cb merge_action;
+	struct lock_file lock = LOCK_INIT;
 
 	/* Without this we cannot rely on waitpid() to tell
 	 * what happened to our children.
@@ -26,7 +30,19 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
 		quiet = 1;
 		i++;
 	}
+
 	pgm = argv[i++];
+	if (!strcmp(pgm, "git-merge-one-file")) {
+		merge_action = merge_one_file_cb;
+		data = (void *)the_repository;
+
+		setup_work_tree();
+		hold_locked_index(&lock, LOCK_DIE_ON_ERROR);
+	} else {
+		merge_action = merge_program_cb;
+		data = (void *)pgm;
+	}
+
 	for (; i < argc; i++) {
 		const char *arg = argv[i];
 		if (!force_file && *arg == '-') {
@@ -36,13 +52,22 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
 			}
 			if (!strcmp(arg, "-a")) {
 				err |= merge_all(&the_index, one_shot, quiet,
-						 merge_program_cb, (void *)pgm);
+						 merge_action, data);
 				continue;
 			}
 			die("git merge-index: unknown option %s", arg);
 		}
 		err |= merge_one_path(&the_index, one_shot, quiet, arg,
-				      merge_program_cb, (void *)pgm);
+				      merge_action, data);
+	}
+
+	if (merge_action == merge_one_file_cb) {
+		if (err) {
+			rollback_lock_file(&lock);
+			return err;
+		}
+
+		return write_locked_index(&the_index, &lock, COMMIT_LOCK);
 	}
 	return err;
 }
diff --git a/merge-strategies.c b/merge-strategies.c
index ffd6cf77d6..00738863e4 100644
--- a/merge-strategies.c
+++ b/merge-strategies.c
@@ -199,6 +199,17 @@ int merge_strategies_one_file(struct repository *r,
 	return 0;
 }
 
+int merge_one_file_cb(const struct object_id *orig_blob,
+		      const struct object_id *our_blob,
+		      const struct object_id *their_blob, const char *path,
+		      unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
+		      void *data)
+{
+	return merge_strategies_one_file((struct repository *)data,
+					 orig_blob, our_blob, their_blob, path,
+					 orig_mode, our_mode, their_mode);
+}
+
 int merge_program_cb(const struct object_id *orig_blob,
 		     const struct object_id *our_blob,
 		     const struct object_id *their_blob, const char *path,
diff --git a/merge-strategies.h b/merge-strategies.h
index cf78d7eaf4..40e175ca39 100644
--- a/merge-strategies.h
+++ b/merge-strategies.h
@@ -16,6 +16,12 @@ typedef int (*merge_cb)(const struct object_id *orig_blob,
 			unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
 			void *data);
 
+int merge_one_file_cb(const struct object_id *orig_blob,
+		      const struct object_id *our_blob,
+		      const struct object_id *their_blob, const char *path,
+		      unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
+		      void *data);
+
 int merge_program_cb(const struct object_id *orig_blob,
 		     const struct object_id *our_blob,
 		     const struct object_id *their_blob, const char *path,
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* [PATCH v2 00/11] Rewrite the remaining merge strategies from shell to C
From: Alban Gruin @ 2020-09-01 10:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200625121953.16991-1-alban.gruin@gmail.com>

In a effort to reduce the number of shell scripts in git's codebase, I
propose this patch series converting the two remaining merge strategies,
resolve and octopus, from shell to C.  This will enable slightly better
performance, better integration with git itself (no more forking to
perform these operations), better portability (Windows and shell scripts
don't mix well).

Three scripts are actually converted: first git-merge-one-file.sh, then
git-merge-resolve.sh, and finally git-merge-octopus.sh.  Not only they
are converted, but they also are modified to operate without forking,
and then libified so they can be used by git without spawning another
process.

The first patch is not important to make the whole series work, but I
made this patch while working on it.

This series keeps the commands `git merge-one-file', `git
merge-resolve', and `git merge-octopus', so any script depending on them
should keep working without any changes.

This series is based on d9cd433147 (po: add missing letter for French
message, 2020-08-27).  The tip is tagged as
"rewrite-merge-strategies-v2" at https://github.com/agrn/git.

Changes since v1:

 - Merged commits rewriting and libifying scripts.

 - Introduce checks in merge-one-file to check that file modes are
   correct.

 - Use ll_merge() instead of xdl_merge().

 - merge-index does no longer fork to call git-merge-one-file.

 - Remove usage of the_index in merge-one-file.c.

 - Mark more strings for translation.

 - Carry more comments from the original scripts.

 - Use GIT_MAX_HEXSZ instead of hardcoding 60.

Alban Gruin (11):
  t6027: modernise tests
  merge-one-file: rewrite in C
  merge-index: libify merge_one_path() and merge_all()
  merge-index: don't fork if the requested program is
    `git-merge-one-file'
  merge-resolve: rewrite in C
  merge-recursive: move better_branch_name() to merge.c
  merge-octopus: rewrite in C
  merge: use the "resolve" strategy without forking
  merge: use the "octopus" strategy without forking
  sequencer: use the "resolve" strategy without forking
  sequencer: use the "octopus" merge strategy without forking

 Makefile                        |   7 +-
 builtin.h                       |   3 +
 builtin/merge-index.c           | 102 ++----
 builtin/merge-octopus.c         |  65 ++++
 builtin/merge-one-file.c        |  85 +++++
 builtin/merge-recursive.c       |  16 +-
 builtin/merge-resolve.c         |  69 ++++
 builtin/merge.c                 |   9 +-
 cache.h                         |   2 +-
 git-merge-octopus.sh            | 112 ------
 git-merge-one-file.sh           | 167 ---------
 git-merge-resolve.sh            |  54 ---
 git.c                           |   3 +
 merge-strategies.c              | 594 ++++++++++++++++++++++++++++++++
 merge-strategies.h              |  44 +++
 merge.c                         |  12 +
 sequencer.c                     |  16 +-
 t/t6407-merge-binary.sh         |  27 +-
 t/t6415-merge-dir-to-symlink.sh |   2 +-
 19 files changed, 942 insertions(+), 447 deletions(-)
 create mode 100644 builtin/merge-octopus.c
 create mode 100644 builtin/merge-one-file.c
 create mode 100644 builtin/merge-resolve.c
 delete mode 100755 git-merge-octopus.sh
 delete mode 100755 git-merge-one-file.sh
 delete mode 100755 git-merge-resolve.sh
 create mode 100644 merge-strategies.c
 create mode 100644 merge-strategies.h

Range-diff against v1:
 1:  50e15b5243 !  1:  28c8fd11b6 t6027: modernise tests
    @@ Commit message
     
         Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
     
    - ## t/t6027-merge-binary.sh ##
    -@@ t/t6027-merge-binary.sh: test_description='ask merge-recursive to merge binary files'
    + ## t/t6407-merge-binary.sh ##
    +@@ t/t6407-merge-binary.sh: test_description='ask merge-recursive to merge binary files'
      . ./test-lib.sh
      
      test_expect_success setup '
    @@ t/t6027-merge-binary.sh: test_description='ask merge-recursive to merge binary f
      	cat "$TEST_DIRECTORY"/test-binary-1.png >m &&
      	git add m &&
      	git ls-files -s | sed -e "s/ 0	/ 1	/" >E1 &&
    -@@ t/t6027-merge-binary.sh: test_expect_success setup '
    +@@ t/t6407-merge-binary.sh: test_expect_success setup '
      '
      
      test_expect_success resolve '
 2:  08a337738e <  -:  ---------- merge-one-file: rewrite in C
 3:  5da78d5de1 <  -:  ---------- merge-one-file: remove calls to external processes
 4:  11c0da9e13 <  -:  ---------- merge-one-file: use error() instead of fprintf(stderr, ...)
 5:  df28965c8e <  -:  ---------- merge-one-file: libify merge_one_file()
 -:  ---------- >  2:  f5ab0fdf0a merge-one-file: rewrite in C
 6:  84f2f2946a !  3:  7f3ce7da17 merge-index: libify merge_one_path() and merge_all()
    @@ builtin/merge-index.c: int cmd_merge_index(int argc, const char **argv, const ch
     
      ## merge-strategies.c ##
     @@
    - #include "cache.h"
      #include "dir.h"
    + #include "ll-merge.h"
      #include "merge-strategies.h"
     +#include "run-command.h"
      #include "xdiff-interface.h"
    @@ merge-strategies.c: int merge_strategies_one_file(struct repository *r,
     +		     unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode,
     +		     void *data)
     +{
    -+	char ownbuf[3][60] = {{0}};
    ++	char ownbuf[3][GIT_MAX_HEXSZ] = {{0}};
     +	const char *arguments[] = { (char *)data, "", "", "", path,
     +				    ownbuf[0], ownbuf[1], ownbuf[2],
     +				    NULL };
 7:  1f864a4840 <  -:  ---------- merge-resolve: rewrite in C
 8:  3517990e6a <  -:  ---------- merge-resolve: remove calls to external processes
 9:  9831fe1729 <  -:  ---------- merge-resolve: libify merge_resolve()
 -:  ---------- >  4:  07e6a6aaef merge-index: don't fork if the requested program is `git-merge-one-file'
 -:  ---------- >  5:  117d4fc840 merge-resolve: rewrite in C
10:  99d42e8ea1 =  6:  4fc955962b merge-recursive: move better_branch_name() to merge.c
11:  3182673ea7 <  -:  ---------- merge-octopus: rewrite in C
12:  8f4cfcefb7 <  -:  ---------- merge-octopus: remove calls to external processes
13:  d4dba22988 <  -:  ---------- merge-octopus: libify merge_octopus()
 -:  ---------- >  7:  e7b9e15b34 merge-octopus: rewrite in C
14:  bbe50cd770 =  8:  cd0662201d merge: use the "resolve" strategy without forking
15:  b7aff6fb3a =  9:  0525ff0183 merge: use the "octopus" strategy without forking
16:  c1cdcce3a9 = 10:  6fbf599ba4 sequencer: use the "resolve" strategy without forking
17:  e68765cdc7 = 11:  2c2dc3cc62 sequencer: use the "octopus" merge strategy without forking
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply

* [PATCH v2 01/11] t6027: modernise tests
From: Alban Gruin @ 2020-09-01 10:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, phillip.wood, Alban Gruin
In-Reply-To: <20200901105705.6059-1-alban.gruin@gmail.com>

Some tests in t6027 uses a if/then/else to check if a command failed or
not, but we have the `test_must_fail' function to do it correctly for us
nowadays.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 t/t6407-merge-binary.sh | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/t/t6407-merge-binary.sh b/t/t6407-merge-binary.sh
index 4e6c7cb77e..071d3f7343 100755
--- a/t/t6407-merge-binary.sh
+++ b/t/t6407-merge-binary.sh
@@ -5,7 +5,6 @@ test_description='ask merge-recursive to merge binary files'
 . ./test-lib.sh
 
 test_expect_success setup '
-
 	cat "$TEST_DIRECTORY"/test-binary-1.png >m &&
 	git add m &&
 	git ls-files -s | sed -e "s/ 0	/ 1	/" >E1 &&
@@ -35,33 +34,19 @@ test_expect_success setup '
 '
 
 test_expect_success resolve '
-
 	rm -f a* m* &&
 	git reset --hard anchor &&
-
-	if git merge -s resolve master
-	then
-		echo Oops, should not have succeeded
-		false
-	else
-		git ls-files -s >current
-		test_cmp expect current
-	fi
+	test_must_fail git merge -s resolve master &&
+	git ls-files -s >current &&
+	test_cmp expect current
 '
 
 test_expect_success recursive '
-
 	rm -f a* m* &&
 	git reset --hard anchor &&
-
-	if git merge -s recursive master
-	then
-		echo Oops, should not have succeeded
-		false
-	else
-		git ls-files -s >current
-		test_cmp expect current
-	fi
+	test_must_fail git merge -s recursive master &&
+	git ls-files -s >current &&
+	test_cmp expect current
 '
 
 test_done
-- 
2.28.0.370.g2c2dc3cc62


^ permalink raw reply related

* Re: [PATCH v3 06/11] commit-graph: add a slab to store topological levels
From: Abhishek Kumar @ 2020-09-01 10:26 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: abhishekkumar8222, git, gitgitgadget, stolee
In-Reply-To: <CANQwDwdsV0mSos7M_d7UP1CjT1rCyA_GfaYarMKUZaFdDZ0WRg@mail.gmail.com>

On Tue, Aug 25, 2020 at 09:56:44AM +0200, Jakub Narębski wrote:
> On Tue, 25 Aug 2020 at 09:33, Jakub Narębski <jnareb@gmail.com> wrote:
>
> ...
>
> >
> > All right, we might want to make use of the fact that the value of 0 for
> > topological level here always mean that its value for a commit needs to
> > be computed, that 0 is not a valid value for topological levels.
> > - if the value 0 came from commit-graph file, it means that it came
> >   from Git version that used commit-graph but didn't compute generation
> >   numbers; the value is GENERATION_NUMBER_ZERO
> > - the value 0 might came from the fact that commit is not in graph,
> >   and that commit-slab zero-initializes the values stored; let's
> >   call this value GENERATION_NUMBER_UNINITIALIZED
> >
> > If we ensure that corrected commit date can never be zero (which is
> > extremely unlikely, as one of root commits would have to be malformed or
> > written on badly misconfigured computer, with value of 0 for committer
> > timestamp), then this "happy accident" can keep working.
> >
> >   As a special case, commit date with timestamp of zero (01.01.1970 00:00:00Z)
> >   has corrected commit date of one, to be able to distinguish
> >   uninitialized values.
> >
> > Or something like that.
> >
> > Actually, it is not even necessary, as corrected commit date of 0 just
> > means that this single value (well, for every root commit with commit
> > date of 0) would be unnecessary recomputed in compute_generation_numbers().
> >
> > Anyway, we would want to document this fact in the commit message.
> 
> Alternatively, instead of comparing 'level' (and later in series also
> 'corrected_commit_date') against GENERATION_NUMBER_INFINITY,
> we could load at no extra cost `graph_pos` value and compare it
> against COMMIT_NOT_FROM_GRAPH.
> 
> But with this solution we could never get rid of graph_pos, if we
> think it is unnecessary. If we split commit_graph_data into separate
> slabs (as it was in early versions of respective patch series), we
> would have to pay additional cost.
> 
> But it is an alternative.
> 
> Best,
> -- 
> Jakub Narębski

I think updating a commit date with timestampt of zero to use corrected
commit date of one would leave us more options down the line.

Changing this is easy enough.

For a root commit with timestamp zero, current->date would be zero and 
max_corrected_commit_date would be zero as well. So we can set 
corrected commit date as `max_corrected_commit_date + 1`, instead of the
earlier `(current->date - 1) + 1`.

----

diff --git a/commit-graph.c b/commit-graph.c
index 7ed0a33ad6..e3c5e30405 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1389,7 +1389,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					max_level = GENERATION_NUMBER_V1_MAX - 1;
 				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
 
-				if (current->date > max_corrected_commit_date)
+				if (current->date && current->date > max_corrected_commit_date)
 					max_corrected_commit_date = current->date - 1;
 				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
 			}

^ permalink raw reply related

* Re: [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common()
From: Abhishek Kumar @ 2020-09-01 10:08 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: abhishekkumar8222, git, gitgitgadget, stolee
In-Reply-To: <85imdah50e.fsf@gmail.com>

On Sat, Aug 22, 2020 at 09:09:21PM +0200, Jakub Narębski wrote:
> Hello Abhishek,
> 
> "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Abhishek Kumar <abhishekkumar8222@gmail.com>
> >
> > With corrected commit dates implemented, we no longer have to rely on
> > commit date as a heuristic in paint_down_to_common().
> 
> All right, but it would be nice to have some benchmark data: what were
> performance when using topological levels, what was performance when
> using commit date heuristics (before this patch), what is performace now
> when using corrected commit date.
> 
> >
> > t6024-recursive-merge setups a unique repository where all commits have
> > the same committer date without well-defined merge-base. As this has
> > already caused problems (as noted in 859fdc0 (commit-graph: define
> > GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph within the
> > test script.
> 
> OK?

In hindsight, that is a terrible explanation. Here's what I have revised
this to:

  With corrected commit dates implemented, we no longer have to rely on
  commit date as a heuristic in paint_down_to_common().

  While using corrected commit dates Git walks nearly the same number of
  commits as commit date, the process is slower as for each comparision we
  have to access the commit-slab (for corrected committer date) instead of
  accessing struct member (for committer date).

  For example, the command `git merge-base v4.8 v4.9` on the linux
  repository walks 167468 commits, taking 0.135s for committer date and
  167496 commits, taking 0.157s for corrected committer date respectively.

  t6404-recursive-merge setups a unique repository where all commits have
  the same committer date without well-defined merge-base. As this has
  already caused problems (as noted in 859fdc0 (commit-graph: define
  GIT_TEST_COMMIT_GRAPH, 2018-08-29)).

  While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer
  date as a heuristic in paint_down_to_common(). 6404.1 'combined merge
  conflicts' merges commits in the order:
  - Merge C with B to form a intermediate commit.
  - Merge the intermediate commit with A.

  With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently
  use the corrected committer date, which changes the order in which
  commits are merged:
  - Merge A with B to form a intermediate commit.
  - Merge the intermediate commit with C.

  While resulting repositories are equivalent, 6404.4 'virtual trees were
  processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting
  different merge-bases and thus have different object ids for the
  intermediate commits.

  As this has already causes problems (as noted in 859fdc0 (commit-graph:
  define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph
  within t6404-recursive-merge.
> 
> >
> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> > ---
> >  commit-graph.c             | 14 ++++++++++++++
> >  commit-graph.h             |  6 ++++++
> >  commit-reach.c             |  2 +-
> >  t/t6024-recursive-merge.sh |  4 +++-
> >  4 files changed, 24 insertions(+), 2 deletions(-)
> >
> 
> I have reorderd files for easier review.
> 
> > diff --git a/commit-graph.h b/commit-graph.h
> > index 3cf89d895d..e22ec1e626 100644
> > --- a/commit-graph.h
> > +++ b/commit-graph.h
> > @@ -91,6 +91,12 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
> >   */
> >  int generation_numbers_enabled(struct repository *r);
> >
> > +/*
> > + * Return 1 if and only if the repository has a commit-graph
> > + * file and generation data chunk has been written for the file.
> > + */
> > +int corrected_commit_dates_enabled(struct repository *r);
> > +
> >  enum commit_graph_write_flags {
> >  	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
> >  	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
> 
> All right.
> 
> > diff --git a/commit-graph.c b/commit-graph.c
> > index c1292f8e08..6411068411 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -703,6 +703,20 @@ int generation_numbers_enabled(struct repository *r)
> >  	return !!first_generation;
> >  }
> >
> > +int corrected_commit_dates_enabled(struct repository *r)
> > +{
> > +	struct commit_graph *g;
> > +	if (!prepare_commit_graph(r))
> > +		return 0;
> > +
> > +	g = r->objects->commit_graph;
> > +
> > +	if (!g->num_commits)
> > +		return 0;
> > +
> > +	return !!g->chunk_generation_data;
> > +}
> 
> The previous commit introduced validate_mixed_generation_chain(), which
> walked whole split commit-graph chain, and set `read_generation_data`
> field in `struct commit_graph` for all layers in the chain.
> 
> This function examines only the top layer, so it follows the assumption
> that Git would behave in such way that oly topmost layers in the chai
> can be GDAT-less.
> 
> Why the difference?  Couldn't validate_mixed_generation_chain() simply
> call corrected_commit_dates_enabled()?

The previous commit didn't need to walk the whole split commit-graph
chain. Because of how we are handling writing in a mixed generation data
chunk, if a layer has generation data chunk, all layers below it have a
generation data chunk as well.

So, there are two cases at hand:

- Topmost layer has generation data chunk, so we know all layers below
  it has generation data chunk and we can read values from it.
- Topmost layer does not have generation data chunk, so we know we can't
  read from generation data chunk.

Just checking the topmost layer suffices - modified the previous commit.

Then, this function is more or less the same as
`g->read_generation_data` that is, if we are reading from generation
data chunk, we are using corrected commit dates.

> 
> > +
> >  static void close_commit_graph_one(struct commit_graph *g)
> >  {
> >  	if (!g)
> > diff --git a/commit-reach.c b/commit-reach.c
> > index 470bc80139..3a1b925274 100644
> > --- a/commit-reach.c
> > +++ b/commit-reach.c
> > @@ -39,7 +39,7 @@ static struct commit_list *paint_down_to_common(struct repository *r,
> >  	int i;
> >  	timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
> >
> > -	if (!min_generation)
> 
> This check was added in 091f4cf (commit: don't use generation numbers if
> not needed, 2018-08-30) by Derrick Stolee, and its commit message
> includes benchmark results for running 'git merge-base v4.8 v4.9' in
> Linux kernel repository:
> 
>       v2.18.0: 0.122s    167,468 walked
>   v2.19.0-rc1: 0.547s    635,579 walked
>          HEAD: 0.127s
> 
> > +	if (!min_generation && !corrected_commit_dates_enabled(r))
> >  		queue.compare = compare_commits_by_commit_date;
> 
> It would be nice to have similar benchmark for this change... unless of
> course there is no change in performance, but I think then it needs to
> be stated explicitly.  I think.
> 

Mentioned in the commit message - we walk (nearly) the same number of
commits but take somewhat longer.

> >
> >  	one->object.flags |= PARENT1;
> > diff --git a/t/t6024-recursive-merge.sh b/t/t6024-recursive-merge.sh
> > index 332cfc53fd..d3def66e7d 100755
> > --- a/t/t6024-recursive-merge.sh
> > +++ b/t/t6024-recursive-merge.sh
> > @@ -15,6 +15,8 @@ GIT_COMMITTER_DATE="2006-12-12 23:28:00 +0100"
> >  export GIT_COMMITTER_DATE
> >
> >  test_expect_success 'setup tests' '
> > +	GIT_TEST_COMMIT_GRAPH=0 &&
> > +	export GIT_TEST_COMMIT_GRAPH &&
> >  	echo 1 >a1 &&
> >  	git add a1 &&
> >  	GIT_AUTHOR_DATE="2006-12-12 23:00:00" git commit -m 1 a1 &&
> > @@ -66,7 +68,7 @@ test_expect_success 'setup tests' '
> >  '
> >
> >  test_expect_success 'combined merge conflicts' '
> > -	test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git merge -m final G
> > +	test_must_fail git merge -m final G
> >  '
> >
> >  test_expect_success 'result contains a conflict' '
> 
> OK, so instead of disabling commit-graph for this test, now we disable
> it for the whole script.
> 
> Maybe this change should be in a separate patch?

With the explanation in commit message, it's clear to see how using
corrected commit dates leads to an (incorrectly) failing test. Does it
still make sense to seperate them?

> 
> Best,
> -- 
> Jakub Narębski

Thanks
- Abhishek

^ permalink raw reply

* Re: [PATCH 0/5] Add struct strmap and associated utility functions
From: Jeff King @ 2020-09-01  9:27 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List
In-Reply-To: <CABPp-BGQB8rD8KyzccTQQ_X3Puyy3g5OOSEQES8Lv8Qtb-zRPg@mail.gmail.com>

On Fri, Aug 28, 2020 at 08:29:44AM -0700, Elijah Newren wrote:

> > It may also be a sign that we should be growing the hash more
> > aggressively in the first place. Of course all of this is predicated
> > having some benchmarks. It would be useful to know which part actually
> > provided the speedup.
> 
> Your thoughts here are great; I also had another one this past week --
> I could introduce a hashmap_partial_clear() (in addition to
> hashmap_clear()) for the special usecase I have of leaving the table
> allocated and pre-sized.  It'd prevent people from accidentally using
> it and forgetting to free stuff, while still allowing me to take
> advantage.  But, as you say, more benchmarks would be useful to find
> which parts provided the speedup before taking any of these steps.

Yeah, having a separate function to explicitly do "remove all elements
but keep the table allocated" would be fine with me. My big desire is
that clear() should do the safe, non-leaking thing by default.

-Peff

^ permalink raw reply

* [PATCH] core.abbrev <off|false|no> disables abbreviations
From: Eric Wong @ 2020-09-01  7:43 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson

These allows users to write hash-agnostic scripts and configs to
disable abbreviations.  Using "-c core.abbrev=40" will be
insufficient with SHA-256, and "-c core.abbrev=64" won't work
with SHA-1 repos today.

Signed-off-by: Eric Wong <e@80x24.org>
---
  I kinda wanted to allow a value of "max", but I figured the existing
  boolean falsiness words might make more sense with `--no-abbrev' in
  for some commands...  Naming is hard :x

 config.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/config.c b/config.c
index 2bdff4457b..f2e09c72ca 100644
--- a/config.c
+++ b/config.c
@@ -1217,6 +1217,10 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 			return config_error_nonbool(var);
 		if (!strcasecmp(value, "auto"))
 			default_abbrev = -1;
+		else if (!strcasecmp(value, "false") ||
+			 !strcasecmp(value, "no") ||
+			 !strcasecmp(value, "off"))
+			default_abbrev = the_hash_algo->hexsz;
 		else {
 			int abbrev = git_config_int(var, value);
 			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)

^ permalink raw reply related

* Re: [PATCH] Makefile: add support for generating JSON compilation database
From: Jeff King @ 2020-09-01  7:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Philippe Blain via GitGitGadget, git,
	Philippe Blain
In-Reply-To: <xmqqsgc3h28s.fsf@gitster.c.googlers.com>

On Sun, Aug 30, 2020 at 09:24:03PM -0700, Junio C Hamano wrote:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2020-08-30 at 19:28:27, Philippe Blain via GitGitGadget wrote:
> >> From: Philippe Blain <levraiphilippeblain@gmail.com>
> >> 
> >> Tools based on LibClang [1] can make use of a 'JSON Compilation
> >> Database' [2] that keeps track of the exact options used to compile a set
> >> of source files.
> >
> > For additional context why this is valuable, clangd, which is a C
> > language server protocol implementation, can use these files to
> > determine the flags needed to compile a file so it can provide proper
> > editor integration.  As a result, editors supporting the language server
> > protocol (such as VS Code, or Vim with a suitable plugin) can provide
> > better searching, integration, and refactoring tools.
> 
> I found that the proposed commit log was very weak to sell the
> change; some of what you gave above should definitely help strenthen
> it.

Likewise. Looking at the output, I'm confused how it would help with
things like searching and refactoring. It might be nice to spell it out
for those of us exposed to it for the first time (I tried following the
links but remained unenlightened).

I'd also be curious to hear what advantages it gives to add a new
Makefile knob rather than just letting interested parties add -MJ to
their CFLAGS. Is it just a convenience to create the concatenated form?
It seems weird that projects would need to do so themselves with sed
hackery (i.e., I'd expect whatever consumes this json to be able to
handle multiple files).

-Peff

^ permalink raw reply

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
From: Jeff King @ 2020-09-01  4:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqpn76e873.fsf@gitster.c.googlers.com>

On Mon, Aug 31, 2020 at 03:56:00PM -0700, Junio C Hamano wrote:

> > -	argv[0] = "git-credential-cache--daemon";
> > -	argv[1] = socket;
> > -	daemon.argv = argv;
> > +	strvec_pushl(&daemon.args, 
> > +		     "credential-cache--daemon", socket,
> > +		     NULL);
> > +	daemon.git_cmd = 1;
> >  	daemon.no_stdin = 1;
> >  	daemon.out = -1;
> >  
> 
> By the way, an interesting fact is that this cannot graduate UNTIL
> credential-cache becomes a built-in.  Having an intermediate level
> process seems to break t0301.

Hmm, that is interesting. I thought it would work OK because we don't
rely on any process-id magic for finding the daemon, etc, and instead
talk over pipe descriptors. But that proves to be our undoing.

What happens usually is this:

  - credential-cache spawns credential-cache--daemon with its
    stdout connected to a pipe

  - the client calls read_in_full() waiting for the daemon to tell us
    that it started up

  - the daemon opens the socket, then writes "ok\n" to stdout and closes
    the pipe

  - the client gets EOF on the pipe, then compares what it read to
    "ok\n", and continues (or relays an error)

But when we add the extra "git" wrapper process into the mix, we never
see that EOF. The wrapper's stdout also points to that pipe, so closing
it in the daemon process isn't enough for the client to get EOF. And the
wrapper doesn't exit, because it's waiting on the daemon.

So one hacky fix is:

diff --git a/credential-cache.c b/credential-cache.c
index 04df61cf02..9bfddaf050 100644
--- a/credential-cache.c
+++ b/credential-cache.c
@@ -51,7 +51,7 @@ static void spawn_daemon(const char *socket)

 	if (start_command(&daemon))
 		die_errno("unable to start cache daemon");
-	r = read_in_full(daemon.out, buf, sizeof(buf));
+	r = read_in_full(daemon.out, buf, 3);
 	if (r < 0)
 		die_errno("unable to read result code from cache daemon");
 	if (r != 3 || memcmp(buf, "ok\n", 3))

which makes t0301 pass. A less-hacky solution might be to loosen its
expectations not to require EOF at all (and just accept anything
starting with "ok\n". But I don't think it's worth doing either, if we
know we're going to switch it to a builtin anyway (and that also makes
me feel slightly better about the plan to do so).

I do wonder if it points to a problem in the git.c wrapper. It's
duplicating all of the descriptors that the child external commands will
see, so callers can't rely on EOF (or EPIPE for its stdin) to know when
the external program has closed them. For the most part that's OK,
because we expect them to close when the external program exits, at
which point the wrapper will exit, too. But things get weird around
half-duplex shutdowns, or programs that try to daemonize.

Perhaps git.c should be closing all descriptors after spawning the
child. Of course that gets weird if it wants to write an error to stderr
about spawning the child. I dunno. It seems as likely to introduce
problems as solve them, so if nothing is broken beyond this cache-daemon
thing, I'd just as soon leave it.

I wouldn't be surprised if older pre-builtin "upload-pack" could run
into problems. But we always called it as "git-upload-pack" back then
anyway. Possibly modern "git daemon" could suffer weirdness, as it's
still a separate program (I shied away from including it in my recent
"slimming" series exactly because I was afraid of these kinds of issues;
but it sounds like being a builtin probably has less-surprising
implications overall).

All of which is to say I'm happy to do nothing, but that turned out to
be a very interesting data point. Thanks for mentioning it. :)

-Peff

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox