Git development
 help / color / mirror / Atom feed
* [PATCH v7 03/14] replay: start using parse_options API
From: Christian Couder @ 2023-11-15 14:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Johannes Schindelin,
	Elijah Newren, John Cai, Derrick Stolee, Phillip Wood, Calvin Wan,
	Toon Claes, Dragan Simic, Linus Arver, Christian Couder
In-Reply-To: <20231115143327.2441397-1-christian.couder@gmail.com>

From: Elijah Newren <newren@gmail.com>

Instead of manually parsing arguments, let's start using the parse_options
API. This way this new builtin will look more standard, and in some
upcoming commits will more easily be able to handle more command line
options.

Note that we plan to later use standard revision ranges instead of
hardcoded "<oldbase> <branch>" arguments. When we will use standard
revision ranges, it will be easier to check if there are no spurious
arguments if we keep ARGV[0], so let's call parse_options() with
PARSE_OPT_KEEP_ARGV0 even if we don't need ARGV[0] right now to avoid
some useless code churn.

Co-authored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/replay.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/builtin/replay.c b/builtin/replay.c
index f2d8444417..afabb844d3 100644
--- a/builtin/replay.c
+++ b/builtin/replay.c
@@ -15,7 +15,7 @@
 #include "lockfile.h"
 #include "merge-ort.h"
 #include "object-name.h"
-#include "read-cache-ll.h"
+#include "parse-options.h"
 #include "refs.h"
 #include "revision.h"
 #include "sequencer.h"
@@ -92,6 +92,7 @@ static struct commit *create_commit(struct tree *tree,
 int cmd_replay(int argc, const char **argv, const char *prefix)
 {
 	struct commit *onto;
+	const char *onto_name = NULL;
 	struct commit *last_commit = NULL, *last_picked_commit = NULL;
 	struct object_id head;
 	struct lock_file lock = LOCK_INIT;
@@ -105,16 +106,32 @@ int cmd_replay(int argc, const char **argv, const char *prefix)
 	struct strbuf branch_name = STRBUF_INIT;
 	int ret = 0;
 
-	if (argc == 2 && !strcmp(argv[1], "-h")) {
-		printf("git replay --onto <newbase> <oldbase> <branch> # EXPERIMENTAL\n");
-		exit(129);
+	const char * const replay_usage[] = {
+		N_("git replay --onto <newbase> <oldbase> <branch> # EXPERIMENTAL"),
+		NULL
+	};
+	struct option replay_options[] = {
+		OPT_STRING(0, "onto", &onto_name,
+			   N_("revision"),
+			   N_("replay onto given commit")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, replay_options, replay_usage,
+			     PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN_OPT);
+
+	if (!onto_name) {
+		error(_("option --onto is mandatory"));
+		usage_with_options(replay_usage, replay_options);
 	}
 
-	if (argc != 5 || strcmp(argv[1], "--onto"))
-		die("usage: read the code, figure out how to use it, then do so");
+	if (argc != 3) {
+		error(_("bad number of arguments"));
+		usage_with_options(replay_usage, replay_options);
+	}
 
-	onto = peel_committish(argv[2]);
-	strbuf_addf(&branch_name, "refs/heads/%s", argv[4]);
+	onto = peel_committish(onto_name);
+	strbuf_addf(&branch_name, "refs/heads/%s", argv[2]);
 
 	/* Sanity check */
 	if (repo_get_oid(the_repository, "HEAD", &head))
@@ -126,6 +143,7 @@ int cmd_replay(int argc, const char **argv, const char *prefix)
 		BUG("Could not read index");
 
 	repo_init_revisions(the_repository, &revs, prefix);
+
 	revs.verbose_header = 1;
 	revs.max_parents = 1;
 	revs.cherry_mark = 1;
@@ -134,7 +152,8 @@ int cmd_replay(int argc, const char **argv, const char *prefix)
 	revs.right_only = 1;
 	revs.sort_order = REV_SORT_IN_GRAPH_ORDER;
 	revs.topo_order = 1;
-	strvec_pushl(&rev_walk_args, "", argv[4], "--not", argv[3], NULL);
+
+	strvec_pushl(&rev_walk_args, "", argv[2], "--not", argv[1], NULL);
 
 	if (setup_revisions(rev_walk_args.nr, rev_walk_args.v, &revs, NULL) > 1) {
 		ret = error(_("unhandled options"));
@@ -197,8 +216,8 @@ int cmd_replay(int argc, const char **argv, const char *prefix)
 			       &last_commit->object.oid,
 			       &last_picked_commit->object.oid,
 			       REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
-			error(_("could not update %s"), argv[4]);
-			die("Failed to update %s", argv[4]);
+			error(_("could not update %s"), argv[2]);
+			die("Failed to update %s", argv[2]);
 		}
 		if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0)
 			die(_("unable to update HEAD"));
@@ -210,8 +229,8 @@ int cmd_replay(int argc, const char **argv, const char *prefix)
 			       &last_commit->object.oid,
 			       &head,
 			       REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
-			error(_("could not update %s"), argv[4]);
-			die("Failed to update %s", argv[4]);
+			error(_("could not update %s"), argv[2]);
+			die("Failed to update %s", argv[2]);
 		}
 	}
 	ret = (result.clean == 0);
-- 
2.43.0.rc1.15.g29556bcc86


^ permalink raw reply related

* [PATCH v7 02/14] replay: introduce new builtin
From: Christian Couder @ 2023-11-15 14:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Johannes Schindelin,
	Elijah Newren, John Cai, Derrick Stolee, Phillip Wood, Calvin Wan,
	Toon Claes, Dragan Simic, Linus Arver, Christian Couder
In-Reply-To: <20231115143327.2441397-1-christian.couder@gmail.com>

From: Elijah Newren <newren@gmail.com>

For now, this is just a rename from `t/helper/test-fast-rebase.c` into
`builtin/replay.c` with minimal changes to make it build appropriately.

Let's add a stub documentation and a stub test script though.

Subsequent commits will flesh out the capabilities of the new command
and make it a more standard regular builtin.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Co-authored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 .gitignore                                    |  1 +
 Documentation/git-replay.txt                  | 39 ++++++++++++
 Makefile                                      |  2 +-
 builtin.h                                     |  1 +
 .../test-fast-rebase.c => builtin/replay.c    | 29 +++------
 command-list.txt                              |  1 +
 git.c                                         |  1 +
 t/helper/test-tool.c                          |  1 -
 t/helper/test-tool.h                          |  1 -
 t/t3650-replay-basics.sh                      | 60 +++++++++++++++++++
 t/t6429-merge-sequence-rename-caching.sh      | 27 +++------
 11 files changed, 122 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/git-replay.txt
 rename t/helper/test-fast-rebase.c => builtin/replay.c (87%)
 create mode 100755 t/t3650-replay-basics.sh

diff --git a/.gitignore b/.gitignore
index 5e56e471b3..612c0f6a0f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -135,6 +135,7 @@
 /git-remote-ext
 /git-repack
 /git-replace
+/git-replay
 /git-request-pull
 /git-rerere
 /git-reset
diff --git a/Documentation/git-replay.txt b/Documentation/git-replay.txt
new file mode 100644
index 0000000000..0349058b66
--- /dev/null
+++ b/Documentation/git-replay.txt
@@ -0,0 +1,39 @@
+git-replay(1)
+=============
+
+NAME
+----
+git-replay - EXPERIMENTAL: Replay commits on a new base, works with bare repos too
+
+
+SYNOPSIS
+--------
+[verse]
+'git replay' --onto <newbase> <oldbase> <branch> # EXPERIMENTAL
+
+DESCRIPTION
+-----------
+
+Takes a range of commits, specified by <oldbase> and <branch>, and
+replays them onto a new location (see `--onto` option below).
+
+THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
+
+OPTIONS
+-------
+
+--onto <newbase>::
+	Starting point at which to create the new commits.  May be any
+	valid commit, and not just an existing branch name.
+
+EXIT STATUS
+-----------
+
+For a successful, non-conflicted replay, the exit status is 0.  When
+the replay has conflicts, the exit status is 1.  If the replay is not
+able to complete (or start) due to some kind of error, the exit status
+is something other than 0 or 1.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 03adcb5a48..3834bc1544 100644
--- a/Makefile
+++ b/Makefile
@@ -799,7 +799,6 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-env-helper.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
-TEST_BUILTINS_OBJS += test-fast-rebase.o
 TEST_BUILTINS_OBJS += test-find-pack.o
 TEST_BUILTINS_OBJS += test-fsmonitor-client.o
 TEST_BUILTINS_OBJS += test-genrandom.o
@@ -1290,6 +1289,7 @@ BUILTIN_OBJS += builtin/remote-fd.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
+BUILTIN_OBJS += builtin/replay.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
 BUILTIN_OBJS += builtin/rev-list.o
diff --git a/builtin.h b/builtin.h
index d560baa661..28280636da 100644
--- a/builtin.h
+++ b/builtin.h
@@ -211,6 +211,7 @@ int cmd_remote(int argc, const char **argv, const char *prefix);
 int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 int cmd_remote_fd(int argc, const char **argv, const char *prefix);
 int cmd_repack(int argc, const char **argv, const char *prefix);
+int cmd_replay(int argc, const char **argv, const char *prefix);
 int cmd_rerere(int argc, const char **argv, const char *prefix);
 int cmd_reset(int argc, const char **argv, const char *prefix);
 int cmd_restore(int argc, const char **argv, const char *prefix);
diff --git a/t/helper/test-fast-rebase.c b/builtin/replay.c
similarity index 87%
rename from t/helper/test-fast-rebase.c
rename to builtin/replay.c
index 2bfab66b1b..f2d8444417 100644
--- a/t/helper/test-fast-rebase.c
+++ b/builtin/replay.c
@@ -1,17 +1,11 @@
 /*
- * "git fast-rebase" builtin command
- *
- * FAST: Forking Any Subprocesses (is) Taboo
- *
- * This is meant SOLELY as a demo of what is possible.  sequencer.c and
- * rebase.c should be refactored to use the ideas here, rather than attempting
- * to extend this file to replace those (unless Phillip or Dscho say that
- * refactoring is too hard and we need a clean slate, but I'm guessing that
- * refactoring is the better route).
+ * "git replay" builtin command
  */
 
 #define USE_THE_INDEX_VARIABLE
-#include "test-tool.h"
+#include "git-compat-util.h"
+
+#include "builtin.h"
 #include "cache-tree.h"
 #include "commit.h"
 #include "environment.h"
@@ -27,7 +21,8 @@
 #include "sequencer.h"
 #include "setup.h"
 #include "strvec.h"
-#include "tree.h"
+#include <oidset.h>
+#include <tree.h>
 
 static const char *short_commit_name(struct commit *commit)
 {
@@ -94,7 +89,7 @@ static struct commit *create_commit(struct tree *tree,
 	return (struct commit *)obj;
 }
 
-int cmd__fast_rebase(int argc, const char **argv)
+int cmd_replay(int argc, const char **argv, const char *prefix)
 {
 	struct commit *onto;
 	struct commit *last_commit = NULL, *last_picked_commit = NULL;
@@ -110,14 +105,8 @@ int cmd__fast_rebase(int argc, const char **argv)
 	struct strbuf branch_name = STRBUF_INIT;
 	int ret = 0;
 
-	/*
-	 * test-tool stuff doesn't set up the git directory by default; need to
-	 * do that manually.
-	 */
-	setup_git_directory();
-
 	if (argc == 2 && !strcmp(argv[1], "-h")) {
-		printf("Sorry, I am not a psychiatrist; I can not give you the help you need.  Oh, you meant usage...\n");
+		printf("git replay --onto <newbase> <oldbase> <branch> # EXPERIMENTAL\n");
 		exit(129);
 	}
 
@@ -136,7 +125,7 @@ int cmd__fast_rebase(int argc, const char **argv)
 	if (repo_read_index(the_repository) < 0)
 		BUG("Could not read index");
 
-	repo_init_revisions(the_repository, &revs, NULL);
+	repo_init_revisions(the_repository, &revs, prefix);
 	revs.verbose_header = 1;
 	revs.max_parents = 1;
 	revs.cherry_mark = 1;
diff --git a/command-list.txt b/command-list.txt
index 54b2a50f5f..c4cd0f352b 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -160,6 +160,7 @@ git-reflog                              ancillarymanipulators           complete
 git-remote                              ancillarymanipulators           complete
 git-repack                              ancillarymanipulators           complete
 git-replace                             ancillarymanipulators           complete
+git-replay                              plumbingmanipulators
 git-request-pull                        foreignscminterface             complete
 git-rerere                              ancillaryinterrogators
 git-reset                               mainporcelain           history
diff --git a/git.c b/git.c
index c67e44dd82..7068a184b0 100644
--- a/git.c
+++ b/git.c
@@ -594,6 +594,7 @@ static struct cmd_struct commands[] = {
 	{ "remote-fd", cmd_remote_fd, NO_PARSEOPT },
 	{ "repack", cmd_repack, RUN_SETUP },
 	{ "replace", cmd_replace, RUN_SETUP },
+	{ "replay", cmd_replay, RUN_SETUP },
 	{ "rerere", cmd_rerere, RUN_SETUP },
 	{ "reset", cmd_reset, RUN_SETUP },
 	{ "restore", cmd_restore, RUN_SETUP | NEED_WORK_TREE },
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 876cd2dc31..37ba996539 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -30,7 +30,6 @@ static struct test_cmd cmds[] = {
 	{ "dump-untracked-cache", cmd__dump_untracked_cache },
 	{ "env-helper", cmd__env_helper },
 	{ "example-decorate", cmd__example_decorate },
-	{ "fast-rebase", cmd__fast_rebase },
 	{ "find-pack", cmd__find_pack },
 	{ "fsmonitor-client", cmd__fsmonitor_client },
 	{ "genrandom", cmd__genrandom },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 70dd4eba11..8a1a7c63da 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -24,7 +24,6 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__dump_reftable(int argc, const char **argv);
 int cmd__env_helper(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
-int cmd__fast_rebase(int argc, const char **argv);
 int cmd__find_pack(int argc, const char **argv);
 int cmd__fsmonitor_client(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
diff --git a/t/t3650-replay-basics.sh b/t/t3650-replay-basics.sh
new file mode 100755
index 0000000000..36c1b5082a
--- /dev/null
+++ b/t/t3650-replay-basics.sh
@@ -0,0 +1,60 @@
+#!/bin/sh
+
+test_description='basic git replay tests'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+GIT_AUTHOR_NAME=author@name
+GIT_AUTHOR_EMAIL=bogus@email@address
+export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
+
+test_expect_success 'setup' '
+	test_commit A &&
+	test_commit B &&
+
+	git switch -c topic1 &&
+	test_commit C &&
+	git switch -c topic2 &&
+	test_commit D &&
+	test_commit E &&
+	git switch topic1 &&
+	test_commit F &&
+	git switch -c topic3 &&
+	test_commit G &&
+	test_commit H &&
+	git switch -c topic4 main &&
+	test_commit I &&
+	test_commit J &&
+
+	git switch -c next main &&
+	test_commit K &&
+	git merge -m "Merge topic1" topic1 &&
+	git merge -m "Merge topic2" topic2 &&
+	git merge -m "Merge topic3" topic3 &&
+	>evil &&
+	git add evil &&
+	git commit --amend &&
+	git merge -m "Merge topic4" topic4 &&
+
+	git switch main &&
+	test_commit L &&
+	test_commit M &&
+
+	git switch -c conflict B &&
+	test_commit C.conflict C.t conflict
+'
+
+test_expect_success 'using replay to rebase two branches, one on top of other' '
+	git switch main &&
+
+	git replay --onto main topic1 topic2 >result &&
+
+	git log --format=%s $(cut -f 3 -d " " result) >actual &&
+	test_write_lines E D M L B A >expect &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh
index 75d3fd2dba..7670b72008 100755
--- a/t/t6429-merge-sequence-rename-caching.sh
+++ b/t/t6429-merge-sequence-rename-caching.sh
@@ -71,9 +71,8 @@ test_expect_success 'caching renames does not preclude finding new ones' '
 
 		git switch upstream &&
 
-		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git replay --onto HEAD upstream~1 topic &&
 		git reset --hard topic &&
-		#git cherry-pick upstream~1..topic
 
 		git ls-files >tracked-files &&
 		test_line_count = 2 tracked-files &&
@@ -141,8 +140,7 @@ test_expect_success 'cherry-pick both a commit and its immediate revert' '
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test-tool fast-rebase --onto HEAD upstream~1 topic &&
-		#git cherry-pick upstream~1..topic &&
+		git replay --onto HEAD upstream~1 topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
 		test_line_count = 1 calls
@@ -200,9 +198,8 @@ test_expect_success 'rename same file identically, then reintroduce it' '
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git replay --onto HEAD upstream~1 topic &&
 		git reset --hard topic &&
-		#git cherry-pick upstream~1..topic &&
 
 		git ls-files >tracked &&
 		test_line_count = 2 tracked &&
@@ -278,9 +275,8 @@ test_expect_success 'rename same file identically, then add file to old dir' '
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git replay --onto HEAD upstream~1 topic &&
 		git reset --hard topic &&
-		#git cherry-pick upstream~1..topic &&
 
 		git ls-files >tracked &&
 		test_line_count = 4 tracked &&
@@ -356,8 +352,7 @@ test_expect_success 'cached dir rename does not prevent noticing later conflict'
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test_must_fail test-tool fast-rebase --onto HEAD upstream~1 topic >output &&
-		#git cherry-pick upstream..topic &&
+		test_must_fail git replay --onto HEAD upstream~1 topic >output &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
 		test_line_count = 2 calls
@@ -456,9 +451,8 @@ test_expect_success 'dir rename unneeded, then add new file to old dir' '
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git replay --onto HEAD upstream~1 topic &&
 		git reset --hard topic &&
-		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
 		test_line_count = 2 calls &&
@@ -523,9 +517,8 @@ test_expect_success 'dir rename unneeded, then rename existing file into old dir
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git replay --onto HEAD upstream~1 topic &&
 		git reset --hard topic &&
-		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
 		test_line_count = 3 calls &&
@@ -626,9 +619,8 @@ test_expect_success 'caching renames only on upstream side, part 1' '
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git replay --onto HEAD upstream~1 topic &&
 		git reset --hard topic &&
-		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
 		test_line_count = 1 calls &&
@@ -685,9 +677,8 @@ test_expect_success 'caching renames only on upstream side, part 2' '
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git replay --onto HEAD upstream~1 topic &&
 		git reset --hard topic &&
-		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
 		test_line_count = 2 calls &&
-- 
2.43.0.rc1.15.g29556bcc86


^ permalink raw reply related

* [PATCH v7 00/14] Introduce new `git replay` command
From: Christian Couder @ 2023-11-15 14:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Johannes Schindelin,
	Elijah Newren, John Cai, Derrick Stolee, Phillip Wood, Calvin Wan,
	Toon Claes, Dragan Simic, Linus Arver, Christian Couder
In-Reply-To: <20231102135151.843758-1-christian.couder@gmail.com>

# Intro

`git replay` has initially been developed entirely by Elijah Newren
mostly last year (2022) at:

https://github.com/newren/git/commits/replay

I took over this year to polish and upstream it as GitLab is
interested in replacing libgit2, and for that purpose needs a command
to do server side (so without using a worktree) rebases, cherry-picks
and reverts.

I reduced the number of commits and features in this patch series,
compared to what Elijah already developed. Especially I stopped short
of replaying merge commits and replaying interactively. These and
other features might be upstreamed in the future after this patch
series has graduated.

The focus in this series is to make it a good plumbing command that
can already be used server side and that replaces the "fast-rebase"
test-tool command. So things to make it easier to use on the command
line, and more advanced features (like replaying merges) are left out.

It looks like GitHub has actually already been using version 3 of this
patch series in production with good results. See:

https://github.blog/2023-07-27-scaling-merge-ort-across-github/
https://lore.kernel.org/git/304f2a49-5e05-7655-9f87-2011606df5db@gmx.de/

# Content of this cover letter

The "Quick Overview" and "Reasons for diverging from cherry-pick &
rebase" sections just below are describing the purpose of the new
command in the big scheme of things. They are taken from Elijah's
design notes
(https://github.com/newren/git/blob/replay/replay-design-notes.txt)
and describe what we want this command to become and the reasons for
that, not what the command is after only this patch series. Also these
design notes were written at least one year ago, so parts of those 2
sections are not true anymore. I have added Phillip Wood's or Felipe
Contreras' notes (thanks to them) where that's the case, but some now
flawed parts may have missed.

After these two sections, starting with the "Important limitations"
section, you will find sections describing what is actually in this
patch series.

More interesting material is available in Elijah's design notes like
an "Intro via examples"
(https://github.com/newren/git/blob/replay/replay-design-notes.txt#L37-L132),
a discussion about "Preserving topology, replaying merges"
(https://github.com/newren/git/blob/replay/replay-design-notes.txt#L264-L341)
and a "Current status" section describing Elijah's work
(https://github.com/newren/git/blob/replay/replay-design-notes.txt#L344-L392)
before I started working on upstreaming it.

I have not included this material here though, as the documentation
added by this patch series for the `git replay` command already
includes an "EXAMPLES" section, and other sections of Elijah's design
notes might not be interesting for now. Also this cover letter is
already pretty long.  But reviewers can refer to the links above if
they think it can help.

# Quick Overview (from Elijah's design notes)

`git replay`, at a basic level, can perhaps be thought of as a
"default-to-dry-run rebase" -- meaning no updates to the working tree,
or to the index, or to any references.  However, it differs from
rebase in that it:

  * Works for branches that aren't checked out

  * Works in a bare repository

  * Can replay multiple branches simultaneously (with or without common
    history in the range being replayed)

  * Preserves relative topology by default (merges are replayed too in
    Elijah's original work, not in this series)

  * Focuses on performance

  * Has several altered defaults as a result of the above

I sometimes think of `git replay` as "fast-replay", a patch-based
analogue to the snapshot-based fast-export & fast-import tools.

# Reasons for diverging from cherry-pick & rebase (from Elijah's
  design notes)

There are multiple reasons to diverge from the defaults in cherry-pick and
rebase.

* Server side needs

  * Both cherry-pick and rebase, via the sequencer, are heavily tied
    to updating the working tree, index, some refs, and a lot of
    control files with every commit replayed, and invoke a mess of
    hooks[1] that might be hard to avoid for backward compatibility
    reasons (at least, that's been brought up a few times on the
    list).

  * cherry-pick and rebase both fork various subprocesses
    unnecessarily, but somewhat intrinsically in part to ensure the
    same hooks are called that old scripted implementations would have
    called.

    Note: since 356ee4659bb (sequencer: try to commit without forking
    'git commit', 2017-11-24) cherry-pick and rebase do not fork
    subprocesses other than hooks for the cases covered by this patch
    series (i.e. they do not fork "git commit" for simple picks).

  * "Dry run" behavior, where there are no updates to worktree, index,
    or even refs might be important.

  * Should not assume users only want to operate on HEAD (see next
    section)

* Decapitate HEAD-centric assumptions

  * cherry-pick forces commits to be played on top of HEAD;
    inflexible.

  * rebase assumes the range of commits to be replayed is
    upstream..HEAD by default, though it allows one to replay
    upstream..otherbranch -- but it still forcibly and needlessly
    checks out 'otherbranch' before starting to replay things.

    Note: since 767a9c417eb (rebase -i: stop checking out the tip of
    the branch to rebase, 2020-01-24) it's not true that rebase
    forcibly and needlessly checks out 'otherbranch'.

  * Assuming HEAD is involved severely limits replaying multiple
    (possibly divergent) branches.

    Note: since 89fc0b53fdb (rebase: update refs from 'update-ref'
    commands, 2022-07-19) the sequencer can update multiple
    branches. The issue with divergent branch is with command line
    arguments and the todo list generation rather than the
    capabilities of the sequencer.

  * Once you stop assuming HEAD has a certain meaning, there's not
    much reason to have two separate commands anymore (except for the
    funny extra not-necessarily-compatible options both have gained
    over time).

  * (Micro issue: Assuming HEAD is involved also makes it harder for
    new users to learn what rebase means and does; it makes command
    lines hard to parse.  Not sure I want to harp on this too much, as
    I have a suspicion I might be creating a tool for experts with
    complicated use cases, but it's a minor quibble.)

* Performance

  * jj is slaughtering us on rebase speed[2].  I would like us to become
    competitive.  (I dropped a few comments in the link at [2] about why
    git is currently so bad.)

  * From [3], there was a simple 4-patch series in linux.git that took
    53 seconds to rebase.  Switching to ort dropped it to 16 seconds.
    While that sounds great, only 11 *milliseconds* were needed to do
    the actual merges.  That means almost *all* the time (>99%) was
    overhead!  Big offenders:

    * --reapply-cherry-picks should be the default

    * can_fast_forward() should be ripped out, and perhaps other extraneous
      revision walks

      Note: d42c9ffa0f (rebase: factor out branch_base calculation,
      2022-10-17) might already deal with that (according to Felipe
      Contreras).

    * avoid updating working tree, index, refs, reflogs, and control
      structures except when needed (e.g. hitting a conflict, or operation
      finished)

  * Other performance ideas (mostly for future work, not in this
    series)

    * single-file control structures instead of directory of files
      (when doing interactive things which is in Elijah's original
      work, but not in this series)

    * avoid forking subprocesses unless explicitly requested (e.g.
      --exec, --strategy, --run-hooks).  For example, definitely do not
      invoke `git commit` or `git merge`.

    * Sanitize hooks:

      * dispense with all per-commit hooks for sure (pre-commit,
        post-commit, post-checkout).

      * pre-rebase also seems to assume exactly 1 ref is written, and
        invoking it repeatedly would be stupid.  Plus, it's specific
        to "rebase".  So...ignore?  (Stolee's --ref-update option for
        rebase probably broke the pre-rebase assumptions already...)

      * post-rewrite hook might make sense, but fast-import got
        exempted, and I think of replay like a patch-based analogue
        to the snapshot-based fast-import.

    * When not running server side, resolve conflicts in a sparse-cone
      sparse-index worktree to reduce number of files written to a
      working tree.  (See below as well.)

    * [High risk of possible premature optimization] Avoid large
      numbers of newly created loose objects, when replaying large
      numbers of commits.  Two possibilities: (1) Consider using
      tmp-objdir and pack objects from the tmp-objdir at end of
      exercise, (2) Lift code from git-fast-import to immediately
      stuff new objects into a pack?

* Multiple branches and non-checked out branches

  * The ability to operate on non-checked out branches also implies
    that we should generally be able to replay when in a dirty working
    tree (exception being when we expect to update HEAD and any of the
    dirty files is one that needs to be updated by the replay).

  * Also, if we are operating locally on a non-checked out branch and
    hit a conflict, we should have a way to resolve the conflict
    without messing with the user's work on their current
    branch. (This is not is this patch series though.)

    * Idea: new worktree with sparse cone + sparse index checkout,
      containing only files in the root directory, and whatever is
      necessary to get the conflicts

    * Companion to above idea: control structures should be written to
      $GIT_COMMON_DIR/replay-${worktree}, so users can have multiple
      replay sessions, and so we know which worktrees are associated
      with which replay operations.

  - [1] https://lore.kernel.org/git/pull.749.v3.git.git.1586044818132.gitgitgadget@gmail.com/
  - [2] https://github.com/martinvonz/jj/discussions/49
  - [3] https://lore.kernel.org/git/CABPp-BE48=97k_3tnNqXPjSEfA163F8hoE+HY0Zvz1SWB2B8EA@mail.gmail.com/

# Important limitations

* The code exits with code 1 if there are any conflict. No
  resumability. No nice output. No interactivity. No special exit code
  depending on the reason.

* When a commit becomes empty as it is replayed, it is still replayed
  as an empty commit, instead of being dropped.

* No replaying merges, nor root commits. Only regular commits.

* Signed commits are not properly handled. It's not clear what to do
  to such commits when replaying on the server side.

* Notes associated with replayed commits are not updated nor
  duplicated. (Thanks to Phillip Wood for noticing.)

# Commit overview

* 1/14 t6429: remove switching aspects of fast-rebase

    Preparatory commit to make it easier to later replace the
    fast-rebase test-tool by `git replay` without breaking existing
    tests.

* 2/14 replay: introduce new builtin

    This creates a minimal `git replay` command by moving the code
    from the `fast-rebase` test helper from `t/helper/` into
    `builtin/` and doing some renames and a few other needed changes.
    Since v6, there are only a few doc improvements as suggested by
    Dscho.

* - 3/14 replay: start using parse_options API
  - 4/14 replay: die() instead of failing assert()
  - 5/14 replay: introduce pick_regular_commit()
  - 6/14 replay: change rev walking options
  - 7/14 replay: add an important FIXME comment about gpg signing
  - 8/14 replay: remove progress and info output
  - 9/14 replay: remove HEAD related sanity check

    These slowly change the command to make it behave more like
    regular commands and to start cleaning up its output. In patch
    6/14 (replay: change rev walking options) there are some changes
    compared to v6 as suggested by Elijah and Dscho. We are now
    setting the rev walking bits we want before the call to
    setup_revisions(). And then after that call we check if these bits
    have been changed, and if that's the case we warn that we are
    going to override those changes and we override the bits.

* 10/14 replay: make it a minimal server side command

    After the cleaning up in previous commits, it's now time to
    radically change the way it works by stopping it to do ref
    updates, to update the index and worktree, to consider HEAD as
    special. Instead just make it output commands that should be
    passed to `git update-ref --stdin`.

* 11/14 replay: use standard revision ranges

    Start adding new interesting features and also documentation and
    tests, as the interface of the command is cristalizing into its
    final form. Since v6 "Takes a range of commits" has been replaced
    with "Takes ranges of commits" to reflect the fact that it can
    accept more than one <revision-range>.

* - 12/14 replay: add --advance or 'cherry-pick' mode
  - 13/14 replay: add --contained to rebase contained branches

    Add new options and features to the command.

* 14/14 replay: stop assuming replayed branches do not diverge

    This adds another interesting feature, as well as related
    documentation and tests.

# Notes about `fast-rebase`, tests and documentation

The `fast-rebase` test-tool helper was developed by Elijah to
experiment with a rebasing tool that would be developed from scratch
based on his merge-ort work, could be used to test that merge-ort
work, and would not have the speed and interface limitations of `git
rebase` or `git cherry-pick`.

This `fast-rebase` helper was used before this series in:

t6429-merge-sequence-rename-caching.sh

So when `git replay` is created from `fast-rebase` in patch 2/14, the
t6429 test script is also converted to use `git replay`. This ensures
that `git replay` doesn't break too badly during the first 10 patches
in this patch series.

Tests and documentation are introduced specifically for `git replay`
as soon as patch 2/14, but they are not much improved since around
11/14 as it doesn't make much sense to document and test behavior that
we know is going to change soon.

# Possibly controversial issues 

* bare or not bare: this series works towards a plumbing command with
  the end goal of it being usable and used first on bare repos,
  contrary to existing commands like `git rebase` and `git
  cherry-pick`. The tests check that the command works on both bare
  and non-bare repo though.

* exit status: a successful, non-conflicted replay exits with code
  0. When the replay has conflicts, the exit status is 1. If the
  replay is not able to complete (or start) due to some kind of error,
  the exit status is something other than 0 or 1. There are a few
  tests checking that. It has been suggested in an internal review
  that conflicts might want to get a more specific error code as an
  error code of 1 might be quite easy to return by accident. It
  doesn't seem to me from their docs (which might want to be improved,
  I didn't look at the code) that other commands like `git merge` and
  `git rebase` exit with a special error code in case of conflict.

* make worktree and index changes optional: commit 10/14 stops
  updating the index and worktree, but it might be better especially
  for cli users to make that optional. The issue is that this would
  make the command more complex while we are developing a number of
  important features so that the command can be used on bare repos. It
  seems that this should rather be done in an iterative improvement
  after the important features have landed.

* --advance and --contained: these two advanced options might not
  belong to this first series and could perhaps be added in a followup
  series in separate commits. On the other hand the code for
  --contained seems involved with the code of --advance and it's nice
  to see soon that git replay can indeed do cherry-picking and rebase
  many refs at once, and this way fullfil these parts of its promise.

* replaying diverging branches: 14/14 the last patch in the series,
  which allow replaying diverging branches, can be seen as a
  fundamental fix or alternatively as adding an interesting
  feature. So it's debatable if it should be in its own patch along
  with its own tests as in this series, or if it should be merged into
  a previous patch and which one.

* only 2 patches: this patch series can be seen from a high level
  point of view as 1) introducing the new `git replay` command, and 2)
  using `git replay` to replace, and get rid of, the fast-rebase
  test-tool command. The fact that not much of the original
  fast-rebase code and interface is left would agree with that point
  of view. On the other hand, fast-rebase can also be seen as a first
  iteration towards `git replay`. So it can also make sense to see how
  `git replay` evolved from it.

# Changes between v6 and v7

Thanks to Dscho, Linus Arver and Dragan Simic for their suggestions on
the previous version! The few changes compared to v6 are:

* The patch series was rebased onto master at dadef801b3 (Git
  2.43-rc1, 2023-11-08). This is to make it stand on a possibly more
  stable base.

* In patch 2/14 (replay: introduce new builtin), there are a few doc
  improvements as suggested by Dscho, as we now talk about the
  <oldbase> and <branch> arguments in the description.

* In patch 6/14 (replay: change rev walking options), as suggested by
  Elijah and Dscho, we are now setting the rev walking bits we want
  before the call to setup_revisions(). And then after that call we
  check if these bits have been changed, and if that's the case we
  warn that we are going to override those changes and we override the
  bits.

* In patch 11/14 (replay: use standard revision ranges), "Takes a
  range of commits" has been replaced with "Takes ranges of commits"
  to reflect the fact that it can accept more than one
  <revision-range>.

CI tests seem to pass according to:

https://github.com/chriscool/git/actions/runs/6878406898

(Except the "sparse" test, but failure doesn't seem to be related. And
sorry I stopped waiting for the MacOS and ASAN tests to finish after
23 minutes.)

# Range-diff between v6 and v7

(Sorry it looks like patch 6/14 in v7 is considered to be completely
different from what it was in v6, so the range-diff is not showing
differences between them.)

 1:  fac0a9dff4 =  1:  cddcd967b2 t6429: remove switching aspects of fast-rebase
 2:  bec2eb8928 !  2:  c8476fb093 replay: introduce new builtin
    @@ Documentation/git-replay.txt (new)
     +DESCRIPTION
     +-----------
     +
    -+Takes a range of commits and replays them onto a new location.
    ++Takes a range of commits, specified by <oldbase> and <branch>, and
    ++replays them onto a new location (see `--onto` option below).
     +
     +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
     +
 3:  b0cdfdc0c3 =  3:  43322abd1e replay: start using parse_options API
 4:  c3403f0b9d =  4:  6524c7f045 replay: die() instead of failing assert()
 5:  4188eeac30 =  5:  05d0efa3cb replay: introduce pick_regular_commit()
 6:  b7b4d9001e <  -:  ---------- replay: change rev walking options
 -:  ---------- >  6:  c7a5aad3d6 replay: change rev walking options
 7:  c57577a9b8 =  7:  01f35f924b replay: add an important FIXME comment about gpg signing
 8:  e78be50f3d =  8:  1498b24bad replay: remove progress and info output
 9:  e4c79b676f =  9:  6786fc147b replay: remove HEAD related sanity check
10:  8d89f1b733 ! 10:  9a24dbb530 replay: make it a minimal server side command
    @@ Commit message
         Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
     
      ## Documentation/git-replay.txt ##
    -@@ Documentation/git-replay.txt: SYNOPSIS
    - DESCRIPTION
    +@@ Documentation/git-replay.txt: DESCRIPTION
      -----------
      
    --Takes a range of commits and replays them onto a new location.
    -+Takes a range of commits and replays them onto a new location. Leaves
    -+the working tree and the index untouched, and updates no
    -+references. The output of this command is meant to be used as input to
    + Takes a range of commits, specified by <oldbase> and <branch>, and
    +-replays them onto a new location (see `--onto` option below).
    ++replays them onto a new location (see `--onto` option below). Leaves
    ++the working tree and the index untouched, and updates no references.
    ++The output of this command is meant to be used as input to
     +`git update-ref --stdin`, which would update the relevant branches.
      
      THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
    @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix
        struct merge_result result;
     -  struct strbuf reflog_msg = STRBUF_INIT;
        struct strbuf branch_name = STRBUF_INIT;
    -   int i, ret = 0;
    +   int ret = 0;
      
     @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix)
        onto = peel_committish(onto_name);
11:  3d433a1322 ! 11:  ad6ca2fbef replay: use standard revision ranges
    @@ Documentation/git-replay.txt: git-replay - EXPERIMENTAL: Replay commits on a new
      
      DESCRIPTION
      -----------
    -@@ Documentation/git-replay.txt: DESCRIPTION
    - Takes a range of commits and replays them onto a new location. Leaves
    - the working tree and the index untouched, and updates no
    - references. The output of this command is meant to be used as input to
    + 
    +-Takes a range of commits, specified by <oldbase> and <branch>, and
    +-replays them onto a new location (see `--onto` option below). Leaves
    ++Takes ranges of commits and replays them onto a new location. Leaves
    + the working tree and the index untouched, and updates no references.
    + The output of this command is meant to be used as input to
     -`git update-ref --stdin`, which would update the relevant branches.
     +`git update-ref --stdin`, which would update the relevant branches
     +(see the OUTPUT section below).
    @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix
        struct merge_options merge_opt;
        struct merge_result result;
     -  struct strbuf branch_name = STRBUF_INIT;
    -   int i, ret = 0;
    +   int ret = 0;
      
        const char * const replay_usage[] = {
     -          N_("git replay --onto <newbase> <oldbase> <branch> # EXPERIMENTAL"),
    @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix
     -  strvec_pushl(&rev_walk_args, "", argv[2], "--not", argv[1], NULL);
     -
        /*
    -    * TODO: For now, let's warn when we see an option that we are
    -    * going to override after setup_revisions() below. In the
    +    * Set desired values for rev walking options here. If they
    +    * are changed by some user specified option in setup_revisions()
     @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix)
    -                   warning(_("option '%s' will be overridden"), argv[i]);
    -   }
    +   revs.topo_order = 1;
    +   revs.simplify_history = 0;
      
     -  if (setup_revisions(rev_walk_args.nr, rev_walk_args.v, &revs, NULL) > 1) {
     -          ret = error(_("unhandled options"));
    @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix
        }
      
     @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix)
    -   revs.topo_order = 1;
    -   revs.simplify_history = 0;
    +           revs.simplify_history = 0;
    +   }
      
     -  strvec_clear(&rev_walk_args);
     -
12:  cca8105382 ! 12:  081864ed5f replay: add --advance or 'cherry-pick' mode
    @@ builtin/replay.c: static struct commit *pick_regular_commit(struct commit *pickm
        struct merge_options merge_opt;
        struct merge_result result;
     +  struct strset *update_refs = NULL;
    -   int i, ret = 0;
    +   int ret = 0;
      
        const char * const replay_usage[] = {
     -          N_("git replay --onto <newbase> <revision-range>... # EXPERIMENTAL"),
    @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix
      
        /*
     @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix)
    -   revs.topo_order = 1;
    -   revs.simplify_history = 0;
    +           revs.simplify_history = 0;
    +   }
      
     +  determine_replay_mode(&revs.cmdline, onto_name, &advance_name,
     +                        &onto, &update_refs);
13:  92287a2cc8 ! 13:  19c4016c7c replay: add --contained to rebase contained branches
    @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix
        struct rev_info revs;
        struct commit *last_commit = NULL;
     @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix)
    -   int i, ret = 0;
    +   int ret = 0;
      
        const char * const replay_usage[] = {
     -          N_("git replay (--onto <newbase> | --advance <branch>) <revision-range>... # EXPERIMENTAL"),
14:  529a7fda40 ! 14:  29556bcc86 replay: stop assuming replayed branches do not diverge
    @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix
        struct merge_result result;
        struct strset *update_refs = NULL;
     +  kh_oid_map_t *replayed_commits;
    -   int i, ret = 0;
    +   int ret = 0;
      
        const char * const replay_usage[] = {
     @@ builtin/replay.c: int cmd_replay(int argc, const char **argv, const char *prefix)


Elijah Newren (14):
  t6429: remove switching aspects of fast-rebase
  replay: introduce new builtin
  replay: start using parse_options API
  replay: die() instead of failing assert()
  replay: introduce pick_regular_commit()
  replay: change rev walking options
  replay: add an important FIXME comment about gpg signing
  replay: remove progress and info output
  replay: remove HEAD related sanity check
  replay: make it a minimal server side command
  replay: use standard revision ranges
  replay: add --advance or 'cherry-pick' mode
  replay: add --contained to rebase contained branches
  replay: stop assuming replayed branches do not diverge

 .gitignore                               |   1 +
 Documentation/git-replay.txt             | 127 +++++++
 Makefile                                 |   2 +-
 builtin.h                                |   1 +
 builtin/replay.c                         | 445 +++++++++++++++++++++++
 command-list.txt                         |   1 +
 git.c                                    |   1 +
 t/helper/test-fast-rebase.c              | 241 ------------
 t/helper/test-tool.c                     |   1 -
 t/helper/test-tool.h                     |   1 -
 t/t3650-replay-basics.sh                 | 198 ++++++++++
 t/t6429-merge-sequence-rename-caching.sh |  45 ++-
 12 files changed, 800 insertions(+), 264 deletions(-)
 create mode 100644 Documentation/git-replay.txt
 create mode 100644 builtin/replay.c
 delete mode 100644 t/helper/test-fast-rebase.c
 create mode 100755 t/t3650-replay-basics.sh


base-commit: dadef801b365989099a9929e995589e455c51fed
prerequisite-patch-id: 6df236178013b77ca82d22653c1ab78477e081ce
-- 
2.43.0.rc1.15.g29556bcc86


^ permalink raw reply

* [PATCH v7 01/14] t6429: remove switching aspects of fast-rebase
From: Christian Couder @ 2023-11-15 14:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Johannes Schindelin,
	Elijah Newren, John Cai, Derrick Stolee, Phillip Wood, Calvin Wan,
	Toon Claes, Dragan Simic, Linus Arver, Christian Couder
In-Reply-To: <20231115143327.2441397-1-christian.couder@gmail.com>

From: Elijah Newren <newren@gmail.com>

At the time t6429 was written, merge-ort was still under development,
did not have quite as many tests, and certainly was not widely deployed.
Since t6429 was exercising some codepaths just a little differently, we
thought having them also test the "merge_switch_to_result()" bits of
merge-ort was useful even though they weren't intrinsic to the real
point of these tests.

However, the value provided by doing extra testing of the
"merge_switch_to_result()" bits has decreased a bit over time, and it's
actively making it harder to refactor `test-tool fast-rebase` into `git
replay`, which we are going to do in following commits.  Dispense with
these bits.

Co-authored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/helper/test-fast-rebase.c              | 9 +--------
 t/t6429-merge-sequence-rename-caching.sh | 9 +++++++--
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index cac20a72b3..2bfab66b1b 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -194,7 +194,7 @@ int cmd__fast_rebase(int argc, const char **argv)
 		last_commit = create_commit(result.tree, commit, last_commit);
 	}
 
-	merge_switch_to_result(&merge_opt, head_tree, &result, 1, !result.clean);
+	merge_finalize(&merge_opt, &result);
 
 	if (result.clean < 0)
 		exit(128);
@@ -213,9 +213,6 @@ int cmd__fast_rebase(int argc, const char **argv)
 		}
 		if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0)
 			die(_("unable to update HEAD"));
-
-		prime_cache_tree(the_repository, the_repository->index,
-				 result.tree);
 	} else {
 		fprintf(stderr, "\nAborting: Hit a conflict.\n");
 		strbuf_addf(&reflog_msg, "rebase progress up to %s",
@@ -228,10 +225,6 @@ int cmd__fast_rebase(int argc, const char **argv)
 			die("Failed to update %s", argv[4]);
 		}
 	}
-	if (write_locked_index(&the_index, &lock,
-			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die(_("unable to write %s"), get_index_file());
-
 	ret = (result.clean == 0);
 cleanup:
 	strbuf_release(&reflog_msg);
diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh
index d02fa16614..75d3fd2dba 100755
--- a/t/t6429-merge-sequence-rename-caching.sh
+++ b/t/t6429-merge-sequence-rename-caching.sh
@@ -72,6 +72,7 @@ test_expect_success 'caching renames does not preclude finding new ones' '
 		git switch upstream &&
 
 		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git reset --hard topic &&
 		#git cherry-pick upstream~1..topic
 
 		git ls-files >tracked-files &&
@@ -200,6 +201,7 @@ test_expect_success 'rename same file identically, then reintroduce it' '
 		export GIT_TRACE2_PERF &&
 
 		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git reset --hard topic &&
 		#git cherry-pick upstream~1..topic &&
 
 		git ls-files >tracked &&
@@ -277,6 +279,7 @@ test_expect_success 'rename same file identically, then add file to old dir' '
 		export GIT_TRACE2_PERF &&
 
 		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git reset --hard topic &&
 		#git cherry-pick upstream~1..topic &&
 
 		git ls-files >tracked &&
@@ -356,8 +359,6 @@ test_expect_success 'cached dir rename does not prevent noticing later conflict'
 		test_must_fail test-tool fast-rebase --onto HEAD upstream~1 topic >output &&
 		#git cherry-pick upstream..topic &&
 
-		grep CONFLICT..rename/rename output &&
-
 		grep region_enter.*diffcore_rename trace.output >calls &&
 		test_line_count = 2 calls
 	)
@@ -456,6 +457,7 @@ test_expect_success 'dir rename unneeded, then add new file to old dir' '
 		export GIT_TRACE2_PERF &&
 
 		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git reset --hard topic &&
 		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
@@ -522,6 +524,7 @@ test_expect_success 'dir rename unneeded, then rename existing file into old dir
 		export GIT_TRACE2_PERF &&
 
 		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git reset --hard topic &&
 		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
@@ -624,6 +627,7 @@ test_expect_success 'caching renames only on upstream side, part 1' '
 		export GIT_TRACE2_PERF &&
 
 		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git reset --hard topic &&
 		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
@@ -682,6 +686,7 @@ test_expect_success 'caching renames only on upstream side, part 2' '
 		export GIT_TRACE2_PERF &&
 
 		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		git reset --hard topic &&
 		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
-- 
2.43.0.rc1.15.g29556bcc86


^ permalink raw reply related

* Re: [PATCH] commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
From: Patrick Steinhardt @ 2023-11-15 13:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Karthik Nayak
In-Reply-To: <20231114194310.GC2092538@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 5369 bytes --]

On Tue, Nov 14, 2023 at 02:43:10PM -0500, Jeff King wrote:
> On Wed, Nov 15, 2023 at 01:51:43AM +0900, Junio C Hamano wrote:
> 
> > >> Both of these are expected failures: we knowingly corrupt the repository
> > >> and circumvent git-gc(1)/git-maintenance(1), thus no commit-graphs are
> > >> updated. If we stick with the new stance that repository corruption
> > >> should not require us to pessimize the common case,...
> > >
> > > Yeah, just like we try to be extra careful while running fsck,
> > > because "--missing" is about finding these "corrupt" cases,
> > > triggering the paranoia mode upon seeing the option would make
> > > sense, no?  It would fix the failure in 6022, right?
> > >
> > > Thanks for working on this.
> > 
> > Just to make sure we do not miscommunicate, I do not think we want
> > to trigger the paranoia mode only in our tests.  We want to be
> > paranoid to help real users who used "--missing" for their real use,
> > so enabling PARANOIA in the test script is a wrong approach.  We
> > should enable it inside "rev-list --missing" codepath.
> 
> Yeah. Just like we auto-enabled GIT_REF_PARANOIA for git-gc, etc, I
> think we should do the same here.

I'm honestly still torn on this one. There are two cases that I can
think of where missing objects would be benign and where one wants to
use `git rev-list --missing`:

    - Repositories with promisor remotes, to find the boundary of where
      we need to fetch new objects.

    - Quarantine directories where you only intend to list new objects
      or find the boundary.

And in neither of those cases I can see a path for how the commit-graph
would contain such missing commits when using regular tooling to perform
repository maintenance.

So I'm still not sure why we think that this case is so much more
special than others. If a user wants to check for repository corruption
the tool shouldn't be `git rev-list --missing`, but git-fsck(1). To me,
the former is only useful in very specific circumstances where the user
knows what they are doing, and in none of the usecases I can think of
should we have a stale commit-graph _unless_ we have actual repository
corruption.

In reverse, to me this means that `--missing` is no more special than
any of the other low-level tooling, where our stance seems to be "We
assume that the repository is not corrupt". In that spirit, I'd argue
that the same default value should apply here as for all the other
cases.

But based on the discussion it very much feels like I'm alone with this
train of thought, which may indicate that I'm missing a quintessential
part of your arguments. May just as well be that I'm too focussed on the
usecases we at GitLab have for the new `--missing` behaviour around
commits that Karthik has just introduced.

Oh, well. I'll wait for answers to this reply until tomorrow, and if I
still haven't been able to convince anybody then I'll assume it's just
me and adapt accordingly :)

> As we are closing in on the v2.43 release, there's one thing I'm not
> sure about regarding release planning. Are these test cases that _used_
> to detect the corruption, but now don't? I.e., I would have expected
> that disabling GIT_COMMIT_GRAPH_PARANOIA would take us back to the same
> state as v2.42. But I think it doesn't because of the hunk in e04838ea82
> (commit-graph: introduce envvar to disable commit existence checks,
> 2023-10-31) that makes the has_object() call conditional (and now
> defaults to off).
> 
> What I'm getting as it that I think we have three options for v2.43:
> 
>   1. Ship what has been in the release candidates, which has a known
>      performance regression (though the severity is up for debate).

This seems like the best option for now in my opinion. The new behaviour
is not a bug, quite on the contrary, even though it is slower.

As Junio once said, we are not a "performance is king" project [1]. This
has burnt itself into my mind, and funny enough it was in the vicinity
of the change where I originally introduced the other object existence
check into `lookup_commit_in_graph()`.

[1]: <xmqqr1i1t6zl.fsf@gitster.g>

>   2. Flip the default to "0" (i.e., Patrick's patch in this thread). We
>      know that loosens some cases versus 2.42, which may be considered a
>      regression.

If we consider this to be a regression then I'd rather want to drop this
patch completely and leave it be. Ultimately, the question is how much
we trust our tooling to keep the commit-graph up-to-date, and whether or
not we need to account for corrupted repositories.

I for myself do trust the tooling, otherwise I wouldn't have sent this
patch. But I'm also happy to accept the current status where we are
being more thorough at the cost of performance.

>   3. Sort it out before the release. We're getting pretty close to do
>      a lot new work there, but I think the changes should be small-ish.
>      The nuclear option is ejecting the topic and re-doing it in the
>      next cycle.

I would be comfortable with this option if we simply switch the default
without adding special-casing for specific options like `--missing`. But
otherwise I'd rather not rush such a change.

Patrick

> I don't have a really strong preference between the three.
> 
> -Peff

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Shallow clones becomes not-shallow when cloning to a different drive
From: Tobias Schlüter @ 2023-11-15 13:12 UTC (permalink / raw)
  To: git


Hi,

I reported this on github but I was asked to take it here [*].  I had to 
deal with the following scenario today (on Windows, but not OS-specific):

$ git clone  --single-branch --branch feature/bla --depth 200 
~/source/Repos/bla ${PATH_ON_DOFFERENT_DRIVE}/bla
Cloning into './bla'...
warning: --depth is ignored in local clones; use file:// instead.
... run out of disk space because it does a full clone ...

Doing something different than the user asked for, and in a way that can 
lead to dangerous scenarios, is a bad choice.  In this case I was 
copying to a USB stick, but it could also be a system drive that runs 
out of space.

Additionally, the suggestion to use "file://" instead turned out to be 
impractical, as it is very slow.  I started the operation with 
"file://..." replacing the repository path on the command line before my 
lunch break.  When I came back it had copied a mere 100MB.

I would suggest to not imply "--local" when copying to a different 
device, and I would suggest to avoid doing something different than what 
the user asked for.  In this case I specifically asked for a shallow 
copy to save resources, the logic that a local copy using hard links 
actually saves more resources simply didn't apply.  Additionally, I 
would suggest investigating potential performance issues in the case of 
a shallow clone with file:// paths.

Please consider changing these choices and defaults.

Best regards,
- Tobi

[*] https://github.com/git-for-windows/git/issues/4693

^ permalink raw reply

* Re: [RFC PATCH 2/2] rev-list: add --ours/--theirs options
From: Junio C Hamano @ 2023-11-15 13:03 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: git
In-Reply-To: <20231115120417.1327259-2-vegard.nossum@oracle.com>

Vegard Nossum <vegard.nossum@oracle.com> writes:

> Add --ours and --theirs to view the commits from
>
>   $(git merge-base HEAD MERGE_HEAD)..HEAD and
>   $(git merge-base HEAD MERGE_HEAD)..MERGE_HEAD

The range you wrote with "merge-base" would not work well, when
there are multiple merge bases between HEAD and MERGE_HEAD.  

Just saying

	MERGE_HEAD..HEAD and
	HEAD..MERGE_HEAD

or even simpler, saying

	MERGE_HEAD.. and
	..MERGE_HEAD

would be sufficient, simpler, and more importantly, more correct.


^ permalink raw reply

* tb/merge-tree-write-pack, was Re: What's cooking in git.git (Nov 2023, #04; Thu, 9)
From: Johannes Schindelin @ 2023-11-15 12:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jeff King, git
In-Reply-To: <xmqqpm0iy00y.fsf@gitster.g>

Hi Junio,

On Fri, 10 Nov 2023, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
>
> > On Thu, Nov 09, 2023 at 02:40:28AM +0900, Junio C Hamano wrote:
> >> * tb/merge-tree-write-pack (2023-10-23) 5 commits
> > ...
> > This series received a couple of LGTMs from you and Patrick:
> >
> >   - https://lore.kernel.org/git/xmqqo7go7w63.fsf@gitster.g/#t
> >   - https://lore.kernel.org/git/ZTjKmcV5c_EFuoGo@tanuki/
>
> Yup, I am aware of them.
>
> > Johannes had posted some comments[1] about instead using a temporary
> > object store where objects are written as loose that would extend to git
> > replay....
>
> I was hoping to hear from Johannes saying he agrees with the above.
> It is not strictly required, but is much nice to have once we hear
> "let's step back a bit---are we going in the right direction?" and
> it has been responded.

When I wrote about `tmp_objdir`, there were a couple of things going on in
my mind:

- First of all, I was hesitant to write this at all because I knew that I
  lack the time to engage meaningfully in any follow-up discussion.

- To be honest, the approach to teach `merge-ort.c` anything about whether
  objects are written loosely or streamed into a pack strikes me as
  somewhat contrary to the goal of separating concerns. The merge
  machinery should not know, in my mind, how the objects are stored.

- A long-standing paradigm in Git is that pack files are not used until
  finalized. Breaking such a paradigm after being in effect for a long
  time, in my experience, is always followed by unwelcome "gifts that keep
  on giving".

- The streaming pack approach struck me as something that would only work
  properly if Git was designed with single-process operations in mind. But
  Git was originally designed around the process-proliferating Unix
  philosophy, and it is deeply ingrained in Git to this day. As such, I do
  not expect the streaming pack approach to generalize to a noteworthy
  fraction of Git operations, and I would love to focus on an approach
  that generalizes better.

- At the Git Contributor Summit, I had talked about my goals, and Elijah
  helpfully pointed out how `--remerge-diff` does it, and I wanted to
  pursue that idea further.

- The scenario I want to address (and that I assumed the patch series
  under discussion tried to address, too) is a very specific, server-side
  scenario where many `merge-tree`/`replay` runs produce _many_ loose
  objects. Quite a fraction of those are produced by processes that run
  into a SIGTERM-enforced timeout, and the `tmp_objdir` approach would
  naturally help: unneeded loose objects would not even be added to the
  primary object database but be reaped with the temporary object
  databases.

- While it may sound as if the sheer number of loose objects is the
  primary problem, an even more pressing issue I need to address is that
  competing processes that try to work on a snapshot of the loose objects
  (which does not exist, you cannot "take a snapshot", all you can do is
  to enumerate the directories sequentially) seem sometimes to process
  loose tree/commit objects that reference other objects that have been
  missed due to racy reads/writes/enumerations. The reason for this is
  that the loose objects produced by `merge-tree`/`replay` are added
  non-transactionally, and concurrent reads are prone to run into racy
  conditions where they only see a part of those objects.

- Even just using `tmp_objdir_migrate()` could help a lot by narrowing the
  window for those racy conditions.

- The number of inodes has been a concern, yes, but not such a pressing
  one that I could afford spending any further thought on the idea to
  reduce them. In any case, a working theory is that this concern would
  already be helped by avoiding the loose objects produced by failing
  merges/rebases (whose results are not used) or by merges/rebases running
  into a timeout.

- Streaming packs, if I understand correctly, do not do deltas. That in
  and of itself can cause file size issues, and light-weight maintenance
  may not even bother to try finding deltas, thereby causing follow-on
  problems.

With all this in mind, I do not think that I can affort to spend brain
cycles on the streaming-pack approach. I do not intend to discourage
anybody from working on that approach, yet I won't encourage anyone,
either.

Ciao,
Johannes

^ permalink raw reply

* [RFC PATCH 2/2] rev-list: add --ours/--theirs options
From: Vegard Nossum @ 2023-11-15 12:04 UTC (permalink / raw)
  To: git; +Cc: Vegard Nossum
In-Reply-To: <20231115120417.1327259-1-vegard.nossum@oracle.com>

When resolving merge conflicts, it is useful to be able to inspect the
commits on either side of the attempted merge. git log/rev-list already
have the --merge option, which shows these commits; however, this doesn't
tell you which side each commit appears on.

Add --ours and --theirs to view the commits from

  $(git merge-base HEAD MERGE_HEAD)..HEAD and
  $(git merge-base HEAD MERGE_HEAD)..MERGE_HEAD

respectively.

I didn't see any existing tests for git rev-list --merge, but I've used
t/t6417-merge-ours-theirs.sh as a template for the new tests, which also
include --merge.

Since git diff/diff-files have their own --ours/--theirs parsing and
handling, we need a mechanism to prevent the generic revision parsing
from consuming these arguments. The mechanism I came up with was adding
a new flag, 'ignore_ours_theirs', to 'struct setup_revision_opt' that
these two commands must set. It's admittedly not extremely elegant, but
I didn't see a better solution.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
---
 Documentation/rev-list-options.txt |  8 ++++++
 builtin/diff-files.c               |  9 +++++-
 builtin/diff.c                     | 10 ++++++-
 revision.c                         | 16 ++++++++---
 revision.h                         |  6 ++--
 t/t6440-rev-list-merge.sh          | 45 ++++++++++++++++++++++++++++++
 6 files changed, 86 insertions(+), 8 deletions(-)
 create mode 100755 t/t6440-rev-list-merge.sh

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 2bf239ff03..1a75420190 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -344,6 +344,14 @@ Under `--pretty=reference`, this information will not be shown at all.
 	After a failed merge, show refs that touch files having a
 	conflict and don't exist on all heads to merge.
 
+--ours::
+	The same as --merge, but only show commits from "our branch"
+	(`HEAD`).
+
+--theirs::
+	The same as --merge, but only show commits from "their branch"
+	(`MERGE_HEAD`).
+
 --boundary::
 	Output excluded boundary commits. Boundary commits are
 	prefixed with `-`.
diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index f38912cd40..64f3b1d284 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -20,6 +20,7 @@ COMMON_DIFF_OPTIONS_HELP;
 
 int cmd_diff_files(int argc, const char **argv, const char *prefix)
 {
+	struct setup_revision_opt opt = { 0 };
 	struct rev_info rev;
 	int result;
 	unsigned options = 0;
@@ -43,7 +44,13 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
 
 	prefix = precompose_argv_prefix(argc, argv, prefix);
 
-	argc = setup_revisions(argc, argv, &rev, NULL);
+	/*
+	 * We have our own handling for --ours/--theirs, so don't
+	 * restrict the revision ranges/paths.
+	 */
+	opt.ignore_ours_theirs = 1;
+
+	argc = setup_revisions(argc, argv, &rev, &opt);
 	while (1 < argc && argv[1][0] == '-') {
 		if (!strcmp(argv[1], "--base"))
 			rev.max_count = 1;
diff --git a/builtin/diff.c b/builtin/diff.c
index 55e7d21755..ab2388b5bc 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -393,6 +393,7 @@ static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
 int cmd_diff(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	struct setup_revision_opt opt = { 0 };
 	struct rev_info rev;
 	struct object_array ent = OBJECT_ARRAY_INIT;
 	int first_non_parent = -1;
@@ -499,7 +500,14 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 
 	if (nongit)
 		die(_("Not a git repository"));
-	argc = setup_revisions(argc, argv, &rev, NULL);
+
+	/*
+	 * We have our own handling for --ours/--theirs, so don't
+	 * restrict the revision ranges/paths.
+	 */
+	opt.ignore_ours_theirs = 1;
+
+	argc = setup_revisions(argc, argv, &rev, &opt);
 	if (!rev.diffopt.output_format) {
 		rev.diffopt.output_format = DIFF_FORMAT_PATCH;
 		diff_setup_done(&rev.diffopt);
diff --git a/revision.c b/revision.c
index 00d5c29bfc..070b5dd73b 100644
--- a/revision.c
+++ b/revision.c
@@ -1978,8 +1978,10 @@ static void prepare_show_merge(struct rev_info *revs)
 	if (repo_get_oid(the_repository, "MERGE_HEAD", &oid))
 		die("--merge without MERGE_HEAD?");
 	other = lookup_commit_or_die(&oid, "MERGE_HEAD");
-	add_pending_object(revs, &head->object, "HEAD");
-	add_pending_object(revs, &other->object, "MERGE_HEAD");
+	if (revs->show_merge_ours)
+		add_pending_object(revs, &head->object, "HEAD");
+	if (revs->show_merge_theirs)
+		add_pending_object(revs, &other->object, "MERGE_HEAD");
 	bases = repo_get_merge_bases(the_repository, head, other);
 	add_rev_cmdline_list(revs, bases, REV_CMD_MERGE_BASE, UNINTERESTING | BOTTOM);
 	add_pending_commit_list(revs, bases, UNINTERESTING | BOTTOM);
@@ -2231,6 +2233,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	const char *optarg = NULL;
 	int argcount;
 	const unsigned hexsz = the_hash_algo->hexsz;
+	int parse_ours_theirs = !(opt && opt->ignore_ours_theirs);
 
 	/* pseudo revision arguments */
 	if (!strcmp(arg, "--all") || !strcmp(arg, "--branches") ||
@@ -2324,7 +2327,12 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->def = argv[1];
 		return 2;
 	} else if (!strcmp(arg, "--merge")) {
-		revs->show_merge = 1;
+		revs->show_merge_ours = 1;
+		revs->show_merge_theirs = 1;
+	} else if (parse_ours_theirs && !strcmp(arg, "--ours")) {
+		revs->show_merge_ours = 1;
+	} else if (parse_ours_theirs && !strcmp(arg, "--theirs")) {
+		revs->show_merge_theirs = 1;
 	} else if (!strcmp(arg, "--topo-order")) {
 		revs->sort_order = REV_SORT_IN_GRAPH_ORDER;
 		revs->topo_order = 1;
@@ -2982,7 +2990,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		revs->def = opt ? opt->def : NULL;
 	if (opt && opt->tweak)
 		opt->tweak(revs);
-	if (revs->show_merge)
+	if (revs->show_merge_ours | revs->show_merge_theirs)
 		prepare_show_merge(revs);
 	if (revs->def && !revs->pending.nr && !revs->rev_input_given) {
 		struct object_id oid;
diff --git a/revision.h b/revision.h
index 94c43138bc..63effe69c7 100644
--- a/revision.h
+++ b/revision.h
@@ -253,7 +253,8 @@ struct rev_info {
 	int		show_notes;
 	unsigned int	shown_one:1,
 			shown_dashes:1,
-			show_merge:1,
+			show_merge_ours:1,
+			show_merge_theirs:1,
 			show_notes_given:1,
 			show_notes_by_default:1,
 			show_signature:1,
@@ -436,7 +437,8 @@ void repo_init_revisions(struct repository *r,
 struct setup_revision_opt {
 	const char *def;
 	void (*tweak)(struct rev_info *);
-	unsigned int	assume_dashdash:1,
+	unsigned int	ignore_ours_theirs:1,
+			assume_dashdash:1,
 			allow_exclude_promisor_objects:1,
 			free_removed_argv_elements:1;
 	unsigned revarg_opt;
diff --git a/t/t6440-rev-list-merge.sh b/t/t6440-rev-list-merge.sh
new file mode 100755
index 0000000000..d1d6af6f31
--- /dev/null
+++ b/t/t6440-rev-list-merge.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+test_description='git rev-list --merge/--ours/--theirs'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success setup '
+	test_write_lines initial >file &&
+	git add file &&
+	git commit -m initial &&
+
+	git checkout -b ours main &&
+	test_write_lines ours >file &&
+	git commit -a -m ours &&
+
+	git checkout -b theirs main &&
+	test_write_lines theirs >file &&
+	git commit -a -m theirs &&
+
+	git checkout ours^0 &&
+	test_must_fail git merge theirs &&
+
+	INITIAL=$(git rev-parse main) &&
+	OURS=$(git rev-parse ours) &&
+	THEIRS=$(git rev-parse theirs)
+'
+
+test_expect_success 'git rev-list --merge' '
+	git rev-parse $OURS $THEIRS >expected &&
+	git rev-list --merge >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git rev-list --ours' '
+	test $OURS = $(git rev-list --ours)
+'
+
+test_expect_success 'git rev-list --theirs' '
+	test $THEIRS = $(git rev-list --theirs)
+'
+
+test_done
-- 
2.34.1


^ permalink raw reply related

* [RFC PATCH 1/2] diff: add tests for git diff --merge/--ours/--theirs
From: Vegard Nossum @ 2023-11-15 12:04 UTC (permalink / raw)
  To: git; +Cc: Vegard Nossum

These options don't seem to have any tests currently and the next
patch in this series changes how these options are parsed. Add tests.

Based loosely on t/t6417-merge-ours-theirs.sh.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
---
 t/t4070-diff-merge.sh | 79 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100755 t/t4070-diff-merge.sh

diff --git a/t/t4070-diff-merge.sh b/t/t4070-diff-merge.sh
new file mode 100755
index 0000000000..01ac82f0c4
--- /dev/null
+++ b/t/t4070-diff-merge.sh
@@ -0,0 +1,79 @@
+#!/bin/sh
+
+test_description='git diff --merge/--ours/--theirs'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success setup '
+	test_write_lines initial >file &&
+	git add file &&
+	git commit -m initial &&
+
+	git checkout -b ours main &&
+	test_write_lines ours >file &&
+	git commit -a -m ours &&
+
+	git checkout -b theirs main &&
+	test_write_lines theirs >file &&
+	git commit -a -m theirs &&
+
+	git checkout ours^0 &&
+	test_must_fail git merge theirs &&
+
+	INITIAL=$(git rev-parse main) &&
+	OURS=$(git rev-parse ours) &&
+	THEIRS=$(git rev-parse theirs)
+'
+
+test_expect_success 'git diff --merge' '
+	git diff --merge | grep -v ^index >actual &&
+	cat >expect <<-\EOF &&
+	diff --cc file
+	--- a/file
+	+++ b/file
+	@@@ -1,1 -1,1 +1,1 @@@
+	- theirs
+	 -initial
+	++ours
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'git diff --ours' '
+	git diff --ours | grep -v ^index >actual &&
+	cat >expect <<-\EOF &&
+	* Unmerged path file
+	diff --git a/file b/file
+	--- a/file
+	+++ b/file
+	@@ -1 +1,5 @@
+	+<<<<<<< HEAD
+	 ours
+	+=======
+	+theirs
+	+>>>>>>> theirs
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'git diff --theirs' '
+	git diff --theirs | grep -v ^index >actual &&
+	cat >expect <<-\EOF &&
+	* Unmerged path file
+	diff --git a/file b/file
+	--- a/file
+	+++ b/file
+	@@ -1 +1,5 @@
+	+<<<<<<< HEAD
+	+ours
+	+=======
+	 theirs
+	+>>>>>>> theirs
+	EOF
+	test_cmp expect actual
+'
+
+test_done
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH v2 5/5] http: reset CURLOPT_POSTFIELDSIZE_LARGE between requests
From: Patrick Steinhardt @ 2023-11-15  6:44 UTC (permalink / raw)
  To: Jiří Hruška; +Cc: git, Jeff King, Jonathan Tan, Junio C Hamano
In-Reply-To: <CAGE_+C5pnASOsrDr4ehNj-deYbSTr=pRgPcWqq5VSQs-Y08ttQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2714 bytes --]

On Tue, Nov 14, 2023 at 07:34:55PM -0800, Jiří Hruška wrote:
> `get_active_slot()` makes sure that the reused cURL handles it gives
> out are as good as fresh ones, by resetting all options that other code
> might have set on them back to defaults.
> 
> But this does not apply to `CURLOPT_POSTFIELDSIZE_LARGE` yet, which can
> stay set from a previous request. For example, an earlier probe request
> with just a flush packet "0000" leaves it set to 4.
> 
> The problem seems harmless in practice, but it can be confusing to see
> a negative amount of remaining bytes to upload when inspecting libcurl
> internals while debugging networking-related issues, for example.
> 
> So reset also this option to its default value (which is -1, not 0).
> 
> Signed-off-by: Jiri Hruska <jirka@fud.cz>
> ---
>  http.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/http.c b/http.c
> index 8f71bf00d8..14f2fbb82e 100644
> --- a/http.c
> +++ b/http.c
> @@ -1454,6 +1454,7 @@ struct active_request_slot *get_active_slot(void)
>  	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> +	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, (curl_off_t)-1);
>  	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
>  	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
>  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);

It feels quite easy for this list to grow stale whenever we start to set
a new option somewhere else. Is there a specific reason why we can't
instead use `curl_easy_reset()` here? Quoting its description:

> Re-initializes all options previously set on a specified CURL handle
> to the default values. This puts back the handle to the same state as
> it was in when it was just created with curl_easy_init.
> 
> It does not change the following information kept in the handle: live
> connections, the Session ID cache, the DNS cache, the cookies, the
> shares or the alt-svc cache.

From my naive point of view it sounds like exactly what we're after.
Most of the code in question was introduced in 9094950d73 (http: prevent
segfault during curl handle reuse, 2006-05-31), where we used to support
libcurl at least back to v7.7. `curl_easy_reset()` on the other hand had
only been introduced with v7.12.1 of libcurl, so maybe that's the reason
why it's not used here?

I dunno, might as well be that there is a good reason why we don't use
it here. But if we can, then I'd argue it would be a great cleanup to
convert to `curl_easy_reset()` here instead of piling onto the list of
options.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 5/5] http: reset CURLOPT_POSTFIELDSIZE_LARGE between requests
From: Jiří Hruška @ 2023-11-15  3:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Jonathan Tan, Junio C Hamano
In-Reply-To: <20231115033121.939-1-jirka@fud.cz>

`get_active_slot()` makes sure that the reused cURL handles it gives
out are as good as fresh ones, by resetting all options that other code
might have set on them back to defaults.

But this does not apply to `CURLOPT_POSTFIELDSIZE_LARGE` yet, which can
stay set from a previous request. For example, an earlier probe request
with just a flush packet "0000" leaves it set to 4.

The problem seems harmless in practice, but it can be confusing to see
a negative amount of remaining bytes to upload when inspecting libcurl
internals while debugging networking-related issues, for example.

So reset also this option to its default value (which is -1, not 0).

Signed-off-by: Jiri Hruska <jirka@fud.cz>
---
 http.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/http.c b/http.c
index 8f71bf00d8..14f2fbb82e 100644
--- a/http.c
+++ b/http.c
@@ -1454,6 +1454,7 @@ struct active_request_slot *get_active_slot(void)
 	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
 	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
 	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
+	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, (curl_off_t)-1);
 	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
 	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
 	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
-- 
2.42.1.5.g2f21867bd5

^ permalink raw reply related

* [PATCH v2 4/5] remote-curl: simplify rpc_out() - less nesting and rename
From: Jiří Hruška @ 2023-11-15  3:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Jonathan Tan, Junio C Hamano
In-Reply-To: <20231115033121.939-1-jirka@fud.cz>

- Remove one indentation level in `rpc_out()`, as the conditions can be
  combined into just one `if` statement without losing readability.

  Skipping the resetting of `initial_buffer`/`len`/`pos` revealed a bug
  in `stateless_connect()`, where `rpc.len` is reset to 0 after each
  request, but not `rpc.pos`. Relying on `rpc_out()` always doing this
  before has never been safe (it might have not finished cleanly, for
  example). So better reset it there, just like `rpc.len`.

- Rename `flush_read_but_not_sent` to `read_from_out_done`. The name is
  slightly misleading, because the "flush" might never be really "sent"
  (depends on `write_line_lengths`), and this is not the most important
  part anyway. The primary role of the flag is rather to signal that
  `read_from_out()` is "done" and must not be called for this particular
  RPC exchange anymore.

- Update/add some related comments.

Signed-off-by: Jiri Hruska <jirka@fud.cz>
---
 remote-curl.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 690df2a43e..d5aa66a44c 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -606,13 +606,14 @@ struct rpc_state {
 	unsigned write_line_lengths : 1;

 	/*
-	 * Used by rpc_out; initialize to 0. This is true if a flush has been
-	 * read, but the corresponding line length (if write_line_lengths is
-	 * true) and EOF have not been sent to libcurl. Since each flush marks
-	 * the end of a request, each flush must be completely sent before any
-	 * further reading occurs.
+	 * Used by rpc_out; initialize to 0. This is true if a flush packet
+	 * has been read from the child process, signaling the end of the
+	 * current data to send. There might be still some bytes pending in
+	 * 'buf' (e.g. the corresponding line length, if write_line_lengths
+	 * is true), but no more reads can be performed on the 'out' pipe as
+	 * part of the current RPC exchange.
 	 */
-	unsigned flush_read_but_not_sent : 1;
+	unsigned read_from_out_done : 1;
 };

 #define RPC_STATE_INIT { 0 }
@@ -690,21 +691,29 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 	size_t avail = rpc->len - rpc->pos;
 	enum packet_read_status status;

-	if (!avail) {
+	/*
+	 * If there is no more data available in our buffer and the child
+	 * process is not done sending yet, read the next packet.
+	 */
+	if (!avail && !rpc->read_from_out_done) {
 		rpc->initial_buffer = 0;
 		rpc->len = 0;
 		rpc->pos = 0;
-		if (!rpc->flush_read_but_not_sent) {
-			if (!rpc_read_from_out(rpc, 0, &avail, &status))
-				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
-			if (status == PACKET_READ_FLUSH)
-				rpc->flush_read_but_not_sent = 1;
-		}
+		if (!rpc_read_from_out(rpc, 0, &avail, &status))
+			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
+
 		/*
-		 * If flush_read_but_not_sent is true, we have already read one
-		 * full request but have not fully sent it + EOF, which is why
-		 * we need to refrain from reading.
+		 * If a flush packet was read, it means the child process is
+		 * done sending this request. The buffer might be fully empty
+		 * at this point or contain a flush packet too, depending on
+		 * rpc->write_line_lengths.
+		 * In any case, we must refrain from reading any more, because
+		 * the child process already expects to receive a response back
+		 * instead. If both sides would try to read at once, they would
+		 * just hang waiting for each other.
 		 */
+		if (status == PACKET_READ_FLUSH)
+			rpc->read_from_out_done = 1;
 	}

 	/*
@@ -967,7 +976,7 @@ static int post_rpc(struct rpc_state *rpc, int
stateless_connect, int flush_rece
 		 */
 		headers = curl_slist_append(headers, "Transfer-Encoding: chunked");
 		rpc->initial_buffer = 1;
-		rpc->flush_read_but_not_sent = 0;
+		rpc->read_from_out_done = 0;
 		curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
 		curl_easy_setopt(slot->curl, CURLOPT_INFILE, rpc);
 		curl_easy_setopt(slot->curl, CURLOPT_SEEKFUNCTION, rpc_seek);
@@ -1487,7 +1496,7 @@ static int stateless_connect(const char *service_name)
 	rpc.gzip_request = 1;
 	rpc.initial_buffer = 0;
 	rpc.write_line_lengths = 1;
-	rpc.flush_read_but_not_sent = 0;
+	rpc.read_from_out_done = 0;

 	/*
 	 * Dump the capability listing that we got from the server earlier
@@ -1510,6 +1519,7 @@ static int stateless_connect(const char *service_name)
 			break;
 		/* Reset the buffer for next request */
 		rpc.len = 0;
+		rpc.pos = 0;
 	}

 	free(rpc.service_url);
-- 
2.42.1.5.g2f21867bd5

^ permalink raw reply related

* [PATCH v2 3/5] remote-curl: simplify rpc_out() - remove superfluous ifs
From: Jiří Hruška @ 2023-11-15  3:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Jonathan Tan, Junio C Hamano
In-Reply-To: <20231115033121.939-1-jirka@fud.cz>

Remove the second set of nested `if` statements and an early `return`
in `rpc_out()`, because they accomplish nothing. The rest of the
function would behave the same without any branching.

Signed-off-by: Jiri Hruska <jirka@fud.cz>
---
 remote-curl.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 428dd70aa1..690df2a43e 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -706,25 +706,13 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 		 * we need to refrain from reading.
 		 */
 	}
-	if (rpc->flush_read_but_not_sent) {
-		if (!avail) {
-			/*
-			 * The line length either does not need to be sent at
-			 * all or has already been completely sent. Now we can
-			 * return 0, indicating EOF, meaning that the flush has
-			 * been fully sent. It is important to keep returning 0
-			 * as long as needed in that case, as libcurl invokes
-			 * the callback multiple times at EOF sometimes.
-			 */
-			return 0;
-		}
-		/*
-		 * If avail is non-zero, the line length for the flush still
-		 * hasn't been fully sent. Proceed with sending the line
-		 * length.
-		 */
-	}

+	/*
+	 * Copy data to the provided buffer. If there is nothing more to send,
+	 * nothing gets written and the return value is 0 (EOF).
+	 * It is important to keep returning 0 as long as needed in that case,
+	 * as libcurl invokes the callback multiple times at EOF sometimes.
+	 */
 	if (max < avail)
 		avail = max;
 	memcpy(ptr, rpc->buf + rpc->pos, avail);
-- 
2.42.1.5.g2f21867bd5

^ permalink raw reply related

* [PATCH v2 1/5] remote-curl: avoid hang if curl asks for more data after eof
From: Jiří Hruška @ 2023-11-15  3:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Jonathan Tan, Junio C Hamano
In-Reply-To: <20231115033121.939-1-jirka@fud.cz>

It has been observed that under some circumstances, libcurl can call
our `CURLOPT_READFUNCTION` callback `rpc_out()` again even after
already getting a return value of zero (EOF) back once before.

Because `rpc->flush_read_but_not_sent` is reset to false immediately
the first time an EOF is returned, the repeated call goes again to
`rpc_read_from_out()`, which tries to read more from the child process
pipe and the whole operation gets stuck - the child process is already
trying to read a response back and will not write anything to the
output pipe anymore, while the parent/remote process is now blocked
waiting to read more too and never even finishes sending the request.

The bug is fixed by moving the reset of the `flush_read_but_not_sent`
flag to `post_rpc()`, only before `rpc_out()` would be potentially used
the next time. This makes the callback behave like fread() and return
a zero any number of times at the end of a finished upload.

Signed-off-by: Jiri Hruska <jirka@fud.cz>
---
 remote-curl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index ef05752ca5..199c4615a5 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -705,9 +705,10 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 			 * The line length either does not need to be sent at
 			 * all or has already been completely sent. Now we can
 			 * return 0, indicating EOF, meaning that the flush has
-			 * been fully sent.
+			 * been fully sent. It is important to keep returning 0
+			 * as long as needed in that case, as libcurl invokes
+			 * the callback multiple times at EOF sometimes.
 			 */
-			rpc->flush_read_but_not_sent = 0;
 			return 0;
 		}
 		/*
@@ -963,6 +964,7 @@ static int post_rpc(struct rpc_state *rpc, int
stateless_connect, int flush_rece
 		 */
 		headers = curl_slist_append(headers, "Transfer-Encoding: chunked");
 		rpc->initial_buffer = 1;
+		rpc->flush_read_but_not_sent = 0;
 		curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
 		curl_easy_setopt(slot->curl, CURLOPT_INFILE, rpc);
 		curl_easy_setopt(slot->curl, CURLOPT_SEEKFUNCTION, rpc_seek);
-- 
2.42.1.5.g2f21867bd5

^ permalink raw reply related

* [PATCH v2 2/5] remote-curl: improve readability of curl callbacks
From: Jiří Hruška @ 2023-11-15  3:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Jonathan Tan, Junio C Hamano
In-Reply-To: <20231115033121.939-1-jirka@fud.cz>

The "user data" or "context" argument of libcurl streaming callbacks
is sometimes called `clientp` and sometimes `buffer_`. The latter is
especially confusing, because there is an actual buffer pointer
argument passed to the same functions.

- Make the argument consistently named `userdata` everywhere, just
  like the official cURL documentation calls it.

- Also add comments to all the callbacks, to make it easier to grasp
  what is the "in" and what is the "out" direction in this code.

Signed-off-by: Jiri Hruska <jirka@fud.cz>
---
 remote-curl.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 199c4615a5..428dd70aa1 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -675,11 +675,18 @@ static int rpc_read_from_out(struct rpc_state
*rpc, int options,
 	return 1;
 }

+/*
+ * CURLOPT_READFUNCTION callback, called by libcurl when it wants more data
+ * to send out. Used only if the request did not fit into just one buffer and
+ * data must be streamed as it comes.
+ * Has the same semantics as fread(), but reads packets from the pipe from
+ * the child process instead. A return value of 0 (EOF) finishes the upload.
+ */
 static size_t rpc_out(void *ptr, size_t eltsize,
-		size_t nmemb, void *buffer_)
+		size_t nmemb, void *userdata)
 {
 	size_t max = eltsize * nmemb;
-	struct rpc_state *rpc = buffer_;
+	struct rpc_state *rpc = userdata;
 	size_t avail = rpc->len - rpc->pos;
 	enum packet_read_status status;

@@ -725,9 +732,16 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 	return avail;
 }

-static int rpc_seek(void *clientp, curl_off_t offset, int origin)
+/*
+ * CURLOPT_SEEKFUNCTION callback, called by libcurl when it wants to seek in
+ * the data being sent out. Used only if the request did not fit into just
+ * one buffer and data must be streamed as it comes.
+ * Has the same semantics as fseek(), but seeks in the buffered packet read
+ * from the pipe from the child process instead.
+ */
+static int rpc_seek(void *userdata, curl_off_t offset, int origin)
 {
-	struct rpc_state *rpc = clientp;
+	struct rpc_state *rpc = userdata;

 	if (origin != SEEK_SET)
 		BUG("rpc_seek only handles SEEK_SET, not %d", origin);
@@ -797,14 +811,15 @@ struct rpc_in_data {
 };

 /*
- * A callback for CURLOPT_WRITEFUNCTION. The return value is the bytes consumed
- * from ptr.
+ * CURLOPT_WRITEFUNCTION callback, called when more received data has come in.
+ * Has the same semantics as fwrite(), but writes packets to the pipe to the
+ * child process instead. The return value is the bytes consumed from ptr.
  */
 static size_t rpc_in(char *ptr, size_t eltsize,
-		size_t nmemb, void *buffer_)
+		size_t nmemb, void *userdata)
 {
 	size_t size = eltsize * nmemb;
-	struct rpc_in_data *data = buffer_;
+	struct rpc_in_data *data = userdata;
 	long response_code;

 	if (curl_easy_getinfo(data->slot->curl, CURLINFO_RESPONSE_CODE,
-- 
2.42.1.5.g2f21867bd5

^ permalink raw reply related

* [PATCH v2 0/5] Avoid hang if curl needs eof twice + minor related improvements
From: Jiří Hruška @ 2023-11-15  3:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Jonathan Tan, Junio C Hamano
In-Reply-To: <CAGE_+C6DJMAO0bj5QHoKBBV3gMEMtZ-ajJ9ZnDGYq6eorr-oig@mail.gmail.com>

Proposed changes split into several commits for clarity

Jiri Hruska (5):
  remote-curl: avoid hang if curl asks for more data after eof
  remote-curl: improve readability of curl callbacks
  remote-curl: simplify rpc_out() - remove superfluous ifs
  remote-curl: simplify rpc_out() - less nesting and rename
  http: reset CURLOPT_POSTFIELDSIZE_LARGE between requests

 http.c        |  1 +
 remote-curl.c | 99 +++++++++++++++++++++++++++++----------------------
 2 files changed, 58 insertions(+), 42 deletions(-)

-- 
2.42.1.5.g2f21867bd5

^ permalink raw reply

* Re: [PATCH] commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
From: Jeff King @ 2023-11-15  1:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Karthik Nayak
In-Reply-To: <xmqq4jhnyhe1.fsf@gitster.g>

On Wed, Nov 15, 2023 at 09:44:38AM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > What I'm getting as it that I think we have three options for v2.43:
> >
> >   1. Ship what has been in the release candidates, which has a known
> >      performance regression (though the severity is up for debate).
> >
> >   2. Flip the default to "0" (i.e., Patrick's patch in this thread). We
> >      know that loosens some cases versus 2.42, which may be considered a
> >      regression.
> >
> >   3. Sort it out before the release. We're getting pretty close to do
> >      a lot new work there, but I think the changes should be small-ish.
> >      The nuclear option is ejecting the topic and re-doing it in the
> >      next cycle.
> >
> > I don't have a really strong preference between the three.
> 
> I've been (naively) assuming that #1 is everybody's preference,
> simply because #2 does introduce a regression in the correctness
> department (as opposed to a possible performance regression caused
> by #1), and because #3 has a high risk of screwing up.
> 
> As long as the performance regression is known and on our radar,
> I'd say that working on a maintenance release after Thanksgiving
> would be sufficient.
> 
> I might be underestimating the impact of the loss of performance,
> though, in which case I'd consider that nuclear one, which is the
> simplest and least risky.

I am fine with #1 for the release. Mostly I just wanted to understand
what the plan was (and if we needed to be hurrying to try to make the
non-nuclear #3 work).

-Peff

^ permalink raw reply

* Re: [PATCH] ci: avoid running the test suite _twice_
From: Junio C Hamano @ 2023-11-15  1:00 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: Jeff King, Johannes Schindelin via GitGitGadget, Phillip Wood,
	git, Johannes Schindelin
In-Reply-To: <ZVPm0qn6XsbLL8eM@google.com>

Josh Steadmon <steadmon@google.com> writes:

> On 2023.11.14 08:55, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>> 
>> > I do have to wonder, though, as somebody who did not follow the
>> > unit-test topic closely: why are the unit tests totally separate from
>> > the rest of the suite? I would think we'd want them run from one or more
>> > t/t*.sh scripts. That would make bugs like this impossible, but also:
>> >
>> >   1. They'd be run via "make test", so developers don't have to remember
>> >      to run them separately.
>> >
>> >   2. They can be run in parallel with all of the other tests when using
>> >      "prove -j", etc.
>> 
>> Very good points.  Josh?
>
> In short, the last time I tried to add something to CI, it was not well
> received, so I've been perhaps overly cautious in keeping the unit-tests
> well-separated from other targets. But I can send a follow-up patch to
> fold them into `make test`. Or would you prefer that I send a v11 of
> js/doc-unit-tests instead?

Incremental patches to update what is in 'next' would let us try out
the new arragement to drive the tests from the main "make test"
eaarlier.  Post release, a new iteration could replace the series
wholesale as we will have an opportunity to rebuild 'next', but it
would be nice for the end states to match, if you were to do both.

Thanks.

^ permalink raw reply

* Re: [PATCH] send-email: avoid duplicate specification warnings
From: Junio C Hamano @ 2023-11-15  0:48 UTC (permalink / raw)
  To: Todd Zullinger
  Cc: Jeff King, git, Ævar Arnfjörð Bjarmason,
	Ondřej Pohořelský
In-Reply-To: <ZVPfvjoXyGVlKqvr@pobox.com>

Todd Zullinger <tmz@pobox.com> writes:

> Since this isn't anything new with 2.43, it doesn't need to
> be fixed with much urgency.

True.  Unless the new version of Getopt::Long is quickly spreading
through our user base, that is.

> Thanks both,

Thanks for spotting the issue and acting on it quickly.

^ permalink raw reply

* Re: [PATCH] commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
From: Junio C Hamano @ 2023-11-15  0:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Karthik Nayak
In-Reply-To: <20231114194310.GC2092538@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> What I'm getting as it that I think we have three options for v2.43:
>
>   1. Ship what has been in the release candidates, which has a known
>      performance regression (though the severity is up for debate).
>
>   2. Flip the default to "0" (i.e., Patrick's patch in this thread). We
>      know that loosens some cases versus 2.42, which may be considered a
>      regression.
>
>   3. Sort it out before the release. We're getting pretty close to do
>      a lot new work there, but I think the changes should be small-ish.
>      The nuclear option is ejecting the topic and re-doing it in the
>      next cycle.
>
> I don't have a really strong preference between the three.

I've been (naively) assuming that #1 is everybody's preference,
simply because #2 does introduce a regression in the correctness
department (as opposed to a possible performance regression caused
by #1), and because #3 has a high risk of screwing up.

As long as the performance regression is known and on our radar,
I'd say that working on a maintenance release after Thanksgiving
would be sufficient.

I might be underestimating the impact of the loss of performance,
though, in which case I'd consider that nuclear one, which is the
simplest and least risky.

Thanks.


^ permalink raw reply

* Re: [PATCH] remote-curl: avoid hang if curl asks for more data after eof
From: Jiří Hruška @ 2023-11-14 23:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, Jeff King
In-Reply-To: <xmqqv8a515ge.fsf@gitster.g>

> Is this a bug on our side, or cURL library calling us when it should not?

I thought the same and yes, I suppose it _might_ be potentially considered
a bug on cURL side in the first place. But then also

1/ it is not mandated anywhere in the API that the callback will never be
   called again after already getting an EOF once,

2/ I looked at libcurl code and it was not entirely clear to me that the
   behavior would be accidental, that it could be clearly called as a bug,

3/ anything that follows how fread() works would never be affected,
   but git-remote-curl is, because it does something differently,

4/ even if it gets fixed in libcurl today, people might be building Git with
   whatever old versions of the library for years to come,

so worth fixing here in any case (imho).

But I'll reach out to curl-library and get their opinion, so that we have
a full picture here. Thanks

^ permalink raw reply

* Re: [PATCH] remote-curl: avoid hang if curl asks for more data after eof
From: Jiří Hruška @ 2023-11-14 23:16 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Jeff King
In-Reply-To: <20231113212243.1495815-1-jonathantanmy@google.com>

Thank you for the comments,

> Yes, please split non-functional changes into a separate commit
> (preferably one for each concern). I do envision reviewers saying "let's
> put patches X, Y, and Z in, but not patches A, B, and C", so splitting
> would make it easier to decide what's worthwhile to have.

Sure, will do

^ permalink raw reply

* [PATCH] RelNotes: tweak 2.43.0 release notes
From: Andy Koppe @ 2023-11-14 22:31 UTC (permalink / raw)
  To: git; +Cc: Andy Koppe

Add some more detail on the $(decorate) log format placeholder and tweak
the wording on some other points.
---
 Documentation/RelNotes/2.43.0.txt | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/Documentation/RelNotes/2.43.0.txt b/Documentation/RelNotes/2.43.0.txt
index 770543c464..9a68074a4a 100644
--- a/Documentation/RelNotes/2.43.0.txt
+++ b/Documentation/RelNotes/2.43.0.txt
@@ -10,8 +10,8 @@ Backward Compatibility Notes
    prefix.  If you are negatively affected by this change, please use
    "--subject-prefix=PATCH --rfc" as a replacement.
 
- * "git rev-list --stdin" learned to take non-revisions (like "--not")
-   recently from the standard input, but the way such a "--not" was
+ * "git rev-list --stdin" recently learned to take non-revisions (like
+   "--not") from the standard input, but the way such a "--not" was
    handled was quite confusing, which has been rethought.  The updated
    rule is that "--not" given from the command line only affects revs
    given from the command line that comes but not revs read from the
@@ -43,10 +43,10 @@ UI, Workflows & Features
 
  * Git GUI updates.
 
- * "git format-patch" learns a way to feed cover letter description,
-   that (1) can be used on detached HEAD where there is no branch
-   description available, and (2) also can override the branch
-   description if there is one.
+ * "git format-patch" learns option "--description-file" to feed in a
+   cover letter description that can be used when no branch description
+   is available, or that can override the branch description if there is
+   one.
 
  * Use of --max-pack-size to allow multiple packfiles to be created is
    now supported even when we are sending unreachable objects to cruft
@@ -56,7 +56,9 @@ UI, Workflows & Features
    "--subject-prefix" option and used "[RFC PATCH]"; now we will add
    "RFC" prefix to whatever subject prefix is specified.
 
- * "git log --format" has been taught the %(decorate) placeholder.
+ * "git log" format strings now support a "%(decorate)" placeholder that
+   can be used to customize the symbols and the tag prefix used in ref
+   decorations.
 
  * The default log message created by "git revert", when reverting a
    commit that records a revert, has been tweaked, to encourage people
@@ -99,7 +101,7 @@ UI, Workflows & Features
  * The attribute subsystem learned to honor `attr.tree` configuration
    that specifies which tree to read the .gitattributes files from.
 
- * "git merge-file" learns a mode to read three contents to be merged
+ * "git merge-file" learns a mode to read three files to be merged
    from blob objects.
 
 
@@ -127,7 +129,7 @@ Performance, Internal Implementation, Development Support etc.
  * The code to keep track of existing packs in the repository while
    repacking has been refactored.
 
- * The "streaming" interface used for bulk-checkin codepath has been
+ * The "streaming" interface used for the bulk-checkin codepath has been
    narrowed to take only blob objects for now, with no real loss of
    functionality.
 
-- 
2.43.0-rc2


^ permalink raw reply related

* RE: [Potential Bug] Test t0301.34 hangs - Git v2.43.0-rc2
From: rsbecker @ 2023-11-14 22:18 UTC (permalink / raw)
  To: 'Todd Zullinger'; +Cc: 'Junio C Hamano', git
In-Reply-To: <ZVPiQKhem7ew8o_8@pobox.com>

On Tuesday, November 14, 2023 4:10 PM, Todd Zullinger wrote:
>rsbecker@nexbridge.com wrote:
>> When running the full suite, I found that t0301.34 hangs on NonStop
>> x86 (Big Endian). No details at this point - will rerun this, but this
>> is a regression from rc1.
>
>FWIW, this test ran fine on Fedora's s390x architecture.
>That's little solace, I know, but may help rule out some potential causes.
>
>    t0301-credential-cache.sh ..........................
>    ...
>    ok 34 - helper (cache) can forget user
>    ...
>    # passed all 44 test(s)
>
>The build log is available here (for a few weeks or so -- it was only a
test build):
>
>https://kojipkgs.fedoraproject.org//work/tasks/4976/109024976/build.log

Well... it looks transient. I reran the test with and without verbose in ksh
and bash without any difficulty. It might just be a timing issue.

I guess we can ignore this one for now unless it becomes persistent.


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox