Git development
 help / color / mirror / Atom feed
* [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:36 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee
In-Reply-To: <pull.696.v2.git.1597760730.gitgitgadget@gmail.com>

This series is based on v3 of part I (ds/maintenance-part-1) [2].

This patch series contains 9 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help, I'm
splitting out the portions that create and test the 'maintenance' builtin
from the additional tasks (prefetch, loose-objects, incremental-repack) that
can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/
[2] 
https://lore.kernel.org/git/pull.695.v3.git.1598380426.gitgitgadget@gmail.com/

As detailed in [2], the 'git maintenance run' subcommand will run certain
tasks based on config options or the --task= arguments. The --auto option
indicates to the task to only run based on some internal check that there
has been "enough" change in that domain to merit the work. In the case of
the 'gc' task, this also reduces the amount of work done. 

The new maintenance tasks in this series are:

 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". 

Since [2] replaced the 'git gc --auto' calls with 'git maintenance run
--auto' at the end of some Git commands, users could replace the 'gc' task
with these lighter-weight changes for foreground maintenance.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves. I have an RFC series for
this available at [3].

[3] 
https://lore.kernel.org/git/pull.680.git.1597857408.gitgitgadget@gmail.com/

Updates since v2
================

 * Dropped "fetch: optionally allow disabling FETCH_HEAD update"
   
   
 * A lot of fallout from the change in the option parsing in v3 of
   Maintenance II.
   
   
 * Dropped the "verify, and delete and rewrite on failure" logic from the
   incremental-repack task. This might be added again later after it can be
   tested more thoroughly.
   
   

Updates since v1 (of this series)
=================================

 * PATCH 1 ("fetch: optionally allow disabling FETCH_HEAD update") was
   rewritten on-list. Getting a version out with this patch is the main
   reason for rolling a v2. (That, and Part I is re-rolled with a v2 and I
   want to make sure this series applies cleanly.)
   
   
 * The 'prefetch' and 'loose-objects' tasks had some review, but my proposed
   changes were not acked, so they may need another review.
   
   

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no- [quiet%7Cprogress] 
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The 0x7FFF/0x7FFFFFFF constant problem is fixed with an EXPENSIVE test
   that verifies it.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
   
   

Thanks, -Stolee

Derrick Stolee (8):
  maintenance: add prefetch task
  maintenance: add loose-objects task
  maintenance: create auto condition for loose-objects
  midx: enable core.multiPackIndex by default
  midx: use start_delayed_progress()
  maintenance: add incremental-repack task
  maintenance: auto-size incremental-repack batch
  maintenance: add incremental-repack auto condition

 Documentation/config/core.txt        |   4 +-
 Documentation/config/maintenance.txt |  18 ++
 Documentation/git-maintenance.txt    |  45 ++++
 builtin/gc.c                         | 327 +++++++++++++++++++++++++++
 midx.c                               |  23 +-
 midx.h                               |   1 +
 repo-settings.c                      |   6 +
 repository.h                         |   2 +
 t/t5319-multi-pack-index.sh          |  15 +-
 t/t7900-maintenance.sh               | 191 ++++++++++++++++
 10 files changed, 609 insertions(+), 23 deletions(-)


base-commit: 652a8eac57d04a51820c7a5b45031b50c5188e7b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-696%2Fderrickstolee%2Fmaintenance%2Fgc-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-696/derrickstolee/maintenance/gc-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/696

Range-diff vs v2:

  1:  f3bc0b2d92 <  -:  ---------- fetch: optionally allow disabling FETCH_HEAD update
  2:  8779c6c20d !  1:  da64c51a81 maintenance: add prefetch task
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: since it will not expire `.graph` files that were in the previous
     - `commit-graph-chain` file. They will be deleted by a later run based on
     - the expiration delay.
     +@@ Documentation/git-maintenance.txt: commit-graph::
     + 	`commit-graph-chain` file. They will be deleted by a later run based
     + 	on the expiration delay.
       
      +prefetch::
      +	The `prefetch` task updates the object directory with the latest
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *opts)
     - 	return 1;
     +@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
     + 	return 0;
       }
       
     -+static int fetch_remote(const char *remote, struct maintenance_opts *opts)
     ++static int fetch_remote(const char *remote, struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *
      +	return 0;
      +}
      +
     -+static int maintenance_task_prefetch(struct maintenance_opts *opts)
     ++static int maintenance_task_prefetch(struct maintenance_run_opts *opts)
      +{
      +	int result = 0;
      +	struct string_list_item *item;
     @@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *
      +	return result;
      +}
      +
     - static int maintenance_task_gc(struct maintenance_opts *opts)
     + static int maintenance_task_gc(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: struct maintenance_task {
  3:  4fa9d298b9 !  2:  75e846456b maintenance: add loose-objects task
     @@ Documentation/git-maintenance.txt: gc::
       --auto::
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       	return run_command(&child);
       }
       
     -+static int prune_packed(struct maintenance_opts *opts)
     ++static int prune_packed(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	return ++(d->count) > d->batch_size;
      +}
      +
     -+static int pack_loose(struct maintenance_opts *opts)
     ++static int pack_loose(struct maintenance_run_opts *opts)
      +{
      +	struct repository *r = the_repository;
      +	int result = 0;
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	return result;
      +}
      +
     -+static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     ++static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
      +{
      +	return prune_packed(opts) || pack_loose(opts);
      +}
      +
     - typedef int maintenance_task_fn(struct maintenance_opts *opts);
     + typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
       
       /*
      @@ builtin/gc.c: struct maintenance_task {
  4:  3432bc3167 =  3:  d6e382c43e maintenance: create auto condition for loose-objects
  5:  0ee2434bdb =  4:  d0f2ec70d9 midx: enable core.multiPackIndex by default
  6:  e157ea8dd7 =  5:  2cd3c803d9 midx: use start_delayed_progress()
  7:  a8d956dad6 !  6:  0dd26bb584 maintenance: add incremental-repack task
     @@ Commit message
             it requires doing some calculations that are better isolated to
             a separate change.
      
     -    Each of the above steps update the multi-pack-index file. After
     -    each step, we verify the new multi-pack-index. If the new
     -    multi-pack-index is corrupt, then delete the multi-pack-index,
     -    rewrite it from scratch, and stop doing the later steps of the
     -    job. This is intended to be an extra-safe check without leaving
     -    a repo with many pack-files without a multi-pack-index.
     -
          These steps are based on a similar background maintenance step in
          Scalar (and VFS for Git) [1]. This was incredibly effective for
          users of the Windows OS repository. After using the same VFS for Git
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
       	return prune_packed(opts) || pack_loose(opts);
       }
       
     -+static int multi_pack_index_write(struct maintenance_opts *opts)
     ++static int multi_pack_index_write(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int rewrite_multi_pack_index(struct maintenance_opts *opts)
     -+{
     -+	struct repository *r = the_repository;
     -+	char *midx_name = get_midx_filename(r->objects->odb->path);
     -+
     -+	unlink(midx_name);
     -+	free(midx_name);
     -+
     -+	return multi_pack_index_write(opts);
     -+}
     -+
     -+static int multi_pack_index_verify(struct maintenance_opts *opts,
     -+				   const char *message)
     -+{
     -+	struct child_process child = CHILD_PROCESS_INIT;
     -+
     -+	child.git_cmd = 1;
     -+	strvec_pushl(&child.args, "multi-pack-index", "verify", NULL);
     -+
     -+	if (opts->quiet)
     -+		strvec_push(&child.args, "--no-progress");
     -+
     -+	if (run_command(&child)) {
     -+		warning(_("'git multi-pack-index verify' failed %s"), message);
     -+		return 1;
     -+	}
     -+
     -+	return 0;
     -+}
     -+
     -+static int multi_pack_index_expire(struct maintenance_opts *opts)
     ++static int multi_pack_index_expire(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int multi_pack_index_repack(struct maintenance_opts *opts)
     ++static int multi_pack_index_repack(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int maintenance_task_incremental_repack(struct maintenance_opts *opts)
     ++static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts)
      +{
      +	prepare_repo_settings(the_repository);
      +	if (!the_repository->settings.core_multi_pack_index) {
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +
      +	if (multi_pack_index_write(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after initial write"))
     -+		return rewrite_multi_pack_index(opts);
      +	if (multi_pack_index_expire(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after expire step"))
     -+		return !!rewrite_multi_pack_index(opts);
      +	if (multi_pack_index_repack(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after repack step"))
     -+		return !!rewrite_multi_pack_index(opts);
      +	return 0;
      +}
      +
     - typedef int maintenance_task_fn(struct maintenance_opts *opts);
     + typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
       
       /*
      @@ builtin/gc.c: struct maintenance_task {
  8:  f0e7276755 !  7:  f3b25a9927 maintenance: auto-size incremental-repack batch
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_run_opts *opts)
       	return 0;
       }
       
     @@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_opts *opts)
      +	return result_size;
      +}
      +
     - static int multi_pack_index_repack(struct maintenance_opts *opts)
     + static int multi_pack_index_repack(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
     -@@ builtin/gc.c: static int multi_pack_index_repack(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int multi_pack_index_repack(struct maintenance_run_opts *opts)
       	if (opts->quiet)
       		strvec_push(&child.args, "--no-progress");
       
  9:  5659a23ad5 !  8:  e9bb32f53a maintenance: add incremental-repack auto condition
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
       	return prune_packed(opts) || pack_loose(opts);
       }
       
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return count >= incremental_repack_auto_limit;
      +}
      +
     - static int multi_pack_index_write(struct maintenance_opts *opts)
     + static int multi_pack_index_write(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: static struct maintenance_task tasks[] = {

-- 
gitgitgadget

^ permalink raw reply

* [PATCH v3 09/11] maintenance: use pointers to check --auto
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c              | 16 ++++++++++++++++
 t/t5514-fetch-multiple.sh |  2 +-
 t/t7900-maintenance.sh    |  2 +-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 67a8d405a1..709d13553b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -755,9 +755,17 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 
 typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
 
+/*
+ * An auto condition function returns 1 if the task should run
+ * and 0 if the task should NOT run. See needs_to_gc() for an
+ * example.
+ */
+typedef int maintenance_auto_fn(void);
+
 struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
+	maintenance_auto_fn *auto_condition;
 	unsigned enabled:1;
 
 	/* -1 if not selected. */
@@ -776,6 +784,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_GC] = {
 		"gc",
 		maintenance_task_gc,
+		need_to_gc,
 		1,
 	},
 	[TASK_COMMIT_GRAPH] = {
@@ -831,6 +840,11 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		if (!found_selected && !tasks[i].enabled)
 			continue;
 
+		if (opts->auto_flag &&
+		    (!tasks[i].auto_condition ||
+		     !tasks[i].auto_condition()))
+			continue;
+
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
@@ -845,6 +859,8 @@ static void initialize_task_config(void)
 {
 	int i;
 	struct strbuf config_name = STRBUF_INIT;
+	gc_config();
+
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index de8e2f1531..bd202ec6f3 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -108,7 +108,7 @@ test_expect_success 'git fetch --multiple (two remotes)' '
 	 GIT_TRACE=1 git fetch --multiple one two 2>trace &&
 	 git branch -r > output &&
 	 test_cmp ../expect output &&
-	 grep "built-in: git gc" trace >gc &&
+	 grep "built-in: git maintenance" trace >gc &&
 	 test_line_count = 1 gc
 	)
 '
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 290abb7832..4f6a04ddb1 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -21,7 +21,7 @@ test_expect_success 'run [--auto|--quiet]' '
 	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
 		git maintenance run --no-quiet 2>/dev/null &&
 	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 11/11] maintenance: add trace2 regions for task execution
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 8c4edf19ba..c3bcdc1167 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -927,10 +927,12 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		trace2_region_enter("maintenance", tasks[i].name, r);
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
 		}
+		trace2_region_leave("maintenance", tasks[i].name, r);
 	}
 
 	rollback_lock_file(&lk);
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v3 10/11] maintenance: add auto condition for commit-graph task
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the PARENT1 flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt | 10 ++++
 builtin/gc.c                         | 82 ++++++++++++++++++++++++++++
 object.h                             |  1 +
 3 files changed, 93 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 4402b8b49f..7cc6700d57 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -4,3 +4,13 @@ maintenance.<task>.enabled::
 	`git maintenance run`. These config values are ignored if a
 	`--task` option exists. By default, only `maintenance.gc.enabled`
 	is true.
+
+maintenance.commit-graph.auto::
+	This integer config option controls how often the `commit-graph` task
+	should be run as part of `git maintenance run --auto`. If zero, then
+	the `commit-graph` task will not run with the `--auto` option. A
+	negative value will force the task to run every time. Otherwise, a
+	positive value implies the command should run when the number of
+	reachable commits that are not in the commit-graph file is at least
+	the value of `maintenance.commit-graph.auto`. The default value is
+	100.
diff --git a/builtin/gc.c b/builtin/gc.c
index 709d13553b..8c4edf19ba 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,6 +28,7 @@
 #include "blob.h"
 #include "tree.h"
 #include "promisor-remote.h"
+#include "refs.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -710,6 +711,86 @@ struct maintenance_run_opts {
 	int quiet;
 };
 
+/* Remember to update object flag allocation in object.h */
+#define SEEN		(1u<<0)
+
+struct cg_auto_data {
+	int num_not_in_graph;
+	int limit;
+};
+
+static int dfs_on_ref(const char *refname,
+		      const struct object_id *oid, int flags,
+		      void *cb_data)
+{
+	struct cg_auto_data *data = (struct cg_auto_data *)cb_data;
+	int result = 0;
+	struct object_id peeled;
+	struct commit_list *stack = NULL;
+	struct commit *commit;
+
+	if (!peel_ref(refname, &peeled))
+		oid = &peeled;
+	if (oid_object_info(the_repository, oid, NULL) != OBJ_COMMIT)
+		return 0;
+
+	commit = lookup_commit(the_repository, oid);
+	if (!commit)
+		return 0;
+	if (parse_commit(commit))
+		return 0;
+
+	commit_list_append(commit, &stack);
+
+	while (!result && stack) {
+		struct commit_list *parent;
+
+		commit = pop_commit(&stack);
+
+		for (parent = commit->parents; parent; parent = parent->next) {
+			if (parse_commit(parent->item) ||
+			    commit_graph_position(parent->item) != COMMIT_NOT_FROM_GRAPH ||
+			    parent->item->object.flags & SEEN)
+				continue;
+
+			parent->item->object.flags |= SEEN;
+			data->num_not_in_graph++;
+
+			if (data->num_not_in_graph >= data->limit) {
+				result = 1;
+				break;
+			}
+
+			commit_list_append(parent->item, &stack);
+		}
+	}
+
+	free_commit_list(stack);
+	return result;
+}
+
+static int should_write_commit_graph(void)
+{
+	int result;
+	struct cg_auto_data data;
+
+	data.num_not_in_graph = 0;
+	data.limit = 100;
+	git_config_get_int("maintenance.commit-graph.auto",
+			   &data.limit);
+
+	if (!data.limit)
+		return 0;
+	if (data.limit < 0)
+		return 1;
+
+	result = for_each_ref(dfs_on_ref, &data);
+
+	clear_commit_marks_all(SEEN);
+
+	return result;
+}
+
 static int run_write_commit_graph(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -790,6 +871,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_COMMIT_GRAPH] = {
 		"commit-graph",
 		maintenance_task_commit_graph,
+		should_write_commit_graph,
 	},
 };
 
diff --git a/object.h b/object.h
index 96a2105859..20b18805f0 100644
--- a/object.h
+++ b/object.h
@@ -73,6 +73,7 @@ struct object_array {
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
+ * builtin/gc.c:             0
  * builtin/index-pack.c:                                     2021
  * builtin/pack-objects.c:                                   20
  * builtin/reflog.c:                   10--12
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 08/11] maintenance: create maintenance.<task>.enabled config
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt             |  2 ++
 Documentation/config/maintenance.txt |  6 ++++++
 Documentation/git-maintenance.txt    | 14 +++++++++-----
 builtin/gc.c                         | 19 +++++++++++++++++++
 t/t7900-maintenance.sh               |  8 ++++++++
 5 files changed, 44 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3042d80978..f93b6837e4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -398,6 +398,8 @@ include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
 
+include::config/maintenance.txt[]
+
 include::config/man.txt[]
 
 include::config/merge.txt[]
diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
new file mode 100644
index 0000000000..4402b8b49f
--- /dev/null
+++ b/Documentation/config/maintenance.txt
@@ -0,0 +1,6 @@
+maintenance.<task>.enabled::
+	This boolean config option controls whether the maintenance task
+	with name `<task>` is run when no `--task` option is specified to
+	`git maintenance run`. These config values are ignored if a
+	`--task` option exists. By default, only `maintenance.gc.enabled`
+	is true.
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 819ca41ab6..6abcb8255a 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,9 +30,11 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks. If one or more `--task=<task>`
-	options are specified, then those tasks are run in the provided
-	order. Otherwise, only the `gc` task is run.
+	Run one or more maintenance tasks. If one or more `--task` options
+	are specified, then those tasks are run in that order. Otherwise,
+	the tasks are determined by which `maintenance.<task>.enabled`
+	config options are true. By default, only `maintenance.gc.enabled`
+	is true.
 
 TASKS
 -----
@@ -67,8 +69,10 @@ OPTIONS
 
 --task=<task>::
 	If this option is specified one or more times, then only run the
-	specified tasks in the specified order. See the 'TASKS' section
-	for the list of accepted `<task>` values.
+	specified tasks in the specified order. If no `--task=<task>`
+	arguments are specified, then only the tasks with
+	`maintenance.<task>.enabled` configured as `true` are considered.
+	See the 'TASKS' section for the list of accepted `<task>` values.
 
 GIT
 ---
diff --git a/builtin/gc.c b/builtin/gc.c
index 1cebb7282d..67a8d405a1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -841,6 +841,24 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static void initialize_task_config(void)
+{
+	int i;
+	struct strbuf config_name = STRBUF_INIT;
+	for (i = 0; i < TASK__COUNT; i++) {
+		int config_value;
+
+		strbuf_setlen(&config_name, 0);
+		strbuf_addf(&config_name, "maintenance.%s.enabled",
+			    tasks[i].name);
+
+		if (!git_config_get_bool(config_name.buf, &config_value))
+			tasks[i].enabled = config_value;
+	}
+
+	strbuf_release(&config_name);
+}
+
 static int task_option_parse(const struct option *opt,
 			     const char *arg, int unset)
 {
@@ -889,6 +907,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
+	initialize_task_config();
 
 	for (i = 0; i < TASK__COUNT; i++)
 		tasks[i].selected_order = -1;
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 792765aff7..290abb7832 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -25,6 +25,14 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.<task>.enabled' '
+	git config maintenance.gc.enabled false &&
+	git config maintenance.commit-graph.enabled true &&
+	GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run 2>err &&
+	test_subcommand ! git gc --quiet <run-config.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-config.txt
+'
+
 test_expect_success 'run --task=<task>' '
 	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
 		git maintenance run --task=commit-graph 2>/dev/null &&
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 07/11] maintenance: take a lock on the objects directory
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then fail with a warning. If '--auto'
is specified, then instead no warning is shown and no tasks are attempted.
This will become very important later when we implement the 'prefetch'
task, as this is our stop-gap from creating a recursive process loop
between 'git fetch' and 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index e94f263c77..1cebb7282d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -798,6 +798,25 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
 	int i, found_selected = 0;
 	int result = 0;
+	struct lock_file lk;
+	struct repository *r = the_repository;
+	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
+		/*
+		 * Another maintenance command is running.
+		 *
+		 * If --auto was provided, then it is likely due to a
+		 * recursive process stack. Do not report an error in
+		 * that case.
+		 */
+		if (!opts->auto_flag && !opts->quiet)
+			warning(_("lock file '%s' exists, skipping maintenance"),
+				lock_path);
+		free(lock_path);
+		return 0;
+	}
+	free(lock_path);
 
 	for (i = 0; !found_selected && i < TASK__COUNT; i++)
 		found_selected = tasks[i].selected_order >= 0;
@@ -818,6 +837,7 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		}
 	}
 
+	rollback_lock_file(&lk);
 	return result;
 }
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 05/11] maintenance: add commit-graph task
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  8 ++++++++
 builtin/gc.c                      | 30 ++++++++++++++++++++++++++++++
 commit-graph.c                    |  8 ++++----
 commit-graph.h                    |  1 +
 t/t7900-maintenance.sh            |  2 ++
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 04fa0fe329..fc5dbcf0b9 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -35,6 +35,14 @@ run::
 TASKS
 -----
 
+commit-graph::
+	The `commit-graph` job updates the `commit-graph` files incrementally,
+	then verifies that the written data is correct. The incremental
+	write is safe to run alongside concurrent Git processes since it
+	will not expire `.graph` files that were in the previous
+	`commit-graph-chain` file. They will be deleted by a later run based
+	on the expiration delay.
+
 gc::
 	Clean up unnecessary files and optimize the local repository. "GC"
 	stands for "garbage collection," but this task performs many
diff --git a/builtin/gc.c b/builtin/gc.c
index 81643515e5..16e567992e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -710,6 +710,31 @@ struct maintenance_run_opts {
 	int quiet;
 };
 
+static int run_write_commit_graph(struct maintenance_run_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "commit-graph", "write",
+		     "--split", "--reachable", NULL);
+
+	if (opts->quiet)
+		strvec_push(&child.args, "--no-progress");
+
+	return !!run_command(&child);
+}
+
+static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
+{
+	close_object_store(the_repository->objects);
+	if (run_write_commit_graph(opts)) {
+		error(_("failed to write commit-graph"));
+		return 1;
+	}
+
+	return 0;
+}
+
 static int maintenance_task_gc(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -738,6 +763,7 @@ struct maintenance_task {
 
 enum maintenance_task_label {
 	TASK_GC,
+	TASK_COMMIT_GRAPH,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -749,6 +775,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_gc,
 		1,
 	},
+	[TASK_COMMIT_GRAPH] = {
+		"commit-graph",
+		maintenance_task_commit_graph,
+	},
 };
 
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
diff --git a/commit-graph.c b/commit-graph.c
index e51c91dd5b..a9b0db7d4a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -172,7 +172,7 @@ static char *get_split_graph_filename(struct object_directory *odb,
 		       oid_hex);
 }
 
-static char *get_chain_filename(struct object_directory *odb)
+char *get_commit_graph_chain_filename(struct object_directory *odb)
 {
 	return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path);
 }
@@ -516,7 +516,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r,
 	struct stat st;
 	struct object_id *oids;
 	int i = 0, valid = 1, count;
-	char *chain_name = get_chain_filename(odb);
+	char *chain_name = get_commit_graph_chain_filename(odb);
 	FILE *fp;
 	int stat_res;
 
@@ -1668,7 +1668,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	}
 
 	if (ctx->split) {
-		char *lock_name = get_chain_filename(ctx->odb);
+		char *lock_name = get_commit_graph_chain_filename(ctx->odb);
 
 		hold_lock_file_for_update_mode(&lk, lock_name,
 					       LOCK_DIE_ON_ERROR, 0444);
@@ -2038,7 +2038,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 	if (ctx->split_opts && ctx->split_opts->expire_time)
 		expire_time = ctx->split_opts->expire_time;
 	if (!ctx->split) {
-		char *chain_file_name = get_chain_filename(ctx->odb);
+		char *chain_file_name = get_commit_graph_chain_filename(ctx->odb);
 		unlink(chain_file_name);
 		free(chain_file_name);
 		ctx->num_commit_graphs_after = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 09a97030dc..765221cdcc 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -25,6 +25,7 @@ struct raw_object_store;
 struct string_list;
 
 char *get_commit_graph_filename(struct object_directory *odb);
+char *get_commit_graph_chain_filename(struct object_directory *odb);
 int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
 
 /*
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 47d512464c..c0c4e6846e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -4,6 +4,8 @@ test_description='git maintenance builtin'
 
 . ./test-lib.sh
 
+GIT_TEST_COMMIT_GRAPH=0
+
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
 	test_i18ngrep "usage: git maintenance run" err &&
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 06/11] maintenance: add --task option
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  9 ++++-
 builtin/gc.c                      | 66 +++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 27 +++++++++++++
 3 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index fc5dbcf0b9..819ca41ab6 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,7 +30,9 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks.
+	Run one or more maintenance tasks. If one or more `--task=<task>`
+	options are specified, then those tasks are run in the provided
+	order. Otherwise, only the `gc` task is run.
 
 TASKS
 -----
@@ -63,6 +65,11 @@ OPTIONS
 --quiet::
 	Do not report progress or other information over `stderr`.
 
+--task=<task>::
+	If this option is specified one or more times, then only run the
+	specified tasks in the specified order. See the 'TASKS' section
+	for the list of accepted `<task>` values.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 16e567992e..e94f263c77 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -701,7 +701,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto] [--[no-]quiet]"),
+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
 	NULL
 };
 
@@ -759,6 +759,9 @@ struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
 	unsigned enabled:1;
+
+	/* -1 if not selected. */
+	int selected_order;
 };
 
 enum maintenance_task_label {
@@ -781,13 +784,32 @@ static struct maintenance_task tasks[] = {
 	},
 };
 
+static int compare_tasks_by_selection(const void *a_, const void *b_)
+{
+	const struct maintenance_task *a, *b;
+
+	a = (const struct maintenance_task *)&a_;
+	b = (const struct maintenance_task *)&b_;
+
+	return b->selected_order - a->selected_order;
+}
+
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
-	int i;
+	int i, found_selected = 0;
 	int result = 0;
 
+	for (i = 0; !found_selected && i < TASK__COUNT; i++)
+		found_selected = tasks[i].selected_order >= 0;
+
+	if (found_selected)
+		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
+
 	for (i = 0; i < TASK__COUNT; i++) {
-		if (!tasks[i].enabled)
+		if (found_selected && tasks[i].selected_order < 0)
+			continue;
+
+		if (!found_selected && !tasks[i].enabled)
 			continue;
 
 		if (tasks[i].fn(opts)) {
@@ -799,20 +821,58 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static int task_option_parse(const struct option *opt,
+			     const char *arg, int unset)
+{
+	int i, num_selected = 0;
+	struct maintenance_task *task = NULL;
+
+	BUG_ON_OPT_NEG(unset);
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (tasks[i].selected_order >= 0)
+			num_selected++;
+		if (!strcasecmp(tasks[i].name, arg)) {
+			task = &tasks[i];
+		}
+	}
+
+	if (!task) {
+		error(_("'%s' is not a valid task"), arg);
+		return 1;
+	}
+
+	if (task->selected_order >= 0) {
+		error(_("task '%s' cannot be selected multiple times"), arg);
+		return 1;
+	}
+
+	task->selected_order = num_selected + 1;
+
+	return 0;
+}
+
 static int maintenance_run(int argc, const char **argv, const char *prefix)
 {
+	int i;
 	struct maintenance_run_opts opts;
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
+		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
+			N_("run a specific task"),
+			PARSE_OPT_NONEG, task_option_parse),
 		OPT_END()
 	};
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
 
+	for (i = 0; i < TASK__COUNT; i++)
+		tasks[i].selected_order = -1;
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_run_options,
 			     builtin_maintenance_run_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index c0c4e6846e..792765aff7 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -25,4 +25,31 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'run --task=<task>' '
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" \
+		git maintenance run --task=gc 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
+		git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
+	test_subcommand ! git gc --quiet <run-commit-graph.txt &&
+	test_subcommand git gc --quiet <run-gc.txt &&
+	test_subcommand git gc --quiet <run-both.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
+	test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
+'
+
+test_expect_success 'run --task=bogus' '
+	test_must_fail git maintenance run --task=bogus 2>err &&
+	test_i18ngrep "is not a valid task" err
+'
+
+test_expect_success 'run --task duplicate' '
+	test_must_fail git maintenance run --task=gc --task=gc 2>err &&
+	test_i18ngrep "cannot be selected multiple times" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 04/11] maintenance: initialize task array
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.

The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.

A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.

The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will print an error message.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index da11f1edcd..81643515e5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -728,6 +728,47 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 	return run_command(&child);
 }
 
+typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
+
+struct maintenance_task {
+	const char *name;
+	maintenance_task_fn *fn;
+	unsigned enabled:1;
+};
+
+enum maintenance_task_label {
+	TASK_GC,
+
+	/* Leave as final value */
+	TASK__COUNT
+};
+
+static struct maintenance_task tasks[] = {
+	[TASK_GC] = {
+		"gc",
+		maintenance_task_gc,
+		1,
+	},
+};
+
+static int maintenance_run_tasks(struct maintenance_run_opts *opts)
+{
+	int i;
+	int result = 0;
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (!tasks[i].enabled)
+			continue;
+
+		if (tasks[i].fn(opts)) {
+			error(_("task '%s' failed"), tasks[i].name);
+			result = 1;
+		}
+	}
+
+	return result;
+}
+
 static int maintenance_run(int argc, const char **argv, const char *prefix)
 {
 	struct maintenance_run_opts opts;
@@ -750,7 +791,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	if (argc != 0)
 		usage_with_options(builtin_maintenance_run_usage,
 				   builtin_maintenance_run_options);
-	return maintenance_task_gc(&opts);
+	return maintenance_run_tasks(&opts);
 }
 
 static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 02/11] maintenance: add --quiet option
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  3 +++
 builtin/gc.c                      | 11 ++++++++++-
 t/t7900-maintenance.sh            | 15 ++++++++++-----
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index ff47fb3641..04fa0fe329 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -52,6 +52,9 @@ OPTIONS
 	in the `gc.auto` config setting, or when the number of pack-files
 	exceeds the `gc.autoPackLimit` config setting.
 
+--quiet::
+	Do not report progress or other information over `stderr`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 7f2771e786..da11f1edcd 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -701,12 +701,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto]"),
+	N_("git maintenance run [--auto] [--[no-]quiet]"),
 	NULL
 };
 
 struct maintenance_run_opts {
 	int auto_flag;
+	int quiet;
 };
 
 static int maintenance_task_gc(struct maintenance_run_opts *opts)
@@ -718,6 +719,10 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 
 	if (opts->auto_flag)
 		strvec_push(&child.args, "--auto");
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	else
+		strvec_push(&child.args, "--no-quiet");
 
 	close_object_store(the_repository->objects);
 	return run_command(&child);
@@ -729,10 +734,14 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "quiet", &opts.quiet,
+			 N_("do not report progress or other information over stderr")),
 		OPT_END()
 	};
 	memset(&opts, 0, sizeof(opts));
 
+	opts.quiet = !isatty(2);
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_run_options,
 			     builtin_maintenance_run_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 14aa691f19..47d512464c 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -11,11 +11,16 @@ test_expect_success 'help text' '
 	test_i18ngrep "invalid subcommand: barf" err
 '
 
-test_expect_success 'run [--auto]' '
-	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
-	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
-	test_subcommand git gc <run-no-auto.txt &&
-	test_subcommand git gc --auto <run-auto.txt
+test_expect_success 'run [--auto|--quiet]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
+		git maintenance run 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
+		git maintenance run --auto 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
+		git maintenance run --no-quiet 2>/dev/null &&
+	test_subcommand git gc --quiet <run-no-auto.txt &&
+	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 03/11] maintenance: replace run_auto_gc()
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/fetch-options.txt |  6 ++++--
 Documentation/git-clone.txt     |  6 +++---
 builtin/am.c                    |  2 +-
 builtin/commit.c                |  2 +-
 builtin/fetch.c                 |  6 ++++--
 builtin/merge.c                 |  2 +-
 builtin/rebase.c                |  4 ++--
 run-command.c                   | 16 +++++++---------
 run-command.h                   |  2 +-
 t/t5510-fetch.sh                |  2 +-
 10 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 07d06ff445..b65a758661 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -95,9 +95,11 @@ ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]auto-maintenance::
 --[no-]auto-gc::
-	Run `git gc --auto` at the end to perform garbage collection
-	if needed. This is enabled by default.
+	Run `git maintenance run --auto` at the end to perform automatic
+	repository maintenance if needed. (`--[no-]auto-gc` is a synonym.)
+	This is enabled by default.
 
 --[no-]write-commit-graph::
 	Write a commit-graph after fetching. This overrides the config
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index c898310099..097e6a86c5 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -78,9 +78,9 @@ repository using this option and then delete branches (or use any
 other Git command that makes any existing commit unreferenced) in the
 source repository, some objects may become unreferenced (or dangling).
 These objects may be removed by normal Git operations (such as `git commit`)
-which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
-If these objects are removed and were referenced by the cloned repository,
-then the cloned repository will become corrupt.
+which automatically call `git maintenance run --auto`. (See
+linkgit:git-maintenance[1].) If these objects are removed and were referenced
+by the cloned repository, then the cloned repository will become corrupt.
 +
 Note that running `git repack` without the `--local` option in a repository
 cloned with `--shared` will copy objects from the source repository into a pack
diff --git a/builtin/am.c b/builtin/am.c
index dd4e6c2d9b..68e9d17379 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1795,7 +1795,7 @@ static void am_run(struct am_state *state, int resume)
 	if (!state->rebasing) {
 		am_destroy(state);
 		close_object_store(the_repository->objects);
-		run_auto_gc(state->quiet);
+		run_auto_maintenance(state->quiet);
 	}
 }
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 69ac78d5e5..f9b0a0c05d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1702,7 +1702,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	git_test_write_commit_graph_or_die();
 
 	repo_rerere(the_repository, 0);
-	run_auto_gc(quiet);
+	run_auto_maintenance(quiet);
 	run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
 	if (amend && !no_post_rewrite) {
 		commit_post_rewrite(the_repository, current_head, &oid);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 2eb8d6a5a5..cb38e6f5ec 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -199,8 +199,10 @@ static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "auto-maintenance", &enable_auto_gc,
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
-		 N_("run 'gc --auto' after fetching")),
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "show-forced-updates", &fetch_show_forced_updates,
 		 N_("check for forced-updates on all updated branches")),
 	OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
@@ -1891,7 +1893,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	close_object_store(the_repository->objects);
 
 	if (enable_auto_gc)
-		run_auto_gc(verbosity < 0);
+		run_auto_maintenance(verbosity < 0);
 
 	return result;
 }
diff --git a/builtin/merge.c b/builtin/merge.c
index 74829a838e..8f31bfab56 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -456,7 +456,7 @@ static void finish(struct commit *head_commit,
 			 * user should see them.
 			 */
 			close_object_store(the_repository->objects);
-			run_auto_gc(verbosity < 0);
+			run_auto_maintenance(verbosity < 0);
 		}
 	}
 	if (new_head && show_diffstat) {
diff --git a/builtin/rebase.c b/builtin/rebase.c
index dadb52fa92..9ffba9009d 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -728,10 +728,10 @@ static int finish_rebase(struct rebase_options *opts)
 	apply_autostash(state_dir_path("autostash", opts));
 	close_object_store(the_repository->objects);
 	/*
-	 * We ignore errors in 'gc --auto', since the
+	 * We ignore errors in 'git maintenance run --auto', since the
 	 * user should see them.
 	 */
-	run_auto_gc(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
+	run_auto_maintenance(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
 	if (opts->type == REBASE_MERGE) {
 		struct replay_opts replay = REPLAY_OPTS_INIT;
 
diff --git a/run-command.c b/run-command.c
index cc9c3296ba..2ee59acdc8 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1866,15 +1866,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 	return result;
 }
 
-int run_auto_gc(int quiet)
+int run_auto_maintenance(int quiet)
 {
-	struct strvec argv_gc_auto = STRVEC_INIT;
-	int status;
+	struct child_process maint = CHILD_PROCESS_INIT;
 
-	strvec_pushl(&argv_gc_auto, "gc", "--auto", NULL);
-	if (quiet)
-		strvec_push(&argv_gc_auto, "--quiet");
-	status = run_command_v_opt(argv_gc_auto.v, RUN_GIT_CMD);
-	strvec_clear(&argv_gc_auto);
-	return status;
+	maint.git_cmd = 1;
+	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
+	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
+
+	return run_command(&maint);
 }
diff --git a/run-command.h b/run-command.h
index 8b9bfaef16..6472b38bde 100644
--- a/run-command.h
+++ b/run-command.h
@@ -221,7 +221,7 @@ int run_hook_ve(const char *const *env, const char *name, va_list args);
 /*
  * Trigger an auto-gc
  */
-int run_auto_gc(int quiet);
+int run_auto_maintenance(int quiet);
 
 #define RUN_COMMAND_NO_STDIN 1
 #define RUN_GIT_CMD	     2	/*If this is to be git sub-command */
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2a1abe91f0..a9682c5669 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -934,7 +934,7 @@ test_expect_success 'fetching with auto-gc does not lock up' '
 		git config fetch.unpackLimit 1 &&
 		git config gc.autoPackLimit 1 &&
 		git config gc.autoDetach false &&
-		GIT_ASK_YESNO="$D/askyesno" git fetch >fetch.out 2>&1 &&
+		GIT_ASK_YESNO="$D/askyesno" git fetch --verbose >fetch.out 2>&1 &&
 		test_i18ngrep "Auto packing the repository" fetch.out &&
 		! grep "Should I try again" fetch.out
 	)
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 01/11] maintenance: create basic maintenance runner
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.695.v3.git.1598380426.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                        |  1 +
 Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
 builtin.h                         |  1 +
 builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
 command-list.txt                  |  1 +
 git.c                             |  1 +
 t/t7900-maintenance.sh            | 21 ++++++++++++
 t/test-lib-functions.sh           | 33 ++++++++++++++++++
 8 files changed, 172 insertions(+)
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..a5808fa30d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -90,6 +90,7 @@
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
+/git-maintenance
 /git-merge
 /git-merge-base
 /git-merge-index
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
new file mode 100644
index 0000000000..ff47fb3641
--- /dev/null
+++ b/Documentation/git-maintenance.txt
@@ -0,0 +1,57 @@
+git-maintenance(1)
+==================
+
+NAME
+----
+git-maintenance - Run tasks to optimize Git repository data
+
+
+SYNOPSIS
+--------
+[verse]
+'git maintenance' run [<options>]
+
+
+DESCRIPTION
+-----------
+Run tasks to optimize Git repository data, speeding up other Git commands
+and reducing storage requirements for the repository.
+
+Git commands that add repository data, such as `git add` or `git fetch`,
+are optimized for a responsive user experience. These commands do not take
+time to optimize the Git data, since such optimizations scale with the full
+size of the repository while these user commands each perform a relatively
+small action.
+
+The `git maintenance` command provides flexibility for how to optimize the
+Git repository.
+
+SUBCOMMANDS
+-----------
+
+run::
+	Run one or more maintenance tasks.
+
+TASKS
+-----
+
+gc::
+	Clean up unnecessary files and optimize the local repository. "GC"
+	stands for "garbage collection," but this task performs many
+	smaller tasks. This task can be expensive for large repositories,
+	as it repacks all Git objects into a single pack-file. It can also
+	be disruptive in some situations, as it deletes stale data. See
+	linkgit:git-gc[1] for more details on garbage collection in Git.
+
+OPTIONS
+-------
+--auto::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain thresholds are met. For example, the `gc` task
+	runs when the number of loose objects exceeds the number stored
+	in the `gc.auto` config setting, or when the number of pack-files
+	exceeds the `gc.autoPackLimit` config setting.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/builtin.h b/builtin.h
index a5ae15bfe5..17c1c0ce49 100644
--- a/builtin.h
+++ b/builtin.h
@@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
+int cmd_maintenance(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
diff --git a/builtin/gc.c b/builtin/gc.c
index aafa0946f5..7f2771e786 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	return 0;
 }
+
+static const char * const builtin_maintenance_run_usage[] = {
+	N_("git maintenance run [--auto]"),
+	NULL
+};
+
+struct maintenance_run_opts {
+	int auto_flag;
+};
+
+static int maintenance_task_gc(struct maintenance_run_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_push(&child.args, "gc");
+
+	if (opts->auto_flag)
+		strvec_push(&child.args, "--auto");
+
+	close_object_store(the_repository->objects);
+	return run_command(&child);
+}
+
+static int maintenance_run(int argc, const char **argv, const char *prefix)
+{
+	struct maintenance_run_opts opts;
+	struct option builtin_maintenance_run_options[] = {
+		OPT_BOOL(0, "auto", &opts.auto_flag,
+			 N_("run tasks based on the state of the repository")),
+		OPT_END()
+	};
+	memset(&opts, 0, sizeof(opts));
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_maintenance_run_options,
+			     builtin_maintenance_run_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (argc != 0)
+		usage_with_options(builtin_maintenance_run_usage,
+				   builtin_maintenance_run_options);
+	return maintenance_task_gc(&opts);
+}
+
+static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
+
+int cmd_maintenance(int argc, const char **argv, const char *prefix)
+{
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage(builtin_maintenance_usage);
+
+	if (!strcmp(argv[1], "run"))
+		return maintenance_run(argc - 1, argv + 1, prefix);
+
+	die(_("invalid subcommand: %s"), argv[1]);
+}
diff --git a/command-list.txt b/command-list.txt
index e5901f2213..0e3204e7d1 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -117,6 +117,7 @@ git-ls-remote                           plumbinginterrogators
 git-ls-tree                             plumbinginterrogators
 git-mailinfo                            purehelpers
 git-mailsplit                           purehelpers
+git-maintenance                         mainporcelain
 git-merge                               mainporcelain           history
 git-merge-base                          plumbinginterrogators
 git-merge-file                          plumbingmanipulators
diff --git a/git.c b/git.c
index 8bd1d7551d..24f250d29a 100644
--- a/git.c
+++ b/git.c
@@ -529,6 +529,7 @@ static struct cmd_struct commands[] = {
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
+	{ "maintenance", cmd_maintenance, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "merge", cmd_merge, RUN_SETUP | NEED_WORK_TREE },
 	{ "merge-base", cmd_merge_base, RUN_SETUP },
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
new file mode 100755
index 0000000000..14aa691f19
--- /dev/null
+++ b/t/t7900-maintenance.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+test_description='git maintenance builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'help text' '
+	test_expect_code 129 git maintenance -h 2>err &&
+	test_i18ngrep "usage: git maintenance run" err &&
+	test_expect_code 128 git maintenance barf 2>err &&
+	test_i18ngrep "invalid subcommand: barf" err
+'
+
+test_expect_success 'run [--auto]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
+	test_subcommand git gc <run-no-auto.txt &&
+	test_subcommand git gc --auto <run-auto.txt
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6a8e194a99..d805e73f45 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1628,3 +1628,36 @@ test_path_is_hidden () {
 	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
 	return 1
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_subcommand [!] <command> <args>... < <trace>
+#
+# For example, to look for an invocation of "git upload-pack
+# /path/to/repo"
+#
+#	GIT_TRACE2_EVENT=event.log git fetch ... &&
+#	test_subcommand git upload-pack "$PATH" <event.log
+#
+# If the first parameter passed is !, this instead checks that
+# the given command was not called.
+#
+test_subcommand () {
+	local negate=
+	if test "$1" = "!"
+	then
+		negate=t
+		shift
+	fi
+
+	local expr=$(printf '"%s",' "$@")
+	expr="${expr%,}"
+
+	if test -n "$negate"
+	then
+		! grep "\[$expr\]"
+	else
+		grep "\[$expr\]"
+	fi
+}
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 00/11] Maintenance I: Command, gc and commit-graph tasks
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee
In-Reply-To: <pull.695.v2.git.1597760589.gitgitgadget@gmail.com>

This series is based on jc/no-update-fetch-head.

This patch series contains 11patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rererelogs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead create
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will
follow in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task= 
command-line argument.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". There are
additional config options to allow customizing the conditions for which the
tasks run during the '--auto' option.

 Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' will be run below an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

Updates since v2
================

 * Based on jc/no-update-fetch-head instead of jk/strvec.
   
   
 * I realized that the other maintenance subcommands should not accept the
   options for the 'run' subcommand, so I reorganized the option parsing
   into that subcommand. This makes the range-diff noisier than it would
   have been otherwise.
   
   
 * While updating the parsing, I also updated the usage strings.
   
   
 * The "verify, then delete and rewrite on failure" logic is removed. I'll
   consider bringing this back with a way to test the behavior in a later
   patch series.
   
   
 * Other feedback from Jonathan Tan is applied.
   
   

Updates since v1 (of this series)
=================================

 * Documentation fixes.
   
   
 * The builtin code had some slight tweaks in PATCH 1.
   
   

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no- [quiet%7Cprogress] 
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
   
   

Thanks, -Stolee

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  16 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  79 +++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 336 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 command-list.txt                     |   1 +
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  63 +++++
 t/test-lib-functions.sh              |  33 +++
 24 files changed, 565 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: 887952b8c680626f4721cb5fa57704478801aca4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/695

Range-diff vs v2:

  1:  e09e4a4a87 !  1:  aa961af387 maintenance: create basic maintenance runner
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
       	return 0;
       }
      +
     -+static const char * const builtin_maintenance_usage[] = {
     -+	N_("git maintenance run [<options>]"),
     ++static const char * const builtin_maintenance_run_usage[] = {
     ++	N_("git maintenance run [--auto]"),
      +	NULL
      +};
      +
     -+struct maintenance_opts {
     ++struct maintenance_run_opts {
      +	int auto_flag;
      +};
      +
     -+static int maintenance_task_gc(struct maintenance_opts *opts)
     ++static int maintenance_task_gc(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      +	return run_command(&child);
      +}
      +
     -+static int maintenance_run(struct maintenance_opts *opts)
     ++static int maintenance_run(int argc, const char **argv, const char *prefix)
      +{
     -+	return maintenance_task_gc(opts);
     -+}
     -+
     -+int cmd_maintenance(int argc, const char **argv, const char *prefix)
     -+{
     -+	struct maintenance_opts opts;
     -+	struct option builtin_maintenance_options[] = {
     ++	struct maintenance_run_opts opts;
     ++	struct option builtin_maintenance_run_options[] = {
      +		OPT_BOOL(0, "auto", &opts.auto_flag,
      +			 N_("run tasks based on the state of the repository")),
      +		OPT_END()
      +	};
     -+
      +	memset(&opts, 0, sizeof(opts));
      +
     -+	if (argc == 2 && !strcmp(argv[1], "-h"))
     -+		usage_with_options(builtin_maintenance_usage,
     -+				   builtin_maintenance_options);
     -+
      +	argc = parse_options(argc, argv, prefix,
     -+			     builtin_maintenance_options,
     -+			     builtin_maintenance_usage,
     -+			     PARSE_OPT_KEEP_UNKNOWN);
     ++			     builtin_maintenance_run_options,
     ++			     builtin_maintenance_run_usage,
     ++			     PARSE_OPT_STOP_AT_NON_OPTION);
     ++
     ++	if (argc != 0)
     ++		usage_with_options(builtin_maintenance_run_usage,
     ++				   builtin_maintenance_run_options);
     ++	return maintenance_task_gc(&opts);
     ++}
     ++
     ++static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
      +
     -+	if (argc != 1)
     -+		usage_with_options(builtin_maintenance_usage,
     -+				   builtin_maintenance_options);
     ++int cmd_maintenance(int argc, const char **argv, const char *prefix)
     ++{
     ++	if (argc == 2 && !strcmp(argv[1], "-h"))
     ++		usage(builtin_maintenance_usage);
      +
     -+	if (!strcmp(argv[0], "run"))
     -+		return maintenance_run(&opts);
     ++	if (!strcmp(argv[1], "run"))
     ++		return maintenance_run(argc - 1, argv + 1, prefix);
      +
     -+	die(_("invalid subcommand: %s"), argv[0]);
     ++	die(_("invalid subcommand: %s"), argv[1]);
      +}
      
     + ## command-list.txt ##
     +@@ command-list.txt: git-ls-remote                           plumbinginterrogators
     + git-ls-tree                             plumbinginterrogators
     + git-mailinfo                            purehelpers
     + git-mailsplit                           purehelpers
     ++git-maintenance                         mainporcelain
     + git-merge                               mainporcelain           history
     + git-merge-base                          plumbinginterrogators
     + git-merge-file                          plumbingmanipulators
     +
       ## git.c ##
      @@ git.c: static struct cmd_struct commands[] = {
       	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
  2:  adae48d235 !  2:  5386d8a628 maintenance: add --quiet option
     @@ Documentation/git-maintenance.txt: OPTIONS
       Part of the linkgit:git[1] suite
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static const char * const builtin_maintenance_usage[] = {
     +@@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     + }
       
     - struct maintenance_opts {
     + static const char * const builtin_maintenance_run_usage[] = {
     +-	N_("git maintenance run [--auto]"),
     ++	N_("git maintenance run [--auto] [--[no-]quiet]"),
     + 	NULL
     + };
     + 
     + struct maintenance_run_opts {
       	int auto_flag;
      +	int quiet;
       };
       
     - static int maintenance_task_gc(struct maintenance_opts *opts)
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     + static int maintenance_task_gc(struct maintenance_run_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       
       	if (opts->auto_flag)
       		strvec_push(&child.args, "--auto");
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
       
       	close_object_store(the_repository->objects);
       	return run_command(&child);
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 	struct option builtin_maintenance_options[] = {
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	struct option builtin_maintenance_run_options[] = {
       		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
      +		OPT_BOOL(0, "quiet", &opts.quiet,
      +			 N_("do not report progress or other information over stderr")),
       		OPT_END()
       	};
     - 
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 		usage_with_options(builtin_maintenance_usage,
     - 				   builtin_maintenance_options);
     + 	memset(&opts, 0, sizeof(opts));
       
      +	opts.quiet = !isatty(2);
      +
       	argc = parse_options(argc, argv, prefix,
     - 			     builtin_maintenance_options,
     - 			     builtin_maintenance_usage,
     + 			     builtin_maintenance_run_options,
     + 			     builtin_maintenance_run_usage,
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'help text' '
  3:  91741a0cfc =  3:  e28b332df4 maintenance: replace run_auto_gc()
  4:  1db3b96280 !  4:  82326c1c38 maintenance: initialize task array
     @@ Commit message
      
          The run subcommand will return a nonzero exit code if any task fails.
          However, it will attempt all tasks in its loop before returning with the
     -    failure. Also each failed task will send an error message.
     +    failure. Also each failed task will print an error message.
      
          Helped-by: Taylor Blau <me@ttaylorr.com>
          Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       	return run_command(&child);
       }
       
     -+typedef int maintenance_task_fn(struct maintenance_opts *opts);
     ++typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
      +
      +struct maintenance_task {
      +	const char *name;
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	},
      +};
      +
     - static int maintenance_run(struct maintenance_opts *opts)
     - {
     --	return maintenance_task_gc(opts);
     ++static int maintenance_run_tasks(struct maintenance_run_opts *opts)
     ++{
      +	int i;
      +	int result = 0;
      +
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	}
      +
      +	return result;
     ++}
     ++
     + static int maintenance_run(int argc, const char **argv, const char *prefix)
     + {
     + 	struct maintenance_run_opts opts;
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	if (argc != 0)
     + 		usage_with_options(builtin_maintenance_run_usage,
     + 				   builtin_maintenance_run_options);
     +-	return maintenance_task_gc(&opts);
     ++	return maintenance_run_tasks(&opts);
       }
       
     - int cmd_maintenance(int argc, const char **argv, const char *prefix)
     + static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
  5:  50b457fd57 !  5:  06984a42bf maintenance: add commit-graph task
     @@ Commit message
          maintenance: add commit-graph task
      
          The first new task in the 'git maintenance' builtin is the
     -    'commit-graph' job. It is based on the sequence of events in the
     -    'commit-graph' job in Scalar [1]. This sequence is as follows:
     +    'commit-graph' task. This updates the commit-graph file
     +    incrementally with the command
      
     -    1. git commit-graph write --reachable --split
     -    2. git commit-graph verify --shallow
     -    3. If the verify succeeds, stop.
     -    4. Delete the commit-graph-chain file.
     -    5. git commit-graph write --reachable --split
     +            git commit-graph write --reachable --split
      
          By writing an incremental commit-graph file using the "--split"
          option we minimize the disruption from this operation. The default
     @@ Commit message
          This could be avoided by a future update to use the --expire-time
          argument when writing the commit-graph.
      
     -    By using 'git commit-graph verify --shallow' we can ensure that
     -    the file we just wrote is valid. This is an extra safety precaution
     -    that is faster than our 'write' subcommand. In the rare situation
     -    that the newest layer of the commit-graph is corrupt, we can "fix"
     -    the corruption by deleting the commit-graph-chain file and rewrite
     -    the full commit-graph as a new one-layer commit graph. This does
     -    not completely prevent _that_ file from being corrupt, but it does
     -    recompute the commit-graph by parsing commits from the object
     -    database. In our use of this step in Scalar and VFS for Git, we
     -    have only seen this issue arise because our microsoft/git fork
     -    reverted 43d3561 ("commit-graph write: don't die if the existing
     -    graph is corrupt" 2019-03-25) for a while to keep commit-graph
     -    writes very fast. We dropped the revert when updating to v2.23.0.
     -    The verify still has potential for catching corrupt data across
     -    the layer boundary: if the new file has commit X with parent Y
     -    in an old file but the commit ID for Y in the old file had a
     -    bitswap, then we will notice that in the 'verify' command.
     -
     -    [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/CommitGraphStep.cs
     -
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     @@ Documentation/git-maintenance.txt: run::
       
      +commit-graph::
      +	The `commit-graph` job updates the `commit-graph` files incrementally,
     -+	then verifies that the written data is correct. If the new layer has an
     -+	issue, then the chain file is removed and the `commit-graph` is
     -+	rewritten from scratch.
     -++
     -+The incremental write is safe to run alongside concurrent Git processes
     -+since it will not expire `.graph` files that were in the previous
     -+`commit-graph-chain` file. They will be deleted by a later run based on
     -+the expiration delay.
     ++	then verifies that the written data is correct. The incremental
     ++	write is safe to run alongside concurrent Git processes since it
     ++	will not expire `.graph` files that were in the previous
     ++	`commit-graph-chain` file. They will be deleted by a later run based
     ++	on the expiration delay.
      +
       gc::
       	Clean up unnecessary files and optimize the local repository. "GC"
       	stands for "garbage collection," but this task performs many
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: struct maintenance_opts {
     +@@ builtin/gc.c: struct maintenance_run_opts {
       	int quiet;
       };
       
     -+static int run_write_commit_graph(struct maintenance_opts *opts)
     ++static int run_write_commit_graph(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: struct maintenance_opts {
      +	return !!run_command(&child);
      +}
      +
     -+static int run_verify_commit_graph(struct maintenance_opts *opts)
     ++static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
      +{
     -+	struct child_process child = CHILD_PROCESS_INIT;
     -+
     -+	child.git_cmd = 1;
     -+	strvec_pushl(&child.args, "commit-graph", "verify",
     -+		     "--shallow", NULL);
     -+
     -+	if (opts->quiet)
     -+		strvec_push(&child.args, "--no-progress");
     -+
     -+	return !!run_command(&child);
     -+}
     -+
     -+static int maintenance_task_commit_graph(struct maintenance_opts *opts)
     -+{
     -+	struct repository *r = the_repository;
     -+	char *chain_path;
     -+
     -+	close_object_store(r->objects);
     ++	close_object_store(the_repository->objects);
      +	if (run_write_commit_graph(opts)) {
      +		error(_("failed to write commit-graph"));
      +		return 1;
      +	}
      +
     -+	if (!run_verify_commit_graph(opts))
     -+		return 0;
     -+
     -+	warning(_("commit-graph verify caught error, rewriting"));
     -+
     -+	chain_path = get_commit_graph_chain_filename(r->objects->odb);
     -+	if (unlink(chain_path)) {
     -+		UNLEAK(chain_path);
     -+		die(_("failed to remove commit-graph at %s"), chain_path);
     -+	}
     -+	free(chain_path);
     -+
     -+	if (!run_write_commit_graph(opts))
     -+		return 0;
     -+
     -+	error(_("failed to rewrite commit-graph"));
     -+	return 1;
     ++	return 0;
      +}
      +
     - static int maintenance_task_gc(struct maintenance_opts *opts)
     + static int maintenance_task_gc(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: struct maintenance_task {
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
      +	},
       };
       
     - static int maintenance_run(struct maintenance_opts *opts)
     + static int maintenance_run_tasks(struct maintenance_run_opts *opts)
      
       ## commit-graph.c ##
      @@ commit-graph.c: static char *get_split_graph_filename(struct object_directory *odb,
     @@ commit-graph.c: static void expire_commit_graphs(struct write_commit_graph_conte
       		ctx->num_commit_graphs_after = 0;
      
       ## commit-graph.h ##
     -@@ commit-graph.h: struct commit;
     - struct bloom_filter_settings;
     +@@ commit-graph.h: struct raw_object_store;
     + struct string_list;
       
       char *get_commit_graph_filename(struct object_directory *odb);
      +char *get_commit_graph_chain_filename(struct object_directory *odb);
  6:  85268bd53e !  6:  69298aee24 maintenance: add --task option
     @@ Documentation/git-maintenance.txt: OPTIONS
       
      +--task=<task>::
      +	If this option is specified one or more times, then only run the
     -+	specified tasks in the specified order.
     ++	specified tasks in the specified order. See the 'TASKS' section
     ++	for the list of accepted `<task>` values.
      +
       GIT
       ---
       Part of the linkgit:git[1] suite
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: typedef int maintenance_task_fn(struct maintenance_opts *opts);
     - struct maintenance_task {
     +@@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     + }
     + 
     + static const char * const builtin_maintenance_run_usage[] = {
     +-	N_("git maintenance run [--auto] [--[no-]quiet]"),
     ++	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
     + 	NULL
     + };
     + 
     +@@ builtin/gc.c: struct maintenance_task {
       	const char *name;
       	maintenance_task_fn *fn;
     --	unsigned enabled:1;
     -+	unsigned enabled:1,
     -+		 selected:1;
     + 	unsigned enabled:1;
     ++
     ++	/* -1 if not selected. */
      +	int selected_order;
       };
       
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
      +	return b->selected_order - a->selected_order;
      +}
      +
     - static int maintenance_run(struct maintenance_opts *opts)
     + static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       {
      -	int i;
      +	int i, found_selected = 0;
       	int result = 0;
       
      +	for (i = 0; !found_selected && i < TASK__COUNT; i++)
     -+		found_selected = tasks[i].selected;
     ++		found_selected = tasks[i].selected_order >= 0;
      +
      +	if (found_selected)
      +		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
      +
       	for (i = 0; i < TASK__COUNT; i++) {
      -		if (!tasks[i].enabled)
     -+		if (found_selected && !tasks[i].selected)
     ++		if (found_selected && tasks[i].selected_order < 0)
      +			continue;
      +
      +		if (!found_selected && !tasks[i].enabled)
       			continue;
       
       		if (tasks[i].fn(opts)) {
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       	return result;
       }
       
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      +	BUG_ON_OPT_NEG(unset);
      +
      +	for (i = 0; i < TASK__COUNT; i++) {
     -+		num_selected += tasks[i].selected;
     ++		if (tasks[i].selected_order >= 0)
     ++			num_selected++;
      +		if (!strcasecmp(tasks[i].name, arg)) {
      +			task = &tasks[i];
      +		}
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      +		return 1;
      +	}
      +
     -+	if (task->selected) {
     ++	if (task->selected_order >= 0) {
      +		error(_("task '%s' cannot be selected multiple times"), arg);
      +		return 1;
      +	}
      +
     -+	task->selected = 1;
      +	task->selected_order = num_selected + 1;
      +
      +	return 0;
      +}
      +
     - int cmd_maintenance(int argc, const char **argv, const char *prefix)
     + static int maintenance_run(int argc, const char **argv, const char *prefix)
       {
     - 	struct maintenance_opts opts;
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     ++	int i;
     + 	struct maintenance_run_opts opts;
     + 	struct option builtin_maintenance_run_options[] = {
     + 		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
       		OPT_BOOL(0, "quiet", &opts.quiet,
       			 N_("do not report progress or other information over stderr")),
     @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefi
      +			PARSE_OPT_NONEG, task_option_parse),
       		OPT_END()
       	};
     + 	memset(&opts, 0, sizeof(opts));
     + 
     + 	opts.quiet = !isatty(2);
       
     ++	for (i = 0; i < TASK__COUNT; i++)
     ++		tasks[i].selected_order = -1;
     ++
     + 	argc = parse_options(argc, argv, prefix,
     + 			     builtin_maintenance_run_options,
     + 			     builtin_maintenance_run_usage,
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'run [--auto|--quiet]' '
  7:  6f86cfaa94 !  7:  3d513acdd8 maintenance: take a lock on the objects directory
     @@ Commit message
          lock is never committed, since it does not represent meaningful data.
          Instead, it is only a placeholder.
      
     -    If the lock file already exists, then fail silently. This will become
     -    very important later when we implement the 'fetch' task, as this is our
     -    stop-gap from creating a recursive process loop between 'git fetch' and
     -    'git maintenance run'.
     +    If the lock file already exists, then fail with a warning. If '--auto'
     +    is specified, then instead no warning is shown and no tasks are attempted.
     +    This will become very important later when we implement the 'prefetch'
     +    task, as this is our stop-gap from creating a recursive process loop
     +    between 'git fetch' and 'git maintenance run --auto'.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       {
       	int i, found_selected = 0;
       	int result = 0;
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      +		 * that case.
      +		 */
      +		if (!opts->auto_flag && !opts->quiet)
     -+			error(_("lock file '%s' exists, skipping maintenance"),
     -+			      lock_path);
     ++			warning(_("lock file '%s' exists, skipping maintenance"),
     ++				lock_path);
      +		free(lock_path);
      +		return 0;
      +	}
      +	free(lock_path);
       
       	for (i = 0; !found_selected && i < TASK__COUNT; i++)
     - 		found_selected = tasks[i].selected;
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     + 		found_selected = tasks[i].selected_order >= 0;
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       		}
       	}
       
  8:  5c0f9d69d1 !  8:  712f5f2d8e maintenance: create maintenance.<task>.enabled config
     @@ Documentation/config/maintenance.txt (new)
      @@
      +maintenance.<task>.enabled::
      +	This boolean config option controls whether the maintenance task
     -+	with name `<task>` is run when no `--task` option is specified.
     -+	By default, only `maintenance.gc.enabled` is true.
     ++	with name `<task>` is run when no `--task` option is specified to
     ++	`git maintenance run`. These config values are ignored if a
     ++	`--task` option exists. By default, only `maintenance.gc.enabled`
     ++	is true.
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: SUBCOMMANDS
     @@ Documentation/git-maintenance.txt: SUBCOMMANDS
       
       TASKS
       -----
     +@@ Documentation/git-maintenance.txt: OPTIONS
     + 
     + --task=<task>::
     + 	If this option is specified one or more times, then only run the
     +-	specified tasks in the specified order. See the 'TASKS' section
     +-	for the list of accepted `<task>` values.
     ++	specified tasks in the specified order. If no `--task=<task>`
     ++	arguments are specified, then only the tasks with
     ++	`maintenance.<task>.enabled` configured as `true` are considered.
     ++	See the 'TASKS' section for the list of accepted `<task>` values.
     + 
     + GIT
     + ---
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       	return result;
       }
       
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
       static int task_option_parse(const struct option *opt,
       			     const char *arg, int unset)
       {
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 				   builtin_maintenance_options);
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	memset(&opts, 0, sizeof(opts));
       
       	opts.quiet = !isatty(2);
      +	initialize_task_config();
       
     - 	argc = parse_options(argc, argv, prefix,
     - 			     builtin_maintenance_options,
     + 	for (i = 0; i < TASK__COUNT; i++)
     + 		tasks[i].selected_order = -1;
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'run [--auto|--quiet]' '
  9:  68bf5bef4b !  9:  69d3b48fd4 maintenance: use pointers to check --auto
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       
     - typedef int maintenance_task_fn(struct maintenance_opts *opts);
     + typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
       
      +/*
      + * An auto condition function returns 1 if the task should run
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
       	const char *name;
       	maintenance_task_fn *fn;
      +	maintenance_auto_fn *auto_condition;
     - 	unsigned enabled:1,
     - 		 selected:1;
     - 	int selected_order;
     + 	unsigned enabled:1;
     + 
     + 	/* -1 if not selected. */
      @@ builtin/gc.c: static struct maintenance_task tasks[] = {
       	[TASK_GC] = {
       		"gc",
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
       		1,
       	},
       	[TASK_COMMIT_GRAPH] = {
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       		if (!found_selected && !tasks[i].enabled)
       			continue;
       
 10:  fc097c389a ! 10:  4c3115fe35 maintenance: add auto condition for commit-graph task
     @@ Commit message
      
       ## Documentation/config/maintenance.txt ##
      @@ Documentation/config/maintenance.txt: maintenance.<task>.enabled::
     - 	This boolean config option controls whether the maintenance task
     - 	with name `<task>` is run when no `--task` option is specified.
     - 	By default, only `maintenance.gc.enabled` is true.
     + 	`git maintenance run`. These config values are ignored if a
     + 	`--task` option exists. By default, only `maintenance.gc.enabled`
     + 	is true.
      +
      +maintenance.commit-graph.auto::
      +	This integer config option controls how often the `commit-graph` task
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: struct maintenance_opts {
     +@@ builtin/gc.c: struct maintenance_run_opts {
       	int quiet;
       };
       
      +/* Remember to update object flag allocation in object.h */
     -+#define PARENT1		(1u<<16)
     ++#define SEEN		(1u<<0)
      +
     -+static int num_commits_not_in_graph = 0;
     -+static int limit_commits_not_in_graph = 100;
     ++struct cg_auto_data {
     ++	int num_not_in_graph;
     ++	int limit;
     ++};
      +
      +static int dfs_on_ref(const char *refname,
      +		      const struct object_id *oid, int flags,
      +		      void *cb_data)
      +{
     ++	struct cg_auto_data *data = (struct cg_auto_data *)cb_data;
      +	int result = 0;
      +	struct object_id peeled;
      +	struct commit_list *stack = NULL;
     @@ builtin/gc.c: struct maintenance_opts {
      +		for (parent = commit->parents; parent; parent = parent->next) {
      +			if (parse_commit(parent->item) ||
      +			    commit_graph_position(parent->item) != COMMIT_NOT_FROM_GRAPH ||
     -+			    parent->item->object.flags & PARENT1)
     ++			    parent->item->object.flags & SEEN)
      +				continue;
      +
     -+			parent->item->object.flags |= PARENT1;
     -+			num_commits_not_in_graph++;
     ++			parent->item->object.flags |= SEEN;
     ++			data->num_not_in_graph++;
      +
     -+			if (num_commits_not_in_graph >= limit_commits_not_in_graph) {
     ++			if (data->num_not_in_graph >= data->limit) {
      +				result = 1;
      +				break;
      +			}
     @@ builtin/gc.c: struct maintenance_opts {
      +static int should_write_commit_graph(void)
      +{
      +	int result;
     ++	struct cg_auto_data data;
      +
     ++	data.num_not_in_graph = 0;
     ++	data.limit = 100;
      +	git_config_get_int("maintenance.commit-graph.auto",
     -+			   &limit_commits_not_in_graph);
     ++			   &data.limit);
      +
     -+	if (!limit_commits_not_in_graph)
     ++	if (!data.limit)
      +		return 0;
     -+	if (limit_commits_not_in_graph < 0)
     ++	if (data.limit < 0)
      +		return 1;
      +
     -+	result = for_each_ref(dfs_on_ref, NULL);
     ++	result = for_each_ref(dfs_on_ref, &data);
      +
     -+	clear_commit_marks_all(PARENT1);
     ++	clear_commit_marks_all(SEEN);
      +
      +	return result;
      +}
      +
     - static int run_write_commit_graph(struct maintenance_opts *opts)
     + static int run_write_commit_graph(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: static struct maintenance_task tasks[] = {
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
      
       ## object.h ##
      @@ object.h: struct object_array {
     +  * sha1-name.c:                                              20
        * list-objects-filter.c:                                      21
        * builtin/fsck.c:           0--3
     ++ * builtin/gc.c:             0
        * builtin/index-pack.c:                                     2021
     -+ * builtin/maintenance.c:                           16
        * builtin/pack-objects.c:                                   20
        * builtin/reflog.c:                   10--12
     -  * builtin/show-branch.c:    0-------------------------------------------26
 11:  46fbe161aa ! 11:  652a8eac57 maintenance: add trace2 regions for task execution
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       		     !tasks[i].auto_condition()))
       			continue;
       

-- 
gitgitgadget

^ permalink raw reply

* Re: [PATCH v2] refs: remove lookup cache for reference-transaction hook
From: Jeff King @ 2020-08-25 18:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Taylor Blau
In-Reply-To: <xmqqd03e1tsd.fsf@gitster.c.googlers.com>

On Tue, Aug 25, 2020 at 11:09:54AM -0700, Junio C Hamano wrote:

> >> +test_perf "nonatomic push" '
> >> +	git push ./target-repo.git $(test_seq 1000) &&
> >> +	git push --delete ./target-repo.git $(test_seq 1000)
> >>  '
> >
> > This works as far as Git is concerned, but "seq 1000" output with NULs
> > is 3893 bytes. I wonder if some platforms might run into command-line
> > limits there.
> 
> That was my thought when I saw the above as well.  In addition, I do
> not think it is a good idea to encourage digit-only refnames.

Good point. It gets hairy at four digits:

  $ git show 1000
  error: short SHA1 1000 is ambiguous
  hint: The candidates are:
  hint:   10000434d2 tree
  hint:   10007bcb9e tree
  hint:   10008a0e22 tree
  hint:   1000bdf512 tree
  hint:   1000dc2368 blob
  fatal: ambiguous argument '1000': unknown revision or path not in the working tree.

So I think if the test works it may be relying on the exact object ids
that happen to be generated (which fortunately are at least
deterministic these days, but it may be a trap waiting to spring for
somebody later).

-Peff

^ permalink raw reply

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Jeff King @ 2020-08-25 18:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, dstolee
In-Reply-To: <xmqqh7sq1u0a.fsf@gitster.c.googlers.com>

On Tue, Aug 25, 2020 at 11:05:09AM -0700, Junio C Hamano wrote:

> Subject: [RFC] pack-redundant: gauge the usage before proposing its removal
> 
> The subcommand is unusably slow and the reason why nobody reports it
> as a performance bug is suspected to be the absense of users.  Let's
> disable the normal use of command by making it error out with a big
> message that asks the user to tell us that they still care about the
> command, with an escape hatch to override it with a command line
> option.

I like this direction. And I'm tempted to say it's OK to go straight
here given how generally useless the command is. But it feels like a big
jump for the initial change. While we give the user an easy escape
hatch, it might still break their scripts.

A more gentle transition would I guess be:

  1. Mention deprecation in release notes (hah, as if anybody reads
     them).

  2. Issue a warning but continue to behave as normal. That might break
     scripts that care a lot about stderr, but otherwise is harmless. No
     clue if anybody would actually see the message or not.

  3. Flip warning to error, as your patch does.

  4. Removal.

I'd guess we could do 1+2 in one release, then 3 a release or two later,
and then finally 4. I dunno. That's more tedious and I'll be surprised
if we get any bites, but maybe we ought to do it out of principle.

-Peff

^ permalink raw reply

* Re: [PATCH 0/7] Better threaded delta resolution in index-pack (another try)
From: Jonathan Tan @ 2020-08-25 18:11 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git, steadmon
In-Reply-To: <20200824220829.GA802799@coredump.intra.peff.net>

> The overall time seems to get slightly worse, though (HEAD~7 is before
> your patch, HEAD is with it):
> 
>   Test                                           HEAD~7               HEAD                    
>   --------------------------------------------------------------------------------------------
>   5302.9: index-pack default number of threads   71.96(376.11+3.66)   74.18(390.62+6.08) +3.1%
> 
> There may be other cases that get better, though. A 3% increase here is
> probably OK if we get something for it. But if our primary goal here is
> increasing multithread efficiency, then we should be able to show some
> benchmark that improves. :)

Ah...good question. Cloning from
https://fuchsia.googlesource.com/third_party/vulkan-cts (mentioned in
patch 7), cd-ing to the pack dir, and running:

  git index-pack --stdin -o foo <*.pack

I got 8m2.878s with my patches and 12m6.365s without. But I ran this on
a cloud virtual machine (what I have access to right now) so the numbers
might look different on a dedicated machine.

^ permalink raw reply

* Re: [PATCH v2] refs: remove lookup cache for reference-transaction hook
From: Junio C Hamano @ 2020-08-25 18:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Taylor Blau
In-Reply-To: <20200825151053.GA1409139@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Aug 25, 2020 at 12:35:24PM +0200, Patrick Steinhardt wrote:
>
>> The only change compared to v1 is that I've addressed the unportable
>> `branch-{1..1000}` syntax in favor of `test_seq`. I had to setup refs as
>> part of the setup and change the ordering for "update-ref --stdin" from
>> create/update/delete to update/delete/create, but I don't think that's
>> too bad. At least timings didn't seem to really change because of that.
>
> Another option instead of changing the order in the other tests is to do
> another untimed setup step before the push test. I'm OK either way,
> though.
>
>> +test_perf "nonatomic push" '
>> +	git push ./target-repo.git $(test_seq 1000) &&
>> +	git push --delete ./target-repo.git $(test_seq 1000)
>>  '
>
> This works as far as Git is concerned, but "seq 1000" output with NULs
> is 3893 bytes. I wonder if some platforms might run into command-line
> limits there.

That was my thought when I saw the above as well.  In addition, I do
not think it is a good idea to encourage digit-only refnames.


^ permalink raw reply

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Junio C Hamano @ 2020-08-25 18:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee
In-Reply-To: <20200825172214.GC1414394@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> OK, that's the part I was missing. The discussion here and the statement
> from git-repack(1):
>
>   -d
>       After packing, if the newly created packs make some existing packs
>       redundant, remove the redundant packs. Also run git prune-packed
>       to remove redundant loose object files.
>
> made me think that it was running pack-redundant. But it doesn't seem
> to. It looks like we stopped doing so in 6ed64058e1 (git-repack: do not
> do complex redundancy check., 2005-11-19).

Thanks for digging.  A good opportunity for a #leftoverbits
documentation update from new people is here.

> As an aside, we tried using pack-redundant at GitHub several years ago
> for dropping packs that were replicated in alternates storage. It
> performs very poorly (quadratically, perhaps?) to the point that we
> found it unusable,...

Yes, I originally wrote "the pack-redundant subcommand" in the
message you are responding to with a bit more colourful adjectives,
but rewrote it ;-)  My recollection from the last time I looked at
it is that it is quadratic or even worse---that was long time ago,
but on the other hand I think the subcommand had no significant
improvement over the course of its life.

Perhaps it is time to drop it.

-- >8 --
Subject: [RFC] pack-redundant: gauge the usage before proposing its removal

The subcommand is unusably slow and the reason why nobody reports it
as a performance bug is suspected to be the absense of users.  Let's
disable the normal use of command by making it error out with a big
message that asks the user to tell us that they still care about the
command, with an escape hatch to override it with a command line
option.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-redundant.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index 178e3409b7..97cf3df79b 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -554,6 +554,7 @@ static void load_all(void)
 int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	int i_still_use_this = 0;
 	struct pack_list *min = NULL, *red, *pl;
 	struct llist *ignore;
 	struct object_id *oid;
@@ -580,12 +581,25 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 			alt_odb = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--i-still-use-this")) {
+			i_still_use_this = 1;
+			continue;
+		}
 		if (*arg == '-')
 			usage(pack_redundant_usage);
 		else
 			break;
 	}
 
+	if (!i_still_use_this) {
+		puts(_("'git pack-redundant' is nominated for removal.\n"
+		       "If you still use this command, please add an extra\n"
+		       "option, '--i-still-use-this', on the command line\n"
+		       "and let us know you still use it by sending an e-mail\n"
+		       "to <git@vger.kernel.org>.  Thanks\n"));
+		exit(1);
+	}
+
 	if (load_all_packs)
 		load_all();
 	else

^ permalink raw reply related

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Jeff King @ 2020-08-25 17:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, Son Luong Ngoc, git, dstolee
In-Reply-To: <20200825173425.GA16844@syl.lan>

On Tue, Aug 25, 2020 at 01:34:25PM -0400, Taylor Blau wrote:

> > I.e., drop step 2 above, and make step 3 just clear_midx_file(). Which
> > is roughly what the code does now, isn't it? Or is there some reason
> > that "expire" is more interesting than just clearing?
> 
> It's not clear to me whether you're talking about my patch, or what a
> more full integration with 'git repack' looks like.
> 
> If you are talking about my patch, I disagree: checking that the MIDX
> doesn't know about a pack we're dropping *is* useful even without
> all-in-one, because of '.keep' packs (as demonstrated by the new test in
> my patch).

I hadn't considered .keep. So all-in-one may still involve selectively
deleting some packs. It makes sense, then, for repack to consider
whether the midx is actually redundant or not rather than just always
clearing it (i.e., what your patch does).

In general I consider .keep packs kind of an awful feature that
introduces a lot of confusion and works against other features like
bitmaps. But I guess that they're the only thing that allows you to have
a gigantic Windows-like repo where you never fully pack it, but just
keep acquiring a string of big packs. Which is the exact case that the
midx is most useful for. So they're definitely worth considering and
supporting here.

> To me, this patch seems like an incremental step in the direction that
> we ultimately want to be going, but it's hard to untangle whether the
> ensuing discussion is targeted at my patch, or the ultimate goal.

I wasn't sure of the answer to that until we untangled more. I.e., it
wasn't clear to me if your incremental step was even in the right
direction if we weren't in fact ever selectively deleting packs in
git-repack. (And now it sounds like we aren't via "git repack -d", which
was my original question, but we are via .keep).

-Peff

^ permalink raw reply

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Jeff King @ 2020-08-25 17:34 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Taylor Blau, git, dstolee
In-Reply-To: <c78b3108-b760-d252-9428-6f03549fea11@gmail.com>

On Tue, Aug 25, 2020 at 12:18:42PM -0400, Derrick Stolee wrote:

> The best I can see is that prepare_midx_pack() will return 1 if the
> pack no longer exists, and this would cause a die("error preparing
> packfile from multi-pack-index") in nth_midxed_pack_entry(), killing
> the following stack trace:
> 
> + find_pack_entry():packfile.c
>  + fill_midx_entry():midx.c
>   + nth_midxed_pack_entry():midx.c
> 
> Perhaps that die() is a bit over-zealous since we could return 1
> through this stack and find_pack_entry() could continue searching
> for the object in the packed_git list. However, it could start
> returning false _negatives_ if there were duplicates of the object
> in the multi-pack-index but only the latest copy was deleted (and
> the object does not appear in a pack-file outside of the multi-
> pack-index).

Hmm, yeah.

I thought this code is already doing the right thing, because I'd expect
the is_pack_valid() call later in nth_midxed_pack_entry() to be where we
notice the problem. But add_packed_git() does stat the packfile and
return an error.

So that die() really ought to be just "return 0". The caller already has
to (and does) handle similar errors (including that the pack went away
after we added it to the packed_git list, or that it exists but has
bogus data, etc).

-Peff

^ permalink raw reply

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Taylor Blau @ 2020-08-25 17:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Taylor Blau, Son Luong Ngoc, git, dstolee
In-Reply-To: <20200825172901.GD1414394@coredump.intra.peff.net>

On Tue, Aug 25, 2020 at 01:29:01PM -0400, Jeff King wrote:
> On Tue, Aug 25, 2020 at 01:10:13PM -0400, Derrick Stolee wrote:
>
> > > If we do care, though, that implies some cooperation between the
> > > deletion process and the midx code. Perhaps it even argues that repack
> > > should refuse to delete such a single pack at all, since it _isn't_
> > > redundant. It's part of a midx, and the caller should rewrite the midx
> > > first itself, and _then_ look for redundant packs.
> >
> > It is worth noting that we _do_ have a way to integrate the delete and
> > write code using 'git multi-pack-index expire'. One way to resolve this
> > atomicity would be to do the following inside the repack command:
> >
> >  1. Create and index the new pack.
> >  2. git multi-pack-index write
> >  3. git multi-pack-index expire
>
> Given that discussion elsewhere points to git-repack only really
> deleting packs in all-in-one mode (and not ever a single pack), it seems
> like we can really be much simpler here. If we're not deleting packs via
> all-in-one, there's no need to touch the midx at all. If we are, then
> it's reasonable to delete the midx immediately (after having written our
> new pack but before deleting) since our new single pack idx is as good
> or better.
>
> I.e., drop step 2 above, and make step 3 just clear_midx_file(). Which
> is roughly what the code does now, isn't it? Or is there some reason
> that "expire" is more interesting than just clearing?

It's not clear to me whether you're talking about my patch, or what a
more full integration with 'git repack' looks like.

If you are talking about my patch, I disagree: checking that the MIDX
doesn't know about a pack we're dropping *is* useful even without
all-in-one, because of '.keep' packs (as demonstrated by the new test in
my patch).

To me, this patch seems like an incremental step in the direction that
we ultimately want to be going, but it's hard to untangle whether the
ensuing discussion is targeted at my patch, or the ultimate goal.

> And if anybody does want to drop single packs, etc, they can do so by
> generating a sensible midx separately from the repack operation (and
> probably doing so before dropping the packs for atomicity).
>
> -Peff

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Jeff King @ 2020-08-25 17:29 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Son Luong Ngoc, git, dstolee
In-Reply-To: <45921233-ac6c-05f2-e108-0ab2aeb56104@gmail.com>

On Tue, Aug 25, 2020 at 01:10:13PM -0400, Derrick Stolee wrote:

> > If we do care, though, that implies some cooperation between the
> > deletion process and the midx code. Perhaps it even argues that repack
> > should refuse to delete such a single pack at all, since it _isn't_
> > redundant. It's part of a midx, and the caller should rewrite the midx
> > first itself, and _then_ look for redundant packs.
> 
> It is worth noting that we _do_ have a way to integrate the delete and
> write code using 'git multi-pack-index expire'. One way to resolve this
> atomicity would be to do the following inside the repack command:
> 
>  1. Create and index the new pack.
>  2. git multi-pack-index write
>  3. git multi-pack-index expire

Given that discussion elsewhere points to git-repack only really
deleting packs in all-in-one mode (and not ever a single pack), it seems
like we can really be much simpler here. If we're not deleting packs via
all-in-one, there's no need to touch the midx at all. If we are, then
it's reasonable to delete the midx immediately (after having written our
new pack but before deleting) since our new single pack idx is as good
or better.

I.e., drop step 2 above, and make step 3 just clear_midx_file(). Which
is roughly what the code does now, isn't it? Or is there some reason
that "expire" is more interesting than just clearing?

And if anybody does want to drop single packs, etc, they can do so by
generating a sensible midx separately from the repack operation (and
probably doing so before dropping the packs for atomicity).

-Peff

^ permalink raw reply

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Jeff King @ 2020-08-25 17:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, dstolee
In-Reply-To: <xmqqtuwq1zux.fsf@gitster.c.googlers.com>

On Tue, Aug 25, 2020 at 08:58:46AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> > or "git repack -Ad" is going to pack everything and delete the old
> > packs. So I think we'd want to remove a midx there.
> 
> AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
> nobody is doing "there are three packfiles, but all the objects in
> one of them are contained in the other two, so let's remove that
> redundant one".

OK, that's the part I was missing. The discussion here and the statement
from git-repack(1):

  -d
      After packing, if the newly created packs make some existing packs
      redundant, remove the redundant packs. Also run git prune-packed
      to remove redundant loose object files.

made me think that it was running pack-redundant. But it doesn't seem
to. It looks like we stopped doing so in 6ed64058e1 (git-repack: do not
do complex redundancy check., 2005-11-19).

As an aside, we tried using pack-redundant at GitHub several years ago
for dropping packs that were replicated in alternates storage. It
performs very poorly (quadratically, perhaps?) to the point that we
found it unusable, and I implemented a "local-redundant" command to do
the same thing more quickly. We ended up dropping that as we now migrate
packs wholesale (rather than via fetch), so I never upstreamed it.

I mention that here to warn people away from pack-redundant (I wondered
at the time why nobody had noticed its terrible performance, but now I
think the answer is just that nobody ever runs it). I wonder if we
should even consider deprecating and removing it.

I'm also happy to share local-redundant if anybody is interested (though
as I've mentioned elsewhere, I think the larger scheme it was part of it
also performed poorly and isn't worth pursuing).

> > And "git repack -d" I think of as deleting only loose objects that we
> > just packed. But I guess it could also remove a pack that has now been
> > made redundant? That seems like a rare case in practice, but I suppose
> > is possible.
> 
> Meaning it can become reality?  Yes.  Or it already can happen?  I
> doubt it.

I thought the latter, but after this thread I agree that it can't.

> > I'm also a little curious how bad it is to have a midx whose pack has
> > gone away. I guess we'd answer queries for "yes, we have this object"
> > even if we don't, which is bad. Though in practice we'd only delete
> > those packs if we have their objects elsewhere.
> 
> Hmph, object O used to be in pack A and pack B, midx points at the
> copy in pack B but we deleted it because the pack is redundant.
> Now, midx says O exists, but midx cannot be used to retrieve O.  We
> need to ask A.idx about O to locate it.
> 
> That sounds brittle.  I am not sure our error fallback is all that
> patient.

It has to be that patient even without a midx, I think. We might open
A.idx and mmap its contents, without opening the matching A.pack (or we
might open it and later close it due to memory or descriptor
constraints). And we have two layers there:

  - when object_info_extended sees a bad return from
    packed_object_info(), it will mark the entry as bad and recurse,
    giving us an opportunity to find another one

  - when we can't find the object at all (e.g., because we marked that
    particular pack's version as inaccessible), we'll hit the
    reprepare_packed_git() path and look for a new idx

Those same things should be kicking in for midx lookups, too (I didn't
test them, but from a cursory look at the organization of the code, I
think they will).

> > In that line of thinking, do we even need to delete midx files if one of
> > their packs goes away? The reading side probably ought to be able to
> > handle that gracefully.
> 
> But at that point, is there even a point to have that midx file that
> knows about objects (so it can answer has_object()? correctly and
> quickly) but does not know the correct location of half of the objects?
> Instead of going directly to A.idx to locate O, we need to go to midx
> to learn the location of O in B (which no longer exists), and then
> fall back to it, that is a pure overhead.

It is overhead for sure. But my thinking was that you're trading one
efficiency for another:

  - the midx may incur an extra lookup for objects in the redundant
    pack, but that pack may be a small portion of what the midx indexes
    (and this is likely in the usual case that you have one big
    cumulative pack and many recently-created smaller ones; the big one
    will never be made redundant and dominates the object count of the
    midx)

  - by keeping the midx, you're improving lookup speed for all of the
    other objects, which don't have to hunt through separate pack idx
    files

So my guess is that it would usually be a win. Though really the correct
answer is: if you are mucking with packs you should just generate a new
midx (or you should pack all-into-one, which generates a single idx
accomplishing the same thing).

> > And the more interesting case is when you repack everything with "-ad"
> > or similar, at which point you shouldn't even need to look up what's in
> > the midx to see if you deleted its packs. The point of your operation is
> > to put it all-into-one, so you know the old midx should be discarded.
> 
> Old one, yes.  Do we need to have the new one in that case?

You shouldn't need one since you'd only have a single pack now.

-Peff

^ permalink raw reply

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Derrick Stolee @ 2020-08-25 17:10 UTC (permalink / raw)
  To: Jeff King, Taylor Blau; +Cc: Son Luong Ngoc, git, dstolee
In-Reply-To: <20200825164721.GA1414394@coredump.intra.peff.net>

On 8/25/2020 12:47 PM, Jeff King wrote:
> It may be worth thinking a bit about atomicity here, though. Rather than
> separate delete and write steps, would somebody want a sequence like:

...
> If we do care, though, that implies some cooperation between the
> deletion process and the midx code. Perhaps it even argues that repack
> should refuse to delete such a single pack at all, since it _isn't_
> redundant. It's part of a midx, and the caller should rewrite the midx
> first itself, and _then_ look for redundant packs.

It is worth noting that we _do_ have a way to integrate the delete and
write code using 'git multi-pack-index expire'. One way to resolve this
atomicity would be to do the following inside the repack command:

 1. Create and index the new pack.
 2. git multi-pack-index write
 3. git multi-pack-index expire

While this _mostly_ works, we still have issues around a concurrent
process opening a multi-pack-index before step (2) and trying to
read from the pack-file deleted in step (3). For this reason, the
'incremental-repack' task in the maintenance builtin series [1] runs
'expire' then 'repack' (and not the opposite order). To be completely
safe, we'd want to make sure that no Git command that started before
the 'repack' command is still running when we run 'expire'. In practice,
it suffices to run the 'incremental-repack' step at most once a day.

[1] https://lore.kernel.org/git/a8d956dad6b3a81d0f1b63cbd48f36a5e1195ae8.1597760730.git.gitgitgadget@gmail.com/

This discussion points out a big reason why the multi-pack-index
wasn't fully integrated with 'git repack': it can be tricky. Using
a repacking strategy that is focused on the multi-pack-index is
less tricky.

I still think that this patch is moving us in a better direction.

I'm making note of these possible improvements:

1. Have 'git repack' integrate with 'git multi-pack-index expire'
   when deleting pack-files after creating a new one (if a
   multi-pack-index exists).

2. Handle "missing packs" referenced by the multi-pack-index more
   gracefully, likely by disabling the multi-pack-index and
   calling reprepare_packed_git().

Thanks,
-Stolee



^ permalink raw reply

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
From: Jeff King @ 2020-08-25 16:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, git, dstolee
In-Reply-To: <20200825154224.GA9116@syl.lan>

On Tue, Aug 25, 2020 at 11:42:24AM -0400, Taylor Blau wrote:

> > test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> > 	git init preserve &&
> > 	test_when_finished "rm -fr preserve" &&
> > 	(
> > 		cd preserve &&
> > 		midx=.git/objects/pack/multi-pack-index &&
> >
> > 		test_commit 1 &&
> > 		HASH1=$(git pack-objects --all .git/objects/pack/pack) &&
> > 		touch .git/objects/pack/pack-$HASH1.keep &&
> >
> > 		cat >pack-input <<-\EOF &&
> 
> Escaping the heredoc shouldn't be necessary, so this can be written
> instead as '<<-EOF'.

We usually go the opposite way: if a here-doc doesn't need
interpolation, then we prefer to mark it as such to avoid surprising
somebody who adds lines later (and might need quoting).

Certainly you can argue that least-surprise would be in the other
direction (i.e., people expect to interpolate by default, and you
surprise anybody adding a variable reference), but for Git's test suite,
the convention usually runs the other way.

-Peff

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox