Git development
 help / color / mirror / Atom feed
* Re: [PATCH] parse-options: fix SunCC compiler warning
From: Duy Nguyen @ 2018-12-10 15:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: avarab, git, sunshine, szeder.dev
In-Reply-To: <xmqqy38xkhvd.fsf@gitster-ct.c.googlers.com>

On Mon, Dec 10, 2018 at 03:36:54PM +0900, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
> 
> > The compiler reports this because show_gitcomp() never actually
> > returns a value:
> >
> >     "parse-options.c", line 520: warning: Function has no return
> >     statement : show_gitcomp
> >
> > The function calls exit() and will never return. Update and mark it
> > NORETURN.
> 
> Yuck.  This should do for now, but I am not impressed by the choice
> to hook show_gitcomp() call into parse_options_step(), which forces
> such an abnormal exit deeper in the callchain [*1*].  For readers
> (not compilers), it would help to have a comment at the caller that
> says that show_gitcomp() never returns and exits.
> 
> 	side note #1.  Perhaps parse_options_start() would have been
> 	a better place, instead of parse_options_step() that is
> 	repeatedly called in a loop and itself has a loop.  ANd
> 	worse yet, the check is done inside the loop even though the
> 	call is to be made only when the --git-completion-helper
> 	option is given as the sole request.

The reason it's in parse_options_step() is because -h is also handled
in there. Although -h does not bury exit() deep in the call chain. So
how about this as a replacement?

-- 8< --
diff --git a/parse-options.c b/parse-options.c
index 3b874a83a0..6932eaff61 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -516,7 +516,7 @@ static int show_gitcomp(struct parse_opt_ctx_t *ctx,
 	show_negated_gitcomp(original_opts, -1);
 	show_negated_gitcomp(original_opts, nr_noopts);
 	fputc('\n', stdout);
-	exit(0);
+	return PARSE_OPT_COMPLETE;
 }
 
 static int usage_with_options_internal(struct parse_opt_ctx_t *,
@@ -638,6 +638,8 @@ int parse_options(int argc, const char **argv, const char *prefix,
 	case PARSE_OPT_HELP:
 	case PARSE_OPT_ERROR:
 		exit(129);
+	case PARSE_OPT_COMPLETE:
+		exit(0);
 	case PARSE_OPT_NON_OPTION:
 	case PARSE_OPT_DONE:
 		break;
diff --git a/parse-options.h b/parse-options.h
index 6c4fe2016d..a650a7d220 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -208,6 +208,7 @@ extern int opterror(const struct option *opt, const char *reason, int flags);
 /*----- incremental advanced APIs -----*/
 
 enum {
+	PARSE_OPT_COMPLETE = -2,
 	PARSE_OPT_HELP = -1,
 	PARSE_OPT_DONE,
 	PARSE_OPT_NON_OPTION,
-- 8< --

^ permalink raw reply related

* Re: [PATCH 1/8] move worktree tests to t24*
From: Duy Nguyen @ 2018-12-10 15:32 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren
In-Reply-To: <20181209200449.16342-2-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 9:04 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> The 'git worktree' command used to be just another mode in 'git
> checkout', namely 'git checkout --to'.  When the tests for the latter
> were retrofitted for the former, the test name was adjusted, but the
> test number was kept, even though the test is testing a different
> command now.  t/README states: "Second digit tells the particular
> command we are testing.", so 'git worktree' should have a separate
> number just for itself.
>
> Move the worktree tests to t24* to adhere to that guideline. We're
> going to make use of the free'd up numbers in a subsequent commit.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  t/{t2025-worktree-add.sh => t2400-worktree-add.sh}     | 0
>  t/{t2026-worktree-prune.sh => t2401-worktree-prune.sh} | 0
>  t/{t2027-worktree-list.sh => t2402-worktree-list.sh}   | 0
>  3 files changed, 0 insertions(+), 0 deletions(-)
>  rename t/{t2025-worktree-add.sh => t2400-worktree-add.sh} (100%)
>  rename t/{t2026-worktree-prune.sh => t2401-worktree-prune.sh} (100%)
>  rename t/{t2027-worktree-list.sh => t2402-worktree-list.sh} (100%)

Heh.. I did the same thing (in my unsent switch-branch/restore-files
series) and even used the same 24xx range :D You probably want to move
t2028 and t2029 too (not sure if they have landed on 'master')
-- 
Duy

^ permalink raw reply

* Re: [PATCH 2/8] entry: factor out unlink_entry function
From: Duy Nguyen @ 2018-12-10 15:49 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren
In-Reply-To: <20181209200449.16342-3-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> Factor out the 'unlink_entry()' function from unpack-trees.c to
> entry.c.  It will be used in other places as well in subsequent
> steps.
>
> As it's no longer a static function, also move the documentation to
> the header file to make it more discoverable.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  cache.h        |  5 +++++
>  entry.c        | 15 +++++++++++++++
>  unpack-trees.c | 19 -------------------
>  3 files changed, 20 insertions(+), 19 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index ca36b44ee0..c1c953e810 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1542,6 +1542,11 @@ struct checkout {
>  extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
>  extern void enable_delayed_checkout(struct checkout *state);
>  extern int finish_delayed_checkout(struct checkout *state);
> +/*
> + * Unlink the last component and schedule the leading directories for
> + * removal, such that empty directories get removed.
> + */
> +extern void unlink_entry(const struct cache_entry *ce);

I'm torn. We try to remove 'extern' but I can see you may want to add
it here to be consistent with others. And removing extern even from
functions from entry.c only would cause some conflicts.

I wonder if we should move the 'removal' variable in symlinks to
'struct checkout' to reduce another global variable. But I guess
that's the problem for another day. It's not the focus of this series.
-- 
Duy

^ permalink raw reply

* Re: [PATCH 3/8] entry: support CE_WT_REMOVE flag in checkout_entry
From: Duy Nguyen @ 2018-12-10 15:58 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren
In-Reply-To: <20181209200449.16342-4-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> 'checkout_entry()' currently only supports creating new entries in the
> working tree, but not deleting them.  Add the ability to remove
> entries at the same time if the entry is marked with the CE_WT_REMOVE
> flag.
>
> Currently this doesn't have any effect, as the CE_WT_REMOVE flag is
> only used in unpack-tree, however we will make use of this in a
> subsequent step in the series.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  entry.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/entry.c b/entry.c
> index 3ec148ceee..cd1c6601b6 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -441,6 +441,13 @@ int checkout_entry(struct cache_entry *ce,
>         static struct strbuf path = STRBUF_INIT;
>         struct stat st;
>
> +       if (ce->ce_flags & CE_WT_REMOVE) {
> +               if (topath)
> +                       BUG("Can't remove entry to a path");
> +               unlink_entry(ce);
> +               return 0;
> +       }

This makes the path counting in nd/checkout-noisy less accurate. But
it's not your fault of course.

Junio, do you still want to merge that series down to 'next' or drop
it? If it will be merged down, I'll keep a note and fix it once this
one lands too.

> +
>         if (topath)
>                 return write_entry(ce, topath, state, 1);
>
> --
> 2.20.0.405.gbc1bbc6f85
>


-- 
Duy

^ permalink raw reply

* Re: [PATCH 4/8] read-cache: add invalidate parameter to remove_marked_cache_entries
From: Duy Nguyen @ 2018-12-10 16:08 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren
In-Reply-To: <20181209200449.16342-5-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> When marking cache entries for removal, and later removing them all at
> once using 'remove_marked_cache_entries()', cache entries currently
> have to be invalidated manually in the cache tree and in the untracked
> cache.
>
> Add an invalidate flag to the function.  With the flag set, the
> function will take care of invalidating the path in the cache tree and
> in the untracked cache.
>
> This will be useful in a subsequent commit.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>
> For the two current callsites, unpack-trees seems to do this
> invalidation itself internally.

I'm still a bit scared of this invalidation business in unpack-trees.
The thing is, we handle two separate index_state there, src_index and
result and invalidation has to be done on the right one (because index
extensions are on src_index until the very end of unpack-trees;
invalidating on 'result' would be no-op and wrong).
remove_marked_cache_entries() seems to be called on 'result' while
invalidate_ce_path() is on src_index, hm....

> I don't quite understand why we don't
> need it in split-index mode though.  I assume it's because the cache
> tree in the main index would already have been invalidated?  I didn't
> have much time to dig, but couldn't produce any failures with it
> either, so I assume not invalidating paths is the right thing to do
> here.

Yeah I think it's because cache-tree and untracked cache are already
properly invalidated. This merge base thingy is done when we load the
index files up, not when we write them down. The "front" index may
record that a few paths in the base index are no longer valid and need
to be deleted. But untracked cache and cache-tree both should have
recorded that same info when these paths are marked for delete at
index write time.
-- 
Duy

^ permalink raw reply

* [PATCH v3 0/6] Add a new "sparse" tree walk algorithm
From: Derrick Stolee via GitGitGadget @ 2018-12-10 16:42 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano
In-Reply-To: <pull.89.v2.git.gitgitgadget@gmail.com>

One of the biggest remaining pain points for users of very large
repositories is the time it takes to run 'git push'. We inspected some slow
pushes by our developers and found that the "Enumerating Objects" phase of a
push was very slow. This is unsurprising, because this is why reachability
bitmaps exist. However, reachability bitmaps are not available to us because
of the single pack-file requirement. The bitmap approach is intended for
servers anyway, and clients have a much different behavior pattern.

Specifically, clients are normally pushing a very small number of objects
compared to the entire working directory. A typical user changes only a
small cone of the working directory, so let's use that to our benefit.

Create a new "sparse" mode for 'git pack-objects' that uses the paths that
introduce new objects to direct our search into the reachable trees. By
collecting trees at each path, we can then recurse into a path only when
there are uninteresting and interesting trees at that path. This gains a
significant performance boost for small topics while presenting a
possibility of packing extra objects.

The main algorithm change is in patch 4, but is set up a little bit in
patches 1 and 2.

As demonstrated in the included test script, we see that the existing
algorithm can send extra objects due to the way we specify the "frontier".
But we can send even more objects if a user copies objects from one folder
to another. I say "copy" because a rename would (usually) change the
original folder and trigger a walk into that path, discovering the objects.

In order to benefit from this approach, the user can opt-in using the
pack.useSparse config setting. This setting can be overridden using the
'--no-sparse' option.

Update in V2: 

 * Added GIT_TEST_PACK_SPARSE test option.
 * Fixed test breakages when GIT_TEST_PACK_SPARSE is enabled by adding null
   checks.

Update in V3:

 * Change documentation around 'pack.useSparse' config setting to better
   advertise to users.
 * Mostly a ping now that v2.20.0 is out.

Derrick Stolee (6):
  revision: add mark_tree_uninteresting_sparse
  list-objects: consume sparse tree walk
  pack-objects: add --sparse option
  revision: implement sparse algorithm
  pack-objects: create pack.useSparse setting
  pack-objects: create GIT_TEST_PACK_SPARSE

 Documentation/config/pack.txt      |   9 ++
 Documentation/git-pack-objects.txt |  11 ++-
 bisect.c                           |   2 +-
 builtin/pack-objects.c             |  10 ++-
 builtin/rev-list.c                 |   2 +-
 http-push.c                        |   2 +-
 list-objects.c                     |  55 +++++++++++-
 list-objects.h                     |   4 +-
 revision.c                         | 121 +++++++++++++++++++++++++
 revision.h                         |   2 +
 t/README                           |   4 +
 t/t5322-pack-objects-sparse.sh     | 139 +++++++++++++++++++++++++++++
 12 files changed, 351 insertions(+), 10 deletions(-)
 create mode 100755 t/t5322-pack-objects-sparse.sh


base-commit: a1598010f775d82b5adf12c29d0f5bc9b41434c6
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-89%2Fderrickstolee%2Fpush%2Fsparse-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-89/derrickstolee/push/sparse-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/89

Range-diff vs v2:

 1:  60617681f7 = 1:  60617681f7 revision: add mark_tree_uninteresting_sparse
 2:  4527addacb = 2:  4527addacb list-objects: consume sparse tree walk
 3:  9644f6ff04 ! 3:  4ef318bdb2 pack-objects: add --sparse option
     @@ -32,7 +32,9 @@
       
      +--sparse::
      +	Use the "sparse" algorithm to determine which objects to include in
     -+	the pack. This can have significant performance benefits when computing
     ++	the pack, when combined with the "--revs" option. This algorithm
     ++	only walks trees that appear in paths that introduce new objects.
     ++	This can have significant performance benefits when computing
      +	a pack to send a small change. However, it is possible that extra
      +	objects are added to the pack-file if the included commits contain
      +	certain types of direct renames.
 4:  c99957d06f = 4:  571b2e2784 revision: implement sparse algorithm
 5:  d6912188be ! 5:  33d2c04dd6 pack-objects: create pack.useSparse setting
     @@ -19,6 +19,26 @@
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     +diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
     +--- a/Documentation/config/pack.txt
     ++++ b/Documentation/config/pack.txt
     +@@
     + 	true. You should not generally need to turn this off unless
     + 	you are debugging pack bitmaps.
     + 
     ++pack.useSparse::
     ++	When true, git will default to using the '--sparse' option in
     ++	'git pack-objects' when the '--revs' option is present. This
     ++	algorithm only walks trees that appear in paths that introduce new
     ++	objects. This can have significant performance benefits when
     ++	computing a pack to send a small change. However, it is possible
     ++	that extra objects are added to the pack-file if the included
     ++	commits contain certain types of direct renames.
     ++
     + pack.writeBitmaps (deprecated)::
     + 	This is a deprecated synonym for `repack.writeBitmaps`.
     + 
     +
      diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
      --- a/builtin/pack-objects.c
      +++ b/builtin/pack-objects.c
 6:  3d394a9136 = 6:  e4f29543ee pack-objects: create GIT_TEST_PACK_SPARSE

-- 
gitgitgadget

^ permalink raw reply

* [PATCH v3 1/6] revision: add mark_tree_uninteresting_sparse
From: Derrick Stolee via GitGitGadget @ 2018-12-10 16:42 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v3.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

In preparation for a new algorithm that walks fewer trees when
creating a pack from a set of revisions, create a method that
takes an oidset of tree oids and marks reachable objects as
UNINTERESTING.

The current implementation uses the existing
mark_tree_uninteresting to recursively walk the trees and blobs.
This will walk the same number of trees as the old mechanism.

There is one new assumption in this approach: we are also given
the oids of the interesting trees. This implementation does not
use those trees at the moment, but we will use them in a later
rewrite of this method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 revision.c | 22 ++++++++++++++++++++++
 revision.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/revision.c b/revision.c
index 13e0519c02..3a62c7c187 100644
--- a/revision.c
+++ b/revision.c
@@ -99,6 +99,28 @@ void mark_tree_uninteresting(struct repository *r, struct tree *tree)
 	mark_tree_contents_uninteresting(r, tree);
 }
 
+void mark_trees_uninteresting_sparse(struct repository *r,
+				     struct oidset *set)
+{
+	struct object_id *oid;
+	struct oidset_iter iter;
+
+	oidset_iter_init(set, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct tree *tree = lookup_tree(r, oid);
+
+		if (tree->object.flags & UNINTERESTING) {
+			/*
+			 * Remove the flag so the next call
+			 * is not a no-op. The flag is added
+			 * in mark_tree_unintersting().
+			 */
+			tree->object.flags ^= UNINTERESTING;
+			mark_tree_uninteresting(r, tree);
+		}
+	}
+}
+
 struct commit_stack {
 	struct commit **items;
 	size_t nr, alloc;
diff --git a/revision.h b/revision.h
index 7987bfcd2e..f828e91ae9 100644
--- a/revision.h
+++ b/revision.h
@@ -67,6 +67,7 @@ struct rev_cmdline_info {
 #define REVISION_WALK_NO_WALK_SORTED 1
 #define REVISION_WALK_NO_WALK_UNSORTED 2
 
+struct oidset;
 struct topo_walk_info;
 
 struct rev_info {
@@ -327,6 +328,7 @@ void put_revision_mark(const struct rev_info *revs,
 
 void mark_parents_uninteresting(struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
+void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *set);
 
 void show_object_with_name(FILE *, struct object *, const char *);
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 3/6] pack-objects: add --sparse option
From: Derrick Stolee via GitGitGadget @ 2018-12-10 16:42 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v3.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Add a '--sparse' option flag to the pack-objects builtin. This
allows the user to specify that they want to use the new logic
for walking trees. This logic currently does not differ from the
existing output, but will in a later change.

Create a new test script, t5322-pack-objects-sparse.sh, to ensure
the object list that is selected matches what we expect. When we
update the logic to walk in a sparse fashion, the final test will
be updated to show the extra objects that are added.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-pack-objects.txt |  11 ++-
 builtin/pack-objects.c             |   5 +-
 t/t5322-pack-objects-sparse.sh     | 115 +++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+), 2 deletions(-)
 create mode 100755 t/t5322-pack-objects-sparse.sh

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 40c825c381..e45f3e680d 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
 	[--stdout [--filter=<filter-spec>] | base-name]
-	[--shallow] [--keep-true-parents] < object-list
+	[--shallow] [--keep-true-parents] [--sparse] < object-list
 
 
 DESCRIPTION
@@ -196,6 +196,15 @@ depth is 4095.
 	Add --no-reuse-object if you want to force a uniform compression
 	level on all data no matter the source.
 
+--sparse::
+	Use the "sparse" algorithm to determine which objects to include in
+	the pack, when combined with the "--revs" option. This algorithm
+	only walks trees that appear in paths that introduce new objects.
+	This can have significant performance benefits when computing
+	a pack to send a small change. However, it is possible that extra
+	objects are added to the pack-file if the included commits contain
+	certain types of direct renames.
+
 --thin::
 	Create a "thin" pack by omitting the common objects between a
 	sender and a receiver in order to reduce network transfer. This
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5f70d840a7..7d5b0735e3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -84,6 +84,7 @@ static unsigned long pack_size_limit;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
+static int sparse;
 static int thin;
 static int num_preferred_base;
 static struct progress *progress_state;
@@ -3135,7 +3136,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
-	mark_edges_uninteresting(&revs, show_edge, 0);
+	mark_edges_uninteresting(&revs, show_edge, sparse);
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
@@ -3292,6 +3293,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		{ OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
 		  N_("unpack unreachable objects newer than <time>"),
 		  PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
+		OPT_BOOL(0, "sparse", &sparse,
+			 N_("use the sparse reachability algorithm")),
 		OPT_BOOL(0, "thin", &thin,
 			 N_("create thin packs")),
 		OPT_BOOL(0, "shallow", &shallow,
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
new file mode 100755
index 0000000000..81f6805bc3
--- /dev/null
+++ b/t/t5322-pack-objects-sparse.sh
@@ -0,0 +1,115 @@
+#!/bin/sh
+
+test_description='pack-objects object selection using sparse algorithm'
+. ./test-lib.sh
+
+test_expect_success 'setup repo' '
+	test_commit initial &&
+	for i in $(test_seq 1 3)
+	do
+		mkdir f$i &&
+		for j in $(test_seq 1 3)
+		do
+			mkdir f$i/f$j &&
+			echo $j >f$i/f$j/data.txt
+		done
+	done &&
+	git add . &&
+	git commit -m "Initialized trees" &&
+	for i in $(test_seq 1 3)
+	do
+		git checkout -b topic$i master &&
+		echo change-$i >f$i/f$i/data.txt &&
+		git commit -a -m "Changed f$i/f$i/data.txt"
+	done &&
+	cat >packinput.txt <<-EOF &&
+	topic1
+	^topic2
+	^topic3
+	EOF
+	git rev-parse			\
+		topic1			\
+		topic1^{tree}		\
+		topic1:f1		\
+		topic1:f1/f1		\
+		topic1:f1/f1/data.txt | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+# Demonstrate that both algorithms send "extra" objects because
+# they are not in the frontier.
+
+test_expect_success 'duplicate a folder from f3 and commit to topic1' '
+	git checkout topic1 &&
+	echo change-3 >f3/f3/data.txt &&
+	git commit -a -m "Changed f3/f3/data.txt" &&
+	git rev-parse			\
+		topic1~1		\
+		topic1~1^{tree}		\
+		topic1^{tree}		\
+		topic1			\
+		topic1:f1		\
+		topic1:f1/f1		\
+		topic1:f1/f1/data.txt	\
+		topic1:f3		\
+		topic1:f3/f3		\
+		topic1:f3/f3/data.txt | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+test_expect_success 'duplicate a folder from f1 into f3' '
+	mkdir f3/f4 &&
+	cp -r f1/f1/* f3/f4 &&
+	git add f3/f4 &&
+	git commit -m "Copied f1/f1 to f3/f4" &&
+	cat >packinput.txt <<-EOF &&
+	topic1
+	^topic1~1
+	EOF
+	git rev-parse		\
+		topic1		\
+		topic1^{tree}	\
+		topic1:f3 | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 2/6] list-objects: consume sparse tree walk
From: Derrick Stolee via GitGitGadget @ 2018-12-10 16:42 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v3.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

When creating a pack-file using 'git pack-objects --revs' we provide
a list of interesting and uninteresting commits. For example, a push
operation would make the local topic branch be interesting and the
known remote refs as uninteresting. We want to discover the set of
new objects to send to the server as a thin pack.

We walk these commits until we discover a frontier of commits such
that every commit walk starting at interesting commits ends in a root
commit or unintersting commit. We then need to discover which
non-commit objects are reachable from  uninteresting commits.

The mark_edges_unintersting() method in list-objects.c iterates on
the commit list and does the following:

* If the commit is UNINTERSTING, then mark its root tree and every
  object it can reach as UNINTERESTING.

* If the commit is interesting, then mark the root tree of every
  UNINTERSTING parent (and all objects that tree can reach) as
  UNINTERSTING.

At the very end, we repeat the process on every commit directly
given to the revision walk from stdin. This helps ensure we properly
cover shallow commits that otherwise were not included in the
frontier.

The logic to recursively follow trees is in the
mark_tree_uninteresting() method in revision.c. The algorithm avoids
duplicate work by not recursing into trees that are already marked
UNINTERSTING.

Add a new 'sparse' option to the mark_edges_uninteresting() method
that performs this logic in a slightly new way. As we iterate over
the commits, we add all of the root trees to an oidset. Then, call
mark_trees_uninteresting_sparse() on that oidset. Note that we
include interesting trees in this process. The current implementation
of mark_trees_unintersting_sparse() will walk the same trees as
the old logic, but this will be replaced in a later change.

The sparse option is not used by any callers at the moment, but
will be wired to 'git pack-objects' in the next change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 bisect.c               |  2 +-
 builtin/pack-objects.c |  2 +-
 builtin/rev-list.c     |  2 +-
 http-push.c            |  2 +-
 list-objects.c         | 55 +++++++++++++++++++++++++++++++++++++++---
 list-objects.h         |  4 ++-
 revision.c             |  3 +++
 7 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/bisect.c b/bisect.c
index 487675c672..842f8b4b8f 100644
--- a/bisect.c
+++ b/bisect.c
@@ -656,7 +656,7 @@ static void bisect_common(struct rev_info *revs)
 	if (prepare_revision_walk(revs))
 		die("revision walk setup failed");
 	if (revs->tree_objects)
-		mark_edges_uninteresting(revs, NULL);
+		mark_edges_uninteresting(revs, NULL, 0);
 }
 
 static void exit_if_skipped_commits(struct commit_list *tried,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 411aefd687..5f70d840a7 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3135,7 +3135,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
-	mark_edges_uninteresting(&revs, show_edge);
+	mark_edges_uninteresting(&revs, show_edge, 0);
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 2880ed37e3..9663cbfae0 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -543,7 +543,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	if (revs.tree_objects)
-		mark_edges_uninteresting(&revs, show_edge);
+		mark_edges_uninteresting(&revs, show_edge, 0);
 
 	if (bisect_list) {
 		int reaches, all;
diff --git a/http-push.c b/http-push.c
index cd48590912..ea52d6f9f6 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1933,7 +1933,7 @@ int cmd_main(int argc, const char **argv)
 		pushing = 0;
 		if (prepare_revision_walk(&revs))
 			die("revision walk setup failed");
-		mark_edges_uninteresting(&revs, NULL);
+		mark_edges_uninteresting(&revs, NULL, 0);
 		objects_to_send = get_delta(&revs, ref_lock);
 		finish_all_active_slots();
 
diff --git a/list-objects.c b/list-objects.c
index c41cc80db5..4fbdeca0a4 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -222,25 +222,72 @@ static void mark_edge_parents_uninteresting(struct commit *commit,
 	}
 }
 
-void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
+static void add_edge_parents(struct commit *commit,
+			     struct rev_info *revs,
+			     show_edge_fn show_edge,
+			     struct oidset *set)
+{
+	struct commit_list *parents;
+
+	for (parents = commit->parents; parents; parents = parents->next) {
+		struct commit *parent = parents->item;
+		struct tree *tree = get_commit_tree(parent);
+
+		if (!tree)
+			continue;
+
+		oidset_insert(set, &tree->object.oid);
+
+		if (!(parent->object.flags & UNINTERESTING))
+			continue;
+		tree->object.flags |= UNINTERESTING;
+
+		if (revs->edge_hint && !(parent->object.flags & SHOWN)) {
+			parent->object.flags |= SHOWN;
+			show_edge(parent);
+		}
+	}
+}
+
+void mark_edges_uninteresting(struct rev_info *revs,
+			      show_edge_fn show_edge,
+			      int sparse)
 {
 	struct commit_list *list;
+	struct oidset set;
 	int i;
 
+	if (sparse)
+		oidset_init(&set, 16);
+
 	for (list = revs->commits; list; list = list->next) {
 		struct commit *commit = list->item;
 
-		if (commit->object.flags & UNINTERESTING) {
+		if (sparse) {
+			struct tree *tree = get_commit_tree(commit);
+
+			if (commit->object.flags & UNINTERESTING)
+				tree->object.flags |= UNINTERESTING;
+
+			oidset_insert(&set, &tree->object.oid);
+			add_edge_parents(commit, revs, show_edge, &set);
+		} else if (commit->object.flags & UNINTERESTING) {
 			mark_tree_uninteresting(revs->repo,
 						get_commit_tree(commit));
 			if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
 				commit->object.flags |= SHOWN;
 				show_edge(commit);
 			}
-			continue;
+		} else {
+			mark_edge_parents_uninteresting(commit, revs, show_edge);
 		}
-		mark_edge_parents_uninteresting(commit, revs, show_edge);
 	}
+
+	if (sparse) {
+		mark_trees_uninteresting_sparse(revs->repo, &set);
+		oidset_clear(&set);
+	}
+
 	if (revs->edge_hint_aggressive) {
 		for (i = 0; i < revs->cmdline.nr; i++) {
 			struct object *obj = revs->cmdline.rev[i].item;
diff --git a/list-objects.h b/list-objects.h
index ad40762926..a952680e46 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -10,7 +10,9 @@ typedef void (*show_object_fn)(struct object *, const char *, void *);
 void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
-void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
+void mark_edges_uninteresting(struct rev_info *revs,
+			      show_edge_fn show_edge,
+			      int sparse);
 
 struct oidset;
 struct list_objects_filter_options;
diff --git a/revision.c b/revision.c
index 3a62c7c187..f9eb6400f1 100644
--- a/revision.c
+++ b/revision.c
@@ -109,6 +109,9 @@ void mark_trees_uninteresting_sparse(struct repository *r,
 	while ((oid = oidset_iter_next(&iter))) {
 		struct tree *tree = lookup_tree(r, oid);
 
+		if (!tree)
+			continue;
+
 		if (tree->object.flags & UNINTERESTING) {
 			/*
 			 * Remove the flag so the next call
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 4/6] revision: implement sparse algorithm
From: Derrick Stolee via GitGitGadget @ 2018-12-10 16:42 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v3.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

When enumerating objects to place in a pack-file during 'git
pack-objects --revs', we discover the "frontier" of commits
that we care about and the boundary with commit we find
uninteresting. From that point, we walk trees to discover which
trees and blobs are uninteresting. Finally, we walk trees to find
the interesting trees.

This commit introduces a new, "sparse" way to discover the
uninteresting trees. We use the perspective of a single user trying
to push their topic to a large repository. That user likely changed
a very small fraction of the paths in their working directory, but
we spend a lot of time walking all reachable trees.

The way to switch the logic to work in this sparse way is to start
caring about which paths introduce new trees. While it is not
possible to generate a diff between the frontier boundary and all
of the interesting commits, we can simulate that behavior by
inspecting all of the root trees as a whole, then recursing down
to the set of trees at each path.

We already had taken the first step by passing an oidset to
mark_trees_uninteresting_sparse(). We now create a dictionary
whose keys are paths and values are oidsets. We consider the set
of trees that appear at each path. While we inspect a tree, we
add its subtrees to the oidsets corresponding to the tree entry's
path. We also mark trees as UNINTERESTING if the tree we are
parsing is UNINTERESTING.

To actually improve the peformance, we need to terminate our
recursion unless the oidset contains some intersting trees and
some uninteresting trees. Technically, we only need one interesting
tree for this to speed up in most cases, but we also will not mark
anything UNINTERESTING if there are no uninteresting trees, so
that would be wasted effort.

There are a few ways that this is not a universally better option.

First, we can pack extra objects. If someone copies a subtree
from one tree to another, the first tree will appear UNINTERESTING
and we will not recurse to see that the subtree should also be
UNINTERESTING. We will walk the new tree and see the subtree as
a "new" object and add it to the pack. We add a test case that
demonstrates this as a way to prove that the --sparse option is
actually working.

Second, we can have extra memory pressure. If instead of being a
single user pushing a small topic we are a server sending new
objects from across the entire working directory, then we will
gain very little (the recursion will rarely terminate early) but
will spend extra time maintaining the path-oidset dictionaries.

Despite these potential drawbacks, the benefits of the algorithm
are clear. By adding a counter to 'add_children_by_path' and
'mark_tree_contents_uninteresting', I measured the number of
parsed trees for the two algorithms in a variety of repos.

For git.git, I used the following input:

	v2.19.0
	^v2.19.0~10

 Objects to pack: 550
Walked (old alg): 282
Walked (new alg): 130

For the Linux repo, I used the following input:

	v4.18
	^v4.18~10

 Objects to pack:   518
Walked (old alg): 4,836
Walked (new alg):   188

The two repos above are rather "wide and flat" compared to
other repos that I have used in the past. As a comparison,
I tested an old topic branch in the Azure DevOps repo, which
has a much deeper folder structure than the Linux repo.

 Objects to pack:    220
Walked (old alg): 22,804
Walked (new alg):    129

I used the number of walked trees the main metric above because
it is consistent across multiple runs. When I ran my tests, the
performance of the pack-objects command with the same options
could change the end-to-end time by 10x depending on the file
system being warm. However, by repeating the same test on repeat
I could get more consistent timing results. The git.git and
Linux tests were too fast overall (less than 0.5s) to measure
an end-to-end difference. The Azure DevOps case was slow enough
to see the time improve from 15s to 1s in the warm case. The
cold case was 90s to 9s in my testing.

These improvements will have even larger benefits in the super-
large Windows repository. In our experiments, we see the
"Enumerate objects" phase of pack-objects taking 60-80% of the
end-to-end time of non-trivial pushes, taking longer than the
network time to send the pack and the server time to verify the
pack.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 revision.c                     | 116 ++++++++++++++++++++++++++++++---
 t/t5322-pack-objects-sparse.sh |  21 ++++--
 2 files changed, 121 insertions(+), 16 deletions(-)

diff --git a/revision.c b/revision.c
index f9eb6400f1..971f1bb095 100644
--- a/revision.c
+++ b/revision.c
@@ -99,29 +99,125 @@ void mark_tree_uninteresting(struct repository *r, struct tree *tree)
 	mark_tree_contents_uninteresting(r, tree);
 }
 
+struct paths_and_oids {
+	struct string_list list;
+};
+
+static void paths_and_oids_init(struct paths_and_oids *po)
+{
+	string_list_init(&po->list, 1);
+}
+
+static void paths_and_oids_clear(struct paths_and_oids *po)
+{
+	int i;
+	for (i = 0; i < po->list.nr; i++) {
+		oidset_clear(po->list.items[i].util);
+		free(po->list.items[i].util);
+	}
+
+	string_list_clear(&po->list, 0);
+}
+
+static void paths_and_oids_insert(struct paths_and_oids *po,
+				  const char *path,
+				  const struct object_id *oid)
+{
+	struct string_list_item *item = string_list_insert(&po->list, path);
+	struct oidset *set;
+
+	if (!item->util) {
+		set = xcalloc(1, sizeof(struct oidset));
+		oidset_init(set, 16);
+		item->util = set;
+	} else {
+		set = item->util;
+	}
+
+	oidset_insert(set, oid);
+}
+
+static void add_children_by_path(struct repository *r,
+				 struct tree *tree,
+				 struct paths_and_oids *po)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+
+	if (!tree)
+		return;
+
+	if (parse_tree_gently(tree, 1) < 0)
+		return;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+	while (tree_entry(&desc, &entry)) {
+		switch (object_type(entry.mode)) {
+		case OBJ_TREE:
+			paths_and_oids_insert(po, entry.path, entry.oid);
+
+			if (tree->object.flags & UNINTERESTING) {
+				struct tree *child = lookup_tree(r, entry.oid);
+				if (child)
+					child->object.flags |= UNINTERESTING;
+			}
+			break;
+		case OBJ_BLOB:
+			if (tree->object.flags & UNINTERESTING) {
+				struct blob *child = lookup_blob(r, entry.oid);
+				if (child)
+					child->object.flags |= UNINTERESTING;
+			}
+			break;
+		default:
+			/* Subproject commit - not in this repository */
+			break;
+		}
+	}
+
+	free_tree_buffer(tree);
+}
+
 void mark_trees_uninteresting_sparse(struct repository *r,
 				     struct oidset *set)
 {
+	int i;
+	unsigned has_interesting = 0, has_uninteresting = 0;
+	struct paths_and_oids po;
 	struct object_id *oid;
 	struct oidset_iter iter;
 
 	oidset_iter_init(set, &iter);
-	while ((oid = oidset_iter_next(&iter))) {
+	while ((!has_interesting || !has_uninteresting) &&
+	       (oid = oidset_iter_next(&iter))) {
 		struct tree *tree = lookup_tree(r, oid);
 
 		if (!tree)
 			continue;
 
-		if (tree->object.flags & UNINTERESTING) {
-			/*
-			 * Remove the flag so the next call
-			 * is not a no-op. The flag is added
-			 * in mark_tree_unintersting().
-			 */
-			tree->object.flags ^= UNINTERESTING;
-			mark_tree_uninteresting(r, tree);
-		}
+		if (tree->object.flags & UNINTERESTING)
+			has_uninteresting = 1;
+		else
+			has_interesting = 1;
 	}
+
+	/* Do not walk unless we have both types of trees. */
+	if (!has_uninteresting || !has_interesting)
+		return;
+
+	paths_and_oids_init(&po);
+
+	oidset_iter_init(set, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct tree *tree = lookup_tree(r, oid);
+		add_children_by_path(r, tree, &po);
+	}
+
+	for (i = 0; i < po.list.nr; i++)
+		mark_trees_uninteresting_sparse(
+			r, (struct oidset *)po.list.items[i].util);
+
+	paths_and_oids_clear(&po);
 }
 
 struct commit_stack {
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 81f6805bc3..45dba6e014 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -83,22 +83,25 @@ test_expect_success 'sparse pack-objects' '
 	test_cmp expect_objects.txt sparse_objects.txt
 '
 
+# Demonstrate that the algorithms differ when we copy a tree wholesale
+# from one folder to another.
+
 test_expect_success 'duplicate a folder from f1 into f3' '
 	mkdir f3/f4 &&
 	cp -r f1/f1/* f3/f4 &&
 	git add f3/f4 &&
 	git commit -m "Copied f1/f1 to f3/f4" &&
-	cat >packinput.txt <<-EOF &&
+	cat >packinput.txt <<-EOF
 	topic1
 	^topic1~1
 	EOF
-	git rev-parse		\
-		topic1		\
-		topic1^{tree}	\
-		topic1:f3 | sort >expect_objects.txt
 '
 
 test_expect_success 'non-sparse pack-objects' '
+	git rev-parse			\
+		topic1			\
+		topic1^{tree}		\
+		topic1:f3 | sort >expect_objects.txt &&
 	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
 	git index-pack -o nonsparse.idx nonsparse.pack &&
 	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
@@ -106,10 +109,16 @@ test_expect_success 'non-sparse pack-objects' '
 '
 
 test_expect_success 'sparse pack-objects' '
+	git rev-parse			\
+		topic1			\
+		topic1^{tree}		\
+		topic1:f3		\
+		topic1:f3/f4		\
+		topic1:f3/f4/data.txt | sort >expect_sparse_objects.txt &&
 	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
 	git index-pack -o sparse.idx sparse.pack &&
 	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
-	test_cmp expect_objects.txt sparse_objects.txt
+	test_cmp expect_sparse_objects.txt sparse_objects.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 6/6] pack-objects: create GIT_TEST_PACK_SPARSE
From: Derrick Stolee via GitGitGadget @ 2018-12-10 16:42 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v3.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Create a test variable GIT_TEST_PACK_SPARSE to enable the sparse
object walk algorithm by default during the test suite. Enabling
this variable ensures coverage in many interesting cases, such as
shallow clones, partial clones, and missing objects.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/pack-objects.c         | 1 +
 t/README                       | 4 ++++
 t/t5322-pack-objects-sparse.sh | 6 +++---
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 124b1bafc4..507d381153 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3331,6 +3331,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
+	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", 0);
 	reset_pack_idx_option(&pack_idx_opts);
 	git_config(git_pack_config, NULL);
 
diff --git a/t/README b/t/README
index 28711cc508..8b6dfe1864 100644
--- a/t/README
+++ b/t/README
@@ -342,6 +342,10 @@ GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
 for the index version specified.  Can be set to any valid version
 (currently 2, 3, or 4).
 
+GIT_TEST_PACK_SPARSE=<boolean> if enabled will default the pack-objects
+builtin to use the sparse object walk. This can still be overridden by
+the --no-sparse command-line argument.
+
 GIT_TEST_PRELOAD_INDEX=<boolean> exercises the preload-index code path
 by overriding the minimum number of cache entries required per thread.
 
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 8f5699bd91..e8cf41d1c6 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -36,7 +36,7 @@ test_expect_success 'setup repo' '
 '
 
 test_expect_success 'non-sparse pack-objects' '
-	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git pack-objects --stdout --revs --no-sparse <packinput.txt >nonsparse.pack &&
 	git index-pack -o nonsparse.idx nonsparse.pack &&
 	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
 	test_cmp expect_objects.txt nonsparse_objects.txt
@@ -70,7 +70,7 @@ test_expect_success 'duplicate a folder from f3 and commit to topic1' '
 '
 
 test_expect_success 'non-sparse pack-objects' '
-	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git pack-objects --stdout --revs --no-sparse <packinput.txt >nonsparse.pack &&
 	git index-pack -o nonsparse.idx nonsparse.pack &&
 	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
 	test_cmp expect_objects.txt nonsparse_objects.txt
@@ -102,7 +102,7 @@ test_expect_success 'non-sparse pack-objects' '
 		topic1			\
 		topic1^{tree}		\
 		topic1:f3 | sort >expect_objects.txt &&
-	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git pack-objects --stdout --revs --no-sparse <packinput.txt >nonsparse.pack &&
 	git index-pack -o nonsparse.idx nonsparse.pack &&
 	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
 	test_cmp expect_objects.txt nonsparse_objects.txt
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v3 5/6] pack-objects: create pack.useSparse setting
From: Derrick Stolee via GitGitGadget @ 2018-12-10 16:42 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v3.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The '--sparse' flag in 'git pack-objects' changes the algorithm
used to enumerate objects to one that is faster for individual
users pushing new objects that change only a small cone of the
working directory. The sparse algorithm is not recommended for a
server, which likely sends new objects that appear across the
entire working directory.

Create a 'pack.useSparse' setting that enables this new algorithm.
This allows 'git push' to use this algorithm without passing a
'--sparse' flag all the way through four levels of run_command()
calls.

If the '--no-sparse' flag is set, then this config setting is
overridden.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/pack.txt  |  9 +++++++++
 builtin/pack-objects.c         |  4 ++++
 t/t5322-pack-objects-sparse.sh | 15 +++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index edac75c83f..425c73aa52 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -105,6 +105,15 @@ pack.useBitmaps::
 	true. You should not generally need to turn this off unless
 	you are debugging pack bitmaps.
 
+pack.useSparse::
+	When true, git will default to using the '--sparse' option in
+	'git pack-objects' when the '--revs' option is present. This
+	algorithm only walks trees that appear in paths that introduce new
+	objects. This can have significant performance benefits when
+	computing a pack to send a small change. However, it is possible
+	that extra objects are added to the pack-file if the included
+	commits contain certain types of direct renames.
+
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7d5b0735e3..124b1bafc4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2711,6 +2711,10 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "pack.usesparse")) {
+		sparse = git_config_bool(k, v);
+		return 0;
+	}
 	if (!strcmp(k, "pack.threads")) {
 		delta_search_threads = git_config_int(k, v);
 		if (delta_search_threads < 0)
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 45dba6e014..8f5699bd91 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -121,4 +121,19 @@ test_expect_success 'sparse pack-objects' '
 	test_cmp expect_sparse_objects.txt sparse_objects.txt
 '
 
+test_expect_success 'pack.useSparse enables algorithm' '
+	git config pack.useSparse true &&
+	git pack-objects --stdout --revs <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_sparse_objects.txt sparse_objects.txt
+'
+
+test_expect_success 'pack.useSparse overridden' '
+	git pack-objects --stdout --revs --no-sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* Re: [PATCH 5/8] checkout: introduce --{,no-}overlay option
From: Duy Nguyen @ 2018-12-10 16:42 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren
In-Reply-To: <20181209200449.16342-6-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> @@ -302,15 +310,29 @@ static int checkout_paths(const struct checkout_opts *opts,
>                 ce->ce_flags &= ~CE_MATCHED;
>                 if (!opts->ignore_skipworktree && ce_skip_worktree(ce))
>                         continue;
> -               if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
> -                       /*
> -                        * "git checkout tree-ish -- path", but this entry
> -                        * is in the original index; it will not be checked
> -                        * out to the working tree and it does not matter
> -                        * if pathspec matched this entry.  We will not do
> -                        * anything to this entry at all.
> -                        */
> -                       continue;
> +               if (opts->source_tree && !(ce->ce_flags & CE_UPDATE)) {
> +                       if (!opts->overlay_mode &&
> +                           ce_path_match(&the_index, ce, &opts->pathspec, ps_matched)) {
> +                               /*
> +                                * "git checkout --no-overlay <tree-ish> -- path",
> +                                * and the path is not in tree-ish, but is in
> +                                * the current index, which means that it should
> +                                * be removed.
> +                                */
> +                               ce->ce_flags |= CE_MATCHED | CE_REMOVE | CE_WT_REMOVE;
> +                               continue;
> +                       } else {

In non-overlay mode but when pathspec does not match, we come here too.

> +                               /*
> +                                * "git checkout tree-ish -- path", but this
> +                                * entry is in the original index; it will not

I think the missing key point in this comment block is "..is in the
original index _and it's not in tree-ish_". In non-overlay mode, if
pathspec does not match then it's safe to ignore too. But this logic
starts too get to complex and hurt my brain.

> +                                * be checked out to the working tree and it
> +                                * does not matter if pathspec matched this
> +                                * entry.  We will not do anything to this entry
> +                                * at all.
> +                                */
> +                               continue;
> +                       }
> +               }
>                 /*
>                  * Either this entry came from the tree-ish we are
>                  * checking the paths out of, or we are checking out

> @@ -1266,6 +1299,7 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
>                             "checkout", "control recursive updating of submodules",
>                             PARSE_OPT_OPTARG, option_parse_recurse_submodules_worktree_updater },
>                 OPT_BOOL(0, "progress", &opts.show_progress, N_("force progress reporting")),
> +               OPT_BOOL(0, "overlay", &opts.overlay_mode, N_("use overlay mode")),

maybe add " (default)" to the help string.

>                 OPT_END(),
>         };
>
-- 
Duy

^ permalink raw reply

* Re: [PATCH 6/8] checkout: add --cached option
From: Duy Nguyen @ 2018-12-10 16:49 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren
In-Reply-To: <20181209200449.16342-7-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> Add a new --cached option to git checkout, which works only on the
> index, but not the working tree, similar to what 'git reset <tree-ish>
> -- <pathspec>... does.

Elijah wanted another mode (and I agree) that modifies worktree but
leaves the index alone. This is most useful (or least confusing) when
used with <tree-ish> and would be default in restore-files. I'm not
saying you have to implement it, but how do the new command line
options are designed to make sense?

I guess if --cached is "update index only" then --no-cached goes back
to the default "update both worktree and index" and we need a another
option for "worktree only"? Can we have one option with three possible
values (index-only, index-and-worktree, worktree-only) maybe?
-- 
Duy

^ permalink raw reply

* Re: [PATCH 7/8] checkout: allow ignoring unmatched pathspec
From: Duy Nguyen @ 2018-12-10 16:51 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren
In-Reply-To: <20181209200449.16342-8-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> Currently when 'git checkout -- <pathspec>...' is invoked with
> multiple pathspecs, where one or more of the pathspecs don't match
> anything, checkout errors out.
>
> This can be inconvenient in some cases, such as when using git
> checkout from a script.

Wait, should scripts go with read-tree, checkout-index or other
plumbing commands instead?
-- 
Duy

^ permalink raw reply

* Re: [PATCH 0/8] introduce no-overlay and cached mode in git checkout
From: Duy Nguyen @ 2018-12-10 17:06 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren
In-Reply-To: <20181209200449.16342-1-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 9:04 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> My hope is also that the no-overlay mode could become the new default
> in the restore-files command Duy is currently working on.

I already wrote something like that in git-restore-files.txt even
though the implementation is not there :D

There will be a hell of conflicts when the two series enter 'pu' (or
even worse, when the third one to update worktree only appears) so I'm
going to send the switch-branch/restore-files series out but mostly to
gather comments and will rebase once the other series land.

(Alternatively I'll split my series and let the switch-branch part
land first, may be simpler)
-- 
Duy

^ permalink raw reply

* Re: [PATCH 0/8] introduce no-overlay and cached mode in git checkout
From: Elijah Newren @ 2018-12-10 17:18 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Junio C Hamano,
	Nguyễn Thái Ngọc
In-Reply-To: <20181209200449.16342-1-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 12:04 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> Here's the series I mentioned a couple of times on the list already,
> introducing a no-overlay mode in 'git checkout'.  The inspiration for
> this came from Junios message in [*1*].
>
> Basically the idea is to also delete files when the match <pathspec>
> in 'git checkout <tree-ish> -- <pathspec>' in the current tree, but
> don't match <pathspec> in <tree-ish>.  The rest of the cases are
> already properly taken care of by 'git checkout'.

Yes, but I'd put it a little differently:

"""
Basically, the idea is when the user run "git checkout --no-overlay
<tree-ish> -- <pathspec>" that the given pathspecs should exactly
match <tree-ish> after the operation completes.  This means that we
also want to delete files that match <pathspec> if those paths are not
found in <tree-ish>.
"""

...and maybe even toss in some comments about the fact that this is
the way git checkout should have always behaved, it just traditionally
hasn't.  (You could also work in comments about how with this new mode
the user can run git diff afterward with the given commit-ish and
pathspecs and get back an empty diff, as expected, which wasn't true
before.  But maybe I'm belaboring the point.)


> The final step in the series is to actually make use of this in 'git
> stash', which simplifies the code there a bit.  I am however happy to
> hold off on this step until the stash-in-C series is merged, so we
> don't delay that further.
>
> In addition to the no-overlay mode, we also add a --cached mode, which
> works only on the index, thus similar to 'git reset <tree-ish> -- <pathspec>'.

If you're adding a --cached mode to make it only work on the index,
should there be a similar mode to allow it to only work on the working
tree?  (I'm not as concerned with that here, but I really think the
new restore-files command by default should only operate on the
working tree, and then have options to affect the index either in
addition or instead of the working tree.)

> Actually deprecating 'git reset <tree-ish> -- <pathspec>' should come
> later, probably not before Duy's restore-files command lands, as 'git
> checkout --no-overlay <tree-ish> -- <pathspec>' is a bit cumbersome to
> type compared to 'git reset <tree-ish> -- <pathspec>'.

Makes sense.

> My hope is also that the no-overlay mode could become the new default
> in the restore-files command Duy is currently working on.

Absolutely, yes.  I don't want another broken command.  :-)


> No documentation yet, as I wanted to get this out for review first.
> I'm not familiar with most of the code I touched here, so there may
> well be much better ways to implement some of this, that I wasn't able
> to figure out.  I'd be very happy with some feedback around that.
>
> Another thing I'm not sure about is how to deal with conflicts.  In
> the cached mode this patch series is not dealing with it at all, as
> 'git checkout -- <pathspec>' when pathspec matches a file with
> conflicts doesn't update the index.  For the no-overlay mode, the file
> is removed if the corresponding stage is not found in the index.  I'm
> however not sure this is the right thing to do in all cases?

Here's how I'd go about analyzing that...

If the user passes a <tree-ish>, then the answer about what to do is
pretty obvious; the <tree-ish> didn't have conflicts, so conflicted
paths in the index that match the pathspec should be overwritten with
whatever version of those paths existed in <tree-ish> (possibly
implying deletion of some paths).

Also, as you point out, --cached means only modify the index and not
the working tree; so if they specify both --cached and provide no
tree, then they've specified a no-op.

So it's only interesting when you have conflicts in the index and
specify --no-overlay without a <tree-ish> or --cached.  This boils
down to "how do we update the working tree to match the index, when
the index is conflicted?"  A couple points to consider:
  * This is somewhat of an edge case
  * In the normal case --no-overlay is only different from --overlay
behavior for directories; it'd be nice if that extended to all cases
  * How does this command behave without a <tree-ish> when
--no-overlay is specified and a directory is given for a <pathspec>
and there aren't any conflicts?  Are we being consistent with that
behavior?


However, I think it turns out that the answer is much simpler than all
that initial analysis or what you say you've implemented.  Here's why:

If <pathspec> is a file which is present in both the working tree and
the index and it has conflicts, then "git checkout -- <pathspec>" will
currently throw an error:

$ git checkout -- subdir/counting
error: path 'subdir/counting' is unmerged

In fact, even if every entry in subdir/ is a path that is in both the
index and the working tree (so that --no-overlay and --overlay ought
to behave the same), if any one of the files in subdir is conflicted,
attempting to checkout the subdir will abort with this same error
message and no paths will be updated at all:

$ git checkout -- subdir
error: path 'subdir/counting' is unmerged

as such, the answer with what to do with --no-overlay mode is pretty
clear: if the <pathspec> matches _any_ path that is conflicted, simply
throw an error and abort the operation without making any changes at
all.

^ permalink raw reply

* Re: [PATCH 2/8] entry: factor out unlink_entry function
From: Elijah Newren @ 2018-12-10 17:23 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Thomas Gummerer, Git Mailing List, Junio C Hamano
In-Reply-To: <CACsJy8AGerhjnT0O6vT264tND78N5cbgFREzYtdmriXERu0Jtw@mail.gmail.com>

On Mon, Dec 10, 2018 at 7:50 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> >
> > Factor out the 'unlink_entry()' function from unpack-trees.c to
> > entry.c.  It will be used in other places as well in subsequent
> > steps.
> >
> > As it's no longer a static function, also move the documentation to
> > the header file to make it more discoverable.

I also started using unlink_entry() in another place in a local patch
series that I haven't submitted yet (and which I need to get back to
at some point).  So this will help me too.  :-)

> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  cache.h        |  5 +++++
> >  entry.c        | 15 +++++++++++++++
> >  unpack-trees.c | 19 -------------------
> >  3 files changed, 20 insertions(+), 19 deletions(-)
> >
> > diff --git a/cache.h b/cache.h
> > index ca36b44ee0..c1c953e810 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1542,6 +1542,11 @@ struct checkout {
> >  extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
> >  extern void enable_delayed_checkout(struct checkout *state);
> >  extern int finish_delayed_checkout(struct checkout *state);
> > +/*
> > + * Unlink the last component and schedule the leading directories for
> > + * removal, such that empty directories get removed.
> > + */
> > +extern void unlink_entry(const struct cache_entry *ce);
>
> I'm torn. We try to remove 'extern' but I can see you may want to add
> it here to be consistent with others. And removing extern even from
> functions from entry.c only would cause some conflicts.
>
> I wonder if we should move the 'removal' variable in symlinks to
> 'struct checkout' to reduce another global variable. But I guess
> that's the problem for another day. It's not the focus of this series.

"move the 'removal' variable in symlinks"?  I'm having a really hard
time parsing that phrase and the sentence it's embedded in.  Could you
reword for me Duy?

^ permalink raw reply

* Re: [PATCH 2/8] entry: factor out unlink_entry function
From: Duy Nguyen @ 2018-12-10 17:27 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Thomas Gummerer, Git Mailing List, Junio C Hamano
In-Reply-To: <CABPp-BE15jfe76q3hqSxnifv8MNB1e6_GDT=5=ZmTE__XuTmLw@mail.gmail.com>

On Mon, Dec 10, 2018 at 6:23 PM Elijah Newren <newren@gmail.com> wrote:
> > I wonder if we should move the 'removal' variable in symlinks to
> > 'struct checkout' to reduce another global variable. But I guess
> > that's the problem for another day. It's not the focus of this series.
>
> "move the 'removal' variable in symlinks"?  I'm having a really hard
> time parsing that phrase and the sentence it's embedded in.  Could you
> reword for me Duy?

Sorry s/in symlinks/&.c/. There's a global variable named 'removal' in
symlinks.c which is used by schedule_dir_for_removal() and this
function in turn is used by unlink_entry().
-- 
Duy

^ permalink raw reply

* Re: [PATCH 3/8] entry: support CE_WT_REMOVE flag in checkout_entry
From: Elijah Newren @ 2018-12-10 17:49 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Junio C Hamano,
	Nguyễn Thái Ngọc
In-Reply-To: <20181209200449.16342-4-t.gummerer@gmail.com>

On Sun, Dec 9, 2018 at 12:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> 'checkout_entry()' currently only supports creating new entries in the
> working tree, but not deleting them.  Add the ability to remove
> entries at the same time if the entry is marked with the CE_WT_REMOVE
> flag.
>
> Currently this doesn't have any effect, as the CE_WT_REMOVE flag is
> only used in unpack-tree, however we will make use of this in a
> subsequent step in the series.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  entry.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/entry.c b/entry.c
> index 3ec148ceee..cd1c6601b6 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -441,6 +441,13 @@ int checkout_entry(struct cache_entry *ce,
>         static struct strbuf path = STRBUF_INIT;
>         struct stat st;
>
> +       if (ce->ce_flags & CE_WT_REMOVE) {
> +               if (topath)
> +                       BUG("Can't remove entry to a path");

Minor nit: This error message is kinda hard to parse, for someone not
that familiar with all the *_entry functions, like myself.  Maybe add
a comment before this line:
    /* No content and thus no path to create, so we have no pathname
to return */
or reword the error slightly?  Or maybe it's fine and I was just
confused from lack of code familiarity, but I'll throw it out there
since I stumbled on it a bit.

> +               unlink_entry(ce);
> +               return 0;
> +       }
> +
>         if (topath)
>                 return write_entry(ce, topath, state, 1);
>
> --
> 2.20.0.405.gbc1bbc6f85

^ permalink raw reply

* [PATCH 0/5] Create 'expire' and 'repack' verbs for git-multi-pack-index
From: Derrick Stolee via GitGitGadget @ 2018-12-10 18:06 UTC (permalink / raw)
  To: git; +Cc: sbeller, peff, jrnieder, avarab, Junio C Hamano

The multi-pack-index provides a fast way to find an object among a large
list of pack-files. It stores a single pack-reference for each object id, so
duplicate objects are ignored. Among a list of pack-files storing the same
object, the most-recently modified one is used.

Create new verbs for the multi-pack-index builtin.

 * 'git multi-pack-index expire': If we have a pack-file indexed by the
   multi-pack-index, but all objects in that pack are duplicated in
   more-recently modified packs, then delete that pack (and any others like
   it). Delete the reference to that pack in the multi-pack-index.
   
   
 * 'git multi-pack-index repack --batch-size=': Starting from the oldest
   pack-files covered by the multi-pack-index, find those whose on-disk size
   is below the batch size until we have a collection of packs whose sizes
   add up to the batch size. Create a new pack containing all objects that
   the multi-pack-index references to those packs.
   
   

This allows us to create a new pattern for repacking objects: run 'repack'.
After enough time has passed that all Git commands that started before the
last 'repack' are finished, run 'expire' again. This approach has some
advantages over the existing "repack everything" model:

 1. Incremental. We can repack a small batch of objects at a time, instead
    of repacking all reachable objects. We can also limit ourselves to the
    objects that do not appear in newer pack-files.
    
    
 2. Highly Available. By adding a new pack-file (and not deleting the old
    pack-files) we do not interrupt concurrent Git commands, and do not
    suffer performance degradation. By expiring only pack-files that have no
    referenced objects, we know that Git commands that are doing normal
    object lookups* will not be interrupted.
    
    
 3. Note: if someone concurrently runs a Git command that uses
    get_all_packs(), then that command could try to read the pack-files and
    pack-indexes that we are deleting during an expire command. Such
    commands are usually related to object maintenance (i.e. fsck, gc,
    pack-objects) or are related to less-often-used features (i.e.
    fast-import, http-backend, server-info).
    
    

We plan to use this approach in VFS for Git to do background maintenance of
the "shared object cache" which is a Git alternate directory filled with
packfiles containing commits and trees. We currently download pack-files on
an hourly basis to keep up-to-date with the central server. The cache
servers supply packs on an hourly and daily basis, so most of the hourly
packs become useless after a new daily pack is downloaded. The 'expire'
command would clear out most of those packs, but many will still remain with
fewer than 100 objects remaining. The 'repack' command (with a batch size of
1-3gb, probably) can condense the remaining packs in commands that run for
1-3 min at a time. Since the daily packs range from 100-250mb, we will also
combine and condense those packs.

Thanks, -Stolee

Derrick Stolee (5):
  multi-pack-index: prepare for 'expire' verb
  midx: refactor permutation logic
  multi-pack-index: implement 'expire' verb
  multi-pack-index: prepare 'repack' verb
  midx: implement midx_repack()

 Documentation/git-multi-pack-index.txt |  20 +++
 builtin/multi-pack-index.c             |  12 +-
 midx.c                                 | 222 +++++++++++++++++++++++--
 midx.h                                 |   2 +
 t/t5319-multi-pack-index.sh            |  98 +++++++++++
 5 files changed, 343 insertions(+), 11 deletions(-)


base-commit: 26aa9fc81d4c7f6c3b456a29da0b7ec72e5c6595
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-92%2Fderrickstolee%2Fmidx-expire%2Fupstream-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-92/derrickstolee/midx-expire/upstream-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/92
-- 
gitgitgadget

^ permalink raw reply

* [PATCH 1/5] multi-pack-index: prepare for 'expire' verb
From: Derrick Stolee via GitGitGadget @ 2018-12-10 18:06 UTC (permalink / raw)
  To: git; +Cc: sbeller, peff, jrnieder, avarab, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.92.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The multi-pack-index tracks objects in a collection of pack-files.
Only one copy of each object is indexed, using the modified time
of the pack-files to determine tie-breakers. It is possible to
have a pack-file with no referenced objects because all objects
have a duplicate in a newer pack-file.

Introduce a new 'expire' verb to the multi-pack-index builtin.
This verb will delete these unused pack-files and rewrite the
multi-pack-index to no longer refer to those files. More details
about the specifics will follow as the method is implemented.

Add a test that verifies the 'expire' verb is correctly wired,
but will still be valid when the verb is implemented. Specifically,
create a set of packs that should all have referenced objects and
should not be removed during an 'expire' operation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  8 +++++
 builtin/multi-pack-index.c             |  4 ++-
 midx.c                                 |  5 +++
 midx.h                                 |  1 +
 t/t5319-multi-pack-index.sh            | 47 ++++++++++++++++++++++++++
 5 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index f7778a2c85..822d83c845 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -31,6 +31,14 @@ verify::
 	When given as the verb, verify the contents of the MIDX file
 	at `<dir>/packs/multi-pack-index`.
 
+expire::
+	When given as the verb, delete the pack-files that are tracked
+	by the MIDX file at `<dir>/packs/multi-pack-index` but have
+	no objects referenced by the MIDX. All objects in these pack-
+	files have another copy in a more-recently modified pack-file.
+	Rewrite the MIDX file afterward to remove all references to
+	these pack-files.
+
 
 EXAMPLES
 --------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index fca70f8e4f..145de3a46c 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,7 +5,7 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [--object-dir=<dir>] (write|verify)"),
+	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire)"),
 	NULL
 };
 
@@ -44,6 +44,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		return write_midx_file(opts.object_dir);
 	if (!strcmp(argv[0], "verify"))
 		return verify_midx_file(opts.object_dir);
+	if (!strcmp(argv[0], "expire"))
+		return expire_midx_packs(opts.object_dir);
 
 	die(_("unrecognized verb: %s"), argv[0]);
 }
diff --git a/midx.c b/midx.c
index 730ff84dff..bb825ef816 100644
--- a/midx.c
+++ b/midx.c
@@ -1025,3 +1025,8 @@ int verify_midx_file(const char *object_dir)
 
 	return verify_midx_error;
 }
+
+int expire_midx_packs(const char *object_dir)
+{
+	return 0;
+}
diff --git a/midx.h b/midx.h
index 774f652530..e3a2b740b5 100644
--- a/midx.h
+++ b/midx.h
@@ -49,6 +49,7 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 int write_midx_file(const char *object_dir);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(const char *object_dir);
+int expire_midx_packs(const char *object_dir);
 
 void close_midx(struct multi_pack_index *m);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 70926b5bc0..948effc1ee 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -348,4 +348,51 @@ test_expect_success 'verify incorrect 64-bit offset' '
 		"incorrect object offset"
 '
 
+test_expect_success 'setup expire tests' '
+	mkdir dup &&
+	(
+		cd dup &&
+		git init &&
+		for i in $(test_seq 1 20)
+		do
+			test_commit $i
+		done &&
+		git branch A HEAD &&
+		git branch B HEAD~8 &&
+		git branch C HEAD~13 &&
+		git branch D HEAD~16 &&
+		git branch E HEAD~18 &&
+		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
+		refs/heads/E
+		EOF
+		git pack-objects --revs .git/objects/pack/pack-D <<-EOF &&
+		refs/heads/D
+		^refs/heads/E
+		EOF
+		git pack-objects --revs .git/objects/pack/pack-C <<-EOF &&
+		refs/heads/C
+		^refs/heads/D
+		EOF
+		git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
+		refs/heads/B
+		^refs/heads/C
+		EOF
+		git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
+		refs/heads/A
+		^refs/heads/B
+		EOF
+		git multi-pack-index write
+	)
+'
+
+test_expect_success 'expire does not remove any packs' '
+	(
+		cd dup &&
+		ls .git/objects/pack >expect &&
+		git multi-pack-index expire &&
+		ls .git/objects/pack >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 3/5] multi-pack-index: implement 'expire' verb
From: Derrick Stolee via GitGitGadget @ 2018-12-10 18:06 UTC (permalink / raw)
  To: git; +Cc: sbeller, peff, jrnieder, avarab, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.92.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The 'git multi-pack-index expire' command looks at the existing
mult-pack-index, counts the number of objects referenced in each
pack-file, deletes the pack-fils with no referenced objects, and
rewrites the multi-pack-index to no longer reference those packs.

Refactor the write_midx_file() method to call write_midx_internal()
which now takes an existing 'struct multi_pack_index' and a list
of pack-files to drop (as specified by the names of their pack-
indexes). As we write the new multi-pack-index, we drop those
file names from the list of known pack-files.

The expire_midx_packs() method removes the unreferenced pack-files
after carefully closing the packs to avoid open handles.

Test that a new pack-file that covers the contents of two other
pack-files leads to those pack-files being deleted during the
expire command. Be sure to read the multi-pack-index to ensure
it no longer references those packs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 87 +++++++++++++++++++++++++++++++++++--
 t/t5319-multi-pack-index.sh | 15 +++++++
 2 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index 3bd7183a53..50e4cd7270 100644
--- a/midx.c
+++ b/midx.c
@@ -751,7 +751,8 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
 	return written;
 }
 
-int write_midx_file(const char *object_dir)
+static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
+			       struct string_list *packs_to_drop)
 {
 	unsigned char cur_chunk, num_chunks = 0;
 	char *midx_name;
@@ -765,6 +766,7 @@ int write_midx_file(const char *object_dir)
 	uint32_t nr_entries, num_large_offsets = 0;
 	struct pack_midx_entry *entries = NULL;
 	int large_offsets_needed = 0;
+	int result = 0;
 
 	midx_name = get_midx_filename(object_dir);
 	if (safe_create_leading_directories(midx_name)) {
@@ -773,7 +775,10 @@ int write_midx_file(const char *object_dir)
 			  midx_name);
 	}
 
-	packs.m = load_multi_pack_index(object_dir, 1);
+	if (m)
+		packs.m = m;
+	else
+		packs.m = load_multi_pack_index(object_dir, 1);
 
 	packs.nr = 0;
 	packs.alloc_list = packs.m ? packs.m->num_packs : 16;
@@ -787,7 +792,24 @@ int write_midx_file(const char *object_dir)
 	ALLOC_ARRAY(packs.perm, packs.alloc_perm);
 
 	if (packs.m) {
+		int drop_index = 0, missing_drops = 0;
 		for (i = 0; i < packs.m->num_packs; i++) {
+			if (packs_to_drop && drop_index < packs_to_drop->nr) {
+				int cmp = strcmp(packs.m->pack_names[i],
+						 packs_to_drop->items[drop_index].string);
+
+				if (!cmp) {
+					drop_index++;
+					continue;
+				} else if (cmp > 0) {
+					error(_("did not see pack-file %s to drop"),
+					      packs_to_drop->items[drop_index].string);
+					drop_index++;
+					i--;
+					missing_drops++;
+				}
+			}
+
 			ALLOC_GROW(packs.list, packs.nr + 1, packs.alloc_list);
 			ALLOC_GROW(packs.names, packs.nr + 1, packs.alloc_names);
 			ALLOC_GROW(packs.perm, packs.nr + 1, packs.alloc_perm);
@@ -798,6 +820,12 @@ int write_midx_file(const char *object_dir)
 			packs.pack_name_concat_len += strlen(packs.names[packs.nr]) + 1;
 			packs.nr++;
 		}
+
+		if (packs_to_drop && (drop_index < packs_to_drop->nr || missing_drops)) {
+			error(_("did not see all pack-files to drop"));
+			result = 1;
+			goto cleanup;
+		}
 	}
 
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
@@ -932,7 +960,12 @@ cleanup:
 	free(packs.perm);
 	free(entries);
 	free(midx_name);
-	return 0;
+	return result;
+}
+
+int write_midx_file(const char *object_dir)
+{
+	return write_midx_internal(object_dir, NULL, NULL);
 }
 
 void clear_midx_file(struct repository *r)
@@ -1034,5 +1067,51 @@ int verify_midx_file(const char *object_dir)
 
 int expire_midx_packs(const char *object_dir)
 {
-	return 0;
+	uint32_t i, *count, result = 0;
+	size_t dirlen;
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
+	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+
+	if (!m)
+		return 0;
+
+	count = xcalloc(m->num_packs, sizeof(uint32_t));
+	for (i = 0; i < m->num_objects; i++) {
+		int pack_int_id = nth_midxed_pack_int_id(m, i);
+		count[pack_int_id]++;
+	}
+
+	strbuf_addstr(&buf, object_dir);
+	strbuf_addstr(&buf, "/pack/");
+	dirlen = buf.len;
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (count[i])
+			continue;
+
+		if (m->packs[i]) {
+			close_pack(m->packs[i]);
+			m->packs[i] = NULL;
+		}
+
+		string_list_insert(&packs_to_drop, m->pack_names[i]);
+
+		strbuf_setlen(&buf, dirlen);
+		strbuf_addstr(&buf, m->pack_names[i]);
+		unlink(buf.buf);
+
+		strip_suffix_mem(buf.buf, &buf.len, "idx");
+		strbuf_addstr(&buf, "pack");
+		unlink(buf.buf);
+	}
+
+	strbuf_release(&buf);
+	free(count);
+
+	if (packs_to_drop.nr)
+		result = write_midx_internal(object_dir, m, &packs_to_drop);
+
+	string_list_clear(&packs_to_drop, 0);
+	return result;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 948effc1ee..210279a3cf 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -395,4 +395,19 @@ test_expect_success 'expire does not remove any packs' '
 	)
 '
 
+test_expect_success 'expire removes unreferenced packs' '
+	(
+		cd dup &&
+		git pack-objects --revs .git/objects/pack/pack-combined <<-EOF &&
+		refs/heads/A
+		^refs/heads/C
+		EOF
+		git multi-pack-index write &&
+		ls .git/objects/pack | grep -v -e pack-[AB] >expect &&
+		git multi-pack-index expire &&
+		ls .git/objects/pack >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 2/5] midx: refactor permutation logic
From: Derrick Stolee via GitGitGadget @ 2018-12-10 18:06 UTC (permalink / raw)
  To: git; +Cc: sbeller, peff, jrnieder, avarab, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.92.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

When writing a multi-pack-index, we keep track of an integer
permutation, tracking the list of pack-files that we know about
(both from the existing multi-pack-index and the new pack-files
being introduced) and converting them into a sorted order for
the new multi-pack-index.

In anticipation of dropping pack-files from the existing multi-
pack-index, refactor the logic around how we track this permutation.

First, insert the permutation into the pack_list structure. This
allows us to grow the permutation dynamically as we add packs.

Second, fill the permutation with values corresponding to their
position in the list of pack-files, sorted as follows:

  1. The pack-files in the existing multi-pack-index,
     sorted lexicographically.

  2. The pack-files not in the existing multi-pack-index,
     sorted as discovered from the filesystem.

There is a subtle thing in how we initialize this permutation,
specifically how we use 'i' for the initial value. This will
matter more when we implement the logic for dropping existing
packs, as we will create holes in the ordering.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/midx.c b/midx.c
index bb825ef816..3bd7183a53 100644
--- a/midx.c
+++ b/midx.c
@@ -380,9 +380,11 @@ static size_t write_midx_header(struct hashfile *f,
 struct pack_list {
 	struct packed_git **list;
 	char **names;
+	uint32_t *perm;
 	uint32_t nr;
 	uint32_t alloc_list;
 	uint32_t alloc_names;
+	uint32_t alloc_perm;
 	size_t pack_name_concat_len;
 	struct multi_pack_index *m;
 };
@@ -398,6 +400,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
 
 		ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
 		ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
+		ALLOC_GROW(packs->perm, packs->nr + 1, packs->alloc_perm);
 
 		packs->list[packs->nr] = add_packed_git(full_path,
 							full_path_len,
@@ -417,6 +420,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
 			return;
 		}
 
+		packs->perm[packs->nr] = packs->nr;
 		packs->names[packs->nr] = xstrdup(file_name);
 		packs->pack_name_concat_len += strlen(file_name) + 1;
 		packs->nr++;
@@ -443,7 +447,7 @@ static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t *p
 	ALLOC_ARRAY(pairs, nr_packs);
 
 	for (i = 0; i < nr_packs; i++) {
-		pairs[i].pack_int_id = i;
+		pairs[i].pack_int_id = perm[i];
 		pairs[i].pack_name = pack_names[i];
 	}
 
@@ -755,7 +759,6 @@ int write_midx_file(const char *object_dir)
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct pack_list packs;
-	uint32_t *pack_perm = NULL;
 	uint64_t written = 0;
 	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
 	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
@@ -774,18 +777,22 @@ int write_midx_file(const char *object_dir)
 
 	packs.nr = 0;
 	packs.alloc_list = packs.m ? packs.m->num_packs : 16;
-	packs.alloc_names = packs.alloc_list;
+	packs.alloc_perm = packs.alloc_names = packs.alloc_list;
 	packs.list = NULL;
 	packs.names = NULL;
+	packs.perm = NULL;
 	packs.pack_name_concat_len = 0;
 	ALLOC_ARRAY(packs.list, packs.alloc_list);
 	ALLOC_ARRAY(packs.names, packs.alloc_names);
+	ALLOC_ARRAY(packs.perm, packs.alloc_perm);
 
 	if (packs.m) {
 		for (i = 0; i < packs.m->num_packs; i++) {
 			ALLOC_GROW(packs.list, packs.nr + 1, packs.alloc_list);
 			ALLOC_GROW(packs.names, packs.nr + 1, packs.alloc_names);
+			ALLOC_GROW(packs.perm, packs.nr + 1, packs.alloc_perm);
 
+			packs.perm[packs.nr] = i;
 			packs.list[packs.nr] = NULL;
 			packs.names[packs.nr] = xstrdup(packs.m->pack_names[i]);
 			packs.pack_name_concat_len += strlen(packs.names[packs.nr]) + 1;
@@ -802,10 +809,9 @@ int write_midx_file(const char *object_dir)
 		packs.pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					      (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
 
-	ALLOC_ARRAY(pack_perm, packs.nr);
-	sort_packs_by_name(packs.names, packs.nr, pack_perm);
+	sort_packs_by_name(packs.names, packs.nr, packs.perm);
 
-	entries = get_sorted_entries(packs.m, packs.list, pack_perm, packs.nr, &nr_entries);
+	entries = get_sorted_entries(packs.m, packs.list, packs.perm, packs.nr, &nr_entries);
 
 	for (i = 0; i < nr_entries; i++) {
 		if (entries[i].offset > 0x7fffffff)
@@ -923,8 +929,8 @@ cleanup:
 
 	free(packs.list);
 	free(packs.names);
+	free(packs.perm);
 	free(entries);
-	free(pack_perm);
 	free(midx_name);
 	return 0;
 }
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 4/5] multi-pack-index: prepare 'repack' verb
From: Derrick Stolee via GitGitGadget @ 2018-12-10 18:06 UTC (permalink / raw)
  To: git; +Cc: sbeller, peff, jrnieder, avarab, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.92.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

In an environment where the multi-pack-index is useful, it is due
to many pack-files and an inability to repack the object store
into a single pack-file. However, it is likely that many of these
pack-files are rather small, and could be repacked into a slightly
larger pack-file without too much effort. It may also be important
to ensure the object store is highly available and the repack
operation does not interrupt concurrent git commands.

Introduce a 'repack' verb to 'git multi-pack-index' that takes a
'--batch-size' option. The verb will inspect the multi-pack-index
for referenced pack-files whose size is smaller than the batch
size, until collecting a list of pack-files whose sizes sum to
larger than the batch size. Then, a new pack-file will be created
containing the objects from those pack-files that are referenced
by the multi-pack-index. The resulting pack is likely to actually
be smaller than the batch size due to compression and the fact
that there may be objects in the pack-files that have duplicate
copies in other pack-files.

The current change introduces the command-line arguments, and we
add a test that ensures we parse these options properly. Since
we specify a small batch size, we will guarantee that future
implementations do not change the list of pack-files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt | 12 ++++++++++++
 builtin/multi-pack-index.c             | 10 +++++++++-
 midx.c                                 |  5 +++++
 midx.h                                 |  1 +
 t/t5319-multi-pack-index.sh            | 11 +++++++++++
 5 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 822d83c845..e43e7da71e 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -39,6 +39,18 @@ expire::
 	Rewrite the MIDX file afterward to remove all references to
 	these pack-files.
 
+repack::
+	When given as the verb, collect a batch of pack-files whose
+	size are all at most the size given by --batch-size, but
+	whose sizes sum to larger than --batch-size. The batch is
+	selected by greedily adding small pack-files starting with
+	the oldest pack-files that fit the size. Create a new pack-
+	file containing the objects the multi-pack-index indexes
+	into thos pack-files, and rewrite the multi-pack-index to
+	contain that pack-file. A later run of 'git multi-pack-index
+	expire' will delete the pack-files that were part of this
+	batch.
+
 
 EXAMPLES
 --------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 145de3a46c..d87a2235e3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,12 +5,13 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire)"),
+	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>)"),
 	NULL
 };
 
 static struct opts_multi_pack_index {
 	const char *object_dir;
+	unsigned long batch_size;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
@@ -19,6 +20,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
+		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
+		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
 
@@ -40,6 +43,11 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		return 1;
 	}
 
+	if (!strcmp(argv[0], "repack"))
+		return midx_repack(opts.object_dir, (size_t)opts.batch_size);
+	if (opts.batch_size)
+		die(_("--batch-size option is only for 'repack' verb"));
+
 	if (!strcmp(argv[0], "write"))
 		return write_midx_file(opts.object_dir);
 	if (!strcmp(argv[0], "verify"))
diff --git a/midx.c b/midx.c
index 50e4cd7270..4caf148464 100644
--- a/midx.c
+++ b/midx.c
@@ -1115,3 +1115,8 @@ int expire_midx_packs(const char *object_dir)
 	string_list_clear(&packs_to_drop, 0);
 	return result;
 }
+
+int midx_repack(const char *object_dir, size_t batch_size)
+{
+	return 0;
+}
diff --git a/midx.h b/midx.h
index e3a2b740b5..394a21ee96 100644
--- a/midx.h
+++ b/midx.h
@@ -50,6 +50,7 @@ int write_midx_file(const char *object_dir);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(const char *object_dir);
 int expire_midx_packs(const char *object_dir);
+int midx_repack(const char *object_dir, size_t batch_size);
 
 void close_midx(struct multi_pack_index *m);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 210279a3cf..c23e930a5d 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -410,4 +410,15 @@ test_expect_success 'expire removes unreferenced packs' '
 	)
 '
 
+test_expect_success 'repack does not create any packs' '
+	(
+		cd dup &&
+		ls .git/objects/pack >expect &&
+		MINSIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 1) &&
+		git multi-pack-index repack --batch-size=$MINSIZE &&
+		ls .git/objects/pack >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox