* [PATCH] reftable/stack: use geometric table compaction
@ 2024-03-05 20:03 Justin Tobler via GitGitGadget
  2024-03-06 12:30 ` Patrick Steinhardt
                   ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-03-05 20:03 UTC
  To: git; +Cc: Justin Tobler, Justin Tobler
From: Justin Tobler <jltobler@gmail.com>
To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables whose sizes have the same binary logarithm (rounded
down) are grouped into segments. Among the segments that contain more
than one table, the one with the lowest binary log value is used as the
starting point when identifying the compaction segment.
Since segments containing a single table are not initially considered
for compaction, no compaction occurs when a newly appended table has a
different log value than the table preceding it. The table list can
therefore grow without bound, as demonstrated by repeating the following
sequence:
git branch -f foo
git branch -d foo
Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.
To avoid unbounded growth of the table list, walk through each table and
evaluate if it needs to be included in the compaction segment to restore
a geometric sequence.
Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c`, the tests related to `sizes_to_segments()`
are removed because the function is no longer needed, and `fastlog2()`
becomes a static helper in the test file while its `test_log2()` test is
dropped. Also, the `test_suggest_compaction_segment()` test is updated to
better showcase and reflect the new geometric compaction behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
    reftable/stack: use geometric table compaction
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1683
 reftable/stack.c           | 106 +++++++++++++++----------------------
 reftable/stack.h           |   3 --
 reftable/stack_test.c      |  66 +++++------------------
 t/t0610-reftable-basics.sh |  24 ++++-----
 4 files changed, 70 insertions(+), 129 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index b64e55648aa..e4ea8753977 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1214,75 +1214,57 @@ static int segment_size(struct segment *s)
 	return s->end - s->start;
 }
 
-int fastlog2(uint64_t sz)
-{
-	int l = 0;
-	if (sz == 0)
-		return 0;
-	for (; sz; sz /= 2) {
-		l++;
-	}
-	return l - 1;
-}
-
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
-{
-	struct segment *segs = reftable_calloc(n, sizeof(*segs));
-	struct segment cur = { 0 };
-	size_t next = 0, i;
-
-	if (n == 0) {
-		*seglen = 0;
-		return segs;
-	}
-	for (i = 0; i < n; i++) {
-		int log = fastlog2(sizes[i]);
-		if (cur.log != log && cur.bytes > 0) {
-			struct segment fresh = {
-				.start = i,
-			};
-
-			segs[next++] = cur;
-			cur = fresh;
-		}
-
-		cur.log = log;
-		cur.end = i + 1;
-		cur.bytes += sizes[i];
-	}
-	segs[next++] = cur;
-	*seglen = next;
-	return segs;
-}
-
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 {
-	struct segment min_seg = {
-		.log = 64,
-	};
-	struct segment *segs;
-	size_t seglen = 0, i;
-
-	segs = sizes_to_segments(&seglen, sizes, n);
-	for (i = 0; i < seglen; i++) {
-		if (segment_size(&segs[i]) == 1)
-			continue;
+	struct segment seg = { 0 };
+	uint64_t bytes;
+	size_t i;
 
-		if (segs[i].log < min_seg.log)
-			min_seg = segs[i];
-	}
+	/*
+	 * If there are no tables or only a single one then we don't have to
+	 * compact anything. The sequence is geometric by definition already.
+	 */
+	if (n <= 1)
+		return seg;
 
-	while (min_seg.start > 0) {
-		size_t prev = min_seg.start - 1;
-		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
+	/*
+	 * Find the ending table of the compaction segment needed to restore the
+	 * geometric sequence.
+	 *
+	 * To do so, we iterate backwards starting from the most recent table
+	 * until a valid segment end is found. If the preceding table is smaller
+	 * than the current table multiplied by the geometric factor (2), the
+	 * current table is set as the compaction segment end.
+	 */
+	for (i = n - 1; i > 0; i--) {
+		if (sizes[i - 1] < sizes[i] * 2) {
+			seg.end = i;
+			bytes = sizes[i];
 			break;
+		}
+	}
+
+	/*
+	 * Find the starting table of the compaction segment by iterating
+	 * through the remaing tables and keeping track of the accumulated size
+	 * of all tables seen from the segment end table.
+	 *
+	 * Note that we keep iterating even after we have found the first
+	 * first starting point. This is because there may be tables in the
+	 * stack preceding that first starting point which violate the geometric
+	 * sequence.
+	 */
+	for (; i > 0; i--) {
+		uint64_t curr = bytes;
+		bytes += sizes[i - 1];
 
-		min_seg.start = prev;
-		min_seg.bytes += sizes[prev];
+		if (sizes[i - 1] < curr * 2) {
+			seg.start = i - 1;
+			seg.bytes = bytes;
+		}
 	}
 
-	reftable_free(segs);
-	return min_seg;
+	return seg;
 }
 
 static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
@@ -1305,7 +1287,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
 		suggest_compaction_segment(sizes, st->merged->stack_len);
 	reftable_free(sizes);
 	if (segment_size(&seg) > 0)
-		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+		return stack_compact_range_stats(st, seg.start, seg.end,
 						 NULL);
 
 	return 0;
diff --git a/reftable/stack.h b/reftable/stack.h
index d919455669e..656f896cc28 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
 
 struct segment {
 	size_t start, end;
-	int log;
 	uint64_t bytes;
 };
 
-int fastlog2(uint64_t sz);
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
 
 #endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 509f4866236..85600a9573e 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -720,59 +720,14 @@ static void test_reftable_stack_hash_id(void)
 	clear_dir(dir);
 }
 
-static void test_log2(void)
-{
-	EXPECT(1 == fastlog2(3));
-	EXPECT(2 == fastlog2(4));
-	EXPECT(2 == fastlog2(5));
-}
-
-static void test_sizes_to_segments(void)
-{
-	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
-	/* .................0  1  2  3  4  5 */
-
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(segs[2].log == 3);
-	EXPECT(segs[2].start == 5);
-	EXPECT(segs[2].end == 6);
-
-	EXPECT(segs[1].log == 2);
-	EXPECT(segs[1].start == 2);
-	EXPECT(segs[1].end == 5);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_empty(void)
-{
-	size_t seglen = 0;
-	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
-	EXPECT(seglen == 0);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_all_equal(void)
-{
-	uint64_t sizes[] = { 5, 5 };
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(seglen == 1);
-	EXPECT(segs[0].start == 0);
-	EXPECT(segs[0].end == 2);
-	reftable_free(segs);
-}
-
 static void test_suggest_compaction_segment(void)
 {
-	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
 	/* .................0    1    2  3   4  5  6 */
 	struct segment min =
 		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
-	EXPECT(min.start == 2);
-	EXPECT(min.end == 7);
+	EXPECT(min.start == 1);
+	EXPECT(min.end == 9);
 }
 
 static void test_suggest_compaction_segment_nothing(void)
@@ -884,6 +839,17 @@ static void test_empty_add(void)
 	reftable_stack_destroy(st2);
 }
 
+static int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
 static void test_reftable_stack_auto_compaction(void)
 {
 	struct reftable_write_options cfg = { 0 };
@@ -1072,7 +1038,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
 int stack_test_main(int argc, const char *argv[])
 {
 	RUN_TEST(test_empty_add);
-	RUN_TEST(test_log2);
 	RUN_TEST(test_names_equal);
 	RUN_TEST(test_parse_names);
 	RUN_TEST(test_read_file);
@@ -1092,9 +1057,6 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_update_index_check);
 	RUN_TEST(test_reftable_stack_uptodate);
 	RUN_TEST(test_reftable_stack_validate_refname);
-	RUN_TEST(test_sizes_to_segments);
-	RUN_TEST(test_sizes_to_segments_all_equal);
-	RUN_TEST(test_sizes_to_segments_empty);
 	RUN_TEST(test_suggest_compaction_segment);
 	RUN_TEST(test_suggest_compaction_segment_nothing);
 	return 0;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 6a131e40b81..a3b1a04123e 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag A &&
-	test_line_count = 2 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag B &&
 	test_line_count = 1 repo/.git/reftable/tables.list
@@ -324,7 +324,7 @@ test_expect_success 'ref transaction: writes are synced' '
 		git -C repo -c core.fsync=reference \
 		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
 	check_fsync_events trace2.txt <<-EOF
-	"name":"hardware-flush","count":2
+	"name":"hardware-flush","count":4
 	EOF
 '
 
@@ -334,8 +334,8 @@ test_expect_success 'pack-refs: compacts tables' '
 
 	test_commit -C repo A &&
 	ls -1 repo/.git/reftable >table-files &&
-	test_line_count = 4 table-files &&
-	test_line_count = 3 repo/.git/reftable/tables.list &&
+	test_line_count = 3 table-files &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
 
 	git -C repo pack-refs &&
 	ls -1 repo/.git/reftable >table-files &&
@@ -367,7 +367,7 @@ do
 			umask $umask &&
 			git init --shared=true repo &&
 			test_commit -C repo A &&
-			test_line_count = 3 repo/.git/reftable/tables.list
+			test_line_count = 2 repo/.git/reftable/tables.list
 		) &&
 		git -C repo pack-refs &&
 		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
@@ -737,10 +737,10 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
 	test_commit -C repo A &&
 	git -C repo worktree add ../worktree &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 	git -C repo pack-refs &&
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
@@ -750,11 +750,11 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
 	test_commit -C repo A &&
 	git -C repo worktree add ../worktree &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 	git -C worktree pack-refs &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list
+	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: creating shared ref updates main stack' '
@@ -770,7 +770,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
 
 	git -C worktree update-ref refs/heads/shared HEAD &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 2 repo/.git/reftable/tables.list
+	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: creating per-worktree ref updates worktree stack' '
base-commit: b387623c12f3f4a376e4d35a610fd3e55d7ea907
-- 
gitgitgadget
* Re: [PATCH] reftable/stack: use geometric table compaction
  2024-03-05 20:03 [PATCH] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
@ 2024-03-06 12:30 ` Patrick Steinhardt
  2024-03-06 12:37 ` Patrick Steinhardt
  2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
  2 siblings, 0 replies; 52+ messages in thread
From: Patrick Steinhardt @ 2024-03-06 12:30 UTC
  To: Justin Tobler via GitGitGadget; +Cc: git, Justin Tobler

On Tue, Mar 05, 2024 at 08:03:45PM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
>
> To reduce the number of on-disk reftables, compaction is performed.
> Contiguous tables with the same binary log value of size are grouped
> into segments. The segment that has both the lowest binary log value and
> contains more than one table is set as the starting point when
> identifying the compaction segment.
>
> Since segments containing a single table are not initially considered
> for compaction, if the table appended to the list does not match the
> previous table log value, no compaction occurs for the new table. It is
> therefore possible for unbounded growth of the table list. This can be
> demonstrated by repeating the following sequence:
>
> git branch -f foo
> git branch -d foo
>
> Each operation results in a new table being written with no compaction
> occurring until a separate operation produces a table matching the
> previous table log value.
>
> To avoid unbounded growth of the table list, walk through each table and
> evaluate if it needs to be included in the compaction segment to restore
> a geometric sequence.

I think the description of what exactly changes could use some more
explanation and some arguments why the new behaviour is okay, too. It's
quite a large rewrite of the compaction logic, so pinpointing exactly
how these are different would go a long way.

[snip]
> +	/*
> +	 * Find the ending table of the compaction segment needed to restore the
> +	 * geometric sequence.
> +	 *
> +	 * To do so, we iterate backwards starting from the most recent table
> +	 * until a valid segment end is found. If the preceding table is smaller
> +	 * than the current table multiplied by the geometric factor (2), the
> +	 * current table is set as the compaction segment end.
> +	 */
> +	for (i = n - 1; i > 0; i--) {
> +		if (sizes[i - 1] < sizes[i] * 2) {
> +			seg.end = i;
> +			bytes = sizes[i];
>  			break;
> +		}
> +	}

I was briefly wondering whether we have to compare the _sum_ of all
preceding table sizes to the next size here. Otherwise it may happen
that compaction will lead to a new table that is immediately violating
the geometric sequence again. But I think due to properties of the
geometric sequence, the sum of all entries preceding the current value
cannot be greater than the value itself. So this should be fine. This
might be worth a comment.

> +
> +	/*
> +	 * Find the starting table of the compaction segment by iterating
> +	 * through the remaing tables and keeping track of the accumulated size

s/remaing/remaining/

> +	 * of all tables seen from the segment end table.
> +	 *
> +	 * Note that we keep iterating even after we have found the first
> +	 * first starting point. This is because there may be tables in the

Nit: s/first//, duplicate word.

> +	 * stack preceding that first starting point which violate the geometric
> +	 * sequence.
> +	 */
> +	for (; i > 0; i--) {
> +		uint64_t curr = bytes;
> +		bytes += sizes[i - 1];
>
> -		min_seg.start = prev;
> -		min_seg.bytes += sizes[prev];
> +		if (sizes[i - 1] < curr * 2) {
> +			seg.start = i - 1;
> +			seg.bytes = bytes;
> +		}
>  	}

Overall I really like the rewritten algorithm, it's a ton easier to
understand compared to the preceding code.

One thing I'd suggest doing though is to provide a benchmark of how the
new compaction strategy compares to the old one. A comparatively easy
way to do this is to write N refs sequentially -- with a big enough N
(e.g. 1 million), compaction time will eventually become an important
factor. So something like the following (untested):

    hyperfine \
      --shell=bash \
      --prepare "rm -rf repo && git init --ref-format=reftable repo && git -C repo commit --allow-empty --message msg" \
      'for ((i = 0 ; i < 1000000; i++ )); do git -C repo update-ref refs/heads/branch-$i HEAD; done'

[snip]
> @@ -737,10 +737,10 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
>  	test_commit -C repo A &&
>  	git -C repo worktree add ../worktree &&
>
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list &&
> +	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 1 repo/.git/reftable/tables.list &&
>  	git -C repo pack-refs &&
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '

This test needs updating as git-pack-refs(1) has become a no-op here.

> @@ -750,11 +750,11 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
>  	test_commit -C repo A &&
>  	git -C repo worktree add ../worktree &&
>
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list &&
> +	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 1 repo/.git/reftable/tables.list &&
>  	git -C worktree pack-refs &&
>  	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list
> +	test_line_count = 1 repo/.git/reftable/tables.list
>  '

Same.

>
>  test_expect_success 'worktree: creating shared ref updates main stack' '
> @@ -770,7 +770,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
>
>  	git -C worktree update-ref refs/heads/shared HEAD &&
>  	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 2 repo/.git/reftable/tables.list
> +	test_line_count = 1 repo/.git/reftable/tables.list
>  '

Same.

One thing missing is a test that demonstrates the previously-broken
behaviour.

Patrick
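The observation in the review above, that newer tables cannot outweigh the table at the segment end, follows directly from the factor-2 property. Writing $s_0$ for the oldest and $s_{n-1}$ for the newest table size, $[a, b)$ for a selected half-open segment and $M$ for the size of the compacted table (symbols introduced here for illustration only):

```latex
\[
\begin{aligned}
&s_{k-1} \ge 2\,s_k \quad (1 \le k \le n-1)
  \;\Longrightarrow\; s_{k+m} \le 2^{-m}\, s_k \quad (m \ge 0), \\
&\sum_{j=k+1}^{n-1} s_j \;\le\; s_k \sum_{m=1}^{n-1-k} 2^{-m}
  \;=\; s_k \bigl(1 - 2^{-(n-1-k)}\bigr) \;\le\; s_k, \\
&M = \sum_{j=a}^{b-1} s_j \;\ge\; s_{b-1} \;\ge\; 2\,s_b .
\end{aligned}
\]
```

The second line is why the tables newer than the segment end need not be added to the byte count; the third, which holds because the tables from $b-1$ onwards already satisfy the property, shows the compacted table is still at least twice the size of the next newer table.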
* Re: [PATCH] reftable/stack: use geometric table compaction
  2024-03-05 20:03 [PATCH] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
  2024-03-06 12:30 ` Patrick Steinhardt
@ 2024-03-06 12:37 ` Patrick Steinhardt
  2024-03-21 22:48   ` Justin Tobler
  2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
  2 siblings, 1 reply; 52+ messages in thread
From: Patrick Steinhardt @ 2024-03-06 12:37 UTC
  To: Justin Tobler via GitGitGadget; +Cc: git, Justin Tobler

On Tue, Mar 05, 2024 at 08:03:45PM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
> @@ -1305,7 +1287,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
>  		suggest_compaction_segment(sizes, st->merged->stack_len);
>  	reftable_free(sizes);
>  	if (segment_size(&seg) > 0)
> -		return stack_compact_range_stats(st, seg.start, seg.end - 1,
> +		return stack_compact_range_stats(st, seg.start, seg.end,
>  						 NULL);
>
>  	return 0;

One more thing: I think it would make sense to move the refactoring
where you change whether the end segment index is inclusive or exclusive
into a separate patch so that it's easier to reason about. Also, the
fact that no tests would require changes would further stress the point
that this is a mere refactoring without unintended side effects.

Patrick
* Re: [PATCH] reftable/stack: use geometric table compaction
  2024-03-06 12:37 ` Patrick Steinhardt
@ 2024-03-21 22:48   ` Justin Tobler
  0 siblings, 0 replies; 52+ messages in thread
From: Justin Tobler @ 2024-03-21 22:48 UTC
  To: Patrick Steinhardt; +Cc: Justin Tobler via GitGitGadget, git

On 24/03/06 01:37PM, Patrick Steinhardt wrote:
> On Tue, Mar 05, 2024 at 08:03:45PM +0000, Justin Tobler via GitGitGadget wrote:
> > From: Justin Tobler <jltobler@gmail.com>
> > @@ -1305,7 +1287,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
> >  		suggest_compaction_segment(sizes, st->merged->stack_len);
> >  	reftable_free(sizes);
> >  	if (segment_size(&seg) > 0)
> > -		return stack_compact_range_stats(st, seg.start, seg.end - 1,
> > +		return stack_compact_range_stats(st, seg.start, seg.end,
> >  						 NULL);
> >
> >  	return 0;
>
> One more thing: I think it would make sense to move the refactoring
> where you change whether the end segment index is inclusive or exclusive
> into a separate patch so that it's easier to reason about. Also, the
> fact that no tests would require changes would further stress the point
> that this is a mere refactoring without unintended side effects.

The `test_suggest_compaction_segment()` in `stack_test.c` does have to be
updated to reflect the segment end now being inclusive. But other than
that, no tests have to be updated.

Thanks Patrick for all the great feedback! I've updated per your
comments in V2 of the patch series.

-Justin
* [PATCH v2 0/3] reftable/stack: use geometric table compaction 2024-03-05 20:03 [PATCH] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget 2024-03-06 12:30 ` Patrick Steinhardt 2024-03-06 12:37 ` Patrick Steinhardt @ 2024-03-21 22:40 ` Justin Tobler via GitGitGadget 2024-03-21 22:40 ` [PATCH v2 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget ` (5 more replies) 2 siblings, 6 replies; 52+ messages in thread From: Justin Tobler via GitGitGadget @ 2024-03-21 22:40 UTC (permalink / raw) To: git; +Cc: Patrick Steinhardt, Justin Tobler Hello again, This is the second version my patch series that refactors the reftable compaction strategy to instead follow a geometric sequence. Changes compared to v1: * Added GIT_TEST_REFTABLE_NO_AUTOCOMPACTION environment variable to disable reftable compaction when testing. * Refactored worktree tests in t0610-reftable-basics.sh to properly assert git-pack-refs(1) works as expected. * Added test to validate that alternating table sizes are compacted. * Added benchmark to compare compaction strategies. * Moved change that made compaction segment end inclusive to its own commit. * Added additional explanation in commits and comments and fixed typos. Thanks for taking a look! 
Justin Justin Tobler (3): reftable/stack: add env to disable autocompaction reftable/stack: use geometric table compaction reftable/segment: make segment end inclusive reftable/stack.c | 113 ++++++++++++++++--------------------- reftable/stack.h | 3 - reftable/stack_test.c | 66 +++++----------------- reftable/system.h | 1 + t/t0610-reftable-basics.sh | 43 +++++++++----- 5 files changed, 94 insertions(+), 132 deletions(-) base-commit: 3bd955d26919e149552f34aacf8a4e6368c26cec Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1683 Range-diff vs v1: -: ----------- > 1: cb6b152e5c8 reftable/stack: add env to disable autocompaction 1: 7a518853a10 ! 2: def70084523 reftable/stack: use geometric table compaction @@ Commit message occurring until a separate operation produces a table matching the previous table log value. - To avoid unbounded growth of the table list, walk through each table and - evaluate if it needs to be included in the compaction segment to restore - a geometric sequence. + Instead, to avoid unbounded growth of the table list, the compaction + strategy is updated to ensure tables follow a geometric sequence after + each operation. This is done by walking the table list in reverse index + order to identify the compaction segment start and end. The compaction + segment end is found by identifying the first table which has a + preceding table size less than twice the current table. Next, the + compaction segment start is found iterating through the remaining tables + in the list checking if the previous table size is less than twice the + cumulative of tables from the segment end. This ensures the correct + segment start is found and that the newly compacted table does not + violate the geometric sequence. 
+ + When creating 10 thousand references, the new strategy has no + performance impact: + + Benchmark 1: update-ref: create refs sequentially (revision = HEAD~) + Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s] + Range (min … max): 26.447 s … 26.569 s 10 runs + + Benchmark 2: update-ref: create refs sequentially (revision = HEAD) + Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s] + Range (min … max): 26.366 s … 26.444 s 10 runs + + Summary + update-ref: create refs sequentially (revision = HEAD) ran + 1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~) Some tests in `t0610-reftable-basics.sh` assert the on-disk state of tables and are therefore updated to specify the correct new table count. @@ reftable/stack.c: static int segment_size(struct segment *s) + * until a valid segment end is found. If the preceding table is smaller + * than the current table multiplied by the geometric factor (2), the + * current table is set as the compaction segment end. ++ * ++ * Tables after the ending point are not added to the byte count because ++ * they are already valid members of the geometric sequence. Due to the ++ * properties of a geometric sequence, it is not possible for the sum of ++ * these tables to exceed the value of the ending point table. + */ + for (i = n - 1; i > 0; i--) { + if (sizes[i - 1] < sizes[i] * 2) { -+ seg.end = i; ++ seg.end = i + 1; + bytes = sizes[i]; break; + } @@ reftable/stack.c: static int segment_size(struct segment *s) + + /* + * Find the starting table of the compaction segment by iterating -+ * through the remaing tables and keeping track of the accumulated size -+ * of all tables seen from the segment end table. ++ * through the remaining tables and keeping track of the accumulated ++ * size of all tables seen from the segment end table. + * + * Note that we keep iterating even after we have found the first -+ * first starting point. 
This is because there may be tables in the -+ * stack preceding that first starting point which violate the geometric ++ * starting point. This is because there may be tables in the stack ++ * preceding that first starting point which violate the geometric + * sequence. + */ + for (; i > 0; i--) { @@ reftable/stack.c: static int segment_size(struct segment *s) } static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st) -@@ reftable/stack.c: int reftable_stack_auto_compact(struct reftable_stack *st) - suggest_compaction_segment(sizes, st->merged->stack_len); - reftable_free(sizes); - if (segment_size(&seg) > 0) -- return stack_compact_range_stats(st, seg.start, seg.end - 1, -+ return stack_compact_range_stats(st, seg.start, seg.end, - NULL); - - return 0; ## reftable/stack.h ## @@ reftable/stack.h: int read_lines(const char *filename, char ***lines); @@ reftable/stack_test.c: static void test_reftable_stack_hash_id(void) - EXPECT(min.start == 2); - EXPECT(min.end == 7); + EXPECT(min.start == 1); -+ EXPECT(min.end == 9); ++ EXPECT(min.end == 10); } static void test_suggest_compaction_segment_nothing(void) @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause a test_commit -C repo --no-tag B && test_line_count = 1 repo/.git/reftable/tables.list + ' + ++test_expect_success 'ref transaction: alternating table sizes are compacted' ' ++ test_when_finished "rm -rf repo" && ++ git init repo && ++ test_commit -C repo A && ++ for i in $(test_seq 20) ++ do ++ git -C repo branch -f foo && ++ git -C repo branch -d foo || return 1 ++ done && ++ test_line_count = 2 repo/.git/reftable/tables.list ++' ++ + check_fsync_events () { + local trace="$1" && + shift && @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes are synced' ' git -C repo -c core.fsync=reference \ -c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD && @@ t/t0610-reftable-basics.sh: do git -C repo pack-refs && test_expect_perms 
"-rw-rw-r--" repo/.git/reftable/tables.list && @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: pack-refs in main repo packs main refs' ' + test_when_finished "rm -rf repo worktree" && + git init repo && test_commit -C repo A && - git -C repo worktree add ../worktree && +- git -C repo worktree add ../worktree && ++ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree && ++ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list && -+ test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && -+ test_line_count = 1 repo/.git/reftable/tables.list && ++ test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && ++ test_line_count = 3 repo/.git/reftable/tables.list && git -C repo pack-refs && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && -+ test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && ++ test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && test_line_count = 1 repo/.git/reftable/tables.list ' @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: pack-refs in worktree packs worktree refs' ' + test_when_finished "rm -rf repo worktree" && + git init repo && test_commit -C repo A && - git -C repo worktree add ../worktree && +- git -C repo worktree add ../worktree && ++ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree && ++ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list && -+ test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && -+ test_line_count = 1 repo/.git/reftable/tables.list && ++ test_line_count = 4 
repo/.git/worktrees/worktree/reftable/tables.list && ++ test_line_count = 3 repo/.git/reftable/tables.list && git -C worktree pack-refs && test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list -+ test_line_count = 1 repo/.git/reftable/tables.list ++ test_line_count = 3 repo/.git/reftable/tables.list ' test_expect_success 'worktree: creating shared ref updates main stack' ' + test_when_finished "rm -rf repo worktree" && + git init repo && + test_commit -C repo A && ++ test_commit -C repo B && + + git -C repo worktree add ../worktree && + git -C repo pack-refs && @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: creating shared ref updates main stack' ' + test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && + test_line_count = 1 repo/.git/reftable/tables.list && - git -C worktree update-ref refs/heads/shared HEAD && +- git -C worktree update-ref refs/heads/shared HEAD && ++ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD && test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && -- test_line_count = 2 repo/.git/reftable/tables.list -+ test_line_count = 1 repo/.git/reftable/tables.list + test_line_count = 2 repo/.git/reftable/tables.list ' - - test_expect_success 'worktree: creating per-worktree ref updates worktree stack' ' -: ----------- > 3: a23e3fc6972 reftable/segment: make segment end inclusive -- gitgitgadget
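The range-diff above adds a code comment. It claims that the tables after the ending point cannot sum to more than the ending-point table. Here is a short derivation of that bound under the patch's own conventions. Write $s_k$ for the size of table $k$, with $n$ tables ordered oldest first. Write $e$ for the index of the ending-point table. The backward scan did not stop at any later index, so every table after $e$ is at most half the size of its predecessor. That is, $s_{k-1} \ge 2 s_k$ for $e < k \le n-1$.

```latex
s_{e+m} \le \frac{s_e}{2^{m}} \quad (1 \le m \le n-1-e)
\quad\Longrightarrow\quad
\sum_{k=e+1}^{n-1} s_k \;\le\; s_e \sum_{m=1}^{n-1-e} 2^{-m}
\;=\; s_e \left(1 - 2^{-(n-1-e)}\right) \;<\; s_e \qquad (s_e > 0)
```

So the trailing tables can be left out of the running byte count. Their combined size stays below that of the ending-point table.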
* [PATCH v2 1/3] reftable/stack: add env to disable autocompaction
  2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
@ 2024-03-21 22:40 ` Justin Tobler via GitGitGadget
  2024-03-22 1:25 ` Patrick Steinhardt
  2024-03-21 22:40 ` [PATCH v2 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-03-21 22:40 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

In future tests it will be necessary to create repositories with a set
number of tables. To make this easier, introduce the
`GIT_TEST_REFTABLE_NO_AUTOCOMPACTION` environment variable that, when
set, disables autocompaction of reftables.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c  | 2 +-
 reftable/system.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index b64e55648aa..2370d93d13b 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -681,7 +681,7 @@ int reftable_addition_commit(struct reftable_addition *add)
 	if (err)
 		goto done;
 
-	if (!add->stack->disable_auto_compact)
+	if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))
 		err = reftable_stack_auto_compact(add->stack);
 
 done:
diff --git a/reftable/system.h b/reftable/system.h
index 6b74a815143..ec08b728177 100644
--- a/reftable/system.h
+++ b/reftable/system.h
@@ -15,6 +15,7 @@ license that can be found in the LICENSE file or at
 #include "strbuf.h"
 #include "hash-ll.h" /* hash ID, sizes.*/
 #include "dir.h" /* remove_dir_recursively, for tests.*/
+#include "parse.h"
 
 int hash_size(uint32_t id);
 
-- 
gitgitgadget
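Below is a minimal, self-contained C sketch of the gating logic this patch adds to `reftable_addition_commit()`. It shows how an opt-out environment knob is combined with the existing `disable_auto_compact` flag. The functions `env_flag_enabled()`, `parse_bool_value()` and `should_auto_compact()` are hypothetical names introduced for illustration. The boolean parser is a simplified assumption modeled loosely on git's boolean syntax. Git's real `git_env_bool()` is stricter and also accepts integers, so this is an illustration, not git's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* ASCII case-insensitive string equality. */
static int ascii_ieq(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;
}

/*
 * Hypothetical, simplified boolean parser (an assumption, not git's code):
 * NULL (variable unset) and unrecognized values yield def; the empty
 * string and "false"/"no"/"off"/"0" yield 0; "true"/"yes"/"on"/"1" yield 1.
 */
static int parse_bool_value(const char *v, int def)
{
	if (!v)
		return def;
	if (!*v || ascii_ieq(v, "false") || ascii_ieq(v, "no") ||
	    ascii_ieq(v, "off") || ascii_ieq(v, "0"))
		return 0;
	if (ascii_ieq(v, "true") || ascii_ieq(v, "yes") ||
	    ascii_ieq(v, "on") || ascii_ieq(v, "1"))
		return 1;
	return def;
}

/* Hypothetical stand-in for git_env_bool(name, def). */
static int env_flag_enabled(const char *name, int def)
{
	assert(name != NULL);
	return parse_bool_value(getenv(name), def);
}

/* Mirrors the condition added to reftable_addition_commit(). */
static int should_auto_compact(int disable_auto_compact)
{
	return !disable_auto_compact &&
	       !env_flag_enabled("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0);
}
```

The default of `0` passed for the knob means an unset variable leaves autocompaction on. Only an explicit true-ish value, as used in the t0610 tests (`GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true`), turns it off.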
* Re: [PATCH v2 1/3] reftable/stack: add env to disable autocompaction
  2024-03-21 22:40 ` [PATCH v2 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
@ 2024-03-22 1:25 ` Patrick Steinhardt
  0 siblings, 0 replies; 52+ messages in thread
From: Patrick Steinhardt @ 2024-03-22 1:25 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget; +Cc: git, Justin Tobler

On Thu, Mar 21, 2024 at 10:40:17PM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
>
> In future tests it will be necessary to create repositories with a set
> number of tables. To make this easier, introduce the
> `GIT_TEST_REFTABLE_NO_AUTOCOMPACTION` environment variable that, when
> set, disables autocompaction of reftables.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>

Might be worth it to demonstrate in a test what this does, also to make
sure that it actually works as expected and doesn't regress. Even though
it may be a bit overboard to add tests for test-only functionality.
Dunno... ultimately it wouldn't hurt though, I guess?

Patrick

> ---
>  reftable/stack.c  | 2 +-
>  reftable/system.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index b64e55648aa..2370d93d13b 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -681,7 +681,7 @@ int reftable_addition_commit(struct reftable_addition *add)
>  	if (err)
>  		goto done;
>
> -	if (!add->stack->disable_auto_compact)
> +	if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))
>  		err = reftable_stack_auto_compact(add->stack);
>
>  done:
> diff --git a/reftable/system.h b/reftable/system.h
> index 6b74a815143..ec08b728177 100644
> --- a/reftable/system.h
> +++ b/reftable/system.h
> @@ -15,6 +15,7 @@ license that can be found in the LICENSE file or at
>  #include "strbuf.h"
>  #include "hash-ll.h" /* hash ID, sizes.*/
>  #include "dir.h" /* remove_dir_recursively, for tests.*/
> +#include "parse.h"
>
>  int hash_size(uint32_t id);
>
> --
> gitgitgadget
>
* [PATCH v2 2/3] reftable/stack: use geometric table compaction
  2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
  2024-03-21 22:40 ` [PATCH v2 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
@ 2024-03-21 22:40 ` Justin Tobler via GitGitGadget
  2024-03-22 1:25 ` Patrick Steinhardt
  2024-03-27 13:24 ` Karthik Nayak
  2024-03-21 22:40 ` [PATCH v2 3/3] reftable/segment: make segment end inclusive Justin Tobler via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-03-21 22:40 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables whose sizes have the same binary logarithm are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table's log value, no compaction occurs for the new table. The
table list can therefore grow without bound. This can be demonstrated
by repeating the following sequence:

  git branch -f foo
  git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table's log value.

Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation. This is done by walking the table list in reverse index
order to identify the compaction segment start and end. The compaction
segment end is found by identifying the first table whose preceding
table is smaller than twice its own size. Next, the compaction segment
start is found by iterating through the remaining tables in the list and
checking whether the previous table is smaller than twice the cumulative
size of the tables from the segment end. This ensures the correct
segment start is found and that the newly compacted table does not
violate the geometric sequence.

When creating 10 thousand references, the new strategy has no
measurable performance impact:

  Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
    Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
    Range (min … max): 26.447 s … 26.569 s 10 runs

  Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
    Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
    Range (min … max): 26.366 s … 26.444 s 10 runs

  Summary
    update-ref: create refs sequentially (revision = HEAD) ran
      1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c`, tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.
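The two-pass walk described above can be sketched as a standalone C function. It is a simplified restatement under the same conventions as the patch: sizes are ordered oldest to newest, `end` is exclusive, and the geometric factor is 2. `struct geo_segment` and `geometric_segment()` are hypothetical names, not part of the reftable API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for reftable's struct segment; end is exclusive. */
struct geo_segment {
	size_t start, end;
	uint64_t bytes;
};

/*
 * Sketch of the geometric strategy: sizes[] runs from the oldest (index 0)
 * to the newest table. Returns an empty segment (start == end) when the
 * stack already forms a factor-2 geometric sequence.
 */
static struct geo_segment geometric_segment(const uint64_t *sizes, size_t n)
{
	struct geo_segment seg = { 0, 0, 0 };
	uint64_t bytes = 0;
	size_t i;

	assert(n == 0 || sizes != NULL);
	if (n <= 1)
		return seg;

	/* Pass 1: newest table whose predecessor is not at least twice as big. */
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}

	/*
	 * Pass 2: walk towards older tables, accumulating sizes. Every older
	 * table smaller than twice the running total moves the start back;
	 * keep scanning, since an even older table may also violate the rule.
	 * If pass 1 found nothing, i is 0 here and this loop does not run.
	 */
	for (; i > 0; i--) {
		uint64_t curr = bytes;

		bytes += sizes[i - 1];
		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}
	return seg;
}
```

The updated `test_suggest_compaction_segment()` uses the sizes `{ 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 }`. For these, the sketch yields `start == 1` and `end == 10`, matching that test's expectations, with `bytes` equal to 158 (tables 1 through 9). A strictly halving stack such as `{ 64, 32, 16, 8 }` yields an empty segment.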
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c           | 109 ++++++++++++++++---------------------
 reftable/stack.h           |   3 -
 reftable/stack_test.c      |  66 +++++-----------------
 t/t0610-reftable-basics.sh |  43 ++++++++++-----
 4 files changed, 91 insertions(+), 130 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 2370d93d13b..ef55dc75cde 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1214,75 +1214,62 @@ static int segment_size(struct segment *s)
 	return s->end - s->start;
 }
 
-int fastlog2(uint64_t sz)
-{
-	int l = 0;
-	if (sz == 0)
-		return 0;
-	for (; sz; sz /= 2) {
-		l++;
-	}
-	return l - 1;
-}
-
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
-{
-	struct segment *segs = reftable_calloc(n, sizeof(*segs));
-	struct segment cur = { 0 };
-	size_t next = 0, i;
-
-	if (n == 0) {
-		*seglen = 0;
-		return segs;
-	}
-	for (i = 0; i < n; i++) {
-		int log = fastlog2(sizes[i]);
-		if (cur.log != log && cur.bytes > 0) {
-			struct segment fresh = {
-				.start = i,
-			};
-
-			segs[next++] = cur;
-			cur = fresh;
-		}
-
-		cur.log = log;
-		cur.end = i + 1;
-		cur.bytes += sizes[i];
-	}
-	segs[next++] = cur;
-	*seglen = next;
-	return segs;
-}
-
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 {
-	struct segment min_seg = {
-		.log = 64,
-	};
-	struct segment *segs;
-	size_t seglen = 0, i;
-
-	segs = sizes_to_segments(&seglen, sizes, n);
-	for (i = 0; i < seglen; i++) {
-		if (segment_size(&segs[i]) == 1)
-			continue;
+	struct segment seg = { 0 };
+	uint64_t bytes;
+	size_t i;
 
-		if (segs[i].log < min_seg.log)
-			min_seg = segs[i];
-	}
+	/*
+	 * If there are no tables or only a single one then we don't have to
+	 * compact anything. The sequence is geometric by definition already.
+	 */
+	if (n <= 1)
+		return seg;
 
-	while (min_seg.start > 0) {
-		size_t prev = min_seg.start - 1;
-		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
+	/*
+	 * Find the ending table of the compaction segment needed to restore the
+	 * geometric sequence.
+	 *
+	 * To do so, we iterate backwards starting from the most recent table
+	 * until a valid segment end is found. If the preceding table is smaller
+	 * than the current table multiplied by the geometric factor (2), the
+	 * current table is set as the compaction segment end.
+	 *
+	 * Tables after the ending point are not added to the byte count because
+	 * they are already valid members of the geometric sequence. Due to the
+	 * properties of a geometric sequence, it is not possible for the sum of
+	 * these tables to exceed the value of the ending point table.
+	 */
+	for (i = n - 1; i > 0; i--) {
+		if (sizes[i - 1] < sizes[i] * 2) {
+			seg.end = i + 1;
+			bytes = sizes[i];
 			break;
+		}
+	}
+
+	/*
+	 * Find the starting table of the compaction segment by iterating
+	 * through the remaining tables and keeping track of the accumulated
+	 * size of all tables seen from the segment end table.
+	 *
+	 * Note that we keep iterating even after we have found the first
+	 * starting point. This is because there may be tables in the stack
+	 * preceding that first starting point which violate the geometric
+	 * sequence.
+	 */
+	for (; i > 0; i--) {
+		uint64_t curr = bytes;
+		bytes += sizes[i - 1];
 
-		min_seg.start = prev;
-		min_seg.bytes += sizes[prev];
+		if (sizes[i - 1] < curr * 2) {
+			seg.start = i - 1;
+			seg.bytes = bytes;
+		}
 	}
 
-	reftable_free(segs);
-	return min_seg;
+	return seg;
 }
 
 static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
diff --git a/reftable/stack.h b/reftable/stack.h
index d919455669e..656f896cc28 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
 
 struct segment {
 	size_t start, end;
-	int log;
 	uint64_t bytes;
 };
 
-int fastlog2(uint64_t sz);
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
 
 #endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 509f4866236..e5f6ff5c9e4 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -720,59 +720,14 @@ static void test_reftable_stack_hash_id(void)
 	clear_dir(dir);
 }
 
-static void test_log2(void)
-{
-	EXPECT(1 == fastlog2(3));
-	EXPECT(2 == fastlog2(4));
-	EXPECT(2 == fastlog2(5));
-}
-
-static void test_sizes_to_segments(void)
-{
-	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
-	/* .................0 1 2 3 4 5 */
-
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(segs[2].log == 3);
-	EXPECT(segs[2].start == 5);
-	EXPECT(segs[2].end == 6);
-
-	EXPECT(segs[1].log == 2);
-	EXPECT(segs[1].start == 2);
-	EXPECT(segs[1].end == 5);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_empty(void)
-{
-	size_t seglen = 0;
-	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
-	EXPECT(seglen == 0);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_all_equal(void)
-{
-	uint64_t sizes[] = { 5, 5 };
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(seglen == 1);
-	EXPECT(segs[0].start == 0);
-	EXPECT(segs[0].end == 2);
-	reftable_free(segs);
-}
-
 static void test_suggest_compaction_segment(void)
 {
-	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
 	/* .................0 1 2 3 4 5 6 */
 	struct segment min =
 		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
-	EXPECT(min.start == 2);
-	EXPECT(min.end == 7);
+	EXPECT(min.start == 1);
+	EXPECT(min.end == 10);
 }
 
 static void test_suggest_compaction_segment_nothing(void)
@@ -884,6 +839,17 @@ static void test_empty_add(void)
 	reftable_stack_destroy(st2);
 }
 
+static int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
 static void test_reftable_stack_auto_compaction(void)
 {
 	struct reftable_write_options cfg = { 0 };
@@ -1072,7 +1038,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
 int stack_test_main(int argc, const char *argv[])
 {
 	RUN_TEST(test_empty_add);
-	RUN_TEST(test_log2);
 	RUN_TEST(test_names_equal);
 	RUN_TEST(test_parse_names);
 	RUN_TEST(test_read_file);
@@ -1092,9 +1057,6 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_update_index_check);
 	RUN_TEST(test_reftable_stack_uptodate);
 	RUN_TEST(test_reftable_stack_validate_refname);
-	RUN_TEST(test_sizes_to_segments);
-	RUN_TEST(test_sizes_to_segments_all_equal);
-	RUN_TEST(test_sizes_to_segments_empty);
 	RUN_TEST(test_suggest_compaction_segment);
 	RUN_TEST(test_suggest_compaction_segment_nothing);
 	return 0;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 686781192eb..e6c3f94d874 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -293,12 +293,24 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag A &&
-	test_line_count = 2 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag B &&
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
+test_expect_success 'ref transaction: alternating table sizes are compacted' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	for i in $(test_seq 20)
+	do
+		git -C repo branch -f foo &&
+		git -C repo branch -d foo || return 1
+	done &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
 check_fsync_events () {
 	local trace="$1" &&
 	shift &&
@@ -324,7 +336,7 @@ test_expect_success 'ref transaction: writes are synced' '
 	git -C repo -c core.fsync=reference \
 		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
 	check_fsync_events trace2.txt <<-EOF
-	"name":"hardware-flush","count":2
+	"name":"hardware-flush","count":4
 	EOF
 '
 
@@ -346,8 +358,8 @@ test_expect_success 'pack-refs: compacts tables' '
 
 	test_commit -C repo A &&
 	ls -1 repo/.git/reftable >table-files &&
-	test_line_count = 4 table-files &&
-	test_line_count = 3 repo/.git/reftable/tables.list &&
+	test_line_count = 3 table-files &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
 
 	git -C repo pack-refs &&
 	ls -1 repo/.git/reftable >table-files &&
@@ -379,7 +391,7 @@ do
 			umask $umask &&
 			git init --shared=true repo &&
 			test_commit -C repo A &&
-			test_line_count = 3 repo/.git/reftable/tables.list
+			test_line_count = 2 repo/.git/reftable/tables.list
 		) &&
 		git -C repo pack-refs &&
 		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
@@ -747,12 +759,13 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
-	git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
 	git -C repo pack-refs &&
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
@@ -760,19 +773,21 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
-	git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
 	git -C worktree pack-refs &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list
+	test_line_count = 3 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: creating shared ref updates main stack' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
+	test_commit -C repo B &&
 
 	git -C repo worktree add ../worktree &&
 	git -C repo pack-refs &&
@@ -780,7 +795,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
-	git -C worktree update-ref refs/heads/shared HEAD &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 2 repo/.git/reftable/tables.list
 '
-- 
gitgitgadget
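The minimal C model below shows why the removed `fastlog2()`-based strategy let alternating table sizes pile up. It simplifies the selection step under stated assumptions. `old_candidate_run_len()` is a hypothetical name, and it assumes non-zero table sizes. It returns only the length of the run the old code would have picked. It omits the old backward-extension step. That step can only grow a run that already exists, so omitting it does not change whether compaction happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* floor(log2(sz)), mapping 0 to 0, as in the removed fastlog2(). */
static int fastlog2(uint64_t sz)
{
	int l = 0;

	if (sz == 0)
		return 0;
	for (; sz; sz /= 2)
		l++;
	return l - 1;
}

/*
 * Simplified model of the removed strategy: group contiguous tables with
 * equal fastlog2() into runs and return the length of the lowest-log run
 * that has more than one table. 0 means "no compaction".
 */
static size_t old_candidate_run_len(const uint64_t *sizes, size_t n)
{
	int best_log = 64;
	size_t best_len = 0;
	size_t i = 0;

	assert(n == 0 || sizes != NULL);
	while (i < n) {
		int log = fastlog2(sizes[i]);
		size_t j = i + 1;

		while (j < n && fastlog2(sizes[j]) == log)
			j++;
		/* Single-table runs were skipped by the old code. */
		if (j - i > 1 && log < best_log) {
			best_log = log;
			best_len = j - i;
		}
		i = j;
	}
	return best_len;
}
```

Consider an illustrative alternating stack such as `{ 4096, 150, 90, 150, 90, 150, 90 }`. These sizes are made up, not measured from `git branch -f`/`-d`. `fastlog2()` gives 12, 7, 6, 7, 6, 7, 6, so every run holds a single table. The model therefore returns 0 no matter how many pairs are appended. For the same stack, the geometric strategy in this patch would select tables 1 through 6 for compaction.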
* Re: [PATCH v2 2/3] reftable/stack: use geometric table compaction
  2024-03-21 22:40 ` [PATCH v2 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
@ 2024-03-22 1:25 ` Patrick Steinhardt
  2024-03-27 13:24 ` Karthik Nayak
  1 sibling, 0 replies; 52+ messages in thread
From: Patrick Steinhardt @ 2024-03-22 1:25 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget; +Cc: git, Justin Tobler

On Thu, Mar 21, 2024 at 10:40:18PM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
>
> To reduce the number of on-disk reftables, compaction is performed.
> Contiguous tables with the same binary log value of size are grouped
> into segments. The segment that has both the lowest binary log value and
> contains more than one table is set as the starting point when
> identifying the compaction segment.
>
> Since segments containing a single table are not initially considered
> for compaction, if the table appended to the list does not match the
> previous table log value, no compaction occurs for the new table. It is
> therefore possible for unbounded growth of the table list. This can be
> demonstrated by repeating the following sequence:
>
>   git branch -f foo
>   git branch -d foo
>
> Each operation results in a new table being written with no compaction
> occurring until a separate operation produces a table matching the
> previous table log value.
>
> Instead, to avoid unbounded growth of the table list, the compaction
> strategy is updated to ensure tables follow a geometric sequence after
> each operation. This is done by walking the table list in reverse index
> order to identify the compaction segment start and end. The compaction
> segment end is found by identifying the first table which has a
> preceding table size less than twice the current table. Next, the
> compaction segment start is found iterating through the remaining tables
> in the list checking if the previous table size is less than twice the
> cumulative of tables from the segment end. This ensures the correct
> segment start is found and that the newly compacted table does not
> violate the geometric sequence.

I don't think we need to go into so much detail about how exactly the
algorithm works -- these kinds of comments should ideally exist in the
code. What would be more interesting to explain here is _why_ we chose
the new algorithm over the old one instead of just trying to fix the
issue.

Other than that this patch LGTM.

Patrick

> When creating 10 thousand references, the new strategy has no
> performance impact:
>
>   Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
>     Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
>     Range (min … max): 26.447 s … 26.569 s 10 runs
>
>   Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
>     Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
>     Range (min … max): 26.366 s … 26.444 s 10 runs
>
>   Summary
>     update-ref: create refs sequentially (revision = HEAD) ran
>       1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
>
> Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
> tables and are therefore updated to specify the correct new table count.
> Since compaction is more aggressive in ensuring tables maintain a
> geometric sequence, the expected table count is reduced in these tests.
> In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
> removed because the function is no longer needed. Also, the
> `test_suggest_compaction_segment()` test is updated to better showcase
> and reflect the new geometric compaction behavior.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  reftable/stack.c           | 109 ++++++++++++++++---------------------
>  reftable/stack.h           |   3 -
>  reftable/stack_test.c      |  66 +++++-----------------
>  t/t0610-reftable-basics.sh |  43 ++++++++++-----
>  4 files changed, 91 insertions(+), 130 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 2370d93d13b..ef55dc75cde 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1214,75 +1214,62 @@ static int segment_size(struct segment *s)
>  	return s->end - s->start;
>  }
>
> -int fastlog2(uint64_t sz)
> -{
> -	int l = 0;
> -	if (sz == 0)
> -		return 0;
> -	for (; sz; sz /= 2) {
> -		l++;
> -	}
> -	return l - 1;
> -}
> -
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
> -{
> -	struct segment *segs = reftable_calloc(n, sizeof(*segs));
> -	struct segment cur = { 0 };
> -	size_t next = 0, i;
> -
> -	if (n == 0) {
> -		*seglen = 0;
> -		return segs;
> -	}
> -	for (i = 0; i < n; i++) {
> -		int log = fastlog2(sizes[i]);
> -		if (cur.log != log && cur.bytes > 0) {
> -			struct segment fresh = {
> -				.start = i,
> -			};
> -
> -			segs[next++] = cur;
> -			cur = fresh;
> -		}
> -
> -		cur.log = log;
> -		cur.end = i + 1;
> -		cur.bytes += sizes[i];
> -	}
> -	segs[next++] = cur;
> -	*seglen = next;
> -	return segs;
> -}
> -
>  struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
>  {
> -	struct segment min_seg = {
> -		.log = 64,
> -	};
> -	struct segment *segs;
> -	size_t seglen = 0, i;
> -
> -	segs = sizes_to_segments(&seglen, sizes, n);
> -	for (i = 0; i < seglen; i++) {
> -		if (segment_size(&segs[i]) == 1)
> -			continue;
> +	struct segment seg = { 0 };
> +	uint64_t bytes;
> +	size_t i;
>
> -		if (segs[i].log < min_seg.log)
> -			min_seg = segs[i];
> -	}
> +	/*
> +	 * If there are no tables or only a single one then we don't have to
> +	 * compact anything. The sequence is geometric by definition already.
> +	 */
> +	if (n <= 1)
> +		return seg;
>
> -	while (min_seg.start > 0) {
> -		size_t prev = min_seg.start - 1;
> -		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
> +	/*
> +	 * Find the ending table of the compaction segment needed to restore the
> +	 * geometric sequence.
> +	 *
> +	 * To do so, we iterate backwards starting from the most recent table
> +	 * until a valid segment end is found. If the preceding table is smaller
> +	 * than the current table multiplied by the geometric factor (2), the
> +	 * current table is set as the compaction segment end.
> +	 *
> +	 * Tables after the ending point are not added to the byte count because
> +	 * they are already valid members of the geometric sequence. Due to the
> +	 * properties of a geometric sequence, it is not possible for the sum of
> +	 * these tables to exceed the value of the ending point table.
> +	 */
> +	for (i = n - 1; i > 0; i--) {
> +		if (sizes[i - 1] < sizes[i] * 2) {
> +			seg.end = i + 1;
> +			bytes = sizes[i];
>  			break;
> +		}
> +	}
> +
> +	/*
> +	 * Find the starting table of the compaction segment by iterating
> +	 * through the remaining tables and keeping track of the accumulated
> +	 * size of all tables seen from the segment end table.
> +	 *
> +	 * Note that we keep iterating even after we have found the first
> +	 * starting point. This is because there may be tables in the stack
> +	 * preceding that first starting point which violate the geometric
> +	 * sequence.
> +	 */
> +	for (; i > 0; i--) {
> +		uint64_t curr = bytes;
> +		bytes += sizes[i - 1];
>
> -		min_seg.start = prev;
> -		min_seg.bytes += sizes[prev];
> +		if (sizes[i - 1] < curr * 2) {
> +			seg.start = i - 1;
> +			seg.bytes = bytes;
> +		}
>  	}
>
> -	reftable_free(segs);
> -	return min_seg;
> +	return seg;
>  }
>
>  static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
> diff --git a/reftable/stack.h b/reftable/stack.h
> index d919455669e..656f896cc28 100644
> --- a/reftable/stack.h
> +++ b/reftable/stack.h
> @@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
>
>  struct segment {
>  	size_t start, end;
> -	int log;
>  	uint64_t bytes;
>  };
>
> -int fastlog2(uint64_t sz);
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
>  struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
>
>  #endif
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 509f4866236..e5f6ff5c9e4 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -720,59 +720,14 @@ static void test_reftable_stack_hash_id(void)
>  	clear_dir(dir);
>  }
>
> -static void test_log2(void)
> -{
> -	EXPECT(1 == fastlog2(3));
> -	EXPECT(2 == fastlog2(4));
> -	EXPECT(2 == fastlog2(5));
> -}
> -
> -static void test_sizes_to_segments(void)
> -{
> -	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
> -	/* .................0 1 2 3 4 5 */
> -
> -	size_t seglen = 0;
> -	struct segment *segs =
> -		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> -	EXPECT(segs[2].log == 3);
> -	EXPECT(segs[2].start == 5);
> -	EXPECT(segs[2].end == 6);
> -
> -	EXPECT(segs[1].log == 2);
> -	EXPECT(segs[1].start == 2);
> -	EXPECT(segs[1].end == 5);
> -	reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_empty(void)
> -{
> -	size_t seglen = 0;
> -	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
> -	EXPECT(seglen == 0);
> -	reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_all_equal(void)
> -{
> -	uint64_t sizes[] = { 5, 5 };
> -	size_t seglen = 0;
> -	struct segment *segs =
> -		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> -	EXPECT(seglen == 1);
> -	EXPECT(segs[0].start == 0);
> -	EXPECT(segs[0].end == 2);
> -	reftable_free(segs);
> -}
> -
>  static void test_suggest_compaction_segment(void)
>  {
> -	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
> +	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
>  	/* .................0 1 2 3 4 5 6 */
>  	struct segment min =
>  		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
> -	EXPECT(min.start == 2);
> -	EXPECT(min.end == 7);
> +	EXPECT(min.start == 1);
> +	EXPECT(min.end == 10);
>  }
>
>  static void test_suggest_compaction_segment_nothing(void)
> @@ -884,6 +839,17 @@ static void test_empty_add(void)
>  	reftable_stack_destroy(st2);
>  }
>
> +static int fastlog2(uint64_t sz)
> +{
> +	int l = 0;
> +	if (sz == 0)
> +		return 0;
> +	for (; sz; sz /= 2) {
> +		l++;
> +	}
> +	return l - 1;
> +}
> +
>  static void test_reftable_stack_auto_compaction(void)
>  {
>  	struct reftable_write_options cfg = { 0 };
> @@ -1072,7 +1038,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
>  int stack_test_main(int argc, const char *argv[])
>  {
>  	RUN_TEST(test_empty_add);
> -	RUN_TEST(test_log2);
>  	RUN_TEST(test_names_equal);
>  	RUN_TEST(test_parse_names);
>  	RUN_TEST(test_read_file);
> @@ -1092,9 +1057,6 @@ int stack_test_main(int argc, const char *argv[])
>  	RUN_TEST(test_reftable_stack_update_index_check);
>  	RUN_TEST(test_reftable_stack_uptodate);
>  	RUN_TEST(test_reftable_stack_validate_refname);
> -	RUN_TEST(test_sizes_to_segments);
> -	RUN_TEST(test_sizes_to_segments_all_equal);
> -	RUN_TEST(test_sizes_to_segments_empty);
>  	RUN_TEST(test_suggest_compaction_segment);
>  	RUN_TEST(test_suggest_compaction_segment_nothing);
>  	return 0;
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 686781192eb..e6c3f94d874 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -293,12 +293,24 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
>  	test_line_count = 1 repo/.git/reftable/tables.list &&
>
>  	test_commit -C repo --no-tag A &&
> -	test_line_count = 2 repo/.git/reftable/tables.list &&
> +	test_line_count = 1 repo/.git/reftable/tables.list &&
>
>  	test_commit -C repo --no-tag B &&
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>
> +test_expect_success 'ref transaction: alternating table sizes are compacted' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	test_commit -C repo A &&
> +	for i in $(test_seq 20)
> +	do
> +		git -C repo branch -f foo &&
> +		git -C repo branch -d foo || return 1
> +	done &&
> +	test_line_count = 2 repo/.git/reftable/tables.list
> +'
> +
>  check_fsync_events () {
>  	local trace="$1" &&
>  	shift &&
> @@ -324,7 +336,7 @@ test_expect_success 'ref transaction: writes are synced' '
>  	git -C repo -c core.fsync=reference \
>  		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
>  	check_fsync_events trace2.txt <<-EOF
> -	"name":"hardware-flush","count":2
> +	"name":"hardware-flush","count":4
>  	EOF
>  '
>
> @@ -346,8 +358,8 @@ test_expect_success 'pack-refs: compacts tables' '
>
>  	test_commit -C repo A &&
>  	ls -1 repo/.git/reftable >table-files &&
> -	test_line_count = 4 table-files &&
> -	test_line_count = 3 repo/.git/reftable/tables.list &&
> +	test_line_count = 3 table-files &&
> +	test_line_count = 2 repo/.git/reftable/tables.list &&
>
>  	git -C repo pack-refs &&
>  	ls -1 repo/.git/reftable >table-files &&
> @@ -379,7 +391,7 @@ do
>  			umask $umask &&
>  			git init --shared=true repo &&
>  			test_commit -C repo A &&
> -			test_line_count = 3 repo/.git/reftable/tables.list
> +			test_line_count = 2 repo/.git/reftable/tables.list
>  		) &&
>  		git -C repo pack-refs &&
>  		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
> @@ -747,12 +759,13 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
>  	test_when_finished "rm -rf repo worktree" &&
>  	git init repo &&
>  	test_commit -C repo A &&
> -	git -C repo worktree add ../worktree &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
>
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list &&
> +	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 3 repo/.git/reftable/tables.list &&
>  	git -C repo pack-refs &&
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>
> @@ -760,19 +773,21 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
>  	test_when_finished "rm -rf repo worktree" &&
>  	git init repo &&
>  	test_commit -C repo A &&
> -	git -C repo worktree add ../worktree &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
>
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list &&
> +	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 3 repo/.git/reftable/tables.list &&
>  	git -C worktree pack-refs &&
>  	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list
> +	test_line_count = 3 repo/.git/reftable/tables.list
>  '
>
>  test_expect_success 'worktree: creating shared ref updates main stack' '
>  	test_when_finished "rm -rf repo worktree" &&
>  	git init repo &&
>  	test_commit -C repo A &&
> +	test_commit -C repo B &&
>
>  	git -C repo worktree add ../worktree
&& > git -C repo pack-refs && > @@ -780,7 +795,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' ' > test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && > test_line_count = 1 repo/.git/reftable/tables.list && > > - git -C worktree update-ref refs/heads/shared HEAD && > + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD && > test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && > test_line_count = 2 repo/.git/reftable/tables.list > ' > -- > gitgitgadget > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
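The removed `sizes_to_segments()` grouping can be illustrated with a deliberately simplified C sketch. It assumes only the first step of the old strategy matters here: contiguous tables are bucketed by `fastlog2()` of their size, and only a bucket holding more than one table could seed a compaction. `has_multi_table_segment()` is a hypothetical helper written for this illustration, and it ignores the backward extension the old `suggest_compaction_segment()` performed. It shows why a stack whose newest tables alternate between two log values, as in the `git branch -f foo` / `git branch -d foo` loop, never offers a starting segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Floor of log2(sz), written like the fastlog2() quoted above; 0 maps to 0. */
static int fastlog2(uint64_t sz)
{
	int l = 0;
	if (sz == 0)
		return 0;
	for (; sz; sz /= 2)
		l++;
	return l - 1;
}

/*
 * Hypothetical helper (illustration only): returns 1 if two neighbouring
 * tables share the same fastlog2() value, i.e. some contiguous log bucket
 * holds more than one table, and 0 otherwise. In this simplified model of
 * the old strategy, only such a bucket could start a compaction.
 */
static int has_multi_table_segment(const uint64_t *sizes, size_t n)
{
	size_t i;

	assert(sizes != NULL || n == 0);
	for (i = 1; i < n; i++) {
		if (fastlog2(sizes[i - 1]) == fastlog2(sizes[i]))
			return 1;
	}
	return 0;
}
```

For example, sizes `{ 512, 9, 2, 9, 2 }` have log values 9, 3, 1, 3, 1, so no neighbours share a bucket and the function returns 0; appending another alternating `9, 2` pair does not change that, which is how the table list can keep growing.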
* Re: [PATCH v2 2/3] reftable/stack: use geometric table compaction
  2024-03-21 22:40 ` [PATCH v2 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
  2024-03-22  1:25   ` Patrick Steinhardt
@ 2024-03-27 13:24   ` Karthik Nayak
  1 sibling, 0 replies; 52+ messages in thread
From: Karthik Nayak @ 2024-03-27 13:24 UTC
  To: Justin Tobler via GitGitGadget, git; +Cc: Patrick Steinhardt, Justin Tobler

"Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Justin Tobler <jltobler@gmail.com>
>
> To reduce the number of on-disk reftables, compaction is performed.
> Contiguous tables with the same binary log value of size are grouped
> into segments. The segment that has both the lowest binary log value and
> contains more than one table is set as the starting point when
> identifying the compaction segment.
>
> Since segments containing a single table are not initially considered
> for compaction, if the table appended to the list does not match the
> previous table log value, no compaction occurs for the new table. It is
> therefore possible for unbounded growth of the table list. This can be
> demonstrated by repeating the following sequence:
>

Nit: A numerical example would really help make this simpler to understand.

> +	/*
> +	 * Find the ending table of the compaction segment needed to restore the
> +	 * geometric sequence.
> +	 *
> +	 * To do so, we iterate backwards starting from the most recent table
> +	 * until a valid segment end is found. If the preceding table is smaller
> +	 * than the current table multiplied by the geometric factor (2), the
> +	 * current table is set as the compaction segment end.
> +	 *
> +	 * Tables after the ending point are not added to the byte count because
> +	 * they are already valid members of the geometric sequence. Due to the
> +	 * properties of a geometric sequence, it is not possible for the sum of
> +	 * these tables to exceed the value of the ending point table.
> +	 */
> +	for (i = n - 1; i > 0; i--) {
> +		if (sizes[i - 1] < sizes[i] * 2) {
> +			seg.end = i + 1;
> +			bytes = sizes[i];
>  			break;
> +		}
> +	}
> +
> +	/*
> +	 * Find the starting table of the compaction segment by iterating
> +	 * through the remaining tables and keeping track of the accumulated
> +	 * size of all tables seen from the segment end table.
> +	 *

Nit: we need the accumulated sum because the tables from the end of the
segment will be recursively merged backwards. This might be worthwhile
to add here.

>  static void test_suggest_compaction_segment(void)
>  {
> -	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
> +	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
>  	/* .................0 1 2 3 4 5 6 */

Nit: since we're here, maybe worthwhile cleaning up this comment. Not
sure what it actually is for.
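Karthik asks for a numerical example. Below is a minimal, self-contained C sketch of the two passes described in the quoted comments. It mirrors the patch's logic with a hard-coded factor of 2 and the exclusive `end` used in this revision, but it is a standalone illustration, not the code in `reftable/stack.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shaped like the patch's struct segment: [start, end) plus a byte count. */
struct segment {
	size_t start, end;
	uint64_t bytes;
};

/*
 * Sketch of the geometric (factor 2) selection with an exclusive end.
 * Returns a zeroed segment when the stack already forms a geometric sequence.
 */
static struct segment suggest_compaction_segment(const uint64_t *sizes, size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	if (n <= 1)
		return seg; /* zero or one table is trivially geometric */

	/* Pass 1: newest table whose predecessor is less than twice its size. */
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}

	/*
	 * Pass 2: walk toward older tables, accumulating sizes. The predecessor
	 * is compared against the running sum because everything after it gets
	 * merged into one table. Keep going after the first hit: an older table
	 * may still be smaller than twice the grown sum.
	 */
	for (; i > 0; i--) {
		uint64_t curr = bytes;
		bytes += sizes[i - 1];
		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}

	return seg;
}
```

On the test vector `{ 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 }`, the first pass stops at the final 16 (its predecessor, 2, is below 32). The second pass then absorbs every table back to the 64, but not the 512, because 512 is at least twice the accumulated 158. That yields start 1, end 10 (exclusive) and 158 bytes, matching the updated `EXPECT`s.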
* [PATCH v2 3/3] reftable/segment: make segment end inclusive
  2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
  2024-03-21 22:40   ` [PATCH v2 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
  2024-03-21 22:40   ` [PATCH v2 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
@ 2024-03-21 22:40   ` Justin Tobler via GitGitGadget
  2024-03-22  1:25   ` [PATCH v2 0/3] reftable/stack: use geometric table compaction Patrick Steinhardt
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-03-21 22:40 UTC
  To: git; +Cc: Patrick Steinhardt, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

For a reftable segment, the start of the range is inclusive and the end
is exclusive. In practice we increment the end when creating the
compaction segment only to decrement the segment end when using it.

Simplify by making the segment end inclusive. The corresponding test,
`test_suggest_compaction_segment()`, is updated to show that the segment
end is now inclusive.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c      | 4 ++--
 reftable/stack_test.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index ef55dc75cde..0c16ae9b12d 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1243,7 +1243,7 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 	 */
 	for (i = n - 1; i > 0; i--) {
 		if (sizes[i - 1] < sizes[i] * 2) {
-			seg.end = i + 1;
+			seg.end = i;
 			bytes = sizes[i];
 			break;
 		}
@@ -1292,7 +1292,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
 		suggest_compaction_segment(sizes, st->merged->stack_len);
 	reftable_free(sizes);
 	if (segment_size(&seg) > 0)
-		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+		return stack_compact_range_stats(st, seg.start, seg.end,
 						 NULL);
 
 	return 0;
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index e5f6ff5c9e4..85600a9573e 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -727,7 +727,7 @@ static void test_suggest_compaction_segment(void)
 	struct segment min =
 		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
 	EXPECT(min.start == 1);
-	EXPECT(min.end == 10);
+	EXPECT(min.end == 9);
 }
 
 static void test_suggest_compaction_segment_nothing(void)
-- 
gitgitgadget
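With an inclusive end, the selected segment can be handed directly to a routine that takes first and last indexes, which is what `stack_compact_range_stats(st, seg.start, seg.end, NULL)` now receives. The sketch below shows the bookkeeping under that convention, using the test vector from `test_suggest_compaction_segment()`. `tables_in_segment()` and `bytes_in_range()` are hypothetical helpers for illustration, and `segment_size()` is assumed here to compute `end - start`, so `segment_size() > 0` means at least two tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Segment with an inclusive end, as introduced by this patch: [start, end]. */
struct segment {
	size_t start, end;
	uint64_t bytes;
};

/* Assumed definition: with an inclusive end this is "tables minus one". */
static int segment_size(const struct segment *s)
{
	return s->end - s->start;
}

/* Hypothetical helper: number of tables a compaction of [start, end] merges. */
static size_t tables_in_segment(const struct segment *s)
{
	assert(s->end >= s->start);
	return s->end - s->start + 1;
}

/* Hypothetical helper: total size of the tables in [start, end]. */
static uint64_t bytes_in_range(const uint64_t *sizes, const struct segment *s)
{
	uint64_t total = 0;
	size_t i;

	for (i = s->start; i <= s->end; i++)
		total += sizes[i];
	return total;
}
```

For the test vector, the inclusive segment `{ .start = 1, .end = 9 }` covers nine tables totalling 158 bytes, and a zeroed segment has `segment_size()` 0, so auto-compaction is skipped.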
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction
  2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
                     ` (2 preceding siblings ...)
  2024-03-21 22:40   ` [PATCH v2 3/3] reftable/segment: make segment end inclusive Justin Tobler via GitGitGadget
@ 2024-03-22  1:25   ` Patrick Steinhardt
  2024-04-03 10:13     ` Han-Wen Nienhuys
  2024-03-29  4:16   ` [PATCH v3 " Justin Tobler via GitGitGadget
  2024-04-03 19:12   ` [PATCH v2 " Junio C Hamano
  5 siblings, 1 reply; 52+ messages in thread
From: Patrick Steinhardt @ 2024-03-22  1:25 UTC
  To: Justin Tobler via GitGitGadget
  Cc: git, Justin Tobler, Josh Steadmon, Han-Wen Nienhuys

On Thu, Mar 21, 2024 at 10:40:16PM +0000, Justin Tobler via GitGitGadget wrote:
> Hello again,
>
> This is the second version of my patch series that refactors the reftable
> compaction strategy to instead follow a geometric sequence. Changes compared
> to v1:
>
>  * Added GIT_TEST_REFTABLE_NO_AUTOCOMPACTION environment variable to disable
>    reftable compaction when testing.
>  * Refactored worktree tests in t0610-reftable-basics.sh to properly assert
>    git-pack-refs(1) works as expected.
>  * Added test to validate that alternating table sizes are compacted.
>  * Added benchmark to compare compaction strategies.
>  * Moved change that made compaction segment end inclusive to its own
>    commit.
>  * Added additional explanation in commits and comments and fixed typos.
>
> Thanks for taking a look!

Cc'ing Han-Wen and Josh for additional input. From my point of view the
new algorithm is simpler to understand and less fragile, but I do wonder
whether there is anything that we're missing.

Patrick
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction
  2024-03-22  1:25 ` [PATCH v2 0/3] reftable/stack: use geometric table compaction Patrick Steinhardt
@ 2024-04-03 10:13   ` Han-Wen Nienhuys
  2024-04-03 10:18     ` Patrick Steinhardt
  0 siblings, 1 reply; 52+ messages in thread
From: Han-Wen Nienhuys @ 2024-04-03 10:13 UTC
  To: Patrick Steinhardt
  Cc: Justin Tobler via GitGitGadget, git, Justin Tobler, Josh Steadmon

On Fri, Mar 22, 2024 at 10:51 AM Patrick Steinhardt <ps@pks.im> wrote:
> > Thanks for taking a look!
>
> Cc'ing Han-Wen and Josh for additional input. From my point of view the
> new algorithm is simpler to understand and less fragile, but I do wonder
> whether there is anything that we're missing.

Good spotting. I hadn't thought about alternating tables.

I have one minor criticism:

Environment variables are untyped global variables without any form of
data protection, so I find them unsavoury, and have tried to avoid
them throughout. (The whole reftable library only looks at $TMPDIR in
tests). They're also accessible to end users, so it can become a
feature that can inadvertently become a maintenance burden.

For testing, there is a stack->disable_auto_compact.

If you want to keep that style, I would elevate disable_auto_compact
into reftable_write_options to make it API surface. This will let you
use it in tests written in C, which can be unittests and therefore
more precise and fine-grained. They also run more quickly, and are
easier to instrument with asan/valgrind/etc. The test for tables with
alternating sizes can be easily written in C.

If you really need it, you could initialize disable_auto_compact from
the environment, but I would suggest avoiding it if possible.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
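Han-Wen notes that the alternating-sizes case can be tested in C. A minimal sketch of such a test follows, assuming an in-memory model where a stack is just an array of table sizes and compaction replaces a segment with one table whose size is the segment's sum. That is a simplification: real merged tables can shrink when a deletion cancels an earlier update. `add_and_compact()`, `is_geometric()` and `MAX_TABLES` are hypothetical names, and `suggest_segment()` reimplements the patch's selection rule with the inclusive end from patch 3/3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TABLES 64 /* hypothetical capacity of the in-memory model */

/* Inclusive-end segment, as in patch 3/3. */
struct segment {
	size_t start, end;
	uint64_t bytes;
};

/* Factor-2 geometric selection; end == start means "nothing to compact". */
static struct segment suggest_segment(const uint64_t *sizes, size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	if (n <= 1)
		return seg;
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i;
			bytes = sizes[i];
			break;
		}
	}
	for (; i > 0; i--) {
		uint64_t curr = bytes;
		bytes += sizes[i - 1];
		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}
	return seg;
}

/*
 * Hypothetical model of one ref transaction: append a table, then
 * auto-compact by replacing sizes[start..end] with a single table whose
 * size is their sum (an assumption, see above).
 */
static void add_and_compact(uint64_t *sizes, size_t *n, uint64_t new_size)
{
	struct segment seg;
	size_t i, removed;

	assert(*n < MAX_TABLES);
	sizes[(*n)++] = new_size;
	seg = suggest_segment(sizes, *n);
	if (seg.end == seg.start)
		return;
	sizes[seg.start] = seg.bytes;
	removed = seg.end - seg.start;
	for (i = seg.end + 1; i < *n; i++)
		sizes[i - removed] = sizes[i];
	*n -= removed;
}

/* Every table must be at least twice the size of the next (newer) one. */
static int is_geometric(const uint64_t *sizes, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (sizes[i - 1] < sizes[i] * 2)
			return 0;
	return 1;
}
```

If the stack is geometric before an addition, only the newly appended table can break the sequence, so the chosen segment always ends there and the model stays geometric after every step. With a minimum table size of 60 bytes and 7296 bytes in total, a geometric stack can hold at most six tables, so alternating 100/60-byte additions cannot grow the list without bound.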
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction
  2024-04-03 10:13 ` Han-Wen Nienhuys
@ 2024-04-03 10:18   ` Patrick Steinhardt
  2024-04-03 15:14     ` Justin Tobler
  2024-04-03 16:40     ` Junio C Hamano
  0 siblings, 2 replies; 52+ messages in thread
From: Patrick Steinhardt @ 2024-04-03 10:18 UTC
  To: Han-Wen Nienhuys
  Cc: Justin Tobler via GitGitGadget, git, Justin Tobler, Josh Steadmon

On Wed, Apr 03, 2024 at 12:13:42PM +0200, Han-Wen Nienhuys wrote:
> On Fri, Mar 22, 2024 at 10:51 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > Thanks for taking a look!
> >
> > Cc'ing Han-Wen and Josh for additional input. From my point of view the
> > new algorithm is simpler to understand and less fragile, but I do wonder
> > whether there is anything that we're missing.
>
> Good spotting. I hadn't thought about alternating tables.
>
> I have one minor criticism:
>
> Environment variables are untyped global variables without any form of
> data protection, so I find them unsavoury, and have tried to avoid
> them throughout. (The whole reftable library only looks at $TMPDIR in
> tests). They're also accessible to end users, so it can become a
> feature that can inadvertently become a maintenance burden.
>
> For testing, there is a stack->disable_auto_compact.
>
> If you want to keep that style, I would elevate disable_auto_compact
> into reftable_write_options to make it API surface. This will let you
> use it in tests written in C, which can be unittests and therefore
> more precise and fine-grained. They also run more quickly, and are
> easier to instrument with asan/valgrind/etc. The test for tables with
> alternating sizes can be easily written in C.
>
> If you really need it, you could initialize disable_auto_compact from
> the environment, but I would suggest avoiding it if possible.

That's actually a good point. I think keeping this as an environment
variable isn't too bad as a stop-gap measure for now, and it should be
obvious to users that it's not for general use due to the `GIT_TEST`
prefix.

But I'm definitely supportive of lifting it out of the reftable library
and into the reftable backend so that it is specific to Git, not to the
reftable library.

Patrick
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction
  2024-04-03 10:18 ` Patrick Steinhardt
@ 2024-04-03 15:14   ` Justin Tobler
  2024-04-03 16:40   ` Junio C Hamano
  1 sibling, 0 replies; 52+ messages in thread
From: Justin Tobler @ 2024-04-03 15:14 UTC
  To: Patrick Steinhardt
  Cc: Han-Wen Nienhuys, Justin Tobler via GitGitGadget, git, Josh Steadmon

On 24/04/03 12:18PM, Patrick Steinhardt wrote:
> On Wed, Apr 03, 2024 at 12:13:42PM +0200, Han-Wen Nienhuys wrote:
> > On Fri, Mar 22, 2024 at 10:51 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > > Thanks for taking a look!
> > >
> > > Cc'ing Han-Wen and Josh for additional input. From my point of view the
> > > new algorithm is simpler to understand and less fragile, but I do wonder
> > > whether there is anything that we're missing.
> >
> > Good spotting. I hadn't thought about alternating tables.
> >
> > I have one minor criticism:
> >
> > Environment variables are untyped global variables without any form of
> > data protection, so I find them unsavoury, and have tried to avoid
> > them throughout. (The whole reftable library only looks at $TMPDIR in
> > tests). They're also accessible to end users, so it can become a
> > feature that can inadvertently become a maintenance burden.
> >
> > For testing, there is a stack->disable_auto_compact.
> >
> > If you want to keep that style, I would elevate disable_auto_compact
> > into reftable_write_options to make it API surface. This will let you
> > use it in tests written in C, which can be unittests and therefore
> > more precise and fine-grained. They also run more quickly, and are
> > easier to instrument with asan/valgrind/etc. The test for tables with
> > alternating sizes can be easily written in C.
> >
> > If you really need it, you could initialize disable_auto_compact from
> > the environment, but I would suggest avoiding it if possible.
>
> That's actually a good point. I think keeping this as an environment
> variable isn't too bad as a stop-gap measure for now, and it should be
> obvious to users that it's not for general use due to the `GIT_TEST`
> prefix.
>
> But I'm definitely supportive of lifting it out of the reftable library
> and into the reftable backend so that it is specific to Git, not to the
> reftable library.

Moving the env out of the reftable library seems reasonable to me. I'll
make this change as part of the next version of this series.

-Justin
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction
  2024-04-03 10:18 ` Patrick Steinhardt
  2024-04-03 15:14   ` Justin Tobler
@ 2024-04-03 16:40   ` Junio C Hamano
  1 sibling, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2024-04-03 16:40 UTC
  To: Patrick Steinhardt
  Cc: Han-Wen Nienhuys, Justin Tobler via GitGitGadget, git, Justin Tobler, Josh Steadmon

Patrick Steinhardt <ps@pks.im> writes:

> But I'm definitely supportive of lifting it out of the reftable library
> and into the reftable backend so that it is specific to Git, not to the
> reftable library.

Absolutely. Thanks for bringing up a good point and a nice solution.
I do think it makes sense to handle the environment variable on the Git side.
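The direction agreed here (the library decides only from its options, while Git's reftable backend consults the environment) might look roughly like the following sketch. Every name in it is hypothetical: `struct write_options` stands in for `reftable_write_options` carrying the `disable_auto_compact` field Han-Wen proposes promoting, and `env_bool()` is a simplified stand-in for Git's `git_env_bool()` that accepts fewer spellings than Git does.

```c
#define _POSIX_C_SOURCE 200809L /* for strcasecmp() and setenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical library-side options: the library never reads the environment. */
struct write_options {
	int disable_auto_compact;
};

/*
 * Hypothetical, simplified boolean env parsing: unset means the default;
 * "1", "true", "yes" and "on" mean true; anything else means false.
 */
static int env_bool(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	return !strcmp(v, "1") || !strcasecmp(v, "true") ||
	       !strcasecmp(v, "yes") || !strcasecmp(v, "on");
}

/* Backend side (hypothetical): translate the test knob into an explicit option. */
static struct write_options backend_write_options(void)
{
	struct write_options opts = { 0 };

	opts.disable_auto_compact =
		env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0);
	return opts;
}

/* Library side (hypothetical): decide purely from the options it was given. */
static int should_auto_compact(const struct write_options *opts)
{
	return !opts->disable_auto_compact;
}
```

The design point is that a C unit test can set `disable_auto_compact` directly without touching the environment, while shell tests keep using the `GIT_TEST_` variable through the backend.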
* [PATCH v3 0/3] reftable/stack: use geometric table compaction
  2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
                     ` (3 preceding siblings ...)
  2024-03-22  1:25   ` [PATCH v2 0/3] reftable/stack: use geometric table compaction Patrick Steinhardt
@ 2024-03-29  4:16   ` Justin Tobler via GitGitGadget
  2024-03-29  4:16     ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
                       ` (3 more replies)
  2024-04-03 19:12   ` [PATCH v2 " Junio C Hamano
  5 siblings, 4 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-03-29  4:16 UTC
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Justin Tobler

Hello again,

This is the third version of my patch series that refactors the reftable
compaction strategy to instead follow a geometric sequence. Changes compared
to v2:

 * Added test to validate the GIT_TEST_REFTABLE_NO_AUTOCOMPACTION
   environment variable works as expected.
 * Added additional clarifying comments and examples to explain how the new
   compaction strategy works.
 * Removed outdated comment from stack_test.c test

Thanks for taking a look!

-Justin

Justin Tobler (3):
  reftable/stack: add env to disable autocompaction
  reftable/stack: use geometric table compaction
  reftable/stack: make segment end inclusive

 reftable/stack.c           | 124 ++++++++++++++++++-------------------
 reftable/stack.h           |   3 -
 reftable/stack_test.c      |  67 +++++---------------
 reftable/system.h          |   1 +
 t/t0610-reftable-basics.sh |  58 ++++++++++++-----
 5 files changed, 120 insertions(+), 133 deletions(-)


base-commit: c75fd8d8150afdf836b63a8e0534d9b9e3e111ba
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1683

Range-diff vs v2:

 1:  cb6b152e5c8 ! 1:  2fdd8ea1133 reftable/stack: add env to disable autocompaction
    @@ reftable/stack.c: int reftable_addition_commit(struct reftable_addition *add)
     
      ## reftable/system.h ##
     @@ reftable/system.h: license that can be found in the LICENSE file or at
    - #include "strbuf.h"
    + #include "tempfile.h"
      #include "hash-ll.h" /* hash ID, sizes.*/
      #include "dir.h" /* remove_dir_recursively, for tests.*/
     +#include "parse.h"
      
      int hash_size(uint32_t id);
    + 
    + ## t/t0610-reftable-basics.sh ##
    +@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause auto-compaction' '
    + 	test_line_count = 1 repo/.git/reftable/tables.list
    + '
    + 
    ++test_expect_success 'ref transaction: environment variable disables auto-compaction' '
    ++	test_when_finished "rm -rf repo" &&
    ++
    ++	git init repo &&
    ++	test_commit -C repo A &&
    ++	for i in $(test_seq 20)
    ++	do
    ++		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
    ++	done &&
    ++	test_line_count = 23 repo/.git/reftable/tables.list &&
    ++
    ++	git -C repo update-ref foo HEAD &&
    ++	test_line_count = 1 repo/.git/reftable/tables.list
    ++'
    ++
    + check_fsync_events () {
    + 	local trace="$1" &&
    + 	shift &&
 2:  def70084523 ! 2:  7e62c2286ae reftable/stack: use geometric table compaction
    @@ Commit message
     
         Instead, to avoid unbounded growth of the table list, the compaction
         strategy is updated to ensure tables follow a geometric sequence after
    -    each operation. This is done by walking the table list in reverse index
    -    order to identify the compaction segment start and end. The compaction
    -    segment end is found by identifying the first table which has a
    -    preceding table size less than twice the current table. Next, the
    -    compaction segment start is found iterating through the remaining tables
    -    in the list checking if the previous table size is less than twice the
    -    cumulative of tables from the segment end. This ensures the correct
    -    segment start is found and that the newly compacted table does not
    -    violate the geometric sequence.
    +    each operation by individually evaluating each table in reverse index
    +    order. This strategy results in a much simpler and more robust algorithm
    +    compared to the previous one while also maintaining a minimal ordered
    +    set of tables on-disk.
     
         When creating 10 thousand references, the new strategy has no
         performance impact:
    @@ reftable/stack.c: static int segment_size(struct segment *s)
     +	 * they are already valid members of the geometric sequence. Due to the
     +	 * properties of a geometric sequence, it is not possible for the sum of
     +	 * these tables to exceed the value of the ending point table.
    ++	 *
    ++	 * Example table size sequence requiring no compaction:
    ++	 * 	64, 32, 16, 8, 4, 2, 1
    ++	 *
    ++	 * Example compaction segment end set to table with size 3:
    ++	 * 	64, 32, 16, 8, 4, 3, 1
     +	 */
     +	for (i = n - 1; i > 0; i--) {
     +		if (sizes[i - 1] < sizes[i] * 2) {
    @@ reftable/stack.c: static int segment_size(struct segment *s)
      			break;
     +		}
     +	}
    -+
    + 
    +-	min_seg.start = prev;
    +-	min_seg.bytes += sizes[prev];
     +	/*
     +	 * Find the starting table of the compaction segment by iterating
     +	 * through the remaining tables and keeping track of the accumulated
    -+	 * size of all tables seen from the segment end table.
    ++	 * size of all tables seen from the segment end table. The previous
    ++	 * table is compared to the accumulated size because the tables from the
    ++	 * segment end are merged backwards recursively.
     +	 *
     +	 * Note that we keep iterating even after we have found the first
     +	 * starting point. This is because there may be tables in the stack
     +	 * preceding that first starting point which violate the geometric
     +	 * sequence.
    ++	 *
    ++	 * Example compaction segment start set to table with size 32:
    ++	 * 	128, 32, 16, 8, 4, 3, 1
     +	 */
     +	for (; i > 0; i--) {
     +		uint64_t curr = bytes;
     +		bytes += sizes[i - 1];
    - 
    --	min_seg.start = prev;
    --	min_seg.bytes += sizes[prev];
    ++
     +		if (sizes[i - 1] < curr * 2) {
     +			seg.start = i - 1;
     +			seg.bytes = bytes;
    @@ reftable/stack_test.c: static void test_reftable_stack_hash_id(void)
      static void test_suggest_compaction_segment(void)
      {
     -	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
    +-	/* .................0 1 2 3 4 5 6 */
     +	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
    - 	/* .................0 1 2 3 4 5 6 */
      	struct segment min =
      		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
     -	EXPECT(min.start == 2);
    @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause a
      
      	test_commit -C repo --no-tag B &&
      	test_line_count = 1 repo/.git/reftable/tables.list
    +@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: environment variable disables auto-compact
    + 	do
    + 		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
    + 	done &&
    +-	test_line_count = 23 repo/.git/reftable/tables.list &&
    ++	test_line_count = 22 repo/.git/reftable/tables.list &&
    + 
    + 	git -C repo update-ref foo HEAD &&
    + 	test_line_count = 1 repo/.git/reftable/tables.list
     '
     
     +test_expect_success 'ref transaction: alternating table sizes are compacted' '
 3:  a23e3fc6972 ! 3:  9a33914c852 reftable/segment: make segment end inclusive
    @@ Metadata
      Author: Justin Tobler <jltobler@gmail.com>
     
      ## Commit message ##
    -    reftable/segment: make segment end inclusive
    +    reftable/stack: make segment end inclusive
     
         For a reftable segment, the start of the range is inclusive and the end
         is exclusive. In practice we increment the end when creating the

-- 
gitgitgadget
* [PATCH v3 1/3] reftable/stack: add env to disable autocompaction
  2024-03-29  4:16 ` [PATCH v3 " Justin Tobler via GitGitGadget
@ 2024-03-29  4:16   ` Justin Tobler via GitGitGadget
  2024-03-29 18:25     ` Junio C Hamano
                       ` (2 more replies)
  2024-03-29  4:16   ` [PATCH v3 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 3 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-03-29  4:16 UTC
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

In future tests it will be necessary to create repositories with a set
number of tables. To make this easier, introduce the
`GIT_TEST_REFTABLE_NO_AUTOCOMPACTION` environment variable that, when
set, disables autocompaction of reftables.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c           |  2 +-
 reftable/system.h          |  1 +
 t/t0610-reftable-basics.sh | 15 +++++++++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 1ecf1b9751c..07262beaaf7 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -681,7 +681,7 @@ int reftable_addition_commit(struct reftable_addition *add)
 	if (err)
 		goto done;
 
-	if (!add->stack->disable_auto_compact)
+	if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))
 		err = reftable_stack_auto_compact(add->stack);
 
 done:
diff --git a/reftable/system.h b/reftable/system.h
index 5d8b6dede50..05b7c8554af 100644
--- a/reftable/system.h
+++ b/reftable/system.h
@@ -17,6 +17,7 @@ license that can be found in the LICENSE file or at
 #include "tempfile.h"
 #include "hash-ll.h" /* hash ID, sizes.*/
 #include "dir.h" /* remove_dir_recursively, for tests.*/
+#include "parse.h"
 
 int hash_size(uint32_t id);
 
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 686781192eb..434044078ed 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -299,6 +299,21 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
+test_expect_success 'ref transaction: environment variable disables auto-compaction' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+	for i in $(test_seq 20)
+	do
+		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
+	done &&
+	test_line_count = 23 repo/.git/reftable/tables.list &&
+
+	git -C repo update-ref foo HEAD &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
 check_fsync_events () {
 	local trace="$1" &&
 	shift &&
-- 
gitgitgadget
* Re: [PATCH v3 1/3] reftable/stack: add env to disable autocompaction
  2024-03-29  4:16 ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
@ 2024-03-29 18:25   ` Junio C Hamano
  2024-03-29 21:56   ` Junio C Hamano
  2024-04-02  7:23   ` Patrick Steinhardt
  2 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2024-03-29 18:25 UTC
  To: Justin Tobler via GitGitGadget
  Cc: git, Patrick Steinhardt, Karthik Nayak, Justin Tobler

"Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Justin Tobler <jltobler@gmail.com>
>
> In future tests it will be necessary to create repositories with a set
> number of tables. To make this easier, introduce the
> `GIT_TEST_REFTABLE_NO_AUTOCOMPACTION` environment variable that, when
> set, disables autocompaction of reftables.

"when set" -> "when set to true"?

> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  reftable/stack.c           |  2 +-
>  reftable/system.h          |  1 +
>  t/t0610-reftable-basics.sh | 15 +++++++++++++++
>  3 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 1ecf1b9751c..07262beaaf7 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -681,7 +681,7 @@ int reftable_addition_commit(struct reftable_addition *add)
>  	if (err)
>  		goto done;
>  
> -	if (!add->stack->disable_auto_compact)
> +	if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))
>  		err = reftable_stack_auto_compact(add->stack);
>  
>  done:
> diff --git a/reftable/system.h b/reftable/system.h
> index 5d8b6dede50..05b7c8554af 100644
> --- a/reftable/system.h
> +++ b/reftable/system.h
> @@ -17,6 +17,7 @@ license that can be found in the LICENSE file or at
>  #include "tempfile.h"
>  #include "hash-ll.h" /* hash ID, sizes.*/
>  #include "dir.h" /* remove_dir_recursively, for tests.*/
> +#include "parse.h"
>  
>  int hash_size(uint32_t id);
>  
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 686781192eb..434044078ed 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -299,6 +299,21 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>  
> +test_expect_success 'ref transaction: environment variable disables auto-compaction' '
> +	test_when_finished "rm -rf repo" &&
> +
> +	git init repo &&
> +	test_commit -C repo A &&
> +	for i in $(test_seq 20)
> +	do
> +		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
> +	done &&
> +	test_line_count = 23 repo/.git/reftable/tables.list &&

I am not sure if it is a sensible assumption that init + test_commit
(which itself is opaque) will create exactly 3 tables forever, even if
it may happen to be true right now.  Shouldn't you be counting the
lines before entering the for loop and adding 20 to that number to set
the expectation?

> +	git -C repo update-ref foo HEAD &&
> +	test_line_count = 1 repo/.git/reftable/tables.list
> +'
> +
>  check_fsync_events () {
>  	local trace="$1" &&
>  	shift &&
* Re: [PATCH v3 1/3] reftable/stack: add env to disable autocompaction
  2024-03-29 4:16   ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
  2024-03-29 18:25     ` Junio C Hamano
@ 2024-03-29 21:56     ` Junio C Hamano
  2024-04-02 7:23     ` Patrick Steinhardt
  2 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2024-03-29 21:56 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget
  Cc: git, Patrick Steinhardt, Karthik Nayak, Justin Tobler

"Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -	if (!add->stack->disable_auto_compact)
> +	if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))

Fold the line after " &&", i.e.

	if (!add->stack->disable_auto_compact &&
	    !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))

> diff --git a/reftable/system.h b/reftable/system.h
> index 5d8b6dede50..05b7c8554af 100644
> --- a/reftable/system.h
> +++ b/reftable/system.h
> @@ -17,6 +17,7 @@ license that can be found in the LICENSE file or at
>  #include "tempfile.h"
>  #include "hash-ll.h" /* hash ID, sizes.*/
>  #include "dir.h" /* remove_dir_recursively, for tests.*/
> +#include "parse.h"
>
>  int hash_size(uint32_t id);
>
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 686781192eb..434044078ed 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -299,6 +299,21 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>
> +test_expect_success 'ref transaction: environment variable disables auto-compaction' '
> +	test_when_finished "rm -rf repo" &&
> +
> +	git init repo &&
> +	test_commit -C repo A &&
> +	for i in $(test_seq 20)
> +	do
> +		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1

Fold the line before "git", i.e.

		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true \
			git -C repo update-ref branch-$i HEAD || return 1

> +	done &&
> +	test_line_count = 23 repo/.git/reftable/tables.list &&
> +
> +	git -C repo update-ref foo HEAD &&
> +	test_line_count = 1 repo/.git/reftable/tables.list
> +'
> +
>  check_fsync_events () {
>  	local trace="$1" &&
>  	shift &&
* Re: [PATCH v3 1/3] reftable/stack: add env to disable autocompaction
  2024-03-29 4:16   ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
  2024-03-29 18:25     ` Junio C Hamano
  2024-03-29 21:56     ` Junio C Hamano
@ 2024-04-02 7:23     ` Patrick Steinhardt
  2024-04-02 17:23       ` Junio C Hamano
  2 siblings, 1 reply; 52+ messages in thread
From: Patrick Steinhardt @ 2024-04-02 7:23 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget; +Cc: git, Karthik Nayak, Justin Tobler

On Fri, Mar 29, 2024 at 04:16:47AM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
>
> In future tests it will be necessary to create repositories with a set
> number of tables. To make this easier, introduce the
> `GIT_TEST_REFTABLE_NO_AUTOCOMPACTION` environment variable that, when
> set, disables autocompaction of reftables.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  reftable/stack.c           |  2 +-
>  reftable/system.h          |  1 +
>  t/t0610-reftable-basics.sh | 15 +++++++++++++++
>  3 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 1ecf1b9751c..07262beaaf7 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -681,7 +681,7 @@ int reftable_addition_commit(struct reftable_addition *add)
>  	if (err)
>  		goto done;
>
> -	if (!add->stack->disable_auto_compact)
> +	if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))
>  		err = reftable_stack_auto_compact(add->stack);

The double-negation in `GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=false` may be
somewhat hard to parse. Should we rename this to
`GIT_TEST_REFTABLE_AUTOCOMPACTION` with a default value of `1`?

Patrick

>  done:
> diff --git a/reftable/system.h b/reftable/system.h
> index 5d8b6dede50..05b7c8554af 100644
> --- a/reftable/system.h
> +++ b/reftable/system.h
> @@ -17,6 +17,7 @@ license that can be found in the LICENSE file or at
>  #include "tempfile.h"
>  #include "hash-ll.h" /* hash ID, sizes.*/
>  #include "dir.h" /* remove_dir_recursively, for tests.*/
> +#include "parse.h"
>
>  int hash_size(uint32_t id);
>
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 686781192eb..434044078ed 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -299,6 +299,21 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>
> +test_expect_success 'ref transaction: environment variable disables auto-compaction' '
> +	test_when_finished "rm -rf repo" &&
> +
> +	git init repo &&
> +	test_commit -C repo A &&
> +	for i in $(test_seq 20)
> +	do
> +		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
> +	done &&
> +	test_line_count = 23 repo/.git/reftable/tables.list &&
> +
> +	git -C repo update-ref foo HEAD &&
> +	test_line_count = 1 repo/.git/reftable/tables.list
> +'
> +
>  check_fsync_events () {
>  	local trace="$1" &&
>  	shift &&
> --
> gitgitgadget
>
* Re: [PATCH v3 1/3] reftable/stack: add env to disable autocompaction
  2024-04-02 7:23     ` Patrick Steinhardt
@ 2024-04-02 17:23       ` Junio C Hamano
  0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2024-04-02 17:23 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Justin Tobler via GitGitGadget, git, Karthik Nayak, Justin Tobler

Patrick Steinhardt <ps@pks.im> writes:

>> -	if (!add->stack->disable_auto_compact)
>> +	if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))
>>  		err = reftable_stack_auto_compact(add->stack);
>
> The double-negation in `GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=false` may be
> somewhat hard to parse. Should we rename this to
> `GIT_TEST_REFTABLE_AUTOCOMPACTION` with a default value of `1`?

Sounds like a nice improvement.  Also the overlong line should be
folded.

Thanks.
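The rename Patrick proposes (a positively named variable that defaults to `1`) can be sketched in isolation to show how the condition reads without the double negation. This is a minimal sketch, not Git code: `env_bool_sketch()` is a hypothetical, simplified stand-in for Git's `git_env_bool()` (the real helper is stricter and accepts more spellings), and `GIT_TEST_REFTABLE_AUTOCOMPACTION` is only the name proposed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <strings.h>

/*
 * Hypothetical, simplified stand-in for git_env_bool(): an unset variable
 * yields def, common true/false spellings yield 1/0, anything else yields
 * def. Git's real helper behaves differently on unrecognized values.
 */
static int env_bool_sketch(const char *name, int def)
{
	const char *v;

	assert(name != NULL);
	v = getenv(name);
	if (!v)
		return def;
	if (!strcasecmp(v, "1") || !strcasecmp(v, "true") ||
	    !strcasecmp(v, "yes") || !strcasecmp(v, "on"))
		return 1;
	if (!strcasecmp(v, "0") || !strcasecmp(v, "false") ||
	    !strcasecmp(v, "no") || !strcasecmp(v, "off"))
		return 0;
	return def;
}

/*
 * Auto-compact unless the stack disables it or the (proposed) test
 * variable is explicitly set to a false value; unset means "enabled".
 */
static int should_auto_compact(int disable_auto_compact)
{
	return !disable_auto_compact &&
	       env_bool_sketch("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
}
```

With the default of `1`, tests opt out by writing `GIT_TEST_REFTABLE_AUTOCOMPACTION=false`, which states the intent directly instead of `NO_...=true`.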
* [PATCH v3 2/3] reftable/stack: use geometric table compaction
  2024-03-29 4:16 ` [PATCH v3 " Justin Tobler via GitGitGadget
  2024-03-29 4:16   ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
@ 2024-03-29 4:16   ` Justin Tobler via GitGitGadget
  2024-04-02 7:23     ` Patrick Steinhardt
  2024-03-29 4:16   ` [PATCH v3 3/3] reftable/stack: make segment end inclusive Justin Tobler via GitGitGadget
  2024-04-03 0:20   ` [PATCH v4 0/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
  3 siblings, 1 reply; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-03-29 4:16 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:

    git branch -f foo
    git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.

Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.

When creating 10 thousand references, the new strategy has no
performance impact:

    Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
      Time (mean ± σ):     26.516 s ±  0.047 s    [User: 17.864 s, System: 8.491 s]
      Range (min … max):   26.447 s … 26.569 s    10 runs

    Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
      Time (mean ± σ):     26.417 s ±  0.028 s    [User: 17.738 s, System: 8.500 s]
      Range (min … max):   26.366 s … 26.444 s    10 runs

    Summary
      update-ref: create refs sequentially (revision = HEAD) ran
        1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c           | 120 ++++++++++++++++++-------------------
 reftable/stack.h           |   3 -
 reftable/stack_test.c      |  67 +++++----------------
 t/t0610-reftable-basics.sh |  45 +++++++++-----
 4 files changed, 103 insertions(+), 132 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 07262beaaf7..e7b9a1de5a4 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1202,75 +1202,73 @@ static int segment_size(struct segment *s)
 	return s->end - s->start;
 }
 
-int fastlog2(uint64_t sz)
-{
-	int l = 0;
-	if (sz == 0)
-		return 0;
-	for (; sz; sz /= 2) {
-		l++;
-	}
-	return l - 1;
-}
-
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
-{
-	struct segment *segs = reftable_calloc(n, sizeof(*segs));
-	struct segment cur = { 0 };
-	size_t next = 0, i;
-
-	if (n == 0) {
-		*seglen = 0;
-		return segs;
-	}
-	for (i = 0; i < n; i++) {
-		int log = fastlog2(sizes[i]);
-		if (cur.log != log && cur.bytes > 0) {
-			struct segment fresh = {
-				.start = i,
-			};
-
-			segs[next++] = cur;
-			cur = fresh;
-		}
-
-		cur.log = log;
-		cur.end = i + 1;
-		cur.bytes += sizes[i];
-	}
-	segs[next++] = cur;
-	*seglen = next;
-	return segs;
-}
-
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 {
-	struct segment min_seg = {
-		.log = 64,
-	};
-	struct segment *segs;
-	size_t seglen = 0, i;
-
-	segs = sizes_to_segments(&seglen, sizes, n);
-	for (i = 0; i < seglen; i++) {
-		if (segment_size(&segs[i]) == 1)
-			continue;
+	struct segment seg = { 0 };
+	uint64_t bytes;
+	size_t i;
 
-		if (segs[i].log < min_seg.log)
-			min_seg = segs[i];
-	}
+	/*
+	 * If there are no tables or only a single one then we don't have to
+	 * compact anything. The sequence is geometric by definition already.
+	 */
+	if (n <= 1)
+		return seg;
 
-	while (min_seg.start > 0) {
-		size_t prev = min_seg.start - 1;
-		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
+	/*
+	 * Find the ending table of the compaction segment needed to restore the
+	 * geometric sequence.
+	 *
+	 * To do so, we iterate backwards starting from the most recent table
+	 * until a valid segment end is found. If the preceding table is smaller
+	 * than the current table multiplied by the geometric factor (2), the
+	 * current table is set as the compaction segment end.
+	 *
+	 * Tables after the ending point are not added to the byte count because
+	 * they are already valid members of the geometric sequence. Due to the
+	 * properties of a geometric sequence, it is not possible for the sum of
+	 * these tables to exceed the value of the ending point table.
+	 *
+	 * Example table size sequence requiring no compaction:
+	 * 	64, 32, 16, 8, 4, 2, 1
+	 *
+	 * Example compaction segment end set to table with size 3:
+	 * 	64, 32, 16, 8, 4, 3, 1
+	 */
+	for (i = n - 1; i > 0; i--) {
+		if (sizes[i - 1] < sizes[i] * 2) {
+			seg.end = i + 1;
+			bytes = sizes[i];
 			break;
+		}
+	}
 
-		min_seg.start = prev;
-		min_seg.bytes += sizes[prev];
+	/*
+	 * Find the starting table of the compaction segment by iterating
+	 * through the remaining tables and keeping track of the accumulated
+	 * size of all tables seen from the segment end table. The previous
+	 * table is compared to the accumulated size because the tables from the
+	 * segment end are merged backwards recursively.
+	 *
+	 * Note that we keep iterating even after we have found the first
+	 * starting point. This is because there may be tables in the stack
+	 * preceding that first starting point which violate the geometric
+	 * sequence.
+	 *
+	 * Example compaction segment start set to table with size 32:
+	 * 	128, 32, 16, 8, 4, 3, 1
+	 */
+	for (; i > 0; i--) {
+		uint64_t curr = bytes;
+		bytes += sizes[i - 1];
+
+		if (sizes[i - 1] < curr * 2) {
+			seg.start = i - 1;
+			seg.bytes = bytes;
+		}
 	}
 
-	reftable_free(segs);
-	return min_seg;
+	return seg;
 }
 
 static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
diff --git a/reftable/stack.h b/reftable/stack.h
index d919455669e..656f896cc28 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
 
 struct segment {
 	size_t start, end;
-	int log;
 	uint64_t bytes;
 };
 
-int fastlog2(uint64_t sz);
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
 
 #endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 7336757cf53..21541742fe5 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -717,59 +717,13 @@ static void test_reftable_stack_hash_id(void)
 	clear_dir(dir);
 }
 
-static void test_log2(void)
-{
-	EXPECT(1 == fastlog2(3));
-	EXPECT(2 == fastlog2(4));
-	EXPECT(2 == fastlog2(5));
-}
-
-static void test_sizes_to_segments(void)
-{
-	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
-	/* .................0 1 2 3 4 5 */
-
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(segs[2].log == 3);
-	EXPECT(segs[2].start == 5);
-	EXPECT(segs[2].end == 6);
-
-	EXPECT(segs[1].log == 2);
-	EXPECT(segs[1].start == 2);
-	EXPECT(segs[1].end == 5);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_empty(void)
-{
-	size_t seglen = 0;
-	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
-	EXPECT(seglen == 0);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_all_equal(void)
-{
-	uint64_t sizes[] = { 5, 5 };
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(seglen == 1);
-	EXPECT(segs[0].start == 0);
-	EXPECT(segs[0].end == 2);
-	reftable_free(segs);
-}
-
 static void test_suggest_compaction_segment(void)
 {
-	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
-	/* .................0 1 2 3 4 5 6 */
+	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
 	struct segment min =
 		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
-	EXPECT(min.start == 2);
-	EXPECT(min.end == 7);
+	EXPECT(min.start == 1);
+	EXPECT(min.end == 10);
 }
 
 static void test_suggest_compaction_segment_nothing(void)
@@ -880,6 +834,17 @@ static void test_empty_add(void)
 	reftable_stack_destroy(st2);
 }
 
+static int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
 static void test_reftable_stack_auto_compaction(void)
 {
 	struct reftable_write_options cfg = { 0 };
@@ -1068,7 +1033,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
 int stack_test_main(int argc, const char *argv[])
 {
 	RUN_TEST(test_empty_add);
-	RUN_TEST(test_log2);
 	RUN_TEST(test_names_equal);
 	RUN_TEST(test_parse_names);
 	RUN_TEST(test_read_file);
@@ -1088,9 +1052,6 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_update_index_check);
 	RUN_TEST(test_reftable_stack_uptodate);
 	RUN_TEST(test_reftable_stack_validate_refname);
-	RUN_TEST(test_sizes_to_segments);
-	RUN_TEST(test_sizes_to_segments_all_equal);
-	RUN_TEST(test_sizes_to_segments_empty);
 	RUN_TEST(test_suggest_compaction_segment);
 	RUN_TEST(test_suggest_compaction_segment_nothing);
 	return 0;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 434044078ed..b95626549e7 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag A &&
-	test_line_count = 2 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag B &&
 	test_line_count = 1 repo/.git/reftable/tables.list
@@ -308,12 +308,24 @@ test_expect_success 'ref transaction: environment variable disables auto-compact
 	do
 		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
 	done &&
-	test_line_count = 23 repo/.git/reftable/tables.list &&
+	test_line_count = 22 repo/.git/reftable/tables.list &&
 
 	git -C repo update-ref foo HEAD &&
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
+test_expect_success 'ref transaction: alternating table sizes are compacted' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	for i in $(test_seq 20)
+	do
+		git -C repo branch -f foo &&
+		git -C repo branch -d foo || return 1
+	done &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
 check_fsync_events () {
 	local trace="$1" &&
 	shift &&
@@ -339,7 +351,7 @@ test_expect_success 'ref transaction: writes are synced' '
 	git -C repo -c core.fsync=reference \
 		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
 	check_fsync_events trace2.txt <<-EOF
-	"name":"hardware-flush","count":2
+	"name":"hardware-flush","count":4
 	EOF
 '
 
@@ -361,8 +373,8 @@ test_expect_success 'pack-refs: compacts tables' '
 
 	test_commit -C repo A &&
 	ls -1 repo/.git/reftable >table-files &&
-	test_line_count = 4 table-files &&
-	test_line_count = 3 repo/.git/reftable/tables.list &&
+	test_line_count = 3 table-files &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
 
 	git -C repo pack-refs &&
 	ls -1 repo/.git/reftable >table-files &&
@@ -394,7 +406,7 @@ do
 			umask $umask &&
 			git init --shared=true repo &&
 			test_commit -C repo A &&
-			test_line_count = 3 repo/.git/reftable/tables.list
+			test_line_count = 2 repo/.git/reftable/tables.list
 		) &&
 		git -C repo pack-refs &&
 		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
@@ -762,12 +774,13 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
-	git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
 	git -C repo pack-refs &&
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
@@ -775,19 +788,21 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
-	git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
 	git -C worktree pack-refs &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list
+	test_line_count = 3 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: creating shared ref updates main stack' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
+	test_commit -C repo B &&
 
 	git -C repo worktree add ../worktree &&
 	git -C repo pack-refs &&
@@ -795,7 +810,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
-	git -C worktree update-ref refs/heads/shared HEAD &&
+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 2 repo/.git/reftable/tables.list
 '
-- 
gitgitgadget
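The two backward passes in the new `suggest_compaction_segment()` only depend on the table sizes, so they can be tried outside the reftable library. The following is a minimal standalone sketch that mirrors the loop bodies of the patch above (factor 2, exclusive `end`); the name `suggest_segment_sketch()` and the local copy of `struct segment` are assumptions made so the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the patch's struct: start inclusive, end exclusive. */
struct segment {
	size_t start, end;
	uint64_t bytes;
};

static struct segment suggest_segment_sketch(const uint64_t *sizes, size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	assert(sizes != NULL || n == 0);
	if (n <= 1)
		return seg; /* zero or one table is trivially geometric */

	/* Newest table whose predecessor is less than twice its size. */
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}

	/*
	 * Walk towards older tables, accumulating the merged size, and move
	 * the start back whenever an older table is smaller than twice the
	 * accumulated size. If no end was found, i is 0 and this is skipped.
	 */
	for (; i > 0; i--) {
		uint64_t curr = bytes;

		bytes += sizes[i - 1];
		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}
	return seg;
}
```

Run on the examples from the patch's comments, this yields an empty segment for 64, 32, 16, 8, 4, 2, 1 and start 1, end 6 (exclusive) for 128, 32, 16, 8, 4, 3, 1. For the updated unit-test input it yields start 1 and end 10, matching the new `EXPECT`s.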
* Re: [PATCH v3 2/3] reftable/stack: use geometric table compaction
  2024-03-29 4:16   ` [PATCH v3 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
@ 2024-04-02 7:23     ` Patrick Steinhardt
  0 siblings, 0 replies; 52+ messages in thread
From: Patrick Steinhardt @ 2024-04-02 7:23 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget; +Cc: git, Karthik Nayak, Justin Tobler

On Fri, Mar 29, 2024 at 04:16:48AM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
>
> To reduce the number of on-disk reftables, compaction is performed.
> Contiguous tables with the same binary log value of size are grouped
> into segments. The segment that has both the lowest binary log value and
> contains more than one table is set as the starting point when
> identifying the compaction segment.
>
> Since segments containing a single table are not initially considered
> for compaction, if the table appended to the list does not match the
> previous table log value, no compaction occurs for the new table. It is
> therefore possible for unbounded growth of the table list. This can be
> demonstrated by repeating the following sequence:
>
>     git branch -f foo
>     git branch -d foo
>
> Each operation results in a new table being written with no compaction
> occurring until a separate operation produces a table matching the
> previous table log value.
>
> Instead, to avoid unbounded growth of the table list, the compaction
> strategy is updated to ensure tables follow a geometric sequence after
> each operation by individually evaluating each table in reverse index
> order. This strategy results in a much simpler and more robust algorithm
> compared to the previous one while also maintaining a minimal ordered
> set of tables on-disk.
>
> When creating 10 thousand references, the new strategy has no
> performance impact:
>
>     Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
>       Time (mean ± σ):     26.516 s ±  0.047 s    [User: 17.864 s, System: 8.491 s]
>       Range (min … max):   26.447 s … 26.569 s    10 runs
>
>     Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
>       Time (mean ± σ):     26.417 s ±  0.028 s    [User: 17.738 s, System: 8.500 s]
>       Range (min … max):   26.366 s … 26.444 s    10 runs
>
>     Summary
>       update-ref: create refs sequentially (revision = HEAD) ran
>         1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
>
> Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
> tables and are therefore updated to specify the correct new table count.
> Since compaction is more aggressive in ensuring tables maintain a
> geometric sequence, the expected table count is reduced in these tests.
> In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
> removed because the function is no longer needed. Also, the
> `test_suggest_compaction_segment()` test is updated to better showcase
> and reflect the new geometric compaction behavior.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  reftable/stack.c           | 120 ++++++++++++++++++-------------------
>  reftable/stack.h           |   3 -
>  reftable/stack_test.c      |  67 +++++----------------
>  t/t0610-reftable-basics.sh |  45 +++++++++-----
>  4 files changed, 103 insertions(+), 132 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 07262beaaf7..e7b9a1de5a4 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1202,75 +1202,73 @@ static int segment_size(struct segment *s)
>  	return s->end - s->start;
>  }
>
> -int fastlog2(uint64_t sz)
> -{
> -	int l = 0;
> -	if (sz == 0)
> -		return 0;
> -	for (; sz; sz /= 2) {
> -		l++;
> -	}
> -	return l - 1;
> -}
> -
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
> -{
> -	struct segment *segs = reftable_calloc(n, sizeof(*segs));
> -	struct segment cur = { 0 };
> -	size_t next = 0, i;
> -
> -	if (n == 0) {
> -		*seglen = 0;
> -		return segs;
> -	}
> -	for (i = 0; i < n; i++) {
> -		int log = fastlog2(sizes[i]);
> -		if (cur.log != log && cur.bytes > 0) {
> -			struct segment fresh = {
> -				.start = i,
> -			};
> -
> -			segs[next++] = cur;
> -			cur = fresh;
> -		}
> -
> -		cur.log = log;
> -		cur.end = i + 1;
> -		cur.bytes += sizes[i];
> -	}
> -	segs[next++] = cur;
> -	*seglen = next;
> -	return segs;
> -}
> -
>  struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
>  {
> -	struct segment min_seg = {
> -		.log = 64,
> -	};
> -	struct segment *segs;
> -	size_t seglen = 0, i;
> -
> -	segs = sizes_to_segments(&seglen, sizes, n);
> -	for (i = 0; i < seglen; i++) {
> -		if (segment_size(&segs[i]) == 1)
> -			continue;
> +	struct segment seg = { 0 };
> +	uint64_t bytes;
> +	size_t i;
>
> -		if (segs[i].log < min_seg.log)
> -			min_seg = segs[i];
> -	}
> +	/*
> +	 * If there are no tables or only a single one then we don't have to
> +	 * compact anything. The sequence is geometric by definition already.
> +	 */
> +	if (n <= 1)
> +		return seg;
>
> -	while (min_seg.start > 0) {
> -		size_t prev = min_seg.start - 1;
> -		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
> +	/*
> +	 * Find the ending table of the compaction segment needed to restore the
> +	 * geometric sequence.
> +	 *
> +	 * To do so, we iterate backwards starting from the most recent table
> +	 * until a valid segment end is found. If the preceding table is smaller
> +	 * than the current table multiplied by the geometric factor (2), the
> +	 * current table is set as the compaction segment end.
> +	 *
> +	 * Tables after the ending point are not added to the byte count because
> +	 * they are already valid members of the geometric sequence. Due to the
> +	 * properties of a geometric sequence, it is not possible for the sum of
> +	 * these tables to exceed the value of the ending point table.
> +	 *
> +	 * Example table size sequence requiring no compaction:
> +	 * 	64, 32, 16, 8, 4, 2, 1
> +	 *
> +	 * Example compaction segment end set to table with size 3:
> +	 * 	64, 32, 16, 8, 4, 3, 1
> +	 */
> +	for (i = n - 1; i > 0; i--) {
> +		if (sizes[i - 1] < sizes[i] * 2) {
> +			seg.end = i + 1;
> +			bytes = sizes[i];
>  			break;
> +		}
> +	}
>
> -		min_seg.start = prev;
> -		min_seg.bytes += sizes[prev];
> +	/*
> +	 * Find the starting table of the compaction segment by iterating
> +	 * through the remaining tables and keeping track of the accumulated
> +	 * size of all tables seen from the segment end table. The previous
> +	 * table is compared to the accumulated size because the tables from the
> +	 * segment end are merged backwards recursively.
> +	 *
> +	 * Note that we keep iterating even after we have found the first
> +	 * starting point. This is because there may be tables in the stack
> +	 * preceding that first starting point which violate the geometric
> +	 * sequence.
> +	 *
> +	 * Example compaction segment start set to table with size 32:
> +	 * 	128, 32, 16, 8, 4, 3, 1
> +	 */
> +	for (; i > 0; i--) {
> +		uint64_t curr = bytes;
> +		bytes += sizes[i - 1];
> +
> +		if (sizes[i - 1] < curr * 2) {
> +			seg.start = i - 1;
> +			seg.bytes = bytes;
> +		}
>  	}
>
> -	reftable_free(segs);
> -	return min_seg;
> +	return seg;
>  }
>
>  static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
> diff --git a/reftable/stack.h b/reftable/stack.h
> index d919455669e..656f896cc28 100644
> --- a/reftable/stack.h
> +++ b/reftable/stack.h
> @@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
>
>  struct segment {
>  	size_t start, end;
> -	int log;
>  	uint64_t bytes;
>  };
>
> -int fastlog2(uint64_t sz);
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
>  struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
>
>  #endif
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 7336757cf53..21541742fe5 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -717,59 +717,13 @@ static void test_reftable_stack_hash_id(void)
>  	clear_dir(dir);
>  }
>
> -static void test_log2(void)
> -{
> -	EXPECT(1 == fastlog2(3));
> -	EXPECT(2 == fastlog2(4));
> -	EXPECT(2 == fastlog2(5));
> -}
> -
> -static void test_sizes_to_segments(void)
> -{
> -	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
> -	/* .................0 1 2 3 4 5 */
> -
> -	size_t seglen = 0;
> -	struct segment *segs =
> -		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> -	EXPECT(segs[2].log == 3);
> -	EXPECT(segs[2].start == 5);
> -	EXPECT(segs[2].end == 6);
> -
> -	EXPECT(segs[1].log == 2);
> -	EXPECT(segs[1].start == 2);
> -	EXPECT(segs[1].end == 5);
> -	reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_empty(void)
> -{
> -	size_t seglen = 0;
> -	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
> -	EXPECT(seglen == 0);
> -	reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_all_equal(void)
> -{
> -	uint64_t sizes[] = { 5, 5 };
> -	size_t seglen = 0;
> -	struct segment *segs =
> -		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> -	EXPECT(seglen == 1);
> -	EXPECT(segs[0].start == 0);
> -	EXPECT(segs[0].end == 2);
> -	reftable_free(segs);
> -}
> -
>  static void test_suggest_compaction_segment(void)
>  {
> -	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
> -	/* .................0 1 2 3 4 5 6 */
> +	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
>  	struct segment min =
>  		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
> -	EXPECT(min.start == 2);
> -	EXPECT(min.end == 7);
> +	EXPECT(min.start == 1);
> +	EXPECT(min.end == 10);
>  }
>
>  static void test_suggest_compaction_segment_nothing(void)
> @@ -880,6 +834,17 @@ static void test_empty_add(void)
>  	reftable_stack_destroy(st2);
>  }
>
> +static int fastlog2(uint64_t sz)
> +{
> +	int l = 0;
> +	if (sz == 0)
> +		return 0;
> +	for (; sz; sz /= 2) {
> +		l++;
> +	}

Nit: we could drop the curly braces while at it.

> +	return l - 1;
> +}
> +
>  static void test_reftable_stack_auto_compaction(void)
>  {
>  	struct reftable_write_options cfg = { 0 };
> @@ -1068,7 +1033,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
>  int stack_test_main(int argc, const char *argv[])
>  {
>  	RUN_TEST(test_empty_add);
> -	RUN_TEST(test_log2);
>  	RUN_TEST(test_names_equal);
>  	RUN_TEST(test_parse_names);
>  	RUN_TEST(test_read_file);
> @@ -1088,9 +1052,6 @@ int stack_test_main(int argc, const char *argv[])
>  	RUN_TEST(test_reftable_stack_update_index_check);
>  	RUN_TEST(test_reftable_stack_uptodate);
>  	RUN_TEST(test_reftable_stack_validate_refname);
> -	RUN_TEST(test_sizes_to_segments);
> -	RUN_TEST(test_sizes_to_segments_all_equal);
> -	RUN_TEST(test_sizes_to_segments_empty);
>  	RUN_TEST(test_suggest_compaction_segment);
>  	RUN_TEST(test_suggest_compaction_segment_nothing);
>  	return 0;
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 434044078ed..b95626549e7 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
>  	test_line_count = 1 repo/.git/reftable/tables.list &&
>
>  	test_commit -C repo --no-tag A &&
> -	test_line_count = 2 repo/.git/reftable/tables.list &&
> +	test_line_count = 1 repo/.git/reftable/tables.list &&
>
>  	test_commit -C repo --no-tag B &&
>  	test_line_count = 1 repo/.git/reftable/tables.list
> @@ -308,12 +308,24 @@ test_expect_success 'ref transaction: environment variable disables auto-compact
>  	do
>  		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
>  	done &&
> -	test_line_count = 23 repo/.git/reftable/tables.list &&
> +	test_line_count = 22 repo/.git/reftable/tables.list &&
>
>  	git -C repo update-ref foo HEAD &&
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>
> +test_expect_success 'ref transaction: alternating table sizes are compacted' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	test_commit -C repo A &&
> +	for i in $(test_seq 20)

Nit: we could reduce the number of iterations here so that we don't have
to spawn 40 Git commands. Using something like 10 or even 5 iterations
should be sufficient to demonstrate the problem, no?

Patrick
* [PATCH v3 3/3] reftable/stack: make segment end inclusive
  2024-03-29  4:16 ` [PATCH v3 " Justin Tobler via GitGitGadget
  2024-03-29  4:16   ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
  2024-03-29  4:16   ` [PATCH v3 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
@ 2024-03-29  4:16   ` Justin Tobler via GitGitGadget
  2024-03-29 18:36     ` Junio C Hamano
  2024-04-03  0:20   ` [PATCH v4 0/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
  3 siblings, 1 reply; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-03-29  4:16 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

For a reftable segment, the start of the range is inclusive and the end
is exclusive. In practice we increment the end when creating the
compaction segment only to decrement the segment end when using it.

Simplify by making the segment end inclusive. The corresponding test,
`test_suggest_compaction_segment()`, is updated to show that the segment
end is now inclusive.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c      | 4 ++--
 reftable/stack_test.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index e7b9a1de5a4..0973c47dd92 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1237,7 +1237,7 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 	 */
 	for (i = n - 1; i > 0; i--) {
 		if (sizes[i - 1] < sizes[i] * 2) {
-			seg.end = i + 1;
+			seg.end = i;
 			bytes = sizes[i];
 			break;
 		}
@@ -1291,7 +1291,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
 		suggest_compaction_segment(sizes, st->merged->stack_len);
 	reftable_free(sizes);
 	if (segment_size(&seg) > 0)
-		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+		return stack_compact_range_stats(st, seg.start, seg.end,
 						 NULL);
 
 	return 0;
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 21541742fe5..4d7305623a0 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -723,7 +723,7 @@ static void test_suggest_compaction_segment(void)
 	struct segment min =
 		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
 	EXPECT(min.start == 1);
-	EXPECT(min.end == 10);
+	EXPECT(min.end == 9);
 }
 
 static void test_suggest_compaction_segment_nothing(void)
-- 
gitgitgadget
* Re: [PATCH v3 3/3] reftable/stack: make segment end inclusive
  2024-03-29  4:16 ` [PATCH v3 3/3] reftable/stack: make segment end inclusive Justin Tobler via GitGitGadget
@ 2024-03-29 18:36   ` Junio C Hamano
  2024-04-02  7:23     ` Patrick Steinhardt
  0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2024-03-29 18:36 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget
  Cc: git, Patrick Steinhardt, Karthik Nayak, Justin Tobler

"Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Justin Tobler <jltobler@gmail.com>
>
> For a reftable segment, the start of the range is inclusive and the end
> is exclusive. In practice we increment the end when creating the
> compaction segment only to decrement the segment end when using it.
>
> Simplify by making the segment end inclusive. The corresponding test,
> `test_suggest_compaction_segment()`, is updated to show that the segment
> end is now inclusive.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  reftable/stack.c      | 4 ++--
>  reftable/stack_test.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)

I'd defer it to Patrick (and Han-Wen, if he wants to comment on it),
but isn't it a natural expectation shared among CS folks that it is
the most usual way to express a range to use inclusive lower-end and
exclusive upper-end?

After all, that is how an array works, i.e. msg[n] is NULL and
beyond the end where n == strlen(msg).

So, I dunno.
> diff --git a/reftable/stack.c b/reftable/stack.c
> index e7b9a1de5a4..0973c47dd92 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1237,7 +1237,7 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
>  	 */
>  	for (i = n - 1; i > 0; i--) {
>  		if (sizes[i - 1] < sizes[i] * 2) {
> -			seg.end = i + 1;
> +			seg.end = i;
>  			bytes = sizes[i];
>  			break;
>  		}
> @@ -1291,7 +1291,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
>  		suggest_compaction_segment(sizes, st->merged->stack_len);
>  	reftable_free(sizes);
>  	if (segment_size(&seg) > 0)
> -		return stack_compact_range_stats(st, seg.start, seg.end - 1,
> +		return stack_compact_range_stats(st, seg.start, seg.end,
>  						 NULL);
>  
>  	return 0;
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 21541742fe5..4d7305623a0 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -723,7 +723,7 @@ static void test_suggest_compaction_segment(void)
>  	struct segment min =
>  		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
>  	EXPECT(min.start == 1);
> -	EXPECT(min.end == 10);
> +	EXPECT(min.end == 9);
>  }
>  
>  static void test_suggest_compaction_segment_nothing(void)
* Re: [PATCH v3 3/3] reftable/stack: make segment end inclusive
  2024-03-29 18:36   ` Junio C Hamano
@ 2024-04-02  7:23     ` Patrick Steinhardt
  0 siblings, 0 replies; 52+ messages in thread
From: Patrick Steinhardt @ 2024-04-02  7:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Justin Tobler via GitGitGadget, git, Karthik Nayak, Justin Tobler

On Fri, Mar 29, 2024 at 11:36:33AM -0700, Junio C Hamano wrote:
> "Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Justin Tobler <jltobler@gmail.com>
> >
> > For a reftable segment, the start of the range is inclusive and the end
> > is exclusive. In practice we increment the end when creating the
> > compaction segment only to decrement the segment end when using it.
> >
> > Simplify by making the segment end inclusive. The corresponding test,
> > `test_suggest_compaction_segment()`, is updated to show that the segment
> > end is now inclusive.
> >
> > Signed-off-by: Justin Tobler <jltobler@gmail.com>
> > ---
> >  reftable/stack.c      | 4 ++--
> >  reftable/stack_test.c | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
>
> I'd defer it to Patrick (and Han-Wen, if he wants to comment on it),
> but isn't it a natural expectation shared among CS folks that it is
> the most usual way to express a range to use inclusive lower-end and
> exclusive upper-end?
>
> After all, that is how an array works, i.e. msg[n] is NULL and
> beyond the end where n == strlen(msg).
>
> So, I dunno.

I don't really have a strong opinion here, to be honest. I think the
previous way to handle this was fine, the new way is okay, too. Which
may indicate that we can just drop this patch to avoid needless churn
unless somebody feels strongly about this.
Patrick

> > diff --git a/reftable/stack.c b/reftable/stack.c
> > index e7b9a1de5a4..0973c47dd92 100644
> > --- a/reftable/stack.c
> > +++ b/reftable/stack.c
> > @@ -1237,7 +1237,7 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
> >  	 */
> >  	for (i = n - 1; i > 0; i--) {
> >  		if (sizes[i - 1] < sizes[i] * 2) {
> > -			seg.end = i + 1;
> > +			seg.end = i;
> >  			bytes = sizes[i];
> >  			break;
> >  		}
> >
>
> > @@ -1291,7 +1291,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
> >  		suggest_compaction_segment(sizes, st->merged->stack_len);
> >  	reftable_free(sizes);
> >  	if (segment_size(&seg) > 0)
> > -		return stack_compact_range_stats(st, seg.start, seg.end - 1,
> > +		return stack_compact_range_stats(st, seg.start, seg.end,
> >  						 NULL);
> >  
> >  	return 0;
> > diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> > index 21541742fe5..4d7305623a0 100644
> > --- a/reftable/stack_test.c
> > +++ b/reftable/stack_test.c
> > @@ -723,7 +723,7 @@ static void test_suggest_compaction_segment(void)
> >  	struct segment min =
> >  		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
> >  	EXPECT(min.start == 1);
> > -	EXPECT(min.end == 10);
> > +	EXPECT(min.end == 9);
> >  }
> >  
> >  static void test_suggest_compaction_segment_nothing(void)
* [PATCH v4 0/2] reftable/stack: use geometric table compaction
  2024-03-29  4:16 ` [PATCH v3 " Justin Tobler via GitGitGadget
                   ` (2 preceding siblings ...)
  2024-03-29  4:16   ` [PATCH v3 3/3] reftable/stack: make segment end inclusive Justin Tobler via GitGitGadget
@ 2024-04-03  0:20   ` Justin Tobler via GitGitGadget
  2024-04-03  0:20     ` [PATCH v4 1/2] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
                       ` (4 more replies)
  3 siblings, 5 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-04-03  0:20 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Justin Tobler

Hello again,

This is the fourth version of my patch series that refactors the reftable
compaction strategy to instead follow a geometric sequence. Changes
compared to v3:

 * Changed env name from GIT_TEST_REFTABLE_NO_AUTOCOMPACTION to
   GIT_TEST_REFTABLE_AUTOCOMPACTION, which defaults to true; setting it
   to false disables autocompaction. This should hopefully be a bit more
   intuitive since it avoids the double negative.
 * Updated the corresponding env var test in t0610-reftable-basics.sh to
   assert on the number of tables added and be overall less fragile.
 * Folded lines that were too long.
 * Updated some comments in stack.c to more accurately explain that the
   table segment end is exclusive.
 * Dropped the "reftable/stack: make segment end inclusive" commit to
   keep the segment end exclusive and better follow expectations.

Thanks for taking a look!
-Justin

Justin Tobler (2):
  reftable/stack: add env to disable autocompaction
  reftable/stack: use geometric table compaction

 reftable/stack.c           | 126 +++++++++++++++++++------------------
 reftable/stack.h           |   3 -
 reftable/stack_test.c      |  66 ++++---------------
 reftable/system.h          |   1 +
 t/t0610-reftable-basics.sh |  65 +++++++++++++++----
 5 files changed, 132 insertions(+), 129 deletions(-)

base-commit: c75fd8d8150afdf836b63a8e0534d9b9e3e111ba
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1683

Range-diff vs v3:

 1:  2fdd8ea1133 ! 1:  2a0421e5f20 reftable/stack: add env to disable autocompaction
     @@ Commit message
          In future tests it will be necessary to create repositories with a
          set number of tables. To make this easier, introduce the
     -    `GIT_TEST_REFTABLE_NO_AUTOCOMPACTION` environment variable that, when
     -    set, disables autocompaction of reftables.
     +    `GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set
     +    to false, disables autocompaction of reftables.
      
          Signed-off-by: Justin Tobler <jltobler@gmail.com>
      
     @@ reftable/stack.c: int reftable_addition_commit(struct reftable_addition *add)
       		goto done;
       
      -	if (!add->stack->disable_auto_compact)
     -+	if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0))
     ++	if (!add->stack->disable_auto_compact &&
     ++	    git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1))
       		err = reftable_stack_auto_compact(add->stack);
       
        done:
     @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause a
       	test_line_count = 1 repo/.git/reftable/tables.list
       '
       
     -+test_expect_success 'ref transaction: environment variable disables auto-compaction' '
     ++test_expect_success 'ref transaction: env var disables compaction' '
      +	test_when_finished "rm -rf repo" &&
      +
      +	git init repo &&
      +	test_commit -C repo A &&
     -+	for i in $(test_seq 20)
     ++
     ++	start=$(wc -l <repo/.git/reftable/tables.list) &&
     ++	iterations=5 &&
     ++	expected=$((start + iterations)) &&
     ++
     ++	for i in $(test_seq $iterations)
      +	do
     -+		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
     ++		GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
     ++		git -C repo update-ref branch-$i HEAD || return 1
      +	done &&
     -+	test_line_count = 23 repo/.git/reftable/tables.list &&
     ++	test_line_count = $expected repo/.git/reftable/tables.list &&
      +
      +	git -C repo update-ref foo HEAD &&
     -+	test_line_count = 1 repo/.git/reftable/tables.list
     ++	test_line_count -lt $expected repo/.git/reftable/tables.list
      +'
      +
       check_fsync_events () {
 2:  7e62c2286ae ! 2:  e0f4d0dbcc1 reftable/stack: use geometric table compaction
     @@ reftable/stack.c: static int segment_size(struct segment *s)
      -		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
      +	/*
      +	 * Find the ending table of the compaction segment needed to restore the
     -+	 * geometric sequence.
     ++	 * geometric sequence. Note that the segment end is exclusive.
      +	 *
      +	 * To do so, we iterate backwards starting from the most recent table
      +	 * until a valid segment end is found. If the preceding table is smaller
      +	 * than the current table multiplied by the geometric factor (2), the
     -+	 * current table is set as the compaction segment end.
     ++	 * compaction segment end has been identified.
      +	 *
      +	 * Tables after the ending point are not added to the byte count because
      +	 * they are already valid members of the geometric sequence. Due to the
     @@ reftable/stack.c: static int segment_size(struct segment *s)
      +	 * Example table size sequence requiring no compaction:
      +	 * 64, 32, 16, 8, 4, 2, 1
      +	 *
     -+	 * Example compaction segment end set to table with size 3:
     ++	 * Example table size sequence where compaction segment end is set to
     ++	 * the last table. Since the segment end is exclusive, the last table is
     ++	 * excluded during subsequent compaction and the table with size 3 is
     ++	 * the final table included:
      +	 * 64, 32, 16, 8, 4, 3, 1
      +	 */
      +	for (i = n - 1; i > 0; i--) {
     @@ reftable/stack_test.c: static void test_empty_add(void)
      +	int l = 0;
      +	if (sz == 0)
      +		return 0;
     -+	for (; sz; sz /= 2) {
     ++	for (; sz; sz /= 2)
      +		l++;
     -+	}
      +	return l - 1;
      +}
      +
     @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause a
       	test_commit -C repo --no-tag B &&
       	test_line_count = 1 repo/.git/reftable/tables.list
     -@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: environment variable disables auto-compact
     - 	do
     - 		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
     - 	done &&
     --	test_line_count = 23 repo/.git/reftable/tables.list &&
     -+	test_line_count = 22 repo/.git/reftable/tables.list &&
     - 
     - 	git -C repo update-ref foo HEAD &&
     - 	test_line_count = 1 repo/.git/reftable/tables.list
     +@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: env var disables compaction'
     + 	test_line_count -lt $expected repo/.git/reftable/tables.list
       '
       
      +test_expect_success 'ref transaction: alternating table sizes are compacted' '
      +	test_when_finished "rm -rf repo" &&
     ++
      +	git init repo &&
      +	test_commit -C repo A &&
     -+	for i in $(test_seq 20)
     ++	for i in $(test_seq 5)
      +	do
      +		git -C repo branch -f foo &&
      +		git -C repo branch -d foo || return 1
     @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: pack-refs in main rep
       	test_when_finished "rm -rf repo worktree" &&
       	git init repo &&
       	test_commit -C repo A &&
     --	git -C repo worktree add ../worktree &&
     -+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
     -+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
     ++
     ++	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
     + 	git -C repo worktree add ../worktree &&
     ++	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
     ++	git -C worktree update-ref refs/worktree/per-worktree HEAD &&
       
      -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
      -	test_line_count = 4 repo/.git/reftable/tables.list &&
     @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: pack-refs in worktree
       	test_when_finished "rm -rf repo worktree" &&
       	git init repo &&
       	test_commit -C repo A &&
     --	git -C repo worktree add ../worktree &&
     -+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
     -+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
     ++
     ++	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
     + 	git -C repo worktree add ../worktree &&
     ++	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
     ++	git -C worktree update-ref refs/worktree/per-worktree HEAD &&
       
      -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
      -	test_line_count = 4 repo/.git/reftable/tables.list &&
     @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: pack-refs in worktree
       '
       
       test_expect_success 'worktree: creating shared ref updates main stack' '
     - 	test_when_finished "rm -rf repo worktree" &&
     - 	git init repo &&
     - 	test_commit -C repo A &&
     -+	test_commit -C repo B &&
     - 
     - 	git -C repo worktree add ../worktree &&
     - 	git -C repo pack-refs &&
     @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: creating shared ref updates main stack' '
       	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
       	test_line_count = 1 repo/.git/reftable/tables.list &&
       
     --	git -C worktree update-ref refs/heads/shared HEAD &&
     -+	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD &&
     ++	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
     + 	git -C worktree update-ref refs/heads/shared HEAD &&
       	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
       	test_line_count = 2 repo/.git/reftable/tables.list
     - '
 3:  9a33914c852 < -:  ----------- reftable/stack: make segment end inclusive

-- 
gitgitgadget
* [PATCH v4 1/2] reftable/stack: add env to disable autocompaction
  2024-04-03  0:20 ` [PATCH v4 0/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
@ 2024-04-03  0:20   ` Justin Tobler via GitGitGadget
  2024-04-03  0:20   ` [PATCH v4 2/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-04-03  0:20 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

In future tests it will be necessary to create repositories with a
set number of tables. To make this easier, introduce the
`GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set
to false, disables autocompaction of reftables.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c           |  3 ++-
 reftable/system.h          |  1 +
 t/t0610-reftable-basics.sh | 21 +++++++++++++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 1ecf1b9751c..4c373fb0ee2 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -681,7 +681,8 @@ int reftable_addition_commit(struct reftable_addition *add)
 	if (err)
 		goto done;
 
-	if (!add->stack->disable_auto_compact)
+	if (!add->stack->disable_auto_compact &&
+	    git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1))
 		err = reftable_stack_auto_compact(add->stack);
 
 done:
diff --git a/reftable/system.h b/reftable/system.h
index 5d8b6dede50..05b7c8554af 100644
--- a/reftable/system.h
+++ b/reftable/system.h
@@ -17,6 +17,7 @@ license that can be found in the LICENSE file or at
 #include "tempfile.h"
 #include "hash-ll.h" /* hash ID, sizes.*/
 #include "dir.h" /* remove_dir_recursively, for tests.*/
+#include "parse.h"
 
 int hash_size(uint32_t id);
 
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 686781192eb..444f7497907 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -299,6 +299,27 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
+test_expect_success 'ref transaction: env var disables compaction' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+
+	start=$(wc -l <repo/.git/reftable/tables.list) &&
+	iterations=5 &&
+	expected=$((start + iterations)) &&
+
+	for i in $(test_seq $iterations)
+	do
+		GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
+		git -C repo update-ref branch-$i HEAD || return 1
+	done &&
+	test_line_count = $expected repo/.git/reftable/tables.list &&
+
+	git -C repo update-ref foo HEAD &&
+	test_line_count -lt $expected repo/.git/reftable/tables.list
+'
+
 check_fsync_events () {
 	local trace="$1" &&
 	shift &&
-- 
gitgitgadget
* [PATCH v4 2/2] reftable/stack: use geometric table compaction
  2024-04-03  0:20 ` [PATCH v4 0/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
  2024-04-03  0:20   ` [PATCH v4 1/2] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
@ 2024-04-03  0:20   ` Justin Tobler via GitGitGadget
  2024-04-03  4:47   ` [PATCH v4 0/2] " Patrick Steinhardt
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-04-03  0:20 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables whose sizes have the same binary log value
(floor(log2(size))) are grouped into segments. The segment that has both
the lowest binary log value and contains more than one table is set as
the starting point when identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, no compaction occurs when the log value of a newly
appended table differs from that of the previous table. The table list
can therefore grow without bound. This can be demonstrated by repeating
the following sequence:

git branch -f foo
git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.

Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.
When creating 10,000 references, the new strategy has no measurable
performance impact:

Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
  Time (mean ± σ):     26.516 s ±  0.047 s    [User: 17.864 s, System: 8.491 s]
  Range (min … max):   26.447 s … 26.569 s    10 runs

Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
  Time (mean ± σ):     26.417 s ±  0.028 s    [User: 17.738 s, System: 8.500 s]
  Range (min … max):   26.366 s … 26.444 s    10 runs

Summary
  update-ref: create refs sequentially (revision = HEAD) ran
    1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.

In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c           | 123 +++++++++++++++++++------------------
 reftable/stack.h           |   3 -
 reftable/stack_test.c      |  66 ++++----------------
 t/t0610-reftable-basics.sh |  44 +++++++++----
 4 files changed, 108 insertions(+), 128 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 4c373fb0ee2..5f087cbd4bc 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1203,75 +1203,76 @@ static int segment_size(struct segment *s)
 	return s->end - s->start;
 }
 
-int fastlog2(uint64_t sz)
-{
-	int l = 0;
-	if (sz == 0)
-		return 0;
-	for (; sz; sz /= 2) {
-		l++;
-	}
-	return l - 1;
-}
-
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
-{
-	struct segment *segs = reftable_calloc(n, sizeof(*segs));
-	struct segment cur = { 0 };
-	size_t next = 0, i;
-
-	if (n == 0) {
-		*seglen = 0;
-		return segs;
-	}
-	for (i = 0; i < n; i++) {
-		int log = fastlog2(sizes[i]);
-		if (cur.log != log && cur.bytes > 0) {
-			struct segment fresh = {
-				.start = i,
-			};
-
-			segs[next++] = cur;
-			cur = fresh;
-		}
-
-		cur.log = log;
-		cur.end = i + 1;
-		cur.bytes += sizes[i];
-	}
-	segs[next++] = cur;
-	*seglen = next;
-	return segs;
-}
-
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 {
-	struct segment min_seg = {
-		.log = 64,
-	};
-	struct segment *segs;
-	size_t seglen = 0, i;
-
-	segs = sizes_to_segments(&seglen, sizes, n);
-	for (i = 0; i < seglen; i++) {
-		if (segment_size(&segs[i]) == 1)
-			continue;
+	struct segment seg = { 0 };
+	uint64_t bytes;
+	size_t i;
 
-		if (segs[i].log < min_seg.log)
-			min_seg = segs[i];
-	}
+	/*
+	 * If there are no tables or only a single one then we don't have to
+	 * compact anything. The sequence is geometric by definition already.
+	 */
+	if (n <= 1)
+		return seg;
 
-	while (min_seg.start > 0) {
-		size_t prev = min_seg.start - 1;
-		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
+	/*
+	 * Find the ending table of the compaction segment needed to restore the
+	 * geometric sequence. Note that the segment end is exclusive.
+	 *
+	 * To do so, we iterate backwards starting from the most recent table
+	 * until a valid segment end is found. If the preceding table is smaller
+	 * than the current table multiplied by the geometric factor (2), the
+	 * compaction segment end has been identified.
+	 *
+	 * Tables after the ending point are not added to the byte count because
+	 * they are already valid members of the geometric sequence. Due to the
+	 * properties of a geometric sequence, it is not possible for the sum of
+	 * these tables to exceed the value of the ending point table.
+	 *
+	 * Example table size sequence requiring no compaction:
+	 * 64, 32, 16, 8, 4, 2, 1
+	 *
+	 * Example table size sequence where compaction segment end is set to
+	 * the last table. Since the segment end is exclusive, the last table is
+	 * excluded during subsequent compaction and the table with size 3 is
+	 * the final table included:
+	 * 64, 32, 16, 8, 4, 3, 1
+	 */
+	for (i = n - 1; i > 0; i--) {
+		if (sizes[i - 1] < sizes[i] * 2) {
+			seg.end = i + 1;
+			bytes = sizes[i];
 			break;
+		}
+	}
 
-		min_seg.start = prev;
-		min_seg.bytes += sizes[prev];
+	/*
+	 * Find the starting table of the compaction segment by iterating
+	 * through the remaining tables and keeping track of the accumulated
+	 * size of all tables seen from the segment end table. The previous
+	 * table is compared to the accumulated size because the tables from the
+	 * segment end are merged backwards recursively.
+	 *
+	 * Note that we keep iterating even after we have found the first
+	 * starting point. This is because there may be tables in the stack
+	 * preceding that first starting point which violate the geometric
+	 * sequence.
+ * + * Example compaction segment start set to table with size 32: + * 128, 32, 16, 8, 4, 3, 1 + */ + for (; i > 0; i--) { + uint64_t curr = bytes; + bytes += sizes[i - 1]; + + if (sizes[i - 1] < curr * 2) { + seg.start = i - 1; + seg.bytes = bytes; + } } - reftable_free(segs); - return min_seg; + return seg; } static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st) diff --git a/reftable/stack.h b/reftable/stack.h index d919455669e..656f896cc28 100644 --- a/reftable/stack.h +++ b/reftable/stack.h @@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines); struct segment { size_t start, end; - int log; uint64_t bytes; }; -int fastlog2(uint64_t sz); -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n); struct segment suggest_compaction_segment(uint64_t *sizes, size_t n); #endif diff --git a/reftable/stack_test.c b/reftable/stack_test.c index 7336757cf53..a23cee22f4d 100644 --- a/reftable/stack_test.c +++ b/reftable/stack_test.c @@ -717,59 +717,13 @@ static void test_reftable_stack_hash_id(void) clear_dir(dir); } -static void test_log2(void) -{ - EXPECT(1 == fastlog2(3)); - EXPECT(2 == fastlog2(4)); - EXPECT(2 == fastlog2(5)); -} - -static void test_sizes_to_segments(void) -{ - uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 }; - /* .................0 1 2 3 4 5 */ - - size_t seglen = 0; - struct segment *segs = - sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes)); - EXPECT(segs[2].log == 3); - EXPECT(segs[2].start == 5); - EXPECT(segs[2].end == 6); - - EXPECT(segs[1].log == 2); - EXPECT(segs[1].start == 2); - EXPECT(segs[1].end == 5); - reftable_free(segs); -} - -static void test_sizes_to_segments_empty(void) -{ - size_t seglen = 0; - struct segment *segs = sizes_to_segments(&seglen, NULL, 0); - EXPECT(seglen == 0); - reftable_free(segs); -} - -static void test_sizes_to_segments_all_equal(void) -{ - uint64_t sizes[] = { 5, 5 }; - size_t seglen = 0; - struct segment *segs = - sizes_to_segments(&seglen, sizes, 
ARRAY_SIZE(sizes)); - EXPECT(seglen == 1); - EXPECT(segs[0].start == 0); - EXPECT(segs[0].end == 2); - reftable_free(segs); -} - static void test_suggest_compaction_segment(void) { - uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 }; - /* .................0 1 2 3 4 5 6 */ + uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 }; struct segment min = suggest_compaction_segment(sizes, ARRAY_SIZE(sizes)); - EXPECT(min.start == 2); - EXPECT(min.end == 7); + EXPECT(min.start == 1); + EXPECT(min.end == 10); } static void test_suggest_compaction_segment_nothing(void) @@ -880,6 +834,16 @@ static void test_empty_add(void) reftable_stack_destroy(st2); } +static int fastlog2(uint64_t sz) +{ + int l = 0; + if (sz == 0) + return 0; + for (; sz; sz /= 2) + l++; + return l - 1; +} + static void test_reftable_stack_auto_compaction(void) { struct reftable_write_options cfg = { 0 }; @@ -1068,7 +1032,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void) int stack_test_main(int argc, const char *argv[]) { RUN_TEST(test_empty_add); - RUN_TEST(test_log2); RUN_TEST(test_names_equal); RUN_TEST(test_parse_names); RUN_TEST(test_read_file); @@ -1088,9 +1051,6 @@ int stack_test_main(int argc, const char *argv[]) RUN_TEST(test_reftable_stack_update_index_check); RUN_TEST(test_reftable_stack_uptodate); RUN_TEST(test_reftable_stack_validate_refname); - RUN_TEST(test_sizes_to_segments); - RUN_TEST(test_sizes_to_segments_all_equal); - RUN_TEST(test_sizes_to_segments_empty); RUN_TEST(test_suggest_compaction_segment); RUN_TEST(test_suggest_compaction_segment_nothing); return 0; diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 444f7497907..853f0a051d8 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' ' test_line_count = 1 repo/.git/reftable/tables.list && test_commit -C repo --no-tag A && - test_line_count = 2 
repo/.git/reftable/tables.list && + test_line_count = 1 repo/.git/reftable/tables.list && test_commit -C repo --no-tag B && test_line_count = 1 repo/.git/reftable/tables.list @@ -320,6 +320,19 @@ test_expect_success 'ref transaction: env var disables compaction' ' test_line_count -lt $expected repo/.git/reftable/tables.list ' +test_expect_success 'ref transaction: alternating table sizes are compacted' ' + test_when_finished "rm -rf repo" && + + git init repo && + test_commit -C repo A && + for i in $(test_seq 5) + do + git -C repo branch -f foo && + git -C repo branch -d foo || return 1 + done && + test_line_count = 2 repo/.git/reftable/tables.list +' + check_fsync_events () { local trace="$1" && shift && @@ -345,7 +358,7 @@ test_expect_success 'ref transaction: writes are synced' ' git -C repo -c core.fsync=reference \ -c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD && check_fsync_events trace2.txt <<-EOF - "name":"hardware-flush","count":2 + "name":"hardware-flush","count":4 EOF ' @@ -367,8 +380,8 @@ test_expect_success 'pack-refs: compacts tables' ' test_commit -C repo A && ls -1 repo/.git/reftable >table-files && - test_line_count = 4 table-files && - test_line_count = 3 repo/.git/reftable/tables.list && + test_line_count = 3 table-files && + test_line_count = 2 repo/.git/reftable/tables.list && git -C repo pack-refs && ls -1 repo/.git/reftable >table-files && @@ -400,7 +413,7 @@ do umask $umask && git init --shared=true repo && test_commit -C repo A && - test_line_count = 3 repo/.git/reftable/tables.list + test_line_count = 2 repo/.git/reftable/tables.list ) && git -C repo pack-refs && test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list && @@ -768,12 +781,16 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' ' test_when_finished "rm -rf repo worktree" && git init repo && test_commit -C repo A && + + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ git -C repo worktree add ../worktree && + 
GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ + git -C worktree update-ref refs/worktree/per-worktree HEAD && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list && + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && + test_line_count = 3 repo/.git/reftable/tables.list && git -C repo pack-refs && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && test_line_count = 1 repo/.git/reftable/tables.list ' @@ -781,13 +798,17 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' ' test_when_finished "rm -rf repo worktree" && git init repo && test_commit -C repo A && + + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ git -C repo worktree add ../worktree && + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ + git -C worktree update-ref refs/worktree/per-worktree HEAD && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list && + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && + test_line_count = 3 repo/.git/reftable/tables.list && git -C worktree pack-refs && test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list + test_line_count = 3 repo/.git/reftable/tables.list ' test_expect_success 'worktree: creating shared ref updates main stack' ' @@ -801,6 +822,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' ' test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && test_line_count = 1 repo/.git/reftable/tables.list && + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ git -C worktree update-ref refs/heads/shared HEAD && test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && test_line_count = 2 repo/.git/reftable/tables.list -- gitgitgadget ^ permalink raw reply 
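The patch adds a `fastlog2()` helper to `reftable/stack_test.c`. It computes floor(log2(sz)) and returns 0 for both 0 and 1. Under the old strategy, contiguous tables with the same binary log value of size were grouped into one segment. In the updated test sizes, for example, 16 and 17 both map to 4, while 9 maps to 3.

The standalone copy below illustrates that bucketing. The `fastlog2_examples()` self-check is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same logic as the fastlog2() helper shown in the patch: floor(log2(sz)). */
static int fastlog2(uint64_t sz)
{
	int l = 0;

	if (sz == 0)
		return 0; /* 0 is special-cased and shares a bucket with 1 */
	for (; sz; sz /= 2)
		l++; /* counts bits up to and including the highest set bit */
	return l - 1;
}

/* Hypothetical self-check documenting how sizes map onto log2 buckets. */
static void fastlog2_examples(void)
{
	assert(fastlog2(0) == 0 && fastlog2(1) == 0);
	assert(fastlog2(9) == 3);
	assert(fastlog2(16) == 4 && fastlog2(17) == 4); /* same bucket */
	assert(fastlog2(512) == 9);
}
```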
* Re: [PATCH v4 0/2] reftable/stack: use geometric table compaction
From: Patrick Steinhardt @ 2024-04-03 4:47 UTC
To: Justin Tobler via GitGitGadget; +Cc: git, Karthik Nayak, Justin Tobler

On Wed, Apr 03, 2024 at 12:20:34AM +0000, Justin Tobler via GitGitGadget wrote:
> Hello again,
>
> This is the fourth version of my patch series that refactors the reftable
> compaction strategy to instead follow a geometric sequence. Changes compared
> to v3:
>
> * Changed env name from GIT_TEST_REFTABLE_NO_AUTOCOMPACTION to
>   GIT_TEST_REFTABLE_AUTOCOMPACTION and set the default to false. This

You probably mean true here, not false :)

>   should hopefully be a bit more intuitive since it avoids the double
>   negative.
> * Updated the corresponding env var test in t0610-reftable-basics.sh to
>   assert on the number of tables added and be overall less fragile.
> * Folded lines that were too long.
> * Updated some comments in stack.c to more accurately explain that table
>   segment end is exclusive.
> * Dropped reftable/stack: make segment end inclusive commit to keep segment
>   end exclusive and better follow expectations.
>
> Thanks for taking a look!

This version looks good to me, thanks!
Patrick > -Justin > > Justin Tobler (2): > reftable/stack: add env to disable autocompaction > reftable/stack: use geometric table compaction > > reftable/stack.c | 126 +++++++++++++++++++------------------ > reftable/stack.h | 3 - > reftable/stack_test.c | 66 ++++--------------- > reftable/system.h | 1 + > t/t0610-reftable-basics.sh | 65 +++++++++++++++---- > 5 files changed, 132 insertions(+), 129 deletions(-) > > > base-commit: c75fd8d8150afdf836b63a8e0534d9b9e3e111ba > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v4 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v4 > Pull-Request: https://github.com/gitgitgadget/git/pull/1683 > > Range-diff vs v3: > > 1: 2fdd8ea1133 ! 1: 2a0421e5f20 reftable/stack: add env to disable autocompaction > @@ Commit message > > In future tests it will be neccesary to create repositories with a set > number of tables. To make this easier, introduce the > - `GIT_TEST_REFTABLE_NO_AUTOCOMPACTION` environment variable that, when > - set, disables autocompaction of reftables. > + `GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set > + to false, disables autocompaction of reftables. 
> > Signed-off-by: Justin Tobler <jltobler@gmail.com> > > @@ reftable/stack.c: int reftable_addition_commit(struct reftable_addition *add) > goto done; > > - if (!add->stack->disable_auto_compact) > -+ if (!add->stack->disable_auto_compact && !git_env_bool("GIT_TEST_REFTABLE_NO_AUTOCOMPACTION", 0)) > ++ if (!add->stack->disable_auto_compact && > ++ git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1)) > err = reftable_stack_auto_compact(add->stack); > > done: > @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause a > test_line_count = 1 repo/.git/reftable/tables.list > ' > > -+test_expect_success 'ref transaction: environment variable disables auto-compaction' ' > ++test_expect_success 'ref transaction: env var disables compaction' ' > + test_when_finished "rm -rf repo" && > + > + git init repo && > + test_commit -C repo A && > -+ for i in $(test_seq 20) > ++ > ++ start=$(wc -l <repo/.git/reftable/tables.list) && > ++ iterations=5 && > ++ expected=$((start + iterations)) && > ++ > ++ for i in $(test_seq $iterations) > + do > -+ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1 > ++ GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ > ++ git -C repo update-ref branch-$i HEAD || return 1 > + done && > -+ test_line_count = 23 repo/.git/reftable/tables.list && > ++ test_line_count = $expected repo/.git/reftable/tables.list && > + > + git -C repo update-ref foo HEAD && > -+ test_line_count = 1 repo/.git/reftable/tables.list > ++ test_line_count -lt $expected repo/.git/reftable/tables.list > +' > + > check_fsync_events () { > 2: 7e62c2286ae ! 2: e0f4d0dbcc1 reftable/stack: use geometric table compaction > @@ reftable/stack.c: static int segment_size(struct segment *s) > - if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) > + /* > + * Find the ending table of the compaction segment needed to restore the > -+ * geometric sequence. > ++ * geometric sequence. Note that the segment end is exclusive. 
> + * > + * To do so, we iterate backwards starting from the most recent table > + * until a valid segment end is found. If the preceding table is smaller > + * than the current table multiplied by the geometric factor (2), the > -+ * current table is set as the compaction segment end. > ++ * compaction segment end has been identified. > + * > + * Tables after the ending point are not added to the byte count because > + * they are already valid members of the geometric sequence. Due to the > @@ reftable/stack.c: static int segment_size(struct segment *s) > + * Example table size sequence requiring no compaction: > + * 64, 32, 16, 8, 4, 2, 1 > + * > -+ * Example compaction segment end set to table with size 3: > ++ * Example table size sequence where compaction segment end is set to > ++ * the last table. Since the segment end is exclusive, the last table is > ++ * excluded during subsequent compaction and the table with size 3 is > ++ * the final table included: > + * 64, 32, 16, 8, 4, 3, 1 > + */ > + for (i = n - 1; i > 0; i--) { > @@ reftable/stack_test.c: static void test_empty_add(void) > + int l = 0; > + if (sz == 0) > + return 0; > -+ for (; sz; sz /= 2) { > ++ for (; sz; sz /= 2) > + l++; > -+ } > + return l - 1; > +} > + > @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause a > > test_commit -C repo --no-tag B && > test_line_count = 1 repo/.git/reftable/tables.list > -@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: environment variable disables auto-compact > - do > - GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1 > - done && > -- test_line_count = 23 repo/.git/reftable/tables.list && > -+ test_line_count = 22 repo/.git/reftable/tables.list && > - > - git -C repo update-ref foo HEAD && > - test_line_count = 1 repo/.git/reftable/tables.list > +@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: env var disables compaction' ' > + test_line_count -lt 
$expected repo/.git/reftable/tables.list > ' > > +test_expect_success 'ref transaction: alternating table sizes are compacted' ' > + test_when_finished "rm -rf repo" && > ++ > + git init repo && > + test_commit -C repo A && > -+ for i in $(test_seq 20) > ++ for i in $(test_seq 5) > + do > + git -C repo branch -f foo && > + git -C repo branch -d foo || return 1 > @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: pack-refs in main rep > test_when_finished "rm -rf repo worktree" && > git init repo && > test_commit -C repo A && > -- git -C repo worktree add ../worktree && > -+ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree && > -+ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD && > ++ > ++ GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ > + git -C repo worktree add ../worktree && > ++ GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ > ++ git -C worktree update-ref refs/worktree/per-worktree HEAD && > > - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && > - test_line_count = 4 repo/.git/reftable/tables.list && > @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: pack-refs in worktree > test_when_finished "rm -rf repo worktree" && > git init repo && > test_commit -C repo A && > -- git -C repo worktree add ../worktree && > -+ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree && > -+ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD && > ++ > ++ GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ > + git -C repo worktree add ../worktree && > ++ GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ > ++ git -C worktree update-ref refs/worktree/per-worktree HEAD && > > - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && > - test_line_count = 4 repo/.git/reftable/tables.list && > @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: pack-refs in worktree > ' > > 
test_expect_success 'worktree: creating shared ref updates main stack' ' > - test_when_finished "rm -rf repo worktree" && > - git init repo && > - test_commit -C repo A && > -+ test_commit -C repo B && > - > - git -C repo worktree add ../worktree && > - git -C repo pack-refs && > @@ t/t0610-reftable-basics.sh: test_expect_success 'worktree: creating shared ref updates main stack' ' > test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && > test_line_count = 1 repo/.git/reftable/tables.list && > > -- git -C worktree update-ref refs/heads/shared HEAD && > -+ GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD && > ++ GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ > + git -C worktree update-ref refs/heads/shared HEAD && > test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && > test_line_count = 2 repo/.git/reftable/tables.list > - ' > 3: 9a33914c852 < -: ----------- reftable/stack: make segment end inclusive > > -- > gitgitgadget [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
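The range-diff above quotes the comment that describes how the geometric strategy finds the compaction segment end. It walks backwards from the newest table until the preceding table is smaller than the current table times the factor (2). The end is exclusive.

Below is a minimal C sketch of a geometric segment search that is consistent with that comment and with the updated `test_suggest_compaction_segment()` expectation. For `{ 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 }` that test expects `start == 1` and `end == 10`. The names `segment_sketch`, `suggest_segment_sketch` and `GEOMETRIC_FACTOR` are hypothetical. The start-finding loop, which accumulates bytes backwards from the end, is an assumption about the part of the patch not quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GEOMETRIC_FACTOR 2 /* factor named in the quoted patch comment */

/* Hypothetical stand-in for the patch's segment struct; end is exclusive. */
struct segment_sketch {
	size_t start;
	size_t end;
	uint64_t bytes;
};

/*
 * sizes[] is ordered oldest table first. Returns start == end == 0 when
 * the sizes already form a geometric sequence.
 */
static struct segment_sketch suggest_segment_sketch(const uint64_t *sizes,
						    size_t n)
{
	struct segment_sketch seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	assert(n == 0 || sizes != NULL);
	if (n <= 1)
		return seg; /* zero or one table is trivially geometric */

	/* Walk back from the newest table to find the (exclusive) end. */
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * GEOMETRIC_FACTOR) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}

	/*
	 * Keep walking back while accumulating bytes. Any older table that
	 * is smaller than twice the accumulated size must be merged too; the
	 * oldest such table becomes the segment start.
	 */
	for (; i > 0; i--) {
		uint64_t curr = bytes;

		bytes += sizes[i - 1];
		if (sizes[i - 1] < curr * GEOMETRIC_FACTOR) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}

	return seg;
}
```

With the test's sizes, the sketch selects tables 1 through 9 (158 bytes). It leaves the 512-byte table alone, because 512 is at least twice the accumulated 158.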
* Re: [PATCH v4 0/2] reftable/stack: use geometric table compaction
From: Karthik Nayak @ 2024-04-03 11:12 UTC
To: Justin Tobler via GitGitGadget, git; +Cc: Patrick Steinhardt, Justin Tobler

Hello,

"Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Hello again,
>
> This is the fourth version of my patch series that refactors the reftable
> compaction strategy to instead follow a geometric sequence. Changes compared
> to v3:
> [...]
> Thanks for taking a look!
>
> -Justin
>

Just a note that this doesn't merge nicely with next because of
conflicts with a2f711ade0c4816a59155d72559cbc4759cd4699.
* Re: [PATCH v4 0/2] reftable/stack: use geometric table compaction
From: Junio C Hamano @ 2024-04-03 16:56 UTC
To: Karthik Nayak
Cc: Justin Tobler via GitGitGadget, git, Patrick Steinhardt, Justin Tobler

Karthik Nayak <karthik.188@gmail.com> writes:

> Just a note that this doesn't merge nicely with next because of
> conflicts with a2f711ade0c4816a59155d72559cbc4759cd4699.

True. I've been resolving the conflict already, so there is not
much new to see here ;-)

It seems that the plan is to update the section that conflicts even
further to avoid the access to the environment variable, so I'll
have to update my rerere database yet another time.

Thanks.
* [PATCH v5 0/3] reftable/stack: use geometric table compaction
From: Justin Tobler via GitGitGadget @ 2024-04-04 18:29 UTC
To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler

Hello again,

This is the fifth version of my patch series that refactors the reftable
compaction strategy to instead follow a geometric sequence. Changes compared
to v4:

* To fix some failing tests and conflicts, this patch series now depends on
  the ps/pack-refs-auto series, which is currently in next.
* Lifted the GIT_TEST_REFTABLE_AUTOCOMPACTION environment variable out of the
  reftable library and into the reftable backend code.

Thanks for taking a look!
-Justin Justin Tobler (3): reftable/stack: allow disabling of auto-compaction reftable/stack: add env to disable autocompaction reftable/stack: use geometric table compaction refs/reftable-backend.c | 4 ++ reftable/reftable-writer.h | 3 + reftable/stack.c | 125 +++++++++++++++++++------------------ reftable/stack.h | 4 -- reftable/stack_test.c | 77 ++++++----------------- t/t0610-reftable-basics.sh | 71 ++++++++++++++++----- 6 files changed, 146 insertions(+), 138 deletions(-) base-commit: 4b32163adf4863c6df3bb6b43540fa2ca3494e28 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v5 Pull-Request: https://github.com/gitgitgadget/git/pull/1683 Range-diff vs v4: -: ----------- > 1: a7011dbc6aa reftable/stack: allow disabling of auto-compaction 1: 2a0421e5f20 ! 2: 7c4fe0e9ec5 reftable/stack: add env to disable autocompaction @@ Commit message Signed-off-by: Justin Tobler <jltobler@gmail.com> - ## reftable/stack.c ## -@@ reftable/stack.c: int reftable_addition_commit(struct reftable_addition *add) - if (err) - goto done; - -- if (!add->stack->disable_auto_compact) -+ if (!add->stack->disable_auto_compact && -+ git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1)) - err = reftable_stack_auto_compact(add->stack); - - done: - - ## reftable/system.h ## -@@ reftable/system.h: license that can be found in the LICENSE file or at - #include "tempfile.h" - #include "hash-ll.h" /* hash ID, sizes.*/ - #include "dir.h" /* remove_dir_recursively, for tests.*/ + ## refs/reftable-backend.c ## +@@ + #include "../reftable/reftable-merged.h" + #include "../setup.h" + #include "../strmap.h" +#include "parse.h" + #include "refs-internal.h" - int hash_size(uint32_t id); + /* +@@ refs/reftable-backend.c: static struct ref_store *reftable_be_init(struct repository *repo, + refs->write_options.hash_id = 
repo->hash_algo->format_id; + refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask); ++ if (!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1)) ++ refs->write_options.disable_auto_compact = 1; ++ + /* + * Set up the main reftable stack that is hosted in GIT_COMMON_DIR. + * This stack contains both the shared and the main worktree refs. ## t/t0610-reftable-basics.sh ## @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause auto-compaction' ' 2: e0f4d0dbcc1 ! 3: 8f124acf0f8 reftable/stack: use geometric table compaction @@ reftable/stack_test.c: static void test_empty_add(void) + static void test_reftable_stack_auto_compaction(void) { - struct reftable_write_options cfg = { 0 }; + struct reftable_write_options cfg = { @@ reftable/stack_test.c: static void test_reftable_stack_compaction_concurrent_clean(void) int stack_test_main(int argc, const char *argv[]) { @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes are syn EOF ' +@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: fails gracefully when auto compaction fail + done || + exit 1 + done && +- test_line_count = 13 .git/reftable/tables.list ++ test_line_count = 10 .git/reftable/tables.list + ) + ' + @@ t/t0610-reftable-basics.sh: test_expect_success 'pack-refs: compacts tables' ' test_commit -C repo A && @@ t/t0610-reftable-basics.sh: test_expect_success 'pack-refs: compacts tables' ' git -C repo pack-refs && ls -1 repo/.git/reftable >table-files && +@@ t/t0610-reftable-basics.sh: test_expect_success "$command: auto compaction" ' + # The tables should have been auto-compacted, and thus auto + # compaction should not have to do anything. 
+ ls -1 .git/reftable >tables-expect && +- test_line_count = 4 tables-expect && ++ test_line_count = 3 tables-expect && + git $command --auto && + ls -1 .git/reftable >tables-actual && + test_cmp tables-expect tables-actual && +@@ t/t0610-reftable-basics.sh: test_expect_success "$command: auto compaction" ' + git branch B && + git branch C && + rm .git/reftable/*.lock && +- test_line_count = 5 .git/reftable/tables.list && ++ test_line_count = 4 .git/reftable/tables.list && + + git $command --auto && + test_line_count = 1 .git/reftable/tables.list @@ t/t0610-reftable-basics.sh: do umask $umask && git init --shared=true repo && -- gitgitgadget ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v5 1/3] reftable/stack: allow disabling of auto-compaction 2024-04-04 18:29 ` [PATCH v5 0/3] " Justin Tobler via GitGitGadget @ 2024-04-04 18:29 ` Justin Tobler via GitGitGadget 2024-04-08 6:12 ` Patrick Steinhardt 2024-04-04 18:29 ` [PATCH v5 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget ` (3 subsequent siblings) 4 siblings, 1 reply; 52+ messages in thread From: Justin Tobler via GitGitGadget @ 2024-04-04 18:29 UTC (permalink / raw) To: git Cc: Patrick Steinhardt, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler, Justin Tobler From: Justin Tobler <jltobler@gmail.com> Move the `disable_auto_compact` option into `reftable_write_options` to allow a stack to be configured with auto-compaction disabled. In a subsequent commit, this is used to disable auto-compaction when a specific environment variable is set. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- reftable/reftable-writer.h | 3 +++ reftable/stack.c | 2 +- reftable/stack.h | 1 - reftable/stack_test.c | 11 ++++++----- 4 files changed, 10 insertions(+), 7 deletions(-) diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h index 7c7cae5f99b..155bf0bbe2a 100644 --- a/reftable/reftable-writer.h +++ b/reftable/reftable-writer.h @@ -46,6 +46,9 @@ struct reftable_write_options { * is a single line, and add '\n' if missing. */ unsigned exact_log_message : 1; + + /* boolean: Prevent auto-compaction of tables. */ + unsigned disable_auto_compact : 1; }; /* reftable_block_stats holds statistics for a single block type */ diff --git a/reftable/stack.c b/reftable/stack.c index dde50b61d69..1a7cdad12c9 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -680,7 +680,7 @@ int reftable_addition_commit(struct reftable_addition *add) if (err) goto done; - if (!add->stack->disable_auto_compact) { + if (!add->stack->config.disable_auto_compact) { /* * Auto-compact the stack to keep the number of tables in * control. 
It is possible that a concurrent writer is already diff --git a/reftable/stack.h b/reftable/stack.h index d919455669e..c862053025f 100644 --- a/reftable/stack.h +++ b/reftable/stack.h @@ -19,7 +19,6 @@ struct reftable_stack { int list_fd; char *reftable_dir; - int disable_auto_compact; struct reftable_write_options config; diff --git a/reftable/stack_test.c b/reftable/stack_test.c index 351e35bd86d..4fec823f14f 100644 --- a/reftable/stack_test.c +++ b/reftable/stack_test.c @@ -325,7 +325,7 @@ static void test_reftable_stack_transaction_api_performs_auto_compaction(void) * we can ensure that we indeed honor this setting and have * better control over when exactly auto compaction runs. */ - st->disable_auto_compact = i != n; + st->config.disable_auto_compact = i != n; err = reftable_stack_new_addition(&add, st); EXPECT_ERR(err); @@ -497,6 +497,7 @@ static void test_reftable_stack_add(void) struct reftable_write_options cfg = { .exact_log_message = 1, .default_permissions = 0660, + .disable_auto_compact = 1, }; struct reftable_stack *st = NULL; char *dir = get_tmp_dir(__LINE__); @@ -508,7 +509,6 @@ static void test_reftable_stack_add(void) err = reftable_new_stack(&st, dir, cfg); EXPECT_ERR(err); - st->disable_auto_compact = 1; for (i = 0; i < N; i++) { char buf[256]; @@ -935,7 +935,9 @@ static void test_empty_add(void) static void test_reftable_stack_auto_compaction(void) { - struct reftable_write_options cfg = { 0 }; + struct reftable_write_options cfg = { + .disable_auto_compact = 1, + }; struct reftable_stack *st = NULL; char *dir = get_tmp_dir(__LINE__); @@ -945,7 +947,6 @@ static void test_reftable_stack_auto_compaction(void) err = reftable_new_stack(&st, dir, cfg); EXPECT_ERR(err); - st->disable_auto_compact = 1; /* call manually below for coverage. 
*/ for (i = 0; i < N; i++) { char name[100]; struct reftable_ref_record ref = { @@ -994,7 +995,7 @@ static void test_reftable_stack_add_performs_auto_compaction(void) * we can ensure that we indeed honor this setting and have * better control over when exactly auto compaction runs. */ - st->disable_auto_compact = i != n; + st->config.disable_auto_compact = i != n; strbuf_reset(&refname); strbuf_addf(&refname, "branch-%04d", i); -- gitgitgadget ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v5 1/3] reftable/stack: allow disabling of auto-compaction
From: Patrick Steinhardt @ 2024-04-08 6:12 UTC
To: Justin Tobler via GitGitGadget
Cc: git, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler

On Thu, Apr 04, 2024 at 06:29:27PM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
>
> Move the `disable_auto_compact` option into `reftable_write_options` to
> allow a stack to be configured with auto-compaction disabled. In a
> subsequent commit, this is used to disable auto-compaction when a
> specific environment variable is set.

This patch looks good to me. I think the commit subject and message
could use a bit of polishing, though: we do not add a new way to
disable auto-compaction, as that already exists. The important bit is
that this toggle is purely internal right now and thus cannot be
accessed by library users. I'd thus propose something along the
following lines:

```
reftable/stack: expose option to disable auto-compaction

While the reftable stack already has a bit that controls whether or not
to run auto-compaction, this bit is not accessible to users of the
library. There are use cases, though, where a caller may want to have
more control over auto-compaction.

Move the `disable_auto_compact` option into `reftable_write_options` to
allow external callers to disable auto-compaction. This will be used in
a subsequent commit.
``` Patrick > Signed-off-by: Justin Tobler <jltobler@gmail.com> > --- > reftable/reftable-writer.h | 3 +++ > reftable/stack.c | 2 +- > reftable/stack.h | 1 - > reftable/stack_test.c | 11 ++++++----- > 4 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h > index 7c7cae5f99b..155bf0bbe2a 100644 > --- a/reftable/reftable-writer.h > +++ b/reftable/reftable-writer.h > @@ -46,6 +46,9 @@ struct reftable_write_options { > * is a single line, and add '\n' if missing. > */ > unsigned exact_log_message : 1; > + > + /* boolean: Prevent auto-compaction of tables. */ > + unsigned disable_auto_compact : 1; > }; > > /* reftable_block_stats holds statistics for a single block type */ > diff --git a/reftable/stack.c b/reftable/stack.c > index dde50b61d69..1a7cdad12c9 100644 > --- a/reftable/stack.c > +++ b/reftable/stack.c > @@ -680,7 +680,7 @@ int reftable_addition_commit(struct reftable_addition *add) > if (err) > goto done; > > - if (!add->stack->disable_auto_compact) { > + if (!add->stack->config.disable_auto_compact) { > /* > * Auto-compact the stack to keep the number of tables in > * control. It is possible that a concurrent writer is already > diff --git a/reftable/stack.h b/reftable/stack.h > index d919455669e..c862053025f 100644 > --- a/reftable/stack.h > +++ b/reftable/stack.h > @@ -19,7 +19,6 @@ struct reftable_stack { > int list_fd; > > char *reftable_dir; > - int disable_auto_compact; > > struct reftable_write_options config; > > diff --git a/reftable/stack_test.c b/reftable/stack_test.c > index 351e35bd86d..4fec823f14f 100644 > --- a/reftable/stack_test.c > +++ b/reftable/stack_test.c > @@ -325,7 +325,7 @@ static void test_reftable_stack_transaction_api_performs_auto_compaction(void) > * we can ensure that we indeed honor this setting and have > * better control over when exactly auto compaction runs. 
> */ > - st->disable_auto_compact = i != n; > + st->config.disable_auto_compact = i != n; > > err = reftable_stack_new_addition(&add, st); > EXPECT_ERR(err); > @@ -497,6 +497,7 @@ static void test_reftable_stack_add(void) > struct reftable_write_options cfg = { > .exact_log_message = 1, > .default_permissions = 0660, > + .disable_auto_compact = 1, > }; > struct reftable_stack *st = NULL; > char *dir = get_tmp_dir(__LINE__); > @@ -508,7 +509,6 @@ static void test_reftable_stack_add(void) > > err = reftable_new_stack(&st, dir, cfg); > EXPECT_ERR(err); > - st->disable_auto_compact = 1; > > for (i = 0; i < N; i++) { > char buf[256]; > @@ -935,7 +935,9 @@ static void test_empty_add(void) > > static void test_reftable_stack_auto_compaction(void) > { > - struct reftable_write_options cfg = { 0 }; > + struct reftable_write_options cfg = { > + .disable_auto_compact = 1, > + }; > struct reftable_stack *st = NULL; > char *dir = get_tmp_dir(__LINE__); > > @@ -945,7 +947,6 @@ static void test_reftable_stack_auto_compaction(void) > err = reftable_new_stack(&st, dir, cfg); > EXPECT_ERR(err); > > - st->disable_auto_compact = 1; /* call manually below for coverage. */ > for (i = 0; i < N; i++) { > char name[100]; > struct reftable_ref_record ref = { > @@ -994,7 +995,7 @@ static void test_reftable_stack_add_performs_auto_compaction(void) > * we can ensure that we indeed honor this setting and have > * better control over when exactly auto compaction runs. > */ > - st->disable_auto_compact = i != n; > + st->config.disable_auto_compact = i != n; > > strbuf_reset(&refname); > strbuf_addf(&refname, "branch-%04d", i); > -- > gitgitgadget > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v5 2/3] reftable/stack: add env to disable autocompaction
From: Justin Tobler via GitGitGadget @ 2024-04-04 18:29 UTC
To: git
Cc: Patrick Steinhardt, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler, Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

In future tests it will be necessary to create repositories with a set
number of tables. To make this easier, introduce the
`GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set
to false, disables autocompaction of reftables.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 refs/reftable-backend.c    |  4 ++++
 t/t0610-reftable-basics.sh | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 0bed6d2ab48..6b6191f89dd 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -18,6 +18,7 @@
 #include "../reftable/reftable-merged.h"
 #include "../setup.h"
 #include "../strmap.h"
+#include "parse.h"
 #include "refs-internal.h"
 
 /*
@@ -248,6 +249,9 @@ static struct ref_store *reftable_be_init(struct repository *repo,
 	refs->write_options.hash_id = repo->hash_algo->format_id;
 	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
 
+	if (!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1))
+		refs->write_options.disable_auto_compact = 1;
+
 	/*
 	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
 	 * This stack contains both the shared and the main worktree refs.
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 931d888bbbc..c9e10b34684 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -299,6 +299,27 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
+test_expect_success 'ref transaction: env var disables compaction' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+
+	start=$(wc -l <repo/.git/reftable/tables.list) &&
+	iterations=5 &&
+	expected=$((start + iterations)) &&
+
+	for i in $(test_seq $iterations)
+	do
+		GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
+		git -C repo update-ref branch-$i HEAD || return 1
+	done &&
+	test_line_count = $expected repo/.git/reftable/tables.list &&
+
+	git -C repo update-ref foo HEAD &&
+	test_line_count -lt $expected repo/.git/reftable/tables.list
+'
+
 check_fsync_events () {
 	local trace="$1" &&
 	shift &&
--
gitgitgadget
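Patch 2/3 reads `GIT_TEST_REFTABLE_AUTOCOMPACTION` through git's `git_env_bool()` with a default of 1. Autocompaction therefore stays on unless the variable is set to a false value.

The sketch below is a hypothetical, simplified stand-in for that parsing, not git's implementation:

* An unset variable yields the default.
* The spellings follow git's documented boolean values, case-insensitively: `true`/`yes`/`on` or a nonzero integer for true, and `false`/`no`/`off`/`0` or the empty string for false.
* Where git would die on an unrecognized value, this sketch returns -1 instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive string equality (avoids the POSIX-only strcasecmp). */
static int str_ieq(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b; /* equal only if both strings ended together */
}

/* value == NULL means "unset"; returns 1, 0, or -1 for unrecognized input. */
static int parse_bool_sketch(const char *value, int def)
{
	char *end;
	long n;

	if (!value)
		return def;
	if (!*value)
		return 0; /* an empty value counts as false */
	if (str_ieq(value, "true") || str_ieq(value, "yes") || str_ieq(value, "on"))
		return 1;
	if (str_ieq(value, "false") || str_ieq(value, "no") || str_ieq(value, "off"))
		return 0;
	n = strtol(value, &end, 10);
	if (*end)
		return -1; /* not a boolean word and not a plain integer */
	return n != 0;
}

/* Hypothetical analogue of git_env_bool(name, def). */
static int env_bool_sketch(const char *name, int def)
{
	assert(name != NULL);
	return parse_bool_sketch(getenv(name), def);
}
```

With a default of 1, only an explicit false spelling such as `GIT_TEST_REFTABLE_AUTOCOMPACTION=false` turns compaction off, which is what the new t0610 test relies on.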
* Re: [PATCH v5 2/3] reftable/stack: add env to disable autocompaction
  2024-04-04 18:29     ` [PATCH v5 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
@ 2024-04-08 6:12        ` Patrick Steinhardt
  2024-04-08 16:18         ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 6:12 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget
  Cc: git, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2726 bytes --]

On Thu, Apr 04, 2024 at 06:29:28PM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
> 
> In future tests it will be neccesary to create repositories with a set
> number of tables. To make this easier, introduce the
> `GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set
> to false, disables autocompaction of reftables.
> 
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  refs/reftable-backend.c    |  4 ++++
>  t/t0610-reftable-basics.sh | 21 +++++++++++++++++++++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 0bed6d2ab48..6b6191f89dd 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -18,6 +18,7 @@
>  #include "../reftable/reftable-merged.h"
>  #include "../setup.h"
>  #include "../strmap.h"
> +#include "parse.h"
>  #include "refs-internal.h"
>  
>  /*
> @@ -248,6 +249,9 @@ static struct ref_store *reftable_be_init(struct repository *repo,
>  	refs->write_options.hash_id = repo->hash_algo->format_id;
>  	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
>  
> +	if (!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1))
> +		refs->write_options.disable_auto_compact = 1;
> +

This could be simplified to:

```
refs->write_options.disable_auto_compact =
	!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
```

Patrick

>  	/*
>  	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
>  	 * This stack contains both the shared and the main worktree refs.
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 931d888bbbc..c9e10b34684 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -299,6 +299,27 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>  
> +test_expect_success 'ref transaction: env var disables compaction' '
> +	test_when_finished "rm -rf repo" &&
> +
> +	git init repo &&
> +	test_commit -C repo A &&
> +
> +	start=$(wc -l <repo/.git/reftable/tables.list) &&
> +	iterations=5 &&
> +	expected=$((start + iterations)) &&
> +
> +	for i in $(test_seq $iterations)
> +	do
> +		GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
> +		git -C repo update-ref branch-$i HEAD || return 1
> +	done &&
> +	test_line_count = $expected repo/.git/reftable/tables.list &&
> +
> +	git -C repo update-ref foo HEAD &&
> +	test_line_count -lt $expected repo/.git/reftable/tables.list
> +'
> +
>  check_fsync_events () {
>  	local trace="$1" &&
>  	shift &&
> -- 
> gitgitgadget
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: [PATCH v5 2/3] reftable/stack: add env to disable autocompaction
  2024-04-08 6:12        ` Patrick Steinhardt
@ 2024-04-08 16:18         ` Junio C Hamano
  0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2024-04-08 16:18 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Justin Tobler via GitGitGadget, git, Karthik Nayak, Han-Wen Nienhuys,
      Justin Tobler

Patrick Steinhardt <ps@pks.im> writes:

>> @@ -248,6 +249,9 @@ static struct ref_store *reftable_be_init(struct repository *repo,
>>  	refs->write_options.hash_id = repo->hash_algo->format_id;
>>  	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
>>  
>> +	if (!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1))
>> +		refs->write_options.disable_auto_compact = 1;
>> +
>
> This could be simplified to:
>
> ```
> refs->write_options.disable_auto_compact =
> 	!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
> ```

I presume that the .disable_auto_compact member is off initially,
given that this is inside reftable_be_init(), but your rewrite makes
it easier on readers, as they do not have to know what value the
member originally has at this point in the flow. So even though it
replaces two lines of code with another two lines of code, it does
count as a valuable simplification at the conceptual level.

Thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread
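Junio's point can be shown with a small, self-contained sketch. It uses a hypothetical `demo_env_bool()` that only loosely imitates git's internal `git_env_bool()`. It treats an unset variable as the default, treats empty, `false`, `no`, `off` or `0` as false, and treats everything else as true; unlike git, it does not report an error for unrecognized values. The options struct is also a trimmed-down, hypothetical stand-in that keeps only the `disable_auto_compact` bitfield. Because the member is assigned the negated value directly, its prior state no longer matters:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in; only the bitfield mirrors the real option. */
struct demo_write_options {
	unsigned disable_auto_compact : 1;
};

/*
 * Hypothetical, simplified stand-in for git's git_env_bool(): returns
 * `def` when the variable is unset, 0 for "", "false", "no", "off" or
 * "0", and 1 for any other value (git itself rejects unknown values).
 */
static int demo_env_bool(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	if (!*v || !strcasecmp(v, "false") || !strcasecmp(v, "no") ||
	    !strcasecmp(v, "off") || !strcmp(v, "0"))
		return 0;
	return 1;
}

static void demo_init(struct demo_write_options *opts)
{
	/*
	 * Direct assignment, as suggested in review: the result does not
	 * depend on whatever value the member held before this point.
	 */
	opts->disable_auto_compact =
		!demo_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
}
```

Starting from a struct whose flag is already set, `demo_init()` clears it when the variable is unset. The conditional `if (...) ... = 1;` form would leave such a stale value in place.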
* [PATCH v5 3/3] reftable/stack: use geometric table compaction
  2024-04-04 18:29   ` [PATCH v5 0/3] " Justin Tobler via GitGitGadget
  2024-04-04 18:29     ` [PATCH v5 1/3] reftable/stack: allow disabling of auto-compaction Justin Tobler via GitGitGadget
  2024-04-04 18:29     ` [PATCH v5 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
@ 2024-04-04 18:29     ` Justin Tobler via GitGitGadget
  2024-04-08 6:12      ` [PATCH v5 0/3] " Patrick Steinhardt
  2024-04-08 16:16     ` [PATCH v6 " Justin Tobler via GitGitGadget
  4 siblings, 0 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-04-04 18:29 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler,
      Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables whose sizes have the same binary logarithm are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. The
table list can therefore grow without bound. This can be demonstrated
by repeating the following sequence:

git branch -f foo
git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.

Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order.
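To make the unbounded-growth argument concrete, the sketch below condenses the previous strategy into one standalone function. It is reassembled from the `fastlog2()`, `sizes_to_segments()` and `suggest_compaction_segment()` code that this patch removes. The segments are built inline rather than in an allocated array, and `old_suggest()` and `struct old_segment` are names invented for this sketch. The table sizes in the usage note are hypothetical, not measured from `git branch -f`/`git branch -d`:

```c
#include <stddef.h>
#include <stdint.h>

/* Same as the removed fastlog2(): floor(log2(sz)), and 0 for sz == 0. */
static int fastlog2(uint64_t sz)
{
	int l = 0;
	if (sz == 0)
		return 0;
	for (; sz; sz /= 2)
		l++;
	return l - 1;
}

struct old_segment {
	size_t start, end; /* end is exclusive */
	int log;
	uint64_t bytes;
};

/*
 * Condensed sketch of the pre-patch strategy: group contiguous tables
 * with equal fastlog2(size), pick the first lowest-log group that holds
 * more than one table, then extend it backwards while the preceding
 * table is not in a larger log bucket than the accumulated bytes.
 */
static struct old_segment old_suggest(const uint64_t *sizes, size_t n)
{
	struct old_segment min_seg = { .log = 64 };
	struct old_segment cur = { 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		int log = fastlog2(sizes[i]);
		if (cur.log != log && cur.bytes > 0) {
			/* Close the current group; single tables are skipped. */
			if (cur.end - cur.start > 1 && cur.log < min_seg.log)
				min_seg = cur;
			cur = (struct old_segment){ .start = i };
		}
		cur.log = log;
		cur.end = i + 1;
		cur.bytes += sizes[i];
	}
	if (n > 0 && cur.end - cur.start > 1 && cur.log < min_seg.log)
		min_seg = cur;

	while (min_seg.start > 0) {
		size_t prev = min_seg.start - 1;
		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
			break;
		min_seg.start = prev;
		min_seg.bytes += sizes[prev];
	}
	return min_seg;
}
```

With alternating sizes such as `{ 100, 3, 1, 3, 1, 3, 1 }`, neighboring tables never share a `fastlog2()` bucket. Every group therefore holds a single table, the suggested segment is empty, and each further append only lengthens the list. A repeated size such as `{ 100, 3, 3 }` does yield the two-table segment `[1, 3)`.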
This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.

When creating 10 thousand references, the new strategy has no
performance impact:

Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
  Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
  Range (min … max): 26.447 s … 26.569 s 10 runs

Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
  Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
  Range (min … max): 26.366 s … 26.444 s 10 runs

Summary
  update-ref: create refs sequentially (revision = HEAD) ran
    1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.

In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/stack.c           | 123 +++++++++++++++++++------------------
 reftable/stack.h           |   3 -
 reftable/stack_test.c      |  66 ++++----------------
 t/t0610-reftable-basics.sh |  50 ++++++++++-----
 4 files changed, 111 insertions(+), 131 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 1a7cdad12c9..80266bcbab1 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1216,75 +1216,76 @@ static int segment_size(struct segment *s)
 	return s->end - s->start;
 }
 
-int fastlog2(uint64_t sz)
-{
-	int l = 0;
-	if (sz == 0)
-		return 0;
-	for (; sz; sz /= 2) {
-		l++;
-	}
-	return l - 1;
-}
-
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
-{
-	struct segment *segs = reftable_calloc(n, sizeof(*segs));
-	struct segment cur = { 0 };
-	size_t next = 0, i;
-
-	if (n == 0) {
-		*seglen = 0;
-		return segs;
-	}
-	for (i = 0; i < n; i++) {
-		int log = fastlog2(sizes[i]);
-		if (cur.log != log && cur.bytes > 0) {
-			struct segment fresh = {
-				.start = i,
-			};
-
-			segs[next++] = cur;
-			cur = fresh;
-		}
-
-		cur.log = log;
-		cur.end = i + 1;
-		cur.bytes += sizes[i];
-	}
-	segs[next++] = cur;
-	*seglen = next;
-	return segs;
-}
-
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 {
-	struct segment min_seg = {
-		.log = 64,
-	};
-	struct segment *segs;
-	size_t seglen = 0, i;
-
-	segs = sizes_to_segments(&seglen, sizes, n);
-	for (i = 0; i < seglen; i++) {
-		if (segment_size(&segs[i]) == 1)
-			continue;
+	struct segment seg = { 0 };
+	uint64_t bytes;
+	size_t i;
 
-		if (segs[i].log < min_seg.log)
-			min_seg = segs[i];
-	}
+	/*
+	 * If there are no tables or only a single one then we don't have to
+	 * compact anything. The sequence is geometric by definition already.
+	 */
+	if (n <= 1)
+		return seg;
 
-	while (min_seg.start > 0) {
-		size_t prev = min_seg.start - 1;
-		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
+	/*
+	 * Find the ending table of the compaction segment needed to restore the
+	 * geometric sequence. Note that the segment end is exclusive.
+	 *
+	 * To do so, we iterate backwards starting from the most recent table
+	 * until a valid segment end is found. If the preceding table is smaller
+	 * than the current table multiplied by the geometric factor (2), the
+	 * compaction segment end has been identified.
+	 *
+	 * Tables after the ending point are not added to the byte count because
+	 * they are already valid members of the geometric sequence. Due to the
+	 * properties of a geometric sequence, it is not possible for the sum of
+	 * these tables to exceed the value of the ending point table.
+	 *
+	 * Example table size sequence requiring no compaction:
+	 * 64, 32, 16, 8, 4, 2, 1
+	 *
+	 * Example table size sequence where compaction segment end is set to
+	 * the last table. Since the segment end is exclusive, the last table is
+	 * excluded during subsequent compaction and the table with size 3 is
+	 * the final table included:
+	 * 64, 32, 16, 8, 4, 3, 1
+	 */
+	for (i = n - 1; i > 0; i--) {
+		if (sizes[i - 1] < sizes[i] * 2) {
+			seg.end = i + 1;
+			bytes = sizes[i];
 			break;
+		}
+	}
 
-		min_seg.start = prev;
-		min_seg.bytes += sizes[prev];
+	/*
+	 * Find the starting table of the compaction segment by iterating
+	 * through the remaining tables and keeping track of the accumulated
+	 * size of all tables seen from the segment end table. The previous
+	 * table is compared to the accumulated size because the tables from the
+	 * segment end are merged backwards recursively.
+	 *
+	 * Note that we keep iterating even after we have found the first
+	 * starting point. This is because there may be tables in the stack
+	 * preceding that first starting point which violate the geometric
+	 * sequence.
+	 *
+	 * Example compaction segment start set to table with size 32:
+	 * 128, 32, 16, 8, 4, 3, 1
+	 */
+	for (; i > 0; i--) {
+		uint64_t curr = bytes;
+		bytes += sizes[i - 1];
+
+		if (sizes[i - 1] < curr * 2) {
+			seg.start = i - 1;
+			seg.bytes = bytes;
+		}
 	}
 
-	reftable_free(segs);
-	return min_seg;
+	return seg;
 }
 
 static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
diff --git a/reftable/stack.h b/reftable/stack.h
index c862053025f..d43efa47607 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -32,12 +32,9 @@ int read_lines(const char *filename, char ***lines);
 
 struct segment {
 	size_t start, end;
-	int log;
 	uint64_t bytes;
 };
 
-int fastlog2(uint64_t sz);
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
 
 #endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 4fec823f14f..1df3ffce526 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -770,59 +770,13 @@ static void test_reftable_stack_hash_id(void)
 	clear_dir(dir);
 }
 
-static void test_log2(void)
-{
-	EXPECT(1 == fastlog2(3));
-	EXPECT(2 == fastlog2(4));
-	EXPECT(2 == fastlog2(5));
-}
-
-static void test_sizes_to_segments(void)
-{
-	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
-	/* .................0  1  2  3  4  5 */
-
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(segs[2].log == 3);
-	EXPECT(segs[2].start == 5);
-	EXPECT(segs[2].end == 6);
-
-	EXPECT(segs[1].log == 2);
-	EXPECT(segs[1].start == 2);
-	EXPECT(segs[1].end == 5);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_empty(void)
-{
-	size_t seglen = 0;
-	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
-	EXPECT(seglen == 0);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_all_equal(void)
-{
-	uint64_t sizes[] = { 5, 5 };
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(seglen == 1);
-	EXPECT(segs[0].start == 0);
-	EXPECT(segs[0].end == 2);
-	reftable_free(segs);
-}
-
 static void test_suggest_compaction_segment(void)
 {
-	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
-	/* .................0    1   2   3   4  5  6 */
+	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
 	struct segment min =
 		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
-	EXPECT(min.start == 2);
-	EXPECT(min.end == 7);
+	EXPECT(min.start == 1);
+	EXPECT(min.end == 10);
 }
 
 static void test_suggest_compaction_segment_nothing(void)
@@ -933,6 +887,16 @@ static void test_empty_add(void)
 	reftable_stack_destroy(st2);
 }
 
+static int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2)
+		l++;
+	return l - 1;
+}
+
 static void test_reftable_stack_auto_compaction(void)
 {
 	struct reftable_write_options cfg = {
@@ -1122,7 +1086,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
 int stack_test_main(int argc, const char *argv[])
 {
 	RUN_TEST(test_empty_add);
-	RUN_TEST(test_log2);
 	RUN_TEST(test_names_equal);
 	RUN_TEST(test_parse_names);
 	RUN_TEST(test_read_file);
@@ -1143,9 +1106,6 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_update_index_check);
 	RUN_TEST(test_reftable_stack_uptodate);
 	RUN_TEST(test_reftable_stack_validate_refname);
-	RUN_TEST(test_sizes_to_segments);
-	RUN_TEST(test_sizes_to_segments_all_equal);
-	RUN_TEST(test_sizes_to_segments_empty);
 	RUN_TEST(test_suggest_compaction_segment);
 	RUN_TEST(test_suggest_compaction_segment_nothing);
 	return 0;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index c9e10b34684..8eec093788d 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag A &&
-	test_line_count = 2 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag B &&
 	test_line_count = 1 repo/.git/reftable/tables.list
@@ -320,6 +320,19 @@ test_expect_success 'ref transaction: env var disables compaction' '
 	test_line_count -lt $expected repo/.git/reftable/tables.list
 '
 
+test_expect_success 'ref transaction: alternating table sizes are compacted' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+	for i in $(test_seq 5)
+	do
+		git -C repo branch -f foo &&
+		git -C repo branch -d foo || return 1
+	done &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
 check_fsync_events () {
 	local trace="$1" &&
 	shift &&
@@ -345,7 +358,7 @@ test_expect_success 'ref transaction: writes are synced' '
 	git -C repo -c core.fsync=reference \
 		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
 	check_fsync_events trace2.txt <<-EOF
-	"name":"hardware-flush","count":2
+	"name":"hardware-flush","count":4
 	EOF
 '
 
@@ -377,7 +390,7 @@ test_expect_success 'ref transaction: fails gracefully when auto compaction fail
 			done ||
 			exit 1
 		done &&
-		test_line_count = 13 .git/reftable/tables.list
+		test_line_count = 10 .git/reftable/tables.list
 	)
 '
 
@@ -387,8 +400,8 @@ test_expect_success 'pack-refs: compacts tables' '
 
 	test_commit -C repo A &&
 	ls -1 repo/.git/reftable >table-files &&
-	test_line_count = 4 table-files &&
-	test_line_count = 3 repo/.git/reftable/tables.list &&
+	test_line_count = 3 table-files &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
 
 	git -C repo pack-refs &&
 	ls -1 repo/.git/reftable >table-files &&
@@ -429,7 +442,7 @@ test_expect_success "$command: auto compaction" '
 		# The tables should have been auto-compacted, and thus auto
 		# compaction should not have to do anything.
 		ls -1 .git/reftable >tables-expect &&
-		test_line_count = 4 tables-expect &&
+		test_line_count = 3 tables-expect &&
 		git $command --auto &&
 		ls -1 .git/reftable >tables-actual &&
 		test_cmp tables-expect tables-actual &&
@@ -447,7 +460,7 @@ test_expect_success "$command: auto compaction" '
 		git branch B &&
 		git branch C &&
 		rm .git/reftable/*.lock &&
-		test_line_count = 5 .git/reftable/tables.list &&
+		test_line_count = 4 .git/reftable/tables.list &&
 
 		git $command --auto &&
 		test_line_count = 1 .git/reftable/tables.list
@@ -479,7 +492,7 @@ do
 			umask $umask &&
 			git init --shared=true repo &&
 			test_commit -C repo A &&
-			test_line_count = 3 repo/.git/reftable/tables.list
+			test_line_count = 2 repo/.git/reftable/tables.list
 		) &&
 		git -C repo pack-refs &&
 		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
@@ -847,12 +860,16 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
+
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
 	git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
+	git -C worktree update-ref refs/worktree/per-worktree HEAD &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
 	git -C repo pack-refs &&
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
@@ -860,13 +877,17 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
 	test_when_finished "rm -rf repo worktree" &&
 	git init repo &&
 	test_commit -C repo A &&
+
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
 	git -C repo worktree add ../worktree &&
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
+	git -C worktree update-ref refs/worktree/per-worktree HEAD &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
 	git -C worktree pack-refs &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list
+	test_line_count = 3 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: creating shared ref updates main stack' '
@@ -880,6 +901,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
+	GIT_TEST_REFTABLE_AUTOCOMPACTION=false \
 	git -C worktree update-ref refs/heads/shared HEAD &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 2 repo/.git/reftable/tables.list
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 52+ messages in thread
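The new `suggest_compaction_segment()` can be run outside the reftable library with the sketch below. The function body follows the patch, except that `bytes` is zero-initialized for safety. Two helpers are hypothetical and not part of the patch. `append_and_compact()` models compaction as simply summing the sizes of the merged tables, on the assumption that compaction runs whenever the suggested segment is non-empty. `simulate_alternating()` uses it to model repeated appends. The example sizes come from the patch's comments and `test_suggest_compaction_segment()`, plus an alternating 3/1 pattern that only loosely stands in for the `git branch -f foo` / `git branch -d foo` test:

```c
#include <stddef.h>
#include <stdint.h>

struct segment {
	size_t start, end; /* end is exclusive */
	uint64_t bytes;
};

/* Standalone copy of the patch's algorithm (bytes zero-initialized). */
static struct segment suggest_compaction_segment(const uint64_t *sizes,
						 size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	if (n <= 1)
		return seg;

	/* Walk back from the newest table to the first factor-2 violation. */
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}

	/* Move the start back while the merged size would still violate it. */
	for (; i > 0; i--) {
		uint64_t curr = bytes;
		bytes += sizes[i - 1];

		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}

	return seg;
}

/*
 * Hypothetical toy model: append a table, then replace sizes[start..end)
 * with one table of seg.bytes if the suggested segment is non-empty.
 * Real compaction rewrites records, so merged sizes are approximate.
 * `sizes` must have room for one more entry than the current length.
 */
static size_t append_and_compact(uint64_t *sizes, size_t n, uint64_t size)
{
	struct segment seg;
	size_t i, removed;

	sizes[n++] = size;
	seg = suggest_compaction_segment(sizes, n);
	if (seg.end == seg.start)
		return n;

	removed = seg.end - seg.start - 1;
	sizes[seg.start] = seg.bytes;
	for (i = seg.end; i < n; i++)
		sizes[i - removed] = sizes[i];
	return n - removed;
}

/* Hypothetical: alternate table sizes `a` and `b` after a first table. */
static size_t simulate_alternating(size_t rounds, uint64_t first,
				   uint64_t a, uint64_t b)
{
	uint64_t sizes[64]; /* ample for the small sizes used here */
	size_t n = 0, r;

	n = append_and_compact(sizes, n, first);
	for (r = 0; r < rounds; r++) {
		n = append_and_compact(sizes, n, a);
		n = append_and_compact(sizes, n, b);
	}
	return n;
}
```

For `{ 64, 32, 16, 8, 4, 3, 1 }` the suggested segment is `[0, 6)` with 127 bytes, which leaves `{ 127, 1 }`. For `{ 128, 32, 16, 8, 4, 3, 1 }` the 128-byte table is large enough to stay, giving `[1, 6)` with 63 bytes. After every append in this toy model, the stack is geometric: each table is at least twice the size of the next. The number of tables is therefore at most log2 of the total size plus one, so the alternating pattern no longer grows without bound.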
* Re: [PATCH v5 0/3] reftable/stack: use geometric table compaction
  2024-04-04 18:29   ` [PATCH v5 0/3] " Justin Tobler via GitGitGadget
                       ` (2 preceding siblings ...)
  2024-04-04 18:29     ` [PATCH v5 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
@ 2024-04-08 6:12      ` Patrick Steinhardt
  2024-04-08 16:17       ` Justin Tobler
  2024-04-08 16:16     ` [PATCH v6 " Justin Tobler via GitGitGadget
  4 siblings, 1 reply; 52+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 6:12 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget
  Cc: git, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 6087 bytes --]

On Thu, Apr 04, 2024 at 06:29:26PM +0000, Justin Tobler via GitGitGadget wrote:
> Hello again,
> 
> This is the fifth version my patch series that refactors the reftable
> compaction strategy to instead follow a geometric sequence. Changes compared
> to v4:
> 
>  * To fix some failing tests and conflicts, this patch series now depends on
>    the ps/pack-refs-auto series which is currently in next.
>  * Lifted the GIT_TEST_REFTABLE_AUTOCOMPACTION env out of the reftable
>    library and into the reftable backend code.
> 
> Thanks for taking a look!
> 
> -Justin

I've added two additional nits which you may or may not want to address.
But overall this patch series looks good to me. Thanks!

Patrick

> Justin Tobler (3):
>   reftable/stack: allow disabling of auto-compaction
>   reftable/stack: add env to disable autocompaction
>   reftable/stack: use geometric table compaction
> 
>  refs/reftable-backend.c    |   4 ++
>  reftable/reftable-writer.h |   3 +
>  reftable/stack.c           | 125 +++++++++++++++++++------------------
>  reftable/stack.h           |   4 --
>  reftable/stack_test.c      |  77 ++++++-----------------
>  t/t0610-reftable-basics.sh |  71 ++++++++++++++++-----
>  6 files changed, 146 insertions(+), 138 deletions(-)
> 
> 
> base-commit: 4b32163adf4863c6df3bb6b43540fa2ca3494e28
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v5
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v5
> Pull-Request: https://github.com/gitgitgadget/git/pull/1683
> 
> Range-diff vs v4:
> 
>  -:  ----------- > 1:  a7011dbc6aa reftable/stack: allow disabling of auto-compaction
>  1:  2a0421e5f20 ! 2:  7c4fe0e9ec5 reftable/stack: add env to disable autocompaction
>     @@ Commit message
>      
>          Signed-off-by: Justin Tobler <jltobler@gmail.com>
>      
>     -    ## reftable/stack.c ##
>     -@@ reftable/stack.c: int reftable_addition_commit(struct reftable_addition *add)
>     - 	if (err)
>     - 		goto done;
>     - 
>     --	if (!add->stack->disable_auto_compact)
>     -+	if (!add->stack->disable_auto_compact &&
>     -+	    git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1))
>     - 		err = reftable_stack_auto_compact(add->stack);
>     - 
>     - done:
>     -
>     -    ## reftable/system.h ##
>     -@@ reftable/system.h: license that can be found in the LICENSE file or at
>     - #include "tempfile.h"
>     - #include "hash-ll.h" /* hash ID, sizes.*/
>     - #include "dir.h" /* remove_dir_recursively, for tests.*/
>     +    ## refs/reftable-backend.c ##
>     +@@
>     + #include "../reftable/reftable-merged.h"
>     + #include "../setup.h"
>     + #include "../strmap.h"
>      +#include "parse.h"
>     + #include "refs-internal.h"
>      
>     - int hash_size(uint32_t id);
>     + /*
>     +@@ refs/reftable-backend.c: static struct ref_store *reftable_be_init(struct repository *repo,
>     + 	refs->write_options.hash_id = repo->hash_algo->format_id;
>     + 	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
>      
>     ++	if (!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1))
>     ++		refs->write_options.disable_auto_compact = 1;
>     ++
>     + 	/*
>     + 	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
>     + 	 * This stack contains both the shared and the main worktree refs.
>      
>       ## t/t0610-reftable-basics.sh ##
>      @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause auto-compaction' '
>  2:  e0f4d0dbcc1 ! 3:  8f124acf0f8 reftable/stack: use geometric table compaction
>     @@ reftable/stack_test.c: static void test_empty_add(void)
>      +
>       static void test_reftable_stack_auto_compaction(void)
>       {
>     -	struct reftable_write_options cfg = { 0 };
>     +	struct reftable_write_options cfg = {
>     @@ reftable/stack_test.c: static void test_reftable_stack_compaction_concurrent_clean(void)
>       int stack_test_main(int argc, const char *argv[])
>       {
>     @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes are syn
>       	EOF
>       '
>      
>     +@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: fails gracefully when auto compaction fail
>     + 			done ||
>     + 			exit 1
>     + 		done &&
>     +-		test_line_count = 13 .git/reftable/tables.list
>     ++		test_line_count = 10 .git/reftable/tables.list
>     + 	)
>     + '
>     +
>      @@ t/t0610-reftable-basics.sh: test_expect_success 'pack-refs: compacts tables' '
>       
>       	test_commit -C repo A &&
>      @@ t/t0610-reftable-basics.sh: test_expect_success 'pack-refs: compacts tables' '
>       
>       	git -C repo pack-refs &&
>       	ls -1 repo/.git/reftable >table-files &&
>     +@@ t/t0610-reftable-basics.sh: test_expect_success "$command: auto compaction" '
>     + 		# The tables should have been auto-compacted, and thus auto
>     + 		# compaction should not have to do anything.
>     + 		ls -1 .git/reftable >tables-expect &&
>     +-		test_line_count = 4 tables-expect &&
>     ++		test_line_count = 3 tables-expect &&
>     + 		git $command --auto &&
>     + 		ls -1 .git/reftable >tables-actual &&
>     + 		test_cmp tables-expect tables-actual &&
>     +@@ t/t0610-reftable-basics.sh: test_expect_success "$command: auto compaction" '
>     + 		git branch B &&
>     + 		git branch C &&
>     + 		rm .git/reftable/*.lock &&
>     +-		test_line_count = 5 .git/reftable/tables.list &&
>     ++		test_line_count = 4 .git/reftable/tables.list &&
>     + 
>     + 		git $command --auto &&
>     + 		test_line_count = 1 .git/reftable/tables.list
>      @@ t/t0610-reftable-basics.sh: do
>       			umask $umask &&
>       			git init --shared=true repo &&
> 
> -- 
> gitgitgadget

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: [PATCH v5 0/3] reftable/stack: use geometric table compaction
  2024-04-08 6:12      ` [PATCH v5 0/3] " Patrick Steinhardt
@ 2024-04-08 16:17       ` Justin Tobler
  0 siblings, 0 replies; 52+ messages in thread
From: Justin Tobler @ 2024-04-08 16:17 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Justin Tobler via GitGitGadget, git, Karthik Nayak, Han-Wen Nienhuys

On 24/04/08 08:12AM, Patrick Steinhardt wrote:
> On Thu, Apr 04, 2024 at 06:29:26PM +0000, Justin Tobler via GitGitGadget wrote:
> > Hello again,
> > 
> > This is the fifth version my patch series that refactors the reftable
> > compaction strategy to instead follow a geometric sequence. Changes compared
> > to v4:
> > 
> >  * To fix some failing tests and conflicts, this patch series now depends on
> >    the ps/pack-refs-auto series which is currently in next.
> >  * Lifted the GIT_TEST_REFTABLE_AUTOCOMPACTION env out of the reftable
> >    library and into the reftable backend code.
> > 
> > Thanks for taking a look!
> > 
> > -Justin
> 
> I've added two additional nits which you may or may not want to address.
> But overall this patch series looks good to me. Thanks!

Thanks Patrick! I've updated per your suggestions in the next patch
series version.

-Justin

^ permalink raw reply	[flat|nested] 52+ messages in thread
* [PATCH v6 0/3] reftable/stack: use geometric table compaction
  2024-04-04 18:29   ` [PATCH v5 0/3] " Justin Tobler via GitGitGadget
                       ` (3 preceding siblings ...)
  2024-04-08 6:12      ` [PATCH v5 0/3] " Patrick Steinhardt
@ 2024-04-08 16:16     ` Justin Tobler via GitGitGadget
  2024-04-08 16:16       ` [PATCH v6 1/3] reftable/stack: expose option to disable auto-compaction Justin Tobler via GitGitGadget
                         ` (3 more replies)
  4 siblings, 4 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-04-08 16:16 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler

Hello again,

This is the sixth version of my patch series that refactors the reftable
compaction strategy to instead follow a geometric sequence. Changes compared
to v5:

 * Reworded commit message to more clearly explain that the already
   existing configuration to disable auto-compaction is being exposed to
   callers of the library.
 * Simplified expression to set the disable_auto_compact configuration.

Thanks for taking a look!

-Justin

Justin Tobler (3):
  reftable/stack: expose option to disable auto-compaction
  reftable/stack: add env to disable autocompaction
  reftable/stack: use geometric table compaction

 refs/reftable-backend.c    |   3 +
 reftable/reftable-writer.h |   3 +
 reftable/stack.c           | 125 +++++++++++++++++++------------------
 reftable/stack.h           |   4 --
 reftable/stack_test.c      |  77 ++++++-----------------
 t/t0610-reftable-basics.sh |  71 ++++++++++++++++-----
 6 files changed, 145 insertions(+), 138 deletions(-)


base-commit: 4b32163adf4863c6df3bb6b43540fa2ca3494e28
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/1683

Range-diff vs v5:

 1:  a7011dbc6aa ! 1:  9c8f6b336ec reftable/stack: allow disabling of auto-compaction
    @@ Metadata
     Author: Justin Tobler <jltobler@gmail.com>
     
      ## Commit message ##
    -    reftable/stack: allow disabling of auto-compaction
    +    reftable/stack: expose option to disable auto-compaction
    +
    +    The reftable stack already has a variable to configure whether or not to
    +    run auto-compaction, but it is inaccessible to users of the library.
    +    There exist use cases where a caller may want to have more control over
    +    auto-compaction.
     
         Move the `disable_auto_compact` option into `reftable_write_options` to
    -    allow a stack to be configured with auto-compaction disabled. In a
    -    subsequent commit, this is used to disable auto-compaction when a
    -    specific environment variable is set.
    +    allow external callers to disable auto-compaction. This will be used in
    +    a subsequent commit.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
 2:  7c4fe0e9ec5 ! 2:  c7bc7346540 reftable/stack: add env to disable autocompaction
    @@ refs/reftable-backend.c
      /*
     @@ refs/reftable-backend.c: static struct ref_store *reftable_be_init(struct repository *repo,
    + 	refs->write_options.block_size = 4096;
      	refs->write_options.hash_id = repo->hash_algo->format_id;
      	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
    ++	refs->write_options.disable_auto_compact =
    ++		!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
      
    -+	if (!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1))
    -+		refs->write_options.disable_auto_compact = 1;
    -+
      	/*
      	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
    - 	 * This stack contains both the shared and the main worktree refs.
     
      ## t/t0610-reftable-basics.sh ##
     @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause auto-compaction' '
 3:  8f124acf0f8 = 3:  d75494a88b0 reftable/stack: use geometric table compaction

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 52+ messages in thread
* [PATCH v6 1/3] reftable/stack: expose option to disable auto-compaction
  2024-04-08 16:16     ` [PATCH v6 " Justin Tobler via GitGitGadget
@ 2024-04-08 16:16       ` Justin Tobler via GitGitGadget
  2024-04-08 16:16       ` [PATCH v6 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Justin Tobler via GitGitGadget @ 2024-04-08 16:16 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler,
      Justin Tobler

From: Justin Tobler <jltobler@gmail.com>

The reftable stack already has a variable to configure whether or not to
run auto-compaction, but it is inaccessible to users of the library.
There exist use cases where a caller may want to have more control over
auto-compaction.

Move the `disable_auto_compact` option into `reftable_write_options` to
allow external callers to disable auto-compaction. This will be used in
a subsequent commit.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 reftable/reftable-writer.h |  3 +++
 reftable/stack.c           |  2 +-
 reftable/stack.h           |  1 -
 reftable/stack_test.c      | 11 ++++++-----
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 7c7cae5f99b..155bf0bbe2a 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -46,6 +46,9 @@ struct reftable_write_options {
 	 * is a single line, and add '\n' if missing.
 	 */
 	unsigned exact_log_message : 1;
+
+	/* boolean: Prevent auto-compaction of tables. */
+	unsigned disable_auto_compact : 1;
 };
 
 /* reftable_block_stats holds statistics for a single block type */
diff --git a/reftable/stack.c b/reftable/stack.c
index dde50b61d69..1a7cdad12c9 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -680,7 +680,7 @@ int reftable_addition_commit(struct reftable_addition *add)
 	if (err)
 		goto done;
 
-	if (!add->stack->disable_auto_compact) {
+	if (!add->stack->config.disable_auto_compact) {
 		/*
 		 * Auto-compact the stack to keep the number of tables in
 		 * control. It is possible that a concurrent writer is already
diff --git a/reftable/stack.h b/reftable/stack.h
index d919455669e..c862053025f 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -19,7 +19,6 @@ struct reftable_stack {
 
 	int list_fd;
 	char *reftable_dir;
-	int disable_auto_compact;
 
 	struct reftable_write_options config;
 
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 351e35bd86d..4fec823f14f 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -325,7 +325,7 @@ static void test_reftable_stack_transaction_api_performs_auto_compaction(void)
 		 * we can ensure that we indeed honor this setting and have
 		 * better control over when exactly auto compaction runs.
 		 */
-		st->disable_auto_compact = i != n;
+		st->config.disable_auto_compact = i != n;
 
 		err = reftable_stack_new_addition(&add, st);
 		EXPECT_ERR(err);
@@ -497,6 +497,7 @@ static void test_reftable_stack_add(void)
 	struct reftable_write_options cfg = {
 		.exact_log_message = 1,
 		.default_permissions = 0660,
+		.disable_auto_compact = 1,
 	};
 	struct reftable_stack *st = NULL;
 	char *dir = get_tmp_dir(__LINE__);
@@ -508,7 +509,6 @@ static void test_reftable_stack_add(void)
 
 	err = reftable_new_stack(&st, dir, cfg);
 	EXPECT_ERR(err);
-	st->disable_auto_compact = 1;
 
 	for (i = 0; i < N; i++) {
 		char buf[256];
@@ -935,7 +935,9 @@ static void test_empty_add(void)
 
 static void test_reftable_stack_auto_compaction(void)
 {
-	struct reftable_write_options cfg = { 0 };
+	struct reftable_write_options cfg = {
+		.disable_auto_compact = 1,
+	};
 	struct reftable_stack *st = NULL;
 	char *dir = get_tmp_dir(__LINE__);
 
@@ -945,7 +947,6 @@ static void test_reftable_stack_auto_compaction(void)
 
 	err = reftable_new_stack(&st, dir, cfg);
 	EXPECT_ERR(err);
-	st->disable_auto_compact = 1; /* call manually below for coverage. */
 	for (i = 0; i < N; i++) {
 		char name[100];
 		struct reftable_ref_record ref = {
@@ -994,7 +995,7 @@ static void test_reftable_stack_add_performs_auto_compaction(void)
 		 * we can ensure that we indeed honor this setting and have
 		 * better control over when exactly auto compaction runs.
*/ - st->disable_auto_compact = i != n; + st->config.disable_auto_compact = i != n; err = reftable_stack_new_addition(&add, st); EXPECT_ERR(err); @@ -497,6 +497,7 @@ static void test_reftable_stack_add(void) struct reftable_write_options cfg = { .exact_log_message = 1, .default_permissions = 0660, + .disable_auto_compact = 1, }; struct reftable_stack *st = NULL; char *dir = get_tmp_dir(__LINE__); @@ -508,7 +509,6 @@ static void test_reftable_stack_add(void) err = reftable_new_stack(&st, dir, cfg); EXPECT_ERR(err); - st->disable_auto_compact = 1; for (i = 0; i < N; i++) { char buf[256]; @@ -935,7 +935,9 @@ static void test_empty_add(void) static void test_reftable_stack_auto_compaction(void) { - struct reftable_write_options cfg = { 0 }; + struct reftable_write_options cfg = { + .disable_auto_compact = 1, + }; struct reftable_stack *st = NULL; char *dir = get_tmp_dir(__LINE__); @@ -945,7 +947,6 @@ static void test_reftable_stack_auto_compaction(void) err = reftable_new_stack(&st, dir, cfg); EXPECT_ERR(err); - st->disable_auto_compact = 1; /* call manually below for coverage. */ for (i = 0; i < N; i++) { char name[100]; struct reftable_ref_record ref = { @@ -994,7 +995,7 @@ static void test_reftable_stack_add_performs_auto_compaction(void) * we can ensure that we indeed honor this setting and have * better control over when exactly auto compaction runs. */ - st->disable_auto_compact = i != n; + st->config.disable_auto_compact = i != n; strbuf_reset(&refname); strbuf_addf(&refname, "branch-%04d", i); -- gitgitgadget ^ permalink raw reply related [flat|nested] 52+ messages in thread
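To illustrate what patch 1/3 changes for library users, here is a minimal sketch built on trimmed-down, hypothetical stand-ins (`struct write_options`, `struct stack`, `stack_init()`, `addition_commit()`), not the real reftable types. The flag lives in the options struct the caller passes in, the stack keeps a copy of it, and the commit path consults that copy, much as `reftable_addition_commit()` now checks `add->stack->config.disable_auto_compact`.

```c
#include <assert.h>

/* Hypothetical, trimmed-down mirror of reftable_write_options. */
struct write_options {
	unsigned exact_log_message : 1;
	/* boolean: Prevent auto-compaction of tables. */
	unsigned disable_auto_compact : 1;
};

/* Hypothetical, trimmed-down mirror of struct reftable_stack. */
struct stack {
	struct write_options config; /* copy of the caller's options */
	int auto_compactions;        /* how often auto-compaction would have run */
};

static void stack_init(struct stack *st, struct write_options opts)
{
	assert(st);
	st->config = opts;
	st->auto_compactions = 0;
}

/* Stand-in for the check at the end of reftable_addition_commit(). */
static void addition_commit(struct stack *st)
{
	assert(st);
	/* ... the new table would be written and added to tables.list here ... */
	if (!st->config.disable_auto_compact)
		st->auto_compactions++; /* the real code runs auto-compaction here */
}
```

With the option exposed, a caller such as the converted tests in `stack_test.c` can write `struct write_options opts = { .disable_auto_compact = 1 };` before creating the stack, instead of modifying stack internals after the fact.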
* [PATCH v6 2/3] reftable/stack: add env to disable autocompaction 2024-04-08 16:16 ` [PATCH v6 " Justin Tobler via GitGitGadget 2024-04-08 16:16 ` [PATCH v6 1/3] reftable/stack: expose option to disable auto-compaction Justin Tobler via GitGitGadget @ 2024-04-08 16:16 ` Justin Tobler via GitGitGadget 2024-04-08 16:16 ` [PATCH v6 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget 2024-04-08 16:20 ` [PATCH v6 0/3] " Patrick Steinhardt 3 siblings, 0 replies; 52+ messages in thread From: Justin Tobler via GitGitGadget @ 2024-04-08 16:16 UTC (permalink / raw) To: git Cc: Patrick Steinhardt, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler, Justin Tobler From: Justin Tobler <jltobler@gmail.com> In future tests it will be necessary to create repositories with a set number of tables. To make this easier, introduce the `GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set to false, disables autocompaction of reftables. Signed-off-by: Justin Tobler <jltobler@gmail.com> --- refs/reftable-backend.c | 3 +++ t/t0610-reftable-basics.sh | 21 +++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 0bed6d2ab48..1cda48c5046 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -18,6 +18,7 @@ #include "../reftable/reftable-merged.h" #include "../setup.h" #include "../strmap.h" +#include "parse.h" #include "refs-internal.h" /* @@ -247,6 +248,8 @@ static struct ref_store *reftable_be_init(struct repository *repo, refs->write_options.block_size = 4096; refs->write_options.hash_id = repo->hash_algo->format_id; refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask); + refs->write_options.disable_auto_compact = + !git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1); /* * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 931d888bbbc..c9e10b34684 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -299,6 +299,27 @@ test_expect_success 'ref transaction: writes cause auto-compaction' ' test_line_count = 1 repo/.git/reftable/tables.list ' +test_expect_success 'ref transaction: env var disables compaction' ' + test_when_finished "rm -rf repo" && + + git init repo && + test_commit -C repo A && + + start=$(wc -l <repo/.git/reftable/tables.list) && + iterations=5 && + expected=$((start + iterations)) && + + for i in $(test_seq $iterations) + do + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ + git -C repo update-ref branch-$i HEAD || return 1 + done && + test_line_count = $expected repo/.git/reftable/tables.list && + + git -C repo update-ref foo HEAD && + test_line_count -lt $expected repo/.git/reftable/tables.list +' + check_fsync_events () { local trace="$1" && shift && -- gitgitgadget ^ permalink raw reply related [flat|nested] 52+ messages in thread
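Patch 2/3 derives the option as `!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1)`, so auto-compaction stays on unless the variable is explicitly false. The sketch below shows that mapping with a hypothetical, simplified `env_bool()` helper in place of git's `git_env_bool()`; git's helper also parses integers and rejects values it cannot interpret, which this stand-in does not do.

```c
#define _POSIX_C_SOURCE 200809L /* expose strcasecmp() and POSIX env helpers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical, simplified stand-in for git_env_bool(): returns `def` when the
 * variable is unset, 0 for "", "0", "false", "no" or "off" (case-insensitive),
 * and 1 for any other value.
 */
static int env_bool(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	if (!*v || !strcmp(v, "0") || !strcasecmp(v, "false") ||
	    !strcasecmp(v, "no") || !strcasecmp(v, "off"))
		return 0;
	return 1;
}

/* Hypothetical mirror of the one reftable_write_options field used here. */
struct write_options {
	unsigned disable_auto_compact : 1;
};

static void init_write_options(struct write_options *opts)
{
	assert(opts);
	/* Auto-compaction stays enabled unless the variable is set to false. */
	opts->disable_auto_compact =
		!env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
}
```

This is the behavior the new test depends on: running `GIT_TEST_REFTABLE_AUTOCOMPACTION=false git -C repo update-ref ...` adds one table per command, while a later `update-ref` without the variable compacts again.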
* [PATCH v6 3/3] reftable/stack: use geometric table compaction 2024-04-08 16:16 ` [PATCH v6 " Justin Tobler via GitGitGadget 2024-04-08 16:16 ` [PATCH v6 1/3] reftable/stack: expose option to disable auto-compaction Justin Tobler via GitGitGadget 2024-04-08 16:16 ` [PATCH v6 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget @ 2024-04-08 16:16 ` Justin Tobler via GitGitGadget 2024-04-08 16:20 ` [PATCH v6 0/3] " Patrick Steinhardt 3 siblings, 0 replies; 52+ messages in thread From: Justin Tobler via GitGitGadget @ 2024-04-08 16:16 UTC (permalink / raw) To: git Cc: Patrick Steinhardt, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler, Justin Tobler From: Justin Tobler <jltobler@gmail.com> To reduce the number of on-disk reftables, compaction is performed. Contiguous tables with the same binary log value of size are grouped into segments. The segment that has both the lowest binary log value and contains more than one table is set as the starting point when identifying the compaction segment. Since segments containing a single table are not initially considered for compaction, if the table appended to the list does not match the previous table log value, no compaction occurs for the new table. It is therefore possible for unbounded growth of the table list. This can be demonstrated by repeating the following sequence: git branch -f foo git branch -d foo Each operation results in a new table being written with no compaction occurring until a separate operation produces a table matching the previous table log value. Instead, to avoid unbounded growth of the table list, the compaction strategy is updated to ensure tables follow a geometric sequence after each operation by individually evaluating each table in reverse index order. This strategy results in a much simpler and more robust algorithm compared to the previous one while also maintaining a minimal ordered set of tables on-disk. 
When creating 10 thousand references, the new strategy has no performance impact: Benchmark 1: update-ref: create refs sequentially (revision = HEAD~) Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s] Range (min … max): 26.447 s … 26.569 s 10 runs Benchmark 2: update-ref: create refs sequentially (revision = HEAD) Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s] Range (min … max): 26.366 s … 26.444 s 10 runs Summary update-ref: create refs sequentially (revision = HEAD) ran 1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~) Some tests in `t0610-reftable-basics.sh` assert the on-disk state of tables and are therefore updated to specify the correct new table count. Since compaction is more aggressive in ensuring tables maintain a geometric sequence, the expected table count is reduced in these tests. In `reftable/stack_test.c` tests related to `sizes_to_segments()` are removed because the function is no longer needed. Also, the `test_suggest_compaction_segment()` test is updated to better showcase and reflect the new geometric compaction behavior. 
Signed-off-by: Justin Tobler <jltobler@gmail.com> --- reftable/stack.c | 123 +++++++++++++++++++------------------ reftable/stack.h | 3 - reftable/stack_test.c | 66 ++++---------------- t/t0610-reftable-basics.sh | 50 ++++++++++----- 4 files changed, 111 insertions(+), 131 deletions(-) diff --git a/reftable/stack.c b/reftable/stack.c index 1a7cdad12c9..80266bcbab1 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -1216,75 +1216,76 @@ static int segment_size(struct segment *s) return s->end - s->start; } -int fastlog2(uint64_t sz) -{ - int l = 0; - if (sz == 0) - return 0; - for (; sz; sz /= 2) { - l++; - } - return l - 1; -} - -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n) -{ - struct segment *segs = reftable_calloc(n, sizeof(*segs)); - struct segment cur = { 0 }; - size_t next = 0, i; - - if (n == 0) { - *seglen = 0; - return segs; - } - for (i = 0; i < n; i++) { - int log = fastlog2(sizes[i]); - if (cur.log != log && cur.bytes > 0) { - struct segment fresh = { - .start = i, - }; - - segs[next++] = cur; - cur = fresh; - } - - cur.log = log; - cur.end = i + 1; - cur.bytes += sizes[i]; - } - segs[next++] = cur; - *seglen = next; - return segs; -} - struct segment suggest_compaction_segment(uint64_t *sizes, size_t n) { - struct segment min_seg = { - .log = 64, - }; - struct segment *segs; - size_t seglen = 0, i; - - segs = sizes_to_segments(&seglen, sizes, n); - for (i = 0; i < seglen; i++) { - if (segment_size(&segs[i]) == 1) - continue; + struct segment seg = { 0 }; + uint64_t bytes; + size_t i; - if (segs[i].log < min_seg.log) - min_seg = segs[i]; - } + /* + * If there are no tables or only a single one then we don't have to + * compact anything. The sequence is geometric by definition already. 
+ */ + if (n <= 1) + return seg; - while (min_seg.start > 0) { - size_t prev = min_seg.start - 1; - if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) + /* + * Find the ending table of the compaction segment needed to restore the + * geometric sequence. Note that the segment end is exclusive. + * + * To do so, we iterate backwards starting from the most recent table + * until a valid segment end is found. If the preceding table is smaller + * than the current table multiplied by the geometric factor (2), the + * compaction segment end has been identified. + * + * Tables after the ending point are not added to the byte count because + * they are already valid members of the geometric sequence. Due to the + * properties of a geometric sequence, it is not possible for the sum of + * these tables to exceed the value of the ending point table. + * + * Example table size sequence requiring no compaction: + * 64, 32, 16, 8, 4, 2, 1 + * + * Example table size sequence where compaction segment end is set to + * the last table. Since the segment end is exclusive, the last table is + * excluded during subsequent compaction and the table with size 3 is + * the final table included: + * 64, 32, 16, 8, 4, 3, 1 + */ + for (i = n - 1; i > 0; i--) { + if (sizes[i - 1] < sizes[i] * 2) { + seg.end = i + 1; + bytes = sizes[i]; break; + } + } - min_seg.start = prev; - min_seg.bytes += sizes[prev]; + /* + * Find the starting table of the compaction segment by iterating + * through the remaining tables and keeping track of the accumulated + * size of all tables seen from the segment end table. The previous + * table is compared to the accumulated size because the tables from the + * segment end are merged backwards recursively. + * + * Note that we keep iterating even after we have found the first + * starting point. This is because there may be tables in the stack + * preceding that first starting point which violate the geometric + * sequence. 
+ * + * Example compaction segment start set to table with size 32: + * 128, 32, 16, 8, 4, 3, 1 + */ + for (; i > 0; i--) { + uint64_t curr = bytes; + bytes += sizes[i - 1]; + + if (sizes[i - 1] < curr * 2) { + seg.start = i - 1; + seg.bytes = bytes; + } } - reftable_free(segs); - return min_seg; + return seg; } static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st) diff --git a/reftable/stack.h b/reftable/stack.h index c862053025f..d43efa47607 100644 --- a/reftable/stack.h +++ b/reftable/stack.h @@ -32,12 +32,9 @@ int read_lines(const char *filename, char ***lines); struct segment { size_t start, end; - int log; uint64_t bytes; }; -int fastlog2(uint64_t sz); -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n); struct segment suggest_compaction_segment(uint64_t *sizes, size_t n); #endif diff --git a/reftable/stack_test.c b/reftable/stack_test.c index 4fec823f14f..1df3ffce526 100644 --- a/reftable/stack_test.c +++ b/reftable/stack_test.c @@ -770,59 +770,13 @@ static void test_reftable_stack_hash_id(void) clear_dir(dir); } -static void test_log2(void) -{ - EXPECT(1 == fastlog2(3)); - EXPECT(2 == fastlog2(4)); - EXPECT(2 == fastlog2(5)); -} - -static void test_sizes_to_segments(void) -{ - uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 }; - /* .................0 1 2 3 4 5 */ - - size_t seglen = 0; - struct segment *segs = - sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes)); - EXPECT(segs[2].log == 3); - EXPECT(segs[2].start == 5); - EXPECT(segs[2].end == 6); - - EXPECT(segs[1].log == 2); - EXPECT(segs[1].start == 2); - EXPECT(segs[1].end == 5); - reftable_free(segs); -} - -static void test_sizes_to_segments_empty(void) -{ - size_t seglen = 0; - struct segment *segs = sizes_to_segments(&seglen, NULL, 0); - EXPECT(seglen == 0); - reftable_free(segs); -} - -static void test_sizes_to_segments_all_equal(void) -{ - uint64_t sizes[] = { 5, 5 }; - size_t seglen = 0; - struct segment *segs = - sizes_to_segments(&seglen, sizes, 
ARRAY_SIZE(sizes)); - EXPECT(seglen == 1); - EXPECT(segs[0].start == 0); - EXPECT(segs[0].end == 2); - reftable_free(segs); -} - static void test_suggest_compaction_segment(void) { - uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 }; - /* .................0 1 2 3 4 5 6 */ + uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 }; struct segment min = suggest_compaction_segment(sizes, ARRAY_SIZE(sizes)); - EXPECT(min.start == 2); - EXPECT(min.end == 7); + EXPECT(min.start == 1); + EXPECT(min.end == 10); } static void test_suggest_compaction_segment_nothing(void) @@ -933,6 +887,16 @@ static void test_empty_add(void) reftable_stack_destroy(st2); } +static int fastlog2(uint64_t sz) +{ + int l = 0; + if (sz == 0) + return 0; + for (; sz; sz /= 2) + l++; + return l - 1; +} + static void test_reftable_stack_auto_compaction(void) { struct reftable_write_options cfg = { @@ -1122,7 +1086,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void) int stack_test_main(int argc, const char *argv[]) { RUN_TEST(test_empty_add); - RUN_TEST(test_log2); RUN_TEST(test_names_equal); RUN_TEST(test_parse_names); RUN_TEST(test_read_file); @@ -1143,9 +1106,6 @@ int stack_test_main(int argc, const char *argv[]) RUN_TEST(test_reftable_stack_update_index_check); RUN_TEST(test_reftable_stack_uptodate); RUN_TEST(test_reftable_stack_validate_refname); - RUN_TEST(test_sizes_to_segments); - RUN_TEST(test_sizes_to_segments_all_equal); - RUN_TEST(test_sizes_to_segments_empty); RUN_TEST(test_suggest_compaction_segment); RUN_TEST(test_suggest_compaction_segment_nothing); return 0; diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index c9e10b34684..8eec093788d 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' ' test_line_count = 1 repo/.git/reftable/tables.list && test_commit -C repo --no-tag A && - test_line_count = 2 repo/.git/reftable/tables.list && 
+ test_line_count = 1 repo/.git/reftable/tables.list && test_commit -C repo --no-tag B && test_line_count = 1 repo/.git/reftable/tables.list @@ -320,6 +320,19 @@ test_expect_success 'ref transaction: env var disables compaction' ' test_line_count -lt $expected repo/.git/reftable/tables.list ' +test_expect_success 'ref transaction: alternating table sizes are compacted' ' + test_when_finished "rm -rf repo" && + + git init repo && + test_commit -C repo A && + for i in $(test_seq 5) + do + git -C repo branch -f foo && + git -C repo branch -d foo || return 1 + done && + test_line_count = 2 repo/.git/reftable/tables.list +' + check_fsync_events () { local trace="$1" && shift && @@ -345,7 +358,7 @@ test_expect_success 'ref transaction: writes are synced' ' git -C repo -c core.fsync=reference \ -c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD && check_fsync_events trace2.txt <<-EOF - "name":"hardware-flush","count":2 + "name":"hardware-flush","count":4 EOF ' @@ -377,7 +390,7 @@ test_expect_success 'ref transaction: fails gracefully when auto compaction fail done || exit 1 done && - test_line_count = 13 .git/reftable/tables.list + test_line_count = 10 .git/reftable/tables.list ) ' @@ -387,8 +400,8 @@ test_expect_success 'pack-refs: compacts tables' ' test_commit -C repo A && ls -1 repo/.git/reftable >table-files && - test_line_count = 4 table-files && - test_line_count = 3 repo/.git/reftable/tables.list && + test_line_count = 3 table-files && + test_line_count = 2 repo/.git/reftable/tables.list && git -C repo pack-refs && ls -1 repo/.git/reftable >table-files && @@ -429,7 +442,7 @@ test_expect_success "$command: auto compaction" ' # The tables should have been auto-compacted, and thus auto # compaction should not have to do anything. 
ls -1 .git/reftable >tables-expect && - test_line_count = 4 tables-expect && + test_line_count = 3 tables-expect && git $command --auto && ls -1 .git/reftable >tables-actual && test_cmp tables-expect tables-actual && @@ -447,7 +460,7 @@ test_expect_success "$command: auto compaction" ' git branch B && git branch C && rm .git/reftable/*.lock && - test_line_count = 5 .git/reftable/tables.list && + test_line_count = 4 .git/reftable/tables.list && git $command --auto && test_line_count = 1 .git/reftable/tables.list @@ -479,7 +492,7 @@ do umask $umask && git init --shared=true repo && test_commit -C repo A && - test_line_count = 3 repo/.git/reftable/tables.list + test_line_count = 2 repo/.git/reftable/tables.list ) && git -C repo pack-refs && test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list && @@ -847,12 +860,16 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' ' test_when_finished "rm -rf repo worktree" && git init repo && test_commit -C repo A && + + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ git -C repo worktree add ../worktree && + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ + git -C worktree update-ref refs/worktree/per-worktree HEAD && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list && + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && + test_line_count = 3 repo/.git/reftable/tables.list && git -C repo pack-refs && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && test_line_count = 1 repo/.git/reftable/tables.list ' @@ -860,13 +877,17 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' ' test_when_finished "rm -rf repo worktree" && git init repo && test_commit -C repo A && + + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ git -C repo worktree add ../worktree && + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ + git -C 
worktree update-ref refs/worktree/per-worktree HEAD && - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list && + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && + test_line_count = 3 repo/.git/reftable/tables.list && git -C worktree pack-refs && test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && - test_line_count = 4 repo/.git/reftable/tables.list + test_line_count = 3 repo/.git/reftable/tables.list ' test_expect_success 'worktree: creating shared ref updates main stack' ' @@ -880,6 +901,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' ' test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && test_line_count = 1 repo/.git/reftable/tables.list && + GIT_TEST_REFTABLE_AUTOCOMPACTION=false \ git -C worktree update-ref refs/heads/shared HEAD && test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && test_line_count = 2 repo/.git/reftable/tables.list -- gitgitgadget ^ permalink raw reply related [flat|nested] 52+ messages in thread
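For experimenting with the new strategy outside of git, the following standalone sketch restates the `suggest_compaction_segment()` logic from patch 3/3 with a local `struct segment` (end exclusive). Unlike the patch, it initializes `bytes` to 0 to avoid a maybe-uninitialized warning; the result is the same because `bytes` is only read after the first loop sets it. The helpers `append_and_compact()` and `is_geometric()` are hypothetical, and the harness assumes a compacted table's size equals the sum of its inputs, which real reftables do not guarantee (overwritten or deleted records shrink the result). Under that assumption, each suggested segment restores the invariant that every table is at least twice the size of the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the patch's struct segment; `end` is exclusive. */
struct segment {
	size_t start, end;
	uint64_t bytes;
};

/* Same logic as suggest_compaction_segment() in patch 3/3. */
static struct segment suggest_compaction_segment(const uint64_t *sizes, size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	/* Zero or one table is already a geometric sequence. */
	if (n <= 1)
		return seg;

	/* Walk back from the newest table to the first geometric violation. */
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}

	/* Extend the start while a preceding table is too small for the sum. */
	for (; i > 0; i--) {
		uint64_t curr = bytes;
		bytes += sizes[i - 1];

		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}

	return seg;
}

/*
 * Hypothetical harness: append a table of `size`, then replace any suggested
 * segment with a single table whose size is the sum of its members.
 */
static size_t append_and_compact(uint64_t *sizes, size_t n, uint64_t size)
{
	struct segment seg;
	size_t i, removed;

	sizes[n++] = size;
	seg = suggest_compaction_segment(sizes, n);
	if (seg.end - seg.start < 2)
		return n; /* nothing to compact */

	assert(seg.end <= n);
	sizes[seg.start] = seg.bytes;
	removed = seg.end - seg.start - 1;
	for (i = seg.end; i < n; i++)
		sizes[i - removed] = sizes[i];
	return n - removed;
}

/* Returns 1 if every table is at least twice the size of the next one. */
static int is_geometric(const uint64_t *sizes, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (sizes[i - 1] < sizes[i] * 2)
			return 0;
	return 1;
}
```

For example, `{ 128, 32, 16, 8, 4, 3, 1 }` yields start 1 and end 6, so the tables of sizes 32 through 3 are merged and the trailing table of size 1 is left alone, matching the example in the patch's comment. Feeding alternating sizes through `append_and_compact()`, loosely modeling the `git branch -f foo` / `git branch -d foo` loop, keeps the stack geometric after every step.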
* Re: [PATCH v6 0/3] reftable/stack: use geometric table compaction 2024-04-08 16:16 ` [PATCH v6 " Justin Tobler via GitGitGadget ` (2 preceding siblings ...) 2024-04-08 16:16 ` [PATCH v6 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget @ 2024-04-08 16:20 ` Patrick Steinhardt 2024-04-08 19:12 ` Junio C Hamano 3 siblings, 1 reply; 52+ messages in thread From: Patrick Steinhardt @ 2024-04-08 16:20 UTC (permalink / raw) To: Justin Tobler via GitGitGadget Cc: git, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 3892 bytes --] On Mon, Apr 08, 2024 at 04:16:52PM +0000, Justin Tobler via GitGitGadget wrote: > Hello again, > > This is the sixth version my patch series that refactors the reftable > compaction strategy to instead follow a geometric sequence. Changes compared > to v5: > > * Reworded commit message to more clearly explain that the already existing > configuration to disable auto-compaction is being exposed to callers of > the library. > * Simplified expression to set the disable_auto_compact configuration. > > Thanks for taking a look! Thanks, this version looks good to me! 
Patrick > -Justin > > Justin Tobler (3): > reftable/stack: expose option to disable auto-compaction > reftable/stack: add env to disable autocompaction > reftable/stack: use geometric table compaction > > refs/reftable-backend.c | 3 + > reftable/reftable-writer.h | 3 + > reftable/stack.c | 125 +++++++++++++++++++------------------ > reftable/stack.h | 4 -- > reftable/stack_test.c | 77 ++++++----------------- > t/t0610-reftable-basics.sh | 71 ++++++++++++++++----- > 6 files changed, 145 insertions(+), 138 deletions(-) > > > base-commit: 4b32163adf4863c6df3bb6b43540fa2ca3494e28 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v6 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v6 > Pull-Request: https://github.com/gitgitgadget/git/pull/1683 > > Range-diff vs v5: > > 1: a7011dbc6aa ! 1: 9c8f6b336ec reftable/stack: allow disabling of auto-compaction > @@ Metadata > Author: Justin Tobler <jltobler@gmail.com> > > ## Commit message ## > - reftable/stack: allow disabling of auto-compaction > + reftable/stack: expose option to disable auto-compaction > + > + The reftable stack already has a variable to configure whether or not to > + run auto-compaction, but it is inaccessible to users of the library. > + There exist use cases where a caller may want to have more control over > + auto-compaction. > > Move the `disable_auto_compact` option into `reftable_write_options` to > - allow a stack to be configured with auto-compaction disabled. In a > - subsequent commit, this is used to disable auto-compaction when a > - specific environment variable is set. > + allow external callers to disable auto-compaction. This will be used in > + a subsequent commit. > > Signed-off-by: Justin Tobler <jltobler@gmail.com> > > 2: 7c4fe0e9ec5 ! 
2: c7bc7346540 reftable/stack: add env to disable autocompaction > @@ refs/reftable-backend.c > > /* > @@ refs/reftable-backend.c: static struct ref_store *reftable_be_init(struct repository *repo, > + refs->write_options.block_size = 4096; > refs->write_options.hash_id = repo->hash_algo->format_id; > refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask); > ++ refs->write_options.disable_auto_compact = > ++ !git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1); > > -+ if (!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1)) > -+ refs->write_options.disable_auto_compact = 1; > -+ > /* > * Set up the main reftable stack that is hosted in GIT_COMMON_DIR. > - * This stack contains both the shared and the main worktree refs. > > ## t/t0610-reftable-basics.sh ## > @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause auto-compaction' ' > 3: 8f124acf0f8 = 3: d75494a88b0 reftable/stack: use geometric table compaction > > -- > gitgitgadget [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v6 0/3] reftable/stack: use geometric table compaction 2024-04-08 16:20 ` [PATCH v6 0/3] " Patrick Steinhardt @ 2024-04-08 19:12 ` Junio C Hamano 0 siblings, 0 replies; 52+ messages in thread From: Junio C Hamano @ 2024-04-08 19:12 UTC (permalink / raw) To: Patrick Steinhardt Cc: Justin Tobler via GitGitGadget, git, Karthik Nayak, Han-Wen Nienhuys, Justin Tobler Patrick Steinhardt <ps@pks.im> writes: > On Mon, Apr 08, 2024 at 04:16:52PM +0000, Justin Tobler via GitGitGadget wrote: >> Hello again, >> >> This is the sixth version my patch series that refactors the reftable >> compaction strategy to instead follow a geometric sequence. Changes compared >> to v5: >> >> * Reworded commit message to more clearly explain that the already existing >> configuration to disable auto-compaction is being exposed to callers of >> the library. >> * Simplified expression to set the disable_auto_compact configuration. >> >> Thanks for taking a look! > > Thanks, this version looks good to me! Will queue. I'll mark it for 'next' after I take a brief look for myself. Thanks, both. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction 2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget ` (4 preceding siblings ...) 2024-03-29 4:16 ` [PATCH v3 " Justin Tobler via GitGitGadget @ 2024-04-03 19:12 ` Junio C Hamano 2024-04-03 19:30 ` Patrick Steinhardt 5 siblings, 1 reply; 52+ messages in thread From: Junio C Hamano @ 2024-04-03 19:12 UTC (permalink / raw) To: Justin Tobler via GitGitGadget; +Cc: git, Patrick Steinhardt, Justin Tobler "Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes: > This is the second version my patch series that refactors the reftable > compaction strategy to instead follow a geometric sequence. Changes compared > to v1: > > * Added GIT_TEST_REFTABLE_NO_AUTOCOMPACTION environment variable to disable > reftable compaction when testing. > * Refactored worktree tests in t0610-reftable-basics.sh to properly assert > git-pack-refs(1) works as expected. > * Added test to validate that alternating table sizes are compacted. > * Added benchmark to compare compaction strategies. > * Moved change that made compaction segment end inclusive to its own > commit. > * Added additional explanation in commits and comments and fixed typos. Has anybody took a look at recent failures with this series present in 'seen' [*1*] and without [*2*] in osx-reftable jobs for t0610? *1* https://github.com/git/git/actions/runs/8543205866/job/23406512990 *2* https://github.com/git/git/actions/runs/8543840764/job/23408543876 ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction 2024-04-03 19:12 ` [PATCH v2 " Junio C Hamano @ 2024-04-03 19:30 ` Patrick Steinhardt 2024-04-04 5:34 ` Patrick Steinhardt 0 siblings, 1 reply; 52+ messages in thread From: Patrick Steinhardt @ 2024-04-03 19:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: Justin Tobler via GitGitGadget, git, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 1339 bytes --] On Wed, Apr 03, 2024 at 12:12:32PM -0700, Junio C Hamano wrote: > "Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > This is the second version my patch series that refactors the reftable > > compaction strategy to instead follow a geometric sequence. Changes compared > > to v1: > > > > * Added GIT_TEST_REFTABLE_NO_AUTOCOMPACTION environment variable to disable > > reftable compaction when testing. > > * Refactored worktree tests in t0610-reftable-basics.sh to properly assert > > git-pack-refs(1) works as expected. > > * Added test to validate that alternating table sizes are compacted. > > * Added benchmark to compare compaction strategies. > > * Moved change that made compaction segment end inclusive to its own > > commit. > > * Added additional explanation in commits and comments and fixed typos. > > Has anybody took a look at recent failures with this series present > in 'seen' [*1*] and without [*2*] in osx-reftable jobs for t0610? > > *1* https://github.com/git/git/actions/runs/8543205866/job/23406512990 > *2* https://github.com/git/git/actions/runs/8543840764/job/23408543876 I noticed that both `seen` and `next` started to fail in the GitLab mirror today. Unless somebody else beats me to it I'll investigate tomorrow what causes these. Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction
  2024-04-03 19:30 ` Patrick Steinhardt
@ 2024-04-04 5:34 ` Patrick Steinhardt
  2024-04-04 18:28 ` Justin Tobler
  0 siblings, 1 reply; 52+ messages in thread
From: Patrick Steinhardt @ 2024-04-04 5:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Justin Tobler via GitGitGadget, git, Justin Tobler

On Wed, Apr 03, 2024 at 09:30:19PM +0200, Patrick Steinhardt wrote:
> On Wed, Apr 03, 2024 at 12:12:32PM -0700, Junio C Hamano wrote:
> > "Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > This is the second version my patch series that refactors the reftable
> > > compaction strategy to instead follow a geometric sequence. Changes compared
> > > to v1:
> > >
> > >  * Added GIT_TEST_REFTABLE_NO_AUTOCOMPACTION environment variable to disable
> > >    reftable compaction when testing.
> > >  * Refactored worktree tests in t0610-reftable-basics.sh to properly assert
> > >    git-pack-refs(1) works as expected.
> > >  * Added test to validate that alternating table sizes are compacted.
> > >  * Added benchmark to compare compaction strategies.
> > >  * Moved change that made compaction segment end inclusive to its own
> > >    commit.
> > >  * Added additional explanation in commits and comments and fixed typos.
> >
> > Has anybody took a look at recent failures with this series present
> > in 'seen' [*1*] and without [*2*] in osx-reftable jobs for t0610?
> >
> > *1* https://github.com/git/git/actions/runs/8543205866/job/23406512990
> > *2* https://github.com/git/git/actions/runs/8543840764/job/23408543876
>
> I noticed that both `seen` and `next` started to fail in the GitLab
> mirror today. Unless somebody else beats me to it I'll investigate
> tomorrow what causes these.

Things work on GitLab CI again, all pipelines are green there now. Which
probably also is because you have evicted this series from "seen". On
GitHub most of the failures I see are still related to the regression in
libcurl.

But your first link definitely is specific to the changes in this patch
series and comes from a bad interaction with "ps/pack-refs-auto". That
series added a few tests where the exact number of tables that exist is
now different.

Justin wanted to make that series a dependency anyway, so I assume that
he'll then address those issues.

Patrick
* Re: [PATCH v2 0/3] reftable/stack: use geometric table compaction
  2024-04-04 5:34 ` Patrick Steinhardt
@ 2024-04-04 18:28 ` Justin Tobler
  0 siblings, 0 replies; 52+ messages in thread
From: Justin Tobler @ 2024-04-04 18:28 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, Justin Tobler via GitGitGadget, git

On 24/04/04 07:34AM, Patrick Steinhardt wrote:
> On Wed, Apr 03, 2024 at 09:30:19PM +0200, Patrick Steinhardt wrote:
> > On Wed, Apr 03, 2024 at 12:12:32PM -0700, Junio C Hamano wrote:
> > > "Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > >
> > > > This is the second version my patch series that refactors the reftable
> > > > compaction strategy to instead follow a geometric sequence. Changes compared
> > > > to v1:
> > > >
> > > >  * Added GIT_TEST_REFTABLE_NO_AUTOCOMPACTION environment variable to disable
> > > >    reftable compaction when testing.
> > > >  * Refactored worktree tests in t0610-reftable-basics.sh to properly assert
> > > >    git-pack-refs(1) works as expected.
> > > >  * Added test to validate that alternating table sizes are compacted.
> > > >  * Added benchmark to compare compaction strategies.
> > > >  * Moved change that made compaction segment end inclusive to its own
> > > >    commit.
> > > >  * Added additional explanation in commits and comments and fixed typos.
> > >
> > > Has anybody took a look at recent failures with this series present
> > > in 'seen' [*1*] and without [*2*] in osx-reftable jobs for t0610?
> > >
> > > *1* https://github.com/git/git/actions/runs/8543205866/job/23406512990
> > > *2* https://github.com/git/git/actions/runs/8543840764/job/23408543876
> >
> > I noticed that both `seen` and `next` started to fail in the GitLab
> > mirror today. Unless somebody else beats me to it I'll investigate
> > tomorrow what causes these.
>
> Things work on GitLab CI again, all pipelines are green there now. Which
> probably also is because you have evicted this series from "seen". On
> GitHub most of the failures I see are still related to the regression in
> libcurl.
>
> But your first link definitely is specific to the changes in this patch
> series and comes from a bad interaction with "ps/pack-refs-auto". That
> series added a few tests where the exact number of tables that exist is
> now different.
>
> Justin wanted to make that series a dependency anyway, so I assume that
> he'll then address those issues.

Yes, I've made this series depend on "ps/pack-refs-auto" and have updated
the conflicting tests in the next version :)

-Justin
Thread overview: 52+ messages
2024-03-05 20:03 [PATCH] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-03-06 12:30 ` Patrick Steinhardt
2024-03-06 12:37 ` Patrick Steinhardt
2024-03-21 22:48 ` Justin Tobler
2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
2024-03-21 22:40 ` [PATCH v2 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-03-22 1:25 ` Patrick Steinhardt
2024-03-21 22:40 ` [PATCH v2 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-03-22 1:25 ` Patrick Steinhardt
2024-03-27 13:24 ` Karthik Nayak
2024-03-21 22:40 ` [PATCH v2 3/3] reftable/segment: make segment end inclusive Justin Tobler via GitGitGadget
2024-03-22 1:25 ` [PATCH v2 0/3] reftable/stack: use geometric table compaction Patrick Steinhardt
2024-04-03 10:13 ` Han-Wen Nienhuys
2024-04-03 10:18 ` Patrick Steinhardt
2024-04-03 15:14 ` Justin Tobler
2024-04-03 16:40 ` Junio C Hamano
2024-03-29 4:16 ` [PATCH v3 " Justin Tobler via GitGitGadget
2024-03-29 4:16 ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-03-29 18:25 ` Junio C Hamano
2024-03-29 21:56 ` Junio C Hamano
2024-04-02 7:23 ` Patrick Steinhardt
2024-04-02 17:23 ` Junio C Hamano
2024-03-29 4:16 ` [PATCH v3 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-02 7:23 ` Patrick Steinhardt
2024-03-29 4:16 ` [PATCH v3 3/3] reftable/stack: make segment end inclusive Justin Tobler via GitGitGadget
2024-03-29 18:36 ` Junio C Hamano
2024-04-02 7:23 ` Patrick Steinhardt
2024-04-03 0:20 ` [PATCH v4 0/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-03 0:20 ` [PATCH v4 1/2] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-03 0:20 ` [PATCH v4 2/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-03 4:47 ` [PATCH v4 0/2] " Patrick Steinhardt
2024-04-03 11:12 ` Karthik Nayak
2024-04-03 16:56 ` Junio C Hamano
2024-04-04 18:29 ` [PATCH v5 0/3] " Justin Tobler via GitGitGadget
2024-04-04 18:29 ` [PATCH v5 1/3] reftable/stack: allow disabling of auto-compaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` Patrick Steinhardt
2024-04-04 18:29 ` [PATCH v5 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` Patrick Steinhardt
2024-04-08 16:18 ` Junio C Hamano
2024-04-04 18:29 ` [PATCH v5 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` [PATCH v5 0/3] " Patrick Steinhardt
2024-04-08 16:17 ` Justin Tobler
2024-04-08 16:16 ` [PATCH v6 " Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 1/3] reftable/stack: expose option to disable auto-compaction Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-08 16:20 ` [PATCH v6 0/3] " Patrick Steinhardt
2024-04-08 19:12 ` Junio C Hamano
2024-04-03 19:12 ` [PATCH v2 " Junio C Hamano
2024-04-03 19:30 ` Patrick Steinhardt
2024-04-04 5:34 ` Patrick Steinhardt
2024-04-04 18:28 ` Justin Tobler