* [PATCH 0/8] reftable: improvements and fixes for compaction
@ 2024-07-31 14:14 Patrick Steinhardt
2024-07-31 14:14 ` [PATCH 1/8] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
` (9 more replies)
0 siblings, 10 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:14 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 2504 bytes --]
Hi,
this patch series addresses a couple of issues with compaction of the
reftable stack. The original intent was to make compaction handle the
case gracefully where a subset of the tables it wants to compact has
been locked already. While working on that, I found an issue where
compaction may race with a concurrent writer that modifies the
"tables.list" file while we are compacting.
In theory, this situation should work alright and has been accounted for
in the design: the compacting process locks "tables.list", then locks
all of the tables it wants to compact, unlocks "tables.list", compacts
the data and re-locks "tables.list" to update the stack. No concurrent
process would thus be able to compact the same tables, and so we know
that the tables we have just compacted must still exist.
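(For readers less familiar with the reftable code, here is a minimal,
self-contained sketch of the ".lock" protocol the above relies on: a lock
is acquired by exclusively creating "<file>.lock", and an update is
committed by renaming that file over the original. Names and error
handling are simplified; this is not the actual reftable code.)

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            const char *content = "table-0001.ref\ntable-0002.ref\n";
            int fd;

            /* Acquire the lock: exclusive creation fails with EEXIST if it is held. */
            fd = open("tables.list.lock", O_WRONLY | O_CREAT | O_EXCL, 0666);
            if (fd < 0) {
                    perror("tables.list.lock");
                    return 1;
            }

            /* Write the updated list into the lock file (short writes ignored here)... */
            if (write(fd, content, strlen(content)) < 0 || close(fd) < 0)
                    return 1;

            /* ...and commit it by atomically renaming the lock file into place. */
            if (rename("tables.list.lock", "tables.list") < 0)
                    return 1;

            return 0;
    }
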
What the code didn't do though is to reload the stack before writing the
updated tables to "tables.list". This could either lead to us dropping
new tables appended to the stack while we were compacting, or it could
lead to us writing references to concurrently-compacted tables which
have been removed meanwhile, leading to data loss.
The fix itself is rather straightforward. What I'm missing though is a
way to test for this given that the logic only triggers when racing with
another thread. I didn't have any idea how to do this, so if anybody
else has an idea: please, share it! Otherwise, I'm not sure I feel
comfortable with untested medium-complexity code. The alternative would
be to just bail out when we see that the stack has been compacted, which
is less ideal when we have just done a bunch of work compacting large
tables.
Thanks!
Patrick
Patrick Steinhardt (8):
reftable/stack: refactor function to gather table sizes
reftable/stack: test compaction with already-locked tables
reftable/stack: update stats on failed full compaction
reftable/stack: simplify tracking of table locks
reftable/stack: do not die when fsyncing lock file files
reftable/stack: use lock_file when adding table to "tables.list"
reftable/stack: fix corruption on concurrent compaction
reftable/stack: handle locked tables during auto-compaction
reftable/stack.c | 228 +++++++++++++++++++++++++++++--------
reftable/stack_test.c | 104 +++++++++++++++++
t/t0610-reftable-basics.sh | 21 ++--
3 files changed, 300 insertions(+), 53 deletions(-)
base-commit: 39bf06adf96da25b87c9aa7d35a32ef3683eb4a4
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH 1/8] reftable/stack: refactor function to gather table sizes
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
@ 2024-07-31 14:14 ` Patrick Steinhardt
2024-08-02 20:26 ` Junio C Hamano
2024-07-31 14:14 ` [PATCH 2/8] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
` (8 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:14 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]
Refactor the function that gathers table sizes to be more idiomatic. For
one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
Second, avoid using an integer to iterate through the tables in the
reftable stack given that `stack_len` itself is using a `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 737591125e..ba8234b486 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1305,14 +1305,15 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n,
static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
{
- uint64_t *sizes =
- reftable_calloc(st->merged->stack_len, sizeof(*sizes));
int version = (st->opts.hash_id == GIT_SHA1_FORMAT_ID) ? 1 : 2;
int overhead = header_size(version) - 1;
- int i = 0;
- for (i = 0; i < st->merged->stack_len; i++) {
+ uint64_t *sizes;
+
+ REFTABLE_CALLOC_ARRAY(sizes, st->merged->stack_len);
+
+ for (size_t i = 0; i < st->merged->stack_len; i++)
sizes[i] = st->readers[i]->size - overhead;
- }
+
return sizes;
}
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 2/8] reftable/stack: test compaction with already-locked tables
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
2024-07-31 14:14 ` [PATCH 1/8] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
@ 2024-07-31 14:14 ` Patrick Steinhardt
2024-08-02 21:05 ` Junio C Hamano
2024-07-31 14:15 ` [PATCH 3/8] reftable/stack: update stats on failed full compaction Patrick Steinhardt
` (7 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:14 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 4621 bytes --]
We're lacking test coverage for compacting tables when some of the
tables that we are about to compact are locked. Add two tests that
exercise this, one for auto-compaction and one for full compaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack_test.c | 103 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index e3c11e6a6e..04526c6604 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -863,6 +863,58 @@ static void test_reftable_stack_auto_compaction(void)
clear_dir(dir);
}
+static void test_reftable_stack_auto_compaction_with_locked_tables(void)
+{
+ struct reftable_write_options opts = {
+ .disable_auto_compact = 1,
+ };
+ struct reftable_stack *st = NULL;
+ struct strbuf buf = STRBUF_INIT;
+ char *dir = get_tmp_dir(__LINE__);
+ int err;
+
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
+ for (size_t i = 0; i < 5; i++) {
+ struct reftable_ref_record ref = {
+ .update_index = reftable_stack_next_update_index(st),
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = { i },
+ };
+
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "refs/heads/branch-%04" PRIuMAX, (uintmax_t) i);
+ ref.refname = buf.buf;
+
+ err = reftable_stack_add(st, &write_test_ref, &ref);
+ EXPECT_ERR(err);
+ }
+ EXPECT(st->merged->stack_len == 5);
+
+ /*
+ * Given that all tables we have written should be roughly the same
+ * size, we expect that auto-compaction will want to compact all of the
+ * tables. Locking any of the tables will keep it from doing so.
+ */
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[2]->name);
+ write_file_buf(buf.buf, "", 0);
+
+ /*
+ * Ideally, we'd handle the situation where any of the tables is locked
+ * gracefully. We don't (yet) do this though and thus fail.
+ */
+ err = reftable_stack_auto_compact(st);
+ EXPECT(err == REFTABLE_LOCK_ERROR);
+ EXPECT(st->stats.failures == 1);
+ EXPECT(st->merged->stack_len == 5);
+
+ reftable_stack_destroy(st);
+ strbuf_release(&buf);
+ clear_dir(dir);
+}
+
static void test_reftable_stack_add_performs_auto_compaction(void)
{
struct reftable_write_options opts = { 0 };
@@ -911,6 +963,55 @@ static void test_reftable_stack_add_performs_auto_compaction(void)
clear_dir(dir);
}
+static void test_reftable_stack_compaction_with_locked_tables(void)
+{
+ struct reftable_write_options opts = {
+ .disable_auto_compact = 1,
+ };
+ struct reftable_stack *st = NULL;
+ struct strbuf buf = STRBUF_INIT;
+ char *dir = get_tmp_dir(__LINE__);
+ int err;
+
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
+ for (size_t i = 0; i < 3; i++) {
+ struct reftable_ref_record ref = {
+ .update_index = reftable_stack_next_update_index(st),
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = { i },
+ };
+
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "refs/heads/branch-%04" PRIuMAX, (uintmax_t) i);
+ ref.refname = buf.buf;
+
+ err = reftable_stack_add(st, &write_test_ref, &ref);
+ EXPECT_ERR(err);
+ }
+ EXPECT(st->merged->stack_len == 3);
+
+ /* Lock one of the tables that we're about to compact. */
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[1]->name);
+ write_file_buf(buf.buf, "", 0);
+
+ /*
+ * Compaction is expected to fail given that we were not able to
+ * compact all tables.
+ */
+ err = reftable_stack_compact_all(st, NULL);
+ EXPECT(err == REFTABLE_LOCK_ERROR);
+ /* TODO: this is wrong, we should get notified about the failure. */
+ EXPECT(st->stats.failures == 0);
+ EXPECT(st->merged->stack_len == 3);
+
+ reftable_stack_destroy(st);
+ strbuf_release(&buf);
+ clear_dir(dir);
+}
+
static void test_reftable_stack_compaction_concurrent(void)
{
struct reftable_write_options opts = { 0 };
@@ -1016,9 +1117,11 @@ int stack_test_main(int argc, const char *argv[])
RUN_TEST(test_reftable_stack_add);
RUN_TEST(test_reftable_stack_add_one);
RUN_TEST(test_reftable_stack_auto_compaction);
+ RUN_TEST(test_reftable_stack_auto_compaction_with_locked_tables);
RUN_TEST(test_reftable_stack_add_performs_auto_compaction);
RUN_TEST(test_reftable_stack_compaction_concurrent);
RUN_TEST(test_reftable_stack_compaction_concurrent_clean);
+ RUN_TEST(test_reftable_stack_compaction_with_locked_tables);
RUN_TEST(test_reftable_stack_hash_id);
RUN_TEST(test_reftable_stack_lock_failure);
RUN_TEST(test_reftable_stack_log_normalize);
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 3/8] reftable/stack: update stats on failed full compaction
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
2024-07-31 14:14 ` [PATCH 1/8] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
2024-07-31 14:14 ` [PATCH 2/8] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
@ 2024-07-31 14:15 ` Patrick Steinhardt
2024-07-31 14:15 ` [PATCH 4/8] reftable/stack: simplify tracking of table locks Patrick Steinhardt
` (6 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:15 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 2104 bytes --]
When auto-compaction fails due to a locking error, we update the
statistics to indicate this failure. We're not doing the same when
performing a full compaction.
Fix this inconsistency by using `stack_compact_range_stats()`, which
handles the stat update for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 14 +++++++-------
reftable/stack_test.c | 3 +--
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index ba8234b486..e5959d2c76 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1205,13 +1205,6 @@ static int stack_compact_range(struct reftable_stack *st,
return err;
}
-int reftable_stack_compact_all(struct reftable_stack *st,
- struct reftable_log_expiry_config *config)
-{
- return stack_compact_range(st, 0, st->merged->stack_len ?
- st->merged->stack_len - 1 : 0, config);
-}
-
static int stack_compact_range_stats(struct reftable_stack *st,
size_t first, size_t last,
struct reftable_log_expiry_config *config)
@@ -1222,6 +1215,13 @@ static int stack_compact_range_stats(struct reftable_stack *st,
return err;
}
+int reftable_stack_compact_all(struct reftable_stack *st,
+ struct reftable_log_expiry_config *config)
+{
+ size_t last = st->merged->stack_len ? st->merged->stack_len - 1 : 0;
+ return stack_compact_range_stats(st, 0, last, config);
+}
+
static int segment_size(struct segment *s)
{
return s->end - s->start;
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 04526c6604..8739ff3d63 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -1003,8 +1003,7 @@ static void test_reftable_stack_compaction_with_locked_tables(void)
*/
err = reftable_stack_compact_all(st, NULL);
EXPECT(err == REFTABLE_LOCK_ERROR);
- /* TODO: this is wrong, we should get notified about the failure. */
- EXPECT(st->stats.failures == 0);
+ EXPECT(st->stats.failures == 1);
EXPECT(st->merged->stack_len == 3);
reftable_stack_destroy(st);
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 4/8] reftable/stack: simplify tracking of table locks
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
` (2 preceding siblings ...)
2024-07-31 14:15 ` [PATCH 3/8] reftable/stack: update stats on failed full compaction Patrick Steinhardt
@ 2024-07-31 14:15 ` Patrick Steinhardt
2024-07-31 21:57 ` Justin Tobler
2024-08-02 23:00 ` Junio C Hamano
2024-07-31 14:15 ` [PATCH 5/8] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
` (5 subsequent siblings)
9 siblings, 2 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:15 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 3041 bytes --]
When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact
all tables in the user-provided range or none, we simply track those
locks via the indices of the respective tables in the merged stack.
This is about to change though, as we will introduce a mode where auto
compaction gracefully handles the case of already-locked files. In this
case, it may happen that we only compact a subset of the user-supplied
range of tables. In this case, the indices will not necessarily match
the lock indices anymore.
Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to perform the same, but will
make it easier to perform the described change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index e5959d2c76..07e7ffc6b9 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1016,7 +1016,7 @@ static int stack_compact_range(struct reftable_stack *st,
struct lock_file *table_locks = NULL;
struct tempfile *new_table = NULL;
int is_empty_table = 0, err = 0;
- size_t i;
+ size_t i, nlocks = 0;
if (first > last || (!expiry && first == last)) {
err = 0;
@@ -1051,7 +1051,7 @@ static int stack_compact_range(struct reftable_stack *st,
for (i = first; i <= last; i++) {
stack_filename(&table_name, st, reader_name(st->readers[i]));
- err = hold_lock_file_for_update(&table_locks[i - first],
+ err = hold_lock_file_for_update(&table_locks[nlocks],
table_name.buf, LOCK_NO_DEREF);
if (err < 0) {
if (errno == EEXIST)
@@ -1066,7 +1066,7 @@ static int stack_compact_range(struct reftable_stack *st,
* run into file descriptor exhaustion when we compress a lot
* of tables.
*/
- err = close_lock_file_gently(&table_locks[i - first]);
+ err = close_lock_file_gently(&table_locks[nlocks++]);
if (err < 0) {
err = REFTABLE_IO_ERROR;
goto done;
@@ -1183,8 +1183,8 @@ static int stack_compact_range(struct reftable_stack *st,
* Delete the old tables. They may still be in use by concurrent
* readers, so it is expected that unlinking tables may fail.
*/
- for (i = first; i <= last; i++) {
- struct lock_file *table_lock = &table_locks[i - first];
+ for (i = 0; i < nlocks; i++) {
+ struct lock_file *table_lock = &table_locks[i];
char *table_path = get_locked_file_path(table_lock);
unlink(table_path);
free(table_path);
@@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
done:
rollback_lock_file(&tables_list_lock);
- for (i = first; table_locks && i <= last; i++)
- rollback_lock_file(&table_locks[i - first]);
+ for (i = 0; table_locks && i < nlocks; i++)
+ rollback_lock_file(&table_locks[i]);
reftable_free(table_locks);
delete_tempfile(&new_table);
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 5/8] reftable/stack: do not die when fsyncing lock file files
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
` (3 preceding siblings ...)
2024-07-31 14:15 ` [PATCH 4/8] reftable/stack: simplify tracking of table locks Patrick Steinhardt
@ 2024-07-31 14:15 ` Patrick Steinhardt
2024-07-31 14:15 ` [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
` (4 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:15 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]
We use `fsync_component_or_die()` when committing an addition to the
"tables.list" lock file, which unsurprisingly dies in case the fsync
fails. Given that this is part of the reftable library, we should never
die and instead let callers handle the error.
Adapt accordingly and use `fsync_component()` instead.
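
For illustration only, the general shape of this change, using plain
fsync(2) rather than the actual `fsync_component()` helper and a made-up
error constant:

    #include <unistd.h>

    #define EXAMPLE_IO_ERROR (-2) /* stand-in for REFTABLE_IO_ERROR */

    /*
     * Report fsync failures to the caller instead of dying, so that users
     * of the library can decide how to handle the error themselves.
     */
    int sync_fd_gently(int fd)
    {
            if (fsync(fd) < 0)
                    return EXAMPLE_IO_ERROR;
            return 0;
    }
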
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 07e7ffc6b9..9ca549294f 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -674,8 +674,11 @@ int reftable_addition_commit(struct reftable_addition *add)
goto done;
}
- fsync_component_or_die(FSYNC_COMPONENT_REFERENCE, lock_file_fd,
- get_tempfile_path(add->lock_file));
+ err = fsync_component(FSYNC_COMPONENT_REFERENCE, lock_file_fd);
+ if (err < 0) {
+ err = REFTABLE_IO_ERROR;
+ goto done;
+ }
err = rename_tempfile(&add->lock_file, add->stack->list_file);
if (err < 0) {
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list"
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
` (4 preceding siblings ...)
2024-07-31 14:15 ` [PATCH 5/8] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
@ 2024-07-31 14:15 ` Patrick Steinhardt
2024-07-31 23:02 ` Justin Tobler
2024-07-31 14:15 ` [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
` (3 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:15 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 3095 bytes --]
When modifying "tables.list", we need to lock the list before updating
it to ensure that no concurrent writers modify the list at the same
point in time. While we do this via the `lock_file` subsystem when
compacting the stack, we manually handle the lock when adding a new
table to it. While not wrong, it is at least inconsistent.
Refactor the code to consistenly lock "tables.list" via the `lock_file`
subsytem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 9ca549294f..9cc91a262c 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -567,7 +567,7 @@ static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
}
struct reftable_addition {
- struct tempfile *lock_file;
+ struct lock_file tables_list_lock;
struct reftable_stack *stack;
char **new_tables;
@@ -581,13 +581,13 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
struct reftable_stack *st)
{
struct strbuf lock_file_name = STRBUF_INIT;
- int err = 0;
- add->stack = st;
+ int err;
- strbuf_addf(&lock_file_name, "%s.lock", st->list_file);
+ add->stack = st;
- add->lock_file = create_tempfile(lock_file_name.buf);
- if (!add->lock_file) {
+ err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
+ LOCK_NO_DEREF);
+ if (err < 0) {
if (errno == EEXIST) {
err = REFTABLE_LOCK_ERROR;
} else {
@@ -596,7 +596,8 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
goto done;
}
if (st->opts.default_permissions) {
- if (chmod(add->lock_file->filename.buf, st->opts.default_permissions) < 0) {
+ if (chmod(get_lock_file_path(&add->tables_list_lock),
+ st->opts.default_permissions) < 0) {
err = REFTABLE_IO_ERROR;
goto done;
}
@@ -635,7 +636,7 @@ static void reftable_addition_close(struct reftable_addition *add)
add->new_tables_len = 0;
add->new_tables_cap = 0;
- delete_tempfile(&add->lock_file);
+ rollback_lock_file(&add->tables_list_lock);
strbuf_release(&nm);
}
@@ -651,7 +652,7 @@ void reftable_addition_destroy(struct reftable_addition *add)
int reftable_addition_commit(struct reftable_addition *add)
{
struct strbuf table_list = STRBUF_INIT;
- int lock_file_fd = get_tempfile_fd(add->lock_file);
+ int lock_file_fd = get_lock_file_fd(&add->tables_list_lock);
int err = 0;
size_t i;
@@ -674,13 +675,14 @@ int reftable_addition_commit(struct reftable_addition *add)
goto done;
}
- err = fsync_component(FSYNC_COMPONENT_REFERENCE, lock_file_fd);
+ err = fsync_component(FSYNC_COMPONENT_REFERENCE,
+ get_lock_file_fd(&add->tables_list_lock));
if (err < 0) {
err = REFTABLE_IO_ERROR;
goto done;
}
- err = rename_tempfile(&add->lock_file, add->stack->list_file);
+ err = commit_lock_file(&add->tables_list_lock);
if (err < 0) {
err = REFTABLE_IO_ERROR;
goto done;
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
` (5 preceding siblings ...)
2024-07-31 14:15 ` [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
@ 2024-07-31 14:15 ` Patrick Steinhardt
2024-08-01 1:04 ` Justin Tobler
2024-07-31 14:15 ` [PATCH 8/8] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
` (2 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:15 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 6886 bytes --]
The locking employed by compaction uses the following schema:
1. Lock "tables.list" and verify that it matches the version we have
loaded in core.
2. Lock each of the tables in the user-supplied range of tables that
we are supposed to compact. These locks prohibit any concurrent
process to compact those tables while we are doing that.
3. Unlock "tables.list". This enables concurrent processes to add new
tables to the stack, but also allows them to compact tables outside
of the range of tables that we have locked.
4. Perform the compaction.
5. Lock "tables.list" again.
6. Move the compacted table into place.
7. Write the new order of tables, including the compacted table, into
the lockfile.
8. Commit the lockfile into place.
Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
are compacting a lot or very large tables.
But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjunct
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.
Right now , we do not check whether the "tables.list" has been
changed after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without paying to respect what the other process has written. This
scenario would then lead to data loss and corruption.
This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.
Fix this bug by re-checking whether our stack is still up to date after
locking for the second time. If it isn't, then we adjust the indices of
tables to replace in the updated stack.
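
To illustrate the idea independently of the code below, here is a
standalone sketch of that search, assuming the old and new lists of table
names are available as string arrays (simplified and not the code from
this patch):

    #include <string.h>
    #include <sys/types.h>

    /*
     * "old" holds the names of the tables we have just compacted, in order.
     * "new" is the NULL-terminated list of names re-read from the re-locked
     * "tables.list". Returns the offset of the compacted slice in "new", or
     * -1 if the slice no longer exists there in the exact same order.
     */
    ssize_t find_compacted_slice(char **old, size_t old_len, char **new)
    {
            for (size_t i = 0; new[i]; i++) {
                    if (strcmp(new[i], old[0]))
                            continue;

                    for (size_t j = 1; j < old_len; j++)
                            if (!new[i + j] || strcmp(new[i + j], old[j]))
                                    return -1;

                    return (ssize_t) i;
            }
            return -1;
    }
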
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 101 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 96 insertions(+), 5 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 9cc91a262c..2b1ac58120 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1021,7 +1021,9 @@ static int stack_compact_range(struct reftable_stack *st,
struct lock_file *table_locks = NULL;
struct tempfile *new_table = NULL;
int is_empty_table = 0, err = 0;
+ size_t first_to_replace, last_to_replace;
size_t i, nlocks = 0;
+ char **names = NULL;
if (first > last || (!expiry && first == last)) {
err = 0;
@@ -1124,6 +1126,94 @@ static int stack_compact_range(struct reftable_stack *st,
}
}
+ /*
+ * As we have unlocked the stack while compacting our slice of tables
+ * it may have happened that a concurrently running process has updated
+ * the stack while we were compacting. In that case, we need to check
+ * whether the tables that we have just compacted still exist in the
+ * stack in the exact same order as we have compacted them.
+ *
+ * If they do exist, then it is fine to continue and replace those
+ * tables with our compacted version. If they don't, then we need to
+ * abort.
+ */
+ err = stack_uptodate(st);
+ if (err < 0)
+ goto done;
+ if (err > 0) {
+ ssize_t new_offset = -1;
+ int fd;
+
+ fd = open(st->list_file, O_RDONLY);
+ if (fd < 0) {
+ err = REFTABLE_IO_ERROR;
+ goto done;
+ }
+
+ err = fd_read_lines(fd, &names);
+ close(fd);
+ if (err < 0)
+ goto done;
+
+ /*
+ * Search for the offset of the first table that we have
+ * compacted in the updated "tables.list" file.
+ */
+ for (size_t i = 0; names[i]; i++) {
+ if (strcmp(names[i], st->readers[first]->name))
+ continue;
+
+ /*
+ * We have found the first entry. Verify that all the
+ * subsequent tables we have compacted still exist in
+ * the modified stack in the exact same order as we
+ * have compacted them.
+ */
+ for (size_t j = 1; j < last - first + 1; j++) {
+ const char *old = first + j < st->merged->stack_len ?
+ st->readers[first + j]->name : NULL;
+ const char *new = names[i + j];
+
+ /*
+ * If some entries are missing or in case the tables
+ * have changed then we need to bail out. Again, this
+ * shouldn't ever happen because we have locked the
+ * tables we are compacting.
+ */
+ if (!old || !new || strcmp(old, new)) {
+ err = REFTABLE_OUTDATED_ERROR;
+ goto done;
+ }
+ }
+
+ new_offset = i;
+ break;
+ }
+
+ /*
+ * In case we didn't find our compacted tables in the stack we
+ * need to bail out. In theory, this should have never happened
+ * because we locked the tables we are compacting.
+ */
+ if (new_offset < 0) {
+ err = REFTABLE_OUTDATED_ERROR;
+ goto done;
+ }
+
+ /*
+ * We have found the new range that we want to replace, so
+ * let's update the range of tables that we want to replace.
+ */
+ last_to_replace = last + (new_offset - first);
+ first_to_replace = new_offset;
+ } else {
+ REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
+ for (size_t i = 0; i < st->merged->stack_len; i++)
+ names[i] = xstrdup(st->readers[i]->name);
+ last_to_replace = last;
+ first_to_replace = first;
+ }
+
/*
* If the resulting compacted table is not empty, then we need to move
* it into place now.
@@ -1146,12 +1236,12 @@ static int stack_compact_range(struct reftable_stack *st,
* have just written. In case the compacted table became empty we
* simply skip writing it.
*/
- for (i = 0; i < first; i++)
- strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
+ for (i = 0; i < first_to_replace; i++)
+ strbuf_addf(&tables_list_buf, "%s\n", names[i]);
if (!is_empty_table)
strbuf_addf(&tables_list_buf, "%s\n", new_table_name.buf);
- for (i = last + 1; i < st->merged->stack_len; i++)
- strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
+ for (i = last_to_replace + 1; names[i]; i++)
+ strbuf_addf(&tables_list_buf, "%s\n", names[i]);
err = write_in_full(get_lock_file_fd(&tables_list_lock),
tables_list_buf.buf, tables_list_buf.len);
@@ -1204,9 +1294,10 @@ static int stack_compact_range(struct reftable_stack *st,
delete_tempfile(&new_table);
strbuf_release(&new_table_name);
strbuf_release(&new_table_path);
-
strbuf_release(&tables_list_buf);
strbuf_release(&table_name);
+ free_names(names);
+
return err;
}
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 8/8] reftable/stack: handle locked tables during auto-compaction
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
` (6 preceding siblings ...)
2024-07-31 14:15 ` [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
@ 2024-07-31 14:15 ` Patrick Steinhardt
2024-08-02 23:24 ` Junio C Hamano
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
9 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-07-31 14:15 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 9112 bytes --]
When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we want to perform a full compaction of all
tables it is sensible to bail out, as we cannot fulfill the requested
action.
But when performing auto-compaction it isn't necessarily in our best
interest to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.
Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is already locked, we stop acquiring locks. If we have
succeeded in locking two or more tables, then we simply reduce the slice
of tables that we're about to compact to those which we managed to lock;
otherwise we abort.
This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.
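As an illustration of this strategy (simplified, with made-up helper
names and plain ".lock" files created via O_EXCL; not the code from this
patch):

    #include <errno.h>
    #include <fcntl.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    /*
     * Try to lock the tables [*first, last], starting from the end of the
     * range. When hitting a table that is already locked, shrink the range
     * to the tables we did manage to lock, but only if that still leaves
     * at least two tables to compact; otherwise give up.
     */
    int lock_tables_best_effort(char **paths, size_t *first, size_t last)
    {
            size_t nlocks = 0;

            for (size_t i = last + 1; i > *first; i--) {
                    char lock_path[PATH_MAX];
                    int fd;

                    snprintf(lock_path, sizeof(lock_path), "%s.lock", paths[i - 1]);

                    fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
                    if (fd < 0 && errno == EEXIST && nlocks >= 2) {
                            *first = i; /* compact only the tables we have locked */
                            return 0;
                    } else if (fd < 0) {
                            return -1; /* locked with too few tables, or I/O error */
                    }

                    close(fd);
                    nlocks++;
            }

            return 0;
    }
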
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 61 +++++++++++++++++++++++++++++++-------
reftable/stack_test.c | 12 ++++----
t/t0610-reftable-basics.sh | 21 ++++++++-----
3 files changed, 71 insertions(+), 23 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 2b1ac58120..9657ca4418 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1000,6 +1000,15 @@ static int stack_write_compact(struct reftable_stack *st,
return err;
}
+enum stack_compact_range_flags {
+ /*
+ * Perform a best-effort compaction. That is, even if we cannot lock
+ * all tables in the specified range, we will try to compact the
+ * remaining slice.
+ */
+ STACK_COMPACT_RANGE_BEST_EFFORT = (1 << 0),
+};
+
/*
* Compact all tables in the range `[first, last)` into a single new table.
*
@@ -1011,7 +1020,8 @@ static int stack_write_compact(struct reftable_stack *st,
*/
static int stack_compact_range(struct reftable_stack *st,
size_t first, size_t last,
- struct reftable_log_expiry_config *expiry)
+ struct reftable_log_expiry_config *expiry,
+ unsigned int flags)
{
struct strbuf tables_list_buf = STRBUF_INIT;
struct strbuf new_table_name = STRBUF_INIT;
@@ -1053,19 +1063,47 @@ static int stack_compact_range(struct reftable_stack *st,
/*
* Lock all tables in the user-provided range. This is the slice of our
* stack which we'll compact.
+ *
+ * Note that we lock tables in reverse order from last to first. The
+ * intent behind this is to allow a newer process to perform best
+ * effort compaction of tables that it has added in the case where an
+ * older process is still busy compacting tables which are preexisting
+ * from the point of view of the newer process.
*/
REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
- for (i = first; i <= last; i++) {
- stack_filename(&table_name, st, reader_name(st->readers[i]));
+ for (i = last + 1; i > first; i--) {
+ stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
err = hold_lock_file_for_update(&table_locks[nlocks],
table_name.buf, LOCK_NO_DEREF);
if (err < 0) {
- if (errno == EEXIST)
+ /*
+ * When the table is locked already we may do a
+ * best-effort compaction and compact only the tables
+ * that we have managed to lock so far. This of course
+ * requires that we have been able to lock at least two
+ * tables, otherwise there would be nothing to compact.
+ * In that case, we return a lock error to our caller.
+ */
+ if (errno == EEXIST && last - (i - 1) >= 2 &&
+ flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
+ err = 0;
+ /*
+ * The subtraction is to offset the index, the
+ * addition is to only compact up to the table
+ * of the preceding iteration. They obviously
+ * cancel each other out, but that may be
+ * non-obvious when it was omitted.
+ */
+ first = (i - 1) + 1;
+ break;
+ } else if (errno == EEXIST) {
err = REFTABLE_LOCK_ERROR;
- else
+ goto done;
+ } else {
err = REFTABLE_IO_ERROR;
- goto done;
+ goto done;
+ }
}
/*
@@ -1270,7 +1308,7 @@ static int stack_compact_range(struct reftable_stack *st,
* delete the files after we closed them on Windows, so this needs to
* happen first.
*/
- err = reftable_stack_reload_maybe_reuse(st, first < last);
+ err = reftable_stack_reload_maybe_reuse(st, first_to_replace < last_to_replace);
if (err < 0)
goto done;
@@ -1303,9 +1341,10 @@ static int stack_compact_range(struct reftable_stack *st,
static int stack_compact_range_stats(struct reftable_stack *st,
size_t first, size_t last,
- struct reftable_log_expiry_config *config)
+ struct reftable_log_expiry_config *config,
+ unsigned int flags)
{
- int err = stack_compact_range(st, first, last, config);
+ int err = stack_compact_range(st, first, last, config, flags);
if (err == REFTABLE_LOCK_ERROR)
st->stats.failures++;
return err;
@@ -1315,7 +1354,7 @@ int reftable_stack_compact_all(struct reftable_stack *st,
struct reftable_log_expiry_config *config)
{
size_t last = st->merged->stack_len ? st->merged->stack_len - 1 : 0;
- return stack_compact_range_stats(st, 0, last, config);
+ return stack_compact_range_stats(st, 0, last, config, 0);
}
static int segment_size(struct segment *s)
@@ -1422,7 +1461,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
reftable_free(sizes);
if (segment_size(&seg) > 0)
return stack_compact_range_stats(st, seg.start, seg.end - 1,
- NULL);
+ NULL, STACK_COMPACT_RANGE_BEST_EFFORT);
return 0;
}
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 8739ff3d63..af5757c0f6 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -902,13 +902,15 @@ static void test_reftable_stack_auto_compaction_with_locked_tables(void)
write_file_buf(buf.buf, "", 0);
/*
- * Ideally, we'd handle the situation where any of the tables is locked
- * gracefully. We don't (yet) do this though and thus fail.
+ * When parts of the stack are locked, then auto-compaction does a best
+ * effort compaction of those tables which aren't locked. So while this
+ * would in theory compact all tables, due to the preexisting lock we
+ * only compact the newest two tables.
*/
err = reftable_stack_auto_compact(st);
- EXPECT(err == REFTABLE_LOCK_ERROR);
- EXPECT(st->stats.failures == 1);
- EXPECT(st->merged->stack_len == 5);
+ EXPECT_ERR(err);
+ EXPECT(st->stats.failures == 0);
+ EXPECT(st->merged->stack_len == 4);
reftable_stack_destroy(st);
strbuf_release(&buf);
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index b06c46999d..37510c2b2a 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -478,19 +478,26 @@ test_expect_success "$command: auto compaction" '
test_oid blob17_2 | git hash-object -w --stdin &&
- # Lock all tables write some refs. Auto-compaction will be
- # unable to compact tables and thus fails gracefully, leaving
- # the stack in a sub-optimal state.
- ls .git/reftable/*.ref |
+ # Lock all tables, write some refs. Auto-compaction will be
+ # unable to compact tables and thus fails gracefully,
+ # compacting only those tables which are not locked.
+ ls .git/reftable/*.ref | sort |
while read table
do
- touch "$table.lock" || exit 1
+ touch "$table.lock" &&
+ basename "$table" >>tables.expect || exit 1
done &&
+ test_line_count = 2 .git/reftable/tables.list &&
git branch B &&
git branch C &&
- rm .git/reftable/*.lock &&
- test_line_count = 4 .git/reftable/tables.list &&
+ # The new tables are auto-compacted, but the locked tables are
+ # left intact.
+ test_line_count = 3 .git/reftable/tables.list &&
+ head -n 2 .git/reftable/tables.list >tables.head &&
+ test_cmp tables.expect tables.head &&
+
+ rm .git/reftable/*.lock &&
git $command --auto &&
test_line_count = 1 .git/reftable/tables.list
)
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 4/8] reftable/stack: simplify tracking of table locks
2024-07-31 14:15 ` [PATCH 4/8] reftable/stack: simplify tracking of table locks Patrick Steinhardt
@ 2024-07-31 21:57 ` Justin Tobler
2024-08-02 23:00 ` Junio C Hamano
1 sibling, 0 replies; 50+ messages in thread
From: Justin Tobler @ 2024-07-31 21:57 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 24/07/31 04:15PM, Patrick Steinhardt wrote:
> When compacting tables, we store the locks of all tables we are about to
> compact in the `table_locks` array. As we currently only ever compact
> all tables in the user-provided range or none, we simply track those
> locks via the indices of the respective tables in the merged stack.
>
> This is about to change though, as we will introduce a mode where auto
> compaction gracefully handles the case of already-locked files. In this
> case, it may happen that we only compact a subset of the user-supplied
> range of tables. In this case, the indices will not necessarily match
> the lock indices anymore.
>
> Refactor the code such that we track the number of locks via a separate
> variable. The resulting code is expected to perform the same, but will
> make it easier to perform the described change.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> reftable/stack.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index e5959d2c76..07e7ffc6b9 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1016,7 +1016,7 @@ static int stack_compact_range(struct reftable_stack *st,
> struct lock_file *table_locks = NULL;
> struct tempfile *new_table = NULL;
> int is_empty_table = 0, err = 0;
> - size_t i;
> + size_t i, nlocks = 0;
>
> if (first > last || (!expiry && first == last)) {
> err = 0;
> @@ -1051,7 +1051,7 @@ static int stack_compact_range(struct reftable_stack *st,
> for (i = first; i <= last; i++) {
> stack_filename(&table_name, st, reader_name(st->readers[i]));
>
> - err = hold_lock_file_for_update(&table_locks[i - first],
> + err = hold_lock_file_for_update(&table_locks[nlocks],
Later in this series the tables get locked in reverse order, from the
last table to the first. Previously, the locks were stored in
`table_locks` at the index of the table relative to `first`, which with
such a scheme could leave some elements empty at the beginning whenever
only a subset of the tables gets locked. Now each acquired lock is
simply added starting from index 0 and counted via `nlocks`.
Ultimately, this makes no difference though because all accesses of
`table_locks` have been updated accordingly, meaning the same order is
maintained in practice.
So far makes sense :)
> table_name.buf, LOCK_NO_DEREF);
> if (err < 0) {
> if (errno == EEXIST)
> @@ -1066,7 +1066,7 @@ static int stack_compact_range(struct reftable_stack *st,
> * run into file descriptor exhaustion when we compress a lot
> * of tables.
> */
> - err = close_lock_file_gently(&table_locks[i - first]);
> + err = close_lock_file_gently(&table_locks[nlocks++]);
> if (err < 0) {
> err = REFTABLE_IO_ERROR;
> goto done;
> @@ -1183,8 +1183,8 @@ static int stack_compact_range(struct reftable_stack *st,
> * Delete the old tables. They may still be in use by concurrent
> * readers, so it is expected that unlinking tables may fail.
> */
> - for (i = first; i <= last; i++) {
> - struct lock_file *table_lock = &table_locks[i - first];
> + for (i = 0; i < nlocks; i++) {
> + struct lock_file *table_lock = &table_locks[i];
> char *table_path = get_locked_file_path(table_lock);
> unlink(table_path);
> free(table_path);
> @@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
>
> done:
> rollback_lock_file(&tables_list_lock);
> - for (i = first; table_locks && i <= last; i++)
> - rollback_lock_file(&table_locks[i - first]);
> + for (i = 0; table_locks && i < nlocks; i++)
> + rollback_lock_file(&table_locks[i]);
> reftable_free(table_locks);
>
> delete_tempfile(&new_table);
> --
> 2.46.0.dirty
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list"
2024-07-31 14:15 ` [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
@ 2024-07-31 23:02 ` Justin Tobler
2024-08-01 8:40 ` Patrick Steinhardt
0 siblings, 1 reply; 50+ messages in thread
From: Justin Tobler @ 2024-07-31 23:02 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 24/07/31 04:15PM, Patrick Steinhardt wrote:
> When modifying "tables.list", we need to lock the list before updating
> it to ensure that no concurrent writers modify the list at the same
> point in time. While we do this via the `lock_file` subsystem when
> compacting the stack, we manually handle the lock when adding a new
> table to it. While not wrong, it is at least inconsistent.
Indeed, much more consistent to use the lockfile API here. :)
>
> Refactor the code to consistenly lock "tables.list" via the `lock_file`
s/consistenly/consistently/
> subsytem.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> reftable/stack.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 9ca549294f..9cc91a262c 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -567,7 +567,7 @@ static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
> }
>
> struct reftable_addition {
> - struct tempfile *lock_file;
> + struct lock_file tables_list_lock;
> struct reftable_stack *stack;
>
> char **new_tables;
> @@ -581,13 +581,13 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
> struct reftable_stack *st)
> {
> struct strbuf lock_file_name = STRBUF_INIT;
> - int err = 0;
> - add->stack = st;
> + int err;
>
> - strbuf_addf(&lock_file_name, "%s.lock", st->list_file);
> + add->stack = st;
>
> - add->lock_file = create_tempfile(lock_file_name.buf);
> - if (!add->lock_file) {
> + err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
> + LOCK_NO_DEREF);
> + if (err < 0) {
> if (errno == EEXIST) {
> err = REFTABLE_LOCK_ERROR;
> } else {
> @@ -596,7 +596,8 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
> goto done;
> }
> if (st->opts.default_permissions) {
> - if (chmod(add->lock_file->filename.buf, st->opts.default_permissions) < 0) {
> + if (chmod(get_lock_file_path(&add->tables_list_lock),
> + st->opts.default_permissions) < 0) {
> err = REFTABLE_IO_ERROR;
> goto done;
> }
> @@ -635,7 +636,7 @@ static void reftable_addition_close(struct reftable_addition *add)
> add->new_tables_len = 0;
> add->new_tables_cap = 0;
>
> - delete_tempfile(&add->lock_file);
> + rollback_lock_file(&add->tables_list_lock);
> strbuf_release(&nm);
> }
>
> @@ -651,7 +652,7 @@ void reftable_addition_destroy(struct reftable_addition *add)
> int reftable_addition_commit(struct reftable_addition *add)
> {
> struct strbuf table_list = STRBUF_INIT;
> - int lock_file_fd = get_tempfile_fd(add->lock_file);
> + int lock_file_fd = get_lock_file_fd(&add->tables_list_lock);
> int err = 0;
> size_t i;
>
> @@ -674,13 +675,14 @@ int reftable_addition_commit(struct reftable_addition *add)
> goto done;
> }
>
> - err = fsync_component(FSYNC_COMPONENT_REFERENCE, lock_file_fd);
> + err = fsync_component(FSYNC_COMPONENT_REFERENCE,
> + get_lock_file_fd(&add->tables_list_lock));
I might be missing something, but is there a reason we have to get the
lock file fd again instead of just using `lock_file_fd`?
> if (err < 0) {
> err = REFTABLE_IO_ERROR;
> goto done;
> }
>
> - err = rename_tempfile(&add->lock_file, add->stack->list_file);
> + err = commit_lock_file(&add->tables_list_lock);
> if (err < 0) {
> err = REFTABLE_IO_ERROR;
> goto done;
> --
> 2.46.0.dirty
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction
2024-07-31 14:15 ` [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
@ 2024-08-01 1:04 ` Justin Tobler
2024-08-01 8:41 ` Patrick Steinhardt
0 siblings, 1 reply; 50+ messages in thread
From: Justin Tobler @ 2024-08-01 1:04 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 24/07/31 04:15PM, Patrick Steinhardt wrote:
> The locking employed by compaction uses the following schema:
>
> 1. Lock "tables.list" and verify that it matches the version we have
> loaded in core.
>
> 2. Lock each of the tables in the user-supplied range of tables that
> we are supposed to compact. These locks prohibit any concurrent
> process to compact those tables while we are doing that.
>
> 3. Unlock "tables.list". This enables concurrent processes to add new
> tables to the stack, but also allows them to compact tables outside
> of the range of tables that we have locked.
>
> 4. Perform the compaction.
>
> 5. Lock "tables.list" again.
>
> 6. Move the compacted table into place.
>
> 7. Write the new order of tables, including the compacted table, into
> the lockfile.
>
> 8. Commit the lockfile into place.
>
> Letting concurrent processes modify the "tables.list" file while we are
> doing the compaction is very much part of the design and thus expected.
> After all, it may take some time to compact tables in the case where we
> are compacting a lot or very large tables.
s/or/of/
> But there is a bug in the code. Suppose we have two processes which are
> compacting two slices of the table. Given that we lock each of the
> tables before compacting them, we know that the slices must be disjunct
> from each other. But regardless of that, compaction performed by one
> process will always impact what the other process needs to write to the
> "tables.list" file.
I'm not quite sure I understand at this point how it is possible for two
compaction operations to be performed concurrently. Wouldn't there
always be overlap between the two compaction segments thus causing one
of the operations to be unable to acquire all of the required locks and
abort?
> Right now , we do not check whether the "tables.list" has been
s/now ,/now,/
> changed after we have locked it for the second time in (5). This has the
> consequence that we will always commit the old, cached in-core tables to
> disk without paying to respect what the other process has written. This
> scenario would then lead to data loss and corruption.
If a concurrent compaction happens though, it would mess up the indices
and cause problems when writing the "tables.list" file. That would not
be good.
> This can even happen in the simpler case of one compacting process and
> one writing process. The newly-appended table by the writing process
> would get discarded by the compacting process because it never sees the
> new table.
This is indeed a problem. Since we don't reload the stack, we are
unaware of any concurrently appended tables, causing them to not be
written to the new "tables.list" file. Scary
> Fix this bug by re-checking whether our stack is still up to date after
> locking for the second time. If it isn't, then we adjust the indices of
> tables to replace in the updated stack.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> reftable/stack.c | 101 ++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 96 insertions(+), 5 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 9cc91a262c..2b1ac58120 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1021,7 +1021,9 @@ static int stack_compact_range(struct reftable_stack *st,
> struct lock_file *table_locks = NULL;
> struct tempfile *new_table = NULL;
> int is_empty_table = 0, err = 0;
> + size_t first_to_replace, last_to_replace;
> size_t i, nlocks = 0;
> + char **names = NULL;
>
> if (first > last || (!expiry && first == last)) {
> err = 0;
> @@ -1124,6 +1126,94 @@ static int stack_compact_range(struct reftable_stack *st,
> }
> }
>
> + /*
> + * As we have unlocked the stack while compacting our slice of tables
> + * it may have happened that a concurrently running process has updated
> + * the stack while we were compacting. In that case, we need to check
> + * whether the tables that we have just compacted still exist in the
> + * stack in the exact same order as we have compacted them.
> + *
> + * If they do exist, then it is fine to continue and replace those
> + * tables with our compacted version. If they don't, then we need to
> + * abort.
> + */
> + err = stack_uptodate(st);
> + if (err < 0)
> + goto done;
> + if (err > 0) {
> + ssize_t new_offset = -1;
> + int fd;
> +
> + fd = open(st->list_file, O_RDONLY);
> + if (fd < 0) {
> + err = REFTABLE_IO_ERROR;
> + goto done;
> + }
> +
> + err = fd_read_lines(fd, &names);
Reading `names` here will include all tables that were appended
concurrently, which we need in order to accurately rewrite the new
"tables.list".
Makes sense.
> + close(fd);
> + if (err < 0)
> + goto done;
> +
> + /*
> + * Search for the offset of the first table that we have
> + * compacted in the updated "tables.list" file.
> + */
> + for (size_t i = 0; names[i]; i++) {
> + if (strcmp(names[i], st->readers[first]->name))
> + continue;
> +
> + /*
> + * We have found the first entry. Verify that all the
> + * subsequent tables we have compacted still exist in
> + * the modified stack in the exact same order as we
> + * have compacted them.
> + */
> + for (size_t j = 1; j < last - first + 1; j++) {
> + const char *old = first + j < st->merged->stack_len ?
> + st->readers[first + j]->name : NULL;
> + const char *new = names[i + j];
> +
> + /*
> + * If some entries are missing or in case the tables
> + * have changed then we need to bail out. Again, this
> + * shouldn't ever happen because we have locked the
> + * tables we are compacting.
> + */
> + if (!old || !new || strcmp(old, new)) {
> + err = REFTABLE_OUTDATED_ERROR;
> + goto done;
> + }
> + }
> +
> + new_offset = i;
> + break;
> + }
> +
> + /*
> + * In case we didn't find our compacted tables in the stack we
> + * need to bail out. In theory, this should have never happened
> + * because we locked the tables we are compacting.
> + */
> + if (new_offset < 0) {
> + err = REFTABLE_OUTDATED_ERROR;
> + goto done;
> + }
> +
> + /*
> + * We have found the new range that we want to replace, so
> + * let's update the range of tables that we want to replace.
> + */
> + last_to_replace = last + (new_offset - first);
> + first_to_replace = new_offset;
> + } else {
> + REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
I was confused at first by the `stack_len` + 1. The extra element is
NULL which tells us there are no more tables to add to the list,
correct? It looks like `fd_read_lines()` also adds an extra element.
> + for (size_t i = 0; i < st->merged->stack_len; i++)
> + names[i] = xstrdup(st->readers[i]->name);
> + last_to_replace = last;
> + first_to_replace = first;
> + }
> +
> /*
> * If the resulting compacted table is not empty, then we need to move
> * it into place now.
> @@ -1146,12 +1236,12 @@ static int stack_compact_range(struct reftable_stack *st,
> * have just written. In case the compacted table became empty we
> * simply skip writing it.
> */
> - for (i = 0; i < first; i++)
> - strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
> + for (i = 0; i < first_to_replace; i++)
> + strbuf_addf(&tables_list_buf, "%s\n", names[i]);
> if (!is_empty_table)
> strbuf_addf(&tables_list_buf, "%s\n", new_table_name.buf);
> - for (i = last + 1; i < st->merged->stack_len; i++)
> - strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
> + for (i = last_to_replace + 1; names[i]; i++)
> + strbuf_addf(&tables_list_buf, "%s\n", names[i]);
The content of `names` is now up to date, along with the starting and
ending indices for the segment being compacted. This allows us to omit
the correct segment of tables when writing the new "tables.list".
>
> err = write_in_full(get_lock_file_fd(&tables_list_lock),
> tables_list_buf.buf, tables_list_buf.len);
> @@ -1204,9 +1294,10 @@ static int stack_compact_range(struct reftable_stack *st,
> delete_tempfile(&new_table);
> strbuf_release(&new_table_name);
> strbuf_release(&new_table_path);
> -
> strbuf_release(&tables_list_buf);
> strbuf_release(&table_name);
> + free_names(names);
> +
> return err;
> }
>
> --
> 2.46.0.dirty
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list"
2024-07-31 23:02 ` Justin Tobler
@ 2024-08-01 8:40 ` Patrick Steinhardt
0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-01 8:40 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 629 bytes --]
On Wed, Jul 31, 2024 at 06:02:41PM -0500, Justin Tobler wrote:
> On 24/07/31 04:15PM, Patrick Steinhardt wrote:
> > @@ -674,13 +675,14 @@ int reftable_addition_commit(struct reftable_addition *add)
> > goto done;
> > }
> >
> > - err = fsync_component(FSYNC_COMPONENT_REFERENCE, lock_file_fd);
> > + err = fsync_component(FSYNC_COMPONENT_REFERENCE,
> > + get_lock_file_fd(&add->tables_list_lock));
>
> I might be missing something, but is there a reason we have to get the
> lock file fd again instead of just using `lock_file_fd`?
Not really. No idea what I was thinking here, will fix.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction
2024-08-01 1:04 ` Justin Tobler
@ 2024-08-01 8:41 ` Patrick Steinhardt
0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-01 8:41 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 3464 bytes --]
On Wed, Jul 31, 2024 at 08:04:17PM -0500, Justin Tobler wrote:
> On 24/07/31 04:15PM, Patrick Steinhardt wrote:
> > But there is a bug in the code. Suppose we have two processes which are
> > compacting two slices of the table. Given that we lock each of the
> > tables before compacting them, we know that the slices must be disjunct
> > from each other. But regardless of that, compaction performed by one
> > process will always impact what the other process needs to write to the
> > "tables.list" file.
>
> I'm not quite sure I understand at this point how it is possible for two
> compaction operations to be performed concurrently. Wouldn't there
> always be overlap between the two compaction segments thus causing one
> of the operations to be unable to acquire all of the required locks and
> abort?
In practice we cannot assume anything about how another process compacts
tables. While we can assume something about how a particular version of
Git compacts tables, we cannot assume anything about future versions of
Git or about alternate implementations of Git. The reftable backend
allows for compacting only a subset of tables, and the heuristic is not
mandated by the on-disk format except that the tables that we are about
to compact need to be next to each other in the stack.
Furthermore, with the next patch, we also handle it gracefully when some
parts of the stack are locked already. Thus, it can easily happen that
process A compacts tables 1 to 3, whereas process B will try to compact
tables 1 to 5, fail to acquire the lock for table 3, and then reduce the
range to compact to tables 4 to 5.
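To make that shrinking of the range concrete, here is a condensed sketch of
the locking loop that the next patch introduces. lock_table() is a made-up
stand-in for the stack_filename() and hold_lock_file_for_update() calls in
the real code, and error handling is simplified:

	/*
	 * Walk the candidate range from its end so that the tables we did
	 * manage to lock always form a contiguous slice at its top.
	 */
	for (i = last + 1; i > first; i--) {
		err = lock_table(st, i - 1);
		if (err < 0 && errno == EEXIST && last - (i - 1) >= 2 &&
		    flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
			/* Only compact the tables we managed to lock so far. */
			first = i;
			err = 0;
			break;
		} else if (err < 0) {
			goto done; /* bail out via the cleanup label */
		}
		nlocks++;
	}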
> > changed after we have locked it for the second time in (5). This has the
> > consequence that we will always commit the old, cached in-core tables to
> > disk without paying to respect what the other process has written. This
> > scenario would then lead to data loss and corruption.
>
> If a concurrent compaction happens though, it would mess up the indices
> and cause problems when writing the "tables.list" file. That would not
> be good.
Yup.
> > This can even happen in the simpler case of one compacting process and
> > one writing process. The newly-appended table by the writing process
> > would get discarded by the compacting process because it never sees the
> > new table.
>
> This is indeed a problem. Since we don't reload the stack, we are
> unaware of any concurrently appended tables, causing them to not be
> written in the new "tables.list" file. Scary
Indeed.
> > + /*
> > + * We have found the new range that we want to replace, so
> > + * let's update the range of tables that we want to replace.
> > + */
> > + last_to_replace = last + (new_offset - first);
> > + first_to_replace = new_offset;
> > + } else {
> > + REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
>
> I was confused at first by the `stack_len` + 1. The extra element is
> NULL which tells us there are no more tables to add to the list,
> correct? It looks like `fd_read_lines()` also adds an extra element.
Yes, that's the reason why we have it. We end up passing `names` to
`free_names()`, which uses `NULL` as a sentinel value to know when to
stop iterating over the array's entries.
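As a small illustration of that convention (the helper name below is made
up; the real free_names() may differ in detail):

	static void free_names_sketch(char **names)
	{
		if (!names)
			return;
		/* A NULL entry terminates the array instead of an explicit length. */
		for (size_t i = 0; names[i]; i++)
			free(names[i]);
		free(names);
	}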
I'll add a comment.
Thanks for your review. I'll wait a bit longer before sending out
another version of this patch series to allow for some more feedback.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/8] reftable/stack: refactor function to gather table sizes
2024-07-31 14:14 ` [PATCH 1/8] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
@ 2024-08-02 20:26 ` Junio C Hamano
0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2024-08-02 20:26 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> Refactor the function that gathers table sizes to be more idiomatic. For
> one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
> Second, avoid using an integer to iterate through the tables in the
> reftable stack given that `stack_len` itself is using a `size_t`.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> reftable/stack.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 737591125e..ba8234b486 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1305,14 +1305,15 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n,
>
> static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
> {
> - uint64_t *sizes =
> - reftable_calloc(st->merged->stack_len, sizeof(*sizes));
> int version = (st->opts.hash_id == GIT_SHA1_FORMAT_ID) ? 1 : 2;
> int overhead = header_size(version) - 1;
> - int i = 0;
> - for (i = 0; i < st->merged->stack_len; i++) {
> + uint64_t *sizes;
> +
> + REFTABLE_CALLOC_ARRAY(sizes, st->merged->stack_len);
> +
> + for (size_t i = 0; i < st->merged->stack_len; i++)
> sizes[i] = st->readers[i]->size - overhead;
> - }
> +
OK.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 2/8] reftable/stack: test compaction with already-locked tables
2024-07-31 14:14 ` [PATCH 2/8] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
@ 2024-08-02 21:05 ` Junio C Hamano
2024-08-05 12:11 ` Patrick Steinhardt
0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2024-08-02 21:05 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> +static void test_reftable_stack_auto_compaction_with_locked_tables(void)
> +{
> + struct reftable_write_options opts = {
> + .disable_auto_compact = 1,
> + };
> + struct reftable_stack *st = NULL;
> + struct strbuf buf = STRBUF_INIT;
> + char *dir = get_tmp_dir(__LINE__);
> + int err;
> +
> + err = reftable_new_stack(&st, dir, &opts);
> + EXPECT_ERR(err);
> +
> + for (size_t i = 0; i < 5; i++) {
> + struct reftable_ref_record ref = {
> + .update_index = reftable_stack_next_update_index(st),
> + .value_type = REFTABLE_REF_VAL1,
> + .value.val1 = { i },
> + };
As val1 is an array of unsigned char, i cannot reasonably go beyond
255, but that is perfectly fine. We are preparing 5 original tables
to compact, and that might grow to 17 tables over time, but 255 ought
to be more than enough.
> +
> + strbuf_reset(&buf);
> + strbuf_addf(&buf, "refs/heads/branch-%04" PRIuMAX, (uintmax_t) i);
Yet we are prepared to handle i that is beyond any usual integer ;-)
I am tempted to suggest using the bog-standard int for everything
for the sake of consistency within this loop, but it does not matter
all that much in a standalone test program ;-)
> + ref.refname = buf.buf;
> +
> + err = reftable_stack_add(st, &write_test_ref, &ref);
> + EXPECT_ERR(err);
> + }
> + EXPECT(st->merged->stack_len == 5);
> +
> + /*
> + * Given that all tables we have written should be roughly the same
> + * size, we expect that auto-compaction will want to compact all of the
> + * tables. Locking any of the tables will keep it from doing so.
> + */
> + strbuf_reset(&buf);
> + strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[2]->name);
> + write_file_buf(buf.buf, "", 0);
OK. [2] is just a random number pulled out of 0..5?
> +static void test_reftable_stack_compaction_with_locked_tables(void)
> +{
> + struct reftable_write_options opts = {
> + .disable_auto_compact = 1,
> + };
> + struct reftable_stack *st = NULL;
> + struct strbuf buf = STRBUF_INIT;
> + char *dir = get_tmp_dir(__LINE__);
> + int err;
> +
> + err = reftable_new_stack(&st, dir, &opts);
> + EXPECT_ERR(err);
> +
> + for (size_t i = 0; i < 3; i++) {
> +...
> + }
> + EXPECT(st->merged->stack_len == 3);
Hmph, this somehow looks familiar. The only difference is how many
tables are compacted with which one locked, and whether it is
compact_all() or auto_compact() that triggers the compaction
behaviour, right?
I wonder if we want to factor out the commonality into a shared
function, or it is too much trouble only for two duplicates and we
can worry about it when we were about to add the third one?
Thanks.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/8] reftable/stack: simplify tracking of table locks
2024-07-31 14:15 ` [PATCH 4/8] reftable/stack: simplify tracking of table locks Patrick Steinhardt
2024-07-31 21:57 ` Justin Tobler
@ 2024-08-02 23:00 ` Junio C Hamano
2024-08-05 12:11 ` Patrick Steinhardt
1 sibling, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2024-08-02 23:00 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> - size_t i;
> + size_t i, nlocks = 0;
>
> if (first > last || (!expiry && first == last)) {
> err = 0;
> @@ -1051,7 +1051,7 @@ static int stack_compact_range(struct reftable_stack *st,
> for (i = first; i <= last; i++) {
> stack_filename(&table_name, st, reader_name(st->readers[i]));
>
> - err = hold_lock_file_for_update(&table_locks[i - first],
> + err = hold_lock_file_for_update(&table_locks[nlocks],
> table_name.buf, LOCK_NO_DEREF);
> if (err < 0) {
> if (errno == EEXIST)
> @@ -1066,7 +1066,7 @@ static int stack_compact_range(struct reftable_stack *st,
> * run into file descriptor exhaustion when we compress a lot
> * of tables.
> */
> - err = close_lock_file_gently(&table_locks[i - first]);
> + err = close_lock_file_gently(&table_locks[nlocks++]);
> if (err < 0) {
> err = REFTABLE_IO_ERROR;
> goto done;
The only unusual control flow in this loop that runs i from first to
last is to leave it upon an error, so "i - first" and "nlocks" are
always the same at this step in the patch series.
> @@ -1183,8 +1183,8 @@ static int stack_compact_range(struct reftable_stack *st,
> * Delete the old tables. They may still be in use by concurrent
> * readers, so it is expected that unlinking tables may fail.
> */
> - for (i = first; i <= last; i++) {
> - struct lock_file *table_lock = &table_locks[i - first];
> + for (i = 0; i < nlocks; i++) {
> + struct lock_file *table_lock = &table_locks[i];
> char *table_path = get_locked_file_path(table_lock);
> unlink(table_path);
> free(table_path);
And this one at this step in the patch series is skipped if the
earlier loop saw even a single error, so again, this is a benign
noop change.
> @@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
>
> done:
> rollback_lock_file(&tables_list_lock);
> - for (i = first; table_locks && i <= last; i++)
> - rollback_lock_file(&table_locks[i - first]);
> + for (i = 0; table_locks && i < nlocks; i++)
> + rollback_lock_file(&table_locks[i]);
This is a true bugfix, isn't it? If we failed to create lock file
somewhere in the middle, we used to still go ahead and attempted
rollback_lock_file() on all of them. Now we rollback only what we
successfully called hold_lock_file_for_update().
I wonder why nobody segfaulted here after a failed lock. The
answer probably is that an lk->tempfile that is NULL will safely
bypass most of the things because is_tempfile_active() would say
"false" on such a lockfile. But it probably still was wrong to call
rollback_lock_file() on a "struct lockfile" full of NUL-bytes, and
it is good that we no longer do that.
> reftable_free(table_locks);
>
> delete_tempfile(&new_table);
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 8/8] reftable/stack: handle locked tables during auto-compaction
2024-07-31 14:15 ` [PATCH 8/8] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
@ 2024-08-02 23:24 ` Junio C Hamano
2024-08-05 12:11 ` Patrick Steinhardt
0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2024-08-02 23:24 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> - for (i = first; i <= last; i++) {
> - stack_filename(&table_name, st, reader_name(st->readers[i]));
> + for (i = last + 1; i > first; i--) {
> + stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
>
> err = hold_lock_file_for_update(&table_locks[nlocks],
> table_name.buf, LOCK_NO_DEREF);
> if (err < 0) {
> - if (errno == EEXIST)
> + /*
> + * When the table is locked already we may do a
> + * best-effort compaction and compact only the tables
> + * that we have managed to lock so far. This of course
> + * requires that we have been able to lock at least two
> + * tables, otherwise there would be nothing to compact.
> + * In that case, we return a lock error to our caller.
> + */
> + if (errno == EEXIST && last - (i - 1) >= 2 &&
> + flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
> + err = 0;
> + /*
> + * The subtraction is to offset the index, the
> + * addition is to only compact up to the table
> + * of the preceding iteration. They obviously
> + * cancel each other out, but that may be
> + * non-obvious when it was omitted.
> + */
> + first = (i - 1) + 1;
> + break;
OK, so this case no longer jumps to the "done" label. Everything from
the high end of the st->readers[] array down to what we managed to take
locks on will be processed with the existing logic that compacted
first..last, which makes sense.
> + } else if (errno == EEXIST) {
> err = REFTABLE_LOCK_ERROR;
> - else
> + goto done;
> + } else {
> err = REFTABLE_IO_ERROR;
> - goto done;
> + goto done;
> + }
> }
>
> /*
> @@ -1270,7 +1308,7 @@ static int stack_compact_range(struct reftable_stack *st,
> * delete the files after we closed them on Windows, so this needs to
> * happen first.
> */
> - err = reftable_stack_reload_maybe_reuse(st, first < last);
> + err = reftable_stack_reload_maybe_reuse(st, first_to_replace < last_to_replace);
What is this change about?
No code changed that computes first_to_replace and last_to_replace?
Perhaps before this step, using first/last and using
first_to_replace/last_to_replace did not make any difference,
because we never dealt with a case where we failed to lock any of
the tables?
I am wondering if this would be helped by a no-op clarification
before the actual behaviour change, similar to how step 4/8 added
the nlocks variable.
Thanks.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 2/8] reftable/stack: test compaction with already-locked tables
2024-08-02 21:05 ` Junio C Hamano
@ 2024-08-05 12:11 ` Patrick Steinhardt
0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 12:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]
On Fri, Aug 02, 2024 at 02:05:43PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > + ref.refname = buf.buf;
> > +
> > + err = reftable_stack_add(st, &write_test_ref, &ref);
> > + EXPECT_ERR(err);
> > + }
> > + EXPECT(st->merged->stack_len == 5);
> > +
> > + /*
> > + * Given that all tables we have written should be roughly the same
> > + * size, we expect that auto-compaction will want to compact all of the
> > + * tables. Locking any of the tables will keep it from doing so.
> > + */
> > + strbuf_reset(&buf);
> > + strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[2]->name);
> > + write_file_buf(buf.buf, "", 0);
>
> OK. [2] is just a random number pulled out of 0..5?
Not quite as random. It is picked such that we can demonstrate in a
follow-up patch that auto-compaction knows to pack tables 4 and 5, while
leaving tables 1 to 3 intact. This only becomes important in a follow-up
patch where we change the backing logic.
> > +static void test_reftable_stack_compaction_with_locked_tables(void)
> > +{
> > + struct reftable_write_options opts = {
> > + .disable_auto_compact = 1,
> > + };
> > + struct reftable_stack *st = NULL;
> > + struct strbuf buf = STRBUF_INIT;
> > + char *dir = get_tmp_dir(__LINE__);
> > + int err;
> > +
> > + err = reftable_new_stack(&st, dir, &opts);
> > + EXPECT_ERR(err);
> > +
> > + for (size_t i = 0; i < 3; i++) {
> > +...
> > + }
> > + EXPECT(st->merged->stack_len == 3);
>
> Hmph, this somehow looks familiar. The only difference is how many
> tables are compacted with which one locked, and whether it is
> compact_all() or auto_compact() that triggers the compaction
> behaviour, right?
>
> I wonder if we want to factor out the commonality into a shared
> function, or it is too much trouble only for two duplicates and we
> can worry about it when we were about to add the third one?
I was also briefly thinking the same, but then didn't follow through
with that thought. In fact, there's multiple places in this file where
we populate a stack with N tables. I think it should be easy enough to
pull this into a function indeed.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/8] reftable/stack: simplify tracking of table locks
2024-08-02 23:00 ` Junio C Hamano
@ 2024-08-05 12:11 ` Patrick Steinhardt
0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 12:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 1743 bytes --]
On Fri, Aug 02, 2024 at 04:00:52PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > @@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
> >
> > done:
> > rollback_lock_file(&tables_list_lock);
> > - for (i = first; table_locks && i <= last; i++)
> > - rollback_lock_file(&table_locks[i - first]);
> > + for (i = 0; table_locks && i < nlocks; i++)
> > + rollback_lock_file(&table_locks[i]);
>
> This is a true bugfix, isn't it? If we failed to create lock file
> somewhere in the middle, we used to still go ahead and attempted
> rollback_lock_file() on all of them. Now we rollback only what we
> successfully called hold_lock_file_for_update().
>
> I wonder why nobody segfaulted here after a failed lock. The
> answer probably is that an lk->tempfile that is NULL will safely
> bypass most of the things because is_tempfile_active() would say
> "false" on such a lockfile. But it probably still was wrong to call
> rollback_lock_file() on a "struct lockfile" full of NUL-bytes, and
> it is good that we no longer do that.
I don't think it is. `table_locks` is an array of `struct lockfile` and
is allocated with calloc(3P), so we know that each uninitialized lock
will be all zeroes. For each uninitialized lock, `rollback_lock_file()`
calls `delete_tempfile(&lk->tempfile)`, which derefs the pointer and
thus essentially calls `!is_tempfile_active(lk->tempfile)`, and that
ultimately ends up checking whether `tempfile` is a `NULL` pointer or
not. And as the structs were zero-initialized, it is a `NULL` pointer
and thus we bail out.
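To illustrate, a minimal sketch of what happens for such a zeroed entry
(simplified, relying on the behaviour described above):

	struct lock_file lk;
	memset(&lk, 0, sizeof(lk)); /* same state as a calloc'd table_locks entry */

	/*
	 * rollback_lock_file() boils down to delete_tempfile(&lk.tempfile).
	 * With lk.tempfile == NULL, is_tempfile_active() reports it as
	 * inactive, so the call returns without doing anything.
	 */
	rollback_lock_file(&lk);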
We also have tests that exercise this logic, so it also seems to be fine
in practice.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 8/8] reftable/stack: handle locked tables during auto-compaction
2024-08-02 23:24 ` Junio C Hamano
@ 2024-08-05 12:11 ` Patrick Steinhardt
0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 12:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]
On Fri, Aug 02, 2024 at 04:24:42PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > + } else if (errno == EEXIST) {
> > err = REFTABLE_LOCK_ERROR;
> > - else
> > + goto done;
> > + } else {
> > err = REFTABLE_IO_ERROR;
> > - goto done;
> > + goto done;
> > + }
> > }
> >
> > /*
> > @@ -1270,7 +1308,7 @@ static int stack_compact_range(struct reftable_stack *st,
> > * delete the files after we closed them on Windows, so this needs to
> > * happen first.
> > */
> > - err = reftable_stack_reload_maybe_reuse(st, first < last);
> > + err = reftable_stack_reload_maybe_reuse(st, first_to_replace < last_to_replace);
>
> What is this change about?
>
> No code changed that computes first_to_replace and last_to_replace?
> Perhaps before this step, using first/last and using
> first_to_replace/last_to_replace did not make any difference,
> because we never dealt with a case where we failed to lock any of
> the tables?
>
> I am wondering if this would be helped by a no-op clarification
> before the actual behaviour change, similar to how step 4/8 added
> the nlocks variable.
As we know that we always want to replace the same set of tables, just
with different offsets, it follows that `first - last` is always equal
to `first_to_replace - last_to_replace`.
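Concretely, the previous patch sets first_to_replace = new_offset and
last_to_replace = last + (new_offset - first), so last_to_replace -
first_to_replace = last - first. Hence `first < last` holds exactly when
`first_to_replace < last_to_replace`.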
So this is a no-op change, let me just drop it.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v2 0/9] reftable: improvements and fixes for compaction
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
` (7 preceding siblings ...)
2024-07-31 14:15 ` [PATCH 8/8] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
@ 2024-08-05 13:07 ` Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 1/9] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
` (9 more replies)
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
9 siblings, 10 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:07 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 7432 bytes --]
Hi,
this is the second version of my patch series that aims to improve the
way the reftable stack performs compaction.
Changes compared to v1:
- Extract a test helper function that sets up a stack with N tables
containing refs.
- Reuse the file descriptor that we have already stored in a local
variable instead of calling `get_lock_file_fd()` a second time.
- Remove a no-op change in the last patch.
- Add a comment explaining why we have to allocate N+1 many table
names.
- Some typo fixes.
Thanks!
Patrick
Patrick Steinhardt (9):
reftable/stack: refactor function to gather table sizes
reftable/stack: extract function to setup stack with N tables
reftable/stack: test compaction with already-locked tables
reftable/stack: update stats on failed full compaction
reftable/stack: simplify tracking of table locks
reftable/stack: do not die when fsyncing lock file files
reftable/stack: use lock_file when adding table to "tables.list"
reftable/stack: fix corruption on concurrent compaction
reftable/stack: handle locked tables during auto-compaction
reftable/stack.c | 231 +++++++++++++++++++++++++++++--------
reftable/stack_test.c | 138 +++++++++++++++++-----
t/t0610-reftable-basics.sh | 21 ++--
3 files changed, 306 insertions(+), 84 deletions(-)
Range-diff against v1:
1: 5d99191f5c = 1: 6d2b54ba8e reftable/stack: refactor function to gather table sizes
-: ---------- > 2: ff17306cc0 reftable/stack: extract function to setup stack with N tables
2: 123fb9d80e ! 3: 63e64c8d82 reftable/stack: test compaction with already-locked tables
@@ reftable/stack_test.c: static void test_reftable_stack_auto_compaction(void)
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
-+ for (size_t i = 0; i < 5; i++) {
-+ struct reftable_ref_record ref = {
-+ .update_index = reftable_stack_next_update_index(st),
-+ .value_type = REFTABLE_REF_VAL1,
-+ .value.val1 = { i },
-+ };
-+
-+ strbuf_reset(&buf);
-+ strbuf_addf(&buf, "refs/heads/branch-%04" PRIuMAX, (uintmax_t) i);
-+ ref.refname = buf.buf;
-+
-+ err = reftable_stack_add(st, &write_test_ref, &ref);
-+ EXPECT_ERR(err);
-+ }
++ write_n_ref_tables(st, &opts, 5);
+ EXPECT(st->merged->stack_len == 5);
+
+ /*
@@ reftable/stack_test.c: static void test_reftable_stack_add_performs_auto_compact
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
-+ for (size_t i = 0; i < 3; i++) {
-+ struct reftable_ref_record ref = {
-+ .update_index = reftable_stack_next_update_index(st),
-+ .value_type = REFTABLE_REF_VAL1,
-+ .value.val1 = { i },
-+ };
-+
-+ strbuf_reset(&buf);
-+ strbuf_addf(&buf, "refs/heads/branch-%04" PRIuMAX, (uintmax_t) i);
-+ ref.refname = buf.buf;
-+
-+ err = reftable_stack_add(st, &write_test_ref, &ref);
-+ EXPECT_ERR(err);
-+ }
++ write_n_ref_tables(st, &opts, 3);
+ EXPECT(st->merged->stack_len == 3);
+
+ /* Lock one of the tables that we're about to compact. */
3: 1fa7acbddf = 4: 1989dafcb4 reftable/stack: update stats on failed full compaction
4: 40d9f75cf2 = 5: 73e5d104eb reftable/stack: simplify tracking of table locks
5: 9233c36894 = 6: e411e14904 reftable/stack: do not die when fsyncing lock file files
6: 9703246156 ! 7: b868a518d6 reftable/stack: use lock_file when adding table to "tables.list"
@@ Commit message
compacting the stack, we manually handle the lock when adding a new
table to it. While not wrong, it is at least inconsistent.
- Refactor the code to consistenly lock "tables.list" via the `lock_file`
+ Refactor the code to consistently lock "tables.list" via the `lock_file`
subsytem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
@@ reftable/stack.c: int reftable_addition_commit(struct reftable_addition *add)
goto done;
}
-- err = fsync_component(FSYNC_COMPONENT_REFERENCE, lock_file_fd);
-+ err = fsync_component(FSYNC_COMPONENT_REFERENCE,
-+ get_lock_file_fd(&add->tables_list_lock));
- if (err < 0) {
- err = REFTABLE_IO_ERROR;
- goto done;
- }
-
- err = rename_tempfile(&add->lock_file, add->stack->list_file);
+ err = commit_lock_file(&add->tables_list_lock);
if (err < 0) {
7: b746565bf0 ! 8: ff17414d26 reftable/stack: fix corruption on concurrent compaction
@@ Commit message
Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
- are compacting a lot or very large tables.
+ are compacting a lot of very large tables.
But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
@@ Commit message
process will always impact what the other process needs to write to the
"tables.list" file.
- Right now , we do not check whether the "tables.list" has been
- changed after we have locked it for the second time in (5). This has the
+ Right now, we do not check whether the "tables.list" has been changed
+ after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without paying to respect what the other process has written. This
scenario would then lead to data loss and corruption.
@@ reftable/stack.c: static int stack_compact_range(struct reftable_stack *st,
+ last_to_replace = last + (new_offset - first);
+ first_to_replace = new_offset;
+ } else {
++ /*
++ * `fd_read_lines()` uses a `NULL` sentinel to indicate that
++ * the array is at its end. As we use `free_names()` to free
++ * the array, we need to include this sentinel value here and
++ * thus have to allocate `stack_len + 1` many entries.
++ */
+ REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
+ for (size_t i = 0; i < st->merged->stack_len; i++)
+ names[i] = xstrdup(st->readers[i]->name);
8: dc22178307 ! 9: 1ef560feb1 reftable/stack: handle locked tables during auto-compaction
@@ reftable/stack.c: static int stack_compact_range(struct reftable_stack *st,
}
/*
-@@ reftable/stack.c: static int stack_compact_range(struct reftable_stack *st,
- * delete the files after we closed them on Windows, so this needs to
- * happen first.
- */
-- err = reftable_stack_reload_maybe_reuse(st, first < last);
-+ err = reftable_stack_reload_maybe_reuse(st, first_to_replace < last_to_replace);
- if (err < 0)
- goto done;
-
@@ reftable/stack.c: static int stack_compact_range(struct reftable_stack *st,
static int stack_compact_range_stats(struct reftable_stack *st,
base-commit: 39bf06adf96da25b87c9aa7d35a32ef3683eb4a4
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v2 1/9] reftable/stack: refactor function to gather table sizes
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
@ 2024-08-05 13:07 ` Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 2/9] reftable/stack: extract function to setup stack with N tables Patrick Steinhardt
` (8 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:07 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]
Refactor the function that gathers table sizes to be more idiomatic. For
one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
Second, avoid using an integer to iterate through the tables in the
reftable stack given that `stack_len` itself is using a `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 737591125e..ba8234b486 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1305,14 +1305,15 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n,
static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
{
- uint64_t *sizes =
- reftable_calloc(st->merged->stack_len, sizeof(*sizes));
int version = (st->opts.hash_id == GIT_SHA1_FORMAT_ID) ? 1 : 2;
int overhead = header_size(version) - 1;
- int i = 0;
- for (i = 0; i < st->merged->stack_len; i++) {
+ uint64_t *sizes;
+
+ REFTABLE_CALLOC_ARRAY(sizes, st->merged->stack_len);
+
+ for (size_t i = 0; i < st->merged->stack_len; i++)
sizes[i] = st->readers[i]->size - overhead;
- }
+
return sizes;
}
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 2/9] reftable/stack: extract function to setup stack with N tables
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 1/9] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
@ 2024-08-05 13:07 ` Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 3/9] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
` (7 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:07 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 3294 bytes --]
We're about to add two tests, and both of them will want to initialize
the reftable stack with a set of N tables. Introduce a new function that
handles this and refactor existing tests that use such a setup to use
it.
Note that this changes the exact records contained in the preexisting
tests. This is fine though as we only care about the shape of the stack
here, not the shape of each table.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack_test.c | 60 ++++++++++++++++++++-----------------------
1 file changed, 28 insertions(+), 32 deletions(-)
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index e3c11e6a6e..61dddf7f69 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -109,6 +109,30 @@ static int write_test_ref(struct reftable_writer *wr, void *arg)
return reftable_writer_add_ref(wr, ref);
}
+static void write_n_ref_tables(struct reftable_stack *st,
+ struct reftable_write_options *opts,
+ size_t n)
+{
+ struct strbuf buf = STRBUF_INIT;
+ int err;
+
+ for (size_t i = 0; i < n; i++) {
+ struct reftable_ref_record ref = {
+ .update_index = reftable_stack_next_update_index(st),
+ .value_type = REFTABLE_REF_VAL1,
+ };
+
+ strbuf_addf(&buf, "refs/heads/branch-%04u", (unsigned) i);
+ ref.refname = buf.buf;
+ set_test_hash(ref.value.val1, i);
+
+ err = reftable_stack_add(st, &write_test_ref, &ref);
+ EXPECT_ERR(err);
+ }
+
+ strbuf_release(&buf);
+}
+
struct write_log_arg {
struct reftable_log_record *log;
uint64_t update_index;
@@ -916,25 +940,11 @@ static void test_reftable_stack_compaction_concurrent(void)
struct reftable_write_options opts = { 0 };
struct reftable_stack *st1 = NULL, *st2 = NULL;
char *dir = get_tmp_dir(__LINE__);
- int err, i;
- int N = 3;
+ int err;
err = reftable_new_stack(&st1, dir, &opts);
EXPECT_ERR(err);
-
- for (i = 0; i < N; i++) {
- char name[100];
- struct reftable_ref_record ref = {
- .refname = name,
- .update_index = reftable_stack_next_update_index(st1),
- .value_type = REFTABLE_REF_SYMREF,
- .value.symref = (char *) "master",
- };
- snprintf(name, sizeof(name), "branch%04d", i);
-
- err = reftable_stack_add(st1, &write_test_ref, &ref);
- EXPECT_ERR(err);
- }
+ write_n_ref_tables(st1, &opts, 3);
err = reftable_new_stack(&st2, dir, &opts);
EXPECT_ERR(err);
@@ -965,25 +975,11 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
struct reftable_write_options opts = { 0 };
struct reftable_stack *st1 = NULL, *st2 = NULL, *st3 = NULL;
char *dir = get_tmp_dir(__LINE__);
- int err, i;
- int N = 3;
+ int err;
err = reftable_new_stack(&st1, dir, &opts);
EXPECT_ERR(err);
-
- for (i = 0; i < N; i++) {
- char name[100];
- struct reftable_ref_record ref = {
- .refname = name,
- .update_index = reftable_stack_next_update_index(st1),
- .value_type = REFTABLE_REF_SYMREF,
- .value.symref = (char *) "master",
- };
- snprintf(name, sizeof(name), "branch%04d", i);
-
- err = reftable_stack_add(st1, &write_test_ref, &ref);
- EXPECT_ERR(err);
- }
+ write_n_ref_tables(st1, &opts, 3);
err = reftable_new_stack(&st2, dir, &opts);
EXPECT_ERR(err);
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 3/9] reftable/stack: test compaction with already-locked tables
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 1/9] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 2/9] reftable/stack: extract function to setup stack with N tables Patrick Steinhardt
@ 2024-08-05 13:07 ` Patrick Steinhardt
2024-08-08 10:46 ` Karthik Nayak
2024-08-05 13:08 ` [PATCH v2 4/9] reftable/stack: update stats on failed full compaction Patrick Steinhardt
` (6 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:07 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 3862 bytes --]
We're lacking test coverage for compacting tables when some of the
tables that we are about to compact are locked. Add two tests that
exercise this, one for auto-compaction and one for full compaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack_test.c | 77 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 61dddf7f69..46d376026b 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -887,6 +887,45 @@ static void test_reftable_stack_auto_compaction(void)
clear_dir(dir);
}
+static void test_reftable_stack_auto_compaction_with_locked_tables(void)
+{
+ struct reftable_write_options opts = {
+ .disable_auto_compact = 1,
+ };
+ struct reftable_stack *st = NULL;
+ struct strbuf buf = STRBUF_INIT;
+ char *dir = get_tmp_dir(__LINE__);
+ int err;
+
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
+ write_n_ref_tables(st, &opts, 5);
+ EXPECT(st->merged->stack_len == 5);
+
+ /*
+ * Given that all tables we have written should be roughly the same
+ * size, we expect that auto-compaction will want to compact all of the
+ * tables. Locking any of the tables will keep it from doing so.
+ */
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[2]->name);
+ write_file_buf(buf.buf, "", 0);
+
+ /*
+ * Ideally, we'd handle the situation where any of the tables is locked
+ * gracefully. We don't (yet) do this though and thus fail.
+ */
+ err = reftable_stack_auto_compact(st);
+ EXPECT(err == REFTABLE_LOCK_ERROR);
+ EXPECT(st->stats.failures == 1);
+ EXPECT(st->merged->stack_len == 5);
+
+ reftable_stack_destroy(st);
+ strbuf_release(&buf);
+ clear_dir(dir);
+}
+
static void test_reftable_stack_add_performs_auto_compaction(void)
{
struct reftable_write_options opts = { 0 };
@@ -935,6 +974,42 @@ static void test_reftable_stack_add_performs_auto_compaction(void)
clear_dir(dir);
}
+static void test_reftable_stack_compaction_with_locked_tables(void)
+{
+ struct reftable_write_options opts = {
+ .disable_auto_compact = 1,
+ };
+ struct reftable_stack *st = NULL;
+ struct strbuf buf = STRBUF_INIT;
+ char *dir = get_tmp_dir(__LINE__);
+ int err;
+
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
+ write_n_ref_tables(st, &opts, 3);
+ EXPECT(st->merged->stack_len == 3);
+
+ /* Lock one of the tables that we're about to compact. */
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[1]->name);
+ write_file_buf(buf.buf, "", 0);
+
+ /*
+ * Compaction is expected to fail given that we were not able to
+ * compact all tables.
+ */
+ err = reftable_stack_compact_all(st, NULL);
+ EXPECT(err == REFTABLE_LOCK_ERROR);
+ /* TODO: this is wrong, we should get notified about the failure. */
+ EXPECT(st->stats.failures == 0);
+ EXPECT(st->merged->stack_len == 3);
+
+ reftable_stack_destroy(st);
+ strbuf_release(&buf);
+ clear_dir(dir);
+}
+
static void test_reftable_stack_compaction_concurrent(void)
{
struct reftable_write_options opts = { 0 };
@@ -1012,9 +1087,11 @@ int stack_test_main(int argc, const char *argv[])
RUN_TEST(test_reftable_stack_add);
RUN_TEST(test_reftable_stack_add_one);
RUN_TEST(test_reftable_stack_auto_compaction);
+ RUN_TEST(test_reftable_stack_auto_compaction_with_locked_tables);
RUN_TEST(test_reftable_stack_add_performs_auto_compaction);
RUN_TEST(test_reftable_stack_compaction_concurrent);
RUN_TEST(test_reftable_stack_compaction_concurrent_clean);
+ RUN_TEST(test_reftable_stack_compaction_with_locked_tables);
RUN_TEST(test_reftable_stack_hash_id);
RUN_TEST(test_reftable_stack_lock_failure);
RUN_TEST(test_reftable_stack_log_normalize);
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 4/9] reftable/stack: update stats on failed full compaction
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
` (2 preceding siblings ...)
2024-08-05 13:07 ` [PATCH v2 3/9] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
@ 2024-08-05 13:08 ` Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 5/9] reftable/stack: simplify tracking of table locks Patrick Steinhardt
` (5 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:08 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 2104 bytes --]
When auto-compaction fails due to a locking error, we update the
statistics to indicate this failure. We're not doing the same when
performing a full compaction.
Fix this inconsistency by using `stack_compact_range_stats()`, which
handles the stat update for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 14 +++++++-------
reftable/stack_test.c | 3 +--
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index ba8234b486..e5959d2c76 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1205,13 +1205,6 @@ static int stack_compact_range(struct reftable_stack *st,
return err;
}
-int reftable_stack_compact_all(struct reftable_stack *st,
- struct reftable_log_expiry_config *config)
-{
- return stack_compact_range(st, 0, st->merged->stack_len ?
- st->merged->stack_len - 1 : 0, config);
-}
-
static int stack_compact_range_stats(struct reftable_stack *st,
size_t first, size_t last,
struct reftable_log_expiry_config *config)
@@ -1222,6 +1215,13 @@ static int stack_compact_range_stats(struct reftable_stack *st,
return err;
}
+int reftable_stack_compact_all(struct reftable_stack *st,
+ struct reftable_log_expiry_config *config)
+{
+ size_t last = st->merged->stack_len ? st->merged->stack_len - 1 : 0;
+ return stack_compact_range_stats(st, 0, last, config);
+}
+
static int segment_size(struct segment *s)
{
return s->end - s->start;
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 46d376026b..2b1eb83934 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -1001,8 +1001,7 @@ static void test_reftable_stack_compaction_with_locked_tables(void)
*/
err = reftable_stack_compact_all(st, NULL);
EXPECT(err == REFTABLE_LOCK_ERROR);
- /* TODO: this is wrong, we should get notified about the failure. */
- EXPECT(st->stats.failures == 0);
+ EXPECT(st->stats.failures == 1);
EXPECT(st->merged->stack_len == 3);
reftable_stack_destroy(st);
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 5/9] reftable/stack: simplify tracking of table locks
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
` (3 preceding siblings ...)
2024-08-05 13:08 ` [PATCH v2 4/9] reftable/stack: update stats on failed full compaction Patrick Steinhardt
@ 2024-08-05 13:08 ` Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 6/9] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
` (4 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:08 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 3041 bytes --]
When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact
all tables in the user-provided range or none, we simply track those
locks via the indices of the respective tables in the merged stack.
This is about to change though, as we will introduce a mode where auto
compaction gracefully handles the case of already-locked files. It may
then happen that we only compact a subset of the user-supplied range of
tables, in which case the indices will not necessarily match the lock
indices anymore.
Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to perform the same, but will
make it easier to perform the described change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index e5959d2c76..07e7ffc6b9 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1016,7 +1016,7 @@ static int stack_compact_range(struct reftable_stack *st,
struct lock_file *table_locks = NULL;
struct tempfile *new_table = NULL;
int is_empty_table = 0, err = 0;
- size_t i;
+ size_t i, nlocks = 0;
if (first > last || (!expiry && first == last)) {
err = 0;
@@ -1051,7 +1051,7 @@ static int stack_compact_range(struct reftable_stack *st,
for (i = first; i <= last; i++) {
stack_filename(&table_name, st, reader_name(st->readers[i]));
- err = hold_lock_file_for_update(&table_locks[i - first],
+ err = hold_lock_file_for_update(&table_locks[nlocks],
table_name.buf, LOCK_NO_DEREF);
if (err < 0) {
if (errno == EEXIST)
@@ -1066,7 +1066,7 @@ static int stack_compact_range(struct reftable_stack *st,
* run into file descriptor exhaustion when we compress a lot
* of tables.
*/
- err = close_lock_file_gently(&table_locks[i - first]);
+ err = close_lock_file_gently(&table_locks[nlocks++]);
if (err < 0) {
err = REFTABLE_IO_ERROR;
goto done;
@@ -1183,8 +1183,8 @@ static int stack_compact_range(struct reftable_stack *st,
* Delete the old tables. They may still be in use by concurrent
* readers, so it is expected that unlinking tables may fail.
*/
- for (i = first; i <= last; i++) {
- struct lock_file *table_lock = &table_locks[i - first];
+ for (i = 0; i < nlocks; i++) {
+ struct lock_file *table_lock = &table_locks[i];
char *table_path = get_locked_file_path(table_lock);
unlink(table_path);
free(table_path);
@@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
done:
rollback_lock_file(&tables_list_lock);
- for (i = first; table_locks && i <= last; i++)
- rollback_lock_file(&table_locks[i - first]);
+ for (i = 0; table_locks && i < nlocks; i++)
+ rollback_lock_file(&table_locks[i]);
reftable_free(table_locks);
delete_tempfile(&new_table);
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 6/9] reftable/stack: do not die when fsyncing lock file files
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
` (4 preceding siblings ...)
2024-08-05 13:08 ` [PATCH v2 5/9] reftable/stack: simplify tracking of table locks Patrick Steinhardt
@ 2024-08-05 13:08 ` Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 7/9] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
` (3 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:08 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]
We use `fsync_component_or_die()` when committing an addition to the
"tables.list" lock file, which unsurprisingly dies in case the fsync
fails. Given that this is part of the reftable library, we should never
die and instead let callers handle the error.
Adapt accordingly and use `fsync_component()` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 07e7ffc6b9..9ca549294f 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -674,8 +674,11 @@ int reftable_addition_commit(struct reftable_addition *add)
goto done;
}
- fsync_component_or_die(FSYNC_COMPONENT_REFERENCE, lock_file_fd,
- get_tempfile_path(add->lock_file));
+ err = fsync_component(FSYNC_COMPONENT_REFERENCE, lock_file_fd);
+ if (err < 0) {
+ err = REFTABLE_IO_ERROR;
+ goto done;
+ }
err = rename_tempfile(&add->lock_file, add->stack->list_file);
if (err < 0) {
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 7/9] reftable/stack: use lock_file when adding table to "tables.list"
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
` (5 preceding siblings ...)
2024-08-05 13:08 ` [PATCH v2 6/9] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
@ 2024-08-05 13:08 ` Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
` (2 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:08 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 2848 bytes --]
When modifying "tables.list", we need to lock the list before updating
it to ensure that no concurrent writers modify the list at the same
point in time. While we do this via the `lock_file` subsystem when
compacting the stack, we manually handle the lock when adding a new
table to it. While not wrong, it is at least inconsistent.
Refactor the code to consistently lock "tables.list" via the `lock_file`
subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 9ca549294f..54982e0f7d 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -567,7 +567,7 @@ static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
}
struct reftable_addition {
- struct tempfile *lock_file;
+ struct lock_file tables_list_lock;
struct reftable_stack *stack;
char **new_tables;
@@ -581,13 +581,13 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
struct reftable_stack *st)
{
struct strbuf lock_file_name = STRBUF_INIT;
- int err = 0;
- add->stack = st;
+ int err;
- strbuf_addf(&lock_file_name, "%s.lock", st->list_file);
+ add->stack = st;
- add->lock_file = create_tempfile(lock_file_name.buf);
- if (!add->lock_file) {
+ err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
+ LOCK_NO_DEREF);
+ if (err < 0) {
if (errno == EEXIST) {
err = REFTABLE_LOCK_ERROR;
} else {
@@ -596,7 +596,8 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
goto done;
}
if (st->opts.default_permissions) {
- if (chmod(add->lock_file->filename.buf, st->opts.default_permissions) < 0) {
+ if (chmod(get_lock_file_path(&add->tables_list_lock),
+ st->opts.default_permissions) < 0) {
err = REFTABLE_IO_ERROR;
goto done;
}
@@ -635,7 +636,7 @@ static void reftable_addition_close(struct reftable_addition *add)
add->new_tables_len = 0;
add->new_tables_cap = 0;
- delete_tempfile(&add->lock_file);
+ rollback_lock_file(&add->tables_list_lock);
strbuf_release(&nm);
}
@@ -651,7 +652,7 @@ void reftable_addition_destroy(struct reftable_addition *add)
int reftable_addition_commit(struct reftable_addition *add)
{
struct strbuf table_list = STRBUF_INIT;
- int lock_file_fd = get_tempfile_fd(add->lock_file);
+ int lock_file_fd = get_lock_file_fd(&add->tables_list_lock);
int err = 0;
size_t i;
@@ -680,7 +681,7 @@ int reftable_addition_commit(struct reftable_addition *add)
goto done;
}
- err = rename_tempfile(&add->lock_file, add->stack->list_file);
+ err = commit_lock_file(&add->tables_list_lock);
if (err < 0) {
err = REFTABLE_IO_ERROR;
goto done;
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
` (6 preceding siblings ...)
2024-08-05 13:08 ` [PATCH v2 7/9] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
@ 2024-08-05 13:08 ` Patrick Steinhardt
2024-08-08 12:14 ` Karthik Nayak
2024-08-05 13:08 ` [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
2024-08-08 12:26 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Karthik Nayak
9 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:08 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 7157 bytes --]
The locking employed by compaction uses the following schema:
1. Lock "tables.list" and verify that it matches the version we have
loaded in core.
2. Lock each of the tables in the user-supplied range of tables that
we are supposed to compact. These locks prohibit any concurrent
process to compact those tables while we are doing that.
3. Unlock "tables.list". This enables concurrent processes to add new
tables to the stack, but also allows them to compact tables outside
of the range of tables that we have locked.
4. Perform the compaction.
5. Lock "tables.list" again.
6. Move the compacted table into place.
7. Write the new order of tables, including the compacted table, into
the lockfile.
8. Commit the lockfile into place.
Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
are compacting a lot of very large tables.
But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjoint
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.
Right now, we do not check whether the "tables.list" has been changed
after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without respecting what the other process has written. This
scenario would then lead to data loss and corruption.
This can even happen in the simpler case of one compacting process and
one writing process. The table newly appended by the writing process
would get discarded by the compacting process because it never sees the
new table.
Fix this bug by re-checking whether our stack is still up to date after
locking for the second time. If it isn't, then we adjust the indices of
tables to replace in the updated stack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 107 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 102 insertions(+), 5 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 54982e0f7d..51eb4470c7 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1020,7 +1020,9 @@ static int stack_compact_range(struct reftable_stack *st,
struct lock_file *table_locks = NULL;
struct tempfile *new_table = NULL;
int is_empty_table = 0, err = 0;
+ size_t first_to_replace, last_to_replace;
size_t i, nlocks = 0;
+ char **names = NULL;
if (first > last || (!expiry && first == last)) {
err = 0;
@@ -1123,6 +1125,100 @@ static int stack_compact_range(struct reftable_stack *st,
}
}
+ /*
+ * As we have unlocked the stack while compacting our slice of tables
+ * it may have happened that a concurrently running process has updated
+ * the stack while we were compacting. In that case, we need to check
+ * whether the tables that we have just compacted still exist in the
+ * stack in the exact same order as we have compacted them.
+ *
+ * If they do exist, then it is fine to continue and replace those
+ * tables with our compacted version. If they don't, then we need to
+ * abort.
+ */
+ err = stack_uptodate(st);
+ if (err < 0)
+ goto done;
+ if (err > 0) {
+ ssize_t new_offset = -1;
+ int fd;
+
+ fd = open(st->list_file, O_RDONLY);
+ if (fd < 0) {
+ err = REFTABLE_IO_ERROR;
+ goto done;
+ }
+
+ err = fd_read_lines(fd, &names);
+ close(fd);
+ if (err < 0)
+ goto done;
+
+ /*
+ * Search for the offset of the first table that we have
+ * compacted in the updated "tables.list" file.
+ */
+ for (size_t i = 0; names[i]; i++) {
+ if (strcmp(names[i], st->readers[first]->name))
+ continue;
+
+ /*
+ * We have found the first entry. Verify that all the
+ * subsequent tables we have compacted still exist in
+ * the modified stack in the exact same order as we
+ * have compacted them.
+ */
+ for (size_t j = 1; j < last - first + 1; j++) {
+ const char *old = first + j < st->merged->stack_len ?
+ st->readers[first + j]->name : NULL;
+ const char *new = names[i + j];
+
+ /*
+ * If some entries are missing or in case the tables
+ * have changed then we need to bail out. Again, this
+ * shouldn't ever happen because we have locked the
+ * tables we are compacting.
+ */
+ if (!old || !new || strcmp(old, new)) {
+ err = REFTABLE_OUTDATED_ERROR;
+ goto done;
+ }
+ }
+
+ new_offset = i;
+ break;
+ }
+
+ /*
+ * In case we didn't find our compacted tables in the stack we
+ * need to bail out. In theory, this should have never happened
+ * because we locked the tables we are compacting.
+ */
+ if (new_offset < 0) {
+ err = REFTABLE_OUTDATED_ERROR;
+ goto done;
+ }
+
+ /*
+ * We have found the new range that we want to replace, so
+ * let's update the range of tables that we want to replace.
+ */
+ last_to_replace = last + (new_offset - first);
+ first_to_replace = new_offset;
+ } else {
+ /*
+ * `fd_read_lines()` uses a `NULL` sentinel to indicate that
+ * the array is at its end. As we use `free_names()` to free
+ * the array, we need to include this sentinel value here and
+ * thus have to allocate `stack_len + 1` many entries.
+ */
+ REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
+ for (size_t i = 0; i < st->merged->stack_len; i++)
+ names[i] = xstrdup(st->readers[i]->name);
+ last_to_replace = last;
+ first_to_replace = first;
+ }
+
/*
* If the resulting compacted table is not empty, then we need to move
* it into place now.
@@ -1145,12 +1241,12 @@ static int stack_compact_range(struct reftable_stack *st,
* have just written. In case the compacted table became empty we
* simply skip writing it.
*/
- for (i = 0; i < first; i++)
- strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
+ for (i = 0; i < first_to_replace; i++)
+ strbuf_addf(&tables_list_buf, "%s\n", names[i]);
if (!is_empty_table)
strbuf_addf(&tables_list_buf, "%s\n", new_table_name.buf);
- for (i = last + 1; i < st->merged->stack_len; i++)
- strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
+ for (i = last_to_replace + 1; names[i]; i++)
+ strbuf_addf(&tables_list_buf, "%s\n", names[i]);
err = write_in_full(get_lock_file_fd(&tables_list_lock),
tables_list_buf.buf, tables_list_buf.len);
@@ -1203,9 +1299,10 @@ static int stack_compact_range(struct reftable_stack *st,
delete_tempfile(&new_table);
strbuf_release(&new_table_name);
strbuf_release(&new_table_path);
-
strbuf_release(&tables_list_buf);
strbuf_release(&table_name);
+ free_names(names);
+
return err;
}
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
` (7 preceding siblings ...)
2024-08-05 13:08 ` [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
@ 2024-08-05 13:08 ` Patrick Steinhardt
2024-08-06 18:46 ` Justin Tobler
2024-08-08 12:25 ` Karthik Nayak
2024-08-08 12:26 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Karthik Nayak
9 siblings, 2 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-05 13:08 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 8750 bytes --]
When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.
But when performing auto-compaction it isn't necessarily in our best
interest to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.
Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is locked already, we abort. If we succeded to lock two or
more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.
This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.
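To illustrate with a hypothetical slice of tables [T1, T2, T3, T4] where a
concurrent process still holds "T2.lock": we lock T4, then T3, then fail to
lock T2. As we managed to lock at least two tables, we shrink the slice to
[T3, T4] and compact just those. Had T4 itself been locked we could not even
have locked two tables, so we would return a lock error as before.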
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 59 +++++++++++++++++++++++++++++++-------
reftable/stack_test.c | 12 ++++----
t/t0610-reftable-basics.sh | 21 +++++++++-----
3 files changed, 70 insertions(+), 22 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 51eb4470c7..b0721640e4 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -999,6 +999,15 @@ static int stack_write_compact(struct reftable_stack *st,
return err;
}
+enum stack_compact_range_flags {
+ /*
+ * Perform a best-effort compaction. That is, even if we cannot lock
+ * all tables in the specified range, we will try to compact the
+ * remaining slice.
+ */
+ STACK_COMPACT_RANGE_BEST_EFFORT = (1 << 0),
+};
+
/*
* Compact all tables in the range `[first, last)` into a single new table.
*
@@ -1010,7 +1019,8 @@ static int stack_write_compact(struct reftable_stack *st,
*/
static int stack_compact_range(struct reftable_stack *st,
size_t first, size_t last,
- struct reftable_log_expiry_config *expiry)
+ struct reftable_log_expiry_config *expiry,
+ unsigned int flags)
{
struct strbuf tables_list_buf = STRBUF_INIT;
struct strbuf new_table_name = STRBUF_INIT;
@@ -1052,19 +1062,47 @@ static int stack_compact_range(struct reftable_stack *st,
/*
* Lock all tables in the user-provided range. This is the slice of our
* stack which we'll compact.
+ *
+ * Note that we lock tables in reverse order from last to first. The
+ * intent behind this is to allow a newer process to perform best
+ * effort compaction of tables that it has added in the case where an
+ * older process is still busy compacting tables which are preexisting
+ * from the point of view of the newer process.
*/
REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
- for (i = first; i <= last; i++) {
- stack_filename(&table_name, st, reader_name(st->readers[i]));
+ for (i = last + 1; i > first; i--) {
+ stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
err = hold_lock_file_for_update(&table_locks[nlocks],
table_name.buf, LOCK_NO_DEREF);
if (err < 0) {
- if (errno == EEXIST)
+ /*
+ * When the table is locked already we may do a
+ * best-effort compaction and compact only the tables
+ * that we have managed to lock so far. This of course
+ * requires that we have been able to lock at least two
+ * tables, otherwise there would be nothing to compact.
+ * In that case, we return a lock error to our caller.
+ */
+ if (errno == EEXIST && last - (i - 1) >= 2 &&
+ flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
+ err = 0;
+ /*
+ * The subtraction is to offset the index, the
+ * addition is to only compact up to the table
+ * of the preceding iteration. They obviously
+ * cancel each other out, but that may be
+ * non-obvious when it was omitted.
+ */
+ first = (i - 1) + 1;
+ break;
+ } else if (errno == EEXIST) {
err = REFTABLE_LOCK_ERROR;
- else
+ goto done;
+ } else {
err = REFTABLE_IO_ERROR;
- goto done;
+ goto done;
+ }
}
/*
@@ -1308,9 +1346,10 @@ static int stack_compact_range(struct reftable_stack *st,
static int stack_compact_range_stats(struct reftable_stack *st,
size_t first, size_t last,
- struct reftable_log_expiry_config *config)
+ struct reftable_log_expiry_config *config,
+ unsigned int flags)
{
- int err = stack_compact_range(st, first, last, config);
+ int err = stack_compact_range(st, first, last, config, flags);
if (err == REFTABLE_LOCK_ERROR)
st->stats.failures++;
return err;
@@ -1320,7 +1359,7 @@ int reftable_stack_compact_all(struct reftable_stack *st,
struct reftable_log_expiry_config *config)
{
size_t last = st->merged->stack_len ? st->merged->stack_len - 1 : 0;
- return stack_compact_range_stats(st, 0, last, config);
+ return stack_compact_range_stats(st, 0, last, config, 0);
}
static int segment_size(struct segment *s)
@@ -1427,7 +1466,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
reftable_free(sizes);
if (segment_size(&seg) > 0)
return stack_compact_range_stats(st, seg.start, seg.end - 1,
- NULL);
+ NULL, STACK_COMPACT_RANGE_BEST_EFFORT);
return 0;
}
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 2b1eb83934..733cf6113e 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -913,13 +913,15 @@ static void test_reftable_stack_auto_compaction_with_locked_tables(void)
write_file_buf(buf.buf, "", 0);
/*
- * Ideally, we'd handle the situation where any of the tables is locked
- * gracefully. We don't (yet) do this though and thus fail.
+ * When parts of the stack are locked, then auto-compaction does a best
+ * effort compaction of those tables which aren't locked. So while this
+ * would in theory compact all tables, due to the preexisting lock we
+ * only compact the newest two tables.
*/
err = reftable_stack_auto_compact(st);
- EXPECT(err == REFTABLE_LOCK_ERROR);
- EXPECT(st->stats.failures == 1);
- EXPECT(st->merged->stack_len == 5);
+ EXPECT_ERR(err);
+ EXPECT(st->stats.failures == 0);
+ EXPECT(st->merged->stack_len == 4);
reftable_stack_destroy(st);
strbuf_release(&buf);
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index b06c46999d..37510c2b2a 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -478,19 +478,26 @@ test_expect_success "$command: auto compaction" '
test_oid blob17_2 | git hash-object -w --stdin &&
- # Lock all tables write some refs. Auto-compaction will be
- # unable to compact tables and thus fails gracefully, leaving
- # the stack in a sub-optimal state.
- ls .git/reftable/*.ref |
+ # Lock all tables, write some refs. Auto-compaction will be
+ # unable to compact tables and thus fails gracefully,
+ # compacting only those tables which are not locked.
+ ls .git/reftable/*.ref | sort |
while read table
do
- touch "$table.lock" || exit 1
+ touch "$table.lock" &&
+ basename "$table" >>tables.expect || exit 1
done &&
+ test_line_count = 2 .git/reftable/tables.list &&
git branch B &&
git branch C &&
- rm .git/reftable/*.lock &&
- test_line_count = 4 .git/reftable/tables.list &&
+ # The new tables are auto-compacted, but the locked tables are
+ # left intact.
+ test_line_count = 3 .git/reftable/tables.list &&
+ head -n 2 .git/reftable/tables.list >tables.head &&
+ test_cmp tables.expect tables.head &&
+
+ rm .git/reftable/*.lock &&
git $command --auto &&
test_line_count = 1 .git/reftable/tables.list
)
--
2.46.0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction
2024-08-05 13:08 ` [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
@ 2024-08-06 18:46 ` Justin Tobler
2024-08-07 5:31 ` Patrick Steinhardt
2024-08-08 12:25 ` Karthik Nayak
1 sibling, 1 reply; 50+ messages in thread
From: Justin Tobler @ 2024-08-06 18:46 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano
On 24/08/05 03:08PM, Patrick Steinhardt wrote:
> When compacting tables, it may happen that we want to compact a set of
> tables which are already locked by a concurrent process that compacts
> them. In the case where we wanted to perform a full compaction of all
> tables it is sensible to bail out in this case, as we cannot fulfill the
> requested action.
>
> But when performing auto-compaction it isn't necessarily in our best
> interest to abort the whole operation. For example, due to the
> geometric compacting schema that we use, it may be that process A takes
> a lot of time to compact the bulk of all tables whereas process B
> appends a bunch of new tables to the stack. B would in this case also
> notice that it has to compact the tables that process A is compacting
> already and thus also try to compact the same range, probably including
> the new tables it has appended. But because those tables are locked
> already, it will fail and thus abort the complete auto-compaction. The
> consequence is that the stack will grow longer and longer while A isn't
> yet done with compaction, which will lead to a growing performance
> impact.
With this change, a concurrent compaction may create tables that violate
the geometric sequence post-compaction, but that is already possible because
compaction doesn't take into account newly appended tables. It makes
sense to compact available tables to minimize performance impact.
> Instead of aborting auto-compaction altogether, let's gracefully handle
> this situation by instead compacting tables which aren't locked. To do
> so, instead of locking from the beginning of the slice-to-be-compacted,
> we start locking tables from the end of the slice. Once we hit the first
> table that is locked already, we abort. If we succeded to lock two or
> more tables, then we simply reduce the slice of tables that we're about
> to compact to those which we managed to lock.
>
> This ensures that we can at least make some progress for compaction in
> said scenario. It also helps in other scenarios, like for example when a
> process died and left a stale lockfile behind. In such a case we can at
> least ensure some compaction on a best-effort basis.
At first I wondered whether best-effort compaction could further mask a stale
lockfile issue, but auto-compaction already attempted compaction gently
and did not report issues to begin with. So this is just a win. :)
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> reftable/stack.c | 59 +++++++++++++++++++++++++++++++-------
> reftable/stack_test.c | 12 ++++----
> t/t0610-reftable-basics.sh | 21 +++++++++-----
> 3 files changed, 70 insertions(+), 22 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 51eb4470c7..b0721640e4 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -999,6 +999,15 @@ static int stack_write_compact(struct reftable_stack *st,
> return err;
> }
>
> +enum stack_compact_range_flags {
> + /*
> + * Perform a best-effort compaction. That is, even if we cannot lock
> + * all tables in the specified range, we will try to compact the
> + * remaining slice.
> + */
> + STACK_COMPACT_RANGE_BEST_EFFORT = (1 << 0),
> +};
> +
> /*
> * Compact all tables in the range `[first, last)` into a single new table.
> *
> @@ -1010,7 +1019,8 @@ static int stack_write_compact(struct reftable_stack *st,
> */
> static int stack_compact_range(struct reftable_stack *st,
> size_t first, size_t last,
> - struct reftable_log_expiry_config *expiry)
> + struct reftable_log_expiry_config *expiry,
> + unsigned int flags)
> {
> struct strbuf tables_list_buf = STRBUF_INIT;
> struct strbuf new_table_name = STRBUF_INIT;
> @@ -1052,19 +1062,47 @@ static int stack_compact_range(struct reftable_stack *st,
> /*
> * Lock all tables in the user-provided range. This is the slice of our
> * stack which we'll compact.
> + *
> + * Note that we lock tables in reverse order from last to first. The
> + * intent behind this is to allow a newer process to perform best
> + * effort compaction of tables that it has added in the case where an
> + * older process is still busy compacting tables which are preexisting
> + * from the point of view of the newer process.
> */
Now that concurrent compaction of discrete table segments is possible,
locking the tables in reverse order allows a compaction process to
immediately mark the ending table of the segment and thus avoid
unnecessary competition with other concurrent compaction processes.
> REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
> - for (i = first; i <= last; i++) {
> - stack_filename(&table_name, st, reader_name(st->readers[i]));
> + for (i = last + 1; i > first; i--) {
> + stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
I might be missing something, but why not set `i = last` and `i >=
first`? It looks like everywhere we reference `i` we subtract one
anyways. Since `last` is already at the starting index, it seems it
would be a bit more straightforward.
>
> err = hold_lock_file_for_update(&table_locks[nlocks],
> table_name.buf, LOCK_NO_DEREF);
> if (err < 0) {
> - if (errno == EEXIST)
> + /*
> + * When the table is locked already we may do a
> + * best-effort compaction and compact only the tables
> + * that we have managed to lock so far. This of course
> + * requires that we have been able to lock at least two
> + * tables, otherwise there would be nothing to compact.
> + * In that case, we return a lock error to our caller.
> + */
> + if (errno == EEXIST && last - (i - 1) >= 2 &&
> + flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
> + err = 0;
> + /*
> + * The subtraction is to offset the index, the
> + * addition is to only compact up to the table
> + * of the preceding iteration. They obviously
> + * cancel each other out, but that may be
> + * non-obvious when it was omitted.
> + */
> + first = (i - 1) + 1;
I think we could solve potential confusion here by changing how `i` is
initially set also.
> + break;
> + } else if (errno == EEXIST) {
> err = REFTABLE_LOCK_ERROR;
> - else
> + goto done;
> + } else {
> err = REFTABLE_IO_ERROR;
> - goto done;
> + goto done;
> + }
> }
>
> /*
> @@ -1308,9 +1346,10 @@ static int stack_compact_range(struct reftable_stack *st,
>
> static int stack_compact_range_stats(struct reftable_stack *st,
> size_t first, size_t last,
> - struct reftable_log_expiry_config *config)
> + struct reftable_log_expiry_config *config,
> + unsigned int flags)
> {
> - int err = stack_compact_range(st, first, last, config);
> + int err = stack_compact_range(st, first, last, config, flags);
> if (err == REFTABLE_LOCK_ERROR)
> st->stats.failures++;
> return err;
> @@ -1320,7 +1359,7 @@ int reftable_stack_compact_all(struct reftable_stack *st,
> struct reftable_log_expiry_config *config)
> {
> size_t last = st->merged->stack_len ? st->merged->stack_len - 1 : 0;
> - return stack_compact_range_stats(st, 0, last, config);
> + return stack_compact_range_stats(st, 0, last, config, 0);
> }
>
> static int segment_size(struct segment *s)
> @@ -1427,7 +1466,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
> reftable_free(sizes);
> if (segment_size(&seg) > 0)
> return stack_compact_range_stats(st, seg.start, seg.end - 1,
> - NULL);
> + NULL, STACK_COMPACT_RANGE_BEST_EFFORT);
>
> return 0;
> }
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 2b1eb83934..733cf6113e 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -913,13 +913,15 @@ static void test_reftable_stack_auto_compaction_with_locked_tables(void)
> write_file_buf(buf.buf, "", 0);
>
> /*
> - * Ideally, we'd handle the situation where any of the tables is locked
> - * gracefully. We don't (yet) do this though and thus fail.
> + * When parts of the stack are locked, then auto-compaction does a best
> + * effort compaction of those tables which aren't locked. So while this
> + * would in theory compact all tables, due to the preexisting lock we
> + * only compact the newest two tables.
> */
> err = reftable_stack_auto_compact(st);
> - EXPECT(err == REFTABLE_LOCK_ERROR);
> - EXPECT(st->stats.failures == 1);
> - EXPECT(st->merged->stack_len == 5);
> + EXPECT_ERR(err);
> + EXPECT(st->stats.failures == 0);
> + EXPECT(st->merged->stack_len == 4);
>
> reftable_stack_destroy(st);
> strbuf_release(&buf);
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index b06c46999d..37510c2b2a 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -478,19 +478,26 @@ test_expect_success "$command: auto compaction" '
>
> test_oid blob17_2 | git hash-object -w --stdin &&
>
> - # Lock all tables write some refs. Auto-compaction will be
> - # unable to compact tables and thus fails gracefully, leaving
> - # the stack in a sub-optimal state.
> - ls .git/reftable/*.ref |
> + # Lock all tables, write some refs. Auto-compaction will be
> + # unable to compact tables and thus fails gracefully,
> + # compacting only those tables which are not locked.
> + ls .git/reftable/*.ref | sort |
> while read table
> do
> - touch "$table.lock" || exit 1
> + touch "$table.lock" &&
> + basename "$table" >>tables.expect || exit 1
> done &&
> + test_line_count = 2 .git/reftable/tables.list &&
> git branch B &&
> git branch C &&
> - rm .git/reftable/*.lock &&
> - test_line_count = 4 .git/reftable/tables.list &&
>
> + # The new tables are auto-compacted, but the locked tables are
> + # left intact.
> + test_line_count = 3 .git/reftable/tables.list &&
> + head -n 2 .git/reftable/tables.list >tables.head &&
> + test_cmp tables.expect tables.head &&
> +
> + rm .git/reftable/*.lock &&
> git $command --auto &&
> test_line_count = 1 .git/reftable/tables.list
> )
> --
> 2.46.0.dirty
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction
2024-08-06 18:46 ` Justin Tobler
@ 2024-08-07 5:31 ` Patrick Steinhardt
2024-08-07 19:12 ` Justin Tobler
0 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-07 5:31 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 917 bytes --]
On Tue, Aug 06, 2024 at 01:46:35PM -0500, Justin Tobler wrote:
> On 24/08/05 03:08PM, Patrick Steinhardt wrote:
> > REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
> > - for (i = first; i <= last; i++) {
> > - stack_filename(&table_name, st, reader_name(st->readers[i]));
> > + for (i = last + 1; i > first; i--) {
> > + stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
>
> I might be missing something, but why not set `i = last` and `i >=
> first`? It looks like everywhere we reference `i` we subtract one
> anyways. Since `last` is already at the starting index, it seems it
> would be a bit more straightforward.
You are missing the case where `first == 0`. With `first = 0`, `i >=
first` would be truish when `i == 0` and thus we would continue to loop.
We then execute `i--`, wrap around, and still have `i >= first`.
Thus, an endless loop is born :)
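A tiny standalone sketch of that wraparound (not part of the patch, just to
demonstrate the unsigned underflow):

#include <stdio.h>

int main(void)
{
	size_t first = 0;

	/*
	 * With an unsigned index, `i >= first` is always true once
	 * `first == 0`: decrementing `i` past zero wraps around to
	 * SIZE_MAX, so this loop never terminates.
	 */
	for (size_t i = 5; i >= first; i--)
		printf("%zu\n", i);

	return 0;
}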
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction
2024-08-07 5:31 ` Patrick Steinhardt
@ 2024-08-07 19:12 ` Justin Tobler
0 siblings, 0 replies; 50+ messages in thread
From: Justin Tobler @ 2024-08-07 19:12 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano
On 24/08/07 07:31AM, Patrick Steinhardt wrote:
> On Tue, Aug 06, 2024 at 01:46:35PM -0500, Justin Tobler wrote:
> > On 24/08/05 03:08PM, Patrick Steinhardt wrote:
> > > REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
> > > - for (i = first; i <= last; i++) {
> > > - stack_filename(&table_name, st, reader_name(st->readers[i]));
> > > + for (i = last + 1; i > first; i--) {
> > > + stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
> >
> > I might be missing something, but why not set `i = last` and `i >=
> > first`? It looks like everywhere we reference `i` we subtract one
> > anyways. Since `last` is already at the starting index, it seems it
> > would be a bit more straightforward.
>
> You are missing the case where `first == 0`. With `first = 0`, `i >=
> first` would be truish when `i == 0` and thus we would continue to loop.
> We then execute `i--`, wrap around, and still have `i >= first`.
>
> Thus, an endless loop is born :)
Ah, thanks for explaining! This version of the patch series looks good
to me.
-Justin
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/9] reftable/stack: test compaction with already-locked tables
2024-08-05 13:07 ` [PATCH v2 3/9] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
@ 2024-08-08 10:46 ` Karthik Nayak
0 siblings, 0 replies; 50+ messages in thread
From: Karthik Nayak @ 2024-08-08 10:46 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> We're lacking test coverage for compacting tables when some of the
> tables that we are about to compact are locked. Add two tests that
> exercise this, one for auto-compaction and one for full compaction.
>
So this patch prepares for the upcoming fixes by adding tests in which
compaction fails. Makes sense.
[snip]
> +static void test_reftable_stack_compaction_with_locked_tables(void)
> +{
> + struct reftable_write_options opts = {
> + .disable_auto_compact = 1,
> + };
> + struct reftable_stack *st = NULL;
> + struct strbuf buf = STRBUF_INIT;
> + char *dir = get_tmp_dir(__LINE__);
> + int err;
> +
> + err = reftable_new_stack(&st, dir, &opts);
> + EXPECT_ERR(err);
> +
> + write_n_ref_tables(st, &opts, 3);
> + EXPECT(st->merged->stack_len == 3);
> +
> + /* Lock one of the tables that we're about to compact. */
> + strbuf_reset(&buf);
> + strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[1]->name);
> + write_file_buf(buf.buf, "", 0);
> +
> + /*
> + * Compaction is expected to fail given that we were not able to
> + * compact all tables.
> + */
> + err = reftable_stack_compact_all(st, NULL);
> + EXPECT(err == REFTABLE_LOCK_ERROR);
> + /* TODO: this is wrong, we should get notified about the failure. */
> + EXPECT(st->stats.failures == 0);
This is a good catch. The autocompaction code has a wrapper
`stack_compact_range_stats` which handles this exact scenario.
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction
2024-08-05 13:08 ` [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
@ 2024-08-08 12:14 ` Karthik Nayak
2024-08-08 13:29 ` Patrick Steinhardt
0 siblings, 1 reply; 50+ messages in thread
From: Karthik Nayak @ 2024-08-08 12:14 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 4430 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> The locking employed by compaction uses the following schema:
>
> 1. Lock "tables.list" and verify that it matches the version we have
> loaded in core.
>
> 2. Lock each of the tables in the user-supplied range of tables that
> we are supposed to compact. These locks prohibit any concurrent
> process to compact those tables while we are doing that.
>
> 3. Unlock "tables.list". This enables concurrent processes to add new
> tables to the stack, but also allows them to compact tables outside
> of the range of tables that we have locked.
>
> 4. Perform the compaction.
>
> 5. Lock "tables.list" again.
>
> 6. Move the compacted table into place.
>
> 7. Write the new order of tables, including the compacted table, into
> the lockfile.
>
> 8. Commit the lockfile into place.
>
This summary helps a lot, thanks!
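A rough standalone sketch of that flow, where all names and the trace output
are made up purely for illustration (this is not the reftable API):

#include <stdio.h>
#include <stddef.h>

static void lock(const char *what)   { printf("lock   %s\n", what); }
static void unlock(const char *what) { printf("unlock %s\n", what); }

static void compact_range_outline(size_t first, size_t last)
{
	char name[32];

	lock("tables.list");       /* step 1: also verify it matches what we loaded */
	for (size_t i = first; i <= last; i++) {
		/* step 2: lock the slice of tables we are going to compact */
		snprintf(name, sizeof(name), "table-%zu", i);
		lock(name);
	}
	unlock("tables.list");     /* step 3: concurrent writers may proceed */

	printf("compact tables %zu..%zu\n", first, last);      /* step 4 */

	lock("tables.list");                                    /* step 5 */
	printf("move compacted table into place\n");            /* step 6 */
	printf("write new table order into the lock file\n");   /* step 7 */
	printf("commit tables.list.lock\n");                    /* step 8 */
}

int main(void)
{
	compact_range_outline(1, 3);
	return 0;
}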
[snip]
> @@ -1123,6 +1125,100 @@ static int stack_compact_range(struct reftable_stack *st,
> }
> }
>
> + /*
> + * As we have unlocked the stack while compacting our slice of tables
> + * it may have happened that a concurrently running process has updated
> + * the stack while we were compacting. In that case, we need to check
> + * whether the tables that we have just compacted still exist in the
> + * stack in the exact same order as we have compacted them.
> + *
But as per the current implementation, the tables we compacted would
always exist in tables.list, since we've obtained a lock on them.
Looking at the code below, wouldn't it be better to talk about the two
scenarios we need to handle?
1. The stack is up to date: we simply overwrite the stack with our
modified version.
2. The stack is not up to date: we need to amend the stack without
losing information. An extra check here is that we also verify that the
tables we compacted still exist. (I don't really get why they wouldn't,
though.)
> + * If they do exist, then it is fine to continue and replace those
> + * tables with our compacted version. If they don't, then we need to
> + * abort.
> + */
> + err = stack_uptodate(st);
> + if (err < 0)
> + goto done;
> + if (err > 0) {
So this is the scenario where the stack is no longer upto date.
> + ssize_t new_offset = -1;
> + int fd;
> +
> + fd = open(st->list_file, O_RDONLY);
> + if (fd < 0) {
> + err = REFTABLE_IO_ERROR;
> + goto done;
> + }
> +
> + err = fd_read_lines(fd, &names);
> + close(fd);
> + if (err < 0)
> + goto done;
> +
> + /*
> + * Search for the offset of the first table that we have
> + * compacted in the updated "tables.list" file.
> + */
> + for (size_t i = 0; names[i]; i++) {
> + if (strcmp(names[i], st->readers[first]->name))
> + continue;
> +
> + /*
> + * We have found the first entry. Verify that all the
> + * subsequent tables we have compacted still exist in
> + * the modified stack in the exact same order as we
> + * have compacted them.
> + */
> + for (size_t j = 1; j < last - first + 1; j++) {
> + const char *old = first + j < st->merged->stack_len ?
> + st->readers[first + j]->name : NULL;
> + const char *new = names[i + j];
> +
> + /*
> + * If some entries are missing or in case the tables
> + * have changed then we need to bail out. Again, this
> + * shouldn't ever happen because we have locked the
> + * tables we are compacting.
> + */
Okay, this is exactly what I was saying above. It still does make sense
to keep this check to ensure future versions don't break it.
> + if (!old || !new || strcmp(old, new)) {
> + err = REFTABLE_OUTDATED_ERROR;
> + goto done;
> + }
> + }
> +
> + new_offset = i;
> + break;
> + }
> +
> + /*
> + * In case we didn't find our compacted tables in the stack we
> + * need to bail out. In theory, this should have never happened
> + * because we locked the tables we are compacting.
> + */
> + if (new_offset < 0) {
> + err = REFTABLE_OUTDATED_ERROR;
> + goto done;
> + }
> +
> + /*
> + * We have found the new range that we want to replace, so
> + * let's update the range of tables that we want to replace.
> + */
> + last_to_replace = last + (new_offset - first);
> + first_to_replace = new_offset;
Nit: might be easier to read as
first_to_replace = new_offset;
last_to_replace = first_to_replace + (last - first);
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction
2024-08-05 13:08 ` [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
2024-08-06 18:46 ` Justin Tobler
@ 2024-08-08 12:25 ` Karthik Nayak
1 sibling, 0 replies; 50+ messages in thread
From: Karthik Nayak @ 2024-08-08 12:25 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 4064 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> When compacting tables, it may happen that we want to compact a set of
> tables which are already locked by a concurrent process that compacts
> them. In the case where we wanted to perform a full compaction of all
> tables it is sensible to bail out in this case, as we cannot fulfill the
> requested action.
>
> But when performing auto-compaction it isn't necessarily in our best
> interest to abort the whole operation. For example, due to the
> geometric compacting schema that we use, it may be that process A takes
> a lot of time to compact the bulk of all tables whereas process B
> appends a bunch of new tables to the stack. B would in this case also
> notice that it has to compact the tables that process A is compacting
> already and thus also try to compact the same range, probably including
> the new tables it has appended. But because those tables are locked
> already, it will fail and thus abort the complete auto-compaction. The
> consequence is that the stack will grow longer and longer while A isn't
> yet done with compaction, which will lead to a growing performance
> impact.
>
> Instead of aborting auto-compaction altogether, let's gracefully handle
> this situation by instead compacting tables which aren't locked. To do
> so, instead of locking from the beginning of the slice-to-be-compacted,
> we start locking tables from the end of the slice. Once we hit the first
> table that is locked already, we abort. If we succeded to lock two or
s/succeded/succeeded
> more tables, then we simply reduce the slice of tables that we're about
> to compact to those which we managed to lock.
>
> This ensures that we can at least make some progress for compaction in
> said scenario. It also helps in other scenarios, like for example when a
> process died and left a stale lockfile behind. In such a case we can at
> least ensure some compaction on a best-effort basis.
>
Right. This is really well written.
[snip]
> @@ -1052,19 +1062,47 @@ static int stack_compact_range(struct reftable_stack *st,
> /*
> * Lock all tables in the user-provided range. This is the slice of our
> * stack which we'll compact.
> + *
> + * Note that we lock tables in reverse order from last to first. The
> + * intent behind this is to allow a newer process to perform best
> + * effort compaction of tables that it has added in the case where an
> + * older process is still busy compacting tables which are preexisting
> + * from the point of view of the newer process.
> */
> REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
> - for (i = first; i <= last; i++) {
> - stack_filename(&table_name, st, reader_name(st->readers[i]));
> + for (i = last + 1; i > first; i--) {
> + stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
>
> err = hold_lock_file_for_update(&table_locks[nlocks],
> table_name.buf, LOCK_NO_DEREF);
> if (err < 0) {
> - if (errno == EEXIST)
> + /*
> + * When the table is locked already we may do a
> + * best-effort compaction and compact only the tables
> + * that we have managed to lock so far. This of course
> + * requires that we have been able to lock at least two
> + * tables, otherwise there would be nothing to compact.
> + * In that case, we return a lock error to our caller.
> + */
> + if (errno == EEXIST && last - (i - 1) >= 2 &&
> + flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
> + err = 0;
> + /*
> + * The subtraction is to offset the index, the
> + * addition is to only compact up to the table
> + * of the preceding iteration. They obviously
> + * cancel each other out, but that may be
> + * non-obvious when it was omitted.
> + */
> + first = (i - 1) + 1;
I remember thinking about this and how to fix it; I'm happy with how
you've done it. It's simple and sufficient. Kudos.
[snip]
Overall this patch looks great. I have nothing to add. I really like
that we use a flag, which means this only comes into play for
auto-compaction.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 0/9] reftable: improvements and fixes for compaction
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
` (8 preceding siblings ...)
2024-08-05 13:08 ` [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
@ 2024-08-08 12:26 ` Karthik Nayak
9 siblings, 0 replies; 50+ messages in thread
From: Karthik Nayak @ 2024-08-08 12:26 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 751 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this is the second version of my patch series that aims to improve the
> way reftable stack perform compaction.
>
> Changes compared to v1:
>
> - Extract a test helper function that sets up a stack with N tables
> containing refs.
>
> - Reuse file descriptor that we have already stored in a local
> variable instead of calling `lock_file_fd()` a second time.
>
> - Remove a no-op change in the last patch.
>
> - Add a comment explaining why we have to allocate N+1 many table
> names.
>
> - Some typo fixes.
>
> Thanks!
I haven't reviewed v1, but did skim through it. I left some comments
(maybe they were already asked/answered); overall the series looks great
to me.
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction
2024-08-08 12:14 ` Karthik Nayak
@ 2024-08-08 13:29 ` Patrick Steinhardt
0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 13:29 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 2136 bytes --]
On Thu, Aug 08, 2024 at 07:14:15AM -0500, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > + /*
> > + * We have found the first entry. Verify that all the
> > + * subsequent tables we have compacted still exist in
> > + * the modified stack in the exact same order as we
> > + * have compacted them.
> > + */
> > + for (size_t j = 1; j < last - first + 1; j++) {
> > + const char *old = first + j < st->merged->stack_len ?
> > + st->readers[first + j]->name : NULL;
> > + const char *new = names[i + j];
> > +
> > + /*
> > + * If some entries are missing or in case the tables
> > + * have changed then we need to bail out. Again, this
> > + * shouldn't ever happen because we have locked the
> > + * tables we are compacting.
> > + */
>
> Okay, this is exactly what I was saying above. It still does make sense
> to keep this check to ensure future versions don't break it.
Yeah, exactly. It's mostly about defense in depth, but should in theory
never be needed.
> > + if (!old || !new || strcmp(old, new)) {
> > + err = REFTABLE_OUTDATED_ERROR;
> > + goto done;
> > + }
> > + }
> > +
> > + new_offset = i;
> > + break;
> > + }
> > +
> > + /*
> > + * In case we didn't find our compacted tables in the stack we
> > + * need to bail out. In theory, this should have never happened
> > + * because we locked the tables we are compacting.
> > + */
> > + if (new_offset < 0) {
> > + err = REFTABLE_OUTDATED_ERROR;
> > + goto done;
> > + }
> > +
> > + /*
> > + * We have found the new range that we want to replace, so
> > + * let's update the range of tables that we want to replace.
> > + */
> > + last_to_replace = last + (new_offset - first);
> > + first_to_replace = new_offset;
>
> Nit: might be easier to read as
>
> first_to_replace = new_offset;
> last_to_replace = first_to_replace + (last - first);
True. Initially I didn't have the `first_to_replace` variables, but now
that I do it certainly makes it easier to order it naturally.
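As a worked example with hypothetical table names: say the stack was
[A, B, C, D, E] and we locked and compacted first=3..last=4, i.e. D and E.
While we were busy, a concurrent process compacted A and B into AB, so the
reloaded "tables.list" reads [AB, C, D, E]. The search then finds D at
new_offset=2, giving first_to_replace=2 and last_to_replace = 4 + (2 - 3) = 3,
and the rewritten list becomes [AB, C, DE].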
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 0/9] reftable: improvements and fixes for compaction
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
` (8 preceding siblings ...)
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 1/9] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
` (9 more replies)
9 siblings, 10 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 6330 bytes --]
Hi,
this is the third version of my patch series that aims to improve the
way the reftable stack performs compaction.
Changes compared to v2:
- Drop the unused `reftable_write_options` structure in
`write_n_ref_tables()`.
- Fix a commit message typo.
- Reorder some variable assignments to feel more natural.
Thanks!
Patrick
Patrick Steinhardt (9):
reftable/stack: refactor function to gather table sizes
reftable/stack: extract function to setup stack with N tables
reftable/stack: test compaction with already-locked tables
reftable/stack: update stats on failed full compaction
reftable/stack: simplify tracking of table locks
reftable/stack: do not die when fsyncing lock file files
reftable/stack: use lock_file when adding table to "tables.list"
reftable/stack: fix corruption on concurrent compaction
reftable/stack: handle locked tables during auto-compaction
reftable/stack.c | 231 +++++++++++++++++++++++++++++--------
reftable/stack_test.c | 142 ++++++++++++++++++-----
t/t0610-reftable-basics.sh | 21 ++--
3 files changed, 310 insertions(+), 84 deletions(-)
Range-diff against v2:
1: 6d2b54ba8e = 1: 6d2b54ba8e reftable/stack: refactor function to gather table sizes
2: ff17306cc0 ! 2: 798a661824 reftable/stack: extract function to setup stack with N tables
@@ Commit message
tests. This is fine though as we only care about the shape of the stack
here, not the shape of each table.
+ Furthermore, with this change we now start to disable auto compaction
+ when writing the tables, as otherwise we might not end up with the
+ expected amount of new tables added. This also slightly changes the
+ behaviour of these tests, but the properties we care for remain intact.
+
Signed-off-by: Patrick Steinhardt <ps@pks.im>
## reftable/stack_test.c ##
@@ reftable/stack_test.c: static int write_test_ref(struct reftable_writer *wr, voi
}
+static void write_n_ref_tables(struct reftable_stack *st,
-+ struct reftable_write_options *opts,
+ size_t n)
+{
+ struct strbuf buf = STRBUF_INIT;
++ int disable_auto_compact;
+ int err;
+
++ disable_auto_compact = st->opts.disable_auto_compact;
++ st->opts.disable_auto_compact = 1;
++
+ for (size_t i = 0; i < n; i++) {
+ struct reftable_ref_record ref = {
+ .update_index = reftable_stack_next_update_index(st),
@@ reftable/stack_test.c: static int write_test_ref(struct reftable_writer *wr, voi
+ EXPECT_ERR(err);
+ }
+
++ st->opts.disable_auto_compact = disable_auto_compact;
+ strbuf_release(&buf);
+}
+
@@ reftable/stack_test.c: static void test_reftable_stack_compaction_concurrent(voi
- err = reftable_stack_add(st1, &write_test_ref, &ref);
- EXPECT_ERR(err);
- }
-+ write_n_ref_tables(st1, &opts, 3);
++ write_n_ref_tables(st1, 3);
err = reftable_new_stack(&st2, dir, &opts);
EXPECT_ERR(err);
@@ reftable/stack_test.c: static void test_reftable_stack_compaction_concurrent_cle
- err = reftable_stack_add(st1, &write_test_ref, &ref);
- EXPECT_ERR(err);
- }
-+ write_n_ref_tables(st1, &opts, 3);
++ write_n_ref_tables(st1, 3);
err = reftable_new_stack(&st2, dir, &opts);
EXPECT_ERR(err);
3: 63e64c8d82 ! 3: 949f748823 reftable/stack: test compaction with already-locked tables
@@ reftable/stack_test.c: static void test_reftable_stack_auto_compaction(void)
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
-+ write_n_ref_tables(st, &opts, 5);
++ write_n_ref_tables(st, 5);
+ EXPECT(st->merged->stack_len == 5);
+
+ /*
@@ reftable/stack_test.c: static void test_reftable_stack_add_performs_auto_compact
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
-+ write_n_ref_tables(st, &opts, 3);
++ write_n_ref_tables(st, 3);
+ EXPECT(st->merged->stack_len == 3);
+
+ /* Lock one of the tables that we're about to compact. */
4: 1989dafcb4 = 4: 478f945a45 reftable/stack: update stats on failed full compaction
5: 73e5d104eb = 5: 812a45f3d2 reftable/stack: simplify tracking of table locks
6: e411e14904 = 6: d7d34e7253 reftable/stack: do not die when fsyncing lock file files
7: b868a518d6 = 7: 37a757649a reftable/stack: use lock_file when adding table to "tables.list"
8: ff17414d26 ! 8: b27cb325fc reftable/stack: fix corruption on concurrent compaction
@@ reftable/stack.c: static int stack_compact_range(struct reftable_stack *st,
+ * We have found the new range that we want to replace, so
+ * let's update the range of tables that we want to replace.
+ */
-+ last_to_replace = last + (new_offset - first);
+ first_to_replace = new_offset;
++ last_to_replace = last + (new_offset - first);
+ } else {
+ /*
+ * `fd_read_lines()` uses a `NULL` sentinel to indicate that
@@ reftable/stack.c: static int stack_compact_range(struct reftable_stack *st,
+ REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
+ for (size_t i = 0; i < st->merged->stack_len; i++)
+ names[i] = xstrdup(st->readers[i]->name);
-+ last_to_replace = last;
+ first_to_replace = first;
++ last_to_replace = last;
+ }
+
/*
9: 1ef560feb1 ! 9: dc2fea145d reftable/stack: handle locked tables during auto-compaction
@@ Commit message
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
- table that is locked already, we abort. If we succeded to lock two or
+ table that is locked already, we abort. If we succeeded to lock two or
more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 1/9] reftable/stack: refactor function to gather table sizes
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 2/9] reftable/stack: extract function to setup stack with N tables Patrick Steinhardt
` (8 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 1242 bytes --]
Refactor the function that gathers table sizes to be more idiomatic. For
one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
Second, avoid using an integer to iterate through the tables in the
reftable stack given that `stack_len` itself is using a `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 737591125e..ba8234b486 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1305,14 +1305,15 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n,
static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
{
- uint64_t *sizes =
- reftable_calloc(st->merged->stack_len, sizeof(*sizes));
int version = (st->opts.hash_id == GIT_SHA1_FORMAT_ID) ? 1 : 2;
int overhead = header_size(version) - 1;
- int i = 0;
- for (i = 0; i < st->merged->stack_len; i++) {
+ uint64_t *sizes;
+
+ REFTABLE_CALLOC_ARRAY(sizes, st->merged->stack_len);
+
+ for (size_t i = 0; i < st->merged->stack_len; i++)
sizes[i] = st->readers[i]->size - overhead;
- }
+
return sizes;
}
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 2/9] reftable/stack: extract function to setup stack with N tables
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 1/9] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 3/9] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
` (7 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 3713 bytes --]
We're about to add two tests, and both of them will want to initialize
the reftable stack with a set of N tables. Introduce a new function that
handles this and refactor existing tests that use such a setup to use
it.
Note that this changes the exact records contained in the preexisting
tests. This is fine though as we only care about the shape of the stack
here, not the shape of each table.
Furthermore, with this change we now start to disable auto compaction
when writing the tables, as otherwise we might not end up with the
expected amount of new tables added. This also slightly changes the
behaviour of these tests, but the properties we care for remain intact.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack_test.c | 64 +++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 32 deletions(-)
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index e3c11e6a6e..0b110f6f02 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -109,6 +109,34 @@ static int write_test_ref(struct reftable_writer *wr, void *arg)
return reftable_writer_add_ref(wr, ref);
}
+static void write_n_ref_tables(struct reftable_stack *st,
+ size_t n)
+{
+ struct strbuf buf = STRBUF_INIT;
+ int disable_auto_compact;
+ int err;
+
+ disable_auto_compact = st->opts.disable_auto_compact;
+ st->opts.disable_auto_compact = 1;
+
+ for (size_t i = 0; i < n; i++) {
+ struct reftable_ref_record ref = {
+ .update_index = reftable_stack_next_update_index(st),
+ .value_type = REFTABLE_REF_VAL1,
+ };
+
+ strbuf_addf(&buf, "refs/heads/branch-%04u", (unsigned) i);
+ ref.refname = buf.buf;
+ set_test_hash(ref.value.val1, i);
+
+ err = reftable_stack_add(st, &write_test_ref, &ref);
+ EXPECT_ERR(err);
+ }
+
+ st->opts.disable_auto_compact = disable_auto_compact;
+ strbuf_release(&buf);
+}
+
struct write_log_arg {
struct reftable_log_record *log;
uint64_t update_index;
@@ -916,25 +944,11 @@ static void test_reftable_stack_compaction_concurrent(void)
struct reftable_write_options opts = { 0 };
struct reftable_stack *st1 = NULL, *st2 = NULL;
char *dir = get_tmp_dir(__LINE__);
- int err, i;
- int N = 3;
+ int err;
err = reftable_new_stack(&st1, dir, &opts);
EXPECT_ERR(err);
-
- for (i = 0; i < N; i++) {
- char name[100];
- struct reftable_ref_record ref = {
- .refname = name,
- .update_index = reftable_stack_next_update_index(st1),
- .value_type = REFTABLE_REF_SYMREF,
- .value.symref = (char *) "master",
- };
- snprintf(name, sizeof(name), "branch%04d", i);
-
- err = reftable_stack_add(st1, &write_test_ref, &ref);
- EXPECT_ERR(err);
- }
+ write_n_ref_tables(st1, 3);
err = reftable_new_stack(&st2, dir, &opts);
EXPECT_ERR(err);
@@ -965,25 +979,11 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
struct reftable_write_options opts = { 0 };
struct reftable_stack *st1 = NULL, *st2 = NULL, *st3 = NULL;
char *dir = get_tmp_dir(__LINE__);
- int err, i;
- int N = 3;
+ int err;
err = reftable_new_stack(&st1, dir, &opts);
EXPECT_ERR(err);
-
- for (i = 0; i < N; i++) {
- char name[100];
- struct reftable_ref_record ref = {
- .refname = name,
- .update_index = reftable_stack_next_update_index(st1),
- .value_type = REFTABLE_REF_SYMREF,
- .value.symref = (char *) "master",
- };
- snprintf(name, sizeof(name), "branch%04d", i);
-
- err = reftable_stack_add(st1, &write_test_ref, &ref);
- EXPECT_ERR(err);
- }
+ write_n_ref_tables(st1, 3);
err = reftable_new_stack(&st2, dir, &opts);
EXPECT_ERR(err);
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 3/9] reftable/stack: test compaction with already-locked tables
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 1/9] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 2/9] reftable/stack: extract function to setup stack with N tables Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 4/9] reftable/stack: update stats on failed full compaction Patrick Steinhardt
` (6 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 3863 bytes --]
We're lacking test coverage for compacting tables when some of the
tables that we are about to compact are locked. Add two tests that
exercise this, one for auto-compaction and one for full compaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack_test.c | 77 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 0b110f6f02..1d109933d3 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -891,6 +891,45 @@ static void test_reftable_stack_auto_compaction(void)
clear_dir(dir);
}
+static void test_reftable_stack_auto_compaction_with_locked_tables(void)
+{
+ struct reftable_write_options opts = {
+ .disable_auto_compact = 1,
+ };
+ struct reftable_stack *st = NULL;
+ struct strbuf buf = STRBUF_INIT;
+ char *dir = get_tmp_dir(__LINE__);
+ int err;
+
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
+ write_n_ref_tables(st, 5);
+ EXPECT(st->merged->stack_len == 5);
+
+ /*
+ * Given that all tables we have written should be roughly the same
+ * size, we expect that auto-compaction will want to compact all of the
+ * tables. Locking any of the tables will keep it from doing so.
+ */
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[2]->name);
+ write_file_buf(buf.buf, "", 0);
+
+ /*
+ * Ideally, we'd handle the situation where any of the tables is locked
+ * gracefully. We don't (yet) do this though and thus fail.
+ */
+ err = reftable_stack_auto_compact(st);
+ EXPECT(err == REFTABLE_LOCK_ERROR);
+ EXPECT(st->stats.failures == 1);
+ EXPECT(st->merged->stack_len == 5);
+
+ reftable_stack_destroy(st);
+ strbuf_release(&buf);
+ clear_dir(dir);
+}
+
static void test_reftable_stack_add_performs_auto_compaction(void)
{
struct reftable_write_options opts = { 0 };
@@ -939,6 +978,42 @@ static void test_reftable_stack_add_performs_auto_compaction(void)
clear_dir(dir);
}
+static void test_reftable_stack_compaction_with_locked_tables(void)
+{
+ struct reftable_write_options opts = {
+ .disable_auto_compact = 1,
+ };
+ struct reftable_stack *st = NULL;
+ struct strbuf buf = STRBUF_INIT;
+ char *dir = get_tmp_dir(__LINE__);
+ int err;
+
+ err = reftable_new_stack(&st, dir, &opts);
+ EXPECT_ERR(err);
+
+ write_n_ref_tables(st, 3);
+ EXPECT(st->merged->stack_len == 3);
+
+ /* Lock one of the tables that we're about to compact. */
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "%s/%s.lock", dir, st->readers[1]->name);
+ write_file_buf(buf.buf, "", 0);
+
+ /*
+ * Compaction is expected to fail given that we were not able to
+ * compact all tables.
+ */
+ err = reftable_stack_compact_all(st, NULL);
+ EXPECT(err == REFTABLE_LOCK_ERROR);
+ /* TODO: this is wrong, we should get notified about the failure. */
+ EXPECT(st->stats.failures == 0);
+ EXPECT(st->merged->stack_len == 3);
+
+ reftable_stack_destroy(st);
+ strbuf_release(&buf);
+ clear_dir(dir);
+}
+
static void test_reftable_stack_compaction_concurrent(void)
{
struct reftable_write_options opts = { 0 };
@@ -1016,9 +1091,11 @@ int stack_test_main(int argc, const char *argv[])
RUN_TEST(test_reftable_stack_add);
RUN_TEST(test_reftable_stack_add_one);
RUN_TEST(test_reftable_stack_auto_compaction);
+ RUN_TEST(test_reftable_stack_auto_compaction_with_locked_tables);
RUN_TEST(test_reftable_stack_add_performs_auto_compaction);
RUN_TEST(test_reftable_stack_compaction_concurrent);
RUN_TEST(test_reftable_stack_compaction_concurrent_clean);
+ RUN_TEST(test_reftable_stack_compaction_with_locked_tables);
RUN_TEST(test_reftable_stack_hash_id);
RUN_TEST(test_reftable_stack_lock_failure);
RUN_TEST(test_reftable_stack_log_normalize);
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 4/9] reftable/stack: update stats on failed full compaction
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
` (2 preceding siblings ...)
2024-08-08 14:06 ` [PATCH v3 3/9] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 5/9] reftable/stack: simplify tracking of table locks Patrick Steinhardt
` (5 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 2119 bytes --]
When auto-compaction fails due to a locking error, we update the
statistics to indicate this failure. We're not doing the same when
performing a full compaction.
Fix this inconsistency by using `stack_compact_range_stats()`, which
handles the stat update for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 14 +++++++-------
reftable/stack_test.c | 3 +--
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index ba8234b486..e5959d2c76 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1205,13 +1205,6 @@ static int stack_compact_range(struct reftable_stack *st,
return err;
}
-int reftable_stack_compact_all(struct reftable_stack *st,
- struct reftable_log_expiry_config *config)
-{
- return stack_compact_range(st, 0, st->merged->stack_len ?
- st->merged->stack_len - 1 : 0, config);
-}
-
static int stack_compact_range_stats(struct reftable_stack *st,
size_t first, size_t last,
struct reftable_log_expiry_config *config)
@@ -1222,6 +1215,13 @@ static int stack_compact_range_stats(struct reftable_stack *st,
return err;
}
+int reftable_stack_compact_all(struct reftable_stack *st,
+ struct reftable_log_expiry_config *config)
+{
+ size_t last = st->merged->stack_len ? st->merged->stack_len - 1 : 0;
+ return stack_compact_range_stats(st, 0, last, config);
+}
+
static int segment_size(struct segment *s)
{
return s->end - s->start;
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 1d109933d3..3ed8e44924 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -1005,8 +1005,7 @@ static void test_reftable_stack_compaction_with_locked_tables(void)
*/
err = reftable_stack_compact_all(st, NULL);
EXPECT(err == REFTABLE_LOCK_ERROR);
- /* TODO: this is wrong, we should get notified about the failure. */
- EXPECT(st->stats.failures == 0);
+ EXPECT(st->stats.failures == 1);
EXPECT(st->merged->stack_len == 3);
reftable_stack_destroy(st);
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 5/9] reftable/stack: simplify tracking of table locks
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
` (3 preceding siblings ...)
2024-08-08 14:06 ` [PATCH v3 4/9] reftable/stack: update stats on failed full compaction Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 6/9] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
` (4 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 3056 bytes --]
When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact
all tables in the user-provided range or none, we simply track those
locks via the indices of the respective tables in the merged stack.
This is about to change though, as we will introduce a mode where
auto-compaction gracefully handles the case of already-locked files. It
may then happen that we only compact a subset of the user-supplied
range of tables, in which case the table indices will no longer
necessarily match the lock indices.
Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to behave the same, but will
make it easier to implement the described change.
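Schematically, the locking loop changes as follows (a condensed sketch
of the diff below, with error handling and the close-file details
elided):

	/* before: lock slot derived from the table index */
	for (i = first; i <= last; i++)
		hold_lock_file_for_update(&table_locks[i - first],
					  table_name.buf, LOCK_NO_DEREF);

	/* after: lock slots counted separately via nlocks */
	for (i = first; i <= last; i++) {
		hold_lock_file_for_update(&table_locks[nlocks],
					  table_name.buf, LOCK_NO_DEREF);
		close_lock_file_gently(&table_locks[nlocks++]);
	}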
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index e5959d2c76..07e7ffc6b9 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1016,7 +1016,7 @@ static int stack_compact_range(struct reftable_stack *st,
struct lock_file *table_locks = NULL;
struct tempfile *new_table = NULL;
int is_empty_table = 0, err = 0;
- size_t i;
+ size_t i, nlocks = 0;
if (first > last || (!expiry && first == last)) {
err = 0;
@@ -1051,7 +1051,7 @@ static int stack_compact_range(struct reftable_stack *st,
for (i = first; i <= last; i++) {
stack_filename(&table_name, st, reader_name(st->readers[i]));
- err = hold_lock_file_for_update(&table_locks[i - first],
+ err = hold_lock_file_for_update(&table_locks[nlocks],
table_name.buf, LOCK_NO_DEREF);
if (err < 0) {
if (errno == EEXIST)
@@ -1066,7 +1066,7 @@ static int stack_compact_range(struct reftable_stack *st,
* run into file descriptor exhaustion when we compress a lot
* of tables.
*/
- err = close_lock_file_gently(&table_locks[i - first]);
+ err = close_lock_file_gently(&table_locks[nlocks++]);
if (err < 0) {
err = REFTABLE_IO_ERROR;
goto done;
@@ -1183,8 +1183,8 @@ static int stack_compact_range(struct reftable_stack *st,
* Delete the old tables. They may still be in use by concurrent
* readers, so it is expected that unlinking tables may fail.
*/
- for (i = first; i <= last; i++) {
- struct lock_file *table_lock = &table_locks[i - first];
+ for (i = 0; i < nlocks; i++) {
+ struct lock_file *table_lock = &table_locks[i];
char *table_path = get_locked_file_path(table_lock);
unlink(table_path);
free(table_path);
@@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
done:
rollback_lock_file(&tables_list_lock);
- for (i = first; table_locks && i <= last; i++)
- rollback_lock_file(&table_locks[i - first]);
+ for (i = 0; table_locks && i < nlocks; i++)
+ rollback_lock_file(&table_locks[i]);
reftable_free(table_locks);
delete_tempfile(&new_table);
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 6/9] reftable/stack: do not die when fsyncing lock file files
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
` (4 preceding siblings ...)
2024-08-08 14:06 ` [PATCH v3 5/9] reftable/stack: simplify tracking of table locks Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 7/9] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
` (3 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 1067 bytes --]
We use `fsync_component_or_die()` when committing an addition to the
"tables.list" lock file, which unsurprisingly dies in case the fsync
fails. Given that this is part of the reftable library, we should never
die and instead let callers handle the error.
Adapt accordingly and use `fsync_component()` instead.
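For a caller of the library the failure now surfaces as an ordinary
error code instead of terminating the process. A hypothetical caller
would handle it like any other I/O failure, for example:

	int err = reftable_addition_commit(add);
	if (err == REFTABLE_IO_ERROR) {
		/* report and propagate the error instead of dying */
	}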
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 07e7ffc6b9..9ca549294f 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -674,8 +674,11 @@ int reftable_addition_commit(struct reftable_addition *add)
goto done;
}
- fsync_component_or_die(FSYNC_COMPONENT_REFERENCE, lock_file_fd,
- get_tempfile_path(add->lock_file));
+ err = fsync_component(FSYNC_COMPONENT_REFERENCE, lock_file_fd);
+ if (err < 0) {
+ err = REFTABLE_IO_ERROR;
+ goto done;
+ }
err = rename_tempfile(&add->lock_file, add->stack->list_file);
if (err < 0) {
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 7/9] reftable/stack: use lock_file when adding table to "tables.list"
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
` (5 preceding siblings ...)
2024-08-08 14:06 ` [PATCH v3 6/9] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 8/9] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
` (2 subsequent siblings)
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 2863 bytes --]
When modifying "tables.list", we need to lock the list before updating
it to ensure that no concurrent writers modify the list at the same
point in time. While we do this via the `lock_file` subsystem when
compacting the stack, we manually handle the lock when adding a new
table to it. This is not wrong, but it is inconsistent.
Refactor the code to consistently lock "tables.list" via the `lock_file`
subsystem.
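The lifecycle we get from the `lock_file` subsystem is roughly the
following (a simplified sketch; the real code below additionally maps
errno == EEXIST to REFTABLE_LOCK_ERROR and applies default
permissions):

	struct lock_file lk = LOCK_INIT;

	if (hold_lock_file_for_update(&lk, st->list_file, LOCK_NO_DEREF) < 0)
		return REFTABLE_IO_ERROR;
	/* ... write the new "tables.list" via get_lock_file_fd(&lk) ... */
	if (commit_lock_file(&lk) < 0)	/* renames the lock file into place */
		return REFTABLE_IO_ERROR;
	/* on any earlier failure: rollback_lock_file(&lk) */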
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 9ca549294f..54982e0f7d 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -567,7 +567,7 @@ static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
}
struct reftable_addition {
- struct tempfile *lock_file;
+ struct lock_file tables_list_lock;
struct reftable_stack *stack;
char **new_tables;
@@ -581,13 +581,13 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
struct reftable_stack *st)
{
struct strbuf lock_file_name = STRBUF_INIT;
- int err = 0;
- add->stack = st;
+ int err;
- strbuf_addf(&lock_file_name, "%s.lock", st->list_file);
+ add->stack = st;
- add->lock_file = create_tempfile(lock_file_name.buf);
- if (!add->lock_file) {
+ err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
+ LOCK_NO_DEREF);
+ if (err < 0) {
if (errno == EEXIST) {
err = REFTABLE_LOCK_ERROR;
} else {
@@ -596,7 +596,8 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
goto done;
}
if (st->opts.default_permissions) {
- if (chmod(add->lock_file->filename.buf, st->opts.default_permissions) < 0) {
+ if (chmod(get_lock_file_path(&add->tables_list_lock),
+ st->opts.default_permissions) < 0) {
err = REFTABLE_IO_ERROR;
goto done;
}
@@ -635,7 +636,7 @@ static void reftable_addition_close(struct reftable_addition *add)
add->new_tables_len = 0;
add->new_tables_cap = 0;
- delete_tempfile(&add->lock_file);
+ rollback_lock_file(&add->tables_list_lock);
strbuf_release(&nm);
}
@@ -651,7 +652,7 @@ void reftable_addition_destroy(struct reftable_addition *add)
int reftable_addition_commit(struct reftable_addition *add)
{
struct strbuf table_list = STRBUF_INIT;
- int lock_file_fd = get_tempfile_fd(add->lock_file);
+ int lock_file_fd = get_lock_file_fd(&add->tables_list_lock);
int err = 0;
size_t i;
@@ -680,7 +681,7 @@ int reftable_addition_commit(struct reftable_addition *add)
goto done;
}
- err = rename_tempfile(&add->lock_file, add->stack->list_file);
+ err = commit_lock_file(&add->tables_list_lock);
if (err < 0) {
err = REFTABLE_IO_ERROR;
goto done;
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 8/9] reftable/stack: fix corruption on concurrent compaction
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
` (6 preceding siblings ...)
2024-08-08 14:06 ` [PATCH v3 7/9] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 9/9] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
2024-08-08 16:52 ` [PATCH v3 0/9] reftable: improvements and fixes for compaction Karthik Nayak
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 7172 bytes --]
The locking employed by compaction uses the following schema:
1. Lock "tables.list" and verify that it matches the version we have
loaded in core.
2. Lock each of the tables in the user-supplied range of tables that
we are supposed to compact. These locks prohibit any concurrent
process to compact those tables while we are doing that.
3. Unlock "tables.list". This enables concurrent processes to add new
tables to the stack, but also allows them to compact tables outside
of the range of tables that we have locked.
4. Perform the compaction.
5. Lock "tables.list" again.
6. Move the compacted table into place.
7. Write the new order of tables, including the compacted table, into
the lockfile.
8. Commit the lockfile into place.
Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
are compacting a lot of very large tables.
But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the stack. Given that we lock each of the
tables before compacting them, we know that the slices must be disjoint
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.
Right now, we do not check whether the "tables.list" has been changed
after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without regard for what the other process has written. This
scenario would then lead to data loss and corruption.
This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.
Fix this bug by re-checking whether our stack is still up to date after
locking for the second time. If it isn't, then we adjust the indices of
tables to replace in the updated stack.
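As a concrete example of the index adjustment: suppose we locked and
compacted the tables at indices 3..5 of our cached stack while a
concurrent process compacted indices 0..2 into a single table. In the
re-read "tables.list" our first table is then found at offset 1, so we
replace entries 1..3 (last_to_replace = 5 + (1 - 3)) instead of 3..5.
The numbers are only for illustration; the general formula is part of
the patch below.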
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 107 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 102 insertions(+), 5 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 54982e0f7d..3f13c3eb34 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1020,7 +1020,9 @@ static int stack_compact_range(struct reftable_stack *st,
struct lock_file *table_locks = NULL;
struct tempfile *new_table = NULL;
int is_empty_table = 0, err = 0;
+ size_t first_to_replace, last_to_replace;
size_t i, nlocks = 0;
+ char **names = NULL;
if (first > last || (!expiry && first == last)) {
err = 0;
@@ -1123,6 +1125,100 @@ static int stack_compact_range(struct reftable_stack *st,
}
}
+ /*
+ * As we have unlocked the stack while compacting our slice of tables
+ * it may have happened that a concurrently running process has updated
+ * the stack while we were compacting. In that case, we need to check
+ * whether the tables that we have just compacted still exist in the
+ * stack in the exact same order as we have compacted them.
+ *
+ * If they do exist, then it is fine to continue and replace those
+ * tables with our compacted version. If they don't, then we need to
+ * abort.
+ */
+ err = stack_uptodate(st);
+ if (err < 0)
+ goto done;
+ if (err > 0) {
+ ssize_t new_offset = -1;
+ int fd;
+
+ fd = open(st->list_file, O_RDONLY);
+ if (fd < 0) {
+ err = REFTABLE_IO_ERROR;
+ goto done;
+ }
+
+ err = fd_read_lines(fd, &names);
+ close(fd);
+ if (err < 0)
+ goto done;
+
+ /*
+ * Search for the offset of the first table that we have
+ * compacted in the updated "tables.list" file.
+ */
+ for (size_t i = 0; names[i]; i++) {
+ if (strcmp(names[i], st->readers[first]->name))
+ continue;
+
+ /*
+ * We have found the first entry. Verify that all the
+ * subsequent tables we have compacted still exist in
+ * the modified stack in the exact same order as we
+ * have compacted them.
+ */
+ for (size_t j = 1; j < last - first + 1; j++) {
+ const char *old = first + j < st->merged->stack_len ?
+ st->readers[first + j]->name : NULL;
+ const char *new = names[i + j];
+
+ /*
+ * If some entries are missing or in case the tables
+ * have changed then we need to bail out. Again, this
+ * shouldn't ever happen because we have locked the
+ * tables we are compacting.
+ */
+ if (!old || !new || strcmp(old, new)) {
+ err = REFTABLE_OUTDATED_ERROR;
+ goto done;
+ }
+ }
+
+ new_offset = i;
+ break;
+ }
+
+ /*
+ * In case we didn't find our compacted tables in the stack we
+ * need to bail out. In theory, this should have never happened
+ * because we locked the tables we are compacting.
+ */
+ if (new_offset < 0) {
+ err = REFTABLE_OUTDATED_ERROR;
+ goto done;
+ }
+
+ /*
+ * We have found the new range that we want to replace, so
+ * let's update the range of tables that we want to replace.
+ */
+ first_to_replace = new_offset;
+ last_to_replace = last + (new_offset - first);
+ } else {
+ /*
+ * `fd_read_lines()` uses a `NULL` sentinel to indicate that
+ * the array is at its end. As we use `free_names()` to free
+ * the array, we need to include this sentinel value here and
+ * thus have to allocate `stack_len + 1` many entries.
+ */
+ REFTABLE_CALLOC_ARRAY(names, st->merged->stack_len + 1);
+ for (size_t i = 0; i < st->merged->stack_len; i++)
+ names[i] = xstrdup(st->readers[i]->name);
+ first_to_replace = first;
+ last_to_replace = last;
+ }
+
/*
* If the resulting compacted table is not empty, then we need to move
* it into place now.
@@ -1145,12 +1241,12 @@ static int stack_compact_range(struct reftable_stack *st,
* have just written. In case the compacted table became empty we
* simply skip writing it.
*/
- for (i = 0; i < first; i++)
- strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
+ for (i = 0; i < first_to_replace; i++)
+ strbuf_addf(&tables_list_buf, "%s\n", names[i]);
if (!is_empty_table)
strbuf_addf(&tables_list_buf, "%s\n", new_table_name.buf);
- for (i = last + 1; i < st->merged->stack_len; i++)
- strbuf_addf(&tables_list_buf, "%s\n", st->readers[i]->name);
+ for (i = last_to_replace + 1; names[i]; i++)
+ strbuf_addf(&tables_list_buf, "%s\n", names[i]);
err = write_in_full(get_lock_file_fd(&tables_list_lock),
tables_list_buf.buf, tables_list_buf.len);
@@ -1203,9 +1299,10 @@ static int stack_compact_range(struct reftable_stack *st,
delete_tempfile(&new_table);
strbuf_release(&new_table_name);
strbuf_release(&new_table_path);
-
strbuf_release(&tables_list_buf);
strbuf_release(&table_name);
+ free_names(names);
+
return err;
}
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 9/9] reftable/stack: handle locked tables during auto-compaction
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
` (7 preceding siblings ...)
2024-08-08 14:06 ` [PATCH v3 8/9] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
@ 2024-08-08 14:06 ` Patrick Steinhardt
2024-08-08 16:52 ` [PATCH v3 0/9] reftable: improvements and fixes for compaction Karthik Nayak
9 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-08-08 14:06 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Junio C Hamano, Karthik Nayak
[-- Attachment #1: Type: text/plain, Size: 8766 bytes --]
When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.
But when performing auto-compaction it isn't necessarily in our best
interest to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.
Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is already locked, we stop acquiring locks. If we succeeded
in locking two or more tables, we simply reduce the slice of tables that
we're about to compact to those which we managed to lock.
This ensures that we can at least make some progress with compaction in
said scenario. It also helps in other scenarios, for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.
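Condensed, the new locking loop looks roughly like this (a sketch of
the hunk below, with the file naming, the close-file step and the
exact error-code mapping dropped):

	for (i = last + 1; i > first; i--) {
		err = hold_lock_file_for_update(&table_locks[nlocks],
						table_name.buf, LOCK_NO_DEREF);
		if (err < 0 && errno == EEXIST && last - (i - 1) >= 2 &&
		    (flags & STACK_COMPACT_RANGE_BEST_EFFORT)) {
			/*
			 * Somebody else holds this table, but we already hold
			 * at least two locks: shrink the slice to [i, last].
			 */
			first = i;
			err = 0;
			break;
		} else if (err < 0) {
			goto done;
		}
		nlocks++;
	}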
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 59 +++++++++++++++++++++++++++++++-------
reftable/stack_test.c | 12 ++++----
t/t0610-reftable-basics.sh | 21 +++++++++-----
3 files changed, 70 insertions(+), 22 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 3f13c3eb34..2071e428a8 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -999,6 +999,15 @@ static int stack_write_compact(struct reftable_stack *st,
return err;
}
+enum stack_compact_range_flags {
+ /*
+ * Perform a best-effort compaction. That is, even if we cannot lock
+ * all tables in the specified range, we will try to compact the
+ * remaining slice.
+ */
+ STACK_COMPACT_RANGE_BEST_EFFORT = (1 << 0),
+};
+
/*
* Compact all tables in the range `[first, last)` into a single new table.
*
@@ -1010,7 +1019,8 @@ static int stack_write_compact(struct reftable_stack *st,
*/
static int stack_compact_range(struct reftable_stack *st,
size_t first, size_t last,
- struct reftable_log_expiry_config *expiry)
+ struct reftable_log_expiry_config *expiry,
+ unsigned int flags)
{
struct strbuf tables_list_buf = STRBUF_INIT;
struct strbuf new_table_name = STRBUF_INIT;
@@ -1052,19 +1062,47 @@ static int stack_compact_range(struct reftable_stack *st,
/*
* Lock all tables in the user-provided range. This is the slice of our
* stack which we'll compact.
+ *
+ * Note that we lock tables in reverse order from last to first. The
+ * intent behind this is to allow a newer process to perform best
+ * effort compaction of tables that it has added in the case where an
+ * older process is still busy compacting tables which are preexisting
+ * from the point of view of the newer process.
*/
REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
- for (i = first; i <= last; i++) {
- stack_filename(&table_name, st, reader_name(st->readers[i]));
+ for (i = last + 1; i > first; i--) {
+ stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
err = hold_lock_file_for_update(&table_locks[nlocks],
table_name.buf, LOCK_NO_DEREF);
if (err < 0) {
- if (errno == EEXIST)
+ /*
+ * When the table is locked already we may do a
+ * best-effort compaction and compact only the tables
+ * that we have managed to lock so far. This of course
+ * requires that we have been able to lock at least two
+ * tables, otherwise there would be nothing to compact.
+ * In that case, we return a lock error to our caller.
+ */
+ if (errno == EEXIST && last - (i - 1) >= 2 &&
+ flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
+ err = 0;
+ /*
+ * The subtraction is to offset the index, the
+ * addition is to only compact up to the table
+ * of the preceding iteration. They obviously
+ * cancel each other out, but that may be
+ * non-obvious when it was omitted.
+ */
+ first = (i - 1) + 1;
+ break;
+ } else if (errno == EEXIST) {
err = REFTABLE_LOCK_ERROR;
- else
+ goto done;
+ } else {
err = REFTABLE_IO_ERROR;
- goto done;
+ goto done;
+ }
}
/*
@@ -1308,9 +1346,10 @@ static int stack_compact_range(struct reftable_stack *st,
static int stack_compact_range_stats(struct reftable_stack *st,
size_t first, size_t last,
- struct reftable_log_expiry_config *config)
+ struct reftable_log_expiry_config *config,
+ unsigned int flags)
{
- int err = stack_compact_range(st, first, last, config);
+ int err = stack_compact_range(st, first, last, config, flags);
if (err == REFTABLE_LOCK_ERROR)
st->stats.failures++;
return err;
@@ -1320,7 +1359,7 @@ int reftable_stack_compact_all(struct reftable_stack *st,
struct reftable_log_expiry_config *config)
{
size_t last = st->merged->stack_len ? st->merged->stack_len - 1 : 0;
- return stack_compact_range_stats(st, 0, last, config);
+ return stack_compact_range_stats(st, 0, last, config, 0);
}
static int segment_size(struct segment *s)
@@ -1427,7 +1466,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
reftable_free(sizes);
if (segment_size(&seg) > 0)
return stack_compact_range_stats(st, seg.start, seg.end - 1,
- NULL);
+ NULL, STACK_COMPACT_RANGE_BEST_EFFORT);
return 0;
}
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 3ed8e44924..8c36590ff0 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -917,13 +917,15 @@ static void test_reftable_stack_auto_compaction_with_locked_tables(void)
write_file_buf(buf.buf, "", 0);
/*
- * Ideally, we'd handle the situation where any of the tables is locked
- * gracefully. We don't (yet) do this though and thus fail.
+ * When parts of the stack are locked, then auto-compaction does a best
+ * effort compaction of those tables which aren't locked. So while this
+ * would in theory compact all tables, due to the preexisting lock we
+ * only compact the newest two tables.
*/
err = reftable_stack_auto_compact(st);
- EXPECT(err == REFTABLE_LOCK_ERROR);
- EXPECT(st->stats.failures == 1);
- EXPECT(st->merged->stack_len == 5);
+ EXPECT_ERR(err);
+ EXPECT(st->stats.failures == 0);
+ EXPECT(st->merged->stack_len == 4);
reftable_stack_destroy(st);
strbuf_release(&buf);
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index b06c46999d..37510c2b2a 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -478,19 +478,26 @@ test_expect_success "$command: auto compaction" '
test_oid blob17_2 | git hash-object -w --stdin &&
- # Lock all tables write some refs. Auto-compaction will be
- # unable to compact tables and thus fails gracefully, leaving
- # the stack in a sub-optimal state.
- ls .git/reftable/*.ref |
+ # Lock all tables, write some refs. Auto-compaction will be
+ # unable to compact tables and thus fails gracefully,
+ # compacting only those tables which are not locked.
+ ls .git/reftable/*.ref | sort |
while read table
do
- touch "$table.lock" || exit 1
+ touch "$table.lock" &&
+ basename "$table" >>tables.expect || exit 1
done &&
+ test_line_count = 2 .git/reftable/tables.list &&
git branch B &&
git branch C &&
- rm .git/reftable/*.lock &&
- test_line_count = 4 .git/reftable/tables.list &&
+ # The new tables are auto-compacted, but the locked tables are
+ # left intact.
+ test_line_count = 3 .git/reftable/tables.list &&
+ head -n 2 .git/reftable/tables.list >tables.head &&
+ test_cmp tables.expect tables.head &&
+
+ rm .git/reftable/*.lock &&
git $command --auto &&
test_line_count = 1 .git/reftable/tables.list
)
--
2.46.0.46.g406f326d27.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH v3 0/9] reftable: improvements and fixes for compaction
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
` (8 preceding siblings ...)
2024-08-08 14:06 ` [PATCH v3 9/9] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
@ 2024-08-08 16:52 ` Karthik Nayak
9 siblings, 0 replies; 50+ messages in thread
From: Karthik Nayak @ 2024-08-08 16:52 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Justin Tobler, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 520 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this is the third version of my patch series that aims to improve the
> way the reftable stack performs compaction.
>
> Changes compared to v2:
>
> - Drop the unused `reftable_write_options` structure in
> `write_n_ref_tables()`.
>
> - Fix a commit message typo.
>
> - Reorder some variable assignments to feel more natural.
>
> Thanks!
>
> Patrick
>
This version includes all the suggestions I made on the previous
version. Looks good to me now!
Thanks
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
end of thread, other threads:[~2024-08-08 16:52 UTC | newest]
Thread overview: 50+ messages
2024-07-31 14:14 [PATCH 0/8] reftable: improvements and fixes for compaction Patrick Steinhardt
2024-07-31 14:14 ` [PATCH 1/8] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
2024-08-02 20:26 ` Junio C Hamano
2024-07-31 14:14 ` [PATCH 2/8] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
2024-08-02 21:05 ` Junio C Hamano
2024-08-05 12:11 ` Patrick Steinhardt
2024-07-31 14:15 ` [PATCH 3/8] reftable/stack: update stats on failed full compaction Patrick Steinhardt
2024-07-31 14:15 ` [PATCH 4/8] reftable/stack: simplify tracking of table locks Patrick Steinhardt
2024-07-31 21:57 ` Justin Tobler
2024-08-02 23:00 ` Junio C Hamano
2024-08-05 12:11 ` Patrick Steinhardt
2024-07-31 14:15 ` [PATCH 5/8] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
2024-07-31 14:15 ` [PATCH 6/8] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
2024-07-31 23:02 ` Justin Tobler
2024-08-01 8:40 ` Patrick Steinhardt
2024-07-31 14:15 ` [PATCH 7/8] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
2024-08-01 1:04 ` Justin Tobler
2024-08-01 8:41 ` Patrick Steinhardt
2024-07-31 14:15 ` [PATCH 8/8] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
2024-08-02 23:24 ` Junio C Hamano
2024-08-05 12:11 ` Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 1/9] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 2/9] reftable/stack: extract function to setup stack with N tables Patrick Steinhardt
2024-08-05 13:07 ` [PATCH v2 3/9] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
2024-08-08 10:46 ` Karthik Nayak
2024-08-05 13:08 ` [PATCH v2 4/9] reftable/stack: update stats on failed full compaction Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 5/9] reftable/stack: simplify tracking of table locks Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 6/9] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 7/9] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 8/9] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
2024-08-08 12:14 ` Karthik Nayak
2024-08-08 13:29 ` Patrick Steinhardt
2024-08-05 13:08 ` [PATCH v2 9/9] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
2024-08-06 18:46 ` Justin Tobler
2024-08-07 5:31 ` Patrick Steinhardt
2024-08-07 19:12 ` Justin Tobler
2024-08-08 12:25 ` Karthik Nayak
2024-08-08 12:26 ` [PATCH v2 0/9] reftable: improvements and fixes for compaction Karthik Nayak
2024-08-08 14:06 ` [PATCH v3 " Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 1/9] reftable/stack: refactor function to gather table sizes Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 2/9] reftable/stack: extract function to setup stack with N tables Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 3/9] reftable/stack: test compaction with already-locked tables Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 4/9] reftable/stack: update stats on failed full compaction Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 5/9] reftable/stack: simplify tracking of table locks Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 6/9] reftable/stack: do not die when fsyncing lock file files Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 7/9] reftable/stack: use lock_file when adding table to "tables.list" Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 8/9] reftable/stack: fix corruption on concurrent compaction Patrick Steinhardt
2024-08-08 14:06 ` [PATCH v3 9/9] reftable/stack: handle locked tables during auto-compaction Patrick Steinhardt
2024-08-08 16:52 ` [PATCH v3 0/9] reftable: improvements and fixes for compaction Karthik Nayak