git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] reftable: graceful concurrent writes
@ 2024-09-17 13:43 Patrick Steinhardt
  2024-09-17 13:43 ` [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-17 13:43 UTC (permalink / raw)
  To: git

Hi,

the reftable backend cannot properly handle concurrent writes due to two
reasons:

  - It will bail out immediately when it sees a locked "tables.list"
    file. This is being addressed by introducing a configurable locking
    timeout, similar to how we have it for both loose and packed refs.

  - It will bail out when it notices that its stack is out-of-date after
    having locked the "tables.list" file. This is addressed by reloading
    the stack as requested after locking it, which is fine because our
    transactional API would verify queued ref updates against their
    expected state after the lock was acquired anyway.

So with this patch series we can now spawn concurrent writers and they
are expected to succeed, which is demonstrated by the test added by the
last patch.

Thanks!

Patrick

Patrick Steinhardt (3):
  refs/reftable: introduce "reftable.lockTimeout"
  reftable/stack: allow locking of outdated stacks
  refs/reftable: reload locked stack when preparing transaction

 Documentation/config/reftable.txt |  7 ++++
 refs/reftable-backend.c           |  9 ++++-
 reftable/reftable-stack.h         | 13 +++++-
 reftable/reftable-writer.h        |  8 ++++
 reftable/stack.c                  | 38 ++++++++++++------
 t/t0610-reftable-basics.sh        | 58 ++++++++++++++++++++++++++
 t/unit-tests/t-reftable-stack.c   | 67 ++++++++++++++++++++++++++++++-
 7 files changed, 181 insertions(+), 19 deletions(-)

-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-17 13:43 [PATCH 0/3] reftable: graceful concurrent writes Patrick Steinhardt
@ 2024-09-17 13:43 ` Patrick Steinhardt
  2024-09-17 17:46   ` karthik nayak
  2024-09-17 13:43 ` [PATCH 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-17 13:43 UTC (permalink / raw)
  To: git

When multiple concurrent processes try to update references in a
repository they may try to lock the same lockfiles. This can happen even
when the updates are non-conflicting and can both be applied, so it
doesn't always make sense to abort the transaction immediately. Both the
"loose" and "packed" backends thus have a grace period that they wait
for the lock to be released that can be controlled via the config values
"core.filesRefLockTimeout" and "core.packedRefsTimeout", respectively.

The reftable backend doesn't have such a setting yet and instead fails
immediately when it sees such a lock. But the exact same concepts apply
here as they do apply to the other backends.

Introduce a new "reftable.lockTimeout" config that controls how long we
may wait for a "tables.list" lock to be released. The default value of
this config is 100ms, which is the same default as we have it for the
"loose" backend.

Note that even though we also lock individual tables, this config really
only applies to the "tables.list" file. This is because individual
tables are only ever locked when we already hold the "tables.list" lock
during compaction. When we observe such a lock we in fact do not want to
compact the table at all because it is already in the process of being
compacted by a concurrent process. So applying the same timeout here
would not make any sense and only delay progress.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/reftable.txt |  7 +++++++
 refs/reftable-backend.c           |  4 ++++
 reftable/reftable-writer.h        |  8 ++++++++
 reftable/stack.c                  | 18 ++++++++++++------
 t/t0610-reftable-basics.sh        | 27 +++++++++++++++++++++++++++
 5 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/Documentation/config/reftable.txt b/Documentation/config/reftable.txt
index 05157279778..6c9449d0232 100644
--- a/Documentation/config/reftable.txt
+++ b/Documentation/config/reftable.txt
@@ -46,3 +46,10 @@ reftable.geometricFactor::
 By default, the geometric sequence uses a factor of 2, meaning that for any
 table, the next-biggest table must at least be twice as big. A maximum factor
 of 256 is supported.
+
+reftable.lockTimeout::
+	Whenever the reftable backend appends a new table to the stack, it has
+	to lock the central "tables.list" file before updating it. This config
+	controls how long the process will wait to acquire the lock in case
+	another process has already acquired it. Default is 1000 (i.e., retry
+	for 1 second).
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1c4b19e737f..e90ddfb98dd 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -256,6 +256,9 @@ static int reftable_be_config(const char *var, const char *value,
 		if (factor > UINT8_MAX)
 			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
 		opts->auto_compaction_factor = factor;
+	} else if (!strcmp(var, "reftable.locktimeout")) {
+		unsigned long lock_timeout = git_config_ulong(var, value, ctx->kvi);
+		opts->lock_timeout_ms = lock_timeout;
 	}
 
 	return 0;
@@ -281,6 +284,7 @@ static struct ref_store *reftable_be_init(struct repository *repo,
 	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
 	refs->write_options.disable_auto_compact =
 		!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
+	refs->write_options.lock_timeout_ms = 100;
 
 	git_config(reftable_be_config, &refs->write_options);
 
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 189b1f4144f..d5f2446220b 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -51,6 +51,14 @@ struct reftable_write_options {
 	 * tables to compact. Defaults to 2 if unset.
 	 */
 	uint8_t auto_compaction_factor;
+
+	/*
+	 * The number of milliseconds to wait when trying to lock "tables.list".
+	 * Note that this does not apply to locking individual tables, as these
+	 * should only ever be locked when already holding the "tables.list"
+	 * lock.
+	 */
+	unsigned lock_timeout_ms;
 };
 
 /* reftable_block_stats holds statistics for a single block type */
diff --git a/reftable/stack.c b/reftable/stack.c
index ce0a35216ba..5ccad2cff31 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -603,8 +603,10 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->stack = st;
 
-	err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&add->tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST) {
 			err = REFTABLE_LOCK_ERROR;
@@ -1056,8 +1058,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * Hold the lock so that we can read "tables.list" and lock all tables
 	 * which are part of the user-specified range.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
@@ -1156,8 +1160,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * "tables.list". We'll then replace the compacted range of tables with
 	 * the new table.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 37510c2b2a5..62da3e37823 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -423,6 +423,33 @@ test_expect_success 'ref transaction: fails gracefully when auto compaction fail
 	)
 '
 
+test_expect_success 'ref transaction: timeout acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		>.git/reftable/tables.list.lock &&
+		test_must_fail git update-ref refs/heads/branch HEAD 2>err &&
+		test_grep "cannot lock references" err
+	)
+'
+
+test_expect_success 'ref transaction: retry acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		LOCK=.git/reftable/tables.list.lock &&
+		>$LOCK &&
+		{
+			( sleep 1 && rm -f $LOCK ) &
+		} &&
+		git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/3] reftable/stack: allow locking of outdated stacks
  2024-09-17 13:43 [PATCH 0/3] reftable: graceful concurrent writes Patrick Steinhardt
  2024-09-17 13:43 ` [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
@ 2024-09-17 13:43 ` Patrick Steinhardt
  2024-09-17 13:43 ` [PATCH 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-17 13:43 UTC (permalink / raw)
  To: git

In `reftable_stack_new_addition()` we first lock the stack and then
check whether it is still up-to-date. If it is not we return an error to
the caller indicating that the stack is outdated.

This is overly restrictive in our ref transaction interface though: we
lock the stack right before we start to verify the transaction, so we do
not really care whether it is outdated or not. What we really want is
that the stack is up-to-date after it has been locked so that we can
verify queued updates against its current state while we know that it is
locked for concurrent modification.

Introduce a new flag `REFTABLE_STACK_NEW_ADDITION_RELOAD` that alters
the behaviour of `reftable_stack_init_addition()` in this case: when we
notice that it is out-of-date we reload it instead of returning an error
to the caller.

This logic will be wired up in the reftable backend in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c         |  4 +-
 reftable/reftable-stack.h       | 13 ++++++-
 reftable/stack.c                | 20 ++++++----
 t/unit-tests/t-reftable-stack.c | 67 ++++++++++++++++++++++++++++++++-
 4 files changed, 91 insertions(+), 13 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index e90ddfb98dd..c500fb820a7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -766,7 +766,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		if (ret)
 			return ret;
 
-		ret = reftable_stack_new_addition(&addition, stack);
+		ret = reftable_stack_new_addition(&addition, stack, 0);
 		if (ret) {
 			if (ret == REFTABLE_LOCK_ERROR)
 				strbuf_addstr(err, "cannot lock references");
@@ -2203,7 +2203,7 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	if (ret < 0)
 		goto done;
 
-	ret = reftable_stack_new_addition(&add, stack);
+	ret = reftable_stack_new_addition(&add, stack, 0);
 	if (ret < 0)
 		goto done;
 
diff --git a/reftable/reftable-stack.h b/reftable/reftable-stack.h
index f4f8cabc7fb..67316b215ed 100644
--- a/reftable/reftable-stack.h
+++ b/reftable/reftable-stack.h
@@ -37,12 +37,21 @@ uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
 /* holds a transaction to add tables at the top of a stack. */
 struct reftable_addition;
 
+enum {
+	/*
+	 * Reload the stack when the stack is out-of-date after locking it.
+	 */
+	REFTABLE_STACK_NEW_ADDITION_RELOAD = (1 << 0),
+};
+
 /*
  * returns a new transaction to add reftables to the given stack. As a side
- * effect, the ref database is locked.
+ * effect, the ref database is locked. Accepts REFTABLE_STACK_NEW_ADDITION_*
+ * flags.
  */
 int reftable_stack_new_addition(struct reftable_addition **dest,
-				struct reftable_stack *st);
+				struct reftable_stack *st,
+				int flags);
 
 /* Adds a reftable to transaction. */
 int reftable_addition_add(struct reftable_addition *add,
diff --git a/reftable/stack.c b/reftable/stack.c
index 5ccad2cff31..f9c95d5fa62 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -596,7 +596,8 @@ struct reftable_addition {
 #define REFTABLE_ADDITION_INIT {0}
 
 static int reftable_stack_init_addition(struct reftable_addition *add,
-					struct reftable_stack *st)
+					struct reftable_stack *st,
+					int flags)
 {
 	struct strbuf lock_file_name = STRBUF_INIT;
 	int err;
@@ -626,6 +627,11 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 	err = stack_uptodate(st);
 	if (err < 0)
 		goto done;
+	if (err > 0 && flags & REFTABLE_STACK_NEW_ADDITION_RELOAD) {
+		err = reftable_stack_reload_maybe_reuse(add->stack, 1);
+		if (err)
+			goto done;
+	}
 	if (err > 0) {
 		err = REFTABLE_OUTDATED_ERROR;
 		goto done;
@@ -633,9 +639,8 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->next_update_index = reftable_stack_next_update_index(st);
 done:
-	if (err) {
+	if (err)
 		reftable_addition_close(add);
-	}
 	strbuf_release(&lock_file_name);
 	return err;
 }
@@ -739,13 +744,14 @@ int reftable_addition_commit(struct reftable_addition *add)
 }
 
 int reftable_stack_new_addition(struct reftable_addition **dest,
-				struct reftable_stack *st)
+				struct reftable_stack *st,
+				int flags)
 {
 	int err = 0;
 	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
 	REFTABLE_CALLOC_ARRAY(*dest, 1);
 	**dest = empty;
-	err = reftable_stack_init_addition(*dest, st);
+	err = reftable_stack_init_addition(*dest, st, flags);
 	if (err) {
 		reftable_free(*dest);
 		*dest = NULL;
@@ -759,7 +765,7 @@ static int stack_try_add(struct reftable_stack *st,
 			 void *arg)
 {
 	struct reftable_addition add = REFTABLE_ADDITION_INIT;
-	int err = reftable_stack_init_addition(&add, st);
+	int err = reftable_stack_init_addition(&add, st, 0);
 	if (err < 0)
 		goto done;
 
@@ -1608,7 +1614,7 @@ static int reftable_stack_clean_locked(struct reftable_stack *st)
 int reftable_stack_clean(struct reftable_stack *st)
 {
 	struct reftable_addition *add = NULL;
-	int err = reftable_stack_new_addition(&add, st);
+	int err = reftable_stack_new_addition(&add, st, 0);
 	if (err < 0) {
 		goto done;
 	}
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index d62a9c1bed5..a37cc698d87 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -271,7 +271,7 @@ static void t_reftable_stack_transaction_api(void)
 
 	reftable_addition_destroy(add);
 
-	err = reftable_stack_new_addition(&add, st);
+	err = reftable_stack_new_addition(&add, st, 0);
 	check(!err);
 
 	err = reftable_addition_add(add, write_test_ref, &ref);
@@ -292,6 +292,68 @@ static void t_reftable_stack_transaction_api(void)
 	clear_dir(dir);
 }
 
+static void t_reftable_stack_transaction_with_reload(void)
+{
+	char *dir = get_tmp_dir(__LINE__);
+	struct reftable_stack *st1 = NULL, *st2 = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+	struct reftable_ref_record refs[2] = {
+		{
+			.refname = (char *) "refs/heads/a",
+			.update_index = 1,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { '1' },
+		},
+		{
+			.refname = (char *) "refs/heads/b",
+			.update_index = 2,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { '1' },
+		},
+	};
+	struct reftable_ref_record ref = { 0 };
+
+	err = reftable_new_stack(&st1, dir, NULL);
+	check(!err);
+	err = reftable_new_stack(&st2, dir, NULL);
+	check(!err);
+
+	err = reftable_stack_new_addition(&add, st1, 0);
+	check(!err);
+	err = reftable_addition_add(add, write_test_ref, &refs[0]);
+	check(!err);
+	err = reftable_addition_commit(add);
+	check(!err);
+	reftable_addition_destroy(add);
+
+	/*
+	 * The second stack is now outdated, which we should notice. We do not
+	 * create the addition and lock the stack by default, but allow the
+	 * reload to happen when REFTABLE_STACK_NEW_ADDITION_RELOAD is set.
+	 */
+	err = reftable_stack_new_addition(&add, st2, 0);
+	check_int(err, ==, REFTABLE_OUTDATED_ERROR);
+	err = reftable_stack_new_addition(&add, st2, REFTABLE_STACK_NEW_ADDITION_RELOAD);
+	check(!err);
+	err = reftable_addition_add(add, write_test_ref, &refs[1]);
+	check(!err);
+	err = reftable_addition_commit(add);
+	check(!err);
+	reftable_addition_destroy(add);
+
+	for (size_t i = 0; i < ARRAY_SIZE(refs); i++) {
+		err = reftable_stack_read_ref(st2, refs[i].refname, &ref);
+		check(!err);
+		check(reftable_ref_record_equal(&refs[i], &ref, GIT_SHA1_RAWSZ));
+	}
+
+	reftable_ref_record_release(&ref);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	clear_dir(dir);
+}
+
 static void t_reftable_stack_transaction_api_performs_auto_compaction(void)
 {
 	char *dir = get_tmp_dir(__LINE__);
@@ -322,7 +384,7 @@ static void t_reftable_stack_transaction_api_performs_auto_compaction(void)
 		 */
 		st->opts.disable_auto_compact = i != n;
 
-		err = reftable_stack_new_addition(&add, st);
+		err = reftable_stack_new_addition(&add, st, 0);
 		check(!err);
 
 		err = reftable_addition_add(add, write_test_ref, &ref);
@@ -1314,6 +1376,7 @@ int cmd_main(int argc UNUSED, const char *argv[] UNUSED)
 	TEST(t_reftable_stack_reload_with_missing_table(), "stack iteration with garbage tables");
 	TEST(t_reftable_stack_tombstone(), "'tombstone' refs in stack");
 	TEST(t_reftable_stack_transaction_api(), "update transaction to stack");
+	TEST(t_reftable_stack_transaction_with_reload(), "transaction with reload");
 	TEST(t_reftable_stack_transaction_api_performs_auto_compaction(), "update transaction triggers auto-compaction");
 	TEST(t_reftable_stack_update_index_check(), "update transactions with equal update indices");
 	TEST(t_reftable_stack_uptodate(), "stack must be reloaded before ref update");
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-17 13:43 [PATCH 0/3] reftable: graceful concurrent writes Patrick Steinhardt
  2024-09-17 13:43 ` [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
  2024-09-17 13:43 ` [PATCH 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
@ 2024-09-17 13:43 ` Patrick Steinhardt
  2024-09-17 18:26 ` [PATCH 0/3] reftable: graceful concurrent writes karthik nayak
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-17 13:43 UTC (permalink / raw)
  To: git

When starting a reftable transaction we lock all stacks we are about to
modify. While it may happen that the stack is out-of-date at this point
in time we don't really care: transactional updates encode the expected
state of a certain reference, so all that we really want to verify is
that the _current_ value matches that expected state.

Pass `REFTABLE_STACK_NEW_ADDITION_RELOAD` when locking the stack such
that an out-of-date stack will be reloaded after having been locked.
This change is safe because all verifications of the expected state
happen after this step anyway.

Add a testcase that verifies that many writers are now able to write to
the stack concurrently without failures and with a deterministic end
result.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c    |  3 ++-
 t/t0610-reftable-basics.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index c500fb820a7..d4b383ca179 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -766,7 +766,8 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		if (ret)
 			return ret;
 
-		ret = reftable_stack_new_addition(&addition, stack, 0);
+		ret = reftable_stack_new_addition(&addition, stack,
+						  REFTABLE_STACK_NEW_ADDITION_RELOAD);
 		if (ret) {
 			if (ret == REFTABLE_LOCK_ERROR)
 				strbuf_addstr(err, "cannot lock references");
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 62da3e37823..2d951c8ceb6 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,6 +450,37 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
+test_expect_success 'ref transaction: many concurrent writers' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		# Set a high timeout such that a busy CI machine will not abort
+		# early. 10 seconds should hopefully be ample of time to make
+		# this non-flaky.
+		git config set reftable.lockTimeout 10000 &&
+		test_commit --no-tag initial &&
+
+		head=$(git rev-parse HEAD) &&
+		for i in $(test_seq 100)
+		do
+			printf "%s commit\trefs/heads/branch-%s\n" "$head" "$i" ||
+			return 1
+		done >expect &&
+		printf "%s commit\trefs/heads/main\n" "$head" >>expect &&
+
+		for i in $(test_seq 100)
+		do
+			{ git update-ref refs/heads/branch-$i HEAD& } ||
+			return 1
+		done &&
+
+		wait &&
+		git for-each-ref --sort=v:refname >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-17 13:43 ` [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
@ 2024-09-17 17:46   ` karthik nayak
  2024-09-17 17:50     ` Eric Sunshine
  2024-09-18  4:31     ` Patrick Steinhardt
  0 siblings, 2 replies; 42+ messages in thread
From: karthik nayak @ 2024-09-17 17:46 UTC (permalink / raw)
  To: Patrick Steinhardt, git

[-- Attachment #1: Type: text/plain, Size: 1007 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

[snip]

> +reftable.lockTimeout::
> +	Whenever the reftable backend appends a new table to the stack, it has
> +	to lock the central "tables.list" file before updating it. This config
> +	controls how long the process will wait to acquire the lock in case
> +	another process has already acquired it. Default is 1000 (i.e., retry
> +	for 1 second).

Isn't the default 100ms? As that was what was mentioned in the commit
message

[snip]

> +test_expect_success 'ref transaction: retry acquiring tables.list lock' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit initial &&
> +		LOCK=.git/reftable/tables.list.lock &&
> +		>$LOCK &&
> +		{
> +			( sleep 1 && rm -f $LOCK ) &
> +		} &&
> +		git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
> +	)
> +'

Nit: This does stall the test for 1s. Which is slightly annoying when
running single tests locally. Couldn't we achieve this by doing `sleep
0.1`?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-17 17:46   ` karthik nayak
@ 2024-09-17 17:50     ` Eric Sunshine
  2024-09-18  4:31       ` Patrick Steinhardt
  2024-09-18  4:31     ` Patrick Steinhardt
  1 sibling, 1 reply; 42+ messages in thread
From: Eric Sunshine @ 2024-09-17 17:50 UTC (permalink / raw)
  To: karthik nayak; +Cc: Patrick Steinhardt, git

On Tue, Sep 17, 2024 at 1:46 PM karthik nayak <karthik.188@gmail.com> wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > +test_expect_success 'ref transaction: retry acquiring tables.list lock' '
> > +     test_when_finished "rm -rf repo" &&
> > +     git init repo &&
> > +     (
> > +             cd repo &&
> > +             test_commit initial &&
> > +             LOCK=.git/reftable/tables.list.lock &&
> > +             >$LOCK &&
> > +             {
> > +                     ( sleep 1 && rm -f $LOCK ) &
> > +             } &&
> > +             git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
> > +     )
> > +'
>
> Nit: This does stall the test for 1s. Which is slightly annoying when
> running single tests locally. Couldn't we achieve this by doing `sleep
> 0.1`?

`sleep 0.1` is neither POSIX nor portable.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/3] reftable: graceful concurrent writes
  2024-09-17 13:43 [PATCH 0/3] reftable: graceful concurrent writes Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2024-09-17 13:43 ` [PATCH 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
@ 2024-09-17 18:26 ` karthik nayak
  2024-09-18  4:31   ` Patrick Steinhardt
  2024-09-18  4:32 ` [PATCH v2 " Patrick Steinhardt
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: karthik nayak @ 2024-09-17 18:26 UTC (permalink / raw)
  To: Patrick Steinhardt, git

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> the reftable backend cannot properly handle concurrent writes due to two
> reasons:
>
>   - It will bail out immediately when it sees a locked "tables.list"
>     file. This is being addressed by introducing a configurable locking
>     timeout, similar to how we have it for both loose and packed refs.
>
>   - It will bail out when it notices that its stack is out-of-date after
>     having locked the "tables.list" file. This is addressed by reloading
>     the stack as requested after locking it, which is fine because our
>     transactional API would verify queued ref updates against their
>     expected state after the lock was acquired anyway.
>
> So with this patch series we can now spawn concurrent writers and they
> are expected to succeed, which is demonstrated by the test added by the
> last patch.
>

I only had a comment in the first commit. The rest two look good to me.
I do wonder if we really need a flag? But that's just a nit.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-17 17:50     ` Eric Sunshine
@ 2024-09-18  4:31       ` Patrick Steinhardt
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  4:31 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: karthik nayak, git

On Tue, Sep 17, 2024 at 01:50:27PM -0400, Eric Sunshine wrote:
> On Tue, Sep 17, 2024 at 1:46 PM karthik nayak <karthik.188@gmail.com> wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> > > +test_expect_success 'ref transaction: retry acquiring tables.list lock' '
> > > +     test_when_finished "rm -rf repo" &&
> > > +     git init repo &&
> > > +     (
> > > +             cd repo &&
> > > +             test_commit initial &&
> > > +             LOCK=.git/reftable/tables.list.lock &&
> > > +             >$LOCK &&
> > > +             {
> > > +                     ( sleep 1 && rm -f $LOCK ) &
> > > +             } &&
> > > +             git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
> > > +     )
> > > +'
> >
> > Nit: This does stall the test for 1s. Which is slightly annoying when
> > running single tests locally. Couldn't we achieve this by doing `sleep
> > 0.1`?
> 
> `sleep 0.1` is neither POSIX nor portable.

The above test also verifies that the timeout can in fact be overridden.
If we only had `sleep 0.1` we wouldn't be able to reliably verify that,
as the default is a timeout of 100ms.

We also have essentially the same tests in t0601 for packed-refs, I
mostly just copied the logic over from there. It's not exactly pretty,
and I'm not a huge fan of introducing timing sensitive tests, but in the
end I guess it's okayish.

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-17 17:46   ` karthik nayak
  2024-09-17 17:50     ` Eric Sunshine
@ 2024-09-18  4:31     ` Patrick Steinhardt
  1 sibling, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  4:31 UTC (permalink / raw)
  To: karthik nayak; +Cc: git

On Tue, Sep 17, 2024 at 01:46:39PM -0400, karthik nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> [snip]
> 
> > +reftable.lockTimeout::
> > +	Whenever the reftable backend appends a new table to the stack, it has
> > +	to lock the central "tables.list" file before updating it. This config
> > +	controls how long the process will wait to acquire the lock in case
> > +	another process has already acquired it. Default is 1000 (i.e., retry
> > +	for 1 second).
> 
> Isn't the default 100ms? As that was what was mentioned in the commit
> message

Oh, yeah. I ended up changing it. 1 second is the default timeout used
by packed-refs, 100ms is the default timeout used by loose refs. The
reason why I ultimately picked 100ms over 1s is that our usecase is
closer to loose refs, because it is our "normal" code path that we hit
whenever we write reftables.

Will fix.

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/3] reftable: graceful concurrent writes
  2024-09-17 18:26 ` [PATCH 0/3] reftable: graceful concurrent writes karthik nayak
@ 2024-09-18  4:31   ` Patrick Steinhardt
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  4:31 UTC (permalink / raw)
  To: karthik nayak; +Cc: git

On Tue, Sep 17, 2024 at 11:26:58AM -0700, karthik nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Hi,
> >
> > the reftable backend cannot properly handle concurrent writes due to two
> > reasons:
> >
> >   - It will bail out immediately when it sees a locked "tables.list"
> >     file. This is being addressed by introducing a configurable locking
> >     timeout, similar to how we have it for both loose and packed refs.
> >
> >   - It will bail out when it notices that its stack is out-of-date after
> >     having locked the "tables.list" file. This is addressed by reloading
> >     the stack as requested after locking it, which is fine because our
> >     transactional API would verify queued ref updates against their
> >     expected state after the lock was acquired anyway.
> >
> > So with this patch series we can now spawn concurrent writers and they
> > are expected to succeed, which is demonstrated by the test added by the
> > last patch.
> >
> 
> I only had a comment in the first commit. The rest two look good to me.
> I do wonder if we really need a flag? But that's just a nit.

I didn't carefully vet all locations where we could pass that flag, so
right now we only pass it in a single location. Also, we need to keep in
mind that this is library code: even if we had converted all callsites
to pass the new flag I still it's a sensibe thing to wish for to disable
the automatic refresh and abort instead.

Thanks for your review!

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 0/3] reftable: graceful concurrent writes
  2024-09-17 13:43 [PATCH 0/3] reftable: graceful concurrent writes Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2024-09-17 18:26 ` [PATCH 0/3] reftable: graceful concurrent writes karthik nayak
@ 2024-09-18  4:32 ` Patrick Steinhardt
  2024-09-18  4:32   ` [PATCH v2 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
                     ` (3 more replies)
  2024-09-18  9:59 ` [PATCH v3 " Patrick Steinhardt
  2024-09-24  5:32 ` [PATCH v4 " Patrick Steinhardt
  6 siblings, 4 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  4:32 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine

Hi,

this is the second patch series that makes graceful concurrent writes
work with the reftable backend.

There is only a single change compared to v1, namely a fix to the docs
to mention the right default timeout value.

Thanks!

Patrick

Patrick Steinhardt (3):
  refs/reftable: introduce "reftable.lockTimeout"
  reftable/stack: allow locking of outdated stacks
  refs/reftable: reload locked stack when preparing transaction

 Documentation/config/reftable.txt |  7 ++++
 refs/reftable-backend.c           |  9 ++++-
 reftable/reftable-stack.h         | 13 +++++-
 reftable/reftable-writer.h        |  8 ++++
 reftable/stack.c                  | 38 ++++++++++++------
 t/t0610-reftable-basics.sh        | 58 ++++++++++++++++++++++++++
 t/unit-tests/t-reftable-stack.c   | 67 ++++++++++++++++++++++++++++++-
 7 files changed, 181 insertions(+), 19 deletions(-)

Range-diff against v1:
1:  ca3eab99f7e ! 1:  700a35df125 refs/reftable: introduce "reftable.lockTimeout"
    @@ Documentation/config/reftable.txt: reftable.geometricFactor::
     +	Whenever the reftable backend appends a new table to the stack, it has
     +	to lock the central "tables.list" file before updating it. This config
     +	controls how long the process will wait to acquire the lock in case
    -+	another process has already acquired it. Default is 1000 (i.e., retry
    -+	for 1 second).
    ++	another process has already acquired it. Default is 100 (i.e., retry
    ++	for 100ms).
     
      ## refs/reftable-backend.c ##
     @@ refs/reftable-backend.c: static int reftable_be_config(const char *var, const char *value,
2:  cd65e6d57b0 = 2:  f4be0966e17 reftable/stack: allow locking of outdated stacks
3:  37ec3acbff0 = 3:  111b497ef17 refs/reftable: reload locked stack when preparing transaction
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-18  4:32 ` [PATCH v2 " Patrick Steinhardt
@ 2024-09-18  4:32   ` Patrick Steinhardt
  2024-09-18  9:22     ` James Liu
  2024-09-18  4:32   ` [PATCH v2 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  4:32 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine

When multiple concurrent processes try to update references in a
repository they may try to lock the same lockfiles. This can happen even
when the updates are non-conflicting and can both be applied, so it
doesn't always make sense to abort the transaction immediately. Both the
"loose" and "packed" backends thus have a grace period that they wait
for the lock to be released that can be controlled via the config values
"core.filesRefLockTimeout" and "core.packedRefsTimeout", respectively.

The reftable backend doesn't have such a setting yet and instead fails
immediately when it sees such a lock. But the exact same concepts apply
here as they do apply to the other backends.

Introduce a new "reftable.lockTimeout" config that controls how long we
may wait for a "tables.list" lock to be released. The default value of
this config is 100ms, which is the same default as we have it for the
"loose" backend.

Note that even though we also lock individual tables, this config really
only applies to the "tables.list" file. This is because individual
tables are only ever locked when we already hold the "tables.list" lock
during compaction. When we observe such a lock we in fact do not want to
compact the table at all because it is already in the process of being
compacted by a concurrent process. So applying the same timeout here
would not make any sense and only delay progress.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/reftable.txt |  7 +++++++
 refs/reftable-backend.c           |  4 ++++
 reftable/reftable-writer.h        |  8 ++++++++
 reftable/stack.c                  | 18 ++++++++++++------
 t/t0610-reftable-basics.sh        | 27 +++++++++++++++++++++++++++
 5 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/Documentation/config/reftable.txt b/Documentation/config/reftable.txt
index 05157279778..abb793bcbb0 100644
--- a/Documentation/config/reftable.txt
+++ b/Documentation/config/reftable.txt
@@ -46,3 +46,10 @@ reftable.geometricFactor::
 By default, the geometric sequence uses a factor of 2, meaning that for any
 table, the next-biggest table must at least be twice as big. A maximum factor
 of 256 is supported.
+
+reftable.lockTimeout::
+	Whenever the reftable backend appends a new table to the stack, it has
+	to lock the central "tables.list" file before updating it. This config
+	controls how long the process will wait to acquire the lock in case
+	another process has already acquired it. Default is 100 (i.e., retry
+	for 100ms).
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1c4b19e737f..e90ddfb98dd 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -256,6 +256,9 @@ static int reftable_be_config(const char *var, const char *value,
 		if (factor > UINT8_MAX)
 			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
 		opts->auto_compaction_factor = factor;
+	} else if (!strcmp(var, "reftable.locktimeout")) {
+		unsigned long lock_timeout = git_config_ulong(var, value, ctx->kvi);
+		opts->lock_timeout_ms = lock_timeout;
 	}
 
 	return 0;
@@ -281,6 +284,7 @@ static struct ref_store *reftable_be_init(struct repository *repo,
 	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
 	refs->write_options.disable_auto_compact =
 		!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
+	refs->write_options.lock_timeout_ms = 100;
 
 	git_config(reftable_be_config, &refs->write_options);
 
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 189b1f4144f..d5f2446220b 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -51,6 +51,14 @@ struct reftable_write_options {
 	 * tables to compact. Defaults to 2 if unset.
 	 */
 	uint8_t auto_compaction_factor;
+
+	/*
+	 * The number of milliseconds to wait when trying to lock "tables.list".
+	 * Note that this does not apply to locking individual tables, as these
+	 * should only ever be locked when already holding the "tables.list"
+	 * lock.
+	 */
+	unsigned lock_timeout_ms;
 };
 
 /* reftable_block_stats holds statistics for a single block type */
diff --git a/reftable/stack.c b/reftable/stack.c
index ce0a35216ba..5ccad2cff31 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -603,8 +603,10 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->stack = st;
 
-	err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&add->tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST) {
 			err = REFTABLE_LOCK_ERROR;
@@ -1056,8 +1058,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * Hold the lock so that we can read "tables.list" and lock all tables
 	 * which are part of the user-specified range.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
@@ -1156,8 +1160,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * "tables.list". We'll then replace the compacted range of tables with
 	 * the new table.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 37510c2b2a5..62da3e37823 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -423,6 +423,33 @@ test_expect_success 'ref transaction: fails gracefully when auto compaction fail
 	)
 '
 
+test_expect_success 'ref transaction: timeout acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		>.git/reftable/tables.list.lock &&
+		test_must_fail git update-ref refs/heads/branch HEAD 2>err &&
+		test_grep "cannot lock references" err
+	)
+'
+
+test_expect_success 'ref transaction: retry acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		LOCK=.git/reftable/tables.list.lock &&
+		>$LOCK &&
+		{
+			( sleep 1 && rm -f $LOCK ) &
+		} &&
+		git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/3] reftable/stack: allow locking of outdated stacks
  2024-09-18  4:32 ` [PATCH v2 " Patrick Steinhardt
  2024-09-18  4:32   ` [PATCH v2 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
@ 2024-09-18  4:32   ` Patrick Steinhardt
  2024-09-18  9:26     ` James Liu
  2024-09-18  4:32   ` [PATCH v2 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
  2024-09-18  9:33   ` [PATCH v2 0/3] reftable: graceful concurrent writes James Liu
  3 siblings, 1 reply; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  4:32 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine

In `reftable_stack_new_addition()` we first lock the stack and then
check whether it is still up-to-date. If it is not we return an error to
the caller indicating that the stack is outdated.

This is overly restrictive in our ref transaction interface though: we
lock the stack right before we start to verify the transaction, so we do
not really care whether it is outdated or not. What we really want is
that the stack is up-to-date after it has been locked so that we can
verify queued updates against its current state while we know that it is
locked for concurrent modification.

Introduce a new flag `REFTABLE_STACK_NEW_ADDITION_RELOAD` that alters
the behaviour of `reftable_stack_init_addition()` in this case: when we
notice that it is out-of-date we reload it instead of returning an error
to the caller.

This logic will be wired up in the reftable backend in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c         |  4 +-
 reftable/reftable-stack.h       | 13 ++++++-
 reftable/stack.c                | 20 ++++++----
 t/unit-tests/t-reftable-stack.c | 67 ++++++++++++++++++++++++++++++++-
 4 files changed, 91 insertions(+), 13 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index e90ddfb98dd..c500fb820a7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -766,7 +766,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		if (ret)
 			return ret;
 
-		ret = reftable_stack_new_addition(&addition, stack);
+		ret = reftable_stack_new_addition(&addition, stack, 0);
 		if (ret) {
 			if (ret == REFTABLE_LOCK_ERROR)
 				strbuf_addstr(err, "cannot lock references");
@@ -2203,7 +2203,7 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	if (ret < 0)
 		goto done;
 
-	ret = reftable_stack_new_addition(&add, stack);
+	ret = reftable_stack_new_addition(&add, stack, 0);
 	if (ret < 0)
 		goto done;
 
diff --git a/reftable/reftable-stack.h b/reftable/reftable-stack.h
index f4f8cabc7fb..67316b215ed 100644
--- a/reftable/reftable-stack.h
+++ b/reftable/reftable-stack.h
@@ -37,12 +37,21 @@ uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
 /* holds a transaction to add tables at the top of a stack. */
 struct reftable_addition;
 
+enum {
+	/*
+	 * Reload the stack when the stack is out-of-date after locking it.
+	 */
+	REFTABLE_STACK_NEW_ADDITION_RELOAD = (1 << 0),
+};
+
 /*
  * returns a new transaction to add reftables to the given stack. As a side
- * effect, the ref database is locked.
+ * effect, the ref database is locked. Accepts REFTABLE_STACK_NEW_ADDITION_*
+ * flags.
  */
 int reftable_stack_new_addition(struct reftable_addition **dest,
-				struct reftable_stack *st);
+				struct reftable_stack *st,
+				int flags);
 
 /* Adds a reftable to transaction. */
 int reftable_addition_add(struct reftable_addition *add,
diff --git a/reftable/stack.c b/reftable/stack.c
index 5ccad2cff31..f9c95d5fa62 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -596,7 +596,8 @@ struct reftable_addition {
 #define REFTABLE_ADDITION_INIT {0}
 
 static int reftable_stack_init_addition(struct reftable_addition *add,
-					struct reftable_stack *st)
+					struct reftable_stack *st,
+					int flags)
 {
 	struct strbuf lock_file_name = STRBUF_INIT;
 	int err;
@@ -626,6 +627,11 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 	err = stack_uptodate(st);
 	if (err < 0)
 		goto done;
+	if (err > 0 && flags & REFTABLE_STACK_NEW_ADDITION_RELOAD) {
+		err = reftable_stack_reload_maybe_reuse(add->stack, 1);
+		if (err)
+			goto done;
+	}
 	if (err > 0) {
 		err = REFTABLE_OUTDATED_ERROR;
 		goto done;
@@ -633,9 +639,8 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->next_update_index = reftable_stack_next_update_index(st);
 done:
-	if (err) {
+	if (err)
 		reftable_addition_close(add);
-	}
 	strbuf_release(&lock_file_name);
 	return err;
 }
@@ -739,13 +744,14 @@ int reftable_addition_commit(struct reftable_addition *add)
 }
 
 int reftable_stack_new_addition(struct reftable_addition **dest,
-				struct reftable_stack *st)
+				struct reftable_stack *st,
+				int flags)
 {
 	int err = 0;
 	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
 	REFTABLE_CALLOC_ARRAY(*dest, 1);
 	**dest = empty;
-	err = reftable_stack_init_addition(*dest, st);
+	err = reftable_stack_init_addition(*dest, st, flags);
 	if (err) {
 		reftable_free(*dest);
 		*dest = NULL;
@@ -759,7 +765,7 @@ static int stack_try_add(struct reftable_stack *st,
 			 void *arg)
 {
 	struct reftable_addition add = REFTABLE_ADDITION_INIT;
-	int err = reftable_stack_init_addition(&add, st);
+	int err = reftable_stack_init_addition(&add, st, 0);
 	if (err < 0)
 		goto done;
 
@@ -1608,7 +1614,7 @@ static int reftable_stack_clean_locked(struct reftable_stack *st)
 int reftable_stack_clean(struct reftable_stack *st)
 {
 	struct reftable_addition *add = NULL;
-	int err = reftable_stack_new_addition(&add, st);
+	int err = reftable_stack_new_addition(&add, st, 0);
 	if (err < 0) {
 		goto done;
 	}
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index d62a9c1bed5..a37cc698d87 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -271,7 +271,7 @@ static void t_reftable_stack_transaction_api(void)
 
 	reftable_addition_destroy(add);
 
-	err = reftable_stack_new_addition(&add, st);
+	err = reftable_stack_new_addition(&add, st, 0);
 	check(!err);
 
 	err = reftable_addition_add(add, write_test_ref, &ref);
@@ -292,6 +292,68 @@ static void t_reftable_stack_transaction_api(void)
 	clear_dir(dir);
 }
 
+static void t_reftable_stack_transaction_with_reload(void)
+{
+	char *dir = get_tmp_dir(__LINE__);
+	struct reftable_stack *st1 = NULL, *st2 = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+	struct reftable_ref_record refs[2] = {
+		{
+			.refname = (char *) "refs/heads/a",
+			.update_index = 1,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { '1' },
+		},
+		{
+			.refname = (char *) "refs/heads/b",
+			.update_index = 2,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { '1' },
+		},
+	};
+	struct reftable_ref_record ref = { 0 };
+
+	err = reftable_new_stack(&st1, dir, NULL);
+	check(!err);
+	err = reftable_new_stack(&st2, dir, NULL);
+	check(!err);
+
+	err = reftable_stack_new_addition(&add, st1, 0);
+	check(!err);
+	err = reftable_addition_add(add, write_test_ref, &refs[0]);
+	check(!err);
+	err = reftable_addition_commit(add);
+	check(!err);
+	reftable_addition_destroy(add);
+
+	/*
+	 * The second stack is now outdated, which we should notice. We do not
+	 * create the addition and lock the stack by default, but allow the
+	 * reload to happen when REFTABLE_STACK_NEW_ADDITION_RELOAD is set.
+	 */
+	err = reftable_stack_new_addition(&add, st2, 0);
+	check_int(err, ==, REFTABLE_OUTDATED_ERROR);
+	err = reftable_stack_new_addition(&add, st2, REFTABLE_STACK_NEW_ADDITION_RELOAD);
+	check(!err);
+	err = reftable_addition_add(add, write_test_ref, &refs[1]);
+	check(!err);
+	err = reftable_addition_commit(add);
+	check(!err);
+	reftable_addition_destroy(add);
+
+	for (size_t i = 0; i < ARRAY_SIZE(refs); i++) {
+		err = reftable_stack_read_ref(st2, refs[i].refname, &ref);
+		check(!err);
+		check(reftable_ref_record_equal(&refs[i], &ref, GIT_SHA1_RAWSZ));
+	}
+
+	reftable_ref_record_release(&ref);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	clear_dir(dir);
+}
+
 static void t_reftable_stack_transaction_api_performs_auto_compaction(void)
 {
 	char *dir = get_tmp_dir(__LINE__);
@@ -322,7 +384,7 @@ static void t_reftable_stack_transaction_api_performs_auto_compaction(void)
 		 */
 		st->opts.disable_auto_compact = i != n;
 
-		err = reftable_stack_new_addition(&add, st);
+		err = reftable_stack_new_addition(&add, st, 0);
 		check(!err);
 
 		err = reftable_addition_add(add, write_test_ref, &ref);
@@ -1314,6 +1376,7 @@ int cmd_main(int argc UNUSED, const char *argv[] UNUSED)
 	TEST(t_reftable_stack_reload_with_missing_table(), "stack iteration with garbage tables");
 	TEST(t_reftable_stack_tombstone(), "'tombstone' refs in stack");
 	TEST(t_reftable_stack_transaction_api(), "update transaction to stack");
+	TEST(t_reftable_stack_transaction_with_reload(), "transaction with reload");
 	TEST(t_reftable_stack_transaction_api_performs_auto_compaction(), "update transaction triggers auto-compaction");
 	TEST(t_reftable_stack_update_index_check(), "update transactions with equal update indices");
 	TEST(t_reftable_stack_uptodate(), "stack must be reloaded before ref update");
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-18  4:32 ` [PATCH v2 " Patrick Steinhardt
  2024-09-18  4:32   ` [PATCH v2 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
  2024-09-18  4:32   ` [PATCH v2 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
@ 2024-09-18  4:32   ` Patrick Steinhardt
  2024-09-18  9:33   ` [PATCH v2 0/3] reftable: graceful concurrent writes James Liu
  3 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  4:32 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine

When starting a reftable transaction we lock all stacks we are about to
modify. While it may happen that the stack is out-of-date at this point
in time we don't really care: transactional updates encode the expected
state of a certain reference, so all that we really want to verify is
that the _current_ value matches that expected state.

Pass `REFTABLE_STACK_NEW_ADDITION_RELOAD` when locking the stack such
that an out-of-date stack will be reloaded after having been locked.
This change is safe because all verifications of the expected state
happen after this step anyway.

Add a testcase that verifies that many writers are now able to write to
the stack concurrently without failures and with a deterministic end
result.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c    |  3 ++-
 t/t0610-reftable-basics.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index c500fb820a7..d4b383ca179 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -766,7 +766,8 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		if (ret)
 			return ret;
 
-		ret = reftable_stack_new_addition(&addition, stack, 0);
+		ret = reftable_stack_new_addition(&addition, stack,
+						  REFTABLE_STACK_NEW_ADDITION_RELOAD);
 		if (ret) {
 			if (ret == REFTABLE_LOCK_ERROR)
 				strbuf_addstr(err, "cannot lock references");
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 62da3e37823..2d951c8ceb6 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,6 +450,37 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
+test_expect_success 'ref transaction: many concurrent writers' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		# Set a high timeout such that a busy CI machine will not abort
+		# early. 10 seconds should hopefully be ample of time to make
+		# this non-flaky.
+		git config set reftable.lockTimeout 10000 &&
+		test_commit --no-tag initial &&
+
+		head=$(git rev-parse HEAD) &&
+		for i in $(test_seq 100)
+		do
+			printf "%s commit\trefs/heads/branch-%s\n" "$head" "$i" ||
+			return 1
+		done >expect &&
+		printf "%s commit\trefs/heads/main\n" "$head" >>expect &&
+
+		for i in $(test_seq 100)
+		do
+			{ git update-ref refs/heads/branch-$i HEAD& } ||
+			return 1
+		done &&
+
+		wait &&
+		git for-each-ref --sort=v:refname >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-18  4:32   ` [PATCH v2 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
@ 2024-09-18  9:22     ` James Liu
  2024-09-18  9:39       ` Patrick Steinhardt
  0 siblings, 1 reply; 42+ messages in thread
From: James Liu @ 2024-09-18  9:22 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: karthik nayak, Eric Sunshine

On Wed Sep 18, 2024 at 2:32 PM AEST, Patrick Steinhardt wrote:
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 1c4b19e737f..e90ddfb98dd 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -256,6 +256,9 @@ static int reftable_be_config(const char *var, const char *value,
>  		if (factor > UINT8_MAX)
>  			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
>  		opts->auto_compaction_factor = factor;
> +	} else if (!strcmp(var, "reftable.locktimeout")) {
> +		unsigned long lock_timeout = git_config_ulong(var, value, ctx->kvi);
> +		opts->lock_timeout_ms = lock_timeout;
>  	}

Do we need to support the `0` and `-1` values that are possible for
the "core.filesRefLockTimeout" and "core.packedRefsTimeout" timeouts
here as well?


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/3] reftable/stack: allow locking of outdated stacks
  2024-09-18  4:32   ` [PATCH v2 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
@ 2024-09-18  9:26     ` James Liu
  2024-09-18  9:39       ` Patrick Steinhardt
  0 siblings, 1 reply; 42+ messages in thread
From: James Liu @ 2024-09-18  9:26 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: karthik nayak, Eric Sunshine

I just want to check my understanding of this test, since I think it's
the first time I've reviewed anything using this test harness:

On Wed Sep 18, 2024 at 2:32 PM AEST, Patrick Steinhardt wrote:
> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
> index d62a9c1bed5..a37cc698d87 100644
> --- a/t/unit-tests/t-reftable-stack.c
> +++ b/t/unit-tests/t-reftable-stack.c
> @@ -271,7 +271,7 @@ static void t_reftable_stack_transaction_api(void)
>  
>  	reftable_addition_destroy(add);
>  
> -	err = reftable_stack_new_addition(&add, st);
> +	err = reftable_stack_new_addition(&add, st, 0);
>  	check(!err);
>  
>  	err = reftable_addition_add(add, write_test_ref, &ref);
> @@ -292,6 +292,68 @@ static void t_reftable_stack_transaction_api(void)
>  	clear_dir(dir);
>  }
>  
> +static void t_reftable_stack_transaction_with_reload(void)
> +{
> +	char *dir = get_tmp_dir(__LINE__);
> +	struct reftable_stack *st1 = NULL, *st2 = NULL;
> +	int err;
> +	struct reftable_addition *add = NULL;
> +	struct reftable_ref_record refs[2] = {
> +		{
> +			.refname = (char *) "refs/heads/a",
> +			.update_index = 1,
> +			.value_type = REFTABLE_REF_VAL1,
> +			.value.val1 = { '1' },
> +		},
> +		{
> +			.refname = (char *) "refs/heads/b",
> +			.update_index = 2,
> +			.value_type = REFTABLE_REF_VAL1,
> +			.value.val1 = { '1' },
> +		},
> +	};
> +	struct reftable_ref_record ref = { 0 };
> +

Create two reftable stacks that provide a view into the reftable tables
inside "dir".

> +	err = reftable_new_stack(&st1, dir, NULL);
> +	check(!err);
> +	err = reftable_new_stack(&st2, dir, NULL);
> +	check(!err);
> +

Successfully add refs[0] to the first stack using the transactional API.

> +	err = reftable_stack_new_addition(&add, st1, 0);
> +	check(!err);
> +	err = reftable_addition_add(add, write_test_ref, &refs[0]);
> +	check(!err);
> +	err = reftable_addition_commit(add);
> +	check(!err);
> +	reftable_addition_destroy(add);
> +
> +	/*
> +	 * The second stack is now outdated, which we should notice. We do not
> +	 * create the addition and lock the stack by default, but allow the
> +	 * reload to happen when REFTABLE_STACK_NEW_ADDITION_RELOAD is set.
> +	 */

We try to open a transaction via the second reftable stack, but the
this stack is outdated because we've written to "dir" when the previous
stack addition was committed.

> +	err = reftable_stack_new_addition(&add, st2, 0);
> +	check_int(err, ==, REFTABLE_OUTDATED_ERROR);

Try again, but supply the flag so it performs a reload internally. Write
refs[1] to "dir" by committing the transaction. 

> +	err = reftable_stack_new_addition(&add, st2, REFTABLE_STACK_NEW_ADDITION_RELOAD);
> +	check(!err);
> +	err = reftable_addition_add(add, write_test_ref, &refs[1]);
> +	check(!err);
> +	err = reftable_addition_commit(add);
> +	check(!err);
> +	reftable_addition_destroy(add);
> +

Asserts.

> +	for (size_t i = 0; i < ARRAY_SIZE(refs); i++) {
> +		err = reftable_stack_read_ref(st2, refs[i].refname, &ref);
> +		check(!err);
> +		check(reftable_ref_record_equal(&refs[i], &ref, GIT_SHA1_RAWSZ));
> +	}
> +
> +	reftable_ref_record_release(&ref);
> +	reftable_stack_destroy(st1);
> +	reftable_stack_destroy(st2);
> +	clear_dir(dir);
> +}
> +


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/3] reftable: graceful concurrent writes
  2024-09-18  4:32 ` [PATCH v2 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-09-18  4:32   ` [PATCH v2 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
@ 2024-09-18  9:33   ` James Liu
  3 siblings, 0 replies; 42+ messages in thread
From: James Liu @ 2024-09-18  9:33 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: karthik nayak, Eric Sunshine

Thanks Patrick! I've just left two comments.

Cheers,
James

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-18  9:22     ` James Liu
@ 2024-09-18  9:39       ` Patrick Steinhardt
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  9:39 UTC (permalink / raw)
  To: James Liu; +Cc: git, karthik nayak, Eric Sunshine

On Wed, Sep 18, 2024 at 07:22:28PM +1000, James Liu wrote:
> On Wed Sep 18, 2024 at 2:32 PM AEST, Patrick Steinhardt wrote:
> > diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> > index 1c4b19e737f..e90ddfb98dd 100644
> > --- a/refs/reftable-backend.c
> > +++ b/refs/reftable-backend.c
> > @@ -256,6 +256,9 @@ static int reftable_be_config(const char *var, const char *value,
> >  		if (factor > UINT8_MAX)
> >  			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
> >  		opts->auto_compaction_factor = factor;
> > +	} else if (!strcmp(var, "reftable.locktimeout")) {
> > +		unsigned long lock_timeout = git_config_ulong(var, value, ctx->kvi);
> > +		opts->lock_timeout_ms = lock_timeout;
> >  	}
> 
> Do we need to support the `0` and `-1` values that are possible for
> the "core.filesRefLockTimeout" and "core.packedRefsTimeout" timeouts
> here as well?

We already handle `0`, which is provided by the underlying lockfile
interface. But we don't handle `-1` yet. I guess wiring it up does make
sense indeed, even if it is just to be consistent with the other
timeouts.

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/3] reftable/stack: allow locking of outdated stacks
  2024-09-18  9:26     ` James Liu
@ 2024-09-18  9:39       ` Patrick Steinhardt
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  9:39 UTC (permalink / raw)
  To: James Liu; +Cc: git, karthik nayak, Eric Sunshine

On Wed, Sep 18, 2024 at 07:26:49PM +1000, James Liu wrote:
> I just want to check my understanding of this test, since I think it's
> the first time I've reviewed anything using this test harness:
> 
> On Wed Sep 18, 2024 at 2:32 PM AEST, Patrick Steinhardt wrote:
> > diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
> > index d62a9c1bed5..a37cc698d87 100644
> > --- a/t/unit-tests/t-reftable-stack.c
> > +++ b/t/unit-tests/t-reftable-stack.c
> > @@ -271,7 +271,7 @@ static void t_reftable_stack_transaction_api(void)
> >  
> >  	reftable_addition_destroy(add);
> >  
> > -	err = reftable_stack_new_addition(&add, st);
> > +	err = reftable_stack_new_addition(&add, st, 0);
> >  	check(!err);
> >  
> >  	err = reftable_addition_add(add, write_test_ref, &ref);
> > @@ -292,6 +292,68 @@ static void t_reftable_stack_transaction_api(void)
> >  	clear_dir(dir);
> >  }
> >  
> > +static void t_reftable_stack_transaction_with_reload(void)
> > +{
> > +	char *dir = get_tmp_dir(__LINE__);
> > +	struct reftable_stack *st1 = NULL, *st2 = NULL;
> > +	int err;
> > +	struct reftable_addition *add = NULL;
> > +	struct reftable_ref_record refs[2] = {
> > +		{
> > +			.refname = (char *) "refs/heads/a",
> > +			.update_index = 1,
> > +			.value_type = REFTABLE_REF_VAL1,
> > +			.value.val1 = { '1' },
> > +		},
> > +		{
> > +			.refname = (char *) "refs/heads/b",
> > +			.update_index = 2,
> > +			.value_type = REFTABLE_REF_VAL1,
> > +			.value.val1 = { '1' },
> > +		},
> > +	};
> > +	struct reftable_ref_record ref = { 0 };
> > +
> 
> Create two reftable stacks that provide a view into the reftable tables
> inside "dir".

Yup.

> > +	err = reftable_new_stack(&st1, dir, NULL);
> > +	check(!err);
> > +	err = reftable_new_stack(&st2, dir, NULL);
> > +	check(!err);
> > +
> 
> Successfully add refs[0] to the first stack using the transactional API.

Here we only open the stacks without doing anything with them yet. This
is preparation for being able to read/write them.

> > +	err = reftable_stack_new_addition(&add, st1, 0);
> > +	check(!err);
> > +	err = reftable_addition_add(add, write_test_ref, &refs[0]);
> > +	check(!err);
> > +	err = reftable_addition_commit(add);
> > +	check(!err);
> > +	reftable_addition_destroy(add);
> > +
> > +	/*
> > +	 * The second stack is now outdated, which we should notice. We do not
> > +	 * create the addition and lock the stack by default, but allow the
> > +	 * reload to happen when REFTABLE_STACK_NEW_ADDITION_RELOAD is set.
> > +	 */
> 
> We try to open a transaction via the second reftable stack, but the
> this stack is outdated because we've written to "dir" when the previous
> stack addition was committed.

Yup.

> > +	err = reftable_stack_new_addition(&add, st2, 0);
> > +	check_int(err, ==, REFTABLE_OUTDATED_ERROR);
> 
> Try again, but supply the flag so it performs a reload internally. Write
> refs[1] to "dir" by committing the transaction. 

Yup.

> > +	err = reftable_stack_new_addition(&add, st2, REFTABLE_STACK_NEW_ADDITION_RELOAD);
> > +	check(!err);
> > +	err = reftable_addition_add(add, write_test_ref, &refs[1]);
> > +	check(!err);
> > +	err = reftable_addition_commit(add);
> > +	check(!err);
> > +	reftable_addition_destroy(add);
> > +
> 
> Asserts.

Exactly.

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 0/3] reftable: graceful concurrent writes
  2024-09-17 13:43 [PATCH 0/3] reftable: graceful concurrent writes Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2024-09-18  4:32 ` [PATCH v2 " Patrick Steinhardt
@ 2024-09-18  9:59 ` Patrick Steinhardt
  2024-09-18  9:59   ` [PATCH v3 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
                     ` (3 more replies)
  2024-09-24  5:32 ` [PATCH v4 " Patrick Steinhardt
  6 siblings, 4 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  9:59 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine, James Liu

Hi,

this is the third version of my patch series that implements support for
graceful concurrent writes with the reftable backend.

There is only a single change compared to v2, namely that we handle `0`
and `-1` for the lock timeout config. With `0` we fail immediately, with
`-1` we lock indefinitely. This matches semantics of loose and packed
ref locking.

Thanks!

Patrick

Patrick Steinhardt (3):
  refs/reftable: introduce "reftable.lockTimeout"
  reftable/stack: allow locking of outdated stacks
  refs/reftable: reload locked stack when preparing transaction

 Documentation/config/reftable.txt |  8 ++++
 refs/reftable-backend.c           | 13 +++++-
 reftable/reftable-stack.h         | 13 +++++-
 reftable/reftable-writer.h        | 11 +++++
 reftable/stack.c                  | 38 ++++++++++++------
 t/t0610-reftable-basics.sh        | 58 ++++++++++++++++++++++++++
 t/unit-tests/t-reftable-stack.c   | 67 ++++++++++++++++++++++++++++++-
 7 files changed, 189 insertions(+), 19 deletions(-)

Range-diff against v2:
1:  700a35df125 ! 1:  77cffd3b1eb refs/reftable: introduce "reftable.lockTimeout"
    @@ Documentation/config/reftable.txt: reftable.geometricFactor::
     +	Whenever the reftable backend appends a new table to the stack, it has
     +	to lock the central "tables.list" file before updating it. This config
     +	controls how long the process will wait to acquire the lock in case
    -+	another process has already acquired it. Default is 100 (i.e., retry
    -+	for 100ms).
    ++	another process has already acquired it. Value 0 means not to retry at
    ++	all; -1 means to try indefinitely. Default is 100 (i.e., retry for
    ++	100ms).
     
      ## refs/reftable-backend.c ##
     @@ refs/reftable-backend.c: static int reftable_be_config(const char *var, const char *value,
    @@ refs/reftable-backend.c: static int reftable_be_config(const char *var, const ch
      			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
      		opts->auto_compaction_factor = factor;
     +	} else if (!strcmp(var, "reftable.locktimeout")) {
    -+		unsigned long lock_timeout = git_config_ulong(var, value, ctx->kvi);
    ++		int64_t lock_timeout = git_config_int64(var, value, ctx->kvi);
    ++		if (lock_timeout > LONG_MAX)
    ++			die("reftable lock timeout cannot exceed %"PRIdMAX, (intmax_t)LONG_MAX);
    ++		if (lock_timeout < 0 && lock_timeout != -1)
    ++			die("reftable lock timeout does not support negative values other than -1");
     +		opts->lock_timeout_ms = lock_timeout;
      	}
      
    @@ reftable/reftable-writer.h: struct reftable_write_options {
     +	 * Note that this does not apply to locking individual tables, as these
     +	 * should only ever be locked when already holding the "tables.list"
     +	 * lock.
    ++	 *
    ++	 * Passing 0 will fail immediately when the file is locked, passing a
    ++	 * negative value will cause us to block indefinitely.
     +	 */
    -+	unsigned lock_timeout_ms;
    ++	long lock_timeout_ms;
      };
      
      /* reftable_block_stats holds statistics for a single block type */
2:  f4be0966e17 = 2:  6130565498e reftable/stack: allow locking of outdated stacks
3:  111b497ef17 = 3:  25d4e513a36 refs/reftable: reload locked stack when preparing transaction
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-18  9:59 ` [PATCH v3 " Patrick Steinhardt
@ 2024-09-18  9:59   ` Patrick Steinhardt
  2024-09-19 21:34     ` Junio C Hamano
  2024-09-18  9:59   ` [PATCH v3 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  9:59 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine, James Liu

When multiple concurrent processes try to update references in a
repository they may try to lock the same lockfiles. This can happen even
when the updates are non-conflicting and can both be applied, so it
doesn't always make sense to abort the transaction immediately. Both the
"loose" and "packed" backends thus have a grace period that they wait
for the lock to be released that can be controlled via the config values
"core.filesRefLockTimeout" and "core.packedRefsTimeout", respectively.

The reftable backend doesn't have such a setting yet and instead fails
immediately when it sees such a lock. But the exact same concepts apply
here as they do apply to the other backends.

Introduce a new "reftable.lockTimeout" config that controls how long we
may wait for a "tables.list" lock to be released. The default value of
this config is 100ms, which is the same default as we have it for the
"loose" backend.

Note that even though we also lock individual tables, this config really
only applies to the "tables.list" file. This is because individual
tables are only ever locked when we already hold the "tables.list" lock
during compaction. When we observe such a lock we in fact do not want to
compact the table at all because it is already in the process of being
compacted by a concurrent process. So applying the same timeout here
would not make any sense and only delay progress.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/reftable.txt |  8 ++++++++
 refs/reftable-backend.c           |  8 ++++++++
 reftable/reftable-writer.h        | 11 +++++++++++
 reftable/stack.c                  | 18 ++++++++++++------
 t/t0610-reftable-basics.sh        | 27 +++++++++++++++++++++++++++
 5 files changed, 66 insertions(+), 6 deletions(-)

diff --git a/Documentation/config/reftable.txt b/Documentation/config/reftable.txt
index 05157279778..57087803a54 100644
--- a/Documentation/config/reftable.txt
+++ b/Documentation/config/reftable.txt
@@ -46,3 +46,11 @@ reftable.geometricFactor::
 By default, the geometric sequence uses a factor of 2, meaning that for any
 table, the next-biggest table must at least be twice as big. A maximum factor
 of 256 is supported.
+
+reftable.lockTimeout::
+	Whenever the reftable backend appends a new table to the stack, it has
+	to lock the central "tables.list" file before updating it. This config
+	controls how long the process will wait to acquire the lock in case
+	another process has already acquired it. Value 0 means not to retry at
+	all; -1 means to try indefinitely. Default is 100 (i.e., retry for
+	100ms).
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1c4b19e737f..ca281e39a29 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -256,6 +256,13 @@ static int reftable_be_config(const char *var, const char *value,
 		if (factor > UINT8_MAX)
 			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
 		opts->auto_compaction_factor = factor;
+	} else if (!strcmp(var, "reftable.locktimeout")) {
+		int64_t lock_timeout = git_config_int64(var, value, ctx->kvi);
+		if (lock_timeout > LONG_MAX)
+			die("reftable lock timeout cannot exceed %"PRIdMAX, (intmax_t)LONG_MAX);
+		if (lock_timeout < 0 && lock_timeout != -1)
+			die("reftable lock timeout does not support negative values other than -1");
+		opts->lock_timeout_ms = lock_timeout;
 	}
 
 	return 0;
@@ -281,6 +288,7 @@ static struct ref_store *reftable_be_init(struct repository *repo,
 	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
 	refs->write_options.disable_auto_compact =
 		!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
+	refs->write_options.lock_timeout_ms = 100;
 
 	git_config(reftable_be_config, &refs->write_options);
 
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 189b1f4144f..f5e25cfda16 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -51,6 +51,17 @@ struct reftable_write_options {
 	 * tables to compact. Defaults to 2 if unset.
 	 */
 	uint8_t auto_compaction_factor;
+
+	/*
+	 * The number of milliseconds to wait when trying to lock "tables.list".
+	 * Note that this does not apply to locking individual tables, as these
+	 * should only ever be locked when already holding the "tables.list"
+	 * lock.
+	 *
+	 * Passing 0 will fail immediately when the file is locked, passing a
+	 * negative value will cause us to block indefinitely.
+	 */
+	long lock_timeout_ms;
 };
 
 /* reftable_block_stats holds statistics for a single block type */
diff --git a/reftable/stack.c b/reftable/stack.c
index ce0a35216ba..5ccad2cff31 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -603,8 +603,10 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->stack = st;
 
-	err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&add->tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST) {
 			err = REFTABLE_LOCK_ERROR;
@@ -1056,8 +1058,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * Hold the lock so that we can read "tables.list" and lock all tables
 	 * which are part of the user-specified range.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
@@ -1156,8 +1160,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * "tables.list". We'll then replace the compacted range of tables with
 	 * the new table.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 37510c2b2a5..62da3e37823 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -423,6 +423,33 @@ test_expect_success 'ref transaction: fails gracefully when auto compaction fail
 	)
 '
 
+test_expect_success 'ref transaction: timeout acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		>.git/reftable/tables.list.lock &&
+		test_must_fail git update-ref refs/heads/branch HEAD 2>err &&
+		test_grep "cannot lock references" err
+	)
+'
+
+test_expect_success 'ref transaction: retry acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		LOCK=.git/reftable/tables.list.lock &&
+		>$LOCK &&
+		{
+			( sleep 1 && rm -f $LOCK ) &
+		} &&
+		git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 2/3] reftable/stack: allow locking of outdated stacks
  2024-09-18  9:59 ` [PATCH v3 " Patrick Steinhardt
  2024-09-18  9:59   ` [PATCH v3 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
@ 2024-09-18  9:59   ` Patrick Steinhardt
  2024-09-20 18:10     ` Junio C Hamano
  2024-09-18  9:59   ` [PATCH v3 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
  2024-09-18 23:23   ` [PATCH v3 0/3] reftable: graceful concurrent writes James Liu
  3 siblings, 1 reply; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  9:59 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine, James Liu

In `reftable_stack_new_addition()` we first lock the stack and then
check whether it is still up-to-date. If it is not we return an error to
the caller indicating that the stack is outdated.

This is overly restrictive in our ref transaction interface though: we
lock the stack right before we start to verify the transaction, so we do
not really care whether it is outdated or not. What we really want is
that the stack is up-to-date after it has been locked so that we can
verify queued updates against its current state while we know that it is
locked for concurrent modification.

Introduce a new flag `REFTABLE_STACK_NEW_ADDITION_RELOAD` that alters
the behaviour of `reftable_stack_init_addition()` in this case: when we
notice that it is out-of-date we reload it instead of returning an error
to the caller.

This logic will be wired up in the reftable backend in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c         |  4 +-
 reftable/reftable-stack.h       | 13 ++++++-
 reftable/stack.c                | 20 ++++++----
 t/unit-tests/t-reftable-stack.c | 67 ++++++++++++++++++++++++++++++++-
 4 files changed, 91 insertions(+), 13 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index ca281e39a29..6ca00627dd7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -770,7 +770,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		if (ret)
 			return ret;
 
-		ret = reftable_stack_new_addition(&addition, stack);
+		ret = reftable_stack_new_addition(&addition, stack, 0);
 		if (ret) {
 			if (ret == REFTABLE_LOCK_ERROR)
 				strbuf_addstr(err, "cannot lock references");
@@ -2207,7 +2207,7 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	if (ret < 0)
 		goto done;
 
-	ret = reftable_stack_new_addition(&add, stack);
+	ret = reftable_stack_new_addition(&add, stack, 0);
 	if (ret < 0)
 		goto done;
 
diff --git a/reftable/reftable-stack.h b/reftable/reftable-stack.h
index f4f8cabc7fb..67316b215ed 100644
--- a/reftable/reftable-stack.h
+++ b/reftable/reftable-stack.h
@@ -37,12 +37,21 @@ uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
 /* holds a transaction to add tables at the top of a stack. */
 struct reftable_addition;
 
+enum {
+	/*
+	 * Reload the stack when the stack is out-of-date after locking it.
+	 */
+	REFTABLE_STACK_NEW_ADDITION_RELOAD = (1 << 0),
+};
+
 /*
  * returns a new transaction to add reftables to the given stack. As a side
- * effect, the ref database is locked.
+ * effect, the ref database is locked. Accepts REFTABLE_STACK_NEW_ADDITION_*
+ * flags.
  */
 int reftable_stack_new_addition(struct reftable_addition **dest,
-				struct reftable_stack *st);
+				struct reftable_stack *st,
+				int flags);
 
 /* Adds a reftable to transaction. */
 int reftable_addition_add(struct reftable_addition *add,
diff --git a/reftable/stack.c b/reftable/stack.c
index 5ccad2cff31..f9c95d5fa62 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -596,7 +596,8 @@ struct reftable_addition {
 #define REFTABLE_ADDITION_INIT {0}
 
 static int reftable_stack_init_addition(struct reftable_addition *add,
-					struct reftable_stack *st)
+					struct reftable_stack *st,
+					int flags)
 {
 	struct strbuf lock_file_name = STRBUF_INIT;
 	int err;
@@ -626,6 +627,11 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 	err = stack_uptodate(st);
 	if (err < 0)
 		goto done;
+	if (err > 0 && flags & REFTABLE_STACK_NEW_ADDITION_RELOAD) {
+		err = reftable_stack_reload_maybe_reuse(add->stack, 1);
+		if (err)
+			goto done;
+	}
 	if (err > 0) {
 		err = REFTABLE_OUTDATED_ERROR;
 		goto done;
@@ -633,9 +639,8 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->next_update_index = reftable_stack_next_update_index(st);
 done:
-	if (err) {
+	if (err)
 		reftable_addition_close(add);
-	}
 	strbuf_release(&lock_file_name);
 	return err;
 }
@@ -739,13 +744,14 @@ int reftable_addition_commit(struct reftable_addition *add)
 }
 
 int reftable_stack_new_addition(struct reftable_addition **dest,
-				struct reftable_stack *st)
+				struct reftable_stack *st,
+				int flags)
 {
 	int err = 0;
 	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
 	REFTABLE_CALLOC_ARRAY(*dest, 1);
 	**dest = empty;
-	err = reftable_stack_init_addition(*dest, st);
+	err = reftable_stack_init_addition(*dest, st, flags);
 	if (err) {
 		reftable_free(*dest);
 		*dest = NULL;
@@ -759,7 +765,7 @@ static int stack_try_add(struct reftable_stack *st,
 			 void *arg)
 {
 	struct reftable_addition add = REFTABLE_ADDITION_INIT;
-	int err = reftable_stack_init_addition(&add, st);
+	int err = reftable_stack_init_addition(&add, st, 0);
 	if (err < 0)
 		goto done;
 
@@ -1608,7 +1614,7 @@ static int reftable_stack_clean_locked(struct reftable_stack *st)
 int reftable_stack_clean(struct reftable_stack *st)
 {
 	struct reftable_addition *add = NULL;
-	int err = reftable_stack_new_addition(&add, st);
+	int err = reftable_stack_new_addition(&add, st, 0);
 	if (err < 0) {
 		goto done;
 	}
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index d62a9c1bed5..a37cc698d87 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -271,7 +271,7 @@ static void t_reftable_stack_transaction_api(void)
 
 	reftable_addition_destroy(add);
 
-	err = reftable_stack_new_addition(&add, st);
+	err = reftable_stack_new_addition(&add, st, 0);
 	check(!err);
 
 	err = reftable_addition_add(add, write_test_ref, &ref);
@@ -292,6 +292,68 @@ static void t_reftable_stack_transaction_api(void)
 	clear_dir(dir);
 }
 
+static void t_reftable_stack_transaction_with_reload(void)
+{
+	char *dir = get_tmp_dir(__LINE__);
+	struct reftable_stack *st1 = NULL, *st2 = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+	struct reftable_ref_record refs[2] = {
+		{
+			.refname = (char *) "refs/heads/a",
+			.update_index = 1,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { '1' },
+		},
+		{
+			.refname = (char *) "refs/heads/b",
+			.update_index = 2,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { '1' },
+		},
+	};
+	struct reftable_ref_record ref = { 0 };
+
+	err = reftable_new_stack(&st1, dir, NULL);
+	check(!err);
+	err = reftable_new_stack(&st2, dir, NULL);
+	check(!err);
+
+	err = reftable_stack_new_addition(&add, st1, 0);
+	check(!err);
+	err = reftable_addition_add(add, write_test_ref, &refs[0]);
+	check(!err);
+	err = reftable_addition_commit(add);
+	check(!err);
+	reftable_addition_destroy(add);
+
+	/*
+	 * The second stack is now outdated, which we should notice. We do not
+	 * create the addition and lock the stack by default, but allow the
+	 * reload to happen when REFTABLE_STACK_NEW_ADDITION_RELOAD is set.
+	 */
+	err = reftable_stack_new_addition(&add, st2, 0);
+	check_int(err, ==, REFTABLE_OUTDATED_ERROR);
+	err = reftable_stack_new_addition(&add, st2, REFTABLE_STACK_NEW_ADDITION_RELOAD);
+	check(!err);
+	err = reftable_addition_add(add, write_test_ref, &refs[1]);
+	check(!err);
+	err = reftable_addition_commit(add);
+	check(!err);
+	reftable_addition_destroy(add);
+
+	for (size_t i = 0; i < ARRAY_SIZE(refs); i++) {
+		err = reftable_stack_read_ref(st2, refs[i].refname, &ref);
+		check(!err);
+		check(reftable_ref_record_equal(&refs[i], &ref, GIT_SHA1_RAWSZ));
+	}
+
+	reftable_ref_record_release(&ref);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	clear_dir(dir);
+}
+
 static void t_reftable_stack_transaction_api_performs_auto_compaction(void)
 {
 	char *dir = get_tmp_dir(__LINE__);
@@ -322,7 +384,7 @@ static void t_reftable_stack_transaction_api_performs_auto_compaction(void)
 		 */
 		st->opts.disable_auto_compact = i != n;
 
-		err = reftable_stack_new_addition(&add, st);
+		err = reftable_stack_new_addition(&add, st, 0);
 		check(!err);
 
 		err = reftable_addition_add(add, write_test_ref, &ref);
@@ -1314,6 +1376,7 @@ int cmd_main(int argc UNUSED, const char *argv[] UNUSED)
 	TEST(t_reftable_stack_reload_with_missing_table(), "stack iteration with garbage tables");
 	TEST(t_reftable_stack_tombstone(), "'tombstone' refs in stack");
 	TEST(t_reftable_stack_transaction_api(), "update transaction to stack");
+	TEST(t_reftable_stack_transaction_with_reload(), "transaction with reload");
 	TEST(t_reftable_stack_transaction_api_performs_auto_compaction(), "update transaction triggers auto-compaction");
 	TEST(t_reftable_stack_update_index_check(), "update transactions with equal update indices");
 	TEST(t_reftable_stack_uptodate(), "stack must be reloaded before ref update");
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-18  9:59 ` [PATCH v3 " Patrick Steinhardt
  2024-09-18  9:59   ` [PATCH v3 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
  2024-09-18  9:59   ` [PATCH v3 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
@ 2024-09-18  9:59   ` Patrick Steinhardt
  2024-09-18 23:23   ` [PATCH v3 0/3] reftable: graceful concurrent writes James Liu
  3 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-18  9:59 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine, James Liu

When starting a reftable transaction we lock all stacks we are about to
modify. While it may happen that the stack is out-of-date at this point
in time we don't really care: transactional updates encode the expected
state of a certain reference, so all that we really want to verify is
that the _current_ value matches that expected state.

Pass `REFTABLE_STACK_NEW_ADDITION_RELOAD` when locking the stack such
that an out-of-date stack will be reloaded after having been locked.
This change is safe because all verifications of the expected state
happen after this step anyway.

Add a testcase that verifies that many writers are now able to write to
the stack concurrently without failures and with a deterministic end
result.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c    |  3 ++-
 t/t0610-reftable-basics.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 6ca00627dd7..efe2b0258ca 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -770,7 +770,8 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		if (ret)
 			return ret;
 
-		ret = reftable_stack_new_addition(&addition, stack, 0);
+		ret = reftable_stack_new_addition(&addition, stack,
+						  REFTABLE_STACK_NEW_ADDITION_RELOAD);
 		if (ret) {
 			if (ret == REFTABLE_LOCK_ERROR)
 				strbuf_addstr(err, "cannot lock references");
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 62da3e37823..2d951c8ceb6 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,6 +450,37 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
+test_expect_success 'ref transaction: many concurrent writers' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		# Set a high timeout such that a busy CI machine will not abort
+		# early. 10 seconds should hopefully be ample of time to make
+		# this non-flaky.
+		git config set reftable.lockTimeout 10000 &&
+		test_commit --no-tag initial &&
+
+		head=$(git rev-parse HEAD) &&
+		for i in $(test_seq 100)
+		do
+			printf "%s commit\trefs/heads/branch-%s\n" "$head" "$i" ||
+			return 1
+		done >expect &&
+		printf "%s commit\trefs/heads/main\n" "$head" >>expect &&
+
+		for i in $(test_seq 100)
+		do
+			{ git update-ref refs/heads/branch-$i HEAD& } ||
+			return 1
+		done &&
+
+		wait &&
+		git for-each-ref --sort=v:refname >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 0/3] reftable: graceful concurrent writes
  2024-09-18  9:59 ` [PATCH v3 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-09-18  9:59   ` [PATCH v3 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
@ 2024-09-18 23:23   ` James Liu
  2024-09-24  5:33     ` Patrick Steinhardt
  3 siblings, 1 reply; 42+ messages in thread
From: James Liu @ 2024-09-18 23:23 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: karthik nayak, Eric Sunshine

On Wed Sep 18, 2024 at 7:59 PM AEST, Patrick Steinhardt wrote:
> Hi,
>
> this is the third version of my patch series that implements support for
> graceful concurrent writes with the reftable backend.
>
> There is only a single change compared to v2, namely that we handle `0`
> and `-1` for the lock timeout config. With `0` we fail immediately, with
> `-1` we lock indefinitely. This matches semantics of loose and packed
> ref locking.

I guess you meant "we retry indefinitely" here :D

>
> Thanks!
>
> Patrick

Thanks Patrick, no more comments from me!

Cheers,
James

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-18  9:59   ` [PATCH v3 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
@ 2024-09-19 21:34     ` Junio C Hamano
  0 siblings, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2024-09-19 21:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, karthik nayak, Eric Sunshine, James Liu

Patrick Steinhardt <ps@pks.im> writes:

> +reftable.lockTimeout::
> +	Whenever the reftable backend appends a new table to the stack, it has
> +	to lock the central "tables.list" file before updating it. This config
> +	controls how long the process will wait to acquire the lock in case
> +	another process has already acquired it. Value 0 means not to retry at
> +	all; -1 means to try indefinitely. Default is 100 (i.e., retry for
> +	100ms).

Existing timeout knobs are in a hierarchy that is too wide
(i.e. core.*timeout) and this fixes the mistake by placing the name
in a lot more appropriate name (i.e. reftable.*timeout).  If I were
designing the system from scratch, I would probably place all of
them in "refs.*timeout", but I do not think it is worth extra
engineering effort to rename them and pay the transition cost.

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 1c4b19e737f..ca281e39a29 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -256,6 +256,13 @@ static int reftable_be_config(const char *var, const char *value,
>  		if (factor > UINT8_MAX)
>  			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
>  		opts->auto_compaction_factor = factor;
> +	} else if (!strcmp(var, "reftable.locktimeout")) {
> +		int64_t lock_timeout = git_config_int64(var, value, ctx->kvi);
> +		if (lock_timeout > LONG_MAX)
> +			die("reftable lock timeout cannot exceed %"PRIdMAX, (intmax_t)LONG_MAX);
> +		if (lock_timeout < 0 && lock_timeout != -1)
> +			die("reftable lock timeout does not support negative values other than -1");
> +		opts->lock_timeout_ms = lock_timeout;

Existing lock timeouts this models after seems to consider a
platform native "int" is good enough size to represent the timeout
value in milliseconds, but the eventual user of this value in the
lockfile API expects a long, so lock_timeout_ms being long is fine.

Perhaps #leftoverbits to straighten out the types used for the other
two timeout configuration variables.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/3] reftable/stack: allow locking of outdated stacks
  2024-09-18  9:59   ` [PATCH v3 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
@ 2024-09-20 18:10     ` Junio C Hamano
  2024-09-24  5:33       ` Patrick Steinhardt
  0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2024-09-20 18:10 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, karthik nayak, Eric Sunshine, James Liu

Patrick Steinhardt <ps@pks.im> writes:

>  int reftable_stack_new_addition(struct reftable_addition **dest,
> -				struct reftable_stack *st);
> +				struct reftable_stack *st,
> +				int flags);

We usually use "unsigned" for a flag word, i.e., a collection of
flag bits.

> @@ -1314,6 +1376,7 @@ int cmd_main(int argc UNUSED, const char *argv[] UNUSED)
>  	TEST(t_reftable_stack_reload_with_missing_table(), "stack iteration with garbage tables");
>  	TEST(t_reftable_stack_tombstone(), "'tombstone' refs in stack");
>  	TEST(t_reftable_stack_transaction_api(), "update transaction to stack");
> +	TEST(t_reftable_stack_transaction_with_reload(), "transaction with reload");
>  	TEST(t_reftable_stack_transaction_api_performs_auto_compaction(), "update transaction triggers auto-compaction");
>  	TEST(t_reftable_stack_update_index_check(), "update transactions with equal update indices");
>  	TEST(t_reftable_stack_uptodate(), "stack must be reloaded before ref update");

Looking good.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 0/3] reftable: graceful concurrent writes
  2024-09-17 13:43 [PATCH 0/3] reftable: graceful concurrent writes Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2024-09-18  9:59 ` [PATCH v3 " Patrick Steinhardt
@ 2024-09-24  5:32 ` Patrick Steinhardt
  2024-09-24  5:33   ` [PATCH v4 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
                     ` (2 more replies)
  6 siblings, 3 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-24  5:32 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine, James Liu, Junio C Hamano

Hi,

this is the fourth version of my patch series that implements for
graceful concurrent writes with the reftable backend.

There's only a single change, namely a fix to the type used by the newly
introduced `flags` field.

Thanks!

Patrick

Patrick Steinhardt (3):
  refs/reftable: introduce "reftable.lockTimeout"
  reftable/stack: allow locking of outdated stacks
  refs/reftable: reload locked stack when preparing transaction

 Documentation/config/reftable.txt |  8 ++++
 refs/reftable-backend.c           | 13 +++++-
 reftable/reftable-stack.h         | 13 +++++-
 reftable/reftable-writer.h        | 11 +++++
 reftable/stack.c                  | 38 ++++++++++++------
 t/t0610-reftable-basics.sh        | 58 ++++++++++++++++++++++++++
 t/unit-tests/t-reftable-stack.c   | 67 ++++++++++++++++++++++++++++++-
 7 files changed, 189 insertions(+), 19 deletions(-)

Range-diff against v3:
1:  77cffd3b1eb = 1:  77cffd3b1eb refs/reftable: introduce "reftable.lockTimeout"
2:  6130565498e ! 2:  81a836062e9 reftable/stack: allow locking of outdated stacks
    @@ reftable/reftable-stack.h: uint64_t reftable_stack_next_update_index(struct reft
      int reftable_stack_new_addition(struct reftable_addition **dest,
     -				struct reftable_stack *st);
     +				struct reftable_stack *st,
    -+				int flags);
    ++				unsigned int flags);
      
      /* Adds a reftable to transaction. */
      int reftable_addition_add(struct reftable_addition *add,
    @@ reftable/stack.c: struct reftable_addition {
      static int reftable_stack_init_addition(struct reftable_addition *add,
     -					struct reftable_stack *st)
     +					struct reftable_stack *st,
    -+					int flags)
    ++					unsigned int flags)
      {
      	struct strbuf lock_file_name = STRBUF_INIT;
      	int err;
    @@ reftable/stack.c: int reftable_addition_commit(struct reftable_addition *add)
      int reftable_stack_new_addition(struct reftable_addition **dest,
     -				struct reftable_stack *st)
     +				struct reftable_stack *st,
    -+				int flags)
    ++				unsigned int flags)
      {
      	int err = 0;
      	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
3:  25d4e513a36 = 3:  9ce2d18dff2 refs/reftable: reload locked stack when preparing transaction
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 1/3] refs/reftable: introduce "reftable.lockTimeout"
  2024-09-24  5:32 ` [PATCH v4 " Patrick Steinhardt
@ 2024-09-24  5:33   ` Patrick Steinhardt
  2024-09-24  5:33   ` [PATCH v4 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
  2024-09-24  5:33   ` [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
  2 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-24  5:33 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine, James Liu, Junio C Hamano

When multiple concurrent processes try to update references in a
repository they may try to lock the same lockfiles. This can happen even
when the updates are non-conflicting and can both be applied, so it
doesn't always make sense to abort the transaction immediately. Both the
"loose" and "packed" backends thus have a grace period that they wait
for the lock to be released that can be controlled via the config values
"core.filesRefLockTimeout" and "core.packedRefsTimeout", respectively.

The reftable backend doesn't have such a setting yet and instead fails
immediately when it sees such a lock. But the exact same concepts apply
here as they do apply to the other backends.

Introduce a new "reftable.lockTimeout" config that controls how long we
may wait for a "tables.list" lock to be released. The default value of
this config is 100ms, which is the same default as we have it for the
"loose" backend.

Note that even though we also lock individual tables, this config really
only applies to the "tables.list" file. This is because individual
tables are only ever locked when we already hold the "tables.list" lock
during compaction. When we observe such a lock we in fact do not want to
compact the table at all because it is already in the process of being
compacted by a concurrent process. So applying the same timeout here
would not make any sense and only delay progress.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/reftable.txt |  8 ++++++++
 refs/reftable-backend.c           |  8 ++++++++
 reftable/reftable-writer.h        | 11 +++++++++++
 reftable/stack.c                  | 18 ++++++++++++------
 t/t0610-reftable-basics.sh        | 27 +++++++++++++++++++++++++++
 5 files changed, 66 insertions(+), 6 deletions(-)

diff --git a/Documentation/config/reftable.txt b/Documentation/config/reftable.txt
index 05157279778..57087803a54 100644
--- a/Documentation/config/reftable.txt
+++ b/Documentation/config/reftable.txt
@@ -46,3 +46,11 @@ reftable.geometricFactor::
 By default, the geometric sequence uses a factor of 2, meaning that for any
 table, the next-biggest table must at least be twice as big. A maximum factor
 of 256 is supported.
+
+reftable.lockTimeout::
+	Whenever the reftable backend appends a new table to the stack, it has
+	to lock the central "tables.list" file before updating it. This config
+	controls how long the process will wait to acquire the lock in case
+	another process has already acquired it. Value 0 means not to retry at
+	all; -1 means to try indefinitely. Default is 100 (i.e., retry for
+	100ms).
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1c4b19e737f..ca281e39a29 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -256,6 +256,13 @@ static int reftable_be_config(const char *var, const char *value,
 		if (factor > UINT8_MAX)
 			die("reftable geometric factor cannot exceed %u", (unsigned)UINT8_MAX);
 		opts->auto_compaction_factor = factor;
+	} else if (!strcmp(var, "reftable.locktimeout")) {
+		int64_t lock_timeout = git_config_int64(var, value, ctx->kvi);
+		if (lock_timeout > LONG_MAX)
+			die("reftable lock timeout cannot exceed %"PRIdMAX, (intmax_t)LONG_MAX);
+		if (lock_timeout < 0 && lock_timeout != -1)
+			die("reftable lock timeout does not support negative values other than -1");
+		opts->lock_timeout_ms = lock_timeout;
 	}
 
 	return 0;
@@ -281,6 +288,7 @@ static struct ref_store *reftable_be_init(struct repository *repo,
 	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
 	refs->write_options.disable_auto_compact =
 		!git_env_bool("GIT_TEST_REFTABLE_AUTOCOMPACTION", 1);
+	refs->write_options.lock_timeout_ms = 100;
 
 	git_config(reftable_be_config, &refs->write_options);
 
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 189b1f4144f..f5e25cfda16 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -51,6 +51,17 @@ struct reftable_write_options {
 	 * tables to compact. Defaults to 2 if unset.
 	 */
 	uint8_t auto_compaction_factor;
+
+	/*
+	 * The number of milliseconds to wait when trying to lock "tables.list".
+	 * Note that this does not apply to locking individual tables, as these
+	 * should only ever be locked when already holding the "tables.list"
+	 * lock.
+	 *
+	 * Passing 0 will fail immediately when the file is locked, passing a
+	 * negative value will cause us to block indefinitely.
+	 */
+	long lock_timeout_ms;
 };
 
 /* reftable_block_stats holds statistics for a single block type */
diff --git a/reftable/stack.c b/reftable/stack.c
index ce0a35216ba..5ccad2cff31 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -603,8 +603,10 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->stack = st;
 
-	err = hold_lock_file_for_update(&add->tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&add->tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST) {
 			err = REFTABLE_LOCK_ERROR;
@@ -1056,8 +1058,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * Hold the lock so that we can read "tables.list" and lock all tables
 	 * which are part of the user-specified range.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
@@ -1156,8 +1160,10 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * "tables.list". We'll then replace the compacted range of tables with
 	 * the new table.
 	 */
-	err = hold_lock_file_for_update(&tables_list_lock, st->list_file,
-					LOCK_NO_DEREF);
+	err = hold_lock_file_for_update_timeout(&tables_list_lock,
+						st->list_file,
+						LOCK_NO_DEREF,
+						st->opts.lock_timeout_ms);
 	if (err < 0) {
 		if (errno == EEXIST)
 			err = REFTABLE_LOCK_ERROR;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 37510c2b2a5..62da3e37823 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -423,6 +423,33 @@ test_expect_success 'ref transaction: fails gracefully when auto compaction fail
 	)
 '
 
+test_expect_success 'ref transaction: timeout acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		>.git/reftable/tables.list.lock &&
+		test_must_fail git update-ref refs/heads/branch HEAD 2>err &&
+		test_grep "cannot lock references" err
+	)
+'
+
+test_expect_success 'ref transaction: retry acquiring tables.list lock' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		LOCK=.git/reftable/tables.list.lock &&
+		>$LOCK &&
+		{
+			( sleep 1 && rm -f $LOCK ) &
+		} &&
+		git -c reftable.lockTimeout=5000 update-ref refs/heads/branch HEAD
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 2/3] reftable/stack: allow locking of outdated stacks
  2024-09-24  5:32 ` [PATCH v4 " Patrick Steinhardt
  2024-09-24  5:33   ` [PATCH v4 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
@ 2024-09-24  5:33   ` Patrick Steinhardt
  2024-09-24  5:33   ` [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
  2 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-24  5:33 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine, James Liu, Junio C Hamano

In `reftable_stack_new_addition()` we first lock the stack and then
check whether it is still up-to-date. If it is not we return an error to
the caller indicating that the stack is outdated.

This is overly restrictive in our ref transaction interface though: we
lock the stack right before we start to verify the transaction, so we do
not really care whether it is outdated or not. What we really want is
that the stack is up-to-date after it has been locked so that we can
verify queued updates against its current state while we know that it is
locked for concurrent modification.

Introduce a new flag `REFTABLE_STACK_NEW_ADDITION_RELOAD` that alters
the behaviour of `reftable_stack_init_addition()` in this case: when we
notice that it is out-of-date we reload it instead of returning an error
to the caller.

This logic will be wired up in the reftable backend in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c         |  4 +-
 reftable/reftable-stack.h       | 13 ++++++-
 reftable/stack.c                | 20 ++++++----
 t/unit-tests/t-reftable-stack.c | 67 ++++++++++++++++++++++++++++++++-
 4 files changed, 91 insertions(+), 13 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index ca281e39a29..6ca00627dd7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -770,7 +770,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		if (ret)
 			return ret;
 
-		ret = reftable_stack_new_addition(&addition, stack);
+		ret = reftable_stack_new_addition(&addition, stack, 0);
 		if (ret) {
 			if (ret == REFTABLE_LOCK_ERROR)
 				strbuf_addstr(err, "cannot lock references");
@@ -2207,7 +2207,7 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	if (ret < 0)
 		goto done;
 
-	ret = reftable_stack_new_addition(&add, stack);
+	ret = reftable_stack_new_addition(&add, stack, 0);
 	if (ret < 0)
 		goto done;
 
diff --git a/reftable/reftable-stack.h b/reftable/reftable-stack.h
index f4f8cabc7fb..6370fe45ddf 100644
--- a/reftable/reftable-stack.h
+++ b/reftable/reftable-stack.h
@@ -37,12 +37,21 @@ uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
 /* holds a transaction to add tables at the top of a stack. */
 struct reftable_addition;
 
+enum {
+	/*
+	 * Reload the stack when the stack is out-of-date after locking it.
+	 */
+	REFTABLE_STACK_NEW_ADDITION_RELOAD = (1 << 0),
+};
+
 /*
  * returns a new transaction to add reftables to the given stack. As a side
- * effect, the ref database is locked.
+ * effect, the ref database is locked. Accepts REFTABLE_STACK_NEW_ADDITION_*
+ * flags.
  */
 int reftable_stack_new_addition(struct reftable_addition **dest,
-				struct reftable_stack *st);
+				struct reftable_stack *st,
+				unsigned int flags);
 
 /* Adds a reftable to transaction. */
 int reftable_addition_add(struct reftable_addition *add,
diff --git a/reftable/stack.c b/reftable/stack.c
index 5ccad2cff31..84cf37a2ad0 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -596,7 +596,8 @@ struct reftable_addition {
 #define REFTABLE_ADDITION_INIT {0}
 
 static int reftable_stack_init_addition(struct reftable_addition *add,
-					struct reftable_stack *st)
+					struct reftable_stack *st,
+					unsigned int flags)
 {
 	struct strbuf lock_file_name = STRBUF_INIT;
 	int err;
@@ -626,6 +627,11 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 	err = stack_uptodate(st);
 	if (err < 0)
 		goto done;
+	if (err > 0 && flags & REFTABLE_STACK_NEW_ADDITION_RELOAD) {
+		err = reftable_stack_reload_maybe_reuse(add->stack, 1);
+		if (err)
+			goto done;
+	}
 	if (err > 0) {
 		err = REFTABLE_OUTDATED_ERROR;
 		goto done;
@@ -633,9 +639,8 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 
 	add->next_update_index = reftable_stack_next_update_index(st);
 done:
-	if (err) {
+	if (err)
 		reftable_addition_close(add);
-	}
 	strbuf_release(&lock_file_name);
 	return err;
 }
@@ -739,13 +744,14 @@ int reftable_addition_commit(struct reftable_addition *add)
 }
 
 int reftable_stack_new_addition(struct reftable_addition **dest,
-				struct reftable_stack *st)
+				struct reftable_stack *st,
+				unsigned int flags)
 {
 	int err = 0;
 	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
 	REFTABLE_CALLOC_ARRAY(*dest, 1);
 	**dest = empty;
-	err = reftable_stack_init_addition(*dest, st);
+	err = reftable_stack_init_addition(*dest, st, flags);
 	if (err) {
 		reftable_free(*dest);
 		*dest = NULL;
@@ -759,7 +765,7 @@ static int stack_try_add(struct reftable_stack *st,
 			 void *arg)
 {
 	struct reftable_addition add = REFTABLE_ADDITION_INIT;
-	int err = reftable_stack_init_addition(&add, st);
+	int err = reftable_stack_init_addition(&add, st, 0);
 	if (err < 0)
 		goto done;
 
@@ -1608,7 +1614,7 @@ static int reftable_stack_clean_locked(struct reftable_stack *st)
 int reftable_stack_clean(struct reftable_stack *st)
 {
 	struct reftable_addition *add = NULL;
-	int err = reftable_stack_new_addition(&add, st);
+	int err = reftable_stack_new_addition(&add, st, 0);
 	if (err < 0) {
 		goto done;
 	}
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index d62a9c1bed5..a37cc698d87 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -271,7 +271,7 @@ static void t_reftable_stack_transaction_api(void)
 
 	reftable_addition_destroy(add);
 
-	err = reftable_stack_new_addition(&add, st);
+	err = reftable_stack_new_addition(&add, st, 0);
 	check(!err);
 
 	err = reftable_addition_add(add, write_test_ref, &ref);
@@ -292,6 +292,68 @@ static void t_reftable_stack_transaction_api(void)
 	clear_dir(dir);
 }
 
+static void t_reftable_stack_transaction_with_reload(void)
+{
+	char *dir = get_tmp_dir(__LINE__);
+	struct reftable_stack *st1 = NULL, *st2 = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+	struct reftable_ref_record refs[2] = {
+		{
+			.refname = (char *) "refs/heads/a",
+			.update_index = 1,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { '1' },
+		},
+		{
+			.refname = (char *) "refs/heads/b",
+			.update_index = 2,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { '1' },
+		},
+	};
+	struct reftable_ref_record ref = { 0 };
+
+	err = reftable_new_stack(&st1, dir, NULL);
+	check(!err);
+	err = reftable_new_stack(&st2, dir, NULL);
+	check(!err);
+
+	err = reftable_stack_new_addition(&add, st1, 0);
+	check(!err);
+	err = reftable_addition_add(add, write_test_ref, &refs[0]);
+	check(!err);
+	err = reftable_addition_commit(add);
+	check(!err);
+	reftable_addition_destroy(add);
+
+	/*
+	 * The second stack is now outdated, which we should notice. We do not
+	 * create the addition and lock the stack by default, but allow the
+	 * reload to happen when REFTABLE_STACK_NEW_ADDITION_RELOAD is set.
+	 */
+	err = reftable_stack_new_addition(&add, st2, 0);
+	check_int(err, ==, REFTABLE_OUTDATED_ERROR);
+	err = reftable_stack_new_addition(&add, st2, REFTABLE_STACK_NEW_ADDITION_RELOAD);
+	check(!err);
+	err = reftable_addition_add(add, write_test_ref, &refs[1]);
+	check(!err);
+	err = reftable_addition_commit(add);
+	check(!err);
+	reftable_addition_destroy(add);
+
+	for (size_t i = 0; i < ARRAY_SIZE(refs); i++) {
+		err = reftable_stack_read_ref(st2, refs[i].refname, &ref);
+		check(!err);
+		check(reftable_ref_record_equal(&refs[i], &ref, GIT_SHA1_RAWSZ));
+	}
+
+	reftable_ref_record_release(&ref);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	clear_dir(dir);
+}
+
 static void t_reftable_stack_transaction_api_performs_auto_compaction(void)
 {
 	char *dir = get_tmp_dir(__LINE__);
@@ -322,7 +384,7 @@ static void t_reftable_stack_transaction_api_performs_auto_compaction(void)
 		 */
 		st->opts.disable_auto_compact = i != n;
 
-		err = reftable_stack_new_addition(&add, st);
+		err = reftable_stack_new_addition(&add, st, 0);
 		check(!err);
 
 		err = reftable_addition_add(add, write_test_ref, &ref);
@@ -1314,6 +1376,7 @@ int cmd_main(int argc UNUSED, const char *argv[] UNUSED)
 	TEST(t_reftable_stack_reload_with_missing_table(), "stack iteration with garbage tables");
 	TEST(t_reftable_stack_tombstone(), "'tombstone' refs in stack");
 	TEST(t_reftable_stack_transaction_api(), "update transaction to stack");
+	TEST(t_reftable_stack_transaction_with_reload(), "transaction with reload");
 	TEST(t_reftable_stack_transaction_api_performs_auto_compaction(), "update transaction triggers auto-compaction");
 	TEST(t_reftable_stack_update_index_check(), "update transactions with equal update indices");
 	TEST(t_reftable_stack_uptodate(), "stack must be reloaded before ref update");
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-24  5:32 ` [PATCH v4 " Patrick Steinhardt
  2024-09-24  5:33   ` [PATCH v4 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
  2024-09-24  5:33   ` [PATCH v4 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
@ 2024-09-24  5:33   ` Patrick Steinhardt
  2024-09-27  4:07     ` Jeff King
  2 siblings, 1 reply; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-24  5:33 UTC (permalink / raw)
  To: git; +Cc: karthik nayak, Eric Sunshine, James Liu, Junio C Hamano

When starting a reftable transaction we lock all stacks we are about to
modify. While it may happen that the stack is out-of-date at this point
in time we don't really care: transactional updates encode the expected
state of a certain reference, so all that we really want to verify is
that the _current_ value matches that expected state.

Pass `REFTABLE_STACK_NEW_ADDITION_RELOAD` when locking the stack such
that an out-of-date stack will be reloaded after having been locked.
This change is safe because all verifications of the expected state
happen after this step anyway.

Add a testcase that verifies that many writers are now able to write to
the stack concurrently without failures and with a deterministic end
result.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c    |  3 ++-
 t/t0610-reftable-basics.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 6ca00627dd7..efe2b0258ca 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -770,7 +770,8 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		if (ret)
 			return ret;
 
-		ret = reftable_stack_new_addition(&addition, stack, 0);
+		ret = reftable_stack_new_addition(&addition, stack,
+						  REFTABLE_STACK_NEW_ADDITION_RELOAD);
 		if (ret) {
 			if (ret == REFTABLE_LOCK_ERROR)
 				strbuf_addstr(err, "cannot lock references");
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 62da3e37823..2d951c8ceb6 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,6 +450,37 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
+test_expect_success 'ref transaction: many concurrent writers' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		# Set a high timeout such that a busy CI machine will not abort
+		# early. 10 seconds should hopefully be ample of time to make
+		# this non-flaky.
+		git config set reftable.lockTimeout 10000 &&
+		test_commit --no-tag initial &&
+
+		head=$(git rev-parse HEAD) &&
+		for i in $(test_seq 100)
+		do
+			printf "%s commit\trefs/heads/branch-%s\n" "$head" "$i" ||
+			return 1
+		done >expect &&
+		printf "%s commit\trefs/heads/main\n" "$head" >>expect &&
+
+		for i in $(test_seq 100)
+		do
+			{ git update-ref refs/heads/branch-$i HEAD& } ||
+			return 1
+		done &&
+
+		wait &&
+		git for-each-ref --sort=v:refname >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.46.0.551.gc5ee8f2d1c.dirty


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/3] reftable/stack: allow locking of outdated stacks
  2024-09-20 18:10     ` Junio C Hamano
@ 2024-09-24  5:33       ` Patrick Steinhardt
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-24  5:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, karthik nayak, Eric Sunshine, James Liu

On Fri, Sep 20, 2024 at 11:10:14AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >  int reftable_stack_new_addition(struct reftable_addition **dest,
> > -				struct reftable_stack *st);
> > +				struct reftable_stack *st,
> > +				int flags);
> 
> We usually use "unsigned" for a flag word, i.e., a collection of
> flag bits.

Oh, yeah, this was a mere oversight. The topic hasn't yet hit next, so
let me fix this in a final reroll.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 0/3] reftable: graceful concurrent writes
  2024-09-18 23:23   ` [PATCH v3 0/3] reftable: graceful concurrent writes James Liu
@ 2024-09-24  5:33     ` Patrick Steinhardt
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-24  5:33 UTC (permalink / raw)
  To: James Liu; +Cc: git, karthik nayak, Eric Sunshine

On Thu, Sep 19, 2024 at 09:23:21AM +1000, James Liu wrote:
> On Wed Sep 18, 2024 at 7:59 PM AEST, Patrick Steinhardt wrote:
> > Hi,
> >
> > this is the third version of my patch series that implements support for
> > graceful concurrent writes with the reftable backend.
> >
> > There is only a single change compared to v2, namely that we handle `0`
> > and `-1` for the lock timeout config. With `0` we fail immediately, with
> > `-1` we lock indefinitely. This matches semantics of loose and packed
> > ref locking.
> 
> I guess you meant "we retry indefinitely" here :D

Oops, yes :)

> >
> > Thanks!
> >
> > Patrick
> 
> Thanks Patrick, no more comments from me!

Thanks for your review!

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-24  5:33   ` [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
@ 2024-09-27  4:07     ` Jeff King
  2024-09-30  6:49       ` Patrick Steinhardt
  2024-09-30 22:19       ` Josh Steadmon
  0 siblings, 2 replies; 42+ messages in thread
From: Jeff King @ 2024-09-27  4:07 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, karthik nayak, Eric Sunshine, James Liu, Junio C Hamano

On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:

> +test_expect_success 'ref transaction: many concurrent writers' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		# Set a high timeout such that a busy CI machine will not abort
> +		# early. 10 seconds should hopefully be ample of time to make
> +		# this non-flaky.
> +		git config set reftable.lockTimeout 10000 &&

I saw this test racily fail in the Windows CI build. The failure is as
you might imagine, a few of the background update-ref invocations
failed:

  fatal: update_ref failed for ref 'refs/heads/branch-21': reftable: transaction failure: I/O error

but of course we don't notice because they're backgrounded. And then the
expected output is missing the branch-21 entry (and in my case,
branch-64 suffered a similar fate).

At first I thought we probably needed to bump the timeout (and EIO was
just our way of passing that up the stack). But looking at the
timestamps in the Actions log, the whole loop took less than 10ms to
run.

So could this be indicative of a real contention issue specific to
Windows? I'm wondering if something like the old "you can't delete a
file somebody else has open" restriction is biting us somehow.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-27  4:07     ` Jeff King
@ 2024-09-30  6:49       ` Patrick Steinhardt
  2024-09-30 22:19       ` Josh Steadmon
  1 sibling, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-09-30  6:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, karthik nayak, Eric Sunshine, James Liu, Junio C Hamano

On Fri, Sep 27, 2024 at 12:07:52AM -0400, Jeff King wrote:
> On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
> 
> > +test_expect_success 'ref transaction: many concurrent writers' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		# Set a high timeout such that a busy CI machine will not abort
> > +		# early. 10 seconds should hopefully be ample of time to make
> > +		# this non-flaky.
> > +		git config set reftable.lockTimeout 10000 &&
> 
> I saw this test racily fail in the Windows CI build. The failure is as
> you might imagine, a few of the background update-ref invocations
> failed:
> 
>   fatal: update_ref failed for ref 'refs/heads/branch-21': reftable: transaction failure: I/O error
> 
> but of course we don't notice because they're backgrounded. And then the
> expected output is missing the branch-21 entry (and in my case,
> branch-64 suffered a similar fate).
> 
> At first I thought we probably needed to bump the timeout (and EIO was
> just our way of passing that up the stack). But looking at the
> timestamps in the Actions log, the whole loop took less than 10ms to
> run.
> 
> So could this be indicative of a real contention issue specific to
> Windows? I'm wondering if something like the old "you can't delete a
> file somebody else has open" restriction is biting us somehow.

Thanks for letting me know!

I very much expect that this is the scenario that you mention, where we
try to delete a file that is still held open by another process. We're
trying to be mindful about this restriction is the reftable library by
not failing when a call to unlink(3P) fails for any of the tables, and
we do have logic in place to unlink them at a later point in time when
not referenced by the "tables.list" file. So none of the calls to unlink
are error-checked at all.

But there's one file that we _have_ to rewrite, and that is of course
the "tables.list" file itself. We never unlink the file though but only
rename the new instance into place. I think I recently discovered that
we have some problems here, because Windows seemed to allow this in some
scenarios but not in others.

In any case, I've already set up a Windows VM last week, so I'll
investigate the issue later this week.

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-27  4:07     ` Jeff King
  2024-09-30  6:49       ` Patrick Steinhardt
@ 2024-09-30 22:19       ` Josh Steadmon
  2024-10-01  4:27         ` Patrick Steinhardt
  2024-10-01  7:34         ` Junio C Hamano
  1 sibling, 2 replies; 42+ messages in thread
From: Josh Steadmon @ 2024-09-30 22:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Patrick Steinhardt, git, karthik nayak, Eric Sunshine, James Liu,
	Junio C Hamano

On 2024.09.27 00:07, Jeff King wrote:
> On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
> 
> > +test_expect_success 'ref transaction: many concurrent writers' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		# Set a high timeout such that a busy CI machine will not abort
> > +		# early. 10 seconds should hopefully be ample of time to make
> > +		# this non-flaky.
> > +		git config set reftable.lockTimeout 10000 &&
> 
> I saw this test racily fail in the Windows CI build. The failure is as
> you might imagine, a few of the background update-ref invocations
> failed:
> 
>   fatal: update_ref failed for ref 'refs/heads/branch-21': reftable: transaction failure: I/O error
> 
> but of course we don't notice because they're backgrounded. And then the
> expected output is missing the branch-21 entry (and in my case,
> branch-64 suffered a similar fate).
> 
> At first I thought we probably needed to bump the timeout (and EIO was
> just our way of passing that up the stack). But looking at the
> timestamps in the Actions log, the whole loop took less than 10ms to
> run.
> 
> So could this be indicative of a real contention issue specific to
> Windows? I'm wondering if something like the old "you can't delete a
> file somebody else has open" restriction is biting us somehow.
> 
> -Peff

We're seeing repeated failures from this test case with ASan enabled.
Unfortunately, we've only been able to reproduce this on our
$DAYJOB-specific build system. I haven't been able to get it to fail
using just the upstream Makefile so far. I'll keep trying to find a way
to reproduce this.

FWIW, we're not getting I/O errors, we see the following:
fatal: update_ref failed for ref 'refs/heads/branch-20': cannot lock references

We tried increasing the timeout in the test to 2 minutes (up from 10s),
but it didn't fix the failures.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-30 22:19       ` Josh Steadmon
@ 2024-10-01  4:27         ` Patrick Steinhardt
  2024-10-01 22:54           ` Jeff King
  2024-10-01  7:34         ` Junio C Hamano
  1 sibling, 1 reply; 42+ messages in thread
From: Patrick Steinhardt @ 2024-10-01  4:27 UTC (permalink / raw)
  To: Josh Steadmon, Jeff King, git, karthik nayak, Eric Sunshine,
	James Liu, Junio C Hamano

On Mon, Sep 30, 2024 at 03:19:04PM -0700, Josh Steadmon wrote:
> On 2024.09.27 00:07, Jeff King wrote:
> > On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
> > 
> > > +test_expect_success 'ref transaction: many concurrent writers' '
> > > +	test_when_finished "rm -rf repo" &&
> > > +	git init repo &&
> > > +	(
> > > +		cd repo &&
> > > +		# Set a high timeout such that a busy CI machine will not abort
> > > +		# early. 10 seconds should hopefully be ample of time to make
> > > +		# this non-flaky.
> > > +		git config set reftable.lockTimeout 10000 &&
> > 
> > I saw this test racily fail in the Windows CI build. The failure is as
> > you might imagine, a few of the background update-ref invocations
> > failed:
> > 
> >   fatal: update_ref failed for ref 'refs/heads/branch-21': reftable: transaction failure: I/O error
> > 
> > but of course we don't notice because they're backgrounded. And then the
> > expected output is missing the branch-21 entry (and in my case,
> > branch-64 suffered a similar fate).
> > 
> > At first I thought we probably needed to bump the timeout (and EIO was
> > just our way of passing that up the stack). But looking at the
> > timestamps in the Actions log, the whole loop took less than 10ms to
> > run.
> > 
> > So could this be indicative of a real contention issue specific to
> > Windows? I'm wondering if something like the old "you can't delete a
> > file somebody else has open" restriction is biting us somehow.
> > 
> > -Peff
> 
> We're seeing repeated failures from this test case with ASan enabled.
> Unfortunately, we've only been able to reproduce this on our
> $DAYJOB-specific build system. I haven't been able to get it to fail
> using just the upstream Makefile so far. I'll keep trying to find a way
> to reproduce this.
> 
> FWIW, we're not getting I/O errors, we see the following:
> fatal: update_ref failed for ref 'refs/heads/branch-20': cannot lock references
> 
> We tried increasing the timeout in the test to 2 minutes (up from 10s),
> but it didn't fix the failures.

If this is causing problems for folks I'd say we can do the below change
for now. It's of course only a stop-gap solution until I find the time
to debug this, which should be later this week or early next week.

Patrick

diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 2d951c8ceb..ad7bb39b79 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,7 +450,7 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
-test_expect_success 'ref transaction: many concurrent writers' '
+test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-09-30 22:19       ` Josh Steadmon
  2024-10-01  4:27         ` Patrick Steinhardt
@ 2024-10-01  7:34         ` Junio C Hamano
  2024-10-01 18:53           ` Josh Steadmon
  1 sibling, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2024-10-01  7:34 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: Jeff King, Patrick Steinhardt, git, karthik nayak, Eric Sunshine,
	James Liu

Josh Steadmon <steadmon@google.com> writes:

> On 2024.09.27 00:07, Jeff King wrote:
>> On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
>> 
>> > +test_expect_success 'ref transaction: many concurrent writers' '
>> > +	test_when_finished "rm -rf repo" &&
>> > +	git init repo &&
>> > +	(
>> > +		cd repo &&
>> > +		# Set a high timeout such that a busy CI machine will not abort
>> > +		# early. 10 seconds should hopefully be ample of time to make
>> > +		# this non-flaky.
>> > +		git config set reftable.lockTimeout 10000 &&
>> ...
>
> We're seeing repeated failures from this test case with ASan enabled.
> Unfortunately, we've only been able to reproduce this on our
> $DAYJOB-specific build system. I haven't been able to get it to fail
> using just the upstream Makefile so far. I'll keep trying to find a way
> to reproduce this.
>
> FWIW, we're not getting I/O errors, we see the following:
> fatal: update_ref failed for ref 'refs/heads/branch-20': cannot lock references
>
> We tried increasing the timeout in the test to 2 minutes (up from 10s),
> but it didn't fix the failures.

Thanks for a report, and please keep digging ;-).

Is your build, like Peff's, for Windows, or your variant of Linux?

Thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-10-01  7:34         ` Junio C Hamano
@ 2024-10-01 18:53           ` Josh Steadmon
  2024-10-01 19:08             ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Josh Steadmon @ 2024-10-01 18:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Patrick Steinhardt, git, karthik nayak, Eric Sunshine,
	James Liu

On 2024.10.01 00:34, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> > On 2024.09.27 00:07, Jeff King wrote:
> >> On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
> >> 
> >> > +test_expect_success 'ref transaction: many concurrent writers' '
> >> > +	test_when_finished "rm -rf repo" &&
> >> > +	git init repo &&
> >> > +	(
> >> > +		cd repo &&
> >> > +		# Set a high timeout such that a busy CI machine will not abort
> >> > +		# early. 10 seconds should hopefully be ample of time to make
> >> > +		# this non-flaky.
> >> > +		git config set reftable.lockTimeout 10000 &&
> >> ...
> >
> > We're seeing repeated failures from this test case with ASan enabled.
> > Unfortunately, we've only been able to reproduce this on our
> > $DAYJOB-specific build system. I haven't been able to get it to fail
> > using just the upstream Makefile so far. I'll keep trying to find a way
> > to reproduce this.
> >
> > FWIW, we're not getting I/O errors, we see the following:
> > fatal: update_ref failed for ref 'refs/heads/branch-20': cannot lock references
> >
> > We tried increasing the timeout in the test to 2 minutes (up from 10s),
> > but it didn't fix the failures.
> 
> Thanks for a report, and please keep digging ;-).
> 
> Is your build, like Peff's, for Windows, or your variant of Linux?
> 
> Thanks.

It's our internal Debian-based variant of Linux. Sorry for not
specifying earlier.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-10-01 18:53           ` Josh Steadmon
@ 2024-10-01 19:08             ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2024-10-01 19:08 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: Junio C Hamano, Patrick Steinhardt, git, karthik nayak,
	Eric Sunshine, James Liu

On Tue, Oct 01, 2024 at 11:53:54AM -0700, Josh Steadmon wrote:

> > > We're seeing repeated failures from this test case with ASan enabled.
> > > Unfortunately, we've only been able to reproduce this on our
> > > $DAYJOB-specific build system. I haven't been able to get it to fail
> > > using just the upstream Makefile so far. I'll keep trying to find a way
> > > to reproduce this.
> > >
> > > FWIW, we're not getting I/O errors, we see the following:
> > > fatal: update_ref failed for ref 'refs/heads/branch-20': cannot lock references
> > >
> > > We tried increasing the timeout in the test to 2 minutes (up from 10s),
> > > but it didn't fix the failures.
> > 
> > Thanks for a report, and please keep digging ;-).
> > 
> > Is your build, like Peff's, for Windows, or your variant of Linux?
> > 
> > Thanks.
> 
> It's our internal Debian-based variant of Linux. Sorry for not
> specifying earlier.

I just tried doing (on 'next', since the new test is added there):

  make SANITIZE=address,undefined CFLAGS=-O0
  cd t
  ./t0610-reftable-basics.sh --run=1-47 --stress

and got lots of failures like you describe. But in my case bumping the
timeout from 10s to 100s made that go away. So I think it was just the
sanitizers making things _really_ slow and causing legit timeouts.

Have you tried going beyond 2 minutes? That seems like a crazy amount of
time, but I guess it's possible on a really overloaded system. Of course
it's also quite possible that there are multiple issues and you are
seeing something else entirely. :)

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-10-01  4:27         ` Patrick Steinhardt
@ 2024-10-01 22:54           ` Jeff King
  2024-10-01 23:24             ` Junio C Hamano
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2024-10-01 22:54 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Josh Steadmon, git, karthik nayak, Eric Sunshine, James Liu,
	Junio C Hamano

On Tue, Oct 01, 2024 at 06:27:33AM +0200, Patrick Steinhardt wrote:

> If this is causing problems for folks I'd say we can do the below change
> for now. It's of course only a stop-gap solution until I find the time
> to debug this, which should be later this week or early next week.
> 
> Patrick
> 
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 2d951c8ceb..ad7bb39b79 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -450,7 +450,7 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
>  	)
>  '
>  
> -test_expect_success 'ref transaction: many concurrent writers' '
> +test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
>  	test_when_finished "rm -rf repo" &&
>  	git init repo &&
>  	(

IMHO we can live with a flaky test for a little while. It's not like
it's the only one. ;) And hopefully your digging turns up a real
solution.

It also sounds from subsequent discussion that Josh's issue was on
Linux, so it wouldn't help there.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-10-01 22:54           ` Jeff King
@ 2024-10-01 23:24             ` Junio C Hamano
  2024-10-02 10:58               ` Patrick Steinhardt
  0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2024-10-01 23:24 UTC (permalink / raw)
  To: Jeff King
  Cc: Patrick Steinhardt, Josh Steadmon, git, karthik nayak,
	Eric Sunshine, James Liu

Jeff King <peff@peff.net> writes:

> On Tue, Oct 01, 2024 at 06:27:33AM +0200, Patrick Steinhardt wrote:
>
>> If this is causing problems for folks I'd say we can do the below change
>> for now. It's of course only a stop-gap solution until I find the time
>> to debug this, which should be later this week or early next week.
>> 
>> Patrick
>> 
>> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
>> index 2d951c8ceb..ad7bb39b79 100755
>> --- a/t/t0610-reftable-basics.sh
>> +++ b/t/t0610-reftable-basics.sh
>> @@ -450,7 +450,7 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
>>  	)
>>  '
>>  
>> -test_expect_success 'ref transaction: many concurrent writers' '
>> +test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
>>  	test_when_finished "rm -rf repo" &&
>>  	git init repo &&
>>  	(
>
> IMHO we can live with a flaky test for a little while. It's not like
> it's the only one. ;) And hopefully your digging turns up a real
> solution.
>
> It also sounds from subsequent discussion that Josh's issue was on
> Linux, so it wouldn't help there.

That's true.  WINDOWS prereq would not help there, even though it
would hide the breakage from CI.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction
  2024-10-01 23:24             ` Junio C Hamano
@ 2024-10-02 10:58               ` Patrick Steinhardt
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Steinhardt @ 2024-10-02 10:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Josh Steadmon, git, karthik nayak, Eric Sunshine,
	James Liu

On Tue, Oct 01, 2024 at 04:24:50PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > On Tue, Oct 01, 2024 at 06:27:33AM +0200, Patrick Steinhardt wrote:
> >
> >> If this is causing problems for folks I'd say we can do the below change
> >> for now. It's of course only a stop-gap solution until I find the time
> >> to debug this, which should be later this week or early next week.
> >> 
> >> Patrick
> >> 
> >> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> >> index 2d951c8ceb..ad7bb39b79 100755
> >> --- a/t/t0610-reftable-basics.sh
> >> +++ b/t/t0610-reftable-basics.sh
> >> @@ -450,7 +450,7 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
> >>  	)
> >>  '
> >>  
> >> -test_expect_success 'ref transaction: many concurrent writers' '
> >> +test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
> >>  	test_when_finished "rm -rf repo" &&
> >>  	git init repo &&
> >>  	(
> >
> > IMHO we can live with a flaky test for a little while. It's not like
> > it's the only one. ;) And hopefully your digging turns up a real
> > solution.
> >
> > It also sounds from subsequent discussion that Josh's issue was on
> > Linux, so it wouldn't help there.
> 
> That's true.  WINDOWS prereq would not help there, even though it
> would hide the breakage from CI.

Okay, fair enough, thanks. I should finally get to this issue either on
Friday or early next week.

Patrick

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2024-10-02 10:58 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-17 13:43 [PATCH 0/3] reftable: graceful concurrent writes Patrick Steinhardt
2024-09-17 13:43 ` [PATCH 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
2024-09-17 17:46   ` karthik nayak
2024-09-17 17:50     ` Eric Sunshine
2024-09-18  4:31       ` Patrick Steinhardt
2024-09-18  4:31     ` Patrick Steinhardt
2024-09-17 13:43 ` [PATCH 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
2024-09-17 13:43 ` [PATCH 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
2024-09-17 18:26 ` [PATCH 0/3] reftable: graceful concurrent writes karthik nayak
2024-09-18  4:31   ` Patrick Steinhardt
2024-09-18  4:32 ` [PATCH v2 " Patrick Steinhardt
2024-09-18  4:32   ` [PATCH v2 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
2024-09-18  9:22     ` James Liu
2024-09-18  9:39       ` Patrick Steinhardt
2024-09-18  4:32   ` [PATCH v2 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
2024-09-18  9:26     ` James Liu
2024-09-18  9:39       ` Patrick Steinhardt
2024-09-18  4:32   ` [PATCH v2 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
2024-09-18  9:33   ` [PATCH v2 0/3] reftable: graceful concurrent writes James Liu
2024-09-18  9:59 ` [PATCH v3 " Patrick Steinhardt
2024-09-18  9:59   ` [PATCH v3 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
2024-09-19 21:34     ` Junio C Hamano
2024-09-18  9:59   ` [PATCH v3 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
2024-09-20 18:10     ` Junio C Hamano
2024-09-24  5:33       ` Patrick Steinhardt
2024-09-18  9:59   ` [PATCH v3 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
2024-09-18 23:23   ` [PATCH v3 0/3] reftable: graceful concurrent writes James Liu
2024-09-24  5:33     ` Patrick Steinhardt
2024-09-24  5:32 ` [PATCH v4 " Patrick Steinhardt
2024-09-24  5:33   ` [PATCH v4 1/3] refs/reftable: introduce "reftable.lockTimeout" Patrick Steinhardt
2024-09-24  5:33   ` [PATCH v4 2/3] reftable/stack: allow locking of outdated stacks Patrick Steinhardt
2024-09-24  5:33   ` [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction Patrick Steinhardt
2024-09-27  4:07     ` Jeff King
2024-09-30  6:49       ` Patrick Steinhardt
2024-09-30 22:19       ` Josh Steadmon
2024-10-01  4:27         ` Patrick Steinhardt
2024-10-01 22:54           ` Jeff King
2024-10-01 23:24             ` Junio C Hamano
2024-10-02 10:58               ` Patrick Steinhardt
2024-10-01  7:34         ` Junio C Hamano
2024-10-01 18:53           ` Josh Steadmon
2024-10-01 19:08             ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).