Git development
 help / color / mirror / Atom feed
* [PATCH v2 02/11] reftable: handle interrupted reads
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]

There are calls to pread(3P) and read(3P) where we don't properly handle
interrupts. Convert them to use `pread_in_full()` and `read_in_full()`,
respectively.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/blocksource.c | 2 +-
 reftable/stack.c       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/reftable/blocksource.c b/reftable/blocksource.c
index 8331b34e82..a1ea304429 100644
--- a/reftable/blocksource.c
+++ b/reftable/blocksource.c
@@ -109,7 +109,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
 	struct file_block_source *b = v;
 	assert(off + size <= b->size);
 	dest->data = reftable_malloc(size);
-	if (pread(b->fd, dest->data, size, off) != size)
+	if (pread_in_full(b->fd, dest->data, size, off) != size)
 		return -1;
 	dest->len = size;
 	return size;
diff --git a/reftable/stack.c b/reftable/stack.c
index ddbdf1b9c8..ed108a929b 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -92,7 +92,7 @@ static int fd_read_lines(int fd, char ***namesp)
 	}
 
 	buf = reftable_malloc(size + 1);
-	if (read(fd, buf, size) != size) {
+	if (read_in_full(fd, buf, size) != size) {
 		err = REFTABLE_IO_ERROR;
 		goto done;
 	}
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 03/11] reftable: handle interrupted writes
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]

There are calls to write(3P) where we don't properly handle interrupts.
Convert them to use `write_in_full()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c      | 6 +++---
 reftable/stack_test.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index ed108a929b..f0cadad490 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -42,7 +42,7 @@ static void stack_filename(struct strbuf *dest, struct reftable_stack *st,
 static ssize_t reftable_fd_write(void *arg, const void *data, size_t sz)
 {
 	int *fdp = (int *)arg;
-	return write(*fdp, data, sz);
+	return write_in_full(*fdp, data, sz);
 }
 
 int reftable_new_stack(struct reftable_stack **dest, const char *dir,
@@ -554,7 +554,7 @@ int reftable_addition_commit(struct reftable_addition *add)
 		strbuf_addstr(&table_list, "\n");
 	}
 
-	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	err = write_in_full(add->lock_file_fd, table_list.buf, table_list.len);
 	strbuf_release(&table_list);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
@@ -1024,7 +1024,7 @@ static int stack_compact_range(struct reftable_stack *st, int first, int last,
 		strbuf_addstr(&ref_list_contents, "\n");
 	}
 
-	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	err = write_in_full(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
 		unlink(new_table_path.buf);
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index d0b717510f..0644c8ad2e 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -78,7 +78,7 @@ static void test_read_file(void)
 	int i = 0;
 
 	EXPECT(fd > 0);
-	n = write(fd, out, strlen(out));
+	n = write_in_full(fd, out, strlen(out));
 	EXPECT(n == strlen(out));
 	err = close(fd);
 	EXPECT(err >= 0);
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 04/11] reftable/stack: verify that `reftable_stack_add()` uses auto-compaction
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

While we have several tests that check whether we correctly perform
auto-compaction when manually calling `reftable_stack_auto_compact()`,
we don't have any tests that verify whether `reftable_stack_add()` does
call it automatically. Add one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack_test.c | 47 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 0644c8ad2e..c979d177c2 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -850,6 +850,52 @@ static void test_reftable_stack_auto_compaction(void)
 	clear_dir(dir);
 }
 
+static void test_reftable_stack_add_performs_auto_compaction(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	char *dir = get_tmp_dir(__LINE__);
+	int err, i, n = 20;
+
+	err = reftable_new_stack(&st, dir, cfg);
+	EXPECT_ERR(err);
+
+	for (i = 0; i <= n; i++) {
+		struct reftable_ref_record ref = {
+			.update_index = reftable_stack_next_update_index(st),
+			.value_type = REFTABLE_REF_SYMREF,
+			.value.symref = "master",
+		};
+		char name[100];
+
+		/*
+		 * Disable auto-compaction for all but the last runs. Like this
+		 * we can ensure that we indeed honor this setting and have
+		 * better control over when exactly auto compaction runs.
+		 */
+		st->disable_auto_compact = i != n;
+
+		snprintf(name, sizeof(name), "branch%04d", i);
+		ref.refname = name;
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		EXPECT_ERR(err);
+
+		/*
+		 * The stack length should grow continuously for all runs where
+		 * auto compaction is disabled. When enabled, we should merge
+		 * all tables in the stack.
+		 */
+		if (i != n)
+			EXPECT(st->merged->stack_len == i + 1);
+		else
+			EXPECT(st->merged->stack_len == 1);
+	}
+
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
 static void test_reftable_stack_compaction_concurrent(void)
 {
 	struct reftable_write_options cfg = { 0 };
@@ -960,6 +1006,7 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_add);
 	RUN_TEST(test_reftable_stack_add_one);
 	RUN_TEST(test_reftable_stack_auto_compaction);
+	RUN_TEST(test_reftable_stack_add_performs_auto_compaction);
 	RUN_TEST(test_reftable_stack_compaction_concurrent);
 	RUN_TEST(test_reftable_stack_compaction_concurrent_clean);
 	RUN_TEST(test_reftable_stack_hash_id);
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 05/11] reftable/stack: perform auto-compaction with transactional interface
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4295 bytes --]

Whenever updating references or reflog entries in the reftable stack, we
need to add a new table to the stack, thus growing the stack's length by
one. It can thus happen quite fast that the stack grows very long, which
results in performance issues when trying to read records. But besides
performance issues, this can also lead to exhaustion of file descriptors
very rapidly as every single table requires a separate descriptor when
opening the stack.

While git-pack-refs(1) fixes this issue for us by merging the tables, it
runs too irregularly to keep the length of the stack within reasonable
limits. This is why the reftable stack has an auto-compaction mechanism:
`reftable_stack_add()` will call `reftable_stack_auto_compact()` after
its added the new table, which will auto-compact the stack as required.

But while this logic works alright for `reftable_stack_add()`, we do not
do the same in `reftable_addition_commit()`, which is the transactional
equivalent to the former function that allows us to write multiple
updates to the stack atomically. Consequentially, we will easily run
into file descriptor exhaustion in code paths that use many separate
transactions like e.g. non-atomic fetches.

Fix this issue by calling `reftable_stack_auto_compact()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c      |  6 +++++
 reftable/stack_test.c | 56 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)

diff --git a/reftable/stack.c b/reftable/stack.c
index f0cadad490..f5d18a842a 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -584,6 +584,12 @@ int reftable_addition_commit(struct reftable_addition *add)
 	add->new_tables_len = 0;
 
 	err = reftable_stack_reload(add->stack);
+	if (err)
+		goto done;
+
+	if (!add->stack->disable_auto_compact)
+		err = reftable_stack_auto_compact(add->stack);
+
 done:
 	reftable_addition_close(add);
 	return err;
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index c979d177c2..4c2f794c49 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -289,6 +289,61 @@ static void test_reftable_stack_transaction_api(void)
 	clear_dir(dir);
 }
 
+static void test_reftable_stack_transaction_api_performs_auto_compaction(void)
+{
+	char *dir = get_tmp_dir(__LINE__);
+	struct reftable_write_options cfg = {0};
+	struct reftable_addition *add = NULL;
+	struct reftable_stack *st = NULL;
+	int i, n = 20, err;
+
+	err = reftable_new_stack(&st, dir, cfg);
+	EXPECT_ERR(err);
+
+	for (i = 0; i <= n; i++) {
+		struct reftable_ref_record ref = {
+			.update_index = reftable_stack_next_update_index(st),
+			.value_type = REFTABLE_REF_SYMREF,
+			.value.symref = "master",
+		};
+		char name[100];
+
+		snprintf(name, sizeof(name), "branch%04d", i);
+		ref.refname = name;
+
+		/*
+		 * Disable auto-compaction for all but the last runs. Like this
+		 * we can ensure that we indeed honor this setting and have
+		 * better control over when exactly auto compaction runs.
+		 */
+		st->disable_auto_compact = i != n;
+
+		err = reftable_stack_new_addition(&add, st);
+		EXPECT_ERR(err);
+
+		err = reftable_addition_add(add, &write_test_ref, &ref);
+		EXPECT_ERR(err);
+
+		err = reftable_addition_commit(add);
+		EXPECT_ERR(err);
+
+		reftable_addition_destroy(add);
+
+		/*
+		 * The stack length should grow continuously for all runs where
+		 * auto compaction is disabled. When enabled, we should merge
+		 * all tables in the stack.
+		 */
+		if (i != n)
+			EXPECT(st->merged->stack_len == i + 1);
+		else
+			EXPECT(st->merged->stack_len == 1);
+	}
+
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
 static void test_reftable_stack_validate_refname(void)
 {
 	struct reftable_write_options cfg = { 0 };
@@ -1014,6 +1069,7 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_log_normalize);
 	RUN_TEST(test_reftable_stack_tombstone);
 	RUN_TEST(test_reftable_stack_transaction_api);
+	RUN_TEST(test_reftable_stack_transaction_api_performs_auto_compaction);
 	RUN_TEST(test_reftable_stack_update_index_check);
 	RUN_TEST(test_reftable_stack_uptodate);
 	RUN_TEST(test_reftable_stack_validate_refname);
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 06/11] reftable/stack: reuse buffers when reloading stack
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2328 bytes --]

In `reftable_stack_reload_once()` we iterate over all the tables added
to the stack in order to figure out whether any of the tables needs to
be reloaded. We use a set of buffers in this context to compute the
paths of these tables, but discard those buffers on every iteration.
This is quite wasteful given that we do not need to transfer ownership
of the allocated buffer outside of the loop.

Refactor the code to instead reuse the buffers to reduce the number of
allocations we need to do.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index f5d18a842a..2dd2373360 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -204,6 +204,7 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
 		reftable_calloc(sizeof(struct reftable_table) * names_len);
 	int new_readers_len = 0;
 	struct reftable_merged_table *new_merged = NULL;
+	struct strbuf table_path = STRBUF_INIT;
 	int i;
 
 	while (*names) {
@@ -223,13 +224,10 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
 
 		if (!rd) {
 			struct reftable_block_source src = { NULL };
-			struct strbuf table_path = STRBUF_INIT;
 			stack_filename(&table_path, st, name);
 
 			err = reftable_block_source_from_file(&src,
 							      table_path.buf);
-			strbuf_release(&table_path);
-
 			if (err < 0)
 				goto done;
 
@@ -267,16 +265,13 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
 	for (i = 0; i < cur_len; i++) {
 		if (cur[i]) {
 			const char *name = reader_name(cur[i]);
-			struct strbuf filename = STRBUF_INIT;
-			stack_filename(&filename, st, name);
+			stack_filename(&table_path, st, name);
 
 			reader_close(cur[i]);
 			reftable_reader_free(cur[i]);
 
 			/* On Windows, can only unlink after closing. */
-			unlink(filename.buf);
-
-			strbuf_release(&filename);
+			unlink(table_path.buf);
 		}
 	}
 
@@ -288,6 +283,7 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
 	reftable_free(new_readers);
 	reftable_free(new_tables);
 	reftable_free(cur);
+	strbuf_release(&table_path);
 	return err;
 }
 
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 07/11] reftable/stack: fix stale lock when dying
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4515 bytes --]

When starting a transaction via `reftable_stack_init_addition()`, we
create a lockfile for the reftable stack itself which we'll write the
new list of tables to. But if we terminate abnormally e.g. via a call to
`die()`, then we do not remove the lockfile. Subsequent executions of
Git which try to modify references will thus fail with an out-of-date
error.

Fix this bug by registering the lock as a `struct tempfile`, which
ensures automatic cleanup for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 47 +++++++++++++++--------------------------------
 1 file changed, 15 insertions(+), 32 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 2dd2373360..2f1494aef2 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -17,6 +17,8 @@ license that can be found in the LICENSE file or at
 #include "reftable-merged.h"
 #include "writer.h"
 
+#include "tempfile.h"
+
 static int stack_try_add(struct reftable_stack *st,
 			 int (*write_table)(struct reftable_writer *wr,
 					    void *arg),
@@ -440,8 +442,7 @@ static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
 }
 
 struct reftable_addition {
-	int lock_file_fd;
-	struct strbuf lock_file_name;
+	struct tempfile *lock_file;
 	struct reftable_stack *stack;
 
 	char **new_tables;
@@ -449,24 +450,19 @@ struct reftable_addition {
 	uint64_t next_update_index;
 };
 
-#define REFTABLE_ADDITION_INIT                \
-	{                                     \
-		.lock_file_name = STRBUF_INIT \
-	}
+#define REFTABLE_ADDITION_INIT {0}
 
 static int reftable_stack_init_addition(struct reftable_addition *add,
 					struct reftable_stack *st)
 {
+	struct strbuf lock_file_name = STRBUF_INIT;
 	int err = 0;
 	add->stack = st;
 
-	strbuf_reset(&add->lock_file_name);
-	strbuf_addstr(&add->lock_file_name, st->list_file);
-	strbuf_addstr(&add->lock_file_name, ".lock");
+	strbuf_addf(&lock_file_name, "%s.lock", st->list_file);
 
-	add->lock_file_fd = open(add->lock_file_name.buf,
-				 O_EXCL | O_CREAT | O_WRONLY, 0666);
-	if (add->lock_file_fd < 0) {
+	add->lock_file = create_tempfile(lock_file_name.buf);
+	if (!add->lock_file) {
 		if (errno == EEXIST) {
 			err = REFTABLE_LOCK_ERROR;
 		} else {
@@ -475,7 +471,7 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 		goto done;
 	}
 	if (st->config.default_permissions) {
-		if (chmod(add->lock_file_name.buf, st->config.default_permissions) < 0) {
+		if (chmod(lock_file_name.buf, st->config.default_permissions) < 0) {
 			err = REFTABLE_IO_ERROR;
 			goto done;
 		}
@@ -495,6 +491,7 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 	if (err) {
 		reftable_addition_close(add);
 	}
+	strbuf_release(&lock_file_name);
 	return err;
 }
 
@@ -512,15 +509,7 @@ static void reftable_addition_close(struct reftable_addition *add)
 	add->new_tables = NULL;
 	add->new_tables_len = 0;
 
-	if (add->lock_file_fd > 0) {
-		close(add->lock_file_fd);
-		add->lock_file_fd = 0;
-	}
-	if (add->lock_file_name.len > 0) {
-		unlink(add->lock_file_name.buf);
-		strbuf_release(&add->lock_file_name);
-	}
-
+	delete_tempfile(&add->lock_file);
 	strbuf_release(&nm);
 }
 
@@ -536,8 +525,10 @@ void reftable_addition_destroy(struct reftable_addition *add)
 int reftable_addition_commit(struct reftable_addition *add)
 {
 	struct strbuf table_list = STRBUF_INIT;
+	int lock_file_fd = get_tempfile_fd(add->lock_file);
 	int i = 0;
 	int err = 0;
+
 	if (add->new_tables_len == 0)
 		goto done;
 
@@ -550,28 +541,20 @@ int reftable_addition_commit(struct reftable_addition *add)
 		strbuf_addstr(&table_list, "\n");
 	}
 
-	err = write_in_full(add->lock_file_fd, table_list.buf, table_list.len);
+	err = write_in_full(lock_file_fd, table_list.buf, table_list.len);
 	strbuf_release(&table_list);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
 		goto done;
 	}
 
-	err = close(add->lock_file_fd);
-	add->lock_file_fd = 0;
-	if (err < 0) {
-		err = REFTABLE_IO_ERROR;
-		goto done;
-	}
-
-	err = rename(add->lock_file_name.buf, add->stack->list_file);
+	err = rename_tempfile(&add->lock_file, add->stack->list_file);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
 		goto done;
 	}
 
 	/* success, no more state to clean up. */
-	strbuf_release(&add->lock_file_name);
 	for (i = 0; i < add->new_tables_len; i++) {
 		reftable_free(add->new_tables[i]);
 	}
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 08/11] reftable/stack: fix use of unseeded randomness
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2117 bytes --]

When writing a new reftable stack, Git will first create the stack with
a random suffix so that concurrent updates will not try to write to the
same file. This random suffix is computed via a call to rand(3P). But we
never seed the function via srand(3P), which means that the suffix is in
fact always the same.

Fix this bug by using `git_rand()` instead, which does not need to be
initialized. While this function is likely going to be slower depending
on the platform, this slowness should not matter in practice as we only
use it when writing a new reftable stack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/readwrite_test.c | 6 +++---
 reftable/stack.c          | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 469ab79a5a..278663f22d 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -141,8 +141,8 @@ static void test_log_buffer_size(void)
 	*/
 	uint8_t hash1[GIT_SHA1_RAWSZ], hash2[GIT_SHA1_RAWSZ];
 	for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
-		hash1[i] = (uint8_t)(rand() % 256);
-		hash2[i] = (uint8_t)(rand() % 256);
+		hash1[i] = (uint8_t)(git_rand() % 256);
+		hash2[i] = (uint8_t)(git_rand() % 256);
 	}
 	log.value.update.old_hash = hash1;
 	log.value.update.new_hash = hash2;
@@ -320,7 +320,7 @@ static void test_log_zlib_corruption(void)
 	};
 
 	for (i = 0; i < sizeof(message) - 1; i++)
-		message[i] = (uint8_t)(rand() % 64 + ' ');
+		message[i] = (uint8_t)(git_rand() % 64 + ' ');
 
 	reftable_writer_set_limits(w, 1, 1);
 
diff --git a/reftable/stack.c b/reftable/stack.c
index 2f1494aef2..95963f67a2 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -434,7 +434,7 @@ int reftable_stack_add(struct reftable_stack *st,
 static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
 {
 	char buf[100];
-	uint32_t rnd = (uint32_t)rand();
+	uint32_t rnd = (uint32_t)git_rand();
 	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x",
 		 min, max, rnd);
 	strbuf_reset(dest);
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 09/11] reftable/merged: reuse buffer to compute record keys
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3361 bytes --]

When iterating over entries in the merged iterator's queue, we compute
the key of each of the entries and write it into a buffer. We do not
reuse the buffer though and thus re-allocate it on every iteration,
which is wasteful given that we never transfer ownership of the
allocated bytes outside of the loop.

Refactor the code to reuse the buffer. This also fixes a potential
memory leak when `merged_iter_advance_subiter()` returns an error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 31 ++++++++++++++++---------------
 reftable/merged.h |  2 ++
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 5ded470c08..556bb5c556 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -52,6 +52,8 @@ static void merged_iter_close(void *p)
 		reftable_iterator_destroy(&mi->stack[i]);
 	}
 	reftable_free(mi->stack);
+	strbuf_release(&mi->key);
+	strbuf_release(&mi->entry_key);
 }
 
 static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
@@ -85,7 +87,6 @@ static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
 static int merged_iter_next_entry(struct merged_iter *mi,
 				  struct reftable_record *rec)
 {
-	struct strbuf entry_key = STRBUF_INIT;
 	struct pq_entry entry = { 0 };
 	int err = 0;
 
@@ -105,33 +106,31 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 	  such a deployment, the loop below must be changed to collect all
 	  entries for the same key, and return new the newest one.
 	*/
-	reftable_record_key(&entry.rec, &entry_key);
+	reftable_record_key(&entry.rec, &mi->entry_key);
 	while (!merged_iter_pqueue_is_empty(mi->pq)) {
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
-		struct strbuf k = STRBUF_INIT;
-		int err = 0, cmp = 0;
+		int cmp = 0;
 
-		reftable_record_key(&top.rec, &k);
+		reftable_record_key(&top.rec, &mi->key);
 
-		cmp = strbuf_cmp(&k, &entry_key);
-		strbuf_release(&k);
-
-		if (cmp > 0) {
+		cmp = strbuf_cmp(&mi->key, &mi->entry_key);
+		if (cmp > 0)
 			break;
-		}
 
 		merged_iter_pqueue_remove(&mi->pq);
 		err = merged_iter_advance_subiter(mi, top.index);
-		if (err < 0) {
-			return err;
-		}
+		if (err < 0)
+			goto done;
 		reftable_record_release(&top.rec);
 	}
 
 	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+
+done:
 	reftable_record_release(&entry.rec);
-	strbuf_release(&entry_key);
-	return 0;
+	strbuf_release(&mi->entry_key);
+	strbuf_release(&mi->key);
+	return err;
 }
 
 static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
@@ -248,6 +247,8 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 		.typ = reftable_record_type(rec),
 		.hash_id = mt->hash_id,
 		.suppress_deletions = mt->suppress_deletions,
+		.key = STRBUF_INIT,
+		.entry_key = STRBUF_INIT,
 	};
 	int n = 0;
 	int err = 0;
diff --git a/reftable/merged.h b/reftable/merged.h
index 7d9f95d27e..d5b39dfe7f 100644
--- a/reftable/merged.h
+++ b/reftable/merged.h
@@ -31,6 +31,8 @@ struct merged_iter {
 	uint8_t typ;
 	int suppress_deletions;
 	struct merged_iter_pqueue pq;
+	struct strbuf key;
+	struct strbuf entry_key;
 };
 
 void merged_table_release(struct reftable_merged_table *mt);
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 10/11] reftable/block: introduce macro to initialize `struct block_iter`
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3434 bytes --]

There are a bunch of locations where we initialize members of `struct
block_iter`, which makes it harder than necessary to expand this struct
to have additional members. Unify the logic via a new `BLOCK_ITER_INIT`
macro that initializes all members.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c      | 4 +---
 reftable/block.h      | 4 ++++
 reftable/block_test.c | 4 ++--
 reftable/iter.h       | 8 ++++----
 reftable/reader.c     | 7 +++----
 5 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 34d4d07369..8c6a8c77fc 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -389,9 +389,7 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 	struct reftable_record rec = reftable_new_record(block_reader_type(br));
 	struct strbuf key = STRBUF_INIT;
 	int err = 0;
-	struct block_iter next = {
-		.last_key = STRBUF_INIT,
-	};
+	struct block_iter next = BLOCK_ITER_INIT;
 
 	int i = binsearch(br->restart_count, &restart_key_less, &args);
 	if (args.error) {
diff --git a/reftable/block.h b/reftable/block.h
index 87c77539b5..51699af233 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -86,6 +86,10 @@ struct block_iter {
 	struct strbuf last_key;
 };
 
+#define BLOCK_ITER_INIT { \
+	.last_key = STRBUF_INIT, \
+}
+
 /* initializes a block reader. */
 int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 		      uint32_t header_off, uint32_t table_block_size,
diff --git a/reftable/block_test.c b/reftable/block_test.c
index cb88af4a56..c00bbc8aed 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -32,7 +32,7 @@ static void test_block_read_write(void)
 	int i = 0;
 	int n;
 	struct block_reader br = { 0 };
-	struct block_iter it = { .last_key = STRBUF_INIT };
+	struct block_iter it = BLOCK_ITER_INIT;
 	int j = 0;
 	struct strbuf want = STRBUF_INIT;
 
@@ -87,7 +87,7 @@ static void test_block_read_write(void)
 	block_iter_close(&it);
 
 	for (i = 0; i < N; i++) {
-		struct block_iter it = { .last_key = STRBUF_INIT };
+		struct block_iter it = BLOCK_ITER_INIT;
 		strbuf_reset(&want);
 		strbuf_addstr(&want, names[i]);
 
diff --git a/reftable/iter.h b/reftable/iter.h
index 09eb0cbfa5..47d67d84df 100644
--- a/reftable/iter.h
+++ b/reftable/iter.h
@@ -53,10 +53,10 @@ struct indexed_table_ref_iter {
 	int is_finished;
 };
 
-#define INDEXED_TABLE_REF_ITER_INIT                                     \
-	{                                                               \
-		.cur = { .last_key = STRBUF_INIT }, .oid = STRBUF_INIT, \
-	}
+#define INDEXED_TABLE_REF_ITER_INIT { \
+	.cur = BLOCK_ITER_INIT, \
+	.oid = STRBUF_INIT, \
+}
 
 void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
 					  struct indexed_table_ref_iter *itr);
diff --git a/reftable/reader.c b/reftable/reader.c
index b4db23ce18..9de64f50b4 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -224,10 +224,9 @@ struct table_iter {
 	struct block_iter bi;
 	int is_finished;
 };
-#define TABLE_ITER_INIT                          \
-	{                                        \
-		.bi = {.last_key = STRBUF_INIT } \
-	}
+#define TABLE_ITER_INIT { \
+	.bi = BLOCK_ITER_INIT \
+}
 
 static void table_iter_copy_from(struct table_iter *dest,
 				 struct table_iter *src)
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 11/11] reftable/block: reuse buffer to compute record keys
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3135 bytes --]

When iterating over entries in the block iterator we compute the key of
each of the entries and write it into a buffer. We do not reuse the
buffer though and thus re-allocate it on every iteration, which is
wasteful.

Refactor the code to reuse the buffer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 19 ++++++++-----------
 reftable/block.h |  2 ++
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 8c6a8c77fc..1df3d8a0f0 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -323,30 +323,28 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 		.len = it->br->block_len - it->next_off,
 	};
 	struct string_view start = in;
-	struct strbuf key = STRBUF_INIT;
 	uint8_t extra = 0;
 	int n = 0;
 
 	if (it->next_off >= it->br->block_len)
 		return 1;
 
-	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	n = reftable_decode_key(&it->key, &extra, it->last_key, in);
 	if (n < 0)
 		return -1;
 
-	if (!key.len)
+	if (!it->key.len)
 		return REFTABLE_FORMAT_ERROR;
 
 	string_view_consume(&in, n);
-	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	n = reftable_record_decode(rec, it->key, extra, in, it->br->hash_size);
 	if (n < 0)
 		return -1;
 	string_view_consume(&in, n);
 
 	strbuf_reset(&it->last_key);
-	strbuf_addbuf(&it->last_key, &key);
+	strbuf_addbuf(&it->last_key, &it->key);
 	it->next_off += start.len - in.len;
-	strbuf_release(&key);
 	return 0;
 }
 
@@ -377,6 +375,7 @@ int block_iter_seek(struct block_iter *it, struct strbuf *want)
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
+	strbuf_release(&it->key);
 }
 
 int block_reader_seek(struct block_reader *br, struct block_iter *it,
@@ -387,7 +386,6 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 		.r = br,
 	};
 	struct reftable_record rec = reftable_new_record(block_reader_type(br));
-	struct strbuf key = STRBUF_INIT;
 	int err = 0;
 	struct block_iter next = BLOCK_ITER_INIT;
 
@@ -414,8 +412,8 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 		if (err < 0)
 			goto done;
 
-		reftable_record_key(&rec, &key);
-		if (err > 0 || strbuf_cmp(&key, want) >= 0) {
+		reftable_record_key(&rec, &it->key);
+		if (err > 0 || strbuf_cmp(&it->key, want) >= 0) {
 			err = 0;
 			goto done;
 		}
@@ -424,8 +422,7 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 	}
 
 done:
-	strbuf_release(&key);
-	strbuf_release(&next.last_key);
+	block_iter_close(&next);
 	reftable_record_release(&rec);
 
 	return err;
diff --git a/reftable/block.h b/reftable/block.h
index 51699af233..17481e6331 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -84,10 +84,12 @@ struct block_iter {
 
 	/* key for last entry we read. */
 	struct strbuf last_key;
+	struct strbuf key;
 };
 
 #define BLOCK_ITER_INIT { \
 	.last_key = STRBUF_INIT, \
+	.key = STRBUF_INIT, \
 }
 
 /* initializes a block reader. */
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* Re: [PATCH 2/2] checkout: forbid "-B <branch>" from touching a branch used elsewhere
From: Junio C Hamano @ 2023-12-08 17:13 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Willem Verstraeten, phillip.wood, git, Jeff King
In-Reply-To: <CAPig+cSGF+vQrnD0f99cbdpQOOC7X6ULa9tFe+FwVrG0SF4PGg@mail.gmail.com>

Eric Sunshine <sunshine@sunshineco.com> writes:

>    Needs review and documentation updates.
>
> I'm not sure if the "Needs review" comment is still applicable since
> the patch did get some review comments, however, the mentioned
> documentation update is probably still needed for this series to
> graduate.

Thanks.  I think "-B" being defined as "branch -f <branch>" followed
by "checkout <branch>" makes it technically unnecessary to add any
new documentation (because "checkout <branch>" will refuse, so it
naturally follows that "checkout -B <branch>" should), but giving
the failure mode a bit more explicit mention would be more helpful
to readers.

Here is to illustrate what I have in mind.  The mention of the
"transactional" was already in the documentation for the "checkout"
back when switch was described at d787d311 (checkout: split part of
it to new command 'switch', 2019-03-29), but somehow was left out in
the documentation of the "switch".  While it is not incorrect to say
that it is a convenient short-cut, it is more important to say what
happens when one of them fails, so I am tempted to port that
description over to the "switch" command, and give the "used elsewhere"
as a sample failure mode.

The test has been also enhanced to check the "transactional" nature.

 Documentation/git-checkout.txt |  4 +++-
 Documentation/git-switch.txt   |  9 +++++++--
 t/t2400-worktree-add.sh        | 18 ++++++++++++++++--
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git c/Documentation/git-checkout.txt w/Documentation/git-checkout.txt
index 240c54639e..55a50b5b23 100644
--- c/Documentation/git-checkout.txt
+++ w/Documentation/git-checkout.txt
@@ -63,7 +63,9 @@ $ git checkout <branch>
 ------------
 +
 that is to say, the branch is not reset/created unless "git checkout" is
-successful.
+successful (e.g., when the branch is in use in another worktree, not
+just the current branch stays the same, but the branch is not reset to
+the start-point, either).
 
 'git checkout' --detach [<branch>]::
 'git checkout' [--detach] <commit>::
diff --git c/Documentation/git-switch.txt w/Documentation/git-switch.txt
index c60fc9c138..6137421ede 100644
--- c/Documentation/git-switch.txt
+++ w/Documentation/git-switch.txt
@@ -59,13 +59,18 @@ out at most one of `A` and `B`, in which case it defaults to `HEAD`.
 -c <new-branch>::
 --create <new-branch>::
 	Create a new branch named `<new-branch>` starting at
-	`<start-point>` before switching to the branch. This is a
-	convenient shortcut for:
+	`<start-point>` before switching to the branch. This is the
+	transactional equivalent of
 +
 ------------
 $ git branch <new-branch>
 $ git switch <new-branch>
 ------------
++
+that is to say, the branch is not reset/created unless "git switch" is
+successful (e.g., when the branch is in use in another worktree, not
+just the current branch stays the same, but the branch is not reset to
+the start-point, either).
 
 -C <new-branch>::
 --force-create <new-branch>::
diff --git c/t/t2400-worktree-add.sh w/t/t2400-worktree-add.sh
index bbcb2d3419..5d5064e63d 100755
--- c/t/t2400-worktree-add.sh
+++ w/t/t2400-worktree-add.sh
@@ -129,8 +129,22 @@ test_expect_success 'die the same branch is already checked out' '
 test_expect_success 'refuse to reset a branch in use elsewhere' '
 	(
 		cd here &&
-		test_must_fail git checkout -B newmain 2>actual &&
-		grep "already used by worktree at" actual
+
+		# we know we are on detached HEAD but just in case ...
+		git checkout --detach HEAD &&
+		git rev-parse --verify HEAD >old.head &&
+
+		git rev-parse --verify refs/heads/newmain >old.branch &&
+		test_must_fail git checkout -B newmain 2>error &&
+		git rev-parse --verify refs/heads/newmain >new.branch &&
+		git rev-parse --verify HEAD >new.head &&
+
+		grep "already used by worktree at" error &&
+		test_cmp old.branch new.branch &&
+		test_cmp old.head new.head &&
+
+		# and we must be still on the same detached HEAD state
+		test_must_fail git symbolic-ref HEAD
 	)
 '
 

^ permalink raw reply related

* getting git send-email patches from someone who is behind
From: Sandra Snan @ 2023-12-08 17:50 UTC (permalink / raw)
  To: git

I have a li'l git send-email question. Someone sent me a patch set 
of six patches today but they were not on most current main. I had 
to guess what version they were sending to so I could git am when 
I was on that particular version. I managed to sort it all out so 
this question is more for future reference.

Isn't there a way inside of the emails that it can show what 
version to apply the patches to?

Because now I was like "OK, I remember talking to them the other 
day and that means they probably are on what for me is HEAD^^" and 
that turned out to be correct, and sorting out the conflicts was 
also easy enough,
but if I hadn't talked to them beforehand I would've been 
completely lost.

I asked another friend about it and he said:

> it's possible to record the base commit: 
> https://git-scm.com/docs/git-format-patch#_base_tree_information
> however, it's a bit finicky to do with git-send-email

I dunno.

I get that one of the fun parts about using patches instead of PRs 
is
that you can be a li'l more loosey goosey about exactly what 
commit
something is supposed to belong to but here I would've been 
completely
lost because the patchset just borked horribly right from the 
first patch.

If others have run into this, what's the solution?

^ permalink raw reply

* Re: [PATCH 1/1] MyFirstContribution: configure shallow threads for git format-patch
From: Junio C Hamano @ 2023-12-08 18:06 UTC (permalink / raw)
  To: Stan Hu; +Cc: git, Patrick Steinhardt
In-Reply-To: <e197cbd28135df5523ff5ba1688566edb831f037.1701927372.git.stanhu@gmail.com>

Stan Hu <stanhu@gmail.com> writes:

> +[[configure-shallow-threads]]
> +=== Configuring shallow threads for `git format-patch`
> +
> +It is common practice on the Git mailing list to use "shallow" threads,
> +where every mail is a reply to the first mail of the series. You can
> +configure the default threading style of `git format-patch` via:
> +
> +-----
> +$ git config format.thread shallow
> +----


Hmph, I do not think I have such an option defined, yet I send out
my series as "shallow" threads.  I think the new description is
somewhat misleading as-is.  Isn't it applicable ONLY to those who
let "format-patch" decide the message ID for patches?  If the user
lets "git send-email" assign the message IDs (and hence the thread
structure), the configuration variable would not apply, no?

^ permalink raw reply

* Re: [PATCH] setup: recognize bare repositories with packed-refs
From: Junio C Hamano @ 2023-12-08 18:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Adam Majer, git
In-Reply-To: <20231128190446.GA10477@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> So with regards to the loosening in your patch, my questions would be:
>
>   - if we are going to change the rules for repository detection, is
>     this where we want to end up? We haven't changed them (yet) for
>     reftables. If we are going to do so, should we have a scheme that
>     will work for that transition, too? The "refs is an empty file"
>     scheme would fix your use case, too (though see below).
>
>   - is the rest of Git ready to handle a missing "refs/" directory? It
>     looks like making a ref will auto-create it (since we may have to
>     make refs/foo/bar/... anyway).
>
>   - what about other implementations? Your embedded repos will
>     presumably not work with libgit2, jgit, etc, until they also get
>     similar patches.
>
>   - what about empty repositories? In that case there will be no "refs/"
>     file and no "packed-refs" file (such a repository is less likely, of
>     course, but it may contain objects but no refs, or the point may be
>     to have an empty repo as a test vector). Likewise, it is possible
>     for a repository to have an empty "objects" directory (even with a
>     non-empty refs directory, if there are only symrefs), and your patch
>     doesn't address that.

All good points.

> Getting back to your use case, I'd suggest one of:
>
>   - do the usual "touch refs/.gitignore" trick to explicitly track the
>     empty directory. It looks like the ref code will ignore this (we
>     don't allow ref names to start with "." in a path component)
>
>   - whatever is consuming the embedded repos could "mkdir -p refs
>     objects" as needed. This is a minor pain, but I think in the long
>     term we are moving to a world where you have to explicitly do
>     "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
>     they're already special and require some setup; adding an extra step
>     may not be so bad.

Yeah, it truly is caused by the combination of the fact that we do
not "track" empty directories and that skeleton Git repository
structure does rely on possibly empty directories.  The above two
are reasonable workarounds when you are dealing with any medium that
does not allow empty directories, not just working tree managed by
Git.

^ permalink raw reply

* Re: getting git send-email patches from someone who is behind
From: Konstantin Ryabitsev @ 2023-12-08 18:49 UTC (permalink / raw)
  To: Sandra Snan; +Cc: git
In-Reply-To: <87h6ksk2hk.fsf@ellen.idiomdrottning.org>

On Fri, Dec 08, 2023 at 06:50:31PM +0100, Sandra Snan wrote:
> I have a li'l git send-email question. Someone sent me a patch set of six
> patches today but they were not on most current main. I had to guess what
> version they were sending to so I could git am when I was on that particular
> version. I managed to sort it all out so this question is more for future
> reference.
> 
> Isn't there a way inside of the emails that it can show what version to
> apply the patches to?
> 
> Because now I was like "OK, I remember talking to them the other day and
> that means they probably are on what for me is HEAD^^" and that turned out
> to be correct, and sorting out the conflicts was also easy enough,
> but if I hadn't talked to them beforehand I would've been completely lost.
> 
> I asked another friend about it and he said:
> 
> > it's possible to record the base commit:
> > https://git-scm.com/docs/git-format-patch#_base_tree_information
> > however, it's a bit finicky to do with git-send-email
> 
> I dunno.
> 
> I get that one of the fun parts about using patches instead of PRs is
> that you can be a li'l more loosey goosey about exactly what commit
> something is supposed to belong to but here I would've been completely
> lost because the patchset just borked horribly right from the first patch.
> 
> If others have run into this, what's the solution?

Yes, setting base-commit is the right solution here -- it will tell you
exactly where in the tree it belongs. In its absence, tools like b4 will also
try to guess where the series might belong by comparing the file index
information mentioned in the patches. It's a clever, but not exact process,
because the patch may depend on changes in the files that aren't being
modified.

In general, having a base-commit: line in the cover letter or first patch is
absolutely the right way to go.

-K

^ permalink raw reply

* Re: getting git send-email patches from someone who is behind
From: Sandra Snan @ 2023-12-08 18:57 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: git
In-Reply-To: <20231208-amusing-vengeful-raptor-e71a8b@meerkat>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> Yes, setting base-commit is the right solution here -- it will 
> tell you exactly where in the tree it belongs.

I didn't know that before today and that's good advice when 
sending patches, but if I'm receiving patches without that field 
set, I wonder how people are dealing with that.

> In its absence, tools like b4 will also try to guess where the 
> series might belong by comparing the file index information 
> mentioned in the patches.

Ah, this! https://b4.docs.kernel.org/en/latest/

This looks wonderful, thank you!

^ permalink raw reply

* Re: [BUG] rev-list doesn't validate arguments to -n option
From: Junio C Hamano @ 2023-12-08 20:36 UTC (permalink / raw)
  To: git; +Cc: Britton Kerin
In-Reply-To: <CAC4O8c-nuOTS=a0sVp1603KaM2bZjs+yNZzdAaa5CGTNGFE7hQ@mail.gmail.com>

Britton Kerin <britton.kerin@gmail.com> writes:

> It tolerates non-numeric arguments and garbage after a number:
>
> For example:
>
> $ # -n 1 means same as -n 0:
> $ git rev-list -n q newest_commit
> $ git rev-list -n 0 newest_commit
> $ # Garbage after number is tolerated:
> $ git rev-list -n 1q newest_commit
> 3be33f83695088d968cf084a1a08bdcde25a8d7a
> $ git rev-list -n 2q newest_commit
> 3be33f83695088d968cf084a1a08bdcde25a8d7a
> 286e62e1b68e39334978e6222cbff187ecc17bcf

Indeed, unlike the option parsing for most commands, "rev-list" and
"log" family of commands in the oldest part of the system still use
hand-crafted option parsing and most notably use atoi() instead of a
more robust strtol() family of functions.  The same issue is present
for parsing things like "--max-count=1q" and "--skip=1q".

Perhaps a change along this line, plus a few new tests, would
suffice.  There are others (like "--max-age" we see in the post
context of the patch) that use atoi() but they probably should not
share the same machinery (most importantly, the type of the value)
as "--max-count", so I didn't touch it in the below.

 revision.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git c/revision.c w/revision.c
index 00d5c29bfc..99e838bbd1 100644
--- c/revision.c
+++ w/revision.c
@@ -2223,6 +2223,15 @@ static void add_message_grep(struct rev_info *revs, const char *pattern)
 	add_grep(revs, pattern, GREP_PATTERN_BODY);
 }
 
+static int parse_count(const char *arg)
+{
+	int count;
+
+	if (strtol_i(arg, 10, &count) < 0 || count < 0)
+		die("'%s': not a non-negative integer", arg);
+	return count;
+}
+
 static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv,
 			       int *unkc, const char **unkv,
 			       const struct setup_revision_opt* opt)
@@ -2249,26 +2258,24 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	}
 
 	if ((argcount = parse_long_opt("max-count", argv, &optarg))) {
-		revs->max_count = atoi(optarg);
+		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
 		return argcount;
 	} else if ((argcount = parse_long_opt("skip", argv, &optarg))) {
-		revs->skip_count = atoi(optarg);
+		revs->skip_count = parse_count(optarg);
 		return argcount;
 	} else if ((*arg == '-') && isdigit(arg[1])) {
 		/* accept -<digit>, like traditional "head" */
-		if (strtol_i(arg + 1, 10, &revs->max_count) < 0 ||
-		    revs->max_count < 0)
-			die("'%s': not a non-negative integer", arg + 1);
+		revs->max_count = parse_count(arg + 1);
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "-n")) {
 		if (argc <= 1)
 			return error("-n requires an argument");
-		revs->max_count = atoi(argv[1]);
+		revs->max_count = parse_count(argv[1]);
 		revs->no_walk = 0;
 		return 2;
 	} else if (skip_prefix(arg, "-n", &optarg)) {
-		revs->max_count = atoi(optarg);
+		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
 	} else if ((argcount = parse_long_opt("max-age", argv, &optarg))) {
 		revs->max_age = atoi(optarg);

^ permalink raw reply related

* Re: [Outreachy][PATCH v3] t2400: avoid using pipes
From: Junio C Hamano @ 2023-12-08 21:00 UTC (permalink / raw)
  To: Achu Luma; +Cc: christian.couder, git
In-Reply-To: <20231204153740.2992-1-ach.lumap@gmail.com>

Achu Luma <ach.lumap@gmail.com> writes:

> Subject: Re: [Outreachy][PATCH v3] t2400: avoid using pipes

"avoid using pipes" is a means to an end.  And it is more important
to tell readers what that "end" is.  With this patch, what are we
trying to achieve?  Cater to platforms that lack pipes?  Help
platforms that cannot run two processes at the same time, so let one
run and store the result in a file, and then let the other one run,
to reduce the CPU load?

If we run a "git" command, especially a command we are testing, on
the upstream side of a pipe, we lose information.  We cannot tell
what exit status the command exited with.  That is what we care
about.

So, it is better to say that in the title, e.g.,

    Subject: [PATCH] t2400: avoid losing exit status to pipes

> The exit code of the preceding command in a pipe is disregarded,
> so it's advisable to refrain from relying on it.

It is unclear what "it" refers to here.  We cannot rely on the exit
code of the command on the upstream side of a pipe, obviously.

> Instead, by
> saving the output of a Git command to a file, we gain the
> ability to examine the exit codes of both commands separately.

Surely.  I personally think that the title that says what the
purpose of the patch is clearly should be sufficient without any
further description in the body, though.
>
> Signed-off-by: Achu Luma <ach.lumap@gmail.com>
> ---
>  Since v2 I don't send a cover  letter anymore, and I changed 
>  my "Signed-of-by: ..." line so that it
>  contains my full real name and I added "Outreachy" to the subject.

Nicely done.

>
>  t/t2400-worktree-add.sh | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh
> index df4aff7825..7ead05bb98 100755
> --- a/t/t2400-worktree-add.sh
> +++ b/t/t2400-worktree-add.sh
> @@ -468,7 +468,8 @@ test_expect_success 'put a worktree under rebase' '
>  		cd under-rebase &&
>  		set_fake_editor &&
>  		FAKE_LINES="edit 1" git rebase -i HEAD^ &&
> -		git worktree list | grep "under-rebase.*detached HEAD"
> +		git worktree list >actual && 
> +		grep "under-rebase.*detached HEAD" actual
>  	)
>  '
>  
> @@ -509,7 +510,8 @@ test_expect_success 'checkout a branch under bisect' '
>  		git bisect start &&
>  		git bisect bad &&
>  		git bisect good HEAD~2 &&
> -		git worktree list | grep "under-bisect.*detached HEAD" &&
> +		git worktree list >actual && 
> +		grep "under-bisect.*detached HEAD" actual &&
>  		test_must_fail git worktree add new-bisect under-bisect &&
>  		! test -d new-bisect
>  	)

^ permalink raw reply

* Re: [PATCH] setup: recognize bare repositories with packed-refs
From: Taylor Blau @ 2023-12-08 21:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Adam Majer, git
In-Reply-To: <20231206210836.GA106480@coredump.intra.peff.net>

On Wed, Dec 06, 2023 at 04:08:36PM -0500, Jeff King wrote:
> On Wed, Nov 29, 2023 at 04:30:46PM -0500, Taylor Blau wrote:
>
> > On Tue, Nov 28, 2023 at 02:04:46PM -0500, Jeff King wrote:
> > >   - whatever is consuming the embedded repos could "mkdir -p refs
> > >     objects" as needed. This is a minor pain, but I think in the long
> > >     term we are moving to a world where you have to explicitly do
> > >     "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
> > >     they're already special and require some setup; adding an extra step
> > >     may not be so bad.
> >
> > I hope not. I suppose that using embedded bare repositories in a test
> > requires additional setup at least to "cd" into the directory (if they
> > are not using `$GIT_DIR` or `--git-dir` already). But I fear that
> > imposing even a small change like this is too tall an order for how many
> > millions of these exist in the wild across all sorts of projects.
>
> I dunno. I am skeptical that there are millions of these. Who really
> wants to embed bare git repos except for projects related to Git itself,
> which want test vectors? Is there a use case I'm missing?

Just picking on GitHub as an example, my copy has a fair number of
embedded bare repositories:

    $ find . -mindepth 2 -type d -name '*.git' | wc -l
    279

That might be an unfair example in general, since GitHub probably has a
greater need to embed bare repositories than most other projects. But I
think that we shouldn't make our decision here based on volume of
embedded bare repositories, but rather on the number of projects which
have >1 embedded bare repository.

In other words, the cost of migrating a single project's embedded bare
repositories is roughly the same whether there are 1 or 279 of them. So
the effort scales with the number of projects, not repositories.

Perhaps I'm over-estimating how difficult this transition would be to
impose on users. But it does make me very leery to make this kind of a
change without having a better sense of how many of them exist in the
wild.

Searching just on GitHub for `path:**/*.git/config` [^1], it looks like
there are ~1,400 results. That provides us an upper-bound on the number
of projects which have embedded bare repositories, so perhaps I really
am overestimating the burden we'd be imposing on other projects.

I dunno :-).

Thanks,
Taylor

[^1]: Searching for "path:**/*.git" doesn't quite work, since GitHub's
  search doesn't match directories here. So the search I actually used
  isn't perfect, but it should give us a rough approximation.

^ permalink raw reply

* Re: Minor UX annoyance w/`git add --patch untracked/file`
From: Junio C Hamano @ 2023-12-08 21:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Vito Caputo, git
In-Reply-To: <20231206195407.GD103708@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> They _could_, but keep in mind that the argument is not strictly a path.
> It is a pathspec that may match multiple paths. So:
>
>   git add -p path/to/
>
> for example will pick up the tracked files in path/to/, but not your
> untracked one.

The corresponding command w/o "-p", i.e., "git add path/to/", will
pick up both tracked and untracked ones from the named directory,
while honoring the ignore settings.  So I suspect it might feel more
natural if "-p" followed suit.

Not that I feel strongly either way.  The command has only worked
with already tracked files since its inception and nobody complained
in the past 15 years or so, probably because nobody cared that much
for relatively rare event of creating a new file and adding it.

Thanks.

^ permalink raw reply

* Re: [PATCH] gitk: add setting to hide unknown refs
From: Junio C Hamano @ 2023-12-08 21:13 UTC (permalink / raw)
  To: Joachim B Haga via GitGitGadget; +Cc: git, Joachim B Haga
In-Reply-To: <pull.1619.git.1701695899635.gitgitgadget@gmail.com>

"Joachim B Haga via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch adds a setting to gitk to exclude all unknown refs - which
> is considerably simpler than trying to honour the `excludeDecoration`
> pattern.

"This was simpler to implement" is a one-time cost savings for the
developer who added the feature.  For that one-time cost savings,
all the current and future users will pay the price of inconsistent
behaviour between "gitk" and "git log".

It does not look like a good trade-off.

^ permalink raw reply

* Re: [PATCH v2 02/11] reftable: handle interrupted reads
From: Taylor Blau @ 2023-12-08 21:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <b404fdf066e802328bcbcefeb9da7c996738f840.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:02PM +0100, Patrick Steinhardt wrote:
> There are calls to pread(3P) and read(3P) where we don't properly handle
> interrupts. Convert them to use `pread_in_full()` and `read_in_full()`,

Just checking... do you mean "interrupt" in the kernel sense? Or are you
referring to the possibility of short reads/writes (in later patches)?

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 04/11] reftable/stack: verify that `reftable_stack_add()` uses auto-compaction
From: Taylor Blau @ 2023-12-08 21:35 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <8061b9d2fcb3e8c3d1fd641e705b9a8879e452f4.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:10PM +0100, Patrick Steinhardt wrote:
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 0644c8ad2e..c979d177c2 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -850,6 +850,52 @@ static void test_reftable_stack_auto_compaction(void)
>  	clear_dir(dir);
>  }
>
> +static void test_reftable_stack_add_performs_auto_compaction(void)
> +{
> +	struct reftable_write_options cfg = { 0 };
> +	struct reftable_stack *st = NULL;
> +	char *dir = get_tmp_dir(__LINE__);
> +	int err, i, n = 20;
> +
> +	err = reftable_new_stack(&st, dir, cfg);
> +	EXPECT_ERR(err);
> +
> +	for (i = 0; i <= n; i++) {
> +		struct reftable_ref_record ref = {
> +			.update_index = reftable_stack_next_update_index(st),
> +			.value_type = REFTABLE_REF_SYMREF,
> +			.value.symref = "master",
> +		};
> +		char name[100];
> +
> +		/*
> +		 * Disable auto-compaction for all but the last runs. Like this
> +		 * we can ensure that we indeed honor this setting and have
> +		 * better control over when exactly auto compaction runs.
> +		 */
> +		st->disable_auto_compact = i != n;
> +
> +		snprintf(name, sizeof(name), "branch%04d", i);
> +		ref.refname = name;

Is there a reason that we have to use snprintf() here and not a strbuf?

I would have expected to see something like:

    struct strbuf buf = STRBUF_INIT;
    /* ... */
    strbuf_addf(&buf, "branch%04d", i);
    ref.refname = strbuf_detach(&buf, NULL);

I guess it doesn't matter too much, but I think if we can avoid using
snprintf(), it's worth doing. If we must use snprintf() here, we should
probably use Git's xsnprintf() instead.

> +		err = reftable_stack_add(st, &write_test_ref, &ref);
> +		EXPECT_ERR(err);
> +
> +		/*
> +		 * The stack length should grow continuously for all runs where
> +		 * auto compaction is disabled. When enabled, we should merge
> +		 * all tables in the stack.
> +		 */
> +		if (i != n)
> +			EXPECT(st->merged->stack_len == i + 1);
> +		else
> +			EXPECT(st->merged->stack_len == 1);

You could shorten this to

    EXPECT(st->merged->stack_len == (i == n ? 1 : i + 1);

But I like the version that you wrote here better, because it clearly
indicates when we should and should not perform compaction.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 05/11] reftable/stack: perform auto-compaction with transactional interface
From: Taylor Blau @ 2023-12-08 22:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <77b9ae8aa675dd96dd10f4a5369f1f994fa59939.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:14PM +0100, Patrick Steinhardt wrote:
> [...] It can thus happen quite fast that the stack grows very long, which
> results in performance issues when trying to read records.

This sentence was a little confusing to read at first glance. Perhaps
instead:

    The stack can grow to become quite long rather quickly, leading to
    performance issues when trying to read records.

> But besides
> performance issues, this can also lead to exhaustion of file descriptors
> very rapidly as every single table requires a separate descriptor when
> opening the stack.

Well explained, thanks. Everything else here looks good to me.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 06/11] reftable/stack: reuse buffers when reloading stack
From: Taylor Blau @ 2023-12-08 22:17 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <f797feff8dec383f1db9ae403cd89b80d1743432.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:18PM +0100, Patrick Steinhardt wrote:
> In `reftable_stack_reload_once()` we iterate over all the tables added
> to the stack in order to figure out whether any of the tables needs to
> be reloaded. We use a set of buffers in this context to compute the
> paths of these tables, but discard those buffers on every iteration.
> This is quite wasteful given that we do not need to transfer ownership
> of the allocated buffer outside of the loop.
>
> Refactor the code to instead reuse the buffers to reduce the number of
> allocations we need to do.

> @@ -267,16 +265,13 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
>  	for (i = 0; i < cur_len; i++) {
>  		if (cur[i]) {
>  			const char *name = reader_name(cur[i]);
> -			struct strbuf filename = STRBUF_INIT;
> -			stack_filename(&filename, st, name);
> +			stack_filename(&table_path, st, name);

This initially caught me by surprise, but on closer inspection I agree
that this is OK, since stack_filename() calls strbuf_reset() before
adjusting the buffer contents.

(As a side-note, I do find the side-effect of stack_filename() to be a
little surprising, but that's not the fault of this series and not worth
changing here.)

Thanks,
Taylor

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox