Git development


* getting git send-email patches from someone who is behind
From: Sandra Snan @ 2023-12-08 17:50 UTC (permalink / raw)
  To: git

I have a li'l git send-email question. Someone sent me a patch set 
of six patches today but they were not on most current main. I had 
to guess what version they were sending to so I could git am when 
I was on that particular version. I managed to sort it all out so 
this question is more for future reference.

Isn't there a way inside of the emails that it can show what 
version to apply the patches to?

Because now I was like "OK, I remember talking to them the other
day and that means they probably are on what for me is HEAD^^" and
that turned out to be correct, and sorting out the conflicts was
also easy enough, but if I hadn't talked to them beforehand I
would've been completely lost.

I asked another friend about it and he said:

> it's possible to record the base commit: 
> https://git-scm.com/docs/git-format-patch#_base_tree_information
> however, it's a bit finicky to do with git-send-email

I dunno.

I get that one of the fun parts about using patches instead of PRs
is that you can be a li'l more loosey goosey about exactly what
commit something is supposed to belong to but here I would've been
completely lost because the patchset just borked horribly right
from the first patch.

If others have run into this, what's the solution?

^ permalink raw reply

* Re: [PATCH 2/2] checkout: forbid "-B <branch>" from touching a branch used elsewhere
From: Junio C Hamano @ 2023-12-08 17:13 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Willem Verstraeten, phillip.wood, git, Jeff King
In-Reply-To: <CAPig+cSGF+vQrnD0f99cbdpQOOC7X6ULa9tFe+FwVrG0SF4PGg@mail.gmail.com>

Eric Sunshine <sunshine@sunshineco.com> writes:

>    Needs review and documentation updates.
>
> I'm not sure if the "Needs review" comment is still applicable since
> the patch did get some review comments, however, the mentioned
> documentation update is probably still needed for this series to
> graduate.

Thanks.  I think "-B" being defined as "branch -f <branch>" followed
by "checkout <branch>" makes it technically unnecessary to add any
new documentation (because "checkout <branch>" will refuse, so it
naturally follows that "checkout -B <branch>" should), but giving
the failure mode a bit more explicit mention would be more helpful
to readers.

Here is a patch to illustrate what I have in mind.  The
"transactional" wording was already in the documentation for
"checkout" when "switch" was introduced in d787d311 (checkout: split
part of it to new command 'switch', 2019-03-29), but somehow it was
left out of the documentation for "switch".  While it is not incorrect
to say that it is a convenient short-cut, it is more important to say
what happens when one of the two commands fails, so I am tempted to
port that description over to the "switch" command, and give the
"used elsewhere" case as a sample failure mode.

The test has also been enhanced to check the "transactional" nature.

 Documentation/git-checkout.txt |  4 +++-
 Documentation/git-switch.txt   |  9 +++++++--
 t/t2400-worktree-add.sh        | 18 ++++++++++++++++--
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git c/Documentation/git-checkout.txt w/Documentation/git-checkout.txt
index 240c54639e..55a50b5b23 100644
--- c/Documentation/git-checkout.txt
+++ w/Documentation/git-checkout.txt
@@ -63,7 +63,9 @@ $ git checkout <branch>
 ------------
 +
 that is to say, the branch is not reset/created unless "git checkout" is
-successful.
+successful (e.g., when the branch is in use in another worktree, not
+just the current branch stays the same, but the branch is not reset to
+the start-point, either).
 
 'git checkout' --detach [<branch>]::
 'git checkout' [--detach] <commit>::
diff --git c/Documentation/git-switch.txt w/Documentation/git-switch.txt
index c60fc9c138..6137421ede 100644
--- c/Documentation/git-switch.txt
+++ w/Documentation/git-switch.txt
@@ -59,13 +59,18 @@ out at most one of `A` and `B`, in which case it defaults to `HEAD`.
 -c <new-branch>::
 --create <new-branch>::
 	Create a new branch named `<new-branch>` starting at
-	`<start-point>` before switching to the branch. This is a
-	convenient shortcut for:
+	`<start-point>` before switching to the branch. This is the
+	transactional equivalent of
 +
 ------------
 $ git branch <new-branch>
 $ git switch <new-branch>
 ------------
++
+that is to say, the branch is not reset/created unless "git switch" is
+successful (e.g., when the branch is in use in another worktree, not
+just the current branch stays the same, but the branch is not reset to
+the start-point, either).
 
 -C <new-branch>::
 --force-create <new-branch>::
diff --git c/t/t2400-worktree-add.sh w/t/t2400-worktree-add.sh
index bbcb2d3419..5d5064e63d 100755
--- c/t/t2400-worktree-add.sh
+++ w/t/t2400-worktree-add.sh
@@ -129,8 +129,22 @@ test_expect_success 'die the same branch is already checked out' '
 test_expect_success 'refuse to reset a branch in use elsewhere' '
 	(
 		cd here &&
-		test_must_fail git checkout -B newmain 2>actual &&
-		grep "already used by worktree at" actual
+
+		# we know we are on detached HEAD but just in case ...
+		git checkout --detach HEAD &&
+		git rev-parse --verify HEAD >old.head &&
+
+		git rev-parse --verify refs/heads/newmain >old.branch &&
+		test_must_fail git checkout -B newmain 2>error &&
+		git rev-parse --verify refs/heads/newmain >new.branch &&
+		git rev-parse --verify HEAD >new.head &&
+
+		grep "already used by worktree at" error &&
+		test_cmp old.branch new.branch &&
+		test_cmp old.head new.head &&
+
+		# and we must be still on the same detached HEAD state
+		test_must_fail git symbolic-ref HEAD
 	)
 '
 

^ permalink raw reply related

* [PATCH v2 11/11] reftable/block: reuse buffer to compute record keys
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

When iterating over entries in the block iterator we compute the key of
each of the entries and write it into a buffer. We do not reuse the
buffer though and thus re-allocate it on every iteration, which is
wasteful.

Refactor the code to reuse the buffer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 19 ++++++++-----------
 reftable/block.h |  2 ++
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 8c6a8c77fc..1df3d8a0f0 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -323,30 +323,28 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 		.len = it->br->block_len - it->next_off,
 	};
 	struct string_view start = in;
-	struct strbuf key = STRBUF_INIT;
 	uint8_t extra = 0;
 	int n = 0;
 
 	if (it->next_off >= it->br->block_len)
 		return 1;
 
-	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	n = reftable_decode_key(&it->key, &extra, it->last_key, in);
 	if (n < 0)
 		return -1;
 
-	if (!key.len)
+	if (!it->key.len)
 		return REFTABLE_FORMAT_ERROR;
 
 	string_view_consume(&in, n);
-	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	n = reftable_record_decode(rec, it->key, extra, in, it->br->hash_size);
 	if (n < 0)
 		return -1;
 	string_view_consume(&in, n);
 
 	strbuf_reset(&it->last_key);
-	strbuf_addbuf(&it->last_key, &key);
+	strbuf_addbuf(&it->last_key, &it->key);
 	it->next_off += start.len - in.len;
-	strbuf_release(&key);
 	return 0;
 }
 
@@ -377,6 +375,7 @@ int block_iter_seek(struct block_iter *it, struct strbuf *want)
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
+	strbuf_release(&it->key);
 }
 
 int block_reader_seek(struct block_reader *br, struct block_iter *it,
@@ -387,7 +386,6 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 		.r = br,
 	};
 	struct reftable_record rec = reftable_new_record(block_reader_type(br));
-	struct strbuf key = STRBUF_INIT;
 	int err = 0;
 	struct block_iter next = BLOCK_ITER_INIT;
 
@@ -414,8 +412,8 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 		if (err < 0)
 			goto done;
 
-		reftable_record_key(&rec, &key);
-		if (err > 0 || strbuf_cmp(&key, want) >= 0) {
+		reftable_record_key(&rec, &it->key);
+		if (err > 0 || strbuf_cmp(&it->key, want) >= 0) {
 			err = 0;
 			goto done;
 		}
@@ -424,8 +422,7 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 	}
 
 done:
-	strbuf_release(&key);
-	strbuf_release(&next.last_key);
+	block_iter_close(&next);
 	reftable_record_release(&rec);
 
 	return err;
diff --git a/reftable/block.h b/reftable/block.h
index 51699af233..17481e6331 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -84,10 +84,12 @@ struct block_iter {
 
 	/* key for last entry we read. */
 	struct strbuf last_key;
+	struct strbuf key;
 };
 
 #define BLOCK_ITER_INIT { \
 	.last_key = STRBUF_INIT, \
+	.key = STRBUF_INIT, \
 }
 
 /* initializes a block reader. */
-- 
2.43.0


^ permalink raw reply related
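
The buffer-reuse idea in the patch above can be illustrated with a
rough sketch. This is not code from the series: `struct buf` and its
helpers are hypothetical stand-ins for a growable buffer, not Git's
`strbuf`, and the `allocs` counter exists only to make the effect
visible. Resetting the length keeps the allocation around, whereas
releasing it forces a new allocation on the next iteration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; a stand-in, not Git's strbuf. */
struct buf {
	char *data;
	size_t len, cap;
	unsigned allocs; /* how many times the storage had to grow */
};

/* Drop the contents but keep the allocation for later reuse. */
void buf_reset(struct buf *b)
{
	b->len = 0;
}

/* Free the allocation; the next buf_set() must allocate again. */
void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->cap = 0;
}

/* Replace the contents with the NUL-terminated string s. */
void buf_set(struct buf *b, const char *s)
{
	size_t need = strlen(s) + 1;

	if (need > b->cap) {
		char *p = realloc(b->data, need);
		assert(p);
		b->data = p;
		b->cap = need;
		b->allocs++;
	}
	memcpy(b->data, s, need);
	b->len = need - 1;
}

/*
 * Compute a key n times, either reusing one buffer across iterations
 * or releasing it after every iteration, and return the number of
 * allocations that were needed.
 */
unsigned count_allocs(int n, int reuse)
{
	struct buf key = { 0 };
	int i;

	for (i = 0; i < n; i++) {
		buf_set(&key, "refs/heads/main");
		if (reuse)
			buf_reset(&key);
		else
			buf_release(&key);
	}
	buf_release(&key);
	return key.allocs;
}
```

With these definitions, `count_allocs(100, 0)` allocates once per
iteration while `count_allocs(100, 1)` allocates only on the first
one, which is the saving the commit message describes.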

* [PATCH v2 10/11] reftable/block: introduce macro to initialize `struct block_iter`
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

There are a bunch of locations where we initialize members of `struct
block_iter`, which makes it harder than necessary to expand this struct
to have additional members. Unify the logic via a new `BLOCK_ITER_INIT`
macro that initializes all members.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c      | 4 +---
 reftable/block.h      | 4 ++++
 reftable/block_test.c | 4 ++--
 reftable/iter.h       | 8 ++++----
 reftable/reader.c     | 7 +++----
 5 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 34d4d07369..8c6a8c77fc 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -389,9 +389,7 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 	struct reftable_record rec = reftable_new_record(block_reader_type(br));
 	struct strbuf key = STRBUF_INIT;
 	int err = 0;
-	struct block_iter next = {
-		.last_key = STRBUF_INIT,
-	};
+	struct block_iter next = BLOCK_ITER_INIT;
 
 	int i = binsearch(br->restart_count, &restart_key_less, &args);
 	if (args.error) {
diff --git a/reftable/block.h b/reftable/block.h
index 87c77539b5..51699af233 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -86,6 +86,10 @@ struct block_iter {
 	struct strbuf last_key;
 };
 
+#define BLOCK_ITER_INIT { \
+	.last_key = STRBUF_INIT, \
+}
+
 /* initializes a block reader. */
 int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 		      uint32_t header_off, uint32_t table_block_size,
diff --git a/reftable/block_test.c b/reftable/block_test.c
index cb88af4a56..c00bbc8aed 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -32,7 +32,7 @@ static void test_block_read_write(void)
 	int i = 0;
 	int n;
 	struct block_reader br = { 0 };
-	struct block_iter it = { .last_key = STRBUF_INIT };
+	struct block_iter it = BLOCK_ITER_INIT;
 	int j = 0;
 	struct strbuf want = STRBUF_INIT;
 
@@ -87,7 +87,7 @@ static void test_block_read_write(void)
 	block_iter_close(&it);
 
 	for (i = 0; i < N; i++) {
-		struct block_iter it = { .last_key = STRBUF_INIT };
+		struct block_iter it = BLOCK_ITER_INIT;
 		strbuf_reset(&want);
 		strbuf_addstr(&want, names[i]);
 
diff --git a/reftable/iter.h b/reftable/iter.h
index 09eb0cbfa5..47d67d84df 100644
--- a/reftable/iter.h
+++ b/reftable/iter.h
@@ -53,10 +53,10 @@ struct indexed_table_ref_iter {
 	int is_finished;
 };
 
-#define INDEXED_TABLE_REF_ITER_INIT                                     \
-	{                                                               \
-		.cur = { .last_key = STRBUF_INIT }, .oid = STRBUF_INIT, \
-	}
+#define INDEXED_TABLE_REF_ITER_INIT { \
+	.cur = BLOCK_ITER_INIT, \
+	.oid = STRBUF_INIT, \
+}
 
 void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
 					  struct indexed_table_ref_iter *itr);
diff --git a/reftable/reader.c b/reftable/reader.c
index b4db23ce18..9de64f50b4 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -224,10 +224,9 @@ struct table_iter {
 	struct block_iter bi;
 	int is_finished;
 };
-#define TABLE_ITER_INIT                          \
-	{                                        \
-		.bi = {.last_key = STRBUF_INIT } \
-	}
+#define TABLE_ITER_INIT { \
+	.bi = BLOCK_ITER_INIT \
+}
 
 static void table_iter_copy_from(struct table_iter *dest,
 				 struct table_iter *src)
-- 
2.43.0


^ permalink raw reply related
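
A minimal sketch of why a single initializer macro helps, using
made-up types (`struct text`, `struct inner_iter`, `struct outer_iter`
and their `*_INIT` macros are all hypothetical, not reftable's). It
assumes, as with a buffer type whose initializer is not all zeroes,
that the inner member needs a non-NULL default. Outer initializers
that nest the inner macro pick up new members automatically, while a
plain `{ 0 }` would not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer type whose valid initial state is not all zeroes. */
struct text {
	const char *buf;
	size_t len;
};
#define TEXT_INIT { .buf = "", .len = 0 }

struct inner_iter {
	struct text last_key;
};
/* The one place that knows how to initialize every member. */
#define INNER_ITER_INIT { \
	.last_key = TEXT_INIT, \
}

struct outer_iter {
	struct inner_iter cur;
	int is_finished;
};
/* Nests the inner macro instead of spelling out its members again. */
#define OUTER_ITER_INIT { \
	.cur = INNER_ITER_INIT, \
}

/* Returns a freshly initialized outer iterator. */
struct outer_iter outer_iter_new(void)
{
	struct outer_iter it = OUTER_ITER_INIT;

	assert(it.cur.last_key.buf); /* never NULL after proper init */
	return it;
}
```

If `struct inner_iter` later gains a second member, only
`INNER_ITER_INIT` has to change; `OUTER_ITER_INIT` and every caller of
`outer_iter_new()` stay as they are.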

* [PATCH v2 09/11] reftable/merged: reuse buffer to compute record keys
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

When iterating over entries in the merged iterator's queue, we compute
the key of each of the entries and write it into a buffer. We do not
reuse the buffer though and thus re-allocate it on every iteration,
which is wasteful given that we never transfer ownership of the
allocated bytes outside of the loop.

Refactor the code to reuse the buffer. This also fixes a potential
memory leak when `merged_iter_advance_subiter()` returns an error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 31 ++++++++++++++++---------------
 reftable/merged.h |  2 ++
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 5ded470c08..556bb5c556 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -52,6 +52,8 @@ static void merged_iter_close(void *p)
 		reftable_iterator_destroy(&mi->stack[i]);
 	}
 	reftable_free(mi->stack);
+	strbuf_release(&mi->key);
+	strbuf_release(&mi->entry_key);
 }
 
 static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
@@ -85,7 +87,6 @@ static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
 static int merged_iter_next_entry(struct merged_iter *mi,
 				  struct reftable_record *rec)
 {
-	struct strbuf entry_key = STRBUF_INIT;
 	struct pq_entry entry = { 0 };
 	int err = 0;
 
@@ -105,33 +106,31 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 	  such a deployment, the loop below must be changed to collect all
 	  entries for the same key, and return new the newest one.
 	*/
-	reftable_record_key(&entry.rec, &entry_key);
+	reftable_record_key(&entry.rec, &mi->entry_key);
 	while (!merged_iter_pqueue_is_empty(mi->pq)) {
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
-		struct strbuf k = STRBUF_INIT;
-		int err = 0, cmp = 0;
+		int cmp = 0;
 
-		reftable_record_key(&top.rec, &k);
+		reftable_record_key(&top.rec, &mi->key);
 
-		cmp = strbuf_cmp(&k, &entry_key);
-		strbuf_release(&k);
-
-		if (cmp > 0) {
+		cmp = strbuf_cmp(&mi->key, &mi->entry_key);
+		if (cmp > 0)
 			break;
-		}
 
 		merged_iter_pqueue_remove(&mi->pq);
 		err = merged_iter_advance_subiter(mi, top.index);
-		if (err < 0) {
-			return err;
-		}
+		if (err < 0)
+			goto done;
 		reftable_record_release(&top.rec);
 	}
 
 	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+
+done:
 	reftable_record_release(&entry.rec);
-	strbuf_release(&entry_key);
-	return 0;
+	strbuf_release(&mi->entry_key);
+	strbuf_release(&mi->key);
+	return err;
 }
 
 static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
@@ -248,6 +247,8 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 		.typ = reftable_record_type(rec),
 		.hash_id = mt->hash_id,
 		.suppress_deletions = mt->suppress_deletions,
+		.key = STRBUF_INIT,
+		.entry_key = STRBUF_INIT,
 	};
 	int n = 0;
 	int err = 0;
diff --git a/reftable/merged.h b/reftable/merged.h
index 7d9f95d27e..d5b39dfe7f 100644
--- a/reftable/merged.h
+++ b/reftable/merged.h
@@ -31,6 +31,8 @@ struct merged_iter {
 	uint8_t typ;
 	int suppress_deletions;
 	struct merged_iter_pqueue pq;
+	struct strbuf key;
+	struct strbuf entry_key;
 };
 
 void merged_table_release(struct reftable_merged_table *mt);
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 08/11] reftable/stack: fix use of unseeded randomness
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

When writing a new reftable stack, Git will first create the stack with
a random suffix so that concurrent updates will not try to write to the
same file. This random suffix is computed via a call to rand(3P). But we
never seed the function via srand(3P), which means that the suffix is in
fact always the same.

Fix this bug by using `git_rand()` instead, which does not need to be
initialized. While this function is likely going to be slower depending
on the platform, this slowness should not matter in practice as we only
use it when writing a new reftable stack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/readwrite_test.c | 6 +++---
 reftable/stack.c          | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 469ab79a5a..278663f22d 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -141,8 +141,8 @@ static void test_log_buffer_size(void)
 	*/
 	uint8_t hash1[GIT_SHA1_RAWSZ], hash2[GIT_SHA1_RAWSZ];
 	for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
-		hash1[i] = (uint8_t)(rand() % 256);
-		hash2[i] = (uint8_t)(rand() % 256);
+		hash1[i] = (uint8_t)(git_rand() % 256);
+		hash2[i] = (uint8_t)(git_rand() % 256);
 	}
 	log.value.update.old_hash = hash1;
 	log.value.update.new_hash = hash2;
@@ -320,7 +320,7 @@ static void test_log_zlib_corruption(void)
 	};
 
 	for (i = 0; i < sizeof(message) - 1; i++)
-		message[i] = (uint8_t)(rand() % 64 + ' ');
+		message[i] = (uint8_t)(git_rand() % 64 + ' ');
 
 	reftable_writer_set_limits(w, 1, 1);
 
diff --git a/reftable/stack.c b/reftable/stack.c
index 2f1494aef2..95963f67a2 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -434,7 +434,7 @@ int reftable_stack_add(struct reftable_stack *st,
 static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
 {
 	char buf[100];
-	uint32_t rnd = (uint32_t)rand();
+	uint32_t rnd = (uint32_t)git_rand();
 	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x",
 		 min, max, rnd);
 	strbuf_reset(dest);
-- 
2.43.0


^ permalink raw reply related
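
The effect described in the patch above can be reproduced with plain
ISO C, independent of Git. The sketch below uses hypothetical helper
names; it relies on the C standard's rule that calling rand() before
any srand() yields the same sequence as after srand(1). The check must
run before anything else in the process has called rand() or srand().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format an 8-hex-digit suffix from the next rand() value. */
void make_suffix(char *buf, size_t len)
{
	assert(len >= 9);
	snprintf(buf, len, "%08x", (unsigned int)rand());
}

/*
 * Returns 1 if the first suffix generated without seeding equals the
 * first suffix generated after an explicit srand(1). Only meaningful
 * when called before any other rand()/srand() use in the process.
 */
int unseeded_suffix_is_predictable(void)
{
	char fresh[9], seeded[9];

	make_suffix(fresh, sizeof(fresh));   /* no srand() so far */
	srand(1);
	make_suffix(seeded, sizeof(seeded)); /* restart the sequence */
	return !strcmp(fresh, seeded);
}
```

On a conforming C library the function returns 1, meaning every
process that never seeds the generator starts with the same "random"
suffix.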

* [PATCH v2 07/11] reftable/stack: fix stale lock when dying
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

When starting a transaction via `reftable_stack_init_addition()`, we
create a lockfile for the reftable stack itself which we'll write the
new list of tables to. But if we terminate abnormally e.g. via a call to
`die()`, then we do not remove the lockfile. Subsequent executions of
Git which try to modify references will thus fail with an out-of-date
error.

Fix this bug by registering the lock as a `struct tempfile`, which
ensures automatic cleanup for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 47 +++++++++++++++--------------------------------
 1 file changed, 15 insertions(+), 32 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 2dd2373360..2f1494aef2 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -17,6 +17,8 @@ license that can be found in the LICENSE file or at
 #include "reftable-merged.h"
 #include "writer.h"
 
+#include "tempfile.h"
+
 static int stack_try_add(struct reftable_stack *st,
 			 int (*write_table)(struct reftable_writer *wr,
 					    void *arg),
@@ -440,8 +442,7 @@ static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
 }
 
 struct reftable_addition {
-	int lock_file_fd;
-	struct strbuf lock_file_name;
+	struct tempfile *lock_file;
 	struct reftable_stack *stack;
 
 	char **new_tables;
@@ -449,24 +450,19 @@ struct reftable_addition {
 	uint64_t next_update_index;
 };
 
-#define REFTABLE_ADDITION_INIT                \
-	{                                     \
-		.lock_file_name = STRBUF_INIT \
-	}
+#define REFTABLE_ADDITION_INIT {0}
 
 static int reftable_stack_init_addition(struct reftable_addition *add,
 					struct reftable_stack *st)
 {
+	struct strbuf lock_file_name = STRBUF_INIT;
 	int err = 0;
 	add->stack = st;
 
-	strbuf_reset(&add->lock_file_name);
-	strbuf_addstr(&add->lock_file_name, st->list_file);
-	strbuf_addstr(&add->lock_file_name, ".lock");
+	strbuf_addf(&lock_file_name, "%s.lock", st->list_file);
 
-	add->lock_file_fd = open(add->lock_file_name.buf,
-				 O_EXCL | O_CREAT | O_WRONLY, 0666);
-	if (add->lock_file_fd < 0) {
+	add->lock_file = create_tempfile(lock_file_name.buf);
+	if (!add->lock_file) {
 		if (errno == EEXIST) {
 			err = REFTABLE_LOCK_ERROR;
 		} else {
@@ -475,7 +471,7 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 		goto done;
 	}
 	if (st->config.default_permissions) {
-		if (chmod(add->lock_file_name.buf, st->config.default_permissions) < 0) {
+		if (chmod(lock_file_name.buf, st->config.default_permissions) < 0) {
 			err = REFTABLE_IO_ERROR;
 			goto done;
 		}
@@ -495,6 +491,7 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 	if (err) {
 		reftable_addition_close(add);
 	}
+	strbuf_release(&lock_file_name);
 	return err;
 }
 
@@ -512,15 +509,7 @@ static void reftable_addition_close(struct reftable_addition *add)
 	add->new_tables = NULL;
 	add->new_tables_len = 0;
 
-	if (add->lock_file_fd > 0) {
-		close(add->lock_file_fd);
-		add->lock_file_fd = 0;
-	}
-	if (add->lock_file_name.len > 0) {
-		unlink(add->lock_file_name.buf);
-		strbuf_release(&add->lock_file_name);
-	}
-
+	delete_tempfile(&add->lock_file);
 	strbuf_release(&nm);
 }
 
@@ -536,8 +525,10 @@ void reftable_addition_destroy(struct reftable_addition *add)
 int reftable_addition_commit(struct reftable_addition *add)
 {
 	struct strbuf table_list = STRBUF_INIT;
+	int lock_file_fd = get_tempfile_fd(add->lock_file);
 	int i = 0;
 	int err = 0;
+
 	if (add->new_tables_len == 0)
 		goto done;
 
@@ -550,28 +541,20 @@ int reftable_addition_commit(struct reftable_addition *add)
 		strbuf_addstr(&table_list, "\n");
 	}
 
-	err = write_in_full(add->lock_file_fd, table_list.buf, table_list.len);
+	err = write_in_full(lock_file_fd, table_list.buf, table_list.len);
 	strbuf_release(&table_list);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
 		goto done;
 	}
 
-	err = close(add->lock_file_fd);
-	add->lock_file_fd = 0;
-	if (err < 0) {
-		err = REFTABLE_IO_ERROR;
-		goto done;
-	}
-
-	err = rename(add->lock_file_name.buf, add->stack->list_file);
+	err = rename_tempfile(&add->lock_file, add->stack->list_file);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
 		goto done;
 	}
 
 	/* success, no more state to clean up. */
-	strbuf_release(&add->lock_file_name);
 	for (i = 0; i < add->new_tables_len; i++) {
 		reftable_free(add->new_tables[i]);
 	}
-- 
2.43.0


^ permalink raw reply related
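
To make the stale-lock problem concrete, here is a simplified POSIX
sketch of a lock file that cleans up after itself on exit. It is an
illustration only, not Git's tempfile implementation: `lock_acquire()`
and `lock_release()` are hypothetical names, the code tracks a single
lock at a time, and it covers only termination through exit(), not
signals.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static char lock_path[4096]; /* empty string means "no lock held" */
static int lock_fd = -1;
static int cleanup_registered;

/* Remove the lock file if we still hold it; safe to call repeatedly. */
void lock_release(void)
{
	if (!lock_path[0])
		return;
	close(lock_fd);
	unlink(lock_path);
	lock_fd = -1;
	lock_path[0] = '\0';
}

/*
 * Create `path` exclusively. Returns the file descriptor on success,
 * or -1 with errno set (EEXIST when the lock file already exists).
 */
int lock_acquire(const char *path)
{
	int fd;

	assert(strlen(path) < sizeof(lock_path));
	fd = open(path, O_EXCL | O_CREAT | O_WRONLY, 0666);
	if (fd < 0)
		return -1;

	if (!cleanup_registered) {
		/* Runs on exit(), so an early exit leaves no lock behind. */
		atexit(lock_release);
		cleanup_registered = 1;
	}
	strcpy(lock_path, path);
	lock_fd = fd;
	return fd;
}
```

Without the atexit() registration, a process that exits between
`lock_acquire()` and `lock_release()` would leave the file in place,
and every later `lock_acquire()` on the same path would fail with
EEXIST.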

* [PATCH v2 06/11] reftable/stack: reuse buffers when reloading stack
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

In `reftable_stack_reload_once()` we iterate over all the tables added
to the stack in order to figure out whether any of the tables needs to
be reloaded. We use a set of buffers in this context to compute the
paths of these tables, but discard those buffers on every iteration.
This is quite wasteful given that we do not need to transfer ownership
of the allocated buffer outside of the loop.

Refactor the code to instead reuse the buffers to reduce the number of
allocations we need to do.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index f5d18a842a..2dd2373360 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -204,6 +204,7 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
 		reftable_calloc(sizeof(struct reftable_table) * names_len);
 	int new_readers_len = 0;
 	struct reftable_merged_table *new_merged = NULL;
+	struct strbuf table_path = STRBUF_INIT;
 	int i;
 
 	while (*names) {
@@ -223,13 +224,10 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
 
 		if (!rd) {
 			struct reftable_block_source src = { NULL };
-			struct strbuf table_path = STRBUF_INIT;
 			stack_filename(&table_path, st, name);
 
 			err = reftable_block_source_from_file(&src,
 							      table_path.buf);
-			strbuf_release(&table_path);
-
 			if (err < 0)
 				goto done;
 
@@ -267,16 +265,13 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
 	for (i = 0; i < cur_len; i++) {
 		if (cur[i]) {
 			const char *name = reader_name(cur[i]);
-			struct strbuf filename = STRBUF_INIT;
-			stack_filename(&filename, st, name);
+			stack_filename(&table_path, st, name);
 
 			reader_close(cur[i]);
 			reftable_reader_free(cur[i]);
 
 			/* On Windows, can only unlink after closing. */
-			unlink(filename.buf);
-
-			strbuf_release(&filename);
+			unlink(table_path.buf);
 		}
 	}
 
@@ -288,6 +283,7 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
 	reftable_free(new_readers);
 	reftable_free(new_tables);
 	reftable_free(cur);
+	strbuf_release(&table_path);
 	return err;
 }
 
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 05/11] reftable/stack: perform auto-compaction with transactional interface
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

Whenever updating references or reflog entries in the reftable stack, we
need to add a new table to the stack, thus growing the stack's length by
one. It can thus happen quite fast that the stack grows very long, which
results in performance issues when trying to read records. But besides
performance issues, this can also lead to exhaustion of file descriptors
very rapidly as every single table requires a separate descriptor when
opening the stack.

While git-pack-refs(1) fixes this issue for us by merging the tables, it
runs too irregularly to keep the length of the stack within reasonable
limits. This is why the reftable stack has an auto-compaction mechanism:
`reftable_stack_add()` will call `reftable_stack_auto_compact()` after
it has added the new table, which will auto-compact the stack as
required.

But while this logic works alright for `reftable_stack_add()`, we do not
do the same in `reftable_addition_commit()`, which is the transactional
equivalent to the former function that allows us to write multiple
updates to the stack atomically. Consequently, we will easily run
into file descriptor exhaustion in code paths that use many separate
transactions like e.g. non-atomic fetches.

Fix this issue by calling `reftable_stack_auto_compact()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c      |  6 +++++
 reftable/stack_test.c | 56 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)

diff --git a/reftable/stack.c b/reftable/stack.c
index f0cadad490..f5d18a842a 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -584,6 +584,12 @@ int reftable_addition_commit(struct reftable_addition *add)
 	add->new_tables_len = 0;
 
 	err = reftable_stack_reload(add->stack);
+	if (err)
+		goto done;
+
+	if (!add->stack->disable_auto_compact)
+		err = reftable_stack_auto_compact(add->stack);
+
 done:
 	reftable_addition_close(add);
 	return err;
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index c979d177c2..4c2f794c49 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -289,6 +289,61 @@ static void test_reftable_stack_transaction_api(void)
 	clear_dir(dir);
 }
 
+static void test_reftable_stack_transaction_api_performs_auto_compaction(void)
+{
+	char *dir = get_tmp_dir(__LINE__);
+	struct reftable_write_options cfg = {0};
+	struct reftable_addition *add = NULL;
+	struct reftable_stack *st = NULL;
+	int i, n = 20, err;
+
+	err = reftable_new_stack(&st, dir, cfg);
+	EXPECT_ERR(err);
+
+	for (i = 0; i <= n; i++) {
+		struct reftable_ref_record ref = {
+			.update_index = reftable_stack_next_update_index(st),
+			.value_type = REFTABLE_REF_SYMREF,
+			.value.symref = "master",
+		};
+		char name[100];
+
+		snprintf(name, sizeof(name), "branch%04d", i);
+		ref.refname = name;
+
+		/*
+		 * Disable auto-compaction for all but the last runs. Like this
+		 * we can ensure that we indeed honor this setting and have
+		 * better control over when exactly auto compaction runs.
+		 */
+		st->disable_auto_compact = i != n;
+
+		err = reftable_stack_new_addition(&add, st);
+		EXPECT_ERR(err);
+
+		err = reftable_addition_add(add, &write_test_ref, &ref);
+		EXPECT_ERR(err);
+
+		err = reftable_addition_commit(add);
+		EXPECT_ERR(err);
+
+		reftable_addition_destroy(add);
+
+		/*
+		 * The stack length should grow continuously for all runs where
+		 * auto compaction is disabled. When enabled, we should merge
+		 * all tables in the stack.
+		 */
+		if (i != n)
+			EXPECT(st->merged->stack_len == i + 1);
+		else
+			EXPECT(st->merged->stack_len == 1);
+	}
+
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
 static void test_reftable_stack_validate_refname(void)
 {
 	struct reftable_write_options cfg = { 0 };
@@ -1014,6 +1069,7 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_log_normalize);
 	RUN_TEST(test_reftable_stack_tombstone);
 	RUN_TEST(test_reftable_stack_transaction_api);
+	RUN_TEST(test_reftable_stack_transaction_api_performs_auto_compaction);
 	RUN_TEST(test_reftable_stack_update_index_check);
 	RUN_TEST(test_reftable_stack_uptodate);
 	RUN_TEST(test_reftable_stack_validate_refname);
-- 
2.43.0


^ permalink raw reply related
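
The behaviour that the new test in the patch above checks can be
modelled with a toy stack. This is a deliberately simplified sketch:
`struct toy_stack` and its functions are hypothetical, and compaction
here always merges everything into one table, which mirrors the
test's expectation but not reftable's actual compaction heuristic.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: only the number of tables is tracked. */
struct toy_stack {
	size_t len;
	int disable_auto_compact;
};

/* Merge all tables into a single one. */
void toy_stack_compact(struct toy_stack *st)
{
	if (st->len > 1)
		st->len = 1;
}

/* Every addition appends one table, then compacts unless disabled. */
void toy_stack_add(struct toy_stack *st)
{
	st->len++;
	if (!st->disable_auto_compact)
		toy_stack_compact(st);
}

/*
 * Perform n + 1 additions with auto-compaction disabled for all but
 * the last one, and return the final stack length.
 */
size_t toy_stack_len_after(int n)
{
	struct toy_stack st = { 0 };
	int i;

	assert(n >= 0);
	for (i = 0; i <= n; i++) {
		st.disable_auto_compact = i != n;
		toy_stack_add(&st);
	}
	return st.len;
}
```

While compaction is disabled the length grows by one per addition; the
final addition with compaction enabled brings it back to 1. An
addition path that never calls the compaction step would instead end
at n + 1 tables, each needing its own file descriptor when the stack
is opened.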

* [PATCH v2 04/11] reftable/stack: verify that `reftable_stack_add()` uses auto-compaction
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

While we have several tests that check whether we correctly perform
auto-compaction when manually calling `reftable_stack_auto_compact()`,
we don't have any tests that verify whether `reftable_stack_add()` does
call it automatically. Add one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack_test.c | 47 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 0644c8ad2e..c979d177c2 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -850,6 +850,52 @@ static void test_reftable_stack_auto_compaction(void)
 	clear_dir(dir);
 }
 
+static void test_reftable_stack_add_performs_auto_compaction(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	char *dir = get_tmp_dir(__LINE__);
+	int err, i, n = 20;
+
+	err = reftable_new_stack(&st, dir, cfg);
+	EXPECT_ERR(err);
+
+	for (i = 0; i <= n; i++) {
+		struct reftable_ref_record ref = {
+			.update_index = reftable_stack_next_update_index(st),
+			.value_type = REFTABLE_REF_SYMREF,
+			.value.symref = "master",
+		};
+		char name[100];
+
+		/*
+		 * Disable auto-compaction for all but the last runs. Like this
+		 * we can ensure that we indeed honor this setting and have
+		 * better control over when exactly auto compaction runs.
+		 */
+		st->disable_auto_compact = i != n;
+
+		snprintf(name, sizeof(name), "branch%04d", i);
+		ref.refname = name;
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		EXPECT_ERR(err);
+
+		/*
+		 * The stack length should grow continuously for all runs where
+		 * auto compaction is disabled. When enabled, we should merge
+		 * all tables in the stack.
+		 */
+		if (i != n)
+			EXPECT(st->merged->stack_len == i + 1);
+		else
+			EXPECT(st->merged->stack_len == 1);
+	}
+
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
 static void test_reftable_stack_compaction_concurrent(void)
 {
 	struct reftable_write_options cfg = { 0 };
@@ -960,6 +1006,7 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_add);
 	RUN_TEST(test_reftable_stack_add_one);
 	RUN_TEST(test_reftable_stack_auto_compaction);
+	RUN_TEST(test_reftable_stack_add_performs_auto_compaction);
 	RUN_TEST(test_reftable_stack_compaction_concurrent);
 	RUN_TEST(test_reftable_stack_compaction_concurrent_clean);
 	RUN_TEST(test_reftable_stack_hash_id);
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 03/11] reftable: handle interrupted writes
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]

There are calls to write(3P) where we don't properly handle interrupts.
Convert them to use `write_in_full()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c      | 6 +++---
 reftable/stack_test.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index ed108a929b..f0cadad490 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -42,7 +42,7 @@ static void stack_filename(struct strbuf *dest, struct reftable_stack *st,
 static ssize_t reftable_fd_write(void *arg, const void *data, size_t sz)
 {
 	int *fdp = (int *)arg;
-	return write(*fdp, data, sz);
+	return write_in_full(*fdp, data, sz);
 }
 
 int reftable_new_stack(struct reftable_stack **dest, const char *dir,
@@ -554,7 +554,7 @@ int reftable_addition_commit(struct reftable_addition *add)
 		strbuf_addstr(&table_list, "\n");
 	}
 
-	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	err = write_in_full(add->lock_file_fd, table_list.buf, table_list.len);
 	strbuf_release(&table_list);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
@@ -1024,7 +1024,7 @@ static int stack_compact_range(struct reftable_stack *st, int first, int last,
 		strbuf_addstr(&ref_list_contents, "\n");
 	}
 
-	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	err = write_in_full(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
 		unlink(new_table_path.buf);
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index d0b717510f..0644c8ad2e 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -78,7 +78,7 @@ static void test_read_file(void)
 	int i = 0;
 
 	EXPECT(fd > 0);
-	n = write(fd, out, strlen(out));
+	n = write_in_full(fd, out, strlen(out));
 	EXPECT(n == strlen(out));
 	err = close(fd);
 	EXPECT(err >= 0);
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related
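
Editorial sketch for the patch above: plain write(2) may return early when a signal arrives, or may write fewer bytes than requested, so a caller that issues a single call can silently lose data. The loop below illustrates the retry pattern that a "write in full" helper typically implements. `write_all` is a hypothetical name used only here; it is not Git's `write_in_full()`, whose actual implementation may differ in its details.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Git's implementation): keep calling write(2)
 * until all `len` bytes are written. Returns `len` on success, -1 on error.
 */
static ssize_t write_all(int fd, const void *data, size_t len)
{
	const char *p = data;
	size_t left = len;

	assert(data != NULL || len == 0);
	while (left > 0) {
		ssize_t n = write(fd, p, left);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted before writing; retry */
			return -1;
		}
		if (n == 0) {
			/* No progress; treat as an error instead of spinning. */
			errno = ENOSPC;
			return -1;
		}
		p += n; /* short write: advance and continue */
		left -= (size_t)n;
	}
	return (ssize_t)len;
}
```

With a helper of this shape, the `if (err < 0)` checks in the hunks above keep working unchanged, because a partial write is never reported as success.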

* [PATCH v2 02/11] reftable: handle interrupted reads
From: Patrick Steinhardt @ 2023-12-08 14:53 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]

There are calls to pread(3P) and read(3P) where we don't properly handle
interrupts. Convert them to use `pread_in_full()` and `read_in_full()`,
respectively.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/blocksource.c | 2 +-
 reftable/stack.c       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/reftable/blocksource.c b/reftable/blocksource.c
index 8331b34e82..a1ea304429 100644
--- a/reftable/blocksource.c
+++ b/reftable/blocksource.c
@@ -109,7 +109,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
 	struct file_block_source *b = v;
 	assert(off + size <= b->size);
 	dest->data = reftable_malloc(size);
-	if (pread(b->fd, dest->data, size, off) != size)
+	if (pread_in_full(b->fd, dest->data, size, off) != size)
 		return -1;
 	dest->len = size;
 	return size;
diff --git a/reftable/stack.c b/reftable/stack.c
index ddbdf1b9c8..ed108a929b 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -92,7 +92,7 @@ static int fd_read_lines(int fd, char ***namesp)
 	}
 
 	buf = reftable_malloc(size + 1);
-	if (read(fd, buf, size) != size) {
+	if (read_in_full(fd, buf, size) != size) {
 		err = REFTABLE_IO_ERROR;
 		goto done;
 	}
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related
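
Editorial sketch for the patch above: read(2) can fail with EINTR or return fewer bytes than requested even though more data will follow, so comparing the result of a single call against `size` can report a spurious I/O error. The loop below shows the general retry pattern. `read_all` is a hypothetical name for illustration and is not Git's `read_in_full()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Git's implementation): read until `len` bytes
 * have arrived or end-of-file is reached. Returns the number of bytes
 * read, which is less than `len` only at end-of-file, or -1 on error.
 */
static ssize_t read_all(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t total = 0;

	assert(buf != NULL || len == 0);
	while (total < len) {
		ssize_t n = read(fd, p + total, len - total);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted; retry */
			return -1;
		}
		if (n == 0)
			break; /* end of file */
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

A caller that still compares the return value against the expected size, as both hunks above do, then only sees a mismatch when the file is genuinely shorter than expected.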

* [PATCH v2 00/11] reftable: small set of fixes
From: Patrick Steinhardt @ 2023-12-08 14:52 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1700549493.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2189 bytes --]

Hi,

this is the second version of my patch series that addresses several
smallish issues in the reftable backend. Given that the first version
hasn't received any reviews yet, I decided to squash additional
findings into this series. This is both to reduce the number of
follow-up series and to hopefully push this topic onto the radar of
folks on the mailing list.

Changes compared to v1:

  - Allocations were optimized further for `struct merged_iter` by
    making the buffers part of the structure itself so that they can
    be reused across iterations.

  - Allocations were optimized for `struct block_iter` in the same
    way.

  - Temporary stacks have a supposedly-random suffix so that concurrent
    writers don't conflict with each other. We used unseeded `rand()`
    calls for it though, so they weren't random after all. This is fixed
    by converting to use `git_rand()` instead.

Patrick

Patrick Steinhardt (11):
  reftable: wrap EXPECT macros in do/while
  reftable: handle interrupted reads
  reftable: handle interrupted writes
  reftable/stack: verify that `reftable_stack_add()` uses
    auto-compaction
  reftable/stack: perform auto-compaction with transactional interface
  reftable/stack: reuse buffers when reloading stack
  reftable/stack: fix stale lock when dying
  reftable/stack: fix use of unseeded randomness
  reftable/merged: reuse buffer to compute record keys
  reftable/block: introduce macro to initialize `struct block_iter`
  reftable/block: reuse buffer to compute record keys

 reftable/block.c          |  23 ++++-----
 reftable/block.h          |   6 +++
 reftable/block_test.c     |   4 +-
 reftable/blocksource.c    |   2 +-
 reftable/iter.h           |   8 +--
 reftable/merged.c         |  31 +++++------
 reftable/merged.h         |   2 +
 reftable/reader.c         |   7 ++-
 reftable/readwrite_test.c |   6 +--
 reftable/stack.c          |  73 +++++++++++---------------
 reftable/stack_test.c     | 105 +++++++++++++++++++++++++++++++++++++-
 reftable/test_framework.h |  58 +++++++++++----------
 12 files changed, 211 insertions(+), 114 deletions(-)

-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply
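
Editorial sketch for the last bullet of the cover letter above: the C standard specifies that calling `rand()` without a prior `srand()` behaves as if `srand(1)` had been called, so every process that never seeds the generator sees the same sequence. The helper below makes that visible; `first_rand_after_seed` is a hypothetical name, and nothing here reflects how `git_rand()` is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: seed the generator and return the first value
 * it produces. The same seed always yields the same value.
 */
static int first_rand_after_seed(unsigned int seed)
{
	srand(seed);
	return rand();
}
```

Two writers that both skip seeding would each obtain the value `first_rand_after_seed(1)` for their first "random" suffix, which is exactly the kind of collision the suffix was meant to avoid.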

* [PATCH v2 01/11] reftable: wrap EXPECT macros in do/while
From: Patrick Steinhardt @ 2023-12-08 14:52 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4427 bytes --]

The `EXPECT` macros used by the reftable test framework are all using a
single `if` statement with the actual condition. This results in weird
syntax when using them in if/else statements like the following:

```
if (foo)
	EXPECT(foo == 2)
else
	EXPECT(bar == 2)
```

Note that there need not be a trailing semicolon. Furthermore, it is not
immediately obvious whether the else now belongs to the `if (foo)` or
whether it belongs to the expanded `if (foo == 2)` from the macro.

Fix this by wrapping the macros in a do/while loop.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/test_framework.h | 58 +++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/reftable/test_framework.h b/reftable/test_framework.h
index 774cb275bf..ee44f735ae 100644
--- a/reftable/test_framework.h
+++ b/reftable/test_framework.h
@@ -12,32 +12,38 @@ license that can be found in the LICENSE file or at
 #include "system.h"
 #include "reftable-error.h"
 
-#define EXPECT_ERR(c)                                                  \
-	if (c != 0) {                                                  \
-		fflush(stderr);                                        \
-		fflush(stdout);                                        \
-		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n",  \
-			__FILE__, __LINE__, c, reftable_error_str(c)); \
-		abort();                                               \
-	}
-
-#define EXPECT_STREQ(a, b)                                               \
-	if (strcmp(a, b)) {                                              \
-		fflush(stderr);                                          \
-		fflush(stdout);                                          \
-		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
-			__LINE__, #a, a, #b, b);                         \
-		abort();                                                 \
-	}
-
-#define EXPECT(c)                                                          \
-	if (!(c)) {                                                        \
-		fflush(stderr);                                            \
-		fflush(stdout);                                            \
-		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
-			__LINE__, #c);                                     \
-		abort();                                                   \
-	}
+#define EXPECT_ERR(c)                                                          \
+	do {                                                                   \
+		if (c != 0) {                                                  \
+			fflush(stderr);                                        \
+			fflush(stdout);                                        \
+			fprintf(stderr, "%s: %d: error == %d (%s), want 0\n",  \
+				__FILE__, __LINE__, c, reftable_error_str(c)); \
+			abort();                                               \
+		}                                                              \
+	} while (0)
+
+#define EXPECT_STREQ(a, b)                                                       \
+	do {                                                                     \
+		if (strcmp(a, b)) {                                              \
+			fflush(stderr);                                          \
+			fflush(stdout);                                          \
+			fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
+				__LINE__, #a, a, #b, b);                         \
+			abort();                                                 \
+		}                                                                \
+	} while (0)
+
+#define EXPECT(c)                                                                  \
+	do {                                                                       \
+		if (!(c)) {                                                        \
+			fflush(stderr);                                            \
+			fflush(stdout);                                            \
+			fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
+				__LINE__, #c);                                     \
+			abort();                                                   \
+		}                                                                  \
+	} while (0)
 
 #define RUN_TEST(f)                          \
 	fprintf(stderr, "running %s\n", #f); \
-- 
2.43.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related
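
Editorial sketch for the patch above: the two macros below are simplified, hypothetical stand-ins for the old and new `EXPECT` definitions (they bump a counter instead of printing and aborting). They show how a trailing `else` silently attaches to the `if` hidden inside a bare macro, and how the do/while wrapper prevents that.

```c
#include <assert.h>

/* Hypothetical counter standing in for the fprintf()/abort() sequence. */
static int failures;

/* Old style: a bare `if`, so a following `else` binds to it. */
#define CHECK_BARE(c) if (!(c)) { failures++; }

/* New style: expands to exactly one statement and needs a semicolon. */
#define CHECK_WRAPPED(c) do { if (!(c)) failures++; } while (0)

/* Returns 1 when the `else` branch ran, 0 otherwise. */
static int else_ran_bare(int foo)
{
	int ran = 0;
	if (foo)
		CHECK_BARE(foo == 2)
	else /* binds to the `if` inside CHECK_BARE, not to `if (foo)` */
		ran = 1;
	return ran;
}

/* Returns 1 when the `else` branch ran, 0 otherwise. */
static int else_ran_wrapped(int foo)
{
	int ran = 0;
	if (foo)
		CHECK_WRAPPED(foo == 2);
	else /* binds to `if (foo)` as the indentation suggests */
		ran = 1;
	return ran;
}
```

With the bare macro, `else_ran_bare(2)` takes the `else` branch even though `foo` is non-zero, and `else_ran_bare(0)` does not take it at all. Compilers may emit a dangling-else warning for that function, which is the ambiguity the commit message describes.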

* Re: [PATCH v2 1/2] completion: refactor existence checks for pseudorefs
From: Patrick Steinhardt @ 2023-12-08  8:24 UTC (permalink / raw)
  To: Stan Hu; +Cc: git
In-Reply-To: <1c6a747691f36ede4224b6d4c2e0c8fd4c0575fd.1701928891.git.stanhu@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

On Wed, Dec 06, 2023 at 10:06:39PM -0800, Stan Hu wrote:
> In preparation for the reftable backend, this commit introduces a
> '__git_pseudoref_exists' function that continues to use 'test -f' to
> determine whether a given pseudoref exists in the local filesystem.
> 
> Signed-off-by: Stan Hu <stanhu@gmail.com>
> ---
>  contrib/completion/git-completion.bash | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index 13a39ebd2e..9fbdc13f9a 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -122,6 +122,15 @@ __git ()
>  		${__git_dir:+--git-dir="$__git_dir"} "$@" 2>/dev/null
>  }
>  
> +# Runs git in $__git_repo_path to determine whether a ref exists.
> +# 1: The ref to search
> +__git_ref_exists ()

I first thought that you missed Junio's point that `__git_ref_exists`
may better be renamed to something like `__git_pseudoref_exists`. But
you do indeed change the name in the second patch. I'd propose to
instead squash the rename into the first patch so that the series
becomes easier to read.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 10/24] pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
From: Patrick Steinhardt @ 2023-12-08  8:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZXIsbO++u9n/yDYi@nand.local>

[-- Attachment #1: Type: text/plain, Size: 1465 bytes --]

On Thu, Dec 07, 2023 at 03:34:52PM -0500, Taylor Blau wrote:
> On Thu, Dec 07, 2023 at 02:13:13PM +0100, Patrick Steinhardt wrote:
[snip]
> > Can't it happen that we have no pack here? In the MIDX-case we skip all
> > packs that either do not have a bitmap or are not preferred. So does it
> > mean that in reverse, every preferred packfile must have a bitmap? I'd
> > think that to not be true in case bitmaps are turned off.
> 
> It's subtle, but this state is indeed not possible. If we have a MIDX
> and it has a bitmap, we know that there is at least one object in at
> least one pack.
> 
> On the "at least one object" front, that check was added in eb57277ba3
> (midx: prevent writing a .bitmap without any objects, 2022-02-09). And
> we know that our preferred pack (either explicitly given or the one we
> infer automatically) is non-empty, via the check added in 5d3cd09a80
> (midx: reject empty `--preferred-pack`'s, 2021-08-31).
> 
> (As a fun/non-fun aside, looking these up gave me some serious deja-vu
> and reminded me of how painful discovering and fixing those bugs was!)
> 
> So we're OK here. We could add a comment which captures what I wrote
> above here, but since this is a temporary state (and we're going to
> change how we select which packs are reuse candidates in a later patch),
> I think it's OK to avoid (but please let me know if you feel differently).

Makes sense, thanks for the explanation!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 09/24] repack: implement `--extend-disjoint` mode
From: Patrick Steinhardt @ 2023-12-08  8:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZXIq4mjDUoqlGvgW@nand.local>

[-- Attachment #1: Type: text/plain, Size: 4326 bytes --]

On Thu, Dec 07, 2023 at 03:28:18PM -0500, Taylor Blau wrote:
> On Thu, Dec 07, 2023 at 02:13:08PM +0100, Patrick Steinhardt wrote:
> > >   - cruft packs (which may necessarily need to include an object from a
> > >     disjoint pack in order to freshen it in certain circumstances)
> >
> > This one took me a while to figure out. If we'd mark crufts as disjoint,
> > then it would mean that new packfiles cannot be marked as disjoint if
> > objects which were previously unreachable do become reachable again.
> > So we'd be pessimizing packfiles for live objects in favor of others
> > which aren't.
> 
> Yeah, that's right, too. There are a couple of cases where more than one
> cruft pack may contain the same object, one of them being the
> flip-flopping between reachable and unreachable as you suggest above.
> Another is that you have a non-prunable unreachable object which is
> already in a cruft pack. If the object's mtime gets updated (and still
> cannot be pruned), we'll end up freshening the object loose, and then
> packing it again (with the more recent mtime) into a new cruft pack.
> 
> That aside, I actually think that there are ways to mark cruft packs
> disjoint. But they're complicated, and moreover, I don't think you'd
> ever *want* to mark a cruft pack as disjoint. Cruft packs usually
> contain garbage, which is unlikely to be useful to any fetches/clones.
> 
> If we did mark them as disjoint, it would mean that we could reuse
> verbatim sections of the cruft pack in our output, but we would likely
> end up with very few such sections.

Makes sense. It also doesn't feel worth it to introduce additional
complexity for objects that for the most part wouldn't ever be served
on a fetch anyway.

[snip]
> > Okay. I had a bit of trouble sifting through the various different
> > flags like "--retain-disjoint", "--extend-disjoint", "--ignore-disjoint"
> > and so on. But well, they do different things and it's been a few days
> > since I've reviewed the preceding patches, so this is probably fine.
> 
> Yeah, I am definitely open to better naming conventions here. I figured
> that:
> 
>   - --retain-disjoint was a good name for the MIDX option, since it is
>     retaining existing disjoint packs in the new MIDX
>   - --extend-disjoint was a good name for the repack option, since it is
>     extending the set of disjoint packs
>   - --ignore-disjoint was a good name for the pack-objects option, since
>     it is ignoring objects in disjoint packs
> 
> Writing this out, I think that you could make an argument that
> `--exclude-disjoint` is a better name for the last option. So I'm
> definitely open to suggestions here, but I don't want to get too bogged
> down on command-line option naming (so long as we're all reasonably
> happy with the result).

Yeah, as said, I don't mind it too much. It's a complex area and the
flags all do different things, so it's expected that you may have to do
some research on what exactly they do. That being said, I do like your
proposed `--exclude-disjoint` a lot more than `--ignore-disjoint`.

> > One thing I wondered: do we need to consider the `-l` flag? When using
> > an alternate object directory it is totally feasible that the alternate
> > may be creating new disjoint packs without us knowing, and thus we
> > may not be able to guarantee the disjoint property anymore.
> 
> I don't think so. We'd only care about one direction of this (that
> alternates do not create disjoint packs which overlap with ours, instead
> of the other way around), but since we don't put non-local packs in the
> MIDX, I think we're OK.
> 
> I suppose that you might run into trouble if you use the chained MIDX
> thing (via its `->next` pointer). I haven't used that feature myself, so
> I'd have to play around with it.

We do use this feature at GitLab for forks, where forks connect to a
common alternate object directory to deduplicate objects. As both the
fork repository and the alternate object directory use an MIDX I think
they would be set up exactly like that.

I guess the only really viable solution here is to ignore disjoint packs
in the main repo that connects to the alternate in the case where the
alternate has any disjoint packs itself.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [BUG] rev-list doesn't validate arguments to -n option
From: Britton Kerin @ 2023-12-07 22:12 UTC (permalink / raw)
  To: git

It tolerates non-numeric arguments and garbage after a number:

For example:

$ # -n q means the same as -n 0:
$ git rev-list -n q newest_commit
$ git rev-list -n 0 newest_commit
$ # Garbage after number is tolerated:
$ git rev-list -n 1q newest_commit
3be33f83695088d968cf084a1a08bdcde25a8d7a
$ git rev-list -n 2q newest_commit
3be33f83695088d968cf084a1a08bdcde25a8d7a
286e62e1b68e39334978e6222cbff187ecc17bcf

^ permalink raw reply
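
Editorial sketch for the report above: the observed behaviour is what `atoi()`-style parsing produces (`atoi("q")` yields 0 and `atoi("2q")` yields 2), although whether that is the actual cause inside rev-list is an assumption here. A stricter parser can use `strtol()` and reject input that has no digits or has trailing characters. `parse_count_strict` is a hypothetical helper, not an existing Git function.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of Git): parse a decimal integer and
 * reject empty input, non-numeric input, trailing garbage and overflow.
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 */
static int parse_count_strict(const char *arg, int *out)
{
	char *end;
	long val;

	if (!arg || !*arg)
		return -1;
	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -1; /* out of range for an int */
	if (end == arg || *end != '\0')
		return -1; /* no digits at all, or garbage after the number */
	*out = (int)val;
	return 0;
}
```

Under this sketch, both `-n q` and `-n 1q` from the transcript would be rejected, while `-n 0` and `-n 2` would still be accepted.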

* Re: [PATCH 16/24] pack-objects: prepare `write_reused_pack()` for multi-pack reuse
From: Taylor Blau @ 2023-12-07 20:47 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZXHE-cLrP7iRHWHY@tanuki>

On Thu, Dec 07, 2023 at 02:13:29PM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 28, 2023 at 02:08:37PM -0500, Taylor Blau wrote:
> > The function `write_reused_pack()` within `builtin/pack-objects.c` is
> > responsible for performing pack-reuse on a single pack, and has two main
> > functions:
> >
> >   - it dispatches a call to `write_reused_pack_verbatim()` to see if we
> >     can reuse portions of the packfile in whole-word chunks
> >
> >   - for any remaining objects (that is, any objects that appear after
> >     the first "gap" in the bitmap), call write_reused_pack_one() on that
> >     object to record it for reuse.
> >
> > Prepare this function for multi-pack reuse by removing the assumption
> > that the bit position corresponding to the first object being reused
> > from a given pack may not be at bit position zero.
>
> Is this double-negation intended? We remove the assumption that we start
> reading at position zero, but the paragraph here states that we remove
> the assumption that we do _not_ start at bit zero.

Oops, great catch. I'll s/may not/must in the last paragraph to clarify.

> I'll stop reviewing here and will come back to this series sometime next
> week.

Thanks as usual for your review -- I appreciate you digging through this
rather dense series.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 14/24] pack-objects: keep track of `pack_start` for each reuse pack
From: Taylor Blau @ 2023-12-07 20:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZXHE9L7iqXQAit_1@tanuki>

On Thu, Dec 07, 2023 at 02:13:24PM +0100, Patrick Steinhardt wrote:
> > In order to compute this value correctly, we need to know not only where
> > we are in the packfile we're assembling (with `hashfile_total(f)`) but
> > also the position of the first byte of the packfile that we are
> > currently reusing.
> >
> > Together, these two allow us to compute the reused chunk's offset
> > difference relative to the start of the reused pack, as desired.
>
> Hm. I'm not quite sure I fully understand the motivation here. Is this
> something that was broken all along? Why does it become a problem now?
> Sorry if I'm missing the obvious here.

No worries, I should have explained this better. Indeed we do have to
worry about patching deltas today when reusing objects from a pack. But
we have to extend the implementation in order to perform reuse over
multiple packs when any of them (excluding the first, which would work
with the existing logic) have delta/base pairs on either side of a gap.

I'll try to make it a little clearer, thanks for pointing that out.

> > @@ -1126,10 +1129,12 @@ static void write_reused_pack(struct packed_git *reuse_packfile,
> >  {
> >  	size_t i = 0;
> >  	uint32_t offset;
> > +	off_t pack_start = hashfile_total(f) - sizeof(struct pack_header);
>
> Given that this patch in its current state doesn't seem to do anything
> yet, am I right in assuming that `hashfile_total(f) - sizeof(struct
> pack_header)` is always expected to be zero for now?

Yep, that's right.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 10/24] pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
From: Taylor Blau @ 2023-12-07 20:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZXHE6Ym3CICtNxFd@tanuki>

On Thu, Dec 07, 2023 at 02:13:13PM +0100, Patrick Steinhardt wrote:
> > +	if (bitmap_is_midx(bitmap_git)) {
> > +		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> > +			struct bitmapped_pack pack;
> > +			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
> > +				warning(_("unable to load pack: '%s', disabling pack-reuse"),
> > +					bitmap_git->midx->pack_names[i]);
> > +				free(packs);
> > +				return -1;
> > +			}
> > +			if (!pack.bitmap_nr)
> > +				continue; /* no objects from this pack */
> > +			if (pack.bitmap_pos)
> > +				continue; /* not preferred pack */
> > +
> > +			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> > +			memcpy(&packs[packs_nr++], &pack, sizeof(pack));
> > +
> > +			objects_nr += pack.p->num_objects;
> > +		}
> > +	} else {
> > +		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> > +
> > +		packs[packs_nr].p = bitmap_git->pack;
> > +		packs[packs_nr].bitmap_pos = 0;
> > +		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
> > +		packs[packs_nr].disjoint = 1;
> > +
> > +		objects_nr = packs[packs_nr++].p->num_objects;
> > +	}
> > +
> > +	word_alloc = objects_nr / BITS_IN_EWORD;
> > +	if (objects_nr % BITS_IN_EWORD)
> > +		word_alloc++;
> > +	reuse = bitmap_word_alloc(word_alloc);
> > +
> > +	if (packs_nr != 1)
> > +		BUG("pack reuse not yet implemented for multiple packs");
>
> Can't it happen that we have no pack here? In the MIDX-case we skip all
> packs that either do not have a bitmap or are not preferred. So does it
> mean that in reverse, every preferred packfile must have a bitmap? I'd
> think that to not be true in case bitmaps are turned off.

It's subtle, but this state is indeed not possible. If we have a MIDX
and it has a bitmap, we know that there is at least one object in at
least one pack.

On the "at least one object" front, that check was added in eb57277ba3
(midx: prevent writing a .bitmap without any objects, 2022-02-09). And
we know that our preferred pack (either explicitly given or the one we
infer automatically) is non-empty, via the check added in 5d3cd09a80
(midx: reject empty `--preferred-pack`'s, 2021-08-31).

(As a fun/non-fun aside, looking these up gave me some serious deja-vu
and reminded me of how painful discovering and fixing those bugs was!)

So we're OK here. We could add a comment which captures what I wrote
above here, but since this is a temporary state (and we're going to
change how we select which packs are reuse candidates in a later patch),
I think it's OK to avoid (but please let me know if you feel differently).

Thanks,
Taylor

^ permalink raw reply
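
Editorial sketch for the `word_alloc` computation in the hunk quoted above: the number of words needed to hold `objects_nr` bits is the bit count divided by the word size, rounded up. The stand-alone version below assumes 64-bit words; `BITS_PER_WORD` and `words_for_bits` are hypothetical names chosen for this illustration and only mirror the role of `BITS_IN_EWORD`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-bit bitmap words, standing in for BITS_IN_EWORD. */
#define BITS_PER_WORD 64

/* Number of whole words needed to store `nbits` bits (rounds up). */
static size_t words_for_bits(size_t nbits)
{
	size_t words = nbits / BITS_PER_WORD;

	if (nbits % BITS_PER_WORD)
		words++; /* a partially used final word still needs storage */
	return words;
}
```

For example, 64 objects fit in one word, while 65 objects need two.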

* Re: [PATCH 09/24] repack: implement `--extend-disjoint` mode
From: Taylor Blau @ 2023-12-07 20:28 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZXHE5Lce_6CAWKFT@tanuki>

On Thu, Dec 07, 2023 at 02:13:08PM +0100, Patrick Steinhardt wrote:
> > The gotchas mostly have to do with making sure that we do not generate a
> > disjoint pack in the following scenarios:
>
> Okay, let me verify whether I understand the reasons:
>
> >   - promisor packs
>
> Which is because promisor packs actually don't contain any objects?

Right.

> >   - cruft packs (which may necessarily need to include an object from a
> >     disjoint pack in order to freshen it in certain circumstances)
>
> This one took me a while to figure out. If we'd mark crufts as disjoint,
> then it would mean that new packfiles cannot be marked as disjoint if
> objects which were previously unreachable do become reachable again.
> So we'd be pessimizing packfiles for live objects in favor of others
> which aren't.

Yeah, that's right, too. There are a couple of cases where more than one
cruft pack may contain the same object, one of them being the
flip-flopping between reachable and unreachable as you suggest above.
Another is that you have a non-prunable unreachable object which is
already in a cruft pack. If the object's mtime gets updated (and still
cannot be pruned), we'll end up freshening the object loose, and then
packing it again (with the more recent mtime) into a new cruft pack.

That aside, I actually think that there are ways to mark cruft packs
disjoint. But they're complicated, and moreover, I don't think you'd
ever *want* to mark a cruft pack as disjoint. Cruft packs usually
contain garbage, which is unlikely to be useful to any fetches/clones.

If we did mark them as disjoint, it would mean that we could reuse
verbatim sections of the cruft pack in our output, but we would likely
end up with very few such sections.

> >   - all-into-one repacks without '-d'
>
> Because here the old packfiles that this would make redundant aren't
> deleted and thus the objects are duplicate now.

Yep.

> > Otherwise, we mark which packs were created as disjoint by using a new
> > bit in the `generated_pack_data` struct, and then marking those pack(s)
> > as disjoint accordingly when generating the MIDX. Non-deleted packs
> > which are marked as disjoint are retained as such by passing the
> > equivalent of `--retain-disjoint` when calling the MIDX API to update
> > the MIDX.
>
> Okay. I had a bit of trouble sifting through the various different
> flags like "--retain-disjoint", "--extend-disjoint", "--ignore-disjoint"
> and so on. But well, they do different things and it's been a few days
> since I've reviewed the preceding patches, so this is probably fine.

Yeah, I am definitely open to better naming conventions here. I figured
that:

  - --retain-disjoint was a good name for the MIDX option, since it is
    retaining existing disjoint packs in the new MIDX
  - --extend-disjoint was a good name for the repack option, since it is
    extending the set of disjoint packs
  - --ignore-disjoint was a good name for the pack-objects option, since
    it is ignoring objects in disjoint packs

Writing this out, I think that you could make an argument that
`--exclude-disjoint` is a better name for the last option. So I'm
definitely open to suggestions here, but I don't want to get too bogged
down on command-line option naming (so long as we're all reasonably
happy with the result).

> One thing I wondered: do we need to consider the `-l` flag? When using
> an alternate object directory it is totally feasible that the alternate
> may be creating new disjoint packs without us knowing, and thus we
> may not be able to guarantee the disjoint property anymore.

I don't think so. We'd only care about one direction of this (that
alternates do not create disjoint packs which overlap with ours, instead
of the other way around), but since we don't put non-local packs in the
MIDX, I think we're OK.

I suppose that you might run into trouble if you use the chained MIDX
thing (via its `->next` pointer). I haven't used that feature myself, so
I'd have to play around with it.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 11/24] pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
From: Taylor Blau @ 2023-12-07 14:36 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZXHE7_KwukSRBET1@tanuki>

On Thu, Dec 07, 2023 at 02:13:19PM +0100, Patrick Steinhardt wrote:
> > +	if (reuse_packfile) {
> > +		reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap);
> > +		if (!reuse_packfile_objects)
> > +			BUG("expected non-empty reuse bitmap");
>
> We're now re-computing `bitmap_popcount()` for the bitmap a second time.
> But I really don't think this is ever going to be a problem in practice
> given that it only does a bunch of math. Any performance regression
> would thus ultimately be drowned out by everything else.
>
> In other words: this is probably fine.

I definitely agree that any performance regression from calling
bitmap_popcount() twice would be drowned out by the rest of what
pack-objects is doing.

For what it's worth:

- The bitmap_popcount() call is a loop over ewah_bit_popcount64() for
  each of the allocated words. And the latter is more or less three
  copies of:

      b7:	55 55 55
      ba:	48 23 45 f8          	and    -0x8(%rbp),%rax
      be:	48 8b 55 f8          	mov    -0x8(%rbp),%rdx
      c2:	48 89 d1             	mov    %rdx,%rcx
      c5:	48 d1 e9             	shr    %rcx
      c8:	48 ba 55 55 55 55 55 	movabs $0x5555555555555555,%rdx
      cf:	55 55 55
      d2:	48 21 ca             	and    %rcx,%rdx
      d5:	48 01 d0             	add    %rdx,%rax
      d8:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
      dc:	48 b8 33 33 33 33 33 	movabs $0x3333333333333333,%rax

  Followed by:

     144:	48 0f af c2          	imul   %rdx,%rax
     148:	48 c1 e8 38          	shr    $0x38,%rax
     14c:	5d                   	pop    %rbp
     14d:	c3                   	ret

  With the usual x86 ABI preamble and postamble. So this should be an
  extremely cheap function to compute.

- But, the earlier bitmap_popcount() call in
  reuse_partial_packfile_from_bitmap() is not necessary, since we only
  care whether or not there are _any_ bits set in the bitmap, not how
  many of them there are.

  So we could write something like `bitmap_is_empty(reuse)` instead, which
  would be much cheaper (again, not that I think we'll notice this
  either way, but throwing away the result of bitmap_popcount() and
  calling it twice does leave me a little unsatisfied).
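
The disassembly above has the shape of the classic branch-free (SWAR)
population count: three mask-and-add steps, then a multiply and a shift by
56 (the `imul` and `shr $0x38`). A minimal C sketch of that technique
follows. The names `popcount64_sketch` and `popcount64_demo` are
hypothetical, and this is not claimed to be the exact source of
ewah_bit_popcount64():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: counts set bits in a 64-bit word without branching. */
uint32_t popcount64_sketch(uint64_t x)
{
	/* Sum adjacent bits into 2-bit fields. */
	x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
	/* Sum adjacent 2-bit fields into 4-bit fields. */
	x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
	/* Sum adjacent 4-bit fields into bytes. */
	x = (x & 0x0f0f0f0f0f0f0f0fULL) + ((x >> 4) & 0x0f0f0f0f0f0f0f0fULL);
	/* The multiply adds every byte into the top byte; shift it down. */
	return (uint32_t)((x * 0x0101010101010101ULL) >> 56);
}

/* Small self-check with values that are easy to verify by hand. */
void popcount64_demo(void)
{
	assert(popcount64_sketch(0) == 0);
	assert(popcount64_sketch(0xff) == 8);
	assert(popcount64_sketch(~(uint64_t)0) == 64);
}
```

Each word costs a fixed handful of arithmetic operations, which fits the
"extremely cheap" assessment above.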

So I think we could reasonably do something like:

--- 8< ---
diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 7b525b1ecd..ac7e0af622 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -169,6 +169,15 @@ size_t bitmap_popcount(struct bitmap *self)
 	return count;
 }

+int bitmap_is_empty(struct bitmap *self)
+{
+	size_t i;
+	for (i = 0; i < self->word_alloc; i++)
+		if (self->words[i])
+			return 0;
+	return 1;
+}
+
 int bitmap_equals(struct bitmap *self, struct bitmap *other)
 {
 	struct bitmap *big, *small;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 7eb8b9b630..c11d76c6f3 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -189,5 +189,6 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 void bitmap_or(struct bitmap *self, const struct bitmap *other);

 size_t bitmap_popcount(struct bitmap *self);
+int bitmap_is_empty(struct bitmap *self);

 #endif
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 614fc09a4e..e50b322779 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2045,7 +2045,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,

 	reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse);

-	if (!bitmap_popcount(reuse)) {
+	if (bitmap_is_empty(reuse)) {
 		free(packs);
 		bitmap_free(reuse);
 		return;
--- >8 ---
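
To try the helper outside of the tree, a standalone sketch could look like
the one below. `struct sketch_bitmap` is a hypothetical stand-in that
carries only the two fields the diff touches (`words` and `word_alloc`),
with an assumed 64-bit word type. It is not the real `struct bitmap`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: only the fields used by the emptiness check. */
struct sketch_bitmap {
	uint64_t *words;
	size_t word_alloc;
};

/* Same loop as in the diff: stop at the first non-zero word. */
int sketch_bitmap_is_empty(const struct sketch_bitmap *self)
{
	size_t i;
	for (i = 0; i < self->word_alloc; i++)
		if (self->words[i])
			return 0;
	return 1;
}

void sketch_bitmap_demo(void)
{
	uint64_t words[4] = { 0, 0, 0, 0 };
	struct sketch_bitmap bm = { words, 4 };

	assert(sketch_bitmap_is_empty(&bm));

	/* A single bit anywhere makes the bitmap non-empty. */
	words[2] = UINT64_C(1) << 40;
	assert(!sketch_bitmap_is_empty(&bm));
}
```

Unlike a full popcount, the loop can return as soon as it sees one set
word, and it never needs the exact count.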

Thanks,
Taylor

^ permalink raw reply related

* Re: [PATCH 16/24] pack-objects: prepare `write_reused_pack()` for multi-pack reuse
From: Patrick Steinhardt @ 2023-12-07 13:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <67a8196978244b56d4f60861f140b78c59d15e30.1701198172.git.me@ttaylorr.com>

On Tue, Nov 28, 2023 at 02:08:37PM -0500, Taylor Blau wrote:
> The function `write_reused_pack()` within `builtin/pack-objects.c` is
> responsible for performing pack-reuse on a single pack, and has two main
> functions:
> 
>   - it dispatches a call to `write_reused_pack_verbatim()` to see if we
>     can reuse portions of the packfile in whole-word chunks
> 
>   - for any remaining objects (that is, any objects that appear after
>     the first "gap" in the bitmap), call write_reused_pack_one() on that
>     object to record it for reuse.
> 
> Prepare this function for multi-pack reuse by removing the assumption
> that the bit position corresponding to the first object being reused
> from a given pack may not be at bit position zero.

Is this double-negation intended? We remove the assumption that we start
reading at position zero, but the paragraph here states that we remove
the assumption that we do _not_ start at bit zero.

I'll stop reviewing here and will come back to this series sometime next
week.

Patrick

^ permalink raw reply

* Re: [PATCH 14/24] pack-objects: keep track of `pack_start` for each reuse pack
From: Patrick Steinhardt @ 2023-12-07 13:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <6f4fba861b59f909148775ee64c3ba89afc872b5.1701198172.git.me@ttaylorr.com>

On Tue, Nov 28, 2023 at 02:08:32PM -0500, Taylor Blau wrote:
> When reusing objects from a pack, we keep track of a set of one or more
> `reused_chunk`s, corresponding to sections of one or more object(s) from
> a source pack that we are reusing. Each chunk contains two pieces of
> information:
> 
>   - the offset of the first object in the source pack (relative to the
>     beginning of the source pack)
>   - the difference between that offset, and the corresponding offset in
>     the pack we're generating
> 
> The purpose of keeping track of these is so that we can patch any
> OFS_DELTAs that cross over a section of the reuse pack that we didn't
> take.
> 
> For instance, consider a hypothetical pack as shown below:
> 
>                                                 (chunk #2)
>                                                 __________...
>                                                /
>                                               /
>       +--------+---------+-------------------+---------+
>   ... | <base> | <other> |      (unused)     | <delta> | ...
>       +--------+---------+-------------------+---------+
>        \                /
>         \______________/
>            (chunk #1)
> 
> Suppose that we are sending objects "base", "other", and "delta", and
> that the "delta" object is stored as an OFS_DELTA, and that its base is
> "base". If we don't send any objects in the "(unused)" range, we can't
> copy the delta'd object directly, since its delta offset includes a
> range of the pack that we didn't copy, so we have to account for that
> difference when patching and reassembling the delta.
> 
> In order to compute this value correctly, we need to know not only where
> we are in the packfile we're assembling (with `hashfile_total(f)`) but
> also the position of the first byte of the packfile that we are
> currently reusing.
> 
> Together, these two allow us to compute the reused chunk's offset
> difference relative to the start of the reused pack, as desired.

Hm. I'm not quite sure I fully understand the motivation here. Is this
something that was broken all along? Why does it become a problem now?
Sorry if I'm missing the obvious here.
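
For reference, the fixup described in the quoted commit message can be
modelled with made-up numbers. In the sketch below, `chunk_sketch`,
`chunk_difference()` and `patched_delta_distance()` are hypothetical
names, the byte offsets are invented for illustration, and a linear scan
is used for brevity. It is a simplified model, not the code in
builtin/pack-objects.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reused chunk, per the commit message. */
struct chunk_sketch {
	int64_t original;   /* offset of the chunk's first object in the source pack */
	int64_t difference; /* that offset minus the matching offset in the new pack */
};

/*
 * Difference recorded for the chunk covering `where`. Chunks are assumed
 * to be sorted by `original`.
 */
int64_t chunk_difference(const struct chunk_sketch *chunks, size_t nr,
			 int64_t where)
{
	int64_t diff = 0;
	size_t i;
	for (i = 0; i < nr && chunks[i].original <= where; i++)
		diff = chunks[i].difference;
	return diff;
}

/* Distance from a delta back to its base once the unused gap is dropped. */
int64_t patched_delta_distance(const struct chunk_sketch *chunks, size_t nr,
			       int64_t delta_at, int64_t base_at)
{
	int64_t fixup = chunk_difference(chunks, nr, delta_at) -
			chunk_difference(chunks, nr, base_at);
	return (delta_at - base_at) - fixup;
}

void chunk_demo(void)
{
	/* base at 12, other at 112, 300 unused bytes from 162, delta at 462 */
	const struct chunk_sketch chunks[] = { { 12, 0 }, { 462, 300 } };

	/* Source distance is 450; in the new pack the delta lands at 162. */
	assert(patched_delta_distance(chunks, 2, 462, 12) == 150);
}
```

With these numbers the delta sits 450 bytes after its base in the source
pack, but only 150 bytes after it in the generated pack, because the 300
unused bytes were never copied.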

> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 7682bd65bb..eb8be514d1 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1016,6 +1016,7 @@ static off_t find_reused_offset(off_t where)
>  
>  static void write_reused_pack_one(struct packed_git *reuse_packfile,
>  				  size_t pos, struct hashfile *out,
> +				  off_t pack_start,
>  				  struct pack_window **w_curs)
>  {
>  	off_t offset, next, cur;
> @@ -1025,7 +1026,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
>  	offset = pack_pos_to_offset(reuse_packfile, pos);
>  	next = pack_pos_to_offset(reuse_packfile, pos + 1);
>  
> -	record_reused_object(offset, offset - hashfile_total(out));
> +	record_reused_object(offset,
> +			     offset - (hashfile_total(out) - pack_start));
>  
>  	cur = offset;
>  	type = unpack_object_header(reuse_packfile, w_curs, &cur, &size);
> @@ -1095,6 +1097,7 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
>  
>  static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile,
>  					 struct hashfile *out,
> +					 off_t pack_start UNUSED,
>  					 struct pack_window **w_curs)
>  {
>  	size_t pos = 0;
> @@ -1126,10 +1129,12 @@ static void write_reused_pack(struct packed_git *reuse_packfile,
>  {
>  	size_t i = 0;
>  	uint32_t offset;
> +	off_t pack_start = hashfile_total(f) - sizeof(struct pack_header);

Given that this patch in its current state doesn't seem to do anything
yet, am I right in assuming that `hashfile_total(f) - sizeof(struct
pack_header)` is always expected to be zero for now?
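
One way to read the new expression, as a sketch under stated assumptions:
the pack header is taken to be 12 bytes, `difference_sketch()` is a
hypothetical helper that mirrors the arithmetic in the diff, and the
second-pack numbers are invented to show what could change once more than
one pack is reused:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the pack header, for illustration only. */
#define SKETCH_HEADER_SIZE 12

/* Mirrors `offset - (hashfile_total(out) - pack_start)` from the diff. */
int64_t difference_sketch(int64_t offset, int64_t total_written,
			  int64_t pack_start)
{
	return offset - (total_written - pack_start);
}

void pack_start_demo(void)
{
	/* Single reused pack: only the header has been written so far. */
	int64_t pack_start = SKETCH_HEADER_SIZE - SKETCH_HEADER_SIZE;
	assert(pack_start == 0);
	/* Identical to the old `offset - hashfile_total(out)`. */
	assert(difference_sketch(12, 12, pack_start) == 12 - 12);

	/* Hypothetical second pack, started after 1000 bytes of output. */
	pack_start = 1000 - SKETCH_HEADER_SIZE;
	assert(difference_sketch(12, 1000, pack_start) == 0);
}
```

Under these assumptions `pack_start` is zero whenever the reused pack is
the first thing written after the header, so the new expression reduces to
the old one. It only differs when something precedes the reused pack in
the output.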

Patrick

^ permalink raw reply
