Git development

Git development
 help / color / mirror / Atom feed

* [PATCH v3 5/9] SubmittingPatches: update extra tags list
From: Josh Soref via GitGitGadget @ 2023-12-28  4:55 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
	Dragan Simic, Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v3.git.1703739324.gitgitgadget@gmail.com>

From: Josh Soref <jsoref@gmail.com>

Add items with at least 100 uses in the past three years:
- Co-authored-by
- Helped-by
- Mentored-by
- Suggested-by

git log --since=3.years|
  perl -ne 'next unless /^\s+[A-Z][a-z]+-\S+:/;s/^\s+//;s/:.*/:/;print'|
  sort|uniq -c|sort -n|grep '[0-9][0-9] '
  14 Based-on-patch-by:
  14 Original-patch-by:
  17 Tested-by:
 100 Suggested-by:
 121 Co-authored-by:
 163 Mentored-by:
 274 Reported-by:
 290 Acked-by:
 450 Helped-by:
 602 Reviewed-by:
14111 Signed-off-by:

Signed-off-by: Josh Soref <jsoref@gmail.com>
---
 Documentation/SubmittingPatches | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 58dfe405049..31878cb70b7 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -355,6 +355,14 @@ If you like, you can put extra tags at the end:
   patch after a detailed analysis.
 . `Tested-by:` is used to indicate that the person applied the patch
   and found it to have the desired effect.
+. `Co-authored-by:` is used to indicate that people exchanged drafts
+   of a patch before submitting it.
+. `Helped-by:` is used to credit someone who suggested ideas for
+  changes without providing the precise changes in patch form.
+. `Mentored-by:` is used to credit someone with helping develop a
+  patch as part of a mentorship program (e.g., GSoC or Outreachy).
+. `Suggested-by:` is used to credit someone with suggesting the idea
+  for a patch.
 
 While you can also create your own trailer if the situation warrants it, we
 encourage you to instead use one of the common trailers in this project
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 6/9] SubmittingPatches: provide tag naming advice
From: Josh Soref via GitGitGadget @ 2023-12-28  4:55 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
	Dragan Simic, Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v3.git.1703739324.gitgitgadget@gmail.com>

From: Josh Soref <jsoref@gmail.com>

Current statistics show a strong preference to only capitalize the first
letter in a hyphenated tag, but that some guidance would be helpful:

git log |
  perl -ne 'next unless /^\s+(?:Signed-[oO]ff|Acked)-[bB]y:/;
    s/^\s+//;s/:.*/:/;print'|
  sort|uniq -c|sort -n
   2 Signed-off-By:
   4 Signed-Off-by:
  22 Acked-By:
  47 Signed-Off-By:
2202 Acked-by:
95315 Signed-off-by:

Signed-off-by: Josh Soref <jsoref@gmail.com>
---
 Documentation/SubmittingPatches | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 31878cb70b7..94c874ab5e6 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -368,6 +368,9 @@ While you can also create your own trailer if the situation warrants it, we
 encourage you to instead use one of the common trailers in this project
 highlighted above.
 
+Only capitalize the very first letter of tags, i.e. favor
+"Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
+
 [[git-tools]]
 === Generate your patch using Git tools out of your commits.
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 7/9] SubmittingPatches: clarify GitHub visual
From: Josh Soref via GitGitGadget @ 2023-12-28  4:55 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
	Dragan Simic, Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v3.git.1703739324.gitgitgadget@gmail.com>

From: Josh Soref <jsoref@gmail.com>

GitHub has two general forms for its states, sometimes they're a simple
colored object (e.g. green check or red x), and sometimes there's also a
colored container (e.g. green box or red circle) which contains that
object (e.g. check or x).

That's a lot of words to try to describe things, but in general, the key
for a failure is that it's recognized as an `x` and that it's associated
with the color red -- the color of course is problematic for people who
are red-green color-blind, but that's why they are paired with distinct
shapes.

Signed-off-by: Josh Soref <jsoref@gmail.com>
---
 Documentation/SubmittingPatches | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 94c874ab5e6..0665f89f38c 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -602,11 +602,11 @@ After the initial setup, CI will run whenever you push new changes
 to your fork of Git on GitHub.  You can monitor the test state of all your
 branches here: `https://github.com/<Your GitHub handle>/git/actions/workflows/main.yml`
 
-If a branch did not pass all test cases then it is marked with a red
-cross. In that case you can click on the failing job and navigate to
-"ci/run-build-and-tests.sh" and/or "ci/print-test-failures.sh". You
-can also download "Artifacts" which are tarred (or zipped) archives
-with test data relevant for debugging.
+If a branch does not pass all test cases then it will be marked with a
+red +x+, instead of a green check. In that case, you can click on the
+failing job and navigate to "ci/run-build-and-tests.sh" and/or
+"ci/print-test-failures.sh". You can also download "Artifacts" which
+are tarred (or zipped) archives with test data relevant for debugging.
 
 Then fix the problem and push your fix to your GitHub fork. This will
 trigger a new CI build to ensure all tests pass.
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 8/9] SubmittingPatches: clarify GitHub artifact format
From: Josh Soref via GitGitGadget @ 2023-12-28  4:55 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
	Dragan Simic, Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v3.git.1703739324.gitgitgadget@gmail.com>

From: Josh Soref <jsoref@gmail.com>

GitHub wraps artifacts generated by workflows in a .zip file.

Internally, workflows can package anything they like in them.

A recently generated failure artifact had the form:

windows-artifacts.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
 76001695  12-19-2023 01:35   artifacts.tar.gz
 11005650  12-19-2023 01:35   tracked.tar.gz
---------                     -------
 87007345                     2 files

Signed-off-by: Josh Soref <jsoref@gmail.com>
---
 Documentation/SubmittingPatches | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 0665f89f38c..5e2e13b5e09 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -606,7 +606,8 @@ If a branch does not pass all test cases then it will be marked with a
 red +x+, instead of a green check. In that case, you can click on the
 failing job and navigate to "ci/run-build-and-tests.sh" and/or
 "ci/print-test-failures.sh". You can also download "Artifacts" which
-are tarred (or zipped) archives with test data relevant for debugging.
+are zip archives containing tarred (or zipped) archives with test data
+relevant for debugging.
 
 Then fix the problem and push your fix to your GitHub fork. This will
 trigger a new CI build to ensure all tests pass.
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 9/9] SubmittingPatches: hyphenate non-ASCII
From: Josh Soref via GitGitGadget @ 2023-12-28  4:55 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
	Dragan Simic, Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v3.git.1703739324.gitgitgadget@gmail.com>

From: Josh Soref <jsoref@gmail.com>

Git documentation does this with the exception of ancient release notes.

Signed-off-by: Josh Soref <jsoref@gmail.com>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 5e2e13b5e09..e734a3f0f17 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -699,7 +699,7 @@ message to an external program, and this is a handy way to drive
 `git am`.  However, if the message is MIME encoded, what is
 piped into the program is the representation you see in your
 `*Article*` buffer after unwrapping MIME.  This is often not what
-you would want for two reasons.  It tends to screw up non ASCII
+you would want for two reasons.  It tends to screw up non-ASCII
 characters (most notably in people's names), and also
 whitespaces (fatal in patches).  Running "C-u g" to display the
 message in raw form before using "|" to run the pipe can work
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH v2 11/11] reftable/block: reuse buffer to compute record keys
From: Patrick Steinhardt @ 2023-12-28  5:53 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git, Jonathan Nieder
In-Reply-To: <CAOw_e7bQ+jxO2zhj32mDksq9uBKQfNt=wMNP5K6Oy1DqievCdg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1110 bytes --]

On Thu, Dec 21, 2023 at 11:43:27AM +0100, Han-Wen Nienhuys wrote:
> On Fri, Dec 8, 2023 at 3:53 PM Patrick Steinhardt <ps@pks.im> wrote:
> 
> > @@ -84,10 +84,12 @@ struct block_iter {
> >
> >         /* key for last entry we read. */
> >         struct strbuf last_key;
> > +       struct strbuf key;
> >  };
> 
> it's slightly more efficient, but the new field has no essential
> meaning. If I encountered this code with the change you make here, I
> would probably refactor it in the opposite direction to increase code
> clarity.
> 
> I suspect that the gains are too small to be measurable, but if you
> are after small efficiency gains, you can have
> reftable_record_decode() consume the key to avoid copying overhead in
> record.c.

Fair enough. I've done a better job for these kinds of refactorings for
the reftable library in my second patch series [1] by including some
benchmarks via Valgrind. This should result in less handwavy and
actually measurable/significant performance improvements.

Patrick

[1]: https://public-inbox.org/git/xmqqr0jgsn9g.fsf@gitster.g/T/#t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 0/8] reftable: fixes and optimizations (pt.2)
From: Patrick Steinhardt @ 2023-12-28  6:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703063544.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3681 bytes --]

Hi,

this is the second version of my patch series that contains additional
fixes and optimizations for the reftable library. Changes compared to
v1:

  - Added a patch to not auto-compact twice in `reftable_stack_add()`,
    reported by Han-Wen.

  - Fixed an issue I've previously introduced while rebasing the patch
    series.

  - Fixed a missing header, as reported by Junio.

Due to the added patch that avoids double auto-compaction this patch
series now depends on 637e34a783 (Merge branch 'ps/reftable-fixes',
2023-12-27).

Patrick

Patrick Steinhardt (8):
  reftable/stack: do not overwrite errors when compacting
  reftable/stack: do not auto-compact twice in `reftable_stack_add()`
  reftable/writer: fix index corruption when writing multiple indices
  reftable/record: constify some parts of the interface
  reftable/record: store "val1" hashes as static arrays
  reftable/record: store "val2" hashes as static arrays
  reftable/merged: really reuse buffers to compute record keys
  reftable/merged: transfer ownership of records when iterating

 reftable/block_test.c      |   4 +-
 reftable/merged.c          |   8 +--
 reftable/merged_test.c     |  16 +++---
 reftable/readwrite_test.c  | 102 +++++++++++++++++++++++++++++++------
 reftable/record.c          |  17 ++-----
 reftable/record_test.c     |   5 --
 reftable/reftable-record.h |  11 ++--
 reftable/stack.c           |  23 +++------
 reftable/stack_test.c      |   2 -
 reftable/writer.c          |   4 +-
 10 files changed, 117 insertions(+), 75 deletions(-)

Range-diff against v1:
1:  573fb2c4fb = 1:  22a57020c6 reftable/stack: do not overwrite errors when compacting
-:  ---------- > 2:  a08f7efc99 reftable/stack: do not auto-compact twice in `reftable_stack_add()`
2:  86ee79c48d = 3:  c00e08d97f reftable/writer: fix index corruption when writing multiple indices
3:  3ad4a0e5b9 = 4:  3416268087 reftable/record: constify some parts of the interface
4:  06c9eab678 ! 5:  46ca3a37f8 reftable/record: store "val1" hashes as static arrays
    @@ reftable/readwrite_test.c: static void test_write_object_id_length(void)
      	};
      	int err;
      	int i;
    +@@ reftable/readwrite_test.c: static void test_write_multiple_indices(void)
    + 	writer = reftable_new_writer(&strbuf_add_void, &writer_buf, &opts);
    + 	reftable_writer_set_limits(writer, 1, 1);
    + 	for (i = 0; i < 100; i++) {
    +-		unsigned char hash[GIT_SHA1_RAWSZ] = {i};
    + 		struct reftable_ref_record ref = {
    + 			.update_index = 1,
    + 			.value_type = REFTABLE_REF_VAL1,
    +-			.value.val1 = hash,
    ++			.value.val1 = {i},
    + 		};
    + 
    + 		strbuf_reset(&buf);
     
      ## reftable/record.c ##
     @@ reftable/record.c: static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
    @@ reftable/record_test.c: static void test_reftable_ref_record_roundtrip(void)
      		case REFTABLE_REF_VAL2:
     
      ## reftable/reftable-record.h ##
    +@@ reftable/reftable-record.h: license that can be found in the LICENSE file or at
    + #ifndef REFTABLE_RECORD_H
    + #define REFTABLE_RECORD_H
    + 
    ++#include "hash-ll.h"
    + #include <stdint.h>
    + 
    + /*
     @@ reftable/reftable-record.h: struct reftable_ref_record {
      #define REFTABLE_NR_REF_VALUETYPES 4
      	} value_type;
5:  49f13c123f = 6:  c8a36917b1 reftable/record: store "val2" hashes as static arrays
6:  32b7ddee28 = 7:  6313f8affd reftable/merged: really reuse buffers to compute record keys
7:  3dbabea22a = 8:  25a3919e58 reftable/merged: transfer ownership of records when iterating

-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 1/8] reftable/stack: do not overwrite errors when compacting
From: Patrick Steinhardt @ 2023-12-28  6:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703743174.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1854 bytes --]

In order to compact multiple stacks we iterate through the merged ref
and log records. When there is any error either when reading the records
from the old merged table or when writing the records to the new table
then we break out of the respective loops. When breaking out of the loop
for the ref records though the error code will be overwritten, which may
cause us to inadvertently skip over bad ref records. In the worst case,
this can lead to a compacted stack that is missing records.

Fix the code by using `goto done` instead so that any potential error
codes are properly returned to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 16bab82063..8729508dc3 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -801,18 +801,16 @@ static int stack_write_compact(struct reftable_stack *st,
 			err = 0;
 			break;
 		}
-		if (err < 0) {
-			break;
-		}
+		if (err < 0)
+			goto done;
 
 		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
 			continue;
 		}
 
 		err = reftable_writer_add_ref(wr, &ref);
-		if (err < 0) {
-			break;
-		}
+		if (err < 0)
+			goto done;
 		entries++;
 	}
 	reftable_iterator_destroy(&it);
@@ -827,9 +825,8 @@ static int stack_write_compact(struct reftable_stack *st,
 			err = 0;
 			break;
 		}
-		if (err < 0) {
-			break;
-		}
+		if (err < 0)
+			goto done;
 		if (first == 0 && reftable_log_record_is_deletion(&log)) {
 			continue;
 		}
@@ -845,9 +842,8 @@ static int stack_write_compact(struct reftable_stack *st,
 		}
 
 		err = reftable_writer_add_log(wr, &log);
-		if (err < 0) {
-			break;
-		}
+		if (err < 0)
+			goto done;
 		entries++;
 	}
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 2/8] reftable/stack: do not auto-compact twice in `reftable_stack_add()`
From: Patrick Steinhardt @ 2023-12-28  6:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703743174.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]

In 5c086453ff (reftable/stack: perform auto-compaction with
transactional interface, 2023-12-11), we fixed a bug where the
transactional interface to add changes to a reftable stack did not
perform auto-compaction by calling `reftable_stack_auto_compact()` in
`reftable_stack_addition_commit()`. While correct, this change may now
cause us to perform auto-compaction twice in the non-transactional
interface `reftable_stack_add()`:

  - It performs auto-compaction by itself.

  - It now transitively performs auto-compaction via the transactional
    interface.

Remove the first instance so that we only end up doing auto-compaction
once.

Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 8729508dc3..7ffeb3ee10 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -425,9 +425,6 @@ int reftable_stack_add(struct reftable_stack *st,
 		return err;
 	}

-	if (!st->disable_auto_compact)
-		return reftable_stack_auto_compact(st);
-
 	return 0;
 }

-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 3/8] reftable/writer: fix index corruption when writing multiple indices
From: Patrick Steinhardt @ 2023-12-28  6:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703743174.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 5241 bytes --]

Each reftable may contain multiple types of blocks for refs, objects and
reflog records, where each of these may have an index that makes it more
efficient to find the records. It was observed that the index for log
records can become corrupted under certain circumstances, where the
first entry of the index points into the object index instead of to the
log records.

As it turns out, this corruption can occur whenever we write a log index
as well as at least one additional index. Writing records and their index
is basically a two-step process:

  1. We write all blocks for the corresponding record. Each block that
     gets written is added to a list of blocks to index.

  2. Once all blocks were written we finish the section. If at least two
     blocks have been added to the list of blocks to index then we will
     now write the index for those blocks and flush it, as well.

When we have a very large number of blocks then we may decide to write a
multi-level index, which is why we also keep track of the list of the
index blocks in the same way as we previously kept track of the blocks
to index.

Now when we have finished writing all index blocks we clear the index
and flush the last block to disk. This is done in the wrong order though
because flushing the block to disk will re-add it to the list of blocks
to be indexed. The result is that the next section we are about to write
will have an entry in the list of blocks to index that points to the
last block of the preceding section's index, which will corrupt the log
index.

Fix this corruption by clearing the index after having written the last
block.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/readwrite_test.c | 80 +++++++++++++++++++++++++++++++++++++++
 reftable/writer.c         |  4 +-
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 278663f22d..9c16e0504e 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -798,6 +798,85 @@ static void test_write_key_order(void)
 	strbuf_release(&buf);
 }
 
+static void test_write_multiple_indices(void)
+{
+	struct reftable_write_options opts = {
+		.block_size = 100,
+	};
+	struct strbuf writer_buf = STRBUF_INIT, buf = STRBUF_INIT;
+	struct reftable_block_source source = { 0 };
+	struct reftable_iterator it = { 0 };
+	const struct reftable_stats *stats;
+	struct reftable_writer *writer;
+	struct reftable_reader *reader;
+	int err, i;
+
+	writer = reftable_new_writer(&strbuf_add_void, &writer_buf, &opts);
+	reftable_writer_set_limits(writer, 1, 1);
+	for (i = 0; i < 100; i++) {
+		unsigned char hash[GIT_SHA1_RAWSZ] = {i};
+		struct reftable_ref_record ref = {
+			.update_index = 1,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = hash,
+		};
+
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "refs/heads/%04d", i);
+		ref.refname = buf.buf,
+
+		err = reftable_writer_add_ref(writer, &ref);
+		EXPECT_ERR(err);
+	}
+
+	for (i = 0; i < 100; i++) {
+		unsigned char hash[GIT_SHA1_RAWSZ] = {i};
+		struct reftable_log_record log = {
+			.update_index = 1,
+			.value_type = REFTABLE_LOG_UPDATE,
+			.value.update = {
+				.old_hash = hash,
+				.new_hash = hash,
+			},
+		};
+
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "refs/heads/%04d", i);
+		log.refname = buf.buf,
+
+		err = reftable_writer_add_log(writer, &log);
+		EXPECT_ERR(err);
+	}
+
+	reftable_writer_close(writer);
+
+	/*
+	 * The written data should be sufficiently large to result in indices
+	 * for each of the block types.
+	 */
+	stats = reftable_writer_stats(writer);
+	EXPECT(stats->ref_stats.index_offset > 0);
+	EXPECT(stats->obj_stats.index_offset > 0);
+	EXPECT(stats->log_stats.index_offset > 0);
+
+	block_source_from_strbuf(&source, &writer_buf);
+	err = reftable_new_reader(&reader, &source, "filename");
+	EXPECT_ERR(err);
+
+	/*
+	 * Seeking the log uses the log index now. In case there is any
+	 * confusion regarding indices we would notice here.
+	 */
+	err = reftable_reader_seek_log(reader, &it, "");
+	EXPECT_ERR(err);
+
+	reftable_iterator_destroy(&it);
+	reftable_writer_free(writer);
+	reftable_reader_free(reader);
+	strbuf_release(&writer_buf);
+	strbuf_release(&buf);
+}
+
 static void test_corrupt_table_empty(void)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -847,5 +926,6 @@ int readwrite_test_main(int argc, const char *argv[])
 	RUN_TEST(test_log_overflow);
 	RUN_TEST(test_write_object_id_length);
 	RUN_TEST(test_write_object_id_min_length);
+	RUN_TEST(test_write_multiple_indices);
 	return 0;
 }
diff --git a/reftable/writer.c b/reftable/writer.c
index 2e322a5683..ee4590e20f 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -432,12 +432,12 @@ static int writer_finish_section(struct reftable_writer *w)
 		reftable_free(idx);
 	}
 
-	writer_clear_index(w);
-
 	err = writer_flush_block(w);
 	if (err < 0)
 		return err;
 
+	writer_clear_index(w);
+
 	bstats = writer_reftable_block_stats(w, typ);
 	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
 	bstats->index_offset = index_start;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 4/8] reftable/record: constify some parts of the interface
From: Patrick Steinhardt @ 2023-12-28  6:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703743174.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2714 bytes --]

We're about to convert reftable records to stop storing their object IDs
as allocated hashes. Prepare for this refactoring by constifying some
parts of the interface that will be impacted by this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c          | 8 ++++----
 reftable/reftable-record.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/reftable/record.c b/reftable/record.c
index fbaa1fbef5..5e258c734b 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -76,7 +76,7 @@ int reftable_is_block_type(uint8_t typ)
 	return 0;
 }
 
-uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec)
+const unsigned char *reftable_ref_record_val1(const struct reftable_ref_record *rec)
 {
 	switch (rec->value_type) {
 	case REFTABLE_REF_VAL1:
@@ -88,7 +88,7 @@ uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec)
 	}
 }
 
-uint8_t *reftable_ref_record_val2(const struct reftable_ref_record *rec)
+const unsigned char *reftable_ref_record_val2(const struct reftable_ref_record *rec)
 {
 	switch (rec->value_type) {
 	case REFTABLE_REF_VAL2:
@@ -242,7 +242,7 @@ static char hexdigit(int c)
 	return 'a' + (c - 10);
 }
 
-static void hex_format(char *dest, uint8_t *src, int hash_size)
+static void hex_format(char *dest, const unsigned char *src, int hash_size)
 {
 	assert(hash_size > 0);
 	if (src) {
@@ -1164,7 +1164,7 @@ int reftable_record_equal(struct reftable_record *a, struct reftable_record *b,
 		reftable_record_data(a), reftable_record_data(b), hash_size);
 }
 
-static int hash_equal(uint8_t *a, uint8_t *b, int hash_size)
+static int hash_equal(const unsigned char *a, const unsigned char *b, int hash_size)
 {
 	if (a && b)
 		return !memcmp(a, b, hash_size);
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index 67104f8fbf..f7eb2d6015 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -49,11 +49,11 @@ struct reftable_ref_record {
 
 /* Returns the first hash, or NULL if `rec` is not of type
  * REFTABLE_REF_VAL1 or REFTABLE_REF_VAL2. */
-uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec);
+const unsigned char *reftable_ref_record_val1(const struct reftable_ref_record *rec);
 
 /* Returns the second hash, or NULL if `rec` is not of type
  * REFTABLE_REF_VAL2. */
-uint8_t *reftable_ref_record_val2(const struct reftable_ref_record *rec);
+const unsigned char *reftable_ref_record_val2(const struct reftable_ref_record *rec);
 
 /* returns whether 'ref' represents a deletion */
 int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 5/8] reftable/record: store "val1" hashes as static arrays
From: Patrick Steinhardt @ 2023-12-28  6:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703743174.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 8585 bytes --]

When reading ref records of type "val1" we store its object ID in an
allocated array. This results in an additional allocation for every
single ref record we read, which is rather inefficient especially when
iterating over refs.

Refactor the code to instead use a static array of `GIT_MAX_RAWSZ`
bytes. While this means that `struct ref_record` is bigger now, we
typically do not store all refs in an array anyway and instead only
handle a limited number of records at the same point in time.

Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:

    HEAP SUMMARY:
        in use at exit: 21,098 bytes in 192 blocks
      total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,098 bytes in 192 blocks
      total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block_test.c      |  4 +---
 reftable/merged_test.c     | 16 ++++++----------
 reftable/readwrite_test.c  | 14 ++++----------
 reftable/record.c          |  3 ---
 reftable/record_test.c     |  1 -
 reftable/reftable-record.h |  3 ++-
 reftable/stack_test.c      |  2 --
 7 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/reftable/block_test.c b/reftable/block_test.c
index c00bbc8aed..dedb05c7d8 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -49,13 +49,11 @@ static void test_block_read_write(void)
 
 	for (i = 0; i < N; i++) {
 		char name[100];
-		uint8_t hash[GIT_SHA1_RAWSZ];
 		snprintf(name, sizeof(name), "branch%02d", i);
-		memset(hash, i, sizeof(hash));
 
 		rec.u.ref.refname = name;
 		rec.u.ref.value_type = REFTABLE_REF_VAL1;
-		rec.u.ref.value.val1 = hash;
+		memset(rec.u.ref.value.val1, i, GIT_SHA1_RAWSZ);
 
 		names[i] = xstrdup(name);
 		n = block_writer_add(&bw, &rec);
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
index d08c16abef..b3927a5d73 100644
--- a/reftable/merged_test.c
+++ b/reftable/merged_test.c
@@ -123,13 +123,11 @@ static void readers_destroy(struct reftable_reader **readers, size_t n)
 
 static void test_merged_between(void)
 {
-	uint8_t hash1[GIT_SHA1_RAWSZ] = { 1, 2, 3, 0 };
-
 	struct reftable_ref_record r1[] = { {
 		.refname = "b",
 		.update_index = 1,
 		.value_type = REFTABLE_REF_VAL1,
-		.value.val1 = hash1,
+		.value.val1 = { 1, 2, 3, 0 },
 	} };
 	struct reftable_ref_record r2[] = { {
 		.refname = "a",
@@ -165,26 +163,24 @@ static void test_merged_between(void)
 
 static void test_merged(void)
 {
-	uint8_t hash1[GIT_SHA1_RAWSZ] = { 1 };
-	uint8_t hash2[GIT_SHA1_RAWSZ] = { 2 };
 	struct reftable_ref_record r1[] = {
 		{
 			.refname = "a",
 			.update_index = 1,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash1,
+			.value.val1 = { 1 },
 		},
 		{
 			.refname = "b",
 			.update_index = 1,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash1,
+			.value.val1 = { 1 },
 		},
 		{
 			.refname = "c",
 			.update_index = 1,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash1,
+			.value.val1 = { 1 },
 		}
 	};
 	struct reftable_ref_record r2[] = { {
@@ -197,13 +193,13 @@ static void test_merged(void)
 			.refname = "c",
 			.update_index = 3,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash2,
+			.value.val1 = { 2 },
 		},
 		{
 			.refname = "d",
 			.update_index = 3,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash1,
+			.value.val1 = { 1 },
 		},
 	};
 
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 9c16e0504e..87b238105c 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -60,18 +60,15 @@ static void write_table(char ***names, struct strbuf *buf, int N,
 	*names = reftable_calloc(sizeof(char *) * (N + 1));
 	reftable_writer_set_limits(w, update_index, update_index);
 	for (i = 0; i < N; i++) {
-		uint8_t hash[GIT_SHA256_RAWSZ] = { 0 };
 		char name[100];
 		int n;
 
-		set_test_hash(hash, i);
-
 		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
 
 		ref.refname = name;
 		ref.update_index = update_index;
 		ref.value_type = REFTABLE_REF_VAL1;
-		ref.value.val1 = hash;
+		set_test_hash(ref.value.val1, i);
 		(*names)[i] = xstrdup(name);
 
 		n = reftable_writer_add_ref(w, &ref);
@@ -675,11 +672,10 @@ static void test_write_object_id_min_length(void)
 	struct strbuf buf = STRBUF_INIT;
 	struct reftable_writer *w =
 		reftable_new_writer(&strbuf_add_void, &buf, &opts);
-	uint8_t hash[GIT_SHA1_RAWSZ] = {42};
 	struct reftable_ref_record ref = {
 		.update_index = 1,
 		.value_type = REFTABLE_REF_VAL1,
-		.value.val1 = hash,
+		.value.val1 = {42},
 	};
 	int err;
 	int i;
@@ -711,11 +707,10 @@ static void test_write_object_id_length(void)
 	struct strbuf buf = STRBUF_INIT;
 	struct reftable_writer *w =
 		reftable_new_writer(&strbuf_add_void, &buf, &opts);
-	uint8_t hash[GIT_SHA1_RAWSZ] = {42};
 	struct reftable_ref_record ref = {
 		.update_index = 1,
 		.value_type = REFTABLE_REF_VAL1,
-		.value.val1 = hash,
+		.value.val1 = {42},
 	};
 	int err;
 	int i;
@@ -814,11 +809,10 @@ static void test_write_multiple_indices(void)
 	writer = reftable_new_writer(&strbuf_add_void, &writer_buf, &opts);
 	reftable_writer_set_limits(writer, 1, 1);
 	for (i = 0; i < 100; i++) {
-		unsigned char hash[GIT_SHA1_RAWSZ] = {i};
 		struct reftable_ref_record ref = {
 			.update_index = 1,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash,
+			.value.val1 = {i},
 		};
 
 		strbuf_reset(&buf);
diff --git a/reftable/record.c b/reftable/record.c
index 5e258c734b..a67a6b4d8a 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -219,7 +219,6 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
 	case REFTABLE_REF_DELETION:
 		break;
 	case REFTABLE_REF_VAL1:
-		ref->value.val1 = reftable_malloc(hash_size);
 		memcpy(ref->value.val1, src->value.val1, hash_size);
 		break;
 	case REFTABLE_REF_VAL2:
@@ -303,7 +302,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref)
 		reftable_free(ref->value.val2.value);
 		break;
 	case REFTABLE_REF_VAL1:
-		reftable_free(ref->value.val1);
 		break;
 	case REFTABLE_REF_DELETION:
 		break;
@@ -394,7 +392,6 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
 			return -1;
 		}
 
-		r->value.val1 = reftable_malloc(hash_size);
 		memcpy(r->value.val1, in.buf, hash_size);
 		string_view_consume(&in, hash_size);
 		break;
diff --git a/reftable/record_test.c b/reftable/record_test.c
index 70ae78feca..5c94d26e35 100644
--- a/reftable/record_test.c
+++ b/reftable/record_test.c
@@ -119,7 +119,6 @@ static void test_reftable_ref_record_roundtrip(void)
 		case REFTABLE_REF_DELETION:
 			break;
 		case REFTABLE_REF_VAL1:
-			in.u.ref.value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
 			set_hash(in.u.ref.value.val1, 1);
 			break;
 		case REFTABLE_REF_VAL2:
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index f7eb2d6015..7f3a0df635 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
 #ifndef REFTABLE_RECORD_H
 #define REFTABLE_RECORD_H
 
+#include "hash-ll.h"
 #include <stdint.h>
 
 /*
@@ -38,7 +39,7 @@ struct reftable_ref_record {
 #define REFTABLE_NR_REF_VALUETYPES 4
 	} value_type;
 	union {
-		uint8_t *val1; /* malloced hash. */
+		unsigned char val1[GIT_MAX_RAWSZ];
 		struct {
 			uint8_t *value; /* first value, malloced hash  */
 			uint8_t *target_value; /* second value, malloced hash */
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 14a3fc11ee..feab49d7f7 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -463,7 +463,6 @@ static void test_reftable_stack_add(void)
 		refs[i].refname = xstrdup(buf);
 		refs[i].update_index = i + 1;
 		refs[i].value_type = REFTABLE_REF_VAL1;
-		refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
 		set_test_hash(refs[i].value.val1, i);
 
 		logs[i].refname = xstrdup(buf);
@@ -600,7 +599,6 @@ static void test_reftable_stack_tombstone(void)
 		refs[i].update_index = i + 1;
 		if (i % 2 == 0) {
 			refs[i].value_type = REFTABLE_REF_VAL1;
-			refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
 			set_test_hash(refs[i].value.val1, i);
 		}
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 6/8] reftable/record: store "val2" hashes as static arrays
From: Patrick Steinhardt @ 2023-12-28  6:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703743174.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 5214 bytes --]

Similar to the preceding commit, convert ref records of type "val2" to
store their object IDs in static arrays instead of allocating them for
every single record.

We're using the same benchmark as in the preceding commit, with `git
show-ref --quiet` in a repository with ~350k refs. This time around
though the effects aren't this huge. Before:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 1,419,040 allocs, 1,418,847 frees, 62,153,868 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated

This is because "val2"-type records are typically only stored for peeled
tags, and the number of annotated tags in the benchmark repository is
rather low. Still, it can be seen that this change leads to a reduction
of allocations overall, even if only a small one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/readwrite_test.c  | 12 ++++--------
 reftable/record.c          |  6 ------
 reftable/record_test.c     |  4 ----
 reftable/reftable-record.h |  4 ++--
 4 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 87b238105c..178766dfa8 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -547,8 +547,6 @@ static void test_table_refs_for(int indexed)
 		uint8_t hash[GIT_SHA1_RAWSZ];
 		char fill[51] = { 0 };
 		char name[100];
-		uint8_t hash1[GIT_SHA1_RAWSZ];
-		uint8_t hash2[GIT_SHA1_RAWSZ];
 		struct reftable_ref_record ref = { NULL };
 
 		memset(hash, i, sizeof(hash));
@@ -558,11 +556,9 @@ static void test_table_refs_for(int indexed)
 		name[40] = 0;
 		ref.refname = name;
 
-		set_test_hash(hash1, i / 4);
-		set_test_hash(hash2, 3 + i / 4);
 		ref.value_type = REFTABLE_REF_VAL2;
-		ref.value.val2.value = hash1;
-		ref.value.val2.target_value = hash2;
+		set_test_hash(ref.value.val2.value, i / 4);
+		set_test_hash(ref.value.val2.target_value, 3 + i / 4);
 
 		/* 80 bytes / entry, so 3 entries per block. Yields 17
 		 */
@@ -570,8 +566,8 @@ static void test_table_refs_for(int indexed)
 		n = reftable_writer_add_ref(w, &ref);
 		EXPECT(n == 0);
 
-		if (!memcmp(hash1, want_hash, GIT_SHA1_RAWSZ) ||
-		    !memcmp(hash2, want_hash, GIT_SHA1_RAWSZ)) {
+		if (!memcmp(ref.value.val2.value, want_hash, GIT_SHA1_RAWSZ) ||
+		    !memcmp(ref.value.val2.target_value, want_hash, GIT_SHA1_RAWSZ)) {
 			want_names[want_names_len++] = xstrdup(name);
 		}
 	}
diff --git a/reftable/record.c b/reftable/record.c
index a67a6b4d8a..5c3fbb7b2a 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -222,9 +222,7 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
 		memcpy(ref->value.val1, src->value.val1, hash_size);
 		break;
 	case REFTABLE_REF_VAL2:
-		ref->value.val2.value = reftable_malloc(hash_size);
 		memcpy(ref->value.val2.value, src->value.val2.value, hash_size);
-		ref->value.val2.target_value = reftable_malloc(hash_size);
 		memcpy(ref->value.val2.target_value,
 		       src->value.val2.target_value, hash_size);
 		break;
@@ -298,8 +296,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref)
 		reftable_free(ref->value.symref);
 		break;
 	case REFTABLE_REF_VAL2:
-		reftable_free(ref->value.val2.target_value);
-		reftable_free(ref->value.val2.value);
 		break;
 	case REFTABLE_REF_VAL1:
 		break;
@@ -401,11 +397,9 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
 			return -1;
 		}
 
-		r->value.val2.value = reftable_malloc(hash_size);
 		memcpy(r->value.val2.value, in.buf, hash_size);
 		string_view_consume(&in, hash_size);
 
-		r->value.val2.target_value = reftable_malloc(hash_size);
 		memcpy(r->value.val2.target_value, in.buf, hash_size);
 		string_view_consume(&in, hash_size);
 		break;
diff --git a/reftable/record_test.c b/reftable/record_test.c
index 5c94d26e35..2876db7d27 100644
--- a/reftable/record_test.c
+++ b/reftable/record_test.c
@@ -122,11 +122,7 @@ static void test_reftable_ref_record_roundtrip(void)
 			set_hash(in.u.ref.value.val1, 1);
 			break;
 		case REFTABLE_REF_VAL2:
-			in.u.ref.value.val2.value =
-				reftable_malloc(GIT_SHA1_RAWSZ);
 			set_hash(in.u.ref.value.val2.value, 1);
-			in.u.ref.value.val2.target_value =
-				reftable_malloc(GIT_SHA1_RAWSZ);
 			set_hash(in.u.ref.value.val2.target_value, 2);
 			break;
 		case REFTABLE_REF_SYMREF:
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index 7f3a0df635..bb6e99acd3 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -41,8 +41,8 @@ struct reftable_ref_record {
 	union {
 		unsigned char val1[GIT_MAX_RAWSZ];
 		struct {
-			uint8_t *value; /* first value, malloced hash  */
-			uint8_t *target_value; /* second value, malloced hash */
+			unsigned char value[GIT_MAX_RAWSZ]; /* first hash  */
+			unsigned char target_value[GIT_MAX_RAWSZ]; /* second hash */
 		} val2;
 		char *symref; /* referent, malloced 0-terminated string */
 	} value;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 7/8] reftable/merged: really reuse buffers to compute record keys
From: Patrick Steinhardt @ 2023-12-28  6:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703743174.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

In 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), we have refactored the merged iterator to reuse a set of
buffers that it would otherwise have to reallocate on every single
iteration. Unfortunately, there was a brown-paper-bag-style bug here as
we continue to release these buffers after the iteration, and thus we
have essentially gained nothing.

Fix this performance issue by not releasing those buffers on iteration
anymore, where we instead rely on `merged_iter_close()` to release the
buffers for us.

Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 556bb5c556..a28bb99aaf 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -128,8 +128,6 @@ static int merged_iter_next_entry(struct merged_iter *mi,

 done:
 	reftable_record_release(&entry.rec);
-	strbuf_release(&mi->entry_key);
-	strbuf_release(&mi->key);
 	return err;
 }

-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 8/8] reftable/merged: transfer ownership of records when iterating
From: Patrick Steinhardt @ 2023-12-28  6:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703743174.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2155 bytes --]

When iterating over recods with the merged iterator we put the records
into a priority queue before yielding them to the caller. This means
that we need to allocate the contents of these records before we can
pass them over to the caller.

The handover to the caller is quite inefficient though because we first
deallocate the record passed in by the caller and then copy over the new
record, which requires us to reallocate memory.

Refactor the code to instead transfer ownership of the new record to the
caller. So instead of reallocating all contents, we now release the old
record and then copy contents of the new record into place.

The following benchmark of `git show-ref --quiet` in a repository with
around 350k refs shows a clear improvement. Before:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated

This shows that we now have roundabout a single allocation per record
that we're yielding from the iterator. Ideally, we'd also get rid of
this allocation so that the number of allocations doesn't scale with the
number of refs anymore. This would require some larger surgery though
because the memory is owned by the priority queue before transferring it
over to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index a28bb99aaf..a52914d667 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -124,10 +124,12 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 		reftable_record_release(&top.rec);
 	}

-	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_release(rec);
+	*rec = entry.rec;

 done:
-	reftable_record_release(&entry.rec);
+	if (err)
+		reftable_record_release(&entry.rec);
 	return err;
 }

-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v4] sideband.c: remove redundant 'NEEDSWORK' tag
From: Chandra Pratap via GitGitGadget @ 2023-12-28  8:01 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1625.v3.git.1703672407895.gitgitgadget@gmail.com>

From: Chandra Pratap <chandrapratap3519@gmail.com>

Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
---
    sideband.c: replace int with size_t for clarity

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1625%2FChand-ra%2Fdusra-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1625/Chand-ra/dusra-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1625

Range-diff vs v3:

 1:  273415aa6a4 ! 1:  8c003256e5b sideband.c: remove redundant 'NEEDSWORK' tag
     @@ sideband.c: void list_config_color_sideband_slots(struct string_list *list, cons
        *
      - * NEEDSWORK: use "size_t n" instead for clarity.
      + * It is fine to use "int n" here instead of "size_t n" as all calls to this
     -+ * function pass an 'int' parameter.
     ++ * function pass an 'int' parameter. Additionally, the buffer involved in
     ++ * storing these 'int' values takes input from a packet via the pkt-line
     ++ * interface, which is capable of transferring only 64kB at a time.
        */
       static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
       {


 sideband.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/sideband.c b/sideband.c
index 6cbfd391c47..266a67342be 100644
--- a/sideband.c
+++ b/sideband.c
@@ -69,7 +69,10 @@ void list_config_color_sideband_slots(struct string_list *list, const char *pref
  * of the line. This should be called for a single line only, which is
  * passed as the first N characters of the SRC array.
  *
- * NEEDSWORK: use "size_t n" instead for clarity.
+ * It is fine to use "int n" here instead of "size_t n" as all calls to this
+ * function pass an 'int' parameter. Additionally, the buffer involved in
+ * storing these 'int' values takes input from a packet via the pkt-line
+ * interface, which is capable of transferring only 64kB at a time.
  */
 static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
 {

base-commit: 1a87c842ece327d03d08096395969aca5e0a6996
-- 
gitgitgadget

^ permalink raw reply related

* Re: reftable: How to represent deleted referees of symrefs in the reflog?
From: Patrick Steinhardt @ 2023-12-28  8:14 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git, Jonathan Nieder, Terry Parker, Luca Milanesio
In-Reply-To: <CAOw_e7aDPTjDj_K3yyyatWkw3R-oT+_u1LgZyzQEE=6LnH+pmg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3802 bytes --]

On Wed, Dec 20, 2023 at 08:03:00PM +0100, Han-Wen Nienhuys wrote:
> On Tue, Nov 21, 2023 at 2:46 PM Patrick Steinhardt <ps@pks.im> wrote:
> > To me it seems like deletions in this case only delete a particular log
> > entry instead of the complete log for a particular reference. And some
> > older discussion [1] seems to confirm my hunch that a complete reflog is
> > deleted not with `log_type = 0x0`, but instead by writing the null
> > object ID as new ID.
> 
> No, writing a null OID (more precisely a transition from $SHA1 =>
> $ZERO) means that a branch was at $SHA1, and then was deleted. The
> reflog continues to exist, and new entries may be added by reviving
> the branch. That would add a $ZERO => $NEWSHA transition, but the
> history of the branch prior to its deletion is retained.
> 
> >  # This behaviour is a bit more on the weird side. We delete the
> >  # referee, and that causes the files backend to claim that the reflog
> >  # for HEAD is gone, as well. The reflog still exists though, as
> >  # demonstrated in the next command.
> >  $ git update-ref -m delete-main -d refs/heads/main
> >  $ git reflog show HEAD
> > fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
> 
> This looks wrong to me. HEAD has a history, and that history didn't go
> away because the current branch happened to be deleted.
> 
> > It kind of feels like the second step in the files backend where the
> > reflog is claimed to not exist is buggy -- I'd have expected to still
> 
> right, I agree.
> 
> > see the reflog, as the HEAD reference exists quite alright and has never
> > stopped to exist. And in the third step, I'd have expected to see three
> > reflog entries, including the deletion that is indeed still present in
> > the on-disk logfile.
> >
> > But with the reftable backend the problem becomes worse: we cannot even
> > represent the fact that the reflog still exists, but that the deletion
> > of the referee has caused the HEAD to point to the null OID, because the
> > null OID indicates complete deletion of the reflog.
> 
> This doesn't match my recollection. See
> https://github.com/git/git/pull/1215, or more precisely
> https://github.com/git/git/blob/3b2c5ddb28fa42a8c944373bea2ca756b1723908/refs/reftable-backend.c#L1582
> 
> Removing the entire reflog means removing all the individual entries
> using tombstones.

Yeah, indeed. I was misinterpreting older discussions here, and now use
individual tombstones which does work as expected. It's a bit
unfortunate that deletion of the reflog thus scales with its size, but
for now I think that's okay. We can still iterate on this if this ever
proves to become a problem.

> > Consequentially, if
> > we wrote the null OID, we'd only be able to see the last log entry here.
> >
> > It may totally be that I'm missing something obvious here. But if not,
> > it leaves us with a couple of options for how to fix it:
> >
> >     1. Disregard this edge case and accept that the reftable backend
> >        does not do exactly the same thing as the files backend in very
> >        specific situations like this.
> 
> I remember discussing with Jun that it would be acceptable to have
> slight semantic differences if unavoidable for the reflogs, and there
> should be a record of this in the list.  I think there will always be
> some differences: for example, dropping reflogs because of a dir/file
> conflict seems like a bug rather than a feature.

Initially I'm aiming for matching semantics, because this will make it a
ton easier to land the initial implementation of the backend. But I do
agree that eventually we may want to iterate and allow for diverging
behaviour between the backends.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 01/12] t: introduce DEFAULT_REPO_FORMAT prereq
From: Patrick Steinhardt @ 2023-12-28  8:55 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
In-Reply-To: <CAOLa=ZSY-FBUMOx9SHkJqf7XteQC=hVoYxo73q=E1wL3QOBYtA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 724 bytes --]

On Fri, Dec 22, 2023 at 03:41:10AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
[snip]
> > -test_expect_success SHA1 'git branch -m q q2 without config should succeed' '
> > +test_expect_success DEFAULT_REPO_FORMAT 'git branch -m q q2 without config should succeed' '
> >  	git branch -m q q2 &&
> >  	git branch -m q2 q
> >  '
> 
> My only concern is whether we'll end up blindly adding
> DEFAULT_REPO_FORMAT for tests where only SHA1 is a prereq or only where
> the new format extension is a prereq.

Yeah, this is of course a possibility with this new prereq. I don't
really know what to do about it though, and think that it does serve a
sensible purpose. So... dunno.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 03/12] refs: refactor logic to look up storage backends
From: Patrick Steinhardt @ 2023-12-28  8:56 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
In-Reply-To: <CAOLa=ZR1Bc+Acwjma9OM0SvyLg4-HB-S3Pxr66VCKTb0d_tdnw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2044 bytes --]

On Fri, Dec 22, 2023 at 04:38:30AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > +
> > +const char *ref_storage_format_to_name(int ref_storage_format)
> > +{
> > +	const struct ref_storage_be *be = find_ref_storage_backend(ref_storage_format);
> > +	if (!be)
> > +		return "unknown";
> > +	return be->name;
> > +}
> >
> 
> Would it make more sense to return NULL here?

The only purpose of this function will be to print error messages, so
"unknown" seems like the better choice.

> >  /*
> >   * How to handle various characters in refnames:
> >   * 0: An acceptable character for refs
> > @@ -2029,12 +2045,12 @@ static struct ref_store *ref_store_init(struct repository *repo,
> >  					const char *gitdir,
> >  					unsigned int flags)
> >  {
> > -	const char *be_name = "files";
> > -	struct ref_storage_be *be = find_ref_storage_backend(be_name);
> > +	int format = REF_STORAGE_FORMAT_FILES;
> > +	const struct ref_storage_be *be = find_ref_storage_backend(format);
> >  	struct ref_store *refs;
> >
> >  	if (!be)
> > -		BUG("reference backend %s is unknown", be_name);
> > +		BUG("reference backend %s is unknown", ref_storage_format_to_name(format));
> >
> 
> This doesn't really get us more information, since be == NULL, calling
> `ref_storage_format_to_name` will again call `find_ref_storage_backend`,
> which will be NULL. I wonder if its better to:
> 
> @@ -2029,12 +2045,12 @@ static struct ref_store *ref_store_init(struct
> repository *repo,
>                      const char *gitdir,
>                      unsigned int flags)
>   {
>  -	const char *be_name = "files";
>  -	struct ref_storage_be *be = find_ref_storage_backend(be_name);
>  +	int format = REF_STORAGE_FORMAT_FILES;
>  +	const struct ref_storage_be *be = find_ref_storage_backend(format);
>      struct ref_store *refs;
> 
>      if (!be)
>  -		BUG("reference backend %s is unknown", be_name);
>  +		BUG("reference backend is unknown");

Good point, will change.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 04/12] setup: start tracking ref storage format when
From: Patrick Steinhardt @ 2023-12-28  8:56 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
In-Reply-To: <CAOLa=ZQLLqj=78dJN0y51EFf3_XsHnTBYNjkB0cAAGf3dXJ9KA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

On Fri, Dec 22, 2023 at 05:09:37AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/refs.c b/refs.c
> > index e87c85897d..d289a5e175 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -2045,12 +2045,12 @@ static struct ref_store *ref_store_init(struct repository *repo,
> >  					const char *gitdir,
> >  					unsigned int flags)
> >  {
> > -	int format = REF_STORAGE_FORMAT_FILES;
> > -	const struct ref_storage_be *be = find_ref_storage_backend(format);
> > +	const struct ref_storage_be *be;
> >  	struct ref_store *refs;
> >
> > +	be = find_ref_storage_backend(repo->ref_storage_format);
> >  	if (!be)
> > -		BUG("reference backend %s is unknown", ref_storage_format_to_name(format));
> > +		BUG("reference backend is unknown");
> >
> >  	refs = be->init(repo, gitdir, flags);
> >  	return refs;
> > diff --git a/refs.h b/refs.h
> > index c6571bcf6c..944a67ac1b 100644
> > --- a/refs.h
> > +++ b/refs.h
> > @@ -11,6 +11,7 @@ struct string_list;
> >  struct string_list_item;
> >  struct worktree;
> >
> > +int default_ref_storage_format(void);
> >
> 
> Is this used/defined somewhere?

No, it's a leftover from a previous iteration. Will drop.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 04/12] setup: start tracking ref storage format when
From: Patrick Steinhardt @ 2023-12-28  8:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqv88ssp4r.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 1629 bytes --]

On Wed, Dec 20, 2023 at 10:30:28AM -0800, Junio C Hamano wrote:
> There is a topic in-flight that introduces the .compat_hash_algo
> member to the repo_fmt structure.  Seeing a conflict resolution like
> the attached (there are many others that are similar in spirit), I
> have to wonder if we want to add repo_set_ref_storage_format()
> helper function.  There are many assignments to .ref_storage_format
> member after this series is applied.
> 
> Note that I haven't read the series in full, so take the above with
> a grain of salt---it might turn out to be that direct assignment is
> more desirable, I dunno.
> 
> Thanks.

Wrapping this in a `repo_set_ref_storage_format()` doesn't add much for
now, but on the other hand it also doesn't hurt and is more in line with
how we set the other formats. This could also be useful in the future to
have a central place for additional sanity checks, if required. So yeah,
let's do it.

Makes me wonder whether we should then also add the following diff to
"setup: set repository's format on init" when both topics are being
merged together:

diff --git a/setup.c b/setup.c
index 3d980814bc..3d35c78c68 100644
--- a/setup.c
+++ b/setup.c
@@ -2210,6 +2210,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * format we can update the repository's settings accordingly.
 	 */
 	repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+	repo_set_compat_hash_algo(the_repository, repo_fmt.compat_hash_algo);
 	repo_set_ref_storage_format(the_repository, repo_fmt.ref_storage_format);
 
 	if (!(flags & INIT_DB_SKIP_REFDB))

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 00/12] Introduce `refStorage` extension
From: Patrick Steinhardt @ 2023-12-28  9:57 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano
In-Reply-To: <cover.1703067989.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 12827 bytes --]

Hi,

this is the second version of my patch series that introduces the new
`refStorage` extension. This extension will be used for the upcoming
reftable backend.

Changes compared to v2:

  - Fixed various typos in commit messages.

  - Fixed redundant information when the refstorage extension's value
    isn't understood.

  - Introduced `repo_set_ref_storage_format()`.

Thanks for the feedback so far!

Patrick

Patrick Steinhardt (12):
  t: introduce DEFAULT_REPO_FORMAT prereq
  worktree: skip reading HEAD when repairing worktrees
  refs: refactor logic to look up storage backends
  setup: start tracking ref storage format
  setup: set repository's formats on init
  setup: introduce "extensions.refStorage" extension
  setup: introduce GIT_DEFAULT_REF_FORMAT envvar
  t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
  builtin/rev-parse: introduce `--show-ref-format` flag
  builtin/init: introduce `--ref-format=` value flag
  builtin/clone: introduce `--ref-format=` value flag
  t9500: write "extensions.refstorage" into config

 Documentation/config/extensions.txt           | 11 +++
 Documentation/git-clone.txt                   |  6 ++
 Documentation/git-init.txt                    |  7 ++
 Documentation/git-rev-parse.txt               |  3 +
 Documentation/git.txt                         |  5 ++
 Documentation/ref-storage-format.txt          |  1 +
 .../technical/repository-version.txt          |  5 ++
 builtin/clone.c                               | 17 ++++-
 builtin/init-db.c                             | 15 +++-
 builtin/rev-parse.c                           |  4 ++
 refs.c                                        | 34 ++++++---
 refs.h                                        |  3 +
 refs/debug.c                                  |  1 -
 refs/files-backend.c                          |  1 -
 refs/packed-backend.c                         |  1 -
 refs/refs-internal.h                          |  1 -
 repository.c                                  |  6 ++
 repository.h                                  |  7 ++
 setup.c                                       | 63 +++++++++++++++--
 setup.h                                       |  9 ++-
 t/README                                      |  3 +
 t/t0001-init.sh                               | 70 +++++++++++++++++++
 t/t1500-rev-parse.sh                          | 17 +++++
 t/t3200-branch.sh                             |  2 +-
 t/t5601-clone.sh                              | 17 +++++
 t/t9500-gitweb-standalone-no-errors.sh        |  5 ++
 t/test-lib-functions.sh                       |  5 ++
 t/test-lib.sh                                 | 15 +++-
 worktree.c                                    | 24 ++++---
 29 files changed, 323 insertions(+), 35 deletions(-)
 create mode 100644 Documentation/ref-storage-format.txt

Range-diff against v1:
 1:  239ca38efd !  1:  3613439cb7 t: introduce DEFAULT_REPO_FORMAT prereq
    @@ Commit message
         repository format or otherwise they would fail to run, e.g. because they
         fail to detect the correct hash function. While the hash function is the
         only extension right now that creates problems like this, we are about
    -    to add a second extensions for the ref format.
    +    to add a second extension for the ref format.
     
         Introduce a new DEFAULT_REPO_FORMAT prereq that can easily be amended
         whenever we add new format extensions. Next to making any such changes
 2:  e895091025 !  2:  ecf4f1ddee worktree: skip reading HEAD when repairing worktrees
    @@ Commit message
         worktree: skip reading HEAD when repairing worktrees
     
         When calling `git init --separate-git-dir=<new-path>` on a preexisting
    -    repository, then we move the Git directory of that repository to the new
    -    path specified by the user. If there are worktrees present in the Git
    -    repository, we need to repair the worktrees so that their gitlinks point
    -    to the new location of the repository.
    +    repository, we move the Git directory of that repository to the new path
    +    specified by the user. If there are worktrees present in the repository,
    +    we need to repair the worktrees so that their gitlinks point to the new
    +    location of the repository.
     
         This repair logic will load repositories via `get_worktrees()`, which
         will enumerate up and initialize all worktrees. Part of initialization
    @@ Commit message
         format, which does not work.
     
         We do not require the worktree HEADs at all to repair worktrees. So
    -    let's fix issue this by skipping over the step that reads them.
    +    let's fix this issue by skipping over the step that reads them.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
 3:  f712d5ef5b !  3:  12329b99b7 refs: refactor logic to look up storage backends
    @@ refs.c
     -	for (be = refs_backends; be; be = be->next)
     -		if (!strcmp(be->name, name))
     -			return be;
    -+	if (ref_storage_format && ref_storage_format < ARRAY_SIZE(refs_backends))
    ++	if (ref_storage_format < ARRAY_SIZE(refs_backends))
     +		return refs_backends[ref_storage_format];
      	return NULL;
      }
    @@ refs.c: static struct ref_store *ref_store_init(struct repository *repo,
      
      	if (!be)
     -		BUG("reference backend %s is unknown", be_name);
    -+		BUG("reference backend %s is unknown", ref_storage_format_to_name(format));
    ++		BUG("reference backend is unknown");
      
      	refs = be->init(repo, gitdir, flags);
      	return refs;
 4:  6564659d40 !  4:  ddd099fbaf setup: start tracking ref storage format when
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    setup: start tracking ref storage format when
    +    setup: start tracking ref storage format
     
         In order to discern which ref storage format a repository is supposed to
         use we need to start setting up and/or discovering the format. This
    @@ Commit message
           - The first path is when we create a repository via `init_db()`. When
             we are re-initializing a preexisting repository we need to retain
             the previously used ref storage format -- if the user asked for a
    -        different format then this indicates an erorr and we error out.
    +        different format then this indicates an error and we error out.
             Otherwise we either initialize the repository with the format asked
             for by the user or the default format, which currently is the
             "files" backend.
    @@ refs.c: static struct ref_store *ref_store_init(struct repository *repo,
      
     +	be = find_ref_storage_backend(repo->ref_storage_format);
      	if (!be)
    --		BUG("reference backend %s is unknown", ref_storage_format_to_name(format));
    -+		BUG("reference backend is unknown");
    - 
    - 	refs = be->init(repo, gitdir, flags);
    - 	return refs;
    -
    - ## refs.h ##
    -@@ refs.h: struct string_list;
    - struct string_list_item;
    - struct worktree;
    - 
    -+int default_ref_storage_format(void);
    - int ref_storage_format_by_name(const char *name);
    - const char *ref_storage_format_to_name(int ref_storage_format);
    + 		BUG("reference backend is unknown");
      
     
      ## repository.c ##
    +@@ repository.c: void repo_set_hash_algo(struct repository *repo, int hash_algo)
    + 	repo->hash_algo = &hash_algos[hash_algo];
    + }
    + 
    ++void repo_set_ref_storage_format(struct repository *repo, int format)
    ++{
    ++	repo->ref_storage_format = format;
    ++}
    ++
    + /*
    +  * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
    +  * Return 0 upon success and a non-zero value upon failure.
     @@ repository.c: int repo_init(struct repository *repo,
      		goto error;
      
      	repo_set_hash_algo(repo, format.hash_algo);
    -+	repo->ref_storage_format = format.ref_storage_format;
    ++	repo_set_ref_storage_format(repo, format.ref_storage_format);
      	repo->repository_format_worktree_config = format.worktree_config;
      
      	/* take ownership of format.partial_clone */
    @@ repository.h: struct repository {
      	/* A unique-id for tracing purposes. */
      	int trace2_repo_id;
      
    +@@ repository.h: void repo_set_gitdir(struct repository *repo, const char *root,
    + 		     const struct set_gitdir_args *extra_args);
    + void repo_set_worktree(struct repository *repo, const char *path);
    + void repo_set_hash_algo(struct repository *repo, int algo);
    ++void repo_set_ref_storage_format(struct repository *repo, int format);
    + void initialize_the_repository(void);
    + RESULT_MUST_BE_USED
    + int repo_init(struct repository *r, const char *gitdir, const char *worktree);
     
      ## setup.c ##
     @@ setup.c: const char *setup_git_directory_gently(int *nongit_ok)
      		}
      		if (startup_info->have_repository) {
      			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
    -+			the_repository->ref_storage_format =
    -+				repo_fmt.ref_storage_format;
    ++			repo_set_ref_storage_format(the_repository,
    ++						    repo_fmt.ref_storage_format);
      			the_repository->repository_format_worktree_config =
      				repo_fmt.worktree_config;
      			/* take ownership of repo_fmt.partial_clone */
    @@ setup.c: void check_repository_format(struct repository_format *fmt)
      	check_repository_format_gently(get_git_dir(), fmt, NULL);
      	startup_info->have_repository = 1;
      	repo_set_hash_algo(the_repository, fmt->hash_algo);
    -+	the_repository->ref_storage_format =
    -+		fmt->ref_storage_format;
    ++	repo_set_ref_storage_format(the_repository,
    ++				    fmt->ref_storage_format);
      	the_repository->repository_format_worktree_config =
      		fmt->worktree_config;
      	the_repository->repository_format_partial_clone =
    @@ setup.c: void create_reference_database(const char *initial_branch, int quiet)
      	safe_create_dir(git_path("refs"), 1);
      	adjust_shared_perm(git_path("refs"));
      
    -+	the_repository->ref_storage_format = ref_storage_format;
    ++	repo_set_ref_storage_format(the_repository, ref_storage_format);
      	if (refs_init_db(&err))
      		die("failed to set up refs db: %s", err.buf);
      
 5:  f90a63d63c !  5:  01a1e58a97 setup: set repository's formats on init
    @@ setup.c: int init_db(const char *git_dir, const char *real_git_dir,
     +	 * Now that we have set up both the hash algorithm and the ref storage
     +	 * format we can update the repository's settings accordingly.
     +	 */
    -+	the_repository->hash_algo = &hash_algos[repo_fmt.hash_algo];
    -+	the_repository->ref_storage_format = repo_fmt.ref_storage_format;
    ++	repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
    ++	repo_set_ref_storage_format(the_repository, repo_fmt.ref_storage_format);
     +
      	if (!(flags & INIT_DB_SKIP_REFDB))
      		create_reference_database(repo_fmt.ref_storage_format,
 6:  beeb182f28 =  6:  0a586fa648 setup: introduce "extensions.refStorage" extension
 7:  dd91a75da4 =  7:  6d8754f73a setup: introduce GIT_DEFAULT_REF_FORMAT envvar
 8:  ed3bf008cd =  8:  c645932f3d t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
 9:  8a3d950d69 =  9:  761d647770 builtin/rev-parse: introduce `--show-ref-format` flag
10:  4d98b53553 = 10:  e382b5bf08 builtin/init: introduce `--ref-format=` value flag
11:  71cf0ce827 = 11:  257233658d builtin/clone: introduce `--ref-format=` value flag
12:  bbe2fbb154 ! 12:  b8cd06ec53 t9500: write "extensions.refstorage" into config
    @@ Commit message
         t9500: write "extensions.refstorage" into config
     
         In t9500 we're writing a custom configuration that sets up gitweb. This
    -    requires us manually ensure that the repository format is configured as
    -    required, including both the repository format version and extensions.
    -    With the introduction of the "extensions.refStorage" extension we need
    -    to update the test to also write this new one.
    +    requires us to manually ensure that the repository format is configured
    +    as required, including both the repository format version and
    +    extensions. With the introduction of the "extensions.refStorage"
    +    extension we need to update the test to also write this new one.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     

base-commit: e79552d19784ee7f4bbce278fe25f93fbda196fa
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 01/12] t: introduce DEFAULT_REPO_FORMAT prereq
From: Patrick Steinhardt @ 2023-12-28  9:57 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano
In-Reply-To: <cover.1703753910.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1587 bytes --]

A limited number of tests require repositories to have the default
repository format or otherwise they would fail to run, e.g. because they
fail to detect the correct hash function. While the hash function is the
only extension right now that creates problems like this, we are about
to add a second extension for the ref format.

Introduce a new DEFAULT_REPO_FORMAT prereq that can easily be amended
whenever we add new format extensions. Next to making any such changes
easier on us, the prerequisite's name should also help to clarify the
intent better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t3200-branch.sh | 2 +-
 t/test-lib.sh     | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 6a316f081e..de7d3014e4 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -519,7 +519,7 @@ EOF
 
 mv .git/config .git/config-saved
 
-test_expect_success SHA1 'git branch -m q q2 without config should succeed' '
+test_expect_success DEFAULT_REPO_FORMAT 'git branch -m q q2 without config should succeed' '
 	git branch -m q q2 &&
 	git branch -m q2 q
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 876b99562a..dc03f06b8e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1936,6 +1936,10 @@ test_lazy_prereq SHA1 '
 	esac
 '
 
+test_lazy_prereq DEFAULT_REPO_FORMAT '
+	test_have_prereq SHA1
+'
+
 # Ensure that no test accidentally triggers a Git command
 # that runs the actual maintenance scheduler, affecting a user's
 # system permanently.
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 02/12] worktree: skip reading HEAD when repairing worktrees
From: Patrick Steinhardt @ 2023-12-28  9:57 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano
In-Reply-To: <cover.1703753910.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4298 bytes --]

When calling `git init --separate-git-dir=<new-path>` on a preexisting
repository, we move the Git directory of that repository to the new path
specified by the user. If there are worktrees present in the repository,
we need to repair the worktrees so that their gitlinks point to the new
location of the repository.

This repair logic will load repositories via `get_worktrees()`, which
will enumerate up and initialize all worktrees. Part of initialization
is logic that we resolve their respective worktree HEADs, even though
that information may not actually be needed in the end by all callers.

In the context of git-init(1) this is about to become a problem, because
we do not have a repository that was set up via `setup_git_directory()`
or friends. Consequentially, it is not yet fully initialized at the time
of calling `repair_worktrees()`, and properly setting up all parts of
the repository in `init_db()` before we repair worktrees is not an easy
thing to do. While this is okay right now where we only have a single
reference backend in Git, once we gain a second one we would be trying
to look up the worktree HEADs before we have figured out the reference
format, which does not work.

We do not require the worktree HEADs at all to repair worktrees. So
let's fix this issue by skipping over the step that reads them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 worktree.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/worktree.c b/worktree.c
index a56a6c2a3d..9702ed0308 100644
--- a/worktree.c
+++ b/worktree.c
@@ -51,7 +51,7 @@ static void add_head_info(struct worktree *wt)
 /**
  * get the main worktree
  */
-static struct worktree *get_main_worktree(void)
+static struct worktree *get_main_worktree(int skip_reading_head)
 {
 	struct worktree *worktree = NULL;
 	struct strbuf worktree_path = STRBUF_INIT;
@@ -70,11 +70,13 @@ static struct worktree *get_main_worktree(void)
 	 */
 	worktree->is_bare = (is_bare_repository_cfg == 1) ||
 		is_bare_repository();
-	add_head_info(worktree);
+	if (!skip_reading_head)
+		add_head_info(worktree);
 	return worktree;
 }
 
-static struct worktree *get_linked_worktree(const char *id)
+static struct worktree *get_linked_worktree(const char *id,
+					    int skip_reading_head)
 {
 	struct worktree *worktree = NULL;
 	struct strbuf path = STRBUF_INIT;
@@ -93,7 +95,8 @@ static struct worktree *get_linked_worktree(const char *id)
 	CALLOC_ARRAY(worktree, 1);
 	worktree->path = strbuf_detach(&worktree_path, NULL);
 	worktree->id = xstrdup(id);
-	add_head_info(worktree);
+	if (!skip_reading_head)
+		add_head_info(worktree);
 
 done:
 	strbuf_release(&path);
@@ -118,7 +121,7 @@ static void mark_current_worktree(struct worktree **worktrees)
 	free(git_dir);
 }
 
-struct worktree **get_worktrees(void)
+static struct worktree **get_worktrees_internal(int skip_reading_head)
 {
 	struct worktree **list = NULL;
 	struct strbuf path = STRBUF_INIT;
@@ -128,7 +131,7 @@ struct worktree **get_worktrees(void)
 
 	ALLOC_ARRAY(list, alloc);
 
-	list[counter++] = get_main_worktree();
+	list[counter++] = get_main_worktree(skip_reading_head);
 
 	strbuf_addf(&path, "%s/worktrees", get_git_common_dir());
 	dir = opendir(path.buf);
@@ -137,7 +140,7 @@ struct worktree **get_worktrees(void)
 		while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 			struct worktree *linked = NULL;
 
-			if ((linked = get_linked_worktree(d->d_name))) {
+			if ((linked = get_linked_worktree(d->d_name, skip_reading_head))) {
 				ALLOC_GROW(list, counter + 1, alloc);
 				list[counter++] = linked;
 			}
@@ -151,6 +154,11 @@ struct worktree **get_worktrees(void)
 	return list;
 }
 
+struct worktree **get_worktrees(void)
+{
+	return get_worktrees_internal(0);
+}
+
 const char *get_worktree_git_dir(const struct worktree *wt)
 {
 	if (!wt)
@@ -591,7 +599,7 @@ static void repair_noop(int iserr UNUSED,
 
 void repair_worktrees(worktree_repair_fn fn, void *cb_data)
 {
-	struct worktree **worktrees = get_worktrees();
+	struct worktree **worktrees = get_worktrees_internal(1);
 	struct worktree **wt = worktrees + 1; /* +1 skips main worktree */
 
 	if (!fn)
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 03/12] refs: refactor logic to look up storage backends
From: Patrick Steinhardt @ 2023-12-28  9:57 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano
In-Reply-To: <cover.1703753910.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 5129 bytes --]

In order to look up ref storage backends, we're currently using a linked
list of backends, where each backend is expected to set up its `next`
pointer to the next ref storage backend. This is kind of a weird setup
as backends need to be aware of other backends without much of a reason.

Refactor the code so that the array of backends is centrally defined in
"refs.c", where each backend is now identified by an integer constant.
Expose functions to translate from those integer constants to the name
and vice versa, which will be required by subsequent patches.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c                | 34 +++++++++++++++++++++++++---------
 refs.h                |  3 +++
 refs/debug.c          |  1 -
 refs/files-backend.c  |  1 -
 refs/packed-backend.c |  1 -
 refs/refs-internal.h  |  1 -
 repository.h          |  3 +++
 7 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/refs.c b/refs.c
index 16bfa21df7..ab14634731 100644
--- a/refs.c
+++ b/refs.c
@@ -33,17 +33,33 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static const struct ref_storage_be *refs_backends[] = {
+	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
+};
 
-static struct ref_storage_be *find_ref_storage_backend(const char *name)
+static const struct ref_storage_be *find_ref_storage_backend(int ref_storage_format)
 {
-	struct ref_storage_be *be;
-	for (be = refs_backends; be; be = be->next)
-		if (!strcmp(be->name, name))
-			return be;
+	if (ref_storage_format < ARRAY_SIZE(refs_backends))
+		return refs_backends[ref_storage_format];
 	return NULL;
 }
 
+int ref_storage_format_by_name(const char *name)
+{
+	for (int i = 0; i < ARRAY_SIZE(refs_backends); i++)
+		if (refs_backends[i] && !strcmp(refs_backends[i]->name, name))
+			return i;
+	return REF_STORAGE_FORMAT_UNKNOWN;
+}
+
+const char *ref_storage_format_to_name(int ref_storage_format)
+{
+	const struct ref_storage_be *be = find_ref_storage_backend(ref_storage_format);
+	if (!be)
+		return "unknown";
+	return be->name;
+}
+
 /*
  * How to handle various characters in refnames:
  * 0: An acceptable character for refs
@@ -2029,12 +2045,12 @@ static struct ref_store *ref_store_init(struct repository *repo,
 					const char *gitdir,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	int format = REF_STORAGE_FORMAT_FILES;
+	const struct ref_storage_be *be = find_ref_storage_backend(format);
 	struct ref_store *refs;
 
 	if (!be)
-		BUG("reference backend %s is unknown", be_name);
+		BUG("reference backend is unknown");
 
 	refs = be->init(repo, gitdir, flags);
 	return refs;
diff --git a/refs.h b/refs.h
index ff113bb12a..e2098ea51b 100644
--- a/refs.h
+++ b/refs.h
@@ -11,6 +11,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+int ref_storage_format_by_name(const char *name);
+const char *ref_storage_format_to_name(int ref_storage_format);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/debug.c b/refs/debug.c
index 83b7a0ba65..b9775f2c37 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -426,7 +426,6 @@ static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
 }
 
 struct ref_storage_be refs_be_debug = {
-	.next = NULL,
 	.name = "debug",
 	.init = NULL,
 	.init_db = debug_init_db,
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ad8b1d143f..43fd0ac760 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3241,7 +3241,6 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err UNUSED)
 }
 
 struct ref_storage_be refs_be_files = {
-	.next = NULL,
 	.name = "files",
 	.init = files_ref_store_create,
 	.init_db = files_init_db,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b9fa097a29..8d1090e284 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1705,7 +1705,6 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 }
 
 struct ref_storage_be refs_be_packed = {
-	.next = NULL,
 	.name = "packed",
 	.init = packed_ref_store_create,
 	.init_db = packed_init_db,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4af83bf9a5..8e9f04cc67 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -663,7 +663,6 @@ typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refnam
 				 struct strbuf *referent);
 
 struct ref_storage_be {
-	struct ref_storage_be *next;
 	const char *name;
 	ref_store_init_fn *init;
 	ref_init_db_fn *init_db;
diff --git a/repository.h b/repository.h
index 5f18486f64..ea4c488b81 100644
--- a/repository.h
+++ b/repository.h
@@ -24,6 +24,9 @@ enum fetch_negotiation_setting {
 	FETCH_NEGOTIATION_NOOP,
 };
 
+#define REF_STORAGE_FORMAT_UNKNOWN 0
+#define REF_STORAGE_FORMAT_FILES   1
+
 struct repo_settings {
 	int initialized;
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox