Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2 4/4] rev-list: allow missing tips with --missing=[print|allow*]
From: Linus Arver @ 2024-02-13 22:33 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Patrick Steinhardt, John Cai, Christian Couder,
	Christian Couder
In-Reply-To: <20240208135055.2705260-5-christian.couder@gmail.com>

Christian Couder <christian.couder@gmail.com> writes:

> In 9830926c7d (rev-list: add commit object support in `--missing`
> option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
> so that it works with with missing commits, not just blobs/trees.
>
> Unfortunately, such a command would still fail with a "fatal: bad
> object <oid>" if it is passed a missing commit, blob or tree as an
> argument (before the rev walking even begins).
>
> When such a command is used to find the dependencies of some objects,
> for example the dependencies of quarantined objects (see the
> "QUARANTINE ENVIRONMENT" section in the git-receive-pack(1)
> documentation), it would be better if the command would instead
> consider such missing objects, especially commits, in the same way as
> other missing objects.

Looks good.

> If, for example `--missing=print` is used, it would be nice for some
> use cases if the missing tips passed as arguments were reported in
> the same way as other missing objects instead of the command just
> failing.

Nit: this paragraph sounds a bit redundant but it is giving an explicit
example of `--missing=print` which was not discussed so far, so I think
it's fine as is.

> We could introduce a new option to make it work like this, but most
> users are likely to prefer the command to have this behavior as the
> default one. Introducing a new option would require another dumb loop
> to look for that options early, which isn't nice.

s/options/option

> Also we made `git rev-list` work with missing commits very recently
> and the command is most often passed commits as arguments. So let's
> consider this as a bug fix related to these previous change.

s/previous change/recent changes

> While at it let's add a NEEDSWORK comment to say that we should get
> rid of the existing ugly dumb loops that parse the
> `--exclude-promisor-objects` and `--missing=...` options early.
>
> Helped-by: Linus Arver <linusa@google.com>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/rev-list-options.txt |  4 +++
>  builtin/rev-list.c                 | 15 +++++++-
>  revision.c                         | 14 ++++++--
>  t/t6022-rev-list-missing.sh        | 56 ++++++++++++++++++++++++++++++
>  4 files changed, 85 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index a583b52c61..bb753b6292 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -1019,6 +1019,10 @@ Unexpected missing objects will raise an error.
>  +
>  The form '--missing=print' is like 'allow-any', but will also print a
>  list of the missing objects.  Object IDs are prefixed with a ``?'' character.
> ++
> +If some tips passed to the traversal are missing, they will be
> +considered as missing too, and the traversal will ignore them. In case
> +we cannot get their Object ID though, an error will be raised.

The only other mention of the term "tips" is for the "--alternate-refs"
flag on line 215 which uses "ref tips". Perhaps this is obvious to
veteran Git developers but I do wonder if we need to somehow define it
(however briefly) the first time we mention it (either in the document
as a whole, or just within this newly added paragraph).

Here's an alternate wording

    Ref tips given on the command line are a special case. They are
    first dereferenced to Object IDs (if this is not possible, an error
    will be raised). Then these IDs are checked to see if the objects
    they refer to exist. If so, the traversal happens starting with
    these tips. Otherwise, then such tips will not be used for
    traversal.

    Even though such missing tips won't be included for traversal, for
    purposes of the `--missing` flag they will be treated the same way
    as those objects that did get traversed (and were determined to be
    missing). For example, if `--missing=print` is given then the Object
    IDs of these tips will be printed just like all other missing
    objects encountered during traversal.

But also, I realize that these documentation touch-ups might be better
served by an overall pass over the whole document, so it's fine if we
decide not to take this suggestion at this time.

Aside: unfortunately we don't seem to define the relationship between
ref tips (e.g., "HEAD"), object IDs (40-char hex string), and the actual
objects (the real data that we traverse over). It's probably another
thing that could be fixed up in the docs in the future.

>  --exclude-promisor-objects::
>  	(For internal use only.)  Prefilter object traversal at
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index b3f4783858..ec9556f135 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -545,6 +545,15 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	 *
>  	 * Let "--missing" to conditionally set fetch_if_missing.
>  	 */
> +	/*
> +	 * NEEDSWORK: These dump loops to look for some options early
> +	 * are ugly.

I agree with Junio's suggestion to use more objective language.

> We really need setup_revisions() to have a
> +	 * mechanism to allow and disallow some sets of options for
> +	 * different commands (like rev-list, replay, etc). Such

s/Such/Such a

> +	 * mechanism should do an early parsing of option and be able

s/option/options

> +	 * to manage the `--exclude-promisor-objects` and `--missing=...`
> +	 * options below.
> +	 */
>  	for (i = 1; i < argc; i++) {
>  		const char *arg = argv[i];
>  		if (!strcmp(arg, "--exclude-promisor-objects")) {
> 
> [...]
> 
> @@ -2178,13 +2183,18 @@ static int handle_revision_arg_1(const char *arg_, struct rev_info *revs, int fl
>  	if (revarg_opt & REVARG_COMMITTISH)
>  		get_sha1_flags |= GET_OID_COMMITTISH;
>  
> +	/*
> +	 * Even if revs->do_not_die_on_missing_objects is set, we
> +	 * should error out if we can't even get an oid, ...
> +	 */

Perhaps this wording is more precise?

    If we can't even get an oid, we are forced to error out (regardless
    of revs->do_not_die_on_missing_objects) because a valid traversal
    must start from *some* valid oid. OTOH we ignore the ref tip
    altogether with revs->ignore_missing.

> +	 * ... as
> +	 * `--missing=print` should be able to report missing oids.

I think this comment would be better placed ...

>  	if (get_oid_with_context(revs->repo, arg, get_sha1_flags, &oid, &oc))
>  		return revs->ignore_missing ? 0 : -1;
>  	if (!cant_be_filename)
>  		verify_non_filename(revs->prefix, arg);
>  	object = get_reference(revs, arg, &oid, flags ^ local_flags);

... around here.

>  	if (!object)
> -		return revs->ignore_missing ? 0 : -1;
> +		return (revs->ignore_missing || revs->do_not_die_on_missing_objects) ? 0 : -1;
>  	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
>  	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
>  	free(oc.path);
> @@ -3830,8 +3840,6 @@ int prepare_revision_walk(struct rev_info *revs)
>  				       FOR_EACH_OBJECT_PROMISOR_ONLY);
>  	}
>  
> -	oidset_init(&revs->missing_commits, 0);
> -
>  	if (!revs->reflog_info)
>  		prepare_to_use_bloom_filter(revs);
>  	if (!revs->unsorted_input)
> diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
> index 5f1be7abb5..78387eebb3 100755
> --- a/t/t6022-rev-list-missing.sh
> +++ b/t/t6022-rev-list-missing.sh
> @@ -78,4 +78,60 @@ do
>  	done
>  done
>  
> +for missing_tip in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
> +do
> +	# We want to check that things work when both
> +	#   - all the tips passed are missing (case existing_tip = ""), and
> +	#   - there is one missing tip and one existing tip (case existing_tip = "HEAD")
> +	for existing_tip in "" "HEAD"
> +	do

Though I am biased, these new variable names do make this test that much
easier to read. Thanks.

^ permalink raw reply

* Re: [PATCH v2 4/4] rev-list: allow missing tips with --missing=[print|allow*]
From: Linus Arver @ 2024-02-13 22:38 UTC (permalink / raw)
  To: Junio C Hamano, Christian Couder
  Cc: git, Patrick Steinhardt, John Cai, Christian Couder
In-Reply-To: <xmqqeddmonq0.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Christian Couder <christian.couder@gmail.com> writes:
>
>> +	 * mechanism to allow and disallow some sets of options for
>> +	 * different commands (like rev-list, replay, etc). Such
>> +	 * mechanism should do an early parsing of option and be able
>> +	 * to manage the `--exclude-promisor-objects` and `--missing=...`
>> +	 * options below.
>> +	 */
>>  	for (i = 1; i < argc; i++) {
>>  		const char *arg = argv[i];
>>  		if (!strcmp(arg, "--exclude-promisor-objects")) {
>> @@ -753,8 +762,12 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>>  
>>  	if (arg_print_omitted)
>>  		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
>> -	if (arg_missing_action == MA_PRINT)
>> +	if (arg_missing_action == MA_PRINT) {
>>  		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
>> +		/* Already add missing tips */
>> +		oidset_insert_from_set(&missing_objects, &revs.missing_commits);
>> +		oidset_clear(&revs.missing_commits);
>> +	}
>
> It is unclear what "already" here refers to, at least to me.

It's grammatically correct but perhaps a bit "over eager" (gives the
impression that we get these missing tips all the time and is a common
"happy" path). I do still think my earlier v1 comments

    Did you mean "Add already-missing commits"? Perhaps even more explicit
    would be "Add missing tips"?

are relevant here.

^ permalink raw reply

* Re: [PATCH v3 2/2] column: guard against negative padding
From: Rubén Justo @ 2024-02-13 23:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kristoffer Haugsbakk, git
In-Reply-To: <xmqqle7o5f34.fsf@gitster.g>

On 13-feb-2024 11:39:11, Junio C Hamano wrote:
> Rubén Justo <rjusto@gmail.com> writes:
> 
> >> This one happens to be safe currently because "git tag" passes 2 in
> >> opts->padding, but I do not think this is needed.
> >
> > At first glance, I also thought this was not necessary.
> >
> > However, callers of run_column_filter() might forget to check the return
> > value, and the BUG() triggered by the underlying process could be buried
> > and ignored.  Having the BUG() here, in the same process, makes it more
> > noticeable.
> 
> The point of BUG() is to help developers catch the silly breakage
> before it excapes from the lab, and we can expect these careless
> developers to ignore the return value.  But "column --padding=-1"
> invoked as a subprocess will show a human-readable error message
> to such a developer, so it is less important than the BUG() in the
> other place.

Thinking again about this; you are right.  That BUG() in
run_column_filter() does not make sense in this series.

It is addressing a different error, and perhaps a solution could be:

--- >8 ---
Subject: [PATCH] tag: error when git-column fails

If the user asks for the list of tags to be displayed in columns
("--columns"), a child git-column process is used to format the output
as expected.

In a rare situation where we encounter a problem spawning that child
process, we will work erroneously.

Make noticeable we're having a problem executing git-column, so the user
can act accordingly.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
---
 builtin/tag.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 37473ac21f..30532b76d5 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -530,7 +530,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 			struct column_options copts;
 			memset(&copts, 0, sizeof(copts));
 			copts.padding = 2;
-			run_column_filter(colopts, &copts);
+			if (run_column_filter(colopts, &copts))
+				die(_("could not start 'git column'")
 		}
 		filter.name_patterns = argv;
 		ret = list_tags(&filter, sorting, &format);
-- 
2.43.0

^ permalink raw reply related

* [PATCH] tag: error when git-column fails
From: Rubén Justo @ 2024-02-13 23:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kristoffer Haugsbakk, git
In-Reply-To: <8acde766-e2cd-4901-b665-f677cd15295d@gmail.com>

If the user asks for the list of tags to be displayed in columns
("--columns"), a child git-column process is used to format the output
as expected.

In a rare situation where we encounter a problem spawning that child
process, we will work erroneously.

Make noticeable we're having a problem executing git-column, so the user
can act accordingly.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
---
 builtin/tag.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 37473ac21f..19a7e06bf4 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -530,7 +530,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 			struct column_options copts;
 			memset(&copts, 0, sizeof(copts));
 			copts.padding = 2;
-			run_column_filter(colopts, &copts);
+			if (run_column_filter(colopts, &copts))
+				die(_("could not start 'git column'"));
 		}
 		filter.name_patterns = argv;
 		ret = list_tags(&filter, sorting, &format);
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH v2 3/5] completion: add and use __git_compute_first_level_config_vars_for_section
From: Junio C Hamano @ 2024-02-14  0:24 UTC (permalink / raw)
  To: Philippe Blain; +Cc: Patrick Steinhardt, Philippe Blain via GitGitGadget, git
In-Reply-To: <996fab96-82b1-0b54-3a09-0ecc18d68a11@gmail.com>

Philippe Blain <levraiphilippeblain@gmail.com> writes:

>> A silly question (primarily because I do not much use the indirect
>> reference construct ${!name}).  Does the assignment with printf need
>> to spell out the long variable name with "_${section}"?  Can it be
>> 
>>     printf -v "$this_section" ...
>> 
>> instead, as we already have the short-hand for it?
>
> No, unfortunately neither "$this_section" nor "${!this_section}"
> work, so we must use the long name.

Hmph, this does not match my experiment, though.  What am I doing
wrong?

        bash$ vname=foo
        bash$ foo=bar
        bash$ set | grep foo
        foo=bar
        vname=foo
        bash$ printf -v "$vname" "%d" 1234
        bash$ set | grep foo
        foo=1234
        vname=foo
        bash$ echo $BASH_VERSION
        5.2.21(1)-release


^ permalink raw reply

* Re: [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events
From: Junio C Hamano @ 2024-02-14  1:34 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler
In-Reply-To: <0896d4af907d71df29b0c4f5a27d24ea80b3c0e1.1707857541.git.gitgitgadget@gmail.com>

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 66 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 38 insertions(+), 28 deletions(-)

Up to this point, I found it a very pleasant read.  Nothing
surprising or unexpected.  Just a simple series of nice clean-ups.

Thanks.

^ permalink raw reply

* Re: [PATCH] tag: error when git-column fails
From: Junio C Hamano @ 2024-02-14  1:35 UTC (permalink / raw)
  To: Rubén Justo; +Cc: Kristoffer Haugsbakk, git
In-Reply-To: <59df085d-0de8-45b1-9b8b-c69e91e56a1f@gmail.com>

Rubén Justo <rjusto@gmail.com> writes:

> @@ -530,7 +530,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>  			struct column_options copts;
>  			memset(&copts, 0, sizeof(copts));
>  			copts.padding = 2;
> -			run_column_filter(colopts, &copts);
> +			if (run_column_filter(colopts, &copts))
> +				die(_("could not start 'git column'"));

Nice.  This obvious omission should have been here from the day one.

Will queue.  Thanks.

>  		}
>  		filter.name_patterns = argv;
>  		ret = list_tags(&filter, sorting, &format);

^ permalink raw reply

* [L10N] ci: bump GitHub Actions versions in l10n.yml
From: Jiang Xin @ 2024-02-14  4:38 UTC (permalink / raw)
  To: Git List, Junio C Hamano, Johannes Schindelin; +Cc: Jiang Xin, dependabot [bot]

Hi Junio,

Johannes (dscho) sent a pull request [1] which updates the git-l10n
GitHub Actions to the git-l10n repository. This change can be tested
directly in the git-l10n repository, because the git-l10n GitHub Actions
is only turned on in the git-l10n repository. I will merge this commit
in my tree and send it to you this Sunday along with other l10n updates.

[1] https://github.com/git-l10n/git-po/pull/749/

--- >8 ---
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Wed, 7 Feb 2024 19:42:17 +0000
Subject: [PATCH] l10n: bump Actions versions in l10n.yml

This avoids the "Node.js 16 Actions are deprecated" warnings.

Original-commits-by: dependabot[bot] <support@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 .github/workflows/l10n.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/l10n.yml b/.github/workflows/l10n.yml
index 6c3849658a..3d89758981 100644
--- a/.github/workflows/l10n.yml
+++ b/.github/workflows/l10n.yml
@@ -63,7 +63,7 @@ jobs:
             origin \
             ${{ github.ref }} \
             $args
-      - uses: actions/setup-go@v2
+      - uses: actions/setup-go@v5
         with:
           go-version: '>=1.16'
       - name: Install git-po-helper
@@ -91,7 +91,7 @@ jobs:
           cat git-po-helper.out
           exit $exit_code
       - name: Create comment in pull request for report
-        uses: mshick/add-pr-comment@v1
+        uses: mshick/add-pr-comment@v2
         if: >-
           always() &&
           github.event_name == 'pull_request_target' &&
-- 
2.44.0.rc0


^ permalink raw reply related

* [PATCH] diff: mark some diff parameters as placeholders
From: Jiang Xin @ 2024-02-14  5:31 UTC (permalink / raw)
  To: Git List, Junio C Hamano, Jean-Noël Avila
  Cc: Jiang Xin, Alexander Shopov, Jordi Mas, Ralf Thielow,
	Jimmy Angelakos, Christopher Díaz, Bagas Sanjaya,
	Alessandro Menti, Gwan-gyeong Mun, Arusekk, Daniel Santos,
	Dimitriy Ryazantcev, Peter Krefting, Emir SARI, Arkadii Yakovets,
	Trần Ngọc Quân, Teng Long, Yi-Jyun Pan

From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

Some l10n translators translated the parameters "files", "param1" and
"param2" in the following message:

    "synonym for --dirstat=files,param1,param2..."

Translating "param1" and "param2" is OK, but changing the parameter
"files" is wrong. The parameters that are not meant to be used verbatim
should be marked as placeholders, but the verbatim parameter not marked
as a placeholder should be left as is.

This change is a complement for commit 51e846e673 (doc: enforce
placeholders in documentation, 2023-12-25).

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index ccfa1fca0d..c256ef103e 100644
--- a/diff.c
+++ b/diff.c
@@ -5599,7 +5599,7 @@ struct option *add_diff_options(const struct option *opts,
 			       PARSE_OPT_NONEG | PARSE_OPT_NOARG,
 			       diff_opt_dirstat),
 		OPT_CALLBACK_F(0, "dirstat-by-file", options, N_("<param1,param2>..."),
-			       N_("synonym for --dirstat=files,param1,param2..."),
+			       N_("synonym for --dirstat=files,<param1>,<param2>..."),
 			       PARSE_OPT_NONEG | PARSE_OPT_OPTARG,
 			       diff_opt_dirstat),
 		OPT_BIT_F(0, "check", &options->output_format,
-- 
2.44.0.rc0


^ permalink raw reply related

* What's cooking in git.git (Feb 2024, #05; Tue, 13)
From: Junio C Hamano @ 2024-02-14  6:27 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and are candidate to be in a
future release).  Commits prefixed with '-' are only in 'seen', and
aren't considered "accepted" at all and may be annotated with an URL
to a message that raises issues but they are no means exhaustive.  A
topic without enough support may be discarded after a long period of
no activity (of course they can be resubmit when new interests
arise).

Git 2.44-rc1 will be tagged soon.  This round, let's try having just
one release candidate before the final, which is expected to be
tagged on or around 20th.

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[Graduated to 'master']

* bk/complete-bisect (2024-02-06) 7 commits
  (merged to 'next' on 2024-02-07 at ac95a595b7)
 + completion: bisect: recognize but do not complete view subcommand
 + completion: bisect: complete log opts for visualize subcommand
 + completion: new function __git_complete_log_opts
 + completion: bisect: complete missing --first-parent and - -no-checkout options
 + completion: bisect: complete custom terms and related options
 + completion: bisect: complete bad, new, old, and help subcommands
 + completion: tests: always use 'master' for default initial branch name

 Command line completion support (in contrib/) has been
 updated for "git bisect".
 source: <20240206215048.488344-1-britton.kerin@gmail.com>


* jc/bisect-doc (2024-02-07) 2 commits
  (merged to 'next' on 2024-02-07 at 914fa6775f)
 + bisect: document command line arguments for "bisect start"
 + bisect: document "terms" subcommand more fully

 Doc update.
 source: <20240207214436.538586-1-gitster@pobox.com>


* jc/sign-buffer-failure-propagation-fix (2024-02-07) 2 commits
  (merged to 'next' on 2024-02-08 at badb96b5ac)
 + ssh signing: signal an error with a negative return value
  (merged to 'next' on 2024-02-07 at 2cedac9d38)
 + tag: fix sign_buffer() call to create a signed tag

 A failed "git tag -s" did not necessarily result in an error
 depending on the crypto backend, which has been corrected.
 source: <xmqq4jek9ko1.fsf@gitster.g>
 source: <xmqq5xyzr6tm.fsf@gitster.g>


* pb/template-for-single-commit-pr (2024-02-06) 1 commit
  (merged to 'next' on 2024-02-07 at 2a56c8eb13)
 + .github/PULL_REQUEST_TEMPLATE.md: add a note about single-commit PRs

 Doc update.
 source: <pull.1665.v2.git.git.1707225612576.gitgitgadget@gmail.com>


* ps/reftable-multi-level-indices-fix (2024-02-01) 6 commits
  (merged to 'next' on 2024-02-07 at 143f47a079)
 + reftable: document reading and writing indices
 + reftable/writer: fix writing multi-level indices
 + reftable/writer: simplify writing index records
 + reftable/writer: use correct type to iterate through index entries
 + reftable/reader: be more careful about errors in indexed seeks
 + Merge branch 'jc/reftable-core-fsync' into ps/reftable-multi-level-indices-fix

 Write multi-level indices for reftable has been corrected.
 source: <cover.1706773842.git.ps@pks.im>


* ps/reftable-styles (2024-02-06) 9 commits
  (merged to 'next' on 2024-02-07 at 18670512e2)
 + reftable/record: improve semantics when initializing records
 + reftable/merged: refactor initialization of iterators
 + reftable/merged: refactor seeking of records
 + reftable/stack: use `size_t` to track stack length
 + reftable/stack: use `size_t` to track stack slices during compaction
 + reftable/stack: index segments with `size_t`
 + reftable/stack: fix parameter validation when compacting range
 + reftable: introduce macros to allocate arrays
 + reftable: introduce macros to grow arrays

 Code clean-up in various reftable code paths.
 source: <cover.1707200355.git.ps@pks.im>


* ps/report-failure-from-git-stash (2024-02-06) 1 commit
  (merged to 'next' on 2024-02-07 at a8a3f91f61)
 + builtin/stash: report failure to write to index

 "git stash" sometimes was silent even when it failed due to
 unwritable index file, which has been corrected.
 source: <cb098cf88cbfcbf7c4872f8887856629b909cb91.1707197653.git.ps@pks.im>


* pw/show-ref-pseudorefs (2024-02-07) 2 commits
  (merged to 'next' on 2024-02-08 at 7e9f850dba)
 + t1400: use show-ref to check pseudorefs
 + show-ref --verify: accept pseudorefs

 "git show-ref --verify" did not show things like "CHERRY_PICK_HEAD",
 which has been corrected.
 source: <pull.1654.git.1707324277.gitgitgadget@gmail.com>


* tb/multi-pack-reuse-experiment (2024-02-05) 2 commits
  (merged to 'next' on 2024-02-08 at e92afaa170)
 + pack-objects: enable multi-pack reuse via `feature.experimental`
 + t5332-multi-pack-reuse.sh: extract pack-objects helper functions

 Setting `feature.experimental` opts the user into multi-pack reuse
 experiment

 Will cook in 'next'.
 source: <cover.1707173415.git.me@ttaylorr.com>


* vd/for-each-ref-sort-with-formatted-timestamp (2024-02-07) 1 commit
  (merged to 'next' on 2024-02-08 at 5f86cad208)
 + ref-filter.c: sort formatted dates by byte value

 "git branch" and friends learned to use the formatted text as
 sorting key, not the underlying timestamp value, when the --sort
 option is used with author or committer timestamp with a format
 specifier (e.g., "--sort=creatordate:format:%H:%M:%S").
 source: <pull.1655.git.1707357439586.gitgitgadget@gmail.com>

--------------------------------------------------
[New Topics]

* ps/ref-tests-update-even-more (2024-02-09) 7 commits
 - t7003: ensure filter-branch prunes reflogs with the reftable backend
 - t2011: exercise D/F conflicts with HEAD with the reftable backend
 - t1405: remove unneeded cleanup step
 - t1404: make D/F conflict tests compatible with reftable backend
 - t1400: exercise reflog with gaps with reftable backend
 - t0410: enable tests with extensions with non-default repo format
 - t: move tests exercising the "files" backend

 More tests that are marked as "ref-files only" have been updated to
 improve test coverage of reftable backend.

 Needs review.
 source: <cover.1707463221.git.ps@pks.im>


* pw/gc-during-rebase (2024-02-09) 1 commit
  (merged to 'next' on 2024-02-12 at d54c5ce325)
 + prune: mark rebase autostash and orig-head as reachable

 The sequencer machinery does not use the ref API and instead
 records names of certain objects it needs for its correct operation
 in temporary files, which makes these objects susceptible to loss
 by garbage collection.  These temporary files have been added as
 starting points for reachability analysis to fix this.

 Will merge to 'master'.
 source: <pull.1656.v2.git.1707495579886.gitgitgadget@gmail.com>


* rs/receive-pack-remove-find-header (2024-02-12) 2 commits
  (merged to 'next' on 2024-02-12 at f1bf281e10)
 + receive-pack: use find_commit_header() in check_nonce()
 + receive-pack: use find_commit_header() in check_cert_push_options()

 Code simplification.

 Will merge to 'master'.
 source: <8b350cae-2180-4ac7-a911-d40043576445@web.de>


* cp/git-flush-is-an-env-bool (2024-02-13) 1 commit
  (merged to 'next' on 2024-02-13 at c0850f5675)
 + write-or-die: fix the polarity of GIT_FLUSH environment variable

 Recent conversion to allow more than 0/1 in GIT_FLUSH broke the
 mechanism by flipping what yes/no means by mistake, which has been
 corrected.

 Will merge to 'master' and then to 'maint'
 source: <xmqqbk8k5eo0.fsf@gitster.g>


* jc/unit-tests-make-relative-fix (2024-02-12) 1 commit
  (merged to 'next' on 2024-02-12 at 554eddef80)
 + unit-tests: do show relative file paths on non-Windows, too

 The mechanism to report the filename in the source code, used by
 the unit-test machinery, assumed that the compiler expanded __FILE__
 to the path to the source given to the $(CC), but some compilers
 give full path, breaking the output.  This has been corrected.

 Will merge to 'master'.
 source: <xmqqle7r9enn.fsf_-_@gitster.g>


* js/github-actions-update (2024-02-12) 2 commits
  (merged to 'next' on 2024-02-12 at f52de8b126)
 + ci(linux32): add a note about Actions that must not be updated
 + ci: bump remaining outdated Actions versions
 (this branch uses jc/github-actions-update.)

 Update remaining GitHub Actions jobs to avoid warnings against
 using deprecated version of Node.js.

 Will merge to 'master'.
 source: <pull.1660.v2.git.1707653489.gitgitgadget@gmail.com>


* rs/use-xstrncmpz (2024-02-12) 1 commit
  (merged to 'next' on 2024-02-12 at 37e5f0fc14)
 + use xstrncmpz()

 Code clean-up.

 Will cook in 'next'.
 source: <954b75d0-1504-4f57-b34e-e770a4b7b3ea@web.de>


* kn/for-all-refs (2024-02-12) 6 commits
 - for-each-ref: add new option to include root refs
 - ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR'
 - refs: introduce `refs_for_each_include_root_refs()`
 - refs: extract out `loose_fill_ref_dir_regular_file()`
 - refs: introduce `is_pseudoref()` and `is_headref()`
 - Merge branch 'ps/reftable-backend' into kn/for-all-refs
 (this branch uses ps/reftable-backend.)

 "git for-each-ref" filters its output with prefixes given from the
 command line, but it did not honor an empty string to mean "pass
 everything", which has been corrected.

 Needs review.
 source: <20240211183923.131278-1-karthik.188@gmail.com>


* kh/column-reject-negative-padding (2024-02-13) 2 commits
 - column: guard against negative padding
 - column: disallow negative padding

 "git column" has been taught to reject negative padding value, as
 it would lead to nonsense behaviour including division by zero.

 Will merge to 'next'?
 source: <cover.1707839454.git.code@khaugsbakk.name>

--------------------------------------------------
[Cooking]

* vn/rebase-with-cherry-pick-authorship (2024-02-08) 1 commit
  (merged to 'next' on 2024-02-09 at ed35d33595)
 + sequencer: unset GIT_CHERRY_PICK_HELP for 'exec' commands

 "git cherry-pick" invoked during "git rebase -i" session lost
 the authorship information, which has been corrected.

 Will merge to 'master'.
 source: <0adb1068-ef10-44ed-ad1d-e0927a09245d@gmail.com>


* jc/no-lazy-fetch (2024-02-13) 1 commit
 - git: --no-lazy-fetch option

 "git --no-lazy-fetch cmd" allows to run "cmd" while disabling lazy
 fetching of objects from the promisor remote, which may be handy
 for debugging.

 Will merge to 'next'.
 source: <xmqq1q9mmtpw.fsf@gitster.g>


* jc/t9210-lazy-fix (2024-02-08) 1 commit
 - t9210: do not rely on lazy fetching to fail

 Adjust use of "rev-list --missing" in an existing tests so that it
 does not depend on a buggy failure mode.

 Will merge to 'next'.
 source: <xmqq7cjemttr.fsf@gitster.g>


* gt/at-is-synonym-for-head-in-add-patch (2024-02-13) 2 commits
 - add -p tests: remove PERL prerequisites
 - add-patch: classify '@' as a synonym for 'HEAD'

 Teach "git checkout -p" and friends that "@" is a synonym for
 "HEAD".

 Will merge to 'next'?
 source: <20240211202035.7196-2-shyamthakkar001@gmail.com>


* js/unit-test-suite-runner (2024-02-03) 7 commits
 - t/Makefile: run unit tests alongside shell tests
 - unit tests: add rule for running with test-tool
 - test-tool run-command testsuite: support unit tests
 - test-tool run-command testsuite: remove hardcoded filter
 - test-tool run-command testsuite: get shell from env
 - t0080: turn t-basic unit test into a helper
 - Merge branch 'jk/unit-tests-buildfix' into js/unit-test-suite-runner

 The "test-tool" has been taught to run testsuite tests in parallel,
 bypassing the need to use the "prove" tool.

 Expecting a reroll.
 cf. <20240207225802.GA538110@coredump.intra.peff.net>
 source: <cover.1706921262.git.steadmon@google.com>


* js/check-null-from-read-object-file (2024-02-06) 1 commit
  (merged to 'next' on 2024-02-12 at 3a18369516)
 + Always check the return value of `repo_read_object_file()`

 The code paths that call repo_read_object_file() have been
 tightened to react to errors.

 Will merge to 'master'.
 source: <pull.1650.git.1707143753726.gitgitgadget@gmail.com>


* ps/reftable-backend (2024-02-07) 3 commits
  (merged to 'next' on 2024-02-08 at ba1c4c52bb)
 + refs/reftable: fix leak when copying reflog fails
  (merged to 'next' on 2024-02-07 at 1115200acb)
 + ci: add jobs to test with the reftable backend
 + refs: introduce reftable backend
 (this branch is used by kn/for-all-refs.)

 Integrate the reftable code into the refs framework as a backend.

 Will cook in 'next'.
 source: <cover.1707288261.git.ps@pks.im>


* cc/rev-list-allow-missing-tips (2024-02-08) 4 commits
 - rev-list: allow missing tips with --missing=[print|allow*]
 - t6022: fix 'test' style and 'even though' typo
 - oidset: refactor oidset_insert_from_set()
 - revision: clarify a 'return NULL' in get_reference()

 "git rev-list --missing=print" have learned to optionally take
 "--allow-missing-tips", which allows the objects at the starting
 points to be missing.

 Comments?
 cf. <xmqq7cjemttr.fsf@gitster.g>
 source: <20240208135055.2705260-1-christian.couder@gmail.com>


* ps/reftable-iteration-perf (2024-02-12) 7 commits
  (merged to 'next' on 2024-02-12 at 6abaf58383)
 + reftable/reader: add comments to `table_iter_next()`
 + reftable/record: don't try to reallocate ref record name
 + reftable/block: swap buffers instead of copying
 + reftable/pq: allocation-less comparison of entry keys
 + reftable/merged: skip comparison for records of the same subiter
 + reftable/merged: allocation-less dropping of shadowed records
 + reftable/record: introduce function to compare records by key

 The code to iterate over refs with the reftable backend has seen
 some optimization.

 Will cook in 'next'.
 source: <cover.1707726654.git.ps@pks.im>


* jc/github-actions-update (2024-02-02) 3 commits
  (merged to 'next' on 2024-02-07 at 2cd6caaf70)
 + Merge branch 'jc/maint-github-actions-update' into jc/github-actions-update
 + GitHub Actions: update to github-script@v7
 + GitHub Actions: update to checkout@v4
 (this branch is used by js/github-actions-update.)

 Squelch node.js 16 deprecation warnings from GitHub Actions CI
 by updating actions/github-script and actions/checkout that use
 node.js 20.

 Will merge to 'master'.
 source: <20240202203935.1240458-1-gitster@pobox.com>


* js/merge-tree-3-trees (2024-02-07) 6 commits
 - cache-tree: avoid an unnecessary check
 - Always check `parse_tree*()`'s return value
 - t4301: verify that merge-tree fails on missing blob objects
 - merge-ort: do check `parse_tree()`'s return value
 - merge-tree: fail with a non-zero exit code on missing tree objects
  (merged to 'next' on 2024-01-30 at 0c77b04e59)
 + merge-tree: accept 3 trees as arguments

 "git merge-tree" has learned that the three trees involved in the
 3-way merge only need to be trees, not necessarily commits.

 Expecting a reroll.
 cf. <CAPig+cSs8MFkLasTULh7tybrFm7SwaT+JeR7HnXjh+-agCHYMw@mail.gmail.com>
 cf. <CAPig+cSJz3U+vT==NhX5hcrTjsCggnAzhzQOvZcSXbcEGuYaKQ@mail.gmail.com>
 source: <pull.1647.v2.git.1706474063109.gitgitgadget@gmail.com>
 source: <pull.1651.v2.git.1707324461.gitgitgadget@gmail.com>


* pb/complete-config (2024-02-12) 4 commits
  (merged to 'next' on 2024-02-13 at d09f5e469a)
 + completion: add and use __git_compute_second_level_config_vars_for_section
 + completion: add and use __git_compute_first_level_config_vars_for_section
 + completion: complete 'submodule.*' config variables
 + completion: add space after config variable names also in Bash 3

 The command line completion script (in contrib/) learned to
 complete configuration variable names better.

 Will merge to 'master'.
 cf. <Zcs34kGTqTbIana6@tanuki>
 source: <pull.1660.v3.git.git.1707589943.gitgitgadget@gmail.com>


* rj/complete-reflog (2024-01-26) 4 commits
 - completion: reflog show <log-options>
 - completion: reflog with implicit "show"
 - completion: introduce __git_find_subcommand
 - completion: introduce __gitcomp_subcommand

 The command line completion script (in contrib/) learned to
 complete "git reflog" better.

 Needs review.
 source: <98daf977-dbad-4d3b-a293-6a769895088f@gmail.com>


* ml/log-merge-with-cherry-pick-and-other-pseudo-heads (2024-02-12) 2 commits
 - revision: implement `git log --merge` also for rebase/cherry-pick/revert
 - revision: ensure MERGE_HEAD is a ref in prepare_show_merge

 "git log --merge" learned to pay attention to CHERRY_PICK_HEAD and
 other kinds of *_HEAD pseudorefs.

 Expecting a reroll.
 cf. <790a3f11-5a8c-42f2-7a35-f2900c0299b4@gmail.com>
 cf. <8384d1dc-b6c4-b853-9bf6-3d7ccee86d12@gmail.com>
 source: <20240210-ml-log-merge-with-cherry-pick-and-other-pseudo-heads-v4-0-3bc9e62808f4@gmail.com>


* bk/complete-dirname-for-am-and-format-patch (2024-01-12) 1 commit
 - completion: dir-type optargs for am, format-patch

 Command line completion support (in contrib/) has been
 updated for a few commands to complete directory names where a
 directory name is expected.

 Expecting a reroll.
 cf. <40c3a824-a961-490b-94d4-4eb23c8f713d@gmail.com>
 source: <d37781c3-6af2-409b-95a8-660a9b92d20b@smtp-relay.sendinblue.com>


* bk/complete-send-email (2024-01-12) 1 commit
 - completion: don't complete revs when --no-format-patch

 Command line completion support (in contrib/) has been taught to
 avoid offering revision names as candidates to "git send-email" when
 the command is used to send pre-generated files.

 Needs review.
 cf. <xmqq4jej6i1b.fsf@gitster.g>
 source: <a718b5ee-afb0-44bd-a299-3208fac43506@smtp-relay.sendinblue.com>


* la/trailer-api (2024-02-06) 28 commits
 - trailer: introduce "template" term for readability
 - trailer_set_*(): put out parameter at the end
 - trailer: unify "--trailer ..." arg handling
 - trailer: deprecate "new_trailer_item" struct from API
 - trailer_add_arg_item(): drop new_trailer_item usage
 - trailer: add new helper functions to API
 - trailer: prepare to delete "parse_trailers_from_command_line_args()"
 - trailer: spread usage of "trailer_block" language
 - trailer: retire trailer_info_get() from API
 - trailer: make trailer_info struct private
 - sequencer: use the trailer iterator
 - trailer: teach iterator about non-trailer lines
 - trailer: finish formatting unification
 - format_trailer_info(): avoid double-printing the separator
 - format_trailer_info(): teach it about opts->trim_empty
 - trailer: begin formatting unification
 - format_trailer_info(): append newline for non-trailer lines
 - format_trailer_info(): drop redundant unfold_value()
 - format_trailer_info(): use trailer_item objects
 - format_trailers_from_commit(): indirectly call trailer_info_get()
 - format_trailer_info(): move "fast path" to caller
 - format_trailers(): use strbuf instead of FILE
 - trailer_info_get(): reorder parameters
 - trailer: start preparing for formatting unification
 - trailer: move interpret_trailers() to interpret-trailers.c
 - trailer: prepare to expose functions as part of API
 - shortlog: add test for de-duplicating folded trailers
 - trailer: free trailer_info _after_ all related usage

 Code clean-up.
 source: <pull.1632.v4.git.1707196348.gitgitgadget@gmail.com>


* cp/apply-core-filemode (2023-12-26) 3 commits
  (merged to 'next' on 2024-02-07 at 089a3fbb86)
 + apply: code simplification
 + apply: correctly reverse patch's pre- and post-image mode bits
 + apply: ignore working tree filemode when !core.filemode

 "git apply" on a filesystem without filemode support have learned
 to take a hint from what is in the index for the path, even when
 not working with the "--index" or "--cached" option, when checking
 the executable bit match what is required by the preimage in the
 patch.

 Will cook in 'next'.
 cf. <xmqqzfwb53a9.fsf@gitster.g>
 source: <20231226233218.472054-1-gitster@pobox.com>


* tb/path-filter-fix (2024-01-31) 16 commits
 - bloom: introduce `deinit_bloom_filters()`
 - commit-graph: reuse existing Bloom filters where possible
 - object.h: fix mis-aligned flag bits table
 - commit-graph: new Bloom filter version that fixes murmur3
 - commit-graph: unconditionally load Bloom filters
 - bloom: prepare to discard incompatible Bloom filters
 - bloom: annotate filters with hash version
 - repo-settings: introduce commitgraph.changedPathsVersion
 - t4216: test changed path filters with high bit paths
 - t/helper/test-read-graph: implement `bloom-filters` mode
 - bloom.h: make `load_bloom_filter_from_graph()` public
 - t/helper/test-read-graph.c: extract `dump_graph_info()`
 - gitformat-commit-graph: describe version 2 of BDAT
 - commit-graph: ensure Bloom filters are read with consistent settings
 - revision.c: consult Bloom filters for root commits
 - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`

 The Bloom filter used for path limited history traversal was broken
 on systems whose "char" is unsigned; update the implementation and
 bump the format version to 2.

 Waiting for a final ack?
 cf. <ZcFjkfbsBfk7JQIH@nand.local>
 source: <cover.1706741516.git.me@ttaylorr.com>


* eb/hash-transition (2023-10-02) 30 commits
 - t1016-compatObjectFormat: add tests to verify the conversion between objects
 - t1006: test oid compatibility with cat-file
 - t1006: rename sha1 to oid
 - test-lib: compute the compatibility hash so tests may use it
 - builtin/ls-tree: let the oid determine the output algorithm
 - object-file: handle compat objects in check_object_signature
 - tree-walk: init_tree_desc take an oid to get the hash algorithm
 - builtin/cat-file: let the oid determine the output algorithm
 - rev-parse: add an --output-object-format parameter
 - repository: implement extensions.compatObjectFormat
 - object-file: update object_info_extended to reencode objects
 - object-file-convert: convert commits that embed signed tags
 - object-file-convert: convert commit objects when writing
 - object-file-convert: don't leak when converting tag objects
 - object-file-convert: convert tag objects when writing
 - object-file-convert: add a function to convert trees between algorithms
 - object: factor out parse_mode out of fast-import and tree-walk into in object.h
 - cache: add a function to read an OID of a specific algorithm
 - tag: sign both hashes
 - commit: export add_header_signature to support handling signatures on tags
 - commit: convert mergetag before computing the signature of a commit
 - commit: write commits for both hashes
 - object-file: add a compat_oid_in parameter to write_object_file_flags
 - object-file: update the loose object map when writing loose objects
 - loose: compatibilty short name support
 - loose: add a mapping between SHA-1 and SHA-256 for loose objects
 - repository: add a compatibility hash algorithm
 - object-names: support input of oids in any supported hash
 - oid-array: teach oid-array to handle multiple kinds of oids
 - object-file-convert: stubs for converting from one object format to another

 Teach a repository to work with both SHA-1 and SHA-256 hash algorithms.

 Will merge to and cook in 'next'?
 cf. <xmqqv86z5359.fsf@gitster.g>
 source: <878r8l929e.fsf@gmail.froward.int.ebiederm.org>


* jc/rerere-cleanup (2023-08-25) 4 commits
 - rerere: modernize use of empty strbuf
 - rerere: try_merge() should use LL_MERGE_ERROR when it means an error
 - rerere: fix comment on handle_file() helper
 - rerere: simplify check_one_conflict() helper function

 Code clean-up.

 Not ready to be reviewed yet.
 source: <20230824205456.1231371-1-gitster@pobox.com>

--------------------------------------------------
[Discarded]

* tb/pair-chunk-expect (2023-11-10) 8 commits
 . midx: read `OOFF` chunk with `pair_chunk_expect()`
 . midx: read `OIDL` chunk with `pair_chunk_expect()`
 . commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
 . commit-graph: read `GDAT` chunk with `pair_chunk_expect()`
 . commit-graph: read `CDAT` chunk with `pair_chunk_expect()`
 . commit-graph: read `OIDL` chunk with `pair_chunk_expect()`
 . chunk-format: introduce `pair_chunk_expect()` helper
 . Merge branch 'jk/chunk-bounds-more' into HEAD

 Further code clean-up.

 Retracted for now.
 cf. <ZcFjkfbsBfk7JQIH@nand.local>
 source: <cover.1699569246.git.me@ttaylorr.com>


* ak/color-decorate-symbols (2023-10-23) 7 commits
 - log: add color.decorate.pseudoref config variable
 - refs: exempt pseudorefs from pattern prefixing
 - refs: add pseudorefs array and iteration functions
 - log: add color.decorate.ref config variable
 - log: add color.decorate.symbol config variable
 - log: use designated inits for decoration_colors
 - config: restructure color.decorate documentation

 A new config for coloring.

 Ejected, to be rebuilt on updated kn/for-all-refs topic
 cf. <xmqqcyt853vz.fsf@gitster.g>
 source: <20231023221143.72489-1-andy.koppe@gmail.com>

^ permalink raw reply

* Re: [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
From: Linus Arver @ 2024-02-14  7:20 UTC (permalink / raw)
  To: Eric W. Biederman, Junio C Hamano
  Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman
In-Reply-To: <20231002024034.2611-5-ebiederm@gmail.com>

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> From: "brian m. carlson" <sandals@crustytoothpaste.net>
>
> As part of the transition plan, we'd like to add a file in the .git
> directory that maps loose objects between SHA-1 and SHA-256.  Let's
> implement the specification in the transition plan and store this data
> on a per-repository basis in struct repository.

Could you explain a bit what the specification is, exactly? That would
save reviewers the trouble of comparing the large chunk of new code here
with the transition plan (which I assume is still
Documentation/technical/hash-function-transition.txt.

Also, are there any slight deviations from the specification for reasons
that may not be obvious to reviewers?

I would prefer if this patch is split up into smaller preparatory
patches, starting with the core essentials and then building it up
piece-by-piece to make it easier to review.

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  Makefile              |   1 +
>  loose.c               | 246 ++++++++++++++++++++++++++++++++++++++++++
>  loose.h               |  22 ++++
>  object-file-convert.c |  14 ++-
>  object-store-ll.h     |   3 +
>  object.c              |   2 +
>  repository.c          |   6 ++
>  7 files changed, 293 insertions(+), 1 deletion(-)
>  create mode 100644 loose.c
>  create mode 100644 loose.h
>
> diff --git a/Makefile b/Makefile
> index f7e824f25cda..3c18664def9a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1053,6 +1053,7 @@ LIB_OBJS += list-objects-filter.o
>  LIB_OBJS += list-objects.o
>  LIB_OBJS += lockfile.o
>  LIB_OBJS += log-tree.o
> +LIB_OBJS += loose.o

The name "loose" appears to be a bit too generic for something with such
a specialized role (_mapping_ of loose objects). Would
"loose-object-map" be a better name?

>  LIB_OBJS += ls-refs.o
>  LIB_OBJS += mailinfo.o
>  LIB_OBJS += mailmap.o
> diff --git a/loose.c b/loose.c
> new file mode 100644
> index 000000000000..6ba73cc84dca
> --- /dev/null
> +++ b/loose.c
> @@ -0,0 +1,246 @@
> +#include "git-compat-util.h"
> +#include "hash.h"
> +#include "path.h"
> +#include "object-store.h"
> +#include "hex.h"
> +#include "wrapper.h"
> +#include "gettext.h"
> +#include "loose.h"
> +#include "lockfile.h"
> +
> +static const char *loose_object_header = "# loose-object-idx\n";

IDK what the "loose-object-idx" is vs the "loose-object-map", but I
guess I need to read more of the code.

But also, I am at my limits here and am unable to review this patch as
is (too big for me to chew at once, sorry).

I'll pause my review of this series here to give Eric B some time to
respond. Thanks.

^ permalink raw reply

* Re: [PATCH v2 00/30] initial support for multiple hash functions
From: Linus Arver @ 2024-02-14  7:36 UTC (permalink / raw)
  To: Junio C Hamano, git
  Cc: Eric W. Biederman, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <owlymssbn6qa.fsf@fine.c.googlers.com>

Linus Arver <linusa@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>>
>>> This addresses all of the known test failures from v1 of this set of
>>> changes.  In particular I have reworked commit_tree_extended which
>>> was flagged by smatch, -Werror=array-bounds, and the leak detector.
>>>
>>> One functional bug was fixed in repo_for_each_abbrev where it was
>>> mistakenly displaying too many ambiguous oids.
>>>
>>> I am posting this so that people review and testing of this patchset
>>> won't be distracted by the known and fixed issues.
>>
>> We haven't seen any reviews on this second round, and have had it
>> outside 'next' for too long.  I am tempted to say that we merge it
>> to 'next' and see if anybody screams at this point.
>
> FWIW out of all the "Needs review" topics this one seemed like the most
> deserving of another pair of eyes, and I was planning to review some of
> the patches here this week + the weekend. If my review takes too long
> (taking longer than this weekend) I can give another update next week
> saying "too hard for me, please don't wait for me" to unblock you from
> merging to next.
>
> Thanks.

Unfortunately I don't think I can finish reviewing the rest of the
series (after all this time I've only been able to review just 4 out of
30 patches). I'm also stuck on trying to understand patch 5, as there is
a lot going on there.

FWIW a lot (perhaps all?) of my comments so far were around readability
and not material to the actual design or approach of anything AFAICS.
So, it's time for me to say "don't bother waiting for me" as I said
(predicted?) earlier.

Don't bother waiting for me. Thanks.

^ permalink raw reply

* Re: [PATCH] diff: mark some diff parameters as placeholders
From: Jean-Noël Avila @ 2024-02-14  7:36 UTC (permalink / raw)
  To: Jiang Xin, Git List, Junio C Hamano
  Cc: Jiang Xin, Alexander Shopov, Jordi Mas, Ralf Thielow,
	Jimmy Angelakos, Christopher Díaz, Bagas Sanjaya,
	Alessandro Menti, Gwan-gyeong Mun, Arusekk, Daniel Santos,
	Dimitriy Ryazantcev, Peter Krefting, Emir SARI, Arkadii Yakovets,
	Trần Ngọc Quân, Teng Long, Yi-Jyun Pan
In-Reply-To: <1e33662683b43e93889b4b3493a0edc2e3483920.1707888478.git.zhiyou.jx@alibaba-inc.com>

Hello,

Some parameter placeholder slipped through...

Le 14/02/2024 à 06:31, Jiang Xin a écrit :
> From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> 
> Some l10n translators translated the parameters "files", "param1" and
> "param2" in the following message:
> 
>      "synonym for --dirstat=files,param1,param2..."
> 
> Translating "param1" and "param2" is OK, but changing the parameter
> "files" is wrong. The parameters that are not meant to be used verbatim
> should be marked as placeholders, but the verbatim parameter not marked
> as a placeholder should be left as is.
> 
> This change is a complement for commit 51e846e673 (doc: enforce
> placeholders in documentation, 2023-12-25).
> 
> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> ---
>   diff.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/diff.c b/diff.c
> index ccfa1fca0d..c256ef103e 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5599,7 +5599,7 @@ struct option *add_diff_options(const struct option *opts,
>   			       PARSE_OPT_NONEG | PARSE_OPT_NOARG,
>   			       diff_opt_dirstat),
>   		OPT_CALLBACK_F(0, "dirstat-by-file", options, N_("<param1,param2>..."),

This line also needs changes:
N_("<param1>,<param2>...")

> -			       N_("synonym for --dirstat=files,param1,param2..."),
> +			       N_("synonym for --dirstat=files,<param1>,<param2>..."),
>   			       PARSE_OPT_NONEG | PARSE_OPT_OPTARG,
>   			       diff_opt_dirstat),
>   		OPT_BIT_F(0, "check", &options->output_format,

Thanks

JN

^ permalink raw reply

* [PATCH 00/12] reftable: improve ref iteration performance (pt.2)
From: Patrick Steinhardt @ 2024-02-14  7:45 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3837 bytes --]

Hi,

this is the second part of my patch series that aim to improve raw ref
iteration performance with the reftable backend. The target of this
series was to get the reftable backend performant enough to match the
files backend. This was achieved via two angles:

  - Refactorings of the merged iterators that allow us to skip copying
    and allocating records from the sub-iterators to the caller. This is
    implemented by the first 8 patches.

  - Refactorings of how we decode reftable records so that we reuse
    allocated memory. Like this the number of allocations don't scale
    with the number of iterated records anymore, but instead with the
    number of blocks which we're iterating over.

Combined, these refactorings lead to a sizeable speedup when iterating
over 1 million refs:

```
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
  Time (mean ± σ):     146.1 ms ±   4.2 ms    [User: 143.2 ms, System: 2.8 ms]
  Range (min … max):   140.7 ms … 180.2 ms    1000 runs

Benchmark 2: show-ref: single matching ref (revision = HEAD)
  Time (mean ± σ):      97.6 ms ±   3.2 ms    [User: 94.8 ms, System: 2.7 ms]
  Range (min … max):    94.6 ms … 122.1 ms    1000 runs

Summary
  show-ref: single matching ref (revision = HEAD) ran
    1.50 ± 0.07 times faster than show-ref: single matching ref (revision = HEAD~)
```

With this, the reftable backend is now on par with the files backend:

```
Benchmark 1: show-ref: single matching ref (refformat = files)
  Time (mean ± σ):      97.8 ms ±   3.4 ms    [User: 87.6 ms, System: 10.0 ms]
  Range (min … max):    95.0 ms … 121.3 ms    1000 runs

Benchmark 2: show-ref: single matching ref (refformat = reftable)
  Time (mean ± σ):      97.4 ms ±   3.2 ms    [User: 94.5 ms, System: 2.7 ms]
  Range (min … max):    94.1 ms … 126.3 ms    1000 runs

Summary
  show-ref: single matching ref (refformat = reftable) ran
    1.00 ± 0.05 times faster than show-ref: single matching ref (refformat = files)
```

There are still optimization opportunities left, but given that the
original target has been fulfilled I decided to stop so that the patch
series remains somewhat reasonably sized.

The patch series is based on `master` at 2996f11c1d (Sync with 'maint',
2024-02-12) and depends on ps/reftable-iteration-perf at c68ca7abd3
(reftable/reader: add comments to `table_iter_next()`, 2024-02-12).

Patrick

Patrick Steinhardt (12):
  reftable/pq: use `size_t` to track iterator index
  reftable/merged: make `merged_iter` structure private
  reftable/merged: advance subiter on subsequent iteration
  reftable/merged: make subiters own their records
  reftable/merged: remove unnecessary null check for subiters
  reftable/merged: handle subiter cleanup on close only
  reftable/merged: circumvent pqueue with single subiter
  reftable/merged: avoid duplicate pqueue emptiness check
  reftable/record: reuse refname when decoding
  reftable/record: reuse refname when copying
  reftable/record: decode keys in place
  reftable: allow inlining of a few functions

 reftable/block.c           |  25 +++----
 reftable/block.h           |   2 -
 reftable/iter.c            |   5 --
 reftable/iter.h            |   4 --
 reftable/merged.c          | 139 +++++++++++++++++++------------------
 reftable/merged.h          |   9 ---
 reftable/pq.c              |  18 +----
 reftable/pq.h              |  16 +++--
 reftable/pq_test.c         |  41 +++++------
 reftable/record.c          |  64 +++++++++--------
 reftable/record.h          |  21 ++++--
 reftable/record_test.c     |   3 +-
 reftable/reftable-record.h |   1 +
 13 files changed, 170 insertions(+), 178 deletions(-)


base-commit: 2996f11c1d11ab68823f0939b6469dedc2b9ab90
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 01/12] reftable/pq: use `size_t` to track iterator index
From: Patrick Steinhardt @ 2024-02-14  7:45 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 864 bytes --]

The reftable priority queue is used by the merged iterator to yield
records from its sub-iterators in the expected order. Each entry has a
record corresponding to such a sub-iterator as well as an index that
indicates which sub-iterator the record belongs to. But while the
sub-iterators are tracked with a `size_t`, we store the index as an
`int` in the entry.

Fix this and use `size_t` consistently.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/pq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reftable/pq.h b/reftable/pq.h
index e85bac9b52..9e25a43a36 100644
--- a/reftable/pq.h
+++ b/reftable/pq.h
@@ -12,7 +12,7 @@ license that can be found in the LICENSE file or at
 #include "record.h"
 
 struct pq_entry {
-	int index;
+	size_t index;
 	struct reftable_record rec;
 };
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 02/12] reftable/merged: make `merged_iter` structure private
From: Patrick Steinhardt @ 2024-02-14  7:45 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1440 bytes --]

The `merged_iter` structure is not used anywhere outside of "merged.c",
but is declared in its header. Move it into the code file so that it is
clear that its implementation details are never exposed to anything.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 9 +++++++++
 reftable/merged.h | 9 ---------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 1aa6cd31b7..12ebd732e8 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -17,6 +17,15 @@ license that can be found in the LICENSE file or at
 #include "reftable-error.h"
 #include "system.h"
 
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	size_t stack_len;
+	uint8_t typ;
+	int suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
 static int merged_iter_init(struct merged_iter *mi)
 {
 	for (size_t i = 0; i < mi->stack_len; i++) {
diff --git a/reftable/merged.h b/reftable/merged.h
index 7d9f95d27e..288ad66656 100644
--- a/reftable/merged.h
+++ b/reftable/merged.h
@@ -24,15 +24,6 @@ struct reftable_merged_table {
 	uint64_t max;
 };
 
-struct merged_iter {
-	struct reftable_iterator *stack;
-	uint32_t hash_id;
-	size_t stack_len;
-	uint8_t typ;
-	int suppress_deletions;
-	struct merged_iter_pqueue pq;
-};
-
 void merged_table_release(struct reftable_merged_table *mt);
 
 #endif
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 03/12] reftable/merged: advance subiter on subsequent iteration
From: Patrick Steinhardt @ 2024-02-14  7:45 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4308 bytes --]

When advancing the merged iterator, we pop the top-most entry from its
priority queue and then advance the sub-iterator that the entry belongs
to, adding the result as a new entry. This is quite sensible in the case
where the merged iterator is used to actual iterate through records. But
the merged iterator is also used when we look up a single record, only,
so advancing the sub-iterator is wasted effort because we would never
even look at the result.

Instead of immediately advancing the sub-iterator, we can also defer
this to the next iteration of the merged iterator by storing the
intent-to-advance. This results in a small speedup when reading many
records. The following benchmark creates 10000 refs, which will also end
up with many ref lookups:

    Benchmark 1: update-ref: create many refs (revision = HEAD~)
      Time (mean ± σ):     337.2 ms ±   7.3 ms    [User: 200.1 ms, System: 136.9 ms]
      Range (min … max):   329.3 ms … 373.2 ms    100 runs

    Benchmark 2: update-ref: create many refs (revision = HEAD)
      Time (mean ± σ):     332.5 ms ±   5.9 ms    [User: 197.2 ms, System: 135.1 ms]
      Range (min … max):   327.6 ms … 359.8 ms    100 runs

    Summary
      update-ref: create many refs (revision = HEAD) ran
        1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~)

While this speedup alone isn't really worth it, this refactoring will
also allow two additional optimizations in subsequent patches. First, it
will allow us to special-case when there is only a single sub-iter left
to circumvent the priority queue altogether. And second, it makes it
easier to avoid copying records to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 12ebd732e8..9b1ccfff00 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -19,11 +19,12 @@ license that can be found in the LICENSE file or at
 
 struct merged_iter {
 	struct reftable_iterator *stack;
+	struct merged_iter_pqueue pq;
 	uint32_t hash_id;
 	size_t stack_len;
 	uint8_t typ;
 	int suppress_deletions;
-	struct merged_iter_pqueue pq;
+	ssize_t advance_index;
 };
 
 static int merged_iter_init(struct merged_iter *mi)
@@ -96,13 +97,17 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 	struct pq_entry entry = { 0 };
 	int err = 0;
 
+	if (mi->advance_index >= 0) {
+		err = merged_iter_advance_subiter(mi, mi->advance_index);
+		if (err < 0)
+			return err;
+		mi->advance_index = -1;
+	}
+
 	if (merged_iter_pqueue_is_empty(mi->pq))
 		return 1;
 
 	entry = merged_iter_pqueue_remove(&mi->pq);
-	err = merged_iter_advance_subiter(mi, entry.index);
-	if (err < 0)
-		return err;
 
 	/*
 	  One can also use reftable as datacenter-local storage, where the ref
@@ -116,14 +121,6 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
 		int cmp;
 
-		/*
-		 * When the next entry comes from the same queue as the current
-		 * entry then it must by definition be larger. This avoids a
-		 * comparison in the most common case.
-		 */
-		if (top.index == entry.index)
-			break;
-
 		cmp = reftable_record_cmp(&top.rec, &entry.rec);
 		if (cmp > 0)
 			break;
@@ -137,6 +134,7 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 
 	reftable_record_release(rec);
 	*rec = entry.rec;
+	mi->advance_index = entry.index;
 
 done:
 	if (err)
@@ -160,9 +158,8 @@ static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
 static int merged_iter_next_void(void *p, struct reftable_record *rec)
 {
 	struct merged_iter *mi = p;
-	if (merged_iter_pqueue_is_empty(mi->pq))
+	if (merged_iter_pqueue_is_empty(mi->pq) && mi->advance_index < 0)
 		return 1;
-
 	return merged_iter_next(mi, rec);
 }
 
@@ -255,6 +252,7 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 		.typ = reftable_record_type(rec),
 		.hash_id = mt->hash_id,
 		.suppress_deletions = mt->suppress_deletions,
+		.advance_index = -1,
 	};
 	struct merged_iter *p;
 	int err;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 04/12] reftable/merged: make subiters own their records
From: Patrick Steinhardt @ 2024-02-14  7:45 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 9250 bytes --]

For each subiterator, the merged table needs to track their current
record. This record is owned by the priority queue though instead of by
the merged iterator. This is not optimal performance-wise.

For one, we need to move around records whenever we add or remove a
record from the priority queue. Thus, the bigger the entries the more
bytes we need to copy around. And compared to pointers, a reftable
record is rather on the bigger side. The other issue is that this makes
it harder to reuse the records.

Refactor the code so that the merged iterator tracks ownership of the
records per-subiter. Instead of having records in the priority queue, we
can now use mere pointers to the per-subiter records. This also allows
us to swap records between the caller and the per-subiter record instead
of doing an actual copy via `reftable_record_copy_from()`, which removes
the need to release the caller-provided record.

This results in a noticeable speedup when iterating through many refs.
The following benchmark iterates through 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     145.5 ms ±   4.5 ms    [User: 142.5 ms, System: 2.8 ms]
    Range (min … max):   141.3 ms … 177.0 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     139.0 ms ±   4.7 ms    [User: 136.1 ms, System: 2.8 ms]
    Range (min … max):   134.2 ms … 182.2 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

This refactoring also allows a subsequent refactoring where we start
reusing memory allocated by the reftable records because we do not need
to release the caller-provided record anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c  | 54 ++++++++++++++++++++++++----------------------
 reftable/pq.c      |  8 ++-----
 reftable/pq.h      |  2 +-
 reftable/pq_test.c | 41 ++++++++++++++++-------------------
 4 files changed, 49 insertions(+), 56 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 9b1ccfff00..ae74234472 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -17,8 +17,13 @@ license that can be found in the LICENSE file or at
 #include "reftable-error.h"
 #include "system.h"
 
+struct merged_subiter {
+	struct reftable_iterator iter;
+	struct reftable_record rec;
+};
+
 struct merged_iter {
-	struct reftable_iterator *stack;
+	struct merged_subiter *subiters;
 	struct merged_iter_pqueue pq;
 	uint32_t hash_id;
 	size_t stack_len;
@@ -32,16 +37,18 @@ static int merged_iter_init(struct merged_iter *mi)
 	for (size_t i = 0; i < mi->stack_len; i++) {
 		struct pq_entry e = {
 			.index = i,
+			.rec = &mi->subiters[i].rec,
 		};
 		int err;
 
-		reftable_record_init(&e.rec, mi->typ);
-		err = iterator_next(&mi->stack[i], &e.rec);
+		reftable_record_init(&mi->subiters[i].rec, mi->typ);
+		err = iterator_next(&mi->subiters[i].iter,
+				    &mi->subiters[i].rec);
 		if (err < 0)
 			return err;
 		if (err > 0) {
-			reftable_iterator_destroy(&mi->stack[i]);
-			reftable_record_release(&e.rec);
+			reftable_iterator_destroy(&mi->subiters[i].iter);
+			reftable_record_release(&mi->subiters[i].rec);
 			continue;
 		}
 
@@ -56,9 +63,11 @@ static void merged_iter_close(void *p)
 	struct merged_iter *mi = p;
 
 	merged_iter_pqueue_release(&mi->pq);
-	for (size_t i = 0; i < mi->stack_len; i++)
-		reftable_iterator_destroy(&mi->stack[i]);
-	reftable_free(mi->stack);
+	for (size_t i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->subiters[i].iter);
+		reftable_record_release(&mi->subiters[i].rec);
+	}
+	reftable_free(mi->subiters);
 }
 
 static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
@@ -66,17 +75,16 @@ static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
 {
 	struct pq_entry e = {
 		.index = idx,
+		.rec = &mi->subiters[idx].rec,
 	};
 	int err;
 
-	reftable_record_init(&e.rec, mi->typ);
-	err = iterator_next(&mi->stack[idx], &e.rec);
+	err = iterator_next(&mi->subiters[idx].iter, &mi->subiters[idx].rec);
 	if (err < 0)
 		return err;
-
 	if (err > 0) {
-		reftable_iterator_destroy(&mi->stack[idx]);
-		reftable_record_release(&e.rec);
+		reftable_iterator_destroy(&mi->subiters[idx].iter);
+		reftable_record_release(&mi->subiters[idx].rec);
 		return 0;
 	}
 
@@ -86,7 +94,7 @@ static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
 
 static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
 {
-	if (iterator_is_null(&mi->stack[idx]))
+	if (iterator_is_null(&mi->subiters[idx].iter))
 		return 0;
 	return merged_iter_advance_nonnull_subiter(mi, idx);
 }
@@ -121,25 +129,19 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
 		int cmp;
 
-		cmp = reftable_record_cmp(&top.rec, &entry.rec);
+		cmp = reftable_record_cmp(top.rec, entry.rec);
 		if (cmp > 0)
 			break;
 
 		merged_iter_pqueue_remove(&mi->pq);
 		err = merged_iter_advance_subiter(mi, top.index);
 		if (err < 0)
-			goto done;
-		reftable_record_release(&top.rec);
+			return err;
 	}
 
-	reftable_record_release(rec);
-	*rec = entry.rec;
 	mi->advance_index = entry.index;
-
-done:
-	if (err)
-		reftable_record_release(&entry.rec);
-	return err;
+	SWAP(*rec, *entry.rec);
+	return 0;
 }
 
 static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
@@ -257,10 +259,10 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 	struct merged_iter *p;
 	int err;
 
-	REFTABLE_CALLOC_ARRAY(merged.stack, mt->stack_len);
+	REFTABLE_CALLOC_ARRAY(merged.subiters, mt->stack_len);
 	for (size_t i = 0; i < mt->stack_len; i++) {
 		err = reftable_table_seek_record(&mt->stack[i],
-						 &merged.stack[merged.stack_len], rec);
+						 &merged.subiters[merged.stack_len].iter, rec);
 		if (err < 0)
 			goto out;
 		if (!err)
diff --git a/reftable/pq.c b/reftable/pq.c
index e0ccce2b97..0074d6bc43 100644
--- a/reftable/pq.c
+++ b/reftable/pq.c
@@ -14,7 +14,7 @@ license that can be found in the LICENSE file or at
 
 int pq_less(struct pq_entry *a, struct pq_entry *b)
 {
-	int cmp = reftable_record_cmp(&a->rec, &b->rec);
+	int cmp = reftable_record_cmp(a->rec, b->rec);
 	if (cmp == 0)
 		return a->index > b->index;
 	return cmp < 0;
@@ -82,10 +82,6 @@ void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, const struct pq_entry
 
 void merged_iter_pqueue_release(struct merged_iter_pqueue *pq)
 {
-	int i = 0;
-	for (i = 0; i < pq->len; i++) {
-		reftable_record_release(&pq->heap[i].rec);
-	}
 	FREE_AND_NULL(pq->heap);
-	pq->len = pq->cap = 0;
+	memset(pq, 0, sizeof(*pq));
 }
diff --git a/reftable/pq.h b/reftable/pq.h
index 9e25a43a36..ce23972c16 100644
--- a/reftable/pq.h
+++ b/reftable/pq.h
@@ -13,7 +13,7 @@ license that can be found in the LICENSE file or at
 
 struct pq_entry {
 	size_t index;
-	struct reftable_record rec;
+	struct reftable_record *rec;
 };
 
 struct merged_iter_pqueue {
diff --git a/reftable/pq_test.c b/reftable/pq_test.c
index c202eff848..b7d3c80cc7 100644
--- a/reftable/pq_test.c
+++ b/reftable/pq_test.c
@@ -27,48 +27,43 @@ void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
 
 static void test_pq(void)
 {
-	char *names[54] = { NULL };
-	int N = ARRAY_SIZE(names) - 1;
-
 	struct merged_iter_pqueue pq = { NULL };
+	struct reftable_record recs[54];
+	int N = ARRAY_SIZE(recs) - 1, i;
 	char *last = NULL;
 
-	int i = 0;
 	for (i = 0; i < N; i++) {
-		char name[100];
-		snprintf(name, sizeof(name), "%02d", i);
-		names[i] = xstrdup(name);
+		struct strbuf refname = STRBUF_INIT;
+		strbuf_addf(&refname, "%02d", i);
+
+		reftable_record_init(&recs[i], BLOCK_TYPE_REF);
+		recs[i].u.ref.refname = strbuf_detach(&refname, NULL);
 	}
 
 	i = 1;
 	do {
-		struct pq_entry e = { .rec = { .type = BLOCK_TYPE_REF,
-					       .u.ref = {
-						       .refname = names[i],
-					       } } };
+		struct pq_entry e = {
+			.rec = &recs[i],
+		};
+
 		merged_iter_pqueue_add(&pq, &e);
 		merged_iter_pqueue_check(pq);
+
 		i = (i * 7) % N;
 	} while (i != 1);
 
 	while (!merged_iter_pqueue_is_empty(pq)) {
 		struct pq_entry e = merged_iter_pqueue_remove(&pq);
-		struct reftable_record *rec = &e.rec;
 		merged_iter_pqueue_check(pq);
 
-		EXPECT(reftable_record_type(rec) == BLOCK_TYPE_REF);
-		if (last) {
-			EXPECT(strcmp(last, rec->u.ref.refname) < 0);
-		}
-		/* this is names[i], so don't dealloc. */
-		last = rec->u.ref.refname;
-		rec->u.ref.refname = NULL;
-		reftable_record_release(rec);
-	}
-	for (i = 0; i < N; i++) {
-		reftable_free(names[i]);
+		EXPECT(reftable_record_type(e.rec) == BLOCK_TYPE_REF);
+		if (last)
+			EXPECT(strcmp(last, e.rec->u.ref.refname) < 0);
+		last = e.rec->u.ref.refname;
 	}
 
+	for (i = 0; i < N; i++)
+		reftable_record_release(&recs[i]);
 	merged_iter_pqueue_release(&pq);
 }
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 05/12] reftable/merged: remove unnecessary null check for subiters
From: Patrick Steinhardt @ 2024-02-14  7:45 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3071 bytes --]

Whenever we advance a subiter we first call `iterator_is_null()`. This
is not needed though because we only ever advance subiters which have
entries in the priority queue, and we do not end entries to the priority
queue when the subiter has been exhausted.

Drop the check as well as the now-unused function. This results in a
surprisingly big speedup:

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     138.1 ms ±   4.4 ms    [User: 135.1 ms, System: 2.8 ms]
      Range (min … max):   133.4 ms … 167.3 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     134.4 ms ±   4.2 ms    [User: 131.5 ms, System: 2.8 ms]
      Range (min … max):   130.0 ms … 164.0 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.03 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/iter.c   |  5 -----
 reftable/iter.h   |  4 ----
 reftable/merged.c | 10 +---------
 3 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/reftable/iter.c b/reftable/iter.c
index 8b5ebf6183..7aa30c4a51 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -16,11 +16,6 @@ license that can be found in the LICENSE file or at
 #include "reader.h"
 #include "reftable-error.h"
 
-int iterator_is_null(struct reftable_iterator *it)
-{
-	return !it->ops;
-}
-
 static void filtering_ref_iterator_close(void *iter_arg)
 {
 	struct filtering_ref_iterator *fri = iter_arg;
diff --git a/reftable/iter.h b/reftable/iter.h
index 47d67d84df..537431baba 100644
--- a/reftable/iter.h
+++ b/reftable/iter.h
@@ -16,10 +16,6 @@ license that can be found in the LICENSE file or at
 #include "reftable-iterator.h"
 #include "reftable-generic.h"
 
-/* Returns true for a zeroed out iterator, such as the one returned from
- * iterator_destroy. */
-int iterator_is_null(struct reftable_iterator *it);
-
 /* iterator that produces only ref records that point to `oid` */
 struct filtering_ref_iterator {
 	int double_check;
diff --git a/reftable/merged.c b/reftable/merged.c
index ae74234472..29ad09f3d8 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -70,8 +70,7 @@ static void merged_iter_close(void *p)
 	reftable_free(mi->subiters);
 }
 
-static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
-					       size_t idx)
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
 {
 	struct pq_entry e = {
 		.index = idx,
@@ -92,13 +91,6 @@ static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
 	return 0;
 }
 
-static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
-{
-	if (iterator_is_null(&mi->subiters[idx].iter))
-		return 0;
-	return merged_iter_advance_nonnull_subiter(mi, idx);
-}
-
 static int merged_iter_next_entry(struct merged_iter *mi,
 				  struct reftable_record *rec)
 {
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 06/12] reftable/merged: handle subiter cleanup on close only
From: Patrick Steinhardt @ 2024-02-14  7:46 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1472 bytes --]

When advancing one of the subiters fails we immediately release
resources associated with that subiter. This is not necessary though as
we will release these resources when closing the merged iterator anyway.

Drop the logic and only release resources when the merged iterator is
done. This is a mere cleanup that should help reduce the cognitive load
when reading through the code.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 29ad09f3d8..d9ed4a19dd 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -46,11 +46,8 @@ static int merged_iter_init(struct merged_iter *mi)
 				    &mi->subiters[i].rec);
 		if (err < 0)
 			return err;
-		if (err > 0) {
-			reftable_iterator_destroy(&mi->subiters[i].iter);
-			reftable_record_release(&mi->subiters[i].rec);
+		if (err > 0)
 			continue;
-		}
 
 		merged_iter_pqueue_add(&mi->pq, &e);
 	}
@@ -79,13 +76,8 @@ static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
 	int err;
 
 	err = iterator_next(&mi->subiters[idx].iter, &mi->subiters[idx].rec);
-	if (err < 0)
+	if (err)
 		return err;
-	if (err > 0) {
-		reftable_iterator_destroy(&mi->subiters[idx].iter);
-		reftable_record_release(&mi->subiters[idx].rec);
-		return 0;
-	}
 
 	merged_iter_pqueue_add(&mi->pq, &e);
 	return 0;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 07/12] reftable/merged: circumvent pqueue with single subiter
From: Patrick Steinhardt @ 2024-02-14  7:46 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3220 bytes --]

The merged iterator uses a priority queue to order records so that we
can yielid them in the expected order. This priority queue of course
comes with some overhead as we need to add, compare and remove entries
in that priority queue.

In the general case, that overhead cannot really be avoided. But when we
have a single subiter left then there is no need to use the priority
queue anymore because the order is exactly the same as what that subiter
would return.

While having a single subiter may sound like an edge case, it happens
more frequently than one might think. In the most common scenario, you
can expect a repository to have a single large table that contains most
of the records and then a set of smaller tables which contain later
additions to the reftable stack. In this case it is quite likely that we
exhaust subiters of those smaller stacks before exhausting the large
table.

Special-case this and return records directly from the remaining
subiter. This results in a sizeable speedup when iterating over 1m refs
in a repository with a single table:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     135.4 ms ±   4.4 ms    [User: 132.5 ms, System: 2.8 ms]
    Range (min … max):   131.0 ms … 166.3 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     126.3 ms ±   3.9 ms    [User: 123.3 ms, System: 2.8 ms]
    Range (min … max):   122.7 ms … 157.0 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index d9ed4a19dd..29161a32cf 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -87,16 +87,36 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 				  struct reftable_record *rec)
 {
 	struct pq_entry entry = { 0 };
-	int err = 0;
+	int err = 0, empty;
+
+	empty = merged_iter_pqueue_is_empty(mi->pq);
 
 	if (mi->advance_index >= 0) {
+		/*
+		 * When there are no pqueue entries then we only have a single
+		 * subiter left. There is no need to use the pqueue in that
+		 * case anymore as we know that the subiter will return entries
+		 * in the correct order already.
+		 *
+		 * While this may sound like a very specific edge case, it may
+		 * happen more frequently than you think. Most repositories
+		 * will end up having a single large base table that contains
+		 * most of the refs. It's thus likely that we exhaust all
+		 * subiters but the one from that base ref.
+		 */
+		if (empty)
+			return iterator_next(&mi->subiters[mi->advance_index].iter,
+					     rec);
+
 		err = merged_iter_advance_subiter(mi, mi->advance_index);
 		if (err < 0)
 			return err;
+		if (!err)
+			empty = 0;
 		mi->advance_index = -1;
 	}
 
-	if (merged_iter_pqueue_is_empty(mi->pq))
+	if (empty)
 		return 1;
 
 	entry = merged_iter_pqueue_remove(&mi->pq);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 08/12] reftable/merged: avoid duplicate pqueue emptiness check
From: Patrick Steinhardt @ 2024-02-14  7:46 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2590 bytes --]

When calling `merged_iter_next_void()` we first check whether the iter
has been exhausted already. We already perform this check two levels
down the stack in `merged_iter_next_entry()` though, which makes this
check redundant.

Now if this check was there to accellerate the common case it might have
made sense to keep it. But the iterator being exhausted is rather the
uncommon case because you can expect most reftable stacks to contain
more than two refs.

Simplify the code by removing the check. As `merged_iter_next_void()` is
basically empty except for calling `merged_iter_next()` now, merge these
two functions. This also results in a tiny speedup when iterating over
many refs:

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     125.6 ms ±   3.8 ms    [User: 122.7 ms, System: 2.8 ms]
      Range (min … max):   122.4 ms … 153.4 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     124.0 ms ±   3.9 ms    [User: 121.1 ms, System: 2.8 ms]
      Range (min … max):   120.1 ms … 156.4 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 29161a32cf..f85a24c678 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -148,27 +148,19 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 	return 0;
 }
 
-static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
+static int merged_iter_next_void(void *p, struct reftable_record *rec)
 {
+	struct merged_iter *mi = p;
 	while (1) {
 		int err = merged_iter_next_entry(mi, rec);
-		if (err == 0 && mi->suppress_deletions &&
-		    reftable_record_is_deletion(rec)) {
+		if (err)
+			return err;
+		if (mi->suppress_deletions && reftable_record_is_deletion(rec))
 			continue;
-		}
-
-		return err;
+		return 0;
 	}
 }
 
-static int merged_iter_next_void(void *p, struct reftable_record *rec)
-{
-	struct merged_iter *mi = p;
-	if (merged_iter_pqueue_is_empty(mi->pq) && mi->advance_index < 0)
-		return 1;
-	return merged_iter_next(mi, rec);
-}
-
 static struct reftable_iterator_vtable merged_iter_vtable = {
 	.next = &merged_iter_next_void,
 	.close = &merged_iter_close,
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 09/12] reftable/record: reuse refname when decoding
From: Patrick Steinhardt @ 2024-02-14  7:46 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3425 bytes --]

When decoding a reftable record we will first release the user-provided
record and then decode the new record into it. This is quite inefficient
as we basically need to reallocate at least the refname every time.

Refactor the function to start tracking the refname capacity. Like this,
we can stow away the refname, release, restore and then grow the refname
to the required number of bytes via `REFTABLE_ALLOC_GROW()`.

This refactoring is safe to do because all functions that assigning to
the refname will first call `release_reftable_record()`, which will zero
out the complete record after releasing memory.

This change results in a nice speedup when iterating over 1 million
refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)

    Time (mean ± σ):     124.0 ms ±   3.9 ms    [User: 121.1 ms, System: 2.7 ms]
    Range (min … max):   120.4 ms … 152.7 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     114.4 ms ±   3.7 ms    [User: 111.5 ms, System: 2.7 ms]
    Range (min … max):   111.0 ms … 152.1 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.08 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Furthermore, with this change we now perform a mostly constant number of
allocations when iterating. Before this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 1,006,620 allocs, 1,006,495 frees, 25,398,363 bytes allocated

After this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 6,623 allocs, 6,498 frees, 509,592 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c          | 16 ++++++++++++----
 reftable/reftable-record.h |  1 +
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/reftable/record.c b/reftable/record.c
index d6bb42e887..e800cfef00 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -368,16 +368,24 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
 	struct reftable_ref_record *r = rec;
 	struct string_view start = in;
 	uint64_t update_index = 0;
-	int n = get_var_int(&update_index, &in);
+	const char *refname = NULL;
+	size_t refname_cap = 0;
+	int n;
+
+	assert(hash_size > 0);
+
+	n = get_var_int(&update_index, &in);
 	if (n < 0)
 		return n;
 	string_view_consume(&in, n);
 
+	SWAP(refname, r->refname);
+	SWAP(refname_cap, r->refname_cap);
 	reftable_ref_record_release(r);
+	SWAP(refname, r->refname);
+	SWAP(refname_cap, r->refname_cap);
 
-	assert(hash_size > 0);
-
-	r->refname = reftable_malloc(key.len + 1);
+	REFTABLE_ALLOC_GROW(r->refname, key.len + 1, r->refname_cap);
 	memcpy(r->refname, key.buf, key.len);
 	r->refname[key.len] = 0;
 
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index bb6e99acd3..e657001d42 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -22,6 +22,7 @@ license that can be found in the LICENSE file or at
 /* reftable_ref_record holds a ref database entry target_value */
 struct reftable_ref_record {
 	char *refname; /* Name of the ref, malloced. */
+	size_t refname_cap;
 	uint64_t update_index; /* Logical timestamp at which this value is
 				* written */
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 10/12] reftable/record: reuse refname when copying
From: Patrick Steinhardt @ 2024-02-14  7:46 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

Do the same optimization as in the preceding commit, but this time for
`reftable_record_copy()`. While not as noticeable, it still results in a
small speedup when iterating over 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     114.0 ms ±   3.8 ms    [User: 111.1 ms, System: 2.7 ms]
    Range (min … max):   110.9 ms … 144.3 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     112.5 ms ±   3.7 ms    [User: 109.5 ms, System: 2.8 ms]
    Range (min … max):   109.2 ms … 140.7 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.01 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/reftable/record.c b/reftable/record.c
index e800cfef00..3f2a639036 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -205,14 +205,26 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
 {
 	struct reftable_ref_record *ref = rec;
 	const struct reftable_ref_record *src = src_rec;
+	char *refname = NULL;
+	size_t refname_cap = 0;
+
 	assert(hash_size > 0);
 
-	/* This is simple and correct, but we could probably reuse the hash
-	 * fields. */
+	SWAP(refname, ref->refname);
+	SWAP(refname_cap, ref->refname_cap);
 	reftable_ref_record_release(ref);
+	SWAP(refname, ref->refname);
+	SWAP(refname_cap, ref->refname_cap);
+
 	if (src->refname) {
-		ref->refname = xstrdup(src->refname);
+		size_t refname_len = strlen(src->refname);
+
+		REFTABLE_ALLOC_GROW(ref->refname, refname_len + 1,
+				    ref->refname_cap);
+		memcpy(ref->refname, src->refname, refname_len);
+		ref->refname[refname_len] = 0;
 	}
+
 	ref->update_index = src->update_index;
 	ref->value_type = src->value_type;
 	switch (src->value_type) {
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 11/12] reftable/record: decode keys in place
From: Patrick Steinhardt @ 2024-02-14  7:46 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1707895758.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 7644 bytes --]

When reading a record from a block, we need to decode the record's key.
As reftable keys are prefix-compressed, meaning they reuse a prefix from
the preceding record's key, this is a bit more involved than just having
to copy the relevant bytes: we need to figure out the prefix and suffix
lengths, copy the prefix from the preceding record and finally copy the
suffix from the current record.

This is done by passing three buffers to `reftable_decode_key()`: one
buffer that holds the result, one buffer that holds the last key, and
one buffer that points to the current record. The final key is then
assembled by calling `strbuf_add()` twice to copy over the prefix and
suffix.

Performing two memory copies is inefficient though. And we can indeed do
better by decoding keys in place. Instead of providing two buffers, the
caller may only call a single buffer that is already pre-populated with
the last key. Like this, we only have to call `strbuf_setlen()` to trim
the record to its prefix and then `strbuf_add()` to add the suffix.

This refactoring leads to a noticeable performance bump when iterating
over 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     112.2 ms ±   3.9 ms    [User: 109.3 ms, System: 2.8 ms]
    Range (min … max):   109.2 ms … 149.6 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     106.0 ms ±   3.5 ms    [User: 103.2 ms, System: 2.7 ms]
    Range (min … max):   103.2 ms … 133.7 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.06 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c       | 25 +++++++++++--------------
 reftable/block.h       |  2 --
 reftable/record.c      | 19 +++++++++----------
 reftable/record.h      |  9 ++++++---
 reftable/record_test.c |  3 ++-
 5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 72eb73b380..ad9074dba6 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -291,9 +291,8 @@ static int restart_key_less(size_t idx, void *args)
 	/* the restart key is verbatim in the block, so this could avoid the
 	   alloc for decoding the key */
 	struct strbuf rkey = STRBUF_INIT;
-	struct strbuf last_key = STRBUF_INIT;
 	uint8_t unused_extra;
-	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
+	int n = reftable_decode_key(&rkey, &unused_extra, in);
 	int result;
 	if (n < 0) {
 		a->error = 1;
@@ -326,35 +325,34 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 	if (it->next_off >= it->br->block_len)
 		return 1;
 
-	n = reftable_decode_key(&it->key, &extra, it->last_key, in);
+	n = reftable_decode_key(&it->last_key, &extra, in);
 	if (n < 0)
 		return -1;
-
-	if (!it->key.len)
+	if (!it->last_key.len)
 		return REFTABLE_FORMAT_ERROR;
 
 	string_view_consume(&in, n);
-	n = reftable_record_decode(rec, it->key, extra, in, it->br->hash_size);
+	n = reftable_record_decode(rec, it->last_key, extra, in, it->br->hash_size);
 	if (n < 0)
 		return -1;
 	string_view_consume(&in, n);
 
-	strbuf_swap(&it->last_key, &it->key);
 	it->next_off += start.len - in.len;
 	return 0;
 }
 
 int block_reader_first_key(struct block_reader *br, struct strbuf *key)
 {
-	struct strbuf empty = STRBUF_INIT;
-	int off = br->header_off + 4;
+	int off = br->header_off + 4, n;
 	struct string_view in = {
 		.buf = br->block.data + off,
 		.len = br->block_len - off,
 	};
-
 	uint8_t extra = 0;
-	int n = reftable_decode_key(key, &extra, empty, in);
+
+	strbuf_reset(key);
+
+	n = reftable_decode_key(key, &extra, in);
 	if (n < 0)
 		return n;
 	if (!key->len)
@@ -371,7 +369,6 @@ int block_iter_seek(struct block_iter *it, struct strbuf *want)
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
-	strbuf_release(&it->key);
 }
 
 int block_reader_seek(struct block_reader *br, struct block_iter *it,
@@ -408,8 +405,8 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 		if (err < 0)
 			goto done;
 
-		reftable_record_key(&rec, &it->key);
-		if (err > 0 || strbuf_cmp(&it->key, want) >= 0) {
+		reftable_record_key(&rec, &it->last_key);
+		if (err > 0 || strbuf_cmp(&it->last_key, want) >= 0) {
 			err = 0;
 			goto done;
 		}
diff --git a/reftable/block.h b/reftable/block.h
index 17481e6331..51699af233 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -84,12 +84,10 @@ struct block_iter {
 
 	/* key for last entry we read. */
 	struct strbuf last_key;
-	struct strbuf key;
 };
 
 #define BLOCK_ITER_INIT { \
 	.last_key = STRBUF_INIT, \
-	.key = STRBUF_INIT, \
 }
 
 /* initializes a block reader. */
diff --git a/reftable/record.c b/reftable/record.c
index 3f2a639036..37682cc7d0 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -159,20 +159,19 @@ int reftable_encode_key(int *restart, struct string_view dest,
 	return start.len - dest.len;
 }
 
-int reftable_decode_key(struct strbuf *key, uint8_t *extra,
-			struct strbuf last_key, struct string_view in)
+int reftable_decode_key(struct strbuf *last_key, uint8_t *extra,
+			struct string_view in)
 {
 	int start_len = in.len;
 	uint64_t prefix_len = 0;
 	uint64_t suffix_len = 0;
-	int n = get_var_int(&prefix_len, &in);
+	int n;
+
+	n = get_var_int(&prefix_len, &in);
 	if (n < 0)
 		return -1;
 	string_view_consume(&in, n);
 
-	if (prefix_len > last_key.len)
-		return -1;
-
 	n = get_var_int(&suffix_len, &in);
 	if (n <= 0)
 		return -1;
@@ -181,12 +180,12 @@ int reftable_decode_key(struct strbuf *key, uint8_t *extra,
 	*extra = (uint8_t)(suffix_len & 0x7);
 	suffix_len >>= 3;
 
-	if (in.len < suffix_len)
+	if (in.len < suffix_len ||
+	    prefix_len > last_key->len)
 		return -1;
 
-	strbuf_reset(key);
-	strbuf_add(key, last_key.buf, prefix_len);
-	strbuf_add(key, in.buf, suffix_len);
+	strbuf_setlen(last_key, prefix_len);
+	strbuf_add(last_key, in.buf, suffix_len);
 	string_view_consume(&in, suffix_len);
 
 	return start_len - in.len;
diff --git a/reftable/record.h b/reftable/record.h
index a05e2be179..91c9c6ebfd 100644
--- a/reftable/record.h
+++ b/reftable/record.h
@@ -81,9 +81,12 @@ int reftable_encode_key(int *is_restart, struct string_view dest,
 			struct strbuf prev_key, struct strbuf key,
 			uint8_t extra);
 
-/* Decode into `key` and `extra` from `in` */
-int reftable_decode_key(struct strbuf *key, uint8_t *extra,
-			struct strbuf last_key, struct string_view in);
+/*
+ * Decode into `last_key` and `extra` from `in`. `last_key` is expected to
+ * contain the decoded key of the preceding record, if any.
+ */
+int reftable_decode_key(struct strbuf *last_key, uint8_t *extra,
+			struct string_view in);
 
 /* reftable_index_record are used internally to speed up lookups. */
 struct reftable_index_record {
diff --git a/reftable/record_test.c b/reftable/record_test.c
index a86cff5526..89209894d8 100644
--- a/reftable/record_test.c
+++ b/reftable/record_test.c
@@ -295,7 +295,8 @@ static void test_key_roundtrip(void)
 	EXPECT(!restart);
 	EXPECT(n > 0);
 
-	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
+	strbuf_addstr(&roundtrip, "refs/heads/master");
+	m = reftable_decode_key(&roundtrip, &rt_extra, dest);
 	EXPECT(n == m);
 	EXPECT(0 == strbuf_cmp(&key, &roundtrip));
 	EXPECT(rt_extra == extra);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox