Git development

Git development
 help / color / mirror / Atom feed

* Re: Is --minimal ever not the right thing?
From: Mike Castle @ 2023-12-19 17:25 UTC (permalink / raw)
  To: Tao Klerks; +Cc: git
In-Reply-To: <CAPMMpohbQK+3o46iiY+0o=vS+UC_HBB=CxsNT_hAb5dDz+514Q@mail.gmail.com>

I believe that the diff algorithms available are the same one's in GNU
diff.  From https://www.gnu.org/software/diffutils/manual/html_node/diff-Performance.html:
"""
The way that GNU diff determines which lines have changed always comes
up with a near-minimal set of differences. Usually it is good enough
for practical purposes. If the diff output is large, you might want
diff to use a modified algorithm that sometimes produces a smaller set
of differences. The --minimal (-d) option does this; however, it can
also cause diff to run more slowly than usual, so it is not the
default behavior.
"""

Since it has been that way decades before git even existed, I suspect
(but do not know) that, yes, analysis has been performed, and it makes
sense to keep the current default.

Then again, in the decades sense, the entire stack from hardware to
compilers has improved, and maybe it does deserve a revisit.  You
could check whatever email archives is used for diffutils and see if
there has been any discussion on it recently (say, last 5 years?).

As you pointed out, you can set it yourself and see what happens over time.

Cheers,
mrc

^ permalink raw reply

* Re: What's cooking in git.git (Dec 2023, #03; Mon, 18)
From: Junio C Hamano @ 2023-12-19 17:30 UTC (permalink / raw)
  To: René Scharfe; +Cc: git
In-Reply-To: <3e6c7e70-33c2-4607-b9a3-8d70d2a83ff5@web.de>

René Scharfe <l.s.r@web.de> writes:

> Am 19.12.23 um 02:06 schrieb Junio C Hamano:
>> * rs/t6300-compressed-size-fix (2023-12-13) 2 commits
>>  - test-lib-functions: add object size functions
>>  - t6300: avoid hard-coding object sizes
>>
>>  Test fix.
>>
>>  Will merge to 'next'?
>>  source: <9feeb6cf-aabf-4002-917f-3f6c27547bc8@web.de>
>>  source: <ff735aac-b60b-4d52-a6dc-180ab504fc8d@web.de>
>
> The first patch is good to go.  The seconds one isn't; please drop it.

Alright.  Thanks.  Will do.

^ permalink raw reply

* Re: git diff-files bug
From: Elijah Newren @ 2023-12-19 17:47 UTC (permalink / raw)
  To: Josh Reed; +Cc: git
In-Reply-To: <CAELOy+5AsRyLEs-WdYw1spqkmMDKjKSQzbogAoRBFe-zGLjvXg@mail.gmail.com>

Hi,

On Tue, Dec 19, 2023 at 8:56 AM Josh Reed <jriddy@gmail.com> wrote:
>
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)

Let's extend your example a bit to point out when/if the index is updated...

>
> > bash
> ⬢[jreed@toolbox example-project]$ git diff-files --exit-code --patch; echo $?
> 0

$ stat --format=%y .git/index
2023-12-19 09:30:55.867055501 -0800

This is the modification time of the index at the very beginning.

> ⬢[jreed@toolbox example-project]$ chmod g+w README.md

$ stat --format=%y .git/index
2023-12-19 09:30:55.867055501 -0800

As expected, the index has not changed just because some file changed.

> ⬢[jreed@toolbox example-project]$ git diff-files --exit-code --patch; echo $?
> 1

$ stat --format=%y .git/index
2023-12-19 09:30:55.867055501 -0800

Note that the index still has not changed; diff-files will not update
it when it notices differences.  But diff-files shows us differences
between the working tree and index, and the working tree has changed.
Those changes mean that the stat information in the index no longer
matches the working tree.  That alone is enough for diff-files to give
a non-zero exit status.  This does tend to trip up folks, though.

> ⬢[jreed@toolbox example-project]$ git diff --exit-code --patch; echo $?
> 0

$ stat --format=%y .git/index
2023-12-19 09:33:46.773896615 -0800

Now the index _was_ updated.  So, not only are there no content
differences, and no relevant filemode differences, but the stat
information in the index now matches what is in the working tree.  So,
diff returns an exit code of 0.  (Note that this means "git diff" is
not a read-only operation, which sometimes trips up people in the
other direction.)

> ⬢[jreed@toolbox example-project]$ git diff-files --exit-code --patch; echo $?
> 0

Right, contents match, relelvant filemodes match, and the stat
information in the index now matches what is in the working tree.
And:

$ stat --format=%y .git/index
2023-12-19 09:33:46.773896615 -0800

...no further changes to the index from this command, of course.

> What did you expect to happen? (Expected behavior)
>
> Git diff-files should likely ignore group permissions changes, or at least
> its output should be stable across the same worktree/index state.

It does ignore group permission changes; the problem was that stat
information in the index did not match what was in the working tree.
If you run

$ git update-index --refresh

before calling diff-files, that'll make sure the stat information is up-to-date.

> What happened instead? (Actual behavior)
>
> The command `git diff-files --exit-code --patch` fails with no exit code when
> a file mode has changed in a way that the rest of git commands ignored.
> Furthermore, subsequent git commands seem to change the behavior of `git
> diff-files` in this regard, so it's hard to tell what is the expected behavior.
>
> What's different between what you expected and what actually happened?
>
> The `git diff-files --exit-code` command is inconsistent in how it behaves.
> I suspect it should ignore irrelavant mode changes like `g+w`, but even if
> it should report them, they should produce a patch or at least have stable
> results when we re-run the command.

This has bit a few folks before; I wonder if there's a reasonable
update we can make to the documentation to address this.

^ permalink raw reply

* Re: Is --minimal ever not the right thing?
From: Elijah Newren @ 2023-12-19 17:55 UTC (permalink / raw)
  To: Mike Castle; +Cc: Tao Klerks, git
In-Reply-To: <CA+t9iMyrLAekwQHNky4w9nWD6WwxidxwfSmbqCpSRnkJgoQ0LA@mail.gmail.com>

To add to what Mike said...

On Tue, Dec 19, 2023 at 9:25 AM Mike Castle <dalgoda@gmail.com> wrote:
>
> I believe that the diff algorithms available are the same one's in GNU
> diff.  From https://www.gnu.org/software/diffutils/manual/html_node/diff-Performance.html:
> """
> The way that GNU diff determines which lines have changed always comes
> up with a near-minimal set of differences. Usually it is good enough
> for practical purposes. If the diff output is large, you might want
> diff to use a modified algorithm that sometimes produces a smaller set
> of differences. The --minimal (-d) option does this; however, it can
> also cause diff to run more slowly than usual, so it is not the
> default behavior.
> """
>
> Since it has been that way decades before git even existed, I suspect
> (but do not know) that, yes, analysis has been performed, and it makes
> sense to keep the current default.
>
> Then again, in the decades sense, the entire stack from hardware to
> compilers has improved, and maybe it does deserve a revisit.  You
> could check whatever email archives is used for diffutils and see if
> there has been any discussion on it recently (say, last 5 years?).
>
> As you pointed out, you can set it yourself and see what happens over time.

There have been various discussions of diff performance, quality of
results, what the default should be, etc.  Including within the last
year.

minimal is guaranteed to produce a minimal diff, i.e. fewest total
subtractions and additions.  That is sometimes "best" quality, but
definitely not always.  On the performance axis, in special cases
minimal can be nearly as fast as myers and the other diff algorithms,
but only in special cases.

I think patience or histogram would make better defaults, at least
with some tweaks.  I had some patches to improve some worst case
performance and quality results coming from histogram that I was
working on in early 2023, but those got put on the backburner when
$DAYJOB pulled support for my Git work.  And I'm not aware of anyone
else currently working in the area.

Hope that helps,
Elijah

^ permalink raw reply

* Re: Is --minimal ever not the right thing?
From: Konstantin Tokarev @ 2023-12-19 18:09 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Mike Castle, Tao Klerks, git
In-Reply-To: <CABPp-BEmgOAj17DozyXNaf-9CawDic4uTpMbckef3+zHf7URqQ@mail.gmail.com>

On Tue, 19 Dec 2023 09:55:34 -0800
Elijah Newren <newren@gmail.com> wrote:

> minimal is guaranteed to produce a minimal diff, i.e. fewest total
> subtractions and additions.  That is sometimes "best" quality, but
> definitely not always. 

I second this. Recently I had a case when I had to use --anchored
option of git diff to produce more informative diff instead of minimal
one.

^ permalink raw reply

* [PATCH v2] Teach git apply to respect core.fileMode settings
From: Chandra Pratap via GitGitGadget @ 2023-12-19 18:30 UTC (permalink / raw)
  To: git; +Cc: Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1620.git.1702908568890.gitgitgadget@gmail.com>

From: Chandra Pratap <chandrapratap3519@gmail.com>

When applying a patch that adds an executable file, git apply
ignores the core.fileMode setting (core.fileMode in git config
specifies whether the executable bit on files in the working tree
should be honored or not) resulting in warnings like:

warning: script.sh has type 100644, expected 100755

even when core.fileMode is set to false, which is undesired. This
is extra true for systems like Windows, which don't rely on "lsat()".

Fix this by inferring the correct file mode from the existing
index entry when core.filemode is set to false. The added
test case helps verify the change and prevents future regression.

Reviewed-by: Johannes Schindelin <johannes.schindelin@gmail.com>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
---
    apply: make git apply respect core.fileMode settings
    
    Closes issue #1555 on GitHub.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1620%2FChand-ra%2Fdevel-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1620/Chand-ra/devel-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1620

Range-diff vs v1:

 1:  d896351158c ! 1:  29c8c6d120e Teach git apply to respect core.fileMode settings
     @@ Metadata
       ## Commit message ##
          Teach git apply to respect core.fileMode settings
      
     -    CC: Johannes Schindelin <johannes.schindelin@gmail.com>
     +    When applying a patch that adds an executable file, git apply
     +    ignores the core.fileMode setting (core.fileMode in git config
     +    specifies whether the executable bit on files in the working tree
     +    should be honored or not) resulting in warnings like:
     +
     +    warning: script.sh has type 100644, expected 100755
     +
     +    even when core.fileMode is set to false, which is undesired. This
     +    is extra true for systems like Windows, which don't rely on "lsat()".
     +
     +    Fix this by inferring the correct file mode from the existing
     +    index entry when core.filemode is set to false. The added
     +    test case helps verify the change and prevents future regression.
     +
     +    Reviewed-by: Johannes Schindelin <johannes.schindelin@gmail.com>
          Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
      
       ## apply.c ##
     @@ apply.c: static int check_preimage(struct apply_state *state,
      -	if (!state->cached && !previous)
      -		st_mode = ce_mode_from_stat(*ce, st->st_mode);
      +	if (!state->cached && !previous) {
     -+		if (!trust_executable_bit && patch->old_mode)
     -+			st_mode = patch->old_mode;
     -+		else st_mode = ce_mode_from_stat(*ce, st->st_mode);
     ++		if (!trust_executable_bit)
     ++			st_mode = *ce ? (*ce)->ce_mode : patch->old_mode;
     ++		else
     ++			st_mode = ce_mode_from_stat(*ce, st->st_mode);
      +	}
       
       	if (patch->is_new < 0)
     @@ t/t4129-apply-samemode.sh: test_expect_success POSIXPERM 'do not use core.shared
       	)
       '
       
     -+test_expect_success FILEMODE 'ensure git apply respects core.fileMode' '
     ++test_expect_success 'ensure git apply respects core.fileMode' '
      +	test_config core.fileMode false &&
      +	echo true >script.sh &&
      +	git add --chmod=+x script.sh &&
     ++	git ls-files -s script.sh | grep "^100755" &&
      +	test_tick && git commit -m "Add script" &&
     ++	git ls-tree -r HEAD script.sh | grep "^100755" &&
      +
      +	echo true >>script.sh &&
      +	test_tick && git commit -m "Modify script" script.sh &&
      +	git format-patch -1 --stdout >patch &&
     ++	grep "index.*100755" patch &&
      +
      +	git switch -c branch HEAD^ &&
     ++	git apply --index patch 2>err &&
     ++	! grep "has type 100644, expected 100755" err &&
     ++	git restore -S script.sh && git restore script.sh &&
     ++
      +	git apply patch 2>err &&
     -+	! test_grep "has type 100644, expected 100755" err
     ++	! grep "has type 100644, expected 100755" err &&
     ++
     ++	git apply --cached patch 2>err &&
     ++	! grep "has type 100644, expected 100755" err
      +'
      +
       test_done


 apply.c                   |  8 ++++++--
 t/t4129-apply-samemode.sh | 25 +++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/apply.c b/apply.c
index 3d69fec836d..58f26c40413 100644
--- a/apply.c
+++ b/apply.c
@@ -3778,8 +3778,12 @@ static int check_preimage(struct apply_state *state,
 		return error_errno("%s", old_name);
 	}
 
-	if (!state->cached && !previous)
-		st_mode = ce_mode_from_stat(*ce, st->st_mode);
+	if (!state->cached && !previous) {
+		if (!trust_executable_bit)
+			st_mode = *ce ? (*ce)->ce_mode : patch->old_mode;
+		else
+			st_mode = ce_mode_from_stat(*ce, st->st_mode);
+	}
 
 	if (patch->is_new < 0)
 		patch->is_new = 0;
diff --git a/t/t4129-apply-samemode.sh b/t/t4129-apply-samemode.sh
index e7a7295f1b6..73fc472b246 100755
--- a/t/t4129-apply-samemode.sh
+++ b/t/t4129-apply-samemode.sh
@@ -101,4 +101,29 @@ test_expect_success POSIXPERM 'do not use core.sharedRepository for working tree
 	)
 '
 
+test_expect_success 'ensure git apply respects core.fileMode' '
+	test_config core.fileMode false &&
+	echo true >script.sh &&
+	git add --chmod=+x script.sh &&
+	git ls-files -s script.sh | grep "^100755" &&
+	test_tick && git commit -m "Add script" &&
+	git ls-tree -r HEAD script.sh | grep "^100755" &&
+
+	echo true >>script.sh &&
+	test_tick && git commit -m "Modify script" script.sh &&
+	git format-patch -1 --stdout >patch &&
+	grep "index.*100755" patch &&
+
+	git switch -c branch HEAD^ &&
+	git apply --index patch 2>err &&
+	! grep "has type 100644, expected 100755" err &&
+	git restore -S script.sh && git restore script.sh &&
+
+	git apply patch 2>err &&
+	! grep "has type 100644, expected 100755" err &&
+
+	git apply --cached patch 2>err &&
+	! grep "has type 100644, expected 100755" err
+'
+
 test_done

base-commit: 1a87c842ece327d03d08096395969aca5e0a6996
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH] remote.h: retire CAS_OPT_NAME
From: Junio C Hamano @ 2023-12-19 19:26 UTC (permalink / raw)
  To: git

When the "--force-with-lease" option was introduced in 28f5d176
(remote.c: add command line option parser for "--force-with-lease",
2013-07-08), the design discussion revolved around the concept of
"compare-and-swap", and it can still be seen in the name used for
variables and helper functions.  The end-user facing option name
ended up to be a bit different, so during the development iteration
of the feature, we used this C preprocessor macro to make it easier
to rename it later.

All of that happened more than 10 years ago, and the flexibility
afforded by the CAS_OPT_NAME macro outlived its usefulness.  Inline
the constant string for the option name, like all other option names
in the code.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/push.c      | 4 ++--
 builtin/send-pack.c | 2 +-
 remote.h            | 2 --
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/push.c b/builtin/push.c
index dbdf609daf..e740dd93e3 100644
--- a/builtin/push.c
+++ b/builtin/push.c
@@ -393,7 +393,7 @@ static int push_with_options(struct transport *transport, struct refspec *rs,
 	if (!is_empty_cas(&cas)) {
 		if (!transport->smart_options)
 			die("underlying transport does not support --%s option",
-			    CAS_OPT_NAME);
+			    "force-with-lease");
 		transport->smart_options->cas = &cas;
 	}
 
@@ -604,7 +604,7 @@ int cmd_push(int argc, const char **argv, const char *prefix)
 		OPT_BIT('n' , "dry-run", &flags, N_("dry run"), TRANSPORT_PUSH_DRY_RUN),
 		OPT_BIT( 0,  "porcelain", &flags, N_("machine-readable output"), TRANSPORT_PUSH_PORCELAIN),
 		OPT_BIT('f', "force", &flags, N_("force updates"), TRANSPORT_PUSH_FORCE),
-		OPT_CALLBACK_F(0, CAS_OPT_NAME, &cas, N_("<refname>:<expect>"),
+		OPT_CALLBACK_F(0, "force-with-lease", &cas, N_("<refname>:<expect>"),
 			       N_("require old value of ref to be at this value"),
 			       PARSE_OPT_OPTARG | PARSE_OPT_LITERAL_ARGHELP, parseopt_push_cas_option),
 		OPT_BIT(0, TRANS_OPT_FORCE_IF_INCLUDES, &flags,
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 4784143004..87e5269e9f 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -207,7 +207,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "stateless-rpc", &stateless_rpc, N_("use stateless RPC protocol")),
 		OPT_BOOL(0, "stdin", &from_stdin, N_("read refs from stdin")),
 		OPT_BOOL(0, "helper-status", &helper_status, N_("print status from remote helper")),
-		OPT_CALLBACK_F(0, CAS_OPT_NAME, &cas, N_("<refname>:<expect>"),
+		OPT_CALLBACK_F(0, "force-with-lease", &cas, N_("<refname>:<expect>"),
 		  N_("require old value of ref to be at this value"),
 		  PARSE_OPT_OPTARG, parseopt_push_cas_option),
 		OPT_BOOL(0, TRANS_OPT_FORCE_IF_INCLUDES, &force_if_includes,
diff --git a/remote.h b/remote.h
index 73638cefeb..9ba7f7d3e2 100644
--- a/remote.h
+++ b/remote.h
@@ -398,8 +398,6 @@ struct ref *get_stale_heads(struct refspec *rs, struct ref *fetch_map);
 /*
  * Compare-and-swap
  */
-#define CAS_OPT_NAME "force-with-lease"
-
 struct push_cas_option {
 	unsigned use_tracking_for_rest:1;
 	unsigned use_force_if_includes:1;
-- 
2.43.0-121-g624eb90fa8


^ permalink raw reply related

* Re: [PATCH v2] Teach git apply to respect core.fileMode settings
From: Torsten Bögershausen @ 2023-12-19 20:46 UTC (permalink / raw)
  To: Chandra Pratap via GitGitGadget; +Cc: git, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1620.v2.git.1703010646036.gitgitgadget@gmail.com>

On Tue, Dec 19, 2023 at 06:30:45PM +0000, Chandra Pratap via GitGitGadget wrote:
> From: Chandra Pratap <chandrapratap3519@gmail.com>
>

Thanks for working on this, some small nit-picks inline.

> When applying a patch that adds an executable file, git apply
> ignores the core.fileMode setting (core.fileMode in git config
> specifies whether the executable bit on files in the working tree
> should be honored or not) resulting in warnings like:
>
> warning: script.sh has type 100644, expected 100755
>
> even when core.fileMode is set to false, which is undesired. This
> is extra true for systems like Windows, which don't rely on "lsat()".

Small typo here: lsat() should be lstat(). But being nit-picking (and simplifying):
Windows does not provide an implementation, so Git for Windows does it's own,
which currently does not implement the x-bit(s).
In short: The ', which don't rely on "lsat()' could probably just removed.

>
> Fix this by inferring the correct file mode from the existing
> index entry when core.filemode is set to false. The added
> test case helps verify the change and prevents future regression.

Another nit-pick, sorry for that, I try to convince everybody to not
use "added".
So may be
Add a test case that verifies the change and prevents future regression.

>
> Reviewed-by: Johannes Schindelin <johannes.schindelin@gmail.com>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
>     apply: make git apply respect core.fileMode settings
>
>     Closes issue #1555 on GitHub.
>

[]
>
>
>  apply.c                   |  8 ++++++--
>  t/t4129-apply-samemode.sh | 25 +++++++++++++++++++++++++
>  2 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/apply.c b/apply.c
> index 3d69fec836d..58f26c40413 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -3778,8 +3778,12 @@ static int check_preimage(struct apply_state *state,
>  		return error_errno("%s", old_name);
>  	}
>
> -	if (!state->cached && !previous)
> -		st_mode = ce_mode_from_stat(*ce, st->st_mode);

> +	if (!state->cached && !previous) {
> +		if (!trust_executable_bit)
> +			st_mode = *ce ? (*ce)->ce_mode : patch->old_mode;
> +		else
> +			st_mode = ce_mode_from_stat(*ce, st->st_mode);
> +	}
>
>  	if (patch->is_new < 0)
>  		patch->is_new = 0;
> diff --git a/t/t4129-apply-samemode.sh b/t/t4129-apply-samemode.sh
> index e7a7295f1b6..73fc472b246 100755
> --- a/t/t4129-apply-samemode.sh
> +++ b/t/t4129-apply-samemode.sh
> @@ -101,4 +101,29 @@ test_expect_success POSIXPERM 'do not use core.sharedRepository for working tree
>  	)
>  '
>
> +test_expect_success 'ensure git apply respects core.fileMode' '

Small nit-pick:
The "ensure" is probably not needed, all tests ensure something.

> +	test_config core.fileMode false &&
> +	echo true >script.sh &&
> +	git add --chmod=+x script.sh &&
> +	git ls-files -s script.sh | grep "^100755" &&
> +	test_tick && git commit -m "Add script" &&
> +	git ls-tree -r HEAD script.sh | grep "^100755" &&
> +
> +	echo true >>script.sh &&
> +	test_tick && git commit -m "Modify script" script.sh &&
> +	git format-patch -1 --stdout >patch &&
> +	grep "index.*100755" patch &&
> +
> +	git switch -c branch HEAD^ &&
> +	git apply --index patch 2>err &&
> +	! grep "has type 100644, expected 100755" err &&

This feels somewhat volatile against future changes.
grep-ing for things that are not there, without verifying
that they are there, without this patch.

On my test system, there is no message at all, so a

test_must_be_empty err &&

feels more "stable".

> +	git restore -S script.sh && git restore script.sh &&
> +
> +	git apply patch 2>err &&
> +	! grep "has type 100644, expected 100755" err &&
> +
> +	git apply --cached patch 2>err &&
> +	! grep "has type 100644, expected 100755" err
> +'
> +
>  test_done

^ permalink raw reply

* [PATCH v3 0/2] completion: refactor and support reftables backend
From: Stan Hu @ 2023-12-19 22:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu
In-Reply-To: <20231130202404.89791-1-stanhu@gmail.com>

This patch series addresses the review feedback:

1. Renames the '__git_ref_exists' helper to '__git_pseudoref_exists' in
   the first refactor patch.

Stan Hu (2):
  completion: refactor existence checks for pseudorefs
  completion: support pseudoref existence checks for reftables

 contrib/completion/git-completion.bash | 43 +++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 5 deletions(-)

-- 
2.42.0


^ permalink raw reply

* [PATCH v3 1/2] completion: refactor existence checks for pseudorefs
From: Stan Hu @ 2023-12-19 22:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu
In-Reply-To: <cover.1703022850.git.stanhu@gmail.com>

In preparation for the reftable backend, this commit introduces a
'__git_pseudoref_exists' function that continues to use 'test -f' to
determine whether a given pseudoref exists in the local filesystem.

Signed-off-by: Stan Hu <stanhu@gmail.com>
---
 contrib/completion/git-completion.bash | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 13a39ebd2e..8edd002eed 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -122,6 +122,15 @@ __git ()
 		${__git_dir:+--git-dir="$__git_dir"} "$@" 2>/dev/null
 }
 
+# Runs git in $__git_repo_path to determine whether a pseudoref exists.
+# 1: The pseudo-ref to search
+__git_pseudoref_exists ()
+{
+	local ref=$1
+
+	[ -f "$__git_repo_path/$ref" ]
+}
+
 # Removes backslash escaping, single quotes and double quotes from a word,
 # stores the result in the variable $dequoted_word.
 # 1: The word to dequote.
@@ -1625,7 +1634,7 @@ __git_cherry_pick_inprogress_options=$__git_sequencer_inprogress_options
 _git_cherry_pick ()
 {
 	__git_find_repo_path
-	if [ -f "$__git_repo_path"/CHERRY_PICK_HEAD ]; then
+	if __git_pseudoref_exists CHERRY_PICK_HEAD; then
 		__gitcomp "$__git_cherry_pick_inprogress_options"
 		return
 	fi
@@ -2067,7 +2076,7 @@ _git_log ()
 	__git_find_repo_path
 
 	local merge=""
-	if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
+	if __git_pseudoref_exists MERGE_HEAD; then
 		merge="--merge"
 	fi
 	case "$prev,$cur" in
@@ -2934,6 +2943,7 @@ _git_reset ()
 
 _git_restore ()
 {
+	__git_find_repo_path
 	case "$prev" in
 	-s)
 		__git_complete_refs
@@ -2952,7 +2962,7 @@ _git_restore ()
 		__gitcomp_builtin restore
 		;;
 	*)
-		if __git rev-parse --verify --quiet HEAD >/dev/null; then
+		if __git_pseudoref_exists HEAD; then
 			__git_complete_index_file "--modified"
 		fi
 	esac
@@ -2963,7 +2973,7 @@ __git_revert_inprogress_options=$__git_sequencer_inprogress_options
 _git_revert ()
 {
 	__git_find_repo_path
-	if [ -f "$__git_repo_path"/REVERT_HEAD ]; then
+	if __git_pseudoref_exists REVERT_HEAD; then
 		__gitcomp "$__git_revert_inprogress_options"
 		return
 	fi
@@ -3592,7 +3602,7 @@ __gitk_main ()
 	__git_find_repo_path
 
 	local merge=""
-	if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
+	if __git_pseudoref_exists MERGE_HEAD; then
 		merge="--merge"
 	fi
 	case "$cur" in
-- 
2.42.0


^ permalink raw reply related

* [PATCH v3 2/2] completion: support pseudoref existence checks for reftables
From: Stan Hu @ 2023-12-19 22:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu
In-Reply-To: <cover.1703022850.git.stanhu@gmail.com>

In contrib/completion/git-completion.bash, there are a bunch of
instances where we read pseudorefs, such as HEAD, MERGE_HEAD,
REVERT_HEAD, and others via the filesystem. However, the upcoming
reftable refs backend won't use '.git/HEAD' at all but instead will
write an invalid refname as placeholder for backwards compatibility,
which will break the git-completion script.

Update the '__git_pseudoref_exists' function to:

1. Recognize the placeholder '.git/HEAD' written by the reftable
   backend (its content is specified in the reftable specs).
2. If reftable is in use, use 'git rev-parse' to determine whether the
    given ref exists.
3. Otherwise, continue to use 'test -f' to check for the ref's filename.

Signed-off-by: Stan Hu <stanhu@gmail.com>
---
 contrib/completion/git-completion.bash | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 8edd002eed..e21a39b406 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -122,12 +122,35 @@ __git ()
 		${__git_dir:+--git-dir="$__git_dir"} "$@" 2>/dev/null
 }
 
+# Helper function to read the first line of a file into a variable.
+# __git_eread requires 2 arguments, the file path and the name of the
+# variable, in that order.
+#
+# This is taken from git-prompt.sh.
+__git_eread ()
+{
+	test -r "$1" && IFS=$'\r\n' read -r "$2" <"$1"
+}
+
 # Runs git in $__git_repo_path to determine whether a pseudoref exists.
 # 1: The pseudo-ref to search
 __git_pseudoref_exists ()
 {
 	local ref=$1
 
+	# If the reftable is in use, we have to shell out to 'git rev-parse'
+	# to determine whether the ref exists instead of looking directly in
+	# the filesystem to determine whether the ref exists. Otherwise, use
+	# Bash builtins since executing Git commands are expensive on some
+	# platforms.
+	if __git_eread "$__git_repo_path/HEAD" head; then
+		b="${head#ref: }"
+		if [ "$b" == "refs/heads/.invalid" ]; then
+			__git -C "$__git_repo_path" rev-parse --verify --quiet "$ref" 2>/dev/null
+			return $?
+		fi
+	fi
+
 	[ -f "$__git_repo_path/$ref" ]
 }
 
-- 
2.42.0


^ permalink raw reply related

* Re: [PATCH v2] Teach git apply to respect core.fileMode settings
From: Junio C Hamano @ 2023-12-19 22:21 UTC (permalink / raw)
  To: Chandra Pratap via GitGitGadget; +Cc: git, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1620.v2.git.1703010646036.gitgitgadget@gmail.com>

"Chandra Pratap via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> When applying a patch that adds an executable file, git apply
> ignores the core.fileMode setting (core.fileMode in git config
> specifies whether the executable bit on files in the working tree
> should be honored or not) resulting in warnings like:
>
> warning: script.sh has type 100644, expected 100755
>
> even when core.fileMode is set to false, which is undesired. This
> is extra true for systems like Windows, which don't rely on "lsat()".
>
> Fix this by inferring the correct file mode from the existing
> index entry when core.filemode is set to false. The added
> test case helps verify the change and prevents future regression.

Thanks.  Has _this_ particular iteration of the patch been reviewed
by Dscho?  Otherwise ...

> Reviewed-by: Johannes Schindelin <johannes.schindelin@gmail.com>

... this line is a bit problematic.  Just double-checking.

> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---

> diff --git a/t/t4129-apply-samemode.sh b/t/t4129-apply-samemode.sh
> index e7a7295f1b6..73fc472b246 100755
> --- a/t/t4129-apply-samemode.sh
> +++ b/t/t4129-apply-samemode.sh
> @@ -101,4 +101,29 @@ test_expect_success POSIXPERM 'do not use core.sharedRepository for working tree
>  	)
>  '
>  
> +test_expect_success 'ensure git apply respects core.fileMode' '
> +	test_config core.fileMode false &&
> +	echo true >script.sh &&
> +	git add --chmod=+x script.sh &&
> +	git ls-files -s script.sh | grep "^100755" &&
> +	test_tick && git commit -m "Add script" &&
> +	git ls-tree -r HEAD script.sh | grep "^100755" &&

I am wondering if we want to be more strict about hiding error
return code from "git ls-files" and "git ls-tree" behind pipes
like these.  Usually we encourage using a temporary file, e.g.,

	...
	git ls-files -s script.sh >ls-files-output &&
	test_grep "^100755" ls-files-output &&
	...

> +	echo true >>script.sh &&
> +	test_tick && git commit -m "Modify script" script.sh &&
> +	git format-patch -1 --stdout >patch &&
> +	grep "index.*100755" patch &&

Anchor the pattern when you know where it has to appear and what the
surrounding letters must be, e.g.,

	test_grep "^index .* 100755$" patch &&

A test that expects a match with "grep" is silent when it fails, but
if you use test_grep, the output from "sh t4129-*.sh -i -v" gets
easier to view when the test fails, as it is more verbose and say
"we wanted to see X in the output but your output does not have it".

> +	git switch -c branch HEAD^ &&
> +	git apply --index patch 2>err &&
> +	! grep "has type 100644, expected 100755" err &&

If you wanted to use test_grep here, the way to negate it is a bit
peculiar, i.e.

	test_grep ! "has type ..." err &&

It does not have as much value as the positive case, as "! grep"
that expects to fail would show the unexpected match.  In any case,
making sure this particular error message does not appear in the
output is a good way to test it, instead of insisting that the
output is empty, since we may add output to the standard error
stream for unrelated reasons to issue warnings, etc.

> +	git restore -S script.sh && git restore script.sh &&

Why not "git reset --hard" here?  Just being curious why we want to
waste two invocations of "git restore".

> +	git apply patch 2>err &&
> +	! grep "has type 100644, expected 100755" err &&
> +
> +	git apply --cached patch 2>err &&
> +	! grep "has type 100644, expected 100755" err
> +'
> +
>  test_done
>
> base-commit: 1a87c842ece327d03d08096395969aca5e0a6996

^ permalink raw reply

* Re: [PATCH v3 0/2] completion: refactor and support reftables backend
From: Junio C Hamano @ 2023-12-20  0:48 UTC (permalink / raw)
  To: Stan Hu; +Cc: git, Patrick Steinhardt, Christian Couder
In-Reply-To: <cover.1703022850.git.stanhu@gmail.com>

Stan Hu <stanhu@gmail.com> writes:

> This patch series addresses the review feedback:
>
> 1. Renames the '__git_ref_exists' helper to '__git_pseudoref_exists' in
>    the first refactor patch.
>
> Stan Hu (2):
>   completion: refactor existence checks for pseudorefs
>   completion: support pseudoref existence checks for reftables
>
>  contrib/completion/git-completion.bash | 43 +++++++++++++++++++++++---
>  1 file changed, 38 insertions(+), 5 deletions(-)

Thanks.  Will queue.

^ permalink raw reply

* Re: Re: [PATCH 0/1] revert/cherry-pick: add --show-current-patch option
From: Michael Lohmann @ 2023-12-20  6:51 UTC (permalink / raw)
  To: phillip.wood123; +Cc: git, mi.al.lohmann, mial.lohmann, newren, phillip.wood
In-Reply-To: <42ff6b11-f991-4a6d-ad68-ca8c5a3cd735@gmail.com>

Hi Phillip

On 18/12/2023 16:42, Phillip Wood wrote:
> Thanks for bringing this up I agree it can be very helpful to look at
> the original commit when resolving cherry-pick and revert conflicts.
> I'm in two minds about this change though - I wonder if it'd be better
> to improve the documentation for CHERRY_PICK_HEAD and REVERT_HEAD and
> tell users to run "git show CHERRY_PICK_HEAD" instead. I think the
> main reason we have a "--show-current-patch" option for "rebase" is
> that there are two different implementations of that command and the
> patched-based one of them does not support REBASE_HEAD. That reasoning
> does not apply to "cherry-pick" and "revert" and
> "--show-current-patch" suggests a patch-based implementation which is
> also not the case for these commands.

I appreciate the urge of limiting the interface to the minimum needed
and not to duplicate functionality that already exists. On the other
hand, this would
a) grant the user the same experience, not having to wonder about
implementation details such as different backends for rebase, but not
for revert/cherry-pick and
b) (I know it is more indicative of me, but:) when I am looking for a
feature in software and I look into the respective man page I tend to
focus first on the synopsis instead of reading the whole page (or
sometimes I even just rely on the shell autocompletion for
discoverability).

So yes, mentioning REVERT_HEAD and CHERRY_PICK_HEAD in the respective
docs would technically be sufficient, but I don't think it is as
discoverable to an average user (who does not know about the details of
all the existing pseudo refs) as a toplevel action would be. But an
assessment of the pros and cons is not on me to decide.

I have to be honest: I have troubles distinguishing a "patch" and a
"diff", the latter of which `git show <commit>` shows according to the
documentation ("For commits it shows the log message and textual
diff."), though my understanding was that a patch is a diff + context
lines, which is what `git show` actually shows... I think this is
probably why I don't feel so strong about the potential loose usage of
the word here.

Also the documentation of cherry-pick already uses the word "patch" in a
(according to my understanding from a technical perspective) sloppy (but
from a layman's point of view probably nevertheless helpful) way:

> The following sequence attempts to backport a patch, bails out because
> the code the patch applies to has changed too much, and then tries
> again, this time exercising more care about matching up context lines.
> 
> ------------
> $ git cherry-pick topic^             <1>
> $ git diff                           <2>
> $ git cherry-pick --abort            <3>
> $ git cherry-pick -Xpatience topic^  <4>
> ------------
> <1> apply the change that would be shown by `git show topic^`.
>     In this example, the patch does not apply cleanly, so
>     information about the conflict is written to the index and
>     working tree and no new commit results.

Should that also be rephrased?

Out of curiosity: The following from the rebase docs seems to imply that
the apply backend will probably be removed in the future:
> --apply
>           Use applying strategies to rebase (calling git-am
>           internally). This option may become a no-op in the future
>           once the merge backend handles everything the apply one
>           does.

But I would expect the `rebase --show-current-patch` still to be
working. Would that only be a legacy compatibility flag and instead also
for rebases the recommended option would be to run
`git show REBASE_HEAD`?

Best Wishes

Michael

^ permalink raw reply

* [PATCH] Documentation/git-merge.txt: fix reference to synopsys
From: Michael Lohmann @ 2023-12-20  7:05 UTC (permalink / raw)
  To: git; +Cc: Michael Lohmann

437591a9d738 changed the synopsys from two separate lines for `--abort`
and `--continue` to a single line (and it also simultaneously added
`--quit`). That way the "enumeration" of the syntax for `--continue` is
no longer valid. Since `--quit` is now also part of the same syntax
line, a general statement cannot be made any more. Instead of trying to
enumerate the synopsys, be explicit in the limitations of when
respective actions are valid.

This change also groups `--abort` and `--continue` together when
explaining the circumstances under which they can be run in order to
avoid duplication.

Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com>
---
 Documentation/git-merge.txt | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt
index e8ab340319..d8863cc943 100644
--- a/Documentation/git-merge.txt
+++ b/Documentation/git-merge.txt
@@ -46,21 +46,20 @@ a log message from the user describing the changes. Before the operation,
     D---E---F---G---H master
 ------------
 
-The second syntax ("`git merge --abort`") can only be run after the
-merge has resulted in conflicts. 'git merge --abort' will abort the
-merge process and try to reconstruct the pre-merge state. However,
-if there were uncommitted changes when the merge started (and
-especially if those changes were further modified after the merge
-was started), 'git merge --abort' will in some cases be unable to
-reconstruct the original (pre-merge) changes. Therefore:
+It is possible that a merge failure will prevent this process from being
+completely automatic. "`git merge --continue`" and "`git merge --abort`"
+can only be run after the merge has resulted in conflicts.
+
+'git merge --abort' will abort the merge process and try to reconstruct
+the pre-merge state. However, if there were uncommitted changes when the
+merge started (and especially if those changes were further modified
+after the merge was started), 'git merge --abort' will in some cases be
+unable to reconstruct the original (pre-merge) changes. Therefore:
 
 *Warning*: Running 'git merge' with non-trivial uncommitted changes is
 discouraged: while possible, it may leave you in a state that is hard to
 back out of in the case of a conflict.
 
-The third syntax ("`git merge --continue`") can only be run after the
-merge has resulted in conflicts.
-
 OPTIONS
 -------
 :git-merge: 1
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related

* Re: [PATCH] rebase: use strvec_pushf() for format-patch revisions
From: René Scharfe @ 2023-12-20  8:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List
In-Reply-To: <xmqqmsu6ce0u.fsf@gitster.g>

Am 19.12.23 um 18:12 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> In run_am(), a strbuf is used to create a revision argument that is then
>> added to the argument list for git format-patch using strvec_push().
>> Use strvec_pushf() to add it directly instead, simplifying the code.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>
> Makes sense.  Between the location of the original strbuf_addf()
> call and the new strvec_pushf() call, nobody mucks with *opts so
> this change won't affect the correctness.  We no longer use the
> extra strbuf, and upon failing to open the rebased-patches file,
> we no longer leak the contents of it.  Good.

Ha!  I didn't even notice that leak on error.  Perhaps the two release
calls at the end gave me the illusion of cleanliness?  Or more likely
the opportunity to use strvec_pushf() grabbed my full attention (tunnel
vision).

>
>> @@ -615,34 +614,32 @@ static int run_am(struct rebase_options *opts)
>>  		return run_command(&am);
>>  	}
>>
>> -	strbuf_addf(&revisions, "%s...%s",
>> -		    oid_to_hex(opts->root ?
>> -			       /* this is now equivalent to !opts->upstream */
>> -			       &opts->onto->object.oid :
>> -			       &opts->upstream->object.oid),
>> -		    oid_to_hex(&opts->orig_head->object.oid));
>> -
>>  	rebased_patches = xstrdup(git_path("rebased-patches"));
>>  	format_patch.out = open(rebased_patches,
>>  				O_WRONLY | O_CREAT | O_TRUNC, 0666);
>>  	if (format_patch.out < 0) {
>>  		status = error_errno(_("could not open '%s' for writing"),
>>  				     rebased_patches);
>>  		free(rebased_patches);
>>  		strvec_clear(&am.args);
>>  		return status;
>>  	}
>>
>>  	format_patch.git_cmd = 1;
>>  	strvec_pushl(&format_patch.args, "format-patch", "-k", "--stdout",
>>  		     "--full-index", "--cherry-pick", "--right-only",
>>  		     "--default-prefix", "--no-renames",
>>  		     "--no-cover-letter", "--pretty=mboxrd", "--topo-order",
>>  		     "--no-base", NULL);
>>  	if (opts->git_format_patch_opt.len)
>>  		strvec_split(&format_patch.args,
>>  			     opts->git_format_patch_opt.buf);
>> -	strvec_push(&format_patch.args, revisions.buf);
>> +	strvec_pushf(&format_patch.args, "%s...%s",
>> +		     oid_to_hex(opts->root ?
>> +				/* this is now equivalent to !opts->upstream */
>> +				&opts->onto->object.oid :
>> +				&opts->upstream->object.oid),
>> +		     oid_to_hex(&opts->orig_head->object.oid));
>>  	if (opts->restrict_revision)
>>  		strvec_pushf(&format_patch.args, "^%s",
>>  			     oid_to_hex(&opts->restrict_revision->object.oid));
>> @@ -665,10 +662,8 @@ static int run_am(struct rebase_options *opts)
>>  			"As a result, git cannot rebase them."),
>>  		      opts->revisions);
>>
>> -		strbuf_release(&revisions);
>>  		return status;
>>  	}
>> -	strbuf_release(&revisions);
>>
>>  	am.in = open(rebased_patches, O_RDONLY);
>>  	if (am.in < 0) {
>> --
>> 2.43.0

^ permalink raw reply

* Re: [PATCH] rebase-interactive: show revert option and add single letter shortcut
From: Michael Lohmann @ 2023-12-20  8:53 UTC (permalink / raw)
  To: gitster
  Cc: Johannes.Schindelin, git, mi.al.lohmann, mial.lohmann,
	phillip.wood123, phillip.wood
In-Reply-To: <xmqqmsu7e4hp.fsf@gitster.g>

Hi Junio,

On 18. Dec 2023, at 19:43, Junio C Hamano <gitster@pobox.com> wrote:
> Michael Lohmann <mial.lohmann@gmail.com> writes:
>
>> A `revert` in an interactive rebase can be useful, e.g. if a faulty
>> commit was pushed to the main branch already, so you can't just drop it.
> 
> But wouldn't that be typically done simply by running "git revert",
> totally outside the context of "git rebase -i"?
> 
> Interactive rebase is more geared toward rearranging an already
> built history *after* such a "git revert" is made and possibly other
> commits are made either before or after that commit that was created
> by "git revert".

Right - I recently found myself in a situation where a coworker merged a
faulty commit leading to a build failure and (given that only the two of
us actively worked on that project) we coordinated that he would prepare
a proper fix, while I wanted to rebase my current feature branch on
master, but with that commit reverted. For the sake of a clean history I
preferred to have the revert commit right at the merge-base, instead of
somewhere in the middle of all of my commits.

But you are right - if there are more than two people working on a
project, the typical way of properly doing this would be to revert the
offending commit in the master branch until a fix is available.

You can also do the revert you want to create in your feature branch and
directly update the master:
- `git rebase -i origin/master`
- add the following two lines to the top of the todo list:
    revert B
    exec git push origin @:refs/heads/revert-commit

Instead of
- `git switch origin/master -c revert-commit`
- `git revert B`
- `git push origin HEAD`
- `git switch -`
- `git rebase revert-commit`

So with the interactive revert you can create the "revert-commit” branch
"directly from within your own branch”. But indeed - it really is not
needed...

> A much cleaner way to structure your branch is not to muck with such
> tentative changes *on* the branch you eventually want to store the
> final result on.  Fork another branch and rebase B away:

I really like that workflow! I’ll adapt it :)

So in total: I don’t think documenting this is necessary (that is also
why my first message was not directly the patch, but the question why
this is undocumented) and it might even lead to the unclean workflow
that I ended up having, so even from that perspective it might not be a
good thing.

Thank you very much for this very detailed explanation of the workflow!
Michael

P.S. I am sorry - the first reply only went directly to Junio and not the
mailing list

^ permalink raw reply

* [PATCH 0/7] reftable: fixes and optimizations (pt.2)
From: Patrick Steinhardt @ 2023-12-20  9:17 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1611 bytes --]

Hi,

this is the second set of patches that fixes bugs and performs some
slight memory optimizations. The series builds on top of c0cadb0576,
which has been merged to `next`.

The series is structured as follows:

  - Patch 1: some hardening to not corrupt the reftable stack on
    compaction.

  - Patch 2: fix corruption of a reftable when writing multiple indices.

  - Patches 3 - 7: various smallish refactorings to optimize memory
    usage. Overall these reduce allocations when iterating many refs by
    almost 85%.

Thanks in advance for your review!

Patrick

Patrick Steinhardt (7):
  reftable/stack: do not overwrite errors when compacting
  reftable/writer: fix index corruption when writing multiple indices
  reftable/record: constify some parts of the interface
  reftable/record: store "val1" hashes as static arrays
  reftable/record: store "val2" hashes as static arrays
  reftable/merged: really reuse buffers to compute record keys
  reftable/merged: transfer ownership of records when iterating

 reftable/block_test.c      |   4 +-
 reftable/merged.c          |   8 +--
 reftable/merged_test.c     |  16 +++---
 reftable/readwrite_test.c  | 103 +++++++++++++++++++++++++++++++------
 reftable/record.c          |  17 ++----
 reftable/record_test.c     |   5 --
 reftable/reftable-record.h |  10 ++--
 reftable/stack.c           |  20 +++----
 reftable/stack_test.c      |   2 -
 reftable/writer.c          |   4 +-
 10 files changed, 117 insertions(+), 72 deletions(-)


base-commit: c0cadb0576d4920915eb3bd38a7d1abfcbd25f98
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 1/7] reftable/stack: do not overwrite errors when compacting
From: Patrick Steinhardt @ 2023-12-20  9:17 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703063544.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1854 bytes --]

In order to compact multiple stacks we iterate through the merged ref
and log records. When there is any error either when reading the records
from the old merged table or when writing the records to the new table
then we break out of the respective loops. When breaking out of the loop
for the ref records though the error code will be overwritten, which may
cause us to inadvertently skip over bad ref records. In the worst case,
this can lead to a compacted stack that is missing records.

Fix the code by using `goto done` instead so that any potential error
codes are properly returned to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 16bab82063..8729508dc3 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -801,18 +801,16 @@ static int stack_write_compact(struct reftable_stack *st,
 			err = 0;
 			break;
 		}
-		if (err < 0) {
-			break;
-		}
+		if (err < 0)
+			goto done;
 
 		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
 			continue;
 		}
 
 		err = reftable_writer_add_ref(wr, &ref);
-		if (err < 0) {
-			break;
-		}
+		if (err < 0)
+			goto done;
 		entries++;
 	}
 	reftable_iterator_destroy(&it);
@@ -827,9 +825,8 @@ static int stack_write_compact(struct reftable_stack *st,
 			err = 0;
 			break;
 		}
-		if (err < 0) {
-			break;
-		}
+		if (err < 0)
+			goto done;
 		if (first == 0 && reftable_log_record_is_deletion(&log)) {
 			continue;
 		}
@@ -845,9 +842,8 @@ static int stack_write_compact(struct reftable_stack *st,
 		}
 
 		err = reftable_writer_add_log(wr, &log);
-		if (err < 0) {
-			break;
-		}
+		if (err < 0)
+			goto done;
 		entries++;
 	}
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 2/7] reftable/writer: fix index corruption when writing multiple indices
From: Patrick Steinhardt @ 2023-12-20  9:17 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703063544.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 5241 bytes --]

Each reftable may contain multiple types of blocks for refs, objects and
reflog records, where each of these may have an index that makes it more
efficient to find the records. It was observed that the index for log
records can become corrupted under certain circumstances, where the
first entry of the index points into the object index instead of to the
log records.

As it turns out, this corruption can occur whenever we write a log index
as well as at least one additional index. Writing records and their index
is basically a two-step process:

  1. We write all blocks for the corresponding record. Each block that
     gets written is added to a list of blocks to index.

  2. Once all blocks were written we finish the section. If at least two
     blocks have been added to the list of blocks to index then we will
     now write the index for those blocks and flush it, as well.

When we have a very large number of blocks then we may decide to write a
multi-level index, which is why we also keep track of the list of the
index blocks in the same way as we previously kept track of the blocks
to index.

Now when we have finished writing all index blocks we clear the index
and flush the last block to disk. This is done in the wrong order though
because flushing the block to disk will re-add it to the list of blocks
to be indexed. The result is that the next section we are about to write
will have an entry in the list of blocks to index that points to the
last block of the preceding section's index, which will corrupt the log
index.

Fix this corruption by clearing the index after having written the last
block.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/readwrite_test.c | 80 +++++++++++++++++++++++++++++++++++++++
 reftable/writer.c         |  4 +-
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 278663f22d..9c16e0504e 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -798,6 +798,85 @@ static void test_write_key_order(void)
 	strbuf_release(&buf);
 }
 
+static void test_write_multiple_indices(void)
+{
+	struct reftable_write_options opts = {
+		.block_size = 100,
+	};
+	struct strbuf writer_buf = STRBUF_INIT, buf = STRBUF_INIT;
+	struct reftable_block_source source = { 0 };
+	struct reftable_iterator it = { 0 };
+	const struct reftable_stats *stats;
+	struct reftable_writer *writer;
+	struct reftable_reader *reader;
+	int err, i;
+
+	writer = reftable_new_writer(&strbuf_add_void, &writer_buf, &opts);
+	reftable_writer_set_limits(writer, 1, 1);
+	for (i = 0; i < 100; i++) {
+		unsigned char hash[GIT_SHA1_RAWSZ] = {i};
+		struct reftable_ref_record ref = {
+			.update_index = 1,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = hash,
+		};
+
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "refs/heads/%04d", i);
+		ref.refname = buf.buf,
+
+		err = reftable_writer_add_ref(writer, &ref);
+		EXPECT_ERR(err);
+	}
+
+	for (i = 0; i < 100; i++) {
+		unsigned char hash[GIT_SHA1_RAWSZ] = {i};
+		struct reftable_log_record log = {
+			.update_index = 1,
+			.value_type = REFTABLE_LOG_UPDATE,
+			.value.update = {
+				.old_hash = hash,
+				.new_hash = hash,
+			},
+		};
+
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "refs/heads/%04d", i);
+		log.refname = buf.buf,
+
+		err = reftable_writer_add_log(writer, &log);
+		EXPECT_ERR(err);
+	}
+
+	reftable_writer_close(writer);
+
+	/*
+	 * The written data should be sufficiently large to result in indices
+	 * for each of the block types.
+	 */
+	stats = reftable_writer_stats(writer);
+	EXPECT(stats->ref_stats.index_offset > 0);
+	EXPECT(stats->obj_stats.index_offset > 0);
+	EXPECT(stats->log_stats.index_offset > 0);
+
+	block_source_from_strbuf(&source, &writer_buf);
+	err = reftable_new_reader(&reader, &source, "filename");
+	EXPECT_ERR(err);
+
+	/*
+	 * Seeking the log uses the log index now. In case there is any
+	 * confusion regarding indices we would notice here.
+	 */
+	err = reftable_reader_seek_log(reader, &it, "");
+	EXPECT_ERR(err);
+
+	reftable_iterator_destroy(&it);
+	reftable_writer_free(writer);
+	reftable_reader_free(reader);
+	strbuf_release(&writer_buf);
+	strbuf_release(&buf);
+}
+
 static void test_corrupt_table_empty(void)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -847,5 +926,6 @@ int readwrite_test_main(int argc, const char *argv[])
 	RUN_TEST(test_log_overflow);
 	RUN_TEST(test_write_object_id_length);
 	RUN_TEST(test_write_object_id_min_length);
+	RUN_TEST(test_write_multiple_indices);
 	return 0;
 }
diff --git a/reftable/writer.c b/reftable/writer.c
index 2e322a5683..ee4590e20f 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -432,12 +432,12 @@ static int writer_finish_section(struct reftable_writer *w)
 		reftable_free(idx);
 	}
 
-	writer_clear_index(w);
-
 	err = writer_flush_block(w);
 	if (err < 0)
 		return err;
 
+	writer_clear_index(w);
+
 	bstats = writer_reftable_block_stats(w, typ);
 	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
 	bstats->index_offset = index_start;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 3/7] reftable/record: constify some parts of the interface
From: Patrick Steinhardt @ 2023-12-20  9:17 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703063544.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2714 bytes --]

We're about to convert reftable records to stop storing their object IDs
as allocated hashes. Prepare for this refactoring by constifying some
parts of the interface that will be impacted by this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c          | 8 ++++----
 reftable/reftable-record.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/reftable/record.c b/reftable/record.c
index fbaa1fbef5..5e258c734b 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -76,7 +76,7 @@ int reftable_is_block_type(uint8_t typ)
 	return 0;
 }
 
-uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec)
+const unsigned char *reftable_ref_record_val1(const struct reftable_ref_record *rec)
 {
 	switch (rec->value_type) {
 	case REFTABLE_REF_VAL1:
@@ -88,7 +88,7 @@ uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec)
 	}
 }
 
-uint8_t *reftable_ref_record_val2(const struct reftable_ref_record *rec)
+const unsigned char *reftable_ref_record_val2(const struct reftable_ref_record *rec)
 {
 	switch (rec->value_type) {
 	case REFTABLE_REF_VAL2:
@@ -242,7 +242,7 @@ static char hexdigit(int c)
 	return 'a' + (c - 10);
 }
 
-static void hex_format(char *dest, uint8_t *src, int hash_size)
+static void hex_format(char *dest, const unsigned char *src, int hash_size)
 {
 	assert(hash_size > 0);
 	if (src) {
@@ -1164,7 +1164,7 @@ int reftable_record_equal(struct reftable_record *a, struct reftable_record *b,
 		reftable_record_data(a), reftable_record_data(b), hash_size);
 }
 
-static int hash_equal(uint8_t *a, uint8_t *b, int hash_size)
+static int hash_equal(const unsigned char *a, const unsigned char *b, int hash_size)
 {
 	if (a && b)
 		return !memcmp(a, b, hash_size);
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index 67104f8fbf..f7eb2d6015 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -49,11 +49,11 @@ struct reftable_ref_record {
 
 /* Returns the first hash, or NULL if `rec` is not of type
  * REFTABLE_REF_VAL1 or REFTABLE_REF_VAL2. */
-uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec);
+const unsigned char *reftable_ref_record_val1(const struct reftable_ref_record *rec);
 
 /* Returns the second hash, or NULL if `rec` is not of type
  * REFTABLE_REF_VAL2. */
-uint8_t *reftable_ref_record_val2(const struct reftable_ref_record *rec);
+const unsigned char *reftable_ref_record_val2(const struct reftable_ref_record *rec);
 
 /* returns whether 'ref' represents a deletion */
 int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 4/7] reftable/record: store "val1" hashes as static arrays
From: Patrick Steinhardt @ 2023-12-20  9:17 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703063544.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 7954 bytes --]

When reading ref records of type "val1" we store its object ID in an
allocated array. This results in an additional allocation for every
single ref record we read, which is rather inefficient especially when
iterating over refs.

Refactor the code to instead use a static array of `GIT_MAX_RAWSZ`
bytes. While this means that `struct ref_record` is bigger now, we
typically do not store all refs in an array anyway and instead only
handle a limited number of records at the same point in time.

Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:

    HEAP SUMMARY:
        in use at exit: 21,098 bytes in 192 blocks
      total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,098 bytes in 192 blocks
      total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block_test.c      |  4 +---
 reftable/merged_test.c     | 16 ++++++----------
 reftable/readwrite_test.c  | 11 +++--------
 reftable/record.c          |  3 ---
 reftable/record_test.c     |  1 -
 reftable/reftable-record.h |  2 +-
 reftable/stack_test.c      |  2 --
 7 files changed, 11 insertions(+), 28 deletions(-)

diff --git a/reftable/block_test.c b/reftable/block_test.c
index c00bbc8aed..dedb05c7d8 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -49,13 +49,11 @@ static void test_block_read_write(void)
 
 	for (i = 0; i < N; i++) {
 		char name[100];
-		uint8_t hash[GIT_SHA1_RAWSZ];
 		snprintf(name, sizeof(name), "branch%02d", i);
-		memset(hash, i, sizeof(hash));
 
 		rec.u.ref.refname = name;
 		rec.u.ref.value_type = REFTABLE_REF_VAL1;
-		rec.u.ref.value.val1 = hash;
+		memset(rec.u.ref.value.val1, i, GIT_SHA1_RAWSZ);
 
 		names[i] = xstrdup(name);
 		n = block_writer_add(&bw, &rec);
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
index d08c16abef..b3927a5d73 100644
--- a/reftable/merged_test.c
+++ b/reftable/merged_test.c
@@ -123,13 +123,11 @@ static void readers_destroy(struct reftable_reader **readers, size_t n)
 
 static void test_merged_between(void)
 {
-	uint8_t hash1[GIT_SHA1_RAWSZ] = { 1, 2, 3, 0 };
-
 	struct reftable_ref_record r1[] = { {
 		.refname = "b",
 		.update_index = 1,
 		.value_type = REFTABLE_REF_VAL1,
-		.value.val1 = hash1,
+		.value.val1 = { 1, 2, 3, 0 },
 	} };
 	struct reftable_ref_record r2[] = { {
 		.refname = "a",
@@ -165,26 +163,24 @@ static void test_merged_between(void)
 
 static void test_merged(void)
 {
-	uint8_t hash1[GIT_SHA1_RAWSZ] = { 1 };
-	uint8_t hash2[GIT_SHA1_RAWSZ] = { 2 };
 	struct reftable_ref_record r1[] = {
 		{
 			.refname = "a",
 			.update_index = 1,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash1,
+			.value.val1 = { 1 },
 		},
 		{
 			.refname = "b",
 			.update_index = 1,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash1,
+			.value.val1 = { 1 },
 		},
 		{
 			.refname = "c",
 			.update_index = 1,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash1,
+			.value.val1 = { 1 },
 		}
 	};
 	struct reftable_ref_record r2[] = { {
@@ -197,13 +193,13 @@ static void test_merged(void)
 			.refname = "c",
 			.update_index = 3,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash2,
+			.value.val1 = { 2 },
 		},
 		{
 			.refname = "d",
 			.update_index = 3,
 			.value_type = REFTABLE_REF_VAL1,
-			.value.val1 = hash1,
+			.value.val1 = { 1 },
 		},
 	};
 
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 9c16e0504e..56c0b4db5d 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -60,18 +60,15 @@ static void write_table(char ***names, struct strbuf *buf, int N,
 	*names = reftable_calloc(sizeof(char *) * (N + 1));
 	reftable_writer_set_limits(w, update_index, update_index);
 	for (i = 0; i < N; i++) {
-		uint8_t hash[GIT_SHA256_RAWSZ] = { 0 };
 		char name[100];
 		int n;
 
-		set_test_hash(hash, i);
-
 		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
 
 		ref.refname = name;
 		ref.update_index = update_index;
 		ref.value_type = REFTABLE_REF_VAL1;
-		ref.value.val1 = hash;
+		set_test_hash(ref.value.val1, i);
 		(*names)[i] = xstrdup(name);
 
 		n = reftable_writer_add_ref(w, &ref);
@@ -675,11 +672,10 @@ static void test_write_object_id_min_length(void)
 	struct strbuf buf = STRBUF_INIT;
 	struct reftable_writer *w =
 		reftable_new_writer(&strbuf_add_void, &buf, &opts);
-	uint8_t hash[GIT_SHA1_RAWSZ] = {42};
 	struct reftable_ref_record ref = {
 		.update_index = 1,
 		.value_type = REFTABLE_REF_VAL1,
-		.value.val1 = hash,
+		.value.val1 = {42},
 	};
 	int err;
 	int i;
@@ -711,11 +707,10 @@ static void test_write_object_id_length(void)
 	struct strbuf buf = STRBUF_INIT;
 	struct reftable_writer *w =
 		reftable_new_writer(&strbuf_add_void, &buf, &opts);
-	uint8_t hash[GIT_SHA1_RAWSZ] = {42};
 	struct reftable_ref_record ref = {
 		.update_index = 1,
 		.value_type = REFTABLE_REF_VAL1,
-		.value.val1 = hash,
+		.value.val1 = {42},
 	};
 	int err;
 	int i;
diff --git a/reftable/record.c b/reftable/record.c
index 5e258c734b..a67a6b4d8a 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -219,7 +219,6 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
 	case REFTABLE_REF_DELETION:
 		break;
 	case REFTABLE_REF_VAL1:
-		ref->value.val1 = reftable_malloc(hash_size);
 		memcpy(ref->value.val1, src->value.val1, hash_size);
 		break;
 	case REFTABLE_REF_VAL2:
@@ -303,7 +302,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref)
 		reftable_free(ref->value.val2.value);
 		break;
 	case REFTABLE_REF_VAL1:
-		reftable_free(ref->value.val1);
 		break;
 	case REFTABLE_REF_DELETION:
 		break;
@@ -394,7 +392,6 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
 			return -1;
 		}
 
-		r->value.val1 = reftable_malloc(hash_size);
 		memcpy(r->value.val1, in.buf, hash_size);
 		string_view_consume(&in, hash_size);
 		break;
diff --git a/reftable/record_test.c b/reftable/record_test.c
index 70ae78feca..5c94d26e35 100644
--- a/reftable/record_test.c
+++ b/reftable/record_test.c
@@ -119,7 +119,6 @@ static void test_reftable_ref_record_roundtrip(void)
 		case REFTABLE_REF_DELETION:
 			break;
 		case REFTABLE_REF_VAL1:
-			in.u.ref.value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
 			set_hash(in.u.ref.value.val1, 1);
 			break;
 		case REFTABLE_REF_VAL2:
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index f7eb2d6015..44b5125445 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -38,7 +38,7 @@ struct reftable_ref_record {
 #define REFTABLE_NR_REF_VALUETYPES 4
 	} value_type;
 	union {
-		uint8_t *val1; /* malloced hash. */
+		unsigned char val1[GIT_MAX_RAWSZ];
 		struct {
 			uint8_t *value; /* first value, malloced hash  */
 			uint8_t *target_value; /* second value, malloced hash */
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 14a3fc11ee..feab49d7f7 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -463,7 +463,6 @@ static void test_reftable_stack_add(void)
 		refs[i].refname = xstrdup(buf);
 		refs[i].update_index = i + 1;
 		refs[i].value_type = REFTABLE_REF_VAL1;
-		refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
 		set_test_hash(refs[i].value.val1, i);
 
 		logs[i].refname = xstrdup(buf);
@@ -600,7 +599,6 @@ static void test_reftable_stack_tombstone(void)
 		refs[i].update_index = i + 1;
 		if (i % 2 == 0) {
 			refs[i].value_type = REFTABLE_REF_VAL1;
-			refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
 			set_test_hash(refs[i].value.val1, i);
 		}
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 5/7] reftable/record: store "val2" hashes as static arrays
From: Patrick Steinhardt @ 2023-12-20  9:17 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703063544.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 5214 bytes --]

Similar to the preceding commit, convert ref records of type "val2" to
store their object IDs in static arrays instead of allocating them for
every single record.

We're using the same benchmark as in the preceding commit, with `git
show-ref --quiet` in a repository with ~350k refs. This time around
though the effects aren't this huge. Before:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 1,419,040 allocs, 1,418,847 frees, 62,153,868 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated

This is because "val2"-type records are typically only stored for peeled
tags, and the number of annotated tags in the benchmark repository is
rather low. Still, it can be seen that this change leads to a reduction
of allocations overall, even if only a small one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/readwrite_test.c  | 12 ++++--------
 reftable/record.c          |  6 ------
 reftable/record_test.c     |  4 ----
 reftable/reftable-record.h |  4 ++--
 4 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 56c0b4db5d..900afc1b70 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -547,8 +547,6 @@ static void test_table_refs_for(int indexed)
 		uint8_t hash[GIT_SHA1_RAWSZ];
 		char fill[51] = { 0 };
 		char name[100];
-		uint8_t hash1[GIT_SHA1_RAWSZ];
-		uint8_t hash2[GIT_SHA1_RAWSZ];
 		struct reftable_ref_record ref = { NULL };
 
 		memset(hash, i, sizeof(hash));
@@ -558,11 +556,9 @@ static void test_table_refs_for(int indexed)
 		name[40] = 0;
 		ref.refname = name;
 
-		set_test_hash(hash1, i / 4);
-		set_test_hash(hash2, 3 + i / 4);
 		ref.value_type = REFTABLE_REF_VAL2;
-		ref.value.val2.value = hash1;
-		ref.value.val2.target_value = hash2;
+		set_test_hash(ref.value.val2.value, i / 4);
+		set_test_hash(ref.value.val2.target_value, 3 + i / 4);
 
 		/* 80 bytes / entry, so 3 entries per block. Yields 17
 		 */
@@ -570,8 +566,8 @@ static void test_table_refs_for(int indexed)
 		n = reftable_writer_add_ref(w, &ref);
 		EXPECT(n == 0);
 
-		if (!memcmp(hash1, want_hash, GIT_SHA1_RAWSZ) ||
-		    !memcmp(hash2, want_hash, GIT_SHA1_RAWSZ)) {
+		if (!memcmp(ref.value.val2.value, want_hash, GIT_SHA1_RAWSZ) ||
+		    !memcmp(ref.value.val2.target_value, want_hash, GIT_SHA1_RAWSZ)) {
 			want_names[want_names_len++] = xstrdup(name);
 		}
 	}
diff --git a/reftable/record.c b/reftable/record.c
index a67a6b4d8a..5c3fbb7b2a 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -222,9 +222,7 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
 		memcpy(ref->value.val1, src->value.val1, hash_size);
 		break;
 	case REFTABLE_REF_VAL2:
-		ref->value.val2.value = reftable_malloc(hash_size);
 		memcpy(ref->value.val2.value, src->value.val2.value, hash_size);
-		ref->value.val2.target_value = reftable_malloc(hash_size);
 		memcpy(ref->value.val2.target_value,
 		       src->value.val2.target_value, hash_size);
 		break;
@@ -298,8 +296,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref)
 		reftable_free(ref->value.symref);
 		break;
 	case REFTABLE_REF_VAL2:
-		reftable_free(ref->value.val2.target_value);
-		reftable_free(ref->value.val2.value);
 		break;
 	case REFTABLE_REF_VAL1:
 		break;
@@ -401,11 +397,9 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
 			return -1;
 		}
 
-		r->value.val2.value = reftable_malloc(hash_size);
 		memcpy(r->value.val2.value, in.buf, hash_size);
 		string_view_consume(&in, hash_size);
 
-		r->value.val2.target_value = reftable_malloc(hash_size);
 		memcpy(r->value.val2.target_value, in.buf, hash_size);
 		string_view_consume(&in, hash_size);
 		break;
diff --git a/reftable/record_test.c b/reftable/record_test.c
index 5c94d26e35..2876db7d27 100644
--- a/reftable/record_test.c
+++ b/reftable/record_test.c
@@ -122,11 +122,7 @@ static void test_reftable_ref_record_roundtrip(void)
 			set_hash(in.u.ref.value.val1, 1);
 			break;
 		case REFTABLE_REF_VAL2:
-			in.u.ref.value.val2.value =
-				reftable_malloc(GIT_SHA1_RAWSZ);
 			set_hash(in.u.ref.value.val2.value, 1);
-			in.u.ref.value.val2.target_value =
-				reftable_malloc(GIT_SHA1_RAWSZ);
 			set_hash(in.u.ref.value.val2.target_value, 2);
 			break;
 		case REFTABLE_REF_SYMREF:
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index 44b5125445..83d252ec2c 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -40,8 +40,8 @@ struct reftable_ref_record {
 	union {
 		unsigned char val1[GIT_MAX_RAWSZ];
 		struct {
-			uint8_t *value; /* first value, malloced hash  */
-			uint8_t *target_value; /* second value, malloced hash */
+			unsigned char value[GIT_MAX_RAWSZ]; /* first hash  */
+			unsigned char target_value[GIT_MAX_RAWSZ]; /* second hash */
 		} val2;
 		char *symref; /* referent, malloced 0-terminated string */
 	} value;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 6/7] reftable/merged: really reuse buffers to compute record keys
From: Patrick Steinhardt @ 2023-12-20  9:17 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703063544.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

In 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), we have refactored the merged iterator to reuse a set of
buffers that it would otherwise have to reallocate on every single
iteration. Unfortunately, there was a brown-paper-bag-style bug here as
we continue to release these buffers after the iteration, and thus we
have essentially gained nothing.

Fix this performance issue by not releasing those buffers on iteration
anymore, where we instead rely on `merged_iter_close()` to release the
buffers for us.

Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 556bb5c556..a28bb99aaf 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -128,8 +128,6 @@ static int merged_iter_next_entry(struct merged_iter *mi,

 done:
 	reftable_record_release(&entry.rec);
-	strbuf_release(&mi->entry_key);
-	strbuf_release(&mi->key);
 	return err;
 }

-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 7/7] reftable/merged: transfer ownership of records when iterating
From: Patrick Steinhardt @ 2023-12-20  9:17 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703063544.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2155 bytes --]

When iterating over recods with the merged iterator we put the records
into a priority queue before yielding them to the caller. This means
that we need to allocate the contents of these records before we can
pass them over to the caller.

The handover to the caller is quite inefficient though because we first
deallocate the record passed in by the caller and then copy over the new
record, which requires us to reallocate memory.

Refactor the code to instead transfer ownership of the new record to the
caller. So instead of reallocating all contents, we now release the old
record and then copy contents of the new record into place.

The following benchmark of `git show-ref --quiet` in a repository with
around 350k refs shows a clear improvement. Before:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated

This shows that we now have roundabout a single allocation per record
that we're yielding from the iterator. Ideally, we'd also get rid of
this allocation so that the number of allocations doesn't scale with the
number of refs anymore. This would require some larger surgery though
because the memory is owned by the priority queue before transferring it
over to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index a28bb99aaf..a52914d667 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -124,10 +124,12 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 		reftable_record_release(&top.rec);
 	}

-	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_release(rec);
+	*rec = entry.rec;

 done:
-	reftable_record_release(&entry.rec);
+	if (err)
+		reftable_record_release(&entry.rec);
 	return err;
 }

-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox