* Re: [PATCH v2] checkout/switch: disallow checking out same branch in multiple worktrees
From: Junio C Hamano @ 2023-01-18 22:55 UTC (permalink / raw)
To: Carlo Marcelo Arenas Belón
Cc: git, pclouds, Jinwook Jeong, Rubén Justo, Eric Sunshine
In-Reply-To: <20230118061527.76218-1-carenas@gmail.com>
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes:
> Changes since v1
> * A much better commit message
> * Changes to the tests as suggested by Eric
> * Changes to the logic as suggested by Rubén
I queued this topic at the tip of 'seen' as 2fe0b4e3 (Merge branch
'cb/checkout-same-branch-twice' into seen, 2023-01-18), on top of
4ea8693b (Merge branch 'mc/credential-helper-auth-headers' into
seen, 2023-01-18).
- 4ea8693b - https://github.com/git/git/actions/runs/3952916442
- 2fe0b4e3 - https://github.com/git/git/actions/runs/3953521066
Comparing these two runs, inclusion of this topic seems to introduce
new leaks, as t1408 and t2018 (neither of which was touched by this
topic) that used to pass are now failing.
> builtin/checkout.c | 24 +++++++++++++++++-------
> t/t2400-worktree-add.sh | 18 ++++++++++++++++--
> 2 files changed, 33 insertions(+), 9 deletions(-)
Thanks.
^ permalink raw reply
* Re: [PATCH v4 08/19] worktree: fix a trivial leak in prune_worktrees()
From: Ævar Arnfjörð Bjarmason @ 2023-01-18 23:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, René Scharfe, Eric Sunshine
In-Reply-To: <xmqqedrr6cnp.fsf@gitster.g>
On Wed, Jan 18 2023, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> diff --git c/builtin/branch.c w/builtin/branch.c
>>> index f63fd45edb..4fe7757670 100644
>>> --- c/builtin/branch.c
>>> +++ w/builtin/branch.c
>>> @@ -742,6 +742,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
>>> if (filter.abbrev == -1)
>>> filter.abbrev = DEFAULT_ABBREV;
>>> filter.ignore_case = icase;
>>> + UNLEAK(filter);
>>>
>>> finalize_colopts(&colopts, -1);
>>> if (filter.verbose) {
>>
>> I'll send a v5 re-roll without this change, sorry.
>
> I'd rather see your reroll with the above addition of UNLEAK() than
> without it, to fix the breakage.
I don't mind that UNLEAK() being in-tree until a better fix for that
leak, but doesn't the v5 I sent also address this?
The issue was that I mis-marked a test as passing, when it only passed
depending on my local compiler (-fsanitize=leak is fickle
sometimes). Now that we're not marking that test as leak-free there's no
need for the UNLEAK() for now, no?
Or is there some edge case I didn't spot/notice?
1. https://lore.kernel.org/git/cover-v5-00.19-00000000000-20230118T120334Z-avarab@gmail.com/
^ permalink raw reply
* Re: [PATCH v3] grep: correctly identify utf-8 characters with \{b,w} in -P
From: Ævar Arnfjörð Bjarmason @ 2023-01-18 23:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Carlo Arenas, git, Diomidis Spinellis
In-Reply-To: <xmqq7cxj6chn.fsf@gitster.g>
On Wed, Jan 18 2023, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> GNU grep -P has no knob and would likely never have one.
>>
>> I think the general knob in not just GNU grep but GNU utils and the
>> wider *nix landscape is "tweak your LC_ALL and/or other locale
>> varibales".
>>
>> Which works for it, and will work for us once we're using PCRE2_UCP too.
>>
>>> So for now, I think we should acknowledge the bug, provide an option
>>> for people that might need the fix, and fix all other problems we
>>> have, which will include changes in PCRE2 as well to better fit our
>>> use case.
>>
>> Hrm, what are those PCRE2 changes? The one I saw so far (or was it a
>> proposal) was to just make its "grep" utility use the PCRE2_UCP like GNU
>> grep is now doing in its unreleased version in its git repo...
>
> Yeah, I didn't understand Carlo's comment in that paragraph at all.
>
> In short, it sounds to me that the earlier one that added PCRE2_UCP
> unconditionally would be the best alternative among those that have
> been discussed.
I agree.
^ permalink raw reply
* Re: [PATCH v8 2/2] send-email: expose header information to git-send-email's sendemail-validate hook
From: Ævar Arnfjörð Bjarmason @ 2023-01-18 23:12 UTC (permalink / raw)
To: Michael Strawbridge; +Cc: git, Luben Tuikov, Junio C Hamano
In-Reply-To: <20230118163203.488652-3-michael.strawbridge@amd.com>
On Wed, Jan 18 2023, Michael Strawbridge wrote:
> To allow further flexibility in the Git hook, the SMTP header
> information of the email which git-send-email intends to send, is now
> passed as the 2nd argument to the sendemail-validate hook.
>
> As an example, this can be useful for acting upon keywords in the
> subject or specific email addresses.
>
> As a consequence of needing all the header data, validation has been
> moved later in the sequence to right before sending the emails instead
> of at the beginning.
Ah, I see. I tested this (i.e. moving it back to the previous behavior)
and you did this change because you don't have the $sender variable yet.
I tried this quickly on top, which seems to work, i.e. now we do this in
the same order as before, but we just move the $sender code earlier:
diff --git a/git-send-email.perl b/git-send-email.perl
index d123dfd33d5..7e7681116bb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -787,6 +787,28 @@ sub is_format_patch_arg {
@files = handle_backup_files(@files);
+if (defined $sender) {
+ $sender =~ s/^\s+|\s+$//g;
+ ($sender) = expand_aliases($sender);
+} else {
+ $sender = $repoauthor->() || $repocommitter->() || '';
+}
+
+# $sender could be an already sanitized address
+# (e.g. sendemail.from could be manually sanitized by user).
+# But it's a no-op to run sanitize_address on an already sanitized address.
+$sender = sanitize_address($sender);
+
+if ($validate) {
+ foreach my $f (@files) {
+ unless (-p $f) {
+ pre_process_file($f, 1);
+
+ validate_patch($f, $target_xfer_encoding);
+ }
+ }
+}
+
if (@files) {
unless ($quiet) {
print $_,"\n" for (@files);
@@ -1035,18 +1057,6 @@ sub file_declares_8bit_cte {
}
}
-if (defined $sender) {
- $sender =~ s/^\s+|\s+$//g;
- ($sender) = expand_aliases($sender);
-} else {
- $sender = $repoauthor->() || $repocommitter->() || '';
-}
-
-# $sender could be an already sanitized address
-# (e.g. sendemail.from could be manually sanitized by user).
-# But it's a no-op to run sanitize_address on an already sanitized address.
-$sender = sanitize_address($sender);
-
my $to_whom = __("To whom should the emails be sent (if anyone)?");
my $prompting = 0;
if (!@initial_to && !defined $to_cmd) {
@@ -1120,16 +1130,6 @@ sub expand_one_alias {
$time = time - scalar $#files;
-if ($validate) {
- foreach my $f (@files) {
- unless (-p $f) {
- pre_process_file($f, 1);
-
- validate_patch($f, $target_xfer_encoding);
- }
- }
-}
-
$in_reply_to = $initial_in_reply_to;
$references = $initial_in_reply_to || '';
$message_num = 0;
All tests pass with that, which is less good than it sounds, because
shouldn't your tests be checking whether we have this non--quiet
print-out of the files as expected before or after the validation hook
runs?
> +
> + my ($recipients_ref, $to, $date, $gitversion, $cc, $ccline, $header) = gen_header();
> +
> + require File::Temp;
> + my ($header_filehandle, $header_filename) = File::Temp::tempfile(
> + ".gitsendemail.header.XXXXXX", DIR => $repo->repo_path());
> + print $header_filehandle $header;
> +
> my @cmd = ("git", "hook", "run", "--ignore-missing",
> $hook_name, "--");
> - my @cmd_msg = (@cmd, "<patch>");
> - my @cmd_run = (@cmd, $target);
> + my @cmd_msg = (@cmd, "<patch>", "<header>");
> + my @cmd_run = (@cmd, $target, $header_filename);
> $hook_error = system_or_msg(\@cmd_run, undef, "@cmd_msg");
> + unlink($header_filehandle);
> chdir($cwd_save) or die("chdir: $!");
I'm still curious about the "stdin" question I asked in the last round.
^ permalink raw reply
* Re: [PATCH v6 10/12] http: replace unsafe size_t multiplication with st_mult
From: Ævar Arnfjörð Bjarmason @ 2023-01-18 23:16 UTC (permalink / raw)
To: Victoria Dye
Cc: Matthew John Cheetham via GitGitGadget, git, Derrick Stolee,
Lessley Dennington, M Hickford, Jeff Hostetler, Glen Choo,
Matthew John Cheetham, Matthew John Cheetham
In-Reply-To: <aa8abc8d-284b-b87e-f594-27ee40cc4bec@github.com>
On Wed, Jan 18 2023, Victoria Dye wrote:
> Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Jan 18 2023, Matthew John Cheetham via GitGitGadget wrote:
>>
>>> From: Matthew John Cheetham <mjcheetham@outlook.com>
>>>
>>> Replace direct multiplication of two size_t parameters in curl response
>>> stream handling callback functions with `st_mult` to guard against
>>> overflows.
>>>
>>> Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
>>> ---
>>> http.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/http.c b/http.c
>>> index 8a5ba3f4776..a2a80318bb2 100644
>>> --- a/http.c
>>> +++ b/http.c
>>> @@ -146,7 +146,7 @@ static int http_schannel_use_ssl_cainfo;
>>>
>>> size_t fread_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_)
>>> {
>>> - size_t size = eltsize * nmemb;
>>> + size_t size = st_mult(eltsize, nmemb);
>>> struct buffer *buffer = buffer_;
>>>
>>> if (size > buffer->buf.len - buffer->posn)
>>> @@ -176,7 +176,7 @@ curlioerr ioctl_buffer(CURL *handle, int cmd, void *clientp)
>>>
>>> size_t fwrite_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_)
>>> {
>>> - size_t size = eltsize * nmemb;
>>> + size_t size = st_mult(eltsize, nmemb);
>>> struct strbuf *buffer = buffer_;
>>>
>>> strbuf_add(buffer, ptr, size);
>>
>> This is a really worthwhile fix, but shouldn't this be split into its
>> own stand-alone patch? It applies on "master", and seems like something
>> that's a good idea outside of this "test-http-server" topic.
>
> While it's this change *can* stand alone, please keep in mind that
> suggestions like this (recommending a series be split and resubmitted) can
> be highly disruptive to the in-flight topic and the original contributor.
>
> Monitoring and iterating on multiple series at once is time-consuming for
> the contributor and reviewers, and often (although not in this case) it
> creates a dependency of one series on another, which comes with a cost to
> the maintainer's time. Not to say those recommendations should never be made
> (e.g. in a clearly too-long series early in its review cycle, or when
> certain patches lead to excessive context switching while reviewing), just
> that they should be made more carefully, with consideration for the time of
> other contributors.
>
> So, with that in mind, I don't think this patch is critical enough to
> separate into an independent submission, and (subjectively) it does not
> disrupt the flow of this series.
Yes, I take your general point, it's not always the right thing,
sometimes a while-at-it cleanup is better than a split-out etc.
In this case the split-out seemed like it wouldn't create a dependency
between topics, as the rest of the series didn't rely on the overflow
sanity check being added, it's just a good idea to do it in general.
^ permalink raw reply
* Re: [PATCH v4 08/19] worktree: fix a trivial leak in prune_worktrees()
From: Junio C Hamano @ 2023-01-18 23:20 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, René Scharfe, Eric Sunshine
In-Reply-To: <230119.86h6wnwihp.gmgdl@evledraar.gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> I'd rather see your reroll with the above addition of UNLEAK() than
>> without it, to fix the breakage.
>
> I don't mind that UNLEAK() being in-tree until a better fix for that
> leak, but doesn't the v5 I sent also address this?
>
> The issue was that I mis-marked a test as passing, when it only passed
> depending on my local compiler (-fsanitize=leak is fickle
> sometimes). Now that we're not marking that test as leak-free there's no
> need for the UNLEAK() for now, no?
>
> Or is there some edge case I didn't spot/notice?
The top-level singleton instance of "filter" that is used once and
never grows unbounded is a perfect use case for UNLEAK(). And with
it, the test DOES get leak-free, so it would be appropriate to keep
the PASSES variable added to the script (until it gets broken
again).
Or did you have any other uses of tools with leaks in the script,
other than the one that is fixed with the UNLEAK()?
^ permalink raw reply
* Re: [PATCH v3] grep: correctly identify utf-8 characters with \{b,w} in -P
From: Junio C Hamano @ 2023-01-18 23:24 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Carlo Arenas, git, Diomidis Spinellis
In-Reply-To: <230119.86cz7bwign.gmgdl@evledraar.gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> In short, it sounds to me that the earlier one that added PCRE2_UCP
>> unconditionally would be the best alternative among those that have
>> been discussed.
>
> I agree.
So, ... we'll mark cb/grep-pcre-ucp, ea8bc435 (grep: correctly
identify utf-8 characters with \{b,w} in -P, 2023-01-08), to be
merged to 'next' and see what happens. I'll add your Acked-by
while at it, if you do not mind.
---- >8 --------- >8 --------- >8 --------- >8 ------
From: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Date: Sun, 8 Jan 2023 07:52:17 -0800
Subject: [PATCH] grep: correctly identify utf-8 characters with \{b,w} in -P
When UTF is enabled for a PCRE match, the corresponding flags are
added to the pcre2_compile() call, but PCRE2_UCP wasn't included.
This prevents extending the meaning of the character classes to
include those new valid characters and therefore result in failed
matches for expressions that rely on that extention, for ex:
$ git grep -P '\bÆvar'
Add PCRE2_UCP so that \w will include Æ and therefore \b could
correctly match the beginning of that word.
This has an impact on performance that has been estimated to be
between 20% to 40% and that is shown through the added performance
test.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
grep.c | 2 +-
t/perf/p7822-grep-perl-character.sh | 42 +++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 1 deletion(-)
create mode 100755 t/perf/p7822-grep-perl-character.sh
diff --git a/grep.c b/grep.c
index 06eed69493..1687f65b64 100644
--- a/grep.c
+++ b/grep.c
@@ -293,7 +293,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
options |= PCRE2_CASELESS;
}
if (!opt->ignore_locale && is_utf8_locale() && !literal)
- options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
+ options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
#ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
diff --git a/t/perf/p7822-grep-perl-character.sh b/t/perf/p7822-grep-perl-character.sh
new file mode 100755
index 0000000000..87009c60df
--- /dev/null
+++ b/t/perf/p7822-grep-perl-character.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description="git-grep's perl regex
+
+If GIT_PERF_GREP_THREADS is set to a list of threads (e.g. '1 4 8'
+etc.) we will test the patterns under those numbers of threads.
+"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+if test -n "$GIT_PERF_GREP_THREADS"
+then
+ test_set_prereq PERF_GREP_ENGINES_THREADS
+fi
+
+for pattern in \
+ '\\bhow' \
+ '\\bÆvar' \
+ '\\d+ \\bÆvar' \
+ '\\bBelón\\b' \
+ '\\w{12}\\b'
+do
+ echo '$pattern' >pat
+ if ! test_have_prereq PERF_GREP_ENGINES_THREADS
+ then
+ test_perf "grep -P '$pattern'" --prereq PCRE "
+ git -P grep -f pat || :
+ "
+ else
+ for threads in $GIT_PERF_GREP_THREADS
+ do
+ test_perf "grep -P '$pattern' with $threads threads" --prereq PTHREADS,PCRE "
+ git -c grep.threads=$threads -P grep -f pat || :
+ "
+ done
+ fi
+done
+
+test_done
--
2.39.1-231-ga7caae2729
^ permalink raw reply related
* Re: [PATCH] worktree: teach find_shared_symref to ignore current worktree
From: Rubén Justo @ 2023-01-18 23:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git List, Eric Sunshine
In-Reply-To: <xmqqilh491y6.fsf@gitster.g>
On 17-ene-2023 15:27:29, Junio C Hamano wrote:
> being solved. Rather, it is unclear what problem you are solving.
Sorry, I didn't explain the motivation well in the message.
First, I'm not sure if "ignore_current_worktree" is correct, maybe needs
to be "prefer_current_worktree". Having said that, let me use an example.
With...
$ git worktree add wt1 -b main
$ git worktree add wt2 -f main
... we get confuse results:
$ git -C wt1 rebase main main
Current branch main is up to date.
$ git -C wt2 rebase main main
fatal: 'main' is already checked out...
The problem I'm trying to solve is that find_shared_symref() returns the
first matching worktree, and a possible matching "current worktree"
might not be the first matching one. That's why die_if_checked_out()
dies with "git -C wt2".
find_shared_symref() searches through the list of worktrees that
get_worktrees() composes: first the main worktree and then, as getdir()
returns them, those in .git/worktrees/*. The search is sequential and
once a match is found, it is returned. And so die_if_checked_out(),
when asked to "ignore_current_worktree", is going to consider for
"is_current" the worktree which may or may not be the "current" one,
depending on the sequence from get_worktree(), and getdir() ultimately.
If we want to disallow operations on a worktree with a checked out
branch also on another worktree, we need the "ignore_current_worktree".
But, and now I'm more in favor of this, if we prefer to allow the
operation, we need a "prefer_current_worktree", to induce
find_shared_symref() to return the "current" one if it matches, or any
other one that matches.
^ permalink raw reply
* Re: [PATCH v3 0/9] config API: make "multi" safe, fix numerous segfaults
From: Glen Choo @ 2023-01-19 0:10 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git
Cc: Junio C Hamano, Derrick Stolee, Jeff King, Taylor Blau,
SZEDER Gábor, Ævar Arnfjörð Bjarmason
In-Reply-To: <cover-v3-0.9-00000000000-20221125T093158Z-avarab@gmail.com>
We covered this at Review Club this week (thanks for coming, Ævar!). You
can find the notes at:
https://docs.google.com/document/d/14L8BAumGTpsXpjDY8VzZ4rRtpAjuGrFSRqn3stCuS_w/edit
The overall sentiment from the meeting was that this is a positive
direction for the config API to go in. My personal opinion is that this
series is close to mergeable and I had mostly minor comments.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> This series fixes numerous segfaults in config API users, because they
> didn't expect *_get_multi() to hand them a string_list with a NULL in
> it given config like "[a] key" (note, no "="'s).
As you mentioned in Review Club, this series also fixes a wart in
config.h where *_get_value_multi() returned a "struct string_list"
instead of an error code like all other getters. So this series is
technically doing two sort-of-different things...
> Ævar Arnfjörð Bjarmason (9):
> for-each-repo tests: test bad --config keys
> config tests: cover blind spots in git_die_config() tests
> config tests: add "NULL" tests for *_get_value_multi()
> versioncmp.c: refactor config reading next commit
> config API: have *_multi() return an "int" and take a "dest"
> for-each-repo: error on bad --config
Fix the wart..
> config API users: test for *_get_value_multi() segfaults
> config API: add "string" version of *_value_multi(), fix segfaults
> for-each-repo: with bad config, don't conflate <path> and <cmd>
and introduce the better API that won't segfault, but I think it's okay
to have the series do both since they're closely related enough and the
latter is quite small anyway.
^ permalink raw reply
* Re: [PATCH v3 2/9] config tests: cover blind spots in git_die_config() tests
From: Glen Choo @ 2023-01-19 0:15 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git
Cc: Junio C Hamano, Derrick Stolee, Jeff King, Taylor Blau,
SZEDER Gábor, Ævar Arnfjörð Bjarmason
In-Reply-To: <patch-v3-2.9-3eb8da6086d-20221125T093159Z-avarab@gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> There were no tests checking for the output of the git_die_config()
> function in the config API, added in 5a80e97c827 (config: add
> `git_die_config()` to the config-set API, 2014-08-07). We only tested
> "test_must_fail", but didn't assert the output.
It wasn't immediately obvious to me why this was relevant to this
series; but reading ahead to 5/9 shows that git_die_config() is a caller
of a *_get_value_multi() function that we are changing, so we want to
assert on the output so that we know that git_die_config() is still
doing the right thing (since test_must_fail alone won't tell us whether
we introduced bugs in git_die_config()).
Might be good to include that extra context, but I don't feel strongly
about it.
^ permalink raw reply
* Re: [PATCH v3 3/9] config tests: add "NULL" tests for *_get_value_multi()
From: Glen Choo @ 2023-01-19 0:28 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git
Cc: Junio C Hamano, Derrick Stolee, Jeff King, Taylor Blau,
SZEDER Gábor, Ævar Arnfjörð Bjarmason
In-Reply-To: <patch-v3-3.9-14b08dfc162-20221125T093159Z-avarab@gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> A less well known edge case in the config format is that keys can be
> value-less, a shorthand syntax for "true" boolean keys. I.e. these two
> are equivalent as far as "--type=bool" is concerned:
>
> [a]key
> [a]key = true
>
> But as far as our parser is concerned the values for these two are
> NULL, and "true". I.e. for a sequence like:
>
> [a]key=x
> [a]key
> [a]key=y
>
> We get a "struct string_list" with "string" members with ".string"
> values of:
>
> { "x", NULL, "y" }
>
> This behavior goes back to the initial implementation of
> git_config_bool() in 17712991a59 (Add ".git/config" file parser,
> 2005-10-10).
I didn't know about this behavior before, actually. Thanks for the
explanation.
> When the "t/t1308-config-set.sh" tests were added in [1] only one of
> the three "(NULL)" lines in "t/helper/test-config.c" had any test
> coverage. This change adds tests that stress the remaining two.
I initially read this as testing that t/helper/test-config.c is doing
the right thing, which would be the antipattern of writing tests for our
tests.
During Review Club, you mentioned that the motivation was something
else, which IIRC is closer to exercising the internals of the configset
API, which makes sense to me, thought it would be helpful to clarify
that better in the commit message.
> +test_expect_success 'emit multi values from configset with NULL entry' '
> + test_when_finished "rm -f my.config" &&
> + cat >my.config <<-\EOF &&
> + [a]key=x
> + [a]key
> + [a]key=y
> + EOF
> + cat >expect <<-\EOF &&
> + x
> + (NULL)
> + y
> + EOF
> + test-tool config configset_get_value_multi a.key my.config >actual &&
> + test_cmp expect actual
> +'
So if this meant to exercise configset_get_value_multi(), maybe it would
be even clearer to just say so in the test name, e.g.
'configset_get_value_multi with NULL entry'.
Side comment: by the end of the series, *_get_value_multi() has no
legitimate callers outside of config.c and the test code, and if we
remove it from config.h, this scenario wouldn't ever bother us in actual
`git` usage, but we probably still want to test for it since we want to
exercise the internals.
^ permalink raw reply
* Re: [PATCH v3 5/9] config API: have *_multi() return an "int" and take a "dest"
From: Glen Choo @ 2023-01-19 0:50 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git
Cc: Junio C Hamano, Derrick Stolee, Jeff King, Taylor Blau,
SZEDER Gábor, Ævar Arnfjörð Bjarmason
In-Reply-To: <patch-v3-5.9-e0e6ade3f38-20221125T093159Z-avarab@gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Have the "git_configset_get_value_multi()" function and its siblings
> return an "int" and populate a "**dest" parameter like every other
> git_configset_get_*()" in the API.
Indeed, this is the only function that's inconsistent. Great to see that
it's being fixed :)
> As we'll see in in subsequent commits this fixes a blind spot in the
> API where it wasn't possible to tell whether a list was empty from
> whether a config key existed. We'll take advantage of that in
> subsequent commits, but for now we're faithfully converting existing
> API callers.
Sounds good.
> Most of this is straightforward, commentary on cases that stand out:
Thanks for this btw, I found it quite helpful for navigating the patch.
>
> - As we've tested for in a preceding commit we can rely on getting the
> config list in git_die_config(), and as we need to handle the new
> return value let's BUG() out if we can't acquire it.
This wasn't immediately clear to me; I'll explain more in the code.
> - In "builtin/for-each-ref.c" we could preserve the comment added in
I think you meant for-each-repo.
> @@ -45,14 +46,10 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
> if (!config_key)
> die(_("missing --config=<config>"));
>
> - values = repo_config_get_value_multi(the_repository,
> - config_key);
> -
> - /*
> - * Do nothing on an empty list, which is equivalent to the case
> - * where the config variable does not exist at all.
> - */
> - if (!values)
> + err = repo_config_get_value_multi(the_repository, config_key, &values);
> + if (err < 0)
> + return 0;
> + else if (err)
> return 0;
This conditional could be collapsed into "if (err)", but it's like this
because the next patch distinguishes between the two cases. Not really
worth the callout in commentary, but FYI for others who might be
wondering the same thing.
> diff --git a/config.c b/config.c
> index c058b2c70c3..0b07045ed8c 100644
> --- a/config.c
> +++ b/config.c
> @@ -2275,23 +2275,28 @@ void read_very_early_config(config_fn_t cb, void *data)
> config_with_options(cb, data, NULL, &opts);
> }
>
> -static struct config_set_element *configset_find_element(struct config_set *cs, const char *key)
> +static int configset_find_element(struct config_set *cs, const char *key,
> + struct config_set_element **dest)
> {
> struct config_set_element k;
> struct config_set_element *found_entry;
> char *normalized_key;
> + int ret;
> +
> /*
> * `key` may come from the user, so normalize it before using it
> * for querying entries from the hashmap.
> */
> - if (git_config_parse_key(key, &normalized_key, NULL))
> - return NULL;
> + ret = git_config_parse_key(key, &normalized_key, NULL);
> + if (ret < 0)
> + return ret;
>
> hashmap_entry_init(&k.ent, strhash(normalized_key));
> k.key = normalized_key;
> found_entry = hashmap_get_entry(&cs->config_hash, &k, ent, NULL);
> free(normalized_key);
> - return found_entry;
> + *dest = found_entry;
> + return 0;
> }
>
> static int configset_add_value(struct config_set *cs, const char *key, const char *value)
> @@ -2300,8 +2305,11 @@ static int configset_add_value(struct config_set *cs, const char *key, const cha
> struct string_list_item *si;
> struct configset_list_item *l_item;
> struct key_value_info *kv_info = xmalloc(sizeof(*kv_info));
> + int ret;
>
> - e = configset_find_element(cs, key);
> + ret = configset_find_element(cs, key, &e);
> + if (ret < 0)
> + return ret;
> /*
> * Since the keys are being fed by git_config*() callback mechanism, they
> * are already normalized. So simply add them without any further munging.
> @@ -2395,24 +2403,38 @@ int git_configset_add_file(struct config_set *cs, const char *filename)
> int git_configset_get_value(struct config_set *cs, const char *key, const char **value)
> {
> const struct string_list *values = NULL;
> + int ret;
> +
> /*
> * Follows "last one wins" semantic, i.e., if there are multiple matches for the
> * queried key in the files of the configset, the value returned will be the last
> * value in the value list for that key.
> */
> - values = git_configset_get_value_multi(cs, key);
> + ret = git_configset_get_value_multi(cs, key, &values);
>
> - if (!values)
> + if (ret < 0)
> + return ret;
> + else if (!values)
> return 1;
> assert(values->nr > 0);
> *value = values->items[values->nr - 1].string;
> return 0;
> }
>
> -const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)
> +int git_configset_get_value_multi(struct config_set *cs, const char *key,
> + const struct string_list **dest)
> {
> - struct config_set_element *e = configset_find_element(cs, key);
> - return e ? &e->value_list : NULL;
> + struct config_set_element *e;
> + int ret;
> +
> + ret = configset_find_element(cs, key, &e);
> + if (ret < 0)
> + return ret;
> + else if (!e)
> + return 1;
> + *dest = &e->value_list;
> +
> + return 0;
> }
The changes here and the call sites look quite straightforward.
> int git_config_get_string(const char *key, char **dest)
> @@ -2819,7 +2841,8 @@ void git_die_config(const char *key, const char *err, ...)
> error_fn(err, params);
> va_end(params);
> }
> - values = git_config_get_value_multi(key);
> + if (git_config_get_value_multi(key, &values))
> + BUG("for key '%s' we must have a value to report on", key);
> kv_info = values->items[values->nr - 1].util;
> git_die_config_linenr(key, kv_info->filename, kv_info->linenr);
> }
Here is the BUG() call that wasn't immediately clear to me. What wasn't
obvious from the commentary is that this was an 'unhandled error'
before (we didn't check if the returned value was NULL). Arguably we
should have had this BUG call before, but we didn't enforce this until
we added RESULT_MUST_BE_USED.
And this should be a BUG(), and not e.g. error(), since git_die_config()
is meant to report bad config values, so git_config_get_value_multi()
should never fail if we've already managed to get a value, looks good.
> diff --git a/config.h b/config.h
> index ef9eade6414..7f6ce6f2fb5 100644
> --- a/config.h
> +++ b/config.h
> @@ -459,10 +459,18 @@ int git_configset_add_parameters(struct config_set *cs);
> /**
> * Finds and returns the value list, sorted in order of increasing priority
> * for the configuration variable `key` and config set `cs`. When the
> - * configuration variable `key` is not found, returns NULL. The caller
> - * should not free or modify the returned pointer, as it is owned by the cache.
> + * configuration variable `key` is not found, returns 1 without touching
> + * `value`.
> + *
> + * The key will be parsed for validity with git_config_parse_key(), on
> + * error a negative value will be returned.
> + *
> + * The caller should not free or modify the returned pointer, as it is
> + * owned by the cache.
> */
> -const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key);
> +RESULT_MUST_BE_USED
> +int git_configset_get_value_multi(struct config_set *cs, const char *key,
> + const struct string_list **dest);
Updated comments look good too.
^ permalink raw reply
* Re: [PATCH v3 7/9] config API users: test for *_get_value_multi() segfaults
From: Glen Choo @ 2023-01-19 0:51 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git
Cc: Junio C Hamano, Derrick Stolee, Jeff King, Taylor Blau,
SZEDER Gábor, Ævar Arnfjörð Bjarmason
In-Reply-To: <patch-v3-7.9-f35aacef4ca-20221125T093159Z-avarab@gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> +test_expect_failure 'unregister with no value for maintenance.repo' '
> + cp .git/config .git/config.orig &&
> + test_when_finished mv .git/config.orig .git/config &&
> +
> + cat >>.git/config <<-\EOF &&
> + [maintenance]
> + repo
> + EOF
> + cat >expect <<-\EOF &&
> + error: missing value for '\''maintenance.repo'\''
> + EOF
> + git maintenance unregister &&
> + git maintenance unregister --force
> +'
> +
Mechanical error: This 'expect' was probably meant for the next patch.
^ permalink raw reply
* Re: [PATCH v3 8/9] config API: add "string" version of *_value_multi(), fix segfaults
From: Glen Choo @ 2023-01-19 1:03 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git
Cc: Junio C Hamano, Derrick Stolee, Jeff King, Taylor Blau,
SZEDER Gábor, Ævar Arnfjörð Bjarmason
In-Reply-To: <patch-v3-8.9-b45189b4624-20221125T093159Z-avarab@gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Fix numerous and mostly long-standing segfaults in consumers of
> the *_config_*value_multi() API. As discussed in the preceding commit
> an empty key in the config syntax yields a "NULL" string, which these
> users would give to strcmp() (or similar), resulting in segfaults.
>
> As this change shows, most users users of the *_config_*value_multi()
> API didn't really want such an an unsafe and low-level API, let's give
> them something with the safety of git_config_get_string() instead.
I think the low-level API argument makes sense. All of the other
*_get_*() functions perform some kind of validation, e.g.
config_parse_*() for non-string types and config_error_nonbool() for
strings. In effect, *_get_value_multi() was returning raw output from
the config parser without any concern for the caller.
> This fix is similar to what the *_string() functions and others
> acquired in[1] and [2]. Namely introducing and using a safer
> "*_get_string_multi()" variant of the low-level "_*value_multi()"
> function.
This suggests to me that we should really get rid of
*_get_value_multi(), since nobody outside of config.c should need it. I
don't think we'd ever end up in a situation where the caller wants the
raw strings from the config parser (unless we had a config
key which accepted values of different types? but that sounds like a
terrible mistake).
> There are now three remaining files using the low-level API:
>
> - Two cases in "builtin/submodule--helper.c", where it's used safely
> to see if any config exists.
> - One in "builtin/for-each-repo.c", which we'll convert in a
> subsequent commit.
> - The "t/helper/test-config.c" code added in [3].
As you noted, the only remaining non-test caller of the low-level API is
builtin/submodule--helper.c, which maybe we could safely convert anyway
and get rid of the API altogether. I'm okay with that being a leftover
bit, but maybe that's worth noting in the CL.
> The callback pattern being used here will make it easy to introduce
> e.g. a "multi" variant which coerces its values to "bool", "int",
> "path" etc.
I like that this is quite easily extensible, e.g.
> diff --git a/config.c b/config.c
> index 0b07045ed8c..9bd43189c02 100644
> --- a/config.c
> +++ b/config.c
> @@ -2437,6 +2437,25 @@ int git_configset_get_value_multi(struct config_set *cs, const char *key,
> return 0;
> }
>
> +static int check_multi_string(struct string_list_item *item, void *util)
> +{
> + return item->string ? 0 : config_error_nonbool(util);
> +}
> +
> +int git_configset_get_string_multi(struct config_set *cs, const char *key,
> + const struct string_list **dest)
> +{
> + int ret;
> +
> + if ((ret = git_configset_get_value_multi(cs, key, dest)))
> + return ret;
> + if ((ret = for_each_string_list((struct string_list *)*dest,
> + check_multi_string, (void *)key)))
> + return ret;
> +
> + return 0;
> +}
we could just use config_parse_<typename>() if we want to add, e.g.
*_get_bool_multi().
And as a reasonableness check, config_error_nonbool() is what we use to
validate the *_get_string() functions, so it makes sense to reuse it for
the string list version.
^ permalink raw reply
* Re: [PATCH v8 2/2] send-email: expose header information to git-send-email's sendemail-validate hook
From: Michael Strawbridge @ 2023-01-19 1:05 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Luben Tuikov, Junio C Hamano
In-Reply-To: <230119.868rhzwi36.gmgdl@evledraar.gmail.com>
On 2023-01-18 18:12, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Jan 18 2023, Michael Strawbridge wrote:
>
>> To allow further flexibility in the Git hook, the SMTP header
>> information of the email which git-send-email intends to send, is now
>> passed as the 2nd argument to the sendemail-validate hook.
>>
>> As an example, this can be useful for acting upon keywords in the
>> subject or specific email addresses.
>>
>> As a consequence of needing all the header data, validation has been
>> moved later in the sequence to right before sending the emails instead
>> of at the beginning.
> Ah, I see. I tested this (i.e. moving it back to the previous behavior)
> and you did this change because you don't have the $sender variable yet.
>
> I tried this quickly on top, which seems to work, i.e. now we do this in
> the same order as before, but we just move the $sender code earlier:
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index d123dfd33d5..7e7681116bb 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -787,6 +787,28 @@ sub is_format_patch_arg {
>
> @files = handle_backup_files(@files);
>
> +if (defined $sender) {
> + $sender =~ s/^\s+|\s+$//g;
> + ($sender) = expand_aliases($sender);
> +} else {
> + $sender = $repoauthor->() || $repocommitter->() || '';
> +}
> +
> +# $sender could be an already sanitized address
> +# (e.g. sendemail.from could be manually sanitized by user).
> +# But it's a no-op to run sanitize_address on an already sanitized address.
> +$sender = sanitize_address($sender);
> +
> +if ($validate) {
> + foreach my $f (@files) {
> + unless (-p $f) {
> + pre_process_file($f, 1);
> +
> + validate_patch($f, $target_xfer_encoding);
> + }
> + }
> +}
> +
> if (@files) {
> unless ($quiet) {
> print $_,"\n" for (@files);
> @@ -1035,18 +1057,6 @@ sub file_declares_8bit_cte {
> }
> }
>
> -if (defined $sender) {
> - $sender =~ s/^\s+|\s+$//g;
> - ($sender) = expand_aliases($sender);
> -} else {
> - $sender = $repoauthor->() || $repocommitter->() || '';
> -}
> -
> -# $sender could be an already sanitized address
> -# (e.g. sendemail.from could be manually sanitized by user).
> -# But it's a no-op to run sanitize_address on an already sanitized address.
> -$sender = sanitize_address($sender);
> -
> my $to_whom = __("To whom should the emails be sent (if anyone)?");
> my $prompting = 0;
> if (!@initial_to && !defined $to_cmd) {
> @@ -1120,16 +1130,6 @@ sub expand_one_alias {
>
> $time = time - scalar $#files;
>
> -if ($validate) {
> - foreach my $f (@files) {
> - unless (-p $f) {
> - pre_process_file($f, 1);
> -
> - validate_patch($f, $target_xfer_encoding);
> - }
> - }
> -}
> -
> $in_reply_to = $initial_in_reply_to;
> $references = $initial_in_reply_to || '';
> $message_num = 0;
>
> All tests pass with that, which is less good than it sounds, because
> shouldn't your tests be checking whether we have this non--quiet
> print-out of the files as expected before or after the validation hook
> runs?
Thank you very much! That idea to move sender earlier actually
simplifies this a lot. I had to move a piece relating to the date field
earlier as well but after doing that I think the patch gives the same
output.
I think checking the timing of stdout in tests is not easy. I'm a bit
unsure we would want to test such specific stdout behaviour in the
tests. I have focused on the functional output (eg header output).
Luckily with the above, my change will no longer introduce any stdout
differences anymore.
I will wait a little before pushing out a v9 in case there is more feedback.
>> +
>> + my ($recipients_ref, $to, $date, $gitversion, $cc, $ccline, $header) = gen_header();
>> +
>> + require File::Temp;
>> + my ($header_filehandle, $header_filename) = File::Temp::tempfile(
>> + ".gitsendemail.header.XXXXXX", DIR => $repo->repo_path());
>> + print $header_filehandle $header;
>> +
>> my @cmd = ("git", "hook", "run", "--ignore-missing",
>> $hook_name, "--");
>> - my @cmd_msg = (@cmd, "<patch>");
>> - my @cmd_run = (@cmd, $target);
>> + my @cmd_msg = (@cmd, "<patch>", "<header>");
>> + my @cmd_run = (@cmd, $target, $header_filename);
>> $hook_error = system_or_msg(\@cmd_run, undef, "@cmd_msg");
>> + unlink($header_filehandle);
>> chdir($cwd_save) or die("chdir: $!");
> I'm still curious about the "stdin" question I asked in the last round.
For the stdin question, are you referring to the git hook run question?
I know there have been a lot of parallel threads so you may have missed
my reply
(https://public-inbox.org/git/20230117215811.78313-1-michael.strawbridge@amd.com/).
Here's what I responded with:
I was trying to follow the convention that the original hook was using.
I'm not against changing this if the out of tree patches you speak of
are going to be rolled in soon. However, I'd prefer not to delay this
patch if these other patches are far off. Thanks.
^ permalink raw reply
* Re: [RFC/PATCH 0/6] hash-object: use fsck to check objects
From: Jeff King @ 2023-01-19 1:39 UTC (permalink / raw)
To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>
On Wed, Jan 18, 2023 at 03:35:06PM -0500, Jeff King wrote:
> The other option is having the fsck code avoid looking past the size it
> was given. I think the intent is that this should work, from commits
> like 4d0d89755e (Make sure fsck_commit_buffer() does not run out of the
> buffer, 2014-09-11). We do use skip_prefix() and parse_oid_hex(), which
> won't respect the size, but I think[1] that's OK because we'll have
> parsed up to the end-of-header beforehand (and those functions would
> never match past there).
>
> Which would mean that 9a1a3a4d4c (mktag: allow omitting the header/body
> \n separator, 2021-01-05) and acf9de4c94 (mktag: use fsck instead of
> custom verify_tag(), 2021-01-05) were buggy, and we can just fix them.
>
> [1] But I said "I think" above because it can get pretty subtle. There's
> some more discussion in this thread:
>
> https://lore.kernel.org/git/20150625155128.C3E9738005C@gemini.denx.de/
>
> but I haven't yet convinced myself it's safe. This is exactly the
> kind of analysis I wish I had the power to nerd-snipe René into.
I poked at this a bit more, and it definitely isn't safe. I think the
use of skip_prefix(), etc, is OK, because they'd always stop at an
unexpected newline. But verify_headers() is only confirming that we have
a series of complete lines, and we might end with no "\n\n" (and hence
no commit/tag message). And so the obvious case that fools us is one
where the data simply ends at a newline, but we are missing one or more
headers. So a truncated commit like:
tree 1234567890123456789012345678901234567890
(with the newline at the end of the "tree" line, but nothing else) will
cause fsck_commit() to look past the "size" we pass it. With all of the
current callers, that means it sees a NUL and bails. So it's not
currently a bug, but it becomes one if we can feed it arbitrary buffers.
Fixing it isn't _too_ bad, and could look something like this:
diff --git a/fsck.c b/fsck.c
index c2c8facd2d..3f0bb3e350 100644
--- a/fsck.c
+++ b/fsck.c
@@ -834,6 +834,7 @@ static int fsck_commit(const struct object_id *oid,
unsigned author_count;
int err;
const char *buffer_begin = buffer;
+ const char *buffer_end = buffer + size;
const char *p;
if (verify_headers(buffer, size, oid, OBJ_COMMIT, options))
@@ -847,7 +848,7 @@ static int fsck_commit(const struct object_id *oid,
return err;
}
buffer = p + 1;
- while (skip_prefix(buffer, "parent ", &buffer)) {
+ while (buffer < buffer_end && skip_prefix(buffer, "parent ", &buffer)) {
if (parse_oid_hex(buffer, &parent_oid, &p) || *p != '\n') {
err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
if (err)
@@ -856,7 +857,7 @@ static int fsck_commit(const struct object_id *oid,
buffer = p + 1;
}
author_count = 0;
- while (skip_prefix(buffer, "author ", &buffer)) {
+ while (buffer < buffer_end && skip_prefix(buffer, "author ", &buffer)) {
author_count++;
err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
if (err)
@@ -868,7 +869,7 @@ static int fsck_commit(const struct object_id *oid,
err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
if (err)
return err;
- if (!skip_prefix(buffer, "committer ", &buffer))
+ if (buffer >= buffer_end || !skip_prefix(buffer, "committer ", &buffer))
return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
if (err)
And then the tag side would need something similar. I'd probably also
sprinkle some comments in verify_headers() and its callers documenting
our assumptions and what's OK to do (string-like parsing functions work
as long as they stop when they hit a newline). That, plus a few tests
covering the problematic cases to avoid regressions, would probably be
OK.
I think fsck_tree() is mostly fine, as the tree-iterating code detects
truncation. Though I do find the use of strlen() in decode_tree_entry()
a little suspicious (and that would be true of the current code, as
well, since it powers hash-object's existing parsing check).
-Peff
^ permalink raw reply related
* [PATCH] hash-object: fix descriptor leak with --literally
From: Jeff King @ 2023-01-19 1:57 UTC (permalink / raw)
To: git
In hash_object(), we open a descriptor for each file to hash (whether we
got the filename from the command line or --stdin-paths), but never
close it. For the traditional code path which feeds the result to
index_fd(), this is OK; it closes the descriptor for us.
But 5ba9a93b39 (hash-object: add --literally option, 2014-09-11) a
second code path which does not close the descriptor. There we need to
do so ourselves.
You can see the problem in a clone of git.git like this:
$ git ls-files -s | grep ^100644 | cut -f2 |
git hash-object --stdin-paths --literally >/dev/null
fatal: could not open 'builtin/var.c' for reading: Too many open files
After this patch, it completes successfully. I didn't bother with a
test, as it's a pain to deal with descriptor limits portably, and the
fix is so trivial.
Signed-off-by: Jeff King <peff@peff.net>
---
Something I ran into while testing my hash-object fsck series, but I
broke it off here because it's really an independent bug-fix.
I do think the world would be less confusing if index_fd() didn't close
the descriptor we pass it, and then hash_file() could just do:
fd = open();
hash_fd(fd);
close(fd);
which is much more readable. But it has many other callers. So even if
we wanted to untangle all that, I think it makes sense to do this
obvious fix in the meantime.
builtin/hash-object.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index b506381502..44db83f07f 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -27,6 +27,7 @@ static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
else
ret = write_object_file_literally(buf.buf, buf.len, type, oid,
flags);
+ close(fd);
strbuf_release(&buf);
return ret;
}
--
2.39.1.616.gd06fca9e99
^ permalink raw reply related
* Re: [RFC/PATCH 0/6] hash-object: use fsck to check objects
From: Jeff King @ 2023-01-19 2:03 UTC (permalink / raw)
To: Taylor Blau
Cc: Junio C Hamano, git, René Scharfe,
Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hm4KKVwguGGrdn@nand.local>
On Wed, Jan 18, 2023 at 04:38:40PM -0500, Taylor Blau wrote:
> On Wed, Jan 18, 2023 at 12:59:24PM -0800, Junio C Hamano wrote:
> > The --literally option was invented initially primarily to allow a
> > bogus type of object (e.g. "hash-object -t xyzzy --literally") but I
> > am happy to see that we are finding different uses. I wonder if
> > these objects of known types but with syntactically bad contents can
> > be "repack"ed from loose into packed?
> >
> > > [5/6]: fsck: provide a function to fsck buffer without object struct
>
> It is indeed possible:
>
> --- >8 ---
> Initialized empty Git repository in /home/ttaylorr/src/git/t/trash directory.t9999-test/.git/
> expecting success of 9999.1 'repacking corrupt loose object into packed':
> name=$(echo $ZERO_OID | sed -e "s/00/Q/g") &&
> printf "100644 fooQ$name" | q_to_nul |
> git hash-object -w --stdin -t tree >in &&
>
> git pack-objects .git/objects/pack/pack <in
>
> Enumerating objects: 1, done.
> Counting objects: 100% (1/1), done.
> 06146c77fd19c096858d6459d602be0fdf10891b
> Writing objects: 100% (1/1), done.
> Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
> ok 1 - repacking corrupt loose object into packed
> --- 8< ---
Right, we don't do any fsck-ing when packing objects. Nor should we, I
think. We should be checking objects when they come into the repository
(via index-pack/unpack-objects) or when they're created (hash-object),
but there's little need to do so when they migrate between storage
formats.
The fact that "--literally" manually writes a loose object is mostly an
implementation detail. I think if we are not writing an object with an
esoteric type, that it could even just hit the regular index_fd() code
path (and drop the HASH_FORMAT_CHECK flag).
If you do write one with "-t xyzzy", I think pack-objects would barf,
but not because of fsck checks. It just couldn't represent that type
(which really makes such objects pretty pointless; you cannot ever fetch
or push them!).
-Peff
^ permalink raw reply
* Re: [PATCH 4/6] t: use hash-object --literally when created malformed objects
From: Jeff King @ 2023-01-19 2:06 UTC (permalink / raw)
To: Taylor Blau
Cc: git, René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hiWGOArYqzen58@nand.local>
On Wed, Jan 18, 2023 at 04:19:20PM -0500, Taylor Blau wrote:
> > This patch is worth looking at because it shows the kinds of things the
> > new hash-object from patch 6 will reject.
>
> Obviously we could avoid this patch entirely by making the new behavior
> of fscking the incoming objects hidden behind a `--fsck` flag or
> something. But I think the decision not to is a good one.
>
> We already have `--literally`, and it makes sense that passing that
> should let us write anything, and that not passing it should perform
> some validity checks. But I think exactly *what* those checks are is
> ambiguous enough that the absence of `--literally` implying fsck checks
> isn't out of the question.
>
> You address this in the last patch more thoroughly, but I figure that it
> is worth stating some of this here during review to indicate that I
> think the direction you pursued here is a good one.
Thanks for raising this, I think it's a good thing to consider. I didn't
even really think about making it a new option, since we already do
quality checks (and loosen them via --literally). This just seemed like
more of the same.
So yeah, if there were people who really wanted to distinguish between
the severity of the old checks and the new ones, we can add --fsck (or
even default to having it on, and disable it with --no-fsck to get the
old checks). But I just see little point in that.
One thing we _could_ support that my patch doesn't (I think; I didn't
test very deeply here) is respecting individual fsck.msgType config
variables. Again, I don't really see much point there. If you know you
are producing garbage, then just say --literally. The type-specific ones
are useful when you have to hold your nose and accept somebody else's
historical garbage, and you want to limit the damage as much as
possible.
-Peff
^ permalink raw reply
* Re: [PATCH 5/6] fsck: provide a function to fsck buffer without object struct
From: Jeff King @ 2023-01-19 2:07 UTC (permalink / raw)
To: Taylor Blau
Cc: git, René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hjiWUAIuiWfJoD@nand.local>
On Wed, Jan 18, 2023 at 04:24:25PM -0500, Taylor Blau wrote:
> On Wed, Jan 18, 2023 at 03:43:53PM -0500, Jeff King wrote:
> > However, the only external interface that fsck.c provides is
> > fsck_object(), which requires an object struct, then promptly discards
> > everything except its oid and type. Let's factor out the post-discard
> > part of that function as fsck_buffer(), leaving fsck_object() as a thin
> > wrapper around it. That will provide more flexibility for callers which
> > may not have a struct.
>
> It's really nice that the only thing we care about having an object
> struct around for is basically just knowing its type. IOW it seems to
> have made the refactoring here pretty straightforward, which is nice
> ;-).
Yeah, it was always in the back of my mind while doing other fsck
refactors. But I have to admit that I was surprised that we were so
close to the finish line. :)
-Peff
^ permalink raw reply
* Re: [PATCH 6/6] hash-object: use fsck for object checks
From: Jeff King @ 2023-01-19 2:31 UTC (permalink / raw)
To: Taylor Blau
Cc: git, René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hlyr0o6gs9omI5@nand.local>
On Wed, Jan 18, 2023 at 04:34:02PM -0500, Taylor Blau wrote:
> That being said, let me play devil's advocate for a second. Do the new
> fsck checks slow anything in hash-object down significantly? If so, then
> it's plausible to imagine a hash-object caller who (a) doesn't use
> `--literally`, but (b) does care about throughput if they're writing a
> large number of objects at once.
>
> I don't know if such a situation exists, or if these new fsck checks
> even slow hash-object down enough to care. But I didn't catch a
> discussion of this case in your series, so I figured I'd bring it up
> here just in case.
That's a really good point to bring up.
Prior to timing anything, here were my guesses:
- it won't make a big difference either way because the time is
dominated by computing sha1 anyway
- we might actually be a little faster for commits and tags in the new
code, because they aren't allocating structs for the pointed-to
objects (trees, parents, etc). Nor stuffing them into obj_hash, so
our total memory usage would be lower.
- trees may be a little slower, because we're doing a more analysis on
the filenames (sort order, various filesystem specific checks for
.git, etc)
And here's what I timed, using linux.git. First I pulled out the raw
object data like so:
mkdir -p commit tag tree
git cat-file --batch-all-objects --unordered --batch-check='%(objecttype) %(objectname)' |
perl -alne 'print $F[1] unless $F[0] eq "blob"' |
git cat-file --batch |
perl -ne '
/(\S+) (\S+) (\d+)/ or die "confusing: $_";
my $dir = "$2/" . substr($1, 0, 2);
my $fn = "$dir/" . substr($1, 2);
mkdir($dir);
open(my $fh, ">", $fn) or die "open($fn): $!";
read(STDIN, my $buf, $3) or die "read($3): $!";
print $fh $buf;
read(STDIN, $buf, 1); # trailing newline
'
And then I timed it like this:
find commit -type f | sort >input
hyperfine -L v old,new './git.{v} hash-object --stdin-paths -t commit <input'
which yielded:
Benchmark 1: ./git.old hash-object --stdin-paths -t commit <input
Time (mean ± σ): 7.264 s ± 0.142 s [User: 4.129 s, System: 3.043 s]
Range (min … max): 7.098 s … 7.558 s 10 runs
Benchmark 2: ./git.new hash-object --stdin-paths -t commit <input
Time (mean ± σ): 6.832 s ± 0.087 s [User: 3.848 s, System: 2.901 s]
Range (min … max): 6.752 s … 7.059 s 10 runs
Summary
'./git.new hash-object --stdin-paths -t commit <input' ran
1.06 ± 0.02 times faster than './git.old hash-object --stdin-paths -t commit <input'
So the new code is indeed faster, though really most of the time is
spent reading the data and computing the hash anyway. For comparison,
using --literally drops it to ~6.3s.
And according to massif, peak heap drops from 241MB to 80k. Which is
pretty good.
Trees are definitely slower, though. I reduced the number to fit in my
budget of patience:
find tree -type f | sort | head -n 200000 >input
hyperfine -L v old,new './git.{v} hash-object --stdin-paths -t tree <input'
And got:
Benchmark 1: ./git.old hash-object --stdin-paths -t tree <input
Time (mean ± σ): 2.470 s ± 0.022 s [User: 1.902 s, System: 0.549 s]
Range (min … max): 2.442 s … 2.509 s 10 runs
Benchmark 2: ./git.new hash-object --stdin-paths -t tree <input
Time (mean ± σ): 3.244 s ± 0.026 s [User: 2.661 s, System: 0.567 s]
Range (min … max): 3.215 s … 3.295 s 10 runs
Summary
'./git.old hash-object --stdin-paths -t tree <input' ran
1.31 ± 0.02 times faster than './git.new hash-object --stdin-paths -t tree <input'
So we indeed got a bit slower (and --literally here is ~2.2s). It's
enough that it outweighs the benefits from the commits getting faster
(especially because there tend to be more trees than commits). But those
also get diluted by blobs (which have a lot of data to hash and free
fsck checks).
So in the end, I think nobody would really care that much. The absolute
numbers are pretty small, and this is already a fairly dumb way to get
objects into your repository. The usual way is via index-pack, and it
already uses the fsck code for its checks. But I do think it was a good
question to explore (plus it found a descriptor leak in hash-object,
which I sent a separate patch for).
-Peff
^ permalink raw reply
* [PATCH] rebase: mark --update-refs as requiring the merge backend
From: Elijah Newren via GitGitGadget @ 2023-01-19 5:36 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
--update-refs is built in terms of the sequencer, which requires the
merge backend. It was already marked as incompatible with the apply
backend in the git-rebase manual, but the code didn't check for this
incompatibility and warn the user. Check and warn now.
While at it, fix a typo in t3422...and fix some misleading wording (all
useful options other than --whitespace=fix have long since been
implemented in the merge backend).
Signed-off-by: Elijah Newren <newren@gmail.com>
---
rebase: mark --update-refs as requiring the merge backend
--update-refs is built in terms of the sequencer, which requires the
merge backend. It was already marked as incompatible with the apply
backend in the git-rebase manual, but the code didn't check for this
incompatibility and warn the user. Check and warn now.
While at it, fix a typo in t3422...and fix some misleading wording (all
useful options other than --whitespace=fix have long since been
implemented in the merge backend).
Signed-off-by: Elijah Newren newren@gmail.com
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1466%2Fnewren%2Frebase-update-refs-imply-merge-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1466/newren/rebase-update-refs-imply-merge-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1466
builtin/rebase.c | 3 +++
t/t3422-rebase-incompatible-options.sh | 15 ++++++++++-----
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/builtin/rebase.c b/builtin/rebase.c
index 1481c5b6a5b..accd62fce48 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -1514,6 +1514,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
}
}
+ if (options.update_refs)
+ imply_merge(&options, "--update-refs");
+
if (options.type == REBASE_UNSPECIFIED) {
if (!strcmp(options.default_backend, "merge"))
imply_merge(&options, "--merge");
diff --git a/t/t3422-rebase-incompatible-options.sh b/t/t3422-rebase-incompatible-options.sh
index 6dabb05a2ad..5a00618d265 100755
--- a/t/t3422-rebase-incompatible-options.sh
+++ b/t/t3422-rebase-incompatible-options.sh
@@ -25,11 +25,11 @@ test_expect_success 'setup' '
'
#
-# Rebase has lots of useful options like --whitepsace=fix, which are
-# actually all built in terms of flags to git-am. Since neither
-# --merge nor --interactive (nor any options that imply those two) use
-# git-am, using them together will result in flags like --whitespace=fix
-# being ignored. Make sure rebase warns the user and aborts instead.
+# Rebase has a useful option, --whitespace=fix, which is actually
+# built in terms of flags to git-am. Since neither --merge nor
+# --interactive (nor any options that imply those two) use git-am,
+# using them together will result in --whitespace=fix being ignored.
+# Make sure rebase warns the user and aborts instead.
#
test_rebase_am_only () {
@@ -60,6 +60,11 @@ test_rebase_am_only () {
test_must_fail git rebase $opt --exec 'true' A
"
+ test_expect_success "$opt incompatible with --update-refs" "
+ git checkout B^0 &&
+ test_must_fail git rebase $opt --update-refs A
+ "
+
}
test_rebase_am_only --whitespace=fix
base-commit: 2b4f5a4e4bb102ac8d967cea653ed753b608193c
--
gitgitgadget
^ permalink raw reply related
* [PATCH v3] checkout/switch: disallow checking out same branch in multiple worktrees
From: Carlo Marcelo Arenas Belón @ 2023-01-19 5:53 UTC (permalink / raw)
To: git
Cc: pclouds, gitster, Carlo Marcelo Arenas Belón, Jinwook Jeong,
Rubén Justo, Eric Sunshine
In-Reply-To: <20230118061527.76218-1-carenas@gmail.com>
Commands `git switch -C` and `git checkout -B` neglect to check whether
the provided branch is already checked out in some other worktree, as
shown by the following:
$ git worktree list
.../foo beefb00f [main]
$ git worktree add ../other
Preparing worktree (new branch 'other')
HEAD is now at beefb00f first
$ cd ../other
$ git switch -C main
Switched to and reset branch 'main'
$ git worktree list
.../foo beefb00f [main]
.../other beefb00f [main]
Fix this problem by teaching `git switch -C` and `git checkout -B` to
check whether the branch in question is already checked out elsewhere
by expanding on the existing checks that are being used by their non
force variants.
Unlike what it is done for `git switch` and `git checkout`, that have
an historical exception to ignore other workspaces if the branch to
check is the current one (as required when called as part of other
tools), the logic implemented is more strict and will require the user
to invoke the command with `--ignore-other-worktrees` to explicitly
indicate they want the risky behaviour.
This matches the current behaviour of `git branch -f` and is safer, for
more details see the tests in t2400.
Reported-by: Jinwook Jeong <vustthat@gmail.com>
Helped-by: Rubén Justo <rjusto@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
Changes since v2
* A leak free implementation
* More details in commit as suggested by Junio
Changes since v1
* A much better commit message
* Changes to the tests as suggested by Eric
* Changes to the logic as suggested by Rubén
builtin/checkout.c | 31 +++++++++++++++++++++++--------
t/t2400-worktree-add.sh | 18 ++++++++++++++++--
2 files changed, 39 insertions(+), 10 deletions(-)
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5963e1b74b..467e194701 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -1476,7 +1476,8 @@ static void die_if_some_operation_in_progress(void)
}
static int checkout_branch(struct checkout_opts *opts,
- struct branch_info *new_branch_info)
+ struct branch_info *new_branch_info,
+ char *check_branch_path)
{
if (opts->pathspec.nr)
die(_("paths cannot be used with switching branches"));
@@ -1535,15 +1536,15 @@ static int checkout_branch(struct checkout_opts *opts,
if (!opts->can_switch_when_in_progress)
die_if_some_operation_in_progress();
- if (new_branch_info->path && !opts->force_detach && !opts->new_branch &&
- !opts->ignore_other_worktrees) {
+ if (check_branch_path && !opts->force_detach && !opts->ignore_other_worktrees) {
int flag;
char *head_ref = resolve_refdup("HEAD", 0, NULL, &flag);
- if (head_ref &&
- (!(flag & REF_ISSYMREF) || strcmp(head_ref, new_branch_info->path)))
- die_if_checked_out(new_branch_info->path, 1);
+ if (opts->new_branch_force || (head_ref &&
+ (!(flag & REF_ISSYMREF) || strcmp(head_ref, check_branch_path))))
+ die_if_checked_out(check_branch_path, 1);
free(head_ref);
}
+ free(check_branch_path);
if (!new_branch_info->commit && opts->new_branch) {
struct object_id rev;
@@ -1630,6 +1631,7 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
struct branch_info *new_branch_info)
{
int parseopt_flags = 0;
+ char *check_branch_path = NULL;
opts->overwrite_ignore = 1;
opts->prefix = prefix;
@@ -1741,6 +1743,7 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
!opts->new_branch;
int n = parse_branchname_arg(argc, argv, dwim_ok,
new_branch_info, opts, &rev);
+ check_branch_path = xstrdup_or_null(new_branch_info->path);
argv += n;
argc -= n;
} else if (!opts->accept_ref && opts->from_treeish) {
@@ -1753,8 +1756,18 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
opts, &rev,
opts->from_treeish);
+ check_branch_path = xstrdup_or_null(new_branch_info->path);
if (!opts->source_tree)
die(_("reference is not a tree: %s"), opts->from_treeish);
+ } else if (opts->new_branch && !opts->ignore_other_worktrees) {
+ struct object_id rev;
+
+ if (!get_oid_mb(opts->new_branch, &rev)) {
+ struct strbuf buf = STRBUF_INIT;
+ strbuf_branchname(&buf, opts->new_branch, INTERPRET_BRANCH_LOCAL);
+ strbuf_splice(&buf, 0, 0, "refs/heads/", 11);
+ check_branch_path = strbuf_detach(&buf, NULL);
+ }
}
if (argc) {
@@ -1818,10 +1831,12 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
strbuf_release(&buf);
}
- if (opts->patch_mode || opts->pathspec.nr)
+ if (opts->patch_mode || opts->pathspec.nr) {
+ free(check_branch_path);
return checkout_paths(opts, new_branch_info);
+ }
else
- return checkout_branch(opts, new_branch_info);
+ return checkout_branch(opts, new_branch_info, check_branch_path);
}
int cmd_checkout(int argc, const char **argv, const char *prefix)
diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh
index d587e0b20d..a66f9bb30c 100755
--- a/t/t2400-worktree-add.sh
+++ b/t/t2400-worktree-add.sh
@@ -121,7 +121,10 @@ test_expect_success '"add" worktree creating new branch' '
test_expect_success 'die the same branch is already checked out' '
(
cd here &&
- test_must_fail git checkout newmain
+ test_must_fail git checkout newmain &&
+ test_must_fail git checkout -B newmain &&
+ test_must_fail git switch newmain &&
+ test_must_fail git switch -C newmain
)
'
@@ -143,7 +146,18 @@ test_expect_success 'not die the same branch is already checked out' '
test_expect_success 'not die on re-checking out current branch' '
(
cd there &&
- git checkout newmain
+ git checkout newmain &&
+ git switch newmain
+ )
+'
+
+test_expect_success 'but die if using force without --ignore-other-worktrees' '
+ (
+ cd there &&
+ test_must_fail git checkout -B newmain &&
+ test_must_fail git switch -C newmain &&
+ git checkout --ignore-other-worktrees -B newmain &&
+ git switch --ignore-other-worktrees -C newmain
)
'
--
2.37.1 (Apple Git-137.1)
^ permalink raw reply related
* Re: [PATCH] hash-object: fix descriptor leak with --literally
From: Junio C Hamano @ 2023-01-19 6:26 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <Y8ijpJqtkDTi792i@coredump.intra.peff.net>
Jeff King <peff@peff.net> writes:
> In hash_object(), we open a descriptor for each file to hash (whether we
> got the filename from the command line or --stdin-paths), but never
> close it. For the traditional code path which feeds the result to
> index_fd(), this is OK; it closes the descriptor for us.
>
> But 5ba9a93b39 (hash-object: add --literally option, 2014-09-11) a
> second code path which does not close the descriptor.
A sentence without verb? "5ba9 (hash-...) added a second code path,
which does not close the descriptor." or something?
> After this patch, it completes successfully. I didn't bother with a
> test, as it's a pain to deal with descriptor limits portably, and the
> fix is so trivial.
True. Will queue. Thanks.
> I do think the world would be less confusing if index_fd() didn't close
> the descriptor we pass it, and then hash_file() could just do:
>
> fd = open();
> hash_fd(fd);
> close(fd);
>
> which is much more readable. But it has many other callers. So even if
> we wanted to untangle all that, I think it makes sense to do this
> obvious fix in the meantime.
Indeed, thanks.
> builtin/hash-object.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> index b506381502..44db83f07f 100644
> --- a/builtin/hash-object.c
> +++ b/builtin/hash-object.c
> @@ -27,6 +27,7 @@ static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
> else
> ret = write_object_file_literally(buf.buf, buf.len, type, oid,
> flags);
> + close(fd);
> strbuf_release(&buf);
> return ret;
> }
^ permalink raw reply
* Re* [PATCH v3] checkout/switch: disallow checking out same branch in multiple worktrees
From: Junio C Hamano @ 2023-01-19 7:23 UTC (permalink / raw)
To: Carlo Marcelo Arenas Belón
Cc: git, pclouds, Jinwook Jeong, Rubén Justo, Eric Sunshine
In-Reply-To: <20230119055325.1013-1-carenas@gmail.com>
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes:
> Changes since v2
> * A leak free implementation
> * More details in commit as suggested by Junio
I meant to say we may need more details in the documentation, but
after reading the existing documentation, we say that
- "-B <name>" is equivalent to run "branch -f <name>", which is
sufficient to hint that it will fail to recreate and check out an
existing branch that is checked out elsewhere, because "branch
-f" would fail in such a situation.
and that
- "--ignore-other-worktrees" defeats the safety that makes "git
checkout refuses when the wanted ref is already checked out".
so the existing documentation of "git checkout" may already be OK.
As long as it is well known that "git branch -f" still fails in the
situation, that is. After re-reading "git branch --help" twice,
however, I am not sure if it is so clear, though.
How about adding something like this, as an independent
documentation improvement?
----- >8 --------- >8 --------- >8 --------- >8 -----
Subject: [PATCH] branch: document `-f` and linked worktree behaviour
"git branch -f name start" forces to recreate the named branch, but
the forcing does not defeat the "do not touch a branch that is
checked out elsewhere" safety valve.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Documentation/git-branch.txt | 4 ++++
1 file changed, 4 insertions(+)
diff --git c/Documentation/git-branch.txt w/Documentation/git-branch.txt
index aa2f78c4c2..b12e7940d3 100644
--- c/Documentation/git-branch.txt
+++ w/Documentation/git-branch.txt
@@ -123,6 +123,10 @@ OPTIONS
points to a valid commit. In combination with
`-m` (or `--move`), allow renaming the branch even if the new
branch name already exists, the same applies for `-c` (or `--copy`).
++
+Note that 'git branch -f <branchname> [<start-point>]' refuses to change
+an existing branch `<branchname>` that is checked out in another worktree
+linked to the same repository.
-m::
--move::
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox