Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH] t4129: prevent loss of exit code due to the use of pipes
From: Junio C Hamano @ 2024-01-10 17:25 UTC (permalink / raw)
  To: Chandra Pratap via GitGitGadget; +Cc: git, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1636.git.1704891257544.gitgitgadget@gmail.com>

"Chandra Pratap via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  t/t4129-apply-samemode.sh | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

Looks good.  Will queue with two instances of a minor style fix (see
below).

> diff --git a/t/t4129-apply-samemode.sh b/t/t4129-apply-samemode.sh
> index e7a7295f1b6..ffabeafa213 100755
> --- a/t/t4129-apply-samemode.sh
> +++ b/t/t4129-apply-samemode.sh
> @@ -41,7 +41,8 @@ test_expect_success FILEMODE 'same mode (index only)' '
>  	chmod +x file &&
>  	git add file &&
>  	git apply --cached patch-0.txt &&
> -	git ls-files -s file | grep "^100755"
> +	git ls-files -s file > ls-files-output &&

Redirection operator ">" and "<" sticks to the file in question
(look for "> " and "< " in this file and you'd find none), i.e.

	git ls-files -s file >ls-files-output &&

Thanks.

^ permalink raw reply

* Re: [PATCH 1/4] reftable/stack: refactor stack reloading to have common exit path
From: Junio C Hamano @ 2024-01-10 17:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <01ece2626dd4cb494829e146d99c172fa8428478.1704714575.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> The `reftable_stack_reload_maybe_reuse()` function is responsible for
> reloading the reftable list from disk. The function is quite hard to
> follow though because it has a bunch of different exit paths, many of
> which have to free the same set of resources.
>
> Refactor the function to have a common exit path. While at it, touch up
> the style of this function a bit to match our usual coding style better.
> ---
>  reftable/stack.c | 86 +++++++++++++++++++++++-------------------------
>  1 file changed, 42 insertions(+), 44 deletions(-)

Missing sign-off.

Other than that, I did not find anything questionable in the
conversion.  By sticking to the two simple invariants:

 - we use "err" as our return value when we jump to "out:"

 - we always keep "names" and "names_after" freeable, and free them
   when we jump to "out:".

the exit status and leak prevention are both very clear (and the
behaviour is not changed---it is not like there are any existing
leaks that are plugged by this restructuring of the loop).


^ permalink raw reply

* Re: [PATCH] index-pack: spawn threads atomically
From: Taylor Blau @ 2024-01-10 17:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20240110114456.GF16674@coredump.intra.peff.net>

On Wed, Jan 10, 2024 at 06:44:56AM -0500, Jeff King wrote:
> On Fri, Jan 05, 2024 at 11:33:23AM -0500, Taylor Blau wrote:
>
> > -	test_must_fail git index-pack --fix-thin --stdin <recoverable.pack
> > +	test_must_fail git index-pack --threads=1 --fix-thin --stdin <recoverable.pack
> > [...]
> > For what it's worth, I'm fine with either approach, mostly to avoid
> > tying up more of the list's time discussing the options. But I have a
> > vague preference towards `--threads=1` since it doesn't require us to
> > touch production code.
>
> That's quite tempting, actually. The flip side, though, is that the test
> no longer reflects the production code as well. That is, in the real
> world we'd still call exit() from a thread. That obviously works OK now
> (modulo LSan), but if we ever had a regression where that left us in an
> inconsistent state, we'd be less likely to notice it. Feels kind of
> unlikely in practice, though.
>
> I dunno. I guess the real least-bad thing is seeing if LSan can be
> fixed to handle this atomically. I haven't even reported it there.

In the meantime, I think that the `--threads=1` approach feels less
invasive. I tend to agree that neither option is ideal, but that
`--threads=1` is probably the least bad, and that failing to catch a
regression there feels rather unlikely.

> If do go with "--threads=1", I suspect several tests in that file need
> it.

Yeah, there are a couple of others. I think the ones that need modifying
are at the intersection of "expected to fail" and "in a test which is
expected to pass leak-free":

    $ grep -l 'TEST_PASSES_SANITIZE_LEAK=true' t????-*.sh |
      xargs grep -l 'test_must_fail git index-pack'
    t5302-pack-index.sh
    t5308-pack-detect-duplicates.sh
    t5309-pack-delta-cycles.sh
    t5313-pack-bounds-checks.sh
    t5325-reverse-index.sh

I'll send a series shortly to tweak those test scripts to avoid this
issue if you want to notify the LSan folks of this issue more generally.

> -Peff

Thanks,
Taylor

^ permalink raw reply

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
From: Mohit Marathe @ 2024-01-10 17:38 UTC (permalink / raw)
  To: Christian Couder
  Cc: git@vger.kernel.org, gitster@pobox.com, britton.kerin@gmail.com,
	peff@peff.net
In-Reply-To: <CAP8UFD1d7FSa=mUzzUA5e3eSEcCVfaymxWewo5GjdDBi4GyE-g@mail.gmail.com>

>In https://public-inbox.org/git/xmqqjzpjsbjl.fsf@gitster.g/ Junio says:
>
>"Some places use atoi() immediately followed by strspn() to skip over
>digits, which means they are parsing an integer and want to continue
>reading after the integer, which is incompatible with what
>strtol_i() wants to do.  They need either a separate helper or an
>updated strtol_i() that optionally allows you to parse the prefix
>and report where the integer ended, e.g., something like:"
>
>and then he suggests the above helper.
>
>So it seems that the two instances you found look like good places
>where Junio says the new helper could be useful.
>
>Now if you want to continue further on this, I think you would need to
>take a closer look at those two instances to see if replacing atoi()
>there with the new helper would improve something there or not. If you
>find it would improve something, be sure to explain what would be
>improved in the commit message.

I took a closer look at `builtin/patch-id.c` and it seems replacing 
`atoi()` (which is used to parse numbers in the hunk header) wouldn't
improve anything, unless I'm missing something.

So then I tried finding other places where `atoi()` can be replaced
but I find it difficult to find any reason that would justify the change.
So far I've only looked at few of the MANY occurrences of `atoi()`.
As far as I understand, the only advantage of `strtol_i()`
over `atoi()` is better error handling. And most of places I've seen either
already takes care of that or does not need that at all.

Thanks!

^ permalink raw reply

* Re: [PATCH 3/3] advice: allow disabling the automatic hint in advise_if_enabled()
From: Dragan Simic @ 2024-01-10 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Rubén Justo, Jeff King, Git List
In-Reply-To: <xmqqil41duov.fsf@gitster.g>

On 2024-01-10 17:22, Junio C Hamano wrote:
> Dragan Simic <dsimic@manjaro.org> writes:
> 
>> No worries.  Regarding disabling the advices for disabling the advice
>> messages, maybe it would be better to have only one configuration knob
>> for that purpose, e.g. "core.verboseAdvice", as a boolean knob.
> 
> I am not sure if you understood Peff's example that illustrates why
> it is a bad thing, as ...
> 
>> That
>> way, fishing for the right knob for some advice message wouldn't turn
>> itself into an issue,
> 
> ... this is exactly what a single core.verboseAdvice knob that
> squelches the "how to disable this particular advice message" part
> from the message is problematic.  Before you see and familialize
> yourself with all advice messages, you may see one piece of advice X
> and find it useful to keep, feeling no need to turn it off.  If you
> use that single knob to squelch the part to tell you how to turn
> advice X off.  But what happens when you hit another unrelated
> advice Y then?  Because your core.verboseAdvice is a single big red
> button, the message does not say which advice.* variable to tweak
> for this particular advice message Y.

Makes sense, but please allow me to explain how I envisioned the whole 
thing
with the single, big core.verboseAdvice configuration knob:

1) You use git and find some advice useful, so you decide to keep it 
displayed.
    However, the additional advice about turning the advice off becomes 
annoying
    a bit, or better said, it becomes redundant because the main advice 
stays.

2) As a result, you simply set core.verboseAdvice to false and voila, 
the
    redundant additional advice disappears.  Seems perfect!  Of course, 
it
    isn't perfect, as the next point will clearly show.

3) You keep using git, and some advice becomes no longer needed, maybe 
even
    one of the advices that you previously used to find useful, or you 
find
    some newly added advice a bit annoying and, as a result, not needed.  
But,
    what do you do, where's that helpful additional advice?

4) As a careful git user that remembers important things, you go back to 
your
    git configuration file and set core.verboseAdvice to true, and the 
additional
    advices are back, telling you how to disable the unwanted advice.

5) After you disable the unwanted advice, you set core.verboseAdvice 
back to
    false and keep it that way until the next redundant advice pops up.

However, I do see that this approach has its downsides, for example the 
need
for the unwanted advice to be displayed again together with the 
additional advice,
by executing the appropriate git commands, after the above-described 
point #4.

Let's see what it would look like with the granular, per-advice on/off 
knobs:

1) You use git and find some advice useful, so you decide to keep it 
displayed.
    However, the additional advice about turning the advice off becomes 
annoying
    a bit, or better said, it becomes redundant because the main advice 
stays.

2) As a result, you follow the additional advice and set the specific 
knob to
    false, and voila, the redundant additional advice disappears.  Of 
course,
    it once again isn't perfect, as the next point will clearly show.

3) You keep using git, and one of the advices that you previously used 
to find
    useful becomes no longer needed.  But, what do you do, where's that 
helpful
    additional advice telling you how to turn the advice off?

4) Now you need to dig though the manual pages, or to re-enable the 
additional
    advices in your git configuration file, perhaps all of them at once, 
while
    keeping a backup of your original settings, to restore it later.  
Then, you
    again need to wait until the original advice gets displayed.

5) The additional advice is finally back, after some time passes or 
after
    specifically reproducing the exact git commands, telling you how to 
disable
    the unwanted advice.  Of course, you follow the advice and set the 
right
    knob in your git configuration file.

5) After you disable the unwanted advice, you restore the git 
configuration
    backup that you made earlier (you did that, right?), taking care not 
to
    override the newly made changes that disabled the unwanted advice.

Quite frankly, the second approach, although more granular, seems much 
more
complicated and more error-prone to me.

Of course, please let me know if I'm missing something obvious.

^ permalink raw reply

* Re: [PATCH] branch: error description when deleting a not fully merged branch
From: Junio C Hamano @ 2024-01-10 17:48 UTC (permalink / raw)
  To: Rubén Justo; +Cc: Git List
In-Reply-To: <04c3556f-0242-4ac3-b94a-be824cd2004a@gmail.com>

Rubén Justo <rjusto@gmail.com> writes:

> The error message we show when the user tries to delete a not fully
> merged branch describes the error and gives a hint to the user:
>
> 	error: the branch 'foo' is not fully merged.
> 	If you are sure you want to delete it, run 'git branch -D foo'.
>
> Let's move the hint part so that it takes advantage of the advice
> machinery:
>
> 	error: the branch 'foo' is not fully merged
> 	hint: If you are sure you want to delete it, run 'git branch -D foo'
> 	hint: Disable this message with "git config advice.forceDeleteBranch false"

This is probably one sensible step forward, so let's queue it as-is.

But with reservations for longer-term future direction.  Stepping
back a bit, when 'foo' is not fully merged and the user used "branch
-d" on it, is it sensible for us to suggest use of "branch -D"?

Especially now this is a "hint" to help less experienced folks, it
may be helpful to suggest how the user can answer "If you are sure
you want to delete" part.  As this knows what unique commits on the
branch being deleted are about to be lost, one way to do so may be
to tell the user about them ("you are about to lose 'branch: error
description when deleting a not fully merged branch' and other 47
commits that are not merged the target branch 'main'", for example).

Another possibility is to suggest merging the branch into the
target, instead of suggesting a destructive "deletion", but I
suspect that it goes too far second-guessing the end-user intention.

Thanks.

> Signed-off-by: Rubén Justo <rjusto@gmail.com>
> ---
>
> This change is a pending NEEDSWORK from a recent series about adjusting
> the error messages in branch.c
>
> Unfortunately the full message now becomes a three line message.
>
> Hopefully we can find a way in the near future to keep it at two.
>
>  Documentation/config/advice.txt | 3 +++
>  advice.c                        | 1 +
>  advice.h                        | 3 ++-
>  builtin/branch.c                | 9 ++++++---
>  4 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt
> index 4d7e5d8759..5814d659b9 100644
> --- a/Documentation/config/advice.txt
> +++ b/Documentation/config/advice.txt
> @@ -142,4 +142,7 @@ advice.*::
>  		Advice shown when a user tries to create a worktree from an
>  		invalid reference, to instruct how to create a new unborn
>  		branch instead.
> +	forceDeleteBranch::
> +		Advice shown when a user tries to delete a not fully merged
> +		branch without the force option set.
>  --
> diff --git a/advice.c b/advice.c
> index 50c79443ba..e91f5d7ab8 100644
> --- a/advice.c
> +++ b/advice.c
> @@ -79,6 +79,7 @@ static struct {
>  	[ADVICE_UPDATE_SPARSE_PATH]			= { "updateSparsePath", 1 },
>  	[ADVICE_WAITING_FOR_EDITOR]			= { "waitingForEditor", 1 },
>  	[ADVICE_WORKTREE_ADD_ORPHAN]			= { "worktreeAddOrphan", 1 },
> +	[ADVICE_FORCE_DELETE_BRANCH]			= { "forceDeleteBranch", 1 },
>  };
>  
>  static const char turn_off_instructions[] =
> diff --git a/advice.h b/advice.h
> index 2affbe1426..5bef900142 100644
> --- a/advice.h
> +++ b/advice.h
> @@ -10,7 +10,7 @@ struct string_list;
>   * Add the new config variable to Documentation/config/advice.txt.
>   * Call advise_if_enabled to print your advice.
>   */
> - enum advice_type {
> +enum advice_type {
>  	ADVICE_ADD_EMBEDDED_REPO,
>  	ADVICE_ADD_EMPTY_PATHSPEC,
>  	ADVICE_ADD_IGNORED_FILE,
> @@ -50,6 +50,7 @@ struct string_list;
>  	ADVICE_WAITING_FOR_EDITOR,
>  	ADVICE_SKIPPED_CHERRY_PICKS,
>  	ADVICE_WORKTREE_ADD_ORPHAN,
> +	ADVICE_FORCE_DELETE_BRANCH,
>  };
>  
>  int git_default_advice_config(const char *var, const char *value);
> diff --git a/builtin/branch.c b/builtin/branch.c
> index 0a32d1b6c8..2240433bc8 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -24,6 +24,7 @@
>  #include "ref-filter.h"
>  #include "worktree.h"
>  #include "help.h"
> +#include "advice.h"
>  #include "commit-reach.h"
>  
>  static const char * const builtin_branch_usage[] = {
> @@ -190,9 +191,11 @@ static int check_branch_commit(const char *branchname, const char *refname,
>  		return -1;
>  	}
>  	if (!force && !branch_merged(kinds, branchname, rev, head_rev)) {
> -		error(_("the branch '%s' is not fully merged.\n"
> -		      "If you are sure you want to delete it, "
> -		      "run 'git branch -D %s'"), branchname, branchname);
> +		error(_("the branch '%s' is not fully merged"),
> +		      branchname);
> +		advise_if_enabled(ADVICE_FORCE_DELETE_BRANCH,
> +				  _("If you are sure you want to delete it, "
> +				  "run 'git branch -D %s'"), branchname);
>  		return -1;
>  	}
>  	return 0;

^ permalink raw reply

* [PATCH 1/5] t5309: run expected-to-fail `index-pack`s with `--threads=1`
From: Taylor Blau @ 2024-01-10 17:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano
In-Reply-To: <ZZ7VEVXSg1T8ZkIK@nand.local>

The t5309 script triggers a racy false positive with SANITIZE=leak on a
multi-core system. Running with "--stress --run=6" usually fails within
10 seconds or so for me, complaining with something like:

    + git index-pack --fix-thin --stdin
    fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?)

    =================================================================
    ==3904583==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 32 byte(s) in 1 object(s) allocated from:
        #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
        #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
        #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
        #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
        #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
        #6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
        #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

    SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).
    Aborted

What happens is this:

  0. We construct a bogus pack with a duplicate object in it and trigger
     index-pack.

  1. We spawn a bunch of worker threads to resolve deltas (on my system
     it is 16 threads).

  2. One of the threads sees the duplicate object and bails by calling
     exit(), taking down all of the threads. This is expected and is the
     point of the test.

  3. At the time exit() is called, we may still be spawning threads from
     the main process via pthread_create(). LSan hooks thread creation
     to update its book-keeping; it has to know where each thread's
     stack is (so it can find entry points for reachable memory). So it
     calls pthread_getattr_np() to get information about the new thread.
     That may allocate memory that must be freed with a matching call to
     pthread_attr_destroy(). Probably LSan does that immediately, but
     if you're unlucky enough, the exit() will happen while it's between
     those two calls, and the allocated pthread_attr_t appears as a
     leak.

This isn't a real leak. It's not even in our code, but rather in the
LSan instrumentation code. So we could just ignore it. But the false
positive can cause people to waste time tracking it down.

It's possibly something that LSan could protect against (e.g., cover the
getattr/destroy pair with a mutex, and then in the final post-exit()
check for leaks try to take the same mutex). But I don't know enough
about LSan to say if that's a reasonable approach or not (or if my
analysis is even completely correct).

One approach to papering over this issue (short of LSan fixing it
upstream) is to make the creation of work threads "atomic", i.e. by
spawning all of them before letting any of them start to work.  This
shouldn't make any practical difference for non-LSan runs. The thread
spawning is quick, and could happen before any worker thread gets
scheduled anyway.

But that requires us to tweak production code (albeit at a negligible
cost) in order to appease LSan in this narrow circumstance. Another
approach is to simply run these expected-to-fail `index-pack`
invocations with `--threads=1` so that we bypass the above issue
entirely.

The downside of that approach is that the test doesn't match our
production code as well as it did before (where we might have run those
same `index-pack` invocations with >1 thread, depending on how many CPUs
the testing machine has). The risk there is that we might miss a
regression that would leave us in an inconsistent state. But that feels
rather unlikely in practice, and there are many other tests related to
`index-pack` in the suite.

Original-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5309-pack-delta-cycles.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 4e910c5b9d..4100595c89 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -44,7 +44,7 @@ test_expect_success 'index-pack detects missing base objects' '
 		pack_obj $A $B
 	} >missing.pack &&
 	pack_trailer missing.pack &&
-	test_must_fail git index-pack --fix-thin --stdin <missing.pack
+	test_must_fail git index-pack --threads=1 --fix-thin --stdin <missing.pack
 '
 
 test_expect_success 'index-pack detects REF_DELTA cycles' '
@@ -55,13 +55,13 @@ test_expect_success 'index-pack detects REF_DELTA cycles' '
 		pack_obj $B $A
 	} >cycle.pack &&
 	pack_trailer cycle.pack &&
-	test_must_fail git index-pack --fix-thin --stdin <cycle.pack
+	test_must_fail git index-pack --threads=1 --fix-thin --stdin <cycle.pack
 '
 
 test_expect_success 'failover to an object in another pack' '
 	clear_packs &&
 	git index-pack --stdin <ab.pack &&
-	test_must_fail git index-pack --stdin --fix-thin <cycle.pack
+	test_must_fail git index-pack --threads=1 --stdin --fix-thin <cycle.pack
 '
 
 test_expect_success 'failover to a duplicate object in the same pack' '
@@ -73,7 +73,7 @@ test_expect_success 'failover to a duplicate object in the same pack' '
 		pack_obj $A
 	} >recoverable.pack &&
 	pack_trailer recoverable.pack &&
-	test_must_fail git index-pack --fix-thin --stdin <recoverable.pack
+	test_must_fail git index-pack --threads=1 --fix-thin --stdin <recoverable.pack
 '
 
 test_done
-- 
2.43.0.288.g906e6a084d


^ permalink raw reply related

* [PATCH 2/5] t5302: run expected-to-fail `index-pack`s with `--threads=1`
From: Taylor Blau @ 2024-01-10 17:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano
In-Reply-To: <ZZ7VEVXSg1T8ZkIK@nand.local>

For identical reasons as in the previous commit, apply the same
treatment to expected-to-fail `index-pack` invocations in t5302 with
`--threads=1`.

(Note that this treatment only needs to be applied in tests which are
expected to pass when built with the leak-checker enabled).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5302-pack-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index d88e6f1691..fdbcd80f89 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -290,7 +290,7 @@ test_expect_success 'too-large packs report the breach' '
 	pack=$(git pack-objects --all pack </dev/null) &&
 	sz="$(test_file_size pack-$pack.pack)" &&
 	test "$sz" -gt 20 &&
-	test_must_fail git index-pack --max-input-size=20 pack-$pack.pack 2>err &&
+	test_must_fail git index-pack --threads=1 --max-input-size=20 pack-$pack.pack 2>err &&
 	grep "maximum allowed size (20 bytes)" err
 '
 
-- 
2.43.0.288.g906e6a084d


^ permalink raw reply related

* [PATCH 3/5] t5308: run expected-to-fail `index-pack`s with `--threads=1`
From: Taylor Blau @ 2024-01-10 17:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano
In-Reply-To: <ZZ7VEVXSg1T8ZkIK@nand.local>

For identical reasons as in the previous commit, apply the same
treatment to expected-to-fail `index-pack` invocations in t5308 with
`--threads=1`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5308-pack-detect-duplicates.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh
index 655cafa054..5df552a42e 100755
--- a/t/t5308-pack-detect-duplicates.sh
+++ b/t/t5308-pack-detect-duplicates.sh
@@ -76,7 +76,7 @@ test_expect_success 'lookup in duplicated pack' '
 test_expect_success 'index-pack can reject packs with duplicates' '
 	clear_packs &&
 	create_pack dups.pack 2 &&
-	test_must_fail git index-pack --strict --stdin <dups.pack &&
+	test_must_fail git index-pack --threads=1 --strict --stdin <dups.pack &&
 	test_expect_code 1 git cat-file -e $LO_SHA1
 '
 
-- 
2.43.0.288.g906e6a084d


^ permalink raw reply related

* [PATCH 4/5] t5313: run expected-to-fail `index-pack`s with `--threads=1`
From: Taylor Blau @ 2024-01-10 17:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano
In-Reply-To: <ZZ7VEVXSg1T8ZkIK@nand.local>

For identical reasons as in the previous commit, apply the same
treatment to expected-to-fail `index-pack` invocations in t5313 with
`--threads=1`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5313-pack-bounds-checks.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t5313-pack-bounds-checks.sh b/t/t5313-pack-bounds-checks.sh
index ceaa6700a2..e9829b6ef3 100755
--- a/t/t5313-pack-bounds-checks.sh
+++ b/t/t5313-pack-bounds-checks.sh
@@ -91,7 +91,7 @@ test_expect_success 'pack/index object count mismatch' '
 
 	# ...and make sure that index-pack --verify, which has its
 	# own reading routines, does not segfault.
-	test_must_fail git index-pack --verify $pack
+	test_must_fail git index-pack --threads=1 --verify $pack
 '
 
 test_expect_success 'matched bogus object count' '
@@ -111,7 +111,7 @@ test_expect_success 'matched bogus object count' '
 	git cat-file blob $object >actual &&
 	test_cmp file actual &&
 
-	test_must_fail git index-pack --verify $pack
+	test_must_fail git index-pack --threads=1 --verify $pack
 '
 
 # Note that we cannot check the fallback case for these
@@ -128,7 +128,7 @@ test_expect_success 'bogus object offset (v1)' '
 	munge $idx $((4 * 256)) "\377\0\0\0" &&
 	clear_base &&
 	test_must_fail git cat-file blob $object &&
-	test_must_fail git index-pack --verify $pack
+	test_must_fail git index-pack --threads=1 --verify $pack
 '
 
 test_expect_success 'bogus object offset (v2, no msb)' '
@@ -136,7 +136,7 @@ test_expect_success 'bogus object offset (v2, no msb)' '
 	munge $idx $(ofs_table 1) "\0\377\0\0" &&
 	clear_base &&
 	test_must_fail git cat-file blob $object &&
-	test_must_fail git index-pack --verify $pack
+	test_must_fail git index-pack --threads=1 --verify $pack
 '
 
 test_expect_success 'bogus offset into v2 extended table' '
@@ -144,7 +144,7 @@ test_expect_success 'bogus offset into v2 extended table' '
 	munge $idx $(ofs_table 1) "\377\0\0\0" &&
 	clear_base &&
 	test_must_fail git cat-file blob $object &&
-	test_must_fail git index-pack --verify $pack
+	test_must_fail git index-pack --threads=1 --verify $pack
 '
 
 test_expect_success 'bogus offset inside v2 extended table' '
@@ -169,7 +169,7 @@ test_expect_success 'bogus offset inside v2 extended table' '
 	mv tmp "$idx" &&
 	clear_base &&
 	test_must_fail git cat-file blob $object &&
-	test_must_fail git index-pack --verify $pack
+	test_must_fail git index-pack --threads=1 --verify $pack
 '
 
 test_expect_success 'bogus OFS_DELTA in packfile' '
-- 
2.43.0.288.g906e6a084d


^ permalink raw reply related

* [PATCH 5/5] t5325: run expected-to-fail `index-pack`s with `--threads=1`
From: Taylor Blau @ 2024-01-10 17:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano
In-Reply-To: <ZZ7VEVXSg1T8ZkIK@nand.local>

For identical reasons as in the previous commit, apply the same
treatment to expected-to-fail `index-pack` invocations in t5325 with
`--threads=1`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5325-reverse-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index 431a603ca0..dc3d2235e8 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -64,7 +64,7 @@ test_expect_success 'index-pack can verify reverse indexes' '
 	chmod u+w $rev &&
 	printf "xxxx" | dd of=$rev bs=1 count=4 conv=notrunc &&
 
-	test_must_fail git index-pack --rev-index --verify \
+	test_must_fail git index-pack --threads=1 --rev-index --verify \
 		$packdir/pack-$pack.pack 2>err &&
 	grep "validation error" err
 '
-- 
2.43.0.288.g906e6a084d

^ permalink raw reply related

* RE: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
From: rsbecker @ 2024-01-10 17:55 UTC (permalink / raw)
  To: 'Mohit Marathe', 'Christian Couder'
  Cc: git, gitster, britton.kerin, peff
In-Reply-To: <F6ejgAfr2IMRNR3Tq0CDTHeT9xMWzJ9ley8M_fnSX97ayRNRp_CEgA62WdtOooi9bha1WJPGB53ptJYQFII2lCbIflwgNvbIaefw7nK8w7M=@proton.me>

On Wednesday, January 10, 2024 12:38 PM, Mohit Marathe wrote:
>>In https://public-inbox.org/git/xmqqjzpjsbjl.fsf@gitster.g/ Junio says:
>>
>>"Some places use atoi() immediately followed by strspn() to skip over
>>digits, which means they are parsing an integer and want to continue
>>reading after the integer, which is incompatible with what
>>strtol_i() wants to do.  They need either a separate helper or an
>>updated strtol_i() that optionally allows you to parse the prefix and
>>report where the integer ended, e.g., something like:"
>>
>>and then he suggests the above helper.
>>
>>So it seems that the two instances you found look like good places
>>where Junio says the new helper could be useful.
>>
>>Now if you want to continue further on this, I think you would need to
>>take a closer look at those two instances to see if replacing atoi()
>>there with the new helper would improve something there or not. If you
>>find it would improve something, be sure to explain what would be
>>improved in the commit message.
>
>I took a closer look at `builtin/patch-id.c` and it seems replacing `atoi()` (which is
>used to parse numbers in the hunk header) wouldn't improve anything, unless I'm
>missing something.
>
>So then I tried finding other places where `atoi()` can be replaced but I find it
>difficult to find any reason that would justify the change.
>So far I've only looked at few of the MANY occurrences of `atoi()`.
>As far as I understand, the only advantage of `strtol_i()` over `atoi()` is better error
>handling. And most of places I've seen either already takes care of that or does not
>need that at all.

I am not sure this is a good idea. The error detection inside strtol_i() reports a -1 if the supplied text value is invalid. This does not differentiate between an invalid value and a valid "-1" supplied. Replacing all instances of atoi() with strtol_i() will likely cause breakages as the error semantics are different between the two.
--Randall


^ permalink raw reply

* Re: [PATCH v3 1/2] t7501: add tests for --include and --only
From: Junio C Hamano @ 2024-01-10 18:37 UTC (permalink / raw)
  To: Ghanshyam Thakkar; +Cc: git, phillip.wood123, christian.couder
In-Reply-To: <20240110163622.51182-4-shyamthakkar001@gmail.com>

Ghanshyam Thakkar <shyamthakkar001@gmail.com> writes:

> -# FIXME: Test the various index usages, -i and -o, test reflog,
> +# fixme: test the various index usages, test reflog,
>  # signoff

Why the sudden downcasing?  If this were to turn it to "TODO:"
(110+) or "NEEDSWORK:" (110+) from less often used "FIXME:" (40-),
such a change might be defensible, but there is only one instance
of downcased "fixme:", so I am curious how this happened.

> +test_expect_success '--include fails with untracked files' '
> +	echo content >baz &&
> +	test_must_fail git commit --include -m "initial" baz
> +'

OK, this is because "--allow-empty" is not passed.  This should fail
without -i/-o (which should be the same as passing -o), with -i, or
with -o.

Calling this commit "initial" is highly misleading.  There are bunch
of commits already, but path "baz" has never been used.

> +test_expect_success '--include with staged changes' '
> +	echo newcontent >baz &&
> +	echo newcontent >file &&
> +	git add file &&
> +	git commit --include -m "file baz" baz  &&

I suspect that you found a bug whose behaviour we do not want to
carve into stone with this test.  When this test begins, because the
previous step failed to create the initial commit, we are creating
the root commit, without --allow-empty, with contents in the index
for path "file".  At this point

    $ git commit -m "file baz" baz

(or with "-o", which is the same thing) does error out with

    error: pathspec 'baz' did not match any file(s) known to git

because the "only" thing is to take the changes between HEAD and the
index and limit them further to those paths that match "baz", but
there is no path that matches "baz".

This command

    $ git commit -m "file baz" -i baz

should either error out the same way, or behave more or less[*] like

    $ git add baz && git commit -m "file baz"

and record changes to both "file" and "baz".

    Side note: The "more or less" is we should do "git rm baz"
               instead, if you removed the path.

But it seems that the current code simply ignores the unmatching
pathspec "baz" that is on the command line, hence recording only the
change to the contents of "file".

> +	git diff --name-only >remaining &&
> +	test_must_be_empty remaining &&
> +	git diff --name-only --staged >remaining &&
> +	test_must_be_empty remaining

These two tests to see if the working tree is clean and if the index
is clean are not wrong per-se, but is insufficient.  Judging from
the commit message you used, I think you expected this commit to
contain both changes to 'file' and 'baz'.  That needs to be also
checked with something like "git diff --name-only HEAD^ HEAD".

Now which behaviour between "error out because there is no path in
the index that matches pathspec 'baz'" and "add baz to the index and
commit that addition, together with what is already in the index" we
would want to take is probably open for discussion.  Such a discussion
may decide that the current behaviour is fine.  Until then...

> +test_expect_success '--only fails with untracked files' '
> +	echo content >newfile &&
> +	test_must_fail git commit --only -m "newfile" newfile
> +'

And this should fail the same way without "-o".  Don't we have such
a test that we can just sneak "with -o the same thing happens" test
next to it?

> +test_expect_success '--only excludes staged changes' '
> +	echo content >file &&
> +	echo content >baz &&
> +	git add baz &&
> +	git commit --only -m "file" file &&

This should behave exactly the same way without "-o".

> +	git diff --name-only >actual &&
> +	test_must_be_empty actual &&
> +	git diff --name-only --staged >actual &&
> +	test_grep "baz" actual
> +'
> +
> +test_expect_success '--include and --only do not mix' '
> +	test_when_finished "git reset --hard" &&
> +	echo new >file &&
> +	echo new >baz &&
> +	test_must_fail git commit --include --only -m "bar" bar baz

OK.  If you grep for 'cannot be used together' in t/ directory,
there are many tests that make sure how an invocation like this
should fail, i.e. with an error message that mentions incompatible
options.  Don't we want to do the same?

Thanks.

^ permalink raw reply

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
From: Junio C Hamano @ 2024-01-10 18:44 UTC (permalink / raw)
  To: Mohit Marathe
  Cc: Christian Couder, git@vger.kernel.org, britton.kerin@gmail.com,
	peff@peff.net
In-Reply-To: <F6ejgAfr2IMRNR3Tq0CDTHeT9xMWzJ9ley8M_fnSX97ayRNRp_CEgA62WdtOooi9bha1WJPGB53ptJYQFII2lCbIflwgNvbIaefw7nK8w7M=@proton.me>

Mohit Marathe <mohitmarathe@proton.me> writes:

> I took a closer look at `builtin/patch-id.c` and it seems replacing 
> `atoi()` (which is used to parse numbers in the hunk header) wouldn't
> improve anything, unless I'm missing something.

You can of course notice an invalid patch that places non-digit to
the hunk header and error out with such a change.  If we are reading
output from Git that we are invoking, hopefully we will not see such
an invalid patch, but the command is designed to read arbitrary
input like a patch e-mail you received over the network, so I do not
think it is fair to say there is no merit to such a change, even
though I agree that it may not matter all that much.

A corrupt patch may be getting a nonsense patch-ID with the current
code and hopefully is not matching other patches that are not
corrupt, but with such a change, a corrupt patch may not be getting
any patch-ID and a loop that computes patch-ID for many files and
try to match them up might need to be rewritten to take the new
failure case into account.

^ permalink raw reply

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
From: Junio C Hamano @ 2024-01-10 18:46 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Mohit Marathe', 'Christian Couder', git,
	britton.kerin, peff
In-Reply-To: <004601da43ee$2b6cd0c0$82467240$@nexbridge.com>

<rsbecker@nexbridge.com> writes:

> I am not sure this is a good idea. The error detection inside strtol_i() reports a -1 if the supplied text value is invalid. This does not differentiate between an invalid value and a valid "-1" supplied. Replacing all instances of atoi() with strtol_i() will likely cause breakages as the error semantics are different between the two.

atoi() and strtol_i() have totally different function signatures, so
it won't be a straight replacement.  The report of "-1" you mention
is only about strtol_i() reporting "have we successfully read out a
number [yes/no]?".  The value we parsed goes to a separate location,
so there is no risk of confusion as long as the caller knows what it
is doing.

Thanks.

^ permalink raw reply

* [PATCH 0/2] Generalize reference locking in tests
From: Justin Tobler via GitGitGadget @ 2024-01-10 18:52 UTC (permalink / raw)
  To: git; +Cc: Justin Tobler

There are two tests in t1401 and t5541 that rely on writing a reference lock
file directly. While this works for the files reference backend, reftable
reference locks operate differently and are incompatible with this approach.
To be reference backend agnostic, this patch series refactors the two tests
to use git-update-ref(1) to lock refs instead.

This approach is more verbose and may warrant consideration of implementing
an abstraction to further simplify this operation. There appears to be one
other existing test in t1400 that also follows this pattern. Being that
there is only a handful of affected tests the implemented approach may be
sufficient as is.

Justin Tobler (2):
  t1401: generalize reference locking
  t5541: generalize reference locking

 t/t1401-symbolic-ref.sh    | 28 ++++++++++++++++++++++++----
 t/t5541-http-push-smart.sh | 25 +++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 6 deletions(-)

base-commit: 624eb90fa8f65a79396615f3c2842ac5a3743350
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1634%2Fjltobler%2Fjt%2Fbackend-generic-ref-lock-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1634/jltobler/jt/backend-generic-ref-lock-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1634
-- 
gitgitgadget

^ permalink raw reply

* [PATCH 1/2] t1401: generalize reference locking
From: Justin Tobler via GitGitGadget @ 2024-01-10 18:52 UTC (permalink / raw)
  To: git; +Cc: Justin Tobler, Justin Tobler
In-Reply-To: <pull.1634.git.1704912750.gitgitgadget@gmail.com>

From: Justin Tobler <jltobler@gmail.com>

Some tests set up reference locks by directly creating the lockfile.
While this works for the files reference backend, reftable reference
locks operate differently and are incompatible with this approach.
Refactor the test to use git-update-ref(1) to lock refs instead so that
the test does not need to be aware of how the ref backend locks refs.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 t/t1401-symbolic-ref.sh | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/t/t1401-symbolic-ref.sh b/t/t1401-symbolic-ref.sh
index c7745e1bf69..bafe8db24df 100755
--- a/t/t1401-symbolic-ref.sh
+++ b/t/t1401-symbolic-ref.sh
@@ -105,10 +105,30 @@ test_expect_success LONG_REF 'we can parse long symbolic ref' '
 	test_cmp expect actual
 '
 
-test_expect_success 'symbolic-ref reports failure in exit code' '
-	test_when_finished "rm -f .git/HEAD.lock" &&
-	>.git/HEAD.lock &&
-	test_must_fail git symbolic-ref HEAD refs/heads/whatever
+test_expect_success PIPE 'symbolic-ref reports failure in exit code' '
+	mkfifo in out &&
+	(git update-ref --stdin <in >out &) &&
+
+	exec 9>in &&
+	exec 8<out &&
+	test_when_finished "exec 9>&-" &&
+	test_when_finished "exec 8<&-" &&
+
+	echo "start" >&9 &&
+	echo "start: ok" >expected &&
+	read line <&8 &&
+	echo "$line" >actual &&
+	test_cmp expected actual &&
+
+	echo "update HEAD refs/heads/foo" >&9 &&
+
+	echo "prepare" >&9 &&
+	echo "prepare: ok" >expected &&
+	read line <&8 &&
+	echo "$line" >actual &&
+	test_cmp expected actual &&
+
+	test_must_fail git symbolic-ref HEAD refs/heads/bar
 '
 
 test_expect_success 'symbolic-ref writes reflog entry' '
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 2/2] t5541: generalize reference locking
From: Justin Tobler via GitGitGadget @ 2024-01-10 18:52 UTC (permalink / raw)
  To: git; +Cc: Justin Tobler, Justin Tobler
In-Reply-To: <pull.1634.git.1704912750.gitgitgadget@gmail.com>

From: Justin Tobler <jltobler@gmail.com>

Some tests set up reference locks by directly creating the lockfile.
While this works for the files reference backend, reftable reference
locks operate differently and are incompatible with this approach.
Generalize reference locking by preparing a reference transaction.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 t/t5541-http-push-smart.sh | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
index df758e187df..5b94c0b066f 100755
--- a/t/t5541-http-push-smart.sh
+++ b/t/t5541-http-push-smart.sh
@@ -232,8 +232,29 @@ test_expect_success 'push --atomic fails on server-side errors' '
 	test_config -C "$d" http.receivepack true &&
 	up="$HTTPD_URL"/smart/atomic-branches.git &&
 
-	# break ref updates for other on the remote site
-	mkdir "$d/refs/heads/other.lock" &&
+	mkfifo in out &&
+	(git -C "$d" update-ref --stdin <in >out &) &&
+
+	exec 9>in &&
+	exec 8<out &&
+	test_when_finished "exec 9>&-" &&
+	test_when_finished "exec 8<&-" &&
+
+	echo "start" >&9 &&
+	echo "start: ok" >expected &&
+	read line <&8 &&
+	echo "$line" >actual &&
+	test_cmp expected actual &&
+
+	echo "update refs/heads/other refs/heads/other" >&9 &&
+
+	# Prepare reference transaction on `other` reference to lock it and thus
+	# break updates on the remote.
+	echo "prepare" >&9 &&
+	echo "prepare: ok" >expected &&
+	read line <&8 &&
+	echo "$line" >actual &&
+	test_cmp expected actual &&
 
 	# add the new commit to other
 	git branch -f other collateral &&
-- 
gitgitgadget

^ permalink raw reply related

* [ANNOUNCE] git-scm.com updates (was: Conservancy status report (2023))
From: Taylor Blau @ 2024-01-10 19:43 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Christian Couder,
	Junio C Hamano, Dan Moore
In-Reply-To: <ZRHTWaPthX/TETJz@nand.local>

On Mon, Sep 25, 2023 at 02:37:13PM -0400, Taylor Blau wrote:
> One notable change from last time is that our hosting fees have gone up
> significantly. These are entirely from Heroku's change in policy to no
> longer grant the Git project hosting credits for the git-scm.com
> project. Our costs in the meantime have been supported by a generous
> donation from Dan Moore at FusionAuth. The below email has some more
> details:
>
>   https://lore.kernel.org/git/YkcmtqcFaO7v1jW5@nand.local/
>
> It appears that since the above was written, Heroku has a new (?)
> program for giving credits to open-source projects. The details are
> below:
>
>   https://www.heroku.com/open-source-credit-program
>
> I applied on behalf of the Git project on 2023-09-25, and will follow-up
> on the list if/when we hear back from them.

Out of the blue today I received an email from Heroku saying that our
application to their open-source credit program was accepted. Quoting
from their email:

  [...] We are excited to announce that your project was a fit to
  receive $250 credits/month for one year, totaling $3,000 in credits.
  These credits have been applied to the account submitted on the
  application, and will be valid from January 2024 to December 2024.

$250 per month is more than enough to cover our hosting costs (which
average around ~$140 USD per month), so we should have plenty of room to
host git-scm.com free of charge for this year.

I know that there are some plans in the works to convert git-scm.com to
a static site in [1]. So even though we likely won't be on Heroku
forever, this should tide us over until we're ready to move to GitHub
Pages.

I would like to thank Dan Moore for his generous donation to the Git
project, which has covered our Heroku bill in past years. Now that
Heroku has agreed to cover out costs for this year, we will be able to
use the funds Dan has graciously provided to on behalf of FusionAuth for
other purposes.

Thanks,
Taylor

[1]: https://github.com/git/git-scm.com/pull/1804

^ permalink raw reply

* Re: [PATCH 00/10] Enrich Trailer API
From: Junio C Hamano @ 2024-01-10 19:45 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: git, Emily Shaffer, Christian Couder, Linus Arver
In-Reply-To: <pull.1632.git.1704869487.gitgitgadget@gmail.com>

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch series is the first 10 patches of a much larger series I've been
> working. The main goal of this series is to enrich the API in trailer.h. The
> larger series brings a number of additional code simplifications and
> cleanups (exposing and fixing some bugs along the way), and builds on top of
> this series. The goal of the larger series is to make the trailer interface
> ready for unit testing. By "trailer API" I mean those functions exposed in
> trailer.h.

Are there places in the current code that deals with trailers but
does not use the trailer API (e.g., manually parse and/or insert the
trailer in an in-core buffer)?  Is it part of the larger goal to
update these places so that we will always use the trailer API to
touch trailers, and if so, have these places been identified?

Obviously the reason why I ask is that testing cannot be the goal by
itself.  The "alternative" ...

> As an alternative to this patch series, we could keep trailer.h intact and
> decide to unit-test the existing "trailer_info_get()" function which does
> most of the trailer parsing work.

... may allow you to "test", but it would make it more difficult in
the future to revamp the trailer API, if it is needed, in order to
cover code paths that ought to be using but currently bypassing the
trailer API.

> This series breaks up "process_trailers()" into smaller pieces, exposing
> many of the parts relevant to trailer-related processing in trailer.h. This
> forces us to start writing unit tests for these now public functions, but
> that is a good thing because those same unit tests should be easy to write
> (due to their small(er) sizes), but also, because those unit tests will now
> ensure some degree of stability across new versions of trailer.h (we will
> start noticing when the behavior of any of these API functions change).

And helper functions, each of which does one small thing well, may
be more applicable to other code paths that are currently bypassing
the API.

> Thanks to the aggressive refactoring in this series, I've been able to
> identify and fix a couple bugs in our existing implementation. Those fixes
> build on top of this series but were not included here, in order to keep
> this series small.

It would be nicer to have a concise list of these fixes (in the form
of "git shortlog") as a teaser here ;-).  That would hopefully
entice others into reviewing this part that forms the foundation.

Thanks.

^ permalink raw reply

* Re: [PATCH 2/4] reftable/stack: refactor reloading to use file descriptor
From: Junio C Hamano @ 2024-01-10 19:57 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <726d302d7b21a5b948575492c717cfdb701de5cd.1704714575.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> We're about to introduce a stat(3P)-based caching mechanism to reload
> the list of stacks only when it has changed. In order to avoid race
> conditions this requires us to have a file descriptor available that we
> can use to call fstat(3P) on.
>
> Prepare for this by converting the code to use `fd_read_lines()` so that
> we have the file descriptor readily available.
> ---

Missing sign-off.

Other than that, the change is a refactoring that is very faithful
to the original.  Instead of doing the "open - pretend we opened an
empty file if it does not exist - read - close" dance all inside the
read_lines() call, this sort-of open codes the helper in this caller,
so that later steps of this series can look at the file descriptor
open to it.

Looking good.

>  reftable/stack.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index bf869a6772..b1ee247601 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -308,6 +308,7 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
>  	struct timeval deadline;
>  	int64_t delay = 0;
>  	int tries = 0, err;
> +	int fd = -1;
>  
>  	err = gettimeofday(&deadline, NULL);
>  	if (err < 0)
> @@ -329,9 +330,19 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
>  		if (tries > 3 && tv_cmp(&now, &deadline) >= 0)
>  			goto out;
>  
> -		err = read_lines(st->list_file, &names);
> -		if (err < 0)
> -			goto out;
> +		fd = open(st->list_file, O_RDONLY);
> +		if (fd < 0) {
> +			if (errno != ENOENT) {
> +				err = REFTABLE_IO_ERROR;
> +				goto out;
> +			}
> +
> +			names = reftable_calloc(sizeof(char *));
> +		} else {
> +			err = fd_read_lines(fd, &names);
> +			if (err < 0)
> +				goto out;
> +		}
>  
>  		err = reftable_stack_reload_once(st, names, reuse_open);
>  		if (!err)
> @@ -356,12 +367,16 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
>  		names = NULL;
>  		free_names(names_after);
>  		names_after = NULL;
> +		close(fd);
> +		fd = -1;
>  
>  		delay = delay + (delay * rand()) / RAND_MAX + 1;
>  		sleep_millisec(delay);
>  	}
>  
>  out:
> +	if (fd >= 0)
> +		close(fd);
>  	free_names(names);
>  	free_names(names_after);
>  	return err;

^ permalink raw reply

* Re: [PATCH 3/4] reftable/stack: use stat info to avoid re-reading stack list
From: Junio C Hamano @ 2024-01-10 20:07 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <4fabdc3d8016dbc1e20fbe90058ee7320a5f770b.1704714575.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> We can do better and use the same stat(3P)-based mechanism that the
> "packed" backend uses. Instead of reading the file, we will only open
> the file descriptor, fstat(3P) it, and then compare the info against the
> cached value from the last time we have updated the stack. This should
> always work alright because "tables.list" is updated atomically via a
> rename, so even if the ctime or mtime wasn't granular enough to identify
> a change, at least the inode number should have changed.

Or the file size.  Let's keep in mind that many users get useless
inum from their filesystem X-<.

>   Summary
>     update-ref: create many refs (refcount = 1, revision = HEAD~) ran
>       1.01 ± 0.09 times faster than update-ref: create many refs (refcount = 1, revision = HEAD)
>       2.72 ± 0.11 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
>       3.42 ± 0.13 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
>     163.59 ± 5.62 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
>     233.91 ± 7.92 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
> ---

Nice.

> @@ -374,6 +375,8 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
>  		sleep_millisec(delay);
>  	}
>  
> +	stat_validity_update(&st->list_validity, fd);
> +
>  out:
>  	if (fd >= 0)
>  		close(fd);

The stat_validity_update() does not happen in the error codepath.

Should we be clearing the validity of the list when somebody jumps
to "out:" due to an error?  Or by the time this function gets
called, the caller would already have cleared the validity and an
error that jumps to "out:" keeps the list invalid?

Other than the missing sign-off, the change looks very straight-forward.

^ permalink raw reply

* [DISCUSS] Introducing Rust into the Git project
From: Taylor Blau @ 2024-01-10 20:16 UTC (permalink / raw)
  To: git

Over the holiday break at the end of last year I spent some time
thinking on what it would take to introduce Rust into the Git project.

There is significant work underway to introduce Rust into the Linux
kernel (see [1], [2]). Among their stated goals, I think there are a few
which could be potentially relevant to the Git project:

  - Lower risk of memory safety bugs, data races, memory leaks, etc.
    thanks to the language's safety guarantees.

  - Easier to gain confidence when refactoring or introducing new code
    in Rust (assuming little to no use of the language's `unsafe`
    feature).

  - Contributing to Git becomes easier and accessible to a broader group
    of programmers by relying on a more modern language.

Given the allure of these benefits, I think it's at least worth
considering and discussing how Rust might make its way into Junio's
tree.

I imagine that the transition state would involve some parts of the
project being built in C and calling into Rust code via FFI (and perhaps
vice-versa, with Rust code calling back into the existing C codebase).
Luckily for us, Rust's FFI provides a zero-cost abstraction [3], meaning
there is no performance impact when calling code from one language in
the other.

Some open questions from me, at least to get the discussion going are:

  1. Platform support. The Rust compiler (rustc) does not enjoy the same
     widespread availability that C compilers do. For instance, I
     suspect that NonStop, AIX, Solaris, among others may not be
     supported.

     One possible alternative is to have those platforms use a Rust
     front-end for a compiler that they do support. The gccrs [4]
     project would allow us to compile Rust anywhere where GCC is
     available. The rustc_codegen_gcc [5] project uses GCC's libgccjit
     API to target GCC from rustc itself.

  2. Migration. What parts of Git are easiest to convert to Rust? My
     hunch is that the answer is any stand-alone libraries, like
     strbuf.h. I'm not sure how we should identify these, though, and in
     what order we would want to move them over.

  3. Interaction with the lib-ification effort. There is lots of work
     going on in an effort to lib-ify much of the Git codebase done by
     Google. I'm not sure how this would interact with that effort, but
     we should make sure that one isn't a blocker for the other.

I'm curious to hear what others think about this. I think that this
would be an exciting and worthwhile direction for the project. Let's
see!

Thanks,
Taylor

[1]: https://rust-for-linux.com/
[2]: https://lore.kernel.org/rust-for-linux/20210414184604.23473-1-ojeda@kernel.org/
[3]: https://blog.rust-lang.org/2015/04/24/Rust-Once-Run-Everywhere.html#c-talking-to-rust
[4]: https://github.com/Rust-GCC/gccrs
[5]: https://github.com/rust-lang/rustc_codegen_gcc

^ permalink raw reply

* Re: [PATCH 4/4] reftable/blocksource: use mmap to read tables
From: Junio C Hamano @ 2024-01-10 21:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <a23f38a80609a5c5a6931400ffd28a92b33838bb.1704714575.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> Using mmap comes with a significant drawback on Windows though, because
> mmapped files cannot be deleted and neither is it possible to rename
> files onto an mmapped file. But for one, the reftable library gracefully
> handles the case where auto-compaction cannot delete a still-open stack
> already and ignores any such errors. Also, `reftable_stack_clean()` will
> prune stale tables which are not referenced by "tables.list" anymore so
> that those files can eventually be pruned. And second, we never rewrite
> already-rewritten stacks, so it does not matter that we cannot rename a
> file over an mmaped file, either.

I somehow thought that we use "read into an allocated memory and
pretend as if we mapped" emulation on Windows, so all of that may be
moot.

> diff --git a/reftable/blocksource.c b/reftable/blocksource.c
> index a1ea304429..5d3f3d264c 100644
> --- a/reftable/blocksource.c
> +++ b/reftable/blocksource.c
> @@ -13,6 +13,12 @@ license that can be found in the LICENSE file or at
>  #include "reftable-blocksource.h"
>  #include "reftable-error.h"
>  
> +#if defined(NO_MMAP)
> +static int use_mmap = 0;
> +#else
> +static int use_mmap = 1;
> +#endif

Is there (do you need) some externally controllable knob that the
user can use to turn this off on a system that is compiled without
NO_MMAP?  Otherwise it is misleading to have this as a variable.

> -static void file_close(void *b)
> +static void file_close(void *v)
>  {
> -	int fd = ((struct file_block_source *)b)->fd;
> -	if (fd > 0) {
> -		close(fd);
> -		((struct file_block_source *)b)->fd = 0;
> +	struct file_block_source *b = v;
> +
> +	if (b->fd >= 0) {
> +		close(b->fd);
> +		b->fd = -1;
>  	}
>  
> +	if (use_mmap)
> +		munmap(b->data, b->size);
> +	else
> +		reftable_free(b->data);
> +	b->data = NULL;
> +
>  	reftable_free(b);
>  }

It would have been nicer to do this kind of "a void pointer is taken
and we immediately cast it to a concrete and useful type upon entry"
clean-up as a separate step that is purely clean-up, if there were
many instances.  It is the first such one in the series as far as I
remember, so it is not a huge deal.

> @@ -108,9 +119,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
>  {
>  	struct file_block_source *b = v;
>  	assert(off + size <= b->size);
> -	dest->data = reftable_malloc(size);
> -	if (pread_in_full(b->fd, dest->data, size, off) != size)
> -		return -1;
> +	dest->data = b->data + off;
>  	dest->len = size;
>  	return size;
>  }

So, we used to delay the allocation and reading of a block until
this function gets called.  Now, by the time the control reaches
the function, we are expected to have the data handy at b->data.
That is ensured by reftable_block_source_from_file(), I presume?

> @@ -127,8 +136,10 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
>  {
>  	struct stat st = { 0 };
>  	int err = 0;
> -	int fd = open(name, O_RDONLY);
> +	int fd;
>  	struct file_block_source *p = NULL;
> +
> +	fd = open(name, O_RDONLY);
>  	if (fd < 0) {
>  		if (errno == ENOENT) {
>  			return REFTABLE_NOT_EXIST_ERROR;

This is a no-op clean-up that would have been better in a separate
clean-up step.  Again, not a huge deal.

> @@ -144,7 +155,18 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
>  
>  	p = reftable_calloc(sizeof(struct file_block_source));
>  	p->size = st.st_size;
> -	p->fd = fd;
> +	if (use_mmap) {
> +		p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +		p->fd = fd;

This is a bit unusual for our use of mmap() where the norm is to
close the file descriptor once we mapped (we only need the memory
region itself and not the originating file descriptor to unmap it).

Is there a reason why we need to keep p->fd?  Now the other side of
this if/else preallocates the whole thing and slurps everything in
core to allow the remainder of the code to mimic what happens on the
mmaped block memory (e.g., we saw how file_read_block() assumes that
b->data[0..b->size] are valid) and does not obviously need p->fd,
can we just remove .fd member from the file_block_source structure?

That way, file_close() can also lose the conditional close() call.

For that matter, do we need any codepath in this file that is
enabled by !use_mmap?  Can't we just use xmmap() and let its
"instead, we allocate, read into it and return" emulation?

Thanks.

> +	} else {
> +		p->data = xmalloc(st.st_size);
> +		if (read_in_full(fd, p->data, st.st_size) != st.st_size) {
> +			close(fd);
> +			return -1;
> +		}
> +		close(fd);
> +		p->fd = -1;
> +	}
>  
>  	assert(!bs->ops);
>  	bs->ops = &file_vtable;

^ permalink raw reply

* Re: ps/reftable-optimize-io
From: Junio C Hamano @ 2024-01-10 21:25 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <ZZ5AJL4di1TAC-up@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> Let's wait a few days for reviews. I don't expect there to be a ton of
> reviews from the usual contributors due to the still-limited knowledge
> around reftables in our community.

A successful nerd-sniping?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox