Git development

Git development
 help / color / mirror / Atom feed

* Re: none
From: Junio C Hamano @ 2023-10-17 20:21 UTC (permalink / raw)
  To: Dorcas Litunya; +Cc: christian.couder, git
In-Reply-To: <ZS2ESFGP2H3CTJSK@dorcaslitunya-virtual-machine>

Dorcas Litunya <anonolitunya@gmail.com> writes:

> Bcc: 
> Subject: Re: [PATCH] t/t7601: Modernize test scripts using functions
> Reply-To: 
> In-Reply-To: <xmqq1qdumrto.fsf@gitster.g>

What are these lines doing here?

> So should I replace this in the next version or leave this as is?

Perhaps I was not clear enough, but I found the commit title and
description need to be updated to clearly record the intent of the
change with a handful of points, so I will not be accepting the
patch as-is.

These two sections may be of help.

Documentation/MyFirstContribution.txt::now-what
Documentation/MyFirstContribution.txt::reviewing

Thanks.

^ permalink raw reply

* Re: [PATCH 2/8] t7900: setup and tear down clones
From: Kristoffer Haugsbakk @ 2023-10-17 20:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, stolee
In-Reply-To: <xmqqwmvlgg71.fsf@gitster.g>

On Tue, Oct 17, 2023, at 22:13, Junio C Hamano wrote:
> As I already said in my response to the cover letter, while I am
> surprised that the series managed to make each step (and it alone)
> succeed after the set-up (applaud!), I am not sure if it is really
> worth doing.  As the business of test scripts is to test git, and it
> means that we should always assume that we are dealing with a
> potentially broken version of git.  By running so many git
> subcommands in test_when_finished, each of them may be from a buggy
> implementation of git, we cannot be really sure that we are
> resetting the environment to the pristine state.  We should strive
> to do as little as possible in test_when_finished.

I'll have to think more about this part in order to understand the
ramifications. Thanks for the feedback.

> This is even worse; it has to redo much of what the previous test
> did.  Developers cannot be reasonably expected to maintain this
> duplication when we need to change the earlier test.
>
> While I am impressed that "set-up + individual single test" was made
> to work, I am not convinced that the changes that took us to get
> there are reasonable.  The end result looks much less maintainable
> and more wasteful with duplicated steps.
>
> Thanks.

I can rewrite this one—as well as others—to use the `setup` keyword in the
original test instead.

But dropping the series is also fine. I am still very new to this test
suite.

Cheers

-- 
Kristoffer Haugsbakk

^ permalink raw reply

* Re: [PATCH 0/8] t7900: untangle test dependencies
From: Kristoffer Haugsbakk @ 2023-10-17 20:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, stolee
In-Reply-To: <xmqqbkcxhvf9.fsf@gitster.g>

On Tue, Oct 17, 2023, at 21:59, Junio C Hamano wrote:
> It is kind-of surprising that with only 8 patches you can reach such
> a state, but ...
>
>> # The tests that used to depend on each other should still pass
>> # when run together
>> ./t7900-maintenance.sh --quiet --run=setup,30,31 &&
>
> ... this puzzles me.  What does it mean for tests to "depend on each
> other"?  Does this mean running #31 with or without running #30 runs
> under different condition and potentially run different things?

What I mean is that some preceding test has a side-effect that a test
depends on. Or that the test depends on some test *not* having done
something; patch 9/8 changes `maintenance.auto config option` to delete
and init the repository since it depends on the preceding tests *not*
having run `git maintenance register`, since that turns off the default
`true` value of `maintenance.auto`.

(Maybe those last meta-tests with combining tests like number 30 and 31
was a bit silly.)

> One might argue that, in an ideal world, our tests should work when
> any non-setup tests are omitted (so, instead of $i above, you'll
> have an arbitrary subsequence of 1..42 and your tests still pass),
> and it may be a worthy goal, but at the same time, it may be a bit
> impractical, as setting things up is costly, but what you can do in
> the common "setup" will be very small.  Or you'll have so much
> "recovering from damage" in test_when_finished for each test that
> makes such untangling of dependencies too costly.

I don't know what the policy is. :) My motivation was that I was working
on something else which seemed to break the suite, then I tried to reduce
the tests that were run to get rid of the noise (`--verbose`), but then it
got confusing because I didn't know if I had really broken some tests
myself or if more tests would start failing by only running a subset of
them.

That last patch 9/8 deals with what I discovered when I added two tests
before `maintenance.auto config option`; the test started failing, which
made me think that my changes might have some side-effect on what the test
is testing. But it was just an invisible dependency on `git maintenance
register` *not* having been run in the whole suite up until that point.

Cheers

^ permalink raw reply

* Re: [PATCH 2/8] t7900: setup and tear down clones
From: Junio C Hamano @ 2023-10-17 20:13 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git, stolee
In-Reply-To: <e3987cda75e4db72393f85de4bbb71d2ebaa097b.1697319294.git.code@khaugsbakk.name>

Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

> Test `loose-objects task` depends on the two clones setup in `prefetch
> multiple remotes`.
>
> Reuse the two clones setup and tear down the clones afterwards in both
> tests.
>
> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
> ---
>  t/t7900-maintenance.sh | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index ca86b2ba687..ebc207f1a58 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -145,6 +145,12 @@ test_expect_success 'run --task=prefetch with no remotes' '
>  '
>  
>  test_expect_success 'prefetch multiple remotes' '
> +	test_when_finished rm -r clone1 &&
> +	test_when_finished rm -r clone2 &&
> +	test_when_finished git remote remove remote1 &&
> +	test_when_finished git remote remove remote2 &&
> +	test_when_finished git tag --delete one &&
> +	test_when_finished git tag --delete two &&
>  	git clone . clone1 &&
>  	git clone . clone2 &&
>  	git remote add remote1 "file://$(pwd)/clone1" &&

As I already said in my response to the cover letter, while I am
surprised that the series managed to make each step (and it alone)
succeed after the set-up (applaud!), I am not sure if it is really
worth doing.  As the business of test scripts is to test git, and it
means that we should always assume that we are dealing with a
potentially broken version of git.  By running so many git
subcommands in test_when_finished, each of them may be from a buggy
implementation of git, we cannot be really sure that we are
resetting the environment to the pristine state.  We should strive
to do as little as possible in test_when_finished.

> @@ -175,6 +181,22 @@ test_expect_success 'prefetch multiple remotes' '
>  '
>  
>  test_expect_success 'loose-objects task' '
> +	test_when_finished rm -r clone1 &&
> +	test_when_finished rm -r clone2 &&
> +	test_when_finished git remote remove remote1 &&
> +	test_when_finished git remote remove remote2 &&
> +	test_when_finished git tag --delete one &&
> +	test_when_finished git tag --delete two &&

Ditto.

> +	git clone . clone1 &&
> +	git clone . clone2 &&
> +	git remote add remote1 "file://$(pwd)/clone1" &&
> +	git remote add remote2 "file://$(pwd)/clone2" &&
> +	git -C clone1 switch -c one &&
> +	git -C clone2 switch -c two &&
> +	test_commit -C clone1 one &&
> +	test_commit -C clone2 two &&
> +	git fetch --all &&

This is even worse; it has to redo much of what the previous test
did.  Developers cannot be reasonably expected to maintain this
duplication when we need to change the earlier test.

While I am impressed that "set-up + individual single test" was made
to work, I am not convinced that the changes that took us to get
there are reasonable.  The end result looks much less maintainable
and more wasteful with duplicated steps.

Thanks.

^ permalink raw reply

* [PATCH v3 3/3] subtree: adding test to validate fix
From: Zach FettersMoore @ 2023-10-17 20:02 UTC (permalink / raw)
  To: gitgitgadget, git; +Cc: zach.fetters
In-Reply-To: <pull.1587.v3.git.1696019580.gitgitgadget@gmail.com>

From: "Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com>

> From: Zach FettersMoore <zach.fetters@apollographql.com>
> 
> Adding a test to validate that the proposed fix
> solves the issue.
> 
> The test accomplishes this by checking the output
> of the split command to ensure the output from
> the progress of 'process_split_commit' function
> that represents the 'extracount' of commits
> processed does not increment.
> 
> This was tested against the original functionality
> to show the test failed, and then with this fix
> to show the test passes.
> 
> This illustrated that when using multiple subtrees,
> A and B, when doing a split on subtree B, the
> processing does not traverse the entire history
> of subtree A which is unnecessary and would cause
> the 'extracount' of processed commits to climb
> based on the number of commits in the history of
> subtree A.
> 
> Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
> ---
>  contrib/subtree/t/t7900-subtree.sh | 41 ++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
> index 49a21dd7c9c..57c12e9f924 100755
> --- a/contrib/subtree/t/t7900-subtree.sh
> +++ b/contrib/subtree/t/t7900-subtree.sh
> @@ -385,6 +385,47 @@ test_expect_success 'split sub dir/ with --rejoin' '
>  	)
>  '
>  
> +test_expect_success 'split with multiple subtrees' '
> +	subtree_test_create_repo "$test_count" &&
> +	subtree_test_create_repo "$test_count/subA" &&
> +	subtree_test_create_repo "$test_count/subB" &&
> +	test_create_commit "$test_count" main1 &&
> +	test_create_commit "$test_count/subA" subA1 &&
> +	test_create_commit "$test_count/subA" subA2 &&
> +	test_create_commit "$test_count/subA" subA3 &&
> +	test_create_commit "$test_count/subB" subB1 &&
> +	(
> +		cd "$test_count" &&
> +		git fetch ./subA HEAD &&
> +		git subtree add --prefix=subADir FETCH_HEAD
> +	) &&
> +	(
> +		cd "$test_count" &&
> +		git fetch ./subB HEAD &&
> +		git subtree add --prefix=subBDir FETCH_HEAD
> +	) &&
> +	test_create_commit "$test_count" subADir/main-subA1 &&
> +	test_create_commit "$test_count" subBDir/main-subB1 &&
> +	(
> +		cd "$test_count" &&
> +		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1"
> +	) &&
> +	(
> +		cd "$test_count" &&
> +		git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1"
> +	) &&
> +	test_create_commit "$test_count" subADir/main-subA2 &&
> +	test_create_commit "$test_count" subBDir/main-subB2 &&
> +	(
> +		cd "$test_count" &&
> +		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2"
> +	) &&
> +	(
> +		cd "$test_count" &&
> +		test "$(git subtree split --prefix=subBDir --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
> +	)
> +'
> +
>  test_expect_success 'split sub dir/ with --rejoin from scratch' '
>  	subtree_test_create_repo "$test_count" &&
>  	test_create_commit "$test_count" main1 &&
> --

Checking to see if there is something else I need to do with this
to get it across the finish line, wasn't sure if something else
was needed since it's been dormant since my last push a few 
weeks ago.

Thanks

^ permalink raw reply

* Re: [PATCH 0/8] t7900: untangle test dependencies
From: Junio C Hamano @ 2023-10-17 19:59 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git, stolee
In-Reply-To: <cover.1697319294.git.code@khaugsbakk.name>

Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

> #!/bin/sh
> cd t
> # Every test run together with `setup` should pass
> for i in $(seq 1 42)
> do
>     ./t7900-maintenance.sh --quiet --run=setup,$i || return 1
> done &&

It is kind-of surprising that with only 8 patches you can reach such
a state, but ...

> # The tests that used to depend on each other should still pass
> # when run together
> ./t7900-maintenance.sh --quiet --run=setup,30,31 &&

... this puzzles me.  What does it mean for tests to "depend on each
other"?  Does this mean running #31 with or without running #30 runs
under different condition and potentially run different things?

One might argue that, in an ideal world, our tests should work when
any non-setup tests are omitted (so, instead of $i above, you'll
have an arbitrary subsequence of 1..42 and your tests still pass),
and it may be a worthy goal, but at the same time, it may be a bit
impractical, as setting things up is costly, but what you can do in
the common "setup" will be very small.  Or you'll have so much
"recovering from damage" in test_when_finished for each test that
makes such untangling of dependencies too costly.

^ permalink raw reply

* Re: [PATCH] grep: die gracefully when outside repository
From: Kristoffer Haugsbakk @ 2023-10-17 19:51 UTC (permalink / raw)
  To: gitster; +Cc: Kristoffer Haugsbakk, git, ks1322, peff
In-Reply-To: <xmqqmswhjj48.fsf@gitster.g>

On Tue, Oct 17, 2023, at 18:42, Junio C Hamano wrote:
> It is curious that the original has two sources of hint_path (i.e.,
> get_git_dir() is used as a fallback for get_git_work_tree()).  Are
> we certain that the check is at the right place?  If we do not have
> a repository, then both would fail by returning NULL, so it should
> not matter if we add the new check before we check either or both,
> or even after we checked both before dying.
>
> I wonder if
>
> 	const char *hint_path = get_git_work_tree();
>
> 	if (!hint_path)
> 	        hint_path = get_git_dir();
> 	if (hint_path)
> 		die(_("%s: '%s' is outside repository at '%s'"),
> 		    elt, copyfrom, absolute_path(hint_path));
> 	else
> 		die(_("%s: '%s' is outside the directory tree"),
> 		    elt, copyfrom);
>
> makes the intent of the code clearer.

That doesn't work since `get_git_dir()` triggers `BUG` instead of
returning `NULL`.

The `hint_path` declaration has to be at the start because of style
rules. But we can initialize it after.

I can also have a second look at the test since I am using `grep` to
test the failure output and not the translation string variant.

-- >8 --
Subject: [PATCH] fixup! grep: die gracefully when outside repository

---
 pathspec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/pathspec.c b/pathspec.c
index e115832f17a..0c1061fad11 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -467,10 +467,11 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
 		match = prefix_path_gently(prefix, prefixlen,
 					   &prefixlen, copyfrom);
 		if (!match) {
-			const char *hint_path = get_git_work_tree();
+			const char *hint_path;
 			if (!have_git_dir())
 				die(_("'%s' is outside the directory tree"),
 				    copyfrom);
+			hint_path = get_git_work_tree();
 			if (!hint_path)
 				hint_path = get_git_dir();
 			die(_("%s: '%s' is outside repository at '%s'"), elt,
-- 
2.42.0.2.g879ad04204

^ permalink raw reply related

* Re: [PATCH v2 1/1] [OUTREACHY] add: standardize die() messages output.
From: Junio C Hamano @ 2023-10-17 19:41 UTC (permalink / raw)
  To: Naomi Ibe; +Cc: git
In-Reply-To: <20231017113946.747-1-naomi.ibeh69@gmail.com>

Naomi Ibe <naomi.ibeh69@gmail.com> writes:

>  builtin/add.c: clean up die() messages
>
>     As described in the CodingGuidelines document, a single line
>     message given to die() and its friends should not capitalize its
>     first word, and should not add full-stop at the end.
>
> Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com>

That's a strange formatting.  I think you meant to make the first
line (i.e. "builtin/add.c:...") as the Subject: of the e-mail, and
the three lines of text as the body, without any indentation.

Will queue with such a fix locally to save time, but please make
sure you know what in your workflow to produce a patch and send it
out caused this strange formatting, so that you can fix it and send
your patches correctly when you need to do so for real.

Thanks.

^ permalink raw reply

* Re: [PATCH] commit: detect commits that exist in commit-graph but not in the ODB
From: Junio C Hamano @ 2023-10-17 18:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Taylor Blau
In-Reply-To: <ZS4rmtBTYnp2RMiY@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> Fair point indeed. The following is a worst-case scenario benchmark of
> of the change where we do a full topological walk of all reachable
> commits in the graph, executed in linux.git. We parse commit parents via
> `repo_parse_commit_gently()`, so the new code path now basically has to
> check for object existence of every reachable commit:
> ...
> The added check does lead to a performance regression indeed, which is
> not all that unexpected. That being said, the commit-graph still results
> in a significant speedup compared to the case where we don't have it.

Yeah, I agree that both points are expected.  An extra check that is
much cheaper than the full parsing is paying a small price to be a
bit more careful than before.  The question is if the price is small
enough.  I am still not sure if the extra carefulness is warranted
for all normal cases to spend 30% extra cycles

Yeah, I agree that both points are expected.  An extra check that is
much cheaper than the full parsing is paying a small price to be a
bit more careful than before.  The question is if the price is small
enough.  I am still not sure if the extra carefulness is warranted
for all normal cases to spend 30% extra cycles.

Thanks.

^ permalink raw reply

* Re: [PATCH v3 0/5] config-parse: create config parsing library
From: Junio C Hamano @ 2023-10-17 17:13 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jonathantanmy, calvinwan, glencbz
In-Reply-To: <cover.1695330852.git.steadmon@google.com>

Josh Steadmon <steadmon@google.com> writes:

> Config parsing no longer uses global state as of gc/config-context, so the
> natural next step for libification is to turn that into its own library.
> This series starts that process by moving config parsing into
> config-parse.[c|h] so that other programs can include this functionality
> without pulling in all of config.[c|h].

This has been in list archive collecting dust.  It is unfortunate
that not many people appear to be interested in reviewing others'
patches?

> Open questions:
> - How do folks feel about the do_event() refactor in patches 2 & 3?

I gave a quick re-read and found that the code after patch 2 made it
easier to see how config.c::do_event() does its thing (even though
the patch text of that exact step was somehow a bit hard to follow).

However, the helper added by patch 3, do_event_and_flush(), that
duplicates exactly what do_event() does, is hard to reason about, at
least for me.  It returns early without setting .previous_type to
EOF and the value returned from the helper signals if that is the
case (the two early return points both return what flush_event()
gave us), but the only caller of the helper does not even inspect
the return value, unlike all the callers of do_event(), which also
looks a bit fishy.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 2/2] Prevent git from rehashing 4GiB files
From: Junio C Hamano @ 2023-10-17 17:02 UTC (permalink / raw)
  To: Jason Hatton
  Cc: Jeff King, brian m. carlson, git@vger.kernel.org, Taylor Blau
In-Reply-To: <MW5PR16MB50791EDD3D006EFBE8DBBCB3AFD6A@MW5PR16MB5079.namprd16.prod.outlook.com>

Jason Hatton <jhatton@globalfinishing.com> writes:

> We could probably write the below to do the same thing.
>
> void fill_stat_data(struct stat_data *sd, struct stat *st)
> {
>       sd->sd_ctime.sec = (unsigned int)st->st_ctime;
>       sd->sd_mtime.sec = (unsigned int)st->st_mtime;
>       sd->sd_ctime.nsec = ST_CTIME_NSEC(*st);
>       sd->sd_mtime.nsec = ST_MTIME_NSEC(*st);
>       sd->sd_dev = st->st_dev;
>       sd->sd_ino = st->st_ino;
>       sd->sd_uid = st->st_uid;
>       sd->sd_gid = st->st_gid;
>       sd->sd_size = st->st_size;
> +      if (sd->sd_size == 0 && st->st_size!= 0) {
> +            sd->sd_size = 1;
> +      }
> }

The above is a fairly straight-forward inlining (except that it does
explicit comparisons with zero) of the helper function the version
of patch under discussion added, and uses 1 instead of (1<<31) as an
arbigrary nonzero number that can be used to work around the issue.

So I agree with you that it would do the same thing.  I am not
surprised if it also gets scolded by Coverity the same way, though.

^ permalink raw reply

* Re:[PATCH] t/t7601: Modernize test scripts using functions
From: Dorcas Litunya @ 2023-10-17 16:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: anonolitunya, git, christian.couder
In-Reply-To: <ZS2ESFGP2H3CTJSK@dorcaslitunya-virtual-machine>

On Mon, Oct 16, 2023 at 09:43:24PM +0300, Dorcas Litunya wrote:
> Bcc: 
> Subject: Re: [PATCH] t/t7601: Modernize test scripts using functions
> Reply-To: 
> In-Reply-To: <xmqq1qdumrto.fsf@gitster.g>
> 
> On Mon, Oct 16, 2023 at 09:53:55AM -0700, Junio C Hamano wrote:
> > Dorcas AnonoLitunya <anonolitunya@gmail.com> writes:
> > 
> > > Subject: Re: [PATCH] t/t7601: Modernize test scripts using functions
> > 
> > Let's try if we can pack a bit more information.  For example
> > 
> > Subject: [PATCH] t7601: use "test_path_is_file" etc. instead of "test -f"
> > 
> > would clarify what kind of modernization is done by this patch.
> > 
> > > The test script is currently using the command format 'test -f' to
> > > check for existence or absence of files.
> > 
> > "is currently using" -> "uses".
> > 
> > > Replace it with new helper functions following the format
> > > 'test_path_is_file'.
> > 
> > I am not sure what role "the format" plays in this picture.
> > test_path_is_file is not new---it has been around for quite a while.
> > 
> > > Consequently, the patch also replaces the inverse command '! test -f' or
> > > 'test ! -f' with new helper function following the format
> > > 'test_path_is_missing'
> > 
> > A bit more on this later.
> >
> So should I replace this in the next version or leave this as is?
Hello Junio,

Following up on this? What are your thoughts on it?

Thanks!

Dorcas
> > > This adjustment using helper functions makes the code more readable and
> > > easier to understand.
> > 
> > Looking good.  If I were writing this, I'll make the whole thing
> > more like this, though:
> > 
> >     t7601: use "test_path_is_file" etc. instead of "test -f"
> > 
> >     Some tests in t7601 use "test -f" and "test ! -f" to see if a
> >     path exists or is missing.  Use test_path_is_file and
> >     test_path_is_missing helper functions to clarify these tests a
> >     bit better.  This especially matters for the "missing" case,
> >     because "test ! -f F" will be happy if "F" exists as a
> >     directory, but the intent of the test is that "F" should not
> >     exist, even as a directory.
> > 
> > 
> > > diff --git a/t/t7601-merge-pull-config.sh b/t/t7601-merge-pull-config.sh
> > > index bd238d89b0..e08767df66 100755
> > > --- a/t/t7601-merge-pull-config.sh
> > > +++ b/t/t7601-merge-pull-config.sh
> > > @@ -349,13 +349,13 @@ test_expect_success 'Cannot rebase with multiple heads' '
> > >  
> > >  test_expect_success 'merge c1 with c2' '
> > >  	git reset --hard c1 &&
> > > -	test -f c0.c &&
> > > -	test -f c1.c &&
> > > -	test ! -f c2.c &&
> > > -	test ! -f c3.c &&
> > > +	test_path_is_file c0.c &&
> > > +	test_path_is_file c1.c &&
> > > +	test_path_is_missing c2.c &&
> > > +	test_path_is_missing c3.c &&
> > 
> > The original says "We are happy if c2.c is not a file", so it would
> > have been happy if by some mistake "git reset" created a directory
> > there.  But the _intent_ of the test is that we do not have anything
> > at c2.c, and the updated code expresses it better.

^ permalink raw reply

* Re: [PATCH] grep: die gracefully when outside repository
From: Junio C Hamano @ 2023-10-17 16:42 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git, ks1322
In-Reply-To: <087c92e3904dd774f672373727c300bf7f5f6369.1697317276.git.code@khaugsbakk.name>

Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

> diff --git a/pathspec.c b/pathspec.c
> index 3a3a5724c44..e115832f17a 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -468,6 +468,9 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
>  					   &prefixlen, copyfrom);
>  		if (!match) {
>  			const char *hint_path = get_git_work_tree();
> +			if (!have_git_dir())
> +				die(_("'%s' is outside the directory tree"),
> +				    copyfrom);
>  			if (!hint_path)
>  				hint_path = get_git_dir();
>  			die(_("%s: '%s' is outside repository at '%s'"), elt,

It is curious that the original has two sources of hint_path (i.e.,
get_git_dir() is used as a fallback for get_git_work_tree()).  Are
we certain that the check is at the right place?  If we do not have
a repository, then both would fail by returning NULL, so it should
not matter if we add the new check before we check either or both,
or even after we checked both before dying.

I wonder if

	const char *hint_path = get_git_work_tree();

	if (!hint_path)
	        hint_path = get_git_dir();
	if (hint_path)
		die(_("%s: '%s' is outside repository at '%s'"),
		    elt, copyfrom, absolute_path(hint_path));
	else
		die(_("%s: '%s' is outside the directory tree"),
		    elt, copyfrom);

makes the intent of the code clearer.  We want to hint the location
of the repository by computing hint_path, and if we can compute it,
we use it in the error message, but otherwise we don't add hint.  And
we apply that conditional whether we have repository or not---what we
care about is the NULL-ness of the hint string we computed.

> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index 39d6d713ecb..b976f81a166 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -1234,6 +1234,19 @@ test_expect_success 'outside of git repository with fallbackToNoIndex' '
>  	)
>  '
>  
> +test_expect_success 'outside of git repository with pathspec outside the directory tree' '
> +	test_when_finished rm -fr non &&
> +	rm -fr non &&
> +	mkdir -p non/git/sub &&
> +	(
> +		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
> +		export GIT_CEILING_DIRECTORIES &&
> +		cd non/git &&
> +		test_expect_code 128 git grep --no-index search .. 2>error &&
> +		grep "is outside the directory tree" error

Excellent.  This is a very good use of the GIT_CEILING_DIRECTORIES
facility.

> +	)
> +'
> +
>  test_expect_success 'inside git repository but with --no-index' '
>  	rm -fr is &&
>  	mkdir -p is/git/sub &&
>
> base-commit: 43c8a30d150ecede9709c1f2527c8fba92c65f40

^ permalink raw reply

* [PATCH v2 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Taylor Blau @ 2023-10-17 16:31 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697560266.git.me@ttaylorr.com>

When using merge-tree often within a repository[^1], it is possible to
generate a relatively large number of loose objects, which can result in
degraded performance, and inode exhaustion in extreme cases.

Building on the functionality introduced in previous commits, the
bulk-checkin machinery now has support to write arbitrary blob and tree
objects which are small enough to be held in-core. We can use this to
write any blob/tree objects generated by ORT into a separate pack
instead of writing them out individually as loose.

This functionality is gated behind a new `--write-pack` option to
`merge-tree` that works with the (non-deprecated) `--write-tree` mode.

The implementation is relatively straightforward. There are two spots
within the ORT mechanism where we call `write_object_file()`, one for
content differences within blobs, and another to assemble any new trees
necessary to construct the merge. In each of those locations,
conditionally replace calls to `write_object_file()` with
`index_blob_bulk_checkin_incore()` or `index_tree_bulk_checkin_incore()`
depending on which kind of object we are writing.

The only remaining task is to begin and end the transaction necessary to
initialize the bulk-checkin machinery, and move any new pack(s) it
created into the main object store.

[^1]: Such is the case at GitHub, where we run presumptive "test merges"
  on open pull requests to see whether or not we can light up the merge
  button green depending on whether or not the presumptive merge was
  conflicted.

  This is done in response to a number of user-initiated events,
  including viewing an open pull request whose last test merge is stale
  with respect to the current base and tip of the pull request. As a
  result, merge-tree can be run very frequently on large, active
  repositories.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-merge-tree.txt |  4 ++
 builtin/merge-tree.c             |  5 ++
 merge-ort.c                      | 42 +++++++++++----
 merge-recursive.h                |  1 +
 t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++++++++++++++++
 5 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index ffc4fbf7e8..9d37609ef1 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -69,6 +69,10 @@ OPTIONS
 	specify a merge-base for the merge, and specifying multiple bases is
 	currently not supported. This option is incompatible with `--stdin`.
 
+--write-pack::
+	Write any new objects into a separate packfile instead of as
+	individual loose objects.
+
 [[OUTPUT]]
 OUTPUT
 ------
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 0de42aecf4..672ebd4c54 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -18,6 +18,7 @@
 #include "quote.h"
 #include "tree.h"
 #include "config.h"
+#include "bulk-checkin.h"
 
 static int line_termination = '\n';
 
@@ -414,6 +415,7 @@ struct merge_tree_options {
 	int show_messages;
 	int name_only;
 	int use_stdin;
+	int write_pack;
 };
 
 static int real_merge(struct merge_tree_options *o,
@@ -440,6 +442,7 @@ static int real_merge(struct merge_tree_options *o,
 	init_merge_options(&opt, the_repository);
 
 	opt.show_rename_progress = 0;
+	opt.write_pack = o->write_pack;
 
 	opt.branch1 = branch1;
 	opt.branch2 = branch2;
@@ -548,6 +551,8 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &merge_base,
 			   N_("commit"),
 			   N_("specify a merge-base for the merge")),
+		OPT_BOOL(0, "write-pack", &o.write_pack,
+			 N_("write new objects to a pack instead of as loose")),
 		OPT_END()
 	};
 
diff --git a/merge-ort.c b/merge-ort.c
index 7857ce9fbd..e198d2bc2b 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -48,6 +48,7 @@
 #include "tree.h"
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
+#include "bulk-checkin.h"
 
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
@@ -2107,10 +2108,19 @@ static int handle_content_merge(struct merge_options *opt,
 		if ((merge_status < 0) || !result_buf.ptr)
 			ret = error(_("failed to execute internal merge"));
 
-		if (!ret &&
-		    write_object_file(result_buf.ptr, result_buf.size,
-				      OBJ_BLOB, &result->oid))
-			ret = error(_("unable to add %s to database"), path);
+		if (!ret) {
+			ret = opt->write_pack
+				? index_blob_bulk_checkin_incore(&result->oid,
+								 result_buf.ptr,
+								 result_buf.size,
+								 path, 1)
+				: write_object_file(result_buf.ptr,
+						    result_buf.size,
+						    OBJ_BLOB, &result->oid);
+			if (ret)
+				ret = error(_("unable to add %s to database"),
+					    path);
+		}
 
 		free(result_buf.ptr);
 		if (ret)
@@ -3596,7 +3606,8 @@ static int tree_entry_order(const void *a_, const void *b_)
 				 b->string, strlen(b->string), bmi->result.mode);
 }
 
-static int write_tree(struct object_id *result_oid,
+static int write_tree(struct merge_options *opt,
+		      struct object_id *result_oid,
 		      struct string_list *versions,
 		      unsigned int offset,
 		      size_t hash_size)
@@ -3630,8 +3641,14 @@ static int write_tree(struct object_id *result_oid,
 	}
 
 	/* Write this object file out, and record in result_oid */
-	if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid))
+	ret = opt->write_pack
+		? index_tree_bulk_checkin_incore(result_oid,
+						 buf.buf, buf.len, "", 1)
+		: write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);
+
+	if (ret)
 		ret = -1;
+
 	strbuf_release(&buf);
 	return ret;
 }
@@ -3796,8 +3813,8 @@ static int write_completed_directory(struct merge_options *opt,
 		 */
 		dir_info->is_null = 0;
 		dir_info->result.mode = S_IFDIR;
-		if (write_tree(&dir_info->result.oid, &info->versions, offset,
-			       opt->repo->hash_algo->rawsz) < 0)
+		if (write_tree(opt, &dir_info->result.oid, &info->versions,
+			       offset, opt->repo->hash_algo->rawsz) < 0)
 			ret = -1;
 	}
 
@@ -4331,9 +4348,13 @@ static int process_entries(struct merge_options *opt,
 		fflush(stdout);
 		BUG("dir_metadata accounting completely off; shouldn't happen");
 	}
-	if (write_tree(result_oid, &dir_metadata.versions, 0,
+	if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
 		       opt->repo->hash_algo->rawsz) < 0)
 		ret = -1;
+
+	if (opt->write_pack)
+		end_odb_transaction();
+
 cleanup:
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
@@ -4877,6 +4898,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 */
 	strmap_init(&opt->priv->conflicts);
 
+	if (opt->write_pack)
+		begin_odb_transaction();
+
 	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
diff --git a/merge-recursive.h b/merge-recursive.h
index b88000e3c2..156e160876 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -48,6 +48,7 @@ struct merge_options {
 	unsigned renormalize : 1;
 	unsigned record_conflict_msgs_as_headers : 1;
 	const char *msg_header_prefix;
+	unsigned write_pack : 1;
 
 	/* internal fields used by the implementation */
 	struct merge_options_internal *priv;
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 250f721795..2d81ff4de5 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -922,4 +922,97 @@ test_expect_success 'check the input format when --stdin is passed' '
 	test_cmp expect actual
 '
 
+packdir=".git/objects/pack"
+
+test_expect_success 'merge-tree can pack its result with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	# base has lines [3, 4, 5]
+	#   - side adds to the beginning, resulting in [1, 2, 3, 4, 5]
+	#   - other adds to the end, resulting in [3, 4, 5, 6, 7]
+	#
+	# merging the two should result in a new blob object containing
+	# [1, 2, 3, 4, 5, 6, 7], along with a new tree.
+	test_commit -C repo base file "$(test_seq 3 5)" &&
+	git -C repo branch -M main &&
+	git -C repo checkout -b side main &&
+	test_commit -C repo side file "$(test_seq 1 5)" &&
+	git -C repo checkout -b other main &&
+	test_commit -C repo other file "$(test_seq 3 7)" &&
+
+	find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
+	tree="$(git -C repo merge-tree --write-pack \
+		refs/tags/side refs/tags/other)" &&
+	blob="$(git -C repo rev-parse $tree:file)" &&
+	find repo/$packdir -type f -name "pack-*.idx" >packs.after &&
+
+	test_must_be_empty packs.before &&
+	test_line_count = 1 packs.after &&
+
+	git show-index <$(cat packs.after) >objects &&
+	test_line_count = 2 objects &&
+	grep "^[1-9][0-9]* $tree" objects &&
+	grep "^[1-9][0-9]* $blob" objects
+'
+
+test_expect_success 'merge-tree can write multiple packs with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		git config pack.packSizeLimit 512 &&
+
+		test_seq 512 >f &&
+
+		# "f" contains roughly ~2,000 bytes.
+		#
+		# Each side ("foo" and "bar") adds a small amount of data at the
+		# beginning and end of "base", respectively.
+		git add f &&
+		test_tick &&
+		git commit -m base &&
+		git branch -M main &&
+
+		git checkout -b foo main &&
+		{
+			echo foo && cat f
+		} >f.tmp &&
+		mv f.tmp f &&
+		git add f &&
+		test_tick &&
+		git commit -m foo &&
+
+		git checkout -b bar main &&
+		echo bar >>f &&
+		git add f &&
+		test_tick &&
+		git commit -m bar &&
+
+		find $packdir -type f -name "pack-*.idx" >packs.before &&
+		# Merging either side should result in a new object which is
+		# larger than 1M, thus the result should be split into two
+		# separate packs.
+		tree="$(git merge-tree --write-pack \
+			refs/heads/foo refs/heads/bar)" &&
+		blob="$(git rev-parse $tree:f)" &&
+		find $packdir -type f -name "pack-*.idx" >packs.after &&
+
+		test_must_be_empty packs.before &&
+		test_line_count = 2 packs.after &&
+		for idx in $(cat packs.after)
+		do
+			git show-index <$idx || return 1
+		done >objects &&
+
+		# The resulting set of packs should contain one copy of both
+		# objects, each in a separate pack.
+		test_line_count = 2 objects &&
+		grep "^[1-9][0-9]* $tree" objects &&
+		grep "^[1-9][0-9]* $blob" objects
+
+	)
+'
+
 test_done
-- 
2.42.0.405.gdb2a2f287e

^ permalink raw reply related

* [PATCH v2 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
From: Taylor Blau @ 2023-10-17 16:31 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697560266.git.me@ttaylorr.com>

The remaining missing piece in order to teach the `merge-tree` builtin
how to write the contents of a merge into a pack is a function to index
tree objects into a bulk-checkin pack.

This patch implements that missing piece, which is a thin wrapper around
all of the functionality introduced in previous commits.

If and when Git gains support for a "compatibility" hash algorithm, the
changes to support that here will be minimal. The bulk-checkin machinery
will need to convert the incoming tree to compute its length under the
compatibility hash, necessary to reconstruct its header. With that
information (and the converted contents of the tree), the bulk-checkin
machinery will have enough to keep track of the converted object's hash
in order to update the compatibility mapping.

Within `deflate_tree_to_pack_incore()`, the changes should be limited
to something like:

    struct strbuf converted = STRBUF_INIT;
    if (the_repository->compat_hash_algo) {
      if (convert_object_file(&compat_obj,
                              the_repository->hash_algo,
                              the_repository->compat_hash_algo, ...) < 0)
        die(...);

      format_object_header_hash(the_repository->compat_hash_algo,
                                OBJ_TREE, size);
    }
    /* compute the converted tree's hash using the compat algorithm */
    strbuf_release(&converted);

, assuming related changes throughout the rest of the bulk-checkin
machinery necessary to update the hash of the converted object, which
are likewise minimal in size.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 27 +++++++++++++++++++++++++++
 bulk-checkin.h |  4 ++++
 2 files changed, 31 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 25cd1ffa25..fe13100e04 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -423,6 +423,22 @@ static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
 						   OBJ_BLOB, path, flags);
 }
 
+static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
+				       struct object_id *result_oid,
+				       const void *buf, size_t size,
+				       const char *path, unsigned flags)
+{
+	git_hash_ctx ctx;
+	struct hashfile_checkpoint checkpoint = {0};
+
+	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE,
+				  size);
+
+	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
+						   result_oid, buf, size,
+						   OBJ_TREE, path, flags);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -514,6 +530,17 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 	return status;
 }
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_tree_to_pack_incore(&bulk_checkin_packfile, oid,
+						 buf, size, path, flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index 1b91daeaee..89786b3954 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -17,6 +17,10 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 				   const void *buf, size_t size,
 				   const char *path, unsigned flags);
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called
-- 
2.42.0.405.gdb2a2f287e


^ permalink raw reply related

* [PATCH v2 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
From: Taylor Blau @ 2023-10-17 16:31 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697560266.git.me@ttaylorr.com>

Now that we have factored out many of the common routines necessary to
index a new object into a pack created by the bulk-checkin machinery, we
can introduce a variant of `index_blob_bulk_checkin()` that acts on
blobs whose contents we can fit in memory.

This will be useful in a couple of more commits in order to provide the
`merge-tree` builtin with a mechanism to create a new pack containing
any objects it created during the merge, instead of storing those
objects individually as loose.

Similar to the existing `index_blob_bulk_checkin()` function, the
entrypoint delegates to `deflate_blob_to_pack_incore()`, which is
responsible for formatting the pack header and then deflating the
contents into the pack. The latter is accomplished by calling
deflate_blob_contents_to_pack_incore(), which takes advantage of the
earlier refactoring and is responsible for writing the object to the
pack and handling any overage from pack.packSizeLimit.

The bulk of the new functionality is implemented in the function
`stream_obj_to_pack_incore()`, which is a generic implementation for
writing objects of arbitrary type (whose contents we can fit in-core)
into a bulk-checkin pack.

The new function shares an unfortunate degree of similarity to the
existing `stream_blob_to_pack()` function. But DRY-ing up these two
would likely be more trouble than it's worth, since the latter has to
deal with reading and writing the contents of the object.

Consistent with the rest of the bulk-checkin mechanism, there are no
direct tests here. In future commits when we expose this new
functionality via the `merge-tree` builtin, we will test it indirectly
there.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h |   4 ++
 2 files changed, 122 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index f4914fb6d1..25cd1ffa25 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -140,6 +140,69 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
 	return 0;
 }
 
+static int stream_obj_to_pack_incore(struct bulk_checkin_packfile *state,
+				     git_hash_ctx *ctx,
+				     off_t *already_hashed_to,
+				     const void *buf, size_t size,
+				     enum object_type type,
+				     const char *path, unsigned flags)
+{
+	git_zstream s;
+	unsigned char obuf[16384];
+	unsigned hdrlen;
+	int status = Z_OK;
+	int write_object = (flags & HASH_WRITE_OBJECT);
+
+	git_deflate_init(&s, pack_compression_level);
+
+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
+	s.next_out = obuf + hdrlen;
+	s.avail_out = sizeof(obuf) - hdrlen;
+
+	if (*already_hashed_to < size) {
+		size_t hsize = size - *already_hashed_to;
+		if (hsize) {
+			the_hash_algo->update_fn(ctx, buf, hsize);
+		}
+		*already_hashed_to = size;
+	}
+	s.next_in = (void *)buf;
+	s.avail_in = size;
+
+	while (status != Z_STREAM_END) {
+		status = git_deflate(&s, Z_FINISH);
+		if (!s.avail_out || status == Z_STREAM_END) {
+			if (write_object) {
+				size_t written = s.next_out - obuf;
+
+				/* would we bust the size limit? */
+				if (state->nr_written &&
+				    pack_size_limit_cfg &&
+				    pack_size_limit_cfg < state->offset + written) {
+					git_deflate_abort(&s);
+					return -1;
+				}
+
+				hashwrite(state->f, obuf, written);
+				state->offset += written;
+			}
+			s.next_out = obuf;
+			s.avail_out = sizeof(obuf);
+		}
+
+		switch (status) {
+		case Z_OK:
+		case Z_BUF_ERROR:
+		case Z_STREAM_END:
+			continue;
+		default:
+			die("unexpected deflate failure: %d", status);
+		}
+	}
+	git_deflate_end(&s);
+	return 0;
+}
+
 /*
  * Read the contents from fd for size bytes, streaming it to the
  * packfile in state while updating the hash in ctx. Signal a failure
@@ -316,6 +379,50 @@ static void finalize_checkpoint(struct bulk_checkin_packfile *state,
 	}
 }
 
+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
+					       git_hash_ctx *ctx,
+					       struct hashfile_checkpoint *checkpoint,
+					       struct object_id *result_oid,
+					       const void *buf, size_t size,
+					       enum object_type type,
+					       const char *path, unsigned flags)
+{
+	struct pack_idx_entry *idx = NULL;
+	off_t already_hashed_to = 0;
+
+	/* Note: idx is non-NULL when we are writing */
+	if (flags & HASH_WRITE_OBJECT)
+		CALLOC_ARRAY(idx, 1);
+
+	while (1) {
+		prepare_checkpoint(state, checkpoint, idx, flags);
+		if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
+					       buf, size, type, path, flags))
+			break;
+		truncate_checkpoint(state, checkpoint, idx);
+	}
+
+	finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
+
+	return 0;
+}
+
+static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
+				       struct object_id *result_oid,
+				       const void *buf, size_t size,
+				       const char *path, unsigned flags)
+{
+	git_hash_ctx ctx;
+	struct hashfile_checkpoint checkpoint = {0};
+
+	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
+				  size);
+
+	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
+						   result_oid, buf, size,
+						   OBJ_BLOB, path, flags);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -396,6 +503,17 @@ int index_blob_bulk_checkin(struct object_id *oid,
 	return status;
 }
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid,
+						 buf, size, path, flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index aa7286a7b3..1b91daeaee 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid,
 			    int fd, size_t size,
 			    const char *path, unsigned flags);
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called
-- 
2.42.0.405.gdb2a2f287e


^ permalink raw reply related

* [PATCH v2 4/7] bulk-checkin: factor our `finalize_checkpoint()`
From: Taylor Blau @ 2023-10-17 16:31 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697560266.git.me@ttaylorr.com>

In a similar spirit as previous commits, factor out the routine to
finalize the just-written object from the bulk-checkin mechanism.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b92d7a6f5a..f4914fb6d1 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -292,6 +292,30 @@ static void truncate_checkpoint(struct bulk_checkin_packfile *state,
 	flush_bulk_checkin_packfile(state);
 }
 
+static void finalize_checkpoint(struct bulk_checkin_packfile *state,
+				git_hash_ctx *ctx,
+				struct hashfile_checkpoint *checkpoint,
+				struct pack_idx_entry *idx,
+				struct object_id *result_oid)
+{
+	the_hash_algo->final_oid_fn(result_oid, ctx);
+	if (!idx)
+		return;
+
+	idx->crc32 = crc32_end(state->f);
+	if (already_written(state, result_oid)) {
+		hashfile_truncate(state->f, checkpoint);
+		state->offset = checkpoint->offset;
+		free(idx);
+	} else {
+		oidcpy(&idx->oid, result_oid);
+		ALLOC_GROW(state->written,
+			   state->nr_written + 1,
+			   state->alloc_written);
+		state->written[state->nr_written++] = idx;
+	}
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -324,22 +348,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}
-	the_hash_algo->final_oid_fn(result_oid, &ctx);
-	if (!idx)
-		return 0;
-
-	idx->crc32 = crc32_end(state->f);
-	if (already_written(state, result_oid)) {
-		hashfile_truncate(state->f, &checkpoint);
-		state->offset = checkpoint.offset;
-		free(idx);
-	} else {
-		oidcpy(&idx->oid, result_oid);
-		ALLOC_GROW(state->written,
-			   state->nr_written + 1,
-			   state->alloc_written);
-		state->written[state->nr_written++] = idx;
-	}
+	finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid);
 	return 0;
 }
 
-- 
2.42.0.405.gdb2a2f287e


^ permalink raw reply related

* [PATCH v2 3/7] bulk-checkin: factor out `truncate_checkpoint()`
From: Taylor Blau @ 2023-10-17 16:31 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697560266.git.me@ttaylorr.com>

In a similar spirit as previous commits, factor our the routine to
truncate a bulk-checkin packfile when writing past the pack size limit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index c1f5450583..b92d7a6f5a 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -276,6 +276,22 @@ static void prepare_checkpoint(struct bulk_checkin_packfile *state,
 	}
 }
 
+static void truncate_checkpoint(struct bulk_checkin_packfile *state,
+				struct hashfile_checkpoint *checkpoint,
+				struct pack_idx_entry *idx)
+{
+	/*
+	 * Writing this object to the current pack will make
+	 * it too big; we need to truncate it, start a new
+	 * pack, and write into it.
+	 */
+	if (!idx)
+		BUG("should not happen");
+	hashfile_truncate(state->f, checkpoint);
+	state->offset = checkpoint->offset;
+	flush_bulk_checkin_packfile(state);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -304,16 +320,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
 					 fd, size, path, flags))
 			break;
-		/*
-		 * Writing this object to the current pack will make
-		 * it too big; we need to truncate it, start a new
-		 * pack, and write into it.
-		 */
-		if (!idx)
-			BUG("should not happen");
-		hashfile_truncate(state->f, &checkpoint);
-		state->offset = checkpoint.offset;
-		flush_bulk_checkin_packfile(state);
+		truncate_checkpoint(state, &checkpoint, idx);
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}
-- 
2.42.0.405.gdb2a2f287e


^ permalink raw reply related

* [PATCH v2 2/7] bulk-checkin: factor out `prepare_checkpoint()`
From: Taylor Blau @ 2023-10-17 16:31 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697560266.git.me@ttaylorr.com>

In a similar spirit as the previous commit, factor out the routine to
prepare streaming into a bulk-checkin pack into its own function. Unlike
the previous patch, this is a verbatim copy and paste.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index fd3c110d1c..c1f5450583 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -263,6 +263,19 @@ static void format_object_header_hash(const struct git_hash_algo *algop,
 	algop->init_fn(&checkpoint->ctx);
 }
 
+static void prepare_checkpoint(struct bulk_checkin_packfile *state,
+			       struct hashfile_checkpoint *checkpoint,
+			       struct pack_idx_entry *idx,
+			       unsigned flags)
+{
+	prepare_to_stream(state, flags);
+	if (idx) {
+		hashfile_checkpoint(state->f, checkpoint);
+		idx->offset = state->offset;
+		crc32_begin(state->f);
+	}
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -287,12 +300,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	already_hashed_to = 0;
 
 	while (1) {
-		prepare_to_stream(state, flags);
-		if (idx) {
-			hashfile_checkpoint(state->f, &checkpoint);
-			idx->offset = state->offset;
-			crc32_begin(state->f);
-		}
+		prepare_checkpoint(state, &checkpoint, idx, flags);
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
 					 fd, size, path, flags))
 			break;
-- 
2.42.0.405.gdb2a2f287e


^ permalink raw reply related

* [PATCH v2 1/7] bulk-checkin: factor out `format_object_header_hash()`
From: Taylor Blau @ 2023-10-17 16:31 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697560266.git.me@ttaylorr.com>

Before deflating a blob into a pack, the bulk-checkin mechanism prepares
the pack object header by calling `format_object_header()`, and writing
into a scratch buffer, the contents of which eventually makes its way
into the pack.

Future commits will add support for deflating multiple kinds of objects
into a pack, and will likewise need to perform a similar operation as
below.

This is a mostly straightforward extraction, with one notable exception.
Instead of hard-coding `the_hash_algo`, pass it in to the new function
as an argument. This isn't strictly necessary for our immediate purposes
here, but will prove useful in the future if/when the bulk-checkin
mechanism grows support for the hash transition plan.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 6ce62999e5..fd3c110d1c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -247,6 +247,22 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state,
 		die_errno("unable to write pack header");
 }
 
+static void format_object_header_hash(const struct git_hash_algo *algop,
+				      git_hash_ctx *ctx,
+				      struct hashfile_checkpoint *checkpoint,
+				      enum object_type type,
+				      size_t size)
+{
+	unsigned char header[16384];
+	unsigned header_len = format_object_header((char *)header,
+						   sizeof(header),
+						   type, size);
+
+	algop->init_fn(ctx);
+	algop->update_fn(ctx, header, header_len);
+	algop->init_fn(&checkpoint->ctx);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -254,8 +270,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 {
 	off_t seekback, already_hashed_to;
 	git_hash_ctx ctx;
-	unsigned char obuf[16384];
-	unsigned header_len;
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
 
@@ -263,11 +277,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	if (seekback == (off_t) -1)
 		return error("cannot find the current offset");
 
-	header_len = format_object_header((char *)obuf, sizeof(obuf),
-					  OBJ_BLOB, size);
-	the_hash_algo->init_fn(&ctx);
-	the_hash_algo->update_fn(&ctx, obuf, header_len);
-	the_hash_algo->init_fn(&checkpoint.ctx);
+	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
+				  size);
 
 	/* Note: idx is non-NULL when we are writing */
 	if ((flags & HASH_WRITE_OBJECT) != 0)
-- 
2.42.0.405.gdb2a2f287e


^ permalink raw reply related

* [PATCH v2 0/7] merge-ort: implement support for packing objects together
From: Taylor Blau @ 2023-10-17 16:31 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

(Previously based on 'eb/limit-bulk-checkin-to-blobs', which has since
been merged. This series is now based on the tip of 'master', which is
a9ecda2788 (The eighteenth batch, 2023-10-13) at the time of writing).

This series implements support for a new merge-tree option,
`--write-pack`, which causes any newly-written objects to be packed
together instead of being stored individually as loose.

Much is unchanged since last time, except for a small tweak to one of
the commit messages in response to feedback from Eric W. Biederman. The
series has also been rebased onto 'master', which had a couple of
conflicts that I resolved pertaining to:

  - 9eb5419799 (bulk-checkin: only support blobs in index_bulk_checkin,
    2023-09-26)
  - e0b8c84240 (treewide: fix various bugs w/ OpenSSL 3+ EVP API,
    2023-09-01)

They were mostly trivial resolutions, and the results can be viewed in
the range-diff included below.

(From last time: the motivating use-case behind these changes is to
better support repositories who invoke merge-tree frequently, generating
a potentially large number of loose objects, resulting in a possible
adverse effect on performance.)

Thanks in advance for any review!

Taylor Blau (7):
  bulk-checkin: factor out `format_object_header_hash()`
  bulk-checkin: factor out `prepare_checkpoint()`
  bulk-checkin: factor out `truncate_checkpoint()`
  bulk-checkin: factor our `finalize_checkpoint()`
  bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
  bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
  builtin/merge-tree.c: implement support for `--write-pack`

 Documentation/git-merge-tree.txt |   4 +
 builtin/merge-tree.c             |   5 +
 bulk-checkin.c                   | 258 ++++++++++++++++++++++++++-----
 bulk-checkin.h                   |   8 +
 merge-ort.c                      |  42 +++--
 merge-recursive.h                |   1 +
 t/t4301-merge-tree-write-tree.sh |  93 +++++++++++
 7 files changed, 363 insertions(+), 48 deletions(-)

Range-diff against v1:
1:  37f4072815 ! 1:  edf1cbafc1 bulk-checkin: factor out `format_object_header_hash()`
    @@ bulk-checkin.c: static void prepare_to_stream(struct bulk_checkin_packfile *stat
      }
      
     +static void format_object_header_hash(const struct git_hash_algo *algop,
    -+				      git_hash_ctx *ctx, enum object_type type,
    ++				      git_hash_ctx *ctx,
    ++				      struct hashfile_checkpoint *checkpoint,
    ++				      enum object_type type,
     +				      size_t size)
     +{
     +	unsigned char header[16384];
    @@ bulk-checkin.c: static void prepare_to_stream(struct bulk_checkin_packfile *stat
     +
     +	algop->init_fn(ctx);
     +	algop->update_fn(ctx, header, header_len);
    ++	algop->init_fn(&checkpoint->ctx);
     +}
     +
      static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
    @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *st
     -					  OBJ_BLOB, size);
     -	the_hash_algo->init_fn(&ctx);
     -	the_hash_algo->update_fn(&ctx, obuf, header_len);
    -+	format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
    +-	the_hash_algo->init_fn(&checkpoint.ctx);
    ++	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
    ++				  size);
      
      	/* Note: idx is non-NULL when we are writing */
      	if ((flags & HASH_WRITE_OBJECT) != 0)
2:  9cc1f3014a ! 2:  b3f89d5853 bulk-checkin: factor out `prepare_checkpoint()`
    @@ Commit message
     
      ## bulk-checkin.c ##
     @@ bulk-checkin.c: static void format_object_header_hash(const struct git_hash_algo *algop,
    - 	algop->update_fn(ctx, header, header_len);
    + 	algop->init_fn(&checkpoint->ctx);
      }
      
     +static void prepare_checkpoint(struct bulk_checkin_packfile *state,
3:  f392ed2211 = 3:  abe4fb0a59 bulk-checkin: factor out `truncate_checkpoint()`
4:  9c6ca564ad = 4:  0b855a6eb7 bulk-checkin: factor our `finalize_checkpoint()`
5:  30ca7334c7 ! 5:  239bf39bfb bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
    @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
      
     +static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
     +					       git_hash_ctx *ctx,
    ++					       struct hashfile_checkpoint *checkpoint,
     +					       struct object_id *result_oid,
     +					       const void *buf, size_t size,
     +					       enum object_type type,
     +					       const char *path, unsigned flags)
     +{
    -+	struct hashfile_checkpoint checkpoint = {0};
     +	struct pack_idx_entry *idx = NULL;
     +	off_t already_hashed_to = 0;
     +
    @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
     +		CALLOC_ARRAY(idx, 1);
     +
     +	while (1) {
    -+		prepare_checkpoint(state, &checkpoint, idx, flags);
    ++		prepare_checkpoint(state, checkpoint, idx, flags);
     +		if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
     +					       buf, size, type, path, flags))
     +			break;
    -+		truncate_checkpoint(state, &checkpoint, idx);
    ++		truncate_checkpoint(state, checkpoint, idx);
     +	}
     +
    -+	finalize_checkpoint(state, ctx, &checkpoint, idx, result_oid);
    ++	finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
     +
     +	return 0;
     +}
    @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
     +				       const char *path, unsigned flags)
     +{
     +	git_hash_ctx ctx;
    ++	struct hashfile_checkpoint checkpoint = {0};
     +
    -+	format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
    ++	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
    ++				  size);
     +
    -+	return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
    -+						   buf, size, OBJ_BLOB, path,
    -+						   flags);
    ++	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
    ++						   result_oid, buf, size,
    ++						   OBJ_BLOB, path, flags);
     +}
     +
      static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
6:  cb0f79cabb ! 6:  57613807d8 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
    @@ Commit message
         Within `deflate_tree_to_pack_incore()`, the changes should be limited
         to something like:
     
    +        struct strbuf converted = STRBUF_INIT;
             if (the_repository->compat_hash_algo) {
    -          struct strbuf converted = STRBUF_INIT;
               if (convert_object_file(&compat_obj,
                                       the_repository->hash_algo,
                                       the_repository->compat_hash_algo, ...) < 0)
    @@ Commit message
     
               format_object_header_hash(the_repository->compat_hash_algo,
                                         OBJ_TREE, size);
    -
    -          strbuf_release(&converted);
             }
    +        /* compute the converted tree's hash using the compat algorithm */
    +        strbuf_release(&converted);
     
         , assuming related changes throughout the rest of the bulk-checkin
         machinery necessary to update the hash of the converted object, which
    @@ Commit message
     
      ## bulk-checkin.c ##
     @@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
    - 						   flags);
    + 						   OBJ_BLOB, path, flags);
      }
      
     +static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
    @@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packf
     +				       const char *path, unsigned flags)
     +{
     +	git_hash_ctx ctx;
    ++	struct hashfile_checkpoint checkpoint = {0};
     +
    -+	format_object_header_hash(the_hash_algo, &ctx, OBJ_TREE, size);
    ++	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE,
    ++				  size);
     +
    -+	return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
    -+						   buf, size, OBJ_TREE, path,
    -+						   flags);
    ++	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
    ++						   result_oid, buf, size,
    ++						   OBJ_TREE, path, flags);
     +}
     +
      static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
7:  e969210145 ! 7:  f21400f56c builtin/merge-tree.c: implement support for `--write-pack`
    @@ merge-ort.c
       * We have many arrays of size 3.  Whenever we have such an array, the
     @@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
      		if ((merge_status < 0) || !result_buf.ptr)
    - 			ret = err(opt, _("Failed to execute internal merge"));
    + 			ret = error(_("failed to execute internal merge"));
      
     -		if (!ret &&
     -		    write_object_file(result_buf.ptr, result_buf.size,
     -				      OBJ_BLOB, &result->oid))
    --			ret = err(opt, _("Unable to add %s to database"),
    --				  path);
    +-			ret = error(_("unable to add %s to database"), path);
     +		if (!ret) {
     +			ret = opt->write_pack
     +				? index_blob_bulk_checkin_incore(&result->oid,
    @@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
     +						    result_buf.size,
     +						    OBJ_BLOB, &result->oid);
     +			if (ret)
    -+				ret = err(opt, _("Unable to add %s to database"),
    -+					  path);
    ++				ret = error(_("unable to add %s to database"),
    ++					    path);
     +		}
      
      		free(result_buf.ptr);
-- 
2.42.0.405.gdb2a2f287e

^ permalink raw reply

* Re: Git Pathspec bug
From: Junio C Hamano @ 2023-10-17 16:02 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: Moritz Widmann, git
In-Reply-To: <2c45e813-738a-480f-8c77-8c646df9c0e3@app.fastmail.com>

"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:

> On Tue, Oct 17, 2023, at 11:45, Moritz Widmann wrote:
>> I executed the following command in zsh (added `command` just to be sure 
>> that there's no aliases or functions)
>>
>> command git submodule add 
>> 'git@github.com:moritz-t-w/Godot-Onscreen-Keyboard.git' '.'
>> fatal: empty string is not a valid pathspec. please use . instead if you 
>> meant to match all paths
>>
>> Git Version: 2.42.0
>>
>> OS: Arch Linux
>
> Is this the same issue?: https://stackoverflow.com/a/53441183/1725151

It does look so.

It is correct to reject such a request to attempt to add a submodule
as if it is overlayed at the same level as its superproject [*].
But the error message is totally bogus, I think.  It is not that the
pathspec the end-user gave us is wrong (the user does not even give
a pathspec in this case---the last one must be a concrete path in
the superproject where the newly added submodule is), and the user
should not be told anything about "valid" pathspec.

Patches welcome ;-)

Thanks.

[Footnote]

* Our submodules do not allow such a layout (and "git add foo" in
  such an environment would not know to which repository between the
  submodule or the superproject that new file "foo" should be added,
  which is just one example why such a layout is not usable).

^ permalink raw reply

* Re: [PATCH v2 2/2] Prevent git from rehashing 4GiB files
From: Jason Hatton @ 2023-10-17 14:49 UTC (permalink / raw)
  To: Jeff King, brian m. carlson
  Cc: git@vger.kernel.org, Junio C Hamano, Taylor Blau
In-Reply-To: <20231017000019.GB551672@coredump.intra.peff.net>

From: Jeff King <peff@peff.net>

> On Thu, Oct 12, 2023 at 04:09:30PM +0000, brian m. carlson wrote:
> 
> > +static unsigned int munge_st_size(off_t st_size) {
> > +	unsigned int sd_size = st_size;
> > +
> > +	/*
> > +	 * If the file is an exact multiple of 4 GiB, modify the value so it
> > +	 * doesn't get marked as racily clean (zero).
> > +	 */
> > +	if (!sd_size && st_size)
> > +		return 0x80000000;
> > +	else
> > +		return sd_size;
> > +}
> 
> Coverity complained that the "true" side of this conditional is
> unreachable, since sd_size is assigned from st_size, so the two values
> cannot be both true and false. But obviously we are depending here on
> the truncation of off_t to "unsigned int". I'm not sure if Coverity is
> just dumb, or if it somehow has a different size for off_t.
> 
> I don't _think_ this would ever cause confusion in a real compiler, as
> assignment from a larger type to a smaller has well-defined truncation,
> as far as I know.
> 
> But I do wonder if an explicit "& 0xFFFFFFFF" would make it more obvious
> what is happening (which would also do the right thing if in some
> hypothetical platform "unsigned int" ended up larger than 32 bits).
> 
> -Peff

I originally wrote the code this way to work exactly like the original
code with one exception: Never truncate a nonzero st_size to a zero
sd_size. The original code is here in fill_stat_data:

I was attempting to use exactly the same implicit type conversion and
types as the original.

We could probably write the below to do the same thing.

void fill_stat_data(struct stat_data *sd, struct stat *st)
{
      sd->sd_ctime.sec = (unsigned int)st->st_ctime;
      sd->sd_mtime.sec = (unsigned int)st->st_mtime;
      sd->sd_ctime.nsec = ST_CTIME_NSEC(*st);
      sd->sd_mtime.nsec = ST_MTIME_NSEC(*st);
      sd->sd_dev = st->st_dev;
      sd->sd_ino = st->st_ino;
      sd->sd_uid = st->st_uid;
      sd->sd_gid = st->st_gid;
      sd->sd_size = st->st_size;
+      if (sd->sd_size == 0 && st->st_size!= 0) {
+            sd->sd_size = 1;
+      }
}

- Jason D. Hatton


^ permalink raw reply

* [PATCH v2 1/1] [OUTREACHY] add: standardize die() messages output.
From: Naomi Ibe @ 2023-10-17 11:39 UTC (permalink / raw)
  To: git; +Cc: Naomi Ibe

 builtin/add.c: clean up die() messages

    As described in the CodingGuidelines document, a single line
    message given to die() and its friends should not capitalize its
    first word, and should not add full-stop at the end.

Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com>
---
 builtin/add.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index c27254a5cd..5126d2ede3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -182,7 +182,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
 	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
 
 	if (repo_read_index(the_repository) < 0)
-		die(_("Could not read the index"));
+		die(_("could not read the index"));
 
 	repo_init_revisions(the_repository, &rev, prefix);
 	rev.diffopt.context = 7;
@@ -200,15 +200,15 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
 		die(_("editing patch failed"));
 
 	if (stat(file, &st))
-		die_errno(_("Could not stat '%s'"), file);
+		die_errno(_("could not stat '%s'"), file);
 	if (!st.st_size)
-		die(_("Empty patch. Aborted."));
+		die(_("empty patch. aborted"));
 
 	child.git_cmd = 1;
 	strvec_pushl(&child.args, "apply", "--recount", "--cached", file,
 		     NULL);
 	if (run_command(&child))
-		die(_("Could not apply '%s'"), file);
+		die(_("could not apply '%s'"), file);
 
 	unlink(file);
 	free(file);
@@ -568,7 +568,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 finish:
 	if (write_locked_index(&the_index, &lock_file,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die(_("Unable to write new index file"));
+		die(_("unable to write new index file"));
 
 	dir_clear(&dir);
 	clear_pathspec(&pathspec);
-- 
2.36.1.windows.1


^ permalink raw reply related

* Re: Method for Calculating Statistics of Developer Contribution to a Specified Branch.
From: Hongyi Zhao @ 2023-10-17 11:37 UTC (permalink / raw)
  To: brian m. carlson, Hongyi Zhao, Git List
In-Reply-To: <ZS2qZtYDvItovjqg@tapette.crustytoothpaste.net>

On Tue, Oct 17, 2023 at 5:26 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2023-10-16 at 14:10:01, Hongyi Zhao wrote:
> > Dear Git Mailing List,
> >
> > I am a developer currently working on a project and I wanted to
> > establish statistics for each team member's contribution to a specific
> > branch.
> >
> > Say, for a user "JianboLin", I am currently using the following method:
> >
> > $ git clone https://github.com/OrderN/CONQUEST-release.git
> > $ cd CONQUEST-release
> > $ git log --author="JianboLin" --stat --summary origin/f-mlff | awk
> > 'NF ==4 && $2 =="|" && $3 ~/[0-9]+/ && $4 ~/[+-]+|[+]+|[-]+/ {s+=$3}
> > END {print s}'
> >
> > Using the above command, I am able to calculate the number of lines
> > contributed by a specific author on a specific branch, which allows me
> > to quantify the contribution to a branch by each team member.
> >
> > However, I would like to know if a more efficient or accurate method
> > exists to carry out this task. Are there any other parameters,
> > commands, or aspects I need to consider to get a more comprehensive
> > measure of contribution?
>
> Can you maybe explain what you want to measure and what your goal is in
> doing so?
>
> The problem is that lines of code isn't really that useful as a measure
> of contribution value or developer productivity, which are the reasons
> people typically measure that metric.  For example, with three lines, a
> colleague fixed a persistently difficult-to-reproduce problem which had
> been affecting many of our largest customers.  That was a very valuable
> contribution, but not very large.  I've made similar kinds of changes
> myself, both at work and in open source projects.
>
> Certainly you can compute the number of lines of code changed by a
> developer, but that is not typically a very useful metric, since it
> doesn't lead you to any interesting conclusions about the benefits or
> value of the contributions or developer in question.  However, perhaps
> you have a different goal in mind, and if you can explain what that is,
> we may be able to help you find a better way of doing it.

I want to calculate a certain developer's contribution based on
different standards of code line count and the importance of the code.

> --
> brian m. carlson (he/him or they/them)
> Toronto, Ontario, CA

Regards,
Zhao

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox