Git development

Git development
 help / color / mirror / Atom feed

* [PATCH 1/1] [OUTREACHY] Fixed add.c file to conform to guidelines when using die() listed in issue #635
From: Naomi Ibe @ 2023-10-09  1:16 UTC (permalink / raw)
  To: git; +Cc: Naomi

From: Naomi <naomi.ibeh69@gmail.com>

Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com>
---
 builtin/add.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index c27254a5cd..5126d2ede3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -182,7 +182,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
 	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
 
 	if (repo_read_index(the_repository) < 0)
-		die(_("Could not read the index"));
+		die(_("could not read the index"));
 
 	repo_init_revisions(the_repository, &rev, prefix);
 	rev.diffopt.context = 7;
@@ -200,15 +200,15 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
 		die(_("editing patch failed"));
 
 	if (stat(file, &st))
-		die_errno(_("Could not stat '%s'"), file);
+		die_errno(_("could not stat '%s'"), file);
 	if (!st.st_size)
-		die(_("Empty patch. Aborted."));
+		die(_("empty patch. aborted"));
 
 	child.git_cmd = 1;
 	strvec_pushl(&child.args, "apply", "--recount", "--cached", file,
 		     NULL);
 	if (run_command(&child))
-		die(_("Could not apply '%s'"), file);
+		die(_("could not apply '%s'"), file);
 
 	unlink(file);
 	free(file);
@@ -568,7 +568,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 finish:
 	if (write_locked_index(&the_index, &lock_file,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die(_("Unable to write new index file"));
+		die(_("unable to write new index file"));
 
 	dir_clear(&dir);
 	clear_pathspec(&pathspec);
-- 
2.36.1.windows.1


^ permalink raw reply related

* [PATCH 0/1] *** EDITED add.c ***
From: Naomi Ibe @ 2023-10-09  1:18 UTC (permalink / raw)
  To: git; +Cc: Naomi Ibe

*** BLURB HERE ***

Naomi (1):
  [OUTREACHY] Fixed add.c file to conform to guidelines when using die()
    listed in issue #635

 builtin/add.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
2.36.1.windows.1


^ permalink raw reply

* Re: [PATCH] repack: free existing_cruft array after use
From: Taylor Blau @ 2023-10-09  1:24 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Patrick Steinhardt, Eric Sunshine
In-Reply-To: <20231007172031.GA1524950@coredump.intra.peff.net>

On Sat, Oct 07, 2023 at 01:20:31PM -0400, Jeff King wrote:
> On Mon, Oct 02, 2023 at 08:44:32PM -0400, Taylor Blau wrote:
>
> > +static void collapse_small_cruft_packs(FILE *in, size_t max_size,
> > +				       struct existing_packs *existing)
> > +{
> > +	struct packed_git **existing_cruft, *p;
> > +	struct strbuf buf = STRBUF_INIT;
> > [...]
> > +
> > +	strbuf_release(&buf);
> > +}
>
> Coverity (using the just-merged-to-next version of the workflow file!)
> flagged a leak here. Since the topic (tb/repack-max-cruft-size) is in
> 'next', I think we'd want this on top:

Woohoo! I'm glad that this is already paying dividends.

> -- >8 --
> Subject: [PATCH] repack: free existing_cruft array after use
>
> We allocate an array of packed_git pointers so that we can sort the list
> of cruft packs, but we never free the array, causing a small leak. Note
> that we don't need to free the packed_git structs themselves; they're
> owned by the repository object.

Thanks, I can't believe I missed this when writing this function. The
fix looks obviously correct to me, so this has my:

    Acked-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply

* Re: [OUTREACHY] git send-email issues
From: Naomi Ibe @ 2023-10-09  1:25 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git
In-Reply-To: <CACS=G2wnQEVqiUA8Jy=iaaAnRv7bMKmegR2rGR=Tbhf7UuLR4w@mail.gmail.com>

I finally got the hang of it!! Apparently I was supposed to set
encryption to "ssl" and serverport to 465, but for the past two days
I'd been using "tls" for encryption and serverport as 587, once I
exchanged them, all was done.
I wasn't able to get the hang of gitgitgadget tonight but I'd
definitely be looking into it in the coming days.

On Sun, Oct 8, 2023 at 11:37 PM Naomi Ibe <naomi.ibeh69@gmail.com> wrote:
>
> Oh wow
> I read about gitgadget but didn't understand how it works, I'd
> definitely check it out with these links you sent and I really hope it
> works. Thank you very very much
>
> On Sun, Oct 8, 2023 at 11:17 PM Kristoffer Haugsbakk
> <code@khaugsbakk.name> wrote:
> >
> > Hi Naomi
> >
> > On Sun, Oct 8, 2023, at 23:59, Naomi Ibe wrote:
> > > I've used the --smtp-debug tag, checked the official docs and other
> > > docs too, plus stackoverflow,google and even chatgpt, but nothing
> > > seems to be working. I've even had to change my gmail password tonight
> > > but it still doesn't work. Any tips at all would be greatly
> > > appreciated at this point. Thank you
> >
> > Here [1] is a good resource for setting up Gmail for git-send-email. It's
> > a bit of a chore to set up but that resource was enough for me to get it
> > working. One of the things that you are going to need is an App password.
> >
> > But have you considered using GitGitGadget instead?[2] You can make a pull
> > request on that repository and then the program (gitgitgadget) can send
> > the emails to this mailing list for you.
> >
> > [1] https://stackoverflow.com/a/68238913/1725151
> >
> > [2] https://github.com/gitgitgadget/gitgitgadget/
> >
> > --
> > Kristoffer

^ permalink raw reply

* Re: [PATCH 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
From: Taylor Blau @ 2023-10-09  1:31 UTC (permalink / raw)
  To: Eric Biederman; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
In-Reply-To: <E81727B0-A523-4A45-A606-61442357291D@gmail.com>

On Fri, Oct 06, 2023 at 10:07:20PM -0500, Eric Biederman wrote:
> On October 6, 2023 5:02:04 PM CDT, Taylor Blau <me@ttaylorr.com> wrote:
> >Within `deflate_tree_to_pack_incore()`, the changes should be limited
> >to something like:
> >
> >    if (the_repository->compat_hash_algo) {
> >      struct strbuf converted = STRBUF_INIT;
> >      if (convert_object_file(&compat_obj,
> >                              the_repository->hash_algo,
> >                              the_repository->compat_hash_algo, ...) < 0)
> >        die(...);
> >
> >      format_object_header_hash(the_repository->compat_hash_algo,
> >                                OBJ_TREE, size);
> >
> >      strbuf_release(&converted);
> >    }
> >
> >, assuming related changes throughout the rest of the bulk-checkin
> >machinery necessary to update the hash of the converted object, which
> >are likewise minimal in size.
>
> So this is close.   Just in case someone wants to
> go down this path I want to point out that
> the converted object need to have the compat hash computed over it.
>
> Which means that the strbuf_release in your example comes a bit early.

Doh. You're absolutely right. Let's fix this if/when we cross that
bridge ;-).

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Taylor Blau @ 2023-10-09  1:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren, Junio C Hamano, git, Eric W. Biederman
In-Reply-To: <20231008173329.GA1557002@coredump.intra.peff.net>

On Sun, Oct 08, 2023 at 01:33:29PM -0400, Jeff King wrote:
> On Sun, Oct 08, 2023 at 12:04:04PM -0400, Taylor Blau wrote:
>
> > > I was interested in the same question as Junio, but from a different
> > > angle.  fast-import documentation points out that the packs it creates
> > > are suboptimal with poorer delta choices.  Are the packs created by
> > > bulk-checkin prone to the same issues?  When I was thinking in terms
> > > of having "git merge" use fast-import for pack creation instead of
> > > writing loose objects (an idea I never investigated very far), I was
> > > wondering if I'd need to mark those packs as "less optimal" and do
> > > something to make sure they were more likely to be repacked.
> > >
> > > I believe geometric repacking didn't exist back when I was thinking
> > > about this, and perhaps geometric repacking automatically handles
> > > things nicely for us.  Does it, or are we risking retaining
> > > sub-optimal deltas from the bulk-checkin code?
> > >
> > > (I've never really cracked open the pack code, so I have absolutely no
> > > idea; I'm just curious.)
> >
> > Yes, the bulk-checkin mechanism suffers from an even worse problem which
> > is the pack it creates will contain no deltas whatsoever. The contents
> > of the pack are just getting written as-is, so there's no fancy
> > delta-ficiation going on.
>
> I wonder how big a deal this would be in practice for merges.
> pack-objects will look for deltas between any two candidates objects,
> but in practice I think most deltas are between objects from multiple
> commits (across the "time" dimension, if you will) rather than within a
> single tree (the "space" dimension). And a merge operation is generally
> creating a single new tree (recursive merging may create intermediate
> states which would delta, but we don't actually need to keep those
> intermediate ones. I won't be surprised if we do, though).
>
> We should be able to test that theory by looking at existing deltas.
> Here's a script which builds an index of blobs and trees to the commits
> that introduce them:
>
>   git rev-list HEAD |
>   git diff-tree --stdin -r -m -t --raw |
>   perl -lne '
>     if (/^[0-9a-f]/) {
>       $commit = $_;
>     } elsif (/^:\S+ \S+ \S+ (\S+)/) {
>       $h{$1} = $commit;
>     }
>     END { print "$_ $h{$_}" for keys(%h) }
>   ' >commits.db
>
> And then we can see which deltas come from the same commit:
>
>   git cat-file --batch-all-objects --batch-check='%(objectname) %(deltabase)' |
>   perl -alne '
>     BEGIN {
>       open(my $fh, "<", "commits.db");
>       %commit = map { chomp; split } <$fh>;
>     }
>     next if $F[1] =~ /0{40}/; # not a delta
>     next unless defined $commit{$F[0]}; # not in index
>     print $commit{$F[0]} eq $commit{$F[1]} ? "inner" : "outer", " ", $_;
>   '
>
> In git.git, I see 460 "inner" deltas, and 193,081 "outer" ones. The
> inner ones are mostly small single-file test vectors, which makes sense.
> It's possible to have a merge result that does conflict resolution in
> two such files (that would then delta), but it seems like a fairly
> unlikely case. Numbers for linux.git are similar.
>
> So it might just not be a big issue at all for this use case.

Very interesting, thanks for running (and documenting!) this experiment.
I'm mostly with you that it probably doesn't make a huge difference in
practice here.

One thing that I'm not entirely clear on is how we'd treat objects that
could be good delta candidates for each other between two packs. For
instance, if I write a tree corresponding to the merge between two
branches, it's likely that the resulting tree would be a good delta
candidate against either of the trees at the tips of those two refs.

But we won't pack those trees (the ones at the tips of the refs) in the
same pack as the tree containing their merge. If we later on tried to
repack, would we evaluate the tip trees as possible delta candidates
against the merged tree? Or would we look at the merged tree, realize it
isn't delta'd with anything, and then not attempt to find any
candidates?

> > I think Michael Haggerty (?) suggested to me off-list that it might be
> > interesting to have a flag that we could mark packs with bad/no deltas
> > as such so that we don't implicitly trust their contents as having high
> > quality deltas.
>
> I was going to suggest the same thing. ;) Unfortunately it's a bit
> tricky to do as we have no room in the file format for an optional flag.
> You'd have to add a ".mediocre-delta" file or something.

Yeah, I figured that we'd add a new ".baddeltas" file or something. (As
an aside, we probably should have an optional flags section in the .pack
format, since we seem to have a lot of optional pack extensions: .rev,
.bitmap, .keep, .promisor, etc.)

> But here's another approach. I recall discussing a while back the idea
> that we should not necessarily trust the quality of deltas in packs that
> are pushed (and I think Thomas Gummerer even did some experiments inside
> GitHub with those, though I don't remember the results). And one way
> around that is during geometric repacking to consider the biggest/oldest
> pack as "preferred", reuse its deltas, but always compute from scratch
> with the others (neither reusing on-disk deltas, nor skipping
> try_delta() when two objects come from the same pack).
>
> That same strategy would work here (and for incremental fast-import
> packs, though of course not if your fast-import pack is the "big" one
> after you do a from-scratch import).

Yeah, definitely. I think that that code is still living in GitHub's
fork, but inactive since we haven't set any of the relevant
configuration in GitHub's production environment.

> Possibly it exacerbates the "no deltas" issue from above (though it
> would depend on the command).  The bigger question to me is one of
> checkpointing. When do we finish off the pack with a .idx and make it
> available to other readers? We could do it at program exit, but I
> suspect there are some commands that really want to make objects
> available sooner (e.g., as soon as "git add" writes an index, we'd want
> those objects to already be available). Probably every program that
> writes objects would need to be annotated with a checkpoint call (which
> would be a noop in loose object mode).
>
> So maybe it's a dumb direction. I dunno.

I wouldn't say it's a dumb direction ;-). But I'd be hesitant pursuing
it without solving the "no deltas" question from earlier.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 00/25] Documentation fixes
From: Taylor Blau @ 2023-10-09  1:44 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
In-Reply-To: <pull.1595.git.1696747527.gitgitgadget@gmail.com>

On Sun, Oct 08, 2023 at 06:45:02AM +0000, Elijah Newren via GitGitGadget wrote:
> It turns out that AI is pretty good at making small fixes to documentation;
> certainly not perfect, but it provides quite good signal. Unfortunately,
> there is a lot to sift through. Some points about my strategy:

Quite interesting ;-).

I'm curious to learn a little bit more about your
strategy beyond what you wrote:

  - What tool did you use? ChatGPT? Something home-grown?
  - (Assuming this was generated by some sort of LLM): what did you
    prompt it with?
  - What was the output format: the edited text in its entirety, or a
    patch that can be applied on top?

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 1/1] [OUTREACHY] Fixed add.c file to conform to guidelines when using die() listed in issue #635
From: Christian Couder @ 2023-10-09  7:27 UTC (permalink / raw)
  To: Naomi Ibe; +Cc: git
In-Reply-To: <20231009011652.1791-1-naomi.ibeh69@gmail.com>

On Mon, Oct 9, 2023 at 3:17 AM Naomi Ibe <naomi.ibeh69@gmail.com> wrote:
>
> From: Naomi <naomi.ibeh69@gmail.com>

First the subject should start, after "[PATCH 1/1][Outreachy]", with
the area of the code you are changing followed by ":", so here "add:"
(no need for ".c").

Also even if the subject gives a lot of information already, it's
better to use the body of the commit message to give a bit more
context and details. For example here either the subject or the body
of the commit message should say which specific guideline(s) the patch
is enforcing.

> Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com>
> ---
>  builtin/add.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>

Otherwise the patch looks good to me.

Thanks!

^ permalink raw reply

* Re: [PATCH 0/1] *** EDITED add.c ***
From: Christian Couder @ 2023-10-09  7:38 UTC (permalink / raw)
  To: Naomi Ibe; +Cc: git
In-Reply-To: <20231009011842.1956-1-naomi.ibeh69@gmail.com>

On Mon, Oct 9, 2023 at 3:19 AM Naomi Ibe <naomi.ibeh69@gmail.com> wrote:
>
> *** BLURB HERE ***

A patch with number 0/X (where X is any number) is called a "cover
letter" and is supposed to explain the purpose of the patch series it
is part of. Usually a cover letter is not useful for a single patch
long patch series (when X is 1), as the commit message of the patch
should be enough.

Also things like "*** BLURB HERE ***" (including the stars) are
supposed to be replaced by a subject or some comments to explain
things.

^ permalink raw reply

* Re: [PATCH 0/1] *** EDITED add.c ***
From: Naomi Ibe @ 2023-10-09  7:45 UTC (permalink / raw)
  To: Christian Couder; +Cc: git
In-Reply-To: <CAP8UFD3HZOMJTxW5EkUwKu48GebSKX3-EPD8tjGEQnE2MGaZ7w@mail.gmail.com>

Okay I understand it better now ,can I do anything about it? Or should
I leave it as is?.
It's my first time using the "git patch" and "git send-email"
commands, so it was a real struggle understanding the proper way to
get things setup.

On Mon, Oct 9, 2023 at 8:39 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Mon, Oct 9, 2023 at 3:19 AM Naomi Ibe <naomi.ibeh69@gmail.com> wrote:
> >
> > *** BLURB HERE ***
>
> A patch with number 0/X (where X is any number) is called a "cover
> letter" and is supposed to explain the purpose of the patch series it
> is part of. Usually a cover letter is not useful for a single patch
> long patch series (when X is 1), as the commit message of the patch
> should be enough.
>
> Also things like "*** BLURB HERE ***" (including the stars) are
> supposed to be replaced by a subject or some comments to explain
> things.

^ permalink raw reply

* Re: [PATCH 1/1] [OUTREACHY] Fixed add.c file to conform to guidelines when using die() listed in issue #635
From: Naomi Ibe @ 2023-10-09  7:57 UTC (permalink / raw)
  To: Christian Couder; +Cc: git
In-Reply-To: <CAP8UFD35DBBwQ1Mgc+NGVoh1ReLncAz9OJF3Yj++FFrESw8rtw@mail.gmail.com>

Thank you very much for the feedback Christian, I'd definitely keep
this in mind the next time I'm creating/sending a patch.
You also said asides working on my commit message, the patch looks
good, but as I'm applying to Outreachy, I need to send in my proof of
contribution as a link of a task I worked on.
Would sending a link of the mailing list , with the url of this patch
I just worked on suffice?Something like this :
https://public-inbox.org/git/20231009011652.1791-1-naomi.ibeh69@gmail.com/T/#u
I ask because since I used patches to send to Git , I have no pull
request link to submit.

On Mon, Oct 9, 2023 at 8:28 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Mon, Oct 9, 2023 at 3:17 AM Naomi Ibe <naomi.ibeh69@gmail.com> wrote:
> >
> > From: Naomi <naomi.ibeh69@gmail.com>
>
> First the subject should start, after "[PATCH 1/1][Outreachy]", with
> the area of the code you are changing followed by ":", so here "add:"
> (no need for ".c").
>
> Also even if the subject gives a lot of information already, it's
> better to use the body of the commit message to give a bit more
> context and details. For example here either the subject or the body
> of the commit message should say which specific guideline(s) the patch
> is enforcing.
>
> > Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com>
> > ---
> >  builtin/add.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
>
> Otherwise the patch looks good to me.
>
> Thanks!

^ permalink raw reply

* Re: [PATCH 0/1] *** EDITED add.c ***
From: Christian Couder @ 2023-10-09  8:03 UTC (permalink / raw)
  To: Naomi Ibe; +Cc: git
In-Reply-To: <CACS=G2xfUYVmvhZ_r447uArW-_6yK0r7V9BkzX7i+E7MvruA9Q@mail.gmail.com>

On Mon, Oct 9, 2023 at 9:46 AM Naomi Ibe <naomi.ibeh69@gmail.com> wrote:
>
> Okay I understand it better now ,can I do anything about it? Or should
> I leave it as is?.

I think you should send a version 2 of the patch (so without a cover
letter) taking into account the comments I made about it. You might
want to wait a bit before sending version 2 though, so that others
might have time to read your patch and give you more feedback about
it.

> It's my first time using the "git patch

I guess you mean `git format-patch`.

> and "git send-email"
> commands, so it was a real struggle understanding the proper way to
> get things setup.

Yeah, it's not easy to set things up, but once that's done it should
be much easier. To make things even easier, you could perhaps save
somewhere the commands that you have previously used. And to improve
or make changes to those commands, you could take a look at the
documentation for `git format-patch` and `git send-email` which have a
number of interesting options, and take note of those interesting
options.

^ permalink raw reply

* Re: [PATCH 0/1] *** EDITED add.c ***
From: Naomi Ibe @ 2023-10-09  8:14 UTC (permalink / raw)
  To: Christian Couder; +Cc: git
In-Reply-To: <CAP8UFD3Cj7nC704Rti8jBp6VxBJDxsBpf_8vvordW8MWXc_GDA@mail.gmail.com>

Okay then, I'd wait for additional feedback and send in a Version 2
later. Thank you

On Mon, Oct 9, 2023 at 9:03 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Mon, Oct 9, 2023 at 9:46 AM Naomi Ibe <naomi.ibeh69@gmail.com> wrote:
> >
> > Okay I understand it better now ,can I do anything about it? Or should
> > I leave it as is?.
>
> I think you should send a version 2 of the patch (so without a cover
> letter) taking into account the comments I made about it. You might
> want to wait a bit before sending version 2 though, so that others
> might have time to read your patch and give you more feedback about
> it.
>
> > It's my first time using the "git patch
>
> I guess you mean `git format-patch`.
>
> > and "git send-email"
> > commands, so it was a real struggle understanding the proper way to
> > get things setup.
>
> Yeah, it's not easy to set things up, but once that's done it should
> be much easier. To make things even easier, you could perhaps save
> somewhere the commands that you have previously used. And to improve
> or make changes to those commands, you could take a look at the
> documentation for `git format-patch` and `git send-email` which have a
> number of interesting options, and take note of those interesting
> options.

^ permalink raw reply

* Re: [PATCH 1/1] [OUTREACHY] Fixed add.c file to conform to guidelines when using die() listed in issue #635
From: Christian Couder @ 2023-10-09  8:22 UTC (permalink / raw)
  To: Naomi Ibe; +Cc: git
In-Reply-To: <CACS=G2zcU+o6av=CQy7WAG7DZmNEERcPqB_W3Cmub4w25V3K4g@mail.gmail.com>

On Mon, Oct 9, 2023 at 9:57 AM Naomi Ibe <naomi.ibeh69@gmail.com> wrote:
>
> Thank you very much for the feedback Christian, I'd definitely keep
> this in mind the next time I'm creating/sending a patch.

Great!

> You also said asides working on my commit message, the patch looks
> good,

Yeah, but first others might find some issues in the patch itself,
also, as I say elsewhere, I think it's worth it for you to send a
version 2 of the patch that fixes the commit message issues I pointed
out.

> but as I'm applying to Outreachy, I need to send in my proof of
> contribution as a link of a task I worked on.
> Would sending a link of the mailing list , with the url of this patch
> I just worked on suffice?Something like this :
> https://public-inbox.org/git/20231009011652.1791-1-naomi.ibeh69@gmail.com/T/#u
> I ask because since I used patches to send to Git , I have no pull
> request link to submit.

Yeah, links to the emails you sent in a mailing list archive (Public
Inbox or lore.kernel.org/git) are fine.

^ permalink raw reply

* Re: [PATCH] doc/cat-file: clarify description regarding various command forms
From: Štěpán Němec @ 2023-10-09  8:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, avarab
In-Reply-To: <20231005210125.GA981206@coredump.intra.peff.net>

On Thu, 5 Oct 2023 17:01:25 -0400
Jeff King wrote:

> On Thu, Oct 05, 2023 at 07:48:52PM +0200, Štěpán Němec wrote:
>
>> > Yeah, I think that is a big improvement over the status quo. I might
>> > also be worth starting with a single-sentence overview of what is common
>> > to both modes. Something like:
>> >
>> >   Output the contents or details of one or more objects. [...]
>> 
>> I thought about that when proposing the rewrite, but feel that it would
>> again just duplicate what's said elsewhere, in this case even before,
>> not after, in the very first line of the man page:
>> 
>>     git-cat-file - Provide content or type and size information for
>>     repository objects
>
> Ah, true, I was thinking that the DESCRIPTION section would be the first
> thing users would read, but I didn't notice the headline. I agree that
> what it says is probably sufficient (though arguably "type and size" is
> slightly inaccurate there; I said "details" in my proposed text but
> maybe that is too vague).

We could also leave the NAME vague(r) and put an additional sentence at
the beginning of DESCRIPTION:

NAME
    git-cat-file - Provide contents or details of repository objects

SYNOPSIS
    [...]

DESCRIPTION
    Output the contents or other properties such as size, type or delta
    information of one or more objects.

    The command can operate [...]

-- 
Štěpán

^ permalink raw reply

* Re: [Outreachy] Move existing tests to a unit testing framework
From: Luma @ 2023-10-09  9:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder
In-Reply-To: <CAFR+8DyN8vbuvdgZPkSVqS2=sqconwhx3QfcpJ0+Wi_oCA=s0w@mail.gmail.com>

Dear Git Community and Mentors,

I hope this email finds you well. My name is Achu Luma, and I am
excited to submit my application for the Outreachy program with the
Git project.
I have been a passionate open-source enthusiast and a dedicated Git
user for two years, and I am thrilled at the opportunity to contribute
to the Git community.

Introduction:
----------------
I study Computer Science from the University of Bamenda. Over the past
4 years, I have gained experience in software development and have
participated in various class projects.

Why I am a Good Fit:
----------------------
1. Proficient with Git: I have a good understanding of Git's version
control system and have successfully used it in both personal and
educational projects.

2. Strong Programming Skills: My programming skills in python, C etc
and experience with git, shell etc make me well-prepared to contribute
to Git's codebase.

3. Open Source Involvement: I have actively contributed to git
open-source project, including
https://public-inbox.org/git/20231003174853.1732-1-ach.lumap@gmail.com/T/#t
, where I have submitted a patch that has been well-received.

Project Idea - Moving Existing Tests to a Unit Testing Framework:
------------------------------------------------------------------
I am excited about "Moving Existing Tests to a Unit Testing Framework".
The objective of this project is to enhance the efficiency and
maintainability of Git's testing infrastructure by porting existing
unit tests to a unit testing framework.

**Project Plan**:
- Evaluate the existing tests in the `t/helper/` directory to identify
those suitable for migration to the unit testing framework.
- Develop a migration strategy and create detailed plans for adapting
these tests.
- Port the identified tests to the unit testing framework while
ensuring they maintain their functionality.
- Verify the correctness and reliability of the migrated tests through
thorough testing and validation.
- Collaborate with the Git community to gather feedback and make
necessary adjustments.

**Timeline**:
- Community Bonding (Oct 2 - Nov 20): Familiarize myself with the Git
project and establish communication channels.
- Coding Phase (Dec 4 - Jan 15): Implement the migration of tests and
seek feedback from mentors and the community.
- Testing and Validation (Jan 15 - Feb 15): Rigorously test the
migrated tests and make improvements based on feedback.
- Documentation and Finalization (Feb 15 - March 1): Document the
migration process and finalize the project.

**Contribution to Git Community**:
I have actively participated in Git's mailing-list discussions and
submitted a patch(
https://public-inbox.org/git/20231003174853.1732-1-ach.lumap@gmail.com/T/#t)
for review. I have received positive feedback on my contributions, and
it has been queued to be merged into official Git branches maintained
by Junio. Additionally, I have been involved in discussions related to
the git project.(https://public-inbox.org/git/CAFR+8DyN8vbuvdgZPkSVqS2=sqconwhx3QfcpJ0+Wi_oCA=s0w@mail.gmail.com/T/#t)

**Proposal Drafts**:
I have shared drafts of this proposal on the Git mailing list
git@vger.kernel.org  and will incorporate valuable feedback provided
by the community.

**Next Steps**:
I am eager to discuss my proposal further and collaborate with the Git
community to ensure the success of this project. I will continue to
engage with the community, seek guidance, and refine my proposal as
per your suggestions.

Thank you for considering my application. I look forward to the
opportunity to contribute to the Git project and help make it even
more robust and reliable.

Best Regards,

On Wed, Oct 4, 2023 at 12:36 AM Luma <ach.lumap@gmail.com> wrote:
>
> oh yes, "Move existing tests to a unit testing framework" was the
> only listed project for this current Outreachy cohort. So, I used it
> to express my intent.
> I appreciate the clarification on authorship identity for patches. I
> will update subsequent patches with a legal full name to conform to
> the community rules.
>
> Regards.
>
> On Tue, Oct 3, 2023 at 7:51 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Luma <ach.lumap@gmail.com> writes:
> >
> > > Hi;
> > > My name is Luma, and  I wanted to take a moment to introduce myself
> > > and share some
> > > insights on an essential aspect of  avoiding pipes in git related
> > > commands in test scripts.
> > >
> > > I am an outreachy applicant for the December 2023 cohort and look
> > > forward to learning from you.
> >
> > I notice that the title of the message and the immediate topic you
> > discuss in the body of the message do not match.  I presume that the
> > topic on the title is what you prefer to work on if the unit testing
> > framework is ready by the time Outreachy program starts, and the
> > mention about "do not clobber exit code of Git with pipes in the
> > tests" is your "dip the tip of a toe in water" microproject?
> >
> > Welcome to the Git development community.
> >
> > Do you have a single word name?  If so please disregard the below,
> > but in case "Luma" is just a nickname (e.g. like I am introducing
> > myself to my Git friends "Hi, I am Gitster!") you use online, please
> > read on.
> >
> > For signing off your patches, we'd prefer to see your real name
> > used, as Signed-off-by: is meant to have legal significance.  And
> > because we also expect the authorship identity to match the
> > name/e-mail of the sign-off, it would mean your patch submissions
> > are expected to look like:
> >
> >         From: Luma <ach.lumap@gmail.com>
> >         Subject: ... title of the patch goes here ...
> >
> >         ... body of the proposed commit log message goes here...
> >
> >         Signed-off-by: Luma <ach.lumap@gmail.com>
> >
> > but "Luma" replaced with your full real name.
> >
> > Thanks.

^ permalink raw reply

* Re: [PATCH v6] merge-tree: add -X strategy option
From: Phillip Wood @ 2023-10-09  9:58 UTC (permalink / raw)
  To: Izzy via GitGitGadget, git; +Cc: Elijah Newren, Jeff King, Izzy
In-Reply-To: <pull.1565.v6.git.1695522222723.gitgitgadget@gmail.com>

On 24/09/2023 03:23, Izzy via GitGitGadget wrote:

Thanks for updating the patch, sorry it has taken me a while to look at 
this.

> From: Tang Yuyi <winglovet@gmail.com>
> 
> Add merge strategy option to produce more customizable merge result such
> as automatically resolving conflicts.
> 
> Signed-off-by: Tang Yuyi <winglovet@gmail.com>
> ---

>   static int real_merge(struct merge_tree_options *o,
> @@ -423,7 +425,7 @@ static int real_merge(struct merge_tree_options *o,
>   {
>   	struct commit *parent1, *parent2;
>   	struct commit_list *merge_bases = NULL;
> -	struct merge_options opt;
> +	struct merge_options opt = o->merge_options;

Copying struct merge_options by value here is unusual in our code base. 
I don't think it introduces a bug (there is no function to free the 
resources allocated in struct merge_options so we do not end up freeing 
them twice for example) but it would be clearer that it was safe if you did

	struct merge_options *opt = &o->merge_options;

and updated the code to reflect that opt is now a pointer or just 
replaced all uses of "opt" with "o->merge_options"

> +	if (xopts.nr && o.mode == MODE_TRIVIAL)
> +		die(_("--trivial-merge is incompatible with all other options"));

This is good, thanks for adding it.

> +	for (int x = 0; x < xopts.nr; x++)

This is not worth a re-roll on its own but as xopts.nr is a size_t we 
should declare x to be the same type rather than an int.

The tests are pretty minimal, ideally we would check that "-X" works 
with "--stdin", that it is rejected by a trivial merge and that one of 
the other strategy options works.

Best Wishes

Phillip

^ permalink raw reply

* Re: What's cooking in git.git (Oct 2023, #03; Fri, 6)
From: Phillip Wood @ 2023-10-09  9:59 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Elijah Newren, Izzy
In-Reply-To: <xmqqh6n24zf1.fsf@gitster.g>

On 07/10/2023 09:20, Junio C Hamano wrote:
> * ty/merge-tree-strategy-options (2023-09-25) 1 commit
>    (merged to 'next' on 2023-09-29 at aa65b54416)
>   + merge-tree: add -X strategy option
> 
>   "git merge-tree" learned to take strategy backend specific options
>   via the "-X" option, like "git merge" does.
> 
>   Will merge to 'master'.
>   source: <pull.1565.v6.git.1695522222723.gitgitgadget@gmail.com>

We may want to hold off on this as there is a wierd copy-by-value of 
struct merge-options cf <a482d047-dd40-436d-8daa-0c74780af11f@gmail.com>

Best Wishes

Phillip

^ permalink raw reply

* Re: [PATCH 0/4] Performance improvement & cleanup in loose ref iteration
From: Patrick Steinhardt @ 2023-10-09 10:04 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
In-Reply-To: <pull.1594.git.1696615769.gitgitgadget@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5313 bytes --]

On Fri, Oct 06, 2023 at 06:09:25PM +0000, Victoria Dye via GitGitGadget wrote:
> While investigating ref iteration performance in builtins like
> 'for-each-ref' and 'show-ref', I found two small improvement opportunities.
> 
> The first patch tweaks the logic around prefix matching in
> 'cache_ref_iterator_advance' so that we correctly skip refs that do not
> actually match a given prefix. The unnecessary iteration doesn't seem to be
> causing any bugs in the ref iteration commands that I've tested, but it
> doesn't hurt to be more precise (and it helps with some other patches I'm
> working on ;) ).
> 
> The next three patches update how 'loose_fill_ref_dir' determines the type
> of ref cache entry to create (directory or regular). On platforms that
> include d_type information in 'struct dirent' (as far as I can tell, all
> except NonStop & certain versions of Cygwin), this allows us to skip calling
> 'stat'. In ad-hoc testing, this improved performance of 'git for-each-ref'
> by about 20%.

I've done a small set of benchmarks with my usual test repositories,
which is linux.git with a bunch of references added. The repository
comes in four sizes:

- small: 50k references
- medium: 500k references
- high:  1.1m references
- huge: 12m references

Unfortunately, I couldn't really reproduce the performance improvements.
In fact, the new version runs consistently a tiny bit slower than the
old version:

    # Old version, which is 3a06386e31 (The fifteenth batch, 2023-10-04).

    Benchmark 1: git for-each-ref (revision=old,refcount=small)
      Time (mean ± σ):     135.5 ms ±   1.2 ms    [User: 76.4 ms, System: 59.0 ms]
      Range (min … max):   134.8 ms … 136.9 ms    3 runs

    Benchmark 2: git for-each-ref (revision=old,refcount=medium)
      Time (mean ± σ):     822.7 ms ±   2.2 ms    [User: 697.4 ms, System: 125.1 ms]
      Range (min … max):   821.1 ms … 825.2 ms    3 runs

    Benchmark 3: git for-each-ref (revision=old,refcount=high)
      Time (mean ± σ):      1.960 s ±  0.015 s    [User: 1.702 s, System: 0.257 s]
      Range (min … max):    1.944 s …  1.973 s    3 runs

    # New version, which is your tip.

    Benchmark 4: git for-each-ref (revision=old,refcount=huge)
      Time (mean ± σ):     16.815 s ±  0.054 s    [User: 15.091 s, System: 1.722 s]
      Range (min … max):   16.760 s … 16.869 s    3 runs

    Benchmark 5: git for-each-ref (revision=new,refcount=small)
      Time (mean ± σ):     136.0 ms ±   0.2 ms    [User: 78.8 ms, System: 57.1 ms]
      Range (min … max):   135.8 ms … 136.2 ms    3 runs

    Benchmark 6: git for-each-ref (revision=new,refcount=medium)
      Time (mean ± σ):     830.4 ms ±  21.2 ms    [User: 691.3 ms, System: 138.7 ms]
      Range (min … max):   814.2 ms … 854.5 ms    3 runs

    Benchmark 7: git for-each-ref (revision=new,refcount=high)
      Time (mean ± σ):      1.966 s ±  0.013 s    [User: 1.717 s, System: 0.249 s]
      Range (min … max):    1.952 s …  1.978 s    3 runs

    Benchmark 8: git for-each-ref (revision=new,refcount=huge)
      Time (mean ± σ):     16.945 s ±  0.037 s    [User: 15.182 s, System: 1.760 s]
      Range (min … max):   16.910 s … 16.983 s    3 runs

    Summary
      git for-each-ref (revision=old,refcount=small) ran
        1.00 ± 0.01 times faster than git for-each-ref (revision=new,refcount=small)
        6.07 ± 0.06 times faster than git for-each-ref (revision=old,refcount=medium)
        6.13 ± 0.17 times faster than git for-each-ref (revision=new,refcount=medium)
       14.46 ± 0.17 times faster than git for-each-ref (revision=old,refcount=high)
       14.51 ± 0.16 times faster than git for-each-ref (revision=new,refcount=high)
      124.09 ± 1.15 times faster than git for-each-ref (revision=old,refcount=huge)
      125.05 ± 1.12 times faster than git for-each-ref (revision=new,refcount=huge)

The performance regression isn't all that concerning, but it makes me
wonder why I see things becoming slower rather than faster. My guess is
that this is because all my test repositories are well-packed and don't
have a lot of loose references. But I just wanted to confirm how you
benchmarked your change and what the underlying shape of your test repo
was.

Patrick

> Thanks!
> 
>  * Victoria
> 
> Victoria Dye (4):
>   ref-cache.c: fix prefix matching in ref iteration
>   dir.[ch]: expose 'get_dtype'
>   dir.[ch]: add 'follow_symlink' arg to 'get_dtype'
>   files-backend.c: avoid stat in 'loose_fill_ref_dir'
> 
>  diagnose.c           | 42 +++---------------------------------------
>  dir.c                | 33 +++++++++++++++++++++++++++++++++
>  dir.h                | 16 ++++++++++++++++
>  refs/files-backend.c | 14 +++++---------
>  refs/ref-cache.c     |  3 ++-
>  5 files changed, 59 insertions(+), 49 deletions(-)
> 
> 
> base-commit: 3a06386e314565108ad56a9bdb8f7b80ac52fb69
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1594%2Fvdye%2Fvdye%2Fref-iteration-cleanup-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1594/vdye/vdye/ref-iteration-cleanup-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1594
> -- 
> gitgitgadget

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 1/4] ref-cache.c: fix prefix matching in ref iteration
From: Patrick Steinhardt @ 2023-10-09 10:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye
In-Reply-To: <xmqqfs2n8lnn.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 3571 bytes --]

On Fri, Oct 06, 2023 at 02:51:24PM -0700, Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Victoria Dye <vdye@github.com>
> >
> > Update 'cache_ref_iterator_advance' to skip over refs that are not matched
> > by the given prefix.
> >
> > Currently, a ref entry is considered "matched" if the entry name is fully
> > contained within the prefix:
> >
> > * prefix: "refs/heads/v1"
> > * entry: "refs/heads/v1.0"
> >
> > OR if the prefix is fully contained in the entry name:
> >
> > * prefix: "refs/heads/v1.0"
> > * entry: "refs/heads/v1"
> >
> > The first case is always correct, but the second is only correct if the ref
> > cache entry is a directory, for example:
> >
> > * prefix: "refs/heads/example"
> > * entry: "refs/heads/"
> >
> > Modify the logic in 'cache_ref_iterator_advance' to reflect these
> > expectations:
> >
> > 1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and
> >    ref cache entry do not overlap at all. Skip this entry.
> > 2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches
> >    inside this entry if it is a directory. Skip if the entry is not a
> >    directory, otherwise iterate over it.
> > 3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating
> >    that the cache entry (directory or not) is fully contained by or equal to
> >    the prefix. Iterate over this entry.
> >
> > Note that condition 2 relies on the names of directory entries having the
> > appropriate trailing slash. The existing function documentation of
> > 'create_dir_entry' explicitly calls out the trailing slash requirement, so
> > this is a safe assumption to make.
> 
> Thanks for explaining it very well and clearly.  
> 
> Allowing prefix="refs/heads/v1.0" to yield entry="refs/heads/v1"
> (case #2 above that this patch fixes the behaviour for) would cause
> ref_iterator_advance() to return a ref outside the hierarhcy,
> wouldn't it?  So it appears to me that either one of the two would
> be true:
> 
>  * the code is structured in such a way that such a condition does
>    not actually happen (in which case this patch would be a no-op),
>    or
> 
>  * there is a bug in the current code that is fixed by this patch,
>    whose externally observable behaviour can be verified with a
>    test.
> 
> It is not quite clear to me which is the case here.  The code with
> the patch looks more logical than the original, but I am not sure
> how to demonstrate the existing breakage (if any).

Agreed, I also had a bit of a hard time to figure out whether this is an
actual bug fix, a performance improvement or merely a refactoring.

Patrick

> > Signed-off-by: Victoria Dye <vdye@github.com>
> > ---
> >  refs/ref-cache.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> > index 2294c4564fb..6e3b725245c 100644
> > --- a/refs/ref-cache.c
> > +++ b/refs/ref-cache.c
> > @@ -412,7 +412,8 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  
> >  		if (level->prefix_state == PREFIX_WITHIN_DIR) {
> >  			entry_prefix_state = overlaps_prefix(entry->name, iter->prefix);
> > -			if (entry_prefix_state == PREFIX_EXCLUDES_DIR)
> > +			if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> > +			    (entry_prefix_state == PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
> >  				continue;
> >  		} else {
> >  			entry_prefix_state = level->prefix_state;

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 0/3] rev-list: add support for commits in `--missing`
From: Karthik Nayak @ 2023-10-09 10:55 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak

The `--missing` option in git-rev-list(1) was introduced intitally
to deal with missing blobs in the context of promissory notes.
Eventually the option was extended to also support tree objects in
7c0fe330d5 (rev-list: handle missing tree objects properly,2018-10-05).

This patch series extends the `--missing` option to also add support for
commit objects. We do this by introducing a new flag `MISSING` which is
added whenever we encounter a missing commit during traversal. Then in
`builtin/rev-list` we check for this flag and take the appropriate
action based on the `--missing=*` option used.

This series is an alternate to the patch series I had posted earlier:
https://lore.kernel.org/git/20230908174208.249184-1-karthik.188@gmail.com/.
In that patch, we introduced an option `--ignore-missing-links` which
was added to expose the `ignore_missing_links` bit to the user. The
issue in that patch was that, the option `--ignore-missing-links` didn't
play well the pre-existing `--missing` option. This series avoids that
route and just extends the `--missing` option for commits to solve the
same problem.

Karthik Nayak (3):
  revision: rename bit to `do_not_die_on_missing_objects`
  rev-list: move `show_commit()` to the bottom
  rev-list: add commit object support in `--missing` option

 builtin/reflog.c            |  2 +-
 builtin/rev-list.c          | 93 +++++++++++++++++++------------------
 list-objects.c              |  2 +-
 object.h                    |  2 +-
 revision.c                  |  9 +++-
 revision.h                  | 20 ++++----
 t/t6022-rev-list-missing.sh | 74 +++++++++++++++++++++++++++++
 7 files changed, 145 insertions(+), 57 deletions(-)
 create mode 100755 t/t6022-rev-list-missing.sh

-- 
2.41.0

^ permalink raw reply

* [PATCH 1/3] revision: rename bit to `do_not_die_on_missing_objects`
From: Karthik Nayak @ 2023-10-09 10:55 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak
In-Reply-To: <20231009105528.17777-1-karthik.188@gmail.com>

The bit `do_not_die_on_missing_tree` is used in revision.h to ensure the
revision walker does not die when encountering a missing tree. This is
currently exclusively set within `builtin/rev-list.c` to ensure the
`--missing` option works with missing trees.

In the upcoming commits, we will extend `--missing` to also support
missing commits. So let's rename the bit to
`do_not_die_on_missing_objects`, which is object type agnostic and can
be used for both trees/commits.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/reflog.c   |  2 +-
 builtin/rev-list.c |  2 +-
 list-objects.c     |  2 +-
 revision.h         | 17 +++++++++--------
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/builtin/reflog.c b/builtin/reflog.c
index df63a5892e..9e369a5977 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -298,7 +298,7 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
 		struct rev_info revs;
 
 		repo_init_revisions(the_repository, &revs, prefix);
-		revs.do_not_die_on_missing_tree = 1;
+		revs.do_not_die_on_missing_objects = 1;
 		revs.ignore_missing = 1;
 		revs.ignore_missing_links = 1;
 		if (verbose)
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ff715d6918..ea77489c38 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -561,7 +561,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	}
 
 	if (arg_missing_action)
-		revs.do_not_die_on_missing_tree = 1;
+		revs.do_not_die_on_missing_objects = 1;
 
 	argc = setup_revisions(argc, argv, &revs, &s_r_opt);
 
diff --git a/list-objects.c b/list-objects.c
index c25c72b32c..47296dff2f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -177,7 +177,7 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		if (!revs->do_not_die_on_missing_tree)
+		if (!revs->do_not_die_on_missing_objects)
 			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
diff --git a/revision.h b/revision.h
index 50091bbd13..c73c92ef40 100644
--- a/revision.h
+++ b/revision.h
@@ -212,18 +212,19 @@ struct rev_info {
 
 			/*
 			 * Blobs are shown without regard for their existence.
-			 * But not so for trees: unless exclude_promisor_objects
+			 * But not so for trees/commits: unless exclude_promisor_objects
 			 * is set and the tree in question is a promisor object;
 			 * OR ignore_missing_links is set, the revision walker
-			 * dies with a "bad tree object HASH" message when
-			 * encountering a missing tree. For callers that can
-			 * handle missing trees and want them to be filterable
+			 * dies with a "bad <type> object HASH" message when
+			 * encountering a missing object. For callers that can
+			 * handle missing trees/commits and want them to be filterable
 			 * and showable, set this to true. The revision walker
-			 * will filter and show such a missing tree as usual,
-			 * but will not attempt to recurse into this tree
-			 * object.
+			 * will filter and show such a missing object as usual,
+			 * but will not attempt to recurse into this tree/commit
+			 * object. The revision walker will also set the MISSING
+			 * flag for such objects.
 			 */
-			do_not_die_on_missing_tree:1,
+			do_not_die_on_missing_objects:1,
 
 			/* for internal use only */
 			exclude_promisor_objects:1;
-- 
2.41.0


^ permalink raw reply related

* [PATCH 3/3] rev-list: add commit object support in `--missing` option
From: Karthik Nayak @ 2023-10-09 10:55 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak
In-Reply-To: <20231009105528.17777-1-karthik.188@gmail.com>

The `--missing` object option in rev-list currently works only with
missing blobs/trees. For missing commits the revision walker fails with
a fatal error.

Let's extend the functionality of `--missing` option to also support
commit objects. This is done by adding a new `MISSING` flag that the
revision walker sets whenever it encounters a missing tree/commit. The
revision walker will now continue the traversal and call `show_commit()`
even for missing commits. In rev-list we can then check for this flag
and call the existing code for parsing `--missing` objects.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, using the `--missing=print` option while disabling the
alternate object directory allows us to find the boundary objects
between the main and alternate object directory.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/rev-list.c          |  6 +++
 object.h                    |  2 +-
 revision.c                  |  9 ++++-
 revision.h                  |  3 ++
 t/t6022-rev-list-missing.sh | 74 +++++++++++++++++++++++++++++++++++++
 5 files changed, 91 insertions(+), 3 deletions(-)
 create mode 100755 t/t6022-rev-list-missing.sh

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 98542e8b3c..604ae01d0c 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -149,6 +149,12 @@ static void show_commit(struct commit *commit, void *data)
 
 	display_progress(progress, ++progress_counter);
 
+	if (revs->do_not_die_on_missing_objects &&
+	    commit->object.flags & MISSING) {
+		finish_object__ma(&commit->object);
+		return;
+	}
+
 	if (show_disk_usage)
 		total_disk_usage += get_object_disk_usage(&commit->object);
 
diff --git a/object.h b/object.h
index 114d45954d..12913e6116 100644
--- a/object.h
+++ b/object.h
@@ -62,7 +62,7 @@ void object_array_init(struct object_array *array);
 
 /*
  * object flag allocation:
- * revision.h:               0---------10         15             23------27
+ * revision.h:               0---------10         15             22------27
  * fetch-pack.c:             01    67
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
diff --git a/revision.c b/revision.c
index e789834dd1..615645d3d1 100644
--- a/revision.c
+++ b/revision.c
@@ -1168,7 +1168,8 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 	for (parent = commit->parents; parent; parent = parent->next) {
 		struct commit *p = parent->item;
 		int gently = revs->ignore_missing_links ||
-			     revs->exclude_promisor_objects;
+			     revs->exclude_promisor_objects ||
+			     revs->do_not_die_on_missing_objects;
 		if (repo_parse_commit_gently(revs->repo, p, gently) < 0) {
 			if (revs->exclude_promisor_objects &&
 			    is_promisor_object(&p->object.oid)) {
@@ -1176,7 +1177,11 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 					break;
 				continue;
 			}
-			return -1;
+
+			if (!revs->do_not_die_on_missing_objects)
+				return -1;
+			else
+				p->object.flags |= MISSING;
 		}
 		if (revs->sources) {
 			char **slot = revision_sources_at(revs->sources, p);
diff --git a/revision.h b/revision.h
index c73c92ef40..07906bc3bc 100644
--- a/revision.h
+++ b/revision.h
@@ -41,6 +41,9 @@
 /* WARNING: This is also used as REACHABLE in commit-graph.c. */
 #define PULL_MERGE	(1u<<15)
 
+/* WARNING: This is also used as NEEDS_BITMAP in pack-bitmap.c. */
+#define MISSING         (1u<<22)
+
 #define TOPO_WALK_EXPLORED	(1u<<23)
 #define TOPO_WALK_INDEGREE	(1u<<24)
 
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
new file mode 100755
index 0000000000..bbff66e4fc
--- /dev/null
+++ b/t/t6022-rev-list-missing.sh
@@ -0,0 +1,74 @@
+#!/bin/sh
+
+test_description='handling of missing objects in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We setup the repository with two commits, this way HEAD is always
+# available and we can hide commit 1.
+test_expect_success 'create repository and alternate directory' '
+	test_commit 1 &&
+	test_commit 2
+'
+
+for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
+do
+	test_expect_success "rev-list --missing=error fails with missing object $obj" '
+		oid="$(git rev-parse $obj)" &&
+		path=".git/objects/$(test_oid_to_path $oid)" &&
+
+		mv "$path" "$path.hidden" &&
+		test_when_finished "mv $path.hidden $path" &&
+
+		test_must_fail git rev-list --missing=error --objects \
+			--no-object-names HEAD
+	'
+done
+
+for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
+do
+	for action in "allow-any" "print"
+	do
+		test_expect_success "rev-list --missing=$action with missing $obj" '
+			oid="$(git rev-parse $obj)" &&
+			path=".git/objects/$(test_oid_to_path $oid)" &&
+
+			# Before the object is made missing, we use rev-list to
+			# get the expected oids.
+			git rev-list --objects --no-object-names \
+				HEAD ^$obj >expect.raw &&
+
+			# Since both the commits have the `1.t` blob, rev-list
+			# will print it since its reachable from either commit. Unless
+			# the blob itself is missing.
+			if [ $obj != "HEAD:1.t" ]; then
+				echo $(git rev-parse HEAD:1.t) >>expect.raw
+			fi &&
+
+			mv "$path" "$path.hidden" &&
+			test_when_finished "mv $path.hidden $path" &&
+
+			git rev-list --missing=$action --objects --no-object-names \
+				HEAD >actual.raw &&
+
+			# When the action is to print, we should also add the missing
+			# oid to the expect list.
+			case $action in
+			allow-any)
+				;;
+			print)
+				grep ?$oid actual.raw &&
+				echo ?$oid >>expect.raw
+				;;
+			esac &&
+
+			sort actual.raw >actual &&
+			sort expect.raw >expect &&
+			test_cmp expect actual
+		'
+	done
+done
+
+
+test_done
-- 
2.41.0


^ permalink raw reply related

* [PATCH 2/3] rev-list: move `show_commit()` to the bottom
From: Karthik Nayak @ 2023-10-09 10:55 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak
In-Reply-To: <20231009105528.17777-1-karthik.188@gmail.com>

The `show_commit()` function already depends on `finish_commit()`, and
in the upcoming commit, we'll also add a dependency on
`finish_object__ma()`. Since in C symbols must be declared before
they're used, let's move `show_commit()` below both `finish_commit()`
and `finish_object__ma()`, so the code is cleaner as a whole without the
need for declarations.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/rev-list.c | 85 +++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 43 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ea77489c38..98542e8b3c 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -100,7 +100,48 @@ static off_t get_object_disk_usage(struct object *obj)
 	return size;
 }
 
-static void finish_commit(struct commit *commit);
+static inline void finish_object__ma(struct object *obj)
+{
+	/*
+	 * Whether or not we try to dynamically fetch missing objects
+	 * from the server, we currently DO NOT have the object.  We
+	 * can either print, allow (ignore), or conditionally allow
+	 * (ignore) them.
+	 */
+	switch (arg_missing_action) {
+	case MA_ERROR:
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
+		return;
+
+	case MA_ALLOW_ANY:
+		return;
+
+	case MA_PRINT:
+		oidset_insert(&missing_objects, &obj->oid);
+		return;
+
+	case MA_ALLOW_PROMISOR:
+		if (is_promisor_object(&obj->oid))
+			return;
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
+		return;
+
+	default:
+		BUG("unhandled missing_action");
+		return;
+	}
+}
+
+static void finish_commit(struct commit *commit)
+{
+	free_commit_list(commit->parents);
+	commit->parents = NULL;
+	free_commit_buffer(the_repository->parsed_objects,
+			   commit);
+}
+
 static void show_commit(struct commit *commit, void *data)
 {
 	struct rev_list_info *info = data;
@@ -219,48 +260,6 @@ static void show_commit(struct commit *commit, void *data)
 	finish_commit(commit);
 }
 
-static void finish_commit(struct commit *commit)
-{
-	free_commit_list(commit->parents);
-	commit->parents = NULL;
-	free_commit_buffer(the_repository->parsed_objects,
-			   commit);
-}
-
-static inline void finish_object__ma(struct object *obj)
-{
-	/*
-	 * Whether or not we try to dynamically fetch missing objects
-	 * from the server, we currently DO NOT have the object.  We
-	 * can either print, allow (ignore), or conditionally allow
-	 * (ignore) them.
-	 */
-	switch (arg_missing_action) {
-	case MA_ERROR:
-		die("missing %s object '%s'",
-		    type_name(obj->type), oid_to_hex(&obj->oid));
-		return;
-
-	case MA_ALLOW_ANY:
-		return;
-
-	case MA_PRINT:
-		oidset_insert(&missing_objects, &obj->oid);
-		return;
-
-	case MA_ALLOW_PROMISOR:
-		if (is_promisor_object(&obj->oid))
-			return;
-		die("unexpected missing %s object '%s'",
-		    type_name(obj->type), oid_to_hex(&obj->oid));
-		return;
-
-	default:
-		BUG("unhandled missing_action");
-		return;
-	}
-}
-
 static int finish_object(struct object *obj, const char *name UNUSED,
 			 void *cb_data)
 {
-- 
2.41.0


^ permalink raw reply related

* Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Patrick Steinhardt @ 2023-10-09 10:54 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Junio C Hamano, git, Elijah Newren, Eric W. Biederman, Jeff King
In-Reply-To: <ZSCR7e6KKqFv8mZk@nand.local>

[-- Attachment #1: Type: text/plain, Size: 3831 bytes --]

On Fri, Oct 06, 2023 at 07:02:05PM -0400, Taylor Blau wrote:
> On Fri, Oct 06, 2023 at 03:35:25PM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > When using merge-tree often within a repository[^1], it is possible to
> > > generate a relatively large number of loose objects, which can result in
> > > degraded performance, and inode exhaustion in extreme cases.
> >
> > Well, be it "git merge-tree" or "git merge", new loose objects tend
> > to accumulate until "gc" kicks in, so it is not a new problem for
> > mere mortals, is it?
> 
> Yeah, I would definitely suspect that this is more of an issue for
> forges than individual Git users.
> 
> > As one "interesting" use case of "merge-tree" is for a Git hosting
> > site with bare repositories to offer trial merges, without which
> > majority of the object their repositories acquire would have been in
> > packs pushed by their users, "Gee, loose objects consume many inodes
> > in exchange for easier selective pruning" becomes an issue, right?
> 
> Right.
> 
> > Just like it hurts performance to have too many loose object files,
> > presumably it would also hurt performance to keep too many packs,
> > each came from such a trial merge.  Do we have a "gc" story offered
> > for these packs created by the new feature?  E.g., "once merge-tree
> > is done creating a trial merge, we can discard the objects created
> > in the pack, because we never expose new objects in the pack to the
> > outside, processes running simultaneously, so instead closing the
> > new packfile by calling flush_bulk_checkin_packfile(), we can safely
> > unlink the temporary pack.  We do not even need to spend cycles to
> > run a gc that requires cycles to enumerate what is still reachable",
> > or something like that?
> 
> I know Johannes worked on something like this recently. IIRC, it
> effectively does something like:
> 
>     struct tmp_objdir *tmp_objdir = tmp_objdir_create(...);
>     tmp_objdir_replace_primary_odb(tmp_objdir, 1);
> 
> at the beginning of a merge operation, and:
> 
>     tmp_objdir_discard_objects(tmp_objdir);
> 
> at the end. I haven't followed that work off-list very closely, but it
> is only possible for GitHub to discard certain niche kinds of
> merges/rebases, since in general we make the objects created during test
> merges available via refs/pull/N/{merge,rebase}.
> 
> I think that like anything, this is a trade-off. Having lots of packs
> can be a performance hindrance just like having lots of loose objects.
> But since we can represent more objects with fewer inodes when packed,
> storing those objects together in a pack is preferable when (a) you're
> doing lots of test-merges, and (b) you want to keep those objects
> around, e.g., because they are reachable.

In Gitaly, we usually set up quarantine directories for all operations
that create objects. This allows us to discard any newly written objects
in case either the RPC call gets cancelled or in case our access checks
determine that the change should not be allowed. The logic is rather
simple:

    1. Create a new temporary directory.

    2. Set up the new temporary directory as main object database via
       the `GIT_OBJECT_DIRECTORY` environment variable.

    3. Set up the main repository's object database via the
       `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variable.

    4. Execute Git commands that write objects with these environment
       variables set up. The new objects will end up neatly contained in
       the temporary directory.

    5. Once done, either discard the temporary object database or
       migrate objects into the main object daatabase.

I wonder whether this would be a viable approach for you, as well.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox