Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Patrick Steinhardt @ 2023-10-09 10:54 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Junio C Hamano, git, Elijah Newren, Eric W. Biederman, Jeff King
In-Reply-To: <ZSCR7e6KKqFv8mZk@nand.local>

[-- Attachment #1: Type: text/plain, Size: 3831 bytes --]

On Fri, Oct 06, 2023 at 07:02:05PM -0400, Taylor Blau wrote:
> On Fri, Oct 06, 2023 at 03:35:25PM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > When using merge-tree often within a repository[^1], it is possible to
> > > generate a relatively large number of loose objects, which can result in
> > > degraded performance, and inode exhaustion in extreme cases.
> >
> > Well, be it "git merge-tree" or "git merge", new loose objects tend
> > to accumulate until "gc" kicks in, so it is not a new problem for
> > mere mortals, is it?
> 
> Yeah, I would definitely suspect that this is more of an issue for
> forges than individual Git users.
> 
> > As one "interesting" use case of "merge-tree" is for a Git hosting
> > site with bare repositories to offer trial merges, without which
> > majority of the object their repositories acquire would have been in
> > packs pushed by their users, "Gee, loose objects consume many inodes
> > in exchange for easier selective pruning" becomes an issue, right?
> 
> Right.
> 
> > Just like it hurts performance to have too many loose object files,
> > presumably it would also hurt performance to keep too many packs,
> > each came from such a trial merge.  Do we have a "gc" story offered
> > for these packs created by the new feature?  E.g., "once merge-tree
> > is done creating a trial merge, we can discard the objects created
> > in the pack, because we never expose new objects in the pack to the
> > outside, processes running simultaneously, so instead closing the
> > new packfile by calling flush_bulk_checkin_packfile(), we can safely
> > unlink the temporary pack.  We do not even need to spend cycles to
> > run a gc that requires cycles to enumerate what is still reachable",
> > or something like that?
> 
> I know Johannes worked on something like this recently. IIRC, it
> effectively does something like:
> 
>     struct tmp_objdir *tmp_objdir = tmp_objdir_create(...);
>     tmp_objdir_replace_primary_odb(tmp_objdir, 1);
> 
> at the beginning of a merge operation, and:
> 
>     tmp_objdir_discard_objects(tmp_objdir);
> 
> at the end. I haven't followed that work off-list very closely, but it
> is only possible for GitHub to discard certain niche kinds of
> merges/rebases, since in general we make the objects created during test
> merges available via refs/pull/N/{merge,rebase}.
> 
> I think that like anything, this is a trade-off. Having lots of packs
> can be a performance hindrance just like having lots of loose objects.
> But since we can represent more objects with fewer inodes when packed,
> storing those objects together in a pack is preferable when (a) you're
> doing lots of test-merges, and (b) you want to keep those objects
> around, e.g., because they are reachable.

In Gitaly, we usually set up quarantine directories for all operations
that create objects. This allows us to discard any newly written objects
in case either the RPC call gets cancelled or in case our access checks
determine that the change should not be allowed. The logic is rather
simple:

    1. Create a new temporary directory.

    2. Set up the new temporary directory as main object database via
       the `GIT_OBJECT_DIRECTORY` environment variable.

    3. Set up the main repository's object database via the
       `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variable.

    4. Execute Git commands that write objects with these environment
       variables set up. The new objects will end up neatly contained in
       the temporary directory.

    5. Once done, either discard the temporary object database or
       migrate objects into the main object daatabase.

I wonder whether this would be a viable approach for you, as well.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v3] merge-ort: initialize repo in index state
From: John Cai via GitGitGadget @ 2023-10-09 13:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, John Cai, John Cai
In-Reply-To: <pull.1583.v2.git.git.1696781998420.gitgitgadget@gmail.com>

From: John Cai <johncai86@gmail.com>

initialize_attr_index() does not initialize the repo member of
attr_index. Starting in 44451a2e5e (attr: teach "--attr-source=<tree>"
global option to "git", 2023-05-06), this became a problem because
istate->repo gets passed down the call chain starting in
git_check_attr(). This gets passed all the way down to
replace_refs_enabled(), which segfaults when accessing r->gitdir.

Fix this by initializing the repository in the index state.

Signed-off-by: John Cai <johncai86@gmail.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
---
    merge-ort: initialize repo in index state
    
    initialize_attr_index() does not initialize the repo member of
    attr_index. Starting in 44451a2e5e (attr: teach "--attr-source=" global
    option to "git", 2023-05-06), this became a problem because istate->repo
    gets passed down the call chain starting in git_check_attr(). This gets
    passed all the way down to replace_refs_enabled(), which segfaults when
    accessing r->gitdir.
    
    Fix this by initializing the repository in the index state.
    
    Changes since V2:
    
     * fixed test by using printf instead of echo
    
    Changes since v1:
    
     * using opt->repo to avoid hardcoding another the_repository
     * clarified test

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1583%2Fjohn-cai%2Fjc%2Fpopulate-repo-when-init-attr-index-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1583/john-cai/jc/populate-repo-when-init-attr-index-v3
Pull-Request: https://github.com/git/git/pull/1583

Range-diff vs v2:

 1:  e178236064a ! 1:  792b01fa616 merge-ort: initialize repo in index state
     @@ t/t4300-merge-tree.sh: EXPECTED
      +		git checkout @{-1} &&
      +		tree=$(git --attr-source=gitattributes merge-tree --write-tree \
      +		--merge-base "$base" --end-of-options "$source" "$merge") &&
     -+		echo "foo\nbar\nbaz" >expect &&
     ++		printf "foo\nbar\nbaz\n" >expect &&
      +		git cat-file -p "$tree:file1" >actual &&
      +		test_cmp expect actual
      +	)


 merge-ort.c           |  1 +
 t/t4300-merge-tree.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 7857ce9fbd1..36537256613 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1902,6 +1902,7 @@ static void initialize_attr_index(struct merge_options *opt)
 	struct index_state *attr_index = &opt->priv->attr_index;
 	struct cache_entry *ce;
 
+	attr_index->repo = opt->repo;
 	attr_index->initialized = 1;
 
 	if (!opt->renormalize)
diff --git a/t/t4300-merge-tree.sh b/t/t4300-merge-tree.sh
index 57c4f26e461..c3a03e54187 100755
--- a/t/t4300-merge-tree.sh
+++ b/t/t4300-merge-tree.sh
@@ -86,6 +86,33 @@ EXPECTED
 	test_cmp expected actual
 '
 
+test_expect_success '3-way merge with --attr-source' '
+	test_when_finished rm -rf 3-way &&
+	git init 3-way &&
+	(
+		cd 3-way &&
+		test_commit initial file1 foo &&
+		base=$(git rev-parse HEAD) &&
+		git checkout -b brancha &&
+		echo bar >>file1 &&
+		git commit -am "adding bar" &&
+		source=$(git rev-parse HEAD) &&
+		git checkout @{-1} &&
+		git checkout -b branchb &&
+		echo baz >>file1 &&
+		git commit -am "adding baz" &&
+		merge=$(git rev-parse HEAD) &&
+		git checkout -b gitattributes &&
+		test_commit "gitattributes" .gitattributes "file1 merge=union" &&
+		git checkout @{-1} &&
+		tree=$(git --attr-source=gitattributes merge-tree --write-tree \
+		--merge-base "$base" --end-of-options "$source" "$merge") &&
+		printf "foo\nbar\nbaz\n" >expect &&
+		git cat-file -p "$tree:file1" >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'file change A, B (same)' '
 	git reset --hard initial &&
 	test_commit "change-a-b-same-A" "initial-file" "AAA" &&

base-commit: 493f4622739e9b64f24b465b21aa85870dd9dc09
-- 
gitgitgadget

^ permalink raw reply related

* Re: [Outreachy] Introduction and Interest in Contributing to the Git Community
From: Isoken Ibizugbe @ 2023-10-09 13:31 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Johannes Schindelin
In-Reply-To: <CAP8UFD3WzgADiy07uLGpj23r3jrUnYh_Wdsc1N8ZoaAHQPDZag@mail.gmail.com>

Thank you Christian for the help, I have another idea for a micro
project "modernizing a test script". I have found test scripts that
needs modernizing , which is to rename the setup test 'prepare
reference tree' to 'setup'. Is it appropriate for a micro project and
should I go ahead with it.


On Sun, Oct 8, 2023 at 11:50 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Sat, Oct 7, 2023 at 12:26 PM Isoken Ibizugbe <isokenjune@gmail.com> wrote:
> >
> > Good day,
> > I am interested in working on this issue
> > https://github.com/gitgitgadget/git/issues/1555 as a micro project. Is
> > it worth doing and appropriate for a micro project?
>
> (I have added Dscho, alias Johannes Schindelin, in Cc as he might know
> more about this.)
>
> I think it's worth doing, but I am not sure, as I think it might
> generate regressions for a low number of users who might rely on
> core.fileMode for some special use cases. (Not sure we would want to
> support such hypothetical use cases though.)
>
> Also it's not so simple to implement, you will have to dig a bit in
> the code, and not just implement it, but likely also write tests for
> it. So it's definitely significantly more complex than a simple "test
> cleanup" micro-project. I might overlook things too.
>
> But if you feel like it suits your skills well and are not afraid with
> something a bit complex, I would think that you can go for it.
>
> Best,
> Christian.

^ permalink raw reply

* Re: [Outreachy] Introduction and Interest in Contributing to the Git Community
From: Christian Couder @ 2023-10-09 13:38 UTC (permalink / raw)
  To: Isoken Ibizugbe; +Cc: git, Johannes Schindelin
In-Reply-To: <CAJHH8bEa=xE_xNdbW4rDJQQ9dacAuFQseajdtBmGnZ1bDxZsxQ@mail.gmail.com>

On Mon, Oct 9, 2023 at 3:33 PM Isoken Ibizugbe <isokenjune@gmail.com> wrote:
>
> Thank you Christian for the help, I have another idea for a micro
> project "modernizing a test script". I have found test scripts that
> needs modernizing , which is to rename the setup test 'prepare
> reference tree' to 'setup'. Is it appropriate for a micro project and
> should I go ahead with it.

I am not sure these kinds of renames are worth doing, but if you find
some clear doc or Junio saying that setup tests should be named
"setup", then that might be worth a try.

^ permalink raw reply

* Re: [OUTREACHY] Permission To Work On Tasks
From: Christian Couder @ 2023-10-09 13:45 UTC (permalink / raw)
  To: Naomi Ibe; +Cc: git, Junio C Hamano
In-Reply-To: <CAP8UFD0PKAGchx5iqyZqCdua-KYJcrmO2FfNf_vt7xs=+7YL4Q@mail.gmail.com>

On Sat, Oct 7, 2023 at 9:20 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Sat, Oct 7, 2023 at 1:29 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Naomi Ibe <naomi.ibeh69@gmail.com> writes:

> > > Second issue is this https://github.com/gitgitgadget/git/issues/302 .
> > > Is it still available to be worked on? I notice it was opened in 2019
> >
> > Stepping back a bit, do you agree with what the issue says?
> > Remember, these "issue"s are merely one person's opinion and not
> > endorsed by the community.
> >
> > Before you ask "is it still available", do you know the current
> > status (not the status of the "issue")?  Have you looked at "git
> > commit --help" to find it out yourself to see if "now" is singled
> > out?  Here is what we say in our documentation:
> >
> >     In addition to recognizing all date formats above, the --date
> >     option will also try to make sense of other, more human-centric
> >     date formats, such as relative dates like "yesterday" or "last
> >     Friday at noon".
> >
> > So apparently it is still "available".  It is a different matter how
> > well a patch that adds "now" to the examples listed there will be
> > accepted, though.  During a microproject, one of the things new
> > contributors are expected to learn is to convince others the cause
> > of their patches with the proposed commit log message well.
>
> Yeah, I think this issue, if it is indeed an issue, is not something
> easy to "fix" for a newcomer as it requires to be familiar with our
> documentation and perhaps our code too, or to research them enough to
> understand what a good improvement would be. So you could perhaps do
> it, but it would likely require more work.

Also if it's only a documentation improvement, it might not actually
be a proper microproject as we want them to be "code related".

^ permalink raw reply

* Re: [PATCH v6] merge-tree: add -X strategy option
From: Jeff King @ 2023-10-09 15:53 UTC (permalink / raw)
  To: phillip.wood; +Cc: Izzy via GitGitGadget, git, Elijah Newren, Izzy
In-Reply-To: <a482d047-dd40-436d-8daa-0c74780af11f@gmail.com>

On Mon, Oct 09, 2023 at 10:58:07AM +0100, Phillip Wood wrote:

> > @@ -423,7 +425,7 @@ static int real_merge(struct merge_tree_options *o,
> >   {
> >   	struct commit *parent1, *parent2;
> >   	struct commit_list *merge_bases = NULL;
> > -	struct merge_options opt;
> > +	struct merge_options opt = o->merge_options;
> 
> Copying struct merge_options by value here is unusual in our code base. I
> don't think it introduces a bug (there is no function to free the resources
> allocated in struct merge_options so we do not end up freeing them twice for
> example) but it would be clearer that it was safe if you did
> 
> 	struct merge_options *opt = &o->merge_options;
> 
> and updated the code to reflect that opt is now a pointer or just replaced
> all uses of "opt" with "o->merge_options"

I agree that struct-copying is an unusual pattern, and we'd potentially
run into problems with duplication. But I think it is even trickier than
that here. We also go on to actually _modify_ opt in this function,
assigning to various members (both directly, and I think the merge code
itself will write to opt->priv).

So if we use a pointer (rather than struct assignment), those changes
will persist in the merge_options struct that was passed in. Which is
also weird.

Between the two, I think using a pointer is probably the least-weird.
This real_merge() function is only called once, and is a static-local
helper for cmd_merge_tree(). So the two functions work as a single unit,
and munging "opt" is not a big deal.

-Peff

^ permalink raw reply

* Re: [PATCH] doc/cat-file: clarify description regarding various command forms
From: Jeff King @ 2023-10-09 15:56 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: git, avarab
In-Reply-To: <20231009103651+0200.580831-stepnem@smrk.net>

On Mon, Oct 09, 2023 at 10:36:51AM +0200, Štěpán Němec wrote:

> > Ah, true, I was thinking that the DESCRIPTION section would be the first
> > thing users would read, but I didn't notice the headline. I agree that
> > what it says is probably sufficient (though arguably "type and size" is
> > slightly inaccurate there; I said "details" in my proposed text but
> > maybe that is too vague).
> 
> We could also leave the NAME vague(r) and put an additional sentence at
> the beginning of DESCRIPTION:

Yup, that is a good suggestion. Do you want to wrap all of this
discussion up as a patch?

-Peff

^ permalink raw reply

* [PATCH v4 0/3] diff-merges: introduce '--dd' option
From: Sergey Organov @ 2023-10-09 16:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Sergey Organov
In-Reply-To: <20230909125446.142715-1-sorganov@gmail.com>

This new convenience option requests full diff with respect to first
parent, so that

  git log --dd

will output diff with respect to first parent for every commit,
universally, no matter how many parents the commit turns out to have.

Gives user quick and universal way to see what changes, exactly, were
brought to a branch by merges as well as by regular commits.

'--dd' is implemented as pure synonym for "--diff-merges=first-parent
--patch".

The first commit in the series tweaks diff-merges documentation a bit,
and is valuable by itself. It's put here as '--dd' implementation
commit depends on it in its documentation part.

Note: the need for this new convenience option mostly emerged from
denial by the community of patches that modify '-m' behavior to imply
'-p' as the rest of similar options (such as --cc) do. So, basically,
'--dd' is what '-m' should have been to be more useful.

Updates in v4:

  * Removed "(which see)" reference from documentation that caused
    confusion.

  * Removed explanation why it's --dd and not simply -d from commit
    message.

  * Refined --remerge-diff short description according to Junio and
    Elijah comments.

  * Added explanation of --dd purpose.

  * Fixed style and syntax of "on,m::" description.

Updates in v3:

  * Option renamed from '-d' to '--dd' due to Junio overpowering
    request to keep short-and-sweet '-d' reserved for another (yet
    unspecified) use.

  * Added completion of '--dd' to git-completion.bash.

Updates in v2:

  * Reordered documentation for diff-merges formats in accordance with
    Junio recommendation.

  * Removed clarification of surprising -m behavior due to controversy
    with Junio on how exactly it should look like.

Sergey Organov (3):
  diff-merges: improve --diff-merges documentation
  diff-merges: introduce '--dd' option
  completion: complete '--dd'

 Documentation/diff-options.txt         | 105 ++++++++++++++-----------
 Documentation/git-log.txt              |   4 +-
 contrib/completion/git-completion.bash |   2 +-
 diff-merges.c                          |   3 +
 t/t4013-diff-various.sh                |   8 ++
 5 files changed, 73 insertions(+), 49 deletions(-)

Interdiff against v3:
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f80d493dd4c8..23f95e6172b9 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -45,7 +45,7 @@ endif::git-format-patch[]
 ifdef::git-log[]
 -m::
 	Show diffs for merge commits in the default format. This is
-	similar to '--diff-merges=on' (which see) except `-m` will
+	similar to '--diff-merges=on', except `-m` will
 	produce no output unless `-p` is given as well.
 
 -c::
@@ -62,7 +62,7 @@ ifdef::git-log[]
 	Shortcut for '--diff-merges=first-parent -p'.
 
 --remerge-diff::
-	Produce diff against re-merge.
+	Produce remerge-diff output for merge commits.
 	Shortcut for '--diff-merges=remerge -p'.
 
 --no-diff-merges::
@@ -83,7 +83,7 @@ off, none::
 on, m::
 	Make diff output for merge commits to be shown in the default
 	format. The default format could be changed using
-	`log.diffMerges` configuration parameter, which default value
+	`log.diffMerges` configuration variable, whose default value
 	is `separate`.
 +
 first-parent, 1::
-- 
2.25.1


^ permalink raw reply related

* [PATCH v4 2/3] diff-merges: introduce '--dd' option
From: Sergey Organov @ 2023-10-09 16:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Sergey Organov
In-Reply-To: <20231009160535.236523-1-sorganov@gmail.com>

This option provides a shortcut to request diff with respect to first
parent for any kind of commit, universally. It's implemented as pure
synonym for "--diff-merges=first-parent --patch".

Gives user quick and universal way to see what changes, exactly, were
brought to a branch by merges as well as by regular commits.

Signed-off-by: Sergey Organov <sorganov@gmail.com>
---
 Documentation/diff-options.txt | 5 +++++
 Documentation/git-log.txt      | 2 +-
 diff-merges.c                  | 3 +++
 t/t4013-diff-various.sh        | 8 ++++++++
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 69065c0e90a8..23f95e6172b9 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -56,6 +56,11 @@ ifdef::git-log[]
 	Produce dense combined diff output for merge commits.
 	Shortcut for '--diff-merges=dense-combined -p'.
 
+--dd::
+	Produce diff with respect to first parent for both merge and
+	regular commits.
+	Shortcut for '--diff-merges=first-parent -p'.
+
 --remerge-diff::
 	Produce remerge-diff output for merge commits.
 	Shortcut for '--diff-merges=remerge -p'.
diff --git a/Documentation/git-log.txt b/Documentation/git-log.txt
index 9b7ec96e767a..579682172fe4 100644
--- a/Documentation/git-log.txt
+++ b/Documentation/git-log.txt
@@ -120,7 +120,7 @@ By default, `git log` does not generate any diff output. The options
 below can be used to show the changes made by each commit.
 
 Note that unless one of `--diff-merges` variants (including short
-`-m`, `-c`, and `--cc` options) is explicitly given, merge commits
+`-m`, `-c`, `--cc`, and `--dd` options) is explicitly given, merge commits
 will not show a diff, even if a diff format like `--patch` is
 selected, nor will they match search options like `-S`. The exception
 is when `--first-parent` is in use, in which case `first-parent` is
diff --git a/diff-merges.c b/diff-merges.c
index ec97616db1df..45507588a279 100644
--- a/diff-merges.c
+++ b/diff-merges.c
@@ -131,6 +131,9 @@ int diff_merges_parse_opts(struct rev_info *revs, const char **argv)
 	} else if (!strcmp(arg, "--cc")) {
 		set_dense_combined(revs);
 		revs->merges_imply_patch = 1;
+	} else if (!strcmp(arg, "--dd")) {
+		set_first_parent(revs);
+		revs->merges_imply_patch = 1;
 	} else if (!strcmp(arg, "--remerge-diff")) {
 		set_remerge_diff(revs);
 		revs->merges_imply_patch = 1;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 5de1d190759f..4b474808311e 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -473,6 +473,14 @@ test_expect_success 'log --diff-merges=on matches --diff-merges=separate' '
 	test_cmp expected actual
 '
 
+test_expect_success 'log --dd matches --diff-merges=1 -p' '
+	git log --diff-merges=1 -p master >result &&
+	process_diffs result >expected &&
+	git log --dd master >result &&
+	process_diffs result >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'deny wrong log.diffMerges config' '
 	test_config log.diffMerges wrong-value &&
 	test_expect_code 128 git log
-- 
2.25.1


^ permalink raw reply related

* [PATCH v4 1/3] diff-merges: improve --diff-merges documentation
From: Sergey Organov @ 2023-10-09 16:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Sergey Organov
In-Reply-To: <20231009160535.236523-1-sorganov@gmail.com>

* Put descriptions of convenience shortcuts first, so they are the
  first things reader observes rather than lengthy detailed stuff.

* Get rid of very long line containing all the --diff-merges formats
  by replacing them with <format>, and putting each supported format
  on its own line.

Signed-off-by: Sergey Organov <sorganov@gmail.com>
---
 Documentation/diff-options.txt | 100 ++++++++++++++++++---------------
 Documentation/git-log.txt      |   2 +-
 2 files changed, 55 insertions(+), 47 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 9f33f887711d..69065c0e90a8 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -43,66 +43,74 @@ endif::git-diff[]
 endif::git-format-patch[]
 
 ifdef::git-log[]
---diff-merges=(off|none|on|first-parent|1|separate|m|combined|c|dense-combined|cc|remerge|r)::
+-m::
+	Show diffs for merge commits in the default format. This is
+	similar to '--diff-merges=on', except `-m` will
+	produce no output unless `-p` is given as well.
+
+-c::
+	Produce combined diff output for merge commits.
+	Shortcut for '--diff-merges=combined -p'.
+
+--cc::
+	Produce dense combined diff output for merge commits.
+	Shortcut for '--diff-merges=dense-combined -p'.
+
+--remerge-diff::
+	Produce remerge-diff output for merge commits.
+	Shortcut for '--diff-merges=remerge -p'.
+
 --no-diff-merges::
+	Synonym for '--diff-merges=off'.
+
+--diff-merges=<format>::
 	Specify diff format to be used for merge commits. Default is
-	{diff-merges-default} unless `--first-parent` is in use, in which case
-	`first-parent` is the default.
+	{diff-merges-default} unless `--first-parent` is in use, in
+	which case `first-parent` is the default.
 +
---diff-merges=(off|none):::
---no-diff-merges:::
+The following formats are supported:
++
+--
+off, none::
 	Disable output of diffs for merge commits. Useful to override
 	implied value.
 +
---diff-merges=on:::
---diff-merges=m:::
--m:::
-	This option makes diff output for merge commits to be shown in
-	the default format. `-m` will produce the output only if `-p`
-	is given as well. The default format could be changed using
-	`log.diffMerges` configuration parameter, which default value
+on, m::
+	Make diff output for merge commits to be shown in the default
+	format. The default format could be changed using
+	`log.diffMerges` configuration variable, whose default value
 	is `separate`.
 +
---diff-merges=first-parent:::
---diff-merges=1:::
-	This option makes merge commits show the full diff with
-	respect to the first parent only.
+first-parent, 1::
+	Show full diff with respect to first parent. This is the same
+	format as `--patch` produces for non-merge commits.
 +
---diff-merges=separate:::
-	This makes merge commits show the full diff with respect to
-	each of the parents. Separate log entry and diff is generated
-	for each parent.
+separate::
+	Show full diff with respect to each of parents.
+	Separate log entry and diff is generated for each parent.
 +
---diff-merges=remerge:::
---diff-merges=r:::
---remerge-diff:::
-	With this option, two-parent merge commits are remerged to
-	create a temporary tree object -- potentially containing files
-	with conflict markers and such.  A diff is then shown between
-	that temporary tree and the actual merge commit.
+combined, c::
+	Show differences from each of the parents to the merge
+	result simultaneously instead of showing pairwise diff between
+	a parent and the result one at a time. Furthermore, it lists
+	only files which were modified from all parents.
++
+dense-combined, cc::
+	Further compress output produced by `--diff-merges=combined`
+	by omitting uninteresting hunks whose contents in the parents
+	have only two variants and the merge result picks one of them
+	without modification.
++
+remerge, r::
+	Remerge two-parent merge commits to create a temporary tree
+	object--potentially containing files with conflict markers
+	and such.  A diff is then shown between that temporary tree
+	and the actual merge commit.
 +
 The output emitted when this option is used is subject to change, and
 so is its interaction with other options (unless explicitly
 documented).
-+
---diff-merges=combined:::
---diff-merges=c:::
--c:::
-	With this option, diff output for a merge commit shows the
-	differences from each of the parents to the merge result
-	simultaneously instead of showing pairwise diff between a
-	parent and the result one at a time. Furthermore, it lists
-	only files which were modified from all parents. `-c` implies
-	`-p`.
-+
---diff-merges=dense-combined:::
---diff-merges=cc:::
---cc:::
-	With this option the output produced by
-	`--diff-merges=combined` is further compressed by omitting
-	uninteresting hunks whose contents in the parents have only
-	two variants and the merge result picks one of them without
-	modification.  `--cc` implies `-p`.
+--
 
 --combined-all-paths::
 	This flag causes combined diffs (used for merge commits) to
diff --git a/Documentation/git-log.txt b/Documentation/git-log.txt
index 2a66cf888074..9b7ec96e767a 100644
--- a/Documentation/git-log.txt
+++ b/Documentation/git-log.txt
@@ -124,7 +124,7 @@ Note that unless one of `--diff-merges` variants (including short
 will not show a diff, even if a diff format like `--patch` is
 selected, nor will they match search options like `-S`. The exception
 is when `--first-parent` is in use, in which case `first-parent` is
-the default format.
+the default format for merge commits.
 
 :git-log: 1
 :diff-merges-default: `off`
-- 
2.25.1


^ permalink raw reply related

* [PATCH v4 3/3] completion: complete '--dd'
From: Sergey Organov @ 2023-10-09 16:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Sergey Organov
In-Reply-To: <20231009160535.236523-1-sorganov@gmail.com>

'--dd' only makes sense for 'git log' and 'git show', so add it to
__git_log_show_options which is referenced in the completion for these
two commands.

Signed-off-by: Sergey Organov <sorganov@gmail.com>
---
 contrib/completion/git-completion.bash | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 133ec92bfae7..ca4fa39f3ff8 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2042,7 +2042,7 @@ __git_log_shortlog_options="
 "
 # Options accepted by log and show
 __git_log_show_options="
-	--diff-merges --diff-merges= --no-diff-merges --remerge-diff
+	--diff-merges --diff-merges= --no-diff-merges --dd --remerge-diff
 "
 
 __git_diff_merges_opts="off none on first-parent 1 separate m combined c dense-combined cc remerge r"
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Taylor Blau @ 2023-10-09 16:08 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Junio C Hamano, git, Elijah Newren, Eric W. Biederman, Jeff King
In-Reply-To: <ZSPb1OYRrQSUugtg@tanuki>

On Mon, Oct 09, 2023 at 12:54:12PM +0200, Patrick Steinhardt wrote:
> In Gitaly, we usually set up quarantine directories for all operations
> that create objects. This allows us to discard any newly written objects
> in case either the RPC call gets cancelled or in case our access checks
> determine that the change should not be allowed. The logic is rather
> simple:
>
>     1. Create a new temporary directory.
>
>     2. Set up the new temporary directory as main object database via
>        the `GIT_OBJECT_DIRECTORY` environment variable.
>
>     3. Set up the main repository's object database via the
>        `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variable.

Is there a reason not to run Git in the quarantine environment and list
the main repository as an alternate via $GIT_DIR/objects/info/alternates
instead of the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable?

>     4. Execute Git commands that write objects with these environment
>        variables set up. The new objects will end up neatly contained in
>        the temporary directory.
>
>     5. Once done, either discard the temporary object database or
>        migrate objects into the main object daatabase.

Interesting. I'm curious why you don't use the builtin tmp_objdir
mechanism in Git itself. Do you need to run more than one command in the
quarantine environment? If so, that makes sense that you'd want to have
a scratch repository that lasts beyond the lifetime of a single process.

> I wonder whether this would be a viable approach for you, as well.

I think that the main problem that we are trying to solve with this
series is creating a potentially large number of loose objects. I think
that you could do something like what you propose above, with a 'git
repacks -adk' before moving its objects over back to the main repository.

But since we're working in a single process only when doing a merge-tree
operation, I think it is probably more expedient to write the pack's
bytes directly.

> Patrick

Thanks,
Taylor

^ permalink raw reply

* Re: What's cooking in git.git (Oct 2023, #03; Fri, 6)
From: Taylor Blau @ 2023-10-09 16:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqh6n24zf1.fsf@gitster.g>

On Sat, Oct 07, 2023 at 01:20:02AM -0700, Junio C Hamano wrote:
> * tb/repack-max-cruft-size (2023-10-05) 4 commits
>   (merged to 'next' on 2023-10-06 at b3ca6df3b9)
>  + builtin/repack.c: avoid making cruft packs preferred
>  + builtin/repack.c: implement support for `--max-cruft-size`
>  + builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
>  + t7700: split cruft-related tests to t7704
>
>  "git repack" learned "--max-cruft-size" to prevent cruft packs from
>  growing without bounds.
>
>  Will merge to 'master'.
>  source: <cover.1696293862.git.me@ttaylorr.com>
>  source: <035393935108d02aaf8927189b05102f4f74f340.1696370003.git.me@ttaylorr.com>

Thanks. On a semi-related note, did you want to pick up my patch in

  https://lore.kernel.org/git/035393935108d02aaf8927189b05102f4f74f340.1696370003.git.me@ttaylorr.com/

? That should address a performance bug that occurs when a repository
(incorrectly) chooses a cruft pack as its preferred pack source when
writing a MIDX bitmap, significantly impeding the pack-reuse mechanism.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 1/4] ref-cache.c: fix prefix matching in ref iteration
From: Victoria Dye @ 2023-10-09 16:21 UTC (permalink / raw)
  To: Patrick Steinhardt, Junio C Hamano; +Cc: Victoria Dye via GitGitGadget, git
In-Reply-To: <ZSPQLjJwq-7SjsDT@tanuki>

Patrick Steinhardt wrote:
>> Allowing prefix="refs/heads/v1.0" to yield entry="refs/heads/v1"
>> (case #2 above that this patch fixes the behaviour for) would cause
>> ref_iterator_advance() to return a ref outside the hierarhcy,
>> wouldn't it?  So it appears to me that either one of the two would
>> be true:
>>
>>  * the code is structured in such a way that such a condition does
>>    not actually happen (in which case this patch would be a no-op),
>>    or
>>
>>  * there is a bug in the current code that is fixed by this patch,
>>    whose externally observable behaviour can be verified with a
>>    test.
>>
>> It is not quite clear to me which is the case here.  The code with
>> the patch looks more logical than the original, but I am not sure
>> how to demonstrate the existing breakage (if any).
> 
> Agreed, I also had a bit of a hard time to figure out whether this is an
> actual bug fix, a performance improvement or merely a refactoring.
> 

I originally operated on the assumption that it was the first case, which is
why I didn't include a test in this patch. Commands like 'for-each-ref',
'show-ref', etc. either use an empty prefix or a directory prefix with a
trailing slash, which won't trigger this issue. I encountered the problem
while working on a builtin that filtered refs by a user-specified prefix -
the results included refs that should not have been matched, which led me to
this fix.

Scanning through the codebase again, though, I do see a way to replicate the
issue:

$ git update-ref refs/bisect/b HEAD
$ git rev-parse --abbrev-ref --bisect
refs/bisect/b

Because 'rev-parse --bisect' uses the "refs/bisect/bad" prefix (no trailing
slash) and does no additional filtering in its 'for_each_fullref_in'
callback, refs like "refs/bisect/b" and "refs/bisect/ba" are (incorrectly)
matched. I'll re-roll with the added test.

^ permalink raw reply

* Re: What's cooking in git.git (Oct 2023, #03; Fri, 6)
From: Junio C Hamano @ 2023-10-09 16:35 UTC (permalink / raw)
  To: René Scharfe; +Cc: git
In-Reply-To: <4014e490-c6c1-453d-b0ed-645220e3e614@web.de>

René Scharfe <l.s.r@web.de> writes:

> Am 07.10.23 um 10:20 schrieb Junio C Hamano:
>> * rs/parse-options-value-int (2023-09-18) 2 commits
>>  - parse-options: use and require int pointer for OPT_CMDMODE
>>  - parse-options: add int value pointer to struct option
>> ...
> I don't mind removing this topic from seen for now; it's not ready, yet.

After seeing the discussion moved to giving a more type safe enum
support and then somehow convesation seemed to have petered out, I
was wondering if we figured out the problem space enough to see an
updated version with well defined scope and good problem
description.  I do not mind keeping it on 'seen', especially if
these two early steps are not expected to change.  It is not like
they are causing any maintenance burden.

Thanks.

^ permalink raw reply

* Re: [PATCH 00/25] Documentation fixes
From: Elijah Newren @ 2023-10-09 16:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Elijah Newren via GitGitGadget, git
In-Reply-To: <ZSNbALj63zjzOURN@nand.local>

On Sun, Oct 8, 2023 at 6:44 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Sun, Oct 08, 2023 at 06:45:02AM +0000, Elijah Newren via GitGitGadget wrote:
> > It turns out that AI is pretty good at making small fixes to documentation;
> > certainly not perfect, but it provides quite good signal. Unfortunately,
> > there is a lot to sift through. Some points about my strategy:
>
> Quite interesting ;-).
>
> I'm curious to learn a little bit more about your
> strategy beyond what you wrote:
>
>   - What tool did you use? ChatGPT? Something home-grown?

A mixture of gpt-4 and gpt-4-32k (I would have just used gpt-4, but
trying to give it a full file blows the token limit on several of
Git's documentation files).

Also, it was sent to an internally hosted instance.  On this internal
instance, it seemed to require passing the
api-version=2023-03-15-preview parameter.  I don't really know what
that parameter means, but I suspect it might have been some
6-months-ish old version of gpt-4?

>   - (Assuming this was generated by some sort of LLM): what did you
>     prompt it with?

Note that it was exactly one file per prompt, which was as follows:

"""
For the asciidoc file below, are there any typos, grammatical errors,
or wording problems?  If so, please highlight them along with proposed
corrections:

--------------------
${FILE_CONTENTS}
"""

If I had to do it over, I'd be much more explicit about the output
format.  Probably, "Please respond by outputting the full file, with
any corrections included.  If there are no corrections, simply output
the original file as-is." which would allow me to simply diff the
output and look at the changes.

Also, I would probably specify that "The ascii doc file starts three
lines below, just after the line of dashes", hoping that would help it
avoid sometimes presuming that the dashes were part of the file.

>   - What was the output format: the edited text in its entirety, or a
>     patch that can be applied on top?

My wording was unfortunately vague, so I sometimes got human prose
instructing me with a change to make, sometimes I got a bulleted list
in the form "${old_text} -> ${new_text}", but most of the time it
printed the file (or a subset thereof) with corrections.  I also had
all the output concatenated into one large file, which made it "fun"
to work through all the changes.  Even when diffing files, I manually
applied any changes I saw to the actual file (which did risk
introducing new typos, and missing some of the corrections, but did
ensure I reviewed everything).

Also, not only did I get different output formats, but there were many
times the file was cut off at some point.  I sometimes assumed that
just meant there were no changes outside that region, but there were
times where there was only one change and it had given me hundreds of
lines of context around it before it cut off, so it did leave me with
the feeling it might have only processed or responded to part of the
file.

There were also several times where the changes it suggested were a
no-op, making me wonder if it just failed or something -- I looked at
it really closely (including sometimes piping the output through xxd,
and thus once noticed a change of tab-after-period to
space-after-period), but when it was responding with human prose and
said something like "Change the sentence that reads '${old_version}'
-> '${old_version}', it made me wonder if something just went haywire
with the LLM and I should retry.

However, despite the above issues making me think there are more
documentation issues to be found with an LLM, I didn't re-check any
files unless I got an error with no output (e.g. excessive number of
tokens, or I've hit rate limits on using the API).  I didn't bother,
because the firehose of changes it provided me even without those
caveats was far more than enough to deal with.

^ permalink raw reply

* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Elijah Newren @ 2023-10-09 17:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sergey Organov, git
In-Reply-To: <xmqqmswv3p11.fsf@gitster.g>

On Fri, Oct 6, 2023 at 11:49 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Elijah Newren <newren@gmail.com> writes:
> >
> >> In my opinion, --remerge-diff does this better; wouldn't we want a
> >> ...
> > Between -c and --cc, I do not think there is anything that makes us
> > favor -c over --cc.  While the algorithm to decide which hunks out
> > of -c's output to omit was being polished, comparison with -c served
> > a good way to give baseline, but once --cc has become solid, I do
> > not think I've used -c myself.

Perhaps, then, the user manual should either omit -c, or recommend
users use --cc instead?

> > I personally find that a very trivial merge resolution is far easier
> > to read with --cc than --remerge-diff, the latter being way too
> > verbose.

Ah, indeed, for those that know the --cc output format well (it takes
a bit to figure out for newcomers), your example demonstrates this
nicely.  Thanks.

> > Also, --cc and -c should work inside a read-only repository where
> > you only have read access to.  If remerge needs to write some
> > objects to the repository, then you'd need some hack to give a
> > writable object store overlay via the alternate odb mechanism, or
> > something, right?

Well, it does use a temporary object store with the alternate odb
mechanism already, but I don't think there's any code to allow the
user to input the location for the temporary store, and thus we'd
probably attempt to write it underneath the same read-only directory.
So, yes, read-only repositories would likely be problematic for
--remerge-diff.

However, are read-only repositories worth mentioning in the documentation here?

> Well, the above did not come out as well as I intended, as I forgot
> to prefix it with something I thought was obvious from what I said
> in the recent discussion in the earlier iteration of this topic,
> where I said that it would be "--remerge-diff", if I were to pick an
> option that is so useful that it deserves short and sweet single
> letter.  Narutally, it came after we gained experience with "--cc",
> so it would be surprising if it did worse.  Just like it is natural
> to expect that "--cc" would give more useful output than "-m -p"
> that predates everybody else.
>
> In short, I would say "--remerge-diff" would give output that is the
> easiest to grok among the three modern variants to show the changes
> a merge introduces.
>
> The above two cases, where I said cc does better than remerge-diff,
> were meant as _exceptions_ for that general sentiment.

Thanks, this is useful.  This does make me wonder, though: Should we
perhaps guide users as to what we recommend (and recommend against) in
this documentation?

If we have lots of options and they all shine on different usecases,
it makes sense to just provide a long list of possibilities for users.
But if we generally feel that one is entirely supplanted by another
(e.g. -c by --cc) it seems beneficial to mention that, and if we
generally feel that one will often be clearer or more useful than the
others (e.g. --remerge-diff), it seems beneficial to recommend it.
Thoughts?

Also, perhaps this would be best to include in a follow-up series (as
it appears from Sergey's latest iteration that we are leaving other
tweaks for a later series anyway), if we do decide we want to do it...

^ permalink raw reply

* Re: [PATCH v6] merge-tree: add -X strategy option
From: Junio C Hamano @ 2023-10-09 17:10 UTC (permalink / raw)
  To: Jeff King; +Cc: phillip.wood, Izzy via GitGitGadget, git, Elijah Newren, Izzy
In-Reply-To: <20231009155315.GA3252778@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I agree that struct-copying is an unusual pattern, and we'd potentially
> run into problems with duplication. But I think it is even trickier than
> that here. We also go on to actually _modify_ opt in this function,
> assigning to various members (both directly, and I think the merge code
> itself will write to opt->priv).
>
> So if we use a pointer (rather than struct assignment), those changes
> will persist in the merge_options struct that was passed in. Which is
> also weird.
>
> Between the two, I think using a pointer is probably the least-weird.
> This real_merge() function is only called once, and is a static-local
> helper for cmd_merge_tree(). So the two functions work as a single unit,
> and munging "opt" is not a big deal.

It is called once per --stdin input to perform many merges in a row.
The most obvious "structure to pointer to structure" conversion
below seems to break an assertion (which is not very surprising, as
it happens inside that --stdin loop), so I am tempted to revert the
whole thing for now.

Thanks.


git: merge-ort.c:5110: merge_incore_recursive: Assertion `opt->ancestor == NULL' failed.
./test-lib.sh: line 1067: 738791 Done                    printf "c1 c3\nc2 -- c1 c3\nc2 c3"
     738792 Aborted                 | git -C repo merge-tree --stdin > actual


 builtin/merge-tree.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git c/builtin/merge-tree.c w/builtin/merge-tree.c
index 7024b5ce2e..1cb1fba2de 100644
--- c/builtin/merge-tree.c
+++ w/builtin/merge-tree.c
@@ -425,7 +425,7 @@ static int real_merge(struct merge_tree_options *o,
 {
 	struct commit *parent1, *parent2;
 	struct commit_list *merge_bases = NULL;
-	struct merge_options opt = o->merge_options;
+	struct merge_options *opt = &o->merge_options;
 	struct merge_result result = { 0 };
 	int show_messages = o->show_messages;
 
@@ -439,10 +439,10 @@ static int real_merge(struct merge_tree_options *o,
 		help_unknown_ref(branch2, "merge-tree",
 				 _("not something we can merge"));
 
-	opt.show_rename_progress = 0;
+	opt->show_rename_progress = 0;
 
-	opt.branch1 = branch1;
-	opt.branch2 = branch2;
+	opt->branch1 = branch1;
+	opt->branch2 = branch2;
 
 	if (merge_base) {
 		struct commit *base_commit;
@@ -452,11 +452,11 @@ static int real_merge(struct merge_tree_options *o,
 		if (!base_commit)
 			die(_("could not lookup commit '%s'"), merge_base);
 
-		opt.ancestor = merge_base;
+		opt->ancestor = merge_base;
 		base_tree = repo_get_commit_tree(the_repository, base_commit);
 		parent1_tree = repo_get_commit_tree(the_repository, parent1);
 		parent2_tree = repo_get_commit_tree(the_repository, parent2);
-		merge_incore_nonrecursive(&opt, base_tree, parent1_tree, parent2_tree, &result);
+		merge_incore_nonrecursive(opt, base_tree, parent1_tree, parent2_tree, &result);
 	} else {
 		/*
 		 * Get the merge bases, in reverse order; see comment above
@@ -467,7 +467,7 @@ static int real_merge(struct merge_tree_options *o,
 		if (!merge_bases && !o->allow_unrelated_histories)
 			die(_("refusing to merge unrelated histories"));
 		merge_bases = reverse_commit_list(merge_bases);
-		merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
+		merge_incore_recursive(opt, merge_bases, parent1, parent2, &result);
 	}
 
 	if (result.clean < 0)
@@ -501,12 +501,12 @@ static int real_merge(struct merge_tree_options *o,
 	}
 	if (show_messages) {
 		putchar(line_termination);
-		merge_display_update_messages(&opt, line_termination == '\0',
+		merge_display_update_messages(opt, line_termination == '\0',
 					      &result);
 	}
 	if (o->use_stdin)
 		putchar(line_termination);
-	merge_finalize(&opt, &result);
+	merge_finalize(opt, &result);
 	return !result.clean; /* result.clean < 0 handled above */
 }
 


^ permalink raw reply related

* Re: [RFC PATCH] Not computing changed path filter for root commits
From: Taylor Blau @ 2023-10-09 17:20 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: SZEDER Gábor, git
In-Reply-To: <20231002225546.409837-1-jonathantanmy@google.com>

On Mon, Oct 02, 2023 at 03:55:46PM -0700, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > diff --git a/revision.c b/revision.c
> > index 2f4c53ea20..1d36df49e2 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -837,14 +837,24 @@ static int rev_compare_tree(struct rev_info *revs,
> >  static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
> >  {
> >  	struct tree *t1 = repo_get_commit_tree(the_repository, commit);
> > +	int bloom_ret = 1;
> >
> >  	if (!t1)
> >  		return 0;
> >
> > +	if (revs->bloom_keys_nr) {
> > +		bloom_ret = check_maybe_different_in_bloom_filter(revs, commit);
> > +		if (!bloom_ret)
> > +			return 1;
> > +	}
> > +
> >  	tree_difference = REV_TREE_SAME;
> >  	revs->pruning.flags.has_changes = 0;
> >  	diff_tree_oid(NULL, &t1->object.oid, "", &revs->pruning);
> >
> > +	if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
> > +		count_bloom_filter_false_positive++;
> > +
> >  	return tree_difference == REV_TREE_SAME;
> >  }
>
> I'll concentrate on getting this patch in, and will look at (and
> discuss) the other Bloom filter-related emails later.

Sounds good. I know that I have some pending mail from SZEDER as well,
so I'll try and apply any feedback from both of you before sending a
reroll so that we can get this all done in one shot.

> This looks good, possibly except a code path in try_to_simplify_commit()
> that calls this rev_same_tree_as_empty() function when
> rev_compare_tree() between a commit and its parent returns REV_TREE_NEW.
> So there are 2 issues: How can rev_compare_tree() ever return
> REV_TREE_NEW? And it doesn't seem right to check Bloom filters in this
> code path, since rev_same_tree_as_empty() was invoked here while we are
> enumerating through a commit's parents, which necessarily implies that
> the commit has parents, but here we're using the Bloom filter as if the
> commit is known to have no parents.

> As for the first issue, rev_compare_tree() returns REV_TREE_NEW when the
> parent's tree is NULL. I'm not sure how this can happen - the tree can
> be NULL if the parent commit is not parsed, but at this point I think
> that it has been parsed. And I think every commit has a tree. This goes
> back all the way to 3a5e860815 (revision: make tree comparison functions
> take commits rather than trees, 2008-11-03) and even beyond that (I
> didn't dig further).

The more I think about this, the more confused I become ;-). I was able
to track this all the way back to Linus's 461cf59f89 (rev-list: stop
when the file disappears, 2006-01-18), but am convinced that both of
these cases (we have an analogous one for when t2 is NULL and we return
REV_TREE_OLD) are dead code.

I replaced the "if (!t1) return REV_TREE_NEW;" with "if (!t1)
BUG("oops")", and was able to get the whole test suite to pass. So... I
am pretty sure that this is dead code, but not sure enough to remove it
myself ;-).

To your point about seeing the tree as NULL before parsing the parent, I
don't think that is the case here, since we parse the parent immediately
before calling rev_compare_tree() (and indeed Git will refuse to read
commit objects which do not list a tree).

> As for the second issue, we can probably solve this by being defensive
> in rev_same_tree_as_empty() by only using the Bloom filter when the
> commit has no parents. Not sure if this is being overly defensive,
> though.

I am also unsure whether we are being overly defensive here or not. But
I agree that it does feel safer to apply something like:

--- 8< ---
diff --git a/revision.c b/revision.c
index 3d78ea6a9a..21b3085465 100644
--- a/revision.c
+++ b/revision.c
@@ -834,7 +834,8 @@ static int rev_compare_tree(struct rev_info *revs,
 	return tree_difference;
 }

-static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
+static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit,
+				  int use_bloom_filter)
 {
 	struct tree *t1 = repo_get_commit_tree(the_repository, commit);
 	int bloom_ret = 1;
@@ -842,7 +843,7 @@ static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
 	if (!t1)
 		return 0;

-	if (revs->bloom_keys_nr) {
+	if (use_bloom_filter && revs->bloom_keys_nr) {
 		bloom_ret = check_maybe_different_in_bloom_filter(revs, commit);
 		if (!bloom_ret)
 			return 1;
@@ -892,7 +893,7 @@ static int compact_treesame(struct rev_info *revs, struct commit *commit, unsign
 		if (nth_parent != 0)
 			die("compact_treesame %u", nth_parent);
 		old_same = !!(commit->object.flags & TREESAME);
-		if (rev_same_tree_as_empty(revs, commit))
+		if (rev_same_tree_as_empty(revs, commit, 1))
 			commit->object.flags |= TREESAME;
 		else
 			commit->object.flags &= ~TREESAME;
@@ -988,7 +989,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		return;

 	if (!commit->parents) {
-		if (rev_same_tree_as_empty(revs, commit))
+		if (rev_same_tree_as_empty(revs, commit, 1))
 			commit->object.flags |= TREESAME;
 		return;
 	}
@@ -1069,7 +1070,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)

 		case REV_TREE_NEW:
 			if (revs->remove_empty_trees &&
-			    rev_same_tree_as_empty(revs, p)) {
+			    rev_same_tree_as_empty(revs, p, 0)) {
 				/* We are adding all the specified
 				 * paths from this parent, so the
 				 * history beyond this parent is not
--- >8 ---

on top.

> There is also the issue that count_bloom_filter_false_positive is
> incremented even when no Bloom filters are present, but I think this is
> fine (it matches the behavior of rev_compare_tree()).

Yep, agreed.

Thanks,
Taylor

^ permalink raw reply related

* RE: [EXTERNAL] Re: Microsoft Smart App Control - Git - git-bash.exe File Unsigned
From: Rolland Swing (Insight Global LLC) @ 2023-10-09 17:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git@vger.kernel.org, Anthony Chuang
In-Reply-To: <ZSCvaWuPJ1peZ3KF@tapette.crustytoothpaste.net>

Thanks Brian - I'll reach out to them via their issue tracker.

Thanks,

Rolland

-----Original Message-----
From: brian m. carlson <sandals@crustytoothpaste.net> 
Sent: Friday, October 6, 2023 6:08 PM
To: Rolland Swing (Insight Global LLC) <v-roswing@microsoft.com>
Cc: git@vger.kernel.org; Anthony Chuang <anchuang@microsoft.com>
Subject: [EXTERNAL] Re: Microsoft Smart App Control - Git - git-bash.exe File Unsigned

On 2023-10-05 at 20:41:39, Rolland Swing (Insight Global LLC) wrote:
> Hi Git Team,

Hey,

> We're part of the Microsoft team that owns Smart App Control (https://learn.microsoft.com/en-us/windows/apps/develop/smart-app-control/overview), which requires applications to sign all of their executable files (exe, dll, msi, tmp, and a few other file formats).
>  
> We found during internal testing and/or from user feedback that your app, git-bash.exe, is not correctly signed. 
> 
> Block Event:   FileName: \Device\HarddiskVolume7\Program 
> Files\Git\git-bash.exe
>   Calling Process: \Device\HarddiskVolume7\Windows\explorer.exe
>   Sha256 Hash: 
> 42F2E685686FB6356A195709AF912C7B9D424466BD7C6D69258AADA5E80AC3C2

The Git project doesn't distribute any binaries at all.  We distribute only source code.  Many distributors compile these to produce binaries.

The project you are probably thinking of is Git for Windows, which, while related, is a separate project.  They do indeed distribute binaries, and this looks like a binary that's theirs.  If you'd like to contact them, you can use their issue tracker
(https://github.com/git-for-windows/git/issues) to inquire.

However, I will note that a cursory search there found https://github.com/git-for-windows/git/issues/798, where the maintainer points out that there are over 400 exe files and 250 dll files, which would make signing them all excessively burdensome.  I expect the upcoming requirements for HSM-backed keys for Windows code signing may make that even slower and more burdensome.  That being said, perhaps with automation, the maintainer may feel differently than they did in 2016, so it might be worth asking again.
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

^ permalink raw reply

* Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Junio C Hamano @ 2023-10-09 17:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Elijah Newren, git, Eric W. Biederman
In-Reply-To: <20231008173329.GA1557002@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

>> Yes, the bulk-checkin mechanism suffers from an even worse problem which
>> is the pack it creates will contain no deltas whatsoever. The contents
>> of the pack are just getting written as-is, so there's no fancy
>> delta-ficiation going on.
>
> I wonder how big a deal this would be in practice for merges.
> ...

Thanks for your experiments ;-)

The reason why bulk-checkin mechanism does not attempt deltifying
(as opposed to fast-import that attempts to deltify with the
immediately previous object and only with that single object) is
exactly the same.  It was done to support the initial check-in,
which by definition lacks the delta opportunity along the time axis.

As you describe, such a delta-less pack would risk missed
deltification opportunity when running a repack (without "-f"), as
the opposite of the well known "reuse delta" heuristics, aka "this
object was stored in the base form, it is likely that the previous
pack-object tried but did not find a good delta base for it, let's
not waste time retrying that" heuristics would get in the way.

^ permalink raw reply

* Re: [RFC PATCH] Not computing changed path filter for root commits
From: Taylor Blau @ 2023-10-09 17:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: SZEDER Gábor, git
In-Reply-To: <ZSQ2XwbTM4DDLfJq@nand.local>

On Mon, Oct 09, 2023 at 01:20:31PM -0400, Taylor Blau wrote:
> On Mon, Oct 02, 2023 at 03:55:46PM -0700, Jonathan Tan wrote:
> > As for the second issue, we can probably solve this by being defensive
> > in rev_same_tree_as_empty() by only using the Bloom filter when the
> > commit has no parents. Not sure if this is being overly defensive,
> > though.
>
> I am also unsure whether we are being overly defensive here or not. But
> I agree that it does feel safer to apply something like:

Never mind, we are being overly defensive there. The goal here is to
avoid using a commit's Bloom filter in cases where we are acting as if a
commit is at the root of history, but in fact has parents.

This only happens when we return REV_TREE_NEW from a call to
`rev_compare_tree(revs, p, commit, nth_parent)`. But we'll only get
REV_TREE_NEW back if

    repo_get_commit_tree(the_repository, p);

returns NULL. But when we call rev_same_tree_as_empty(revs, p) in the
REV_TREE_NEW case, we return early as follows:

    struct tree *t1 = repo_get_commit_tree(revs, p);
    if (!t1)
      return 0;

So we won't even consult the Bloom filter in that case, since t1 is NULL
for the same reason as what caused rev_compare_tree() to return
REV_TREE_NEW in the first place.

I am still dumbfounded by how we would ever get REV_TREE_NEW in the
first place, but if we did, I think we would be OK here.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH] repack: free existing_cruft array after use
From: Junio C Hamano @ 2023-10-09 17:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, Patrick Steinhardt, Eric Sunshine
In-Reply-To: <ZSNWZ9+Q2eOpy91A@nand.local>

Taylor Blau <me@ttaylorr.com> writes:

> On Sat, Oct 07, 2023 at 01:20:31PM -0400, Jeff King wrote:
>> On Mon, Oct 02, 2023 at 08:44:32PM -0400, Taylor Blau wrote:
>> ...
>> Coverity (using the just-merged-to-next version of the workflow file!)
>> flagged a leak here. Since the topic (tb/repack-max-cruft-size) is in
>> 'next', I think we'd want this on top:
>
> Woohoo! I'm glad that this is already paying dividends.
> ...
> Thanks, I can't believe I missed this when writing this function. The
> fix looks obviously correct to me, so this has my:
>
>     Acked-by: Taylor Blau <me@ttaylorr.com>
>
> Thanks,

Will queue.  Thanks.



^ permalink raw reply

* Re: [PATCH v7 2/3] unit tests: add TAP unit test framework
From: Josh Steadmon @ 2023-10-09 17:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: phillip.wood123, git, linusa, calvinwan, rsbecker
In-Reply-To: <xmqq350hw6n7.fsf@gitster.g>

On 2023.08.17 17:12, Junio C Hamano wrote:
> 
> What I am going to utter from here on are not complaints but purely
> meant as questions.  
> 
> Would the resulting output and maintainability of the tests change
> (improve, or worsen) if we introduce
> 
> 	static void assert_empty_strbuf(struct strbuf *buf)
> 	{
> 		check_uint(buf->len, ==, 0);
>                 check_uint(buf->alloc, ==, 0);
> 		check(buf.buf == strbuf_slopbuf);
> 		check_char(buf.buf[0], ==, '\0');
> 	}
> 
> and call it from the setup() function to ensure that
> strbuf_release(&buf) it calls after running customer test f() brings
> the buffer in a reasonably initialized state?  The t_static_init()
> test should be able to say
> 
> 	static void t_static_init(void)
> 	{
> 		struct strbuf buf = STRBUF_INIT;
> 		assert_empty_strbuf(&buf);
> 	}
> 
> if we did so, but is that a good thing or a bad thing (e.g. it may
> make it harder to figure out where the real error came from, because
> of the "line number" thing will not easily capture the caller of the
> caller, perhaps)?  

I am unsure whether or not this is an improvement. While it would
certainly help readability and reduce duplication if this were
production code, in test code it can often be more valuable to be
verbose and explicit, so that individual broken test cases can be
quickly understood without having to do a lot of cross referencing.

I'll hold off on adding any more utility functions in t-strbuf for V8,
but if you or other folks feel strongly about it we can address it in
V9.


> > +	check_char(buf.buf[0], ==, '\0');
> > +}
> 
> > +static void t_dynamic_init(void)
> > +{
> > +	struct strbuf buf;
> > +
> > +	strbuf_init(&buf, 1024);
> > +	check_uint(buf.len, ==, 0);
> > +	check_uint(buf.alloc, >=, 1024);
> > +	check_char(buf.buf[0], ==, '\0');
> 
> Is it sensible to check buf.buf is not slopbuf at this point, or
> does it make the test TOO intimate with the current implementation
> detail?

Yes, I think this is too much of an internal detail. None of the users
of strbuf ever reference it directly. Presumably for library-ish code,
we should stick to testing just the user-observable parts, not the
implementation.


> > +	check_char(buf->buf[0], ==, ch);
> > +	check_char(buf->buf[1], ==, '\0');
> > +}
> 
> In any case, this t_addch() REQUIRES that incoming buf is empty,
> doesn't it?  I do not think it is sensible.  I would have expected
> that it would be more like
> 
> 	t_addch(struct strbuf *buf, void *data)
> 	{
> 		char ch = *(char *)data;
> 		size_t orig_alloc = buf->alloc;
> 		size_t orig_len = buf->len;
> 
> 		if (!assert_sane_strbuf(buf))
> 			return;
>                 strbuf_addch(buf, ch);
> 		if (!assert_sane_strbuf(buf))
> 			return;
> 		check_uint(buf->len, ==, orig_len + 1);
> 		check_uint(buf->alloc, >=, orig_alloc);
>                 check_char(buf->buf[buf->len - 1], ==, ch);
>                 check_char(buf->buf[buf->len], ==, '\0');
> 	}
> 
> to ensure that we can add a ch to a strbuf with any existing
> contents and get a one-byte longer contents than before, with the
> last byte of the buffer becoming 'ch' and still NUL terminated.
> 
> And we protect ourselves with a helper that checks if the given
> strbuf looks *sane*.

Yeah, in general I think this is a good improvement, but again I'm not
sure if it's worth adding additional helpers. I'll try to rework this a
bit in V8.


> 	static int assert_sane_strbuf(struct strbuf *buf)
> 	{
>         	/* can use slopbuf only when the length is 0 */
> 		if (buf->buf == strbuf_slopbuf)
>                 	return (buf->len == 0);
> 		/* everybody else must have non-NULL buffer */
> 		if (buf->buf == NULL)
> 			return 0;
>                 /* 
> 		 * alloc must be at least 1 byte larger than len
> 		 * for the terminating NUL at the end.
> 		 */
> 		return ((buf->len + 1 <= buf->alloc) &&
> 		    	(buf->buf[buf->len] == '\0'));
> 	}
> 
> You can obviously use your check_foo() for the individual checks
> done in this function to get a more detailed diagnosis, but because
> I have confused myself enough by thinking about their polarity, I
> wrote this in barebones comparison instead.
> 

^ permalink raw reply

* Outreachy Introduction and Interest in Contributing to the Git Community
From: Doreen Wanyama @ 2023-10-09 17:38 UTC (permalink / raw)
  To: git

Dear Git community,

I hope you are all doing well. I am writing to show my interest in
working in the project titled move existing tests to a unit testing
framework. This is because I have always been intrigued by the work
the git community does and hence I am interested in being part of
this. I have gone through the links provided about getting started on
this. I spent yesterday evening and a better part of today trying to
understand the resources. As of now I would like to start working on a
microproject since I understand this is the first step. I am finding
it difficult though to start. Someone to please help me understand how
I should go about this or how I should go about finding my first
microproject. Just a brief explanation will help.
Thank you in advance.

Best regards,

Doreen Wanyama

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox