From: Zach FettersMoore <zach.fetters@apollographql.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Zach FettersMoore via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Subject: Re: [PATCH] subtree: fix split processing with multiple subtrees present
Date: Thu, 26 Oct 2023 15:59:42 -0400 [thread overview]
Message-ID: <CAEWN6q3HfeU1Uj4TPiiVW8PO13xwxgqihPH5-cm8T1oK5tHVTQ@mail.gmail.com> (raw)
In-Reply-To: <xmqqpm2fht2x.fsf@gitster.g>
> Please do not violate Documentation/CodingGuidelines for our shell
> scripted Porcelain, even if it is a script in contrib/ and also
>please avoid bash-isms.
I believe I have resolved the CodingGuidelines issues.
> Also doesn't "subtree" have its own test? If this change is a fix
> for some problem(s), can we have a test or two that demonstrate how
> the current code without the patch is broken?
I was able to add a test that validates against some of the metrics that
are tracked when running a split for processing commits. Validated that
before my fix the test fails, and after my fix the test passes.
>> In the diagram below, 'M' represents the mainline repo branch, 'A'
>> represents one subtree, and 'B' represents another. M3 and B1 represent
>> a split commit for subtree B that was created from commit M4. M2 and A1
>> represent a split commit made from subtree A that was also created
>> based on changes back to and including M4. M1 represents new changes to
>> the repo, in this scenario if you try to run a 'git subtree split
>> --rejoin' for subtree B, commits M1, M2, and A1, will be included in
>> the processing of changes for the new split commit since the last
>> split/rejoin for subtree B was at M3. The issue is that by having A1
>> included in this processing the command ends up needing to processing
>> every commit down tree A even though none of that is needed or relevant
>> to the current command and result.
>>
>> M1
>> | \ \
>> M2 | |
>> | A1 |
>> M3 | |
>> | | B1
>> M4 | |
> The above paragraph explains which different things you drew in the
> diagram are representing, but it is not clear how they relate to
> each other. Do they for example depict parent-child commit
> relationship? What are the wide gaps between these three tracks and
> what are the short angled lines leaning to the left near the tip?
> Is the time/topology flowing from bottom to top?
I am realizing I made a few mistakes with trying to illustrate the diagram
which I will attempt to make more clear below. As for the 3 columns in the
diagram, 'M' represents the mainline branch of the repo being developed in,
while column 'A' represents the history of a subtree 'A' included in the
repo, and column 'B' also represents the history of a subtree 'B' in the
repo. The diagram attempts to illustrate when a 'git subtree split --rejoin'
is used, that there is a commit made in the subtrees history, and that is
then merged into the mainline repo branch.
M1
|
|
M2 --- |
| A1
| |
M3 ---------- |
| | B1
M4 | |
Hopefully that helps better illustrate the state of the repo before the new
'git subtree split --rejoin' attempt and why it results in the described issue.
>> +should_ignore_subtree_commit () {
>> + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
>> + then
>> + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
>
> Here $dir is a free variable that comes from outside. The caller
> does not supply it as a parameter to this function (and the caller
> does not receive it as its parameter from its caller). Yet the file
> as a whole seems to liberally make assignments to it ("git grep dir="
> on the file counts 7 assignments). Are we sure we are looking for
> the right $dir in this particular grep?
>
> Side note: I am not familiar with this part of the code at
> all, so do not take it as "here is a bug", but more as "this
> smells error prone."
From my testing and what I see for '$dir' usage in the 'cmd_split'
function which leads to this code it is the correct '$dir', although
I see your point about it being reassigned in different places which
makes it error prone. I switched this to use the command
line argument '$arg_prefix' since the subtree prefix passed into
the command is what we actually want in this case so we can filter
out commits from other subtrees.
> Also can $dir have regular expressions special characters? "The
> existing code and new code alike, git-subtree is not prepared to
> handle directory names with RE special characters well at all, so
> do not use them if you do not want your history broken" is an
> acceptable answer.
As far as I can tell from looking at the code (which I only recently
started using) the '$dir' which is based on the subtree prefix is
not setup to handle this.
> The caller of this function process_split_commit is cmd_split and
> process_split_commit (hence this function) is called repeatedly
> inside a loop. This function makes a traversal over the entire
> history for each and every iteration in "good" cases where there is
> no 'mainline' or 'subtree-dir' commits for the given $dir.
>
> I wonder if it is more efficient to enumerate all commits that hits
> these grep criteria in the cmd_split before it starts to call
> process_split_commit repeatedly. If it knows which commit can be
> ignored beforehand, it can skip and not call process_split_commit,
> no?
Moved this functionality into the 'cmd_split' function as suggested.
>> + then
>> + return 0
>> + fi
>> + fi
>> + return 1
>> +}
>> +
>> # Usage: process_split_commit REV PARENTS
>> process_split_commit () {
>> assert test $# = 2
>> local rev="$1"
>> local parents="$2"
>
> These seem to assume that $1 and $2 can have $IFS in them, so
> shouldn't ...
>
>> + if should_ignore_subtree_commit $rev
>
> ... this call too enclose $rev inside a pair of double-quotes for
> consistency? We know the loop in the cmd_split that calls this
> function is reading from "rev-list --parents" and $rev is a 40-hex
> commit object name (and $parents can have more than one 40-hex
> commit object names separated with SP), so it is safe to leave $rev
> unquoted, but it pays to be consistent to help make the code more
> readable.
Updated this for consistency
On Mon, Sep 18, 2023 at 9:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > In the diagram below, 'M' represents the mainline repo branch, 'A'
> > represents one subtree, and 'B' represents another. M3 and B1 represent
> > a split commit for subtree B that was created from commit M4. M2 and A1
> > represent a split commit made from subtree A that was also created
> > based on changes back to and including M4. M1 represents new changes to
> > the repo, in this scenario if you try to run a 'git subtree split
> > --rejoin' for subtree B, commits M1, M2, and A1, will be included in
> > the processing of changes for the new split commit since the last
> > split/rejoin for subtree B was at M3. The issue is that by having A1
> > included in this processing the command ends up needing to processing
> > every commit down tree A even though none of that is needed or relevant
> > to the current command and result.
> >
> > M1
> > | \ \
> > M2 | |
> > | A1 |
> > M3 | |
> > | | B1
> > M4 | |
>
> The above paragraph explains which different things you drew in the
> diagram are representing, but it is not clear how they relate to
> each other. Do they for example depict parent-child commit
> relationship? What are the wide gaps between these three tracks and
> what are the short angled lines leaning to the left near the tip?
> Is the time/topology flowing from bottom to top?
>
> > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> > index e0c5d3b0de6..e9250dfb019 100755
> > --- a/contrib/subtree/git-subtree.sh
> > +++ b/contrib/subtree/git-subtree.sh
> > @@ -778,12 +778,29 @@ ensure_valid_ref_format () {
> > die "fatal: '$1' does not look like a ref"
> > }
> >
> > +# Usage: check if a commit from another subtree should be ignored from processing for splits
>
> Way overlong line. Please split them accordingly. I won't comment
> on what CodingGuidelines tells us already, in this review, but have
> a few comments here:
>
> > +should_ignore_subtree_commit () {
> > + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
> > + then
> > + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
>
> Here $dir is a free variable that comes from outside. The caller
> does not supply it as a parameter to this function (and the caller
> does not receive it as its parameter from its caller). Yet the file
> as a whole seems to liberally make assignments to it ("git grep dir="
> on the file counts 7 assignments). Are we sure we are looking for
> the right $dir in this particular grep?
>
> Side note: I am not familiar with this part of the code at
> all, so do not take it as "here is a bug", but more as "this
> smells error prone."
>
> Also can $dir have regular expressions special characters? "The
> existing code and new code alike, git-subtree is not prepared to
> handle directory names with RE special characters well at all, so
> do not use them if you do not want your history broken" is an
> acceptable answer.
>
> The caller of this function process_split_commit is cmd_split and
> process_split_commit (hence this function) is called repeatedly
> inside a loop. This function makes a traversal over the entire
> history for each and every iteration in "good" cases where there is
> no 'mainline' or 'subtree-dir' commits for the given $dir.
>
> I wonder if it is more efficient to enumerate all commits that hits
> these grep criteria in the cmd_split before it starts to call
> process_split_commit repeatedly. If it knows which commit can be
> ignored beforehand, it can skip and not call process_split_commit,
> no?
>
> > + then
> > + return 0
> > + fi
> > + fi
> > + return 1
> > +}
> > +
> > # Usage: process_split_commit REV PARENTS
> > process_split_commit () {
> > assert test $# = 2
> > local rev="$1"
> > local parents="$2"
>
> These seem to assume that $1 and $2 can have $IFS in them, so
> shouldn't ...
>
> > + if should_ignore_subtree_commit $rev
>
> ... this call too enclose $rev inside a pair of double-quotes for
> consistency? We know the loop in the cmd_split that calls this
> function is reading from "rev-list --parents" and $rev is a 40-hex
> commit object name (and $parents can have more than one 40-hex
> commit object names separated with SP), so it is safe to leave $rev
> unquoted, but it pays to be consistent to help make the code more
> readable.
>
> > + then
> > + return
> > + fi
> > +
> > if test $indent -eq 0
> > then
> > revcount=$(($revcount + 1))
> >
> > base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518
next prev parent reply other threads:[~2023-10-26 19:59 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
2023-09-18 23:31 ` Junio C Hamano
2023-09-19 1:04 ` Junio C Hamano
2023-10-26 19:59 ` Zach FettersMoore [this message]
2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget
2023-09-22 16:25 ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget
2023-09-22 16:25 ` [PATCH v2 2/2] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget
2023-09-29 20:32 ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
2023-09-29 20:32 ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget
2023-09-29 20:32 ` [PATCH v3 2/3] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget
2023-09-29 20:33 ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget
2023-10-17 20:02 ` Zach FettersMoore
2023-10-26 19:17 ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
2023-11-18 11:28 ` Christian Couder
2023-11-28 21:04 ` Zach FettersMoore
2023-11-28 21:17 ` [PATCH v5] " Zach FettersMoore via GitGitGadget
2023-11-30 20:33 ` Christian Couder
2023-11-30 21:01 ` Zach FettersMoore
2023-12-01 14:54 ` [PATCH v6] " Zach FettersMoore via GitGitGadget
2023-12-04 11:08 ` Christian Couder
2023-12-11 15:39 ` Zach FettersMoore
2023-12-12 16:06 ` Christian Couder
2023-12-12 22:28 ` Junio C Hamano
2023-12-13 15:20 ` Zach FettersMoore
2024-01-03 16:33 ` Christian Couder
2023-12-20 15:25 ` Christian Couder
2024-01-25 10:09 ` Christian Couder
2024-01-25 16:38 ` Junio C Hamano
2024-01-25 18:52 ` Christian Couder
2024-01-25 18:56 ` Junio C Hamano
2025-08-21 3:13 ` subtree: [v2.44 regression] split may produce different history Colin Stagner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAEWN6q3HfeU1Uj4TPiiVW8PO13xwxgqihPH5-cm8T1oK5tHVTQ@mail.gmail.com \
--to=zach.fetters@apollographql.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).