Git development
 help / color / mirror / Atom feed
* Re: [PATCH v3 15/15] rebase: change the default backend from "am" to "merge"
From: Phillip Wood @ 2020-01-11 14:41 UTC (permalink / raw)
  To: Elijah Newren, Jonathan Nieder
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Johannes Schindelin, Phillip Wood, Denton Liu, Junio C Hamano,
	Pavel Roskin, Alban Gruin, SZEDER Gábor
In-Reply-To: <CABPp-BEGwJ=0+6TMYXvMzpds0h6bz4gZA8Pi95SH9M4CKBtfOw@mail.gmail.com>

Hi Jonathan & Elijah

On 11/01/2020 01:16, Elijah Newren wrote:
> Hi Jonathan,
> 
> On Fri, Jan 10, 2020 at 3:14 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>>
>> Hi,
>>
>> Elijah Newren via GitGitGadget wrote:
>>
>>> The am-backend drops information and thus limits what we can do:
>>>
>>>    * lack of full tree information from the original commits means we
>>>      cannot do directory rename detection and warn users that they might
>>>      want to move some of their new files that they placed in old
>>>      directories to prevent their becoming orphaned.[1]
>>>    * reduction in context from only having a few lines beyond those
>>>      changed means that when context lines are non-unique we can apply
>>>      patches incorrectly.[2]
>>>    * lack of access to original commits means that conflict marker
>>>      annotation has less information available.
>>>
>>> Also, the merge/interactive backend have far more abilities, appear to
>>> currently have a slight performance advantage[3] and have room for more
>>> optimizations than the am backend[4] (and work is underway to take
>>> advantage of some of those possibilities).
>>>
>>> [1] https://lore.kernel.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/
>>> [2] https://lore.kernel.org/git/CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+h+t4Mn4A@mail.gmail.com/
>>> [3] https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/
>>> [4] https://lore.kernel.org/git/CABPp-BGh7yW69QwxQb13K0HM38NKmQif3A6C6UULEKYnkEJ5vA@mail.gmail.com/
>>>
>>> Signed-off-by: Elijah Newren <newren@gmail.com>
>>> ---
>>>   Documentation/git-rebase.txt           |  2 +-
>>>   builtin/rebase.c                       |  4 ++--
>>>   t/t5520-pull.sh                        | 10 ++++++----
>>>   t/t9106-git-svn-commit-diff-clobber.sh |  3 ++-
>>>   4 files changed, 11 insertions(+), 8 deletions(-)
>>
>> Thanks for writing this.  We finally rolled this out to our internal
>> population at $DAYJOB and ran into a couple of issues:
> 
> Cool, thanks for testing it out.
> 
>>   1. "git rebase --am" does not invoke the post-commit hook, but "git
>>      rebase --merge" does.  Is this behavior change intended?
>>
>>      Noticed because jiri[1] installs a post-commit hook that warns
>>      about commits on detached HEAD, so this change makes rebases more
>>      noisy in repositories that were set up using jiri.

Perhaps that hook could learn not to warn if a branch is being rebased? 
git could be more helpful there by having a porcelain option to status 
that prints the branch name if we're rebasing (`git worktree --list` 
shows the branch correctly when it's being rebased but does not (yet - I 
have a patch to do it) mark the current worktree so isn't very helpful.)

> I've never used a post-commit hook or seen one in the wild.  Certainly
> wasn't intentional, but it's not clear to me if it's wrong or right
> either.  I don't see why it would make sense to distinguish between
> any of git rebase --am/--merge/--interactive, but it isn't too
> surprising that by historical accident the two rebase backends which
> happened to call git-commit behind the scenes would call a post-commit
> hook and the other rebase backend that didn't call git-commit
> wouldn't.

Looking through the history the am based rebase has never run the 
post-commit hook as am has its own set of hooks and the scripted version 
used commit-tree. The merge based rebase ran `git commit` which ran the 
post commit hook. The interactive rebase ran the hook until and I broke 
it in a356ee4659b ("sequencer: try to commit without forking 'git 
commit'", 2017-11-24) and after I fixed it in 4627bc777e ("sequencer: 
run post-commit hook", 2019-10-15). As it was broken for two years with 
no one noticing it can't be that popular.

> But the big question here, is what is correct behavior?  Should rebase
> call the post-commit hook, or should it skip it?  I haven't any clue
> what the answer to that is.

It's creating a new commit so I lean towards thinking it should run the 
post-commit hook. As an example I have a post-commit hook that prints a 
warning if a commit is created on a branch that is being rewritten by 
one of my scripts in another worktree. There are pre-commit and 
pre-rebase hooks to try and prevent that, but the warning is there as a 
last resort if those hooks are by-passed.

>>   2. GIT_REFLOG_ACTION contains "rebase -i" even though the rebase is
>>      not interactive.

If this is important to people I think it should be easy enough to set 
GIT_REFLOG_ACTION to the appropriate string in builtin/rebase.c (so long 
as it hasn't already been set by the user) rather than relying on 
sequencer.c to do it.

> Yep, as does --keep, --exec, --rebase-merges, etc.  There are lots of
> rebases which use the interactive machinery even if they aren't
> explicitly interactive.  I've never seen the "-i" in the reflog
> message defined, but clearly it has always been used whenever the
> interactive machinery was in play regardless of whether the rebase was
> interactive.  In that regard, I figured that --merge fit in rather
> nicely.  (And I noted the fact that reflog messages were different
> between the backends among the "BEHAVIORAL DIFFERENCES" section of
> git-rebase.txt).  But if others think we should just drop the -i (much
> as we did for the bash prompt), I'd be happy with that too.  If we go
> that route, I think I'd rather drop the -i in the reflog for all
> rebases, not just the
> using-the-interactive-machinery-but-not-explicitly-interactive ones.
> 
>>   3. In circumstances I haven't pinned down yet, we get the error
>>      message "invalid date format: @@2592000 +0000":
>>
>>          $ git rebase --committer-date-is-author-date --onto branch_K branch_L~1 branch_L
>>          $ git checkout --theirs file
>>          $ git add file
>>          $ git rebase --continue
>>          fatal: invalid date format: @@2592000 +0000
>>          error: could not commit staged changes.
>>
>>      This isn't reproducible without --committer-date-is-author-date.
>>      More context (the test where it happens) is in [2].
> 
> Interesting.  Do you happen to know if this started happening with
> ra/rebase-i-more-options, or did it just become an issue with
> en/rebase-backend?  I looked around at the link you provided and feel
> a bit confused; I'm not sure which test does this or how I'd
> reproduce.

I'm confused by the test as well. As ra/rebase-i-more-options only 
touched the sequencer then any bugs would only show up in this test 
(which runs a non-interactive rebase) once en/rbease-backend switched to 
that backend. It seems likely that ra/rebase-i-more-options is to blame.

Jonathan - do you happen to know if your users create empty commits at 
all? and if so what do they expect rebase to do with them (and any that 
become empty when they are rebased) - cf 
https://lore.kernel.org/git/<CABPp-BEH=9qejeqysHYE+AJ+JPaBympZizq-bx_OjArYFa4xUQ@mail.gmail.com>

Best Wishes

Phillip

>>   4. I suspect the exit status in the "you need to resolve conflicts"
>>      case has changed.  With rebase --am, [3] would automatically
>>      invoke rebase --abort when conflicts are present, but with rebase
>>      --merge it does not.
>>
>> Known?
> 
> Nope, but I would certainly hope that "you need to resolve conflicts"
> would result in a non-zero exit status.  If it doesn't, that sounds
> like a bug in the interactive backend that we need to fix.  I'll dig
> in.
> 
> 
> 
> Thanks for the reports!
> Elijah
> 

^ permalink raw reply

* Re: [PATCH v3 2/2] rebase-interactive: warn if commit is dropped with `rebase --edit-todo'
From: Phillip Wood @ 2020-01-11 14:44 UTC (permalink / raw)
  To: Alban Gruin, phillip.wood, Junio C Hamano; +Cc: git, Johannes Schindelin
In-Reply-To: <85e3d40a-9829-0f50-8d91-7e8e8fa319f1@gmail.com>

Hi Alban

On 10/01/2020 21:31, Alban Gruin wrote:
> Hi Phillip,
> 
> Le 10/01/2020 à 18:13, Phillip Wood a écrit :
>> Hi Alban
>>
>> On 09/01/2020 21:13, Alban Gruin wrote:
>>> Hi Phillip,
>>> [...]
>>>
>>> In which case, if the check did not pass at the previous edit, the new
>>> todo list should be compared to the backup.  As sequencer_continue()
>>> already does this, extract this to its own function in
>>> rebase-interactive.c.  To keep track of this, a file is created on the
>>> disk (as you suggested in your other email.)  At the next edit, if this
>>> file exists and no errors were found, it is deleted.  The backup is only
>>> created if there is no errors in `todo_list' and in `new_todo'.
>>>
>>> This would guarantee that there is no errors in the backup, and that the
>>> edited list is always compared to a list exempt of errors.
>>>
>>> This approach also has the benefit to detect if a commit part of a
>>> badcmd was dropped.
>>>
>>> After some tweaks (ie. `expect' now lists 2 commits instead of one),
>>> this passes the test with the change you suggested, and the one you sent
>>> in your other email.
>>
>> That sounds good. I'm not sure how it passes the test in my other email
>> though, if sequencer_continue() compares the todo list to the backup
>> wont it still fail when continuing after conflicts as the backup is out
>> of date?
>>
> 
> I changed sequencer_continue() to check the todo list only if the file
> indicating an error exists.

That makes sense

> I still have to rewrite the commit message, then I’ll re-send this series.

Excellent, I look forward to reading them

Best Wishes

Phillip

> Cheers,
> Alban
> 
>> Best Wishes
>>
>> Phillip
>>

^ permalink raw reply

* Re: [PATCH] branch: let '--edit-description' default to rebased branch during rebase
From: Eric Sunshine @ 2020-01-11 13:26 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: Git List, SZEDER Gábor
In-Reply-To: <20200111123533.1613844-1-marcandre.lureau@redhat.com>

On Sat, Jan 11, 2020 at 7:36 AM <marcandre.lureau@redhat.com> wrote:
> Defaulting to editing the description of the rebased branch without an
> explicit branchname argument would be useful.  Even the git bash prompt
> shows the name of the rebased branch, and then
>
>   ~/src/git (mybranch|REBASE-i 1/2)$ git branch --edit-description
>   fatal: Cannot give description to detached HEAD
>
> looks quite unhelpful.
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
> diff --git a/builtin/branch.c b/builtin/branch.c
> @@ -745,15 +745,27 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
>                 if (!argc) {
> -                       if (filter.detached)
> -                               die(_("Cannot give description to detached HEAD"));
> -                       branch_name = head;
> +                       if (filter.detached) {
> +                               struct wt_status_state state;
> +
> +                               memset(&state, 0, sizeof(state));
> +
> +                               if (wt_status_check_rebase(NULL, &state)) {
> +                                       branch_name = state.branch;
> +                               }

Style: drop unneeded braces.

> +
> +                               if (!branch_name)
> +                                       die(_("Cannot give description to detached HEAD"));
> +
> +                               free(state.onto);

Also, no need for all the blank lines which eat up valuable vertical
screen real-estate without making the code clearer.

> +                       } else
> +                               branch_name = xstrdup(head);

It would be easier to see what happens in the common case (when not
rebasing) if you invert the condition to `if (!filter.detached)` and
turn this one-line 'else' branch into the 'if' branch.

> @@ -772,6 +784,8 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
>                 if (edit_branch_description(branch_name))
>                         return 1;
> +
> +               free(branch_name);

That `return 1` just above this free() is leaking 'branch_name', isn't it?

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> @@ -1260,6 +1260,25 @@ test_expect_success 'use --edit-description' '
> +test_expect_success 'use --edit-description during rebase' '
> +       write_script editor <<-\EOF &&
> +               echo "Rebase contents" >"$1"
> +       EOF
> +       (
> +               set_fake_editor &&
> +               FAKE_LINES="break 1" git rebase -i HEAD^ &&
> +               EDITOR=./editor git branch --edit-description &&
> +               git rebase --continue
> +       ) &&
> +       write_script editor <<-\EOF &&
> +               git stripspace -s <"$1" >"EDITOR_OUTPUT"
> +       EOF
> +       EDITOR=./editor git branch --edit-description &&
> +       echo "Rebase contents" >expect &&
> +       test_cmp expect EDITOR_OUTPUT
> +'
> +test_done

Strange place for a test_done() invocation considering that existing
tests follow the new one added by this patch.

>  test_expect_success 'detect typo in branch name when using --edit-description' '
>         write_script editor <<-\EOF &&
>                 echo "New contents" >"$1"

^ permalink raw reply

* [PATCH] branch: let '--edit-description' default to rebased branch during rebase
From: marcandre.lureau @ 2020-01-11 12:35 UTC (permalink / raw)
  To: git; +Cc: szeder.dev, Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Defaulting to editing the description of the rebased branch without an
explicit branchname argument would be useful.  Even the git bash prompt
shows the name of the rebased branch, and then

  ~/src/git (mybranch|REBASE-i 1/2)$ git branch --edit-description
  fatal: Cannot give description to detached HEAD

looks quite unhelpful.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
Changed in v2:
 - add a test
 - fix commit message & summary
 - simplify code

builtin/branch.c  | 24 +++++++++++++++++++-----
 t/t3200-branch.sh | 19 +++++++++++++++++++
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index d8297f80ff..ee82dc828e 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -745,15 +745,27 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		string_list_clear(&output, 0);
 		return 0;
 	} else if (edit_description) {
-		const char *branch_name;
+		char *branch_name = NULL;
 		struct strbuf branch_ref = STRBUF_INIT;
 
 		if (!argc) {
-			if (filter.detached)
-				die(_("Cannot give description to detached HEAD"));
-			branch_name = head;
+			if (filter.detached) {
+				struct wt_status_state state;
+
+				memset(&state, 0, sizeof(state));
+
+				if (wt_status_check_rebase(NULL, &state)) {
+					branch_name = state.branch;
+				}
+
+				if (!branch_name)
+					die(_("Cannot give description to detached HEAD"));
+
+				free(state.onto);
+			} else
+				branch_name = xstrdup(head);
 		} else if (argc == 1)
-			branch_name = argv[0];
+			branch_name = xstrdup(argv[0]);
 		else
 			die(_("cannot edit description of more than one branch"));
 
@@ -772,6 +784,8 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 
 		if (edit_branch_description(branch_name))
 			return 1;
+
+		free(branch_name);
 	} else if (copy) {
 		if (!argc)
 			die(_("branch name required"));
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 411a70b0ce..a20ebedea0 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -1260,6 +1260,25 @@ test_expect_success 'use --edit-description' '
 	test_cmp expect EDITOR_OUTPUT
 '
 
+test_expect_success 'use --edit-description during rebase' '
+	write_script editor <<-\EOF &&
+		echo "Rebase contents" >"$1"
+	EOF
+	(
+		set_fake_editor &&
+		FAKE_LINES="break 1" git rebase -i HEAD^ &&
+		EDITOR=./editor git branch --edit-description &&
+		git rebase --continue
+	) &&
+	write_script editor <<-\EOF &&
+		git stripspace -s <"$1" >"EDITOR_OUTPUT"
+	EOF
+	EDITOR=./editor git branch --edit-description &&
+	echo "Rebase contents" >expect &&
+	test_cmp expect EDITOR_OUTPUT
+'
+test_done
+
 test_expect_success 'detect typo in branch name when using --edit-description' '
 	write_script editor <<-\EOF &&
 		echo "New contents" >"$1"

base-commit: 7a6a90c6ec48fc78c83d7090d6c1b95d8f3739c0
-- 
2.25.0.rc2.2.g5aece98438


^ permalink raw reply related

* Re: git submodule update  strange output behavior.
From: Carlo Wood @ 2020-01-11 11:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqblrbm7c9.fsf@gitster-ct.c.googlers.com>

On Fri, 10 Jan 2020 11:36:54 -0800
Junio C Hamano <gitster@pobox.com> wrote:

> Carlo Wood <carlo@alinoe.com> writes:
> 
> > It seems to me that the other part of the problem is printing
> > this output for submodules when nothing (needed to be) is fetched.  
> 
> Hmm, I am not sure if that is a reasonable expectation.  Would it be
> possible to tell if there is something that needs to be fetched
> without attempting to contact the other side?

That wasn't even my point though: assume that we can't know, so we
have to attempt a fetch anyway (seems indeed likely), then I'd
expect to see this message for *every* submodule, and not just
for submodules inside other submodule.

There is an asymmetry (in the output of this command) between
submodules in the top level project and submodules inside other
submodules.

This then makes me wonder after all, if the message is suppressed for
top-level submodules then can't it also be suppressed for the others
(those that are currently being printed)?

-- 
Carlo Wood <carlo@alinoe.com>

^ permalink raw reply

* Re: Question about the pack OBJ_OFS_DELTA format
From: Jeff King @ 2020-01-11  9:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Erik Fastermann, git
In-Reply-To: <xmqq1rs7m757.fsf@gitster-ct.c.googlers.com>

On Fri, Jan 10, 2020 at 11:41:08AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The pack-format.txt file says:
> >
> >        offset encoding:
> >             n bytes with MSB set in all but the last one.
> >             The offset is then the number constructed by
> >             concatenating the lower 7 bit of each byte, and
> >             for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
> >             to the result.
> >
> > but I think is missing two bits of information:
> >
> >   - the bytes are in most-significant to least-significant order, which
> >     IIRC is the opposite of the size varint
> >
> >   - each 7-bit byte sneaks in some extra data by implicitly adding "1"
> >     to all but the last byte
> 
> Isn't the latter mentioned in the paragraph you quoted?

Hmm, yeah. I admit I had trouble parsing exactly what that part was
trying to say, and thought it was trying to talk about how you'd shift
the individual bytes. But reading more carefully, it does say "adding",
so yeah, it accounts for the extra.

It's a little confusing, I think, because in code you'd add just
continually add one before shifting, rather than trying to add in the
extra values at the end.

-Peff

^ permalink raw reply

* Potential Issue with ls-tree documentation
From: Kevin Bowersox @ 2020-01-11  9:46 UTC (permalink / raw)
  To: git

Within the ls-tree documentation for Git found here: 
https://git-scm.com/docs/git-ls-tree <https://git-scm.com/docs/git-ls-tree>

It mentions the following:

the behaviour is slightly different from that of "/bin/ls" in that the 
<path> denotes just a list of patterns to match, e.g. so specifying 
directory name (without |-r|) will behave differently, and order of the 
arguments does not matter.

I'm wondering if this statement is accurate.  Here is my reasoning:

When using this form of the command:

git ls-tree <tree-ish> [<path>]

the <tree-ish> (assuming this is typically a branch name) must be 
specified prior to the path, making the order of the arguments matter 
when issuing the command.

If my reasoning is correct I would be glad to submit the changes for 
review/merge.

Appreciate any insights.

Kevin Bowersox

^ permalink raw reply

* [RFC] Proposal: New options for git bisect run to control the interpretation of exit codes
From: Stephen Oberholtzer @ 2020-01-11  1:42 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

After considering the responses I got for my --invert-status proposal, I
went back to the drawing board and considered how it might interact with
another feature I was going to propose.  This is the result.

To avoid repeating myself, I'm going to start with an example of what I
imagine putting into Documentation/git-bisect.txt, to document this
feature.  After the second snip line, you can find my justification for
this feature, as well as some preemptive answers to questions I expect
people to have.

>8----------------------------------------------------------------------8<

{at top}

git bisect run [--(old|good|<term-old>)-status=<list>]
[--(new|bad|<term-new>)-status=<list>]
[--skip-status=<list>] [--] <cmd>...

{below the paragraph beginning with "The special exit code 125")

Custom exit status mappings
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The previous paragraphs describe how the exit status of <cmd> is mapped
to one of four actions:

 * `old`/`good`, which defaults to exit status 0.

 * `new`/`bad`, which defaults to exit status 1-127 except 125.

 * `skip`, which defaults to exit status 125.

 * Any status not mapped to one of the first three is treated as a fatal
   error, causing `git bisect run` to immediately terminate with a
   nonzero exit status.

For more precise control over bisect run's interpretation of the command's
exit status, you can use one or more `--<term>-status=<list>` options:

<term>::
  One of the following:
  * `skip` (`--skip-status`)
  * `old` or the corresponding term specified in the bisect start call
     (`--old-status` or `--good-status`)
  * `new` or the corresponding term specified in the bisect start call
     (`--new-status` or `--bad-status`)
<list>::
A comma-separated list of values (e.g. 1) or ranges (e.g. 1-10).

(It should be noted that specifying --old-status is unlikely to be useful
without a corresponding --new-status option.)

This feature can make a few things much easier:

 * If you want to bisect a "fix", you can use (as an example)

---------------
$ git bisect run --old-status=1-127 --new-status=0 my_script arguments
---------------

 * If the test involves some fundamentally-unreliable aspects such as I/O
  (for example, a network connection could be disrupted, or a file write
   could fail due to disk space issues), you can specify something like
   --new-status=99, and then any exit status other than 0 or 99 will be
   treated as a fatal error:

   -----------------
   $ git bisect run --new-status=99 sh -c 'setup-test && (run-test || exit 99)'
   -----------------

   If setup-test exits with a nonzero status code (except 99), then the
   run will abort with an error, rather than assume that the current commit
   contains the issue being bisected.


The default status code lists are as follows:
  --skip-status=125
  --old-status=0
  --new-status=1-127

The priority of each classification is as follows:

 * If the status code is on the --skip-status list, the action is "skip".
   Note that this takes precedence over the --old-status and --new-status
   lists; doing so simplifies the specification of --new-status.

 * If the status is on the --old-status list, *and* not on the --new-status
   list, the action is "old" (or "good").

 * If the status is on the --new-status list, *and* not on the --old-status
   list, the action is "new" (or "bad").

 * Otherwise, the command is treated as having experienced a fatal error,
   and run will terminate with a nonzero exit status.

In the last case, the bisection will be left where it was; you may correct
the error and perform another bisect run, or you may finish the bisection
with manual good/bad commands.

>8----------------------------------------------------------------------8<

The motivation
==============

First, an excerpt from the current documentation for 'git bisect':

> 126 and 127 are used by POSIX shells to signal specific error status
> (127 is for command not found, 126 is for command found but not executable
> - these details do not matter, as they are normal errors in the script,
> as far as bisect run is concerned

This shows a fundamental disconnect between bisect run's view of the
world and reality.  I submit that, in reality, status codes 126 and 127 are
overwhelmingly likely indicators that the script did not work correctly,
in which case the run should be halted so the user may correct the issue.
However, 126 and 127 are mapped to "git bisect bad" -- as in, "this commit
definitely contains the bug that I am searching for".


Let's consider the consequences of an inappropriate status code:

- 'good':  The bisect will incorrectly select the wrong commit (specifically,
  a later commit than the one that actually introduced the issue.)

  This will also indirectly result in more trials than would otherwise be
  necessary to determine the result, because the bisection will have to be
  restarted at the point where the mistake occurred.  (In the worst possible
  case, where a mistake occurs on the first step, the bisection will take
  at least twice as long.)

- 'bad': The bisect will incorrectly select the wrong commit (specifically,
  an earlier commit than the one that actually introduced the issue.)

  Like 'good', this will also indirectly result in extra trials.

- 'skip': The bisect will unnecessarily test at least one extra commit for
  each false 'skip' result.  In the worst case, it may not be able to narrow
  down the issue to a single commit.

- Abort: The bisection stops until the user restarts it.  No extra commits
  are tested, though if the user isn't paying attention, the wallclock
  time will take longer than usual.

In particular, the behavior of a false match on 'good' or 'bad' is *at best*
extra time needed to do the bisection.  At worst -- if the user is not
familiar with the code in question -- a great deal of confusion and time-
wasting can result as the user investigates the wrong commit.  As such, it
is critical that you have no false 'good' or 'bad' results.


If you're using a shell script to run your test, a false 'good' result can
easily be prevented by putting 'set -e' at the top of the script.

Avoiding a false 'bad' result is far more difficult, especially if the test
is complex and you're not familiar with shell scripting in general.  (The
man page for bash, on my machine, is 3725 lines long, and does not lend
itself well to searching.)  For the task that set me down this path, it
took me about six iterations to create a robust script that would return
the exit code that git wanted,  and _it still didn't work right in all
cases_.


What would have been a great deal simpler is if I could have just picked
an exit code that none of the other commands in my script would ever
return (such as 99), and told git to treat any code other than 0 or 99 as
a fatal error.  Which is essentially what I ended up doing (or trying to),
but unfamiliar with shell scripting as I was, I had several issues.

If I can make it easy for someone to use bisect run without them having
to learn any otherwise-unnecessary shell scripting constructs, I would
consider that a win.


Pre-emptive Q&A
===============

Q: Why allow ranges?  Why not restrict it to single status codes?
A: It allows for a simpler implementation, because the default 'bad' status
   list is a already a range (1-124,126-127).  By supporting ranges,
   we need only one single code path:

   exit_bad=1-124,126-127
   # override exit_bad if --bad-status is specified
   # check exit code against exit_bad

Q: Why treat any exit code that's on both the 'old' and 'new' lists as a
   fatal error?
A: Because it's obviously a mistake, and we don't know the correct way
   to interpret it.  By halting the run and returning an error, we can
   return control to the user, who can fix it by fixing the overlapping
   lists and re-running the command.

Q: Why not an error when a code is on both 'old' and 'skip', or 'new' and
   'skip'?
A: Two reasons:
   1. As I detailed earlier, a false 'skip' is less problematic than a
      false 'new'/'bad'.
   2. By prioritizing 'skip' over 'new', the default behavior for 'new'
      can be written as '1-127' instead of '1-124,126-127'.


-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE

^ permalink raw reply

* Re: [PATCH v3 15/15] rebase: change the default backend from "am" to "merge"
From: Elijah Newren @ 2020-01-11  1:16 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Johannes Schindelin, Phillip Wood, Denton Liu, Junio C Hamano,
	Pavel Roskin, Alban Gruin, SZEDER Gábor
In-Reply-To: <20200110231436.GA24315@google.com>

Hi Jonathan,

On Fri, Jan 10, 2020 at 3:14 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Hi,
>
> Elijah Newren via GitGitGadget wrote:
>
> > The am-backend drops information and thus limits what we can do:
> >
> >   * lack of full tree information from the original commits means we
> >     cannot do directory rename detection and warn users that they might
> >     want to move some of their new files that they placed in old
> >     directories to prevent their becoming orphaned.[1]
> >   * reduction in context from only having a few lines beyond those
> >     changed means that when context lines are non-unique we can apply
> >     patches incorrectly.[2]
> >   * lack of access to original commits means that conflict marker
> >     annotation has less information available.
> >
> > Also, the merge/interactive backend have far more abilities, appear to
> > currently have a slight performance advantage[3] and have room for more
> > optimizations than the am backend[4] (and work is underway to take
> > advantage of some of those possibilities).
> >
> > [1] https://lore.kernel.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/
> > [2] https://lore.kernel.org/git/CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+h+t4Mn4A@mail.gmail.com/
> > [3] https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/
> > [4] https://lore.kernel.org/git/CABPp-BGh7yW69QwxQb13K0HM38NKmQif3A6C6UULEKYnkEJ5vA@mail.gmail.com/
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  Documentation/git-rebase.txt           |  2 +-
> >  builtin/rebase.c                       |  4 ++--
> >  t/t5520-pull.sh                        | 10 ++++++----
> >  t/t9106-git-svn-commit-diff-clobber.sh |  3 ++-
> >  4 files changed, 11 insertions(+), 8 deletions(-)
>
> Thanks for writing this.  We finally rolled this out to our internal
> population at $DAYJOB and ran into a couple of issues:

Cool, thanks for testing it out.

>  1. "git rebase --am" does not invoke the post-commit hook, but "git
>     rebase --merge" does.  Is this behavior change intended?
>
>     Noticed because jiri[1] installs a post-commit hook that warns
>     about commits on detached HEAD, so this change makes rebases more
>     noisy in repositories that were set up using jiri.

I've never used a post-commit hook or seen one in the wild.  Certainly
wasn't intentional, but it's not clear to me if it's wrong or right
either.  I don't see why it would make sense to distinguish between
any of git rebase --am/--merge/--interactive, but it isn't too
surprising that by historical accident the two rebase backends which
happened to call git-commit behind the scenes would call a post-commit
hook and the other rebase backend that didn't call git-commit
wouldn't.

But the big question here, is what is correct behavior?  Should rebase
call the post-commit hook, or should it skip it?  I haven't any clue
what the answer to that is.

>  2. GIT_REFLOG_ACTION contains "rebase -i" even though the rebase is
>     not interactive.

Yep, as does --keep, --exec, --rebase-merges, etc.  There are lots of
rebases which use the interactive machinery even if they aren't
explicitly interactive.  I've never seen the "-i" in the reflog
message defined, but clearly it has always been used whenever the
interactive machinery was in play regardless of whether the rebase was
interactive.  In that regard, I figured that --merge fit in rather
nicely.  (And I noted the fact that reflog messages were different
between the backends among the "BEHAVIORAL DIFFERENCES" section of
git-rebase.txt).  But if others think we should just drop the -i (much
as we did for the bash prompt), I'd be happy with that too.  If we go
that route, I think I'd rather drop the -i in the reflog for all
rebases, not just the
using-the-interactive-machinery-but-not-explicitly-interactive ones.

>  3. In circumstances I haven't pinned down yet, we get the error
>     message "invalid date format: @@2592000 +0000":
>
>         $ git rebase --committer-date-is-author-date --onto branch_K branch_L~1 branch_L
>         $ git checkout --theirs file
>         $ git add file
>         $ git rebase --continue
>         fatal: invalid date format: @@2592000 +0000
>         error: could not commit staged changes.
>
>     This isn't reproducible without --committer-date-is-author-date.
>     More context (the test where it happens) is in [2].

Interesting.  Do you happen to know if this started happening with
ra/rebase-i-more-options, or did it just become an issue with
en/rebase-backend?  I looked around at the link you provided and feel
a bit confused; I'm not sure which test does this or how I'd
reproduce.

>  4. I suspect the exit status in the "you need to resolve conflicts"
>     case has changed.  With rebase --am, [3] would automatically
>     invoke rebase --abort when conflicts are present, but with rebase
>     --merge it does not.
>
> Known?

Nope, but I would certainly hope that "you need to resolve conflicts"
would result in a non-zero exit status.  If it doesn't, that sounds
like a bug in the interactive backend that we need to fix.  I'll dig
in.



Thanks for the reports!
Elijah

^ permalink raw reply

* Re: [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks
From: Jakub Narebski @ 2020-01-11  0:27 UTC (permalink / raw)
  To: Garima Singh via GitGitGadget
  Cc: git, Derrick Stolee, SZEDER Gábor, Jonathan Tan,
	Jeff Hostetler, Taylor Blau, Jeff King, Junio C Hamano,
	Garima Singh
In-Reply-To: <72a2bbf6765a1e99a3a23372f801099a07fe11a5.1576879520.git.gitgitgadget@gmail.com>

"Garima Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Garima Singh <garima.singh@microsoft.com>
>
> If bloom filters have been written to the commit-graph file, revision walk will
> use them to speed up revision walks for a particular path.

I'd propose the following change, to make structure of the above
sentence simpler:

  Revision walk will now use Bloom filters for commits to speed up
  revision walks for a particular path (for computing history of a
  path), if they are present in the commit-graph file.

> Note: The current implementation does this in the case of single pathspec
> case only.
>
> We load the bloom filters during the prepare_revision_walk step when dealing

I think it would flow better with the following change:

s/when dealing/, but only when dealing/

> with a single pathspec. While comparing trees in rev_compare_trees(), if the
> bloom filter says that the file is not different between the two trees, we
> don't need to compute the expensive diff. This is where we get our performance
> gains.

Maybe we should also add:

  The other answer we can get from the Bloom filter is "maybe".

>
> Performance Gains:
> We tested the performance of `git log --path` on the git repo, the linux and
> some internal large repos, with a variety of paths of varying depths.

I think you meant `git log <path>`, not `git log --path`.

>
> On the git and linux repos:
> we observed a 2x to 5x speed up.

It would be good, I think, to have some specific numbers: starting from
this version, for this path, with and without Bloom filters it takes so
long (and the improvement in %).

While at it, it might be good idea to provide _costs_: how much
additional space on disk Bloom filters take for those specific examples,
and how much extra time (and memory) it takes to compute Bloom filters.

With actual specific numbers we can estimate when it would start to be
worth it to create Bloom filter data...

>
> On a large internal repo with files seated 6-10 levels deep in the tree:
> we observed 10x to 20x speed ups, with some paths going up to 28 times
> faster.

It would be nice to see specific numbers, if showing pathnames is
possible.  In any case it would be good to have more information: what
paths give 10x, what give 20x, and what kinds give 28x speedup (what
is path depth, how many objects, etc.).

>
> RFC Notes:
> I plan to collect the folloowing statistics around this usage of bloom filters
> and trace them out using trace2.
> - number of bloom filter queries,
> - number of "No" responses (file hasn't changed)
> - number of "Maybe" responses (file may have changed)
> - number of "Commit not parsed" cases (commit had too many changes to have a
>   bloom filter written out, currently our limit is 512 diffs)

Perhaps also:
  - histogram of bloom filter sizes in 64-bit blocks
    (which is rough histogram of number of changes per commit)

Though I think all those statistics are a bit specific to the
repository, and how you use Git.

>
> Helped-by: Derrick Stolee <dstolee@microsoft.com
> Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Garima Singh <garima.singh@microsoft.com>
> ---
>  bloom.c              | 20 ++++++++++++
>  bloom.h              |  4 +++
>  revision.c           | 67 +++++++++++++++++++++++++++++++++++++--
>  revision.h           |  5 +++
>  t/t4216-log-bloom.sh | 74 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 168 insertions(+), 2 deletions(-)
>  create mode 100755 t/t4216-log-bloom.sh
>
> diff --git a/bloom.c b/bloom.c
> index 86b1005802..0c7505d3d6 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -235,3 +235,23 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
>  
>  	return filter;
>  }
> +
> +int bloom_filter_contains(struct bloom_filter *filter,
> +			  struct bloom_key *key,
> +			  struct bloom_filter_settings *settings)
> +{
> +	int i;
> +	uint64_t mod = filter->len * BITS_PER_BLOCK;
> +
> +	if (!mod)
> +		return 1;

Ah, so filter->len equal to zero denotes too many changes for a Bloom
filter, and this conditional is a short-circuit test: always return
"maybe".

I wonder if we should explicitly check for filter->len = 1 and
filter->data[0] = 0, which should be empty Bloom filter -- for commit
with no changes with respect to first parent, and short-circuit
returning 0 (no file would ever belong).

> +
> +	for (i = 0; i < settings->num_hashes; i++) {
> +		uint64_t hash_mod = key->hashes[i] % mod;
> +		uint64_t block_pos = hash_mod / BITS_PER_BLOCK;

I have seen this code before... ;-)  The add_key_to_filter() function
includes almost identical code, but I am not sure if it is feasible to
eliminate this (slight) code duplication.  Probably not.

> +		if (!(filter->data[block_pos] & get_bitmask(hash_mod)))
> +			return 0;

All right, if at least one hash function does not match, return "no"...

> +	}
> +
> +	return 1;

... else return "maybe".

> +}

I am wondering however if this code should not be moved earlier in the
series, so that commit 2/9 in series actually adds fully functional
[semi-generic] Bloom filter implementation.

> diff --git a/bloom.h b/bloom.h
> index 101d689bbd..9bdacd0a8e 100644
> --- a/bloom.h
> +++ b/bloom.h
> @@ -44,4 +44,8 @@ void fill_bloom_key(const char *data,
>  		    struct bloom_key *key,
>  		    struct bloom_filter_settings *settings);
>  
> +int bloom_filter_contains(struct bloom_filter *filter,
> +			  struct bloom_key *key,
> +			  struct bloom_filter_settings *settings);
> +
>  #endif
> diff --git a/revision.c b/revision.c
> index 39a25e7a5d..01f5330740 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -29,6 +29,7 @@
>  #include "prio-queue.h"
>  #include "hashmap.h"
>  #include "utf8.h"
> +#include "bloom.h"
>  
>  volatile show_early_output_fn_t show_early_output;
>  
> @@ -624,11 +625,34 @@ static void file_change(struct diff_options *options,
>  	options->flags.has_changes = 1;
>  }
>  
> +static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
> +						 struct commit *commit,
> +						 struct bloom_key *key,
> +						 struct bloom_filter_settings *settings)

All right, this function name is certainly descriptive.  I wonder if it
wouldn't be better to use a shorter name, like maybe_different(), or
maybe_not_treesame(), so that the conditional using this function would
read naturally:

  if (!maybe_different(revs, commit, revs->bloom_key,
  		       revs->bloom_filter_settings))
  	return REV_TREE_SAME;

But this might be just a matter of taste.

> +{
> +	struct bloom_filter *filter;
> +
> +	if (!revs->repo->objects->commit_graph)
> +		return -1;
> +	if (commit->generation == GENERATION_NUMBER_INFINITY)
> +		return -1;

O.K., so we check that there is loaded commit graph, and that given
commit is in a commit graph, otherwise we return "no data".

I agree with distinguishing between "no data" value (which is <0, a
convention for denoting errors), and "maybe" value.  Currently this
distinction is not utilized, but it would help in the case of more than
one path given -- in "no data" case there is no need to check other
paths against non-existing Bloom filter.

> +	if (!key || !settings)
> +		return -1;

Why the check for non-NULL 'key' and 'settings' is the last check?

BTW. when it is possible for key or settings to be NULL?  Or is it just
defensive programming?

> +
> +	filter = get_bloom_filter(revs->repo, commit, 0);

O.K., we won't be recomputing Bloom filter if it is not present either
on slab (as "inside-out" auxiliary data for a commit), or in the
commit-graph (in Bloom filter chunks).

> +
> +	if (!filter || !filter->len)
> +		return 1;

Sidenote: bloom_filter_contains() would also indirectly check for
filter->len being zero.  Though this doesn't cost much.

Shouldn't !filter case return value of -1 i.e. "no data", rather than
return value of 1 i.e. "maybe"?

> +
> +	return bloom_filter_contains(filter, key, settings);
> +}

All right.

> +
>  static int rev_compare_tree(struct rev_info *revs,
> -			    struct commit *parent, struct commit *commit)
> +			    struct commit *parent, struct commit *commit, int nth_parent)
>  {
>  	struct tree *t1 = get_commit_tree(parent);
>  	struct tree *t2 = get_commit_tree(commit);
> +	int bloom_ret = 1;
>  
>  	if (!t1)
>  		return REV_TREE_NEW;
> @@ -653,6 +677,16 @@ static int rev_compare_tree(struct rev_info *revs,
>  			return REV_TREE_SAME;
>  	}
>  
> +	if (revs->pruning.pathspec.nr == 1 && !nth_parent) {

All right, Bloom filter stores information about changed paths with
respect to first-parent changes only, so if we are asking about is not a
first parent (where nth_parent == 0), we cannot use Bloom filter.

Currently we limit the check to single pathspec only; that is a good
start and a good simplification.

> +		bloom_ret = check_maybe_different_in_bloom_filter(revs,
> +								  commit,
> +								  revs->bloom_key,
> +								  revs->bloom_filter_settings);
> +
> +		if (bloom_ret == 0)
> +			return REV_TREE_SAME;

Pretty straightforward.

> +	}
> +
>  	tree_difference = REV_TREE_SAME;
>  	revs->pruning.flags.has_changes = 0;
>  	if (diff_tree_oid(&t1->object.oid, &t2->object.oid, "",
> @@ -855,7 +889,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>  			die("cannot simplify commit %s (because of %s)",
>  			    oid_to_hex(&commit->object.oid),
>  			    oid_to_hex(&p->object.oid));
> -		switch (rev_compare_tree(revs, p, commit)) {
> +		switch (rev_compare_tree(revs, p, commit, nth_parent)) {

All right, we need to pass information about the index of the parent;
and we have just done that.  Good.

>  		case REV_TREE_SAME:
>  			if (!revs->simplify_history || !relevant_commit(p)) {
>  				/* Even if a merge with an uninteresting
> @@ -3342,6 +3376,33 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
>  	}
>  }
>  
> +static void prepare_to_use_bloom_filter(struct rev_info *revs)

All right, I see that pointers to bloom_key and bloom_filter_settings
were added to the rev_info struct.  I understand why the former is here,
but the latter seems to be there just as a shortcut (to not owned data),
which is fine but a bit strange.

Or is the latter here to allow for Bloom filter settings to possibly
change from commit-graph file in the chain to commit-graph file, and
thus from commit to commit?

> +{
> +	struct pathspec_item *pi;

Maybe 'pathspec' (or the like) instead of short and cryptic 'pi' would
be better a better name... unless 'pi' is used in other places already.

> +	const char *path;
> +	size_t len;
> +
> +	if (!revs->commits)
> +	    return;

When revs->commits may be NULL?  I understand that we need to have this
check because we use revs->commits->item next (sidenote: can revs ever
be NULL?).

Would `git log --all -- <path>` use Bloom filters (as it theoretically
could)?

> +
> +	parse_commit(revs->commits->item);

Why parsing first commit on the list of starting commits is needed here?
Please help me understand this line.

And shouldn't we use repo_parse_commit() here?

> +
> +	if (!revs->repo->objects->commit_graph)
> +		return;
> +
> +	revs->bloom_filter_settings = revs->repo->objects->commit_graph->settings;
> +	if (!revs->bloom_filter_settings)
> +		return;

All right, so if there is no commit graph, or the commit graph does not
include Bloom filter data, there is nothing to do.

Though I worry that it would make Git do not use Bloom filter if the top
commit-graph in the chain does not include Bloom filter data, while
other commit-graph files do (and Git could have used that information to
speed up the file history query).

> +
> +	pi = &revs->pruning.pathspec.items[0];
> +	path = pi->match;
> +	len = strlen(path);


Why not the following, if we do not do any checks for 'pi' value:

  +	path = &revs->pruning.pathspec.items[0]->match;

A question: is the path in the `match` field in `struct pathspec`
normalized with respect to trailing slash (for directories)?  Bloom
filter stores pathnames for directories without trailing slash.

What I mean is if, for example, both of those would use Bloom filter
data:

  $ git log -- Documentation/
  $ git log -- Documentation

> +
> +	load_bloom_filters();
> +	revs->bloom_key = xmalloc(sizeof(struct bloom_key));
> +	fill_bloom_key(path, len, revs->bloom_key, revs->bloom_filter_settings);

All right, looks good.

Though... do we leak revs->bloom_key, as should we worry about it?

> +}
> +
>  int prepare_revision_walk(struct rev_info *revs)
>  {
>  	int i;
> @@ -3391,6 +3452,8 @@ int prepare_revision_walk(struct rev_info *revs)
>  		simplify_merges(revs);
>  	if (revs->children.name)
>  		set_children(revs);
> +	if (revs->pruning.pathspec.nr == 1)
> +	    prepare_to_use_bloom_filter(revs);

Looks good.

Minor nitpick: 4 spaces instead of tab are used for indentation (or, to
be more exact a tab followed by 4 space, instead of two tabs):

  +	if (revs->pruning.pathspec.nr == 1)
  +		prepare_to_use_bloom_filter(revs);

>  	return 0;
>  }
>  
> diff --git a/revision.h b/revision.h
> index a1a804bd3d..65dc11e8f1 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -56,6 +56,8 @@ struct repository;
>  struct rev_info;
>  struct string_list;
>  struct saved_parents;
> +struct bloom_key;
> +struct bloom_filter_settings;
>  define_shared_commit_slab(revision_sources, char *);
>  
>  struct rev_cmdline_info {
> @@ -291,6 +293,9 @@ struct rev_info {
>  	struct revision_sources *sources;
>  
>  	struct topo_walk_info *topo_walk_info;
> +
> +	struct bloom_key *bloom_key;
> +	struct bloom_filter_settings *bloom_filter_settings;

It might be a good idea to add one-line comment above those newly
introduced fields.  The `struct rev_info` has many subsections of fields
described by such comments (or even block comments), like e.g.

  /* Starting list */
  /* Parents of shown commits */
  /* The end-points specified by the end user */
  /* excluding from --branches, --refs, etc. expansion */
  /* Traversal flags */
  /* diff info for patches and for paths limiting */

  /*
   * Whether the arguments parsed by setup_revisions() included any
   * "input" revisions that might still have yielded an empty pending
   * list (e.g., patterns like "--all" or "--glob").
   */

>  };
>  
>  int ref_excluded(struct string_list *, const char *path);
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> new file mode 100755
> index 0000000000..d42f077998
> --- /dev/null
> +++ b/t/t4216-log-bloom.sh
> @@ -0,0 +1,74 @@
> +#!/bin/sh
> +
> +test_description='git log for a path with bloom filters'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup repo' '
> +	git init &&
> +	git config core.commitGraph true &&
> +	git config gc.writeCommitGraph false &&
> +	infodir=".git/objects/info" &&
> +	graphdir="$infodir/commit-graphs" &&
> +	test_oid_init

Why do you use `test_oid_init`, when you are *not* using `test_oid`?
I guess it is because t5318-commit-graph.sh uses it, isn't it?

The `graphdir` shell variable is not used either.

> +'
> +
> +test_expect_success 'create 9 commits and repack' '
> +	test_commit c1 file1 &&
> +	test_commit c2 file2 &&
> +	test_commit c3 file3 &&
> +	test_commit c4 file1 &&
> +	test_commit c5 file2 &&
> +	test_commit c6 file3 &&
> +	test_commit c7 file1 &&
> +	test_commit c8 file2 &&
> +	test_commit c9 file3
> +'

Wouldn't it be better for this step to be a part of 'setup repo' step?
Anyway, the test name says '... and repack', but `git repack` is missing
(it should be done after last test_commit).

I think it would be good idea to test behavior of Bloom filters with
respect to directories, so at least one file should be in a subdirectory
(maybe even deeper in hierarchy).

We should also test the behavior with respect to merges, and when we are
not following the first parent.  But that might be a separate part of
this test.

> +
> +printf "c7\nc4\nc1" > expect_file1

Doing things outside test is discouraged.  We can create a separate test
that creates those expect_file* files, or it can be a part of 'create
commits' test.

Anyway, instead of doing test without Bloom filters (something that
should have been tested already by other parts of testsuite), and then
doing the same test with Bloom filter, why not compare that the result
without and with Bloom filter is the same.  The t5318-commit-graph.sh
test does this with help of graph_git_two_modes() function:

  graph_git_two_modes () {
  	git -c core.commitGraph=true  $1 >output
  	git -c core.commitGraph=false $1 >expect
  	test_cmp expect output
  }

Sidenote: I wonder if it is high time to create t/lib-commit-graph.sh
helper with, among others, this common function.

> +
> +test_expect_success 'log without bloom filters' '
> +	git log --pretty="format:%s"  -- file1 > actual &&

CodingGuidelines:57: Redirection operators should be written with space
before, but no space after them.  (Minor nitpick)

  +	git log --pretty="format:%s"  -- file1 >actual &&

> +	test_cmp expect_file1 actual
> +'
> +
> +printf "c8\nc7\nc5\nc4\nc2\nc1" > expect_file1_file2

  +printf "c8\nc7\nc5\nc4\nc2\nc1" >expect_file1_file2

> +
> +test_expect_success 'multi-path log without bloom filters' '
> +	git log --pretty="format:%s"  -- file1 file2 > actual &&

  +	git log --pretty="format:%s"  -- file1 file2 >actual &&

> +	test_cmp expect_file1_file2 actual
> +'
> +
> +graph_read_expect() {

CodingGuidelines:144: We prefer a space between the function name and
the parentheses, and no space inside the parentheses.  (Minor nitpick)

  +graph_read_expect () {

> +	OPTIONAL=""
> +	NUM_CHUNKS=5
> +	if test ! -z $2
> +	then
> +		OPTIONAL=" $2"
> +		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))

This should be either

  +		NUM_CHUNKS=$((5 + $(echo "$2" | wc -w)))

or more future proof

  +		NUM_CHUNKS=$(($NUM_CHUNKS + $(echo "$2" | wc -w)))

We got away with this bug because we were not using octopus merges, and
there were no optional core chunks, that is we never call
graph_read_expect with second parameter in this test.

> +	fi
> +	cat >expect <<- EOF

Why there is space between "<<-" and "EOF"?

> +	header: 43475048 1 1 $NUM_CHUNKS 0
> +	num_commits: $1
> +	chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data$OPTIONAL
> +	EOF
> +	test-tool read-graph >output &&
> +	test_cmp expect output
> +}
> +
> +test_expect_success 'write commit graph with bloom filters' '
> +	git commit-graph write --reachable --changed-paths &&
> +	test_path_is_file $infodir/commit-graph &&
> +	graph_read_expect "9"
> +'

All right, this is preparatory step for further tests.

> +
> +test_expect_success 'log using bloom filters' '
> +	git log --pretty="format:%s" -- file1 > actual &&
> +	test_cmp expect_file1 actual
> +'
> +
> +test_expect_success 'multi-path log using bloom filters' '
> +	git log --pretty="format:%s"  -- file1 file2 > actual &&
> +	test_cmp expect_file1_file2 actual
> +'

With graph_git_two_modes() it would be much simpler:

  +test_expect_success 'single-path log' '
  +	git_graph_two_modes "git log --pretty=format:%s -- file1"
  +'

  +test_expect_success 'multi-path log' '
  +	git_graph_two_modes "git log --pretty=format:%s -- file1 file2"
  +'

What is missing is:
1. checking that Git is actually _using_ Bloom filters
   (which might be difficult to do)
2. testing that Bloom filters work also for history of subdirectories
   e.g. "git log -- subdir/" and "git log -- subdir"; this would of
   course require adjusting setup step
3. testing specific behaviors, like "git log --all -- file1"
4. merges with history following second parent
5. commits with no changes and/or merges with no first-parent changes
6. commit with more than 512 changed files (marked as slow test,
   and perhaps created with fast-import interface, like bulk commit
   creation in test_commit_bulk)

> +
> +test_done

Best regards,
-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH v3 15/15] rebase: change the default backend from "am" to "merge"
From: Jonathan Nieder @ 2020-01-10 23:14 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Johannes.Schindelin, phillip.wood, liu.denton, gitster,
	plroskin, alban.gruin, szeder.dev, Elijah Newren
In-Reply-To: <044853fd612ee8cf6928bb7c4ebb1eacaa172aa3.1577217299.git.gitgitgadget@gmail.com>

Hi,

Elijah Newren via GitGitGadget wrote:

> The am-backend drops information and thus limits what we can do:
>
>   * lack of full tree information from the original commits means we
>     cannot do directory rename detection and warn users that they might
>     want to move some of their new files that they placed in old
>     directories to prevent their becoming orphaned.[1]
>   * reduction in context from only having a few lines beyond those
>     changed means that when context lines are non-unique we can apply
>     patches incorrectly.[2]
>   * lack of access to original commits means that conflict marker
>     annotation has less information available.
>
> Also, the merge/interactive backend have far more abilities, appear to
> currently have a slight performance advantage[3] and have room for more
> optimizations than the am backend[4] (and work is underway to take
> advantage of some of those possibilities).
>
> [1] https://lore.kernel.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/
> [2] https://lore.kernel.org/git/CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+h+t4Mn4A@mail.gmail.com/
> [3] https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/
> [4] https://lore.kernel.org/git/CABPp-BGh7yW69QwxQb13K0HM38NKmQif3A6C6UULEKYnkEJ5vA@mail.gmail.com/
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/git-rebase.txt           |  2 +-
>  builtin/rebase.c                       |  4 ++--
>  t/t5520-pull.sh                        | 10 ++++++----
>  t/t9106-git-svn-commit-diff-clobber.sh |  3 ++-
>  4 files changed, 11 insertions(+), 8 deletions(-)

Thanks for writing this.  We finally rolled this out to our internal
population at $DAYJOB and ran into a couple of issues:

 1. "git rebase --am" does not invoke the post-commit hook, but "git
    rebase --merge" does.  Is this behavior change intended?

    Noticed because jiri[1] installs a post-commit hook that warns
    about commits on detached HEAD, so this change makes rebases more
    noisy in repositories that were set up using jiri.

 2. GIT_REFLOG_ACTION contains "rebase -i" even though the rebase is
    not interactive.

 3. In circumstances I haven't pinned down yet, we get the error
    message "invalid date format: @@2592000 +0000":

	$ git rebase --committer-date-is-author-date --onto branch_K branch_L~1 branch_L
	$ git checkout --theirs file
	$ git add file
	$ git rebase --continue
	fatal: invalid date format: @@2592000 +0000
	error: could not commit staged changes.

    This isn't reproducible without --committer-date-is-author-date.
    More context (the test where it happens) is in [2].

 4. I suspect the exit status in the "you need to resolve conflicts"
    case has changed.  With rebase --am, [3] would automatically
    invoke rebase --abort when conflicts are present, but with rebase
    --merge it does not.

Known?

Thanks,
Jonathan

[1] https://fuchsia.googlesource.com/jiri/+/60436c301224231cb99be41ce937dfc223bee272/project/manifest.go#1347
[2] https://source.chromium.org/chromium/chromium/tools/depot_tools/+/master:tests/git_common_test.py;l=721;drc=6b52dc21e166c46707b4c8eb26c74c70d4f9977e;bpv=1;bpt=0
[3] https://fuchsia.googlesource.com/jiri/+/60436c301224231cb99be41ce937dfc223bee272/project/project.go#1664

^ permalink raw reply

* Re: [RFC PATCH] unpack-trees: watch for out-of-range index position
From: Emily Shaffer @ 2020-01-10 23:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20200110063741.GA409153@coredump.intra.peff.net>

On Fri, Jan 10, 2020 at 01:37:41AM -0500, Jeff King wrote:
> On Thu, Jan 09, 2020 at 02:46:41PM -0800, Emily Shaffer wrote:
> 
> > > Perhaps. The integrity check only protects against an index that was
> > > modified after the fact, not one that was generated by a buggy Git. I'm
> > > not sure we know how the index that led to this patch got into this
> > > state (though it sounds like Emily has a copy and could check the hash
> > > on it), but other cache-tree segfault I found recently was with an index
> > > with an intact integrity hash.
> > 
> > Yeah, I can do that, although I'm not sure how. The index itself is very
> > small - it only contains one file and one tree extension - so I'll go
> > ahead and paste some poking and prodding, and if it's not what you
> > wanted then please let me know what else to run.
> 
> I was thinking you would run something like:
> 
>   size=$(stat --format=%s "$file")
>   actual=$(head -c $(($size-20)) "$file" | sha1sum | awk '{print $1}')
>   expect=$(xxd -s -20 -g 20 -c 20 "$file" | awk '{print $2}')
>   if test "$actual" = "$expect"; then
>           echo "OK ($actual)"
>   else
>           echo "FAIL ($actual != $expect)"
>   fi
> 
> to manually check the sha1.

Unsurprising given your mail, yeah, this looks OK when I run it against
the repo in question.

> So this bogus index was probably actually created by Git, not an
> after-the-fact byte corruption.

Disappointingly, the repro repo we got was aggressively redacted - I
don't have any reflogs to look through and try and get a hint of what
happened, and I imagine the reporter has moved on with their life enough
that we can't get something useful from there now.

 - Emily

^ permalink raw reply

* Re: [PATCH] fetch: emphasize failure during submodule fetch
From: Emily Shaffer @ 2020-01-10 23:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqtv53kquh.fsf@gitster-ct.c.googlers.com>

On Fri, Jan 10, 2020 at 12:18:30PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > In cases when a submodule fetch fails when there are many submodules, the error
> > from the lone failing submodule fetch is buried under activity on the other
> > submodules if more than one fetch fell back on fetch-by-oid. Call out a failure
> > late so the user is aware that something went wrong.
> >
> > Example without this change:
> 
> >   $ git pull --rebase
> >   remote: Counting objects: 1591, done
> >   remote: Finding sources: 100% (4317/4317)
> >   remote: Total 4317 (delta 1923), reused 4252 (delta 1923)
> >   Receiving objects: 100% (4317/4317), 2.09 MiB | 8.15 MiB/s, done.
> >   Resolving deltas: 100% (1923/1923), completed with 101 local objects.
> >   From https://android.googlesource.com/platform/superproject
> >   [snip ~100 lines]
> >   From https://android.googlesource.com/platform/prebuilts/fullsdk/platforms/android-29
> >    * branch            a97149980b7d8acf48392af591b35689f7205d9e -> FETCH_HEAD
> >   From https://android.googlesource.com/platform/prebuilts/fullsdk-darwin/platform-tools
> >    * branch            98f9454af8ca210818eff4f502097c471d7327b5 -> FETCH_HEAD
> >   From https://android.googlesource.com/platform/prebuilts/checkstyle
> >    * branch            6fb3e23f05ed186908ea9f48d6692220891363b0 -> FETCH_HEAD
> >    * branch            f21d92f6339f0993a946b25fa2172c2ceb5e332b -> FETCH_HEAD
> >   From https://android.googlesource.com/platform/prebuilts/androidx/studio
> >    * branch            bed5e7b5866b8698bbcd1879134b03ac312a2ba8 -> FETCH_HEAD
> >   From https://android.googlesource.com/platform/prebuilts/androidx/internal
> >    * branch                179375220f834de5dfbee169f4c2f948d850a203 -> FETCH_HEAD
> >    * branch                1dcf3ceef9a86001c693fa34b3513f0c4af26178 -> FETCH_HEAD
> >    * branch                2ea3ccef4c98f5de1b74affd1dda33f5b2834a45 -> FETCH_HEAD
> >    * branch                a09de09c3814c3d31cc770d5351b92d29ea624ae -> FETCH_HEAD
> >    * branch                d2ae6add8b2c0e28899e4faeb2d6889ceefb0b62 -> FETCH_HEAD
> >    * branch                e244e2a5f7d98f47f75d06ef57ef1c6c5701a38d -> FETCH_HEAD
> >   Auto packing the repository in background for optimum performance.
> >   See "git help gc" for manual housekeeping.
> >   From https://android.googlesource.com/platform/prebuilts/androidx/external
> >    * branch              c3df2fa7f3e63b8714ac8d24f86a26cc50ee4af5 -> FETCH_HEAD
> >   fatal: remote error: want c5bd7796550b3742772c8fb8c73a1311013b5159 not valid
> >   From https://android.googlesource.com/platform/external/noto-fonts
> >    * branch            02969d3046f6944a5a211d2331d1c82736487f01 -> FETCH_HEAD
> >    * branch            9ee45fcd0b8bb8621c1cdbc6de5fe7502eff7393 -> FETCH_HEAD
> >   From https://android.googlesource.com/platform/external/dokka
> >    * branch            03a8ed966a7b847931a0ee20327f989837aaff13 -> FETCH_HEAD
> >    * branch            cb1684602b5b4e18385d890c972764c55d177704 -> FETCH_HEAD
> >    * branch            fd4521e89ab0e01447dda9b42be2b9bbc000f02f -> FETCH_HEAD
> >   From https://android.googlesource.com/platform/external/doclava
> >    * branch            04ddf3962f0cd40c81a2e144f27f497223782457 -> FETCH_HEAD
> >    * branch            44bf22680e939b21a21a365f6038d5883d5163c8 -> FETCH_HEAD
> >    * branch            66f673f4a3865f3b4ab645655a6484101dbd051f -> FETCH_HEAD
> >
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > As hinted by the snippet in the commit-message (should I remove it? I
> > think it's a poignant example, I couldn't see the fatal without grepping
> > even after being told it was there) this manifested to an end user via
> > 'git pull'.
> 
> It indeed is too noisy, especially without showing what happens with
> this patch.

Sure, it makes sense. I'll take it out in next round.

> 
> Is it clear to the users that a block of lines starting "From $URL"
> and ending before the next "From $AnotherURL" is about the same
> repository, including error messages?

Well, for me - and the bug reporter - the "fatal" line visually blends
in with the "From" next to it. I think once you see the "fatal" line
it's clear where it's coming from, sure.

I wonder if the line order still holds with -j specified, though.

> 
> > diff --git a/builtin/fetch.c b/builtin/fetch.c
> > index b4c6d921d0..0c19781cb9 100644
> > --- a/builtin/fetch.c
> > +++ b/builtin/fetch.c
> > @@ -1857,6 +1857,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
> >  						    verbosity < 0,
> >  						    max_children);
> >  		argv_array_clear(&options);
> > +		if (result)
> > +			fprintf(stderr, _("Failure during submodule fetch.\n"));
> 
> How does a user find out which submodule had trouble with after
> seeing this message?  Or is it something you still need to find by
> scrolling back?

The handiest way is probably the latter; maybe there is some way to
achieve the former, but my submodule fu isn't strong enough for me to
answer from the top of my head.

> 
> If the latter, I am not sure if there is much point to add a
> half-way solution like this.  It is a different story if "fetch"
> exits with success status when this happens, but I do not think the
> "result" that is non-zero is being lost before the function returns,
> so...

I agree, although I do find it irritating that there's no final
success/failure log line from 'git fetch'. I personally don't run 'echo
$?' after every step in my Git workflow.

It's less trivial (a low bar) to try and point out the submodule(s) which
had an issue by this point, but I can give it a shot if you are open to
the change.

> 
> >  	}
> >  
> >  	string_list_clear(&list, 0);

^ permalink raw reply

* Re: [PATCH v2 00/16] t: replace incorrect test_must_fail usage (part 2)
From: Eric Sunshine @ 2020-01-10 21:45 UTC (permalink / raw)
  To: Denton Liu
  Cc: Git Mailing List, Jakub Narebski, Johannes Sixt, Junio C Hamano
In-Reply-To: <cover.1578372682.git.liu.denton@gmail.com>

On Mon, Jan 6, 2020 at 11:53 PM Denton Liu <liu.denton@gmail.com> wrote:
> Changes since v1:
> * Clean up as suggested by Eric
> * Replace 12/16 with J6t's patch

In the future, please also provide a link in the mailing list archive
to the previous round (v1 [1], in this case) to help reviewers
understand what these bullet points mean since the points themselves
are quite lacking in detail. Pointing people at the previous round
helps not only those who reviewed that round -- who may have already
forgotten the details of their own and others' review comments -- but
will also help onboard people who start reviewing at this round.

> Range-diff against v1:
>  -:  ---------- >  3:  501eb147c3 t2018: improve style of if-statement
> 16:  31aa0c7d15 <  -:  ---------- t4124: let sed open its own files

Dropping patches when re-rolling is fine, but please do not sneak in
new patches unrelated to any of the original changes in this series.
Not only is it likely that such a patch will get overlooked by
reviewers, but even when it is noticed, reviewers have to spend extra
time and effort trying to understand why the patch was added in the
first place (especially since there is not even a mention of it in the
cover letter). After digging into it, I can see that this newly-added
patch (3/16) is an additional sensible style cleanup to the same test
script that many of the other patches are touching, so in some sense
one could argue that it's very vaguely related to v1, but please keep
in mind the potentially negative effect it can have on reviewers[2]
when new changes are added to a series, causing the series to diverge
rather than converge.

[1]: https://lore.kernel.org/git/cover.1577454401.git.liu.denton@gmail.com/
[2]: https://lore.kernel.org/git/CAPig+cQqK-HiDjmnBFo-qeE6cZ73EveWg6Ygb-4BX3X_iPSJZA@mail.gmail.com/

^ permalink raw reply

* Re: [PATCH v3 2/2] rebase-interactive: warn if commit is dropped with `rebase --edit-todo'
From: Alban Gruin @ 2020-01-10 21:31 UTC (permalink / raw)
  To: phillip.wood, Junio C Hamano; +Cc: git, Johannes Schindelin
In-Reply-To: <79ad2e01-2fc3-cd52-7879-5a81370628ef@gmail.com>

Hi Phillip,

Le 10/01/2020 à 18:13, Phillip Wood a écrit :
> Hi Alban
> 
> On 09/01/2020 21:13, Alban Gruin wrote:
>> Hi Phillip,
>>
>> Le 09/12/2019 à 17:00, Phillip Wood a écrit :
>>>>> diff --git a/rebase-interactive.c b/rebase-interactive.c
>>>>> index ad5dd49c31..80b6a2e7a6 100644
>>>>> --- a/rebase-interactive.c
>>>>> +++ b/rebase-interactive.c
>>>>> @@ -97,7 +97,8 @@ int edit_todo_list(struct repository *r, struct
>>>>> todo_list *todo_list,
>>>>>               struct todo_list *new_todo, const char *shortrevisions,
>>>>>               const char *shortonto, unsigned flags)
>>>>>    {
>>>>> -    const char *todo_file = rebase_path_todo();
>>>>> +    const char *todo_file = rebase_path_todo(),
>>>>> +        *todo_backup = rebase_path_todo_backup();
>>>>>        /* If the user is editing the todo list, we first try to parse
>>>>> @@ -110,9 +111,9 @@ int edit_todo_list(struct repository *r, struct
>>>>> todo_list *todo_list,
>>>>>                        -1, flags | TODO_LIST_SHORTEN_IDS |
>>>>> TODO_LIST_APPEND_TODO_HELP))
>>>>>            return error_errno(_("could not write '%s'"), todo_file);
>>>>>    -    if (initial && copy_file(rebase_path_todo_backup(), todo_file,
>>>>> 0666))
>>>>> -        return error(_("could not copy '%s' to '%s'."), todo_file,
>>>>> -                 rebase_path_todo_backup());
>>>>> +    unlink(todo_backup);
>>>>> +    if (copy_file(todo_backup, todo_file, 0666))
>>>>> +        return error(_("could not copy '%s' to '%s'."), todo_file,
>>>>> todo_backup);
>>>>
>>>> We used to copy ONLY when initial is set and we left old todo_backup
>>>> intact when !initial.  That is no longer true after this change, but
>>>> it is intended---we create an exact copy of what we would hand out
>>>> to the end-user, so that we can compare it with the edited result
>>>> to figure out what got changed.
>>>
>>> I think it would be better to only create a new copy if the last edit
>>> was successful. As it stands if I edit the todo list and accidentally
>>> delete some lines and then edit the todo list again to try and fix it
>>> the second edit will succeed whether or not I reinserted the deleted
>>> lines.
>>>
>>> We could add this to the tests to check that a subsequent edit that does
>>> not fix the problem fails
>>>
>>> diff --git a/t/t3404-rebase-interactive.sh
>>> b/t/t3404-rebase-interactive.sh
>>> index 969e12d281..8544d8ab2c 100755
>>> --- a/t/t3404-rebase-interactive.sh
>>> +++ b/t/t3404-rebase-interactive.sh
>>>
>>> @@ -1416,6 +1416,7 @@ test_expect_success 'rebase --edit-todo respects
>>> rebase.missingCommitsCheck = er
>>>                  test_i18ncmp expect actual &&
>>>                  test_must_fail git rebase --continue 2>actual &&
>>>                  test_i18ncmp expect actual &&
>>> +               test_must_fail git rebase --edit-todo &&
>>>                  cp orig .git/rebase-merge/git-rebase-todo &&
>>>                  test_must_fail env FAKE_LINES="1 2 3 4" \
>>>                          git rebase --edit-todo 2>actual &&
>>>
>>>
>>
>> In which case, if the check did not pass at the previous edit, the new
>> todo list should be compared to the backup.  As sequencer_continue()
>> already does this, extract this to its own function in
>> rebase-interactive.c.  To keep track of this, a file is created on the
>> disk (as you suggested in your other email.)  At the next edit, if this
>> file exists and no errors were found, it is deleted.  The backup is only
>> created if there is no errors in `todo_list' and in `new_todo'.
>>
>> This would guarantee that there is no errors in the backup, and that the
>> edited list is always compared to a list exempt of errors.
>>
>> This approach also has the benefit to detect if a commit part of a
>> badcmd was dropped.
>>
>> After some tweaks (ie. `expect' now lists 2 commits instead of one),
>> this passes the test with the change you suggested, and the one you sent
>> in your other email.
> 
> That sounds good. I'm not sure how it passes the test in my other email
> though, if sequencer_continue() compares the todo list to the backup
> wont it still fail when continuing after conflicts as the backup is out
> of date?
> 

I changed sequencer_continue() to check the todo list only if the file
indicating an error exists.

I still have to rewrite the commit message, then I’ll re-send this series.

Cheers,
Alban

> Best Wishes
> 
> Phillip
> 

^ permalink raw reply

* Re: [PATCH] fetch: emphasize failure during submodule fetch
From: Junio C Hamano @ 2020-01-10 20:18 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git
In-Reply-To: <20200110195533.6416-1-emilyshaffer@google.com>

Emily Shaffer <emilyshaffer@google.com> writes:

> In cases when a submodule fetch fails when there are many submodules, the error
> from the lone failing submodule fetch is buried under activity on the other
> submodules if more than one fetch fell back on fetch-by-oid. Call out a failure
> late so the user is aware that something went wrong.
>
> Example without this change:

>   $ git pull --rebase
>   remote: Counting objects: 1591, done
>   remote: Finding sources: 100% (4317/4317)
>   remote: Total 4317 (delta 1923), reused 4252 (delta 1923)
>   Receiving objects: 100% (4317/4317), 2.09 MiB | 8.15 MiB/s, done.
>   Resolving deltas: 100% (1923/1923), completed with 101 local objects.
>   From https://android.googlesource.com/platform/superproject
>   [snip ~100 lines]
>   From https://android.googlesource.com/platform/prebuilts/fullsdk/platforms/android-29
>    * branch            a97149980b7d8acf48392af591b35689f7205d9e -> FETCH_HEAD
>   From https://android.googlesource.com/platform/prebuilts/fullsdk-darwin/platform-tools
>    * branch            98f9454af8ca210818eff4f502097c471d7327b5 -> FETCH_HEAD
>   From https://android.googlesource.com/platform/prebuilts/checkstyle
>    * branch            6fb3e23f05ed186908ea9f48d6692220891363b0 -> FETCH_HEAD
>    * branch            f21d92f6339f0993a946b25fa2172c2ceb5e332b -> FETCH_HEAD
>   From https://android.googlesource.com/platform/prebuilts/androidx/studio
>    * branch            bed5e7b5866b8698bbcd1879134b03ac312a2ba8 -> FETCH_HEAD
>   From https://android.googlesource.com/platform/prebuilts/androidx/internal
>    * branch                179375220f834de5dfbee169f4c2f948d850a203 -> FETCH_HEAD
>    * branch                1dcf3ceef9a86001c693fa34b3513f0c4af26178 -> FETCH_HEAD
>    * branch                2ea3ccef4c98f5de1b74affd1dda33f5b2834a45 -> FETCH_HEAD
>    * branch                a09de09c3814c3d31cc770d5351b92d29ea624ae -> FETCH_HEAD
>    * branch                d2ae6add8b2c0e28899e4faeb2d6889ceefb0b62 -> FETCH_HEAD
>    * branch                e244e2a5f7d98f47f75d06ef57ef1c6c5701a38d -> FETCH_HEAD
>   Auto packing the repository in background for optimum performance.
>   See "git help gc" for manual housekeeping.
>   From https://android.googlesource.com/platform/prebuilts/androidx/external
>    * branch              c3df2fa7f3e63b8714ac8d24f86a26cc50ee4af5 -> FETCH_HEAD
>   fatal: remote error: want c5bd7796550b3742772c8fb8c73a1311013b5159 not valid
>   From https://android.googlesource.com/platform/external/noto-fonts
>    * branch            02969d3046f6944a5a211d2331d1c82736487f01 -> FETCH_HEAD
>    * branch            9ee45fcd0b8bb8621c1cdbc6de5fe7502eff7393 -> FETCH_HEAD
>   From https://android.googlesource.com/platform/external/dokka
>    * branch            03a8ed966a7b847931a0ee20327f989837aaff13 -> FETCH_HEAD
>    * branch            cb1684602b5b4e18385d890c972764c55d177704 -> FETCH_HEAD
>    * branch            fd4521e89ab0e01447dda9b42be2b9bbc000f02f -> FETCH_HEAD
>   From https://android.googlesource.com/platform/external/doclava
>    * branch            04ddf3962f0cd40c81a2e144f27f497223782457 -> FETCH_HEAD
>    * branch            44bf22680e939b21a21a365f6038d5883d5163c8 -> FETCH_HEAD
>    * branch            66f673f4a3865f3b4ab645655a6484101dbd051f -> FETCH_HEAD
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> As hinted by the snippet in the commit-message (should I remove it? I
> think it's a poignant example, I couldn't see the fatal without grepping
> even after being told it was there) this manifested to an end user via
> 'git pull'.

It indeed is too noisy, especially without showing what happens with
this patch.

Is it clear to the users that a block of lines starting "From $URL"
and ending before the next "From $AnotherURL" is about the same
repository, including error messages?

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index b4c6d921d0..0c19781cb9 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1857,6 +1857,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  						    verbosity < 0,
>  						    max_children);
>  		argv_array_clear(&options);
> +		if (result)
> +			fprintf(stderr, _("Failure during submodule fetch.\n"));

How does a user find out which submodule had trouble with after
seeing this message?  Or is it something you still need to find by
scrolling back?

If the latter, I am not sure if there is much point to add a
half-way solution like this.  It is a different story if "fetch"
exits with success status when this happens, but I do not think the
"result" that is non-zero is being lost before the function returns,
so...

>  	}
>  
>  	string_list_clear(&list, 0);

^ permalink raw reply

* Re: [PATCH] userdiff: add Julia to supported userdiff languages
From: Johannes Sixt @ 2020-01-10 20:14 UTC (permalink / raw)
  To: Ryan Zoeller; +Cc: Ryan Zoeller via GitGitGadget, git@vger.kernel.org
In-Reply-To: <YqXJB_En1hjDpzUmawtrEYBztOGh_1uSmwxsQyQ646RtwGnYppq-xg26rwM0x5pu6E8PMVq22HwfSSFXN9aaD9BowSUH1YfztHRUomhOe6E=@rtzoeller.com>

Am 10.01.20 um 19:15 schrieb Ryan Zoeller:
> On Friday, January 10, 2020 11:43 AM, Johannes Sixt <j6t@kdbg.org> wrote:
>> Am 10.01.20 um 04:10 schrieb Ryan Zoeller via GitGitGadget:
>>> -   /* Real and complex literals */
>>> -   "|[-+0-9.e_(im)]+"
>>
>> I am curious: is '(1+2i)' a single literal -- including the parentheses?
>> The expression would also mistake the character sequence '-1)+(2+' as a
>> single word; is it intended?
> 
> This part of the regular expression has a pretty major mistake due
> to me misunderstanding how the parentheses were being interpreted.
> It should be something along the lines of `([-+0-9.e_]|im)+`.
> 
> Julia uses `im` as the designation for an imaginary value; this regex
> was intended to admit e.g. 1+2im, in addition other numeric values
> such as 1_000_000 and 1e10.
I see. I suggest to treat 1+2im as three words '1', '+', and '2im', and
to model numbers in this way:

	|[0-9][0-9_.]*(e[-+]?[0-9_]*)?(im)?

In particular, require a digit at the begin, and do not allow '-' and
'+' an arbitrary number of times, because it would catch 1+2+3+4 as a
single word.

-- Hannes

^ permalink raw reply

* [PATCH] fetch: emphasize failure during submodule fetch
From: Emily Shaffer @ 2020-01-10 19:55 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

In cases when a submodule fetch fails when there are many submodules, the error
from the lone failing submodule fetch is buried under activity on the other
submodules if more than one fetch fell back on fetch-by-oid. Call out a failure
late so the user is aware that something went wrong.

Example without this change:

  $ git pull --rebase
  remote: Counting objects: 1591, done
  remote: Finding sources: 100% (4317/4317)
  remote: Total 4317 (delta 1923), reused 4252 (delta 1923)
  Receiving objects: 100% (4317/4317), 2.09 MiB | 8.15 MiB/s, done.
  Resolving deltas: 100% (1923/1923), completed with 101 local objects.
  From https://android.googlesource.com/platform/superproject
  [snip ~100 lines]
  From https://android.googlesource.com/platform/prebuilts/fullsdk/platforms/android-29
   * branch            a97149980b7d8acf48392af591b35689f7205d9e -> FETCH_HEAD
  From https://android.googlesource.com/platform/prebuilts/fullsdk-darwin/platform-tools
   * branch            98f9454af8ca210818eff4f502097c471d7327b5 -> FETCH_HEAD
  From https://android.googlesource.com/platform/prebuilts/checkstyle
   * branch            6fb3e23f05ed186908ea9f48d6692220891363b0 -> FETCH_HEAD
   * branch            f21d92f6339f0993a946b25fa2172c2ceb5e332b -> FETCH_HEAD
  From https://android.googlesource.com/platform/prebuilts/androidx/studio
   * branch            bed5e7b5866b8698bbcd1879134b03ac312a2ba8 -> FETCH_HEAD
  From https://android.googlesource.com/platform/prebuilts/androidx/internal
   * branch                179375220f834de5dfbee169f4c2f948d850a203 -> FETCH_HEAD
   * branch                1dcf3ceef9a86001c693fa34b3513f0c4af26178 -> FETCH_HEAD
   * branch                2ea3ccef4c98f5de1b74affd1dda33f5b2834a45 -> FETCH_HEAD
   * branch                a09de09c3814c3d31cc770d5351b92d29ea624ae -> FETCH_HEAD
   * branch                d2ae6add8b2c0e28899e4faeb2d6889ceefb0b62 -> FETCH_HEAD
   * branch                e244e2a5f7d98f47f75d06ef57ef1c6c5701a38d -> FETCH_HEAD
  Auto packing the repository in background for optimum performance.
  See "git help gc" for manual housekeeping.
  From https://android.googlesource.com/platform/prebuilts/androidx/external
   * branch              c3df2fa7f3e63b8714ac8d24f86a26cc50ee4af5 -> FETCH_HEAD
  fatal: remote error: want c5bd7796550b3742772c8fb8c73a1311013b5159 not valid
  From https://android.googlesource.com/platform/external/noto-fonts
   * branch            02969d3046f6944a5a211d2331d1c82736487f01 -> FETCH_HEAD
   * branch            9ee45fcd0b8bb8621c1cdbc6de5fe7502eff7393 -> FETCH_HEAD
  From https://android.googlesource.com/platform/external/dokka
   * branch            03a8ed966a7b847931a0ee20327f989837aaff13 -> FETCH_HEAD
   * branch            cb1684602b5b4e18385d890c972764c55d177704 -> FETCH_HEAD
   * branch            fd4521e89ab0e01447dda9b42be2b9bbc000f02f -> FETCH_HEAD
  From https://android.googlesource.com/platform/external/doclava
   * branch            04ddf3962f0cd40c81a2e144f27f497223782457 -> FETCH_HEAD
   * branch            44bf22680e939b21a21a365f6038d5883d5163c8 -> FETCH_HEAD
   * branch            66f673f4a3865f3b4ab645655a6484101dbd051f -> FETCH_HEAD

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
As hinted by the snippet in the commit-message (should I remove it? I
think it's a poignant example, I couldn't see the fatal without grepping
even after being told it was there) this manifested to an end user via
'git pull'. As I was tracking down the right place to "bump" the error
line, I noticed that 'git pull' is made out of run_command() calls, ever
since it was adopted from bash ~5 years ago. Is there interest in
refactoring it to use library calls instead, or do folks consider 'pull'
to be such a thin layer over 'git fetch && git (rebase|merge)' that it's
not worth the effort?

 - Emily

 builtin/fetch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b4c6d921d0..0c19781cb9 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1857,6 +1857,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 						    verbosity < 0,
 						    max_children);
 		argv_array_clear(&options);
+		if (result)
+			fprintf(stderr, _("Failure during submodule fetch.\n"));
 	}
 
 	string_list_clear(&list, 0);
-- 
2.25.0.rc1.283.g88dfdc4193-goog


^ permalink raw reply related

* Re: Question about the pack OBJ_OFS_DELTA format
From: Junio C Hamano @ 2020-01-10 19:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Erik Fastermann, git
In-Reply-To: <20200110095707.GA459765@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> The pack-format.txt file says:
>
>        offset encoding:
>             n bytes with MSB set in all but the last one.
>             The offset is then the number constructed by
>             concatenating the lower 7 bit of each byte, and
>             for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
>             to the result.
>
> but I think is missing two bits of information:
>
>   - the bytes are in most-significant to least-significant order, which
>     IIRC is the opposite of the size varint
>
>   - each 7-bit byte sneaks in some extra data by implicitly adding "1"
>     to all but the last byte

Isn't the latter mentioned in the paragraph you quoted?

^ permalink raw reply

* Re: git submodule update  strange output behavior.
From: Junio C Hamano @ 2020-01-10 19:36 UTC (permalink / raw)
  To: Carlo Wood; +Cc: git
In-Reply-To: <20200110101251.3b9f9332@hikaru>

Carlo Wood <carlo@alinoe.com> writes:

> It seems to me that the other part of the problem is printing
> this output for submodules when nothing (needed to be) is fetched.

Hmm, I am not sure if that is a reasonable expectation.  Would it be
possible to tell if there is something that needs to be fetched
without attempting to contact the other side?

^ permalink raw reply

* Re: [PATCH] unpack-trees: correctly compute result count
From: Junio C Hamano @ 2020-01-10 19:33 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, johannes.schindelin,
	Derrick Stolee
In-Reply-To: <20200110103729.GA470836@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Fri, Jan 10, 2020 at 01:59:30AM +0000, Derrick Stolee via GitGitGadget wrote:
>
>>     Here is a very small fix to the cone-mode pattern-matching in
>>     unpack-trees.c. Johannes found this while running a Coverity scan for
>>     other issues. He alerted me to the problem and I could immediately see 
>>     my error here. In creating this patch, most of my time was spent asking
>>     "how did this work before?" and "why didn't this hurt performance?"
>>     Hopefully my commit message explains this thoroughly enough.
>
> Yes, it makes perfect sense (and as soon as I saw the explanation of the
> problem, my immediate response was also "wait, how did this even work").
>
> And the patch itself looks good.
>
>>     As for making it into the release, I don't know. The change is small, it
>>     has a very limited scope, but this flaw is also not really hurting
>>     anything in a major way.
>
> I could go either way.
>
> This counts as something small and obvious enough that I'd consider
> slipping it in at the last minute if it were fixing a bad bug. But given
> how minor the bug is, being conservative makes sense to me, if only
> because it's good to exercise our release discipline muscles. :)

Heh.

On one hand, it is obvious, even to a mindless compiler, that the
author meant to count how many elements of cache[] array have been
processed in the loop, so it is clear that the patch makes the code
reflect the author's original intention better.

On the other hand, the code that reflects the author's original
intention has never been tested in the field---it could be possible
that the author thought cache[0] thru cache_end[0] have been
processed, but for some subtlety, only a very early part of the
range was correctly processed, and returning smaller range may have
been hiding that subtle bug ;-)

I do not see any such subtle bug in this particular case, so I am
somewhat tempted to say "it is clear that the 'fix' makes the code
do what the author wanted to do, *and* what the author wanted to do
seems sane, so let's apply it".

Having said that, the earlier part of the patch, i.e.

 	if (pl->use_cone_patterns && orig_ret == MATCHED_RECURSIVE) {
 		struct cache_entry **ce = cache;
-		rc = (cache_end - cache) / sizeof(struct cache_entry *);
+		rc = cache_end - cache;
 
 		while (ce < cache_end) {
 			(*ce)->ce_flags &= ~clear_mask;
 			ce++;
 		}

wouldn't have been needed any fix if it were actually *counting*,
which is what clear_ce_flags_dir() promises its callers it does,
instead of cheating with a subtraction. i.e.

	if (... RECURSIVE) {
		rc = 0;
		while (ce < cache_end) {
			clear;
			ce++;
			rc++;
		}
	}

and that may have been more future-proof way, in that the body of
the while loop may in the future decide to leave early etc. and
actually counting how far the processing progressed would be less
error prone.

ALso, the computation of cache_end earlier in the function looks
suspiciously similar to the code Emily recently touched to avoid
segfaulting against inconsistent index state, where it used
the entry count in the cache_tree structure (when it is valid) to
determine how many entries to skip over.  We may want to see if we
can apply the same optimization here.

Thanks.  Will queue but will not merge to 'master' ;-)

^ permalink raw reply

* Re: [PATCH v2 05/11] object-store: allow threaded access to object reading
From: Christian Couder @ 2020-01-10 19:07 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Jeff King, Jonathan Tan, git,
	Оля Тележная,
	Nguyễn Thái Ngọc Duy, Junio C Hamano,
	Jonathan Nieder, Stefan Beller
In-Reply-To: <CAHd-oW7WhSs2PDSwhksu0Kh5pAJE3rw3nmc-8VJ9dkJitz_N8A@mail.gmail.com>

Hi Matheus,

On Thu, Jan 9, 2020 at 11:02 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Thu, Dec 19, 2019 at 5:27 PM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:

> > However, re-inspecting the code, it seemed to me that we might already
> > have a thread-safe mechanism. The window disposal operations (at
> > close_pack_windows() and unuse_one_window()) are only performed if
> > window.inuse_cnt == 0. So as a thread which reads from the window will
> > also previously increment its inuse_cnt, wouldn't the reading
> > operation be already protected?
> >
> > Another concern would be close_pack_fd(), which can close packs even
> > with in-use windows. However, as the mmap docs[1] says: "closing the
> > file descriptor does not unmap the region".
> >
> > Finally, we also considered reprepare_packed_git() as a possible
> > conflicting operation. But the function called by it to handle
> > packfile opening, prepare_pack(), won't reopen already available
> > packs. Therefore, IIUC, it will leave the opened windows intact.
> >
> > So, aren't perhaps the window readings performed outside the
> > obj_read_mutex critical section already thread-safe?
>
> Any thoughts on this?

My advice, if you don't hear from anyone in the next few days, is to
update the commit message, the cover letter and the comments inside
the patches with the information you researched and to resubmit a new
version of the patch series.

Thanks,
Christian.

^ permalink raw reply

* Re: [PATCH] userdiff: add Julia to supported userdiff languages
From: Ryan Zoeller @ 2020-01-10 18:15 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Ryan Zoeller via GitGitGadget, git@vger.kernel.org
In-Reply-To: <49decf8e-87a9-120f-4c2b-cafc5aa1d466@kdbg.org>

On Friday, January 10, 2020 11:43 AM, Johannes Sixt <j6t@kdbg.org> wrote:

> Am 10.01.20 um 04:10 schrieb Ryan Zoeller via GitGitGadget:
>
> > Add xfuncname and word_regex patterns for Julia1,
> > which is a language used in numerical analysis and
> > computational science.
> > The default behavior for xfuncname did not allow
> > functions to be indented, nor functions to have a
> > macro applied, such as @inline or @generated.
> >
> > Signed-off-by: Ryan Zoeller rtzoeller@rtzoeller.com
> >
> > ----------------------------------------------------
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-521%2Frtzoeller%2Fjulia_userdiff-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-521/rtzoeller/julia_userdiff-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/521
> > Documentation/gitattributes.txt | 2 ++
> > t/t4018-diff-funcname.sh | 1 +
> > t/t4018/julia-function | 5 +++++
> > t/t4018/julia-indented-function | 8 ++++++++
> > t/t4018/julia-inline-function | 5 +++++
> > t/t4018/julia-macro | 5 +++++
> > t/t4018/julia-mutable-struct | 5 +++++
> > t/t4018/julia-struct | 5 +++++
> > userdiff.c | 15 +++++++++++++++
> > 9 files changed, 51 insertions(+)
> > create mode 100644 t/t4018/julia-function
> > create mode 100644 t/t4018/julia-indented-function
> > create mode 100644 t/t4018/julia-inline-function
> > create mode 100644 t/t4018/julia-macro
> > create mode 100644 t/t4018/julia-mutable-struct
> > create mode 100644 t/t4018/julia-struct
>
> The tests all look good.
>
> > diff --git a/userdiff.c b/userdiff.c
> > index efbe05e5a5..b5e938b1c2 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -79,6 +79,21 @@ PATTERNS("java",
> > "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> > "|[-+*/<>%&^|=!]="
> > "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> > +PATTERNS("julia",
> >
> > -   "^[ \t](((mutable[ \t]+)?struct|(@.+[ \t])?function|macro)[ \t].)$",
>
> Looks good to me.
>
> > -   /* -- */
> > -   /* Binary literals */
> > -   "[-+]?0b[01]+"
> > -   /* Hexadecimal literals */
> > -   "|[-+]?0x[0-9a-fA-F]+"
>
> These two could be merged into
>
> /* Binary and hexadecimal literals */
> "|0[bx][0-9a-fA-F]+"

I was trying to avoid `0b` being followed by hex characters from being recognized, e.g. 0bFF. This is admittedly not really a concern, so I'm fine making this change to simplify the regular expression.

>
> Note that I did not insert [-+]? at the front. Even though most if not
> all patterns allow a sign, they are usually wrong to do so, because they
> misclassify a change from 'a+1' to 'a+2' as 'a[-+1-]{++2+}' instead of
> the correct 'a+[-1-]{+2+}'.

I'm fine dropping the leading `[-+]?`.

>
> > -   /* Real and complex literals */
> > -   "|[-+0-9.e_(im)]+"
>
> I am curious: is '(1+2i)' a single literal -- including the parentheses?
> The expression would also mistake the character sequence '-1)+(2+' as a
> single word; is it intended?

This part of the regular expression has a pretty major mistake due to me misunderstanding how the parentheses were being interpreted. It should be something along the lines of `([-+0-9.e_]|im)+`.

Julia uses `im` as the designation for an imaginary value; this regex was intended to admit e.g. 1+2im, in addition other numeric values such as 1_000_000 and 1e10.

>
> > -   /* Should theoretically allow Unicode characters as part of
> > -   -   a word, such as U+2211. However, Julia reserves most of the
> > -   -   U+2200-U+22FF range (as well as others) as user-defined operators,
> > -   -   therefore they are not handled in this regex. */
> > -   "|[a-zA-Z_][a-zA-Z0-9_!]*"
> > -   "|--|\\+\\+|<<=?|>>>=?|>>=?|\\\\\\\\=?|//=?|&&|\\|\\||::|->|[-+*/<>%^&|=!$]=?"),
>
> The last sub-expression permits single-character operators in addition
> to their forms with a '=' appended (computing assignment, I presume).
> You could remove the trailing ? because single non-whitespace characters
> are always a word of their own, even if they are not caught by the word
> regexp.

Agreed, I'll drop the trailing ?.

>
> > PATTERNS("matlab",
> > /*
> > * Octave pattern is mostly the same as matlab, except that '%%%' and
> > base-commit: 042ed3e048af08014487d19196984347e3be7d1c
>
> -- Hannes

Thanks for the feedback,
Ryan Zoeller


^ permalink raw reply

* Re: [PATCH] userdiff: add Julia to supported userdiff languages
From: Johannes Sixt @ 2020-01-10 17:43 UTC (permalink / raw)
  To: Ryan Zoeller via GitGitGadget; +Cc: git, Ryan Zoeller
In-Reply-To: <pull.521.git.1578625810098.gitgitgadget@gmail.com>

Am 10.01.20 um 04:10 schrieb Ryan Zoeller via GitGitGadget:
> Add xfuncname and word_regex patterns for Julia[1],
> which is a language used in numerical analysis and
> computational science.
> 
> The default behavior for xfuncname did not allow
> functions to be indented, nor functions to have a
> macro applied, such as @inline or @generated.
> 
> [1]: https://julialang.org
> 
> Signed-off-by: Ryan Zoeller <rtzoeller@rtzoeller.com>
> ---
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-521%2Frtzoeller%2Fjulia_userdiff-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-521/rtzoeller/julia_userdiff-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/521
> 
>  Documentation/gitattributes.txt |  2 ++
>  t/t4018-diff-funcname.sh        |  1 +
>  t/t4018/julia-function          |  5 +++++
>  t/t4018/julia-indented-function |  8 ++++++++
>  t/t4018/julia-inline-function   |  5 +++++
>  t/t4018/julia-macro             |  5 +++++
>  t/t4018/julia-mutable-struct    |  5 +++++
>  t/t4018/julia-struct            |  5 +++++
>  userdiff.c                      | 15 +++++++++++++++
>  9 files changed, 51 insertions(+)
>  create mode 100644 t/t4018/julia-function
>  create mode 100644 t/t4018/julia-indented-function
>  create mode 100644 t/t4018/julia-inline-function
>  create mode 100644 t/t4018/julia-macro
>  create mode 100644 t/t4018/julia-mutable-struct
>  create mode 100644 t/t4018/julia-struct

The tests all look good.

> diff --git a/userdiff.c b/userdiff.c
> index efbe05e5a5..b5e938b1c2 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -79,6 +79,21 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +PATTERNS("julia",
> +	 "^[ \t]*(((mutable[ \t]+)?struct|(@.+[ \t])?function|macro)[ \t].*)$",

Looks good to me.

> +	 /* -- */
> +	 /* Binary literals */
> +	 "[-+]?0b[01]+"
> +	 /* Hexadecimal literals */
> +	 "|[-+]?0x[0-9a-fA-F]+"

These two could be merged into

	/* Binary and hexadecimal literals */
	"|0[bx][0-9a-fA-F]+"

Note that I did not insert [-+]? at the front. Even though most if not
all patterns allow a sign, they are usually wrong to do so, because they
misclassify a change from 'a+1' to 'a+2' as 'a[-+1-]{++2+}' instead of
the correct 'a+[-1-]{+2+}'.

> +	 /* Real and complex literals */
> +	 "|[-+0-9.e_(im)]+"

I am curious: is '(1+2i)' a single literal -- including the parentheses?
The expression would also mistake the character sequence '-1)+(2+' as a
single word; is it intended?

> +	 /* Should theoretically allow Unicode characters as part of
> +	  * a word, such as U+2211. However, Julia reserves most of the
> +	  * U+2200-U+22FF range (as well as others) as user-defined operators,
> +	  * therefore they are not handled in this regex. */
> +	 "|[a-zA-Z_][a-zA-Z0-9_!]*"
> +	 "|--|\\+\\+|<<=?|>>>=?|>>=?|\\\\\\\\=?|//=?|&&|\\|\\||::|->|[-+*/<>%^&|=!$]=?"),

The last sub-expression permits single-character operators in addition
to their forms with a '=' appended (computing assignment, I presume).
You could remove the trailing ? because single non-whitespace characters
are always a word of their own, even if they are not caught by the word
regexp.

>  PATTERNS("matlab",
>  	 /*
>  	  * Octave pattern is mostly the same as matlab, except that '%%%' and
> 
> base-commit: 042ed3e048af08014487d19196984347e3be7d1c
> 

-- Hannes

^ permalink raw reply

* Re: [PATCH v3 2/2] rebase-interactive: warn if commit is dropped with `rebase --edit-todo'
From: Phillip Wood @ 2020-01-10 17:13 UTC (permalink / raw)
  To: Alban Gruin, phillip.wood, Junio C Hamano; +Cc: git, Johannes Schindelin
In-Reply-To: <0bf602bb-eff0-12f1-fa6c-c670a12f2cee@gmail.com>

Hi Alban

On 09/01/2020 21:13, Alban Gruin wrote:
> Hi Phillip,
> 
> Le 09/12/2019 à 17:00, Phillip Wood a écrit :
>>>> diff --git a/rebase-interactive.c b/rebase-interactive.c
>>>> index ad5dd49c31..80b6a2e7a6 100644
>>>> --- a/rebase-interactive.c
>>>> +++ b/rebase-interactive.c
>>>> @@ -97,7 +97,8 @@ int edit_todo_list(struct repository *r, struct
>>>> todo_list *todo_list,
>>>>               struct todo_list *new_todo, const char *shortrevisions,
>>>>               const char *shortonto, unsigned flags)
>>>>    {
>>>> -    const char *todo_file = rebase_path_todo();
>>>> +    const char *todo_file = rebase_path_todo(),
>>>> +        *todo_backup = rebase_path_todo_backup();
>>>>        /* If the user is editing the todo list, we first try to parse
>>>> @@ -110,9 +111,9 @@ int edit_todo_list(struct repository *r, struct
>>>> todo_list *todo_list,
>>>>                        -1, flags | TODO_LIST_SHORTEN_IDS |
>>>> TODO_LIST_APPEND_TODO_HELP))
>>>>            return error_errno(_("could not write '%s'"), todo_file);
>>>>    -    if (initial && copy_file(rebase_path_todo_backup(), todo_file,
>>>> 0666))
>>>> -        return error(_("could not copy '%s' to '%s'."), todo_file,
>>>> -                 rebase_path_todo_backup());
>>>> +    unlink(todo_backup);
>>>> +    if (copy_file(todo_backup, todo_file, 0666))
>>>> +        return error(_("could not copy '%s' to '%s'."), todo_file,
>>>> todo_backup);
>>>
>>> We used to copy ONLY when initial is set and we left old todo_backup
>>> intact when !initial.  That is no longer true after this change, but
>>> it is intended---we create an exact copy of what we would hand out
>>> to the end-user, so that we can compare it with the edited result
>>> to figure out what got changed.
>>
>> I think it would be better to only create a new copy if the last edit
>> was successful. As it stands if I edit the todo list and accidentally
>> delete some lines and then edit the todo list again to try and fix it
>> the second edit will succeed whether or not I reinserted the deleted lines.
>>
>> We could add this to the tests to check that a subsequent edit that does
>> not fix the problem fails
>>
>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> index 969e12d281..8544d8ab2c 100755
>> --- a/t/t3404-rebase-interactive.sh
>> +++ b/t/t3404-rebase-interactive.sh
>>
>> @@ -1416,6 +1416,7 @@ test_expect_success 'rebase --edit-todo respects
>> rebase.missingCommitsCheck = er
>>                  test_i18ncmp expect actual &&
>>                  test_must_fail git rebase --continue 2>actual &&
>>                  test_i18ncmp expect actual &&
>> +               test_must_fail git rebase --edit-todo &&
>>                  cp orig .git/rebase-merge/git-rebase-todo &&
>>                  test_must_fail env FAKE_LINES="1 2 3 4" \
>>                          git rebase --edit-todo 2>actual &&
>>
>>
> 
> In which case, if the check did not pass at the previous edit, the new
> todo list should be compared to the backup.  As sequencer_continue()
> already does this, extract this to its own function in
> rebase-interactive.c.  To keep track of this, a file is created on the
> disk (as you suggested in your other email.)  At the next edit, if this
> file exists and no errors were found, it is deleted.  The backup is only
> created if there is no errors in `todo_list' and in `new_todo'.
> 
> This would guarantee that there is no errors in the backup, and that the
> edited list is always compared to a list exempt of errors.
> 
> This approach also has the benefit to detect if a commit part of a
> badcmd was dropped.
> 
> After some tweaks (ie. `expect' now lists 2 commits instead of one),
> this passes the test with the change you suggested, and the one you sent
> in your other email.

That sounds good. I'm not sure how it passes the test in my other email 
though, if sequencer_continue() compares the todo list to the backup 
wont it still fail when continuing after conflicts as the backup is out 
of date?

Best Wishes

Phillip


>>>
>>> We unlink(2) unconditionally because the only effect we want to see
>>> here is that todo_backup does not exist before we call copy_file()
>>> that wants to do O_CREAT|O_EXCL.  I wonder if we want to avoid
>>> unlink() when initial, and also if we want to do unlink_or_warn()
>>> when !initial (read: this is just "wondering" without thinking long
>>> enough to suggest that doing so would be better)
>>>
>>>> diff --git a/t/t3404-rebase-interactive.sh
>>>> b/t/t3404-rebase-interactive.sh
>>>> index 29a35840ed..9051c1e11d 100755
>>>> --- a/t/t3404-rebase-interactive.sh
>>>> +++ b/t/t3404-rebase-interactive.sh
>>>> @@ -1343,6 +1343,89 @@ test_expect_success 'rebase -i respects
>>>> rebase.missingCommitsCheck = error' '
>>>>        test B = $(git cat-file commit HEAD^ | sed -ne \$p)
>>>>    '
>>>>    +test_expect_success 'rebase --edit-todo respects
>>>> rebase.missingCommitsCheck = ignore' '
>>>> +    test_config rebase.missingCommitsCheck ignore &&
>>>> +    rebase_setup_and_clean missing-commit &&
>>>> +    set_fake_editor &&
>>>> +    FAKE_LINES="break 1 2 3 4 5" git rebase -i --root &&
>>>> +    FAKE_LINES="1 2 3 4" git rebase --edit-todo 2>actual &&
>>>
>>> OK, so we lost "pick 5" but with missing-check disabled, that should
>>> not trigger any annoying warning or error.
>>>
>>>> +    git rebase --continue 2>actual &&
>>
>> This clobbers actual which hasn't been used yet
>>
> 
> Good catch.
> 
>>>> +    test D = $(git cat-file commit HEAD | sed -ne \$p) &&
>>>
>>>> +    test_i18ngrep \
>>>> +        "Successfully rebased and updated refs/heads/missing-commit" \
>>>> +        actual
>>>> +'
>>>> +
>>>> +test_expect_success 'rebase --edit-todo respects
>>>> rebase.missingCommitsCheck = warn' '
>>>> +    cat >expect <<-EOF &&
>>>> +    error: invalid line 1: badcmd $(git rev-list --pretty=oneline
>>>> --abbrev-commit -1 master~4)
>>>> +    Warning: some commits may have been dropped accidentally.
>>>> +    Dropped commits (newer to older):
>>>> +     - $(git rev-list --pretty=oneline --abbrev-commit -1 master)
>>>> +    To avoid this message, use "drop" to explicitly remove a commit.
>>>> +    EOF
>>>> +    tail -n4 expect >expect.2 &&
>>>> +    test_config rebase.missingCommitsCheck warn &&
>>>> +    rebase_setup_and_clean missing-commit &&
>>>> +    set_fake_editor &&
>>>> +    test_must_fail env FAKE_LINES="bad 1 2 3 4 5" \
>>>> +        git rebase -i --root &&
>>>> +    cp .git/rebase-merge/git-rebase-todo.backup orig &&
>>>> +    FAKE_LINES="2 3 4" git rebase --edit-todo 2>actual.2 &&
>>>> +    head -n5 actual.2 >actual &&
>>>> +    test_i18ncmp expect actual &&
>>>
>>> OK, so we lost "pick 1" while discarding "bad", and we should notice
>>> the lossage?  I see "head -n5" there, which means we are still
>>> getting "invalid line 1: badcmd", even though FAKE_LINES now got rid
>>> of "bad"?  Puzzled...
>>
>> Is the bad there to stop the rebase so we can edit the todo list? If so
>> it would be better to use 'break' instead.
>>
> 
> No, it was here to show that we can detect dropped commits, even if the
> todo list has an error.
> 
>> Best Wishes
>>
>> Phillip
> 
> Cheers,
> Alban
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox