Git development
 help / color / mirror / Atom feed
* Re: [PATCH] tree-walk: disallow overflowing modes
From: Jeff King @ 2023-01-22 22:02 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Junio C Hamano, Ævar Arnfjörð Bjarmason
In-Reply-To: <044bdc8f-fdc9-dfd2-6cbb-941513467524@web.de>

On Sun, Jan 22, 2023 at 11:03:38AM +0100, René Scharfe wrote:

> Am 22.01.23 um 08:50 schrieb Jeff King:
> > On Sat, Jan 21, 2023 at 10:36:09AM +0100, René Scharfe wrote:
> >
> >> When parsing tree entries, reject mode values that don't fit into an
> >> unsigned int.
> >
> > Seems reasonable. I don't think you can cause any interesting mischief
> > here, but it's cheap to check, and finding data problems earlier rather
> > than later is always good.
> >
> > Should it be s/unsigned int/uint16_t/, though?
> 
> "mode" is declared as unsigned int, and I was more concerned with
> overflowing that.

Doh, I just did "vim -t get_mode" and ended up in the near-identical
version in fast-import.c, which uses uint16_t. Maybe it would want the
same treatment. ;)

> We could be more strict and reject everything that oversteps
> S_IFMT|ALLPERMS, but the latter is not defined everywhere.  But
> permission bits are well-known, so the magic number 07777 should be
> recognizable enough.  Like this?

It feels like this level of check should be happening in the fsck
caller. In particular, it's nice for this parsing layer to err on the
forgiving side, because the way you get out of these situations often
involves something like "git ls-tree | git mktree".

So in that sense, even pushing the overflow detection into the fsck
would be nice, but of course it's hard to do, since we have no way to
represent the overflowed value. I guess we could have a "be more
careful" flag that the fsck code would pass, but I'm not sure it's worth
the added complexity.

-Peff

^ permalink raw reply

* Re: [PATCH] Documentation: render dash correctly
From: Junio C Hamano @ 2023-01-22 22:53 UTC (permalink / raw)
  To: Andrei Rybak; +Cc: git
In-Reply-To: <20230122165628.1601062-1-rybak.a.v@gmail.com>

Andrei Rybak <rybak.a.v@gmail.com> writes:

> Three hyphens are rendered verbatim in documentation, so "--" has to be
> used to produce a dash.

Sad but true.  I suspect folks with TeX background were so
accustomed to type three dashes to obtain em dash, but with AsciiDoc
(and asciidoctor), sadly, two dashes is a way to ask for em dash.

The changes in your patch look all reasonable to me at the source
level; I didn't do Documentation/doc-diff to verify, though.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 1/3] branch: fix die_if_checked_out() when ignore_current_worktree
From: Rubén Justo @ 2023-01-22 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Phillip Wood
In-Reply-To: <xmqqk01eqr3m.fsf@gitster.g>

On 22-ene-2023 11:58:05, Junio C Hamano wrote:

> Rubén Justo <rjusto@gmail.com> writes:
> 
> >> As the proposed log message explained, updating die_if_checked_out()
> >> with this patch would fix a bug---can we demonstrate the existing
> >> breakage and protect the fix from future breakages by adding a test
> >> or two?
> >
> > 2/3 and 3/3, I think makes more sense on its own commit.
> 
> Hmph, how so?  Especially once you split 1/3 into the preliminary
> refactoring and the real fix, the fix becomes fairly small and
> clear.  And the tests to protect the fix would go best in the same
> commit.

My intention is to protect rebase (2/3) and switch (3/3).  If any of
those tests break, even if die_if_checked_out() is no longer used by
them, I try to make the original intent clear with that in there.

die_if_checked_out() was initially fine, the ignore_current_worktree was
unfortunately introduced.  I haven't checked, but other callers not
affected by the change, i.e. ignore_current_worktree = 0, his tests
should have protected them by the change.

You are right, in a future reroll, split 1/3 could leave a fairly small
commit, maybe not a bad thing.  Definitely this need a reroll, because
of the style issues, but I will wait some time for other reviewers. 

> >> The above comment is about find_shared_symref() which iterates over
> >> worktrees and find the one that uses the named symref.  Now the
> >> comment appears to apply to is_shared_symref() which does not
> >> iterate but takes one specific worktree instance.  Do their
> >> differences necessitate some updates to the comment?
> >
> > I think the comment still makes sense as is for the new function, both the
> > description and the recommendation.  I will review it again.
> 
> OK.  Thanks.
> 
> >> > +int is_shared_symref(const struct worktree *wt, const char *symref,
> >> > +		     const char *target)
> >> > +{
> >> 
> >> What this function does sound more like "is target in use in this
> >> particular worktree by being pointed at by the symref?"  IOW, I do
> >> not see where "shared" comes into its name from.
> >> 
> >> "HEAD" that is tentatively detached while bisecting or rebasing the
> >> "target" branch is still considered to point at the "target", so
> >> perhaps symref_points_at_target() or something?
> >
> > I tried to maintain the terms as much as possible.  I'll think about the name
> > you suggest.
> 
> When you did not change a thing in such a way that it does not
> change the relationship between that thing and other things, it
> makes perfect sense to keep the same term to refer to the thing.
> Otherwise, once the thing starts playing different roles in the
> world, there may be a better word to refer to the updated and
> improved thing.

I tried to maintain the relationship and the role, too.  Just introduce
the helper, as Phillip suggested and I think it is a good idea. 

Thank you.

^ permalink raw reply

* Re: [PATCH] MyFirstContribution: refrain from self-iterating too much
From: Sean Allred @ 2023-01-23  1:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq3583uyk0.fsf@gitster.g>

First review on this list; let me know if I've missed any decorum :-)

Junio C Hamano <gitster@pobox.com> writes:
> +Of course, you still may spot mistakes and rooms for improvements
> +after you sent your initial patch.  Learn from that experience to
> +make sure that you will do such a self-review _before_ sending your
> +patches next time.  You do not have to send your patches immediately
> +once you finished writing them.  It is not a race.  Take your time
> +to self-review them, sleep on them, improve them before sending them
> +out, and do not allow you to send them before you are reasonably
> +sure that you won't find more mistakes in them yourself.

Something of a nit:

    do not allow you to send them...

should be

    do not allow yourself to send them...

You're also using 'them' a *lot* which took me for a tumble my first
read-through. I lost track of what 'them' actually was. Since this
documentation is especially likely to be read by those who are already
confused by the process, it may be beneficial to be more explicit at
some points:

    ...

    patches next time. You do not have to send your patches immediately
    once you finished writing them. It is not a race. Take your time to
    review your own patches, sleep on them, and improve them. Do not
    allow yourself to send out your patches for review before you are
    reasonably sure you won't find more mistakes in them yourself.

--

Thanks for all the work you do on this list; it's much appreciated.

-Sean

--
Sean Allred

^ permalink raw reply

* Re: [PATCH] bisect: fix "reset" when branch is checked out elsewhere
From: Junio C Hamano @ 2023-01-23  2:01 UTC (permalink / raw)
  To: Rubén Justo; +Cc: Git List, Phillip Wood
In-Reply-To: <1c36c334-9f10-3859-c92f-3d889e226769@gmail.com>

Rubén Justo <rjusto@gmail.com> writes:

> Since 1d0fa89 (checkout: add --ignore-other-wortrees, 2015-01-03) we
> have a safety valve in checkout/switch to prevent the same branch from
> being checked out simultaneously in multiple worktrees.
>
> If a branch is bisected in a worktree while also being checked out in
> another worktree; when the bisection is finished, checking out the
> branch back in the current worktree may fail.

True.  But we should explain why failing is a bad thing here.  After
all, "git checkout foo" to check out a branch 'foo' that is being
used in a different worktree linked to the same repository fails,
and that is a GOOD thing.  Is the logic behind your "may fail and
that is a bad thing" something like this?

    When "git bisect reset" goes back to the branch, it used to error
    out if the same branch is checked out in a different worktree.
    Since this can happen only after the end-user deliberately checked
    out the branch twice, erroring out does not contribute to any
    safety.

Having said that...

> @@ -245,7 +245,8 @@ static int bisect_reset(const char *commit)
>  		struct child_process cmd = CHILD_PROCESS_INIT;
>  
>  		cmd.git_cmd = 1;
> -		strvec_pushl(&cmd.args, "checkout", branch.buf, "--", NULL);
> +		strvec_pushl(&cmd.args, "checkout", "--ignore-other-worktrees",
> +				branch.buf, "--", NULL);

"git bisect reset" does take an arbitrary commit or branch name,
which may not be the original branch the user was on.  If the
user did not have any branch checked out twice, can they do
something like

    $ git checkout foo
    $ git bisect start HEAD HEAD~20
    ... bisect session finds the first bad commit ...
    $ git bisect reset bar

where 'foo' is checked out only in this worktree?  What happens if
'bar' has been checked out in a different worktree linked to the
same repository while this bisect was going on?  The current code
may fail due to the safety "checkout" has, but isn't that exactly
what we want?  I.e. prevent 'bar' from being checked out twice by
mistake?  Giving 'bar' on the command line of "bisect reset" is
likely because the user wants to start working on that branch,
without necessarily knowing that they already have a worktree that
checked out the branch elsewhere---in other words, isn't that a lazy
folks' shorthand for "git bisect reset && git checkout bar"?

If we loosen the safety only when bisect_reset() receives NULL to
its commit parameter, i.e. we are going back to the original branch,
the damage might be limited to narrower use cases, but I still am
not sure if the loosening is worth it.

IIUC, the scenario that may be helped would go like this:

    ... another worktree has 'foo' checked out ...
    $ git checkout --ignore-other-worktrees foo
    $ git bisect start HEAD HEAD~20
    ... bisect session finds the first bad commit ...
    $ git bisect reset

The last step wants to go back to 'foo', and it may be more
convenient if it did not fail to go back to the risky state the user
originally created.  But doesn't the error message already tell us
how to go back after this step refuses to recreate such a risky
state?  It sugests "git bisect reset <commit>" to switch to a
detached HEAD, so presumably, after seeing the above fail and
reading the error message, the user could do

    $ git bisect reset foo^{commit}

to finish the bisect session and detach the head at 'foo', and then
the "usual" mantra to recreate the risky state that 'foo' is checked
out twice can be done, i.e.

    $ git checkout --ignore-other-worktrees foo

So, I am not sure if this is a good idea in general.

Or do I misunderstand why you think "checking out may fail" is a bad
thing?

^ permalink raw reply

* Re: [PATCH] attr: fix instructions on how to check attrs
From: John Cai @ 2023-01-23  4:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John Cai via GitGitGadget, git
In-Reply-To: <xmqq8rhuturw.fsf@gitster.g>

Hi Junio,

On 22 Jan 2023, at 11:10, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: John Cai <johncai86@gmail.com>
>>
>> The instructions in attr.h describing what functions to call to check
>> attributes is missing the index as the first argument to git_check_attr.
>>
>> Fix this to make it consistent with the actual function signature.
>
> Sounds quite sensible.  It would have been very good to explain some
> research in the above, like
>
>     When 7a400a2c (attr: remove an implicit dependency on the_index,
>     2018-08-13) started passing an index_state instance to
>     git_check_attr(), it forgot to update the API documentation that
>     was in Documentation/technical/api-gitattributes.txt.  Later,
>     3a1b3415 (attr: move doc to attr.h, 2019-11-17) moved the API
>     documentation to attr.h and made it to a comment, without
>     realizing the earlier mistake.
>
> or something like that.

good tip about including some history. I'll include that in the re-roll

thanks!

>
> Thanks.
>
>> diff --git a/attr.h b/attr.h
>> index 2f22dffadb3..47f1111f391 100644
>> --- a/attr.h
>> +++ b/attr.h
>> @@ -45,7 +45,7 @@
>>   * const char *path;
>>   *
>>   * setup_check();
>> - * git_check_attr(path, check);
>> + * git_check_attr(&the_index, path, check);
>>   * ------------
>>   *
>>   * - Act on `.value` member of the result, left in `check->items[]`:
>>
>> base-commit: 904d404274fef6695c78a6b055edd184b72e2f9b

^ permalink raw reply

* [PATCH v2] MyFirstContribution: refrain from self-iterating too much
From: Junio C Hamano @ 2023-01-23  4:18 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen
In-Reply-To: <xmqqcz76tv6d.fsf@gitster.g>

Finding mistakes in and improving your own patches is a good idea,
but doing so too quickly is being inconsiderate to reviewers who
have just seen the initial iteration and taking their time to review
it.  Encourage new developers to perform such a self review before
they send out their patches, not after.

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/MyFirstContribution.txt | 30 +++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt
index ccfd0cb5f3..3e4f1c7764 100644
--- a/Documentation/MyFirstContribution.txt
+++ b/Documentation/MyFirstContribution.txt
@@ -1256,6 +1256,36 @@ index 88f126184c..38da593a60 100644
 [[now-what]]
 == My Patch Got Emailed - Now What?
 
+After you sent out your first patch, you may find mistakes in it, or
+a different and better way to achieve the goal of the patch.  But
+resist the temptation to send a new version immediately.
+
+ - If the mistakes you found are minor, send a reply to your patch as
+   if you were a reviewer and mention that you will fix them in an
+   updated version.
+
+ - On the other hand, if you think you want to change the course so
+   drastically that reviews on the initial patch would become
+   useless, send a reply to your patch to say so immediately to
+   avoid wasting others' time (e.g. "I am working on a better
+   approach, so please ignore this patch, and wait for the updated
+   version.")
+
+And give reviewers enough time to process your initial patch before
+sending an updated version.
+
+The above is a good practice if you sent your initial patch
+prematurely without polish.  But a better approach of course is to
+avoid sending your patch prematurely in the first place.
+
+Keep in mind that people in the development community do not have to
+see your patch immediately after you wrote it.  Instead of seeing
+the initial version right now, that you will follow up with several
+updated "oops, I like this version better than the previous one"
+versions over 2 days, reviewers would much appreciate if a single
+more polished version came 2 days late and that version, that
+contains fewer mistakes, were the only one they need to review.
+
 [[reviewing]]
 === Responding to Reviews
 
-- 
2.39.1-308-g56c8fb1e95


^ permalink raw reply related

* [PATCH v2] attr: fix instructions on how to check attrs
From: John Cai via GitGitGadget @ 2023-01-23  4:22 UTC (permalink / raw)
  To: git; +Cc: John Cai, John Cai
In-Reply-To: <pull.1441.git.git.1674356774172.gitgitgadget@gmail.com>

From: John Cai <johncai86@gmail.com>

The instructions in attr.h describing what functions to call to check
attributes is missing the index as the first argument to
git_check_attr().

When 7a400a2c (attr: remove an implicit dependency on the_index,
2018-08-13) started passing an index_state instance to git_check_attr(),
it forgot to update the API documentation in
Documentation/technical/api-gitattributes.txt. Later, 3a1b3415
(attr: move doc to attr.h, 2019-11-17) moved the API documentation to
attr.h as a comment, but still left out the index_state as an argument.

Fix this to make the documentation in the comment consistent with the
actual function signature.

Signed-off-by: John Cai <johncai86@gmail.com>
---
    attr: fix instructions on how to check attrs
    
    The instructions in attr.h describing what functions to call to check
    attributes is missing the index as the first argument to git_check_attr.
    
    Fix this to make it consistent with the actual function signature.
    
    Changes since V1:
    
     * updated commit message to include some history
    
    Signed-off-by: John Cai johncai86@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1441%2Fjohn-cai%2Fjc%2Ffix-attr-docs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1441/john-cai/jc/fix-attr-docs-v2
Pull-Request: https://github.com/git/git/pull/1441

Range-diff vs v1:

 1:  63bb84db487 ! 1:  8cfee55e48f attr: fix instructions on how to check attrs
     @@ Commit message
          attr: fix instructions on how to check attrs
      
          The instructions in attr.h describing what functions to call to check
     -    attributes is missing the index as the first argument to git_check_attr.
     +    attributes is missing the index as the first argument to
     +    git_check_attr().
      
     -    Fix this to make it consistent with the actual function signature.
     +    When 7a400a2c (attr: remove an implicit dependency on the_index,
     +    2018-08-13) started passing an index_state instance to git_check_attr(),
     +    it forgot to update the API documentation in
     +    Documentation/technical/api-gitattributes.txt. Later, 3a1b3415
     +    (attr: move doc to attr.h, 2019-11-17) moved the API documentation to
     +    attr.h as a comment, but still left out the index_state as an argument.
     +
     +    Fix this to make the documentation in the comment consistent with the
     +    actual function signature.
      
          Signed-off-by: John Cai <johncai86@gmail.com>
      


 attr.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/attr.h b/attr.h
index 2f22dffadb3..47f1111f391 100644
--- a/attr.h
+++ b/attr.h
@@ -45,7 +45,7 @@
  * const char *path;
  *
  * setup_check();
- * git_check_attr(path, check);
+ * git_check_attr(&the_index, path, check);
  * ------------
  *
  * - Act on `.value` member of the result, left in `check->items[]`:

base-commit: 904d404274fef6695c78a6b055edd184b72e2f9b
-- 
gitgitgadget

^ permalink raw reply related

* What's cooking in git.git (Jan 2023, #06; Sun, 22)
From: Junio C Hamano @ 2023-01-23  7:48 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and are candidate to be in a future
release).  Commits prefixed with '-' are only in 'seen', and aren't
considered "accepted" at all.  A topic without enough support may be
discarded after a long period of no activity.

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[Graduated to 'master']

* ab/bisect-cleanup (2023-01-13) 6 commits
  (merged to 'next' on 2023-01-14 at 945b631a1e)
 + bisect: no longer try to clean up left-over `.git/head-name` files
 + bisect: remove Cogito-related code
 + bisect run: fix the error message
 + bisect: verify that a bogus option won't try to start a bisection
 + bisect--helper: make the order consistently `argc, argv`
 + bisect--helper: simplify exit code computation

 Code clean-up.
 source: <cover-v2-0.6-00000000000-20230112T151651Z-avarab@gmail.com>


* ar/bisect-doc-update (2023-01-13) 2 commits
  (merged to 'next' on 2023-01-14 at df5185519c)
 + git-bisect-lk2009: update nist report link
 + git-bisect-lk2009: update java code conventions link

 Doc update.
 source: <20230110093251.193552-1-rybak.a.v@gmail.com>


* ar/test-cleanup (2023-01-13) 3 commits
  (merged to 'next' on 2023-01-14 at 16d372b65d)
 + t7527: use test_when_finished in 'case insensitive+preserving'
 + t6422: drop commented out code
 + t6003: uncomment test '--max-age=c3, --topo-order'

 Test clean-up.
 source: <20230111233242.16870-1-rybak.a.v@gmail.com>


* es/hooks-and-local-env (2023-01-13) 1 commit
  (merged to 'next' on 2023-01-14 at 93acd89393)
 + githooks: discuss Git operations in foreign repositories

 Doc update for environment variables set when hooks are invoked.
 source: <pull.1457.v2.git.1673293508399.gitgitgadget@gmail.com>


* jc/doc-diff-patch.txt (2023-01-13) 1 commit
  (merged to 'next' on 2023-01-14 at d25ec1f631)
 + docs: link generating patch sections

 Doc update.
 source: <pull.1392.v2.git.git.1673626524221.gitgitgadget@gmail.com>


* jk/curl-avoid-deprecated-api (2023-01-17) 3 commits
  (merged to 'next' on 2023-01-17 at e3ead5f8a0)
 + http: support CURLOPT_PROTOCOLS_STR
 + http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
 + http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT

 Deal with a few deprecation warning from cURL library.
 source: <Y8YP+R/hyNr6sEFA@coredump.intra.peff.net>


* jk/interop-error (2023-01-13) 1 commit
  (merged to 'next' on 2023-01-14 at ddca7887a5)
 + t/interop: report which vanilla git command failed

 Test helper improvement.
 source: <Y8A3yGeJl0TCDNqe@coredump.intra.peff.net>


* jk/read-object-cleanup (2023-01-13) 6 commits
  (merged to 'next' on 2023-01-13 at 8cbeef4abd)
 + object-file: fix indent-with-space
  (merged to 'next' on 2023-01-09 at 19cc3de33e)
 + packfile: inline custom read_object()
 + repo_read_object_file(): stop wrapping read_object_file_extended()
 + read_object_file_extended(): drop lookup_replace option
 + streaming: inline call to read_object_file_extended()
 + object-file: inline calls to read_object()

 Code clean-up.
 source: <Y7l4LsEQcDT9HZ21@coredump.intra.peff.net>


* jx/t1301-updates (2022-11-30) 3 commits
  (merged to 'next' on 2023-01-14 at d4f081b3f8)
 + t1301: do not change $CWD in "shared=all" test case
 + t1301: use test_when_finished for cleanup
 + t1301: fix wrong template dir for git-init

 Test updates.
 source: <20221128130323.8914-1-worldhello.net@gmail.com>


* pb/doc-orig-head (2023-01-13) 5 commits
  (merged to 'next' on 2023-01-14 at 0583c146cb)
 + git-rebase.txt: add a note about 'ORIG_HEAD' being overwritten
 + revisions.txt: be explicit about commands writing 'ORIG_HEAD'
 + git-merge.txt: mention 'ORIG_HEAD' in the Description
 + git-reset.txt: mention 'ORIG_HEAD' in the Description
 + git-cherry-pick.txt: do not use 'ORIG_HEAD' in example

 Document ORIG_HEAD a bit more.
 source: <pull.1456.v2.git.1673356521.gitgitgadget@gmail.com>


* ph/parse-date-reduced-precision (2023-01-13) 1 commit
  (merged to 'next' on 2023-01-14 at eb83564c3e)
 + date.c: allow ISO 8601 reduced precision times

 Loosen date parsing heuristics.
 source: <20230111001003.10916-1-congdanhqx@gmail.com>


* pw/rebase-exec-cleanup (2023-01-13) 1 commit
  (merged to 'next' on 2023-01-14 at 553d0daa62)
 + rebase: cleanup "--exec" option handling

 Code clean-up.
 source: <pull.1461.git.1673542201452.gitgitgadget@gmail.com>


* rs/dup-array (2023-01-09) 5 commits
  (merged to 'next' on 2023-01-14 at 3efbd1ffe0)
 + use DUP_ARRAY
 + add DUP_ARRAY
 + do full type check in BARF_UNLESS_COPYABLE
 + factor out BARF_UNLESS_COPYABLE
 + mingw: make argv2 in try_shell_exec() non-const

 Code cleaning.
 source: <9bc1bd74-f72c-1b43-df7c-950815babb03@web.de>
 source: <3e04e283-cad0-7be4-d85c-65d0a52289e2@web.de>


* sk/merge-filtering-strategies-micro-optim (2023-01-13) 1 commit
  (merged to 'next' on 2023-01-14 at f171559fae)
 + merge: break out of all_strategy loop when strategy is found

 Micro optimization.
 source: <pull.1429.v2.git.git.1673285669004.gitgitgadget@gmail.com>


* tl/ls-tree-code-clean-up (2023-01-13) 6 commits
  (merged to 'next' on 2023-01-14 at f7238037fd)
 + t3104: remove shift code in 'test_ls_tree_format'
 + ls-tree: cleanup the redundant SPACE
 + ls-tree: make "line_termination" less generic
 + ls-tree: fold "show_tree_data" into "cb" struct
 + ls-tree: use a "struct options"
 + ls-tree: don't use "show_tree_data" for "fast" callbacks

 Code clean-up.
 source: <20230112091135.20050-1-tenglong.tl@alibaba-inc.com>


* yc/doc-fetch-fix (2023-01-13) 1 commit
  (merged to 'next' on 2023-01-14 at a3ca60840b)
 + doc: fix non-existent config name

 Doc fix.
 source: <CAEg0tHSZi22RUBREJB=Cfy6O72cicv9FTkgo_Z=gvGRdPK1acw@mail.gmail.com>


* yo/doc-use-more-switch-c (2023-01-13) 1 commit
  (merged to 'next' on 2023-01-14 at 7169d5dabc)
 + doc: add "git switch -c" as another option on detached HEAD

 Doc update.
 source: <pull.1422.v2.git.git.1673261237449.gitgitgadget@gmail.com>

--------------------------------------------------
[New Topics]

* ab/cache-api-cleanup-users (2023-01-17) 3 commits
  (merged to 'next' on 2023-01-18 at c5a4374652)
 + treewide: always have a valid "index_state.repo" member
 + Merge branch 'ds/omit-trailing-hash-in-index' into ab/cache-api-cleanup-users
 + Merge branch 'ab/cache-api-cleanup' into ab/cache-api-cleanup-users
 (this branch uses ab/cache-api-cleanup and ds/omit-trailing-hash-in-index.)

 Updates the users of the cache API.

 Will merge to 'master'.
 cf. <db312853-81a1-542b-db96-d816c463516c@github.com>
 source: <patch-1.1-b4998652822-20230117T135234Z-avarab@gmail.com>


* cb/checkout-same-branch-twice (2023-01-20) 1 commit
 - checkout/switch: disallow checking out same branch in multiple worktrees

 "git checkout -B $branch" failed to protect against checking out
 a branch that is checked out elsewhere, unlike "git branch -f" did.

 Expecting a (hopefully final) reroll.
 cf. <8f24fc3c-c30f-dc70-5a94-5ee4ed3de102@dunelm.org.uk>
 source: <20230120113553.24655-1-carenas@gmail.com>


* sa/cat-file-mailmap--batch-check (2023-01-18) 1 commit
  (merged to 'next' on 2023-01-18 at 25ecb1dd3a)
 + git-cat-file.txt: fix list continuations rendering literally

 Docfix.

 Will merge to 'master'.
 source: <20230118082749.1252459-1-martin.agren@gmail.com>


* pb/branch-advice-recurse-submodules (2023-01-18) 1 commit
  (merged to 'next' on 2023-01-19 at 13747fc72d)
 + branch: improve advice when --recurse-submodules fails

 Improve advice message given when "git branch --resurse-submodules"
 fails.

 Will merge to 'master'.
 source: <pull.1464.git.1673890908453.gitgitgadget@gmail.com>


* ab/sequencer-unleak (2023-01-18) 8 commits
 - commit.c: free() revs.commit in get_fork_point()
 - builtin/rebase.c: free() "options.strategy_opts"
 - sequencer.c: always free() the "msgbuf" in do_pick_commit()
 - builtin/rebase.c: fix "options.onto_name" leak
 - builtin/revert.c: move free-ing of "revs" to replay_opts_release()
 - rebase & sequencer API: fix get_replay_opts() leak in "rebase"
 - sequencer.c: split up sequencer_remove_state()
 - rebase: use "cleanup" pattern in do_interactive_rebase()

 Plug leaks in sequencer subsystem and its users.

 Needs review.
 source: <cover-v3-0.8-00000000000-20230118T160600Z-avarab@gmail.com>


* en/rebase-update-refs-needs-merge-backend (2023-01-22) 9 commits
 - rebase: provide better error message for apply options vs. merge config
 - rebase: put rebase_options initialization in single place
 - rebase: fix formatting of rebase --reapply-cherry-picks option in docs
 - rebase: clarify the OPT_CMDMODE incompatibilities
 - rebase: add coverage of other incompatible options
 - rebase: fix docs about incompatibilities with --root
 - rebase: remove --allow-empty-message from incompatible opts
 - rebase: flag --apply and --merge as incompatible
 - rebase: mark --update-refs as requiring the merge backend

 The "--update-refs" feature of "git rebase" requires the use of the
 merge backend, while "--whitespace=fix" feature does not work with
 the said backend.  Notice the combination and error out, instead of
 silently ignoring one of the features requested.

 Needs review.
 source: <pull.1466.v4.git.1674367961.gitgitgadget@gmail.com>


* jk/hash-object-literally-fd-leak (2023-01-19) 1 commit
  (merged to 'next' on 2023-01-19 at fff9b60a36)
 + hash-object: fix descriptor leak with --literally

 Leakfix.

 Will merge to 'master'.
 source: <Y8ijpJqtkDTi792i@coredump.intra.peff.net>


* jc/doc-branch-update-checked-out-branch (2023-01-18) 1 commit
  (merged to 'next' on 2023-01-19 at 970900a232)
 + branch: document `-f` and linked worktree behaviour

 Document that "branch -f <branch>" disables only the safety to
 avoid recreating an existing branch.

 Will merge to 'master'.
 source: <xmqqa62f2dj1.fsf_-_@gitster.g>


* jc/doc-checkout-b (2023-01-19) 1 commit
 - checkout: document -b/-B to highlight the differences from "git branch"

 Clarify how "checkout -b/-B" and "git branch [-f]" are similar but
 different in the documentation.

 Will merge to 'next'.
 source: <xmqqtu0m1m9i.fsf@gitster.g>


* cw/fetch-remote-group-with-duplication (2023-01-19) 1 commit
  (merged to 'next' on 2023-01-20 at 7f00e43209)
 + fetch: fix duplicate remote parallel fetch bug

 "git fetch <group>", when "<group>" of remotes lists the same
 remote twice, unnecessarily failed when parallel fetching was
 enabled, which has been corrected.

 Will merge to 'master'.
 source: <20230119220538.1522464-1-calvinwan@google.com>


* po/pretty-format-columns-doc (2023-01-19) 5 commits
 - doc: pretty-formats note wide char limitations, and add tests
 - doc: pretty-formats describe use of ellipsis in truncation
 - doc: pretty-formats document negative column alignments
 - doc: pretty-formats: delineate `%<|(` parameter values
 - doc: pretty-formats: separate parameters from placeholders

 Clarify column-padding operators in the pretty format string.

 Will merge to 'next'.
 source: <20230119181827.1319-1-philipoakley@iee.email>


* jk/hash-object-fsck (2023-01-19) 7 commits
 - fsck: do not assume NUL-termination of buffers
 - hash-object: use fsck for object checks
 - fsck: provide a function to fsck buffer without object struct
 - t: use hash-object --literally when created malformed objects
 - t7030: stop using invalid tag name
 - t1006: stop using 0-padded timestamps
 - t1007: modernize malformed object tests

 "git hash-object" now checks that the resulting object is well
 formed with the same code as "git fsck".

 Will merge to 'next'.
 source: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>
 source: <Y8ifa7hyqxSbL92U@coredump.intra.peff.net>


* rs/tree-parse-mode-overflow-check (2023-01-21) 1 commit
 - tree-walk: disallow overflowing modes

 Reject tree objects with entries whose mode bits are overly wide.

 Will merge to 'next'.
 source: <d673fde7-7eb2-6306-86b6-1c1a4c988ee8@web.de>


* tb/t0003-invoke-dd-more-portably (2023-01-22) 1 commit
 - t0003: call dd with portable blocksize

 Test portability fix.

 Will merge to 'next'.
 source: <20230122062839.14542-1-tboegi@web.de>


* ar/markup-em-en-dash (2023-01-22) 1 commit
 - Documentation: render dash correctly

 Doc mark-up fix.

 Will merge to 'next'.
 source: <20230122165628.1601062-1-rybak.a.v@gmail.com>


* jc/attr-doc-fix (2023-01-22) 1 commit
 - attr: fix instructions on how to check attrs

 Comment fix.

 Will merge to 'next'.
 source: <pull.1441.git.git.1674356774172.gitgitgadget@gmail.com>


* rj/avoid-switching-to-already-used-branch (2023-01-22) 3 commits
 - switch: reject if the branch is already checked out elsewhere (test)
 - rebase: refuse to switch to a branch already checked out elsewhere (test)
 - branch: fix die_if_checked_out() when ignore_current_worktree

 A few subcommands have been taught to stop users from working on a
 branch that is being used in another worktree linked to the same
 repository.

 Expecting a (hopefully final) reroll.
 cf. <d61a2393-64c8-da49-fe13-00bc4a52d5e3@gmail.com>
 source: <f7f45f54-9261-45ea-3399-8ba8dee6832b@gmail.com>


* rj/bisect-already-used-branch (2023-01-22) 1 commit
 - bisect: fix "reset" when branch is checked out elsewhere

 Allow "git bisect reset [name]" to check out the named branch (or
 the original one) even when the branch is already checked out in a
 different worktree linked to the same repository.

 Leaning negative. Why is it a good thing?
 cf. <xmqqo7qqovp1.fsf@gitster.g>
 source: <1c36c334-9f10-3859-c92f-3d889e226769@gmail.com>

--------------------------------------------------
[Stalled]

* tl/notes--blankline (2022-11-09) 5 commits
 - notes.c: introduce "--no-blank-line" option
 - notes.c: provide tips when target and append note are both empty
 - notes.c: drop unreachable code in 'append_edit()'
 - notes.c: cleanup for "designated init" and "char ptr init"
 - notes.c: cleanup 'strbuf_grow' call in 'append_edit'

 'git notes append' was taught '--[no-]blank-line' to conditionally
 add a LF between a new and existing note.

 Expecting a reroll.
 cf. <CAPig+cRcezSp4Rqt1Y9bD-FT6+7b0g9qHfbGRx65AOnw2FQXKg@mail.gmail.com>
 source: <cover.1667980450.git.dyroneteng@gmail.com>


* mc/switch-advice (2022-11-09) 1 commit
 - po: use `switch` over `checkout` in error message

 Use 'switch' instead of 'checkout' in an error message.

 Waiting for review response.
 source: <pull.1308.git.git.1668018620148.gitgitgadget@gmail.com>


* js/range-diff-mbox (2022-11-23) 1 commit
 - range-diff: support reading mbox files

 'git range-diff' gained support for reading either side from an .mbox
 file instead of a revision range.

 Waiting for review response.
 cf. <xmqqr0xupmnf.fsf@gitster.g>
 source: <pull.1420.v3.git.1669108102092.gitgitgadget@gmail.com>


* ab/tag-object-type-errors (2022-11-22) 5 commits
 - tag: don't emit potentially incorrect "object is a X, not a Y"
 - tag: don't misreport type of tagged objects in errors
 - object tests: add test for unexpected objects in tags
 - object-file.c: free the "t.tag" in check_tag()
 - Merge branch 'jk/parse-object-type-mismatch' into ab/tag-object-type-errors

 Hardening checks around mismatched object types when one of those
 objects is a tag.

 Expecting a reroll.
 cf. <xmqqzgb5jz5c.fsf@gitster.g>
 cf. <xmqqsfgxjugi.fsf@gitster.g>
 source: <cover-0.4-00000000000-20221118T113442Z-avarab@gmail.com>


* ab/config-multi-and-nonbool (2022-11-27) 9 commits
 - for-each-repo: with bad config, don't conflate <path> and <cmd>
 - config API: add "string" version of *_value_multi(), fix segfaults
 - config API users: test for *_get_value_multi() segfaults
 - for-each-repo: error on bad --config
 - config API: have *_multi() return an "int" and take a "dest"
 - versioncmp.c: refactor config reading next commit
 - config tests: add "NULL" tests for *_get_value_multi()
 - config tests: cover blind spots in git_die_config() tests
 - for-each-repo tests: test bad --config keys

 Assorted config API updates.

 Expecting a reroll.
 source: <cover-v3-0.9-00000000000-20221125T093158Z-avarab@gmail.com>


* ed/fsmonitor-inotify (2022-12-13) 6 commits
 - fsmonitor: update doc for Linux
 - fsmonitor: test updates
 - fsmonitor: enable fsmonitor for Linux
 - fsmonitor: implement filesystem change listener for Linux
 - fsmonitor: determine if filesystem is local or remote
 - fsmonitor: prepare to share code between Mac OS and Linux

 Bundled fsmonitor for Linux using inotify API.

 Needs review on the updated round.
 source: <pull.1352.v5.git.git.1670882286.gitgitgadget@gmail.com>


* jc/spell-id-in-both-caps-in-message-id (2022-12-17) 1 commit
 - e-mail workflow: Message-ID is spelled with ID in both capital letters

 Consistently spell "Message-ID" as such, not "Message-Id".

 Needs review.
 source: <xmqqsfhgnmqg.fsf@gitster.g>


* cb/grep-fallback-failing-jit (2022-12-17) 1 commit
 - grep: fall back to interpreter mode if JIT fails

 In an environment where dynamically generated code is prohibited to
 run (e.g. SELinux), failure to JIT pcre patterns is expected.  Fall
 back to interpreted execution in such a case.

 Expecting a reroll.
 cf. <f680b274-fa85-6624-096a-7753a2671c15@grsecurity.net>
 source: <20221216121557.30714-1-minipli@grsecurity.net>


* ad/test-record-count-when-harness-is-in-use (2022-12-25) 1 commit
 - test-lib: allow storing counts with test harnesses

 Allow summary results from tests to be written to t/test-results
 directory even when a test harness like 'prove' is in use.

 Needs review.
 source: <20221224225200.1027806-1-adam@dinwoodie.org>


* so/diff-merges-more (2022-12-18) 5 commits
 - diff-merges: improve --diff-merges documentation
 - diff-merges: issue warning on lone '-m' option
 - diff-merges: support list of values for --diff-merges
 - diff-merges: implement log.diffMerges-m-imply-p config
 - diff-merges: implement [no-]hide option and log.diffMergesHide config

 Assorted updates to "--diff-merges=X" option.

 May want to discard.  Breaking compatibility does not seem worth it.
 source: <20221217132955.108542-1-sorganov@gmail.com>

--------------------------------------------------
[Cooking]

* rs/ls-tree-path-expansion-fix (2023-01-14) 2 commits
  (merged to 'next' on 2023-01-16 at 6359f28ba7)
 + ls-tree: remove dead store and strbuf for quote_c_style()
 + ls-tree: fix expansion of repeated %(path)

 "git ls-tree --format='%(path) %(path)' $tree $path" showed the
 path three times, which has been corrected.

 Will merge to 'master'.
 source: <55ae7333-3a13-0575-93ed-f858a1c2877e@web.de>


* jc/format-patch-v-unleak (2023-01-16) 1 commit
  (merged to 'next' on 2023-01-16 at 2155d512bc)
 + format-patch: unleak "-v <num>"

 Plug a small leak.

 Will merge to 'master'.
 source: <xmqqv8l8gr6s.fsf@gitster.g>


* ds/omit-trailing-hash-in-index (2023-01-17) 1 commit
  (merged to 'next' on 2023-01-17 at 8dde3cf2db)
 + t1600: fix racy index.skipHash test
 (this branch is used by ab/cache-api-cleanup-users.)

 Quickfix for a topic already in 'master'.

 Will merge to 'master'.
 source: <76204710-356a-2a85-9057-302e6619b9df@github.com>


* ab/cache-api-cleanup (2023-01-16) 5 commits
  (merged to 'next' on 2023-01-16 at a0f388b149)
 + cache API: add a "INDEX_STATE_INIT" macro/function, add release_index()
 + read-cache.c: refactor set_new_index_sparsity() for subsequent commit
 + sparse-index API: BUG() out on NULL ensure_full_index()
 + sparse-index.c: expand_to_path() can assume non-NULL "istate"
 + builtin/difftool.c: { 0 }-initialize rather than using memset()
 (this branch is used by ab/cache-api-cleanup-users.)

 Code clean-up to tighten the use of in-core index in the API.

 Will merge to 'master'.
 source: <cover-v2-0.6-00000000000-20230112T124842Z-avarab@gmail.com>


* ab/test-env-helper (2023-01-14) 1 commit
  (merged to 'next' on 2023-01-16 at 82c17f02e5)
 + env-helper: move this built-in to "test-tool env-helper"

 Remove "git env--helper" and demote it to a test-tool subcommand.

 Will merge to 'master'.
 source: <patch-1.1-e662c570f1d-20230112T155226Z-avarab@gmail.com>


* en/ls-files-doc-update (2023-01-13) 4 commits
 - ls-files: guide folks to --exclude-standard over other --exclude* options
 - ls-files: clarify descriptions of status tags for -t
 - ls-files: clarify descriptions of file selection options
 - ls-files: add missing documentation for --resolve-undo option

 Doc update to ls-files.

 Will merge to 'next'?
 source: <pull.1463.git.1673584914.gitgitgadget@gmail.com>


* en/t6426-todo-cleanup (2023-01-14) 1 commit
  (merged to 'next' on 2023-01-16 at 7d13842eeb)
 + t6426: fix TODO about making test more comprehensive

 Test clean-up.

 Will merge to 'master'.
 source: <pull.1462.v2.git.1673722187025.gitgitgadget@gmail.com>


* zh/scalar-progress (2023-01-16) 1 commit
  (merged to 'next' on 2023-01-17 at d4c25cc71f)
 + scalar: show progress if stderr refers to a terminal

 "scalar" learned to give progress bar.

 Will merge to 'master'.
 source: <pull.1441.v3.git.1673442860379.gitgitgadget@gmail.com>


* ms/send-email-feed-header-to-validate-hook (2023-01-19) 2 commits
 - send-email: expose header information to git-send-email's sendemail-validate hook
 - send-email: refactor header generation functions

 "git send-email" learned to give the e-mail headers to the validate
 hook by passing an extra argument from the command line.

 Expecting a (hopefully final) reroll.
 cf. <c1ba0a28-3c39-b313-2757-dceb02930334@amd.com>
 source: <20230120012459.920932-1-michael.strawbridge@amd.com>


* ds/bundle-uri-5 (2023-01-07) 8 commits
 - bundle-uri: store fetch.bundleCreationToken
 - fetch: fetch from an external bundle URI
 - bundle-uri: drop bundle.flag from design doc
 - clone: set fetch.bundleURI if appropriate
 - bundle-uri: download in creationToken order
 - bundle-uri: parse bundle.<id>.creationToken values
 - bundle-uri: parse bundle.heuristic=creationToken
 - t5558: add tests for creationToken heuristic

 The bundle-URI subsystem adds support for creation-token heuristics
 to help incremental fetches.

 Expecting a reroll.
 cf. <4e2ac966-0f45-8018-ff8f-3831ea0c3c2e@github.com>
 source: <pull.1454.git.1673037405.gitgitgadget@gmail.com>


* tc/cat-file-z-use-cquote (2023-01-16) 1 commit
 - cat-file: quote-format name in error when using -z

 "cat-file" in the batch mode that is fed NUL-terminated pathnames
 learned to cquote them in its error output (otherwise, a funny
 pathname with LF in it would break the lines in the output stream).

 Expecting a reroll.
 cf. <2a2a46f0-a9bc-06a6-72e1-28800518777c@dunelm.org.uk>
 source: <20230116190749.4141516-1-toon@iotcl.com>


* cb/grep-pcre-ucp (2023-01-18) 1 commit
  (merged to 'next' on 2023-01-19 at 2c7e531839)
 + grep: correctly identify utf-8 characters with \{b,w} in -P

 "grep -P" learned to use Unicode Character Property to grok
 character classes when processing \b and \w etc.

 Will merge to 'master'.
 cf. <xmqqzgaf2zpt.fsf@gitster.g>
 source: <20230108155217.2817-1-carenas@gmail.com>


* rs/use-enhanced-bre-on-macos (2023-01-08) 1 commit
  (merged to 'next' on 2023-01-16 at 9b80d4253f)
 + use enhanced basic regular expressions on macOS

 Newer regex library macOS stopped enabling GNU-like enhanced BRE,
 where '\(A\|B\)' works as alternation, unless explicitly asked with
 the REG_ENHANCED flag.  "git grep" now can be compiled to do so, to
 retain the old behaviour.

 Will merge to 'master'.
 source: <26a0d4ca-3d97-ace4-1a1f-92b1ee6715a6@web.de>


* cw/submodule-status-in-parallel (2023-01-17) 6 commits
 - submodule: call parallel code from serial status
 - diff-lib: parallelize run_diff_files for submodules
 - diff-lib: refactor match_stat_with_submodule
 - submodule: move status parsing into function
 - submodule: strbuf variable rename
 - run-command: add duplicate_output_fn to run_processes_parallel_opts

 "git submodule status" learned to run the comparison in submodule
 repositories in parallel.

 Needs review.
 source: <20230104215415.1083526-1-calvinwan@google.com>


* kn/attr-from-tree (2023-01-14) 2 commits
  (merged to 'next' on 2023-01-16 at 426f357683)
 + attr: add flag `--source` to work with tree-ish
 + t0003: move setup for `--all` into new block

 "git check-attr" learned to take an optional tree-ish to read the
 .gitattributes file from.

 Will merge to 'master'.
 source: <cover.1673684790.git.karthik.188@gmail.com>


* ab/various-leak-fixes (2023-01-18) 19 commits
 - push: free_refs() the "local_refs" in set_refspecs()
 - receive-pack: free() the "ref_name" in "struct command"
 - grep API: plug memory leaks by freeing "header_list"
 - grep.c: refactor free_grep_patterns()
 - object-file.c: release the "tag" in check_tag()
 - builtin/merge.c: free "&buf" on "Your local changes..." error
 - builtin/merge.c: use fixed strings, not "strbuf", fix leak
 - show-branch: free() allocated "head" before return
 - commit-graph: fix a parse_options_concat() leak
 - http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}()
 - http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main()
 - worktree: fix a trivial leak in prune_worktrees()
 - repack: fix leaks on error with "goto cleanup"
 - name-rev: don't xstrdup() an already dup'd string
 - various: add missing clear_pathspec(), fix leaks
 - clone: use free() instead of UNLEAK()
 - commit-graph: use free_commit_graph() instead of UNLEAK()
 - bundle.c: don't leak the "args" in the "struct child_process"
 - tests: mark tests as passing with SANITIZE=leak

 Leak fixes.

 Needs review.
 source: <cover-v5-00.19-00000000000-20230118T120334Z-avarab@gmail.com>


* rj/branch-unborn-in-other-worktrees (2023-01-19) 3 commits
 - branch: rename orphan branches in any worktree
 - branch: description for orphan branch errors
 - avoid unnecessary worktrees traversing

 Error messages given when working on an unborn branch that is
 checked out in another worktree have been improvved.

 Expecting a reroll.
 cf. <527f7315-be7b-7ec0-04fc-d07da7d4fefa@gmail.com>
 source: <34a58449-4f2e-66ef-ea01-119186aebd23@gmail.com>


* ab/avoid-losing-exit-codes-in-tests (2022-12-20) 6 commits
 - tests: don't lose misc "git" exit codes
 - tests: don't lose "git" exit codes in "! ( git ... | grep )"
 - tests: don't lose exit status with "test <op> $(git ...)"
 - tests: don't lose exit status with "(cd ...; test <op> $(git ...))"
 - t/lib-patch-mode.sh: fix ignored exit codes
 - auto-crlf tests: don't lose exit code in loops and outside tests

 Test clean-up.

 Expecting a hopefully minor and final reroll.
 cf. <1182283a-4a78-3c99-e716-a8c3e58a5823@web.de>
 cf. <xmqqsfhb0vum.fsf@gitster.g>
 source: <cover-v4-0.6-00000000000-20221219T101240Z-avarab@gmail.com>


* sk/win32-close-handle-upon-pthread-join (2023-01-04) 2 commits
  (merged to 'next' on 2023-01-16 at faa279fd5b)
 + win32: close handles of threads that have been joined
 + win32: prepare pthread.c for change by formatting

 Pthread emulation on Win32 leaked thread handle when a thread is
 joined.

 Will merge to 'master'.
 source: <pull.1406.v13.git.git.1672762819.gitgitgadget@gmail.com>


* km/send-email-with-v-reroll-count (2022-11-27) 1 commit
  (merged to 'next' on 2023-01-19 at 9b3543471c)
 + send-email: relay '-v N' to format-patch

 "git send-email -v 3" used to be expanded to "git send-email
 --validate 3" when the user meant to pass them down to
 "format-patch", which has been corrected.

 Will merge to 'master'.
 source: <87edtp5uws.fsf@kyleam.com>


* ja/worktree-orphan (2023-01-13) 4 commits
 - worktree add: add hint to direct users towards --orphan
 - worktree add: add --orphan flag
 - worktree add: refactor opt exclusion tests
 - worktree add: include -B in usage docs

 'git worktree add' learned how to create a worktree based on an
 orphaned branch with `--orphan`.

 Expecting a reroll.
 cf. <11be1b0e-ee38-119f-1d80-cb818946116b@dunelm.org.uk>
 source: <20230109173227.29264-1-jacobabel@nullpo.dev>


* cc/filtered-repack (2022-12-25) 3 commits
 - gc: add gc.repackFilter config option
 - repack: add --filter=<filter-spec> option
 - pack-objects: allow --filter without --stdout

 "git repack" learns to discard objects that ought to be retrievable
 again from the promisor remote.

 May want to discard.  Its jaggy edges may be a bit too sharp.
 cf. <Y7WTv19aqiFCU8au@ncase>
 source: <20221221040446.2860985-1-christian.couder@gmail.com>


* mc/credential-helper-auth-headers (2023-01-20) 12 commits
 - credential: add WWW-Authenticate header to cred requests
 - http: read HTTP WWW-Authenticate response headers
 - http: replace unsafe size_t multiplication with st_mult
 - test-http-server: add sending of arbitrary headers
 - test-http-server: add simple authentication
 - test-http-server: pass Git requests to http-backend
 - test-http-server: add HTTP request parsing
 - test-http-server: add HTTP error response function
 - test-http-server: add stub HTTP server test helper
 - daemon: rename some esoteric/laboured terminology
 - daemon: libify child process handling functions
 - daemon: libify socket setup and option functions

 Extending credential helper protocol.

 Needs review.
 source: <pull.1352.v7.git.1674252530.gitgitgadget@gmail.com>

--------------------------------------------------
[Discarded]

* jc/ci-deprecated-declarations-are-not-fatal (2023-01-14) 1 commit
  (merged to 'next' on 2023-01-14 at 5efb778ab0)
 + ci: do not die on deprecated-declarations warning

 CI build fix for overzealous -Werror.

 Reverted out of 'next'
 Preferring jk/curl-avoid-deprecated-api that fixes the code properly.
 source: <xmqq7cxpkpjp.fsf@gitster.g>


* po/pretty-hard-trunc (2022-11-13) 1 commit
 . pretty-formats: add hard truncation, without ellipsis, options

 Add a new pretty format which truncates without ellipsis.
 source: <20221112143616.1429-1-philipoakley@iee.email>

^ permalink raw reply

* Re: [PATCH] Documentation: render dash correctly
From: Martin Ågren @ 2023-01-23  8:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andrei Rybak, git
In-Reply-To: <xmqqcz76qizp.fsf@gitster.g>

On Sun, 22 Jan 2023 at 23:55, Junio C Hamano <gitster@pobox.com> wrote:
>
> Andrei Rybak <rybak.a.v@gmail.com> writes:
>
> > Three hyphens are rendered verbatim in documentation, so "--" has to be
> > used to produce a dash.
>
> Sad but true.  I suspect folks with TeX background were so
> accustomed to type three dashes to obtain em dash, but with AsciiDoc
> (and asciidoctor), sadly, two dashes is a way to ask for em dash.
>
> The changes in your patch look all reasonable to me at the source
> level; I didn't do Documentation/doc-diff to verify, though.

doc-diff looks good.

I suspect these were identified by greping for " --- ", spaces included.
We seem to have some "---" that aren't surrounded by spaces. They're
perhaps a bit more tedious to find, but I see there are two in
gitformat-signature.txt and technical/rerere.txt. Maybe it would be
worthwhile addressing them too in this patch.

Martin

^ permalink raw reply

* Re: [PATCH] tree-walk: disallow overflowing modes
From: Ævar Arnfjörð Bjarmason @ 2023-01-23  8:33 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Jeff King
In-Reply-To: <d673fde7-7eb2-6306-86b6-1c1a4c988ee8@web.de>


On Sat, Jan 21 2023, René Scharfe wrote:

> When parsing tree entries, reject mode values that don't fit into an
> unsigned int.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  tree-walk.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tree-walk.c b/tree-walk.c
> index 74f4d710e8..5e7bc38600 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -17,6 +17,8 @@ static const char *get_mode(const char *str, unsigned int *modep)
>  	while ((c = *str++) != ' ') {
>  		if (c < '0' || c > '7')
>  			return NULL;
> +		if ((mode << 3) >> 3 != mode)
> +			return NULL;
>  		mode = (mode << 3) + (c - '0');
>  	}
>  	*modep = mode;

There was a discussion about this on git-security last August, in a
report that turned out to be uninteresting for the security aspects.

I'll just quote my own reply here out of context
(<220811.86mtcbqd5x.gmgdl@evledraar.gmail.com> for those with access).

For context the other issue in the "two issues" below is the one I'm
working towards fixing in
https://lore.kernel.org/git/cover-0.4-00000000000-20221118T113442Z-avarab@gmail.com/,
the other is this file mode overflow.

As noted at the end below I'm conflicted between whether we should "fix"
this in some way, or just intentionally leave it alone, because while
it's entirely accidental, this is the one part of git's object format
I'm aware of that we could sneak in some extension in the future,
without entirely breaking backwards compatibility.

BEGIN QUOTE

There's really two issues being raised here, how we validate the "mode"
in tree entries, and how we sometimes misreport object type X as type Y.

I replied on-list just now noting that the "mode" thing is something I
was working towards fixing a while ago, but am happy to see a more
isolated fix for:
https://lore.kernel.org/git/220811.86r11nqfi2.gmgdl@evledraar.gmail.com/

I have local patches for the misreporting of object types, it's not
*that* hard to get right. Basically we just need to be more careful
about how we populate the "struct object". The most common case is when
we parse tags that say on their envelope that we refer to a type X, but
it's really a type Y.

In that case we haven't .parsed=1 the object, but we note the wrong type
down. My local patches are basically just deferring that, so we don't
trust such type claims, and take our *actual* parsing of the object over
a previous reference in the object store.

I don't think that leaves any difficult edge cases related to tree
parsing, as Xavier notes we'd have modes claiming that an X is Y, but we
can resolve it using the same principle.

But regarding the "mode parsing" I think
https://lore.kernel.org/git/YvQdR3sDqDMCIjIE@coredump.intra.peff.net/ is
taking us in the wrong direction, but wasn't comfortable saying so on
list, because this is "exploitable" in the following way:

	$ perl -wE 'say unpack "B*", "hello world"'
	0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100

Which, combined with Jeff's series on-list you can try e.g.:
	
	diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
	index ac4099ca893..d435dfd64c5 100755
	--- a/t/t5504-fetch-receive-strict.sh
	+++ b/t/t5504-fetch-receive-strict.sh
	@@ -357,16 +357,16 @@ test_expect_success 'badFilemode is not a strict error' '
	 	tree=$(
	 		cd badmode.git &&
	 		blob=$(echo blob | git hash-object -w --stdin | hex2oct) &&
	-		printf "123456 foo\0${blob}" |
	+		printf "106640110100001100101011011000110110001101111001000000111011101101111011100100110110001100100664 foo\0${blob}" |
	 		git hash-object -t tree --stdin -w --literally
	 	) &&
	 
	 	rm -rf dst.git &&
	 	git init --bare dst.git &&
	 	git -C dst.git config transfer.fsckObjects true &&
	-
	-	git -C badmode.git push ../dst.git $tree:refs/tags/tree 2>err &&
	-	grep "$tree: badFilemode" err
	+	git -C badmode.git push ../dst.git $tree:refs/tags/tree &&
	+	git -C badmode.git fsck &&
	+	git -C dst.git fsck
	 '
	 
	 test_done
	diff --git a/tree-walk.c b/tree-walk.c
	index 74f4d710e8f..2fb9f2e6cbe 100644
	--- a/tree-walk.c
	+++ b/tree-walk.c
	@@ -15,6 +15,7 @@ static const char *get_mode(const char *str, unsigned int *modep)
	 		return NULL;
	 
	 	while ((c = *str++) != ' ') {
	+		fprintf(stderr, "%c\n", c);
	 		if (c < '0' || c > '7')
	 			return NULL;
	 		mode = (mode << 3) + (c - '0');

Which gets you all tests passing, i.e. the particulars of the mode check
are such that we'll accept a very long mode (but if you play with it
you'll find it's not infinite, we'll run into other buffers elsewhere
pretty soon).

This is all assuming you're on a system whose "unsigned int" is 64 bit.

This part of it is also something I discovered in the beginning of 2021
(and there's some old off-list thread between myself & Elijah on the
topic I could dig up), but didn't report here at the time.

So, we could just "fix" it, and re the "wrong direction" above
downgrading the "bad mode" check to a mere warning is going a bit too
far given the above. We have some odd modes in the wild, but we don't
have such "overflowing modes".

On the other hand this edge case is also a golden opportunity we're not
likely to ever have again. We can't really change the git object format
at this point without *major* headaches, but in this case we have the
ability to encode arbitrary data into tree entries (e.g file metadata)
as long as the writer makes sure they overflow back to the valid
filemode :)

As hacky as that is I think it would be regrettable to forever close the
door on that to fix a hypothetical security bug, hypothetical because no
version of git.git has an issue with it. But it is a *potential* edge
case in overflowing any other tree parsing code out there in the wild
(which might be otherwise guarded by a stricter fsck check of ours).

END QUOTE

^ permalink raw reply

* [PATCH v2] Documentation: render dash correctly
From: Andrei Rybak @ 2023-01-23  9:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren
In-Reply-To: <20230122165628.1601062-1-rybak.a.v@gmail.com>

Three hyphens are rendered verbatim in documentation, so "--" has to be
used to produce a dash.  Fix asciidoc output for dashes.  This is
similar to previous commits f0b922473e (Documentation: render special
characters correctly, 2021-07-29) and de82095a95 (doc
hash-function-transition: fix asciidoc output, 2021-02-05).

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
---

On 2023-01-23T09:19, Martin Ågren wrote:
> I suspect these were identified by greping for " --- ", spaces included.

Indeed.

> We seem to have some "---" that aren't surrounded by spaces. They're
> perhaps a bit more tedious to find,

Not really:

    git grep -P -i '[^-][a-z0-9]+---[a-z0-9]+[^-]' -- Documentation/

The "[^-]" part is needed to exclude many examples of commit graphs.

The following:

    The horizontal line of history A---Q is taken to be the first parent of each
    merge.

also comes up in 'Documentation/rev-list-options.txt', but it might be better to
leave "A---Q" as is to be similar to the commit graph example above it (there
are commits between A and Q in the graph, but still).

> but I see there are two in
> gitformat-signature.txt and technical/rerere.txt. Maybe it would be
> worthwhile addressing them too in this patch.
> 
> Martin

Thank you for review, here's v2.

Changes from v1:

 - Added two fixes in gitformat-signature.txt and technical/rerere.txt, as
   suggested by Martin Ågren.

 Documentation/git-apply.txt                          | 2 +-
 Documentation/git-read-tree.txt                      | 2 +-
 Documentation/git.txt                                | 2 +-
 Documentation/gitformat-signature.txt                | 2 +-
 Documentation/technical/hash-function-transition.txt | 2 +-
 Documentation/technical/rerere.txt                   | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
index 1d478cbe9b..5e16e6db7e 100644
--- a/Documentation/git-apply.txt
+++ b/Documentation/git-apply.txt
@@ -208,7 +208,7 @@ behavior:
 * `warn` outputs warnings for a few such errors, but applies the
   patch as-is (default).
 * `fix` outputs warnings for a few such errors, and applies the
-  patch after fixing them (`strip` is a synonym --- the tool
+  patch after fixing them (`strip` is a synonym -- the tool
   used to consider only trailing whitespace characters as errors, and the
   fix involved 'stripping' them, but modern Gits do more).
 * `error` outputs warnings for a few such errors, and refuses
diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 7567955bad..b09707474d 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -219,7 +219,7 @@ see which of the "local changes" that you made were carried forward by running
 `git diff-index --cached $M`.  Note that this does not
 necessarily match what `git diff-index --cached $H` would have
 produced before such a two tree merge.  This is because of cases
-18 and 19 --- if you already had the changes in $M (e.g. maybe
+18 and 19 -- if you already had the changes in $M (e.g. maybe
 you picked it up via e-mail in a patch form), `git diff-index
 --cached $H` would have told you about the change before this
 merge, but it would not show in `git diff-index --cached $M`
diff --git a/Documentation/git.txt b/Documentation/git.txt
index f9a7a4554c..74973d3cc4 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -613,7 +613,7 @@ The file parameters can point at the user's working file
 (e.g. `new-file` in "git-diff-files"), `/dev/null` (e.g. `old-file`
 when a new file is added), or a temporary file (e.g. `old-file` in the
 index).  `GIT_EXTERNAL_DIFF` should not worry about unlinking the
-temporary file --- it is removed when `GIT_EXTERNAL_DIFF` exits.
+temporary file -- it is removed when `GIT_EXTERNAL_DIFF` exits.
 +
 For a path that is unmerged, `GIT_EXTERNAL_DIFF` is called with 1
 parameter, <path>.
diff --git a/Documentation/gitformat-signature.txt b/Documentation/gitformat-signature.txt
index a249869faf..d8e3eb1bac 100644
--- a/Documentation/gitformat-signature.txt
+++ b/Documentation/gitformat-signature.txt
@@ -37,7 +37,7 @@ line.
 This is even true for an originally empty line.  In the following
 examples, the end of line that ends with a whitespace letter is
 highlighted with a `$` sign; if you are trying to recreate these
-example by hand, do not cut and paste them---they are there
+example by hand, do not cut and paste them--they are there
 primarily to highlight extra whitespace at the end of some lines.
 
 The signed payload and the way the signature is embedded depends
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index e2ac36dd21..ed57481089 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -562,7 +562,7 @@ hash re-encode during clone and to encourage peers to modernize.
 The design described here allows fetches by SHA-1 clients of a
 personal SHA-256 repository because it's not much more difficult than
 allowing pushes from that repository. This support needs to be guarded
-by a configuration option --- servers like git.kernel.org that serve a
+by a configuration option -- servers like git.kernel.org that serve a
 large number of clients would not be expected to bear that cost.
 
 Meaning of signatures
diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index 35d4541433..be58f1bee3 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -99,7 +99,7 @@ conflict to leave line D means that the user declares:
     compatible with what AB and AC wanted to do.
 
 So the conflict we would see when merging AB into ACAB should be
-resolved the same way---it is the resolution that is in line with that
+resolved the same way--it is the resolution that is in line with that
 declaration.
 
 Imagine that similarly previously a branch XYXZ was forked from XY,
-- 
2.39.0


^ permalink raw reply related

* Re: [PATCH] ssh signing: better error message when key not in agent
From: Phillip Wood @ 2023-01-23  9:33 UTC (permalink / raw)
  To: Fabian Stelzer; +Cc: Adam Szkoda, Adam Szkoda via GitGitGadget, git
In-Reply-To: <20230120090331.37dxkko6bgxbjae7@fs>

On 20/01/2023 09:03, Fabian Stelzer wrote:
> On 18.01.2023 16:29, Phillip Wood wrote:
>> Hi Adam
>>
>> I've cc'd Fabian who knows more about the ssh signing code that I do.
>>
>> On 18/01/2023 15:28, Adam Szkoda wrote:
>>> Hi Phillip,
>>>
>>> Good point!  My first thought is to try doing a stat() syscall on the
>>> path from 'user.signingKey' to see if it exists and if not, treat it
>>> as a public key (and pass the -U option).  If that sounds reasonable,
>>> I can update the patch.
>>
>> My reading of the documentation is that user.signingKey may point to a 
>> public or private key so I'm not sure how stat()ing would help. 
>> Looking at the code in sign_buffer_ssh() we have a function 
>> is_literal_ssh_key() that checks if the config value is a public key. 
>> When the user passes the path to a key we could read the file check 
>> use is_literal_ssh_key() to check if it is a public key (or possibly 
>> just check if the file begins with "ssh-"). Fabian - does that sound 
>> reasonable?
> 
> Hi,
> I have encountered the mentioned problem before as well and tried to fix 
> it but did not find a good / reasonable way to do so. Git just passes 
> the user.signingKey to ssh-keygen which states:
> `The key used for signing is specified using the -f option and may refer 
> to either a private key, or a public key with the private half available 
> via ssh-agent(1)`
> 
> I don't think it's a good idea for git to parse the key and try to 
> determine if it's public or private. The fix should probably be in 
> openssh (different error message) but when looking into it last time i 
> remember that the logic for using the key is quite deeply embedded into 
> the ssh code and not easily adjusted for the signing use case. At the 
> moment I don't have the time to look into it but the openssh code for 
> signing is quite readable so feel free to give it a try. Maybe you find 
> a good way.

Thanks Fabian, perhaps the easiest way forward is for us to only pass 
"-U" when we have a literal key in user.signingKey as we know it must a 
be public key in that case.

Best Wishes

Phillip

> Best regards,
> Fabian
> 
>>
>> Best Wishes
>>
>> Phillip
>>
>>> Best
>>> — Adam
>>>
>>>
>>> On Wed, Jan 18, 2023 at 3:34 PM Phillip Wood 
>>> <phillip.wood123@gmail.com> wrote:
>>>>
>>>> On 18/01/2023 11:10, Phillip Wood wrote:
>>>>>> the agent [1].  A fix is scheduled to be released in OpenSSH 9.1. All
>>>>>> that
>>>>>> needs to be done is to pass an additional backward-compatible option
>>>>>> -U to
>>>>>> 'ssh-keygen -Y sign' call.  With '-U', ssh-keygen always interprets
>>>>>> the file
>>>>>> as public key and expects to find the private key in the agent.
>>>>>
>>>>> The documentation for user.signingKey says
>>>>>
>>>>>   If gpg.format is set to ssh this can contain the path to either your
>>>>> private ssh key or the public key when ssh-agent is used.
>>>>>
>>>>> If I've understood correctly passing -U will prevent users from 
>>>>> setting
>>>>> this to a private key.
>>>>
>>>> If there is an easy way to tell if the user has given us a public key
>>>> then we could pass "-U" in that case.
>>>>
>>>> Best Wishes
>>>>
>>>> Phillip

^ permalink raw reply

* Re: [PATCH] ssh signing: better error message when key not in agent
From: Fabian Stelzer @ 2023-01-23 10:02 UTC (permalink / raw)
  To: phillip.wood; +Cc: Adam Szkoda, Adam Szkoda via GitGitGadget, git
In-Reply-To: <6e57bef8-7387-3341-5ed5-4bcfa7ded7a0@dunelm.org.uk>

On 23.01.2023 09:33, Phillip Wood wrote:
>On 20/01/2023 09:03, Fabian Stelzer wrote:
>>On 18.01.2023 16:29, Phillip Wood wrote:
>>>Hi Adam
>>>
>>>I've cc'd Fabian who knows more about the ssh signing code that I do.
>>>
>>>On 18/01/2023 15:28, Adam Szkoda wrote:
>>>>Hi Phillip,
>>>>
>>>>Good point!  My first thought is to try doing a stat() syscall on the
>>>>path from 'user.signingKey' to see if it exists and if not, treat it
>>>>as a public key (and pass the -U option).  If that sounds reasonable,
>>>>I can update the patch.
>>>
>>>My reading of the documentation is that user.signingKey may point 
>>>to a public or private key so I'm not sure how stat()ing would 
>>>help. Looking at the code in sign_buffer_ssh() we have a function 
>>>is_literal_ssh_key() that checks if the config value is a public 
>>>key. When the user passes the path to a key we could read the file 
>>>check use is_literal_ssh_key() to check if it is a public key (or 
>>>possibly just check if the file begins with "ssh-"). Fabian - does 
>>>that sound reasonable?
>>
>>Hi,
>>I have encountered the mentioned problem before as well and tried to 
>>fix it but did not find a good / reasonable way to do so. Git just 
>>passes the user.signingKey to ssh-keygen which states:
>>`The key used for signing is specified using the -f option and may 
>>refer to either a private key, or a public key with the private half 
>>available via ssh-agent(1)`
>>
>>I don't think it's a good idea for git to parse the key and try to 
>>determine if it's public or private. The fix should probably be in 
>>openssh (different error message) but when looking into it last time 
>>i remember that the logic for using the key is quite deeply embedded 
>>into the ssh code and not easily adjusted for the signing use case. 
>>At the moment I don't have the time to look into it but the openssh 
>>code for signing is quite readable so feel free to give it a try. 
>>Maybe you find a good way.
>
>Thanks Fabian, perhaps the easiest way forward is for us to only pass 
>"-U" when we have a literal key in user.signingKey as we know it must 
>a be public key in that case.

Yes, i think that's a good idea as long as the `-U` flag is ignored in older 
ssh versions and shouldn't be too hard to implement. And it should work just 
as well when using `defaultKeyCommand`.

Best,
Fabian

>
>Best Wishes
>
>Phillip
>
>>Best regards,
>>Fabian
>>
>>>
>>>Best Wishes
>>>
>>>Phillip
>>>
>>>>Best
>>>>— Adam
>>>>
>>>>
>>>>On Wed, Jan 18, 2023 at 3:34 PM Phillip Wood 
>>>><phillip.wood123@gmail.com> wrote:
>>>>>
>>>>>On 18/01/2023 11:10, Phillip Wood wrote:
>>>>>>>the agent [1].  A fix is scheduled to be released in OpenSSH 9.1. All
>>>>>>>that
>>>>>>>needs to be done is to pass an additional backward-compatible option
>>>>>>>-U to
>>>>>>>'ssh-keygen -Y sign' call.  With '-U', ssh-keygen always interprets
>>>>>>>the file
>>>>>>>as public key and expects to find the private key in the agent.
>>>>>>
>>>>>>The documentation for user.signingKey says
>>>>>>
>>>>>>  If gpg.format is set to ssh this can contain the path to either your
>>>>>>private ssh key or the public key when ssh-agent is used.
>>>>>>
>>>>>>If I've understood correctly passing -U will prevent users 
>>>>>>from setting
>>>>>>this to a private key.
>>>>>
>>>>>If there is an easy way to tell if the user has given us a public key
>>>>>then we could pass "-U" in that case.
>>>>>
>>>>>Best Wishes
>>>>>
>>>>>Phillip

^ permalink raw reply

* Re: [PATCH v2] Documentation: render dash correctly
From: Martin Ågren @ 2023-01-23 11:04 UTC (permalink / raw)
  To: Andrei Rybak; +Cc: git, Junio C Hamano
In-Reply-To: <20230123090114.429844-1-rybak.a.v@gmail.com>

On Mon, 23 Jan 2023 at 10:01, Andrei Rybak <rybak.a.v@gmail.com> wrote:

>  highlighted with a `$` sign; if you are trying to recreate these
> -example by hand, do not cut and paste them---they are there
> +example by hand, do not cut and paste them--they are there
>  primarily to highlight extra whitespace at the end of some lines.

OK, so this is one of the new ones compared to v1. I can see the
argument for adding some spaces around the "--" for consistency and to
make this a bit easier to read in the resulting manpage (which can of
course be very subjective), but then I can also see that kind of change
being left out as orthogonal to this patch.

This v2 patch looks good to me.

Martin

^ permalink raw reply

* [PATCH v2] ls-tree: add --sparse-filter-oid argument
From: William Sprent via GitGitGadget @ 2023-01-23 11:46 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Victoria Dye, Elijah Newren, William Sprent,
	William Sprent
In-Reply-To: <pull.1459.git.1673456518993.gitgitgadget@gmail.com>

From: William Sprent <williams@unity3d.com>

There is currently no way to ask git the question "which files would be
part of a sparse checkout of commit X with sparse checkout patterns Y".
One use-case would be that tooling may want know whether sparse checkouts
of two commits contain the same content even if the full trees differ.
Another interesting use-case would be for tooling to use in conjunction
with 'git update-index --index-info'.

'rev-list --objects --filter=sparse:oid' comes close, but as rev-list is
concerned with objects rather than directory trees, it leaves files out
when the same blob occurs in at two different paths.

It is possible to ask git about the sparse status of files currently in
the index with 'ls-files -t'. However, this does not work well when the
caller is interested in another commit, intererested in sparsity
patterns that aren't currently in '.git/info/sparse-checkout', or when
working in with bare repo.

To fill this gap, add a new argument to ls-tree '--sparse-filter-oid'
which takes the object id of a blob containing sparse checkout patterns
that filters the output of 'ls-tree'. When filtering with given sparsity
patterns, 'ls-tree' only outputs blobs and commit objects that
match the given patterns.

While it may be valid in some situations to output a tree object -- e.g.
when a cone pattern matches all blobs recursively contained in a tree --
it is less unclear what should be output if a sparse pattern matches
parts of a tree.

To allow for reusing the pattern matching logic found in
'path_in_sparse_checkout_1()' in 'dir.c' with arbitrary patterns,
extract the pattern matching part of the function into its own new
function 'recursively_match_path_with_sparse_patterns()'.

Signed-off-by: William Sprent <williams@unity3d.com>
---
    ls-tree: add --sparse-filter-oid argument
    
    I'm resubmitting this change as rebased on top of 'master', as it
    conflicted with the topic 'ls-tree.c: clean-up works' 1
    [https://public-inbox.org/git/20230112091135.20050-1-tenglong.tl@alibaba-inc.com],
    which was merged to 'master' recently.
    
    This versions also incorporates changes based on the comments made in 2
    [https://public-inbox.org/git/CAPig+cRgZ0CrkqY7mufuWmhf6BC8yXjXXuOTEQjuz+Y0NA+N7Q@mail.gmail.com/].
    
    I'm also looping in contributors that have touched ls-tree and/or
    sparse-checkouts recently. I hope that's okay.
    
    William

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1459%2Fwilliams-unity%2Fls-tree-sparse-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1459/williams-unity/ls-tree-sparse-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1459

Contributor requested no range-diff. You can review it using these commands:
   git fetch https://github.com/gitgitgadget/git a38d39a4 07a4e24e
   git range-diff <options> a38d39a4..8891b110 56c8fb1e..07a4e24e

 Documentation/git-ls-tree.txt      |  6 ++
 builtin/ls-tree.c                  | 79 +++++++++++++++++++++++-
 dir.c                              | 45 ++++++++------
 dir.h                              |  5 ++
 t/t3106-ls-tree-sparse-checkout.sh | 99 ++++++++++++++++++++++++++++++
 5 files changed, 213 insertions(+), 21 deletions(-)
 create mode 100755 t/t3106-ls-tree-sparse-checkout.sh

diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index 0240adb8eec..724bb1f9dbd 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
 	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]] [--format=<format>]
+	    [--sparse-filter-oid=<blob-ish>]
 	    <tree-ish> [<path>...]
 
 DESCRIPTION
@@ -48,6 +49,11 @@ OPTIONS
 	Show tree entries even when going to recurse them. Has no effect
 	if `-r` was not passed. `-d` implies `-t`.
 
+--sparse-filter-oid=<blob-ish>::
+	Omit showing tree objects and paths that do not match the
+	sparse-checkout specification contained in the blob
+	(or blob-expression) <blob-ish>.
+
 -l::
 --long::
 	Show object size of blob (file) entries.
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 72eb70823d5..46a815fc7dc 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -13,6 +13,7 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "pathspec.h"
+#include "dir.h"
 
 static const char * const ls_tree_usage[] = {
 	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
@@ -364,6 +365,65 @@ static struct ls_tree_cmdmode_to_fmt ls_tree_cmdmode_format[] = {
 	},
 };
 
+struct sparse_filter_data {
+	read_tree_fn_t fn;
+	struct pattern_list pl;
+	struct strbuf full_path_buf;
+	struct ls_tree_options *options;
+};
+
+static void init_sparse_filter_data(struct sparse_filter_data **d, struct ls_tree_options *options,
+	const char *sparse_oid_name, read_tree_fn_t fn)
+{
+	struct object_id sparse_oid;
+	struct object_context oc;
+
+	(*d) = xcalloc(1, sizeof(**d));
+	(*d)->fn = fn;
+	(*d)->pl.use_cone_patterns = core_sparse_checkout_cone;
+	(*d)->options = options;
+	strbuf_init(&(*d)->full_path_buf, 0);
+
+	if (get_oid_with_context(the_repository, sparse_oid_name, GET_OID_BLOB,
+			&sparse_oid, &oc))
+		die(_("unable to access sparse blob in '%s'"), sparse_oid_name);
+	if (add_patterns_from_blob_to_list(&sparse_oid, "", 0, &(*d)->pl) < 0)
+		die(_("unable to parse sparse filter data in %s"),
+		    oid_to_hex(&sparse_oid));
+}
+
+static void free_sparse_filter_data(struct sparse_filter_data *d)
+{
+	clear_pattern_list(&d->pl);
+	strbuf_release(&d->full_path_buf);
+	free(d);
+}
+
+static int path_matches_sparse_checkout_patterns(struct strbuf *full_path, struct pattern_list *pl, int dtype)
+{
+	enum pattern_match_result match = recursively_match_path_with_sparse_patterns(full_path->buf, the_repository->index, dtype, pl);
+	return match > 0;
+}
+
+static int filter_sparse(const struct object_id *oid, struct strbuf *base,
+			 const char *pathname, unsigned mode, void *context)
+{
+	struct sparse_filter_data *data = context;
+
+	int do_recurse = show_recursive(data->options, base->buf, base->len, pathname);
+	if (object_type(mode) == OBJ_TREE)
+		return do_recurse;
+
+	strbuf_reset(&data->full_path_buf);
+	strbuf_addbuf(&data->full_path_buf, base);
+	strbuf_addstr(&data->full_path_buf, pathname);
+
+	if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG))
+			return 0;
+
+	return data->fn(oid, base, pathname, mode, data->options);
+}
+
 int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 {
 	struct object_id oid;
@@ -372,7 +432,11 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	read_tree_fn_t fn = NULL;
 	enum ls_tree_cmdmode cmdmode = MODE_DEFAULT;
 	int null_termination = 0;
+
 	struct ls_tree_options options = { 0 };
+	char *sparse_oid_name = NULL;
+	void *read_tree_context = NULL;
+	struct sparse_filter_data *sparse_filter_data = NULL;
 	const struct option ls_tree_options[] = {
 		OPT_BIT('d', NULL, &options.ls_options, N_("only show trees"),
 			LS_TREE_ONLY),
@@ -399,6 +463,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 					 N_("format to use for the output"),
 					 PARSE_OPT_NONEG),
 		OPT__ABBREV(&options.abbrev),
+		OPT_STRING_F(0, "filter-sparse-oid", &sparse_oid_name, N_("filter-sparse-oid"),
+			   N_("filter output with sparse-checkout specification contained in the blob"),
+			     PARSE_OPT_NONEG),
 		OPT_END()
 	};
 	struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format;
@@ -474,7 +541,17 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 		break;
 	}
 
-	ret = !!read_tree(the_repository, tree, &options.pathspec, fn, &options);
+	if (sparse_oid_name) {
+		init_sparse_filter_data(&sparse_filter_data, &options, sparse_oid_name, fn);
+		read_tree_context = sparse_filter_data;
+		fn = filter_sparse;
+	} else
+		read_tree_context = &options;
+
+	ret = !!read_tree(the_repository, tree, &options.pathspec, fn, read_tree_context);
 	clear_pathspec(&options.pathspec);
+	if (sparse_filter_data)
+		free_sparse_filter_data(sparse_filter_data);
+
 	return ret;
 }
diff --git a/dir.c b/dir.c
index 4e99f0c868f..122ebced08e 100644
--- a/dir.c
+++ b/dir.c
@@ -1457,45 +1457,50 @@ int init_sparse_checkout_patterns(struct index_state *istate)
 	return 0;
 }
 
-static int path_in_sparse_checkout_1(const char *path,
-				     struct index_state *istate,
-				     int require_cone_mode)
+int recursively_match_path_with_sparse_patterns(const char *path,
+						struct index_state *istate,
+						int dtype,
+						struct pattern_list *pl)
 {
-	int dtype = DT_REG;
 	enum pattern_match_result match = UNDECIDED;
 	const char *end, *slash;
-
-	/*
-	 * We default to accepting a path if the path is empty, there are no
-	 * patterns, or the patterns are of the wrong type.
-	 */
-	if (!*path ||
-	    init_sparse_checkout_patterns(istate) ||
-	    (require_cone_mode &&
-	     !istate->sparse_checkout_patterns->use_cone_patterns))
-		return 1;
-
 	/*
 	 * If UNDECIDED, use the match from the parent dir (recursively), or
 	 * fall back to NOT_MATCHED at the topmost level. Note that cone mode
 	 * never returns UNDECIDED, so we will execute only one iteration in
 	 * this case.
 	 */
-	for (end = path + strlen(path);
-	     end > path && match == UNDECIDED;
+	for (end = path + strlen(path); end > path && match == UNDECIDED;
 	     end = slash) {
-
 		for (slash = end - 1; slash > path && *slash != '/'; slash--)
 			; /* do nothing */
 
 		match = path_matches_pattern_list(path, end - path,
 				slash > path ? slash + 1 : path, &dtype,
-				istate->sparse_checkout_patterns, istate);
+				pl, istate);
 
 		/* We are going to match the parent dir now */
 		dtype = DT_DIR;
 	}
-	return match > 0;
+
+	return match;
+}
+
+static int path_in_sparse_checkout_1(const char *path,
+				     struct index_state *istate,
+				     int require_cone_mode)
+{
+	/*
+	 * We default to accepting a path if the path is empty, there are no
+	 * patterns, or the patterns are of the wrong type.
+	 */
+	if (!*path ||
+	    init_sparse_checkout_patterns(istate) ||
+	    (require_cone_mode &&
+	     !istate->sparse_checkout_patterns->use_cone_patterns))
+		return 1;
+
+	return recursively_match_path_with_sparse_patterns(path, istate, DT_REG, istate->sparse_checkout_patterns) > 0;
 }
 
 int path_in_sparse_checkout(const char *path,
diff --git a/dir.h b/dir.h
index 8acfc044181..d186d271606 100644
--- a/dir.h
+++ b/dir.h
@@ -403,6 +403,11 @@ int path_in_sparse_checkout(const char *path,
 int path_in_cone_mode_sparse_checkout(const char *path,
 				      struct index_state *istate);
 
+int recursively_match_path_with_sparse_patterns(const char *path,
+						struct index_state *istate,
+						int dtype,
+						struct pattern_list *pl);
+
 struct dir_entry *dir_add_ignored(struct dir_struct *dir,
 				  struct index_state *istate,
 				  const char *pathname, int len);
diff --git a/t/t3106-ls-tree-sparse-checkout.sh b/t/t3106-ls-tree-sparse-checkout.sh
new file mode 100755
index 00000000000..945f3ed1888
--- /dev/null
+++ b/t/t3106-ls-tree-sparse-checkout.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='ls-tree with sparse filter patterns'
+
+. ./test-lib.sh
+
+check_agrees_with_ls_files () {
+	REPO=repo  &&
+	git -C $REPO submodule deinit -f --all &&
+	git -C $REPO cat-file -p $filter_oid >"$REPO/.git/info/sparse-checkout" &&
+	git -C $REPO sparse-checkout init --cone &&
+	git -C $REPO submodule init &&
+	git -C $REPO ls-files -t >out &&
+	sed -n "/^S /!s/^. //p" out >expect &&
+	test_cmp expect actual
+}
+
+check_same_result_in_bare_repo () {
+	FILTER_OID=$1 &&
+	git -C repo cat-file -p $FILTER_OID >out &&
+	git -C bare hash-object -w --stdin <out &&
+	git -C bare ls-tree --name-only --filter-sparse-oid=$FILTER_OID -r HEAD >bare-result &&
+	test_cmp expect bare-result
+}
+
+test_expect_success 'setup' '
+	git init submodule &&
+	test_commit -C submodule file &&
+	git init repo &&
+	(
+		cd repo &&
+		mkdir dir &&
+		test_commit dir/sub-file &&
+		test_commit dir/sub-file2 &&
+		mkdir dir2 &&
+		test_commit dir2/sub-file1 &&
+		test_commit dir2/sub-file2 &&
+		test_commit top-file &&
+		git clone ../submodule submodule &&
+		git submodule add ./submodule &&
+		git submodule absorbgitdirs &&
+		git commit -m"add submodule" &&
+		git sparse-checkout init --cone
+	) &&
+	git clone --bare ./repo bare
+'
+
+test_expect_success 'toplevel filter only shows toplevel file' '
+	filter_oid=$(git -C repo hash-object -w --stdin <<-\EOF
+	/*
+	!/*/
+	EOF
+	) &&
+	cat >expect <<-EOF &&
+	.gitmodules
+	submodule
+	top-file.t
+	EOF
+	git -C repo ls-tree --name-only --filter-sparse-oid=$filter_oid -r HEAD >actual &&
+	test_cmp expect actual &&
+	check_agrees_with_ls_files &&
+	check_same_result_in_bare_repo $filter_oid
+'
+
+test_expect_success 'non cone single file filter' '
+	filter_oid=$(git -C repo hash-object -w --stdin <<-\EOF
+	/dir/sub-file.t
+	EOF
+	) &&
+	cat >expect <<-EOF &&
+	dir/sub-file.t
+	EOF
+	git -C repo ls-tree --name-only --filter-sparse-oid=$filter_oid -r HEAD >actual &&
+	test_cmp expect actual &&
+	check_agrees_with_ls_files &&
+	check_same_result_in_bare_repo $filter_oid
+'
+
+test_expect_success 'cone filter matching one dir' '
+	filter_oid=$(git -C repo hash-object -w --stdin <<-\EOF
+	/*
+	!/*/
+	/dir/
+	EOF
+	) &&
+	cat >expect <<-EOF &&
+	.gitmodules
+	dir/sub-file.t
+	dir/sub-file2.t
+	submodule
+	top-file.t
+	EOF
+	git -C repo ls-tree --name-only --filter-sparse-oid=$filter_oid -r HEAD >actual &&
+	test_cmp expect actual &&
+	check_agrees_with_ls_files &&
+	check_same_result_in_bare_repo $filter_oid
+'
+
+test_done

base-commit: 56c8fb1e95377900ec9d53c07886022af0a5d3c2
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH v2] ls-tree: add --sparse-filter-oid argument
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 13:00 UTC (permalink / raw)
  To: William Sprent via GitGitGadget
  Cc: git, Eric Sunshine, Derrick Stolee, Victoria Dye, Elijah Newren,
	William Sprent
In-Reply-To: <pull.1459.v2.git.1674474371817.gitgitgadget@gmail.com>


On Mon, Jan 23 2023, William Sprent via GitGitGadget wrote:

> From: William Sprent <williams@unity3d.com>
>
> There is currently no way to ask git the question "which files would be
> part of a sparse checkout of commit X with sparse checkout patterns Y".
> One use-case would be that tooling may want know whether sparse checkouts
> of two commits contain the same content even if the full trees differ.
> Another interesting use-case would be for tooling to use in conjunction
> with 'git update-index --index-info'.

Rather than commenting on individual points, I checked this out and
produced something squashable on-top, it fixes various issues (some
aspects of which still remain):

 * You need to wrap your code at 79 chars (and some of that could be
   helped by picking less verbose identifiers & variable names,
   e.g. just use "context" rather than "read_tree_context" in
   cmd_ls_tree(), it has no other "context"...)
 * You're making the memory management aroind init_sparse_filter_data()
   needlessly hard, just put it on the stack instead. That also allows
   for init-ing most of it right away, or via a macro in the assignment.
 * You have a stray refactoring of dir.c in your proposed change, this
   changes it back, let's leave that sort of thing out.
 * You're adding a function that returns an enum, but you declare it as
   returning "int", let's retain that type in the header & declaration.


diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 46a815fc7dc..68d6ef675f2 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -372,36 +372,37 @@ struct sparse_filter_data {
 	struct ls_tree_options *options;
 };
 
-static void init_sparse_filter_data(struct sparse_filter_data **d, struct ls_tree_options *options,
-	const char *sparse_oid_name, read_tree_fn_t fn)
+static void init_sparse_filter_data(struct sparse_filter_data *d,
+				    const char *sparse_oid_name)
 {
 	struct object_id sparse_oid;
 	struct object_context oc;
 
-	(*d) = xcalloc(1, sizeof(**d));
-	(*d)->fn = fn;
-	(*d)->pl.use_cone_patterns = core_sparse_checkout_cone;
-	(*d)->options = options;
-	strbuf_init(&(*d)->full_path_buf, 0);
-
 	if (get_oid_with_context(the_repository, sparse_oid_name, GET_OID_BLOB,
 			&sparse_oid, &oc))
-		die(_("unable to access sparse blob in '%s'"), sparse_oid_name);
-	if (add_patterns_from_blob_to_list(&sparse_oid, "", 0, &(*d)->pl) < 0)
+		die(_("unable to access sparse blob in '%s'"),
+		    sparse_oid_name);
+	if (add_patterns_from_blob_to_list(&sparse_oid, "", 0, &d->pl) < 0)
 		die(_("unable to parse sparse filter data in %s"),
 		    oid_to_hex(&sparse_oid));
 }
 
-static void free_sparse_filter_data(struct sparse_filter_data *d)
+static void release_sparse_filter_data(struct sparse_filter_data *d)
 {
 	clear_pattern_list(&d->pl);
 	strbuf_release(&d->full_path_buf);
-	free(d);
 }
 
-static int path_matches_sparse_checkout_patterns(struct strbuf *full_path, struct pattern_list *pl, int dtype)
+static int path_matches_sparse_checkout_patterns(struct strbuf *full_path,
+						 struct pattern_list *pl,
+						 int dtype)
 {
-	enum pattern_match_result match = recursively_match_path_with_sparse_patterns(full_path->buf, the_repository->index, dtype, pl);
+	enum pattern_match_result match;
+
+	match = recursively_match_path_with_sparse_patterns(full_path->buf,
+							    the_repository->index,
+							    dtype, pl);
+
 	return match > 0;
 }
 
@@ -436,7 +437,10 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	struct ls_tree_options options = { 0 };
 	char *sparse_oid_name = NULL;
 	void *read_tree_context = NULL;
-	struct sparse_filter_data *sparse_filter_data = NULL;
+	struct sparse_filter_data sparse_filter_data = {
+		.options = &options,
+		.full_path_buf = STRBUF_INIT,
+	};
 	const struct option ls_tree_options[] = {
 		OPT_BIT('d', NULL, &options.ls_options, N_("only show trees"),
 			LS_TREE_ONLY),
@@ -542,16 +546,17 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	}
 
 	if (sparse_oid_name) {
-		init_sparse_filter_data(&sparse_filter_data, &options, sparse_oid_name, fn);
-		read_tree_context = sparse_filter_data;
+		sparse_filter_data.fn = fn;
+		init_sparse_filter_data(&sparse_filter_data, sparse_oid_name);
+		read_tree_context = &sparse_filter_data;
 		fn = filter_sparse;
 	} else
 		read_tree_context = &options;
 
-	ret = !!read_tree(the_repository, tree, &options.pathspec, fn, read_tree_context);
+	ret = !!read_tree(the_repository, tree, &options.pathspec, fn,
+			  read_tree_context);
 	clear_pathspec(&options.pathspec);
-	if (sparse_filter_data)
-		free_sparse_filter_data(sparse_filter_data);
+	release_sparse_filter_data(&sparse_filter_data);
 
 	return ret;
 }
diff --git a/dir.c b/dir.c
index 122ebced08e..57ec6404901 100644
--- a/dir.c
+++ b/dir.c
@@ -1457,10 +1457,10 @@ int init_sparse_checkout_patterns(struct index_state *istate)
 	return 0;
 }
 
-int recursively_match_path_with_sparse_patterns(const char *path,
-						struct index_state *istate,
-						int dtype,
-						struct pattern_list *pl)
+enum pattern_match_result recursively_match_path_with_sparse_patterns(const char *path,
+								      struct index_state *istate,
+								      int dtype,
+								      struct pattern_list *pl)
 {
 	enum pattern_match_result match = UNDECIDED;
 	const char *end, *slash;
@@ -1470,7 +1470,8 @@ int recursively_match_path_with_sparse_patterns(const char *path,
 	 * never returns UNDECIDED, so we will execute only one iteration in
 	 * this case.
 	 */
-	for (end = path + strlen(path); end > path && match == UNDECIDED;
+	for (end = path + strlen(path);
+	     end > path && match == UNDECIDED;
 	     end = slash) {
 		for (slash = end - 1; slash > path && *slash != '/'; slash--)
 			; /* do nothing */
diff --git a/dir.h b/dir.h
index d186d271606..3c65e68ca40 100644
--- a/dir.h
+++ b/dir.h
@@ -403,10 +403,10 @@ int path_in_sparse_checkout(const char *path,
 int path_in_cone_mode_sparse_checkout(const char *path,
 				      struct index_state *istate);
 
-int recursively_match_path_with_sparse_patterns(const char *path,
-						struct index_state *istate,
-						int dtype,
-						struct pattern_list *pl);
+enum pattern_match_result recursively_match_path_with_sparse_patterns(const char *path,
+								      struct index_state *istate,
+								      int dtype,
+								      struct pattern_list *pl);
 
 struct dir_entry *dir_add_ignored(struct dir_struct *dir,
 				  struct index_state *istate,

^ permalink raw reply related

* Re: [PATCH v2] ls-tree: add --sparse-filter-oid argument
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 13:06 UTC (permalink / raw)
  To: William Sprent via GitGitGadget
  Cc: git, Eric Sunshine, Derrick Stolee, Victoria Dye, Elijah Newren,
	William Sprent
In-Reply-To: <pull.1459.v2.git.1674474371817.gitgitgadget@gmail.com>


On Mon, Jan 23 2023, William Sprent via GitGitGadget wrote:

> From: William Sprent <williams@unity3d.com>

...some further inline commentary, in addition to my proposed squash-in...

> diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
> index 0240adb8eec..724bb1f9dbd 100644
> --- a/Documentation/git-ls-tree.txt
> +++ b/Documentation/git-ls-tree.txt
> @@ -11,6 +11,7 @@ SYNOPSIS
>  [verse]
>  'git ls-tree' [-d] [-r] [-t] [-l] [-z]
>  	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]] [--format=<format>]
> +	    [--sparse-filter-oid=<blob-ish>]
>  	    <tree-ish> [<path>...]
>  
>  DESCRIPTION
> @@ -48,6 +49,11 @@ OPTIONS
>  	Show tree entries even when going to recurse them. Has no effect
>  	if `-r` was not passed. `-d` implies `-t`.
>  
> +--sparse-filter-oid=<blob-ish>::
> +	Omit showing tree objects and paths that do not match the
> +	sparse-checkout specification contained in the blob
> +	(or blob-expression) <blob-ish>.
> +
>  -l::
>  --long::
>  	Show object size of blob (file) entries.
> diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
> index 72eb70823d5..46a815fc7dc 100644
> --- a/builtin/ls-tree.c
> +++ b/builtin/ls-tree.c
> @@ -13,6 +13,7 @@
>  #include "builtin.h"
>  #include "parse-options.h"
>  #include "pathspec.h"
> +#include "dir.h"
>  
>  static const char * const ls_tree_usage[] = {
>  	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
> @@ -364,6 +365,65 @@ static struct ls_tree_cmdmode_to_fmt ls_tree_cmdmode_format[] = {
>  	},
>  };

Okey, here yo uupdate the *.txt, but not the "-h", but it's one of those
where way say <options>.

I for one wouldn't mind some preceding change like my 423be1f83c5 (doc
txt & -h consistency: make "commit" consistent, 2022-10-13), which in
turn would make t/t0450-txt-doc-vs-help.sh pass for this command, but
maybe it's too much of a digression...

> +	(*d) = xcalloc(1, sizeof(**d));
> +	(*d)->fn = fn;
> +	(*d)->pl.use_cone_patterns = core_sparse_checkout_cone;

I forgot to note in my proposed fix-up that I omitted this, but your
tests still pass, which suggests it's either not needed, or your tests
are lacking....

> +	(*d)->options = options;
> +	strbuf_init(&(*d)->full_path_buf, 0);
> +
> +	if (get_oid_with_context(the_repository, sparse_oid_name, GET_OID_BLOB,
> +			&sparse_oid, &oc))
> +		die(_("unable to access sparse blob in '%s'"), sparse_oid_name);
> +	if (add_patterns_from_blob_to_list(&sparse_oid, "", 0, &(*d)->pl) < 0)

As noted you can avoid the malloc here, which also sidesteps this (IMO
at last) ugly &(*v)->m syntax.

> +		die(_("unable to parse sparse filter data in %s"),
> +		    oid_to_hex(&sparse_oid));
> +}
> +
> +static void free_sparse_filter_data(struct sparse_filter_data *d)
> +{
> +	clear_pattern_list(&d->pl);
> +	strbuf_release(&d->full_path_buf);
> +	free(d);
> +}
> +
> +static int path_matches_sparse_checkout_patterns(struct strbuf *full_path, struct pattern_list *pl, int dtype)
> +{
> +	enum pattern_match_result match = recursively_match_path_with_sparse_patterns(full_path->buf, the_repository->index, dtype, pl);

I for one would welcome e.g. abbreviating "sparse_checkout_patterns" as
"scp" or whatever throughout, with resulting shortening of very long
lines & identifiers. E.g. for this one "patch_match_scp" or whatever.


> +	struct sparse_filter_data *data = context;
> +

Style: Don't add a \n\n between variable decls,

> +	int do_recurse = show_recursive(data->options, base->buf, base->len, pathname);

Instead add it here, before the code proper.


> +	if (object_type(mode) == OBJ_TREE)
> +		return do_recurse;
> +
> +	strbuf_reset(&data->full_path_buf);

I wondered if we need this after, but we don't, I for one would welcome a fix-up like:
	
	diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
	index 68d6ef675f2..74dfa9a4580 100644
	--- a/builtin/ls-tree.c
	+++ b/builtin/ls-tree.c
	@@ -410,19 +410,21 @@ static int filter_sparse(const struct object_id *oid, struct strbuf *base,
	 			 const char *pathname, unsigned mode, void *context)
	 {
	 	struct sparse_filter_data *data = context;
	-
	 	int do_recurse = show_recursive(data->options, base->buf, base->len, pathname);
	+	int ret = 0;
	+
	 	if (object_type(mode) == OBJ_TREE)
	 		return do_recurse;
	 
	-	strbuf_reset(&data->full_path_buf);
	 	strbuf_addbuf(&data->full_path_buf, base);
	 	strbuf_addstr(&data->full_path_buf, pathname);
	 
	 	if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG))
	-			return 0;
	-
	-	return data->fn(oid, base, pathname, mode, data->options);
	+		goto cleanup;
	+	ret = data->fn(oid, base, pathname, mode, data->options);
	+cleanup:
	+	strbuf_reset(&data->full_path_buf);
	+	return ret;
	 }
	 
	 int cmd_ls_tree(int argc, const char **argv, const char *prefix)
	
It's slightly more verbose, and we do end up needlessly reset-ing the
"last" one, but makes it clear that we don't have need for this after
this.

Which to me, also raises the question of why you have this
"full_path_buf" at all. It looks like a memory optimization, as you're
trying to avoid a malloc()/free() in the loop.

That's fair, but you could also do that with:

	const size_t oldlen = base->len;
	strbuf_addstr(base, pathname);
        /* do the path_matches_sparse_checkout_patterns() call here */
	/* before "cleanup" */
	strbuf_setlen(base, oldlen);

Or whatever, those snippets are untested, but we use similar tricks
elsewhere, and as it's contained to a few lines here I think it's better
than carrying another buffer.

> +	strbuf_addbuf(&data->full_path_buf, base);
> +	strbuf_addstr(&data->full_path_buf, pathname);

Nit: Shorter as: strbuf_addf(&sb, "%s%s", base.buf, pathname) (or equivalent);

> +
> +	if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG))
> +			return 0;
> +
> +	return data->fn(oid, base, pathname, mode, data->options);
> +}
> +
>  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
>  {
>  	struct object_id oid;
> @@ -372,7 +432,11 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
>  	read_tree_fn_t fn = NULL;
>  	enum ls_tree_cmdmode cmdmode = MODE_DEFAULT;
>  	int null_termination = 0;
> +

Don't add this stray \n.

>  	struct ls_tree_options options = { 0 };
> +	char *sparse_oid_name = NULL;
> +	void *read_tree_context = NULL;
> +	struct sparse_filter_data *sparse_filter_data = NULL;
>  	const struct option ls_tree_options[] = {
>  		OPT_BIT('d', NULL, &options.ls_options, N_("only show trees"),
>  			LS_TREE_ONLY),
> @@ -399,6 +463,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
>  					 N_("format to use for the output"),
>  					 PARSE_OPT_NONEG),
>  		OPT__ABBREV(&options.abbrev),
> +		OPT_STRING_F(0, "filter-sparse-oid", &sparse_oid_name, N_("filter-sparse-oid"),

Make that N_(...) be "object-id", "oid", "hash" or something?

I.e. the RHS with the <type> should be <thingy>, not
<thingy-for-some-purpose>.

> +			   N_("filter output with sparse-checkout specification contained in the blob"),
> +			     PARSE_OPT_NONEG),
>  		OPT_END()
>  	};
>  	struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format;
> @@ -474,7 +541,17 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
>  		break;
>  	}
>  
> -	ret = !!read_tree(the_repository, tree, &options.pathspec, fn, &options);
> +	if (sparse_oid_name) {
> +		init_sparse_filter_data(&sparse_filter_data, &options, sparse_oid_name, fn);
> +		read_tree_context = sparse_filter_data;
> +		fn = filter_sparse;
> +	} else
> +		read_tree_context = &options;

You're missing a {} here for the "else".

Better yet, delete that "else" and change the decl above to be:

	void *context = &options;

Then just keep the "if" here, where you might clobber the "context".

> +
> +	ret = !!read_tree(the_repository, tree, &options.pathspec, fn, read_tree_context);
>  	clear_pathspec(&options.pathspec);
> +	if (sparse_filter_data)
> +		free_sparse_filter_data(sparse_filter_data);

I suggested a fixup for this, but as an aside don't make a free_*()
function that's not happy to accept NULL, it should work like the free()
itself.

> +
>  	return ret;
>  }
> diff --git a/dir.c b/dir.c
> index 4e99f0c868f..122ebced08e 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1457,45 +1457,50 @@ int init_sparse_checkout_patterns(struct index_state *istate)
>  	return 0;
>  }
>  
> -static int path_in_sparse_checkout_1(const char *path,
> -				     struct index_state *istate,
> -				     int require_cone_mode)
> +int recursively_match_path_with_sparse_patterns(const char *path,
> +						struct index_state *istate,
> +						int dtype,
> +						struct pattern_list *pl)
>  {
> -	int dtype = DT_REG;
>  	enum pattern_match_result match = UNDECIDED;
>  	const char *end, *slash;
> -
> -	/*
> -	 * We default to accepting a path if the path is empty, there are no
> -	 * patterns, or the patterns are of the wrong type.
> -	 */
> -	if (!*path ||
> -	    init_sparse_checkout_patterns(istate) ||
> -	    (require_cone_mode &&
> -	     !istate->sparse_checkout_patterns->use_cone_patterns))
> -		return 1;
> -
>  	/*
>  	 * If UNDECIDED, use the match from the parent dir (recursively), or
>  	 * fall back to NOT_MATCHED at the topmost level. Note that cone mode
>  	 * never returns UNDECIDED, so we will execute only one iteration in
>  	 * this case.
>  	 */
> -	for (end = path + strlen(path);
> -	     end > path && match == UNDECIDED;
> +	for (end = path + strlen(path); end > path && match == UNDECIDED;

All good, aside from this whitespace refactoring, as noted.

>  	     end = slash) {
> -
>  		for (slash = end - 1; slash > path && *slash != '/'; slash--)
>  			; /* do nothing */
>  
>  		match = path_matches_pattern_list(path, end - path,
>  				slash > path ? slash + 1 : path, &dtype,
> -				istate->sparse_checkout_patterns, istate);
> +				pl, istate);
>  
>  		/* We are going to match the parent dir now */
>  		dtype = DT_DIR;
>  	}
> -	return match > 0;
> +
> +	return match;
> +}
> +
> +static int path_in_sparse_checkout_1(const char *path,
> +				     struct index_state *istate,
> +				     int require_cone_mode)
> +{
> +	/*
> +	 * We default to accepting a path if the path is empty, there are no
> +	 * patterns, or the patterns are of the wrong type.
> +	 */
> +	if (!*path ||
> +	    init_sparse_checkout_patterns(istate) ||
> +	    (require_cone_mode &&
> +	     !istate->sparse_checkout_patterns->use_cone_patterns))
> +		return 1;
> +
> +	return recursively_match_path_with_sparse_patterns(path, istate, DT_REG, istate->sparse_checkout_patterns) > 0;

All good, except for the really long line, aside from the previous
suggestion, maybe pull "istate->sparse_checkout_patterns" into a
variable, as it's now used twice here in this function?

> +check_agrees_with_ls_files () {
> +	REPO=repo  &&
> +	git -C $REPO submodule deinit -f --all &&
> +	git -C $REPO cat-file -p $filter_oid >"$REPO/.git/info/sparse-checkout" &&
> +	git -C $REPO sparse-checkout init --cone &&
> +	git -C $REPO submodule init &&
> +	git -C $REPO ls-files -t >out &&
> +	sed -n "/^S /!s/^. //p" out >expect &&
> +	test_cmp expect actual

Instead of this last "sed" munging, why not use the "--format" option to
"ls-tree" to just make the output the same?

^ permalink raw reply

* Re: [PATCH v9 0/2] send-email: expose header information to git-send-email's sendemail-validate hook
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 13:51 UTC (permalink / raw)
  To: Michael Strawbridge; +Cc: git
In-Reply-To: <20230120012459.920932-1-michael.strawbridge@amd.com>


On Thu, Jan 19 2023, Michael Strawbridge wrote:

> Thanks to Ævar for an idea to simplify these patches further.
>
> Michael Strawbridge (2):
>   send-email: refactor header generation functions
>   send-email: expose header information to git-send-email's
>     sendemail-validate hook
>
>  Documentation/githooks.txt | 27 +++++++++--
>  git-send-email.perl        | 95 +++++++++++++++++++++++---------------
>  t/t9001-send-email.sh      | 27 ++++++++++-
>  3 files changed, 106 insertions(+), 43 deletions(-)

Thanks for the update. Aside from any quibbles, I still have some
fundimental concerns about the implementation here:

 * Other hooks take stdin, not this sort of file argument.

   We discussed that ending in
   https://public-inbox.org/git/20230117215811.78313-1-michael.strawbridge@amd.com/;
   but I probably shouldn't have mentioned "git hook" at all.

   I do think though that we shouldn't expose a UX discrepancy like this
   forever, but the ways forward out of that would seem to be to either
   to revert a7555304546 (send-email: use 'git hook run' for
   'sendemail-validate', 2021-12-22) & move forward from there, or to
   wait for those patches (which I'm currentnly CI-ing).

 * Aside from that, shouldn't we have a new "validate-headers" or
   whatever hook, instead of assuming that we can add another argument
   to existing users?...

 * ...except can we do it safely? Now, it seems to me like you have
   potential correctness issues here. We call format_2822_time() to make
   the headers, but that's based on "$time", which we save away earlier.

   But for the rest (e.g. "Message-Id" are we sure that we're giving the
   hook the same headers as the one we actually end up sending?

   But regardless of that, something that would bypass this entire
   stdin/potential correctness etc. problem is if we just pass an offset
   to the the, i.e. currently we have a "validate" which gets the
   contents, if we had a "validate-raw" or whatever we could just pass:

	<headers>
	\n\n
	<content>

   Where the current "validate" just gets "content", no? We could then
   either pass the offset to the "\n\n", or just trust that such a hook
   knows to find the "\n\n".

   I also think that would be more generally usable, as the tiny
   addition of some exit code interpretation would allow us to say "I
   got this, and consider this sent", which would also satisfy some who
   have wanted e.g. a way to intrecept it before it invokes "sendmail"
   (I remember a recent thread about that in relation to using "mutt" to
   send it directly)

   

^ permalink raw reply

* Re: [PATCH v4] sparse-checkout.txt: new document with sparse-checkout directions
From: ZheNing Hu @ 2023-01-23 15:05 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, git, Victoria Dye, Derrick Stolee,
	Shaoxuan Yuan, Matheus Tavares, Glen Choo, Martin von Zweigbergk
In-Reply-To: <CABPp-BF5npQrgoqE7cECJgsNpg7OWmVQrhEd=bm-KcpjsKhkvA@mail.gmail.com>

Elijah Newren <newren@gmail.com> 于2023年1月20日周五 12:30写道:
>
> On Sat, Jan 14, 2023 at 2:19 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Elijah Newren <newren@gmail.com> 于2022年11月16日周三 13:49写道:
> >
> > > [...]
> > > > > +    The fact that files can move between the 'tracked' and 'untracked'
> > > > > +    categories means some commands will have to treat untracked files
> > > > > +    differently.  But if we have to treat untracked files differently,
> > > > > +    then additional commands may also need changes:
> > > > > +
> > > > > +      * status
> > > > > +      * clean
> > > > > +
> > > >
> > > > I'm a bit worried about git status, because it's used in many shells
> > > > (e.g. zsh) i
> > > > in the git prompt function. Its default behavior is restricted, otherwise users
> > > > may get blocked when they use zsh to cd to that directory. I don't know how
> > > > to reproduce this problem (since the scenario is built on checkout to a local
> > > > unborn branch).
> > >
> > > Could you elaborate?  I'm not sure if you are talking about an
> > > existing problem that you are worried about being exacerbated, or a
> > > hypothetical problem that could occur with changes.  Further, your
> > > wording is so vague about the problem, that I have no idea what its
> > > nature is or whether any changes to status would even possibly have
> > > any bearing on it.  But the suggested changes to git status are
> > > simply:
> > >
> >
> > I find this special case, it will fetch some blobs when "git status".
> > First, we init a git repository, then set sparse specification to "*.js" with
> > no-cone mode, then use blob:none filter to fetch all commits and trees,
> > and finally checkout to the default branch.
> >
> > #!/bin/sh
> >
> > rm -rf sparse-checkout-example
> > git init sparse-checkout-example
> > git -C sparse-checkout-example remote add origin
> > git@github.com:derrickstolee/sparse-checkout-example.git
> > git -C sparse-checkout-example sparse-checkout set --no-cone *.js
> > git -C sparse-checkout-example fetch origin --filter=blob:none main
> > git -C sparse-checkout-example branch --track main origin/main
> > git -C sparse-checkout-example checkout main
> >
> > Then let's do a git status, which some zsh git plugin
> > will do when user "cd" the git repository.
> >
> > # git -C sparse-checkout-exmaple status
> >
> > remote: Enumerating objects: 1, done.
> > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 1
> > Receiving objects: 100% (1/1), 416 bytes | 416.00 KiB/s, done.
> > remote: Enumerating objects: 1, done.
> > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 1
> > Receiving objects: 100% (1/1), 160 bytes | 160.00 KiB/s, done.
> > remote: Enumerating objects: 1, done.
> > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 1
> > Receiving objects: 100% (1/1), 2.01 KiB | 2.01 MiB/s, done.
> > remote: Enumerating objects: 1, done.
> > remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 1
> > Receiving objects: 100% (1/1), 120 bytes | 120.00 KiB/s, done.
> > On branch main
> > Your branch is up to date with 'origin/main'.
> >
> > You are in a sparse checkout with 8% of tracked files present.
> >
> >
> > It took 17.00 seconds to enumerate untracked files. 'status -uno'
> > may speed it up, but you have to be careful not to forget to add
> > new files yourself (see 'git help status').
> > nothing to commit, working tree clean
> >
> > Yeah, here it fetches four blobs, and takes 17s!!!
> >
> > So what blobs are we fetching?
> >
> > GIT_TRACE_PACKET=1 git -C sparse-checkout-example status
> > ...
> > 18:02:32.989231 pkt-line.c:80           packet:        fetch> want
> > dff85a65c0ef4b50a4c01bdd4a247b974bc45f90
> > ...
> > 18:02:37.059203 pkt-line.c:80           packet:        fetch> want
> > f07ead02d13f62414589b1f1b891bb6a764ec91f
> > ...
> > 18:02:40.868899 pkt-line.c:80           packet:        fetch> want
> > 3c4efe206bd0e7230ad0ae8396a3c883c8207906
> > ...
> > 18:02:44.961809 pkt-line.c:80           packet:        fetch> want
> > 6590681af7e177dc71fe08648c4bbf4223b82866
> >
> >
> > Then let's we look what's the blob:
> > git log --find-object=dff85a65c0ef4b50a4c01bdd4a247b974bc45f90 --stat
> > commit 8ec229339caad56eb849c67361a9699004564177
> > Author: Derrick Stolee <dstolee@microsoft.com>
> > Date:   Mon Dec 30 13:30:27 2019 -0500
> >
> >     Add twbs/bootstrap
> >
> >  web/browser/.gitignore | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 47 insertions(+)
> >
> > git log --find-object=f07ead02d13f62414589b1f1b891bb6a764ec91f --stat
> > commit 9c5a31030de62355410a322923e33e90a00032f6
> > Author: Derrick Stolee <dstolee@microsoft.com>
> > Date:   Mon Dec 30 13:31:06 2019 -0500
> >
> >     Add artsy/artsy.github.io
> >
> >  web/editor/.gitignore | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > Yeah, it seems that git status fetch these .gitigore files.
> > So what's the wrong here?
>
> Nothing; this looks exactly as I'd expect.
>
> `git status` is supposed to check whether untracked files are ignored
> or not, and it needs .gitignore files to do so.  If an entire
> directory is missing, then it can avoid loading a .gitignore under
> that directory, but you set up your patterns such that no directory is
> likely to be excluded.
>

This may be a bit strange, why doesn't the cone mode have this problem?

> Maybe there are other special cases that one could attempt to handle
> (e.g. first check if there are any untracked files in a directory and
> only then check for which are ignored, but that'd probably take some
> big refactoring of some hairy dir.c code to accomplish something like
> that, and you'd have to be really careful to not regress the
> performance of non-sparse cases).  Personally, I don't think it's
> worth it.  If you really don't like it, I'd suggest choosing any one
> of the following ways to avoid it:
>

This "bug" may not be very important, because it has no application
at present,  scalar also uses cone mode by default.

>   * Include the .gitignore files in your sparse-checkout; might as
> well since you're going to get them anyway.

This can also seem like a pain in the butt because of the extra
unnecessary .gitigore downloads.

>   * Set status.showUntrackedFiles to `no` so that .gitignore files are
> irrelevant

This may also be a temporary rather than a complete solution.

>   * Use cone mode instead of non-cone mode (and yes, restructure your
> repo if needed)

No, I might think cone mode has some other disadvantages: it includes
too many files. I would prefer to get the behavior of non-cone mode.

>   * Remove the .gitignore files and commit their deletion.  Just do
> without them, somehow.

That's not a proper solution absolutely.

Anyway, thanks for the answer!
--
ZheNing Hu

^ permalink raw reply

* Re: [PATCH v2 1/2] docs: fix sparse-checkout docs link
From: ZheNing Hu @ 2023-01-23 15:15 UTC (permalink / raw)
  To: Elijah Newren
  Cc: ZheNing Hu via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee, Victoria Dye,
	Nguyễn Thái Ngọc Duy
In-Reply-To: <CABPp-BEC4PmQfYT=UhtbJ5QcXXMqwF1e-KvSVVDNjDy69bGH5w@mail.gmail.com>

Elijah Newren <newren@gmail.com> 于2023年1月20日周五 13:12写道:
>
> >  This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've
> >  been working on it.
> >
> > -0. Behavior A is not well supported in Git.  (Behavior B didn't used to
> > -   be either, but was the easier of the two to implement.)
> > +Behavior A is not well supported in Git.  (Behavior B didn't used to
> > +be either, but was the easier of the two to implement.)
>
> Why did you remove the numbering on this, though?  If asciidoc doesn't
> like starting with 0 (the only guess I can think of for why you'd
> change this), shouldn't the series be renumbered starting at 1 rather
> than removing this from the list?
>

I see, I admitted that I treated "0." as some overview information and it
should also be considered as one item of the "Known bugs".

> >  1. am and apply:
> >
> > @@ -1052,7 +1066,8 @@ been working on it.
> >      https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/
> >
> >
> > -=== Reference Emails ===
> > +Reference Emails
> > +----------------
> >
> >  Emails that detail various bugs we've had in sparse-checkout:
> >
> > --
> > gitgitgadget

^ permalink raw reply

* Re: [PATCH v2 1/2] docs: fix sparse-checkout docs link
From: ZheNing Hu @ 2023-01-23 15:16 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Elijah Newren, ZheNing Hu via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee, Victoria Dye,
	Nguyễn Thái Ngọc Duy
In-Reply-To: <CAN0heSoVguXpGC4PMsvh8CedwSvxu8A=iG2hT8Szxdq2ivh9rw@mail.gmail.com>

Martin Ågren <martin.agren@gmail.com> 于2023年1月20日周五 17:35写道:
>
> On Fri, 20 Jan 2023 at 06:29, Elijah Newren <newren@gmail.com> wrote:
> >
> > On Thu, Jan 19, 2023 at 9:34 AM ZheNing Hu via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > >
> > > From: ZheNing Hu <adlternative@gmail.com>
> > > So fix the format of sparse-checkout.txt, and link it in the
> > > Makefile to correct that.
>
> > > -0. Behavior A is not well supported in Git.  (Behavior B didn't used to
> > > -   be either, but was the easier of the two to implement.)
> > > +Behavior A is not well supported in Git.  (Behavior B didn't used to
> > > +be either, but was the easier of the two to implement.)
> >
> > Why did you remove the numbering on this, though?  If asciidoc doesn't
> > like starting with 0 (the only guess I can think of for why you'd
> > change this), shouldn't the series be renumbered starting at 1 rather
> > than removing this from the list?
>
> It looks like the zero causes both asciidoc and Asciidoctor to emit
> warnings (one per item, since each item's number is off by one). They
> also helpfully relabel everything starting at 1.
>
> I agree that there's a better fix here than dropping the 0. Either
> renumber everything or, probably better, just use "." for each item
> rather than "1.", "2." and so on. The right numbers will be inserted
> automatically. This also means that if an item is ever added earlier in
> the list, we won't need to update all the numbers below that point.
>

Good idea.

> (The numbers being generated automatically means we can't refer to them
> ("see item 2") without potentially getting out of sync, but that is true
> regardless if the numbers are corrected for us, as now, or if we just
> use ".".)
>

That shouldn't matter, there are no references to any of these items.

> The contents of these list items could be prettified in various ways,
> but I'm not sure what the status and goal is for these technical/
> documents. Avoiding warnings in the build process, as ZheNing aimed for,
> seems like a good idea regardless.
>
> Martin

Thanks,
--
ZheNing Hu

^ permalink raw reply

* [PATCH v2 01/10] bundle: optionally skip reachability walk
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

When unbundling a bundle, the verify_bundle() method checks two things
with regards to the prerequisite commits:

 1. Those commits are in the object store, and
 2. Those commits are reachable from refs.

During testing of the bundle URI feature, where multiple bundles are
unbundled in the same process, the ref store did not appear to be
refreshing with the new refs/bundles/* references added within that
process. This caused the second half -- the reachability walk -- report
that some commits were not present, despite actually being present.

One way to attempt to fix this would be to create a way to force-refresh
the ref state. That would correct this for these cases where the
refs/bundles/* references have been updated. However, this still is an
expensive operation in a repository with many references.

Instead, optionally allow callers to skip this portion by instead just
checking for presence within the object store. Use this when unbundling
in bundle-uri.c.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 8 +++++++-
 bundle.c     | 3 ++-
 bundle.h     | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 36268dda172..2f079f713cf 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -322,9 +322,15 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * Skip the reachability walk here, since we will be adding
 	 * a reachable ref pointing to the new tips, which will reach
 	 * the prerequisite commits.
+	 *
+	 * Since multiple iterations of unbundle_from_file() can create
+	 * new commits in the object store that are not reachable from
+	 * the current cached state of the ref store, skip the reachability
+	 * walk and move forward as long as the objects are present in the
+	 * object store.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_QUIET)))
+			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_SKIP_REACHABLE)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 4ef7256aa11..b51974f0806 100644
--- a/bundle.c
+++ b/bundle.c
@@ -223,7 +223,8 @@ int verify_bundle(struct repository *r,
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
-	if (revs.pending.nr != p->nr)
+	if (revs.pending.nr != p->nr ||
+	    (flags & VERIFY_BUNDLE_SKIP_REACHABLE))
 		goto cleanup;
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
diff --git a/bundle.h b/bundle.h
index 9f2bd733a6a..24c30e5f74a 100644
--- a/bundle.h
+++ b/bundle.h
@@ -34,6 +34,7 @@ int create_bundle(struct repository *r, const char *path,
 enum verify_bundle_flags {
 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
 	VERIFY_BUNDLE_QUIET = (1 << 1),
+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 2),
 };
 
 int verify_bundle(struct repository *r, struct bundle_header *header,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 02/10] t5558: add tests for creationToken heuristic
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

As documented in the bundle URI design doc in 2da14fad8fe (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.

Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.

Create a new test helper function, test_remote_https_urls, which filters
GIT_TRACE2_EVENT output to extract a list of URLs passed to
git-remote-https child processes. This can be used to verify the order
of these requests as we implement the creationToken heuristic. For now,
we need to sort the actual output since the current client does not have
a well-defined order that it applies to the bundles.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5558-clone-bundle-uri.sh | 69 +++++++++++++++++++++++++++++++++++--
 t/test-lib-functions.sh     |  8 +++++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9155f31fa2c..474432c8ace 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -285,6 +285,8 @@ test_expect_success 'clone HTTP bundle' '
 '
 
 test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -304,12 +306,26 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git clone --bundle-uri="$HTTPD_URL/bundle-list" \
 		clone-from clone-list-http  2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http cat-file --batch-check <oids
+	git -C clone-list-http cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Sort the list, since the order is not well-defined
+	# without a heuristic.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'clone bundle list (HTTP, any mode)' '
@@ -350,6 +366,55 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
+		clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
+
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http-2 cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Since the creationToken heuristic is not yet understood by the
+	# client, the order cannot be verified at this moment. Sort the
+	# list for consistent results.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index f036c4d3003..ace542f4226 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1833,6 +1833,14 @@ test_region () {
 	return 0
 }
 
+# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
+# sent to git-remote-https child processes.
+test_remote_https_urls() {
+	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
+		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
+		    -e 's/"\]}//g'
+}
+
 # Print the destination of symlink(s) provided as arguments. Basically
 # the same as the readlink command, but it's not available everywhere.
 test_readlink () {
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 10 ++++++++++
 bundle-uri.h                |  6 ++++++
 t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 0d64b1d84ba..f46ab5c1743 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -83,6 +83,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
 	FILE *fp = data;
 	fprintf(fp, "[bundle \"%s\"]\n", info->id);
 	fprintf(fp, "\turi = %s\n", info->uri);
+
+	if (info->creationToken)
+		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
 	return 0;
 }
 
@@ -203,6 +206,13 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "creationtoken")) {
+		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+			warning(_("could not parse bundle list key %s with value '%s'"),
+				"creationToken", value);
+		return 0;
+	}
+
 	/*
 	 * At this point, we ignore any information that we don't
 	 * understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index 2e44a50a90b..ef32840bfa6 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
 	 * this boolean is true.
 	 */
 	unsigned unbundled:1;
+
+	/**
+	 * If the bundle is part of a list with the creationToken
+	 * heuristic, then we use this member for sorting the bundles.
+	 */
+	uint64_t creationToken;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
 		heuristic = creationToken
 	[bundle "one"]
 		uri = http://example.com/bundle.bdl
+		creationToken = 123456
 	[bundle "two"]
 		uri = https://example.com/bundle.bdl
+		creationToken = 12345678901234567890
 	[bundle "three"]
 		uri = file:///usr/share/git/bundle.bdl
+		creationToken = 1
 	EOF
 
 	test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+		creationToken = bogus
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox