Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v2] compat/posix.h: enable UNUSED warning messages for Clang
From: Patrick Steinhardt @ 2026-06-05 10:40 UTC (permalink / raw)
  To: Dominik Loidolt; +Cc: gitster, git, asedeno, asedeno, avarab
In-Reply-To: <20260605094647.94805-1-dominik.loidolt@univie.ac.at>

On Fri, Jun 05, 2026 at 11:46:47AM +0200, Dominik Loidolt wrote:
> Use a dedicated Clang version check for the UNUSED macro.
> 
> Commit 7c07f36ad2 (git-compat-util.h: GCC deprecated message arg only in
> GCC 4.5+, 2022-10-05) restricted use of the deprecated attribute's
> message argument in the UNUSED macro to GCC 4.5 or newer.

Ah. I was briefly wondering about this because the UNUSED macro already
works. But the important part here is that it's really only about better
diagnostics via the attribute message.

> Clang identifies itself as GNUC 4.2.1 for compatibility, so
> GIT_GNUC_PREREQ(4, 5) does not detect whether Clang supports the
> deprecated("...") form. Add GIT_CLANG_PREREQ() macro and use it to
> enable the UNUSED warning message for Clang 2.9 and newer.

There's a second user of `GIT_GNUC_PREREQ` in "git-compat-util.h", but
that user checks for GCC 3.1. And as Clang identifies as a newer version
we don't have to adapt any other callsites.

> diff --git a/compat/posix.h b/compat/posix.h
> index faaae1b655..88ad29d74b 100644
> --- a/compat/posix.h
> +++ b/compat/posix.h
> @@ -22,6 +22,17 @@
>   #define GIT_GNUC_PREREQ(maj, min) 0
>  #endif
> 
> +/*
> + * Similar for Clang
> + */

Micronit, not worth rerolling over: this could have easily been a single
line: `/* Similar for Clang. */`

> +#if defined(__clang__) && defined(__clang_minor__) && defined(__clang_major__)
> +# define GIT_CLANG_PREREQ(maj, min) \
> +	((__clang_major__ > (maj)) || \
> +	 (__clang_major__ == (maj) && (__clang_minor__ >= (min))))
> +#else
> +# define GIT_CLANG_PREREQ(maj, min) 0
> +#endif
> +
>  /*
>   * UNUSED marks a function parameter that is always unused.  It also
>   * can be used to annotate a function, a variable, or a type that is
> @@ -35,7 +46,7 @@
>   * When a parameter may be used or unused, depending on conditional
>   * compilation, consider using MAYBE_UNUSED instead.
>   */
> -#if GIT_GNUC_PREREQ(4, 5)
> +#if GIT_GNUC_PREREQ(4, 5) || GIT_CLANG_PREREQ(2, 9)
>  #define UNUSED __attribute__((unused)) \
>  	__attribute__((deprecated ("parameter declared as UNUSED")))
>  #elif defined(__GNUC__)

Makes sense, thanks!

Patrick

^ permalink raw reply

* [PATCH v2] compat/posix.h: enable UNUSED warning messages for Clang
From: Dominik Loidolt @ 2026-06-05  9:46 UTC (permalink / raw)
  To: gitster; +Cc: git, asedeno, asedeno, avarab, Dominik Loidolt
In-Reply-To: <20260503151210.36036-1-dominik.loidolt@univie.ac.at>

Use a dedicated Clang version check for the UNUSED macro.

Commit 7c07f36ad2 (git-compat-util.h: GCC deprecated message arg only in
GCC 4.5+, 2022-10-05) restricted use of the deprecated attribute's
message argument in the UNUSED macro to GCC 4.5 or newer.

Clang identifies itself as GNUC 4.2.1 for compatibility, so
GIT_GNUC_PREREQ(4, 5) does not detect whether Clang supports the
deprecated("...") form. Add GIT_CLANG_PREREQ() macro and use it to
enable the UNUSED warning message for Clang 2.9 and newer.

Signed-off-by: Dominik Loidolt <dominik.loidolt@univie.ac.at>
---
v2:
 - add GIT_CLANG_PREREQ()
 - require Clang 2.9+ for deprecated("...") in UNUSED

 compat/posix.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/compat/posix.h b/compat/posix.h
index faaae1b655..88ad29d74b 100644
--- a/compat/posix.h
+++ b/compat/posix.h
@@ -22,6 +22,17 @@
  #define GIT_GNUC_PREREQ(maj, min) 0
 #endif

+/*
+ * Similar for Clang
+ */
+#if defined(__clang__) && defined(__clang_minor__) && defined(__clang_major__)
+# define GIT_CLANG_PREREQ(maj, min) \
+	((__clang_major__ > (maj)) || \
+	 (__clang_major__ == (maj) && (__clang_minor__ >= (min))))
+#else
+# define GIT_CLANG_PREREQ(maj, min) 0
+#endif
+
 /*
  * UNUSED marks a function parameter that is always unused.  It also
  * can be used to annotate a function, a variable, or a type that is
@@ -35,7 +46,7 @@
  * When a parameter may be used or unused, depending on conditional
  * compilation, consider using MAYBE_UNUSED instead.
  */
-#if GIT_GNUC_PREREQ(4, 5)
+#if GIT_GNUC_PREREQ(4, 5) || GIT_CLANG_PREREQ(2, 9)
 #define UNUSED __attribute__((unused)) \
 	__attribute__((deprecated ("parameter declared as UNUSED")))
 #elif defined(__GNUC__)

base-commit: a89346e34a937f001e5d397ee62224e3e9852040
--
2.54.0


^ permalink raw reply related

* Re: Mirror repositories for submodules
From: Matt Hunter @ 2026-06-05  9:34 UTC (permalink / raw)
  To: Benson Muite, Simon Richter, Junio C Hamano; +Cc: git
In-Reply-To: <87pl25r3tz.fsf@emailplus.org>

On Fri Jun 5, 2026 at 12:47 AM EDT, Benson Muite wrote:
>
> For submodules, the metadata consists of the url of the repository to
> clone from.  One could have a list of absolute URLs.  The default would
> be to assume that the URLs are tried in order, and if a URL times out,
> the next one would be tried.  One may want to change the default
> ordering as a user setting, or do a ping test to get obtain content from
> the closest repository.

Another idea is for the client to attempt in a random order as a kind of
load balancing.

>
> As an example, for linphone-desktop, the first part of the .gitmodules
> file contains:
>
> [submodule "linphone-sdk"]
> path = external/linphone-sdk
> 	url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
> [submodule "external/google/gn"]
>
> This could be updated to
>
> [submodule "linphone-sdk"]
> path = external/linphone-sdk
> 	url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
>         url = https://github.com/BelledonneCommunications/linphone-sdk.git
> [submodule "external/google/gn"]

Relevant to Junio's earlier comment about aiming for a general solution:

It looks like the remote URL configuration already supports multiple
URLs per a single entry.  It's just that git only considers the first in
the list for fetching, and will push to all.

Your proposed change to .gitmodules may be used to initialize the list
of URLs for a cloned submodule's (single) initial remote?  At this
point, any kind of mirror management logic could generically operate on
each remote's URL list.

From git-config(1):
    remote.<name>.url
        The  URL of a remote repository. See git-fetch(1) or git-push(1).
        A configured remote can have multiple URLs; in this case the first
        is used for fetching, and all are used for pushing (assuming no
        remote.<name>.pushurl is defined).  Setting this key to the empty
        string clears the list of urls, allowing you to override earlier
        config.

From gitmodules(5):
    submodule.<name>.url
        Defines a URL from which the submodule repository can be cloned.
        This may be either an absolute URL ready to be passed to
        git-clone(1) or (if it begins with ./ or ../) a location relative
        to the superproject’s origin repository.

Note that the submodule format seems to explicitly support only a single
value right now.  Of course, I am assuming that the implementation
matches the documentation.

^ permalink raw reply

* Re: [PATCH] compat/posix.h: enable UNUSED warning messages for Clang
From: Dominik Loidolt @ 2026-06-05  8:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Alejandro R Sedeño, Alejandro R. Sedeño,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <xmqqqznsossj.fsf@gitster.g>

On Mon, May 04, 2026 at 10:41:16AM +0900, Junio C Hamano wrote:
> > Does this "drop support" because you force _all_ versions of Clang
> > to use the "deprecated" attribute, even though you _know_ some older
> > versions do not understand it?  Don't these versions identify
> > themselves so that you can do
> >
> > 	#if defined(__clang__) && CLANG_VERSION >= 2.9
>
> IOW, something like this, perhaps?

Yes, you're right. My original patch breaks older Clang versions for no
good reason.

I'll send a v2 with an explicit Clang version check, as you suggested.

Thanks,
 Dominik

^ permalink raw reply

* Re: Message from Debaashish Nandi
From: Weijie Yuan @ 2026-06-05  6:43 UTC (permalink / raw)
  To: Debaashish Nandi; +Cc: git
In-Reply-To: <fcb34925-f10c-4fc4-826d-a55217101ec8@localhost>

On Thu, Jun 04, 2026 at 06:37:12PM +0000, Debaashish Nandi wrote:
> Hello, is there any game playing which l can understand how git works ?

You may try this:

https://learngitbranching.js.org/

source code:
https://github.com/pcottle/learnGitBranching

^ permalink raw reply

* Re: Mirror repositories for submodules
From: Benson Muite @ 2026-06-05  5:05 UTC (permalink / raw)
  To: Simon Richter, Junio C Hamano; +Cc: git
In-Reply-To: <d64e7f31-4e00-478c-ab31-b671242865fb@hogyros.de>

Simon Richter <Simon.Richter@hogyros.de> writes:

> Hi,
>
> On 6/4/26 10:09 AM, Junio C Hamano wrote:
>
>> So, no, I do not think a contribution to add mirror repositories as
>> alternate submodule sources should be considered for inclusion, as
>> it artificially limits usefulness of the feature.  A feature to add
>> mirror repositories as alternate sources might be worth considering,
>> though.
>
> This is relevant to the Debian use case: we run a git server that 
> archives git trees for Debian packages, and ideally the objects on this 
> server should be identical to what you get from upstream projects.
>
> This is a big problem for archiving projects that use submodules, 
> because we cannot alter the reference URLs.
>
> Cloning from our server will, depending on what upstream uses, either a 
> relative URL (which will go to our server, but we have little control 
> over what the name part of the repository base URL is going to be), or 
> an absolute URL that instructs clients to pull from another place, which 
> conflicts with our goal to have a self-contained archive.
>
> The idea posited earlier, to have a "repository identity" that remains 
> the same across forks and clones, is somewhat appealing, but the best 
> idea I can come up with is generating some kind of repository UUID, and 
> adding a symlink -- not a great design because it pollutes outside the repo:
>
>      $ mkdir myproject
>      $ cd myproject
>      $ git init
>      $ ls -l ..
>      lrwxrwxrwx 1 simon simon   9 Jun  4 14:05 
> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
>      drwxrwxr-x 2 simon simon  40 Jun  4 14:04 myproject
>
> On the other hand, this can be used to construct a stable relative 
> submodule URL.
>
> Making the symlinks optional would require keeping a list of local 
> clones and their UUIDs, and resolving them.
>
> I don't like that design, but as I said it's the best idea I have for now.
>

For submodules, the metadata consists of the url of the repository to
clone from.  One could have a list of absolute URLs.  The default would
be to assume that the URLs are tried in order, and if a URL times out,
the next one would be tried.  One may want to change the default
ordering as a user setting, or do a ping test to get obtain content from
the closest repository.

As an example, for linphone-desktop, the first part of the .gitmodules
file contains:

[submodule "linphone-sdk"]
path = external/linphone-sdk
         url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
[submodule "external/google/gn"]

This could be updated to

[submodule "linphone-sdk"]
path = external/linphone-sdk
         url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
         url = https://github.com/BelledonneCommunications/linphone-sdk.git
[submodule "external/google/gn"]

> I also fully expect that Debian's servers will be used by a lot of 
> people outside the project as soon as it becomes a convenient fallback, 
> in the same way people are pulling .orig.tar.gz archives from Debian 
> mirrors, so we need to make it easy to set up a mirror, to allow this to 
> scale.
>
>     Simon

^ permalink raw reply

* Re: Mirror repositories for submodules
From: Benson Muite @ 2026-06-05  4:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqcxy7qfgk.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Benson Muite <benson_muite@emailplus.org> writes:
>
>> Would a contribution to add mirror repositories as alternate submodule
>> sources be considered for inclusion?  Some projects have mirror
>> repositories on other hosting services, and may have bandwidth limits on
>> their primary hosting service.  Being able to indicate mirror
>> repositories for where to check for updates and sources for submodules
>> when doing `git clone --recurse-submodules https://my.repo ` or `git
>> submodule update --init --recursive` would be helpful when there is a
>> timeout.
>
> I do not see why such a "oh, the repository at $URL1 seems to be
> down, but we know $URL2 serves the equivalent information, so let's
> go there instead" feature has to be limited to submodule use case.
>
> So, no, I do not think a contribution to add mirror repositories as
> alternate submodule sources should be considered for inclusion, as
> it artificially limits usefulness of the feature.  A feature to add
> mirror repositories as alternate sources might be worth considering,
> though.

Thanks for the feedback. This was motivated by problems when trying to
recursively clone, but a more general solution is also fine.

^ permalink raw reply

* Re: Mirror repositories for submodules
From: Benson Muite @ 2026-06-05  4:54 UTC (permalink / raw)
  To: Jeff King, Simon Richter; +Cc: Junio C Hamano, git
In-Reply-To: <20260604061605.GA3194609@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Thu, Jun 04, 2026 at 02:11:38PM +0900, Simon Richter wrote:
>
>> Cloning from our server will, depending on what upstream uses, either a
>> relative URL (which will go to our server, but we have little control over
>> what the name part of the repository base URL is going to be), or an
>> absolute URL that instructs clients to pull from another place, which
>> conflicts with our goal to have a self-contained archive.
>> 
>> The idea posited earlier, to have a "repository identity" that remains the
>> same across forks and clones, is somewhat appealing, but the best idea I can
>> come up with is generating some kind of repository UUID, and adding a
>> symlink -- not a great design because it pollutes outside the repo:
>> 
>>     $ mkdir myproject
>>     $ cd myproject
>>     $ git init
>>     $ ls -l ..
>>     lrwxrwxrwx 1 simon simon   9 Jun  4 14:05
>> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
>>     drwxrwxr-x 2 simon simon  40 Jun  4 14:04 myproject
>> 
>> On the other hand, this can be used to construct a stable relative submodule
>> URL.
>
> Here's a thought experiment. What if you put the UUID into a URL, like:
>
>   repoid://123456789.git
>
> Then your in-repo .gitconfig would point to that repo id and be
> consistent. Of course you need some way to tell Git how to retrieve
> repoid:// URLs. You could do so with a custom remote helper
> (git-remote-repoid), but presumably that helper is eventually going to
> end up going over one of the normal Git protocols.
>
> So we just need to tell Git how to resolve repo id URLs into concrete
> URLs. And indeed, we have url.*.insteadOf to do rewriting already. So
> for example, you can add a submodule but convert it into a uuid like
> this:
>
>   $ git submodule add https://github.com/git/git.git
>   $ git config -f .gitmodules submodule.git.url
>   https://github.com/git/git.git
>   $ git config -f .gitmodules submodule.git.url repoid://123456789.git
>   $ git commit -am 'add submodule with magic repoid'
>
> Now if somebody else comes along and clones it naively, the repo uuid is
> not useful to git by itself:
>
>   $ git clone --recurse-submodules repo
>   Submodule 'git' (repoid://123456789.git) registered for path 'git'
>   Cloning into '/home/peff/tmp/repo/git'...
>   fatal: transport 'repoid' not allowed
>   fatal: clone of 'repoid://123456789.git' into submodule path '/home/peff/tmp/repo/git' failed
>
> But imagine that "somehow" they have learned that 123456789.git can be
> found at some URL. You can do this:
>
>   git -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
>       clone --recurse-submodules repo.git
>
> which would clone from the original URL. Or you could even imagine that
> they have a cache of repositories named by uuid, and then:
>
>   git -c url.https://my/cache/.insteadOf=repoid:// ...
>
> would rewrite all repoid://'s automatically.
>
> The use of "-c" here is mostly for illustration. It is a per-command
> config, so when you later try to update the submodule, you'd run into
> the same problem. Probably you'd want to stuff your mapping into on-disk
> config (either ~/.gitconfig, or if you have a lot of them, perhaps some
> file included from there).
>
> It would be nice if you could use "git clone -c" (note "-c" as an option
> to "clone", not to "git") to set a permanent per-repo config variable.
> But sadly the URL rewriting happens in the submodule repository, not the
> parent. So it has to be a per-user setting.
>
>
> Now, all of that said, do we still need uuids at all? If the canonical
> submodule name is https://github.com/git/git.git, then anybody can just
> rewrite that locally in the same way using url.*.insteadOf config. And I
> think this is a pretty standard way of using submodules. E.g., you might
> rewrite https:// into ssh:// if you prefer that protocol. Or point to a
> local server if it's faster for you.
>
> Which makes me wonder if I am missing something about the original
> request that started this thread. But it sounds to me like it is just
> asking for the existing URL-rewriting feature.
>

The  problem is that one might have multiple repositories, submodules
may themselves have submodules.  Typically a primary development
organization will have its own host, but may also have mirrors on other
services which maybe more convenient for others to use.  A recursive
clone could give upto 20 repositories not all of which are maintained by
the same organization.  URL-rewriting each of them can be inefficient,
especially when the upstream maintains the mirror repositories and can
indicate that in the source repositories.


> -Peff

^ permalink raw reply

* Re: Mirror repositories for submodules
From: Benson Muite @ 2026-06-05  4:47 UTC (permalink / raw)
  To: Simon Richter, Junio C Hamano; +Cc: git
In-Reply-To: <d64e7f31-4e00-478c-ab31-b671242865fb@hogyros.de>

Simon Richter <Simon.Richter@hogyros.de> writes:

> Hi,
>
> On 6/4/26 10:09 AM, Junio C Hamano wrote:
>
>> So, no, I do not think a contribution to add mirror repositories as
>> alternate submodule sources should be considered for inclusion, as
>> it artificially limits usefulness of the feature.  A feature to add
>> mirror repositories as alternate sources might be worth considering,
>> though.
>
> This is relevant to the Debian use case: we run a git server that 
> archives git trees for Debian packages, and ideally the objects on this 
> server should be identical to what you get from upstream projects.
>
> This is a big problem for archiving projects that use submodules, 
> because we cannot alter the reference URLs.
>
> Cloning from our server will, depending on what upstream uses, either a 
> relative URL (which will go to our server, but we have little control 
> over what the name part of the repository base URL is going to be), or 
> an absolute URL that instructs clients to pull from another place, which 
> conflicts with our goal to have a self-contained archive.
>
> The idea posited earlier, to have a "repository identity" that remains 
> the same across forks and clones, is somewhat appealing, but the best 
> idea I can come up with is generating some kind of repository UUID, and 
> adding a symlink -- not a great design because it pollutes outside the repo:
>
>      $ mkdir myproject
>      $ cd myproject
>      $ git init
>      $ ls -l ..
>      lrwxrwxrwx 1 simon simon   9 Jun  4 14:05 
> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
>      drwxrwxr-x 2 simon simon  40 Jun  4 14:04 myproject
>
> On the other hand, this can be used to construct a stable relative 
> submodule URL.
>
> Making the symlinks optional would require keeping a list of local 
> clones and their UUIDs, and resolving them.
>
> I don't like that design, but as I said it's the best idea I have for now.
>
> I also fully expect that Debian's servers will be used by a lot of 
> people outside the project as soon as it becomes a convenient fallback, 
> in the same way people are pulling .orig.tar.gz archives from Debian 
> mirrors, so we need to make it easy to set up a mirror, to allow this to 
> scale.
>

For submodules, the metadata consists of the url of the repository to
clone from.  One could have a list of absolute URLs.  The default would
be to assume that the URLs are tried in order, and if a URL times out,
the next one would be tried.  One may want to change the default
ordering as a user setting, or do a ping test to get obtain content from
the closest repository.

As an example, for linphone-desktop, the first part of the .gitmodules
file contains:

[submodule "linphone-sdk"]
path = external/linphone-sdk
	url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
[submodule "external/google/gn"]

This could be updated to

[submodule "linphone-sdk"]
path = external/linphone-sdk
	url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
        url = https://github.com/BelledonneCommunications/linphone-sdk.git
[submodule "external/google/gn"]
        


^ permalink raw reply

* Re: Mirror repositories for submodules
From: Benson Muite @ 2026-06-05  4:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqcxy7qfgk.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Benson Muite <benson_muite@emailplus.org> writes:
>
>> Would a contribution to add mirror repositories as alternate submodule
>> sources be considered for inclusion?  Some projects have mirror
>> repositories on other hosting services, and may have bandwidth limits on
>> their primary hosting service.  Being able to indicate mirror
>> repositories for where to check for updates and sources for submodules
>> when doing `git clone --recurse-submodules https://my.repo ` or `git
>> submodule update --init --recursive` would be helpful when there is a
>> timeout.
>
> I do not see why such a "oh, the repository at $URL1 seems to be
> down, but we know $URL2 serves the equivalent information, so let's
> go there instead" feature has to be limited to submodule use case.
>
> So, no, I do not think a contribution to add mirror repositories as
> alternate submodule sources should be considered for inclusion, as
> it artificially limits usefulness of the feature.  A feature to add
> mirror repositories as alternate sources might be worth considering,
> though.

Thanks for the feedback. This was motivated by problems when trying to
recursively clone, but a more general solution is also fine.

^ permalink raw reply

* Re: [PATCH] Documentation: remove redundant 'instead' in --subject-prefix
From: Weijie Yuan @ 2026-06-05  4:11 UTC (permalink / raw)
  To: Lucas Seiki Oshiro; +Cc: git
In-Reply-To: <20260604163510.36687-2-lucasseikioshiro@gmail.com>

On Thu, Jun 04, 2026 at 01:34:42PM -0300, Lucas Seiki Oshiro wrote:
>  --subject-prefix=<subject-prefix>::
> -	Instead of the standard '[PATCH]' prefix in the subject
> -	line, instead use '[<subject-prefix>]'. This can be used
> -	to name a patch series, and can be combined with the
> -	`--numbered` option.
> +	Use '[<subject-prefix>]' instead of the standard '[PATCH]'
> +	prefix in the subject line. This can be used to name a patch
> +	series, and can be combined with the `--numbered` option.

Agreed, this reads more smoothly.

^ permalink raw reply

* [PATCH v2] Makefile: dedup archives in $(LIBS) so link recipes don't repeat them
From: Harald Nordgren via GitGitGadget @ 2026-06-04 22:03 UTC (permalink / raw)
  To: git; +Cc: Harald Nordgren, Harald Nordgren
In-Reply-To: <pull.2314.git.git.1780269406949.gitgitgadget@gmail.com>

From: Harald Nordgren <haraldnordgren@gmail.com>

A handful of link recipes listed archive files twice: once explicitly
via $(filter %.a,$^) and again implicitly through $(LIBS), which
expanded to $(filter-out %.o,$(GITLIBS)) $(EXTLIBS). On macOS the
linker warned about the duplicates:

  ld: warning: ignoring duplicate libraries: 'libgit.a', 'target/release/libgitcore.a'

Redefine $(LIBS) to list archive prerequisites from $^ first, then
the rest of the library list with those archives filtered out so each
appears only once.

Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
---
    Makefile: drop duplicate %.a from test-helper link rule
    
    Redefine $(LIBS) to list archive prerequisites from $^ first, then the
    rest of the library list to avoid brittleness in the future.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2314%2FHaraldNordgren%2Fmakefile-test-helper-dedup-libs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2314/HaraldNordgren/makefile-test-helper-dedup-libs-v2
Pull-Request: https://github.com/git/git/pull/2314

Range-diff vs v1:

 1:  f6166450b0 ! 1:  0ef442ea05 Makefile: drop duplicate %.a from link recipes
     @@ Metadata
      Author: Harald Nordgren <haraldnordgren@gmail.com>
      
       ## Commit message ##
     -    Makefile: drop duplicate %.a from link recipes
     +    Makefile: dedup archives in $(LIBS) so link recipes don't repeat them
      
     -    Three link recipes list archive files twice on the link line: once
     -    via $(filter %.a,$^) and again through $(LIBS), which expands to
     -    $(filter-out %.o,$(GITLIBS)) $(EXTLIBS). On macOS the linker warns
     -    about the duplicates:
     +    A handful of link recipes listed archive files twice: once explicitly
     +    via $(filter %.a,$^) and again implicitly through $(LIBS), which
     +    expanded to $(filter-out %.o,$(GITLIBS)) $(EXTLIBS). On macOS the
     +    linker warned about the duplicates:
      
            ld: warning: ignoring duplicate libraries: 'libgit.a', 'target/release/libgitcore.a'
      
     -    Drop the redundant filter from the test-helper, fuzz-program, and
     -    unit-test recipes so they match the pattern used by other link
     -    recipes in the file.
     +    Redefine $(LIBS) to list archive prerequisites from $^ first, then
     +    the rest of the library list with those archives filtered out so each
     +    appears only once.
      
          Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
      
       ## Makefile ##
     +@@ Makefile: endif
     + #
     + # where we use it as a dependency. Since we also pull object files
     + # from the dependency list, that would make each entry appear twice.
     +-LIBS = $(filter-out %.o, $(GITLIBS)) $(EXTLIBS)
     ++# Archives from $^ come first, then the rest with those archives
     ++# filtered out so each appears only once.
     ++LIBS = $(filter %.a,$^) $(filter-out $(filter %.a,$^),$(filter-out %.o,$(GITLIBS)) $(EXTLIBS))
     + 
     + BASIC_CFLAGS += $(COMPAT_CFLAGS)
     + LIB_OBJS += $(COMPAT_OBJS)
      @@ Makefile: perf: all
       t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS)) $(UNIT_TEST_DIR)/test-lib.o
       


 Makefile | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index b31ecb0756..a828a66f28 100644
--- a/Makefile
+++ b/Makefile
@@ -2503,7 +2503,9 @@ endif
 #
 # where we use it as a dependency. Since we also pull object files
 # from the dependency list, that would make each entry appear twice.
-LIBS = $(filter-out %.o, $(GITLIBS)) $(EXTLIBS)
+# Archives from $^ come first, then the rest with those archives
+# filtered out so each appears only once.
+LIBS = $(filter %.a,$^) $(filter-out $(filter %.a,$^),$(filter-out %.o,$(GITLIBS)) $(EXTLIBS))
 
 BASIC_CFLAGS += $(COMPAT_CFLAGS)
 LIB_OBJS += $(COMPAT_OBJS)
@@ -3392,7 +3394,7 @@ perf: all
 t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS)) $(UNIT_TEST_DIR)/test-lib.o
 
 t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
-	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)
 
 check-sha1:: t/helper/test-tool$X
 	t/helper/test-sha1.sh
@@ -4015,13 +4017,13 @@ fuzz-all: $(FUZZ_PROGRAMS)
 $(FUZZ_PROGRAMS): %: %.o oss-fuzz/dummy-cmd-main.o $(GITLIBS) GIT-LDFLAGS
 	$(QUIET_LINK)$(FUZZ_CXX) $(FUZZ_CXXFLAGS) -o $@ $(ALL_LDFLAGS) \
 		-Wl,--allow-multiple-definition \
-		$(filter %.o,$^) $(filter %.a,$^) $(LIBS) $(LIB_FUZZING_ENGINE)
+		$(filter %.o,$^) $(LIBS) $(LIB_FUZZING_ENGINE)
 
 $(UNIT_TEST_PROGS): $(UNIT_TEST_BIN)/%$X: $(UNIT_TEST_DIR)/%.o $(UNIT_TEST_OBJS) \
 	$(GITLIBS) GIT-LDFLAGS
 	$(call mkdir_p_parent_template)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) \
-		$(filter %.o,$^) $(filter %.a,$^) $(LIBS)
+		$(filter %.o,$^) $(LIBS)
 
 GIT-TEST-SUITES: FORCE
 	@FLAGS='$(CLAR_TEST_SUITES)'; \

base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH 0/6] Support hashing objects larger than 4GB on Windows
From: Philip Oakley @ 2026-06-04 21:56 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

On 04/06/2026 18:15, Johannes Schindelin via GitGitGadget wrote:
> Philip Oakley has contributed these patches ~4.5 years ago, and they have
> been carried in Git for Windows ever since.
> 
> Now that there are already other patch series flying around that try to
> address various aspects about >4GB objects (which aren't handled well by Git
> until it stops forcing unsigned long to do size_t's job), it seems a good
> time to upstream these patches, too, at long last.

Yay. I approve this message ;-)

Philip

> 
> Philip Oakley (6):
>   hash-object: demonstrate a >4GB/LLP64 problem
>   object-file.c: use size_t for header lengths
>   hash algorithms: use size_t for section lengths
>   hash-object --stdin: verify that it works with >4GB/LLP64
>   hash-object: add another >4GB/LLP64 test case
>   hash-object: add a >4GB/LLP64 test case using filtered input
> 
>  object-file.c          | 18 +++++++++---------
>  object-file.h          |  4 ++--
>  sha1dc_git.c           |  3 +--
>  sha1dc_git.h           |  2 +-
>  t/t1007-hash-object.sh | 39 +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 52 insertions(+), 14 deletions(-)
> 
> 
> base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2138%2Fdscho%2FPhilipOakley%2Fhashliteral_t-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2138/dscho/PhilipOakley/hashliteral_t-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2138


^ permalink raw reply

* trailers: --only-trailers normalizes URLs to trailers
From: Kristoffer Haugsbakk @ 2026-06-04 21:27 UTC (permalink / raw)
  To: git

The following is a bug that follows straightforwardly from the documented
or discussed behavior. In that sense it is not a bug. But it is a bug in
the sense that it makes things inconvenient and violates a design goal.

> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)

Ran what is the equivalent of

    git interpret-trailers --only-trailers

With

    git log --format="%(trailers:only)"

> What did you expect to happen? (Expected behavior)

For URLs like https://www.digsm.xyz/ to be left intact.

(Well, did I expect that? It follows from the discussed behavior...)

> What happened instead? (Actual behavior)

URLs on a line by themselves in eligible trailer blocks get
normalized/canonicalized to a “trailer” with key e.g. `https`:

    https: //www.digsm.xyz/

> What's different between what you expected and what actually happened?

In an ideal world to have some special-casing of URLs so that they are
not detected as trailers. Does anyone realistically want trailers like
this?:

    file: //...
    http: //...
    https: //...

Maybe a C-style comment?

    https: // I changed my mind about providing a URL here.
        This comment is a placeholder.
    Comment: // But next up we have a URL
    https: https://protocoltwiceover.net

And this is where my imagination ends.

Just special-casing `https` would go a long way.

> Anything else you want to add:

Yes, after this [System Info] part.

> Please review the rest of the bug report below.
> You can delete any lines you don't wish to share.

[System Info]
git version:
git version 2.54.0
cpu: x86_64
built from commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
rust: disabled
gettext: enabled
libcurl: 7.81.0
OpenSSL: OpenSSL 3.0.2 15 Mar 2022
zlib: 1.2.11
SHA-1: SHA1_DC
SHA-256: SHA256_BLK
default-ref-format: files
default-hash: sha1
uname: Linux 6.8.0-117-generic #117~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu May  7 22:17:46 UTC  x86_64
compiler info: gnuc: 11.4
libc info: glibc: 2.35
$SHELL (typically, interactive shell): /bin/bash

[Enabled Hooks]
commit-msg
post-applypatch
post-commit
sendemail-validate

***

That things like `--format='%(trailers:only)'` normalize trailers is
known and has been discussed before.[1] There’s been discussion around
the key capitalization and prefix normalization. But this is not about
that. This is just about normalizing the separator part.

🔗 1: https://lore.kernel.org/git/87blk0rjob.fsf@0x63.nu/

One design goal for trailers (either by implementers or reviewers or both)
has been to avoid false positives.[2] That meant trying to avoid
detecting trailers that were not intended. For example:

    Everything was better in the past. Let me not even start on this
    rant: it is not good for my blood pressure.

This will not be picked up as a trailer block unless `rant` is configured
as a trailer key.

† 2: See e.g. Jonathan Tan’s series about among other things adding the
     25% rule

     https://lore.kernel.org/git/xmqq7f96sa9i.fsf@gitster.mtv.corp.google.com/

But that’s pretty innocuous. Just a misplaced rant. The topic of this
bug report is not a big deal either, but it is:

1. Structured data that gets mangled in this normalize mode
2. That can naturally go at the end of the message on its own line

And these two points are very relevant for people who never use
trailers. Or, wait. I guess it isn’t if they don’t use trailers and thus
will never normalize them. But it is relevant if they work on a project
where someone else does that.

IN INTENDED TRAILER BLOCKS [3]

And then there are things that can go wrong if you intend to write trailer blocks:

1. “Non-trailer lines” that are URLs get normalized as trailers (NTL for
   short)
2. User error line wrapping turns one trailer into an empty trailer plus
   a `https` trailer (LW for short)
3. Normalizing trailers along the way (as in patches in flight or
   something) introduces this strange lossiness (NL for short)

I did (2) (LW for short) four years ago it seems:

    See:
    https://digsm.yxz/blog/important-context/?bigtechtracker=86b0c5a1e2b73b08fd54c727f4458649ed9fe3ad1b6e8ac9460c070113509a1e

† 3: Are all-caps titles good or bad? Let me know.

IN THE LINUX KERNEL

There are some hits for the `http` and `https` trailers when trailers
are normalized. The baseline:

   $ git log --extended-regexp --grep='https?: //' --oneline | wc -l
   12

With normalization:

    $ git log --format='%(trailers:only)' |
          grep --extended-regexp '^https?: //' | wc -l
    245

Note that I have no idea how the Linux Kernel is run. But I don’t
imagine that there are uses for `https: //...` trailers.

And trailer usage is complicated. There are for example on-purpose
indented `Link` “trailers”, presumably for the purpose of *excluding*
them as `Link` trailers. See:

    commit d80a9cb1a64ab9c817b6262c7e4e433b6a3581a0

    <body>

    [ljs@kernel.org: avoid bisection hazard]
      Link: https://lkml.kernel.org/r/d0cc6161-77a4-42ba-a411-96c23c78df1b@lucifer.local
    Link: https://lkml.kernel.org/r/c2be872d64ef9573b80727d9ab5446cf002f17b5.1774029655.git.ljs@kernel.org
    Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
    [MORE BELOW]

Is that indented link for that `[]` comment? I dunno.

But what’s the main topic here are intended non-trailer lines which are
URLs that get treated as trailers (NTL). Like this invented example:

Reported-by: ...
https://digsm.xyz/?avastvirus=5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
Signed-off-by: ...

Or this real example where the URLs are clearly part of a “comment”
non-trailer run.

    8236fc613d44e59f6736d6c3e9efffaf26ab7f00
    Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
    [bhelgaas: squash fixes:
    https://lore.kernel.org/r/20260108013956.14351-2-bagasdotme@gmail.com
    https://lore.kernel.org/r/20260108013956.14351-3-bagasdotme@gmail.com]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Link: https://patch.msgid.link/20251210132907.58799-4-xueshuai@linux.alibaba.com

(These are shown as they are written in the commit message. Normalizing
the messages would create `https` trailers.)

Here are examples of line-wrapping mistake commits (LW) for `Link`,
`Closes`, or `Fixes` (sometimes these point to bug URLs and not
commits):

    5bd97f5c5f241a5610c4412d1b93995a26241f81
    Link: https://patch.msgid.link/20260216-work-xattr-socket-v1-4-c2efa4f74cb7@kernel.org
    Link:
    https://lore.kernel.org/3cnmtqmakpbb2uwhenrj7kdqu3uefykiykjllgfbtpkiwhaa4s@sghkevv7jned [1]
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

and:

• 24abe1f238e7d7ac56be6374c52a3c13dab84f69
• 27e21516914dc130a79aa895a5a26e18f0213a5a
• be3536a4bdda53ff5a91b7e542b167d12bddb317

Finally there is this commit which has a trailer in the commit message
itself with the key `https` (NL).

    commit 496c0c4c53bbe1bad97e82cd12103df61a6e459d
    ...
    ...

	net: wan: fsl_ucc_hdlc: free tx_skbuff in uhdlc_memclean

        <body>

	https: //sashiko.dev/#/patchset/20260429114208.941011-1-holger.brunck%40hitachienergy.com
	Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC")
	Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com>
	Link: https://patch.msgid.link/20260507155332.3452319-1-holger.brunck@hitachienergy.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>

How could this have happened? Follow the patch-id link.

https://patch.msgid.link/20260507155332.3452319-1-holger.brunck@hitachienergy.com

    https://sashiko.dev/#/patchset/20260429114208.941011-1-holger.brunck%40hitachienergy.com
    Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC")
    Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com>

So it was just a non-trailer URL line as this person submitted it. But
presumably the person who applied it put the message through a round of
normalization.

Cheers, good night

-- 
Kristoffer

^ permalink raw reply

* Re: [PATCH 1/4] doc: link to config for git-replay(1)
From: Kristoffer Haugsbakk @ 2026-06-04 20:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Siddharth Asthana
In-Reply-To: <xmqqse78fsn2.fsf@gitster.g>

On Sun, May 31, 2026, at 00:18, Junio C Hamano wrote:
> kristofferhaugsbakk@fastmail.com writes:
>
>> From: Kristoffer Haugsbakk <code@khaugsbakk.name>
>>
>> This config doc was added in 336ac90c (replay: add replay.refAction
>> config option, 2025-11-06) but never included anywhere. Include it in
>> git-replay(1) and git-config(1).
>>
>> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
>> ---
>>  Documentation/config.adoc     | 2 ++
>>  Documentation/git-replay.adoc | 4 ++++
>>  2 files changed, 6 insertions(+)
>
> It is always nice to see documentation gaps filled.
>
> The `replay.refAction` configuration variable was indeed left
> dangling without a proper link from the main command documentation,
> which is embarrassing.  I wonder if we can add simple "doc-lint"
> rule or two to prevent similar mistakes from happening again?

For what it’s worth this is how I found it.

I was working on the kh/doc-hook topic and noticed that my changes
didn’t trigger any `doc-diff` changes for git-config(1). So I checked
the file and nope, it wasn’t included. Then I massaged the file includes
and diffed it with `Documentation/config/`. The only missing ones were:

• replay
• fmt-merge-msg

And `fmt-merge-msg` turned out to be a false positive since it is
included via `merge` or something.

>
>> diff --git a/Documentation/config.adoc b/Documentation/config.adoc
>> index 62eebe7c545..51fabecb9b0 100644
>> --- a/Documentation/config.adoc
>> +++ b/Documentation/config.adoc
>> @@ -511,6 +511,8 @@ include::config/remotes.adoc[]
>>[snip]

^ permalink raw reply

* Re: [PATCH 4/6] t: add lint-style.pl with test_grep negation rule
From: Michael Montalbo @ 2026-06-04 19:36 UTC (permalink / raw)
  To: D. Ben Knoble; +Cc: Michael Montalbo via GitGitGadget, git, Eric Sunshine
In-Reply-To: <CALnO6CCjr5xMk=GLHSgf=KQpKJ1FnpimQCYu+BqyufWrRFkh8A@mail.gmail.com>

On Thu, Jun 4, 2026 at 11:35 AM D. Ben Knoble <ben.knoble@gmail.com> wrote:
>
> Hi Michael,
>
> This sounds like a neat effort!
>
> One drive-by comment…
>
> On Thu, Jun 4, 2026 at 3:46 AM Michael Montalbo via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Michael Montalbo <mmontalbo@gmail.com>
> >
> > Add a mechanical lint checker for test scripts, similar in spirit to
> > check-non-portable-shell.pl but focused on test conventions rather
> > than portability.
> >
> > The tool defines LintParser, a subclass of ScriptParser (from the
> > shared lib-shell-parser.pl module).  ScriptParser's
> > parse_cmd() finds test_expect_success blocks and calls check_test()
> > for each body; LintParser overrides check_test() to run lint rules
> > on the parsed commands.  A "# lint-ok" comment suppresses all
> > checks for intentional style violations.
> >
> > The first rule detects '! test_grep' and replaces it with
> > 'test_grep !'.  Shell-level negation suppresses the diagnostic
> > output that test_grep prints on failure; the built-in negation
> > preserves it.
> >
> > Three violations inside test bodies are converted via --fix.  One
> > additional violation in a helper function outside test_expect_success
> > (t7900's test_geometric_repack_needed) is converted manually, since
> > the parser only processes test bodies.
> >
> > Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
> > ---
> >  t/.gitattributes                           |   2 +
> >  t/Makefile                                 |  32 +++-
> >  t/lint-style.pl                            | 200 +++++++++++++++++++++
> >  t/lint-style/heredoc.expect                |   3 +
> >  t/lint-style/heredoc.test                  |  14 ++
> >  t/lint-style/test-grep-negation-fix.expect |   4 +
> >  t/lint-style/test-grep-negation-fix.test   |   4 +
> >  t/lint-style/test-grep-negation.expect     |   3 +
> >  t/lint-style/test-grep-negation.test       |   4 +
> >  t/t0031-lockfile-pid.sh                    |   2 +-
> >  t/t5300-pack-object.sh                     |   2 +-
> >  t/t5319-multi-pack-index.sh                |   2 +-
> >  t/t7900-maintenance.sh                     |   2 +-
> >  13 files changed, 268 insertions(+), 6 deletions(-)
> >  create mode 100755 t/lint-style.pl
> >  create mode 100644 t/lint-style/heredoc.expect
> >  create mode 100644 t/lint-style/heredoc.test
> >  create mode 100644 t/lint-style/test-grep-negation-fix.expect
> >  create mode 100644 t/lint-style/test-grep-negation-fix.test
> >  create mode 100644 t/lint-style/test-grep-negation.expect
> >  create mode 100644 t/lint-style/test-grep-negation.test
> >
> > diff --git a/t/.gitattributes b/t/.gitattributes
> > index 7664c6e027..aea6889d03 100644
> > --- a/t/.gitattributes
> > +++ b/t/.gitattributes
> > @@ -1,5 +1,7 @@
> >  t[0-9][0-9][0-9][0-9]/* -whitespace
> >  /chainlint/*.expect eol=lf -whitespace
> > +/lint-style/*.expect eol=lf -whitespace
> > +/lint-style/*.test eol=lf -whitespace
> >  /t0110/url-* binary
> >  /t3206/* eol=lf
> >  /t3900/*.txt eol=lf
> > diff --git a/t/Makefile b/t/Makefile
> > index 25f923fed9..3a5fa4ce37 100644
> > --- a/t/Makefile
> > +++ b/t/Makefile
> > @@ -46,6 +46,7 @@ TPERF = $(sort $(wildcard perf/p[0-9][0-9][0-9][0-9]-*.sh))
> >  TINTEROP = $(sort $(wildcard interop/i[0-9][0-9][0-9][0-9]-*.sh))
> >  CHAINLINTTESTS = $(sort $(patsubst chainlint/%.test,%,$(wildcard chainlint/*.test)))
> >  CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl
> > +LINT_STYLE_TESTS = $(sort $(wildcard lint-style/*.test))
> >  UNIT_TEST_SOURCES = $(wildcard unit-tests/t-*.c)
> >  UNIT_TEST_PROGRAMS = $(patsubst unit-tests/%.c,unit-tests/bin/%$(X),$(UNIT_TEST_SOURCES))
> >  UNIT_TEST_PROGRAMS += unit-tests/bin/unit-tests$(X)
> > @@ -139,7 +140,7 @@ check-meson:
> >  test-lint: test-lint-duplicates test-lint-executable \
> >         test-lint-filenames
> >  ifneq ($(PERL_PATH),)
> > -test-lint: test-lint-shell-syntax check-shell-parser
> > +test-lint: test-lint-shell-syntax test-lint-style check-lint-style check-shell-parser
> >  else
> >  GIT_TEST_CHAIN_LINT = 0
> >  endif
> > @@ -162,6 +163,32 @@ test-lint-shell-syntax:
> >
> >  check-shell-parser:
> >         @'$(PERL_PATH_SQ)' check-shell-parser.pl
> > +
> > +test-lint-style:
> > +       @'$(PERL_PATH_SQ)' lint-style.pl $(T) $(THELPERS) $(TPERF)
> > +
> > +check-lint-style:
> > +       @rc=0; for t in $(LINT_STYLE_TESTS); do \
> > +               base=$${t%.test}; \
> > +               case $$base in \
> > +               *-fix) \
> > +                       cp "$$t" "$$t.tmp" && \
> > +                       '$(PERL_PATH_SQ)' lint-style.pl --fix "$$t.tmp" >/dev/null 2>&1; \
> > +                       fix_rc=$$?; \
> > +                       if test $$fix_rc != 0; then \
> > +                               echo "FAIL: $$t (--fix exit code $$fix_rc)"; rc=1; \
> > +                       elif ! diff -u "$$base.expect" "$$t.tmp"; then \
> > +                               echo "FAIL: $$t (--fix output)"; rc=1; \
> > +                       fi; \
> > +                       rm -f "$$t.tmp" ;; \
> > +               *) \
> > +                       if ! '$(PERL_PATH_SQ)' lint-style.pl "$$t" 2>&1 | \
> > +                               diff -u "$$base.expect" -; then \
> > +                               echo "FAIL: $$t"; rc=1; \
> > +                       fi ;; \
> > +               esac; \
> > +       done; test $$rc = 0
> > +
>
> …I wonder if it would be easier to maintain this recipe as a separate
> shell script and have make give LINT_STYLE_TESTS and PERL_PATH (w/o
> SQ? idk) to the script. That's a lot of inline code otherwise!
>

Yeah that's a good call out, will fix in a follow-up. Thank you for
taking a look!

> >  test-lint-filenames:
> >         @# We do *not* pass a glob to ls-files but use grep instead, to catch
> >         @# non-ASCII characters (which are quoted within double-quotes)
> > @@ -188,7 +215,8 @@ perf:
> >
> >  .PHONY: pre-clean $(T) aggregate-results clean valgrind perf \
> >         check-chainlint clean-chainlint test-chainlint \
> > -       check-shell-parser $(UNIT_TESTS)
> > +       check-shell-parser \
> > +       check-lint-style test-lint-style $(UNIT_TESTS)
> >
> >  .PHONY: libgit-sys-test libgit-rs-test
> >  libgit-sys-test:
> > diff --git a/t/lint-style.pl b/t/lint-style.pl
> > new file mode 100755
> > index 0000000000..9268577f9b
> > --- /dev/null
> > +++ b/t/lint-style.pl
> > @@ -0,0 +1,200 @@
> > +#!/usr/bin/perl
> > +
> > +# Check test scripts for style violations that can be detected
> > +# mechanically, such as using bare 'grep' where test_grep should
> > +# be used.  Use --fix to automatically apply suggested replacements.
> > +#
> > +# Detection uses parsed tokens from the shared shell parser for
> > +# correct handling of heredocs, $(...), pipes, and quoting.
> > +# Fixes modify the original file text to preserve formatting.
> > +
> > +use strict;
> > +use warnings;
> > +use File::Basename;
> > +# Force LF output so check-lint-style's diff against the
> > +# pre-committed .expect files works on Windows.
> > +binmode(STDOUT, ':unix');
> > +binmode(STDERR, ':unix');
> > +
> > +my $fix_mode = 0;
> > +if (@ARGV && $ARGV[0] eq '--fix') {
> > +       $fix_mode = 1;
> > +       shift @ARGV;
> > +}
> > +
> > +# Load the shared shell parser (Lexer, ShellParser, ScriptParser).
> > +my $_lib = dirname($0) . "/lib-shell-parser.pl";
> > +$_lib = "./$_lib" unless $_lib =~ m{^/};
> > +do $_lib or die "$0: failed to load $_lib: $@$!\n";
> > +
> > +# LintParser is a subclass of ScriptParser which runs lint rules
> > +# on each test body.  Per-file state (file name, raw lines, dirty
> > +# flag) is stored on the instance before calling parse().
> > +#
> > +# Subroutines defined below (parse_commands, check_test_grep_negation,
> > +# etc.) are in package main and called with the main:: prefix.
> > +# File-scoped lexicals ($fix_mode, $has_fixable, etc.) are visible
> > +# across packages since 'package' does not introduce a new scope.
> > +package LintParser;
> > +our @ISA = ('ScriptParser');
> > +
> > +package main;
> > +
> > +my $exit_code = 0;
> > +my $has_fixable = 0;
> > +
> > +sub err {
> > +       my ($file, $lineno, $line, $msg, %opts) = @_;
> > +       $line =~ s/^\s+//;
> > +       $line =~ s/\s+$//;
> > +       $line =~ s/\s+/ /g;
> > +       my $prefix = ($fix_mode && $opts{fixable}) ? 'fixed' : 'error';
> > +       print "$file:$lineno: $prefix: $msg: $line\n";
> > +       $exit_code = 1 unless $fix_mode && $opts{fixable};
> > +}
> > +
> > +# Report a lint violation found by a rule.  In --fix mode, apply
> > +# the regex substitution on the raw line and report success.
> > +# Otherwise just report.  Returns 1 if the line was modified.
> > +sub report_violation {
> > +       my ($file, $cmd, $line_ref, $match, $fix, $from) = @_;
> > +       my $lineno = $cmd->{lineno};
> > +       my $display = join(' ', @{$cmd->{tokens}});
> > +       $has_fixable++;  # count for the "--fix" hint
> > +       if ($fix_mode) {
> > +               if ($$line_ref =~ s/$match/$fix/) {
> > +                       err $file, $lineno, $display,
> > +                               "replace '$from' with '$fix'",
> > +                               fixable => 1;
> > +                       return 1;
> > +               }
> > +               err $file, $lineno, $display,
> > +                       "replace '$from' with '$fix' (could not auto-fix)";
> > +       } else {
> > +               err $file, $lineno, $display,
> > +                       "replace '$from' with '$fix'";
> > +       }
> > +       return 0;
> > +}
> > +
> > +# Split a token stream into commands at &&, ||, ;;, and \n.
> > +sub parse_commands {
> > +       my ($content) = @_;
> > +       my $parser = ShellParser->new(\$content);
> > +       my @all_tokens = $parser->parse();
> > +
> > +       my @commands;
> > +       my @current;
> > +       my $lineno = 1;
> > +
> > +       for (my $ti = 0; $ti < @all_tokens; $ti++) {
> > +               my $text = $all_tokens[$ti]->[0];
> > +               if ($text =~ /^(?:&&|\|\||;;|\n)$/) {
> > +                       if (@current) {
> > +                               push @commands, {
> > +                                       tokens => [@current],
> > +                                       lineno => $lineno,
> > +                               };
> > +                               @current = ();
> > +                       }
> > +               } else {
> > +                       $lineno = $all_tokens[$ti]->[3]
> > +                               if !@current && defined $all_tokens[$ti]->[3];
> > +                       push @current, $text;
> > +               }
> > +       }
> > +       if (@current) {
> > +               push @commands, {
> > +                       tokens => [@current],
> > +                       lineno => $lineno,
> > +               };
> > +       }
> > +       return @commands;
> > +}
> > +
> > +# --- Rule: '! test_grep' should be 'test_grep !' ---
> > +# Shell-level negation suppresses test_grep's diagnostic output
> > +# on failure.  Built-in negation preserves it.
> > +sub check_test_grep_negation {
> > +       my ($cmd, $file, $line_ref) = @_;
> > +       my @tokens = @{$cmd->{tokens}};
> > +       return unless @tokens >= 2 && $tokens[0] eq '!' && $tokens[1] eq 'test_grep';
> > +
> > +       return report_violation($file, $cmd, $line_ref,
> > +               qr/!\s*test_grep/, 'test_grep !', '! test_grep');
> > +}
> > +
> > +# Map parsed commands back to raw file lines for --fix.
> > +# Detection uses parsed tokens (correct handling of quoting,
> > +# heredocs, pipes) but fixes must modify the original text
> > +# to preserve formatting.
> > +package LintParser;
> > +
> > +sub check_test {
> > +       # Called by ScriptParser::parse_cmd for each test_expect_success
> > +       # or test_expect_failure block.
> > +       my $self = shift @_;
> > +       my $title = ScriptParser::unwrap(shift @_);
> > +
> > +       # Two test body formats:
> > +       #   Quoted:  test_expect_success 'title' '..body..'
> > +       #   Heredoc: test_expect_success 'title' - <<\EOF
> > +       #              ..body..
> > +       #            EOF
> > +       # For quoted, the body token is the quoted string.
> > +       # For heredoc, the body token is '-' and the actual
> > +       # code arrives as the next argument from the Lexer.
> > +       my $body_token = shift @_;
> > +       my $lineno_base = $body_token->[3] || 1;
> > +       my $body = ScriptParser::unwrap($body_token);
> > +
> > +       if ($body eq '-') {
> > +               my $herebody = shift @_;
> > +               if ($herebody) {
> > +                       $body = $herebody->{content};
> > +                       $lineno_base = $herebody->{start_line} || 1;
> > +               }
> > +       }
> > +       return unless $body;
> > +
> > +       # Map each command back to its file line number.
> > +       # $lineno_base is where the body starts in the file;
> > +       # $cmd->{lineno} is relative to the body (starting at 1).
> > +       my $raw_lines = $self->{raw_lines};
> > +       for my $cmd (main::parse_commands($body)) {
> > +               my $ln = ($cmd->{lineno} || 0) + $lineno_base - 1;
> > +               $cmd->{lineno} = $ln;
> > +               next unless $ln >= 1 && $ln <= @$raw_lines;
> > +               next if $raw_lines->[$ln - 1] =~ /#.*lint-ok/;
> > +
> > +               if (main::check_test_grep_negation($cmd, $self->{file}, \$raw_lines->[$ln - 1])) {
> > +                       $self->{dirty} = 1;
> > +               }
> > +       }
> > +}
> > +
> > +package main;
> > +
> > +for my $file (@ARGV) {
> > +       # :unix:crlf strips \r on Windows (same as chainlint.pl)
> > +       open(my $fh, '<:unix:crlf', $file) or die "$0: $file: $!\n";
> > +       my @raw_lines = <$fh>;
> > +       close $fh;
> > +
> > +       my $parser = LintParser->new(\join('', @raw_lines));
> > +       $parser->{file} = $file;
> > +       $parser->{raw_lines} = \@raw_lines;
> > +       $parser->{dirty} = 0;
> > +       $parser->parse();
> > +
> > +       if ($fix_mode && $parser->{dirty}) {
> > +               open(my $out, '>', $file) or die "$0: $file: $!\n";
> > +               print $out @{$parser->{raw_lines}};
> > +               close $out;
> > +       }
> > +}
> > +
> > +if ($has_fixable && !$fix_mode) {
> > +       print "hint: run with --fix to apply the suggested replacements.\n";
> > +}
> > +exit $exit_code;
> > diff --git a/t/lint-style/heredoc.expect b/t/lint-style/heredoc.expect
> > new file mode 100644
> > index 0000000000..7ff6d4a52d
> > --- /dev/null
> > +++ b/t/lint-style/heredoc.expect
> > @@ -0,0 +1,3 @@
> > +lint-style/heredoc.test:8: error: replace '! test_grep' with 'test_grep !': ! test_grep "after-heredoc-is-caught" actual
> > +lint-style/heredoc.test:13: error: replace '! test_grep' with 'test_grep !': ! test_grep "not-inside-sed-heredoc" actual
> > +hint: run with --fix to apply the suggested replacements.
> > diff --git a/t/lint-style/heredoc.test b/t/lint-style/heredoc.test
> > new file mode 100644
> > index 0000000000..4c05831cfb
> > --- /dev/null
> > +++ b/t/lint-style/heredoc.test
> > @@ -0,0 +1,14 @@
> > +test_expect_success 'greps inside heredocs are skipped' '
> > +       cat <<-EOF &&
> > +       grep "inside-strip-tabs" file
> > +       EOF
> > +       cat <<-\EOF &&
> > +       grep "inside-no-expand" file
> > +       EOF
> > +       ! test_grep "after-heredoc-is-caught" actual
> > +'
> > +
> > +test_expect_success 'sed with << does not start a heredoc' '
> > +       sed "s/<< foo/bar/" file &&
> > +       ! test_grep "not-inside-sed-heredoc" actual
> > +'
> > diff --git a/t/lint-style/test-grep-negation-fix.expect b/t/lint-style/test-grep-negation-fix.expect
> > new file mode 100644
> > index 0000000000..28ecde1073
> > --- /dev/null
> > +++ b/t/lint-style/test-grep-negation-fix.expect
> > @@ -0,0 +1,4 @@
> > +test_expect_success 'negated test_grep' '
> > +       test_grep ! "pattern" actual &&
> > +       test_grep ! -i "insensitive" actual
> > +'
> > diff --git a/t/lint-style/test-grep-negation-fix.test b/t/lint-style/test-grep-negation-fix.test
> > new file mode 100644
> > index 0000000000..571c150031
> > --- /dev/null
> > +++ b/t/lint-style/test-grep-negation-fix.test
> > @@ -0,0 +1,4 @@
> > +test_expect_success 'negated test_grep' '
> > +       ! test_grep "pattern" actual &&
> > +       ! test_grep -i "insensitive" actual
> > +'
> > diff --git a/t/lint-style/test-grep-negation.expect b/t/lint-style/test-grep-negation.expect
> > new file mode 100644
> > index 0000000000..1fa9e124aa
> > --- /dev/null
> > +++ b/t/lint-style/test-grep-negation.expect
> > @@ -0,0 +1,3 @@
> > +lint-style/test-grep-negation.test:2: error: replace '! test_grep' with 'test_grep !': ! test_grep "pattern" actual
> > +lint-style/test-grep-negation.test:3: error: replace '! test_grep' with 'test_grep !': ! test_grep -i "insensitive" actual
> > +hint: run with --fix to apply the suggested replacements.
> > diff --git a/t/lint-style/test-grep-negation.test b/t/lint-style/test-grep-negation.test
> > new file mode 100644
> > index 0000000000..571c150031
> > --- /dev/null
> > +++ b/t/lint-style/test-grep-negation.test
> > @@ -0,0 +1,4 @@
> > +test_expect_success 'negated test_grep' '
> > +       ! test_grep "pattern" actual &&
> > +       ! test_grep -i "insensitive" actual
> > +'
> > diff --git a/t/t0031-lockfile-pid.sh b/t/t0031-lockfile-pid.sh
> > index 8ef87addf5..e9e2f04049 100755
> > --- a/t/t0031-lockfile-pid.sh
> > +++ b/t/t0031-lockfile-pid.sh
> > @@ -29,7 +29,7 @@ test_expect_success 'PID info not shown by default' '
> >                 test_must_fail git add . 2>err &&
> >                 # Should not crash, just show normal error without PID
> >                 test_grep "Unable to create" err &&
> > -               ! test_grep "is held by process" err
> > +               test_grep ! "is held by process" err
> >         )
> >  '
> >
> > diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> > index 73445782e7..3179b4963e 100755
> > --- a/t/t5300-pack-object.sh
> > +++ b/t/t5300-pack-object.sh
> > @@ -720,7 +720,7 @@ test_expect_success '--name-hash-version=2 and --write-bitmap-index are incompat
> >
> >         # --stdout option silently removes --write-bitmap-index
> >         git pack-objects --stdout --all --name-hash-version=2 --write-bitmap-index >out 2>err &&
> > -       ! test_grep "currently, --write-bitmap-index requires --name-hash-version=1" err
> > +       test_grep ! "currently, --write-bitmap-index requires --name-hash-version=1" err
> >  '
> >
> >  test_expect_success '--path-walk pack everything' '
> > diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> > index fa0d4046f7..9154d9795f 100755
> > --- a/t/t5319-multi-pack-index.sh
> > +++ b/t/t5319-multi-pack-index.sh
> > @@ -1175,7 +1175,7 @@ test_expect_success 'load reverse index when missing .idx, .pack' '
> >
> >  test_expect_success 'usage shown without sub-command' '
> >         test_expect_code 129 git multi-pack-index 2>err &&
> > -       ! test_grep "unrecognized subcommand" err
> > +       test_grep ! "unrecognized subcommand" err
> >  '
> >
> >  test_expect_success 'complains when run outside of a repository' '
> > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> > index d7f82e1bec..9db4a76f67 100755
> > --- a/t/t7900-maintenance.sh
> > +++ b/t/t7900-maintenance.sh
> > @@ -664,7 +664,7 @@ test_geometric_repack_needed () {
> >         true)
> >                 test_grep "\[\"git\",\"repack\"," trace2.txt;;
> >         false)
> > -               ! test_grep "\[\"git\",\"repack\"," trace2.txt;;
> > +               test_grep ! "\[\"git\",\"repack\"," trace2.txt;;
> >         *)
> >                 BUG "invalid parameter: $NEEDED";;
> >         esac
> > --
> > gitgitgadget
> >
>
>
> --
> D. Ben Knoble

^ permalink raw reply

* Message from Debaashish Nandi
From: Debaashish Nandi @ 2026-06-04 18:37 UTC (permalink / raw)
  To: git
In-Reply-To: <fcb34925-f10c-4fc4-826d-a55217101ec8@localhost>

[-- Attachment #1: Type: text/plain, Size: 71 bytes --]

Hello, is there any game playing which l can understand how git works ?

^ permalink raw reply

* Re: [PATCH 4/6] t: add lint-style.pl with test_grep negation rule
From: D. Ben Knoble @ 2026-06-04 18:34 UTC (permalink / raw)
  To: Michael Montalbo via GitGitGadget; +Cc: git, Eric Sunshine, Michael Montalbo
In-Reply-To: <c1b90101ef5c38a21fc901bd7387acf83eb96806.1780559158.git.gitgitgadget@gmail.com>

Hi Michael,

This sounds like a neat effort!

One drive-by comment…

On Thu, Jun 4, 2026 at 3:46 AM Michael Montalbo via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Michael Montalbo <mmontalbo@gmail.com>
>
> Add a mechanical lint checker for test scripts, similar in spirit to
> check-non-portable-shell.pl but focused on test conventions rather
> than portability.
>
> The tool defines LintParser, a subclass of ScriptParser (from the
> shared lib-shell-parser.pl module).  ScriptParser's
> parse_cmd() finds test_expect_success blocks and calls check_test()
> for each body; LintParser overrides check_test() to run lint rules
> on the parsed commands.  A "# lint-ok" comment suppresses all
> checks for intentional style violations.
>
> The first rule detects '! test_grep' and replaces it with
> 'test_grep !'.  Shell-level negation suppresses the diagnostic
> output that test_grep prints on failure; the built-in negation
> preserves it.
>
> Three violations inside test bodies are converted via --fix.  One
> additional violation in a helper function outside test_expect_success
> (t7900's test_geometric_repack_needed) is converted manually, since
> the parser only processes test bodies.
>
> Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
> ---
>  t/.gitattributes                           |   2 +
>  t/Makefile                                 |  32 +++-
>  t/lint-style.pl                            | 200 +++++++++++++++++++++
>  t/lint-style/heredoc.expect                |   3 +
>  t/lint-style/heredoc.test                  |  14 ++
>  t/lint-style/test-grep-negation-fix.expect |   4 +
>  t/lint-style/test-grep-negation-fix.test   |   4 +
>  t/lint-style/test-grep-negation.expect     |   3 +
>  t/lint-style/test-grep-negation.test       |   4 +
>  t/t0031-lockfile-pid.sh                    |   2 +-
>  t/t5300-pack-object.sh                     |   2 +-
>  t/t5319-multi-pack-index.sh                |   2 +-
>  t/t7900-maintenance.sh                     |   2 +-
>  13 files changed, 268 insertions(+), 6 deletions(-)
>  create mode 100755 t/lint-style.pl
>  create mode 100644 t/lint-style/heredoc.expect
>  create mode 100644 t/lint-style/heredoc.test
>  create mode 100644 t/lint-style/test-grep-negation-fix.expect
>  create mode 100644 t/lint-style/test-grep-negation-fix.test
>  create mode 100644 t/lint-style/test-grep-negation.expect
>  create mode 100644 t/lint-style/test-grep-negation.test
>
> diff --git a/t/.gitattributes b/t/.gitattributes
> index 7664c6e027..aea6889d03 100644
> --- a/t/.gitattributes
> +++ b/t/.gitattributes
> @@ -1,5 +1,7 @@
>  t[0-9][0-9][0-9][0-9]/* -whitespace
>  /chainlint/*.expect eol=lf -whitespace
> +/lint-style/*.expect eol=lf -whitespace
> +/lint-style/*.test eol=lf -whitespace
>  /t0110/url-* binary
>  /t3206/* eol=lf
>  /t3900/*.txt eol=lf
> diff --git a/t/Makefile b/t/Makefile
> index 25f923fed9..3a5fa4ce37 100644
> --- a/t/Makefile
> +++ b/t/Makefile
> @@ -46,6 +46,7 @@ TPERF = $(sort $(wildcard perf/p[0-9][0-9][0-9][0-9]-*.sh))
>  TINTEROP = $(sort $(wildcard interop/i[0-9][0-9][0-9][0-9]-*.sh))
>  CHAINLINTTESTS = $(sort $(patsubst chainlint/%.test,%,$(wildcard chainlint/*.test)))
>  CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl
> +LINT_STYLE_TESTS = $(sort $(wildcard lint-style/*.test))
>  UNIT_TEST_SOURCES = $(wildcard unit-tests/t-*.c)
>  UNIT_TEST_PROGRAMS = $(patsubst unit-tests/%.c,unit-tests/bin/%$(X),$(UNIT_TEST_SOURCES))
>  UNIT_TEST_PROGRAMS += unit-tests/bin/unit-tests$(X)
> @@ -139,7 +140,7 @@ check-meson:
>  test-lint: test-lint-duplicates test-lint-executable \
>         test-lint-filenames
>  ifneq ($(PERL_PATH),)
> -test-lint: test-lint-shell-syntax check-shell-parser
> +test-lint: test-lint-shell-syntax test-lint-style check-lint-style check-shell-parser
>  else
>  GIT_TEST_CHAIN_LINT = 0
>  endif
> @@ -162,6 +163,32 @@ test-lint-shell-syntax:
>
>  check-shell-parser:
>         @'$(PERL_PATH_SQ)' check-shell-parser.pl
> +
> +test-lint-style:
> +       @'$(PERL_PATH_SQ)' lint-style.pl $(T) $(THELPERS) $(TPERF)
> +
> +check-lint-style:
> +       @rc=0; for t in $(LINT_STYLE_TESTS); do \
> +               base=$${t%.test}; \
> +               case $$base in \
> +               *-fix) \
> +                       cp "$$t" "$$t.tmp" && \
> +                       '$(PERL_PATH_SQ)' lint-style.pl --fix "$$t.tmp" >/dev/null 2>&1; \
> +                       fix_rc=$$?; \
> +                       if test $$fix_rc != 0; then \
> +                               echo "FAIL: $$t (--fix exit code $$fix_rc)"; rc=1; \
> +                       elif ! diff -u "$$base.expect" "$$t.tmp"; then \
> +                               echo "FAIL: $$t (--fix output)"; rc=1; \
> +                       fi; \
> +                       rm -f "$$t.tmp" ;; \
> +               *) \
> +                       if ! '$(PERL_PATH_SQ)' lint-style.pl "$$t" 2>&1 | \
> +                               diff -u "$$base.expect" -; then \
> +                               echo "FAIL: $$t"; rc=1; \
> +                       fi ;; \
> +               esac; \
> +       done; test $$rc = 0
> +

…I wonder if it would be easier to maintain this recipe as a separate
shell script and have make give LINT_STYLE_TESTS and PERL_PATH (w/o
SQ? idk) to the script. That's a lot of inline code otherwise!

>  test-lint-filenames:
>         @# We do *not* pass a glob to ls-files but use grep instead, to catch
>         @# non-ASCII characters (which are quoted within double-quotes)
> @@ -188,7 +215,8 @@ perf:
>
>  .PHONY: pre-clean $(T) aggregate-results clean valgrind perf \
>         check-chainlint clean-chainlint test-chainlint \
> -       check-shell-parser $(UNIT_TESTS)
> +       check-shell-parser \
> +       check-lint-style test-lint-style $(UNIT_TESTS)
>
>  .PHONY: libgit-sys-test libgit-rs-test
>  libgit-sys-test:
> diff --git a/t/lint-style.pl b/t/lint-style.pl
> new file mode 100755
> index 0000000000..9268577f9b
> --- /dev/null
> +++ b/t/lint-style.pl
> @@ -0,0 +1,200 @@
> +#!/usr/bin/perl
> +
> +# Check test scripts for style violations that can be detected
> +# mechanically, such as using bare 'grep' where test_grep should
> +# be used.  Use --fix to automatically apply suggested replacements.
> +#
> +# Detection uses parsed tokens from the shared shell parser for
> +# correct handling of heredocs, $(...), pipes, and quoting.
> +# Fixes modify the original file text to preserve formatting.
> +
> +use strict;
> +use warnings;
> +use File::Basename;
> +# Force LF output so check-lint-style's diff against the
> +# pre-committed .expect files works on Windows.
> +binmode(STDOUT, ':unix');
> +binmode(STDERR, ':unix');
> +
> +my $fix_mode = 0;
> +if (@ARGV && $ARGV[0] eq '--fix') {
> +       $fix_mode = 1;
> +       shift @ARGV;
> +}
> +
> +# Load the shared shell parser (Lexer, ShellParser, ScriptParser).
> +my $_lib = dirname($0) . "/lib-shell-parser.pl";
> +$_lib = "./$_lib" unless $_lib =~ m{^/};
> +do $_lib or die "$0: failed to load $_lib: $@$!\n";
> +
> +# LintParser is a subclass of ScriptParser which runs lint rules
> +# on each test body.  Per-file state (file name, raw lines, dirty
> +# flag) is stored on the instance before calling parse().
> +#
> +# Subroutines defined below (parse_commands, check_test_grep_negation,
> +# etc.) are in package main and called with the main:: prefix.
> +# File-scoped lexicals ($fix_mode, $has_fixable, etc.) are visible
> +# across packages since 'package' does not introduce a new scope.
> +package LintParser;
> +our @ISA = ('ScriptParser');
> +
> +package main;
> +
> +my $exit_code = 0;
> +my $has_fixable = 0;
> +
> +sub err {
> +       my ($file, $lineno, $line, $msg, %opts) = @_;
> +       $line =~ s/^\s+//;
> +       $line =~ s/\s+$//;
> +       $line =~ s/\s+/ /g;
> +       my $prefix = ($fix_mode && $opts{fixable}) ? 'fixed' : 'error';
> +       print "$file:$lineno: $prefix: $msg: $line\n";
> +       $exit_code = 1 unless $fix_mode && $opts{fixable};
> +}
> +
> +# Report a lint violation found by a rule.  In --fix mode, apply
> +# the regex substitution on the raw line and report success.
> +# Otherwise just report.  Returns 1 if the line was modified.
> +sub report_violation {
> +       my ($file, $cmd, $line_ref, $match, $fix, $from) = @_;
> +       my $lineno = $cmd->{lineno};
> +       my $display = join(' ', @{$cmd->{tokens}});
> +       $has_fixable++;  # count for the "--fix" hint
> +       if ($fix_mode) {
> +               if ($$line_ref =~ s/$match/$fix/) {
> +                       err $file, $lineno, $display,
> +                               "replace '$from' with '$fix'",
> +                               fixable => 1;
> +                       return 1;
> +               }
> +               err $file, $lineno, $display,
> +                       "replace '$from' with '$fix' (could not auto-fix)";
> +       } else {
> +               err $file, $lineno, $display,
> +                       "replace '$from' with '$fix'";
> +       }
> +       return 0;
> +}
> +
> +# Split a token stream into commands at &&, ||, ;;, and \n.
> +sub parse_commands {
> +       my ($content) = @_;
> +       my $parser = ShellParser->new(\$content);
> +       my @all_tokens = $parser->parse();
> +
> +       my @commands;
> +       my @current;
> +       my $lineno = 1;
> +
> +       for (my $ti = 0; $ti < @all_tokens; $ti++) {
> +               my $text = $all_tokens[$ti]->[0];
> +               if ($text =~ /^(?:&&|\|\||;;|\n)$/) {
> +                       if (@current) {
> +                               push @commands, {
> +                                       tokens => [@current],
> +                                       lineno => $lineno,
> +                               };
> +                               @current = ();
> +                       }
> +               } else {
> +                       $lineno = $all_tokens[$ti]->[3]
> +                               if !@current && defined $all_tokens[$ti]->[3];
> +                       push @current, $text;
> +               }
> +       }
> +       if (@current) {
> +               push @commands, {
> +                       tokens => [@current],
> +                       lineno => $lineno,
> +               };
> +       }
> +       return @commands;
> +}
> +
> +# --- Rule: '! test_grep' should be 'test_grep !' ---
> +# Shell-level negation suppresses test_grep's diagnostic output
> +# on failure.  Built-in negation preserves it.
> +sub check_test_grep_negation {
> +       my ($cmd, $file, $line_ref) = @_;
> +       my @tokens = @{$cmd->{tokens}};
> +       return unless @tokens >= 2 && $tokens[0] eq '!' && $tokens[1] eq 'test_grep';
> +
> +       return report_violation($file, $cmd, $line_ref,
> +               qr/!\s*test_grep/, 'test_grep !', '! test_grep');
> +}
> +
> +# Map parsed commands back to raw file lines for --fix.
> +# Detection uses parsed tokens (correct handling of quoting,
> +# heredocs, pipes) but fixes must modify the original text
> +# to preserve formatting.
> +package LintParser;
> +
> +sub check_test {
> +       # Called by ScriptParser::parse_cmd for each test_expect_success
> +       # or test_expect_failure block.
> +       my $self = shift @_;
> +       my $title = ScriptParser::unwrap(shift @_);
> +
> +       # Two test body formats:
> +       #   Quoted:  test_expect_success 'title' '..body..'
> +       #   Heredoc: test_expect_success 'title' - <<\EOF
> +       #              ..body..
> +       #            EOF
> +       # For quoted, the body token is the quoted string.
> +       # For heredoc, the body token is '-' and the actual
> +       # code arrives as the next argument from the Lexer.
> +       my $body_token = shift @_;
> +       my $lineno_base = $body_token->[3] || 1;
> +       my $body = ScriptParser::unwrap($body_token);
> +
> +       if ($body eq '-') {
> +               my $herebody = shift @_;
> +               if ($herebody) {
> +                       $body = $herebody->{content};
> +                       $lineno_base = $herebody->{start_line} || 1;
> +               }
> +       }
> +       return unless $body;
> +
> +       # Map each command back to its file line number.
> +       # $lineno_base is where the body starts in the file;
> +       # $cmd->{lineno} is relative to the body (starting at 1).
> +       my $raw_lines = $self->{raw_lines};
> +       for my $cmd (main::parse_commands($body)) {
> +               my $ln = ($cmd->{lineno} || 0) + $lineno_base - 1;
> +               $cmd->{lineno} = $ln;
> +               next unless $ln >= 1 && $ln <= @$raw_lines;
> +               next if $raw_lines->[$ln - 1] =~ /#.*lint-ok/;
> +
> +               if (main::check_test_grep_negation($cmd, $self->{file}, \$raw_lines->[$ln - 1])) {
> +                       $self->{dirty} = 1;
> +               }
> +       }
> +}
> +
> +package main;
> +
> +for my $file (@ARGV) {
> +       # :unix:crlf strips \r on Windows (same as chainlint.pl)
> +       open(my $fh, '<:unix:crlf', $file) or die "$0: $file: $!\n";
> +       my @raw_lines = <$fh>;
> +       close $fh;
> +
> +       my $parser = LintParser->new(\join('', @raw_lines));
> +       $parser->{file} = $file;
> +       $parser->{raw_lines} = \@raw_lines;
> +       $parser->{dirty} = 0;
> +       $parser->parse();
> +
> +       if ($fix_mode && $parser->{dirty}) {
> +               open(my $out, '>', $file) or die "$0: $file: $!\n";
> +               print $out @{$parser->{raw_lines}};
> +               close $out;
> +       }
> +}
> +
> +if ($has_fixable && !$fix_mode) {
> +       print "hint: run with --fix to apply the suggested replacements.\n";
> +}
> +exit $exit_code;
> diff --git a/t/lint-style/heredoc.expect b/t/lint-style/heredoc.expect
> new file mode 100644
> index 0000000000..7ff6d4a52d
> --- /dev/null
> +++ b/t/lint-style/heredoc.expect
> @@ -0,0 +1,3 @@
> +lint-style/heredoc.test:8: error: replace '! test_grep' with 'test_grep !': ! test_grep "after-heredoc-is-caught" actual
> +lint-style/heredoc.test:13: error: replace '! test_grep' with 'test_grep !': ! test_grep "not-inside-sed-heredoc" actual
> +hint: run with --fix to apply the suggested replacements.
> diff --git a/t/lint-style/heredoc.test b/t/lint-style/heredoc.test
> new file mode 100644
> index 0000000000..4c05831cfb
> --- /dev/null
> +++ b/t/lint-style/heredoc.test
> @@ -0,0 +1,14 @@
> +test_expect_success 'greps inside heredocs are skipped' '
> +       cat <<-EOF &&
> +       grep "inside-strip-tabs" file
> +       EOF
> +       cat <<-\EOF &&
> +       grep "inside-no-expand" file
> +       EOF
> +       ! test_grep "after-heredoc-is-caught" actual
> +'
> +
> +test_expect_success 'sed with << does not start a heredoc' '
> +       sed "s/<< foo/bar/" file &&
> +       ! test_grep "not-inside-sed-heredoc" actual
> +'
> diff --git a/t/lint-style/test-grep-negation-fix.expect b/t/lint-style/test-grep-negation-fix.expect
> new file mode 100644
> index 0000000000..28ecde1073
> --- /dev/null
> +++ b/t/lint-style/test-grep-negation-fix.expect
> @@ -0,0 +1,4 @@
> +test_expect_success 'negated test_grep' '
> +       test_grep ! "pattern" actual &&
> +       test_grep ! -i "insensitive" actual
> +'
> diff --git a/t/lint-style/test-grep-negation-fix.test b/t/lint-style/test-grep-negation-fix.test
> new file mode 100644
> index 0000000000..571c150031
> --- /dev/null
> +++ b/t/lint-style/test-grep-negation-fix.test
> @@ -0,0 +1,4 @@
> +test_expect_success 'negated test_grep' '
> +       ! test_grep "pattern" actual &&
> +       ! test_grep -i "insensitive" actual
> +'
> diff --git a/t/lint-style/test-grep-negation.expect b/t/lint-style/test-grep-negation.expect
> new file mode 100644
> index 0000000000..1fa9e124aa
> --- /dev/null
> +++ b/t/lint-style/test-grep-negation.expect
> @@ -0,0 +1,3 @@
> +lint-style/test-grep-negation.test:2: error: replace '! test_grep' with 'test_grep !': ! test_grep "pattern" actual
> +lint-style/test-grep-negation.test:3: error: replace '! test_grep' with 'test_grep !': ! test_grep -i "insensitive" actual
> +hint: run with --fix to apply the suggested replacements.
> diff --git a/t/lint-style/test-grep-negation.test b/t/lint-style/test-grep-negation.test
> new file mode 100644
> index 0000000000..571c150031
> --- /dev/null
> +++ b/t/lint-style/test-grep-negation.test
> @@ -0,0 +1,4 @@
> +test_expect_success 'negated test_grep' '
> +       ! test_grep "pattern" actual &&
> +       ! test_grep -i "insensitive" actual
> +'
> diff --git a/t/t0031-lockfile-pid.sh b/t/t0031-lockfile-pid.sh
> index 8ef87addf5..e9e2f04049 100755
> --- a/t/t0031-lockfile-pid.sh
> +++ b/t/t0031-lockfile-pid.sh
> @@ -29,7 +29,7 @@ test_expect_success 'PID info not shown by default' '
>                 test_must_fail git add . 2>err &&
>                 # Should not crash, just show normal error without PID
>                 test_grep "Unable to create" err &&
> -               ! test_grep "is held by process" err
> +               test_grep ! "is held by process" err
>         )
>  '
>
> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index 73445782e7..3179b4963e 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -720,7 +720,7 @@ test_expect_success '--name-hash-version=2 and --write-bitmap-index are incompat
>
>         # --stdout option silently removes --write-bitmap-index
>         git pack-objects --stdout --all --name-hash-version=2 --write-bitmap-index >out 2>err &&
> -       ! test_grep "currently, --write-bitmap-index requires --name-hash-version=1" err
> +       test_grep ! "currently, --write-bitmap-index requires --name-hash-version=1" err
>  '
>
>  test_expect_success '--path-walk pack everything' '
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index fa0d4046f7..9154d9795f 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -1175,7 +1175,7 @@ test_expect_success 'load reverse index when missing .idx, .pack' '
>
>  test_expect_success 'usage shown without sub-command' '
>         test_expect_code 129 git multi-pack-index 2>err &&
> -       ! test_grep "unrecognized subcommand" err
> +       test_grep ! "unrecognized subcommand" err
>  '
>
>  test_expect_success 'complains when run outside of a repository' '
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index d7f82e1bec..9db4a76f67 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -664,7 +664,7 @@ test_geometric_repack_needed () {
>         true)
>                 test_grep "\[\"git\",\"repack\"," trace2.txt;;
>         false)
> -               ! test_grep "\[\"git\",\"repack\"," trace2.txt;;
> +               test_grep ! "\[\"git\",\"repack\"," trace2.txt;;
>         *)
>                 BUG "invalid parameter: $NEEDED";;
>         esac
> --
> gitgitgadget
>


-- 
D. Ben Knoble

^ permalink raw reply

* [PATCH 6/6] hash-object: add a >4GB/LLP64 test case using filtered input
From: Philip Oakley via GitGitGadget @ 2026-06-04 17:15 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Philip Oakley
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

To verify that the `clean` side of the `clean`/`smudge` filter code is
correct with regards to LLP64 (read: to ensure that `size_t` is used
instead of `unsigned long`), here is a test case using a trivial filter,
specifically _not_ writing anything to the object store to limit the
scope of the test case.

As in previous commits, the `big` file from previous test cases is
reused if available, to save setup time, otherwise re-generated.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index f2722380ee..841a6671d1 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -285,4 +285,16 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
 	test_cmp expect actual
 '
 
+# This clean filter does nothing, other than excercising the interface.
+# We ensure that cleaning doesn't mangle large files on 64-bit Windows.
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+		'hash filtered files over 4GB correctly' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	test_config filter.null-filter.clean "cat" &&
+	echo "big filter=null-filter" >.gitattributes &&
+	git hash-object -- big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH 5/6] hash-object: add another >4GB/LLP64 test case
From: Philip Oakley via GitGitGadget @ 2026-06-04 17:15 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Philip Oakley
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

To complement the `--stdin` and `--literally` test cases that verify
that we can hash files larger than 4GB on 64-bit platforms using the
LLP64 data model, here is a test case that exercises `hash-object`
_without_ any options.

Just as before, we use the `big` file from the previous test case if it
exists to save on setup time, otherwise generate it.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 59efee3aff..f2722380ee 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -277,4 +277,12 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
 	test_cmp expect actual
 '
 
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+		'files over 4GB hash correctly' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	git hash-object -- big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 4/6] hash-object --stdin: verify that it works with >4GB/LLP64
From: Philip Oakley via GitGitGadget @ 2026-06-04 17:15 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Philip Oakley
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Just like the `hash-object --literally` code path, the `--stdin` code
path also needs to use `size_t` instead of `unsigned long` to represent
memory sizes, otherwise it would cause problems on platforms using the
LLP64 data model (such as Windows).

To limit the scope of the test case, the object is explicitly not
written to the object store, nor are any filters applied.

The `big` file from the previous test case is reused to save setup time;
To avoid relying on that side effect, it is generated if it does not
exist (e.g. when running via `sh t1007-*.sh --long --run=1,41`).

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 10382a815e..59efee3aff 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -269,4 +269,12 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
 	test_cmp expect actual
 '

+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+		'files over 4GB hash correctly via --stdin' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	git hash-object --stdin <big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH 3/6] hash algorithms: use size_t for section lengths
From: Philip Oakley via GitGitGadget @ 2026-06-04 17:15 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Philip Oakley
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Continue walking the code path for the >4GB `hash-object --literally`
test to the hash algorithm step for LLP64 systems.

This patch lets the SHA1DC code use `size_t`, making it compatible with
LLP64 data models (as used e.g. by Windows).

The interested reader of this patch will note that we adjust the
signature of the `git_SHA1DCUpdate()` function without updating _any_
call site. This certainly puzzled at least one reviewer already, so here
is an explanation:

This function is never called directly, but always via the macro
`platform_SHA1_Update`, which is usually called via the macro
`git_SHA1_Update`. However, we never call `git_SHA1_Update()` directly
in `struct git_hash_algo`. Instead, we call `git_hash_sha1_update()`,
which is defined thusly:

    static void git_hash_sha1_update(git_hash_ctx *ctx,
                                     const void *data, size_t len)
    {
        git_SHA1_Update(&ctx->sha1, data, len);
    }

i.e. it contains an implicit downcast from `size_t` to `unsigned long`
(before this here patch). With this patch, there is no downcast anymore.

With this patch, finally, the t1007-hash-object.sh "files over 4GB hash
literally" test case is fixed.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object-file.c          | 4 ++--
 sha1dc_git.c           | 3 +--
 sha1dc_git.h           | 2 +-
 t/t1007-hash-object.sh | 2 +-
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 1f5f9daf24..c648cecd80 100644
--- a/object-file.c
+++ b/object-file.c
@@ -561,7 +561,7 @@ int odb_source_loose_read_object_info(struct odb_source *source,
 }
 
 static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
-			     const void *buf, unsigned long len,
+			     const void *buf, size_t len,
 			     struct object_id *oid,
 			     char *hdr, size_t *hdrlen)
 {
@@ -581,7 +581,7 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
 	/* Generate the header */
 	*hdrlen = format_object_header(hdr, *hdrlen, type, len);
 
-	/* Sha1.. */
+	/* Hash (function pointers) computation */
 	hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen);
 }
 
diff --git a/sha1dc_git.c b/sha1dc_git.c
index 9b675a046e..fe58d7962a 100644
--- a/sha1dc_git.c
+++ b/sha1dc_git.c
@@ -27,10 +27,9 @@ void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx)
 /*
  * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
  */
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, size_t len)
 {
 	const char *data = vdata;
-	/* We expect an unsigned long, but sha1dc only takes an int */
 	while (len > INT_MAX) {
 		SHA1DCUpdate(ctx, data, INT_MAX);
 		data += INT_MAX;
diff --git a/sha1dc_git.h b/sha1dc_git.h
index f6f880cabe..0bcf1aa84b 100644
--- a/sha1dc_git.h
+++ b/sha1dc_git.h
@@ -15,7 +15,7 @@ void git_SHA1DCInit(SHA1_CTX *);
 #endif
 
 void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, size_t len);
 
 #define platform_SHA_IS_SHA1DC /* used by "test-tool sha1-is-sha1dc" */
 
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 7867fd1dbf..10382a815e 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -261,7 +261,7 @@ test_expect_success '--stdin outside of repository (uses default hash)' '
 	test_cmp expect actual
 '
 
-test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
 		'files over 4GB hash literally' '
 	test-tool genzeros $((5*1024*1024*1024)) >big &&
 	test_oid large5GB >expect &&
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 2/6] object-file.c: use size_t for header lengths
From: Philip Oakley via GitGitGadget @ 2026-06-04 17:15 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Philip Oakley
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Continue walking the code path for the >4GB `hash-object --literally`
test. The `hash_object_file_literally()` function internally uses both
`hash_object_file()` and `write_object_file_prepare()`. Both function
signatures use `unsigned long` rather than `size_t` for the mem buffer
sizes. Use `size_t` instead, for LLP64 compatibility.

While at it, convert those function's object's header buffer length to
`size_t` for consistency. The value is already upcast to `uintmax_t` for
print format compatibility.

Note: The hash-object test still does not pass. A subsequent commit
continues to walk the call tree's lower level hash functions to identify
further fixes.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object-file.c | 14 +++++++-------
 object-file.h |  4 ++--
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/object-file.c b/object-file.c
index 90f995d000..1f5f9daf24 100644
--- a/object-file.c
+++ b/object-file.c
@@ -563,7 +563,7 @@ int odb_source_loose_read_object_info(struct odb_source *source,
 static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
 			     const void *buf, unsigned long len,
 			     struct object_id *oid,
-			     char *hdr, int *hdrlen)
+			     char *hdr, size_t *hdrlen)
 {
 	algo->init_fn(c);
 	git_hash_update(c, hdr, *hdrlen);
@@ -572,9 +572,9 @@ static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_c
 }
 
 static void write_object_file_prepare(const struct git_hash_algo *algo,
-				      const void *buf, unsigned long len,
+				      const void *buf, size_t len,
 				      enum object_type type, struct object_id *oid,
-				      char *hdr, int *hdrlen)
+				      char *hdr, size_t *hdrlen)
 {
 	struct git_hash_ctx c;
 
@@ -717,11 +717,11 @@ out:
 }
 
 void hash_object_file(const struct git_hash_algo *algo, const void *buf,
-		      unsigned long len, enum object_type type,
+		      size_t len, enum object_type type,
 		      struct object_id *oid)
 {
 	char hdr[MAX_HEADER_LEN];
-	int hdrlen = sizeof(hdr);
+	size_t hdrlen = sizeof(hdr);
 
 	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
 }
@@ -1177,7 +1177,7 @@ cleanup:
 }
 
 int odb_source_loose_write_object(struct odb_source *source,
-				  const void *buf, unsigned long len,
+				  const void *buf, size_t len,
 				  enum object_type type, struct object_id *oid,
 				  struct object_id *compat_oid_in,
 				  enum odb_write_object_flags flags)
@@ -1186,7 +1186,7 @@ int odb_source_loose_write_object(struct odb_source *source,
 	const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
 	struct object_id compat_oid;
 	char hdr[MAX_HEADER_LEN];
-	int hdrlen = sizeof(hdr);
+	size_t hdrlen = sizeof(hdr);
 
 	/* Generate compat_oid */
 	if (compat) {
diff --git a/object-file.h b/object-file.h
index 5241b8dd5c..e1e22d512d 100644
--- a/object-file.h
+++ b/object-file.h
@@ -66,7 +66,7 @@ int odb_source_loose_freshen_object(struct odb_source *source,
 				    const struct object_id *oid);
 
 int odb_source_loose_write_object(struct odb_source *source,
-				  const void *buf, unsigned long len,
+				  const void *buf, size_t len,
 				  enum object_type type, struct object_id *oid,
 				  struct object_id *compat_oid_in,
 				  enum odb_write_object_flags flags);
@@ -201,7 +201,7 @@ int finalize_object_file_flags(struct repository *repo,
 			       enum finalize_object_file_flags flags);
 
 void hash_object_file(const struct git_hash_algo *algo, const void *buf,
-		      unsigned long len, enum object_type type,
+		      size_t len, enum object_type type,
 		      struct object_id *oid);
 
 /* Helper to check and "touch" a file */
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 1/6] hash-object: demonstrate a >4GB/LLP64 problem
From: Philip Oakley via GitGitGadget @ 2026-06-04 17:15 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Philip Oakley
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

On LLP64 systems, such as Windows, the size of `long`, `int`, etc. is
only 32 bits (for backward compatibility). Git's use of `unsigned long`
for file memory sizes in many places, rather than size_t, limits the
handling of large files on LLP64 systems (commonly given as `>4GB`).

Provide a minimum test for handling a >4GB file. The `hash-object`
command, with the  `--literally` and without `-w` option avoids
writing the object, either loose or packed. This avoids the code paths
hitting the `bigFileThreshold` config test code, the zlib code, and the
pack code.

Subsequent patches will walk the test's call chain, converting types to
`size_t` (which is larger in LLP64 data models) where appropriate.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index de076293b6..7867fd1dbf 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -49,6 +49,9 @@ test_expect_success 'setup' '

 	example sha1:ddd3f836d3e3fbb7ae289aa9ae83536f76956399
 	example sha256:b44fe1fe65589848253737db859bd490453510719d7424daab03daf0767b85ae
+
+	large5GB sha1:0be2be10a4c8764f32c4bf372a98edc731a4b204
+	large5GB sha256:dc18ca621300c8d3cfa505a275641ebab00de189859e022a975056882d313e64
 	EOF
 '

@@ -258,4 +261,12 @@ test_expect_success '--stdin outside of repository (uses default hash)' '
 	test_cmp expect actual
 '

+test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+		'files over 4GB hash literally' '
+	test-tool genzeros $((5*1024*1024*1024)) >big &&
+	test_oid large5GB >expect &&
+	git hash-object --stdin --literally <big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH 0/6] Support hashing objects larger than 4GB on Windows
From: Johannes Schindelin via GitGitGadget @ 2026-06-04 17:15 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

Philip Oakley has contributed these patches ~4.5 years ago, and they have
been carried in Git for Windows ever since.

Now that there are already other patch series flying around that try to
address various aspects about >4GB objects (which aren't handled well by Git
until it stops forcing unsigned long to do size_t's job), it seems a good
time to upstream these patches, too, at long last.

Philip Oakley (6):
  hash-object: demonstrate a >4GB/LLP64 problem
  object-file.c: use size_t for header lengths
  hash algorithms: use size_t for section lengths
  hash-object --stdin: verify that it works with >4GB/LLP64
  hash-object: add another >4GB/LLP64 test case
  hash-object: add a >4GB/LLP64 test case using filtered input

 object-file.c          | 18 +++++++++---------
 object-file.h          |  4 ++--
 sha1dc_git.c           |  3 +--
 sha1dc_git.h           |  2 +-
 t/t1007-hash-object.sh | 39 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 52 insertions(+), 14 deletions(-)


base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2138%2Fdscho%2FPhilipOakley%2Fhashliteral_t-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2138/dscho/PhilipOakley/hashliteral_t-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2138
-- 
gitgitgadget

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox