* [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
@ 2025-10-13 17:46 Okhuomon Ajayi
2025-10-14 3:00 ` Junio C Hamano
2025-10-14 21:18 ` brian m. carlson
0 siblings, 2 replies; 9+ messages in thread
From: Okhuomon Ajayi @ 2025-10-13 17:46 UTC (permalink / raw)
To: git; +Cc: Okhuomon Ajayi
Patch IDs in Git must always use SHA1, regardless of the repository's
object hash. Previously, the code relied on `the_hash_algo` which could
vary depending on the repository, and included a NEEDSWORK comment
suggesting this should be fixed.
This patch updates the comment to clearly state that SHA1 is required
for patch IDs and sets the hash algorithm to SHA1 if it is not already
set. This ensures consistent computation of patch IDs in accordance
with git-patch-id(1).
No functional behavior is changed, but misleading comments are removed
and the code now explicitly enforces correct SHA1 usage for patch IDs.
Signed-off-by: Okhuomon Ajayi <okhuomonajayi54@gmail.com>
---
builtin/patch-id.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index d26e9d0c1e..d47b6f5a3f 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -246,16 +246,11 @@ int cmd_patch_id(int argc,
patch_id_usage, 0);
/*
- * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
- * it means that the hash algorithm now depends on the object hash of
- * the repository, even though git-patch-id(1) clearly defines that
- * patch IDs always use SHA1.
- *
- * NEEDSWORK: This hack should be removed in favor of converting
- * the code that computes patch IDs to always use SHA1.
+ * Patch IDs must always use SHA1, regardless of the repository's
+ * object hash, See git-patch-id(1) for details.
*/
if (!the_hash_algo)
- repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
+ repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
generate_id_list(opts ? opts > 1 : config.stable,
opts ? opts == 3 : config.verbatim);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
2025-10-13 17:46 [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs Okhuomon Ajayi
@ 2025-10-14 3:00 ` Junio C Hamano
2025-10-14 8:04 ` Okhuomon Ajayi
2025-10-14 21:18 ` brian m. carlson
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2025-10-14 3:00 UTC (permalink / raw)
To: Okhuomon Ajayi; +Cc: git
Okhuomon Ajayi <okhuomonajayi54@gmail.com> writes:
> Patch IDs in Git must always use SHA1, regardless of the repository's
> object hash. Previously, the code relied on `the_hash_algo` which could
> vary depending on the repository, and included a NEEDSWORK comment
> suggesting this should be fixed.
I do not think that is what the comment suggests to do.
Read it again:
... should be removed in favor of converting the code that
computes patch IDs to always use SHA1.
There are code paths that compute patch IDs elsewhere, and they are
not immediately below the NEEDSWORK comment. The code relies on
the_hash_algo and that is why the "hack" makes repo_set_hash_algo()
call. The suggestion is to convert that code that hashes patch to
compute patch IDs not to use the_hash_algo that is repository
dependeant. I think get_one_pathcid() function in the same file is
one of them.
> This patch updates the comment to clearly state that SHA1 is required
> for patch IDs and sets the hash algorithm to SHA1 if it is not already
> set. This ensures consistent computation of patch IDs in accordance
> with git-patch-id(1).
And if it is already set? I think what your first paragraph claims
to be problematic is that case, and the patch does not touch that
case at all.
Blindly setting the_hash_algo to SHA-1 may not be the end of the
"solution", so whoever wants to work on this needs to be extra
careful. If the code after this point, starting from the call to
generate_id_list() we see in the post context, need to touch any Git
objects in the current repository (e.g., to obtain patch text or
some configuration data), such accesses need to use the hash that
the repository uses. Only the final "now we have this patch, and we
learned what the configuration says how we should compute the
patch-id. Let's hash the patch text following the specified
algorithm" step should use SHA-1 as the hash algorithm.
Perhaps we are lucky that this program has *no* need to access
objects in the repository (my quick scan says this seems to work on
an external text file and does not generate diffs locally out of
objects), and it may not depend on configuration data coming from
any objects in the repository (there are some configuration variables
whose values are blob object names that instructs Git to read such
an object). In such a case, then the solution may be to always
make the code ignore the_hash_algo and unconditionally using SHA1.
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
2025-10-14 3:00 ` Junio C Hamano
@ 2025-10-14 8:04 ` Okhuomon Ajayi
0 siblings, 0 replies; 9+ messages in thread
From: Okhuomon Ajayi @ 2025-10-14 8:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Thanks for the explanation!
I’ll update the patch so it doesn’t touch the_hash_algo globally, and
instead uses SHA1 only for the patch-ID computation itself. I’ll also
tweak the comment to make it clear that this is just part of the
bigger work to standardize patch-ID handling across Git
On Tue, Oct 14, 2025 at 4:00 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Okhuomon Ajayi <okhuomonajayi54@gmail.com> writes:
>
> > Patch IDs in Git must always use SHA1, regardless of the repository's
> > object hash. Previously, the code relied on `the_hash_algo` which could
> > vary depending on the repository, and included a NEEDSWORK comment
> > suggesting this should be fixed.
>
> I do not think that is what the comment suggests to do.
>
> Read it again:
>
> ... should be removed in favor of converting the code that
> computes patch IDs to always use SHA1.
>
> There are code paths that compute patch IDs elsewhere, and they are
> not immediately below the NEEDSWORK comment. The code relies on
> the_hash_algo and that is why the "hack" makes repo_set_hash_algo()
> call. The suggestion is to convert that code that hashes patch to
> compute patch IDs not to use the_hash_algo that is repository
> dependeant. I think get_one_pathcid() function in the same file is
> one of them.
>
> > This patch updates the comment to clearly state that SHA1 is required
> > for patch IDs and sets the hash algorithm to SHA1 if it is not already
> > set. This ensures consistent computation of patch IDs in accordance
> > with git-patch-id(1).
>
> And if it is already set? I think what your first paragraph claims
> to be problematic is that case, and the patch does not touch that
> case at all.
>
> Blindly setting the_hash_algo to SHA-1 may not be the end of the
> "solution", so whoever wants to work on this needs to be extra
> careful. If the code after this point, starting from the call to
> generate_id_list() we see in the post context, need to touch any Git
> objects in the current repository (e.g., to obtain patch text or
> some configuration data), such accesses need to use the hash that
> the repository uses. Only the final "now we have this patch, and we
> learned what the configuration says how we should compute the
> patch-id. Let's hash the patch text following the specified
> algorithm" step should use SHA-1 as the hash algorithm.
>
> Perhaps we are lucky that this program has *no* need to access
> objects in the repository (my quick scan says this seems to work on
> an external text file and does not generate diffs locally out of
> objects), and it may not depend on configuration data coming from
> any objects in the repository (there are some configuration variables
> whose values are blob object names that instructs Git to read such
> an object). In such a case, then the solution may be to always
> make the code ignore the_hash_algo and unconditionally using SHA1.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
2025-10-13 17:46 [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs Okhuomon Ajayi
2025-10-14 3:00 ` Junio C Hamano
@ 2025-10-14 21:18 ` brian m. carlson
2025-10-14 22:29 ` Junio C Hamano
1 sibling, 1 reply; 9+ messages in thread
From: brian m. carlson @ 2025-10-14 21:18 UTC (permalink / raw)
To: Okhuomon Ajayi; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 2523 bytes --]
On 2025-10-13 at 17:46:58, Okhuomon Ajayi wrote:
> Patch IDs in Git must always use SHA1, regardless of the repository's
> object hash. Previously, the code relied on `the_hash_algo` which could
> vary depending on the repository, and included a NEEDSWORK comment
> suggesting this should be fixed.
>
> This patch updates the comment to clearly state that SHA1 is required
> for patch IDs and sets the hash algorithm to SHA1 if it is not already
> set. This ensures consistent computation of patch IDs in accordance
> with git-patch-id(1).
>
> No functional behavior is changed, but misleading comments are removed
> and the code now explicitly enforces correct SHA1 usage for patch IDs.
>
> Signed-off-by: Okhuomon Ajayi <okhuomonajayi54@gmail.com>
> ---
> builtin/patch-id.c | 11 +++--------
> 1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/patch-id.c b/builtin/patch-id.c
> index d26e9d0c1e..d47b6f5a3f 100644
> --- a/builtin/patch-id.c
> +++ b/builtin/patch-id.c
> @@ -246,16 +246,11 @@ int cmd_patch_id(int argc,
> patch_id_usage, 0);
>
> /*
> - * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
> - * it means that the hash algorithm now depends on the object hash of
> - * the repository, even though git-patch-id(1) clearly defines that
> - * patch IDs always use SHA1.
> - *
> - * NEEDSWORK: This hack should be removed in favor of converting
> - * the code that computes patch IDs to always use SHA1.
> + * Patch IDs must always use SHA1, regardless of the repository's
> + * object hash, See git-patch-id(1) for details.
> */
> if (!the_hash_algo)
> - repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
Hmmm. If I run git patch-id in a SHA-256 repository, then I get a
SHA-256 output here and it's worked this way since Git 2.29.
I know the comment says what it says, but I personally disagree with
this approach. There will be a point in time where SHA-1 is so weak as
to be useless and people will want to build a Git version without it.
For instance, many government agencies around the world have a 2030
deadline for completely stopping all use of SHA-1. If we continue to
use SHA-1 here, then this will have to change anyway in a few years, so
we'd be better off keeping the default algorithm for now and adding an
option to control which hash is used.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
2025-10-14 21:18 ` brian m. carlson
@ 2025-10-14 22:29 ` Junio C Hamano
2025-10-14 22:49 ` brian m. carlson
0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2025-10-14 22:29 UTC (permalink / raw)
To: brian m. carlson; +Cc: Okhuomon Ajayi, git
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> if (!the_hash_algo)
>> - repo_set_hash_algo(the_repository, GIT_HASH_DEFAULT);
>> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
>
> Hmmm. If I run git patch-id in a SHA-256 repository, then I get a
> SHA-256 output here and it's worked this way since Git 2.29.
>
> I know the comment says what it says, but I personally disagree with
> this approach. There will be a point in time where SHA-1 is so weak as
> to be useless and people will want to build a Git version without it.
> For instance, many government agencies around the world have a 2030
> deadline for completely stopping all use of SHA-1. If we continue to
> use SHA-1 here, then this will have to change anyway in a few years, so
> we'd be better off keeping the default algorithm for now and adding an
> option to control which hash is used.
I do not quite agree with that, as SHA-1 in patch-id is merely used
as "a hash function with good distribution that we happened to have
handy access to" without any security requirement. Being able to
compare patch IDs computed long ago stored somewhere with patch ID
on a patch that claims to be freshly written and find them the same
to say "you know, somebody wrote exactly the same patch 7 years ago"
would be valuable, and we do not want to lose it even when you
happen to store your payload in a SHA-256 repository.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
2025-10-14 22:29 ` Junio C Hamano
@ 2025-10-14 22:49 ` brian m. carlson
2025-10-14 23:27 ` Okhuomon Ajayi
2025-10-15 13:37 ` Junio C Hamano
0 siblings, 2 replies; 9+ messages in thread
From: brian m. carlson @ 2025-10-14 22:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Okhuomon Ajayi, git
[-- Attachment #1: Type: text/plain, Size: 1545 bytes --]
On 2025-10-14 at 22:29:34, Junio C Hamano wrote:
> I do not quite agree with that, as SHA-1 in patch-id is merely used
> as "a hash function with good distribution that we happened to have
> handy access to" without any security requirement. Being able to
> compare patch IDs computed long ago stored somewhere with patch ID
> on a patch that claims to be freshly written and find them the same
> to say "you know, somebody wrote exactly the same patch 7 years ago"
> would be valuable, and we do not want to lose it even when you
> happen to store your payload in a SHA-256 repository.
I think that's too late, though. We already use SHA-256 in a SHA-256
repository, so people already expect that to work now and in the future.
The time to make this decision would have been in 2020 with Git 2.29,
but we now have people who will be using SHA-256 patch IDs and we need
to support them.
We have also specifically discussed in the past people eventually
wanting to compile Git without SHA-1 support at some point in the future
for regulatory or compliance reasons, so we should full well expect that
to happen and we'll need to be agile about the algorithm. SHA-1 will
definitely disappear from at least some distributions of Git in the
future.
Given that context, I think allowing the specification of an algorithm
would allow people to say, "Yes, I am in a SHA-256 repository, but I
want SHA-1," or vice versa, which would work with your use case better.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
2025-10-14 22:49 ` brian m. carlson
@ 2025-10-14 23:27 ` Okhuomon Ajayi
2025-10-15 13:37 ` Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Okhuomon Ajayi @ 2025-10-14 23:27 UTC (permalink / raw)
To: brian m. carlson, Junio C Hamano, Okhuomon Ajayi, git
Hi Junio, Brian,
Thanks a lot for the detailed explanations this gave me a much better
understanding of the history behind patch-id and why the hash choice
isn’t straightforward.
I see now that just forcing SHA-1 isn’t ideal since patch-id already
uses the repo’s hash in SHA-256 repos. I’ll take another look at how
the computation works in the other paths and think about how to handle
it better, maybe by adding an option or clarifying the behavior.
Really appreciate you both taking the time to explain I’m learning a
lot from this.
On Tue, Oct 14, 2025 at 11:49 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-10-14 at 22:29:34, Junio C Hamano wrote:
> > I do not quite agree with that, as SHA-1 in patch-id is merely used
> > as "a hash function with good distribution that we happened to have
> > handy access to" without any security requirement. Being able to
> > compare patch IDs computed long ago stored somewhere with patch ID
> > on a patch that claims to be freshly written and find them the same
> > to say "you know, somebody wrote exactly the same patch 7 years ago"
> > would be valuable, and we do not want to lose it even when you
> > happen to store your payload in a SHA-256 repository.
>
> I think that's too late, though. We already use SHA-256 in a SHA-256
> repository, so people already expect that to work now and in the future.
> The time to make this decision would have been in 2020 with Git 2.29,
> but we now have people who will be using SHA-256 patch IDs and we need
> to support them.
>
> We have also specifically discussed in the past people eventually
> wanting to compile Git without SHA-1 support at some point in the future
> for regulatory or compliance reasons, so we should full well expect that
> to happen and we'll need to be agile about the algorithm. SHA-1 will
> definitely disappear from at least some distributions of Git in the
> future.
>
> Given that context, I think allowing the specification of an algorithm
> would allow people to say, "Yes, I am in a SHA-256 repository, but I
> want SHA-1," or vice versa, which would work with your use case better.
> --
> brian m. carlson (they/them)
> Toronto, Ontario, CA
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
2025-10-14 22:49 ` brian m. carlson
2025-10-14 23:27 ` Okhuomon Ajayi
@ 2025-10-15 13:37 ` Junio C Hamano
2025-10-15 13:59 ` Kristoffer Haugsbakk
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2025-10-15 13:37 UTC (permalink / raw)
To: brian m. carlson; +Cc: Okhuomon Ajayi, git
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> Given that context, I think allowing the specification of an algorithm
> would allow people to say, "Yes, I am in a SHA-256 repository, but I
> want SHA-1," or vice versa, which would work with your use case better.
That is sensible.
In short, the automatic choice is to use the repository's hash
inside a repository, or use the then-default algorithm (which comes
from the preimage of the patch we discussed in this thread) outside
a repository. We want a "Use this hash algorithm, ignoring the
automatic choice" command line option that overrides it.
If we were to do configuration variables, we may need two. One to
replace only the fallback part (i.e. outside a repository, instead
of using whatever then-current algorithm, use this one), and the
other to act as if the above command line option is always given.
But as always, starting with only a command line option would be a
prudent way forward.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs
2025-10-15 13:37 ` Junio C Hamano
@ 2025-10-15 13:59 ` Kristoffer Haugsbakk
0 siblings, 0 replies; 9+ messages in thread
From: Kristoffer Haugsbakk @ 2025-10-15 13:59 UTC (permalink / raw)
To: Junio C Hamano, brian m. carlson; +Cc: Okhuomon Ajayi, git
On Wed, Oct 15, 2025, at 15:37, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> Given that context, I think allowing the specification of an algorithm
>> would allow people to say, "Yes, I am in a SHA-256 repository, but I
>> want SHA-1," or vice versa, which would work with your use case better.
>
> That is sensible.
>
> In short, the automatic choice is to use the repository's hash
> inside a repository, or use the then-default algorithm (which comes
> from the preimage of the patch we discussed in this thread) outside
> a repository. We want a "Use this hash algorithm, ignoring the
> automatic choice" command line option that overrides it.
>
> If we were to do configuration variables, we may need two. One to
> replace only the fallback part (i.e. outside a repository, instead
> of using whatever then-current algorithm, use this one), and the
> other to act as if the above command line option is always given.
I’ve been wondering. Why does a command which in my impression is only
useful for scripting have configuration variables? Shouldn’t plumbing
commands in general avoid that since they can end up relying on
configuration state if they end up in scripts?
I want `--stable` so I always try to use that, not the corresponding
configuration variable.
>
> But as always, starting with only a command line option would be a
> prudent way forward.
>
> Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-10-15 13:59 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-13 17:46 [PATCH] [PATCH] [Outreachy] builtin/patch-id.c: clarify SHA1 usage for patch IDs Okhuomon Ajayi
2025-10-14 3:00 ` Junio C Hamano
2025-10-14 8:04 ` Okhuomon Ajayi
2025-10-14 21:18 ` brian m. carlson
2025-10-14 22:29 ` Junio C Hamano
2025-10-14 22:49 ` brian m. carlson
2025-10-14 23:27 ` Okhuomon Ajayi
2025-10-15 13:37 ` Junio C Hamano
2025-10-15 13:59 ` Kristoffer Haugsbakk
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).