* [PATCH 1/9] docs: update pack index v3 format
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-19 22:08 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2025-09-19 1:09 ` [PATCH 2/9] docs: update offset order for pack index v3 brian m. carlson
` (9 subsequent siblings)
10 siblings, 2 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
Our current pack index v3 format uses 4-byte integers to find the
trailer of the file. This effectively means that the file cannot be
much larger than 2^32. While this might at first seem to be okay, we
expect that each object will have at least 64 bytes worth of data, which
means that no more than about 67 million objects can be stored.
Again, this might seem fine, but unfortunately, we know of many users
who attempt to create repos with extremely large numbers of commits to
get a "high score," and we've already seen repositories with at least 55
million commits. In the interests of gracefully handling repositories
even for these well-intentioned but ultimately misguided users, let's
change these lengths to 8 bytes.
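
The arithmetic behind the "about 67 million" figure can be checked with a short sketch. It assumes, as the message does, that every object costs at least 64 bytes of index data; the helper name `max_objects` is made up for illustration and is not part of Git.

```python
# Assumption taken from the message above: each object needs at least
# 64 bytes of index data.
MIN_BYTES_PER_OBJECT = 64


def max_objects(offset_field_bytes: int,
                bytes_per_object: int = MIN_BYTES_PER_OBJECT) -> int:
    """Upper bound on objects when file offsets fit in a field of this width."""
    addressable_bytes = 2 ** (8 * offset_field_bytes)
    return addressable_bytes // bytes_per_object


limit_4 = max_objects(4)  # 2**32 // 64 == 67_108_864, "about 67 million"
limit_8 = max_objects(8)  # 2**64 // 64 == 2**58
```

With 8-byte offsets the bound moves far beyond any repository size mentioned in this thread.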
For the checksums at the end of the file, we're producing 32-byte
SHA-256 checksums because that's what we already do with pack index v2
and SHA-256. Truncating SHA-256 doesn't pose any actual security
problems other than those related to the reduced size, but our pack
checksum must already be 32 bytes (since SHA-256 packs have 32-byte
checksums) and it simplifies the code to use the existing hashfile logic
for these cases for the index checksum as well.
In addition, even though we may not need cryptographic security for the
index checksum, we'd like to avoid arguments from auditors and such for
organizations that may have compliance or security requirements. Using
the simple, boring choice of the full SHA-256 hash avoids all possible
discussion related to hash truncation and removes impediments for these
organizations.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/technical/hash-function-transition.adoc | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index f047fd80ca..f2df1d618d 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -227,9 +227,9 @@ network byte order):
** 4-byte length in bytes of shortened object names. This is the
shortest possible length needed to make names in the shortened
object name table unambiguous.
- ** 4-byte integer, recording where tables relating to this format
+ ** 8-byte integer, recording where tables relating to this format
are stored in this index file, as an offset from the beginning.
- * 4-byte offset to the trailer from the beginning of this file.
+ * 8-byte offset to the trailer from the beginning of this file.
* Zero or more additional key/value pairs (4-byte key, 4-byte
value). Only one key is supported: 'PSRC'. See the "Loose objects
and unreachable objects" section for supported values and how this
@@ -276,10 +276,10 @@ network byte order):
up to and not including the table of CRC32 values.
- Zero or more NUL bytes.
- The trailer consists of the following:
- * A copy of the 20-byte SHA-256 checksum at the end of the
+ * A copy of the 32-byte SHA-256 checksum at the end of the
corresponding packfile.
- * 20-byte SHA-256 checksum of all of the above.
+ * 32-byte SHA-256 checksum of all of the above.
Loose object index
~~~~~~~~~~~~~~~~~~
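
As a rough illustration of the trailer layout described in the patch above, the following sketch splits a file into its body and the two 32-byte checksums and verifies the second one. It assumes that "all of the above" covers the body plus the copied pack checksum; the function names are hypothetical and do not correspond to Git's own code.

```python
import hashlib

HASH_LEN = 32  # full SHA-256 output, per the updated text


def split_trailer(index_bytes: bytes):
    """Return (body, pack_checksum, index_checksum) for a v3-style index."""
    if len(index_bytes) < 2 * HASH_LEN:
        raise ValueError("file too short to hold a trailer")
    body = index_bytes[:-2 * HASH_LEN]
    pack_checksum = index_bytes[-2 * HASH_LEN:-HASH_LEN]
    index_checksum = index_bytes[-HASH_LEN:]
    return body, pack_checksum, index_checksum


def trailer_is_valid(index_bytes: bytes) -> bool:
    """Check the final checksum against everything that precedes it."""
    body, pack_checksum, index_checksum = split_trailer(index_bytes)
    # Assumption: the index checksum covers the body and the pack checksum copy.
    return hashlib.sha256(body + pack_checksum).digest() == index_checksum


def build_fake_index(body: bytes, pack_bytes: bytes) -> bytes:
    """Assemble a toy file with the same trailer shape, for experimentation."""
    pack_checksum = hashlib.sha256(pack_bytes).digest()
    return (body + pack_checksum
            + hashlib.sha256(body + pack_checksum).digest())
```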
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 1/9] docs: update pack index v3 format
2025-09-19 1:09 ` [PATCH 1/9] docs: update pack index v3 format brian m. carlson
@ 2025-09-19 22:08 ` Junio C Hamano
2025-09-20 15:23 ` brian m. carlson
2025-09-24 7:55 ` Patrick Steinhardt
1 sibling, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 22:08 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> Our current pack index v3 format uses 4-byte integers to find the
> trailer of the file. This effectively means that the file cannot be
> much larger than 2^32. While this might at first seem to be okay, we
> expect that each object will have at least 64 bytes worth of data, which
> means that no more than about 67 million objects can be stored.
>
> Again, this might seem fine, but unfortunately, we know of many users
> who attempt to create repos with extremely large numbers of commits to
> get a "high score," and we've already seen repositories with at least 55
> million commits. In the interests of gracefully handling repositories
> even for these well-intentioned but ultimately misguided users, let's
> change these lengths to 8 bytes.
Very sensible.
I do also agree that 32-byte is the natural size for the trailing
hash, but I found that the two paragraphs below were far more than
necessary. As they argue, we do not use a truncated hash anywhere in
our file formats, so I would have understood if the explanation were
"20" in "A copy of the 20-byte SHA-256 checksum" is an obvious
typo, as SHA-256 is longer than that. Fix it to "32".
instead of these two paragraphs.
Or did we mean to use a truncated hash back when this transition
design was proposed originally?
> For the checksums at the end of the file, we're producing 32-byte
> SHA-256 checksums because that's what we already do with pack index v2
> and SHA-256. Truncating SHA-256 doesn't pose any actual security
> problems other than those related to the reduced size, but our pack
> checksum must already be 32 bytes (since SHA-256 packs have 32-byte
> checksums) and it simplifies the code to use the existing hashfile logic
> for these cases for the index checksum as well.
>
> In addition, even though we may not need cryptographic security for the
> index checksum, we'd like to avoid arguments from auditors and such for
> organizations that may have compliance or security requirements. Using
> the simple, boring choice of the full SHA-256 hash avoids all possible
> discussion related to hash truncation and removes impediments for these
> organizations.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> Documentation/technical/hash-function-transition.adoc | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
> index f047fd80ca..f2df1d618d 100644
> --- a/Documentation/technical/hash-function-transition.adoc
> +++ b/Documentation/technical/hash-function-transition.adoc
> @@ -227,9 +227,9 @@ network byte order):
> ** 4-byte length in bytes of shortened object names. This is the
> shortest possible length needed to make names in the shortened
> object name table unambiguous.
> - ** 4-byte integer, recording where tables relating to this format
> + ** 8-byte integer, recording where tables relating to this format
> are stored in this index file, as an offset from the beginning.
> - * 4-byte offset to the trailer from the beginning of this file.
> + * 8-byte offset to the trailer from the beginning of this file.
> * Zero or more additional key/value pairs (4-byte key, 4-byte
> value). Only one key is supported: 'PSRC'. See the "Loose objects
> and unreachable objects" section for supported values and how this
> @@ -276,10 +276,10 @@ network byte order):
> up to and not including the table of CRC32 values.
> - Zero or more NUL bytes.
> - The trailer consists of the following:
> - * A copy of the 20-byte SHA-256 checksum at the end of the
> + * A copy of the 32-byte SHA-256 checksum at the end of the
> corresponding packfile.
>
> - * 20-byte SHA-256 checksum of all of the above.
> + * 32-byte SHA-256 checksum of all of the above.
>
> Loose object index
> ~~~~~~~~~~~~~~~~~~
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/9] docs: update pack index v3 format
2025-09-19 22:08 ` Junio C Hamano
@ 2025-09-20 15:23 ` brian m. carlson
2025-09-20 17:01 ` Junio C Hamano
0 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-09-20 15:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Derrick Stolee
On 2025-09-19 at 22:08:03, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > Our current pack index v3 format uses 4-byte integers to find the
> > trailer of the file. This effectively means that the file cannot be
> > much larger than 2^32. While this might at first seem to be okay, we
> > expect that each object will have at least 64 bytes worth of data, which
> > means that no more than about 67 million objects can be stored.
> >
> > Again, this might seem fine, but unfortunately, we know of many users
> > who attempt to create repos with extremely large numbers of commits to
> > get a "high score," and we've already seen repositories with at least 55
> > million commits. In the interests of gracefully handling repositories
> > even for these well-intentioned but ultimately misguided users, let's
> > change these lengths to 8 bytes.
>
> Very sensible.
>
> I do also agree that 32-byte is the natural size for the trailing
> hash, but I found that the two paragraphs below were far more than
> necessary. As they argue, we do not use a truncated hash anywhere in
> our file formats, so I would have understood if the explanation were
>
> "20" in "A copy of the 20-byte SHA-256 checksum" is an obvious
> typo, as SHA-256 is longer than that. Fix it to "32".
>
> instead of these two paragraphs.
>
> Or did we mean to use a truncated hash back when this transition
> design was proposed originally?
I think we intended to use a 20-byte value originally because we felt we
didn't need the full 32 bytes for an index or pack checksum. However,
as I mentioned, we use the 32-byte checksum for SHA-256 already, so all
it does is add complexity to try to mandate a 20-byte value.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/9] docs: update pack index v3 format
2025-09-20 15:23 ` brian m. carlson
@ 2025-09-20 17:01 ` Junio C Hamano
0 siblings, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-09-20 17:01 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> I do also agree that 32-byte is the natural size for the trailing
>> hash, but I found that the two paragraphs below were far more than
>> necessary. As they argue, we do not use a truncated hash anywhere in
>> our file formats, so I would have understood if the explanation were
>>
>> "20" in "A copy of the 20-byte SHA-256 checksum" is an obvious
>> typo, as SHA-256 is longer than that. Fix it to "32".
>>
>> instead of these two paragraphs.
>>
>> Or did we mean to use a truncated hash back when this transition
>> design was proposed originally?
>
> I think we intended to use a 20-byte value originally because we felt we
> didn't need the full 32 bytes for an index or pack checksum. However,
> as I mentioned, we use the 32-byte checksum for SHA-256 already, so all
> it does is add complexity to try to mandate a 20-byte value.
I think we are saying the same thing but from different sides of the
same mirror.
SHA-256 packs and any csum-file based file would be using 32-byte
checksum because with CSUM_HASH_IN_STREAM, finalize_hashfile() does
not know any way to produce the trailing hash other than writing the
full hash value, and that would be 32 bytes for SHA-256. This was
exactly where my "20 certainly is a typo" impression came from.
Be it a typo or misdesign, picking 32 instead of 20 is a good thing
to do now for a subsystem and file format that is not used anywhere
in production yet.
Thanks.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/9] docs: update pack index v3 format
2025-09-19 1:09 ` [PATCH 1/9] docs: update pack index v3 format brian m. carlson
2025-09-19 22:08 ` Junio C Hamano
@ 2025-09-24 7:55 ` Patrick Steinhardt
2025-09-25 21:39 ` brian m. carlson
1 sibling, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 7:55 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Junio C Hamano, Derrick Stolee
On Fri, Sep 19, 2025 at 01:09:03AM +0000, brian m. carlson wrote:
> Our current pack index v3 format uses 4-byte integers to find the
> trailer of the file. This effectively means that the file cannot be
> much larger than 2^32. While this might at first seem to be okay, we
> expect that each object will have at least 64 bytes worth of data, which
> means that no more than about 67 million objects can be stored.
>
> Again, this might seem fine, but unfortunately, we know of many users
> who attempt to create repos with extremely large numbers of commits to
> get a "high score," and we've already seen repositories with at least 55
> million commits. In the interests of gracefully handling repositories
> even for these well-intentioned but ultimately misguided users, let's
> change these lengths to 8 bytes.
Yeah, this makes sense. We can only assume that repositories will
continue to grow, so it makes sense to future proof.
We also have the 4-byte number of objects contained in the pack. But as
you explain, it's nothing we should need to worry about given that this
is a mere counter, and not an offset into the file. I doubt that there
are repositories out there that will have more than 4 billion objects
anytime soon.
> For the checksums at the end of the file, we're producing 32-byte
> SHA-256 checksums because that's what we already do with pack index v2
> and SHA-256. Truncating SHA-256 doesn't pose any actual security
> problems other than those related to the reduced size, but our pack
> checksum must already be 32 bytes (since SHA-256 packs have 32-byte
> checksums) and it simplifies the code to use the existing hashfile logic
> for these cases for the index checksum as well.
>
> In addition, even though we may not need cryptographic security for the
> index checksum, we'd like to avoid arguments from auditors and such for
> organizations that may have compliance or security requirements. Using
> the simple, boring choice of the full SHA-256 hash avoids all possible
> discussion related to hash truncation and removes impediments for these
> organizations.
For now we only have SHA256 and SHA1. But thinking about the future,
there will be a time when SHA256 will be considered broken. I wonder
whether we should safeguard against that and also specify the trailer
hash to be agile? That is, instead of hardcoding the hash function, we
add something like a "primary" hash to the packfile and then use the
full output of that hash as checksum.
In any case, please feel free to say "no" to the above thought. It's
just something that popped into my mind upon reading this.
I guess one thing that should be explicitly pointed out in the commit
message is that there are no implementations of the v3 format yet, so
this is basically updating our envisioned design, only. Otherwise one
might wonder why we can update the spec just so.
Patrick
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 1/9] docs: update pack index v3 format
2025-09-24 7:55 ` Patrick Steinhardt
@ 2025-09-25 21:39 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-25 21:39 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Derrick Stolee
On 2025-09-24 at 07:55:29, Patrick Steinhardt wrote:
> On Fri, Sep 19, 2025 at 01:09:03AM +0000, brian m. carlson wrote:
> > Our current pack index v3 format uses 4-byte integers to find the
> > trailer of the file. This effectively means that the file cannot be
> > much larger than 2^32. While this might at first seem to be okay, we
> > expect that each object will have at least 64 bytes worth of data, which
> > means that no more than about 67 million objects can be stored.
> >
> > Again, this might seem fine, but unfortunately, we know of many users
> > who attempt to create repos with extremely large numbers of commits to
> > get a "high score," and we've already seen repositories with at least 55
> > million commits. In the interests of gracefully handling repositories
> > even for these well-intentioned but ultimately misguided users, let's
> > change these lengths to 8 bytes.
>
> Yeah, this makes sense. We can only assume that repositories will
> continue to grow, so it makes sense to future proof.
>
> We also have the 4-byte number of objects contained in the pack. But as
> you explain, it's nothing we should need to worry about given that this
> is a mere counter, and not an offset into the file. I doubt that there
> are repositories out there that will have more than 4 billion objects
> anytime soon.
There are certainly some users who try to do that at $DAYJOB, but they
come to our attention (because our maintenance job fails due to taking
too long) before they get there. I am not, however, aware of any
actually legitimate and productive uses of repositories that threaten to
break that limit, which is what I think we should really care about.
In the event we start seeing those kinds of problems, it should be easy
to implement pack v5 with a corresponding index, just with a larger
number of objects.
> For now we only have SHA256 and SHA1. But thinking about the future,
> there will be a time when SHA256 will be considered broken. I wonder
> whether we should safeguard against that and also specify the trailer
> hash to be agile? That is, instead of hardcoding the hash function, we
> add something like a "primary" hash to the packfile and then use the
> full output of that hash as checksum.
>
> In any case, please feel free to say "no" to the above thought. It's
> just something that popped into my mind upon reading this.
Actually, the trailer checksum uses whichever hash algorithm is the main
one in use. So if we add a third algorithm, say SHA-3-512, then the
trailer checksum will be SHA-3-512 when that's the main algorithm.
Technically, it's also SHA-1 if we're in a SHA-1 repository with SHA-256
compatibility. That's not a use case I really encourage, but it is a
use case I'm testing because it exposes bugs in our codebase and I
expect people will want to do in-place conversion from SHA-1 only to
SHA-1 with SHA-256 at some point.
I'll fix that for v2.
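
One way to picture a trailer checksum that follows the main algorithm is a lookup from algorithm name to hash constructor. This is only a sketch: the table `CHECKSUM_ALGOS`, the name strings, and the function `trailer_checksum` are invented for illustration, and SHA-3-512 appears only because it is the hypothetical third algorithm named above.

```python
import hashlib

# Hypothetical mapping from a repository's main algorithm to a constructor.
CHECKSUM_ALGOS = {
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha3-512": hashlib.sha3_512,  # hypothetical future algorithm
}


def trailer_checksum(main_algo: str, data: bytes) -> bytes:
    """Full, untruncated digest of data under the main algorithm."""
    try:
        constructor = CHECKSUM_ALGOS[main_algo]
    except KeyError:
        raise ValueError(f"unknown hash algorithm: {main_algo}") from None
    return constructor(data).digest()
```

Under this scheme the trailer length follows from the algorithm (20, 32, or 64 bytes here) instead of being fixed by the format.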
> I guess one thing that should be explicitly pointed out in the commit
> message is that there are no implementations of the v3 format yet, so
> this is basically updating our envisioned design, only. Otherwise one
> might wonder why we can update the spec just so.
That isn't completely true. There is an implementation, but it is not
yet on the list, and it follows the spec written here. I will provide
documentation with the rest of the pack index code when index v3 comes
in, but I wanted to update this in case people are trying to add it in
other implementations as well.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 2/9] docs: update offset order for pack index v3
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-09-19 1:09 ` [PATCH 1/9] docs: update pack index v3 format brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-19 1:09 ` [PATCH 3/9] docs: reflect actual double signature for tags brian m. carlson
` (8 subsequent siblings)
10 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
The current design of pack index v3 has items in two different orders:
sorted shortened object ID order and pack order. The shortened object
IDs and the pack index offset values are in the former order and
everything else is in the latter.
This, however, poses some problems. We have many parts of the packfile
code that expect to find out data about an object knowing only its index
in pack order. With the current design, to find the pack offset after
having looked up the index in pack order, we must then look up the full
object ID and use that to look up the shortened object ID to find the
pack offset, which is inconvenient, inefficient, and leads to poor cache
usage.
Instead, let's change the offset values to be looked up by pack order.
This works better because once we know the pack order offset, we can
find the full object name and its location in the pack with a simple
index into their respective tables. This makes many operations much
more efficient, especially with the functions we already have, and it
avoids the need for the revindex with pack index v3.
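
The lookup this enables can be sketched as follows. The sketch assumes the two offset tables have already been sliced out of the index as raw bytes, that values are in network byte order, and that a set most significant bit means the low 31 bits index into the 8-byte table; `pack_offset`, `offsets4`, and `offsets8` are hypothetical names, not Git functions.

```python
import struct

LARGE_OFFSET_FLAG = 0x80000000
LARGE_OFFSET_MASK = 0x7FFFFFFF


def pack_offset(offsets4: bytes, offsets8: bytes, pack_pos: int) -> int:
    """Return the pack file offset of the object at position pack_pos.

    pack_pos is the object's position in pack order, so the 4-byte table
    is indexed directly, with no detour through the object name tables.
    """
    (value,) = struct.unpack_from(">I", offsets4, 4 * pack_pos)
    if value & LARGE_OFFSET_FLAG:
        # Large offset: the low 31 bits select an entry in the 8-byte table.
        large_index = value & LARGE_OFFSET_MASK
        (value,) = struct.unpack_from(">Q", offsets8, 8 * large_index)
    return value
```

The same `pack_pos` could index the full object name table, which is the "simple index into their respective tables" described above.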
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/technical/hash-function-transition.adoc | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index f2df1d618d..11c4f2950a 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -260,12 +260,10 @@ network byte order):
compressed data to be copied directly from pack to pack during
repacking without undetected data corruption.
- * A table of 4-byte offset values. For an object in the table of
- sorted shortened object names, the value at the corresponding
- index in this table indicates where that object can be found in
- the pack file. These are usually 31-bit pack file offsets, but
- large offsets are encoded as an index into the next table with the
- most significant bit set.
+ * A table of 4-byte offset values. The index of this table in pack order
+ indicates where that object can be found in the pack file. These are
+ usually 31-bit pack file offsets, but large offsets are encoded as
+ an index into the next table with the most significant bit set.
* A table of 8-byte offset entries (empty for pack files less than
2 GiB). Pack files are organized with heavily used objects toward
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 3/9] docs: reflect actual double signature for tags
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-09-19 1:09 ` [PATCH 1/9] docs: update pack index v3 format brian m. carlson
2025-09-19 1:09 ` [PATCH 2/9] docs: update offset order for pack index v3 brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-19 22:34 ` Junio C Hamano
2025-09-19 1:09 ` [PATCH 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
` (7 subsequent siblings)
10 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
The documentation for the hash function transition reflects the original
design where the SHA-256 signature would always be placed in a header.
However, due to a missed patch in Git 2.29, we shipped SHA-256 support
such that the signature for the current algorithm is always an in-body
signature and the opposite algorithm is always in a header. Since the
documentation is inaccurate, update it to reflect the correct
information.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
.../technical/hash-function-transition.adoc | 20 ++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index 11c4f2950a..27c90e3729 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -425,17 +425,19 @@ ordinary unsigned commit.
Signed Tags
~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of an object.
-This means tags can be signed
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
- signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+This means tags can be signed using one or both algorithms.
Mergetag embedding
~~~~~~~~~~~~~~~~~~
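
To make the "signed payload" wording concrete, here is a minimal sketch that strips both kinds of signature from a tag's text. It assumes header continuation lines begin with a single space and that the in-body signature runs from its BEGIN line to the end of the object; `signed_payload` is a hypothetical helper, not Git's implementation.

```python
SIG_HEADERS = ("gpgsig", "gpgsig-sha256")
BEGIN_SIG = "-----BEGIN PGP SIGNATURE-----"


def signed_payload(tag_text: str) -> str:
    """Return tag_text without signature headers and in-body signature."""
    headers, sep, body = tag_text.partition("\n\n")

    kept = []
    skipping = False
    for line in headers.split("\n"):
        if skipping and line.startswith(" "):
            continue  # continuation line of a dropped signature header
        skipping = line.split(" ", 1)[0] in SIG_HEADERS
        if not skipping:
            kept.append(line)

    # Cut the body at the first line that opens an in-body signature.
    pos = 0
    for line in body.splitlines(keepends=True):
        if line.startswith(BEGIN_SIG):
            body = body[:pos]
            break
        pos += len(line)

    return "\n".join(kept) + sep + body
```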
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 3/9] docs: reflect actual double signature for tags
2025-09-19 1:09 ` [PATCH 3/9] docs: reflect actual double signature for tags brian m. carlson
@ 2025-09-19 22:34 ` Junio C Hamano
2025-09-20 15:29 ` brian m. carlson
0 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 22:34 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> Signed Tags
> ~~~~~~~~~~~
> +We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
> +allow signing tags in both formats. The in-body signature is used for the
> +signature in the current hash algorithm and the header is used for the
> +signature in the other algorithm. Thus, a dual-signature tag will contain both
Not suggesting a change in the text, but to make sure I am reading
the new text correctly. Does "the other algorithm" refer to the
compatibility hash algorithm specified by the compatObjectFormat
extension and the "current" algorithm refers to the objectFormat
extension?
> +an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
> +object or both an in-body signature and a gpgsig header for the SHA-256 format
> +of an object.
>
> -This means tags can be signed
> +The signed payload of the tag is the content of the tag in the current
> +algorithm with both its gpgsig and gpgsig-sha256 fields and
My reading of the previous paragraph is that we cannot have gpgsig
and gpgsig-sha256 fields on a single object at the same time.
Should we say "gpgsig or gpgsig-sha256" (instead of "and"), to get
the resulting text parsable as:
both
its gpgsig or gpgsig-sha256 fields
and
"-----BEGIN PGP SIGNATURE-----" delimited in-body signature
removed.
instead?
> +"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
>
> -1. using SHA-1 only, as in existing signed tag objects
> -2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
> - signature.
> -3. using only SHA-256, by only using the gpgsig-sha256 field.
> +This means tags can be signed using one or both algorithms.
>
> Mergetag embedding
> ~~~~~~~~~~~~~~~~~~
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 3/9] docs: reflect actual double signature for tags
2025-09-19 22:34 ` Junio C Hamano
@ 2025-09-20 15:29 ` brian m. carlson
2025-09-20 17:04 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
0 siblings, 2 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-20 15:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Derrick Stolee
On 2025-09-19 at 22:34:02, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > Signed Tags
> > ~~~~~~~~~~~
> > +We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
> > +allow signing tags in both formats. The in-body signature is used for the
> > +signature in the current hash algorithm and the header is used for the
> > +signature in the other algorithm. Thus, a dual-signature tag will contain both
>
> Not suggesting a change in the text, but to make sure I am reading
> the new text correctly. Does "the other algorithm" refer to the
> compatibility hash algorithm specified by the compatObjectFormat
> extension and the "current" algorithm refers to the objectFormat
> extension?
The "current algorithm" is usually the main algorithm (that is, SHA-256
where `extensions.objectformat` is `sha256`) and the "other algorithm"
is the compatibility algorithm (SHA-1 in that case). However, when you
convert that object to SHA-1 to hash it in SHA-1, the "current
algorithm" becomes SHA-1 and the "other algorithm" is SHA-256.
Does that make sense?
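
The relationship can be restated as a small table-driven sketch. It covers only the two algorithms discussed here and follows the patch text: the SHA-1 form of a tag carries a gpgsig-sha256 header and the SHA-256 form carries a gpgsig header. The names `OTHER_SIG_HEADER` and `signature_locations` are made up for illustration.

```python
# Header that carries the signature for the *other* algorithm, keyed by
# the algorithm the object is currently expressed in.
OTHER_SIG_HEADER = {"sha1": "gpgsig-sha256", "sha256": "gpgsig"}


def signature_locations(current_algo: str) -> dict:
    """Map each algorithm to where its signature lives in this form of the tag."""
    header = OTHER_SIG_HEADER[current_algo]  # KeyError for unknown algorithms
    other_algo = "sha256" if current_algo == "sha1" else "sha1"
    return {current_algo: "in-body", other_algo: header + " header"}
```

Converting the object from one format to the other swaps the two roles, which is why "current" and "other" depend on the form being hashed and not only on the repository configuration.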
> > +an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
> > +object or both an in-body signature and a gpgsig header for the SHA-256 format
> > +of an object.
> >
> > -This means tags can be signed
> > +The signed payload of the tag is the content of the tag in the current
> > +algorithm with both its gpgsig and gpgsig-sha256 fields and
>
> My reading of the previous paragraph is that we cannot have gpgsig
> and gpgsig-sha256 fields on a single object at the same time.
Correct, unless we come up with a third hash algorithm. Hopefully that
is a long way away, and we are not considering that case here.
> Should we say "gpgsig or gpgsig-sha256" (instead of "and"), to get
> the resulting text parsable as:
>
> both
> its gpgsig or gpgsig-sha256 fields
> and
> "-----BEGIN PGP SIGNATURE-----" delimited in-body signature
> removed.
>
> instead?
Sure, I'll include that in a reroll.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 3/9] docs: reflect actual double signature for tags
2025-09-20 15:29 ` brian m. carlson
@ 2025-09-20 17:04 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
1 sibling, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-09-20 17:04 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> On 2025-09-19 at 22:34:02, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>>
>> > Signed Tags
>> > ~~~~~~~~~~~
>> > +We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
>> > +allow signing tags in both formats. The in-body signature is used for the
>> > +signature in the current hash algorithm and the header is used for the
>> > +signature in the other algorithm. Thus, a dual-signature tag will contain both
>>
>> Not suggesting a change in the text, but to make sure I am reading
>> the new text correctly. Does "the other algorithm" refer to the
>> compatibility hash algorithm specified by the compatObjectFormat
>> extension and the "current" algorithm refers to the objectFormat
>> extension?
>
> The "current algorithm" is usually the main algorithm (that is, SHA-256
> where `extensions.objectformat` is `sha256`) and the "other algorithm"
> is the compatibility algorithm (SHA-1 in that case). However, when you
> convert that object to SHA-1 to hash it in SHA-1, the "current
> algorithm" becomes SHA-1 and the "other algorithm" is SHA-256.
>
> Does that make sense?
Let me see if I got it right by trying to paraphrase the above.
For any object that is suitable to be stored in a repository with
objectFormat and compatObjectFormat set, "current" is the former,
and "the other" is the latter.
Your goal is not educating me, though. I wanted to make sure that
the text would be understood by the target audience of this document
in a way you intended it to be.
Thanks.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 3/9] docs: reflect actual double signature for tags
2025-09-20 15:29 ` brian m. carlson
2025-09-20 17:04 ` Junio C Hamano
@ 2025-09-24 7:55 ` Patrick Steinhardt
2025-09-25 21:46 ` brian m. carlson
1 sibling, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 7:55 UTC (permalink / raw)
To: brian m. carlson, Junio C Hamano, git, Derrick Stolee
On Sat, Sep 20, 2025 at 03:29:06PM +0000, brian m. carlson wrote:
> On 2025-09-19 at 22:34:02, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > > +an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
> > > +object or both an in-body signature and a gpgsig header for the SHA-256 format
> > > +of and object.
> > >
> > > -This means tags can be signed
> > > +The signed payload of the tag is the content of the tag in the current
> > > +algorithm with both its gpgsig and gpgsig-sha256 fields and
> >
> > My reading of the previous paragraph is that we cannot have gpgsig
> > and gpgsig-sha256 fields on a single object at the same time.
>
> Correct, unless we come up with a third hash algorithm. Hopefully that
> is a long way away, and we are not considering that case here.
You mentioned a "missed patch" in the commit message. So is this design
here intentional or merely an oversight?
I'm mostly asking because it feels weird to me that an object shouldn't
have both fields. I would assume that it's easier to implement and
reason about if this signature always was a header, or multiple that is.
But I'm not familiar enough with the logic here to really judge, so I
assume that there are good reasons that I miss.
Patrick
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 3/9] docs: reflect actual double signature for tags
2025-09-24 7:55 ` Patrick Steinhardt
@ 2025-09-25 21:46 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-25 21:46 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Junio C Hamano, git, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 1874 bytes --]
On 2025-09-24 at 07:55:39, Patrick Steinhardt wrote:
> You mentioned a "missed patch" in the commit message. So is this design
> here intentional or merely an oversight?
The original design was to implement all SHA-256 signatures in the
`gpgsig-sha256` header, but the patch to do that got dropped
accidentally for 2.29, so we shipped without it. I decided to fix it in
a compatible way for 2.30 using the design here so that users who had
created SHA-256 tags with 2.29 would not have them be mistaken for
signatures over the SHA-1 values of the tag by Git 2.30.
I knew that people would try things out nearly immediately and that some
people would use very old versions of Git from their LTS distro and did
not want to risk making an incompatible change that would break the
object format, even while things were marked experimental.
> I'm mostly asking because it feels weird to me that an object shouldn't
> have both fields. I would assume that it's easier to implement and
> reason about if this signature always was a header, or multiple that is.
> But I'm not familiar enough with the logic here to really judge, so I
> assume that there are good reasons that I miss.
We should not have both fields. In the SHA-256 version of the tag, the
in-body signature is SHA-256 and there is optionally a `gpgsig` header
for the SHA-1 version of the tag. When that tag is converted into SHA-1
format, the in-body signature moves to the `gpgsig-sha256` header and
the one that was formerly in the `gpgsig` header is placed in the body and
that header is removed.
So we will never have both unless we have an additional hash algorithm,
say, SHA-3-512, where, when in SHA-3-512 format, the in-body signature
is over SHA-3-512 and there may be both `gpgsig` and `gpgsig-sha256`
headers.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 4/9] docs: improve ambiguous areas of pack format documentation
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (2 preceding siblings ...)
2025-09-19 1:09 ` [PATCH 3/9] docs: reflect actual double signature for tags brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-19 23:04 ` Junio C Hamano
2025-09-19 1:09 ` [PATCH 5/9] docs: add documentation for loose objects brian m. carlson
` (6 subsequent siblings)
10 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
It is fair to say that our pack and indexing code is quite complex.
Contributors who wish to work on this code or implementors of other
implementations would benefit from clear, unambiguous documentation
about how our data formats are structured and encoded and what data is
used in the computation of certain values. Unfortunately, some of this
data is missing, which leads to confusion and frustration.
Let's document some of this data to help clarify things. Specify over
what data CRC32 values are computed and also note which CRC32 algorithm
is used, since Wikipedia mentions at least four 32-bit CRC algorithms
and notes that it's possible to use different bit orderings.
In addition, note how we encode objects in the pack. One might be led
to believe that packed objects are always stored with the "<type>
<size>\0" prefix of loose objects, but that is not the case, although
for obvious reasons this data is included in the computation of the
object ID. Explain why this is for the curious reader.
Finally, indicate what the size field of the packed object represents.
Otherwise, a reader might think that the size of a delta is the size of
the full object or that it might contain the offset or object ID,
neither of which is the case. Explain clearly, however, that the
values represent uncompressed sizes to avoid confusion.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/gitformat-pack.adoc | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/Documentation/gitformat-pack.adoc b/Documentation/gitformat-pack.adoc
index d6ae229be5..9b7af5c184 100644
--- a/Documentation/gitformat-pack.adoc
+++ b/Documentation/gitformat-pack.adoc
@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.
+CRC32 checksums are always computed over the entire packed object, including
+the header (n-byte type and length); the base object name or offset, if any;
+and the entire compressed object. The CRC32 algorithm used is that of zlib.
+
== pack-*.pack files have the following format:
- A header appears at the beginning and consists of the following:
@@ -80,6 +84,15 @@ Valid object types are:
Type 5 is reserved for future expansion. Type 0 is invalid.
+=== Object encoding
+
+Unlike loose objects, packed objects do not have a prefix containing the type,
+size, and a NUL byte. These are not necessary because they can be determined by
+the n-byte type and length that prefixes the data and so they are omitted from
+the compressed and deltified data.
+
+The computation of the object ID still uses this prefix, however.
+
=== Size encoding
This document uses the following "size encoding" of non-negative
@@ -92,6 +105,11 @@ values are more significant.
This size encoding should not be confused with the "offset encoding",
which is also used in this document.
+When encoding the size of an undeltified object in a pack, the size is that of
+the uncompressed raw object. For deltified objects, it is the size of the
+uncompressed delta. The base object name or offset is not included in the size
+computation.
+
=== Deltified representation
Conceptually there are only four object types: commit, tree, tag and
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 4/9] docs: improve ambiguous areas of pack format documentation
2025-09-19 1:09 ` [PATCH 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
@ 2025-09-19 23:04 ` Junio C Hamano
0 siblings, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 23:04 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> +=== Object encoding
> +
> +Unlike loose objects, packed objects do not have a prefix containing the type,
> +size, and a NUL byte. These are not necessary because they can be determined by
> +the n-byte type and length that prefixes the data and so they are omitted from
> +the compressed and deltified data.
> +
> +The computation of the object ID still uses this prefix, however.
Not wrong per se, but I've always viewed the in-pack object
header with n-byte type and length as an optimized representation
that stands in for the textual type+size+NUL, just like the payload
part also uses an object representation different from the one used
for loose objects, for performance.
And when you view the in-pack object header that way, "are not
necessary" and everything that follows in the above appear to somewhat
miss the point. It is not just "type size<NUL>" that is recreated
on the fly for computation of the same object name as in the loose
object form, but the payload also is recreated on the fly to match
what the loose object would have had, e.g., a deltified representation
would be reconstituted into non-deltified form, etc.
IOW, I would have expected the description to go more along this
line instead.
Packed objects use the n-byte type and length in-pack object
header, with in-pack specific representation of the object data.
In order to compute the same object name as if the object were
loose, the object representation used in the loose object is
virtually recreated by translating n-byte type and length to the
textual type + size + NUL, concatenated with the undeltified and
inflated object data and hashing the result.
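The recreation described in that paragraph could be sketched roughly as follows, in Python rather than Git's C. This is only an illustration under stated assumptions: the data is taken to be already inflated and undeltified, and the names `PACK_TYPE_NAMES` and `object_name_from_pack_fields` are made up for this example and are not part of Git. The numeric codes are the in-pack type numbers for the four real object types.

```python
import hashlib

# In-pack type codes for the four undeltified object types.
PACK_TYPE_NAMES = {1: b"commit", 2: b"tree", 3: b"blob", 4: b"tag"}


def object_name_from_pack_fields(type_code, data, algo="sha1"):
    """Return the hex object name for already-inflated, undeltified data.

    Hypothetical helper: rebuilds the loose-style "<type> <size>\\0" prefix
    from the in-pack type code and the data length, then hashes prefix + data.
    """
    size = str(len(data)).encode("ascii")
    prefix = PACK_TYPE_NAMES[type_code] + b" " + size + b"\0"
    return hashlib.new(algo, prefix + data).hexdigest()


# The empty tree has no payload, so only "tree 0\0" is hashed.
empty_tree_sha1 = object_name_from_pack_fields(2, b"")
empty_tree_sha256 = object_name_from_pack_fields(2, b"", "sha256")
```

The point of the sketch is that the textual prefix never needs to be stored in the pack; it can be derived from the type code and the length of the reconstituted data.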
> === Size encoding
>
> This document uses the following "size encoding" of non-negative
> @@ -92,6 +105,11 @@ values are more significant.
> This size encoding should not be confused with the "offset encoding",
> which is also used in this document.
>
> +When encoding the size of an undeltified object in a pack, the size is that of
> +the uncompressed raw object. For deltified objects, it is the size of the
> +uncompressed delta. The base object name or offset is not included in the size
> +computation.
This is an important point worth describing. Very nice.
If we wanted to help the curious, we could say that these sizes
let us know beforehand how much memory to allocate before we
inflate the payload and/or run patch-delta on it.
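To make the size and CRC32 rules concrete, here is a minimal Python sketch for the undeltified case only. The helper names (`encode_entry_header`, `decode_entry_header`, `pack_undeltified`) are invented for this example and are not Git functions. The header layout follows the n-byte type and length encoding from gitformat-pack: the first byte carries a continuation bit, a 3-bit type, and the low 4 size bits; each following byte carries 7 more size bits, least significant first. A deltified entry would additionally carry the base object name or offset between the header and the compressed delta, which this sketch does not model.

```python
import zlib


def encode_entry_header(type_code, size):
    """Encode the n-byte type and length header of a packed object."""
    first = (type_code << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(first | 0x80)  # set MSB: more size bytes follow
        first = size & 0x7F
        size >>= 7
    out.append(first)
    return bytes(out)


def decode_entry_header(buf):
    """Return (type_code, size, header_length) for the header at buf[0]."""
    byte = buf[0]
    type_code = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    pos = 1
    while byte & 0x80:
        byte = buf[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return type_code, size, pos


def pack_undeltified(type_code, data):
    """Build one undeltified entry and the CRC32 computed over all of it."""
    # The size field records the uncompressed length of the raw object.
    entry = encode_entry_header(type_code, len(data)) + zlib.compress(data)
    # The CRC32 (zlib's algorithm) covers header plus compressed data.
    return entry, zlib.crc32(entry)


entry, crc = pack_undeltified(3, b"abc")
type_code, size, header_len = decode_entry_header(entry)
```

Knowing `size` before inflating is what allows a reader to allocate the output buffer up front, as noted above.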
Thanks.
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 5/9] docs: add documentation for loose objects
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (3 preceding siblings ...)
2025-09-19 1:09 ` [PATCH 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-19 19:10 ` Junio C Hamano
` (2 more replies)
2025-09-19 1:09 ` [PATCH 6/9] rev-parse: allow printing compatibility hash brian m. carlson
` (5 subsequent siblings)
10 siblings, 3 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
We currently have no documentation for how loose objects are stored.
Let's add some here so its easy for people to understand how they
work.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/gitformat-loose.adoc | 49 ++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
create mode 100644 Documentation/gitformat-loose.adoc
diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc
new file mode 100644
index 0000000000..c8bef606fb
--- /dev/null
+++ b/Documentation/gitformat-loose.adoc
@@ -0,0 +1,49 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+$GIT_DIR/objects/loose-object-idx
+$GIT_DIR/objects/loose-map/map-*.map
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git initially stores most of its primary repository data.
+Over the lifetime of a repository, objects are usually written as loose objects
+initially and then converted into packs.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag` and `size` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+GIT
+---
+Part of the linkgit:git[1] suite
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-19 1:09 ` [PATCH 5/9] docs: add documentation for loose objects brian m. carlson
@ 2025-09-19 19:10 ` Junio C Hamano
2025-09-19 19:13 ` Junio C Hamano
2025-09-19 23:16 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 19:10 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> We currently have no documentation for how loose objects are stored.
> Let's add some here so its easy for people to understand how they
> work.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> Documentation/gitformat-loose.adoc | 49 ++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
> create mode 100644 Documentation/gitformat-loose.adoc
Fails a build, unfortunately.
...
LINT DOCSTYLE includes/cmd-config-section-rest.adoc
GEN lint-docs-manpages
LINT DOCSTYLE includes/cmd-config-section-all.adoc
tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc differ: char 3297, line 176
Meson man pages differ from actual man pages:
--- tmp-meson-diff/meson.adoc 2025-09-19 12:04:55.145229743 -0700
+++ tmp-meson-diff/actual.adoc 2025-09-19 12:04:55.149229734 -0700
@@ -173,6 +173,7 @@
gitformat-chunk.adoc
gitformat-commit-graph.adoc
gitformat-index.adoc
+gitformat-loose.adoc
gitformat-pack.adoc
gitformat-signature.adoc
gitglossary.adoc
Thanks.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-19 19:10 ` Junio C Hamano
@ 2025-09-19 19:13 ` Junio C Hamano
2025-09-19 19:15 ` brian m. carlson
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 19:13 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
Junio C Hamano <gitster@pobox.com> writes:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> We currently have no documentation for how loose objects are stored.
>> Let's add some here so its easy for people to understand how they
>> work.
>>
>> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
>> ---
>> Documentation/gitformat-loose.adoc | 49 ++++++++++++++++++++++++++++++
>> 1 file changed, 49 insertions(+)
>> create mode 100644 Documentation/gitformat-loose.adoc
>
> Fails a build, unfortunately.
>
> ...
> LINT DOCSTYLE includes/cmd-config-section-rest.adoc
> GEN lint-docs-manpages
> LINT DOCSTYLE includes/cmd-config-section-all.adoc
> tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc differ: char 3297, line 176
> Meson man pages differ from actual man pages:
> --- tmp-meson-diff/meson.adoc 2025-09-19 12:04:55.145229743 -0700
> +++ tmp-meson-diff/actual.adoc 2025-09-19 12:04:55.149229734 -0700
> @@ -173,6 +173,7 @@
> gitformat-chunk.adoc
> gitformat-commit-graph.adoc
> gitformat-index.adoc
> +gitformat-loose.adoc
> gitformat-pack.adoc
> gitformat-signature.adoc
> gitglossary.adoc
>
> Thanks.
Probably this should be sufficient? Not tested (yet).
diff --git a/Documentation/meson.build b/Documentation/meson.build
index 4404c623f0..93fa3dee8b 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -171,6 +171,7 @@ manpages = {
'gitformat-chunk.adoc' : 5,
'gitformat-commit-graph.adoc' : 5,
'gitformat-index.adoc' : 5,
+ 'gitformat-loose.adoc' : 5,
'gitformat-pack.adoc' : 5,
'gitformat-signature.adoc' : 5,
'githooks.adoc' : 5,
--
2.51.0-409-gb2b0f57e0f
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-19 19:13 ` Junio C Hamano
@ 2025-09-19 19:15 ` brian m. carlson
2025-09-19 20:18 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 19:15 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 678 bytes --]
On 2025-09-19 at 19:13:34, Junio C Hamano wrote:
> diff --git a/Documentation/meson.build b/Documentation/meson.build
> index 4404c623f0..93fa3dee8b 100644
> --- a/Documentation/meson.build
> +++ b/Documentation/meson.build
> @@ -171,6 +171,7 @@ manpages = {
> 'gitformat-chunk.adoc' : 5,
> 'gitformat-commit-graph.adoc' : 5,
> 'gitformat-index.adoc' : 5,
> + 'gitformat-loose.adoc' : 5,
> 'gitformat-pack.adoc' : 5,
> 'gitformat-signature.adoc' : 5,
> 'githooks.adoc' : 5,
I'll figure it out for v2. Thanks for the tip; I'll send out v2 once
others have provided any other comments.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-19 19:13 ` Junio C Hamano
2025-09-19 19:15 ` brian m. carlson
@ 2025-09-19 20:18 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2 siblings, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 20:18 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
Junio C Hamano <gitster@pobox.com> writes:
> Probably this should be sufficient? Not tested (yet).
I did test and my local build of 'seen' no longer barfs with this.
> diff --git a/Documentation/meson.build b/Documentation/meson.build
> index 4404c623f0..93fa3dee8b 100644
> --- a/Documentation/meson.build
> +++ b/Documentation/meson.build
> @@ -171,6 +171,7 @@ manpages = {
> 'gitformat-chunk.adoc' : 5,
> 'gitformat-commit-graph.adoc' : 5,
> 'gitformat-index.adoc' : 5,
> + 'gitformat-loose.adoc' : 5,
> 'gitformat-pack.adoc' : 5,
> 'gitformat-signature.adoc' : 5,
> 'githooks.adoc' : 5,
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-19 19:13 ` Junio C Hamano
2025-09-19 19:15 ` brian m. carlson
2025-09-19 20:18 ` Junio C Hamano
@ 2025-09-24 7:55 ` Patrick Steinhardt
2025-09-25 21:40 ` brian m. carlson
2 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 7:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: brian m. carlson, git, Derrick Stolee
On Fri, Sep 19, 2025 at 12:13:34PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> >> We currently have no documentation for how loose objects are stored.
> >> Let's add some here so its easy for people to understand how they
> >> work.
> >>
> >> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> >> ---
> >> Documentation/gitformat-loose.adoc | 49 ++++++++++++++++++++++++++++++
> >> 1 file changed, 49 insertions(+)
> >> create mode 100644 Documentation/gitformat-loose.adoc
> >
> > Fails a build, unfortunately.
> >
> > ...
> > LINT DOCSTYLE includes/cmd-config-section-rest.adoc
> > GEN lint-docs-manpages
> > LINT DOCSTYLE includes/cmd-config-section-all.adoc
> > tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc differ: char 3297, line 176
> > Meson man pages differ from actual man pages:
> > --- tmp-meson-diff/meson.adoc 2025-09-19 12:04:55.145229743 -0700
> > +++ tmp-meson-diff/actual.adoc 2025-09-19 12:04:55.149229734 -0700
> > @@ -173,6 +173,7 @@
> > gitformat-chunk.adoc
> > gitformat-commit-graph.adoc
> > gitformat-index.adoc
> > +gitformat-loose.adoc
> > gitformat-pack.adoc
> > gitformat-signature.adoc
> > gitglossary.adoc
> >
> > Thanks.
>
> Probably this should be sufficient? Not tested (yet).
>
>
>
> diff --git a/Documentation/meson.build b/Documentation/meson.build
> index 4404c623f0..93fa3dee8b 100644
> --- a/Documentation/meson.build
> +++ b/Documentation/meson.build
> @@ -171,6 +171,7 @@ manpages = {
> 'gitformat-chunk.adoc' : 5,
> 'gitformat-commit-graph.adoc' : 5,
> 'gitformat-index.adoc' : 5,
> + 'gitformat-loose.adoc' : 5,
> 'gitformat-pack.adoc' : 5,
> 'gitformat-signature.adoc' : 5,
> 'githooks.adoc' : 5,
Yup, this one looks correct. But in fact, we also need a similar change
to our Makefile.
Patrick
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6fb83d0c6e..e1d38fbfe6 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
MAN5_TXT += gitformat-chunk.adoc
MAN5_TXT += gitformat-commit-graph.adoc
MAN5_TXT += gitformat-index.adoc
+MAN5_TXT += gitformat-loose.adoc
MAN5_TXT += gitformat-pack.adoc
MAN5_TXT += gitformat-signature.adoc
MAN5_TXT += githooks.adoc
diff --git a/Documentation/meson.build b/Documentation/meson.build
index 41f43e0336..64f70ac724 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -172,6 +172,7 @@ manpages = {
'gitformat-chunk.adoc' : 5,
'gitformat-commit-graph.adoc' : 5,
'gitformat-index.adoc' : 5,
+ 'gitformat-loose.adoc' : 5,
'gitformat-pack.adoc' : 5,
'gitformat-signature.adoc' : 5,
'githooks.adoc' : 5,
^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-24 7:55 ` Patrick Steinhardt
@ 2025-09-25 21:40 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-25 21:40 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Junio C Hamano, git, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]
On 2025-09-24 at 07:55:45, Patrick Steinhardt wrote:
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index 6fb83d0c6e..e1d38fbfe6 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
> MAN5_TXT += gitformat-chunk.adoc
> MAN5_TXT += gitformat-commit-graph.adoc
> MAN5_TXT += gitformat-index.adoc
> +MAN5_TXT += gitformat-loose.adoc
> MAN5_TXT += gitformat-pack.adoc
> MAN5_TXT += gitformat-signature.adoc
> MAN5_TXT += githooks.adoc
> diff --git a/Documentation/meson.build b/Documentation/meson.build
> index 41f43e0336..64f70ac724 100644
> --- a/Documentation/meson.build
> +++ b/Documentation/meson.build
> @@ -172,6 +172,7 @@ manpages = {
> 'gitformat-chunk.adoc' : 5,
> 'gitformat-commit-graph.adoc' : 5,
> 'gitformat-index.adoc' : 5,
> + 'gitformat-loose.adoc' : 5,
> 'gitformat-pack.adoc' : 5,
> 'gitformat-signature.adoc' : 5,
> 'githooks.adoc' : 5,
I've got these already fixed up for v2.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-19 1:09 ` [PATCH 5/9] docs: add documentation for loose objects brian m. carlson
2025-09-19 19:10 ` Junio C Hamano
@ 2025-09-19 23:16 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2 siblings, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 23:16 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> We currently have no documentation for how loose objects are stored.
> Let's add some here so its easy for people to understand how they
> work.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> Documentation/gitformat-loose.adoc | 49 ++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
> create mode 100644 Documentation/gitformat-loose.adoc
>
> diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc
> new file mode 100644
> index 0000000000..c8bef606fb
> --- /dev/null
> +++ b/Documentation/gitformat-loose.adoc
> @@ -0,0 +1,49 @@
> +gitformat-loose(5)
> +==================
> +
> +NAME
> +----
> +gitformat-loose - Git loose object format
> +
> +
> +SYNOPSIS
> +--------
> +[verse]
> +$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
> +$GIT_DIR/objects/loose-object-idx
> +$GIT_DIR/objects/loose-map/map-*.map
> +
> +DESCRIPTION
> +-----------
> +
> +Loose objects are how Git initially stores most of its primary repository data.
"most of" is a bit misleading, I would think. Those who start (what
eventually becomes) a large project from scratch are only a minority of
the users, and all others start with "git clone" from elsewhere, and
in the resulting repository, Git initially stores most of its data
in a packfile (or two).
I think it may become a bit clearer if we drop "initially", and end
the sentence with "data that are created locally", perhaps?
> +Over the lifetime of a repository, objects are usually written as loose objects
> +initially and then converted into packs.
This one is good.
> +== Loose objects
> +
> +Each loose object contains a prefix, followed immediately by the data of the
> +object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
> +`tree`, `commit`, or `tag` and `size` is the size of the data (without the
> +prefix) as a decimal integer expressed in ASCII.
> +
> +The entire contents, prefix and data concatenated, is then compressed with zlib
> +and the compressed data is stored in the file. The object ID of the object is
The glossary calls this "object name", not "ID".
> +the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
We should clarify "data" in "uncompressed data": since we earlier said
"prefix and data concatenated", it can be misread as the payload
alone. You have "The entire contents" that stands for "prefix and
data concatenated" above, so "hash of the entire contents" may work.
Also "uncompressed <whatever you rewrite 'data' with>" at the end of
this sentence should be followed by "in hexadecimal". The "first
two hex characters are used for fan-out" etc. depends on the object
name being the hexadecimal form of the hash, not its binary result.
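As a rough illustration of why the hexadecimal form matters, the following Python sketch derives a loose object's name, its fan-out path, and its stored bytes from a type and payload. The function `loose_object` is a hypothetical helper for this example, not Git code, and the path is given relative to `$GIT_DIR`.

```python
import hashlib
import zlib


def loose_object(obj_type, payload, algo="sha1"):
    """Return (hex_name, path_relative_to_git_dir, stored_bytes)."""
    size = str(len(payload)).encode("ascii")
    # "The entire contents": prefix and payload concatenated.
    contents = obj_type.encode("ascii") + b" " + size + b"\0" + payload
    # The name is the hash of the uncompressed contents, in hexadecimal.
    name = hashlib.new(algo, contents).hexdigest()
    # First two hex characters form the directory, the rest the file name.
    path = "objects/" + name[:2] + "/" + name[2:]
    return name, path, zlib.compress(contents)


name, path, stored = loose_object("tree", b"", "sha256")
blob_name, blob_path, blob_stored = loose_object("blob", b"abc")
```

The compressed bytes can differ between zlib settings without affecting the name, since only the uncompressed contents are hashed.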
Thanks.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-19 1:09 ` [PATCH 5/9] docs: add documentation for loose objects brian m. carlson
2025-09-19 19:10 ` Junio C Hamano
2025-09-19 23:16 ` Junio C Hamano
@ 2025-09-24 7:55 ` Patrick Steinhardt
2025-09-30 16:39 ` brian m. carlson
2 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 7:55 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Junio C Hamano, Derrick Stolee
On Fri, Sep 19, 2025 at 01:09:07AM +0000, brian m. carlson wrote:
> We currently have no documentation for how loose objects are stored.
> Let's add some here so its easy for people to understand how they
Nit: s/its/it is/
> diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc
> new file mode 100644
> index 0000000000..c8bef606fb
> --- /dev/null
> +++ b/Documentation/gitformat-loose.adoc
Do we maybe want to call this "gitformat-loose-objects(5)"? "loose"
feels rather generic.
> @@ -0,0 +1,49 @@
> +gitformat-loose(5)
> +==================
Makes me wonder whether we should also have gitformat-reffiles(5) and
gitformat-reftables(5). Obviously nothing you have to do, but rather an
action item for myself or others interested in the ref backends.
> +NAME
> +----
> +gitformat-loose - Git loose object format
> +
> +
> +SYNOPSIS
> +--------
> +[verse]
> +$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
> +$GIT_DIR/objects/loose-object-idx
> +$GIT_DIR/objects/loose-map/map-*.map
It's a bit weird to list the mapping files here without explaining them.
Should we maybe drop them for now and only add them once we also add a
section explaining their format?
On the other hand, maybe it's better to list those files and not explain
them compared to not mentioning them at all. Not quite sure.
> +DESCRIPTION
> +-----------
> +
> +Loose objects are how Git initially stores most of its primary repository data.
> +Over the lifetime of a repository, objects are usually written as loose objects
> +initially and then converted into packs.
I feel that "most of its primary repository data" is a bit misleading,
as one can expect that most of the data should be in packfiles instead.
How about the following instead:
Loose objects are how Git stores individual objects, where every
object is written as a separate file.
Over the lifetime of a repository, new objects are typically written
as loose objects initially. Eventually, these loose objects will be
compacted into packfiles via repository maintenance to improve disk
space usage and speed up the lookup of those objects.
> +== Loose objects
> +
> +Each loose object contains a prefix, followed immediately by the data of the
> +object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
> +`tree`, `commit`, or `tag` and `size` is the size of the data (without the
> +prefix) as a decimal integer expressed in ASCII.
> +
> +The entire contents, prefix and data concatenated, is then compressed with zlib
> +and the compressed data is stored in the file. The object ID of the object is
> +the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
> +
> +The file for the loose object is stored under the `objects` directory, with the
> +first two hex characters of the object ID being the directory and the remaining
> +characters being the file name.
Should we maybe give a hint why we have these sharding directories?
Patrick
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 5/9] docs: add documentation for loose objects
2025-09-24 7:55 ` Patrick Steinhardt
@ 2025-09-30 16:39 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-30 16:39 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Derrick Stolee
On 2025-09-24 at 07:55:50, Patrick Steinhardt wrote:
> On Fri, Sep 19, 2025 at 01:09:07AM +0000, brian m. carlson wrote:
> > We currently have no documentation for how loose objects are stored.
> > Let's add some here so its easy for people to understand how they
>
> Nit: s/its/it is/
Will be fixed in v2.
> Do we maybe want to call this "gitformat-loose-objects(5)"? "loose"
> feels rather generic.
We have "index" and "pack", so I'd rather keep it short.
> > @@ -0,0 +1,49 @@
> > +gitformat-loose(5)
> > +==================
>
> Makes me wonder whether we should also have gitformat-reffiles(5) and
> gitformat-reftables(5). Obviously nothing you have to do, but rather an
> action item for myself or others interested in the ref backends.
Yeah, I definitely think that would be useful.
> > +SYNOPSIS
> > +--------
> > +[verse]
> > +$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
> > +$GIT_DIR/objects/loose-object-idx
> > +$GIT_DIR/objects/loose-map/map-*.map
>
> It's a bit weird to list the mapping files here without explaining them.
> Should we maybe drop them for now and only add them once we also add a
> section explaining their format?
I've bumped those two lines to a future commit (probably part 2).
> I feel that "most of its primary repository data" is a bit misleading,
> as one can expect that most of the data should be in packfiles instead.
> How about the following instead:
>
> Loose objects are how Git stores individual objects, where every
> object is written as a separate file.
>
> Over the lifetime of a repository, new objects are typically written
> as loose objects initially. Eventually, these loose objects will be
> compacted into packfiles via repository maintenance to improve disk
> space usage and speed up the lookup of those objects.
I adopted most of this language.
> Should we maybe give a hint why we have these sharding directories?
Done in v2.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* [PATCH 6/9] rev-parse: allow printing compatibility hash
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (4 preceding siblings ...)
2025-09-19 1:09 ` [PATCH 5/9] docs: add documentation for loose objects brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-19 23:24 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
2025-09-19 1:09 ` [PATCH 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
` (4 subsequent siblings)
10 siblings, 2 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
Right now, we have a way to print the storage hash, the input hash, and
the output hash, but we lack a way to print the compatibility hash. Add
a new type to --show-object-format, compat, which prints this value.
If no compatibility hash exists, simply print a newline. This is
important to allow users to use multiple options at once while still
getting unambiguous output.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/git-rev-parse.adoc | 11 ++++++-----
builtin/rev-parse.c | 11 ++++++++++-
t/t1500-rev-parse.sh | 34 ++++++++++++++++++++++++++++++++
3 files changed, 50 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-rev-parse.adoc b/Documentation/git-rev-parse.adoc
index cc32b4b4f0..465ae3e29d 100644
--- a/Documentation/git-rev-parse.adoc
+++ b/Documentation/git-rev-parse.adoc
@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
path of the current directory relative to the top-level
directory.
---show-object-format[=(storage|input|output)]::
- Show the object format (hash algorithm) used for the repository
- for storage inside the `.git` directory, input, or output. For
- input, multiple algorithms may be printed, space-separated.
- If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+ Show the object format (hash algorithm) used for the repository for storage
+ inside the `.git` directory, input, output, or compatibility. For input,
+ multiple algorithms may be printed, space-separated. If `compat` is
+ requested and no compatibility algorithm is enabled, prints an empty line. If
+ not specified, the default is "storage".
--show-ref-format::
Show the reference storage format used for the repository.
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 44ff1b8342..187b7e8be9 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -1108,11 +1108,20 @@ int cmd_rev_parse(int argc,
const char *val = arg ? arg : "storage";
if (strcmp(val, "storage") &&
+ strcmp(val, "compat") &&
strcmp(val, "input") &&
strcmp(val, "output"))
die(_("unknown mode for --show-object-format: %s"),
arg);
- puts(the_hash_algo->name);
+
+ if (!strcmp(val, "compat")) {
+ if (the_repository->compat_hash_algo)
+ puts(the_repository->compat_hash_algo->name);
+ else
+ putchar('\n');
+ } else {
+ puts(the_hash_algo->name);
+ }
continue;
}
if (!strcmp(arg, "--show-ref-format")) {
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index 58a4583088..98c5a772bd 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -207,6 +207,40 @@ test_expect_success 'rev-parse --show-object-format in repo' '
grep "unknown mode for --show-object-format: squeamish-ossifrage" err
'
+
+test_expect_success RUST 'rev-parse --show-object-format in repo with compat mode' '
+ mkdir repo &&
+ (
+ sane_unset GIT_DEFAULT_HASH &&
+ cd repo &&
+ git init --object-format=sha256 &&
+ git config extensions.compatobjectformat sha1 &&
+ echo sha256 >expect &&
+ git rev-parse --show-object-format >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=storage >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=input >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=output >actual &&
+ test_cmp expect actual &&
+ echo sha1 >expect &&
+ git rev-parse --show-object-format=compat >actual &&
+ test_cmp expect actual &&
+ test_must_fail git rev-parse --show-object-format=squeamish-ossifrage 2>err &&
+ grep "unknown mode for --show-object-format: squeamish-ossifrage" err
+ ) &&
+ mkdir repo2 &&
+ (
+ sane_unset GIT_DEFAULT_HASH &&
+ cd repo2 &&
+ git init --object-format=sha256 &&
+ echo >expect &&
+ git rev-parse --show-object-format=compat >actual &&
+ test_cmp expect actual
+ )
+'
+
test_expect_success 'rev-parse --show-ref-format' '
test_detect_ref_format >expect &&
git rev-parse --show-ref-format >actual &&
* Re: [PATCH 6/9] rev-parse: allow printing compatibility hash
2025-09-19 1:09 ` [PATCH 6/9] rev-parse: allow printing compatibility hash brian m. carlson
@ 2025-09-19 23:24 ` Junio C Hamano
2025-09-24 7:55 ` Patrick Steinhardt
1 sibling, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 23:24 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> Right now, we have a way to print the storage hash, the input hash, and
> the output hash, but we lack a way to print the compatibility hash. Add
> a new type to --show-object-format, compat, which prints this value.
>
> If no compatibility hash exists, simply print a newline. This is
> important to allow users to use multiple options at once while still
> getting unambiguous output.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
Nice.
At first I somehow thought
$ git rev-parse --show-object-format=compat HEAD
in a SHA-1 primary repository would give the equivalent object name
for the commit at HEAD in SHA-256 world, but that is expecting too
much out of a simple 50-line patch ;-).
>
> if (strcmp(val, "storage") &&
> + strcmp(val, "compat") &&
> strcmp(val, "input") &&
> strcmp(val, "output"))
> die(_("unknown mode for --show-object-format: %s"),
> arg);
> - puts(the_hash_algo->name);
> +
> + if (!strcmp(val, "compat")) {
> + if (the_repository->compat_hash_algo)
> + puts(the_repository->compat_hash_algo->name);
> + else
> + putchar('\n');
> + } else {
> + puts(the_hash_algo->name);
> + }
> continue;
> }
> if (!strcmp(arg, "--show-ref-format")) {
Pretty straightforward.
Thanks.
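To illustrate why printing an empty line for a missing compatibility hash keeps combined output unambiguous, here is a small sketch of a consumer. It assumes a Git build with this patch applied, so that `--show-object-format=compat` is accepted. `describe_object_formats` is a hypothetical name, and the function only parses two lines from standard input rather than invoking Git itself.

```shell
# Hypothetical helper: read the two lines that
#   git rev-parse --show-object-format=storage --show-object-format=compat
# is expected to print (with the patch applied) and summarize them.
describe_object_formats () {
	IFS= read -r storage
	IFS= read -r compat
	if test -n "$compat"
	then
		printf 'storage=%s compat=%s\n' "$storage" "$compat"
	else
		# An empty second line means no compatibility algorithm is enabled.
		printf 'storage=%s compat=none\n' "$storage"
	fi
}
```

Because each option always contributes exactly one line, the second line can be attributed to `compat` without guessing, even when it is empty.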
* Re: [PATCH 6/9] rev-parse: allow printing compatibility hash
2025-09-19 1:09 ` [PATCH 6/9] rev-parse: allow printing compatibility hash brian m. carlson
2025-09-19 23:24 ` Junio C Hamano
@ 2025-09-24 7:55 ` Patrick Steinhardt
2025-09-25 21:48 ` brian m. carlson
1 sibling, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 7:55 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Junio C Hamano, Derrick Stolee
On Fri, Sep 19, 2025 at 01:09:08AM +0000, brian m. carlson wrote:
> diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
> index 58a4583088..98c5a772bd 100755
> --- a/t/t1500-rev-parse.sh
> +++ b/t/t1500-rev-parse.sh
> @@ -207,6 +207,40 @@ test_expect_success 'rev-parse --show-object-format in repo' '
> grep "unknown mode for --show-object-format: squeamish-ossifrage" err
> '
>
> +
> +test_expect_success RUST 'rev-parse --show-object-format in repo with compat mode' '
Does this test really depend on the RUST prereq? I cannot see anything
here that would require it.
Patrick
* Re: [PATCH 6/9] rev-parse: allow printing compatibility hash
2025-09-24 7:55 ` Patrick Steinhardt
@ 2025-09-25 21:48 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-25 21:48 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Derrick Stolee
On 2025-09-24 at 07:55:56, Patrick Steinhardt wrote:
> On Fri, Sep 19, 2025 at 01:09:08AM +0000, brian m. carlson wrote:
> > diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
> > index 58a4583088..98c5a772bd 100755
> > --- a/t/t1500-rev-parse.sh
> > +++ b/t/t1500-rev-parse.sh
> > @@ -207,6 +207,40 @@ test_expect_success 'rev-parse --show-object-format in repo' '
> > grep "unknown mode for --show-object-format: squeamish-ossifrage" err
> > '
> >
> > +
> > +test_expect_success RUST 'rev-parse --show-object-format in repo with compat mode' '
>
> Does this test really depend on the RUST prereq? I cannot see anything
> here that would require it.
I think that should move up into part 2. Will fix for v2.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* [PATCH 7/9] fsck: consider gpgsig headers expected in tags
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (5 preceding siblings ...)
2025-09-19 1:09 ` [PATCH 6/9] rev-parse: allow printing compatibility hash brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-19 23:31 ` Junio C Hamano
2025-09-19 1:09 ` [PATCH 8/9] Allow specifying compatibility hash brian m. carlson
` (3 subsequent siblings)
10 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
When we're creating a tag, we want to make sure that gpgsig and
gpgsig-sha256 headers are allowed in the tag. The default fsck
behavior is to ignore the fact that they're left over, but some of our
tests enable strict checking, which flags them nonetheless. Add
improved checking for these headers as well as documentation and several
tests.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/fsck-msgids.adoc | 6 ++++
fsck.c | 18 ++++++++++++
fsck.h | 2 ++
t/t1450-fsck.sh | 54 ++++++++++++++++++++++++++++++++++
4 files changed, 80 insertions(+)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..52d9a8a811 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -10,6 +10,12 @@
`badFilemode`::
(INFO) A tree contains a bad filemode entry.
+`badGpgsig`::
+ (ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
+
+`badHeaderContinuation`::
+ (ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
+
`badName`::
(ERROR) An author/committer name is empty.
diff --git a/fsck.c b/fsck.c
index 171b424dd5..341e100d24 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1067,6 +1067,24 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
else
ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
+ if (buffer < buffer_end && (skip_prefix(buffer, "gpgsig ", &buffer) || skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
+ eol = memchr(buffer, '\n', buffer_end - buffer);
+ if (!eol) {
+ ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_GPGSIG, "invalid format - unexpected end after 'gpgsig' or 'gpgsig-sha256' line");
+ goto done;
+ }
+ buffer = eol + 1;
+
+ while (buffer < buffer_end && starts_with(buffer, " ")) {
+ eol = memchr(buffer, '\n', buffer_end - buffer);
+ if (!eol) {
+ ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_HEADER_CONTINUATION, "invalid format - unexpected end in 'gpgsig' or 'gpgsig-sha256' continuation line");
+ goto done;
+ }
+ buffer = eol + 1;
+ }
+ }
+
if (buffer < buffer_end && !starts_with(buffer, "\n")) {
/*
* The verify_headers() check will allow
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..c26616d7eb 100644
--- a/fsck.h
+++ b/fsck.h
@@ -25,9 +25,11 @@ enum fsck_msg_type {
FUNC(NUL_IN_HEADER, FATAL) \
FUNC(UNTERMINATED_HEADER, FATAL) \
/* errors */ \
+ FUNC(BAD_HEADER_CONTINUATION, ERROR) \
FUNC(BAD_DATE, ERROR) \
FUNC(BAD_DATE_OVERFLOW, ERROR) \
FUNC(BAD_EMAIL, ERROR) \
+ FUNC(BAD_GPGSIG, ERROR) \
FUNC(BAD_NAME, ERROR) \
FUNC(BAD_OBJECT_SHA1, ERROR) \
FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5ae86c42be..c4b651c2dc 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -454,6 +454,60 @@ test_expect_success 'tag with NUL in header' '
test_grep "error in tag $tag.*unterminated header: NUL at offset" out
'
+test_expect_success 'tag accepts gpgsig header even if not validly signed' '
+ test_oid_cache <<-\EOF &&
+ header sha1:gpgsig-sha256
+ header sha256:gpgsig
+ EOF
+ header=$(test_oid header) &&
+ sha=$(git rev-parse HEAD) &&
+ cat >good-tag <<-EOF &&
+ object $sha
+ type commit
+ tag good
+ tagger T A Gger <tagger@example.com> 1234567890 -0000
+ $header -----BEGIN PGP SIGNATURE-----
+ Not a valid signature
+ -----END PGP SIGNATURE-----
+
+ This is a good tag.
+ EOF
+
+ tag=$(git hash-object --literally -t tag -w --stdin <good-tag) &&
+ test_when_finished "remove_object $tag" &&
+ git update-ref refs/tags/good $tag &&
+ test_when_finished "git update-ref -d refs/tags/good" &&
+ git -c fsck.extraHeaderEntry=error fsck --tags
+'
+
+test_expect_success 'tag rejects invalid headers' '
+ test_oid_cache <<-\EOF &&
+ header sha1:gpgsig-sha256
+ header sha256:gpgsig
+ EOF
+ header=$(test_oid header) &&
+ sha=$(git rev-parse HEAD) &&
+ cat >bad-tag <<-EOF &&
+ object $sha
+ type commit
+ tag good
+ tagger T A Gger <tagger@example.com> 1234567890 -0000
+ $header -----BEGIN PGP SIGNATURE-----
+ Not a valid signature
+ -----END PGP SIGNATURE-----
+ junk
+
+ This is a bad tag with junk at the end of the headers.
+ EOF
+
+ tag=$(git hash-object --literally -t tag -w --stdin <bad-tag) &&
+ test_when_finished "remove_object $tag" &&
+ git update-ref refs/tags/bad $tag &&
+ test_when_finished "git update-ref -d refs/tags/bad" &&
+ test_must_fail git -c fsck.extraHeaderEntry=error fsck --tags 2>out &&
+ test_grep "error in tag $tag.*invalid format - extra header" out
+'
+
test_expect_success 'cleaned up' '
git fsck >actual 2>&1 &&
test_must_be_empty actual
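The continuation-line handling in the fsck.c hunk above can be approximated outside of C. The following is a rough sketch, not a port: it uses awk, it accepts a signature header anywhere in the header block (the C code only looks immediately after the tagger line), and it does not report truncation errors. `strip_sig_header` is a hypothetical name.

```shell
# Hypothetical helper: read a tag object on stdin and print its header
# lines, minus any gpgsig/gpgsig-sha256 header and the continuation
# lines (those starting with a space) that belong to it.
strip_sig_header () {
	awk '
		/^$/ { exit }                            # blank line ends the headers
		/^gpgsig(-sha256)? / { insig = 1; next } # first line of a signature header
		insig && /^ / { next }                   # continuation of that header
		{ insig = 0; print }                     # any other header line
	'
}
```

Fed the `bad-tag` input from the new test, the `junk` line survives the filtering, which corresponds to the leftover header that strict checking then reports as an extra header entry.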
* Re: [PATCH 7/9] fsck: consider gpgsig headers expected in tags
2025-09-19 1:09 ` [PATCH 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
@ 2025-09-19 23:31 ` Junio C Hamano
2025-09-22 21:38 ` brian m. carlson
0 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2025-09-19 23:31 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> diff --git a/fsck.c b/fsck.c
> index 171b424dd5..341e100d24 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -1067,6 +1067,24 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
> else
> ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
>
> + if (buffer < buffer_end && (skip_prefix(buffer, "gpgsig ", &buffer) || skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
Could you wrap this overly long line?
if (buffer < buffer_end &&
(skip_prefix(buffer, "gpgsig ", &buffer) ||
skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
> + eol = memchr(buffer, '\n', buffer_end - buffer);
> + if (!eol) {
> + ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_GPGSIG, "invalid format - unexpected end after 'gpgsig' or 'gpgsig-sha256' line");
> + goto done;
> + }
> + buffer = eol + 1;
> +
> + while (buffer < buffer_end && starts_with(buffer, " ")) {
> + eol = memchr(buffer, '\n', buffer_end - buffer);
> + if (!eol) {
> + ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_HEADER_CONTINUATION, "invalid format - unexpected end in 'gpgsig' or 'gpgsig-sha256' continuation line");
> + goto done;
> + }
> + buffer = eol + 1;
> + }
> + }
> +
Do we allow a tag object with both "gpgsig" and "gpgsig-sha256" or
detect as an error? I think the most natural way to extend this
system in the future with a third hash function would be to still
have the primary hash in the payload and signatures created with
other compatibility hash functions on the header, so if we were to
detect, the rule may be "gpgsig* in the headers ought to be unique
and should not include the primary hash algorithm" plus "if you have
gpgsig* in the header, the body must also have inline signature, and
if you don't, the body must not", perhaps?
* Re: [PATCH 7/9] fsck: consider gpgsig headers expected in tags
2025-09-19 23:31 ` Junio C Hamano
@ 2025-09-22 21:38 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-22 21:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Derrick Stolee
On 2025-09-19 at 23:31:08, Junio C Hamano wrote:
> Could you wrap this overly long line?
>
> if (buffer < buffer_end &&
> (skip_prefix(buffer, "gpgsig ", &buffer) ||
> skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
Will fix in v2.
> Do we allow a tag object with both "gpgsig" and "gpgsig-sha256" or
> detect as an error? I think the most natural way to extend this
> system in the future with a third hash function would be to still
> have the primary hash in the payload and signatures created with
> other compatibility hash functions on the header, so if we were to
> detect, the rule may be "gpgsig* in the headers ought to be unique
> and should not include the primary hash algorithm" plus "if you have
> gpgsig* in the header, the body must also have inline signature, and
> if you don't, the body must not", perhaps?
In v2, I'll make it such that `gpgsig` is allowed only when we're not
using SHA-1 and `gpgsig-sha256` is allowed only when we're not using
SHA-256. It may be that we don't have a trailing signature, though,
since we might turn a SHA-1 tag (signed only with SHA-1) into a SHA-256
tag (which would have only a `gpgsig` header and no trailing SHA-256
signature).
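The rule described above can be written down as a small predicate. This is a sketch of the rule as stated in this message only, not the code that ends up in v2. `tag_sig_header_allowed` and its argument order are invented for illustration.

```shell
# Hypothetical predicate: succeed if a tag stored with hash algorithm $1
# may carry the signature header named $2 under the rule stated above.
tag_sig_header_allowed () {
	case "$1:$2" in
	sha1:gpgsig-sha256 | sha256:gpgsig)
		# The header always carries the signature for the other algorithm.
		return 0
		;;
	*)
		return 1
		;;
	esac
}
```

This matches the pairing the new tests set up through `test_oid_cache`, where SHA-1 runs use `gpgsig-sha256` and SHA-256 runs use `gpgsig`.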
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* [PATCH 8/9] Allow specifying compatibility hash
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (6 preceding siblings ...)
2025-09-19 1:09 ` [PATCH 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-24 7:56 ` Patrick Steinhardt
2025-09-19 1:09 ` [PATCH 9/9] t: add a prerequisite for a " brian m. carlson
` (2 subsequent siblings)
10 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
We want to specify a compatibility hash for testing interactions for
SHA-256 repositories where we have SHA-1 compatibility enabled. Allow
the user to specify this scenario in the test suite by setting
GIT_TEST_DEFAULT_HASH to "sha256:sha1".
Note that this will get passed into GIT_DEFAULT_HASH, which Git itself
does not presently support. However, we will support this in a future
commit.
Since we'll now want to know the value for a specific version, let's add
the ability to specify either the storage hash (in this case, SHA-256)
or the compatibility hash (SHA-1). We use a different value for the
compatibility hash that will be enabled for all repositories
(test_repo_compat_hash_algo) versus the one that is used individually in
some tests (test_compat_hash_algo), since we want to still run those
individual tests without requiring that the testsuite be run fully in a
compatibility mode.
Finally, in this scenario, we can no longer rely on having broken
objects work since we lack compatibility mappings to rewrite objects in
the repository. Add a prerequisite, BROKEN_OBJECTS, that checks to see
if creating deliberately broken objects is possible, so that we can
disable these tests if not.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
t/test-lib-functions.sh | 9 +++++++--
t/test-lib.sh | 7 +++++++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index a28de7b19b..52d7759bf5 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1708,11 +1708,16 @@ test_set_hash () {
# Detect the hash algorithm in use.
test_detect_hash () {
case "${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}" in
- "sha256")
+ *:*)
+ test_hash_algo="${GIT_TEST_DEFAULT_HASH%%:*}"
+ test_compat_hash_algo="${GIT_TEST_DEFAULT_HASH##*:}"
+ test_repo_compat_hash_algo="$test_compat_hash_algo"
+ ;;
+ sha256)
test_hash_algo=sha256
test_compat_hash_algo=sha1
;;
- *)
+ sha1)
test_hash_algo=sha1
test_compat_hash_algo=sha256
;;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 621cd31ae1..14c777e4e2 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1917,6 +1917,13 @@ test_lazy_prereq DEFAULT_HASH_ALGORITHM '
test_lazy_prereq DEFAULT_REPO_FORMAT '
test_have_prereq SHA1,REFFILES
'
+# BROKEN_OBJECTS is a test if we can write deliberately broken objects and
+# expect them to work. When running using SHA-256 mode with SHA-1
+# compatibility, we cannot write such objects because there's no SHA-1
+# compatibility value for a nonexistent object.
+test_lazy_prereq BROKEN_OBJECTS '
+ test -z "$test_repo_compat_hash_algo"
+'
# Ensure that no test accidentally triggers a Git command
# that runs the actual maintenance scheduler, affecting a user's
* Re: [PATCH 8/9] Allow specifying compatibility hash
2025-09-19 1:09 ` [PATCH 8/9] Allow specifying compatibility hash brian m. carlson
@ 2025-09-24 7:56 ` Patrick Steinhardt
2025-09-30 16:44 ` brian m. carlson
0 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 7:56 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Junio C Hamano, Derrick Stolee
On Fri, Sep 19, 2025 at 01:09:10AM +0000, brian m. carlson wrote:
Tiny nit: it would help if the patch subject was prefixed with "t:".
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index a28de7b19b..52d7759bf5 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1708,11 +1708,16 @@ test_set_hash () {
> # Detect the hash algorithm in use.
> test_detect_hash () {
> case "${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}" in
> - "sha256")
> + *:*)
> + test_hash_algo="${GIT_TEST_DEFAULT_HASH%%:*}"
> + test_compat_hash_algo="${GIT_TEST_DEFAULT_HASH##*:}"
> + test_repo_compat_hash_algo="$test_compat_hash_algo"
> + ;;
> + sha256)
> test_hash_algo=sha256
> test_compat_hash_algo=sha1
> ;;
> - *)
> + sha1)
> test_hash_algo=sha1
> test_compat_hash_algo=sha256
> ;;
Makes sense, I guess. It's a bit hard to judge without seeing any actual
tests, but I don't think that should hold off adding the infrastructure.
Worst case we can still adjust it at a later point in time.
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 621cd31ae1..14c777e4e2 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1917,6 +1917,13 @@ test_lazy_prereq DEFAULT_HASH_ALGORITHM '
> test_lazy_prereq DEFAULT_REPO_FORMAT '
> test_have_prereq SHA1,REFFILES
> '
> +# BROKEN_OBJECTS is a test if we can write deliberately broken objects and
s/if/whether/
> +# expect them to work. When running using SHA-256 mode with SHA-1
> +# compatibility, we cannot write such objects because there's no SHA-1
> +# compatibility value for a nonexistent object.
> +test_lazy_prereq BROKEN_OBJECTS '
> + test -z "$test_repo_compat_hash_algo"
> +'
>
> # Ensure that no test accidentally triggers a Git command
> # that runs the actual maintenance scheduler, affecting a user's
Patrick
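As a standalone illustration of the `*:*` branch in the test_detect_hash hunk quoted above, the sketch below splits a value such as "sha256:sha1" using the same parameter expansions. `split_hash_spec` is a hypothetical name, the output format is invented, and unlike the patch the function rejects unknown values explicitly.

```shell
# Hypothetical helper mirroring the case statement in test_detect_hash.
# Prints "<storage> <per-test compat> <repo-wide compat, or - if unset>".
split_hash_spec () {
	spec=$1
	repo_compat=
	case "$spec" in
	*:*)
		storage=${spec%%:*}	# text before the first colon
		compat=${spec##*:}	# text after the last colon
		repo_compat=$compat	# only set when a pair was requested
		;;
	sha256)
		storage=sha256
		compat=sha1
		;;
	sha1)
		storage=sha1
		compat=sha256
		;;
	*)
		return 1
		;;
	esac
	printf '%s %s %s\n' "$storage" "$compat" "${repo_compat:--}"
}
```

With this sketch, `split_hash_spec sha256:sha1` prints `sha256 sha1 sha1`, while `split_hash_spec sha256` prints `sha256 sha1 -`, reflecting that the repository-wide compatibility value stays empty unless a pair is given.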
* Re: [PATCH 8/9] Allow specifying compatibility hash
2025-09-24 7:56 ` Patrick Steinhardt
@ 2025-09-30 16:44 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-09-30 16:44 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Derrick Stolee
On 2025-09-24 at 07:56:01, Patrick Steinhardt wrote:
> Makes sense, I guess. It's a bit hard to judge without seeing any actual
> tests, but I don't think that should hold off adding the infrastructure.
> Worst case we can still adjust it at a later point in time.
I'll send in an example test as part of v2.
> > '
> > +# BROKEN_OBJECTS is a test if we can write deliberately broken objects and
>
> s/if/whether/
Fixed in v2.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* [PATCH 9/9] t: add a prerequisite for a compatibility hash
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (7 preceding siblings ...)
2025-09-19 1:09 ` [PATCH 8/9] Allow specifying compatibility hash brian m. carlson
@ 2025-09-19 1:09 ` brian m. carlson
2025-09-24 7:56 ` Patrick Steinhardt
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
10 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-09-19 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Derrick Stolee
In some cases, we'll need to adjust our test suite to work in a proper
way with a compatibility hash. For example, in such a case, we'll only
use pack index v3, since v1 and v2 lack support for multiple algorithms.
Since we won't want to write those older formats, we'll need to skip
tests that do so.
Let's add a COMPAT_HASH prerequisite and define the BROKEN_OBJECTS
prerequisite in terms of it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
t/test-lib.sh | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 14c777e4e2..a4bb9ab2d8 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1922,7 +1922,13 @@ test_lazy_prereq DEFAULT_REPO_FORMAT '
# compatibility, we cannot write such objects because there's no SHA-1
# compatibility value for a nonexistent object.
test_lazy_prereq BROKEN_OBJECTS '
- test -z "$test_repo_compat_hash_algo"
+ ! test_have_prereq COMPAT_HASH
+'
+
+# COMPAT_HASH is a test if we're operating in a repository with SHA-256 with
+# SHA-1 compatibility.
+test_lazy_prereq COMPAT_HASH '
+ test -n "$test_repo_compat_hash_algo"
'
# Ensure that no test accidentally triggers a Git command
* Re: [PATCH 9/9] t: add a prerequisite for a compatibility hash
2025-09-19 1:09 ` [PATCH 9/9] t: add a prerequisite for a " brian m. carlson
@ 2025-09-24 7:56 ` Patrick Steinhardt
0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 7:56 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Junio C Hamano, Derrick Stolee
On Fri, Sep 19, 2025 at 01:09:11AM +0000, brian m. carlson wrote:
> In some cases, we'll need to adjust our test suite to work in a proper
> way with a compatibility hash. For example, in such a case, we'll only
> use pack index v3, since v1 and v2 lack support for multiple algorithms.
> Since we won't want to write those older formats, we'll need to skip
> tests that do so.
>
> Let's add a COMPAT_HASH prerequisite and define the BROKEN_OBJECTS
> prerequisite in terms of it.
Nit: I think this commit could easily be squashed into the preceding
commit.
Patrick
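The relationship between the two prerequisites in this patch can be shown without the test library. The sketch below restates the two lazy prereq bodies as plain shell functions. The function names are hypothetical, and only the variable `test_repo_compat_hash_algo` comes from the series.

```shell
# Hypothetical stand-in for the COMPAT_HASH prerequisite: true when a
# repository-wide compatibility algorithm has been configured.
have_compat_hash () {
	test -n "$test_repo_compat_hash_algo"
}

# Hypothetical stand-in for BROKEN_OBJECTS: deliberately broken objects
# can only be written when no compatibility algorithm is in use.
can_write_broken_objects () {
	! have_compat_hash
}
```

Defining one in terms of the other keeps the two from drifting apart, which is also why squashing this change into the preceding commit is a natural fit.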
* [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (8 preceding siblings ...)
2025-09-19 1:09 ` [PATCH 9/9] t: add a prerequisite for a " brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-02 22:38 ` [PATCH v2 1/9] docs: update pack index v3 format brian m. carlson
` (8 more replies)
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
10 siblings, 9 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
This is the first of several series for SHA-1 and SHA-256
interoperability, all of which will hopefully land in Git 3.0.
The first set of commits here updates documentation that was incorrect
or missing. I have spent more time
than I'd like in the pack code and felt our documentation there could be
more helpful. I also am correcting some things about the
interoperability formats that I've found are not correct or efficient in
terms of implementation and thus I will be implementing differently.
The loose object documentation will be updated with the loose object
mapping in a future commit, but I felt I should send a basic loose
object document first, so here it is.
The remaining commits are for expected gpgsig headers in tags, which
causes some tests which use strict fsck to fail, as well as for
prerequisites for compatibility hashes in the testsuite. Actually using
this configuration is not possible since the tests are still very broken
using it, but declaring these prerequisites allows me and others to send
in patches that use them and thus make our testsuite more resilient.
For example, in interoperability mode we cannot write objects that
are not valid since we cannot convert them into the other hash
algorithm. Thus, when we're testing in a mode that has a compatibility
algorithm, we skip these tests.
The goal is to run the tests in a full compatibility mode where
everything is dual-hash as well as introduce some specific tests for
interoperability that run in all configurations of the tests.
Changes from v1:
* Squash the two test changes into one commit.
* Include a new commit showing the use of the BROKEN_OBJECTS prereq.
* Mention using main algorithm hash in pack index v3.
* Hopefully clarify signed tags.
* Improve text for pack format documentation.
* Wire up build of loose object documentation.
* Remove loose object map documentation.
* Rephrase text about loose objects.
* Remove needless RUST prerequisite.
* Wrap overly long line.
* Reject invalid signature algorithms in tag headers.
* Fix if/whether problem in test comment.
brian m. carlson (9):
docs: update pack index v3 format
docs: update offset order for pack index v3
docs: reflect actual double signature for tags
docs: improve ambiguous areas of pack format documentation
docs: add documentation for loose objects
rev-parse: allow printing compatibility hash
fsck: consider gpgsig headers expected in tags
t: allow specifying compatibility hash
t1010: use BROKEN_OBJECTS prerequisite
Documentation/Makefile | 1 +
Documentation/fsck-msgids.adoc | 6 +++
Documentation/git-rev-parse.adoc | 11 ++--
Documentation/gitformat-loose.adoc | 53 ++++++++++++++++++
Documentation/gitformat-pack.adoc | 18 +++++++
Documentation/meson.build | 1 +
.../technical/hash-function-transition.adoc | 42 ++++++++-------
builtin/rev-parse.c | 11 +++-
fsck.c | 18 +++++++
fsck.h | 2 +
t/t1010-mktree.sh | 13 +++--
t/t1450-fsck.sh | 54 +++++++++++++++++++
t/t1500-rev-parse.sh | 34 ++++++++++++
t/test-lib-functions.sh | 9 +++-
t/test-lib.sh | 13 +++++
15 files changed, 254 insertions(+), 32 deletions(-)
create mode 100644 Documentation/gitformat-loose.adoc
* [PATCH v2 1/9] docs: update pack index v3 format
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-03 17:00 ` Junio C Hamano
2025-10-02 22:38 ` [PATCH v2 2/9] docs: update offset order for pack index v3 brian m. carlson
` (7 subsequent siblings)
8 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
Our current pack index v3 format uses 4-byte integers to find the
trailer of the file. This effectively means that the file cannot be
much larger than 2^32. While this might at first seem to be okay, we
expect that each object will have at least 64 bytes worth of data, which
means that no more than about 67 million objects can be stored.
Again, this might seem fine, but unfortunately, we know of many users
who attempt to create repos with extremely large numbers of commits to
get a "high score," and we've already seen repositories with at least 55
million commits. In the interests of gracefully handling repositories
even for these well-intentioned but ultimately misguided users, let's
change these lengths to 8 bytes.
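
The arithmetic behind that limit can be checked with a short sketch. This is only an illustration: the 64-byte per-object figure is the estimate from the paragraph above, and the function name is made up for this example.

```python
# Rough capacity estimate for an index whose trailer offset is N bytes wide.
# MIN_BYTES_PER_OBJECT is the per-object estimate quoted in the message above;
# max_objects() is a hypothetical helper, not anything in Git.
MIN_BYTES_PER_OBJECT = 64


def max_objects(offset_bytes: int, bytes_per_object: int = MIN_BYTES_PER_OBJECT) -> int:
    """Return how many objects fit before the offset field overflows."""
    addressable = 1 << (8 * offset_bytes)  # largest file size the field can describe
    return addressable // bytes_per_object


print(max_objects(4))  # 67108864, i.e. "about 67 million"
print(max_objects(8))  # 2**58 with 8-byte offsets
```

With 4-byte offsets the ceiling sits uncomfortably close to the 55 million commits already seen in the wild; with 8-byte offsets it is far out of reach.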
For the checksums at the end of the file, we're producing 32-byte
SHA-256 checksums because that's what we already do with pack index v2
and SHA-256. Truncating SHA-256 doesn't pose any actual security
problems other than those related to the reduced size, but our pack
checksum must already be 32 bytes (since SHA-256 packs have 32-byte
checksums) and it simplifies the code to use the existing hashfile logic
for these cases for the index checksum as well.
In addition, even though we may not need cryptographic security for the
index checksum, we'd like to avoid arguments from auditors and such for
organizations that may have compliance or security requirements. Using
the simple, boring choice of the full SHA-256 hash avoids all possible
discussion related to hash truncation and removes impediments for these
organizations.
Note that we do not yet have a pack index v3 implementation in Git, so
it should be fine to change this format. However, such an
implementation has been written for future inclusion following this
format.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
.../technical/hash-function-transition.adoc | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index f047fd80ca..274dc993d4 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -227,9 +227,9 @@ network byte order):
** 4-byte length in bytes of shortened object names. This is the
shortest possible length needed to make names in the shortened
object name table unambiguous.
- ** 4-byte integer, recording where tables relating to this format
+ ** 8-byte integer, recording where tables relating to this format
are stored in this index file, as an offset from the beginning.
- * 4-byte offset to the trailer from the beginning of this file.
+ * 8-byte offset to the trailer from the beginning of this file.
* Zero or more additional key/value pairs (4-byte key, 4-byte
value). Only one key is supported: 'PSRC'. See the "Loose objects
and unreachable objects" section for supported values and how this
@@ -276,10 +276,14 @@ network byte order):
up to and not including the table of CRC32 values.
- Zero or more NUL bytes.
- The trailer consists of the following:
- * A copy of the 20-byte SHA-256 checksum at the end of the
+ * A copy of the full main hash checksum at the end of the
corresponding packfile.
- * 20-byte SHA-256 checksum of all of the above.
+ * Full main hash checksum of all of the above.
+
+The "full main hash" is a full-length hash of the main (not compatibility)
+algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
+a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.
Loose object index
~~~~~~~~~~~~~~~~~~
* Re: [PATCH v2 1/9] docs: update pack index v3 format
2025-10-02 22:38 ` [PATCH v2 1/9] docs: update pack index v3 format brian m. carlson
@ 2025-10-03 17:00 ` Junio C Hamano
0 siblings, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-10-03 17:00 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
> index f047fd80ca..274dc993d4 100644
> --- a/Documentation/technical/hash-function-transition.adoc
> +++ b/Documentation/technical/hash-function-transition.adoc
> @@ -227,9 +227,9 @@ network byte order):
> ** 4-byte length in bytes of shortened object names. This is the
> shortest possible length needed to make names in the shortened
> object name table unambiguous.
> - ** 4-byte integer, recording where tables relating to this format
> + ** 8-byte integer, recording where tables relating to this format
> are stored in this index file, as an offset from the beginning.
> - * 4-byte offset to the trailer from the beginning of this file.
> + * 8-byte offset to the trailer from the beginning of this file.
> * Zero or more additional key/value pairs (4-byte key, 4-byte
> value). Only one key is supported: 'PSRC'. See the "Loose objects
> and unreachable objects" section for supported values and how this
> @@ -276,10 +276,14 @@ network byte order):
> up to and not including the table of CRC32 values.
> - Zero or more NUL bytes.
> - The trailer consists of the following:
> - * A copy of the 20-byte SHA-256 checksum at the end of the
> + * A copy of the full main hash checksum at the end of the
> corresponding packfile.
> - * 20-byte SHA-256 checksum of all of the above.
> + * Full main hash checksum of all of the above.
> +
> +The "full main hash" is a full-length hash of the main (not compatibility)
> +algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
> +a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.
I see a nice improvement over v1 here. Very good.
* [PATCH v2 2/9] docs: update offset order for pack index v3
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-02 22:38 ` [PATCH v2 1/9] docs: update pack index v3 format brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-02 22:38 ` [PATCH v2 3/9] docs: reflect actual double signature for tags brian m. carlson
` (6 subsequent siblings)
8 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
The current design of pack index v3 has items in two different orders:
sorted shortened object ID order and pack order. The shortened object
IDs and the pack index offset values are in the former order and
everything else is in the latter.
This, however, poses some problems. We have many parts of the packfile
code that expect to find out data about an object knowing only its index
in pack order. With the current design, to find the pack offset after
having looked up the index in pack order, we must then look up the full
object ID and use that to look up the shortened object ID to find the
pack offset, which is inconvenient, inefficient, and leads to poor cache
usage.
Instead, let's change the offset values to be looked up by pack order.
This works better because once we know the pack order offset, we can
find the full object name and its location in the pack with a simple
index into their respective tables. This makes many operations much
more efficient, especially with the functions we already have, and it
avoids the need for the revindex with pack index v3.
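
To make the benefit concrete, here is a minimal sketch of the lookup once the offset table is in pack order. It assumes the encoding described in the documentation being patched (big-endian 4-byte entries, most significant bit set meaning "index into the 8-byte table"); the function and parameter names are invented for illustration and do not correspond to Git's code.

```python
import struct


def pack_offset(small_table: bytes, large_table: bytes, pack_pos: int) -> int:
    """Resolve the pack file offset of the object at pack-order index pack_pos.

    small_table: concatenated 4-byte big-endian entries, in pack order.
    large_table: concatenated 8-byte big-endian entries for large offsets.
    """
    (value,) = struct.unpack_from(">I", small_table, 4 * pack_pos)
    if value & 0x80000000:
        # MSB set: the low 31 bits index the table of 8-byte offsets.
        (value,) = struct.unpack_from(">Q", large_table, 8 * (value & 0x7FFFFFFF))
    return value


# Two objects: one at offset 12, one beyond 2 GiB stored in the large table.
small = struct.pack(">II", 12, 0x80000000)
large = struct.pack(">Q", 1 << 33)
print(pack_offset(small, large, 0), pack_offset(small, large, 1))
```

The point is that a single array index is enough; no detour through the full or shortened object ID tables is needed.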
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/technical/hash-function-transition.adoc | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index 274dc993d4..adb0c61e53 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -260,12 +260,10 @@ network byte order):
compressed data to be copied directly from pack to pack during
repacking without undetected data corruption.
- * A table of 4-byte offset values. For an object in the table of
- sorted shortened object names, the value at the corresponding
- index in this table indicates where that object can be found in
- the pack file. These are usually 31-bit pack file offsets, but
- large offsets are encoded as an index into the next table with the
- most significant bit set.
+ * A table of 4-byte offset values. The entry at a given index, in pack
+ order, indicates where the corresponding object can be found in the pack
+ file. These are usually 31-bit pack file offsets, but large offsets are
+ encoded as an index into the next table with the most significant bit set.
* A table of 8-byte offset entries (empty for pack files less than
2 GiB). Pack files are organized with heavily used objects toward
* [PATCH v2 3/9] docs: reflect actual double signature for tags
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-02 22:38 ` [PATCH v2 1/9] docs: update pack index v3 format brian m. carlson
2025-10-02 22:38 ` [PATCH v2 2/9] docs: update offset order for pack index v3 brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-02 22:38 ` [PATCH v2 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
` (5 subsequent siblings)
8 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
The documentation for the hash function transition reflects the original
design where the SHA-256 signature would always be placed in a header.
However, due to a missed patch in Git 2.29, we shipped SHA-256 support
such that the signature for the current algorithm is always an in-body
signature and the opposite algorithm is always in a header. Since the
documentation is inaccurate, update it to reflect the correct
information.
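
As a small model of the shipped behavior described above (not Git code; the function name and the returned strings are made up for this sketch), the placement of each signature depends only on which algorithm the object is currently expressed in:

```python
# Header names are taken from the documentation text in this patch.
SIGNATURE_HEADERS = {"sha1": "gpgsig", "sha256": "gpgsig-sha256"}


def tag_signature_slots(current_algo: str) -> dict:
    """Map each hash algorithm to where its tag signature is stored."""
    if current_algo not in SIGNATURE_HEADERS:
        raise ValueError(f"unknown algorithm: {current_algo}")
    other = "sha256" if current_algo == "sha1" else "sha1"
    return {
        current_algo: "in-body",                     # signature for the current algorithm
        other: SIGNATURE_HEADERS[other] + " header",  # signature for the other algorithm
    }


print(tag_signature_slots("sha1"))
print(tag_signature_slots("sha256"))
```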
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
.../technical/hash-function-transition.adoc | 20 ++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index adb0c61e53..2359d7d106 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -429,17 +429,19 @@ ordinary unsigned commit.
Signed Tags
~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of an object.
-This means tags can be signed
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
- signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+This means tags can be signed using one or both algorithms.
Mergetag embedding
~~~~~~~~~~~~~~~~~~
* [PATCH v2 4/9] docs: improve ambiguous areas of pack format documentation
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (2 preceding siblings ...)
2025-10-02 22:38 ` [PATCH v2 3/9] docs: reflect actual double signature for tags brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-03 17:07 ` Junio C Hamano
2025-10-02 22:38 ` [PATCH v2 5/9] docs: add documentation for loose objects brian m. carlson
` (4 subsequent siblings)
8 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
It is fair to say that our pack and indexing code is quite complex.
Contributors who wish to work on this code or implementors of other
implementations would benefit from clear, unambiguous documentation
about how our data formats are structured and encoded and what data is
used in the computation of certain values. Unfortunately, some of this
data is missing, which leads to confusion and frustration.
Let's document some of this data to help clarify things. Specify over
what data CRC32 values are computed and also note which CRC32 algorithm
is used, since Wikipedia mentions at least four 32-bit CRC algorithms
and notes that it's possible to use different bit orderings.
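
For readers who want to check which CRC32 is meant, a minimal sketch follows. It assumes the three byte ranges named in the new documentation text (entry header, optional base name or offset, compressed data) are fed to zlib's CRC32 in that order; the helper name is invented here.

```python
import zlib


def packed_object_crc32(header: bytes, base_ref: bytes, compressed: bytes) -> int:
    """CRC32 (zlib variant) over a whole pack entry.

    header:     the n-byte type and length
    base_ref:   base object name or offset for deltas, b"" otherwise
    compressed: the zlib-compressed object or delta data
    """
    crc = zlib.crc32(header)
    crc = zlib.crc32(base_ref, crc)   # continue the running checksum
    return zlib.crc32(compressed, crc)


# The zlib CRC32 is the common one whose check value for "123456789"
# is 0xCBF43926; feeding the input in pieces gives the same result.
print(hex(packed_object_crc32(b"123", b"456", b"789")))
```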
In addition, note how we encode objects in the pack. One might be led
to believe that packed objects are always stored with the "<type>
<size>\0" prefix of loose objects, but that is not the case, although
for obvious reasons this data is included in the computation of the
object ID. Explain why this is for the curious reader.
Finally, indicate what the size field of the packed object represents.
Otherwise, a reader might think that the size of a delta is the size of
the full object or that it might contain the offset or object ID,
neither of which is the case. State explicitly that the
values represent uncompressed sizes to avoid confusion.
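
A sketch of the "n-byte type and length" may help here. It assumes the layout described elsewhere in gitformat-pack (continuation bit, 3-bit type and the low 4 size bits in the first byte, then 7 size bits per byte, least significant first) and the usual type number 3 for a blob; the function name is hypothetical.

```python
OBJ_BLOB = 3  # assumed type number for blobs, per gitformat-pack


def encode_pack_header(obj_type: int, size: int) -> bytes:
    """Encode the type and *uncompressed* size that prefix a packed object."""
    current = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(current | 0x80)  # more size bytes follow
        current = size & 0x7F
        size >>= 7
    out.append(current)
    return bytes(out)


raw = b"x" * 100  # undeltified object: the size is that of the raw data
print(encode_pack_header(OBJ_BLOB, len(raw)).hex())
```

For a deltified object the same encoding would carry the length of the uncompressed delta instead, and the base object name or offset that follows the header is not counted.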
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/gitformat-pack.adoc | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/Documentation/gitformat-pack.adoc b/Documentation/gitformat-pack.adoc
index d6ae229be5..9b7af5c184 100644
--- a/Documentation/gitformat-pack.adoc
+++ b/Documentation/gitformat-pack.adoc
@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.
+CRC32 checksums are always computed over the entire packed object, including
+the header (n-byte type and length); the base object name or offset, if any;
+and the entire compressed object. The CRC32 algorithm used is that of zlib.
+
== pack-*.pack files have the following format:
- A header appears at the beginning and consists of the following:
@@ -80,6 +84,15 @@ Valid object types are:
Type 5 is reserved for future expansion. Type 0 is invalid.
+=== Object encoding
+
+Unlike loose objects, packed objects do not have a prefix containing the type,
+size, and a NUL byte. These are not necessary because they can be determined by
+the n-byte type and length that prefixes the data and so they are omitted from
+the compressed and deltified data.
+
+The computation of the object ID still uses this prefix, however.
+
=== Size encoding
This document uses the following "size encoding" of non-negative
@@ -92,6 +105,11 @@ values are more significant.
This size encoding should not be confused with the "offset encoding",
which is also used in this document.
+When encoding the size of an undeltified object in a pack, the size is that of
+the uncompressed raw object. For deltified objects, it is the size of the
+uncompressed delta. The base object name or offset is not included in the size
+computation.
+
=== Deltified representation
Conceptually there are only four object types: commit, tree, tag and
* Re: [PATCH v2 4/9] docs: improve ambiguous areas of pack format documentation
2025-10-02 22:38 ` [PATCH v2 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
@ 2025-10-03 17:07 ` Junio C Hamano
2025-10-03 21:06 ` brian m. carlson
0 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2025-10-03 17:07 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> +=== Object encoding
> +
> +Unlike loose objects, packed objects do not have a prefix containing the type,
> +size, and a NUL byte. These are not necessary because they can be determined by
> +the n-byte type and length that prefixes the data and so they are omitted from
> +the compressed and deltified data.
> +
> +The computation of the object ID still uses this prefix, however.
"however" -> "by reconstructing it from the type/length data as
needed"?
Other than that, the new text reads very well.
Thanks.
* Re: [PATCH v2 4/9] docs: improve ambiguous areas of pack format documentation
2025-10-03 17:07 ` Junio C Hamano
@ 2025-10-03 21:06 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-03 21:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Patrick Steinhardt
On 2025-10-03 at 17:07:47, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > +=== Object encoding
> > +
> > +Unlike loose objects, packed objects do not have a prefix containing the type,
> > +size, and a NUL byte. These are not necessary because they can be determined by
> > +the n-byte type and length that prefixes the data and so they are omitted from
> > +the compressed and deltified data.
> > +
> > +The computation of the object ID still uses this prefix, however.
>
> "however" -> "by reconstructing it from the type/length data as
> needed"?
>
> Other than that, the new text reads very well.
I've already squashed in a very similar change. I'll wait to see if
Patrick or anyone else has more comments and then send v3.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* [PATCH v2 5/9] docs: add documentation for loose objects
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (3 preceding siblings ...)
2025-10-02 22:38 ` [PATCH v2 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-03 17:05 ` Junio C Hamano
2025-10-02 22:38 ` [PATCH v2 6/9] rev-parse: allow printing compatibility hash brian m. carlson
` (3 subsequent siblings)
8 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
We currently have no documentation for how loose objects are stored.
Let's add some here so it's easy for people to understand how they
work.
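
As a quick illustration of the format the new document describes, the following sketch builds a loose object in memory. It is not Git's implementation; the function name and return shape are invented for this example, and only the prefix, hashing, sharding, and zlib steps come from the documentation added below.

```python
import hashlib
import zlib


def loose_object(obj_type: str, data: bytes, algo: str = "sha256"):
    """Return (object ID, path under $GIT_DIR, compressed file contents)."""
    prefix = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    raw = prefix + data                        # what gets hashed
    oid = hashlib.new(algo, raw).hexdigest()   # object ID of the uncompressed data
    path = f"objects/{oid[:2]}/{oid[2:]}"      # first two hex digits name the directory
    return oid, path, zlib.compress(raw)       # the file stores the compressed bytes


oid, path, contents = loose_object("tree", b"")
print(oid)   # 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321
print(path)
```

The printed ID matches the empty-tree example given in the new document for SHA-256 repositories.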
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/Makefile | 1 +
Documentation/gitformat-loose.adoc | 53 ++++++++++++++++++++++++++++++
Documentation/meson.build | 1 +
3 files changed, 55 insertions(+)
create mode 100644 Documentation/gitformat-loose.adoc
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6fb83d0c6e..e1d38fbfe6 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
MAN5_TXT += gitformat-chunk.adoc
MAN5_TXT += gitformat-commit-graph.adoc
MAN5_TXT += gitformat-index.adoc
+MAN5_TXT += gitformat-loose.adoc
MAN5_TXT += gitformat-pack.adoc
MAN5_TXT += gitformat-signature.adoc
MAN5_TXT += githooks.adoc
diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc
new file mode 100644
index 0000000000..947993663e
--- /dev/null
+++ b/Documentation/gitformat-loose.adoc
@@ -0,0 +1,53 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git stores individual objects, where every object is
+written as a separate file.
+
+Over the lifetime of a repository, objects are usually written as loose objects
+initially. Eventually, these loose objects will be compacted into packfiles
+via repository maintenance to improve disk space usage and speed up the lookup
+of these objects.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag` and `size` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name. This is done to shard the data and avoid too
+many files being in one directory, since some file systems perform poorly with
+many items in a directory.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/meson.build b/Documentation/meson.build
index 41f43e0336..64f70ac724 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -172,6 +172,7 @@ manpages = {
'gitformat-chunk.adoc' : 5,
'gitformat-commit-graph.adoc' : 5,
'gitformat-index.adoc' : 5,
+ 'gitformat-loose.adoc' : 5,
'gitformat-pack.adoc' : 5,
'gitformat-signature.adoc' : 5,
'githooks.adoc' : 5,
* Re: [PATCH v2 5/9] docs: add documentation for loose objects
2025-10-02 22:38 ` [PATCH v2 5/9] docs: add documentation for loose objects brian m. carlson
@ 2025-10-03 17:05 ` Junio C Hamano
0 siblings, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2025-10-03 17:05 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> +DESCRIPTION
> +-----------
> +
> +Loose objects are how Git stores individual objects, where every object is
> +written as a separate file.
> +
> +Over the lifetime of a repository, objects are usually written as loose objects
> +initially. Eventually, these loose objects will be compacted into packfiles
> +via repository maintenance to improve disk space usage and speed up the lookup
> +of these objects.
Much easier to follow relative to v1. Very much appreciated.
> +== Loose objects
> +
> +Each loose object contains a prefix, followed immediately by the data of the
> +object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
> +`tree`, `commit`, or `tag` and `size` is the size of the data (without the
> +prefix) as a decimal integer expressed in ASCII.
> +
> +The entire contents, prefix and data concatenated, is then compressed with zlib
> +and the compressed data is stored in the file. The object ID of the object is
> +the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
> +
> +The file for the loose object is stored under the `objects` directory, with the
> +first two hex characters of the object ID being the directory and the remaining
> +characters being the file name. This is done to shard the data and avoid too
> +many files being in one directory, since some file systems perform poorly with
> +many items in a directory.
The additional explanation new in v2 looks quite sensible.
> +As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
> +and, in a SHA-256 repository, would have the object ID
> +`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
> +stored under
> +`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
> +
> +Similarly, a blob containing the contents `abc` would have the uncompressed
> +data of `blob 3\0abc`.
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite
> diff --git a/Documentation/meson.build b/Documentation/meson.build
> index 41f43e0336..64f70ac724 100644
> --- a/Documentation/meson.build
> +++ b/Documentation/meson.build
> @@ -172,6 +172,7 @@ manpages = {
> 'gitformat-chunk.adoc' : 5,
> 'gitformat-commit-graph.adoc' : 5,
> 'gitformat-index.adoc' : 5,
> + 'gitformat-loose.adoc' : 5,
> 'gitformat-pack.adoc' : 5,
> 'gitformat-signature.adoc' : 5,
> 'githooks.adoc' : 5,
* [PATCH v2 6/9] rev-parse: allow printing compatibility hash
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (4 preceding siblings ...)
2025-10-02 22:38 ` [PATCH v2 5/9] docs: add documentation for loose objects brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-02 22:38 ` [PATCH v2 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
` (2 subsequent siblings)
8 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
Right now, we have a way to print the storage hash, the input hash, and
the output hash, but we lack a way to print the compatibility hash. Add
a new type to --show-object-format, compat, which prints this value.
If no compatibility hash exists, simply print a newline. This is
important to allow users to use multiple options at once while still
getting unambiguous output.
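
The reason for the empty line can be seen in a small model of the option handling. This is a Python paraphrase of the C change below, written only for illustration; the function name and arguments are made up and are not a Git API.

```python
VALID_MODES = ("storage", "input", "output", "compat")


def show_object_format(mode: str, storage: str, compat=None) -> str:
    """Model of what --show-object-format=<mode> prints, newline included."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode for --show-object-format: {mode}")
    if mode == "compat":
        # Always emit exactly one line, even with no compatibility algorithm.
        return (compat or "") + "\n"
    return storage + "\n"


# Asking for two values always yields two lines, so a caller can split by line.
combined = show_object_format("storage", "sha256") + show_object_format("compat", "sha256")
print(combined.split("\n")[:2])
```

If nothing were printed for a missing compatibility hash, a script requesting several values at once could not tell which line belongs to which option.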
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/git-rev-parse.adoc | 11 ++++++-----
builtin/rev-parse.c | 11 ++++++++++-
t/t1500-rev-parse.sh | 34 ++++++++++++++++++++++++++++++++
3 files changed, 50 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-rev-parse.adoc b/Documentation/git-rev-parse.adoc
index cc32b4b4f0..465ae3e29d 100644
--- a/Documentation/git-rev-parse.adoc
+++ b/Documentation/git-rev-parse.adoc
@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
path of the current directory relative to the top-level
directory.
---show-object-format[=(storage|input|output)]::
- Show the object format (hash algorithm) used for the repository
- for storage inside the `.git` directory, input, or output. For
- input, multiple algorithms may be printed, space-separated.
- If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+ Show the object format (hash algorithm) used for the repository for storage
+ inside the `.git` directory, input, output, or compatibility. For input,
+ multiple algorithms may be printed, space-separated. If `compat` is
+ requested and no compatibility algorithm is enabled, prints an empty line. If
+ not specified, the default is "storage".
--show-ref-format::
Show the reference storage format used for the repository.
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 44ff1b8342..187b7e8be9 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -1108,11 +1108,20 @@ int cmd_rev_parse(int argc,
const char *val = arg ? arg : "storage";
if (strcmp(val, "storage") &&
+ strcmp(val, "compat") &&
strcmp(val, "input") &&
strcmp(val, "output"))
die(_("unknown mode for --show-object-format: %s"),
arg);
- puts(the_hash_algo->name);
+
+ if (!strcmp(val, "compat")) {
+ if (the_repository->compat_hash_algo)
+ puts(the_repository->compat_hash_algo->name);
+ else
+ putchar('\n');
+ } else {
+ puts(the_hash_algo->name);
+ }
continue;
}
if (!strcmp(arg, "--show-ref-format")) {
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index 58a4583088..7739ab611b 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -207,6 +207,40 @@ test_expect_success 'rev-parse --show-object-format in repo' '
grep "unknown mode for --show-object-format: squeamish-ossifrage" err
'
+
+test_expect_success 'rev-parse --show-object-format in repo with compat mode' '
+ mkdir repo &&
+ (
+ sane_unset GIT_DEFAULT_HASH &&
+ cd repo &&
+ git init --object-format=sha256 &&
+ git config extensions.compatobjectformat sha1 &&
+ echo sha256 >expect &&
+ git rev-parse --show-object-format >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=storage >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=input >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=output >actual &&
+ test_cmp expect actual &&
+ echo sha1 >expect &&
+ git rev-parse --show-object-format=compat >actual &&
+ test_cmp expect actual &&
+ test_must_fail git rev-parse --show-object-format=squeamish-ossifrage 2>err &&
+ grep "unknown mode for --show-object-format: squeamish-ossifrage" err
+ ) &&
+ mkdir repo2 &&
+ (
+ sane_unset GIT_DEFAULT_HASH &&
+ cd repo2 &&
+ git init --object-format=sha256 &&
+ echo >expect &&
+ git rev-parse --show-object-format=compat >actual &&
+ test_cmp expect actual
+ )
+'
+
test_expect_success 'rev-parse --show-ref-format' '
test_detect_ref_format >expect &&
git rev-parse --show-ref-format >actual &&
* [PATCH v2 7/9] fsck: consider gpgsig headers expected in tags
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (5 preceding siblings ...)
2025-10-02 22:38 ` [PATCH v2 6/9] rev-parse: allow printing compatibility hash brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-02 22:38 ` [PATCH v2 8/9] t: allow specifying compatibility hash brian m. carlson
2025-10-02 22:38 ` [PATCH v2 9/9] t1010: use BROKEN_OBJECTS prerequisite brian m. carlson
8 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
When we create a tag, we want gpgsig and gpgsig-sha256 headers to be
allowed in it. The default fsck behavior is to ignore the fact that
they're left over, but some of our tests enable strict checking, which
flags them nonetheless. Add improved checking for these headers as well
as documentation and several tests.
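
As a rough illustration only (not part of this patch), the header walk
can be mimicked in shell: after the tagger line, accept one gpgsig or
gpgsig-sha256 header plus its space-prefixed continuation lines, then
require the blank line that ends the headers. The helper names
check_tag_headers, good_tag and bad_tag are made up for this sketch,
the object ID is a placeholder, and the truncation cases reported as
badGpgsig and badHeaderContinuation are not modelled.

```shell
#!/bin/sh
# Sketch only: mimics the header walk described above using awk.
# check_tag_headers is a hypothetical helper, not a Git command.
check_tag_headers () {
	awk '
		# Skip everything up to and including the tagger line.
		state == 0 { if ($0 ~ /^tagger /) state = 1; next }
		# Accept one signature header directly after the tagger line.
		state == 1 && /^(gpgsig|gpgsig-sha256) / { state = 2; next }
		# Continuation lines of that header start with a space.
		state == 2 && /^ / { next }
		# The next line must be the blank line that ends the headers.
		{ result = ($0 == "") ? "ok" : "extra header"; exit }
		END { print (result == "" ? "ok" : result) }
	'
}

# Tag contents modelled on the tests in this patch; the object ID is a
# placeholder and is never looked up.
good_tag () {
	printf '%s\n' \
		'object 0000000000000000000000000000000000000000' \
		'type commit' \
		'tag good' \
		'tagger T A Gger <tagger@example.com> 1234567890 -0000' \
		'gpgsig -----BEGIN PGP SIGNATURE-----' \
		' Not a valid signature' \
		' -----END PGP SIGNATURE-----' \
		'' \
		'This is a good tag.'
}

bad_tag () {
	printf '%s\n' \
		'object 0000000000000000000000000000000000000000' \
		'type commit' \
		'tag good' \
		'tagger T A Gger <tagger@example.com> 1234567890 -0000' \
		'gpgsig -----BEGIN PGP SIGNATURE-----' \
		' Not a valid signature' \
		' -----END PGP SIGNATURE-----' \
		'junk' \
		'' \
		'This is a bad tag with junk at the end of the headers.'
}

good_tag | check_tag_headers
bad_tag | check_tag_headers
```

The second call reports the "junk" line because it neither continues
the signature header nor ends the header block, which is the situation
the 'tag rejects invalid headers' test below exercises.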
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/fsck-msgids.adoc | 6 ++++
fsck.c | 18 ++++++++++++
fsck.h | 2 ++
t/t1450-fsck.sh | 54 ++++++++++++++++++++++++++++++++++
4 files changed, 80 insertions(+)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..52d9a8a811 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -10,6 +10,12 @@
`badFilemode`::
(INFO) A tree contains a bad filemode entry.
+`badGpgsig`::
+ (ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
+
+`badHeaderContinuation`::
+ (ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
+
`badName`::
(ERROR) An author/committer name is empty.
diff --git a/fsck.c b/fsck.c
index 171b424dd5..341e100d24 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1067,6 +1067,24 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
else
ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
+ if (buffer < buffer_end && (skip_prefix(buffer, "gpgsig ", &buffer) || skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
+ eol = memchr(buffer, '\n', buffer_end - buffer);
+ if (!eol) {
+ ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_GPGSIG, "invalid format - unexpected end after 'gpgsig' or 'gpgsig-sha256' line");
+ goto done;
+ }
+ buffer = eol + 1;
+
+ while (buffer < buffer_end && starts_with(buffer, " ")) {
+ eol = memchr(buffer, '\n', buffer_end - buffer);
+ if (!eol) {
+ ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_HEADER_CONTINUATION, "invalid format - unexpected end in 'gpgsig' or 'gpgsig-sha256' continuation line");
+ goto done;
+ }
+ buffer = eol + 1;
+ }
+ }
+
if (buffer < buffer_end && !starts_with(buffer, "\n")) {
/*
* The verify_headers() check will allow
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..c26616d7eb 100644
--- a/fsck.h
+++ b/fsck.h
@@ -25,9 +25,11 @@ enum fsck_msg_type {
FUNC(NUL_IN_HEADER, FATAL) \
FUNC(UNTERMINATED_HEADER, FATAL) \
/* errors */ \
+ FUNC(BAD_HEADER_CONTINUATION, ERROR) \
FUNC(BAD_DATE, ERROR) \
FUNC(BAD_DATE_OVERFLOW, ERROR) \
FUNC(BAD_EMAIL, ERROR) \
+ FUNC(BAD_GPGSIG, ERROR) \
FUNC(BAD_NAME, ERROR) \
FUNC(BAD_OBJECT_SHA1, ERROR) \
FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5ae86c42be..c4b651c2dc 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -454,6 +454,60 @@ test_expect_success 'tag with NUL in header' '
test_grep "error in tag $tag.*unterminated header: NUL at offset" out
'
+test_expect_success 'tag accepts gpgsig header even if not validly signed' '
+ test_oid_cache <<-\EOF &&
+ header sha1:gpgsig-sha256
+ header sha256:gpgsig
+ EOF
+ header=$(test_oid header) &&
+ sha=$(git rev-parse HEAD) &&
+ cat >good-tag <<-EOF &&
+ object $sha
+ type commit
+ tag good
+ tagger T A Gger <tagger@example.com> 1234567890 -0000
+ $header -----BEGIN PGP SIGNATURE-----
+ Not a valid signature
+ -----END PGP SIGNATURE-----
+
+ This is a good tag.
+ EOF
+
+ tag=$(git hash-object --literally -t tag -w --stdin <good-tag) &&
+ test_when_finished "remove_object $tag" &&
+ git update-ref refs/tags/good $tag &&
+ test_when_finished "git update-ref -d refs/tags/good" &&
+ git -c fsck.extraHeaderEntry=error fsck --tags
+'
+
+test_expect_success 'tag rejects invalid headers' '
+ test_oid_cache <<-\EOF &&
+ header sha1:gpgsig-sha256
+ header sha256:gpgsig
+ EOF
+ header=$(test_oid header) &&
+ sha=$(git rev-parse HEAD) &&
+ cat >bad-tag <<-EOF &&
+ object $sha
+ type commit
+ tag good
+ tagger T A Gger <tagger@example.com> 1234567890 -0000
+ $header -----BEGIN PGP SIGNATURE-----
+ Not a valid signature
+ -----END PGP SIGNATURE-----
+ junk
+
+ This is a bad tag with junk at the end of the headers.
+ EOF
+
+ tag=$(git hash-object --literally -t tag -w --stdin <bad-tag) &&
+ test_when_finished "remove_object $tag" &&
+ git update-ref refs/tags/bad $tag &&
+ test_when_finished "git update-ref -d refs/tags/bad" &&
+ test_must_fail git -c fsck.extraHeaderEntry=error fsck --tags 2>out &&
+ test_grep "error in tag $tag.*invalid format - extra header" out
+'
+
test_expect_success 'cleaned up' '
git fsck >actual 2>&1 &&
test_must_be_empty actual
* [PATCH v2 8/9] t: allow specifying compatibility hash
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (6 preceding siblings ...)
2025-10-02 22:38 ` [PATCH v2 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
2025-10-03 17:14 ` Junio C Hamano
2025-10-02 22:38 ` [PATCH v2 9/9] t1010: use BROKEN_OBJECTS prerequisite brian m. carlson
8 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
We want to specify a compatibility hash for testing interactions for
SHA-256 repositories where we have SHA-1 compatibility enabled. Allow
the user to specify this scenario in the test suite by setting
GIT_TEST_DEFAULT_HASH to "sha256:sha1".
Note that this will get passed into GIT_DEFAULT_HASH, which Git itself
does not presently support. However, we will support this in a future
commit.
Since we'll now want to know the value for a specific version, let's add
the ability to specify either the storage hash (in this case, SHA-256)
or the compatibility hash (SHA-1). We use a different value for the
compatibility hash that will be enabled for all repositories
(test_repo_compat_hash_algo) versus the one that is used individually in
some tests (test_compat_hash_algo), since we want to still run those
individual tests without requiring that the testsuite be run fully in a
compatibility mode.
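
To illustrate the split described above, here is a small standalone
sketch of how a value such as "sha256:sha1" separates into a storage
and a compatibility algorithm using POSIX parameter expansion. The
function name split_hash_spec and its output format are made up for
this example; the real logic lives in test_detect_hash in the diff
below.

```shell
#!/bin/sh
# Sketch only: split "storage:compat" the way the commit message
# describes.  split_hash_spec is a hypothetical helper name.
split_hash_spec () {
	spec=$1
	case "$spec" in
	*:*)
		# Everything before the first colon is the storage hash,
		# everything after the last colon is the compatibility hash.
		storage=${spec%%:*}
		compat=${spec##*:}
		;;
	*)
		# No colon: only a storage hash was requested.
		storage=$spec
		compat=
		;;
	esac
	echo "storage=$storage compat=$compat"
}

split_hash_spec sha256:sha1
split_hash_spec sha256
```

An empty compat value in this sketch corresponds to a run that is not
in full compatibility mode.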
In some cases, we'll need to adjust our test suite to work in a proper
way with a compatibility hash. For example, in such a case, we'll only
use pack index v3, since v1 and v2 lack support for multiple algorithms.
Since we won't want to write those older formats, we'll need to skip
tests that do so. Let's add a COMPAT_HASH prerequisite for this
purpose.
Finally, in this scenario, we can no longer rely on having broken
objects work since we lack compatibility mappings to rewrite objects in
the repository. Add a prerequisite, BROKEN_OBJECTS, that we define in
terms of COMPAT_HASH and that checks whether creating deliberately
broken objects is possible, so that we can disable these tests if not.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
t/test-lib-functions.sh | 9 +++++++--
t/test-lib.sh | 13 +++++++++++++
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index a28de7b19b..52d7759bf5 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1708,11 +1708,16 @@ test_set_hash () {
# Detect the hash algorithm in use.
test_detect_hash () {
case "${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}" in
- "sha256")
+ *:*)
+ test_hash_algo="${GIT_TEST_DEFAULT_HASH%%:*}"
+ test_compat_hash_algo="${GIT_TEST_DEFAULT_HASH##*:}"
+ test_repo_compat_hash_algo="$test_compat_hash_algo"
+ ;;
+ sha256)
test_hash_algo=sha256
test_compat_hash_algo=sha1
;;
- *)
+ sha1)
test_hash_algo=sha1
test_compat_hash_algo=sha256
;;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 621cd31ae1..9eb79324ee 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1917,6 +1917,19 @@ test_lazy_prereq DEFAULT_HASH_ALGORITHM '
test_lazy_prereq DEFAULT_REPO_FORMAT '
test_have_prereq SHA1,REFFILES
'
+# BROKEN_OBJECTS is a test whether we can write deliberately broken objects and
+# expect them to work. When running using SHA-256 mode with SHA-1
+# compatibility, we cannot write such objects because there's no SHA-1
+# compatibility value for a nonexistent object.
+test_lazy_prereq BROKEN_OBJECTS '
+ ! test_have_prereq COMPAT_HASH
+'
+
+# COMPAT_HASH is a test if we're operating in a repository with SHA-256 with
+# SHA-1 compatibility.
+test_lazy_prereq COMPAT_HASH '
+ test -n "$test_repo_compat_hash_algo"
+'
# Ensure that no test accidentally triggers a Git command
# that runs the actual maintenance scheduler, affecting a user's
* Re: [PATCH v2 8/9] t: allow specifying compatibility hash
2025-10-02 22:38 ` [PATCH v2 8/9] t: allow specifying compatibility hash brian m. carlson
@ 2025-10-03 17:14 ` Junio C Hamano
2025-10-03 20:45 ` brian m. carlson
0 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2025-10-03 17:14 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> Finally, in this scenario, we can no longer rely on having broken
> objects work since we lack compatibility mappings to rewrite objects in
> the repository. Add a prerequisite, BROKEN_OBJECTS, that we define in
> terms of COMPAT_HASH and checks to see if creating deliberately broken
> objects is possible, so that we can disable these tests if not.
Thanks for paying attention to this kind of detail.
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 621cd31ae1..9eb79324ee 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1917,6 +1917,19 @@ test_lazy_prereq DEFAULT_HASH_ALGORITHM '
> test_lazy_prereq DEFAULT_REPO_FORMAT '
> test_have_prereq SHA1,REFFILES
> '
> +# BROKEN_OBJECTS is a test whether we can write deliberately broken objects and
> +# expect them to work. When running using SHA-256 mode with SHA-1
> +# compatibility, we cannot write such objects because there's no SHA-1
> +# compatibility value for a nonexistent object.
> +test_lazy_prereq BROKEN_OBJECTS '
> + ! test_have_prereq COMPAT_HASH
> +'
> +
> +# COMPAT_HASH is a test if we're operating in a repository with SHA-256 with
> +# SHA-1 compatibility.
> +test_lazy_prereq COMPAT_HASH '
> + test -n "$test_repo_compat_hash_algo"
> +'
>
> # Ensure that no test accidentally triggers a Git command
> # that runs the actual maintenance scheduler, affecting a user's
* Re: [PATCH v2 8/9] t: allow specifying compatibility hash
2025-10-03 17:14 ` Junio C Hamano
@ 2025-10-03 20:45 ` brian m. carlson
0 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-03 20:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Patrick Steinhardt
On 2025-10-03 at 17:14:30, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > Finally, in this scenario, we can no longer rely on having broken
> > objects work since we lack compatibility mappings to rewrite objects in
> > the repository. Add a prerequisite, BROKEN_OBJECTS, that we define in
> > terms of COMPAT_HASH and checks to see if creating deliberately broken
> > objects is possible, so that we can disable these tests if not.
>
> Thanks for paying attention to this kind of detail.
I appreciate the kind words. To be fair, this is one of a few major
causes of testsuite failures when running in compatibility mode, so
addressing these is important to getting the testsuite to work properly
in such a mode. Fortunately, they are relatively easy to spot and fix.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* [PATCH v2 9/9] t1010: use BROKEN_OBJECTS prerequisite
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (7 preceding siblings ...)
2025-10-02 22:38 ` [PATCH v2 8/9] t: allow specifying compatibility hash brian m. carlson
@ 2025-10-02 22:38 ` brian m. carlson
8 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-02 22:38 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
When hash compatibility mode is enabled, we cannot write broken objects
because they cannot be mapped into the other hash algorithm. Use the
BROKEN_OBJECTS prerequisite to disable these tests and the writing of
broken objects in this mode.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
t/t1010-mktree.sh | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index e9973f7494..312fe6717a 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -11,10 +11,13 @@ test_expect_success setup '
git add "$d" || return 1
done &&
echo zero >one &&
- git update-index --add --info-only one &&
- git write-tree --missing-ok >tree.missing &&
- git ls-tree $(cat tree.missing) >top.missing &&
- git ls-tree -r $(cat tree.missing) >all.missing &&
+ if test_have_prereq BROKEN_OBJECTS
+ then
+ git update-index --add --info-only one &&
+ git write-tree --missing-ok >tree.missing &&
+ git ls-tree $(cat tree.missing) >top.missing &&
+ git ls-tree -r $(cat tree.missing) >all.missing
+ fi &&
echo one >one &&
git add one &&
git write-tree >tree &&
@@ -53,7 +56,7 @@ test_expect_success 'ls-tree output in wrong order given to mktree (2)' '
test_cmp tree.withsub actual
'
-test_expect_success 'allow missing object with --missing' '
+test_expect_success BROKEN_OBJECTS 'allow missing object with --missing' '
git mktree --missing <top.missing >actual &&
test_cmp tree.missing actual
'
* [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1
2025-09-19 1:09 [PATCH 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (9 preceding siblings ...)
2025-10-02 22:38 ` [PATCH v2 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 1/9] docs: update pack index v3 format brian m. carlson
` (9 more replies)
10 siblings, 10 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
This is the first of several series for SHA-1 and SHA-256
interoperability, all of which will hopefully land in Git 3.0.
The first set of commits here updates documentation that was incorrect
or missing. I have spent more time
than I'd like in the pack code and felt our documentation there could be
more helpful. I also am correcting some things about the
interoperability formats that I've found are not correct or efficient in
terms of implementation and thus I will be implementing differently.
The loose object documentation will be updated with the loose object
mapping in a future commit, but I felt I should send a basic loose
object document first, so here it is.
The remaining commits handle gpgsig headers that are expected in tags,
which currently cause some tests that use strict fsck to fail, and add
prerequisites for compatibility hashes to the testsuite. Actually using
this configuration is not possible since the tests are still very broken
using it, but declaring these prerequisites allows me and others to send
in patches that use them and thus make our testsuite more resilient.
For example, in interoperability mode we cannot write objects that
are not valid since we cannot convert them into the other hash
algorithm. Thus, when we're testing in a mode that has a compatibility
algorithm, we skip these tests.
The goal is to run the tests in a full compatibility mode where
everything is dual-hash as well as introduce some specific tests for
interoperability that run in all configurations of the tests.
Changes from v2:
* Improve the language slightly in the pack documentation.
Changes from v1:
* Squash the two test changes into one commit.
* Include a new commit showing the use of the BROKEN_OBJECTS prereq.
* Mention using main algorithm hash in pack index v3.
* Hopefully clarify signed tags.
* Improve text for pack format documentation.
* Wire up build of loose object documentation.
* Remove loose object map documentation.
* Rephrase text about loose objects.
* Remove needless RUST prerequisite.
* Wrap overly long line.
* Reject invalid signature algorithms in tag headers.
* Fix if/whether problem in test comment.
brian m. carlson (9):
docs: update pack index v3 format
docs: update offset order for pack index v3
docs: reflect actual double signature for tags
docs: improve ambiguous areas of pack format documentation
docs: add documentation for loose objects
rev-parse: allow printing compatibility hash
fsck: consider gpgsig headers expected in tags
t: allow specifying compatibility hash
t1010: use BROKEN_OBJECTS prerequisite
Documentation/Makefile | 1 +
Documentation/fsck-msgids.adoc | 6 +++
Documentation/git-rev-parse.adoc | 11 ++--
Documentation/gitformat-loose.adoc | 53 ++++++++++++++++++
Documentation/gitformat-pack.adoc | 19 +++++++
Documentation/meson.build | 1 +
.../technical/hash-function-transition.adoc | 42 ++++++++-------
builtin/rev-parse.c | 11 +++-
fsck.c | 18 +++++++
fsck.h | 2 +
t/t1010-mktree.sh | 13 +++--
t/t1450-fsck.sh | 54 +++++++++++++++++++
t/t1500-rev-parse.sh | 34 ++++++++++++
t/test-lib-functions.sh | 9 +++-
t/test-lib.sh | 13 +++++
15 files changed, 255 insertions(+), 32 deletions(-)
create mode 100644 Documentation/gitformat-loose.adoc
* [PATCH v3 1/9] docs: update pack index v3 format
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 2/9] docs: update offset order for pack index v3 brian m. carlson
` (8 subsequent siblings)
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
Our current pack index v3 format uses 4-byte integers to find the
trailer of the file. This effectively means that the file cannot be
much larger than 2^32. While this might at first seem to be okay, we
expect that each object will have at least 64 bytes worth of data, which
means that no more than about 67 million objects can be stored.
Again, this might seem fine, but unfortunately, we know of many users
who attempt to create repos with extremely large numbers of commits to
get a "high score," and we've already seen repositories with at least 55
million commits. In the interests of gracefully handling repositories
even for these well-intentioned but ultimately misguided users, let's
change these lengths to 8 bytes.
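
For readers who want to check the figure above, the arithmetic can be
reproduced in shell. This is only a back-of-the-envelope sketch: the
function name max_objects is made up, the 64-byte minimum per object is
the estimate from this message, and the shell is assumed to have 64-bit
arithmetic.

```shell
#!/bin/sh
# Sketch only: how many objects fit if every offset in the index must
# stay below 2^32 and each object needs at least 64 bytes of index data.
max_objects () {
	index_limit=$((1 << 32))	# largest size reachable with 4-byte offsets
	bytes_per_object=64		# lower bound quoted in the commit message
	echo $((index_limit / bytes_per_object))
}

max_objects	# prints 67108864, i.e. about 67 million
```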
For the checksums at the end of the file, we're producing 32-byte
SHA-256 checksums because that's what we already do with pack index v2
and SHA-256. Truncating SHA-256 doesn't pose any actual security
problems other than those related to the reduced size, but our pack
checksum must already be 32 bytes (since SHA-256 packs have 32-byte
checksums) and it simplifies the code to use the existing hashfile logic
for these cases for the index checksum as well.
In addition, even though we may not need cryptographic security for the
index checksum, we'd like to avoid arguments from auditors and such for
organizations that may have compliance or security requirements. Using
the simple, boring choice of the full SHA-256 hash avoids all possible
discussion related to hash truncation and removes impediments for these
organizations.
Note that we do not yet have a pack index v3 implementation in Git, so
it should be fine to change this format. However, such an
implementation has been written for future inclusion following this
format.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
.../technical/hash-function-transition.adoc | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index f047fd80ca..274dc993d4 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -227,9 +227,9 @@ network byte order):
** 4-byte length in bytes of shortened object names. This is the
shortest possible length needed to make names in the shortened
object name table unambiguous.
- ** 4-byte integer, recording where tables relating to this format
+ ** 8-byte integer, recording where tables relating to this format
are stored in this index file, as an offset from the beginning.
- * 4-byte offset to the trailer from the beginning of this file.
+ * 8-byte offset to the trailer from the beginning of this file.
* Zero or more additional key/value pairs (4-byte key, 4-byte
value). Only one key is supported: 'PSRC'. See the "Loose objects
and unreachable objects" section for supported values and how this
@@ -276,10 +276,14 @@ network byte order):
up to and not including the table of CRC32 values.
- Zero or more NUL bytes.
- The trailer consists of the following:
- * A copy of the 20-byte SHA-256 checksum at the end of the
+ * A copy of the full main hash checksum at the end of the
corresponding packfile.
- * 20-byte SHA-256 checksum of all of the above.
+ * Full main hash checksum of all of the above.
+
+The "full main hash" is a full-length hash of the main (not compatibility)
+algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
+a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.
Loose object index
~~~~~~~~~~~~~~~~~~
* [PATCH v3 2/9] docs: update offset order for pack index v3
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-09 21:56 ` [PATCH v3 1/9] docs: update pack index v3 format brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 3/9] docs: reflect actual double signature for tags brian m. carlson
` (7 subsequent siblings)
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
The current design of pack index v3 has items in two different orders:
sorted shortened object ID order and pack order. The shortened object
IDs and the pack index offset values are in the former order and
everything else is in the latter.
This, however, poses some problems. We have many parts of the packfile
code that expect to find out data about an object knowing only its index
in pack order. With the current design, to find the pack offset after
having looked up the index in pack order, we must then look up the full
object ID and use that to look up the shortened object ID to find the
pack offset, which is inconvenient, inefficient, and leads to poor cache
usage.
Instead, let's change the offset values to be looked up by pack order.
This works better because once we know the pack order offset, we can
find the full object name and its location in the pack with a simple
index into their respective tables. This makes many operations much
more efficient, especially with the functions we already have, and it
avoids the need for the revindex with pack index v3.
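
As a hedged illustration of how one entry in the 4-byte offset table is
interpreted (per the format text in the diff below), the following
sketch decodes a single value that has already been read as a number.
The function name decode_offset_entry and its "direct"/"large" output
are invented for this example; a real reader would be written in C and
would first convert the entry from network byte order.

```shell
#!/bin/sh
# Sketch only: classify one 4-byte offset table entry.
# decode_offset_entry is a hypothetical helper name.
decode_offset_entry () {
	if [ $(($1 & 0x80000000)) -ne 0 ]
	then
		# Most significant bit set: the low 31 bits are an index
		# into the table of 8-byte offsets.
		echo "large $(($1 & 0x7fffffff))"
	else
		# Otherwise the value itself is a 31-bit pack file offset.
		echo "direct $(($1))"
	fi
}

decode_offset_entry 4096
decode_offset_entry 0x80000002
```

With the table in pack order, the position used to fetch this entry is
the same position used for the other pack-order tables, so no detour
through the shortened object names is needed.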
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/technical/hash-function-transition.adoc | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index 274dc993d4..adb0c61e53 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -260,12 +260,10 @@ network byte order):
compressed data to be copied directly from pack to pack during
repacking without undetected data corruption.
- * A table of 4-byte offset values. For an object in the table of
- sorted shortened object names, the value at the corresponding
- index in this table indicates where that object can be found in
- the pack file. These are usually 31-bit pack file offsets, but
- large offsets are encoded as an index into the next table with the
- most significant bit set.
+ * A table of 4-byte offset values. The index of this table in pack order
+ indicates where that object can be found in the pack file. These are
+ usually 31-bit pack file offsets, but large offsets are encoded as
+ an index into the next table with the most significant bit set.
* A table of 8-byte offset entries (empty for pack files less than
2 GiB). Pack files are organized with heavily used objects toward
* [PATCH v3 3/9] docs: reflect actual double signature for tags
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
2025-10-09 21:56 ` [PATCH v3 1/9] docs: update pack index v3 format brian m. carlson
2025-10-09 21:56 ` [PATCH v3 2/9] docs: update offset order for pack index v3 brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
` (6 subsequent siblings)
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
The documentation for the hash function transition reflects the original
design where the SHA-256 signature would always be placed in a header.
However, due to a missed patch in Git 2.29, we shipped SHA-256 support
such that the signature for the current algorithm is always an in-body
signature and the opposite algorithm is always in a header. Since the
documentation is inaccurate, update it to reflect the correct
information.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
.../technical/hash-function-transition.adoc | 20 ++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index adb0c61e53..2359d7d106 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -429,17 +429,19 @@ ordinary unsigned commit.
Signed Tags
~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of an object.
-This means tags can be signed
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
- signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+This means tags can be signed using one or both algorithms.
Mergetag embedding
~~~~~~~~~~~~~~~~~~
* [PATCH v3 4/9] docs: improve ambiguous areas of pack format documentation
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (2 preceding siblings ...)
2025-10-09 21:56 ` [PATCH v3 3/9] docs: reflect actual double signature for tags brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 5/9] docs: add documentation for loose objects brian m. carlson
` (5 subsequent siblings)
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
It is fair to say that our pack and indexing code is quite complex.
Contributors who wish to work on this code or implementors of other
implementations would benefit from clear, unambiguous documentation
about how our data formats are structured and encoded and what data is
used in the computation of certain values. Unfortunately, some of this
data is missing, which leads to confusion and frustration.
Let's document some of this data to help clarify things. Specify over
what data CRC32 values are computed and also note which CRC32 algorithm
is used, since Wikipedia mentions at least four 32-bit CRC algorithms
and notes that it's possible to use different bit orderings.
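
To make the "which CRC32" point concrete, here is a small sketch that
obtains the zlib-style CRC-32 of some bytes without any extra tooling,
by reading it back from the trailer gzip writes (a 4-byte little-endian
CRC-32 followed by a 4-byte size). The function name crc32_hex is made
up, and the sketch assumes gzip, tail, od and awk are available. It
checksums arbitrary input, not an actual packed object.

```shell
#!/bin/sh
# Sketch only: print the CRC-32 (as used by zlib and gzip) of stdin.
# crc32_hex is a hypothetical helper name.
crc32_hex () {
	# The last 8 bytes of a gzip stream are CRC-32 then input size,
	# both little-endian; reverse the first four bytes for display.
	gzip -c | tail -c 8 | od -An -tx1 | awk '{ print $4 $3 $2 $1 }'
}

printf '123456789' | crc32_hex	# prints cbf43926
```

The value cbf43926 is the usual check value for this CRC-32 variant;
the POSIX cksum utility, by contrast, implements a different 32-bit CRC
and gives another result for the same input.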
In addition, note how we encode objects in the pack. One might be led
to believe that packed objects are always stored with the "<type>
<size>\0" prefix of loose objects, but that is not the case, although
for obvious reasons this data is included in the computation of the
object ID. Explain why this is for the curious reader.
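
As an illustration of the prefix mentioned above, the following sketch
computes a SHA-1 blob ID by hashing "blob <size>\0" followed by the
raw content, which is the data an object ID covers whether the object
ends up loose or packed. The function name blob_id is made up, and the
sketch assumes a sha1sum utility is installed; in a repository, git
hash-object is the real tool for this.

```shell
#!/bin/sh
# Sketch only: compute the SHA-1 object ID of a file as a blob.
# blob_id is a hypothetical helper; sha1sum is assumed to exist.
blob_id () {
	# Normalize the byte count (some wc implementations pad it).
	size=$(($(wc -c <"$1")))
	# Hash the "<type> <size>" prefix, a NUL byte, then the content.
	{ printf 'blob %d\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmpfile=$(mktemp) &&
printf 'hello\n' >"$tmpfile" &&
blob_id "$tmpfile"	# prints ce013625030ba8dba906f756967f9e9ca394464a
rm -f "$tmpfile"
```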
Finally, indicate what the size field of the packed object represents.
Otherwise, a reader might think that the size of a delta is the size of
the full object or that it might contain the offset or object ID,
neither of which is the case. Explain clearly, however, that the
values represent uncompressed sizes to avoid confusion.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/gitformat-pack.adoc | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/Documentation/gitformat-pack.adoc b/Documentation/gitformat-pack.adoc
index d6ae229be5..1b4db4aa61 100644
--- a/Documentation/gitformat-pack.adoc
+++ b/Documentation/gitformat-pack.adoc
@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.
+CRC32 checksums are always computed over the entire packed object, including
+the header (n-byte type and length); the base object name or offset, if any;
+and the entire compressed object. The CRC32 algorithm used is that of zlib.
+
== pack-*.pack files have the following format:
- A header appears at the beginning and consists of the following:
@@ -80,6 +84,16 @@ Valid object types are:
Type 5 is reserved for future expansion. Type 0 is invalid.
+=== Object encoding
+
+Unlike loose objects, packed objects do not have a prefix containing the type,
+size, and a NUL byte. These are not necessary because they can be determined by
+the n-byte type and length that prefixes the data and so they are omitted from
+the compressed and deltified data.
+
+The computation of the object ID still uses this prefix by reconstructing it
+from the type and length as needed.
+
=== Size encoding
This document uses the following "size encoding" of non-negative
@@ -92,6 +106,11 @@ values are more significant.
This size encoding should not be confused with the "offset encoding",
which is also used in this document.
+When encoding the size of an undeltified object in a pack, the size is that of
+the uncompressed raw object. For deltified objects, it is the size of the
+uncompressed delta. The base object name or offset is not included in the size
+computation.
+
=== Deltified representation
Conceptually there are only four object types: commit, tree, tag and
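
To illustrate the wording this patch adds, here is a minimal Python sketch of how an undeltified blob could be laid out in a pack. It assumes a SHA-1 repository and uses the blob type number 3 from gitformat-pack. The helper names `encode_type_and_size` and `pack_entry` are hypothetical and do not come from Git's sources. The sketch shows three points from the patch: the header carries the uncompressed size, the `<type> <size>\0` prefix is not stored but is rebuilt for the object ID, and the CRC32 covers the header plus the compressed data.

```python
import hashlib
import zlib

OBJ_BLOB = 3  # blob type number as listed in gitformat-pack


def encode_type_and_size(obj_type, size):
    """Encode the n-byte type and length header of a packed object."""
    # First byte: bit 7 is the continuation flag, bits 6-4 hold the type,
    # bits 3-0 hold the four least significant bits of the size.
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # set the flag: more size bytes follow
        byte = size & 0x7F       # next 7 bits, least significant first
        size >>= 7
    out.append(byte)
    return bytes(out)


def pack_entry(data):
    """Return (entry, crc32, object_id) for an undeltified blob (sketch)."""
    # The size in the header is that of the uncompressed raw object.
    header = encode_type_and_size(OBJ_BLOB, len(data))
    # No "<type> <size>\0" prefix is stored inside the compressed data.
    entry = header + zlib.compress(data)
    # The CRC32 covers the header and the entire compressed object.
    crc = zlib.crc32(entry)
    # The object ID still uses the prefix, reconstructed from type and size.
    oid = hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
    return entry, crc, oid
```

A deltified object would additionally carry the base object name or offset between the header and the compressed delta. That field is covered by the CRC32 but not counted in the encoded size.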
* [PATCH v3 5/9] docs: add documentation for loose objects
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (3 preceding siblings ...)
2025-10-09 21:56 ` [PATCH v3 4/9] docs: improve ambiguous areas of pack format documentation brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 6/9] rev-parse: allow printing compatibility hash brian m. carlson
` (4 subsequent siblings)
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
We currently have no documentation for how loose objects are stored.
Let's add some here so it's easy for people to understand how they
work.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/Makefile | 1 +
Documentation/gitformat-loose.adoc | 53 ++++++++++++++++++++++++++++++
Documentation/meson.build | 1 +
3 files changed, 55 insertions(+)
create mode 100644 Documentation/gitformat-loose.adoc
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6fb83d0c6e..e1d38fbfe6 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
MAN5_TXT += gitformat-chunk.adoc
MAN5_TXT += gitformat-commit-graph.adoc
MAN5_TXT += gitformat-index.adoc
+MAN5_TXT += gitformat-loose.adoc
MAN5_TXT += gitformat-pack.adoc
MAN5_TXT += gitformat-signature.adoc
MAN5_TXT += githooks.adoc
diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc
new file mode 100644
index 0000000000..947993663e
--- /dev/null
+++ b/Documentation/gitformat-loose.adoc
@@ -0,0 +1,53 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git stores individual objects, where every object is
+written as a separate file.
+
+Over the lifetime of a repository, objects are usually written as loose objects
+initially. Eventually, these loose objects will be compacted into packfiles
+via repository maintenance to improve disk space usage and speed up the lookup
+of these objects.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag` and `size` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name. This is done to shard the data and avoid too
+many files being in one directory, since some file systems perform poorly with
+many items in a directory.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/meson.build b/Documentation/meson.build
index 41f43e0336..64f70ac724 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -172,6 +172,7 @@ manpages = {
'gitformat-chunk.adoc' : 5,
'gitformat-commit-graph.adoc' : 5,
'gitformat-index.adoc' : 5,
+ 'gitformat-loose.adoc' : 5,
'gitformat-pack.adoc' : 5,
'gitformat-signature.adoc' : 5,
'githooks.adoc' : 5,
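
The loose object layout documented in this patch can be sketched in a few lines of Python. This is an illustration only: the function name `loose_object` is hypothetical, and the compressed bytes it returns need not match what Git writes byte for byte, since that depends on the zlib settings in use.

```python
import hashlib
import zlib


def loose_object(obj_type, data, algo="sha256"):
    """Return (object_id, relative_path, compressed_bytes) for a loose object."""
    # Prefix: "<type> <size>\0", with the size as a decimal ASCII integer.
    raw = "{} {}".format(obj_type, len(data)).encode("ascii") + b"\0" + data
    # The object ID is the hash of the uncompressed prefix plus data.
    oid = hashlib.new(algo, raw).hexdigest()
    # The first two hex characters name the directory, the rest the file.
    path = "objects/{}/{}".format(oid[:2], oid[2:])
    # Prefix and data together are compressed with zlib.
    return oid, path, zlib.compress(raw)
```

Calling `loose_object("tree", b"")` reproduces the SHA-256 empty tree ID quoted in the new document, and passing `algo="sha1"` gives the corresponding SHA-1 value.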
* [PATCH v3 6/9] rev-parse: allow printing compatibility hash
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (4 preceding siblings ...)
2025-10-09 21:56 ` [PATCH v3 5/9] docs: add documentation for loose objects brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
` (3 subsequent siblings)
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
Right now, we have a way to print the storage hash, the input hash, and
the output hash, but we lack a way to print the compatibility hash. Add
a new type to --show-object-format, compat, which prints this value.
If no compatibility hash exists, simply print a newline. This is
important to allow users to use multiple options at once while still
getting unambiguous output.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/git-rev-parse.adoc | 11 ++++++-----
builtin/rev-parse.c | 11 ++++++++++-
t/t1500-rev-parse.sh | 34 ++++++++++++++++++++++++++++++++
3 files changed, 50 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-rev-parse.adoc b/Documentation/git-rev-parse.adoc
index cc32b4b4f0..465ae3e29d 100644
--- a/Documentation/git-rev-parse.adoc
+++ b/Documentation/git-rev-parse.adoc
@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
path of the current directory relative to the top-level
directory.
---show-object-format[=(storage|input|output)]::
- Show the object format (hash algorithm) used for the repository
- for storage inside the `.git` directory, input, or output. For
- input, multiple algorithms may be printed, space-separated.
- If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+ Show the object format (hash algorithm) used for the repository for storage
+ inside the `.git` directory, input, output, or compatibility. For input,
+ multiple algorithms may be printed, space-separated. If `compat` is
+ requested and no compatibility algorithm is enabled, prints an empty line. If
+ not specified, the default is "storage".
--show-ref-format::
Show the reference storage format used for the repository.
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 44ff1b8342..187b7e8be9 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -1108,11 +1108,20 @@ int cmd_rev_parse(int argc,
const char *val = arg ? arg : "storage";
if (strcmp(val, "storage") &&
+ strcmp(val, "compat") &&
strcmp(val, "input") &&
strcmp(val, "output"))
die(_("unknown mode for --show-object-format: %s"),
arg);
- puts(the_hash_algo->name);
+
+ if (!strcmp(val, "compat")) {
+ if (the_repository->compat_hash_algo)
+ puts(the_repository->compat_hash_algo->name);
+ else
+ putchar('\n');
+ } else {
+ puts(the_hash_algo->name);
+ }
continue;
}
if (!strcmp(arg, "--show-ref-format")) {
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index 58a4583088..7739ab611b 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -207,6 +207,40 @@ test_expect_success 'rev-parse --show-object-format in repo' '
grep "unknown mode for --show-object-format: squeamish-ossifrage" err
'
+
+test_expect_success 'rev-parse --show-object-format in repo with compat mode' '
+ mkdir repo &&
+ (
+ sane_unset GIT_DEFAULT_HASH &&
+ cd repo &&
+ git init --object-format=sha256 &&
+ git config extensions.compatobjectformat sha1 &&
+ echo sha256 >expect &&
+ git rev-parse --show-object-format >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=storage >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=input >actual &&
+ test_cmp expect actual &&
+ git rev-parse --show-object-format=output >actual &&
+ test_cmp expect actual &&
+ echo sha1 >expect &&
+ git rev-parse --show-object-format=compat >actual &&
+ test_cmp expect actual &&
+ test_must_fail git rev-parse --show-object-format=squeamish-ossifrage 2>err &&
+ grep "unknown mode for --show-object-format: squeamish-ossifrage" err
+ ) &&
+ mkdir repo2 &&
+ (
+ sane_unset GIT_DEFAULT_HASH &&
+ cd repo2 &&
+ git init --object-format=sha256 &&
+ echo >expect &&
+ git rev-parse --show-object-format=compat >actual &&
+ test_cmp expect actual
+ )
+'
+
test_expect_success 'rev-parse --show-ref-format' '
test_detect_ref_format >expect &&
git rev-parse --show-ref-format >actual &&
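
The reason for printing an empty line when no compatibility hash exists can be shown with a small Python sketch. It assumes that each `--show-object-format` option given on one command line prints exactly one line, as the `puts()` and `putchar('\n')` calls in the patch suggest. The function `parse_object_formats` is a hypothetical consumer and works on captured output rather than invoking Git.

```python
def parse_object_formats(output, modes):
    """Map each requested mode to the line printed for it (None if empty)."""
    lines = output.split("\n")
    # The final newline leaves one trailing empty element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # One line per option keeps the output positional and unambiguous.
    if len(lines) != len(modes):
        raise ValueError("expected one output line per requested mode")
    return {mode: (line or None) for mode, line in zip(modes, lines)}
```

If the `compat` mode printed nothing at all, a caller asking for `compat` followed by `storage` could not tell which mode the single line of output belonged to.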
* [PATCH v3 7/9] fsck: consider gpgsig headers expected in tags
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (5 preceding siblings ...)
2025-10-09 21:56 ` [PATCH v3 6/9] rev-parse: allow printing compatibility hash brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 8/9] t: allow specifying compatibility hash brian m. carlson
` (2 subsequent siblings)
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
When we're creating a tag, we want to make sure that gpgsig and
gpgsig-sha256 headers are allowed in the tag. The default fsck
behavior is to ignore the fact that they're left over, but some of our
tests enable strict checking, which flags them nonetheless. Add
improved checking for these headers as well as documentation and several
tests.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Documentation/fsck-msgids.adoc | 6 ++++
fsck.c | 18 ++++++++++++
fsck.h | 2 ++
t/t1450-fsck.sh | 54 ++++++++++++++++++++++++++++++++++
4 files changed, 80 insertions(+)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..52d9a8a811 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -10,6 +10,12 @@
`badFilemode`::
(INFO) A tree contains a bad filemode entry.
+`badGpgsig`::
+ (ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
+
+`badHeaderContinuation`::
+ (ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
+
`badName`::
(ERROR) An author/committer name is empty.
diff --git a/fsck.c b/fsck.c
index 171b424dd5..341e100d24 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1067,6 +1067,24 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
else
ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
+ if (buffer < buffer_end && (skip_prefix(buffer, "gpgsig ", &buffer) || skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
+ eol = memchr(buffer, '\n', buffer_end - buffer);
+ if (!eol) {
+ ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_GPGSIG, "invalid format - unexpected end after 'gpgsig' or 'gpgsig-sha256' line");
+ goto done;
+ }
+ buffer = eol + 1;
+
+ while (buffer < buffer_end && starts_with(buffer, " ")) {
+ eol = memchr(buffer, '\n', buffer_end - buffer);
+ if (!eol) {
+ ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_HEADER_CONTINUATION, "invalid format - unexpected end in 'gpgsig' or 'gpgsig-sha256' continuation line");
+ goto done;
+ }
+ buffer = eol + 1;
+ }
+ }
+
if (buffer < buffer_end && !starts_with(buffer, "\n")) {
/*
* The verify_headers() check will allow
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..c26616d7eb 100644
--- a/fsck.h
+++ b/fsck.h
@@ -25,9 +25,11 @@ enum fsck_msg_type {
FUNC(NUL_IN_HEADER, FATAL) \
FUNC(UNTERMINATED_HEADER, FATAL) \
/* errors */ \
+ FUNC(BAD_HEADER_CONTINUATION, ERROR) \
FUNC(BAD_DATE, ERROR) \
FUNC(BAD_DATE_OVERFLOW, ERROR) \
FUNC(BAD_EMAIL, ERROR) \
+ FUNC(BAD_GPGSIG, ERROR) \
FUNC(BAD_NAME, ERROR) \
FUNC(BAD_OBJECT_SHA1, ERROR) \
FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5ae86c42be..c4b651c2dc 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -454,6 +454,60 @@ test_expect_success 'tag with NUL in header' '
test_grep "error in tag $tag.*unterminated header: NUL at offset" out
'
+test_expect_success 'tag accepts gpgsig header even if not validly signed' '
+ test_oid_cache <<-\EOF &&
+ header sha1:gpgsig-sha256
+ header sha256:gpgsig
+ EOF
+ header=$(test_oid header) &&
+ sha=$(git rev-parse HEAD) &&
+ cat >good-tag <<-EOF &&
+ object $sha
+ type commit
+ tag good
+ tagger T A Gger <tagger@example.com> 1234567890 -0000
+ $header -----BEGIN PGP SIGNATURE-----
+ Not a valid signature
+ -----END PGP SIGNATURE-----
+
+ This is a good tag.
+ EOF
+
+ tag=$(git hash-object --literally -t tag -w --stdin <good-tag) &&
+ test_when_finished "remove_object $tag" &&
+ git update-ref refs/tags/good $tag &&
+ test_when_finished "git update-ref -d refs/tags/good" &&
+ git -c fsck.extraHeaderEntry=error fsck --tags
+'
+
+test_expect_success 'tag rejects invalid headers' '
+ test_oid_cache <<-\EOF &&
+ header sha1:gpgsig-sha256
+ header sha256:gpgsig
+ EOF
+ header=$(test_oid header) &&
+ sha=$(git rev-parse HEAD) &&
+ cat >bad-tag <<-EOF &&
+ object $sha
+ type commit
+ tag good
+ tagger T A Gger <tagger@example.com> 1234567890 -0000
+ $header -----BEGIN PGP SIGNATURE-----
+ Not a valid signature
+ -----END PGP SIGNATURE-----
+ junk
+
+ This is a bad tag with junk at the end of the headers.
+ EOF
+
+ tag=$(git hash-object --literally -t tag -w --stdin <bad-tag) &&
+ test_when_finished "remove_object $tag" &&
+ git update-ref refs/tags/bad $tag &&
+ test_when_finished "git update-ref -d refs/tags/bad" &&
+ test_must_fail git -c fsck.extraHeaderEntry=error fsck --tags 2>out &&
+ test_grep "error in tag $tag.*invalid format - extra header" out
+'
+
test_expect_success 'cleaned up' '
git fsck >actual 2>&1 &&
test_must_be_empty actual
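
The header handling added to `fsck_tag_standalone()` can be restated as a small Python sketch for readers who want to experiment with it outside of Git. The function name `skip_signature_header` is hypothetical. The returned strings reuse the message IDs documented by this patch, and the sketch covers only the new block, not the rest of the tag checks.

```python
def skip_signature_header(buf, pos):
    """Skip a gpgsig or gpgsig-sha256 header and its continuation lines.

    Returns (new_pos, error), where error is None or an fsck message ID.
    """
    for prefix in (b"gpgsig ", b"gpgsig-sha256 "):
        if buf.startswith(prefix, pos):
            pos += len(prefix)
            break
    else:
        return pos, None  # no signature header here; nothing to skip

    eol = buf.find(b"\n", pos)
    if eol < 0:
        return pos, "badGpgsig"  # header line is truncated
    pos = eol + 1

    # Continuation lines start with a single space.
    while buf.startswith(b" ", pos):
        eol = buf.find(b"\n", pos)
        if eol < 0:
            return pos, "badHeaderContinuation"
        pos = eol + 1
    return pos, None
```

After a successful skip, the position points at whatever follows the signature. For a well-formed tag that is the blank line before the message. A stray line such as `junk` is left in place for the existing extra-header check to report, which is what the second new test exercises.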
* [PATCH v3 8/9] t: allow specifying compatibility hash
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (6 preceding siblings ...)
2025-10-09 21:56 ` [PATCH v3 7/9] fsck: consider gpgsig headers expected in tags brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-09 21:56 ` [PATCH v3 9/9] t1010: use BROKEN_OBJECTS prerequisite brian m. carlson
2025-10-13 15:24 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 Junio C Hamano
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
We want to specify a compatibility hash for testing interactions for
SHA-256 repositories where we have SHA-1 compatibility enabled. Allow
the user to specify this scenario in the test suite by setting
GIT_TEST_DEFAULT_HASH to "sha256:sha1".
Note that this will get passed into GIT_DEFAULT_HASH, which Git itself
does not presently support. However, we will support this in a future
commit.
Since we'll now want to know the value for a specific version, let's add
the ability to specify either the storage hash (in this case, SHA-256)
or the compatibility hash (SHA-1). We use a different value for the
compatibility hash that will be enabled for all repositories
(test_repo_compat_hash_algo) versus the one that is used individually in
some tests (test_compat_hash_algo), since we want to still run those
individual tests without requiring that the testsuite be run fully in a
compatibility mode.
In some cases, we'll need to adjust our test suite to work in a proper
way with a compatibility hash. For example, in such a case, we'll only
use pack index v3, since v1 and v2 lack support for multiple algorithms.
Since we won't want to write those older formats, we'll need to skip
tests that do so. Let's add a COMPAT_HASH prerequisite for this
purpose.
Finally, in this scenario, we can no longer rely on having broken
objects work since we lack compatibility mappings to rewrite objects in
the repository. Add a prerequisite, BROKEN_OBJECTS, that we define in
terms of COMPAT_HASH and that checks whether creating deliberately broken
objects is possible, so that we can disable these tests if not.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
t/test-lib-functions.sh | 9 +++++++--
t/test-lib.sh | 13 +++++++++++++
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index a28de7b19b..52d7759bf5 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1708,11 +1708,16 @@ test_set_hash () {
# Detect the hash algorithm in use.
test_detect_hash () {
case "${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}" in
- "sha256")
+ *:*)
+ test_hash_algo="${GIT_TEST_DEFAULT_HASH%%:*}"
+ test_compat_hash_algo="${GIT_TEST_DEFAULT_HASH##*:}"
+ test_repo_compat_hash_algo="$test_compat_hash_algo"
+ ;;
+ sha256)
test_hash_algo=sha256
test_compat_hash_algo=sha1
;;
- *)
+ sha1)
test_hash_algo=sha1
test_compat_hash_algo=sha256
;;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 621cd31ae1..9eb79324ee 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1917,6 +1917,19 @@ test_lazy_prereq DEFAULT_HASH_ALGORITHM '
test_lazy_prereq DEFAULT_REPO_FORMAT '
test_have_prereq SHA1,REFFILES
'
+# BROKEN_OBJECTS is a test whether we can write deliberately broken objects and
+# expect them to work. When running using SHA-256 mode with SHA-1
+# compatibility, we cannot write such objects because there's no SHA-1
+# compatibility value for a nonexistent object.
+test_lazy_prereq BROKEN_OBJECTS '
+ ! test_have_prereq COMPAT_HASH
+'
+
+# COMPAT_HASH is a test if we're operating in a repository with SHA-256 with
+# SHA-1 compatibility.
+test_lazy_prereq COMPAT_HASH '
+ test -n "$test_repo_compat_hash_algo"
+'
# Ensure that no test accidentally triggers a Git command
# that runs the actual maintenance scheduler, affecting a user's
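
The way this patch interprets `GIT_TEST_DEFAULT_HASH` and derives the two prerequisites can be modelled in Python as follows. This is a sketch only: the function names are hypothetical, the dictionary keys mirror the shell variables set by `test_detect_hash`, and raising `ValueError` for unrecognized values is a choice made here rather than something the shell `case` statement does.

```python
def detect_hash(value):
    """Model the case statement in test_detect_hash for a given setting."""
    if ":" in value:
        storage = value.split(":", 1)[0]   # like ${value%%:*}
        compat = value.rsplit(":", 1)[1]   # like ${value##*:}
        return {
            "test_hash_algo": storage,
            "test_compat_hash_algo": compat,
            # Only the "storage:compat" form enables compatibility repo-wide.
            "test_repo_compat_hash_algo": compat,
        }
    defaults = {"sha256": "sha1", "sha1": "sha256"}
    if value not in defaults:
        raise ValueError("unrecognized hash setting: " + value)
    return {
        "test_hash_algo": value,
        "test_compat_hash_algo": defaults[value],
        "test_repo_compat_hash_algo": "",
    }


def have_compat_hash(settings):
    """COMPAT_HASH: a repository-wide compatibility hash is configured."""
    return settings["test_repo_compat_hash_algo"] != ""


def have_broken_objects(settings):
    """BROKEN_OBJECTS: deliberately broken objects can be written."""
    return not have_compat_hash(settings)
```

With `sha256:sha1`, the model reports COMPAT_HASH as satisfied and BROKEN_OBJECTS as unsatisfied. With a plain `sha256` or `sha1`, tests that use `test_compat_hash_algo` individually still have a value to work with, while BROKEN_OBJECTS remains available.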
* [PATCH v3 9/9] t1010: use BROKEN_OBJECTS prerequisite
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (7 preceding siblings ...)
2025-10-09 21:56 ` [PATCH v3 8/9] t: allow specifying compatibility hash brian m. carlson
@ 2025-10-09 21:56 ` brian m. carlson
2025-10-13 15:24 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 Junio C Hamano
9 siblings, 0 replies; 67+ messages in thread
From: brian m. carlson @ 2025-10-09 21:56 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
When hash compatibility mode is enabled, we cannot write broken objects
because they cannot be mapped into the other hash algorithm. Use the
BROKEN_OBJECTS prerequisite to disable these tests and the writing of
broken objects in this mode.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
t/t1010-mktree.sh | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index e9973f7494..312fe6717a 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -11,10 +11,13 @@ test_expect_success setup '
git add "$d" || return 1
done &&
echo zero >one &&
- git update-index --add --info-only one &&
- git write-tree --missing-ok >tree.missing &&
- git ls-tree $(cat tree.missing) >top.missing &&
- git ls-tree -r $(cat tree.missing) >all.missing &&
+ if test_have_prereq BROKEN_OBJECTS
+ then
+ git update-index --add --info-only one &&
+ git write-tree --missing-ok >tree.missing &&
+ git ls-tree $(cat tree.missing) >top.missing &&
+ git ls-tree -r $(cat tree.missing) >all.missing
+ fi &&
echo one >one &&
git add one &&
git write-tree >tree &&
@@ -53,7 +56,7 @@ test_expect_success 'ls-tree output in wrong order given to mktree (2)' '
test_cmp tree.withsub actual
'
-test_expect_success 'allow missing object with --missing' '
+test_expect_success BROKEN_OBJECTS 'allow missing object with --missing' '
git mktree --missing <top.missing >actual &&
test_cmp tree.missing actual
'
* Re: [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1
2025-10-09 21:56 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 brian m. carlson
` (8 preceding siblings ...)
2025-10-09 21:56 ` [PATCH v3 9/9] t1010: use BROKEN_OBJECTS prerequisite brian m. carlson
@ 2025-10-13 15:24 ` Junio C Hamano
2025-10-13 16:34 ` brian m. carlson
9 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2025-10-13 15:24 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Patrick Steinhardt
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> brian m. carlson (9):
> docs: update pack index v3 format
> docs: update offset order for pack index v3
> docs: reflect actual double signature for tags
> docs: improve ambiguous areas of pack format documentation
> docs: add documentation for loose objects
> rev-parse: allow printing compatibility hash
> fsck: consider gpgsig headers expected in tags
> t: allow specifying compatibility hash
> t1010: use BROKEN_OBJECTS prerequisite
The topic has been quiet over the weekend. Shall I mark it for
'next' now?
* Re: [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1
2025-10-13 15:24 ` [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1 Junio C Hamano
@ 2025-10-13 16:34 ` brian m. carlson
2025-10-14 5:53 ` Patrick Steinhardt
0 siblings, 1 reply; 67+ messages in thread
From: brian m. carlson @ 2025-10-13 16:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Patrick Steinhardt
On 2025-10-13 at 15:24:55, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > brian m. carlson (9):
> > docs: update pack index v3 format
> > docs: update offset order for pack index v3
> > docs: reflect actual double signature for tags
> > docs: improve ambiguous areas of pack format documentation
> > docs: add documentation for loose objects
> > rev-parse: allow printing compatibility hash
> > fsck: consider gpgsig headers expected in tags
> > t: allow specifying compatibility hash
> > t1010: use BROKEN_OBJECTS prerequisite
>
> The topic has been quiet over the weekend. Shall I mark it for
> 'next' now?
Yes, I think it's ready. The only difference between v2 and v3 was your
comment on the text and there were no other comments.
Of course, if Patrick or anyone else would like more time to review, I'm
happy to wait.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
* Re: [PATCH v3 0/9] SHA-1/SHA-256 interoperability, part 1
2025-10-13 16:34 ` brian m. carlson
@ 2025-10-14 5:53 ` Patrick Steinhardt
0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2025-10-14 5:53 UTC (permalink / raw)
To: brian m. carlson, Junio C Hamano, git
On Mon, Oct 13, 2025 at 04:34:47PM +0000, brian m. carlson wrote:
> On 2025-10-13 at 15:24:55, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> > > brian m. carlson (9):
> > > docs: update pack index v3 format
> > > docs: update offset order for pack index v3
> > > docs: reflect actual double signature for tags
> > > docs: improve ambiguous areas of pack format documentation
> > > docs: add documentation for loose objects
> > > rev-parse: allow printing compatibility hash
> > > fsck: consider gpgsig headers expected in tags
> > > t: allow specifying compatibility hash
> > > t1010: use BROKEN_OBJECTS prerequisite
> >
> > The topic has been quiet over the weekend. Shall I mark it for
> > 'next' now?
>
> Yes, I think it's ready. The only difference between v2 and v3 was your
> comment on the text and there were no other comments.
>
> Of course, if Patrick or anyone else would like more time to review, I'm
> happy to wait.
I just had another read through the series and am happy with this
version. Thanks!
Patrick