git.vger.kernel.org archive mirror
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>,
	Ezekiel Newren <ezekielnewren@gmail.com>
Subject: [PATCH v2 00/15] SHA-1/SHA-256 interoperability, part 2
Date: Mon, 17 Nov 2025 22:16:06 +0000
Message-ID: <20251117221621.2863243-1-sandals@crustytoothpaste.net>
In-Reply-To: <20251027004404.2152927-1-sandals@crustytoothpaste.net>

This is the second part of the SHA-1/SHA-256 interoperability work.  It
introduces our first major use of Rust code to implement an object map
format as well as preparatory work to make that happen, including
changing types to more Rust-friendly ones.  Since Rust will be required
for the interoperability work, we require it in the test suite as well.

We also verify that our object ID algorithm is valid when looking up
data in the hash map since the Rust code intentionally has no knowledge
about global mutable state like the_repository and so cannot default to
the main hash algorithm when we've zero-initialized a struct object_id.
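
To illustrate the point above, here is a minimal sketch (not the code
from this series) of a Rust-side object ID that refuses to guess an
algorithm.  The constant values, the `ObjectID` layout, the
`InvalidAlgorithm` error type, and the `raw_len` and `as_slice` helpers
are all assumptions made for the example; the only thing taken from the
text is that a zero-initialized struct carries no usable algorithm and
that Rust has no global default to fall back on.

```rust
/// Numeric algorithm identifiers.  These values are assumed to mirror the
/// C-side constants; they are illustrative, not copied from the series.
pub const GIT_HASH_UNKNOWN: u32 = 0;
pub const GIT_HASH_SHA1: u32 = 1;
pub const GIT_HASH_SHA256: u32 = 2;

/// Size of the largest supported raw hash (SHA-256), in bytes.
pub const GIT_MAX_RAWSZ: usize = 32;

/// Hypothetical Rust view of an object ID: a fixed-size buffer plus the
/// algorithm number stored as a plain u32.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectID {
    pub hash: [u8; GIT_MAX_RAWSZ],
    pub algo: u32,
}

/// Error returned when an object ID carries no valid algorithm.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidAlgorithm(pub u32);

/// Returns the raw hash length for a known algorithm number.
pub fn raw_len(algo: u32) -> Result<usize, InvalidAlgorithm> {
    match algo {
        GIT_HASH_SHA1 => Ok(20),
        GIT_HASH_SHA256 => Ok(32),
        other => Err(InvalidAlgorithm(other)),
    }
}

impl ObjectID {
    /// A zero-initialized ID, as C code might produce with memset().
    pub fn zeroed() -> ObjectID {
        ObjectID { hash: [0u8; GIT_MAX_RAWSZ], algo: GIT_HASH_UNKNOWN }
    }

    /// Returns the meaningful bytes of the hash.  There is no global
    /// default to consult, so an unset algorithm is reported as an error
    /// instead of being silently replaced.
    pub fn as_slice(&self) -> Result<&[u8], InvalidAlgorithm> {
        let len = raw_len(self.algo)?;
        Ok(&self.hash[..len])
    }
}
```

With this shape, a caller that forgot to set the algorithm gets an
error at the lookup boundary rather than a lookup under the wrong hash
length.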

The advantage of this Rust code is that it is comprehensively covered by
unit tests.  We can serialize our object map, verify that we can load it
again, and then check, for example, whether certain object IDs are found
in the map and mapped correctly.  We can also test our slightly subtle
custom binary search code effectively and be confident that it works,
since Rust doesn't provide a way to binary search slices of variable
length.
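
As a rough sketch of what such a search can look like (again, not the
code from this series), the function below searches a sorted, flat byte
buffer whose record length is only known at run time, which is one
reading of "slices of variable length" here.  The name
`search_fixed_records` and its parameters are hypothetical; the return
convention deliberately mirrors the standard library's
`slice::binary_search`.

```rust
use std::cmp::Ordering;

/// Binary search `table`, a sorted run of `entry_len`-byte records, for
/// `key`.  Returns Ok(index) when the record is present and
/// Err(insertion_index) when it is not, like slice::binary_search.
pub fn search_fixed_records(
    table: &[u8],
    entry_len: usize,
    key: &[u8],
) -> Result<usize, usize> {
    assert!(entry_len > 0, "records must not be empty");
    assert_eq!(key.len(), entry_len, "key must be exactly one record long");
    assert_eq!(table.len() % entry_len, 0, "table must hold whole records");

    // Invariant: every record before `lo` is less than the key and every
    // record at or after `hi` is greater than it.
    let mut lo = 0usize;
    let mut hi = table.len() / entry_len;
    while lo < hi {
        // Written this way so the midpoint cannot overflow.
        let mid = lo + (hi - lo) / 2;
        let entry = &table[mid * entry_len..(mid + 1) * entry_len];
        // Byte slices compare lexicographically, which matches the
        // ordering of raw object IDs.
        match entry.cmp(key) {
            Ordering::Equal => return Ok(mid),
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
        }
    }
    Err(lo)
}
```

The edge cases that make this kind of code subtle (an empty table, a
key smaller than the first record, a key larger than the last) are easy
to pin down in unit tests, which is the benefit described above.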

I have opted not to use an enum type for our hash algorithm and have
preserved the use of uint32_t from v1.  A C enum type would not map
one-to-one with the Rust type (since the C version would use
GIT_HASH_UNKNOWN for unknown values and Rust would use None instead), so
to avoid problems as we generate more of the integration code with
bindgen and cbindgen, I've chosen to leave it as it is.
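
The mismatch can be sketched as follows.  This is an illustration under
assumptions, not the series' API: the `HashAlgorithm` enum, its variant
names, and the `from_u32`/`to_u32` helpers are hypothetical, and the
numeric values are assumed to match the C constants (with 0 standing in
for `GIT_HASH_UNKNOWN`).

```rust
/// Hypothetical Rust-side enum.  It has no "unknown" variant, so every
/// value of this type names a real algorithm.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HashAlgorithm {
    SHA1 = 1,
    SHA256 = 2,
}

impl HashAlgorithm {
    /// Converts the u32 used at the C boundary into the Rust type.  The
    /// C "unknown" value (assumed to be 0) and any unrecognized number
    /// both become None.
    pub fn from_u32(algo: u32) -> Option<HashAlgorithm> {
        match algo {
            1 => Some(HashAlgorithm::SHA1),
            2 => Some(HashAlgorithm::SHA256),
            _ => None,
        }
    }

    /// Converts back to the u32 representation, mapping None to 0.
    pub fn to_u32(algo: Option<HashAlgorithm>) -> u32 {
        algo.map_or(0, |a| a as u32)
    }
}
```

Because the C side needs three states and the Rust enum has two plus
`None`, a generated binding could not treat the two types as the same
thing; keeping a plain `uint32_t` in the shared struct and converting
at the boundary sidesteps that.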

Changes since v1:

* Use `MAYBE_UNUSED` instead of casting.
* Explain reason for `ObjectID` structure.
* Switch to `Result` in hash algorithm abstraction.
* Add some additional helpers to `ObjectID`.
* Rename function to `hash_algo_ptr_by_number`.
* Switch to `xmalloc`.
* Fix `build.rs` to use syntax compatible with Rust 1.63.
* Remove unneeded libraries from `build.rs`.
* Improve Rust documentation.
* Explain that safe hashing is about untrusted data, not memory safety.
* Add a trait for hashing to allow for future unsafe (trusted data) hashing.
* Rename `Hasher` to `CryptoHasher`.
* Remove description of legacy loose object map.
* Rename loose object map to object map.
* Update documentation for object map to be clearer about padding, alignment, and endianness.
* Explain which hash algorithm is used in object map.
* Remove mention of chunks in object map in favour of generic "additional data".
* Fix indentation in object map documentation.
* Generally clarify object map documentation.
* Fix clippy warnings in Rust code.

brian m. carlson (15):
  repository: require Rust support for interoperability
  conversion: don't crash when no destination algo
  hash: use uint32_t for object_id algorithm
  rust: add a ObjectID struct
  rust: add a hash algorithm abstraction
  hash: add a function to look up hash algo structs
  rust: add additional helpers for ObjectID
  csum-file: define hashwrite's count as a uint32_t
  write-or-die: add an fsync component for the object map
  hash: expose hash context functions to Rust
  rust: add a build.rs script for tests
  rust: add functionality to hash an object
  rust: add a new binary object map format
  rust: add a small wrapper around the hashfile code
  object-file-convert: always make sure object ID algo is valid

 Documentation/gitformat-loose.adoc |  78 +++
 Makefile                           |   5 +-
 build.rs                           |  17 +
 csum-file.c                        |   2 +-
 csum-file.h                        |   2 +-
 hash.c                             |  48 +-
 hash.h                             |  38 +-
 object-file-convert.c              |  14 +-
 oidtree.c                          |   2 +-
 repository.c                       |  12 +-
 repository.h                       |   4 +-
 serve.c                            |   2 +-
 src/csum_file.rs                   |  81 +++
 src/hash.rs                        | 466 +++++++++++++++
 src/lib.rs                         |   3 +
 src/loose.rs                       | 913 +++++++++++++++++++++++++++++
 src/meson.build                    |   3 +
 t/t1006-cat-file.sh                |  82 ++-
 t/t1016-compatObjectFormat.sh      |   6 +
 t/t1500-rev-parse.sh               |   2 +-
 t/t9305-fast-import-signatures.sh  |   4 +-
 t/t9350-fast-export.sh             |   4 +-
 t/test-lib.sh                      |   4 +
 write-or-die.h                     |   4 +-
 24 files changed, 1722 insertions(+), 74 deletions(-)
 create mode 100644 build.rs
 create mode 100644 src/csum_file.rs
 create mode 100644 src/hash.rs
 create mode 100644 src/loose.rs



Thread overview: 101+ messages
2025-10-27  0:43 [PATCH 00/14] SHA-1/SHA-256 interoperability, part 2 brian m. carlson
2025-10-27  0:43 ` [PATCH 01/14] repository: require Rust support for interoperability brian m. carlson
2025-10-28  9:16   ` Patrick Steinhardt
2025-10-27  0:43 ` [PATCH 02/14] conversion: don't crash when no destination algo brian m. carlson
2025-10-27  0:43 ` [PATCH 03/14] hash: use uint32_t for object_id algorithm brian m. carlson
2025-10-28  9:16   ` Patrick Steinhardt
2025-10-28 18:28     ` Ezekiel Newren
2025-10-28 19:33     ` Junio C Hamano
2025-10-28 19:58       ` Ezekiel Newren
2025-10-28 20:20         ` Junio C Hamano
2025-10-30  0:23       ` brian m. carlson
2025-10-30  1:58         ` Collin Funk
2025-11-03  1:30           ` brian m. carlson
2025-10-29  0:33     ` brian m. carlson
2025-10-29  9:07       ` Patrick Steinhardt
2025-10-27  0:43 ` [PATCH 04/14] rust: add a ObjectID struct brian m. carlson
2025-10-28  9:17   ` Patrick Steinhardt
2025-10-28 19:07     ` Ezekiel Newren
2025-10-29  0:42       ` brian m. carlson
2025-10-28 19:40     ` Junio C Hamano
2025-10-29  0:47       ` brian m. carlson
2025-10-29  0:36     ` brian m. carlson
2025-10-29  9:08       ` Patrick Steinhardt
2025-10-30  0:32         ` brian m. carlson
2025-10-27  0:43 ` [PATCH 05/14] rust: add a hash algorithm abstraction brian m. carlson
2025-10-28  9:18   ` Patrick Steinhardt
2025-10-28 17:09     ` Ezekiel Newren
2025-10-28 20:00   ` Junio C Hamano
2025-10-28 20:03     ` Ezekiel Newren
2025-10-29 13:27       ` Junio C Hamano
2025-10-29 14:32         ` Junio C Hamano
2025-10-27  0:43 ` [PATCH 06/14] hash: add a function to look up hash algo structs brian m. carlson
2025-10-28  9:18   ` Patrick Steinhardt
2025-10-28 20:12   ` Junio C Hamano
2025-11-04  1:48     ` brian m. carlson
2025-11-04 10:24       ` Junio C Hamano
2025-10-27  0:43 ` [PATCH 07/14] csum-file: define hashwrite's count as a uint32_t brian m. carlson
2025-10-28 17:22   ` Ezekiel Newren
2025-10-27  0:43 ` [PATCH 08/14] write-or-die: add an fsync component for the loose object map brian m. carlson
2025-10-27  0:43 ` [PATCH 09/14] hash: expose hash context functions to Rust brian m. carlson
2025-10-29 16:32   ` Junio C Hamano
2025-10-30 21:42     ` brian m. carlson
2025-10-30 21:52       ` Junio C Hamano
2025-10-27  0:44 ` [PATCH 10/14] rust: add a build.rs script for tests brian m. carlson
2025-10-28  9:18   ` Patrick Steinhardt
2025-10-28 17:42     ` Ezekiel Newren
2025-10-29 16:43   ` Junio C Hamano
2025-10-29 22:10     ` Ezekiel Newren
2025-10-29 23:12       ` Junio C Hamano
2025-10-30  6:26         ` Patrick Steinhardt
2025-10-30 13:54           ` Junio C Hamano
2025-10-31 22:43             ` Ezekiel Newren
2025-11-01 11:18               ` Junio C Hamano
2025-10-27  0:44 ` [PATCH 11/14] rust: add functionality to hash an object brian m. carlson
2025-10-28  9:18   ` Patrick Steinhardt
2025-10-29  0:53     ` brian m. carlson
2025-10-29  9:07       ` Patrick Steinhardt
2025-10-28 18:05   ` Ezekiel Newren
2025-10-29  1:05     ` brian m. carlson
2025-10-29 16:02       ` Ben Knoble
2025-10-27  0:44 ` [PATCH 12/14] rust: add a new binary loose object map format brian m. carlson
2025-10-28  9:18   ` Patrick Steinhardt
2025-10-29  1:37     ` brian m. carlson
2025-10-29  9:07       ` Patrick Steinhardt
2025-10-29 17:03   ` Junio C Hamano
2025-10-29 18:21   ` Junio C Hamano
2025-10-27  0:44 ` [PATCH 13/14] rust: add a small wrapper around the hashfile code brian m. carlson
2025-10-28 18:19   ` Ezekiel Newren
2025-10-29  1:39     ` brian m. carlson
2025-10-27  0:44 ` [PATCH 14/14] object-file-convert: always make sure object ID algo is valid brian m. carlson
2025-10-29 20:07 ` [PATCH 00/14] SHA-1/SHA-256 interoperability, part 2 Junio C Hamano
2025-10-29 20:15   ` Junio C Hamano
2025-11-11  0:12 ` Ezekiel Newren
2025-11-14 17:25 ` Junio C Hamano
2025-11-14 21:11   ` Junio C Hamano
2025-11-17  6:56   ` Junio C Hamano
2025-11-17 22:09     ` brian m. carlson
2025-11-18  0:13       ` Junio C Hamano
2025-11-19 23:04         ` brian m. carlson
2025-11-19 23:24           ` Junio C Hamano
2025-11-19 23:37           ` Ezekiel Newren
2025-11-20 19:52             ` Ezekiel Newren
2025-11-20 23:02               ` brian m. carlson
2025-11-20 23:11                 ` Ezekiel Newren
2025-11-20 23:14                   ` Junio C Hamano
2025-11-17 22:16 ` brian m. carlson [this message]
2025-11-17 22:16   ` [PATCH v2 01/15] repository: require Rust support for interoperability brian m. carlson
2025-11-17 22:16   ` [PATCH v2 02/15] conversion: don't crash when no destination algo brian m. carlson
2025-11-17 22:16   ` [PATCH v2 03/15] hash: use uint32_t for object_id algorithm brian m. carlson
2025-11-17 22:16   ` [PATCH v2 04/15] rust: add a ObjectID struct brian m. carlson
2025-11-17 22:16   ` [PATCH v2 05/15] rust: add a hash algorithm abstraction brian m. carlson
2025-11-17 22:16   ` [PATCH v2 06/15] hash: add a function to look up hash algo structs brian m. carlson
2025-11-17 22:16   ` [PATCH v2 07/15] rust: add additional helpers for ObjectID brian m. carlson
2025-11-17 22:16   ` [PATCH v2 08/15] csum-file: define hashwrite's count as a uint32_t brian m. carlson
2025-11-17 22:16   ` [PATCH v2 09/15] write-or-die: add an fsync component for the object map brian m. carlson
2025-11-17 22:16   ` [PATCH v2 10/15] hash: expose hash context functions to Rust brian m. carlson
2025-11-17 22:16   ` [PATCH v2 11/15] rust: add a build.rs script for tests brian m. carlson
2025-11-17 22:16   ` [PATCH v2 12/15] rust: add functionality to hash an object brian m. carlson
2025-11-17 22:16   ` [PATCH v2 13/15] rust: add a new binary object map format brian m. carlson
2025-11-17 22:16   ` [PATCH v2 14/15] rust: add a small wrapper around the hashfile code brian m. carlson
2025-11-17 22:16   ` [PATCH v2 15/15] object-file-convert: always make sure object ID algo is valid brian m. carlson
