Git development
From: "Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com>
To: "Patrick Steinhardt" <ps@pks.im>, git@vger.kernel.org
Subject: Re: [PATCH] hash: introduce support for the MD5 hash algorithm
Date: Wed, 01 Apr 2026 12:54:05 +0200
Message-ID: <67f10f21-121b-426d-abee-32d034f84fe7@app.fastmail.com>
In-Reply-To: <20260401-pks-object-format-md5-v1-1-1b8f0be23713@pks.im>

On Wed, Apr 1, 2026, at 12:42, Patrick Steinhardt wrote:
> We are currently in the process of migrating to SHA256 as the
> alternative to SHA1. But we believe that proposal is misguided.
>
> When Linus first announced Git in April 2005, he was explicit about the
> role of SHA1 in the design: the hash is used for content integrity, not
> for cryptographic security [1]. Given this foundational principle, the
> collision resistance of the underlying hash algorithm is essentially
> irrelevant. What matters is that identical content always produces the
> same name, and that any corruption of stored data is detectable.
>
> While SHA256 technically provides stronger collision resistance than
> SHA1, it does so at the cost of 64-character object names instead of 40, a
> 60% increase in verbosity for no practical benefit.
>
> As an alternative, MD5 satisfies the requirements of collision
> resistance and deterministic checksums perfectly well. At 32 hex
> characters, MD5 object names are shorter than SHA1 ones, roll off the
> tongue more easily, and have been a beloved companion to the software
> engineer for decades. Furthermore, MD5 remains in active use throughout
> the ecosystem, in checksums on download pages, filesystem integrity
> tools, and countless other systems, which overall proves the point that
> it isn't inherently broken.
>
> Quoting Linus in [1]:
>
>   In other words, I think we could have used md5's as the hash, if we
>   just make sure we have good practices. And it wouldn't have been
>   "insecure".
>
> Let's do so and wire up MD5 as a new alternative hash algorithm next to
> SHA1 and SHA256. Repositories can easily be initialized with MD5 by
> saying `git init --object-format=md5`, and tests can be executed with
> the new hash by setting the `GIT_TEST_DEFAULT_HASH_ALGO=md5` environment
> variable.
>
> [1]:
> https://lore.kernel.org/git/Pine.LNX.4.58.0504160913180.7211@ppc970.osdl.org/
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> Hi,
>
> I guess the title says it all. Let's correct course!
>
> Patrick

I’ve been waiting years for this! Thank you so much!!!

When will this be packaged on CD-ROM?
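
On the length argument in the commit message: a minimal standalone sketch, not part of the patch, that reproduces the numbers from the raw digest sizes the patch uses (16, 20, and 32 bytes). The helper names `hex_size` and `hex_growth_percent` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Raw digest sizes in bytes, matching the GIT_*_RAWSZ values in the patch. */
enum { MD5_RAWSZ = 16, SHA1_RAWSZ = 20, SHA256_RAWSZ = 32 };

/* Each raw byte is printed as two hex digits. */
size_t hex_size(size_t rawsz)
{
	return 2 * rawsz;
}

/*
 * Hypothetical helper: percentage change in hex length when moving
 * from one hash to another. Negative means the names get shorter.
 */
int hex_growth_percent(size_t from_rawsz, size_t to_rawsz)
{
	long from = (long)hex_size(from_rawsz);
	long to = (long)hex_size(to_rawsz);

	assert(from > 0);
	return (int)((to - from) * 100 / from);
}
```

Under these assumptions, `hex_growth_percent(SHA1_RAWSZ, SHA256_RAWSZ)` gives 60, the figure quoted above, and `hex_growth_percent(SHA1_RAWSZ, MD5_RAWSZ)` gives -20.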

> ---
>  chunk-format.c             |  2 ++
>  hash.c                     | 69 +++++++++++++++++++++++++++++++++++++++++++++-
>  hash.h                     | 34 +++++++++++++++++++----
>  md5/openssl.h              | 49 ++++++++++++++++++++++++++++++++
>  pack-mtimes.c              | 12 +++++++-
>  pkt-line.c                 |  2 +-
>  refs/reftable-backend.c    |  9 ++++++
>  reftable/basics.c          |  2 ++
>  reftable/basics.h          |  1 +
>  reftable/reftable-basics.h |  2 ++
>  reftable/table.c           |  3 ++
>  reftable/writer.c          |  3 ++
>  12 files changed, 180 insertions(+), 8 deletions(-)
>
> diff --git a/chunk-format.c b/chunk-format.c
> index 51b5a2c959..28aa0ae945 100644
> --- a/chunk-format.c
> +++ b/chunk-format.c
> @@ -209,6 +209,8 @@ uint8_t oid_version(const struct git_hash_algo *algop)
>  		return 1;
>  	case GIT_HASH_SHA256:
>  		return 2;
> +	case GIT_HASH_MD5:
> +		return 3;
>  	default:
>  		die(_("invalid hash version"));
>  	}
> diff --git a/hash.c b/hash.c
> index 553f2008ea..adfae35b7c 100644
> --- a/hash.c
> +++ b/hash.c
> @@ -43,6 +43,25 @@ static const struct object_id null_oid_sha256 = {
>  	.algo = GIT_HASH_SHA256,
>  };
>
> +static const struct object_id empty_tree_oid_md5 = {
> +	.hash = {
> +		0xfe, 0x20, 0x56, 0x59, 0xa1, 0x2b, 0x97, 0x65, 0x5d, 0x0a,
> +		0x89, 0x87, 0x50, 0xce, 0xaf, 0x96
> +	},
> +	.algo = GIT_HASH_MD5,
> +};
> +static const struct object_id empty_blob_oid_md5 = {
> +	.hash = {
> +		0x80, 0xff, 0xc6, 0xeb, 0x72, 0x86, 0xb1, 0x5a, 0xfc, 0x63,
> +		0xf9, 0xb8, 0x61, 0x79, 0xcc, 0xb1
> +	},
> +	.algo = GIT_HASH_MD5,
> +};
> +static const struct object_id null_oid_md5 = {
> +	.hash = {0},
> +	.algo = GIT_HASH_MD5,
> +};
> +
>  static void git_hash_sha1_init(struct git_hash_ctx *ctx)
>  {
>  	ctx->algop = &hash_algos[GIT_HASH_SHA1];
> @@ -135,6 +154,39 @@ static void git_hash_sha256_final_oid(struct object_id *oid, struct git_hash_ctx
>  	oid->algo = GIT_HASH_SHA256;
>  }
>
> +static void git_hash_md5_init(struct git_hash_ctx *ctx)
> +{
> +	ctx->algop = unsafe_hash_algo(&hash_algos[GIT_HASH_MD5]);
> +	git_MD5_Init(&ctx->state.md5);
> +}
> +
> +static void git_hash_md5_clone(struct git_hash_ctx *dst, const struct git_hash_ctx *src)
> +{
> +	dst->algop = src->algop;
> +	git_MD5_Clone(&dst->state.md5, &src->state.md5);
> +}
> +
> +static void git_hash_md5_update(struct git_hash_ctx *ctx, const void *data, size_t len)
> +{
> +	git_MD5_Update(&ctx->state.md5, data, len);
> +}
> +
> +static void git_hash_md5_final(unsigned char *hash, struct git_hash_ctx *ctx)
> +{
> +	git_MD5_Final(hash, &ctx->state.md5);
> +}
> +
> +static void git_hash_md5_final_oid(struct object_id *oid, struct git_hash_ctx *ctx)
> +{
> +	git_MD5_Final(oid->hash, &ctx->state.md5);
> +	/*
> +	 * This currently does nothing, so the compiler should optimize it out,
> +	 * but keep it in case we extend the hash size again.
> +	 */
> +	memset(oid->hash + GIT_MD5_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_MD5_RAWSZ);
> +	oid->algo = GIT_HASH_MD5;
> +}
> +
>  static void git_hash_unknown_init(struct git_hash_ctx *ctx UNUSED)
>  {
>  	BUG("trying to init unknown hash");
> @@ -227,7 +279,22 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
>  		.empty_tree = &empty_tree_oid_sha256,
>  		.empty_blob = &empty_blob_oid_sha256,
>  		.null_oid = &null_oid_sha256,
> -	}
> +	},
> +	{
> +		.name = "md5",
> +		.format_id = GIT_MD5_FORMAT_ID,
> +		.rawsz = GIT_MD5_RAWSZ,
> +		.hexsz = GIT_MD5_HEXSZ,
> +		.blksz = GIT_MD5_BLKSZ,
> +		.init_fn = git_hash_md5_init,
> +		.clone_fn = git_hash_md5_clone,
> +		.update_fn = git_hash_md5_update,
> +		.final_fn = git_hash_md5_final,
> +		.final_oid_fn = git_hash_md5_final_oid,
> +		.empty_tree = &empty_tree_oid_md5,
> +		.empty_blob = &empty_blob_oid_md5,
> +		.null_oid = &null_oid_md5,
> +	},
>  };
>
>  const struct object_id *null_oid(const struct git_hash_algo *algop)
> diff --git a/hash.h b/hash.h
> index d51efce1d3..18f97eb1a9 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -131,6 +131,14 @@
>  #define git_SHA256_Clone	platform_SHA256_Clone
>  #endif
>
> +#include "md5/openssl.h"
> +
> +#define git_MD5_CTX	platform_MD5_CTX
> +#define git_MD5_Init	platform_MD5_Init
> +#define git_MD5_Update	platform_MD5_Update
> +#define git_MD5_Final	platform_MD5_Final
> +#define git_MD5_Clone	platform_MD5_Clone
> +
>  #ifdef SHA1_MAX_BLOCK_SIZE
>  #include "compat/sha1-chunked.h"
>  #undef git_SHA1_Update
> @@ -172,8 +180,10 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
>  #define GIT_HASH_SHA1 1
>  /* SHA-256  */
>  #define GIT_HASH_SHA256 2
> +/* MD5 */
> +#define GIT_HASH_MD5 3
>  /* Number of algorithms supported (including unknown). */
> -#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
> +#define GIT_HASH_NALGOS (GIT_HASH_MD5 + 1)
>
>  /* Default hash algorithm if unspecified. */
>  #ifdef WITH_BREAKING_CHANGES
> @@ -203,6 +213,15 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
>  /* The block size of SHA-256. */
>  #define GIT_SHA256_BLKSZ 64
>
> +/* "md5s", big-endian */
> +#define GIT_MD5_FORMAT_ID 0x6d643573
> +
> +/* The length in bytes and in hex digits of an object name (MD5 value). */
> +#define GIT_MD5_RAWSZ 16
> +#define GIT_MD5_HEXSZ (2 * GIT_MD5_RAWSZ)
> +/* The block size of MD5. */
> +#define GIT_MD5_BLKSZ 64
> +
>  /* The length in byte and in hex digits of the largest possible hash value. */
>  #define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
>  #define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
> @@ -263,6 +282,7 @@ struct git_hash_ctx {
>  		git_SHA_CTX sha1;
>  		git_SHA_CTX_unsafe sha1_unsafe;
>  		git_SHA256_CTX sha256;
> +		git_MD5_CTX md5;
>  	} state;
>  };
>
> @@ -359,8 +379,10 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2,
>  	 * Teach the compiler that there are only two possibilities of hash size
>  	 * here, so that it can optimize for this case as much as possible.
>  	 */
> -	if (algop->rawsz == GIT_MAX_RAWSZ)
> -		return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
> +	if (algop->rawsz == GIT_SHA256_RAWSZ)
> +		return memcmp(sha1, sha2, GIT_SHA256_RAWSZ);
> +	if (algop->rawsz == GIT_MD5_RAWSZ)
> +		return memcmp(sha1, sha2, GIT_MD5_RAWSZ);
>  	return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
>  }
>
> @@ -370,8 +392,10 @@ static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2, c
>  	 * We write this here instead of deferring to hashcmp so that the
>  	 * compiler can properly inline it and avoid calling memcmp.
>  	 */
> -	if (algop->rawsz == GIT_MAX_RAWSZ)
> -		return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
> +	if (algop->rawsz == GIT_SHA256_RAWSZ)
> +		return !memcmp(sha1, sha2, GIT_SHA256_RAWSZ);
> +	if (algop->rawsz == GIT_MD5_RAWSZ)
> +		return !memcmp(sha1, sha2, GIT_MD5_RAWSZ);
>  	return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
>  }
>
> diff --git a/md5/openssl.h b/md5/openssl.h
> new file mode 100644
> index 0000000000..4e5a041734
> --- /dev/null
> +++ b/md5/openssl.h
> @@ -0,0 +1,49 @@
> +/* wrappers for the EVP API of OpenSSL 3+ */
> +#ifndef MD5_OPENSSL_H
> +#define MD5_OPENSSL_H
> +#include <openssl/evp.h>
> +
> +struct openssl_MD5_CTX {
> +	EVP_MD_CTX *ectx;
> +};
> +
> +typedef struct openssl_MD5_CTX openssl_MD5_CTX;
> +
> +static inline void openssl_MD5_Init(struct openssl_MD5_CTX *ctx)
> +{
> +	const EVP_MD *type = EVP_md5();
> +
> +	ctx->ectx = EVP_MD_CTX_new();
> +	if (!ctx->ectx)
> +		die("EVP_MD_CTX_new: out of memory");
> +
> +	EVP_DigestInit_ex(ctx->ectx, type, NULL);
> +}
> +
> +static inline void openssl_MD5_Update(struct openssl_MD5_CTX *ctx,
> +				      const void *data,
> +				      size_t len)
> +{
> +	EVP_DigestUpdate(ctx->ectx, data, len);
> +}
> +
> +static inline void openssl_MD5_Final(unsigned char *digest,
> +				     struct openssl_MD5_CTX *ctx)
> +{
> +	EVP_DigestFinal_ex(ctx->ectx, digest, NULL);
> +	EVP_MD_CTX_free(ctx->ectx);
> +}
> +
> +static inline void openssl_MD5_Clone(struct openssl_MD5_CTX *dst,
> +				     const struct openssl_MD5_CTX *src)
> +{
> +	EVP_MD_CTX_copy_ex(dst->ectx, src->ectx);
> +}
> +
> +#define platform_MD5_CTX openssl_MD5_CTX
> +#define platform_MD5_Init openssl_MD5_Init
> +#define platform_MD5_Clone openssl_MD5_Clone
> +#define platform_MD5_Update openssl_MD5_Update
> +#define platform_MD5_Final openssl_MD5_Final
> +
> +#endif /* MD5_OPENSSL_H */
> diff --git a/pack-mtimes.c b/pack-mtimes.c
> index 8e1f2dec0e..ee54aa3dd4 100644
> --- a/pack-mtimes.c
> +++ b/pack-mtimes.c
> @@ -75,7 +75,17 @@ static int load_pack_mtimes_file(char *mtimes_file,
>
>  	expected_size = MTIMES_HEADER_SIZE;
>  	expected_size = st_add(expected_size, st_mult(sizeof(uint32_t), num_objects));
> -	expected_size = st_add(expected_size, 2 * (header.hash_id == 1 ? GIT_SHA1_RAWSZ : GIT_SHA256_RAWSZ));
> +	switch (header.hash_id) {
> +		case 1:
> +			expected_size = st_add(expected_size, 2 * GIT_SHA1_RAWSZ);
> +			break;
> +		case 2:
> +			expected_size = st_add(expected_size, 2 * GIT_SHA256_RAWSZ);
> +			break;
> +		case 3:
> +			expected_size = st_add(expected_size, 2 * GIT_MD5_RAWSZ);
> +			break;
> +	}
>
>  	if (mtimes_size != expected_size) {
>  		ret = error(_("mtimes file %s is corrupt"), mtimes_file);
> diff --git a/pkt-line.c b/pkt-line.c
> index 3fc3e9ea70..4bc917c8d3 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -395,7 +395,7 @@ static const char *find_packfile_uri_path(const char *buffer)
>
>  	len = strspn(buffer, "0123456789abcdefABCDEF");
>  	/* size of SHA1 and SHA256 hash */
> -	if (!(len == 40 || len == 64) || buffer[len] != ' ')
> +	if (!(len == 40 || len == 64 || len == 32) || buffer[len] != ' ')
>  		return NULL; /* required "<hash>SP" not seen */
>
>  	path = strstr(buffer + len + 1, URI_MARK);
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index b124404663..7bb0a60f15 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -106,6 +106,9 @@ static int reftable_backend_read_ref(struct reftable_backend *be,
>  		case REFTABLE_HASH_SHA256:
>  			hash_id = GIT_HASH_SHA256;
>  			break;
> +		case REFTABLE_HASH_MD5:
> +			hash_id = GIT_HASH_MD5;
> +			break;
>  		default:
>  			BUG("unhandled hash ID %d", reftable_stack_hash_id(be->stack));
>  		}
> @@ -401,6 +404,9 @@ static struct ref_store *reftable_be_init(struct repository *repo,
>  	case GIT_SHA256_FORMAT_ID:
>  		refs->write_options.hash_id = REFTABLE_HASH_SHA256;
>  		break;
> +	case GIT_MD5_FORMAT_ID:
> +		refs->write_options.hash_id = REFTABLE_HASH_MD5;
> +		break;
>  	default:
>  		BUG("unknown hash algorithm %d", repo->hash_algo->format_id);
>  	}
> @@ -2788,6 +2794,9 @@ static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
>  			case REFTABLE_HASH_SHA256:
>  				hash_id = GIT_HASH_SHA256;
>  				break;
> +			case REFTABLE_HASH_MD5:
> +				hash_id = GIT_HASH_MD5;
> +				break;
>  			default:
>  				BUG("unhandled hash ID %d",
>  				    reftable_stack_hash_id(backend->stack));
> diff --git a/reftable/basics.c b/reftable/basics.c
> index e969927b61..3b62c562cf 100644
> --- a/reftable/basics.c
> +++ b/reftable/basics.c
> @@ -273,6 +273,8 @@ uint32_t hash_size(enum reftable_hash id)
>  		return REFTABLE_HASH_SIZE_SHA1;
>  	case REFTABLE_HASH_SHA256:
>  		return REFTABLE_HASH_SIZE_SHA256;
> +	case REFTABLE_HASH_MD5:
> +		return REFTABLE_HASH_SIZE_MD5;
>  	}
>  	abort();
>  }
> diff --git a/reftable/basics.h b/reftable/basics.h
> index e4b83b2b03..9f0e0b3fa5 100644
> --- a/reftable/basics.h
> +++ b/reftable/basics.h
> @@ -287,5 +287,6 @@ uint32_t hash_size(enum reftable_hash id);
>   */
>  #define REFTABLE_FORMAT_ID_SHA1   ((uint32_t) 0x73686131)
>  #define REFTABLE_FORMAT_ID_SHA256 ((uint32_t) 0x73323536)
> +#define REFTABLE_FORMAT_ID_MD5    ((uint32_t) 0x6d643573)
>
>  #endif
> diff --git a/reftable/reftable-basics.h b/reftable/reftable-basics.h
> index 6d73f19c85..8cc196f91e 100644
> --- a/reftable/reftable-basics.h
> +++ b/reftable/reftable-basics.h
> @@ -27,9 +27,11 @@ struct reftable_buf {
>  enum reftable_hash {
>  	REFTABLE_HASH_SHA1   = 89,
>  	REFTABLE_HASH_SHA256 = 247,
> +	REFTABLE_HASH_MD5    = 104,
>  };
>  #define REFTABLE_HASH_SIZE_SHA1   20
>  #define REFTABLE_HASH_SIZE_SHA256 32
> +#define REFTABLE_HASH_SIZE_MD5    16
>  #define REFTABLE_HASH_SIZE_MAX    REFTABLE_HASH_SIZE_SHA256
>
>  /* Overrides the functions to use for memory management. */
> diff --git a/reftable/table.c b/reftable/table.c
> index 56362df0ed..5c463ade73 100644
> --- a/reftable/table.c
> +++ b/reftable/table.c
> @@ -79,6 +79,9 @@ static int parse_footer(struct reftable_table *t, uint8_t *footer,
>  		case REFTABLE_FORMAT_ID_SHA256:
>  			t->hash_id = REFTABLE_HASH_SHA256;
>  			break;
> +		case REFTABLE_FORMAT_ID_MD5:
> +			t->hash_id = REFTABLE_HASH_MD5;
> +			break;
>  		default:
>  			err = REFTABLE_FORMAT_ERROR;
>  			goto done;
> diff --git a/reftable/writer.c b/reftable/writer.c
> index 0133b64975..9499fd9a73 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -114,6 +114,9 @@ static int writer_write_header(struct reftable_writer *w, uint8_t *dest)
>  		case REFTABLE_HASH_SHA256:
>  			hash_id = REFTABLE_FORMAT_ID_SHA256;
>  			break;
> +		case REFTABLE_HASH_MD5:
> +			hash_id = REFTABLE_FORMAT_ID_MD5;
> +			break;
>  		default:
>  			return -1;
>  		}
>
> ---
> base-commit: 270e10ad6dda3379ea0da7efd11e4fbf2cd7a325
> change-id: 20260401-pks-object-format-md5-5e34f91d5b06
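
One note on the format IDs: the patch describes GIT_MD5_FORMAT_ID as "md5s", big-endian, and uses the same value for REFTABLE_FORMAT_ID_MD5. A small sketch of how such a four-character tag maps to the 32-bit constant, assuming plain ASCII. The function name `format_id_from_tag` is hypothetical and does not exist in Git.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack a four-character ASCII tag into a 32-bit
 * value with the first character in the most significant byte
 * (big-endian order).
 */
uint32_t format_id_from_tag(const char *tag)
{
	assert(tag && tag[0] && tag[1] && tag[2] && tag[3]);

	return ((uint32_t)(unsigned char)tag[0] << 24) |
	       ((uint32_t)(unsigned char)tag[1] << 16) |
	       ((uint32_t)(unsigned char)tag[2] << 8) |
	       (uint32_t)(unsigned char)tag[3];
}
```

With this, `format_id_from_tag("md5s")` yields 0x6d643573, and the existing reftable constants shown in the diff context correspond to "sha1" (0x73686131) and "s256" (0x73323536).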

-- 

sent from my SamSun g

Thread overview: 7+ messages
2026-04-01 10:42 [PATCH] hash: introduce support for the MD5 hash algorithm Patrick Steinhardt
2026-04-01 10:54 ` Kristoffer Haugsbakk [this message]
2026-04-01 13:47   ` Toon Claes
2026-04-01 17:41     ` Tian Yuchen
2026-04-04 15:34       ` K Jayatheerth
2026-04-01 18:42 ` Junio C Hamano
2026-04-02  7:08 ` Patrick Steinhardt
