git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap
Date: Sun, 10 Sep 2023 13:00:40 -0500	[thread overview]
Message-ID: <87bke9hprr.fsf@email.froward.int.ebiederm.org> (raw)
In-Reply-To: <ZP3UCQf+9D/J3wqT@tapette.crustytoothpaste.net> (brian m. carlson's message of "Sun, 10 Sep 2023 14:34:49 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-08 at 23:10:19, Eric W. Biederman wrote:
>> Ir makes a lot of sense for the hash algorithm that determines how all
>
> Minor nit: "It".
>
>> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
>> index 4b937480848a..10572c5794f9 100644
>> --- a/Documentation/technical/hash-function-transition.txt
>> +++ b/Documentation/technical/hash-function-transition.txt
>> @@ -148,14 +148,14 @@ Detailed Design
>>  Repository format extension
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>  A SHA-256 repository uses repository format version `1` (see
>> -Documentation/technical/repository-version.txt) with extensions
>> -`objectFormat` and `compatObjectFormat`:
>> +Documentation/technical/repository-version.txt) with the extension
>> +`objectFormat`, and an optional core.compatMap configuration.
>>  
>>  	[core]
>>  		repositoryFormatVersion = 1
>> +		compatMap = on
>>  	[extensions]
>>  		objectFormat = sha256
>> -		compatObjectFormat = sha1
>
> While I'm in favour of an approach that uses the compat map, the
> situation we've implemented here doesn't specify the extra hash
> algorithm.  We want this approach to work just as well for moving from
> SHA-1 to SHA-256 as it might for a future transition from SHA-256 to,
> say, SHA-3-512, if that becomes necessary.
>
> Making a future transition easier has been a goal of my SHA-256 work
> (because who wants to write several hundred patches in such a case?), so
> my hope is we can keep that here as well by explicitly naming the
> algorithm we're using.
>
> I also wonder if an approach that doesn't use an extension is going to
> be helpful.  Say, that I have a repository that is using Git 3.x, which
> supports interop, but I also need to use Git 2.x, which does not.  While
> it's true that Git 2.x can read my SHA-256 repository, it won't write
> the appropriate objects into the map, and thus it will be practically
> very difficult to actually use Git 3.x to push data to a repository of a
> different hash function.  We might well prefer to have Git 2.x not work
> with the repository at all rather than have incomplete data preventing
> us from, well, interoperating.

First it is my hope that we can get a command such as "git gc" to scan
the repository and fill in all of the missing compatibility hashes.

Not so much for day to day work, but for people able to enable
compatibility hashes on an existing repository.  Enabling compatibility
hashes on a sha1 repository is going to be necessary to create a sha256
repository from it.  A depth first walk, or a topological sort of the
objects pretty much has to happen as a separate pass.  So it makes sense
just to require all of the objects have their compatibility hash
computed before attempting to generate a pack in the compatibility
format.

I say all of that and I feel silly.

The core and optimized path is what whatever receive pack does to deal
with a pack in the repositories compatibility format.  Once that is
built we can create a sha256 repository from a sha1 repository just
by cloning it, and letting receive-pack figure out the details.

Before we can generate a sha256 pack from a sha1 pack we still need
to compute the sha256 hash of every object, but that can be very
optimized and local to the case of receiving a non-native pack.  So a
repository that generates a compatibility hash for all of it's objects
is not necessary to transition to another hash algorithm.  All we need
is another repository in the other format.


That said there is value in being able to add compatibility hashes
to an existing repository.  The upstream repository can just convert
to the new hash function and all of the downstream repositories
can compute their compatibility hashes and convert when they are ready.

Basically once a git with transition support exists any repository can
convert at any time without creating a problem for other repositories.

In my head it seems cheaper/safer to compute the compatibility hash of
every object in an existing repository than it does to convert a
repository.  Is it?

I think that if the first pull from a repository in another format can
trigger the initial computation of the compatibility hash (like the
first use of a reverse index triggers the creation of the reverse
index), then it will definitely be easier to just enable compatibility
hashes in an existing repository.

The additional hash computation step every pull from upstream (even when
well optimized) should be an incentive for people to fully convert their
repositories after the upstream has converted.


That is when things get tricky and the transition plan has not talked
about.  There are references to existing oid's in email, bug trackers,
and commit comments.  Digging through the history and dealing with those
references is something that developers are going to need to do for the
rest of the life of a project.

Which means eventually we will need to support a mode where we have some
packs with a ``.compat'' index but we no longer compute or generate the
old hash for new objects.

In summary.  I agree that compatMap is likely insufficient. So far I
think it is too cheap/easy to generate the missing mappings to make it a
mandatory requirement that all operations always generate them.

I also agree that making the configuration resilient foreseeable future
demands is a good idea.

So I will push this change farther out in the patch series.

Eric

  reply	other threads:[~2023-09-10 18:01 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-08 23:05 [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
2023-09-08 23:10 ` [PATCH 01/32] doc hash-file-transition: A map file for mapping between sha1 and sha256 Eric W. Biederman
2023-09-10 14:24   ` brian m. carlson
2023-09-10 18:07     ` Eric W. Biederman
2023-09-12  0:14   ` brian m. carlson
2023-09-12 13:36     ` Eric W. Biederman
2023-09-08 23:10 ` [PATCH 02/32] doc hash-function-transition: Replace compatObjectFormat with compatMap Eric W. Biederman
2023-09-10 14:34   ` brian m. carlson
2023-09-10 18:00     ` Eric W. Biederman [this message]
2023-09-11  6:11     ` Junio C Hamano
2023-09-11 16:35       ` [PATCH v2 02/32] doc hash-function-transition: Replace compatObjectFormat with mapObjectFormat Eric W. Biederman
2023-09-11 23:46         ` [PATCH v3 02/32] doc hash-function-transition: Augment compatObjectFormat with readCompatMap Eric W. Biederman
2023-09-12  7:57           ` Oswald Buddenhagen
2023-09-12 12:11             ` Eric W. Biederman
2023-09-13  8:10               ` Oswald Buddenhagen
2023-09-08 23:10 ` [PATCH 03/32] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
2023-09-08 23:10 ` [PATCH 04/32] object-name: Initial support for ^{sha1} and ^{sha256} Eric W. Biederman
2023-09-08 23:10 ` [PATCH 05/32] repository: add a compatibility hash algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 06/32] repository: Implement core.compatMap Eric W. Biederman
2023-09-08 23:10 ` [PATCH 07/32] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
2023-09-08 23:10 ` [PATCH 08/32] loose: Compatibilty short name support Eric W. Biederman
2023-09-08 23:10 ` [PATCH 09/32] object-file: Update the loose object map when writing loose objects Eric W. Biederman
2023-09-08 23:10 ` [PATCH 10/32] bulk-checkin: Only accept blobs Eric W. Biederman
2023-09-08 23:10 ` [PATCH 11/32] pack: Communicate the compat_oid through struct pack_idx_entry Eric W. Biederman
2023-09-08 23:10 ` [PATCH 12/32] bulk-checkin: hash object with compatibility algorithm Eric W. Biederman
2023-09-11  6:17   ` Junio C Hamano
2023-09-08 23:10 ` [PATCH 13/32] object-file: Add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
2023-09-08 23:10 ` [PATCH 14/32] commit: write commits for both hashes Eric W. Biederman
2023-09-11  6:25   ` Junio C Hamano
2023-09-08 23:10 ` [PATCH 15/32] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 16/32] object: Factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
2023-09-08 23:10 ` [PATCH 17/32] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
2023-09-08 23:10 ` [PATCH 18/32] object-file-convert: convert commit objects when writing Eric W. Biederman
2023-09-08 23:10 ` [PATCH 19/32] object-file-convert: convert tag commits " Eric W. Biederman
2023-09-08 23:10 ` [PATCH 20/32] builtin/cat-file: Let the oid determine the output algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 21/32] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 22/32] object-file: Handle compat objects in check_object_signature Eric W. Biederman
2023-09-08 23:10 ` [PATCH 23/32] builtin/ls-tree: Let the oid determine the output algorithm Eric W. Biederman
2023-09-08 23:10 ` [PATCH 24/32] builtin/pack-objects: Communicate the compatibility hash through struct pack_idx_entry Eric W. Biederman
2023-09-08 23:10 ` [PATCH 25/32] pack-compat-map: Add support for .compat files of a packfile Eric W. Biederman
2023-09-11  6:30   ` Junio C Hamano
2023-10-05 18:14     ` Taylor Blau
2023-09-08 23:10 ` [PATCH 26/32] object-file-convert: Implement convert_object_file_{begin,step,end} Eric W. Biederman
2023-09-11  6:28   ` Junio C Hamano
2023-09-08 23:10 ` [PATCH 27/32] builtin/fast-import: compute compatibility hashs for imported objects Eric W. Biederman
2023-09-08 23:10 ` [PATCH 28/32] builtin/index-pack: Add a simple oid index Eric W. Biederman
2023-09-08 23:10 ` [PATCH 29/32] builtin/index-pack: Compute the compatibility hash Eric W. Biederman
2023-09-08 23:10 ` [PATCH 30/32] builtin/index-pack: Make the stack in compute_compat_oid explicit Eric W. Biederman
2023-09-08 23:10 ` [PATCH 31/32] unpack-objects: Update to compute and write the compatibility hashes Eric W. Biederman
2023-09-08 23:10 ` [PATCH 32/32] object-file-convert: Implement repo_submodule_oid_to_algop Eric W. Biederman
2023-09-09 12:58 ` [RFC][PATCH 0/32] SHA256 and SHA1 interoperability Eric W. Biederman
2023-09-10 15:38 ` brian m. carlson
2023-09-10 18:20   ` Eric W. Biederman
2023-09-11  6:37 ` Junio C Hamano
2023-09-11 16:13   ` Eric W. Biederman
2023-09-11 22:05     ` brian m. carlson
2023-09-12 21:19       ` Eric W. Biederman
2023-09-12 16:20     ` Junio C Hamano
2023-09-14 19:57       ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bke9hprr.fsf@email.froward.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).