From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Git Mailing List <git@vger.kernel.org>,
Stefan Beller <sbeller@google.com>,
bmwill@google.com, Jonathan Tan <jonathantanmy@google.com>,
Jeff King <peff@peff.net>, David Lang <david@lang.hm>,
"brian m. carlson" <sandals@crustytoothpaste.net>,
Masaya Suzuki <masayasuzuki@google.com>,
demerphq@gmail.com, The Keccak Team <keccak@noekeon.org>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH v4] technical doc: add a design doc for hash function transition
Date: Mon, 02 Oct 2017 17:25:15 +0900
Message-ID: <xmqq3772ot1w.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20170929173413.GI19555@aiede.mtv.corp.google.com> (Jonathan Nieder's message of "Fri, 29 Sep 2017 10:34:13 -0700")
Jonathan Nieder <jrnieder@gmail.com> writes:
>>> +6. Skip fetching some submodules of a project into a NewHash
>>> + repository. (This also depends on NewHash support in Git
>>> + protocol.)
>>
>> It is unclear what this means. Around submodule support, one thing
>> I can think of is that a NewHash tree in a superproject would record
>> a gitlink that is a NewHash commit object name in it, therefore it
>> cannot refer to an unconverted SHA-1 submodule repository. But it
>> is unclear if the above description refers to the same issue, or
>> something else.
>
> It refers to that issue.
We may want to find a way to make it clear, then.
>> It makes me wonder if we want to add the hashname in this object
>> header. "length" would be different for non-blob objects anyway,
>> and it is not "compat metadata" we want to avoid baking in, yet it
>> would help diagnose a mistake of attempting to use "mixed" objects
>> in a single repository. Not a big issue, though.
>
> Do you mean that adding the hashname into the computation that
> produces the object name would help in some use case?
What I mean is that for SHA-1 objects we keep the object header as
"<type> <length> NUL". For objects in the newer world, we make the
object header "<type> <hash> <length> NUL", and so include the
hashname in the object name computation.
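As a rough illustration of the two header shapes, here is a minimal sketch
in Python. It is not Git's implementation: the function names are made up,
"sha256" merely stands in for the not-yet-chosen NewHash, and the exact
spelling of the hash name inside the header is an assumption.

```python
import hashlib


def sha1_object_name(obj_type: str, content: bytes) -> str:
    """Object name as computed today: SHA-1 over "<type> <length> NUL" + content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def newhash_object_name(obj_type: str, content: bytes,
                        hash_name: str = "sha256") -> str:
    """Hypothetical variant: the header becomes "<type> <hash> <length> NUL",
    so the hash name itself is fed into the object name computation.
    "sha256" is only a placeholder for NewHash."""
    header = f"{obj_type} {hash_name} {len(content)}".encode() + b"\0"
    return hashlib.new(hash_name, header + content).hexdigest()


if __name__ == "__main__":
    print(sha1_object_name("blob", b"hello\n"))
    print(newhash_object_name("blob", b"hello\n"))
```

With a header like this, the same content hashed under two different hash
names never shares a preimage, which is what would make accidental mixing
easier to diagnose.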
> For loose objects, it would be nice to name the hash in the file, so
> that "file" can understand what is happening if someone accidentally
> mixes types using "cp". The only downside is losing the ability to
> copy blobs (which have the same content despite being named using
> different hashes) between repositories after determining their new
> names. That doesn't seem like a strong downside --- it's pretty
> harmless to include the hash type in loose object files, too. I think
> I would prefer this to be a "magic number" instead of part of the
> zlib-deflated payload, since this way "file" can discover it more
> easily.
Yeah, thanks for doing pros-and-cons for me ;-)
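To make the "magic number" idea concrete, here is a small Python sketch.
Everything in it is hypothetical: the thread does not define any magic
values, so the MAGIC byte strings and the function names below are
invented purely for illustration. The point it shows is that the hash
type sits in front of the zlib stream, so a tool can identify it without
inflating anything.

```python
import zlib

# Hypothetical magic prefixes; nothing in the thread specifies these bytes.
MAGIC = {"sha1": b"GOBJ\x01", "newhash": b"GOBJ\x02"}


def write_loose(hash_name: str, obj_type: str, content: bytes) -> bytes:
    """Return loose-object file bytes: magic prefix, then the deflated payload."""
    payload = f"{obj_type} {len(content)}".encode() + b"\0" + content
    return MAGIC[hash_name] + zlib.compress(payload)


def sniff_hash(data: bytes):
    """Identify the hash type from the leading bytes alone, without inflating."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def read_loose(data: bytes):
    """Strip the magic prefix, inflate, and split the header from the content."""
    name = sniff_hash(data)
    if name is None:
        raise ValueError("unrecognized loose object magic")
    payload = zlib.decompress(data[len(MAGIC[name]):])
    header, _, content = payload.partition(b"\0")
    obj_type, length = header.decode().split(" ")
    if int(length) != len(content):
        raise ValueError("length in header does not match content")
    return name, obj_type, content
```

The sketch also shows the downside mentioned above: the bytes written for
a blob now differ between the two hash types, so the file can no longer
be copied verbatim between repositories.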
>> If it is a goal to eventually be able to lose SHA-1 compatibility
>> metadata from the objects, then we might want to remove SHA-1 based
>> signature bits (e.g. PGP trailer in signed tag, gpgsig header in the
>> commit object) from NewHash contents, and instead have them stored
>> in a side "metadata" table, only to be used while converting back.
>> I dunno if that is desirable.
>
> I don't consider that desirable.
Agreed. Let's not go there.
>> Hmm, as the corresponding packfile stores object data only in
>> NewHash content format, it is somewhat curious that this table that
>> stores CRC32 of the data appears in the "Tables for each object
>> format" section, as they would be identical, no? Unless I am
>> grossly misreading the spec, the checksum should either go outside
>> the "Tables for each object format" section but still in .idx, or
>> should be eliminated and become part of the packdata stream instead,
>> perhaps?
>
> It's actually only present for the first object format. Will find a
> better way to describe this.
I see. One way to do so is to have it upfront before the "after
this point, these tables repeat for each of the hashes" part of the
file.
>> Oy. So we can go from a short prefix to the pack location by first
>> finding it via binsearch in the short-name table, realizing that it is
>> the nth object in the object name order, and consulting this table.
>> When we know the pack-order of an object, there is no direct way to
>> go to its location (short of reversing the name-order-to-pack-order
>> table)?
>
> An earlier version of the design also had a pack-order-to-pack-offset
> table, but we weren't able to think of any cases where that would be
> used without also looking up the object name that can be used to
> verify the integrity of the inflated object.
The primary thing I was interested in knowing was whether we tried to
think of any case where it may be useful and then didn't think of
any---I couldn't, but I know I am not imaginative enough, and I
wanted to know you guys didn't, either.
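For readers following along, the two lookup directions discussed above can
be sketched like this in Python. The table names and the toy data are
assumptions made for the example, not the on-disk layout from the design
doc, and ambiguous prefixes are not handled.

```python
from bisect import bisect_left

# Toy tables for a three-object pack (made-up data).
short_names = [b"\x12\x34", b"\x56\x78", b"\x9a\xbc"]  # sorted shortened names
name_to_pack_order = [2, 0, 1]           # indexed by name order
offsets_by_name_order = [400, 12, 150]   # pack offsets, indexed by name order


def offset_from_prefix(prefix: bytes) -> int:
    """Short prefix -> pack offset: binary search the short-name table,
    then index the offset table by the resulting name-order position."""
    n = bisect_left(short_names, prefix)
    if n == len(short_names) or not short_names[n].startswith(prefix):
        raise KeyError(prefix)
    return offsets_by_name_order[n]


def offset_from_pack_order(p: int) -> int:
    """Pack order -> pack offset: with no pack-order-to-offset table, the
    name-order-to-pack-order table has to be reversed (a linear scan here)."""
    n = name_to_pack_order.index(p)
    return offsets_by_name_order[n]
```

The first direction costs one binary search; the second needs the reversed
mapping, which is the asymmetry the question above was about.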