git.vger.kernel.org archive mirror
* Lets avoid the SHA-1 term (was [doc] User Manual Suggestion)
@ 2009-04-26 23:38 Felipe Contreras
  2009-04-27  0:28 ` Björn Steinbrink
  2009-04-27 12:06 ` Michael J Gruber
  0 siblings, 2 replies; 5+ messages in thread
From: Felipe Contreras @ 2009-04-26 23:38 UTC
  To: Björn Steinbrink
  Cc: David Abrahams, Michael Witten, Jeff King, Daniel Barkalow,
	Johan Herland, git, J. Bruce Fields

2009/4/27 Björn Steinbrink <B.Steinbrink@gmx.de>:
> On 2009.04.24 20:48:57 -0400, David Abrahams wrote:
>>
>> On Apr 24, 2009, at 8:01 PM, Michael Witten wrote:
>>
>>>> What's wrong with just calling the object name "object name"?
>>>
>>> What's wrong with calling the object address "object address"?
>>
>> Neither captures the connection to the object's contents.  I think
>> "value ID" would be closer, but it's probably too horrible.
>
> I think I asked this in another mail, but I'm quite tired, so just to
> make sure: What do you mean by "value"? I might be weird (I'm not a
> native speaker, so I probably make funny and wrong connotations from
> time to time), but while I can accept "content" to include the type and
> size of the object, the term "value" makes me want to exclude those
> pieces of meta data. So "value" somehow feels wrong to me, as the hash
> covers those two fields.

Just to summarize.

Do you agree that SHA-1 is not the proper term to choose?

Do you agree that either 'id' or 'hash' would work fine?

Personally I think there's an advantage of choosing 'hash'; if we pick
'id' then the user might think that he can change the contents of the
object while keeping the same id, if we pick 'hash' then it's obvious
the 'id' is tied to the content and why.

-- 
Felipe Contreras

* Re: Lets avoid the SHA-1 term (was [doc] User Manual Suggestion)
From: Björn Steinbrink @ 2009-04-27  0:28 UTC
  To: Felipe Contreras
  Cc: David Abrahams, Michael Witten, Jeff King, Daniel Barkalow,
	Johan Herland, git, J. Bruce Fields

On 2009.04.27 02:38:40 +0300, Felipe Contreras wrote:
> 2009/4/27 Björn Steinbrink <B.Steinbrink@gmx.de>:
> > On 2009.04.24 20:48:57 -0400, David Abrahams wrote:
> >>
> >> On Apr 24, 2009, at 8:01 PM, Michael Witten wrote:
> >>
> >>>> What's wrong with just calling the object name "object name"?
> >>>
> >>> What's wrong with calling the object address "object address"?
> >>
> >> Neither captures the connection to the object's contents.  I think
> >> "value ID" would be closer, but it's probably too horrible.
> >
> > I think I asked this in another mail, but I'm quite tired, so just to
> > make sure: What do you mean by "value"? I might be weird (I'm not a
> > native speaker, so I probably make funny and wrong connotations from
> > time to time), but while I can accept "content" to include the type and
> > size of the object, the term "value" makes me want to exclude those
> > pieces of meta data. So "value" somehow feels wrong to me, as the hash
> > covers those two fields.
> 
> Just to summarize.
> 
> Do you agree that SHA-1 is not the proper term to choose?

Yes, IMHO that's too strongly tied to the implementation. But a quick
grep run tells me that the "object name" area is probably not where you
need to get rid of that. The "object name" term is already used a lot.
If you want to ban SHA-1 then the rev-parse man page, describing the
"extended SHA1 syntax" would probably be a better place to start (unless
you want to "fix" everything at once).

> Do you agree that either 'id' or 'hash' would work fine?

"object id" would work for me, but I'm fine with the existing "object
name" as well. I don't like "object hash" (or "object hash id"), because
it IMHO doesn't express that well that it's used to identify an object.

> Personally I think there's an advantage of choosing 'hash'; if we pick
> 'id' then the user might think that he can change the contents of the
> object while keeping the same id, if we pick 'hash' then it's obvious
> the 'id' is tied to the content and why.

Heh, if you use "hash", there's no "id" tied to the content, there's
just the hash. SCNR ;-) See my other mails why I think that "hash" isn't
that advantageous.

Björn

* Re: Lets avoid the SHA-1 term (was [doc] User Manual Suggestion)
From: Michael J Gruber @ 2009-04-27 12:06 UTC
  To: Felipe Contreras
  Cc: Björn Steinbrink, David Abrahams, Michael Witten, Jeff King,
	Daniel Barkalow, Johan Herland, git, J. Bruce Fields,
	Johannes Sixt, Wincent Colaiuta, Junio C Hamano, Dmitry Potapov

Felipe Contreras venit, vidit, dixit 27.04.2009 01:38:
> 2009/4/27 Björn Steinbrink <B.Steinbrink@gmx.de>:
>> On 2009.04.24 20:48:57 -0400, David Abrahams wrote:
>>>
>>> On Apr 24, 2009, at 8:01 PM, Michael Witten wrote:
>>>
>>>>> What's wrong with just calling the object name "object name"?
>>>>
>>>> What's wrong with calling the object address "object address"?
>>>
>>> Neither captures the connection to the object's contents.  I think
>>> "value ID" would be closer, but it's probably too horrible.
>>
>> I think I asked this in another mail, but I'm quite tired, so just to
>> make sure: What do you mean by "value"? I might be weird (I'm not a
>> native speaker, so I probably make funny and wrong connotations from
>> time to time), but while I can accept "content" to include the type and
>> size of the object, the term "value" makes me want to exclude those
>> pieces of meta data. So "value" somehow feels wrong to me, as the hash
>> covers those two fields.
> 
> Just to summarize.
> 
> Do you agree that SHA-1 is not the proper term to choose?
> 
> Do you agree that either 'id' or 'hash' would work fine?
> 
> Personally I think there's an advantage of choosing 'hash'; if we pick
> 'id' then the user might think that he can change the contents of the
> object while keeping the same id, if we pick 'hash' then it's obvious
> the 'id' is tied to the content and why.
> 

Apparently a branch of that thread touched the "[PATCH 0/2] Unify use of
[sha,SHA][,-]1", so I'll do a cc merge, feeling entitled to summarize
the latter:

- There are two SHA-1ish things we talk about: the SHA-1 hash
algorithm/function on the one hand and git object names on the other hand.

- The object name of a file is not the SHA-1 checksum of its contents:
That's more or less obvious because there are no files in git, only
objects. The object name is the SHA-1 of a representation of an object
(which, for blobs, consists of header + content).

- There seemed to be an implicit claim that the Doc uses SHA-1 for the
algorithm and sha1/SHA1 for the object name. That claim is not supported
by the facts (see below) and is not practical.

- The glossary defines SHA1 to be equivalent to the object name and does
not mention any other spelling.
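
The point about blobs above can be sketched in a few lines of Python.
This is an illustration only: the function names are made up for this
example and are not part of git. It assumes the blob representation is
the header "blob <size>" followed by a NUL byte and then the content.

```python
import hashlib


def git_blob_name(content: bytes) -> str:
    """Object name of a blob: SHA-1 over header + content."""
    # Header is the object type, a space, the content size in bytes, and a NUL.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def plain_sha1(content: bytes) -> str:
    """SHA-1 checksum of the raw content alone, for comparison."""
    return hashlib.sha1(content).hexdigest()


if __name__ == "__main__":
    data = b""
    print("object name:", git_blob_name(data))
    print("plain SHA-1:", plain_sha1(data))
```

Even for empty content the two values differ, because the header is
always part of what gets hashed.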

The stats (line counts for simplicity) and facts for Documentation/ are:

SHA-1: 56
Used exclusively for the object name.

SHA1: 73
Used mostly for the object name, but also for the patch-id (SHA-1
checksum of patch), in the tutorial, and pack-format, i.e. in places
where the actual hash algorithm/function is mentioned.

sha1: 102
Used all over the place, mostly for the object name and when quoting
from the source. I don't think it's used for the hash algorithm/function.

sha-1: 0
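
How the counts above were produced is not stated in this mail. One way
such per-spelling line counts could be gathered is sketched below. It
assumes a case-sensitive substring match and counts a line once per
spelling it contains; the function name and the sample lines are
invented for the example.

```python
SPELLINGS = ("SHA-1", "SHA1", "sha1", "sha-1")


def count_lines_by_spelling(lines):
    """Count, per spelling, the lines that contain it (case-sensitive)."""
    counts = {spelling: 0 for spelling in SPELLINGS}
    for line in lines:
        for spelling in SPELLINGS:
            if spelling in line:
                counts[spelling] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, not taken from Documentation/.
    sample = [
        "The SHA-1 of the commit is shown.",
        "Pass the SHA1 of the object.",
        "static unsigned char sha1[20];",
        "SHA1 and sha1 on one line.",
    ]
    for spelling, n in count_lines_by_spelling(sample).items():
        print(f"{spelling}: {n}")
```

Feeding it the lines of every file under Documentation/ would give
numbers of the same kind, though not necessarily the same values.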

So, the current confusion is mostly due to the fact that 3 different
names are used for the same thing (object name) and to a much lesser
degree to the fact that the same name (SHA1) is used for 2 different
things (hash algorithm/function vs. object name).

My patch tried to lessen the confusion by naming one thing by 1 name
only (SHA-1). It continued the tradition of identifying the object name
with the hash algorithm which is used in forming that name. I don't
think it matters much (confusion-wise) which one we choose from those 3,
it would be easy to rewrite the patch to use SHA1 or sha1 instead of
SHA-1 (and I'd be willing to), but consistently so.

An alternative patch would substitute most occurrences of the above by
X, X being the future term for "object name" to be agreed upon, and go
for, say, SHA-1 in the very few places where the actual algorithm is
mentioned. I just don't want to bet on that agreement and patch happening.

Michael

* Re: Lets avoid the SHA-1 term (was [doc] User Manual Suggestion)
From: Michael Witten @ 2009-04-27 13:02 UTC
  To: Björn Steinbrink
  Cc: Felipe Contreras, David Abrahams, Jeff King, Daniel Barkalow,
	Johan Herland, git, J. Bruce Fields

2009/4/26 Björn Steinbrink <B.Steinbrink@gmx.de>:
>
>> Do you agree that either 'id' or 'hash' would work fine?
>
> "object id" would work for me, but I'm fine with the existing "object
> name" as well. I don't like "object hash" (or "object hash id"), because
> it IMHO doesn't express that well that it's used to identify an object.

However, the SHA-1 hash is not actually essential to git. In the git
world, there is only content and every object is identified by its
content. Now, to identify an object, it would be pretty cumbersome to
have to write out the contents, so we abbreviate the contents with a
hash.

So, the hash or object name or object id or whatever you want to call
it isn't even an essential part of git. It is a convenience.

In that sense, I think that '[cryptographic] hash' is the right term,
because the others ("object name" and "object id") seem special. A
hash is not special. In fact, the documentation should read "For
convenience, the git tools refer to objects using the hash value of
their contents". You see? It's not essential.

* Re: Lets avoid the SHA-1 term (was [doc] User Manual Suggestion)
From: Björn Steinbrink @ 2009-05-02 15:37 UTC
  To: Michael Witten
  Cc: Felipe Contreras, David Abrahams, Jeff King, Daniel Barkalow,
	Johan Herland, git, J. Bruce Fields

On 2009.04.27 08:02:02 -0500, Michael Witten wrote:
> 2009/4/26 Björn Steinbrink <B.Steinbrink@gmx.de>:
> >
> >> Do you agree that either 'id' or 'hash' would work fine?
> >
> > "object id" would work for me, but I'm fine with the existing "object
> > name" as well. I don't like "object hash" (or "object hash id"), because
> > it IMHO doesn't express that well that it's used to identify an object.
> 
> However, the SHA-1 hash is not actually essential to git. In the git
> world, there is only content and every object is identified by its
> content. Now, to identify an object, it would be pretty cumbersome to
> have to write out the contents, so we abbreviate the contents with a
> hash.
> 
> So, the hash or object name or object id or whatever you want to call
> it isn't even an essential part of git. It is a convenience.
> 
> In that sense, I think that '[cryptographic] hash' is the right term,
> because the others ("object name" and "object id") seem special. A
> hash is not special. In fact, the documentation should read "For
> convenience, the git tools refer to objects using the hash value of
> their contents". You see? It's not essential.

"For convenience" means "To make it suck less for the user" to me. And
that's why you can use an abbreviated object name as long as it's unique.
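
A rough sketch of what "as long as it's unique" means in practice: an
abbreviation resolves only when exactly one known object name starts
with it. The helper below is hypothetical and is not how git itself is
implemented; the second sample name is made up.

```python
def resolve_abbrev(prefix, names):
    """Return the single full name starting with prefix, or raise."""
    matches = [name for name in names if name.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object name starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"abbreviation {prefix!r} is ambiguous")
    return matches[0]


if __name__ == "__main__":
    names = [
        "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
        "e6" + "0" * 38,  # made-up name sharing the first two digits
    ]
    print(resolve_abbrev("e69d", names))  # unique, so it resolves
    try:
        resolve_abbrev("e6", names)
    except ValueError as err:
        print(err)  # two names share this prefix
```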

That a hash is used isn't essential for the basic data model, where
commits reference trees which in turn reference other trees and blobs.
To understand that model, it's not essential to know that hashes are
used.  But it is essential that some kind of identifier other than the
whole content is used. Otherwise, the whole data model would make no
sense at all. If you use the whole content to identify an object, that
means that commits contain the whole trees which in turn contain the
whole other trees and the whole blobs. So you could as well just have
only commit objects, they contain everything anyway.

So the fact that we have the object name instead of the whole content
_is_ essential.
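
The argument above can be illustrated with a toy object store, in which
a tree refers to a blob by its name rather than by embedding the blob.
Everything here is a simplification made for the example: ToyStore and
object_name are invented names, and the one-line text format used for
the tree is not git's real tree encoding.

```python
import hashlib


def object_name(kind: str, payload: bytes) -> str:
    """Name an object by hashing a type/size header plus its payload."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


class ToyStore:
    """Maps object names to (kind, payload); not git's on-disk format."""

    def __init__(self):
        self.objects = {}

    def put(self, kind: str, payload: bytes) -> str:
        name = object_name(kind, payload)
        self.objects[name] = (kind, payload)
        return name


def make_tree(store: ToyStore, filename: str, blob_name: str) -> str:
    # The tree stores only the blob's name, never the blob's content.
    return store.put("tree", f"{filename} {blob_name}\n".encode("ascii"))


if __name__ == "__main__":
    store = ToyStore()
    small = store.put("blob", b"hello\n")
    large = store.put("blob", b"x" * 10000)
    tree_small = make_tree(store, "README", small)
    tree_large = make_tree(store, "README", large)
    # Both trees have the same payload size, whatever the blob size.
    print(len(store.objects[tree_small][1]), len(store.objects[tree_large][1]))
    # A different blob gives a different tree name.
    print(tree_small != tree_large)
```

The tree stays the same size no matter how large the blob is, which is
only possible because it holds an identifier and not the content.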

Björn
