* Re: Lets avoid the SHA-1 term (was [doc] User Manual Suggestion)
2009-04-26 23:38 Lets avoid the SHA-1 term (was [doc] User Manual Suggestion) Felipe Contreras
@ 2009-04-27 0:28 ` Björn Steinbrink
2009-04-27 13:02 ` Michael Witten
2009-04-27 12:06 ` Michael J Gruber
1 sibling, 1 reply; 5+ messages in thread
From: Björn Steinbrink @ 2009-04-27 0:28 UTC
To: Felipe Contreras
Cc: David Abrahams, Michael Witten, Jeff King, Daniel Barkalow,
Johan Herland, git, J. Bruce Fields
On 2009.04.27 02:38:40 +0300, Felipe Contreras wrote:
> 2009/4/27 Björn Steinbrink <B.Steinbrink@gmx.de>:
> > On 2009.04.24 20:48:57 -0400, David Abrahams wrote:
> >>
> >> On Apr 24, 2009, at 8:01 PM, Michael Witten wrote:
> >>
> >>>> What's wrong with just calling the object name "object name"?
> >>>
> >>> What's wrong with calling the object address "object address"?
> >>
> >> Neither captures the connection to the object's contents. I think
> >> "value ID" would be closer, but it's probably too horrible.
> >
> > I think I asked this in another mail, but I'm quite tired, so just to
> > make sure: What do you mean by "value"? I might be weird (I'm not a
> > native speaker, so I probably make funny and wrong connotations from
> > time to time), but while I can accept "content" to include the type and
> > size of the object, the term "value" makes me want to exclude those
> > pieces of meta data. So "value" somehow feels wrong to me, as the hash
> > covers those two fields.
>
> Just to summarize.
>
> Do you agree that SHA-1 is not the proper term to choose?
Yes, IMHO that's too strongly tied to the implementation. But a quick
grep run tells me that the "object name" area is probably not where you
need to get rid of that. The "object name" term is already used a lot.
If you want to ban SHA-1 then the rev-parse man page, describing the
"extended SHA1 syntax" would probably be a better place to start (unless
you want to "fix" everything at once).
> Do you agree that either 'id' or 'hash' would work fine?
"object id" would work for me, but I'm fine with the existing "object
name" as well. I don't like "object hash" (or "object hash id"), because
IMHO it doesn't convey well that it's used to identify an object.
> Personally I think there's an advantage of choosing 'hash'; if we pick
> 'id' then the user might think that he can change the contents of the
> object while keeping the same id, if we pick 'hash' then it's obvious
> the 'id' is tied to the content and why.
Heh, if you use "hash", there's no "id" tied to the content, there's
just the hash. SCNR ;-) See my other mails for why I think that "hash" isn't
that advantageous.
Björn
* Re: Lets avoid the SHA-1 term (was [doc] User Manual Suggestion)
From: Michael J Gruber @ 2009-04-27 12:06 UTC
To: Felipe Contreras
Cc: Björn Steinbrink, David Abrahams, Michael Witten, Jeff King,
Daniel Barkalow, Johan Herland, git, J. Bruce Fields,
Johannes Sixt, Wincent Colaiuta, Junio C Hamano, Dmitry Potapov
Felipe Contreras venit, vidit, dixit 27.04.2009 01:38:
> 2009/4/27 Björn Steinbrink <B.Steinbrink@gmx.de>:
>> On 2009.04.24 20:48:57 -0400, David Abrahams wrote:
>>>
>>> On Apr 24, 2009, at 8:01 PM, Michael Witten wrote:
>>>
>>>>> What's wrong with just calling the object name "object name"?
>>>>
>>>> What's wrong with calling the object address "object address"?
>>>
>>> Neither captures the connection to the object's contents. I think
>>> "value ID" would be closer, but it's probably too horrible.
>>
>> I think I asked this in another mail, but I'm quite tired, so just to
>> make sure: What do you mean by "value"? I might be weird (I'm not a
>> native speaker, so I probably make funny and wrong connotations from
>> time to time), but while I can accept "content" to include the type and
>> size of the object, the term "value" makes me want to exclude those
>> pieces of meta data. So "value" somehow feels wrong to me, as the hash
>> covers those two fields.
>
> Just to summarize.
>
> Do you agree that SHA-1 is not the proper term to choose?
>
> Do you agree that either 'id' or 'hash' would work fine?
>
> Personally I think there's an advantage of choosing 'hash'; if we pick
> 'id' then the user might think that he can change the contents of the
> object while keeping the same id, if we pick 'hash' then it's obvious
> the 'id' is tied to the content and why.
>
Apparently a branch of that thread touched the "[PATCH 0/2] Unify use of
[sha,SHA][,-]1", so I'll do a cc merge, feeling entitled to summarize
the latter:
- There are two SHA-1ish things we talk about: the SHA-1 hash
algorithm/function on the one hand and git object names on the other hand.
- The object name of a file is not the SHA-1 checksum of its contents:
That's more or less obvious because there are no files in git, only
objects. The object name is the SHA-1 of a representation of an object
(which, for blobs, consists of header + content).
- There seemed to be an implicit claim that the Doc uses SHA-1 for the
algorithm and sha1/SHA1 for the object name. That's not supported by the
facts (see below) and is not practical.
- The glossary defines SHA1 to be equivalent to the object name and does
not mention any other spelling.
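
To illustrate the second point above, here is a minimal sketch of how a blob's
object name differs from the plain SHA-1 checksum of the file contents. It
assumes the usual loose-object representation for SHA-1 repositories (a
"<type> <size>" header, a NUL byte, then the content). The function names
git_object_name and plain_sha1 are made up for this example and are not part
of git.

```python
import hashlib


def git_object_name(obj_type: str, content: bytes) -> str:
    """Object name: SHA-1 over header + content (hypothetical helper)."""
    # Assumed header layout: type, a space, the decimal size, then a NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def plain_sha1(content: bytes) -> str:
    """Plain SHA-1 checksum of the raw content, with no header."""
    return hashlib.sha1(content).hexdigest()


if __name__ == "__main__":
    data = b"hello world\n"
    # The two values differ because only the first one hashes the header.
    print("object name:", git_object_name("blob", data))
    print("plain SHA-1:", plain_sha1(data))
```

For the same bytes, the first value should match what "git hash-object" prints
for a file with that content, while the second is what a generic sha1sum tool
would report.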
The stats (line counts for simplicity) and facts for Documentation/ are:

SHA-1: 56
  Used exclusively for the object name.

SHA1: 73
  Used mostly for the object name, but also for the patch-id (SHA-1
  checksum of patch), in the tutorial, and pack-format, i.e. in places
  where the actual hash algorithm/function is mentioned.

sha1: 102
  Used all over the place, mostly for the object name and when quoting
  from the source. I don't think it's used for the hash algorithm/function.

sha-1: 0
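
Line counts of this kind could be gathered roughly as in the sketch below. How
the numbers above were actually produced is not stated in this mail, so the
matching rule here (case-sensitive substring match, one count per line per
spelling) is an assumption, and count_lines_per_spelling is a hypothetical
helper name.

```python
import os

# The four spellings compared in the stats above.
SPELLINGS = ("SHA-1", "SHA1", "sha1", "sha-1")


def count_lines_per_spelling(root: str) -> dict:
    """Count, per spelling, the lines under root that contain it."""
    counts = {spelling: 0 for spelling in SPELLINGS}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced so binary files do not abort the walk.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    for spelling in SPELLINGS:
                        # Case-sensitive match; a line counts once per spelling.
                        if spelling in line:
                            counts[spelling] += 1
    return counts


if __name__ == "__main__":
    # "Documentation" is the directory named in the mail; adjust as needed.
    for spelling, total in count_lines_per_spelling("Documentation").items():
        print(f"{spelling}: {total}")
```

A different matching rule (word boundaries, or skipping quoted source code)
would give different totals, so the output is only comparable to the figures
above under the stated assumption.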
So, the current confusion is mostly due to the fact that 3 different
names are used for the same thing (object name) and to a much lesser
degree to the fact that the same name (SHA1) is used for 2 different
things (hash algorithm/function vs. object name).
My patch tried to lessen the confusion by naming one thing by 1 name
only (SHA-1). It continued the tradition of identifying the object name
with the hash algorithm which is used in forming that name. I don't
think it matters much (confusion-wise) which one we choose from those 3.
It would be easy to rewrite the patch to use SHA1 or sha1 instead of
SHA-1 (and I'd be willing to), as long as it is used consistently.
An alternative patch would substitute most occurrences of the above by
X, X being the future term for "object name" to be agreed upon, and go
for, say, SHA-1 in the very few places where the actual algorithm is
mentioned. I just don't want to bet on that agreement and patch happening.
Michael