git.vger.kernel.org archive mirror
* [doc] User Manual Suggestion
@ 2009-04-22 19:38 David Abrahams
  2009-04-23 17:57 ` J. Bruce Fields
  0 siblings, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-22 19:38 UTC (permalink / raw)
  To: git


http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#how-to-check-out
covers "git reset" way too early, IMO, before one has the conceptual
foundation necessary to understand what it means to "modify the current
branch to point at v2.6.17".  If this operation must be covered this
early in the manual, it should probably not be until
http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#manipulating-branches

HTH,

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-22 19:38 [doc] User Manual Suggestion David Abrahams
@ 2009-04-23 17:57 ` J. Bruce Fields
  2009-04-23 18:37   ` Michael Witten
  0 siblings, 1 reply; 90+ messages in thread
From: J. Bruce Fields @ 2009-04-23 17:57 UTC (permalink / raw)
  To: David Abrahams; +Cc: git

On Wed, Apr 22, 2009 at 03:38:52PM -0400, David Abrahams wrote:
> 
> http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#how-to-check-out
> covers "git reset" way too early, IMO, before one has the conceptual
> foundation necessary to understand what it means to "modify the current
> branch to point at v2.6.17".  If this operation must be covered this
> early in the manual, it should probably not be until
> http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#manipulating-branches

I agree; we should suggest just a git-checkout (to a detached HEAD)
instead, though that needs a little explanation so people aren't scared
by the warning message it gives.
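A minimal sketch of the difference Bruce describes, assuming a toy in-memory model rather than git's real implementation. Branches and tags are just names that map to commit IDs, and HEAD either names a branch (the `ref: refs/heads/...` form that `.git/HEAD` uses) or holds a commit ID directly, which is the "detached" state. The `ToyRepo` class, its methods, and the commit IDs (`c1`, `c3`) are hypothetical; the working tree and index are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical model: refs map names to commit IDs; HEAD is symbolic or detached."""

    refs: dict = field(default_factory=dict)
    head: str = "ref: refs/heads/master"  # same text form as .git/HEAD on a branch

    def _branch(self):
        # Return the branch HEAD names, or None when HEAD is detached.
        return self.head[len("ref: "):] if self.head.startswith("ref: ") else None

    def current_commit(self) -> str:
        branch = self._branch()
        return self.refs[branch] if branch else self.head

    def checkout_detached(self, commit: str) -> None:
        # Roughly "git checkout v2.6.17": only HEAD changes; no branch ref moves.
        self.head = commit

    def reset(self, commit: str) -> None:
        # Roughly "git reset v2.6.17" on a branch: the branch ref itself is rewritten.
        branch = self._branch()
        if branch:
            self.refs[branch] = commit
        else:
            self.head = commit


start = {"refs/heads/master": "c3", "refs/tags/v2.6.17": "c1"}

a = ToyRepo(refs=dict(start))
a.checkout_detached(a.refs["refs/tags/v2.6.17"])
print(a.current_commit(), a.refs["refs/heads/master"])  # c1 c3

b = ToyRepo(refs=dict(start))
b.reset(b.refs["refs/tags/v2.6.17"])
print(b.current_commit(), b.refs["refs/heads/master"])  # c1 c1
```

Checking out the tag only moves HEAD, so `master` still points where it did; the reset rewrites `master` itself, which is the part that needs the conceptual groundwork David mentions.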

I also have a longstanding todo to experiment with rewriting the
beginning to use detached heads more and defer branch management till
later.

--b.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 17:57 ` J. Bruce Fields
@ 2009-04-23 18:37   ` Michael Witten
  2009-04-23 20:16     ` Jeff King
  2009-04-24  2:29     ` J. Bruce Fields
  0 siblings, 2 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-23 18:37 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Abrahams, git

On Thu, Apr 23, 2009 at 12:57, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Wed, Apr 22, 2009 at 03:38:52PM -0400, David Abrahams wrote:
>>
>> http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#how-to-check-out
>> covers "git reset" way too early, IMO, before one has the conceptual
>> foundation necessary to understand what it means to "modify the current
>> branch to point at v2.6.17".  If this operation must be covered this
>> early in the manual, it should probably not be until
>> http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#manipulating-branches
>
> I agree; we should suggest just a git-checkout (to a detached HEAD)
> instead, though that needs a little explanation so people aren't scared
> by the warning message it gives.

Everyone talks about "before one has the conceptual foundation
necessary to understand". Well, here's an idea: The git documentation
should start with the concepts!

Why don't the docs start out defining blobs and trees and the object
database and references into that database? The reason everything is
so confusing is that the understanding is brushed under the tutorial
rug. People need to learn how to think before they can effectively
learn to start doing.
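As a rough illustration of the "object database" Michael mentions, here is a minimal Python sketch. It assumes only the documented rule for object IDs: an object's ID is the SHA-1 of a `<type> <size>\0` header followed by its content, so identical content always gets the same ID. The `ToyObjectDB` class is hypothetical; its tree encoding handles only a flat directory of regular files (mode `100644`, ASCII names), and commits are left out entirely.

```python
import hashlib


class ToyObjectDB:
    """Hypothetical in-memory, content-addressed store for blobs and flat trees."""

    def __init__(self):
        self.objects = {}  # hex object ID -> (type, content bytes)

    def _store(self, obj_type: str, content: bytes) -> str:
        # The ID covers a "<type> <size>\0" header plus the content itself.
        header = f"{obj_type} {len(content)}\0".encode()
        oid = hashlib.sha1(header + content).hexdigest()
        self.objects[oid] = (obj_type, content)
        return oid

    def add_blob(self, data: bytes) -> str:
        # A blob is just file content: no name, no mode, no history.
        return self._store("blob", data)

    def add_flat_tree(self, files: dict) -> str:
        # files maps file name -> blob ID. Each entry is "<mode> <name>\0"
        # followed by the raw 20-byte ID, with entries sorted by name.
        body = b"".join(
            b"100644 " + name.encode() + b"\0" + bytes.fromhex(oid)
            for name, oid in sorted(files.items())
        )
        return self._store("tree", body)


db = ToyObjectDB()
readme = db.add_blob(b"test content\n")
root = db.add_flat_tree({"README": readme})
print(readme)  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Storing the same bytes twice yields the same ID; trees naming blobs, commits naming trees, and references naming commits all build on that one property.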

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 18:37   ` Michael Witten
@ 2009-04-23 20:16     ` Jeff King
  2009-04-23 20:45       ` Michael Witten
  2009-04-23 21:26       ` David Abrahams
  2009-04-24  2:29     ` J. Bruce Fields
  1 sibling, 2 replies; 90+ messages in thread
From: Jeff King @ 2009-04-23 20:16 UTC (permalink / raw)
  To: Michael Witten; +Cc: J. Bruce Fields, David Abrahams, git

On Thu, Apr 23, 2009 at 01:37:05PM -0500, Michael Witten wrote:

> Everyone talks about "before one has the conceptual foundation
> necessary to understand". Well, here's an idea: The git documentation
> should start with the concepts!
> 
> Why don't the docs start out defining blobs and trees and the object
> database and references into that database? The reason everything is
> so confusing is that the understanding is brushed under the tutorial
> rug. People need to learn how to think before they can effectively
> learn to start doing.

I agree with you, but not everyone does (and you can find prior debates
in the list archives). The user-manual is pretty "top down". There are
some "bottom-up" resources available, but I haven't seen one pointed to
as "definitive". I think it might actually be nice for there to be a
parallel to the user manual that follows the bottom-up approach, and
people could read the one that appeals most to them (or if they have a
lot of time on their hands, read both and hopefully it makes sense in
the middle ;) ).

But we would need somebody to volunteer to write it. I would be happy to
help out, but I'm too short on time at the moment to be the driving
force.

-Peff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 20:16     ` Jeff King
@ 2009-04-23 20:45       ` Michael Witten
  2009-04-23 21:31         ` David Abrahams
  2009-04-24 14:11         ` Jeff King
  2009-04-23 21:26       ` David Abrahams
  1 sibling, 2 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-23 20:45 UTC (permalink / raw)
  To: Jeff King; +Cc: J. Bruce Fields, David Abrahams, git

On Thu, Apr 23, 2009 at 15:16, Jeff King <peff@peff.net> wrote:
> On Thu, Apr 23, 2009 at 01:37:05PM -0500, Michael Witten wrote:
>
>> Everyone talks about "before one has the conceptual foundation
>> necessary to understand". Well, here's an idea: The git documentation
>> should start with the concepts!
>>
>> Why don't the docs start out defining blobs and trees and the object
>> database and references into that database? The reason everything is
>> so confusing is that the understanding is brushed under the tutorial
>> rug. People need to learn how to think before they can effectively
>> learn to start doing.
>
> I agree with you, but not everyone does (and you can find prior debates
> in the list archives). The user-manual is pretty "top down". There are
> some "bottom-up" resources available, but I haven't seen one pointed to
> as "definitive". I think it might actually be nice for there to be a
> parallel to the user manual that follows the bottom-up approach, and
> people could read the one that appeals most to them (or if they have a
> lot of time on their hands, read both and hopefully it makes sense in
> the middle ;) ).

I think the main problem, then, is that the tools have a UI that is
somewhere in the middle.

However, a discussion of blobs, trees, commits, objects, and
references isn't necessarily low-level. It seems to me that it is a
high-level understanding of the git world. Without those
*definitions*, people are left to their own wrong, inconsistent
thoughts.

The low-level stuff is HOW those concepts have been used in the
implementation of git: Where certain files are stored, how certain
bytes are organized in memory, what the underlying plumbing
tools are, etc. That's what's low-level.
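For contrast, the "where certain files are stored" level looks roughly like this. A minimal sketch, assuming loose (unpacked) objects only: the `<type> <size>\0` header plus content is zlib-compressed and stored under `.git/objects/`, with the first two hex digits of the ID as a directory name and the remaining 38 as the file name. The `loose_object_path` function is hypothetical and writes nothing; packfiles are not covered, and git's compressed bytes may differ from Python's even though both decompress to the same data.

```python
import hashlib
import os
import zlib


def loose_object_path(obj_type: str, content: bytes, git_dir: str = ".git"):
    """Return (path, compressed bytes) for a loose object; nothing is written."""
    raw = f"{obj_type} {len(content)}\0".encode() + content
    oid = hashlib.sha1(raw).hexdigest()
    # Two-digit fan-out directory, then the remaining 38 hex digits as the file name.
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    return path, zlib.compress(raw)


path, data = loose_object_path("blob", b"")
print(path)  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 on POSIX systems
```

None of this is needed to understand blobs or references, which is the point of keeping it out of the conceptual introduction.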

> But we would need somebody to volunteer to write it. I would be happy to
> help out, but I'm too short on time at the moment to be the driving
> force.

Maybe I'll try to write something, but it won't take place quickly,
either. I'd want to read ALL of the existing documentation first.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 20:16     ` Jeff King
  2009-04-23 20:45       ` Michael Witten
@ 2009-04-23 21:26       ` David Abrahams
  2009-04-23 22:51         ` Johan Herland
  1 sibling, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-23 21:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Witten, J. Bruce Fields, git


On Apr 23, 2009, at 4:16 PM, Jeff King wrote:

> On Thu, Apr 23, 2009 at 01:37:05PM -0500, Michael Witten wrote:
>
>> Everyone talks about "before one has the conceptual foundation
>> necessary to understand". Well, here's an idea: The git documentation
>> should start with the concepts!
>>
>> Why don't the docs start out defining blobs and trees and the object
>> database and references into that database? The reason everything is
>> so confusing is that the understanding is brushed under the tutorial
>> rug. People need to learn how to think before they can effectively
>> learn to start doing.
>
> I agree with you, but not everyone does (and you can find prior debates
> in the list archives). The user-manual is pretty "top down".

And that's a problem because so many things are badly named.  It also  
leaves out lots of top

> There are some "bottom-up" resources available, but I haven't seen one
> pointed to as "definitive".

I've been pointed at:

1. http://eagain.net/articles/git-for-computer-scientists
2. http://www.newartisans.com/2008/04/git-from-the-bottom-up.html

which, IMO, should be read in that order.  I've just sent John Wiegley  
a huge pile of editorial commentary on #2, which I think could improve  
things.

But that said, "laying conceptual foundation" doesn't imply bottom-up!
In fact, I don't think the first one is particularly bottom-up.

--
David Abrahams
BoostPro Computing
http://boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 20:45       ` Michael Witten
@ 2009-04-23 21:31         ` David Abrahams
  2009-04-24  0:31           ` Michael Witten
  2009-04-24 14:18           ` Jeff King
  2009-04-24 14:11         ` Jeff King
  1 sibling, 2 replies; 90+ messages in thread
From: David Abrahams @ 2009-04-23 21:31 UTC (permalink / raw)
  To: Michael Witten; +Cc: Jeff King, J. Bruce Fields, git


On Apr 23, 2009, at 4:45 PM, Michael Witten wrote:

> On Thu, Apr 23, 2009 at 15:16, Jeff King <peff@peff.net> wrote:
>> On Thu, Apr 23, 2009 at 01:37:05PM -0500, Michael Witten wrote:
>>
>>> Everyone talks about "before one has the conceptual foundation
>>> necessary to understand". Well, here's an idea: The git documentation
>>> should start with the concepts!
>>>
>>> Why don't the docs start out defining blobs and trees and the object
>>> database and references into that database? The reason everything is
>>> so confusing is that the understanding is brushed under the tutorial
>>> rug. People need to learn how to think before they can effectively
>>> learn to start doing.
>>
>> I agree with you, but not everyone does (and you can find prior debates
>> in the list archives). The user-manual is pretty "top down". There are
>> some "bottom-up" resources available, but I haven't seen one pointed to
>> as "definitive". I think it might actually be nice for there to be a
>> parallel to the user manual that follows the bottom-up approach, and
>> people could read the one that appeals most to them (or if they have a
>> lot of time on their hands, read both and hopefully it makes sense in
>> the middle ;) ).
>
> I think the main problem, then, is that the tools have a UI that is
> somewhere in the middle.

Well, "the UI" (how many do we really have for Git?) is spread across
the spectrum.  The git command-line alone lets you do incredibly
low-level things that "nobody should ever do" and some really
high-level things that are everyone's bread-and-butter.  There's no
obvious distinction.

> However, a discussion of blobs, trees, commits, objects, and
> references isn't necessarily low-level. It seems to me that it is a
> high-level understanding of the git world. Without those
> *definitions*, people are left to their own wrong, inconsistent
> thoughts.

1000% agreed.

> The low-level stuff is HOW those concepts have been used in the
> implementation of git: Where certain files are stored, how certain
> bytes are organized in memory, what the underlying plumbing
> tools are, etc. That's what's low-level.

Yep

>> But we would need somebody to volunteer to write it. I would be happy to
>> help out, but I'm too short on time at the moment to be the driving
>> force.
>
> Maybe I'll try to write something, but it won't take place quickly,
> either. I'd want to read ALL of the existing documentation first.

See you in a couple years ;-)

--
David Abrahams
BoostPro Computing
http://boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 21:26       ` David Abrahams
@ 2009-04-23 22:51         ` Johan Herland
  2009-04-24  0:30           ` Michael Witten
  0 siblings, 1 reply; 90+ messages in thread
From: Johan Herland @ 2009-04-23 22:51 UTC (permalink / raw)
  To: git; +Cc: David Abrahams, Jeff King, Michael Witten, J. Bruce Fields

On Thursday 23 April 2009, David Abrahams wrote:
> On Apr 23, 2009, at 4:16 PM, Jeff King wrote:
> > There are some "bottom-up" resources available, but I haven't seen one
> > pointed to as "definitive".
> I've been pointed at:
>
> 1. http://eagain.net/articles/git-for-computer-scientists
> 2. http://www.newartisans.com/2008/04/git-from-the-bottom-up.html

There's also http://www.eecs.harvard.edu/~cduan/technical/git/ which I think 
is a great bottom-up introduction:
- not too heavy on the concepts
- shows how the concepts relate to common git commands
- short enough to be covered in just 1-2 sessions.

In fact, I'm loosely planning a presentation on Git (for $dayjob), and I'm 
probably going to base it on this introduction.


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 22:51         ` Johan Herland
@ 2009-04-24  0:30           ` Michael Witten
  2009-04-24 20:30             ` Johan Herland
  0 siblings, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-04-24  0:30 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, David Abrahams, Jeff King, J. Bruce Fields

On Thu, Apr 23, 2009 at 17:51, Johan Herland <johan@herland.net> wrote:
> On Thursday 23 April 2009, David Abrahams wrote:
>> On Apr 23, 2009, at 4:16 PM, Jeff King wrote:
>> > There are some "bottom-up" resources available, but I haven't seen one
>> > pointed to as "definitive".
>> I've been pointed at:
>>
>> 1. http://eagain.net/articles/git-for-computer-scientists
>> 2. http://www.newartisans.com/2008/04/git-from-the-bottom-up.html
>
> There's also http://www.eecs.harvard.edu/~cduan/technical/git/ which I think
> is a great bottom-up introduction:
> - not too heavy on the concepts

I really don't understand this mentality. Concepts are the only things
that are important. From concepts falls all else.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 21:31         ` David Abrahams
@ 2009-04-24  0:31           ` Michael Witten
  2009-04-24 14:18           ` Jeff King
  1 sibling, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-24  0:31 UTC (permalink / raw)
  To: David Abrahams; +Cc: Jeff King, J. Bruce Fields, git

On Thu, Apr 23, 2009 at 16:31, David Abrahams <dave@boostpro.com> wrote:
>
>
>> However, a discussion of blobs, trees, commits, objects, and
>> references isn't necessarily low-level. It seems to me that it is a
>> high-level understanding of the git world. Without those
>> *definitions*, people are left to their own wrong, inconsistent
>> thoughts.
>
> 1000% agreed.

I think this is a case in point:

    http://marc.info/?l=git&m=124052299832318&w=2

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 18:37   ` Michael Witten
  2009-04-23 20:16     ` Jeff King
@ 2009-04-24  2:29     ` J. Bruce Fields
  2009-04-24  2:34       ` Michael Witten
  2009-04-24  4:06       ` David Abrahams
  1 sibling, 2 replies; 90+ messages in thread
From: J. Bruce Fields @ 2009-04-24  2:29 UTC (permalink / raw)
  To: Michael Witten; +Cc: David Abrahams, git

On Thu, Apr 23, 2009 at 01:37:05PM -0500, Michael Witten wrote:
> On Thu, Apr 23, 2009 at 12:57, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Wed, Apr 22, 2009 at 03:38:52PM -0400, David Abrahams wrote:
> >>
> >> http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#how-to-check-out
> >> covers "git reset" way too early, IMO, before one has the conceptual
> >> foundation necessary to understand what it means to "modify the current
> >> branch to point at v2.6.17".  If this operation must be covered this
> >> early in the manual, it should probably not be until
> >> http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#manipulating-branches
> >
> > I agree; we should suggest just a git-checkout (to a detached HEAD)
> > instead, though that needs a little explanation so people aren't scared
> > by the warning message it gives.
> 
> Everyone talks about "before one has the conceptual foundation
> necessary to understand". Well, here's an idea: The git documentation
> should start with the concepts!
> 
> Why don't the docs start out defining blobs and trees and the object
> database and references into that database? The reason everything is
> so confusing is that the understanding is brushed under the tutorial
> rug. People need to learn how to think before they can effectively
> learn to start doing.

OK, but let's not over-generalize: the person that just wants to figure
out whether the driver for their network card was fixed in today's
network devel tree shouldn't have to sit through a discussion of the
object database.  And even among readers that are in it for the long
haul, I think many people will react better to something that gives them
at least a little concrete how-to information up front.

So the goal was always to find a tutorial route through the material
that would allow us to introduce the concepts as we go along.

And I agree that I haven't succeeded at that--patches welcomed,
including patches that, say, move more of the current chapter 7 to an
earlier place.  (But this has to be done carefully, and I'd still rather
it not be the *very* first thing.)

I've unfortunately had a lot less time to work on this, but am happy to
at least help review patches.

--b.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24  2:29     ` J. Bruce Fields
@ 2009-04-24  2:34       ` Michael Witten
  2009-04-24  4:06       ` David Abrahams
  1 sibling, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-24  2:34 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Abrahams, git

On Thu, Apr 23, 2009 at 21:29, J. Bruce Fields <bfields@fieldses.org> wrote:
> OK, but let's not over-generalize: the person that just wants to figure
> out whether the driver for their network card was fixed in today's
> network devel tree shouldn't have to sit through a discussion of the
> object database.  And even among readers that are in it for the long
> haul, I think many people will react better to something that gives them
> at least a little concrete how-to information up front.

A quick shell synopsis is probably what you want then. Beyond that,
casual users should be ignored; quick instructions are usually
provided by each project anyway.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24  2:29     ` J. Bruce Fields
  2009-04-24  2:34       ` Michael Witten
@ 2009-04-24  4:06       ` David Abrahams
  2009-04-24 14:10         ` J. Bruce Fields
  1 sibling, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-24  4:06 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Michael Witten, git


On Apr 23, 2009, at 10:29 PM, J. Bruce Fields wrote:

> On Thu, Apr 23, 2009 at 01:37:05PM -0500, Michael Witten wrote:
>> On Thu, Apr 23, 2009 at 12:57, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> On Wed, Apr 22, 2009 at 03:38:52PM -0400, David Abrahams wrote:
>>>>
>>>> http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#how-to-check-out
>>>> covers "git reset" way too early, IMO, before one has the conceptual
>>>> foundation necessary to understand what it means to "modify the current
>>>> branch to point at v2.6.17".  If this operation must be covered this
>>>> early in the manual, it should probably not be until
>>>> http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#manipulating-branches
>>>
>>> I agree; we should suggest just a git-checkout (to a detached HEAD)
>>> instead, though that needs a little explanation so people aren't scared
>>> by the warning message it gives.
>>
>> Everyone talks about "before one has the conceptual foundation
>> necessary to understand". Well, here's an idea: The git documentation
>> should start with the concepts!
>>
>> Why don't the docs start out defining blobs and trees and the object
>> database and references into that database? The reason everything is
>> so confusing is that the understanding is brushed under the tutorial
>> rug. People need to learn how to think before they can effectively
>> learn to start doing.
>
> OK, but let's not over-generalize: the person that just wants to figure
> out whether the driver for their network card was fixed in today's
> network devel tree shouldn't have to sit through a discussion of the
> object database.

Those people don't need a VCS.  They should download a snapshot or use  
a web interface.  Seriously.  There's no way you can make even the  
best-designed VCS simple enough to justify the time it takes to learn  
enough just to use it for that.

> And even among readers that are in it for the long
> haul, I think many people will react better to something that gives them
> at least a little concrete how-to information up front.

People (well, people like me) should get a brief "hello, world" demo  
up front, to give them a feel for the flavor of the system, but  
[important:] it shouldn't attempt to be instructive.  Fundamental  
concepts are next.  How-to information can come after that, or after  
the reference information.

> So the goal was always to find a tutorial route through the material
> that would allow us to introduce the concepts as we go along.

Maybe that will work for some people, but it *really* won't work for  
me.  You can't start throwing around terms of art without defining  
them unless you want to raise more questions than you're answering.  I  
would be surprised if it wasn't the same for many tech people.


--
David Abrahams
BoostPro Computing
http://boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24  4:06       ` David Abrahams
@ 2009-04-24 14:10         ` J. Bruce Fields
  0 siblings, 0 replies; 90+ messages in thread
From: J. Bruce Fields @ 2009-04-24 14:10 UTC (permalink / raw)
  To: David Abrahams; +Cc: Michael Witten, git

On Fri, Apr 24, 2009 at 12:06:12AM -0400, David Abrahams wrote:
>
> On Apr 23, 2009, at 10:29 PM, J. Bruce Fields wrote:
>
>> On Thu, Apr 23, 2009 at 01:37:05PM -0500, Michael Witten wrote:
>>> On Thu, Apr 23, 2009 at 12:57, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> Why don't the docs start out defining blobs and trees and the object
>>> database and references into that database? The reason everything is
>>> so confusing is that the understanding is brushed under the tutorial
>>> rug. People need to learn how to think before they can effectively
>>> learn to start doing.
>>
>> OK, but let's not over-generalize: the person that just wants to figure
>> out whether the driver for their network card was fixed in today's
>> network devel tree shouldn't have to sit through a discussion of the
>> object database.
>
> Those people don't need a VCS.  They should download a snapshot or use a 
> web interface.  Seriously.  There's no way you can make even the  
> best-designed VCS simple enough to justify the time it takes to learn  
> enough just to use it for that.
>
>> And even among readers that are in it for the long
>> haul, I think many people will react better to something that gives them
>> at least a little concrete how-to information up front.
>
> People (well, people like me) should get a brief "hello, world" demo up 
> front, to give them a feel for the flavor of the system, but  
> [important:] it shouldn't attempt to be instructive.  Fundamental  
> concepts are next.  How-to information can come after that, or after the 
> reference information.
>
>> So the goal was always to find a tutorial route through the material
>> that would allow us to introduce the concepts as we go along.
>
> Maybe that will work for some people, but it *really* won't work for me.  
> You can't start throwing around terms of art without defining them unless 
> you want to raise more questions than you're answering.  I would be 
> surprised if it wasn't the same for many tech people.

I agree that (with rare exceptions) terms shouldn't be used before
they're defined.  I don't agree with all of the above, but I think we
could come to a satisfactory compromise.  I'll see if I can find a few
hours this weekend to at least sketch a new organization.  But, as I've
said, I'm short on time and could really use some help.

--b.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 20:45       ` Michael Witten
  2009-04-23 21:31         ` David Abrahams
@ 2009-04-24 14:11         ` Jeff King
  2009-04-24 14:30           ` Michael Witten
  1 sibling, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-24 14:11 UTC (permalink / raw)
  To: Michael Witten; +Cc: J. Bruce Fields, David Abrahams, git

On Thu, Apr 23, 2009 at 03:45:46PM -0500, Michael Witten wrote:

> However, a discussion of blobs, trees, commits, objects, and
> references isn't necessarily low-level. It seems to me that it is a
> high-level understanding of the git world. Without those
> *definitions*, people are left to their own wrong, inconsistent
> thoughts.
> 
> The low-level stuff is HOW those concepts have been used in the
> implementation of git: Where certain files are stored, how certain
> bytes are organized in memory, what the underlying plumbing
> tools are, etc. That's what's low-level.

I think I wasn't clear in my original message. I didn't mean teaching
low-level stuff like plumbing or file layouts. By "bottom-up" I really
meant teaching concepts (like objects, their types, and references),
from which user operations and workflows can be explained (or often
deduced by the user). Whereas a top-down approach would _start_ with
workflows and say "To accomplish X, do Y".

So I think we are in agreement about the right "level" to start at.

-Peff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-23 21:31         ` David Abrahams
  2009-04-24  0:31           ` Michael Witten
@ 2009-04-24 14:18           ` Jeff King
  2009-04-24 14:20             ` J. Bruce Fields
  2009-04-24 17:28             ` David Abrahams
  1 sibling, 2 replies; 90+ messages in thread
From: Jeff King @ 2009-04-24 14:18 UTC (permalink / raw)
  To: David Abrahams; +Cc: Michael Witten, J. Bruce Fields, git

On Thu, Apr 23, 2009 at 05:31:13PM -0400, David Abrahams wrote:

>> I think the main problem, then, is that the tools have a UI that is
>> somewhere in the middle.
>
> Well, "the UI" (how many do we really have for Git?) is spread across the 
> spectrum.  The git command-line alone lets you do incredibly low-level 
> things that "nobody should ever do" and some really high-level things that 
> are everyone's bread-and-butter.  There's no obvious distinction.

I think this is a bit better than it used to be. Plumbing commands are
mostly hidden outside of the user's PATH. Unfortunately there are still
some warts, like the fact that users may be referred to "git help
rev-parse" to learn about how revisions are specified. But they have to
wade through the information on the "rev-parse" command, which is
something that most users will never need to know or care about.

A lot of that is historical baggage. The original git was not a VCS but
rather a _toolkit_ for building a VCS. So the natural place for talking
about parsing revisions was rev-parse, because that was the only way to
access the revision parsing code. :)

I think a lot of documentation like the "specifying revisions" section
of rev-parse might benefit from being split into its own "concept"
section, like gitrevisions(7). And commands which allow specifying
revisions (at least the major ones, like log, diff, etc) should
reference it (but not include it directly, as we do with some
documentation snippets -- the point is to make the user aware that they
are learning a separate concept that can be applied in multiple places,
and to give that concept a name).
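To make the "specifying revisions" concept concrete, here is a minimal sketch of two suffixes that a concept page like gitrevisions(7) would cover, assuming a toy commit graph rather than a real repository: `<rev>^<n>` selects the nth parent (a bare `^` means the first, `^0` the commit itself) and `<rev>~<n>` follows first parents n times. The `resolve` function, the graph, and the commit names are hypothetical; the many other forms (reflog entries, ranges, `:path`, and so on) are not handled.

```python
import re


def resolve(rev: str, refs: dict, parents: dict) -> str:
    """Resolve "name", "name^", "name^N" and "name~N" over a toy commit graph.

    refs maps names to commit IDs; parents maps a commit ID to its ordered parents.
    A missing parent raises KeyError or IndexError.
    """
    base = re.match(r"[^~^]+", rev)
    rest = rev[base.end():] if base else ""
    if not base or not re.fullmatch(r"(?:[~^]\d*)*", rest):
        raise ValueError(f"unsupported revision: {rev!r}")
    commit = refs.get(base.group(0), base.group(0))  # fall back to a raw commit ID
    for op, digits in re.findall(r"([~^])(\d*)", rest):
        n = int(digits) if digits else 1
        if op == "^":
            if n:  # ^N picks the Nth parent; ^0 leaves the commit unchanged
                commit = parents[commit][n - 1]
        else:
            for _ in range(n):  # ~N walks back N first-parent steps
                commit = parents[commit][0]
    return commit


# c4 is a merge of c3 (first parent) and m1 (second parent).
parents = {"c4": ["c3", "m1"], "c3": ["c2"], "c2": ["c1"], "m1": ["c1"], "c1": []}
refs = {"master": "c4"}
for rev in ["master", "master^", "master^2", "master~2", "master^2~1"]:
    print(rev, "->", resolve(rev, refs, parents))
```

The same resolution rules apply wherever a command accepts a revision, which is the argument for documenting them once, as a named concept, rather than under rev-parse.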

-Peff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 14:18           ` Jeff King
@ 2009-04-24 14:20             ` J. Bruce Fields
  2009-04-24 17:28             ` David Abrahams
  1 sibling, 0 replies; 90+ messages in thread
From: J. Bruce Fields @ 2009-04-24 14:20 UTC (permalink / raw)
  To: Jeff King; +Cc: David Abrahams, Michael Witten, git

On Fri, Apr 24, 2009 at 10:18:47AM -0400, Jeff King wrote:
> On Thu, Apr 23, 2009 at 05:31:13PM -0400, David Abrahams wrote:
> 
> >> I think the main problem, then, is that the tools have a UI that is
> >> somewhere in the middle.
> >
> > Well, "the UI" (how many do we really have for Git?) is spread across the 
> > spectrum.  The git command-line alone lets you do incredibly low-level 
> > things that "nobody should ever do" and some really high-level things that 
> > are everyone's bread-and-butter.  There's no obvious distinction.
> 
> I think this is a bit better than it used to be. Plumbing commands are
> mostly hidden outside of the user's PATH. Unfortunately there are still
> some warts, like the fact that users may be referred to "git help
> rev-parse" to learn about how revisions are specified. But they have to
> wade through the information on the "rev-parse" command, which is
> something that most users will never need to know or care about.
> 
> A lot of that is historical baggage. The original git was not a VCS but
> rather a _toolkit_ for building a VCS. So the natural place for talking
> about parsing revisions was rev-parse, because that was the only way to
> access the revision parsing code. :)
> 
> I think a lot of documentation like the "specifying revisions" section
> of rev-parse might benefit from being split into its own "concept"
> section, like gitrevisions(7). And commands which allow specifying
> revisions (at least the major ones, like log, diff, etc) should
> reference it (but not include it directly, as we do with some
> documentation snippets -- the point is to make the user aware that they
> are learning a separate concept that can be applied in multiple places,
> and to give that concept a name).

I'd be in favor of that.

--b.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 14:11         ` Jeff King
@ 2009-04-24 14:30           ` Michael Witten
  2009-04-24 14:33             ` Michael Witten
  2009-04-24 15:04             ` Jeff King
  0 siblings, 2 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-24 14:30 UTC (permalink / raw)
  To: Jeff King; +Cc: J. Bruce Fields, David Abrahams, git

On Fri, Apr 24, 2009 at 09:11, Jeff King <peff@peff.net> wrote:
> I think I wasn't clear in my original message. I didn't mean teaching
> low-level stuff like plumbing or file layouts. By "bottom-up" I really
> meant teaching concepts (like objects, their types, and references),
> from which user operations and workflows can be explained (or often
> deduced by the user). Whereas a top-down approach would _start_ with
> workflows and say "To accomplish X, do Y".

I knew you would make exactly this rebuttal ;-D

However, notice that you can't reasonably be expected to understand
"accomplish X" without having concepts like objects and references.
The reason most people get by is that git's operation can be
compatible with a number of other theories people might have already
picked up from using computers. The trouble starts when their existing
theories don't mesh well with the underlying git theory, leading the
user to develop the equivalent of epicycles in order to explain to
himself what's going on.

Basically, the problem is that the documentation currently caters to
people who just want to download source files (as Bruce basically
said); a quick shell synopsis for this is fine, but there needs to be
documentation solely devoted to understanding git fully and precisely.

* Re: [doc] User Manual Suggestion
  2009-04-24 14:30           ` Michael Witten
@ 2009-04-24 14:33             ` Michael Witten
  2009-04-24 15:04             ` Jeff King
  1 sibling, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-24 14:33 UTC
  To: Jeff King; +Cc: J. Bruce Fields, David Abrahams, git

On Fri, Apr 24, 2009 at 09:30, Michael Witten <mfwitten@gmail.com> wrote:
> there needs to be
> documentation solely devoted to understanding git fully and precisely.

A user should be able to read it from top to bottom in one pass, with no
jumping around or later clarifications.

* Re: [doc] User Manual Suggestion
  2009-04-24 14:30           ` Michael Witten
  2009-04-24 14:33             ` Michael Witten
@ 2009-04-24 15:04             ` Jeff King
  2009-04-24 15:18               ` Michael Witten
  1 sibling, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-24 15:04 UTC
  To: Michael Witten; +Cc: J. Bruce Fields, David Abrahams, git

On Fri, Apr 24, 2009 at 09:30:20AM -0500, Michael Witten wrote:

> On Fri, Apr 24, 2009 at 09:11, Jeff King <peff@peff.net> wrote:
> > I think I wasn't clear in my original message. I didn't mean teaching
> > low-level stuff like plumbing or file layouts. By "bottom-up" I really
> > meant teaching concepts (like objects, their types, and references),
> > from which user operations and workflows can be explained (or often
> > deduced by the user). Whereas a top-down approach would _start_ with
> > workflows and say "To accomplish X, do Y".
> 
> I knew you would make exactly this rebuttal ;-D
> 
> However, notice that you can't reasonably be expected to understand
> "accomplish X" without having concepts like objects and references.

Heh. I don't think you also predicted the paragraph that I ended up
deleting, which made it more clear that I was not trying to rebut, but
rather agree.

Like you, I think that not teaching concepts first leads to confusion
later.  Version control (or at least git) is just complex enough that
you are much better off understanding what is happening than simply
following a recipe. So when your recipe doesn't go as planned, or you
don't know which recipe to use, or you need some variant of a recipe,
you have some basis for understanding what to do.

But users in the past have really seemed to want to start with recipes,
so that they can be productive as soon as possible (and I think some
people have said that the top-down ordering just makes more sense to
them, so it may just be a matter of learning style). And I think the
user manual is somewhat of a response to that request, since the
command manpages are very bottom-up (but are also quite confusing, just
because of their size, and because concept information is scattered
throughout).

So I am advocating for more bottom-up documentation (which I think you
are), but I don't necessarily think it should _replace_ the top-down
documentation (which I'm not sure is your position or not).

> The reason most people get by is that git's operation can be
> compatible with a number of other theories people might have already
> picked up from using computers. The trouble starts when their existing
> theories don't mesh well with the underlying git theory, leading the
> user to develop the equivalent of epicycles in order to explain to
> himself what's going on.

Epicycles? I thought commit orbits were defined by the ether through
which they flowed.

-Peff

* Re: [doc] User Manual Suggestion
  2009-04-24 15:04             ` Jeff King
@ 2009-04-24 15:18               ` Michael Witten
  2009-04-24 17:38                 ` J. Bruce Fields
  0 siblings, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-04-24 15:18 UTC
  To: Jeff King; +Cc: J. Bruce Fields, David Abrahams, git

On Fri, Apr 24, 2009 at 10:04, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 24, 2009 at 09:30:20AM -0500, Michael Witten wrote:
>
>> On Fri, Apr 24, 2009 at 09:11, Jeff King <peff@peff.net> wrote:
>> > I think I wasn't clear in my original message. I didn't mean teaching
>> > low-level stuff like plumbing or file layouts. By "bottom-up" I really
>> > meant teaching concepts (like objects, their types, and references),
>> > from which user operations and workflows can be explained (or often
>> > deduced by the user). Whereas a top-down approach would _start_ with
>> > workflows and say "To accomplish X, do Y".
>>
>> I knew you would make exactly this rebuttal ;-D
>>
>> However, notice that you can't reasonably be expected to understand
>> "accomplish X" without having concepts like objects and references.
>
> Heh. I don't think you also predicted the paragraph that I ended up
> deleting, which made it more clear that I was not trying to rebut, but
> rather agree.

Indeed. I saw that last sentence of yours, but I consciously ignored
it, because I like to argue ;-)

> Like you, I think that not teaching concepts first leads to confusion
> later.  Version control (or at least git) is just complex enough that
> you are much better off understanding what is happening than simply
> following a recipe. So when your recipe doesn't go as planned, or you
> don't know which recipe to use, or you need some variant of a recipe,
> you have some basis for understanding what to do.

That, my friend, is the most important lesson of learning.

> But users in the past have really seemed to want to start with recipes,
> so that they can be productive as soon as possible (and I think some
> people have said that the top-down ordering just makes more sense to
> them, so it may just be a matter of learning style). And I think the
> user manual is somewhat of a response to that request, since the
> command manpages are very bottom-up (but are also quite confusing, just
> because of their size, and because concept information is scattered
> throughout).
>
> So I am advocating for more bottom-up documentation (which I think you
> are), but I don't necessarily think it should _replace_ the top-down
> documentation (which I'm not sure is your position or not).

I think that we've already got that tutorial-esque style covered (I
haven't read it in a while):

    http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html

However, the User Manual should make a Mathematician happy.

>> The reason most people get by is that git's operation can be
>> compatible with a number of other theories people might have already
>> picked up from using computers. The trouble starts when their existing
>> theories don't mesh well with the underlying git theory, leading the
>> user to develop the equivalent of epicycles in order to explain to
>> himself what's going on.
>
> Epicycles? I thought commit orbits were defined by the ether through
> which they flowed.

Actually, those commit orbits are defined by the giant glass sphere to
which they are attached.

* Re: [doc] User Manual Suggestion
  2009-04-24 14:18           ` Jeff King
  2009-04-24 14:20             ` J. Bruce Fields
@ 2009-04-24 17:28             ` David Abrahams
  2009-04-24 18:15               ` Jeff King
  1 sibling, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-24 17:28 UTC
  To: Jeff King; +Cc: Michael Witten, J. Bruce Fields, git


On Apr 24, 2009, at 10:18 AM, Jeff King wrote:

> On Thu, Apr 23, 2009 at 05:31:13PM -0400, David Abrahams wrote:
>
>>> I think the main problem, then, is that the tools have a UI that is
>>> somewhere in the middle.
>>
>> Well, "the UI" (how many do we really have for Git?) is spread across the
>> spectrum.  The git command-line alone lets you do incredibly low-level
>> things that "nobody should ever do" and some really high-level things that
>> are everyone's bread-and-butter.  There's no obvious distinction.
>
> I think this is a bit better than it used to be. Plumbing commands are
> mostly hidden outside of the user's PATH.

Huh?

git hash-object
git cat-file -t ...
git ls-tree
git rev-parse
git write-tree
git commit-tree

   ...

These are just some of the ones I learned about by reading John  
Wiegley's "Git From the Bottom Up."

Maybe I'm wrong about rev-parse, but for the most part, having all
these low-level commands available through the same executable that's
used for "git add," "git merge," "git commit," et al. makes the whole
shebang hard to approach.  It would be better for users if the
low-level stuff was accessed some other way.

> A lot of that is historical baggage. The original git was not a VCS but
> rather a _toolkit_ for building a VCS. So the natural place for talking
> about parsing revisions was rev-parse, because that was the only way to
> access the revision parsing code. :)

I understand that, but it doesn't change the present reality.

> I think a lot of documentation like the "specifying revisions" section
> of rev-parse might benefit from being split into its own "concept"
> section, like gitrevisions(7).

Yes, please.


[excuse me, but what the #@&*! is "porcelainish" supposed to mean?
(http://www.kernel.org/pub/software/scm/git/docs/git-rev-parse.html)]

> And commands which allow specifying
> revisions (at least the major ones, like log, diff, etc) should
> reference it (but not include it directly, as we do with some
> documentation snippets -- the point is to make the user aware that they
> are learning a separate concept that can be applied in multiple places,
> and to give that concept a name).


Very nice.

--
David Abrahams
BoostPro Computing
http://boostpro.com

* Re: [doc] User Manual Suggestion
  2009-04-24 15:18               ` Michael Witten
@ 2009-04-24 17:38                 ` J. Bruce Fields
  2009-04-24 18:27                   ` Jeff King
                                     ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: J. Bruce Fields @ 2009-04-24 17:38 UTC
  To: Michael Witten; +Cc: Jeff King, David Abrahams, git

On Fri, Apr 24, 2009 at 10:18:15AM -0500, Michael Witten wrote:
> I think that we've already got that tutorial-esque style covered (I
> haven't read it in a while):
> 
>     http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html
> 
> However, the User Manual should make a Mathematician happy.

I'm all for making mathematicians happy.  But, again, help?:

	- Specific examples?
	- Patches?  Please, patches?
	- Suggested text?
	- Suggested outline?

There's no shortage of high-level ideas.  What there's always a need for
more of is people willing to submit patches, respond to review, etc.

--b.

* Re: [doc] User Manual Suggestion
  2009-04-24 17:28             ` David Abrahams
@ 2009-04-24 18:15               ` Jeff King
  2009-04-24 19:00                 ` David Abrahams
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-24 18:15 UTC
  To: David Abrahams; +Cc: Michael Witten, J. Bruce Fields, git

On Fri, Apr 24, 2009 at 01:28:35PM -0400, David Abrahams wrote:

>> I think this is a bit better than it used to be. Plumbing commands are
>> mostly hidden outside of the user's PATH.
>
> Huh?
>
> git hash-object
> git cat-file -t ...
> git ls-tree
> git rev-parse
> git write-tree
> git commit-tree

How did you find out about them? They are not in your PATH, so shell
completion doesn't find them. They are not in the programmable bash
completion. They are not in the short command list git gives you when
you type "git help" or "git" without arguments.

So you must have read about them somewhere...

> These are just some of the ones I learned about by reading John Wiegley's 
> "Git From the Bottom Up."

...like here. So if that document gave you the impression that those are
part of an everyday git workflow, then I think the document is at fault,
not git itself.

I admit I haven't read "Git From the Bottom Up" carefully, but I think
what Michael is proposing would probably start a little higher from the
bottom than that document. You can give the concepts of the object
types, show them in pretty-printed form with "git show", and not worry
about telling the user "this is how 'git commit' could be implemented in
terms of primitive operations". And then you can avoid most of the
low-level commands entirely.

> Maybe I'm wrong about rev-parse, but for the most part, having all these 
> low-level commands available through the same executable that's used for 
> "git add," "git merge," "git commit," et al. makes the whole shebang hard 
> to approach.  It would be better for users if the low-level stuff was 
> accessed some other way.

Perhaps. The general approach is to make those commands accessible as
"git foo", but not to _advertise_ them in the same way as the porcelain
commands. The idea was to give a uniform calling convention without
unnecessarily confusing users by presenting a large number of
infrequently-used commands.

At any rate, it is too late to change the calling convention for
plumbing. The whole point of them is to be a stable interface for
scripting. Changing them to "git low-level rev-parse" (if it was even
something that we wanted to do, which I don't think it is) would break
everyone's scripts.

>> A lot of that is historical baggage. The original git was not a VCS but
>> rather a _toolkit_ for building a VCS. So the natural place for talking
>> about parsing revisions was rev-parse, because that was the only way to
>> access the revision parsing code. :)
>
> I understand that, but it doesn't change the present reality.

Right. I'm just trying to say how we got here, which I think is relevant
because it gives a hint of what directions we can go in. In other words,
nobody _designed_ what we have now. It evolved into this state, which
obviously has some drawbacks. So I think you won't find much resistance
in trying to evolve the documentation to present git more as a coherent
tool, and less as a set of unrelated commands.

> [excuse me, but what the #@&*! is "porcelainish" supposed to mean? 
> (http://www.kernel.org/pub/software/scm/git/docs/git-rev-parse.html)]

Heh. That one is particularly egregious, because it rests on several
layers of git jargon. The low-level tools are plumbing, like pipes and
valves. The high-level commands intended for end users are porcelain,
like sinks and toilets. The -ish suffix is often used in git to refer to
a type, or something we can convert into a type (like a "tree-ish" could
be a tree object, or a commit object which points to a tree, or a tag
object which points to a commit which points to a tree). So I think by
saying "porcelain-ish" here, the author meant "not just porcelain, but
other things which take revisions and behave sort of like porcelain".

Which is a truly horrible thing to throw at a new user who just wants to
see how to specify a revision.
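
To make the "-ish" idea concrete, here is a minimal sketch in Python of how a "tree-ish" resolves to a tree. It assumes a toy in-memory object model: the classes Blob, Tree, Commit and Tag and the function peel_to_tree are hypothetical illustrations, not git's own API. It encodes only the rule described above: a tree is a tree-ish, a commit is followed to its tree, and a tag is followed to whatever it points at.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy model of git's four object types (not git's real API).
@dataclass
class Blob:
    data: bytes


@dataclass
class Tree:
    entries: dict  # name -> Blob or Tree


@dataclass
class Commit:
    tree: Tree
    message: str


@dataclass
class Tag:
    target: "GitObject"  # usually a Commit, but may be another Tag or a Tree
    name: str


GitObject = Union[Blob, Tree, Commit, Tag]


def peel_to_tree(obj: GitObject) -> Tree:
    """Resolve a 'tree-ish': follow tags to their targets and commits to
    their trees until a Tree is reached; anything else is not a tree-ish."""
    while True:
        if isinstance(obj, Tree):
            return obj
        if isinstance(obj, Commit):
            obj = obj.tree
        elif isinstance(obj, Tag):
            obj = obj.target
        else:
            raise TypeError(f"{type(obj).__name__} is not a tree-ish")


# Usage: a tag pointing to a commit pointing to a tree.
root = Tree({"README": Blob(b"hello\n")})
commit = Commit(root, "initial import")
tag = Tag(commit, "v1.0")
assert peel_to_tree(tag) is root     # tag -> commit -> tree
assert peel_to_tree(commit) is root  # commit -> tree
```

In git itself, the same peeling is requested through the revision syntax `<rev>^{tree}`; for example, for a tag named v1.0, `git rev-parse v1.0^{tree}` names the tree that the tagged commit points to.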

So yeah, if you are saying that could be worded better, I absolutely
agree. There are a lot of spots like that. They are getting fixed slowly
over time. I'm not sure if that is enough, or if somebody knowledgeable
really needs to take a sledge hammer to the existing documentation and
just reorganize and rewrite a lot of it.

-Peff

* Re: [doc] User Manual Suggestion
  2009-04-24 17:38                 ` J. Bruce Fields
@ 2009-04-24 18:27                   ` Jeff King
  2009-04-24 18:35                     ` J. Bruce Fields
       [not found]                   ` <34BD51FF-0908-48A8-BBBC-E27B0EFB32E5@boostpro.com>
  2009-04-24 19:12                   ` Michael Witten
  2 siblings, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-24 18:27 UTC
  To: J. Bruce Fields; +Cc: Michael Witten, David Abrahams, git

On Fri, Apr 24, 2009 at 01:38:52PM -0400, J. Bruce Fields wrote:

> On Fri, Apr 24, 2009 at 10:18:15AM -0500, Michael Witten wrote:
> > I think that we've already got that tutorial-esque style covered (I
> > haven't read it in a while):
> > 
> >     http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html
> > 
> > However, the User Manual should make a Mathematician happy.
> 
> I'm all for making mathematicians happy.  But, again, help?:
> 
> 	- Specific examples?
> 	- Patches?  Please, patches?
> 	- Suggested text?
> 	- Suggested outline?
> 
> There's no shortage of high-level ideas.  What there's always a need for
> more of is people willing to submit patches, respond to review, etc.

I usually hate to "me too", but I really want to second this notion. We
have been getting minor documentation fixups trickling in, and I think
those really help, and maybe they eventually would make the
documentation perfect. But I have the feeling we would benefit from
somebody taking ownership and considering the big picture of how the
documentation fits together, and then really pushing it forward with
something concrete.

-Peff

* Re: [doc] User Manual Suggestion
  2009-04-24 18:27                   ` Jeff King
@ 2009-04-24 18:35                     ` J. Bruce Fields
  0 siblings, 0 replies; 90+ messages in thread
From: J. Bruce Fields @ 2009-04-24 18:35 UTC
  To: Jeff King; +Cc: Michael Witten, David Abrahams, git

On Fri, Apr 24, 2009 at 02:27:52PM -0400, Jeff King wrote:
> On Fri, Apr 24, 2009 at 01:38:52PM -0400, J. Bruce Fields wrote:
> 
> > On Fri, Apr 24, 2009 at 10:18:15AM -0500, Michael Witten wrote:
> > > I think that we've already got that tutorial-esque style covered (I
> > > haven't read it in a while):
> > > 
> > >     http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html
> > > 
> > > However, the User Manual should make a Mathematician happy.
> > 
> > I'm all for making mathematicians happy.  But, again, help?:
> > 
> > 	- Specific examples?
> > 	- Patches?  Please, patches?
> > 	- Suggested text?
> > 	- Suggested outline?
> > 
> > There's no shortage of high-level ideas.  What there's always a need for
> > more of is people willing to submit patches, respond to review, etc.
> 
> I usually hate to "me too", but I really want to second this notion. We
> have been getting minor documentation fixups trickling in, and I think
> those really help, and maybe they eventually would make the
> documentation perfect. But I have the feeling we would benefit from
> somebody taking ownership and considering the big picture of how the
> documentation fits together, and then really pushing it forward with
> something concrete.

Yup, and dealing seriously with objections, getting consensus for the
resulting solutions, etc--in other words, being a maintainer.  I thought
I'd be able to do that at some point, but just haven't consistently had
the time.

That said, several smaller suggestions have been made which could be
handled now:

	- I don't think I've seen objections to the idea of a
	  git-revision-specifying manpage, whatever you want to call
	  it--so probably that just needs someone to write the patch.
	- There've been complaints about terms being used before they're
	  defined sufficiently well.  I can believe it, but: specific
	  examples would help!

--b.

* Re: [doc] User Manual Suggestion
       [not found]                   ` <34BD51FF-0908-48A8-BBBC-E27B0EFB32E5@boostpro.com>
@ 2009-04-24 18:52                     ` J. Bruce Fields
  2009-04-25 10:35                       ` Felipe Contreras
  0 siblings, 1 reply; 90+ messages in thread
From: J. Bruce Fields @ 2009-04-24 18:52 UTC
  To: David Abrahams; +Cc: Michael Witten, Jeff King, git

On Fri, Apr 24, 2009 at 02:32:36PM -0400, David Abrahams wrote:
>
> On Apr 24, 2009, at 1:38 PM, J. Bruce Fields wrote:
>
>> On Fri, Apr 24, 2009 at 10:18:15AM -0500, Michael Witten wrote:
>>> I think that we've already got that tutorial-esque style covered (I
>>> haven't read it in a while):
>>>
>>>    http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html
>>>
>>> However, the User Manual should make a Mathematician happy.
>>
>> I'm all for making mathematicians happy.  But, again, help?:
>>
>> 	- Specific examples?
>> 	- Patches?  Please, patches?
>> 	- Suggested text?
>> 	- Suggested outline?
>>
>> There's no shortage of high-level ideas.  What there's always a need for
>> more of is people willing to submit patches, respond to review, etc.
>
>
> I'll probably try to write something myself once I figure this stuff out.

That would be great, thanks.  Several people have gone off and posted
their own tutorials someplace, and that's fine, but it would be
especially helpful if you could contribute to the actual Documentation/
directory.  That may mean arguing with people and making compromises.
But it also means the results will be distributed with git, will be
integrated with other git documentation, and will get first-class
technical review.

I'd also encourage incrementally improving existing documentation where
possible instead of starting over from scratch.  But having broken that
rule myself a couple times I'm hardly in a position to insist.  If you
must start over, at least think about how to replace or fit it in with
existing documentation.

--b.

* Re: [doc] User Manual Suggestion
  2009-04-24 18:15               ` Jeff King
@ 2009-04-24 19:00                 ` David Abrahams
  2009-04-24 20:24                   ` Jeff King
  0 siblings, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-24 19:00 UTC
  To: Jeff King; +Cc: Michael Witten, J. Bruce Fields, git


On Apr 24, 2009, at 2:15 PM, Jeff King wrote:

> On Fri, Apr 24, 2009 at 01:28:35PM -0400, David Abrahams wrote:
>
>>> I think this is a bit better than it used to be. Plumbing commands are
>>> mostly hidden outside of the user's PATH.
>>
>> Huh?
>>
>> git hash-object
>> git cat-file -t ...
>> git ls-tree
>> git rev-parse
>> git write-tree
>> git commit-tree
>
> How did you find out about them?

The first time?

  $ man git

> They are not in your PATH, so shell
> completion doesn't find them.

Huh?  `which git` works.  ls-tree is an argument to git as far as I  
know.

Yes, I know there are aliases like git-ls-tree somewhere, but that  
only adds to the sense that all commands are equal.

> They are not in the programmable bash
> completion. They are not in the short command list git gives you when
> you type "git help" or "git" without arguments.
>
> So you must have read about them somewhere...

   $ man git

which makes no distinction.

   $ xxx [--]help

is usually OK if I already know xxx pretty well and just want a
refresher.  If I know I'll need a little more than that, I use man
straight away.

>> These are just some of the ones I learned about by reading John Wiegley's
>> "Git From the Bottom Up."
>
> ...like here.

That's where I learned *what they do*.

> So if that document gave you the impression that those are
> part of an everyday git workflow, then I think the document is at fault,
> not git itself.

It didn't.

> I admit I haven't read "Git From the Bottom Up" carefully, but I think
> what Michael is proposing would probably start a little higher from the
> bottom than that document.

Yes, please.  "Git for Computer Scientists" is a great foundation.
From there, add more information about naming things so I know what
things like remotes/origin/master mean when I see them in gitk, and
I'm off to the races.

> You can give the concepts of the object
> types, show them in pretty-printed form with "git show", and not worry
> about telling the user "this is how 'git commit' could be implemented in
> terms of primitive operations". And then you can avoid most of the
> low-level commands entirely.

Yes, that's fine.  Although I think there may be some things in GFTBU  
that are good fundamental concepts.  There's a nice list of terms with  
definitions early in the document.

>> Maybe I'm wrong about rev-parse, but for the most part, having all these
>> low-level commands available through the same executable that's used for
>> "git add," "git merge," "git commit," et al. makes the whole shebang hard
>> to approach.  It would be better for users if the low-level stuff was
>> accessed some other way.
>
> Perhaps. The general approach is to make those commands accessible as
> "git foo", but not to _advertise_ them in the same way as the porcelain
> commands.

What is "porcelain," please?  This is one among many examples of  
jargon used only (or encountered by me for the first time) in the Git  
community.

> The idea was to give a uniform calling convention without
> unnecessarily confusing users by presenting a large number of
> infrequently-used commands.

It's not working, I'm sorry to say.

> At any rate, it is too late to change the calling convention for
> plumbing.

I disagree.  You can leave the old functionality there in a  
"deprecated" state and change the way you advertise it.  It would even  
help a lot if the plumbing were all spelled "git-xxx" and the high  
level stuff were "git xxx."

> The whole point of them is to be a stable interface for
> scripting. Changing them to "git low-level rev-parse" (if it was even
> something that we wanted to do, which I don't think it is) would break
> everyone's scripts.

See above.

>> [excuse me, but what the #@&*! is "porcelainish" supposed to mean?
>> (http://www.kernel.org/pub/software/scm/git/docs/git-rev-parse.html)]
>
> Heh. That one is particularly egregious, because it rests on several
> layers of git jargon. The low-level tools are plumbing, like pipes and
> valves.

? I use the valves on my kitchen sink all the time.

> The high-level commands intended for end users are porcelain,
> like sinks and toilets. The -ish suffix is often used in git to refer to
> a type, or something we can convert into a type (like a "tree-ish" could
> be a tree object, or a commit object which points to a tree, or a tag
> object which points to a commit which points to a tree). So I think by
> saying "porcelain-ish" here, the author meant "not just porcelain, but
> other things which take revisions and behave sort of like porcelain".

bah. humbug.

> Which is a truly horrible thing to throw at a new user who just wants to
> see how to specify a revision.

yeeeeah.

> So yeah, if you are saying that could be worded better, I absolutely
> agree. There are a lot of spots like that. They are getting fixed slowly
> over time. I'm not sure if that is enough, or if somebody knowledgeable
> really needs to take a sledge hammer to the existing documentation and
> just reorganize and rewrite a lot of it.


I'm thinking the latter.

--
David Abrahams
BoostPro Computing
http://boostpro.com

* Re: [doc] User Manual Suggestion
  2009-04-24 17:38                 ` J. Bruce Fields
  2009-04-24 18:27                   ` Jeff King
       [not found]                   ` <34BD51FF-0908-48A8-BBBC-E27B0EFB32E5@boostpro.com>
@ 2009-04-24 19:12                   ` Michael Witten
  2 siblings, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-24 19:12 UTC
  To: J. Bruce Fields; +Cc: Jeff King, David Abrahams, git

On Fri, Apr 24, 2009 at 12:38, J. Bruce Fields <bfields@fieldses.org> wrote:
> I'm all for making mathematicians happy.  But, again, help?:

I intend to help, but I have a terrible tendency to shave the GNU;
right now, I'm waist deep in shavings.

* Re: [doc] User Manual Suggestion
  2009-04-24 19:00                 ` David Abrahams
@ 2009-04-24 20:24                   ` Jeff King
  2009-04-24 21:06                     ` David Abrahams
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-24 20:24 UTC
  To: David Abrahams; +Cc: Michael Witten, J. Bruce Fields, git

On Fri, Apr 24, 2009 at 03:00:19PM -0400, David Abrahams wrote:

>> How did you find out about them?
>
> The first time?
>
>  $ man git
>
> [...]
>
> which makes no distinction [between porcelain and plumbing].

Really? The command list in my version is divided into "HIGH-LEVEL
COMMANDS (PORCELAIN)" and "LOW-LEVEL COMMANDS (PLUMBING)", with the
commands you mentioned falling into the latter. And skimming "git log
Documentation/git.txt", it looks like it has been that way for some
time.

There is a little discussion under the plumbing section of what plumbing
is. It could perhaps be more emphatic in warning regular users away.

>> They are not in your PATH, so shell
>> completion doesn't find them.
>
> Huh?  `which git` works.  ls-tree is an argument to git as far as I know.

Yes, but shell completion will never present you with the text
"ls-tree". You have to have found out about it somewhere else (and
completion used to show it, because git-ls-tree was in the PATH).

>   $ xxx [--]help
>
> is usually OK if I already know xxx pretty well and just want a
> refresher.  If I know I'll need a little more than that, I use man
> straight away.

git --help shows a list of common commands, but otherwise "git help
foo" and "git foo --help" _do_ show the manpage. It may be that "man
git" could use some cleanup; specific suggestions are welcome.

> What is "porcelain," please?  This is one among many examples of jargon 
> used only (or encountered by me for the first time) in the Git community.

I think I ended up explaining it later in my email, but let me know if
you are still confused.

>> The idea was to give a uniform calling convention without
>> unnecessarily confusing users by presenting a large number of
>> infrequently-used commands.
>
> It's not working, I'm sorry to say.

Right, that's why I'm trying to figure out why you are hung up on the
low-level commands. The idea was that you wouldn't need to be exposed to
them at all, but obviously you were (or that if you were exposed, it would
be in a list clearly marked as "this is low-level stuff that you don't
really need to worry about"). So I'm trying to figure out where it
went wrong.

>> At any rate, it is too late to change the calling convention for
>> plumbing.
>
> I disagree.  You can leave the old functionality there in a "deprecated" 
> state and change the way you advertise it.

But does that really help? It means that "git hash-object" is still
there, which I thought was the problem you had. You can argue that it
wouldn't be advertised to users, and so wouldn't be a problem, but that
is _already_ the strategy we are using. So either that strategy is fine,
in which case we are on the right track but may still have some work to
do in properly implementing it. Or it's not, in which case your proposal
is no better.

> It would even help a lot if the plumbing were all spelled "git-xxx"
> and the high level stuff were "git xxx."

Differentiating calling conventions like that was proposed when dashed
forms were deprecated and removed from the PATH. But if we had dashed
forms for plumbing (i.e., not forwarding them via the "git" wrapper),
then you have to do one of:

  - put them in the user's PATH. Now tab completion or looking in your
    PATH means you see _just_ the plumbing commands, and none of the
    high level ones. Which is one of the reasons they were removed from
    the PATH in the first place (due to numerous user complaints).

  - put them elsewhere, and force plumbing users to add $GIT_EXEC_PATH
    to their PATH. That becomes very annoying for casual plumbing users.
    If you come to the mailing list with a problem, I would have to jump
    through extra hoops to ask you to show me the output of "git
    ls-files".

Not to mention that the git wrapper does other useful things besides
simply exec'ing. For example, it supports --git-dir, --bare, etc.
So the problem is that the low-level commands _are_ still useful, and
many people still want to call them, just like regular git commands.
It's just that they are numerous and low-level, which makes them
daunting for new users.
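A rough sketch of what "more than simply exec'ing" can mean: the wrapper consumes its own global options before handing off to a dashed helper. This is a hypothetical Python illustration, not git's actual C implementation; the function name `plan_dispatch` and the exec-path layout are assumptions, and only the `--git-dir` and `--bare` options mentioned above are handled.

```python
import os


def plan_dispatch(argv, exec_path, cwd):
    """Hypothetical sketch: split git-style global options from the command.

    Returns (program, args, env_overrides) instead of exec'ing, so the
    decision can be inspected. Only --git-dir=<path> and --bare are handled.
    """
    env = {}
    args = list(argv)
    while args and args[0].startswith("--"):
        opt = args.pop(0)
        if opt.startswith("--git-dir="):
            env["GIT_DIR"] = opt[len("--git-dir="):]
        elif opt == "--bare":
            # Treat the current directory as the repository itself,
            # unless an earlier --git-dir already chose one.
            env.setdefault("GIT_DIR", cwd)
        else:
            raise ValueError("unsupported global option: " + opt)
    if not args:
        raise ValueError("no command given")
    # "git ls-files -s" is handed off to the dashed helper "git-ls-files".
    program = os.path.join(exec_path, "git-" + args[0])
    return program, args[1:], env
```

A real wrapper would then exec `program` with those environment overrides; keeping that step in one place is what lets plumbing and porcelain share the same `git <cmd>` calling convention.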

And it has become obvious over several years of the git mailing list
that users, once they see mention of a command, must start investigating 
it to find out if and how it is useful. And I am not saying that is a
failing of users; on the contrary, I think it is quite a healthy
behavior on a unix-ish system. But it means that if we want not to
advertise low-level commands, we have to be very careful about the ways
in which we mention them.

Perhaps it would make sense for each plumbing command's man page to
start with something like "this is a low-level command used for
scripting git or investigating its internals. For high-level use, you
may be more interested in $X", where $X may be "git commit" for
write-tree, commit-tree, etc. And that would at least help intercept
users before they get too confused.

>>> [excuse me, but what the #@&*! is "porcelainish" supposed to mean?
>>> (http://www.kernel.org/pub/software/scm/git/docs/git-rev-parse.html)]
>>
>> Heh. That one is particularly egregious, because it rests on several
>> layers of git jargon. The low-level tools are plumbing, like pipes and
>> valves.
>
> ? I use the valves on my kitchen sink all the time.

Sorry, I meant the ones under the sink, that you would use if you were
replacing the faucet. I would call the ones above "taps". But hopefully
you get a sense of the distinction between plumbing and porcelain.

-Peff


* Re: [doc] User Manual Suggestion
  2009-04-24  0:30           ` Michael Witten
@ 2009-04-24 20:30             ` Johan Herland
  2009-04-24 21:34               ` Daniel Barkalow
  0 siblings, 1 reply; 90+ messages in thread
From: Johan Herland @ 2009-04-24 20:30 UTC (permalink / raw)
  To: Michael Witten; +Cc: git, David Abrahams, Jeff King, J. Bruce Fields

On Friday 24 April 2009, Michael Witten wrote:
> On Thu, Apr 23, 2009 at 17:51, Johan Herland <johan@herland.net> wrote:
> > There's also http://www.eecs.harvard.edu/~cduan/technical/git/ which I
> > think is a great bottom-up introduction:
> > - not too heavy on the concepts
>
> I really don't understand this mentality. Concepts are the only things
> that are important. From concepts falls all else.

Sorry for not being clear: Concepts are indeed (and should be) important. 
What I mean is that the concepts introduced are short and simple enough for 
novice users to understand (without much VCS experience, if any at all). If 
we start off _too_ detailed, we risk losing the audience, and no one is 
better off.

Like Jeff King said elsewhere in this thread: We want to start a little 
higher from the bottom. The above introduction does not focus on blobs or 
trees, but manages to introduce Git in a useful manner by starting off with 
only two concepts: commits and refs. With only these two concepts, and 
showing how high-level commands (remember: no plumbing) work with these 
concepts, I believe it is possible to teach anyone to use Git well. Of 
course, as users progress towards becoming power-users, more concepts are 
needed, but I don't think these are needed from the start.

As Einstein might have said: As simple as possible, but no simpler.


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [doc] User Manual Suggestion
  2009-04-24 20:24                   ` Jeff King
@ 2009-04-24 21:06                     ` David Abrahams
  2009-04-24 22:45                       ` Björn Steinbrink
  0 siblings, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-24 21:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Witten, J. Bruce Fields, git


On Apr 24, 2009, at 4:24 PM, Jeff King wrote:

> On Fri, Apr 24, 2009 at 03:00:19PM -0400, David Abrahams wrote:
>
>>> How did you find out about them?
>>
>> The first time?
>>
>> $ man git
>>
>> [...]
>>
>> which makes no distinction [between porcelain and plumbing].
>
> Really? The command list in my version is divided into "HIGH-LEVEL
> COMMANDS (PORCELAIN)" and "LOW-LEVEL COMMANDS (PLUMBING)", with the
> commands you mentioned falling into the latter. And skimming "git log
> Documentation/git.txt", it looks like it has been that way for some
> time.

Sorry, you are totally right.

The list is just so crazy-long; I may have skimmed it.

>> Huh?  `which git` works.  ls-tree is an argument to git as far as I know.
>
> Yes, but shell completion will never present you with the text
> "ls-tree". You have to have found out about it somewhere else (and
> completion used to show it, because git-ls-tree was in the PATH).
>
>>  $ xxx [--]help
>>
>> is usually OK if I already know xxx pretty well and just want a
>> refresher.  If I know I'll need a little more than that, I use man
>> straight away.
>
> git --help shows a list of common commands, but otherwise "git help
> foo" and "git foo --help" _do_ show the manpage. It may be that "man
> git" could use some cleanup; specific suggestions are welcome.
>
>> What is "porcelain," please?  This is one among many examples of
>> jargon used only (or encountered by me for the first time) in the Git
>> community.
>
> I think I ended up explaining it later in my email, but let me know if
> you are still confused.

Nope; I'm fine now.  It's not a great analogy, because everyone who  
uses a sink ends up dealing with spigots and valves, but I get it.

>>> The idea was to give a uniform calling convention without
>>> unnecessarily confusing users by presenting a large number of
>>> infrequently-used commands.
>>
>> It's not working, I'm sorry to say.
>
> Right, that's why I'm trying to figure out why you are hung up on the
> low-level commands. The idea was that you wouldn't need to be exposed to
> them at all, but obviously you were (or if you were exposed, it would be
> in a list that was clearly marked as "this is low-level stuff that you
> don't really need to worry about"). So I'm trying to figure out where it
> went wrong.

I'm sorry that I can't be much help in that department.  If I really  
knew how I ended up with that wrong impression, I probably would have  
corrected it already.  It's weird; git is composed of ideas that are  
all very familiar to me (reference-counted management of immutable  
data, hashing, etc.) yet for me, getting to know it has been really  
tough.  By contrast, for example, subversion was instantly  
understandable when I pawed through the SVN book.

>>> At any rate, it is too late to change the calling convention for
>>> plumbing.
>>
>> I disagree.  You can leave the old functionality there in a "deprecated"
>> state and change the way you advertise it.
>
> But does that really help? It means that "git hash-object" is still
> there, which I thought was the problem you had. You can argue that it
> wouldn't be advertised to users, and so wouldn't be a problem, but that
> is _already_ the strategy we are using. So either that strategy is fine,
> in which case we are on the right track but may still have some work to
> do in properly implementing it. Or it's not, in which case your proposal
> is no better.

You've got me stumped there, I have to admit.

>> It would even help a lot if the plumbing were all spelled "git-xxx"
>> and the high level stuff were "git xxx."
>
> Differentiating calling conventions like that was proposed when dashed
> forms were deprecated and removed from the PATH. But if we had dashed
> forms for plumbing (i.e., not forwarding them via the "git" wrapper),
> then you have to do one of:
>
>  - put them in the user's PATH. Now tab completion or looking in your
>    PATH means you see _just_ the plumbing commands, and none of the
>    high level ones. Which is one of the reasons they were removed from
>    the PATH in the first place (due to numerous user complaints).
>
>  - put them elsewhere, and force plumbing users to add $GIT_EXEC_PATH
>    to their PATH. That becomes very annoying for casual plumbing users.
>    If you come to the mailing list with a problem, I would have to jump
>    through extra hoops to ask you to show me the output of "git
>    ls-files".

I see your point.

   llgit xxx

?

> Not to mention that the git wrapper does other useful things besides
> simply exec'ing. For example, it supports --git-dir, --bare, etc.
> So the problem is that the low-level commands _are_ still useful, and
> many people still want to call them, just like regular git commands.
> It's just that they are numerous and low-level, which makes them
> daunting for new users.
>
> And it has become obvious over several years of the git mailing list
> that users, once they see mention of a command, must start investigating
> it to find out if and how it is useful. And I am not saying that is a
> failing of users; on the contrary, I think it is quite a healthy
> behavior on a unix-ish system. But it means that if we want not to
> advertise low-level commands, we have to be very careful about the ways
> in which we mention them.
>
> Perhaps it would make sense for each plumbing command's man page to
> start with something like "this is a low-level command used for
> scripting git or investigating its internals. For high-level use, you
> may be more interested in $X", where $X may be "git commit" for
> write-tree, commit-tree, etc. And that would at least help intercept
> users before they get too confused.

Sounds like a great idea to me.

>>>> [excuse me, but what the #@&*! is "porcelainish" supposed to mean?
>>>> (http://www.kernel.org/pub/software/scm/git/docs/git-rev-parse.html)]
>>>
>>> Heh. That one is particularly egregious, because it rests on several
>>> layers of git jargon. The low-level tools are plumbing, like pipes and
>>> valves.
>>
>> ? I use the valves on my kitchen sink all the time.
>
> Sorry, I meant the ones under the sink, that you would use if you were
> replacing the faucet. I would call the ones above "taps". But hopefully
> you get a sense of the distinction between plumbing and porcelain.


I know, but the point is, they're not porcelain.  They're "plumbing  
fixtures."

I think UI/API works way better than porcelain/plumbing.  We are,  
after all, programmers.  It would also be good to link to a definition  
any time you use a term of art in the docs.  I would even do that in  
the case of UI/API since the distinction could appear to be subtle.

I should also say, most of the docs and interfaces I see in Git (and  
its wrappers, web interfaces, etc.) give the SHA1 hashes way too much  
exposure.  The times when it's actually more convenient to use a hash  
instead of one of the other notations are rare, and if hashes weren't  
so exposed I bet most interfaces would make those other names more  
available.  One reason I think hashes retain their prominent exposure  
is that you have no other reasonably stable way of referring to  
commits, since branch~NN counts backward from HEAD.  Adding such a  
thing would help.

Oh, one other specific issue: the rev-parse manpage uses $GIT_DIR  
without saying what it is.  I *think* that means the root of the  
working copy and has nothing to do with environment variables, but  
it's hard to be sure, and if I'm right about that, it's misleading  
notation.

Someone needs to get gitiseasy.org/gitiseasy.net and then provide  
content that lives up to the name :^)

--
David Abrahams
BoostPro Computing
http://boostpro.com


* Re: [doc] User Manual Suggestion
  2009-04-24 20:30             ` Johan Herland
@ 2009-04-24 21:34               ` Daniel Barkalow
  2009-04-24 21:38                 ` Jeff King
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Barkalow @ 2009-04-24 21:34 UTC (permalink / raw)
  To: Johan Herland
  Cc: Michael Witten, git, David Abrahams, Jeff King, J. Bruce Fields

On Fri, 24 Apr 2009, Johan Herland wrote:

> On Friday 24 April 2009, Michael Witten wrote:
> > On Thu, Apr 23, 2009 at 17:51, Johan Herland <johan@herland.net> wrote:
> > > There's also http://www.eecs.harvard.edu/~cduan/technical/git/ which I
> > > think is a great bottom-up introduction:
> > > - not too heavy on the concepts
> >
> > I really don't understand this mentality. Concepts are the only things
> > that are important. From concepts falls all else.
> 
> Sorry for not being clear: Concepts are indeed (and should be) important. 
> What I mean is that the concepts introduced are short and simple enough for 
> novice users to understand (without much VCS experience, if any at all). If 
> we start off _too_ detailed, we risk losing the audience, and no one is 
> better off.
> 
> Like Jeff King said elsewhere in this thread: We want to start a little 
> higher from the bottom. The above introduction does not focus on blobs or 
> trees, but manages to introduce Git in a useful manner by starting off with 
> only two concepts: commits and refs.

I'd say that blobs and trees are an implementation detail of "the full 
content of a version of the project", not something conceptually 
important. Likewise, the date representation used in commits isn't 
important. It might be worth saying that git purposefully discards any 
information in your filesystem that is just incidental and not project 
content, like whether other users on the system where the working 
directory is can access your files; but a full enumeration of what the 
"content" and "incidental" categories contain can go in an appendix or 
something.

(FWIW, git originally didn't use tree objects for subdirectories or mask
out the g+w bit from tree entries. These weren't conceptual changes, but 
implementation details.)
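To make the "incidental vs. content" split concrete for permissions: as I understand current git, a regular file's mode collapses to one of two tree-entry modes depending only on the owner-execute bit, so bits like g+w never reach the repository. The Python below is an illustrative sketch of that rule (`git_file_mode` is a made-up name), not git's code.

```python
import stat


def git_file_mode(st_mode):
    """Sketch: map a file's stat mode to the mode git would record in a tree.

    Only the file type and the owner-execute bit survive; group/other
    permission bits (such as g+w) are treated as incidental and dropped.
    """
    if stat.S_ISLNK(st_mode):
        return "120000"  # symbolic link
    if stat.S_ISREG(st_mode):
        return "100755" if st_mode & stat.S_IXUSR else "100644"
    raise ValueError("only regular files and symlinks are handled here")
```

So a 0644 file and a 0664 file with the same contents produce identical tree entries.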

	-Daniel
*This .sig left intentionally blank*


* Re: [doc] User Manual Suggestion
  2009-04-24 21:34               ` Daniel Barkalow
@ 2009-04-24 21:38                 ` Jeff King
  2009-04-24 22:18                   ` Michael Witten
                                     ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: Jeff King @ 2009-04-24 21:38 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Johan Herland, Michael Witten, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 05:34:00PM -0400, Daniel Barkalow wrote:

> I'd say that blobs and trees are an implementation detail of "the full 
> content of a version of the project", not something conceptually 
> important. Likewise, the date representation used in commits isn't 

I disagree. I think it's important to note that trees and blobs have a
name, and you can refer to them. Once you know that, the fact that you
can do:

  git show master
  git show master:Documentation
  git show master:Makefile

just makes sense. You are always just specifying an object, but the type
is different for each (and show "does the right thing" based on object
type).

No, that isn't critical for understanding how _commit_ operations work,
but I think that is exactly the sort of conceptual knowledge that lets
people use git more fully.
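A toy model of that point, in Python: once every object has a name, a single `show` entry point can branch on the object's type. The object names (c1, t1, b1, ...) and the output format are invented for illustration and only loosely resemble what `git show` prints.

```python
# Toy object store: object name -> (type, payload). All names are invented.
OBJECTS = {
    "c1": ("commit", {"tree": "t1", "message": "Initial import"}),
    "t1": ("tree", {"Makefile": "b1", "Documentation": "t2"}),
    "t2": ("tree", {"git.txt": "b2"}),
    "b1": ("blob", b"all:\n\techo hello\n"),
    "b2": ("blob", b"git(1)\n"),
}


def show(name):
    """Display any object; what 'display' means depends on its type."""
    kind, payload = OBJECTS[name]
    if kind == "commit":
        return "commit %s\n\n    %s" % (name, payload["message"])
    if kind == "tree":
        # A tree is shown like a directory listing of its entry names.
        return "\n".join(sorted(payload))
    if kind == "blob":
        # A blob is just file contents.
        return payload.decode()
    raise ValueError("unknown object type: " + kind)
```

The caller never says "show me a commit" or "show me a file"; it names an object, and the type stored with the object decides the rest.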

-Peff


* Re: [doc] User Manual Suggestion
  2009-04-24 21:38                 ` Jeff King
@ 2009-04-24 22:18                   ` Michael Witten
  2009-04-24 22:25                     ` Michael Witten
  2009-04-24 23:16                     ` Björn Steinbrink
  2009-04-24 23:21                   ` Daniel Barkalow
  2009-04-25  0:19                   ` David Abrahams
  2 siblings, 2 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-24 22:18 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 16:38, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 24, 2009 at 05:34:00PM -0400, Daniel Barkalow wrote:
>
>> I'd say that blobs and trees are an implementation detail of "the full
>> content of a version of the project", not something conceptually
>> important. Likewise, the date representation used in commits isn't
> ...
> No, that isn't critical for understanding how _commit_ operations work,
> but I think that is exactly the sort of conceptual knowledge that lets
> people use git more fully.

I think the key conclusion here is that the main concepts are *objects*
and references to those objects. One type of object is not necessarily
more low-level or high-level than another type of object; each type of
object is the most important type of object for a particular task in
or view of the git world.

> I disagree. I think it's important to note that trees and blobs have a
> name, and you can refer to them. Once you know that, the fact that you
> can do:
>
>  git show master
>  git show master:Documentation
>  git show master:Makefile
>
> just makes sense. You are always just specifying an object, but the type
> is different for each (and show "does the right thing" based on object
> type).

In fact, I think it's important to note that the notation:

    git show master:Makefile

actually involves a translation from a Unix filesystem address to a
git object address that is then used to find the relevant data.

In fact, I think masking this kind of thing with a catch-all word
'reference' is a bad idea. Rather than being hidden, it should be
exposed: I think it would be beneficial to use the word 'address'
rather than 'reference' when talking about the SHA-1 names. Then HEAD
could be called a pointer variable, etc.

So, a pointer variable's value is an object address that is the
location of an object in git 'memory'. I think using this approach
would make things significantly more transparent.


* Re: [doc] User Manual Suggestion
  2009-04-24 22:18                   ` Michael Witten
@ 2009-04-24 22:25                     ` Michael Witten
  2009-04-24 23:11                       ` Daniel Barkalow
  2009-04-24 23:16                     ` Björn Steinbrink
  1 sibling, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-04-24 22:25 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 17:18, Michael Witten <mfwitten@gmail.com> wrote:
> In fact, I think masking this kind of thing with a catch-all word
> 'reference' is a bad idea. Rather than being hidden, it should be
> exposed: I think it would be beneficial to use the word 'address'
> rather than 'reference' when talking about the SHA-1 names. Then HEAD
> could be called a pointer variable, etc.
>
> So, a pointer variable's value is an object address that is the
> location of an object in git 'memory'. I think using this approach
> would make things significantly more transparent.

In fact, it's not particularly important that SHA-1 is used to compute
the address into git memory. The only thing that's important is that
the address is determined by content alone (I'm not even sure that
specifying that the address is a cryptographically sound hash of the
content is important; shouldn't that follow from the declaration that
it must be uniquely based on content alone?); the fact that it's a SHA-1
is purely an implementation detail, and so it shouldn't appear
prominently in the documentation.
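For reference, the content-derived name is easy to reproduce. A minimal Python sketch, assuming the blob format git uses (a "blob <size>" header, a NUL byte, then the raw bytes); for the same input it should agree with `git hash-object`:

```python
import hashlib


def blob_name(content: bytes) -> str:
    """Object name of a blob: SHA-1 over a 'blob <size>' header, a NUL, then the bytes."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(blob_name(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, git's empty blob
```

Nothing about where the file lives, what it is called, or who owns it enters the calculation; SHA-1 is only the choice of hash function.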

So, what do you say?

Let's start a reformation of the git terminology to use analogies that
have been around since the dawn of computing: 'memory', 'address', and
'pointer'.


* Re: [doc] User Manual Suggestion
  2009-04-24 21:06                     ` David Abrahams
@ 2009-04-24 22:45                       ` Björn Steinbrink
  2009-04-25  0:39                         ` David Abrahams
  0 siblings, 1 reply; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-24 22:45 UTC (permalink / raw)
  To: David Abrahams; +Cc: Jeff King, Michael Witten, J. Bruce Fields, git

On 2009.04.24 17:06:27 -0400, David Abrahams wrote:
> On Apr 24, 2009, at 4:24 PM, Jeff King wrote:
>> On Fri, Apr 24, 2009 at 03:00:19PM -0400, David Abrahams wrote:
>>> It would even help a lot if the plumbing were all spelled "git-xxx"
>>> and the high level stuff were "git xxx."
>>
>> Differentiating calling conventions like that was proposed when dashed
>> forms were deprecated and removed from the PATH. But if we had dashed
>> forms for plumbing (i.e., not forwarding them via the "git" wrapper),
>> then you have to do one of:
>>
>>  - put them in the user's PATH. Now tab completion or looking in your
>>    PATH means you see _just_ the plumbing commands, and none of the
>>    high level ones. Which is one of the reasons they were removed
>>    from the PATH in the first place (due to numerous user
>>    complaints).
>>
>>  - put them elsewhere, and force plumbing users to add $GIT_EXEC_PATH
>>    to their PATH. That becomes very annoying for casual plumbing
>>    users. If you come to the mailing list with a problem, I would
>>    have to jump through extra hoops to ask you to show me the output
>>    of "git ls-files".
>
> I see your point.
>
>   llgit xxx
>
> ?

If that was the exclusive way of calling the low-level commands, that
would still break existing scripts. And if you keep e.g. "git
write-tree" and just add "llgit write-tree" as an alias, that will IMHO
just cause more confusion once old and new git users meet. And I agree
with Peff, it's not important whether it's "git foo", "llgit foo", "git
lowlevel foo" or something else. It's just about how much your users
really _need_ to know and how you tell them to use the stuff.

> I think UI/API works way better than porcelain/plumbing. We are, after
> all, programmers.

We are programmers, but not all git users are programmers.

> It would also be good to link to a definition any time you use a term
> of art in the docs. I would even do that in the case of UI/API since
> the distinction could appear to be subtle.
>
> I should also say, most of the docs and interfaces I see in Git (and
> its wrappers, web interfaces, etc.) give the SHA1 hashes way too much
> exposure. The times when it's actually more convenient to use a hash
> instead of one of the other notations are rare,

How often do you need a name for a commit shown by a command and can
accept that it is not stable? I usually need a name because I
want to reference that commit later on, either because I need to talk to
other users, or because I'm working on something and might need to look
at that commit now and then, regardless of my current state of things.
One big exception in my workflow is when I use "git blame"; there I
usually just need the name once to look at the full commit. But then I
prefer a 7-8 character sha-1 prefix to something like
improve_foo_speed~132^12~1^3. And "pseudo-stable" numbers have been
discussed to death.
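On the short prefixes: an abbreviated name works only because it is looked up by unique prefix. A hypothetical Python helper showing the rule (the list of names stands in for the object database; git itself also refuses abbreviations shorter than four hex digits):

```python
def resolve_prefix(prefix, names):
    """Hypothetical helper: expand an abbreviated object name.

    The prefix must match exactly one known name; ambiguous or
    too-short abbreviations are rejected.
    """
    prefix = prefix.lower()
    if len(prefix) < 4:
        raise ValueError("abbreviation too short")
    matches = [n for n in names if n.startswith(prefix)]
    if not matches:
        raise KeyError("no object matches " + prefix)
    if len(matches) > 1:
        raise KeyError("ambiguous prefix %s (%d matches)" % (prefix, len(matches)))
    return matches[0]
```

This is also why a 7-8 character prefix is "stable enough" in practice: it stays valid until a new object happens to share it.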

> and if hashes weren't so exposed I bet most interfaces would make
> those other names more available. One reason I think hashes retain
> their prominent exposure is that you have no other reasonably stable
> way of referring to commits, since branch~NN counts backward from
> HEAD. Adding such a thing would help.

It counts backwards from "branch".
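Concretely, `~<n>` follows first parents starting from whatever the named ref points at. A toy Python model (the history and ref names are made up) also shows why such a name is not stable: once the branch advances, the same expression lands on a different commit.

```python
# Made-up history: commit -> parents (first parent listed first).
PARENTS = {"e": ["d"], "d": ["c", "x"], "c": ["b"], "b": ["a"], "x": ["a"], "a": []}
REFS = {"master": "e", "topic": "x"}


def tilde(rev, n):
    """Resolve rev~n: start at the commit the ref points to, follow first parents n times."""
    commit = REFS.get(rev, rev)
    for _ in range(n):
        parents = PARENTS[commit]
        if not parents:
            raise ValueError("%s~%d walks past a root commit" % (rev, n))
        commit = parents[0]
    return commit


print(tilde("master", 2))  # c
REFS["master"] = "f"       # a new commit lands on master...
PARENTS["f"] = ["e"]
print(tilde("master", 2))  # ...and master~2 is now d
```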

> Oh, one other specific issue: the rev-parse manpage uses $GIT_DIR
> without saying what it is. I *think* that means the root of the
> working copy and has nothing to do with environment variables, but
> it's hard to be sure, and if I'm right about that, it's misleading
> notation.

$GIT_DIR means the .git directory of a non-bare repo.

Björn


* Re: [doc] User Manual Suggestion
  2009-04-24 22:25                     ` Michael Witten
@ 2009-04-24 23:11                       ` Daniel Barkalow
  2009-04-24 23:14                         ` Jeff King
                                           ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: Daniel Barkalow @ 2009-04-24 23:11 UTC (permalink / raw)
  To: Michael Witten
  Cc: Jeff King, Johan Herland, git, David Abrahams, J. Bruce Fields

On Fri, 24 Apr 2009, Michael Witten wrote:

> On Fri, Apr 24, 2009 at 17:18, Michael Witten <mfwitten@gmail.com> wrote:
> > In fact, I think masking this kind of thing with a catch-all word
> > 'reference' is a bad idea. Rather than being hidden, it should be
> > exposed: I think it would be beneficial to use the word 'address'
> > rather than 'reference' when talking about the SHA-1 names. Then HEAD
> > could be called a pointer variable, etc.
> >
> > So, a pointer variable's value is an object address that is the
> > location of an object in git 'memory'. I think using this approach
> > would make things significantly more transparent.
> 
> In fact, it's not particularly important that SHA-1 is used to compute
> the address into git memory. The only thing that's important is that
> the address is determined by content alone (I'm not even sure that
> specifying that the address is a cryptographically sound hash of the
> content is important; shouldn't that follow from the declaration that
> it must be uniquely based on content alone?); the fact that it's a SHA-1
> is purely an implementation detail, and so it shouldn't appear
> prominently in the documentation.
> 
> So, what do you say?
> 
> Let's start a reformation of the git terminology to use analogies that
> have been around since the dawn of computing: 'memory', 'address', and
> 'pointer'.

I actually think calling them "sha1s" is better, simply because this bit 
of jargon doesn't mean anything else (git deals with email, so "address" 
is overloaded). And the term is already in use for this particular case, 
and it doesn't mean anything else at all (since, of course, the crypto 
thing is "SHA-1", not "sha1"), and it's short (which is important for 
making it easy to look at usage help).

	-Daniel
*This .sig left intentionally blank*


* Re: [doc] User Manual Suggestion
  2009-04-24 23:11                       ` Daniel Barkalow
@ 2009-04-24 23:14                         ` Jeff King
  2009-04-24 23:18                           ` Michael Witten
                                             ` (2 more replies)
  2009-04-24 23:26                         ` Michael Witten
  2009-04-25  0:41                         ` David Abrahams
  2 siblings, 3 replies; 90+ messages in thread
From: Jeff King @ 2009-04-24 23:14 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Michael Witten, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 07:11:40PM -0400, Daniel Barkalow wrote:

> > Let's start a reformation of the git terminology to use analogies that
> > have been around since the dawn of computing: 'memory', 'address', and
> > 'pointer'.
> 
> I actually think calling them "sha1s" is better, simply because this bit 
> of jargon doesn't mean anything else (git deals with email, so "address" 
> is overloaded). And the term is already in use for this particular case, 
> and it doesn't mean anything else at all (since, of course, the crypto 
> thing is "SHA-1", not "sha1"), and it's short (which is important for 
> making it easy to look at usage help).

Junio suggested "object name" in another thread, which I think is nicely
descriptive.

FWIW, I think the pointer nomenclature has terrible connotations. I
think everyone who works on git groks pointers just fine, but aren't
they generally reviled among the programming populace as the most
complex and error-prone part of learning to program? Do we really need
to increase git's reputation as complex and error-prone? ;)

-Peff


* Re: [doc] User Manual Suggestion
  2009-04-24 22:18                   ` Michael Witten
  2009-04-24 22:25                     ` Michael Witten
@ 2009-04-24 23:16                     ` Björn Steinbrink
  2009-04-25  0:01                       ` Michael Witten
  1 sibling, 1 reply; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-24 23:16 UTC (permalink / raw)
  To: Michael Witten
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On 2009.04.24 17:18:44 -0500, Michael Witten wrote:
> On Fri, Apr 24, 2009 at 16:38, Jeff King <peff@peff.net> wrote:
> > On Fri, Apr 24, 2009 at 05:34:00PM -0400, Daniel Barkalow wrote:
> >
> >> I'd say that blobs and trees are an implementation detail of "the full
> >> content of a version of the project", not something conceptually
> >> important. Likewise, the date representation used in commits isn't
> > ...
> > No, that isn't critical for understanding how _commit_ operations work,
> > but I think that is exactly the sort of conceptual knowledge that lets
> > people use git more fully.
> 
> I think the key conclusion here is that the main concepts are *objects*
> and references to those objects. One type of object is not necessarily
> more low-level or high-level than another type of object; each type of
> object is the most important type of object for a particular task in
> or view of the git world.
> 
> > I disagree. I think it's important to note that trees and blobs have a
> > name, and you can refer to them. Once you know that, the fact that you
> > can do:
> >
> >  git show master
> >  git show master:Documentation
> >  git show master:Makefile
> >
> > just makes sense. You are always just specifying an object, but the type
> > is different for each (and show "does the right thing" based on object
> > type).
> 
> In fact, I think it's important to note that the notation:
> 
>     git show master:Makefile
> 
> actually involves a translation from a Unix filesystem address to a
> git object address that is then used to find the relevant data.

Hm? Resolving master:Makefile means to first find what master is, most
likely the shortname for refs/heads/master. That usually references a
commit object (by its name). The "<tree-ish>:<path>" syntax then causes
git to lookup the tree referenced by that commit (again, by its name).
And then the tree entry for "Makefile" is looked up, leading to the name
for the object identified by "master:Makefile".
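The lookup steps just described can be sketched in Python against a toy object store. The ref-name search below covers only part of the order documented in gitrevisions(7), and all object names are invented:

```python
# Toy repository; object names are invented placeholders.
REFS = {"refs/heads/master": "c1"}
OBJECTS = {
    "c1": ("commit", {"tree": "t1"}),
    "t1": ("tree", {"Makefile": "b1", "Documentation": "t2"}),
    "t2": ("tree", {"git.txt": "b2"}),
    "b1": ("blob", b"all:\n"),
    "b2": ("blob", b"git(1)\n"),
}


def resolve_ref(short):
    # Try candidate full names in turn (a subset of git's documented order).
    for full in (short, "refs/" + short, "refs/tags/" + short, "refs/heads/" + short):
        if full in REFS:
            return REFS[full]
    raise KeyError("unknown revision: " + short)


def resolve(spec):
    """Resolve '<rev>' or '<tree-ish>:<path>' to an object name, one step at a time."""
    rev, sep, path = spec.partition(":")
    name = resolve_ref(rev)                 # master -> refs/heads/master -> c1
    if not sep:
        return name                         # plain revision, no path lookup
    kind, payload = OBJECTS[name]
    if kind == "commit":
        name = payload["tree"]              # c1 -> its top-level tree t1
    for part in filter(None, path.split("/")):
        kind, payload = OBJECTS[name]
        if kind != "tree":
            raise KeyError("not a tree: " + name)
        name = payload[part]                # look up the entry in that tree
    return name
```

Note that the path is only ever interpreted inside trees; no filesystem is consulted at any step.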

> In fact, I think masking this kind of thing with a catch-all word
> 'reference' is a bad idea.

"master:Makefile" is not a reference. Just "master" is a shortname for a
reference, the full name might be refs/heads/master.

git has:
 - object names (which happen to be SHA-1 hashes).
 - references (which reference objects by their names)
 - symbolic references (which reference other references by their names)

The "<tree-ish>:<path>" syntax is not called "reference".

> Rather than being hidden, it should be exposed: I think it would be
> beneficial to use the word 'address' rather than 'reference' when
> talking about the SHA-1 names. Then HEAD could be called a pointer
> variable, etc.

What's wrong with just calling the object name "object name"? References
are something different, and the above "master:Makefile" is yet a
different thing, using the "extended SHA1" syntax to identify an object.

> So, a pointer variable's value is an object address that is the
> location of an object in git 'memory'. I think using this approach
> would make things significantly more transparent.

But then HEAD would be a pointer pointer variable (symbolic ref), unless
you have a detached HEAD.
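To make the symbolic-ref versus detached-HEAD distinction concrete, here is a toy Python model. It assumes refs are plain strings, with a symbolic ref written as `ref: <target>` the way `.git/HEAD` stores it; the `refs` dictionary and the `read_ref`/`detach_head` helpers are hypothetical.

```python
# Toy model: each ref holds either an object name or, for a symbolic ref,
# the text "ref: <target>" (the form .git/HEAD uses). Names are hypothetical.
refs = {
    "HEAD": "ref: refs/heads/master",
    "refs/heads/master": "c1",
}


def read_ref(name, depth=0):
    """Follow symbolic refs until an object name is reached."""
    if depth > 5:
        raise RuntimeError("symbolic ref chain too deep at " + name)
    value = refs[name]
    if value.startswith("ref: "):
        return read_ref(value[len("ref: "):], depth + 1)
    return value


def detach_head(oid):
    """Point HEAD straight at a commit instead of at a branch."""
    refs["HEAD"] = oid


refs["refs/heads/master"] = "c2"  # a new commit on master...
print(read_ref("HEAD"))           # ...is seen through HEAD: c2
detach_head("c2")
refs["refs/heads/master"] = "c3"
print(read_ref("HEAD"))           # detached HEAD stays put: c2
```

While HEAD is symbolic, advancing `master` changes what HEAD resolves to; once detached, HEAD keeps naming the same commit.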

Björn

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:14                         ` Jeff King
@ 2009-04-24 23:18                           ` Michael Witten
  2009-04-24 23:31                           ` Michael Witten
  2009-04-25 10:18                           ` Felipe Contreras
  2 siblings, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-24 23:18 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 18:14, Jeff King <peff@peff.net> wrote:
> but aren't
> they generally reviled among the programming populace as the most
> complex and error-prone part of learning to program?

And now you know why people struggle with git; as I said in a previous email:

    http://marc.info/?l=git&m=124022418313288&w=2

    I think that the human brain struggles with indirection.
    Consider that so many programmers have a hard time
    understanding pointers; no wonder so many people
    find git's underlying concepts boggling.

Of course, the difference here is that we're not asking people to do
memory management; we have garbage collection.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 21:38                 ` Jeff King
  2009-04-24 22:18                   ` Michael Witten
@ 2009-04-24 23:21                   ` Daniel Barkalow
  2009-04-24 23:25                     ` Jeff King
  2009-04-24 23:29                     ` Michael Witten
  2009-04-25  0:19                   ` David Abrahams
  2 siblings, 2 replies; 90+ messages in thread
From: Daniel Barkalow @ 2009-04-24 23:21 UTC (permalink / raw)
  To: Jeff King
  Cc: Johan Herland, Michael Witten, git, David Abrahams,
	J. Bruce Fields

On Fri, 24 Apr 2009, Jeff King wrote:

> On Fri, Apr 24, 2009 at 05:34:00PM -0400, Daniel Barkalow wrote:
> 
> > I'd say that blobs and trees are an implementation detail of "the full 
> > content of a version of the project", not something conceptually 
> > important. Likewise, the date representation used in commits isn't 
> 
> I disagree. I think it's important to note that trees and blobs have a
> name, and you can refer to them. Once you know that, the fact that you
> can do:
> 
>   git show master
>   git show master:Documentation
>   git show master:Makefile
> 
> just makes sense. You are always just specifying an object, but the type
> is different for each (and show "does the right thing" based on object
> type).
> 
> No, that isn't critical for understanding how _commit_ operations work,
> but I think that is exactly the sort of conceptual knowledge that lets
> people use git more fully.

Yeah, I'll agree with that. They're good to explain as "these are things 
git can tell you about", but they're not relevant to the discussion of 
"what is history".

(And, actually, I think git has a few usability warts due to relying too 
much on command line arguments being objects; it would be quite nice if 
"git blame 1a2b3c:Makefile" worked despite this technically being 
incoherent.)

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:21                   ` Daniel Barkalow
@ 2009-04-24 23:25                     ` Jeff King
  2009-04-26 23:41                       ` Björn Steinbrink
  2009-04-24 23:29                     ` Michael Witten
  1 sibling, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-24 23:25 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Johan Herland, Michael Witten, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 07:21:26PM -0400, Daniel Barkalow wrote:

> (And, actually, I think git has a few usability warts due to relying too 
> much on command line arguments being objects; it would be quite nice if 
> "git blame 1a2b3c:Makefile" worked despite this technically being 
> incoherent.)

Yeah, I think another is that "git show master:file" will not do CRLF or
other filters, and "git diff master:file other:file" will not respect
diff settings. I think all of those could be solved by path lookup
attaching a "here is a pathname I used to get to this object" string,
which can then be accessed as appropriate.

It is not all that different conceptually than what "git rev-list
--objects" does.
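A rough sketch of the suggestion above, in Python: resolution returns the path it walked alongside the object name, so a later step can look up path-based attributes. Everything here is hypothetical (the toy store, the `attributes` table standing in for .gitattributes, and the `eol=crlf` conversion); it only illustrates the shape of the idea, not git's actual code.

```python
import fnmatch
from typing import NamedTuple, Optional

# Hypothetical toy store: the tree holds one text file stored with LF endings.
objects = {
    "c1": ("commit", "t1"),
    "t1": ("tree", {"notes.txt": "b1"}),
    "b1": ("blob", b"line one\nline two\n"),
}
refs = {"refs/heads/master": "c1"}
# Hypothetical stand-in for .gitattributes: pattern -> attributes.
attributes = {"*.txt": {"eol": "crlf"}}


class Resolved(NamedTuple):
    oid: str             # the object that was found
    path: Optional[str]  # the path used to reach it, or None if none was given


def resolve(spec):
    """Resolve '<rev>' or '<rev>:<path>', remembering the path that was used."""
    rev, sep, path = spec.partition(":")
    oid = rev if rev in objects else refs["refs/heads/" + rev]
    if not sep:
        return Resolved(oid, None)
    kind, payload = objects[oid]
    if kind == "commit":
        oid = payload  # peel the commit to its tree
    for part in filter(None, path.split("/")):
        oid = objects[oid][1][part]
    return Resolved(oid, path)


def attrs_for(path):
    """Collect attributes whose pattern matches the path (none if path is None)."""
    found = {}
    if path is not None:
        for pattern, values in attributes.items():
            if fnmatch.fnmatch(path, pattern):
                found.update(values)
    return found


def show_blob(spec):
    """Return blob contents, converting line endings if the path calls for it."""
    target = resolve(spec)
    kind, data = objects[target.oid]
    if kind != "blob":
        raise ValueError(spec + " is a " + kind)
    if attrs_for(target.path).get("eol") == "crlf":
        data = data.replace(b"\n", b"\r\n")
    return data
```

Asking for `master:notes.txt` carries the path along and gets converted; asking for the bare blob name `b1` has no path, so no path-based filter can apply.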

-Peff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:11                       ` Daniel Barkalow
  2009-04-24 23:14                         ` Jeff King
@ 2009-04-24 23:26                         ` Michael Witten
  2009-04-25 18:55                           ` Daniel Barkalow
  2009-04-25  0:41                         ` David Abrahams
  2 siblings, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-04-24 23:26 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Jeff King, Johan Herland, git, David Abrahams, J. Bruce Fields

On Fri, Apr 24, 2009 at 18:11, Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Fri, 24 Apr 2009, Michael Witten wrote:
>
>> On Fri, Apr 24, 2009 at 17:18, Michael Witten <mfwitten@gmail.com> wrote:
>> > In fact, I think masking this kind of thing with a catch-all word
>> > 'reference' is a bad idea. Rather than being hidden, it should be
>> > exposed: I think it would be beneficial to use the word 'address'
>> > rather than 'reference' when talking about the SHA-1 names. Then HEAD
>> > could be called a pointer variable, etc.
>> >
>> > So, a pointer variable's value is an object address that is the
>> > location of an object in git 'memory'. I think using this approach
>> > would make things significantly more transparent.
>>
>> In fact, it's not particularly important that SHA-1 is used to compute
>> the address into git memory. The only thing that's important is that
>> the address is determined by content alone (I'm not even sure that
>> specifying that the address is a cryptographically sound hash of the
>> content is important; shouldn't that follow from the declaration that
>> it must be uniquely based on content alone?); the fact that it's a SHA-1
>> is purely an implementation detail, and so it shouldn't appear
>> prominently in the documentation.
>>
>> So, what do you say?
>>
>> Let's start a reformation of the git terminology to use analogies that
>> have been around since the dawn of computing: 'memory', 'address', and
>> 'pointer'.
>
> I actually think calling them "sha1s" is better, simply because this bit
> of jargon doesn't mean anything else (git deals with email, so "address"
> is overloaded).

I don't know if I buy that reason; the human brain is pretty good with context.

I would at least like 'location' better.

> And the term is already in use for this particular case,
> and it doesn't mean anything else at all (since, of course, the crypto
> thing is "SHA-1", not "sha1"), and it's short (which is important for
> making it easy to look at usage help).

What happens when SHA-1 is shown to be broken or there is a better
alternative? Then we'll see "sha1 for historical reasons"... bleh!
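As a concrete instance of an address "determined by content alone": git names a blob by hashing a short header (`blob`, a space, the byte length, a NUL byte) followed by the raw content. A minimal Python sketch; `blob_id` is a hypothetical helper name, and its result should match what `git hash-object` prints for a file with the same bytes.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object name git gives a blob: SHA-1 of a 'blob <size>' header, a NUL, then the bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Identical bytes always give the same name, in any repository.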

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:21                   ` Daniel Barkalow
  2009-04-24 23:25                     ` Jeff King
@ 2009-04-24 23:29                     ` Michael Witten
  2009-04-27  0:00                       ` Björn Steinbrink
  1 sibling, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-04-24 23:29 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Jeff King, Johan Herland, git, David Abrahams, J. Bruce Fields

On Fri, Apr 24, 2009 at 18:21, Daniel Barkalow <barkalow@iabervon.org> wrote:
> "git blame 1a2b3c:Makefile" worked despite this technically being
> incoherent.

It seems to work on my end, and it's perfectly coherent if you
consider git-blame to be overloaded to handle both pointers and
addresses (or references and object names, if you prefer).

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:14                         ` Jeff King
  2009-04-24 23:18                           ` Michael Witten
@ 2009-04-24 23:31                           ` Michael Witten
  2009-04-24 23:35                             ` Jeff King
  2009-04-25 10:18                           ` Felipe Contreras
  2 siblings, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-04-24 23:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 18:14, Jeff King <peff@peff.net> wrote:
> Junio suggested "object name" in another thread, which I think is nicely
> descriptive.

The reason I don't like "object name" is that "name" has connotations
that don't go well with the idea of referencing. Isn't "address" (or
"location") better in this sense?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:31                           ` Michael Witten
@ 2009-04-24 23:35                             ` Jeff King
  2009-04-25  0:19                               ` Michael Witten
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-24 23:35 UTC (permalink / raw)
  To: Michael Witten
  Cc: Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 06:31:26PM -0500, Michael Witten wrote:

> On Fri, Apr 24, 2009 at 18:14, Jeff King <peff@peff.net> wrote:
> > Junio suggested "object name" in another thread, which I think is nicely
> > descriptive.
> 
> The reason I don't like "object name" is that "name" has connotations
> that don't go well with the idea of referencing. Isn't "address" (or
> "location") better in this sense?

I'm not sure I agree, but if you are concerned with "name", then I think
something like "object id" or "object identifier" would probably be
better. "address" and "location" imply to me that they are part of a
contiguous set. And while technically they may be considered addresses
of a sparse 2^160 array, I'm not sure that explanation is really helping
new users understand what is going on.

What the user really cares about is that it is persistent and
unambiguous.
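Unambiguity also matters for the abbreviated names people type every day: a prefix is only usable while exactly one known object starts with it. A small Python sketch; `resolve_prefix` and the sample ids are hypothetical, and the four-character minimum is meant to mirror git's own lower bound on abbreviations.

```python
def resolve_prefix(prefix, known_ids, minimum=4):
    """Expand an abbreviated object name, refusing short or ambiguous prefixes."""
    prefix = prefix.lower()
    if len(prefix) < minimum:
        raise ValueError("prefix too short: " + prefix)
    matches = [oid for oid in known_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError("no object starts with " + prefix)
    if len(matches) > 1:
        raise ValueError("ambiguous prefix: " + prefix)
    return matches[0]


# Hypothetical set of object names in one repository.
known = [
    "ce013625030ba8dba906f756967f9e9ca394464a",
    "ce01f00d" + "0" * 32,
    "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
]
print(resolve_prefix("e69de29", known))  # unique, so it expands to the full name
```

A prefix that is unique today can become ambiguous as the repository grows, which is why full names are the persistent form.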

-Peff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:16                     ` Björn Steinbrink
@ 2009-04-25  0:01                       ` Michael Witten
  2009-04-25  0:48                         ` David Abrahams
  2009-05-02 15:53                         ` Björn Steinbrink
  0 siblings, 2 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-25  0:01 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

2009/4/24 Björn Steinbrink <B.Steinbrink@gmx.de>:
>> In fact, I think it's important to note that the notation:
>>
>>     git show master:Makefile
>>
>> actually involves a translation from a Unix filesystem address to a
>> git object address that is then used to find the relevant data.
>
> Hm? Resolving master:Makefile means to first find what master is, most
> likely the shortname for refs/heads/master. That usually references a
> commit object (by its name). The "<tree-ish>:<path>" syntax then causes
> git to lookup the tree referenced by that commit (again, by its name).
> And then the tree entry for "Makefile" is looked up, leading to the name
> for the object identified by "master:Makefile".

Firstly, your head is too bound to low-level implementation.

Secondly, you've basically just expounded upon what I said. The
Makefile part is for humans to write using a filesystem path (address)
that is mapped into what I call a git address. The point is that the
user is interfacing between two theories of content storage.

>> In fact, I think masking this kind of thing with a catch-all word
>> 'reference' is a bad idea.
>
> "master:Makefile" is not a reference. Just "master" is a shortname for a
> reference, the full name might be refs/heads/master.
>
> git has:
>  - object names (which happen to be SHA-1 hashes).
>  - references (which reference objects by their names)
>  - symbolic references (which reference other references by their names)
>
> The "<tree-ish>:<path>" syntax is not called "reference".

I will admit that I used this term wrongly then, and that git has a
set of terminologies much closer to what I proposed:

    * object addresses: object names
    * pointers: references
    * handle: symbolic reference (I don't know, I just now made that one up)

I was under the impression that object names were in fact called
references and that things like '[refs/heads/]master' were just
considered conveniences. I'm glad to have been disabused; though I
like my terms better ;-D

>> Rather than being hidden, it should be exposed: I think it would be
>> beneficial to use the word 'address' rather than 'reference' when
>> talking about the SHA-1 names. Then HEAD could be called a pointer
>> variable, etc.
>
> What's wrong with just calling the object name "object name"?

What's wrong with calling the object address "object address"?

As I've stated: "address", "pointer", and "handle" are an analogy to
terminology that has been around for ages. In fact, another name for
"pointer" is "reference".

> are something different, and the above "master:Makefile" is yet a
> different thing, using the "extended SHA1" syntax to identify an object.

It is certainly something different. It's an interface between
theories of content storage.

>> So, a pointer variable's value is an object address that is the
>> location of an object in git 'memory'. I think using this approach
>> would make things significantly more transparent.
>
> But then HEAD would be a pointer pointer variable (symbolic ref), unless
> you have a detached HEAD.

We call those handles.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 21:38                 ` Jeff King
  2009-04-24 22:18                   ` Michael Witten
  2009-04-24 23:21                   ` Daniel Barkalow
@ 2009-04-25  0:19                   ` David Abrahams
  2009-04-25  0:26                     ` Michael Witten
  2009-04-25  0:35                     ` Jeff King
  2 siblings, 2 replies; 90+ messages in thread
From: David Abrahams @ 2009-04-25  0:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Johan Herland, Michael Witten, git,
	J. Bruce Fields


On Apr 24, 2009, at 5:38 PM, Jeff King wrote:

> On Fri, Apr 24, 2009 at 05:34:00PM -0400, Daniel Barkalow wrote:
>
> >> I'd say that blobs and trees are an implementation detail of "the full
>> content of a version of the project", not something conceptually
>> important. Likewise, the date representation used in commits isn't
>
> I disagree. I think it's important to note that trees and blobs have a
> name, and you can refer to them. Once you know that, the fact that you
> can do:
>
>  git show master
>  git show master:Documentation
>  git show master:Makefile
>
> just makes sense. You are always just specifying an object, but the type
> is different for each (and show "does the right thing" based on object
> type).


I don't believe you need to know about trees and blobs to make sense  
of that.  Those are just directories and files.  The whole idea that  
trees are a more-general thing that could be used to represent  
something other than directory structure and blobs could be used to  
represent something other than file contents is way below most  
people's need-to-know threshold.

--
David Abrahams
BoostPro Computing
http://boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:35                             ` Jeff King
@ 2009-04-25  0:19                               ` Michael Witten
  0 siblings, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-25  0:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 18:35, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 24, 2009 at 06:31:26PM -0500, Michael Witten wrote:
>
>> On Fri, Apr 24, 2009 at 18:14, Jeff King <peff@peff.net> wrote:
>> > Junio suggested "object name" in another thread, which I think is nicely
>> > descriptive.
>>
>> The reason I don't like "object name" is that "name" has connotations
>> that don't go well with the idea of referencing. Isn't "address" (or
>> "location") better in this sense?
>
> I'm not sure I agree, but if you are concerned with "name", then I think
> something like "object id" or "object identifier" would probably be
> better. "address" and "location" imply to me that they are part of a
> contiguous set. And while technically they may be considered addresses
> of a sparse 2^160 array, I'm not sure that explanation is really helping
> new users understand what is going on.

You make an interesting point about implied contiguousness, but I
don't think any git operation is in danger of evoking that thought. I
mainly like the idea of "address" and "location", because they go
extremely well with "pointer", "handle" and the idea of a "git store
(memory)". Most importantly, this is an analogy that has been around a
long time.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-25  0:19                   ` David Abrahams
@ 2009-04-25  0:26                     ` Michael Witten
  2009-04-25  0:35                     ` Jeff King
  1 sibling, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-25  0:26 UTC (permalink / raw)
  To: David Abrahams
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, J. Bruce Fields

On Fri, Apr 24, 2009 at 19:19, David Abrahams <dave@boostpro.com> wrote:
>>  git show master
>>  git show master:Documentation
>>  git show master:Makefile
>>
>> just makes sense. You are always just specifying an object, but the type
>> is different for each (and show "does the right thing" based on object
>> type).
>
> I don't believe you need to know about trees and blobs to make sense of
> that.  Those are just directories and files.

I still think the key is that commits and blobs and trees are all
objects, and the important things are the concepts of objects, object
addresses, object pointers, and handles (or, what everyone else calls
objects, object names, references, and symbolic references).

Also, you've mixed in the theory of file system addressing in with the
theory of git addressing. I think it's important to realize that the
tool 'git show' is actually providing a translation between the two
worlds. There's not really any need for paths to be considered a
fundamental git concept; simply, git tools know how to translate
between both worlds.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-25  0:19                   ` David Abrahams
  2009-04-25  0:26                     ` Michael Witten
@ 2009-04-25  0:35                     ` Jeff King
  2009-04-25  0:53                       ` David Abrahams
  1 sibling, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-25  0:35 UTC (permalink / raw)
  To: David Abrahams
  Cc: Daniel Barkalow, Johan Herland, Michael Witten, git,
	J. Bruce Fields

On Fri, Apr 24, 2009 at 08:19:18PM -0400, David Abrahams wrote:

>>  git show master
>>  git show master:Documentation
>>  git show master:Makefile
>>
> I don't believe you need to know about trees and blobs to make sense of 
> that.  Those are just directories and files.  The whole idea that trees 
> are a more-general thing that could be used to represent something other 
> than directory structure and blobs could be used to represent something 
> other than file contents is way below most people's need-to-know
> threshold.

Actually, it is not the generality of trees that I think is interesting
there, but the generality of _objects_. That is, each of those things is
a first-class object, and has a unique name by which it can be referred.
The examples above are just _one_ of the ways you can refer to the same
objects.

-Peff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 22:45                       ` Björn Steinbrink
@ 2009-04-25  0:39                         ` David Abrahams
  2009-04-26 23:35                           ` Björn Steinbrink
  0 siblings, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-25  0:39 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: Jeff King, Michael Witten, J. Bruce Fields, git


On Apr 24, 2009, at 6:45 PM, Björn Steinbrink wrote:

>> I think UI/API works way better than porcelain/plumbing. We are, after
>> all, programmers.
>
> We are programmers, but not all git users are programmers.

I'm sure you will admit that the vast majority are programmers.  This  
is about speaking effectively to your primary audience.

>> It would also be good to link to a definition any time you use a term
>> of art in the docs. I would even do that in the case of UI/API since
>> the distinction could appear to be subtle.
>>
>> I should also say, most of the docs and interfaces I see in Git (and
>> its wrappers, web interfaces, etc.) give the SHA1 hashes way too much
>> exposure. The times when it's actually more convenient to use a hash
>> instead of one of the other notations are rare,
>
> How often do you need a name for a commit shown by a command and can
> accept that it is not stable?

I can accept it as long as it's stable inside my own repo.  Maybe I  
need the SHA1 to talk about it wherever it may roam.  I think you  
could count in the other direction (i.e. from the roots instead of the  
leaves) to get fairly stable symbolic names.

Also, I don't think I need to see the hashes for trees and blobs most  
of the time.

> I usually need a name because I
> want to reference that commit later on, either because I need to talk to
> other users, or because I'm working on something and might need to look
> at that commit now and then, regardless on my current state of things.
> One big exception in my workflow is when I use "git blame", then I
> usually just need the name once to look at the full commit. But then I
> prefer a 7-8 characters long sha-1 prefix to something like
> improve_foo_speed~132^12~1^3. And "pseudo-stable" numbers have been
> discussed to death.

Okay, I "say uncle."

>> and if hashes weren't so exposed I bet most interfaces would make
>> those other names more available. One reason I think hashes retain
>> their prominent exposure is that you have no other reasonably stable
>> way of referring to commits, since branch~NN counts backward from
>> HEAD. Adding such a thing would help.
>
> It counts backwards from "branch".

Right, thanks.
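For reference, `branch~N` follows first parents N steps back from wherever `branch` currently points, while `branch^N` picks the N-th parent of a merge (and `^0` is the commit itself). A toy Python model with a made-up history; the `parents` map and the `ancestor`/`parent` helpers are hypothetical. Note how moving the branch changes what `branch~2` means, which is why these names are not stable.

```python
# Hypothetical history: parent lists, first parent first. "m" merges "x" into "c".
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "x": ["b"],
    "m": ["c", "x"],
}
refs = {"branch": "m"}


def ancestor(rev, n):
    """'<rev>~<n>': follow first parents n times, starting from where rev points now."""
    oid = refs.get(rev, rev)
    for _ in range(n):
        oid = parents[oid][0]  # IndexError past a root commit; fine for a sketch
    return oid


def parent(rev, n):
    """'<rev>^<n>': the n-th parent of rev; '^0' means the commit itself."""
    oid = refs.get(rev, rev)
    return oid if n == 0 else parents[oid][n - 1]


print(ancestor("branch", 2))  # b
print(parent("branch", 2))    # x

# Commit "n" on top of the branch: the same spelling now names a different commit.
parents["n"] = ["m"]
refs["branch"] = "n"
print(ancestor("branch", 2))  # c
```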

>> Oh, one other specific issue: the rev-parse manpage uses $GIT_DIR
>> without saying what it is. I *think* that means the root of the
>> working copy and has nothing to do with environment variables, but
>> it's hard to be sure, and if I'm right about that, it's misleading
>> notation.
>
> $GIT_DIR means the .git directory of a non-bare repo.


Thanks for clarifying.  But don't neglect to fix the docs so the next  
guy doesn't have to ask ;-)
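For what it's worth, $GIT_DIR names the repository directory itself (the `.git` directory in a non-bare repository, the repository root in a bare one), and GIT_DIR is also an environment variable that overrides where git looks for it. A simplified Python sketch of that discovery; `find_git_dir` is hypothetical and deliberately ignores bare repositories, `.git` files, and GIT_CEILING_DIRECTORIES.

```python
import os
from typing import Mapping, Optional


def find_git_dir(start: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Simplified repository discovery (hypothetical helper).

    A set GIT_DIR wins outright; otherwise walk upward from `start` looking
    for a `.git` directory. Bare repositories, `.git` files and
    GIT_CEILING_DIRECTORIES are deliberately ignored.
    """
    if environ.get("GIT_DIR"):
        return os.path.abspath(environ["GIT_DIR"])
    current = os.path.abspath(start)
    while True:
        candidate = os.path.join(current, ".git")
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent
```

Run from a subdirectory of a working tree, the walk stops at the first `.git` it meets on the way up.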

BTW, "[non-]bare repo" is yet another Git-specific jargon.  I know  
what it means... again, only because I asked someone.

--
David Abrahams
BoostPro Computing
http://boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:11                       ` Daniel Barkalow
  2009-04-24 23:14                         ` Jeff King
  2009-04-24 23:26                         ` Michael Witten
@ 2009-04-25  0:41                         ` David Abrahams
  2 siblings, 0 replies; 90+ messages in thread
From: David Abrahams @ 2009-04-25  0:41 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Michael Witten, Jeff King, Johan Herland, git, J. Bruce Fields


On Apr 24, 2009, at 7:11 PM, Daniel Barkalow wrote:

> I actually think calling them "sha1s" is better, simply because this bit
> of jargon doesn't mean anything else (git deals with email, so "address"
> is overloaded). And the term is already in use for this particular case,
> and it doesn't mean anything else at all (since, of course, the crypto
> thing is "SHA-1", not "sha1"), and it's short (which is important for
> making it easy to look at usage help).


The word "hash" would be an improvement.

--
David Abrahams
BoostPro Computing
http://boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-25  0:01                       ` Michael Witten
@ 2009-04-25  0:48                         ` David Abrahams
  2009-04-26 22:42                           ` Björn Steinbrink
  2009-05-02 15:53                         ` Björn Steinbrink
  1 sibling, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-25  0:48 UTC (permalink / raw)
  To: Michael Witten
  Cc: Björn Steinbrink, Jeff King, Daniel Barkalow, Johan Herland,
	git, J. Bruce Fields


On Apr 24, 2009, at 8:01 PM, Michael Witten wrote:

>> What's wrong with just calling the object name "object name"?
>
> What's wrong with calling the object address "object address"?


Neither captures the connection to the object's contents.  I think  
"value ID" would be closer, but it's probably too horrible.

--
David Abrahams
BoostPro Computing
http://boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-25  0:35                     ` Jeff King
@ 2009-04-25  0:53                       ` David Abrahams
  2009-04-29  6:34                         ` Jeff King
  0 siblings, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-25  0:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Johan Herland, Michael Witten, git,
	J. Bruce Fields


On Apr 24, 2009, at 8:35 PM, Jeff King wrote:

> On Fri, Apr 24, 2009 at 08:19:18PM -0400, David Abrahams wrote:
>
>>> git show master
>>> git show master:Documentation
>>> git show master:Makefile
>>>
>> I don't believe you need to know about trees and blobs to make sense of
>> that.  Those are just directories and files.  The whole idea that trees
>> are a more-general thing that could be used to represent something other
>> than directory structure and blobs could be used to represent something
>> other than file contents is way below most people's need-to-know
>> threshold.
>
> Actually, it is not the generality of trees that I think is interesting
> there, but the generality of _objects_. That is, each of those things is
> a first-class object, and has a unique name by which it can be referred.


I'm sorry, but I think most people would find that so unremarkable  
that making a big deal about it would lead to "what am I missing here"  
confusion.  Maybe a person who's exclusively used CVS (or older)  
technologies before coming to Git would be happy to know that, but  
it's sort of obvious.  In CVS the lack of first-class directories  
sticks out like a sore thumb.

--
David Abrahams
BoostPro Computing
http://boostpro.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 23:14                         ` Jeff King
  2009-04-24 23:18                           ` Michael Witten
  2009-04-24 23:31                           ` Michael Witten
@ 2009-04-25 10:18                           ` Felipe Contreras
  2 siblings, 0 replies; 90+ messages in thread
From: Felipe Contreras @ 2009-04-25 10:18 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Michael Witten, Johan Herland, git,
	David Abrahams, J. Bruce Fields

On Sat, Apr 25, 2009 at 2:14 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 24, 2009 at 07:11:40PM -0400, Daniel Barkalow wrote:
>
>> > Let's start a reformation of the git terminology to use analogies that
>> > have been around since the dawn of computing: 'memory', 'address', and
>> > 'pointer'.
>>
>> I actually think calling them "sha1s" is better, simply because this bit
>> of jargon doesn't mean anything else (git deals with email, so "address"
>> is overloaded). And the term is already in use for this particular case,
>> and it doesn't mean anything else at all (since, of course, the crypto
>> thing is "SHA-1", not "sha1"), and it's short (which is important for
>> making it easy to look at usage help).
>
> Junio suggested "object name" in another thread, which I think is nicely
> descriptive.

It's not a name, it's an identification, so how about "id"? You have
tree ids, commit ids, blob ids, and so on.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [doc] User Manual Suggestion
  2009-04-24 18:52                     ` J. Bruce Fields
@ 2009-04-25 10:35                       ` Felipe Contreras
  0 siblings, 0 replies; 90+ messages in thread
From: Felipe Contreras @ 2009-04-25 10:35 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Abrahams, Michael Witten, Jeff King, git

On Fri, Apr 24, 2009 at 9:52 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> That would be great, thanks.  Several people have gone off and posted
> their own tutorials someplace, and that's fine, but it would be
> especially helpful if you could contribute to the actual Documentation/
> directory.  That may mean arguing with people and making compromises.
> But it also means the results will be distributed with git, will be
> integrated with other git documentation, and will get first-class
> technical review.
>
> I'd also encourage incrementally improving existing documentation where
> possible instead of starting over from scratch.  But having broken that
> rule myself a couple times I'm hardly in a position to insist.  If you
> must start over, at least think about how to replace or fit it in with
> existing documentation.

People will continue to write git documentation from scratch because
there is a huge gap from the top-bottom approach to a point where you
actually "get git", and people are trying to find short-cuts so that
other people can really get it too.

I spent years using git simply repeating the templates I had seen in
multiple places until I stumbled upon "git from the bottom up" and
then I finally understood the beauty and simplicity of git's design.
From that point I understood why many commands didn't do what I
expected.

Note that "bottom" doesn't mean plumbing; "plumbing" usually refers to
the git.git tools, but you can work with git's low-level
objects through your own implementation as people like Scott Chacon
have indeed done (git-ruby). "bottom" then means git basic building
blocks: blobs, trees, commits, refs.
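Those building blocks are small enough to show directly. This Python sketch serializes a flat tree the way git lays one out (each entry is `<mode> <name>`, a NUL byte, then the 20 raw bytes of the entry's object name, sorted by name) and hashes it with the same header scheme git uses for every object type. The helper names are hypothetical, and only regular files (mode 100644) are handled.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Name of any git object: SHA-1 of '<kind> <size>', a NUL byte, then the body."""
    header = b"%s %d\x00" % (kind.encode(), len(body))
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize a flat tree of regular files (mode 100644 only).

    `entries` maps file name -> hex blob id. Each entry is
    '100644 <name>', a NUL byte, then the 20 raw bytes of the blob id,
    with entries sorted by name.
    """
    out = b""
    for name in sorted(entries, key=lambda n: n.encode()):
        out += b"100644 " + name.encode() + b"\x00" + bytes.fromhex(entries[name])
    return out


empty_blob = object_id("blob", b"")
print(object_id("tree", tree_body({})))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree
print(object_id("tree", tree_body({"README": empty_blob, "Makefile": empty_blob})))
```

Because a tree's name depends only on its entries, the same set of files yields the same tree id no matter the order in which they were added.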

Ideally the UI should expose the basic concepts of git, but instead
it is hiding them, so no wonder people *need* special documentation
to 'understand git conceptually', or learn 'git from the bottom up',
etc.

-- 
Felipe Contreras

* Re: [doc] User Manual Suggestion
  2009-04-24 23:26                         ` Michael Witten
@ 2009-04-25 18:55                           ` Daniel Barkalow
  2009-04-25 19:16                             ` Michael Witten
  0 siblings, 1 reply; 90+ messages in thread
From: Daniel Barkalow @ 2009-04-25 18:55 UTC (permalink / raw)
  To: Michael Witten
  Cc: Jeff King, Johan Herland, git, David Abrahams, J. Bruce Fields

On Fri, 24 Apr 2009, Michael Witten wrote:

> > And the term is already in use for this particular case,
> > and it doesn't mean anything else at all (since, of course, the crypto
> > thing is "SHA-1", not "sha1"), and it's short (which is important for
> > making it easy to look at usage help).
> 
> What happens when SHA-1 is shown to be broken or there is a better
> alternative? Then we'll see "sha1 for historical reasons"... bleh!

Why do you think SHA-1 has anything to do with it? Git's sha1s could just 
as easily be 160 bits of a SHA-256 hash and there wouldn't be any 
user-visible difference. The term doesn't imply any particular significant 
connection to a particular algorithm. It could be like "pencil lead", 
which has never been made of lead, but is called that for no particularly 
important reason.

	-Daniel
*This .sig left intentionally blank*

* Re: [doc] User Manual Suggestion
  2009-04-25 18:55                           ` Daniel Barkalow
@ 2009-04-25 19:16                             ` Michael Witten
  2009-04-25 19:24                               ` Felipe Contreras
  0 siblings, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-04-25 19:16 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Jeff King, Johan Herland, git, David Abrahams, J. Bruce Fields

On Sat, Apr 25, 2009 at 13:55, Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Fri, 24 Apr 2009, Michael Witten wrote:
>
>> > And the term is already in use for this particular case,
>> > and it doesn't mean anything else at all (since, of course, the crypto
>> > thing is "SHA-1", not "sha1"), and it's short (which is important for
>> > making it easy to look at usage help).
>>
>> What happens when SHA-1 is shown to be broken or there is a better
>> alternative? Then we'll see "sha1 for historical reasons"... bleh!
>
> Why do you think SHA-1 has anything to do with it?

Well, it's named sha1.

> Git's sha1s could just
> as easily be 160 bits of a SHA-256 hash and there wouldn't be any
> user-visible difference. The term doesn't imply any particular significant
> connection to a particular algorithm.

Then give it a generic name like 'hash'.

> It could be like "pencil lead", which has never been made of lead,
> but is called that for no particularly important reason.

Hence the perennial:

    "Hey! Did you know that pencil lead isn't lead at all?"

to which someone might respond:

    "Why do you think lead has anything to do with it?"

Look familiar?

* Re: [doc] User Manual Suggestion
  2009-04-25 19:16                             ` Michael Witten
@ 2009-04-25 19:24                               ` Felipe Contreras
  2009-04-25 19:36                                 ` David Abrahams
  0 siblings, 1 reply; 90+ messages in thread
From: Felipe Contreras @ 2009-04-25 19:24 UTC (permalink / raw)
  To: Michael Witten
  Cc: Daniel Barkalow, Jeff King, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On Sat, Apr 25, 2009 at 10:16 PM, Michael Witten <mfwitten@gmail.com> wrote:
> On Sat, Apr 25, 2009 at 13:55, Daniel Barkalow <barkalow@iabervon.org> wrote:
>> On Fri, 24 Apr 2009, Michael Witten wrote:
>>
>>> > And the term is already in use for this particular case,
>>> > and it doesn't mean anything else at all (since, of course, the crypto
>>> > thing is "SHA-1", not "sha1"), and it's short (which is important for
>>> > making it easy to look at usage help).
>>>
>>> What happens when SHA-1 is shown to be broken or there is a better
>>> alternative? Then we'll see "sha1 for historical reasons"... bleh!
>>
>> Why do you think SHA-1 has anything to do with it?
>
> Well, it's named sha1.
>
>> Git's sha1s could just
>> as easily be 160 bits of a SHA-256 hash and there wouldn't be any
>> user-visible difference. The term doesn't imply any particular significant
>> connection to a particular algorithm.
>
> Then give it a generic name like 'hash'.

For most purposes in the documentation sha1's are used as ids, so why
not use "id" instead? Like 'commit id'. The fact that the id is also
a hash sum is hardly relevant for the user.

-- 
Felipe Contreras

* Re: [doc] User Manual Suggestion
  2009-04-25 19:24                               ` Felipe Contreras
@ 2009-04-25 19:36                                 ` David Abrahams
  2009-04-25 20:53                                   ` Felipe Contreras
  2009-04-26 11:28                                   ` Björn Steinbrink
  0 siblings, 2 replies; 90+ messages in thread
From: David Abrahams @ 2009-04-25 19:36 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Michael Witten, Daniel Barkalow, Jeff King, Johan Herland, git,
	J. Bruce Fields


On Apr 25, 2009, at 3:24 PM, Felipe Contreras wrote:

> On Sat, Apr 25, 2009 at 10:16 PM, Michael Witten <mfwitten@gmail.com> wrote:
>> On Sat, Apr 25, 2009 at 13:55, Daniel Barkalow <barkalow@iabervon.org> wrote:
>>> On Fri, 24 Apr 2009, Michael Witten wrote:
>>>
>>>>> And the term is already in use for this particular case,
>>>>> and it doesn't mean anything else at all (since, of course, the crypto
>>>>> thing is "SHA-1", not "sha1"), and it's short (which is important for
>>>>> making it easy to look at usage help).
>>>>
>>>> What happens when SHA-1 is shown to be broken or there is a better
>>>> alternative? Then we'll see "sha1 for historical reasons"... bleh!
>>>
>>> Why do you think SHA-1 has anything to do with it?
>>
>> Well, it's named sha1.
>>
>>> Git's sha1s could just
>>> as easily be 160 bits of a SHA-256 hash and there wouldn't be any
>>> user-visible difference. The term doesn't imply any particular significant
>>> connection to a particular algorithm.
>>
>> Then give it a generic name like 'hash'.
>
> For most purposes in the documentation sha1's are used as ids, so why
> not use "id" instead? Like 'commit id'. The fact that the id is also
> a hash sum is hardly relevant for the user.


Where it's relevant is when the user notices that two distinct files
have the same id (because they happen to have the same contents) and
wonders what's up.

It's not a foregone conclusion that objects with the same value have  
identical ids, but it's immediately apparent if the id is known to be  
a hash.

--
David Abrahams
BoostPro Computing
http://boostpro.com

* Re: [doc] User Manual Suggestion
  2009-04-25 19:36                                 ` David Abrahams
@ 2009-04-25 20:53                                   ` Felipe Contreras
  2009-04-26 11:28                                   ` Björn Steinbrink
  1 sibling, 0 replies; 90+ messages in thread
From: Felipe Contreras @ 2009-04-25 20:53 UTC (permalink / raw)
  To: David Abrahams
  Cc: Michael Witten, Daniel Barkalow, Jeff King, Johan Herland, git,
	J. Bruce Fields

On Sat, Apr 25, 2009 at 10:36 PM, David Abrahams <dave@boostpro.com> wrote:
>
> On Apr 25, 2009, at 3:24 PM, Felipe Contreras wrote:
> Where it's relevant is when the user notices that two distinct files have
> the same id (because they happen to have the same contents) and wonders
> what's up.
>
> It's not a foregone conclusion that objects with the same value have
> identical ids, but it's immediately apparent if the id is known to be a
> hash.

That's true.

hash +1

-- 
Felipe Contreras

* Re: [doc] User Manual Suggestion
  2009-04-25 19:36                                 ` David Abrahams
  2009-04-25 20:53                                   ` Felipe Contreras
@ 2009-04-26 11:28                                   ` Björn Steinbrink
  2009-04-26 13:55                                     ` David Abrahams
  2009-04-26 16:36                                     ` Michael Witten
  1 sibling, 2 replies; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-26 11:28 UTC (permalink / raw)
  To: David Abrahams
  Cc: Felipe Contreras, Michael Witten, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields

On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
> Where it's relevant is when the user notices that two distinct files have
> the same id (because they happen to have the same contents) and wonders
> what's up.

Why would the user have to care about the object files in the repo? And
why would your implementation save the same object twice, in two
distinct files? The SHA-1 hash is created from the object, that is, from
its type, size and data. It's not an id of a file in the working
tree, but of an object.
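To make "type, size and data" concrete, here is a minimal Python sketch, assuming git's classic SHA-1 object format. `object_id` is a hypothetical helper. It shows that the working-tree path never enters the hash, while the object type does.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    # Hashed bytes: "<type> <size>\0" followed by the raw data.
    return hashlib.sha1(f"{obj_type} {len(data)}\0".encode() + data).hexdigest()


# Two working-tree files with different paths but identical content.
files = {"src/a.txt": b"same bytes\n", "doc/b.txt": b"same bytes\n"}
ids = {path: object_id("blob", content) for path, content in files.items()}

# The path is not part of what gets hashed, so both files get one blob id.
print(ids["src/a.txt"] == ids["doc/b.txt"])  # True

# The type is part of the header, so the same bytes under another type
# hash differently (this is not a valid tree; it only shows the prefix matters).
print(object_id("blob", b"same bytes\n") == object_id("tree", b"same bytes\n"))  # False

# An empty file gives the well-known empty-blob id.
print(object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```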

> It's not a foregone conclusion that objects with the same value have  
> identical ids, but it's immediately apparent if the id is known to be a 
> hash.

You can't have two objects with the same contents to begin with, same
content => same object.  You can just have that one object stored
multiple times in different places (for sane implementations this likely
means that you have more than one repo to look at, and each has its own
copy of that object, but that's nothing you as a user should have to
care about).

It's an identity relation: same name/id => same object. Unlike e.g. a
hash-table where you are expected to deal with collisions, and having
the same hash doesn't mean that you have identical data. But that's not
true of git, it expects an identity relation, which is IMHO better
expressed through "object name" or "object id". You can still say that
the name/id is generated by using a hash function, but the important
part is that the name/id is used to _uniquely_ identify an object, which
isn't apparent when you call it a hash.
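The identity relation can be sketched as a toy content-addressed store in Python. `ObjectStore` is hypothetical and ignores compression and on-disk layout. It only shows that the key is derived from the value, so storing the same content twice yields a single object.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: the key is computed from the value."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, obj_type: str, data: bytes) -> str:
        raw = f"{obj_type} {len(data)}\0".encode() + data
        oid = hashlib.sha1(raw).hexdigest()
        # Identical content maps to the same key, so nothing new is stored.
        self._objects.setdefault(oid, raw)
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put("blob", b"hello\n")
second = store.put("blob", b"hello\n")
print(first == second, len(store))  # True 1
```

Because `put` uses `setdefault`, a different content that happened to produce the same id would silently map to the existing object. That mirrors the point that even with a collision you would still have just one object.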

Björn

* Re: [doc] User Manual Suggestion
  2009-04-26 11:28                                   ` Björn Steinbrink
@ 2009-04-26 13:55                                     ` David Abrahams
  2009-04-26 17:56                                       ` Björn Steinbrink
  2009-04-26 16:36                                     ` Michael Witten
  1 sibling, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-26 13:55 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Felipe Contreras, Michael Witten, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields


On Apr 26, 2009, at 7:28 AM, Björn Steinbrink wrote:

> On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
>> Where it's relevant is when the user notices that two distinct files
>> have the same id (because they happen to have the same contents) and
>> wonders what's up.
>
> Why would the user have to care about the object files in the repo?

What a strange question.  I have no idea how to answer.  It seems
self-evident to me that users of a VCS care that their files are
stored in it.

> And
> why would your implementation save the same object twice, in two
> distinct files?

One could easily have the expectation that contents can be duplicated  
because there are numerous precedents in everyone's experience of  
computing, for example in filesystems and in any programming language  
that is not pure-functional.

> The SHA-1 hash is created from the object, that is, from
> its type, size and data. It's not an id of a file in the working
> tree, but of an object.

All true.  All somewhat subtle distinctions that are not nearly as  
apparent unless you actually use the word "hash" as I have been  
advocating.

>> It's not a foregone conclusion that objects with the same value have
>> identical ids, but it's immediately apparent if the id is known to be a
>> hash.
>
> You can't have two objects with the same contents to begin with, same
> content => same object.

In the Git world, I agree.  In general, I disagree.  The fact that it is
so in the Git world is reinforced by the notion that the id of an
object is a hash of its contents.

> You can just have that one object stored
> multiple times in different places (for sane implementations this likely
> means that you have more than one repo to look at, and each has its own
> copy of that object, but that's nothing you as a user should have to
> care about).

> It's an identity relation: same name/id => same object. Unlike e.g. a
> hash-table where you are expected to deal with collisions, and having
> the same hash doesn't mean that you have identical data.  But that's
> not true of git, it expects an identity relation, which is IMHO better
> expressed through "object name" or "object id".

Yes, that's true in the Git world (though not necessarily elsewhere),
or at least you hope it is.  In fact, there's no guarantee that SHA1
collisions won't occur; it's just extremely unlikely.  If you google
it you can find some interesting papers about SHA1 collisions.

Another way to express what you wrote above:

    same id => same hash ?=> same contents => same object

where ?=> means "almost certainly implies."  What you left out was the  
implication in the other direction, which is a true guarantee at all  
steps, and "hash" is well-understood to mean

    same contents => same hash

> You can still say that
> the name/id is generated by using a hash function, but the important
> part is that the name/id is used to _uniquely_ identify an object,
> which isn't apparent when you call it a hash.


I think the implication is important in both directions.  Neither one  
is self-evident to a new user.  Maybe the right answer is 'hash id'.

--
David Abrahams
BoostPro Computing
http://boostpro.com

* Re: [doc] User Manual Suggestion
  2009-04-26 11:28                                   ` Björn Steinbrink
  2009-04-26 13:55                                     ` David Abrahams
@ 2009-04-26 16:36                                     ` Michael Witten
  2009-04-26 18:12                                       ` Björn Steinbrink
  1 sibling, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-04-26 16:36 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: David Abrahams, Felipe Contreras, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields

2009/4/26 Björn Steinbrink <B.Steinbrink@gmx.de>:
> On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
> >> Where it's relevant is when the user notices that two distinct files
> >> have the same id (because they happen to have the same contents) and
> >> wonders what's up.
>...
> And why would your implementation save the same object twice, in two
> distinct files?

This question makes me think that you don't understand the parent's
point. He's not talking about implementation details; in fact, there's
no reason to mix the git world and the file system world at all in
this discussion.

David is pointing out that a user might notice that two different
trees list the same blob. This can be startling if you have an incomplete
picture of what's going on.

From a practical point of view, you might argue that not too many
people are looking at trees and blobs; however, it seems to me that
most people are afraid to use any of git's most useful features
precisely because they don't understand the git model and they don't
understand that nothing is ever lost unless you explicitly clean up
unreferenced objects---they don't see how easy it is to manipulate their
repos. I argue that if they are given the full knowledge of git's
concepts, then they will be able to reason about their repo actions
with confidence, even if they only work with commits.
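The "nothing is lost unless you clean up unreferenced objects" point is essentially a reachability question. Here is a simplified Python sketch over a hypothetical toy graph, where short labels stand in for real object names. Real git also treats reflog entries as roots and waits for an expiry period before pruning; this sketch ignores both.

```python
# Hypothetical toy object graph: each id maps to the ids it references.
graph: dict[str, list[str]] = {
    "c2": ["t2", "c1"],  # commit c2: tree t2, parent c1
    "c1": ["t1"],        # commit c1: tree t1
    "t2": ["b1", "b2"],
    "t1": ["b1"],
    "b1": [],
    "b2": [],
    "c9": ["t9"],        # a commit that no ref points at any more
    "t9": ["b2"],
}
refs = {"refs/heads/master": "c2"}


def reachable(graph: dict[str, list[str]], roots) -> set[str]:
    """Mark every object reachable from the given roots (iterative DFS)."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph[oid])
    return seen


live = reachable(graph, refs.values())
unreferenced = set(graph) - live
print(sorted(unreferenced))  # ['c9', 't9']
```

Until something actually prunes them, `c9` and `t9` are still in the store, and pointing a ref back at `c9` makes them reachable again.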

I think the key is to stress in the documentation the idea that there
are 2 separate worlds (the git object world and the working
directory's file system world) and that the git tools provide an
interface between them; this seems like a small and unnecessarily
academic point, but I believe that it's important to working with
confidence.

> ...
> You can't have two objects with the same contents to begin with, same
> content => same object.  You can just have that one object stored
> multiple times in different places (for sane implementations this likely
> means that you have more than one repo to look at, and each has its own
> copy of that object, but that's nothing you as a user should have to
> care about).

Indeed it's nothing you should care about. It's an implementation
detail again; theoretically, every repo is in the same git world where
all git objects are stored---in a sense, a particular repo state is
itself an object of this world.

> It's an identity relation: same name/id => same object. Unlike e.g. a
> hash-table where you are expected to deal with collisions, and having
> the same hash doesn't mean that you have identical data.

However, having the same *cryptographic* hash does mean that you have
identical data.

The overall point is this: The documentation should force people to
learn the right ideas, so that they can have confidence to think
beyond blog-post templates for using git.

* Re: [doc] User Manual Suggestion
  2009-04-26 13:55                                     ` David Abrahams
@ 2009-04-26 17:56                                       ` Björn Steinbrink
  2009-04-26 20:17                                         ` David Abrahams
  0 siblings, 1 reply; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-26 17:56 UTC (permalink / raw)
  To: David Abrahams
  Cc: Felipe Contreras, Michael Witten, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields

On 2009.04.26 09:55:34 -0400, David Abrahams wrote:
>
> On Apr 26, 2009, at 7:28 AM, Björn Steinbrink wrote:
>
>> On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
>>> Where it's relevant is when the user notices that two distinct files
>>> have the same id (because they happen to have the same contents) and
>>> wonders what's up.
>>
>> Why would the user have to care about the object files in the repo?
>
> What a strange question. I have no idea how to answer. It seems
> self-evident to me that users of a VCS care that their files are
> stored in it.

_Their_ files. The files that come from/end up in the working tree. I
cared about those when I used SVN, too. But I never went to the SVN repo
to find out if there are two equal files in it. We're talking about
object names, and those belong to objects, not files in the working
tree.

>> And why would your implementation save the same object twice, in two
>> distinct files?
>
> One could easily have the expectation that contents can be duplicated  
> because there are numerous precedents in everyone's experience of  
> computing, for example in filesystems and in any programming language  
> that is not pure-functional.

That's not answering my question. I asked why you come up with an
implementation that is "broken" enough to save the same object twice
with different file names.  If the implementation does not do that, your
"when the user notices that two distinct files have the same id" is
immediately invalid. The user cannot come into that situation then. And
anyway, when the user notices something, that's a discovery, not an
expectation.

>> The SHA-1 hash is created from the object, that is, from
>> its type, size and data. It's not an id of a file in the working
>> tree, but of an object.
>
> All true.  All somewhat subtle distinctions that are not nearly as  
> apparent unless you actually use the word "hash" as I have been  
> advocating.

Huh? How does saying "object hash" instead of "object id" make it any
more apparent that a file in the working tree is something other than a
git object?

>>> It's not a foregone conclusion that objects with the same value have
>>> identical ids, but it's immediately apparent if the id is known to be a
>>> hash.
>>
>> You can't have two objects with the same contents to begin with, same
>> content => same object.
>
> In the Git world, I agree.  In general, I disagree.

I don't think we're discussing a term to describe something that
identifies an object in general. So, "in general" you can disagree as
much as you want, but for git that doesn't matter at all.

> The fact that it is so in the Git world is reinforced by the notion that
> the id of an object is a hash of its contents.
>
>> You can just have that one object stored multiple times in different
>> places (for sane implementations this likely means that you have
>> more than one repo to look at, and each has its own copy of that
>> object, but that's nothing you as a user should have to care about).
>
>> It's an identity relation: same name/id => same object. Unlike e.g. a
>> hash-table where you are expected to deal with collisions, and having
>> the same hash doesn't mean that you have identical data.  But that's
>> not true of git, it expects an identity relation, which is IMHO
>> better expressed through "object name" or "object id".
>
> Yes, that's true in the Git world (though not necessarily elsewhere), or
> at least you hope it is.  In fact, there's no guarantee that SHA1
> collisions won't occur; it's just extremely unlikely.  If you google
> it you can find some interesting papers about SHA1 collisions.

Sure, it's an assumption that has been made and is required to hold true
for git to work.

> Another way to express what you wrote above:
>
>    same id => same hash ?=> same contents => same object
>
> where ?=> means "almost certainly implies."

No, that chain shows how git could be "unreliable" when you get hash
collisions. You could put that into a chapter that explains the
implications of the way git generates its object ids. But it's not very
interesting when you use git and (implicitly) trust the assumption that
no collisions happen.

For that case, you need a different chain:

same name/id ==> same object ==> same content

That's interesting when you e.g. want to "access" some object or when
you look at a tree that references the same object twice. For example
when both references are for file entries, you know that those files
have the same content. That it is a hash doesn't matter, the id could be
anything that uniquely identifies an object. The "same object ==> same
content" part should be pretty obvious, so you only need to know that
the "same name/id ==> same object" part is true, i.e. that the object
name/id uniquely identifies the object. And that _is_ true, simply
because you cannot have two objects in the same repo that have the same
hash and thus the same id. Even if you get a collision, you'll still
have just one object.  And that's not something that a term that
contains the word "hash" is telling me, it would instead tell me that it
is not something that really uniquely identifies an object, although git
uses it as such.
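The "tree that references the same object twice" case can be sketched in Python, assuming git's binary tree-entry format ("<mode> <name>\0" followed by the 20 raw id bytes). The helpers are hypothetical. Sorting by plain name is only correct for file entries; git orders subdirectories as if their names ended in "/".

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}\0".encode() + data).hexdigest()


def tree_body(entries) -> bytes:
    """entries: (mode, name, hex_id) tuples; file-only trees sort by name."""
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode()):
        out += f"{mode} {name}\0".encode() + bytes.fromhex(oid)
    return out


def parse_tree(body: bytes):
    """Split a tree body back into (mode, name, hex_id) tuples."""
    entries, i = [], 0
    while i < len(body):
        nul = body.index(b"\0", i)
        mode, name = body[i:nul].decode().split(" ", 1)
        # The 20 id bytes may themselves contain NULs, so slice by length.
        entries.append((mode, name, body[nul + 1 : nul + 21].hex()))
        i = nul + 21
    return entries


license_id = object_id("blob", b"MIT\n")
body = tree_body([
    ("100644", "LICENSE", license_id),
    ("100644", "COPYING", license_id),
    ("100644", "README", object_id("blob", b"hi\n")),
])

# Same id in two entries => same object => the two files have the same content.
by_id: dict[str, list[str]] = {}
for _mode, name, oid in parse_tree(body):
    by_id.setdefault(oid, []).append(name)
dupes = [names for names in by_id.values() if len(names) > 1]
print(dupes)  # [['COPYING', 'LICENSE']]
```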


Only when you want to explain how git manages to avoid duplicated
storage of fully identical contents, then you need to mention that the
object names are the hashes of the full object contents. But that's not
what you actually use the object names for.

same content ==> same content hash ==> object name/id ==> same object

(Actually, you need an additional detail: "same
file/symlink/directory/... contents ==> same object contents", which
can't be made explicit by just saying that you use a hash).

Your chain was in the wrong order and explains neither the "a tree that
has the same object name/id for two entries" case (because of the
uncertainty of the "same hash ?=> same content" part), nor, when read
in the other direction, where all implications are true, why same
content leads to the same object (as it already starts at the object
level).

>> You can still say that the name/id is generated by using a hash
>> function, but the important part is that the name/id is used to
>> _uniquely_ identify an object, which isn't apparent when you call it
>> a hash.
>
> I think the implication is important in both directions.  Neither one is 
> self-evident to a new user.  Maybe the right answer is 'hash id'.

git could work differently. Just moving the storage of the filenames from
the tree objects to the blobs would mean that you'd get different
objects for files that have the same content but different names. You'd
still have a hash of the object contents as the object name, but
suddenly you get more objects. Just saying "hash" or "hash id" doesn't
magically explain all the other things.
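A minimal Python sketch of that thought experiment. The "name\0content" encoding for the alternative layout is invented purely for comparison; git itself keeps names in tree entries, not in blobs.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}\0".encode() + data).hexdigest()


content = b"same bytes\n"
paths = ["a.txt", "b.txt"]

# Git's layout: the name lives in the tree entry, the blob holds only content.
actual = {object_id("blob", content) for _ in paths}

# Hypothetical alternative: fold the name into the blob before hashing.
alternative = {object_id("blob", p.encode() + b"\0" + content) for p in paths}

print(len(actual), len(alternative))  # 1 2
```

Both variants hash the object contents, yet only the first deduplicates; what goes into the object matters as much as the fact that it is hashed.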


Björn

* Re: [doc] User Manual Suggestion
  2009-04-26 16:36                                     ` Michael Witten
@ 2009-04-26 18:12                                       ` Björn Steinbrink
  2009-04-26 20:20                                         ` David Abrahams
  0 siblings, 1 reply; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-26 18:12 UTC (permalink / raw)
  To: Michael Witten
  Cc: David Abrahams, Felipe Contreras, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields

On 2009.04.26 11:36:04 -0500, Michael Witten wrote:
> 2009/4/26 Björn Steinbrink <B.Steinbrink@gmx.de>:
> > On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
> >> Where it's relevant is when the user notices that two distinct files
> >> have the same id (because they happen to have the same contents) and
> >> wonders what's up.
> >...
> > And why would your implementation save the same object twice, in two
> > distinct files?
> 
> This question makes me think that you don't understand the parent's
> point. He's not talking about implementation details; in fact, there's
> no reason to mix the git world and the file system world at all in
> this discussion.
> 
> David is pointing out that a user might notice that two different
> trees list the same blob. This can be startling if you have an incomplete
> picture of what's going on.

David said that the user encounters two distinct files with the same id.
The ids are properties of the objects. So he must have meant object
files, or he attributed the id to the wrong thing. I assumed that he
didn't mix those things up and really meant the object files, thus my
reply.

> From a practical point of view, you might argue that not too many
> people are looking at trees and blobs;

Heh, I'd rather argue that too _few_ people have looked at commits and
trees at least once, whether it's an actual object or a graph like in
git for computer scientists.

> however, it seems to me that most people are afraid to use any of
> git's most useful features precisely because they don't understand the
> git model and they don't understand that nothing is ever lost unless
> you explicitly clean up unreferenced objects---they don't see how easy
> it is to manipulate their repos. I argue that if they are given the full
> knowledge of git's concepts, then they will be able to reason about
> their repo actions with confidence, even if they only work with
> commits.

Agreed.

> I think the key is to stress in the documentation the idea that there
> are 2 separate worlds (the git object world and the working
> directory's file system world) and that the git tools provide an
> interface between them; this seems like a small and unnecessarily
> academic point, but I believe that it's important to working with
> confidence.

Agreed. That's also why I asked David why the user would look at the
object files in the repo (the .git dir). To some degree those are also
an implementation detail. The user works with the working tree and uses
the git tools to modify the repo.

> > It's an identity relation: same name/id => same object. Unlike e.g. a
> > hash-table where you are expected to deal with collisions, and having
> > the same hash doesn't mean that you have identical data.
> 
> However, having the same *cryptographic* hash does mean that you have
> identical data.

That's the _assumption_ that git makes. Hash collisions are always
possible, just hard to create intentionally when the hash function has
not yet been broken.
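Collisions being "always possible" is just the pigeonhole principle, and it can be made visible by truncating the hash. A Python sketch, assuming only `hashlib`: keeping 4 hex digits (16 bits) of SHA-1 lets a birthday-style search finish after a few hundred tries, whereas the same generic search against the full 160 bits would need on the order of 2^80 hash evaluations. `short_id` and `find_collision` are hypothetical helpers.

```python
import hashlib
from itertools import count


def short_id(data: bytes, hex_digits: int) -> str:
    # Keep only a prefix of the SHA-1 digest so collisions are cheap to find.
    return hashlib.sha1(data).hexdigest()[:hex_digits]


def find_collision(hex_digits: int) -> tuple[bytes, bytes, str]:
    """Birthday search: remember every prefix until one repeats."""
    seen: dict[str, bytes] = {}
    for n in count():
        data = f"object {n}".encode()
        key = short_id(data, hex_digits)
        if key in seen:
            return seen[key], data, key
        seen[key] = data
    raise AssertionError("unreachable: count() never ends")


# 16 bits: pigeonhole guarantees a hit within 65537 inputs, usually far sooner.
a, b, key = find_collision(4)
print(a, b, key)
```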

Björn

* Re: [doc] User Manual Suggestion
  2009-04-26 17:56                                       ` Björn Steinbrink
@ 2009-04-26 20:17                                         ` David Abrahams
  2009-04-26 22:25                                           ` Björn Steinbrink
  0 siblings, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-26 20:17 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Felipe Contreras, Michael Witten, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields


On Apr 26, 2009, at 1:56 PM, Björn Steinbrink wrote:

> On 2009.04.26 09:55:34 -0400, David Abrahams wrote:
>>
>> On Apr 26, 2009, at 7:28 AM, Björn Steinbrink wrote:
>>
>>> On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
>>>> Where it's relevant is when the user notices that two distinct files
>>>> have the same id (because they happen to have the same contents) and
>>>> wonders what's up.
>>>
>>> Why would the user have to care about the object files in the repo?
>>
>> What a strange question. I have no idea how to answer. It seems
>> self-evident to me that users of a VCS care that their files are
>> stored in it.
>
> _Their_ files. The files that come from/end up in the working tree. I
> cared about those when I used SVN, too. But I never went to the SVN repo
> to find out if there are two equal files in it. We're talking about
> object names, and those belong to objects, not files in the working
> tree.

I'm telling you, many new users who aren't already versed in Git will
naturally associate the SHA1 codes exposed by the interface with the
files they've checked in without understanding that they actually
identify object files (another poorly chosen Git name, if I've managed
to deduce what it means) rather than directly corresponding to states
of their files. And anyway, if you want to get into implementation
details, SHA1s don't always identify object files because blobs get
delta-compressed.
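For reference, here is a Python sketch of how a single loose object is laid out, assuming the classic SHA-1 repository format. The file lives under .git/objects/ at a path derived from the id, and it holds the zlib-compressed header and data. Packfiles, where objects may be stored as deltas, are not modeled; even there the id is the hash of the full, uncompressed object, not of the stored bytes. `loose_object` is a hypothetical helper.

```python
import hashlib
import zlib
from pathlib import PurePosixPath


def loose_object(obj_type: str, data: bytes):
    raw = f"{obj_type} {len(data)}\0".encode() + data
    oid = hashlib.sha1(raw).hexdigest()
    # Loose objects live at .git/objects/<first 2 hex digits>/<remaining 38>.
    path = PurePosixPath(".git/objects") / oid[:2] / oid[2:]
    # The file holds compressed bytes; the id is of the uncompressed ones.
    return oid, path, zlib.compress(raw)


oid, path, stored = loose_object("blob", b"hello\n")
print(path)  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
print(hashlib.sha1(zlib.decompress(stored)).hexdigest() == oid)  # True
```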

>>> And why would your implementation save the same object twice, in two
>>> distinct files?
>>
>> One could easily have the expectation that contents can be duplicated
>> because there are numerous precedents in everyone's experience of
>> computing, for example in filesystems and in any programming language
>> that is not pure-functional.
>
> That's not answering my question. I asked why you come up with an
> implementation that is "broken" enough to save the same object twice
> with different file names.

I don't know what you mean by "come up with an implementation."  I'm  
not inventing an implementation.  I'm saying, new users inevitably and  
inexorably develop a mental model of the system they're learning  
about, and they don't always develop the right mental model, and I'm  
saying that it's easy to see how they can fall into incorrect  
assumptions.  The word "hash" helps a bit with avoiding one of those  
assumptions.

> If the implementation does not do that, your
> "when the user notices that two distinct files have the same id" is
> immediately invalid. The user cannot come into that situation then.

I think this is why Git remains more opaque than it should be.  You  
can't assume that people will naturally develop the smartest possible  
mental model of a VCS, even when faced with some hints in the form of
a partial understanding of Git.

> And
> anyway, when the user notices something, that's a discovery, not an
> expectation.

It's better to give people something to connect their discoveries to  
(e.g. "oh, I see, they call those things hashes, so it makes sense  
that these two identical things are stored once")

>>> The SHA-1 hash is created from the object, that means
>>> its type, size and data. It's not an id of a file in the working
>>> tree, but of an object
>>
>> All true.  All somewhat subtle distinctions that are not nearly as
>> apparent unless you actually use the word "hash" as I have been
>> advocating.
>
> Hu? How does saying "object hash" instead of "object id" make it any
> more apparent that a file in the working tree is something else than a
> git object?

It makes it apparent that two identical things can only have one ID,  
and thus must correspond to one object.

>>> You can't have two objects with the same contents to begin with,
>>> same content => same object.
>>
>> In the Git world, I agree.  In general, I disagree.
>
> I don't think we're discussing a term to describe something that
> identifies an object in general. So, "in general" you can disagree as
> much as you want, but for git that doesn't matter at all.

You don't think the general rules of the computing world and existing  
meanings of terms have an impact on a new user's ability to grok Git?   
If not, we don't have much to discuss.

>> The fact that is so in the Git world is reinforced by the notion that
>> the id of an object is a hash of its contents.
>>
>>> You can just have that one object stored multiple times in different
>>> places (for sane implementations this likely means that you have
>>> more than one repo to look at, and each has its own copy of that
>>> object, but that's nothing you as a user should have to care about).
>>
>>> It's an identity relation: same name/id => same object. Unlike e.g. a
>>> hash-table where you are expected to deal with collisions, and having
>>> the same hash doesn't mean that you have identical data.  But that's
>>> not true of git, it expects an identity relation, which is IMHO
>>> better expressed through "object name" or "object id".
>>
>> Yes, that's true in the Git world (though not necessarily elsewhere), or
>> at least you hope it is.  In fact, there's no guarantee that SHA1
>> collisions won't occur; it's just extremely unlikely.  In fact, if you
>> google it you can find some interesting papers about SHA1 collision.
>
> Sure, it's an assumption that has been made and is required to hold true
> for git to work.
>
>> Another way to express what you wrote above:
>>
>>   same id => same hash ?=> same contents => same object
>>
>> where ?=> means "almost certainly implies."
>
> No, that chain shows how git could be "unreliable" when you get hash
> collisions. You could put that into a chapter that explains the
> implications of the way git generates its object ids. But it's not very
> interesting when you use git and (implicitly) trust the assumption that
> no collisions happen.

My point in mentioning that it's not certain was to point out that you  
left out the implication that actually /is/ certain, even across repos.

> Only when you want to explain how git manages to avoid duplicated
> storage of fully identical contents, then you need to mention that the
> object names are the hashes of the full object contents. But that's not
> what you actually use the object names for.
>
> same content ==> same content hash ==> object name/id ==> same object
>
> (Actually, you need an additional detail: "same
> file/symlink/directory/... contents ==> same object contents", which
> can't be made explicit by just saying that you use a hash).
>
> Your chain was in the wrong order

If you think there's a right order, you haven't understood that all  
the arrows are bidirectional.

> and explains neither the "a tree that
> has the same object name/id for two entries" case (because of the
> uncertainty of the "same hash ?=> same content" part), nor, when read
> in the other direction, where all implications are true, why same
> content leads to the same object (as it already starts at the object
> level).

>> I think the implication is important in both directions.  Neither one is
>> self-evident to a new user.  Maybe the right answer is 'hash id'.
>
> git could work differently. Just moving the storage of the filenames from
> the tree objects to the blobs would mean that you'd get different
> objects for files that have the same content but different names. You'd
> still have a hash of the object contents as the object name, but
> suddenly you get more objects. Just saying "hash" or "hash id" doesn't
> magically explain all the other things.


But that's a strawman.  I'm not claiming that it magically explains  
all the other things.  I'm just claiming that it helps in avoiding  
some possible misunderstandings.

--
David Abrahams
BoostPro Computing
http://boostpro.com


* Re: [doc] User Manual Suggestion
  2009-04-26 18:12                                       ` Björn Steinbrink
@ 2009-04-26 20:20                                         ` David Abrahams
  0 siblings, 0 replies; 90+ messages in thread
From: David Abrahams @ 2009-04-26 20:20 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Michael Witten, Felipe Contreras, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields


On Apr 26, 2009, at 2:12 PM, Björn Steinbrink wrote:

> That's also why I asked David why the user would look at the
> object files in the repo (the .git dir).


For what it's worth, I didn't understand what you meant by "object  
files" until now.  I never claimed they would look at those files, at  
least not intentionally.  But just look at any web interface to a Git  
repo and you'll see why they might encounter the object file names  
even before they've installed git on their own machine.

--
David Abrahams
BoostPro Computing
http://boostpro.com


* Re: [doc] User Manual Suggestion
  2009-04-26 20:17                                         ` David Abrahams
@ 2009-04-26 22:25                                           ` Björn Steinbrink
  2009-04-27  1:41                                             ` David Abrahams
  2009-04-27 16:30                                             ` David Abrahams
  0 siblings, 2 replies; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-26 22:25 UTC (permalink / raw)
  To: David Abrahams
  Cc: Felipe Contreras, Michael Witten, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields

On 2009.04.26 16:17:43 -0400, David Abrahams wrote:
>
> On Apr 26, 2009, at 1:56 PM, Björn Steinbrink wrote:
>
>> On 2009.04.26 09:55:34 -0400, David Abrahams wrote:
>>>
>>> On Apr 26, 2009, at 7:28 AM, Björn Steinbrink wrote:
>>>
>>>> On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
>>>>> Where it's relevant when the user notices that two distinct files
>>>>> have the same id (because they happen to have the same contents)
>>>>> and wonders what's up.
>>>>
>>>> Why would the user have to care about the object files in the repo?
>>>
>>> What a strange question. I have no idea how to answer. It seems
>>> self-evident to me that users of a VCS care that their files are
>>> stored in it.
>>
>> _Their_ files. The files that come from/end up in the working tree. I
>> cared about those when I used SVN, too. But I never went to the SVN
>> repo to find out if there are two equal files in it. We're talking
>> about object names, and those belong to objects, not files in the
>> working tree.
>
> I'm telling you, many new users who aren't already versed in Git will
> naturally associate the SHA1 codes exposed by the interface with the
> files they've checked in without understanding that they actually
> identify object files (another poorly chosen Git name, if I've managed
> to deduce what it means)

Hm, not sure if that name is really important. The way objects are
stored is an implementation detail. Usually, we're just talking about
"objects" not the files the loose objects are stored in (loose object =
an object stored in its own file, not in a pack file). But as you
complained about it, what would you call a file in which an object is
stored?

> rather than directly corresponding to states
> of their files. And anyway, if you want to get into implementation
> details, SHA1s don't always identify object files because blobs get
> delta-compressed.

True, they identify the object. It's not even necessary to mention
delta compression, just having the object in a pack file causes the
object name to no longer identify the file in which the object can be
found. Heck, the object might be in a different repo when you use
alternates ;-). And I think I never explicitly said that they
identify a file storing an object, but implied that by "accepting" your
example and assuming that you meant two object files having the same id.
I should have said that your "two distinct files have the same id" makes
no sense and should have asked what you mean.

>>>> And why would your implementation save the same object twice, in
>>>> two distinct files?
>>>
>>> One could easily have the expectation that contents can be
>>> duplicated because there are numerous precedents in everyone's
>>> experience of computing, for example in filesystems and in any
>>> programming language that is not pure-functional.
>>
>> That's not answering my question. I asked why you come up with an
>> implementation that is "broken" enough to save the same object twice
>> with different file names.
>
> I don't know what you mean by "come up with an implementation."  I'm
> not inventing an implementation.

Sorry, "come up with" is clearly wrong. "Assume" or "expect" or so might
have been more correct. But I think we could agree that you misused the
"id" term by using it for files, and what ensued confused both of us? If
you didn't mean the stored objects by "files", then that part of the
discussion was just based on a misunderstanding and can be ignored.

> I'm saying, new users inevitably and inexorably develop a mental model
> of the system they're learning about, and they don't always develop
> the right mental model, and I'm saying that it's easy to see how they
> can fall into incorrect assumptions.  The word "hash" helps a bit with
> avoiding one of those assumptions.

I've not met a lot of people that were actually confused about the fact
that the same object might be "reused" for tree entries with different
names. But most (all?) of those that were confused knew that the objects
are identified by hashes, but expected the filenames to be part of the
object and didn't know about tree objects.

>> If the implementation does not do that, your "when the user notices
>> that two distinct files has the same id" is immediately invalid. The
>> user cannot come into that situation then.
>
> I think this is why Git remains more opaque than it should be.  You
> can't assume that people will naturally develop the smartest possible
> mental model of a VCS, even when faced with some hints in the form of
> a partial understanding of Git.

I don't think I understand what you mean here. If users don't understand
the data model, that's caused by missing/bad documentation or because
the user doesn't want to read the existing documentation. (I'll make no
assumptions here, it's been some time since I had a close look at the
docs). But I've been talking about how the given implementation stores
data in the repository. Could you explain?

>> And anyway, when the user notices something, that's a discovery, not
>> an expectation.
>
> It's better to give people something to connect their discoveries to
> (e.g. "oh, I see, they call those things hashes, so it makes sense
> that these two identical things are stored once")

We're talking about seeing, for example, the same object name more than
once, for different "files", in e.g. gitweb, right? Then the "Hu? Isn't
the filename part of the object?" thing might still apply. The user can
still very easily make a wrong guess.

As Michael said in another mail, the important point is probably rather
to teach people to make a distinction between files and directories in
the working tree and the contents stored in the git objects. And that's
not accomplished by saying that the id is a hash, when the user doesn't
know what the hash is based upon.

Somewhat related: I'm trying to remember if I ever had problems
explaining the concept of hardlinks to someone, but I don't remember any
such situation anymore. There are no hashes involved there, and I feel
like that was quite easy to grasp for most people I talked to. It's
pretty similar, separating content from names.

>>>> The SHA-1 hash is created from the object, that means its type,
>>>> size and data. It's not an id of a file in the working tree, but of
>>>> an object
>>>
>>> All true.  All somewhat subtle distinctions that are not nearly as
>>> apparent unless you actually use the word "hash" as I have been
>>> advocating.
>>
>> Hu? How does saying "object hash" instead of "object id" make it any
>> more apparent that a file in the working tree is something else than
>> a git object?
>
> It makes it apparent that two identical things can only have one ID,
> and thus must correspond to one object.

See above, the user needs to know "what" is identical in the first
place.

>>>> You can't have two objects with the same contents to begin with,
>>>> same content => same object.
>>>
>>> In the Git world, I agree.  In general, I disagree.
>>
>> I don't think we're discussing a term to describe something that
>> identifies an object in general. So, "in general" you can disagree as
>> much as you want, but for git that doesn't matter at all.
>
> You don't think the general rules of the computing world and existing
> meanings of terms have an impact on a new user's ability to grok Git?
> If not, we don't have much to discuss.

This was probably also based on the files+id misunderstanding combined
with the fact that you used the term "object" where I thought that you
meant a "git object" (you probably didn't, right?). Because when talking
about "git objects" you actually can't have two different ones with the
same "value" (I guess you mean type, size and content when you say
"value", right?)

And admittedly, for this one, the "hash" term _would_ help to get the
user to understand that in git you cannot have two different objects
with the same contents and that this makes git different and efficient.
But I still don't buy that this is important for understanding the basic
data model. It's a nice hint why git can always quickly tell that two
things are equal and why the repository size doesn't explode. But the
important part is the separation of names and content, that trees give
names to the contents stored in blobs. The "hash" name would only help
to understand its efficiency once you already understood the data model.
See below.

>>> The fact that is so in the Git world is reinforced by the notion
>>> that the id of an object is a hash of its contents.
>>>
>>>> You can just have that one object stored multiple times in
>>>> different places (for sane implementations this likely means that
>>>> you have more than one repo to look at, and each has its own copy
>>>> of that object, but that's nothing you as a user should have to
>>>> care about).
>>>
>>>> It's an identity relation: same name/id => same object. Unlike
>>>> e.g. a hash-table where you are expected to deal with collisions,
>>>> and having the same hash doesn't mean that you have identical
>>>> data.  But that's not true of git, it expects an identity relation,
>>>> which is IMHO better expressed through "object name" or "object
>>>> id".
>>>
>>> Yes, that's true in the Git world (though not necessarily
>>> elsewhere), or at least you hope it is.  In fact, there's no
>>> guarantee that SHA1 collisions won't occur; it's just extremely
>>> unlikely.  In fact, if you google it you can find some interesting
>>> papers about SHA1 collision.
>>
>> Sure, it's an assumption that has been made and is required to hold
>> true for git to work.
>>
>>> Another way to express what you wrote above:
>>>
>>>   same id => same hash ?=> same contents => same object
>>>
>>> where ?=> means "almost certainly implies."
>>
>> No, that chain shows how git could be "unreliable" when you get hash
>> collisions. You could put that into a chapter that explains the
>> implications of the way git generates its object ids. But it's not
>> very interesting when you use git and (implicitly) trust the
>> assumption that no collisions happen.
>
> My point in mentioning that it's not certain was to point out that you
> left out the implication that actually /is/ certain, even across
> repos.

And my point is that this is not important for understanding the basic
data model, but only how git efficiently implements it, and which
assumptions it has to make.

>> Only when you want to explain how git manages to avoid duplicated
>> storage of fully identical contents, then you need to mention that
>> the object names are the hashes of the full object contents. But
>> that's not what you actually use the object names for.
>>
>> same content ==> same content hash ==> object name/id ==> same object
>>
>> (Actually, you need an additional detail: "same
>> file/symlink/directory/... contents ==> same object contents", which
>> can't be made explicit by just saying that you use a hash).
>>
>> Your chain was in the wrong order
>
> If you think there's a right order, you haven't understood that all
> the arrows are bidirectional.

There's one that is not truly bidirectional.

id <=> hash <?=> contents <=> object

I can't go from id/hash to contents/object without hitting the "hash =>
content" assumption. I had two chains for a reason.

	id => object => content => hash
and
	content => hash => id => object

are guaranteed, at least within a single repo.

While:
	content => hash => id => object ?=> content

has a non-guaranteed part again, just an assumption, at least when the
first and last "content" mean the same content. If you get a collision,
you rather have a guarantee that one version of the content is "not in
the repo".

And as I said, the fact that the identifier is not globally unique,
along with the fact that git cannot have two different objects with the
same contents or name, is not required to understand how commits, trees
and blobs go together to store the history of a project. It's IM(NS?)HO
far more important to understand the separation of names and content.
Then you can understand that multiple names can be associated with the
same object holding some content (which can be done with other kinds of
ids as well, even with more than one object having the same contents,
just not necessarily as efficiently). And that objects have a name that
is used to identify the object. And only then can you understand and
appreciate how hashes help to efficiently implement that model, knowing
which data is used to calculate the hash.
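
A toy Python sketch of the names-versus-content separation described
above. It is not how Git is implemented: `ToyStore` and the dict standing
in for a tree are hypothetical, and only the id computation (SHA-1 over a
"blob <size>\0" header plus the content) follows Git's actual blob
hashing. Two tree entries with different names end up pointing at one
stored object.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Git's blob id: SHA-1 of "blob <size>\0" followed by the content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ToyStore:
    """Hypothetical content-addressed store: one entry per distinct content."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        oid = blob_id(data)
        # Adding identical content again changes nothing: same content, same key.
        self.objects.setdefault(oid, data)
        return oid


store = ToyStore()
# Toy "tree": the names live here, not inside the stored content.
tree = {
    "README": store.add(b"hello\n"),
    "docs/hello.txt": store.add(b"hello\n"),
    "empty.txt": store.add(b""),
}

print(tree["README"] == tree["docs/hello.txt"])  # True: two names, one object
print(len(store.objects))                        # 2 objects for 3 names
```

The names could just as well be attached to some other kind of id; the
hash only makes the "one object per content" rule cheap to enforce.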

>> and explains neither the "a tree that has the same object name/id for
>> two entries" case (because of the uncertainty of the "same hash ?=>
>> same content" part), nor, when read in the other direction, where all
>> implications are true, why same content leads to the same object (as
>> it already starts at the object level).
>
>>> I think the implication is important in both directions.  Neither
>>> one is self-evident to a new user.  Maybe the right answer is 'hash
>>> id'.
>>
>> git could work differently. Just moving the storage of the filenames
>> from the tree objects to the blobs would mean that you'd get
>> different objects for files that have the same content but different
>> names.  You'd still have a hash of the object contents as the object
>> name, but suddenly you get more objects. Just saying "hash" or "hash
>> id" doesn't magically explain all the other things.
>
> But that's a strawman.  I'm not claiming that it magically explains
> all the other things.  I'm just claiming that it helps in avoiding
> some possible misunderstandings.

And I think that it doesn't help much at all and might confuse users,
because they expect the hash to be based on the wrong stuff. It's just
important that the "thing" is used to identify an object.

Björn


* Re: [doc] User Manual Suggestion
  2009-04-25  0:48                         ` David Abrahams
@ 2009-04-26 22:42                           ` Björn Steinbrink
  0 siblings, 0 replies; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-26 22:42 UTC (permalink / raw)
  To: David Abrahams
  Cc: Michael Witten, Jeff King, Daniel Barkalow, Johan Herland, git,
	J. Bruce Fields

On 2009.04.24 20:48:57 -0400, David Abrahams wrote:
>
> On Apr 24, 2009, at 8:01 PM, Michael Witten wrote:
>
>>> What's wrong with just calling the object name "object name"?
>>
>> What's wrong with calling the object address "object address"?
>
> Neither captures the connection to the object's contents.  I think  
> "value ID" would be closer, but it's probably too horrible.

I think I asked this in another mail, but I'm quite tired, so just to
make sure: What do you mean by "value"? I might be weird (I'm not a
native speaker, so I probably make funny and wrong connotations from
time to time), but while I can accept "content" to include the type and
size of the object, the term "value" makes me want to exclude those
pieces of meta data. So "value" somehow feels wrong to me, as the hash
covers those two fields.
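
To illustrate that the hash covers the type and size as well as the
data, here is a minimal Python sketch assuming Git's object header
format "<type> <size>\0" (the function name `object_id` is made up for
this example). Hashing the same empty data once as a blob and once as a
tree gives the well-known empty-blob and empty-tree ids, which differ
only because of the header.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    # The hashed bytes are "<type> <size>\0<data>", so the type and the size
    # are part of the input, not just the data.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


print(object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```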

Björn


* Re: [doc] User Manual Suggestion
  2009-04-25  0:39                         ` David Abrahams
@ 2009-04-26 23:35                           ` Björn Steinbrink
  0 siblings, 0 replies; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-26 23:35 UTC (permalink / raw)
  To: David Abrahams; +Cc: Jeff King, Michael Witten, J. Bruce Fields, git

On 2009.04.24 20:39:12 -0400, David Abrahams wrote:
>
> On Apr 24, 2009, at 6:45 PM, Björn Steinbrink wrote:
>
>>> I think UI/API works way better than porcelain/plumbing. We are,
>>> after all, programmers.
>>
>> We are programmers, but not all git users are programmers.
>
> I'm sure you will admit that the vast majority are programmers.  This is 
> about speaking effectively to your primary audience.

My experience says to at least drop the "vast", but that might be
biased, due to the fact that the non-programmers probably need more time
when you explain things to them.

But thinking about it again, I don't think I like UI/API regardless of
that. High-Level/Low-Level yes, but API? No. The plumbing is meant to be
stable so it can serve as an API, and it also has options that only make
sense when you use it that way (e.g. the parse-opt support in rev-parse)
but I also happen to just use those programs as a UI. For example
ls-files, ls-remote, or apply.

And git(1) also has the sections titled "HIGH-LEVEL COMMANDS
(PORCELAIN)" and "LOW-LEVEL COMMANDS (PLUMBING)". So if we were to get
rid of the porcelain and plumbing terms, then _I_ would go for just
"high-level commands" and "low-level commands".

>>> I should also say, most of the docs and interfaces I see in Git (and
>>> its wrappers, web interfaces, etc.) give the SHA1 hashes way too much
>>> exposure. The times when it's actually more convenient to use a hash
>>> instead of one of the other notations are rare,
>>
>> How often do you need a name for a commit shown by a command and can
>> accept that it is not stable?
>
> I can accept it as long as it's stable inside my own repo.  Maybe I
> need the SHA1 to talk about it wherever it may roam.  I think you
> could count in the other direction (i.e. from the roots instead of the
> leaves) to get fairly stable symbolic names.

I'm sure this has been discussed in the earlier "stable revision
numbers" threads as well, so you can find more information there, but I
just want to mention that one drawback of this is that those numbers
still have no notion of "commit age". You could have 5000 commits in
your repo, and then you fetch someone else's work that might contain some
very old commits that you don't have yet. And those get high numbers now.
So 5051 might be older than 200. Doesn't exactly help to make those
numbers "useful". Just like the "gaps" you get by using e.g. rebase -i
or other means that cause commits to be garbage collected.
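
A small Python sketch of that objection. It assumes (an interpretation,
not a concrete proposal from this thread) that "counting from the roots"
means numbering commits in the order they enter a repository. The
function `number_by_arrival`, the commit ids, and the years are all
made-up placeholders.

```python
def number_by_arrival(arrivals):
    """Give each commit id the next free number the first time it is seen."""
    numbers = {}
    for commit_id in arrivals:
        # setdefault keeps an existing number; len() is the next free one.
        numbers.setdefault(commit_id, len(numbers))
    return numbers


# Made-up commit ids and author years, purely for illustration.
local = [f"local{i}" for i in range(5000)]
authored_year = {commit_id: 2009 for commit_id in local}
authored_year["fetched"] = 2005  # an old commit that only arrives via a later fetch

numbers = number_by_arrival(local + ["fetched"])

print(numbers["local200"], numbers["fetched"])               # 200 5000
print(authored_year["fetched"] < authored_year["local200"])  # True
```

The fetched commit gets a higher number than commit 200 even though it is
older, so the number says nothing about age.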

> Also, I don't think I need to see the hashes for trees and blobs most of 
> the time.

OK, I think I finally see what you might mean there. I'm almost
exclusively using the CLI and gitk and seldom see tree/blob object
names in a prominent way unless I ask for them. But I just noticed that
gitweb is at least showing a "daunting" number of object names without
further details when you ask for a "commit", while the "commitdiff" is
closer to what "git show <commit>" would show. And yeah, I think that
could be improved, moving the object name more into the "background" (I
don't think it should be completely removed, just be less prominent).
Any other "high-level" tool that you noticed being noisy about tree/blob
hashes?

>>> Oh, one other specific issue: the rev-parse manpage uses $GIT_DIR
>>> without saying what it is. I *think* that means the root of the
>>> working copy and has nothing to do with environment variables, but
>>> it's hard to be sure, and if I'm right about that, it's misleading
>>> notation.
>>
>> $GIT_DIR means the .git directory of a non-bare repo.
>
>
> Thanks for clarifying.  But don't neglect to fix the docs so the next
> guy doesn't have to ask ;-)

Hm, I provide the information, you provide the patch? ;-) Hm, maybe I'll
find some time to provide one myself. But my git and general todo lists
already grew beyond all limits...

> BTW, "[non-]bare repo" is yet another Git-specific jargon.  I know what 
> it means... again, only because I asked someone.

At least "bare repository" appears as an entry in the glossary
(gitglossary(7), also reachable via "git help glossary").

Björn


* Re: [doc] User Manual Suggestion
  2009-04-24 23:25                     ` Jeff King
@ 2009-04-26 23:41                       ` Björn Steinbrink
  0 siblings, 0 replies; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-26 23:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Daniel Barkalow, Johan Herland, Michael Witten, git,
	David Abrahams, J. Bruce Fields

On 2009.04.24 19:25:31 -0400, Jeff King wrote:
> On Fri, Apr 24, 2009 at 07:21:26PM -0400, Daniel Barkalow wrote:
> 
> > (And, actually, I think git has a few usability warts due to relying too 
> > much on command line arguments being objects; it would be quite nice if 
> > "git blame 1a2b3c:Makefile" worked despite this technically being 
> > incoherent.)
> 
> Yeah, I think another is that "git show master:file" will not do CRLF or
> other filters, and "git diff master:file other:file" will not respect
> diff settings. I think all of those could be solved by path lookup
> attaching a "here is a pathname I used to get to this object" string,
> which can then be accessed as appropriate.
> 
> It is not all that different conceptually than what "git rev-list
> --objects" does.

It's also something that hash-object already does in some way, to apply
e.g. attributes to content that you supply via stdin.

Björn


* Re: [doc] User Manual Suggestion
  2009-04-24 23:29                     ` Michael Witten
@ 2009-04-27  0:00                       ` Björn Steinbrink
  0 siblings, 0 replies; 90+ messages in thread
From: Björn Steinbrink @ 2009-04-27  0:00 UTC (permalink / raw)
  To: Michael Witten
  Cc: Daniel Barkalow, Jeff King, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On 2009.04.24 18:29:22 -0500, Michael Witten wrote:
> On Fri, Apr 24, 2009 at 18:21, Daniel Barkalow <barkalow@iabervon.org> wrote:
> > "git blame 1a2b3c:Makefile" worked despite this technically being
> > incoherent.
> 
> It seems to work on my end, and it's perfectly coherent if you
> consider git-blame to be overloaded to handle both pointers and
> addresses (or references and object names, if you prefer).

Fails for me. And it's technically incoherent in that it makes no sense
to use blame with a blob object. 1a2b3c:Makefile identifies "just" a
blob object. And that has no parents and no history, just contents. Only
the commit objects have the references that connect them to form a
history.

For example, you could have a history like this:

A---B---C---D---E

And a file "foo" that has the same contents for A and E. Then "A:foo"
and "E:foo" lead to the same blob object, and you can't uniquely go from
that blob object to any commit object. So technically, you can't tell if
"git blame E:foo" means "git blame E foo" or "git blame A foo" (and you
can add a bunch of complexity by having, for example, a second file
with a different name that had the same content at some point).

To make that coherent, you must change the definition of the
<tree-ish>:<path> syntax so that the context in which the path is
resolved is kept, it must no longer just identify an object, but
something more complex.
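
The A---E example can be sketched in Python under the simplifying
assumption that each commit is represented only by the contents of "foo"
(the `foo_at` mapping is hypothetical; real commits point to trees).
Resolving "<commit>:foo" yields only a blob id, and searching backwards
from that id finds two candidate commits.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Git's blob id: SHA-1 of "blob <size>\0" followed by the content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical history A---B---C---D---E, reduced to the contents of "foo".
foo_at = {
    "A": b"original\n",
    "B": b"first change\n",
    "C": b"second change\n",
    "D": b"third change\n",
    "E": b"original\n",  # same contents as in A
}

# "<commit>:foo" resolves to a blob id; the commit is not part of the result.
resolved = {commit: blob_id(data) for commit, data in foo_at.items()}

# Going backwards from E's blob id matches more than one commit.
candidates = sorted(c for c, oid in resolved.items() if oid == resolved["E"])
print(candidates)  # ['A', 'E']
```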

Björn


* Re: [doc] User Manual Suggestion
  2009-04-26 22:25                                           ` Björn Steinbrink
@ 2009-04-27  1:41                                             ` David Abrahams
  2009-04-27 16:30                                             ` David Abrahams
  1 sibling, 0 replies; 90+ messages in thread
From: David Abrahams @ 2009-04-27  1:41 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Felipe Contreras, Michael Witten, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields


on Sun Apr 26 2009, Björn Steinbrink <B.Steinbrink-AT-gmx.de> wrote:

>> I think this is why Git remains more opaque than it should be.  You
>> can't assume that people will naturally develop the smartest possible
>> mental model of a VCS, even when faced with some hints in the form of
>> a partial understanding of Git.
>
> I don't think I understand what you mean here. If users don't understand
> the data model, that's caused by missing/bad documentation or because
> the user doesn't want to read the existing documentation. (I'll make no
> assumptions here, it's been some time since I had a close look at the
> docs). But I've been talking about how the given implementation stores
> data in the repository. Could you explain?

You don't have to "not want to read the documentation" to have an
incomplete mental model.  The mental model development doesn't happen
upon finishing the documentation; it happens while the person is
learning.  Halfway through the docs, I have an incomplete mental model.
If you make it hard enough for me, maybe I never finish and I retain
that incomplete model forever.  The more you can help people avoid
incorrect assumptions as they read along, the easier it will be for them
to grok the next bit they are reading, and the less likely they are to
become discouraged.

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com


* Re: [doc] User Manual Suggestion
  2009-04-26 22:25                                           ` Björn Steinbrink
  2009-04-27  1:41                                             ` David Abrahams
@ 2009-04-27 16:30                                             ` David Abrahams
  2009-04-27 16:52                                               ` Michael Witten
  1 sibling, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-27 16:30 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Felipe Contreras, Michael Witten, Daniel Barkalow, Jeff King,
	Johan Herland, git, J. Bruce Fields


On Apr 26, 2009, at 6:25 PM, Björn Steinbrink wrote:

> On 2009.04.26 16:17:43 -0400, David Abrahams wrote:
>>

>> I'm telling you, many new users who aren't already versed in Git will
>> naturally associate the SHA1 codes exposed by the interface with the
>> files they've checked in without understanding that they actually
>> identify object files (another poorly chosen Git name, if I've managed
>> to deduce what it means)
>
> Hm, not sure if that name is really important. The way objects are
> stored is an implementation detail. Usually, we're just talking about
> "objects", not the files the loose objects are stored in (loose object
> = an object stored in its own file, not in a pack file). But since you
> complained about it, what would you call a file in which an object is
> stored?

"Object" is OK. "Object file" is overloaded and confusing.  I'd just  
say there are "Git data files" or "files in Git's object store", some  
of which store single objects whose id is the same as the filename,  
and some of which store multiple objects.

>> rather than directly corresponding to states
>> of their files. And anyway, if you want to get into implementation
>> details, SHA1s don't always identify object files because blobs get
>> delta-compressed.
>
> True, they identify the object. It's not even necessary to mention
> delta compression; just having the object in a pack file causes the
> object name to no longer identify the file in which the object can be
> found.

Right.

> Heck, the object might be in a different repo when you use
> alternates ;-). And I think I never explicitly said that they
> identify a file storing an object, but implied that by "accepting" your
> example and assuming that you meant two object files having the same id.

Yes, that assumption was wrong, and then when you responded using the  
term "object file" I didn't know what it meant.

> I should have said that your "two distinct files have the same id" makes
> no sense and should have asked what you mean.
>
>>>>> And why would your implementation save the same object twice, in
>>>>> two distinct files?
>>>>
>>>> One could easily have the expectation that contents can be
>>>> duplicated because there are numerous precedents in everyone's
>>>> experience of computing, for example in filesystems and in any
>>>> programming language that is not pure-functional.
>>>
>>> That's not answering my question. I asked why you come up with an
>>> implementation that is "broken" enough to save the same object twice
>>> with different file names.
>>
>> I don't know what you mean by "come up with an implementation."  I'm
>> not inventing an implementation.
>
> Sorry, "come up with" is clearly wrong. "Assume" or "expect" or so might
> have been more correct.

I think I explained why one might make that assumption.

> But I think we could agree that you misused the
> "id" term by using it for files, and what ensued confused both of us? If
> you didn't mean the stored objects by "files", then that part of the
> discussion was just based on a misunderstanding and can be ignored.

I meant what the user thinks of as files stored in the repository.

>> I'm saying, new users inevitably and inexorably develop a mental model
>> of the system they're learning about, and they don't always develop
>> the right mental model, and I'm saying that it's easy to see how they
>> can fall into incorrect assumptions.  The word "hash" helps a bit with
>> avoiding one of those assumptions.
>
> I've not met a lot of people that were actually confused about the fact
> that the same object might be "reused" for tree entries with different
> names. But most (all?) of those that were confused knew that the objects
> are identified by hashes, but expected the filenames to be part of the
> object and didn't know about tree objects.

Well, there's certainly precedent for the idea that the filenames are  
distinct from file contents.

>>> And anyway, when the user notices something, that's a discovery, not
>>> an expectation.
>>
>> It's better to give people something to connect their discoveries to
>> (e.g. "oh, I see, they call those things hashes, so it makes sense
>> that these two identical things are stored once")
>
> We're talking about seeing, for example, the same object name more than
> once, for different "files", in e.g. gitweb, right? Then the "Huh? Isn't
> the filename part of the object?" thing might still apply. The user can
> still very easily make a wrong guess.
>
> As Michael said in another mail, the important point is probably rather
> to teach people to make a distinction between files and directories in
> the working tree and the contents stored in the git objects. And that's
> not accomplished by saying that the id is a hash, when the user doesn't
> know what the hash is based upon.
>
> Somewhat related: I'm trying to remember if I ever had problems
> explaining the concept of hardlinks to someone, but I don't remember any
> such situation anymore. There are no hashes involved there, and I feel
> like that was quite easy to grasp for most people I talked to. It's
> pretty similar, separating content from names.

The difference is that hardlinks are only generated explicitly.  You'd  
need something like a hash to generate them automatically and  
implicitly.

>>>>> You can't have two objects with the same contents to begin with,
>>>>> same content => same object.
>>>>
>>>> In the Git world, I agree.  In general, I disagree.
>>>
>>> I don't think we're discussing a term to describe something that
>>> identifies an object in general. So, "in general" you can disagree as
>>> much as you want, but for git that doesn't matter at all.
>>
>> You don't think the general rules of the computing world and existing
>> meanings of terms have an impact on a new user's ability to grok Git?
>> If not, we don't have much to discuss.
>
> This was probably also based on the files+id misunderstanding combined
> with the fact that you used the term "object" where I thought that you
> meant a "git object" (you probably didn't, right?).

I didn't.  I meant the general notion of "object" in computing.  I'm  
trying to talk about how the language used by Git's docs can bias  
people toward correct or incorrect understandings of Git as they're  
learning.

> Because when talking
> about "git objects" you actually can't have two different ones with the
> same "value" (I guess you mean type, size and content when you say
> "value", right?)

Yes.  Size is a function of content, so that adds nothing, and whether  
it even makes sense to say that two things of different type have  
identical content is debatable.

> And admittedly, for this one, the "hash" term _would_ help to get the
> user to understand that in git you cannot have two different objects
> with the same contents and that this makes git different and efficient.
> But I still don't buy that this is important for understanding the basic
> data model. It's a nice hint why git can always quickly tell that two
> things are equal and why the repository size doesn't explode. But the
> important part is the separation of names and content, that trees give
> names to the contents stored in blobs.

But there's nothing unique about that; it's not distinct from what  
filesystems do.

> The "hash" name would only help
> to understand its efficiency once you already understood the data model.

It would help to reinforce that an object's id is a function of its  
contents.  It would help to make clear why the same object can be  
identified in the same way across all repos.
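A minimal sketch of the property described above, assuming the SHA-1 object format: a blob's name is computed only from its bytes (plus a short type/size header), so any repository that stores the same bytes arrives at the same name. The function name git_blob_id is made up for illustration.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    # Git hashes a short header, "blob <size in bytes>\0", followed by the
    # raw content. No filename is involved; names are recorded in trees.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"test content\n"))
```

In a repository using SHA-1 object names, the output should agree with what "git hash-object --stdin" prints for the same input.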

>>>> Another way to express what you wrote above:
>>>>
>>>>  same id => same hash ?=> same contents => same object
>>>>
>>>> where ?=> means "almost certainly implies."
>>>
>>> No, that chain shows how git could be "unreliable" when you get hash
>>> collisions. You could put that into a chapter that explains the
>>> implications of the way git generates its object ids. But it's not
>>> very interesting when you use git and (implicitly) trust the
>>> assumption  that no collisions happen.
>>
>> My point in mentioning that it's not certain was to point out that you
>> left out the implication that actually /is/ certain, even across
>> repos.
>
> And my point is that this is not important for understanding the basic
> data model, but only how git efficiently implements it, and which
> assumptions it has to make.

Look, you're talking to someone who has just had to go through the  
process of learning all this stuff.  What I'm telling you is based on  
my experiences.  Just one datapoint, to be sure, but knowing that it's  
a hash was crucial for me.

>> If you think there's a right order, you haven't understood that all
>> the arrows are bidirectional.
>
> There's one that is not truly bidirectional.
>
> id <=> hash <?=> contents <=> object
>
> I can't go from id/hash to contents/object without hitting the "hash =>
> content" assumption.

Quite right.  You can't derive contents from the hash.

>> But that's a strawman.  I'm not claiming that it magically explains
>> all the other things.  I'm just claiming that it helps in avoiding
>> some possible misunderstandings.
>
> And I think that it doesn't help much at all and might confuse users,
> because they expect the hash to be based on the wrong stuff. It's just
> important that the "thing" is used to identify an object.


OK, I give up.  *I* now understand the system, and it's starting to  
look like too much of a struggle to improve things for others, so they  
can fend for themselves I guess.

Thanks for the lively discussion, anyway.

--
David Abrahams
BoostPro Computing
http://boostpro.com

* Re: [doc] User Manual Suggestion
  2009-04-27 16:30                                             ` David Abrahams
@ 2009-04-27 16:52                                               ` Michael Witten
  0 siblings, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-04-27 16:52 UTC (permalink / raw)
  To: David Abrahams
  Cc: Björn Steinbrink, Felipe Contreras, Daniel Barkalow,
	Jeff King, Johan Herland, git, J. Bruce Fields

2009/4/27 David Abrahams <dave@boostpro.com>:
>
> I didn't.  I meant the general notion of "object" in computing.  I'm trying
> to talk about how the language used by Git's docs can bias people toward
> correct or incorrect understandings of Git as they're learning.

Actually, I believe object was first used to describe anything stored
in memory. Given that, I still think my usage of C pointer terminology
is superior to everything; it's just the case that objects are content
addressable in the git world and location addressable in the C world.

* Re: [doc] User Manual Suggestion
  2009-04-25  0:53                       ` David Abrahams
@ 2009-04-29  6:34                         ` Jeff King
  2009-04-29 13:27                           ` David Abrahams
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff King @ 2009-04-29  6:34 UTC (permalink / raw)
  To: David Abrahams; +Cc: git

On Fri, Apr 24, 2009 at 08:53:37PM -0400, David Abrahams wrote:

>> Actually, it is not the generality of trees that I think is interesting
>> there, but the generality of _objects_. That is, each of those things is
>> a first-class object, and has a unique name by which it can be
>> referred to.
>
> I'm sorry, but I think most people would find that so unremarkable that 
> making a big deal about it would lead to "what am I missing here"  
> confusion.  Maybe a person who's exclusively used CVS (or older)  
> technologies before coming to Git would be happy to know that, but it's 
> sort of obvious.  In CVS the lack of first-class directories sticks out 
> like a sore thumb.

Sadly, I was away from email all weekend and so missed the ensuing storm
in this thread. :) However, I did want to respond to this one point.

To me (and I am talking from personal experience, so it really may be
_just_ me), an important part of understanding git was understanding the
object storage. That is, half of the idea of git is a big database of
content-addressable objects. The _other_ half is the actual VCS built on
top of it. ;)

And by understanding that, and the places where objects refer to each
other (commits point to other commits and to trees, trees point to
blobs, blobs are always leaves), I find it easier to understand what
each operation is doing. And that if I'm unsure of something, I can
always inspect it at many levels.
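The reference structure described above (commits point to commits and trees, trees point to blobs, blobs are leaves) can be modelled roughly as below. This is a hypothetical in-memory model, not Git's storage format; value equality of frozen dataclasses stands in for "same content, same name".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    data: bytes


@dataclass(frozen=True)
class Tree:
    entries: tuple  # (name, Blob or Tree) pairs


@dataclass(frozen=True)
class Commit:
    tree: Tree
    parents: tuple = ()
    message: str = ""


def reachable(start):
    """Return every object reachable from `start` by following references."""
    seen = set()
    stack = [start]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        if isinstance(obj, Commit):
            stack.append(obj.tree)     # a commit points to one tree...
            stack.extend(obj.parents)  # ...and to its parent commits
        elif isinstance(obj, Tree):
            stack.extend(child for _, child in obj.entries)
        # A Blob is a leaf: it refers to nothing.
    return seen


hello = Blob(b"hello\n")
first = Commit(Tree((("README", hello), ("COPY", hello))), message="first")
second = Commit(Tree((("README", hello),)), parents=(first,), message="second")
objs = reachable(second)
print(len(objs))  # 5: two commits, two trees, one shared blob
```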

I don't know. Maybe that is too low-level for most people. I did end up
working on git, so perhaps I am inordinately interested.

-Peff

* Re: [doc] User Manual Suggestion
  2009-04-29  6:34                         ` Jeff King
@ 2009-04-29 13:27                           ` David Abrahams
  2009-04-29 14:05                             ` Jeff King
  0 siblings, 1 reply; 90+ messages in thread
From: David Abrahams @ 2009-04-29 13:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git


On Apr 29, 2009, at 2:34 AM, Jeff King wrote:

> On Fri, Apr 24, 2009 at 08:53:37PM -0400, David Abrahams wrote:
>
>>> Actually, it is not the generality of trees that I think is interesting
>>> there, but the generality of _objects_. That is, each of those things is
>>> a first-class object, and has a unique name by which it can be
>>> referred to.
>>
>> I'm sorry, but I think most people would find that so unremarkable that
>> making a big deal about it would lead to "what am I missing here"
>> confusion.  Maybe a person who's exclusively used CVS (or older)
>> technologies before coming to Git would be happy to know that, but it's
>> sort of obvious.  In CVS the lack of first-class directories sticks out
>> like a sore thumb.
>
> Sadly, I was away from email all weekend and so missed the ensuing storm
> in this thread. :) However, I did want to respond to this one point.
>
> To me (and I am talking from personal experience, so it really may be
> _just_ me), an important part of understanding git was understanding the
> object storage. That is, half of the idea of git is a big database of
> content-addressable objects.

Absolutely, it's important to know that everything is content-addressable
(which essentially communicates the same important information as "the
object's id is a hash of its contents").  I was trying to say that the
fact that each one is a "first-class" object and has a unique name is not
particularly remarkable.

--
David Abrahams
BoostPro Computing
http://boostpro.com

* Re: [doc] User Manual Suggestion
  2009-04-29 13:27                           ` David Abrahams
@ 2009-04-29 14:05                             ` Jeff King
  0 siblings, 0 replies; 90+ messages in thread
From: Jeff King @ 2009-04-29 14:05 UTC (permalink / raw)
  To: David Abrahams; +Cc: git

On Wed, Apr 29, 2009 at 09:27:11AM -0400, David Abrahams wrote:

>> object storage. That is, half of the idea of git is a big database of
>> content-addressable objects.
>
> Absolutely, it's important to know that everything is content-addressable 
> (which essentially communicates the same important information as "the 
> object's id is a hash of its contents").  I was trying to say that the 
> fact that each one is a "first-class" object  and has a unique name is not 
> particularly remarkable.

I see. I consider those concepts inextricably linked. But I suppose you
could explain one without the other.

Anyway, thanks for the perspective.

-Peff

* Re: [doc] User Manual Suggestion
  2009-04-25  0:01                       ` Michael Witten
  2009-04-25  0:48                         ` David Abrahams
@ 2009-05-02 15:53                         ` Björn Steinbrink
  2009-05-02 18:36                           ` Michael Witten
  1 sibling, 1 reply; 90+ messages in thread
From: Björn Steinbrink @ 2009-05-02 15:53 UTC (permalink / raw)
  To: Michael Witten
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On 2009.04.24 19:01:48 -0500, Michael Witten wrote:
> 2009/4/24 Björn Steinbrink <B.Steinbrink@gmx.de>:
> >> In fact, I think it's important to note that the notation:
> >>
> >>     git show master:Makefile
> >>
> >> actually involves a translation from a Unix filesystem address to a
> >> git object address that is then used to find the relevant data.
> >
> > Hm? Resolving master:Makefile means to first find what master is, most
> > likely the shortname for refs/heads/master. That usually references a
> > commit object (by its name). The "<tree-ish>:<path>" syntax then causes
> > git to lookup the tree referenced by that commit (again, by its name).
> > And then the tree entry for "Makefile" is looked up, leading to the name
> > for the object identified by "master:Makefile".
> 
> Firstly, your head is too bound to low-level implementation.
> 
> Secondly, you've basically just expounded upon what I said. The
> Makefile part is for humans to write using a filesystem path (address)
> that is mapped into what I call a git address. The point is that the
> user is interfacing between two theories of content storage.

Sorry, that part missed a few sentences I thought I had written. It was
meant to show where the term "reference" is used. I just walked along
your example, as that was right there, and I didn't have to come up with
something else ;-)

Of course there are two "parts", just like scp uses <host>:<path>.
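The lookup chain walked through above for master:Makefile can be sketched as a series of table lookups. The refs and objects dictionaries and their short ids are hypothetical stand-ins, and resolve() only consults refs/heads/, whereas Git tries several ref locations.

```python
# Hypothetical repository contents; real ids are 40-hex-digit SHA-1 names.
refs = {"refs/heads/master": "c1"}
objects = {
    "c1": ("commit", {"tree": "t1", "parents": []}),
    "t1": ("tree", {"Makefile": "b1", "src": "t2"}),
    "t2": ("tree", {"main.c": "b2"}),
    "b1": ("blob", b"all:\n\tcc -o hello src/main.c\n"),
    "b2": ("blob", b"int main(void) { return 0; }\n"),
}


def resolve(spec: str) -> str:
    """Resolve '<branch>:<path>' to an object id, one lookup at a time."""
    rev, _, path = spec.partition(":")
    commit_id = refs["refs/heads/" + rev]  # branch name -> commit id
    kind, commit = objects[commit_id]
    if kind != "commit":
        raise ValueError(f"{rev} does not point at a commit")
    oid = commit["tree"]  # commit -> its top-level tree
    for part in filter(None, path.split("/")):  # walk tree entries by name
        kind, entries = objects[oid]
        if kind != "tree":
            raise ValueError(f"{part!r}: not inside a tree")
        oid = entries[part]
    return oid


print(resolve("master:Makefile"))    # b1
print(resolve("master:src/main.c"))  # b2
print(resolve("master:"))            # t1 (empty path names the root tree)
```

The "<host>:<path>"-like split is the partition(":") call; everything after it is plain name lookup inside trees.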

> >> Rather than being hidden, it should be exposed: I think it would be
> >> beneficial to use the word 'address' rather than 'reference' when
> >> talking about the SHA-1 names. Then HEAD could be called a pointer
> >> variable, etc.
> >
> > What's wrong with just calling the object name "object name"?
> 
> What's wrong with calling the object address "object address"?

The term "object name" is already used in the docs, so you'll have to
prove that it's bad and needs to be replaced.

> As I've stated: "address", "pointer", and "handle" are an analogy to
> terminology that has been around for ages. In fact, another name for
> "pointer" is "reference".

AFAIK a pointer is just one kind of reference. C++ references are
another kind, file descriptors are yet another. A reference is one piece
of data that lets me access a different piece of data.

And there are probably plenty of examples where you could apply that
analogy, yet nobody (I know) does. Arrays, database tables, ...

And "memory" usually means "RAM" to me, not "WORM"-memory (well,
actually, you can also delete and then rewrite, but not modify). So the
analogy would even hurt my mental model (just like the "commit --amend"
command might be considered harmful, because it actually creates a new
commit, but some users actually think the original commit is modified).

> >> So, a pointer variable's value is an object address that is the
> >> location of an object in git 'memory'. I think using this approach
> >> would make things significantly more transparent.
> >
> > But then HEAD would be a pointer pointer variable (symbolic ref), unless
> > you have a detached HEAD.
> 
> We call those handles.

Isn't a handle basically an opaque/abstract reference, at least in
"modern" usage? Symbolic references aren't. The user is free to create
and manipulate them, and gets full access to the things referenced by
them. And saying that HEAD is a reference, that might be symbolic is
IMHO by far easier to understand than saying that HEAD might be a
pointer or a handle.
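A rough sketch of how a reference that "might be symbolic" resolves, assuming the textual convention Git uses for HEAD (a symbolic ref stores "ref: <name>", a plain ref stores an object id). The ids and the depth limit are illustrative assumptions, and trailing newlines are ignored.

```python
# Hypothetical ref values; "c0" and "c1" are placeholder object ids.
refs = {
    "HEAD": "ref: refs/heads/master",
    "refs/heads/master": "c1",
}


def resolve_ref(name: str, table: dict, max_depth: int = 5) -> str:
    """Follow symbolic refs until a plain object id is reached."""
    for _ in range(max_depth):
        value = table[name]
        if value.startswith("ref: "):
            name = value[len("ref: "):]  # symbolic: follow to the named ref
        else:
            return value                 # plain: this is the object id
    raise ValueError("too many levels of symbolic refs")


print(resolve_ref("HEAD", refs))      # c1, via refs/heads/master
detached = dict(refs, HEAD="c0")      # a detached HEAD holds an id directly
print(resolve_ref("HEAD", detached))  # c0
```

One function covers both cases, which is the sense in which HEAD is simply "a reference that might be symbolic".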

Björn

* Re: [doc] User Manual Suggestion
  2009-05-02 15:53                         ` Björn Steinbrink
@ 2009-05-02 18:36                           ` Michael Witten
  2009-05-02 21:11                             ` Björn Steinbrink
  0 siblings, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-05-02 18:36 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

2009/5/2 Björn Steinbrink <B.Steinbrink@gmx.de>:
>> As I've stated: "address", "pointer", and "handle" are an analogy to
>> terminology that has been around for ages. In fact, another name for
>> "pointer" is "reference".
>
> AFAIK a pointer is just one kind of reference. C++ references are
> another kind...

Actually, a C++ reference is a pointer with restrictions (AFAIK).

> A reference is one piece of data that lets me access a different
> piece of data.

The key word there is 'access', which implies some kind of storage (or memory).

>
> And there are probably plenty of examples where you could apply that
> analogy, yet nobody (I know) does. Arrays, database tables, ...

Well, this terminology is certainly used with arrays in C, because
array elements can be accessed with pointers.

Also, databases use a much different scheme for addressing information
than does memory.

However, you're probably correct that pointer terminology doesn't
exist much outside of C/C++ and older languages (Ada?).

>
> And "memory" usually means "RAM" to me, not "WORM"-memory (well,
> actually, you can also delete and then rewrite, but not modify).

Well, I don't see how Random Access Memory really conflicts. One
certainly can access objects in the object memory/store randomly. The
main difference is that the computer store is addressed by location,
whereas the git store is addressed by content.

Also, I would say that conceptually deletion is an implementation
detail. Because git's object store is content addressable, one could
think of it as already containing all possible objects (of course, I'm
assuming that the 160-bit hash is also an implementation detail; an
infinite number of objects implies infinitely large addresses, though
the nonsignificant zeros could be disregarded as with real numbers or
something. I don't know, I'm making this up as I go :-D). That the git
tools ever complain no such object exists is an implementation detail
resulting from our finite storage in reality.

> So the
> analogy would even hurt my mental model (just like the "commit --amend"
> command might be considered harmful, because it actually creates a new
> commit, but some users actually think the original commit is modified).

Actually, this is why it's so important to have the underlying
concepts at hand. Understanding that objects are simply addressed by
content (that is, objects are immutable) completely extirpates this
kind of confusion.
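The immutability point can be illustrated with a simplified "amend": the stored commit is never edited; a new one is written and the branch is moved to it. The serialization below is a hypothetical simplification (real commits also carry author and committer lines and a type/size header), and "t1" is a placeholder tree id.

```python
import hashlib
from typing import Optional

store = {}  # object id -> serialized commit
refs = {}   # branch name -> commit id


def write_commit(tree: str, parent: Optional[str], message: str) -> str:
    # Simplified serialization; real commits also record author and committer.
    text = f"tree {tree}\n"
    if parent:
        text += f"parent {parent}\n"
    text += f"\n{message}\n"
    oid = hashlib.sha1(text.encode()).hexdigest()
    store[oid] = text  # objects are only ever added, never edited
    return oid


original = write_commit("t1", None, "Fix typo")
refs["master"] = original

# An "amend" writes a *new* commit (same parent, new message) and moves
# the branch; the old object is untouched and still addressable.
amended = write_commit("t1", None, "Fix typo in README")
refs["master"] = amended

print(original != amended)  # True
print(original in store)    # True
```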

>> >> So, a pointer variable's value is an object address that is the
>> >> location of an object in git 'memory'. I think using this approach
>> >> would make things significantly more transparent.
>> >
>> > But then HEAD would be a pointer pointer variable (symbolic ref), unless
>> > you have a detached HEAD.
>>
>> We call those handles.
>
> Isn't a handle basically an opaque/abstract reference, at least in
> "modern" usage? Symbolic references aren't. The user is free to create
> and manipulate them, and gets full access to the things referenced by
> them. And saying that HEAD is a reference, that might be symbolic is
> IMHO by far easier to understand than saying that HEAD might be a
> pointer or a handle.

Fair enough. Call them symbolic pointers; however, I don't really see
the problem with pointer pointers.

In any case, I *think* my point is that it's important to understand
that git uses content addressing; at first I was emphatic about the
idea of 'addressing', so I went with pointer terminology (which works
quite well, in my opinion). However, I think the 'content' part is
more important, which is why 'object hash' is loads better than
'object name' or 'object id'. Also, at least the documentation could
say that 'objects are addressed by their hashes', which says a whole
lot in one quick sentence about how git works.

* Re: [doc] User Manual Suggestion
  2009-05-02 18:36                           ` Michael Witten
@ 2009-05-02 21:11                             ` Björn Steinbrink
  2009-05-02 23:13                               ` Michael Witten
  0 siblings, 1 reply; 90+ messages in thread
From: Björn Steinbrink @ 2009-05-02 21:11 UTC (permalink / raw)
  To: Michael Witten
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On 2009.05.02 13:36:35 -0500, Michael Witten wrote:
> 2009/5/2 Björn Steinbrink <B.Steinbrink@gmx.de>:
> >> As I've stated: "address", "pointer", and "handle" are an analogy to
> >> terminology that has been around for ages. In fact, another name for
> >> "pointer" is "reference".
> >
> > AFAIK a pointer is just one kind of reference. C++ references are
> > another kind...
> 
> Actually, a C++ reference is a pointer with restrictions (AFAIK).

I'm not really aware of what the C++ standard says about it, but from a
usage point of view, they're IMHO different enough to consider them as
truly different types of references.

> > And there are probably plenty of examples where you could apply that
> > analogy, yet nobody (I know) does. Arrays, database tables, ...
> 
> Well, this terminology is certainly used with arrays in C, because
> array elements can be accessed with pointers.

But when you apply the analogy, then the array is the memory, and an
integer is an address and an index variable is a pointer.

> Also, databases use a much different scheme for addressing information
> than does memory.

I don't see any inherent problem in saying that the primary key
determines the address of a row. (It just gets funny when you have a
table schema without a primary key *g*)

> > And "memory" usually means "RAM" to me, not "WORM"-memory (well,
> > actually, you can also delete and then rewrite, but not modify).
> 
> Well, I don't see how Random Access Memory really conflicts. One
> certainly can access objects in the object memory/store randomly. The
> main difference is that the computer store is addressed by location,
> whereas the git store is addressed by content.

When I have a (non const) pointer in C I can write to the memory
location it references. With git, I can't do that. ("RWRAM" would have
been more correct, I'm damaged by the common usage of RAM as meaning
RWRAM).

> Also, I would say that conceptually deletion is an implementation
> detail.

Yeah, thus I put it in parentheses, just to show that, in practise, we
don't even have WORM-memory (but still taking the hash collision problem
into account, so we need to write once).

> Because git's object store is content addressable, one could
> think of it as already containing all possible objects (of course, I'm
> assuming that the 160-bit hash is also an implementation detail; an
> infinite number of objects implies infinitely large addresses, though
> the nonsignificant zeros could be disregarded as with real numbers or
> something. I don't know, I'm making this up as I go :-D). That the git
> tools ever complain no such object exists is an implementation detail
> resulting from our finite storage in reality.

I prefer to take the hash collision into account when looking at things
like that, but yeah, one could look at it like that.

> > So the analogy would even hurt my mental model (just like the
> > "commit --amend" command might be considered harmful, because it
> > actually creates a new commit, but some users actually think the
> > original commit is modified).
> 
> Actually, this is why it's so important to have the underlying
> concepts at hand. Understanding that objects are simply addressed by
> content (that is, objects are immutable) completely extirpates this
> kind of confusion.

I never disagreed with that, though I put more emphasis on the plain
object relationships and their immutability than on the fact that hashes
are used. Having that part right (how objects work together to form
history) is a large part of what you need to understand all the rest.

> >> >> So, a pointer variable's value is an object address that is the
> >> >> location of an object in git 'memory'. I think using this approach
> >> >> would make things significantly more transparent.
> >> >
> >> > But then HEAD would be a pointer pointer variable (symbolic ref), unless
> >> > you have a detached HEAD.
> >>
> >> We call those handles.
> >
> > Isn't a handle basically an opaque/abstract reference, at least in
> > "modern" usage? Symbolic references aren't. The user is free to create
> > and manipulate them, and gets full access to the things referenced by
> > them. And saying that HEAD is a reference, that might be symbolic is
> > IMHO by far easier to understand than saying that HEAD might be a
> > pointer or a handle.
> 
> Fair enough. Call them symbolic pointers; however, I don't really see
> the problem with pointer pointers.

You called them handles anyway ;-) But seriously, it's that "pointer"
triggers C for me. And having an entity that can switch between being a
pointer and a pointer pointer needs casting or a union (or a struct if
you want to), not something I'd like to have to think about in my mental
model of git.

> In any case, I *think* my point is that it's important to understand
> that git uses content addressing; at first I was emphatic about the
> idea of 'addressing', so I went with pointer terminology (which works
> quite well, in my opinion). However, I think the 'content' part is
> more important, which is why 'object hash' is loads better than
> 'object name' or 'object id'. Also, at least the documentation could
> say that 'objects are addressed by their hashes', which says a whole
> lot in one quick sentence about how git works.

Hm, like chapter 7 "Git concepts"?

>>>>>>
The Object Database

We already saw in the section called “Understanding History: Commits”
that all commits are stored under a 40-digit "object name". In fact, all
the information needed to represent the history of a project is stored
in objects with such names. In each case the name is calculated by
taking the SHA-1 hash of the contents of the object. The SHA-1 hash is a
cryptographic hash function. What that means to us is that it is
impossible to find two different objects with the same name. This has a
number of advantages; among others:

    * Git can quickly determine whether two objects are identical or
      not, just by comparing names.
    * Since object names are computed the same way in every repository,
      the same content stored in two repositories will always be stored
      under the same name.
    * Git can detect errors when it reads an object, by checking that
      the object's name is still the SHA-1 hash of its contents.
<<<<<<
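The third advantage listed in that excerpt (error detection) can be sketched for a single loose object, assuming the loose format is a zlib-compressed "<type> <size>\0<content>" named by the SHA-1 of the uncompressed bytes. Pack files are not modelled, and make_loose_blob and verify are hypothetical helper names.

```python
import hashlib
import zlib
from typing import Tuple


def make_loose_blob(data: bytes) -> Tuple[str, bytes]:
    # A loose object file holds zlib-compressed "<type> <size>\0<content>";
    # its name is the SHA-1 of those uncompressed bytes.
    raw = b"blob " + str(len(data)).encode("ascii") + b"\0" + data
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def verify(name: str, stored: bytes) -> bool:
    # Recompute the name from what was read back and compare.
    return hashlib.sha1(zlib.decompress(stored)).hexdigest() == name


name, stored = make_loose_blob(b"hello\n")
print(verify(name, stored))  # True

# Simulate damage that still decompresses cleanly: one byte of content changed.
damaged = zlib.compress(zlib.decompress(stored).replace(b"hello", b"jello"))
print(verify(name, damaged))  # False
```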

Björn

* Re: [doc] User Manual Suggestion
  2009-05-02 21:11                             ` Björn Steinbrink
@ 2009-05-02 23:13                               ` Michael Witten
  2009-05-02 23:32                                 ` Björn Steinbrink
  2009-05-03  1:18                                 ` Mark Lodato
  0 siblings, 2 replies; 90+ messages in thread
From: Michael Witten @ 2009-05-02 23:13 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

2009/5/2 Björn Steinbrink <B.Steinbrink@gmx.de>:
>> In any case, I *think* my point is that it's important to understand
>> that git uses content addressing; at first I was emphatic about the
>> idea of 'addressing', so I went with pointer terminology (which works
>> quite well, in my opinion). However, I think the 'content' part is
>> more important, which is why 'object hash' is loads better than
>> 'object name' or 'object id'. Also, at least the documentation could
>> say that 'objects are addressed by their hashes', which says a whole
>> lot in one quick sentence about how git works.
>
> Hm, like chapter 7 "Git concepts"?

That's exactly the problem. It should be in chapter 0.

I also dislike the use of 'name' rather than 'hash'; a name is
something provided by the user, but a hash is something computed. The
use of sha[-]1 is even more egregious.

* Re: [doc] User Manual Suggestion
  2009-05-02 23:13                               ` Michael Witten
@ 2009-05-02 23:32                                 ` Björn Steinbrink
  2009-05-03  1:10                                   ` Michael Witten
  2009-05-03  1:18                                 ` Mark Lodato
  1 sibling, 1 reply; 90+ messages in thread
From: Björn Steinbrink @ 2009-05-02 23:32 UTC (permalink / raw)
  To: Michael Witten
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On 2009.05.02 18:13:24 -0500, Michael Witten wrote:
> 2009/5/2 Björn Steinbrink <B.Steinbrink@gmx.de>:
> >> In any case, I *think* my point is that it's important to understand
> >> that git uses content addressing; at first I was emphatic about the
> >> idea of 'addressing', so I went with pointer terminology (which works
> >> quite well, in my opinion). However, I think the 'content' part is
> >> more important, which is why 'object hash' is loads better than
> >> 'object name' or 'object id'. Also, at least the documentation could
> >> say that 'objects are addressed by their hashes', which says a whole
> >> lot in one quick sentence about how git works.
> >
> > Hm, like chapter 7 "Git concepts"?
> 
> That's exactly the problem. It should be in chapter 0.

I'm not opposed to re-ordering stuff. Though I often think that having
commands and concepts "together" is better.  Maybe we just need that
twice? Once the plain data model, and once a "hands on" version where
the effects of the commands are described in terms of the data model.

The former "sucks" for those that want to just "dive in" (but might
still be happy to get told what their actions do), the latter sucks when
you just want to look something up.

Hm?

Björn

* Re: [doc] User Manual Suggestion
  2009-05-02 23:32                                 ` Björn Steinbrink
@ 2009-05-03  1:10                                   ` Michael Witten
  2009-05-03  1:48                                     ` Björn Steinbrink
  0 siblings, 1 reply; 90+ messages in thread
From: Michael Witten @ 2009-05-03  1:10 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

2009/5/2 Björn Steinbrink <B.Steinbrink@gmx.de>:
>> > Hm, like chapter 7 "Git concepts"?
>>
>> That's exactly the problem. It should be in chapter 0.
>
> I'm not opposed to re-ordering stuff. Though I often think that having
> commands and concepts "together" is better.  Maybe we just need that
> twice? Once the plain data model, and once a "hands on" version where
> the effects of the commands are described in terms of the data model.
>
> The former "sucks" for those that want to just "dive in" (but might
> still be happy to get told what their actions do), the latter sucks when
> you just want to look something up.

Indeed. I think the key is to split up the documentation for these 2 paths.

    http://marc.info/?l=git&m=124058631814726&w=2

The mixing of the 2 is what makes everyone unhappy.

* Re: [doc] User Manual Suggestion
  2009-05-02 23:13                               ` Michael Witten
  2009-05-02 23:32                                 ` Björn Steinbrink
@ 2009-05-03  1:18                                 ` Mark Lodato
  2009-05-03  1:26                                   ` Michael Witten
  1 sibling, 1 reply; 90+ messages in thread
From: Mark Lodato @ 2009-05-03  1:18 UTC (permalink / raw)
  To: Michael Witten
  Cc: Björn Steinbrink, Jeff King, Daniel Barkalow, Johan Herland,
	git, David Abrahams, J. Bruce Fields

2009/5/2 Michael Witten <mfwitten@gmail.com>:
> I also dislike the use of 'name' rather than 'hash'; a name is
> something provided by the user, but a hash is something computed. The
> use of sha[-]1 is even more egregious.

What about "identifier" as a compromise between "hash" and "name"?
This is really what we're talking about - a way of identifying
objects.

* Re: [doc] User Manual Suggestion
  2009-05-03  1:18                                 ` Mark Lodato
@ 2009-05-03  1:26                                   ` Michael Witten
  0 siblings, 0 replies; 90+ messages in thread
From: Michael Witten @ 2009-05-03  1:26 UTC (permalink / raw)
  To: Mark Lodato
  Cc: Björn Steinbrink, Jeff King, Daniel Barkalow, Johan Herland,
	git, David Abrahams, J. Bruce Fields

On Sat, May 2, 2009 at 20:18, Mark Lodato <lodatom@gmail.com> wrote:
> 2009/5/2 Michael Witten <mfwitten@gmail.com>:
>> I also dislike the use of 'name' rather than 'hash'; a name is
>> something provided by the user, but a hash is something computed. The
>> use of sha[-]1 is even more egregious.
>
> What about "identifier" as a compromise between "hash" and "name"?
> This is really what we're talking about - a way of identifying
> objects.

It's the same problem, in my opinion. '[Cryptographic] hash' says so
much more and still remains quite generic.

Also, continuing with 'sha1' doesn't seem satisfactory:

    http://marc.info/?l=git&m=124068702303042&w=2

* Re: [doc] User Manual Suggestion
  2009-05-03  1:10                                   ` Michael Witten
@ 2009-05-03  1:48                                     ` Björn Steinbrink
  0 siblings, 0 replies; 90+ messages in thread
From: Björn Steinbrink @ 2009-05-03  1:48 UTC (permalink / raw)
  To: Michael Witten
  Cc: Jeff King, Daniel Barkalow, Johan Herland, git, David Abrahams,
	J. Bruce Fields

On 2009.05.02 20:10:14 -0500, Michael Witten wrote:
> 2009/5/2 Björn Steinbrink <B.Steinbrink@gmx.de>:
> >> > Hm, like chapter 7 "Git concepts"?
> >>
> >> That's exactly the problem. It should be in chapter 0.
> >
> > I'm not opposed to re-ordering stuff. Though I often think that having
> > commands and concepts "together" is better.  Maybe we just need that
> > twice? Once the plain data model, and once a "hands on" version where
> > the effects of the commands are described in terms of the data model.
> >
> > The former "sucks" for those that want to just "dive in" (but might
> > still be happy to get told what their actions do), the latter sucks when
> > you just want to look something up.
> 
> Indeed. I think the key is to split up the documentation for these 2 paths.
> 
>     http://marc.info/?l=git&m=124058631814726&w=2
> 
> The mixing of the 2 is what makes everyone unhappy.

I'm not sure which part of that email you're referring to (and I'm
getting tired, 3:20am...). I'm just seeing the paragraph where Jeff has
said that we have a split, between the tutorial and the manual. And what
I tried to say is that we might need the tutorial to be less of a
"recipe collection", but more of a hands-on introduction that actively
explains the data model and how data is manipulated by using the
commands. And the user manual might become less example oriented,
focussing more on concepts, giving examples in addition. So that we have
both approaches, hands-on and theoretical, but both keeping the data
model in mind, at least to some extent.

For example the "hands on" version might rather create a "toy"
repository than importing an existing project right away, to get a
smaller scope of things to describe at once, and to be able to show e.g.
full "graphs" of the early repo as it evolves. Users that simply don't
want to care can still skip over the explanations and suffer^Wjust pick
up the commands. You could e.g. say "To create a lightweight tag you use
..., which adds a new reference, while ... adds an annotated tag, which
is a real tag object, with a message and a tagger and which can possibly
be signed using your GPG key." And maybe explain the tag object a bit
further.

While the manual might, for example, have a section "Tags" instead of
the current "Creating tags"(*), where the different types of tags are
described, how they fit into the data model, what the different types of
tags mean, and only then give examples how to create them.

Lots of possible work...

Björn

(*) Why's that in the "Exploring git history" chapter? Let's see if I
can sort out my local asciidoc problems and find some time to provide
some basic patches for that. Though I still haven't managed to get the
one for the git-push man page done... *sigh*

end of thread

Thread overview: 90+ messages
2009-04-22 19:38 [doc] User Manual Suggestion David Abrahams
2009-04-23 17:57 ` J. Bruce Fields
2009-04-23 18:37   ` Michael Witten
2009-04-23 20:16     ` Jeff King
2009-04-23 20:45       ` Michael Witten
2009-04-23 21:31         ` David Abrahams
2009-04-24  0:31           ` Michael Witten
2009-04-24 14:18           ` Jeff King
2009-04-24 14:20             ` J. Bruce Fields
2009-04-24 17:28             ` David Abrahams
2009-04-24 18:15               ` Jeff King
2009-04-24 19:00                 ` David Abrahams
2009-04-24 20:24                   ` Jeff King
2009-04-24 21:06                     ` David Abrahams
2009-04-24 22:45                       ` Björn Steinbrink
2009-04-25  0:39                         ` David Abrahams
2009-04-26 23:35                           ` Björn Steinbrink
2009-04-24 14:11         ` Jeff King
2009-04-24 14:30           ` Michael Witten
2009-04-24 14:33             ` Michael Witten
2009-04-24 15:04             ` Jeff King
2009-04-24 15:18               ` Michael Witten
2009-04-24 17:38                 ` J. Bruce Fields
2009-04-24 18:27                   ` Jeff King
2009-04-24 18:35                     ` J. Bruce Fields
     [not found]                   ` <34BD51FF-0908-48A8-BBBC-E27B0EFB32E5@boostpro.com>
2009-04-24 18:52                     ` J. Bruce Fields
2009-04-25 10:35                       ` Felipe Contreras
2009-04-24 19:12                   ` Michael Witten
2009-04-23 21:26       ` David Abrahams
2009-04-23 22:51         ` Johan Herland
2009-04-24  0:30           ` Michael Witten
2009-04-24 20:30             ` Johan Herland
2009-04-24 21:34               ` Daniel Barkalow
2009-04-24 21:38                 ` Jeff King
2009-04-24 22:18                   ` Michael Witten
2009-04-24 22:25                     ` Michael Witten
2009-04-24 23:11                       ` Daniel Barkalow
2009-04-24 23:14                         ` Jeff King
2009-04-24 23:18                           ` Michael Witten
2009-04-24 23:31                           ` Michael Witten
2009-04-24 23:35                             ` Jeff King
2009-04-25  0:19                               ` Michael Witten
2009-04-25 10:18                           ` Felipe Contreras
2009-04-24 23:26                         ` Michael Witten
2009-04-25 18:55                           ` Daniel Barkalow
2009-04-25 19:16                             ` Michael Witten
2009-04-25 19:24                               ` Felipe Contreras
2009-04-25 19:36                                 ` David Abrahams
2009-04-25 20:53                                   ` Felipe Contreras
2009-04-26 11:28                                   ` Björn Steinbrink
2009-04-26 13:55                                     ` David Abrahams
2009-04-26 17:56                                       ` Björn Steinbrink
2009-04-26 20:17                                         ` David Abrahams
2009-04-26 22:25                                           ` Björn Steinbrink
2009-04-27  1:41                                             ` David Abrahams
2009-04-27 16:30                                             ` David Abrahams
2009-04-27 16:52                                               ` Michael Witten
2009-04-26 16:36                                     ` Michael Witten
2009-04-26 18:12                                       ` Björn Steinbrink
2009-04-26 20:20                                         ` David Abrahams
2009-04-25  0:41                         ` David Abrahams
2009-04-24 23:16                     ` Björn Steinbrink
2009-04-25  0:01                       ` Michael Witten
2009-04-25  0:48                         ` David Abrahams
2009-04-26 22:42                           ` Björn Steinbrink
2009-05-02 15:53                         ` Björn Steinbrink
2009-05-02 18:36                           ` Michael Witten
2009-05-02 21:11                             ` Björn Steinbrink
2009-05-02 23:13                               ` Michael Witten
2009-05-02 23:32                                 ` Björn Steinbrink
2009-05-03  1:10                                   ` Michael Witten
2009-05-03  1:48                                     ` Björn Steinbrink
2009-05-03  1:18                                 ` Mark Lodato
2009-05-03  1:26                                   ` Michael Witten
2009-04-24 23:21                   ` Daniel Barkalow
2009-04-24 23:25                     ` Jeff King
2009-04-26 23:41                       ` Björn Steinbrink
2009-04-24 23:29                     ` Michael Witten
2009-04-27  0:00                       ` Björn Steinbrink
2009-04-25  0:19                   ` David Abrahams
2009-04-25  0:26                     ` Michael Witten
2009-04-25  0:35                     ` Jeff King
2009-04-25  0:53                       ` David Abrahams
2009-04-29  6:34                         ` Jeff King
2009-04-29 13:27                           ` David Abrahams
2009-04-29 14:05                             ` Jeff King
2009-04-24  2:29     ` J. Bruce Fields
2009-04-24  2:34       ` Michael Witten
2009-04-24  4:06       ` David Abrahams
2009-04-24 14:10         ` J. Bruce Fields
