Re: Question about your comment on the git parable


* Re: Question about your comment on the git parable
       [not found]   ` <4B4C5353-9820-4068-92DA-50665B1011E1@gmail.com>
@ 2012-02-26 14:10     ` Jakub Narebski
  0 siblings, 0 replies; 3+ messages in thread
From: Jakub Narebski @ 2012-02-26 14:10 UTC (permalink / raw)
  To: Federico Galassi; +Cc: git

Federico Galassi wrote:
> On 26/feb/2012, at 13:03, Jakub Narebski wrote:
>> Jakub Narebski wrote:

[...]
>>> Note also that the staging area is also a performance hack (perhaps it
>>> began as such; I am not sure about this aspect of git history).  Git uses
>>> it to be able to _cheaply_ check which files were changed.
>> 
>> The first name for staging area, _dircache_, hints at this.
> 
> Unfortunately, i'm not into git development. Do you have a clue on why
> the index, apparently a tree referring to objects, is much faster than
> reading that stuff right from the database?  

The index (called "dircache" at the very beginning), or the staging area,
stores more information than is saved in the object database, for example
stat information (file metadata).  Most file metadata is highly local, so
it doesn't make sense to save it in the object database of the repository,
but it is used to avoid a file read: usually stat-ing a file, which is much
cheaper, is enough to notice that the file did not change.
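
The stat check described above can be sketched roughly as follows.  This is
only an illustration, not Git's actual code: the names (StatCache, record,
maybe_changed) are made up, and only size and modification time are compared,
whereas the real index records more stat fields.

```python
import os
from typing import Dict, Tuple

# Hypothetical cache: path -> (size, mtime in ns), recorded the last time
# the file content was actually read and hashed.
StatCache = Dict[str, Tuple[int, int]]


def stat_signature(path: str) -> Tuple[int, int]:
    """Return the cheap-to-obtain metadata used as a change indicator."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def record(cache: StatCache, path: str) -> None:
    """Remember the current stat data for a file whose content is known."""
    cache[path] = stat_signature(path)


def maybe_changed(cache: StatCache, path: str) -> bool:
    """True if the file has to be re-read; False if the stat data matches."""
    return cache.get(path) != stat_signature(path)
```

When maybe_changed() returns False, the file content does not have to be
read at all; only a True result requires opening the file and comparing
its content.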

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Question about your comment on the git parable
       [not found]   ` <1E5ECB5A-595A-4B04-8269-6E35BF3FEA1A@gmail.com>
@ 2012-02-26 15:06     ` Jakub Narebski
  2012-02-28  2:41       ` Neal Kreitzinger
  0 siblings, 1 reply; 3+ messages in thread
From: Jakub Narebski @ 2012-02-26 15:06 UTC (permalink / raw)
  To: Federico Galassi; +Cc: git

On Sun, 26 Feb 2012, Federico Galassi wrote:
> On 26/feb/2012, at 12:29, Jakub Narebski wrote:
> 
>> Would you mind if this discussion was moved to git mailing
>> list (git@vger.kernel.org), of course always with copy directly
>> to you?  There are people there that can answer your questions
>> better.
> 
> No problem.
>
>> On Sun, 26 Feb 2012, Federico Galassi wrote:
>>> Hello, i think you're the author of these comments:
>>> http://news.ycombinator.com/item?id=616610 
>>> 
>>> I'm doing educational work on git based on the parable (talks,
>>> articles, etc..) and i'd like to improve on the real reason
>>> for a staging area.  
>>> 
>>> My question basically is: why is it really needed for merging?
>>> I mean, given the fictional git-like system of the parable,
>>> if I need to merge 2 snapshots i could: 
>>> 
>>> 1) search the commit tree for a base point
[...]
>>> 2) compare the diffs between the snapshots and the base point snapshot
>>> 3) if a conflict happens (change in the same line), just leave
>>>   something in the working dir to mark the conflict. For example,
>>>   keeping it simple, the system could reject a new commit until
>>>   the markers of the conflict are removed from the conflicting file.   
>>> 
>>> Couldn't it just work this way?
>> 
>> Well, it could; that is how many if not most of other version control
>> systems work.
>> 
>> 
>> There are (at least!) three problems with that approach.  First, sometimes
>> it is not possible to "leave something in the working dir to mark the
>> conflict".  Take for example case where binary file (e.g. image) was
>> changed, and textual 3-way diff file-merge algorithm wouldn't work.
>> 
>> Second, what to do in the case of *tree-level* conflict, for example
>> rename/rename conflict, where one side renamed file to different
>> name (moved to different place) than the other side.  There are no
>> conflict markers for this...
>> 
>> Third, what about false positives with detecting conflict markers,
>> i.e. the case where "rejecting new commit until conflict markers are
>> removed", for example AsciiDoc files can be falsely detected as having
>> partial conflict markers, and of course test vectors for testing conflict
>> would have to have conflict markers in them.
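
To illustrate the third point quoted above: a minimal sketch of a naive
"reject the commit while markers are present" check, and how it misfires on
an AsciiDoc-style title underline.  The rule and the names (MARKER,
looks_conflicted) are assumptions made up for this example.

```python
import re

# Naive rule (an assumption for illustration): any line that starts with
# seven '<', '=' or '>' characters is treated as a conflict marker.
MARKER = re.compile(r"^(<{7}|={7}|>{7})")


def looks_conflicted(text: str) -> bool:
    """Return True if any line looks like a conflict marker."""
    return any(MARKER.match(line) for line in text.splitlines())


# A two-line AsciiDoc title: the underline is made of '=' characters,
# so the naive check reports a conflict that is not there.
asciidoc = "Release notes\n=============\n\nNothing is conflicted here.\n"
false_positive = looks_conflicted(asciidoc)
```

A stricter pattern would reduce such false positives, but test files that
contain conflict markers on purpose would still be rejected.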
> 
> Ok, it's clear to me that the markers in file approach is just a little
> bit too simple. Do you see any concrete advantage in the staging area
> compared to, say, tree conflict metadata in the working dir and maybe
> a dedicated smart "resolve conflict" command?   

First, for such _local_ information the working directory isn't the best
place.  What if you accidentally delete it?  It is not and should not be
committed to the repository, so there is no way to undelete it, except
redoing the merge and losing all your progress so far in resolving merge
conflicts.  It is much better to put such information somewhere in the
administrative area[1] of the repository.

Second, if we have a staging area where we store information about which
files are tracked, and a bunch of per-file metadata like modification time
for better performance, why not use it also for storing information about
a merge in progress?

[1]: Name taken from "Version Control by Example" (free e-book) by
     Eric Sink.
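
As a rough sketch of the second point: Git's index keeps a stage number with
each entry (0 for a normal entry; 1, 2 and 3 for the base, "ours" and
"theirs" versions of a conflicted path).  The toy model below is not Git's
on-disk format; the class and function names and the object ids are
placeholders made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    path: str
    stage: int      # 0 = merged; 1 = base, 2 = ours, 3 = theirs
    object_id: str  # placeholder ids, not real hashes


# Toy index: keyed by (path, stage) so one path can hold several versions.
Index = Dict[Tuple[str, int], Entry]


def add_entry(index: Index, entry: Entry) -> None:
    index[(entry.path, entry.stage)] = entry


def unmerged_paths(index: Index) -> List[str]:
    """Paths that still have at least one higher-stage (conflict) entry."""
    return sorted({path for (path, stage) in index if stage != 0})


def resolve(index: Index, path: str, object_id: str) -> None:
    """Replace the conflict entries of a path with a single stage-0 entry."""
    for stage in (1, 2, 3):
        index.pop((path, stage), None)
    add_entry(index, Entry(path, 0, object_id))
```

This works for a binary file just as well as for text, because nothing has
to be written into the file itself to record the conflict, and a commit can
be refused simply while unmerged_paths() is non-empty.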


There is also a thing very specific to Git, namely that "git add" adds
the current content of a file to the object database of the repository
(though with modern git there is also "git add --intent-to-add" which works
like add-ing a file in other version control systems)... and you have to
store a reference to the newly created object somewhere so that it doesn't
get garbage-collected.
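
A minimal sketch of that idea, assuming the default SHA-1 object format: the
blob name is the hash of a short header plus the file content.  The two
dictionaries and the helper names (stage, unreferenced) are toy stand-ins
made up here, not how Git stores objects or the index on disk.

```python
import hashlib
from typing import Dict, List


def blob_id(content: bytes) -> str:
    """Object name of a blob: SHA-1 over 'blob <size>\\0' plus the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def stage(objects: Dict[str, bytes], index: Dict[str, str],
          path: str, content: bytes) -> str:
    """Toy 'add': store the content, then reference it from the index."""
    oid = blob_id(content)
    objects[oid] = content
    index[path] = oid
    return oid


def unreferenced(objects: Dict[str, bytes], index: Dict[str, str]) -> List[str]:
    """Objects that nothing in the toy index points to (collectable here)."""
    return sorted(set(objects) - set(index.values()))
```

In this toy model the index entry is the only thing keeping a freshly added
blob from showing up in unreferenced(); in a real repository commits and
other references play that role as well.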

>>> Can you mention other situations in which the pattern "files to be added"
>>> is either mandatory or really helpful? 
>> 
>> Note that any version control system must have a kind of proto-staging
>> area to know which files are to be added in next commit.
>> 
>> If you do
>> 
>>  $ scm add file.c
>> 
>> then version control system must save somewhere that 'file.c' is to be
>> tracked (to be added in next commit).
> 
> Yes, the fictional vcs just tracked all the files in the working dir.
> Being selective on which file to track is of course another interesting
> feature.  

IRL it is a _necessary_ feature.  One of the more common, if not the most
common, applications of a version control system is to manage source files
for a computer program.  And there you have object files, executables and
other _generated_ files which shouldn't be put in version control, not to
mention backups created by your editor / IDE (e.g. "*~" files in the Unix
world, "*.bak" files in the MS Windows world).

Not to mention files which you have added to the working directory, but
which are not ready to be added to a new commit.
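
As a small illustration of being selective, here is a sketch that filters a
directory listing with shell-style patterns.  The pattern list and the
function name are assumptions for this example; real ignore rules (such as
Git's .gitignore) have more elaborate matching semantics.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Sequence

# Illustrative patterns for generated files and editor backups.
IGNORE_PATTERNS = ["*.o", "*~", "*.bak"]


def candidates(paths: Iterable[str],
               patterns: Sequence[str] = IGNORE_PATTERNS) -> List[str]:
    """Return the paths that do not match any ignore pattern."""
    return [p for p in paths
            if not any(fnmatch(p, pat) for pat in patterns)]


listing = ["main.c", "main.o", "main.c~", "notes.bak", "Makefile"]
to_consider = candidates(listing)
```

Even the files that survive this filter are only candidates: which of them
go into the next commit is still an explicit decision.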

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Question about your comment on the git parable
  2012-02-26 15:06     ` Jakub Narebski
@ 2012-02-28  2:41       ` Neal Kreitzinger
  0 siblings, 0 replies; 3+ messages in thread
From: Neal Kreitzinger @ 2012-02-28  2:41 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Federico Galassi, git

On 2/26/2012 9:06 AM, Jakub Narebski wrote:
> On Sun, 26 Feb 2012, Federico Galassi wrote:
>> On 26/feb/2012, at 12:29, Jakub Narebski wrote:
> [...]
>>> Note that any version control system must have a kind of proto-staging
>>> area to know which files are to be added in next commit.
>>>
>>> If you do
>>>
>>>   $ scm add file.c
>>>
>>> then version control system must save somewhere that 'file.c' is to be
>>> tracked (to be added in next commit).
>>
>> Yes, the fictional vcs just tracked all the files in the working dir.
>> Being selective on which file to track is of course another interesting
>> feature.
>
> IRL it is a _necessary_ feature.  One of more common, if not most common
> application of version control system is to manage source files for a
> computer program.  And there you have object files, executables and other
> _generated_ files which shouldn't be put in version control, not to
> mention backups created by your editor / IDE (e.g. "*~" files in Unix
> world, "*.bak" files in MS Windows world).
>
> Not to mention files which you have added to working directory, but are
> not ready to be added to new commit.
>
In the Google Tech Talk "Contributing to Git":
http://www.youtube.com/watch?v=j45cs5_nY2k , the index is discussed at
the 44:00 min mark.  The talk notes that many SCMs have an "index" but
hide it from you.  Some of the advantages of git giving you access to the
index are also discussed.

v/r,
neal
