Notes on supporting Git operations in/on partial Working Directories

* Notes on supporting Git operations in/on partial Working Directories
@ 2006-09-14 19:05 A Large Angry SCM
  2006-09-14 19:21 ` Shawn Pearce
  2006-09-14 19:50 ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: A Large Angry SCM @ 2006-09-14 19:05 UTC (permalink / raw)
  To: git

Notes on supporting Git operations in/on partial Working Directories
====================================================================

Motivation
----------
Be able to check out only part of a tree, do some work, and commit the
changes. Support for partial working directories is also (almost) a
requirement for supporting partial repositories.

Expectations
------------
All Git commands that currently work with the index or the working
directory will work with indexes or working directories that are partial
checkouts.

Leading directories common to all objects of a partial checkout are not
present in the working directory.

The contents of a partial working directory can be determined on a
path-by-path basis; entire directories are not required.

Implementation Sketch
---------------------
The minimum required changes[*1*][*2*][*3*] to the index file to support
partial checkouts are:

1) The addition of a WD_Prefix string to hold the common path prefix of
all objects in the working directory. For a full checkout, the WD_Prefix
string would be empty.

2) A (new) flag for each entry in the index indicating whether or not
the object is in the partial checkout.

The contents of the index file still reflect the full tree but flag each
object (file or symlink) separately as part of the checkout or not. The
WD_Prefix string is so that a partial checkout consisting of only
objects somewhere in the a/b/c/d/ tree can be found in the working
directory without the a/b/c/d/ prefix to the path of the object.

All the Git commands that use the index file will need to be changed to
support this but the transfer protocols do not need to change.
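
A minimal sketch of the two index changes described above, written in
Python rather than Git's C purely for illustration. The names IndexEntry,
Index, checked_out and worktree_path are hypothetical, and the blob ids
are placeholders; none of this exists in Git.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    path: str          # full path relative to the root of the tree
    blob: str          # object name (placeholder string here)
    checked_out: bool  # hypothetical per-entry "in the partial checkout" flag


@dataclass
class Index:
    wd_prefix: str             # hypothetical WD_Prefix; "" for a full checkout
    entries: List[IndexEntry]  # always describes the full tree

    def worktree_path(self, entry: IndexEntry) -> Optional[str]:
        """Where the entry lives in the working directory, or None if absent."""
        if not entry.checked_out:
            return None
        if not entry.path.startswith(self.wd_prefix):
            raise ValueError(f"{entry.path} is outside prefix {self.wd_prefix}")
        return entry.path[len(self.wd_prefix):]


index = Index(
    wd_prefix="a/b/c/d/",
    entries=[
        IndexEntry("Makefile", "blob-1", False),
        IndexEntry("a/b/c/d/main.c", "blob-2", True),
        IndexEntry("a/b/c/d/lib/util.c", "blob-3", True),
        IndexEntry("a/b/c/d/docs/big.pdf", "blob-4", False),
    ],
)
# Paths that would actually exist in the partial working directory.
worktree = [p for p in map(index.worktree_path, index.entries) if p is not None]
```

Because every entry of the full tree stays in the index, writing the top
level tree would not need to consult anything outside it; only the mapping
to the working directory changes.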

Notes
-----
[*1*] As long as the index file structure is being changed, it may be
worthwhile to also include the ideas in:
	http://www.gelato.unsw.edu.au/archives/git/0601/15471.html
	http://www.gelato.unsw.edu.au/archives/git/0601/15483.html
	http://www.gelato.unsw.edu.au/archives/git/0601/15484.html
Except for the "bind" parts since I still think that is a bad idea.

[*2*] The index "TREE" (cache-tree) extension should also become a
required part of the index.

[*3*] Possibly split the index up by directory and store the parts in
the working directory. An index "distributed" in this way would have
a "natural" cache-tree built in and (finally) be able to support empty
directories.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-14 19:05 Notes on supporting Git operations in/on partial Working Directories A Large Angry SCM
@ 2006-09-14 19:21 ` Shawn Pearce
  2006-09-14 20:08   ` A Large Angry SCM
  2006-09-14 19:50 ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Shawn Pearce @ 2006-09-14 19:21 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> wrote:
> The contents of the index file still reflect the full tree but flag each
> object (file or symlink) separately as part of the checkout or not. The
> WD_Prefix string is so that a partial checkout consisting of only
> objects somewhere in the a/b/c/d/ tree can be found in the working
> directory without the a/b/c/d/ prefix to the path of the object.

Why not just load a partial index?

If we only want "a/b/c/d" subtree then only load that into the index.
At git-write-tree time return the new root tree by loading the tree
of the current `HEAD` commit and walking down to a/b/c/d, updating
that with the tree from the index, then walking back updating each
node you recursed down through.  Finally output the new root tree.
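
The walk described above can be sketched as follows. This is only a toy
model in Python, assuming trees are nested dicts mapping a name to either a
blob id or a subtree; splice_subtree and the sample data are hypothetical
and are not Git code.

```python
from typing import Dict, List, Union

Tree = Dict[str, Union[str, "Tree"]]  # name -> blob id (str) or subtree (dict)


def splice_subtree(root: Tree, prefix: List[str], new_subtree: Tree) -> Tree:
    """Return a new root tree with the tree at `prefix` replaced."""
    if not prefix:
        return new_subtree
    head, rest = prefix[0], prefix[1:]
    child = root.get(head, {})
    if not isinstance(child, dict):
        raise ValueError(f"{head!r} is a blob, not a tree")
    updated = dict(root)  # shallow copy: unchanged siblings are shared
    updated[head] = splice_subtree(child, rest, new_subtree)
    return updated


head_tree: Tree = {
    "README": "blob-1",
    "a": {"b": {"c": {"d": {"old.c": "blob-2"}}, "x.c": "blob-3"}},
}
partial_index_tree: Tree = {"new.c": "blob-4"}  # what the partial index holds
new_root = splice_subtree(head_tree, ["a", "b", "c", "d"], partial_index_tree)
```

Only the nodes on the path a/b/c/d are rebuilt; everything else is reused
from the tree of the current HEAD.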

The advantage is that if you have a subtree checked out you aren't
working with the entire massive index.

But how does this let the user checkout and work on the 10 top
level directories at once and perform an atomic commit to all
of them, but not checkout the other 100+ top level directories?
As I recall this was desired in the Mozilla project for example.

> [*3*] Possibly split the index up by directory and store the parts in
> the working directory. An index "distributed" in this way would have
> a "natural" cache-tree built in and (finally) be able to support empty
> directories.

Please, no.  On a project with a large number of directories
operations like git-write-tree would take a longer time to scan the
index and generate the new trees.  I unfortunately work on such
projects as it's common for Java applications to be very deeply
nested and large projects have a *lot* of directories.

-- 
Shawn.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-14 19:05 Notes on supporting Git operations in/on partial Working Directories A Large Angry SCM
  2006-09-14 19:21 ` Shawn Pearce
@ 2006-09-14 19:50 ` Junio C Hamano
  2006-09-14 20:19   ` A Large Angry SCM
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2006-09-14 19:50 UTC (permalink / raw)
  To: gitzilla; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> writes:

> The minimum required changes[*1*][*2*][*3*] to the index file to support
> partial checkouts are:
>
> 1) the addition of WD_Prefix string to hold the common path prefix of
> all objects in the working directory. For a full checkout, the WD_Prefix
> string would be empty.
>
> 2) A (new) flag for each entry in the index indicating whether or not
> the object is in the partial checkout.
>
> The contents of the index file still reflect the full tree but flag each
> object (file or symlink) separately as part of the checkout or not. The
> WD_Prefix string is so that a partial checkout consisting of only
> objects somewhere in the a/b/c/d/ tree can be found in the working
> directory without the a/b/c/d/ prefix to the path of the object.
>
> All the Git commands that use the index file will need to be changed to
> support this but the transfer protocols do not need to change.

While this may be a good start, you need a lot more than this if
you want to do (1) and (2):

The tree object contained by a commit is by definition a full
tree snapshot, so if you want to do a WD_Prefix, you somehow
need a way to come up with the final tree that is a combination
of what write-tree would write out from such a partial index
(i.e. an index that describes only a subdirectory) and the rest
of the tree from the current HEAD.  I think you can more or less
do this change to Porcelain using today's git core.  The
sequence to emulate it with today's git would be:

 (1) write-tree (of the WD_Prefix part of the subtree),
 (2) read-tree HEAD (to populate the index fully),
 (3) piping a massaged output from git-ls-files WD_Prefix to
     update-index --index-info, followed by read-tree
     --prefix=WD_Prefix to swap the partial tree in to
     WD_Prefix,
 (4) write-tree (to get the final result).
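
The effect of steps (2) through (4) can be modelled on a flat index. This
is a sketch in Python, not the plumbing itself, and it assumes that step (3)
amounts to dropping the old entries under the prefix and grafting the
partial tree in their place. The function combine_with_head and the sample
paths are hypothetical.

```python
from typing import Dict

Index = Dict[str, str]  # path -> blob id (placeholder strings)


def combine_with_head(head: Index, partial: Index, wd_prefix: str) -> Index:
    """Build the full index that the final write-tree would see.

    `partial` holds paths relative to `wd_prefix`, which must end in "/".
    """
    full = dict(head)  # like step (2): populate the index fully from HEAD
    for path in [p for p in full if p.startswith(wd_prefix)]:
        del full[path]  # step (3): drop the old entries under the prefix
    for path, blob in partial.items():
        full[wd_prefix + path] = blob  # step (3): graft the partial tree in
    return full  # what step (4) would write out


head = {"Makefile": "blob-1", "a/b/old.c": "blob-2", "a/b/keep.c": "blob-3"}
partial = {"keep.c": "blob-3", "new.c": "blob-4"}  # old.c was deleted
result = combine_with_head(head, partial, "a/b/")
```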

If you want to do per-path-inside-directory checkout (your 2),
this combining step would need to be even more complex.  You can
do that by hand (reading ls-tree and ls-files and driving
update-index --index-info yourself) but it certainly would be
more involved.  But it's just a matter of Porcelain programming
;-) [*1*].

But the good news is that today's git core lets you work in a
sparsely checked out repository without any of the above
crap^Wcomplexity, if you drop the WD_Prefix and
per-path-inside-directory checkout "expectations".  Just staying
within the directory you are working in, and saying "commit ."
when you are tempted to say "commit -a", would be more or less
what is needed.

Note.  The "git checkout" Porcelain would want to check out
everything, so a tool to prepare such a sparsely checked out
tree needs to be written if somebody wants to try this, since
the above "good news" is only about "working in" a sparsely
checked out tree.

[*1*] Obviously you would also need to worry about activities
other than making your own changes and committing.  When you are
always pulling from a single upstream that never rewinds the
head, the problem becomes simpler, but for other cases (read:
anything that makes distributed version control more interesting
and useful) you would need to worry about merges too.  What
happens when the upstream you based your changes on and the
repository you are pulling from today had conflicting changes
outside of your area of interest?  Without resolving the
conflicts, you cannot sanely claim you merged the two branches,
and even if you wanted to resolve them yourself, a non-empty
WD_Prefix would get in your way.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-14 19:21 ` Shawn Pearce
@ 2006-09-14 20:08   ` A Large Angry SCM
  0 siblings, 0 replies; 13+ messages in thread
From: A Large Angry SCM @ 2006-09-14 20:08 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

Shawn Pearce wrote:
> A Large Angry SCM <gitzilla@gmail.com> wrote:
>> The contents of the index file still reflect the full tree but flag each
>> object (file or symlink) separately as part of the checkout or not. The
>> WD_Prefix string is so that a partial checkout consisting of only
>> objects somewhere in the a/b/c/d/ tree can be found in the working
>> directory without the a/b/c/d/ prefix to the path of the object.
> 
> Why not just load a partial index?
> 
> If we only want "a/b/c/d" subtree then only load that into the index.
> At git-write-tree time return the new root tree by loading the tree
> of the current `HEAD` commit and walking down to a/b/c/d, updating
> that with the tree from the index, then walking back updating each
> node you recursed down through.  Finally output the new root tree.
> 
> The advantage is that if you have a subtree checked out you aren't
> working with the entire massive index.

I was looking for minimal changes to the index and associated code. 
Either way works.

> But how does this let the user checkout and work on the 10 top
> level directories at once and perform an atomic commit to all
> of them, but not checkout the other 100+ top level directories?
> As I recall this was desired in the Mozilla project for example.

That's a partial working directory by my definition so it would
work. How it's specified on the command line is TBD.

It's desired by a lot of very modular projects.

>> [*3*] Possibly split the index up by directory and store the parts in
>> the working directory. An index "distributed" in this way would have
>> a "natural" cache-tree built in and (finally) be able to support empty
>> directories.
> 
> Please, no.  On a project with a large number of directories
> operations like git-write-tree would take a longer time to scan the
> index and generate the new trees.  I unfortunately work on such
> projects as it's common for Java applications to be very deeply
> nested and large projects have a *lot* of directories.

Directory trees without any changes might actually be less expensive to 
work with using the split index since you could ignore all of the 
unchanged entries easily.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-14 19:50 ` Junio C Hamano
@ 2006-09-14 20:19   ` A Large Angry SCM
  2006-09-15  2:43     ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: A Large Angry SCM @ 2006-09-14 20:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> A Large Angry SCM <gitzilla@gmail.com> writes:
...
> 
> While this may be a good start, you need a lot more than this if
> you want to do (1) and (2):
> 
> The tree object contained by a commit is by definition a full
> tree snapshot, so if you want to do a WD_Prefix, you somehow
> need a way to come up with the final tree that is a combination
> of what write-tree would write out from such a partial index
> (i.e. an index that describes only a subdirectory) and the rest
> of the tree from the current HEAD.  I think you can more or less
> do this change to Porcelain using today's git core.  The
> sequence to emulate it with today's git would be:

I think you misunderstood: the index file would list all of the tree
entries of the checked out commit, same as the current index, but
would flag the entries that are actually present in the working 
directory. The WD_Prefix is to identify which index entries _can not_ be 
part of the working directory, and where the working directory fits in 
the full index. That way, all the information needed by the top level 
write-tree is still in the index and the cache-tree extension.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-14 20:19   ` A Large Angry SCM
@ 2006-09-15  2:43     ` Junio C Hamano
  2006-09-15 18:15       ` A Large Angry SCM
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2006-09-15  2:43 UTC (permalink / raw)
  To: gitzilla; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> writes:

> Junio C Hamano wrote:
>> A Large Angry SCM <gitzilla@gmail.com> writes:
> ...
>>
>> While this may be a good start, you need a lot more than this if
>> you want to do (1) and (2):
>>
>> The tree object contained by a commit is by definition a full
>> tree snapshot, so if you want to do a WD_Prefix, you somehow
>> need a way to come up with the final tree that is a combination
>> of what write-tree would write out from such a partial index
>> (i.e. an index that describes only a subdirectory) and the rest
>> of the tree from the current HEAD.  I think you can more or less
>> do this change to Porcelain using today's git core.  The
>> sequence to emulate it with today's git would be:
>
> I think you misunderstood: the index file would list all of the tree
> entries of the checked out commit, same as the current index, but
> would flag the entries that are actually present in the working
> directory. The WD_Prefix is to identify which index entries _can not_
> be part of the working directory, and where the working directory fits
> in the full index. That way, all the information needed by the top
> level write-tree is still in the index and the cache-tree extension.

Ah indeed.  That makes it more palatable ;-).

Having said that, I do not necessarily agree that highly modular
projects would want to put everything in one git repository and
track everything as a whole unit.

The primary audience of git, the kernel project, is reasonably
modular (although Andrew seems to be suffering from subsystem
maintainers touching overlapping areas these days and says it is
rather unusual), and is a non-trivial size, yet it has
everything under one umbrella.  The model makes sense in that
project, since the core developers need to occasionally change
an internal API wholesale across the tree.  The people at fringe
who work only on limited part of the system (e.g. one particular
filesystem implementation), on the other hand, may not care what
happens in the other parts (e.g. random device drivers that
should not interact directly with the filesystem implementation
in question) of the system most of the time, but they do have to
care if the layer closer to the core that their work depends on
changes (e.g. a VFS layer update changes the rule filesystems
must play under), so having to check out the full kernel tree
while they usually work only on one part of it is often cumbersome
but sometimes absolutely necessary, so it is tolerated.

Everybody is forced to work on the same codebase and merge the
whole tree as a unit, which might inconvenience the people
really at the fringe (e.g. driver writers), but being able to
make sure everything is in sync is a good thing to the core
developers, and that benefit outweighs the convenience of fringe
people (also the core people are the ones who get to pick the tool
they use ;-).

In the kernel case, out-of-tree driver people have a choice to
build out of tree against just kernel headers as modules.
Nobody (including git) gives mechanical support to enforce that
this version of the out-of-tree driver must be used only with
such and such main tree, but the build procedure and INSTALL
documents of such a driver usually take care of those integration
issues.

I suspect most highly modular projects are run that way, not
just from the version control point of view, but simply because
of people interaction issues.  Nobody can be on top of all
possible interface details between many modular pieces of a
truly huge project, so there would be clean separation of parts
and narrow definition of how they mesh together (after all, that
is what being highly modular is all about).  And in such a case,
subsystems can be (and I'd even claim they had better be)
version controlled more or less independently of each other,
with certain version dependencies, such as "libfoo subsystem is
used by all of our programs A, B, ..., Z, but recent libfoo 1.29
release added some feature to support enhancement in version 2.4
of program Z.  So libfoo 1.29 or later is required if you are
building the latest tip of program Z, but everybody else can
stay at 1.28 if updating libfoo is not convenient, oh by the way
1.30 has a thinko that broke what is used heavily only by
program A, so if you are working on program A use libfoo 1.30 or
later, or stay at libfoo 1.28."  And there would be tons of tiny
commits between these point releases.

My point is that while there will always be _some_ version
synchronization requirements between subcomponents of such a
huge highly modular project, it is a lot looser than tracking
each and every change in the entire tree as a whole, like git's
commit does.  The model of throwing all subcomponents in a
single repository and trying to track everything as a whole may
not match the real requirement of such a project.

In other words, when somebody adds a line in a file in a tiny
corner of libfoo to fix a typo in the comment and makes a
commit, that should not have to necessarily mean the version
number of the project as a whole needs to be bumped up.  It is
my understanding that people who house a collection of related
projects in Subversion get this wrong, because Subversion makes
it too easy to propagate the revision number increment up to the
root level when you update something in a subtree.  It may be a
cheap operation from the storage point of view (incrementing the
revision number stored in a few tree nodes near the root), but
it does not change the fact that it changes the revision of the
whole project and affects other parts of the project that are not
affected by the particular change at all (it is not Subversion's
fault, but more of a fault of people who put everything in one
repository).  You could manage the versions that way, but you do
not have to.  And if it gets in the way of things you would want
to do, maybe you shouldn't.

I think what truly huge but highly modular projects need is
good support for laying out checkouts from multiple subprojects,
each of which is managed in its own repository but has loose
(looser than the level of individual commits) version
dependency.  That would need to solve three issues: (1) the
right versions from many repositories need to be checked out in
correct locations for a build, (2) after building and testing to
make sure they work together as a whole, these specific versions
from the subcomponent repositories need to be tagged to mark a
release, and (3) maybe a single large tarball that contains all
subprojects' checkout can be made easily.
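
Issues (1) and (2) above could be driven by a small manifest that pins each
subproject to a version. The following Python sketch is only one
hypothetical shape for such bookkeeping; Component, checkout_plan,
release_tags, the repository locations and the release name are all
invented for illustration and are not part of git or any Porcelain.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Component:
    path: str     # where the subproject is checked out in the working tree
    repo: str     # hypothetical repository location
    version: str  # pinned version (tag or commit name)


def checkout_plan(manifest: List[Component]) -> List[Tuple[str, str, str]]:
    """Issue (1): which version of which repository goes where."""
    paths = [c.path for c in manifest]
    if len(set(paths)) != len(paths):
        raise ValueError("two components share a checkout path")
    return sorted((c.path, c.repo, c.version) for c in manifest)


def release_tags(manifest: List[Component], release: str) -> Dict[str, Tuple[str, str]]:
    """Issue (2): the release tag to place on each component's pinned version."""
    return {c.repo: (release, c.version) for c in manifest}


manifest = [
    Component("lib/foo", "example.org/libfoo.git", "v1.29"),
    Component("apps/z", "example.org/z.git", "v2.4"),
]
plan = checkout_plan(manifest)
tags = release_tags(manifest, "suite-2006.09")
```

The version dependency lives in the manifest, so a typo fix deep inside
libfoo does not change anything at the level of the combined project until
somebody decides to move the pin.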

So the issue may not be partial repository support, but support
for managing multiple projects.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-15  2:43     ` Junio C Hamano
@ 2006-09-15 18:15       ` A Large Angry SCM
  2006-09-17 10:43         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: A Large Angry SCM @ 2006-09-15 18:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
...
> 
> Having said that, I do not necessarily agree that highly modular
> projects would want to put everything in one git repository and
> track everything as a whole unit.

And yet that's exactly how a lot of developers use CVS. You can argue 
that some other way is better but when they move from CVS they're 
looking for continuity of productivity which often means not radically 
changing how they work. At least in the short term.

> The primary audience of git, the kernel project, is reasonably
> modular (although Andrew seems to be suffering from subsystem

I no longer believe that the Linux kernel developers are the "primary 
audience". They are certainly an important and influential set of Git 
users but there are also a lot of non kernel projects using Git. If not 
now, there will soon be more non kernel Git users than kernel Git users.

[Nice description of how to work with the Linux kernel code base.]

[Nice description of one way a hypothetical project with dependencies on 
libraries under active development could work.]

> I think what truly huge but highly modular projects need is
> good support for laying out checkouts from multiple subprojects,
> each of which is managed in its own repository but has loose
> (looser than the level of individual commits) version
> dependency.  That would need to solve three issues: (1) the
> right versions from many repositories need to be checked out in
> correct locations for a build, (2) after building and testing to
> make sure they work together as a whole, these specific versions
> from the subcomponent repositories need to be tagged to mark a
> release, and (3) maybe a single large tarball that contains all
> subprojects' checkout can be made easily.
 >
> So the issue may not be partial repository support, but support
> for managing multiple projects.

There's no question that that may be better for some projects. But I 
believe that the project members (or owners) should decide how they use 
their tools.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-15 18:15       ` A Large Angry SCM
@ 2006-09-17 10:43         ` Junio C Hamano
  2006-09-17 18:47           ` A Large Angry SCM
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2006-09-17 10:43 UTC (permalink / raw)
  To: gitzilla; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> writes:

> Junio C Hamano wrote:
> ...
>>
>> Having said that, I do not necessarily agree that highly modular
>> projects would want to put everything in one git repository and
>> track everything as a whole unit.
>
> And yet that's exactly how a lot of developers use CVS. You can argue
> that some other way is better but when they move from CVS they're
> looking for continuity of productivity which often means not radically
> changing how they work. At least in the short term.

(Note. In the next paragraph, I used (for want of better
       wording) the word "unrelated" to mean "indeed related,
       but without need to be in sync at whole-tree commit
       level, and insisting on being in sync at that level
       probably has more disadvantages than advantages".  Think
       of it as referring to the relationship among more-or-less
       independent project subcomponents in my earlier example).

Well, the fact is, it does not make any difference to CVS if you
put unrelated projects in a single "repository", because CVS
does not even have the concept of managing a project as a whole.
Except perhaps that you can give the same tag to individual
files and treat the set of the revs of the files that have that
particular tag as a revision of a project.  And even that is
just a kludge -- if you throw a totally unrelated project into
such a "repository" and give files a tag that happens to be the
same name as the one used in another project, the tool happily
lets you do so and checking out a revision by the tag will pull
files from totally unrelated projects together.  We happen to
use the words "repository" and "revision" in git but what they
mean is quite different because we are more whole-tree oriented.
It misses the point to compare CVS and git and say CVS allows
placing unrelated things in the same "repository".  CVS does not
even track the whole tree state, so it does not hurt the user
nor the tool even if you did so.  With a tool that tracks the
whole tree, you need a bit of thinking and planning.

I do not think, by the way, we are aiming at much different
things.  We both know that our current tools do not support
either mode of operation; the direction you are coming from
where everything is under one roof and only parts are accessed
while others are left untouched, or an organization where
loosely-related projects from different repositories are checked
out into a working tree hierarchy.  You alluded to "split index"
in another message, and the project organization I suggested to
keep component projects in their own separate repositories would
also have separate indices in component repositories.

I am certainly not opposed to the idea of making operations in
such a tree (built either way) go seamlessly for the users.  A
Porcelain that supports such mode of operation needs to be built
or enhanced, because we do not have one.

What I am saying is that I suspect the everything-under-one-roof
approach would incur higher damage to the core than the multiple
repositories approach.  It is my understanding that Cogito has
such a light-weight subproject support that lets you have
separate repositories laid out in a single working tree.

For example, users of such a Porcelain most likely would not
worry about what is stored in .git/index and .git/remotes/
directories of individual repositories that appear in such a
single working tree.  The Porcelain would keep track of which
files are locally modified, what components are checked out in
which subdirectory, where their upstream repositories are, and
things like that.  It will use .git/index and .git/remotes/ of
component repositories to implement the unified tree, but the
use of the individual .git/ directories is its implementation
detail.  It is likely to do its own bookkeeping that is beyond
what the current core offers, but I do not think it needs much
additional core support.  If such bookkeeping turns out to be
useful and necessary to have it in the core for whatever reason
(either performance or interoperability across Porcelains), we
could certainly talk about putting that into the core.

I suspect that the everything-under-one-roof approach is coming from
an observation that:

 - with CVS, projects that share the same cvsroot can be updated
   with single 'cvs update' command in a directory close to the
   root.

 - with git, if you use multiple repositories checked out at
   right places in the working tree hierarchy, you need to run
   around and say "git checkout" or "git commit" everywhere.

and the latter looks very inconvenient.

But of course the latter is very inconvenient.  Neither the current
"git checkout" nor "git commit" is a subprojects-aware Porcelain
command.  But that does not mean you have to house everything in
the same repository and make partial check-in work.  You will be
enhancing or replacing the same "git checkout" and "git commit"
commands to do so anyway.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-17 10:43         ` Junio C Hamano
@ 2006-09-17 18:47           ` A Large Angry SCM
  2006-09-17 18:55             ` Jakub Narebski
  0 siblings, 1 reply; 13+ messages in thread
From: A Large Angry SCM @ 2006-09-17 18:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> A Large Angry SCM <gitzilla@gmail.com> writes:
> 
>> Junio C Hamano wrote:
>> ...
>>> Having said that, I do not necessarily agree that highly modular
>>> projects would want to put everything in one git repository and
>>> track everything as a whole unit.
>> And yet that's exactly how a lot of developers use CVS. You can argue
>> that some other way is better but when they move from CVS they're
>> looking for continuity of productivity which often means not radically
>> changing how they work. At least in the short term.
> 
[...]

> I suspect that the everything-under-one-roof approach is coming from
> an observation that:
> 
>  - with CVS, projects that share the same cvsroot can be updated
>    with single 'cvs update' command in a directory close to the
>    root.
> 
>  - with git, if you use multiple repositories checked out at
>    right places in the working tree hierarchy, you need to run
>    around and say "git checkout" or "git commit" everywhere.
> 
> and the latter looks very inconvenient.
> 
> But of course the latter is very inconvenient.  Neither the current
> "git checkout" nor "git commit" is a subprojects-aware Porcelain
> command.  But that does not mean you have to house everything in
> the same repository and make partial check-in work.  You will be
> enhancing or replacing the same "git checkout" and "git commit"
> commands to do so anyway.

I used CVS as an example but I've seen the "everything-under-one-roof" 
approach, as you put it, used in other VCSs that do work with
changesets. One instance, in particular, has all the source and config 
files in a single tree that (I'm told) would take several Gigs of 
filesystem space to checkout fully. The codebase is modular to a great 
extent but any particular project in it usually required files from a 
large number of other projects. There is a lot of automation to find the 
required parts for builds and other actions on a project's codebase.

Could this be done with multiple repositories? Yes, but we're talking 
hundreds (no exaggeration) and many of those would likely end up
including a large percentage of the other repositories the way the Git 
repository includes the Gitk repository. It could work but their 
existing approach already works and is likely better suited for their 
codebase. Git can not, currently, do all the things that this
organization wants a VCS to do; working with partial checkouts is a key one.

There is no fundamental reason Git can not support partial 
checkouts/working directories. In fact, there is no fundamental reason 
Git can not support operations on partial (sparse?) repositories in both 
space (working content/state, etc.) and time (history); it's just a 
matter of record keeping[*1*]. That isn't how the Linux kernel 
developers want to use their VCS but it _is_ how others want to use theirs.

Notes:
[*1*] I'm currently working on determining the minimum requirements 
needed to support repositories with partial or sparse history.

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-17 18:47           ` A Large Angry SCM
@ 2006-09-17 18:55             ` Jakub Narebski
  2006-09-17 20:01               ` A Large Angry SCM
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Narebski @ 2006-09-17 18:55 UTC (permalink / raw)
  To: git

A Large Angry SCM wrote:

> There is no fundamental reason Git can not support partial 
> checkouts/working directories. In fact, there is no fundamental reason 
> Git can not support operations on partial (sparse?) repositories in both 
> space (working content/state, etc.) and time (history); it's just a 
> matter of record keeping[*1*]. That isn't how the Linux kernel 
> developers want to use their VCS but it _is_ how others want to use
> theirs. 

There is perhaps not much trouble with partial checkouts, but there is
a problem with partial _commits_, at least for a snapshot-based SCM (as
opposed to a patchset-based SCM).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-17 18:55             ` Jakub Narebski
@ 2006-09-17 20:01               ` A Large Angry SCM
  2006-09-17 20:28                 ` Jakub Narebski
  0 siblings, 1 reply; 13+ messages in thread
From: A Large Angry SCM @ 2006-09-17 20:01 UTC
  To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
> A Large Angry SCM wrote:
> 
>> There is no fundamental reason Git can not support partial 
>> checkouts/working directories. In fact, there is no fundamental reason 
>> Git can not support operations on partial (sparse?) repositories in both 
>> space (working content/state, etc.) and time (history); it's just a 
>> matter of record keeping[*1*]. That isn't how the Linux kernel 
>> developers want to use their VCS but it _is_ how others want to use
>> theirs. 
> 
> There is perhaps not much trouble with partial checkouts, but there is
> a problem with partial _commits_, at least for a snapshot-based SCM (as
> opposed to a patchset-based SCM).

By "partial commit" I take it you mean a commit with only partial 
information about the new (content) state? If so, the missing 
information about the new state can be assumed to have not changed from 
the previous recorded state (commit).
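
A minimal sketch of that assumption in Python, treating a snapshot as a
plain mapping from path to content ID. The function name `commit_partial`
and the path and ID values are made up for illustration; none of this is
something Git provides.

```python
def commit_partial(previous, changes):
    """Return the full new snapshot given only partial information.

    `previous` maps every path to its content ID in the last commit.
    `changes` maps a path to its new content ID, or to None for a deletion.
    Paths absent from `changes` are assumed unchanged and carried over.
    """
    snapshot = dict(previous)  # start from the previously recorded state
    for path, content_id in changes.items():
        if content_id is None:
            snapshot.pop(path, None)  # path removed in the new state
        else:
            snapshot[path] = content_id  # path added or modified
    return snapshot


# Hypothetical previous commit covering three paths.
previous = {"a/b/file1": "id1", "a/c/file2": "id2", "x/y/file3": "id3"}

# The partial commit only knows about paths under a/.
new = commit_partial(previous, {"a/b/file1": "id1-new", "a/c/file2": None})
print(new)
```

Only the paths the partial commit mentions change; x/y/file3 is carried
over from the previous snapshot without being looked at.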

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-17 20:01               ` A Large Angry SCM
@ 2006-09-17 20:28                 ` Jakub Narebski
  2006-09-17 21:11                   ` A Large Angry SCM
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Narebski @ 2006-09-17 20:28 UTC
  To: git

A Large Angry SCM wrote:
> Jakub Narebski wrote:
>> A Large Angry SCM wrote:
>> 
>>> There is no fundamental reason Git can not support partial 
>>> checkouts/working directories. In fact, there is no fundamental reason 
>>> Git can not support operations on partial (sparse?) repositories in both 
>>> space (working content/state, etc.) and time (history); it's just a 
>>> matter of record keeping[*1*]. That isn't how the Linux kernel 
>>> developers want to use their VCS but it _is_ how others want to use
>>> theirs. 
>> 
>> There is perhaps not much trouble with partial checkouts, but there is
>> a problem with partial _commits_, at least for a snapshot-based SCM
>> (as opposed to a patchset-based SCM).
> 
> By "partial commit" I take it you mean a commit with only partial 
> information about the new (content) state? If so, the missing 
> information about the new state can be assumed to have not changed from 
> the previous recorded state (commit).

That of course assumes that 1) the whole state is recorded somewhere
(perhaps in the repository), so the partial checkout saves space only if
the repository compresses really well, and 2) there are no merges outside
the checked-out part.

Is anybody working on the "bind" header and subproject support?
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: Notes on supporting Git operations in/on partial Working Directories
  2006-09-17 20:28                 ` Jakub Narebski
@ 2006-09-17 21:11                   ` A Large Angry SCM
  0 siblings, 0 replies; 13+ messages in thread
From: A Large Angry SCM @ 2006-09-17 21:11 UTC
  To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
> A Large Angry SCM wrote:
>> Jakub Narebski wrote:
>>> A Large Angry SCM wrote:
>>>
>>>> There is no fundamental reason Git can not support partial 
>>>> checkouts/working directories. In fact, there is no fundamental reason 
>>>> Git can not support operations on partial (sparse?) repositories in both 
>>>> space (working content/state, etc.) and time (history); it's just a 
>>>> matter of record keeping[*1*]. That isn't how the Linux kernel 
>>>> developers want to use their VCS but it _is_ how others want to use
>>>> theirs. 
>>> There is perhaps not much trouble with partial checkouts, but there is
>>> a problem with partial _commits_, at least for a snapshot-based SCM
>>> (as opposed to a patchset-based SCM).
>> By "partial commit" I take it you mean a commit with only partial 
>> information about the new (content) state? If so, the missing 
>> information about the new state can be assumed to have not changed from 
>> the previous recorded state (commit).
> 
> That of course assumes that 1) the whole state is recorded somewhere
> (perhaps in the repository), so the partial checkout saves space only if
> the repository compresses really well, and 2) there are no merges outside
> the checked-out part.

1) The TREE objects leading to the added/deleted/changed objects are
required. TREEs not leading to the added/deleted/changed objects are not
required, only their IDs. That is sufficient to commit the changes in a
partial checkout.

2) Obviously, only the part checked out can be worked on. If you want to
merge changes to some other part, you will need that part, and possibly
a merge base.
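
Point 1 can be sketched roughly as follows, in Python. This is a toy
model under stated assumptions, not Git's real object format: a tree is a
mapping from name to (kind, object ID), IDs are SHA-1 hashes of a JSON
dump, and `tree_id`, `put_tree` and `set_blob` are hypothetical helpers.
Blob IDs such as "b1" are placeholders. Only changing an existing path is
shown; additions and deletions would walk the same trees.

```python
import hashlib
import json


def tree_id(entries):
    """Hash a tree's entries. A simplified stand-in, not Git's tree encoding."""
    data = json.dumps(sorted(entries.items())).encode()
    return hashlib.sha1(data).hexdigest()


def put_tree(store, entries):
    """Store a tree (name -> (kind, object ID)) and return its ID."""
    oid = tree_id(entries)
    store[oid] = dict(entries)
    return oid


def set_blob(store, root_id, path, blob_id):
    """Return the ID of a new root tree in which `path` points at `blob_id`.

    Only the trees along `path` are read from `store`. Sibling trees are
    carried over by ID and never looked up, so they may be absent.
    """
    name, _, rest = path.partition("/")
    entries = dict(store[root_id])  # KeyError if a needed tree is missing
    if rest:
        kind, child_id = entries[name]
        if kind != "tree":
            raise ValueError(name + " is not a tree")
        entries[name] = ("tree", set_blob(store, child_id, rest, blob_id))
    else:
        entries[name] = ("blob", blob_id)
    return put_tree(store, entries)


# Build a small full repository: docs/readme and src/main.c.
full = {}
docs = put_tree(full, {"readme": ("blob", "b1")})
src = put_tree(full, {"main.c": ("blob", "b2")})
root = put_tree(full, {"docs": ("tree", docs), "src": ("tree", src)})

# Partial store: only the root and src/ trees; docs/ is known by ID alone.
partial = {root: full[root], src: full[src]}

# Commit a change to src/main.c without ever having the docs/ tree.
new_root = set_blob(partial, root, "src/main.c", "b3")
print(new_root != root)
print(partial[new_root]["docs"] == ("tree", docs))
```

The new root tree still refers to docs/ by its old ID, even though the
docs/ tree itself was never present in the partial store.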

Thread overview: 13+ messages
2006-09-14 19:05 Notes on supporting Git operations in/on partial Working Directories A Large Angry SCM
2006-09-14 19:21 ` Shawn Pearce
2006-09-14 20:08   ` A Large Angry SCM
2006-09-14 19:50 ` Junio C Hamano
2006-09-14 20:19   ` A Large Angry SCM
2006-09-15  2:43     ` Junio C Hamano
2006-09-15 18:15       ` A Large Angry SCM
2006-09-17 10:43         ` Junio C Hamano
2006-09-17 18:47           ` A Large Angry SCM
2006-09-17 18:55             ` Jakub Narebski
2006-09-17 20:01               ` A Large Angry SCM
2006-09-17 20:28                 ` Jakub Narebski
2006-09-17 21:11                   ` A Large Angry SCM
