* Notes on supporting Git operations in/on partial Working Directories
From: A Large Angry SCM @ 2006-09-14 19:05 UTC
To: git

Notes on supporting Git operations in/on partial Working Directories
====================================================================

Motivation
----------

Be able to check out only part of a tree, do some work, and commit the
changes.

Support for partial working directories is also (almost) a requirement
for supporting partial repositories.

Expectations
------------

All Git commands that currently work with the index or the working
directory will work with indexes or working directories that are
partial checkouts.

Leading directories common to all objects of a partial checkout are not
present in the working directory.

The contents of a partial working directory can be determined on a
path-by-path basis; entire directories are not required.

Implementation Sketch
---------------------

The minimum required changes[*1*][*2*][*3*] to the index file to support
partial checkouts are:

1) The addition of a WD_Prefix string to hold the common path prefix of
all objects in the working directory. For a full checkout, the WD_Prefix
string would be empty.

2) A (new) flag for each entry in the index indicating whether or not
the object is in the partial checkout.

The contents of the index file still reflect the full tree, but each
object (file or symlink) is flagged separately as part of the checkout
or not. The WD_Prefix string allows a partial checkout consisting only
of objects somewhere in the a/b/c/d/ tree to be found in the working
directory without the a/b/c/d/ prefix on the path of each object.

All the Git commands that use the index file will need to be changed to
support this, but the transfer protocols do not need to change.
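A minimal sketch of the two proposed index changes, written in Python purely for illustration. `PartialIndex`, `IndexEntry`, `in_checkout` and the helper methods are hypothetical names, not Git's on-disk index format or code. The sketch assumes the WD_Prefix is stored with a trailing slash, and that a top-level write-tree keeps seeing every entry regardless of its flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class IndexEntry:
    path: str          # full path from the project root, as in today's index
    blob_id: str       # object name recorded for this path
    in_checkout: bool  # proposed flag: is this entry present in the working directory?


@dataclass
class PartialIndex:
    wd_prefix: str                      # common leading directory; "" for a full checkout
    entries: List[IndexEntry] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep a trailing slash so that prefix "a/b" never matches "a/bc/...".
        if self.wd_prefix and not self.wd_prefix.endswith("/"):
            self.wd_prefix += "/"

    def wd_path(self, entry: IndexEntry) -> Optional[str]:
        """Path of `entry` relative to the working directory, or None when the
        entry lies outside WD_Prefix and so can never be checked out here."""
        if not entry.path.startswith(self.wd_prefix):
            return None
        return entry.path[len(self.wd_prefix):]

    def checked_out(self) -> Dict[str, IndexEntry]:
        """Working-directory path -> entry, for entries flagged as present."""
        result = {}
        for entry in self.entries:
            rel = self.wd_path(entry)
            if entry.in_checkout and rel is not None:
                result[rel] = entry
        return result

    def tree_entries(self) -> List[Tuple[str, str]]:
        """What a top-level write-tree would record: every entry, flagged or not."""
        return sorted((e.path, e.blob_id) for e in self.entries)


idx = PartialIndex("a/b/c/d", [
    IndexEntry("a/b/c/d/x.c", "1111", True),
    IndexEntry("a/b/c/d/y.c", "2222", False),
    IndexEntry("README", "3333", False),
])
print(sorted(idx.checked_out()))   # ['x.c']
print(len(idx.tree_entries()))     # 3
```

Here a/b/c/d/x.c shows up in the working directory as plain x.c, while README can never be checked out under this WD_Prefix but is still part of the tree a top-level write-tree would produce.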
Notes
-----

[*1*] As long as the index file structure is being changed, it may be
worthwhile to also include the ideas in:

    http://www.gelato.unsw.edu.au/archives/git/0601/15471.html
    http://www.gelato.unsw.edu.au/archives/git/0601/15483.html
    http://www.gelato.unsw.edu.au/archives/git/0601/15484.html

except for the "bind" parts, since I still think that is a bad idea.

[*2*] The index "TREE" (cache-tree) extension should also become a
required part of the index.

[*3*] Possibly split the index up by directory and store the parts in
the working directory. An index "distributed" in this way would have a
"natural" cache-tree built in and (finally) be able to support empty
directories.
* Re: Notes on supporting Git operations in/on partial Working Directories
From: Shawn Pearce @ 2006-09-14 19:21 UTC
To: A Large Angry SCM; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> wrote:
> The contents of the index file still reflect the full tree but flag each
> object (file or symlink) separately as part of the checkout or not. The
> WD_Prefix string is so that a partial checkout consisting of only
> objects somewhere in the a/b/c/d/ tree can be found in the working
> directory without the a/b/c/d/ prefix to the path of the object.

Why not just load a partial index?

If we only want the "a/b/c/d" subtree, then only load that into the
index. At git-write-tree time, return the new root tree by loading the
tree of the current `HEAD` commit and walking down to a/b/c/d, updating
that with the tree from the index, then walking back up, updating each
node you recursed down through. Finally, output the new root tree.

The advantage is that if you have a subtree checked out, you aren't
working with the entire massive index.

But how does this let the user check out and work on 10 top-level
directories at once and perform an atomic commit to all of them,
without checking out the other 100+ top-level directories? As I recall,
this was desired in the Mozilla project, for example.

> [*3*] Possibly split the index up by directory and store the parts in
> the working directory. An index "distributed" in this way would have
> a "natural" cache-tree built in and (finally) be able support empty
> directories.

Please, no. On a project with a large number of directories, operations
like git-write-tree would take longer to scan the index and generate
the new trees.
I unfortunately work on such projects, as it's common for Java
applications to be very deeply nested, and large projects have a *lot*
of directories.

-- 
Shawn.
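Shawn's write-tree step (walk down HEAD's tree to a/b/c/d, substitute the subtree written from the partial index, then rebuild each tree on the way back up) could look roughly like this. Trees are modelled as nested Python dicts, and `tree_id` is a toy SHA-1 over a made-up serialization, not Git's real tree-object encoding; both function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Union

Tree = Dict[str, Union[str, "Tree"]]   # name -> blob id (str) or subtree (dict)


def tree_id(tree: Tree) -> str:
    """Toy content hash of a nested tree (NOT Git's real tree-object encoding)."""
    lines = []
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            lines.append(f"tree {tree_id(child)} {name}")
        else:
            lines.append(f"blob {child} {name}")
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()


def splice_subtree(head_root: Tree, prefix: str, new_subtree: Tree) -> Tree:
    """Return a new root: HEAD's tree with the directory at `prefix` replaced."""
    parts = [p for p in prefix.split("/") if p]
    return _splice(head_root, parts, new_subtree)


def _splice(tree: Tree, parts: List[str], new_subtree: Tree) -> Tree:
    if not parts:                      # reached a/b/c/d: use the partial index's tree
        return new_subtree
    name, rest = parts[0], parts[1:]
    child = tree.get(name, {})         # a missing directory is treated as empty
    if not isinstance(child, dict):
        raise ValueError(f"{name!r} is a file, not a directory")
    updated = dict(tree)               # copy only this node; siblings are shared
    updated[name] = _splice(child, rest, new_subtree)   # rebuild on the way back up
    return updated


head = {"README": "r1",
        "a": {"b": {"c": {"d": {"x.c": "x1"}}}, "z": {"q.c": "q1"}}}
new_root = splice_subtree(head, "a/b/c/d", {"x.c": "x2"})
print(tree_id(new_root) != tree_id(head))   # True
```

Only the dicts along the prefix are copied, so untouched siblings such as a/z are shared with HEAD's tree; in Git terms they would simply keep their existing tree object names.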
* Re: Notes on supporting Git operations in/on partial Working Directories
From: A Large Angry SCM @ 2006-09-14 20:08 UTC
To: Shawn Pearce; +Cc: git

Shawn Pearce wrote:
> A Large Angry SCM <gitzilla@gmail.com> wrote:
>> The contents of the index file still reflect the full tree but flag each
>> object (file or symlink) separately as part of the checkout or not. The
>> WD_Prefix string is so that a partial checkout consisting of only
>> objects somewhere in the a/b/c/d/ tree can be found in the working
>> directory without the a/b/c/d/ prefix to the path of the object.
>
> Why not just load a partial index?
>
> If we only want "a/b/c/d" subtree then only load that into the index.
> At git-write-tree time return the new root tree by loading the tree
> of the current `HEAD` commit and walking down to a/b/c/d, updating
> that with the tree from the index, then walking back updating each
> node you recursed down through. Finally output the new root tree.
>
> The advantage is that if you have a subtree checked out you aren't
> working with the entire massive index.

I was looking for minimal changes to the index and associated code.
Either way works.

> But how does this let the user checkout and work on the 10 top
> level directories at once and perform an atomic commit to all
> of them, but not checkout the other 100+ top level directories?
> As I recall this was desired in the Mozilla project for example.

That's a partial working directory by my definition, so it would work.
How it's specified on the command line is TBD. It's desired by a lot of
very modular projects.

>> [*3*] Possibly split the index up by directory and store the parts in
>> the working directory. An index "distributed" in this way would have
>> a "natural" cache-tree built in and (finally) be able support empty
>> directories.
>
> Please, no.
> On a project with a large number of directories
> operations like git-write-tree would take a longer time to scan the
> index and generate the new trees. I unfortunately work on such
> projects as its common for Java applications to be very deeply
> nested and large projects have a *lot* of directories.

Directory trees without any changes might actually be less expensive to
work with using the split index, since you could easily ignore all of
the unchanged entries.
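To illustrate the claim that unchanged directories are cheap with a split index, here is a toy in-memory model: each directory's part of the index caches the tree id it last produced, and changing a file invalidates only that directory and its ancestors, much like the cache-tree extension does. `SplitIndex` and its `hashed` counter are hypothetical, and the model deliberately says nothing about Shawn's concern, the cost of scanning many per-directory index files on disk.

```python
import hashlib
from typing import Dict, List


class SplitIndex:
    """Toy index split per directory; each part caches the tree id it last produced."""

    def __init__(self) -> None:
        self.files: Dict[str, Dict[str, str]] = {}   # dir ("" = root) -> {name: blob id}
        self.cached: Dict[str, str] = {}             # dir -> cached tree id
        self.hashed = 0                              # directories (re)hashed so far

    def add(self, path: str, blob_id: str) -> None:
        directory, _, name = path.rpartition("/")
        self.files.setdefault(directory, {})[name] = blob_id
        d = directory
        while d:                                     # make sure every ancestor exists
            d = d.rpartition("/")[0]
            self.files.setdefault(d, {})
        self._invalidate(directory)

    def _invalidate(self, directory: str) -> None:
        """Drop cached ids for `directory` and all of its ancestors."""
        d = directory
        while True:
            self.cached.pop(d, None)
            if not d:
                break
            d = d.rpartition("/")[0]

    def _subdirs(self, directory: str) -> List[str]:
        return sorted(d for d in self.files
                      if d and d.rpartition("/")[0] == directory)

    def tree_id(self, directory: str = "") -> str:
        if directory in self.cached:                 # unchanged part: no work at all
            return self.cached[directory]
        self.hashed += 1
        lines = [f"blob {b} {n}"
                 for n, b in sorted(self.files.get(directory, {}).items())]
        for sub in self._subdirs(directory):
            lines.append(f"tree {self.tree_id(sub)} {sub.rpartition('/')[2]}")
        tid = hashlib.sha1("\n".join(lines).encode()).hexdigest()
        self.cached[directory] = tid
        return tid


idx = SplitIndex()
for p in ["README", "a/x.c", "b/y.c", "b/deep/z.c"]:
    idx.add(p, "v1")
first = idx.tree_id()      # hashes all 4 directories
idx.hashed = 0
idx.add("a/x.c", "v2")
second = idx.tree_id()     # only "a" and the root are rehashed
print(idx.hashed)          # 2
```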
* Re: Notes on supporting Git operations in/on partial Working Directories
From: Junio C Hamano @ 2006-09-14 19:50 UTC
To: gitzilla; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> writes:

> The minimum required changes[*1*][*2*][*3*] to the index file to support
> partial checkouts are:
>
> 1) the addition of WD_Prefix string to hold the common path prefix of
> all objects in the working directory. For a full checkout, the WD_Prefix
> string would be empty.
>
> 2) A (new) flag for each entry in the index indicating whether or not
> the object is in the partial checkout.
>
> The contents of the index file still reflect the full tree but flag each
> object (file or symlink) separately as part of the checkout or not. The
> WD_Prefix string is so that a partial checkout consisting of only
> objects somewhere in the a/b/c/d/ tree can be found in the working
> directory without the a/b/c/d/ prefix to the path of the object.
>
> All the Git commands that use the index file will need to be changed to
> support this but the transfer protocols do not need to change.

While this may be a good start, you need a lot more than this if
you want to do (1) and (2):

The tree object contained by a commit is by definition a full
tree snapshot, so if you want to do a WD_Prefix, you somehow
need a way to come up with the final tree that is a combination
of what write-tree would write out from such a partial index
(i.e. an index that describes only a subdirectory) and the rest
of the tree from the current HEAD. I think you can more or less
make this change in Porcelain using today's git core.
The sequence to emulate it with today's git would be:

 (1) write-tree (of the WD_Prefix part of the subtree),
 (2) read-tree HEAD (to populate the index fully),
 (3) pipe a massaged output from git-ls-files WD_prefix to
     update-index --index-info, followed by
     read-tree --prefix=WD_prefix to swap the partial tree in at
     WD_prefix,
 (4) write-tree (to get the final result).

If you want to do per-path-inside-directory checkout (your 2), this
combining step would need to be even more complex. You can do that by
hand (reading ls-tree and ls-files and driving update-index
--index-info yourself), but it certainly would be more involved. But
it's just a matter of Porcelain programming ;-) [*1*].

But the good news is that today's git core lets you work in a sparsely
checked out repository without any of the above crap^Wcomplexity, if
you drop the WD_Prefix and per-path-inside-directory checkout
"expectations". Just staying within the directory you are working in,
and saying "commit ." when you are tempted to say "commit -a", would be
more or less all that is needed.

Note. The "git checkout" Porcelain would want to check out everything,
so a tool to prepare such a sparsely checked out tree needs to be
written if somebody wants to try this, since the above "good news" is
only about "working in" a sparsely checked out tree.

[*1*] Obviously you would also need to worry about activities other
than making your own changes and committing. When you are always
pulling from a single upstream that never rewinds its head, the problem
becomes simpler, but for other cases (read: anything that makes
distributed version control more interesting and useful) you would need
to worry about merges too. What happens when the upstream you based
your changes on and the repository you are pulling from today had
conflicting changes outside of your area of interest?
Without resolving the conflicts, you cannot sanely claim you merged the
two branches, and even if you wanted to resolve them yourself, a
non-empty WD_prefix would get in your way.
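Step (3) of the emulation has to drop the index entries already under WD_prefix before `git read-tree --prefix` can place the partial tree there, since read-tree refuses to overwrite entries that already exist in the index. Below is a sketch of that "massaging". It assumes the `git ls-files -s` line format (`<mode> <object> <stage>`, TAB, `<path>`) and the documented `--index-info` convention that a line with mode 0 removes the path. `removal_lines` is a hypothetical helper, and it ignores paths that ls-files would quote (real code would use `-z`).

```python
NULL_SHA1 = "0" * 40


def removal_lines(ls_files_stage_output: str, prefix: str) -> str:
    """Turn `git ls-files -s` output into `git update-index --index-info`
    input that removes every index entry under `prefix` (mode 0 = remove)."""
    prefix = prefix.rstrip("/") + "/"
    out, seen = [], set()
    for line in ls_files_stage_output.splitlines():
        if not line:
            continue
        _meta, path = line.split("\t", 1)    # "<mode> <object> <stage>", then the path
        if path.startswith(prefix) and path not in seen:
            seen.add(path)                    # one removal per path, even if the
            out.append(f"0 {NULL_SHA1}\t{path}")   # path has several conflict stages
    return "".join(entry + "\n" for entry in out)


sample = "".join([
    "100644 " + "a" * 40 + " 0\tREADME\n",
    "100644 " + "b" * 40 + " 0\tsub/dir/x.c\n",
    "100644 " + "c" * 40 + " 0\tsub/dirt.c\n",
])
# Only sub/dir/x.c is listed; sub/dirt.c does not match the "sub/dir/" prefix.
print(removal_lines(sample, "sub/dir"), end="")
```

Such output would be fed to `git update-index --index-info` before the `read-tree --prefix` and final `write-tree` steps.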
* Re: Notes on supporting Git operations in/on partial Working Directories
From: A Large Angry SCM @ 2006-09-14 20:19 UTC
To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> A Large Angry SCM <gitzilla@gmail.com> writes:
...
>
> While this may be a good start, you need a lot more than this if
> you want to do (1) and (2):
>
> The tree object contained by a commit is by definition a full
> tree snapshot, so if you want to do a WD_Prefix, you somehow
> need a way to come up with the final tree that is a combination
> of what write-tree would write out from such a partial index
> (i.e. an index that describes only a subdirectory) and the rest
> of the tree from the current HEAD. I think you can more or less
> do this change to Porcelain using today's git core. The
> sequence to emulate it with the today's git would be:

I think you misunderstood: the index file would list all of the tree
entries of the checked-out commit, same as the current index, but
would flag the entries that are actually present in the working
directory. The WD_Prefix is to identify which index entries _cannot_
be part of the working directory, and where the working directory fits
in the full index. That way, all the information needed by the
top-level write-tree is still in the index and the cache-tree
extension.
* Re: Notes on supporting Git operations in/on partial Working Directories
From: Junio C Hamano @ 2006-09-15 2:43 UTC
To: gitzilla; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> writes:

> Junio C Hamano wrote:
>> A Large Angry SCM <gitzilla@gmail.com> writes:
> ...
>>
>> While this may be a good start, you need a lot more than this if
>> you want to do (1) and (2):
>>
>> The tree object contained by a commit is by definition a full
>> tree snapshot, so if you want to do a WD_Prefix, you somehow
>> need a way to come up with the final tree that is a combination
>> of what write-tree would write out from such a partial index
>> (i.e. an index that describes only a subdirectory) and the rest
>> of the tree from the current HEAD. I think you can more or less
>> do this change to Porcelain using today's git core. The
>> sequence to emulate it with the today's git would be:
>
> I think you misunderstood, the index file would list all of the tree
> entries of the the checked out commit, same as the current index, but
> would flag the entries that are actually present in the working
> directory. The WD_Prefix is to identify which index entries _can not_
> be part of the working directory, and where the working directory fits
> in the full index. That way, all the information needed by the top
> level write-tree is still in the index and the cache-tree extension.

Ah indeed. That makes it more palatable ;-).

Having said that, I do not necessarily agree that highly modular
projects would want to put everything in one git repository and
track everything as a whole unit.
The primary audience of git, the kernel project, is reasonably modular
(although Andrew seems to be suffering from subsystem maintainers
touching overlapping areas these days, and says it is rather unusual),
and is of non-trivial size, yet it has everything under one umbrella.
The model makes sense in that project, since the core developers
occasionally need to change an internal API wholesale across the tree.

The people at the fringe who work only on a limited part of the system
(e.g. one particular filesystem implementation), on the other hand, may
not care what happens in the other parts of the system (e.g. random
device drivers that should not interact directly with the filesystem
implementation in question) most of the time, but they do have to care
if the layer closer to the core that their work depends on changes
(e.g. a VFS layer update changes the rules filesystems must play by).
So having to check out the full kernel tree, while they usually work on
only one part of it, is often cumbersome but sometimes absolutely
necessary, and is therefore tolerated. Everybody is forced to work on
the same codebase and merge the whole tree as a unit, which might
inconvenience the people really at the fringe (e.g. driver writers),
but being able to make sure everything is in sync is a good thing for
the core developers, and that benefit outweighs the inconvenience to
the fringe people (also, the core people are the ones who get to pick
the tool they use ;-).

In the kernel case, out-of-tree driver people have the choice to build
out of tree against just the kernel headers, as modules. Nobody
(including git) gives mechanical support to enforce that this version
of the out-of-tree driver must be used only with such and such main
tree, but the build procedure and INSTALL documents of such a driver
usually take care of those integration issues.

I suspect most highly modular projects are run that way, not just from
the version control point of view, but simply because of people
interaction issues.
Nobody can be on top of all the possible interface details between the
many modular pieces of a truly huge project, so there would be a clean
separation of parts and a narrow definition of how they mesh together
(after all, that is what being highly modular is all about).

And in such a case, subsystems can be (and I'd even claim they had
better be) version controlled more or less independently of each other,
with certain version dependencies, such as "the libfoo subsystem is
used by all of our programs A, B, ..., Z, but the recent libfoo 1.29
release added some feature to support an enhancement in version 2.4 of
program Z. So libfoo 1.29 or later is required if you are building the
latest tip of program Z, but everybody else can stay at 1.28 if
updating libfoo is not convenient; oh, by the way, 1.29 has a thinko
that broke what is used heavily only by program A, so if you are
working on program A, use libfoo 1.30 or later, or stay at libfoo
1.28." And there would be tons of tiny commits between these point
releases.

My point is that while there will always be _some_ version
synchronization requirements between subcomponents of such a huge,
highly modular project, they are a lot looser than tracking each and
every change in the entire tree as a whole, as git's commit does. The
model of throwing all subcomponents into a single repository and trying
to track everything as a whole may not match the real requirements of
such a project. In other words, when somebody adds a line to a file in
a tiny corner of libfoo to fix a typo in a comment and makes a commit,
that should not necessarily mean the version number of the project as
a whole needs to be bumped up.

It is my understanding that people who house a collection of related
projects in Subversion get this wrong, because Subversion makes it too
easy to propagate the revision number increment up to the root level
when you update something in a subtree.
It may be a cheap operation from the storage point of view
(incrementing the revision number stored in a few tree nodes near the
root), but it does not change the fact that it changes the revision of
the whole project and affects other parts of the project that are not
affected by the particular change at all (this is not Subversion's
fault, but rather the fault of people who put everything in one
repository). You could manage the versions that way, but you do not
have to. And if it gets in the way of things you want to do, maybe you
shouldn't.

I think what truly huge but highly modular projects need is good
support for laying out checkouts from multiple subprojects, each of
which is managed in its own repository but has a loose (looser than
the level of individual commits) version dependency. That would need
to solve three issues:

 (1) the right versions from many repositories need to be checked out
     in the correct locations for a build,
 (2) after building and testing to make sure they work together as a
     whole, these specific versions from the subcomponent repositories
     need to be tagged to mark a release, and
 (3) maybe a single large tarball that contains all subprojects'
     checkouts can be made easily.

So the issue may not be partial repository support, but support for
managing multiple projects.
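A rough sketch of the bookkeeping behind issues (1) and (2): a manifest records which version of each component is checked out where, a check enforces a loose dependency such as "program Z needs libfoo 1.29 or later", and a release record lists the exact versions built together. The manifest layout, the constraint table and all function names are hypothetical; only the minimum-version rule from the libfoo example is modelled.

```python
from typing import Dict, List, Tuple

Manifest = Dict[str, Tuple[str, str]]   # component -> (checkout path, version)

# Minimum libfoo version per program, from the libfoo example; the default is 1.28.
MIN_LIBFOO = {"Z": "1.29"}
DEFAULT_MIN_LIBFOO = "1.28"


def parse_version(v: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))   # "1.29" -> (1, 29)


def check(manifest: Manifest) -> List[str]:
    """Issue (1): are the component versions laid out for this build compatible?"""
    libfoo = parse_version(manifest["libfoo"][1])
    problems = []
    for name in sorted(manifest):
        if name == "libfoo":
            continue
        need = MIN_LIBFOO.get(name, DEFAULT_MIN_LIBFOO)
        if libfoo < parse_version(need):
            problems.append(f"{name} needs libfoo >= {need}")
    return problems


def release_record(tag: str, manifest: Manifest) -> List[str]:
    """Issue (2): which exact version of each component went into release `tag`."""
    return [f"{tag} {name} {version} {path}"
            for name, (path, version) in sorted(manifest.items())]


build = {"libfoo": ("lib/foo", "1.28"), "A": ("prog/a", "3.1"), "Z": ("prog/z", "2.4")}
print(check(build))        # ['Z needs libfoo >= 1.29']
build["libfoo"] = ("lib/foo", "1.29")
print(check(build))        # []
```

Nothing here ties component versions together at the level of individual commits; only the loose constraints and the release record are shared.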
* Re: Notes on supporting Git operations in/on partial Working Directories
From: A Large Angry SCM @ 2006-09-15 18:15 UTC
To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
...
>
> Having said that, I do not necessarily agree that highly modular
> projects would want to put everything in one git repository and
> track everything as a whole unit.

And yet that's exactly how a lot of developers use CVS. You can argue
that some other way is better, but when they move from CVS they're
looking for continuity of productivity, which often means not radically
changing how they work. At least in the short term.

> The primary audience of git, the kernel project, is reasonably
> modular (although Andrew seems to be suffering from subsystem

I no longer believe that the Linux kernel developers are the "primary
audience". They are certainly an important and influential set of Git
users, but there are also a lot of non-kernel projects using Git. If
not now, there will soon be more non-kernel Git users than kernel Git
users.

[Nice description of how to work with the Linux kernel code base.]

[Nice description of one way a hypothetical project with dependencies
on libraries under active development could work.]

> I think what truly huge but highly modular projects need is a
> good support to lay-out check-outs from multiple subprojects,
> each of which is managed in its own repository but has loose
> (looser than the level of individual commits) version
> dependency.
> That would need to solve three issues: (1) the
> right versions from many repositories need to be checked out in
> correct locations for a build, (2) after building and testing to
> make sure they work together as a whole, these specific versions
> from the subcomponent repositories need to be tagged to mark a
> release, and (3) maybe a single large tarball that contains all
> subprojects' checkout can be made easily.
>
> So the issue may not be partial repository support, but support
> for managing multiple projects.

There's no question that that may be better for some projects. But I
believe that the project members (or owners) should decide how they
use their tools.
* Re: Notes on supporting Git operations in/on partial Working Directories
From: Junio C Hamano @ 2006-09-17 10:43 UTC
To: gitzilla; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> writes:

> Junio C Hamano wrote:
> ...
>>
>> Having said that, I do not necessarily agree that highly modular
>> projects would want to put everything in one git repository and
>> track everything as a whole unit.
>
> And yet that's exactly how a lot of developers use CVS. You can argue
> that some other way is better but when they move from CVS they're
> looking for continuity of productivity which often means not radically
> changing how they work. At least in the short term.

(Note. In the next paragraph, I use (for want of better wording) the
word "unrelated" to mean "indeed related, but without a need to be in
sync at the whole-tree commit level, where insisting on being in sync
at that level probably has more disadvantages than advantages". Think
of it as referring to the relationship among the more-or-less
independent project subcomponents in my earlier example.)

Well, the fact is, it does not make any difference to CVS if you put
unrelated projects in a single "repository", because CVS does not even
have the concept of managing a project as a whole, except perhaps that
you can give the same tag to individual files and treat the set of
revisions of the files that carry that particular tag as a revision of
a project. And even that is just a kludge -- if you throw a totally
unrelated project into such a "repository" and give its files a tag
that happens to have the same name as one used in another project, the
tool happily lets you do so, and checking out a revision by that tag
will pull files from totally unrelated projects together.
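The CVS tag kludge can be shown with a toy store in which every file carries its own tag table, so a "revision of the project" is only whatever set of files happens to share a tag name. `FileTagStore` is hypothetical and models nothing else about CVS.

```python
from collections import defaultdict
from typing import Dict


class FileTagStore:
    """Toy CVS-like store: tags live on individual files, not on a whole tree."""

    def __init__(self) -> None:
        self.tags: Dict[str, Dict[str, str]] = defaultdict(dict)  # path -> {tag: rev}

    def tag(self, name: str, revisions: Dict[str, str]) -> None:
        for path, rev in revisions.items():
            self.tags[path][name] = rev       # nothing checks which project `path` is in

    def checkout(self, name: str) -> Dict[str, str]:
        """A "revision" is just every file that happens to carry this tag."""
        return {path: t[name] for path, t in self.tags.items() if name in t}


store = FileTagStore()
store.tag("RELEASE_1", {"proj1/main.c": "1.4", "proj1/util.c": "1.2"})
store.tag("RELEASE_1", {"proj2/driver.c": "1.9"})   # unrelated project, same tag name
print(sorted(store.checkout("RELEASE_1")))
# ['proj1/main.c', 'proj1/util.c', 'proj2/driver.c']
```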
We happen to use the words "repository" and "revision" in git, but what
they mean is quite different, because we are more whole-tree oriented.
It misses the point to compare CVS and git and say CVS allows placing
unrelated things in the same "repository". CVS does not even track the
whole-tree state, so doing so hurts neither the user nor the tool. With
a tool that tracks the whole tree, you need a bit of thinking and
planning.

I do not think, by the way, that we are aiming at very different
things. We both know that our current tools support neither mode of
operation: the direction you are coming from, where everything is under
one roof and only parts are accessed while others are left untouched,
or an organization where loosely-related projects from different
repositories are checked out into a working tree hierarchy. You alluded
to a "split index" in another message, and the project organization I
suggested, which keeps component projects in their own separate
repositories, would also have separate indices in the component
repositories.

I am certainly not opposed to the idea of making operations in such a
tree (built either way) go seamlessly for the users. A Porcelain that
supports such a mode of operation needs to be built or enhanced,
because we do not have one. What I am saying is that I suspect the
everything-under-one-roof approach would do more damage to the core
than the multiple-repositories approach.

It is my understanding that Cogito has such light-weight subproject
support, which lets you have separate repositories laid out in a single
working tree. For example, users of such a Porcelain most likely would
not worry about what is stored in .git/index and the .git/remotes/
directory of the individual repositories that appear in such a single
working tree. The Porcelain would keep track of which files are locally
modified, which components are checked out in which subdirectory, where
their upstream repositories are, and things like that.
It will use .git/index and .git/remotes/ of the component repositories
to implement the unified tree, but the use of the individual .git/
directories is its implementation detail. It is likely to do its own
bookkeeping beyond what the current core offers, but I do not think it
needs much additional core support. If such bookkeeping turns out to be
useful, and necessary to have in the core for whatever reason (either
performance or interoperability across Porcelains), we could certainly
talk about putting it into the core.

I suspect that the everything-under-one-roof approach comes from an
observation that:

 - with CVS, projects that share the same cvsroot can be updated with a
   single 'cvs update' command in a directory close to the root.

 - with git, if you use multiple repositories checked out at the right
   places in the working tree hierarchy, you need to run around and say
   "git checkout" or "git commit" everywhere.

and the latter looks very inconvenient.

But of course the latter is very inconvenient: neither the current
"git checkout" nor "git commit" is such a subproject-aware Porcelain
command. But that does not mean you have to house everything in the
same repository and make partial check-in work. You will be enhancing
or replacing the same "git checkout" and "git commit" commands to do so
anyway.
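The "run around and say git checkout or git commit everywhere" chore is what a subproject-aware Porcelain would hide. A minimal sketch, assuming each component is a separate repository whose `.git` is a directory somewhere under a common root. `component_repos` and `for_each_component` are hypothetical helpers; the `run` parameter (defaulting to `subprocess.run`) exists so the command runner can be swapped out.

```python
import os
import subprocess
from typing import Callable, Dict, Iterator, List


def component_repos(root: str) -> Iterator[str]:
    """Yield every directory under `root` (root included) that has a .git directory."""
    for dirpath, dirnames, _files in os.walk(root):
        dirnames.sort()                      # deterministic, top-down order
        if ".git" in dirnames:
            dirnames.remove(".git")          # never walk into repository internals
            yield dirpath


def for_each_component(root: str, argv: List[str],
                       run: Callable = subprocess.run) -> Dict[str, object]:
    """Run the same command (e.g. ["git", "status"]) once in every component."""
    results = {}
    for repo in component_repos(root):
        results[os.path.relpath(repo, root)] = run(argv, cwd=repo)
    return results
```

For example, `for_each_component(".", ["git", "status"])` would run `git status` once per component repository. A real Porcelain would also need the bookkeeping listed above (upstreams, which component is checked out where), which this sketch does not attempt.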
* Re: Notes on supporting Git operations in/on partial Working Directories
From: A Large Angry SCM @ 2006-09-17 18:47 UTC
To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> A Large Angry SCM <gitzilla@gmail.com> writes:
>
>> Junio C Hamano wrote:
>> ...
>>> Having said that, I do not necessarily agree that highly modular
>>> projects would want to put everything in one git repository and
>>> track everything as a whole unit.
>> And yet that's exactly how a lot of developers use CVS. You can argue
>> that some other way is better but when they move from CVS they're
>> looking for continuity of productivity which often means not radically
>> changing how they work. At least in the short term.
> [...]
> I suspect that everything-under-one-roof approach is coming from
> an observation that:
>
>  - with CVS, projects that share the same cvsroot can be updated
>    with single 'cvs update' command in a directory close to the
>    root.
>
>  - with git, if you use multiple repositories checked out at
>    right places in the working tree hierarchy, you need to run
>    around and say "git checkout" or "git commit" everywhere.
>
> and the latter looks very inconvenient.
>
> But of course the latter is very inconvenient. The current "git
> checkout" nor "git commit" are not such subprojects-aware
> Porcelain commands. But that does not mean you have to house
> everything in the same repository and make partial check-in to
> work. You will be enhancing or replacing the same "git checkout"
> and "git commit" commands to do so anyway.

I used CVS as an example, but I've seen the "everything-under-one-roof"
approach, as you put it, used in other VCSs that do work with
changesets.
One instance in particular has all the source and config files in a
single tree that (I'm told) would take several gigabytes of filesystem
space to check out fully. The codebase is modular to a great extent,
but any particular project in it usually requires files from a large
number of other projects. There is a lot of automation to find the
required parts for builds and other actions on a project's codebase.
Could this be done with multiple repositories? Yes, but we're talking
hundreds (no exaggeration), and many of those would likely end up
including a large percentage of the other repositories, the way the Git
repository includes the Gitk repository. It could work, but their
existing approach already works and is likely better suited to their
codebase.

Git cannot currently do all the things that this organization wants a
VCS to do; working with partial checkouts is a key one.

There is no fundamental reason Git cannot support partial
checkouts/working directories. In fact, there is no fundamental reason
Git cannot support operations on partial (sparse?) repositories in both
space (working content/state, etc.) and time (history); it's just a
matter of record keeping[*1*]. That isn't how the Linux kernel
developers want to use their VCS, but it _is_ how others want to use
theirs.

Notes:

[*1*] I'm currently working on determining the minimum requirements
needed to support repositories with partial or sparse history.
* Re: Notes on supporting Git operations in/on partial Working Directories
From: Jakub Narebski @ 2006-09-17 18:55 UTC
To: git

A Large Angry SCM wrote:

> There is no fundamental reason Git can not support partial
> checkouts/working directories. In fact, there is no fundamental reason
> Git can not support operations on partial (sparse?) repositories in both
> space (working content/state, etc.) and time (history); it's just a
> matter of record keeping[*1*]. That isn't how the Linux kernel
> developers want to use their VCS but it _is_ how others want to use
> theirs.

There is perhaps not much trouble with partial checkouts, but there is
a problem with partial _commits_, at least for a snapshot-based SCM (as
opposed to a patchset-based SCM).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Notes on supporting Git operations in/on partial Working Directories
From: A Large Angry SCM @ 2006-09-17 20:01 UTC
To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
> A Large Angry SCM wrote:
>
>> There is no fundamental reason Git can not support partial
>> checkouts/working directories. In fact, there is no fundamental reason
>> Git can not support operations on partial (sparse?) repositories in both
>> space (working content/state, etc.) and time (history); it's just a
>> matter of record keeping[*1*]. That isn't how the Linux kernel
>> developers want to use their VCS but it _is_ how others want to use
>> theirs.
>
> There is perhaps not much trouble with partial checkouts, but there is
> problem with partial _commits_, at least for snapshot based SCM (as opposed
> to patchset based SCM).

By "partial commit" I take it you mean a commit with only partial
information about the new (content) state? If so, the missing
information about the new state can be assumed not to have changed
from the previous recorded state (commit).
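The answer above amounts to: the new snapshot is the parent snapshot with the partial checkout's changes laid over it. A sketch on flat path-to-object maps; `full_state_from_partial` and its `in_checkout` predicate are hypothetical, and refusing edits outside the checkout is an added assumption rather than something stated in the message.

```python
from typing import Callable, Dict, Iterable


def full_state_from_partial(parent: Dict[str, str],
                            changed: Dict[str, str],
                            deleted: Iterable[str] = (),
                            in_checkout: Callable[[str], bool] = lambda p: True
                            ) -> Dict[str, str]:
    """New full snapshot = parent snapshot with the partial checkout's edits applied.

    Any path the partial checkout says nothing about keeps its parent value."""
    deleted = list(deleted)
    for path in list(changed) + deleted:
        if not in_checkout(path):
            raise ValueError(f"{path} is outside the partial checkout")
    new = dict(parent)            # start from the previous commit's full state
    new.update(changed)           # overlay what was modified or added
    for path in deleted:
        new.pop(path, None)       # drop what was removed
    return new


parent = {"a/x.c": "x1", "b/y.c": "y1"}
print(full_state_from_partial(parent, {"a/x.c": "x2"},
                              in_checkout=lambda p: p.startswith("a/")))
# {'a/x.c': 'x2', 'b/y.c': 'y1'}
```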
* Re: Notes on supporting Git operations in/on partial Working Directories

From: Jakub Narebski @ 2006-09-17 20:28 UTC
To: git

A Large Angry SCM wrote:
> Jakub Narebski wrote:
>> A Large Angry SCM wrote:
>>
>>> There is no fundamental reason Git can not support partial
>>> checkouts/working directories. In fact, there is no fundamental reason
>>> Git can not support operations on partial (sparse?) repositories in both
>>> space (working content/state, etc.) and time (history); it's just a
>>> matter of record keeping[*1*]. That isn't how the Linux kernel
>>> developers want to use their VCS but it _is_ how others want to use
>>> theirs.
>>
>> There is perhaps not much trouble with partial checkouts, but there is
>> a problem with partial _commits_, at least for a snapshot-based SCM
>> (as opposed to a patchset-based SCM).
>
> By "partial commit" I take it you mean a commit with only partial
> information about the new (content) state? If so, the missing
> information about the new state can be assumed not to have changed
> since the previously recorded state (commit).

That, of course, assumes that 1) the whole state is recorded somewhere
(perhaps in the repository), so a partial checkout saves space only if
the repository compresses really well, and 2) there are no merges
outside the checked-out part.

Is anybody working on the "bind" header and subproject support?

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
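Jakub's second assumption can be checked mechanically. The sketch below uses the same hypothetical flat path-to-blob-ID model and assumes that membership in the checkout is decided by the WD_Prefix proposed in the original notes (with a trailing slash, as in a/b/c/d/). It applies the usual three-way rule: a path changed on at most one side can be resolved just by taking a blob ID, so only paths changed differently on both sides need their content and the merge base. The function names are invented for illustration; renames, file modes, and conflict markers are ignored.

```python
from typing import Dict, Optional, Set

# Hypothetical flat model of a commit's content: path -> blob ID.
Snapshot = Dict[str, str]


def in_checkout(path: str, wd_prefix: str) -> bool:
    """True if `path` lies inside a partial checkout rooted at `wd_prefix`.

    An empty prefix stands for a full checkout, as in the original notes.
    """
    return wd_prefix == "" or path.startswith(wd_prefix)


def merge_needs_outside_content(base: Snapshot, ours: Snapshot,
                                theirs: Snapshot, wd_prefix: str) -> Set[str]:
    """Paths outside the checkout that a three-way merge cannot settle by ID."""
    needed: Set[str] = set()
    for path in set(base) | set(ours) | set(theirs):
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t or o == b or t == b:
            continue  # unchanged, or changed on one side only: take that ID
        if not in_checkout(path, wd_prefix):
            needed.add(path)  # changed differently on both sides, not checked out
    return needed


if __name__ == "__main__":
    base = {"a/b/c/d/x.c": "1", "lib/z.c": "5", "lib/w.c": "7"}
    ours = {"a/b/c/d/x.c": "2", "lib/z.c": "6", "lib/w.c": "7"}
    theirs = {"a/b/c/d/x.c": "3", "lib/z.c": "8", "lib/w.c": "9"}
    # x.c conflicts but is checked out; lib/w.c changed only on their side.
    print(merge_needs_outside_content(base, ours, theirs, "a/b/c/d/"))
    # prints {'lib/z.c'}
```

A non-empty result means the merge cannot be completed inside the partial checkout alone.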
* Re: Notes on supporting Git operations in/on partial Working Directories

From: A Large Angry SCM @ 2006-09-17 21:11 UTC
To: Jakub Narebski
Cc: git

Jakub Narebski wrote:
> A Large Angry SCM wrote:
>> Jakub Narebski wrote:
>>> A Large Angry SCM wrote:
>>>
>>>> There is no fundamental reason Git can not support partial
>>>> checkouts/working directories. In fact, there is no fundamental reason
>>>> Git can not support operations on partial (sparse?) repositories in both
>>>> space (working content/state, etc.) and time (history); it's just a
>>>> matter of record keeping[*1*]. That isn't how the Linux kernel
>>>> developers want to use their VCS but it _is_ how others want to use
>>>> theirs.
>>> There is perhaps not much trouble with partial checkouts, but there is
>>> a problem with partial _commits_, at least for a snapshot-based SCM
>>> (as opposed to a patchset-based SCM).
>> By "partial commit" I take it you mean a commit with only partial
>> information about the new (content) state? If so, the missing
>> information about the new state can be assumed not to have changed
>> since the previously recorded state (commit).
>
> That, of course, assumes that 1) the whole state is recorded somewhere
> (perhaps in the repository), so a partial checkout saves space only if
> the repository compresses really well, and 2) there are no merges
> outside the checked-out part.

1) The TREE objects leading to the added/deleted/changed objects are
required. TREEs not leading to the added/deleted/changed objects are not
required; only their IDs are. That is sufficient to commit the changes
in a partial checkout.

2) Obviously, only the checked-out part can be worked on. If you want to
merge changes to some other part, you will need that part, and possibly
a merge base.
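Point 1) can be made concrete with Git's own object hashing. The sketch below is a minimal illustration under stated assumptions: an in-memory `store` dict mapping tree IDs to parsed entries stands in for the object database (it is not a Git API), only regular files and subtrees are modelled, and blobs are hashed but not stored. All function names are hypothetical. Applying changes rewrites only the trees on the paths from the root to the changed files; every other subtree is carried over by its ID without being read, and `loaded` records which trees were actually needed.

```python
import hashlib
from typing import Dict, Optional, Set, Tuple

FILE_MODE = b"100644"  # regular file
TREE_MODE = b"40000"   # subtree (Git writes it without a leading zero)

# A parsed tree: entry name -> (mode, raw 20-byte object ID).
Tree = Dict[bytes, Tuple[bytes, bytes]]
# Hypothetical in-memory stand-in for the object database: tree ID -> tree.
Store = Dict[bytes, Tree]
# Path components -> new file contents, or None to delete the file.
Changes = Dict[Tuple[bytes, ...], Optional[bytes]]


def hash_object(kind: bytes, body: bytes) -> bytes:
    """Git-style object ID: SHA-1 over "<type> <size>", a NUL byte, then the body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def write_tree(store: Store, tree: Tree) -> bytes:
    """Serialize a tree the way Git does, record it in `store`, return its ID."""
    def sort_key(name: bytes) -> bytes:
        # Git orders subtree names as if they ended with "/".
        return name + b"/" if tree[name][0] == TREE_MODE else name

    body = b"".join(tree[n][0] + b" " + n + b"\0" + tree[n][1]
                    for n in sorted(tree, key=sort_key))
    tree_id = hash_object(b"tree", body)
    store[tree_id] = tree
    return tree_id


def apply_changes(store: Store, tree: Tree, changes: Changes,
                  loaded: Set[bytes]) -> Tree:
    """Return a new tree with `changes` applied, reading only trees on changed paths."""
    tree = dict(tree)  # never modify a stored tree object in place
    per_child: Dict[bytes, Changes] = {}
    for path, data in changes.items():
        name, rest = path[0], path[1:]
        if rest:
            per_child.setdefault(name, {})[rest] = data  # handled below
        elif data is None:
            tree.pop(name, None)                         # file deleted
        else:
            tree[name] = (FILE_MODE, hash_object(b"blob", data))
    for name, sub_changes in per_child.items():
        old = tree.get(name)
        if old is not None and old[0] == TREE_MODE:
            loaded.add(old[1])        # this subtree lies on a changed path
            subtree = store[old[1]]
        else:
            subtree = {}              # directory is new in this commit
        new_sub = apply_changes(store, subtree, sub_changes, loaded)
        if new_sub:
            tree[name] = (TREE_MODE, write_tree(store, new_sub))
        else:
            tree.pop(name, None)      # Git does not record empty directories
    return tree


def commit_tree(store: Store, root_id: bytes,
                changes: Changes) -> Tuple[bytes, Set[bytes]]:
    """Compute the new root tree ID; also return the IDs of trees actually read."""
    loaded = {root_id}
    new_root = apply_changes(store, store[root_id], changes, loaded)
    return write_tree(store, new_root), loaded


if __name__ == "__main__":
    store: Store = {}
    files = {(b"a", b"b", b"x.c"): b"one\n", (b"a", b"b", b"y.c"): b"two\n",
             (b"docs", b"README"): b"readme\n"}
    root = write_tree(store, apply_changes(store, {}, files, set()))
    new_root, loaded = commit_tree(store, root, {(b"a", b"b", b"x.c"): b"new\n"})
    print(store[root][b"docs"][1] in loaded)  # prints False: docs/ was never read
```

Because object IDs depend only on content, the new root ID equals the one a full rebuild would produce, even though the untouched docs/ subtree was referenced only by its ID.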
Thread overview: 13+ messages
2006-09-14 19:05 Notes on supporting Git operations in/on partial Working Directories A Large Angry SCM
2006-09-14 19:21 ` Shawn Pearce
2006-09-14 20:08 ` A Large Angry SCM
2006-09-14 19:50 ` Junio C Hamano
2006-09-14 20:19 ` A Large Angry SCM
2006-09-15 2:43 ` Junio C Hamano
2006-09-15 18:15 ` A Large Angry SCM
2006-09-17 10:43 ` Junio C Hamano
2006-09-17 18:47 ` A Large Angry SCM
2006-09-17 18:55 ` Jakub Narebski
2006-09-17 20:01 ` A Large Angry SCM
2006-09-17 20:28 ` Jakub Narebski
2006-09-17 21:11 ` A Large Angry SCM