* restriction of pulls
@ 2007-02-09 10:49 Christoph Duelli
  2007-02-09 11:19 ` Jakub Narebski
  2007-02-09 14:54 ` Johannes Schindelin
  0 siblings, 2 replies; 13+ messages in thread

From: Christoph Duelli @ 2007-02-09 10:49 UTC (permalink / raw)
To: git

Is it possible to restrict a checkout, clone or a later pull to some
subdirectory of a repository?
(Background: using subversion (or cvs), it is possible to do a file- or
directory-restricted update.)

Say I have a repository containing 2 (mostly) independent projects
A and B (in separate directories):
- R
  - A
  - B
Is it possible to pull all the changes made to B, but not those made to A?
(Yes, I know that this causes trouble if there are dependencies into A.)

Regards
--
Christoph Duelli
MELOS GmbH

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: restriction of pulls
  2007-02-09 10:49 restriction of pulls Christoph Duelli
@ 2007-02-09 11:19 ` Jakub Narebski
  2007-02-09 14:54 ` Johannes Schindelin
  1 sibling, 0 replies; 13+ messages in thread

From: Jakub Narebski @ 2007-02-09 11:19 UTC (permalink / raw)
To: git

Christoph Duelli wrote:

> Is it possible to restrict a checkout, clone or a later pull to some
> subdirectory of a repository?
> (Background: using subversion (or cvs), it is possible to do a file- or
> directory-restricted update.)
>
> Say I have a repository containing 2 (mostly) independent projects
> A and B (in separate directories):
> - R
>   - A
>   - B
> Is it possible to pull all the changes made to B, but not those made to A?
> (Yes, I know that this causes trouble if there are dependencies into A.)

No, it is not possible. Moreover, it is not sensible, as it breaks the
atomicity of a commit. Well, you can hack around it, but...

That said, there is experimental submodule (subproject) support:
  http://git.or.cz/gitwiki/SubprojectSupport
  http://git.admingilde.org/tali/git.git/module2
(there was also a proposal for more lightweight submodule support, but I
don't have a link to it), and you could set up A and B as submodules
(subprojects).

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: restriction of pulls
  2007-02-09 10:49 restriction of pulls Christoph Duelli
  2007-02-09 11:19 ` Jakub Narebski
@ 2007-02-09 14:54 ` Johannes Schindelin
  2007-02-09 15:32   ` Rogan Dawes
  1 sibling, 1 reply; 13+ messages in thread

From: Johannes Schindelin @ 2007-02-09 14:54 UTC (permalink / raw)
To: Christoph Duelli; +Cc: git

Hi,

On Fri, 9 Feb 2007, Christoph Duelli wrote:

> Is it possible to restrict a checkout, clone or a later pull to some
> subdirectory of a repository?

No. In git, a revision really is a revision, and not a group of file
revisions.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: restriction of pulls
  2007-02-09 14:54 ` Johannes Schindelin
@ 2007-02-09 15:32   ` Rogan Dawes
  2007-02-09 16:19     ` Andy Parkins
  2007-02-10 14:50     ` Johannes Schindelin
  0 siblings, 2 replies; 13+ messages in thread

From: Rogan Dawes @ 2007-02-09 15:32 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Christoph Duelli, git

Johannes Schindelin wrote:
> Hi,
>
> On Fri, 9 Feb 2007, Christoph Duelli wrote:
>
>> Is it possible to restrict a checkout, clone or a later pull to some
>> subdirectory of a repository?
>
> No. In git, a revision really is a revision, and not a group of file
> revisions.
>
> Ciao,
> Dscho

I thought about how this might be implemented, although I'm not entirely
sure how efficient it would be.

One obstacle to implementing partial checkouts is that one does not know
which objects have changed or been deleted. One way of addressing this is
to keep a record of the hashes of all the objects that were NOT checked
out. (If one does not check out part of a directory, simply store the
hash of the top level; you do not need to store the child hashes.) This
record would be a kind of "negative index".

When deciding what to check in, or which files are modified, one would
check the "negative index" first to see if an entry exists. Only if there
is no entry would you check the filesystem to see if modification times
have changed. With the "negative index" and the files in the file system,
one would be able to construct new commits without any problem.

It would also require an updated transfer protocol, which would allow the
client to specify a tag/commit, then walk the tree that it points to, to
find the portion that the client is looking for, then pull only those
objects (and possibly their history). This is likely to be VERY
inefficient in terms of round trips, at least initially. This might be
able to benefit from the shallow clone support that was recently
implemented.

Comments?

Rogan

^ permalink raw reply	[flat|nested] 13+ messages in thread
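[The per-directory hashes that Rogan's proposed "negative index" would record are exactly the tree object ids git already computes: every subdirectory of a commit is itself a tree with a stable id. A minimal sketch in a throwaway repository, using the A/B layout from the original question (all file names hypothetical):]

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
mkdir A B
echo "project A" > A/a.txt
echo "project B" > B/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm "initial"
# The tree id of the directory we chose NOT to check out is all a
# "negative index" would have to store for it:
skipped=$(git rev-parse HEAD:A)
# The same id appears in the parent tree's listing:
listed=$(git ls-tree -d HEAD A | awk '{print $3}')
echo "$skipped"
[ "$skipped" = "$listed" ] && echo "tree id matches parent listing"
```

[Since one tree id covers an entire subtree, omitting a directory at the top level really does avoid storing (and transferring) anything about its children, as the proposal suggests.]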
* Re: restriction of pulls
  2007-02-09 15:32 ` Rogan Dawes
@ 2007-02-09 16:19   ` Andy Parkins
  2007-02-09 16:36     ` Rogan Dawes
  1 sibling, 1 reply; 13+ messages in thread

From: Andy Parkins @ 2007-02-09 16:19 UTC (permalink / raw)
To: git; +Cc: Rogan Dawes, Johannes Schindelin, Christoph Duelli

On Friday 2007 February 09 15:32, Rogan Dawes wrote:

> One obstacle to implementing partial checkouts is that one does not know
> which objects have changed or been deleted. One way of addressing this

Why would you want to do a partial checkout? I used subversion for a long
time before git, which does do partial checkouts, and it's a nightmare.
Things like this:

  cd dir1/
  edit files
  cd ../dir2
  edit files
  svn commit
  * committed revision 100

KABLAM! Disaster. Revision 100 no longer compiles/runs. The changes in
dir1 and dir2 were complementary changes (say, renaming a function and
then updating the places that call that function). I didn't even notice
how awful it was until I started using git and had a VCS that did the
right thing.

In every way that matters you can do a partial checkout - I can pull any
version of any file out of the repository. However, it should certainly
not be the case that git records that fact.

I think what you're actually after (from your description) is a shallow
clone. I believe that went in a while ago from Dscho.

  $ git clone --depth=5 <someurl>

will fetch only the last 5 revisions from the remote. The other half of
that is a shallow-by-tree clone; that is anathema to git, as there is no
such thing as a partial tree. Submodule support is what you want, but
that's still in development.

The only piece that (I think) is missing to get the functionality you
want is a kind of lazy transfer mode. For something like, say, the KDE
repository you can do

  svn checkout svn://svn.kde.org/kde/some/deep/path/in/the/project

and just get that directory - i.e. you don't have to pay the cost of
downloading the whole of KDE. Git can't do that; however, I think one day
it will be able to, by choosing not to download every object from the
remote.

Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
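[The shallow clone Andy mentions behaves as described; a quick sketch against a local repository. A file:// URL is used because git ignores --depth for plain local-path clones; repository names are made up for the demo:]

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q src && cd src
# five commits of history in the "remote"
for i in 1 2 3 4 5; do
  echo "change $i" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.com commit -qm "commit $i"
done
cd "$tmp"
# a shallow clone truncates history at the requested depth
git clone -q --depth=2 "file://$tmp/src" shallow
depth=$(git -C shallow rev-list --count HEAD)
echo "$depth"   # only 2 commits are reachable in the shallow clone
```

[This trims history depth-wise, which is exactly the "other half" distinction Andy draws: it does nothing to narrow the tree width-wise.]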
* Re: restriction of pulls
  2007-02-09 16:19 ` Andy Parkins
@ 2007-02-09 16:36   ` Rogan Dawes
  2007-02-09 16:45     ` Andy Parkins
  0 siblings, 1 reply; 13+ messages in thread

From: Rogan Dawes @ 2007-02-09 16:36 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Johannes Schindelin, Christoph Duelli

Andy Parkins wrote:
> On Friday 2007 February 09 15:32, Rogan Dawes wrote:
>
>> One obstacle to implementing partial checkouts is that one does not know
>> which objects have changed or been deleted. One way of addressing this
>
> Why would you want to do a partial checkout? I used subversion for a long
> time before git, which does do partial checkouts, and it's a nightmare.
>
> Things like this:
>
>   cd dir1/
>   edit files
>   cd ../dir2
>   edit files
>   svn commit
>   * committed revision 100
>
> KABLAM! Disaster. Revision 100 no longer compiles/runs. The changes in
> dir1 and dir2 were complementary changes (say, renaming a function and
> then updating the places that call that function).

Please note that my suggestion does NOT imply allowing partial checkins
(or if it does, it was not my intention).

What I am trying to support is Jon Smirl's description of how some
Mozilla contributors work, specifically the documentation folks. They do
not have any need to look at the actual code, but simply limit themselves
to the files in the doc/ directory. Supporting a partial checkout of this
doc/ directory would allow them to get a "check in"-able subdirectory,
without having to download the rest of the source.

What I intended to convey was that when determining which files have
changed, and presenting them to the user to decide whether to commit them
or not, the filesystem walker would first check the "negative index" to
see if that directory/file had been explicitly excluded from the
checkout. This implies that they did not (and do not intend to) modify
that portion of the tree. Which implies that the committer can then
construct a complete view of the entire tree (now including the changes
that were made in the partial checkout) by resolving the modified files
with the knowledge of the hashes of the unmodified files/trees.

> In every way that matters you can do a partial checkout - I can pull any
> version of any file out of the repository. However, it should certainly
> not be the case that git records that fact.

Why not? If you only want to modify that file, does it not make sense
that you can just check out that file, modify it, and check it back in?
Or at least, if not check it in, construct a diff for mailing to the
maintainer? Or even allow the maintainer to pull/merge the changes from
the contributor, even though the contributor doesn't necessarily have all
the blobs required to make up the tree he is committing? They should all
be available from the "alternate" if required.

Rogan

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: restriction of pulls
  2007-02-09 16:36 ` Rogan Dawes
@ 2007-02-09 16:45   ` Andy Parkins
  2007-02-09 17:32     ` Rogan Dawes
  0 siblings, 1 reply; 13+ messages in thread

From: Andy Parkins @ 2007-02-09 16:45 UTC (permalink / raw)
To: git; +Cc: Rogan Dawes, Johannes Schindelin, Christoph Duelli

On Friday 2007 February 09 16:36, Rogan Dawes wrote:

> Please note that my suggestion does NOT imply allowing partial checkins
> (or if it does, it was not my intention).

My apologies then; I did misunderstand.

>> In every way that matters you can do a partial checkout - I can pull any
>> version of any file out of the repository. However, it should certainly
>> not be the case that git records that fact.
>
> Why not? If you only want to modify that file, does it not make sense
> that you can just check out that file, modify it, and check it back in?

Sorry - what I meant was that it shouldn't record that you checked out
revision 74 of that file and retain a link from the current version to
that old version.

Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: restriction of pulls
  2007-02-09 16:45 ` Andy Parkins
@ 2007-02-09 17:32   ` Rogan Dawes
  2007-02-10  9:59     ` Andy Parkins
  0 siblings, 1 reply; 13+ messages in thread

From: Rogan Dawes @ 2007-02-09 17:32 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Johannes Schindelin, Christoph Duelli

Andy Parkins wrote:
> On Friday 2007 February 09 16:36, Rogan Dawes wrote:
>
>> Please note that my suggestion does NOT imply allowing partial checkins
>> (or if it does, it was not my intention).
>
> My apologies then; I did misunderstand.

That'll teach me to be more clear ;-)

>>> In every way that matters you can do a partial checkout - I can pull any
>>> version of any file out of the repository. However, it should certainly
>>> not be the case that git records that fact.
>>
>> Why not? If you only want to modify that file, does it not make sense
>> that you can just check out that file, modify it, and check it back in?
>
> Sorry - what I meant was that it shouldn't record that you checked out
> revision 74 of that file and retain a link from the current version to
> that old version.

Well, the new commit would have the previous commit as its direct parent,
even though it may not have all the blobs to support it. Which implies
that all the git merge semantics should still work, assuming that the
person actually doing the merge has all the necessary objects to resolve
any conflicts. (Which does not necessarily imply that he has ALL of the
objects in the tree, just those that are implicated in any conflicts.)

So, for example, the doc team may have a documentation maintainer who has
the entire doc/ directory, who resolves any submissions from the doc
team, and feeds that up into the master tree. And all of this could be
done by means of pulls by the upstream maintainers.

Rogan

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: restriction of pulls
  2007-02-09 17:32 ` Rogan Dawes
@ 2007-02-10  9:59   ` Andy Parkins
  0 siblings, 0 replies; 13+ messages in thread

From: Andy Parkins @ 2007-02-10 9:59 UTC (permalink / raw)
To: git; +Cc: Rogan Dawes, Johannes Schindelin, Christoph Duelli

On Friday 2007, February 09, Rogan Dawes wrote:

> Well, the new commit would have the previous commit as its direct
> parent, even though it may not have all the blobs to support it.

This I agree with; this seems like the way that a partial checkout would
be supported. As you say, there would be no need to have the blobs
available for objects you aren't altering. Unfortunately, it seems like
it would be a huge amount of work to actually do.

Andy
--
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: restriction of pulls
  2007-02-09 15:32 ` Rogan Dawes
  2007-02-09 16:19   ` Andy Parkins
@ 2007-02-10 14:50   ` Johannes Schindelin
  2007-02-12 13:58     ` Rogan Dawes
  1 sibling, 1 reply; 13+ messages in thread

From: Johannes Schindelin @ 2007-02-10 14:50 UTC (permalink / raw)
To: Rogan Dawes; +Cc: Christoph Duelli, git

Hi,

On Fri, 9 Feb 2007, Rogan Dawes wrote:

> Johannes Schindelin wrote:
>>
>> On Fri, 9 Feb 2007, Christoph Duelli wrote:
>>
>>> Is it possible to restrict a checkout, clone or a later pull to some
>>> subdirectory of a repository?
>>
>> No. In git, a revision really is a revision, and not a group of file
>> revisions.
>
> I thought about how this might be implemented, although I'm not entirely
> sure how efficient this will be.

There are basically three ways I can think of:

- rewrite the commit objects on the fly. You might want to avoid the use
  of the pack protocol here (i.e. use HTTP or FTP transport).

- try to teach git a way to ignore certain missing objects and
  directories. This might be involved, but you could extend upload-pack
  easily with a new extension for that.

(my favourite:)
- use git-split to create a new branch, which only contains doc/. Do work
  only on that branch, and merge into mainline from time to time.

If you don't need the history, you don't need to git-split the branch.

You only need to make sure that the newly created branch is _not_
branched off of mainline, since the next merge would _delete_ all files
outside of doc/ (merge would see that the files exist in mainline, and
existed in the common ancestor too, so would think that the files were
deleted in the doc branch).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread
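[There is no git-split command in mainline git, but the effect of the third option can be sketched with stock commands: give doc/ its own root commit, so the doc branch shares no common ancestor with mainline, then merge. The repository layout is made up, and --allow-unrelated-histories is a latter-day flag that such a merge requires in current git:]

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
git init -q proj && cd proj
mkdir doc src
echo "guide v1" > doc/guide.txt
echo "int main(void){}" > src/main.c
g add . && g commit -qm "mainline: initial"
main=$(git symbolic-ref --short HEAD)
# Build a doc-only branch with its OWN root commit, keeping only doc/:
git checkout -q --orphan doc-branch
git rm -r -q --cached src
rm -rf src
g commit -qm "doc: independent root"
echo "install notes" > doc/install.txt
g add doc/install.txt && g commit -qm "doc: add install notes"
# Merge the doc work into mainline. src/ survives: it never existed in a
# common ancestor, so the merge does not treat it as deleted -- which is
# exactly why the doc branch must NOT be branched off of mainline.
git checkout -q "$main"
g merge -q --allow-unrelated-histories -m "merge doc work" doc-branch
test -f src/main.c && test -f doc/install.txt && echo "src/ survived the merge"
```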
* Re: restriction of pulls
  2007-02-10 14:50 ` Johannes Schindelin
@ 2007-02-12 13:58   ` Rogan Dawes
  2007-02-12 14:13     ` Johannes Schindelin
  0 siblings, 1 reply; 13+ messages in thread

From: Rogan Dawes @ 2007-02-12 13:58 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Rogan Dawes, Christoph Duelli, git

Johannes Schindelin wrote:
> Hi,
>
> On Fri, 9 Feb 2007, Rogan Dawes wrote:
>
>> I thought about how this might be implemented, although I'm not entirely
>> sure how efficient this will be.
>
> There are basically three ways I can think of:
>
> - rewrite the commit objects on the fly. You might want to avoid the use
>   of the pack protocol here (i.e. use HTTP or FTP transport).
>
> - try to teach git a way to ignore certain missing objects and
>   directories. This might be involved, but you could extend upload-pack
>   easily with a new extension for that.
>
> (my favourite:)
> - use git-split to create a new branch, which only contains doc/. Do work
>   only on that branch, and merge into mainline from time to time.
>
> If you don't need the history, you don't need to git-split the branch.
>
> You only need to make sure that the newly created branch is _not_
> branched off of mainline, since the next merge would _delete_ all files
> outside of doc/ (merge would see that the files exist in mainline, and
> existed in the common ancestor too, so would think that the files were
> deleted in the doc branch).
>
> Ciao,
> Dscho

Your third option sounds quite clever, apart from the problem of
attributing a commit and a commit message to someone when the actual
commit doesn't match what they actually did :-( As well as wondering
what happens when they check out a few more files. Do we rewrite those
commits as well? What happens if the user has made some commits already?
What happens if they have already sent those upstream? etc.

I think the best solution is ultimately to make git able to cope with
certain missing objects.

I started writing this in response to another message, but it will do
fine here, too:

The description I give here will likely horrify people in terms of
communications inefficiency, but I'm sure that can be improved.

Scenario: A user sees a documentation bug in a git-managed project, and
decides that she wants to do something about it. Since she is not on the
fastest of connections, she'd like to reduce the checkout to a
reasonable minimum, while still working with the git tools.

Viewing the repo layout using gitweb, she sees that all the
documentation is stored in the docs/ directory off the root. So, she
creates a local repo to work in:

  $ git init-db

She configures her local repo to reference the source one (hypothetical
syntax):

  $ git clone --reference http://example.com/project.git \
        http://example.com/project.git

Since the reference and the repo are the same (and non-local), git
doesn't actually download anything other than the current heads (and
maybe tags). She then does a partial checkout of the master branch, but
only the docs/ directory:

  $ git checkout -p master docs/

The -p flag indicates that this is a partial checkout of master. Git
records that the current HEAD is "master", checks out the docs/
directory, and removes any other files in the working directory (that it
knew about from the existing index, if any - I'm not suggesting that it
should arbitrarily delete files!).

The checkout process goes as follows: resolve the <treeish> that HEAD
points to, and retrieve it from the upstream repo if it does not exist
locally. Continue requesting only the necessary tree and blob objects to
satisfy the requested checkout. I.e. from the first tree, identify the
docs/ directory, then request only that tree object. Continue to
download tree and blob objects until the entire docs/ directory can be
created in the working directory.

Now create a "negative index" (pindex?) that has details about the other
files and directories that were not checked out. Obviously, this does
not need to recurse into directories that were not checked out; simply
having the hash of the parent directory in the pindex is sufficient
information to reconstruct a new index. (This would likely require a new
index file format that does not include all known files, but simply
stores the hashes of the unchecked-out trees or blobs, as well as the
stat information of the files that were checked out.)

Then creating a new commit would require creating the necessary blobs
for changed files, new tree objects for trees that change, and a commit
object. As far as I can tell, that could then be pushed/pulled/merged
using the existing tools, without any problems.

Rogan

^ permalink raw reply	[flat|nested] 13+ messages in thread
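[The git checkout -p syntax above is hypothetical and never existed, but the working-tree half of this scenario later landed in git as sparse checkout (the object-transfer half became partial clone). A sketch of the modern equivalent, shown in a single local repository for brevity; assumes git >= 2.25 for the sparse-checkout command, and note that, unlike Rogan's design, all objects are still stored locally:]

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q project && cd project
mkdir docs code
echo "user manual" > docs/manual.txt
echo "int main(void){}" > code/main.c
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm "initial"
# Restrict the working tree to docs/; the full history and all objects
# remain in .git -- only the checkout shrinks.
git sparse-checkout set docs
test -f docs/manual.txt && echo "docs/ present"
test ! -e code/main.c && echo "code/ not checked out"
```

[Internally, git marks the excluded index entries with the skip-worktree bit rather than keeping a separate "negative index", but the effect for the documentation-contributor workflow is much the same.]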
* Re: restriction of pulls
  2007-02-12 13:58 ` Rogan Dawes
@ 2007-02-12 14:13   ` Johannes Schindelin
  2007-02-12 14:29     ` Rogan Dawes
  0 siblings, 1 reply; 13+ messages in thread

From: Johannes Schindelin @ 2007-02-12 14:13 UTC (permalink / raw)
To: Rogan Dawes; +Cc: Rogan Dawes, Christoph Duelli, git

Hi,

On Mon, 12 Feb 2007, Rogan Dawes wrote:

> Johannes Schindelin wrote:
>>
>> (my favourite:)
>> - use git-split to create a new branch, which only contains doc/. Do work
>>   only on that branch, and merge into mainline from time to time.
>
> Your third option sounds quite clever, apart from the problem of
> attributing a commit and a commit message to someone when the actual
> commit doesn't match what they actually did :-(

This problem is not related to subprojects at all. If the commit message
does not match the patch, you are always fscked.

> As well as wondering what happens when they check out a few more files.
> Do we rewrite those commits as well? What happens if the user has made
> some commits already? What happens if they have already sent those
> upstream? etc.

I think you misunderstood. My favourite option would make docs a
_separate_ project, with its own history. It just happens to be pulled
from time to time, just like git-gui, gitk and git-fast-import in
git.git.

> I think the best solution is ultimately to make git able to cope with
> certain missing objects.

Hmm. I am not convinced. One nice thing about git is its level of
integrity, which means that no random objects are missing.

> I started writing this in response to another message, but it will do
> fine here, too:
>
> The description I give here will likely horrify people in terms of
> communications inefficiency, but I'm sure that can be improved.
>
> [goes on... and describes the lazy clone!]

AFAICT this really is the lazy clone. And it was already determined that
it is all too easy to pull in all commit objects by accident, which boils
down to a substantial chunk of the repository.

But if you want to play with it: by all means, go ahead. It might just be
that you overcome the fundamental difficulties, and we get something nice
out of it.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread
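[The lazy clone discussed here did eventually land in git as "partial clone", and it confirms Dscho's observation: commits (and trees) still come down in full, and only blobs are deferred. A sketch against a local repository, assuming a reasonably recent git (>= 2.19) and the server-side uploadpack.allowFilter opt-in:]

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q src && cd src
echo "some content" > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "one"
echo "more content" >> file.txt
git -c user.name=demo -c user.email=demo@example.com commit -qam "two"
# The serving repository must opt in to filtered (partial) fetches:
git config uploadpack.allowFilter true
cd "$tmp"
# Clone all commits and trees but NO blobs; --no-checkout avoids the
# lazy blob fetch that populating a working tree would trigger.
git clone -q --no-checkout --filter=blob:none "file://$tmp/src" lazy
commits=$(git -C lazy rev-list --count HEAD)
blobs=$(git -C lazy cat-file --batch-check --batch-all-objects | grep -c ' blob ' || true)
echo "commits=$commits blobs=$blobs"
```

[Running git log in the lazy clone works offline, since the full commit history is present; only commands that need file contents trigger an on-demand fetch, much as Rogan proposed for the "clone", "fetch" and "checkout" code paths.]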
* Re: restriction of pulls
  2007-02-12 14:13 ` Johannes Schindelin
@ 2007-02-12 14:29   ` Rogan Dawes
  0 siblings, 0 replies; 13+ messages in thread

From: Rogan Dawes @ 2007-02-12 14:29 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Christoph Duelli, git

Johannes Schindelin wrote:
> Hi,
>
> On Mon, 12 Feb 2007, Rogan Dawes wrote:
>
>> Johannes Schindelin wrote:
>>> (my favourite:)
>>> - use git-split to create a new branch, which only contains doc/. Do work
>>>   only on that branch, and merge into mainline from time to time.
>>
>> Your third option sounds quite clever, apart from the problem of
>> attributing a commit and a commit message to someone when the actual
>> commit doesn't match what they actually did :-(
>
> This problem is not related to subprojects at all. If the commit message
> does not match the patch, you are always fscked.

Well, I was thinking about the fact that the files originally checked in
will not match the files "checked in" in the rewritten commit.

>> As well as wondering what happens when they check out a few more files.
>> Do we rewrite those commits as well? What happens if the user has made
>> some commits already? What happens if they have already sent those
>> upstream? etc.
>
> I think you misunderstood. My favourite option would make docs a
> _separate_ project, with its own history. It just happens to be pulled
> from time to time, just like git-gui, gitk and git-fast-import in
> git.git.

I see. However, that does not allow for the random single-file checkout
scenario I sketched out. Which may or may not be common/desirable, but it
is an extreme case of the partial checkout, without fixed delineation.

>> I think the best solution is ultimately to make git able to cope with
>> certain missing objects.
>
> Hmm. I am not convinced. One nice thing about git is its level of
> integrity, which means that no random objects are missing.

Good point. :-(

>> I started writing this in response to another message, but it will do
>> fine here, too:
>>
>> The description I give here will likely horrify people in terms of
>> communications inefficiency, but I'm sure that can be improved.
>>
>> [goes on... and describes the lazy clone!]
>
> AFAICT this really is the lazy clone. And it was already determined that
> it is all too easy to pull in all commit objects by accident, which
> boils down to a substantial chunk of the repository.

Not so much a lazy clone as a partial clone. It is only in the "clone",
"fetch" or "checkout" code paths that new objects would be retrieved from
the source repo. Things like "git log"/"git show" would not do so, and
would be required to handle missing objects gracefully.

> But if you want to play with it: by all means, go ahead. It might just
> be that you overcome the fundamental difficulties, and we get something
> nice out of it.
>
> Ciao,
> Dscho

Maybe ;-) We'll see if I get any time for it.

Rogan

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2007-02-12 14:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-09 10:49 restriction of pulls Christoph Duelli
2007-02-09 11:19 ` Jakub Narebski
2007-02-09 14:54 ` Johannes Schindelin
2007-02-09 15:32   ` Rogan Dawes
2007-02-09 16:19     ` Andy Parkins
2007-02-09 16:36       ` Rogan Dawes
2007-02-09 16:45         ` Andy Parkins
2007-02-09 17:32           ` Rogan Dawes
2007-02-10  9:59             ` Andy Parkins
2007-02-10 14:50     ` Johannes Schindelin
2007-02-12 13:58       ` Rogan Dawes
2007-02-12 14:13         ` Johannes Schindelin
2007-02-12 14:29           ` Rogan Dawes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).