* 'git clone' doesn't use alternates automatically?
From: James Pickens @ 2009-01-30 22:12 UTC
  To: Git ML

Hi,

I have a central, shared Git repository on an NFS drive at path
$central. I have added "$central/objects" to
$central/objects/info/alternates. I see that when I clone this
repository with Git 1.6.1, the alternates file is automatically copied
to the clone, but so are all the pack files and loose objects. If I
then cd to the clone and run 'git gc', it removes the redundant local
objects.

I thought I tested this setup a few months back, and 'git clone'
automatically used the alternates file to avoid copying the redundant
objects into the clone. Has this behavior changed, or is my memory
bad?

James

* Re: 'git clone' doesn't use alternates automatically?
From: Jeff King @ 2009-01-31 7:12 UTC
  To: James Pickens; +Cc: Git ML

On Fri, Jan 30, 2009 at 03:12:42PM -0700, James Pickens wrote:

> I have a central, shared Git repository on an NFS drive at path
> $central. I have added "$central/objects" to
> $central/objects/info/alternates. I see that when I clone this
> repository with Git 1.6.1, the alternates file is automatically copied
> to the clone, but so are all the pack files and loose objects. If I
> then cd to the clone and run 'git gc', it removes the redundant local
> objects.

Yes, we don't set up alternates to an origin by default. If it's a local
clone, we do hardlink by default:

  $ ls -i git/.git/objects/pack
  7639155 pack-0651ae7e35ffde1921db158a3292e1c81153be1a.idx
  7638782 pack-0651ae7e35ffde1921db158a3292e1c81153be1a.pack
  $ git clone git foo
  ...
  $ ls -i foo/.git/objects/pack
  7639155 pack-0651ae7e35ffde1921db158a3292e1c81153be1a.idx
  7638782 pack-0651ae7e35ffde1921db158a3292e1c81153be1a.pack

but presumably in your example the second clone is _not_ on the NFS
mount, and therefore can't hardlink. So you can try "git clone -s" to
specify that you definitely want alternates.

> I thought I tested this setup a few months back, and 'git clone'
> automatically used the alternates file to avoid copying the redundant
> objects into the clone. Has this behavior changed, or is my memory
> bad?

I don't recall clone ever being that clever, but I could be wrong (it is
not an area of the code that I am too familiar with).

Can you try a test with a few different versions to see if it ever
behaved as you expected (and if it does, bisect to find the breakage)?

-Peff

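The contrast Peff draws between a hardlinked local clone and "git clone -s" can be reproduced in a scratch directory. This is a minimal sketch, assuming a POSIX shell and a current Git (1.8.5 or later, for `git -C`), whose behaviour may differ in details from the 1.6.x releases discussed here; the repository names `origin`, `linked` and `shared` are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: plain local clone (hardlinked objects) vs. "git clone -s"
# (objects borrowed through objects/info/alternates).
set -e
# Throwaway identity so "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# An "origin" with one commit, repacked so that there is a pack file.
git init -q origin
echo content >origin/file
git -C origin add file
git -C origin commit -q -m one
git -C origin repack -q -a -d

# Plain local clone: the pack is hardlinked when both sides are on the
# same filesystem (git falls back to copying when linking fails).
git clone -q origin linked
ls -i origin/.git/objects/pack/*.pack linked/.git/objects/pack/*.pack

# Shared clone: nothing is copied; origin's object store is recorded
# as an alternate instead.
git clone -q -s origin shared
cat shared/.git/objects/info/alternates
```

If the two `ls -i` lines show the same inode number, the pack was hardlinked; across filesystems (the NFS-to-local case James describes) the numbers differ, and only `-s` avoids the copy.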
* Re: 'git clone' doesn't use alternates automatically?
From: James Pickens @ 2009-01-31 20:08 UTC
  To: Git ML; +Cc: Jeff King

On Sat, Jan 31, 2009 at 12:12 AM, Jeff King <peff@peff.net> wrote:
> but presumably in your example the second clone is _not_ on the NFS
> mount, and therefore can't hardlink.

That's correct.

> So you can try "git clone -s" to specify that you definitely want
> alternates.

Well, the clone gets the alternates either way. It just doesn't use
them to avoid copying the data unless I give -s. More importantly, if
'git clone' worked the way I thought, then when I clone a remote
repository for which I have a local mirror, I could avoid typing
'--reference <path to local mirror>' by adding
<path to local mirror>/objects to the alternates file in the remote
repository.

> I don't recall clone ever being that clever, but I could be wrong (it is
> not an area of the code that I am too familiar with).
>
> Can you try a test with a few different versions to see if it ever
> behaved as you expected (and if it does, bisect to find the breakage)?

Damn. I was hoping the response would be "it's a regression, and
here's a patch to fix it". I went ahead and tested a few old versions
and they all behave the same way.

So, is there any reason 'git clone' shouldn't automatically use the
alternates that it copied into the new repository? I might look into
writing a patch if nobody objects.

James

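For the "remote repository plus local mirror" case, the explicit `--reference` form James wants to avoid typing looks like this. A minimal sketch, assuming a POSIX shell and a current Git; a local `file://` URL stands in for the real ssh URL of the remote, and the names `upstream`, `mirror.git` and `clone` are hypothetical.

```shell
#!/bin/sh
# Sketch: clone a "remote" while borrowing objects from a local mirror.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# "upstream" plays the remote; "mirror.git" is a local mirror of it
# that already holds the same objects.
git init -q upstream
echo content >upstream/file
git -C upstream add file
git -C upstream commit -q -m one
git clone -q --mirror upstream mirror.git

# A file:// URL makes clone use the normal transport, as it would over
# ssh. --reference records the mirror as an alternate, so objects the
# mirror already has do not need to be transferred.
git clone -q --reference "$work/mirror.git" "file://$work/upstream" clone
cat clone/.git/objects/info/alternates
```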
* Re: 'git clone' doesn't use alternates automatically?
From: Jakub Narebski @ 2009-01-31 21:08 UTC
  To: James Pickens; +Cc: Git ML, Jeff King

James Pickens <jepicken@gmail.com> writes:

> So, is there any reason 'git clone' shouldn't automatically use
> the alternates that it copied into the new repository? I might
> look into writing a patch if nobody objects.

Alternates are fragile with respect to garbage collecting in the
repository you borrow objects from.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: 'git clone' doesn't use alternates automatically?
From: James Pickens @ 2009-01-31 21:43 UTC
  To: Git ML; +Cc: Jeff King, Jakub Narebski

On Sat, Jan 31, 2009 at 2:08 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> James Pickens <jepicken@gmail.com> writes:
>
>> So, is there any reason 'git clone' shouldn't automatically use
>> the alternates that it copied into the new repository? I might
>> look into writing a patch if nobody objects.
>
> Alternates are fragile with respect to garbage collecting in the
> repository you borrow objects from.

I think that's irrelevant in this case. The scenario is that I clone
repo A, which is borrowing objects from repo B. So repo A was already
assuming that it's safe to borrow from B. The current behavior is that
the clone of A also borrows from B automatically. What I am asking is
whether 'git clone' should take advantage of that to avoid copying
redundant objects from A into the clone. They will get deleted the
first time I run 'git gc' in the clone anyway.

James

* Re: 'git clone' doesn't use alternates automatically?
From: Jeff King @ 2009-01-31 21:55 UTC
  To: James Pickens; +Cc: Git ML

On Sat, Jan 31, 2009 at 01:08:16PM -0700, James Pickens wrote:

> Well, the clone gets the alternates either way. It just doesn't
> use them to avoid copying the data unless I give -s. More

The other key change is that you don't depend on the origin in your
alternates when you don't use "-s".

> So, is there any reason 'git clone' shouldn't automatically use
> the alternates that it copied into the new repository? I might
> look into writing a patch if nobody objects.

I think the reason "-s" isn't the default is that alternates are fragile
(as Jakub mentioned), and we don't want to set them up without the user
asking to do so.

So from what you've posted (but I haven't double checked or looked at
the code), it sounds like the current behavior is:

  - with "-s", add the origin as an alternate, and use alternates while
    cloning

  - with "--reference", add some other repo as an alternate, and use
    alternates while cloning

  - without either, copy alternates from origin, but _don't_ use
    alternates while cloning

The last one seems a little silly. Why bother setting up the alternates
if you're not going to use them? I guess because we might not be able to
get the objects at all, otherwise, and we need to know where to copy
them from. But either:

  - that is an implementation-specific detail of clone, and those
    alternates should go away after we clone

or

  - we should fully respect those alternates

The only downside to the latter is that somebody who has cloned a
repository with alternates now has an alternates-based repository and
might not know it (i.e., they might not have been the one who set up
alternates in the origin).

-Peff

* Re: 'git clone' doesn't use alternates automatically?
From: Junio C Hamano @ 2009-02-01 1:19 UTC
  To: Jeff King; +Cc: James Pickens, Git ML

Jeff King <peff@peff.net> writes:

>   - without either, copy alternates from origin, but _don't_ use
>     alternates while cloning

Are you talking about a local clone optimization that does hardlink from
the source repository?

I am fairly certain that copying alternates from the source repository was
not an intended behaviour but was a consequence of lazy coding of how we
copy (or link) everything from it. The original was literally the simple
matter of:

	find objects ! -type d -print | cpio $cpio_quiet_flag -pumd$l "$GIT_DIR/"

whose intention was to copy objects/?? and objects/pack/. and it wasn't
even part of the design consideration to worry about what would happen to
the alternates the source repository might have in objects/info/.

* Re: 'git clone' doesn't use alternates automatically?
From: Jeff King @ 2009-02-02 13:07 UTC
  To: Junio C Hamano; +Cc: James Pickens, Git ML

On Sat, Jan 31, 2009 at 05:19:31PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> >   - without either, copy alternates from origin, but _don't_ use
> >     alternates while cloning
>
> Are you talking about a local clone optimization that does hardlink from
> the source repository?

Sorry, I was wrong about what was happening. From reading James' posts
and not doing any experimenting or looking, I had the impression that
doing this:

  # plain repo
  mkdir repo1 && (cd repo1 &&
    git init &&
    echo content >file &&
    git add . &&
    git commit -m one)

  # repo with alternates, but extra content
  git clone -s repo1 repo2 && (cd repo2 &&
    echo content >>file &&
    git commit -a -m two)

  # clone of repo w/ alternates
  git clone repo2 repo3

would cause the final clone to set up the alternate to repo1, but still
pull in the objects. But that isn't the case, of course. Either:

  1. It is a local hardlink clone, in which case we just pull in the
     objects from repo2.

  2. It isn't, in which case we don't copy over the alternates.

> I am fairly certain that copying alternates from the source repository was
> not an intended behaviour but was a consequence of lazy coding of how we
> copy (or link) everything from it. The original was literally the simple
> matter of:
>
>   find objects ! -type d -print | cpio $cpio_quiet_flag -pumd$l "$GIT_DIR/"
>
> whose intention was to copy objects/?? and objects/pack/. and it wasn't
> even part of the design consideration to worry about what would happen to
> the alternates the source repository might have in objects/info/.

Right, I think that is what is going on. And what I was suggesting in my
other email is that it is actively harmful to have this behavior,
because now repo3 depends on repo1, without the user having explicitly
asked for such a relationship (and they might not even be aware of
repo1).

I was tempted to suggest avoiding copying the alternates from repo2 to
repo3. But you can't do that: repo2 is _missing_ objects that repo3
won't have. Without the alternates file pointing to repo1, repo3 is
corrupt. So simply avoiding copying the alternates file doesn't work;
one would have to actually pull the missing objects in from the
alternate before doing so.

But actually, I think there is even more breakage in hardlinking the
alternates file: alternates files can contain relative paths. So if
repo2 points to "../../../repo1/.git/objects" (which it doesn't in the
example above, as "clone -s" uses absolute paths -- but it is easy
enough to construct a broken case), then repo3 will gain that alternate
pointer, but may be in a totally different directory where that
relative path is broken. And then repo3 is corrupt. So the alternates
must be copied and any relative paths munged for it to work reliably.

The hardlink code operates by default because it was thought to be a
safe optimization that couldn't bite people. But it interacts badly with
the concept of alternates. So I think a sane fix would be to disable
hardlinking if the parent repo is using alternates at all. Then a
vanilla "git clone repo2 repo3" will do the safe but more costly
behavior of actually copying the objects. If the user wants to accept
the risks of alternates, then he can give "-s" explicitly, and git will
track the alternates recursively through repo2 to repo1 at runtime.

-Peff

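Peff's point about relative alternates is easy to reproduce by hand. A minimal sketch, assuming a POSIX shell and a current Git; a plain `cp -R` of the whole repository stands in for a verbatim copy of the alternates file, and the repository and directory names are made up.

```shell
#!/bin/sh
# Sketch: a relative alternates entry copied verbatim into a repository
# at a different depth stops resolving.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q repo1
echo content >repo1/file
git -C repo1 add file
git -C repo1 commit -q -m one

# repo2 borrows everything from repo1. "clone -s" writes an absolute
# path, so replace it with a relative one; git resolves relative entries
# against the directory holding them (repo2/.git/objects).
git clone -q -s repo1 repo2
echo ../../../repo1/.git/objects >repo2/.git/objects/info/alternates
git -C repo2 cat-file -e HEAD && echo "repo2: HEAD found"

# Copy the repository, alternates file and all, two directories deeper.
mkdir -p sub/dir
cp -R repo2 sub/dir/repo3
if git -C sub/dir/repo3 cat-file -e HEAD 2>/dev/null; then
  echo "repo3: HEAD found"
else
  echo "repo3: HEAD missing (the relative alternate no longer resolves)"
fi
```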
* Re: 'git clone' doesn't use alternates automatically?
From: Junio C Hamano @ 2009-02-03 4:30 UTC
  To: Jeff King; +Cc: James Pickens, Git ML

Jeff King <peff@peff.net> writes:

> The hardlink code operates by default because it was thought to be a
> safe optimization that couldn't bite people. But it interacts badly with
> the concept of alternates.

Yes, you are right.

To be fair, I think it was proposed/implemented by somebody who almost
never uses alternates himself, and certainly never relative alternates.
The intention of hardlinking was to save tons of disk space while still
being independent of the original repository.

Back when e95ab1e ([PATCH] Short-circuit git-clone-pack while cloning
locally (take 2)., 2005-07-06) was done, the packfile implementation was
still only a week old, and hardlinking made a lot of sense from a
space-saving point of view. These days, if you make a local hardlinked
clone, work a little there and then repack it, most of the space saving
will be gone; there isn't much point in the hardlink optimization
anymore from that angle, even though it is still a good compromise
between clone speed and safety, especially when no alternates are
involved.

I think a possible fix would be not to copy the alternates file
literally, but to install an alternates file so that we ourselves
directly borrow from the same repositories the clone-source repository
borrows from, taking relative paths into account. Another would be to
look at the alternates and hardlink the objects and packs while cloning;
if the repositories involved reside across filesystem boundaries, we
need to fall back to copying.

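Junio's first suggestion, rewriting the alternates rather than copying them literally, amounts to resolving each entry against the source's object directory before installing it. A minimal sketch, assuming a POSIX shell and a current Git; `absolutize_alternates` is a hypothetical helper written for this illustration, not a git command, and it only handles plain absolute or relative path entries.

```shell
#!/bin/sh
# Sketch: install rewritten (absolute) alternates instead of a literal copy.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# absolutize_alternates OBJECTS_DIR  (hypothetical helper)
# Prints each entry of OBJECTS_DIR/info/alternates as an absolute path,
# resolving relative entries against OBJECTS_DIR. Blank lines are
# skipped; quoted or other special entries are not handled here.
absolutize_alternates() {
  base=$(cd "$1" && pwd -P)
  while IFS= read -r entry; do
    case $entry in
      '') continue ;;
      /*) dir=$entry ;;          # already absolute
      *) dir=$base/$entry ;;     # relative to the source objects dir
    esac
    (cd "$dir" && pwd -P)        # normalizes "..", fails if missing
  done <"$1/info/alternates"
}

work=$(mktemp -d)
cd "$work"
git init -q repo1
echo content >repo1/file
git -C repo1 add file
git -C repo1 commit -q -m one
git clone -q -s repo1 repo2
echo ../../../repo1/.git/objects >repo2/.git/objects/info/alternates

# Copy repo2 two levels deeper, then install the rewritten alternates.
mkdir -p sub/dir
cp -R repo2 sub/dir/repo3
absolutize_alternates repo2/.git/objects >sub/dir/repo3/.git/objects/info/alternates
cat sub/dir/repo3/.git/objects/info/alternates
git -C sub/dir/repo3 cat-file -e HEAD && echo "repo3: HEAD found"
```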
* Re: 'git clone' doesn't use alternates automatically?
From: Jeff King @ 2009-02-03 6:06 UTC
  To: Junio C Hamano; +Cc: James Pickens, Git ML

On Mon, Feb 02, 2009 at 08:30:36PM -0800, Junio C Hamano wrote:

> To be fair, I think it was proposed/implemented by somebody who almost
> never uses alternates himself, and certainly never relative alternates.
> The intention of hardlinking was to save tons of disk space while still
> being independent of the original repository.
>
> Back when e95ab1e ([PATCH] Short-circuit git-clone-pack while cloning
> locally (take 2)., 2005-07-06) was done, the packfile implementation was

Well, in your defense, relative alternates didn't come about until two
months later, in ccfd3e9. So you can blame the author of that patch for
screwing up your existing work. :)

> still only a week old, and hardlinking made a lot of sense from a
> space-saving point of view. These days, if you make a local hardlinked
> clone, work a little there and then repack it, most of the space saving
> will be gone; there isn't much point in the hardlink optimization
> anymore from that angle, even though it is still a good compromise
> between clone speed and safety, especially when no alternates are
> involved.

True. But I still think the hardlinks are nice for one-off repositories.
Every once in a while I want to start a new topic or experiment while my
repository is a mess; it's nice to just "git clone git foo", hack around
in the work directory, and blow it away. And the hardlinks make that
first step a _lot_ faster.

But I also don't mind having to add a command-line option to get the
speed. And for my use case, there really isn't a benefit to hardlinks
over alternates.

> I think a possible fix would be not to copy the alternates file
> literally, but to install an alternates file so that we ourselves
> directly borrow from the same repositories the clone-source repository
> borrows from, taking relative paths into account. Another would be to
> look at the alternates and hardlink the objects and packs while cloning;
> if the repositories involved reside across filesystem boundaries, we
> need to fall back to copying.

Yes, either of those would work. But I wonder if it is really worth the
complexity.

When I suggested just ditching hardlinks if the remote uses alternates,
my thought was that most people won't really care. Either they use
alternates, in which case they should be providing "-s" and not doing
hardlinks, or they don't, in which case things will happen as usual.

But reading your response, I wonder if it is worth keeping the hardlink
optimization around at all; getting rid of it would simplify the code
and the explanation of why "git clone foo" is different from
"git clone file://$PWD/foo". If people want a fast, dependent clone,
they can use "-s".

I guess hardlinks are also useful for a fast
"git clone foo bar; rm -rf foo". But I'm not sure how common that is.

-Peff

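The difference between "git clone foo" and "git clone file://$PWD/foo" shows up clearly when the source itself uses alternates. A minimal sketch, assuming a POSIX shell and a current Git; how the path-based clone records its alternates may vary between Git versions, so the script only prints what it finds, and all repository names are hypothetical.

```shell
#!/bin/sh
# Sketch: path clone vs. file:// clone of a repository that borrows
# its objects from another one.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q repo1
echo content >repo1/file
git -C repo1 add file
git -C repo1 commit -q -m one
git clone -q -s repo1 repo2                # repo2 depends on repo1

git clone -q repo2 by-path                 # local-path optimization
git clone -q "file://$work/repo2" by-url   # regular transport: full copy

for r in by-path by-url; do
  if [ -f "$r/.git/objects/info/alternates" ]; then
    echo "$r borrows from: $(cat "$r/.git/objects/info/alternates")"
  else
    echo "$r has no alternates"
  fi
done

# Remove the repository everything was originally borrowed from.
rm -rf repo1
git -C by-url cat-file -e HEAD && echo "by-url: still complete"
```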
* Re: 'git clone' doesn't use alternates automatically?
From: Junio C Hamano @ 2009-02-01 0:55 UTC
  To: James Pickens; +Cc: Git ML, Jeff King

James Pickens <jepicken@gmail.com> writes:

> So, is there any reason 'git clone' shouldn't automatically use
> the alternates that it copied into the new repository?

When you say "git clone" without -s, you are saying "I do not want to
use the repository I am cloning from as my alternate, because I do not
know if it will stay stable. I do not trust it."

This would be a very sensible way to clone if you were cloning my
repository, whose 'pu' and its constituent topic branches are subject
to rewinding at any time. After I rebase some of the branches and
rebuild 'pu', and prune the unnecessary objects from my repository, the
objects you may have been borrowing from me will be gone from my
repository. Of course, I can remove my repository altogether at any
time, and when that happens, your repository will have many missing
objects.

That is why "-s" is not the default. Only when you positively know that
the other repository will not drop branches or rewind them, perhaps
because you control that repository yourself, is it safe to use it as
your alternate, and you use commands like "git clone -s" and/or
"git clone --reference" to do so.

  Side note. People on k.org are encouraged to use Linus's
  repository as an alternate to save space on the k.org machine,
  because it is known that Linus's repository will never rewind
  its branches.

Now, if you are cloning from a local filesystem, by default we will copy
the objects/info/alternates from the source repository to the new one.
It may be debatable whether this is a sensible thing to do.

On one hand, because by not giving "-s" you are saying you don't trust
the objects in the source repository to stay stable, it might be
sensible not to trust its choice of alternates either. But in such a
case, you can always use a file:// URL when cloning to get a full
freestanding copy.

I suspect you are trying to improve the other extreme end: trusting all
the other repositories involved in the cloning process a lot more than
the code currently does. I do not think it is a bad thing to do per se.

I haven't looked at the codepaths involved recently, but if I recall
correctly, optimizing cloning from a repository that uses alternates
itself was never a part of the initial design considerations. I suspect
there may be ample room for you to optimize things.

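Since a "-s" clone is only as safe as the repository it borrows from, the git-clone documentation's note on `--shared` suggests `git repack -a` to break that dependency afterwards. A minimal sketch, assuming a POSIX shell and a current Git; the names `upstream` and `borrower` are hypothetical, and deleting `upstream` stands in for it pruning or rewinding objects.

```shell
#!/bin/sh
# Sketch: turn a "git clone -s" clone into a freestanding repository.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q upstream
echo content >upstream/file
git -C upstream add file
git -C upstream commit -q -m one
git clone -q -s upstream borrower        # fast, but depends on upstream

# Pack everything reachable, including objects only found through the
# alternate, then drop the alternates entry.
git -C borrower repack -q -a -d
rm borrower/.git/objects/info/alternates

# Simulate upstream losing its objects; borrower no longer needs them.
rm -rf upstream
git -C borrower fsck --no-progress && echo "borrower is self-contained"
```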
* Re: 'git clone' doesn't use alternates automatically?
From: James Pickens @ 2009-02-01 1:32 UTC
  To: Junio C Hamano; +Cc: Git ML, Jeff King

On Sat, Jan 31, 2009, Junio C Hamano <gitster@pobox.com> wrote:
> When you say "git clone" without -s, you are saying "I do not want to
> use the repository I am cloning from as my alternate, because I do not
> know if it will stay stable. I do not trust it."

Yes, I'm aware of the caveats of -s. I was talking about what happens
when I *don't* use -s.

> Now, if you are cloning from a local filesystem, by default we will copy
> the objects/info/alternates from the source repository to the new one.
> It

Crap, I didn't realize the alternates were only copied when you clone
from the local filesystem. I wanted to use this when cloning over ssh
from site A to site B, to automatically add a mirror at site B as an
alternate. Sounds like I have no choice but to use --reference for that.

> I suspect you are trying to improve the other extreme end: trusting all
> the other repositories involved in the cloning process a lot more than
> the code currently does.

What I was suggesting did not involve trusting anything any more than
the current code does. It just meant taking immediate advantage of the
trust that was already there.

Thanks for your input,
James

* Re: 'git clone' doesn't use alternates automatically?
From: Junio C Hamano @ 2009-02-01 1:38 UTC
  To: James Pickens; +Cc: Git ML, Jeff King

James Pickens <jepicken@gmail.com> writes:

> ... Sounds like I have no choice
> but to use --reference for that.

As --reference was invented exactly for that use case, I think using it
to instruct where to borrow your objects from is a very sensible thing
to do.

Thread overview: 13+ messages
2009-01-30 22:12 'git clone' doesn't use alternates automatically? James Pickens
2009-01-31  7:12 ` Jeff King
2009-01-31 20:08   ` James Pickens
2009-01-31 21:08     ` Jakub Narebski
2009-01-31 21:43       ` James Pickens
2009-01-31 21:55     ` Jeff King
2009-02-01  1:19       ` Junio C Hamano
2009-02-02 13:07         ` Jeff King
2009-02-03  4:30           ` Junio C Hamano
2009-02-03  6:06             ` Jeff King
2009-02-01  0:55     ` Junio C Hamano
2009-02-01  1:32       ` James Pickens
2009-02-01  1:38         ` Junio C Hamano