* 'git replace' and pushing
@ 2010-11-24  4:33 Cory Fields
  2010-11-25  8:37 ` Michael J Gruber
  0 siblings, 1 reply; 10+ messages in thread
From: Cory Fields @ 2010-11-24  4:33 UTC
  To: git

I am having some trouble understanding how a replaced object (commit)
should behave when pushed to a remote repo. Here's my scenario:

We are moving from svn to git. Our svn repo is huge, and most of the
history is useless. To save space, I would like to do a 50/50 split so
that when the repo is cloned, 50% is seen by default, and the
historical 50% can be seen by fetching the replacement history. I've
done this by creating a phony snapshot at 3 then using a 'replace' to
put the others on top. The history is purely linear.

1---2---3---4---5
\---4---5

When the replacement is in place, the repo is half size (commit-wise)
as expected. The problem is that 'git push' does not honor the
replace. So when I push, all objects go with it, which defeats the
purpose. The only way that seems to work is doing a filter-branch and
replacing the other way.

Is this by design? I would really like a way to split the repo without
breaking hashes for the developers that have already begun using git
svn.

Thanks,
Cory

* Re: 'git replace' and pushing
  2010-11-24  4:33 'git replace' and pushing Cory Fields
@ 2010-11-25  8:37 ` Michael J Gruber
  2010-11-26 21:16   ` Cory Fields
  0 siblings, 1 reply; 10+ messages in thread
From: Michael J Gruber @ 2010-11-25  8:37 UTC
  To: Cory Fields; +Cc: git

Cory Fields venit, vidit, dixit 24.11.2010 05:33:
> I am having some trouble understanding how a replaced object (commit)
> should behave when pushed to a remote repo. Here's my scenario:
>
> We are moving from svn to git. Our svn repo is huge, and most of the
> history is useless. To save space, I would like to do a 50/50 split so
> that when the repo is cloned, 50% is seen by default, and the
> historical 50% can be seen by fetching the replacement history. I've
> done this by creating a phony snapshot at 3 then using a 'replace' to
> put the others on top. The history is purely linear.
>
> 1---2---3---4---5
> \---4---5

I assume the "other" 4 goes off 3 (you're not using a monospaced font,
are you?).

Also, the other 4 should have no parent, otherwise you've not cut off
any history.

>
> When the replacement is in place, the repo is half size (commit-wise)
> as expected. The problem is that 'git push' does not honor the
> replace. So when I push, all objects go with it, which defeats the
> purpose. The only way that seems to work is doing a filter-branch and
> replacing the other way.
>
> Is this by design? I would really like a way to split the repo without
> breaking hashes for the developers that have already begun using git
> svn.

It is by design since a replace creates a "fake history", and this
should not be created behind a user's back.

The 5 is not rewritten, and its ancestry contains the whole history. If
that is the commit your developers have already and that you want to
preserve then there's not much you can do.

You could try to push or pull your replacement refs first (refs/replace)
but I don't think this will change what objects the push of 5 will
transfer. Just have a try.

Michael

* Re: 'git replace' and pushing
  2010-11-25  8:37 ` Michael J Gruber
@ 2010-11-26 21:16   ` Cory Fields
  2010-11-26 21:43     ` Jonathan Nieder
  0 siblings, 1 reply; 10+ messages in thread
From: Cory Fields @ 2010-11-26 21:16 UTC
  To: Michael J Gruber; +Cc: git

On Thu, Nov 25, 2010 at 3:37 AM, Michael J Gruber
<git@drmicha.warpmail.net> wrote:
> Cory Fields venit, vidit, dixit 24.11.2010 05:33:
>> I am having some trouble understanding how a replaced object (commit)
>> should behave when pushed to a remote repo. Here's my scenario:
>>
>> We are moving from svn to git. Our svn repo is huge, and most of the
>> history is useless. To save space, I would like to do a 50/50 split so
>> that when the repo is cloned, 50% is seen by default, and the
>> historical 50% can be seen by fetching the replacement history. I've
>> done this by creating a phony snapshot at 3 then using a 'replace' to
>> put the others on top. The history is purely linear.
>>
>> 1---2---3---4---5
>> \---4---5
>
> I assume the "other" 4 goes off 3 (you're not using a monospaced font,
> are you?).
>

I used a monospace font, but gmail decided not to use it. Sorry for that.

> Also, the other 4 should have no parent, otherwise you've not cut off
> any history.

I created a "fake" 4 that consists of the full working tree at 4 with
no parent. As I mentioned, everything looks fine locally.

>
>>
>> When the replacement is in place, the repo is half size (commit-wise)
>> as expected. The problem is that 'git push' does not honor the
>> replace. So when I push, all objects go with it, which defeats the
>> purpose. The only way that seems to work is doing a filter-branch and
>> replacing the other way.
>>
>> Is this by design? I would really like a way to split the repo without
>> breaking hashes for the developers that have already begun using git
>> svn.
>
> It is by design since a replace creates a "fake history", and this
> should not be created behind a user's back.
>
> The 5 is not rewritten, and its ancestry contains the whole history. If
> that is the commit your developers have already and that you want to
> preserve then there's not much you can do.
>
> You could try to push or pull your replacement refs first (refs/replace)
> but I don't think this will change what objects the push of 5 will
> transfer. Just have a try.

I tried this to no avail.

I realize that allowing replacements to be pushed "behind users' backs", so
I guess not respecting it makes sense.

But is there no way that I can pull this off without rewriting hashes?

Thanks,
Cory

* Re: 'git replace' and pushing
  2010-11-26 21:16   ` Cory Fields
@ 2010-11-26 21:43     ` Jonathan Nieder
  2010-11-26 23:18       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Nieder @ 2010-11-26 21:43 UTC
  To: Cory Fields; +Cc: Michael J Gruber, git

Hi Cory,

Cory Fields wrote:

> I realize that allowing replacements to be pushed "behind users' backs", so
> I guess not respecting it makes sense.
>
> But is there no way that I can pull this off without rewriting hashes?

The usual way to accomplish what you are talking about would be like
this:

 Real history
 ------------
 4' --- 5 --- 6

 1 --- 2 --- 3 --- 4

 Fake history
 ------------
 1 --- 2 --- 3 --- 4 --- 5 --- 6

 Replacement ref
 ---------------
 4' --> 4

This way, a person can fetch either piece of real history
without trouble, and if they fetch the replacement ref, too, the
history is pasted together.

It is not possible in git to push a commit without its ancestors;
replacement refs do not change that. However, it is sort of possible
to fetch a commit without its ancestors using the --depth option to
clone and fetch.

Hope that helps,
Jonathan

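[Editor's sketch] The difference between the real and the fake history in Jonathan's diagrams can be illustrated with a toy model. This is not how git is implemented: commits are plain strings, and the names `REAL_PARENTS`, `REPLACE`, and `ancestors` are hypothetical, chosen only for this example. The assumption is that object transport walks the real parent links, while local viewing first maps each commit through the replacement ref.

```python
# Toy model only: commits are strings, not real git objects.
# REAL_PARENTS mirrors the "Real history" diagram above.
REAL_PARENTS = {
    "1": [],
    "2": ["1"],
    "3": ["2"],
    "4": ["3"],
    "4'": [],       # root commit of the recent history
    "5": ["4'"],
    "6": ["5"],
}

# Mirrors the "Replacement ref" diagram: 4' --> 4.
REPLACE = {"4'": "4"}


def ancestors(tip, parents, replace=None):
    """Return every commit reachable from tip, including tip itself.

    If replace is given, each commit is mapped through it before its
    parents are followed, loosely imitating a view with replacement
    refs in effect.
    """
    replace = replace or {}
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        commit = replace.get(commit, commit)
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# What a push or fetch of 6 has to carry: real ancestry only.
transferred = ancestors("6", REAL_PARENTS)

# What a user sees once the replacement ref is present locally.
viewed = ancestors("6", REAL_PARENTS, REPLACE)

print(sorted(transferred))
print(sorted(viewed))
```

In this model the transferred set stops at 4', while the viewed set runs all the way back to 1, which matches the description above: the short history is what travels, and the replacement only changes what is displayed.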
* Re: 'git replace' and pushing
  2010-11-26 21:43     ` Jonathan Nieder
@ 2010-11-26 23:18       ` Junio C Hamano
  2010-11-27  1:58         ` Cory Fields
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2010-11-26 23:18 UTC
  To: Jonathan Nieder; +Cc: Cory Fields, Michael J Gruber, git

Jonathan Nieder <jrnieder@gmail.com> writes:

>  Real history
>  ------------
>  4' --- 5 --- 6
>
>  1 --- 2 --- 3 --- 4
>
>  Fake history
>  ------------
>  1 --- 2 --- 3 --- 4 --- 5 --- 6
>
>  Replacement ref
>  ---------------
>  4' --> 4
>
> This way, a person can fetch either piece of real history
> without trouble, and if they fetch the replacement ref, too, the
> history is pasted together.
>
> It is not possible in git to push a commit without its ancestors;
> replacement refs do not change that.

True, but I suspect the above picture pretty much satisfies Cory's initial
wish, no? You can fetch recent 4'--5---6 history as if 4' were the root
commit, and if you fetched replacement that tells us to pretend that 4'
has 3 as its parent (and the history leading to 3), you will get a deeper
history.

* Re: 'git replace' and pushing
  2010-11-26 23:18       ` Junio C Hamano
@ 2010-11-27  1:58         ` Cory Fields
  2010-11-26 20:29           ` Martin von Zweigbergk
                             ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Cory Fields @ 2010-11-27  1:58 UTC
  To: Junio C Hamano; +Cc: Jonathan Nieder, Michael J Gruber, git

On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>>  Real history
>>  ------------
>>  4' --- 5 --- 6
>>
>>  1 --- 2 --- 3 --- 4
>>
>>  Fake history
>>  ------------
>>  1 --- 2 --- 3 --- 4 --- 5 --- 6
>>
>>  Replacement ref
>>  ---------------
>>  4' --> 4
>>
>> This way, a person can fetch either piece of real history
>> without trouble, and if they fetch the replacement ref, too, the
>> history is pasted together.
>>
>> It is not possible in git to push a commit without its ancestors;
>> replacement refs do not change that.
>
> True, but I suspect the above picture pretty much satisfies Cory's initial
> wish, no? You can fetch recent 4'--5---6 history as if 4' were the root
> commit, and if you fetched replacement that tells us to pretend that 4'
> has 3 as its parent (and the history leading to 3), you will get a deeper
> history.
>

Yes, both of these can be accomplished. I've managed to get that part
working, where a default clone pulls in half history, and fetching
refs/replace gives you the rest. The only problem is that it requires a
filter-branch before pushing. Otherwise, 4 gets pushed rather than 4',
meaning that clones will require all the objects. So it works, but I'll
have to spend quite a while making it 'perfect' so that I only have to
rewrite history once.

A shallow clone does not fit for us, because we want the default clone
to only pull half. Having a public 1gb repository that will be cloned
quite often is bound to make our host unhappy, so we're doing everything
we can to get the size down.

Also, maybe I haven't made this clear... the "real" commit IDs need to
match the "fake" ones in order to prevent confusion. I think that's the
part that makes this so difficult. Otherwise, something like this [1]
would work just fine (probably exactly what Junio was suggesting).

Any other suggestions? Or do I just have to face the fact that I'm
going to have to break hashes?

[1] http://progit.org/2010/03/17/replace.html

Cory

* Re: 'git replace' and pushing
  2010-11-27  1:58         ` Cory Fields
@ 2010-11-26 20:29           ` Martin von Zweigbergk
  2010-11-27  1:59           ` Cory Fields
  2010-11-27  7:52           ` Jonathan Nieder
  2 siblings, 0 replies; 10+ messages in thread
From: Martin von Zweigbergk @ 2010-11-26 20:29 UTC
  To: Cory Fields; +Cc: Junio C Hamano, Jonathan Nieder, Michael J Gruber, git

On Fri, 26 Nov 2010, Cory Fields wrote:

> A shallow clone does not fit for us, because we want the default clone
> to only pull half.
> Having a public 1gb repository that will be cloned quite often is
> bound to make our host
> unhappy, so we're doing everything we can to get the size down.

At the GitTogether last month, my colleague brought up the subject of
how to cope with repositories growing over time. The conclusion from
the discussions was that shallow clones would probably be the best
option in general. FYI, even though it may not help you right now,
having a default shallow clone depth configured in the repository on
the server was also discussed.

/Martin

* Re: 'git replace' and pushing
  2010-11-27  1:58         ` Cory Fields
  2010-11-26 20:29           ` Martin von Zweigbergk
@ 2010-11-27  1:59           ` Cory Fields
  2010-11-27  7:52           ` Jonathan Nieder
  2 siblings, 0 replies; 10+ messages in thread
From: Cory Fields @ 2010-11-27  1:59 UTC
  To: Junio C Hamano; +Cc: Jonathan Nieder, Michael J Gruber, git

On Fri, Nov 26, 2010 at 8:58 PM, Cory Fields
<FOSS@atlastechnologiesinc.com> wrote:
> On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jonathan Nieder <jrnieder@gmail.com> writes:
>>
>>>  Real history
>>>  ------------
>>>  4' --- 5 --- 6
>>>
>>>  1 --- 2 --- 3 --- 4
>>>
>>>  Fake history
>>>  ------------
>>>  1 --- 2 --- 3 --- 4 --- 5 --- 6
>>>
>>>  Replacement ref
>>>  ---------------
>>>  4' --> 4
>>>
>>> This way, a person can fetch either piece of real history
>>> without trouble, and if they fetch the replacement ref, too, the
>>> history is pasted together.
>>>
>>> It is not possible in git to push a commit without its ancestors;
>>> replacement refs do not change that.
>>
>> True, but I suspect the above picture pretty much satisfies Cory's initial
>> wish, no? You can fetch recent 4'--5---6 history as if 4' were the root
>> commit, and if you fetched replacement that tells us to pretend that 4'
>> has 3 as its parent (and the history leading to 3), you will get a deeper
>> history.
>>
>
> Yes, both of these can be accomplished. I've managed to get that part
> working, where a
> default clone pulls in half history, and fetching refs/replace gives
> you the rest. The only
> problem is that it requires a filter-branch before pushing. Otherwise,
> 4 gets pushed rather
> than 4', meaning that clones will require all the objects. So it
> works, but I'll have to spend
> quite a while making it 'perfect' so that I only have to rewrite history once.
>
> A shallow clone does not fit for us, because we want the default clone
> to only pull half.
> Having a public 1gb repository that will be cloned quite often is
> bound to make our host
> unhappy, so we're doing everything we can to get the size down.
>
> Also, maybe I haven't made this clear... the "real" commit IDs need to
> match the "fake"
> ones in order to prevent confusion. I think that's the part that makes
> this so difficult.
> Otherwise, something like this [1] would work just fine (probably
> exactly what Junio was
> suggesting)
>
> Any other suggestions? Or do I just have to face the fact that I'm
> going to have to break
> hashes?
>
> [1] http://progit.org/2010/03/17/replace.html
>
> Cory
>

Sorry for the stupid wrapping.. gmail and I are not getting along in
this thread!

Cory

* Re: 'git replace' and pushing
  2010-11-27  1:58         ` Cory Fields
  2010-11-26 20:29           ` Martin von Zweigbergk
  2010-11-27  1:59           ` Cory Fields
@ 2010-11-27  7:52           ` Jonathan Nieder
  2010-11-27 17:54             ` Cory Fields
  2 siblings, 1 reply; 10+ messages in thread
From: Jonathan Nieder @ 2010-11-27  7:52 UTC
  To: Cory Fields; +Cc: Junio C Hamano, Michael J Gruber, git

Cory Fields wrote:
> On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:

>> True, but I suspect the above picture pretty much satisfies Cory's initial
>> wish, no? You can fetch recent 4'--5---6 history as if 4' were the root
>> commit, and if you fetched replacement that tells us to pretend that 4'
>> has 3 as its parent (and the history leading to 3), you will get a deeper
>> history.
>
> Yes, both of these can be accomplished. I've managed to get that part
> working, where a default clone pulls in half history, and fetching
> refs/replace gives you the rest. The only problem is that it requires a
> filter-branch before pushing.

That's a one-time thing, not per-push, right? A filter-branch would
indeed be needed to transform the history

 1 --- 2 --- 3 --- 4 --- 5' --- 6'

into

 1 --- 2 --- 3 --- 4
 4' --- 5 --- 6

and that is unavoidable: the object names encode the entire list of
ancestors, you cannot push an object without its ancestors, etc.
But afterwards you can build on the history rooted at 4' and all
should be well, and you can use checkout --orphan to get a new
root when the current line of history is about to grow too long.

In other words, the distinction between real history and fake history
is very relevant. Object transport only cares about the real history
(barring bugs); if you want to tweak what objects get transferred, you
really need to rewrite the real history (or use --depth).

> A shallow clone does not fit for us, because we want the default clone to
> only pull half. Having a public 1gb repository that will be cloned quite
> often is bound to make our host unhappy, so we're doing everything we can to
> get the size down.

Why not publish a "git bundle" of the first 1gb using HTTP,
BitTorrent, or some other cache-friendly protocol and use a hook to
reject attempts to fetch too many objects at once from the host?

> Also, maybe I haven't made this clear... the "real" commit IDs need to
> match the "fake" ones in order to prevent confusion.

Not sure what this means. But commit IDs are defined based on
content, and for simplicity and sanity the object transport machinery
deliberately does not look beyond that.

Regards,
Jonathan

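[Editor's sketch] Jonathan's remark that object names encode the entire list of ancestors can be made concrete with a few lines of Python. The sketch below hashes a commit the way git's SHA-1 object format does: the ID is the SHA-1 of the header `commit <size>\0` followed by the commit text, and that text names the parent IDs. The helper `commit_id`, the author identity, the timestamp, and the commit messages are all hypothetical values invented for this illustration; every commit points at the empty tree for brevity.

```python
import hashlib

# The well-known SHA-1 of git's empty tree object.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

# Hypothetical identity and timestamp, used for every commit below.
IDENT = "A U Thor <author@example.com> 1290844800 +0000"


def commit_id(tree, parents, message, ident=IDENT):
    """Compute the SHA-1 object ID of a commit with the given fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Full history: 3 <- 4 <- 5
c3 = commit_id(EMPTY_TREE, [], "three")
c4 = commit_id(EMPTY_TREE, [c3], "four")
c5 = commit_id(EMPTY_TREE, [c4], "five")

# Cut history: the same content for 4, but with no parent (4'),
# and the same content for 5 built on top of it.
c4_root = commit_id(EMPTY_TREE, [], "four")
c5_on_root = commit_id(EMPTY_TREE, [c4_root], "five")

print("4 :", c4)
print("4':", c4_root)
print("5 on 4 :", c5)
print("5 on 4':", c5_on_root)
```

Dropping the parent line changes the ID of 4, and because 5 records its parent's ID, the ID of 5 changes as well. That is why the existing IDs cannot survive once the real history is cut, and why the filter-branch step described above is unavoidable.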
* Re: 'git replace' and pushing
  2010-11-27  7:52           ` Jonathan Nieder
@ 2010-11-27 17:54             ` Cory Fields
  0 siblings, 0 replies; 10+ messages in thread
From: Cory Fields @ 2010-11-27 17:54 UTC
  To: Jonathan Nieder
  Cc: Junio C Hamano, Michael J Gruber, git, martin.von.zweigbergk

On Sat, Nov 27, 2010 at 2:52 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Cory Fields wrote:
>> On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
>>> True, but I suspect the above picture pretty much satisfies Cory's initial
>>> wish, no? You can fetch recent 4'--5---6 history as if 4' were the root
>>> commit, and if you fetched replacement that tells us to pretend that 4'
>>> has 3 as its parent (and the history leading to 3), you will get a deeper
>>> history.
>>
>> Yes, both of these can be accomplished. I've managed to get that part
>> working, where a default clone pulls in half history, and fetching
>> refs/replace gives you the rest. The only problem is that it requires a
>> filter-branch before pushing.
>
> That's a one-time thing, not per-push, right? A filter-branch would
> indeed be needed to transform the history
>
>  1 --- 2 --- 3 --- 4 --- 5' --- 6'
>
> into
>
>  1 --- 2 --- 3 --- 4
>  4' --- 5 --- 6
>
> and that is unavoidable: the object names encode the entire list of
> ancestors, you cannot push an object without its ancestors, etc.
> But afterwards you can build on the history rooted at 4' and all
> should be well, and you can use checkout --orphan to get a new
> root when the current line of history is about to grow too long.
>
> In other words, the distinction between real history and fake history
> is very relevant. Object transport only cares about the real history
> (barring bugs); if you want to tweak what objects get transferred, you
> really need to rewrite the real history (or use --depth).
>
>> A shallow clone does not fit for us, because we want the default clone to
>> only pull half. Having a public 1gb repository that will be cloned quite
>> often is bound to make our host unhappy, so we're doing everything we can to
>> get the size down.
>
> Why not publish a "git bundle" of the first 1gb using HTTP,
> BitTorrent, or some other cache-friendly protocol and use a hook to
> reject attempts to fetch too many objects at once from the host?
>
>> Also, maybe I haven't made this clear... the "real" commit IDs need to
>> match the "fake" ones in order to prevent confusion.
>
> Not sure what this means. But commit IDs are defined based on
> content, and for simplicity and sanity the object transport machinery
> deliberately does not look beyond that.
>
> Regards,
> Jonathan
>

I think a one-time filter-branch is going to be our best bet. I had
assumed that this was the case, I just wanted reassurance that it was
necessary. I have that now. Thanks to all for the responses.

Martin: That sounds very interesting indeed. However, the docs make
shallow clones sound scary. From the docs:

"A shallow repository has a number of limitations (you cannot clone or
fetch from it, nor push from nor into it)"

I suppose these limitations would need to be addressed if/when looking
into server-side depth defaults?

Cory
