* 'git replace' and pushing
@ 2010-11-24  4:33 Cory Fields
  2010-11-25  8:37 ` Michael J Gruber
  0 siblings, 1 reply; 10+ messages in thread
From: Cory Fields @ 2010-11-24  4:33 UTC (permalink / raw)
  To: git

I am having some trouble understanding how a replaced object (commit)
should behave when pushed to a remote repo. Here's my scenario:

We are moving from svn to git. Our svn repo is huge, and most of the
history is useless. To save space, I would like to do a 50/50 split so
that when the repo is cloned, 50% is seen by default, and the
historical 50% can be seen by fetching the replacement history. I've
done this by creating a phony snapshot at 3 then using a 'replace' to
put the others on top. The history is purely linear.

1---2---3---4---5
                 \---4---5

When the replacement is in place, the repo is half size (commit-wise)
as expected. The problem is that 'git push' does not honor the
replace. So when I push, all objects go with it, which defeats the
purpose. The only way that seems to work is doing a filter-branch and
replacing the other way.

Is this by design? I would really like a way to split the repo without
breaking hashes for the developers that have already begun using git
svn.

Thanks,
Cory


* Re: 'git replace' and pushing
  2010-11-24  4:33 'git replace' and pushing Cory Fields
@ 2010-11-25  8:37 ` Michael J Gruber
  2010-11-26 21:16   ` Cory Fields
  0 siblings, 1 reply; 10+ messages in thread
From: Michael J Gruber @ 2010-11-25  8:37 UTC (permalink / raw)
  To: Cory Fields; +Cc: git

Cory Fields venit, vidit, dixit 24.11.2010 05:33:
> I am having some trouble understanding how a replaced object (commit)
> should behave when pushed to a remote repo. Here's my scenario:
> 
> We are moving from svn to git. Our svn repo is huge, and most of the
> history is useless. To save space, I would like to do a 50/50 split so
> that when the repo is cloned, 50% is seen by default, and the
> historical 50% can be seen by fetching the replacement history. I've
> done this by creating a phony snapshot at 3 then using a 'replace' to
> put the others on top. The history is purely linear.
> 
> 1---2---3---4---5
>                  \---4---5

I assume the "other" 4 goes off 3 (you're not using a monospaced font,
are you?).

Also, the other 4 should have no parent, otherwise you've not cut-off
any history.

> 
> When the replacement is in place, the repo is half size (commit-wise)
> as expected. The problem is that 'git push' does not honor the
> replace. So when I push, all objects go with it, which defeats the
> purpose. The only way that seems to work is doing a filter-branch and
> replacing the other way.
> 
> Is this by design? I would really like a way to split the repo without
> breaking hashes for the developers that have already begun using git
> svn.

It is by design since a replace creates a "fake history", and this
should not be created behind a user's back.

The 5 is not rewritten, and its ancestry contains the whole history. If
that is the commit your developers have already and that you want to
preserve then there's not much you can do.

You could try to push or pull your replacement refs first (refs/replace)
but I don't think this will change what objects the push of 5 will
transfer. Just have a try.
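
A minimal sketch of that behaviour, assuming a throwaway repository
under a mktemp directory; the branch name "main", the file name and the
commit messages are made up for illustration. It builds the linear
history 1---2---3---4---5, replaces 4 with a parentless stand-in, and
then compares what the local repository shows with what a bare remote
holds after a push:

```shell
#!/bin/sh
set -e
# Fixed identity so the sketch does not depend on the user's git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/local"
git init -q --bare "$work/remote.git"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main

# Linear history 1---2---3---4---5
for n in 1 2 3 4 5; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done

# Parentless stand-in for 4 that reuses 4's tree.
four=$(git rev-parse HEAD~1)
tree=$(git rev-parse "$four^{tree}")
fake=$(git commit-tree -m "commit 4 (snapshot)" "$tree")
git replace "$four" "$fake"

# Locally the replacement is honored, so history stops at the stand-in.
local_count=$(git rev-list --count HEAD)

# The push transfers the real objects; the remote has no replace ref.
git push -q "$work/remote.git" HEAD:refs/heads/main
remote_count=$(git --git-dir="$work/remote.git" rev-list --count refs/heads/main)

echo "local view: $local_count commits, remote holds: $remote_count commits"
```

The local count covers only 5 and the stand-in, while the remote ends
up with every commit back to 1, which matches what Cory observed.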

Michael


* Re: 'git replace' and pushing
  2010-11-27  1:58         ` Cory Fields
@ 2010-11-26 20:29           ` Martin von Zweigbergk
  2010-11-27  1:59           ` Cory Fields
  2010-11-27  7:52           ` Jonathan Nieder
  2 siblings, 0 replies; 10+ messages in thread
From: Martin von Zweigbergk @ 2010-11-26 20:29 UTC (permalink / raw)
  To: Cory Fields; +Cc: Junio C Hamano, Jonathan Nieder, Michael J Gruber, git

On Fri, 26 Nov 2010, Cory Fields wrote:

> A shallow clone does not fit for us, because we want the default clone
> to only pull half.
> Having a public 1gb repository that will be cloned quite often is
> bound to make our host
> unhappy, so we're doing everything we can to get the size down.

At the GitTogether last month, my colleague brought up the subject of
how to cope with repositories growing over time. The conclusion from
the discussions was that shallow clones would probably be the best
option in general. FYI, even though it may not help you right now,
having a default shallow clone depth configured in the repository on
the server was also discussed.

/Martin


* Re: 'git replace' and pushing
  2010-11-25  8:37 ` Michael J Gruber
@ 2010-11-26 21:16   ` Cory Fields
  2010-11-26 21:43     ` Jonathan Nieder
  0 siblings, 1 reply; 10+ messages in thread
From: Cory Fields @ 2010-11-26 21:16 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git

On Thu, Nov 25, 2010 at 3:37 AM, Michael J Gruber
<git@drmicha.warpmail.net> wrote:
> Cory Fields venit, vidit, dixit 24.11.2010 05:33:
>> I am having some trouble understanding how a replaced object (commit)
>> should behave when pushed to a remote repo. Here's my scenario:
>>
>> We are moving from svn to git. Our svn repo is huge, and most of the
>> history is useless. To save space, I would like to do a 50/50 split so
>> that when the repo is cloned, 50% is seen by default, and the
>> historical 50% can be seen by fetching the replacement history. I've
>> done this by creating a phony snapshot at 3 then using a 'replace' to
>> put the others on top. The history is purely linear.
>>
>> 1---2---3---4---5
>>                  \---4---5
>
> I assume the "other" 4 goes off 3 (you're not using a monospaced font,
> are you?).
>

I used a monospace font, but gmail decided not to use it. Sorry for that.

> Also, the other 4 should have no parent, otherwise you've not cut-off
> any history.

I created a "fake" 4 that consists of the full working tree at 4 with no parent.
As I mentioned, everything looks fine locally.

>
>>
>> When the replacement is in place, the repo is half size (commit-wise)
>> as expected. The problem is that 'git push' does not honor the
>> replace. So when I push, all objects go with it, which defeats the
>> purpose. The only way that seems to work is doing a filter-branch and
>> replacing the other way.
>>
>> Is this by design? I would really like a way to split the repo without
>> breaking hashes for the developers that have already begun using git
>> svn.
>
> It is by design since a replace creates a "fake history", and this
> should not be created behind a user's back.
>
> The 5 is not rewritten, and its ancestry contains the whole history. If
> that is the commit your developers have already and that you want to
> preserve then there's not much you can do.
>
> You could try to push or pull your replacement refs first (refs/replace)
> but I don't think this will change what objects the push of 5 will
> transfer. Just have a try.

I tried this to no avail.

I realize that allowing replacements to be pushed "behind users' backs"
would be a problem, so I guess not respecting them makes sense.

But is there no way that I can pull this off without rewriting hashes?

Thanks,
Cory


* Re: 'git replace' and pushing
  2010-11-26 21:16   ` Cory Fields
@ 2010-11-26 21:43     ` Jonathan Nieder
  2010-11-26 23:18       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Nieder @ 2010-11-26 21:43 UTC (permalink / raw)
  To: Cory Fields; +Cc: Michael J Gruber, git

Hi Cory,

Cory Fields wrote:

> I realize that allowing replacements to be pushed "behind users' backs"
> would be a problem, so I guess not respecting them makes sense.
>
> But is there no way that I can pull this off without rewriting hashes?

The usual way to accomplish what you are talking about would be like
this:

 Real history
 ------------
 4' --- 5 --- 6

 1 --- 2 --- 3 --- 4

 Fake history
 ------------
 1 --- 2 --- 3 --- 4 --- 5 --- 6

 Replacement ref
 ---------------
 4' --> 4

This way, a person can fetch either piece of real history
without trouble, and if they fetch the replacement ref, too, the
history is pasted together.

It is not possible in git to push a commit without its ancestors;
replacement refs do not change that.  However, it is sort of possible
to fetch a commit without its ancestors using the --depth option to
clone and fetch.
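
As a rough illustration of the --depth route, assuming a scratch
repository with six made-up commits and a clone over a file:// URL
(a plain local path would bypass the transport and ignore --depth):

```shell
#!/bin/sh
set -e
# Fixed identity so the sketch does not depend on the user's git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/server"
cd "$work/server"
git symbolic-ref HEAD refs/heads/main

# Real history: 1 --- 2 --- 3 --- 4 --- 5 --- 6
for n in 1 2 3 4 5 6; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done
full_count=$(git rev-list --count HEAD)

# Ask for only the two most recent commits.
git clone -q --depth 2 "file://$work/server" "$work/shallow"
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

echo "server: $full_count commits, shallow clone: $shallow_count commits"
```

The clone records the cut-off point in .git/shallow, so the commit
names stay the same; only the amount of history transferred changes.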

Hope that helps,
Jonathan


* Re: 'git replace' and pushing
  2010-11-26 21:43     ` Jonathan Nieder
@ 2010-11-26 23:18       ` Junio C Hamano
  2010-11-27  1:58         ` Cory Fields
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2010-11-26 23:18 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Cory Fields, Michael J Gruber, git

Jonathan Nieder <jrnieder@gmail.com> writes:

>  Real history
>  ------------
>  4' --- 5 --- 6
>
>  1 --- 2 --- 3 --- 4
>
>  Fake history
>  ------------
>  1 --- 2 --- 3 --- 4 --- 5 --- 6
>
>  Replacement ref
>  ---------------
>  4' --> 4
>
> This way, a person can fetch either piece of real history
> without trouble, and if they fetch the replacement ref, too, the
> history is pasted together.
>
> It is not possible in git to push a commit without its ancestors;
> replacement refs do not change that.

True, but I suspect the above picture pretty much satisfies Cory's initial
wish, no?  You can fetch recent 4'--5---6 history as if 4' were the root
commit, and if you fetched replacement that tells us to pretend that 4'
has 3 as its parent (and the history leading to 3), you will get a deeper
history.
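
A sketch of that layout, under the assumption of a scratch "server"
repository cloned over file://; the branch names, file name and commit
messages are invented for the example. The old history is kept
reachable on the server only through the replacement ref:

```shell
#!/bin/sh
set -e
# Fixed identity so the sketch does not depend on the user's git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/server"
cd "$work/server"
git symbolic-ref HEAD refs/heads/old

# Old real history: 1 --- 2 --- 3 --- 4
for n in 1 2 3 4; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done
four=$(git rev-parse HEAD)

# 4': a root commit carrying the same tree as 4.
tree=$(git rev-parse "$four^{tree}")
four_prime=$(git commit-tree -m "commit 4'" "$tree")

# Recent real history: 4' --- 5 --- 6
git checkout -q -b main "$four_prime"
for n in 5 6; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done

# Replacement ref 4' --> 4, then drop the branch that pointed at 4.
git replace "$four_prime" "$four"
git branch -q -D old

# A default clone fetches branches only, so it sees 4' as the root.
git clone -q "file://$work/server" "$work/client"
cd "$work/client"
default_count=$(git rev-list --count HEAD)

# Fetching the replacement refs pastes the old history underneath.
git fetch -q origin 'refs/replace/*:refs/replace/*'
pasted_count=$(git rev-list --count HEAD)

echo "default clone: $default_count commits, with refs/replace: $pasted_count commits"
```

Commits 5 and 6 keep the same names in both views; only what git shows
below 4' differs.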


* Re: 'git replace' and pushing
  2010-11-26 23:18       ` Junio C Hamano
@ 2010-11-27  1:58         ` Cory Fields
  2010-11-26 20:29           ` Martin von Zweigbergk
                             ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Cory Fields @ 2010-11-27  1:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Michael J Gruber, git

On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>>  Real history
>>  ------------
>>  4' --- 5 --- 6
>>
>>  1 --- 2 --- 3 --- 4
>>
>>  Fake history
>>  ------------
>>  1 --- 2 --- 3 --- 4 --- 5 --- 6
>>
>>  Replacement ref
>>  ---------------
>>  4' --> 4
>>
>> This way, a person can fetch either piece of real history
>> without trouble, and if they fetch the replacement ref, too, the
>> history is pasted together.
>>
>> It is not possible in git to push a commit without its ancestors;
>> replacement refs do not change that.
>
> True, but I suspect the above picture pretty much satisfies Cory's initial
> wish, no?  You can fetch recent 4'--5---6 history as if 4' were the root
> commit, and if you fetched replacement that tells us to pretend that 4'
> has 3 as its parent (and the history leading to 3), you will get a deeper
> history.
>

Yes, both of these can be accomplished. I've managed to get that part
working, where a
default clone pulls in half history, and fetching refs/replace gives
you the rest. The only
problem is that it requires a filter-branch before pushing. Otherwise,
4 gets pushed rather
than 4', meaning that clones will require all the objects. So it
works, but I'll have to spend
quite a while making it 'perfect' so that I only have to rewrite history once.

A shallow clone does not fit for us, because we want the default clone
to only pull half.
Having a public 1gb repository that will be cloned quite often is
bound to make our host
unhappy, so we're doing everything we can to get the size down.

Also, maybe I haven't made this clear... the "real" commit IDs need to
match the "fake"
ones in order to prevent confusion. I think that's the part that makes
this so difficult.
Otherwise, something like this [1] would work just fine (probably
exactly what Junio was
suggesting)

Any other suggestions? Or do I just have to face the fact that I'm
going to have to break
hashes?

[1] http://progit.org/2010/03/17/replace.html

Cory


* Re: 'git replace' and pushing
  2010-11-27  1:58         ` Cory Fields
  2010-11-26 20:29           ` Martin von Zweigbergk
@ 2010-11-27  1:59           ` Cory Fields
  2010-11-27  7:52           ` Jonathan Nieder
  2 siblings, 0 replies; 10+ messages in thread
From: Cory Fields @ 2010-11-27  1:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Michael J Gruber, git

On Fri, Nov 26, 2010 at 8:58 PM, Cory Fields
<FOSS@atlastechnologiesinc.com> wrote:
> On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jonathan Nieder <jrnieder@gmail.com> writes:
>>
>>>  Real history
>>>  ------------
>>>  4' --- 5 --- 6
>>>
>>>  1 --- 2 --- 3 --- 4
>>>
>>>  Fake history
>>>  ------------
>>>  1 --- 2 --- 3 --- 4 --- 5 --- 6
>>>
>>>  Replacement ref
>>>  ---------------
>>>  4' --> 4
>>>
>>> This way, a person can fetch either piece of real history
>>> without trouble, and if they fetch the replacement ref, too, the
>>> history is pasted together.
>>>
>>> It is not possible in git to push a commit without its ancestors;
>>> replacement refs do not change that.
>>
>> True, but I suspect the above picture pretty much satisfies Cory's initial
>> wish, no?  You can fetch recent 4'--5---6 history as if 4' were the root
>> commit, and if you fetched replacement that tells us to pretend that 4'
>> has 3 as its parent (and the history leading to 3), you will get a deeper
>> history.
>>
>
> Yes, both of these can be accomplished. I've managed to get that part
> working, where a
> default clone pulls in half history, and fetching refs/replace gives
> you the rest. The only
> problem is that it requires a filter-branch before pushing. Otherwise,
> 4 gets pushed rather
> than 4', meaning that clones will require all the objects. So it
> works, but I'll have to spend
> quite a while making it 'perfect' so that I only have to rewrite history once.
>
> A shallow clone does not fit for us, because we want the default clone
> to only pull half.
> Having a public 1gb repository that will be cloned quite often is
> bound to make our host
> unhappy, so we're doing everything we can to get the size down.
>
> Also, maybe I haven't made this clear... the "real" commit IDs need to
> match the "fake"
> ones in order to prevent confusion. I think that's the part that makes
> this so difficult.
> Otherwise, something like this [1] would work just fine (probably
> exactly what Junio was
> suggesting)
>
> Any other suggestions? Or do I just have to face the fact that I'm
> going to have to break
> hashes?
>
> [1] http://progit.org/2010/03/17/replace.html
>
> Cory
>

Sorry for the stupid wrapping.. gmail and I are not getting along in
this thread!

Cory


* Re: 'git replace' and pushing
  2010-11-27  1:58         ` Cory Fields
  2010-11-26 20:29           ` Martin von Zweigbergk
  2010-11-27  1:59           ` Cory Fields
@ 2010-11-27  7:52           ` Jonathan Nieder
  2010-11-27 17:54             ` Cory Fields
  2 siblings, 1 reply; 10+ messages in thread
From: Jonathan Nieder @ 2010-11-27  7:52 UTC (permalink / raw)
  To: Cory Fields; +Cc: Junio C Hamano, Michael J Gruber, git

Cory Fields wrote:
> On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:

>> True, but I suspect the above picture pretty much satisfies Cory's initial
>> wish, no?  You can fetch recent 4'--5---6 history as if 4' were the root
>> commit, and if you fetched replacement that tells us to pretend that 4'
>> has 3 as its parent (and the history leading to 3), you will get a deeper
>> history.
>
> Yes, both of these can be accomplished. I've managed to get that part
> working, where a default clone pulls in half history, and fetching
> refs/replace gives you the rest. The only problem is that it requires a
> filter-branch before pushing.

That's a one-time thing, not per-push, right?  A filter-branch would
indeed be needed to transform the history

 1 --- 2 --- 3 --- 4 --- 5' --- 6'

into

 1 --- 2 --- 3 --- 4
 4' --- 5 --- 6

and that is unavoidable: the object names encode the entire list of
ancestors, you cannot push an object without its ancestors, etc.
But afterwards you can build on the history rooted at 4' and all
should be well, and you can use checkout --orphan to get a new
root when the current line of history is about to grow too long.

In other words, the distinction between real history and fake history
is very relevant.  Object transport only cares about the real history
(barring bugs); if you want to tweak what objects get transferred, you
really need to rewrite the real history (or use --depth).
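
To show why the rewrite changes names, here is a small sketch on a
scratch repository. Note that it swaps in "git rebase --onto" in place
of filter-branch, which is only reasonable because the example history
is purely linear; the branch names "main" and "trimmed" and the commit
messages are assumptions for the example.

```shell
#!/bin/sh
set -e
# Fixed identity so the sketch does not depend on the user's git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main

# Existing real history: 1 --- 2 --- 3 --- 4 --- 5 --- 6
for n in 1 2 3 4 5 6; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done
old_tip=$(git rev-parse main)
four=$(git rev-parse main~2)

# New root 4': the index and working tree come from 4, but the new
# branch starts without any parent.
git checkout -q --orphan trimmed "$four"
git commit -q -m "commit 4'"

# Replay 5 and 6 on top of 4'. Their content is unchanged, but their
# names change because the parent they record is different.
git rebase -q --onto trimmed "$four" main
new_tip=$(git rev-parse main)
new_count=$(git rev-list --count main)

echo "old tip $old_tip"
echo "new tip $new_tip ($new_count commits)"
```

The two tips point at identical trees, yet they are different objects,
which is the hash breakage discussed in this thread.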

> A shallow clone does not fit for us, because we want the default clone to
> only pull half.  Having a public 1gb repository that will be cloned quite
> often is bound to make our host unhappy, so we're doing everything we can to
> get the size down.

Why not publish a "git bundle" of the first 1gb using HTTP,
BitTorrent, or some other cache-friendly protocol and use a hook to
reject attempts to fetch too many objects at once from the host?
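
The bundle half of that suggestion could look roughly like this,
assuming a scratch repository standing in for the old history; the
file name history.bundle and the branch name "main" are placeholders,
and the server-side hook is not shown:

```shell
#!/bin/sh
set -e
# Fixed identity so the sketch does not depend on the user's git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/old"
cd "$work/old"
git symbolic-ref HEAD refs/heads/main

# Stand-in for the large old history: 1 --- 2 --- 3 --- 4
for n in 1 2 3 4; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done

# Pack the branch (and HEAD, so clones know what to check out) into a
# single file that any static file server can hand out.
git bundle create "$work/history.bundle" HEAD main
git bundle verify "$work/history.bundle"

# A consumer clones straight from the downloaded file.
git clone -q "$work/history.bundle" "$work/from-bundle"
bundle_count=$(git -C "$work/from-bundle" rev-list --count HEAD)

echo "clone from bundle has $bundle_count commits"
```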

> Also, maybe I haven't made this clear... the "real" commit IDs need to
> match the "fake" ones in order to prevent confusion.

Not sure what this means.  But commit IDs are defined based on
content, and for simplicity and sanity the object transport machinery
deliberately does not look beyond that.

Regards,
Jonathan


* Re: 'git replace' and pushing
  2010-11-27  7:52           ` Jonathan Nieder
@ 2010-11-27 17:54             ` Cory Fields
  0 siblings, 0 replies; 10+ messages in thread
From: Cory Fields @ 2010-11-27 17:54 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Junio C Hamano, Michael J Gruber, git, martin.von.zweigbergk

On Sat, Nov 27, 2010 at 2:52 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Cory Fields wrote:
>> On Fri, Nov 26, 2010 at 6:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
>>> True, but I suspect the above picture pretty much satisfies Cory's initial
>>> wish, no?  You can fetch recent 4'--5---6 history as if 4' were the root
>>> commit, and if you fetched replacement that tells us to pretend that 4'
>>> has 3 as its parent (and the history leading to 3), you will get a deeper
>>> history.
>>
>> Yes, both of these can be accomplished. I've managed to get that part
>> working, where a default clone pulls in half history, and fetching
>> refs/replace gives you the rest. The only problem is that it requires a
>> filter-branch before pushing.
>
> That's a one-time thing, not per-push, right?  A filter-branch would
> indeed be needed to transform the history
>
>  1 --- 2 --- 3 --- 4 --- 5' --- 6'
>
> into
>
>  1 --- 2 --- 3 --- 4
>  4' --- 5 --- 6
>
> and that is unavoidable: the object names encode the entire list of
> ancestors, you cannot push an object without its ancestors, etc.
> But afterwards you can build on the history rooted at 4' and all
> should be well, and you can use checkout --orphan to get a new
> root when the current line of history is about to grow too long.
>
> In other words, the distinction between real history and fake history
> is very relevant.  Object transport only cares about the real history
> (barring bugs); if you want to tweak what objects get transferred, you
> really need to rewrite the real history (or use --depth).
>
>> A shallow clone does not fit for us, because we want the default clone to
>> only pull half.  Having a public 1gb repository that will be cloned quite
>> often is bound to make our host unhappy, so we're doing everything we can to
>> get the size down.
>
> Why not publish a "git bundle" of the first 1gb using HTTP,
> BitTorrent, or some other cache-friendly protocol and use a hook to
> reject attempts to fetch too many objects at once from the host?
>
>> Also, maybe I haven't made this clear... the "real" commit IDs need to
>> match the "fake" ones in order to prevent confusion.
>
> Not sure what this means.  But commit IDs are defined based on
> content, and for simplicity and sanity the object transport machinery
> deliberately does not look beyond that.
>
> Regards,
> Jonathan
>

I think a one-time filter-branch is going to be our best bet. I had
assumed that this was the case, I just wanted reassurance that it was
necessary. I have that now. Thanks to all for the responses.

Martin: That sounds very interesting indeed. However, the docs make
shallow clones sound scary. From the docs: "A shallow repository has a
number of limitations (you cannot clone or fetch from it, nor push
from nor into it)"

I suppose these limitations would need to be addressed if/when looking
into server-side depth defaults?

Cory
