git.vger.kernel.org archive mirror
* Stalled git cloning and possible solutions
@ 2013-08-29 19:48 V.Krishn
  2013-08-29 21:10 ` Jonathan Nieder
  0 siblings, 1 reply; 8+ messages in thread
From: V.Krishn @ 2013-08-29 19:48 UTC (permalink / raw)
  To: git

Hi,

Quite often, cloning a large repo stalls; hitting Ctrl+C cleans up what has 
been downloaded, and the process needs to be restarted.

Is there a way to recover or continue from the already downloaded files during 
cloning?
Please point me to an archive URL if a solution exists. (Though I will continue 
to search the archives as I send this email.)

Can there be something like:
git clone <url> --use-method=rsync

-- 
Regards.
V.Krishn

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Stalled git cloning and possible solutions
  2013-08-29 19:48 Stalled git cloning and possible solutions V.Krishn
@ 2013-08-29 21:10 ` Jonathan Nieder
  2013-08-29 21:35   ` V.Krishn
  2013-08-30 12:17   ` Duy Nguyen
  0 siblings, 2 replies; 8+ messages in thread
From: Jonathan Nieder @ 2013-08-29 21:10 UTC (permalink / raw)
  To: V.Krishn; +Cc: git

V.Krishn wrote:

> Quite sometimes when cloning a large repo stalls, hitting Ctrl+c cleans what 
> been downloaded, and process needs re-start.
>
> Is there a way to recover or continue from already downloaded files during 
> cloning ?

No, sadly.  The pack sent for a clone is generated dynamically, so
there's no easy way to support the equivalent of an HTTP Range request
to resume.  Someone might implement an appropriate protocol extension
to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
but for now it doesn't exist.

What you *can* do today is create a bundle from the large repo
somewhere with a reliable connection and then grab that using a
resumable transport such as HTTP.  A kind person made a service to do
that.

  http://thread.gmane.org/gmane.comp.version-control.git/181380
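For reference, the bundle workflow could look roughly like this (the URLs and paths are illustrative, not the actual service):

```shell
# On a host with a reliable link to the server: mirror the repo once and
# pack everything into a single bundle file that can be served statically.
git clone --mirror https://example.com/big.git big.git
git -C big.git bundle create big.bundle --all

# On the flaky client: fetch the bundle with a resumable downloader,
# clone from the local file, then repoint origin at the real server.
wget -c https://mirror.example.com/big.bundle
git clone big.bundle big
git -C big remote set-url origin https://example.com/big.git
git -C big fetch origin
```

Because the bundle is a single static file, any resumable downloader (wget -c, curl -C -, rsync) can fetch it piecemeal.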

Hope that helps,
Jonathan


* Re: Stalled git cloning and possible solutions
  2013-08-29 21:10 ` Jonathan Nieder
@ 2013-08-29 21:35   ` V.Krishn
  2013-08-29 22:18     ` Junio C Hamano
  2013-08-30 12:17   ` Duy Nguyen
  1 sibling, 1 reply; 8+ messages in thread
From: V.Krishn @ 2013-08-29 21:35 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Friday, August 30, 2013 02:40:34 AM you wrote:
> V.Krishn wrote:
> > Quite sometimes when cloning a large repo stalls, hitting Ctrl+c cleans
> > what been downloaded, and process needs re-start.
> > 
> > Is there a way to recover or continue from already downloaded files
> > during cloning ?
> 
> No, sadly.  The pack sent for a clone is generated dynamically, so
> there's no easy way to support the equivalent of an HTTP Range request
> to resume.  Someone might implement an appropriate protocol extension
> to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
> but for now it doesn't exist.

This is what I tried, but then realized something more is needed:

During a stalled clone, avoid Ctrl+C.
1. Copy the contents, i.e. the .git folder, to some other place.
2. cd <new dir>
3. git config fetch.unpackLimit 999999
4. git config transfer.unpackLimit 999999
5. cat .git/config  # to check that the config was written correctly
 
6. Recover the objects:
 git unpack-objects -r --strict <.git/objects/pack/tmp_pack_0mSPsc

THEN... I hoped one of the following would do the trick:
 git pull
 OR
 git fetch-pack
 OR
 git repack + git pull

but something more is needed... :)
e.g. an index/map file, etc., for it to work.

> 
> What you *can* do today is create a bundle from the large repo
> somewhere with a reliable connection and then grab that using a
> resumable transport such as HTTP.  A kind person made a service to do
> that.
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/181380

The service looks nice. I hope it gets sponsors to keep it running.

-- 
Regards.
V.Krishn


* Re: Stalled git cloning and possible solutions
  2013-08-29 21:35   ` V.Krishn
@ 2013-08-29 22:18     ` Junio C Hamano
  2013-08-29 22:28       ` V.Krishn
  2013-09-04  1:06       ` V.Krishn
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2013-08-29 22:18 UTC (permalink / raw)
  To: vkrishn4; +Cc: Jonathan Nieder, git

"V.Krishn" <vkrishn4@gmail.com> writes:

> On Friday, August 30, 2013 02:40:34 AM you wrote:
>> V.Krishn wrote:
>> > Quite sometimes when cloning a large repo stalls, hitting Ctrl+c cleans
>> > what been downloaded, and process needs re-start.
>> > 
>> > Is there a way to recover or continue from already downloaded files
>> > during cloning ?
>> 
>> No, sadly.  The pack sent for a clone is generated dynamically, so
>> there's no easy way to support the equivalent of an HTTP Range request
>> to resume.  Someone might implement an appropriate protocol extension
>> to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
>> but for now it doesn't exist.
>
> This is what I tried but then realized something more is needed:
>
> During stalled clone avoid  Ctrl+c. 
> 1. Copy the content .i.e .git folder some other place.
> 2. cd <new dir>
> 3. git config fetch.unpackLimit 999999
> 4. git config transfer.unpackLimit 999999

These two steps will not help, as negotiation between the sender and
the receiver is based on the commits that are known to be complete,
and an earlier failed "fetch" will not (and should not) update refs
on the receiver's side.

>> What you *can* do today is create a bundle from the large repo
>> somewhere with a reliable connection and then grab that using a
>> resumable transport such as HTTP.

Yes.

Another possibility is, if the project being cloned has a tag (or a
branch) that points at a commit back when it was smaller, do this

	git init x &&
        cd x &&
        git fetch $that_repository $that_tag:refs/tags/back_then_i_was_small

to prime the object store of a temporary repository 'x' with a
hopefully smaller transfer, and then use it as a "--reference"
repository to the real clone.
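A minimal sketch of this trick (the repository URL and tag name are placeholders):

```shell
# Prime a throwaway repository 'x' with a hopefully small fetch of an
# old tag, then let the real clone borrow its objects via --reference.
git init x
git -C x fetch https://example.com/big.git v0.1:refs/tags/back_then_i_was_small
git clone --reference x https://example.com/big.git big
```

If the small fetch is interrupted, only that transfer is lost; once it succeeds, the final clone only has to transfer objects newer than the tag.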


* Re: Stalled git cloning and possible solutions
  2013-08-29 22:18     ` Junio C Hamano
@ 2013-08-29 22:28       ` V.Krishn
  2013-09-04  1:06       ` V.Krishn
  1 sibling, 0 replies; 8+ messages in thread
From: V.Krishn @ 2013-08-29 22:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Friday, August 30, 2013 03:48:44 AM you wrote:
> "V.Krishn" <vkrishn4@gmail.com> writes:
> > On Friday, August 30, 2013 02:40:34 AM you wrote:
> >> V.Krishn wrote:
> >> > Quite sometimes when cloning a large repo stalls, hitting Ctrl+c
> >> > cleans what been downloaded, and process needs re-start.
> >> > 
> >> > Is there a way to recover or continue from already downloaded files
> >> > during cloning ?
> >> 
> >> No, sadly.  The pack sent for a clone is generated dynamically, so
> >> there's no easy way to support the equivalent of an HTTP Range request
> >> to resume.  Someone might implement an appropriate protocol extension
> >> to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
> >> but for now it doesn't exist.
> > 
> > This is what I tried but then realized something more is needed:
> > 
> > During stalled clone avoid  Ctrl+c.
> > 1. Copy the content .i.e .git folder some other place.
> > 2. cd <new dir>
> > 3. git config fetch.unpackLimit 999999
> > 4. git config transfer.unpackLimit 999999
> 
> These two steps will not help, as negotiation between the sender and
> the receiver is based on the commits that are known to be complete,
> and an earlier failed "fetch" will not (and should not) update refs
> on the receiver's side.
> 
> >> What you *can* do today is create a bundle from the large repo
> >> somewhere with a reliable connection and then grab that using a
> >> resumable transport such as HTTP.
> 
> Yes.
> 
> Another possibility is, if the project being cloned has a tag (or a
> branch) that points at a commit back when it was smaller, do this
> 
> 	git init x &&
>         cd x &&
>         git fetch $that_repository
> $that_tag:refs/tags/back_then_i_was_small
> 
> to prime the object store of a temporary repository 'x' with a
> hopefully smaller transfer, and then use it as a "--reference"
> repository to the real clone.

It would be nice if:
1. The clone process downloaded all files in .git before the blob/packing 
phase, added a lock file such as .clone, and then started the packing 
process.
2. Any interrupt (Ctrl+C) did not delete the already downloaded files; on 
re-clone, the process would check for the .clone file and resume cloning.
3. Upon finishing the clone, the .clone file would be deleted.

-- 
Regards.
V.Krishn


* Re: Stalled git cloning and possible solutions
  2013-08-29 21:10 ` Jonathan Nieder
  2013-08-29 21:35   ` V.Krishn
@ 2013-08-30 12:17   ` Duy Nguyen
  2013-08-30 12:41     ` Duy Nguyen
  1 sibling, 1 reply; 8+ messages in thread
From: Duy Nguyen @ 2013-08-30 12:17 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: V.Krishn, Git Mailing List

On Fri, Aug 30, 2013 at 4:10 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> V.Krishn wrote:
>
>> Quite sometimes when cloning a large repo stalls, hitting Ctrl+c cleans what
>> been downloaded, and process needs re-start.
>>
>> Is there a way to recover or continue from already downloaded files during
>> cloning ?
>
> No, sadly.  The pack sent for a clone is generated dynamically, so
> there's no easy way to support the equivalent of an HTTP Range request
> to resume.  Someone might implement an appropriate protocol extension
> to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
> but for now it doesn't exist.

OK, how about a new "resume" capability for upload-pack? fetch-pack can
then send the capability "resume[=<SHA-1>,<skip>]" to upload-pack. The
first time, it sends "resume" without parameters, and upload-pack sends
back a SHA-1 identifying the pack being transferred, together with a
full pack as usual. When an early disconnection happens, fetch-pack
sends the received SHA-1 and the size of the pack received so far. It
then either receives the remaining part or a full pack.

When upload-pack gets "resume", it calculates a checksum of all inputs
that may impact pack generation. If the checksum matches the SHA-1 from
fetch-pack, it continues to generate the pack as usual but skips
sending the first <skip> bytes (maybe with a fake header so that
fetch-pack realizes this is a partial pack). If the checksum does not
match, it sends the full pack again. I count on index-pack to spot a
corrupt resumed pack caused by bugs.

The inputs to the SHA-1 checksum include:

 - the result SHA-1 list from rev-list
 - git version string
 - .git/shallow
 - replace object database
 - pack.* config
 - maybe some other variables (I haven't checked pack-objects)
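A rough illustration of hashing those inputs (hypothetical; this is not an actual git feature, and `--all` stands in here for the negotiated want/have set):

```shell
# Combine everything that influences pack generation into one token.
{
  git rev-list --objects --all | cut -d' ' -f1 | sort  # resulting object list
  git --version                                        # generator version
  cat .git/shallow 2>/dev/null || true                 # shallow state, if any
  git config --get-regexp '^pack\.' | sort             # pack.* settings
} | sha1sum | cut -d' ' -f1
```

If any of these inputs changes between the original request and the resume attempt, the token changes and upload-pack would fall back to sending a full pack.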

Another Git implementation could generate this SHA-1 in a totally
different way and may even cache the generated pack.

If, at resume time, a load balancer directs the request to another
upload-pack that generates this SHA-1 differently, then this won't work
(i.e. the full pack is returned). In a busy repository, some refs may
have moved, so the rev-list result at resume time won't match any more,
but we can deal with that later by relaxing the rules to allow "want"
lines with any SHA-1 reachable from the current refs, not just one of
the refs (pack v4 or reachability bitmaps would help).
-- 
Duy


* Re: Stalled git cloning and possible solutions
  2013-08-30 12:17   ` Duy Nguyen
@ 2013-08-30 12:41     ` Duy Nguyen
  0 siblings, 0 replies; 8+ messages in thread
From: Duy Nguyen @ 2013-08-30 12:41 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: V.Krishn, Git Mailing List

On Fri, Aug 30, 2013 at 7:17 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> OK how about a new capability "resume" to upload-pack. fetch-pack can
> then send capability "resume[=<SHA-1>,<skip>]" to upload-pack. The
> first time it sends "resume" without parameters, and upload-pack will
> send back an SHA-1 to identify the pack being transferred together
> with a full pack as usual. When early disconnection happens, it sends
> the received SHA-1 and the received pack's size so far. It either
> receives the remaining part, or a full pack.
>
> When upload-pack gets "resume", it calculates a checksum of all input
> that may impact pack generation. If the checksum matches the SHA-1
> from fetch-pack, it'll continue to generate the pack as usual, but
> will skip sending the first <skip> bytes (maybe with a fake header so
> that fetch-pack realizes this is a partial pack). If the checksum does
> not match, it sends full pack again. I count on index-pack to spot
> corrupt resumed pack due to bugs.
>
> The input to calculate SHA-1 checksum includes:
>
>  - the result SHA-1 list from rev-list
>  - git version string
>  - .git/shallow
>  - replace object database
>  - pack.* config
>  - maybe some other variables (I haven't checked pack-objects)

I should have tested before writing: --threads adds some randomness to
pack generation, so it would have to be --threads=1. I'm not sure git
repository hosts would be happy with that...

> Another Git implementation can generate this SHA-1 in a totally
> different way and may even cache the generated pack.
>
> If at resume time, the load balancer directs the request to another
> upload-pack that generates this SHA-1 differently, ok this won't work
> (i.e. full pack is returned). In a busy repository, some refs may have
> moved so rev-list result at the resume time won't match any more, but
> we can deal with that later by relaxing to allow "want " lines with
> SHA-1 that are reachable from current refs, not just one of the refs
> (pack v4 or reachability bitmaps help).
> --
> Duy



-- 
Duy


* Re: Stalled git cloning and possible solutions
  2013-08-29 22:18     ` Junio C Hamano
  2013-08-29 22:28       ` V.Krishn
@ 2013-09-04  1:06       ` V.Krishn
  1 sibling, 0 replies; 8+ messages in thread
From: V.Krishn @ 2013-09-04  1:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Friday, August 30, 2013 03:48:44 AM you wrote:
> "V.Krishn" <vkrishn4@gmail.com> writes:
> > On Friday, August 30, 2013 02:40:34 AM you wrote:
> >> V.Krishn wrote:
> >> > Quite sometimes when cloning a large repo stalls, hitting Ctrl+c
> >> > cleans what been downloaded, and process needs re-start.
> >> > 
> >> > Is there a way to recover or continue from already downloaded files
> >> > during cloning ?
> >> 
> >> No, sadly.  The pack sent for a clone is generated dynamically, so
> >> there's no easy way to support the equivalent of an HTTP Range request
> >> to resume.  Someone might implement an appropriate protocol extension
> >> to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
> >> but for now it doesn't exist.
> > 
> > This is what I tried but then realized something more is needed:
> > 
> > During stalled clone avoid  Ctrl+c.
> > 1. Copy the content .i.e .git folder some other place.
> > 2. cd <new dir>
> > 3. git config fetch.unpackLimit 999999
> > 4. git config transfer.unpackLimit 999999
> 
> These two steps will not help, as negotiation between the sender and
> the receiver is based on the commits that are known to be complete,
> and an earlier failed "fetch" will not (and should not) update refs
> on the receiver's side.
> 
> >> What you *can* do today is create a bundle from the large repo
> >> somewhere with a reliable connection and then grab that using a
> >> resumable transport such as HTTP.
> 
> Yes.
> 
> Another possibility is, if the project being cloned has a tag (or a
> branch) that points at a commit back when it was smaller, do this
> 
> 	git init x &&
>         cd x &&
>         git fetch $that_repository
> $that_tag:refs/tags/back_then_i_was_small
> 
> to prime the object store of a temporary repository 'x' with a
> hopefully smaller transfer, and then use it as a "--reference"
> repository to the real clone.

What more files/info would be needed?
I noticed the tmp_pack_xxxxxx may not contain objects of type
commit/tree. Do I need to manually create .git/refs?

I was wondering whether the following would further help in recovery.

A.
1. Write the pack file in commit-history (date) order, i.e. 
blob+commit+tree ... tags ... blob+commit+tree, 
and in parallel also create the idx, or at least a temporary idx.
2. Update the other files in the .git dir before the pack process
    (as stated in my previous email).
3. Name objects like datestamp(epoch)+sha1 
     and store them in a per-epoch directory (date fmt can be yymmdd).
     (This might break backward compatibility.)
4. Add "git fsck --defrag [1..4]" 
   # this could take another parameter, like a level, 
     applying various heuristic optimizations.

B.
Another option would be:
git clone <url> --use-method=rsync
This would transfer the files in the .git dir as-is (the necessary
ones), then run `git gc` or other housekeeping upon completion.
This method would allow resuming.
Cons:
  Any change to a pack file on the server during the download becomes a potential issue.
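Using today's tools, the same idea can be approximated with plain rsync in place of the hypothetical --use-method=rsync flag (the URL is illustrative):

```shell
# Copy the server's repository directory verbatim; --partial keeps
# interrupted transfers so a re-run resumes where it left off.
rsync -az --partial rsync://example.com/big/big.git/ big.git/
git clone big.git big   # local clone of the copied repository
git -C big gc           # housekeeping after the raw copy
```

The stated con applies: if the server repacks or prunes mid-transfer, the copy can end up inconsistent and must be re-synced.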

Clone resume may not be a priority, but if minor changes can help with 
recovery, that would be nice.

I still like the bundle method, if git services made it easy.

-- 
Regards.
V.Krishn

