RE: With big repos and slower connections, git clone can be hard to work with


From: <rsbecker@nexbridge.com>
To: "'ellie'" <el@horse64.org>, <git@vger.kernel.org>
Subject: RE: With big repos and slower connections, git clone can be hard to work with
Date: Sun, 7 Jul 2024 21:27:52 -0400
Message-ID: <0a7401dad0d6$10d27e20$32777a60$@nexbridge.com>
In-Reply-To: <d3b3c9bb-fa2a-422d-99a7-4add5f98326e@horse64.org>

On Sunday, July 7, 2024 7:42 PM, ellie wrote:
>I have now encountered a repository where even --deepen=1 is bound to be failing
>because it pulls in something fairly large that takes a few minutes. (Possibly, the
>server proxy has a faulty timeout setting that punishes slow connections, but for
>connections unreliable on the client side the problem would be the same.)
>
>So this workaround sadly doesn't seem to cover all cases of resume.
>
>Regards,
>
>Ellie
>
>On 6/8/24 2:46 AM, ellie wrote:
>> The deepening worked perfectly, thank you so much! I hope a resume
>> will still be considered however, if even just to help out newcomers.
>>
>> Regards,
>>
>> Ellie
>>
>> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote:
>>> On Friday, June 7, 2024 8:03 PM, ellie wrote:
>>>> Subject: Re: With big repos and slower connections, git clone can be
>>>> hard to work with
>>>>
>>>> Thanks, this is very helpful as an emergency workaround!
>>>>
>>>> Nevertheless, I usually want the entire history, especially since I
>>>> wouldn't mind waiting half an hour. But without resume, I've
>>>> encountered it regularly that it just won't complete even if I give
>>>> it the time, while way longer downloads in the browser would. The
>>>> key problem here seems to be the lack of any resume.
>>>>
>>>> I hope this helps to understand why I made the suggestion.
>>>>
>>>> Regards,
>>>>
>>>> Ellie
>>>>
>>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote:
>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote:
>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to
>>>>>> suggest a potential issue with "git clone".
>>>>>>
>>>>>> The problem is that any sort of interruption or connection issue,
>>>>>> no matter how brief, causes the clone to stop and leave nothing behind:
>>>>>>
>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko
>>>>>> Cloning into 'nheko'...
>>>>>> remote: Enumerating objects: 43991, done.
>>>>>> remote: Counting objects: 100% (6535/6535), done.
>>>>>> remote: Compressing objects: 100% (1449/1449), done.
>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly:
>>>>>> CANCEL (err 8)
>>>>>> error: 2771 bytes of body are still expected
>>>>>> fetch-pack: unexpected disconnect while reading sideband packet
>>>>>> fatal: early EOF
>>>>>> fatal: fetch-pack: invalid index-pack output
>>>>>> $ cd nheko
>>>>>> bash: cd: nheko: No such file or directory
>>>>>>
>>>>>> In my experience, this can be really impactful with 1. big
>>>>>> repositories and 2. unreliable internet - which I would argue isn't
>>>>>> unheard of! E.g. a developer may work via mobile connection on a
>>>>>> business trip. The result can even be that a repository is
>>>>>> uncloneable for some users!
>>>>>>
>>>>>> This has left me in the absurd situation where I was able to
>>>>>> download a tarball via HTTPS from the git hoster just fine, even
>>>>>> way larger binary release items, thanks to the browser's HTTPS
>>>>>> resume. And yet a simple git clone of the same project failed repeatedly.
>>>>>>
>>>>>> My deepest apologies if I missed an option to fix or address this.
>>>>>> But summed up, please consider making git clone recover from hiccups.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Ellie
>>>>>>
>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing
>>>>>> out slower git clone connections from the server side even if the
>>>>>> transfer is ongoing. A git auto-resume would reduce the impact of
>>>>>> that, too.
>>>>>
>>>>> I suggest that you look into two git topics: --depth, which
>>>>> controls how much history is obtained in a clone, and
>>>>> sparse-checkout, which describes the part of the repository you
>>>>> will retrieve. You can prune the contents of the repository so
>>>>> that clone is faster, if you do not need all of the history, or
>>>>> all of the files. This is typically done in complex large
>>>>> repositories, particularly those used for production support as
>>>>> release repositories.
>>>
>>> Consider doing the clone with --depth=1 then using git fetch
>>> --depth=n as the resume. There are other options that effectively
>>> give you a resume, including --deepen=n.
>>>
>>> Build automation, like Jenkins, uses this to speed up the clone/checkout.

Can you please provide more details on this? It is difficult to understand your issue without knowing which situation is failing. What size is the file? Is this a single large pack file? Can you reproduce this with a script we can try?
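
For reference, the workaround quoted above (clone with --depth=1, then deepen in steps) could be scripted roughly as follows. This is only a sketch: the function name clone_incrementally and the STEP and MAX_RETRIES variables are made up for illustration, and it assumes a git recent enough to support `git rev-parse --is-shallow-repository`. It does not help when a single deepening step is itself too large to complete, which is the case reported at the top of this message.

```shell
#!/bin/sh
# Sketch: shallow clone first, then deepen in small steps, retrying each
# step on failure. All names below are hypothetical, not part of git.

clone_incrementally() {
    url=$1
    dir=$2
    step=${STEP:-50}               # commits to add per fetch (assumed default)
    max_retries=${MAX_RETRIES:-5}  # attempts per step before giving up

    # Fetch only the most recent commit. If an earlier run already got
    # this far, reuse the existing directory instead of cloning again.
    if [ ! -d "$dir/.git" ]; then
        git clone --depth=1 "$url" "$dir" || return 1
    fi

    # Keep deepening until git reports the repository is no longer shallow.
    while [ "$(git -C "$dir" rev-parse --is-shallow-repository)" = "true" ]; do
        tries=0
        until git -C "$dir" fetch --deepen="$step"; do
            tries=$((tries + 1))
            if [ "$tries" -ge "$max_retries" ]; then
                return 1
            fi
            sleep 5   # brief pause before retrying the same step
        done
    done
}
```

It would be called as, for example, `clone_incrementally https://github.com/Nheko-Reborn/nheko nheko`. Work completed by earlier steps is kept, so rerunning after a failure continues from the current depth. Note that --depth implies --single-branch, so only the default branch ends up with full history unless the remote's fetch refspec is widened afterwards.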


Thread overview: 43+ messages
2024-06-07 23:28 With big repos and slower connections, git clone can be hard to work with ellie
2024-06-07 23:33 ` rsbecker
2024-06-08  0:03   ` ellie
2024-06-08  0:35     ` rsbecker
2024-06-08  0:46       ` ellie
2024-06-08  8:43         ` Jeff King
2024-06-08  9:40           ` ellie
2024-06-08  9:44             ` ellie
2024-06-08 10:38               ` Jeff King
2024-06-08 10:35             ` Jeff King
2024-06-08 11:05               ` ellie
2024-06-08 19:00           ` Junio C Hamano
2024-06-08 20:16             ` ellie
2024-06-10  6:46           ` Patrick Steinhardt
2024-06-10 19:04           ` Emily Shaffer
2024-06-10 20:34             ` Junio C Hamano
2024-06-10 21:55               ` ellie
2024-06-13 10:10                 ` Toon claes
2024-06-11  6:31               ` Jeff King
2024-06-11 15:12                 ` Junio C Hamano
2024-06-29  1:53                   ` Sitaram Chamarty
2024-06-11  6:26             ` Jeff King
2024-06-11 19:40               ` Ivan Frade
2024-07-07 23:42         ` ellie
2024-07-08  1:27           ` rsbecker [this message]
2024-07-08  2:28             ` ellie
2024-07-08 12:30               ` rsbecker
2024-07-08 12:41                 ` ellie
2024-07-08 14:32                   ` Konstantin Khomoutov
2024-07-08 15:02                     ` rsbecker
2024-07-08 15:14                     ` ellie
2024-07-08 15:31                       ` rsbecker
2024-07-08 15:48                         ` ellie
2024-07-08 16:23                           ` rsbecker
2024-07-08 17:06                             ` ellie
2024-07-08 17:38                               ` rsbecker
2024-07-08 16:09                         ` Emanuel Czirai
2024-07-08 15:44                       ` Konstantin Khomoutov
2024-07-08 16:27                         ` rsbecker
2024-07-14 12:00                           ` ellie
2024-07-24  6:42                           ` ellie
2025-09-08  2:34                           ` Ellie
2024-09-30 21:01 ` Ellie
