From: <rsbecker@nexbridge.com>
To: "'ellie'" <el@horse64.org>, <git@vger.kernel.org>
Subject: RE: With big repos and slower connections, git clone can be hard to work with
Date: Mon, 8 Jul 2024 08:30:41 -0400 [thread overview]
Message-ID: <000001dad132$a9339170$fb9ab450$@nexbridge.com> (raw)
In-Reply-To: <15bb8955-8ef6-4d83-b10c-e8593f65790c@horse64.org>
On Sunday, July 7, 2024 10:28 PM, ellie wrote:
>I was intending to suggest that depending on the largest object in the repository,
>resume may remain a concern for lower end users. My apologies for being unclear.
>
>As for my concrete problem, I can only guess what's happening, maybe github's
>HTTPS proxy too eagerly discarding slow connections:
>
>$ git clone https://github.com/maliit/keyboard maliit-keyboard
>Cloning into 'maliit-keyboard'...
>remote: Enumerating objects: 23243, done.
>remote: Counting objects: 100% (464/464), done.
>remote: Compressing objects: 100% (207/207), done.
>error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly:
>CANCEL (err 8)
>error: 2507 bytes of body are still expected
>fetch-pack: unexpected disconnect while reading sideband packet
>fatal: early EOF
>fatal: fetch-pack: invalid index-pack output
>
>A deepen seems to fail for this repo since one deepen step already gets killed off. Git
>HTTPS clones from any other hoster I tried, including gitlab.com, work fine, as do git
>SSH clones from github.com.
>
>Sorry for the long tangent. Basically, my point was just that resume still seems like a
>good idea even with deepen existing.
>
>Regards,
>
>Ellie
>
>On 7/8/24 3:27 AM, rsbecker@nexbridge.com wrote:
>> On Sunday, July 7, 2024 7:42 PM, ellie wrote:
>>> I have now encountered a repository where even --deepen=1 is bound to
>>> be failing because it pulls in something fairly large that takes a
>>> few minutes. (Possibly, the server proxy has a faulty timeout setting
>>> that punishes slow connections, but for connections unreliable on the
>>> client side the problem would be the same.)
>>>
>>> So this workaround sadly doesn't seem to cover all cases of resume.
>>>
>>> Regards,
>>>
>>> Ellie
>>>
>>> On 6/8/24 2:46 AM, ellie wrote:
>>>> The deepening worked perfectly, thank you so much! I hope a resume
>>>> will still be considered however, if even just to help out newcomers.
>>>>
>>>> Regards,
>>>>
>>>> Ellie
>>>>
>>>> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote:
>>>>> On Friday, June 7, 2024 8:03 PM, ellie wrote:
>>>>>> Subject: Re: With big repos and slower connections, git clone can
>>>>>> be hard to work with
>>>>>>
>>>>>> Thanks, this is very helpful as an emergency workaround!
>>>>>>
>>>>>> Nevertheless, I usually want the entire history, especially since
>>>>>> I wouldn't mind waiting half an hour. But without resume, I've
>>>>>> encountered it regularly that it just won't complete even if I
>>>>>> give it the time, while way longer downloads in the browser would.
>>>>>> The key problem here seems to be the lack of any resume.
>>>>>>
>>>>>> I hope this helps to understand why I made the suggestion.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Ellie
>>>>>>
>>>>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote:
>>>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote:
>>>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to
>>>>>>>> suggest a potential issue with "git clone".
>>>>>>>>
>>>>>>>> The problem is that any sort of interruption or connection
>>>>>>>> issue, no matter how brief, causes the clone to stop and leave
>>>>>>>> nothing behind:
>>>>>>>>
>>>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko
>>>>>>>> Cloning into 'nheko'...
>>>>>>>> remote: Enumerating objects: 43991, done.
>>>>>>>> remote: Counting objects: 100% (6535/6535), done.
>>>>>>>> remote: Compressing objects: 100% (1449/1449), done.
>>>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly:
>>>>>>>> CANCEL (err 8)
>>>>>>>> error: 2771 bytes of body are still expected
>>>>>>>> fetch-pack: unexpected disconnect while reading sideband packet
>>>>>>>> fatal: early EOF
>>>>>>>> fatal: fetch-pack: invalid index-pack output
>>>>>>>> $ cd nheko
>>>>>>>> bash: cd: nheko: No such file or directory
>>>>>>>>
>>>>>>>> In my experience, this can be really impactful with 1. big
>>>>>>>> repositories and 2.
>>>>>>>> unreliable internet - which I would argue isn't unheard of! E.g.
>>>>>>>> a developer may work via mobile connection on a business trip.
>>>>>>>> The result can even be that a repository is uncloneable for some users!
>>>>>>>>
>>>>>>>> This has left me in the absurd situation where I was able to
>>>>>>>> download a tarball via HTTPS from the git hoster just fine, even
>>>>>>>> way larger binary release items, thanks to the browser's HTTPS
>>>>>>>> resume. And yet a simple git clone of the same project failed repeatedly.
>>>>>>>>
>>>>>>>> My deepest apologies if I missed an option to fix or address this.
>>>>>>>> But summed up, please consider making git clone recover from hiccups.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Ellie
>>>>>>>>
>>>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing
>>>>>>>> out slower git clone connections from the server side even if
>>>>>>>> the transfer is ongoing. A git auto-resume would reduce the
>>>>>>>> impact of that, too.
>>>>>>>
>>>>>>> I suggest that you look into two git topics: --depth, which
>>>>>>> controls how much history is obtained in a clone, and
>>>>>>> sparse-checkout, which describes the part of the repository you
>>>>>>> will retrieve. You can prune the contents of the repository so
>>>>>>> that clone is faster, if you do not need all of the history, or
>>>>>>> all of the files. This is typically done in complex large
>>>>>>> repositories, particularly those used for production support as
>>>>>>> release repositories.
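For illustration, a minimal local sketch of the shallow-plus-sparse-checkout combination suggested above. A throwaway file:// repository stands in for the real remote (git ignores --depth for plain local-path clones, so file:// is required), and the repo layout, branch name, and file names are made up for the demo:

```shell
# Sketch: shallow clone plus sparse-checkout, demonstrated against a
# disposable local repository instead of a real remote.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in remote with two top-level directories.
git init -q -b main big-repo
(
  cd big-repo
  mkdir app docs
  echo code > app/main.c
  echo text > docs/manual.txt
  git add .
  git -c user.email=a@b -c user.name=A commit -q -m initial
)

# Shallow clone with nothing checked out yet, then restrict the
# worktree to app/ before populating it.
git clone -q --depth=1 --no-checkout "file://$tmp/big-repo" small-clone
cd small-clone
git sparse-checkout set app
git checkout -q main
ls    # only app/ is materialized; docs/ is skipped
```

With both options combined, the transfer is bounded by one commit's worth of the selected paths rather than the whole history of the whole tree.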
>>>>>
>>>>> Consider doing the clone with --depth=1 then using git fetch
>>>>> --depth=n as the resume. There are other options that effectively
>>>>> give you a resume, including --deepen=n.
>>>>>
>>>>> Build automation, like Jenkins, uses this to speed up the clone/checkout.
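A hedged sketch of that clone-then-deepen pattern, again using a disposable file:// repository in place of a slow remote (the three-commit history and branch name are invented for the demo):

```shell
# Sketch: clone shallow, then "resume" by deepening in small steps.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in remote with three commits of history.
git init -q -b main origin-repo
(
  cd origin-repo
  for i in 1 2 3; do
    echo "$i" > file.txt
    git add file.txt
    git -c user.email=a@b -c user.name=A commit -q -m "commit $i"
  done
)

# Step 1: fetch only the newest commit.
git clone -q --depth=1 "file://$tmp/origin-repo" shallow-clone
cd shallow-clone
git rev-list --count HEAD    # one commit so far

# Step 2: each later fetch extends history a little further back, so
# an interrupted step can simply be re-run without losing progress.
git fetch -q --deepen=1
git rev-list --count HEAD    # one more commit now

# Step 3: once the connection cooperates, grab whatever remains.
git fetch -q --unshallow
git rev-list --count HEAD    # full history
```

Each fetch transfers only the newly requested commits, which is why this behaves like a coarse-grained resume: an aborted deepen costs one step, not the whole clone.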
>>
>> Can you please provide more details on this? It is difficult to
>> understand your issue without knowing what situation is failing.
>> What size file? Is this a large single pack file? Can you reproduce
>> this with a script we can try?
>>
First, for this mailing list, please put your replies at the bottom.
Second, the full clone takes under 5 seconds on my system and does not experience any of the errors you are seeing. I suggest that your ISP may be throttling your account. I have seen this happen on some ISPs under SSH but rarely under HTTPS. It is likely a firewall or, as you said, a proxy setting. GitHub has no proxy.
My suggestion is that this is more of a communication issue than a large-repo issue. 133MB is a relatively small repository and clones quickly. This might be something to take up on the GitHub support forums rather than with git, since it seems like something in the path outside of git is not working correctly. None of the files in this repository, including pack files, is larger than 100 blocks, so there is not much point in a mid-pack restart.