* With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-07 23:28 UTC
  To: git

Dear git team,

I'm terribly sorry if this is the wrong place, but I'd like to report a potential issue with "git clone".

The problem is that any interruption or connection issue, no matter how brief, causes the clone to stop and leave nothing behind:

$ git clone https://github.com/Nheko-Reborn/nheko
Cloning into 'nheko'...
remote: Enumerating objects: 43991, done.
remote: Counting objects: 100% (6535/6535), done.
remote: Compressing objects: 100% (1449/1449), done.
error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8)
error: 2771 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
$ cd nheko
bash: cd: nheko: No such file or directory

In my experience, this can be really impactful with 1. big repositories and 2. unreliable internet - which I would argue isn't unheard of! E.g. a developer may work over a mobile connection on a business trip. The result can even be that a repository is uncloneable for some users!

This has left me in the absurd situation where I was able to download a tarball via HTTPS from the git hoster just fine, even far larger binary release assets, thanks to the browser's HTTPS resume. And yet a simple git clone of the same project failed repeatedly.

My deepest apologies if I missed an option to fix or address this. But summed up, please consider making git clone recover from hiccups.

Regards,

Ellie

PS: I've seen git hosters with apparent proxy bugs, like timing out slower git clone connections from the server side even while the transfer is ongoing. A git auto-resume would reduce the impact of that, too.
* RE: With big repos and slower connections, git clone can be hard to work with
  From: rsbecker @ 2024-06-07 23:33 UTC
  To: 'ellie', git

On Friday, June 7, 2024 7:28 PM, ellie wrote:
>The problem is that any interruption or connection issue, no matter how
>brief, causes the clone to stop and leave nothing behind:
>[...]
>But summed up, please consider making git clone recover from hiccups.

I suggest that you look into two git topics: --depth, which controls how much history is obtained in a clone, and sparse-checkout, which restricts which part of the repository you retrieve. You can prune the contents of the repository so that the clone is faster if you do not need all of the history or all of the files. This is typically done in complex, large repositories, particularly those used for production support as release repositories.

--Randall
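As a rough sketch of that suggestion (the repository URL and the directory names are placeholders; the options are the standard ones from git-clone(1) and git-sparse-checkout(1)):

$ # shallow clone: only the most recent commit of each fetched branch
$ git clone --depth=1 https://example.com/big-repo.git
$ cd big-repo
$ # limit the working tree to the directories actually needed
$ git sparse-checkout set src docs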
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 0:03 UTC
  To: rsbecker, git

Thanks, this is very helpful as an emergency workaround!

Nevertheless, I usually want the entire history, especially since I wouldn't mind waiting half an hour. But without resume, I've regularly seen a clone fail to complete even when I give it the time, while far longer downloads in the browser finish fine. The key problem here seems to be the lack of any resume.

I hope this helps to explain why I made the suggestion.

Regards,

Ellie

On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote:
> I suggest that you look into two git topics: --depth, which controls how much
> history is obtained in a clone, and sparse-checkout, which restricts which part
> of the repository you retrieve. [...]
* RE: With big repos and slower connections, git clone can be hard to work with
  From: rsbecker @ 2024-06-08 0:35 UTC
  To: 'ellie', git

On Friday, June 7, 2024 8:03 PM, ellie wrote:
>Nevertheless, I usually want the entire history, especially since I wouldn't mind
>waiting half an hour. But without resume, I've regularly seen a clone fail to
>complete even when I give it the time, while far longer downloads in the browser
>finish fine. The key problem here seems to be the lack of any resume.
>[...]

Consider doing the clone with --depth=1, then using git fetch --depth=n as the resume. There are other options that effectively give you a resume, including --deepen=n.

Build automation, like Jenkins, uses this to speed up the clone/checkout.
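To make the incremental approach above concrete (the URL and the depth values are placeholders; --depth, --deepen and --unshallow are standard git-clone(1)/git-fetch(1) options, and note that --depth implies --single-branch unless overridden):

$ git clone --depth=1 https://example.com/big-repo.git   # small initial transfer
$ cd big-repo
$ git fetch --deepen=100     # pull in the next chunk of history
$ git fetch --deepen=100     # repeat, or re-run after a dropped connection
$ git fetch --unshallow      # finally fetch whatever history is still missing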
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 0:46 UTC
  To: rsbecker, git

The deepening worked perfectly, thank you so much! I hope a resume will still be considered, however, if only to help out newcomers.

Regards,

Ellie

On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote:
> Consider doing the clone with --depth=1, then using git fetch --depth=n as the
> resume. There are other options that effectively give you a resume, including
> --deepen=n.
>
> Build automation, like Jenkins, uses this to speed up the clone/checkout.
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-08 8:43 UTC
  To: ellie; +Cc: rsbecker, git

On Sat, Jun 08, 2024 at 02:46:38AM +0200, ellie wrote:

> The deepening worked perfectly, thank you so much! I hope a resume will
> still be considered, however, if only to help out newcomers.

Because the packfile to send the user is created on the fly, making a clone fully resumable is tricky (a second clone may get an equivalent but slightly different pack due to new objects entering the repo, or even raciness between threads).

One strategy people have worked on is for servers to point clients at static packfiles (which _do_ remain byte-for-byte identical, and can be resumed) to get some of the objects. But it requires some scheme on the server side to decide when and how to create those packfiles. So while there is support inside Git itself for this idea (both on the server and client side), I don't know of any servers where it is in active use.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 9:40 UTC
  To: Jeff King; +Cc: rsbecker, git

Sorry if I'm misunderstanding, and I assume this is a naive suggestion that may not work in some way: but couldn't git somehow keep a local cache of all the objects it has already fully downloaded? And then otherwise start over cleanly (and automatically), but take the objects it already has from that cache? In practice, that might already be enough to get through a longer clone despite occasional hiccups.

Sorry, I'm really not qualified to make good suggestions; it's just that the current situation feels frustrating as an outside user.

Regards,

Ellie

On 6/8/24 10:43 AM, Jeff King wrote:
> Because the packfile to send the user is created on the fly, making a
> clone fully resumable is tricky (a second clone may get an equivalent
> but slightly different pack due to new objects entering the repo, or
> even raciness between threads).
> [...]
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 9:44 UTC
  To: Jeff King; +Cc: rsbecker, git

Another idea that is probably silly in some way too: couldn't git, after the first error, automatically start over and do that whole --depth=1 followed by --deepen dance itself? I feel like anything that avoids having to know about and manually run that process would be an improvement for people affected by this often.

Regards,

Ellie

On 6/8/24 11:40 AM, ellie wrote:
> Sorry if I'm misunderstanding, and I assume this is a naive suggestion
> that may not work in some way: but couldn't git somehow keep a local cache
> of all the objects it has already fully downloaded? [...]
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-08 10:38 UTC
  To: ellie; +Cc: rsbecker, git

On Sat, Jun 08, 2024 at 11:44:09AM +0200, ellie wrote:

> Another idea that is probably silly in some way too: couldn't git, after the
> first error, automatically start over and do that whole --depth=1 followed by
> --deepen dance itself? [...]

I'm skeptical that shallow-cloning and deepening is a good strategy in general. Serving shallow clones like this is expensive for the server, and there's more network overhead in the back-and-forth requests. It also only slices up the repository in one dimension. There could be a single tree that's really big, or even a single blob that you can never get past.

So yes, it may work sometimes, but I don't think it's something we should codify.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-08 10:35 UTC
  To: ellie; +Cc: rsbecker, git

On Sat, Jun 08, 2024 at 11:40:47AM +0200, ellie wrote:

> Sorry if I'm misunderstanding, and I assume this is a naive suggestion
> that may not work in some way: but couldn't git somehow keep a local cache
> of all the objects it has already fully downloaded? [...]

The problem is that the client/server communication does not share an explicit list of objects. Instead, the client tells the server some points in the object graph that it wants (i.e., the tips of some branches that it wants to fetch) and that it already has (existing branches, or nothing in the case of a clone), and then the server can do its own graph traversal to figure out what needs to be sent.

When you've got a partially completed clone, the client can figure out which objects it received. But it can't tell the server "hey, I have commit XYZ, don't send that", because the server would assume that having XYZ means it has all of the objects reachable from there (parent commits, their trees and blobs, and so on). And the pack does not come in that order.

And even if there were a way to disable that reachability analysis and send a "raw" set of objects that we already have, it would be prohibitively large. The full set of sha1 hashes for linux.git is over 200MB. So naively saying "don't send object X, I have it" would approach that size.

It's possible the client could do some analysis to see if it has complete segments of history. In practice it won't, because of the way we order packfiles (they're split by type, and then roughly reverse-chronological through history). If the server re-ordered its response to fill history from the bottom up, it would be possible. We don't do that now because it's not really the optimal order for accessing objects in day-to-day use, and the packfile the server sends is stored directly on disk by the client.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 11:05 UTC
  To: Jeff King; +Cc: rsbecker, git

I see! Unfortunate, but I'm thankful for your detailed explanation.

The "shallow-cloning and deepening is [...] expensive for the server" makes me sadder about the current situation. I don't like that I need to make the server's life hard just because my connection is shaky... :-|

> It's possible the client could do some analysis to see if it has
> complete segments of history. In practice it won't, because of the way
> we order packfiles (they're split by type, and then roughly
> reverse-chronological through history). If the server re-ordered its
> response to fill history from the bottom up, it would be possible.

I wonder if that would be the most feasible idea, if any at all...?

My main take-away is that I don't know enough to suggest a good way out, and that git is even more impressive and complex tech than I thought. Thanks so much for the detailed responses, and I hope at least some of my uninformed rambling was of any use.

Regards,

Ellie

On 6/8/24 12:35 PM, Jeff King wrote:
> The problem is that the client/server communication does not share an
> explicit list of objects. [...]
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Junio C Hamano @ 2024-06-08 19:00 UTC
  To: Jeff King; +Cc: ellie, rsbecker, git

Jeff King <peff@peff.net> writes:

> One strategy people have worked on is for servers to point clients at
> static packfiles (which _do_ remain byte-for-byte identical, and can be
> resumed) to get some of the objects. But it requires some scheme on the
> server side to decide when and how to create those packfiles. So while
> there is support inside Git itself for this idea (both on the server and
> client side), I don't know of any servers where it is in active use.

Didn't the bundle URI work originate at GitHub? I thought this use case was a reasonable match to the mechanism.
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 20:16 UTC
  To: Junio C Hamano, Jeff King; +Cc: rsbecker, git

(I'm probably not the person to answer fully. But I can say HTTPS git clones from GitHub don't ever resume for me, if that's informative.)

On 6/8/24 9:00 PM, Junio C Hamano wrote:
> Didn't the bundle URI work originate at GitHub? I thought this use
> case was a reasonable match to the mechanism.
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Patrick Steinhardt @ 2024-06-10 6:46 UTC
  To: Jeff King; +Cc: ellie, rsbecker, git

On Sat, Jun 08, 2024 at 04:43:23AM -0400, Jeff King wrote:

> One strategy people have worked on is for servers to point clients at
> static packfiles (which _do_ remain byte-for-byte identical, and can be
> resumed) to get some of the objects. But it requires some scheme on the
> server side to decide when and how to create those packfiles. So while
> there is support inside Git itself for this idea (both on the server and
> client side), I don't know of any servers where it is in active use.

At GitLab, we have started to roll out use of bundle URIs so that we can pregenerate them and thus reduce load. The next step to evaluate in this context is whether we can easily reuse that infrastructure to eventually enable resumable clones via such bundle URIs. I assume that it cannot be that hard to make this work.

That of course wouldn't be a perfect solution, as the clone can only be resumed as long as such a pregenerated bundle continues to exist on the server. But it should still be way better compared to the status quo.

Patrick
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Emily Shaffer @ 2024-06-10 19:04 UTC
  To: Jeff King; +Cc: ellie, rsbecker, git

On Sat, Jun 8, 2024 at 1:43 AM Jeff King <peff@peff.net> wrote:
>
> One strategy people have worked on is for servers to point clients at
> static packfiles (which _do_ remain byte-for-byte identical, and can be
> resumed) to get some of the objects. But it requires some scheme on the
> server side to decide when and how to create those packfiles. So while
> there is support inside Git itself for this idea (both on the server and
> client side), I don't know of any servers where it is in active use.

We use packfile offloading heavily at Google (any repositories hosted at *.googlesource.com, as well as our internal-facing hosting). It works quite well for us for scaling large projects like Android and Chrome; we've been using it for some time now and are happy with it.

However, one thing that's missing is the resumable download Ellie is describing. With a clone which has been turned into a packfile fetch from a different data store, it *should* be resumable. But the client currently lacks the ability to do that. (This just came up for us internally the other day, and we ended up moving an internal bug to https://git.g-issues.gerritcodereview.com/issues/345241684.)

After a resumed clone like this, you may not necessarily have the latest state - for example, you may lose connection with 90% of the clone finished, then not get connection back for some days, after which point upstream has moved on as Peff described elsewhere in this thread. But it would still probably be cheaper to resume that 10% of the packfile fetch from the offloaded data store, and then do an incremental fetch back to the server to get the couple of days of updates on top, compared to starting over from zero with the server.

It seems to me that packfile URIs and bundle URIs are similar enough that we could work out similar logic for both, no? Or maybe there's something I'm missing about the way bundle offloading differs from packfiles.

 - Emily
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Junio C Hamano @ 2024-06-10 20:34 UTC
  To: Emily Shaffer; +Cc: Jeff King, ellie, rsbecker, git

Emily Shaffer <nasamuffin@google.com> writes:

> It seems to me that packfile URIs and bundle URIs are similar enough
> that we could work out similar logic for both, no? Or maybe there's
> something I'm missing about the way bundle offloading differs from
> packfiles.

Probably we can deprecate one and let the other one take over? It seems that bundle URIs have plenty of documentation, but the only hit for the packfile URI side I find in the output of

    $ git grep -i 'pack.*file.*uri' Documentation

is the description of how the designed protocol extension is supposed to work in Documentation/technical/packfile-uri.txt, and not even the configuration variable uploadpack.blobPackfileURI that controls the "experimental" feature is documented.

Perhaps whoever was adding the feature to the public side stopped after pushing out the absolute minimum and lost interest or something? We should update the documentation to reflect the current status (e.g. is it still experimental? what more work do we need on top of it to make it no longer experimental?), add at least a minimum description for server operators of how to configure it on the server side, etc. (I am assuming that the end-user does not have to do anything to get the feature, as long as their version of Git is recent enough.)

Thanks.
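For anyone curious what that undocumented server-side knob roughly looks like, the design note in Documentation/technical/packfile-uri.txt sketches one config entry per offloaded blob, along these lines (the object hash, pack hash and URL below are made-up placeholders, and the exact value syntax should be checked against that document before relying on it):

[uploadpack]
	blobPackfileUri = 0123abcd... 4567ef89... https://cdn.example.com/big-blob.pack

A client that understands the feature would then fetch that pack directly from the CDN over plain HTTPS, where range requests (and therefore resuming) are at least possible in principle.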
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-10 21:55 UTC
  To: Junio C Hamano, Emily Shaffer; +Cc: Jeff King, rsbecker, git

Sorry for yet another total newcomer/outsider question:

Is a bundle or pack file something any regular git HTTPS instance would naturally provide when set up in the usual ways? Like, if resume relied on that, would it work when following the standard smart HTTP setup procedure https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP (sorry if I got the wrong link) and then git cloning from that? That would give such a resume feature the best availability, if it ever came to be.

Regards,

Ellie

On 6/10/24 10:34 PM, Junio C Hamano wrote:
> Probably we can deprecate one and let the other one take over? [...]
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Toon Claes @ 2024-06-13 10:10 UTC
  To: ellie, Junio C Hamano, Emily Shaffer; +Cc: Jeff King, rsbecker, git

ellie <el@horse64.org> writes:

> Sorry for yet another total newcomer/outsider question:

Don't apologize for asking these questions, you're more than welcome.

> Is a bundle or pack file something any regular git HTTPS instance
> would naturally provide when set up in the usual ways?

Yes and no. The bundle and packfile formats can be used in many places. Packfiles are used to transfer a bunch of objects, or to store them locally in Git's object database. A bundle is a packfile with a leading header describing refs. You can read about that at https://git-scm.com/docs/gitformat-bundle.

> Like, if resume relied on that, would it work when following the
> standard smart HTTP setup procedure
> https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP (sorry if
> I got the wrong link) and then git cloning from that?

As mentioned elsewhere in the thread, on clone (and fetch) the client negotiates with the server which objects to download. Because the state of the remote repository can change between clones, the result of this negotiation will change too. This means the content of the packfile sent over might differ, which is disruptive for caching these files. That's why bundle URIs and packfile URIs were proposed.

In the case of a bundle URI, the server tells the client to download a pre-made bundle before starting the negotiation. This bundle can be stored on a CDN or whatever static HTTP(S) server. But it requires the server to create it, store it, and tell the client about it; that part is not built into Git itself at the moment.

This is not really tied to the smart HTTP protocol, because it can be used over SSH as well. But when such a file is stored on a regular HTTP server, we can rely on resumable downloads. Only after that bundle is downloaded does the client start the negotiation with the server to get the missing objects and refs (which should be a small subset when the bundle is recent).

--
Toon
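A small sketch of the bundle mechanism described above (the repository paths and URLs are placeholders; git bundle and the --bundle-uri clone option are documented in git-bundle(1) and git-clone(1), while advertising bundle URIs automatically remains up to the hosting side):

$ # server/CDN side: pre-generate a bundle containing all refs and their history
$ git -C /srv/git/big-repo.git bundle create /var/www/big-repo.bundle --all

$ # client side: seed the clone from the pre-made static bundle, then fetch
$ # whatever newer objects the origin has on top of it
$ git clone --bundle-uri=https://cdn.example.com/big-repo.bundle \
      https://example.com/big-repo.git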
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-11 6:31 UTC
  To: Junio C Hamano; +Cc: Emily Shaffer, ellie, rsbecker, git

On Mon, Jun 10, 2024 at 01:34:12PM -0700, Junio C Hamano wrote:

> Probably we can deprecate one and let the other one take over? It
> seems that bundle URIs have plenty of documentation, but the only hit
> for the packfile URI side I find [...] is the description of how the designed
> protocol extension is supposed to work in Documentation/technical/packfile-uri.txt,
> and not even the configuration variable uploadpack.blobPackfileURI that
> controls the "experimental" feature is documented.

I think they serve two different purposes. A packfile URI does not have any connectivity guarantees. So it lets a server say "here's all the objects, except for XYZ which you should fetch from this URL". That's good for offloading pieces of a clone, like single large objects.

Whereas bundle URIs require very little cooperation from the server. While a server can advertise bundle URIs, it doesn't need to know about the particular bundle a client grabbed. The client comes back with the usual have/want, just like any other fetching client.

At least that's my understanding. I have to admit I didn't follow the recent bundle URI work all that closely.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Junio C Hamano @ 2024-06-11 15:12 UTC
  To: Jeff King; +Cc: Emily Shaffer, ellie, rsbecker, git

Jeff King <peff@peff.net> writes:

> I think they serve two different purposes. A packfile URI does not have
> any connectivity guarantees. So it lets a server say "here's all the
> objects, except for XYZ which you should fetch from this URL". That's
> good for offloading pieces of a clone, like single large objects.
>
> Whereas bundle URIs require very little cooperation from the server.
> While a server can advertise bundle URIs, it doesn't need to know about
> the particular bundle a client grabbed. The client comes back with the
> usual have/want, just like any other fetching client.

Yes, a bundle being a self-contained "object-store + tips", it is a much more suitable building block for offloading clone traffic.
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Sitaram Chamarty @ 2024-06-29 1:53 UTC
  To: Junio C Hamano; +Cc: Jeff King, Emily Shaffer, ellie, rsbecker, git, konstantin

On Tue, Jun 11, 2024 at 08:12:12AM -0700, Junio C Hamano wrote:

> Yes, a bundle being a self-contained "object-store + tips", it is
> a much more suitable building block for offloading clone traffic.

[Adding mricon to cc]

Apologies for jumping in so late...

Gitolite supports this out of the box. Just a couple of lines changed in the rc file, and users can run `rsync` (still mediated and access-controlled by gitolite) to get a bundle. Admittedly the first call by someone may take some time, but it *is* resumable.

See [1] for details.

[1]: https://github.com/sitaramc/gitolite/blob/master/src/commands/rsync
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-11 6:26 UTC
  To: Emily Shaffer; +Cc: ellie, rsbecker, git

On Mon, Jun 10, 2024 at 12:04:30PM -0700, Emily Shaffer wrote:

> We use packfile offloading heavily at Google (any repositories hosted
> at *.googlesource.com, as well as our internal-facing hosting). It
> works quite well for us for scaling large projects like Android and
> Chrome; we've been using it for some time now and are happy with it.

Cool! I'm glad to hear it is in use.

It might be helpful for other potential users if you can share how you decide when to create the off-loaded packfiles, what goes in them, and so on. IIRC the server-side config is mostly geared at stuffing a few large blobs into a pack (since each blob must have an individual config key). Maybe JGit (which I'm assuming is what powers googlesource) has better options there.

> However, one thing that's missing is the resumable download Ellie is
> describing. [...] But it would still probably be cheaper to resume that 10%
> of the packfile fetch from the offloaded data store, and then do an
> incremental fetch back to the server to get the couple of days of updates
> on top, compared to starting over from zero with the server.

I do agree that resuming the offloaded parts, even if it is a few days later, will generally be beneficial.

For packfile offloading, I think the server has to be aware of what's in the packfiles (since it has to know not to send you those objects). So if you got all of the server's response packfile but didn't finish the offloaded packfiles, it's a no-brainer to finish downloading them, completing your old clone. And then you can fetch on top of that to get fully up to date. But if you didn't get all of the server's response, then you have to contact it again. If it points you to the same offloaded packfile, you can resume that transfer. But if it has moved on and doesn't advertise that packfile anymore, I don't think it's useful.

Whereas with bundle URI offloading, I think the client could always resume grabbing the bundle. Whatever it got is going to be useful, because it will tell the server what it already has in the usual way (packfile offloads can't do that, because the individual packfiles don't enforce the usual reachability guarantees).

> It seems to me that packfile URIs and bundle URIs are similar enough
> that we could work out similar logic for both, no? Or maybe there's
> something I'm missing about the way bundle offloading differs from
> packfiles.

They are pretty similar, but I think the resume strategy would be a little different, based on what I wrote above. In general I don't think packfile URIs are that useful for resuming, compared to bundle URIs.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Ivan Frade @ 2024-06-11 19:40 UTC
  To: Jeff King; +Cc: Emily Shaffer, ellie, rsbecker, git

On Mon, Jun 10, 2024 at 11:27 PM Jeff King <peff@peff.net> wrote:
>
> It might be helpful for other potential users if you can share how you
> decide when to create the off-loaded packfiles, what goes in them, and
> so on. IIRC the server-side config is mostly geared at stuffing a few
> large blobs into a pack (since each blob must have an individual config
> key). Maybe JGit (which I'm assuming is what powers googlesource) has
> better options there.

IIRC the upstream config was oriented toward offloading individual blobs. In JGit/Google we do the offloading at the pack level. We write to storage and the CDN when creating a pack, and keep the offloaded location in the pack metadata. We do this only under certain conditions (GC, above a certain size, ...). At serving time, if we see that we need to send a pack "as-is" (all objects inside are needed) and it has an offload, then we mark it to send the URL instead of the contents. As the offload is just a copy of the pack, we can use the pack bitmap to know what is there or not.

> However, one thing that's missing is the resumable download Ellie is
> describing.

Another thing missing in the offload story is support for offloads in non-HTTP protocols, e.g. after cloning via my-protocol://, being able to fetch my-protocol://blah/blah URLs.

Ivan
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-07-07 23:42 UTC
  To: rsbecker, git

I have now encountered a repository where even --deepen=1 is bound to fail, because it pulls in something fairly large that takes a few minutes. (Possibly the server proxy has a faulty timeout setting that punishes slow connections, but for connections that are unreliable on the client side the problem would be the same.)

So this workaround sadly doesn't seem to cover all cases of resume.

Regards,

Ellie

On 6/8/24 2:46 AM, ellie wrote:
> The deepening worked perfectly, thank you so much! I hope a resume will
> still be considered, however, if only to help out newcomers.
> [...]
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-07 23:42 ` ellie @ 2024-07-08 1:27 ` rsbecker 2024-07-08 2:28 ` ellie 0 siblings, 1 reply; 43+ messages in thread From: rsbecker @ 2024-07-08 1:27 UTC (permalink / raw) To: 'ellie', git On Sunday, July 7, 2024 7:42 PM, ellie wrote: >I have now encountered a repository where even --deepen=1 is bound to be failing >because it pulls in something fairly large that takes a few minutes. (Possibly, the >server proxy has a faulty timeout setting that punishes slow connections, but for >connections unreliable on the client side the problem would be the same.) > >So this workaround sadly doesn't seem to cover all cases of resume. > >Regards, > >Ellie > >On 6/8/24 2:46 AM, ellie wrote: >> The deepening worked perfectly, thank you so much! I hope a resume >> will still be considered however, if even just to help out newcomers. >> >> Regards, >> >> Ellie >> >> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote: >>> On Friday, June 7, 2024 8:03 PM, ellie wrote: >>>> Subject: Re: With big repos and slower connections, git clone can be >>>> hard to work with >>>> >>>> Thanks, this is very helpful as an emergency workaround! >>>> >>>> Nevertheless, I usually want the entire history, especially since I >>>> wouldn't mind waiting half an hour. But without resume, I've >>>> encountered it regularly that it just won't complete even if I give >>>> it the time, while way longer downloads in the browser would. The >>>> key problem here seems to be the lack of any resume. >>>> >>>> I hope this helps to understand why I made the suggestion. >>>> >>>> Regards, >>>> >>>> Ellie >>>> >>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote: >>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote: >>>>>> I'm terribly sorry if this is the wrong place, but I'd like to >>>>>> suggest a potential issue with "git clone". >>>>>> >>>>>> The problem is that any sort of interruption or connection issue, >>>>>> no matter how brief, causes the clone to stop and leave nothing behind: >>>>>> >>>>>> $ git clone https://github.com/Nheko-Reborn/nheko >>>>>> Cloning into 'nheko'... >>>>>> remote: Enumerating objects: 43991, done. >>>>>> remote: Counting objects: 100% (6535/6535), done. >>>>>> remote: Compressing objects: 100% (1449/1449), done. >>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>> CANCEL (err 8) >>>>>> error: 2771 bytes of body are still expected >>>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>>> fatal: early EOF >>>>>> fatal: fetch-pack: invalid index-pack output $ cd nheko >>>>>> bash: cd: nheko: No such file or director >>>>>> >>>>>> In my experience, this can be really impactful with 1. big >>>>>> repositories and 2. >>>>>> unreliable internet - which I would argue isn't unheard of! E.g. >>>>>> a developer may work via mobile connection on a business trip. The >>>>>> result can even be that a repository is uncloneable for some users! >>>>>> >>>>>> This has left me in the absurd situation where I was able to >>>>>> download a tarball via HTTPS from the git hoster just fine, even >>>>>> way larger binary release items, thanks to the browser's HTTPS >>>>>> resume. And yet a simple git clone of the same project failed repeatedly. >>>>>> >>>>>> My deepest apologies if I missed an option to fix or address this. >>>>>> But summed up, please consider making git clone recover from hiccups. 
>>>>>> >>>>>> Regards, >>>>>> >>>>>> Ellie >>>>>> >>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing >>>>>> out slower git clone connections from the server side even if the >>>>>> transfer is ongoing. A git auto-resume would reduce the impact of >>>>>> that, too. >>>>> >>>>> I suggest that you look into two git topics: --depth, which >>>>> controls how much >>>> history is obtained in a clone, and sparse-checkout, which describes >>>> the part of the repository you will retrieve. You can prune the >>>> contents of the repository so that clone is faster, if you do not >>>> need all of the history, or all of the files. This is typically done >>>> in complex large repositories, particularly those used for >>>> production support as release repositories. >>> >>> Consider doing the clone with --depth=1 then using git fetch >>> --depth=n as the resume. There are other options that effectively >>> give you a resume, including --deepen=n. >>> >>> Build automation, like Jenkins, uses this to speed up the clone/checkout. Can you please provide more details on this? It is difficult to understand your issue without knowing what situation is failing? What size file? Is this a large single pack file? Can you reproduce this with a script we can try? ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 1:27 ` rsbecker @ 2024-07-08 2:28 ` ellie 2024-07-08 12:30 ` rsbecker 0 siblings, 1 reply; 43+ messages in thread From: ellie @ 2024-07-08 2:28 UTC (permalink / raw) To: rsbecker, git I was intending to suggest that depending on the largest object in the repository, resume may remain a concern for lower end users. My apologies for being unclear. As for my concrete problem, I can only guess what's happening, maybe github's HTTPS proxy too eagerly discarding slow connections: $ git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8) error: 2507 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output A deepen seems to fail for this repo since one deepen step already gets killed off. Git HTTPS clones from any other hoster I tried, including gitlab.com, work fine, as do git SSH clones from github.com. Sorry for the long tangent. Basically, my point was just that resume still seems like a good idea even with deepen existing. Regards, Ellie On 7/8/24 3:27 AM, rsbecker@nexbridge.com wrote: > On Sunday, July 7, 2024 7:42 PM, ellie wrote: >> I have now encountered a repository where even --deepen=1 is bound to be failing >> because it pulls in something fairly large that takes a few minutes. (Possibly, the >> server proxy has a faulty timeout setting that punishes slow connections, but for >> connections unreliable on the client side the problem would be the same.) >> >> So this workaround sadly doesn't seem to cover all cases of resume. >> >> Regards, >> >> Ellie >> >> On 6/8/24 2:46 AM, ellie wrote: >>> The deepening worked perfectly, thank you so much! I hope a resume >>> will still be considered however, if even just to help out newcomers. >>> >>> Regards, >>> >>> Ellie >>> >>> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote: >>>> On Friday, June 7, 2024 8:03 PM, ellie wrote: >>>>> Subject: Re: With big repos and slower connections, git clone can be >>>>> hard to work with >>>>> >>>>> Thanks, this is very helpful as an emergency workaround! >>>>> >>>>> Nevertheless, I usually want the entire history, especially since I >>>>> wouldn't mind waiting half an hour. But without resume, I've >>>>> encountered it regularly that it just won't complete even if I give >>>>> it the time, while way longer downloads in the browser would. The >>>>> key problem here seems to be the lack of any resume. >>>>> >>>>> I hope this helps to understand why I made the suggestion. >>>>> >>>>> Regards, >>>>> >>>>> Ellie >>>>> >>>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote: >>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote: >>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to >>>>>>> suggest a potential issue with "git clone". >>>>>>> >>>>>>> The problem is that any sort of interruption or connection issue, >>>>>>> no matter how brief, causes the clone to stop and leave nothing behind: >>>>>>> >>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko >>>>>>> Cloning into 'nheko'... >>>>>>> remote: Enumerating objects: 43991, done. >>>>>>> remote: Counting objects: 100% (6535/6535), done. >>>>>>> remote: Compressing objects: 100% (1449/1449), done. 
>>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>>> CANCEL (err 8) >>>>>>> error: 2771 bytes of body are still expected >>>>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>>>> fatal: early EOF >>>>>>> fatal: fetch-pack: invalid index-pack output $ cd nheko >>>>>>> bash: cd: nheko: No such file or director >>>>>>> >>>>>>> In my experience, this can be really impactful with 1. big >>>>>>> repositories and 2. >>>>>>> unreliable internet - which I would argue isn't unheard of! E.g. >>>>>>> a developer may work via mobile connection on a business trip. The >>>>>>> result can even be that a repository is uncloneable for some users! >>>>>>> >>>>>>> This has left me in the absurd situation where I was able to >>>>>>> download a tarball via HTTPS from the git hoster just fine, even >>>>>>> way larger binary release items, thanks to the browser's HTTPS >>>>>>> resume. And yet a simple git clone of the same project failed repeatedly. >>>>>>> >>>>>>> My deepest apologies if I missed an option to fix or address this. >>>>>>> But summed up, please consider making git clone recover from hiccups. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Ellie >>>>>>> >>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing >>>>>>> out slower git clone connections from the server side even if the >>>>>>> transfer is ongoing. A git auto-resume would reduce the impact of >>>>>>> that, too. >>>>>> >>>>>> I suggest that you look into two git topics: --depth, which >>>>>> controls how much >>>>> history is obtained in a clone, and sparse-checkout, which describes >>>>> the part of the repository you will retrieve. You can prune the >>>>> contents of the repository so that clone is faster, if you do not >>>>> need all of the history, or all of the files. This is typically done >>>>> in complex large repositories, particularly those used for >>>>> production support as release repositories. >>>> >>>> Consider doing the clone with --depth=1 then using git fetch >>>> --depth=n as the resume. There are other options that effectively >>>> give you a resume, including --deepen=n. >>>> >>>> Build automation, like Jenkins, uses this to speed up the clone/checkout. > > Can you please provide more details on this? It is difficult to understand your issue without knowing what situation is failing? What size file? Is this a large single pack file? Can you reproduce this with a script we can try? > ^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 2:28 ` ellie @ 2024-07-08 12:30 ` rsbecker 2024-07-08 12:41 ` ellie 0 siblings, 1 reply; 43+ messages in thread From: rsbecker @ 2024-07-08 12:30 UTC (permalink / raw) To: 'ellie', git On Sunday, July 7, 2024 10:28 PM, ellie wrote: >I was intending to suggest that depending on the largest object in the repository, >resume may remain a concern for lower end users. My apologies for being unclear. > >As for my concrete problem, I can only guess what's happening, maybe github's >HTTPS proxy too eagerly discarding slow connections: > >$ git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit- >keyboard'... >remote: Enumerating objects: 23243, done. >remote: Counting objects: 100% (464/464), done. >remote: Compressing objects: 100% (207/207), done. >error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >CANCEL (err 8) >error: 2507 bytes of body are still expected >fetch-pack: unexpected disconnect while reading sideband packet >fatal: early EOF >fatal: fetch-pack: invalid index-pack output > >A deepen seems to fail for this repo since one deepen step already gets killed off. Git >HTTPS clones from any other hoster I tried, including gitlab.com, work fine, as do git >SSH clones from github.com. > >Sorry for the long tangent. Basically, my point was just that resume still seems like a >good idea even with deepen existing. > >Regards, > >Ellie > >On 7/8/24 3:27 AM, rsbecker@nexbridge.com wrote: >> On Sunday, July 7, 2024 7:42 PM, ellie wrote: >>> I have now encountered a repository where even --deepen=1 is bound to >>> be failing because it pulls in something fairly large that takes a >>> few minutes. (Possibly, the server proxy has a faulty timeout setting >>> that punishes slow connections, but for connections unreliable on the >>> client side the problem would be the same.) >>> >>> So this workaround sadly doesn't seem to cover all cases of resume. >>> >>> Regards, >>> >>> Ellie >>> >>> On 6/8/24 2:46 AM, ellie wrote: >>>> The deepening worked perfectly, thank you so much! I hope a resume >>>> will still be considered however, if even just to help out newcomers. >>>> >>>> Regards, >>>> >>>> Ellie >>>> >>>> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote: >>>>> On Friday, June 7, 2024 8:03 PM, ellie wrote: >>>>>> Subject: Re: With big repos and slower connections, git clone can >>>>>> be hard to work with >>>>>> >>>>>> Thanks, this is very helpful as an emergency workaround! >>>>>> >>>>>> Nevertheless, I usually want the entire history, especially since >>>>>> I wouldn't mind waiting half an hour. But without resume, I've >>>>>> encountered it regularly that it just won't complete even if I >>>>>> give it the time, while way longer downloads in the browser would. >>>>>> The key problem here seems to be the lack of any resume. >>>>>> >>>>>> I hope this helps to understand why I made the suggestion. >>>>>> >>>>>> Regards, >>>>>> >>>>>> Ellie >>>>>> >>>>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote: >>>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote: >>>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to >>>>>>>> suggest a potential issue with "git clone". >>>>>>>> >>>>>>>> The problem is that any sort of interruption or connection >>>>>>>> issue, no matter how brief, causes the clone to stop and leave nothing >behind: >>>>>>>> >>>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko >>>>>>>> Cloning into 'nheko'... 
>>>>>>>> remote: Enumerating objects: 43991, done. >>>>>>>> remote: Counting objects: 100% (6535/6535), done. >>>>>>>> remote: Compressing objects: 100% (1449/1449), done. >>>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>>>> CANCEL (err 8) >>>>>>>> error: 2771 bytes of body are still expected >>>>>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>>>>> fatal: early EOF >>>>>>>> fatal: fetch-pack: invalid index-pack output $ cd nheko >>>>>>>> bash: cd: nheko: No such file or director >>>>>>>> >>>>>>>> In my experience, this can be really impactful with 1. big >>>>>>>> repositories and 2. >>>>>>>> unreliable internet - which I would argue isn't unheard of! E.g. >>>>>>>> a developer may work via mobile connection on a business trip. >>>>>>>> The result can even be that a repository is uncloneable for some users! >>>>>>>> >>>>>>>> This has left me in the absurd situation where I was able to >>>>>>>> download a tarball via HTTPS from the git hoster just fine, even >>>>>>>> way larger binary release items, thanks to the browser's HTTPS >>>>>>>> resume. And yet a simple git clone of the same project failed repeatedly. >>>>>>>> >>>>>>>> My deepest apologies if I missed an option to fix or address this. >>>>>>>> But summed up, please consider making git clone recover from hiccups. >>>>>>>> >>>>>>>> Regards, >>>>>>>> >>>>>>>> Ellie >>>>>>>> >>>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing >>>>>>>> out slower git clone connections from the server side even if >>>>>>>> the transfer is ongoing. A git auto-resume would reduce the >>>>>>>> impact of that, too. >>>>>>> >>>>>>> I suggest that you look into two git topics: --depth, which >>>>>>> controls how much >>>>>> history is obtained in a clone, and sparse-checkout, which >>>>>> describes the part of the repository you will retrieve. You can >>>>>> prune the contents of the repository so that clone is faster, if >>>>>> you do not need all of the history, or all of the files. This is >>>>>> typically done in complex large repositories, particularly those >>>>>> used for production support as release repositories. >>>>> >>>>> Consider doing the clone with --depth=1 then using git fetch >>>>> --depth=n as the resume. There are other options that effectively >>>>> give you a resume, including --deepen=n. >>>>> >>>>> Build automation, like Jenkins, uses this to speed up the clone/checkout. >> >> Can you please provide more details on this? It is difficult to understand your issue >without knowing what situation is failing? What size file? Is this a large single pack >file? Can you reproduce this with a script we can try? >> First, for this mailing list, please put your replies at the bottom. Second, the full clone takes under 5 seconds on my system and does not experience any error that you are seeing. I suggest that your ISP may be throttling your account. I have seen this happen on some ISPs under SSH but few under HTTPS. It is likely a firewall or as you said, a proxy setting. GitHub has no proxy. My suggestion is that this is more of a communication issue instead of than a large repo issue. 133Mb is a relatively small a repository and clones quickly. This might be something to take up on the GitHub support forums rather that for git - since it seems like something in the path outside of git is not working correctly. None of the files in this repository, including pack-files is larger than 100 blocks, so there is not much point with a mid-pack restart. 
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 12:30 ` rsbecker @ 2024-07-08 12:41 ` ellie 2024-07-08 14:32 ` Konstantin Khomoutov 0 siblings, 1 reply; 43+ messages in thread From: ellie @ 2024-07-08 12:41 UTC (permalink / raw) To: rsbecker, git On 7/8/24 2:30 PM, rsbecker@nexbridge.com wrote: > On Sunday, July 7, 2024 10:28 PM, ellie wrote: >> I was intending to suggest that depending on the largest object in the repository, >> resume may remain a concern for lower end users. My apologies for being unclear. >> >> As for my concrete problem, I can only guess what's happening, maybe github's >> HTTPS proxy too eagerly discarding slow connections: >> >> $ git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit- >> keyboard'... >> remote: Enumerating objects: 23243, done. >> remote: Counting objects: 100% (464/464), done. >> remote: Compressing objects: 100% (207/207), done. >> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >> CANCEL (err 8) >> error: 2507 bytes of body are still expected >> fetch-pack: unexpected disconnect while reading sideband packet >> fatal: early EOF >> fatal: fetch-pack: invalid index-pack output >> >> A deepen seems to fail for this repo since one deepen step already gets killed off. Git >> HTTPS clones from any other hoster I tried, including gitlab.com, work fine, as do git >> SSH clones from github.com. >> >> Sorry for the long tangent. Basically, my point was just that resume still seems like a >> good idea even with deepen existing. >> >> Regards, >> >> Ellie >> >> On 7/8/24 3:27 AM, rsbecker@nexbridge.com wrote: >>> On Sunday, July 7, 2024 7:42 PM, ellie wrote: >>>> I have now encountered a repository where even --deepen=1 is bound to >>>> be failing because it pulls in something fairly large that takes a >>>> few minutes. (Possibly, the server proxy has a faulty timeout setting >>>> that punishes slow connections, but for connections unreliable on the >>>> client side the problem would be the same.) >>>> >>>> So this workaround sadly doesn't seem to cover all cases of resume. >>>> >>>> Regards, >>>> >>>> Ellie >>>> >>>> On 6/8/24 2:46 AM, ellie wrote: >>>>> The deepening worked perfectly, thank you so much! I hope a resume >>>>> will still be considered however, if even just to help out newcomers. >>>>> >>>>> Regards, >>>>> >>>>> Ellie >>>>> >>>>> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote: >>>>>> On Friday, June 7, 2024 8:03 PM, ellie wrote: >>>>>>> Subject: Re: With big repos and slower connections, git clone can >>>>>>> be hard to work with >>>>>>> >>>>>>> Thanks, this is very helpful as an emergency workaround! >>>>>>> >>>>>>> Nevertheless, I usually want the entire history, especially since >>>>>>> I wouldn't mind waiting half an hour. But without resume, I've >>>>>>> encountered it regularly that it just won't complete even if I >>>>>>> give it the time, while way longer downloads in the browser would. >>>>>>> The key problem here seems to be the lack of any resume. >>>>>>> >>>>>>> I hope this helps to understand why I made the suggestion. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Ellie >>>>>>> >>>>>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote: >>>>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote: >>>>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to >>>>>>>>> suggest a potential issue with "git clone". 
>>>>>>>>> >>>>>>>>> The problem is that any sort of interruption or connection >>>>>>>>> issue, no matter how brief, causes the clone to stop and leave nothing >> behind: >>>>>>>>> >>>>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko >>>>>>>>> Cloning into 'nheko'... >>>>>>>>> remote: Enumerating objects: 43991, done. >>>>>>>>> remote: Counting objects: 100% (6535/6535), done. >>>>>>>>> remote: Compressing objects: 100% (1449/1449), done. >>>>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>>>>> CANCEL (err 8) >>>>>>>>> error: 2771 bytes of body are still expected >>>>>>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>>>>>> fatal: early EOF >>>>>>>>> fatal: fetch-pack: invalid index-pack output $ cd nheko >>>>>>>>> bash: cd: nheko: No such file or director >>>>>>>>> >>>>>>>>> In my experience, this can be really impactful with 1. big >>>>>>>>> repositories and 2. >>>>>>>>> unreliable internet - which I would argue isn't unheard of! E.g. >>>>>>>>> a developer may work via mobile connection on a business trip. >>>>>>>>> The result can even be that a repository is uncloneable for some users! >>>>>>>>> >>>>>>>>> This has left me in the absurd situation where I was able to >>>>>>>>> download a tarball via HTTPS from the git hoster just fine, even >>>>>>>>> way larger binary release items, thanks to the browser's HTTPS >>>>>>>>> resume. And yet a simple git clone of the same project failed repeatedly. >>>>>>>>> >>>>>>>>> My deepest apologies if I missed an option to fix or address this. >>>>>>>>> But summed up, please consider making git clone recover from hiccups. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> Ellie >>>>>>>>> >>>>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing >>>>>>>>> out slower git clone connections from the server side even if >>>>>>>>> the transfer is ongoing. A git auto-resume would reduce the >>>>>>>>> impact of that, too. >>>>>>>> >>>>>>>> I suggest that you look into two git topics: --depth, which >>>>>>>> controls how much >>>>>>> history is obtained in a clone, and sparse-checkout, which >>>>>>> describes the part of the repository you will retrieve. You can >>>>>>> prune the contents of the repository so that clone is faster, if >>>>>>> you do not need all of the history, or all of the files. This is >>>>>>> typically done in complex large repositories, particularly those >>>>>>> used for production support as release repositories. >>>>>> >>>>>> Consider doing the clone with --depth=1 then using git fetch >>>>>> --depth=n as the resume. There are other options that effectively >>>>>> give you a resume, including --deepen=n. >>>>>> >>>>>> Build automation, like Jenkins, uses this to speed up the clone/checkout. >>> >>> Can you please provide more details on this? It is difficult to understand your issue >> without knowing what situation is failing? What size file? Is this a large single pack >> file? Can you reproduce this with a script we can try? >>> > > First, for this mailing list, please put your replies at the bottom. > > Second, the full clone takes under 5 seconds on my system and does not experience any error that you are seeing. I suggest that your ISP may be throttling your account. I have seen this happen on some ISPs under SSH but few under HTTPS. It is likely a firewall or as you said, a proxy setting. GitHub has no proxy. > > My suggestion is that this is more of a communication issue instead of than a large repo issue. 
> 133Mb is a relatively small a repository and clones quickly. This might be something to take up on the GitHub support forums rather that for git - since it seems like something in the path outside of git is not working correctly. None of the files in this repository, including pack-files is larger than 100 blocks, so there is not much point with a mid-pack restart. > I apologize for not placing the responses where expected. It seems extremely unlikely to me that this is an ISP issue, for the reasons I already listed. An additional one is that HTTPS downloads from GitHub outside of git, e.g. from zip archives, work fine as well, even for way larger files. Nevertheless, this is irrelevant to my initial request, since even if it's not caused by a GitHub server-side issue, a resume would still help. Regards, Ellie ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 12:41 ` ellie @ 2024-07-08 14:32 ` Konstantin Khomoutov 2024-07-08 15:02 ` rsbecker 2024-07-08 15:14 ` ellie 0 siblings, 2 replies; 43+ messages in thread From: Konstantin Khomoutov @ 2024-07-08 14:32 UTC (permalink / raw) To: ellie; +Cc: rsbecker, git On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: [...] > error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL > (err 8) [...] > It seems extremely unlikely to me to be possibly an ISP issue, for which I > already listed the reasons. An additional one is HTTPS downloads from github > outside of git, e.g. from zip archives, for way larger files work fine as > well. [...] What if you explicitly disable HTTP/2 when cloning? git -c http.version=HTTP/1.1 clone ... should probably do this. ^ permalink raw reply [flat|nested] 43+ messages in thread
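If the HTTP/1.1 workaround suggested above turns out to help, it can also be made persistent rather than passed per invocation; a sketch (the setting then applies to all HTTPS remotes for the current user):

$ git config --global http.version HTTP/1.1
$ git clone https://github.com/maliit/keyboard maliit-keyboard
$ git config --global --unset http.version    # revert once it is no longer needed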
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 14:32 ` Konstantin Khomoutov @ 2024-07-08 15:02 ` rsbecker 2024-07-08 15:14 ` ellie 1 sibling, 0 replies; 43+ messages in thread From: rsbecker @ 2024-07-08 15:02 UTC (permalink / raw) To: 'Konstantin Khomoutov', 'ellie'; +Cc: git On Monday, July 8, 2024 10:33 AM, Konstantin Khomoutov wrote: >On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: > >[...] >> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >> CANCEL (err 8) >[...] >> It seems extremely unlikely to me to be possibly an ISP issue, for >> which I already listed the reasons. An additional one is HTTPS >> downloads from github outside of git, e.g. from zip archives, for way >> larger files work fine as well. >[...] > >What if you explicitly disable HTTP/2 when cloning? > > git -c http.version=HTTP/1.1 clone ... > >should probably do this. I can verify that this works in my environment. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 14:32 ` Konstantin Khomoutov 2024-07-08 15:02 ` rsbecker @ 2024-07-08 15:14 ` ellie 2024-07-08 15:31 ` rsbecker 2024-07-08 15:44 ` Konstantin Khomoutov 1 sibling, 2 replies; 43+ messages in thread From: ellie @ 2024-07-08 15:14 UTC (permalink / raw) To: rsbecker, git On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: > On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: > > [...] >> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL >> (err 8) > [...] >> It seems extremely unlikely to me to be possibly an ISP issue, for which I >> already listed the reasons. An additional one is HTTPS downloads from github >> outside of git, e.g. from zip archives, for way larger files work fine as >> well. > [...] > > What if you explicitly disable HTTP/2 when cloning? > > git -c http.version=HTTP/1.1 clone ... > > should probably do this. > Thanks for the idea! I tested it: $ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 5361 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output Sadly, it seems like the error is only slightly different. It was still worth a try. I contacted GitHub support a while ago but it got stuck. If there were resume available such hiccups wouldn't matter, I hope that explains why I suggested that feature. Regards, Ellie ^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:14 ` ellie @ 2024-07-08 15:31 ` rsbecker 2024-07-08 15:48 ` ellie 2024-07-08 16:09 ` Emanuel Czirai 2024-07-08 15:44 ` Konstantin Khomoutov 1 sibling, 2 replies; 43+ messages in thread From: rsbecker @ 2024-07-08 15:31 UTC (permalink / raw) To: 'ellie', git On Monday, July 8, 2024 11:15 AM, ellie wrote: >On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >> >> [...] >>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>> CANCEL (err 8) >> [...] >>> It seems extremely unlikely to me to be possibly an ISP issue, for >>> which I already listed the reasons. An additional one is HTTPS >>> downloads from github outside of git, e.g. from zip archives, for way >>> larger files work fine as well. >> [...] >> >> What if you explicitly disable HTTP/2 when cloning? >> >> git -c http.version=HTTP/1.1 clone ... >> >> should probably do this. >> > >Thanks for the idea! I tested it: > >$ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard >maliit-keyboard >Cloning into 'maliit-keyboard'... >remote: Enumerating objects: 23243, done. >remote: Counting objects: 100% (464/464), done. >remote: Compressing objects: 100% (207/207), done. >error: RPC failed; curl 18 transfer closed with outstanding read data remaining >error: 5361 bytes of body are still expected >fetch-pack: unexpected disconnect while reading sideband packet >fatal: early EOF >fatal: fetch-pack: invalid index-pack output > >Sadly, it seems like the error is only slightly different. It was still worth a try. I >contacted GitHub support a while ago but it got stuck. If there were resume >available such hiccups wouldn't matter, I hope that explains why I suggested that >feature. I don't really understand what "it got stuck" means. Is that a colloquialism? What got stuck? That case at GitHub? Have you tried git config --global http.postBuffer 524288000 It might help. The feature being requesting, even if possible, will probably not happen quickly, unless someone has a solid and simple design for this. That is why we are trying to figure out the root cause of your situation, which is not clear to me as to what exactly is failing (possibly a buffer size issue, if this is consistently failing). My experience, as I said before, on these symptoms, is a proxy (even a local one) that is in the way. If you have your linux instance on a VM, the hypervisor may not be configured correctly. Lack of further evidence (all we really have is the curl RPC failure) makes diagnosing this very difficult. ^ permalink raw reply [flat|nested] 43+ messages in thread
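For reference, setting and inspecting the suggested buffer looks like the following; note that http.postBuffer controls the buffer git uses when posting request bodies to the server, so it is not expected to change how the pack download itself behaves:

$ git config --global http.postBuffer 524288000
$ git config --global --get http.postBuffer
524288000
$ git config --global --unset http.postBuffer   # remove it again if it makes no difference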
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:31 ` rsbecker @ 2024-07-08 15:48 ` ellie 2024-07-08 16:23 ` rsbecker 2024-07-08 16:09 ` Emanuel Czirai 1 sibling, 1 reply; 43+ messages in thread From: ellie @ 2024-07-08 15:48 UTC (permalink / raw) To: rsbecker, git On 7/8/24 5:31 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:15 AM, ellie wrote: >> On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >>> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >>> >>> [...] >>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>> CANCEL (err 8) >>> [...] >>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>> which I already listed the reasons. An additional one is HTTPS >>>> downloads from github outside of git, e.g. from zip archives, for way >>>> larger files work fine as well. >>> [...] >>> >>> What if you explicitly disable HTTP/2 when cloning? >>> >>> git -c http.version=HTTP/1.1 clone ... >>> >>> should probably do this. >>> >> >> Thanks for the idea! I tested it: >> >> $ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard >> maliit-keyboard >> Cloning into 'maliit-keyboard'... >> remote: Enumerating objects: 23243, done. >> remote: Counting objects: 100% (464/464), done. >> remote: Compressing objects: 100% (207/207), done. >> error: RPC failed; curl 18 transfer closed with outstanding read data remaining >> error: 5361 bytes of body are still expected >> fetch-pack: unexpected disconnect while reading sideband packet >> fatal: early EOF >> fatal: fetch-pack: invalid index-pack output >> >> Sadly, it seems like the error is only slightly different. It was still worth a try. I >> contacted GitHub support a while ago but it got stuck. If there were resume >> available such hiccups wouldn't matter, I hope that explains why I suggested that >> feature. > > I don't really understand what "it got stuck" means. Is that a colloquialism? What got stuck? That case at GitHub? > > Have you tried git config --global http.postBuffer 524288000 > > It might help. The feature being requesting, even if possible, will probably not happen quickly, unless someone has a solid and simple design for this. That is why we are trying to figure out the root cause of your situation, which is not clear to me as to what exactly is failing (possibly a buffer size issue, if this is consistently failing). My experience, as I said before, on these symptoms, is a proxy (even a local one) that is in the way. If you have your linux instance on a VM, the hypervisor may not be configured correctly. Lack of further evidence (all we really have is the curl RPC failure) makes diagnosing this very difficult. > Thanks for your response, I appreciate it. I don't know what the hold up is for them, but I'm probably too unimportant, which I understand. I'm not an enterprise user, and >99% of others have faster connections than me which is perhaps why they dodge this config(?) issue. And thanks for your suggestion, but sadly it seems to have no effect: $ git config --global http.postBuffer 524288000 $ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. 
error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 2444 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output I'm doubtful this is solvable without either some resume or a fix from Github's end. But I can use SSH clone so this isn't urgent. Resume just seemed like an idea that would also help others, and it's what makes many other internet services work much better for me. Regards, Ellie ^ permalink raw reply [flat|nested] 43+ messages in thread
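Since SSH clones of the same repository reportedly work, switching transports is a practical interim workaround (assuming an SSH key is already registered with the GitHub account):

$ git clone git@github.com:maliit/keyboard.git maliit-keyboard
# or rewrite all HTTPS GitHub URLs to SSH for this user:
$ git config --global url."git@github.com:".insteadOf "https://github.com/"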
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:48 ` ellie @ 2024-07-08 16:23 ` rsbecker 2024-07-08 17:06 ` ellie 0 siblings, 1 reply; 43+ messages in thread From: rsbecker @ 2024-07-08 16:23 UTC (permalink / raw) To: 'ellie', git On Monday, July 8, 2024 11:49 AM, ellie wrote: >On 7/8/24 5:31 PM, rsbecker@nexbridge.com wrote: >> On Monday, July 8, 2024 11:15 AM, ellie wrote: >>> On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >>>> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >>>> >>>> [...] >>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>> CANCEL (err 8) >>>> [...] >>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>> which I already listed the reasons. An additional one is HTTPS >>>>> downloads from github outside of git, e.g. from zip archives, for >>>>> way larger files work fine as well. >>>> [...] >>>> >>>> What if you explicitly disable HTTP/2 when cloning? >>>> >>>> git -c http.version=HTTP/1.1 clone ... >>>> >>>> should probably do this. >>>> >>> >>> Thanks for the idea! I tested it: >>> >>> $ git -c http.version=HTTP/1.1 clone >>> https://github.com/maliit/keyboard >>> maliit-keyboard >>> Cloning into 'maliit-keyboard'... >>> remote: Enumerating objects: 23243, done. >>> remote: Counting objects: 100% (464/464), done. >>> remote: Compressing objects: 100% (207/207), done. >>> error: RPC failed; curl 18 transfer closed with outstanding read data >>> remaining >>> error: 5361 bytes of body are still expected >>> fetch-pack: unexpected disconnect while reading sideband packet >>> fatal: early EOF >>> fatal: fetch-pack: invalid index-pack output >>> >>> Sadly, it seems like the error is only slightly different. It was >>> still worth a try. I contacted GitHub support a while ago but it got >>> stuck. If there were resume available such hiccups wouldn't matter, I >>> hope that explains why I suggested that feature. >> >> I don't really understand what "it got stuck" means. Is that a colloquialism? What >got stuck? That case at GitHub? >> >> Have you tried git config --global http.postBuffer 524288000 >> >> It might help. The feature being requesting, even if possible, will probably not >happen quickly, unless someone has a solid and simple design for this. That is why >we are trying to figure out the root cause of your situation, which is not clear to me >as to what exactly is failing (possibly a buffer size issue, if this is consistently failing). >My experience, as I said before, on these symptoms, is a proxy (even a local one) >that is in the way. If you have your linux instance on a VM, the hypervisor may not >be configured correctly. Lack of further evidence (all we really have is the curl RPC >failure) makes diagnosing this very difficult. >> > >Thanks for your response, I appreciate it. I don't know what the hold up is for them, >but I'm probably too unimportant, which I understand. I'm not an enterprise user, >and >99% of others have faster connections than me which is perhaps why they >dodge this config(?) issue. > >And thanks for your suggestion, but sadly it seems to have no effect: > >$ git config --global http.postBuffer 524288000 $ git -c http.version=HTTP/1.1 >clone https://github.com/maliit/keyboard >maliit-keyboard >Cloning into 'maliit-keyboard'... >remote: Enumerating objects: 23243, done. >remote: Counting objects: 100% (464/464), done. >remote: Compressing objects: 100% (207/207), done. 
>error: RPC failed; curl 18 transfer closed with outstanding read data remaining >error: 2444 bytes of body are still expected >fetch-pack: unexpected disconnect while reading sideband packet >fatal: early EOF >fatal: fetch-pack: invalid index-pack output > >I'm doubtful this is solvable without either some resume or a fix from Github's end. >But I can use SSH clone so this isn't urgent. > >Resume just seemed like an idea that would also help others, and it's what makes >many other internet services work much better for me. I do not know which pack file is having the issue - it may be the first one. Try running with the following environment variables GIT_TRACE=true and GIT_PACKET_TRACE=true. This will not correct the problem but might give additional helpful information. git uses libcurl to perform https transfers - which appears to be where the error is coming from. It is my opinion, given the issue is very likely in curl, that a restart capability will not help at all - at least not until we find the actual root cause (still mostly an unknown, although this error is widely discussed online in other non-git places). The failure appears to be transferring a single pack file (139824442 bytes) size may be an issue, but restarting in the middle of a pack file may not solve the problem (discussed in other threads) as the file is potentially built on demand (as I understand it from GitHub) and may not be the same on the next clone attempt. What we probably will find is that a restart will be stuck in the same spot and not move forward because the failure is not at a file boundary. In addition to this, GitHub may have limits on the size of files that can be transferred, which you might be hitting (unlikely but possible). Check your plan options. I tried on a light plan, so this is unlikely but I want to exclude it. ^ permalink raw reply [flat|nested] 43+ messages in thread
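A note on the trace variables mentioned above: the variable git actually documents for packet-level tracing is GIT_TRACE_PACKET (GIT_PACKET_TRACE appears not to be recognized), and GIT_TRACE_CURL additionally records the libcurl exchange. A sketch of capturing all of it to a file:

$ GIT_TRACE=1 GIT_TRACE_PACKET=1 GIT_TRACE_CURL=1 \
    git clone https://github.com/maliit/keyboard maliit-keyboard 2>trace.log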
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 16:23 ` rsbecker @ 2024-07-08 17:06 ` ellie 2024-07-08 17:38 ` rsbecker 0 siblings, 1 reply; 43+ messages in thread From: ellie @ 2024-07-08 17:06 UTC (permalink / raw) To: rsbecker, git [-- Attachment #1: Type: text/plain, Size: 5674 bytes --] On 7/8/24 6:23 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:49 AM, ellie wrote: >> On 7/8/24 5:31 PM, rsbecker@nexbridge.com wrote: >>> On Monday, July 8, 2024 11:15 AM, ellie wrote: >>>> On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >>>>> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >>>>> >>>>> [...] >>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>> CANCEL (err 8) >>>>> [...] >>>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>>> which I already listed the reasons. An additional one is HTTPS >>>>>> downloads from github outside of git, e.g. from zip archives, for >>>>>> way larger files work fine as well. >>>>> [...] >>>>> >>>>> What if you explicitly disable HTTP/2 when cloning? >>>>> >>>>> git -c http.version=HTTP/1.1 clone ... >>>>> >>>>> should probably do this. >>>>> >>>> >>>> Thanks for the idea! I tested it: >>>> >>>> $ git -c http.version=HTTP/1.1 clone >>>> https://github.com/maliit/keyboard >>>> maliit-keyboard >>>> Cloning into 'maliit-keyboard'... >>>> remote: Enumerating objects: 23243, done. >>>> remote: Counting objects: 100% (464/464), done. >>>> remote: Compressing objects: 100% (207/207), done. >>>> error: RPC failed; curl 18 transfer closed with outstanding read data >>>> remaining >>>> error: 5361 bytes of body are still expected >>>> fetch-pack: unexpected disconnect while reading sideband packet >>>> fatal: early EOF >>>> fatal: fetch-pack: invalid index-pack output >>>> >>>> Sadly, it seems like the error is only slightly different. It was >>>> still worth a try. I contacted GitHub support a while ago but it got >>>> stuck. If there were resume available such hiccups wouldn't matter, I >>>> hope that explains why I suggested that feature. >>> >>> I don't really understand what "it got stuck" means. Is that a colloquialism? What >> got stuck? That case at GitHub? >>> >>> Have you tried git config --global http.postBuffer 524288000 >>> >>> It might help. The feature being requesting, even if possible, will probably not >> happen quickly, unless someone has a solid and simple design for this. That is why >> we are trying to figure out the root cause of your situation, which is not clear to me >> as to what exactly is failing (possibly a buffer size issue, if this is consistently failing). >> My experience, as I said before, on these symptoms, is a proxy (even a local one) >> that is in the way. If you have your linux instance on a VM, the hypervisor may not >> be configured correctly. Lack of further evidence (all we really have is the curl RPC >> failure) makes diagnosing this very difficult. >>> >> >> Thanks for your response, I appreciate it. I don't know what the hold up is for them, >> but I'm probably too unimportant, which I understand. I'm not an enterprise user, >> and >99% of others have faster connections than me which is perhaps why they >> dodge this config(?) issue. >> >> And thanks for your suggestion, but sadly it seems to have no effect: >> >> $ git config --global http.postBuffer 524288000 $ git -c http.version=HTTP/1.1 >> clone https://github.com/maliit/keyboard >> maliit-keyboard >> Cloning into 'maliit-keyboard'... 
>> remote: Enumerating objects: 23243, done. >> remote: Counting objects: 100% (464/464), done. >> remote: Compressing objects: 100% (207/207), done. >> error: RPC failed; curl 18 transfer closed with outstanding read data remaining >> error: 2444 bytes of body are still expected >> fetch-pack: unexpected disconnect while reading sideband packet >> fatal: early EOF >> fatal: fetch-pack: invalid index-pack output >> >> I'm doubtful this is solvable without either some resume or a fix from Github's end. >> But I can use SSH clone so this isn't urgent. >> >> Resume just seemed like an idea that would also help others, and it's what makes >> many other internet services work much better for me. > > I do not know which pack file is having the issue - it may be the first one. Try running with the following environment variables GIT_TRACE=true and GIT_PACKET_TRACE=true. This will not correct the problem but might give additional helpful information. git uses libcurl to perform https transfers - which appears to be where the error is coming from. It is my opinion, given the issue is very likely in curl, that a restart capability will not help at all - at least not until we find the actual root cause (still mostly an unknown, although this error is widely discussed online in other non-git places). The failure appears to be transferring a single pack file (139824442 bytes) size may be an issue, but restarting in the middle of a pack file may not solve the problem (discussed in other threads) as the file is potentially built on demand (as I understand it from GitHub) and may not be the same on the next clone attempt. What we probably will find is that a restart will be stuck in the same spot and not move forward because the failure is not at a file boundary. > > In addition to this, GitHub may have limits on the size of files that can be transferred, which you might be hitting (unlikely but possible). Check your plan options. I tried on a light plan, so this is unlikely but I want to exclude it. > > I attached the output of this command: $ GIT_TRACE=true GIT_PACKET_TRACE=true git -c http.version=HTTP/1.1 clone https://github.com/malii t/keyboard maliit-keyboard > log.txt 2>&1 My best guess is still that due to some unfortunate timeout choice, Github's end simply becomes impatient and closes the connection. Regards, Ellie [-- Attachment #2: log.txt --] [-- Type: text/plain, Size: 1090 bytes --] 18:44:33.182907 git.c:465 trace: built-in: git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... 18:44:33.186926 run-command.c:657 trace: run_command: git remote-https origin https://github.com/maliit/keyboard 18:44:33.188668 git.c:750 trace: exec: git-remote-https origin https://github.com/maliit/keyboard 18:44:33.188728 run-command.c:657 trace: run_command: git-remote-https origin https://github.com/maliit/keyboard 18:44:34.757740 run-command.c:657 trace: run_command: git index-pack --stdin --fix-thin '--keep=fetch-pack 14261 on elliedeck' --check-self-contained-and-connected 18:44:34.759305 git.c:465 trace: built-in: git index-pack --stdin --fix-thin '--keep=fetch-pack 14261 on elliedeck' --check-self-contained-and-connected error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 5858 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output ^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 17:06 ` ellie @ 2024-07-08 17:38 ` rsbecker 0 siblings, 0 replies; 43+ messages in thread From: rsbecker @ 2024-07-08 17:38 UTC (permalink / raw) To: 'ellie', git On Monday, July 8, 2024 1:06 PM, ellie wrote: >On 7/8/24 6:23 PM, rsbecker@nexbridge.com wrote: >> On Monday, July 8, 2024 11:49 AM, ellie wrote: >>> On 7/8/24 5:31 PM, rsbecker@nexbridge.com wrote: >>>> On Monday, July 8, 2024 11:15 AM, ellie wrote: >>>>> On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >>>>>> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >>>>>> >>>>>> [...] >>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>>> CANCEL (err 8) >>>>>> [...] >>>>>>> It seems extremely unlikely to me to be possibly an ISP issue, >>>>>>> for which I already listed the reasons. An additional one is >>>>>>> HTTPS downloads from github outside of git, e.g. from zip >>>>>>> archives, for way larger files work fine as well. >>>>>> [...] >>>>>> >>>>>> What if you explicitly disable HTTP/2 when cloning? >>>>>> >>>>>> git -c http.version=HTTP/1.1 clone ... >>>>>> >>>>>> should probably do this. >>>>>> >>>>> >>>>> Thanks for the idea! I tested it: >>>>> >>>>> $ git -c http.version=HTTP/1.1 clone >>>>> https://github.com/maliit/keyboard >>>>> maliit-keyboard >>>>> Cloning into 'maliit-keyboard'... >>>>> remote: Enumerating objects: 23243, done. >>>>> remote: Counting objects: 100% (464/464), done. >>>>> remote: Compressing objects: 100% (207/207), done. >>>>> error: RPC failed; curl 18 transfer closed with outstanding read >>>>> data remaining >>>>> error: 5361 bytes of body are still expected >>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>> fatal: early EOF >>>>> fatal: fetch-pack: invalid index-pack output >>>>> >>>>> Sadly, it seems like the error is only slightly different. It was >>>>> still worth a try. I contacted GitHub support a while ago but it >>>>> got stuck. If there were resume available such hiccups wouldn't >>>>> matter, I hope that explains why I suggested that feature. >>>> >>>> I don't really understand what "it got stuck" means. Is that a >>>> colloquialism? What >>> got stuck? That case at GitHub? >>>> >>>> Have you tried git config --global http.postBuffer 524288000 >>>> >>>> It might help. The feature being requesting, even if possible, will >>>> probably not >>> happen quickly, unless someone has a solid and simple design for >>> this. That is why we are trying to figure out the root cause of your >>> situation, which is not clear to me as to what exactly is failing (possibly a buffer >size issue, if this is consistently failing). >>> My experience, as I said before, on these symptoms, is a proxy (even >>> a local one) that is in the way. If you have your linux instance on a >>> VM, the hypervisor may not be configured correctly. Lack of further >>> evidence (all we really have is the curl RPC >>> failure) makes diagnosing this very difficult. >>>> >>> >>> Thanks for your response, I appreciate it. I don't know what the hold >>> up is for them, but I'm probably too unimportant, which I understand. >>> I'm not an enterprise user, and >99% of others have faster >>> connections than me which is perhaps why they dodge this config(?) issue. 
>>> >>> And thanks for your suggestion, but sadly it seems to have no effect: >>> >>> $ git config --global http.postBuffer 524288000 $ git -c >>> http.version=HTTP/1.1 clone https://github.com/maliit/keyboard >>> maliit-keyboard >>> Cloning into 'maliit-keyboard'... >>> remote: Enumerating objects: 23243, done. >>> remote: Counting objects: 100% (464/464), done. >>> remote: Compressing objects: 100% (207/207), done. >>> error: RPC failed; curl 18 transfer closed with outstanding read data >>> remaining >>> error: 2444 bytes of body are still expected >>> fetch-pack: unexpected disconnect while reading sideband packet >>> fatal: early EOF >>> fatal: fetch-pack: invalid index-pack output >>> >>> I'm doubtful this is solvable without either some resume or a fix from Github's >end. >>> But I can use SSH clone so this isn't urgent. >>> >>> Resume just seemed like an idea that would also help others, and it's >>> what makes many other internet services work much better for me. >> >> I do not know which pack file is having the issue - it may be the first one. Try >running with the following environment variables GIT_TRACE=true and >GIT_PACKET_TRACE=true. This will not correct the problem but might give >additional helpful information. git uses libcurl to perform https transfers - which >appears to be where the error is coming from. It is my opinion, given the issue is >very likely in curl, that a restart capability will not help at all - at least not until we >find the actual root cause (still mostly an unknown, although this error is widely >discussed online in other non-git places). The failure appears to be transferring a >single pack file (139824442 bytes) size may be an issue, but restarting in the middle >of a pack file may not solve the problem (discussed in other threads) as the file is >potentially built on demand (as I understand it from GitHub) and may not be the >same on the next clone attempt. What we probably will find is that a restart will be >stuck in the same spot and not move forward because the failure is not at a file >boundary. >> >> In addition to this, GitHub may have limits on the size of files that can be >transferred, which you might be hitting (unlikely but possible). Check your plan >options. I tried on a light plan, so this is unlikely but I want to exclude it. >> >> >I attached the output of this command: > >$ GIT_TRACE=true GIT_PACKET_TRACE=true git -c http.version=HTTP/1.1 clone >https://github.com/malii t/keyboard maliit-keyboard > log.txt 2>&1 > >My best guess is still that due to some unfortunate timeout choice, Github's end >simply becomes impatient and closes the connection. 18:44:33.182907 git.c:465 trace: built-in: git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... 
18:44:33.186926 run-command.c:657 trace: run_command: git remote-https origin https://github.com/maliit/keyboard 18:44:33.188668 git.c:750 trace: exec: git-remote-https origin https://github.com/maliit/keyboard 18:44:33.188728 run-command.c:657 trace: run_command: git-remote-https origin https://github.com/maliit/keyboard 18:44:34.757740 run-command.c:657 trace: run_command: git index-pack --stdin --fix-thin '--keep=fetch-pack 14261 on elliedeck' --check-self-contained-and-connected 18:44:34.759305 git.c:465 trace: built-in: git index-pack --stdin --fix-thin '--keep=fetch-pack 14261 on elliedeck' --check-self-contained-and-connected error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 5858 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output From what I could tell from the log, the operation took less than 3 seconds. How long does it appear to take for you? This does not look like a timeout. In fact, it looks like the failure happened before git was able to process any content. From what I read from the log, libcurl encountered a failure and passed that up to git, which stopped the operation. You could try putting -v into your .curlrc file or otherwise getting some verbose information out of curl where the failure is occurring. I would also suggest passing this over to the curl team for examination. I am at a loss on resolving this further, particularly if there are no intermediary components like firewalls and proxies - note that many ISPs build firewalls and proxies into their NAT routers. A curl verbose trace might show this. My home ISP in Canada has all kinds of stuff in their cable modems, which I had disabled by the tech who installed the box, and I have no issues cloning the above repo. They do have QoS limits but have not blocked https downloads. --Randall ^ permalink raw reply [flat|nested] 43+ messages in thread
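Instead of editing .curlrc, the same verbosity can be requested from git's embedded libcurl directly, which keeps the noise limited to the failing clone; a sketch:

$ GIT_CURL_VERBOSE=1 git clone https://github.com/maliit/keyboard maliit-keyboard 2>curl.log
$ GIT_TRACE_CURL=1 git clone https://github.com/maliit/keyboard maliit-keyboard 2>curl-trace.log   # fuller header/body trace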
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:31 ` rsbecker 2024-07-08 15:48 ` ellie @ 2024-07-08 16:09 ` Emanuel Czirai 1 sibling, 0 replies; 43+ messages in thread From: Emanuel Czirai @ 2024-07-08 16:09 UTC (permalink / raw) To: git Can try traffic shaping it, temporarily, just to can reproduce the issue on (presumably)anyone's linux machine, like: $ sudo tc qdisc change dev em1 root tbf rate 8kbit burst 8kbit latency 100ms (replace em1 with eth0 or whichever `ip a` reports as your LAN interface) Look at it: $ sudo tc qdisc show dev em1 qdisc tbf 8001: root refcnt 2 rate 8Kbit burst 1Kb lat 100ms $ git clone https://github.com/maliit/keyboard Cloning into 'keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. error: 153 bytes of body are still expectedMiB | 1.14 MiB/s fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output It's different for me, but maybe this traffic shaping idea might still help if properly modified? (maybe it's too fast still? or not latent enough, I don't know) I tried it again: (seems different) $ git clone https://github.com/maliit/keyboard Cloning into 'keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8) error: 7932 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output Change it: (use different values here for those 8 values and for the 100, if needed, you get the picture) $ sudo tc qdisc change dev em1 root tbf rate 8kbit burst 8kbit latency 100ms or Delete it:(restore your unshaped traffic) $ sudo tc qdisc del dev em1 root Look at it after deletion: $ sudo tc qdisc show dev em1 qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 /sbin/tc comes from package sys-apps/iproute2 6.9.0 on my Gentoo, ymmv. Good luck. On Mon, Jul 8, 2024 at 5:32 PM <rsbecker@nexbridge.com> wrote: > > On Monday, July 8, 2024 11:15 AM, ellie wrote: > >On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: > >> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: > >> > >> [...] > >>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: > >>> CANCEL (err 8) > >> [...] > >>> It seems extremely unlikely to me to be possibly an ISP issue, for > >>> which I already listed the reasons. An additional one is HTTPS > >>> downloads from github outside of git, e.g. from zip archives, for way > >>> larger files work fine as well. > >> [...] > >> > >> What if you explicitly disable HTTP/2 when cloning? > >> > >> git -c http.version=HTTP/1.1 clone ... > >> > >> should probably do this. > >> > > > >Thanks for the idea! I tested it: > > > >$ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard > >maliit-keyboard > >Cloning into 'maliit-keyboard'... > >remote: Enumerating objects: 23243, done. > >remote: Counting objects: 100% (464/464), done. > >remote: Compressing objects: 100% (207/207), done. 
> >error: RPC failed; curl 18 transfer closed with outstanding read data remaining > >error: 5361 bytes of body are still expected > >fetch-pack: unexpected disconnect while reading sideband packet > >fatal: early EOF > >fatal: fetch-pack: invalid index-pack output > > > >Sadly, it seems like the error is only slightly different. It was still worth a try. I > >contacted GitHub support a while ago but it got stuck. If there were resume > >available such hiccups wouldn't matter, I hope that explains why I suggested that > >feature. > > I don't really understand what "it got stuck" means. Is that a colloquialism? What got stuck? That case at GitHub? > > Have you tried git config --global http.postBuffer 524288000 > > It might help. The feature being requested, even if possible, will probably not happen quickly, unless someone has a solid and simple design for this. That is why we are trying to figure out the root cause of your situation, since it is not clear to me what exactly is failing (possibly a buffer size issue, if this is consistently failing). My experience with these symptoms, as I said before, is that a proxy (even a local one) is in the way. If you have your Linux instance on a VM, the hypervisor may not be configured correctly. Lack of further evidence (all we really have is the curl RPC failure) makes diagnosing this very difficult. > > ^ permalink raw reply [flat|nested] 43+ messages in thread
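Following up on the traffic-shaping suggestion above: if a plain rate limit does not trigger the failure, netem can add latency and packet loss too; a rough sketch, assuming the same em1 interface name and purely illustrative numbers:

$ sudo tc qdisc add dev em1 root netem delay 300ms 50ms loss 1% rate 256kbit
$ git clone https://github.com/maliit/keyboard
$ sudo tc qdisc del dev em1 root

Note that loss here only forces TCP retransmissions, so it simulates a slow, lossy link rather than a hard disconnect; reproducing the latter would mean killing the connection mid-transfer (for example by briefly taking the interface down).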
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:14 ` ellie 2024-07-08 15:31 ` rsbecker @ 2024-07-08 15:44 ` Konstantin Khomoutov 2024-07-08 16:27 ` rsbecker 1 sibling, 1 reply; 43+ messages in thread From: Konstantin Khomoutov @ 2024-07-08 15:44 UTC (permalink / raw) To: ellie; +Cc: rsbecker, git On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: [...] > > > error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL > > > (err 8) > > [...] > > > It seems extremely unlikely to me to be possibly an ISP issue, for which I > > > already listed the reasons. An additional one is HTTPS downloads from github > > > outside of git, e.g. from zip archives, for way larger files work fine as > > > well. > > [...] > > What if you explicitly disable HTTP/2 when cloning? [...] > Thanks for the idea! I tested it: > > $ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard Over there at SO people are trying all sorts of black magic to combat a problem which manifests itself in a way very similar to yours [1]. I'm not sure anything from there could be of help but maybe worth trying anyway, as you can override any (or almost any) of Git's configuration settings using that "-c" command-line option, so basically test round-trips should not be painstakingly long. [...] > fetch-pack: unexpected disconnect while reading sideband packet [...] > Sadly, it seems like the error is only slightly different. I actually find it interesting that in each case a sideband packet is mentioned. But quite possibly it's a red herring anyway. 1. https://stackoverflow.com/questions/66366582 ^ permalink raw reply [flat|nested] 43+ messages in thread
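To illustrate the kind of throwaway test run meant here, several of the settings mentioned in this thread can be combined on one command line without touching any config file; a sketch, with values that are examples rather than recommendations:

$ git -c http.version=HTTP/1.1 -c http.postBuffer=524288000 -c http.lowSpeedLimit=1000 -c http.lowSpeedTime=60 clone https://github.com/maliit/keyboard

http.lowSpeedLimit and http.lowSpeedTime tell git (via curl) when to give up on a transfer that has slowed down, so varying them can at least move the point at which a flaky connection fails.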
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:44 ` Konstantin Khomoutov @ 2024-07-08 16:27 ` rsbecker 2024-07-14 12:00 ` ellie ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: rsbecker @ 2024-07-08 16:27 UTC (permalink / raw) To: 'Konstantin Khomoutov', 'ellie'; +Cc: git On Monday, July 8, 2024 11:45 AM, Konstantin Khomoutov wrote: >On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: > >[...] >> > > error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >> > > CANCEL (err 8) >> > [...] >> > > It seems extremely unlikely to me to be possibly an ISP issue, for >> > > which I already listed the reasons. An additional one is HTTPS >> > > downloads from github outside of git, e.g. from zip archives, for >> > > way larger files work fine as well. >> > [...] >> > What if you explicitly disable HTTP/2 when cloning? >[...] >> Thanks for the idea! I tested it: >> >> $ git -c http.version=HTTP/1.1 clone >> https://github.com/maliit/keyboard > >Over there at SO people are trying all sorts of black magic to combat a problem >which manifests itself in a way very similar to yours [1]. I'm not sure anything from >there could be of help but maybe worth trying anyway as you can override any (or >almost any) Git's configuration setting using that "-c" >command-line option, so basically test round-trips should not be painstakingly >long. > >[...] >> fetch-pack: unexpected disconnect while reading sideband packet >[...] >> Sadly, it seems like the error is only slightly different. > >I actually find it interesting that in each case a sideband packet is mentioned. But >quite possibly it's a red herring anyway. > > 1. https://stackoverflow.com/questions/66366582 I have customers who hit this problem frequently setting up git. It is 99% of the time a firewall or proxy configuration issue, not specific to GitHub, and changes to those usually resolve the problem. The firewall and proxy can be implemented in the ISP's modem if coming from a home network. That is why I really think the OP's issue is the network, not something that can reasonably be fixed in git. I think the network speed is also a potential red herring unless the speed issue relates to the ISP's configuration. ^ permalink raw reply [flat|nested] 43+ messages in thread
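For anyone wanting to check the proxy theory on their own machine, the locally visible configuration is easy to inspect; a small sketch (both commands simply print nothing if no proxy is configured):

$ git config --show-origin --get-all http.proxy
$ env | grep -i proxy

A transparent proxy inside an ISP modem will not show up this way, of course; for that, the curl verbose trace mentioned earlier, or simply comparing behaviour on a different network such as a phone hotspot, is more telling.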
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 16:27 ` rsbecker @ 2024-07-14 12:00 ` ellie 2024-07-24 6:42 ` ellie 2025-09-08 2:34 ` Ellie 2 siblings, 0 replies; 43+ messages in thread From: ellie @ 2024-07-14 12:00 UTC (permalink / raw) To: rsbecker, 'Konstantin Khomoutov'; +Cc: git On 7/8/24 6:27 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:45 AM, Konstantin Khomoutov wrote: >> On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: >> >> [...] >>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>> CANCEL (err 8) >>>> [...] >>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>> which I already listed the reasons. An additional one is HTTPS >>>>> downloads from github outside of git, e.g. from zip archives, for >>>>> way larger files work fine as well. >>>> [...] >>>> What if you explicitly disable HTTP/2 when cloning? >> [...] >>> Thanks for the idea! I tested it: >>> >>> $ git -c http.version=HTTP/1.1 clone >>> https://github.com/maliit/keyboard >> >> Over there at SO people are trying all sorts of black magic to combat a > problem >> which manifests itself in a way very similar to yours [1]. I'm not sure > anything from >> there could be of help but maybe worth trying anyway as you can override > any (or >> almost any) Git's configuration setting using that "-c" >> command-line option, so basically test round-trips should not be > painstakingly >> long. >> >> [...] >>> fetch-pack: unexpected disconnect while reading sideband packet >> [...] >>> Sadly, it seems like the error is only slightly different. >> >> I actually find it interesting that in each case a sideband packet is > mentioned. But >> quite possibly it's a red herring anyway. >> >> 1. https://stackoverflow.com/questions/66366582 > > I have customers who hit this problem frequently setting up git. It is 99% > of the time a firewall or proxy configuration issue, not specific to GitHub, > and changes to those usually resolve the problem. The firewall and proxy can > be implemented in the ISP's modem if coming from a home network. That is why > I really think the OP's issue is the network, not something that can > reasonably fixed in git. I think the network speed is also a potential > red-herring unless the speed issue relates to the ISP's configuration. > For what it's worth, it's definitely Github-specific for me. Maybe one day Github support will respond, I can only hope. Regards, Ellie ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 16:27 ` rsbecker 2024-07-14 12:00 ` ellie @ 2024-07-24 6:42 ` ellie 2025-09-08 2:34 ` Ellie 2 siblings, 0 replies; 43+ messages in thread From: ellie @ 2024-07-24 6:42 UTC (permalink / raw) To: rsbecker, 'Konstantin Khomoutov'; +Cc: git For what it's worth, Github support now confirmed to me that it looks like they might have a timeout problem on their side, but until more people report it they likely won't address it. I appreciate their honesty. But I think it shows the vulnerability of a process without resume well. (Sorry to harp on, I thought this extra info might be interesting.) Regards, Ellie On 7/8/24 6:27 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:45 AM, Konstantin Khomoutov wrote: >> On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: >> >> [...] >>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>> CANCEL (err 8) >>>> [...] >>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>> which I already listed the reasons. An additional one is HTTPS >>>>> downloads from github outside of git, e.g. from zip archives, for >>>>> way larger files work fine as well. >>>> [...] >>>> What if you explicitly disable HTTP/2 when cloning? >> [...] >>> Thanks for the idea! I tested it: >>> >>> $ git -c http.version=HTTP/1.1 clone >>> https://github.com/maliit/keyboard >> >> Over there at SO people are trying all sorts of black magic to combat a > problem >> which manifests itself in a way very similar to yours [1]. I'm not sure > anything from >> there could be of help but maybe worth trying anyway as you can override > any (or >> almost any) Git's configuration setting using that "-c" >> command-line option, so basically test round-trips should not be > painstakingly >> long. >> >> [...] >>> fetch-pack: unexpected disconnect while reading sideband packet >> [...] >>> Sadly, it seems like the error is only slightly different. >> >> I actually find it interesting that in each case a sideband packet is > mentioned. But >> quite possibly it's a red herring anyway. >> >> 1. https://stackoverflow.com/questions/66366582 > > I have customers who hit this problem frequently setting up git. It is 99% > of the time a firewall or proxy configuration issue, not specific to GitHub, > and changes to those usually resolve the problem. The firewall and proxy can > be implemented in the ISP's modem if coming from a home network. That is why > I really think the OP's issue is the network, not something that can > reasonably fixed in git. I think the network speed is also a potential > red-herring unless the speed issue relates to the ISP's configuration. > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 16:27 ` rsbecker 2024-07-14 12:00 ` ellie 2024-07-24 6:42 ` ellie @ 2025-09-08 2:34 ` Ellie 2 siblings, 0 replies; 43+ messages in thread From: Ellie @ 2025-09-08 2:34 UTC (permalink / raw) To: rsbecker, 'Konstantin Khomoutov'; +Cc: git This has been addressed on Github's side by now, it seems to have been a Github server config issue. Nevertheless, the ability to resume a file transfer remains what some would consider essential for internet software. I still hope it'll be added one day. Thank you for the lively debate. Regards, Ellie On 7/8/24 6:27 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:45 AM, Konstantin Khomoutov wrote: >> On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: >> >> [...] >>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>> CANCEL (err 8) >>>> [...] >>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>> which I already listed the reasons. An additional one is HTTPS >>>>> downloads from github outside of git, e.g. from zip archives, for >>>>> way larger files work fine as well. >>>> [...] >>>> What if you explicitly disable HTTP/2 when cloning? >> [...] >>> Thanks for the idea! I tested it: >>> >>> $ git -c http.version=HTTP/1.1 clone >>> https://github.com/maliit/keyboard >> >> Over there at SO people are trying all sorts of black magic to combat a > problem >> which manifests itself in a way very similar to yours [1]. I'm not sure > anything from >> there could be of help but maybe worth trying anyway as you can override > any (or >> almost any) Git's configuration setting using that "-c" >> command-line option, so basically test round-trips should not be > painstakingly >> long. >> >> [...] >>> fetch-pack: unexpected disconnect while reading sideband packet >> [...] >>> Sadly, it seems like the error is only slightly different. >> >> I actually find it interesting that in each case a sideband packet is > mentioned. But >> quite possibly it's a red herring anyway. >> >> 1. https://stackoverflow.com/questions/66366582 > > I have customers who hit this problem frequently setting up git. It is 99% > of the time a firewall or proxy configuration issue, not specific to GitHub, > and changes to those usually resolve the problem. The firewall and proxy can > be implemented in the ISP's modem if coming from a home network. That is why > I really think the OP's issue is the network, not something that can > reasonably fixed in git. I think the network speed is also a potential > red-herring unless the speed issue relates to the ISP's configuration. > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-06-07 23:28 With big repos and slower connections, git clone can be hard to work with ellie 2024-06-07 23:33 ` rsbecker @ 2024-09-30 21:01 ` Ellie 1 sibling, 0 replies; 43+ messages in thread From: Ellie @ 2024-09-30 21:01 UTC (permalink / raw) To: git My apologies for bringing this up again, but for what it's worth, this git repository I can't even clone at depth 1: $ git clone --depth 1 https://github.com/alf632/terrain3dglitch Cloning into 'terrain3dglitch'... remote: Enumerating objects: 697, done. remote: Counting objects: 100% (697/697), done. remote: Compressing objects: 100% (439/439), done. error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8) error: 1754 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output The problem seems to be possibly amplified by a timeout config issue from github's side, but also made worse by depth 1 already being 100MB+. Downloading that amount without resume isn't feasible for everyone. I'm assuming if I need all files and sub dirs, there's no workaround here? I don't want to waste anybody's time, I'm just hoping to provide some further data points that in some edge cases, this can be impactful. (And sorry if I did something silly while cloning and didn't realize.) Regards, Ellie On 6/8/24 1:28 AM, ellie wrote: > Dear git team, > > I'm terribly sorry if this is the wrong place, but I'd like to suggest a > potential issue with "git clone". > > The problem is that any sort of interruption or connection issue, no > matter how brief, causes the clone to stop and leave nothing behind: > > $ git clone https://github.com/Nheko-Reborn/nheko > Cloning into 'nheko'... > remote: Enumerating objects: 43991, done. > remote: Counting objects: 100% (6535/6535), done. > remote: Compressing objects: 100% (1449/1449), done. > error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: > CANCEL (err 8) > error: 2771 bytes of body are still expected > fetch-pack: unexpected disconnect while reading sideband packet > fatal: early EOF > fatal: fetch-pack: invalid index-pack output > $ cd nheko > bash: cd: nheko: No such file or director > > In my experience, this can be really impactful with 1. big repositories > and 2. unreliable internet - which I would argue isn't unheard of! E.g. > a developer may work via mobile connection on a business trip. The > result can even be that a repository is uncloneable for some users! > > This has left me in the absurd situation where I was able to download a > tarball via HTTPS from the git hoster just fine, even way larger binary > release items, thanks to the browser's HTTPS resume. And yet a simple > git clone of the same project failed repeatedly. > > My deepest apologies if I missed an option to fix or address this. But > summed up, please consider making git clone recover from hiccups. > > Regards, > > Ellie > > PS: I've seen git hosters have apparent proxy bugs, like timing out > slower git clone connections from the server side even if the transfer > is ongoing. A git auto-resume would reduce the impact of that, too. > > > ^ permalink raw reply [flat|nested] 43+ messages in thread
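In case it is useful as a further data point: when even a depth-1 clone is dominated by large blobs, a blob-filtered partial clone can shrink the initial transfer by deferring big blobs to later, smaller requests; a sketch using the same repository (GitHub advertises support for these filters, but whether it dodges the disconnect depends on where the hiccup actually happens):

$ git clone --depth 1 --filter=blob:limit=1m https://github.com/alf632/terrain3dglitch

Blobs over 1 MiB are then fetched separately when checkout needs them, so an interruption wastes a smaller download, though the total amount transferred stays roughly the same.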
end of thread, other threads:[~2025-09-08 2:44 UTC | newest] Thread overview: 43+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-06-07 23:28 With big repos and slower connections, git clone can be hard to work with ellie 2024-06-07 23:33 ` rsbecker 2024-06-08 0:03 ` ellie 2024-06-08 0:35 ` rsbecker 2024-06-08 0:46 ` ellie 2024-06-08 8:43 ` Jeff King 2024-06-08 9:40 ` ellie 2024-06-08 9:44 ` ellie 2024-06-08 10:38 ` Jeff King 2024-06-08 10:35 ` Jeff King 2024-06-08 11:05 ` ellie 2024-06-08 19:00 ` Junio C Hamano 2024-06-08 20:16 ` ellie 2024-06-10 6:46 ` Patrick Steinhardt 2024-06-10 19:04 ` Emily Shaffer 2024-06-10 20:34 ` Junio C Hamano 2024-06-10 21:55 ` ellie 2024-06-13 10:10 ` Toon claes 2024-06-11 6:31 ` Jeff King 2024-06-11 15:12 ` Junio C Hamano 2024-06-29 1:53 ` Sitaram Chamarty 2024-06-11 6:26 ` Jeff King 2024-06-11 19:40 ` Ivan Frade 2024-07-07 23:42 ` ellie 2024-07-08 1:27 ` rsbecker 2024-07-08 2:28 ` ellie 2024-07-08 12:30 ` rsbecker 2024-07-08 12:41 ` ellie 2024-07-08 14:32 ` Konstantin Khomoutov 2024-07-08 15:02 ` rsbecker 2024-07-08 15:14 ` ellie 2024-07-08 15:31 ` rsbecker 2024-07-08 15:48 ` ellie 2024-07-08 16:23 ` rsbecker 2024-07-08 17:06 ` ellie 2024-07-08 17:38 ` rsbecker 2024-07-08 16:09 ` Emanuel Czirai 2024-07-08 15:44 ` Konstantin Khomoutov 2024-07-08 16:27 ` rsbecker 2024-07-14 12:00 ` ellie 2024-07-24 6:42 ` ellie 2025-09-08 2:34 ` Ellie 2024-09-30 21:01 ` Ellie