* With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-07 23:28 UTC
  To: git

Dear git team,

I'm terribly sorry if this is the wrong place, but I'd like to report a potential issue with "git clone".

The problem is that any interruption or connection issue, no matter how brief, causes the clone to stop and leave nothing behind:

$ git clone https://github.com/Nheko-Reborn/nheko
Cloning into 'nheko'...
remote: Enumerating objects: 43991, done.
remote: Counting objects: 100% (6535/6535), done.
remote: Compressing objects: 100% (1449/1449), done.
error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8)
error: 2771 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
$ cd nheko
bash: cd: nheko: No such file or directory

In my experience, this can be really impactful with 1. big repositories and 2. unreliable internet - which I would argue isn't unheard of! E.g. a developer may work over a mobile connection on a business trip. The result can even be that a repository is uncloneable for some users!

This has left me in the absurd situation where I was able to download a tarball via HTTPS from the git hoster just fine, even far larger binary release assets, thanks to the browser's HTTPS resume. And yet a simple git clone of the same project failed repeatedly.

My deepest apologies if I missed an option to fix or address this. But summed up, please consider making git clone recover from hiccups.

Regards,

Ellie

PS: I've seen git hosters with apparent proxy bugs, like timing out slower git clone connections from the server side even while the transfer is ongoing. A git auto-resume would reduce the impact of that, too.
* RE: With big repos and slower connections, git clone can be hard to work with
  From: rsbecker @ 2024-06-07 23:33 UTC
  To: 'ellie', git

On Friday, June 7, 2024 7:28 PM, ellie wrote:
>The problem is that any interruption or connection issue, no matter how
>brief, causes the clone to stop and leave nothing behind:
>[...]
>But summed up, please consider making git clone recover from hiccups.

I suggest that you look into two git topics: --depth, which controls how much history is obtained in a clone, and sparse-checkout, which restricts which part of the repository you retrieve. You can prune the contents of the repository so that the clone is faster if you do not need all of the history or all of the files. This is typically done in complex, large repositories, particularly those used for production support as release repositories.

--Randall
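As a rough sketch of that suggestion (the repository URL and the directory names are placeholders; the options are the standard ones from git-clone(1) and git-sparse-checkout(1)):

$ # shallow clone: only the most recent commit of each fetched branch
$ git clone --depth=1 https://example.com/big-repo.git
$ cd big-repo
$ # limit the working tree to the directories actually needed
$ git sparse-checkout set src docs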
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 0:03 UTC
  To: rsbecker, git

Thanks, this is very helpful as an emergency workaround!

Nevertheless, I usually want the entire history, especially since I wouldn't mind waiting half an hour. But without resume, I've regularly seen a clone fail to complete even when I give it the time, while far longer downloads in the browser finish fine. The key problem here seems to be the lack of any resume.

I hope this helps to explain why I made the suggestion.

Regards,

Ellie

On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote:
> I suggest that you look into two git topics: --depth, which controls how much
> history is obtained in a clone, and sparse-checkout, which restricts which part
> of the repository you retrieve. [...]
* RE: With big repos and slower connections, git clone can be hard to work with
  From: rsbecker @ 2024-06-08 0:35 UTC
  To: 'ellie', git

On Friday, June 7, 2024 8:03 PM, ellie wrote:
>Nevertheless, I usually want the entire history, especially since I wouldn't mind
>waiting half an hour. But without resume, I've regularly seen a clone fail to
>complete even when I give it the time, while far longer downloads in the browser
>finish fine. The key problem here seems to be the lack of any resume.
>[...]

Consider doing the clone with --depth=1, then using git fetch --depth=n as the resume. There are other options that effectively give you a resume, including --deepen=n.

Build automation, like Jenkins, uses this to speed up the clone/checkout.
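To make the incremental approach above concrete (the URL and the depth values are placeholders; --depth, --deepen and --unshallow are standard git-clone(1)/git-fetch(1) options, and note that --depth implies --single-branch unless overridden):

$ git clone --depth=1 https://example.com/big-repo.git   # small initial transfer
$ cd big-repo
$ git fetch --deepen=100     # pull in the next chunk of history
$ git fetch --deepen=100     # repeat, or re-run after a dropped connection
$ git fetch --unshallow      # finally fetch whatever history is still missing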
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 0:46 UTC
  To: rsbecker, git

The deepening worked perfectly, thank you so much! I hope a resume will still be considered, however, if only to help out newcomers.

Regards,

Ellie

On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote:
> Consider doing the clone with --depth=1, then using git fetch --depth=n as the
> resume. There are other options that effectively give you a resume, including
> --deepen=n.
>
> Build automation, like Jenkins, uses this to speed up the clone/checkout.
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-08 8:43 UTC
  To: ellie; +Cc: rsbecker, git

On Sat, Jun 08, 2024 at 02:46:38AM +0200, ellie wrote:

> The deepening worked perfectly, thank you so much! I hope a resume will
> still be considered, however, if only to help out newcomers.

Because the packfile to send the user is created on the fly, making a clone fully resumable is tricky (a second clone may get an equivalent but slightly different pack due to new objects entering the repo, or even raciness between threads).

One strategy people have worked on is for servers to point clients at static packfiles (which _do_ remain byte-for-byte identical, and can be resumed) to get some of the objects. But it requires some scheme on the server side to decide when and how to create those packfiles. So while there is support inside Git itself for this idea (both on the server and client side), I don't know of any servers where it is in active use.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 9:40 UTC
  To: Jeff King; +Cc: rsbecker, git

Sorry if I'm misunderstanding, and I assume this is a naive suggestion that may not work in some way: but couldn't git somehow keep a local cache of all the objects it has already fully downloaded? And then otherwise start over cleanly (and automatically), but take the objects it already has from that cache? In practice, that might already be enough to get through a longer clone despite occasional hiccups.

Sorry, I'm really not qualified to make good suggestions; it's just that the current situation feels frustrating as an outside user.

Regards,

Ellie

On 6/8/24 10:43 AM, Jeff King wrote:
> Because the packfile to send the user is created on the fly, making a
> clone fully resumable is tricky (a second clone may get an equivalent
> but slightly different pack due to new objects entering the repo, or
> even raciness between threads).
> [...]
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 9:44 UTC
  To: Jeff King; +Cc: rsbecker, git

Another idea that is probably silly in some way too: couldn't git, after the first error, automatically start over and do that whole --depth=1 followed by --deepen dance itself? I feel like anything that avoids having to know about and manually run that process would be an improvement for people affected by this often.

Regards,

Ellie

On 6/8/24 11:40 AM, ellie wrote:
> Sorry if I'm misunderstanding, and I assume this is a naive suggestion
> that may not work in some way: but couldn't git somehow keep a local cache
> of all the objects it has already fully downloaded? [...]
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-08 10:38 UTC
  To: ellie; +Cc: rsbecker, git

On Sat, Jun 08, 2024 at 11:44:09AM +0200, ellie wrote:

> Another idea that is probably silly in some way too: couldn't git, after the
> first error, automatically start over and do that whole --depth=1 followed by
> --deepen dance itself? [...]

I'm skeptical that shallow-cloning and deepening is a good strategy in general. Serving shallow clones like this is expensive for the server, and there's more network overhead in the back-and-forth requests. It also only slices up the repository in one dimension. There could be a single tree that's really big, or even a single blob that you can never get past.

So yes, it may work sometimes, but I don't think it's something we should codify.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-08 10:35 UTC
  To: ellie; +Cc: rsbecker, git

On Sat, Jun 08, 2024 at 11:40:47AM +0200, ellie wrote:

> Sorry if I'm misunderstanding, and I assume this is a naive suggestion
> that may not work in some way: but couldn't git somehow keep a local cache
> of all the objects it has already fully downloaded? [...]

The problem is that the client/server communication does not share an explicit list of objects. Instead, the client tells the server some points in the object graph that it wants (i.e., the tips of some branches that it wants to fetch) and that it already has (existing branches, or nothing in the case of a clone), and then the server can do its own graph traversal to figure out what needs to be sent.

When you've got a partially completed clone, the client can figure out which objects it received. But it can't tell the server "hey, I have commit XYZ, don't send that", because the server would assume that having XYZ means it has all of the objects reachable from there (parent commits, their trees and blobs, and so on). And the pack does not come in that order.

And even if there were a way to disable that reachability analysis and send a "raw" set of objects that we already have, it would be prohibitively large. The full set of sha1 hashes for linux.git is over 200MB. So naively saying "don't send object X, I have it" would approach that size.

It's possible the client could do some analysis to see if it has complete segments of history. In practice it won't, because of the way we order packfiles (they're split by type, and then roughly reverse-chronological through history). If the server re-ordered its response to fill history from the bottom up, it would be possible. We don't do that now because it's not really the optimal order for accessing objects in day-to-day use, and the packfile the server sends is stored directly on disk by the client.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 11:05 UTC
  To: Jeff King; +Cc: rsbecker, git

I see! Unfortunate, but I'm thankful for your detailed explanation.

The "shallow-cloning and deepening is [...] expensive for the server" makes me sadder about the current situation. I don't like that I need to make the server's life hard just because my connection is shaky... :-|

> It's possible the client could do some analysis to see if it has
> complete segments of history. In practice it won't, because of the way
> we order packfiles (they're split by type, and then roughly
> reverse-chronological through history). If the server re-ordered its
> response to fill history from the bottom up, it would be possible.

I wonder if that would be the most feasible idea, if any at all...?

My main take-away is that I don't know enough to suggest a good way out, and that git is even more impressive and complex tech than I thought. Thanks so much for the detailed responses, and I hope at least some of my uninformed rambling was of any use.

Regards,

Ellie

On 6/8/24 12:35 PM, Jeff King wrote:
> The problem is that the client/server communication does not share an
> explicit list of objects. [...]
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Junio C Hamano @ 2024-06-08 19:00 UTC
  To: Jeff King; +Cc: ellie, rsbecker, git

Jeff King <peff@peff.net> writes:

> One strategy people have worked on is for servers to point clients at
> static packfiles (which _do_ remain byte-for-byte identical, and can be
> resumed) to get some of the objects. But it requires some scheme on the
> server side to decide when and how to create those packfiles. So while
> there is support inside Git itself for this idea (both on the server and
> client side), I don't know of any servers where it is in active use.

Didn't the bundle URI work originate at GitHub? I thought this use case was a reasonable match to the mechanism.
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-08 20:16 UTC
  To: Junio C Hamano, Jeff King; +Cc: rsbecker, git

(I'm probably not the person to answer fully. But I can say HTTPS git clones from GitHub don't ever resume for me, if that's informative.)

On 6/8/24 9:00 PM, Junio C Hamano wrote:
> Didn't the bundle URI work originate at GitHub? I thought this use
> case was a reasonable match to the mechanism.
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Patrick Steinhardt @ 2024-06-10 6:46 UTC
  To: Jeff King; +Cc: ellie, rsbecker, git

On Sat, Jun 08, 2024 at 04:43:23AM -0400, Jeff King wrote:

> One strategy people have worked on is for servers to point clients at
> static packfiles (which _do_ remain byte-for-byte identical, and can be
> resumed) to get some of the objects. But it requires some scheme on the
> server side to decide when and how to create those packfiles. So while
> there is support inside Git itself for this idea (both on the server and
> client side), I don't know of any servers where it is in active use.

At GitLab, we have started to roll out use of bundle URIs so that we can pregenerate them and thus reduce load. The next step to evaluate in this context is whether we can easily reuse that infrastructure to eventually enable resumable clones via such bundle URIs. I assume that it cannot be that hard to make this work.

That of course wouldn't be a perfect solution, as the clone can only be resumed as long as such a pregenerated bundle continues to exist on the server. But it should still be way better compared to the status quo.

Patrick
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Emily Shaffer @ 2024-06-10 19:04 UTC
  To: Jeff King; +Cc: ellie, rsbecker, git

On Sat, Jun 8, 2024 at 1:43 AM Jeff King <peff@peff.net> wrote:
>
> One strategy people have worked on is for servers to point clients at
> static packfiles (which _do_ remain byte-for-byte identical, and can be
> resumed) to get some of the objects. But it requires some scheme on the
> server side to decide when and how to create those packfiles. So while
> there is support inside Git itself for this idea (both on the server and
> client side), I don't know of any servers where it is in active use.

We use packfile offloading heavily at Google (any repositories hosted at *.googlesource.com, as well as our internal-facing hosting). It works quite well for us for scaling large projects like Android and Chrome; we've been using it for some time now and are happy with it.

However, one thing that's missing is the resumable download Ellie is describing. With a clone which has been turned into a packfile fetch from a different data store, it *should* be resumable. But the client currently lacks the ability to do that. (This just came up for us internally the other day, and we ended up moving an internal bug to https://git.g-issues.gerritcodereview.com/issues/345241684.)

After a resumed clone like this, you may not necessarily have the latest state - for example, you may lose connection with 90% of the clone finished, then not get connection back for some days, after which point upstream has moved on as Peff described elsewhere in this thread. But it would still probably be cheaper to resume that 10% of the packfile fetch from the offloaded data store, and then do an incremental fetch back to the server to get the couple of days of updates on top, compared to starting over from zero with the server.

It seems to me that packfile URIs and bundle URIs are similar enough that we could work out similar logic for both, no? Or maybe there's something I'm missing about the way bundle offloading differs from packfiles.

 - Emily
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Junio C Hamano @ 2024-06-10 20:34 UTC
  To: Emily Shaffer; +Cc: Jeff King, ellie, rsbecker, git

Emily Shaffer <nasamuffin@google.com> writes:

> It seems to me that packfile URIs and bundle URIs are similar enough
> that we could work out similar logic for both, no? Or maybe there's
> something I'm missing about the way bundle offloading differs from
> packfiles.

Probably we can deprecate one and let the other one take over? It seems that bundle URIs have plenty of documentation, but the only hit for the packfile URI side I find in the output of

    $ git grep -i 'pack.*file.*uri' Documentation

is the description of how the designed protocol extension is supposed to work in Documentation/technical/packfile-uri.txt, and not even the configuration variable uploadpack.blobPackfileURI that controls the "experimental" feature is documented.

Perhaps whoever was adding the feature to the public side stopped after pushing out the absolute minimum and lost interest or something? We should update the documentation to reflect the current status (e.g. is it still experimental? what more work do we need on top of it to make it no longer experimental?), add at least a minimum description for server operators of how to configure it on the server side, etc. (I am assuming that the end-user does not have to do anything to get the feature, as long as their version of Git is recent enough.)

Thanks.
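For anyone curious what that undocumented server-side knob roughly looks like, the design note in Documentation/technical/packfile-uri.txt sketches one config entry per offloaded blob, along these lines (the object hash, pack hash and URL below are made-up placeholders, and the exact value syntax should be checked against that document before relying on it):

[uploadpack]
	blobPackfileUri = 0123abcd... 4567ef89... https://cdn.example.com/big-blob.pack

A client that understands the feature would then fetch that pack directly from the CDN over plain HTTPS, where range requests (and therefore resuming) are at least possible in principle.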
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-06-10 21:55 UTC
  To: Junio C Hamano, Emily Shaffer; +Cc: Jeff King, rsbecker, git

Sorry for yet another total newcomer/outsider question:

Is a bundle or pack file something any regular git HTTPS instance would naturally provide when set up in the usual ways? Like, if resume relied on that, would it work when following the standard smart HTTP setup procedure https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP (sorry if I got the wrong link) and then git cloning from that? That would give such a resume feature the best availability, if it ever came to be.

Regards,

Ellie

On 6/10/24 10:34 PM, Junio C Hamano wrote:
> Probably we can deprecate one and let the other one take over? [...]
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Toon Claes @ 2024-06-13 10:10 UTC
  To: ellie, Junio C Hamano, Emily Shaffer; +Cc: Jeff King, rsbecker, git

ellie <el@horse64.org> writes:

> Sorry for yet another total newcomer/outsider question:

Don't apologize for asking these questions, you're more than welcome.

> Is a bundle or pack file something any regular git HTTPS instance
> would naturally provide when set up in the usual ways?

Yes and no. The bundle and packfile formats can be used in many places. Packfiles are used to transfer a bunch of objects, or to store them locally in Git's object database. A bundle is a packfile with a leading header describing refs. You can read about that at https://git-scm.com/docs/gitformat-bundle.

> Like, if resume relied on that, would it work when following the
> standard smart HTTP setup procedure
> https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP (sorry if
> I got the wrong link) and then git cloning from that?

As mentioned elsewhere in the thread, on clone (and fetch) the client negotiates with the server which objects to download. Because the state of the remote repository can change between clones, the result of this negotiation will change too. This means the content of the packfile sent over might differ, which is disruptive for caching these files. That's why bundle URIs and packfile URIs were proposed.

In the case of a bundle URI, the server tells the client to download a pre-made bundle before starting the negotiation. This bundle can be stored on a CDN or whatever static HTTP(S) server. But it requires the server to create it, store it, and tell the client about it; that part is not built into Git itself at the moment.

This is not really tied to the smart HTTP protocol, because it can be used over SSH as well. But when such a file is stored on a regular HTTP server, we can rely on resumable downloads. Only after that bundle is downloaded does the client start the negotiation with the server to get the missing objects and refs (which should be a small subset when the bundle is recent).

--
Toon
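A small sketch of the bundle mechanism described above (the repository paths and URLs are placeholders; git bundle and the --bundle-uri clone option are documented in git-bundle(1) and git-clone(1), while advertising bundle URIs automatically remains up to the hosting side):

$ # server/CDN side: pre-generate a bundle containing all refs and their history
$ git -C /srv/git/big-repo.git bundle create /var/www/big-repo.bundle --all

$ # client side: seed the clone from the pre-made static bundle, then fetch
$ # whatever newer objects the origin has on top of it
$ git clone --bundle-uri=https://cdn.example.com/big-repo.bundle \
      https://example.com/big-repo.git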
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-11 6:31 UTC
  To: Junio C Hamano; +Cc: Emily Shaffer, ellie, rsbecker, git

On Mon, Jun 10, 2024 at 01:34:12PM -0700, Junio C Hamano wrote:

> Probably we can deprecate one and let the other one take over? It
> seems that bundle URIs have plenty of documentation, but the only hit
> for the packfile URI side I find [...] is the description of how the designed
> protocol extension is supposed to work in Documentation/technical/packfile-uri.txt,
> and not even the configuration variable uploadpack.blobPackfileURI that
> controls the "experimental" feature is documented.

I think they serve two different purposes. A packfile URI does not have any connectivity guarantees. So it lets a server say "here's all the objects, except for XYZ which you should fetch from this URL". That's good for offloading pieces of a clone, like single large objects.

Whereas bundle URIs require very little cooperation from the server. While a server can advertise bundle URIs, it doesn't need to know about the particular bundle a client grabbed. The client comes back with the usual have/want, just like any other fetching client.

At least that's my understanding. I have to admit I didn't follow the recent bundle URI work all that closely.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Junio C Hamano @ 2024-06-11 15:12 UTC
  To: Jeff King; +Cc: Emily Shaffer, ellie, rsbecker, git

Jeff King <peff@peff.net> writes:

> I think they serve two different purposes. A packfile URI does not have
> any connectivity guarantees. So it lets a server say "here's all the
> objects, except for XYZ which you should fetch from this URL". That's
> good for offloading pieces of a clone, like single large objects.
>
> Whereas bundle URIs require very little cooperation from the server.
> While a server can advertise bundle URIs, it doesn't need to know about
> the particular bundle a client grabbed. The client comes back with the
> usual have/want, just like any other fetching client.

Yes, a bundle being a self-contained "object-store + tips", it is a much more suitable building block for offloading clone traffic.
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Sitaram Chamarty @ 2024-06-29 1:53 UTC
  To: Junio C Hamano; +Cc: Jeff King, Emily Shaffer, ellie, rsbecker, git, konstantin

On Tue, Jun 11, 2024 at 08:12:12AM -0700, Junio C Hamano wrote:

> Yes, a bundle being a self-contained "object-store + tips", it is
> a much more suitable building block for offloading clone traffic.

[Adding mricon to cc]

Apologies for jumping in so late...

Gitolite supports this out of the box. Just a couple of lines changed in the rc file, and users can run `rsync` (still mediated and access-controlled by gitolite) to get a bundle. Admittedly the first call by someone may take some time, but it *is* resumable.

See [1] for details.

[1]: https://github.com/sitaramc/gitolite/blob/master/src/commands/rsync
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Jeff King @ 2024-06-11 6:26 UTC
  To: Emily Shaffer; +Cc: ellie, rsbecker, git

On Mon, Jun 10, 2024 at 12:04:30PM -0700, Emily Shaffer wrote:

> We use packfile offloading heavily at Google (any repositories hosted
> at *.googlesource.com, as well as our internal-facing hosting). It
> works quite well for us for scaling large projects like Android and
> Chrome; we've been using it for some time now and are happy with it.

Cool! I'm glad to hear it is in use.

It might be helpful for other potential users if you can share how you decide when to create the off-loaded packfiles, what goes in them, and so on. IIRC the server-side config is mostly geared at stuffing a few large blobs into a pack (since each blob must have an individual config key). Maybe JGit (which I'm assuming is what powers googlesource) has better options there.

> However, one thing that's missing is the resumable download Ellie is
> describing. [...] But it would still probably be cheaper to resume that 10%
> of the packfile fetch from the offloaded data store, and then do an
> incremental fetch back to the server to get the couple of days of updates
> on top, compared to starting over from zero with the server.

I do agree that resuming the offloaded parts, even if it is a few days later, will generally be beneficial.

For packfile offloading, I think the server has to be aware of what's in the packfiles (since it has to know not to send you those objects). So if you got all of the server's response packfile but didn't finish the offloaded packfiles, it's a no-brainer to finish downloading them, completing your old clone. And then you can fetch on top of that to get fully up to date. But if you didn't get all of the server's response, then you have to contact it again. If it points you to the same offloaded packfile, you can resume that transfer. But if it has moved on and doesn't advertise that packfile anymore, I don't think it's useful.

Whereas with bundle URI offloading, I think the client could always resume grabbing the bundle. Whatever it got is going to be useful, because it will tell the server what it already has in the usual way (packfile offloads can't do that, because the individual packfiles don't enforce the usual reachability guarantees).

> It seems to me that packfile URIs and bundle URIs are similar enough
> that we could work out similar logic for both, no? Or maybe there's
> something I'm missing about the way bundle offloading differs from
> packfiles.

They are pretty similar, but I think the resume strategy would be a little different, based on what I wrote above. In general I don't think packfile URIs are that useful for resuming, compared to bundle URIs.

-Peff
* Re: With big repos and slower connections, git clone can be hard to work with
  From: Ivan Frade @ 2024-06-11 19:40 UTC
  To: Jeff King; +Cc: Emily Shaffer, ellie, rsbecker, git

On Mon, Jun 10, 2024 at 11:27 PM Jeff King <peff@peff.net> wrote:
>
> It might be helpful for other potential users if you can share how you
> decide when to create the off-loaded packfiles, what goes in them, and
> so on. IIRC the server-side config is mostly geared at stuffing a few
> large blobs into a pack (since each blob must have an individual config
> key). Maybe JGit (which I'm assuming is what powers googlesource) has
> better options there.

IIRC the upstream config was oriented toward offloading individual blobs. In JGit/Google we do the offloading at the pack level. We write to storage and the CDN when creating a pack, and keep the offloaded location in the pack metadata. We do this only under certain conditions (GC, above a certain size, ...). At serving time, if we see that we need to send a pack "as-is" (all objects inside are needed) and it has an offload, then we mark it to send the URL instead of the contents. As the offload is just a copy of the pack, we can use the pack bitmap to know what is there or not.

> However, one thing that's missing is the resumable download Ellie is
> describing.

Another thing missing in the offload story is support for offloads in non-HTTP protocols, e.g. after cloning via my-protocol://, being able to fetch my-protocol://blah/blah URLs.

Ivan
* Re: With big repos and slower connections, git clone can be hard to work with
  From: ellie @ 2024-07-07 23:42 UTC
  To: rsbecker, git

I have now encountered a repository where even --deepen=1 is bound to fail, because it pulls in something fairly large that takes a few minutes. (Possibly the server proxy has a faulty timeout setting that punishes slow connections, but for connections that are unreliable on the client side the problem would be the same.)

So this workaround sadly doesn't seem to cover all cases of resume.

Regards,

Ellie

On 6/8/24 2:46 AM, ellie wrote:
> The deepening worked perfectly, thank you so much! I hope a resume will
> still be considered, however, if only to help out newcomers.
> [...]
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-07 23:42 ` ellie @ 2024-07-08 1:27 ` rsbecker 2024-07-08 2:28 ` ellie 0 siblings, 1 reply; 43+ messages in thread From: rsbecker @ 2024-07-08 1:27 UTC (permalink / raw) To: 'ellie', git On Sunday, July 7, 2024 7:42 PM, ellie wrote: >I have now encountered a repository where even --deepen=1 is bound to be failing >because it pulls in something fairly large that takes a few minutes. (Possibly, the >server proxy has a faulty timeout setting that punishes slow connections, but for >connections unreliable on the client side the problem would be the same.) > >So this workaround sadly doesn't seem to cover all cases of resume. > >Regards, > >Ellie > >On 6/8/24 2:46 AM, ellie wrote: >> The deepening worked perfectly, thank you so much! I hope a resume >> will still be considered however, if even just to help out newcomers. >> >> Regards, >> >> Ellie >> >> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote: >>> On Friday, June 7, 2024 8:03 PM, ellie wrote: >>>> Subject: Re: With big repos and slower connections, git clone can be >>>> hard to work with >>>> >>>> Thanks, this is very helpful as an emergency workaround! >>>> >>>> Nevertheless, I usually want the entire history, especially since I >>>> wouldn't mind waiting half an hour. But without resume, I've >>>> encountered it regularly that it just won't complete even if I give >>>> it the time, while way longer downloads in the browser would. The >>>> key problem here seems to be the lack of any resume. >>>> >>>> I hope this helps to understand why I made the suggestion. >>>> >>>> Regards, >>>> >>>> Ellie >>>> >>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote: >>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote: >>>>>> I'm terribly sorry if this is the wrong place, but I'd like to >>>>>> suggest a potential issue with "git clone". >>>>>> >>>>>> The problem is that any sort of interruption or connection issue, >>>>>> no matter how brief, causes the clone to stop and leave nothing behind: >>>>>> >>>>>> $ git clone https://github.com/Nheko-Reborn/nheko >>>>>> Cloning into 'nheko'... >>>>>> remote: Enumerating objects: 43991, done. >>>>>> remote: Counting objects: 100% (6535/6535), done. >>>>>> remote: Compressing objects: 100% (1449/1449), done. >>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>> CANCEL (err 8) >>>>>> error: 2771 bytes of body are still expected >>>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>>> fatal: early EOF >>>>>> fatal: fetch-pack: invalid index-pack output $ cd nheko >>>>>> bash: cd: nheko: No such file or director >>>>>> >>>>>> In my experience, this can be really impactful with 1. big >>>>>> repositories and 2. >>>>>> unreliable internet - which I would argue isn't unheard of! E.g. >>>>>> a developer may work via mobile connection on a business trip. The >>>>>> result can even be that a repository is uncloneable for some users! >>>>>> >>>>>> This has left me in the absurd situation where I was able to >>>>>> download a tarball via HTTPS from the git hoster just fine, even >>>>>> way larger binary release items, thanks to the browser's HTTPS >>>>>> resume. And yet a simple git clone of the same project failed repeatedly. >>>>>> >>>>>> My deepest apologies if I missed an option to fix or address this. >>>>>> But summed up, please consider making git clone recover from hiccups. 
>>>>>> >>>>>> Regards, >>>>>> >>>>>> Ellie >>>>>> >>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing >>>>>> out slower git clone connections from the server side even if the >>>>>> transfer is ongoing. A git auto-resume would reduce the impact of >>>>>> that, too. >>>>> >>>>> I suggest that you look into two git topics: --depth, which >>>>> controls how much >>>> history is obtained in a clone, and sparse-checkout, which describes >>>> the part of the repository you will retrieve. You can prune the >>>> contents of the repository so that clone is faster, if you do not >>>> need all of the history, or all of the files. This is typically done >>>> in complex large repositories, particularly those used for >>>> production support as release repositories. >>> >>> Consider doing the clone with --depth=1 then using git fetch >>> --depth=n as the resume. There are other options that effectively >>> give you a resume, including --deepen=n. >>> >>> Build automation, like Jenkins, uses this to speed up the clone/checkout. Can you please provide more details on this? It is difficult to understand your issue without knowing what situation is failing? What size file? Is this a large single pack file? Can you reproduce this with a script we can try? ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 1:27 ` rsbecker @ 2024-07-08 2:28 ` ellie 2024-07-08 12:30 ` rsbecker 0 siblings, 1 reply; 43+ messages in thread From: ellie @ 2024-07-08 2:28 UTC (permalink / raw) To: rsbecker, git I was intending to suggest that depending on the largest object in the repository, resume may remain a concern for lower end users. My apologies for being unclear. As for my concrete problem, I can only guess what's happening, maybe github's HTTPS proxy too eagerly discarding slow connections: $ git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8) error: 2507 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output A deepen seems to fail for this repo since one deepen step already gets killed off. Git HTTPS clones from any other hoster I tried, including gitlab.com, work fine, as do git SSH clones from github.com. Sorry for the long tangent. Basically, my point was just that resume still seems like a good idea even with deepen existing. Regards, Ellie On 7/8/24 3:27 AM, rsbecker@nexbridge.com wrote: > On Sunday, July 7, 2024 7:42 PM, ellie wrote: >> I have now encountered a repository where even --deepen=1 is bound to be failing >> because it pulls in something fairly large that takes a few minutes. (Possibly, the >> server proxy has a faulty timeout setting that punishes slow connections, but for >> connections unreliable on the client side the problem would be the same.) >> >> So this workaround sadly doesn't seem to cover all cases of resume. >> >> Regards, >> >> Ellie >> >> On 6/8/24 2:46 AM, ellie wrote: >>> The deepening worked perfectly, thank you so much! I hope a resume >>> will still be considered however, if even just to help out newcomers. >>> >>> Regards, >>> >>> Ellie >>> >>> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote: >>>> On Friday, June 7, 2024 8:03 PM, ellie wrote: >>>>> Subject: Re: With big repos and slower connections, git clone can be >>>>> hard to work with >>>>> >>>>> Thanks, this is very helpful as an emergency workaround! >>>>> >>>>> Nevertheless, I usually want the entire history, especially since I >>>>> wouldn't mind waiting half an hour. But without resume, I've >>>>> encountered it regularly that it just won't complete even if I give >>>>> it the time, while way longer downloads in the browser would. The >>>>> key problem here seems to be the lack of any resume. >>>>> >>>>> I hope this helps to understand why I made the suggestion. >>>>> >>>>> Regards, >>>>> >>>>> Ellie >>>>> >>>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote: >>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote: >>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to >>>>>>> suggest a potential issue with "git clone". >>>>>>> >>>>>>> The problem is that any sort of interruption or connection issue, >>>>>>> no matter how brief, causes the clone to stop and leave nothing behind: >>>>>>> >>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko >>>>>>> Cloning into 'nheko'... >>>>>>> remote: Enumerating objects: 43991, done. >>>>>>> remote: Counting objects: 100% (6535/6535), done. >>>>>>> remote: Compressing objects: 100% (1449/1449), done. 
>>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>>> CANCEL (err 8) >>>>>>> error: 2771 bytes of body are still expected >>>>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>>>> fatal: early EOF >>>>>>> fatal: fetch-pack: invalid index-pack output $ cd nheko >>>>>>> bash: cd: nheko: No such file or director >>>>>>> >>>>>>> In my experience, this can be really impactful with 1. big >>>>>>> repositories and 2. >>>>>>> unreliable internet - which I would argue isn't unheard of! E.g. >>>>>>> a developer may work via mobile connection on a business trip. The >>>>>>> result can even be that a repository is uncloneable for some users! >>>>>>> >>>>>>> This has left me in the absurd situation where I was able to >>>>>>> download a tarball via HTTPS from the git hoster just fine, even >>>>>>> way larger binary release items, thanks to the browser's HTTPS >>>>>>> resume. And yet a simple git clone of the same project failed repeatedly. >>>>>>> >>>>>>> My deepest apologies if I missed an option to fix or address this. >>>>>>> But summed up, please consider making git clone recover from hiccups. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Ellie >>>>>>> >>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing >>>>>>> out slower git clone connections from the server side even if the >>>>>>> transfer is ongoing. A git auto-resume would reduce the impact of >>>>>>> that, too. >>>>>> >>>>>> I suggest that you look into two git topics: --depth, which >>>>>> controls how much >>>>> history is obtained in a clone, and sparse-checkout, which describes >>>>> the part of the repository you will retrieve. You can prune the >>>>> contents of the repository so that clone is faster, if you do not >>>>> need all of the history, or all of the files. This is typically done >>>>> in complex large repositories, particularly those used for >>>>> production support as release repositories. >>>> >>>> Consider doing the clone with --depth=1 then using git fetch >>>> --depth=n as the resume. There are other options that effectively >>>> give you a resume, including --deepen=n. >>>> >>>> Build automation, like Jenkins, uses this to speed up the clone/checkout. > > Can you please provide more details on this? It is difficult to understand your issue without knowing what situation is failing? What size file? Is this a large single pack file? Can you reproduce this with a script we can try? > ^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 2:28 ` ellie @ 2024-07-08 12:30 ` rsbecker 2024-07-08 12:41 ` ellie 0 siblings, 1 reply; 43+ messages in thread From: rsbecker @ 2024-07-08 12:30 UTC (permalink / raw) To: 'ellie', git On Sunday, July 7, 2024 10:28 PM, ellie wrote: >I was intending to suggest that depending on the largest object in the repository, >resume may remain a concern for lower end users. My apologies for being unclear. > >As for my concrete problem, I can only guess what's happening, maybe github's >HTTPS proxy too eagerly discarding slow connections: > >$ git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit- >keyboard'... >remote: Enumerating objects: 23243, done. >remote: Counting objects: 100% (464/464), done. >remote: Compressing objects: 100% (207/207), done. >error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >CANCEL (err 8) >error: 2507 bytes of body are still expected >fetch-pack: unexpected disconnect while reading sideband packet >fatal: early EOF >fatal: fetch-pack: invalid index-pack output > >A deepen seems to fail for this repo since one deepen step already gets killed off. Git >HTTPS clones from any other hoster I tried, including gitlab.com, work fine, as do git >SSH clones from github.com. > >Sorry for the long tangent. Basically, my point was just that resume still seems like a >good idea even with deepen existing. > >Regards, > >Ellie > >On 7/8/24 3:27 AM, rsbecker@nexbridge.com wrote: >> On Sunday, July 7, 2024 7:42 PM, ellie wrote: >>> I have now encountered a repository where even --deepen=1 is bound to >>> be failing because it pulls in something fairly large that takes a >>> few minutes. (Possibly, the server proxy has a faulty timeout setting >>> that punishes slow connections, but for connections unreliable on the >>> client side the problem would be the same.) >>> >>> So this workaround sadly doesn't seem to cover all cases of resume. >>> >>> Regards, >>> >>> Ellie >>> >>> On 6/8/24 2:46 AM, ellie wrote: >>>> The deepening worked perfectly, thank you so much! I hope a resume >>>> will still be considered however, if even just to help out newcomers. >>>> >>>> Regards, >>>> >>>> Ellie >>>> >>>> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote: >>>>> On Friday, June 7, 2024 8:03 PM, ellie wrote: >>>>>> Subject: Re: With big repos and slower connections, git clone can >>>>>> be hard to work with >>>>>> >>>>>> Thanks, this is very helpful as an emergency workaround! >>>>>> >>>>>> Nevertheless, I usually want the entire history, especially since >>>>>> I wouldn't mind waiting half an hour. But without resume, I've >>>>>> encountered it regularly that it just won't complete even if I >>>>>> give it the time, while way longer downloads in the browser would. >>>>>> The key problem here seems to be the lack of any resume. >>>>>> >>>>>> I hope this helps to understand why I made the suggestion. >>>>>> >>>>>> Regards, >>>>>> >>>>>> Ellie >>>>>> >>>>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote: >>>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote: >>>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to >>>>>>>> suggest a potential issue with "git clone". >>>>>>>> >>>>>>>> The problem is that any sort of interruption or connection >>>>>>>> issue, no matter how brief, causes the clone to stop and leave nothing >behind: >>>>>>>> >>>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko >>>>>>>> Cloning into 'nheko'... 
>>>>>>>> remote: Enumerating objects: 43991, done. >>>>>>>> remote: Counting objects: 100% (6535/6535), done. >>>>>>>> remote: Compressing objects: 100% (1449/1449), done. >>>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>>>> CANCEL (err 8) >>>>>>>> error: 2771 bytes of body are still expected >>>>>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>>>>> fatal: early EOF >>>>>>>> fatal: fetch-pack: invalid index-pack output $ cd nheko >>>>>>>> bash: cd: nheko: No such file or director >>>>>>>> >>>>>>>> In my experience, this can be really impactful with 1. big >>>>>>>> repositories and 2. >>>>>>>> unreliable internet - which I would argue isn't unheard of! E.g. >>>>>>>> a developer may work via mobile connection on a business trip. >>>>>>>> The result can even be that a repository is uncloneable for some users! >>>>>>>> >>>>>>>> This has left me in the absurd situation where I was able to >>>>>>>> download a tarball via HTTPS from the git hoster just fine, even >>>>>>>> way larger binary release items, thanks to the browser's HTTPS >>>>>>>> resume. And yet a simple git clone of the same project failed repeatedly. >>>>>>>> >>>>>>>> My deepest apologies if I missed an option to fix or address this. >>>>>>>> But summed up, please consider making git clone recover from hiccups. >>>>>>>> >>>>>>>> Regards, >>>>>>>> >>>>>>>> Ellie >>>>>>>> >>>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing >>>>>>>> out slower git clone connections from the server side even if >>>>>>>> the transfer is ongoing. A git auto-resume would reduce the >>>>>>>> impact of that, too. >>>>>>> >>>>>>> I suggest that you look into two git topics: --depth, which >>>>>>> controls how much >>>>>> history is obtained in a clone, and sparse-checkout, which >>>>>> describes the part of the repository you will retrieve. You can >>>>>> prune the contents of the repository so that clone is faster, if >>>>>> you do not need all of the history, or all of the files. This is >>>>>> typically done in complex large repositories, particularly those >>>>>> used for production support as release repositories. >>>>> >>>>> Consider doing the clone with --depth=1 then using git fetch >>>>> --depth=n as the resume. There are other options that effectively >>>>> give you a resume, including --deepen=n. >>>>> >>>>> Build automation, like Jenkins, uses this to speed up the clone/checkout. >> >> Can you please provide more details on this? It is difficult to understand your issue >without knowing what situation is failing? What size file? Is this a large single pack >file? Can you reproduce this with a script we can try? >> First, for this mailing list, please put your replies at the bottom. Second, the full clone takes under 5 seconds on my system and does not experience any error that you are seeing. I suggest that your ISP may be throttling your account. I have seen this happen on some ISPs under SSH but few under HTTPS. It is likely a firewall or as you said, a proxy setting. GitHub has no proxy. My suggestion is that this is more of a communication issue instead of than a large repo issue. 133Mb is a relatively small a repository and clones quickly. This might be something to take up on the GitHub support forums rather that for git - since it seems like something in the path outside of git is not working correctly. None of the files in this repository, including pack-files is larger than 100 blocks, so there is not much point with a mid-pack restart. 
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 12:30 ` rsbecker @ 2024-07-08 12:41 ` ellie 2024-07-08 14:32 ` Konstantin Khomoutov 0 siblings, 1 reply; 43+ messages in thread From: ellie @ 2024-07-08 12:41 UTC (permalink / raw) To: rsbecker, git On 7/8/24 2:30 PM, rsbecker@nexbridge.com wrote: > On Sunday, July 7, 2024 10:28 PM, ellie wrote: >> I was intending to suggest that depending on the largest object in the repository, >> resume may remain a concern for lower end users. My apologies for being unclear. >> >> As for my concrete problem, I can only guess what's happening, maybe github's >> HTTPS proxy too eagerly discarding slow connections: >> >> $ git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit- >> keyboard'... >> remote: Enumerating objects: 23243, done. >> remote: Counting objects: 100% (464/464), done. >> remote: Compressing objects: 100% (207/207), done. >> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >> CANCEL (err 8) >> error: 2507 bytes of body are still expected >> fetch-pack: unexpected disconnect while reading sideband packet >> fatal: early EOF >> fatal: fetch-pack: invalid index-pack output >> >> A deepen seems to fail for this repo since one deepen step already gets killed off. Git >> HTTPS clones from any other hoster I tried, including gitlab.com, work fine, as do git >> SSH clones from github.com. >> >> Sorry for the long tangent. Basically, my point was just that resume still seems like a >> good idea even with deepen existing. >> >> Regards, >> >> Ellie >> >> On 7/8/24 3:27 AM, rsbecker@nexbridge.com wrote: >>> On Sunday, July 7, 2024 7:42 PM, ellie wrote: >>>> I have now encountered a repository where even --deepen=1 is bound to >>>> be failing because it pulls in something fairly large that takes a >>>> few minutes. (Possibly, the server proxy has a faulty timeout setting >>>> that punishes slow connections, but for connections unreliable on the >>>> client side the problem would be the same.) >>>> >>>> So this workaround sadly doesn't seem to cover all cases of resume. >>>> >>>> Regards, >>>> >>>> Ellie >>>> >>>> On 6/8/24 2:46 AM, ellie wrote: >>>>> The deepening worked perfectly, thank you so much! I hope a resume >>>>> will still be considered however, if even just to help out newcomers. >>>>> >>>>> Regards, >>>>> >>>>> Ellie >>>>> >>>>> On 6/8/24 2:35 AM, rsbecker@nexbridge.com wrote: >>>>>> On Friday, June 7, 2024 8:03 PM, ellie wrote: >>>>>>> Subject: Re: With big repos and slower connections, git clone can >>>>>>> be hard to work with >>>>>>> >>>>>>> Thanks, this is very helpful as an emergency workaround! >>>>>>> >>>>>>> Nevertheless, I usually want the entire history, especially since >>>>>>> I wouldn't mind waiting half an hour. But without resume, I've >>>>>>> encountered it regularly that it just won't complete even if I >>>>>>> give it the time, while way longer downloads in the browser would. >>>>>>> The key problem here seems to be the lack of any resume. >>>>>>> >>>>>>> I hope this helps to understand why I made the suggestion. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Ellie >>>>>>> >>>>>>> On 6/8/24 1:33 AM, rsbecker@nexbridge.com wrote: >>>>>>>> On Friday, June 7, 2024 7:28 PM, ellie wrote: >>>>>>>>> I'm terribly sorry if this is the wrong place, but I'd like to >>>>>>>>> suggest a potential issue with "git clone". 
>>>>>>>>> >>>>>>>>> The problem is that any sort of interruption or connection >>>>>>>>> issue, no matter how brief, causes the clone to stop and leave nothing >> behind: >>>>>>>>> >>>>>>>>> $ git clone https://github.com/Nheko-Reborn/nheko >>>>>>>>> Cloning into 'nheko'... >>>>>>>>> remote: Enumerating objects: 43991, done. >>>>>>>>> remote: Counting objects: 100% (6535/6535), done. >>>>>>>>> remote: Compressing objects: 100% (1449/1449), done. >>>>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>>>>> CANCEL (err 8) >>>>>>>>> error: 2771 bytes of body are still expected >>>>>>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>>>>>> fatal: early EOF >>>>>>>>> fatal: fetch-pack: invalid index-pack output $ cd nheko >>>>>>>>> bash: cd: nheko: No such file or director >>>>>>>>> >>>>>>>>> In my experience, this can be really impactful with 1. big >>>>>>>>> repositories and 2. >>>>>>>>> unreliable internet - which I would argue isn't unheard of! E.g. >>>>>>>>> a developer may work via mobile connection on a business trip. >>>>>>>>> The result can even be that a repository is uncloneable for some users! >>>>>>>>> >>>>>>>>> This has left me in the absurd situation where I was able to >>>>>>>>> download a tarball via HTTPS from the git hoster just fine, even >>>>>>>>> way larger binary release items, thanks to the browser's HTTPS >>>>>>>>> resume. And yet a simple git clone of the same project failed repeatedly. >>>>>>>>> >>>>>>>>> My deepest apologies if I missed an option to fix or address this. >>>>>>>>> But summed up, please consider making git clone recover from hiccups. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> Ellie >>>>>>>>> >>>>>>>>> PS: I've seen git hosters have apparent proxy bugs, like timing >>>>>>>>> out slower git clone connections from the server side even if >>>>>>>>> the transfer is ongoing. A git auto-resume would reduce the >>>>>>>>> impact of that, too. >>>>>>>> >>>>>>>> I suggest that you look into two git topics: --depth, which >>>>>>>> controls how much >>>>>>> history is obtained in a clone, and sparse-checkout, which >>>>>>> describes the part of the repository you will retrieve. You can >>>>>>> prune the contents of the repository so that clone is faster, if >>>>>>> you do not need all of the history, or all of the files. This is >>>>>>> typically done in complex large repositories, particularly those >>>>>>> used for production support as release repositories. >>>>>> >>>>>> Consider doing the clone with --depth=1 then using git fetch >>>>>> --depth=n as the resume. There are other options that effectively >>>>>> give you a resume, including --deepen=n. >>>>>> >>>>>> Build automation, like Jenkins, uses this to speed up the clone/checkout. >>> >>> Can you please provide more details on this? It is difficult to understand your issue >> without knowing what situation is failing? What size file? Is this a large single pack >> file? Can you reproduce this with a script we can try? >>> > > First, for this mailing list, please put your replies at the bottom. > > Second, the full clone takes under 5 seconds on my system and does not experience any error that you are seeing. I suggest that your ISP may be throttling your account. I have seen this happen on some ISPs under SSH but few under HTTPS. It is likely a firewall or as you said, a proxy setting. GitHub has no proxy. > > My suggestion is that this is more of a communication issue instead of than a large repo issue. 
> 133Mb is a relatively small a repository and clones quickly. This might be something to take up on the GitHub support forums rather that for git - since it seems like something in the path outside of git is not working correctly. None of the files in this repository, including pack-files is larger than 100 blocks, so there is not much point with a mid-pack restart. > I apologize for not placing the responses where expected. It seems extremely unlikely to me that this is an ISP issue, for the reasons I already listed. An additional one is that HTTPS downloads from GitHub outside of git, e.g. from zip archives, work fine as well, even for way larger files. Nevertheless, this is irrelevant to my initial request, since even if it's not caused by a GitHub server-side issue, a resume would still help. Regards, Ellie ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 12:41 ` ellie @ 2024-07-08 14:32 ` Konstantin Khomoutov 2024-07-08 15:02 ` rsbecker 2024-07-08 15:14 ` ellie 0 siblings, 2 replies; 43+ messages in thread From: Konstantin Khomoutov @ 2024-07-08 14:32 UTC (permalink / raw) To: ellie; +Cc: rsbecker, git On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: [...] > error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL > (err 8) [...] > It seems extremely unlikely to me to be possibly an ISP issue, for which I > already listed the reasons. An additional one is HTTPS downloads from github > outside of git, e.g. from zip archives, for way larger files work fine as > well. [...] What if you explicitly disable HTTP/2 when cloning? git -c http.version=HTTP/1.1 clone ... should probably do this. ^ permalink raw reply [flat|nested] 43+ messages in thread
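If the HTTP/1.1 workaround suggested above turns out to help, it can also be made persistent rather than passed per invocation; a sketch (the setting then applies to all HTTPS remotes for the current user):

$ git config --global http.version HTTP/1.1
$ git clone https://github.com/maliit/keyboard maliit-keyboard
$ git config --global --unset http.version    # revert once it is no longer needed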
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 14:32 ` Konstantin Khomoutov @ 2024-07-08 15:02 ` rsbecker 2024-07-08 15:14 ` ellie 1 sibling, 0 replies; 43+ messages in thread From: rsbecker @ 2024-07-08 15:02 UTC (permalink / raw) To: 'Konstantin Khomoutov', 'ellie'; +Cc: git On Monday, July 8, 2024 10:33 AM, Konstantin Khomoutov wrote: >On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: > >[...] >> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >> CANCEL (err 8) >[...] >> It seems extremely unlikely to me to be possibly an ISP issue, for >> which I already listed the reasons. An additional one is HTTPS >> downloads from github outside of git, e.g. from zip archives, for way >> larger files work fine as well. >[...] > >What if you explicitly disable HTTP/2 when cloning? > > git -c http.version=HTTP/1.1 clone ... > >should probably do this. I can verify that this works in my environment. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 14:32 ` Konstantin Khomoutov 2024-07-08 15:02 ` rsbecker @ 2024-07-08 15:14 ` ellie 2024-07-08 15:31 ` rsbecker 2024-07-08 15:44 ` Konstantin Khomoutov 1 sibling, 2 replies; 43+ messages in thread From: ellie @ 2024-07-08 15:14 UTC (permalink / raw) To: rsbecker, git On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: > On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: > > [...] >> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL >> (err 8) > [...] >> It seems extremely unlikely to me to be possibly an ISP issue, for which I >> already listed the reasons. An additional one is HTTPS downloads from github >> outside of git, e.g. from zip archives, for way larger files work fine as >> well. > [...] > > What if you explicitly disable HTTP/2 when cloning? > > git -c http.version=HTTP/1.1 clone ... > > should probably do this. > Thanks for the idea! I tested it: $ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 5361 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output Sadly, it seems like the error is only slightly different. It was still worth a try. I contacted GitHub support a while ago but it got stuck. If there were resume available such hiccups wouldn't matter, I hope that explains why I suggested that feature. Regards, Ellie ^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:14 ` ellie @ 2024-07-08 15:31 ` rsbecker 2024-07-08 15:48 ` ellie 2024-07-08 16:09 ` Emanuel Czirai 2024-07-08 15:44 ` Konstantin Khomoutov 1 sibling, 2 replies; 43+ messages in thread From: rsbecker @ 2024-07-08 15:31 UTC (permalink / raw) To: 'ellie', git On Monday, July 8, 2024 11:15 AM, ellie wrote: >On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >> >> [...] >>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>> CANCEL (err 8) >> [...] >>> It seems extremely unlikely to me to be possibly an ISP issue, for >>> which I already listed the reasons. An additional one is HTTPS >>> downloads from github outside of git, e.g. from zip archives, for way >>> larger files work fine as well. >> [...] >> >> What if you explicitly disable HTTP/2 when cloning? >> >> git -c http.version=HTTP/1.1 clone ... >> >> should probably do this. >> > >Thanks for the idea! I tested it: > >$ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard >maliit-keyboard >Cloning into 'maliit-keyboard'... >remote: Enumerating objects: 23243, done. >remote: Counting objects: 100% (464/464), done. >remote: Compressing objects: 100% (207/207), done. >error: RPC failed; curl 18 transfer closed with outstanding read data remaining >error: 5361 bytes of body are still expected >fetch-pack: unexpected disconnect while reading sideband packet >fatal: early EOF >fatal: fetch-pack: invalid index-pack output > >Sadly, it seems like the error is only slightly different. It was still worth a try. I >contacted GitHub support a while ago but it got stuck. If there were resume >available such hiccups wouldn't matter, I hope that explains why I suggested that >feature. I don't really understand what "it got stuck" means. Is that a colloquialism? What got stuck? That case at GitHub? Have you tried git config --global http.postBuffer 524288000 It might help. The feature being requesting, even if possible, will probably not happen quickly, unless someone has a solid and simple design for this. That is why we are trying to figure out the root cause of your situation, which is not clear to me as to what exactly is failing (possibly a buffer size issue, if this is consistently failing). My experience, as I said before, on these symptoms, is a proxy (even a local one) that is in the way. If you have your linux instance on a VM, the hypervisor may not be configured correctly. Lack of further evidence (all we really have is the curl RPC failure) makes diagnosing this very difficult. ^ permalink raw reply [flat|nested] 43+ messages in thread
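For reference, setting and inspecting the suggested buffer looks like the following; note that http.postBuffer controls the buffer git uses when posting request bodies to the server, so it is not expected to change how the pack download itself behaves:

$ git config --global http.postBuffer 524288000
$ git config --global --get http.postBuffer
524288000
$ git config --global --unset http.postBuffer   # remove it again if it makes no difference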
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:31 ` rsbecker @ 2024-07-08 15:48 ` ellie 2024-07-08 16:23 ` rsbecker 2024-07-08 16:09 ` Emanuel Czirai 1 sibling, 1 reply; 43+ messages in thread From: ellie @ 2024-07-08 15:48 UTC (permalink / raw) To: rsbecker, git On 7/8/24 5:31 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:15 AM, ellie wrote: >> On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >>> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >>> >>> [...] >>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>> CANCEL (err 8) >>> [...] >>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>> which I already listed the reasons. An additional one is HTTPS >>>> downloads from github outside of git, e.g. from zip archives, for way >>>> larger files work fine as well. >>> [...] >>> >>> What if you explicitly disable HTTP/2 when cloning? >>> >>> git -c http.version=HTTP/1.1 clone ... >>> >>> should probably do this. >>> >> >> Thanks for the idea! I tested it: >> >> $ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard >> maliit-keyboard >> Cloning into 'maliit-keyboard'... >> remote: Enumerating objects: 23243, done. >> remote: Counting objects: 100% (464/464), done. >> remote: Compressing objects: 100% (207/207), done. >> error: RPC failed; curl 18 transfer closed with outstanding read data remaining >> error: 5361 bytes of body are still expected >> fetch-pack: unexpected disconnect while reading sideband packet >> fatal: early EOF >> fatal: fetch-pack: invalid index-pack output >> >> Sadly, it seems like the error is only slightly different. It was still worth a try. I >> contacted GitHub support a while ago but it got stuck. If there were resume >> available such hiccups wouldn't matter, I hope that explains why I suggested that >> feature. > > I don't really understand what "it got stuck" means. Is that a colloquialism? What got stuck? That case at GitHub? > > Have you tried git config --global http.postBuffer 524288000 > > It might help. The feature being requesting, even if possible, will probably not happen quickly, unless someone has a solid and simple design for this. That is why we are trying to figure out the root cause of your situation, which is not clear to me as to what exactly is failing (possibly a buffer size issue, if this is consistently failing). My experience, as I said before, on these symptoms, is a proxy (even a local one) that is in the way. If you have your linux instance on a VM, the hypervisor may not be configured correctly. Lack of further evidence (all we really have is the curl RPC failure) makes diagnosing this very difficult. > Thanks for your response, I appreciate it. I don't know what the hold up is for them, but I'm probably too unimportant, which I understand. I'm not an enterprise user, and >99% of others have faster connections than me which is perhaps why they dodge this config(?) issue. And thanks for your suggestion, but sadly it seems to have no effect: $ git config --global http.postBuffer 524288000 $ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. 
error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 2444 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output I'm doubtful this is solvable without either some resume or a fix from Github's end. But I can use SSH clone so this isn't urgent. Resume just seemed like an idea that would also help others, and it's what makes many other internet services work much better for me. Regards, Ellie ^ permalink raw reply [flat|nested] 43+ messages in thread
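Since SSH clones of the same repository reportedly work, switching transports is a practical interim workaround (assuming an SSH key is already registered with the GitHub account):

$ git clone git@github.com:maliit/keyboard.git maliit-keyboard
# or rewrite all HTTPS GitHub URLs to SSH for this user:
$ git config --global url."git@github.com:".insteadOf "https://github.com/"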
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:48 ` ellie @ 2024-07-08 16:23 ` rsbecker 2024-07-08 17:06 ` ellie 0 siblings, 1 reply; 43+ messages in thread From: rsbecker @ 2024-07-08 16:23 UTC (permalink / raw) To: 'ellie', git On Monday, July 8, 2024 11:49 AM, ellie wrote: >On 7/8/24 5:31 PM, rsbecker@nexbridge.com wrote: >> On Monday, July 8, 2024 11:15 AM, ellie wrote: >>> On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >>>> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >>>> >>>> [...] >>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>> CANCEL (err 8) >>>> [...] >>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>> which I already listed the reasons. An additional one is HTTPS >>>>> downloads from github outside of git, e.g. from zip archives, for >>>>> way larger files work fine as well. >>>> [...] >>>> >>>> What if you explicitly disable HTTP/2 when cloning? >>>> >>>> git -c http.version=HTTP/1.1 clone ... >>>> >>>> should probably do this. >>>> >>> >>> Thanks for the idea! I tested it: >>> >>> $ git -c http.version=HTTP/1.1 clone >>> https://github.com/maliit/keyboard >>> maliit-keyboard >>> Cloning into 'maliit-keyboard'... >>> remote: Enumerating objects: 23243, done. >>> remote: Counting objects: 100% (464/464), done. >>> remote: Compressing objects: 100% (207/207), done. >>> error: RPC failed; curl 18 transfer closed with outstanding read data >>> remaining >>> error: 5361 bytes of body are still expected >>> fetch-pack: unexpected disconnect while reading sideband packet >>> fatal: early EOF >>> fatal: fetch-pack: invalid index-pack output >>> >>> Sadly, it seems like the error is only slightly different. It was >>> still worth a try. I contacted GitHub support a while ago but it got >>> stuck. If there were resume available such hiccups wouldn't matter, I >>> hope that explains why I suggested that feature. >> >> I don't really understand what "it got stuck" means. Is that a colloquialism? What >got stuck? That case at GitHub? >> >> Have you tried git config --global http.postBuffer 524288000 >> >> It might help. The feature being requesting, even if possible, will probably not >happen quickly, unless someone has a solid and simple design for this. That is why >we are trying to figure out the root cause of your situation, which is not clear to me >as to what exactly is failing (possibly a buffer size issue, if this is consistently failing). >My experience, as I said before, on these symptoms, is a proxy (even a local one) >that is in the way. If you have your linux instance on a VM, the hypervisor may not >be configured correctly. Lack of further evidence (all we really have is the curl RPC >failure) makes diagnosing this very difficult. >> > >Thanks for your response, I appreciate it. I don't know what the hold up is for them, >but I'm probably too unimportant, which I understand. I'm not an enterprise user, >and >99% of others have faster connections than me which is perhaps why they >dodge this config(?) issue. > >And thanks for your suggestion, but sadly it seems to have no effect: > >$ git config --global http.postBuffer 524288000 $ git -c http.version=HTTP/1.1 >clone https://github.com/maliit/keyboard >maliit-keyboard >Cloning into 'maliit-keyboard'... >remote: Enumerating objects: 23243, done. >remote: Counting objects: 100% (464/464), done. >remote: Compressing objects: 100% (207/207), done. 
>error: RPC failed; curl 18 transfer closed with outstanding read data remaining >error: 2444 bytes of body are still expected >fetch-pack: unexpected disconnect while reading sideband packet >fatal: early EOF >fatal: fetch-pack: invalid index-pack output > >I'm doubtful this is solvable without either some resume or a fix from Github's end. >But I can use SSH clone so this isn't urgent. > >Resume just seemed like an idea that would also help others, and it's what makes >many other internet services work much better for me. I do not know which pack file is having the issue - it may be the first one. Try running with the following environment variables GIT_TRACE=true and GIT_PACKET_TRACE=true. This will not correct the problem but might give additional helpful information. git uses libcurl to perform https transfers - which appears to be where the error is coming from. It is my opinion, given the issue is very likely in curl, that a restart capability will not help at all - at least not until we find the actual root cause (still mostly an unknown, although this error is widely discussed online in other non-git places). The failure appears to be transferring a single pack file (139824442 bytes) size may be an issue, but restarting in the middle of a pack file may not solve the problem (discussed in other threads) as the file is potentially built on demand (as I understand it from GitHub) and may not be the same on the next clone attempt. What we probably will find is that a restart will be stuck in the same spot and not move forward because the failure is not at a file boundary. In addition to this, GitHub may have limits on the size of files that can be transferred, which you might be hitting (unlikely but possible). Check your plan options. I tried on a light plan, so this is unlikely but I want to exclude it. ^ permalink raw reply [flat|nested] 43+ messages in thread
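A note on the trace variables mentioned above: the variable git actually documents for packet-level tracing is GIT_TRACE_PACKET (GIT_PACKET_TRACE appears not to be recognized), and GIT_TRACE_CURL additionally records the libcurl exchange. A sketch of capturing all of it to a file:

$ GIT_TRACE=1 GIT_TRACE_PACKET=1 GIT_TRACE_CURL=1 \
    git clone https://github.com/maliit/keyboard maliit-keyboard 2>trace.log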
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 16:23 ` rsbecker @ 2024-07-08 17:06 ` ellie 2024-07-08 17:38 ` rsbecker 0 siblings, 1 reply; 43+ messages in thread From: ellie @ 2024-07-08 17:06 UTC (permalink / raw) To: rsbecker, git [-- Attachment #1: Type: text/plain, Size: 5674 bytes --] On 7/8/24 6:23 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:49 AM, ellie wrote: >> On 7/8/24 5:31 PM, rsbecker@nexbridge.com wrote: >>> On Monday, July 8, 2024 11:15 AM, ellie wrote: >>>> On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >>>>> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >>>>> >>>>> [...] >>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>> CANCEL (err 8) >>>>> [...] >>>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>>> which I already listed the reasons. An additional one is HTTPS >>>>>> downloads from github outside of git, e.g. from zip archives, for >>>>>> way larger files work fine as well. >>>>> [...] >>>>> >>>>> What if you explicitly disable HTTP/2 when cloning? >>>>> >>>>> git -c http.version=HTTP/1.1 clone ... >>>>> >>>>> should probably do this. >>>>> >>>> >>>> Thanks for the idea! I tested it: >>>> >>>> $ git -c http.version=HTTP/1.1 clone >>>> https://github.com/maliit/keyboard >>>> maliit-keyboard >>>> Cloning into 'maliit-keyboard'... >>>> remote: Enumerating objects: 23243, done. >>>> remote: Counting objects: 100% (464/464), done. >>>> remote: Compressing objects: 100% (207/207), done. >>>> error: RPC failed; curl 18 transfer closed with outstanding read data >>>> remaining >>>> error: 5361 bytes of body are still expected >>>> fetch-pack: unexpected disconnect while reading sideband packet >>>> fatal: early EOF >>>> fatal: fetch-pack: invalid index-pack output >>>> >>>> Sadly, it seems like the error is only slightly different. It was >>>> still worth a try. I contacted GitHub support a while ago but it got >>>> stuck. If there were resume available such hiccups wouldn't matter, I >>>> hope that explains why I suggested that feature. >>> >>> I don't really understand what "it got stuck" means. Is that a colloquialism? What >> got stuck? That case at GitHub? >>> >>> Have you tried git config --global http.postBuffer 524288000 >>> >>> It might help. The feature being requesting, even if possible, will probably not >> happen quickly, unless someone has a solid and simple design for this. That is why >> we are trying to figure out the root cause of your situation, which is not clear to me >> as to what exactly is failing (possibly a buffer size issue, if this is consistently failing). >> My experience, as I said before, on these symptoms, is a proxy (even a local one) >> that is in the way. If you have your linux instance on a VM, the hypervisor may not >> be configured correctly. Lack of further evidence (all we really have is the curl RPC >> failure) makes diagnosing this very difficult. >>> >> >> Thanks for your response, I appreciate it. I don't know what the hold up is for them, >> but I'm probably too unimportant, which I understand. I'm not an enterprise user, >> and >99% of others have faster connections than me which is perhaps why they >> dodge this config(?) issue. >> >> And thanks for your suggestion, but sadly it seems to have no effect: >> >> $ git config --global http.postBuffer 524288000 $ git -c http.version=HTTP/1.1 >> clone https://github.com/maliit/keyboard >> maliit-keyboard >> Cloning into 'maliit-keyboard'... 
>> remote: Enumerating objects: 23243, done. >> remote: Counting objects: 100% (464/464), done. >> remote: Compressing objects: 100% (207/207), done. >> error: RPC failed; curl 18 transfer closed with outstanding read data remaining >> error: 2444 bytes of body are still expected >> fetch-pack: unexpected disconnect while reading sideband packet >> fatal: early EOF >> fatal: fetch-pack: invalid index-pack output >> >> I'm doubtful this is solvable without either some resume or a fix from Github's end. >> But I can use SSH clone so this isn't urgent. >> >> Resume just seemed like an idea that would also help others, and it's what makes >> many other internet services work much better for me. > > I do not know which pack file is having the issue - it may be the first one. Try running with the following environment variables GIT_TRACE=true and GIT_PACKET_TRACE=true. This will not correct the problem but might give additional helpful information. git uses libcurl to perform https transfers - which appears to be where the error is coming from. It is my opinion, given the issue is very likely in curl, that a restart capability will not help at all - at least not until we find the actual root cause (still mostly an unknown, although this error is widely discussed online in other non-git places). The failure appears to be transferring a single pack file (139824442 bytes) size may be an issue, but restarting in the middle of a pack file may not solve the problem (discussed in other threads) as the file is potentially built on demand (as I understand it from GitHub) and may not be the same on the next clone attempt. What we probably will find is that a restart will be stuck in the same spot and not move forward because the failure is not at a file boundary. > > In addition to this, GitHub may have limits on the size of files that can be transferred, which you might be hitting (unlikely but possible). Check your plan options. I tried on a light plan, so this is unlikely but I want to exclude it. > > I attached the output of this command: $ GIT_TRACE=true GIT_PACKET_TRACE=true git -c http.version=HTTP/1.1 clone https://github.com/malii t/keyboard maliit-keyboard > log.txt 2>&1 My best guess is still that due to some unfortunate timeout choice, Github's end simply becomes impatient and closes the connection. Regards, Ellie [-- Attachment #2: log.txt --] [-- Type: text/plain, Size: 1090 bytes --] 18:44:33.182907 git.c:465 trace: built-in: git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... 18:44:33.186926 run-command.c:657 trace: run_command: git remote-https origin https://github.com/maliit/keyboard 18:44:33.188668 git.c:750 trace: exec: git-remote-https origin https://github.com/maliit/keyboard 18:44:33.188728 run-command.c:657 trace: run_command: git-remote-https origin https://github.com/maliit/keyboard 18:44:34.757740 run-command.c:657 trace: run_command: git index-pack --stdin --fix-thin '--keep=fetch-pack 14261 on elliedeck' --check-self-contained-and-connected 18:44:34.759305 git.c:465 trace: built-in: git index-pack --stdin --fix-thin '--keep=fetch-pack 14261 on elliedeck' --check-self-contained-and-connected error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 5858 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output ^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 17:06 ` ellie @ 2024-07-08 17:38 ` rsbecker 0 siblings, 0 replies; 43+ messages in thread From: rsbecker @ 2024-07-08 17:38 UTC (permalink / raw) To: 'ellie', git On Monday, July 8, 2024 1:06 PM, ellie wrote: >On 7/8/24 6:23 PM, rsbecker@nexbridge.com wrote: >> On Monday, July 8, 2024 11:49 AM, ellie wrote: >>> On 7/8/24 5:31 PM, rsbecker@nexbridge.com wrote: >>>> On Monday, July 8, 2024 11:15 AM, ellie wrote: >>>>> On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: >>>>>> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: >>>>>> >>>>>> [...] >>>>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>>>> CANCEL (err 8) >>>>>> [...] >>>>>>> It seems extremely unlikely to me to be possibly an ISP issue, >>>>>>> for which I already listed the reasons. An additional one is >>>>>>> HTTPS downloads from github outside of git, e.g. from zip >>>>>>> archives, for way larger files work fine as well. >>>>>> [...] >>>>>> >>>>>> What if you explicitly disable HTTP/2 when cloning? >>>>>> >>>>>> git -c http.version=HTTP/1.1 clone ... >>>>>> >>>>>> should probably do this. >>>>>> >>>>> >>>>> Thanks for the idea! I tested it: >>>>> >>>>> $ git -c http.version=HTTP/1.1 clone >>>>> https://github.com/maliit/keyboard >>>>> maliit-keyboard >>>>> Cloning into 'maliit-keyboard'... >>>>> remote: Enumerating objects: 23243, done. >>>>> remote: Counting objects: 100% (464/464), done. >>>>> remote: Compressing objects: 100% (207/207), done. >>>>> error: RPC failed; curl 18 transfer closed with outstanding read >>>>> data remaining >>>>> error: 5361 bytes of body are still expected >>>>> fetch-pack: unexpected disconnect while reading sideband packet >>>>> fatal: early EOF >>>>> fatal: fetch-pack: invalid index-pack output >>>>> >>>>> Sadly, it seems like the error is only slightly different. It was >>>>> still worth a try. I contacted GitHub support a while ago but it >>>>> got stuck. If there were resume available such hiccups wouldn't >>>>> matter, I hope that explains why I suggested that feature. >>>> >>>> I don't really understand what "it got stuck" means. Is that a >>>> colloquialism? What >>> got stuck? That case at GitHub? >>>> >>>> Have you tried git config --global http.postBuffer 524288000 >>>> >>>> It might help. The feature being requesting, even if possible, will >>>> probably not >>> happen quickly, unless someone has a solid and simple design for >>> this. That is why we are trying to figure out the root cause of your >>> situation, which is not clear to me as to what exactly is failing (possibly a buffer >size issue, if this is consistently failing). >>> My experience, as I said before, on these symptoms, is a proxy (even >>> a local one) that is in the way. If you have your linux instance on a >>> VM, the hypervisor may not be configured correctly. Lack of further >>> evidence (all we really have is the curl RPC >>> failure) makes diagnosing this very difficult. >>>> >>> >>> Thanks for your response, I appreciate it. I don't know what the hold >>> up is for them, but I'm probably too unimportant, which I understand. >>> I'm not an enterprise user, and >99% of others have faster >>> connections than me which is perhaps why they dodge this config(?) issue. 
>>> >>> And thanks for your suggestion, but sadly it seems to have no effect: >>> >>> $ git config --global http.postBuffer 524288000 $ git -c >>> http.version=HTTP/1.1 clone https://github.com/maliit/keyboard >>> maliit-keyboard >>> Cloning into 'maliit-keyboard'... >>> remote: Enumerating objects: 23243, done. >>> remote: Counting objects: 100% (464/464), done. >>> remote: Compressing objects: 100% (207/207), done. >>> error: RPC failed; curl 18 transfer closed with outstanding read data >>> remaining >>> error: 2444 bytes of body are still expected >>> fetch-pack: unexpected disconnect while reading sideband packet >>> fatal: early EOF >>> fatal: fetch-pack: invalid index-pack output >>> >>> I'm doubtful this is solvable without either some resume or a fix from Github's >end. >>> But I can use SSH clone so this isn't urgent. >>> >>> Resume just seemed like an idea that would also help others, and it's >>> what makes many other internet services work much better for me. >> >> I do not know which pack file is having the issue - it may be the first one. Try >running with the following environment variables GIT_TRACE=true and >GIT_PACKET_TRACE=true. This will not correct the problem but might give >additional helpful information. git uses libcurl to perform https transfers - which >appears to be where the error is coming from. It is my opinion, given the issue is >very likely in curl, that a restart capability will not help at all - at least not until we >find the actual root cause (still mostly an unknown, although this error is widely >discussed online in other non-git places). The failure appears to be transferring a >single pack file (139824442 bytes) size may be an issue, but restarting in the middle >of a pack file may not solve the problem (discussed in other threads) as the file is >potentially built on demand (as I understand it from GitHub) and may not be the >same on the next clone attempt. What we probably will find is that a restart will be >stuck in the same spot and not move forward because the failure is not at a file >boundary. >> >> In addition to this, GitHub may have limits on the size of files that can be >transferred, which you might be hitting (unlikely but possible). Check your plan >options. I tried on a light plan, so this is unlikely but I want to exclude it. >> >> >I attached the output of this command: > >$ GIT_TRACE=true GIT_PACKET_TRACE=true git -c http.version=HTTP/1.1 clone >https://github.com/malii t/keyboard maliit-keyboard > log.txt 2>&1 > >My best guess is still that due to some unfortunate timeout choice, Github's end >simply becomes impatient and closes the connection. 18:44:33.182907 git.c:465 trace: built-in: git clone https://github.com/maliit/keyboard maliit-keyboard Cloning into 'maliit-keyboard'... 
18:44:33.186926 run-command.c:657 trace: run_command: git remote-https origin https://github.com/maliit/keyboard 18:44:33.188668 git.c:750 trace: exec: git-remote-https origin https://github.com/maliit/keyboard 18:44:33.188728 run-command.c:657 trace: run_command: git-remote-https origin https://github.com/maliit/keyboard 18:44:34.757740 run-command.c:657 trace: run_command: git index-pack --stdin --fix-thin '--keep=fetch-pack 14261 on elliedeck' --check-self-contained-and-connected 18:44:34.759305 git.c:465 trace: built-in: git index-pack --stdin --fix-thin '--keep=fetch-pack 14261 on elliedeck' --check-self-contained-and-connected error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 5858 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output From what I could tell from the log, the operation took less than 3 seconds. How long does it appear to take for you? This does not look like a timeout. In fact, it looks like the failure happened before git was able to process any content. From what I read from the log, libcurl encountered a failure and passed that up to git, which stopped the operation. You could try putting -v into your .curlrc file or otherwise getting some verbose information out of curl where the failure is occurring. I would also suggest passing this over to the curl team for examination. I am at a loss on resolving this further, particularly if there are no intermediary components like firewalls and proxies - note that many ISPs build firewalls and proxies into their NAT routers. A curl verbose trace might show this. My home ISP in Canada has all kinds of stuff in their cable modems, which I had disabled by the tech who installed the box, and I have no issues cloning the above repo. They do have QoS limits but have not blocked https downloads. --Randall ^ permalink raw reply [flat|nested] 43+ messages in thread
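Instead of editing .curlrc, the same verbosity can be requested from git's embedded libcurl directly, which keeps the noise limited to the failing clone; a sketch:

$ GIT_CURL_VERBOSE=1 git clone https://github.com/maliit/keyboard maliit-keyboard 2>curl.log
$ GIT_TRACE_CURL=1 git clone https://github.com/maliit/keyboard maliit-keyboard 2>curl-trace.log   # fuller header/body trace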
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:31 ` rsbecker 2024-07-08 15:48 ` ellie @ 2024-07-08 16:09 ` Emanuel Czirai 1 sibling, 0 replies; 43+ messages in thread From: Emanuel Czirai @ 2024-07-08 16:09 UTC (permalink / raw) To: git Can try traffic shaping it, temporarily, just to can reproduce the issue on (presumably)anyone's linux machine, like: $ sudo tc qdisc change dev em1 root tbf rate 8kbit burst 8kbit latency 100ms (replace em1 with eth0 or whichever `ip a` reports as your LAN interface) Look at it: $ sudo tc qdisc show dev em1 qdisc tbf 8001: root refcnt 2 rate 8Kbit burst 1Kb lat 100ms $ git clone https://github.com/maliit/keyboard Cloning into 'keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. error: 153 bytes of body are still expectedMiB | 1.14 MiB/s fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output It's different for me, but maybe this traffic shaping idea might still help if properly modified? (maybe it's too fast still? or not latent enough, I don't know) I tried it again: (seems different) $ git clone https://github.com/maliit/keyboard Cloning into 'keyboard'... remote: Enumerating objects: 23243, done. remote: Counting objects: 100% (464/464), done. remote: Compressing objects: 100% (207/207), done. error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8) error: 7932 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output Change it: (use different values here for those 8 values and for the 100, if needed, you get the picture) $ sudo tc qdisc change dev em1 root tbf rate 8kbit burst 8kbit latency 100ms or Delete it:(restore your unshaped traffic) $ sudo tc qdisc del dev em1 root Look at it after deletion: $ sudo tc qdisc show dev em1 qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 /sbin/tc comes from package sys-apps/iproute2 6.9.0 on my Gentoo, ymmv. Good luck. On Mon, Jul 8, 2024 at 5:32 PM <rsbecker@nexbridge.com> wrote: > > On Monday, July 8, 2024 11:15 AM, ellie wrote: > >On 7/8/24 4:32 PM, Konstantin Khomoutov wrote: > >> On Mon, Jul 08, 2024 at 04:28:25AM +0200, ellie wrote: > >> > >> [...] > >>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: > >>> CANCEL (err 8) > >> [...] > >>> It seems extremely unlikely to me to be possibly an ISP issue, for > >>> which I already listed the reasons. An additional one is HTTPS > >>> downloads from github outside of git, e.g. from zip archives, for way > >>> larger files work fine as well. > >> [...] > >> > >> What if you explicitly disable HTTP/2 when cloning? > >> > >> git -c http.version=HTTP/1.1 clone ... > >> > >> should probably do this. > >> > > > >Thanks for the idea! I tested it: > > > >$ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard > >maliit-keyboard > >Cloning into 'maliit-keyboard'... > >remote: Enumerating objects: 23243, done. > >remote: Counting objects: 100% (464/464), done. > >remote: Compressing objects: 100% (207/207), done. 
> >error: RPC failed; curl 18 transfer closed with outstanding read data remaining > >error: 5361 bytes of body are still expected > >fetch-pack: unexpected disconnect while reading sideband packet > >fatal: early EOF > >fatal: fetch-pack: invalid index-pack output > > > >Sadly, it seems like the error is only slightly different. It was still worth a try. I > >contacted GitHub support a while ago but it got stuck. If there were resume > >available such hiccups wouldn't matter, I hope that explains why I suggested that > >feature. > > I don't really understand what "it got stuck" means. Is that a colloquialism? What got stuck? That case at GitHub? > > Have you tried git config --global http.postBuffer 524288000 > > It might help. The feature being requested, even if possible, will probably not happen quickly, unless someone has a solid and simple design for this. That is why we are trying to figure out the root cause of your situation, since it is not clear to me what exactly is failing (possibly a buffer size issue, if this is consistently failing). My experience with these symptoms, as I said before, is that a proxy (even a local one) is in the way. If you have your Linux instance on a VM, the hypervisor may not be configured correctly. Lack of further evidence (all we really have is the curl RPC failure) makes diagnosing this very difficult. > > ^ permalink raw reply [flat|nested] 43+ messages in thread
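Following up on the traffic-shaping suggestion above: if a plain rate limit does not trigger the failure, netem can add latency and packet loss too; a rough sketch, assuming the same em1 interface name and purely illustrative numbers:

$ sudo tc qdisc add dev em1 root netem delay 300ms 50ms loss 1% rate 256kbit
$ git clone https://github.com/maliit/keyboard
$ sudo tc qdisc del dev em1 root

Note that loss here only forces TCP retransmissions, so it simulates a slow, lossy link rather than a hard disconnect; reproducing the latter would mean killing the connection mid-transfer (for example by briefly taking the interface down).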
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:14 ` ellie 2024-07-08 15:31 ` rsbecker @ 2024-07-08 15:44 ` Konstantin Khomoutov 2024-07-08 16:27 ` rsbecker 1 sibling, 1 reply; 43+ messages in thread From: Konstantin Khomoutov @ 2024-07-08 15:44 UTC (permalink / raw) To: ellie; +Cc: rsbecker, git On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: [...] > > > error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL > > > (err 8) > > [...] > > > It seems extremely unlikely to me to be possibly an ISP issue, for which I > > > already listed the reasons. An additional one is HTTPS downloads from github > > > outside of git, e.g. from zip archives, for way larger files work fine as > > > well. > > [...] > > What if you explicitly disable HTTP/2 when cloning? [...] > Thanks for the idea! I tested it: > > $ git -c http.version=HTTP/1.1 clone https://github.com/maliit/keyboard Over there at SO people are trying all sorts of black magic to combat a problem which manifests itself in a way very similar to yours [1]. I'm not sure anything from there could be of help but maybe worth trying anyway, as you can override any (or almost any) of Git's configuration settings using that "-c" command-line option, so basically test round-trips should not be painstakingly long. [...] > fetch-pack: unexpected disconnect while reading sideband packet [...] > Sadly, it seems like the error is only slightly different. I actually find it interesting that in each case a sideband packet is mentioned. But quite possibly it's a red herring anyway. 1. https://stackoverflow.com/questions/66366582 ^ permalink raw reply [flat|nested] 43+ messages in thread
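To illustrate the kind of throwaway test run meant here, several of the settings mentioned in this thread can be combined on one command line without touching any config file; a sketch, with values that are examples rather than recommendations:

$ git -c http.version=HTTP/1.1 -c http.postBuffer=524288000 -c http.lowSpeedLimit=1000 -c http.lowSpeedTime=60 clone https://github.com/maliit/keyboard

http.lowSpeedLimit and http.lowSpeedTime tell git (via curl) when to give up on a transfer that has slowed down, so varying them can at least move the point at which a flaky connection fails.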
* RE: With big repos and slower connections, git clone can be hard to work with 2024-07-08 15:44 ` Konstantin Khomoutov @ 2024-07-08 16:27 ` rsbecker 2024-07-14 12:00 ` ellie ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: rsbecker @ 2024-07-08 16:27 UTC (permalink / raw) To: 'Konstantin Khomoutov', 'ellie'; +Cc: git On Monday, July 8, 2024 11:45 AM, Konstantin Khomoutov wrote: >On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: > >[...] >> > > error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >> > > CANCEL (err 8) >> > [...] >> > > It seems extremely unlikely to me to be possibly an ISP issue, for >> > > which I already listed the reasons. An additional one is HTTPS >> > > downloads from github outside of git, e.g. from zip archives, for >> > > way larger files work fine as well. >> > [...] >> > What if you explicitly disable HTTP/2 when cloning? >[...] >> Thanks for the idea! I tested it: >> >> $ git -c http.version=HTTP/1.1 clone >> https://github.com/maliit/keyboard > >Over there at SO people are trying all sorts of black magic to combat a problem >which manifests itself in a way very similar to yours [1]. I'm not sure anything from >there could be of help but maybe worth trying anyway as you can override any (or >almost any) Git's configuration setting using that "-c" >command-line option, so basically test round-trips should not be painstakingly >long. > >[...] >> fetch-pack: unexpected disconnect while reading sideband packet >[...] >> Sadly, it seems like the error is only slightly different. > >I actually find it interesting that in each case a sideband packet is mentioned. But >quite possibly it's a red herring anyway. > > 1. https://stackoverflow.com/questions/66366582 I have customers who hit this problem frequently setting up git. It is 99% of the time a firewall or proxy configuration issue, not specific to GitHub, and changes to those usually resolve the problem. The firewall and proxy can be implemented in the ISP's modem if coming from a home network. That is why I really think the OP's issue is the network, not something that can reasonably be fixed in git. I think the network speed is also a potential red herring unless the speed issue relates to the ISP's configuration. ^ permalink raw reply [flat|nested] 43+ messages in thread
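For anyone wanting to check the proxy theory on their own machine, the locally visible configuration is easy to inspect; a small sketch (both commands simply print nothing if no proxy is configured):

$ git config --show-origin --get-all http.proxy
$ env | grep -i proxy

A transparent proxy inside an ISP modem will not show up this way, of course; for that, the curl verbose trace mentioned earlier, or simply comparing behaviour on a different network such as a phone hotspot, is more telling.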
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 16:27 ` rsbecker @ 2024-07-14 12:00 ` ellie 2024-07-24 6:42 ` ellie 2025-09-08 2:34 ` Ellie 2 siblings, 0 replies; 43+ messages in thread From: ellie @ 2024-07-14 12:00 UTC (permalink / raw) To: rsbecker, 'Konstantin Khomoutov'; +Cc: git On 7/8/24 6:27 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:45 AM, Konstantin Khomoutov wrote: >> On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: >> >> [...] >>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>> CANCEL (err 8) >>>> [...] >>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>> which I already listed the reasons. An additional one is HTTPS >>>>> downloads from github outside of git, e.g. from zip archives, for >>>>> way larger files work fine as well. >>>> [...] >>>> What if you explicitly disable HTTP/2 when cloning? >> [...] >>> Thanks for the idea! I tested it: >>> >>> $ git -c http.version=HTTP/1.1 clone >>> https://github.com/maliit/keyboard >> >> Over there at SO people are trying all sorts of black magic to combat a > problem >> which manifests itself in a way very similar to yours [1]. I'm not sure > anything from >> there could be of help but maybe worth trying anyway as you can override > any (or >> almost any) Git's configuration setting using that "-c" >> command-line option, so basically test round-trips should not be > painstakingly >> long. >> >> [...] >>> fetch-pack: unexpected disconnect while reading sideband packet >> [...] >>> Sadly, it seems like the error is only slightly different. >> >> I actually find it interesting that in each case a sideband packet is > mentioned. But >> quite possibly it's a red herring anyway. >> >> 1. https://stackoverflow.com/questions/66366582 > > I have customers who hit this problem frequently setting up git. It is 99% > of the time a firewall or proxy configuration issue, not specific to GitHub, > and changes to those usually resolve the problem. The firewall and proxy can > be implemented in the ISP's modem if coming from a home network. That is why > I really think the OP's issue is the network, not something that can > reasonably fixed in git. I think the network speed is also a potential > red-herring unless the speed issue relates to the ISP's configuration. > For what it's worth, it's definitely Github-specific for me. Maybe one day Github support will respond, I can only hope. Regards, Ellie ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 16:27 ` rsbecker 2024-07-14 12:00 ` ellie @ 2024-07-24 6:42 ` ellie 2025-09-08 2:34 ` Ellie 2 siblings, 0 replies; 43+ messages in thread From: ellie @ 2024-07-24 6:42 UTC (permalink / raw) To: rsbecker, 'Konstantin Khomoutov'; +Cc: git For what it's worth, Github support now confirmed to me that it looks like they might have a timeout problem on their side, but until more people report it they likely won't address it. I appreciate their honesty. But I think it shows the vulnerability of a process without resume well. (Sorry to harp on, I thought this extra info might be interesting.) Regards, Ellie On 7/8/24 6:27 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:45 AM, Konstantin Khomoutov wrote: >> On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: >> >> [...] >>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>> CANCEL (err 8) >>>> [...] >>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>> which I already listed the reasons. An additional one is HTTPS >>>>> downloads from github outside of git, e.g. from zip archives, for >>>>> way larger files work fine as well. >>>> [...] >>>> What if you explicitly disable HTTP/2 when cloning? >> [...] >>> Thanks for the idea! I tested it: >>> >>> $ git -c http.version=HTTP/1.1 clone >>> https://github.com/maliit/keyboard >> >> Over there at SO people are trying all sorts of black magic to combat a > problem >> which manifests itself in a way very similar to yours [1]. I'm not sure > anything from >> there could be of help but maybe worth trying anyway as you can override > any (or >> almost any) Git's configuration setting using that "-c" >> command-line option, so basically test round-trips should not be > painstakingly >> long. >> >> [...] >>> fetch-pack: unexpected disconnect while reading sideband packet >> [...] >>> Sadly, it seems like the error is only slightly different. >> >> I actually find it interesting that in each case a sideband packet is > mentioned. But >> quite possibly it's a red herring anyway. >> >> 1. https://stackoverflow.com/questions/66366582 > > I have customers who hit this problem frequently setting up git. It is 99% > of the time a firewall or proxy configuration issue, not specific to GitHub, > and changes to those usually resolve the problem. The firewall and proxy can > be implemented in the ISP's modem if coming from a home network. That is why > I really think the OP's issue is the network, not something that can > reasonably fixed in git. I think the network speed is also a potential > red-herring unless the speed issue relates to the ISP's configuration. > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-07-08 16:27 ` rsbecker 2024-07-14 12:00 ` ellie 2024-07-24 6:42 ` ellie @ 2025-09-08 2:34 ` Ellie 2 siblings, 0 replies; 43+ messages in thread From: Ellie @ 2025-09-08 2:34 UTC (permalink / raw) To: rsbecker, 'Konstantin Khomoutov'; +Cc: git This has been addressed on Github's side by now, it seems to have been a Github server config issue. Nevertheless, the ability to resume a file transfer remains what some would consider essential for internet software. I still hope it'll be added one day. Thank you for the lively debate. Regards, Ellie On 7/8/24 6:27 PM, rsbecker@nexbridge.com wrote: > On Monday, July 8, 2024 11:45 AM, Konstantin Khomoutov wrote: >> On Mon, Jul 08, 2024 at 05:14:33PM +0200, ellie wrote: >> >> [...] >>>>> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: >>>>> CANCEL (err 8) >>>> [...] >>>>> It seems extremely unlikely to me to be possibly an ISP issue, for >>>>> which I already listed the reasons. An additional one is HTTPS >>>>> downloads from github outside of git, e.g. from zip archives, for >>>>> way larger files work fine as well. >>>> [...] >>>> What if you explicitly disable HTTP/2 when cloning? >> [...] >>> Thanks for the idea! I tested it: >>> >>> $ git -c http.version=HTTP/1.1 clone >>> https://github.com/maliit/keyboard >> >> Over there at SO people are trying all sorts of black magic to combat a > problem >> which manifests itself in a way very similar to yours [1]. I'm not sure > anything from >> there could be of help but maybe worth trying anyway as you can override > any (or >> almost any) Git's configuration setting using that "-c" >> command-line option, so basically test round-trips should not be > painstakingly >> long. >> >> [...] >>> fetch-pack: unexpected disconnect while reading sideband packet >> [...] >>> Sadly, it seems like the error is only slightly different. >> >> I actually find it interesting that in each case a sideband packet is > mentioned. But >> quite possibly it's a red herring anyway. >> >> 1. https://stackoverflow.com/questions/66366582 > > I have customers who hit this problem frequently setting up git. It is 99% > of the time a firewall or proxy configuration issue, not specific to GitHub, > and changes to those usually resolve the problem. The firewall and proxy can > be implemented in the ISP's modem if coming from a home network. That is why > I really think the OP's issue is the network, not something that can > reasonably fixed in git. I think the network speed is also a potential > red-herring unless the speed issue relates to the ISP's configuration. > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: With big repos and slower connections, git clone can be hard to work with 2024-06-07 23:28 With big repos and slower connections, git clone can be hard to work with ellie 2024-06-07 23:33 ` rsbecker @ 2024-09-30 21:01 ` Ellie 1 sibling, 0 replies; 43+ messages in thread From: Ellie @ 2024-09-30 21:01 UTC (permalink / raw) To: git My apologies for bringing this up again, but for what it's worth, this git repository I can't even clone at depth 1: $ git clone --depth 1 https://github.com/alf632/terrain3dglitch Cloning into 'terrain3dglitch'... remote: Enumerating objects: 697, done. remote: Counting objects: 100% (697/697), done. remote: Compressing objects: 100% (439/439), done. error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8) error: 1754 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output The problem seems to be possibly amplified by a timeout config issue from github's side, but also made worse by depth 1 already being 100MB+. Downloading that amount without resume isn't feasible for everyone. I'm assuming if I need all files and sub dirs, there's no workaround here? I don't want to waste anybody's time, I'm just hoping to provide some further data points that in some edge cases, this can be impactful. (And sorry if I did something silly while cloning and didn't realize.) Regards, Ellie On 6/8/24 1:28 AM, ellie wrote: > Dear git team, > > I'm terribly sorry if this is the wrong place, but I'd like to suggest a > potential issue with "git clone". > > The problem is that any sort of interruption or connection issue, no > matter how brief, causes the clone to stop and leave nothing behind: > > $ git clone https://github.com/Nheko-Reborn/nheko > Cloning into 'nheko'... > remote: Enumerating objects: 43991, done. > remote: Counting objects: 100% (6535/6535), done. > remote: Compressing objects: 100% (1449/1449), done. > error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: > CANCEL (err 8) > error: 2771 bytes of body are still expected > fetch-pack: unexpected disconnect while reading sideband packet > fatal: early EOF > fatal: fetch-pack: invalid index-pack output > $ cd nheko > bash: cd: nheko: No such file or director > > In my experience, this can be really impactful with 1. big repositories > and 2. unreliable internet - which I would argue isn't unheard of! E.g. > a developer may work via mobile connection on a business trip. The > result can even be that a repository is uncloneable for some users! > > This has left me in the absurd situation where I was able to download a > tarball via HTTPS from the git hoster just fine, even way larger binary > release items, thanks to the browser's HTTPS resume. And yet a simple > git clone of the same project failed repeatedly. > > My deepest apologies if I missed an option to fix or address this. But > summed up, please consider making git clone recover from hiccups. > > Regards, > > Ellie > > PS: I've seen git hosters have apparent proxy bugs, like timing out > slower git clone connections from the server side even if the transfer > is ongoing. A git auto-resume would reduce the impact of that, too. > > > ^ permalink raw reply [flat|nested] 43+ messages in thread
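In case it is useful as a further data point: when even a depth-1 clone is dominated by large blobs, a blob-filtered partial clone can shrink the initial transfer by deferring big blobs to later, smaller requests; a sketch using the same repository (GitHub advertises support for these filters, but whether it dodges the disconnect depends on where the hiccup actually happens):

$ git clone --depth 1 --filter=blob:limit=1m https://github.com/alf632/terrain3dglitch

Blobs over 1 MiB are then fetched separately when checkout needs them, so an interruption wastes a smaller download, though the total amount transferred stays roughly the same.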
end of thread, other threads:[~2025-09-08 2:44 UTC | newest] Thread overview: 43+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-06-07 23:28 With big repos and slower connections, git clone can be hard to work with ellie 2024-06-07 23:33 ` rsbecker 2024-06-08 0:03 ` ellie 2024-06-08 0:35 ` rsbecker 2024-06-08 0:46 ` ellie 2024-06-08 8:43 ` Jeff King 2024-06-08 9:40 ` ellie 2024-06-08 9:44 ` ellie 2024-06-08 10:38 ` Jeff King 2024-06-08 10:35 ` Jeff King 2024-06-08 11:05 ` ellie 2024-06-08 19:00 ` Junio C Hamano 2024-06-08 20:16 ` ellie 2024-06-10 6:46 ` Patrick Steinhardt 2024-06-10 19:04 ` Emily Shaffer 2024-06-10 20:34 ` Junio C Hamano 2024-06-10 21:55 ` ellie 2024-06-13 10:10 ` Toon claes 2024-06-11 6:31 ` Jeff King 2024-06-11 15:12 ` Junio C Hamano 2024-06-29 1:53 ` Sitaram Chamarty 2024-06-11 6:26 ` Jeff King 2024-06-11 19:40 ` Ivan Frade 2024-07-07 23:42 ` ellie 2024-07-08 1:27 ` rsbecker 2024-07-08 2:28 ` ellie 2024-07-08 12:30 ` rsbecker 2024-07-08 12:41 ` ellie 2024-07-08 14:32 ` Konstantin Khomoutov 2024-07-08 15:02 ` rsbecker 2024-07-08 15:14 ` ellie 2024-07-08 15:31 ` rsbecker 2024-07-08 15:48 ` ellie 2024-07-08 16:23 ` rsbecker 2024-07-08 17:06 ` ellie 2024-07-08 17:38 ` rsbecker 2024-07-08 16:09 ` Emanuel Czirai 2024-07-08 15:44 ` Konstantin Khomoutov 2024-07-08 16:27 ` rsbecker 2024-07-14 12:00 ` ellie 2024-07-24 6:42 ` ellie 2025-09-08 2:34 ` Ellie 2024-09-30 21:01 ` Ellie