* [BUG] Git push sends too much data unnecessarily
From: Rajiv Sharma @ 2026-01-14 12:41 UTC
To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

I tried to create a new branch pointing to the commit which was the
ancestor of the current branch (i.e. HEAD~1) and pushing it to the
remote. Since the commit was already known to the server, I expected
the push to be kind of no-op since it's simply creating a new pointer.
However the push ended up taking 10+ minutes. Since I was running with
the `--verbose` flag, I realised that the push ended up sending
multiple GBs worth of data just for creating a new branch on an
existing commit already known to the remote. After some
experimentation, I managed to find an easy repro for this issue:

1. Clone a non-empty repo from some remote (e.g. git clone
   https://SERVER_HOSTNAME/repo_name.git) in two locations, `primary` and
   `secondary` and ensure that both have the same branch checked out.
2. Navigate to the `primary` location and create a local commit for repo
   `repo_name`. Push this commit C1 to the remote server.
3. Navigate to the `secondary` location and try to create a new branch by
   running `git push origin HEAD:refs/heads/shiny_new_branch --verbose`
   (or by checking out that branch and pushing it). Note that `HEAD` here
   refers to the `HEAD` commit as seen by `secondary` which in reality is
   `HEAD~1` compared to the remote.
4. If the repo had some commits on the checked out branch, you will
   notice the verbose output highlighting objects being sent to the
   server where there was no need to do so.

To understand more about exactly how much data is sent, I ran a few
more experiments and came to the conclusion that the git client sends
HEAD commit + all ancestors of HEAD commit except the commits which
are also ancestors of some other branch / ref known to Git.
Pictorially, it can be represented as:

    B1  B2 <-- HEAD
    *   *   (sent)
    |   |
    *   *   (sent)
    |   |
    *   *   (sent)
    |  /
    | /
    *       (NOT sent)
    |
    *       (NOT sent)

This explains the multi GB push in my case because I was working on a
long standing branch with lots of commits. Initially I assumed this
was a server problem but then realised that in the push path the
server just advertises refs and where they point and it's the client
that does the negotiation. I think the bug exists somewhere in the
negotiation logic but I am not sure.

What did you expect to happen? (Expected behavior)

I would have expected the push to be extremely lightweight without
sending any objects to the server.

What happened instead? (Actual behavior)

Already detailed in the first section above.

What's different between what you expected and what actually happened?

The git client sends loads of data to the server when it shouldn't
have had to send anything at all.

Anything else you want to add:

Note that there are workarounds for this problem. If I do a `git pull`
and get the latest state of the repo before performing any push, this
problem doesn't occur. Nevertheless, I think it might be worthwhile to
fix this. I managed to repro this across OS (Linux, MacOS) and across
versions.

[System Info]
git version:
git version 2.47.3
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
libcurl: 7.76.1
OpenSSL: OpenSSL 3.5.1 1 Jul 2025
zlib: 1.2.11
uname: Linux 6.9.0-0_fbk12_0_g28f2d09ad102 #1 SMP Thu Nov 6 08:05:52 PST 2025 x86_64
compiler info: gnuc: 11.5
libc info: glibc: 2.34
$SHELL (typically, interactive shell): /bin/bash

[Enabled Hooks]

* Re: [BUG] Git push sends too much data unnecessarily
From: Karthik Nayak @ 2026-01-14 16:27 UTC
To: Rajiv Sharma, git

Rajiv Sharma <rajiv.tilakraj.sharma@gmail.com> writes:

> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>
> I tried to create a new branch pointing to the commit which was the
> ancestor of the current branch (i.e. HEAD~1) and pushing it to the
> remote. Since the commit was already known to the server, I expected
> the push to be kind of no-op since it's simply creating a new pointer.
> However the push ended up taking 10+ minutes. Since I was running with
> the `--verbose` flag, I realised that the push ended up sending
> multiple GBs worth of data just for creating a new branch on an
> existing commit already known to the remote. After some
> experimentation, I managed to find an easy repro for this issue:
>
> Clone a non-empty repo from some remote (e.g. git clone
> https://SERVER_HOSTNAME/repo_name.git) in two locations, `primary` and
> `secondary` and ensure that both have the same branch checked out.
> Navigate to the `primary` location and create a local commit for repo
> `repo_name`. Push this commit C1 to the remote server
> Navigate to the `secondary` location and try to create a new branch by
> running `git push origin HEAD:refs/heads/shiny_new_branch --verbose`
> (or by checking out that branch and pushing it). Note that `HEAD` here
> refers to the `HEAD` commit as seen by `secondary` which in reality is
> `HEAD~1` compared to the remote
> If the repo had some commits on the checked out branch, you will
> notice the verbose output highlighting objects being sent to the
> server where there was no need to do so
>
>
> To understand more about exactly how much data is sent, I ran a few
> more experiments and came to the conclusion that the git client sends
> HEAD commit + all ancestors of HEAD commit except the commits which
> are also ancestors of some other branch / ref known to Git.
> Pictorially, it can be represented as:
>
> B1  B2 <-- HEAD
> *   *   (sent)
> |   |
> *   *   (sent)
> |   |
> *   *   (sent)
> |  /
> | /
> *       (NOT sent)
> |
> *       (NOT sent)
>
> This explains the multi GB push in my case because I was working on a
> long standing branch with lots of commits. Initially I assumed this
> was a server problem but then realised that in the push path the
> server just advertises refs and where they point and it's the client
> that does the negotiation. I think the bug exists somewhere in the
> negotiation logic but I am not sure.
>

Thanks for the detailed explanation. I don't think this is a bug per se,
but that doesn't mean it isn't something we can discuss and potentially
optimize.

To reiterate my understanding, I did a quick local PoC:

  $ git init remote
  $ git -C remote config set receive.denyCurrentBranch ignore
  $ git -C remote commit --allow-empty -m "C1"
  $ git -C remote commit --allow-empty -m "C2"
  $ git -C remote commit --allow-empty -m "C3"

  $ git clone remote/ base1
  $ git clone remote/ base2

  $ git -C base1 commit --allow-empty -m "C4"
  $ git -C base1 push -f --verbose
  Pushing to /tmp/remote/
  Enumerating objects: 1, done.
  Counting objects: 100% (1/1), done.
  Writing objects: 100% (1/1), 704 bytes | 704.00 KiB/s, done.
  Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
  To /tmp/remote/
     78c400c..affbad8  master -> master
  updating local tracking ref 'refs/remotes/origin/master'

  $ git -C base2 push -f --verbose origin HEAD:refs/heads/fun
  Pushing to /tmp/remote/
  Enumerating objects: 4, done.
  Counting objects: 100% (4/4), done.
  Delta compression using up to 16 threads
  Compressing objects: 100% (3/3), done.
  Writing objects: 100% (4/4), 1.98 KiB | 1.98 MiB/s, done.
  Total 4 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
  To /tmp/remote/
   * [new branch]      HEAD -> fun
  updating local tracking ref 'refs/remotes/origin/fun'

What you're describing, and what can easily be seen here, is that while
pushing C4 from base1 transferred only one object, pushing HEAD from
base2 (which is C4~1) pushes 4 objects.

  After base1 creates C4 and pushes:
  ==================================

  remote:  C1 --- C2 --- C3 --- C4 (master)

  base1:   C1 --- C2 --- C3 --- C4 (master, origin/master)
                                 ^
                                 |
                          (transfers only C4)

  base2:   C1 --- C2 --- C3 (master, origin/master)


  When base2 pushes HEAD (=C3) to refs/heads/fun:
  ================================================

  remote:  C1 --- C2 --- C3 --- C4 (master)
                          \
                           fun

  base2:   C1 --- C2 --- C3 (master, origin/master)
                          ^
                          |
           (transfers C1, C2, C3, + tree object)
                    (4 objects total)

This boils down to how Git negotiates between the client <> server. In
our case, remote will list the references it already contains. So in our
experiment, that'd be:

  - C4: affbad8

With this information, the client should find all the objects the remote
would need to satisfy the new references being pushed. Since C4 is a
commit the client (base2) knows nothing about, it cannot find a common
ancestor between the advertised commit and the commits present in its
own repository.

This seems obvious to us, since C4~1 is the common ancestor here, but
base2 doesn't have sufficient information to come to that conclusion. So
it sends all objects required to create the reference, in our case 4
objects, in your case GBs of data.

> What did you expect to happen? (Expected behavior)
>
> I would have expected the push to be extremely lightweight without
> sending any objects to the server.
>
>
> What happened instead? (Actual behavior)
>
> Already detailed in the first section above.
>
>
> What's different between what you expected and what actually happened?
>
> The git client sends loads of data to the server when it shouldn't
> have had to send anything at all.
>
>
> Anything else you want to add:
>
> Note that there are workarounds for this problem. If I do a `git pull`
> and get the latest state of the repo before performing any push, this
> problem doesn't occur. Nevertheless, I think it might be worthwhile to
> fix this. I managed to repro this across OS (Linux, MacOS) and across
> versions.
>

That said, I do think we can potentially optimize this. AFAIK, in the
negotiation phase the server lists its refs, and this list is compared
with the refs present locally to determine all missing objects. So any
commit which is not represented by a ref would be missed.

One way to reduce this would be for the server to also provide
additional information, such as commits which are not represented by any
refs. But how many such commits? What about sampling? Finally we'd have
to consider if it is worth it.

Thanks,
Karthik

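The reasoning above can be approximated outside of `git push`. What follows is a minimal sketch under stated assumptions, not the actual send-pack code: it rebuilds the PoC in a temporary directory, treats the tips reported by `git ls-remote` as the server's ref advertisement, excludes only the tips that base2 already has locally, and counts what `git rev-list --objects` would then enumerate. The helper name `count_objects_to_send`, the demo identity, and the directory layout are hypothetical and exist only for illustration.

```shell
#!/bin/sh
set -eu

# Scratch area and a throwaway identity (assumptions for this sketch only).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Rebuild the PoC: a remote with C1..C3, two clones, then C4 pushed from base1.
git init -q remote
git -C remote config receive.denyCurrentBranch ignore
for msg in C1 C2 C3; do
    git -C remote commit -q --allow-empty -m "$msg"
done
git clone -q remote base1
git clone -q remote base2
git -C base1 commit -q --allow-empty -m C4
git -C base1 push -q origin HEAD

# Hypothetical helper: approximate the set of objects a plain push of HEAD
# would enumerate. Only advertised tips that exist locally can be excluded.
count_objects_to_send() {
    exclude=""
    for oid in $(git ls-remote origin | cut -f1 | sort -u); do
        if git cat-file -e "$oid^{commit}" 2>/dev/null; then
            exclude="$exclude ^$oid"
        fi
    done
    # $exclude is left unquoted on purpose so each ^<oid> is its own argument.
    git rev-list --objects HEAD $exclude | wc -l | tr -d ' '
}

cd base2
before_fetch=$(count_objects_to_send)   # C4 is unknown locally, nothing excluded
git fetch -q origin
after_fetch=$(count_objects_to_send)    # C4 is now known, so HEAD is covered

echo "before fetch: $before_fetch, after fetch: $after_fetch"
```

Under these assumptions the first count should match the 4 objects seen in the PoC output (three commits plus the empty tree), and the second should drop to zero once the advertised tip is known locally, which is consistent with the `git pull` workaround mentioned in the report.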
* Re: [BUG] Git push sends too much data unnecessarily
From: Junio C Hamano @ 2026-01-14 17:38 UTC
To: Karthik Nayak; +Cc: Rajiv Sharma, git

Karthik Nayak <karthik.188@gmail.com> writes:

> So it sends all objects required to create the reference, in our case 4
> objects, in your case GBs of data.

"push.negotiate"?

* Re: [BUG] Git push sends too much data unnecessarily
From: Rajiv Sharma @ 2026-01-14 17:39 UTC
To: Junio C Hamano; +Cc: Karthik Nayak, git

Thanks for the great explanation! You are right, it's not really a bug
(because there is no correctness problem here) but it surely is
suboptimal behavior.

> This boils down to how Git negotiates between the client <> server

I think that's the crux of the problem here. I don't think git
negotiates in the push path the way it does in the read path, i.e.
there is no process of client-server communication that involves
gradually arriving at the common base (in this case it would be C3).
The read path does this quite well (using something akin to a skiplist
IIRC?) and the common base is found in a couple iterations in most
cases. I am unaware of the historical context behind this difference
but I assume the server sending unnecessary extra data during the read
path would be much more expensive than the client doing it hence the
push protocol is kept simpler.

This kind of negotiation _could_ be added to the push path but it
would be a breaking change. I read somewhere that there were plans for
Push Protocol V2 (in the same vein as Read Protocol V2) so it would be
great to see this improvement making its way there!

Thanks
Rajiv Sharma

On Wed, Jan 14, 2026 at 5:38 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > So it sends all objects required to create the reference, in our case 4
> > objects, in your case GBs of data.
>
> "push.negotiate"?

* Re: [BUG] Git push sends too much data unnecessarily
From: Jeff King @ 2026-01-14 21:11 UTC
To: Rajiv Sharma; +Cc: Junio C Hamano, Karthik Nayak, git

On Wed, Jan 14, 2026 at 05:39:43PM +0000, Rajiv Sharma wrote:

> > This boils down to how Git negotiates between the client <> server
>
> I think that's the crux of the problem here. I don't think git
> negotiates in the push path the way it does in the read path, i.e.
> there is no process of client-server communication that involves
> gradually arriving at the common base (in this case it would be C3).
> The read path does this quite well (using something akin to a skiplist
> IIRC?) and the common base is found in a couple iterations in most
> cases. I am unaware of the historical context behind this difference
> but I assume the server sending unnecessary extra data during the read
> path would be much more expensive than the client doing it hence the
> push protocol is kept simpler.
>
> This kind of negotiation _could_ be added to the push path but it
> would be a breaking change. I read somewhere that there were plans for
> Push Protocol V2 (in the same vein as Read Protocol V2) so it would be
> great to see this improvement making its way there!

I think you may have misunderstood Junio's response. We do have
push.negotiate already. It's just not the default.

Did you try your example with "git -c push.negotiate=true push ..."?

-Peff

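For readers who want to try the suggestion, here is a small sketch of the two usual ways to turn the option on. The one-off form is the command quoted in the message, completed with the refspec from the original report; the persistent form uses `git config --global`, which is an assumption about scope (a per-repository setting without `--global` would work the same way). How much it helps presumably also depends on the server supporting the negotiation.

```shell
# One-off, for a single push (run inside the clone; refspec taken from the report):
#   git -c push.negotiate=true push origin HEAD:refs/heads/shiny_new_branch

# Persistently, for every push made by this user:
git config --global push.negotiate true

# Confirm the value that Git will now read:
git config --global --get push.negotiate
```

The `-c` form leaves the configuration untouched, which makes it convenient for comparing the `--verbose` output of a push with and without negotiation.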
* Re: [BUG] Git push sends too much data unnecessarily
From: Rajiv Sharma @ 2026-01-14 21:48 UTC
To: Jeff King; +Cc: Junio C Hamano, Karthik Nayak, git

Ah you are right, "push.negotiate" is exactly what is needed here. I
tried this out and it works like a charm. Thanks for sorting this out.

- Rajiv Sharma

On Wed, Jan 14, 2026 at 9:11 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jan 14, 2026 at 05:39:43PM +0000, Rajiv Sharma wrote:
>
> > > This boils down to how Git negotiates between the client <> server
> >
> > I think that's the crux of the problem here. I don't think git
> > negotiates in the push path the way it does in the read path, i.e.
> > there is no process of client-server communication that involves
> > gradually arriving at the common base (in this case it would be C3).
> > The read path does this quite well (using something akin to a skiplist
> > IIRC?) and the common base is found in a couple iterations in most
> > cases. I am unaware of the historical context behind this difference
> > but I assume the server sending unnecessary extra data during the read
> > path would be much more expensive than the client doing it hence the
> > push protocol is kept simpler.
> >
> > This kind of negotiation _could_ be added to the push path but it
> > would be a breaking change. I read somewhere that there were plans for
> > Push Protocol V2 (in the same vein as Read Protocol V2) so it would be
> > great to see this improvement making its way there!
>
> I think you may have misunderstood Junio's response. We do have
> push.negotiate already. It's just not the default.
>
> Did you try your example with "git -c push.negotiate=true push ..."?
>
> -Peff

* Re: [BUG] Git push sends too much data unnecessarily
From: Karthik Nayak @ 2026-01-15 9:43 UTC
To: Junio C Hamano; +Cc: Rajiv Sharma, git

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> So it sends all objects required to create the reference, in our case 4
>> objects, in your case GBs of data.
>
> "push.negotiate"?

Neat. Every day there is something new to learn! Thanks.