[BUG] Git push sends too much data unnecessarily

public inbox for git@vger.kernel.org

* [BUG] Git push sends too much data unnecessarily
@ 2026-01-14 12:41 Rajiv Sharma
  2026-01-14 16:27 ` Karthik Nayak
  0 siblings, 1 reply; 7+ messages in thread
From: Rajiv Sharma @ 2026-01-14 12:41 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

I tried to create a new branch pointing to the commit which was the
ancestor of the current branch (i.e. HEAD~1) and pushing it to the
remote. Since the commit was already known to the server, I expected
the push to be kind of no-op since it's simply creating a new pointer.
However the push ended up taking 10+ minutes. Since I was running with
the `--verbose` flag, I realised that the push ended up sending
multiple GBs worth of data just for creating a new branch on an
existing commit already known to the remote. After some
experimentation, I managed to find an easy repro for this issue:

1. Clone a non-empty repo from some remote (e.g. git clone
   https://SERVER_HOSTNAME/repo_name.git) in two locations, `primary`
   and `secondary`, and ensure that both have the same branch checked
   out.
2. Navigate to the `primary` location and create a local commit for
   repo `repo_name`. Push this commit C1 to the remote server.
3. Navigate to the `secondary` location and try to create a new branch
   by running
   `git push origin HEAD:refs/heads/shiny_new_branch --verbose`
   (or by checking out that branch and pushing it). Note that `HEAD`
   here refers to the `HEAD` commit as seen by `secondary`, which in
   reality is `HEAD~1` compared to the remote.
4. If the repo had some commits on the checked out branch, you will
   notice the verbose output highlighting objects being sent to the
   server where there was no need to do so.

To understand more about exactly how much data is sent, I ran a few
more experiments and came to the conclusion that the git client sends
HEAD commit + all ancestors of HEAD commit except the commits which
are also ancestors of some other branch / ref known to Git.
Pictorially, it can be represented as:

B1      B2    <-- HEAD
*       *     (sent)
|       |
*       *     (sent)
|       |
*       *     (sent)
|      /
|    /
*             (NOT sent)
|
*             (NOT sent)

This explains the multi GB push in my case because I was working on a
long standing branch with lots of commits. Initially I assumed this
was a server problem but then realised that in the push path the
server just advertises refs and where they point and it's the client
that does the negotiation. I think the bug exists somewhere in the
negotiation logic but I am not sure.

What did you expect to happen? (Expected behavior)

I would have expected the push to be extremely lightweight without
sending any objects to the server.

What happened instead? (Actual behavior)

Already detailed in the first section above.

What's different between what you expected and what actually happened?

The git client sends loads of data to the server when it shouldn't
have had to send anything at all.

Anything else you want to add:

Note that there are workarounds for this problem. If I do a `git pull`
and get the latest state of the repo before performing any push, this
problem doesn't occur. Nevertheless, I think it might be worthwhile to
fix this. I managed to repro this across operating systems (Linux,
macOS) and across Git versions.
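
A sketch of why refreshing the local repository first avoids the large
push, using a throwaway local setup. It assumes that a plain push can
only exclude history below tips the remote advertises when the client
already has those tips as objects. The `estimate` helper is a
hypothetical approximation of that rule, not Git's actual
implementation, and the repository names are made up. `git fetch` is
used instead of `git pull` on the assumption that obtaining the objects
is what matters:

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; identity comes from the environment so no user config is needed.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A small "server" repository with three commits, cloned twice.
git init -q remote
git -C remote config receive.denyCurrentBranch ignore
for msg in C1 C2 C3; do
    git -C remote commit -q --allow-empty -m "$msg"
done
git clone -q remote primary
git clone -q remote secondary

# primary publishes one more commit that secondary has not seen.
git -C primary commit -q --allow-empty -m C4
git -C primary push -q origin HEAD

# Hypothetical helper: count objects reachable from secondary's HEAD but not
# from any advertised remote tip that secondary already has locally.
estimate() {
    exclude=""
    for oid in $(git -C secondary ls-remote origin | cut -f1 | sort -u); do
        if git -C secondary cat-file -e "$oid^{commit}" 2>/dev/null; then
            exclude="$exclude ^$oid"
        fi
    done
    git -C secondary rev-list --objects HEAD $exclude | wc -l | tr -d ' '
}

before=$(estimate)
git -C secondary fetch -q origin
after=$(estimate)

echo "objects to send before fetch: $before"
echo "objects to send after fetch:  $after"
```

Before the fetch, the advertised tip is unknown to `secondary`, so
nothing can be excluded. Afterwards the tip is present locally and
`HEAD` is one of its ancestors.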

[System Info]
git version:
git version 2.47.3
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
libcurl: 7.76.1
OpenSSL: OpenSSL 3.5.1 1 Jul 2025
zlib: 1.2.11
uname: Linux 6.9.0-0_fbk12_0_g28f2d09ad102 #1 SMP Thu Nov  6 08:05:52 PST 2025 x86_64
compiler info: gnuc: 11.5
libc info: glibc: 2.34
$SHELL (typically, interactive shell): /bin/bash

[Enabled Hooks]


* Re: [BUG] Git push sends too much data unnecessarily
  2026-01-14 12:41 [BUG] Git push sends too much data unnecessarily Rajiv Sharma
@ 2026-01-14 16:27 ` Karthik Nayak
  2026-01-14 17:38   ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Karthik Nayak @ 2026-01-14 16:27 UTC (permalink / raw)
  To: Rajiv Sharma, git


Rajiv Sharma <rajiv.tilakraj.sharma@gmail.com> writes:

> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>
> I tried to create a new branch pointing to the commit which was the
> ancestor of the current branch (i.e. HEAD~1) and pushing it to the
> remote. Since the commit was already known to the server, I expected
> the push to be kind of no-op since it's simply creating a new pointer.
> However the push ended up taking 10+ minutes. Since I was running with
> the `--verbose` flag, I realised that the push ended up sending
> multiple GBs worth of data just for creating a new branch on an
> existing commit already known to the remote. After some
> experimentation, I managed to find an easy repro for this issue:
>
> Clone a non-empty repo from some remote (e.g. git clone
> https://SERVER_HOSTNAME/repo_name.git) in two locations, `primary` and
> `secondary` and ensure that both have the same branch checked out.
> Navigate to the `primary` location and create a local commit for repo
> `repo_name`. Push this commit C1 to the remote server
> Navigate to the `secondary` location and try to create a new branch by
> running `git push origin HEAD:refs/heads/shiny_new_branch --verbose`
> (or by checking out that branch and pushing it). Note that `HEAD` here
> refers to the `HEAD` commit as seen by `secondary` which in reality is
> `HEAD~1` compared to the remote
> If the repo had some commits on the checked out branch, you will
> notice the verbose output highlighting objects being sent to the
> server where there was no need to do so
>
>
> To understand more about exactly how much data is sent, I ran a few
> more experiments and came to the conclusion that the git client sends
> HEAD commit + all ancestors of HEAD commit except the commits which
> are also ancestors of some other branch / ref known to Git.
> Pictorially, it can be represented as:
>
> B1  B2       <-- HEAD
> *      *         (sent)
> |       |
> *       *         (sent)
> |        |
> *        *        (sent)
> |      /
> |    /
> *                  (NOT sent)
> |
> *                  (NOT sent)
>
> This explains the multi GB push in my case because I was working on a
> long standing branch with lots of commits. Initially I assumed this
> was a server problem but then realised that in the push path the
> server just advertises refs and where they point and it's the client
> that does the negotiation. I think the bug exists somewhere in the
> negotiation logic but I am not sure.
>

Thanks for the detailed explanation. I don't think this is a bug per se,
but that doesn't mean it isn't something we can discuss and potentially
optimize.

To reiterate my understanding, I did a quick local PoC:

$ git init remote
$ git -C remote config set receive.denyCurrentBranch ignore
$ git -C remote commit --allow-empty -m "C1"
$ git -C remote commit --allow-empty -m "C2"
$ git -C remote commit --allow-empty -m "C3"

$ git clone remote/ base1
$ git clone remote/ base2

$ git -C base1 commit --allow-empty -m "C4"
$ git -C base1 push -f --verbose
Pushing to /tmp/remote/
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 704 bytes | 704.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To /tmp/remote/
   78c400c..affbad8  master -> master
updating local tracking ref 'refs/remotes/origin/master'

$ git -C base2 push -f --verbose origin HEAD:refs/heads/fun
Pushing to /tmp/remote/
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 1.98 KiB | 1.98 MiB/s, done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To /tmp/remote/
 * [new branch]      HEAD -> fun
updating local tracking ref 'refs/remotes/origin/fun'

What you're describing, and what can easily be seen here, is that while
pushing C4 from base1 transferred only one object, pushing HEAD from
base2 (which is C4~1) transferred 4 objects.

After base1 creates C4 and pushes:
==================================
remote:     C1 --- C2 --- C3 --- C4 (master)

base1:      C1 --- C2 --- C3 --- C4 (master, origin/master)
                                  ^
                                  |
                         (transfers only C4)

base2:      C1 --- C2 --- C3 (master, origin/master)


When base2 pushes HEAD (=C3) to refs/heads/fun:
================================================
remote:     C1 --- C2 --- C3 --- C4 (master)
                            \
                             fun

base2:      C1 --- C2 --- C3 (master, origin/master)
                        ^
                        |
              (transfers C1, C2, C3, + tree object)
              (4 objects total)

This boils down to how Git negotiates between the client <> server.
In our case, remote will list the references it already contains. So in
our experiment, that'd be:

 - C4: affbad8

With this information, the client should find all the objects the remote
would need to satisfy the new references being pushed.

Since C4 is a commit the client (base2) knows nothing about, it cannot
find a common ancestor between that commit and the commits present in
its own repository. The answer seems obvious to us, since C4~1 is the
common ancestor here, but base2 doesn't have sufficient information to
come to that conclusion.

So it sends all objects required to create the reference, in our case 4
objects, in your case GBs of data.
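
The reasoning above can be approximated on the command line. This is a
rough sketch, not what `git push` runs internally: it assumes the client
may only exclude history below advertised tips that it already has as
objects, and the variable names are illustrative. It rebuilds the PoC
layout, checks whether base2 has the commit the remote advertises, and
counts the objects reachable from base2's HEAD that cannot be excluded:

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; identity comes from the environment so no user config is needed.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Same layout as the PoC: remote with C1..C3, two clones, C4 pushed from base1.
# (Classic `git config <key> <value>` syntax is used so older Git versions work too.)
git init -q remote
git -C remote config receive.denyCurrentBranch ignore
for msg in C1 C2 C3; do
    git -C remote commit -q --allow-empty -m "$msg"
done
git clone -q remote base1
git clone -q remote base2
git -C base1 commit -q --allow-empty -m C4
git -C base1 push -q origin HEAD

# Which advertised tips does base2 already have as objects?
known=""
exclude=""
for oid in $(git -C base2 ls-remote origin | cut -f1 | sort -u); do
    if git -C base2 cat-file -e "$oid^{commit}" 2>/dev/null; then
        known="$known $oid"
        exclude="$exclude ^$oid"
    fi
done

# Objects reachable from HEAD that no known advertised tip lets us skip.
count=$(git -C base2 rev-list --objects HEAD $exclude | wc -l | tr -d ' ')

echo "advertised tips known to base2: ${known:-none}"
echo "objects that cannot be excluded: $count"
```

Because the only advertised tip is C4, which base2 lacks, the exclusion
list stays empty and the whole history below HEAD is counted.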

> What did you expect to happen? (Expected behavior)
>
> I would have expected the push to be extremely lightweight without
> sending any objects to the server.
>
>
> What happened instead? (Actual behavior)
>
> Already detailed in the first section above.
>
>
> What's different between what you expected and what actually happened?
>
> The git client sends loads of data to the server when it shouldn't
> have had to send anything at all.
>
>
> Anything else you want to add:
>
> Note that there are workarounds for this problem. If I do a `git pull`
> and get the latest state of the repo before performing any push, this
> problem doesn't occur. Nevertheless, I think it might be worthwhile to
> fix this. I managed to repro this across OS (Linux, MacOS) and across
> versions.
>

That said, I do think we can potentially optimize this. AFAIK, in the
negotiation phase the server lists its refs, and this list is compared
with the refs present locally to determine all missing objects.

So any commits which are not represented by a ref would be missed. One
way to reduce this would be for the server to also provide additional
information such as commits which are not represented by any refs. But
how many such commits? What about sampling? Finally we'd have to
consider if it is worth it.

Thanks,
Karthik


* Re: [BUG] Git push sends too much data unnecessarily
  2026-01-14 16:27 ` Karthik Nayak
@ 2026-01-14 17:38   ` Junio C Hamano
  2026-01-14 17:39     ` Rajiv Sharma
  2026-01-15  9:43     ` Karthik Nayak
  0 siblings, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2026-01-14 17:38 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Rajiv Sharma, git

Karthik Nayak <karthik.188@gmail.com> writes:

> So it sends all objects required to create the reference, in our case 4
> objects, in your case GBs of data.

"push.negotiate"?


* Re: [BUG] Git push sends too much data unnecessarily
  2026-01-14 17:38   ` Junio C Hamano
@ 2026-01-14 17:39     ` Rajiv Sharma
  2026-01-14 21:11       ` Jeff King
  2026-01-15  9:43     ` Karthik Nayak
  1 sibling, 1 reply; 7+ messages in thread
From: Rajiv Sharma @ 2026-01-14 17:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, git

Thanks for the great explanation! You are right, it's not really a bug
(because there is no correctness problem here) but it surely is
suboptimal behavior.

> This boils down to how Git negotiates between the client <> server

I think that's the crux of the problem here. I don't think git
negotiates in the push path the way it does in the read path, i.e.
there is no process of client-server communication that involves
gradually arriving at the common base (in this case it would be C3).
The read path does this quite well (using something akin to a skiplist
IIRC?) and the common base is found in a couple iterations in most
cases. I am unaware of the historical context behind this difference
but I assume the server sending unnecessary extra data during the read
path would be much more expensive than the client doing it hence the
push protocol is kept simpler.

This kind of negotiation _could_ be added to the push path but it
would be a breaking change. I read somewhere that there were plans for
Push Protocol V2 (in the same vein as Read Protocol V2) so it would be
great to see this improvement making its way there!

Thanks
Rajiv Sharma

On Wed, Jan 14, 2026 at 5:38 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > So it sends all objects required to create the reference, in our case 4
> > objects, in your case GBs of data.
>
> "push.negotiate"?


* Re: [BUG] Git push sends too much data unnecessarily
  2026-01-14 17:39     ` Rajiv Sharma
@ 2026-01-14 21:11       ` Jeff King
  2026-01-14 21:48         ` Rajiv Sharma
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2026-01-14 21:11 UTC (permalink / raw)
  To: Rajiv Sharma; +Cc: Junio C Hamano, Karthik Nayak, git

On Wed, Jan 14, 2026 at 05:39:43PM +0000, Rajiv Sharma wrote:

> > This boils down to how Git negotiates between the client <> server
> 
> I think that's the crux of the problem here. I don't think git
> negotiates in the push path the way it does in the read path, i.e.
> there is no process of client-server communication that involves
> gradually arriving at the common base (in this case it would be C3).
> The read path does this quite well (using something akin to a skiplist
> IIRC?) and the common base is found in a couple iterations in most
> cases. I am unaware of the historical context behind this difference
> but I assume the server sending unnecessary extra data during the read
> path would be much more expensive than the client doing it hence the
> push protocol is kept simpler.
> 
> This kind of negotiation _could_ be added to the push path but it
> would be a breaking change. I read somewhere that there were plans for
> Push Protocol V2 (in the same vein as Read Protocol V2) so it would be
> great to see this improvement making its way there!

I think you may have misunderstood Junio's response. We do have
push.negotiate already. It's just not the default.

Did you try your example with "git -c push.negotiate=true push ..."?
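
A minimal sketch of that suggestion against the same kind of throwaway
setup as the PoC earlier in the thread. The repository names and the
`fun` branch are illustrative, and the script only checks that the new
branch lands on the expected commit. How many objects are written
depends on the Git version and transport, so compare the `--verbose`
output with and without the option yourself:

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; identity comes from the environment so no user config is needed.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Remote with C1..C3, two clones, and C4 pushed from base1 only.
git init -q remote
git -C remote config receive.denyCurrentBranch ignore
for msg in C1 C2 C3; do
    git -C remote commit -q --allow-empty -m "$msg"
done
git clone -q remote base1
git clone -q remote base2
git -C base1 commit -q --allow-empty -m C4
git -C base1 push -q origin HEAD

# Push base2's HEAD (C3) as a new branch, with push negotiation enabled
# for this single invocation.
git -C base2 -c push.negotiate=true push --verbose origin HEAD:refs/heads/fun

# Confirm the new branch on the remote points at base2's HEAD.
pushed=$(git -C remote rev-parse refs/heads/fun)
expected=$(git -C base2 rev-parse HEAD)
echo "remote fun -> $pushed (expected $expected)"
```

To keep the setting for a repository rather than passing it each time,
`git config push.negotiate true` sets the same option.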

-Peff


* Re: [BUG] Git push sends too much data unnecessarily
  2026-01-14 21:11       ` Jeff King
@ 2026-01-14 21:48         ` Rajiv Sharma
  0 siblings, 0 replies; 7+ messages in thread
From: Rajiv Sharma @ 2026-01-14 21:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Karthik Nayak, git

Ah you are right, "push.negotiate" is exactly what is needed here. I
tried this out and it works like a charm. Thanks for sorting this out.

- Rajiv Sharma



* Re: [BUG] Git push sends too much data unnecessarily
  2026-01-14 17:38   ` Junio C Hamano
  2026-01-14 17:39     ` Rajiv Sharma
@ 2026-01-15  9:43     ` Karthik Nayak
  1 sibling, 0 replies; 7+ messages in thread
From: Karthik Nayak @ 2026-01-15  9:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Rajiv Sharma, git


Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> So it sends all objects required to create the reference, in our case 4
>> objects, in your case GBs of data.
>
> "push.negotiate"?

Neat. Every day there is something new to learn! Thanks.
