* [BUG] `git push` sends unnecessary objects
@ 2023-09-13 22:59 Javier Mora
2023-09-17 13:21 ` Bagas Sanjaya
0 siblings, 1 reply; 4+ messages in thread
From: Javier Mora @ 2023-09-13 22:59 UTC (permalink / raw)
To: git
I came across this issue accidentally when trying to move a directory
containing a very large file, and deleting another file in that
directory while I was at it.
It seems to be caused by `pack.useSparse=true` being the default since
v2.27 (which I found out after spending quite a while manually
bisecting and compiling git since I noticed that this didn't happen in
v2.25; commit de3a864 introduces this regression).
* Expected:
Pushing a commit that moves a file without modifying it shouldn't
require sending a blob object for that file, since the remote server
already has that blob object.
* Observed:
Pushing a commit that moves a directory containing a file and also
adds/deletes other files in that directory will for some reason also
send blobs for all the files in that directory, even the ones that
were already in the remote.
* Consequences:
This has a very big impact in push times for very small commits
that just move around files, if those files are very big (I had this
happen with a >100MB file over a problematic connection... yikes!)
* Note:
The commit introducing the regression does warn about possible
scenarios involving a special arrangement of exact copies across
directories, but these are not "copies", I just moved a file, which
seems like a rather common operation.
Code snippet for reproduction:
```
mkdir TEST_git
cd TEST_git
mkdir -p local remote/origin.git
cd remote/origin.git
git init --bare
cd ../../local
git init
git remote add origin file://"${PWD%/*}"/remote/origin.git
mkdir zig
for i in a b c d e; do
dd if=/dev/urandom of=zig/"$i" bs=1M count=1
done
git add .
git commit -m 'Add big files'
git push -u origin master
#>> Writing objects: 100% (8/8), 5.00 MiB | 13.27 MiB/s, done.
#^ makes sense: 1 commit + 2 trees (/ and /zig) + 5 files = 8;
# 5 MiB in total for the 5x 1 MiB binary files
git mv zig zag
git commit -m 'Move zig'
git push
#>> Writing objects: 100% (2/2), 233 bytes | 233.00 KiB/s, done.
#^ makes sense: 1 commit + 1 tree (/ renames /zig to /zag) = 2;
# a,b,c,d,e objects already in remote
git mv zag zog
touch zog/f
git add zog/f
git commit -m 'For great justice'
git push
#>> Writing objects: 100% (9/9), 5.00 MiB | 24.63 MiB/s, done.
#^ It re-uploaded the 5x 1 MiB blobs
# even though remote already had them.
```
Note that the latter doesn't happen if I use `git -c pack.useSparse=false push`.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [BUG] `git push` sends unnecessary objects
2023-09-13 22:59 [BUG] `git push` sends unnecessary objects Javier Mora
@ 2023-09-17 13:21 ` Bagas Sanjaya
2023-11-25 14:54 ` Javier Mora
0 siblings, 1 reply; 4+ messages in thread
From: Bagas Sanjaya @ 2023-09-17 13:21 UTC (permalink / raw)
To: Javier Mora, Git Mailing List; +Cc: Derrick Stolee, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 2863 bytes --]
On Wed, Sep 13, 2023 at 11:59:35PM +0100, Javier Mora wrote:
> I came across this issue accidentally when trying to move a directory
> containing a very large file, and deleting another file in that
> directory while I was at it.
> It seems to be caused by `pack.useSparse=true` being the default since
> v2.27 (which I found out after spending quite a while manually
> bisecting and compiling git since I noticed that this didn't happen in
> v2.25; commit de3a864 introduces this regression).
>
> * Expected:
> Pushing a commit that moves a file without modifying it shouldn't
> require sending a blob object for that file, since the remote server
> already has that blob object.
> * Observed:
> Pushing a commit that moves a directory containing a file and also
> adds/deletes other files in that directory will for some reason also
> send blobs for all the files in that directory, even the ones that
> were already in the remote.
> * Consequences:
> This has a very big impact in push times for very small commits
> that just move around files, if those files are very big (I had this
> happen with a >100MB file over a problematic connection... yikes!)
> * Note:
> The commit introducing the regression does warn about possible
> scenarios involving a special arrangement of exact copies across
> directories, but these are not "copies", I just moved a file, which
> seems like a rather common operation.
>
> Code snippet for reproduction:
> ```
> mkdir TEST_git
> cd TEST_git
>
> mkdir -p local remote/origin.git
> cd remote/origin.git
> git init --bare
> cd ../../local
> git init
> git remote add origin file://"${PWD%/*}"/remote/origin.git
>
> mkdir zig
> for i in a b c d e; do
> dd if=/dev/urandom of=zig/"$i" bs=1M count=1
> done
> git add .
> git commit -m 'Add big files'
> git push -u origin master
> #>> Writing objects: 100% (8/8), 5.00 MiB | 13.27 MiB/s, done.
> #^ makes sense: 1 commit + 2 trees (/ and /zig) + 5 files = 8;
> # 5 MiB in total for the 5x 1 MiB binary files
>
> git mv zig zag
> git commit -m 'Move zig'
> git push
> #>> Writing objects: 100% (2/2), 233 bytes | 233.00 KiB/s, done.
> #^ makes sense: 1 commit + 1 tree (/ renames /zig to /zag) = 2;
> # a,b,c,d,e objects already in remote
>
> git mv zag zog
> touch zog/f
> git add zog/f
> git commit -m 'For great justice'
> git push
> #>> Writing objects: 100% (9/9), 5.00 MiB | 24.63 MiB/s, done.
> #^ It re-uploaded the 5x 1 MiB blobs
> # even though remote already had them.
> ```
>
> Note that the latter doesn't happen if I use `git -c pack.useSparse=false push`.
I can reproduce this regression on v2.42.0 (self-compiled) on my Debian
testing system.
Cc'ing Derrick and Junio.
Thanks for the report!
--
An old man doll... just what I always wanted! - Clara
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [BUG] `git push` sends unnecessary objects
2023-09-17 13:21 ` Bagas Sanjaya
@ 2023-11-25 14:54 ` Javier Mora
[not found] ` <PH0PR00MB1349BB447657A94A8EA90A78A183A@PH0PR00MB1349.namprd00.prod.outlook.com>
0 siblings, 1 reply; 4+ messages in thread
From: Javier Mora @ 2023-11-25 14:54 UTC (permalink / raw)
To: Bagas Sanjaya; +Cc: Git Mailing List, Derrick Stolee, Junio C Hamano
Apparently if I do that in two commits (one to move the dir, and a
second one add the file), and then push all that after the second
commit, this doesn't happen -- the resulting push will only contain 6
objects (2 commits, 3 trees, and 1 file), and be a few bytes large.
El dom, 17 sept 2023 a las 14:21, Bagas Sanjaya
(<bagasdotme@gmail.com>) escribió:
>
> On Wed, Sep 13, 2023 at 11:59:35PM +0100, Javier Mora wrote:
> > I came across this issue accidentally when trying to move a directory
> > containing a very large file, and deleting another file in that
> > directory while I was at it.
> > It seems to be caused by `pack.useSparse=true` being the default since
> > v2.27 (which I found out after spending quite a while manually
> > bisecting and compiling git since I noticed that this didn't happen in
> > v2.25; commit de3a864 introduces this regression).
> >
> > * Expected:
> > Pushing a commit that moves a file without modifying it shouldn't
> > require sending a blob object for that file, since the remote server
> > already has that blob object.
> > * Observed:
> > Pushing a commit that moves a directory containing a file and also
> > adds/deletes other files in that directory will for some reason also
> > send blobs for all the files in that directory, even the ones that
> > were already in the remote.
> > * Consequences:
> > This has a very big impact in push times for very small commits
> > that just move around files, if those files are very big (I had this
> > happen with a >100MB file over a problematic connection... yikes!)
> > * Note:
> > The commit introducing the regression does warn about possible
> > scenarios involving a special arrangement of exact copies across
> > directories, but these are not "copies", I just moved a file, which
> > seems like a rather common operation.
> >
> > Code snippet for reproduction:
> > ```
> > mkdir TEST_git
> > cd TEST_git
> >
> > mkdir -p local remote/origin.git
> > cd remote/origin.git
> > git init --bare
> > cd ../../local
> > git init
> > git remote add origin file://"${PWD%/*}"/remote/origin.git
> >
> > mkdir zig
> > for i in a b c d e; do
> > dd if=/dev/urandom of=zig/"$i" bs=1M count=1
> > done
> > git add .
> > git commit -m 'Add big files'
> > git push -u origin master
> > #>> Writing objects: 100% (8/8), 5.00 MiB | 13.27 MiB/s, done.
> > #^ makes sense: 1 commit + 2 trees (/ and /zig) + 5 files = 8;
> > # 5 MiB in total for the 5x 1 MiB binary files
> >
> > git mv zig zag
> > git commit -m 'Move zig'
> > git push
> > #>> Writing objects: 100% (2/2), 233 bytes | 233.00 KiB/s, done.
> > #^ makes sense: 1 commit + 1 tree (/ renames /zig to /zag) = 2;
> > # a,b,c,d,e objects already in remote
> >
> > git mv zag zog
> > touch zog/f
> > git add zog/f
> > git commit -m 'For great justice'
> > git push
> > #>> Writing objects: 100% (9/9), 5.00 MiB | 24.63 MiB/s, done.
> > #^ It re-uploaded the 5x 1 MiB blobs
> > # even though remote already had them.
> > ```
> >
> > Note that the latter doesn't happen if I use `git -c pack.useSparse=false push`.
>
> I can reproduce this regression on v2.42.0 (self-compiled) on my Debian
> testing system.
>
> Cc'ing Derrick and Junio.
>
> Thanks for the report!
>
> --
> An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-11-30 13:33 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-13 22:59 [BUG] `git push` sends unnecessary objects Javier Mora
2023-09-17 13:21 ` Bagas Sanjaya
2023-11-25 14:54 ` Javier Mora
[not found] ` <PH0PR00MB1349BB447657A94A8EA90A78A183A@PH0PR00MB1349.namprd00.prod.outlook.com>
2023-11-30 13:33 ` [EXTERNAL] " Javier Mora
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).