* Are binary xdeltas only used if you use git-gc?
@ 2008-10-31 9:43 Thanassis Tsiodras
2008-10-31 11:02 ` Pierre Habouzit
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Thanassis Tsiodras @ 2008-10-31 9:43 UTC (permalink / raw)
To: git
Hi everyone.
I've been using Git for the last couple of months and am quite happy with it.
In one of my Git repositories, I am storing uncompressed .tar files
(since being uncompressed allows git to detect and store
only their "real" differences).
However, when I introduce a new filename in the repos (with a minor
set of differences compared to an existing file with a different filename)
I've been unsuccessful in finding a way to tell Git to do it efficiently...
This is what I mean:
bash$ mkdir -p /var/tmp/tst
bash$ cd /var/tmp/tst
bash$ git init
bash$ cp /var/www/renderer-2.0e.tar .
bash$ git add renderer-2.0e.tar
bash$ git commit -m "First version"
bash$ du -s -k .git/
1724 .git/
bash$ cp renderer-2.0e.tar renderer-2.0f.tar
bash$ git add renderer-2.0f.tar
bash$ git commit -m "To add new version, first copy the first, so Git detects it"
bash$ du -s -k .git/
1740 .git/
bash$ echo Good, Git detected it is the same
bash$ cp /var/www/renderer-2.0f.tar .
bash$ git add renderer-2.0f.tar
bash$ git commit -m "Real new version, slightly different to first"
bash$ du -s -k .git/
3344 .git/
bash$ echo What... did I do something wrong
bash$ xdelta delta renderer-2.0e.tar renderer-2.0f.tar delta
bash$ ls -l
total 7788
-rw-r--r-- 1 ttsiod ttsiod 8181 2008-10-31 11:27 delta
-rw-r--r-- 1 ttsiod ttsiod 3962880 2008-10-31 11:23 renderer-2.0e.tar
-rw-r--r-- 1 ttsiod ttsiod 3993600 2008-10-31 11:25 renderer-2.0f.tar
bash$ git-gc
bash$ du -s -k .git/
1660 .git/
So even though the xdelta is just 8KB, and git-gc does find out that
the new file is very similar to the old one, the initial commit of the
new version in the repos does not take advantage of that similarity.
I found out about this when I tried to "git push" over a PSTN modem...
Then again, I must confess I only did the git-gc after I pushed.
Does the git-push actually take advantage of the similarities only if
I do a git-gc first?
If that is the case, I will create an alias to always git-gc after commits...
--
What I gave, I have; what I spent, I had; what I kept, I lost. -Old Epitaph
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 9:43 Are binary xdeltas only used if you use git-gc? Thanassis Tsiodras
@ 2008-10-31 11:02 ` Pierre Habouzit
2008-10-31 11:16 ` Thanassis Tsiodras
2008-10-31 19:31 ` Nicolas Pitre
2008-10-31 11:15 ` Jakub Narebski
2008-10-31 12:42 ` Matthieu Moy
2 siblings, 2 replies; 23+ messages in thread
From: Pierre Habouzit @ 2008-10-31 11:02 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 631 bytes --]
On Fri, Oct 31, 2008 at 09:43:43AM +0000, Thanassis Tsiodras wrote:
> So even though the xdelta is just 8KB, and git-gc actually finds out
> that indeed the new file is very similar to the old one, the initial
> commit of the new version in the repos is not taking advantage.

Have you tried to git repack with aggressive options, like:

    git repack --window=500 --depth=500 \
        --window-memory=<fair amount of your physical RAM>
--
·O· Pierre Habouzit
··O madcoder@debian.org
OOO http://www.madism.org
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 11:02 ` Pierre Habouzit
@ 2008-10-31 11:16 ` Thanassis Tsiodras
2008-10-31 19:47 ` Nicolas Pitre
2008-10-31 19:31 ` Nicolas Pitre
1 sibling, 1 reply; 23+ messages in thread
From: Thanassis Tsiodras @ 2008-10-31 11:16 UTC (permalink / raw)
To: Pierre Habouzit; +Cc: git

Actually, after using git-gc, git-repack isn't really needed...
git-gc identifies that the two files are very similar and re-deltifies
(see the du -s -k outputs in the original mail, after git-gc we have
in fact lower usage than the first commit).

My question is basically...
(a) why doesn't git detect this during commit and needs a git-gc
(b) whether after git-gc I would have seen the massive difference
during a subsequent git-push or not

Thanassis.

> Have you tried to git repack with aggressive options, like:
>
> git repack --window=500 --depth=500 \
> --window-memory=<fair amount of your physical RAM>
--
What I gave, I have; what I spent, I had; what I kept, I lost. -Old Epitaph
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 11:16 ` Thanassis Tsiodras
@ 2008-10-31 19:47 ` Nicolas Pitre
0 siblings, 0 replies; 23+ messages in thread
From: Nicolas Pitre @ 2008-10-31 19:47 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: Pierre Habouzit, git

On Fri, 31 Oct 2008, Thanassis Tsiodras wrote:
> Actually, after using git-gc, git-repack isn't really needed...
> git-gc identifies that the two files are very similar and re-deltifies
> (see the du -s -k outputs in the original mail, after git-gc we have
> in fact lower usage than the first commit).

git-gc does call git-repack already. In fact, git-gc is only a
convenience wrapper for a couple of maintenance commands.

> My question is basically...
> (a) why doesn't git detect this during commit and needs a git-gc

Because we want commit operations to be fast. One of many usage
scenarios for git is to apply a large number of patches in one go,
meaning many commits per second. The gc operation is potentially long,
can be done infrequently, and can be deferred to any time when you
don't have to wait for it.

> (b) whether after git-gc I would have seen the massive difference
> during a subsequent git-push or not

No. A push does create a pack with the best size reduction already,
whether or not your local or remote repositories are already packed.
The only advantage of having your local repository packed is in the
time required to create that same pack to be pushed.

As mentioned already, you should consider the --thin switch if you are
pushing over a slow link. If the remote repository already has the
necessary objects then all the pushed pack will contain is deltas.

The reason why --thin is not activated by default is that most people
do pulls from a central server and only a few people do pushes to such
a server, and while thin packs do reduce the transmission size they do
create slightly bigger packs on the receiving end, which is best
avoided on a busy server.

Nicolas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 11:02 ` Pierre Habouzit
2008-10-31 11:16 ` Thanassis Tsiodras
@ 2008-10-31 19:31 ` Nicolas Pitre
1 sibling, 0 replies; 23+ messages in thread
From: Nicolas Pitre @ 2008-10-31 19:31 UTC (permalink / raw)
To: Pierre Habouzit; +Cc: Thanassis Tsiodras, git

On Fri, 31 Oct 2008, Pierre Habouzit wrote:
> On Fri, Oct 31, 2008 at 09:43:43AM +0000, Thanassis Tsiodras wrote:
> > So even though the xdelta is just 8KB, and git-gc actually finds out
> > that indeed the new file is very similar to the old one, the initial
> > commit of the new version in the repos is not taking advantage.
>
> Have you tried to git repack with aggressive options, like:
>
> git repack --window=500 --depth=500 \
> --window-memory=<fair amount of your physical RAM>

That wouldn't bring any benefit in this case.

Nicolas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 9:43 Are binary xdeltas only used if you use git-gc? Thanassis Tsiodras
2008-10-31 11:02 ` Pierre Habouzit
@ 2008-10-31 11:15 ` Jakub Narebski
2008-10-31 11:28 ` Thanassis Tsiodras
2008-10-31 17:03 ` Jean-Luc Herren
2008-10-31 12:42 ` Matthieu Moy
2 siblings, 2 replies; 23+ messages in thread
From: Jakub Narebski @ 2008-10-31 11:15 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: git

"Thanassis Tsiodras" <ttsiodras@gmail.com> writes:
> I've been usig Git for the last couple of months and am quite happy with it.
> In one of my Git repositories, I am storing uncompressed .tar files
> (since being uncompressed allows git to detect and store
> only their "real"differences).

I think you can use clean / smudge filter in gitattributes for that.

[...]
> Then again, I must confess I only did the git-gc after I pushed.
> Does the git-push actually take advantage of the similarities only if
> I do a git-gc first?

Git does deltification _only_ in packfiles. But when you push via SSH
git would generate a pack file with commits the other side doesn't
have, and those packs are thin packs, so they also have deltas... but
the remote side then adds bases to those thin packs making them
standalone: you would have to git-gc on remote.

HTH
--
Jakub Narebski
Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 23+ messages in thread
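A minimal sketch of the clean / smudge idea suggested above, not something anyone in the thread tested. It assumes the working tree holds a gzip-compressed tarball while the repository stores the uncompressed bytes, so that packing can later delta successive versions. The filter name `targz`, the file names, and the `seq`-generated payload are hypothetical stand-ins.

```shell
#!/bin/sh
# Sketch only: store the *uncompressed* content of *.tar.gz files in git.
set -e
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q

# "targz" is an arbitrary filter name chosen for this example.
echo '*.tar.gz filter=targz' > .gitattributes
git config filter.targz.clean  "gzip -dc"   # working tree -> repository: decompress
git config filter.targz.smudge "gzip -nc"   # repository -> working tree: recompress

seq 50000 > payload               # stand-in for a real tarball's content
gzip -nc payload > renderer-2.0e.tar.gz
git add .gitattributes renderer-2.0e.tar.gz

# The staged blob should have the size of the uncompressed payload.
stored=$(git cat-file -s :renderer-2.0e.tar.gz)
plain=$(wc -c < payload | tr -d ' ')
echo "stored blob: $stored bytes, uncompressed payload: $plain bytes"
```

Since gzip output can differ between versions and settings, a checked-out file may not be byte-identical to the one originally added; that matters if checksums of the .tar.gz are published elsewhere.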
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 11:15 ` Jakub Narebski
@ 2008-10-31 11:28 ` Thanassis Tsiodras
2008-10-31 16:26 ` Jakub Narebski
2008-10-31 17:03 ` Jean-Luc Herren
1 sibling, 1 reply; 23+ messages in thread
From: Thanassis Tsiodras @ 2008-10-31 11:28 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git

On Fri, Oct 31, 2008 at 1:15 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> I think you can use clean / smudge filter in gitattributes for that.

Thanks, I didn't know about that. Will look into it.

> Git does deltification _only_ in packfiles. But when you push via SSH
> git would generate a pack file with commits the other side doesn't
> have, and those packs are thin packs, so they also have deltas... but
> the remote side then adds bases to those thin packs making them
> standalone: you would have to git-gc on remote.

So I have to git-gc on my side (after the commits), git-gc on the remote,
and then git-push?

What I gave, I have; what I spent, I had; what I kept, I lost. -Old Epitaph
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 11:28 ` Thanassis Tsiodras
@ 2008-10-31 16:26 ` Jakub Narebski
2008-10-31 16:42 ` Matthieu Moy
0 siblings, 1 reply; 23+ messages in thread
From: Jakub Narebski @ 2008-10-31 16:26 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: git

Thanassis Tsiodras wrote:
> On Fri, Oct 31, 2008 at 1:15 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> > Git does deltification _only_ in packfiles. But when you push via SSH
> > git would generate a pack file with commits the other side doesn't
> > have, and those packs are thin packs, so they also have deltas... but
> > the remote side then adds bases to those thin packs making them
> > standalone: you would have to git-gc on remote.
>
> So I have to git-gc on my side (after the commits), git-gc on the remote,
> and then git-push?

Perhaps I haven't made myself clear.

On the local side: git-commit creates loose (compressed, but not
deltified) objects. git-gc packs and deltifies.

On the remote side (for smart protocols, i.e. git and ssh): git
creates _thin_ pack, deltified; on the remote side git either makes
pack thick/self contained by adding base objects (object + deltas), or
explodes pack into loose object (object). You need git-gc on remote
server to fully deltify on remote side. But transfer is fully deltified.

On the remote side (for dumb protocols, i.e. rsync and http): git
finds required packs and transfers them whole. So the situation is
like on local side, but git might transfer more than really needed
because it transfers packs in full.

HTH.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 23+ messages in thread
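The local half of that description (commits create loose objects, git-gc packs them) can be observed with `git count-objects -v`. The following is a small sketch in a throwaway repository; the file name `data`, the `seq` content, and the demo identity passed to `git commit` are assumptions made only for illustration.

```shell
#!/bin/sh
# Sketch: watch objects move from loose storage into a pack.
set -e
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q

seq 100000 > data                 # any reasonably large text file will do
git add data
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
echo "one more line" >> data
git add data
git -c user.name=demo -c user.email=demo@example.com commit -q -m "second"

# After plain commits, objects are loose: "count" is non-zero, "in-pack" is 0.
git count-objects -v
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_before=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

git gc --quiet                    # repack (and deltify) everything

# After gc, the same objects are reported under "in-pack".
git count-objects -v
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
```

Comparing the `size` and `size-pack` fields before and after gives a rough idea of how much the deltification saved, much like the `du -s -k .git/` figures in the original mail.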
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 16:26 ` Jakub Narebski
@ 2008-10-31 16:42 ` Matthieu Moy
2008-10-31 19:53 ` Nicolas Pitre
0 siblings, 1 reply; 23+ messages in thread
From: Matthieu Moy @ 2008-10-31 16:42 UTC (permalink / raw)
To: Jakub Narebski; +Cc: Thanassis Tsiodras, git

Jakub Narebski <jnareb@gmail.com> writes:
> Thanassis Tsiodras wrote:
>
>> So I have to git-gc on my side (after the commits), git-gc on the remote,
>> and then git-push?
>
> Perhaps I haven't made myself clear.
>
> On the local side: git-commit creates loose (compressed, but not
> deltified) objects. git-gc packs and deltifies.
>
> On the remote side (for smart protocols, i.e. git and ssh): git
> creates _thin_ pack, deltified;

I don't understand this point: the OP talks about pushing, so isn't
the pack created on the _local_ machine (and then sent to the remote)?

--
Matthieu
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 16:42 ` Matthieu Moy
@ 2008-10-31 19:53 ` Nicolas Pitre
2008-11-01 11:54 ` Thanassis Tsiodras
0 siblings, 1 reply; 23+ messages in thread
From: Nicolas Pitre @ 2008-10-31 19:53 UTC (permalink / raw)
To: Matthieu Moy; +Cc: Jakub Narebski, Thanassis Tsiodras, git

On Fri, 31 Oct 2008, Matthieu Moy wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
> > Thanassis Tsiodras wrote:
> >
> >> So I have to git-gc on my side (after the commits), git-gc on the remote,
> >> and then git-push?
> >
> > Perhaps I haven't made myself clear.
> >
> > On the local side: git-commit creates loose (compressed, but not
> > deltified) objects. git-gc packs and deltifies.
> >
> > On the remote side (for smart protocols, i.e. git and ssh): git
> > creates _thin_ pack, deltified;
>
> I don't understand this point: the OP talks about pushing, so isn't
> the pack created on the _local_ machine (and then sent to the remote)?

Yes, the pack is created on the fly when pushing, regardless of whether
the repo is already packed locally. The only difference a locally
packed repo makes is a shorter "Compressing objects" phase when
pushing, that's all. The packedness of the remote has no effect at all.

Nicolas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 19:53 ` Nicolas Pitre
@ 2008-11-01 11:54 ` Thanassis Tsiodras
2008-11-01 13:25 ` Nicolas Pitre
0 siblings, 1 reply; 23+ messages in thread
From: Thanassis Tsiodras @ 2008-11-01 11:54 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, Jakub Narebski, git

Thanks to everybody for your help.

I will set up an alias to always use "git push --thin".
For the reverse direction, I don't see a --thin for "git pull".

My understanding is that "git pull" is optimal,
and does what --thin does for push anyway, right?

On 10/31/08, Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 31 Oct 2008, Matthieu Moy wrote:
>
>> Jakub Narebski <jnareb@gmail.com> writes:
>>
>> > Thanassis Tsiodras wrote:
>> >
>> >> So I have to git-gc on my side (after the commits), git-gc on the
>> >> remote,
>> >> and then git-push?
>> >
>> > Perhaps I haven't made myself clear.
>> >
>> > On the local side: git-commit creates loose (compressed, but not
>> > deltified) objects. git-gc packs and deltifies.
>> >
>> > On the remote side (for smart protocols, i.e. git and ssh): git
>> > creates _thin_ pack, deltified;
>>
>> I don't understand this point: the OP talks about pushing, so isn't
>> the pack created on the _local_ machine (and then sent to the remote)?
>
> Yes, the pack is created on the fly when pushing, regardless if the repo
> is already packed or not locally. The only difference a locally packed
> repo provides is a shorter "Compressing objects" phase when pushing
> that's all. The packedness of the remote has no effect at all.
>
>
> Nicolas
>
--
What I gave, I have; what I spent, I had; what I kept, I lost. -Old Epitaph
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-01 11:54 ` Thanassis Tsiodras
@ 2008-11-01 13:25 ` Nicolas Pitre
2008-11-03 20:35 ` Thanassis Tsiodras
0 siblings, 1 reply; 23+ messages in thread
From: Nicolas Pitre @ 2008-11-01 13:25 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: Matthieu Moy, Jakub Narebski, git

On Sat, 1 Nov 2008, Thanassis Tsiodras wrote:
> Thanks to everybody for your help.
>
> I will setup an alias to always use "git push --thin".
> For the reverse direction, I don't see a --thin for "git pull",
>
> My understanding is that "git pull" is optimal,
> and does what --thin does for push anyway, right?

Exactly.

Nicolas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-01 13:25 ` Nicolas Pitre
@ 2008-11-03 20:35 ` Thanassis Tsiodras
2008-11-03 20:52 ` Pieter de Bie
2008-11-03 21:42 ` Nicolas Pitre
0 siblings, 2 replies; 23+ messages in thread
From: Thanassis Tsiodras @ 2008-11-03 20:35 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, Jakub Narebski, git

Despair...

I just tested "git push --thin"...
Doesn't work.

It still sends the complete object, not a tiny pack as it could (should).

But perhaps I now understand why:

I ran git-gc on both the remote end and the working end (before
changing anything, i.e. with both repos being in sync - "git pull" and
"git push" report all OK). I then noticed that on the remote side,
.git/objects/pack had one big pack file, but on the local one I have
two .pack files...!

I proceeded to try (many combinations of params on) git-repack in a
vain attempt to make my local repos also have one single .pack file
(presumably, it should be able to exactly mirror the remote one, since
it has the same objects inside it!). No way...

git-prune and "git-fsck --full --strict --unreachable" report no errors either.

I'm at a loss as to why the two repos have different "pack
representations" of the same objects and why git-gc and git-repack fail
to create a single pack on my working side, but I'm guessing that this
is why "git push --thin" fails to send small xdeltas...

Any help/advice on what to try next would be most welcome...

Thanassis.

On 11/1/08, Nicolas Pitre <nico@cam.org> wrote:
> On Sat, 1 Nov 2008, Thanassis Tsiodras wrote:
>
>> Thanks to everybody for your help.
>>
>> I will setup an alias to always use "git push --thin".
>> For the reverse direction, I don't see a --thin for "git pull",
>>
>> My understanding is that "git pull" is optimal,
>> and does what --thin does for push anyway, right?
>
> Exact.
>
>
> Nicolas
>
--
What I gave, I have; what I spent, I had; what I kept, I lost.
-Old Epitaph
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-03 20:35 ` Thanassis Tsiodras
@ 2008-11-03 20:52 ` Pieter de Bie
2008-11-03 21:42 ` Nicolas Pitre
1 sibling, 0 replies; 23+ messages in thread
From: Pieter de Bie @ 2008-11-03 20:52 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: Nicolas Pitre, Matthieu Moy, Jakub Narebski, git

On 3 nov 2008, at 21:35, Thanassis Tsiodras wrote:
> Any help/advice on what to try next would be most welcome...

Perhaps this is one of the cases that can be handled by the
tellme-more extension?

http://repo.or.cz/w/git.git?a=commitdiff;h=5a9574c0100d287a4f2729dbaa64d057a5ee02e7

- Pieter
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-03 20:35 ` Thanassis Tsiodras
2008-11-03 20:52 ` Pieter de Bie
@ 2008-11-03 21:42 ` Nicolas Pitre
2008-11-03 22:53 ` Thanassis Tsiodras
1 sibling, 1 reply; 23+ messages in thread
From: Nicolas Pitre @ 2008-11-03 21:42 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: Matthieu Moy, Jakub Narebski, git

On Mon, 3 Nov 2008, Thanassis Tsiodras wrote:
> Despair...
>
> I just tested "git push --thin"...
> Doesn't work.
>
> It still sends the complete object, not a tiny pack as it could (should).
>
> But perhaps I now understand why:
>
> I run git-gc on both the remote end and the working end (before
> changing anything,
> i.e. with both repos being in sync - "git pull" and "git push" report all OK).
> I then noticed that on the remote side, .git/objects/pack had one big pack file,
> but on the local one I have two .pack files...!
>
> I proceeded to try (many combinations of params on) git-repack in a vain attempt
> to make my local repos also have one single .pack file (presumably, it
> should be able
> to exactly mirror the remote one, since it has the same objects inside
> it!). No way...

Please stop thinking that your repository layout has anything to do with
what is actually transferred on a push. It has not.

Here's a small test that you can do locally:

    mkdir repo_a
    mkdir repo_b
    cd repo_a
    git init
    seq 1000000 > data
    git add data
    git commit -m "initial commit"
    cd ../repo_b
    git init
    cd ../repo_a
    git push ../repo_b master:master

Here you should see a line that says:

    Writing objects: 100% (3/3), 2.01 MiB, done.

Therefore 2.01 MiB were transferred. Now let's continue:

    echo "foo" >> data
    git add data
    git commit -m "second commit"
    git push ../repo_b master:master

You should get:

    Writing objects: 100% (3/3), 423 bytes, done.

And this means that you don't even need the --thin switch (which is
wrong -- this has been broken before but that's another story) for the
transfer to actually send only the difference and not the whole file
again. And note that neither of those repositories actually contains
any pack, as everything is still loose objects.

> I'm at a loss as to why the two repos are having different "pack
> representation" of the same objects

That's only because those objects entered each repository in a
different way.

> and why git-gc and git-repack fail
> to create a single pack on my working side,

Maybe you have a .keep file in .git/objects/pack/ ? If so delete it and
run 'git repack -a -d'.

> but I'm guessing that this is why "git push --thin" fails to send
> small xdeltas...

Not at all.

Please provide a complete log of your tests and maybe we could find
something.

Nicolas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-03 21:42 ` Nicolas Pitre
@ 2008-11-03 22:53 ` Thanassis Tsiodras
2008-11-04 1:18 ` Nicolas Pitre
0 siblings, 1 reply; 23+ messages in thread
From: Thanassis Tsiodras @ 2008-11-03 22:53 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Matthieu Moy, Jakub Narebski, git

RESOLVED!!!

Finally...
What happened was actually quite reasonable, in hindsight...
As I said in the original mail, this was what I did:

    cp version7.1.tar version7.2.tar
    git add version7.2.tar
    git commit -m "same data as old, so git will use old blob"
    echo MAGICPLACE read below...
    cp /path/to/work/realNewVersion7.2.tar version7.2.tar
    git add version7.2.tar
    git commit -m "and now, commit the really new version, so git can xdelta"
    git push --thin

The problem was solved (that is, the "git push" became optimal)
when I added a "git push" right after the MAGICPLACE mark above...
In that way, the remote repo learned about the "dummy" commit that
referenced the old blob... and when I did the subsequent "git push"
at the end, the remote side could see that it already had this "dummy"
commit to "xdelta on", and that it only needed the delta...

Originally, when I used only one "git push --thin" at the end, the remote
side didn't have the "dummy" commit, so it probably said: "I can't
apply a delta, give me the full object".

Phew.

So it seems that if you must introduce a new file that is
very similar to an existing one (in my case, a new version
of software kept in an uncompressed .tar file),
you have to do what I did above to allow for optimal "git push"es:
that is:

1. Create the new filename by just copying the old
(so the old blob is used)
2. commit
3. PUSH
4. copy the real new file
5. commit
6. PUSH.

If you omit the middle PUSH in step 3, neither "git push" nor "git push --thin"
can realize that this new file can be "incrementally built" on the remote side
(even though git-gc totally squashes it in the pack).

Thanks to all the people who responded, and especially Nicolas... Merci!

On 11/3/08, Nicolas Pitre <nico@cam.org> wrote:
> On Mon, 3 Nov 2008, Thanassis Tsiodras wrote:
>
>> Despair...
>>
>> I just tested "git push --thin"...
>> Doesn't work.
>>
>> It still sends the complete object, not a tiny pack as it could (should).
>>
>> But perhaps I now understand why:
>>
>> I run git-gc on both the remote end and the working end (before
>> changing anything,
>> i.e. with both repos being in sync - "git pull" and "git push" report all
>> OK).
>> I then noticed that on the remote side, .git/objects/pack had one big pack
>> file,
>> but on the local one I have two .pack files...!
>>
>> I proceeded to try (many combinations of params on) git-repack in a vain
>> attempt
>> to make my local repos also have one single .pack file (presumably, it
>> should be able
>> to exactly mirror the remote one, since it has the same objects inside
>> it!). No way...
>
> Please stop thinking that your repository layout has anything to do with
> what is actually transferred on a push. It has not.
>
> Here's a small test that you can do locally:
>
> mkdir repo_a
> mkdir repo_b
> cd repo_a
> git init
> seq 1000000 > data
> git add data
> git commit -m "initial commit"
> cd ../repo_b
> git init
> cd ../repo_a
> git push ../repo_b master:master
>
> Here you should see a line that says:
>
> Writing objects: 100% (3/3), 2.01 MiB, done.
>
> Therefore 2.1 MiB were transferred. Now let's continue:
>
> echo "foo" >> data
> git add data
> git commit -m "second commit"
> git push ../repo_b master:master
>
> You should get:
>
> Writing objects: 100% (3/3), 423 bytes, done.
>
> And this means that you even don't need the --thin switch (which is
> wrong -- this has been broken before but that's another story) for the
> transfer to actually send only the difference and not the whole file
> again. And note that none of those repositoryes actually contain any
> pack as everything is still loose objects.
>
>> I'm at a loss as to why the two repos are having different "pack
>> representation" of the same objects
>
> That's only because those objects entered each repositories in a
> different way.
>
>> and why git-gc and git-repack fail
>> to create a single pack on my working side,
>
> Maybe you have a .keep file in .git/objects/pack/ ? If so delete it and
> run 'git repack -a -d'.
>
>> but I'm guessing that this is why "git push --thin" fails to send
>> small xdeltas...
>
> Not at all.
>
> Please provide a complete log of your tests and maybe we could find
> something.
>
>
> Nicolas
>
--
What I gave, I have; what I spent, I had; what I kept, I lost. -Old Epitaph
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-03 22:53 ` Thanassis Tsiodras
@ 2008-11-04 1:18 ` Nicolas Pitre
2008-11-04 1:36 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Nicolas Pitre @ 2008-11-04 1:18 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: Matthieu Moy, Jakub Narebski, git

On Tue, 4 Nov 2008, Thanassis Tsiodras wrote:
> RESOLVED!!!
>
> Finally...
> What happened was actually quite reasonable, in hindsight...
> As I said in the original mail, this was what I did:
>
> cp version7.1.tar version7.2.tar
> git add version7.2.tar
> git commit -m "same data as old, so git will use old blob"
> echo MAGICPLACE read below...
> cp /path/to/work/realNewVersion7.2.tar version7.2.tar
> git add version7.2.tar
> git commit -m "and now, commit the really new version, so git can xdelta"
> git push --thin
>
> The problem was solved (that is, the "git push" became optimal,
> when I added a "git push" right after the MAGICPLACE mark above...
> In that way, the remote repo learned about the "dummy" commit that
> referenced the old blob... and when I did the subsequent "git push"
> at the end, the remote side could see that it already had this "dummy"
> commit to "xdelta on", and that it only needed the delta...
>
> Originally, when I used only one "git push --thin" at the end, the remote
> side didn't have the "dummy" commit, so it probably said: "I can't
> apply a delta, give me the full object".

Oh! But of course...

In fact, the way thin packs work is to store deltas against a base
object which is not included in the pack. The objects which are not
included but are used as delta bases are currently only the previous
versions of files which are part of the update to be pushed/fetched.
In other words, there must be a previous version under the same name
for this to work. Doing otherwise wouldn't scale if the previous commit
had thousands of files to test against. But this particularity had
escaped my mind somehow.

> Phew.
>
> So it seems that if you must introduce a new file that is
> very similar to an existing one (in my case, a new version
> of software kept in an uncompressed .tar file),
> you have to do what I did above to allow for optimal "git push"es:
> that is:
>
> 1. Create the new filename by just copying the old
> (so the old blob is used)
> 2. commit
> 3. PUSH
> 4. copy the real new file
> 5. commit
> 6. PUSH.
>
> If you omit the middle PUSH in step 3, neither "git push", nor "git push --thin"
> can realize that this new file can be "incrementally built" on the remote side
> (even though git-gc totally squashes it in the pack).

Right. Those thin packs were designed with different versions of the
same file in mind, not different files with almost the same content.
This could possibly be improved at some point...

Nicolas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-04 1:18 ` Nicolas Pitre
@ 2008-11-04 1:36 ` Junio C Hamano
2008-11-04 1:57 ` Nicolas Pitre
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2008-11-04 1:36 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Thanassis Tsiodras, Matthieu Moy, Jakub Narebski, git

Nicolas Pitre <nico@cam.org> writes:
> Right. Those thin packs were designed for different versions of the
> same file in mind, not different files with almost the same content.
> This could possibly be improved at some point...

Wouldn't using a large --window help by going across name-hash boundaries?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-04 1:36 ` Junio C Hamano
@ 2008-11-04 1:57 ` Nicolas Pitre
2008-11-04 3:17 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Nicolas Pitre @ 2008-11-04 1:57 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thanassis Tsiodras, Matthieu Moy, Jakub Narebski, git

On Mon, 3 Nov 2008, Junio C Hamano wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> > Right. Those thin packs were designed for different versions of the
> > same file in mind, not different files with almost the same content.
> > This could possibly be improved at some point...
>
> Wouldn't using a large --window help by going across name-hash boundaries?

The issue is to decide what preferred delta base to add to the list of
objects. Currently only objects with the same path as those being
modified are considered.

Nicolas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-11-04 1:57 ` Nicolas Pitre
@ 2008-11-04 3:17 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2008-11-04 3:17 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Thanassis Tsiodras, Matthieu Moy, Jakub Narebski, git

Nicolas Pitre <nico@cam.org> writes:
> The issue is to decide what preferred delta base to add to the list of
> objects. Currently only objects with the same path as those being
> modified are considered.

Ah, I was blind (even though that part is my code).

Thanks.
^ permalink raw reply [flat|nested] 23+ messages in thread
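The six-step workaround from this sub-thread can be sketched end to end against a local bare repository. This is only an illustration under assumptions: the `seq` output stands in for the real tarballs, `remote.git`, `work`, and the helper function `commit` are hypothetical names, and the delta-base behaviour is the one reported in this thread for git as of 2008; other versions may pick bases differently.

```shell
#!/bin/sh
# Sketch: push a same-content copy under the new name first, then the real file.
set -e
top=$(mktemp -d)                          # scratch area for both repositories
git init -q --bare "$top/remote.git"      # stands in for the remote server
git init -q "$top/work"
cd "$top/work"
git remote add origin "$top/remote.git"

# Hypothetical helper: commit with a throwaway identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

# Existing state: the old version is already on the remote.
seq 200000 > renderer-2.0e.tar            # stand-in for the real old tarball
git add renderer-2.0e.tar
commit "First version"
git push -q origin HEAD

# Steps 1-3: create the new name from the old content, commit, push.
cp renderer-2.0e.tar renderer-2.0f.tar
git add renderer-2.0f.tar
commit "Copy old version under the new name"
git push origin HEAD

# Steps 4-6: bring in the really new content, commit, push.
echo "a small change" >> renderer-2.0f.tar   # stand-in for the real new tarball
git add renderer-2.0f.tar
commit "Real new version"
git push origin HEAD
```

When run in a terminal, comparing the "Writing objects" line of the last push with and without the intermediate push shows whether the extra step makes a difference on a given git version.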
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 11:15 ` Jakub Narebski
2008-10-31 11:28 ` Thanassis Tsiodras
@ 2008-10-31 17:03 ` Jean-Luc Herren
1 sibling, 0 replies; 23+ messages in thread
From: Jean-Luc Herren @ 2008-10-31 17:03 UTC (permalink / raw)
To: Jakub Narebski, Thanassis Tsiodras, git

Jakub Narebski wrote:
> "Thanassis Tsiodras" <ttsiodras@gmail.com> writes:
>> Then again, I must confess I only did the git-gc after I pushed.
>> Does the git-push actually take advantage of the similarities only if
>> I do a git-gc first?
>
> Git does deltification _only_ in packfiles. But when you push via SSH
> git would generate a pack file with commits the other side doesn't
> have, and those packs are thin packs, so they also have deltas...

AFAICT, git stopped pushing thin packs by default with 1.5.3.2, so you
have to explicitly ask for it. The original poster might not be clear
about this (or even know what a thin pack is). Thanassis, try to use
"git push --thin". 'man git-push' says:

    --thin, --no-thin
        These options are passed to git-send-pack. Thin transfer
        spends extra cycles to minimize the number of objects to be
        sent and meant to be used on slower connection.

I did a quick test with big random files and it indeed only sends small
deltas on small changes, but if you don't pass --thin, it will send the
full objects.

I didn't find a configuration variable to change that default. It would
make sense for people that regularly push over slow lines.

Hope this helps,
jlh
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 9:43 Are binary xdeltas only used if you use git-gc? Thanassis Tsiodras
2008-10-31 11:02 ` Pierre Habouzit
2008-10-31 11:15 ` Jakub Narebski
@ 2008-10-31 12:42 ` Matthieu Moy
2008-10-31 14:22 ` Thanassis Tsiodras
2 siblings, 1 reply; 23+ messages in thread
From: Matthieu Moy @ 2008-10-31 12:42 UTC (permalink / raw)
To: Thanassis Tsiodras; +Cc: git
"Thanassis Tsiodras" <ttsiodras@gmail.com> writes:
> If that is the case, I will create an alias to always git-gc after commits...
If you have a decent version of git, it already does "git gc --auto"
regularly. With --auto, git gc will do nothing if you don't have too
many unpacked objects, and will try to do the right thing otherwise
(incremental packs, see man git gc).
The idea is that "git gc" is a costly operation, and git prefers to
waste a bit of disk space to make most "commit" really fast, and to
take time to optimize the repository only when it grew too much.
If you're worried about repository size and you have a permanently
running machine, a good idea is to run git gc in a cron job, so that
you work fast in daytime, and your computer optimizes hard at night
time ;-) (I have git gc + git fsck in a cron job, so I'll also know if
a repository gets corrupted).
--
Matthieu
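The trade-off Matthieu describes (loose objects after each commit, packing deferred until later) can be watched with `git count-objects -v`. The following is a minimal sketch in a throwaway repository. The file name and contents are made up, and `gc.auto` is set to 0 here only so that automatic packing cannot interfere with the measurement.

```shell
#!/bin/sh
# Sketch: count loose objects before and after an explicit "git gc".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity configured only for this throwaway repository.
git config user.name "Example"
git config user.email "example@example.invalid"
# Disable automatic gc so the loose objects stay visible until we pack.
git config gc.auto 0

seq 1 20000 > data.txt
git add data.txt
git commit -q -m "first version"
echo "one more line" >> data.txt
git commit -q -a -m "second version"

# "count" is the number of loose objects, "packs" the number of packfiles.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "packs after gc:          $packs_after"
```

A nightly job in the spirit of his suggestion might be a crontab entry such as `0 3 * * * cd /path/to/repo && git gc --quiet && git fsck`, where both the schedule and the path are placeholders.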
* Re: Are binary xdeltas only used if you use git-gc?
2008-10-31 12:42 ` Matthieu Moy
@ 2008-10-31 14:22 ` Thanassis Tsiodras
0 siblings, 0 replies; 23+ messages in thread
From: Thanassis Tsiodras @ 2008-10-31 14:22 UTC (permalink / raw)
To: Matthieu Moy; +Cc: git
Actually, I am not so worried about disk size - I am far more worried about
how long it takes to git-push over my PSTN modem connection.
I'll try Jakub's suggestion (git-gc on both my machine and the remote machine
hosting the repos) and report back.
On Fri, Oct 31, 2008 at 2:42 PM, Matthieu Moy <Matthieu.Moy@imag.fr> wrote:
> If you're worried about repository size and you have a permanently
> running machine, a good idea is to run git gc in a cron job, so that
> you work fast in daytime, and your computer optimizes hard at night
> time ;-) (I have git gc + git fsck in a cron job, so I'll also know if
> a repository gets corrupted).
>
> --
> Matthieu
>
--
What I gave, I have; what I spent, I had; what I kept, I lost.
-Old Epitaph