* git pull is slow
@ 2008-07-10 14:40 Stephan Hennig
2008-07-10 15:13 ` Martin Langhoff
` (4 more replies)
0 siblings, 5 replies; 51+ messages in thread
From: Stephan Hennig @ 2008-07-10 14:40 UTC (permalink / raw)
To: git
Hi,
I am observing very large data transfers when pulling from the
repository at <URL:http://repo.or.cz/w/wortliste.git>. This repository
contains one 13 MB text file that compresses to roughly 3 MB.
While I'd expect pulling commits that change only a few lines of the
large text file to result in a download of less than, say, 10 kB, git pull
seems to transfer the complete, compressed file. I have observed this
several times for different commits. On the other hand, pushing my own
commits to the repository is fast (with the git+ssh access method). Any
ideas what's going on and how to make pulling faster?
Best regards,
Stephan Hennig
> $ git version
> git version 1.5.6.1.1071.g76fb
* Re: git pull is slow
2008-07-10 14:40 git pull is slow Stephan Hennig
@ 2008-07-10 15:13 ` Martin Langhoff
2008-07-10 15:28 ` Petr Baudis
` (3 subsequent siblings)
4 siblings, 0 replies; 51+ messages in thread
From: Martin Langhoff @ 2008-07-10 15:13 UTC (permalink / raw)
To: Stephan Hennig; +Cc: git
On Thu, Jul 10, 2008 at 11:40 AM, Stephan Hennig <mailing_list@arcor.de> wrote:
> commits to the repository is fast (with the git+ssh access method). Any
> ideas what's going on and how to make pulling faster?
When pushing you can use --thin to get a "thin pack" transfer, where
the delta base is not transferred. I'm not sure how you request this from
git fetch these days, though :-/ Search the list for "thin packs",
perhaps?
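For illustration, assuming a remote named "origin" and a master branch:
a thin-pack push can be requested explicitly (it is the default for
pushes over the native protocols anyway):
-- snip --
$ git push --thin origin master
-- snap --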
cheers,
m
--
martin.langhoff@gmail.com
martin@laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff
* Re: git pull is slow
2008-07-10 14:40 git pull is slow Stephan Hennig
2008-07-10 15:13 ` Martin Langhoff
@ 2008-07-10 15:28 ` Petr Baudis
2008-07-10 15:30 ` Johannes Sixt
` (2 subsequent siblings)
4 siblings, 0 replies; 51+ messages in thread
From: Petr Baudis @ 2008-07-10 15:28 UTC (permalink / raw)
To: Stephan Hennig; +Cc: git
Hi,
On Thu, Jul 10, 2008 at 04:40:17PM +0200, Stephan Hennig wrote:
> I am observing very large data transfers when pulling from the
> repository at <URL:http://repo.or.cz/w/wortliste.git>. This repository
> contains one 13 MB text file that compresses to roughly 3 MB.
>
> While I'd expect pulling commits that change only a few lines of the
> large text file to result in a download of less than, say, 10 kB, git pull
> seems to transfer the complete, compressed file. I have observed this
> several times for different commits. On the other hand, pushing my own
> commits to the repository is fast (with the git+ssh access method). Any
> ideas what's going on and how to make pulling faster?
do you use HTTP or the native Git protocol for pulling? If HTTP, you have
to live with this, sorry - the automatic repacks create a new pack
every time and you have to redownload the whole history; I tried to
avoid this problem, but in the end I had to bow to the aggressive
repacking strategy because the number of packs was getting out of hand.
It is technically possible to implement some alternative, more
HTTP-friendly packing strategy, but this has never left the idea
stage. If you want to implement something, repo.or.cz will gladly become
a user. :-)
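To sketch the effect (a hypothetical server-side session, not
repo.or.cz's actual setup):
-- snip --
$ git repack -a -d   # collapses all existing packs into one new pack;
                     # the old pack files (and their names) disappear, so
                     # a dumb-HTTP client re-downloads the new pack whole
-- snap --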
--
Petr "Pasky" Baudis
The last good thing written in C++ was the Pachelbel Canon. -- J. Olson
* Re: git pull is slow
2008-07-10 14:40 git pull is slow Stephan Hennig
2008-07-10 15:13 ` Martin Langhoff
2008-07-10 15:28 ` Petr Baudis
@ 2008-07-10 15:30 ` Johannes Sixt
2008-07-10 15:45 ` Stephan Hennig
2008-07-11 12:25 ` Stephan Hennig
2008-07-11 12:55 ` Stephan Hennig
4 siblings, 1 reply; 51+ messages in thread
From: Johannes Sixt @ 2008-07-10 15:30 UTC (permalink / raw)
To: Stephan Hennig; +Cc: git
Stephan Hennig wrote:
> I am observing very large data transfers when pulling from the
> repository at <URL:http://repo.or.cz/w/wortliste.git>. This repository
> contains one 13 MB text file that compresses to roughly 3 MB.
>
> While I'd expect pulling commits that change only a few lines of the
> large text file to result in a download of less than, say, 10 kB, git pull
> seems to transfer the complete, compressed file. I have observed this
> several times for different commits. On the other hand, pushing my own
> commits to the repository is fast (with the git+ssh access method). Any
> ideas what's going on and how to make pulling faster?
Do you by any chance use an http URL to pull? Don't do that. Use the git protocol.
-- Hannes
* Re: git pull is slow
2008-07-10 15:30 ` Johannes Sixt
@ 2008-07-10 15:45 ` Stephan Hennig
2008-07-10 15:50 ` Petr Baudis
0 siblings, 1 reply; 51+ messages in thread
From: Stephan Hennig @ 2008-07-10 15:45 UTC (permalink / raw)
To: git; +Cc: Johannes Sixt
Johannes Sixt wrote:
> Stephan Hennig wrote:
>> I am observing very large data transfers when pulling from the
>> repository at <URL:http://repo.or.cz/w/wortliste.git>. This
>> repository contains one 13 MB text file that compresses to roughly
>> 3 MB.
>
> Do you by any chance use an http URL to pull? Don't do that. Use the
> git protocol.
No, I'm already using git+ssh.
Best regards,
Stephan Hennig
* Re: git pull is slow
2008-07-10 15:45 ` Stephan Hennig
@ 2008-07-10 15:50 ` Petr Baudis
2008-07-10 17:44 ` Stephan Hennig
0 siblings, 1 reply; 51+ messages in thread
From: Petr Baudis @ 2008-07-10 15:50 UTC (permalink / raw)
To: Stephan Hennig; +Cc: git, Johannes Sixt
On Thu, Jul 10, 2008 at 05:45:14PM +0200, Stephan Hennig wrote:
> Johannes Sixt wrote:
> > Stephan Hennig wrote:
> >> I am observing very large data transfers when pulling from the
> >> repository at <URL:http://repo.or.cz/w/wortliste.git>. This
> >> repository contains one 13 MB text file that compresses to roughly
> >> 3 MB.
> >
> > Do you by any chance use an http URL to pull? Don't do that. Use the
> > git protocol.
>
> No, I'm already using git+ssh.
Oh, ok. By the way, how long have you been hitting this issue? Just today
I upgraded the chroot Git from some anonymous 2007-12-08 version
to the almost-latest #next.
--
Petr "Pasky" Baudis
The last good thing written in C++ was the Pachelbel Canon. -- J. Olson
* Re: git pull is slow
2008-07-10 15:50 ` Petr Baudis
@ 2008-07-10 17:44 ` Stephan Hennig
0 siblings, 0 replies; 51+ messages in thread
From: Stephan Hennig @ 2008-07-10 17:44 UTC (permalink / raw)
To: git; +Cc: Petr Baudis, Johannes Sixt
Petr Baudis wrote:
> On Thu, Jul 10, 2008 at 05:45:14PM +0200, Stephan Hennig wrote:
>>
>> No, I'm already using git+ssh.
>
> Oh, ok. By the way, how long have you been hitting this issue? Just today
> I upgraded the chroot Git from some anonymous 2007-12-08 version
> to the almost-latest #next.
I don't know for sure. But I think I've had the issue since I started
accessing that repository in October 2007 (then with WinGit-0.2-alpha).
Best regards,
Stephan Hennig
PS: Attached is my .git/config. Could the repositoryformatversion line
trigger some sort of dumb transfer protocol?
> [core]
> repositoryformatversion = 0
> filemode = false
> bare = false
> logallrefupdates = true
> symlinks = false
> autocrlf = false
> [remote "origin"]
> url = git+ssh://xxx@repo.or.cz/srv/git/wortliste.git
> fetch = +refs/heads/*:refs/remotes/origin/*
> [branch "master"]
> remote = origin
> merge = refs/heads/master
* Re: git pull is slow
2008-07-10 14:40 git pull is slow Stephan Hennig
` (2 preceding siblings ...)
2008-07-10 15:30 ` Johannes Sixt
@ 2008-07-11 12:25 ` Stephan Hennig
2008-07-11 13:34 ` Andreas Ericsson
2008-07-11 12:55 ` Stephan Hennig
4 siblings, 1 reply; 51+ messages in thread
From: Stephan Hennig @ 2008-07-11 12:25 UTC (permalink / raw)
To: git
Stephan Hennig wrote:
> I am observing very large data transfers when pulling from the
> repository at <URL:http://repo.or.cz/w/wortliste.git>.
Here's the output of a recent pull:
> unknown@COMMODORE ~/Themen/trennmuster/repository/wortliste (master)
> $ git pull
> Enter passphrase for key '/x/home/.ssh/id_rsa':
> remote: Counting objects: 15, done.←[K
> remote: Compressing objects: 100% (7/7), done.←[K) ←[Kts: 8% (1/12)
> remote: Total 12 (delta 5), reused 12 (delta 5)←[K
> Unpacking objects: 100% (12/12), done.
> From git+ssh://xxx@repo.or.cz/srv/git/wortliste
> d905095..d0c6a33 master -> origin/master
> * [new tag] dehyph-exptl-v0.13 -> dehyph-exptl-v0.13
> Updating d905095..d0c6a33
> Fast forward
> wortliste←[m | 19 ←[32m+++++++++++←[m←[31m--------←[m
> 1 files changed, 11 insertions(+), 8 deletions(-)←[m
After the line containing "remote: Compressing objects:" had been
printed, several MB were transferred.
Does it matter that the original clone was performed with the plain git
protocol? I only later changed the url in .git/config to use git+ssh.
Best regards,
Stephan Hennig
* Re: git pull is slow
2008-07-10 14:40 git pull is slow Stephan Hennig
` (3 preceding siblings ...)
2008-07-11 12:25 ` Stephan Hennig
@ 2008-07-11 12:55 ` Stephan Hennig
4 siblings, 0 replies; 51+ messages in thread
From: Stephan Hennig @ 2008-07-11 12:55 UTC (permalink / raw)
To: git
Stephan Hennig wrote:
> Does it matter that the original clone was performed with the plain git
> protocol? I only later changed the url in .git/config to use git+ssh.
To be precise, the original pull was done via
url = git://repo.or.cz/wortliste.git
which is a read-only URL. I later changed .git/config manually to point to
url = git+ssh://xxx@repo.or.cz/srv/git/wortliste.git
which is the push URL.
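For reference, the same change could have been made with git config
instead of hand-editing:
-- snip --
$ git config remote.origin.url git+ssh://xxx@repo.or.cz/srv/git/wortliste.git
-- snap --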
Best regards,
Stephan Hennig
* Re: git pull is slow
2008-07-11 12:25 ` Stephan Hennig
@ 2008-07-11 13:34 ` Andreas Ericsson
2008-07-11 14:04 ` Johannes Schindelin
0 siblings, 1 reply; 51+ messages in thread
From: Andreas Ericsson @ 2008-07-11 13:34 UTC (permalink / raw)
To: Stephan Hennig; +Cc: git
Stephan Hennig wrote:
> Stephan Hennig wrote:
>
>> I am observing very large data transfers when pulling from the
>> repository at <URL:http://repo.or.cz/w/wortliste.git>.
>
> Here's the output of a recent pull:
>
>> unknown@COMMODORE ~/Themen/trennmuster/repository/wortliste (master)
>> $ git pull
>> Enter passphrase for key '/x/home/.ssh/id_rsa':
>> remote: Counting objects: 15, done.←[K
>> remote: Compressing objects: 100% (7/7), done.←[K) ←[Kts: 8% (1/12)
>> remote: Total 12 (delta 5), reused 12 (delta 5)←[K
>> Unpacking objects: 100% (12/12), done.
>> From git+ssh://xxx@repo.or.cz/srv/git/wortliste
>> d905095..d0c6a33 master -> origin/master
>> * [new tag] dehyph-exptl-v0.13 -> dehyph-exptl-v0.13
>> Updating d905095..d0c6a33
>> Fast forward
>> wortliste←[m | 19 ←[32m+++++++++++←[m←[31m--------←[m
>> 1 files changed, 11 insertions(+), 8 deletions(-)←[m
>
> After the line containing "remote: Compressing objects:" had been
> printed, several MB were transferred.
>
Seems like you're being bitten by a bug we had some months back,
where the client requested full history for new tag objects.
Does this bug still show up if you use the latest git from
master of git.git?
I *think* v1.5.4.3-440-g41fa7d2 fixed the issue, but I'm
not 100% certain, as the commit message doesn't mention anything
about any bugs being solved. OTOH, I vaguely remember the first
bug reporter being told to try 'next', and AFAIR that solved it
for him/her.
> Does it matter that the original clone was performed with the plain git
> protocol? I only later changed the url in .git/config to use git+ssh.
>
No, that doesn't matter in the slightest.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
* Re: git pull is slow
2008-07-11 13:34 ` Andreas Ericsson
@ 2008-07-11 14:04 ` Johannes Schindelin
2008-07-12 12:32 ` Stephan Hennig
0 siblings, 1 reply; 51+ messages in thread
From: Johannes Schindelin @ 2008-07-11 14:04 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: Stephan Hennig, git
Hi,
On Fri, 11 Jul 2008, Andreas Ericsson wrote:
> Stephan Hennig wrote:
> > Stephan Hennig wrote:
> >
> > > I am observing very large data transfers when pulling from the
> > > repository at <URL:http://repo.or.cz/w/wortliste.git>.
> >
> > Here's the output of a recent pull:
> >
> > > unknown@COMMODORE ~/Themen/trennmuster/repository/wortliste (master)
> > > $ git pull
> > > Enter passphrase for key '/x/home/.ssh/id_rsa':
> > > remote: Counting objects: 15, done.←[K
> > > remote: Compressing objects: 100% (7/7), done.←[K) ←[Kts: 8% (1/12)
> > > remote: Total 12 (delta 5), reused 12 (delta 5)←[K
> > > Unpacking objects: 100% (12/12), done.
> > > From git+ssh://xxx@repo.or.cz/srv/git/wortliste
> > > d905095..d0c6a33 master -> origin/master
> > > * [new tag] dehyph-exptl-v0.13 -> dehyph-exptl-v0.13
> > > Updating d905095..d0c6a33
> > > Fast forward
> > > wortliste←[m | 19 ←[32m+++++++++++←[m←[31m--------←[m
> > > 1 files changed, 11 insertions(+), 8 deletions(-)←[m
> >
> > After the line containing "remote: Compressing objects:" had been
> > printed, several MB were transferred.
> >
>
> Seems like you're being bitten by a bug we had some months back,
> where the client requested full history for new tag objects.
I do not think so. I think it is a problem with the pack. The slowness
is already there in the clone, in the resolving phase.
Judging from the output of "verify-pack -v", it seems that the delta
chains are quite long.
I saw memory usage go up pretty quickly. I thought that maybe
eac12e2 (Correct pack memory leak causing git gc to try to exceed ulimit)
would help things, since it fixes a place where memory usage grows out of
bounds, but it did not.
Unfortunately, I do not have time to look into this further.
BTW there is a nasty usability bug where the "Compressing objects:" and
"Receiving objects:" messages overwrite each other on the same line; you
can see the mangled result in the output quoted above.
Ciao,
Dscho
* Re: git pull is slow
2008-07-11 14:04 ` Johannes Schindelin
@ 2008-07-12 12:32 ` Stephan Hennig
2008-07-12 17:05 ` Johannes Schindelin
0 siblings, 1 reply; 51+ messages in thread
From: Stephan Hennig @ 2008-07-12 12:32 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Andreas Ericsson
Hi,
Johannes Schindelin wrote:
> On Fri, 11 Jul 2008, Andreas Ericsson wrote:
>
>> Seems like you're being bitten by a bug we had some months back,
>> where the client requested full history for new tag objects.
>
> I do not think so. I think it is a problem with the pack. The slowness
> is already there in the clone, in the resolving phase.
Thanks for having a look at this! What does "problem with the pack"
mean? Do you think it is a Git problem (client or server side?) or just
a misconfiguration?
Best regards,
Stephan Hennig
* Re: git pull is slow
2008-07-12 12:32 ` Stephan Hennig
@ 2008-07-12 17:05 ` Johannes Schindelin
2008-07-13 1:15 ` Shawn O. Pearce
2008-07-13 9:01 ` git pull is slow Stephan Hennig
0 siblings, 2 replies; 51+ messages in thread
From: Johannes Schindelin @ 2008-07-12 17:05 UTC (permalink / raw)
To: Stephan Hennig, Nicolas Pitre; +Cc: Andreas Ericsson, git
Hi,
On Sat, 12 Jul 2008, Stephan Hennig wrote:
> Johannes Schindelin wrote:
> > On Fri, 11 Jul 2008, Andreas Ericsson wrote:
> >
> >> Seems like you're being bitten by a bug we had some months back,
> >> where the client requested full history for new tag objects.
> >
> > I do not think so. I think it is a problem with the pack. The
> > slowness is already there in the clone, in the resolving phase.
>
> Thanks for having a look at this! What does "problem with the pack"
> mean? Do you think it is a Git problem (client or server side?) or just
> a misconfiguration?
I thought that the blobs in the pack are just too similar. That makes for
good compression, since you get many relatively small deltas. But it
also makes for a lot of work to reconstruct the blobs.
I suspected that you run out of space for the cache holding some
reconstructed blobs (to prevent reconstructing all of them from scratch).
To see what I mean, just look at
$ git -p verify-pack -v \
.git/objects/pack/pack-563c2d83940c7e2d8c20a35206a390e2e567282f.pack
(or whatever pack you have there). It has this:
-- snip --
chain length = 40: 7 objects
chain length = 41: 8 objects
chain length = 42: 4 objects
chain length = 43: 8 objects
chain length = 44: 6 objects
chain length = 45: 2 objects
chain length = 46: 6 objects
chain length = 47: 2 objects
chain length = 48: 2 objects
chain length = 49: 2 objects
chain length = 50: 2 objects
-- snap --
... but that could not be the reason, as my current git.git's pack shows
this:
-- snip --
chain length = 40: 122 objects
chain length = 41: 99 objects
chain length = 42: 77 objects
chain length = 43: 76 objects
chain length = 44: 69 objects
chain length = 45: 72 objects
chain length = 46: 66 objects
chain length = 47: 103 objects
chain length = 48: 77 objects
chain length = 49: 111 objects
chain length = 50: 86 objects
chain length > 50: 60 objects
-- snap --
... which is much worse.
So I tried this:
-- snip --
wortliste$ /usr/bin/time git index-pack -o /dev/null
.git/objects/pack/pack-563c2d83940c7e2d8c20a35206a390e2e567282f.pack
fatal: unable to create /dev/null: File exists
Command exited with non-zero status 128
27.12user 11.21system 2:51.02elapsed 22%CPU (0avgtext+0avgdata
0maxresident)k
81848inputs+0outputs (1134major+2042348minor)pagefaults 0swaps
-- snap --
Compare that to git.git:
-- snip --
git$ /usr/bin/time git index-pack -o /dev/null
.git/objects/pack/pack-355b54f45778b56c00099bf45369f8a4f2704a51.pack
fatal: unable to create /dev/null: File exists
Command exited with non-zero status 128
16.13user 0.38system 0:17.80elapsed 92%CPU (0avgtext+0avgdata
0maxresident)k
81288inputs+0outputs (38major+51917minor)pagefaults 0swaps
-- snap --
So it seems that the major faults (requiring I/O) occur substantially more
often with your repository.
BTW the RAM numbers here are obviously bogus; the program thrashed the disk
like there was no tomorrow.
Okay, "valgrind --tool=massif" to the rescue:
-- snip --
MB
555.9^ , #
| @..#
| @. :@::#
| , @@: :@::#
| ,@. @:. .:: @: : @@: :@::#
| @: .@@::@:: :::: @: : @@: :@::#
| , .@: :@@::@:: :::: @: : @@: :@::#
| . .@ :@: :@@::@:: :::: @: : @@: :@::#
| . :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::#
| . : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::#
| . ,.: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# :
| .: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# :
| ::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# :
| :::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# :
| . ::::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# :.
| .: ::::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# ::
| ::: ::::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# ::
| : ::: ::::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# ::
| .:: ::: ::::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# ::
| . :::: ::: ::::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# ::
0----------------------------------------------------------------------->Gi
32.83
-- snap --
Whoa. As you can see, your puny little 3.3 megabyte pack is blown up to a
full 555 megabytes in RAM.
That is bad.
Okay, so what is the reason?
You have a pretty large file there, "wortliste", weighing in at 13
megabytes. This file is part of at least one of those 50-strong delta
chains.
To reconstruct the blobs, we have to store all intermediate versions in
RAM (since index-pack is called with "--stdin" from receive-pack, which is
called by clone). Now, the file was big from the beginning, so you end up
with ~13*50 megabytes (actually, about 100 megabytes less) while indexing
one single delta chain.
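As a back-of-the-envelope sketch of that arithmetic (a standalone
illustration, not index-pack code), the numbers land right around the
massif peak:
-- snip --
#include <stdio.h>

/* Illustration only: peak memory if every intermediate version of a
 * ~13 MB blob in a 50-deep delta chain is kept in RAM at once. */
int main(void)
{
        const double version_mb = 13.0;
        const int chain_len = 50;
        double held_mb = 0.0;
        int depth;

        for (depth = 0; depth < chain_len; depth++)
                held_mb += version_mb;  /* one full version per level */

        printf("~%.0f MB held while indexing one chain\n", held_mb);
        return 0;
}
-- snap --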
My tests were performed on a puny little laptop (512MB RAM, to be precise,
as I am a strong believer that developers with overly powerful machines just
lose touch with reality and write programs that are only useful to
themselves, but useless for everyone else), where this hurt big time.
Now, I do not know the internals of index-pack well enough to know if there
is a way to cut the memory usage (by throwing out earlier reconstructed
blobs, for example, and reconstructing them _again_ if need be), so I
Cc:ed Nico and am handing the problem off to him.
I expect this to touch the resolve_delta() function of index-pack.c in a
major way, though.
Ciao,
Dscho
P.S.: It seems that "git verify-pack -v" only shows the sizes of the
deltas. Might be interesting to some to show the unpacked _full_ size,
too.
* Re: git pull is slow
2008-07-12 17:05 ` Johannes Schindelin
@ 2008-07-13 1:15 ` Shawn O. Pearce
2008-07-13 13:59 ` Johannes Schindelin
` (5 more replies)
2008-07-13 9:01 ` git pull is slow Stephan Hennig
1 sibling, 6 replies; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-13 1:15 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Stephan Hennig, Nicolas Pitre, Andreas Ericsson, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Sat, 12 Jul 2008, Stephan Hennig wrote:
> >
> > Thanks for having a look at this! What does "problem with the pack"
> > mean? Do you think it is a Git problem (client or server side?) or just
> > a misconfiguration?
>
> I thought that the blobs in the pack are just too similar. That makes for
> good compression, since you get many relatively small deltas. But it
> also makes for a lot of work to reconstruct the blobs.
>
> I suspected that you run out of space for the cache holding some
> reconstructed blobs (to prevent reconstructing all of them from scratch).
...
> Whoa. As you can see, your puny little 3.3 megabyte pack is blown up to a
> full 555 megabytes in RAM.
...
> I expect this to touch the resolve_delta() function of index-pack.c in a
> major way, though.
Yea, that's going to be ugly. The "cache" you speak of above is held
on the call stack as resolve_delta() recurses through the delta chain
to reconstruct objects and generate their SHA-1s. There isn't a way to
release these objects when memory gets low, so your worst-case scenario
is a 100M+ blob with a delta chain of 50 or more - that will take you
5G of memory to pass through index-pack.
jgit isn't any better here.
What we need to do is maintain a list of the objects we are holding
on the call stack, and release the ones up near the top when memory
gets low. Then upon recursing back up we can just recreate the
object if we had to throw it out. The higher up on the stack the
object is, the less likely we are to need it in the near future.
The more that I think about this, the easier it sounds to implement.
I may try to look at it a little later this evening.
> P.S.: It seems that "git verify-pack -v" only shows the sizes of the
> deltas. Might be interesting to some to show the unpacked _full_ size,
> too.
It wouldn't be very difficult to get that unpacked size. We just have
to inflate enough of the delta to see the delta header and obtain the
inflated object size from that. Unfortunately there is no API in
sha1_file.c to offer that information to callers.
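A sketch of what that could look like, assuming the standard delta
encoding (the delta data begins with two varints: the base size, then
the result size); this is a hypothetical helper, not an existing
sha1_file.c function:
-- snip --
/* Decode one size varint from an inflated delta buffer; calling this
 * twice at the start yields the base size, then the result size. */
static unsigned long get_size_varint(const unsigned char **p)
{
        const unsigned char *data = *p;
        unsigned long size = 0;
        int shift = 0;
        unsigned char c;

        do {
                c = *data++;
                size |= (unsigned long)(c & 0x7f) << shift;
                shift += 7;
        } while (c & 0x80);

        *p = data;
        return size;
}
-- snap --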
--
Shawn.
* Re: git pull is slow
2008-07-12 17:05 ` Johannes Schindelin
2008-07-13 1:15 ` Shawn O. Pearce
@ 2008-07-13 9:01 ` Stephan Hennig
1 sibling, 0 replies; 51+ messages in thread
From: Stephan Hennig @ 2008-07-13 9:01 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Nicolas Pitre, Andreas Ericsson
Johannes Schindelin wrote:
> I suspected that you run out of space for the cache holding some
> reconstructed blobs (to prevent reconstructing all of them from scratch).
>
> [...]
>
> Okay, "valgrind --tool=massif" to the rescue:
>
> -- snip --
> MB
> 555.9^ , #
> | @..#
> ~
> | . :::: ::: ::::: @:: : : :: : :: :@ :@: :@@::@:: :::: @: : @@: :@::# ::
> 0----------------------------------------------------------------------->Gi
> 32.83
> -- snap --
>
> [...]
>
> Now, I do not know the internals of index-pack well enough to know if there
> is a way to cut the memory usage (by throwing out earlier reconstructed
> blobs, for example, and reconstructing them _again_ if need be), so I
> Cc:ed Nico and am handing the problem off to him.
>
> I expect this to touch the resolve_delta() function of index-pack.c in a
> major way, though.
Thank you for the analysis! As a workaround, I am thinking about splitting
the file. Can you suggest a proper file size?
Best regards,
Stephan Hennig
* Re: git pull is slow
2008-07-13 1:15 ` Shawn O. Pearce
@ 2008-07-13 13:59 ` Johannes Schindelin
2008-07-13 22:11 ` Shawn O. Pearce
2008-07-14 2:07 ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
` (4 subsequent siblings)
5 siblings, 1 reply; 51+ messages in thread
From: Johannes Schindelin @ 2008-07-13 13:59 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Stephan Hennig, Nicolas Pitre, Andreas Ericsson, git
Hi,
On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > On Sat, 12 Jul 2008, Stephan Hennig wrote:
> > >
> > > Thanks for having a look at this! What does "problem with the pack"
> > > mean? Do you think it is a Git problem (client or server side?) or just
> > > a misconfiguration?
> >
> > I thought that the blobs in the pack are just too similar. That makes for
> > good compression, since you get many relatively small deltas. But it
> > also makes for a lot of work to reconstruct the blobs.
> >
> > I suspected that you run out of space for the cache holding some
> > reconstructed blobs (to prevent reconstructing all of them from scratch).
> ...
> > Whoa. As you can see, your puny little 3.3 megabyte pack is blown up to a
> > full 555 megabytes in RAM.
> ...
> > I expect this to touch the resolve_delta() function of index-pack.c in a
> > major way, though.
>
> Yea, that's going to be ugly. The "cache" you speak of above is held on
> the call stack as resolve_delta() recurses through the delta chain to
> reconstruct objects and generate their SHA-1s. There isn't a way to
> release these objects when memory gets low, so your worst-case scenario
> is a 100M+ blob with a delta chain of 50 or more - that will take you 5G
> of memory to pass through index-pack.
Actually, there is...
You would only need to tap into the "release_pack_memory()" mechanism,
adding a sort of "smart pointer" that knows how to reconstruct its
contents.
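Something along these lines, purely as a sketch; reconstruct_from_pack()
is hypothetical, standing in for re-reading and re-applying deltas from
the pack:
-- snip --
struct object_entry;    /* as in index-pack.c */

/* hypothetical: rebuild the blob by re-reading the pack */
extern void *reconstruct_from_pack(struct object_entry *obj,
                                   unsigned long *size);

/* "Smart pointer": the cached data may be freed under memory
 * pressure and is rebuilt from the pack on next access. */
struct lazy_base {
        void *data;               /* NULL after eviction */
        unsigned long size;
        struct object_entry *obj; /* enough info to rebuild */
};

static void *lazy_base_get(struct lazy_base *b)
{
        if (!b->data)
                b->data = reconstruct_from_pack(b->obj, &b->size);
        return b->data;
}
-- snap --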
Ciao,
Dscho
* Re: git pull is slow
2008-07-13 13:59 ` Johannes Schindelin
@ 2008-07-13 22:11 ` Shawn O. Pearce
0 siblings, 0 replies; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-13 22:11 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Stephan Hennig, Nicolas Pitre, Andreas Ericsson, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > ...
> > > I expect this to touch the resolve_delta() function of index-pack.c in a
> > > major way, though.
> >
> > Yea, that's going to be ugly. The "cache" you speak of above is held on
> > the call stack as resolve_delta() recurses through the delta chain to
> > reconstruct objects and generate their SHA-1s. There isn't a way to
> > release these objects when memory gets low, so your worst-case scenario
> > is a 100M+ blob with a delta chain of 50 or more - that will take you 5G
> > of memory to pass through index-pack.
>
> Actually, there is...
>
> You would only need to tap into the "release_pack_memory()" mechanism,
> adding a sort of "smart pointer" that knows how to reconstruct its
> contents.
I guess you don't really know how index-pack is organized, then.
It is not quite that simple.
I think we want to use the core.deltabasecachelimit parameter inside
of index-pack to limit the size of the cache, but evict based on
the call-stack depth. Objects deeper back in the call stack should
be evicted before objects closer to the top of the stack.
Reconstruction is a bit complex, as we don't have the machinery in
sha1_file.c available to us: the index is not available because we
are still in the process of creating the damn thing.
I'm working up a patch series to resolve this. I'm going to shut up
now and just try to post working code sometime this evening.
--
Shawn.
* [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-13 1:15 ` Shawn O. Pearce
2008-07-13 13:59 ` Johannes Schindelin
@ 2008-07-14 2:07 ` Shawn O. Pearce
2008-07-14 2:27 ` Nicolas Pitre
2008-07-14 2:07 ` [PATCH 1/4] index-pack: Refactor base arguments of resolve_delta into a struct Shawn O. Pearce
` (3 subsequent siblings)
5 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-14 2:07 UTC (permalink / raw)
To: Junio C Hamano, Junio C Hamano
Cc: git, Stephan Hennig, Nicolas Pitre, Andreas Ericsson
This should resolve the obscene memory usage of index-pack when
resolving deltas for very large files.
Shawn O. Pearce (4):
index-pack: Refactor base arguments of resolve_delta into a struct
index-pack: Chain the struct base_data on the stack for traversal
index-pack: Track the object_entry that creates each base_data
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas
index-pack.c | 137 +++++++++++++++++++++++++++++++++++++++++++++------------
1 files changed, 108 insertions(+), 29 deletions(-)
* [PATCH 1/4] index-pack: Refactor base arguments of resolve_delta into a struct
2008-07-13 1:15 ` Shawn O. Pearce
2008-07-13 13:59 ` Johannes Schindelin
2008-07-14 2:07 ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
@ 2008-07-14 2:07 ` Shawn O. Pearce
2008-07-15 2:40 ` Nicolas Pitre
2008-07-14 2:07 ` [PATCH 2/4] index-pack: Chain the struct base_data on the stack for traversal Shawn O. Pearce
` (2 subsequent siblings)
5 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-14 2:07 UTC (permalink / raw)
To: Junio C Hamano, Junio C Hamano
Cc: git, Stephan Hennig, Nicolas Pitre, Andreas Ericsson
We need to discard base objects which are not recently used if our
memory gets low, such as when we are unpacking a long delta chain
of a very large object.
To support tracking the available base objects we combine the
pointer and size into a struct. Future changes would allow the
data pointer to be free'd and marked NULL if memory gets low.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
index-pack.c | 60 +++++++++++++++++++++++++++++++--------------------------
1 files changed, 33 insertions(+), 27 deletions(-)
diff --git a/index-pack.c b/index-pack.c
index 25db5db..db03478 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -26,6 +26,11 @@ union delta_base {
off_t offset;
};
+struct base_data {
+ void *data;
+ unsigned long size;
+};
+
/*
* Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
* to memcmp() only the first 20 bytes.
@@ -426,25 +431,25 @@ static void sha1_object(const void *data, unsigned long size,
}
}
-static void resolve_delta(struct object_entry *delta_obj, void *base_data,
- unsigned long base_size, enum object_type type)
+static void resolve_delta(struct object_entry *delta_obj,
+ struct base_data *base_obj, enum object_type type)
{
void *delta_data;
unsigned long delta_size;
- void *result;
- unsigned long result_size;
union delta_base delta_base;
int j, first, last;
+ struct base_data result;
delta_obj->real_type = type;
delta_data = get_data_from_pack(delta_obj);
delta_size = delta_obj->size;
- result = patch_delta(base_data, base_size, delta_data, delta_size,
- &result_size);
+ result.data = patch_delta(base_obj->data, base_obj->size,
+ delta_data, delta_size,
+ &result.size);
free(delta_data);
- if (!result)
+ if (!result.data)
bad_object(delta_obj->idx.offset, "failed to apply delta");
- sha1_object(result, result_size, type, delta_obj->idx.sha1);
+ sha1_object(result.data, result.size, type, delta_obj->idx.sha1);
nr_resolved_deltas++;
hashcpy(delta_base.sha1, delta_obj->idx.sha1);
@@ -452,7 +457,7 @@ static void resolve_delta(struct object_entry *delta_obj, void *base_data,
for (j = first; j <= last; j++) {
struct object_entry *child = objects + deltas[j].obj_no;
if (child->real_type == OBJ_REF_DELTA)
- resolve_delta(child, result, result_size, type);
+ resolve_delta(child, &result, type);
}
}
@@ -462,11 +467,11 @@ static void resolve_delta(struct object_entry *delta_obj, void *base_data,
for (j = first; j <= last; j++) {
struct object_entry *child = objects + deltas[j].obj_no;
if (child->real_type == OBJ_OFS_DELTA)
- resolve_delta(child, result, result_size, type);
+ resolve_delta(child, &result, type);
}
}
- free(result);
+ free(result.data);
}
static int compare_delta_entry(const void *a, const void *b)
@@ -481,7 +486,6 @@ static void parse_pack_objects(unsigned char *sha1)
{
int i;
struct delta_entry *delta = deltas;
- void *data;
struct stat st;
/*
@@ -496,7 +500,7 @@ static void parse_pack_objects(unsigned char *sha1)
nr_objects);
for (i = 0; i < nr_objects; i++) {
struct object_entry *obj = &objects[i];
- data = unpack_raw_entry(obj, &delta->base);
+ void *data = unpack_raw_entry(obj, &delta->base);
obj->real_type = obj->type;
if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
nr_deltas++;
@@ -545,6 +549,7 @@ static void parse_pack_objects(unsigned char *sha1)
struct object_entry *obj = &objects[i];
union delta_base base;
int j, ref, ref_first, ref_last, ofs, ofs_first, ofs_last;
+ struct base_data base_obj;
if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA)
continue;
@@ -555,22 +560,22 @@ static void parse_pack_objects(unsigned char *sha1)
ofs = !find_delta_children(&base, &ofs_first, &ofs_last);
if (!ref && !ofs)
continue;
- data = get_data_from_pack(obj);
+ base_obj.data = get_data_from_pack(obj);
+ base_obj.size = obj->size;
+
if (ref)
for (j = ref_first; j <= ref_last; j++) {
struct object_entry *child = objects + deltas[j].obj_no;
if (child->real_type == OBJ_REF_DELTA)
- resolve_delta(child, data,
- obj->size, obj->type);
+ resolve_delta(child, &base_obj, obj->type);
}
if (ofs)
for (j = ofs_first; j <= ofs_last; j++) {
struct object_entry *child = objects + deltas[j].obj_no;
if (child->real_type == OBJ_OFS_DELTA)
- resolve_delta(child, data,
- obj->size, obj->type);
+ resolve_delta(child, &base_obj, obj->type);
}
- free(data);
+ free(base_obj.data);
display_progress(progress, nr_resolved_deltas);
}
}
@@ -656,28 +661,29 @@ static void fix_unresolved_deltas(int nr_unresolved)
for (i = 0; i < n; i++) {
struct delta_entry *d = sorted_by_pos[i];
- void *data;
- unsigned long size;
enum object_type type;
int j, first, last;
+ struct base_data base_obj;
if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
continue;
- data = read_sha1_file(d->base.sha1, &type, &size);
- if (!data)
+ base_obj.data = read_sha1_file(d->base.sha1, &type, &base_obj.size);
+ if (!base_obj.data)
continue;
find_delta_children(&d->base, &first, &last);
for (j = first; j <= last; j++) {
struct object_entry *child = objects + deltas[j].obj_no;
if (child->real_type == OBJ_REF_DELTA)
- resolve_delta(child, data, size, type);
+ resolve_delta(child, &base_obj, type);
}
- if (check_sha1_signature(d->base.sha1, data, size, typename(type)))
+ if (check_sha1_signature(d->base.sha1, base_obj.data,
+ base_obj.size, typename(type)))
die("local object %s is corrupt", sha1_to_hex(d->base.sha1));
- append_obj_to_pack(d->base.sha1, data, size, type);
- free(data);
+ append_obj_to_pack(d->base.sha1, base_obj.data,
+ base_obj.size, type);
+ free(base_obj.data);
display_progress(progress, nr_resolved_deltas);
}
free(sorted_by_pos);
--
1.5.6.2.393.g45096
* [PATCH 2/4] index-pack: Chain the struct base_data on the stack for traversal
2008-07-13 1:15 ` Shawn O. Pearce
` (2 preceding siblings ...)
2008-07-14 2:07 ` [PATCH 1/4] index-pack: Refactor base arguments of resolve_delta into a struct Shawn O. Pearce
@ 2008-07-14 2:07 ` Shawn O. Pearce
2008-07-15 2:48 ` Nicolas Pitre
2008-07-14 2:07 ` [PATCH 3/4] index-pack: Track the object_entry that creates each base_data Shawn O. Pearce
2008-07-14 2:07 ` [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas Shawn O. Pearce
5 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-14 2:07 UTC (permalink / raw)
To: Junio C Hamano, Junio C Hamano
Cc: git, Stephan Hennig, Nicolas Pitre, Andreas Ericsson
We need to release earlier inflated base objects when memory gets
low, which means we need to be able to walk up or down the stack
to locate the objects we want to release, and free their data.
The new link/unlink routines allow inserting and removing the struct
base_data during recursion inside resolve_delta, and the global
base_cache gives us the head of the chain (bottom of the stack)
so we can traverse it.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
index-pack.c | 34 +++++++++++++++++++++++++++++++---
1 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/index-pack.c b/index-pack.c
index db03478..6c59fd3 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -27,6 +27,8 @@ union delta_base {
};
struct base_data {
+ struct base_data *base;
+ struct base_data *child;
void *data;
unsigned long size;
};
@@ -48,6 +50,7 @@ struct delta_entry
static struct object_entry *objects;
static struct delta_entry *deltas;
+static struct base_data *base_cache;
static int nr_objects;
static int nr_deltas;
static int nr_resolved_deltas;
@@ -216,6 +219,27 @@ static void bad_object(unsigned long offset, const char *format, ...)
die("pack has bad object at offset %lu: %s", offset, buf);
}
+static void link_base_data(struct base_data *base, struct base_data *c)
+{
+ if (base)
+ base->child = c;
+ else
+ base_cache = c;
+
+ c->base = base;
+ c->child = NULL;
+}
+
+static void unlink_base_data(struct base_data *c)
+{
+ struct base_data *base = c->base;
+ if (base)
+ base->child = NULL;
+ else
+ base_cache = NULL;
+ free(c->data);
+}
+
static void *unpack_entry_data(unsigned long offset, unsigned long size)
{
z_stream stream;
@@ -452,6 +476,8 @@ static void resolve_delta(struct object_entry *delta_obj,
sha1_object(result.data, result.size, type, delta_obj->idx.sha1);
nr_resolved_deltas++;
+ link_base_data(base_obj, &result);
+
hashcpy(delta_base.sha1, delta_obj->idx.sha1);
if (!find_delta_children(&delta_base, &first, &last)) {
for (j = first; j <= last; j++) {
@@ -471,7 +497,7 @@ static void resolve_delta(struct object_entry *delta_obj,
}
}
- free(result.data);
+ unlink_base_data(&result);
}
static int compare_delta_entry(const void *a, const void *b)
@@ -562,6 +588,7 @@ static void parse_pack_objects(unsigned char *sha1)
continue;
base_obj.data = get_data_from_pack(obj);
base_obj.size = obj->size;
+ link_base_data(NULL, &base_obj);
if (ref)
for (j = ref_first; j <= ref_last; j++) {
@@ -575,7 +602,7 @@ static void parse_pack_objects(unsigned char *sha1)
if (child->real_type == OBJ_OFS_DELTA)
resolve_delta(child, &base_obj, obj->type);
}
- free(base_obj.data);
+ unlink_base_data(&base_obj);
display_progress(progress, nr_resolved_deltas);
}
}
@@ -670,6 +697,7 @@ static void fix_unresolved_deltas(int nr_unresolved)
base_obj.data = read_sha1_file(d->base.sha1, &type, &base_obj.size);
if (!base_obj.data)
continue;
+ link_base_data(NULL, &base_obj);
find_delta_children(&d->base, &first, &last);
for (j = first; j <= last; j++) {
@@ -683,7 +711,7 @@ static void fix_unresolved_deltas(int nr_unresolved)
die("local object %s is corrupt", sha1_to_hex(d->base.sha1));
append_obj_to_pack(d->base.sha1, base_obj.data,
base_obj.size, type);
- free(base_obj.data);
+ unlink_base_data(&base_obj);
display_progress(progress, nr_resolved_deltas);
}
free(sorted_by_pos);
--
1.5.6.2.393.g45096
* [PATCH 3/4] index-pack: Track the object_entry that creates each base_data
2008-07-13 1:15 ` Shawn O. Pearce
` (3 preceding siblings ...)
2008-07-14 2:07 ` [PATCH 2/4] index-pack: Chain the struct base_data on the stack for traversal Shawn O. Pearce
@ 2008-07-14 2:07 ` Shawn O. Pearce
2008-07-14 10:15 ` Johannes Schindelin
2008-07-15 2:50 ` Nicolas Pitre
2008-07-14 2:07 ` [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas Shawn O. Pearce
5 siblings, 2 replies; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-14 2:07 UTC (permalink / raw)
To: Junio C Hamano, Junio C Hamano
Cc: git, Stephan Hennig, Nicolas Pitre, Andreas Ericsson
If we free the data stored within a base_data we need the struct
object_entry to get the data back again for use with another
dependent delta. Storing the object_entry* makes it simple to call
get_data_from_pack() to recover the compressed information.
This however means we must add the missing baes object to the end
of our packfile prior to calling resolve_delta() on each of the
dependent deltas. Adding the base first ensures we can read the
base back from the pack we are indexing, as if it had been included by
the remote side.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
index-pack.c | 18 ++++++++++++------
1 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/index-pack.c b/index-pack.c
index 6c59fd3..7239e89 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -29,6 +29,7 @@ union delta_base {
struct base_data {
struct base_data *base;
struct base_data *child;
+ struct object_entry *obj;
void *data;
unsigned long size;
};
@@ -476,6 +477,7 @@ static void resolve_delta(struct object_entry *delta_obj,
sha1_object(result.data, result.size, type, delta_obj->idx.sha1);
nr_resolved_deltas++;
+ result.obj = delta_obj;
link_base_data(base_obj, &result);
hashcpy(delta_base.sha1, delta_obj->idx.sha1);
@@ -588,6 +590,7 @@ static void parse_pack_objects(unsigned char *sha1)
continue;
base_obj.data = get_data_from_pack(obj);
base_obj.size = obj->size;
+ base_obj.obj = obj;
link_base_data(NULL, &base_obj);
if (ref)
@@ -633,7 +636,8 @@ static int write_compressed(int fd, void *in, unsigned int size, uint32_t *obj_c
return size;
}
-static void append_obj_to_pack(const unsigned char *sha1, void *buf,
+static struct object_entry *append_obj_to_pack(
+ const unsigned char *sha1, void *buf,
unsigned long size, enum object_type type)
{
struct object_entry *obj = &objects[nr_objects++];
@@ -654,6 +658,7 @@ static void append_obj_to_pack(const unsigned char *sha1, void *buf,
obj[1].idx.offset = obj[0].idx.offset + n;
obj[1].idx.offset += write_compressed(output_fd, buf, size, &obj[0].idx.crc32);
hashcpy(obj->idx.sha1, sha1);
+ return obj;
}
static int delta_pos_compare(const void *_a, const void *_b)
@@ -697,6 +702,12 @@ static void fix_unresolved_deltas(int nr_unresolved)
base_obj.data = read_sha1_file(d->base.sha1, &type, &base_obj.size);
if (!base_obj.data)
continue;
+
+ if (check_sha1_signature(d->base.sha1, base_obj.data,
+ base_obj.size, typename(type)))
+ die("local object %s is corrupt", sha1_to_hex(d->base.sha1));
+ base_obj.obj = append_obj_to_pack(d->base.sha1, base_obj.data,
+ base_obj.size, type);
link_base_data(NULL, &base_obj);
find_delta_children(&d->base, &first, &last);
@@ -706,11 +717,6 @@ static void fix_unresolved_deltas(int nr_unresolved)
resolve_delta(child, &base_obj, type);
}
- if (check_sha1_signature(d->base.sha1, base_obj.data,
- base_obj.size, typename(type)))
- die("local object %s is corrupt", sha1_to_hex(d->base.sha1));
- append_obj_to_pack(d->base.sha1, base_obj.data,
- base_obj.size, type);
unlink_base_data(&base_obj);
display_progress(progress, nr_resolved_deltas);
}
--
1.5.6.2.393.g45096
* [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas
2008-07-13 1:15 ` Shawn O. Pearce
` (4 preceding siblings ...)
2008-07-14 2:07 ` [PATCH 3/4] index-pack: Track the object_entry that creates each base_data Shawn O. Pearce
@ 2008-07-14 2:07 ` Shawn O. Pearce
2008-07-15 3:05 ` Nicolas Pitre
5 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-14 2:07 UTC (permalink / raw)
To: Junio C Hamano, Junio C Hamano
Cc: git, Stephan Hennig, Nicolas Pitre, Andreas Ericsson
If we are trying to resolve deltas for a long delta chain composed
of multi-megabyte objects we can easily run into requiring 500M+
of memory to hold each object in the chain on the call stack while
we recurse into the dependent objects and resolve them.
We now use a simple delta cache that discards objects near the
bottom of the call stack first, as they are the least recently
used objects in the current delta chain. If we recurse out of a
chain we may find the base object is no longer available, as it was
free'd to keep memory under the deltaBaseCacheLimit. In such cases
we must unpack the base object again, which will require recursing
back to the root of the delta chain, as we released that
root first.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
index-pack.c | 43 +++++++++++++++++++++++++++++++++++++++++--
1 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/index-pack.c b/index-pack.c
index 7239e89..84c9fc1 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -52,6 +52,7 @@ struct delta_entry
static struct object_entry *objects;
static struct delta_entry *deltas;
static struct base_data *base_cache;
+static size_t base_cache_used;
static int nr_objects;
static int nr_deltas;
static int nr_resolved_deltas;
@@ -220,6 +221,20 @@ static void bad_object(unsigned long offset, const char *format, ...)
die("pack has bad object at offset %lu: %s", offset, buf);
}
+static void prune_base_data(void)
+{
+ struct base_data *b = base_cache;
+ for (b = base_cache;
+ base_cache_used > delta_base_cache_limit && b && b->child;
+ b = b->child) {
+ if (b->data) {
+ free(b->data);
+ b->data = NULL;
+ base_cache_used -= b->size;
+ }
+ }
+}
+
static void link_base_data(struct base_data *base, struct base_data *c)
{
if (base)
@@ -229,6 +244,8 @@ static void link_base_data(struct base_data *base, struct base_data *c)
c->base = base;
c->child = NULL;
+ base_cache_used += c->size;
+ prune_base_data();
}
static void unlink_base_data(struct base_data *c)
@@ -238,7 +255,10 @@ static void unlink_base_data(struct base_data *c)
base->child = NULL;
else
base_cache = NULL;
- free(c->data);
+ if (c->data) {
+ free(c->data);
+ base_cache_used -= c->size;
+ }
}
static void *unpack_entry_data(unsigned long offset, unsigned long size)
@@ -456,6 +476,25 @@ static void sha1_object(const void *data, unsigned long size,
}
}
+static void *get_base_data(struct base_data *c)
+{
+ if (!c->data) {
+ struct object_entry *obj = c->obj;
+ void *raw = get_data_from_pack(obj);
+ if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
+ c->data = patch_delta(
+ get_base_data(c->base), c->base->size,
+ raw, obj->size,
+ &c->size);
+ free(raw);
+ if (!c->data)
+ bad_object(obj->idx.offset, "failed to apply delta");
+ } else
+ c->data = raw;
+ }
+ return c->data;
+}
+
static void resolve_delta(struct object_entry *delta_obj,
struct base_data *base_obj, enum object_type type)
{
@@ -468,7 +507,7 @@ static void resolve_delta(struct object_entry *delta_obj,
delta_obj->real_type = type;
delta_data = get_data_from_pack(delta_obj);
delta_size = delta_obj->size;
- result.data = patch_delta(base_obj->data, base_obj->size,
+ result.data = patch_delta(get_base_data(base_obj), base_obj->size,
delta_data, delta_size,
&result.size);
free(delta_data);
--
1.5.6.2.393.g45096
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 2:07 ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
@ 2008-07-14 2:27 ` Nicolas Pitre
2008-07-14 3:12 ` Shawn O. Pearce
0 siblings, 1 reply; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-14 2:27 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, Junio C Hamano, git, Stephan Hennig,
Andreas Ericsson
On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> This should resolve the obscene memory usage of index-pack when
> resolving deltas for very large files.
>
> Shawn O. Pearce (4):
> index-pack: Refactor base arguments of resolve_delta into a struct
> index-pack: Chain the struct base_data on the stack for traversal
> index-pack: Track the object_entry that creates each base_data
> index-pack: Honor core.deltaBaseCacheLimit when resolving deltas
Well...
I don't like this. Not your patch, but rather the direction this whole
thing took in the first place, and now we have to bolt extra complexity
on top.
I have the feeling this is not the best approach, especially since many
long delta chains are going deep and all those intermediate objects are
simply useless once the _only_ delta child has been resolved and
therefore could be freed right away instead.
But I have to sleep on it first.
Nicolas
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 2:27 ` Nicolas Pitre
@ 2008-07-14 3:12 ` Shawn O. Pearce
2008-07-14 11:44 ` Johannes Schindelin
0 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-14 3:12 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, Stephan Hennig, Andreas Ericsson
Nicolas Pitre <nico@cam.org> wrote:
> On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
>
> > This should resolve the obscene memory usage of index-pack when
> > resolving deltas for very large files.
>
> I don't like this. Not your patch, but rather the direction this whole
> thing took in the first place, and now we have to bolt extra complexity
> on top.
>
> I have the feeling this is not the best approach, especially since many
> long delta chains are going deep and all those intermediate objects are
> simply useless once the _only_ delta child has been resolved and
> therefore could be freed right away instead.
I thought about trying to free a base object if this is the last
resolve_delta() call which would require that data, but I realized
this is just a "tail-call optimization" and doesn't work in the
more general case. This patch series is the best solution I could
come up with to handle even the general case.
The only other alternative I can come up with is to change
pack-objects (or at least its defaults) so we don't generate these
massive delta chains. But there are already packs in the wild that
use these chains, resulting in performance problems for clients.
--
Shawn.
* Re: [PATCH 3/4] index-pack: Track the object_entry that creates each base_data
2008-07-14 2:07 ` [PATCH 3/4] index-pack: Track the object_entry that creates each base_data Shawn O. Pearce
@ 2008-07-14 10:15 ` Johannes Schindelin
2008-07-15 2:50 ` Nicolas Pitre
1 sibling, 0 replies; 51+ messages in thread
From: Johannes Schindelin @ 2008-07-14 10:15 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, git, Stephan Hennig, Nicolas Pitre,
Andreas Ericsson
Hi,
On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> This however means we must add the missing baes object to the end
> of our packfile prior to calling resolve_delta() on each of the
> dependent deltas. Adding the base first ensures we can read the
> base back from the pack we indexing, as if it had been included by
> the remote side.
s/baes/base/
Otherwise very clear.
Thanks,
Dscho
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 3:12 ` Shawn O. Pearce
@ 2008-07-14 11:44 ` Johannes Schindelin
2008-07-14 11:54 ` Jakub Narebski
0 siblings, 1 reply; 51+ messages in thread
From: Johannes Schindelin @ 2008-07-14 11:44 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Nicolas Pitre, Junio C Hamano, git, Stephan Hennig,
Andreas Ericsson
Hi,
On Mon, 14 Jul 2008, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> >
> > > This should resolve the obscene memory usage of index-pack when
> > > resolving deltas for very large files.
> >
> > I don't like this. Not your patch, but rather the direction this whole
> > thing took in the first place, and now we have to bolt extra complexity
> > on top.
> >
> > I have the feeling this is not the best approach, especially since many
> > long delta chains are going deep and all those intermediate objects are
> > simply useless once the _only_ delta child has been resolved and
> > therefore could be freed right away instead.
>
> I thought about trying to free a base object if this is the last
> resolve_delta() call which would require that data, but I realized this
> is just a "tail-call optimization" and doesn't work in the more general
> case.
However, for cases like Stephan's I assume this is the rule. If you
think about it, most use cases for such a big blob are one-user,
append-only.
> The only other alternative I can come up with is to change pack-objects
> (or at least its defaults) so we don't generate these massive delta
> chains. But there are already packs in the wild that use these chains,
> resulting in performance problems for clients.
But the long chains make the pack actually as efficient as it is...
Ciao,
Dscho
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 11:44 ` Johannes Schindelin
@ 2008-07-14 11:54 ` Jakub Narebski
2008-07-14 12:10 ` Johannes Schindelin
` (2 more replies)
0 siblings, 3 replies; 51+ messages in thread
From: Jakub Narebski @ 2008-07-14 11:54 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Shawn O. Pearce, Nicolas Pitre, Junio C Hamano, git,
Stephan Hennig, Andreas Ericsson
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> On Mon, 14 Jul 2008, Shawn O. Pearce wrote:
> > The only other alternative I can come up with is to change pack-objects
> > (or at least its defaults) so we don't generate these massive delta
> > chains. But there are already packs in the wild that use these chains,
> > resulting in performance problems for clients.
>
> But the long chains make the pack actually as efficient as it is...
Perhaps Shawn was thinking here of limiting the delta chain not by its
*length*, but by its *size* (as required when unpacking the last object
in a delta chain).
What do you think about this idea?
--
Jakub Narebski
Poland
ShadeHawk on #git
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 11:54 ` Jakub Narebski
@ 2008-07-14 12:10 ` Johannes Schindelin
2008-07-14 12:16 ` Andreas Ericsson
2008-07-15 4:19 ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
2 siblings, 0 replies; 51+ messages in thread
From: Johannes Schindelin @ 2008-07-14 12:10 UTC (permalink / raw)
To: Jakub Narebski
Cc: Shawn O. Pearce, Nicolas Pitre, Junio C Hamano, git,
Stephan Hennig, Andreas Ericsson
Hi,
On Mon, 14 Jul 2008, Jakub Narebski wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > On Mon, 14 Jul 2008, Shawn O. Pearce wrote:
>
> > > The only other alternative I can come up with is to change pack-objects
> > > (or at least its defaults) so we don't generate these massive delta
> > > chains. But there are already packs in the wild that use these chains,
> > > resulting in performance problems for clients.
> >
> > But the long chains make the pack actually as efficient as it is...
>
> Perhaps Shawn was thinking here of limiting the delta chain not by its
> *length*, but by its *size* (as required when unpacking the last object
> in a delta chain).
So you mean once the sizes of the reconstructed objects are too big, i.e.
when the delta chain is actually _most useful_, we should not do it?
Don't think so.
Ciao,
Dscho
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 11:54 ` Jakub Narebski
2008-07-14 12:10 ` Johannes Schindelin
@ 2008-07-14 12:16 ` Andreas Ericsson
2008-07-14 12:25 ` Johannes Schindelin
2008-07-15 4:19 ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
2 siblings, 1 reply; 51+ messages in thread
From: Andreas Ericsson @ 2008-07-14 12:16 UTC (permalink / raw)
To: Jakub Narebski
Cc: Johannes Schindelin, Shawn O. Pearce, Nicolas Pitre,
Junio C Hamano, git, Stephan Hennig
Jakub Narebski wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> On Mon, 14 Jul 2008, Shawn O. Pearce wrote:
>
>>> The only other alternative I can come up with is to change pack-objects
>>> (or at least its defaults) so we don't generate these massive delta
>>> chains. But there are already packs in the wild that use these chains,
>>> resulting in performance problems for clients.
>> But the long chains make the pack actually as efficient as it is...
>
> Perhaps Shawn thought here about limiting a delta chain not by its
> *length*, but by its *size* (as required when unpacking the last object
> in a delta chain).
>
> What do you think about this idea?
Sorry for being clueless here, but why do the older versions need
to be kept in-memory anyway? Aren't we applying the delta each time
we find one, repeatedly creating a new base-object in-memory for
each delta? If we aren't doing that, why do we need more than a
small amount of memory just for keeping the delta?
Feel free to tell me to go away if I'm being stupid. I'm just
curious and probably won't be able to hack up any patches anyway.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 12:16 ` Andreas Ericsson
@ 2008-07-14 12:25 ` Johannes Schindelin
2008-07-14 12:51 ` Andreas Ericsson
0 siblings, 1 reply; 51+ messages in thread
From: Johannes Schindelin @ 2008-07-14 12:25 UTC (permalink / raw)
To: Andreas Ericsson
Cc: Jakub Narebski, Shawn O. Pearce, Nicolas Pitre, Junio C Hamano,
git, Stephan Hennig
Hi,
On Mon, 14 Jul 2008, Andreas Ericsson wrote:
> Sorry for being clueless here, but why do the older versions need
> to be kept in-memory anyway? Aren't we applying the delta each time
> we find one, repeatedly creating a new base-object in-memory for
> each delta? If we aren't doing that, why do we need more than a
> small amount of memory just for keeping the delta?
Think of a delta chain of 49 delta objects, 1 base object. Now
reconstruct all of the objects.
If you do it one by one, not storing the intermediate results, you end up
applying 1 delta for the first, 2 for the second (first the first, then
the second), and so on, in total 1 + 2 + 3 + ... + 49 = 49 * 50 / 2 = 1225
times.
Compare that to the 49 times when reusing the intermediate results.
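In general, a chain of n deltas costs n * (n + 1) / 2 delta
applications when intermediate results are discarded, versus n when
they are kept. A tiny stand-alone C illustration of just that
arithmetic (not git code):

#include <stdio.h>

int main(void)
{
	int n = 49;	/* deltas stacked on a single base object */

	/* discarding intermediates: rebuild from the base every time */
	printf("naive:  %d delta applications\n", n * (n + 1) / 2);

	/* keeping intermediates: each delta is applied exactly once */
	printf("cached: %d delta applications\n", n);
	return 0;
}

This prints 1225 for the naive strategy and 49 with caching.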
Ciao,
Dscho
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 12:25 ` Johannes Schindelin
@ 2008-07-14 12:51 ` Andreas Ericsson
2008-07-14 12:58 ` Johannes Schindelin
2008-07-15 2:21 ` Nicolas Pitre
0 siblings, 2 replies; 51+ messages in thread
From: Andreas Ericsson @ 2008-07-14 12:51 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Jakub Narebski, Shawn O. Pearce, Nicolas Pitre, Junio C Hamano,
git, Stephan Hennig
Johannes Schindelin wrote:
> Hi,
>
> On Mon, 14 Jul 2008, Andreas Ericsson wrote:
>
>> Sorry for being clueless here, but why do the older versions need
>> to be kept in-memory anyway? Aren't we applying the delta each time
>> we find one, repeatedly creating a new base-object in-memory for
>> each delta? If we aren't doing that, why do we need more than a
>> small amount of memory just for keeping the delta?
>
> Think of a delta chain of 49 delta objects, 1 base object. Now
> reconstruct all of the objects.
>
> If you do it one by one, not storing the intermediate results, you end up
> applying 1 delta for the first, 2 for the second (first the first, then
> the second), and so on, in total 1 + 2 + 3 + ... + 49 = 49 * 50 / 2 = 1225
> times.
>
> Compare that to the 49 times when reusing the intermediate results.
>
That's only true if you discard the result of applying D1 to DB though.
What I'm wondering is: why isn't it done like this (pseudo-code):
while ((delta = find_earlier_delta(sha1))) {
	if (is_base_object(delta)) {
		base_object = delta;
		break;
	}
	push_delta(delta, patch_stack);
}
while ((delta = pop_delta(patch_stack)))
	apply_delta(base_object, delta);
where "apply_delta" replaces the base_object->blob with the delta
applied, releasing the previously used memory?
That way, you'll never use more memory than the largest ever size
of the object + 1 delta at a time and still not apply the delta
more than delta_chain_length-1 times.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 12:51 ` Andreas Ericsson
@ 2008-07-14 12:58 ` Johannes Schindelin
2008-07-15 2:21 ` Nicolas Pitre
1 sibling, 0 replies; 51+ messages in thread
From: Johannes Schindelin @ 2008-07-14 12:58 UTC (permalink / raw)
To: Andreas Ericsson
Cc: Jakub Narebski, Shawn O. Pearce, Nicolas Pitre, Junio C Hamano,
git, Stephan Hennig
Hi,
On Mon, 14 Jul 2008, Andreas Ericsson wrote:
> Johannes Schindelin wrote:
>
> > On Mon, 14 Jul 2008, Andreas Ericsson wrote:
> >
> > > Sorry for being clueless here, but why do the older versions need
> > > to be kept in-memory anyway? Aren't we applying the delta each time
> > > we find one, repeatedly creating a new base-object in-memory for
> > > each delta? If we aren't doing that, why do we need more than a
> > > small amount of memory just for keeping the delta?
> >
> > Think of a delta chain of 49 delta objects, 1 base object. Now reconstruct
> > all of the objects.
> >
> > If you do it one by one, not storing the intermediate results, you end up
> > applying 1 delta for the first, 2 for the second (first the first, then the
> > second), and so on, in total 1 + 2 + 3 + ... + 49 = 49 * 50 / 2 = 1225
> > times.
> >
> > Compare that to the 49 times when reusing the intermediate results.
> >
>
> [...]
>
> while ((delta = pop_delta(patch_stack)))
> 	apply_delta(base_object, delta);
>
> where "apply_delta" replaces the base_object->blob with the delta
> applied, releasing the previously used memory?
Delta chains are not (necessarily) independent.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 12:51 ` Andreas Ericsson
2008-07-14 12:58 ` Johannes Schindelin
@ 2008-07-15 2:21 ` Nicolas Pitre
2008-07-15 2:47 ` Shawn O. Pearce
1 sibling, 1 reply; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-15 2:21 UTC (permalink / raw)
To: Andreas Ericsson
Cc: Johannes Schindelin, Jakub Narebski, Shawn O. Pearce,
Junio C Hamano, git, Stephan Hennig
On Mon, 14 Jul 2008, Andreas Ericsson wrote:
> Johannes Schindelin wrote:
> > Hi,
> >
> > On Mon, 14 Jul 2008, Andreas Ericsson wrote:
> >
> > > Sorry for being clueless here, but why do the older versions need
> > > to be kept in-memory anyway? Aren't we applying the delta each time
> > > we find one, repeatedly creating a new base-object in-memory for
> > > each delta? If we aren't doing that, why do we need more than a
> > > small amount of memory just for keeping the delta?
> >
> > Think of a delta chain of 49 delta objects, 1 base object. Now reconstruct
> > all of the objects.
> >
> > If you do it one by one, not storing the intermediate results, you end up
> > applying 1 delta for the first, 2 for the second (first the first, then the
> > second), and so on, in total 1 + 2 + 3 + ... + 49 = 49 * 50 / 2 = 1225
> > times.
> >
> > Compare that to the 49 times when reusing the intermediate results.
> >
>
> That's only true if you discard the result of applying D1 to DB though.
> What I'm wondering is: why isn't it done like this (pseudo-code):
>
> while ((delta = find_earlier_delta(sha1))) {
> 	if (is_base_object(delta)) {
> 		base_object = delta;
> 		break;
> 	}
> 	push_delta(delta, patch_stack);
> }
>
> while ((delta = pop_delta(patch_stack)))
> 	apply_delta(base_object, delta);
>
>
>
> where "apply_delta" replaces the base_object->blob with the delta
> applied, releasing the previously used memory?
>
> That way, you'll never use more memory than the largest ever size
> of the object + 1 delta at a time and still not apply the delta
> more than delta_chain_length-1 times.
Those delta chains aren't simple linked lists. They are trees of
deltas where one object might be the base for an unlimited number of
deltas of depth 1, and in turn each of those deltas might constitute the
base for an unlimited number of deltas of depth 2, and so on.
So what the code does is to find out which objects are not deltas but
are the base for a delta. Then, for each of them, all deltas having
the given object as their base are found and recursively resolved, so
each resolved delta is then considered a possible base for more deltas,
etc. In other words, those deltas are resolved by walking the delta
tree in a "depth first" fashion.
If we discard previous delta bases, we will have to recreate them each
time a delta sibling is processed. And if those delta bases are
themselves deltas then you have an explosion of delta results to
re-compute.
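For illustration, a minimal C sketch of that depth-first walk; the
delta_node structure and the apply_delta helper below are hypothetical
stand-ins, not the actual index-pack types:

#include <stdlib.h>

/* Hypothetical helper standing in for git's patch_delta: applies one
 * delta to a base buffer and returns a newly allocated result. */
extern void *apply_delta(void *base, unsigned long base_size,
			 void *delta, unsigned long delta_size,
			 unsigned long *result_size);

struct delta_node {
	void *data;			/* raw delta, then inflated content */
	unsigned long size;		/* size of whatever 'data' holds */
	struct delta_node **children;	/* deltas using this node as base */
	int nr_children;
};

/* Depth first: each resolved delta immediately serves as the base for
 * its own children before any of its siblings is visited. */
static void resolve_tree(struct delta_node *base)
{
	int i;
	for (i = 0; i < base->nr_children; i++) {
		struct delta_node *child = base->children[i];
		void *raw = child->data;
		child->data = apply_delta(base->data, base->size,
					  raw, child->size, &child->size);
		free(raw);
		resolve_tree(child);
	}
}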
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 1/4] index-pack: Refactor base arguments of resolve_delta into a struct
2008-07-14 2:07 ` [PATCH 1/4] index-pack: Refactor base arguments of resolve_delta into a struct Shawn O. Pearce
@ 2008-07-15 2:40 ` Nicolas Pitre
0 siblings, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-15 2:40 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, Junio C Hamano, git, Stephan Hennig,
Andreas Ericsson
On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> We need to discard base objects which are not recently used if our
> memory gets low, such as when we are unpacking a long delta chain
> of a very large object.
>
> To support tracking the available base objects we combine the
> pointer and size into a struct. Future changes would allow the
> data pointer to be free'd and marked NULL if memory gets low.
>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ACK
> ---
> index-pack.c | 60 +++++++++++++++++++++++++++++++--------------------------
> 1 files changed, 33 insertions(+), 27 deletions(-)
>
> diff --git a/index-pack.c b/index-pack.c
> index 25db5db..db03478 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -26,6 +26,11 @@ union delta_base {
> off_t offset;
> };
>
> +struct base_data {
> + void *data;
> + unsigned long size;
> +};
> +
> /*
> * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
> * to memcmp() only the first 20 bytes.
> @@ -426,25 +431,25 @@ static void sha1_object(const void *data, unsigned long size,
> }
> }
>
> -static void resolve_delta(struct object_entry *delta_obj, void *base_data,
> - unsigned long base_size, enum object_type type)
> +static void resolve_delta(struct object_entry *delta_obj,
> + struct base_data *base_obj, enum object_type type)
> {
> void *delta_data;
> unsigned long delta_size;
> - void *result;
> - unsigned long result_size;
> union delta_base delta_base;
> int j, first, last;
> + struct base_data result;
>
> delta_obj->real_type = type;
> delta_data = get_data_from_pack(delta_obj);
> delta_size = delta_obj->size;
> - result = patch_delta(base_data, base_size, delta_data, delta_size,
> - &result_size);
> + result.data = patch_delta(base_obj->data, base_obj->size,
> + delta_data, delta_size,
> + &result.size);
> free(delta_data);
> - if (!result)
> + if (!result.data)
> bad_object(delta_obj->idx.offset, "failed to apply delta");
> - sha1_object(result, result_size, type, delta_obj->idx.sha1);
> + sha1_object(result.data, result.size, type, delta_obj->idx.sha1);
> nr_resolved_deltas++;
>
> hashcpy(delta_base.sha1, delta_obj->idx.sha1);
> @@ -452,7 +457,7 @@ static void resolve_delta(struct object_entry *delta_obj, void *base_data,
> for (j = first; j <= last; j++) {
> struct object_entry *child = objects + deltas[j].obj_no;
> if (child->real_type == OBJ_REF_DELTA)
> - resolve_delta(child, result, result_size, type);
> + resolve_delta(child, &result, type);
> }
> }
>
> @@ -462,11 +467,11 @@ static void resolve_delta(struct object_entry *delta_obj, void *base_data,
> for (j = first; j <= last; j++) {
> struct object_entry *child = objects + deltas[j].obj_no;
> if (child->real_type == OBJ_OFS_DELTA)
> - resolve_delta(child, result, result_size, type);
> + resolve_delta(child, &result, type);
> }
> }
>
> - free(result);
> + free(result.data);
> }
>
> static int compare_delta_entry(const void *a, const void *b)
> @@ -481,7 +486,6 @@ static void parse_pack_objects(unsigned char *sha1)
> {
> int i;
> struct delta_entry *delta = deltas;
> - void *data;
> struct stat st;
>
> /*
> @@ -496,7 +500,7 @@ static void parse_pack_objects(unsigned char *sha1)
> nr_objects);
> for (i = 0; i < nr_objects; i++) {
> struct object_entry *obj = &objects[i];
> - data = unpack_raw_entry(obj, &delta->base);
> + void *data = unpack_raw_entry(obj, &delta->base);
> obj->real_type = obj->type;
> if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
> nr_deltas++;
> @@ -545,6 +549,7 @@ static void parse_pack_objects(unsigned char *sha1)
> struct object_entry *obj = &objects[i];
> union delta_base base;
> int j, ref, ref_first, ref_last, ofs, ofs_first, ofs_last;
> + struct base_data base_obj;
>
> if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA)
> continue;
> @@ -555,22 +560,22 @@ static void parse_pack_objects(unsigned char *sha1)
> ofs = !find_delta_children(&base, &ofs_first, &ofs_last);
> if (!ref && !ofs)
> continue;
> - data = get_data_from_pack(obj);
> + base_obj.data = get_data_from_pack(obj);
> + base_obj.size = obj->size;
> +
> if (ref)
> for (j = ref_first; j <= ref_last; j++) {
> struct object_entry *child = objects + deltas[j].obj_no;
> if (child->real_type == OBJ_REF_DELTA)
> - resolve_delta(child, data,
> - obj->size, obj->type);
> + resolve_delta(child, &base_obj, obj->type);
> }
> if (ofs)
> for (j = ofs_first; j <= ofs_last; j++) {
> struct object_entry *child = objects + deltas[j].obj_no;
> if (child->real_type == OBJ_OFS_DELTA)
> - resolve_delta(child, data,
> - obj->size, obj->type);
> + resolve_delta(child, &base_obj, obj->type);
> }
> - free(data);
> + free(base_obj.data);
> display_progress(progress, nr_resolved_deltas);
> }
> }
> @@ -656,28 +661,29 @@ static void fix_unresolved_deltas(int nr_unresolved)
>
> for (i = 0; i < n; i++) {
> struct delta_entry *d = sorted_by_pos[i];
> - void *data;
> - unsigned long size;
> enum object_type type;
> int j, first, last;
> + struct base_data base_obj;
>
> if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
> continue;
> - data = read_sha1_file(d->base.sha1, &type, &size);
> - if (!data)
> + base_obj.data = read_sha1_file(d->base.sha1, &type, &base_obj.size);
> + if (!base_obj.data)
> continue;
>
> find_delta_children(&d->base, &first, &last);
> for (j = first; j <= last; j++) {
> struct object_entry *child = objects + deltas[j].obj_no;
> if (child->real_type == OBJ_REF_DELTA)
> - resolve_delta(child, data, size, type);
> + resolve_delta(child, &base_obj, type);
> }
>
> - if (check_sha1_signature(d->base.sha1, data, size, typename(type)))
> + if (check_sha1_signature(d->base.sha1, base_obj.data,
> + base_obj.size, typename(type)))
> die("local object %s is corrupt", sha1_to_hex(d->base.sha1));
> - append_obj_to_pack(d->base.sha1, data, size, type);
> - free(data);
> + append_obj_to_pack(d->base.sha1, base_obj.data,
> + base_obj.size, type);
> + free(base_obj.data);
> display_progress(progress, nr_resolved_deltas);
> }
> free(sorted_by_pos);
> --
> 1.5.6.2.393.g45096
>
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-15 2:21 ` Nicolas Pitre
@ 2008-07-15 2:47 ` Shawn O. Pearce
2008-07-15 3:06 ` Nicolas Pitre
2008-07-17 16:06 ` Stephan Hennig
0 siblings, 2 replies; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-15 2:47 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Andreas Ericsson, Johannes Schindelin, Jakub Narebski,
Junio C Hamano, git, Stephan Hennig
Nicolas Pitre <nico@cam.org> wrote:
>
> Those delta chains aren't simple chained lists. They are trees of
> deltas where one object might be the base for an unlimited number of
> deltas of depth 1, and in turn each of those deltas might constitute the
> base for an unlimited number of deltas of depth 2, and so on.
>
> So what the code does is to find out which objects are not deltas but
> are the base for a delta. Then, for each of them, all deltas having
> given object for base are found and they are recursively resolved so
> each resolved delta is then considered a possible base for more deltas,
> etc. In other words, those deltas are resolved by walking the delta
> tree in a "depth first" fashion.
>
> If we discard previous delta bases, we will have to recreate them each
> time a delta sibling is processed. And if those delta bases are
> themselves deltas then you have an explosion of delta results to
> re-compute.
Yes, it would be horrible if we had to recompute 10 deltas in order
to recover a previously discarded delta base in order to visit new
siblings.
But it's even more horrible that we use 512M of memory in our working
set size on a 256M machine to process a pack that is only 300M in
size, due to long delta chains on large objects. In such a case
the system will swap and perform fairly poorly due to the huge disk
IO necessary to keep moving the working set around.
We're better off keeping our memory usage low and recomputing
the delta base when we need to return to it to process a sibling.
Please remember that index-pack, unlike unpack-objects, does not
hold the unresolved deltas in memory while processing the input.
It assumes the total size of the unresolved deltas may exceed
the available memory for our working set and writes them to disk,
to be read back in later during the resolving phase.
At some point it is possible for the completely inflated delta
chain to exceed the physical memory of the system. As soon as you
do that you are committed to some form of swapping. We can probably
do that better from the packfile by reinflating the super compressed
deltas than letting the OS page in huge tracts of the virtual address
space off the swap device. Plus the OS does not need to expend disk
IO to swap out the pages; we have already spent that cost when we
wrote the pack file down to disk as part of our normal operation.
I don't like adding code either. But I think I'm right. We really
need to not allow index-pack to create these massive working sets
and assume the operating system is going to be able to handle
it magically. Memory is not infinite, even if the Turing machine
theory claims it is.
--
Shawn.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 2/4] index-pack: Chain the struct base_data on the stack for traversal
2008-07-14 2:07 ` [PATCH 2/4] index-pack: Chain the struct base_data on the stack for traversal Shawn O. Pearce
@ 2008-07-15 2:48 ` Nicolas Pitre
0 siblings, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-15 2:48 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, Junio C Hamano, git, Stephan Hennig,
Andreas Ericsson
On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> We need to release earlier inflated base objects when memory gets
> low, which means we need to be able to walk up or down the stack
> to locate the objects we want to release, and free their data.
>
> The new link/unlink routines allow inserting and removing the struct
> base_data during recursion inside resolve_delta, and the global
> base_cache gives us the head of the chain (bottom of the stack)
> so we can traverse it.
>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
You don't really need that 'base' pointer, do you? When linking a new
child, the 'child' pointer simply has to be overwritten. There is
a window where that 'child' pointer would be invalid after the child
structure has been discarded, but no walking of the list should occur at
that point anyway.
> ---
> index-pack.c | 34 +++++++++++++++++++++++++++++++---
> 1 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/index-pack.c b/index-pack.c
> index db03478..6c59fd3 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -27,6 +27,8 @@ union delta_base {
> };
>
> struct base_data {
> + struct base_data *base;
> + struct base_data *child;
> void *data;
> unsigned long size;
> };
> @@ -48,6 +50,7 @@ struct delta_entry
>
> static struct object_entry *objects;
> static struct delta_entry *deltas;
> +static struct base_data *base_cache;
> static int nr_objects;
> static int nr_deltas;
> static int nr_resolved_deltas;
> @@ -216,6 +219,27 @@ static void bad_object(unsigned long offset, const char *format, ...)
> die("pack has bad object at offset %lu: %s", offset, buf);
> }
>
> +static void link_base_data(struct base_data *base, struct base_data *c)
> +{
> + if (base)
> + base->child = c;
> + else
> + base_cache = c;
> +
> + c->base = base;
> + c->child = NULL;
> +}
> +
> +static void unlink_base_data(struct base_data *c)
> +{
> + struct base_data *base = c->base;
> + if (base)
> + base->child = NULL;
> + else
> + base_cache = NULL;
> + free(c->data);
> +}
> +
> static void *unpack_entry_data(unsigned long offset, unsigned long size)
> {
> z_stream stream;
> @@ -452,6 +476,8 @@ static void resolve_delta(struct object_entry *delta_obj,
> sha1_object(result.data, result.size, type, delta_obj->idx.sha1);
> nr_resolved_deltas++;
>
> + link_base_data(base_obj, &result);
> +
> hashcpy(delta_base.sha1, delta_obj->idx.sha1);
> if (!find_delta_children(&delta_base, &first, &last)) {
> for (j = first; j <= last; j++) {
> @@ -471,7 +497,7 @@ static void resolve_delta(struct object_entry *delta_obj,
> }
> }
>
> - free(result.data);
> + unlink_base_data(&result);
> }
>
> static int compare_delta_entry(const void *a, const void *b)
> @@ -562,6 +588,7 @@ static void parse_pack_objects(unsigned char *sha1)
> continue;
> base_obj.data = get_data_from_pack(obj);
> base_obj.size = obj->size;
> + link_base_data(NULL, &base_obj);
>
> if (ref)
> for (j = ref_first; j <= ref_last; j++) {
> @@ -575,7 +602,7 @@ static void parse_pack_objects(unsigned char *sha1)
> if (child->real_type == OBJ_OFS_DELTA)
> resolve_delta(child, &base_obj, obj->type);
> }
> - free(base_obj.data);
> + unlink_base_data(&base_obj);
> display_progress(progress, nr_resolved_deltas);
> }
> }
> @@ -670,6 +697,7 @@ static void fix_unresolved_deltas(int nr_unresolved)
> base_obj.data = read_sha1_file(d->base.sha1, &type, &base_obj.size);
> if (!base_obj.data)
> continue;
> + link_base_data(NULL, &base_obj);
>
> find_delta_children(&d->base, &first, &last);
> for (j = first; j <= last; j++) {
> @@ -683,7 +711,7 @@ static void fix_unresolved_deltas(int nr_unresolved)
> die("local object %s is corrupt", sha1_to_hex(d->base.sha1));
> append_obj_to_pack(d->base.sha1, base_obj.data,
> base_obj.size, type);
> - free(base_obj.data);
> + unlink_base_data(&base_obj);
> display_progress(progress, nr_resolved_deltas);
> }
> free(sorted_by_pos);
> --
> 1.5.6.2.393.g45096
>
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 3/4] index-pack: Track the object_entry that creates each base_data
2008-07-14 2:07 ` [PATCH 3/4] index-pack: Track the object_entry that creates each base_data Shawn O. Pearce
2008-07-14 10:15 ` Johannes Schindelin
@ 2008-07-15 2:50 ` Nicolas Pitre
2008-07-15 3:20 ` Shawn O. Pearce
1 sibling, 1 reply; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-15 2:50 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, Junio C Hamano, git, Stephan Hennig,
Andreas Ericsson
On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> If we free the data stored within a base_data we need the struct
> object_entry to get the data back again for use with another
> dependent delta. Storing the object_entry* makes it simple to call
> get_data_from_pack() to recover the compressed information.
>
> This however means we must add the missing baes object to the end
Typo?
> of our packfile prior to calling resolve_delta() on each of the
> dependent deltas. Adding the base first ensures we can read the
> base back from the pack we are indexing, as if it had been included by
> the remote side.
>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ACK
> ---
> index-pack.c | 18 ++++++++++++------
> 1 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/index-pack.c b/index-pack.c
> index 6c59fd3..7239e89 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -29,6 +29,7 @@ union delta_base {
> struct base_data {
> struct base_data *base;
> struct base_data *child;
> + struct object_entry *obj;
> void *data;
> unsigned long size;
> };
> @@ -476,6 +477,7 @@ static void resolve_delta(struct object_entry *delta_obj,
> sha1_object(result.data, result.size, type, delta_obj->idx.sha1);
> nr_resolved_deltas++;
>
> + result.obj = delta_obj;
> link_base_data(base_obj, &result);
>
> hashcpy(delta_base.sha1, delta_obj->idx.sha1);
> @@ -588,6 +590,7 @@ static void parse_pack_objects(unsigned char *sha1)
> continue;
> base_obj.data = get_data_from_pack(obj);
> base_obj.size = obj->size;
> + base_obj.obj = obj;
> link_base_data(NULL, &base_obj);
>
> if (ref)
> @@ -633,7 +636,8 @@ static int write_compressed(int fd, void *in, unsigned int size, uint32_t *obj_c
> return size;
> }
>
> -static void append_obj_to_pack(const unsigned char *sha1, void *buf,
> +static struct object_entry *append_obj_to_pack(
> + const unsigned char *sha1, void *buf,
> unsigned long size, enum object_type type)
> {
> struct object_entry *obj = &objects[nr_objects++];
> @@ -654,6 +658,7 @@ static void append_obj_to_pack(const unsigned char *sha1, void *buf,
> obj[1].idx.offset = obj[0].idx.offset + n;
> obj[1].idx.offset += write_compressed(output_fd, buf, size, &obj[0].idx.crc32);
> hashcpy(obj->idx.sha1, sha1);
> + return obj;
> }
>
> static int delta_pos_compare(const void *_a, const void *_b)
> @@ -697,6 +702,12 @@ static void fix_unresolved_deltas(int nr_unresolved)
> base_obj.data = read_sha1_file(d->base.sha1, &type, &base_obj.size);
> if (!base_obj.data)
> continue;
> +
> + if (check_sha1_signature(d->base.sha1, base_obj.data,
> + base_obj.size, typename(type)))
> + die("local object %s is corrupt", sha1_to_hex(d->base.sha1));
> + base_obj.obj = append_obj_to_pack(d->base.sha1, base_obj.data,
> + base_obj.size, type);
> link_base_data(NULL, &base_obj);
>
> find_delta_children(&d->base, &first, &last);
> @@ -706,11 +717,6 @@ static void fix_unresolved_deltas(int nr_unresolved)
> resolve_delta(child, &base_obj, type);
> }
>
> - if (check_sha1_signature(d->base.sha1, base_obj.data,
> - base_obj.size, typename(type)))
> - die("local object %s is corrupt", sha1_to_hex(d->base.sha1));
> - append_obj_to_pack(d->base.sha1, base_obj.data,
> - base_obj.size, type);
> unlink_base_data(&base_obj);
> display_progress(progress, nr_resolved_deltas);
> }
> --
> 1.5.6.2.393.g45096
>
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas
2008-07-14 2:07 ` [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas Shawn O. Pearce
@ 2008-07-15 3:05 ` Nicolas Pitre
2008-07-15 3:18 ` Shawn O. Pearce
0 siblings, 1 reply; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-15 3:05 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, Junio C Hamano, git, Stephan Hennig,
Andreas Ericsson
On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> If we are trying to resolve deltas for a long delta chain composed
> of multi-megabyte objects we can easily run into requiring 500M+
> of memory to hold each object in the chain on the call stack while
> we recurse into the dependent objects and resolve them.
>
> We now use a simple delta cache that discards objects near the
> bottom of the call stack first, as they are the least recently
> used objects in the current delta chain. If we recurse out of a
> chain we may find the base object is no longer available, as it was
> free'd to keep memory under the deltaBaseCacheLimit. In such cases
> we must unpack the base object again, which will require recursing
> back to the root of the delta chain, as we released that root first.
>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
OK now I see what the 'base' pointer I previously dismissed is really
needed for.
But this patch is suboptimal as it actually recreates the same memory
pressure, to a lesser degree, that this series is meant to solve. If you do:
> + struct object_entry *obj = c->obj;
> + void *raw = get_data_from_pack(obj);
> + if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
> + c->data = patch_delta(
> + get_base_data(c->base), c->base->size,
> + raw, obj->size,
> + &c->size);
> + free(raw);
What you actually do is to read the delta data in memory, then recurse
down to read more delta data, then recurse down to read the base which
might be yet more delta data in memory, etc. etc. Only when you reach
the bottom of the stack will you start resolving all those deltas in
memory. Instead, the check for a delta object should be done first,
and if so, the recursion for the base object should be performed
_before_ reading the currently wanted object data. This way you won't
have more than one delta buffer in memory at any time.
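In outline, the reordering Nicolas asks for looks like the sketch
below (cache accounting omitted); the v2 patch further down in the
thread is the real fix:

static void *get_base_data(struct base_data *c)
{
	if (!c->data) {
		struct object_entry *obj = c->obj;

		if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
			/* recurse for the base first ... */
			void *base = get_base_data(c->base);
			/* ... then inflate this object's own delta, so at
			 * most one raw delta buffer is live at a time */
			void *raw = get_data_from_pack(obj);
			c->data = patch_delta(base, c->base->size,
					      raw, obj->size, &c->size);
			free(raw);
			if (!c->data)
				bad_object(obj->idx.offset,
					   "failed to apply delta");
		} else
			c->data = get_data_from_pack(obj);
	}
	return c->data;
}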
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-15 2:47 ` Shawn O. Pearce
@ 2008-07-15 3:06 ` Nicolas Pitre
2008-07-17 16:06 ` Stephan Hennig
1 sibling, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-15 3:06 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Andreas Ericsson, Johannes Schindelin, Jakub Narebski,
Junio C Hamano, git, Stephan Hennig
On Tue, 15 Jul 2008, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> >
> > Those delta chains aren't simple chained lists. They are trees of
> > deltas where one object might be the base for an unlimited number of
> > deltas of depth 1, and in turn each of those deltas might constitute the
> > base for an unlimited number of deltas of depth 2, and so on.
> >
> > So what the code does is to find out which objects are not deltas but
> > are the base for a delta. Then, for each of them, all deltas having
> > given object for base are found and they are recursively resolved so
> > each resolved delta is then considered a possible base for more deltas,
> > etc. In other words, those deltas are resolved by walking the delta
> > tree in a "depth first" fashion.
> >
> > If we discard previous delta bases, we will have to recreate them each
> > time a delta sibling is processed. And if those delta bases are
> > themselves deltas then you have an explosion of delta results to
> > re-compute.
>
> Yes, it would be horrible if we had to recompute 10 deltas in order
> to recover a previously discarded delta base in order to visit new
> siblings.
>
> But it's even more horrible that we use 512M of memory in our working
> set size on a 256M machine to process a pack that is only 300M in
> size, due to long delta chains on large objects. In such a case
> the system will swap and perform fairly poorly due to the huge disk
> IO necessary to keep moving the working set around.
>
> We're better off keeping our memory usage low and recomputing
> the delta base when we need to return to it to process a sibling.
Please relax! ;-)
And have a look in your mbox.
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas
2008-07-15 3:05 ` Nicolas Pitre
@ 2008-07-15 3:18 ` Shawn O. Pearce
2008-07-15 4:45 ` [PATCH v2] " Shawn O. Pearce
0 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-15 3:18 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, Stephan Hennig, Andreas Ericsson
Nicolas Pitre <nico@cam.org> wrote:
> On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
>
> But this patch is suboptimal as it actually recreates the same memory
> pressure, to a lesser degree, that this series is meant to solve. If you do:
Arrgh. Good catch on review. I'll post a replacement patch shortly
that doesn't suffer from this problem. Thanks!
> > + struct object_entry *obj = c->obj;
> > + void *raw = get_data_from_pack(obj);
> > + if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
> > + c->data = patch_delta(
> > + get_base_data(c->base), c->base->size,
> > + raw, obj->size,
> > + &c->size);
> > + free(raw);
--
Shawn.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 3/4] index-pack: Track the object_entry that creates each base_data
2008-07-15 2:50 ` Nicolas Pitre
@ 2008-07-15 3:20 ` Shawn O. Pearce
2008-07-15 3:42 ` Nicolas Pitre
0 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-15 3:20 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git, Stephan Hennig, Andreas Ericsson
Nicolas Pitre <nico@cam.org> wrote:
> On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
>
> > If we free the data stored within a base_data we need the struct
> > object_entry to get the data back again for use with another
> > dependent delta. Storing the object_entry* makes it simple to call
> > get_data_from_pack() to recover the compressed information.
> >
> > This however means we must add the missing baes object to the end
>
> Typo?
Yea, Dscho also pointed it out.
Junio, if you can, please fix this typo in the commit message.
It's not really a big deal. I have no plans on posting a replacement
patch just for this one small typo. Plenty of my current commits
in git.git have typos too. I'm not that good of a typist. And gcc
doesn't check my commit messages. ;-)
--
Shawn.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 3/4] index-pack: Track the object_entry that creates each base_data
2008-07-15 3:20 ` Shawn O. Pearce
@ 2008-07-15 3:42 ` Nicolas Pitre
0 siblings, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-15 3:42 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Junio C Hamano, git, Stephan Hennig, Andreas Ericsson
On Tue, 15 Jul 2008, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > On Sun, 13 Jul 2008, Shawn O. Pearce wrote:
> >
> > > If we free the data stored within a base_data we need the struct
> > > object_entry to get the data back again for use with another
> > > dependent delta. Storing the object_entry* makes it simple to call
> > > get_data_from_pack() to recover the compressed information.
> > >
> > > This however means we must add the missing baes object to the end
> >
> > Typo?
>
> Yea, Dscho also pointed it out.
>
> Junio, if you can, please fix this typo in the commit message.
>
> Its not really a big deal. I have no plans on posting a replacement
> patch just for this one small typo.
No, it is not a big deal. I do write crappy English myself. I
initially had some comments on the patch itself which would have
warranted another patch and then changed my mind.
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-14 11:54 ` Jakub Narebski
2008-07-14 12:10 ` Johannes Schindelin
2008-07-14 12:16 ` Andreas Ericsson
@ 2008-07-15 4:19 ` Shawn O. Pearce
2 siblings, 0 replies; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-15 4:19 UTC (permalink / raw)
To: Jakub Narebski
Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano, git,
Stephan Hennig, Andreas Ericsson
Jakub Narebski <jnareb@gmail.com> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > On Mon, 14 Jul 2008, Shawn O. Pearce wrote:
>
> > > The only other alternative I can come up with is to change pack-objects
> > > (or at least its defaults) so we don't generate these massive delta
> > > chains. But there are already packs in the wild that use these chains,
> > > resulting in performance problems for clients.
> >
> > But the long chains make the pack actually as efficient as it is...
>
> Perhaps Shawn thought here about limiting a delta chain not by its
> *length*, but by its *size* (as required when unpacking the last object
> in a delta chain).
>
> What do you think about this idea?
I was talking about that. Or at least thinking it. But it's not a
good solution; as Dscho points out, those long chains are what makes
the pack file such a powerful source code compressor.
IMHO the right solution is to continue to allow these long chains,
especially since they are typically used for the older, less often
accessed objects, and use bounded caching where necessary in the
readers to avoid unpacking costs. We already have a bounded cache
for the random access case used by most programs. What we missed
was the bounded cache in index-pack, and my 4 patch series is an
attempt to define that.
--
Shawn.
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH v2] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas
2008-07-15 3:18 ` Shawn O. Pearce
@ 2008-07-15 4:45 ` Shawn O. Pearce
2008-07-15 5:05 ` Nicolas Pitre
2008-07-15 18:48 ` Junio C Hamano
0 siblings, 2 replies; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-15 4:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Nicolas Pitre, Stephan Hennig, Andreas Ericsson
If we are trying to resolve deltas for a long delta chain composed
of multi-megabyte objects we can easily run into requiring 500M+
of memory to hold each object in the chain on the call stack while
we recurse into the dependent objects and resolve them.
We now use a simple delta cache that discards objects near the
bottom of the call stack first, as they are the least recently
used objects in the current delta chain. If we recurse out of a
chain we may find the base object is no longer available, as it was
free'd to keep memory under the deltaBaseCacheLimit. In such cases
we must unpack the base object again, which will require recursing
back to the root of the delta chain, as we released that root first.
The astute reader will probably realize that we can still exceed
the delta base cache limit, but this happens only if the most
recent base plus the delta plus the inflated dependent sum up to
more than the base cache limit. Due to the way patch_delta is
currently implemented we cannot operate in less memory anyway.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
Version 2 plugs the case Nico noticed, where the patch was causing
the exact behavior it was trying to prevent while recovering from
what it did to avoid the excessive memory usage in the first place.
The change was in get_base_data() where we now unpack the delta
after we have unpacked the base. Nico and I both missed that we
must also bump base_cache_used when we restore the base, and we
must also prune the bases in case this base has caused us to go
over the limit.
:-)
index-pack.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/index-pack.c b/index-pack.c
index 7239e89..b4ec736 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -52,6 +52,7 @@ struct delta_entry
static struct object_entry *objects;
static struct delta_entry *deltas;
static struct base_data *base_cache;
+static size_t base_cache_used;
static int nr_objects;
static int nr_deltas;
static int nr_resolved_deltas;
@@ -220,6 +221,20 @@ static void bad_object(unsigned long offset, const char *format, ...)
die("pack has bad object at offset %lu: %s", offset, buf);
}
+static void prune_base_data(struct base_data *retain)
+{
+ struct base_data *b = base_cache;
+ for (b = base_cache;
+ base_cache_used > delta_base_cache_limit && b;
+ b = b->child) {
+ if (b->data && b != retain) {
+ free(b->data);
+ b->data = NULL;
+ base_cache_used -= b->size;
+ }
+ }
+}
+
static void link_base_data(struct base_data *base, struct base_data *c)
{
if (base)
@@ -229,6 +244,8 @@ static void link_base_data(struct base_data *base, struct base_data *c)
c->base = base;
c->child = NULL;
+ base_cache_used += c->size;
+ prune_base_data(c);
}
static void unlink_base_data(struct base_data *c)
@@ -238,7 +255,10 @@ static void unlink_base_data(struct base_data *c)
base->child = NULL;
else
base_cache = NULL;
- free(c->data);
+ if (c->data) {
+ free(c->data);
+ base_cache_used -= c->size;
+ }
}
static void *unpack_entry_data(unsigned long offset, unsigned long size)
@@ -456,6 +476,30 @@ static void sha1_object(const void *data, unsigned long size,
}
}
+static void *get_base_data(struct base_data *c)
+{
+ if (!c->data) {
+ struct object_entry *obj = c->obj;
+
+ if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
+ void *base = get_base_data(c->base);
+ void *raw = get_data_from_pack(obj);
+ c->data = patch_delta(
+ base, c->base->size,
+ raw, obj->size,
+ &c->size);
+ free(raw);
+ if (!c->data)
+ bad_object(obj->idx.offset, "failed to apply delta");
+ } else
+ c->data = get_data_from_pack(obj);
+
+ base_cache_used += c->size;
+ prune_base_data(c);
+ }
+ return c->data;
+}
+
static void resolve_delta(struct object_entry *delta_obj,
struct base_data *base_obj, enum object_type type)
{
@@ -468,7 +512,7 @@ static void resolve_delta(struct object_entry *delta_obj,
delta_obj->real_type = type;
delta_data = get_data_from_pack(delta_obj);
delta_size = delta_obj->size;
- result.data = patch_delta(base_obj->data, base_obj->size,
+ result.data = patch_delta(get_base_data(base_obj), base_obj->size,
delta_data, delta_size,
&result.size);
free(delta_data);
--
1.5.6.2.393.g45096
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [PATCH v2] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas
2008-07-15 4:45 ` [PATCH v2] " Shawn O. Pearce
@ 2008-07-15 5:05 ` Nicolas Pitre
2008-07-15 18:48 ` Junio C Hamano
1 sibling, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-15 5:05 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Junio C Hamano, git, Stephan Hennig, Andreas Ericsson
On Tue, 15 Jul 2008, Shawn O. Pearce wrote:
> If we are trying to resolve deltas for a long delta chain composed
> of multi-megabyte objects we can easily run into requiring 500M+
> of memory to hold each object in the chain on the call stack while
> we recurse into the dependent objects and resolve them.
>
> We now use a simple delta cache that discards objects near the
> bottom of the call stack first, as they are the least recently
> used objects in the current delta chain. If we recurse out of a
> chain we may find the base object is no longer available, as it was
> free'd to keep memory under the deltaBaseCacheLimit. In such cases
> we must unpack the base object again, which will require recursing
> back to the root of the delta chain, as we released that root first.
>
> The astute reader will probably realize that we can still exceed
> the delta base cache limit, but this happens only if the most
> recent base plus the delta plus the inflated dependent sum up to
> more than the base cache limit. Due to the way patch_delta is
> currently implemented we cannot operate in less memory anyway.
You may also set the limit lower than the size of the largest object,
which is going to bust it too. Same goes for the cache in sha1_file.c.
There is simply a limit to how tight we can sanely make memory usage.
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ACK
> ---
>
> Version 2 plugs the case Nico noticed, where the patch was causing
> the exact behavior it was trying to prevent while recovering from
> what it did to avoid the excessive memory usage in the first place.
>
> The change was in get_base_data() where we now unpack the delta
> after we have unpacked the base. Nico and I both missed that we
> must also bump base_cache_used when we restore the base, and we
> must also prune the bases in case this base has caused us to go
> over the limit.
>
> :-)
>
> index-pack.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
> 1 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/index-pack.c b/index-pack.c
> index 7239e89..b4ec736 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -52,6 +52,7 @@ struct delta_entry
> static struct object_entry *objects;
> static struct delta_entry *deltas;
> static struct base_data *base_cache;
> +static size_t base_cache_used;
> static int nr_objects;
> static int nr_deltas;
> static int nr_resolved_deltas;
> @@ -220,6 +221,20 @@ static void bad_object(unsigned long offset, const char *format, ...)
> die("pack has bad object at offset %lu: %s", offset, buf);
> }
>
> +static void prune_base_data(struct base_data *retain)
> +{
> + struct base_data *b = base_cache;
> + for (b = base_cache;
> + base_cache_used > delta_base_cache_limit && b;
> + b = b->child) {
> + if (b->data && b != retain) {
> + free(b->data);
> + b->data = NULL;
> + base_cache_used -= b->size;
> + }
> + }
> +}
> +
> static void link_base_data(struct base_data *base, struct base_data *c)
> {
> if (base)
> @@ -229,6 +244,8 @@ static void link_base_data(struct base_data *base, struct base_data *c)
>
> c->base = base;
> c->child = NULL;
> + base_cache_used += c->size;
> + prune_base_data(c);
> }
>
> static void unlink_base_data(struct base_data *c)
> @@ -238,7 +255,10 @@ static void unlink_base_data(struct base_data *c)
> base->child = NULL;
> else
> base_cache = NULL;
> - free(c->data);
> + if (c->data) {
> + free(c->data);
> + base_cache_used -= c->size;
> + }
> }
>
> static void *unpack_entry_data(unsigned long offset, unsigned long size)
> @@ -456,6 +476,30 @@ static void sha1_object(const void *data, unsigned long size,
> }
> }
>
> +static void *get_base_data(struct base_data *c)
> +{
> + if (!c->data) {
> + struct object_entry *obj = c->obj;
> +
> + if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
> + void *base = get_base_data(c->base);
> + void *raw = get_data_from_pack(obj);
> + c->data = patch_delta(
> + base, c->base->size,
> + raw, obj->size,
> + &c->size);
> + free(raw);
> + if (!c->data)
> + bad_object(obj->idx.offset, "failed to apply delta");
> + } else
> + c->data = get_data_from_pack(obj);
> +
> + base_cache_used += c->size;
> + prune_base_data(c);
> + }
> + return c->data;
> +}
> +
> static void resolve_delta(struct object_entry *delta_obj,
> struct base_data *base_obj, enum object_type type)
> {
> @@ -468,7 +512,7 @@ static void resolve_delta(struct object_entry *delta_obj,
> delta_obj->real_type = type;
> delta_data = get_data_from_pack(delta_obj);
> delta_size = delta_obj->size;
> - result.data = patch_delta(base_obj->data, base_obj->size,
> + result.data = patch_delta(get_base_data(base_obj), base_obj->size,
> delta_data, delta_size,
> &result.size);
> free(delta_data);
> --
> 1.5.6.2.393.g45096
>
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas
2008-07-15 4:45 ` [PATCH v2] " Shawn O. Pearce
2008-07-15 5:05 ` Nicolas Pitre
@ 2008-07-15 18:48 ` Junio C Hamano
1 sibling, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2008-07-15 18:48 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git, Nicolas Pitre, Stephan Hennig, Andreas Ericsson
"Shawn O. Pearce" <spearce@spearce.org> writes:
> Version 2 plugs the case Nico noticed, where the patch was causing
> the exact behavior it was trying to prevent while recovering from
> what it did to avoid the excessive memory usage in the first place.
Thanks both; it makes sense.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-15 2:47 ` Shawn O. Pearce
2008-07-15 3:06 ` Nicolas Pitre
@ 2008-07-17 16:06 ` Stephan Hennig
2008-07-17 16:25 ` Nicolas Pitre
1 sibling, 1 reply; 51+ messages in thread
From: Stephan Hennig @ 2008-07-17 16:06 UTC (permalink / raw)
To: git
Cc: Shawn O. Pearce, Nicolas Pitre, Andreas Ericsson,
Johannes Schindelin, Jakub Narebski, Junio C Hamano
Shawn O. Pearce wrote:
> We're better off keeping our memory usage low and recomputing
> the delta base when we need to return to it to process a sibling.
Thanks to all who have had a look at this issue! From a user's
perspective I have one more suggestion and a question:
First, it would have helped me bring this issue onto the list earlier
if I had known that this was not a misconfiguration, but a memory
problem. Even though Git now makes some effort to substitute runtime
for memory to be able to operate with low(er) memory, I think it would
still be informative for a user to learn that repository and hardware
(or core.deltaBaseCacheLimit) are, say, incompatible. If valuable
objects have to be discarded due to memory restrictions, a warning
could be issued to make the user aware of this fact, e.g.,
Warning! Low memory. Git might be slowing down.
Second, while there have been some changes to Git now, as a poor user,
how can I make use of those changes? I think updating my client should
only help with pushing. For pulling, I have to wait for repo.or.cz to
update to Git 1.6.0, right?
Best regards,
Stephan Hennig
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-17 16:06 ` Stephan Hennig
@ 2008-07-17 16:25 ` Nicolas Pitre
2008-07-17 21:35 ` Shawn O. Pearce
0 siblings, 1 reply; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-17 16:25 UTC (permalink / raw)
To: Stephan Hennig
Cc: Shawn O. Pearce, Andreas Ericsson, Johannes Schindelin,
Jakub Narebski, Junio C Hamano, git
On Thu, 17 Jul 2008, Stephan Hennig wrote:
> Shawn O. Pearce wrote:
>
> > We're better off keeping our memory usage low and recomputing
> > the delta base when we need to return to it to process a sibling.
>
> Thanks to all who have had a look at this issue! From a user's
> perspective I have one more suggestion and a question:
>
> First, it would have helped me bring this issue onto the list earlier
> if I had known that this was not a misconfiguration, but a memory
> problem.
Well, if we had known before that this could be a problem, we'd have
fixed it earlier. In other words, sh*t happens.
> Even though Git now makes some effort to substitute runtime
> for memory to be able to operate with low(er) memory, I think it would
> still be informative for a user to learn that repository and hardware
> (or core.deltaBaseCacheLimit) are, say, incompatible. If valuable
> objects have to be discarded due to memory restrictions, a warning
> could be issued to make the user aware of this fact, e.g.,
>
> Warning! Low memory. Git might be slowing down.
Well, I disagree. First we don't know how slow git would effectively be
since all (my) concerns so far were totally theoretical. It will still
work better than, say, 'git verify-pack' nevertheless. And git should
just do its best regardless and avoid being needlessly verbose.
> Second, while there have been some changes to Git now, as a poor user,
> how can I make use of those changes? I think updating my client should
> only help with pushing. For pulling, I have to wait for repo.or.cz to
> update to Git 1.6.0, right?
Actually that's the other way around. This change will help the
receiving side of a transfer. So it should help you when pulling.
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
2008-07-17 16:25 ` Nicolas Pitre
@ 2008-07-17 21:35 ` Shawn O. Pearce
2008-07-17 22:02 ` [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small Shawn O. Pearce
0 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-17 21:35 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Stephan Hennig, Andreas Ericsson, Johannes Schindelin,
Jakub Narebski, Junio C Hamano, git
Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 17 Jul 2008, Stephan Hennig wrote:
> > Even though Git now makes some effort to substitute runtime
> > for memory to be able to operate with low(er) memory, I think it would
> > still be informative for a user to learn that repository and hardware
> > (or core.deltaBaseCacheLimit) are, say, incompatible. If valuable
> > objects have to be discarded due to memory restrictions, a warning
> > could be issued to make the user aware of this fact, e.g.,
> >
> > Warning! Low memory. Git might be slowing down.
>
> Well, I disagree. First we don't know how slow git would effectively be
> since all (my) concerns so far were totally theoretical. It will still
> work better than, say, 'git verify-pack' nevertheless. And git should
> just do its best regardless and avoid being needlessly verbose.
Actually, this warning may be a good idea. I'll post an RFC patch
for it in a few minutes. If people hate the idea, that's what an
RFC is for. :)
--
Shawn.
^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small
2008-07-17 21:35 ` Shawn O. Pearce
@ 2008-07-17 22:02 ` Shawn O. Pearce
2008-07-17 23:45 ` Nicolas Pitre
0 siblings, 1 reply; 51+ messages in thread
From: Shawn O. Pearce @ 2008-07-17 22:02 UTC (permalink / raw)
To: Nicolas Pitre, Stephan Hennig
Cc: Andreas Ericsson, Johannes Schindelin, Jakub Narebski,
Junio C Hamano, git
It's rare that we should exceed deltaBaseCacheLimit while resolving
delta compressed objects. By default this limit is 16M, and most
chains are under 50 objects in length. This affords about 327K per
object in the chain, which is quite large by source code standards.
If we have to recreate a prior delta base because we evicted it to
stay within the deltaBaseCacheLimit we can warn the user that their
configured limit is perhaps too low for this repository data set.
If the user keeps seeing the warning they can research it in the
documentation, and consider setting it higher on this repository,
or just globally on their system.
Suggested-by: Stephan Hennig <mailing_list@arcor.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
index-pack.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/index-pack.c b/index-pack.c
index ac20a46..97533d6 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -53,6 +53,7 @@ static struct object_entry *objects;
static struct delta_entry *deltas;
static struct base_data *base_cache;
static size_t base_cache_used;
+static int oom_warning;
static int nr_objects;
static int nr_deltas;
static int nr_resolved_deltas;
@@ -481,6 +482,13 @@ static void *get_base_data(struct base_data *c)
if (!c->data) {
struct object_entry *obj = c->obj;
+ if (!oom_warning && verbose) {
+ if (progress)
+ fputc('\n', stderr);
+ warning("One or more delta chains are larger than deltaBaseCache.");
+ oom_warning = 1;
+ }
+
if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
void *base = get_base_data(c->base);
void *raw = get_data_from_pack(obj);
--
1.5.6.3.569.ga9185
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small
2008-07-17 22:02 ` [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small Shawn O. Pearce
@ 2008-07-17 23:45 ` Nicolas Pitre
0 siblings, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2008-07-17 23:45 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Stephan Hennig, Andreas Ericsson, Johannes Schindelin,
Jakub Narebski, Junio C Hamano, git
On Thu, 17 Jul 2008, Shawn O. Pearce wrote:
> It's rare that we should exceed deltaBaseCacheLimit while resolving
> delta compressed objects. By default this limit is 16M, and most
> chains are under 50 objects in length. This affords about 327K per
> object in the chain, which is quite large by source code standards.
>
> If we have to recreate a prior delta base because we evicted it to
> stay within the deltaBaseCacheLimit we can warn the user that their
> configured limit is perhaps too low for this repository data set.
> If the user keeps seeing the warning they can research it in the
> documentation, and consider setting it higher on this repository,
> or just globally on their system.
As I said earlier, I don't think this is a good idea, but I'll elaborate
a bit more.
First, this is a really bad clue for setting deltaBaseCacheLimit. The
likelihood of this warning actually showing up during an initial clone
is relatively high, yet that doesn't mean deltaBaseCacheLimit has to
be changed at all. For one, the real runtime use of
deltaBaseCacheLimit is to cap a cache of objects serving multiple
delta chains with random access, not a single chain traversed linearly
as in the index-pack case; and that cache is likely to always be full
and in active eviction mode -- that's the point of a cap, after all.
In index-pack the limit is only used to bound the memory spent on
intermediate delta results; it is not really a cache. In other words,
we have two rather different usages for the same setting. Now don't
read me wrong: I think reusing this setting is sensible, but its value
should not be determined by what index-pack may happen to do with it,
especially on a first clone. And issuing warnings on the first clone
is not the way to give new users confidence either.
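To make the distinction concrete, here is a minimal sketch of a
byte-capped cache in the steady state being described; the names
(cache_insert, evict_one_entry, CACHE_LIMIT) are hypothetical and do
not correspond to git's actual code:

  #include <stddef.h>

  #define CACHE_LIMIT (16 * 1024 * 1024) /* deltaBaseCacheLimit default */

  static size_t cache_used;

  /* hypothetical: drop e.g. the least recently used cached base */
  void evict_one_entry(void);

  static void cache_insert(size_t entry_size)
  {
          /* Once warm, the cache sits at its cap, so nearly every
           * insertion must evict something first -- eviction is the
           * normal mode of operation, which is the point of a cap. */
          while (cache_used + entry_size > CACHE_LIMIT)
                  evict_one_entry();
          cache_used += entry_size;
  }

index-pack, by contrast, walks one chain at a time, so for it the same
number acts as a memory bound rather than a cache size.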
Secondly, on subsequent fetches the warning is likely never to appear
again, because the delta chains will typically be much shorter. That
would hold even if runtime access to the repository would in fact
benefit a lot from raising deltaBaseCacheLimit -- and it is the
runtime access that matters here, not the occasional fetch. Yet even
then, full delta chains are unlikely to be walked in their entirety
very often.
Thirdly, if such an indication is considered useful, it should really
be part of some statistics/analysis tool, such as verify-pack for
example. Such a tool could compute the exact memory requirements for a
given repository usage and possibly suggest an optimal
deltaBaseCacheLimit value. But that cache currently has a hardcoded
number of entries, and its hash function might not be optimal either,
which makes the connection with index-pack even more tenuous.
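For what such a tool could build on: verify-pack's verbose mode
already reports each object's delta depth and, if memory serves, ends
with a histogram of "chain length = N: M objects" lines; the pack file
name below is a placeholder, not a real path:

  $ git verify-pack -v .git/objects/pack/pack-<sha1>.idx

That depth distribution is the raw data from which per-repository
memory requirements could be estimated.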
And finally, I think index-pack would benefit a lot from a really
simple optimization: freeing an intermediate delta base object right
away when it has only one delta child to resolve, before that child is
itself used as a base for further delta grandchildren. That alone is
likely to cover most cases of big delta chains, making the warning an
even worse indicator.
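A sketch of that optimization; the types and helpers below
(nr_children, apply_delta, free_base_data) are hypothetical stand-ins,
not index-pack's real interfaces:

  struct delta_entry;

  struct base_data {
          int nr_children;
          struct delta_entry **children;
          void *data;
  };

  /* hypothetical helpers */
  struct base_data *apply_delta(struct base_data *base,
                                struct delta_entry *delta);
  void free_base_data(struct base_data *base);

  static void resolve_children(struct base_data *base)
  {
          int i;

          for (i = 0; i < base->nr_children; i++) {
                  struct base_data *child =
                          apply_delta(base, base->children[i]);

                  /* An only child means no other delta will ever need
                   * this base again: free it before recursing, so a
                   * long single-child chain keeps only the current
                   * base and child live instead of the whole chain. */
                  if (base->nr_children == 1)
                          free_base_data(base);

                  resolve_children(child);
          }
  }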
> Suggested-by: Stephan Hennig <mailing_list@arcor.de>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Unrecommended-by: Nicolas Pitre <nico@cam.org>
Nicolas
^ permalink raw reply [flat|nested] 51+ messages in thread
end of thread, other threads:[~2008-07-17 23:46 UTC | newest]
Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-07-10 14:40 git pull is slow Stephan Hennig
2008-07-10 15:13 ` Martin Langhoff
2008-07-10 15:28 ` Petr Baudis
2008-07-10 15:30 ` Johannes Sixt
2008-07-10 15:45 ` Stephan Hennig
2008-07-10 15:50 ` Petr Baudis
2008-07-10 17:44 ` Stephan Hennig
2008-07-11 12:25 ` Stephan Hennig
2008-07-11 13:34 ` Andreas Ericsson
2008-07-11 14:04 ` Johannes Schindelin
2008-07-12 12:32 ` Stephan Hennig
2008-07-12 17:05 ` Johannes Schindelin
2008-07-13 1:15 ` Shawn O. Pearce
2008-07-13 13:59 ` Johannes Schindelin
2008-07-13 22:11 ` Shawn O. Pearce
2008-07-14 2:07 ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
2008-07-14 2:27 ` Nicolas Pitre
2008-07-14 3:12 ` Shawn O. Pearce
2008-07-14 11:44 ` Johannes Schindelin
2008-07-14 11:54 ` Jakub Narebski
2008-07-14 12:10 ` Johannes Schindelin
2008-07-14 12:16 ` Andreas Ericsson
2008-07-14 12:25 ` Johannes Schindelin
2008-07-14 12:51 ` Andreas Ericsson
2008-07-14 12:58 ` Johannes Schindelin
2008-07-15 2:21 ` Nicolas Pitre
2008-07-15 2:47 ` Shawn O. Pearce
2008-07-15 3:06 ` Nicolas Pitre
2008-07-17 16:06 ` Stephan Hennig
2008-07-17 16:25 ` Nicolas Pitre
2008-07-17 21:35 ` Shawn O. Pearce
2008-07-17 22:02 ` [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small Shawn O. Pearce
2008-07-17 23:45 ` Nicolas Pitre
2008-07-15 4:19 ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
2008-07-14 2:07 ` [PATCH 1/4] index-pack: Refactor base arguments of resolve_delta into a struct Shawn O. Pearce
2008-07-15 2:40 ` Nicolas Pitre
2008-07-14 2:07 ` [PATCH 2/4] index-pack: Chain the struct base_data on the stack for traversal Shawn O. Pearce
2008-07-15 2:48 ` Nicolas Pitre
2008-07-14 2:07 ` [PATCH 3/4] index-pack: Track the object_entry that creates each base_data Shawn O. Pearce
2008-07-14 10:15 ` Johannes Schindelin
2008-07-15 2:50 ` Nicolas Pitre
2008-07-15 3:20 ` Shawn O. Pearce
2008-07-15 3:42 ` Nicolas Pitre
2008-07-14 2:07 ` [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas Shawn O. Pearce
2008-07-15 3:05 ` Nicolas Pitre
2008-07-15 3:18 ` Shawn O. Pearce
2008-07-15 4:45 ` [PATCH v2] " Shawn O. Pearce
2008-07-15 5:05 ` Nicolas Pitre
2008-07-15 18:48 ` Junio C Hamano
2008-07-13 9:01 ` git pull is slow Stephan Hennig
2008-07-11 12:55 ` Stephan Hennig
This is a public inbox; see mirroring instructions for how to clone
and mirror all data and code used for this inbox, as well as URLs for
NNTP newsgroup(s).